In the not-so-distant future, your every public action may be monitored by cameras that feed video to behavior-tracking systems capable of analyzing your actions for suspicious elements in real time. The system is called Deep Intermodal Video Analytics, DIVA for short, and it is currently a research project at the U.S. Office of the Director of National Intelligence’s Intelligence Advanced Research Projects Agency (IARPA). As you may have guessed, it is being developed under the banner of fighting terrorism.
The Intelligence Advanced Research Projects Agency published a project synopsis last week that says, in part:
“The DIVA program will produce a common framework and software prototype for activity detection, person/object detection and recognition across a multicamera network. The impact will be the development of tools for forensic analysis, as well as real-time alerting for user-defined threat scenarios.”
Phrases like “user-defined threat scenarios” sound like a bunch of Newspeak, but the IARPA website offers a plainer explanation of the program, along with examples. DIVA aims to “develop robust automated activity detection for a multi-camera streaming video environment.”
The program will be broken down into three phases, with each focusing on increasingly complex types of behavior tracking and monitoring. The first phase will focus on so-called “primitive activities,” with IARPA giving the following examples:
▪ Person getting into a vehicle,
▪ Person getting out of vehicle,
▪ Person carrying object.
The second phase will focus on the detection of “complex activities” like two people exchanging something or a person being picked up by a car. The third is described as focusing on “Person and object detection and recognition across multiple overlapping and nonoverlapping camera viewpoints.”
The program is described as aiming to use video collected from a vast array of cameras: indoor cameras, outdoor security cameras, handheld cameras, and body cameras, as well as infrared cameras and cameras that collect video “from other portion of the electromagnetic spectrum.”
The agency expects that experts across the following fields will participate in the creation of this framework:
Machine learning, deep learning or hierarchical modeling, artificial intelligence, object detection, recognition, person detection and re-identification, person action recognition, video activity detection, tracking across multiple non-overlapping camera viewpoints, 3D reconstruction from video, super-resolution, statistics, probability and mathematics.
The system is intended to monitor areas for suspicious behaviors (it isn’t clear how broadly the agency anticipates the technology being used), aiding law enforcement and the intelligence community in spotting planned acts of terrorism, such as the Boston Marathon bombing, before they can be carried out.