Everyone's On Board with IP Analytics
The why and how of on-camera analysis of video
Until recently, IP surveillance cameras sent only a multimedia stream to video management software, using well-defined formats and a fixed set of compression/decompression tools. So cooperation between video hardware manufacturers and software developers was stuck in a rut of sorts.
But now the market is bubbling with interest for "smart" security systems that feature more advanced tools. There is good reason for this: protection sites are becoming bigger and bigger, and cities are becoming ever "smarter". So traditional surveillance systems that support only video playback and a few standard detection tools are quickly fading from the picture.
Developers are hard at work on creating powerful tools for video analytics, in response to the roar of demand from end clients. Video analytics are in use at sites both big and small, distributed and strategic, industrial and athletic, infrastructure and retail. Thus integrated video management software has been built to meet the challenge of smart, next-generation protection for such sites as the Olympic facilities in Sochi and many others.
It's all about integration
Any would-be modern video management software has to be at the leading edge of IT and strong across the board. And the only way to make this happen is to pool the efforts and knowledge of industry leaders.
It is almost impossible to overstate the security benefits of this synergy between IP hardware manufacturers and software developers: some of the best results have been seen in the integration of analytic tools included with IP cameras as part of their on-board software. The big payoff is shifting processing load from servers onto the cameras themselves, freeing up the capacity needed to perform truly next-generation analytics at arbitrarily large or distributed sites.
This is why advanced video management software contains unique analytic tools that make security systems perform better and more efficiently. Here is how they work.
Searching archived video is one of the most important and time-consuming tasks in video surveillance. The faster and more accurate you can search, the more you can get out of your system. However, most systems today have very limited capabilities for analyzing their video archives. So the human factor is still king, as hundreds of operators are forced to spend hundreds of working hours looking for needles in haystacks.
With forensic search, this situation changes in a big way. It is more than a search tool (although it is one). It is a set of technologies that generate video metadata right at the moment of recording. The metadata database is the basis for quick and accurate analysis of archives. To find an event of interest later, just enter the necessary criteria: motion in certain areas, crossing of a line, size, color, direction, speed of object motion, and more. In just seconds, this advanced tool shows you all video fragments that meet these criteria. All-night viewing marathons are a thing of the past, replaced by fast, effective criteria-based forensic search.
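The mechanism can be sketched roughly as follows: each metadata record describes one detected object, and a search is simply a filter over those records. This is an illustrative sketch, not the actual implementation; the record fields and function names are assumptions.

```python
from dataclasses import dataclass

# Hypothetical metadata record, one per detected object, written at
# recording time alongside the video stream. Field names are illustrative.
@dataclass
class ObjectRecord:
    timestamp: float   # seconds from the start of the archive
    x: float           # object centre, normalised 0..1
    y: float
    width: float
    height: float
    speed: float       # frame widths per second
    direction: float   # degrees, 0 = moving right
    color: str         # dominant colour label

def forensic_search(records, *, area=None, min_speed=0.0, color=None):
    """Return the records that match every given criterion.
    area is (x0, y0, x1, y1) in normalised frame coordinates."""
    hits = []
    for r in records:
        if area and not (area[0] <= r.x <= area[2] and area[1] <= r.y <= area[3]):
            continue
        if r.speed < min_speed:
            continue
        if color and r.color != color:
            continue
        hits.append(r)
    return hits

archive = [
    ObjectRecord(10.0, 0.2, 0.5, 0.05, 0.10, 0.02, 90.0, "red"),
    ObjectRecord(42.0, 0.8, 0.5, 0.05, 0.10, 0.30, 0.0, "blue"),
]
print([r.timestamp for r in forensic_search(archive, min_speed=0.1)])  # [42.0]
```

Because only the compact metadata is scanned, never the video itself, queries like this return in seconds even over a large archive.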
This advanced tool is valuable at geographically or structurally complicated sites, when incident investigation becomes a time-intensive task. What if something unexpected happens that is not "pre-programmed" in a detection tool? Oftentimes we know what we want to find only after it has already happened. So no matter how precise or modern our detection tools might be, they cannot capture all potential events of interest since we cannot perfectly predict all possible situations we might want to find later.
In these cases we are forced to go back to the recorded video, so it is critical that the system have a search tool for finding the relevant data as quickly as possible. This is why the ability to go through prior incidents and take reactive measures will surely be the most popular option for security systems at distributed sites. Quickly searching recorded video is a key capability.
Tools that allow ignoring time, looking right through it to see events
Often we know that an event happened within a certain timeframe (say, to within a few hours or even a day), but we cannot use rapid search with MomentQuest2 since we do not know which criteria to specify. In this case, another innovative technology comes to the rescue: it writes "tracks" of object movement to the archive in parallel with the video. The technology analyzes video in real time, producing metadata that describes the movement of objects in the frame. This lets us separate moving objects from static ones and then display multiple relevant objects on screen at the same time. When there is a small number of objects in the frame at any one time – when we're not dealing with a crowd – this view substantially reduces the time needed to review the timeframe in question.
To start viewing video, the user selects the timeframe of interest and sets a maximum number of objects to see in a frame simultaneously – say, no more than 10 objects at the same time. Then the user can start viewing practically right away: our system does the calculations in real time.
When can this tool be useful? For example, we might know some trait of an object. Say we want to see what a man in a red jacket did this morning (9 AM to 1 PM) in a room. We set the time interval and start watching. If there were not too many people in that room this morning, we will probably see the man in the red jacket right away. Then we can pause, click on the man in the red jacket, and start viewing the corresponding video clip in normal mode.
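The planning step behind such a review can be sketched simply: take all tracks that overlap the requested window, then group them into overlay batches so the on-screen object count never exceeds the user's limit. This is a minimal sketch under assumed data shapes, not the product's actual algorithm.

```python
def plan_review(tracks, window_start, window_end, max_objects):
    """Plan a time-compressed review pass.
    tracks: list of dicts with "id", "start", "end" (seconds).
    Returns batches of tracks to overlay on screen together,
    with at most max_objects per batch."""
    # keep only tracks that were active inside the timeframe of interest
    relevant = [t for t in tracks
                if t["end"] > window_start and t["start"] < window_end]
    relevant.sort(key=lambda t: t["start"])
    # overlay up to max_objects tracks at a time
    return [relevant[i:i + max_objects]
            for i in range(0, len(relevant), max_objects)]

# 24 people walked through the room between 9 AM and 1 PM,
# each visible for about two minutes.
tracks = [{"id": i, "start": 9 * 3600 + 600 * i, "end": 9 * 3600 + 600 * i + 120}
          for i in range(24)]
batches = plan_review(tracks, 9 * 3600, 13 * 3600, max_objects=10)
print([len(b) for b in batches])  # [10, 10, 4]
```

Four hours of footage collapse into three short overlay passes, which is why spotting the man in the red jacket takes minutes rather than a morning.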
Tools that monitor moving objects by automatically adjusting the level of digital zoom
One of the keys to the surveillance process is separating the wheat from the chaff so you can focus on key details. Operators must be careful and attentive, so tools that help them do their job better and reduce the room for human error are invaluable.
These tools monitor moving objects by automatically adjusting the level of digital zoom. They show close-in video for parts of the frame that contain a moving object and follow it as it moves, just as a movie camera does when taking a close-up shot. The level of zoom is automatically selected to keep all moving objects in the frame. This function works with both ordinary fixed cameras and fisheye cameras; in the latter case, it acts as an ePTZ camera following the moving object.
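The zoom selection itself reduces to a simple geometric rule: take the union of all moving objects' bounding boxes, pad it slightly, and crop there. The sketch below assumes normal pixel coordinates and an invented margin parameter; a real tracker would also smooth the window over time to avoid jitter.

```python
def zoom_window(boxes, frame_w, frame_h, margin=0.1):
    """Pick the crop window an ePTZ view should zoom to so that every
    moving object stays in frame. boxes: (x0, y0, x1, y1) per object."""
    if not boxes:
        return (0.0, 0.0, float(frame_w), float(frame_h))  # nothing moving: full frame
    x0 = min(b[0] for b in boxes)
    y0 = min(b[1] for b in boxes)
    x1 = max(b[2] for b in boxes)
    y1 = max(b[3] for b in boxes)
    pad_x = (x1 - x0) * margin   # leave a margin around the objects
    pad_y = (y1 - y0) * margin
    return (max(0.0, x0 - pad_x), max(0.0, y0 - pad_y),
            min(float(frame_w), x1 + pad_x), min(float(frame_h), y1 + pad_y))

# Two moving objects -> one padded crop containing both, roughly (40, 70, 760, 430).
print(zoom_window([(100, 100, 200, 200), (600, 300, 700, 400)], 1920, 1080))
```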
Tools that allow simultaneously getting the "big picture" of everything happening at a protected site while obtaining detailed imagery of the objects moving around it
Except for "sterile zones" with minimal traffic, usually there is more than one object in a camera's field of view. This is almost always the case for complicated environments such as cities. So following the movements of multiple objects simultaneously using special tools and mathematical equations to predict movement between the fields of view of different cameras is a way to make security systems perform better and more effectively.
The feature allows simultaneously getting the "big picture" of everything happening at a protected site while obtaining detailed imagery of the objects moving around it.
Both sets of images can be recorded for later use, which is important for event investigation.
The feature requires at least two cameras: a panoramic camera and a PTZ camera. The panoramic camera is configured with a tracker, which detects objects moving in the frame and calculates their coordinates. The position of the PTZ camera (its pan, tilt, and zoom values) is mapped to the coordinates of the panoramic camera's field of view.
The object's coordinates in the field of view are mathematically converted into the pan/tilt/zoom values necessary for the PTZ camera to track the object. This continues until the object leaves the field of view or the user selects another object.
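In the simplest case, that conversion is a linear mapping from the object's position in the panoramic frame to angles within the panoramic camera's field of view. The sketch below assumes invented field-of-view and fill parameters; a real deployment would use the calibrated geometry of both cameras rather than this linear approximation.

```python
def to_ptz(x_norm, y_norm, obj_size_norm,
           pano_fov_h=120.0, pano_fov_v=60.0, target_fill=0.5):
    """Convert an object's normalised position (0..1) in the panoramic
    frame into pan/tilt angles (degrees) and a zoom factor for the PTZ
    camera. Assumes a simple linear pixel-to-angle mapping."""
    pan = (x_norm - 0.5) * pano_fov_h    # degrees right of frame centre
    tilt = (0.5 - y_norm) * pano_fov_v   # degrees above frame centre
    # zoom in until the object fills target_fill of the PTZ frame
    zoom = max(1.0, target_fill / max(obj_size_norm, 1e-6))
    return pan, tilt, zoom

# Object at 3/4 of the frame width, vertical centre, 10% of frame height.
print(to_ptz(0.75, 0.5, 0.1))  # approximately (30.0, 0.0, 5.0)
```

The tracker re-runs this conversion on every new set of coordinates, so the PTZ camera keeps re-aiming until the object leaves the field of view or the user selects another one.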
On-board generation of metadata on IP cameras
These unique analytic tools are well suited to situations where we already know what has happened and what footage we are looking for. In that case, we can find the necessary video fragment in just seconds – drastically reducing the time necessary for review – and zoom in on the part of the frame that is of interest.
These are useful, amazing developments, no question about it! There is one "catch" though. At real-world sites, there is a brick wall that these efforts run into – excessive CPU use. Enormous CPU power is devoted to unpacking streams received from cameras, analyzing events in the scene, and generating metadata.
Metadata consists of markers that comprise a formal, logical description of everything that the camera sees. What an object is and where it is, what size, where it's going and how quickly, what color... this information is "lighter" than full-on video and is saved in parallel with the video.
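To make "lighter" concrete, here is a hypothetical metadata marker for a single object in a single frame, compared against the size of one uncompressed 1080p frame. The field names and values are invented for illustration.

```python
import json

# Hypothetical metadata marker for one object in one frame.
marker = {
    "ts": 1714554902.04,                # capture time, Unix seconds
    "track_id": 17,                     # stable while the object is visible
    "class": "person",
    "bbox": [0.42, 0.31, 0.49, 0.66],   # normalised x0, y0, x1, y1
    "velocity": [0.03, 0.00],           # frame widths/heights per second
    "color": "red",
}

marker_bytes = len(json.dumps(marker).encode())
raw_frame_bytes = 1920 * 1080 * 3       # one uncompressed 1080p RGB frame
print(marker_bytes, raw_frame_bytes)    # metadata is orders of magnitude lighter
```

A few hundred bytes of description versus roughly six megabytes of raw pixels is what makes it practical to store and query metadata for an entire archive.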
First the software has to decompress incoming video, which is resource-intensive. Offloading decompression to a GPU or video card does not help here, because video analytics are not compatible with GPU offloading: after decompression the stream sits in GPU memory, where it cannot be analyzed.
So all the work to decompress and analyze video from cameras is left to server CPUs.
This puts tight restrictions on the number of cameras that can be connected to a single server. High-definition cameras are particularly troublesome, confining the number to around ten per server.
But this situation does not live up to the expectations of the market or clients today. Project discussions center around per-server numbers of 100, 200 or even 500 cameras.
More servers mean a project that is too expensive, hard to support, and viable only at huge and well-funded sites.
A "simple" question arises, then: why not analyze video on the camera itself? Isn't a modern IP camera essentially a rather powerful computer? If so, a camera can send two things to the server: the multimedia stream, as before, plus a description of what is happening in the frame – i.e., metadata. Then all of these amazing video analytic tools can be leveraged on any sort of project, regardless of size or budget.
Pros and cons of on-board analytics
There are some powerful pluses:
- Analysis is performed on the original image, before any distortion by compression/decompression. The quality of video analysis goes up.
- "Blind" servers do not need to decompress video (since they do not need to generate metadata anymore), which frees up CPU capacity. Security systems can be implemented with fewer computers while maintaining full functionality.
But there is one minus: Each manufacturer has its own algorithms for generating metadata. Some algorithms work better than others, or even provide different types of information altogether.
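In practice this means the management software needs an adapter layer that normalises each vendor's metadata into one internal schema, so that search and display behave identically regardless of which camera produced the data. The vendor formats and field names below are invented purely for illustration.

```python
# Sketch of a metadata adapter layer; vendor names and fields are hypothetical.
def normalise(vendor, event):
    if vendor == "vendor_a":          # reports corner coordinates directly
        return {"bbox": list(event["rect"]), "class": event["objType"]}
    if vendor == "vendor_b":          # reports x, y, width, height instead
        x, y, w, h = event["box"]
        return {"bbox": [x, y, x + w, y + h], "class": event["label"]}
    raise ValueError(f"no metadata adapter for {vendor!r}")

a = normalise("vendor_a", {"rect": [10, 10, 50, 80], "objType": "person"})
b = normalise("vendor_b", {"box": [10, 10, 40, 70], "label": "person"})
print(a["bbox"] == b["bbox"])  # True: both map to the same internal form
```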
Leading IP vendors are already bundling powerful video analytic capabilities with their cameras. But almost no software actually supports these on-board features yet! It's as if nobody "knows how" to get metadata from a camera and save it to an archive for later use.
This is what has inspired software developers to so actively integrate the video analytic tools provided by leading world manufacturers, leveraging these opportunities to make surveillance a less expensive endeavor while providing revolutionary tools for parsing footage automatically.
Video analytics are already "here", in use at real-world sites as a real part of mission-critical systems. This is something to celebrate, but also an urgent call to developers: it is critical to integrate software with the analytic capabilities of today's IP cameras.
The first company to truly master this will walk away with an enormous victory, both in technology and in the marketplace. We look forward to getting there.