University of Minnesota Driven to Discover
Center for Transportation Studies


ITS Sensor

Spring 2004

Monitoring activity in public places


Osama Masoud and Nikolaos Papanikolopoulos

Since the events of September 11, the surveillance of public spaces has taken on greater importance and urgency. As video cameras are increasingly used at vulnerable areas—bridges, seaports, and potentially on airplanes—the volume of video data generated will be enormous. It simply won't be feasible for human operators to monitor and evaluate it all.

According to Professor Nikolaos Papanikolopoulos, of the University's Department of Computer Science, autonomous vision-based systems are ideal for monitoring human activities in public places because they are more "attentive" than a human. A computer system could be used to first screen data, then highlight significant cases for human operators to evaluate.

Such systems have the potential to play a major role in national security, Papanikolopoulos said. But first, he and research associate Osama Masoud had to test the concepts on a smaller scale.

Metro Transit, the transit system for the Minneapolis-St. Paul metropolitan area, has a problem with illegal activities—specifically, drug dealing—at certain bus stops. Drug dealing is typically characterized by individuals loitering at a stop, and tends to move from bus stop to bus stop depending on factors such as police presence. What Papanikolopoulos proposed developing for Metro Transit was a vision-based system that would identify individuals from a video feed at a bus stop and track how much time they spend there.

Photo of bus stop

As individuals enter the scene, each is assigned a unique number. Besides monitoring their activity, the system can recognize those individuals who leave the scene and then return.

Drawing on his and Masoud's earlier work on human detection and crowd monitoring, Papanikolopoulos, along with graduate students Guillaume Gasser and Nathaniel Bird, developed such a system. Across the street from a busy bus stop on the University of Minnesota campus, the researchers installed a video camera to watch people come and go at their test site.

The system in action

Papanikolopoulos points out that their system uses standard equipment: an off-the-shelf video camera and a computer equipped with a 2.66 GHz Pentium 4 processor and 1 GB of main memory, running Microsoft Windows 2000. The monitoring process itself is divided into three distinct phases: background subtraction, object tracking, and human recognition.

Background subtraction involves separating the background scene supplied by the video feed from the foreground. By comparing each new frame in a video sequence to a background model of the scene (without activity), the system can detect moving objects. These objects (represented by groups of pixels) are then separated from the background image and tracked.

However, static background subtraction, while working fairly well in controlled environments, does not work as well in continuously changing environments such as outdoor scenes, including the bus stop studied here, the researchers note. A wide range of possible lighting conditions, the existence of shadows, and objects such as street signs that can block the view of a given individual are just some of the problems that must be addressed. "I can give you a system that will work perfectly in a controlled environment, but outdoors, it's a big challenge," Papanikolopoulos says. So the researchers used an adaptive background modeling and subtraction technique based on nonparametric kernel density estimation, which can detect moving objects in outdoor environments while adapting to background changes such as shifting illumination.
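
To make the idea concrete, here is a minimal sketch of kernel density estimation applied to a single grayscale pixel. It is an illustration only, not the researchers' implementation: the class name, the Gaussian kernel, the window length, the bandwidth, the threshold, and the rule of updating the model only with background values are all assumptions chosen for the example.

```python
import math
from collections import deque


def kde_probability(value, samples, bandwidth):
    """Gaussian-kernel density estimate of `value` given recent background samples."""
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    total = sum(math.exp(-0.5 * ((value - s) / bandwidth) ** 2) for s in samples)
    return norm * total / len(samples)


class PixelBackgroundModel:
    """Sliding window of recent intensities for one pixel (hypothetical helper)."""

    def __init__(self, history=20, bandwidth=8.0, threshold=1e-3):
        # All three parameters are illustrative assumptions, not published values.
        self.samples = deque(maxlen=history)
        self.bandwidth = bandwidth
        self.threshold = threshold

    def is_foreground(self, value):
        """Return True when `value` is unlikely under the current background model."""
        if not self.samples:
            self.samples.append(value)  # the first observation seeds the model
            return False
        probability = kde_probability(value, self.samples, self.bandwidth)
        foreground = probability < self.threshold
        if not foreground:
            # Only background values update the window, so the model follows
            # gradual changes without absorbing moving objects.
            self.samples.append(value)
        return foreground


# Slowly rising illumination is treated as background; a sudden jump is not.
model = PixelBackgroundModel()
drift_flags = [model.is_foreground(v) for v in range(100, 131)]
jump_flag = model.is_foreground(200)
```

Because the window holds only recent samples, a pixel that brightens gradually stays classified as background, while an abrupt change (such as a person stepping in front of it) falls outside the estimated density and is flagged.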

Object tracking. To enable the system to track objects in real time (here, people as they walk around a bus stop), the researchers developed algorithms to recognize pre-specified actions. Motion is extracted directly from the image sequence, and motion information is calculated using an infinite impulse response (IIR) filter. An algorithm then compares the observed action with actions that have been entered into a database, finds the best match, and labels the action accordingly.
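
The two steps described here, recursive filtering to extract motion and a best-match lookup against stored actions, can be sketched as follows. This is a simplified illustration under stated assumptions rather than the researchers' algorithm: frames are flat lists of pixel intensities, the filter is a first-order recursive average with an arbitrary coefficient, and the per-pixel mean motion used as a signature, like all function names, is hypothetical.

```python
import math


def iir_motion(frames, alpha=0.5):
    """Yield one motion image per frame using a first-order recursive (IIR) filter.

    state_t  = alpha * frame_t + (1 - alpha) * state_{t-1}
    motion_t = |frame_t - state_{t-1}|
    """
    state = list(frames[0])
    for frame in frames[1:]:
        motion = [abs(f - s) for f, s in zip(frame, state)]
        state = [alpha * f + (1 - alpha) * s for f, s in zip(frame, state)]
        yield motion


def motion_signature(frames, alpha=0.5):
    """Summarize a clip as the mean motion per pixel (hypothetical feature)."""
    motions = list(iir_motion(frames, alpha))
    return [sum(column) / len(motions) for column in zip(*motions)]


def best_match(signature, database):
    """Return the label whose stored signature is closest in Euclidean distance."""
    return min(database, key=lambda label: math.dist(signature, database[label]))


# Toy four-pixel clips: a bright blob moving across the frame versus a static one.
walk_clip = [[9, 0, 0, 0], [0, 9, 0, 0], [0, 0, 9, 0], [0, 0, 0, 9]]
stand_clip = [[0, 9, 0, 0]] * 4
database = {
    "walking": motion_signature(walk_clip),
    "standing": motion_signature(stand_clip),
}
query_clip = [[8, 0, 0, 0], [0, 8, 0, 0], [0, 0, 8, 0], [0, 0, 0, 8]]
label = best_match(motion_signature(query_clip), database)
```

As the article notes, a lookup of this kind can only return labels that already exist in the database, which is why the scope of the stored actions limits what can be recognized.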

Although the method is limited by the scope of its action database, it does seem promising for identifying well-defined behaviors, Papanikolopoulos says.

As individuals enter the scene, the system assigns each a unique number and creates a database of these individuals. Because a presence history is generated for each target, the system can recognize individuals who leave the scene and return later. This matters for this particular application because individuals dealing drugs at a bus stop may come and go.
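
A presence history of this kind could be kept in a structure like the one below. This is a hedged sketch: the class names, the interval representation, and the 600-second loitering threshold in the example are assumptions made for illustration, and the re-identification decision itself is supplied from outside through the hypothetical `known_id` argument.

```python
from dataclasses import dataclass, field


@dataclass
class PresenceRecord:
    """Visits of one numbered individual, stored as [enter, last_seen] in seconds."""
    person_id: int
    intervals: list = field(default_factory=list)

    def total_time(self):
        """Total time on scene, summed over every visit."""
        return sum(last_seen - enter for enter, last_seen in self.intervals)


class PresenceLog:
    """Assigns unique numbers and accumulates time on scene (hypothetical helper)."""

    def __init__(self):
        self.records = {}
        self._next_id = 1

    def enter(self, timestamp, known_id=None):
        """Register an arrival; pass known_id when a returning person was recognized."""
        if known_id is None:
            known_id = self._next_id
            self._next_id += 1
            self.records[known_id] = PresenceRecord(known_id)
        self.records[known_id].intervals.append([timestamp, timestamp])
        return known_id

    def update(self, person_id, timestamp):
        """Extend the current visit of a tracked person to `timestamp`."""
        self.records[person_id].intervals[-1][1] = timestamp

    def loiterers(self, min_seconds):
        """Return the numbers of people whose total time meets the threshold."""
        return [pid for pid, rec in self.records.items()
                if rec.total_time() >= min_seconds]


log = PresenceLog()
first = log.enter(0)
log.update(first, 300)            # first visit lasts 300 s
second = log.enter(100)
log.update(second, 220)           # a passenger who waits 120 s
log.enter(900, known_id=first)    # the first person returns and is recognized
log.update(first, 1300)           # second visit lasts 400 s
flagged = log.loiterers(600)
```

Summing time across visits is what lets repeated short stays add up: in the example, neither visit by the first person reaches the threshold alone, but together they do.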

Human recognition. Research in the area of biometrics has produced various methods for identifying specific people, such as fingerprint and face recognition. For the purposes of the bus stop monitoring project, the researchers chose a short-term biometric technique—clothing color.

The system's human recognition module segments the image of an individual into three portions corresponding to the head, torso, and legs. Using the median color of each of these regions, two images can be quickly compared to determine whether they show the same person. The method has drawbacks, Papanikolopoulos says: it has trouble telling apart individuals who are dressed alike, and it can fail to recognize individuals who remove an article of clothing such as an overcoat or who cross into areas of deep shadow.
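
The comparison described above can be sketched in a few lines. The article does not say how the three regions are delimited or how close two colors must be, so the fixed height fractions, the Euclidean RGB distance, the tolerance value, and every function name below are assumptions made purely for illustration.

```python
import math
from statistics import median


def median_color(pixels):
    """Per-channel median of a list of (r, g, b) pixels."""
    return tuple(median(channel) for channel in zip(*pixels))


def region_signature(rows, head_frac=0.2, torso_frac=0.4):
    """Split a person's pixel rows (top to bottom) into head, torso, and legs,
    then take the median color of each band. The fractions are assumed values."""
    n = len(rows)
    head_end = max(1, round(n * head_frac))
    torso_end = max(head_end + 1, round(n * (head_frac + torso_frac)))
    bands = (rows[:head_end], rows[head_end:torso_end], rows[torso_end:])
    return [median_color([px for row in band for px in row]) for band in bands]


def same_person(sig_a, sig_b, tolerance=40.0):
    """Two signatures match when every region's median color is within tolerance."""
    return all(math.dist(a, b) <= tolerance for a, b in zip(sig_a, sig_b))


def make_person(head, torso, legs):
    """Build a toy 10-row, 2-pixel-wide silhouette (test data only)."""
    return [[head] * 2] * 2 + [[torso] * 2] * 4 + [[legs] * 2] * 4


red_jacket = region_signature(make_person((200, 170, 150), (200, 20, 20), (30, 30, 120)))
red_jacket_later = region_signature(make_person((195, 172, 148), (190, 30, 25), (35, 28, 110)))
black_jacket = region_signature(make_person((200, 170, 150), (20, 20, 20), (30, 30, 120)))

match_same = same_person(red_jacket, red_jacket_later)
match_other = same_person(red_jacket, black_jacket)
```

The sketch also shows the limitations the researchers mention: two different people in similar clothing would produce near-identical signatures, and a person who takes off a jacket would no longer match an earlier signature.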

Indeed, shadows present the most difficult challenge for this type of work, Papanikolopoulos says. Although the researchers have been able to address most of the problems with shadows, one remains: an individual wearing black clothing who casts a shadow. "This is difficult even for the human eye to separate," Papanikolopoulos says. Most of the current literature and traditional approaches offer no solution, he adds, although he is hopeful about the progress being made in this area.

Results of the researchers' test showed that the system could successfully track individuals in sparsely populated outdoor scenes, with limited occlusion, in near real time. The human recognition algorithm was evaluated on a test set of 21 people, with between three and nine images of each person (106 images total). By checking all possible combinations in this test set, the algorithm demonstrated an accuracy of 82 percent. The system was also robust in handling image size changes due to differences in perspective as an individual walked across the scene, Papanikolopoulos says.
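
The article does not spell out how the accuracy figure was computed. One plausible reading, offered here only as an assumption, is that every unordered pair of images was compared and the same-person or different-person decision was scored against the truth; with 106 images that would mean 5,565 comparisons. The sketch below shows that style of evaluation on made-up toy data; the function name and the toy decision rule are hypothetical.

```python
from itertools import combinations
from math import comb


def pairwise_accuracy(labels, signatures, decide_same):
    """Fraction of unordered image pairs whose same/different decision is correct."""
    correct = 0
    total = 0
    for i, j in combinations(range(len(labels)), 2):
        predicted = decide_same(signatures[i], signatures[j])
        actual = labels[i] == labels[j]
        correct += predicted == actual
        total += 1
    return correct / total


# Number of unordered pairs among 106 images, under the assumption above.
pairs_in_test_set = comb(106, 2)

# Toy data, not from the study: one number stands in for each image's signature.
toy_labels = ["a", "a", "b", "b"]
toy_signatures = [10, 12, 50, 11]
toy_accuracy = pairwise_accuracy(
    toy_labels, toy_signatures, lambda x, y: abs(x - y) <= 5
)
```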

What's next

Papanikolopoulos says future work could include improving the segmentation of body regions so the system can better recognize individuals who have previously appeared in a scene. One option is to use optical flow to determine which parts of an image correspond to the head, torso, and legs, which would make the median color computed for each region more reliable.

Expanding the system to recognize certain behaviors—such as stretching for extended periods of time without ever jogging, or leaving a package unattended—is also a priority, Papanikolopoulos says, especially since the Department of Homeland Security (DHS) has recently expressed interest in the research for the purpose of detecting security threats. The National Science Foundation, acting as the manager on the project for the DHS, is funding continued work on the system for national security applications—specifically, threat detection. The plan is to deploy the system at a transit stop in Philadelphia in early 2005.

Defining what constitutes a threat, or threatening behavior, may be the toughest issue, Papanikolopoulos admits. "For me, this is one critical question that we need to answer. Can we learn suspicious activity?" he says. One aspect may be detecting anxiety in an individual. While that may be possible (using infrared technology with video cameras), it may be difficult to judge whether the anxiety stems from a person's fear of flying, for example, or from an intent to commit an act of terrorism.

If the potential uses for a system like this are numerous, so are the issues raised, Papanikolopoulos says. A major retailer could monitor customers' shopping behavior. Law enforcement could watch for impaired drivers. Auto insurance companies could analyze driving behavior. "As long as we have video feeds, someone is going to exploit this information and extract data," he says. "Where we draw the line is...a critical question for our society."
