
The target actor also moves parallel to, toward, or away from the camera. Fig. 2 shows the initial model selection and some frames obtained during tracking.

To evaluate our method, four metrics are used, as explained in Table I. We conducted twenty experiments with the two IP cameras; they are reported in Table II. Experiments are classified into four classes based on the camera model, the initial model position relative to the camera, and the image resolution.

B. Results

We first present the results; discussion follows in the next section. Fig. 3 shows the $d_{gp}$ and $d_{gc}$ values for $E_8$ and $E_{14}$. These distances allow us to verify whether the tracker has lost the target and whether the target is near the ground truth. If $d_{gc}$ is greater than 1, the target is lost (outside the FOV). For distances smaller than 0.6, the object is in the FOV; in the intermediate range $0.6 < d_{gc} < 1$, the outcome depends on whether the centroid coordinates of the target are inside the image.
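As a minimal sketch of this decision rule (assuming $d_{gc}$ is already computed as the normalized distance between the target ground truth and the image center; the function name is illustrative, not from the paper's code):

```cpp
// Decision rule for target loss based on d_gc, as described above.
bool targetLost(double d_gc, bool centroidInsideImage) {
    if (d_gc > 1.0) return true;    // outside the FOV: target lost
    if (d_gc < 0.6) return false;   // clearly inside the FOV
    return !centroidInsideImage;    // 0.6 <= d_gc <= 1: borderline case
}
```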

For $E_8$, the target is always in the FOV. For $E_{14}$, the target is lost several times between frames 59 and 80. For both experiments, $d_{gp}$ is small except when the target is lost. Table II shows the results of the metrics, together with the system frame rate, for all experiments. The algorithm is implemented on an Intel Xeon(R) 5150 in C++ using OpenCV. For $d_{gc}$ and $d_{gp}$, we report the mean and variance over all experiments. For classes 1 and 3, because of the lower system frame rate, the method lost the target several times, but eventually recovered. For experiments of class 1, the system frame rate is slightly higher; thus, the target is lost less often ($T_F$ is smaller), the distance errors ($\mu_{d_{gc}}$ and $\mu_{d_{gp}}$) are lower, and the precision is better ($P$ is larger). Because of $d_{EP}$ and the camera control, the error on $\mu_{d_{gc}}$ affects $\mu_{d_{gp}}$ and vice versa. For classes 2 and 4, the system frame rate increases because of the smaller image size, so $T_F$ for classes 2 and 4 is smaller than for classes 1 and 3. A faster system frame rate improves the results for $T_F$, $\mu_{d_{gc}}$, $\mu_{d_{gp}}$, and $P$. $\Delta x_{min}$ and $\Delta x_{max}$ in Table II are the minimum and maximum lengths of the target motion vector, in pixels.
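Assuming the motion vector is the displacement of the target centroid between consecutive frames (the paper does not spell this out here), its length could be computed as:

```cpp
#include <cmath>
#include <opencv2/core.hpp>

// Length of the target motion vector between two frames, in pixels.
double motionMagnitude(const cv::Point2f& prevCentroid,
                       const cv::Point2f& currCentroid) {
    const cv::Point2f d = currCentroid - prevCentroid;
    return std::hypot(d.x, d.y);
}
```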

Comparing the two Sony IP PTZ cameras, the results from the SNC-RZ50N (classes 1 and 2) are better than those from the SNC-RZ25N (classes 3 and 4). This is due to the camera model: the SNC-RZ50N moves faster than the SNC-RZ25N [19], [20].

C. Discussion

Results show that our algorithm can handle and overcome large motion (i.e., high values of $\Delta x_{max}$) because it uses a repetitive target detection scheme and a motion prediction technique that do not rely on spatial proximity. It will lose a target only if the target suddenly changes its motion direction and walks very fast away from the predicted position (e.g., experiments with $T_F \neq 0$). By using a repetitive detection scheme and combining it with a motion predictor, we can handle random motion between frames, as long as the target position is well predicted and its appearance does not change significantly. The motion predictor compensates for the two delays $\tau_1$ and $\tau_2$ discussed in Section III, which may otherwise cause the target to exit the FOV.
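The predictor itself is specified in Section III (not reproduced here); a minimal sketch of the idea, assuming a constant-velocity model and that $\tau_1$ and $\tau_2$ are the processing and camera-response delays:

```cpp
#include <opencv2/core.hpp>

// Assumed constant-velocity predictor: project the target position
// forward by the total latency (tau1 + tau2) before issuing the
// pan/tilt command, so the camera aims where the target will be.
cv::Point2f predictPosition(const cv::Point2f& position,
                            const cv::Point2f& velocityPxPerSec,
                            double tau1Sec, double tau2Sec) {
    const float latency = static_cast<float>(tau1Sec + tau2Sec);
    return position + velocityPxPerSec * latency;
}
```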

Small values of $\sigma^2_{d_{gp}}$ result from utilizing pyramidal Lucas-Kanade optical flow for the motion extraction and combining it with a sampling method. This motion extraction, combined with a fuzzy classifier that uses color features to generate sample likelihoods, allows us to detect and track any moving object.
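The pyramidal Lucas-Kanade step maps directly onto OpenCV. A sketch of the motion-extraction call (feature selection and the fuzzy classifier are omitted; the window size and pyramid depth are typical defaults, not the paper's reported settings):

```cpp
#include <cstddef>
#include <vector>
#include <opencv2/video/tracking.hpp>

// Track sparse points from the previous grayscale frame to the current
// one with pyramidal Lucas-Kanade optical flow, keeping only points
// whose status flag indicates a successful track.
std::vector<cv::Point2f> trackPoints(const cv::Mat& prevGray,
                                     const cv::Mat& currGray,
                                     const std::vector<cv::Point2f>& prevPts) {
    std::vector<cv::Point2f> currPts;
    std::vector<uchar> status;
    std::vector<float> err;
    cv::calcOpticalFlowPyrLK(prevGray, currGray, prevPts, currPts,
                             status, err,
                             cv::Size(21, 21),   // per-level search window
                             3);                 // pyramid levels
    std::vector<cv::Point2f> tracked;
    for (std::size_t i = 0; i < status.size(); ++i)
        if (status[i]) tracked.push_back(currPts[i]);
    return tracked;
}
```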

Generally, according to the mean distances, the location of the target is near the ground truth; the target is usually localized within a quarter of the image diagonal from the image center. With a faster system frame rate, the tracking results improve significantly, since more frames are processed. When localization fails, it is because the color histogram of the target is similar or close to that of other samples. The image resolution affects the system frame rate and thus the tracking error.

In all experiments, there are scale changes to verify tracking against scaling. Our algorithm can overcome scaling variations even in images where the face is as small as 5×5 pixels (e.g., Fig. 2(d) and (e)). This is because it uses a normalized color histogram and average color features, both of which are independent of the target size.
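As an illustration of why normalization buys this scale independence (the paper's exact feature set is not given here; hue-only binning and the Bhattacharyya distance are our assumptions):

```cpp
#include <opencv2/imgproc.hpp>

// L1-normalized hue histogram of a patch: the bin values sum to 1,
// so patches of different sizes become directly comparable.
cv::Mat hueHistogram(const cv::Mat& bgrPatch) {
    cv::Mat hsv, hist;
    cv::cvtColor(bgrPatch, hsv, cv::COLOR_BGR2HSV);
    const int   channels[] = {0};            // hue channel only
    const int   histSize[] = {32};
    const float hueRange[] = {0.f, 180.f};
    const float* ranges[]  = {hueRange};
    cv::calcHist(&hsv, 1, channels, cv::Mat(), hist, 1, histSize, ranges);
    cv::normalize(hist, hist, 1.0, 0.0, cv::NORM_L1);
    return hist;
}

// Smaller distance means more similar appearance, regardless of scale.
double appearanceDistance(const cv::Mat& h1, const cv::Mat& h2) {
    return cv::compareHist(h1, h2, cv::HISTCMP_BHATTACHARYYA);
}
```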

Our method can also recover the tracking if it loses the object (e.g., experiments with $T_F \neq 0$), thanks to the repetitive detection scheme. Of course, this is conditional on the object being in the FOV of the camera. Occlusions are handled in the same way. However, when the object is occluded, another similar object will be tracked (the most likely sample) until the occlusion ends. This may cause the real target to leave the FOV of the camera. Fig. 2 shows an example of short-term occlusion handling; the proposed method handles this case. In the reported experiments, occlusion did not cause difficulties. The duration of the experiments is short because the goal of the algorithm is to zoom in on the target face and capture it for identification purposes.

VI. CONCLUSION

In this paper, an object tracking method is proposed and applied to online human upper-body tracking by an IP PTZ camera. Human body tracking determines the location and size of each human body in each input image of a video sequence. It can be used to obtain images of the face of a human target in different poses. In every frame, the method detects candidate targets by extracting moving objects using optical flow and by sampling around the image center. The target is then identified among the candidate samples using a fuzzy classifier, and a movement command is sent to the camera based on the target position and speed. Results show that our algorithm can handle and overcome large motion between two consecutive frames with low track fragmentation. Future work will add camera zooming and enhance the robustness of the motion prediction, to prevent the target from leaving the camera FOV.
