Online Body Tracking by a PTZ Camera in IP Surveillance System
The target actor also moves parallel to, toward, or away from the camera. Fig. 2 shows the initial model selection and some frames obtained during tracking.

To evaluate our method, four metrics are used, as explained in Table I. We performed twenty experiments with the two IP cameras; they are reported in Table II. Experiments are classified into four classes based on the camera model, the initial model position relative to the camera, and the image resolution.
B. Results
We first present the results; discussion follows in the next section. Fig. 3 shows the d_gp and d_gc values for E_8 and E_14. These distances allow us to verify whether the tracker has lost the target and whether the target is near the ground truth. If d_gc is greater than 1, the target is lost (outside the FOV). For distances smaller than 0.6, the object is in the FOV; in the range 0.6 < d_gc < 1, the outcome depends on whether the centroid coordinates of the target are inside the image.
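The decision rule above can be sketched as a small helper. The 0.6 and 1.0 thresholds come from the text; the function name and return strings are illustrative only:

```cpp
#include <string>

// Illustrative sketch of the d_gc decision rule described above.
// d_gc is the normalized distance between the target and the image center;
// centroidInside indicates whether the target centroid lies inside the image.
std::string targetStatus(double d_gc, bool centroidInside) {
    if (d_gc > 1.0) return "lost";      // outside the FOV
    if (d_gc < 0.6) return "in FOV";    // safely inside the FOV
    // ambiguous band 0.6 <= d_gc <= 1.0: depends on the centroid location
    return centroidInside ? "in FOV" : "lost";
}
```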
For E_8, the target is always in the FOV. For E_14, the target is lost several times between frames 59 and 80. For both experiments, d_gp is small except when the target is lost. Table II shows the results of the five metrics, along with the system frame rate, for all experiments. The algorithm is implemented on an Intel Xeon(R) 5150 in C++ using OpenCV. For d_gc and d_gp, we show the mean and variance over all experiments. For classes 1 and 3, because of the lower system frame rate, the method loses the target several times, but eventually recovers. Within these, the system frame rate for class 1 experiments is a little higher; thus, the target is lost less often (i.e., T_F is smaller), the distance errors (µ_dgc and µ_dgp) are lower, and the precision is better (P is larger). Because of d_EP and the camera control, the error on µ_dgc affects µ_dgp and vice versa. For classes 2 and 4, the system frame rate is higher because of the smaller image size, and T_F is smaller than for classes 1 and 3. A faster system frame rate improves T_F, µ_dgc, µ_dgp, and P. ∆x_min and ∆x_max in Table II are the minimum and maximum lengths of the target motion vector, in pixels.
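For reference, the motion-vector length tabulated as ∆x is the Euclidean displacement of the target centroid between consecutive frames; a minimal sketch (the function name is ours):

```cpp
#include <cmath>

// Length of the target motion vector between two consecutive frames, in
// pixels: the Euclidean displacement of the centroid from (x0, y0) to (x1, y1).
double motionLength(double x0, double y0, double x1, double y1) {
    return std::hypot(x1 - x0, y1 - y0);
}
```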
Comparing the two Sony IP PTZ cameras, the results from the SNC-RZ50N (classes 1 and 2) are better than those from the SNC-RZ25N (classes 3 and 4). This is because of the camera model: the SNC-RZ50N moves faster than the SNC-RZ25N [19], [20].
C. Discussion
Results show that our algorithm can handle and overcome large motion (i.e., high values of ∆x_max) because it uses a repetitive target detection scheme and a motion prediction technique that do not rely on spatial proximity. It loses a target only if the target suddenly changes its motion direction and moves very fast away from the predicted position (e.g., experiments with T_F ≠ 0). By combining the repetitive detection scheme with a motion predictor, we can handle random motion between frames, as long as the target position is well predicted and its appearance does not change significantly. The motion predictor compensates for the two delays τ_1 and τ_2 discussed in Section III, which may otherwise cause the target to exit the FOV. Small values of σ²_dgp result from using pyramidal Lucas-Kanade optical flow for motion extraction and combining it with a sampling method. Combining this motion extraction with a fuzzy classifier that uses color features to generate sample likelihoods allows us to detect and track any moving object.
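One plausible way to turn color features into a sample likelihood is the Bhattacharyya coefficient between normalized color histograms, sketched below; the paper's actual fuzzy membership functions are not reproduced here, so treat this as an assumption:

```cpp
#include <cmath>
#include <vector>

// Sketch: similarity between the target's normalized color histogram and a
// candidate sample's, via the Bhattacharyya coefficient. Both inputs are
// assumed to sum to 1. Returns 1 for identical distributions, 0 for disjoint.
double colorLikelihood(const std::vector<double>& hTarget,
                       const std::vector<double>& hSample) {
    double bc = 0.0;
    for (std::size_t i = 0; i < hTarget.size() && i < hSample.size(); ++i)
        bc += std::sqrt(hTarget[i] * hSample[i]);
    return bc;
}
```

The most likely sample (the candidate maximizing this score) would then be selected as the target.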
Generally, according to the mean distances, the location of the target is close to the ground truth. The target is usually localized within one quarter of the image diagonal from the image center. With a faster system frame rate, tracking results improve significantly, since more frames are available to process. When localization fails, it is because the color histogram of the target is similar, or close, to that of other samples. The image resolution affects the system frame rate and thus the tracking error.
In all experiments, there are scale changes to verify tracking against scaling. Our algorithm can overcome scaling variations even in images where the face is as small as 5×5 pixels (e.g., Fig. 2(d) and (e)). This is because we use a normalized color histogram and average color features, both of which are independent of the size of the target.
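The scale independence claimed above follows from normalizing the histogram by the total pixel count of the target region; a minimal sketch:

```cpp
#include <vector>

// Normalize raw bin counts so the histogram sums to 1. A target imaged at a
// larger scale produces proportionally higher counts in every bin, so the
// normalized histogram is (ideally) unchanged by the target's size.
std::vector<double> normalizeHistogram(const std::vector<int>& counts) {
    long total = 0;
    for (int c : counts) total += c;
    std::vector<double> h(counts.size(), 0.0);
    if (total > 0)
        for (std::size_t i = 0; i < counts.size(); ++i)
            h[i] = static_cast<double>(counts[i]) / total;
    return h;
}
```

A target covering four times as many pixels, with the same color proportions, yields the same normalized histogram.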
Our method can also recover the tracking if it loses the object (e.g., experiments with T_F ≠ 0), because of the repetitive detection scheme. Of course, this is conditional on the object being in the FOV of the camera. Occlusions are handled in the same way. However, while the object is occluded, another similar object (the most likely sample) is tracked until the occlusion ends. This may cause the real target to leave the FOV of the camera. Fig. 2 shows an example of short-term occlusion handling; the proposed method handles it in this case. In the reported experiments, occlusion did not cause difficulties. The experiments are short in duration because the goal of the algorithm is to zoom in on the target's face and capture it for identification purposes.
VI. CONCLUSION
In this paper, an object tracking method is proposed and applied to human upper-body tracking by an IP PTZ camera in an online application. Human body tracking determines the location and size of each human body in each input image of a video sequence; it can be used to get images of the face of a human target in different poses. In every frame, the method detects candidate targets by extracting moving objects using optical flow and by sampling around the image center. The target is selected among the candidate samples using a fuzzy classifier. Then, a movement command is sent to the camera using the target position and speed. Results show that our algorithm can handle and overcome large motion between two consecutive frames with low track fragmentation. Future work will add camera zooming and enhance the robustness of the motion prediction to prevent the target from leaving the camera's FOV.