Object Tracking and Face Recognition in Video Streams

Chapter 2. The TLD algorithm

[Figure 2.5 panels: (a) Image patch reported by short-term tracker. (b) Expanded to grid points. In this case, the object is centred in the expanded patch. (c) Translation. (d) Scaling. (e) Rotation.]

Figure 2.5: The affine transformations applied on an image patch during training. The patch reported by the tracker is first expanded to match one of the detector’s sliding window positions. A number of positive examples are then generated from the patch, where each example randomly combines the affine transformations.

i.e. if the calculated value is below a certain threshold. Similarly, negative examples are only used if the component thinks the patch does depict the object, i.e. if the calculated value is above a certain threshold. In other words, the area reported by the tracker is assumed to be the correct patch, so components that disagree try to learn from it. Figure 2.6 illustrates the concept.

Since the training sets are based on the location reported by the tracker, the result of the tracker must be considered sufficiently good if the training stage is to be applied: the tracker result must be valid, and the tracker must be more confident than the detector. Additionally, if the tracker was not deemed good enough in the previous frame, the confidence of the tracker must be above a certain threshold, i.e. the confidence of the tracker must be large enough for training to start up again.

Initial training examples are generated from the first frame, where the bounding box of the object is known. As such, there is no need to take into account the confidence or the forward-backward error. Positive examples are generated from the image patch defined by the bounding box, and negative examples are generated from other parts of the frame.

2.5 Summary

The overall flow of TLD is shown in Figure 2.7. The short-term tracker and the object detector are both run on the current frame. If neither component reports a valid result, the object is considered lost until it is re-detected. Otherwise, the result of the more confident component is reported. Furthermore, if the tracker is more confident than the detector, the learning stage is applied.
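
The per-frame decision described in this chapter can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in this work. The names Result, fuse and RESTART_THRESHOLD, the bounding-box layout, and the threshold value are all hypothetical. The sketch also assumes that an invalid detector result counts as less confident than any valid tracker result, and that a tie in confidence goes to the detector. The text does not specify either case.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# (x, y, width, height); this layout is an assumption made for the sketch.
Box = Tuple[int, int, int, int]


@dataclass
class Result:
    """Output of one component (tracker or detector) for a single frame."""
    box: Optional[Box]   # None when the component has no valid result
    confidence: float    # higher means more confident

    @property
    def valid(self) -> bool:
        return self.box is not None


# Hypothetical value: the confidence the tracker needs before training
# may start up again after a frame in which it was not good enough.
RESTART_THRESHOLD = 0.65


def fuse(tracker: Result, detector: Result,
         tracker_was_good: bool) -> Tuple[Optional[Box], bool]:
    """Return (reported_box, learn) for the current frame.

    reported_box is None when the object is considered lost.
    learn says whether the learning stage should be applied.
    """
    # Neither component has a valid result: the object is lost.
    if not tracker.valid and not detector.valid:
        return None, False

    # Assumption: an invalid detector result loses to any valid tracker
    # result, and equal confidences are resolved in favour of the detector.
    tracker_wins = tracker.valid and (
        not detector.valid or tracker.confidence > detector.confidence
    )
    if not tracker_wins:
        return detector.box, False

    # The tracker is valid and more confident than the detector. If it was
    # not good enough in the previous frame, it must also clear the restart
    # threshold before training is applied again.
    learn = tracker_was_good or tracker.confidence > RESTART_THRESHOLD
    return tracker.box, learn
```

A caller would feed the returned learn flag back in as tracker_was_good for the next frame, so that training stays off until the tracker clears the restart threshold.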
