Recognizing Cars in Aerial Imagery to Improve Orthophotos

… published imagery. Cars are an obvious clutter in a 3D city model, and therefore we want to detect and remove them. Our efficient car detector exploits an on-line boosting algorithm for training. We use on-line learning with a human in the loop to build a good detector from only a few labeled samples, and we can successively improve the detector until its performance is satisfactory. We believe that our procedure can be applied to a range of different objects, not just cars; people are of particular interest.
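As a concrete illustration of the on-line boosting update (in the style of Oza and Russell [16], on which the on-line boosting of [10] builds), the following is a minimal sketch; the class names, the stump design, and the random feature selection are illustrative simplifications, not our actual detector:

```python
import numpy as np

class OnlineStump:
    """Weak learner: thresholds one feature at the midpoint of the
    running per-class means (an illustrative simplification)."""
    def __init__(self, feature):
        self.feature = feature
        self.mean = {+1: 0.0, -1: 0.0}
        self.count = {+1: 0, -1: 0}

    def update(self, x, y):
        self.count[y] += 1
        self.mean[y] += (x[self.feature] - self.mean[y]) / self.count[y]

    def predict(self, x):
        if min(self.count.values()) == 0:
            return 1                      # no evidence for one class yet
        theta = 0.5 * (self.mean[+1] + self.mean[-1])
        sign = 1 if self.mean[+1] >= self.mean[-1] else -1
        return sign if x[self.feature] >= theta else -sign

class OnlineBooster:
    """On-line boosting after Oza and Russell [16]: each new sample is
    propagated through the weak learners with an importance weight
    lambda that grows where the ensemble still errs."""
    def __init__(self, n_features, n_weak=50, seed=0):
        self.rng = np.random.default_rng(seed)
        self.weak = [OnlineStump(int(f))
                     for f in self.rng.integers(0, n_features, n_weak)]
        self.lam_sc = np.full(n_weak, 1e-10)  # weight mass classified correctly
        self.lam_sw = np.full(n_weak, 1e-10)  # weight mass classified wrongly

    def update(self, x, y):
        lam = 1.0
        for m, h in enumerate(self.weak):
            for _ in range(self.rng.poisson(lam)):  # Poisson importance sampling
                h.update(x, y)
            if h.predict(x) == y:
                self.lam_sc[m] += lam
                eps = self.lam_sw[m] / (self.lam_sc[m] + self.lam_sw[m])
                lam *= 1.0 / (2.0 * (1.0 - eps))    # downweight solved samples
            else:
                self.lam_sw[m] += lam
                eps = self.lam_sw[m] / (self.lam_sc[m] + self.lam_sw[m])
                lam *= 1.0 / (2.0 * eps)            # upweight hard samples

    def predict(self, x):
        score = 0.0
        for m, h in enumerate(self.weak):
            eps = self.lam_sw[m] / (self.lam_sc[m] + self.lam_sw[m])
            eps = min(max(eps, 1e-10), 1.0 - 1e-10)
            score += np.log((1.0 - eps) / eps) * h.predict(x)
        return 1 if score >= 0 else -1
```

A human labeler can feed patches to `update` one at a time and query `predict` in between; this sample-by-sample loop is what makes the few-sample, human-in-the-loop training cheap.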

In our experimental work, we demonstrated that we can train the detector with just a few hundred samples. This is far fewer than other approaches need; for example, [27] report results with 7800 (positive and negative) examples. We believe that the training effort can perhaps be reduced to a level where just two samples are needed.

If we scale up our current results, obtained from only two images with 500 cars, to 400 non-redundant images with 100,000 cars, we would achieve an automated detection of 95,000 cars, would have to cope with 5,000 false detections (which do no harm to the inpainting), and would use about 130 hours of computing on a single machine. Scaling the work to 3,000 images for the same area and the same number of cars will become possible once we have started to exploit the high-overlap imagery.
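The scaling estimate above is plain proportional arithmetic; spelled out (the per-image time is simply the 130 hours spread across the 400 images, an assumption rather than a separately measured figure):

```python
n_images  = 400          # non-redundant aerial images
n_cars    = 100_000      # cars expected in that block
det_rate  = 0.95         # detection rate observed on the two test images
fp_ratio  = 0.05         # false detections relative to the car count
hours     = 130          # total computing time on a single machine

detected  = det_rate * n_cars   # 95,000 cars removed automatically
false_pos = fp_ratio * n_cars   # 5,000 harmless extra inpaintings
per_image = hours / n_images    # ~0.33 h, i.e. ~20 min per image
print(f"{detected:.0f} detections, {false_pos:.0f} false positives, "
      f"{per_image * 60:.0f} min/image")
```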

6.3 Next Steps

Our hope for further improvements is nourished by the availability of overlapping images, typically 10 images per ground patch. We envision an approach that automatically improves the detector in an unsupervised fashion. The image overlaps will help to obtain additional positive samples, e.g. when a car is detected in one image but missed in another.
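One way this overlap exploitation could be sketched: a car found in image A is projected into overlapping image B via the known orientation of the aerial block, and if the current detector misses it there, the patch from B becomes a new hard positive sample. The projection callable, the scoring callable, and all names below are illustrative assumptions, not our implementation:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    ground_xy: tuple      # ground-plane position triangulated from image A

def crop(image, u, v, size=64):
    """Square patch centred on pixel (u, v) of an H x W (x C) array."""
    h = size // 2
    return image[int(v) - h:int(v) + h, int(u) - h:int(u) + h]

def harvest_overlap_positives(detections_a, image_b, project_ab, score_b,
                              thresh=0.0):
    """Collect patches of image B where image A saw a car but the
    detector on B did not.

    project_ab -- maps a ground position to (u, v) pixels in image B
                  (assumed known from the block orientation)
    score_b    -- the current detector's confidence on a patch of B
    """
    positives = []
    for det in detections_a:
        u, v = project_ab(det.ground_xy)
        patch = crop(image_b, u, v)
        if patch.size and score_b(patch) < thresh:  # missed in B
            positives.append(patch)                 # -> new positive sample
    return positives
```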

Our optimism is further encouraged by recent unpublished work that employs 3D information from the images to automatically label the false negatives and use them for training. We are also exploring co-training strategies, using extracted height-field features to construct a shape-based car model within an appearance-driven detection process.
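The co-training idea can be sketched as two classifiers over disjoint feature views (appearance vs. height field) that label confident samples for each other; the classifier interface below is hypothetical:

```python
def cotraining_round(unlabeled, appearance_clf, height_clf, conf=0.95):
    """One co-training round: each view labels the samples it is sure
    about and hands them to the other view as training data.
    confidence() is assumed to return P(car | features); update() takes
    (features, label) as in the boosting sketch above."""
    for sample in unlabeled:
        p_app = appearance_clf.confidence(sample.appearance_features)
        p_hgt = height_clf.confidence(sample.height_features)
        if p_app > conf or p_app < 1.0 - conf:       # appearance view is sure
            height_clf.update(sample.height_features,
                              +1 if p_app > 0.5 else -1)
        if p_hgt > conf or p_hgt < 1.0 - conf:       # height view is sure
            appearance_clf.update(sample.appearance_features,
                                  +1 if p_hgt > 0.5 else -1)
```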

We speculate that a combination of approaches will achieve better performance and also a greater level of generalization towards various objects and types of aerial imagery.

We have recently proposed an approach called "conservative learning" [21]. There we demonstrated that, by using a combination of generative and discriminative classifiers and a conservative update strategy, a person detector can be learned in an unsupervised manner.
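The conservative update strategy can be paraphrased as: let the generative model propose labels, but train the discriminative classifier only on samples labeled with very high confidence. A minimal sketch under that reading (the posterior interface is an assumption, not the API of [21]):

```python
def conservative_update(sample, gen_model, booster, accept=0.99):
    """Update the discriminative (boosted) classifier only when the
    generative model is nearly certain, keeping label noise out of the
    unsupervised training loop. gen_model.posterior is assumed to
    return P(person | sample)."""
    p = gen_model.posterior(sample)
    if p > accept:
        booster.update(sample, +1)      # confident positive
    elif p < 1.0 - accept:
        booster.update(sample, -1)      # confident negative
    # otherwise: discard the sample -- the "conservative" part
```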

7. ACKNOWLEDGMENTS

This work has been supported by the APAFA Project No. 813397, financed by the Austrian Research Promotion Agency (http://www.ffg.at), and by the Virtual Earth Academic Research Collaboration funded by Microsoft.

8. REFERENCES

[1] S. Agarwal, A. Awan, and D. Roth. Learning to detect objects in images via a sparse, part-based representation. PAMI, 26(11):1475–1490, 2004.
[2] M. Bertalmio, L. Vese, G. Sapiro, and S. Osher. Simultaneous structure and texture image inpainting. Technical Report 02(47), UCLA CAM Report, 2002.
[3] T. Chan and J. Shen. Mathematical models for local nontexture inpaintings. SIAM J. Appl. Math., 62(3):1019–1043, 2001.
[4] D. Comaniciu and P. Meer. Mean shift analysis and applications. In ICCV, pages 1197–1203, 1999.
[5] N. Cornelis, B. Leibe, K. Cornelis, and L. van Gool. 3D city modeling using cognitive loops. In Third International Symposium on 3D Data Processing, Visualization and Transmission, 3DPVT, 2006.
[6] A. Demiriz, K. Bennett, and J. Shawe-Taylor. Linear programming boosting via column generation. Machine Learning, 46:225–254, 2002.
[7] R. Fergus, P. Perona, and A. Zisserman. Object class recognition by unsupervised scale-invariant learning. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, pages 264–271, 2003.
[8] W. Foerstner. eTRIMS. http://www.ipb.uni-bonn.de/projects/etrims, 2006.
[9] Y. Freund and R. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. Journal of Computer and System Sciences, 55(1):119–139, 1997.
[10] H. Grabner and H. Bischof. On-line boosting and vision. In Proc. of CVPR 2006, volume I, pages 260–267. IEEE CS, 2006.
[11] A. Klaus, M. Sormann, and K. Karner. Segment-based stereo matching using belief propagation and a self-adapting dissimilarity measure. In Proc. ICPR 2006, pages 15–18. IEEE CS, 2006.
[12] K. Levi and Y. Weiss. Learning object detection from a small number of examples: The importance of good features. In CVPR, pages 53–60, 2004.
[13] N. Littlestone. Learning quickly when irrelevant attributes abound. Machine Learning, 2:285–318, 1987. (Winnow algorithm.)
[14] T. Ojala, M. Pietikäinen, and T. Mäenpää. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. PAMI, 24(7):971–987, 2002.
[15] A. Opelt, M. Fussenegger, A. Pinz, and P. Auer. Weak hypotheses and boosting for generic object detection and recognition. In Proc. ECCV 2004, volume II, pages 71–84, 2004.
[16] N. Oza and S. Russell. Online bagging and boosting. In Proceedings Artificial Intelligence and Statistics, pages 105–112, 2001.
[17] P. Viola, M. J. Jones, and D. Snow. Detecting pedestrians using patterns of motion and appearance. In Proceedings of the Ninth IEEE International Conference on Computer Vision (ICCV'03), volume 2, pages 734–741, 2003. (Pedestrian detection using AdaBoost.)
[18] J.-H. Park and Y.-K. Choi. On-line learning for active pattern recognition. IEEE Signal Processing Letters, pages 301–303, 1996.
[19] J. Ponce, M. Hebert, C. Schmid, and A. Zisserman, editors. Toward Category-Level Object Recognition. Springer, 2006.
