
Text Localization & Segmentation in Images, Web Pages and Videos


Related Work

1. Y. Zhong, K. Karu and A. K. Jain. Locating Text in Complex Color Images. Pattern Recognition, Vol. 28, No. 10, pp. 1523-1535, October 1995.
2. Rainer Lienhart and Frank Stuber. Automatic Text Recognition in Digital Videos. In Image and Video Processing IV 1996, Proc. SPIE 2666-20, pp. 180-188, Jan. 1996; also TR-95-036, Dec. 1995.
3. B.-L. Yeo, B. Liu. Visual Content Highlighting via Automatic Extraction of Embedded Captions on MPEG Compressed Video. IS&T / SPIE Digital Video Compression: Algorithms and Technologies, Feb. 1996.
4. Rainer Lienhart. Automatic Text Recognition for Video Indexing. Proc. ACM Multimedia 96, Boston, MA, Nov. 1996, pp. 11-20.
5. S. Sato and T. Kanade. NAME-IT: Association of Face and Name in Video. In Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Juan, Puerto Rico, 17-19 June, 1997.
6. Sato, T., Kanade, T., Hughes, E., Smith, M. Video OCR for Digital News Archives. IEEE Workshop on Content-Based Access of Image and Video Databases (CAIVD'98), Bombay, India, January, 1998.
7. Anil K. Jain and Bin Yu. Automatic Text Location in Images and Video Frames. Pattern Recognition, Vol. 31, No. 12, pp. 2055-2076, 1998.
8. H. Li, O. Kia and D. Doermann. Text Enhancement in Digital Videos. In Proceedings of SPIE99, Document Recognition and Retrieval, 1999.
9. Rainer Lienhart and Wolfgang Effelsberg. Automatic Text Segmentation and Text Recognition for Video Indexing. ACM/Springer Multimedia Systems Magazine, Vol. 8, pp. 69-81, Jan. 2000.
10. Huiping Li, David Doermann, Omid Kia. Automatic text detection and tracking in digital video. IEEE Transactions on Image Processing, Vol. 9, No. 1, Jan. 2000.
11. Daniel Lopresti and Jiangying Zhou. Locating and Recognizing Text in WWW Images. Information Retrieval 2 (Kluwer Academic Publishers), 177-206, (2000).
12. Axel Wernicke and Rainer Lienhart. On the Segmentation of Text in Videos. IEEE Int. Conference on Multimedia and Expo (ICME2000), Vol. 3, pp. 1511-1514, July 2000.

More information at www.videoanalysis.org

Rainer Lienhart, Axel Wernicke. Localizing and Segmenting Text in Images and Videos. IEEE Transactions on Circuits and Systems for Video Technology, pp. 256-268, April 2002.

[Timeline figure: entries placed along an axis labeled 1996, 1998, 2000, in the order 1, 2, 3, 4, 5, 6, 7, 8, 9/10, 12, 11]

© 2005-2009 Prof. Dr. Rainer Lienhart, Head of Multimedia Computing, Institut für Informatik, Universität Augsburg, Eichleitnerstr. 30, D-86135 Augsburg, Germany; email: Rainer.Lienhart@informatik.uni-augsburg.de


Design Decisions

• What kind of text occurrences?
  – Scene text
  – Overlay text
• With what style attributes?
  – Font size
  – Font type
  – Text color
• In what kind of media data?
  – Image-based
  – Video-based
  [slide annotations beside the lists above: "any", "both"]
• What should be achieved?
  – Localization
  – Segmentation
  – Recognition
  – Integrated recognition
• How will the results be used?
  – Indexing
  – Object-based video encoding
  [slide annotation beside this list: "both"]


Overview

OCR result: Dec 25 1998


Text Localization (1/2)


Text Box Consolidation (2/2)

• Derive initial text bounding boxes
• Refine bounding boxes
• Remove text boxes which
  – Are too small/large, or
  – Have a bad width-to-height aspect ratio
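The removal step of the box consolidation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide gives no thresholds and does not say how box size is measured, so the `Box` type, the function name `remove_implausible_boxes`, the use of box height as the size measure, and all four limit values are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical type)."""
    x: int
    y: int
    width: int
    height: int


def remove_implausible_boxes(
    boxes: List[Box],
    min_height: int = 8,       # assumed lower size limit, not from the slides
    max_height: int = 200,     # assumed upper size limit, not from the slides
    min_aspect: float = 1.0,   # assumed lower width-to-height limit
    max_aspect: float = 50.0,  # assumed upper width-to-height limit
) -> List[Box]:
    """Keep only boxes whose size and width-to-height ratio look text-like."""
    kept = []
    for box in boxes:
        if box.width <= 0 or box.height <= 0:
            continue  # degenerate box, cannot contain text
        if not (min_height <= box.height <= max_height):
            continue  # too small or too large
        aspect = box.width / box.height
        if not (min_aspect <= aspect <= max_aspect):
            continue  # bad width-to-height aspect ratio
        kept.append(box)
    return kept


candidates = [
    Box(10, 10, 120, 20),  # plausible text line
    Box(50, 50, 4, 3),     # too small
    Box(80, 10, 10, 60),   # taller than wide
]
plausible = remove_implausible_boxes(candidates)
```

With these assumed limits only the first candidate survives. In practice the limits would be tuned to the font sizes the localizer is expected to find.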


Monitoring + Tracking

Result: Text Objects


Experimental Results

• Text localization
  – Image-based: 69.5% (boxes) / 85% (pixels)
  – Video-based: 94.9% (boxes)
• Text segmentation
  – 79.6% correctly segmented
  – 7.6% damaged, but still recognizable
• Text recognition
  – 70% (over all steps)
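The slides report localization rates per box and per pixel without defining either measure. The sketch below shows one plausible reading, purely as an assumption: a ground-truth box counts as found when a single detected box covers at least a fraction `min_overlap` of its area, and the pixel rate is the share of ground-truth text pixels covered by any detected box. The function names, the `(x, y, width, height)` tuple layout, and the 0.8 default are hypothetical and not taken from the slides.

```python
def overlap_area(a, b):
    """Intersection area of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    w = min(ax + aw, bx + bw) - max(ax, bx)
    h = min(ay + ah, by + bh) - max(ay, by)
    return max(w, 0) * max(h, 0)


def box_hit_rate(truth, detected, min_overlap=0.8):
    """Fraction of ground-truth boxes covered well enough by one detection.

    Assumes `truth` is non-empty; min_overlap is an assumed threshold.
    """
    hits = 0
    for t in truth:
        area = t[2] * t[3]
        if any(overlap_area(t, d) >= min_overlap * area for d in detected):
            hits += 1
    return hits / len(truth)


def box_pixels(box):
    """Set of (x, y) pixel coordinates inside a box."""
    x, y, w, h = box
    return {(px, py) for px in range(x, x + w) for py in range(y, y + h)}


def pixel_hit_rate(truth, detected):
    """Fraction of ground-truth pixels covered by any detected box."""
    truth_px = set().union(*(box_pixels(t) for t in truth))
    found_px = set().union(*(box_pixels(d) for d in detected))
    return len(truth_px & found_px) / len(truth_px)


truth = [(0, 0, 10, 10), (20, 0, 10, 10)]
detected = [(0, 0, 10, 10), (20, 0, 5, 10)]  # second box only half covered
box_rate = box_hit_rate(truth, detected)
pixel_rate = pixel_hit_rate(truth, detected)
```

In this toy example the half-covered box is a miss under the box-based measure but still contributes its covered pixels, so the pixel rate (0.75) exceeds the box rate (0.5). Under this reading, that difference is one way the pixel-based figure can be higher than the box-based one for the same detections.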


Demo
