13.07.2015 Views

WWW/Internet - Portal do Software Público Brasileiro


ISBN: 978-972-8939-25-0 © 2010 IADIS

A FEASIBILITY STUDY ON DISTRIBUTED COOPERATIVE OCR SYSTEMS USING WeOCR

Hideaki Goto
Cyberscience Center, Tohoku University
Aoba-ku, Sendai-shi, 980-8578 Japan

ABSTRACT

Collaboration of the network and optical character recognition (OCR) has great potential to extend the applications of OCR and to open up new vistas in future services using OCR. We presented the basic ideas of a synergetic OCR system in 2004, and developed a platform called WeOCR to realize web-based OCR services. The number of accesses has soared since the experimental servers were put into service in 2005. This paper provides an overview of the current WeOCR system, shows some updated statistics, and describes the knowledge obtained through a feasibility study on the web-based OCR system using WeOCR. The unique survey conveys some useful hints for future development of network-oriented OCRs, not limited to WeOCR, and potential OCR applications in the Grid/Cloud Computing era.

KEYWORDS

Web-based OCR, WeOCR, OCRGrid, character recognition, document analysis, Cloud Computing

1. INTRODUCTION

Collaboration of the network and optical character recognition (OCR) is expected to be quite useful for making lightweight applications not only for desktop computers but also for small gadgets such as smart phones and PDAs (Personal Digital Assistants). For example, some mobile phones with character recognition capabilities became available in the market (Koga 2005). Another mobile phone with a real-time English-Japanese translator was proposed by a Japanese company. Although it has a simple, built-in OCR, it requires a large language dictionary provided by an external server on the Internet. Another example is a wearable vision system with character recognition capabilities for visually impaired people (Goto 2009). Due to the limitations of the hardware, it is very difficult to put large character-set data and a dictionary for language processing into such small gadgets or wearable devices. In addition, we cannot use a sophisticated character recognition method because the processor power is quite limited. Thus, use of the network can be beneficial to various applications.

The network-based architecture has some advantages from the researchers' and developers' points of view as well. Recent OCR systems are becoming more and more complicated, and their development requires expertise in various fields of research. Building and studying a complete system has become very difficult for a researcher or a small group of people. A possible solution is to share software components as web services. Since the programs run on the server side, people can provide computing (pattern recognition) power to others without opening their source code or executables (Lucas 2005, Goto 2006).

We presented a new concept of network-oriented OCR systems called Synergetic OCR (Goto 2004), and proposed a revised version called OCRGrid in 2006 (Goto 2006). Since 2004, we have been designing a web-based OCR system, called WeOCR, as an HTTP (HyperText Transfer Protocol)-based implementation of OCRGrid. The WeOCR system was put into service in 2005. To our knowledge, there was no commercial web-based OCR service at that time. In addition, WeOCR is currently the only Grid-oriented, cooperative OCR platform in the world. One of the biggest concerns with web-based OCR is the privacy of documents. A simple question arises: do people really want to use a system like that? To answer this question, we need to investigate the usage of the system.
