
WWW/Internet - Portal do Software Público Brasileiro


IADIS International Conference WWW/Internet 2010

In this paper, we provide an overview of the current WeOCR system, show some updated statistics following our previous survey (Goto 2007), and describe the knowledge obtained through a feasibility study on the web-based OCR system using WeOCR.

2. OVERVIEW OF THE WeOCR SYSTEM

2.1 Basic Concept

We designed OCRGrid to make a lot of OCR engines work cooperatively over the network worldwide to gain some synergetic effects such as performance improvement of OCR, to realize multilingual, high-function, sophisticated OCR systems, and to provide ubiquitous OCR services. The basic concept of OCRGrid is quite simple. Many OCR servers are deployed on a network as shown in Figure 1. The OCR servers work either independently or cooperatively, communicating with each other. A client connects to one of the OCR servers and sends text images to it. The servers recognize the images and send the recognition results (i.e. character codes, etc.) back to the client.

Although OCRGrid is expected to be used world-wide over the Internet, we may use any networks such as wireless networks, corporate local area networks (LANs), virtual private networks (VPNs), interconnection networks in parallel servers, and even inter-process communication channels.

Some potential applications of the platform have been discussed in (Goto 2004, Goto 2006).

Unlike some commercial OCR systems equipped with web user interfaces, we are expecting WeOCR servers will be supported by the communities of researchers, developers, and individuals as well as application service providers.

[Figure 1. OCRGrid platform: clients send images to OCR servers over a WAN/LAN and receive recognition results; the OCR servers can work cooperatively.]

2.2 WeOCR Toolkit and Server Search System

Making a web-based OCR from scratch as an application server requires a lot of expertise about network programming and network security.
Since many OCR developers and researchers are not so familiar with network programming, we developed a toolkit to help those people build secure web-based OCRs easily. (See http://weocr.ocrgrid.org/)

The document image is sent from the client computer to the WeOCR toolkit via the Apache web server program. The programs in the toolkit check the image size, examine the file integrity, uncompress the file if necessary, convert the image file into a common file format, and invoke the OCR software. The toolkit includes some programs for protecting the web server from malicious attacks or from the defects of the OCR programs. The recognition results are converted into HTML data and sent back to the client.

To enable the end users to search for appropriate OCR engines easily, we developed a directory-based server search system. Figure 2 depicts the overview.
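
The processing chain that the toolkit applies to an uploaded image (size check, integrity check, decompression, format conversion, OCR invocation, HTML output) can be illustrated with a minimal Python sketch. This is not the WeOCR toolkit's actual code: the function names, the 5 MB limit, the list of accepted signatures, and the `run_ocr` and `convert` hooks are all assumptions made for illustration, and the integrity check is reduced to a magic-byte test.

```python
import html
import zlib

MAX_IMAGE_BYTES = 5 * 1024 * 1024  # assumed limit; the paper does not state one
GZIP_MAGIC = b"\x1f\x8b"

# Magic-byte prefixes for a few image formats (illustrative subset only).
IMAGE_SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"P4": "pbm",
    b"P5": "pgm",
}


def sniff_format(data):
    """Return a format name if data starts with a known signature, else None."""
    for magic, name in IMAGE_SIGNATURES.items():
        if data.startswith(magic):
            return name
    return None


def gunzip_bounded(data, max_bytes):
    """Decompress gzip data, refusing output larger than max_bytes."""
    decomp = zlib.decompressobj(wbits=zlib.MAX_WBITS | 16)  # 16 selects gzip framing
    try:
        out = decomp.decompress(data, max_bytes + 1)
    except zlib.error as exc:
        raise ValueError("corrupt compressed file") from exc
    if len(out) > max_bytes:
        raise ValueError("decompressed image too large")
    if not decomp.eof:
        raise ValueError("truncated compressed file")
    return out


def handle_upload(data, run_ocr, convert=lambda image, fmt: image,
                  max_bytes=MAX_IMAGE_BYTES):
    """Validate an uploaded image, run OCR on it, and return an HTML fragment.

    run_ocr (bytes -> str) and convert ((bytes, format) -> bytes) are
    hypothetical hooks standing in for the OCR engine and the image converter.
    """
    # 1. Reject oversized uploads before doing any other work.
    if len(data) > max_bytes:
        raise ValueError("image too large")
    # 2. Uncompress if necessary, with a bound on the expanded size.
    if data.startswith(GZIP_MAGIC):
        data = gunzip_bounded(data, max_bytes)
    # 3. Basic integrity check: the payload must look like a known image type.
    fmt = sniff_format(data)
    if fmt is None:
        raise ValueError("unrecognized or damaged image file")
    # 4. Convert to a common format and invoke the OCR engine.
    text = run_ocr(convert(data, fmt))
    # 5. Escape the recognized text so it is safe to embed in the HTML reply.
    return "<pre>" + html.escape(text) + "</pre>"
```

Bounding the decompressed size and escaping the recognized text are two simple examples of the kind of protection the toolkit is described as providing, both against malicious input and against unexpected output from an OCR program.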
