WWW/Internet - Portal do Software Público Brasileiro

IADIS International Conference WWW/Internet 2010

commercial web-based OCR services have been launched so far, none of them has a WeOCR-compatible WebAPI (Application Programming Interface). The following list shows the web-based OCR services and demos spotted by the author.

• LEADTOOLS OCR Service – by LEADTOOLS
• Web OCR service – by KYODONETWORKS (for Japanese text only)
• OCR Demo – by Startek Technology (for printed numerals only)
• ocrNow! – by Wordcraft International Limited (limited trial)
• Google Docs – by Google (built-in OCR engine can handle some European languages only)

As far as we know, none of them takes into account Grid-based, collaborative use of OCR engines.

Our idea of “Grid-style open OCR servers” has not yet been successful. For commercial providers, one of the reasons is probably the lack of a plausible business model. From the researchers’ and developers’ points of view, the incentives may be unclear because the returns are often indirect. Without many contributions, we would not be able to achieve a collaborative environment or synergetic effects such as multilingual processing. We need to investigate the feasibility of the OCRGrid platform further.

3. FEASIBILITY STUDY ON WEB-BASED OCR SYSTEM

3.1 Access Statistics

We are interested in how frequently the WeOCR servers are used. A survey was conducted from Nov. 2005 to July 2010. Note that the warning message “Do not send any confidential documents” is shown on every image submission page (portal site), and detailed information about privacy is also provided.

Figure 3 shows the monthly access counts on the two WeOCR servers, Michelle and Maggie. The numbers of requests to six engines (all except NHocr) are summed. The line “access” represents the number of requests, while the line “image_ok” shows the number of successful image uploads. The failures were mainly due to file format mismatches and interrupted data transmissions.

The number of accesses increased gradually in the first year, despite the privacy concerns, until the first automated client was spotted. The server load then began to soar, reaching the current average of around 400,000 counts/month (July 2010). One of the heavy users seems to be running a program for monitoring numerals shown on the displays of some kind of apparatus. Figure 4 shows the access counts on the NHocr server only, which is almost free of automated clients. Its load has reached a current average of 25,000 counts/month.

Figure 3. Monthly process counts (Michelle + Maggie).
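The aggregation behind Figure 3 (requests to six engines summed, NHocr excluded, with separate “access” and “image_ok” lines) can be sketched as follows. This is a minimal illustration under stated assumptions, not the WeOCR servers’ actual logging code: the `Request` record layout, the engine names other than NHocr, and the `monthly_counts` helper are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


# Hypothetical log record: one entry per request received by an OCR engine.
@dataclass
class Request:
    timestamp: datetime
    engine: str       # engine name; only "NHocr" comes from the paper
    image_ok: bool    # True if the uploaded image was accepted


def monthly_counts(records, exclude=("NHocr",)):
    """Sum requests over all engines not in `exclude`, grouped by month.

    Returns {"YYYY-MM": (access, image_ok)}, where `access` counts every
    request and `image_ok` counts only successful image uploads.
    """
    access = Counter()
    ok = Counter()
    for r in records:
        if r.engine in exclude:
            continue  # e.g. NHocr is reported separately (Figure 4)
        month = r.timestamp.strftime("%Y-%m")
        access[month] += 1
        if r.image_ok:
            ok[month] += 1
    return {m: (access[m], ok[m]) for m in sorted(access)}


# Usage with made-up records (engine names are hypothetical).
logs = [
    Request(datetime(2005, 11, 3), "engineA", True),
    Request(datetime(2005, 11, 4), "engineB", False),  # e.g. file format mismatch
    Request(datetime(2005, 12, 1), "engineA", True),
    Request(datetime(2005, 12, 2), "NHocr", True),     # excluded by default
]
print(monthly_counts(logs))
# {'2005-11': (2, 1), '2005-12': (1, 1)}
```

Keeping `access` and `image_ok` as separate counters mirrors the two lines in Figure 3: the gap between them corresponds to rejected requests such as file format mismatches or interrupted transfers. For scale, 400,000 requests in a 30-day month average about 0.15 requests per second, roughly one request every 6.5 seconds.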