14.12.2012 Views

OCR and OMR Final Project Report


Refi Sencia Dity / ICT-7

1. Introduction

We chose OCR and OMR as a way to apply what we learned in the image processing course this semester. In this final project, we combine various image processing techniques and implement them in simple C/C++ code.

2. Technical Details

• Programming languages: C and C++
• Compiler: GCC
• Platform: Linux
• Software: ImageMagick

3. Design

The graphical user interface (GUI) is built to make the FormReader easier to use. OCR is used to read the letters (name), whereas OMR is used to read the answer part (marks). These 2 functions (OCR and OMR) are combined with the GUI so that they can perform their function properly. The GUI itself is created using GTK+.


4. Implementation

Before OMR and OCR can work properly, 2 preliminary steps are required:

1. Binarization

Binarization simplifies the image representation: after binarization, each pixel has one of only 2 values, black (0) or white (255). We used 200 as the threshold for this project.

2. Inversion

With inversion, the value 255 (white) is changed to 0 and vice versa. This simplifies the later calculations.
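The two preliminary steps can be sketched as follows. This is a minimal illustration, not the project's actual code: the `Image` struct and the function names are hypothetical, and treating a pixel exactly equal to the threshold as white is an assumption, since the report only gives the threshold value (200).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal image type (not the report's GrayscaleImage class):
// row-major 8-bit pixels, 0 = black, 255 = white.
struct Image {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;
};

// Binarization: pixels at or above the threshold become white (255),
// all others become black (0). Counting "== threshold" as white is an assumption.
Image binarize(const Image& in, std::uint8_t threshold = 200) {
    assert(in.pixels.size() == static_cast<std::size_t>(in.width) * in.height);
    Image out = in;
    for (std::size_t i = 0; i < out.pixels.size(); ++i) {
        out.pixels[i] = static_cast<std::uint8_t>(out.pixels[i] >= threshold ? 255 : 0);
    }
    return out;
}

// Inversion: 255 (white) becomes 0 and 0 (black) becomes 255.
Image invert(const Image& in) {
    Image out = in;
    for (std::size_t i = 0; i < out.pixels.size(); ++i) {
        out.pixels[i] = static_cast<std::uint8_t>(255 - out.pixels[i]);
    }
    return out;
}
```

After inversion the marks have the nonzero value, so summing or counting nonzero pixels directly measures the amount of ink.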

4.1 Optical Mark Reader (OMR)

[Figure: OMR processing flow, from Image to Answer]

1. Binarization
2. Search Box Coordinate
3. Calculate Histogram


A histogram is used to read the answers. Each question number has 5 boxes (20x20 pixels each). The histogram is calculated for each box, and the box with the largest number of black (0) pixels is chosen as the answer.
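The selection rule above can be sketched as follows. The `Image` and `Box` types and the function names are hypothetical. How the box positions are found is not shown here, and the handling of ties and of unanswered questions is an assumption, since the report does not describe either.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal image type: row-major 8-bit pixels, 0 = black, 255 = white.
struct Image {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;
};

// Top-left corner of one answer box (hypothetical helper type).
struct Box {
    int x;
    int y;
};

const int kBoxSize = 20;  // box side length in pixels, as stated in the report

// Histogram value used here: the number of black (0) pixels inside one box.
int countBlack(const Image& img, const Box& box) {
    assert(box.x >= 0 && box.y >= 0);
    assert(box.x + kBoxSize <= img.width && box.y + kBoxSize <= img.height);
    int count = 0;
    for (int y = box.y; y < box.y + kBoxSize; ++y) {
        for (int x = box.x; x < box.x + kBoxSize; ++x) {
            if (img.pixels[static_cast<std::size_t>(y) * img.width + x] == 0) {
                ++count;
            }
        }
    }
    return count;
}

// Returns the index of the box with the most black pixels.
// Ties go to the earliest box and blank questions are not detected
// (both are assumptions of this sketch).
int pickAnswer(const Image& img, const std::vector<Box>& boxes) {
    assert(!boxes.empty());
    int best = 0;
    int bestCount = countBlack(img, boxes[0]);
    for (std::size_t i = 1; i < boxes.size(); ++i) {
        const int c = countBlack(img, boxes[i]);
        if (c > bestCount) {
            bestCount = c;
            best = static_cast<int>(i);
        }
    }
    return best;
}
```

With 5 boxes per question, the returned index 0 to 4 would map to the answer options in the order in which the boxes are passed.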

4.2 Optical Character Recognition (OCR)

[Figure: OCR processing flow, from Inverted Image through Name Box to Answer]

1. Find coordinates of crosses
2. Extract subset of image located within crosses range
3. Name box extraction
4. Frame/border elimination
5. Normalization
6. Database matching


1. Find coordinates of crosses

To find the coordinates of the crosses, we take a 50 x 50 pixel subset of the image at every corner and then use a histogram to find the center point of each cross.
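One plausible reading of this step, sketched under assumptions: inside the 50 x 50 window, count the ink pixels per column and per row, and take the column and row with the highest counts as the vertical and horizontal strokes of the cross. The report does not spell out how the histogram is used, so this is an interpretation. The `Image` and `Point` types, the function names, and the `ink` parameter are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal image type: row-major 8-bit pixels.
struct Image {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;
};

struct Point {
    int x;
    int y;
};

// Index of the largest element (the first one on ties).
int argMax(const std::vector<int>& values) {
    assert(!values.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < values.size(); ++i) {
        if (values[i] > values[best]) {
            best = i;
        }
    }
    return static_cast<int>(best);
}

// Looks at the size x size window whose top-left corner is (x0, y0) and
// returns the intersection of the column and the row containing the most
// pixels equal to `ink` (0 on a binarized image, 255 on an inverted one).
Point findCrossCenter(const Image& img, int x0, int y0, int size, std::uint8_t ink) {
    assert(size > 0 && x0 >= 0 && y0 >= 0);
    assert(x0 + size <= img.width && y0 + size <= img.height);
    std::vector<int> colHist(size, 0);
    std::vector<int> rowHist(size, 0);
    for (int y = 0; y < size; ++y) {
        for (int x = 0; x < size; ++x) {
            const std::size_t idx = static_cast<std::size_t>(y0 + y) * img.width + (x0 + x);
            if (img.pixels[idx] == ink) {
                ++colHist[x];
                ++rowHist[y];
            }
        }
    }
    const Point center = {x0 + argMax(colHist), y0 + argMax(rowHist)};
    return center;
}
```

For strokes thicker than one pixel, this sketch returns the first of the tied columns and rows rather than the middle of the stroke; averaging the tied indices would be one way to refine it.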

2. Extract subset of image located within crosses range

After getting the coordinates of each cross, we use them to extract the region enclosed by the crosses.

3. Name box extraction

The coordinates of the name box can be calculated from its pre-defined (left-top) coordinates. Once we have these coordinates, we can extract the box.
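Both extraction steps come down to cropping a rectangle out of the image. A minimal sketch, assuming a row-major 8-bit image; the `Image`, `Point` and `Rect` types and the function names are hypothetical, and whether the cross pixels themselves belong to the extracted region is not stated in the report (here both corners are inclusive).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal types (not the report's classes).
struct Image {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;
};

struct Point {
    int x;
    int y;
};

struct Rect {
    int x;
    int y;
    int width;
    int height;
};

// Rectangle spanned by a top-left and a bottom-right corner, both inclusive.
Rect rectBetween(const Point& topLeft, const Point& bottomRight) {
    assert(bottomRight.x >= topLeft.x && bottomRight.y >= topLeft.y);
    const Rect r = {topLeft.x, topLeft.y,
                    bottomRight.x - topLeft.x + 1,
                    bottomRight.y - topLeft.y + 1};
    return r;
}

// Copies the pixels inside `r` into a new image of size r.width x r.height.
Image crop(const Image& img, const Rect& r) {
    assert(r.x >= 0 && r.y >= 0 && r.width > 0 && r.height > 0);
    assert(r.x + r.width <= img.width && r.y + r.height <= img.height);
    Image out = {r.width, r.height, std::vector<std::uint8_t>()};
    out.pixels.reserve(static_cast<std::size_t>(r.width) * r.height);
    for (int y = r.y; y < r.y + r.height; ++y) {
        for (int x = r.x; x < r.x + r.width; ++x) {
            out.pixels.push_back(img.pixels[static_cast<std::size_t>(y) * img.width + x]);
        }
    }
    return out;
}
```

The same `crop` call would serve for the name box, with a rectangle built from the pre-defined left-top coordinates and the known box size.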

4. Frame/Border Elimination

To free the name box from noise, we need to eliminate any border that may exist. This elimination is done using connected component extraction.
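The report does not say how the connected components are used, so the following is only one possible sketch: every 4-connected component of ink pixels that touches the edge of the cropped name box is treated as frame and erased with a breadth-first flood fill. The `Image` type, the function names, the choice of 4-connectivity and the "touches the edge" criterion are all assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical minimal image type: row-major 8-bit pixels.
struct Image {
    int width;
    int height;
    std::vector<std::uint8_t> pixels;
};

// Erases the 4-connected component of `ink` pixels that contains (sx, sy).
// The start pixel is expected to be an ink pixel.
void eraseComponent(Image& img, int sx, int sy, std::uint8_t ink, std::uint8_t background) {
    assert(ink != background);
    const int dx[4] = {1, -1, 0, 0};
    const int dy[4] = {0, 0, 1, -1};
    std::queue<std::pair<int, int> > pending;
    img.pixels[static_cast<std::size_t>(sy) * img.width + sx] = background;
    pending.push(std::make_pair(sx, sy));
    while (!pending.empty()) {
        const std::pair<int, int> p = pending.front();
        pending.pop();
        for (int k = 0; k < 4; ++k) {
            const int nx = p.first + dx[k];
            const int ny = p.second + dy[k];
            if (nx < 0 || ny < 0 || nx >= img.width || ny >= img.height) {
                continue;
            }
            std::uint8_t& px = img.pixels[static_cast<std::size_t>(ny) * img.width + nx];
            if (px == ink) {
                px = background;  // mark as visited by erasing it
                pending.push(std::make_pair(nx, ny));
            }
        }
    }
}

// Erases every ink component that touches the outer edge of the image.
void removeFrame(Image& img, std::uint8_t ink, std::uint8_t background) {
    for (int y = 0; y < img.height; ++y) {
        for (int x = 0; x < img.width; ++x) {
            const bool onEdge = (x == 0 || y == 0 || x == img.width - 1 || y == img.height - 1);
            if (onEdge && img.pixels[static_cast<std::size_t>(y) * img.width + x] == ink) {
                eraseComponent(img, x, y, ink, background);
            }
        }
    }
}
```

On an inverted image the call would be `removeFrame(box, 255, 0)`. A letter that touches the frame belongs to the same component and is erased with it, which is one way such an approach can fail on letters connected to the frame.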

5. Normalization

Normalization is done in 2 steps:

1. Extract the letter from the box

To extract the letter, we assume that there is no noise on the paper. The histogram is calculated vertically and horizontally and searched for the boundary of the letter. Once we have the boundary, we can get the coordinates of each corner, and the letter can then be extracted.

From the boundary, we can get:

Left Top Coordinate: (10,15)
Right Top Coordinate: (100,15)
Left Bottom Coordinate: (10,150)
Right Bottom Coordinate: (100,150)

2. Scaling

Because each box is cropped to the size of its letter, the boxes differ in size, which makes the matching process difficult. For that reason, the boxes are scaled before matching. Scaling is done using nearest neighbor interpolation.

6. Matching

Matching is done using the K-Nearest Neighbors algorithm. Each letter is compared with the database by calculating the pixel distance, and the closest entry is chosen as the answer.
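The normalization and matching steps can be sketched together as follows. This is an illustration under stated assumptions, not the project's code: the `Image`, `Rect` and `Template` types and all function names are hypothetical, ink is assumed to be 255 (inverted image), the distance is taken to be the sum of absolute pixel differences because the report does not name its metric, and only the single closest entry is used (k = 1).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical minimal types (not the report's classes).
struct Image { int width; int height; std::vector<std::uint8_t> pixels; };
struct Rect { int x; int y; int width; int height; };
struct Template { char label; Image image; };

const std::uint8_t kInk = 255;  // assumption: inverted image, so ink is white

// Step 1: bounding box of the letter from column and row histograms of ink.
Rect letterBounds(const Image& img) {
    std::vector<int> cols(img.width, 0);
    std::vector<int> rows(img.height, 0);
    for (int y = 0; y < img.height; ++y) {
        for (int x = 0; x < img.width; ++x) {
            if (img.pixels[static_cast<std::size_t>(y) * img.width + x] == kInk) {
                ++cols[x];
                ++rows[y];
            }
        }
    }
    int left = 0, right = img.width - 1, top = 0, bottom = img.height - 1;
    while (left <= right && cols[left] == 0) ++left;
    while (right >= left && cols[right] == 0) --right;
    while (top <= bottom && rows[top] == 0) ++top;
    while (bottom >= top && rows[bottom] == 0) --bottom;
    const Rect r = {left, top, right - left + 1, bottom - top + 1};
    return r;  // width and height are 0 when the box contains no ink
}

// Step 2: nearest neighbor scaling of region `src` to outW x outH pixels.
Image scaleNearest(const Image& img, const Rect& src, int outW, int outH) {
    assert(src.width > 0 && src.height > 0 && outW > 0 && outH > 0);
    Image out = {outW, outH,
                 std::vector<std::uint8_t>(static_cast<std::size_t>(outW) * outH)};
    for (int y = 0; y < outH; ++y) {
        const int sy = src.y + y * src.height / outH;  // nearest source row
        for (int x = 0; x < outW; ++x) {
            const int sx = src.x + x * src.width / outW;  // nearest source column
            out.pixels[static_cast<std::size_t>(y) * outW + x] =
                img.pixels[static_cast<std::size_t>(sy) * img.width + sx];
        }
    }
    return out;
}

// Sum of absolute pixel differences between two images of equal size.
long pixelDistance(const Image& a, const Image& b) {
    assert(a.width == b.width && a.height == b.height);
    long sum = 0;
    for (std::size_t i = 0; i < a.pixels.size(); ++i) {
        sum += std::abs(static_cast<int>(a.pixels[i]) - static_cast<int>(b.pixels[i]));
    }
    return sum;
}

// Matching with k = 1: the label of the closest database entry wins.
char matchLetter(const Image& letter, const std::vector<Template>& database) {
    assert(!database.empty());
    std::size_t best = 0;
    long bestDist = pixelDistance(letter, database[0].image);
    for (std::size_t i = 1; i < database.size(); ++i) {
        const long d = pixelDistance(letter, database[i].image);
        if (d < bestDist) {
            bestDist = d;
            best = i;
        }
    }
    return database[best].label;
}
```

For the example boundary above, `letterBounds` would return x = 10, y = 15, width = 91 and height = 136, since both corners are inclusive. Every database image has to be stored at the same size that is passed to `scaleNearest`.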

4.3 GUI

• Created using GTK+.
  o GTK+ is a highly usable, feature-rich toolkit for creating graphical user interfaces which boasts cross-platform compatibility and an easy-to-use API. GTK+ is written in C, but has bindings to many other popular programming languages such as C++, Python and C#, among others.
• How to use it:
  1. File – Open (Answer Sheet)
     o Choose a PGM file (scanned form of an answer sheet).
  2. File – Load (Answer File)
     o Choose a txt file containing correct answers.
       Format: [number],[answer]
  3. Click the “scan” button
  4. Right panel displays:
     o Score (calculated by comparing scanned answers with correct answers)
     o Name (read by OCR program)
     o Scanned answers (read by OMR program)
     o Correct answers (loaded from txt file)
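Loading the answer file and computing the score can be sketched as follows. The function names are hypothetical, and several details are assumptions: malformed lines are skipped, the score is the number of matching answers, and answers are compared case-insensitively. The report only gives the line format `[number],[answer]`.

```cpp
#include <cctype>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Parses lines of the form "[number],[answer]", for example "1,A".
// Lines that do not fit this form are skipped (an assumption of this sketch).
std::map<int, char> parseAnswers(std::istream& in) {
    std::map<int, char> answers;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        int number = 0;
        char comma = 0;
        char answer = 0;
        if (fields >> number >> comma >> answer && comma == ',') {
            answers[number] = answer;
        }
    }
    return answers;
}

// Counts the questions whose scanned answer equals the correct answer,
// ignoring letter case. Questions missing from `scanned` score nothing.
int computeScore(const std::map<int, char>& scanned, const std::map<int, char>& correct) {
    int score = 0;
    for (std::map<int, char>::const_iterator it = correct.begin(); it != correct.end(); ++it) {
        const std::map<int, char>::const_iterator found = scanned.find(it->first);
        if (found == scanned.end()) {
            continue;
        }
        const int a = std::toupper(static_cast<unsigned char>(found->second));
        const int b = std::toupper(static_cast<unsigned char>(it->second));
        if (a == b) {
            ++score;
        }
    }
    return score;
}
```

To read from a file, a `std::ifstream` opened on the txt file can be passed to `parseAnswers` in place of any other input stream.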

5. Evaluation

• Accepts PGM files only.
• Does not handle skewed images yet.
• Tested only on answer sheets filled in by computer (not yet on sheets filled in by hand).
• OCR:
  o Ignores noise.
  o Letters connected to the frame cannot be read properly.
• No filter during file selection.
  o The user can select any type of file, although the program can only process PGM files correctly.
• Does not use capital letters yet.
  o Should use case-insensitive character comparison in the C program.
• The graphical user interface was designed mainly to demonstrate the program.
  o It does not consider ease of use or other aesthetic aspects.
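One way to guard against non-PGM input, independent of any file chooser filter, is to check the magic number at the start of the file before processing it. PGM files begin with `P2` (plain text variant) or `P5` (binary variant). The sketch below is a suggestion rather than part of the project, and the function names are hypothetical.

```cpp
#include <fstream>
#include <string>

// True when the given bytes start with a PGM magic number:
// "P2" (plain PGM) or "P5" (raw PGM).
bool hasPgmMagic(const std::string& firstBytes) {
    return firstBytes.size() >= 2 && firstBytes[0] == 'P' &&
           (firstBytes[1] == '2' || firstBytes[1] == '5');
}

// Reads the first two bytes of the file at `path` and checks them.
// Returns false when the file cannot be opened or is shorter than two bytes.
bool isPgmFile(const std::string& path) {
    std::ifstream file(path.c_str(), std::ios::binary);
    char magic[2] = {0, 0};
    if (!file.read(magic, 2)) {
        return false;
    }
    return hasPgmMagic(std::string(magic, 2));
}
```

A check like this only inspects the header; a file with a valid magic number can still be truncated or malformed further on.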

6. Additional Information

Important files:

• lib/imglib.cpp -> Libraries for basic image processing tasks (generic and reusable).
• lib/imgoplib.cpp -> Collection of image filters (e.g., scaling, connected component extraction, dilation).
• lib/formlib.cpp -> Libraries for form reader functionalities (specific to the application).
• form-reader-cli.cpp -> Main file for the CLI version of the application (only for testing).
• form-reader-gui.cpp -> Main file for the GUI version of the application.

Important classes:

• GrayscaleImage -> Represents a grayscale image (can be constructed manually or read from a PGM file).
• GrayscaleImageOp -> Abstract class for GrayscaleImage operations. Direct subclasses: DilationOp, ConnectedCompOp, ScaleOp.
• Point -> Represents a coordinate (x, y).
• Dimension -> Self-explanatory. Consists of width and height.
• Rectangle -> Represents a region in the image (x, y, width, height).
• FormReader -> The core class of the form reader application. Contains the form reader functionality.
• OCR -> Carries out all the OCR tasks. This class is used within the FormReader class.

7. References

• Gonzalez, R. C. and Woods, R. E. Digital Image Processing, 3rd Ed. Upper Saddle River, NJ: Prentice Hall, 2010.
• “The GTK Project”. http://www.gtk.org/, accessed January 12, 2011.
