OCR and OMR Final Project Report



Refi Sencia Dity / ICT-7 

1. Introduction 

We chose OCR and OMR as an application of what we have learned in the Image Processing course this semester. Through this final project, we combine various image processing techniques and implement them in simple C/C++ code.

2. Technical Details 

• Programming languages: C and C++
• Compiler: GCC
• Platform: Linux
• Software: ImageMagick

3. Design 

The graphical user interface (GUI) is built to make the FormReader easier to use. OCR is used to read letters (the name), whereas OMR is used to read the answer section (the marks). These two functions (OCR and OMR) are combined with the GUI so that the application works as a whole. The GUI itself is created using GTK+.


4. Implementation 



Before OMR and OCR can work properly, two preliminary steps must be performed:

1. Binarization 

Binarization is used to simplify the image representation: after binarization, each pixel has only two possible values, black (0) or white (255). We used 200 as the threshold in this project.

2. Inversion 


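Inversion changes the value 255 (white) to 0 and vice versa. This simplifies later calculations: after inversion, the dark (ink) pixels carry the value 255, so they can be counted by summing pixel values.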
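The two preliminary steps can be sketched in C++ as below. This is a minimal illustration rather than the project's code: it assumes a hypothetical `Image` type (a 2-D `std::vector` of gray values) in place of the project's `GrayscaleImage` class, and whether a pixel exactly equal to the threshold counts as white is also our assumption.

```cpp
#include <vector>

// Hypothetical image type: rows of gray values (0 = black, 255 = white).
using Image = std::vector<std::vector<int>>;

// Binarization: every pixel becomes either black (0) or white (255).
// Pixels at or above the threshold (200 in this project) become white.
Image binarize(const Image& img, int threshold = 200) {
    Image out = img;
    for (auto& row : out)
        for (int& p : row)
            p = (p >= threshold) ? 255 : 0;
    return out;
}

// Inversion: 0 becomes 255 and 255 becomes 0 (in general p -> 255 - p).
Image invert(const Image& img) {
    Image out = img;
    for (auto& row : out)
        for (int& p : row)
            p = 255 - p;
    return out;
}
```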


4.1 Optical Mark Reader (OMR) 

The OMR process takes the scanned image as input and produces the answers in three steps:

1. Binarization
2. Search box coordinates
3. Calculate histogram



The histogram is used to read the answers. Every question number has 5 boxes (20x20 pixels each). A histogram is calculated for each box, and the box with the largest number of black (0) pixels is chosen as the answer.
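Because each box is already binarized, its histogram has only two bins, so choosing "the box with the most black pixels" reduces to counting zeros. A minimal sketch of this selection, assuming a hypothetical `Image` type, a hypothetical `Box` struct holding each box's top-left corner, and that the five box positions for a question are already known from step 2:

```cpp
#include <vector>

// Hypothetical binarized image: 0 = black, 255 = white.
using Image = std::vector<std::vector<int>>;

// Top-left corner of one answer box (hypothetical type).
struct Box { int x, y; };

// Histogram bin for black: counts the 0-valued pixels inside a
// size x size box whose top-left corner is (b.x, b.y).
int countBlack(const Image& img, Box b, int size = 20) {
    int count = 0;
    for (int r = b.y; r < b.y + size; ++r)
        for (int c = b.x; c < b.x + size; ++c)
            if (img[r][c] == 0) ++count;
    return count;
}

// Returns the index (0-4) of the box with the most black pixels,
// i.e. the option that was marked for this question.
int readAnswer(const Image& img, const std::vector<Box>& boxes, int size = 20) {
    int best = 0, bestCount = -1;
    for (int i = 0; i < static_cast<int>(boxes.size()); ++i) {
        int n = countBlack(img, boxes[i], size);
        if (n > bestCount) { bestCount = n; best = i; }
    }
    return best;
}
```

`readAnswer` returns an index from 0 to 4; mapping it to an option label is left to the caller.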

4.2 Optical Character Recognition (OCR) 

The OCR process starts from the inverted image and reads the name box in six steps:

1. Find coordinates of crosses
2. Extract the subset of the image located within the crosses
3. Name box extraction
4. Frame/border elimination
5. Normalization
6. Database matching


1. Find coordinates of crosses 



To find the coordinates of the crosses, we take a 50 x 50 pixel subset of the image at every corner and then use histograms to find the center point of each cross.
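One way to read "use histograms to find the center point" is with projection histograms: within the corner window, the column containing the vertical bar of the cross has the largest column sum, and the row containing the horizontal bar has the largest row sum. The sketch below assumes an inverted binary image (ink = 255), a hypothetical `Image` type and a simplified `Point`; it is an illustration, not the project's implementation.

```cpp
#include <vector>

// Hypothetical inverted binary image: ink = 255, background = 0.
using Image = std::vector<std::vector<int>>;

struct Point { int x, y; };

// Finds the centre of a cross inside the win x win window whose
// top-left corner is (x0, y0), using column and row projections.
Point findCrossCenter(const Image& img, int x0, int y0, int win = 50) {
    std::vector<long> colSum(win, 0), rowSum(win, 0);
    for (int r = 0; r < win; ++r)
        for (int c = 0; c < win; ++c) {
            int v = img[y0 + r][x0 + c];
            colSum[c] += v;  // vertical projection
            rowSum[r] += v;  // horizontal projection
        }
    // The peak column is the vertical bar, the peak row the horizontal bar.
    int bestC = 0, bestR = 0;
    for (int i = 1; i < win; ++i) {
        if (colSum[i] > colSum[bestC]) bestC = i;
        if (rowSum[i] > rowSum[bestR]) bestR = i;
    }
    return {x0 + bestC, y0 + bestR};
}
```

For a W x H form image, the four windows would start at (0, 0), (W - 50, 0), (0, H - 50) and (W - 50, H - 50). If the cross strokes are several pixels thick, this returns the first of the tied peak columns or rows; taking the middle of the peak run would be more precise.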

2. Extract the subset of the image located within the crosses

After getting the coordinates of each cross, we use them to extract the part of the image enclosed by the crosses.
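A minimal sketch of this step, assuming the four cross centers are already known and the form is not skewed (skew is not handled, as noted in the Evaluation section). The `Image` type and the function name are hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical image type: rows of pixel values.
using Image = std::vector<std::vector<int>>;

struct Point { int x, y; };

// Crops the region spanned by the cross centres: from the smallest to
// the largest x and y among them, both ends inclusive.
Image cropBetweenCrosses(const Image& img, const std::vector<Point>& crosses) {
    int left = crosses[0].x, right = crosses[0].x;
    int top = crosses[0].y, bottom = crosses[0].y;
    for (const Point& p : crosses) {
        left = std::min(left, p.x);
        right = std::max(right, p.x);
        top = std::min(top, p.y);
        bottom = std::max(bottom, p.y);
    }
    Image out;
    for (int r = top; r <= bottom; ++r)
        out.emplace_back(img[r].begin() + left, img[r].begin() + right + 1);
    return out;
}
```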

3. Name box extraction 

The coordinates of the name box can be calculated from the pre-defined (left-top) coordinates. Once we have these coordinates, we extract the box at that position.
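As an illustration, assume the pre-defined name-box position is stored as an offset and size relative to the top-left cross (the origin of the extracted region). The hypothetical helper below converts it to absolute image coordinates and clips it to the image border; the offsets used in the example are made-up values, not the form's real layout.

```cpp
#include <algorithm>

struct Point { int x, y; };
struct Rectangle { int x, y, width, height; };

// Converts a name-box position defined relative to the top-left cross
// (the pre-defined coordinates) into absolute image coordinates and
// clips it so that it never extends past the image border.
Rectangle nameBoxRect(Point topLeftCross, Rectangle predefined,
                      int imgWidth, int imgHeight) {
    int x = std::max(0, topLeftCross.x + predefined.x);
    int y = std::max(0, topLeftCross.y + predefined.y);
    int right = std::min(imgWidth, topLeftCross.x + predefined.x + predefined.width);
    int bottom = std::min(imgHeight, topLeftCross.y + predefined.y + predefined.height);
    return {x, y, std::max(0, right - x), std::max(0, bottom - y)};
}
```

For example, with the top-left cross at (30, 40) and a pre-defined box at offset (20, 10) of size 300 x 40, the name box starts at (50, 50) and keeps its 300 x 40 size as long as it fits inside the image.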

4. Frame Border Elimination 



To keep the name box free from noise, we need to eliminate any part of the frame border that may remain. This elimination is done using connected component extraction.
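One common way to apply connected component extraction here is to remove every ink component that touches the edge of the cropped name box, assuming the crop is taken so that any leftover frame lies on that edge. The sketch below assumes an inverted binary image (ink = 255), 8-connectivity and a hypothetical `Image` type. It also illustrates why letters connected to the frame cannot be read properly (see the Evaluation section): such a letter belongs to the same component as the frame and is erased with it.

```cpp
#include <queue>
#include <utility>
#include <vector>

// Hypothetical inverted binary image: ink = 255, background = 0.
using Image = std::vector<std::vector<int>>;

// Removes every 8-connected ink component that touches the image edge.
void removeBorderComponents(Image& img) {
    const int h = static_cast<int>(img.size());
    const int w = h ? static_cast<int>(img[0].size()) : 0;
    std::queue<std::pair<int, int>> q;
    // Seed the flood fill with every ink pixel on the outer edge.
    auto seed = [&](int r, int c) {
        if (img[r][c] != 0) { img[r][c] = 0; q.push({r, c}); }
    };
    for (int c = 0; c < w; ++c) { seed(0, c); seed(h - 1, c); }
    for (int r = 0; r < h; ++r) { seed(r, 0); seed(r, w - 1); }
    // Breadth-first search: erase all ink connected to the seeds.
    while (!q.empty()) {
        auto [r, c] = q.front();
        q.pop();
        for (int dr = -1; dr <= 1; ++dr)
            for (int dc = -1; dc <= 1; ++dc) {
                int nr = r + dr, nc = c + dc;
                if (nr >= 0 && nr < h && nc >= 0 && nc < w && img[nr][nc] != 0) {
                    img[nr][nc] = 0;
                    q.push({nr, nc});
                }
            }
    }
}
```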

5. Normalization 

Normalization is done in two steps:

1. Extract the letter from the box 

To extract the letter, we assume that there is no noise on the paper. The method calculates the histogram vertically and horizontally and searches for the boundary. Once we have the boundary, we can get the coordinates of each corner, after which the letter can be extracted.
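The boundary search can be sketched with projection histograms: the first and last rows (and columns) with a nonzero sum bound the letter. Like the report, the sketch assumes a noise-free box; the `Image` and `Rectangle` types are simplified stand-ins, and the function name is hypothetical.

```cpp
#include <vector>

// Hypothetical inverted binary image: ink = 255, background = 0.
using Image = std::vector<std::vector<int>>;
struct Rectangle { int x, y, width, height; };

// Bounding box of all ink, found from the horizontal and vertical
// projection histograms. Returns a zero-size rectangle for an empty box.
Rectangle letterBounds(const Image& img) {
    const int h = static_cast<int>(img.size());
    const int w = h ? static_cast<int>(img[0].size()) : 0;
    std::vector<long> rowSum(h, 0), colSum(w, 0);
    for (int r = 0; r < h; ++r)
        for (int c = 0; c < w; ++c) {
            rowSum[r] += img[r][c];
            colSum[c] += img[r][c];
        }
    int top = 0, bottom = h - 1, left = 0, right = w - 1;
    while (top < h && rowSum[top] == 0) ++top;
    if (top == h) return {0, 0, 0, 0};  // no ink at all
    // Ink exists, so the remaining scans are guaranteed to stop.
    while (rowSum[bottom] == 0) --bottom;
    while (colSum[left] == 0) ++left;
    while (colSum[right] == 0) --right;
    return {left, top, right - left + 1, bottom - top + 1};
}
```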

From the boundary, we can get:

Left Top Coordinate: (10,15)
Right Top Coordinate: (100,15)
Left Bottom Coordinate: (10,150)
Right Bottom Coordinate: (100,150)

2. Scaling

Because each box is fitted to the size of its letter, the boxes differ in size, which makes matching difficult. For this reason, each extracted letter is scaled to a common size before matching. Scaling is done using nearest-neighbor interpolation.

6. Matching

Matching is done using the K-Nearest Neighbors algorithm: each letter is compared with the database by calculating the pixel-wise distance, and the closest entry is chosen as the answer.
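The scaling and matching steps can be sketched together. This minimal example assumes a hypothetical `Image` type, a database of `Sample`s (a letter plus its template image, already scaled to the common size), squared Euclidean distance as the pixel distance, and k = 1, which corresponds to "choose the closest one". All names are hypothetical, and the index mapping in `scaleNearest` uses truncation, one of several common nearest-neighbor conventions.

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical image type: rows of pixel values (assumed non-empty).
using Image = std::vector<std::vector<int>>;

// One database entry: a letter and its template image (hypothetical).
struct Sample { char letter; Image pixels; };

// Nearest-neighbour scaling: each output pixel copies the source pixel
// at the proportionally scaled (truncated) position.
Image scaleNearest(const Image& src, int outW, int outH) {
    const int inH = static_cast<int>(src.size());
    const int inW = static_cast<int>(src[0].size());
    Image out(outH, std::vector<int>(outW));
    for (int r = 0; r < outH; ++r)
        for (int c = 0; c < outW; ++c)
            out[r][c] = src[r * inH / outH][c * inW / outW];
    return out;
}

// Squared Euclidean distance between two images of the same size.
long pixelDistance(const Image& a, const Image& b) {
    long d = 0;
    for (std::size_t r = 0; r < a.size(); ++r)
        for (std::size_t c = 0; c < a[r].size(); ++c) {
            long diff = a[r][c] - b[r][c];
            d += diff * diff;
        }
    return d;
}

// 1-nearest-neighbour matching: returns the letter of the closest sample.
char matchLetter(const Image& letter, const std::vector<Sample>& db) {
    char best = '?';
    long bestDist = std::numeric_limits<long>::max();
    for (const Sample& s : db) {
        long d = pixelDistance(letter, s.pixels);
        if (d < bestDist) { bestDist = d; best = s.letter; }
    }
    return best;
}
```

With k = 1 no voting is needed; a larger k would require several samples per letter and a majority vote among the k closest.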

6

4.3 GUI 

• Created using GTK+.
  o GTK+ is a highly usable, feature-rich toolkit for creating graphical user interfaces, which boasts cross-platform compatibility and an easy-to-use API. GTK+ is written in C, but has bindings to many other popular programming languages such as C++, Python and C#, among others.
• How to use it:
  1. File – Open (Answer Sheet)
     o Choose a PGM file (the scanned form of an answer sheet).
  2. File – Load (Answer File)
     o Choose a txt file containing the correct answers, in the format [number],[answer].
  3. Click the “scan” button.
  4. The right panel displays:
     o Score (calculated by comparing the scanned answers with the correct answers)
     o Name (read by the OCR program)
     o Scanned answers (read by the OMR program)
     o Correct answers (loaded from the txt file)
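Reading the answer file and computing the score could look like the sketch below. It assumes one `[number],[answer]` pair per line with a single-character answer, and defines the score as the number of matching answers; the function names and the `std::map` representation are hypothetical. The comparison is case-insensitive, as recommended in the Evaluation section.

```cpp
#include <cctype>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Reads "[number],[answer]" lines such as "1,a" into a map.
// Lines that do not match the format are skipped.
std::map<int, char> readAnswers(std::istream& in) {
    std::map<int, char> answers;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream ss(line);
        int number;
        char comma, answer;
        if (ss >> number >> comma >> answer && comma == ',')
            answers[number] = answer;
    }
    return answers;
}

// Score = number of questions whose scanned answer equals the correct
// one, compared case-insensitively.
int score(const std::map<int, char>& scanned, const std::map<int, char>& correct) {
    int s = 0;
    for (const auto& [number, answer] : correct) {
        auto it = scanned.find(number);
        if (it != scanned.end() &&
            std::tolower(static_cast<unsigned char>(it->second)) ==
                std::tolower(static_cast<unsigned char>(answer)))
            ++s;
    }
    return s;
}
```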


5. Evaluation 

• Accepts PGM files only.
• Does not handle skewed images yet.
• Tested only on answer sheets filled in by computer (not yet on sheets filled in by hand).
• OCR:
  o Does not handle noise (the image is assumed to be noise-free).
  o Letters connected to the frame cannot be read properly.
• No filter during file selection.
  o The user can select any type of file, although the program can only process PGM files correctly.
• Capital letters are not supported yet.
  o The C program should use case-insensitive character comparison.
• The graphical user interface was designed mainly to demonstrate the program.
  o It does not consider ease of use or other aesthetic aspects.

6. Additional Information 


Important files: 



• lib/imglib.cpp -> Library for basic image processing tasks (generic and reusable).
• lib/imgoplib.cpp -> Collection of image filters (e.g., scaling, connected component extraction, dilation).
• lib/formlib.cpp -> Library for form reader functionality (specific to the application).
• form-reader-cli.cpp -> Main file for the CLI version of the application (for testing only).
• form-reader-gui.cpp -> Main file for the GUI version of the application.

Important classes: 

• GrayscaleImage -> Represents a grayscale image (can be constructed manually or read from a PGM file).
• GrayscaleImageOp -> Abstract class for operations on a GrayscaleImage. Direct subclasses: DilationOp, ConnectedCompOp, ScaleOp.
• Point -> Represents a coordinate (x, y).
• Dimension -> Represents a size, consisting of width and height.
• Rectangle -> Represents a region in the image (x, y, width, height).
• FormReader -> The core class of the form reader application; contains the form reader functionality.
• OCR -> Carries out all the OCR tasks. This class is used within the FormReader class.
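The operation hierarchy (an abstract `GrayscaleImageOp` with subclasses such as `DilationOp`) can be sketched as below. Only the class names come from this report; the members (`apply`, `get`, `set`, the pixel storage) are hypothetical, and the dilation uses a 3x3 square structuring element on binary (0/255) images as an assumption.

```cpp
#include <cstddef>
#include <vector>

// Minimal stand-in for GrayscaleImage; all members are hypothetical.
class GrayscaleImage {
public:
    GrayscaleImage(int width, int height, int fill = 0)
        : w_(width), h_(height), px_(static_cast<std::size_t>(width) * height, fill) {}
    int width() const { return w_; }
    int height() const { return h_; }
    int get(int x, int y) const { return px_[y * w_ + x]; }
    void set(int x, int y, int v) { px_[y * w_ + x] = v; }
private:
    int w_, h_;
    std::vector<int> px_;  // row-major pixel storage
};

// Abstract operation: each subclass turns one image into another.
class GrayscaleImageOp {
public:
    virtual ~GrayscaleImageOp() = default;
    virtual GrayscaleImage apply(const GrayscaleImage& src) const = 0;
};

// Binary dilation with a 3x3 square: a pixel becomes ink (255) if it
// or any of its 8 neighbours is ink in the source image.
class DilationOp : public GrayscaleImageOp {
public:
    GrayscaleImage apply(const GrayscaleImage& src) const override {
        GrayscaleImage out(src.width(), src.height());
        for (int y = 0; y < src.height(); ++y)
            for (int x = 0; x < src.width(); ++x) {
                int v = 0;
                for (int dy = -1; dy <= 1; ++dy)
                    for (int dx = -1; dx <= 1; ++dx) {
                        int nx = x + dx, ny = y + dy;
                        if (nx >= 0 && nx < src.width() && ny >= 0 &&
                            ny < src.height() && src.get(nx, ny) != 0)
                            v = 255;
                    }
                out.set(x, y, v);
            }
        return out;
    }
};
```

Callers can hold any operation through a `const GrayscaleImageOp&`, so new filters only need to override `apply`.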

