
VISVESVARAYA TECHNOLOGICAL UNIVERSITY
Belgaum, Karnataka-590 018

A project dissertation
on
OCR Based Mapless Navigation Method of Robot
for the award of the degree of
MASTER OF TECHNOLOGY
in
INDUSTRIAL AUTOMATION AND ROBOTICS

by
Mr. SAYYAN N. SHAIKH
(USN: 4SN11MAR10)

Under the guidance of
Dr. NEELAKANTHA V. LONDHE, Professor, Dept. of Mechanical Engg., Srinivas Institute of Technology, Valachil, Mangalore-574 143
Mr. BASAVARAJ, Project Guide, BCS Innovations, Bangalore-560 054

DEPARTMENT OF MECHANICAL ENGINEERING
SRINIVAS INSTITUTE OF TECHNOLOGY
Mangalore, Karnataka, India-574 143
2012-2013


Srinivas Institute of Technology
Mangalore, Karnataka, India-574 143
(Affiliated to Visvesvaraya Technological University, Belgaum)
Department of Mechanical Engineering

CERTIFICATE

This is to certify that the report entitled "OCR Based Mapless Navigation Method of Robot" is a bonafide document of project work carried out by Mr. SAYYAN N. SHAIKH, bearing USN 4SN11MAR10, submitted in partial fulfillment for the award of Master of Technology in Industrial Automation and Robotics of the Visvesvaraya Technological University, Belgaum, during the year 2012-2013. It is certified that all corrections/suggestions indicated have been incorporated in the dissertation report deposited in the department library. The dissertation report has been approved, as it satisfies the academic requirements in respect of project work regulations prescribed for the said Degree.

Signature of Guide: Dr. Neelakantha V. Londhe
Signature of HOD: Dr. Thomas Pinto
Signature of Principal: Dr. Shrinivasa Mayya D

Name of the Examiners:                        Signature with date
1. _____________________________              -------------------------------------
2. _____________________________              -------------------------------------



DECLARATION

I, Sayyan N. Shaikh, bearing USN 4SN11MAR10, a student of M.Tech in the Department of Mechanical Engineering, Srinivas Institute of Technology, Mangalore, hereby declare that the project work entitled "OCR Based Mapless Navigation Method of Robot" embodies the report of my project work carried out under the guidance of Dr. Neelakantha V. Londhe, Professor, Department of Mechanical Engineering, Srinivas Institute of Technology, Mangalore, and Mr. Basavaraj C.H, Co-guide at BCS Innovations, Bangalore. This project has been submitted in partial fulfillment of the requirements for the award of Master of Technology in Industrial Automation and Robotics by the Visvesvaraya Technological University, Belgaum.

The work contained in this thesis has not been submitted in part or in full to any other university, institution or professional body for the award of any other degree, diploma or fellowship.

Date:                Sayyan N. Shaikh
Place: Mangalore     USN: 4SN11MAR10



ACKNOWLEDGEMENT

This project work was completed successfully with the help and guidance I received. The elation and gratification of this project would be incomplete without mentioning the people who helped to make it possible, and whose encouragement and support are valuable to me.

At the outset I would like to express my gratitude to my guide Dr. Neelakantha V. Londhe, Professor, Department of Mechanical Engineering, Srinivas Institute of Technology, Mangalore, for his guidance, encouragement and support towards the successful completion of this project work.

I offer my deepest thanks to Mr. Basavaraj C.H, Co-guide at BCS Innovations, for being such a patient and understanding project guardian. His foresight, intuition and care were instrumental in shaping this work. I want to thank him for his role as both a teacher and an advisor. He taught me how to dig deeper and provided me with the guidance needed to get started.

I am highly indebted to Dr. Thomas Pinto, Professor and HOD, Department of Mechanical Engineering, Srinivas Institute of Technology, Mangalore, for his excellent guidance, encouragement and support throughout the course. I consider it an honour to have worked under him.

I am immensely thankful to, and would like to express my deep sense of gratitude to, Mr. C. G. Ramachandra, Associate Professor, Department of Mechanical Engineering, Srinivas Institute of Technology, Mangalore, who has been a great source of inspiration, and I thank him for his help and encouragement.

My sincere thanks to Dr. Srinivasa Mayya D, Principal, Srinivas Institute of Technology, Mangalore, for providing me the necessary facilities and encouragement to carry out the project successfully.

I would like to thank our Management, the 'A. Shama Rao Foundation', for their cooperation and inspiration during my course.

I would also like to thank the Almighty, for always being there for me and guiding me to work on the right path of life. My greatest thanks are to my parents, who bestowed the ability and strength in me to complete this work.

To all my friends, thank you for your understanding and encouragement in my moments of crisis. Your friendship makes my life a wonderful experience. I cannot list all the names here, but you are always in my mind.

Finally, I would like to thank all my well-wishers who helped me directly and indirectly throughout this project.


(Sayyan N. Shaikh)


ABSTRACT

In the proposed method, a robot locates and tracks landmarks via a colour-based region segmentation algorithm within a proper distance. It extracts text or signs from the landmark regions and inputs them into the OCR engine for recognition. Simultaneously, a projection analysis of the signs or text on the landmark is conducted, and the semanteme of the arrows or text is identified. Finally, by combining the semanteme of the arrows and text extracted from the landmark, the robot can find the routes to its destinations automatically.

The hardware implementation of the system includes a navigator robot carrying an Android phone; the phone receives a signal through Bluetooth whenever the robot comes across an RF card in its path. On receiving this signal, the phone invokes its camera to capture a snapshot of the landmark and sends the image to the server through the internet. The server processes the received image through the OCR mechanism to find the actual meaning of the symbol or text, such as petrol pump, restaurant, left turn, right turn, etc. With OCR (Optical Character Recognition) it is shown how the text detection and recognition system, combined with several other ingredients, allows a robot to recognize named locations specified by a user. This meaning is returned to the phone, which speaks it aloud through its 'Text to Speech' synthesizer and transfers the signal to the navigator through Bluetooth for the next move.

The objective of this robot system is to show that this method can accomplish the mapless navigation task reliably in indoor or customised environments, navigating with real-life landmarks directly without generating maps or setting up new navigation signs for robots. In particular, this method can be applied rapidly in the field of service robots, which can enhance their adaptability and viability significantly.



CONTENTS

CHAPTER   DESCRIPTION                                                 PAGE NO.

          Acknowledgements.............................................. v
          Abstract...................................................... vi
          List of figures............................................... x
          List of tables................................................ xii

1   INTRODUCTION
    1.1 Introduction to Landmark Navigation............................. 1
    1.2 Problem Statement............................................... 3
    1.3 Existing System................................................. 3
    1.4 Proposed System................................................. 3
    1.5 Objectives of the Project....................................... 5
    1.6 Scope of the Project............................................ 5
    1.7 Optical Character Recognition................................... 5
        1.7.1 What Is OCR?.............................................. 5
        1.7.2 History of OCR............................................ 6

2   LITERATURE REVIEW AND SURVEY
    2.1 Research Paper Review........................................... 8
    2.2 Java............................................................ 10
        2.2.1 Features of Java.......................................... 10
    2.3 Apache Tomcat................................................... 11
        2.3.1 Benefits of Tomcat Server................................. 12
    2.4 Android......................................................... 12
        2.4.1 Android Versions.......................................... 13
        2.4.2 Features of Android....................................... 13
    2.5 Embedded C...................................................... 14
        2.5.1 Features of Embedded C.................................... 14
    2.6 Outcome of the Literature Survey................................ 15

3   DESIGN ANALYSIS AND METHODOLOGY
    3.1 Data Flow Diagram............................................... 16
    3.2 Flow Chart...................................................... 17

4   IMAGE EXTRACTION ALGORITHM
    4.1 Landmark Image Extraction....................................... 18
        4.1.1 Binarization.............................................. 18
        4.1.2 Smearing.................................................. 18
    4.2 Landmark Location Finder........................................ 19
    4.3 Segmentation.................................................... 19
        4.3.1 Filtering................................................. 19
        4.3.2 Dilation.................................................. 20
        4.3.3 Individual Image Separation............................... 20
        4.3.4 Normalization............................................. 20
    4.4 Template Matching............................................... 21

5   IMAGE RECOGNITION USING KOHONEN
    5.1 Introduction to the Network..................................... 22
    5.2 General Image Recognition Procedure............................. 22
    5.3 Image Recognition Procedures with Kohonen....................... 23
    5.4 Data Collection................................................. 24
    5.5 Image Pre-processing............................................ 25
        5.5.1 RGB to Grayscale Image Conversion......................... 26
        5.5.2 Grayscale to Binary Image Conversion...................... 26
    5.6 Feature Extraction.............................................. 27
        5.6.1 Pixel Grabbing From Image................................. 28
        5.6.2 Finding Probability of Making Square...................... 28
        5.6.3 Mapped To Sampled Area.................................... 29
        5.6.4 Creating Vector........................................... 30
        5.6.5 Representing Character with a Model Number................ 30
    5.7 Kohonen Neural Network.......................................... 31
        5.7.1 Introduction to Kohonen Network........................... 31
        5.7.2 The Structure of Kohonen Network.......................... 32
        5.7.3 Sample Input to Kohonen Network........................... 33
        5.7.4 Normalizing the Input..................................... 33
        5.7.5 Calculating Each Neuron's Output.......................... 34
        5.7.6 Mapping to Bipolar........................................ 34
        5.7.7 Choosing a Winner......................................... 35
        5.7.8 Kohonen Network Learning Procedure........................ 35
        5.7.9 Learning Algorithm Flowchart.............................. 36
        5.7.10 Learning Rate............................................ 36
        5.7.11 Adjusting Weight......................................... 37
        5.7.12 Calculating the Errors................................... 38
        5.7.13 Recognition with Kohonen Network......................... 38

6   HARDWARE DESCRIPTION AND IMPLEMENTATION
    6.1 Circuit Diagram................................................. 39
        6.1.1 RFID Reader............................................... 39
        6.1.2 Bluetooth Module.......................................... 41
        6.1.3 PIC16F873A Microcontroller................................ 42
        6.1.4 ULN 2003.................................................. 43
        6.1.5 Relay..................................................... 44
        6.1.6 DC Motor.................................................. 46

7   OVERALL DESCRIPTION & IMPLEMENTATION
    7.1 System Perspective.............................................. 48
    7.2 Operating Environment........................................... 48
    7.3 Design and Implementation Constraints........................... 48
    7.4 User Documentation.............................................. 49
    7.5 Hardware Interfaces............................................. 49
    7.6 Software Interfaces............................................. 49
    7.7 Software Requirements........................................... 49
    7.8 PC Requirements................................................. 50
    7.9 How to Execute.................................................. 50

8   RESULTS AND DISCUSSION
    8.1 Experimental Results............................................ 55
    8.2 Accuracy Rates.................................................. 55
    8.3 Drawbacks....................................................... 55

9   CONCLUSION
    9.1 Conclusion...................................................... 56

10  SCOPE FOR FUTURE WORK
    10.1 Future Work.................................................... 57

    REFERENCES.......................................................... 58
    Appendix A: Document Conventions.................................... 60


LIST OF FIGURES

FIGURE NO.   DESCRIPTION                                              PAGE NO.

1.1   Landmarks in Our Daily Life...........................................1
1.2   Proposed System.......................................................4
3.1   DFD for the Proposed System..........................................16
3.2   Flow Chart Model of the Proposed System..............................17
4.1   Captured Image
4.2   Binarized Image......................................................18
4.3   Landmark Involving Text..............................................19
4.4   Landmark Text Region.................................................19
4.5   Separated Text Images After Dilation Process.........................20
4.6   Individual Image Cut.................................................20
4.7   Image after Normalization............................................20
4.8   Database Template Images.............................................21
5.1   General Character Recognition Procedure..............................23
5.2   Image Recognition Procedure Using Kohonen Network....................24
5.3   Cropped Input Image from Landmark Region.............................25
5.4   Captured Input Image.................................................25
5.5   Computer Image.......................................................25
5.6   RGB Image............................................................26
5.7   Grayscale Image......................................................26
5.8   Grayscale Image with Histogram.......................................27
5.9   Binary Image with Histogram..........................................27
5.10  Sampled Image........................................................29
5.11  Vector Representation................................................30
5.12  Kohonen Neural Network...............................................32
5.13  Simple Kohonen Network with 2 I/p and 2 O/p Neurons..................32
5.14  Flow Chart Model for Learning Algorithm..............................36
6.1   Circuit Diagram of the Proposed System...............................39
6.2   RFID Reader..........................................................40
6.3   Block Diagram of LF DT125R Module....................................41
6.4   Bluetooth SMD Module - RN-42.........................................41
6.5   Pin Configuration of PIC16F873A......................................42
6.6   ULN 2003.............................................................43
6.7   Pin Configuration of ULN 2003........................................44
6.8   Relay................................................................44
6.9   Relay Control Circuit................................................45
6.10  Relay Energized (ON).................................................45
6.11  Relay De-Energized (OFF).............................................45
6.12  Relay Operation......................................................46
6.13  Working of DC Motor..................................................46
6.14  Hardware Implementation of Project...................................47
7.1   Snapshot of the Apache Tomcat Server.................................50
7.2   Snapshot of the Apache Tomcat Server under Execution.................51
7.3   Snapshot of the Apache Tomcat Server under Execution.................52
7.4   Snapshot of OCR Application waiting for the Input Data...............53
7.5   Snapshot of OCR Application Reading the Data.........................54


LIST OF TABLES

TABLE NO.   DESCRIPTION                                               PAGE NO.

2.1   A Brief History of Android Versions..................................13
5.1   Binary Converted Grid Values.........................................28
5.2   Aligned Marked Values................................................29
5.3   Sample Inputs to a Kohonen Neural Network............................33
5.4   Connection Weights in the Sample Kohonen Network.....................33
8.1   Accuracy Rates of the System.........................................55


Chapter 1

INTRODUCTION

This chapter gives a brief description of the OCR-based mapless navigation method of a robot. It also covers the need for this project, the existing system, and a brief introduction to the project, its objectives and its scope.

1.1 Introduction to Landmark Navigation

There are numerous landmarks in the environment we live in (Figure 1.1). They are set up to help people travel, and they often contain various text and direction signs. With these signs, people can locate their positions and find their destinations easily. Even in an utterly strange place, people can find their way with the landmark system. For example, in a public place like an airport or a station, people can reach their destinations successfully just by following the right landmarks, without being familiar with that place. Compared with robots, humans first label the environment with landmarks and then read those labels directly during the navigation process. As a result, humans are less dependent on maps. In other words, the human navigation system is partially transferred into the environment instead of being completely endogenous.

Fig. 1.1 Landmarks Used in Our Daily Life

A number of potential markets are slowly emerging for mobile robotic systems. Entertainment applications and household or office assistants are the primary targets in this area of development. These types of robots are designed to move around within an often highly unstructured and unpredictable environment. Existing and future applications for these types of autonomous systems have one key problem in common: navigation.


Vision is one of the most powerful and popular sensing methods used for autonomous navigation. Compared with other on-board sensing techniques, vision-based approaches to navigation continue to attract a great deal of attention from the mobile robot research community, due to their ability to provide detailed information about the environment which may not be available using combinations of other types of sensors. The past decade has seen the rapid development of vision-based sensing for indoor navigation tasks. For example, 20 years ago it would have been impossible for an indoor mobile robot to find its way in a cluttered hallway, and even now it remains a challenge; vision-based indoor navigation for mobile robots is still an open research area. Autonomous robots operating in an unknown and uncertain environment have to cope with dynamic changes in that environment, and for a robot to navigate successfully to its goals while avoiding both static and dynamic obstacles is a major challenge. Most current techniques are based on complex mathematical equations and models of the working environment; however, following a predetermined path may not require a complicated solution, and the proposed methodology should be more robust. [1]

Basically, vision-based navigation falls into three main groups depending on the localization method.

Map Based Navigation: This consists of providing the robot with a model of the environment. These models may contain different degrees of detail, varying from a complete CAD model of the environment to a simple graph of interconnections or interrelationships between the elements in the environment.

Map Building Based Navigation: In this approach a 2D or 3D model of the environment is first constructed by the robot using its on-board sensors, after which the model is used for navigation in the environment.

Mapless Navigation: This category contains all systems in which navigation is achieved without any prior description of the environment. The required robot motions are determined by observing and extracting relevant information about the elements in the environment, such as walls, desks, doorways, etc. It is not necessary to know the absolute positions of these objects for further navigation to be carried out.

1.2 Problem Statement

Traditional robot navigation techniques are mostly map-based methods, or mapless methods based on the VSRR (View Sequenced Route Representation) model. The map-based method needs to build the map first, which can be done either by a human or by the robot itself. The VSRR-based approach requires the robot to roam around the scene, extract and save features at each position, and localize itself by matching during navigation. Both approaches scale poorly to wider scenes and therefore cannot be applied in an unfamiliar environment. GPS is another widely used navigation tool, but it fails to function in indoor environments. Although we can label the environment for robots (e.g. with RFID tags) just as we set landmarks for ourselves, this is not efficient in terms of either time or money. To solve this problem, we propose a method that uses the landmark system intended for humans directly to navigate robots.

1.3 Existing System

Service robots need to have maps that support their tasks. Traditional robot mapping solutions are well suited to supporting navigation and obstacle avoidance tasks by representing occupancy information. However, it can be difficult to enable higher-level understanding of the world's structure using occupancy-based mapping solutions. One of the most important competencies for a service robot is to be able to accept commands from a human user. Many such commands will include instructions that reference objects, structures, or places, so a new mapping system should be designed with this in mind.

1.4 Proposed System

The proposed system allows a robot to discover a path automatically by detecting and reading textual information or signs located on a landmark using OCR. The text-extraction component developed in this method has been shown to be valuable on mobile robots. In particular, this system allows the robot to identify named locations or landmarks placed at the side of a road with high reliability, allowing it to satisfy requests from a user that refer to these places by name. Note, however, that OCR (Optical Character Recognition) is, as of now, an inexact science, and flawless transcription cannot be expected in all cases.



Fig. 1.2 Proposed System


OCR-based mapless navigation is an essential prerequisite for successful service robots. In contexts such as homes and offices, landmarks are placed at the side of the path and places are often identified by text or signs posted throughout the environment; using OCR, textual data can be extracted from the image of the landmark and used to navigate the robot. Landmarks such as signs make labeling particularly easy, as the appropriate label can be read directly from the landmark using Optical Character Recognition (OCR), without the need for human assistance.

The navigation hardware is connected to the Android phone through the Bluetooth module for the transfer of data for its ordered movement. The Android phone, in turn, is connected to the server through the internet (GPRS) via a socket connection to the server's IP address, for transferring the image to the server and later receiving the interpreted information conveyed by the image after its processing by the OCR module. The received data is spoken by the 'Text to Speech' module on the phone for the human interface, and the related byte code is sent to the robot, based upon which the robot navigates along the path.

Optical Character Recognition (OCR), also referred to as an Optical Image Reader, is a system that provides full alphanumeric recognition of printed or handwritten characters at electronic speed by simply scanning the form. Forms can be scanned through a scanner, and the recognition engine of the OCR system then interprets the images and turns images of handwritten or printed characters into ASCII data (machine-readable text).

This technology provides a complete form processing and document capture solution. The basic programming language used in the development of this project is Java, on the Android platform.

The Java APIs (application programming interfaces) used include Bluetooth, Android Text-to-Speech, and Sockets.
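To make this data flow concrete, the following is a minimal, illustrative Java sketch of the image-upload step between the phone and the OCR server. The host address, port and wire format (a length-prefixed JPEG answered by a single text label) are assumptions made only for illustration; the report does not fix these details, and the Bluetooth trigger and the Text-to-Speech call are omitted here.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.Socket;
import java.nio.file.Files;
import java.nio.file.Paths;

/**
 * Illustrative client-side sketch of the image upload step:
 * send one captured landmark image to the OCR server and
 * read back the recognized label (e.g. "LEFT TURN").
 * Host, port and framing are assumptions, not the report's specification.
 */
public class LandmarkUploadSketch {

    public static String sendImageForOcr(String host, int port, byte[] jpegBytes)
            throws IOException {
        try (Socket socket = new Socket(host, port);
             DataOutputStream out = new DataOutputStream(socket.getOutputStream());
             DataInputStream in = new DataInputStream(socket.getInputStream())) {

            // Length-prefixed frame so the server knows how many bytes to read.
            out.writeInt(jpegBytes.length);
            out.write(jpegBytes);
            out.flush();

            // The server replies with a single UTF string: the interpreted meaning.
            return in.readUTF();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] image = Files.readAllBytes(Paths.get("landmark.jpg")); // captured snapshot
        String meaning = sendImageForOcr("192.168.1.10", 5000, image);
        System.out.println("Server says: " + meaning);
        // On the phone this string would be passed to the Text-to-Speech engine
        // and the matching movement byte sent to the robot over Bluetooth.
    }
}
```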

1.5 Objectives of the Project

The main objective of the project is to locate and track landmarks via a colour-based region segmentation algorithm. Then, at a proper distance, the landmark region is found by colour and shape, text or signs are extracted from the landmark region, and these are input into the OCR engine. Simultaneously, a projection analysis of the signs on the landmark is conducted, identifying the semanteme of the signs. Finally, the semanteme of the signs and the text extracted from the landmark are combined to control the robot to march forward while tracking this region. All these procedures are repeated until the robot reaches the destination.
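As a rough illustration of the colour-based region segmentation step named above, the sketch below thresholds an image for one assumed landmark colour and returns the bounding box of the matching pixels. The specific RGB rule and file name are placeholders; the project's actual segmentation procedure is described in Chapter 4.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

/** Illustrative colour-threshold segmentation: find the bounding box of
 *  "red enough" pixels, used here as a stand-in for the landmark colour. */
public class ColourSegmentationSketch {

    // Assumed rule: strongly red pixels mark the landmark board.
    static boolean isLandmarkColour(int rgb) {
        int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
        return r > 150 && g < 100 && b < 100;
    }

    static Rectangle findLandmarkRegion(BufferedImage img) {
        int minX = img.getWidth(), minY = img.getHeight(), maxX = -1, maxY = -1;
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                if (isLandmarkColour(img.getRGB(x, y))) {
                    minX = Math.min(minX, x); minY = Math.min(minY, y);
                    maxX = Math.max(maxX, x); maxY = Math.max(maxY, y);
                }
            }
        }
        if (maxX < 0) return null;  // no landmark-coloured pixels found
        return new Rectangle(minX, minY, maxX - minX + 1, maxY - minY + 1);
    }

    public static void main(String[] args) throws IOException {
        BufferedImage img = ImageIO.read(new File("landmark.jpg"));
        Rectangle region = findLandmarkRegion(img);
        System.out.println(region == null ? "No landmark found" : "Landmark region: " + region);
        // The cropped region would then be binarized and passed to the OCR stage.
    }
}
```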

1.6 Scope of the Project

Mapless navigation robots are slowly finding their way into outdoor and open environments, and are currently receiving increasing attention from both the scientific community and industry. These robots have many potential applications in routine or dangerous environmental conditions, such as operations in a nuclear plant, delivery of supplies in hospitals, and cleaning of offices, labs, parks, zoos and hotels as service robots.

Compared with traditional robot navigation methods, this method has the following advantages:

1. It is mapless.
2. Direct utilization of existing landmarks; no need to set up new ones.
3. Adaptability to unfamiliar environments; no need to roam first.

1.7 Optical Character Recognition

1.7.1 What Is OCR?

OCR is the acronym for Optical Character Recognition. This technology allows a machine to automatically recognize characters through an optical mechanism. Human beings recognize many objects in this manner; our eyes are the "optical mechanism." But while the brain "sees" the input, the ability to comprehend these signals varies with each person according to many factors. By reviewing these variables, we can understand the challenges faced by the technologists developing an OCR system.

First, if we read a page in a language other than our own, we may recognize the various characters but be unable to understand the words. However, on the same page, we are usually able to interpret numerical statements, since the symbols for numbers are used universally. This explains why many OCR systems recognize numbers only, while relatively few understand the full alphanumeric character range.

Second, there is similarity between many numerical and alphabetical symbol shapes. For example, while examining a string of characters combining letters and numbers, there is very little visible difference between a capital letter "O" and the numeral "0". As humans, we can re-read the sentence or the entire paragraph to help determine the accurate meaning. This procedure, however, is much more difficult for a machine.

Third, we rely on contrast to help us recognize characters. We may find it very difficult to read text which appears against a very dark background, or is printed over other words or graphics. Again, programming a system to interpret only the relevant data and disregard the rest is a difficult task for OCR engineers. There are many other problems which challenge the developers of OCR systems. In this report, we review the history, advancements, abilities and limitations of existing systems. This analysis should help determine whether OCR is the correct application for a company's needs, and if so, which type of system to implement.

1.7.2 History of OCR

Engineering attempts at automated recognition of printed characters started prior to World War II, but it was not until the early 1950s that a commercial venture was identified that justified the necessary funding for research and development of the technology. The banking industry challenged all the major equipment manufacturers to come up with a "common language" to automatically process checks: after the war, check processing had become the single largest paper processing application in the world. Although the banking industry eventually chose Magnetic Ink Character Recognition (MICR), some vendors had proposed the use of an optical recognition technology. However, OCR was still in its infancy at the time and did not perform as acceptably as MICR. The advantage of MICR was that it is relatively impervious to change, fraudulent alteration and interference from non-MICR inks.


The "eye'' <strong>of</strong> early <strong>OCR</strong> equipment utilized lights, mirrors, fixed slits for the<br />

reflected light to pass through, and a moving disk with additional slits. The reflected<br />

image was broken into discrete bits <strong>of</strong> black and white data, presented in a photo-<br />

multiplier tube, and converted to electronic bits. The "brain's" logic required the presence<br />

or absence <strong>of</strong> "black'' or "white" data bits at prescribed intervals. This allowed it to<br />

recognize a very limited, specially designed image set. To accomplish this, the units<br />

required sophisticated transports for documents to be processed. The documents were<br />

required to run at a consistent speed and the printed data had to occur in a fixed location<br />

on each and every form.<br />

The next generation <strong>of</strong> equipment, introduced in the mid to late 1960's, used a<br />

cathode ray tube, a pencil <strong>of</strong> light, and photo multipliers in a technique called "curve<br />

following". These systems <strong>of</strong>fered more flexibility in both the location <strong>of</strong> the data and the<br />

font or design <strong>of</strong> the images that could be read. It was this technique that introduced the<br />

concept that handwritten images could be automatically read, particularly if certain<br />

constraints were utilized. This technology also introduced the concept <strong>of</strong> blue, non-<br />

reading inks as the system was sensitive to the ultraviolet spectrum. The third generation<br />

<strong>of</strong> recognition devices, introduced in the early 1970's, consisted <strong>of</strong> photo-diode arrays.<br />

These tiny little sensors were aligned in an array so the reflected image <strong>of</strong> a document<br />

would pass by at a prescribed speed. These devices were most sensitive in the infra-red<br />

portion <strong>of</strong> the visual spectrum so "red" inks were used as non-reading inks.<br />

General applications of OCR include:

Data entry for business documents, e.g. check clearing.
Automatic number plate recognition.
Importing business card information into a contact list.
Quickly making textual versions of printed documents, e.g. book scanning for Project Gutenberg.
Making electronic images of printed documents searchable, e.g. Google Books.
Converting handwriting in real time to control a computer (pen computing).
Defeating CAPTCHA anti-bot systems, though these are specifically designed to prevent OCR.

Chapter 2

LITERATURE REVIEW AND SURVEY

This chapter gives information about the literature review drawn from different websites, industry manuals, and IEEE papers related to OCR, landmark recognition and the mapless navigation method.

2.1 Research Paper Review

Current research on landmark recognition and mapless navigation methods mainly focuses on the areas of Intelligent Transportation Systems and neural networks, especially the application of Artificial Intelligence in DAS (Driver Assistance Systems). Some of the papers which are relevant and carry basic information for this work are highlighted briefly below.

F. Moutarde, A. Bargeton, A. Herbin, and L. Chanussot, in their paper "Robust on-vehicle real-time visual detection of American and European speed limit signs, with a modular traffic sign recognition system", discuss pattern-matching techniques to extract a specific landmark region, but not an OCR technique to identify text on the landmark for further reasoning. For example, the authors identify numbers on the landmark via a neural network and use the result to judge whether the landmark is a speed limit sign, without using these numbers to control the speed. [2]

C. Keller, C. Sprunk, C. Bahlmann, J. Giebel, and G. Baratoff, in their paper "Real-time recognition of US speed signs", discuss LDA, which is used to decide whether the number region on the landmark matches one stored beforehand, thus judging whether it is a speed limit sign or not. [3]

G. Qingji, Y. Yue, and Y. Guoqing, in their paper "Detection of public information sign in airport terminal based on multi-scales spatio-temporal vision information", discuss SIFT features used to identify a landmark in an airport, which also does not involve reasoning; besides, the SIFT approach needs to extract and save the feature points and match them during recognition. [4]


J. Maye, L. Spinello, R. Triebel, and R. Siegwart, in their paper "Inferring the semantics of directional signs in public places", propose a system similar to this one in the sense that both make use of the text and arrow information contained in the landmark. However, the authors apply HISM (Hierarchical Implicit Shape Models) to represent the landmark region and identify the sign via a pattern-matching method, without extraction, recognition or interpretation of the text in the region. [5]

T. Breuer, G. Giorgana Macedo, R. Hartanto, N. Hochgeschwender, D. Holz, F. Hegger, Z. Jin and G. Kraetzschmar, in their paper "Johnny: An autonomous service robot for domestic environments", present a technique in which text information extracted from scenes is combined with a robot platform application, something rarely seen in the area of navigation. In the service robot system they propose, the text information is extracted from the image as an auxiliary feature to interpret the semantic information of objects contained in the scene, mainly to reduce the ambiguity of interpretation. [6]

In contrast, I. Posner, P. Corke, and P. Newman, in their paper "Using text-spotting to query the world", point out that their system fully interprets the scene with text information and constructs a query system with a probabilistic model: once a string is input, the robot can locate the scenes related to that string. What is attractive is that the text in the scenes does not have to match the input string exactly; that is, when one inputs "lunch", the robot can return all the scenes that contain "restaurant", which resembles human behaviour. [7]

Balkenius C., in his paper "Spatial learning with perceptually grounded representations", proposes a spatial navigation system based on visual templates. Templates are created by selecting a number of high-contrast features in the image and storing them together with their relative spatial locations in the image. [8]

Franz, Matthias O., and co-authors, in their paper "Learning view graphs for robot navigation", develop a vision-based system for topological navigation in open environments. This system represents selected places by local 360° views of the surrounding scenes. A second approach uses objects of the environment as landmarks, with perception algorithms designed specifically for each object. [9]

Beccari, G. Caselli, S. Zanichelli, in their paper "Qualitative spatial representations from task-oriented perception and exploratory behaviors", describe a series of motor and perceptual behaviors used for indoor navigation of a mobile robot, where walls, doors and corridors are used as landmarks. [10]

Auranuch Lorsakul and Jackrit Suthakorn, in their paper "Traffic Sign Recognition for Intelligent Vehicle/Driver Assistance System Using Neural Network on OpenCV", discuss preprocessing techniques such as thresholding, Gaussian filtering, Canny edge detection, contour extraction and ellipse fitting, followed by a neural network stage to recognize the traffic sign patterns. They also propose two strategies to reduce complexity and decrease the computational cost in order to facilitate real-time implementation. [11]

2.2 Java

Java is a general-purpose, concurrent, class-based, object-oriented programming language that is specifically designed to have as few implementation dependencies as possible. Java was originally developed by James Gosling at Sun Microsystems and released in 1995. [13]

Java is intended to let application developers "Write Once, Run Anywhere" (WORA), meaning that code that runs on one platform does not need to be recompiled to run on another. Java applications are typically compiled to byte code (class files) that can run on any Java virtual machine (JVM) regardless of computer architecture.

Google and Android, Inc. chose Java as a key pillar in the creation of the Android operating system, an open-source smartphone operating system. Although the operating system, built on the Linux kernel, was written largely in C, the Android SDK uses Java for designing applications for the Android platform.

2.2.1 Features of Java

Java is a purely object-oriented language: all Java code is written inside classes and objects.

Java is platform independent, meaning its programs can run on any platform by using byte code. The Java compiler does not produce native executable code for a particular machine like a C compiler would; instead, it produces a special format called byte code, which can be transferred to another computer. Byte code still needs an interpreter to execute it on any given platform. The interpreter reads the byte code and translates it into the native language of the host machine on the fly. Since the byte code is completely platform independent, only the interpreter and a few native libraries need to be ported to get Java to run on a new computer or operating system.

The Java language is standardized enough that executable applications can run on any computer that contains a virtual machine (run-time environment). Virtual machines can be embedded in web browsers (such as Netscape Navigator, Microsoft Internet Explorer, and IBM Web Explorer) and in operating systems.

Java provides a standardized set of class libraries (packages), which support creating graphical user interfaces, controlling multimedia data and communicating over networks.

Java was designed with security in mind. As Java is intended to be used in networked/distributed environments, it implements several security mechanisms to protect against malicious code that might try to invade the file system.

Java is a distributed language, which means that programs can be designed to run on computer networks. Java provides an extensive library of classes for communicating using TCP/IP protocols such as HTTP and FTP. This makes creating network connections much easier than in C/C++.

Java supports multithreaded execution: a program is divided into smaller parts (threads) that are executed with their own sequence and timing. Java is also called interactive because Java code supports both CUI and GUI programs.

2.3 Apache Tomcat

Tomcat is one of the open-source projects from a larger group of projects collectively known as the Apache Jakarta Project. It was initially developed as a servlet reference implementation by James Duncan Davidson at Sun Microsystems. He later helped to make the project open source and played a key role in its donation by Sun Microsystems to the Apache Software Foundation (ASF).

Tomcat is an application, a product of the Apache Software Foundation, which enables a standalone PC to work as a server. This can help with many tasks, such as programming using Java Server Pages (JSP). By installing the Tomcat software, a PC can be used as a server and perform any related task that a server does.

Tomcat executes Java servlets and renders web pages that contain Java Server Page code. Described as a "reference implementation" of the Java Servlet and Java Server Pages (JSP) specifications from Sun Microsystems, Tomcat provides a "pure Java" HTTP web server environment for Java code to run in. Tomcat is the result of the open involvement of developers and is accessible from the Apache web site in both binary and source versions. Tomcat can be used either as a separate product with its own internal web server or together with other web servers, including Apache, Netscape Enterprise Server, Microsoft Internet Information Server (IIS), and Microsoft Personal Web Server. Tomcat requires a Java Runtime Environment that conforms to JRE 1.1 or above.

2.3.1 Benefits of Tomcat Server

The foremost benefit of the Tomcat server is its flexibility. For example, if you wanted to run Apache on one physical server but the Tomcat service and the actual Tomcat JSPs and servlets on another machine, you can. Some companies employ this method to offer an extra level of security, with the Tomcat server behind another firewall, accessible only from the Apache server. Stability is another advantage: if a significant failure within Tomcat caused it to fail completely, it would not render the entire Apache service unusable; only the servlets and JSP pages would be affected.
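For illustration only, the following is a minimal sketch of the kind of servlet Tomcat hosts, here imagined as an entry point that would hand an uploaded landmark image to the OCR stage. The /ocr mapping, the class name and the recognize() stub are hypothetical placeholders; the report does not give the server-side code.

```java
import java.io.IOException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

/**
 * Illustrative servlet deployed under Tomcat: receives an uploaded landmark
 * image via HTTP POST and replies with the recognized label as plain text.
 * The URL mapping and the recognize() helper are hypothetical.
 */
@WebServlet("/ocr")
public class OcrServlet extends HttpServlet {

    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response)
            throws IOException {
        // Read the raw image bytes sent by the phone.
        byte[] imageBytes = request.getInputStream().readAllBytes();

        // Placeholder for the actual OCR stage (Kohonen-network recognition in this project).
        String label = recognize(imageBytes);

        response.setContentType("text/plain");
        response.getWriter().write(label);
    }

    private String recognize(byte[] imageBytes) {
        // Hypothetical stub; the real implementation would run the OCR pipeline.
        return "LEFT TURN";
    }
}
```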

2.4 Android

Android is a mobile operating system based on a modified version of Linux. It is designed primarily for touchscreen mobile devices such as smartphones and tablet computers. It was originally developed by a startup of the same name, Android, Inc. In 2005, as part of its strategy to enter the mobile space, Google purchased Android and took over its development work (as well as its development team). Google wanted Android to be open and free; hence, most of the Android code was released under the open-source Apache License, which means that anyone who wants to use Android can do so by downloading the full Android source code. [17]

The main advantage of adopting Android is that it offers a unified approach to application development. Developers need only develop for Android, and their applications should be able to run on numerous different devices, as long as the devices are powered by Android. In the world of smartphones, applications are the most important part of the success chain. Device manufacturers therefore see Android as their best hope to challenge the iPhone, which already commands a large base of applications.
M-Tech [IAR], Dept. <strong>of</strong> Mechanical Eng., SIT, Mangalore Page | 12


2.4.1 Android Versions

Tab. 2.1 A Brief History of Android Versions

Android Version    Code Name              Release Date
1.5                Cupcake                30-Apr-09
1.6                Donut                  15-Sep-09
2.0-2.1            Eclair                 26-Oct-09
2.2                Froyo                  20-May-10
2.3-2.3.2          Gingerbread            6-Dec-10
2.3.3-2.3.7        Gingerbread            9-Feb-11
3.1                Honeycomb              10-May-11
3.2                Honeycomb              15-Jul-11
4.0.x              Ice Cream Sandwich     16-Dec-11
4.1.x              Jelly Bean             9-Jul-12
4.2.x              Jelly Bean             13-Nov-12

2.4.2 Features of Android

Android applications have a Linux core, which safeguards them against anomalies and prevents them from crashing, leading to robust and stable Android apps.

Simple and easy Android application development process.

Hassle-free application porting, with easy-to-use APIs and development tools.

It allows fast information gathering and testing of the application.

WebKit engine integration for a rich browser facility.

Low mobile application development cost due to its open-source nature.

Saves developers time in understanding their clients' requirements.

Android is based on the Linux kernel, so it offers high performance, stability and security.

Android app development is a tremendous platform for personal application development. It makes networking between applications easy and thereby offers the best experience between Android apps and end users.

Android is the best platform for various smartphones, facilitating developers in building world-class Android applications.

2.5 Embedded C

The most common programming languages for embedded systems are C, BASIC and assembly languages. C used for embedded systems is slightly different from C used for general purposes (on a PC platform). Programs for embedded systems are usually expected to monitor and control external devices and to directly manipulate and use the internal architecture of the processor, such as interrupt handling, timers, serial communications and other available features.

Two salient features of embedded programming are code speed and code size. Code speed is governed by the processing power and timing constraints, whereas code size is governed by the available program memory and the use of the programming language. The goal of embedded system programming is to get the maximum features in the minimum space and minimum time.

Most embedded C compilers (as well as ordinary C compilers) have been developed to support the ANSI (American National Standards Institute) standard, but compared to ordinary C they may differ in the outcome of some statements. A standard C compiler communicates with the hardware components via the operating system of the machine, but a C compiler for an embedded system must communicate directly with the processor and its components.

2.5.1 Features of Embedded C

Embedded C is small and reasonably simple to learn, understand, program and debug.

C compilers are available for almost all embedded devices in use today, and there is a large pool of experienced C programmers.

Unlike assembly, Embedded C has the advantage of processor independence and is not specific to any particular microprocessor/microcontroller or system. This makes it convenient to develop programs that can run on most systems.

Since Embedded C combines the functionality of assembly language with the features of high-level languages, it is treated as a 'middle-level language' or 'high-level assembly language'.

Embedded C is a very efficient programming language.

Embedded C supports access to I/O and eases the management of large embedded projects.


2.6 Outcome of the Literature Survey

From the literature survey, it is evident that no fully sophisticated mapless navigating service robot that uses its own intelligence for navigation has been demonstrated to date. In this project we propose the basic idea of mapless navigation using a robot that detects landmarks in an indoor or customized environment. In this method we implement OCR (optical character recognition) to read the landmark using a Kohonen Neural Network. The main reason for selecting a Kohonen Neural Network is to reduce the computational cost and thereby facilitate real-time implementation. After identifying a landmark, we extract the semantic information of the texts or arrows contained in those signs, and use the result to guide the robot to its destination and to produce speech output.

Implementation of this method results in mapless navigation of robots, similar to a human navigation system that uses landmarks to locate its position and reach its destination easily.

Chapter 3
DESIGN ANALYSIS AND METHODOLOGY

This chapter describes the design methodology and the flow of control in this project using data flow diagrams and flow chart analysis.

The design phase expands the details of an analysis model by taking into account all technical implementations and restrictions. The purpose of the design is to specify a working solution that can easily be translated into programming code and implementation models.

3.1 Data Flow Diagram

Fig. 3.1 DFD for the Proposed System

3.2 Flow Chart

Fig. 3.2 Flow Chart Model of the Proposed System

Chapter 4
IMAGE EXTRACTION ALGORITHM

This chapter describes the detailed technique for extracting images from a landmark using the image extraction algorithm.

4.1 Landmark Image Extraction

4.1.1 Binarization

Landmark extraction is the first stage of this algorithm. The image captured from the camera is first converted to a binary image consisting of only 1s and 0s (only black and white) by thresholding the pixel values: 0 (black) for all pixels in the input image with luminance less than the threshold value, and 1 (white) for all other pixels. The captured image (original image) and the binarized image are shown in Figures 4.1 and 4.2 respectively.

Fig. 4.1 Captured Image    Fig. 4.2 Binarized Image
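The thresholding rule above can be sketched in a few lines. The following Python/NumPy snippet is only an illustration of the rule as described, not the project's actual code, and the threshold value used here is an assumed example:

    import numpy as np

    def binarize(gray, threshold=128):
        # 1 (white) for pixels at or above the threshold, 0 (black) otherwise,
        # matching the binarization rule of Section 4.1.1.
        return (gray >= threshold).astype(np.uint8)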

4.1.2 Smearing

To find the landmark region, the smearing algorithm is used. Smearing is a method for extracting text or sign areas from a mixed image. With the smearing algorithm, the image is processed along vertical and horizontal runs (scan-lines): if the number of white pixels in a run is less than one desired threshold or greater than another desired threshold, the white pixels are converted to black. In this system the threshold values are selected as 10 and 100 for both horizontal and vertical smearing:

If the number of 'white' pixels in a run < 10: the pixels become 'black', removing noise and unwanted spots.
Else: no change.

If the number of 'white' pixels in a run > 100: the pixels become 'black'.
Else: no change.
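A minimal sketch of this run-length smearing rule, assuming a NumPy binary image with 1 for white and 0 for black (an illustration only, not the project's exact implementation):

    import numpy as np

    def smear_runs(binary, low=10, high=100):
        # Convert white runs shorter than `low` or longer than `high` to black,
        # along horizontal and then vertical scan-lines (Section 4.1.2).
        out = binary.copy()
        for img in (out, out.T):              # rows, then columns (T is a shared view)
            for line in img:
                start = None
                values = list(line) + [0]     # sentinel 0 closes a trailing white run
                for i, v in enumerate(values):
                    if v == 1 and start is None:
                        start = i
                    elif v != 1 and start is not None:
                        run = i - start
                        if run < low or run > high:
                            line[start:i] = 0
                        start = None
        return out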

4.2 Landmark Location Finder

After smearing, a morphological operation, dilation, is applied to the image to specify the landmark location. However, there may be more than one candidate region for the landmark location. To find the exact landmark region and eliminate the other regions, some criteria tests are applied to the image by smearing and filtering operations. The processed image of a landmark with text after this stage is shown in Figure 4.3.

Fig. 4.3 Landmark Involving Text

After obtaining the landmark location, the region containing only text or signs is cut from the landmark, as shown in Figure 4.4.

Fig. 4.4 Landmark Text Region

4.3 Segmentation

In the segmentation of the landmark image, the text/sign region of the landmark is segmented into its constituent parts, obtaining the individual images. It includes the following steps.

4.3.1 Filtering

First, the image is filtered to enhance it and to remove noise and unwanted spots. Images are often corrupted by random variations in intensity or illumination, or have poor contrast, and cannot be used directly. In filtering we transform the pixel intensity values to reveal certain image characteristics. Filtering is a necessary process because the presence of noise or unwanted spots would produce wrong results.

4.3.2 Dilation

The dilation operation is applied to the image to separate the individual images from each other when they are close together. After this operation, horizontal and vertical smearing is applied to find the image regions. The result of this segmentation is shown in Figure 4.5.

Fig. 4.5 Separated Text Images after Dilation Process

4.3.3 Individual Image Separation

The next step is to cut the individual images out of the landmark. This is done by finding the starting and end points of each individual image in the horizontal direction. The individual images cut from the landmark are shown in Figure 4.6.

Fig. 4.6 Individual Image Cut

4.3.4 Normalization

Normalization refines each image into a block containing no extra white space (pixels) on any of its four sides. Each image is then fitted to an equal size, as shown in Figure 4.7.

Fig. 4.7 Image after Normalization

This fitting approach is necessary for template matching: to match the images against the database, the input images must be the same size as the database images. Here the images are fitted to 36 x 18 pixels. The extracted images cut from the landmark and the images in the database are now of equal size. The next step is template matching.
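As an illustration of this normalization step (cropping away the surrounding white space and fitting the result to the fixed 36 x 18 block), the following NumPy sketch uses nearest-neighbour resampling; the orientation of the 36 x 18 size (height by width) is an assumption, and this is not the project's actual code:

    import numpy as np

    def normalize_glyph(binary, out_h=36, out_w=18):
        # Crop to the bounding box of the black (0) character pixels, then
        # resample the crop to a fixed block (Section 4.3.4).
        rows = np.where((binary == 0).any(axis=1))[0]
        cols = np.where((binary == 0).any(axis=0))[0]
        crop = binary[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
        r_idx = (np.arange(out_h) * crop.shape[0]) // out_h
        c_idx = (np.arange(out_w) * crop.shape[1]) // out_w
        return crop[np.ix_(r_idx, c_idx)]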

4.4 Template Matching

Template matching is an effective algorithm for the recognition of images. The image is compared with the ones in the database and the best similarity is measured. Template matching is an image-processing technique for finding the small parts of an image which match a template image or database image. A basic method of template matching uses a convolution mask (template), tailored to the specific feature of the search image that we want to detect. This technique can easily be performed on grey images or edge images.

Fig. 4.8 Database Template Images
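As an illustrative sketch of template matching over equal-sized binary images (the `templates` dictionary below is hypothetical, standing in for the database templates of Figure 4.8):

    import numpy as np

    def best_match(candidate, templates):
        # `templates` maps a label (e.g. 'A', 'LEFT') to a 36 x 18 binary template.
        # The similarity score here is simply the fraction of agreeing pixels.
        best_label, best_score = None, -1.0
        for label, tpl in templates.items():
            score = np.mean(candidate == tpl)
            if score > best_score:
                best_label, best_score = label, score
        return best_label, best_score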

Chapter 5
IMAGE RECOGNITION USING KOHONEN

This chapter introduces the Kohonen Neural Network and describes the detailed techniques used to recognize images with it.

5.1 Introduction to the Network

Optical character recognition, abbreviated as OCR, means converting a text image into a computer-editable text format, for example ASCII code; in this thesis, Unicode is considered as the converted text. Many recognition systems are available in computer science, and OCR plays a prominent role in the field of character recognition. Such recognition systems work well for a simple language like English, which has only 26 image sets; for standard text there are 42 images including capital and small letters. However, OCR of a complex but organized sign language such as a landmark is still at a preliminary level. The reasons for its complexity are its image shapes and its top bars and end bars; moreover, it has some modified, vowel and compound images. In this project we recognize some landmarks and words that are useful in robot navigation by using a neural network. We use the Kohonen Neural Network for the training and recognition procedure, that is, for the classification stage. At the beginning, grayscale and then black-and-white (BW) image conversion take place to produce binary data; these pre-processing steps are described below. After that, the image containing the landmark needs to be converted into a trainable form by means of processing steps, which are also described below.

5.2 General Image Recognition Procedure

Like all other recognition procedures, character recognition is essentially a recognition process. A simple and general character recognition procedure, shown in Figure 5.1, is described below.

First of all, we need a large amount of raw or collected data which will be processed and later used to train the system. It is very important to collect specific data, because later on we need to compare it with similar kinds of data. We also have to consider the complexity level of the collected data, because the next steps depend on the input data type; it can be scanned documents or handwritten documents.

Secondly, we have to consider the pre-processing stage. Here mainly image-processing procedures take place, such as grayscale image conversion, binary image conversion and skew correction. The later processing stage depends on the pre-processing stage, so we need to design these pre-processing steps with great care.

Thirdly, the processing steps occur. Thinning, edge detection, chain code, pixel mapping and histogram analysis are some features of the processing stage. This stage basically converts raw data into trainable components.

Finally, training and recognition, in short the classification stage, take place. The pre-processed and processed data is used to train the system, i.e. to teach it about the incoming data, so that later on it can easily recognize an input.

Fig. 5.1 General Character Recognition Procedure

5.3 Image Recognition Procedure with Kohonen

So far we have described the general image recognition procedure. We now describe the procedure used in this image recognition system (Figure 5.2). The steps are as follows:

a. A landmark image is taken as the raw input data.

b. The extracted landmark image is converted to grayscale and then into a BW image in the pre-processing stage.

c. Pixels are grabbed and mapped into specific areas, and a vector is extracted from the image containing the given word or image. This part is considered the processing stage.

d. Lastly, the Kohonen Neural Network is used as the classification stage.

Fig. 5.2 Image Recognition Procedure Using Kohonen Neural Network

5.4 Data Collection

As mentioned, we have to choose the input data type with great care, because the system must be developed according to the raw or collected data. Here we are dealing with character and sign recognition on landmarks, so we obviously need landmark images as well as text characters; we also consider some given words. However, no word-level or character-level segmentation is considered here; rather, a whole single-character image or single-word image is taken as the raw input. Before entering the system, the image is resized to 250 x 250 pixels to satisfy the procedure, no matter whether it is a text character or a sign image. Computer-generated images were used to train the system, and no skew correction takes place, so when capturing an image we have to be careful about the image size and shape.

Here the whole text or sign is taken and trained, because landmark images contain many features, irregular shapes and curvatures, as we have seen, and no general formula has yet been derived for feature extraction from the landmark. So rather than extracting characters or individual images from the landmark, we take the whole text or the whole landmark image as input data in the first phase of the image recognition system.

Fig. 5.3 Cropped Input Image from Landmark Region

One thing is really important here: the character size in the image. A character should not be only partially present on the landmark image, and it should not be too small. In this proposed system we consider a bold font of size 36 or above for each individual text or image.

5.5 Image Pre-processing

A digital text image that contains characters is generally an RGB image. The figures below show two types of images containing the digital English character 'T'. The character in Figure 5.4 was captured with a camera and resized to 250 x 250 pixels, while Figure 5.5 is a digital computer image.

Fig. 5.4 Captured Input Image    Fig. 5.5 Computer Image


5.5.1 RGB to Grayscale Image Conversion

In the first pre-processing stage we convert the input RGB image into a grayscale image. Here Otsu's algorithm is used: it computes the threshold level that is applied in the grayscale-to-binary conversion of the next stage. The algorithm is given below:

1. Count the number of pixels of each colour level (256 levels) and save it in the matrix count.

2. Calculate the probability matrix P of each colour level: Pi = counti / sum of the counts, where i = 1, 2, ..., 256.

3. Find the matrix omega: omegai = cumulative sum of Pi, where i = 1, 2, ..., 256.

4. Find the matrix mu: mui = cumulative sum of Pi * i, where i = 1, 2, ..., 256, and mu_t = the total sum of Pi * i over all 256 levels.

5. Calculate the matrix sigma_b_squared, where sigma_b_squaredi = (mu_t * omegai - mui)^2 / (omegai * (1 - omegai)).

6. Find the location, idx, of the maximum value of sigma_b_squared. The maximum may extend over several bins, so the locations are averaged together.

7. If the maximum is not a number (meaning that sigma_b_squared is all NaN), then the threshold is 0.

8. If the maximum is a finite number, threshold = (idx - 1) / (256 - 1).
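The steps above correspond to Otsu's global thresholding. A compact NumPy sketch of the same computation (an illustration only, not the project's exact code) is:

    import numpy as np

    def otsu_level(gray):
        # gray: 2-D array of 0..255 intensities. Returns the threshold level in [0, 1],
        # following steps 1-8 of Section 5.5.1.
        counts = np.bincount(gray.ravel(), minlength=256).astype(float)   # step 1
        p = counts / counts.sum()                                         # step 2
        omega = np.cumsum(p)                                              # step 3
        mu = np.cumsum(p * np.arange(1, 257))                             # step 4
        mu_t = mu[-1]
        with np.errstate(divide='ignore', invalid='ignore'):
            sigma_b2 = (mu_t * omega - mu) ** 2 / (omega * (1.0 - omega)) # step 5
        if np.all(np.isnan(sigma_b2)):
            return 0.0                                                    # step 7
        idx = np.nanargmax(sigma_b2)        # step 6 (first maximum; ties could be averaged)
        return idx / 255.0                  # step 8, with a 0-based index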

Figure 5.6 below shows an RGB image and Figure 5.7 shows the grayscale-converted image.

Fig. 5.6 RGB Image    Fig. 5.7 Grayscale Image

5.5.2 Grayscale to Binary Image Conversion

In the second pre-processing stage we convert the grayscale image into a binary image. In a grayscale image there are 256 levels between black and white, where 0 means pure black and 255 means pure white. The image is converted to a binary image by checking whether or not each pixel value is greater than 255 * level (where level is found by Otsu's method). If the pixel value is greater than or equal to 255 * level, the value is set to 1, i.e. white; otherwise it is set to 0, i.e. black. Figure 5.8 shows a grayscale image with its 0-255 histogram and Figure 5.9 shows the BW or binary image with its two-level histogram. [14]

Fig. 5.8 Grayscale Image with Histogram    Fig. 5.9 Binary Image with Histogram

5.6 Feature Extraction

The next and most important part of the given character recognition is feature extraction. In this system we follow a few steps to extract a vector, since our main target is finding a vector from the image. The image has already been processed and a binary image created, so we have only two types of data in the image: 1 for white space and 0 for black space. We now have to go through the following steps to create a 625-element vector for a particular character or image: [15]

1. Pixel grabbing.

2. Finding the probability of making a square.

3. Mapping to the sampled area.

4. Creating the vector.

5. Representing the character with a model number.

5.6.1 Pixel Grabbing from the Image

Since we are working with a binary image of fixed size, we can easily read the 250 x 250 pixels of a particular image containing the given text character or sign. Note that we can grab and separate only the character portion of the digital image. Specifically, we take an image containing a given character, which is of course a binary image. As we specified, a pixel with the value 1 is a white spot and a pixel with the value 0 is a black one, so the 0-valued portions form the original character.

5.6.2 Finding the Probability of Making a Square

Now we sample the entire image into a specified number of cells so that we can obtain the vector easily. We specify a sampled area of 25 x 25 cells, so the 250 x 250 image has to be mapped onto a 25 x 25 grid; for each sampled cell we therefore take 10 x 10 pixels from the binary image. A short example: Table 5.1 shows an original binary image of 25 x 15 pixels, which we sample into 5 x 3 cells, so for each cell we consider 5 x 5 pixels of the binary image. Table 5.2 shows how the pixels are classified to find the probability of making a square.

Tab. 5.1 Binary Converted Grid Values

5.6.3 Mapping to the Sampled Area

Recalling the previous example, the same sample pixels separated from the binary image are shown in Table 5.2. For each 5 x 5 block of separated pixels we assign a unique number, and the number of blocks equals the 5 x 3 sampled cells. We do not need to ask whether a 5 x 5 block forms a perfectly black or white square; instead we take the majority of 0s or 1s within the block. If the 0s have the majority in the i-th block, we mark a black square at the i-th position of the sampled area. Table 5.2 shows the blocks with their unique numbers and whether each is covered black or white in this probabilistic manner.

Table 5.2 Aligned Marked Values

Here is an example of how the 250 x 250 pixels of the English character 'T' are sampled into the 25 x 25 sampled area.

Fig. 5.10 Sampled Image

5.6.4 Creating the Vector

Once we have sampled the binary image we have black squares and white squares. We now put a single 1 (one) for each black square and a 0 (zero) for each white square, so Figure 5.10 above is represented by the combination of 1s and 0s in Figure 5.11 below.

Fig. 5.11 Vector Representation

Now we collect each row and combine them together to make a vector. The vector for Figure 5.11 is given below.

111001001011111111111111111010000001111111111111111111111111101000000001
011111111111111111111110111110000000111111111011111111111111111010000000
000000000000000111100000000000000000000000000000011111000000000000000000
000000000000001111100000000000000010000000000000001111100000000000000000
000000000000000111110000000000000000000000000000001111100000000010000000
010000100000000011110000000000000100000000000000000001111100000000000000
000000000000000000111110000000100000000000000000001000011110100000000000
000000000000000000001111000000000000000000000000000000000011110000000000
000000000000000000000011101000000000010000001000000000000000111100000000
0001000000000000000000000011110000000000000001000000000000000.
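A sketch of steps 1-4 above (grabbing the 250 x 250 binary pixels, taking the majority of each 10 x 10 block, and flattening the 25 x 25 grid into a 625-element vector); this is an illustration under the conventions stated above, not the project's exact code:

    import numpy as np

    def image_to_vector(binary_250):
        # binary_250: 250 x 250 array with 1 = white, 0 = black (character pixels).
        blocks = binary_250.reshape(25, 10, 25, 10)
        # A sampled cell becomes "black" when 0s (character pixels) are the majority
        # of its 10 x 10 block; black cells are encoded as 1 in the vector (Section 5.6.4).
        black_majority = (blocks == 0).sum(axis=(1, 3)) > 50
        return black_majority.astype(np.uint8).flatten()     # length 625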

5.6.5 Representing a Character with a Model Number

One thing needs to be mentioned here: the system assigns a numerical model number (or special symbol) to each vector, together with the corresponding input word or text character for that particular model. This is because a given character has a fixed length, whereas the words we also consider have irregular lengths. When we need to train, we train with the model number, and the model number knows its corresponding input character. In short, a particular model has a unique vector of 625 values of 1s and 0s and a unique character.

5.7 Kohonen Neural Network

We have done a lot of work in the pre-processing and processing stages; the main idea is to make the data simple and acceptable to the Kohonen Neural Network. The Kohonen neural network contains no hidden layer. This network architecture is named after its creator, Teuvo Kohonen. The Kohonen neural network differs from the feed-forward back-propagation neural network in several important ways.

5.7.1 Introduction to the Kohonen Network

The Kohonen neural network differs considerably from the feed-forward back-propagation neural network, both in how it is trained and in how it recalls a pattern. The Kohonen neural network does not use any sort of activation function. Further, it does not use any sort of bias weight. [12]

Output from the Kohonen neural network does not consist of the output of several neurons. When a pattern is presented to a Kohonen network, one of the output neurons is selected as the "winner", and this winning neuron is the output of the Kohonen network. Often these winning neurons represent groups in the data presented to the network. For example, in this system we consider 10 digits, 5 vowels, 21 consonants and some signs from the total model. The most significant difference between the Kohonen neural network and the feed-forward back-propagation neural network is that the Kohonen network is trained in an unsupervised mode: the network is presented with data, but the correct output corresponding to that data is not specified. Using the Kohonen network, the data can be classified into groups. We will begin a review of the Kohonen network by examining the training process.

Since the vector length is 625, the input layer has 625 neurons, while the number of neurons in the output layer depends on the number of characters trained with the network. With 625 inputs and n output characters, the corresponding Kohonen Neural Network model can be drawn as shown in Figure 5.12.

Fig. 5.12 Kohonen Neural Network

5.7.2 The Structure of the Kohonen Network

The Kohonen neural network contains only an input and an output layer of neurons; there is no hidden layer in a Kohonen neural network. First we will examine the input and output of a Kohonen neural network. [12]

The input to a Kohonen neural network is given to the network through the input neurons. Each input neuron is given one of the floating-point numbers that make up the input pattern. A Kohonen neural network requires that these inputs be normalized to the range between -1 and 1. Presenting an input pattern to the network causes a reaction from the output neurons. In a Kohonen neural network only one of the output neurons actually produces a value, and this single value is either true or false. When a pattern is presented to the Kohonen neural network, one single output neuron is chosen as the output neuron. Therefore, the output of the Kohonen neural network is usually the index of the neuron that fired. The structure of a typical Kohonen neural network is shown in Figure 5.13.

Fig. 5.13 Simple Kohonen Network with 2 Input and 2 Output Neurons

5.7.3 Sample Input to the Kohonen Network

Now that we understand the structure of the Kohonen neural network, we will examine how the network processes information by stepping through the calculation process. For this example we consider a very simple Kohonen neural network with only two input and two output neurons. The input given to the two input neurons is shown in Table 5.3.

Table 5.3 Sample Inputs to a Kohonen Neural Network

We must also know the connection weights between the neurons. These connection weights are given in Table 5.4.

Table 5.4 Connection Weights in the Sample Kohonen Neural Network

Using these values we will now examine which neuron would win and produce output. We begin by normalizing the input.

5.7.4 Normalizing the Input

The requirements that the Kohonen neural network places on its input data are one of its most severe limitations. Input to the Kohonen neural network should be between the values -1 and 1. In addition, each of the inputs should fully use this range; if one or more of the input neurons were to use only the numbers between 0 and 1, the performance of the neural network would suffer. To normalize the input we must first calculate the "vector length" of the input data, which is done by summing the squares of the input vector. In this case it is (0.5 * 0.5) + (0.75 * 0.75), which gives a "vector length" of 0.8125. If the length were to become too small (less than some arbitrarily small value), the length would be set to that arbitrarily small value instead; in this case the "vector length" is a sufficiently large number. Using this length we can now determine the normalization factor, which is the reciprocal of the square root of the length. For this value the normalization factor works out to 1.1094. This normalization will be used in the next step, where the output layer is calculated.
will be used in the next step where the output layer is calculated.<br />

5.7.5 Calculating Each Neuron's Output

To calculate the output, both the input vector and the neuron connection weights must be considered. First the dot product of the input neurons and their connection weights must be calculated. To calculate the dot product between two vectors, the corresponding elements of the two vectors are multiplied and the products summed. We will now examine how this is done.

The Kohonen algorithm specifies that we must take the dot product of the input vector and the weights between the input neurons and the output neurons. For the first output neuron this is

[0.5  0.75] . [w1  w2] = (0.5 * w1) + (0.75 * w2)

where w1 and w2 are the connection weights from the two input neurons to the first output neuron (Table 5.4). As we can see from the above calculation, the dot product is 0.395. This calculation is performed for the first output neuron and has to be repeated for each of the output neurons; in this example we only examine the calculations for the first output neuron, and those for the second output neuron are carried out in the same way.

This output must now be normalized by multiplying it by the normalization factor that was determined in the previous step. Multiplying the dot product of 0.395 by the normalization factor of 1.1094 gives an output of 0.438213. Now that the output has been calculated and normalized, it must be mapped to a bipolar number.

5.7.6 Mapping to Bipolar

In the bipolar system, binary zero maps to -1 and binary one remains 1. Because the input to the neural network was normalized to this range, we must perform a similar mapping on the output of the neurons. To make this mapping we add one to the value and divide the result in half. For the output of 0.438213 this results in a final output of 0.7191065. The value 0.7191065 is the output of the first neuron; it will be compared with the outputs of the other neurons, and by comparing these values we can determine a "winning" neuron.
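Putting the last three subsections together for the first output neuron (the weight values come from Table 5.4, which is not reproduced here, so the dot product of 0.395 is simply taken as given); an illustrative calculation only:

    norm_factor = 1.1094                    # from Section 5.7.4
    dot = 0.395                             # dot product of the input vector and neuron 1's weights
    normalized = dot * norm_factor          # = 0.438213
    bipolar_output = (normalized + 1) / 2   # = 0.7191065, the output of neuron 1
    print(normalized, bipolar_output)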

5.7.7 Choosing a Winner

We have seen how to calculate the value for the first output neuron. To determine a winning output neuron we must also calculate the value for the second output neuron. We will now quickly review the process for the second neuron; for a more detailed description, refer to the previous section.

The second output neuron uses exactly the same normalization factor as was used to calculate the first output neuron, namely 1.1094. Taking the dot product of the weights of the second output neuron and the input vector gives a value of 0.45. This value is multiplied by the normalization factor of 1.1094 to give the value of 0.0465948. We can now calculate the final output for neuron 2 by converting the output of 0.0465948 to bipolar, which yields 0.49923.

We now have an output value for each of the neurons: the first neuron has an output value of 0.7191065 and the second neuron has an output value of 0.49923. To choose the winning neuron we choose the output that has the largest value. In this case the winning neuron is the first output neuron, with an output of 0.7191065, which beats neuron two's output of 0.49923.

We have now seen how the output of the Kohonen neural network is derived, and that the weights between the input and output neurons determine this output.

5.7.8 Kohonen Network Learning Procedure

The training process for the Kohonen neural network is competitive: for each training set one neuron will "win". The winning neuron has its weights adjusted so that it reacts even more strongly to that input the next time. As different neurons win for different patterns, their ability to recognize those particular patterns increases. We will first examine the overall process of training the Kohonen neural network.

5.7.9 Learning Algorithm Flowchart

Fig. 5.14 Flow Chart Model for the Learning Algorithm

5.7.10 Learning Rate

The learning rate is a constant used by the learning algorithm. It must be a positive number less than 1; typically it is a value such as 0.4 or 0.5. In the following sections the learning rate is denoted by the symbol alpha. Generally, setting the learning rate to a larger value causes training to progress faster, but setting it too large can cause the network never to converge, because the oscillations of the weight vectors become too great for the classification patterns ever to emerge. Another technique is to start with a relatively high learning rate and decrease it as training progresses, which allows initial rapid training of the neural network that is then "fine-tuned" as training continues. The learning rate is just a variable used as part of the algorithm that adjusts the weights of the neurons.

5.7.11 Adjusting the Weights

The entire memory of the Kohonen neural network is stored in the weighted connections between the input and output layers. The weights are adjusted in each epoch. An epoch occurs when one item of training data is presented to the Kohonen neural network and the weights are adjusted based on the result. The adjustments to the weights should produce a network that yields more favourable results the next time the same training data is presented. Epochs continue as more and more data is presented to the network and the weights are adjusted. Eventually the return on these weight adjustments diminishes to the point where it is no longer worthwhile to continue with this particular set of weights. When this happens, the entire weight matrix is reset to new random values, forming a new cycle. The final weight matrix used is the best weight matrix found across all of the cycles. We will now examine how the weights are transformed. The original method for calculating the changes to weights, proposed by Kohonen, is often called the additive method. This method uses the following equation:

w(t+1) = (w(t) + x) / || w(t) + x ||

The variable x is the training vector that was presented to the network, w(t) is the weight of the winning neuron, and w(t+1) is the new weight. The double vertical bars represent the vector length.

The additive method generally works well for Kohonen neural networks. However, in cases where the additive method shows excessive instability and fails to converge, an alternative, called the subtractive method, can be used. The subtractive method uses the following equations:

e = x - w(t)
w(t+1) = w(t) + (alpha * e)

These two equations show the basic transformation that occurs on the weights of the network.
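A minimal sketch of both update rules for the winning neuron's weight vector, following the equations above (the learning rate default below is an assumed example value, and this is an illustration rather than the project's code):

    import numpy as np

    def additive_update(w_t, x):
        # w(t+1) = (w(t) + x) / || w(t) + x ||   (additive method)
        new_w = w_t + x
        return new_w / np.linalg.norm(new_w)

    def subtractive_update(w_t, x, alpha=0.5):
        # e = x - w(t);  w(t+1) = w(t) + alpha * e   (subtractive method, alpha = learning rate)
        return w_t + alpha * (x - w_t)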

5.7.12 Calculating the Errors

Before we can understand how to calculate the error for a Kohonen neural network, we must first understand what the error means. The Kohonen neural network is trained in an unsupervised fashion, so the definition of the error is somewhat different from what we normally think of as an error. The purpose of the Kohonen neural network is to classify the input into several sets; the error for the Kohonen neural network must therefore measure how well the network is classifying these items.

5.7.13 Recognition with the Kohonen Network

For a given pattern we can easily find the vector and send it through the Kohonen Neural Network, and for that particular pattern one of the neurons will fire. Since the weights are normalized for all input patterns, the input pattern is calculated against the normalized weights; as a result, the neuron that fires is the best answer for that particular input pattern.
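The recognition step can therefore be sketched as a single forward pass: normalize the 625-element input vector, compute each output neuron's value, and report the index of the winning neuron as the recognized model number. The snippet below is an illustration only; `weights` is a hypothetical matrix of trained connection weights:

    import numpy as np

    def recognize(vector, weights):
        # vector: 625-element input pattern; weights: (n_outputs, 625) trained weights.
        length = np.sum(vector.astype(float) ** 2)
        norm_factor = 1.0 / np.sqrt(max(length, 1e-12))   # guard against a zero-length input
        outputs = (weights @ vector) * norm_factor        # normalized dot product per neuron
        outputs = (outputs + 1.0) / 2.0                    # map to the bipolar-style 0..1 range
        return int(np.argmax(outputs))                     # index of the "winning" neuron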

Chapter 6
HARDWARE DESCRIPTION AND IMPLEMENTATION

This chapter gives a brief description of the hardware implementation of the proposed system, the components used and their circuit logic.

6.1 Circuit Diagram

Fig. 6.1 Circuit Diagram of the Proposed System

The hardware implementation of this project includes the following components.

6.1.1 RFID Reader

RFID stands for Radio Frequency Identification. RFID is one member of the family of Automatic Identification and Data Capture (AIDC) technologies and is a fast and reliable means of identifying objects. There are two main components: the interrogator (RFID reader), which transmits and receives the signal, and the transponder (tag), which is attached to the object. An RFID tag is composed of a minuscule microchip and an antenna. RFID tags can be passive or active and come in a wide variety of sizes, shapes and forms. Communication between the RFID reader and the tags occurs wirelessly and generally does not require a line of sight between the devices.

An RFID reader can read through almost anything, with the exception of conductive materials like water and metal, but with modifications and positioning even these can be overcome. The RFID reader emits a low-power radio wave field which is used to power up the tag so that it can pass on any information contained on its chip. In addition, readers can be fitted with an additional interface that converts the radio waves returned from the tag into a form that can be passed on to another system, such as a computer or a programmable logic controller. Passive tags are generally smaller, lighter and less expensive than active tags; they can be applied to objects in harsh environments, are maintenance free and will last for years. These transponders are only activated when they are within the response range of an RFID reader. Active tags differ in that they incorporate their own power source, so the tag is a transmitter rather than a reflector of radio frequency signals, which enables a broader range of functionality such as programmable and read/write capabilities.

Fig. 6.2 RFID Reader

Fig. 6.3 Block Diagram of the LF DT125R Module

The LF DT125R reader consists of an RF front end interfaced with a baseband processor that operates from a +5 V power supply. An antenna interfaces with the RF front end and is tuned at 125 kHz to detect a tag (transponder) that comes into the vicinity of the reader field. The data read from the tag by the front end is detected and decoded by the baseband processor and is then sent to the UART interface. The DT125R is designed for a reading range of 50 mm to 100 mm. An LED and a beeper can be interfaced to indicate the tag-read status. The DT125R has built-in circuitry for noise reduction.

6.1.2 Bluetooth Module

The concept behind Bluetooth is to provide a universal short-range wireless capability. Using the 2.4 GHz band, available globally for unlicensed low-power use, two Bluetooth devices within 10 m of each other can share up to 720 Kbps of capacity. Bluetooth is designed to operate in an environment of many users: up to eight devices can communicate in a small network called a piconet. Networks are usually formed ad hoc from portable devices such as cellular phones, handhelds and laptops. Unlike the other popular wireless technology, Wi-Fi, Bluetooth offers higher-level service profiles, e.g. FTP-like file servers, file pushing, voice transport, serial line emulation, and more.

Fig. 6.4 Bluetooth SMD Module - RN-42

In this project we are using the Bluetooth SMD Module RN-42. This module from Roving Networks is powerful, small, and very easy to use. The Bluetooth module is designed to replace serial cables: the Bluetooth stack is completely encapsulated, and the end user just sees serial characters being transmitted back and forth. Press the 'A' character in a terminal program on the computer and an 'A' will be pushed out of the TX pin of the Bluetooth module.

The RN-42 is perfect for short-range, battery-powered applications. The RN-42 uses only 26 µA in sleep mode while still being discoverable and connectable. Multiple user-configurable power modes allow the user to dial in the lowest power profile for a given application.

6.1.3 PIC16F873A Microcontroller

PIC microcontrollers are quickly replacing computers when it comes to programming robotic devices. These microcontrollers are small, can be programmed to carry out a number of tasks, and are ideal for school and industrial projects. A simple program is written using a computer and then downloaded to the microcontroller, which in turn can control a robotic device. In this project we use the PIC16F873A microcontroller because it is very easy to use and employs FLASH memory technology, so it can be written and erased up to a thousand times. The superiority of this RISC microcontroller compared with other 8-bit microcontrollers lies especially in its speed and its code compression. The PIC16F873A has 40 pins with 33 I/O paths, and the 40 pins are divided into 5 ports.

Fig. 6.5 Pin Configuration of PIC16F873A


<strong>OCR</strong> <strong>Based</strong> <strong>Mapless</strong> <strong>Navigation</strong> <strong>Method</strong> Of <strong>Robot</strong><br />

The 40 pins make it easier to use the peripherals as the functions are spread out<br />

over the pins. This makes it easier to decide what external devices to attach without<br />

worrying too much if there enough pins to do the job. One <strong>of</strong> the main advantages <strong>of</strong> this<br />

is each pin is only shared between two or three functions so it‟s easier to decide what the<br />

pin functions.<br />

PIC16F873A perfectly fits many uses, from automotive industries and controlling<br />

home appliances to industrial instruments, remote sensors, electric door locks and safety<br />

devices. It is also ideal for smart cards as well as for battery supplied devices because <strong>of</strong><br />

its low consumption. EEPROM memory makes it easier to apply microcontrollers to<br />

devices where permanent storage <strong>of</strong> various parameters is needed (codes for transmitters,<br />

motor speed, receiver frequencies, etc.). Low cost, low consumption, easy handling and<br />

flexibility make PIC16F873A applicable even in areas where microcontrollers had not<br />

previously been considered (example: timer functions, interface replacement in larger<br />

systems, coprocessor applications, etc.).<br />

6.1.4 ULN2003

The ULN2003 is a high-voltage, high-current Darlington array IC. It contains seven open-collector Darlington pairs with common emitters. A Darlington pair is an arrangement of two bipolar transistors.

Fig. 6.6 ULN2003

The ULN2003AP/AFW series are high-voltage, high-current Darlington drivers comprising seven NPN Darlington pairs. The ULN2003 belongs to the ULN200X family of ICs, different versions of which interface to different logic families; the ULN2003 is intended for 5 V TTL and CMOS logic devices. These ICs are used for driving a wide range of loads and serve as relay drivers, display drivers, line drivers, etc. The ULN2003 is also commonly used for driving stepper motors (refer to stepper motor interfacing using the ULN2003).

Fig. 6.7 Pin Configuration of ULN2003

Each channel or Darlington pair in the ULN2003 is rated at 500 mA and can withstand a peak current of 600 mA. The inputs and outputs are located opposite each other in the pin layout. Each driver also contains a suppression diode to dissipate voltage spikes while driving inductive loads.

These versatile devices are useful for driving a wide range of loads including solenoids, relays, DC motors, LED displays, filament lamps, thermal print heads and high-power buffers.

6.1.5 Relay

A relay is an electrically operated switch. Many relays use an electromagnet to operate a switching mechanism mechanically, but other operating principles are also used. Relays are used where it is necessary to control a circuit with a low-power signal (with complete electrical isolation between the control and controlled circuits), or where several circuits must be controlled by one signal.

Fig. 6.8 Relay

All relays operate using the same basic principle. In this example we will use a commonly used 4-pin relay. Relays have two circuits: a control circuit (shown in GREEN) and a load circuit (shown in RED). The control circuit has a small control coil while the load circuit has a switch; the coil controls the operation of the switch.

Fig. 6.9 Relay Control Circuit

Current flowing through the control circuit coil (pins 1 and 3) creates a small magnetic field which causes the switch between pins 2 and 4 to close. The switch, which is part of the load circuit, is used to control an electric circuit that may be connected to it. When the relay is energized, current flows through pins 2 and 4 (shown in RED).

Fig. 6.10 Relay Energized (ON)

When current stops flowing through the control circuit (pins 1 and 3), the relay becomes de-energized. Without the magnetic field, the switch opens and current is prevented from flowing through pins 2 and 4. The relay is now OFF.

Fig. 6.11 Relay De-Energized (OFF)

M-Tech [IAR], Dept. <strong>of</strong> Mechanical Eng., SIT, Mangalore Page | 45


<strong>OCR</strong> <strong>Based</strong> <strong>Mapless</strong> <strong>Navigation</strong> <strong>Method</strong> Of <strong>Robot</strong><br />

When no voltage is applied to pin 1, there is no current flow through the coil. No<br />

current means no magnetic field is developed, and the switch is open. When voltage is<br />

supplied to pin 1, current flow through the coil creates a magnetic field needed to close<br />

the switch allowing continuity between pins 2 and 4.<br />

6.1.6 DC Motor<br />

Fig. 6.12 Relay Operation<br />

A DC motor is a mechanically commutated electric motor powered from direct current (DC). The stator is by definition stationary in space, and therefore so is its current. The current in the rotor is switched by the commutator so that it, too, is stationary in space. This is how the relative angle between the stator and rotor magnetic flux is maintained near 90 degrees, which generates the maximum torque.

Operation of a DC motor is based on the principle that when a current-carrying conductor is placed in a magnetic field, the conductor experiences a mechanical force. The direction of this force is given by Fleming's left-hand rule, and its magnitude is given by:

F = BIℓ newtons
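As an illustrative figure (not a value measured in this project), a conductor of active length ℓ = 0.1 m carrying I = 2 A in a flux density B = 0.5 T experiences a force F = 0.5 × 2 × 0.1 = 0.1 N.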

Fig. 6.13 Working of DC Motor


DC motors have a rotating armature winding (the winding in which a voltage is induced) but a non-rotating armature magnetic field, and a static field winding (the winding that produces the main magnetic flux) or a permanent magnet. Different connections of the field and armature windings provide different inherent speed/torque regulation characteristics. When the terminals of the motor are connected to an external source of direct-current supply:

(i) The field magnets are excited and develop alternate N and S poles.

(ii) The armature conductors carry currents. All conductors under the N-pole carry currents in one direction, while all conductors under the S-pole carry currents in the opposite direction.

Since each armature conductor carries current and is placed in the magnetic field, a mechanical force acts on it. Applying Fleming's left-hand rule, it is clear that the force on each conductor tends to rotate the armature in the anticlockwise direction. All these forces add together to produce a driving torque which sets the armature rotating. When a conductor moves from one side of a brush to the other, the current in that conductor is reversed and at the same time it comes under the influence of the next pole, which is of opposite polarity. Consequently, the direction of the force on the conductor remains the same.

The speed of a DC motor can be controlled by changing the voltage applied to the armature or by changing the field current; introducing a variable resistance in the armature circuit or the field circuit therefore allows speed control, as the relation below illustrates.
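From general DC machine theory (not derived in this report), the back e.m.f. is E_b = V − I_a·R_a and the speed is proportional to E_b/Φ, i.e. N ∝ (V − I_a·R_a)/Φ. This shows why raising the armature voltage or weakening the field flux raises the speed, while adding armature-circuit resistance lowers it.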

Fig. 6.14 Hardware Implementation of Project


Chapter 7

OVERALL DESCRIPTION & IMPLEMENTATION

7.1 System Perspective

This project provides a way to navigate the robot without any human intervention. A robot serves the purpose here, with the camera mounted on it. The communication between the robot and the PC is through GPRS (via the cell phone), so the distance between the control unit and the robot does not matter; the distance between the cell phone and the robot vehicle, however, should be kept small because they communicate through Bluetooth.

A Java application runs on the server side and an Android application runs on the mobile. Initially the robot moves in a particular direction. When the robot comes across an RF card, it stops immediately, takes a snapshot and sends it to the server. The server processes the image and sends instructions back to the robot.

As soon as the RF card reader gets the data, the microcontroller stops the robot and sends an instruction to the cell phone through Bluetooth to capture the image. The cell phone takes the image and sends it over GPRS to the server for processing. The server receives the image from the cell phone and applies OCR to extract the data. Based on the extracted data, the server sends an instruction to the robot, and the robot moves according to that instruction. If the data is informational text such as "Restaurant", "Petrol pump" or "Men at work", the server instructs the robot to speak the received data, after which the robot waits for the next instruction.
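As a minimal sketch of this server-side decision step, the Java fragment below maps a recognized landmark text to a single-character command for the robot. The method name, command characters and landmark strings are illustrative assumptions, not the exact code or command set used in the project.

```java
public class NavigationDecision {

    // Maps the text recognized by OCR to a one-character robot command.
    // 'V' stands for "voice": speak the landmark text and wait for the next instruction.
    static char decide(String landmark) {
        if ("LEFT".equalsIgnoreCase(landmark))  return 'L';  // turn left
        if ("RIGHT".equalsIgnoreCase(landmark)) return 'R';  // turn right
        if ("STOP".equalsIgnoreCase(landmark))  return 'S';  // halt
        return 'V';                                          // informational sign
    }

    public static void main(String[] args) {
        // Stand-in for the OCR result extracted from the uploaded image.
        String landmark = "Restaurant";
        System.out.println("Recognized \"" + landmark + "\" -> command " + decide(landmark));
    }
}
```

In the actual system the command produced by this step is sent back over GPRS to the mobile, which forwards it to the robot over Bluetooth.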

7.2 Operating Environment

The mobile application is created using Android and the server application is created using Java, which is platform independent; hence the server application runs on all platforms. An Android cell phone with Android OS 2.1 or above is needed, and it should be GPRS enabled.

7.3 Design and Implementation Constraints

This system is developed using the Android and Java programming environments. On the client side, the application uses the built-in text-to-speech facility to produce speech output. Almost all Android phones come with this feature.



7.4 User Documentation

This project consists of two parts.

1. Server: two applications run on the server.

a) Web application: receives the captured image from the mobile; the received image is stored on the local disk (a sketch of this step is given after this list).

b) Java application: applies OCR to the received image, identifies the particular landmark text/sign and sends the instructions to the mobile to navigate the robot.

2. Client (mobile): an Android application runs on the phone; when the data comes from the robot it takes a snapshot of the sign board, sends it to the server through GPRS for processing, and receives the instructions from the server for robot navigation.
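The following is a minimal sketch of the web-application part described in 7.4(a), assuming the mobile posts the raw image bytes in the HTTP request body; the class name, target file path and upload format are illustrative assumptions rather than the project's actual code.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Receives the image posted by the mobile and stores it on the local disk.
public class ImageUploadServlet extends HttpServlet {

    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        InputStream in = req.getInputStream();
        OutputStream out = new FileOutputStream("C:/Auto_Navigation/received.jpg"); // assumed path
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);          // copy the uploaded bytes to disk
        }
        out.close();
        resp.getWriter().write("OK");      // acknowledge the upload to the mobile
    }
}
```

The OCR application then picks up the stored file, recognizes the landmark and returns the corresponding command, as outlined in Section 7.1.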

7.5 Hardware Interfaces

1. Bluetooth- and GPRS-enabled Android cell phone
2. Robot vehicle model
3. Microcontroller
4. Regulators
5. Battery
6. Bluetooth module
7. RFID reader
8. RF cards

7.6 Software Interfaces

1. Application: Java, Android.
2. Network: the application depends on the internet (GPRS) and Bluetooth.
3. Mobile operating system: Android OS 2.1 or a higher version.
4. Text to speech: used to convert text to voice (see the sketch below).
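As a minimal sketch of this text-to-speech step, the Android fragment below speaks a landmark string using the platform's built-in TTS engine (available on Android 2.1 and above). The class name and the spoken string are illustrative assumptions.

```java
import android.app.Activity;
import android.os.Bundle;
import android.speech.tts.TextToSpeech;
import java.util.Locale;

// Speaks the landmark text received from the server using Android's built-in TTS engine.
public class SpeakLandmarkActivity extends Activity implements TextToSpeech.OnInitListener {

    private TextToSpeech tts;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        tts = new TextToSpeech(this, this);   // asynchronous engine initialisation
    }

    @Override
    public void onInit(int status) {
        if (status == TextToSpeech.SUCCESS) {
            tts.setLanguage(Locale.US);
            // QUEUE_FLUSH discards anything queued earlier and speaks immediately.
            tts.speak("Restaurant", TextToSpeech.QUEUE_FLUSH, null);
        }
    }

    @Override
    protected void onDestroy() {
        if (tts != null) {
            tts.shutdown();                   // release the TTS engine
        }
        super.onDestroy();
    }
}
```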

7.7 Software Requirements

1. JDK 1.6.1_01 or above
2. Android SDK
3. Eclipse Galileo



7.8 PC Requirements

1. Dual-core processor
2. 40 GB HDD
3. 1 GB RAM
4. Static-IP GPRS connection

7.9 How to Execute

On the Server Part

Step 1:

1. Copy Auto_Navigation to "C:\Apache Tomcat 6.0.16\webapps".
2. Remove the read-only attribute for this folder.
3. Copy C:\Apache Tomcat 6.0.16\webapps\Auto_Navigation to the C:\ drive only.

Fig. 7.1 Snapshot of the Apache Tomcat Server in C drive


Step 2: Double-click run.bat in C:\Apache Tomcat 6.0.16\bin. The server will be started.

Step 3: Once the server is started, it will wait for the file from the mobile.

Fig. 7.2 Snapshot of the Apache Tomcat Server under Execution



Step 4: Then run the OCR application on the server.

Fig. 7.3 Snapshot of the OCR Server Side Application


Step 5: The application will be waiting for the image from the mobile. Once it receives the image, it applies OCR and identifies the landmark.

Fig. 7.4 Snapshot of the OCR Server Side Application Waiting for the Input Data from Mobile


Step 6: Once the sign has been identified, the server sends instructions to the mobile for navigation of the robot.

Fig. 7.5 Snapshot of the OCR Server Side Application Which Reads the Data

On the Mobile Part

We need to install the Android application (.apk) file on the mobile (client) and establish the connection by entering the Bluetooth MAC address and the server IP address in the application. The mobile device should be paired with the robot's Bluetooth module, and the handset should be GPRS and Bluetooth enabled for the transfer of data. A minimal sketch of the Bluetooth link to the robot is given below.
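The sketch below, assuming a standard serial-port-profile (SPP) Bluetooth module on the robot, opens an RFCOMM socket to the entered MAC address and writes a one-character command. The class name, the way the MAC address is obtained and the command characters are assumptions for illustration.

```java
import android.bluetooth.BluetoothAdapter;
import android.bluetooth.BluetoothDevice;
import android.bluetooth.BluetoothSocket;
import java.io.IOException;
import java.io.OutputStream;
import java.util.UUID;

// Opens an RFCOMM (serial port profile) link to the robot's Bluetooth module
// and sends a one-character command received from the server.
public class RobotLink {

    // Standard SPP UUID used by common serial Bluetooth modules.
    private static final UUID SPP_UUID =
            UUID.fromString("00001101-0000-1000-8000-00805F9B34FB");

    public static void sendCommand(String macAddress, char command) throws IOException {
        BluetoothAdapter adapter = BluetoothAdapter.getDefaultAdapter();
        BluetoothDevice robot = adapter.getRemoteDevice(macAddress);
        BluetoothSocket socket = robot.createRfcommSocketToServiceRecord(SPP_UUID);
        adapter.cancelDiscovery();        // discovery slows down the connection
        socket.connect();                 // blocking call; run off the UI thread
        OutputStream out = socket.getOutputStream();
        out.write(command);               // e.g. 'L', 'R' or 'S' from the server
        out.flush();
        socket.close();
    }
}
```

Because connect() blocks, the real application would invoke this from a background thread rather than the UI thread.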



Chapter 8

RESULTS AND DISCUSSION

8.1 Experimental Results

The objective of this project is to develop a mapless navigation method for service robots, and the project fulfils this purpose. For this system, a study was conducted on the accuracy level achieved for different input data; this is important because accuracy is an indicator of feasibility and efficiency. Both the accuracy rates and some drawbacks of the OCR-based mapless navigation method are considered.

8.2 Accuracy Rates

The experiments show consistent results, with accurate classification of landmark sign patterns and correct speech output. Table 8.2 shows the accuracy level for different input data. The data are grouped as trained text/signs, untrained but similar text/signs, and text with irregular font size.

Table 8.2 Accuracy rates of the system

Text/Sign              Accuracy rate
Trained                100%
Untrained              97%
Irregular font size    98%

8.3 Drawbacks

As this method is new and the optical character recognition is still at a preliminary level, the main drawback is that the system needs further refinement to become more accurate. As with other neural networks, the training time of the Kohonen Neural Network increases as the number of characters, words and landmark signs grows. In addition, a fixed picture size of 250 x 250 pixels is assumed, so the system does not work for images of other sizes and needs to be generalized. Finally, the system cannot handle small text or signs captured from a long distance: it must first grab the pixels from the original image and then map them, and for small text or sign images this pixel information cannot be recovered. This makes it difficult to capture landmarks from a distance, and the camera has to be focused manually for better input.



Chapter 9

CONCLUSION

This chapter provides a brief conclusion of the OCR-based mapless navigation method.

9.1 Conclusion

This report tries to present a mapless navigation method in the simplest possible manner. A method is proposed that applies the cue humans use for navigation, the landmark, to achieve mapless navigation of robots. In this project, OCR (optical character recognition) is implemented to track the landmark using a Kohonen Neural Network. There are many other ways to implement OCR that could be more efficient than a Kohonen Neural Network; the main reason for selecting the Kohonen Neural Network here is to reduce the computational cost and thereby facilitate real-time implementation. After locating and tracking the landmark, the semantic information of the texts and arrows contained in those signs is extracted, and the result is used to guide the robot to the destination.
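To illustrate why a Kohonen (self-organising map) classifier is computationally cheap at recognition time, the sketch below shows the winner-take-all step: recognising a downsampled input amounts to one distance computation per trained output neuron. The vector sizes and weights are assumed for illustration only and do not reproduce the project's actual network.

```java
// Minimal winner-take-all step of a Kohonen network: the recognised class is the
// output neuron whose weight vector is closest to the input vector.
public class KohonenRecognizer {

    static int classify(double[] input, double[][] weights) {
        int best = -1;
        double bestDist = Double.MAX_VALUE;
        for (int neuron = 0; neuron < weights.length; neuron++) {
            double dist = 0.0;
            for (int i = 0; i < input.length; i++) {
                double d = input[i] - weights[neuron][i];
                dist += d * d;               // squared Euclidean distance
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = neuron;               // current best matching unit
            }
        }
        return best;                          // index of the recognised landmark class
    }

    public static void main(String[] args) {
        double[] input = {1, 0, 1, 1};                    // stand-in downsampled glyph
        double[][] weights = {{1, 0, 1, 1}, {0, 1, 0, 0}}; // two trained classes
        System.out.println("Recognised class: " + classify(input, weights));
    }
}
```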



Chapter 10

SCOPE FOR FUTURE WORK

This chapter gives the scope for future work in the research areas of the OCR procedure and the mapless navigation method.

10.1 Future Work

Currently this OCR-based mapless navigation method uses a single low-cost mobile camera to capture the image, and the camera is focused on the landmark manually. Visual sensing will, however, be essential for mobile robots to progress towards increased robustness, reliability, reduced cost and reduced power consumption. If robots can make use of computationally efficient algorithms and off-the-shelf cameras with extra features (e.g., high resolution, auto focus, auto detect, night vision, shake-free operation, capturing images during motion), then the opportunity exists for robots to be widely deployed in outdoor environments as well.

Considerably more work will need to be done in future to arrive at a complete OCR-based mapless navigation strategy. The limits of the system need to be extended by increasing the number of characters and signs it can recognize, so that a wide variety of landmarks is covered. A better machine-learning algorithm should be considered for robot self-learning, and in future artificial intelligence could be used to let the robot take its own decisions during navigation. Finally, it is hoped that the method can be applied in the field of service robots, so that they become more adaptive to the everyday human environment and offer better service.





APPENDIX A

DOCUMENT CONVENTIONS

The following is the list of conventions and acronyms used in this report.

AIDC: Automatic Identification and Data Capture.
ANDROID: Mobile operating system and platform on which the client application is built.
API: Application Programming Interface.
ASF: Apache Software Foundation.
DAS: Driver Assistance System.
Data Flow Diagram (DFD): Shows the data flow between the entities.
DC: Direct Current.
Eclipse: Tool (IDE) used to develop the applications.
GPRS: General Packet Radio Service.
IEEE: Institute of Electrical and Electronics Engineers.
JAVA: A widely used general-purpose programming language.
JSP: Java Server Pages.
JVM: Java Virtual Machine.
MICR: Magnetic Ink Character Recognition.
Mobile: The device on which the Android application runs and which guides the robot.
OCR: Optical Character Recognition.
Pixel: Smallest physical element in a raster image.
SDK: Software Development Kit, used to develop and run the Android application.
Server: The application running on the server PC that waits for the input data from the mobile.
TCP/IP: Transmission Control Protocol / Internet Protocol.
Text to Speech: An engine used for converting text to speech.
Tomcat Server: The web container in which the server application runs.
TTL: Transistor-Transistor Logic.
VSRR: View Sequenced Route Representation.

