
Natural Language Processing with Text-to-Speech on Android, Sonal Bhatt, May 2011

Natural Language Processing with Text-to-Speech on Android (May 2011)

Sonal Bhatt, Graduate Student, Arizona State University, Division of Computing Studies.

Email: sbbhatt1@asu.edu

Abstract— As the use of mobile devices expands and affects various aspects of human life, the number of smartphone users is increasing dramatically. Consequently, robust interaction between the smartphone and its human user is essential for better system performance. This paper presents a detailed implementation approach for an interactive natural language system, using a ‘Cab Reservation’ application on an Android smartphone. By using the speech synthesizer technology available on Android, the application delivers its responses through the text-to-speech modality on the device.

Index Terms— natural language processing, speech recognition, text-to-speech synthesizer, artificial intelligence, semantic structure, conversational dialog, ad hoc queries.

I. INTRODUCTION

A speech application can be defined as interaction between the user and the computer in a more natural way. Because people find speaking natural and easy, it is highly advantageous to incorporate speech into natural language processing software. Conversational dialog is a verbal exchange that takes place turn by turn between a human and a computer, with feedback and acknowledgements to indicate understanding.

The field of Artificial Intelligence (AI) and the idea of a machine holding a dialog with humans are as old as the field of computer science itself. In 1950, the British mathematician Alan Turing proposed the Turing Test in his paper “Computing Machinery and Intelligence” [1]. In the Turing Test, a user A sits at a terminal with a keyboard. At the other end, a second user B sits at a different terminal, and there is also a computer program designed to maintain humanlike conversations. User A cannot see who or what is at the other end. User A then converses with both user B and the computer program. If user A cannot tell the difference between user B and the computer program, the program is said to have passed the “Turing Test”. Since Turing’s paper was published, the Turing Test has for many years been regarded as the ultimate goal of AI and conversational systems.

A speech application should be based on an understanding of the different ways in which people use language to communicate [2]. Nowadays people use texting and IVR (Interactive Voice Response) to communicate with computers via cell phone. An IVR system can be operated with the telephone’s keypad or with speech recognition. To order or book something with this kind of application, the user must follow an exact conversational dialog: the IVR plays prerecorded audio that directs the user how to proceed. With a speech recognizer and a speech synthesizer, IVR-based applications can be deployed to automobile systems for hands-free operation [3]. Most of the time, IVR-based applications are used for transactional dialog in which the grammar is predefined and the user is bound to say or type restricted queries.
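The restriction can be illustrated with a minimal Java sketch of an IVR-style prompt driven by a predefined grammar: only listed keypad digits or exact phrases are accepted, and everything else triggers a re-prompt. The class name, menu options and action names are hypothetical and are not taken from any real IVR platform.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical IVR-style prompt: only inputs listed in the predefined grammar are accepted. */
class IvrMenu {
    // Predefined grammar: keypad digit or exact spoken phrase -> action name.
    private final Map<String, String> grammar = new LinkedHashMap<>();

    IvrMenu() {
        grammar.put("1", "BOOK_CAB");
        grammar.put("book a cab", "BOOK_CAB");
        grammar.put("2", "CHECK_RATES");
        grammar.put("rates", "CHECK_RATES");
        grammar.put("0", "OPERATOR");
        grammar.put("operator", "OPERATOR");
    }

    /** Returns the action for an in-grammar input, or empty when the input is out of grammar. */
    Optional<String> match(String input) {
        return Optional.ofNullable(grammar.get(input.trim().toLowerCase(Locale.ROOT)));
    }

    /** Confirms a recognized action, or re-prompts with the only options the grammar allows. */
    String respond(String input) {
        return match(input)
                .map(action -> "OK: " + action)
                .orElse("Sorry, I did not understand. Press 1 or say \"book a cab\", "
                        + "press 2 or say \"rates\", or press 0 or say \"operator\".");
    }
}
```

A free-form request such as “I need a taxi to the airport” falls outside this grammar and is rejected, which is the limitation that motivates a natural language approach.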

Despite the advanced AI tools available, the question has always remained how to translate a semantic structure into computer queries or commands that can reuse existing commercial applications and databases proprietary to a specific business. Furthermore, AI-oriented languages are difficult to use and comprehend. In recent years, software developers have been forced to abandon these languages, which are better suited to natural language, and instead develop specific dialog flows from scratch using Java, C++ and, more recently, VoiceXML. The dialog is then designed for the specific application, but it tends to limit the user to specific commands because of the task-oriented nature of these languages. Although these languages have object-oriented capabilities, they are still very much task-oriented.

A. Problem Statement

This project implements the process of developing a conversational dialog for booking a cab. It allows complex natural language requests in both text and speech. In this application, I have designed and implemented a language dialog for the Android smartphone that supports the process of booking a cab. It also allows the user to ask open-ended questions related to the cab, which makes the interaction more conversational. I have accounted for all possible outcomes of the user’s utterances and have built a reply for every possible situation. The application takes user input as text, either typed on the keypad of the Android smartphone or produced by the speech-to-text facility of the Android speech recognizer. In this project I have developed dictionaries of the words used in the domain of making a cab reservation. The words are categorized using English linguistic knowledge; the dictionaries contain nouns, verbs, adjectives, pronouns and numbers.

The ‘Cab Reservation’ application can be interrupted, switching from the exact transactional dialog to a simple question-answer, knowledge-base conversational dialog. The exact dialog allows the user to book a cab from one location to another. It is designed to take as input the departure, destination, date, time and type of vehicle the user wants to reserve. The application also lets the user ask open-ended questions not related to the reservation. Such a question can be called an ad hoc query; it lets the user get information about the cab, such as rates, car seats and payment methods. This dialog has to be paired with complex business logic at the application level in order to support all such possible outcomes. VoiceXML-based systems are currently available, but they do not handle spontaneous user requests and interruptions in the dialog; those applications support only a static dialog flow.
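The interplay between the exact dialog and ad hoc queries can be sketched as a slot-filling loop. The Java sketch below is an illustration under stated assumptions, not the author’s implementation: the class name, the fixed slot order and the keyword-to-answer table (with placeholder answers) are hypothetical, and ad hoc questions are detected by naive substring matching.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Hypothetical mixed-initiative dialog: slots are filled in a fixed order (exact dialog),
 * but an ad hoc question may interrupt any turn without losing the slots filled so far.
 */
class CabReservationDialog {
    private static final List<String> SLOTS =
            List.of("departure", "destination", "date", "time", "vehicle type");

    // Placeholder knowledge base for ad hoc queries: keyword -> canned answer.
    private static final Map<String, String> FAQ = new LinkedHashMap<>();
    static {
        FAQ.put("rate", "[answer about rates]");
        FAQ.put("seat", "[answer about car seats]");
        FAQ.put("pay", "[answer about payment methods]");
    }

    private final Map<String, String> filled = new LinkedHashMap<>();

    /** Prompt for the next empty slot, or a confirmation once every slot is filled. */
    String nextPrompt() {
        if (filled.size() == SLOTS.size()) {
            return "Booking your cab with " + filled + ". Is that correct?";
        }
        return "Please tell me the " + SLOTS.get(filled.size()) + ".";
    }

    /** Handles one user turn and returns the system's reply. */
    String handle(String utterance) {
        String lower = utterance.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> faq : FAQ.entrySet()) {
            if (lower.contains(faq.getKey())) {
                // Ad hoc query: answer it, then resume the pending transactional prompt.
                return faq.getValue() + " " + nextPrompt();
            }
        }
        if (filled.size() < SLOTS.size()) {
            // Exact dialog: the utterance fills the slot currently being asked for.
            filled.put(SLOTS.get(filled.size()), utterance.trim());
        }
        return nextPrompt();
    }

    /** Copy of the slots filled so far, in dialog order. */
    Map<String, String> slots() {
        return new LinkedHashMap<>(filled);
    }
}
```

Because the filled slots are kept separately from the prompt sequence, an ad hoc question only delays the next prompt instead of restarting the dialog. The substring matching is deliberately naive: a departure such as “Seattle” would wrongly trigger the “seat” entry.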

The implementation uses the speech synthesizer available on the Android device: the Android text-to-speech (TTS) library lets the user hear the system’s response.

B. Motivation

Since speech is such a natural medium for communication, users' expectations of a speech application tend to be extremely high. There are particular situations in which people want to use a speech application, for example when their hands and eyes are busy: while driving a car, looking for a location or ordering something over the phone. Sometimes people simply want to access their e-mail while driving. In such situations people use a speech application expecting a successful dialog with accurate speech-to-text translation.

People make exact transactional requests to airline information systems, banking and ordering systems, hotel reservation systems, ATMs and many more. Any online system may require transactions that are simply information retrieval through open-ended questions, while some systems use an exact request-confirmation dialog. By developing this application, ease of access for ‘Cab Reservation’ can be achieved. The integration of speech-to-text and text-to-speech allows the application to be used in environments where people cannot text. The multimodality of texting and speech also allows the user to book a cab in situations where they cannot speak loudly. Furthermore, the features of the ‘Cab Reservation’ application make it easy to use for people with disabilities.

C. Applications

The first thing that comes to mind when we talk about text-to-speech applications is aiding people with disabilities. Blind people benefit widely from text-to-speech systems when they are coupled with Optical Character Recognition (OCR) systems [4], which give them access to written information. The market for speech synthesis for blind users of personal computers will soon be invaded by mass-market synthesizers bundled with sound cards; DECtalk (TM) is already available with the latest SoundBlaster cards. When a computer-aided learning system is combined with a text-to-speech (TTS) synthesizer, it provides a more helpful language education tool [4].

More natural communication between human and machine can be achieved with text-to-speech (TTS) together with a good voice recognizer. However, a good voice recognizer alone does not ensure successful communication in a more natural way; there also has to be precise natural language understanding. For domain-specific communication, the natural language parser plays an important role in high-quality multimedia communication [5].

D. Expected End Results/Goals

By implementing the ‘Cab Reservation’ application, the following accomplishments are expected to be achieved:

· Allowing users to be spontaneous at any point in the transactional reservation dialog and in information retrieval about the cab.

· Allowing users to speak naturally and ask questions in different ways, with feedback or acknowledgement prompts at each step.

· Understanding the user’s request and categorizing it as either an exact transaction or an open-ended (fuzzy) question-answer conversation.

· Building dialogs that interact with users in a more natural way and engage them in more natural conversations.

· Recognizing and understanding over 90% of the speaker’s requests, even long and complicated sentences.

· Interacting with a backend process for information retrieval and for updating, inserting or deleting data.

· Allowing users to receive the response in either speech or text.

Once all of the above items are completed, the end result will be the ‘Cab Reservation’ application with an Artificial Intelligence Agent (“AI Agent”) capable of processing the user’s natural speech and text to reserve a cab, engaging the user in conversation with follow-up prompts and interacting with a backend application, and in turn performing truly automated customer self-service with multimodal responses through the text-to-speech synthesizer on Android. Interactive voice response systems rely on telephone keypad input. In this ‘Cab Reservation’ speech application, by contrast, the user can speak any phrase, such as “I would like to reserve a cab from Phoenix zoo to Arizona State University”, instead of pressing different buttons or numbers for various options on the mobile device.
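As a hedged illustration of how such a phrase could yield the departure and destination, the following Java sketch uses a single regular expression for the pattern “from X to Y”. The class and pattern are hypothetical; the paper’s parser relies on its own grammars and dictionaries, which are not described at this level of detail.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: pull departure and destination out of a free-form request. */
class RouteExtractor {
    // "from <departure> to <destination>", ignoring one optional final punctuation mark.
    private static final Pattern FROM_TO = Pattern.compile(
            "\\bfrom\\s+(.+?)\\s+to\\s+(.+?)[.?!]?$", Pattern.CASE_INSENSITIVE);

    /** Immutable result holding the two extracted locations. */
    static final class Route {
        final String departure;
        final String destination;

        Route(String departure, String destination) {
            this.departure = departure;
            this.destination = destination;
        }
    }

    /** Returns the route if the utterance contains a "from ... to ..." phrase. */
    static Optional<Route> extract(String utterance) {
        Matcher m = FROM_TO.matcher(utterance.trim());
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new Route(m.group(1).trim(), m.group(2).trim()));
    }
}
```

Matching starts at “from”, so the earlier “like to reserve” does not interfere, and the lazy first group stops at the first “ to ” that follows. Requests phrased differently, such as “to the airport from the zoo”, would need additional patterns.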

II. RELATED WORK

Speech technology can use composition, transcription, transaction and collaboration dialogs, depending on the particular domain [4]. Natural Language Understanding (NLU) and speech recognition are two independent technologies. When the two are combined, they provide powerful human-computer interaction (HCI).

Natural language understanding has been an active area of research for decades. As the field of Artificial Intelligence (AI) has evolved, researchers have borrowed ideas from mathematics, linguistics, psychology and philosophy. Decades of research suggest that conventional computer programs and procedural paradigms were not suited to the challenge at hand. By procedural paradigms I am referring to task-oriented programming, such as a program written in a third-generation language like COBOL, Fortran or C [5]. Completely different languages and tools had to be created to help the development of conversational systems, such as Lisp (based on lambda calculus), Prolog (based on predicate calculus), Smalltalk (based on objects), semantic nets, frames, etc.

Many systems have been built to demonstrate various linguistic issues, such as ELIZA, Winograd’s SHRDLU (which simulated a robot that controlled blocks on a tabletop), LUNAR and LIFER [6].

ELIZA is a very early example of natural language processing. When I researched natural language processing, the first system that came to mind was ELIZA, written at MIT by Joseph Weizenbaum between 1964 and 1966 [6]. ELIZA worked by parsing the input and substituting keywords into canned phrases [ref]. Although the ELIZA program used almost no information about human thought or emotion, it provided human-like interaction.
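ELIZA’s keyword-and-substitution approach can be sketched in a few lines of Java. The rules below are invented for illustration and are far simpler than Weizenbaum’s original script: each rule is a decomposition pattern with a reassembly template, and the captured fragment has its pronouns reflected before it is inserted.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy ELIZA-style responder: keyword decomposition patterns plus pronoun reflection. */
class MiniEliza {
    // Decomposition pattern -> reassembly template; %s receives the reflected fragment.
    private static final Map<Pattern, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put(Pattern.compile(".*\\bi need (.*)", Pattern.CASE_INSENSITIVE), "Why do you need %s?");
        RULES.put(Pattern.compile(".*\\bi am (.*)", Pattern.CASE_INSENSITIVE), "How long have you been %s?");
    }

    // First-person words become second-person ones and vice versa.
    private static final Map<String, String> REFLECTIONS = Map.of(
            "i", "you", "me", "you", "my", "your", "am", "are", "you", "me", "your", "my");

    /** Swaps pronouns word by word, e.g. "my mother" -> "your mother". */
    static String reflect(String fragment) {
        StringBuilder out = new StringBuilder();
        for (String word : fragment.trim().split("\\s+")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(REFLECTIONS.getOrDefault(word.toLowerCase(Locale.ROOT), word));
        }
        return out.toString();
    }

    /** Applies the first matching rule; falls back to a content-free prompt. */
    static String respond(String input) {
        String cleaned = input.trim().replaceAll("[.!?]+$", "");
        for (Map.Entry<Pattern, String> rule : RULES.entrySet()) {
            Matcher m = rule.getKey().matcher(cleaned);
            if (m.matches()) {
                return String.format(rule.getValue(), reflect(m.group(1)));
            }
        }
        return "Please go on.";
    }
}
```

For example, “I need a cab for my mother.” yields “Why do you need a cab for your mother?”, even though the program knows nothing about cabs or mothers.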

Second, a chatterbot is designed to simulate an intelligent conversation with humans via speech or text. The chatterbot is based on the idea of the Turing Test described in the introduction of this paper. To generate a response, the chatterbot simply finds a keyword in the input and retrieves a reply from a database with matching keywords or wording patterns [6].
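The keyword lookup described above can be sketched as follows. This is a hypothetical Java illustration: the reply table is held in memory rather than in a database, matching is done on whole words, and the first stored keyword found in the input wins.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Toy keyword-matching chatterbot: replies come from a keyword -> reply table. */
class KeywordChatterbot {
    // In-memory stand-in for the reply database; insertion order decides priority.
    private final Map<String, String> replies = new LinkedHashMap<>();
    private final String fallback;

    KeywordChatterbot(String fallback) {
        this.fallback = fallback;
    }

    void addReply(String keyword, String reply) {
        replies.put(keyword.toLowerCase(Locale.ROOT), reply);
    }

    /** Returns the reply of the first stored keyword that occurs as a whole word in the input. */
    String reply(String input) {
        Set<String> words = new HashSet<>(Arrays.asList(
                input.toLowerCase(Locale.ROOT).split("[^a-z0-9']+")));
        for (Map.Entry<String, String> entry : replies.entrySet()) {
            if (words.contains(entry.getKey())) {
                return entry.getValue();
            }
        }
        return fallback;
    }
}
```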

In the field of linguistics, Chomsky proposed X-bar theory in 1970, and it was further developed by Jackendoff in 1977. X-bar theory identifies syntactic features presumed to be common to all human languages [7]. The letter X stands for an arbitrary part of speech. In the process of parsing speech or utterances, a lexical or ontological category is therefore assigned to each word: N is assigned to nouns, A to adjectives, V to verbs and P to prepositions. The main proposal of X-bar theory is that all phrases are defined by rules, and every such rule follows the same structural schema [8].

Figure 1: General Understanding of X-bar theory
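The category assignment described above amounts to a lexicon lookup. The following Java sketch tags each word with an X-bar lexical category; the lexicon entries are hypothetical, and unknown words are simply marked UNKNOWN rather than analyzed further.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Assigns X-bar lexical categories (N, V, A, P) to words from a small hand-built lexicon. */
class XBarTagger {
    enum Category { N, V, A, P, UNKNOWN }

    // Hypothetical domain lexicon: word -> lexical category.
    private static final Map<String, Category> LEXICON = Map.of(
            "mike", Category.N, "maria", Category.N, "cab", Category.N, "airport", Category.N,
            "likes", Category.V, "reserve", Category.V,
            "cheap", Category.A, "large", Category.A,
            "to", Category.P, "from", Category.P);

    /** Returns "word/CATEGORY" for each whitespace-separated word. */
    static List<String> tag(String sentence) {
        List<String> tagged = new ArrayList<>();
        for (String word : sentence.trim().split("\\s+")) {
            Category c = LEXICON.getOrDefault(word.toLowerCase(Locale.ROOT), Category.UNKNOWN);
            tagged.add(word + "/" + c);
        }
        return tagged;
    }
}
```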

According to this theory, the simple sentence “Mike likes Maria” is parsed into the syntactic categories shown in Figure 2. The zero-level word (category) “likes” combines with another element, “Maria”, to form an X-bar-level category called V-bar. When this X-bar-level category combines with a further element, “Mike”, the XP-level category VP is formed [9].
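The two projection steps in this analysis can be made concrete with a small tree type. The Java sketch below is a hypothetical illustration that follows the description above: the head V combines with its complement to form V' (V-bar), and V' combines with the specifier to form VP. The nouns are left as bare N nodes for brevity, where a fuller analysis would project them to NP.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Minimal X-bar tree node: a label plus ordered children (a leaf holds a word). */
class XBarNode {
    final String label;
    final List<XBarNode> children;

    XBarNode(String label, XBarNode... children) {
        this.label = label;
        this.children = Arrays.asList(children);
    }

    /** Zero-level category over a word, e.g. [V likes]. */
    static XBarNode word(String category, String token) {
        return new XBarNode(category, new XBarNode(token));
    }

    /** Labelled bracketing of the subtree. */
    String bracket() {
        if (children.isEmpty()) {
            return label;
        }
        return "[" + label + " "
                + children.stream().map(XBarNode::bracket).collect(Collectors.joining(" "))
                + "]";
    }

    /** "Mike likes Maria": head + complement -> V', then specifier + V' -> VP. */
    static XBarNode mikeLikesMaria() {
        XBarNode vBar = new XBarNode("V'", word("V", "likes"), word("N", "Maria"));
        return new XBarNode("VP", word("N", "Mike"), vBar);
    }
}
```

Printing the tree as a labelled bracketing gives [VP [N Mike] [V' [V likes] [N Maria]]].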

A text-to-speech synthesizer converts written text into sound. Today a number of speech synthesizers are available that, despite modest performance, produce satisfactory audio output in different languages such as English, Japanese and Swedish. The techniques used in speech synthesizers include concatenation of digital recordings and synthesis by rule, in which information is provided for the words to produce intonation and tone.

Figure 2: Parsing with X-bar theory

Figure 3 shows the functional diagram of a very general TTS synthesizer. The input text is first processed by natural language processing software using linguistic knowledge and some logical inferences. The text is then converted into a phonetic transcription with the desired intonation and rhythm [5]. Finally, it passes through digital signal processing, which transforms that symbolic information into speech with the help of mathematical models, algorithms and computations.

Figure 3: Text-to-Speech Synthesizer
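The first two stages of this pipeline (text analysis and phonetic transcription) can be sketched in Java as below; the digital signal processing stage is out of scope. Everything here is a simplifying assumption: normalization only spells out single digits, the pronunciation lexicon has a handful of ARPAbet-style entries, and intonation is reduced to a rising or falling label chosen from the final punctuation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy TTS front end: text normalization, then dictionary-based phonetic transcription. */
class TtsFrontEnd {
    private static final String[] DIGITS =
            {"zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"};

    // Tiny hypothetical pronunciation lexicon using ARPAbet-style symbols (no stress marks).
    private static final Map<String, String> LEXICON = Map.of(
            "your", "Y AO R",
            "cab", "K AE B",
            "arrives", "ER AY V Z",
            "in", "IH N",
            "five", "F AY V",
            "minutes", "M IH N AH T S");

    /** Stage 1: lower-case, drop punctuation, and spell out single digits. */
    static List<String> normalize(String text) {
        List<String> words = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty()) {
                continue;
            }
            if (token.length() == 1 && Character.isDigit(token.charAt(0))) {
                words.add(DIGITS[token.charAt(0) - '0']);
            } else {
                words.add(token);
            }
        }
        return words;
    }

    /** Stage 2: phonemes per word plus a coarse intonation label for the whole sentence. */
    static String transcribe(String text) {
        String intonation = text.trim().endsWith("?") ? "RISE" : "FALL";
        List<String> phones = new ArrayList<>();
        for (String w : normalize(text)) {
            // Unknown words would need letter-to-sound rules; here they pass through in <>.
            phones.add(LEXICON.getOrDefault(w, "<" + w + ">"));
        }
        return String.join(" | ", phones) + " / " + intonation;
    }
}
```

For instance, “Your cab arrives in 5 minutes.” becomes Y AO R | K AE B | ER AY V Z | IH N | F AY V | M IH N AH T S / FALL, which a DSP back end would then turn into audio.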

Most of the time, a text-to-speech synthesizer requires the user to provide specific, restricted text to pronounce. Sometimes the quality of the “emotional dynamics” also comes into play, as the output is not comparable to human speech. Although their output is less satisfactory, these synthesizers solve the problem in real time with limited memory requirements. An example of this technology is Emily (ref), which acts as a reading coach for children: it provides reading passages and makes corrections. Other examples of speech processing include DECtalk, DragonDictate and Phonetic Engine [6].

III. IMPLEMENTATION

A. Architecture

Figure 4: Software Architecture



To develop the “Cab Reservation” application with Natural Language Processing and Text-to-Speech, I used a client-server architecture. Because the goal of the application is to provide text-to-speech functionality on a mobile device, an Android device acts as the client; it can be any phone model running Android 2.2 or above. On the server side, the natural language parser is developed in Java. The grammars and dictionaries used by the ‘Cab Reservation’ application are stored in .txt format in the ‘Language’ directory of the application package.

User can c<strong>on</strong>nect <str<strong>on</strong>g>to</str<strong>on</strong>g> the server from the android ph<strong>on</strong>e by<br />

giving the host name and the port number of the server. User<br />

can speak or type the questi<strong>on</strong> <str<strong>on</strong>g>to</str<strong>on</strong>g> the device and then it<br />

c<strong>on</strong>nects <str<strong>on</strong>g>to</str<strong>on</strong>g> the server <str<strong>on</strong>g>to</str<strong>on</strong>g> parse the speech and process it using<br />

natural language processing. The server communicates <str<strong>on</strong>g>to</str<strong>on</strong>g> the<br />

android client and gives the resp<strong>on</strong>se in text format. <strong>Android</strong><br />

client is using <str<strong>on</strong>g>Text</str<strong>on</strong>g>-<str<strong>on</strong>g>to</str<strong>on</strong>g>-speech library <str<strong>on</strong>g>to</str<strong>on</strong>g> process the text in<str<strong>on</strong>g>to</str<strong>on</strong>g><br />

speech. Finally user can get the resp<strong>on</strong>se of the questi<strong>on</strong> asked<br />

in speech format <strong>on</strong> android device.<br />
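The round trip described above (the phone sends the question as text to the server's host name and port, and the server answers in text) can be sketched with plain Java sockets. This is a minimal sketch under stated assumptions, not the author's code: the one-line-per-request protocol and the names `CabTextServer` and `CabTextClient` are hypothetical, and the `handler` function stands in for the natural language parser. On the phone, the returned string would then be passed to Android's `android.speech.tts.TextToSpeech` engine, which is not shown.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Hypothetical server: one line of request text in, one line of response text out.
class CabTextServer implements AutoCloseable {
    private final ServerSocket serverSocket;
    private final Function<String, String> handler; // stands in for the NL parser

    CabTextServer(int port, Function<String, String> handler) throws IOException {
        this.serverSocket = new ServerSocket(port); // port 0 picks any free port
        this.handler = handler;
    }

    int port() {
        return serverSocket.getLocalPort();
    }

    // Accepts a single connection, answers one request, then closes that connection.
    void serveOnce() throws IOException {
        try (Socket socket = serverSocket.accept();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true)) {
            String request = in.readLine();
            out.println(handler.apply(request == null ? "" : request));
        }
    }

    @Override
    public void close() throws IOException {
        serverSocket.close();
    }
}

// Hypothetical client: what the phone does with the host name and port the user entered.
class CabTextClient {
    static String ask(String host, int port, String utterance) throws IOException {
        try (Socket socket = new Socket(host, port);
             PrintWriter out = new PrintWriter(
                     new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8))) {
            out.println(utterance);
            return in.readLine(); // text reply, to be handed to the TTS engine
        }
    }
}
```

A production server would call `accept()` in a loop and hand each connection to a worker thread; the sketch serves a single request to keep the protocol visible.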

B. Technologies and Libraries Used

The “Cab Reservation” application with Natural Language Processing (NLP) and text-to-speech (TTS) functionality is developed in Java 1.6 on the Windows operating system. In addition, it requires the Android operating system tools for the client. The Text-to-Speech library used on the Android device is open source and can be downloaded to any Android device from the Android Market. The same TTS library is also packaged in the software running on the Windows machine.

C. Design

I have developed two different types of grammar for this application. First, to support the transactional dialog for booking a cab, I created the cabexactgrammar.txt file; second, to handle ad-hoc, open-ended questions for cab information retrieval, I created cabfuzzygrammar.txt. To generate these grammars I used the NuGram IDE [10] plug-in for the Eclipse SDK. NuGram IDE is an open-source Eclipse plug-in for generating speech recognition grammars in Augmented Backus-Naur Form (ABNF). ABNF is a plain-text, non-XML representation of a traditional Backus-Naur Form (BNF) grammar. The body of a grammar consists of a set of rule definitions, each of which associates a rule name with a legal rule expansion. Because most grammars identify a set of words that a user might type or say, the top-level rule expansion in a grammar is usually a set of alternatives. For example:

$Vehicle = sedan | coup | minivan;

This rule is named ‘Vehicle’, and its expansion is the set of alternatives: the rule matches if the user says or types "sedan", "coup", or "minivan". The actual words a user might say are called tokens. I generated the grammars in ABNF format and then converted them into a simple text format so that the Natural Language Parser can read and process them easily.
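To make the rule format concrete, the sketch below parses a single alternatives-only rule such as `$Vehicle = sedan | coup | minivan;` and checks whether an utterance matches one of its tokens. It is an illustration under simplifying assumptions, not NuGram's or the author's parser: it handles one rule of plain alternatives only (no rule references, optional items, or repeats), and `AbnfRule` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical representation of one ABNF rule whose expansion is a set of alternatives.
final class AbnfRule {
    final String name;
    final List<String> alternatives;

    private AbnfRule(String name, List<String> alternatives) {
        this.name = name;
        this.alternatives = Collections.unmodifiableList(alternatives);
    }

    // Parses text of the form "$Name = tok1 | tok2 | tok3;".
    static AbnfRule parse(String line) {
        String text = line.trim();
        if (!text.startsWith("$") || !text.endsWith(";")) {
            throw new IllegalArgumentException("not a rule definition: " + line);
        }
        int eq = text.indexOf('=');
        if (eq < 0) {
            throw new IllegalArgumentException("missing '=': " + line);
        }
        String name = text.substring(1, eq).trim();
        String body = text.substring(eq + 1, text.length() - 1); // drop the trailing ';'
        List<String> alts = new ArrayList<>();
        for (String alt : body.split("\\|")) {
            String token = alt.trim().toLowerCase(Locale.ROOT);
            if (!token.isEmpty()) {
                alts.add(token);
            }
        }
        return new AbnfRule(name, alts);
    }

    // The rule matches when the whole normalized utterance equals one of its tokens.
    boolean matches(String utterance) {
        return alternatives.contains(utterance.trim().toLowerCase(Locale.ROOT));
    }
}
```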

Figure 5: Application Flow

Figure 5 describes the basic communication between the Android client and the ‘Cab Reservation’ application on the server. The Android smartphone takes the input as speech or as text, and the input is processed on the server by the natural language parser. The parser interprets the input, breaks it into parts, and categorizes the request either as a question-answer (fuzzy logic) request or as a cab booking transactional dialog, based on the two grammar files. The server retrieves the appropriate response from the knowledge base (a text database). The Android client then processes this response with the Text-to-Speech (TTS) library, playing the prompt aloud and displaying it on the screen.

An utterance is either converted into multiple (n-best) recognition results or captured from the text box of the client API, and is then sent to the server API. The server API takes the n-best phrases and processes them through the Natural Language Understanding Parser. The parser first checks the cab booking exact grammar; if parsing is not successful, it then checks the cab fuzzy logic grammar. The phrase is broken apart syntactically by the parser using the tokens and rules defined in the ABNF grammars, and the Conceptual Structure is assembled during this parsing. From this point on, the system no longer deals with phrases but with conceptual structures. Dictionaries of nouns, verbs, prepositions, adjectives, and numbers are used to form the semantic conceptual structure from the parts of speech; this is done by defining the tokens in the grammar file. As the parser goes through each rule looking for a correct match, it assigns each word to the proper ontological category defined as a token in the grammar file. For the booking grammar, the ontological categories are named after the states in the states file, which are the pieces of information needed to complete the cab booking process.
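The grammar order described above can be sketched as a fallback over the n-best list. In this sketch each grammar is modeled as a hypothetical `Predicate<String>` matcher rather than a real parser, and it assumes hypotheses are tried in rank order, each against the exact booking grammar first and the fuzzy grammar second; the text does not say whether every hypothesis is tried against the exact grammar before any reaches the fuzzy one. The names `GrammarCascade`, `ParseOutcome`, and `RequestType` are illustrative.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Which grammar accepted the utterance.
enum RequestType { EXACT, FUZZY }

// Hypothetical result: the accepted hypothesis and the grammar that accepted it.
final class ParseOutcome {
    final String phrase;
    final RequestType type;

    ParseOutcome(String phrase, RequestType type) {
        this.phrase = phrase;
        this.type = type;
    }
}

final class GrammarCascade {
    private final Predicate<String> exactGrammar; // stands in for cabexactgrammar.txt
    private final Predicate<String> fuzzyGrammar; // stands in for cabfuzzygrammar.txt

    GrammarCascade(Predicate<String> exactGrammar, Predicate<String> fuzzyGrammar) {
        this.exactGrammar = exactGrammar;
        this.fuzzyGrammar = fuzzyGrammar;
    }

    // Walks the n-best list in rank order; for each hypothesis the exact
    // grammar is tried first and the fuzzy grammar only if that fails.
    Optional<ParseOutcome> parse(List<String> nBest) {
        for (String hypothesis : nBest) {
            if (exactGrammar.test(hypothesis)) {
                return Optional.of(new ParseOutcome(hypothesis, RequestType.EXACT));
            }
            if (fuzzyGrammar.test(hypothesis)) {
                return Optional.of(new ParseOutcome(hypothesis, RequestType.FUZZY));
            }
        }
        return Optional.empty(); // no grammar accepted any hypothesis
    }
}
```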

For example, if the user asks the open-ended question “Can I bring my pet?”, the Natural Language Parser breaks the sentence into parts (words) and finds the right category for each word based on the dictionaries in the system. Here the word ‘pet’ falls into the ‘noun’ category, so the parser puts it into the ontological category named DOBJECT; for the verb ‘bring’ the ontological category is EVENT. When it parses the request, the parser also changes the first-person ‘I’ in the question to the second-person ‘you’. The conceptual structure built for the example question is:

(THING (SUBJ (VALUE you)) (EVENT bring (DOBJECT my (VALUE pet))))
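The word-by-word categorization in this example can be sketched as dictionary lookups that fill the SUBJ, EVENT, and DOBJECT slots and print them in the bracketed notation above. The tiny in-memory word sets, the pronoun table, and the fixed layout for questions shaped like “auxiliary, subject, verb, [possessive], object” are simplifying assumptions, and `ConceptBuilder` is a hypothetical name; the described system reads its dictionaries from text files and takes its rules from the grammar.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Set;

final class ConceptBuilder {
    // Hypothetical stand-ins for the noun, verb and possessive dictionary files.
    private static final Set<String> NOUNS = Set.of("pet", "cab", "taxi", "luggage");
    private static final Set<String> VERBS = Set.of("bring", "book", "reserve");
    private static final Set<String> POSSESSIVES = Set.of("my", "your");
    // A first-person subject in the question becomes second person in the structure.
    private static final Map<String, String> SUBJECT_PRONOUNS = Map.of("i", "you", "we", "you");

    // Builds (THING (SUBJ ...) (EVENT verb (DOBJECT ...))) for simple questions;
    // returns null when a subject, verb or object cannot be found.
    static String build(String question) {
        String cleaned = question.toLowerCase(Locale.ROOT).replaceAll("[^a-z ]", " ").trim();
        String subject = null, verb = null, possessive = null, object = null;
        for (String word : cleaned.split("\\s+")) {
            if (subject == null && SUBJECT_PRONOUNS.containsKey(word)) {
                subject = SUBJECT_PRONOUNS.get(word);
            } else if (verb == null && VERBS.contains(word)) {
                verb = word;       // EVENT category
            } else if (possessive == null && POSSESSIVES.contains(word)) {
                possessive = word; // qualifier of the direct object
            } else if (object == null && NOUNS.contains(word)) {
                object = word;     // DOBJECT category
            }
            // words found in no dictionary (e.g. "can") are skipped
        }
        if (subject == null || verb == null || object == null) {
            return null;
        }
        String dobject = possessive == null
                ? "(DOBJECT (VALUE " + object + "))"
                : "(DOBJECT " + possessive + " (VALUE " + object + "))";
        return "(THING (SUBJ (VALUE " + subject + ")) (EVENT " + verb + " " + dobject + "))";
    }
}
```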

Figure 6: Dialog Director

As shown in Figure 6, the Dialog Director is responsible for keeping track of the conversation and building the dialog flow on the fly. If parsing with the cab booking transactional grammar is successful, the Dialog Director determines that the request is an “exact request” (booking type); if parsing with the cab fuzzy logic grammar is successful, it determines that the request is a “fuzzy request” (ad-hoc query). If the request is an ad-hoc query or question, the Dialog Director determines whether it is interrupting a sub-dialog and, if so, keeps the current dialog context. If the request interrupts the dialog for any other reason, the Dialog Director determines whether to interrupt the current sub-dialog with the new request. The Dialog Director also keeps track of where the dialog is at any given point in time: whether the user is making a correction, whether the dialog is at a confirmation, or whether the user wants the prompt repeated.
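One way to picture the Dialog Director's bookkeeping is a small object that remembers the active booking sub-dialog, answers an interrupting ad-hoc question without discarding that context, and then repeats the pending booking prompt. This is a sketch under assumptions: `DialogDirector` and its methods are hypothetical, the knowledge-base lookup is passed in as a function, and corrections and “repeat” requests are left out for brevity.

```java
import java.util.function.Function;

final class DialogDirector {
    private String pendingBookingPrompt; // null when no booking sub-dialog is active
    private final Function<String, String> knowledgeBase; // fuzzy question -> answer text

    DialogDirector(Function<String, String> knowledgeBase) {
        this.knowledgeBase = knowledgeBase;
    }

    // An exact (booking) request starts or continues the sub-dialog; its next prompt is remembered.
    String onExactRequest(String nextPrompt) {
        pendingBookingPrompt = nextPrompt;
        return nextPrompt;
    }

    // A fuzzy (ad-hoc) request is answered; if it interrupted a booking,
    // the booking context is kept and its prompt is repeated after the answer.
    String onFuzzyRequest(String question) {
        String answer = knowledgeBase.apply(question);
        return pendingBookingPrompt == null ? answer : answer + " " + pendingBookingPrompt;
    }

    boolean inBookingDialog() {
        return pendingBookingPrompt != null;
    }

    // Called once the booking has been confirmed or abandoned.
    void endBooking() {
        pendingBookingPrompt = null;
    }
}
```

Keeping only the pending prompt is the simplest context that lets the booking resume; a fuller version would keep the partially filled states as well.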

Once the Dialog Director determines how the request must be processed, a conceptual structure is sent either to the cab information dialog or to the cab booking dialog, as shown in Figure 6. If the request is for open-ended information retrieval, the director consults the XML knowledge base, cabanswer.xml, to retrieve a correct answer (in the form of a conceptual structure). These answers are also converted into conceptual networks and compared heuristically with the conceptual structure of the request; a successful comparison yields the correct answer. If the director categorizes the request as a transactional booking dialog, it consults the cabstates.xml file to issue the next prompt for the user.

Coding Details<br />

The cab booking application with natural language processing sets up an artificial intelligence environment. The application engine builds the dialog on the fly, depending on the specific situation in the conversation, where the number of possible situations can grow exponentially. Building the dialog this way gives it a more natural flow than would be easy to achieve with another programming paradigm. The nature of the conversations is driven by both grammars described above, together with the declarative XML definitions in cabstates.xml (for the booking dialog) and cabanswer.xml (for knowledge base answers). Changing the grammars and the XML definitions changes the nature and content of the conversations. The implementation contains the following components:

· A Java-based API for the natural language parser, which takes user input and parses it into a meaningful semantic, conceptual form that the system can understand.

· A directory of text files that serve as dictionaries, categorized according to English grammar (parts of speech).

· A cab booking grammar in ABNF format for exact queries and booking transactions.

· A cab fuzzy logic grammar for handling ad-hoc, open-ended information retrieval requests.

These grammars are more powerful because they can use the tokens of the comprehensive dictionaries included in the system. The semantic information refers to ontological categories or other special categories freely chosen by the user. Whereas the transactional booking dialog uses the cab booking grammar and the cabstates.xml file to build the conceptual structure, ad-hoc questions use the fuzzy logic grammar to parse the speech and build a semantic structure. This process uses the noun, verb, preposition, adjective, and number dictionary files to map words to semantic ontological categories. An ontological dictionary in the system assigns words to the categories place, quantity, time, thing, and manner.
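Loading the dictionary files into a single word-to-category lookup might look like the sketch below. The paper does not give the file format, so the layout here is an assumption: one word per line, with the category taken from the file name (a hypothetical `nouns.txt` yields the category `NOUNS`). `DictionaryLoader` is likewise a hypothetical name.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class DictionaryLoader {
    // Reads every "<category>.txt" file in dir (one word per line) into a word -> category map.
    static Map<String, String> load(Path dir) throws IOException {
        Map<String, String> wordToCategory = new HashMap<>();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(dir, "*.txt")) {
            for (Path file : files) {
                String fileName = file.getFileName().toString();
                String category = fileName.substring(0, fileName.length() - ".txt".length())
                        .toUpperCase(Locale.ROOT);
                for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
                    String word = line.trim().toLowerCase(Locale.ROOT);
                    if (!word.isEmpty()) {
                        // A word listed in several files keeps the first category read;
                        // directory iteration order is unspecified.
                        wordToCategory.putIfAbsent(word, category);
                    }
                }
            }
        }
        return wordToCategory;
    }
}
```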

These categories are mapped to a word when the question starts with ‘how’, ‘what’, ‘why’, ‘where’, ‘when’, ‘how many’, or ‘how much’. For example, Figure 7 shows the different ontological categories for the sentence “Can I bring my pet in cab?”

Figure 7: Ontological Category
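The question-word rule can be sketched as a longest-match lookup, so that ‘how many’ and ‘how much’ are recognized before plain ‘how’. The specific pairings below (how many/how much to QUANTITY, how to MANNER, what to THING, where to PLACE, when to TIME) are assumptions read off the category names, and ‘who’ to SUBJECT follows the ‘who’ example discussed with Figure 8; the text does not say which category ‘why’ maps to, so it is left out. `QuestionCategory` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

final class QuestionCategory {
    // Assumed mapping; insertion order puts multi-word entries before "how".
    private static final Map<String, String> WH_TO_CATEGORY = new LinkedHashMap<>();
    static {
        WH_TO_CATEGORY.put("how many", "QUANTITY");
        WH_TO_CATEGORY.put("how much", "QUANTITY");
        WH_TO_CATEGORY.put("how", "MANNER");
        WH_TO_CATEGORY.put("what", "THING");
        WH_TO_CATEGORY.put("where", "PLACE");
        WH_TO_CATEGORY.put("when", "TIME");
        WH_TO_CATEGORY.put("who", "SUBJECT");
    }

    // Returns the category the answer must supply, if the question starts with a known wh-word.
    static Optional<String> missingCategory(String question) {
        String q = question.toLowerCase(Locale.ROOT).replaceAll("[^a-z ]", " ").trim();
        for (Map.Entry<String, String> e : WH_TO_CATEGORY.entrySet()) {
            String wh = e.getKey();
            if (q.equals(wh) || q.startsWith(wh + " ")) {
                return Optional.of(e.getValue());
            }
        }
        return Optional.empty(); // e.g. yes/no questions such as "Can I bring my pet?"
    }
}
```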

Once the conceptual structure has been built for the user request, the actual answer search begins. The answer has to fill the missing ontological category. Therefore, when there are two answers in the answer set with the same subjects and events, the XML knowledge base system tries to find the best answer: the one whose conceptual structure supplies the ontology missing from the question. The following scenario explains how semantic transformation and answer search proceed when there is more than one candidate answer in the knowledge base.

Figure 9: Cab Grammar for ‘Booking’ request<br />

Figure 8: Semantic Transformation and Answer Search

Figure 8 demonstrates the scenario in which the user asks ‘Who picked the order on Sunday?’ After recognizing the conceptual structure, the system understands that the ‘subject’ category is missing, since this is a ‘who’ question. From the answer set, it finds two similar answers. By building the conceptual structures of those two answers, the application determines that the first answer's conceptual structure is more relevant to the conceptual structure of the question, and therefore returns the first answer.
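The answer search for the ‘who’ question can be sketched by treating each conceptual structure as a map from ontological category to value: a candidate must supply the missing category, and among those candidates the one agreeing with the most categories of the question wins. The map representation, the overlap-count score, and the name `AnswerSearch` are assumptions; the paper describes the comparison only as heuristic.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class AnswerSearch {
    // Counts how many of the question's known categories the answer agrees with.
    static int overlap(Map<String, String> question, Map<String, String> answer) {
        int score = 0;
        for (Map.Entry<String, String> e : question.entrySet()) {
            if (e.getValue().equals(answer.get(e.getKey()))) {
                score++;
            }
        }
        return score;
    }

    // Picks the answer that fills the missing category and best matches the question.
    static Optional<Map<String, String>> best(Map<String, String> question, String missingCategory,
                                              List<Map<String, String>> answers) {
        Map<String, String> best = null;
        int bestScore = -1;
        for (Map<String, String> candidate : answers) {
            if (!candidate.containsKey(missingCategory)) {
                continue; // cannot answer the question at all
            }
            int score = overlap(question, candidate);
            if (score > bestScore) { // on a tie the earlier answer is kept
                best = candidate;
                bestScore = score;
            }
        }
        return Optional.ofNullable(best);
    }
}
```

For ‘Who picked the order on Sunday?’, the question map would hold EVENT, DOBJECT, and TIME, and the missing category would be SUBJ; an answer whose TIME differs scores lower than one that matches all three.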

Basing the dialog design on a natural dialog study ensures that the input grammar matches the phrasing people actually use when speaking in the domain of the application [2]. A natural dialog study also ensures that prompts and feedback follow the conversational conventions users expect in a successful interaction. Speech applications based on Natural Language Processing should adopt language conventions that help the end user know what to say next, and should avoid conversational patterns that violate standards of successful, cooperative behavior. The ‘cab booking’ application does this by giving the user follow-up prompts that indicate what to ask or answer next. The “Cab Booking” application contains the two types of grammar described above: the cab fuzzy logic grammar and the cab exact transaction grammar.

Figure 9 shows the template of the ABNF grammar for the transaction sub-dialog. The cab application uses the ‘booking’ request for the transactional dialog to book a cab. The template contains the grammar rules used to parse the speech into different semantic transformations. The words in parentheses define the ontological categories of different words; in the figure, (DEPARTURE) and (VEHICLE) are ontological categories. The grammar always starts with the ‘$Firstrule’ rule, and alternatives within a rule are separated by the ‘|’ symbol. This grammar file also contains the states for the destination, date, and time for which the user wants to reserve a cab. For each state it contains tokens as described above, for example ‘minivan’, ‘sedan’, and ‘coup’ for the ‘Vehicle’ state and ‘phoenix zoo’ for the ‘Departure’ state. These tokens are listed based on what the user is likely to say or type when booking a cab. ABNF grammars let the user say different words with the same meaning. For example, in the figure, the ‘$Reserve’ rule contains the words ‘book’ and ‘reserve’. I combined that rule with the ‘$What’ rule, which has the two tokens ‘taxi’ and ‘cab’, so the user can trigger this grammar by asking for the same thing in four different ways: ‘book a cab’, ‘reserve a cab’, ‘book a taxi’, and ‘reserve a taxi’. For the time and date categories, where the user says different numbers, the number dictionary file can be referenced as a token in this grammar file to accept all the number combinations the user is likely to say.
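The effect of combining ‘$Reserve’ with ‘$What’ can be checked by expanding a sequence of rule references into every phrase it accepts. This sketch assumes the combined rule has the shape ‘$Reserve a $What’, with the literal word ‘a’ between the two references; the exact rule body from the figure is not reproduced here, and `GrammarExpander` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class GrammarExpander {
    // Expands a sequence of items: "$Name" is replaced by each alternative of
    // that rule, and any other item is treated as a literal word.
    static List<String> expand(List<String> sequence, Map<String, List<String>> rules) {
        List<String> phrases = new ArrayList<>();
        phrases.add("");
        for (String item : sequence) {
            List<String> choices = item.startsWith("$")
                    ? rules.get(item.substring(1))
                    : List.of(item);
            if (choices == null) {
                throw new IllegalArgumentException("undefined rule: " + item);
            }
            // Cross product of the phrases so far with this item's choices.
            List<String> next = new ArrayList<>();
            for (String prefix : phrases) {
                for (String choice : choices) {
                    next.add(prefix.isEmpty() ? choice : prefix + " " + choice);
                }
            }
            phrases = next;
        }
        return phrases;
    }
}
```

With `Reserve = book | reserve` and `What = taxi | cab`, expanding `$Reserve a $What` yields the four phrasings listed above; each extra alternative multiplies the number of accepted phrases, which is why adding a token to one rule widens coverage cheaply.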

Figure 10 describes the cabstates.xml file. When the user makes an exact request to book a cab, the system reads this state file. The request is triggered by the ‘reserve’ command, for example when the user says ‘I want to reserve a cab’. The pieces of information needed to complete the booking are the destination, departure, date, time, and type of vehicle the user wants to reserve. These pieces of information are stored in a tag of the XML file, and the name of each state is used by the system to build the conceptual structure with the ontological categories. The tag is activated when the user first triggers the request with the ‘reserve’ action. Once in the state file, the system prompts the user with the text in a tag nested within another tag. Once a value has been filled, it jumps to the next state to capture that value. If the user supplies all the states in one request, the system captures all the values at once and puts them in the proper ontological categories to form the conceptual structure. Once all the values in the XML file have been captured, the system goes to a tag that allows the user to review the request. At this stage, the user can change his mind and change the values entered earlier. Once the user confirms the reservation, the application returns the assertive response ‘Ok, your reservation has been confirmed’. If the user says ‘no’ at the confirmation step, another tag is triggered and prompts the user for the next question. These tags let end users know what they should say next and avoid conversational patterns that violate standards of successful, cooperative behavior during the reservation process.

Figure 10 describes the parsing of an exact request using the cabstates.xml file and the cab exact grammar. The action command is triggered when the user asks for a reservation. All the blue boxes indicate the states that have to be captured before the dialog can move to confirmation. Until the system has values for all the states, it prompts the user to supply them. The values can be given to the system all at once or one by one: the system parses the speech and fills the ontological categories with the parsed values, and the missing states are gathered at the end to produce the follow-up response. This implementation lets the user ask for the transaction naturally, in many different ways. The figure below shows the parse of the sentence “reserve a sedan for today at 3 p.m.”

Figure 10: Conceptual Structure of Exact Request
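The state-by-state collection described for cabstates.xml amounts to slot filling: each parsed value fills its ontological category, the first empty state drives the next prompt, and a complete set moves the dialog to confirmation. The sketch assumes the five states named in the text in an arbitrary prompting order and invents its own prompt wording; only the confirmation message is taken from the paper. XML loading and speech parsing are left out, and `BookingDialog` is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

final class BookingDialog {
    // The five states named in the text; the prompting order is an assumption.
    static final List<String> STATES = List.of("DESTINATION", "DEPARTURE", "DATE", "TIME", "VEHICLE");
    private final Map<String, String> values = new LinkedHashMap<>();

    // Stores the values parsed from one utterance; several states may arrive at once.
    void fill(Map<String, String> parsed) {
        for (Map.Entry<String, String> e : parsed.entrySet()) {
            if (STATES.contains(e.getKey())) {
                values.put(e.getKey(), e.getValue()); // a later value replaces an earlier one
            }
        }
    }

    boolean isComplete() {
        return values.keySet().containsAll(STATES);
    }

    // Prompts for the first missing state; once all are filled, asks for confirmation.
    String nextPrompt() {
        StringBuilder summary = new StringBuilder("Confirm reservation:");
        for (String state : STATES) {
            String value = values.get(state);
            if (value == null) {
                return "Please tell me the " + state.toLowerCase(Locale.ROOT) + ".";
            }
            summary.append(' ').append(state.toLowerCase(Locale.ROOT)).append('=').append(value).append(';');
        }
        return summary.append(" Is that correct?").toString();
    }

    // At the confirmation step, "yes" confirms; "no" invites the user to change a value.
    String confirm(boolean yes) {
        if (!isComplete()) {
            return nextPrompt();
        }
        return yes ? "Ok, your reservation has been confirmed" : "What would you like to change?";
    }
}
```

For “reserve a sedan for today at 3 p.m.”, the parser would fill VEHICLE, DATE, and TIME in one step, leaving the destination and departure to be requested by follow-up prompts.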

The process of building the ‘cab booking’ application started with building the answer sets. As described above, this application contains two types of answer set: the knowledge base and the exact booking transaction dialog set. From the user requirements, I determined which queries needed to be answered and provided the actual answers in the cabanswer.xml file. For the reservation process, I designed the transactional booking dialog, building the states around the pieces of information needed to complete a cab reservation. Based on that requirement, I built the cab exact grammar that supports the cabstates.xml file. This grammar is designed by considering all the options the user might ask or say to fill in the state values. The flexibility of ABNF grammars allows the developer to add more tokens to each state easily, providing more options in the grammar. For example, the destination state of the cab reservation process consists of a limited number of destinations; in the future, the destination value can easily be changed from ‘Phoenix zoo’ to ‘Aquarium’.

Figure 11 describes the cabanswer.xml file, an XML knowledge base answer set. It is used when the user's request is categorized as an ad-hoc request. The parser builds the conceptual structure by mapping the request through the cab fuzzy logic grammar, and the system determines which answer to return from the context and meaning of the user's request.

The “Cab Booking” application with Natural Language Processing has the following features, which allow the user to communicate with the Android phone in a more natural way. I developed these functionalities by adding more dictionaries to the application and by designing the dialog director to allow interruptions between exact and ad-hoc requests. This feature lets the user complete the reservation process at any time in the conversational dialog: the user can make an information retrieval request and then come back to complete the reservation. For example, if the user is in the middle of the reservation process and the application prompts “What type of vehicle would you like to reserve?”, the user can decline to answer this prompt and interrupt the system with another open-ended question, e.g. “What kind of vehicles does this cab company offer?”. The dialog director saves the reservation state and retrieves the matching answer from the knowledge base answer set. The system then gives the answer and also repeats the follow-up prompt of the reservation process.
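The interruption behaviour can be sketched as a small dialog director. This is a hypothetical illustration, not the project's code: the class name `DialogDirector`, the question detection by a leading question word, the `KNOWLEDGE_BASE` map standing in for cabanswer.xml, and the prompts are all assumptions. What it shows is that the pending reservation state is kept while an ad-hoc answer is given, and the follow-up prompt is appended to that answer.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical dialog director that lets ad-hoc questions interrupt a reservation.
final class DialogDirector {
    // Stand-in for the knowledge base answer set (keyword -> answer).
    static final Map<String, String> KNOWLEDGE_BASE = Map.of(
            "vehicle", "We offer sedans and minivans.",
            "pay", "You can pay by cash or card.");
    static final Map<String, String> PROMPTS = Map.of(
            "VEHICLE", "What type of vehicle would you like to reserve?",
            "DATE", "On what date do you want to be picked up?");
    static final List<String> ORDER = List.of("VEHICLE", "DATE");

    // Reservation state survives interruptions because it lives in the director.
    final Map<String, String> reservation = new LinkedHashMap<>();

    // Prompt for the first state that is still empty.
    String pendingPrompt() {
        for (String state : ORDER) {
            if (!reservation.containsKey(state)) {
                return PROMPTS.get(state);
            }
        }
        return "Your reservation is complete.";
    }

    String handle(String utterance) {
        String text = utterance.toLowerCase().trim();
        boolean adHoc = text.startsWith("what") || text.startsWith("how")
                || text.startsWith("do ") || text.startsWith("can ");
        if (adHoc) {
            // Answer the open-ended question, then resume the saved reservation.
            String answer = "Sorry, I do not know that.";
            for (Map.Entry<String, String> e : KNOWLEDGE_BASE.entrySet()) {
                if (text.contains(e.getKey())) {
                    answer = e.getValue();
                    break;
                }
            }
            return answer + " " + pendingPrompt();
        }
        // Otherwise store the raw reply as the value of the pending state (crude on purpose).
        for (String state : ORDER) {
            if (!reservation.containsKey(state)) {
                reservation.put(state, text);
                break;
            }
        }
        return pendingPrompt();
    }

    public static void main(String[] args) {
        DialogDirector director = new DialogDirector();
        System.out.println(director.pendingPrompt());
        // We offer sedans and minivans. What type of vehicle would you like to reserve?
        System.out.println(director.handle("What kind of vehicles does this cab company offer?"));
        // On what date do you want to be picked up?
        System.out.println(director.handle("A sedan"));
    }
}
```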

Sentence prefixes:

Sometimes users add filler words when asking for information. Words like ‘umm’, ‘oh’ and ‘please’ can be ignored by the system so that the speech parses successfully. Likewise, long phrases such as ‘I want to’, ‘Can you please tell me’ and ‘I would like to’ do not need to be parsed when retrieving the correct prompt or dialog from the system. I achieved this functionality by creating the text file “Filters.txt”, which contains the filler words and phrases the user is likely to say. The parser loads this file and, before it starts parsing a request, checks whether the string starts with any of these words. If it does, the prefix is split off and the parser uses the filtered request to build the conceptual structure.
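A minimal Java sketch of this prefix filtering follows, assuming Filters.txt holds one word or phrase per line (the paper does not give its format). The class name `PrefixFilter` and the filler list in `main` are illustrative. Longer phrases are tried first and stripping repeats, so stacked fillers such as ‘Umm, can you please tell me’ are removed together.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical prefix filter; assumes one filler word or phrase per line in Filters.txt.
final class PrefixFilter {
    private final List<String> fillers = new ArrayList<>();

    PrefixFilter(List<String> lines) {
        for (String line : lines) {
            String cleaned = line.trim().toLowerCase();
            if (!cleaned.isEmpty()) {
                fillers.add(cleaned);
            }
        }
        // Try the longest phrases first so "i would like to" is removed as a whole.
        fillers.sort((a, b) -> Integer.compare(b.length(), a.length()));
    }

    static PrefixFilter fromFile(Path path) throws IOException {
        return new PrefixFilter(Files.readAllLines(path));
    }

    // Repeatedly strips a leading filler (followed by a space or comma) until none matches.
    // The result is lower-cased, which is what the downstream parser sees.
    String filter(String request) {
        String text = request.trim().toLowerCase();
        boolean stripped = true;
        while (stripped) {
            stripped = false;
            for (String filler : fillers) {
                if (text.startsWith(filler + " ") || text.startsWith(filler + ",")) {
                    text = text.substring(filler.length()).replaceFirst("^[\\s,]+", "");
                    stripped = true;
                    break;
                }
            }
        }
        return text;
    }

    public static void main(String[] args) {
        PrefixFilter filter = new PrefixFilter(List.of(
                "umm", "oh", "please", "i want to", "can you please tell me", "i would like to"));
        System.out.println(filter.filter("Umm, can you please tell me how much a sedan costs"));
        // how much a sedan costs
        System.out.println(filter.filter("Oh please I want to reserve a minivan"));
        // reserve a minivan
    }
}
```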

Corrections:

In my project, I developed the ability to make a correction at any point in the dialog. In the real world, people making a reservation should be able to change their mind. I accomplished this by explicitly providing a tag in the cabstates.xml file, for example: What would you like to do then? The tag is triggered when the confirmation is denied by the user. Therefore, at the confirmation stage of the reservation process, when the application plays a confirmation prompt, the user can change the values he entered previously, and the parser will process the request and replace the values in the same conceptual structure.

For example:

System: You want to reserve a sedan. Is that correct?
User: No.
System: What would you like to do then?
User: Actually I want to reserve a minivan.

At any point in the dialog, when the user changes his mind, the “ChangeMind” words listed in the Filters.txt file are filtered out and the parser receives the new values. For example:

User: I want to book a sedan.
System: On what date do you want to be picked up?
User: Actually I want to book a minivan.
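The correction behaviour can be sketched in Java as follows, assuming the conceptual structure is a map from state name to value. The class name `Corrections` and the `CHANGE_MIND` list (a stand-in for the ChangeMind entries in the filter file) are hypothetical, and a single-slot keyword lookup replaces the real grammar. A denied confirmation yields the reprompt, and a correction overwrites only the state it mentions, in the same structure.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical correction handling over a conceptual structure (state -> value).
final class Corrections {
    // Stand-in for the "ChangeMind" words listed in the filter file.
    static final List<String> CHANGE_MIND = List.of("actually", "no wait", "i changed my mind");
    static final List<String> VEHICLES = List.of("sedan", "minivan");

    // Reprompt played when the user denies a confirmation.
    static String onConfirmation(boolean confirmed) {
        return confirmed ? "Booking confirmed." : "What would you like to do then?";
    }

    // Strips a leading change-mind phrase, then overwrites the mentioned value in place.
    static void applyCorrection(Map<String, String> concept, String utterance) {
        String text = utterance.toLowerCase().trim();
        for (String phrase : CHANGE_MIND) {
            if (text.startsWith(phrase)) {
                text = text.substring(phrase.length()).trim();
                break;
            }
        }
        for (String vehicle : VEHICLES) {
            if (text.contains(vehicle)) {
                concept.put("VEHICLE", vehicle); // same structure, new value
                break;
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> concept = new LinkedHashMap<>();
        concept.put("VEHICLE", "sedan");
        concept.put("DATE", "today");
        System.out.println(onConfirmation(false)); // What would you like to do then?
        applyCorrection(concept, "Actually I want to reserve a minivan.");
        System.out.println(concept); // {VEHICLE=minivan, DATE=today}
    }
}
```

Keeping a single map and overwriting entries means values the user did not correct, such as the date, survive the correction untouched.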

Navigation Commands:

The “Cab Booking” application allows the user to move the dialog in any direction. It is not like an IVR system, where the user is bound to say a specific response or press a particular number to navigate. I developed this functionality by creating the navigation_command.txt file, which steers the dialog in a given direction depending on whether the user's response means yes, no, repeat or cancel. Some of the mappings are as follows:

REPEAT maps to a variety of phrases: What? What did you say? Can you repeat? Say that again please? Pardon? Pardon me? Please repeat.
CANCEL maps to a variety of phrases: Please cancel this transaction. Stop please. I don't want to reserve.
YES maps to yes, of course, sure, yeah, yup.
NO maps to no, nah, no thanks.

This lets the user speak or type more naturally, without being bound to enter a particular input; the application understands every response listed in the text file.
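One way to load and apply such a mapping is sketched below in Java. The line format `COMMAND=phrase|phrase` and the class name `NavigationCommands` are assumptions, since the contents of navigation_command.txt are not shown. Matching lower-cases the utterance, drops punctuation and compares whole utterances, so "Pardon me?" and "pardon me" map to the same command.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical loader for navigation_command.txt; assumed line format: COMMAND=phrase|phrase|...
final class NavigationCommands {
    enum Command { YES, NO, REPEAT, CANCEL }

    private final Map<String, Command> phraseToCommand = new HashMap<>();

    NavigationCommands(List<String> lines) {
        for (String line : lines) {
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue; // skip blank or malformed lines
            }
            Command command = Command.valueOf(line.substring(0, eq).trim());
            for (String phrase : line.substring(eq + 1).split("\\|")) {
                phraseToCommand.put(normalize(phrase), command);
            }
        }
    }

    // Lower-cases, keeps only letters, apostrophes and spaces, and collapses whitespace.
    static String normalize(String s) {
        return s.toLowerCase().replaceAll("[^a-z' ]", "").replaceAll("\\s+", " ").trim();
    }

    // Empty when the utterance is not a navigation command and should go to the parser.
    Optional<Command> match(String utterance) {
        return Optional.ofNullable(phraseToCommand.get(normalize(utterance)));
    }

    public static void main(String[] args) {
        NavigationCommands nav = new NavigationCommands(List.of(
                "REPEAT=what|what did you say|can you repeat|say that again please|pardon|pardon me|please repeat",
                "CANCEL=please cancel this transaction|stop please|i don't want to reserve",
                "YES=yes|of course|sure|yeah|yup",
                "NO=no|nah|no thanks"));
        System.out.println(nav.match("Pardon me?"));               // Optional[REPEAT]
        System.out.println(nav.match("I don't want to reserve.")); // Optional[CANCEL]
        System.out.println(nav.match("A sedan, please"));          // Optional.empty
    }
}
```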

Synonyms:

I created the ‘synonyms.txt’ file to map words with a similar context onto the single word used in the dictionary. For example, the user may say bird, animal, cat or dog instead of the word ‘pet’. I mapped these words to ‘pet’, which is in the ‘noun’ dictionary. Therefore, the user can phrase the request with any of these words; the parser refines the request by replacing them with ‘pet’ and then continues parsing the sentence.
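A minimal Java sketch of the synonym substitution follows, assuming synonyms.txt lists a dictionary word followed by its synonyms, e.g. `pet: bird, animal, cat, dog` (the real file format is not given). The class name `Synonyms` is hypothetical, and only single-word synonyms without attached punctuation are handled.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical synonym normalizer; assumed line format "canonical: syn1, syn2, ...".
final class Synonyms {
    private final Map<String, String> toCanonical = new HashMap<>();

    Synonyms(List<String> lines) {
        for (String line : lines) {
            String[] parts = line.split(":", 2);
            if (parts.length != 2) {
                continue; // ignore lines without a canonical word
            }
            String canonical = parts[0].trim().toLowerCase();
            for (String synonym : parts[1].split(",")) {
                toCanonical.put(synonym.trim().toLowerCase(), canonical);
            }
        }
    }

    // Replaces each word that has a canonical form, keeping all other words unchanged.
    String normalize(String sentence) {
        StringJoiner out = new StringJoiner(" ");
        for (String word : sentence.toLowerCase().split("\\s+")) {
            out.add(toCanonical.getOrDefault(word, word));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Synonyms synonyms = new Synonyms(List.of("pet: bird, animal, cat, dog"));
        System.out.println(synonyms.normalize("Can I bring my dog in the cab"));
        // can i bring my pet in the cab
    }
}
```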

D. Adopted Design

The “Cab Reservation” application with Natural Language Processing is based on the X-bar theory [7] explained above. I designed the natural language parser and the dialog director for the application. For answer retrieval, I adopted the execution interface technology, which works from the best match of the conceptual structure produced by the parser. This interface uses a tree data structure. Using mathematical equations and algorithms, the execution interface calculates the weight and position of each word in the knowledge base answer set and finds the best matching paragraph. The search is driven by the verb of the sentence: the parser puts the verb in the ontological category named “EVENT”, so the execution interface starts its search from the leaf node of the tree that holds this verb.
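The paper does not spell out the execution interface's equations, so the following Java sketch is only a simplified stand-in, not the actual algorithm. The class name `AnswerRanker` is hypothetical; it scores each knowledge-base paragraph by the words of the conceptual structure it contains, weights the EVENT verb more heavily, gives a bonus to words that appear earlier in the paragraph, and returns the best paragraph. The weights are arbitrary assumptions, and a flat scan replaces the tree data structure.

```java
import java.util.List;

// Simplified, hypothetical stand-in for the execution interface's paragraph ranking.
final class AnswerRanker {
    static final double EVENT_WEIGHT = 3.0; // assumed: the verb counts most
    static final double WORD_WEIGHT = 1.0;  // assumed weight for other concept words

    // 1.0 for the first token, falling linearly to 0.5 for the last one.
    static double positionFactor(int pos, int length) {
        return 1.0 - 0.5 * pos / Math.max(1, length - 1);
    }

    // Scores one paragraph: weighted matches, each scaled by its position factor.
    static double score(String paragraph, String event, List<String> otherWords) {
        List<String> tokens = List.of(
                paragraph.toLowerCase().replaceAll("[^a-z ]", " ").trim().split("\\s+"));
        int eventPos = tokens.indexOf(event);
        if (eventPos < 0) {
            return 0.0; // the search is anchored on the verb: no verb, no match
        }
        double total = EVENT_WEIGHT * positionFactor(eventPos, tokens.size());
        for (String word : otherWords) {
            int pos = tokens.indexOf(word);
            if (pos >= 0) {
                total += WORD_WEIGHT * positionFactor(pos, tokens.size());
            }
        }
        return total;
    }

    // Index of the best-scoring paragraph, or -1 if no paragraph contains the verb.
    static int best(List<String> paragraphs, String event, List<String> otherWords) {
        int bestIndex = -1;
        double bestScore = 0.0;
        for (int i = 0; i < paragraphs.size(); i++) {
            double s = score(paragraphs.get(i), event, otherWords);
            if (s > bestScore) {
                bestScore = s;
                bestIndex = i;
            }
        }
        return bestIndex;
    }

    public static void main(String[] args) {
        List<String> kb = List.of(
                "We offer sedans and minivans for every trip.",
                "You can pay the driver by cash or card.",
                "To cancel a booking, call us one hour before pickup.");
        // Conceptual structure: EVENT = "pay", other words = "card".
        System.out.println(best(kb, "pay", List.of("card"))); // 1
    }
}
```

Returning -1 instead of the least-bad paragraph is one possible way to prefer "no answer" over a wrong answer when the verb is not found.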

IV. VALIDATION

A. Validation

The “Cab Reservation” application with Natural Language Processing (NLP) using Text-to-Speech (TTS) was developed with validation at every stage of the application development cycle. Using the requirements as the baseline, all the functionalities of the application described above were tested, for both ad-hoc queries and the transactional dialog system. For the Android client, the application was installed on a Motorola device provided by Dr. Richard Whitehouse, Lecturer, CTI Department of Engineering at Arizona State University. The server was installed on one of the virtual servers of the CTI department at Arizona State University. The application was also tested with the speech recognizer for speech input.

B. Results

Testing the “Cab Reservation” application on an actual Android device showed that answers are more accurate and precise when all the words the user is likely to use in formulating a question are found in the dictionaries. Answers are better when the question is fully parsed by either the cab exact transactional grammar or the cab fuzzy logic grammar than when no matching rule is found to build a proper conceptual structure. 95% of the questions mapped to the proper ontological category and had their words found in the synonyms files.

Although most of the questions worked, I found that when a phrase is not in the active voice, it is difficult for the application to parse the speech into the right semantic structure. Also, when the answer to a particular question is found in multiple places in the knowledge base answer set, the system does not give an accurate response; in that case the application's answers are ambiguous. The ‘Cab Reservation’ application is not an expert in the subject area, and an undesirable effect is that for some information retrieval requests it gives a wrong answer rather than no answer.

V. CONCLUSION AND FUTURE WORK

The Natural Language Processing tool with text-to-speech looks through dictionaries and grammars to gather a relevant response to the user's request. I believe that the ‘Cab Reservation’ application, developed using the natural language processing concepts and design described in this paper, will help the end user speak more naturally during the reservation process when deployed. First, this project primarily uses the Android device, with the text-to-speech library supported by the Android operating system. Second, the application gives appropriate responses based on the natural language parser and understanding tool, so that the user's communication with the Android smartphone flows more naturally. Further, the solution implemented in this project is scalable and portable, and can be deployed to any Android device.

Some areas for future work would be to add more intelligence to the cab grammar to achieve more natural and robust dialog responses by using collaborative user communication. Another important extension to the current work would be to consider mobile agent security mechanisms such as authentication, authorization and encryption, as in [12], to ensure that transactions are secure throughout the booking process. In the future, a more graphical user interface can be developed for the end user's convenience, and further functions, such as recording the transaction dialog, can be added to the mobile application. At this stage, the system uses the speech recognition of the Android device itself; a separate recognizer could be developed to detect the timing of the user's speech input and make the conversation more natural, without touching the device. This application contains the dictionaries and grammar for booking a cab in English; in the future, support for multiple language dictionaries and grammars can be added.

Furthermore, when the system is unable to find an answer in the answer set or text database, it could search the intranet or the internet to give an answer related to the user's request. For more accurate communication, the ability to push a related page with the appropriate answer can be added in the future. For example, when the user does not know the exact departure location, the system could detect the current location of the Android device with the help of GPS. In addition, the application can be enhanced to automate the grammar generation process from the states files. This feature would reduce human error, both linguistic mistakes and syntax errors, and consequently save the man-hours needed to build a robust conversational dialog. For testing purposes, a tester could be developed that takes all the questions a user is likely to ask, feeds them into the application, and dumps the generated answers into a text file for analysis, saving the time needed to test the application with different types of questions.
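The proposed tester could look roughly like the Java sketch below. Everything in it is hypothetical: the class name `BatchTester`, the `answerFor` function standing in for whatever entry point the application exposes, and the tab-separated report format. It reads one question per line and writes each question with its generated answer to a text file for review.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical batch tester: feeds stored questions to the app and dumps the answers.
final class BatchTester {
    // Pairs each non-blank question with the answer produced by the application.
    static List<String> run(List<String> questions, Function<String, String> answerFor) {
        List<String> report = new ArrayList<>();
        for (String question : questions) {
            if (question.isBlank()) {
                continue; // skip empty lines in the question file
            }
            String q = question.trim();
            report.add(q + "\t" + answerFor.apply(q));
        }
        return report;
    }

    // Reads one question per line and writes "question<TAB>answer" lines for analysis.
    static void runFile(Path questionsFile, Path reportFile, Function<String, String> answerFor)
            throws IOException {
        Files.write(reportFile, run(Files.readAllLines(questionsFile), answerFor));
    }

    public static void main(String[] args) throws IOException {
        Path questions = Files.createTempFile("questions", ".txt");
        Path report = Files.createTempFile("answers", ".txt");
        Files.write(questions, List.of("What vehicles do you offer?", "", "Can I pay by card?"));
        // Placeholder for the real application; here it just echoes a canned reply.
        runFile(questions, report, q -> "answer to: " + q);
        Files.readAllLines(report).forEach(System.out::println);
    }
}
```

Passing the application in as a `Function` keeps the harness independent of how the parser and dialog director are wired, so the same report can be regenerated after each grammar change and compared with the previous one.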

ACKNOWLEDGMENT

I would like to thank my committee chair, Dr. Timothy Lindquist, for his valuable guidance and advice throughout this work. I also extend my thanks to my committee members, Professor Richard Whitehouse and Dr. John Femiani, for their feedback. Professor Whitehouse's Android class inspired me to work on this project idea and helped me build the client-server architecture with the Android device.

REFERENCES

[1] A. Turing, “Computing Machinery and Intelligence,” Mind, vol. 59, 1950, pp. 433-460.
[2] “Designing Effective Speech Application,” Java Speech API Programmer's Guide, Sun Microsystems, Inc.
[3] C. Bajorek, “The state of IVR navigation technology,” Computer Telephony Magazine, New York, NY, vol. 8, September 2000.
[4] J. A. Jacko and A. Sears, The Human-Computer Interaction Handbook: Fundamentals, Evolving Technologies, and Emerging Applications. New Jersey: Lawrence Erlbaum Associates, 2003, pp. 712-750.
[5] T. Dutoit, “An Introduction to Text-to-Speech Synthesis,” TTS Research Team, TCTS Lab, pp. 2-6.
[6] B. Manaris, “Natural Language Processing: A Human-Computer Interaction Perspective,” University of Southwestern Louisiana, Louisiana.
[7] N. Chomsky, “Remarks on Nominalization,” in R. Jacobs and P. Rosenbaum, eds., Readings in English Transformational Grammar, 1970.
[8] R. Jackendoff, Foundations of Language. New York, NY: Oxford University Press, 2002.
[9] R. Jackendoff, Semantics and Cognition. Cambridge, MA: The MIT Press, 1983.
[10] NuGram Platform, “http://nugram.nuecho.com/product_app/welcome”, Nu Echo Inc., 2003-2011.
[11] M. D. Riley, “Tree-based modeling for speech synthesis,” in G. Bailly, C. Benoit, and T. R. Sawallis, eds., Talking Machines: Theories, Models, and Designs, pp. 265-273.
[12] Yang Kun, Guo Xin, and Liu Dayou, “Security in mobile agent system: problems and approaches,” ACM SIGOPS Operating Systems Review, vol. 34, no. 1, January 2000.
