Natural Language Processing with Text-to- Speech on Android (May ...
Natural Language Processing with Text-to-Speech on Android, Sonal Bhatt, May 2011
Natural Language Processing with Text-to-Speech on Android (May 2011)
Sonal Bhatt, Graduate Student, Arizona State University, Division of Computing Studies.
Email: sbbhatt1@asu.edu
Abstract— As the use of mobile devices expands and affects various aspects of human life, the number of smartphone users is increasing dramatically. Consequently, robust interaction between smartphone and human is essential for better system performance. This paper presents a detailed implementation approach for an interactive natural language system, the ‘Cab Reservation’ application, on an Android smartphone. Using the speech synthesizer technology for Android, the application provides text-to-speech responses on the Android device.
Index Terms— natural language processing, speech recognition, text-to-speech synthesizer, artificial intelligence, semantic structure, conversational dialog, ad hoc queries.
I. INTRODUCTION
A speech application can be defined as an interaction between the user and the computer carried out in a more natural way. Because people find speaking naturally easy, it is highly advantageous to incorporate speech into any natural language processing software. Conversational dialog is a verbal exchange that takes place turn by turn between human and computer, with feedback and acknowledgement to indicate understanding.
The field of Artificial Intelligence (AI) and the idea of a machine holding a dialog with humans are as old as the field of Computer Science. In 1950, the British mathematician Alan Turing proposed the Turing Test in his paper “Computing Machinery and Intelligence” [1]. In the Turing Test, a user A is placed at a terminal with a keyboard, and another user B is placed at a different terminal at the other end. Also at the other end is a computer program designed to maintain humanlike conversations. User A cannot see who or what is at the other end. User A then converses with both user B and the computer program. If user A cannot tell the difference between user B and the computer program, we say that the computer program has passed the Turing Test. Since Alan Turing’s paper was published, the Turing Test has for many years been the ultimate goal of AI and conversational systems.
A speech application should be based on an understanding of the different ways that people use language to communicate [2]. Nowadays people use texting and IVR (Interactive Voice Response) to communicate with computers via cell phone. An IVR system can be operated with the telephone’s keypad or with speech recognition. To order or book something, this kind of application follows an exact conversational dialog, in which prerecorded audio directs the user how to proceed. With a speech recognizer and a speech synthesizer, IVR-based applications can be deployed to automobile systems for hands-free operation [3]. Most of the time, IVR-based applications are used for transactional dialog, where the grammar is predefined and the user is bound to say or type restricted queries.
Despite the advanced AI tools available, the question has always remained of how to translate a semantic structure into computer queries or commands that can reuse existing commercial applications and databases that are proprietary to a specific business. Furthermore, languages suited to AI are difficult to use and comprehend. In recent years, software developers have been forced to abandon these languages, although they are better suited to natural language, and have opted to develop specific dialog flows from scratch using Java, C++ and now VoiceXML. The dialog is then designed for the specific application, but it tends to limit the user to specific commands because of the task-oriented nature of these languages. Although these languages have object-oriented capabilities, they are still very much task oriented.
A. Problem Statement
This project develops a conversational dialog for booking a cab. It accepts complex natural language requests as text and speech. For this application, I have designed and implemented a language dialog for the Android smartphone that supports the process of booking a cab. It also allows the user to ask open-ended questions related to the cab, which makes the dialog more conversational. I have accounted for all possible outcomes of the user’s utterances and have built a reply for every possible situation. The application takes user input as text, either entered on the keypad of the Android smartphone or produced by the speech-to-text facility provided with the speech recognizer option of the Android smartphone. In this project I have developed dictionaries of the words used in the domain of making a cab reservation. The words are categorized using English linguistic knowledge. The dictionaries contain nouns, verbs, adjectives, pronouns and numbers. The ‘Cab Reservation’ application can be interrupted, switching from the exact transactional dialog to a simple question-and-answer, knowledge-base conversational dialog. The exact dialog allows the user to book a cab from one location to another. The dialog is designed to take as inputs the departure point, destination, date, time and the type of vehicle the user wants to reserve. It also gives the user the ability to ask open-ended
questions not related to the reservation. This type of question can be called an ad hoc query; it allows the user to get information about the cab, e.g. rates, car seats and payment methods. This dialog has to be paired with complex business logic at the application level in order to support all such possible outcomes. VoiceXML-based systems are currently available, but those systems do not handle spontaneous user requests and interruptions in the dialog. Those applications support only a static dialog flow.
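The dictionary-based categorization of words described above can be illustrated with a minimal sketch. The class name `DictionaryTagger`, the category labels and every word entry below are hypothetical; the paper's actual dictionaries are stored in .txt files whose contents are not shown here, so this is only one plausible shape for the lookup step.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: tags each word of an utterance using small category dictionaries. */
class DictionaryTagger {
    private final Map<String, String> categoryByWord = new HashMap<>();

    DictionaryTagger() {
        // Illustrative entries only; these are assumptions, not the paper's dictionaries.
        add("NOUN", "cab", "taxi", "airport", "zoo");
        add("VERB", "reserve", "book", "need", "like");
        add("ADJECTIVE", "large", "small");
        add("PRONOUN", "i", "we", "me");
        add("NUMBER", "one", "two", "three");
    }

    private void add(String category, String... words) {
        for (String word : words) {
            categoryByWord.put(word, category);
        }
    }

    /** Returns one "word/CATEGORY" entry per token; words not in any dictionary are UNKNOWN. */
    List<String> tag(String utterance) {
        List<String> tags = new ArrayList<>();
        for (String token : utterance.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty()) {
                continue; // split() yields an empty first token when the text starts with punctuation
            }
            tags.add(token + "/" + categoryByWord.getOrDefault(token, "UNKNOWN"));
        }
        return tags;
    }
}
```

Calling `new DictionaryTagger().tag("I need a cab")` yields the tokens tagged as pronoun, verb, unknown and noun, which a later parsing stage could use to decide whether the utterance is a reservation request or an ad hoc query.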
The implementation uses the speech synthesizer for the Android device. The text-to-speech (TTS) library allows the user to hear the response from the system.
B. Motivation
Since speech is such a natural medium for communication, users' expectations of a speech application tend to be extremely high. There are particular situations in which people want to use a speech application, for example when the user's hands and eyes are busy while driving a car, accessing a location or ordering something over the phone. Sometimes people just want to access their electronic mail while driving. In this kind of situation people use a speech application expecting a successful dialog with accurate translation of speech to text.
People use airline information systems, banking, ordering, hotel reservation systems, ATMs and many more for exact transactional requests. Any online system may require transactions that are simply open-ended information-retrieval questions, and some systems use an exact request-confirmation dialog. Developing this application advances the goal of easy access to ‘Cab Reservation’. The integration of speech-to-text and text-to-speech allows the application to be used in environments where people cannot text. The multimodality of texting and speech also allows users to book a cab in situations where they cannot speak loudly. Furthermore, the features of the ‘Cab Reservation’ application make it easy to use for people with disabilities.
C. Applications
The first thing that comes to mind when we talk about text-to-speech applications is aiding people with disabilities. Blind people benefit widely from text-to-speech systems which, when coupled with Optical Character Recognition (OCR) systems [4], give them access to written information. The market for speech synthesis for blind users of personal computers will soon be invaded by mass-market synthesizers bundled with sound cards. DECtalk (TM) is already available with the latest SoundBlaster cards. When a Computer Aided Learning system is combined with a text-to-speech (TTS) synthesizer, it provides a more helpful language education tool [4].
More natural communication between human and machine can be achieved with text-to-speech (TTS) together with a good voice recognizer. However, a good voice recognizer alone is not enough for successful communication in a more natural way; there must also be precise natural language understanding. For domain-specific communication, a natural language parser plays an important role in high-quality multimedia communications [5].
D. Expected End Results/Goals
By implementing the ‘Cab Reservation’ application, the following accomplishments are expected to be achieved.
· Allowing users to be spontaneous at any point in the transactional reservation dialog and in information retrieval about the cab.
· Allowing users to speak naturally and ask questions in different ways, and providing feedback or acknowledgement prompts at each step.
· Understanding the user request and categorizing it as an exact transaction or as open-ended fuzzy logic with simple question-answer conversation.
· Building dialogs for interacting with users in a more natural way and engaging them in more natural conversations.
· Recognizing and understanding over 90% of the speaker’s requests, even with long and complicated sentences.
· Interacting with the backend process for information retrieval, updating, inserting or deleting data.
· Allowing users to get the response in either medium, speech or text.
Once all of the above items are completed, the end result will be the ‘Cab Reservation’ application with an Artificial Intelligence Agent (“AI Agent”) capable of processing the user’s natural speech and text for reserving a cab, engaging the user in conversations with follow-up prompts, interacting with a backend application and, in turn, performing true automated customer self-service with a multimodal response through the text-to-speech synthesizer on Android. Interactive voice response systems rely on telephone keypad input. In this ‘Cab Reservation’ speech application, by contrast, the user can speak any phrase, such as “I would like to reserve a cab from Phoenix zoo to Arizona State University”, instead of pressing different buttons or numbers for various options on the mobile device.
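To make the example phrase concrete, the following sketch shows one simple way the departure and destination could be pulled out of such an utterance. It uses a regular expression, which is an assumption made for illustration and not the grammar-based parser the paper describes; the class name `RouteExtractor` and the pattern itself are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: extracts "from X to Y" slots from a reservation request. */
class RouteExtractor {
    // Assumed pattern: the departure follows "from" and the destination follows the next "to".
    private static final Pattern FROM_TO = Pattern.compile(
            "\\bfrom\\s+(.+?)\\s+to\\s+(.+?)[.?!]?$", Pattern.CASE_INSENSITIVE);

    /** Returns {departure, destination}, or null when the utterance names no route. */
    static String[] extractRoute(String utterance) {
        Matcher matcher = FROM_TO.matcher(utterance.trim());
        if (!matcher.find()) {
            return null;
        }
        return new String[] { matcher.group(1), matcher.group(2) };
    }
}
```

For the phrase quoted above, `extractRoute` returns "Phoenix zoo" as the departure and "Arizona State University" as the destination. A departure name that itself contains the word "to" would be split too early, which is one reason a dictionary and grammar approach is more robust than a single pattern.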
II. RELATED WORK
Speech technology can use composition, transcription, transaction and collaboration dialog, depending on the particular domain [4]. Natural Language Understanding (NLU) and speech recognition are two independent technologies. When these two technologies are combined, they provide powerful human-computer interaction (HCI).
Natural language understanding has been an active area of research for decades. Over that time the field of Artificial Intelligence (AI) has evolved, and researchers have borrowed ideas from the fields of mathematics, linguistics, psychology and philosophy. From these decades of research it can be concluded that conventional computer programs and procedural paradigms were not suited to the challenge at hand. By procedural paradigms I am referring to task-oriented programming, such as a program written in a third-generation language like COBOL, Fortran or C [5]. Completely different languages and tools had to be created to help the development
of conversational systems, such as Lisp (based on lambda calculus), Prolog (based on predicate calculus), Smalltalk (based on objects), semantic nets, frames, etc.
Many systems have been built to demonstrate various linguistic issues, such as ELIZA, Winograd’s SHRDLU (which simulated a robot that manipulated blocks on a tabletop), LUNAR and LIFER [6].
ELIZA is a very early example of natural language processing. When researching natural language processing, the first system that comes to mind is ELIZA, written at MIT by Joseph Weizenbaum between 1964 and 1966 [6]. ELIZA worked by parsing the input and substituting key words into phrases [ref]. Although the ELIZA program used almost no information about human thought or emotion, it provided human-like interaction.
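The parse-and-substitute idea behind ELIZA can be sketched in a few lines. The rules and the pronoun swaps below are illustrative assumptions and are not taken from Weizenbaum's original script; the class name `MiniEliza` is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of ELIZA-style keyword matching and substitution. */
class MiniEliza {
    // Each rule pairs an input pattern with a reply template; entries are illustrative only.
    private static final Map<Pattern, String> RULES = new LinkedHashMap<>();
    static {
        RULES.put(Pattern.compile("i need (.*)", Pattern.CASE_INSENSITIVE), "Why do you need %s?");
        RULES.put(Pattern.compile("i am (.*)", Pattern.CASE_INSENSITIVE), "How long have you been %s?");
    }

    /** Swaps first-person words for second-person ones so the echoed fragment reads naturally. */
    static String reflect(String fragment) {
        StringBuilder out = new StringBuilder();
        for (String word : fragment.split("\\s+")) {
            String lower = word.toLowerCase();
            String swapped = lower.equals("my") ? "your"
                    : (lower.equals("me") || lower.equals("i")) ? "you"
                    : word;
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(swapped);
        }
        return out.toString();
    }

    /** Applies the first matching rule, or falls back to a generic prompt. */
    static String respond(String input) {
        String cleaned = input.trim().replaceAll("[.!?]+$", "");
        for (Map.Entry<Pattern, String> rule : RULES.entrySet()) {
            Matcher matcher = rule.getKey().matcher(cleaned);
            if (matcher.matches()) {
                return String.format(rule.getValue(), reflect(matcher.group(1)));
            }
        }
        return "Please tell me more.";
    }
}
```

With these rules, the input "I need a cab for my mother." produces "Why do you need a cab for your mother?", which shows how a reply can sound attentive even though the program holds no model of the user's thoughts.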
Secondly, a chatterbot is designed to simulate an intelligent conversation with humans via speech or text. The chatterbot is based on the theory of the Turing Test described in the introduction of this paper. The technique a chatterbot uses to generate a response is simply to find a keyword in the input and retrieve the reply from a database with matching keywords or wording patterns [6].
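The keyword lookup that a chatterbot performs can be sketched as follows, here using the cab domain for the keywords. An in-memory map stands in for the reply database, and the class name `KeywordBot`, the keywords and the reply strings are all placeholders invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: replies by finding the first known keyword in the input. */
class KeywordBot {
    // Stand-in for a reply database; the keywords and answers are placeholders.
    private final Map<String, String> replies = new LinkedHashMap<>();

    KeywordBot() {
        replies.put("rate", "(placeholder) Here is the rate information.");
        replies.put("payment", "(placeholder) Here are the payment methods.");
        replies.put("car seat", "(placeholder) Here is the car seat information.");
    }

    /** Returns the reply for the first keyword contained in the input, else a fallback. */
    String reply(String input) {
        String text = input.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> entry : replies.entrySet()) {
            if (text.contains(entry.getKey())) {
                return entry.getValue();
            }
        }
        return "(placeholder) Sorry, I do not have an answer for that.";
    }
}
```

Because the match is a plain substring test, "What are your rates?" hits the "rate" entry. The same simplicity means the bot cannot tell a question about rates from any other sentence that happens to contain that keyword.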
In the field of linguistics, Chomsky proposed X-bar theory in 1970, and it was further developed by Jackendoff in 1977. X-bar theory attempts to identify syntactic features presumably common to human languages [7]. The letter X stands for a part of speech. In the process of parsing speech or utterances, a lexical or ontological category is assigned to each word. Thus N is assigned to a noun, A to an adjective, V to a verb and P to a preposition. The main proposal of X-bar theory is that all phrases are defined by rules. According to X-bar theory, every rule has a conceptual structural schema [8].
Figure 1: General Understanding of X-bar theory
According to this theory, the simple sentence “Mike likes Maria” is parsed into the syntactic categories shown in Figure 2. A zero-level word (category), “Maria”, combines with another element, “likes”, to form an X-bar level category called V-bar. When this X-bar level category combines with a further element, “Mike”, it forms the XP-level category called VP [9].
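The structure just described for “Mike likes Maria” can be written down as a small tree. This is a minimal sketch of the data structure only, not a parser; the class name `XBarNode`, its methods and the bracketed output format are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: a tree node for X-bar style phrase structure. */
class XBarNode {
    final String label;            // category label such as N, V, V' or VP
    final String word;             // non-null only for zero-level (lexical) nodes
    final List<XBarNode> children;

    private XBarNode(String label, String word, List<XBarNode> children) {
        this.label = label;
        this.word = word;
        this.children = children;
    }

    /** Creates a zero-level category holding a single word. */
    static XBarNode leaf(String label, String word) {
        return new XBarNode(label, word, new ArrayList<XBarNode>());
    }

    /** Creates a higher-level category by combining existing nodes. */
    static XBarNode phrase(String label, XBarNode... children) {
        return new XBarNode(label, null, Arrays.asList(children));
    }

    /** Renders the tree in labeled bracket notation. */
    String toBracketed() {
        if (word != null) {
            return "[" + label + " " + word + "]";
        }
        StringBuilder sb = new StringBuilder("[" + label);
        for (XBarNode child : children) {
            sb.append(' ').append(child.toBracketed());
        }
        return sb.append(']').toString();
    }

    /** Builds the example: "Maria" joins "likes" to form V', then "Mike" joins V' to form VP. */
    static XBarNode mikeLikesMaria() {
        XBarNode vBar = phrase("V'", leaf("V", "likes"), leaf("N", "Maria"));
        return phrase("VP", leaf("N", "Mike"), vBar);
    }
}
```

Calling `XBarNode.mikeLikesMaria().toBracketed()` gives `[VP [N Mike] [V' [V likes] [N Maria]]]`, mirroring the two combination steps in the text.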
A text-to-speech synthesizer converts written text to sounds. A number of speech synthesizers are available today that have a low performance ratio yet produce satisfactory audio output in different languages such as English, Japanese and Swedish. The techniques used in speech synthesizers include concatenation of digital recordings and synthesis by rule, where information is provided for the words to produce intonation and tone.
Figure 2: Parsing with X-bar theory
Figure 3 shows the functional diagram of a very general TTS synthesizer. A simple text is processed by natural language processing software with linguistic knowledge and some logical inferences. The text is then given a phonetic transcription with the desired intonation and rhythm [5]. It then passes through digital signal processing, which transforms that symbolic information into speech with the help of mathematical models, algorithms and computations.
Figure 3: Text-to-Speech Synthesizer
Most of the time, a text-to-speech synthesizer restricts the user to specific and limited text to pronounce. Sometimes the quality of the “emotional dynamics” also comes into play, as the output is not comparable to human speech performance. Although they give less satisfactory output, these synthesizers solve the problem in real time with limited memory requirements. An example of this technology is Emily (ref), which acts as a reading coach for children. It provides reading passages and makes corrections. Other examples of speech processing include DECtalk, DragonDictate and Phonetic Engine [6].
III. IMPLEMENTATION
A. Architecture
Figure 4: Software Architecture
To develop the “Cab Reservation” application with Natural<br />
Language Processing and Text-to-Speech, I used a client-server<br />
architecture. Because the goal of the application is to provide<br />
text-to-speech functionality on a mobile device, the client is a<br />
device running the Android operating system. It can be any phone<br />
model running Android 2.2 or above. On the server side, the<br />
Natural Language Parser is developed in Java. The grammars and<br />
dictionaries used for the ‘Cab Reservation’ application are .txt<br />
files packaged in the ‘Language’ directory of the application<br />
package.<br />
The user connects to the server from the Android phone by giving<br />
the host name and the port number of the server. The user can<br />
speak or type a question to the device, which then sends it to<br />
the server to be parsed and processed using natural language<br />
processing. The server returns its response to the Android client<br />
in text format. The Android client uses the Text-to-Speech<br />
library to convert the text into speech, so the user finally<br />
receives the answer to the question in spoken form on the<br />
Android device.<br />
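The request/response exchange described above can be sketched as<br />
a small line-based TCP exchange. This is only an illustration<br />
under stated assumptions: the class name, the one-line-per-message<br />
protocol, and the answer() stub are hypothetical and are not taken<br />
from the application, and the sketch uses newer Java syntax than<br />
the Java 1.6 used in the paper. The speech step on the Android<br />
device is left out.<br />

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class CabClientServerSketch {

    /** Hypothetical stand-in for the server-side natural language parser. */
    public static String answer(String request) {
        return "You asked: " + request;
    }

    /** Starts a server that handles one request; returns the port it listens on. */
    public static int startServer() {
        try {
            final ServerSocket server = new ServerSocket(0); // 0 = any free port
            Thread worker = new Thread(new Runnable() {
                public void run() {
                    // Both sockets are closed when this block ends.
                    try (ServerSocket listening = server;
                         Socket client = listening.accept()) {
                        reply(client);
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }
            });
            worker.setDaemon(true);
            worker.start();
            return server.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads one line of text from the client and writes one line back. */
    private static void reply(Socket client) throws IOException {
        BufferedReader in = new BufferedReader(
                new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8));
        PrintWriter out = new PrintWriter(
                new OutputStreamWriter(client.getOutputStream(), StandardCharsets.UTF_8), true);
        out.println(answer(in.readLine()));
    }

    /** Client side: connects by host and port, sends the question, returns the reply. */
    public static String ask(String host, int port, String question) {
        try (Socket socket = new Socket(host, port)) {
            PrintWriter out = new PrintWriter(
                    new OutputStreamWriter(socket.getOutputStream(), StandardCharsets.UTF_8), true);
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
            out.println(question);
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int port = startServer();
        System.out.println(ask("127.0.0.1", port, "Can I bring my pet?"));
    }
}
```

On the device, the text returned by ask() would be the string<br />
handed to the Text-to-Speech library.<br />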
B. Technologies and Libraries Used<br />
The “Cab Reservation” application with Natural Language<br />
Processing (NLP) and text-to-speech (TTS) functionality is<br />
developed in Java version 1.6 on the Windows operating system.<br />
In addition, it requires the Android operating system tool to<br />
serve as the client. The Text-to-Speech library used on the<br />
Android device in this application is open source. It can be<br />
downloaded to any Android device from Android Market. This TTS<br />
library is also packaged in the software running on the Windows<br />
machine.<br />
I developed two types of grammar for this application. First, to<br />
facilitate the transactional dialog for booking a cab, I created<br />
the cabexactgrammar.txt file. Second, for ad-hoc, open-ended<br />
questions for cab information retrieval, I created<br />
cabfuzzygrammar.txt. To generate these grammars I used the<br />
NuGram IDE [10] plug-in for the Eclipse SDK. NuGram IDE is an<br />
open-source Eclipse plug-in for writing speech recognition<br />
grammars in Augmented Backus-Naur Form (ABNF). This format is a<br />
plain-text, non-XML representation of a traditional Backus-Naur<br />
Form (BNF) grammar. The body of a grammar consists of a set of<br />
rule definitions. Each rule definition associates a rule name<br />
with a legal rule expansion. Because most grammars identify a set<br />
of possible words that a user might type or say, the top-level<br />
rule expansion in a grammar rule is usually a set of<br />
alternatives. For example,<br />
$Vehicle = sedan | coup | minivan;<br />
This rule is named ‘Vehicle’, and the rule expansion is the set<br />
of alternatives. The rule is matched if the user says or types<br />
"sedan", "coup", or "minivan". The actual words that the user<br />
might say are called tokens. I generated the grammars in ABNF<br />
format and converted them into a simple text format so that the<br />
Natural Language Parser can read and process them easily.<br />
C. Design<br />
Figure 5: Application Flow<br />
Figure 5 describes the basic communication structure between the<br />
Android client and the ‘Cab Reservation’ application on the<br />
server. The Android smartphone takes the input as speech or as<br />
text, and the input is processed on the server by the natural<br />
language parser. The parser breaks the input into parts and,<br />
based on the two grammar files, categorizes the request as either<br />
a question-answer (fuzzy logic) request or a cab booking<br />
transactional dialog. The server retrieves the appropriate<br />
response from the knowledge base text database. The Android<br />
client then displays this response on the screen and plays it as<br />
a spoken prompt using the Text-to-Speech (TTS) library on the<br />
device.<br />
An utterance is converted into multiple recognition (n-best)<br />
results, or captured from the text box of the client API, and<br />
sent to the server API. The server API takes the (n-best) phrase<br />
and processes it through the Natural Language Understanding<br />
Parser. The Parser first checks the cab booking exact grammar. If<br />
that parse is not successful, it checks the cab fuzzy logic<br />
grammar. The phrase is then broken apart syntactically by the<br />
Parser using the tokens and rules defined in the ABNF grammars.<br />
As the phrase is broken apart, the Conceptual Structure is<br />
assembled through the process of parsing. From this point on, the<br />
system no longer deals with phrases, but rather with conceptual<br />
structures. Dictionaries of nouns, verbs, prepositions,<br />
adjectives and numbers are used to form the semantic conceptual<br />
structure from the parts of speech. This is done by defining the<br />
tokens in the grammar file. As the parser goes through each rule<br />
looking for a match, it assigns the ontological category defined<br />
for the matching token in the grammar file. For the booking<br />
grammar, the ontological categories are the names of the states<br />
in the states file, which are the pieces of information needed to<br />
fulfill the cab booking process.<br />
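The order in which the two grammars are tried can be sketched as<br />
follows. This is a minimal illustration, not the application's<br />
parser: the class name and the two small word sets are<br />
hypothetical stand-ins for cabexactgrammar.txt and<br />
cabfuzzygrammar.txt, and a "match" here is simply the presence of<br />
a trigger word.<br />

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class TwoStageParseSketch {

    public enum RequestType { EXACT, FUZZY, UNPARSED }

    // Hypothetical trigger words standing in for the two grammar files.
    private static final Set<String> EXACT_WORDS =
            new HashSet<String>(Arrays.asList("book", "reserve"));
    private static final Set<String> FUZZY_WORDS =
            new HashSet<String>(Arrays.asList("can", "who", "what", "where", "when", "why", "how"));

    /** True if any word of the phrase is one of the given trigger words. */
    private static boolean matches(Set<String> words, String phrase) {
        for (String word : phrase.toLowerCase().split("\\W+")) {
            if (words.contains(word)) {
                return true;
            }
        }
        return false;
    }

    /** Tries the exact (booking) grammar first and falls back to the fuzzy grammar. */
    public static RequestType classify(String phrase) {
        if (matches(EXACT_WORDS, phrase)) {
            return RequestType.EXACT;
        }
        if (matches(FUZZY_WORDS, phrase)) {
            return RequestType.FUZZY;
        }
        return RequestType.UNPARSED;
    }
}
```

Because the exact grammar is checked first, a phrase such as<br />
"Can I book a cab?" is treated as a booking request in this<br />
sketch even though it also begins like a question.<br />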
For example, if the user asks the open-ended question “Can I<br />
bring my pet?”, the Natural Language Parser breaks the sentence<br />
into parts (words). For each word it finds the right category<br />
based on the dictionaries in the system. Here the word ‘pet’<br />
falls into the ‘noun’ category, so the parser puts that word into<br />
the ontological category named DOBJECT. For the verb ‘bring’ the<br />
ontological category is EVENT. Below is the conceptual structure<br />
the parser builds for the example question. The parser changes<br />
the first person ‘I’ of the question to the second person ‘you’<br />
when it parses the request.<br />
(THING(SUBJ(VALUE you))(EVENT bring(DOBJECT my(VALUE pet))))<br />
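The parenthesized notation above can be reproduced with a small<br />
tree type. The sketch below assembles the structure by hand from<br />
already categorized words. It does not perform the dictionary<br />
lookup or the grammar parse, and the class and method names are<br />
hypothetical.<br />

```java
import java.util.ArrayList;
import java.util.List;

public class ConceptualStructureSketch {

    /** One node: an ontological category, an optional word, and child nodes. */
    public static class Node {
        private final String category;
        private final String word; // null when the node carries no word of its own
        private final List<Node> children = new ArrayList<Node>();

        public Node(String category, String word) {
            this.category = category;
            this.word = word;
        }

        public Node add(Node child) {
            children.add(child);
            return this;
        }

        /** Renders the node in the parenthesized notation used in the text. */
        @Override
        public String toString() {
            StringBuilder sb = new StringBuilder("(").append(category);
            if (word != null) {
                sb.append(' ').append(word);
            }
            for (Node child : children) {
                sb.append(child);
            }
            return sb.append(')').toString();
        }
    }

    /** Swaps the first-person subject for the second person, as the text describes. */
    public static String normalizeSubject(String subject) {
        return subject.equalsIgnoreCase("I") ? "you" : subject;
    }

    /** Assembles THING(SUBJ ...)(EVENT verb(DOBJECT modifier(VALUE noun))). */
    public static Node build(String subject, String verb, String modifier, String noun) {
        Node subj = new Node("SUBJ", null).add(new Node("VALUE", normalizeSubject(subject)));
        Node object = new Node("DOBJECT", modifier).add(new Node("VALUE", noun));
        Node event = new Node("EVENT", verb).add(object);
        return new Node("THING", null).add(subj).add(event);
    }

    public static void main(String[] args) {
        // prints (THING(SUBJ(VALUE you))(EVENT bring(DOBJECT my(VALUE pet))))
        System.out.println(build("I", "bring", "my", "pet"));
    }
}
```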
Figure 6: Dialog Director<br />
As shown in Figure 6, the Dialog Director is responsible for<br />
keeping track of the conversation and building the dialog flow on<br />
the fly. If parsing with the cab booking transactional grammar is<br />
successful, the Dialog Director determines that this is an “exact<br />
request” (booking type). If parsing with the cab fuzzy logic<br />
grammar is successful, it determines that this is a “fuzzy<br />
request” (ad-hoc query). If the request is an ad-hoc query or<br />
question, the Dialog Director determines whether the request is<br />
interrupting a sub-dialog, and it keeps the current dialog<br />
context. If the request interrupts the dialog for any other<br />
reason, the Dialog Director determines whether to interrupt the<br />
current sub-dialog with the new request. The Dialog Director also<br />
keeps track of where the dialog is at any given point in time:<br />
whether this is a correction, whether the dialog is at a<br />
confirmation, or whether the user wants the prompt repeated.<br />
Once the Dialog Director determines how the request must be<br />
processed, a conceptual structure is sent to either the cab<br />
information dialog or the cab booking dialog, as shown in<br />
Figure 6. If the request is an open-ended information retrieval<br />
request, the system consults the XML knowledge base, the<br />
cabanswer.xml file, to retrieve a correct answer (in the form of<br />
a conceptual structure). The stored answers are also converted<br />
into conceptual networks and compared heuristically with the<br />
conceptual structure of the request. A successful comparison<br />
yields a correct answer. If the Director categorizes the request<br />
as a transactional booking dialog, it consults the cabstates.xml<br />
file to fire the next prompt for the user.<br />
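The routing decision can be sketched as below. This is a heavily<br />
simplified assumption of how such a director might behave: the<br />
class name, the returned labels, and the single boolean used to<br />
remember an open booking are all hypothetical, and corrections,<br />
confirmations and repeats are not modelled.<br />

```java
public class DialogDirectorSketch {

    public enum RequestType { EXACT, FUZZY }

    // Remembers whether a booking sub-dialog is open, so that an
    // ad-hoc question can interrupt it without discarding its context.
    private boolean bookingInProgress = false;

    /** Returns a label for the dialog that should handle the request. */
    public String route(RequestType type) {
        if (type == RequestType.EXACT) {
            bookingInProgress = true;
            return "cab booking dialog";
        }
        // Ad-hoc query: answer it, leaving any open booking untouched.
        return bookingInProgress
                ? "cab information dialog (booking context kept)"
                : "cab information dialog";
    }

    public boolean isBookingInProgress() {
        return bookingInProgress;
    }
}
```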
Coding Details<br />
The cab booking application with natural language processing<br />
configures the Artificial Intelligence environment. The<br />
application engine builds the dialog on the fly, depending on the<br />
specific situation in the conversation, where the number of<br />
possible situations could be exponential. This gives the dialog a<br />
more natural flow, which would be difficult to simulate with<br />
another programming paradigm.<br />
The nature of the conversations is driven by both grammars<br />
described above, along with the declarative XML definitions:<br />
cabstates.xml for the booking dialog and cabanswer.xml for<br />
retrieving knowledge base answers. Changing the grammars and the<br />
XML definitions changes the nature and content of the<br />
conversations. The implementation contains the following<br />
components:<br />
· A Java-based API for the natural language parser, which takes<br />
user input and parses the speech into a meaningful,<br />
system-understandable semantic, conceptual form.<br />
· A directory of text files that serve as dictionaries. These<br />
dictionaries are categorized based on English grammar.<br />
· A cab booking grammar for exact queries/booking transactions in<br />
ABNF format.<br />
· A cab fuzzy logic grammar for handling ad-hoc, open-ended<br />
information retrieval requests.<br />
These grammars are more powerful because they can use the tokens<br />
of the comprehensive dictionaries included within the system. The<br />
semantic information refers to ontological categories or other<br />
special categories freely chosen by the user. Whereas the<br />
transactional booking dialog uses the cab booking grammar and the<br />
cabstates.xml file to build the conceptual structure, ad-hoc<br />
query questions use the fuzzy logic grammar to parse the speech<br />
and build a semantic structure. This process uses the noun, verb,<br />
preposition, adjective, and number dictionary files to map words<br />
to semantic ontological categories. The ontological dictionary in<br />
the system helps assign a word to one of the categories place,<br />
quantity, time, thing, and manner.<br />
This type of category is mapped to the word when the question<br />
starts with ‘how’, ‘what’, ‘why’, ‘where’, ‘when’, ‘how many’, or<br />
‘how much’. For example, Figure 7 shows the ontological<br />
categories for the sentence “Can I bring my pet in cab?”<br />
Figure 7: Ontological Category<br />
Once the conceptual structure is built for the user request, the<br />
actual answer search begins. The answer has to supply the missing<br />
ontological category. Therefore, when there are two answers in<br />
the answer set with the same subjects and events, the XML<br />
knowledge base system tries to select the best answer: the one<br />
whose conceptual structure supplies the ontological category<br />
missing from the question.<br />
The following scenario shows how the semantic transformation and<br />
answer search proceed when the knowledge base contains more than<br />
one candidate answer.<br />
Figure 9: Cab Grammar for ‘Booking’ request<br />
Figure 8: Semantic Transformation and Answer Search<br />
Figure 8 demonstrates the scenario in which the user asks ‘Who<br />
picked the order on Sunday?’ After building the conceptual<br />
structure, the system understands that the ‘subject’ category is<br />
missing, as it is for any ‘who’ question. From the answer set, it<br />
finds two similar answers. By building the conceptual structures<br />
of those two answers, the application determines that the first<br />
answer’s conceptual structure is more relevant to the conceptual<br />
structure of the question. Therefore it returns the first answer<br />
for the question asked.<br />
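The text does not spell out the comparison heuristic, so the<br />
following is only one plausible sketch, assuming a conceptual<br />
structure can be flattened to category/value pairs. A candidate<br />
answer is eligible only if it supplies the missing category, and<br />
eligible answers are ranked by how many of the question's pairs<br />
they repeat. The class name, the slot names, and the sample values<br />
in main are hypothetical.<br />

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AnswerSearchSketch {

    /** Builds a category-to-value map from alternating key/value arguments. */
    public static Map<String, String> slots(String... keyValues) {
        Map<String, String> map = new LinkedHashMap<String, String>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            map.put(keyValues[i], keyValues[i + 1]);
        }
        return map;
    }

    /** Counts the question slots that the candidate answer reproduces exactly. */
    public static int overlap(Map<String, String> question, Map<String, String> answer) {
        int score = 0;
        for (Map.Entry<String, String> slot : question.entrySet()) {
            if (slot.getValue().equals(answer.get(slot.getKey()))) {
                score++;
            }
        }
        return score;
    }

    /** Returns the best answer that fills the missing category, or null if none does. */
    public static Map<String, String> best(Map<String, String> question, String missing,
                                           List<Map<String, String>> answers) {
        Map<String, String> best = null;
        int bestScore = -1;
        for (Map<String, String> answer : answers) {
            if (!answer.containsKey(missing)) {
                continue; // cannot answer the question at all
            }
            int score = overlap(question, answer);
            if (score > bestScore) {
                bestScore = score;
                best = answer;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // "Who picked the order on Sunday?" has no SUBJ; the values are made up.
        Map<String, String> question = slots("EVENT", "pick", "DOBJECT", "order", "TIME", "sunday");
        List<Map<String, String>> answers = new ArrayList<Map<String, String>>();
        answers.add(slots("SUBJ", "driver A", "EVENT", "pick", "DOBJECT", "order", "TIME", "sunday"));
        answers.add(slots("SUBJ", "driver B", "EVENT", "pick", "DOBJECT", "order", "TIME", "monday"));
        System.out.println(best(question, "SUBJ", answers).get("SUBJ"));
    }
}
```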
Basing the dialog design on a natural dialog study ensures that<br />
the input grammar will match the phrasing actually used by people<br />
when speaking in the domain of the application [2]. A natural<br />
dialog study also ensures that prompts and feedback follow the<br />
conversational conventions that users expect in a successful<br />
interaction. Natural Language Processing based speech<br />
applications should adopt language conventions that help the end<br />
user know what to say next and that avoid conversational patterns<br />
that violate the standards of successful and cooperative<br />
behavior. The ‘cab booking’ application does this by giving the<br />
user follow-up prompts that indicate what to ask or answer next.<br />
The “Cab Booking” application contains two types of grammar, the<br />
cab fuzzy logic grammar and the cab exact transaction grammar<br />
described above.<br />
Figure 9 shows the template of the ABNF grammar for the<br />
transaction sub-dialog. The cab application uses the ‘booking’<br />
request for the transactional dialog to book a cab. The grammar<br />
contains the rules for parsing the speech into different semantic<br />
transformations. The words in brackets define the ontological<br />
categories for different words. In Figure 9, (DEPARTURE) and<br />
(VEHICLE) are ontological categories. Grammar rules always start<br />
with ‘$Firstrule’. Rules are separated by the ‘|’ symbol. This<br />
grammar file also contains the states for the destination, date,<br />
and time the user wants to reserve a cab for. It contains the<br />
tokens described above for each state. In the figure these are<br />
‘minivan’, ‘sedan’, and ‘coup’ for the ‘Vehicle’ state and<br />
‘phoenix zoo’ for the ‘Departure’ state. The tokens are chosen<br />
based on what the user is likely to say or type when booking a<br />
cab. ABNF grammars let the user say different words with the same<br />
meaning. For example, in Figure 9 the grammar rule ‘$Reserve’<br />
contains the words ‘book’ and ‘reserve’. I combined that rule<br />
with the ‘$What’ rule, which has the two tokens ‘taxi’ and ‘cab’.<br />
The user can therefore trigger this grammar by asking for the<br />
same thing in four different ways: ‘book a cab’, ‘reserve a cab’,<br />
‘book a taxi’, and ‘reserve a taxi’. For the time and date<br />
categories, where the user may say different numbers, the number<br />
dictionary file can be referenced as a token in this grammar file<br />
to cover the number combinations the user is likely to say.<br />
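How two rules of alternatives combine into four accepted phrases<br />
can be sketched as follows. The helper reads a rule written in<br />
the `$Name = a | b;` form shown earlier for `$Vehicle`. The exact<br />
text of the `$Reserve` and `$What` rules in Figure 9 is not<br />
reproduced in this section, so the rule strings in main are<br />
assumptions, as are the class name and the fixed word ‘a’ between<br />
the two parts.<br />

```java
import java.util.ArrayList;
import java.util.List;

public class GrammarExpansionSketch {

    /** Returns the alternatives of a rule such as "$Vehicle = sedan | coup | minivan;". */
    public static List<String> alternatives(String rule) {
        String expansion = rule.substring(rule.indexOf('=') + 1).replace(";", "");
        List<String> result = new ArrayList<String>();
        for (String alternative : expansion.split("\\|")) {
            result.add(alternative.trim());
        }
        return result;
    }

    /** Builds every "<reserve> a <what>" phrase from the two lists of alternatives. */
    public static List<String> expand(List<String> reserve, List<String> what) {
        List<String> phrases = new ArrayList<String>();
        for (String verb : reserve) {
            for (String noun : what) {
                phrases.add(verb + " a " + noun);
            }
        }
        return phrases;
    }

    public static void main(String[] args) {
        // Assumed rule text; only the tokens themselves come from the paper.
        List<String> reserve = alternatives("$Reserve = book | reserve;");
        List<String> what = alternatives("$What = taxi | cab;");
        for (String phrase : expand(reserve, what)) {
            System.out.println(phrase);
        }
    }
}
```

Adding a third token to either rule adds phrases without changing<br />
the code, which is the property the text attributes to the ABNF<br />
grammar.<br />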
Figure 10 describes the cabstates.xml file. When the user makes<br />
an exact request to book a cab, this state file is read by the<br />
system. The request is triggered by the ‘reserve’ command when<br />
the user says ‘I want to reserve a cab’. The pieces of<br />
information needed to complete the booking process are the<br />
destination, departure, date, time, and type of vehicle the user<br />
wants to reserve. These pieces of information are stored in<br />
tag of the XML file. The name of the state is used by the system<br />
to build the conceptual structure with the ontological<br />
categories. The tag is activated when the user first triggers the<br />
request by asking for the ‘reserve’ action. Once the system is in<br />
the state file, it prompts the user with the text in<br />
tag within tag. Once the<br />
value has been fulfilled, it jumps to the next state to capture<br />
its value. If the user fulfils all the states in one request, the<br />
system captures all the values and puts them in the proper<br />
ontological categories to form the conceptual structure. Once all<br />
the values are captured within the XML file, it goes to<br />
tag. This tag allows the user to review the request. At this<br />
stage, the user can change their mind and change the values<br />
entered before. Once the user confirms the reservation, the<br />
application returns the affirmative response ‘Ok, your<br />
reservation has been confirmed’. If the user says ‘no’ at the<br />
confirmation step,<br />
tag will trigger and prompt the user for the next question. These<br />
tags let end users know what they should say next and avoid<br />
conversational patterns that violate the standards of successful<br />
and cooperative behavior for the reservation process.<br />
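The state-by-state prompting described above can be sketched as a<br />
simple slot-filling loop. The five state names follow the pieces<br />
of information listed in the text. The class name, the prompt<br />
wording, and the in-memory maps (used here instead of reading<br />
cabstates.xml) are hypothetical.<br />

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BookingStatesSketch {

    // State name -> prompt; insertion order is the order states are asked in.
    private final Map<String, String> prompts = new LinkedHashMap<String, String>();
    // State name -> value captured from the user so far.
    private final Map<String, String> values = new LinkedHashMap<String, String>();

    public BookingStatesSketch() {
        prompts.put("destination", "Where would you like to go?");
        prompts.put("departure", "Where should the cab pick you up?");
        prompts.put("date", "On what date?");
        prompts.put("time", "At what time?");
        prompts.put("vehicle", "What type of vehicle would you like?");
    }

    /** Stores a value for a known state; values may arrive in any order. */
    public void capture(String state, String value) {
        if (!prompts.containsKey(state)) {
            throw new IllegalArgumentException("unknown state: " + state);
        }
        values.put(state, value);
    }

    /** True once every state has a value and the dialog can move to confirmation. */
    public boolean readyToConfirm() {
        return values.keySet().containsAll(prompts.keySet());
    }

    /** Returns the prompt of the first unfilled state, or a confirmation prompt. */
    public String nextPrompt() {
        for (Map.Entry<String, String> state : prompts.entrySet()) {
            if (!values.containsKey(state.getKey())) {
                return state.getValue();
            }
        }
        return "Do you want to confirm this reservation?";
    }
}
```

A request such as “reserve a sedan for today at 3 p.m.” would<br />
fill the vehicle, date and time states in one step, after which<br />
nextPrompt() asks only for the states that are still missing.<br />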
Figure 10 describes the parsing of an exact request using the<br />
cabstates.xml file and the cab exact grammar. The action command<br />
triggers when the user asks for a reservation. The blue boxes<br />
indicate the states that have to be captured before moving to the<br />
confirmation dialog. Until the system has values for all the<br />
states, it prompts the user to supply the missing values. These<br />
values can be given to the system all at once or one by one. The<br />
system parses the speech and fills the ontological categories<br />
with the parsed values. Missing states are gathered at the end to<br />
produce the follow-up response. This implementation lets the user<br />
ask for the transaction naturally in many ways. The figure below<br />
shows the parse of the sentence “reserve a sedan for today at<br />
3 p.m.”<br />
Figure 10: Conceptual Structure of Exact Request<br />
The process of building the ‘cab booking’ application started<br />
with building the answer sets. As described above, this<br />
application contains two types of answer set: the knowledge base<br />
and the exact booking transaction dialog set. From the user<br />
requirements, I determined which queries needed to be answered<br />
and provided the actual answers in the cabanswer.xml file. For<br />
the reservation process, I designed the transactional dialog for<br />
booking. I built the states based on the pieces of information<br />
needed to fulfill a cab reservation. Based on that requirement I<br />
built the cab exact grammar that supports the cabstates.xml file.<br />
This grammar is designed by considering all the options the user<br />
might ask or say to complete the state values. The robustness of<br />
ABNF grammar allows the developer to easily add more tokens for<br />
each state to provide more options in the grammar. For example,<br />
the destination state for the cab reservation process contains a<br />
limited number of destinations. In future, the value of the<br />
destination can easily be changed from ‘Phoenix zoo’ to<br />
‘Aquarium’.<br />
Figure 11 describes the cabanswer.xml file. This file is an XML knowledge base answer set. It is used when the user request is categorized as an ad-hoc request. The parser builds the conceptual structure based on the cab fuzzy logic grammar; that is, the conceptual structure is mapped from the fuzzy grammar. The system then determines which answer to return from the context and meaning of the user’s request.
The “Cab Booking” application with Natural Language Processing has the following features, which allow the user to communicate with the Android phone in a more natural way. I developed these functionalities by adding more dictionaries to the application and by designing the dialog director to allow interruption between exact and ad-hoc requests. This feature lets the user complete the reservation process at any time in the conversational dialog: the user can make an information retrieval request and then come back to complete the reservation. For example, if the user is in the middle of the reservation process and the application prompts “What type of vehicle would you like to reserve?”, the user can decline to answer this prompt and interrupt the system with another open-ended question, e.g. “What kinds of vehicles does this cab company offer?”. The dialog director saves the reservation state and retrieves the matching answer from the knowledge base answer set. The system responds with the answer and then gives the follow-up prompt of the reservation process.
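The interruption behaviour described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the application's actual code: the `DialogDirector` class, the dictionary layouts, the second prompt, and the sample answer text are all hypothetical stand-ins for the contents of the answer set and states files.

```python
import re

# Hypothetical knowledge-base entry; the real answers live in cabanswer.xml.
KNOWLEDGE_BASE = {
    "what kinds of vehicles does this cab company offer":
        "We offer sedans and minivans.",
}

# Hypothetical reservation states and prompts; the real ones live in the states file.
PROMPTS = {
    "vehicle": "What type of vehicle would you like to reserve?",
    "date": "On what date do you want to be picked up?",
}
VEHICLES = {"sedan", "minivan"}


def normalize(utterance):
    """Lower-case the utterance and drop punctuation."""
    return " ".join(re.findall(r"[a-z]+", utterance.lower()))


class DialogDirector:
    def __init__(self):
        self.slots = {}           # saved reservation state
        self.pending = "vehicle"  # state currently being prompted

    def handle(self, utterance):
        text = normalize(utterance)
        # Ad-hoc request: answer it, leave the slots untouched,
        # then repeat the prompt of the pending reservation state.
        if text in KNOWLEDGE_BASE:
            return KNOWLEDGE_BASE[text] + " " + PROMPTS[self.pending]
        # Exact request: fill the pending slot if a known value is named.
        if self.pending == "vehicle":
            for word in text.split():
                if word in VEHICLES:
                    self.slots["vehicle"] = word
                    self.pending = "date"
                    break
        return PROMPTS[self.pending]
```

In this sketch an ad-hoc question never changes `slots` or `pending`, which is what lets the reservation resume from the same prompt after the answer is given.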
Sentence prefixes:
Sometimes users say filler words that carry no information for the request. Words like ‘umm’, ‘oh’, and ‘please’ can be ignored by the system in order to parse the speech successfully. Longer phrases used by the user, like ‘I want to’, ‘Can you please tell me’, and ‘I would like to’, also do not need to be parsed when retrieving the correct prompt or dialog from the system. I achieved this functionality by creating the text file “Filters.txt”. This file contains all the filler words and phrases the user is likely to say. The parser loads the file and, before it starts to parse the request, checks whether the string starts with any of these words. If it does, the words are split off and the parser uses the filtered request to build the conceptual structure.
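A minimal Python sketch of this prefix filtering is shown below. The `FILTERS` list holds only the examples quoted in the text; the real contents and format of the filter file are not reproduced in the paper, so the list and the function name `strip_prefixes` are assumptions.

```python
# Assumed filter entries, taken from the examples in the text.
FILTERS = [
    "can you please tell me",
    "i would like to",
    "i want to",
    "please",
    "umm",
    "oh",
]


def strip_prefixes(request, filters=FILTERS):
    """Remove filler words and phrases from the start of a request."""
    words = request.lower().replace(",", " ").split()
    # Try longer phrases first so a full phrase wins over a shorter overlap.
    phrases = sorted((f.split() for f in filters), key=len, reverse=True)
    changed = True
    while changed:  # keep stripping until no filler remains at the front
        changed = False
        for phrase in phrases:
            if phrase and words[:len(phrase)] == phrase:
                words = words[len(phrase):]
                changed = True
                break
    return " ".join(words)
```

Only leading fillers are removed; a word such as ‘please’ in the middle or at the end of the request is left for the parser to handle.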
Corrections:
In my project, I developed the ability to make a correction at any point in the dialog. In the real world, when people make a reservation, they should be able to change their mind. I accomplished this by explicitly providing a tag for this purpose in the cabstates.xml file. Example: What would you like to do then? Therefore, at the confirmation stage of the reservation process, when the application plays a confirmation prompt, the user can change the values entered previously; the parser processes the request and replaces the values in the same conceptual structure. This tag is triggered when confirmation is denied by the user.
For example:
System: You want to reserve a sedan. Is that correct?
User: No.
System: What would you like to do then?
User: Actually I want to reserve a minivan.
At any point in the dialog, when the user wants to change their mind, the “ChangeMind” words listed in the Filters.txt file are filtered out and the parser picks up the new values. For example,
User: I want to book a sedan.
System: On what date do you want to be picked up?
User: Actually I want to book a minivan.
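The correction step can be illustrated with a short Python sketch in which the conceptual structure is reduced to a plain dictionary of state values. The `CHANGE_MIND` entries, the vehicle values, and the function name `apply_request` are hypothetical; the paper does not list the actual “ChangeMind” words.

```python
import re

CHANGE_MIND = {"actually", "sorry", "instead"}  # assumed "ChangeMind" words
VEHICLES = {"sedan", "minivan"}                 # assumed values of the vehicle state


def apply_request(structure, utterance):
    """Return a copy of the structure updated with any value the user names."""
    words = re.findall(r"[a-z]+", utterance.lower())
    # Filter out leading change-of-mind words before looking for values.
    while words and words[0] in CHANGE_MIND:
        words.pop(0)
    updated = dict(structure)
    for word in words:
        if word in VEHICLES:
            updated["vehicle"] = word  # the new value replaces the old one
    return updated
```

Because the update is written into the same structure, values the user has already given for other states (a date, for example) are preserved when the vehicle changes.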
Navigation Commands:
The “Cab Booking” application allows the user to steer the dialog in any direction. It is unlike an IVR system, where the user is bound to say a specific response or press a particular number for navigation. I developed this functionality by creating the navigation_command.txt file. It helps steer the
dialog in certain directions depending on whether the user responds with yes, no, repeat, or cancel. Some of the mappings are as follows. REPEAT maps to a variety of phrases: What? What did you say? Can you repeat? Say that again please? Pardon? Pardon me? Please repeat. CANCEL maps to a variety of phrases: Please cancel this transaction. Stop please. I don't want to reserve. YES maps to yes, of course, sure, yeah, yup. NO maps to no, nah, no thanks. This lets the user speak or type more naturally without being bound to a fixed input. The application understands all the kinds of responses listed in the text file.
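The mapping above can be sketched in Python as a phrase lookup. The phrases are the ones listed in the text; the dictionary layout and the names `NAVIGATION`, `normalize`, and `navigation_command` are assumptions, since the format of the navigation file is not shown.

```python
import re

# Phrases from the text, grouped by the command they map to.
NAVIGATION = {
    "REPEAT": ["what", "what did you say", "can you repeat",
               "say that again please", "pardon", "pardon me", "please repeat"],
    "CANCEL": ["please cancel this transaction", "stop please",
               "i don't want to reserve"],
    "YES": ["yes", "of course", "sure", "yeah", "yup"],
    "NO": ["no", "nah", "no thanks"],
}


def normalize(text):
    """Lower-case, unify apostrophes, and drop punctuation."""
    text = text.lower().replace("\u2019", "'")
    return " ".join(re.findall(r"[a-z']+", text))


# Reverse index: normalized phrase -> command name.
LOOKUP = {normalize(p): cmd for cmd, phrases in NAVIGATION.items() for p in phrases}


def navigation_command(utterance):
    """Return REPEAT, CANCEL, YES, or NO, or None if no phrase matches."""
    return LOOKUP.get(normalize(utterance))
```

An utterance that returns `None` here is not a navigation command and would be passed on to the parser as a normal request.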
Synonyms:
I created the ‘synonyms.txt’ file to map words with a similar meaning onto the one word used in the dictionary. For example, the user can say bird, animal, cat, or dog instead of the word ‘pet’. I mapped these words to the word ‘pet’, which is in the ‘noun’ dictionary. Therefore, the user can make the request with any of these words; the parser refines the request by replacing the word with ‘pet’ and then continues to parse the sentence.
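A small Python sketch of this synonym substitution follows. The `SYNONYMS` table contains only the example from the text, and the function name `refine` is hypothetical; the actual format of the synonyms file is not shown in the paper.

```python
import re

# Example mapping from the text: each word is replaced by the dictionary word 'pet'.
SYNONYMS = {"bird": "pet", "animal": "pet", "cat": "pet", "dog": "pet"}


def refine(request, synonyms=SYNONYMS):
    """Replace every known synonym in the request with its dictionary word."""
    return re.sub(
        r"[a-z]+",
        lambda m: synonyms.get(m.group(), m.group()),
        request.lower(),
    )
```

This sketch matches whole words only, so a plural such as ‘dogs’ is left unchanged unless it is also listed in the table.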
D. Adopted Design
The “Cab Reservation” application with Natural Language Processing is based on the X-bar theory [7] explained above. I designed the Natural Language parser and the dialog director for the application. I adopted the execution interface technology for retrieving the answer, based upon the correct match of the conceptual structure that I built from the parser. This interface uses a tree data structure. Using mathematical equations and algorithms, the execution interface calculates the weight and position of each word in the knowledge base answer set and finds the best-matching paragraph. The search is based on the verb in the sentence. The parser puts the verb in the ontological category named “EVENT”. Therefore, the execution interface starts the search from the leaf node of the tree that holds the verb of the sentence.
IV. VALIDATION
A. Validation
The “Cab Reservation” application with Natural Language Processing (NLP) using Text-to-speech (TTS) was developed with validation considered at every stage of the application development cycle. Using the requirements as the baseline, all the functionalities of the application described above were tested. The application was tested for ad-hoc queries and for the transactional dialog system. For the Android client, the application was installed on a Motorola device provided by Dr. Richard Whitehouse, Lecturer, CTI Department of Engineering at Arizona State University. The server was installed on one of the virtual servers of the CTI department, Arizona State University. The application was also tested with the speech recognizer for speech input.
B. Results
After testing the “Cab Reservation” application on an actual Android device, it can be concluded that answers are more accurate and precise when all the words the user is likely to use in formulating the question are found in the dictionaries. Answers are better when the question is fully parsed by either the cab exact transactional grammar or the cab fuzzy logic grammar than when no matching rule is found for building a proper conceptual structure. 95% of the questions mapped to the proper ontological category and had their words found in the synonyms files.
Although most of the questions worked, I found that if the phrase is not in the active voice, it is difficult for the application to parse the speech into the right semantic structure. Also, when the answer to a particular question is found in multiple places in the knowledge base answer set, the system does not give an accurate response; in that case the application finds the answers to be ambiguous. Because this ‘Cab Reservation’ application is not an expert in the subject area, an undesirable effect is that it gives a wrong answer to some information retrieval requests rather than no answer.
V. CONCLUSION AND FUTURE WORK
The Natural Language Processing tool with text-to-speech looks through dictionaries and grammars to gather a relevant response to the user request. I believe that the ‘Cab Reservation’ application, developed using the concepts and design of natural language processing described in this paper, will, when deployed, help the end user speak more naturally during the reservation process. First, this project primarily uses the Android device, with the text-to-speech library supported by the Android operating system. Second, the application gives appropriate responses based on the natural language parser and understanding tool, making the user's communication with the Android smartphone flow more naturally. Further, the solution implemented during this project is scalable and portable, and can be deployed to any Android device.
Some areas for future work would be to add more intelligence to the cab grammar to achieve more natural and robust dialog responses by using collaborative user communication. An important extension to the current work would be to consider mobile agent security mechanisms such as authentication, authorization, encryption and others, as in [12], to ensure that secure transactions can be carried out throughout the booking process. In the future, a more graphical user interface can be developed for the end user's convenience, and more functions, like recording the transaction dialog, can be added to the mobile application. At this stage, the system uses the speech recognition of the Android device itself; a separate recognizer could be developed to detect when the user starts speaking and make the conversation more natural, without touching the device. This application contains the dictionaries and grammar for booking a cab in the English language. In the future, functionality can be added to work with dictionaries and grammars in multiple languages.
Furthermore, when the system is unable to find the answer in the answer set or text database, it would be able to search through an intranet or the internet to give an answer related to the user request. For more accurate communication, the functionality of pushing a related page with the appropriate answer can be added in the future. For example, when the user does not know the exact location of the departure, the system will be able to detect the current location of the Android device with the help of GPS. In addition, the application can be enhanced to automate the grammar generation process from the states files. This feature will reduce the number of human errors in the linguistic field and
syntax errors. Consequently, it would save the man-hours needed to build a robust conversational dialog. For testing purposes, a tester can be developed that takes all the questions a user is likely to ask, feeds them into the application, generates the related answers, and dumps them into a text file for analysis, saving the time needed to test the application with different types of questions.
ACKNOWLEDGMENT
I would like to thank my committee chair, Dr. Timothy Lindquist, for providing his valuable guidance and advice throughout this work. I also extend my thanks to my committee members, Professor Richard Whitehouse and Dr. John Femiani, for their feedback. Professor Whitehouse's Android class inspired me to work on this project idea and helped me to build the client-server architecture with the Android device.
REFERENCES
[1] A. Turing, “Computing Machinery and Intelligence”, Mind 59, 1950, pp. 433-460.
[2] “Designing Effective Speech Application”, Java™ Speech API Programmer's Guide, Sun Microsystems, Inc.
[3] C. Bajorek, “The state of IVR navigation technology”, Computer Telephony Magazine, New York, NY, Volume 8, September 2000.
[4] J. A. Jacko, A. Sears, “The human-computer interaction handbook: fundamentals, evolving technologies, and emerging”, New Jersey: Lawrence Erlbaum Associates, 2003, pp. 712-750.
[5] T. Dutoit, “An Introduction to Text-to-Speech Synthesis”, TTS Research team, TCTS Lab, pp. 2-6.
[6] B. Manaris, “Natural Language Processing: A Human-Computer Interaction Perspective”, University of Southwestern Louisiana, Louisiana.
[7] N. Chomsky, “Remarks on Nominalization”, in R. Jacobs & P. Rosenbaum, eds., Readings in English, 1970.
[8] R. Jackendoff, “Foundations of Language”, Oxford University Press, New York, NY, 2002.
[9] R. Jackendoff, “Semantics and Cognition”, The MIT Press, Cambridge, MA, 1983.
[10] NuGram Platform, “http://nugram.nuecho.com/product_app/welcome”, nu Echo Inc., 2003-2011.
[11] M. D. Riley, “Tree-based modeling for speech synthesis”, in G. Bailly, C. Benoit, and T. R. Sawallis, editors, Talking Machines: Theories, Models, and Designs, pages 265–273.
[12] Yang Kun, Guo Xin, Liu Dayou, “Security in mobile agent system: problems and approaches”, ACM SIGOPS Operating Systems Review, Volume 34, Issue 1, January 2000.