Table of Contents<br />
About <strong>LumenVox</strong><br />
Why Choose <strong>LumenVox</strong><br />
Discover <strong>LumenVox</strong><br />
Speech Recognition Engine<br />
Speech-enables any application with its flexible API, powering every solution that<br />
<strong>LumenVox</strong> provides.<br />
Speech Platform<br />
Allows you to develop and deploy your speech application: with just a few steps,<br />
your application can go from conception to reality.<br />
Speech Driven Assistant<br />
Integrates seamlessly with Vertical Communication's TeleVantage IP-PBX, permitting<br />
you to speech-enable the name directory, contact list, voicemail, IVR, and email.<br />
Speech Tuner<br />
Maintains and tests existing applications, ensuring that any speech recognition<br />
application, including those driven by Nuance and ScanSoft, continues to work well.<br />
<strong>LumenVox</strong> Training<br />
Describes classes and instructors available to help you learn about the speech<br />
industry, application design, and tuning with <strong>LumenVox</strong>’s products.<br />
Application Development Overview<br />
Provides insight to help you develop high-quality and effective applications for your<br />
customer base.<br />
Tuning Guide<br />
Gives a basic overview of the steps required when tuning and improving your speech<br />
applications.<br />
About <strong>LumenVox</strong><br />
<strong>LumenVox</strong> is a speech recognition company with over a decade of<br />
telephony experience. We take a business and technology approach<br />
that gives businesses, corporations, resellers, platform providers, and<br />
service providers access to the complex speech recognition industry.<br />
Our revolutionary speech recognition software products have gained<br />
industry recognition, winning over 17 awards for innovation, technical<br />
excellence, and users' choice.<br />
Whether your organization wants to quickly and easily speech-enable current applications, or<br />
maintain existing ones, <strong>LumenVox</strong> provides all the necessary tools: our state-of-the-art Speech<br />
Recognition Engine, Speech Platform, Speech Driven Assistant, and Speech Tuner.<br />
Tired of extra costs for adaptations, application updates, and new deployments? At <strong>LumenVox</strong>, we<br />
believe that you know your company's needs best: our tools allow you to develop speech<br />
applications on your own terms, at your company's pace, without service fees for each new update.<br />
With <strong>LumenVox</strong>'s software suite, you remain in control.<br />
<strong>LumenVox</strong> Team<br />
Expert. Dedicated. Innovative.<br />
<strong>LumenVox</strong>'s development team (speech scientists, speech user interface designers, and experienced<br />
sales and marketing personnel) has over 30 years of practical experience in the development and<br />
integration of speech recognition systems, telephony, database design, and hardware integration.<br />
At <strong>LumenVox</strong>, we know the challenges and opportunities that come with integrating speech into<br />
your application. We offer guidelines, tips, and best practices, and our design team can assist in<br />
any phase of your development cycle. We understand speech.<br />
<strong>LumenVox</strong> is committed to providing the most powerful, flexible, and useful speech products and<br />
services with excellent customer service to clients of any size.<br />
Recent Awards<br />
<strong>LumenVox</strong> offers an impressive suite of<br />
versatile, world-class speech recognition<br />
technologies.<br />
Dr. Danny Lange<br />
CEO of Vocomo Software<br />
Why Choose <strong>LumenVox</strong>?<br />
Rethinking the Business of Speech<br />
Traditionally, speech recognition technology providers keep the<br />
development and maintenance of speech applications under wraps.<br />
Instead, <strong>LumenVox</strong> believes in empowering the users and developers of<br />
our speech recognition technology.<br />
Other technology providers create, deploy, and maintain their own proprietary applications for their<br />
clients. The final application can carry a price tag in the millions, driven by extensive,<br />
ongoing professional service fees. Traditional companies also tier the pricing and<br />
functionality of their core speech recognition technology, so costs vary widely with<br />
what you need. This limits the development and expansion of speech<br />
recognition, preventing smaller businesses from utilizing these products.<br />
<strong>LumenVox</strong> Provides:<br />
Exceptional customer and technical support<br />
Hardware independent Automatic Speech Recognizer<br />
(ASR) with a distributed client-server architecture<br />
that runs on both Windows and Linux<br />
Extensive logging of audio, grammars, results, and<br />
scores, which allows you to recreate every call<br />
State-of-the-art testing and post-deployment tools to<br />
constantly improve your <strong>LumenVox</strong>, Nuance, or<br />
ScanSoft speech application<br />
Support and development of current and emerging<br />
industry standards<br />
We provide the tools and education needed when creating applications. We aspire to make speech<br />
recognition widely available, so our business model is structured on a single per-port charge, as<br />
opposed to tiered pricing models.<br />
At <strong>LumenVox</strong>, we believe in the power of speech recognition to help revolutionize many industries,<br />
including businesses that are currently excluded from using speech by tiered pricing and costly<br />
professional service fees. Contact us to learn more about how speech recognition can work for your<br />
business.<br />
I think your product is the tops, and<br />
I’ve been around the block. The MAJOR<br />
KEY TO ME is that I can use it effectively<br />
right out of the box BUT ALSO create<br />
custom speech-activated (true-IVR)<br />
applications right down to the<br />
TeleVantage API level!<br />
Evan Klayman President of Brainstem<br />
Discover <strong>LumenVox</strong><br />
<strong>LumenVox</strong>’s Products<br />
The speech industry contains many different technologies, platforms, and types of applications. This overview presents the components of a final speech application, as well as the products <strong>LumenVox</strong> offers.<br />
Tuning<br />
The process of changing an application to improve performance.<br />
Speech Tuner<br />
Complete maintenance tool to tune and test any speech application using various ASRs, including <strong>LumenVox</strong>, Nuance, and ScanSoft.<br />
Professional Services<br />
Experts in the speech industry who provide a variety of services, including speech application design, development, and tuning.<br />
Applications<br />
Speech applications allow callers to access any database to get account information, perform order entry, take customer surveys, or check order status.<br />
Speech Assistant<br />
Complete solution, currently for TeleVantage owners, to speech-enable their name directories, contact lists, voicemails, IVRs, and emails.<br />
Pre-Packaged<br />
Pre-packaged applications can offer features such as email and calendar management.<br />
Custom Built<br />
Custom speech applications can address any vertical or horizontal market need where touch-tone applications cannot.<br />
Platform<br />
Defines what type of applications can be run, and how those applications are allowed to operate.<br />
Speech Platform<br />
Complete platform with a toolkit to design and deploy speech applications. The Call Handler supports any phone system via analog, digital, ISDN, or PRI.<br />
VoiceXML<br />
Standards-based speech application programming language supported by many platforms.<br />
Others<br />
Many platforms are available that use languages other than VoiceXML for creating and running speech applications.<br />
Core Speech Technology<br />
These technologies form the basis for building any speech application.<br />
Speech Engine<br />
Automatic Speech Recognizer (ASR) that supports SRGS, MRCP, and SISR. Integrated in various VoiceXML and proprietary platforms.<br />
ASR<br />
Technology used for interpreting audio data from phone, web, microphone, or 2-way radio.<br />
Other Core Technology<br />
Text-to-Speech is used to produce audio from text. Voice Verification is used to identify individual speakers for security purposes.<br />
Speech Engine<br />
<strong>LumenVox</strong>'s Speech Recognition Engine is a flexible<br />
API that performs speech recognition on audio data<br />
from any audio source.<br />
The Speech Engine is speaker and hardware<br />
independent: it supports SRGS and SISR on both<br />
Windows and Linux platforms.<br />
How the Speech Engine works<br />
The Speech Engine provides speech application developers with an<br />
efficient development and runtime platform, allowing for dynamic<br />
language, grammar, audio format, and logging capabilities to customize<br />
every step of their applications. Grammars are entered as a simple list of<br />
words or pronunciations, or in the industry standard Speech Recognition<br />
Grammar Specification (SRGS), as defined by the W3C.<br />
Grammars<br />
Just 13 lines of code (8 calls to the<br />
Speech Engine) will implement a<br />
simple "yes-no" speech recognition<br />
system. The system must provide the<br />
audio and the audio data length for<br />
SoundData and SoundDataLength.<br />
<strong>LumenVox</strong>’s technology provides us<br />
with some of the very best in speech<br />
technologies today, at the right price for<br />
our customers.<br />
Mark Kelley President of Parallax<br />
Sample Code<br />
void RecognizeSpeech (void* SoundData, int SoundDataLength)<br />
{<br />
const char* GrammarString =<br />
"#ABNF 1.0;\n"<br />
"language en-US;\n"<br />
"mode voice;\n"<br />
"tag-format ;\n" // tag-format value elided in the source<br />
"$yes = (yes | yeah | okay):'true';\n"<br />
"$no = (nope | no):'false';\n";<br />
LVSpeechPort Port;<br />
Port.OpenPort ();<br />
Port.LoadGrammarFromBuffer (0, GrammarString);<br />
Port.LoadVoiceChannel (0, SoundData, SoundDataLength, ULAW_8KHZ);<br />
Port.Decode (0, 0, LV_DECODE_SEMANTIC_INTERPRETATION | LV_DECODE_BLOCK);<br />
int NumInterpretations = Port.GetNumberOfInterpretations (0);<br />
for (int i = 0; i < NumInterpretations; ++i)<br />
cout << Port.GetInterpretation (0, i) << endl; // print each result; accessor name illustrative<br />
}<br />
Supporting Standards<br />
<strong>LumenVox</strong> supports the W3C's Speech Recognition Grammar<br />
Specification (SRGS), part of the VoiceXML 2.0 and SALT<br />
specifications. Companies that track these specifications are<br />
dedicated to the future of speech, and to integrating with<br />
other companies committed to promoting speech recognition.<br />
The <strong>LumenVox</strong> SRGS implementation is backward compatible<br />
with the existing <strong>LumenVox</strong> BNF grammar format; current<br />
deployments will leverage the power of the SRGS system<br />
immediately and transparently.<br />
Both companies (<strong>LumenVox</strong> and<br />
SandCherry) have focused on simplifying<br />
the development, integration, and<br />
deployment of speech services while<br />
maintaining affordability.<br />
Charles Corfield<br />
President and CEO of<br />
SandCherry<br />
<strong>LumenVox</strong> recognizes that the speech community will need to<br />
work together to develop solutions for businesses, and as<br />
such, <strong>LumenVox</strong> applications complement the following<br />
technologies:<br />
VXML<br />
VoiceXML (VXML) is a mark-up language designed to code speech<br />
applications with many of the same architectural components as HTML.<br />
VoiceXML platforms connect to a combination of speech recognition engines,<br />
text-to-speech synthesis, telephony interfaces and VoiceXML interpreter<br />
software to process the call. In order to interface VXML with any speech<br />
engine, the engine must understand SRGS and SISR.<br />
<strong>LumenVox</strong>'s Speech Engine is compliant with what VXML expects, and our<br />
engine powers the speech recognition portion of several VXML platforms.<br />
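To sketch what a VoiceXML platform hands to an engine like this, a minimal VXML 2.0 form might look like the following; the grammar filename and prompt wording are invented for the example:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <field name="answer">
      <prompt>Would you like to continue?</prompt>
      <!-- The platform passes this SRGS grammar to the speech engine -->
      <grammar src="yes_no.grxml" type="application/srgs+xml"/>
      <filled>
        <prompt>You said <value expr="answer"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

The platform's VoiceXML interpreter handles the dialog; the engine only needs to understand the SRGS grammar and return a (possibly SISR-annotated) result.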
SALT<br />
Speech Application Language Tags (SALT) is similar to VoiceXML but also<br />
adds support for multi-modal systems. SALT extends existing mark-up<br />
languages such as XHTML, XML, and HTML. Similar to our work with VXML,<br />
the <strong>LumenVox</strong> Speech Recognition Engine conforms to SALT specifications.<br />
Semantic Interpretation<br />
<strong>LumenVox</strong> has implemented the W3C's Semantic Interpretation for Speech<br />
Recognition (SISR) working draft, also part of the VoiceXML 2.0 specification.<br />
SISR allows grammar authors to embed snippets of JavaScript code into<br />
their SRGS grammars, to automatically transform what a speaker says into a<br />
format understandable to an application. With <strong>LumenVox</strong>'s Semantic Tags,<br />
callers can say, "September thirteenth two thousand four," and your<br />
application will understand "2004-09-13."<br />
<strong>LumenVox</strong> is committed to supporting the W3C's working draft. As the draft<br />
evolves, we will support both new and old drafts, so application developers<br />
can be confident that their grammars and tags will perform to specification.<br />
Engine Features and Functionality:<br />
Streaming audio<br />
Supports English, Latin American Spanish, and Canadian French<br />
Flexible API easily integrates into current OA&M, billing, provisioning, and debugging systems<br />
Client/Server architecture distributes speech-processing load<br />
Run-time defined grammars entered as simple text, BNF, raw phonetic spelling, or SRGS<br />
Advanced dynamic barge-in adapts to each call in real-time<br />
SDK includes documentation and a demonstration C/C++ application<br />
Flexible error recovery through the use of confidence scores and n-best results<br />
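To illustrate the idea, a small SRGS ABNF grammar using SISR string-literal tags might look like the following; this sketch is our own, not taken from LumenVox documentation:

```
#ABNF 1.0;
language en-US;
mode voice;
tag-format <semantics/1.0-literals>;
root $answer;

// Each alternative returns a string literal as its semantic result,
// so the application receives "true" or "false" instead of raw words.
$answer = $yes | $no;
$yes = (yes | yeah | okay) {true};
$no  = (no | nope) {false};
```

A date grammar works the same way, with tags assembling the spoken pieces into an ISO-style string such as "2004-09-13".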
Advanced Features<br />
Noise Reduction Module<br />
When noise is present, it will degrade the performance of any speech recognition system.<br />
Quality noise reduction improves the accuracy of Voice Activity Detection and Core<br />
Recognition, both essential parts of a speech recognition system.<br />
To improve application robustness in noisy environments, <strong>LumenVox</strong> implemented a Noise<br />
Reduction Module (NRM) into our Speech Recognition Engine. The NRM automatically<br />
adapts to the acoustic environment, and dynamically updates its estimate of noise levels.<br />
The adaptive algorithm enables the NRM to reduce the effects of noise.<br />
The waveforms below demonstrate the power of <strong>LumenVox</strong>'s Noise Reduction Module. In<br />
the original audio [Fig. 1], a truck driver speaks on a cell phone while driving. In addition<br />
to noise from the truck engine and blowing wind, another vehicle engine starts in the middle<br />
of the recording. Although traditional noise reduction implementations often fail to adapt to<br />
such dramatic changes, <strong>LumenVox</strong>'s NRM adjusts to the new noise characteristics rapidly<br />
and automatically. [Fig.2]<br />
Fig. 1 Original audio: truck engine noise, with another vehicle starting mid-recording.<br />
Fig. 2 Audio after noise reduction: the truck noise is reduced, and the NRM adapts to the new engine noise.<br />
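The adaptive behavior described above can be caricatured in a few lines. The following toy noise-floor estimator is purely illustrative; it is not LumenVox's NRM algorithm:

```cpp
// Toy running noise-floor estimator. The floor rises slowly when frame
// energy exceeds it (so speech bursts cannot drag the estimate up) and
// falls quickly when energy drops (so quiet frames reset it fast),
// letting the estimate track a changing background noise level.
double UpdateNoiseFloor(double noiseFloor, double frameEnergy) {
    const double attack  = 0.01;  // slow upward adaptation
    const double release = 0.5;   // fast downward adaptation
    double rate = (frameEnergy > noiseFloor) ? attack : release;
    return noiseFloor + rate * (frameEnergy - noiseFloor);
}
```

Calling this once per audio frame yields an estimate that follows the background noise rather than the speech riding on top of it; subtracting or masking against that estimate is the essence of adaptive noise reduction.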
Voice Activity Detection<br />
Voice Activity Detection (VAD), also referred to as barge-in and/or End-Of-Speech (EOS) detection,<br />
identifies when a person begins speaking, finishes speaking, or pauses while speaking.<br />
<strong>LumenVox</strong>'s VAD implementation delivers high performance despite challenging conditions: hisses,<br />
pops, abrupt changes in background noise, telephone line echo, and squawks from two-way radio<br />
communication.<br />
The Voice Activity Detection module is highly configurable and can be adapted to work equally well<br />
within telephone, VoIP, or microphone-based applications.<br />
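As a toy illustration of the detection idea (not LumenVox's implementation), a frame-energy threshold separates speech-like frames from silence:

```cpp
#include <vector>

// Toy frame-level voice activity check: a frame counts as speech when its
// mean squared amplitude exceeds a threshold. Purely illustrative; a real
// VAD also adapts to background noise, line echo, and channel artifacts.
bool IsSpeechFrame(const std::vector<double>& frame, double threshold) {
    if (frame.empty()) return false;
    double energy = 0.0;
    for (double sample : frame)
        energy += sample * sample;
    return (energy / frame.size()) > threshold;
}
```

Begin-of-speech and end-of-speech events then fall out of runs of speech and non-speech frames, with hangover timers to bridge natural pauses.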
We are delighted that <strong>LumenVox</strong> is<br />
extending the capabilities of our<br />
platform.<br />
Media Resource Control Protocol (MRCP)<br />
Speech synthesizers…Audio recorders…DTMF recognizers…Speech<br />
recognizers…Speech verifiers…a fully functioning, media-rich application<br />
needs lots of components to work together. Until now, all of these<br />
components had to be provided by a single vendor, or required extensive<br />
custom programming to integrate them. MRCP changes all this. The<br />
Media Resource Control Protocol allows you to seamlessly manage<br />
diverse media resources. MRCP provides a common language to speak<br />
to all of these devices.<br />
With MRCP, vendors can compete on the basis of their strengths, rather<br />
than attempting to create an all-inclusive, yet mediocre package.<br />
Customers can take the best product from each vendor, creating a<br />
speech application package that is tailored to their particular needs.<br />
For detailed information visit:<br />
http://www.ietf.org/internet-drafts/draft-ietf-speechsc-mrcpv2-06.txt<br />
n-best Results<br />
Instead of returning only the top scoring result, you can<br />
instruct the engine to return several of the highest scoring,<br />
most likely answers, often called n-best results. Returning<br />
n-best results is particularly effective when callers need to<br />
spell names, street addresses, or e-mail addresses.<br />
Without n-best results, if a caller spells a name beginning<br />
with "N," but the engine returns a low confidence score, the<br />
caller would be asked to repeat the letter; and given how<br />
similar "N" is to "M," it's likely that the second answer<br />
would have a similarly low confidence score. With n-best<br />
results, the system can prompt the caller using several of<br />
the likely results, such as "Did you mean 'M,' as in 'Mary'?"<br />
When the caller responds, "No," the system goes to its next<br />
option, "Perhaps you meant 'N' as in 'Nancy'?"<br />
Returning n-best results improves the caller's experience:<br />
instead of asking the caller to simply repeat an answer that<br />
received a low confidence score, the system can confirm the<br />
caller's intention using several likely choices.<br />
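The confirmation strategy described above can be sketched as follows; the Candidate struct and the confirmation callback are hypothetical, not part of the LumenVox API:

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical n-best candidate: an answer plus its confidence score.
struct Candidate {
    std::string answer;
    double confidence;
};

// Walk the ranked candidates, "asking" the caller about each in turn
// ("Did you mean 'M' as in 'Mary'?"); return the first confirmed answer,
// or an empty string when every candidate is rejected.
std::string ConfirmFromNBest(
    const std::vector<Candidate>& nbest,
    const std::function<bool(const std::string&)>& callerSaysYes) {
    for (const Candidate& c : nbest)
        if (callerSaysYes(c.answer))
            return c.answer;
    return "";  // fall back to re-prompting the caller
}
```

The design point is that a rejection costs one short yes/no turn instead of a full re-recognition of an acoustically confusable utterance.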
Server-Side Grammar<br />
<strong>LumenVox</strong> offers even more efficient support for<br />
large grammars, by allowing clients to pre-load<br />
grammars onto the server, allowing users to send<br />
the grammar prior to the decode requests.<br />
Typically, the grammar itself accompanies each<br />
decode request, but in the case of large grammars,<br />
sending the grammar to the server prior to<br />
decoding is more efficient⎯reducing network traffic.<br />
John Hibel Vocalocity’s Vice President<br />
of Marketing and Business Development<br />
Speech Platform<br />
The Speech Platform is an intuitive GUI-based toolkit to quickly design,<br />
develop, and deploy any speech application or IVR. By connecting to<br />
almost any phone system and database, the Platform can easily power a<br />
Speech Driven Technical Support line, Call Router, Customer Service<br />
desk, Dealer Locator, Auto-Attendant, or any other speech<br />
application.<br />
Platform Features:<br />
English, Latin American Spanish, and Canadian French Support<br />
Client/Server functionality<br />
Database Connectivity through Custom Action DLLs<br />
Call Bridging and Outbound Dialing through Custom Action DLLs<br />
Support for Intel Dialogic Dx1E, Dx2, JCT, and DMV Series cards<br />
Enterprise level distribution<br />
User-created sophisticated grammars<br />
Loop start / Analog or T1 / PRI ISDN / Digital Switch<br />
Barge-In capability<br />
Live updates without rebooting system<br />
DTMF and speech input<br />
Detailed Call Flow logging<br />
Live runtime GUI status monitoring<br />
Complete Call Flow handling<br />
Flash-hook transfer capability<br />
Flexible Call Job definitions<br />
Carrier-grade application-ready<br />
Full SRGS, SISR support<br />
On-the-fly project switching<br />
File or SQL/MSDE Database-based projects<br />
Flexible line setting<br />
Assign each phone line to a different database, file project, or CAPI only mode<br />
Noise Reduction Module<br />
Speech Tuner included<br />
Speech Recognition Engine included<br />
<strong>LumenVox</strong>’s Speech Platform is a<br />
clear leader in the speech recognition<br />
sector.<br />
Nadii Tehrani Chairman of TMC<br />
Everything You Need<br />
<strong>LumenVox</strong>'s Speech Platform includes all of the components you need to<br />
produce, adjust, and maintain your speech applications. Designed from<br />
the outset to work together, the Platform's components operate<br />
seamlessly.<br />
Platform Component Descriptions:<br />
The Platform Designer allows you to construct the framework for your speech<br />
application in a GUI environment.<br />
Platform Extensions are used to handle any situation that the Platform Designer<br />
cannot support internally.<br />
The Speech Engine recognizes what the caller says, returning the results to the Call<br />
Handler.<br />
The Call Handler works with the Platform’s Designer, Extensions, and Speech<br />
Recognition Engine, executing the logic of your application, and directing calls<br />
appropriately.<br />
The Call Flow View allows the designer of the speech<br />
application to see all of the modules that are associated with<br />
the application and how they relate to each other. This is<br />
where the application dialog flow is created and can be<br />
tracked to see how callers will flow through the speech<br />
application or IVR.<br />
The Objects Panel allows application designers to drag and<br />
drop Modules that perform a specific function, or to add<br />
notes with Annotations to provide quick information or<br />
reminders on the Call Flow itself. Actions Lists, Actions,<br />
and Grammars are drag and drop icons to add within each<br />
individual Module's properties.<br />
The Properties Window has six tabs: Project, Modules, Audio<br />
Library, Quick Audio, To Do, and Notes. Project shows the<br />
global and project-specific properties. Modules lists the<br />
modules you have created so far for the application. The<br />
Audio Library and Quick Audio display all audio prompts<br />
contained within the project. To Do reminds you of objects<br />
that have not been finished within the whole project.<br />
Notes provides a place to enter comments or thoughts<br />
concerning the application.<br />
The Speech Tuner provides a comprehensive window into your application: it allows<br />
you to quickly note where your application performs well, and determine which<br />
areas need improvement. The Tuner also lets you simulate changes to your<br />
application, using audio from past calls, to determine the effectiveness of each<br />
change.<br />
<strong>LumenVox</strong> allows us to have control<br />
over the development of call flow,<br />
grammars and tuning, while keeping the<br />
costs at a respectable level.<br />
Brian Lauzon President of TelASK<br />
Platform Extensions<br />
The Speech Platform can be extended, allowing you to create a<br />
number of speech applications, by accessing pre-built libraries and<br />
controlling the call flow. These Platform Extensions can be written<br />
as either a Visual Basic ActiveX exe or C/C++ DLL.<br />
Examples of Platform Extensions:<br />
Connect to a live, or regularly updated, website or RSS<br />
feed<br />
Connect to a database via an ADO (ActiveX Data<br />
Object)<br />
Use SRGS for grammars and enable semantic<br />
interpretation<br />
Direct callers to different modules within the<br />
Application Designer project based on external<br />
decision trees<br />
Example Applications:<br />
PIN code and account number capture<br />
Survey systems<br />
Automated billing<br />
Collecting demographic information<br />
Auto-attendant<br />
Transaction completion<br />
Appointment scheduling<br />
Email reader<br />
Voicemail access<br />
By utilizing <strong>LumenVox</strong>’s Speech<br />
Platform our callers actually enjoy the<br />
experience of the phone call, helping us<br />
to build a good relationship with our<br />
customers.<br />
Derek Henry CEO of 1-800-US-LOTTO<br />
Corporation<br />
VB ActiveX Platform Extension Example<br />
C/C++ Platform Extension Example<br />
Call Handler<br />
The Speech Platform's Call Handler runs the speech application. The Platform's Settings, and<br />
more specifically Line Settings, inform the Call Handler as to which Project to use.<br />
The Call Handler supports a variety of Intel Dialogic telephony<br />
cards, and is designed to make hardware setup as easy as possible;<br />
this allows more time for you to develop and test your speech<br />
application.<br />
When developing your speech application with the Speech Platform's Designer, run tests by<br />
clicking on the Test button in the Toolbar. The Call Handler allows you to use speakers and a<br />
microphone, to completely test your application in-house⎯before releasing it to customers.<br />
Design Tips<br />
Build the application for the expected case, anticipating the caller's situation and the<br />
application's goals. Focus on these goals as you design the call flow and dialogues.<br />
Keep prompts and grammars concise in your initial design, then expand based on<br />
callers' interactions, learned from the tuning process.<br />
Give thought and time to determine the persona and brand image. Verify that the<br />
voice talent creates the proper tone and perception for your company.<br />
Make the system match the user. Listen to the way a user responds and interacts<br />
with the system.<br />
Apply the strengths of speech recognition to your application: remember that many<br />
of the hierarchical structures used in DTMF or touch tone call flows are not<br />
appropriate in speech applications!<br />
Never disguise a list question as a yes/no by adding unnecessary pauses, as in,<br />
"Would you like Red (pause), Blue, or Black?" This leads to caller confusion.<br />
Adjust the "No Input" timeout to match the complexity of the question.<br />
Make explicit decisions with yes/no.<br />
Make it easy for the caller to leave the speech application and reach a live person.<br />
The reason we chose <strong>LumenVox</strong>’s<br />
speech technology was simply because of<br />
the flexibility it provides us, as developers.<br />
Brian Lauzon<br />
President of TelASK<br />
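Several of these tips correspond directly to VoiceXML settings. The following hypothetical fragment (grammar file and prompt wording invented) lengthens the no-input timeout for a complex question and keeps a live-person escape available:

```xml
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <!-- Give callers a longer "No Input" window for a complex question -->
  <property name="timeout" value="7s"/>
  <form>
    <field name="color">
      <prompt>Would you like red, blue, or black?</prompt>
      <grammar src="colors.grxml" type="application/srgs+xml"/>
      <noinput>
        <prompt>Sorry, I did not hear you.</prompt>
        <reprompt/>
      </noinput>
      <help>Say operator at any time to reach a live person.</help>
    </field>
  </form>
</vxml>
```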
Speech Driven Assistant<br />
<strong>LumenVox</strong>'s Speech Driven Assistant for TeleVantage is a complete<br />
turn-key program to speech-enable name directories, contact lists,<br />
voicemail, IVRs, and emails, providing all callers hands-free phone<br />
interactions. The system also allows for alternate names, nicknames,<br />
and various spellings and pronunciations to be recognized.<br />
<strong>LumenVox</strong>’s Speech Driven Assistant<br />
for TeleVantage has become the most<br />
valued component in our operation of a<br />
speech-activated hotline, 1-800-US-LOTTO.<br />
Derek Henry<br />
CEO of 1-800-US-LOTTO<br />
Corporation<br />
Remote Access to:<br />
Speech enabled outbound dialing<br />
Users can say a name from their personal contact lists when they want to place a call<br />
Speech enabled name directory<br />
Callers can speak the name of the person or department they want to reach<br />
Support for workgroups and multiple company configurations<br />
Transfer fax calls to specified extension<br />
Add alternate names, spellings, and pronunciations<br />
Speech enabled voicemail access<br />
Access, control, and manage voicemail with speech<br />
Reply directly to caller's phone number or extension<br />
Forward messages to other users<br />
Speech enabled IVR (<strong>LumenVox</strong>'s Speech Platform)<br />
GUI-based development tool to create a variety of IVR applications<br />
Supports both Speech and DTMF input<br />
External database access<br />
Speech enabled access to email<br />
Access POP3, Exchange Server, and IMAP email<br />
Reply to email with an attached recorded audio message or with user-predefined text messages<br />
NeoSpeech Text-to-Speech (TTS)<br />
Play prompts for unrecorded proper names, dynamic IVR applications, and emails<br />
Assistant Features:<br />
Speech activated dialing to contact list<br />
Transfer to another extension from voicemail<br />
Navigate, forward, save, and delete voicemail<br />
IMAP, POP3 and Exchange server email access<br />
DTMF and Speech input<br />
Configure Speech Driven Assistant database remotely<br />
Barge-in capability with CSP compatible Intel Dialogic cards<br />
Live run-time GUI status monitoring<br />
Speech Platform included<br />
Speech Tuner included<br />
NeoSpeech Text-to-Speech included<br />
Configuration Tool<br />
<strong>LumenVox</strong>’s Configuration tool is where all modifications to the Speech<br />
Driven Assistant are made.<br />
Users’ email account information and pre-defined<br />
text message replies are easily managed.<br />
The Configuration program allows<br />
administrators to modify user<br />
information, maintain email<br />
servers, configure speech<br />
applications, and check the<br />
server’s telephony hardware<br />
compatibility.<br />
Administrators can easily select and<br />
change any user’s information, which is<br />
continuously synchronized with<br />
TeleVantage’s database.<br />
The Configuration tool can also discover and display all installed Intel Dialogic<br />
cards. All features associated with the telephony cards are shown as a quick<br />
reference of their capabilities.<br />
Alternate names or departments<br />
associated with a particular user can<br />
be added quickly to the live system.<br />
For commonly mispronounced names, alternate pronunciations can be added.<br />
The Phonetic Speller provides alternate pronunciations as well as the chance<br />
to hear how each one sounds, using Text-to-Speech.<br />
<strong>LumenVox</strong>’s Speech Driven Assistant<br />
has provided the best integration with<br />
TeleVantage of any speech recognition<br />
product we tested.<br />
John Gagliardi<br />
President of GTI Solutions<br />
Speech Tuner<br />
<strong>LumenVox</strong>'s Speech Tuner is a complete<br />
maintenance tool for end-users, value-added<br />
resellers, and platform providers.<br />
It’s designed to perform tuning and<br />
transcription, as well as parameter,<br />
grammar, and version upgrade testing of<br />
any speech application.<br />
With this GUI-based tool, companies<br />
developing speech applications on<br />
various ASR platforms (including<br />
Nuance and ScanSoft) can bring speech<br />
application tuning in-house and avoid<br />
professional service fees.<br />
<strong>LumenVox</strong> is on the cutting edge of<br />
speech technologies and customer<br />
satisfaction by supporting not only their<br />
own speech engine, but other leaders in the<br />
industry as well.<br />
Bruce Balentine<br />
Executive Vice President<br />
and Chief Scientist at EIG<br />
Why Do Companies Need the<br />
<strong>LumenVox</strong> Speech Tuner?<br />
No untuned speech application<br />
survives contact with actual<br />
customers. Tuning is an absolute<br />
requirement for every speech application<br />
deployment. Our Tuner allows you to<br />
quickly assess changes and upgrades.<br />
Tuner Capabilities:<br />
Evaluate and improve the speech recognition<br />
application<br />
Analyze each stage in the call process<br />
Transcribe audio data, make pinpoint adjustments,<br />
and immediately measure the effects on performance<br />
Test changes against actual calls immediately<br />
Analyze data collected using different ASR engines<br />
Test design and development decisions of new<br />
applications, using data from deployed applications<br />
Tuner Overview<br />
The Speech Tuner comprises several key functions and windows.<br />
The SRE Custom Tags<br />
window displays log<br />
information supplied<br />
by the application’s<br />
developers.<br />
The Call Log displays the list of calls and controls the display<br />
of information in the rest of the Tuner. Each interaction<br />
(or turn) in a call is marked with an event type, such as:<br />
Beginning of the call<br />
Speech event<br />
Touch tone event<br />
'No input' event; the application<br />
did not detect any speech or touch tones<br />
'Unknown' event; these<br />
events are typically ASR-specific<br />
End of the call<br />
The Transcription tool allows a<br />
transcriber to type the text of the<br />
caller's speech. Transcriptions<br />
are automatically evaluated and<br />
stored in the database for use in<br />
performance evaluations.<br />
The Statistics window<br />
displays performance<br />
statistics directly related<br />
to the calls in the Call Log<br />
window.<br />
The Answer window shows<br />
recognition results. Full<br />
n-best support is available,<br />
as well as semantic<br />
interpretations and the actual<br />
words recognized.<br />
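The n-best results shown in the Answer window can be pictured as a ranked list of hypotheses, each carrying the recognized words, a semantic interpretation, and a confidence score. The sketch below is illustrative only; the field names and confidence scale are assumptions, not the Tuner's actual data model.<br />

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    words: str            # the actual words recognized
    interpretation: str   # the semantic interpretation
    confidence: float     # engine confidence; the scale varies by ASR

def best_hypothesis(n_best):
    """Return the highest-confidence entry from an n-best list."""
    return max(n_best, key=lambda h: h.confidence)

# An illustrative n-best list for a single caller utterance.
n_best = [
    Hypothesis("checking please", "ACCOUNT=checking", 0.82),
    Hypothesis("checking fees", "TOPIC=fees", 0.61),
    Hypothesis("check in", "ACTION=checkin", 0.40),
]
print(best_hypothesis(n_best).interpretation)  # ACCOUNT=checking
```

Inspecting the lower-ranked entries often reveals near-misses worth adding to the grammar.<br />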
Listen to the application prompt, and the caller's pre- and<br />
post-processed speech. The recognized words are displayed where the<br />
ASR found them within the caller's speech. Vertical bars, which are<br />
useful for detecting problems, indicate the beginning and end of each<br />
word.<br />
Tuning Processes<br />
<strong>LumenVox</strong>'s Speech Tuner provides full support for <strong>LumenVox</strong>'s Speech<br />
Recognition Engine, Nuance 8.5, ScanSoft OSR 2, and other ASRs. The<br />
Speech Tuner allows you to work with any supported ASR via a single<br />
interface.<br />
<strong>LumenVox</strong> is an active supporter of the Tools committee in the VXML<br />
Forum, and is working to help define standard logging information<br />
that eases the tuning process.<br />
The tuning process involves three easy steps:<br />
1 Import Data.<br />
The basic process is simple. Users import call log data into<br />
the Speech Tuner database. All information stored by the call<br />
log is available in the Speech Tuner. In most cases, log fields<br />
between ASR engines are very similar; when the information<br />
differs, every effort is made to preserve the original data.<br />
Each special case is fully documented.<br />
2 Transcribe Speech.<br />
Transcribers can type the text of the caller's speech directly<br />
into the Speech Tuner. Once the audio is transcribed, the<br />
Tuner compares audio transcripts with the speech engine<br />
results to determine accuracy, greatly reducing errors<br />
associated with hand evaluations. If semantic interpretations<br />
are available, the transcriber can also mark whether the<br />
semantic interpretation was correct or incorrect. The<br />
transcripts are evaluated using the actual decode grammar,<br />
producing measurements such as word-error-rate, in- and<br />
out-of-grammar rates, and semantic error rates.<br />
3 Test Immediately.<br />
Selecting an interaction in the Call Log automatically loads<br />
the associated audio and grammar into the Tester. The<br />
grammar can be edited, speech engine parameters set, and<br />
individual recognition tests generated. The Speech Tuner<br />
natively supports industry standard SRGS grammars. Once<br />
a set of possible changes is identified, users can batch test<br />
audio to evaluate performance, using those changes.<br />
The Speech Tuner assumes that the user possesses a licensed<br />
version of the relevant ASR, that the ASR platform is up and<br />
running, and that the platform is able to accept connections.<br />
<strong>LumenVox</strong> Speech Tuner Database<br />
The Speech Tuner communicates with SQLite (www.sqlite.org), an open-source, public-domain<br />
database. The Speech Tuner manages call log importing, searching, and exporting, so<br />
users can focus on the task of tuning, not log management. The database is contained in a single<br />
file, is easy to back up and transport, and can be queried using SQL-92 (see the SQLite website for<br />
full details) from a variety of external tools. Other speech engine vendors are free to convert their<br />
native logs to a format the Speech Tuner understands. The format, content, and semantics of the <strong>LumenVox</strong><br />
Speech Tuner database are published.<br />
The database maintains all the information contained in the original call log. The Speech Tuner<br />
includes not only the decode grammar and ASR results, but also the decode platform, parameter<br />
settings, alternative results, prompt audio, and pre- and post-processed audio.<br />
Depending on the platform logging capabilities, the database can provide more advanced<br />
information, such as ASR result alignments within the audio; the list of phonemes used in the<br />
decode; and word, utterance, and semantic interpretation confidence measurements.<br />
In addition, the Tuner stores all transcripts and evaluations within the call log. As transcripts are<br />
entered into the Speech Tuner, they are automatically evaluated against the decode grammar.<br />
These transcripts, and any notes or additional information, are stored directly into the database.<br />
Individual scores, such as word error rate, semantic error rate, and in- and out-of-grammar<br />
measurements, are stored along with their alignments, as well as information about how the scores<br />
were reached.<br />
Users can generate a variety of reports from these results, including error rate by grammar or<br />
dialog, confusion matrices, transcription progress, and confidence thresholds for confirmation or<br />
rejection settings.<br />
In the future, <strong>LumenVox</strong>'s Speech Tuner will also support back-end database replacement, for use in<br />
enterprise level systems, where multiple users will be analyzing the same data simultaneously.<br />
Companies who use an ODBC-capable database can replace, with certain SQL changes, the disk-<br />
based SQLite system with an enterprise system such as MS SQL Server 2000, MySQL, PostgreSQL,<br />
or Oracle.<br />
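Because the database is a single SQLite file, it can be queried from outside the Tuner with ordinary SQL. The snippet below sketches the idea against an in-memory database; the table and column names are illustrative assumptions, not the published Speech Tuner schema.<br />

```python
import sqlite3

# Illustrative schema only -- the real Speech Tuner schema is published
# separately; table and column names here are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE interactions (
    call_id INTEGER, grammar TEXT, transcript TEXT,
    result TEXT, correct INTEGER)""")
conn.executemany(
    "INSERT INTO interactions VALUES (?, ?, ?, ?, ?)",
    [(1, "yes_no", "yes", "yes", 1),
     (1, "main_menu", "operator", "operator", 1),
     (2, "main_menu", "billing", "building", 0)])

# Per-grammar accuracy, expressible in plain SQL-92 from any external tool.
for grammar, total, right in conn.execute(
        """SELECT grammar, COUNT(*), SUM(correct)
           FROM interactions GROUP BY grammar ORDER BY grammar"""):
    print(f"{grammar}: {right}/{total} correct")
```

Because everything lives in one file, backing up the tuning data is a simple file copy.<br />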
<strong>LumenVox</strong> has created speech<br />
recognition products that are easy to<br />
code with, and GUI-based tools, such as<br />
the new Speech Tuner, that greatly<br />
simplify post-deployment<br />
maintenance.<br />
Vern Baker<br />
President of enGenic<br />
Corporation<br />
Taking Out the Guesswork<br />
Make changes to grammars, parameters, or ASR engines, secure in the<br />
knowledge that those changes will make the application better, faster,<br />
and more accurate. The Speech Tuner uses historical information to<br />
validate your changes, ensuring your success.<br />
Grammar Tester<br />
Most 'tuning' tools are passive log viewers, requiring that changes be<br />
made in the live speech application and retested over a period of time<br />
with live callers. With <strong>LumenVox</strong>'s Tuner, we send the changes to the<br />
Speech Engine, simulating the recognition process and evaluating<br />
changes instantly. Instead of slow, non-interactive, static tuning, the<br />
Speech Tuner enables on-the-fly, highly interactive, dynamic tuning.<br />
Make a change, do the test, get the results!<br />
The Grammar Tester is a dynamic<br />
testing component. You can switch<br />
ASR engines, grammars, and engine<br />
search parameters on-the-fly, and test<br />
changes in single or batch tests.<br />
Grammar Evaluation<br />
Evaluate speech and grammar sets against the speech engine, as they<br />
took place during the actual call. Adjust grammars and instantly<br />
re-test and re-score to evaluate improvements in performance. With<br />
<strong>LumenVox</strong>'s Speech Tuner, you can instantly determine whether adding<br />
a new phrase to the grammar will improve your accuracy.<br />
Parameter Evaluation<br />
Setting parameters optimizes the speech engine performance, further<br />
improving the caller's experience. Traditionally, changing ASR<br />
parameters is a difficult and time-consuming task, often requiring long<br />
delays between changing a parameter, and evaluating its effects on<br />
performance. Our Speech Tuner can dramatically shorten the process.<br />
The dynamic test capability of the <strong>LumenVox</strong> Speech Tuner allows the<br />
user to quickly make and test parameter changes: now, ASR engine<br />
parameters such as search optimizations, speech end-pointing, and<br />
n-best result processing can be easily adjusted, and immediately<br />
re-tested and re-scored from within the Speech Tuner.<br />
Performance Measurements<br />
The Speech Tuner rates performance against commonly accepted<br />
measures like WER (Word Error Rate), Grammar Coverage, and<br />
Semantic Interpretation matching. This helps give an accurate picture<br />
of details such as average confidence scores, correct versus incorrect<br />
responses, and In-Grammar versus Out-of-Grammar performance.<br />
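Word Error Rate, the most common of these measures, is the number of substitutions, deletions, and insertions between the transcript and the recognized words, divided by the transcript length. A minimal reference implementation, using the standard edit-distance dynamic program:<br />

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(word_error_rate("transfer me to billing", "transfer to building"))  # 0.5
```

Here, "transfer me to billing" decoded as "transfer to building" scores one deletion plus one substitution against four reference words, for a WER of 0.5.<br />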
Assessing Upgrades<br />
Installing new versions of platforms and ASR engines entails a certain<br />
amount of risk. On occasion, new default settings, search<br />
routines, changes to acoustic models, and so on will actually worsen<br />
the caller's experience, until the application is re-tuned. But using the<br />
<strong>LumenVox</strong> Speech Tuner, you can perform baseline testing with the old<br />
version to establish the minimum acceptable performance. Then,<br />
using the upgraded version of the ASR engine, you can easily re-test<br />
all existing data and compare the results to the baseline. The new<br />
performance, judged against the baseline, gives you the information<br />
you need to make a decision, and deploy an upgrade with confidence.<br />
Tuner Reports<br />
The <strong>LumenVox</strong> Speech Tuner defines several pre-built queries for the most<br />
common reports. The reports are generated using SQL queries against<br />
the Tuner database, with results produced in a pre-defined XML format.<br />
The format, content, and semantics of the <strong>LumenVox</strong> Speech Tuner<br />
database are published: if you need to extract data from the logs that is not<br />
provided in the Tuner interface by default, you can easily produce<br />
custom reports by writing SQL commands.<br />
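A custom report of this kind might look like the following sketch: a SQL query over an illustrative miniature of the Tuner database, rendered as XML. The element names and schema here are assumptions for illustration, not LumenVox's published report format.<br />

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical miniature of the Tuner database; the real schema is published.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE interactions (grammar TEXT, correct INTEGER)")
conn.executemany("INSERT INTO interactions VALUES (?, ?)",
                 [("yes_no", 1), ("yes_no", 1),
                  ("main_menu", 0), ("main_menu", 1)])

# Error rate by grammar, emitted as an XML report.
report = ET.Element("report", name="error_rate_by_grammar")
for grammar, total, errors in conn.execute(
        """SELECT grammar, COUNT(*), COUNT(*) - SUM(correct)
           FROM interactions GROUP BY grammar ORDER BY grammar"""):
    ET.SubElement(report, "grammar", name=grammar,
                  interactions=str(total),
                  error_rate=f"{errors / total:.2f}")
print(ET.tostring(report, encoding="unicode"))
```

The same pattern extends to confusion matrices or transcription-progress reports: one SQL query, one XML rendering step.<br />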
Tuner Tips<br />
When a problem occurs with a transaction,<br />
determine if the fault lies primarily with<br />
prompts or grammars.<br />
Never make a change in your call flow or<br />
grammar for just one failed call.<br />
Train acoustic models with environmental<br />
noise present.<br />
Train acoustic models to allow for caller<br />
dialects and regional pronunciations.<br />
Tune grammars, prompts, and all system<br />
parameters.<br />
Remember that accurate transcriptions<br />
need to account for noise.<br />
Speech Understood<br />
The Speech Tuner is an excellent fully<br />
integrated tool for improving speech<br />
applications.<br />
Bruce Balentine<br />
Executive Vice President and Chief Scientist at EIG<br />
<strong>LumenVox</strong> Training Courses<br />
The <strong>LumenVox</strong> Team, a group of knowledgeable<br />
professionals with extensive development and<br />
technical support experience, provides our<br />
courses. We will help you get the most out of<br />
your <strong>LumenVox</strong> products.<br />
Courses include Speech Application Design, API Development,<br />
Speech Application Tuning, and many others. These courses give<br />
developers and business personnel opportunities to learn about<br />
speech development and sales opportunities on our premises, with<br />
the assistance and advice of the <strong>LumenVox</strong> Team.<br />
Who Should Attend<br />
People responsible for developing, maintaining,<br />
marketing, and/or selling <strong>LumenVox</strong> speech<br />
applications will benefit from <strong>LumenVox</strong> training.<br />
Classes also benefit anyone with an interest in<br />
designing, developing, tuning, testing, or maintaining<br />
any speech telephony system.<br />
What to Expect<br />
<strong>LumenVox</strong> training is key to accelerating your learning curve. Through a combination of<br />
presentations and hands-on exercises, our courses provide the details of creating and maintaining<br />
applications. In these courses, you will learn solutions to real problems encountered during actual<br />
application design, development, deployment, tuning, marketing, and selling.<br />
<strong>LumenVox</strong> training will give you the guidance you need to successfully design, develop, deploy, and<br />
refine your applications. We tailor our trainings to meet your particular needs.<br />
About the Instructors<br />
Our team of expert instructors is committed to your success. Every <strong>LumenVox</strong> instructor has a<br />
background in computer telephony, application development, and speech recognition. We are<br />
familiar with the development challenges you will encounter on a daily basis; we will offer solutions<br />
to routine problems, as well as creative approaches to not-so-routine problems.<br />
<strong>LumenVox</strong> Support Services: Ensuring Your Success<br />
At <strong>LumenVox</strong>, we recognize that high quality, cost effective technical support is a crucial component<br />
of successful application development. With proper support, subscribers gain a deeper product<br />
understanding, resulting in enhanced productivity, and ultimately in greater customer satisfaction.<br />
With this in mind, <strong>LumenVox</strong> offers a simplified technical support system designed to meet varying<br />
customer needs. <strong>LumenVox</strong> technical support plans are available for VARs, Distributors, and End<br />
Users with ongoing projects/support needs. The key component of <strong>LumenVox</strong> technical support is<br />
the Customer Hotline. Two additional avenues are also available: Fax Support and Email Support.<br />
Whichever method you choose, know that <strong>LumenVox</strong> will work efficiently to answer your questions<br />
and resolve the problem.<br />
Our <strong>LumenVox</strong> technical support team is made up of knowledgeable<br />
professionals with extensive <strong>LumenVox</strong> development and support experience.<br />
We are well versed in computer telephony technology and are available to<br />
assist you with:<br />
General <strong>LumenVox</strong> Technical Assistance<br />
Timely Problem Resolution<br />
Product Installation Assistance<br />
<strong>LumenVox</strong> Database/Host Connection Assistance<br />
Intel Dialogic Hardware Optimization<br />
I really appreciate the cooperation<br />
and assistance that we received from the<br />
<strong>LumenVox</strong> engineers. They are an easy<br />
group to work with.<br />
Chris Riggenbach<br />
CXM Product Development Manager<br />
Our Partners Include...<br />
Keeping People Connected<br />
<strong>LumenVox</strong> and SandCherry Speech Enable 2-way Radio Networks<br />
SandCherry's Voice4 Radio Message System (RMS) dramatically improves workgroup efficiency for<br />
teams with 2-way radio, phone, and Web users by providing easy-to-use voice and data messaging.<br />
The ability to offer messaging to 2-way radios (one of the most widely used tools for mobile<br />
workforces) adds an entirely new dimension to workgroup communications.<br />
Leaving messages for radio users when they are unavailable, and providing access to the same<br />
system for phone and Web users to leave and retrieve messages, provides the critical bridge<br />
between what had been independent communications networks. The system also offers phone<br />
users a patch capability to connect to the radio network for real-time communication with radio<br />
users. Voice4 RMS equips a workgroup to improve its efficiency and performance using tools and<br />
processes already in place, without relying on a dispatcher.<br />
Using <strong>LumenVox</strong>'s Speech Engine, SandCherry's Voice4 RMS provides an unparalleled level of<br />
voice-driven functionality for mobile workgroups. Voice4 RMS is easy to install and use, and<br />
provides a cost-effective, robust communications solution.<br />
Making Travel Easier<br />
Los Angeles World Airports (LAWA) uses the <strong>LumenVox</strong> Engine and enGenic technology to<br />
speech-enable travel information<br />
Los Angeles World Airports, including Los Angeles International Airport (LAX), has implemented the<br />
first fully speech enabled voice response system for a major airport network. In compliance with<br />
Homeland Security regulations, LAWA helps callers access the most up-to-date flight information<br />
directly from the airport hotline, rather than calling individual airline carriers.<br />
Now callers can check the status of their flights, get information about parking, ground<br />
transportation, and services for people with disabilities, inquire about lost and found items, contact<br />
administrative offices, and get directions to the airport, all through an automated speech system.<br />
The hotline was created using enGenic's development and engineering tools, and <strong>LumenVox</strong>'s<br />
Speech Engine. Jim Coulter, CEO of enGenic, states, "<strong>LumenVox</strong> provides us with fast and accurate<br />
recognition of speech grammars, in an ever-changing world environment. Callers around the world,<br />
with different accents, can use simple English commands to obtain information from all four<br />
airports, over 185 airlines, 40 different taxi and limousine services, and 20 different airport<br />
departments."<br />
The new system will enable us to<br />
continue our rapid growth and at the<br />
same time improve the efficiency of our<br />
ordering system without adding<br />
additional staff.<br />
Mike Gilson<br />
Vice President of ATA Retail<br />
Services<br />
Helping Doctors Treat Patients<br />
<strong>LumenVox</strong> Speech Engine Used in Innovative Patient Follow-up System<br />
The University of Ottawa Heart Institute in Ottawa, Canada has implemented an innovative<br />
automated patient follow-up system developed by <strong>LumenVox</strong> partner TelASK Technologies Inc.<br />
(www.TelASK.com). The TelASK System incorporates the <strong>LumenVox</strong> Speech Engine, allowing the<br />
Ottawa Heart Institute to closely monitor the progress and recovery of surgical patients by placing<br />
an automated call to their homes on day three, and again on day ten, after the patients have been<br />
discharged. The speech enabled outbound dialing system automatically phones the patient and<br />
delivers a pre-set list of questions. Each question is a strong indicator of the patient's progress.<br />
Using a specially designed algorithm, patients' answers are grouped as either requiring an<br />
immediate call back, contact within a day, or progressing normally. In the event of a response that<br />
may indicate a problem, the system will hold callers on the line and connect them to an Advance<br />
Practice Nurse at the Heart Institute for immediate attention. The University of Ottawa Heart<br />
Institute will also use this system to monitor patients with Acute Coronary Syndrome, and patients<br />
participating in their Smoking Cessation programs.<br />
Types of Speech Recognition<br />
Speech recognition is used in a wide range of applications, from<br />
automated commercial phone systems to enhancing personal<br />
productivity. This technology appeals to anyone who needs or wants a<br />
hands-free approach to computing tasks.<br />
There are two main types of speaker models: speaker independent and speaker dependent.<br />
Speaker independent models recognize the speech patterns of a large group of people. Speaker<br />
dependent models recognize speech patterns from only one person. Both models use mathematical<br />
and statistical formulas to yield the best word match for speech. A third variation of speaker<br />
models is now emerging, called speaker adaptive. Speaker adaptive systems usually begin with a<br />
speaker independent model and adjust these models more closely to each individual during a brief<br />
training period.<br />
Leveraging the Power of Speech<br />
Although many companies' first instincts are to simply speech enable their existing DTMF<br />
applications, doing so does not leverage the power and strengths of speech: speech enabling a<br />
DTMF application will not make the system smoother, faster or easier-to-use.<br />
The combination speech/DTMF system lengthens already complex menus by adding the "press or<br />
say" routine: "For Checking, press or say 'one,'" or even worse, "For Checking, press one or say<br />
'checking'." The typical combination speech/DTMF system requires the caller to remember too<br />
much, putting undue burdens on the caller.<br />
Migrating a DTMF application and its prompt design does not fully utilize the conversational aspect<br />
of speech. For speech applications to perform well, the call flow and dialog design is crucial.<br />
Designers must study the user patterns of the existing system, so they can redesign prompts,<br />
menus, and change the steps of the call flow to make the experience faster and more pleasant for<br />
callers.<br />
Well-designed speech applications offer many advantages over the combination speech/DTMF<br />
systems.<br />
By using a speech enabled system, our<br />
merchandisers realize significant time<br />
savings through 24-hour-a-day telephone<br />
access to information, from delivery status<br />
to new delivery days and product<br />
opportunities.<br />
Mike Gilson<br />
Vice President of ATA Retail Services<br />
Speech is:<br />
More Human<br />
With speech, prompts are phrased as easy<br />
questions, and callers can answer simply and<br />
naturally, with their voices. Speech systems<br />
provide a more natural interface than touch tone<br />
menus.<br />
Smooth and Fast<br />
Good speech call flow designs permit callers to<br />
get what they need faster, without having to wade<br />
through cumbersome filter menus.<br />
Easy-to-Use<br />
Navigation is much simpler, and callers can use<br />
the application with the interface mechanisms<br />
they are most familiar with: their voices.<br />
More Personal<br />
Speech applications give the impression of the<br />
ideal employee: attentive, empathic, alert, and<br />
consistently agreeable, rather than an impersonal<br />
string of numbers and tones.<br />
When to Use DTMF<br />
Sometimes, DTMF is appropriate: as an<br />
error-handling backup, or in special, security-sensitive<br />
interactions, such as pin code or credit card entry.<br />
In terms of customer satisfaction for most calls,<br />
speech applications outperform DTMF. Rather<br />
than speech-enabling an existing DTMF system,<br />
design your application with a conversation in<br />
mind, and learn to leverage the power of speech.<br />
State of the Industry<br />
To get an accurate understanding of the current state of the speech<br />
industry, we must first look at the history of Interactive Voice Response<br />
(IVR) systems.<br />
Companies have long handled customer interactions with touch tone IVR systems or live<br />
agents. Yet most customers become frustrated with ineffective DTMF interfaces, or hang up while<br />
holding for a live agent. To support customer interactions more quickly and efficiently,<br />
companies began to request speech recognition interfaces.<br />
This movement towards speech provided the speech recognition industry with tremendous growth<br />
potential; however, many companies consider speech to be in the early adopter stage of the market.<br />
Why is that?<br />
While people have been hearing about speech for decades, only in the past decade or so have<br />
advances in the technology and supporting hardware allowed speech to finally become a viable<br />
option, with most systems performing tasks at over 90 percent accuracy. During this period, many Fortune<br />
500 companies implemented speech recognition, and helped educate consumers on how to interact<br />
with speech applications. These applications have become so advanced and mainstream that<br />
businesses, both large and small, now turn to speech solutions for everything from basic<br />
auto-attendants to more complex order-taking systems.<br />
Vendor Selection Tips<br />
Select a partner with the technological and business<br />
expertise that best suits your company and future<br />
projects. Ensure that the partner you choose will<br />
provide all of the services and products you'll<br />
need to be successful.<br />
Search for tools that allow for every change and<br />
adjustment to be automatically and rigorously<br />
tested, with actual historical call data.<br />
Include a tool in your development process that<br />
verifies any dead ends or unfinished work.<br />
Choose technologies that best fit with your application<br />
requirements.<br />
Select partners with deployment experience.<br />
Verify the level of technical support provided, and ensure that<br />
you will receive the support you need.<br />
Speech recognition offers a great solution for large and small businesses: it simplifies customer<br />
interactions, increases efficiency, and reduces operating costs.<br />
Analysts at Cahners In-Stat and Giga report that calls handled by live agents can have an<br />
average cost-per-call of $2 to over $15. With speech recognition, the average cost-per-call can<br />
be cut to $0.20 or less.<br />
<strong>LumenVox</strong>’s corporate and product<br />
strategy is right in sync with us.<br />
Vern Baker<br />
President of enGenic Corporation<br />
Since this technology appeals to anyone who needs or wants a hands-free approach to computing<br />
tasks, it is becoming a standard software option. At <strong>LumenVox</strong>, we focus on developing tools that<br />
will empower users to build, customize, maintain, and refine their own applications.<br />
Speech recognition system development is still something that requires time to prepare and<br />
monitor. With tools like <strong>LumenVox</strong>'s Speech Platform and Speech Tuner, we are continually working<br />
to simplify the aspects of speech application development⎯to help your business get the most out<br />
of speech recognition.<br />
Effective Design = Customer Satisfaction<br />
Speech recognition applications come in many varieties, from simple<br />
routers to complex ordering systems. What designers must remember is<br />
ease-of-use. Even with a complex system, callers must be able to<br />
navigate through the system easily.<br />
Speech applications allow customers to accomplish their goals quickly and easily. Much of the<br />
internal work is in the design phase: building the call flow, creating grammars, recording prompts,<br />
and conducting usability testing. Speech application designers will modify each aspect throughout<br />
design and internal testing phases.<br />
But with all the speech applications on the market today, and most prominent speech companies<br />
boasting recognition accuracy in the high 90-percent range, why do so many people feel that "speech<br />
doesn't work"?<br />
Usually, it's because callers don't know what to say.<br />
You can avoid common problems by carrying<br />
out the following steps during the design phase:<br />
1 Research Needs and Create Initial Design.<br />
First, speak to the people who currently answer phone calls, and get their<br />
input. What current questions and interactions could potentially be automated?<br />
If the company already uses a DTMF application, how well does DTMF handle<br />
these interactions and questions? Not all interactions match well with speech's<br />
capabilities, so initial research is critical.<br />
Next, sketch out a potential progression of the call flow, and share this with<br />
others to make sure the progression makes sense and that callers can quickly<br />
and easily navigate the system.<br />
2 Develop Prompts and Grammars.<br />
Designers should decide how much of a "natural language" system callers need<br />
or desire. "How may I help you?" only works if callers know precisely what<br />
they want, and the designer can accurately predict their responses. Generally,<br />
callers will need some guidance and cues as to what to say. A "How may I<br />
help you?" question involves more extensive grammar development and testing<br />
than a question like: "We offer three choices, A, B, or C. Which would you<br />
prefer?"<br />
The customer service that we have<br />
received from the <strong>LumenVox</strong> team has<br />
always been top of the line and we are<br />
looking forward to continuing our<br />
partnership for many years in the<br />
future.<br />
The application developer must keep the system as conversational as possible,<br />
but prevent callers from treating the machine as a human. When callers think<br />
the system "actually" understands, or think that it possesses a greater<br />
vocabulary than it really does, they get lost, or make requests that are outside<br />
the system's capabilities. These problems rapidly compound, resulting in caller<br />
frustration and dissatisfaction. Effective, intentional, clearly designed prompts<br />
and grammars help keep customers satisfied while managing cost.<br />
3 Test with Real Customers.<br />
The ultimate measure of an application is the first live deployment of the<br />
system. The first live deployment must be a test version with actual users, not<br />
the programmers who are intimately familiar with the application's design.<br />
This will be the first time that assumptions about user behavior will be<br />
seriously tested; the resulting data will allow designers to modify the<br />
application to meet the caller's needs.<br />
4 Tune with Real Data.<br />
Test deployments permit designers to fine-tune the system, often resulting in<br />
significant changes, if they review actual caller experiences. By refining<br />
prompts, grammars, and call flow design, the application will become more<br />
robust, error-free, clear, and effective: in short, an application that customers<br />
will want to use.<br />
To tune the application effectively, all of the components of initial<br />
design (prompts, grammars, call flow, and the persona of the call<br />
system) must be tested. Since these elements are often built separately,<br />
developers must ensure that all of the parts combine effectively in the testing<br />
phase to achieve the desired effects. Properly tuning the speech application<br />
involves a thorough assessment of initial design components and real caller<br />
interactions.<br />
Error Handling<br />
Make your speech system more efficient and usable by optimizing error<br />
handling.<br />
Focus on the basics: Is the system accurate? Does the caller achieve call completion? Does the<br />
caller like using the system, and want to continue doing business with your company?<br />
An optimal application addresses these issues by combining technology with art. Mixing technical<br />
aspects, like programming and testing, with aesthetic elements, like writing, casting, and coaching,<br />
reduces errors and increases customer satisfaction. Give sufficient consideration to each part of<br />
the development process.<br />
Understand the speech recognizer that you are using⎯the confidence scores it returns allow you to<br />
make good decisions about the call flow. Track confidence scores at the project, grammar, and<br />
single call levels, to set both static and dynamic thresholds. This will permit the system to make a<br />
good decision on whether or not to confirm.<br />
Remember that it is better to confirm than to make a mistake. Although confirming can be<br />
unpleasant for callers, it is preferable to the frustration of being lost. Figure out when you need to<br />
confirm using the confidence scores, and try to make the confirmation prompts less complex than<br />
the original prompt. If you choose your confirmations wisely, even though it takes a little longer,<br />
users will not become irritated or impatient, and will get to where they need to be.<br />
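The threshold-driven confirmation decision described above can be sketched in a few lines. The threshold values and the function name here are illustrative assumptions for the purpose of explanation, not part of any LumenVox API:

```python
# Illustrative confirmation logic driven by ASR confidence scores.
# The threshold values are hypothetical; real values come from tuning
# against your own call data, per grammar and per prompt.

ACCEPT_THRESHOLD = 0.80   # above this, accept the result outright
CONFIRM_THRESHOLD = 0.45  # between the thresholds, ask the caller to confirm

def next_action(confidence: float) -> str:
    """Decide what the dialog should do with an ASR result."""
    if confidence >= ACCEPT_THRESHOLD:
        return "accept"      # high confidence: proceed without confirming
    if confidence >= CONFIRM_THRESHOLD:
        return "confirm"     # ambiguous: better to confirm than to err
    return "reprompt"        # low confidence: treat as a no-match and re-ask

print(next_action(0.91))  # accept
print(next_action(0.60))  # confirm
print(next_action(0.20))  # reprompt
```

In practice these static thresholds would be supplemented with dynamic adjustments as the application collects results at the project, grammar, and single-call levels.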
Help Callers Avoid Frustration…<br />
Errors occur because callers get lost⎯or are unsure of what to say. In either case, this<br />
is usually a prompt issue. Effective prompt writing guides callers to say what is in the<br />
grammar. Another option is to make the grammar robust enough to handle reasonable<br />
requests, even when the caller's phrasing is clumsy.<br />
No matter the style, the prompt should always focus on the task at hand: moving the<br />
call forward.<br />
Sequential prompts should be connected with transitions, e.g., first, next, finally.<br />
Prompts can also use related questions, as in "Please tell me your account<br />
number…And your pin code."<br />
Although long prompts should generally be avoided, sometimes a few extra words pay<br />
off. Phrases like "Just to confirm," "Almost finished," and "So I can help you better"<br />
create a forward mental model…in these cases, reassuring the caller is more beneficial<br />
than the few seconds you gain by clipping the prompt.<br />
Whatever your company's style or needs, <strong>LumenVox</strong> can help.<br />
Solving Problems<br />
When an error happens, fix it! And don't let it happen<br />
again.<br />
Sounds simple enough, but how? And what will<br />
customers do in the meantime?<br />
Figuring out why an error happens is the key to fixing<br />
it. The feedback callers provide is often vague;<br />
instead, go straight for empirical data. Examine the<br />
context of the call to help pinpoint what caused the<br />
error: too much background noise; not enough<br />
volume; mispronunciation. Invariably, errors will<br />
occur. To provide great customer service through<br />
speech applications, we need to minimize errors, and<br />
make the caller as comfortable as possible when errors<br />
do occur.<br />
Error-handling interfaces often increase caller<br />
frustration. Recognize any of these?<br />
Adversarial error responses:<br />
"I need you to be more specific."<br />
Generic responses:<br />
"I'm sorry, I didn't understand."<br />
Annoyingly enthusiastic responses:<br />
"Let's try it again!"<br />
When developing speech applications, most companies strive to<br />
achieve a balance between saving money on live agents and<br />
providing better service than traditional DTMF systems.<br />
Unfortunately, callers are sometimes uncomfortable with speech<br />
applications⎯they try to talk as they would to a human, or worse,<br />
speak in a stilted way, because they think computers will better<br />
process their requests. Successful experiences with your call<br />
system will help increase caller confidence⎯and when errors<br />
occur, callers will be more patient if they have had positive<br />
experiences in the past.<br />
Improving the caller experience is about good service.<br />
Thinking carefully about the error interface, designing<br />
effective prompts, testing the call system often, and<br />
examining the context of errors can help improve<br />
the caller's satisfaction with the speech<br />
application…and with your business.<br />
We are delighted that an industry<br />
leader like <strong>LumenVox</strong> has met the market<br />
demand with a product specialized for our<br />
TeleVantage platform.<br />
Rob Black, Product Marketing Manager<br />
of Vertical (formerly Artisoft)<br />
Voice Matters<br />
The voice of your speech application is your company's<br />
first representative. Choose the voice wisely. You're not<br />
just looking for a type of voice; you're looking for an<br />
emissary. The voice needs to be able to explain, inspire,<br />
soothe, excite, and above all else, sound sincere.<br />
Assuming the Voice User Interface (VUI) design is good, the prompts must be<br />
recorded to best represent that design. The four essential pieces for great<br />
prompt recording include:<br />
Great Casting<br />
Great Directing<br />
Great Concatenated Recording<br />
Great "Voice"<br />
Think about your desired call experience⎯and consider using professional<br />
voice talent rather than your receptionist. Professionals will provide<br />
better sounding prompts: they will know appropriate variations in<br />
pitch, rhythm, speed, duration of pauses, and elongation of words.<br />
In addition, professionals will avoid novice errors like wobble,<br />
nervous tics, sloppy diction, and colloquial pronunciations. Most<br />
importantly, talent is directable; professionals can respond to<br />
instructions regarding persona ideas and desired inflections, to<br />
create the proper tone for the application.<br />
Voice Talents versus Voice Actors<br />
Voice Talents are people who speak well, with good resonance<br />
and intonation. Voice Actors are people who are Voice Talents,<br />
but through the use of their voices alone, can also convey<br />
character, humor, sincerity, and meaning. Voice actors can take<br />
your application to the next level, providing a better experience<br />
for your callers.<br />
Voice Actors do not necessarily over-articulate everything. They<br />
stress the most important words and concepts, which often<br />
corresponds to what the system is trying to recognize. Voice Actors will<br />
also know how to record concatenated prompts with consistency. If<br />
feasible, using a Voice Actor will give your speech application polish⎯and<br />
this polish, combined with a well-designed prompt and grammar, will lead to<br />
more satisfied customers.<br />
Prompt Tips<br />
Never disguise the system as being a real person.<br />
Ensure that prompts elicit a predictable response.<br />
Offer the needed information at the right time,<br />
and try not to frontload the application with too<br />
much information.<br />
Never use "Press or Say 1" style prompts.<br />
List options from specific to general⎯so that<br />
users do not choose a general category when a<br />
more specific one will be offered later.<br />
Allow more time for difficult tasks: the pace of<br />
the recorded prompts dictates the pace at which<br />
the callers will respond.<br />
Keep the caller in the transaction by saying<br />
"first…next…finally" or similar transition words, in<br />
the appropriate places.<br />
Insert pauses in your prompts to allow<br />
experienced callers a turn-taking option to<br />
interrupt and move forward in the application.<br />
Provide audio rewards at the completion of<br />
difficult transactions, such as "great, wonderful,<br />
excellent..."<br />
<strong>LumenVox</strong>'s software proved to be<br />
very effective in precisely interpreting<br />
callers’ speech patterns.<br />
Chris Riggenbach, CXM Product<br />
Development Manager<br />
Alternate Pronunciations<br />
One of the most useful speech applications today is the front-end call<br />
router; however, it's also one of the most challenging applications<br />
because the system must recognize names.<br />
Sometimes names are derived from languages other than English, and the<br />
pronunciation reflects rules from the other language. Often these names<br />
contain sounds that are not apparent from the spelling, or the caller<br />
stresses a syllable that differs from the common pronunciation.<br />
Imagine a person looking at the name "Elicia." Is the initial sound a<br />
soft 'AX' as in "about", an 'EH' as in "bed", or is it stressed heavily<br />
with a long 'IY' as in "equal"? Is the third syllable pronounced with<br />
an 'S' sound or a 'SH'? The speech application needs help to<br />
determine this.<br />
If a word or name is not in the dictionary, the Speech Recognition<br />
Engine will try to figure out how that word is pronounced using a set<br />
of phonetic rules, similar to how a person might try sounding out the<br />
new word. Unfortunately, the Speech Engine is not always correct. A<br />
good rule of thumb is that if a person has trouble figuring out how to<br />
pronounce a name, the speech engine will, too.<br />
Steps for Developing a Smooth Call Router:<br />
1<br />
Figure out who the incoming callers are. Are they strangers, people who know the employees, or<br />
a combination of both? In other words, will the callers know how to pronounce the name correctly,<br />
or will other likely pronunciations need to be added? Do the callers refer to employees by first or<br />
last name only, and are they familiar enough to know people's nicknames?<br />
2<br />
Find out what the Speech Engine thinks is the correct pronunciation, or whether other<br />
pronunciations are needed. You can use the Phonetic Speller located in the Speech Platform to<br />
see how the Speech Engine determines the pronunciations of the names, without having to run the<br />
Speech Engine itself.<br />
3<br />
Add the new pronunciations for names into the grammar. Here's an illustration of this process:<br />
The name "Paty" (pronounced "Patty") is not a common spelling and is not in the Speech Engine's<br />
dictionary. When typing it into the Phonetic Speller, the system returns 'P EY DX IY' which sounds<br />
something like "Paydee", instead of the correct 'P AE DX IY'. To add a new pronunciation, the<br />
phonemes must be entered within a set of curly braces. Adding a colon followed by the true<br />
spelling of the name helps readability, so it's a good idea to include it. The final entry of {P AE DX<br />
IY: Paty} as an alternative pronunciation should help the system's performance, since now callers<br />
who say "Patty" will be more likely to be recognized, instead of the incorrect {P EY DX IY}.<br />
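Based on the curly-brace notation described above, a directory grammar entry might look like the following sketch. Only the {P AE DX IY: Paty} form is taken from the text; the surrounding ABNF-style wrapper, rule names, and employee names are illustrative assumptions, not exact LumenVox syntax:

```abnf
#ABNF 1.0;
language en-US;
root $employee;

// Hypothetical name list: "Paty" gets an explicit phonetic entry so that
// callers who say "Patty" are recognized, instead of relying on the
// engine's automatic (and incorrect) "P EY DX IY" guess.
$employee = jack smith
          | {P AE DX IY: Paty} jones;
```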
Phonemes<br />
The unit of sound the recognition engine actually recognizes is the phoneme. All phrase formats<br />
are ultimately translated into phonetic spellings for decoding. These phonetic spellings can be<br />
directly entered if surrounded by curly braces.<br />
The phonetic alphabet used by the American English language model is below.<br />
Phoneme   Example 1   Phonetic Spelling 1   Example 2   Phonetic Spelling 2<br />
Vowels<br />
AA        barn        B AA R N              top         T AA P<br />
AE        bat         B AE T                crab        K R AE B<br />
AH        what        W AH T                cut         K AH T<br />
AO        more        M AO R                auto        AO T OW<br />
AW        cow         K AW                  house       HH AW S<br />
AX        about       AX B AW T             dial        D AY AX L<br />
AXR       butter      B AH DX AXR           career      K AXR IH R<br />
AY        type        T AY P                life        L AY F<br />
EH        check       CH EH K               mess        M EH S<br />
ER        church      CH ER CH              bird        B ER D<br />
EY        take        T EY K                hail        HH EY L<br />
IH        little      L IH DX AX L          rib         R IH B<br />
IX        action      AE K SH IX N          women       W IH M IX N<br />
IY        team        T IY M                keep        K IY P<br />
OW        loan        L OW N                robe        R OW B<br />
OY        hoist       HH OY S T             joy         JH OY<br />
UH        book        B UH K                look        L UH K<br />
UW        flew        F L UW                who         HH UW<br />
Consonants<br />
B         web         W EH B                bear        B EH R<br />
CH        chair       CH EY R               statue      S T AE CH UW<br />
D         reed        R IY D                dark        D AA R K<br />
DH        with        W IH DH               other       AH DH ER<br />
DX        forty       F AO R DX IY          butter      B AH DX AXR<br />
F         four        F AO R                graph       G R AE F<br />
G         peg         P EH G                exam        IH G Z AE M<br />
HH        halt        HH AO L T             Jose        HH OW Z EY<br />
JH        cage        K EY JH               Jack        JH AE K<br />
K         coin        K OY N                back        B AE K<br />
L         late        L EY T                really      R IH L IY<br />
M         lemon       L EH M AH N           mail        M EY L<br />
N         night       N AY T                any         EH N IY<br />
NG        ring        R IH NG               ankle       AE NG K AH L<br />
P         pay         P EY                  beep        B IY P<br />
R         rest        R EH S T              prior       P R AY ER<br />
S         sit         S IH T                bass        B AE S<br />
SH        blush       B L AH SH             sure        SH UH R<br />
T         raft        R AE F T              taped       T EY P T<br />
TH        three       TH R IY               youth       Y UW TH<br />
V         van         V AE N                river       R IH V AXR<br />
W         swap        S W AA P              wing        W IH NG<br />
Y         yes         Y EH S                year        Y IY R<br />
Z         arms        AA R M Z              blaze       B L EY Z<br />
ZH        Asian       EY ZH AH N            genre       ZH AA N R AH<br />
Considering alternate pronunciations and spellings at the outset will help avoid errors and frustration later!<br />
Practical Guide To Tuning<br />
Untuned speech applications do not survive contact with customers:<br />
whether your company has live speech applications in deployment<br />
today, plans to implement one within the next three to six months, or is<br />
only beginning to consider adding speech applications, you should<br />
consider the importance of tuning. Tuning uses prompts, grammars,<br />
call flow, and caller data to improve the speech application as a whole.<br />
There are three ideas to keep in mind when approaching<br />
the tuning task:<br />
1<br />
Tuning Takes Time.<br />
Even the best of "best-practices" build on assumptions that might not hold<br />
true after deployment⎯once you have callers, you must often readjust or<br />
remove these assumptions to provide the quality experience callers expect.<br />
To give an idea of how much time tuning can take, the speech industry<br />
estimates that 40-50% of total development and deployment time should<br />
be spent on the tuning process. Putting emphasis on tuning will help your<br />
application run more smoothly, keeping callers happy.<br />
2<br />
Adapt the System to the Caller.<br />
In general, you will not be able to make users do anything in any particular way.<br />
You can, and should, give as much guidance for callers as possible, but<br />
ultimately the caller dictates the conversation. The trick is to provide good cues<br />
and guidelines, so callers choose the pathway you designed for the application.<br />
Remember that if the system fails to meet the caller's needs, it's not the caller<br />
who has failed; it's the speech application.<br />
3<br />
Start with Small Changes.<br />
It is all too easy to get caught up in the moment, expending hours of effort on a<br />
seemingly enormous problem that really only affects a few out of several hundred<br />
callers. Identify the issues that are the easiest to resolve and provide the biggest<br />
benefit. Making small changes to improve the experience for most callers is<br />
preferable to costly changes that only benefit a few.<br />
Instead, try this process when tuning an application:<br />
1<br />
Familiarize Yourself with the Caller's Experiences.<br />
Do this by listening to the calls, from start to finish. Compare<br />
the ASR results with respect to the audio prompts and the<br />
caller's speech. Transcribe the audio, so you can analyze<br />
the accuracy and performance.<br />
Use your ASR platform's reporting and analytical tools<br />
to maximize your information. You can even use<br />
<strong>LumenVox</strong> Speech Tuner on Nuance's 8.5 or<br />
ScanSoft's OSR.<br />
Above all, identify the key issues and prioritize them.<br />
Solve the easiest dilemmas first, like typical grammar<br />
problems. Then, move to prompt and dialogue<br />
changes, and finally proceed to acoustic model<br />
training and adaptations.<br />
2<br />
Test Changes Rigorously.<br />
When you make a change, you must test it. You did<br />
the transcripts, and so you have the grammar and<br />
audio data: as much as possible, test under 'real'<br />
conditions. Give yourself the assurance that any<br />
change will help, and then test to find what solution<br />
works best.<br />
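The comparison of ASR results against transcriptions in step 1 can be sketched as a word-level accuracy check. This is an illustrative computation only; tools such as the Speech Tuner report these figures for you:

```python
# Illustrative word-accuracy check: compare an ASR result against a
# human transcription using word-level edit distance.

def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum substitutions + insertions + deletions between word lists."""
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)]

def word_error_rate(reference: str, hypothesis: str) -> float:
    return word_errors(reference, hypothesis) / max(len(reference.split()), 1)

# One substitution ("savings" heard as "checking") out of four reference words:
print(word_error_rate("transfer to savings account",
                      "transfer to checking account"))  # 0.25
```

Aggregating this rate across transcribed calls shows where accuracy is low, which is where grammar and prompt tuning should start.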
What you shouldn't do when tuning a speech application:<br />
Don’t Make Changes<br />
Based on One Instance.<br />
This should be fairly obvious, but it still<br />
happens. Making changes based on a single<br />
instance usually results in fixing a problem that<br />
doesn't really exist. There are numerous<br />
'one-off' errors in speech recognition, many of<br />
which are associated with noise, or transient<br />
effects that won't be generally reproducible.<br />
Real issues will arise multiple times, in multiple<br />
places, with plenty of evidence to help you<br />
decide how to solve them.<br />
Don’t Make Changes Based on<br />
Unanalyzed Reports.<br />
Treat the report with respect: analyze the call,<br />
compare it with other calls, see what really<br />
happened⎯often, the system worked as<br />
designed, but the design was flawed. Research<br />
the problem carefully so that you avoid<br />
unnecessary (and costly) changes.<br />
The <strong>LumenVox</strong> support team is<br />
always available and always willing to go<br />
the extra mile to provide us with<br />
excellent support.<br />
Derek Henry<br />
CEO of 1-800-US-LOTTO<br />
Corporation<br />
Tuning Grammars<br />
There are many places to make effective changes, but generally, we have<br />
found grammars to be the easiest and most effective place to start. In<br />
this segment, we will look at how to detect errors and modify grammars<br />
to optimize performance.<br />
Grammar Terms<br />
In-Grammar (IG) and Out-of-Grammar (OOG) are labels that indicate whether the ASR can match<br />
a path in the grammar with what the caller actually said. If it can, the spoken words are<br />
In-Grammar; if not, the spoken words are considered Out-of-Grammar.<br />
Confidence scores indicate the ASR's certainty about the answer it returns.<br />
Confirmations are dialog techniques to help the speech application avoid making a mistake, in<br />
cases where the results are ambiguous.<br />
Substitutions are a particular kind of error the ASR makes; this occurs when the result from the<br />
ASR does not match the words the caller said.<br />
<strong>LumenVox</strong>’s development and support<br />
staff has been very responsive to our<br />
requirements and issues.<br />
Doug Behl, President of Malibu Software<br />
Out-Of-Grammar Indicators<br />
There are a number of ways to determine whether or not the error is due to an Out-of-Grammar<br />
issue. The easiest, most efficient way to discover this is to use the tools provided by the platform<br />
or ASR provider. Pre-configured reports will usually highlight these issues up front.<br />
If Out-of-Grammar is a big problem, you will likely receive many customer complaints and low<br />
completion and usage rates. Call logs will also have many "No Matches" or empty results. Finally,<br />
when you do obtain results, the confidence score will be significantly lower than the rest of the<br />
application.<br />
As the call logs are transcribed, look for low accuracies. ASRs will usually recognize anything that is<br />
In-Grammar, so if accuracy is still low, look for a large amount of Out-of-Grammar speech.<br />
The other good indicator is a high Out-of-Grammar rate. Generally, this is a direct indication that<br />
callers are not saying things your grammar understands.<br />
There are a few reasons for Out-of-Grammar problems, most of which are easy to resolve.<br />
One common reason is that the grammar designer simply forgot to add items, but callers are asking<br />
for them. Leaving "next" out of a navigation grammar, or forgetting to add a product name is not<br />
uncommon.<br />
Another common error is forgetting common synonyms, for example, 'copier', but not 'Xerox', or<br />
different dialectal versions such as 'soda' in the West, and 'pop' in the South. Usually, you can just<br />
add the missing items.<br />
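The indicators above can be checked mechanically once calls are transcribed. The record format below is a hypothetical illustration for explanation, not a LumenVox log format:

```python
# Illustrative analysis of transcribed call-log records: estimate the
# Out-of-Grammar (no-match) rate and the average confidence of results.

calls = [
    {"transcript": "checking", "result": "checking", "confidence": 0.92},
    {"transcript": "savings", "result": "savings", "confidence": 0.88},
    {"transcript": "um the next one please", "result": None, "confidence": None},  # no match
    {"transcript": "xerox", "result": None, "confidence": None},  # missing synonym
]

# Empty results suggest the caller's speech was Out-of-Grammar.
no_match = [c for c in calls if c["result"] is None]
oog_rate = len(no_match) / len(calls)

scores = [c["confidence"] for c in calls if c["confidence"] is not None]
avg_confidence = sum(scores) / len(scores)

print(f"OOG/no-match rate: {oog_rate:.0%}")         # 50%
print(f"Average confidence: {avg_confidence:.2f}")  # 0.90

# A high no-match rate, combined with complaints and low completion,
# usually means the grammar is missing items or synonyms callers say.
```

Reviewing the transcripts behind the no-match records shows exactly which missing phrases or synonyms to add to the grammar.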
In-Grammar Indicators<br />
Typical In-Grammar issues will often be oriented towards improving the confidence scores of<br />
recognition, but you will also confront misrecognitions.<br />
What kinds of issues arise frequently?<br />
Regularly confused phrases are fairly common, and often result because two or more phrases sound<br />
quite similar. Another common issue is the result of bad pronunciations in the grammar. ASRs<br />
provide a methodology for arriving at pronunciations for words that aren't in the dictionary, but the<br />
automatic pronunciations are not always the best.<br />
So, how can we handle this?<br />
For regularly confused phrases, differentiate them by choosing alternate ways to describe the<br />
words. For bad pronunciations, you must add the pronunciations to the dictionary or grammar.<br />
There are tools for helping with this task, although they are nearly always ASR-specific.<br />
Failures and Fixes: Common<br />
Prompt Tuning Issues<br />
Effective prompt design takes time and practice⎯some errors will not<br />
present themselves until the prompts are tested. There are, however,<br />
some common issues that arise when tuning prompts, which should<br />
help streamline your prompt design and tuning process.<br />
Here's what you should do when callers…<br />
…Give Long, Perplexing Answers<br />
Callers consistently give full-sentence answers instead of short, to-the-point answers. Typically,<br />
this occurs because the prompt asks a very open-ended question, such as "How may I help<br />
you?" Avoid these open-ended prompts; callers usually do not know what responses are<br />
appropriate at particular points in the call. The only real solution when this error occurs is to<br />
redesign the prompt to be more specific, or redesign the interaction to focus caller responses<br />
on specific tasks.<br />
…Answer with Out-of-Grammar (OOG) Phrases<br />
Callers regularly use a particular phrase that is not in the grammar. Prompts are designed to<br />
elicit particular pieces of information from the caller. Because of this, the prompts usually try to<br />
lead the caller to using the correct words or phrases to minimize recognition errors and caller<br />
confusion. When callers regularly use Out-of-Grammar phrases, it's usually because the prompt<br />
leads them to the wrong phrases. Two choices are available: include the Out-of-Grammar<br />
phrases, or revise the prompt to more obviously reflect available options.<br />
…Answer Randomly, or 'Hunt' for the Right Phrase<br />
Unclear, incomplete prompts force the caller to search unnecessarily for the correct response.<br />
Adding clarifying information will generally fix this problem.<br />
Transcription and Training<br />
Humans are exceptionally good at speech processing.<br />
We handle a variety of accents, speaking styles, pitch<br />
differences, noise, and more, with a high degree of<br />
accuracy. No two speakers say the same word exactly the<br />
same way. Needless to say, this represents a considerable<br />
challenge for automated speech recognizers. Accurate<br />
transcription and tuning is essential.<br />
Every speech recognizer uses a statistical model of speech in<br />
order to perform good recognition. These models are built<br />
during training, where speech audio and text transcriptions are<br />
combined with algorithms that 'learn' how speech sounds. The<br />
models attempt to determine what 'average' speakers sound<br />
like when they speak particular words, and apply that<br />
knowledge to new incoming speech to determine what words<br />
were spoken.<br />
Words and speaking styles are different for every<br />
application domain (i.e., the vocabulary for a travel<br />
system is quite different from that of a financial<br />
application). Speech applications benefit from<br />
acoustic models specially trained with data<br />
from their specific domains.<br />
Transcribing audio data must be exact, word<br />
for word, and include noise tags so the<br />
system can learn the differences between<br />
noise and speech. To do this, the data must<br />
include as many speakers (both male and<br />
female) as possible, so that the new acoustic<br />
models accurately reflect the average speaker, and<br />
not just one or two particular speakers.<br />
With a larger volume of transcribed audio data,<br />
new models will perform better. New acoustic<br />
models will likely require a new round of<br />
tuning, particularly with respect to<br />
confirmation thresholds.<br />
…Answer 'Yes' or 'No,' Instead of Expected Content<br />
If callers respond 'yes' or 'no' when the prompt requests a content word, a poorly designed<br />
prompt is responsible. For example, the prompt might ask a question like "Do you want..."<br />
or "Would you like..." and then pause after the first choice. The pause can be long enough to<br />
make the caller believe the desired answer is yes or no, rather than a list of choices.<br />
Similarly, multi-item lists may pause too long at later points, as in, "Would you like<br />
pizza, soft drinks, or side items?" Generally, we recommend that<br />
you reserve "Do" and "Would" lead-offs for questions that require yes or no answers.<br />
<strong>LumenVox</strong> support is very fast and<br />
accessible.<br />
Kelly Lumpkin, CEO of Alternate Access<br />
Standards and Systems Supported<br />
Industry Standards<br />
VXML/VoiceXML (Voice eXtensible Markup<br />
Language)<br />
SALT (Speech Application Language Tags)<br />
MRCP (Media Resource Control Protocol)<br />
SRGS (Speech Recognition Grammar<br />
Specification)<br />
SAPI 5 (Microsoft SAPI TTS)<br />
Operating Systems Supported<br />
Linux<br />
Windows NT, 2000, XP, 2003<br />
Telephony Standards<br />
PSTN (Public Switched Telephone<br />
Network)<br />
SIP (Session Initiation Protocol) -<br />
Signaling protocol for Internet<br />
conferencing, telephony, and instant<br />
messaging<br />
VoIP (Voice Over Internet Protocol) -<br />
Protocol to send audio and data<br />
information in digital form<br />
DM3 - Intel® series of boards<br />
HMP (Host Media Processing) -<br />
Software that performs media<br />
processing tasks based on Intel®<br />
architecture<br />
Global Call - Protocol for handling the<br />
call control interface for Intel® cards<br />
<strong>LumenVox</strong> allows us to have control<br />
over the development of call flow,<br />
grammars and tuning, while keeping the<br />
costs at a respectable level.<br />
Brian Lauzon<br />
President of TelASK<br />
University Research<br />
Carnegie Mellon University<br />
Performs research for all aspects of speech<br />
recognition, including signal processing,<br />
acoustic model training, language model<br />
training, decoding, spoken language parsing<br />
and interface building.<br />
University of Colorado, Boulder<br />
Focused on research and education in areas<br />
of human communication technology.<br />
Oregon University<br />
Center for Spoken Language Synthesis,<br />
Recognition and Enhancement.<br />
Stanford University<br />
The Center for the Study of Language and<br />
Information (CSLI) is an Independent<br />
Research Center founded in 1983 by<br />
researchers from Stanford University,<br />
SRI International, and Xerox PARC.<br />
UC Berkeley<br />
The major application area researched in<br />
the Speech Group at ICSI is speech<br />
recognition, although some of this work has<br />
led to basic research in auditory processing.<br />
M.I.T.<br />
Computer Science and Artificial Intelligence<br />
Laboratory that focuses on Spoken<br />
Language Systems.<br />
Other Speech Groups<br />
Carnegie Mellon Speech Group<br />
Develops user interfaces that improve<br />
human-computer and human-human<br />
communication.<br />
IEICE<br />
The Institute of Electronics, Information and<br />
Communication Engineers aims at the<br />
investigation and exchange of knowledge on<br />
the science and technology of electronics,<br />
information, and communications.<br />
Speech Publications<br />
Speech Technology Magazine<br />
Leading source of information promoting the<br />
speech technology solutions that are<br />
changing communications, and reports the<br />
technology needs of organizations<br />
worldwide.<br />
Telephone Strategy News<br />
Newsletter that includes full coverage of the<br />
impact of the Voice User Interface on<br />
telephony.<br />
ASRNews<br />
Monthly newsletter, which tracks the latest<br />
development in the speech recognition and<br />
text-to-speech marketplace.<br />
Call Center And Telephony<br />
Technology Marketing<br />
Corporation<br />
TMC publishes industry-leading print<br />
magazines including Internet Telephony,<br />
Customer Inter@ction Solutions, and<br />
Communications Solutions.<br />
CMP Media<br />
Leading multimedia company that prints<br />
numerous magazines, including Call<br />
Center and Communications Convergence.<br />
Business Communications<br />
Review Magazine<br />
Magazine for the enterprise network<br />
manager and other communications<br />
professionals.
Standards and Systems Supported<br />
Industry Standards<br />
VXML/VoiceXML (Voice eXtensible Markup<br />
Language)<br />
SALT (Speech Application Language Tags)<br />
MRCP (Media Resource Control Protocol)<br />
SRGS (Speech Recognition Grammar<br />
Specification)<br />
SAPI 5 (Microsoft SAPI TTS)<br />
Operating Systems Supported<br />
Linux<br />
Windows NT, 2000, XP, 2003<br />
Telephony Standards<br />
PSTN (Public Switched Telephone<br />
Network)<br />
SIP (Session Initiation Protocol) -<br />
Signaling protocol for Internet<br />
conferencing, telephony, and instant<br />
messaging<br />
VoIP (Voice Over Internet Protocol) -<br />
Protocol to send audio and data<br />
information in digital<br />
DM3 - Intel® series of boards<br />
HMP (Host Media Processing) -<br />
Software that performs media<br />
processing tasks based on Intel®<br />
architecture<br />
Global Call - Protocol for handling the<br />
call control interface for Intel® cards<br />
“<strong>LumenVox</strong> allows us to have control<br />
over the development of call flow,<br />
grammars and tuning, while keeping the<br />
costs at a respectable level.”<br />
- Brian Lauzon<br />
President of TelASK<br />
University Research<br />
Carnegie Mellon University<br />
Performs research for all aspects of speech<br />
recognition, including signal processing,<br />
acoustic model training, language model<br />
training, decoding, spoken language parsing<br />
and interface building.<br />
University of Colorado, Boulder<br />
Focused on research and education in areas<br />
of human communication technology.<br />
Oregon University<br />
Center for Spoken Language: synthesis,<br />
recognition, and enhancement.<br />
Stanford University<br />
The Center for the Study of Language and<br />
Information (CSLI) is an independent<br />
research center founded in 1983 by<br />
researchers from Stanford University,<br />
SRI International, and Xerox PARC.<br />
UC Berkeley<br />
The major application area researched in<br />
the Speech Group at ICSI is speech<br />
recognition, although some of this work has<br />
led to basic research in auditory processing.<br />
M.I.T.<br />
Computer Science and Artificial Intelligence<br />
Laboratory that focuses on Spoken<br />
Language Systems.<br />
Other Speech Groups<br />
Carnegie Mellon Speech Group<br />
Develops user interfaces that improve<br />
human-computer and human-human<br />
communication.<br />
IEICE<br />
The Institute of Electronics, Information and<br />
Communication Engineers aims at the<br />
investigation and exchange of knowledge on<br />
the science and technology of electronics,<br />
information, and communications.<br />
Speech Publications<br />
Speech Technology Magazine<br />
Leading source of information promoting the<br />
speech technology solutions that are<br />
changing communications and reporting the<br />
technology needs of organizations<br />
worldwide.<br />
Telephone Strategy News<br />
Newsletter that includes full coverage of the<br />
impact of the Voice User Interface on<br />
telephony.<br />
ASRNews<br />
Monthly newsletter that tracks the latest<br />
developments in the speech recognition and<br />
text-to-speech marketplace.<br />
Call Center and Telephony<br />
Technology Marketing<br />
Corporation<br />
TMC publishes industry-leading print<br />
magazines including Internet Telephony,<br />
Customer Inter@ction Solutions, and<br />
Communications Solutions.<br />
CMP Media<br />
Leading multimedia company that prints<br />
numerous magazines, including Call<br />
Center and Communications Convergence.<br />
Business Communications<br />
Review Magazine<br />
Magazine for the enterprise network<br />
manager and other communications<br />
professionals.