
Table of Contents<br />

About <strong>LumenVox</strong><br />

Why Choose <strong>LumenVox</strong><br />

Discover <strong>LumenVox</strong><br />

Speech Recognition Engine<br />

Speech-enables any application with its flexible API, powering every solution that<br />

<strong>LumenVox</strong> provides.<br />

Speech Platform<br />

Allows you to develop and deploy your speech application: with just a few steps,<br />

your application can go from conception to reality.<br />

Speech Driven Assistant<br />

Integrates seamlessly with Vertical Communications' TeleVantage IP-PBX, permitting<br />

you to speech-enable the name directory, contact list, voicemail, IVR, and email.<br />

Speech Tuner<br />

Maintains and tests existing applications, ensuring that any speech recognition application, including those driven by Nuance and ScanSoft, continues to work well.<br />

<strong>LumenVox</strong> Training<br />

Describes classes and instructors available to help you learn about the speech<br />

industry, application design, and tuning with <strong>LumenVox</strong>’s products.<br />

Application Development Overview<br />

Provides insight to help you develop high-quality and effective applications for your<br />

customer base.<br />

Tuning Guide<br />

Gives a basic overview of the steps required when tuning and improving your speech<br />

applications.<br />



About <strong>LumenVox</strong><br />

<strong>LumenVox</strong> is a speech recognition company with over a decade of<br />

telephony experience. We utilize a business and technology approach<br />

that allows businesses, corporations, resellers, platform providers, and<br />

service providers access to the complex speech recognition industry.<br />

Our revolutionary speech recognition software products have gained<br />

industry recognition, winning over 17 awards for innovation, technical excellence, and users' choice.<br />

Whether your organization wants to quickly and easily speech-enable current applications, or<br />

maintain existing ones, <strong>LumenVox</strong> provides all the necessary tools: our state-of-the-art Speech<br />

Recognition Engine, Speech Platform, Speech Driven Assistant, and Speech Tuner.<br />

Tired of extra costs for adaptations, application updates, and new deployments? At <strong>LumenVox</strong>, we<br />

believe that you know your company's needs best: our tools allow you to develop speech<br />

applications on your own terms, at your company's pace, without service fees for each new update.<br />

With <strong>LumenVox</strong>'s software suite, you remain in control.<br />

<strong>LumenVox</strong> Team<br />

Expert. Dedicated. Innovative.<br />

<strong>LumenVox</strong>'s development team (speech scientists, speech user interface designers, and experienced sales and marketing personnel) has over 30 years of practical experience in the development and integration of speech recognition systems, telephony, database design, and hardware integration.<br />

At <strong>LumenVox</strong>, we know the challenges and opportunities that come with integrating speech into<br />

your application. We offer guidelines, tips, and best practices, and our design team can assist in<br />

any phase of your development cycle. We understand speech.<br />

<strong>LumenVox</strong> is committed to providing the most powerful, flexible, and useful speech products and<br />

services with excellent customer service to clients of any size.<br />

Recent Awards<br />

<strong>LumenVox</strong> offers an impressive suite of<br />

versatile, world-class speech recognition<br />

technologies.<br />

Dr. Danny Lange<br />

CEO of Vocomo Software<br />



Why Choose <strong>LumenVox</strong>?<br />

Rethinking the Business of Speech<br />

Traditionally, speech recognition technology providers keep the<br />

development and maintenance of speech applications under wraps.<br />

Instead, <strong>LumenVox</strong> believes in empowering the users and developers of<br />

our speech recognition technology.<br />

Other technology providers create, deploy, and maintain their own proprietary applications for their clients. The price tag of the final application can run into the millions, due to extensive and ongoing professional service fees. Traditional companies also tier the pricing and functionality of their core speech recognition technology, so the price varies widely with what you need. This limits the development and expansion of speech recognition, preventing smaller businesses from utilizing these products.<br />

<strong>LumenVox</strong> Provides:<br />

Exceptional customer and technical support<br />

Hardware independent Automatic Speech Recognizer<br />

(ASR) with a distributed client-server architecture<br />

that runs on both Windows and Linux<br />

Extensive logging of audio, grammars, results, and<br />

scores, which allow you to recreate every call<br />

State-of-the-art testing and post-deployment tools to<br />

constantly improve your <strong>LumenVox</strong>, Nuance, or<br />

ScanSoft speech application<br />

Support and development of current and emerging<br />

industry standards<br />

We provide the tools and education needed to create applications. We aspire to make speech recognition widely available, so our business model is structured on a single per-port charge, as opposed to tiered pricing models.<br />

At <strong>LumenVox</strong>, we believe in the power of speech recognition to help revolutionize many industries,<br />

including businesses that are currently excluded from using speech by tiered pricing and costly<br />

professional service fees. Contact us to learn more about how speech recognition can work for your<br />

business.<br />

I think your product is the tops, and<br />

I’ve been around the block. The MAJOR<br />

KEY TO ME is that I can use it effectively<br />

right out of the box BUT ALSO create<br />

custom speech-activated (true-IVR)<br />

applications right down to the<br />

TeleVantage API level!<br />

Evan Klayman President of Brainstem<br />



Discover <strong>LumenVox</strong><br />

<strong>LumenVox</strong>’s Products<br />

The speech industry<br />

contains many<br />

different technologies,<br />

platforms, and types<br />

of applications. This<br />

chart enables you to<br />

quickly view the<br />

components of a final<br />

speech application, as<br />

well as the products<br />

<strong>LumenVox</strong> offers.<br />

Tuning<br />

The process of<br />

changing an<br />

application to improve<br />

performance.<br />

Applications<br />

Speech applications<br />

allow callers to access<br />

any database to get<br />

account information,<br />

perform order entry,<br />

take customer surveys,<br />

or check order status.<br />

Speech Tuner<br />

Complete maintenance tool to<br />

tune and test any speech<br />

application using various ASRs<br />

including <strong>LumenVox</strong>, Nuance,<br />

and ScanSoft.<br />

Speech Assistant<br />

Complete solution, currently<br />

for TeleVantage owners to<br />

speech-enable their name<br />

directories, contact lists,<br />

voicemails, IVRs, and emails.<br />

Professional Services<br />

Experts in the speech industry<br />

that provide a variety of<br />

services including speech<br />

application design,<br />

development, and tuning.<br />

Pre-Packaged<br />

Pre-packaged applications can<br />

offer features such as email<br />

and calendar management.<br />

Custom Built<br />

Custom speech applications<br />

can address any vertical or<br />

horizontal market need<br />

where touch-tone applications cannot.<br />

Platform<br />

Defines what type of<br />

applications can be<br />

run, and how those<br />

applications are<br />

allowed to operate.<br />

Speech Platform<br />

Complete platform with a<br />

toolkit to design and deploy<br />

speech applications. The Call<br />

Handler supports any phone<br />

system via analog, digital,<br />

ISDN, or PRI.<br />

VoiceXML<br />

Standards-based speech<br />

application programming<br />

language supported by many<br />

platforms.<br />

Others<br />

Many platforms are available<br />

that use languages, other than<br />

VoiceXML, for creating and<br />

running speech applications.<br />

Core Speech<br />

Technology<br />

These technologies<br />

form the basis for<br />

building any speech<br />

application.<br />

Speech Engine<br />

Automatic Speech Recognizer<br />

(ASR) that supports SRGS,<br />

MRCP, and SISR. Integrated<br />

in various VoiceXML and<br />

proprietary platforms.<br />

ASR<br />

Technology used for<br />

interpreting audio data from<br />

phone, web, microphone, or 2-way radio.<br />

Other Core Technology<br />

Text-to-Speech is used to<br />

produce audio from text. Voice<br />

Verification is used to identify<br />

individual speakers for security<br />

purposes.<br />



Speech Engine<br />

<strong>LumenVox</strong>'s Speech Recognition Engine is a flexible<br />

API that performs speech recognition on audio data<br />

from any audio source.<br />

The Speech Engine is speaker and hardware<br />

independent: it supports SRGS and SISR on both<br />

Windows and Linux platforms.<br />

How the Speech Engine works<br />

The Speech Engine provides speech application developers with an<br />

efficient development and runtime platform, allowing for dynamic<br />

language, grammar, audio format, and logging capabilities to customize<br />

every step of their applications. Grammars are entered as a simple list of<br />

words or pronunciations, or in the industry standard Speech Recognition<br />

Grammar Specification (SRGS), as defined by the W3C.<br />

Grammars<br />

Just 13 lines of code (8 calls to the<br />

Speech Engine) will implement a<br />

simple "yes-no" speech recognition<br />

system. The system must provide the<br />

audio and the audio data length for<br />

SoundData and SoundDataLength.<br />

<strong>LumenVox</strong>’s technology provides us<br />

with some of the very best in speech<br />

technologies today, at the right price for<br />

our customers.<br />

Mark Kelley President of Parallax<br />

Sample Code<br />

void RecognizeSpeech (void* SoundData, int SoundDataLength)<br />
{<br />
    const char* GrammarString =<br />
        "#ABNF 1.0;\n"<br />
        "language en-US;\n"<br />
        "mode voice;\n"<br />
        "tag-format &lt;lumenvox/1.0&gt;;\n"<br />
        "$yes = (yes | yeah | okay):'true';\n"<br />
        "$no = (nope | no):'false';\n";<br />
    LVSpeechPort Port;<br />
    Port.OpenPort ();<br />
    Port.LoadGrammarFromBuffer (0, GrammarString);<br />
    Port.LoadVoiceChannel (0, SoundData, SoundDataLength, ULAW_8KHZ);<br />
    Port.Decode (0, 0, LV_DECODE_SEMANTIC_INTERPRETATION | LV_DECODE_BLOCK);<br />
    int NumInterpretations = Port.GetNumberOfInterpretations (0);<br />
    for (int i = 0; i &lt; NumInterpretations; ++i)<br />
        cout &lt;&lt; Port.GetInterpretation (0, i) &lt;&lt; endl;  // print each interpretation<br />
}<br />


Supporting Standards<br />

<strong>LumenVox</strong> supports the W3C's Speech Recognition Grammar<br />

Specification (SRGS), part of the VoiceXML 2.0 and SALT<br />

specifications. Companies that track these specifications are<br />

dedicated to the future of speech, and to integrating with<br />

other companies committed to promoting speech recognition.<br />

The <strong>LumenVox</strong> SRGS implementation is backward compatible<br />

with the existing <strong>LumenVox</strong> BNF grammar format; current<br />

deployments will leverage the power of the SRGS system<br />

immediately and transparently.<br />
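For illustration, the yes/no grammar from the Sample Code can also be written in the W3C SRGS XML form. The fragment below is a sketch following the SRGS 1.0 specification; the rule names are illustrative, not taken from a <strong>LumenVox</strong> deployment.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         xml:lang="en-US" version="1.0" mode="voice" root="answer">
  <rule id="answer" scope="public">
    <one-of>
      <item><ruleref uri="#yes"/></item>
      <item><ruleref uri="#no"/></item>
    </one-of>
  </rule>
  <rule id="yes">
    <one-of>
      <item>yes</item>
      <item>yeah</item>
      <item>okay</item>
    </one-of>
  </rule>
  <rule id="no">
    <one-of>
      <item>no</item>
      <item>nope</item>
    </one-of>
  </rule>
</grammar>
```

Because the ABNF and XML forms of SRGS are equivalent, either can be loaded into an SRGS-compliant engine.<br />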

Both companies (<strong>LumenVox</strong> and<br />

SandCherry) have focused on simplifying<br />

the development, integration, and<br />

deployment of speech services while<br />

maintaining affordability.<br />

Charles Corfield<br />

President and CEO of<br />

SandCherry<br />

<strong>LumenVox</strong> recognizes that the speech community will need to<br />

work together to develop solutions for businesses, and as<br />

such, <strong>LumenVox</strong> applications complement the following<br />

technologies:<br />

VXML<br />

VoiceXML (VXML) is a mark-up language designed to code speech<br />

applications with many of the same architectural components as HTML.<br />

VoiceXML platforms connect to a combination of speech recognition engines,<br />

text-to-speech synthesis, telephony interfaces and VoiceXML interpreter<br />

software to process the call. In order to interface VXML with any speech<br />

engine, the engine must understand SRGS and SISR.<br />

<strong>LumenVox</strong>'s Speech Engine is compliant with what VXML expects, and our<br />

engine powers the speech recognition portion of several VXML platforms.<br />
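As a sketch of what such a platform interprets, a minimal VoiceXML 2.0 form might prompt the caller and reference an external SRGS grammar like this (the grammar URI is hypothetical):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="confirm">
    <field name="answer">
      <prompt>Would you like to continue?</prompt>
      <grammar src="yesno.grxml" type="application/srgs+xml"/>
      <filled>
        <prompt>You said <value expr="answer"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

The VoiceXML interpreter hands the grammar and the caller's audio to the speech engine, which is why SRGS and SISR support is the integration point.<br />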

SALT<br />

Speech Application Language Tags (SALT) is similar to VoiceXML but also<br />

adds support for multi-modal systems. SALT extends existing mark-up<br />

languages such as XHTML, XML, and HTML. Similar to our work with VXML,<br />

the <strong>LumenVox</strong> Speech Recognition Engine conforms to SALT specifications.<br />

Semantic Interpretation<br />

<strong>LumenVox</strong> has implemented the W3C's Semantic Interpretation for Speech Recognition (SISR) working draft, also part of the VoiceXML 2.0 specification. SISR allows grammar authors to embed snippets of JavaScript code into their SRGS grammars, to automatically transform what a speaker says into a format understandable to an application. With <strong>LumenVox</strong>'s Semantic Tags, callers can say, "September thirteenth two thousand four," and your application will understand "2004-09-13."<br />

<strong>LumenVox</strong> is committed to supporting the W3C's working draft. As the draft evolves, we will support both new and old drafts, so application developers can be confident that their grammars and tags will perform to specification.<br />

Engine Features and Functionality:<br />

Streaming audio<br />

Supports English, Latin American Spanish, and Canadian French<br />

Flexible API easily integrates into current OA&M, billing, provisioning, and debugging systems<br />

Client/Server architecture distributes speech-processing load<br />

Run-time defined grammars entered as simple text, BNF, raw phonetic spelling, or SRGS<br />

Advanced dynamic barge-in adapts to each call in real time<br />

SDK includes documentation and a demonstration C/C++ application<br />

Flexible error recovery through the use of confidence scores and n-best results<br />
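The date example above can be sketched as an SRGS ABNF rule using SISR "string literal" tags, where the tag text becomes the semantic result directly (a sketch; the rule name is illustrative):

```
#ABNF 1.0;
language en-US;
mode voice;
tag-format <semantics/1.0-literals>;

// "September thirteenth two thousand four" -> "2004-09-13"
public $date = september thirteenth two thousand four {2004-09-13};
```

In the full (non-literal) SISR tag format, the tag would instead contain ECMAScript that assembles the ISO date from sub-rule results.<br />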


Advanced Features<br />

Noise Reduction Module<br />

When noise is present, it will degrade the performance of any speech recognition system.<br />

Quality noise reduction improves the accuracy of Voice Activity Detection and Core<br />

Recognition, both essential parts of a speech recognition system.<br />

To improve application robustness in noisy environments, <strong>LumenVox</strong> implemented a Noise<br />

Reduction Module (NRM) into our Speech Recognition Engine. The NRM automatically<br />

adapts to the acoustic environment, and dynamically updates its estimate of noise levels.<br />

The adaptive algorithm enables the NRM to reduce the effects of noise.<br />
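As a generic illustration of the adaptive-estimate idea (not <strong>LumenVox</strong>'s actual algorithm), a noise floor can be tracked as an exponential moving average of frame energy that adapts quickly during quiet, noise-only frames and slowly during loud, speech-like frames:

```cpp
#include <vector>

// Running noise-floor estimate: adapts fast during low-energy
// (noise-only) frames and slowly during high-energy (speech) frames.
class NoiseEstimator {
public:
    explicit NoiseEstimator(double initial = 0.0) : noise_(initial) {}

    // Feed one frame of samples; returns the updated noise estimate.
    double Update(const std::vector<double>& frame) {
        double energy = 0.0;
        for (double s : frame) energy += s * s;
        if (!frame.empty()) energy /= frame.size();
        // Adapt quickly toward quieter frames, slowly toward louder ones.
        double alpha = (energy < noise_ * 2.0) ? 0.5 : 0.02;
        noise_ += alpha * (energy - noise_);
        return noise_;
    }

    double Estimate() const { return noise_; }

private:
    double noise_;
};
```

The thresholds and smoothing factors here are arbitrary illustrative values; a production module would estimate noise per frequency band.<br />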

The waveforms below demonstrate the power of <strong>LumenVox</strong>'s Noise Reduction Module. In<br />

the original audio [Fig. 1], a truck driver speaks on a cell phone while driving. In addition<br />

to noise from the truck engine and blowing wind, another vehicle engine starts in the middle<br />

of the recording. Although traditional noise reduction implementations often fail to adapt to<br />

such dramatic changes, <strong>LumenVox</strong>'s NRM adjusts to the new noise characteristics rapidly<br />

and automatically. [Fig.2]<br />

Fig. 1 Original audio: truck engine noise, with another vehicle starting mid-recording.<br />

Fig. 2 Audio after noise cancellation: truck noise reduced; the NRM adapts to the new engine noise.<br />

Voice Activity Detection<br />

Voice Activity Detection (VAD), also referred to as barge-in and/or End-Of-Speech (EOS) detection,<br />

identifies when a person begins speaking, finishes speaking, or pauses while speaking.<br />

<strong>LumenVox</strong>'s VAD implementation delivers high performance despite challenging conditions: hisses,<br />

pops, abrupt changes in background noise, telephone line echo, and squawks from two-way radio<br />

communication.<br />

The Voice Activity Detection module is highly configurable and can be adapted to work equally well<br />

within telephone, VoIP, or microphone-based applications.<br />

We are delighted that <strong>LumenVox</strong> is<br />

extending the capabilities of our<br />

platform.<br />

Media Resource Control Protocol (MRCP)<br />

Speech synthesizers…Audio recorders…DTMF recognizers…Speech<br />

recognizers…Speech verifiers…a fully functioning, media-rich application<br />

needs lots of components to work together. Until now, all of these<br />

components had to be provided by a single vendor, or required extensive<br />

custom programming to integrate them. MRCP changes all this. The<br />

Media Resource Control Protocol allows you to seamlessly manage<br />

diverse media resources. MRCP provides a common language to speak<br />

to all of these devices.<br />

With MRCP, vendors can compete on the basis of their strengths, rather<br />

than attempting to create an all-inclusive, yet mediocre package.<br />

Customers can take the best product from each vendor, creating a<br />

speech application package that is tailored to their particular needs.<br />

For detailed information visit:<br />

http://www.ietf.org/internet-drafts/draft-ietf-speechsc-mrcpv2-06.txt<br />

n-best Results<br />

Instead of returning only the top scoring result, you can<br />

instruct the engine to return several of the highest scoring,<br />

most likely answers, often called n-best results. Returning<br />

n-best results is particularly effective when callers need to<br />

spell names, street addresses, or e-mail addresses.<br />

Without n-best results, if a caller spells a name beginning<br />

with "N," but the engine returns a low confidence score, the<br />

caller would be asked to repeat the letter, and given how<br />

similar "N" is to "M," it's likely that the second answer<br />

would have a similarly low confidence score. With n-best<br />

results, the system can prompt the caller using several of<br />

the likely results, such as "Did you mean 'M,' as in 'Mary'?"<br />

When the caller responds, "No," the system goes to its next<br />

option, "Perhaps you meant 'N' as in 'Nancy'?"<br />

Returning n-best results improves the caller's experience:<br />

instead of asking the caller to simply repeat an answer that<br />

received a low confidence score, the system can confirm the<br />

caller's intention using several likely choices.<br />
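The confirmation loop described above can be sketched in plain C++ as a simplified stand-in for an engine's n-best output; the struct and function names are hypothetical, not the <strong>LumenVox</strong> API:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// One n-best candidate: the interpreted text plus the engine's
// confidence score for it.
struct Interpretation {
    std::string text;
    double confidence;
};

// Order candidates by descending confidence so the application can
// confirm each in turn: "Did you mean 'M' as in 'Mary'?" and, on a
// "No," fall through to "Perhaps you meant 'N' as in 'Nancy'?"
std::vector<std::string> ConfirmationOrder(std::vector<Interpretation> results) {
    std::stable_sort(results.begin(), results.end(),
                     [](const Interpretation& a, const Interpretation& b) {
                         return a.confidence > b.confidence;
                     });
    std::vector<std::string> order;
    order.reserve(results.size());
    for (const auto& r : results) order.push_back(r.text);
    return order;
}
```

An application would walk this list, prompting for confirmation until the caller accepts a candidate or the list is exhausted.<br />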

Server-Side Grammar<br />

<strong>LumenVox</strong> offers even more efficient support for large grammars by allowing clients to pre-load grammars onto the server, sending the grammar ahead of any decode requests. Typically, the grammar itself accompanies each decode request, but for large grammars, sending the grammar to the server before decoding is more efficient, reducing network traffic.<br />

John Hibel Vocalocity’s Vice President<br />

of Marketing and Business Development<br />



Speech Platform<br />

The Speech Platform is an intuitive GUI-based toolkit to quickly design,<br />

develop, and deploy any speech application or IVR. By connecting to<br />

almost any phone system and database, the Platform can easily be used to build a speech-driven technical support line, call router, customer service line, dealer locator, auto-attendant, or any other speech application.<br />

Platform Features:<br />

English, Latin American Spanish, and Canadian French Support<br />

Client/Server functionality<br />

Database Connectivity through Custom Action DLLs<br />

Call Bridging and Outbound Dialing through Custom Action DLLs<br />

Support for Intel Dialogic Dx1E JCT, Dx2 JCT, and DMV Series cards<br />

Enterprise level distribution<br />

User-created sophisticated grammars<br />

Loop start / Analog or T1 / PRI ISDN / Digital Switch<br />

Barge-In capability<br />

Live updates without rebooting system<br />

DTMF and speech input<br />

Detailed Call Flow logging<br />

Live runtime GUI status monitoring<br />

Complete Call Flow handling<br />

Flash-hook transfer capability<br />

Flexible Call Job definitions<br />

Carrier-grade application-ready<br />

Full SRGS, SISR support<br />

On-the-fly project switching<br />

File or SQL/MSDE Database-based projects<br />

Flexible line setting<br />

Assign each phone line to a different database, file project, or CAPI only mode<br />

Noise Reduction Module<br />

Speech Tuner included<br />

Speech Recognition Engine included<br />

<strong>LumenVox</strong>’s Speech Platform is a<br />

clear leader in the speech recognition<br />

sector.<br />

Nadii Tehrani Chairman of TMC<br />



Everything You Need<br />

<strong>LumenVox</strong>'s Speech Platform includes all of the components you need to<br />

produce, adjust, and maintain your speech applications. Designed from<br />

the outset to work together, the Platform's components operate<br />

seamlessly.<br />

Platform Component Descriptions:<br />

The Platform Designer allows you to construct the framework for your speech<br />

application in a GUI environment.<br />

Platform Extensions are used to handle any situation that the Platform Designer<br />

cannot support internally.<br />

The Speech Engine recognizes what the caller says, returning the results to the Call<br />

Handler.<br />

The Call Handler works with the Platform’s Designer, Extensions, and Speech<br />

Recognition Engine, executing the logic of your application, and directing calls<br />

appropriately.<br />


The Call Flow View allows the designer of the speech<br />

application to see all of the modules that are associated with<br />

the application and how they relate to each other. This is<br />

where the application dialog flow is created and can be<br />

tracked to see how callers will flow through the speech<br />

application or IVR.<br />

The Objects Panel allows application designers to drag and<br />

drop Modules that perform a specific function, or to add<br />

notes with Annotations to provide quick information or<br />

reminders on the Call Flow itself. Actions Lists, Actions,<br />

and Grammars are drag and drop icons to add within each<br />

individual Module's properties.<br />

The Properties Window has six tabs: Project, Modules, Audio Library, Quick Audio, To Do, and Notes. Project shows the global and project-specific properties. Modules lists the modules you have created so far for the application. The Audio Library and Quick Audio tabs display all audio prompts contained within the project. To Do reminds you of objects that have not yet been finished within the project. Notes provides a place to enter comments or thoughts concerning the application.<br />

The Speech Tuner provides a comprehensive window into your application: it allows<br />

you to quickly note where your application performs well, and determine which<br />

areas need improvement. The Tuner also lets you simulate changes to your<br />

application, using audio from past calls, to determine the effectiveness of each<br />

change.<br />


<strong>LumenVox</strong> allows us to have control<br />

over the development of call flow,<br />

grammars and tuning, while keeping the<br />

costs at a respectable level.<br />

Brian Lauzon President of TelASK<br />




Platform Extensions<br />

The Speech Platform can be extended, allowing you to create a<br />

number of speech applications, by accessing pre-built libraries and<br />

controlling the call flow. These Platform Extensions can be written<br />

as either a Visual Basic ActiveX exe or C/C++ DLL.<br />
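A C/C++ Platform Extension might export an entry point the Call Handler can invoke. This is a hedged sketch with hypothetical names; the real Extension interface is defined by the Platform SDK:

```cpp
#include <string>

// Export the entry point with C linkage so the Call Handler can
// resolve it from the DLL by name.
#ifdef _WIN32
#define PLATFORM_EXPORT extern "C" __declspec(dllexport)
#else
#define PLATFORM_EXPORT extern "C"
#endif

// Hypothetical extension entry point: the Call Handler passes in the
// caller's account number, and the extension returns the name of the
// next module to run, implementing an external decision tree.
PLATFORM_EXPORT const char* RouteCaller(const char* accountNumber) {
    std::string acct = accountNumber ? accountNumber : "";
    // Illustrative rule: premium accounts (prefix "9") skip the queue.
    return (!acct.empty() && acct[0] == '9') ? "PremiumService" : "MainMenu";
}
```

The returned module name would be matched against modules defined in the Application Designer project.<br />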

Examples of Platform Extensions:<br />

Connect to a live, or regularly updated, website or RSS<br />

feed<br />

Connect to a database via an ADO (ActiveX Data<br />

Object)<br />

Use SRGS for grammars and enable semantic<br />

interpretation<br />

Direct callers to different modules within the<br />

Application Designer project based on external<br />

decision trees<br />

Example Applications:<br />

Pin code and account number capture<br />

Survey systems<br />

Automated billing<br />

Collecting demographic information<br />

Auto-attendant<br />

Transaction completion<br />

Appointment scheduling<br />

Email reader<br />

Voicemail access<br />

By utilizing <strong>LumenVox</strong>’s Speech<br />

Platform our callers actually enjoy the<br />

experience of the phone call, helping us<br />

to build a good relationship with our<br />

customers.<br />

Derek Henry CEO of 1-800-US-LOTTO<br />

Corporation<br />

VB ActiveX Platform Extension Example<br />

C/C++ Platform Extension Example<br />


Call Handler<br />

The Speech Platform's Call Handler runs the speech application. The Platform's Settings, and<br />

more specifically Line Settings, inform the Call Handler as to which Project to use.<br />


The Call Handler supports a variety of Intel Dialogic telephony<br />

cards, and is designed to make hardware setup as easy as possible;<br />

this allows more time for you to develop and test your speech<br />

application.<br />

When developing your speech application with the Speech Platform's Designer, run tests by<br />

clicking on the Test button in the Toolbar. The Call Handler allows you to use speakers and a<br />

microphone, to completely test your application in-house⎯before releasing it to customers.<br />


Design Tips<br />

Build the application for the expected case, anticipating the caller's situation and the<br />

application's goals. Focus on these goals as you design the call flow and dialogues.<br />

Keep prompts and grammars concise in your initial design, then expand based on<br />

callers' interactions, learned from the tuning process.<br />

Give thought and time to determine the persona and brand image. Verify that the<br />

voice talent creates the proper tone and perception for your company.<br />

Make the system match the user. Listen to the way a user responds and interacts<br />

with the system.<br />

Apply the strengths of speech recognition to your application: remember that many<br />

of the hierarchical structures used in DTMF or touch tone call flows are not<br />

appropriate in speech applications!<br />

The reason we chose <strong>LumenVox</strong>'s speech technology was simply because of the flexibility it provides us, as developers.<br />

Brian Lauzon, President of TelASK<br />

Never disguise a list question as a yes/no by adding unnecessary pauses, as in, "Would you like Red (pause), Blue, or Black?" This leads to caller confusion.<br />

Adjust the "No Input" timeout to match the complexity of the question.<br />

Make explicit decisions with yes/no.<br />

Make it easy for the caller to leave the speech application and reach a live person.<br />



Speech Driven Assistant<br />

<strong>LumenVox</strong>'s Speech Driven Assistant for TeleVantage is a complete<br />

turn-key program to speech-enable name directories, contact lists,<br />

voicemail, IVRs, and emails, providing all callers hands-free phone<br />

interactions. The system also allows for alternate names, nicknames,<br />

and various spellings and pronunciations to be recognized.<br />

<strong>LumenVox</strong>’s Speech Driven Assistant<br />

for TeleVantage has become the most<br />

valued component in our operation of a<br />

speech-activated hotline, 1-800-US-LOTTO.<br />

Derek Henry<br />

CEO of 1-800-US-LOTTO<br />

Corporation<br />

Remote Access to:<br />

Speech enabled outbound dialing<br />

Users can say a name from their personal contact lists when they want to place a call<br />

Speech enabled name directory<br />

Callers can speak the name of the person or department they want to reach<br />

Support for workgroups and multiple company configurations<br />

Transfer fax calls to specified extension<br />

Add alternate names, spellings, and pronunciations<br />

Speech enabled voicemail access<br />

Access, control, and manage voicemail with speech<br />

Reply directly to caller's phone number or extension<br />

Forward messages to other users<br />

Speech enabled IVR (<strong>LumenVox</strong>'s Speech Platform)<br />

GUI-based development tool to create a variety of IVR applications<br />

Supports both Speech and DTMF input<br />

External database access<br />

Speech enabled access to email<br />

Access POP3, Exchange Server, and IMAP email<br />

Reply to email in an attached recorded audio format or with user predefined text messages<br />

NeoSpeech Text-to-Speech (TTS)<br />

Play prompts of un-recorded proper names, dynamic IVR applications, and emails<br />

Assistant Features:<br />

Speech activated dialing to contact list<br />

Transfer to another extension from voicemail<br />

Navigate, forward, save, and delete voicemail<br />

IMAP, POP3 and Exchange server email access<br />

DTMF and Speech input<br />

Configure Speech Driven Assistant database remotely<br />

Barge-in capability with CSP compatible Intel Dialogic cards<br />

Live run-time GUI status monitoring<br />

Speech Platform included<br />

Speech Tuner included<br />

NeoSpeech Text-to-Speech included<br />



Configuration Tool<br />

User’s email account information and pre-defined<br />

text message replies are easily managed.<br />

<strong>LumenVox</strong>’s Configuration tool is where all modifications to the Speech<br />

Driven Assistant are made.<br />

The Configuration program allows<br />

administrators to modify user<br />

information, maintain email<br />

servers, configure speech<br />

applications, and check the<br />

server’s telephony hardware<br />

compatibility.<br />

Administrators can easily select and<br />

change any user’s information, which is<br />

continuously synchronized with<br />

TeleVantage’s database.<br />

The Configuration tool<br />

can also discover and<br />

display all Intel Dialogic<br />

cards installed. All features associated with the telephony cards are shown as a quick reference of their capabilities.<br />

Alternate names or departments<br />

associated with a particular user can<br />

be added quickly to the live system.<br />

For commonly mispronounced<br />

names, alternate<br />

pronunciations can be added.<br />

The Phonetic Speller provides<br />

alternate pronunciations as<br />

well as the chance to hear<br />

how it sounds, using Text-to-Speech.<br />

<strong>LumenVox</strong>’s Speech Driven Assistant has provided the best integration to TeleVantage of any speech recognition product we tested.<br />

John Gagliardi<br />

President of GTI Solutions<br />



Speech Tuner<br />

<strong>LumenVox</strong>'s Speech Tuner is a complete maintenance tool for end-users, value-added resellers, and platform providers. It’s designed to perform tuning and transcription, as well as parameter, grammar, and version upgrade testing of any speech application.<br />

With this GUI-based tool, companies developing speech applications on various ASR platforms (including Nuance and ScanSoft) can bring speech application tuning in-house and avoid professional service fees.<br />

"<strong>LumenVox</strong> is on the cutting edge of speech technologies and customer satisfaction by supporting not only their own speech engine, but other leaders in the industry as well."<br />

Bruce Balentine<br />
Executive Vice President and Chief Scientist at EIG<br />

Why Do Companies Need the <strong>LumenVox</strong> Speech Tuner?<br />

No untuned speech application survives contact with actual customers. Tuning is an absolute requirement for every speech application deployment. Our Tuner allows you to quickly assess changes and upgrades.<br />

Tuner Capabilities:<br />

Evaluate and improve the speech recognition application<br />

Analyze each stage in the call process<br />

Transcribe audio data, make pinpoint adjustments, and immediately measure the effects on performance<br />

Test changes against actual calls immediately<br />

Analyze data collected using different ASR engines<br />

Test design and development decisions of new applications, using data from deployed applications<br />



Tuner Overview<br />

The Speech Tuner comprises several key functions and windows.<br />

SRE Custom Tags displays log information supplied by the application’s developers.<br />

The Call Log displays the list of calls and controls the display of information in the rest of the Tuner. Each interaction (or turn) in a call is marked with an event type, such as:<br />

Beginning of the call<br />

Speech event<br />

Touch tone event<br />

'No input' event; the application did not detect any speech or touch tones<br />

'Unknown' event; these events are typically ASR-specific<br />

End of the call<br />

The Transcription tool allows a transcriber to type the text of the caller's speech. Transcriptions are automatically evaluated and stored in the database for use in performance evaluations.<br />

The Statistics window displays performance statistics directly related to the calls in the Call Log window.<br />

The Answer window shows recognition results. Full n-best support is available, as well as the semantic interpretations and actual words recognized.<br />


Listen to the application prompt, and the caller's pre- and post-processed speech. The recognized words are displayed where the ASR found them within the caller's speech. Vertical bars, which are useful for detecting problems, indicate the beginning and end of each word.<br />



Tuning Processes<br />

<strong>LumenVox</strong>'s Speech Tuner provides full support for <strong>LumenVox</strong>'s Speech Recognition Engine, Nuance 8.5, ScanSoft OSR 2, and other ASRs. The Speech Tuner allows you to work with any supported ASR via a single interface.<br />

<strong>LumenVox</strong> is an active supporter of the Tools committee in the VXML Forum, and is working to help define standard logging information to help ease the tuning process.<br />

The tuning process involves three easy steps:<br />

1. Import Data.<br />
The basic process is simple. Users import call log data into the Speech Tuner database. All information stored by the call log is available in the Speech Tuner. In most cases, log fields between ASR engines are very similar; when the information differs, every effort is made to preserve the original data. Each special case is fully documented.<br />

2. Transcribe Speech.<br />
Transcribers can type the text of the caller's speech directly into the Speech Tuner. Once the audio is transcribed, the Tuner compares audio transcripts with the speech engine results to determine accuracy, greatly reducing errors associated with hand evaluations. If semantic interpretations are available, the transcriber can also mark whether the semantic interpretation was correct or incorrect. The transcripts are evaluated using the actual decode grammar, producing measurements such as word-error-rate, in- and out-of-grammar rates, and semantic error rates.<br />

3. Test Immediately.<br />
Selecting an interaction in the Call Log automatically loads the associated audio and grammar into the Tester. The grammar can be edited, speech engine parameters set, and individual recognition tests generated. The Speech Tuner natively supports industry standard SRGS grammars. Once a set of possible changes is identified, users can batch test audio to evaluate performance, using those changes.<br />

The Speech Tuner assumes the user possesses licensed versions of the relevant ASR, that the ASR platform is up and running, and that the platform is able to accept connections.<br />
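The transcribe-and-compare idea in step 2 can be sketched in a few lines. This is purely illustrative: the turn records and field names below are invented for the example and are not the Speech Tuner's actual data model.

```python
# Illustrative sketch of scoring transcribed turns against ASR results.
# The record layout here is an assumption for the example, not the
# Speech Tuner's real data model.

def score_turns(turns):
    """Compare human transcripts to engine results for a list of turns."""
    total = len(turns)
    exact = sum(1 for t in turns if t["transcript"] == t["asr_result"])
    sem_errors = sum(1 for t in turns if not t["semantic_ok"])
    return {
        "sentence_accuracy": exact / total,
        "semantic_error_rate": sem_errors / total,
    }

turns = [
    {"transcript": "checking", "asr_result": "checking", "semantic_ok": True},
    {"transcript": "savings", "asr_result": "savings", "semantic_ok": True},
    {"transcript": "operator", "asr_result": "later", "semantic_ok": False},
    {"transcript": "yes", "asr_result": "yes", "semantic_ok": True},
]
print(score_turns(turns))  # sentence_accuracy 0.75, semantic_error_rate 0.25
```

Batch testing (step 3) is then just re-running such a scoring pass after each grammar or parameter change.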

<strong>LumenVox</strong> Speech Tuner Database<br />

The Speech Tuner communicates with an open-source, freeware database called SQLite (www.sqlite.org). The Speech Tuner manages call log importing, searching, and exporting, so users can focus on the task of tuning, not log management. The database is contained in a single file, is easy to back up and transport, and can be queried using SQL-92 (see the SQLite website for full details) from a variety of exterior tools. Other speech engine vendors are free to convert their native logs to ones the Tuner understands. The format, content, and semantics of the <strong>LumenVox</strong> Speech Tuner database are published.<br />

The database maintains all the information contained in the original call log. The Speech Tuner includes not only the decode grammar and ASR results, but also the decode platform, parameter settings, alternative results, prompt audio, and pre- and post-processed audio.<br />

Depending on the platform's logging capabilities, the database can provide more advanced information, such as ASR result alignments within the audio; the list of phonemes used in the decode; and word, utterance, and semantic interpretation confidence measurements.<br />

In addition, the Tuner stores all transcripts and evaluations within the call log. As transcripts are entered into the Speech Tuner, they are automatically evaluated against the decode grammar. These transcripts, and any notes or additional information, are stored directly into the database. Individual scores, such as word error rate, semantic error rate, and in- and out-of-grammar measurements, are stored along with their alignments, as well as information about how the scores were reached.<br />

Users can generate a variety of reports from these results, including error rate by grammar or dialog, confusion matrices, transcription progress, and confidence thresholds for confirmation or rejection settings.<br />

In the future, <strong>LumenVox</strong>'s Speech Tuner will also support back-end database replacement, for use in enterprise level systems, where multiple users will be analyzing the same data simultaneously. Companies that use an ODBC-capable database can replace, with certain SQL changes, the disk-based SQLite system with an enterprise system such as MS SQL Server 2000, MySQL, PostgreSQL, and/or Oracle.<br />
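Because the Tuner's data lives in a single SQLite file, ad-hoc queries from exterior tools are straightforward. The sketch below uses an in-memory database and an invented schema (a `calls` table with `grammar` and `wer` columns) purely for illustration; it is not the published Tuner schema.

```python
import sqlite3

# Illustrative ad-hoc query against a Tuner-style SQLite database.
# The table and column names are assumptions for this sketch.
conn = sqlite3.connect(":memory:")  # a real Tuner database would be a file on disk
conn.execute("CREATE TABLE calls (grammar TEXT, wer REAL)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?)",
    [("main_menu", 0.05), ("main_menu", 0.15), ("digits", 0.02)],
)

# Average word error rate per grammar -- the kind of report described above.
rows = conn.execute(
    "SELECT grammar, AVG(wer) FROM calls GROUP BY grammar ORDER BY grammar"
).fetchall()
for grammar, avg_wer in rows:
    print(grammar, round(avg_wer, 3))
```

The same SQL would run unchanged against an enterprise back end, which is what makes the planned ODBC replacement practical.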

"<strong>LumenVox</strong> has created speech recognition products that are easy to code with, and GUI-based tools, such as the new Speech Tuner, which greatly simplify post-deployment maintenance."<br />

Vern Baker<br />
President of enGenic Corporation<br />



Taking Out the Guesswork<br />

Make changes to grammars, parameters, or ASR engines, secure in the knowledge that those changes will make the application better, faster, and more accurate. The Speech Tuner uses historical information to validate your changes, ensuring your success.<br />

Grammar Tester<br />

Most 'tuning' tools are passive log viewers, requiring that changes be made in the live speech application and retested over a period of time with live callers. With <strong>LumenVox</strong>'s Tuner, we send the changes to the Speech Engine, simulating the recognition process and evaluating changes instantly. Instead of slow, non-interactive, static tuning, the Speech Tuner enables on-the-fly, highly interactive, dynamic tuning. Make a change, do the test, get the results!<br />

The Grammar Tester is a dynamic testing component. You can switch ASR engines, grammars, and engine search parameters on-the-fly, and test changes in single or batch tests.<br />

Grammar Evaluation<br />

Evaluate speech and grammar sets against the speech engine, as they took place during the actual call. Adjust grammars and instantly re-test and re-score to evaluate improvements in performance. With <strong>LumenVox</strong>'s Speech Tuner, you can instantly determine whether adding a new phrase to the grammar will improve your accuracy.<br />
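Since the Tuner natively supports industry standard SRGS grammars, a grammar edit of the kind described here (adding a phrase) can be pictured against a minimal SRGS document. The account-menu phrases below are invented for illustration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://www.w3.org/2001/06/grammar"
         version="1.0" mode="voice" root="account" xml:lang="en-US">
  <rule id="account">
    <one-of>
      <item>checking</item>
      <item>savings</item>
      <item>money market</item> <!-- phrase added during tuning -->
    </one-of>
  </rule>
</grammar>
```

After editing the grammar in the Tester, the same historical audio can be re-decoded to confirm the new phrase helps rather than hurts.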

Parameter Evaluation<br />

Setting parameters optimizes the speech engine performance, further improving the caller's experience. Traditionally, changing ASR parameters is a difficult and time-consuming task, often requiring long delays between changing a parameter and evaluating its effects on performance. Our Speech Tuner can dramatically shorten the process.<br />

The dynamic test capability of the <strong>LumenVox</strong> Speech Tuner allows the user to quickly make and test parameter changes: now, ASR engine parameters such as search optimizations, speech end-pointing, and n-best result processing can be easily adjusted, and immediately re-tested and re-scored from within the Speech Tuner.<br />

Performance Measurements<br />

The Speech Tuner rates performance against commonly accepted measures like WER (Word Error Rate), Grammar Coverage, and Semantic Interpretation matching. This helps give an accurate picture of details such as average confidence scores, correct versus incorrect responses, and In-Grammar versus Out-of-Grammar performance.<br />
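WER itself has a standard definition: the word-level edit distance (substitutions, insertions, and deletions) between the reference transcript and the recognizer's output, divided by the number of reference words. A minimal self-contained sketch of that computation, not LumenVox code:

```python
def wer(reference, hypothesis):
    """Word Error Rate: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[-1][-1] / len(ref)

# One deleted word ("to") and one substitution ("five" -> "nine") out of four.
print(wer("transfer to extension five", "transfer extension nine"))  # 0.5
```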

Assessing Upgrades<br />

Installing new versions of platforms and ASR engines entails a certain risk with each new upgrade. On occasion, new default settings, search routines, changes to acoustic models, and so on will actually worsen the caller's experience, until the application is re-tuned. But using the <strong>LumenVox</strong> Speech Tuner, you can perform baseline testing with the old version to establish the minimum acceptable performance. Then, using the upgraded version of the ASR engine, you can easily re-test all existing data and compare the results to the baseline. The new performance, judged against the baseline, gives you the information you need to make a decision, and deploy an upgrade with confidence.<br />



Tuner Reports<br />

The <strong>LumenVox</strong> Speech Tuner defines several pre-built queries for the most common reports. The reports are generated using SQL queries against the Tuner database, with results produced in a pre-defined XML format. The format, content, and semantics of the <strong>LumenVox</strong> Speech Tuner database are published: if you need to extract data from the logs in ways not provided in the Tuner interface by default, you can easily produce custom reports by writing SQL commands.<br />

Tuner Tips<br />

When a problem occurs with a transaction, determine if the fault lies primarily with prompts or grammars.<br />

Never make a change in your call flow or grammar for just one failed call.<br />

Train acoustic models with environment noise in play.<br />

Train acoustic models to allow for caller dialects and regional pronunciations.<br />

Tune grammars, prompts, and all system parameters.<br />

Remember that accurate transcriptions need to account for noise.<br />

Speech Understood<br />

"The Speech Tuner is an excellent, fully integrated tool for improving speech applications."<br />

Bruce Balentine<br />
Executive Vice President and Chief Scientist at EIG<br />



<strong>LumenVox</strong> Training Courses<br />

The <strong>LumenVox</strong> Team, a group of knowledgeable professionals with extensive development and technical support experience, provides our courses. We will help you get the most out of your <strong>LumenVox</strong> products.<br />

Courses include Speech Application Design, API Development, Speech Application Tuning, and many others. These courses give developers and business personnel opportunities to learn about speech development and sales on our premises, with the assistance and advice of the <strong>LumenVox</strong> Team.<br />

Who Should Attend<br />

People responsible for developing, maintaining, marketing, and/or selling <strong>LumenVox</strong> speech applications will benefit from <strong>LumenVox</strong> training. Classes also benefit anyone with an interest in designing, developing, tuning, testing, or maintaining any speech telephony system.<br />

What to Expect<br />

<strong>LumenVox</strong> training is key to accelerating your learning curve. Through a combination of presentations and hands-on exercises, our courses provide the details of creating and maintaining applications. In these courses, you will learn solutions to real problems encountered during actual application design, development, deployment, tuning, marketing, and selling.<br />

<strong>LumenVox</strong> training will give you the guidance you need to successfully design, develop, deploy, and refine your applications. We tailor our training to meet your particular needs.<br />

About the Instructors<br />

Our team of expert instructors is committed to your success. Every <strong>LumenVox</strong> instructor has a background in computer telephony, application development, and speech recognition. We are familiar with the development challenges you will encounter on a daily basis; we will offer solutions to routine problems, as well as creative approaches to not-so-routine problems.<br />

<strong>LumenVox</strong> Support Services: Ensuring Your Success<br />

At <strong>LumenVox</strong>, we recognize that high quality, cost effective technical support is a crucial component of successful application development. With proper support, subscribers gain a deeper product understanding, resulting in enhanced productivity, and ultimately in greater customer satisfaction.<br />

With this in mind, <strong>LumenVox</strong> offers a simplified technical support system designed to meet varying customer needs. <strong>LumenVox</strong> technical support plans are available for VARs, Distributors, and End Users with ongoing projects/support needs. The key component of <strong>LumenVox</strong> technical support is the Customer Hotline. Two additional avenues are also available: Fax Support and Email Support. Whichever method you choose, know that <strong>LumenVox</strong> will work efficiently to answer your questions and resolve the problem.<br />

Our <strong>LumenVox</strong> technical support team is made up of knowledgeable professionals with extensive <strong>LumenVox</strong> development and support experience. We are well versed in computer telephony technology and are available to assist you with:<br />

General <strong>LumenVox</strong> Technical Assistance<br />

Timely Problem Resolution<br />

Product Installation Assistance<br />

<strong>LumenVox</strong> Database/Host Connection Assistance<br />

Intel Dialogic Hardware Optimization<br />

"I really appreciate the cooperation and assistance that we received from the <strong>LumenVox</strong> engineers. They are an easy group to work with."<br />

Chris Riggenbach<br />
CXM Product Development Manager<br />



Our Partners Include...<br />

Keeping People Connected<br />

<strong>LumenVox</strong> and SandCherry Speech Enable 2-way Radio Networks<br />

SandCherry's Voice4 Radio Message System (RMS) dramatically improves workgroup efficiency for teams with 2-way radio, phone, and Web users by providing easy-to-use voice and data messaging. The ability to offer messaging to 2-way radios (one of the most widely used tools for mobile workforces) adds an entirely new dimension to workgroup communications.<br />

Leaving messages for radio users when they are unavailable, and providing access to the same system for phone and Web users to leave and retrieve messages, provides the critical bridge between what had been independent communications networks. The system also offers phone users a patch capability to connect to the radio network for real-time communication with radio users. Voice4 RMS equips a workgroup to improve its efficiency and performance using tools and processes already in place, without relying on a dispatcher.<br />

Using <strong>LumenVox</strong>'s Speech Engine, SandCherry's Voice4 RMS provides an unparalleled level of voice-driven functionality for mobile workgroups. Voice4 RMS is easy to install and use, and provides a cost-effective, robust communications solution.<br />

Making Travel Easier<br />

Los Angeles World Airports (LAWA) uses the <strong>LumenVox</strong> Engine and enGenic technology to speech-enable travel information<br />

Los Angeles World Airports, including Los Angeles International Airport (LAX), has implemented the first fully speech enabled voice response system for a major airport network. In compliance with Homeland Security regulations, LAWA helps callers access the most up-to-date flight information directly from the airport hotline, rather than calling individual airline carriers.<br />

Now callers can check the status of their flights; get information about parking, ground transportation, and services for people with disabilities; inquire about lost and found items; contact administrative offices; and get directions to the airport, all through an automated speech system.<br />

The hotline was created using enGenic's development and engineering tools, and <strong>LumenVox</strong>'s Speech Engine. Jim Coulter, CEO of enGenic, states, "<strong>LumenVox</strong> provides us with fast and accurate recognition of speech grammars, in an ever-changing world environment. Callers around the world, with different accents, can use simple English commands to obtain information from all four airports, over 185 airlines, 40 different taxi and limousine services, and 20 different airport departments."<br />

"The new system will enable us to continue our rapid growth and at the same time improve the efficiency of our ordering system without adding additional staff."<br />

Mike Gilson<br />
Vice President of ATA Retail Services<br />

Helping Doctors Treat Patients<br />

<strong>LumenVox</strong> Speech Engine Used in Innovative Patient Follow-up System<br />

The University of Ottawa Heart Institute in Ottawa, Canada has implemented an innovative automated patient follow-up system developed by <strong>LumenVox</strong> partner TelASK Technologies Inc. (www.TelASK.com). The TelASK System incorporates the <strong>LumenVox</strong> Speech Engine, allowing the Ottawa Heart Institute to closely monitor the progress and recovery of surgical patients by placing an automated call to their homes on day three, and again on day ten, after the patients have been discharged. The speech enabled outbound dialing system automatically phones the patient and delivers a pre-set list of questions. Each question is a strong indicator of the patient's progress.<br />

Using a specially designed algorithm, patients' answers are grouped as either requiring an immediate call back, contact within a day, or progressing normally. In the event of a response that may indicate a problem, the system will hold callers on the line and connect them to an Advance Practice Nurse at the Heart Institute for immediate attention. The University of Ottawa Heart Institute will also use this system to monitor patients with Acute Coronary Syndrome, and patients participating in their Smoking Cessation programs.<br />
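As a rough illustration of how answers might be grouped into those three categories, here is a toy triage rule. The question names, flags, and thresholds are invented for the example; TelASK's actual algorithm is not published.

```python
# Toy sketch of grouping a patient's answers into follow-up categories.
# All question ids and thresholds here are invented for illustration.

URGENT = "immediate call back"
NEXT_DAY = "contact within a day"
NORMAL = "progressing normally"

def triage(answers):
    """answers: dict mapping question id -> True if the reply flags a problem."""
    flags = sum(1 for problem in answers.values() if problem)
    # A single critical symptom, or several minor ones, escalates immediately.
    if answers.get("chest_pain") or flags >= 3:
        return URGENT
    if flags >= 1:
        return NEXT_DAY
    return NORMAL

print(triage({"chest_pain": False, "fever": True, "swelling": False}))  # contact within a day
```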



Types of Speech Recognition<br />

Speech recognition is used in a wide range of applications, from automated commercial phone systems to enhancing personal productivity. This technology appeals to anyone who needs or wants a hands-free approach to computing tasks.<br />

There are two main types of speaker models: speaker independent and speaker dependent. Speaker independent models recognize the speech patterns of a large group of people. Speaker dependent models recognize speech patterns from only one person. Both models use mathematical and statistical formulas to yield the best word match for speech. A third variation of speaker models is now emerging, called speaker adaptive. Speaker adaptive systems usually begin with a speaker independent model and adjust it more closely to each individual during a brief training period.<br />

Leveraging the Power of Speech<br />

Although many companies' first instincts are to simply speech enable their existing DTMF applications, doing so does not leverage the power and strengths of speech: speech enabling a DTMF application will not make the system smoother, faster, or easier-to-use.<br />

The combination speech/DTMF system lengthens already complex menus by adding the "press or say" routine: "For Checking, press or say 'one,'" or even worse, "For Checking, press one or say 'checking'." The typical combination speech/DTMF system requires the caller to remember too much, putting undue burdens on the caller.<br />

Migrating a DTMF application and its prompt design does not fully utilize the conversational aspect of speech. For speech applications to perform well, the call flow and dialog design are crucial. Designers must study the user patterns of the existing system, so they can redesign prompts and menus, and change the steps of the call flow to make the experience faster and more pleasant for callers.<br />

Well-designed speech applications offer many advantages over combination speech/DTMF systems.<br />

"By using a speech enabled system, our merchandisers realize significant time savings through 24-hour-a-day telephone access to information, from delivery status to new delivery days and product opportunities."<br />

Mike Gilson<br />
Vice President of ATA Retail Services<br />

Speech is:<br />

More Human<br />
With speech, prompts are phrased as easy questions, and callers can answer simply and naturally, with their voices. Speech systems provide a more natural interface than touch tone menus.<br />

Smooth and Fast<br />
Good speech call flow designs permit callers to get what they need faster, without having to wade through cumbersome filter menus.<br />

Easy-to-Use<br />
Navigation is much simpler, and callers can use the application with the interface mechanisms they are most familiar with: their voices.<br />

More Personal<br />
Speech applications give the impression of the ideal employee: attentive, empathic, alert, and consistently agreeable, rather than an impersonal string of numbers and tones.<br />

When to Use DTMF<br />

Sometimes, DTMF is appropriate: as an error-handling backup, or in special, security-sensitive interactions, such as pin code or credit card entry. In terms of customer satisfaction for most calls, speech applications outperform DTMF. Rather than speech-enabling an existing DTMF system, design your application with a conversation in mind, and learn to leverage the power of speech.<br />



State of the Industry<br />

To get an accurate understanding of the current state of the speech industry, we must first look at the history of Interactive Voice Response (IVR) systems.<br />

Companies have attempted to handle customer interactions with touch tone IVR systems or live agents. Yet most customers become frustrated with ineffective DTMF interfaces, or hang up while holding for a live agent. In order to support customer interactions more quickly and efficiently, companies began to request speech recognition interfaces.<br />

This movement towards speech provided the speech recognition industry with tremendous growth potential; however, many companies consider speech to be in the early adopter stage of the market. Why is that?<br />

While people have been hearing about speech for decades, only in the past decade or so have advances in the technology and supporting hardware allowed speech to finally become a viable option, with most systems performing tasks with over 90 percent accuracy. During this period many Fortune 500 companies implemented speech recognition, and helped educate consumers on how to interact with speech applications. These applications have become so advanced and mainstream that businesses, both large and small, now turn to speech solutions for everything from basic auto-attendants to more complex order-taking systems.<br />

Vendor Selection Tips<br />

Select a partner with the technological and business expertise that best suits your company and future projects. Ensure that the partner you choose will provide all of the services and products you'll need to be successful.<br />

Search for tools that allow for every change and adjustment to be automatically and rigorously tested, with actual historical call data.<br />

Include a tool in your development process that verifies any dead ends or unfinished work.<br />

Choose technologies that best fit with your application requirements.<br />

Select partners with deployment experience.<br />

Verify the level of technical support provided; ensure that you will receive the support you need.<br />

Speech recognition offers a great solution for large and small businesses: it simplifies customer interactions, increases efficiency, and reduces operating costs.<br />

Analysts at Cahners In-Stat and Giga report that calls handled by live agents can have an average cost-per-call from $2 to over $15. With speech recognition, the average cost-per-call can be cut to $0.20 or less.<br />

"<strong>LumenVox</strong>’s corporate and product strategy is right in sync with us."<br />

Vern Baker<br />
President of enGenic Corporation<br />

Since this technology appeals to anyone who needs or wants a hands-free approach to computing tasks, it is becoming a standard software option. At <strong>LumenVox</strong>, we focus on developing tools that will empower the user to build, customize, maintain, and refine their own applications.<br />

Speech recognition system development is still something that requires time to prepare and monitor. With tools like <strong>LumenVox</strong>'s Speech Platform and Speech Tuner, we are continually working to simplify the aspects of speech application development, to help your business get the most out of speech recognition.<br />



Effective Design = Customer Satisfaction<br />

Speech recognition applications come in many varieties, from simple routers to complex ordering systems. What designers must remember is ease-of-use. Even with a complex system, callers must be able to navigate through the system easily.<br />

Speech applications allow customers to accomplish their goals quickly and easily. Much of the internal work is in the design phase: building the call flow, creating grammars, recording prompts, and conducting usability testing. Speech application designers will modify each aspect throughout the design and internal testing phases.<br />

But with all the speech applications on the market today, and most prominent speech companies boasting recognition accuracy in the high 90s, why do so many people feel that "speech doesn't work"?<br />

Usually, it's because callers don't know what to say.<br />

You can avoid common problems by carrying out the following steps during the design phase:<br />

1. Research Needs and Create Initial Design.<br />
First, speak to the people who currently answer phone calls, and get their input. What current questions and interactions could potentially be automated? If the company already uses a DTMF application, how well does DTMF handle these interactions and questions? Not all interactions match well with speech's capabilities, so initial research is critical.<br />

Next, sketch out a potential progression of the call flow, and share this with others to make sure the progression makes sense and that callers can quickly and easily navigate the system.<br />

2. Develop Prompts and Grammars.<br />
Designers should decide how much of a "natural language" system callers need or desire. "How may I help you?" only works if callers know precisely what they want, and the designer can accurately predict their responses. Generally, callers will need some guidance and cues as to what to say. A "How may I help you?" question involves more extensive grammar development and testing than a question like: "We offer three choices, A, B, or C. Which would you prefer?"<br />

"The customer service that we have received from the <strong>LumenVox</strong> team has always been top of the line and we are looking forward to continuing our partnership for many years in the future."<br />

Victor Salazar<br />
President of Technical Support Systems<br />

The application developer must keep the system as conversational as possible, but prevent callers from treating the machine as a human. When callers think the system "actually" understands, or think that it possesses a greater vocabulary than it really does, they get lost, or make requests that are outside the system's capabilities. These problems rapidly compound, resulting in caller frustration and dissatisfaction. Effective, intentional, clearly designed prompts and grammars help keep customers satisfied while managing cost.<br />

3. Test with Real Customers.<br />
The ultimate measure of an application is the first live deployment of the system. The first live deployment must be a test version with actual users, not the programmers who are intimately familiar with the application's design. This will be the first time that assumptions about user behavior will be seriously tested; the resulting data will allow designers to modify the application to meet the caller's needs.<br />

4. Tune with Real Data.<br />
Test deployments permit designers to fine-tune the system, often resulting in significant changes, if they review actual caller experiences. By refining prompts, grammars, and call flow design, the application will become more robust, error-free, clear, and effective: in short, an application that customers will want to use.<br />

To tune the application effectively, all of the components of initial design (prompts, grammars, call flow, and the persona of the call system) must be tested. Since these elements are often built separately, developers must ensure that all of the parts combine effectively in the testing phase to achieve the desired effects. Properly tuning the speech application involves a thorough assessment of initial design components and real caller interactions.<br />


Error Handling<br />

Make your speech system more efficient and usable by optimizing error<br />

handling.<br />

Focus on the basics: Is the system accurate? Does the caller achieve call completion? Does the<br />

caller like using the system, and want to continue doing business with your company?<br />

An optimal application addresses these issues by combining technology with art. Mixing technical<br />

aspects, like programming and testing, with aesthetic elements, like writing, casting, and coaching,<br />

reduces errors and increases customer satisfaction. Give sufficient consideration to each part of<br />

the development process.<br />

Understand the speech recognizer that you are using: the confidence scores it returns allow you to make good decisions about the call flow. Track confidence scores at the project, grammar, and<br />

single call levels, to set both static and dynamic thresholds. This will permit the system to make a<br />

good decision on whether or not to confirm.<br />

Remember that it is better to confirm than to make a mistake. Although confirming can be<br />

unpleasant for callers, it is preferable to the frustration of being lost. Figure out when you need to<br />

confirm using the confidence scores, and try to make the confirmation prompts less complex than<br />

the original prompt. If you choose your confirmations wisely, even though it takes a little longer,<br />

users will not become irritated or impatient, and will get to where they need to be.<br />
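The threshold logic described here can be sketched in a few lines. The numeric thresholds and the function name are illustrative assumptions, not part of any LumenVox API; in practice the values come from tracking scores at the project, grammar, and single-call levels.

```python
# Illustrative sketch of a confirmation decision driven by confidence
# scores. The threshold values are hypothetical: real applications tune
# them per project, per grammar, and sometimes per call.

ACCEPT_THRESHOLD = 0.80   # above this, accept the result outright
CONFIRM_THRESHOLD = 0.45  # between the thresholds, confirm with the caller

def next_action(confidence):
    """Decide how to handle an ASR result based on its confidence score."""
    if confidence >= ACCEPT_THRESHOLD:
        return "accept"       # high confidence: proceed without confirming
    if confidence >= CONFIRM_THRESHOLD:
        return "confirm"      # ambiguous: better to confirm than to err
    return "reprompt"         # low confidence: treat as a no-match

print(next_action(0.92))  # accept
print(next_action(0.60))  # confirm
print(next_action(0.20))  # reprompt
```

A dynamic variant would adjust the two constants per grammar, based on the score distributions observed in the call logs.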

Help Callers Avoid Frustration…<br />

Errors occur because callers get lost, or are unsure of what to say. In either case, this<br />

is usually a prompt issue. Effective prompt writing guides callers to say what is in the<br />

grammar. Another option is to make the grammar robust enough to handle reasonable<br />

requests, even when the caller's phrasing is clumsy.<br />

No matter the style, the prompt should always focus on the task at hand: moving the<br />

call forward.<br />

Sequential prompts should be connected with transitions, e.g., first, next, finally.<br />

Prompts can also use related questions, as in "Please tell me your account<br />

number…And your pin code."<br />

Although long prompts should generally be avoided, sometimes a few extra words pay<br />

off. Phrases like "Just to confirm," "Almost finished," and "So I can help you better"<br />

create a forward mental model…in these cases, reassuring the caller is more beneficial<br />

than the few seconds you gain by clipping the prompt.<br />

Whatever your company's style or needs, <strong>LumenVox</strong> can help.<br />

Solving Problems<br />

When an error happens, fix it! And don't let it happen<br />

again.<br />

Sounds simple enough, but how? And what will<br />

customers do in the meantime?<br />

Figuring out why an error happens is the key to fixing<br />

it. The feedback callers provide is often vague;<br />

instead, go straight for empirical data. Examine the<br />

context of the call to help pinpoint what caused the<br />

error: too much background noise; not enough<br />

volume; mispronunciation. Invariably, errors will<br />

occur. To provide great customer service through<br />

speech applications, we need to minimize errors, and<br />

make the caller as comfortable as possible when errors<br />

do occur.<br />

Error-handling interfaces often increase caller<br />

frustration. Recognize any of these?<br />

Adversarial error responses:<br />

"I need you to be more specific."<br />

Generic responses:<br />

"I'm sorry, I didn't understand."<br />

Annoyingly enthusiastic responses:<br />

"Let's try it again!"<br />
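One common way to avoid all three failure styles is an escalating reprompt ladder: each retry adds guidance, and the last rung hands the caller to a person. A minimal sketch; the department names and prompt wording below are invented for illustration.

```python
# Hypothetical sketch of an escalating error-prompt strategy: the first
# retry gives a specific cue, the second lists the options explicitly,
# and the third hands the caller to a live agent instead of looping.

ERROR_PROMPTS = [
    "Sorry, which department would you like: sales, support, or billing?",
    "You can say 'sales', 'support', or 'billing'.",
    "Let me connect you with an operator who can help.",
]

def error_prompt(attempt):
    """Return the reprompt for the given failed attempt (0-based)."""
    # Clamp to the last prompt so repeated failures always escalate out.
    return ERROR_PROMPTS[min(attempt, len(ERROR_PROMPTS) - 1)]

print(error_prompt(0))  # the specific, friendly first retry
print(error_prompt(5))  # any later failure goes to the operator
```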

When developing speech applications, most companies strive to<br />

achieve a balance between saving money on live agents and<br />

providing better service than traditional DTMF systems.<br />

Unfortunately, callers are sometimes uncomfortable with speech<br />

applications: they try to talk as they would to a human, or worse,<br />

speak in a stilted way, because they think computers will better<br />

process their requests. Successful experiences with your call<br />

system will help increase caller confidence⎯and when errors<br />

occur, callers will be more patient if they have had positive<br />

experiences in the past.<br />

Improving the caller experience is about good service.<br />

Thinking carefully about the error interface, designing<br />

effective prompts, testing the call system often, and<br />

examining the context of errors can help improve<br />

the caller's satisfaction with the speech<br />

application…and with your business.<br />

We are delighted that an industry<br />

leader like <strong>LumenVox</strong> has met the market<br />

demand with a product specialized for our<br />

TeleVantage platform.<br />

Rob Black, Product Marketing Manager of Vertical (formerly Artisoft)<br />


Voice Matters<br />

The voice of your speech application is your company's<br />

first representative. Choose the voice wisely. You're not<br />

just looking for a type of voice; you're looking for an<br />

emissary. The voice needs to be able to explain, inspire,<br />

soothe, excite, and above all else, sound sincere.<br />

Assuming the Voice User Interface (VUI) design is good, the prompts must be<br />

recorded to best represent that design. The four essential pieces for great<br />

prompt recording include:<br />

Great Casting<br />

Great Directing<br />

Great Concatenated Recording<br />

Great "Voice"<br />

Think about your desired call experience, and consider using professional<br />

voice talent rather than your receptionist. Professionals will provide<br />

better sounding prompts: they will know appropriate variations in<br />

pitch, rhythm, speed, duration of pauses, and elongation of words.<br />

In addition, professionals will avoid novice errors like wobble,<br />

nervous tics, sloppy diction, and colloquial pronunciations. Most<br />

importantly, talent is directable; professionals can respond to<br />

instructions regarding persona ideas and desired inflections, to<br />

create the proper tone for the application.<br />

Voice Talents versus Voice Actors<br />

Voice Talents are people who speak well, with good resonance<br />

and intonation. Voice Actors are people who are Voice Talents,<br />

but through the use of their voices alone, can also convey<br />

character, humor, sincerity, and meaning. Voice actors can take<br />

your application to the next level, providing a better experience<br />

for your callers.<br />

Voice Actors do not necessarily over-articulate everything. They<br />

stress the most important words and concepts, which often<br />

corresponds to what the system is trying to recognize. Voice Actors will<br />

also know how to record concatenated prompts with consistency. If<br />

feasible, using a Voice Actor will give your speech application polish, and<br />

this polish, combined with a well-designed prompt and grammar, will lead to<br />

more satisfied customers.<br />

Prompt Tips<br />

Never disguise the system as being a real person.<br />

Ensure that prompts elicit a predictable response.<br />

Offer the needed information at the right time,<br />

and try not to frontload the application with too<br />

much information.<br />

Never use "Press or Say 1" style prompts.<br />

List options from specific to general, so that<br />

users do not choose a general category when a<br />

more specific one will be offered later.<br />

Allow more time for difficult tasks: the pace of<br />

the recorded prompts dictates the pace at which<br />

the callers will respond.<br />

Keep the caller in the transaction by saying<br />

"first…next…finally" or similar transition words, in<br />

the appropriate places.<br />

Insert pauses in your prompts to allow<br />

experienced callers a turn-taking option to<br />

interrupt and move forward in the application.<br />

Provide audio rewards at the completion of<br />

difficult transactions, such as "great, wonderful,<br />

excellent..."<br />

<strong>LumenVox</strong>'s software proved to be<br />

very effective in precisely interpreting<br />

callers’ speech patterns.<br />

Chris Riggenbach CXM Product<br />

Development Manager<br />




Alternate Pronunciations<br />

One of the most useful speech applications today is the front-end call router; however, it's also one of the most challenging applications, because the system must recognize names. Sometimes names are derived from languages other than English, and the pronunciation reflects rules from the other language. Often these names contain sounds that are not apparent from the spelling, or the caller stresses a syllable that differs from the common pronunciation.<br />

Imagine a person looking at the name "Elicia." Is the initial sound a soft 'AX' as in "about", an 'EH' as in "bed", or is it stressed heavily with a long 'IY' as in "equal"? Is the third syllable pronounced with an 'S' sound or a 'SH'? The speech application needs help to determine this.<br />

If a word or name is not in the dictionary, the Speech Recognition<br />

Engine will try to figure out how that word is pronounced using a set<br />

of phonetic rules, similar to how a person might try sounding out the<br />

new word. Unfortunately, the Speech Engine is not always correct. A good rule of thumb: if a person has trouble figuring out how to pronounce a name, the speech engine will, too.<br />

Steps for Developing a Smooth Call Router:<br />

1. Figure out who the incoming callers are. Are they strangers, people who know the employees, or a combination of both? In other words, will the callers know how to pronounce the name correctly, or will other likely pronunciations need to be added? Do the callers refer to employees by first or last name only, and are they familiar enough to know people's nicknames?<br />

2. Find out what the Speech Engine thinks is the correct pronunciation, or whether other pronunciations are needed. You can use the Phonetic Speller located in the Speech Platform to see how the Speech Engine determines the pronunciations of the names, without having to run the Speech Engine itself.<br />

3. Add the new pronunciations for names into the grammar. Here's an illustration of this process: the name "Paty" (pronounced "Patty") is not a common spelling and is not in the Speech Engine's dictionary. When it is typed into the Phonetic Speller, the system returns 'P EY DX IY', which sounds something like "Paydee", instead of the correct 'P AE DX IY'. To add a new pronunciation, the phonemes must be placed within a set of curly braces. Adding a colon followed by the true spelling of the name helps readability, so it's a good idea to include it. The final entry of {P AE DX IY: Paty} as an alternative pronunciation should help the system's performance: callers who say "Patty" will now be more likely to be recognized, instead of matching the incorrect {P EY DX IY}.<br />

Phonemes<br />

The unit of sound the recognition engine actually recognizes is the phoneme. All phrase formats<br />

are ultimately translated into phonetic spellings for decoding. These phonetic spellings can be<br />

directly entered if surrounded by curly braces.<br />

The phonetic alphabet used by the American English language model is below.<br />

Considering alternate pronunciations and spellings at the outset will help avoid errors and frustration later!<br />

<table>
<tr><th>Phoneme</th><th>Example 1</th><th>Phonetic Spelling 1</th><th>Example 2</th><th>Phonetic Spelling 2</th></tr>
<tr><th colspan="5">Vowels</th></tr>
<tr><td>AA</td><td>barn</td><td>B AA R N</td><td>top</td><td>T AA P</td></tr>
<tr><td>AE</td><td>bat</td><td>B AE T</td><td>crab</td><td>K R AE B</td></tr>
<tr><td>AH</td><td>what</td><td>W AH T</td><td>cut</td><td>K AH T</td></tr>
<tr><td>AO</td><td>more</td><td>M AO R</td><td>auto</td><td>AO T OW</td></tr>
<tr><td>AW</td><td>cow</td><td>K AW</td><td>house</td><td>HH AW S</td></tr>
<tr><td>AX</td><td>about</td><td>AX B AW T</td><td>dial</td><td>D AY AX L</td></tr>
<tr><td>AXR</td><td>butter</td><td>B AH DX AXR</td><td>career</td><td>K AXR IH R</td></tr>
<tr><td>AY</td><td>type</td><td>T AY P</td><td>life</td><td>L AY F</td></tr>
<tr><td>EH</td><td>check</td><td>CH EH K</td><td>mess</td><td>M EH S</td></tr>
<tr><td>ER</td><td>church</td><td>CH ER CH</td><td>bird</td><td>B ER D</td></tr>
<tr><td>EY</td><td>take</td><td>T EY K</td><td>hail</td><td>HH EY L</td></tr>
<tr><td>IH</td><td>little</td><td>L IH DX AX L</td><td>rib</td><td>R IH B</td></tr>
<tr><td>IX</td><td>action</td><td>AE K SH IX N</td><td>women</td><td>W IH M IX N</td></tr>
<tr><td>IY</td><td>team</td><td>T IY M</td><td>keep</td><td>K IY P</td></tr>
<tr><td>OW</td><td>loan</td><td>L OW N</td><td>robe</td><td>R OW B</td></tr>
<tr><td>OY</td><td>hoist</td><td>HH OY S T</td><td>joy</td><td>JH OY</td></tr>
<tr><td>UH</td><td>book</td><td>B UH K</td><td>look</td><td>L UH K</td></tr>
<tr><td>UW</td><td>flew</td><td>F L UW</td><td>who</td><td>HH UW</td></tr>
<tr><th colspan="5">Consonants</th></tr>
<tr><td>B</td><td>web</td><td>W EH B</td><td>bear</td><td>B EH R</td></tr>
<tr><td>CH</td><td>chair</td><td>CH EY R</td><td>statue</td><td>S T AE CH UW</td></tr>
<tr><td>D</td><td>reed</td><td>R IY D</td><td>dark</td><td>D AA R K</td></tr>
<tr><td>DH</td><td>with</td><td>W IH DH</td><td>other</td><td>AH DH ER</td></tr>
<tr><td>DX</td><td>forty</td><td>F AO R DX IY</td><td>butter</td><td>B AH DX AXR</td></tr>
<tr><td>F</td><td>four</td><td>F AO R</td><td>graph</td><td>G R AE F</td></tr>
<tr><td>G</td><td>peg</td><td>P EH G</td><td>exam</td><td>IH G Z AE M</td></tr>
<tr><td>HH</td><td>halt</td><td>HH AO L T</td><td>Jose</td><td>HH OW Z EY</td></tr>
<tr><td>JH</td><td>cage</td><td>K EY JH</td><td>Jack</td><td>JH AE K</td></tr>
<tr><td>K</td><td>coin</td><td>K OY N</td><td>back</td><td>B AE K</td></tr>
<tr><td>L</td><td>late</td><td>L EY T</td><td>really</td><td>R IH L IY</td></tr>
<tr><td>M</td><td>lemon</td><td>L EH M AH N</td><td>mail</td><td>M EY L</td></tr>
<tr><td>N</td><td>night</td><td>N AY T</td><td>any</td><td>EH N IY</td></tr>
<tr><td>NG</td><td>ring</td><td>R IH NG</td><td>ankle</td><td>AE NG K AH L</td></tr>
<tr><td>P</td><td>pay</td><td>P EY</td><td>beep</td><td>B IY P</td></tr>
<tr><td>R</td><td>rest</td><td>R EH S T</td><td>prior</td><td>P R AY ER</td></tr>
<tr><td>S</td><td>sit</td><td>S IH T</td><td>bass</td><td>B AE S</td></tr>
<tr><td>SH</td><td>blush</td><td>B L AH SH</td><td>sure</td><td>SH UH R</td></tr>
<tr><td>T</td><td>raft</td><td>R AE F T</td><td>taped</td><td>T EY P T</td></tr>
<tr><td>TH</td><td>three</td><td>TH R IY</td><td>youth</td><td>Y UW TH</td></tr>
<tr><td>V</td><td>van</td><td>V AE N</td><td>river</td><td>R IH V AXR</td></tr>
<tr><td>W</td><td>swap</td><td>S W AA P</td><td>wing</td><td>W IH NG</td></tr>
<tr><td>Y</td><td>yes</td><td>Y EH S</td><td>year</td><td>Y IY R</td></tr>
<tr><td>Z</td><td>arms</td><td>AA R M Z</td><td>blaze</td><td>B L EY Z</td></tr>
<tr><td>ZH</td><td>Asian</td><td>EY ZH AH N</td><td>genre</td><td>ZH AA N R AH</td></tr>
</table>


Practical Guide To Tuning<br />

Untuned speech applications do not survive contact with customers:<br />

whether your company has live speech applications in deployment<br />

today, plans to implement one within the next three to six months, or is<br />

only beginning to consider adding speech applications, you should<br />

consider the importance of tuning. Tuning uses prompts, grammars,<br />

call flow, and caller data to improve the speech application as a whole.<br />

There are three ideas to keep in mind when approaching<br />

the tuning task:<br />

1 Tuning Takes Time.<br />

Even the best of "best-practices" build on assumptions that might not hold<br />

true after deployment: once you have callers, you must often readjust or<br />

remove these assumptions to provide the quality experience callers expect.<br />

To give an idea of how much time tuning can take, the speech industry<br />

estimates that 40-50% of total development and deployment time should<br />

be spent on the tuning process. Putting emphasis on tuning will help your<br />

application run more smoothly, keeping callers happy.<br />

2 Adapt the System to the Caller.<br />

In general, you will not be able to make users do anything in any particular way. You can, and should, give as much guidance for callers as possible, but ultimately the caller dictates the conversation. The trick is to provide good cues and guidelines, so callers choose the pathway you designed for the application. Remember that if the system fails to meet the caller's needs, it's not the caller who has failed; it's the speech application.<br />

3 Start with Small Changes.<br />

It is all too easy to get caught up in the moment, expending hours of effort on a seemingly enormous problem that really only affects a few out of several hundred callers. Identify the issues that are the easiest to resolve and provide the biggest benefit. Making small changes to improve the experience for most callers is preferable to costly changes that only benefit a few.<br />

Instead, try this process when tuning an application:<br />

1 Familiarize Yourself with the Caller's Experiences.<br />

Do this by listening to the calls, from start to finish. Compare the ASR results with respect to the audio prompts and the caller's speech. Transcribe the audio, so you can analyze the accuracy and performance.<br />

Use your ASR platform's reporting and analytical tools to maximize your information. You can even use <strong>LumenVox</strong> Speech Tuner on Nuance's 8.5 or ScanSoft's OSR.<br />

Above all, identify the key issues and prioritize them. Solve the easiest dilemmas first, like typical grammar problems. Then, move to prompt and dialogue changes, and finally proceed to acoustic model training and adaptations.<br />

2 Test Changes Rigorously.<br />

When you make a change, you must test it. You did the transcripts, and so you have the grammar and audio data: as much as possible, test under 'real' conditions. Give yourself the assurance that any change will help, and then test to find what solution works best.<br />

What you shouldn't do when tuning a speech application:<br />

Don’t Make Changes Based on One Instance.<br />

This should be fairly obvious, but it still<br />

happens. Making changes based on a single<br />

instance usually results in fixing a problem that<br />

doesn't really exist. There are numerous<br />

'one-off' errors in speech recognition, many of<br />

which are associated with noise, or transient<br />

effects that won't be generally reproducible.<br />

Real issues will arise multiple times, in multiple<br />

places, with plenty of evidence to help you<br />

decide how to solve them.<br />
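That advice can be made mechanical: tally how often each diagnosed cause appears in the call logs, and only act on causes seen more than once. A small sketch, with invented log fields:

```python
# Sketch of a frequency check before tuning: group logged errors by cause
# and act only on those seen more than once. Field names are hypothetical.

from collections import Counter

def recurring_issues(error_log, min_count=2):
    """Return (cause, count) pairs for errors seen min_count or more times."""
    counts = Counter(entry["cause"] for entry in error_log)
    return [(cause, n) for cause, n in counts.most_common() if n >= min_count]

log = [
    {"call": 101, "cause": "missing synonym 'Xerox'"},
    {"call": 102, "cause": "background noise"},      # a one-off
    {"call": 103, "cause": "missing synonym 'Xerox'"},
]
print(recurring_issues(log))  # [("missing synonym 'Xerox'", 2)]
```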

Don’t Make Changes on Unanalyzed Reports.<br />

Treat the report with respect: analyze the call,<br />

compare it with other calls, see what really<br />

happened; often, the system worked as<br />

designed, but the design was flawed. Research<br />

the problem carefully so that you avoid<br />

unnecessary (and costly) changes.<br />

The <strong>LumenVox</strong> support team is<br />

always available and always willing to go<br />

the extra mile to provide us with<br />

excellent support.<br />

Derek Henry, CEO of 1-800-US-LOTTO Corporation<br />


Tuning Grammars<br />

There are many places to make effective changes, but generally, we have<br />

found grammars to be the easiest and most effective place to start. In<br />

this segment, we will look at how to detect errors and modify grammars<br />

to optimize performance.<br />

Grammar Terms<br />

In-Grammar (IG) and Out-of-Grammar (OOG) are labels that look at whether the ASR matches<br />

a path in the grammar with what the caller actually said. If it can, then the spoken words are<br />

In-Grammar, if not, the spoken words are considered Out-of-Grammar.<br />

Confidence scores indicate the ASR's certainty about the answer it returns.<br />

Confirmations are dialog techniques to help the speech application avoid making a mistake, in<br />

cases where the results are ambiguous.<br />

Substitutions are a particular kind of error the ASR makes; this occurs when the result from the<br />

ASR does not match the words the caller said.<br />

<strong>LumenVox</strong>’s development and support staff has been very responsive to our requirements and issues.<br />
Out-Of-Grammar Indicators<br />

There are a number of ways to determine whether or not an error is due to an Out-of-Grammar issue. The easiest, most efficient way to discover this is to use the tools provided by the platform or ASR provider. Pre-configured reports will usually highlight these issues up front.<br />

If Out-of-Grammar is a big problem, you will likely receive many customer complaints and low completion and usage rates. Call logs will also have many "No Matches" or empty results. Finally, when you do obtain results, the confidence scores will be significantly lower than in the rest of the application.<br />

As the call logs are transcribed, look for low accuracies. ASRs will usually get anything that is In-Grammar right, so if accuracy is still low, look for a large amount of Out-of-Grammar speech. The other good indicator is a high Out-of-Grammar rate itself: generally, this is a direct indication that callers are not saying things your grammar understands.<br />

There are a few reasons for Out-of-Grammar problems, most of which are easy to resolve. One common reason is that the grammar designer simply forgot to add items that callers are asking for: leaving "next" out of a navigation grammar, or forgetting to add a product name, is not uncommon. Another common error is forgetting common synonyms, for example, 'copier' but not 'Xerox', or different dialectal versions such as 'soda' in the West and 'pop' in the South. Usually, you can just add the missing items.<br />
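As a rough sketch of measuring this, each transcribed utterance can be checked against the set of phrases the grammar accepts; anything outside the set counts as Out-of-Grammar. The phrase list is invented, and a real grammar defines multi-word paths that a flat set lookup only approximates.

```python
# Sketch of estimating the Out-of-Grammar (OOG) rate from transcribed
# calls. GRAMMAR_PHRASES stands in for the paths a real grammar accepts.

GRAMMAR_PHRASES = {"sales", "support", "billing", "operator"}

def oog_rate(transcriptions):
    """Fraction of transcribed utterances the grammar cannot match."""
    if not transcriptions:
        return 0.0
    misses = sum(1 for t in transcriptions
                 if t.strip().lower() not in GRAMMAR_PHRASES)
    return misses / len(transcriptions)

calls = ["sales", "next", "Support", "the billing department"]
print(oog_rate(calls))  # 0.5 -> "next" and "the billing department" are OOG
```

A high rate here points at the fixes above: add the missing items and synonyms, or rewrite the prompt so callers stay in grammar.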

In-Grammar Indicators<br />

Typical In-Grammar issues will often be oriented towards improving the confidence scores of<br />

recognition, but you will also confront misrecognitions.<br />

Doug Behl, President of Malibu Software<br />

What kinds of issues arise frequently?<br />

Regularly confused phrases are fairly common, and often result because two or more phrases sound<br />

quite similar. Another common issue is the result of bad pronunciations in the grammar. ASRs<br />

provide a methodology for arriving at pronunciations for words that aren't in the dictionary, but the<br />

automatic pronunciations are not always the best.<br />

So, how can we handle this?<br />

For regularly confused phrases, differentiate them by choosing alternate ways to describe the<br />

words. For bad pronunciations, you must add the pronunciations to the dictionary or grammar.<br />

There are tools for helping with this task, although they are nearly always ASR-specific.<br />
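Conceptually, regularly confused phrases can be surfaced from transcribed results by tallying (said, recognized) substitution pairs and keeping the repeat offenders. The sketch below is illustrative, with invented sample data; it is not a LumenVox API.

```python
# Sketch for spotting regularly confused phrases: tally substitution
# pairs from transcribed results and surface those seen repeatedly.

from collections import Counter

def confusion_pairs(results, min_count=2):
    """Return substitution pairs (said, recognized) seen min_count+ times."""
    subs = Counter((said, heard) for said, heard in results if said != heard)
    return [pair for pair, n in subs.most_common() if n >= min_count]

results = [
    ("Austin", "Boston"), ("sales", "sales"),
    ("Austin", "Boston"), ("billing", "mailing"),
]
print(confusion_pairs(results))  # [('Austin', 'Boston')]
```

A pair that keeps appearing is a candidate for renaming one of the options, or for adding a corrected pronunciation to the dictionary or grammar.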



Failures and Fixes: Common<br />

Prompt Tuning Issues<br />

Effective prompt design takes time and practice⎯some errors will not<br />

present themselves until the prompts are tested. There are, however,<br />

some common issues that arise when tuning prompts, which should<br />

help streamline your prompt design and tuning process.<br />

Here's what you should do when callers…<br />

…Give Long, Perplexing Answers<br />

Callers continually give full-sentence answers instead of short, to-the-point answers. Typically,<br />

this occurs because the prompt asks a very open-ended question, such as "How may I help<br />

you?" Avoid these open-ended prompts; callers usually do not know what responses are<br />

appropriate at particular points in the call. The only real solution when this error occurs is to<br />

redesign the prompt to be more specific, or redesign the interaction to focus caller responses<br />

on specific tasks.<br />

…Answer with Out-of-Grammar (OOG) Phrases<br />

Callers regularly use a particular phrase that is not in the grammar. Prompts are designed to<br />

elicit particular pieces of information from the caller. Because of this, the prompts usually try to<br />

lead the caller to using the correct words or phrases to minimize recognition errors and caller<br />

confusion. When callers regularly use Out-of-Grammar phrases, it's usually because the prompt<br />

leads them to the wrong phrases. Two choices are available: include the Out-of-Grammar<br />

phrases, or revise the prompt to more obviously reflect available options.<br />

…Answer Randomly, or 'Hunt' for the Right Phrase<br />

Unclear, incomplete prompts force the caller to search unnecessarily for the correct response.<br />

Adding clarifying information will generally fix this problem.<br />

…Answer 'Yes' or 'No,' Instead of Expected Content<br />

If callers respond 'yes' or 'no' when the prompt requests a content word, a poorly designed prompt is responsible. For example, the prompt might ask a question like "Do you want..." or "Would you like..." and pause after the first choice. The pause can be long enough to make the caller believe the desired answer is yes or no, rather than a choice from a list. Similarly, multi-item lists may pause too long at later points, as in, "Would you like pizza, soft drinks, or side items?" Generally, we recommend that you reserve "Do" and "Would" lead-offs for questions that require yes or no answers.<br />

Transcription and Training<br />

Humans are exceptionally good at speech processing. We handle a variety of accents, speaking styles, pitch differences, noise, and more, with a high degree of accuracy. No two speakers say the same word exactly the same way. Needless to say, this represents a considerable challenge for automated speech recognizers. Accurate transcription and tuning is essential.<br />

Every speech recognizer uses a statistical model of speech in order to perform good recognition. These models are built during training, where speech audio and text transcriptions are combined with algorithms that 'learn' how speech sounds. The models attempt to determine what 'average' speakers sound like when they speak particular words, and apply that knowledge to new incoming speech to determine what words were spoken.<br />

Words and speaking styles are different for every application domain (i.e., the vocabulary for a travel system is quite different from that of a financial application). Speech applications benefit from acoustic models specially trained with data from their specific domains.<br />

Transcribing audio data must be exact, word for word, and include noise tags so the system can learn the differences between noise and speech. The data must include as many speakers (both male and female) as possible, so that the new acoustic models accurately reflect the average speaker, and not just one or two particular speakers. With a larger volume of transcribed audio data, new models will perform better. New acoustic models will likely require a new round of tuning, particularly with respect to confirmation thresholds.<br />

<strong>LumenVox</strong> support is very fast and accessible.<br />

Kelly Lumpkin, CEO of Alternate Access<br />


Standards and Systems Supported<br />

Industry Standards<br />

VXML/VoiceXML (Voice eXtensible Markup<br />

Language)<br />

SALT (Speech Application Language Tags)<br />

MRCP (Media Resource Control Protocol)<br />

SRGS (Speech Recognition Grammar<br />

Specification)<br />

SAPI 5 (Microsoft SAPI TTS)<br />

Operating Systems Supported<br />

Linux<br />

Windows NT, 2000, XP, 2003<br />

Telephony Standards<br />

PSTN (Public Switched Telephone<br />

Network)<br />

SIP (Session Initiation Protocol) -<br />

Signaling protocol for Internet<br />

conferencing, telephony, and instant<br />

messaging<br />

VoIP (Voice Over Internet Protocol) - Protocol to send audio and data information in digital form<br />

DM3 - Intel® series of boards<br />

HMP (Host Media Processing) -<br />

Software that performs media<br />

processing tasks based on Intel®<br />

architecture<br />

Global Call - Protocol for handling the<br />

call control interface for Intel® cards<br />

<strong>LumenVox</strong> allows us to have control<br />

over the development of call flow,<br />

grammars and tuning, while keeping the<br />

costs at a respectable level.<br />

Brian Lauzon<br />

President of TelASK<br />

University Research<br />

Carnegie Mellon University<br />

Performs research for all aspects of speech<br />

recognition, including signal processing,<br />

acoustic model training, language model<br />

training, decoding, spoken language parsing<br />

and interface building.<br />

University of Colorado, Boulder<br />

Focused on research and education in areas<br />

of human communication technology.<br />

Oregon University<br />

Center for Spoken Language Synthesis, Recognition and Enhancement.<br />

Stanford University<br />

The Center for the Study of Language and<br />

Information (CSLI) is an Independent<br />

Research Center founded in 1983 by<br />

researchers from Stanford University,<br />

SRI International, and Xerox PARC.<br />

UC Berkeley<br />

The major application area researched in<br />

the Speech Group at ICSI is speech<br />

recognition, although some of this work has<br />

led to basic research in auditory processing.<br />

M.I.T.<br />

Computer Science and Artificial Intelligence<br />

Laboratory that focuses on Spoken<br />

Language Systems.<br />

Other Speech Groups<br />

Carnegie Mellon Speech Group<br />

Develops user interfaces that improve<br />

human-computer and human-human<br />

communication.<br />

IEICE<br />

The Institute of Electronics, Information and<br />

Communication Engineers aims at the<br />

investigation and exchange of knowledge on<br />

the science and technology of electronics,<br />

information, and communications.<br />

Speech Publications<br />

Speech Technology Magazine<br />

Leading source of information promoting the<br />

speech technology solutions that are<br />

changing communications, and reports the<br />

technology needs of organizations<br />

worldwide.<br />

Telephone Strategy News<br />

Newsletter that includes full coverage of the<br />

impact of the Voice User Interface on<br />

telephony.<br />

ASRNews<br />

Monthly newsletter, which tracks the latest<br />

development in the speech recognition and<br />

text-to-speech marketplace.<br />

Call Center And Telephony<br />

Technology Marketing<br />

Corporation<br />

TMC publishes industry-leading print<br />

magazines including Internet Telephony,<br />

Customer Inter@ction Solutions, and<br />

Communications Solutions.<br />

CMP Media<br />

Leading multimedia company that prints<br />

numerous magazines, including Call<br />

Center and Communications Convergence.<br />

Business Communications<br />

Review Magazine<br />

Magazine for the enterprise network<br />

manager and other communications<br />

professionals.


Standards and Systems Supported<br />

Industry Standards<br />

VXML/VoiceXML (Voice eXtensible Markup Language)<br />

SALT (Speech Application Language Tags)<br />

MRCP (Media Resource Control Protocol)<br />

SRGS (Speech Recognition Grammar Specification)<br />

SAPI 5 (Microsoft SAPI TTS)<br />
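To show how two of these standards fit together, here is a minimal sketch of a VoiceXML 2.0 dialog containing an inline SRGS grammar. The field name, prompts, and department choices are illustrative only and are not taken from any LumenVox application:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="transfer">
    <!-- The field collects one caller utterance and matches it
         against the grammar below -->
    <field name="department">
      <prompt>Which department would you like: sales or support?</prompt>
      <!-- Inline SRGS grammar: the recognizer returns "sales" or "support" -->
      <grammar version="1.0" xmlns="http://www.w3.org/2001/06/grammar"
               root="dept">
        <rule id="dept">
          <one-of>
            <item>sales</item>
            <item>support</item>
          </one-of>
        </rule>
      </grammar>
      <filled>
        <prompt>Transferring you to <value expr="department"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
```

A VoiceXML browser interprets the dialog and hands the grammar to an MRCP-capable recognizer such as the Speech Engine, so the same grammar can be reused across platforms that support these standards.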

Operating Systems Supported<br />

Linux<br />

Windows NT, 2000, XP, 2003<br />

Telephony Standards<br />

PSTN (Public Switched Telephone Network)<br />

SIP (Session Initiation Protocol) - Signaling protocol for Internet conferencing, telephony, and instant messaging<br />

VoIP (Voice over Internet Protocol) - Protocol for sending audio and data in digital form over IP networks<br />

DM3 - Intel® series of boards<br />

HMP (Host Media Processing) - Software that performs media processing tasks based on Intel® architecture<br />

Global Call - Protocol for handling the call control interface for Intel® cards<br />
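As a concrete illustration of SIP signaling, a call to a speech application begins with an INVITE request in the style defined by RFC 3261. All hosts, tags, and identifiers below are made-up example values:

```
INVITE sip:ivr@example.com SIP/2.0
Via: SIP/2.0/UDP client.example.com:5060;branch=z9hG4bK74bf9
From: Caller <sip:caller@example.com>;tag=1928301774
To: IVR <sip:ivr@example.com>
Call-ID: a84b4c76e66710@client.example.com
CSeq: 1 INVITE
Contact: <sip:caller@client.example.com>
Content-Type: application/sdp
Content-Length: 142
```

The SDP body (not shown) negotiates the audio streams that VoIP then carries; the platform answers with SIP responses such as 180 Ringing and 200 OK before media flows.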

"<strong>LumenVox</strong> allows us to have control over the development of call flow, grammars and tuning, while keeping the costs at a respectable level."<br />

Brian Lauzon, President of TelASK<br />

University Research<br />

Carnegie Mellon University<br />

Performs research in all aspects of speech recognition, including signal processing, acoustic model training, language model training, decoding, spoken language parsing, and interface building.<br />

University of Colorado, Boulder<br />

Focused on research and education in areas of human communication technology.<br />

Oregon University<br />

Center for Spoken Language Synthesis, Recognition, and Enhancement.<br />

Stanford University<br />

The Center for the Study of Language and Information (CSLI) is an independent research center founded in 1983 by researchers from Stanford University, SRI International, and Xerox PARC.<br />

UC Berkeley<br />

The major application area researched in the Speech Group at ICSI is speech recognition, although some of this work has led to basic research in auditory processing.<br />

M.I.T.<br />

Computer Science and Artificial Intelligence Laboratory that focuses on Spoken Language Systems.<br />

Other Speech Groups<br />

Carnegie Mellon Speech Group<br />

Develops user interfaces that improve human-computer and human-human communication.<br />

IEICE<br />

The Institute of Electronics, Information and Communication Engineers aims at the investigation and exchange of knowledge on the science and technology of electronics, information, and communications.<br />

Speech Publications<br />

Speech Technology Magazine<br />

Leading source of information on the speech technology solutions that are changing communications, reporting on the technology needs of organizations worldwide.<br />

Telephone Strategy News<br />

Newsletter that includes full coverage of the impact of the Voice User Interface on telephony.<br />

ASRNews<br />

Monthly newsletter that tracks the latest developments in the speech recognition and text-to-speech marketplace.<br />

Call Center and Telephony<br />

Technology Marketing Corporation<br />

TMC publishes industry-leading print magazines including Internet Telephony, Customer Inter@ction Solutions, and Communications Solutions.<br />

CMP Media<br />

Leading multimedia company that prints numerous magazines, including Call Center and Communications Convergence.<br />

Business Communications Review<br />

Magazine for the enterprise network manager and other communications professionals.
