06.03.2015 Views

1 - LumenVox


Tuning Processes<br />

<strong>LumenVox</strong>'s Speech Tuner provides full support for <strong>LumenVox</strong>'s Speech<br />

Recognition Engine, Nuance 8.5, ScanSoft OSR 2, and other ASRs. The<br />

Speech Tuner allows you to work with any supported ASR via a single<br />

interface.<br />

<strong>LumenVox</strong> is an active supporter of the Tools committee in the VXML<br />

Forum, and is working to help define standard logging information, to<br />

help ease the tuning process.<br />

The tuning process involves three easy steps:<br />

Import Data.<br />


The basic process is simple. Users import call log data into<br />

the Speech Tuner database. All information stored by the call<br />

log is available in the Speech Tuner. In most cases, log fields<br />

between ASR engines are very similar; when the information<br />

differs, every effort is made to preserve the original data.<br />

Each special case is fully documented.<br />

Transcribe Speech.<br />

Transcribers can type the text of the caller's speech directly<br />

into the Speech Tuner. Once the audio is transcribed, the<br />

Tuner compares audio transcripts with the speech engine<br />

results to determine accuracy, greatly reducing errors<br />

associated with hand evaluations. If semantic interpretations<br />

are available, the transcriber can also mark whether the<br />

semantic interpretation was correct or incorrect. The<br />

transcripts are evaluated using the actual decode grammar,<br />

producing measurements such as word-error-rate, in- and<br />

out-of-grammar rates, and semantic error rates.<br />
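As an illustration of the word-error-rate measurement mentioned above, the following is a minimal sketch of the standard definition: the word-level edit distance between the human transcript and the ASR result, divided by the number of words in the transcript. The function name `word_error_rate` is hypothetical, and this is not the Speech Tuner's own scoring code. In particular, it ignores the decode grammar.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    reference  -- the human transcript of the caller's speech
    hypothesis -- the text returned by the speech engine
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from the result
            insertion = curr[j - 1] + 1  # extra word in the result
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word transcript.
    print(word_error_rate("call customer service please", "call customer services"))
```

Because insertions count as errors, the value can exceed 1.0 when the engine returns many more words than the caller spoke.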

Test Immediately.<br />

Selecting an interaction in the Call Log automatically loads<br />

the associated audio and grammar into the Tester. The<br />

grammar can be edited, speech engine parameters set, and<br />

individual recognition tests generated. The Speech Tuner<br />

natively supports industry standard SRGS grammars. Once<br />

a set of possible changes is identified, users can batch test<br />

audio to evaluate performance, using those changes.<br />

The Speech Tuner assumes the user possesses licensed<br />

versions of the relevant ASR, that the ASR platform is up and<br />

running, and that the platform is able to accept connections.<br />

<strong>LumenVox</strong> Speech Tuner Database<br />

The Speech Tuner communicates with an open-source, freeware database called SQLite<br />

(www.sqlite.org). The Speech Tuner manages call log importing, searching, and exporting, so<br />

users can focus on the task of tuning, not log management. The database is contained in a single<br />

file, is easy to back up and transport, and can be queried using SQL-92 (see the SQLite website for<br />

full details) from a variety of external tools. Other speech engine vendors are free to convert their<br />

native logs to a format the Speech Tuner understands. The format, content, and semantics of the <strong>LumenVox</strong><br />

Speech Tuner database are published.<br />
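Since the database is a single SQLite file, it can be queried from an external tool such as Python's standard `sqlite3` module. The sketch below assumes a hypothetical table named `interactions` with columns `grammar`, `transcript`, and `asr_result`. These names are invented for illustration and are not the published LumenVox schema. An in-memory database stands in for a real Speech Tuner file.

```python
import sqlite3


def error_rate_by_grammar(conn: sqlite3.Connection) -> dict:
    """Fraction of transcribed interactions whose ASR result differs
    from the transcript, grouped by grammar (hypothetical schema)."""
    rows = conn.execute(
        """
        SELECT grammar,
               COUNT(*) AS total,
               SUM(CASE WHEN transcript <> asr_result THEN 1 ELSE 0 END) AS errors
        FROM interactions
        WHERE transcript IS NOT NULL   -- skip interactions not yet transcribed
        GROUP BY grammar
        ORDER BY grammar
        """
    ).fetchall()
    return {grammar: errors / total for grammar, total, errors in rows}


def demo_connection() -> sqlite3.Connection:
    """Build a small in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE interactions (grammar TEXT, transcript TEXT, asr_result TEXT)"
    )
    conn.executemany(
        "INSERT INTO interactions VALUES (?, ?, ?)",
        [
            ("yes_no.grxml", "yes", "yes"),
            ("yes_no.grxml", "no", "yes"),
            ("digits.grxml", "one two", "one two"),
            ("digits.grxml", None, "four"),  # not yet transcribed
        ],
    )
    return conn


if __name__ == "__main__":
    print(error_rate_by_grammar(demo_connection()))
```

To run the same query against a real file, pass its path to `sqlite3.connect` and replace the table and column names with those from the published schema.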

The database maintains all the information contained in the original call log. The Speech Tuner<br />

includes not only the decode grammar and ASR results, but also the decode platform, parameter<br />

settings, alternative results, prompt audio, and pre- and post-processed audio.<br />

Depending on the platform logging capabilities, the database can provide more advanced<br />

information, such as ASR result alignments within the audio; the list of phonemes used in the<br />

decode; and word, utterance, and semantic interpretation confidence measurements.<br />

In addition, the Tuner stores all transcripts and evaluations within the call log. As transcripts are<br />

entered into the Speech Tuner, they are automatically evaluated against the decode grammar.<br />

These transcripts, and any notes or additional information, are stored directly into the database.<br />

Individual scores (such as word error rate, semantic error rate, and in- and out-of-grammar<br />

measurements) are stored along with their alignments, as well as information about how the scores<br />

were reached.<br />

Users can generate a variety of reports from these results, including error rate by grammar or<br />

dialog, confusion matrices, transcription progress, and confidence thresholds for confirmation or<br />

rejection settings.<br />
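One way to picture a confidence-threshold report is as a sweep over candidate rejection thresholds, counting how often a wrong result would be accepted and a correct result rejected at each one. The sketch below assumes confidence scores between 0 and 1 and a per-result correct/incorrect flag taken from transcription. The function name `threshold_report` and the data layout are hypothetical, not the Speech Tuner's report format.

```python
def threshold_report(results, thresholds):
    """Compute false-accept and false-reject rates per threshold.

    results    -- iterable of (confidence, is_correct) pairs
    thresholds -- candidate rejection thresholds; a result is accepted
                  when its confidence is >= the threshold
    """
    results = list(results)
    total_correct = sum(1 for _, ok in results if ok)
    total_wrong = len(results) - total_correct

    report = []
    for t in thresholds:
        # Wrong results that would still be accepted at this threshold.
        false_accepts = sum(1 for conf, ok in results if conf >= t and not ok)
        # Correct results that would be rejected at this threshold.
        false_rejects = sum(1 for conf, ok in results if conf < t and ok)
        report.append(
            {
                "threshold": t,
                "false_accept_rate": false_accepts / total_wrong if total_wrong else 0.0,
                "false_reject_rate": false_rejects / total_correct if total_correct else 0.0,
            }
        )
    return report


if __name__ == "__main__":
    sample = [(0.9, True), (0.8, True), (0.6, False), (0.4, True), (0.3, False)]
    for row in threshold_report(sample, [0.5, 0.7]):
        print(row)
```

Raising the threshold generally lowers the false-accept rate and raises the false-reject rate, so the report helps pick a value that balances the two for a given dialog.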

In the future, <strong>LumenVox</strong>'s Speech Tuner will also support back-end database replacement, for use in<br />

enterprise level systems, where multiple users will be analyzing the same data simultaneously.<br />

Companies that use an ODBC-capable database can replace, with certain SQL changes, the disk-based<br />

SQLite system with an enterprise system such as MS SQL Server 2000, MySQL, PostgreSQL,<br />

or Oracle.<br />

<strong>LumenVox</strong> has created speech<br />

recognition products that are easy to<br />

code with and GUI-based tools, such as<br />

the new Speech Tuner that greatly<br />

simplifies post-deployment<br />

maintenance.<br />

Vern Baker<br />

President of enGenic<br />

Corporation<br />
