1 - LumenVox
Tuning Processes
LumenVox's Speech Tuner fully supports the LumenVox Speech Recognition Engine, Nuance 8.5, ScanSoft OSR 2, and other automatic speech recognition (ASR) engines, so you can work with any supported ASR through a single interface.
LumenVox actively supports the Tools Committee of the VXML Forum and is helping to define standard logging information that will ease the tuning process.
The tuning process involves three easy steps:<br />
1. Import Data.
The basic process is simple: users import call log data into the Speech Tuner database, and all information stored in the call log becomes available in the Speech Tuner. Log fields are usually very similar across ASR engines; where they differ, every effort is made to preserve the original data, and each special case is fully documented.
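As a rough illustration of how engine-specific log records might be brought into one schema without losing anything, the sketch below maps a few fields to common names and keeps a full copy of the original record alongside them. The engine names, field names, `FIELD_MAP`, and `normalize` are all hypothetical; they do not describe the actual Speech Tuner import format.

```python
from dataclasses import dataclass, field
from typing import Any

# Hypothetical field mappings: engine-specific log field -> common name.
# These names are invented for illustration only.
FIELD_MAP: dict[str, dict[str, str]] = {
    "engine_a": {"utt_audio": "audio_path", "grammar_uri": "grammar", "result_text": "asr_result"},
    "engine_b": {"waveform": "audio_path", "grammar": "grammar", "top_hyp": "asr_result"},
}


@dataclass
class Interaction:
    engine: str
    common: dict[str, Any]  # fields with a shared meaning across engines
    original: dict[str, Any] = field(default_factory=dict)  # untouched copy of the raw record


def normalize(engine: str, record: dict[str, Any]) -> Interaction:
    """Map known fields to common names; keep every original field verbatim."""
    mapping = FIELD_MAP[engine]
    common = {mapping[key]: value for key, value in record.items() if key in mapping}
    return Interaction(engine=engine, common=common, original=dict(record))


if __name__ == "__main__":
    rec = normalize("engine_b", {"waveform": "call1_utt3.wav", "top_hyp": "yes", "nbest_conf": 0.87})
    print(rec.common)                  # {'audio_path': 'call1_utt3.wav', 'asr_result': 'yes'}
    print(rec.original["nbest_conf"])  # 0.87: engine-specific field with no common equivalent
```

Keeping the raw record means a field with no common equivalent (here the hypothetical `nbest_conf`) is still available later, in the spirit of preserving the original data.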
2. Transcribe Speech.
Transcribers type the text of the caller's speech directly into the Speech Tuner. Once the audio is transcribed, the Tuner compares the transcripts with the speech engine results to determine accuracy, greatly reducing the errors associated with evaluating results by hand. If semantic interpretations are available, the transcriber can also mark each interpretation as correct or incorrect. The transcripts are evaluated against the actual decode grammar, producing measurements such as word error rate, in- and out-of-grammar rates, and semantic error rates.
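Word error rate is conventionally computed from a minimum-edit-distance (Levenshtein) alignment between the transcript and the recognizer's output: substitutions, deletions, and insertions divided by the number of transcribed words. The sketch below uses that textbook definition with simple lowercase, whitespace tokenization; the Speech Tuner's own scoring and normalization rules are not described here and may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript has no words")
    # d[i][j]: fewest edits turning the first i reference words into the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
    return d[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # Transcript vs. ASR result: one deleted word out of four.
    print(word_error_rate("transfer to billing please", "transfer billing please"))  # 0.25
    print(word_error_rate("yes", "no"))  # 1.0
```

Because insertions count as errors, WER can exceed 100% when the recognizer outputs more words than the caller spoke.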
3. Test Immediately.
Selecting an interaction in the Call Log automatically loads the associated audio and grammar into the Tester, where users can edit the grammar, set speech engine parameters, and run individual recognition tests. The Speech Tuner natively supports industry-standard SRGS grammars. Once a set of candidate changes has been identified, users can batch-test audio to evaluate performance with those changes applied.
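A full batch test re-runs recognition on the stored audio through the licensed ASR, which cannot be reproduced in a standalone snippet. The sketch below covers one part of the comparison that needs no engine: it extracts the phrases from two versions of a minimal SRGS grammar and reports how many transcripts each version covers (the in-grammar rate). It handles only a flat `<one-of>` list of literal `<item>` phrases; rule references, repeats, and semantic tags are ignored. The grammars and transcripts are made up for illustration.

```python
import xml.etree.ElementTree as ET

SRGS_NS = {"g": "http://www.w3.org/2001/06/grammar"}  # SRGS 1.0 XML namespace


def normalize_phrase(text: str) -> str:
    """Lowercase and collapse whitespace so phrases compare reliably."""
    return " ".join(text.lower().split())


def phrases_from_srgs(xml_text: str) -> set[str]:
    """Collect literal phrases from <item> elements of a flat SRGS grammar.

    Only the text directly inside each <item> is used; rule references,
    repeats, and <tag> contents are ignored in this sketch.
    """
    root = ET.fromstring(xml_text)
    return {
        normalize_phrase(item.text)
        for item in root.iterfind(".//g:item", SRGS_NS)
        if item.text and item.text.strip()
    }


def in_grammar_rate(transcripts: list[str], phrases: set[str]) -> float:
    """Fraction of transcripts that exactly match a grammar phrase."""
    hits = sum(normalize_phrase(t) in phrases for t in transcripts)
    return hits / len(transcripts)


BASELINE = """<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0" root="yesno">
  <rule id="yesno"><one-of><item>yes</item><item>no</item></one-of></rule>
</grammar>"""

CANDIDATE = """<grammar xmlns="http://www.w3.org/2001/06/grammar" version="1.0" root="yesno">
  <rule id="yesno"><one-of>
    <item>yes</item><item>no</item><item>yeah</item><item>nope</item>
  </one-of></rule>
</grammar>"""

# Hypothetical transcripts from the call log for this dialog state.
transcripts = ["yes", "yeah", "no", "Nope", "operator"]

for name, grammar in [("baseline", BASELINE), ("candidate", CANDIDATE)]:
    rate = in_grammar_rate(transcripts, phrases_from_srgs(grammar))
    print(f"{name}: {rate:.0%} in-grammar")
# baseline: 40% in-grammar
# candidate: 80% in-grammar
```

A higher in-grammar rate only shows that the grammar now covers what callers said; whether recognition accuracy improves still has to be confirmed by re-running the audio through the ASR.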
The Speech Tuner assumes that the user has a licensed copy of the relevant ASR, that the ASR platform is up and running, and that the platform can accept connections.
LumenVox Speech Tuner Database
The Speech Tuner uses SQLite (www.sqlite.org), a free, open-source database engine. The Speech Tuner manages call log importing, searching, and exporting, so users can focus on tuning rather than on log management. The database is contained in a single file, is easy to back up and transport, and can be queried using SQL-92 from a variety of external tools (see the SQLite website for full details). Other speech engine vendors are free to convert their native logs into a format the Speech Tuner understands. The format, content, and semantics of the LumenVox Speech Tuner database are published.
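Because the database is a single SQLite file, any tool with an SQLite driver can query it. The sketch below uses Python's built-in `sqlite3` module against an in-memory database with a hypothetical `interaction` table; the table and column names are assumptions for illustration, not the published Speech Tuner schema. The query counts, per grammar, how many interactions still lack a transcript.

```python
import sqlite3

# In practice this would be the path to the Tuner's single database file;
# an in-memory database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")

# Hypothetical table for illustration only; not the published Speech Tuner schema.
conn.execute(
    """CREATE TABLE interaction (
        id INTEGER PRIMARY KEY,
        call_id TEXT NOT NULL,
        grammar TEXT NOT NULL,
        asr_result TEXT,
        transcript TEXT  -- NULL until a transcriber enters it
    )"""
)
conn.executemany(
    "INSERT INTO interaction (call_id, grammar, asr_result, transcript) VALUES (?, ?, ?, ?)",
    [
        ("call-001", "yesno.grxml", "yes", "yes"),
        ("call-001", "main_menu.grxml", "billing", "billing"),
        ("call-002", "yesno.grxml", "no", "yeah"),
        ("call-002", "main_menu.grxml", "sales", None),
    ],
)

# Per-grammar transcription progress. In SQLite, "transcript IS NULL" evaluates to 1 or 0.
rows = conn.execute(
    """SELECT grammar, COUNT(*) AS total, SUM(transcript IS NULL) AS untranscribed
       FROM interaction
       GROUP BY grammar
       ORDER BY grammar"""
).fetchall()
for grammar, total, untranscribed in rows:
    print(f"{grammar}: {untranscribed} of {total} untranscribed")
```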
The database maintains all the information contained in the original call log: not only the decode grammar and ASR results, but also the decode platform, parameter settings, alternative results, prompt audio, and pre- and post-processed audio.
Depending on the platform's logging capabilities, the database can also provide more advanced information, such as the alignment of ASR results within the audio; the list of phonemes used in the decode; and word, utterance, and semantic interpretation confidence measurements.
In addition, the Tuner stores all transcripts and evaluations with the call log. As transcripts are entered into the Speech Tuner, they are automatically evaluated against the decode grammar, and the transcripts, along with any notes or additional information, are stored directly in the database. Individual scores (such as word error rate, semantic error rate, and in- and out-of-grammar measurements) are stored along with their alignments and with information about how each score was reached.
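One way to keep a score together with the evidence behind it is to store the word alignment and the per-operation counts next to the score. The sketch below assumes an alignment has already been produced by the evaluator (the list here is hypothetical), derives the word error rate from it, and stores all three in a hypothetical `score` table in SQLite; the real Speech Tuner schema may organize this differently.

```python
import json
import sqlite3
from collections import Counter

# Hypothetical alignment for one interaction, as an evaluator might produce it:
# (operation, transcribed word, recognized word); None marks a missing word.
alignment = [
    ("match", "transfer", "transfer"),
    ("del", "to", None),
    ("match", "billing", "billing"),
    ("match", "please", "please"),
]


def score_from_alignment(ops):
    """Return (WER, operation counts) for an alignment of match/sub/del/ins operations."""
    counts = Counter(op for op, _, _ in ops)
    ref_words = counts["match"] + counts["sub"] + counts["del"]
    if ref_words == 0:
        raise ValueError("alignment has no reference words")
    errors = counts["sub"] + counts["del"] + counts["ins"]
    return errors / ref_words, dict(counts)


conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE score (
        interaction_id INTEGER PRIMARY KEY,
        wer REAL,
        alignment TEXT,  -- JSON list of aligned word pairs
        op_counts TEXT   -- JSON operation counts: how the WER was reached
    )"""
)
wer, counts = score_from_alignment(alignment)
conn.execute(
    "INSERT INTO score VALUES (?, ?, ?, ?)",
    (42, wer, json.dumps(alignment), json.dumps(counts)),
)

stored_wer, stored_counts = conn.execute(
    "SELECT wer, op_counts FROM score WHERE interaction_id = ?", (42,)
).fetchone()
print(stored_wer, json.loads(stored_counts))  # 0.25 {'match': 3, 'del': 1}
```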
Users can generate a variety of reports from these results, including error rate by grammar or dialog, confusion matrices, transcription progress, and confidence thresholds for confirmation or rejection settings.
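Two of these reports can be sketched from evaluated interactions: a confusion matrix that counts how often each transcribed answer was recognized as each answer, and a threshold sweep showing what a given confidence cutoff would accept or reject. The data, labels, and single accept/reject threshold are assumptions for illustration; a deployed dialog might use two thresholds, accepting high-confidence results, confirming mid-range ones, and rejecting the rest.

```python
from collections import Counter

# Hypothetical evaluated interactions: (transcribed answer, recognized answer, confidence 0..1)
results = [
    ("yes", "yes", 0.92),
    ("no", "no", 0.88),
    ("yes", "no", 0.41),
    ("no", "no", 0.35),
    ("yes", "yes", 0.67),
    ("no", "yes", 0.52),
]


def confusion_matrix(rows):
    """Count (transcribed, recognized) pairs; off-diagonal pairs are misrecognitions."""
    return Counter((expected, recognized) for expected, recognized, _ in rows)


def threshold_report(rows, threshold):
    """Accept results with confidence >= threshold and reject the rest.

    CA/FA: correct/incorrect results accepted; CR/FR: incorrect/correct results rejected.
    """
    report = {"CA": 0, "FA": 0, "CR": 0, "FR": 0}
    for expected, recognized, confidence in rows:
        correct = expected == recognized
        if confidence >= threshold:
            report["CA" if correct else "FA"] += 1
        else:
            report["FR" if correct else "CR"] += 1
    return report


matrix = confusion_matrix(results)
print(matrix[("yes", "no")])  # 1: "yes" was recognized as "no" once
for t in (0.3, 0.5, 0.7):
    print(t, threshold_report(results, t))
```

On this toy data, raising the threshold from 0.3 to 0.7 drops false accepts from 2 to 0 while false rejects rise from 0 to 2, which is the trade-off a confirmation or rejection setting has to balance.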
In the future, LumenVox's Speech Tuner will also support back-end database replacement for enterprise-level systems, in which multiple users analyze the same data simultaneously. Companies that use an ODBC-capable database will be able to replace the disk-based SQLite system, with certain SQL changes, with an enterprise database such as MS SQL Server 2000, MySQL, PostgreSQL, or Oracle.
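Swapping the back end is easier when report queries stay portable. As a hedged sketch (not the Speech Tuner's actual mechanism), the function below takes any Python DB-API 2.0 connection plus that driver's `paramstyle` and uses only standard SQL. It is exercised here with `sqlite3`, whose `paramstyle` is `qmark`; ODBC parameter markers are also `?`. The `interaction` table and `untranscribed_count` function are hypothetical.

```python
import sqlite3

# Placeholder text for the DB-API paramstyles this sketch handles.
# "qmark" (?) is used by sqlite3 and matches ODBC parameter markers;
# "format" (%s) is used by some other drivers. Other styles are not handled.
PLACEHOLDERS = {"qmark": "?", "format": "%s"}


def untranscribed_count(conn, paramstyle: str, grammar: str) -> int:
    """Count untranscribed interactions for one grammar using portable SQL."""
    ph = PLACEHOLDERS[paramstyle]
    cur = conn.cursor()
    try:
        # Only the placeholder is formatted into the SQL; the value is bound by the driver.
        cur.execute(
            f"SELECT COUNT(*) FROM interaction WHERE grammar = {ph} AND transcript IS NULL",
            (grammar,),
        )
        (count,) = cur.fetchone()
    finally:
        cur.close()
    return count


# Demonstration with SQLite; another DB-API driver's connection would be passed the same way.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE interaction (grammar TEXT, transcript TEXT)")
conn.executemany(
    "INSERT INTO interaction VALUES (?, ?)",
    [("yesno.grxml", None), ("yesno.grxml", "yes"), ("menu.grxml", None)],
)
print(untranscribed_count(conn, sqlite3.paramstyle, "yesno.grxml"))  # 1
```

SQLite-specific idioms, such as summing a boolean expression directly, are the kind of SQL that would need rewriting for another database, for example as `SUM(CASE WHEN transcript IS NULL THEN 1 ELSE 0 END)`.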
"LumenVox has created speech recognition products that are easy to code with and GUI-based tools, such as the new Speech Tuner that greatly simplifies post-deployment maintenance."
Vern Baker, President of enGenic Corporation