Real-World Speech Recognition - Haskins Laboratories

NSF SBE Grand Challenge White Paper: Real-World Speech Recognition, Philip Rubin

dynamics, and complexity. Our scientific approaches must take such issues seriously.

The scientific/technological difficulties in this enterprise require both increased attention to scientific fundamentals and a multidisciplinary approach that brings together biologists, computer scientists, educators, engineers, linguists, neuroscientists, psychologists, physicists, and social scientists. The scientific questions are numerous and difficult. Critical areas of importance include:

• the physiology of speech production
• neural representations of speech and language and the control of motor behavior
• computational modeling of language, speech, and the mental lexicon
• development and use of realistic embodied conversational agents
• physical/physiological modeling of sound production
• a deeper understanding of the physics of sound/gesture production
• rich techniques for auditory/visual scene analysis and parsing
• cognitive, emotional, cultural and social aspects of language understanding and use

Attacking such difficult issues will also require investments in infrastructure and substantial advances in tool development, such as:

• computational models of speech production that are open source and modular, support comparison and addition of existing and future articulatory and aerodynamic models, and include rich temporal controls to explore the evolution of physiological/linguistic events over time
• multimodal, large-scale databases that provide for the display, analysis and archiving of physiological, video, audio,
