CSL - Cognitive Systems Lab

Lecture, Summer Semester (SS) 2008
Multilingual Human-Machine Communication (Multilinguale Mensch-Maschine Kommunikation)
Prof. Dr. Tanja Schultz
Dipl.-Inform. Felix Putze
Universität Karlsruhe, Department of Informatics (Fakultät für Informatik)
Tuesday, 15 April 2008


2/60 

Overview
Lecture 1: Overview and Introduction
General information about the lecture
Introducing the chair
interACT
Introduction to the topic
Content structure of the lecture
Questions and answers


3/60 

General Information: Lecture
Advanced lecture in the Hauptdiplom (main diploma studies)
o No prior knowledge required
Examination option:
o Yes, in Cognitive Systems (Kognitive Systeme) and Anthropomatics (Anthropomatik)
Cycle:
o Annually in the summer semester, 4+0
o Exams in every examination period
Dates:
o 27 lecture sessions
o Tue 14:00 – 15:30 (HS -101) and Thu 14:00 – 15:30 (SR 131)
o Start 15.04.08, end 17.07.08
Lecturers:
o Prof. Dr. Tanja Schultz
o Dipl.-Inform. Felix Putze


4/60 

General Information: Lecture
o All lecture materials are available at
http://csl.ira.uka.de > Lehre > SS2008 > MMMK
o All slides as PDF (no password protection)
o Current changes, announcements, syllabus
o Additional material (papers) where applicable
o Basis for exams:
o Lecture content, slides, additional material
o Questions, problems, and comments are welcome at any time during the lecture or in person:
o Tanja Schultz (tanja@ira.uka.de)
o Felix Putze (putze@ira.uka.de)
o Office hours with Tanja Schultz by appointment


5/60 

General Information: CSL
o Chair for Cognitive Systems since 1 June 2007
o Universität Karlsruhe, Department of Informatics
o Institute for Algorithms and Cognitive Systems (Institut für Algorithmen und Kognitive Systeme, IAKS)
o Founding of the Cognitive Systems Laboratory (CSL)
o Homepage: http://csl.ira.uka.de
o Informatics building 50.34, 2nd floor (2. OG), left wing
o Contact:
o Prof. Dr.-Ing. Tanja Schultz
o tanja@ira.uka.de
o +49 721 608 6300
o Secretary's office: Ms. Helga Scherer
o scherer@ira.uka.de
o +49 721 608 6312


6/60 

Research: Human-Centered Technologies
Application field: human-machine interaction
Challenges and tasks:
Productivity and usability
Human communication with the environment in the broadest sense:
Speech, motion, biosignals
Technologies and methods:
Recognizing, understanding, identifying
Statistical modeling, classification, ...
Application field: human-human communication
Challenges and tasks:
Language diversity, cultural barriers
Effort and costs


7/60 

Teaching at CSL – Winter
Winter semester
o Biosignals and User Interfaces (Biosignale und Benutzerschnittstellen)
o 4+0, examinable in Cognitive Systems and Anthropomatics
o Introduction to the acquisition and interpretation of biosignals
o Application examples
o Analysis and Modeling of Human Motion
o Introduction to the analysis, modeling, and recognition of human motion sequences (jointly with Dr. Annika Wörner)
o Multilingual Speech Processing
o 2+0, seminar, not examinable
o Development of speech translation systems using Rapid Language Adaptation Tools
o Insights from practice (IBM) on standards (VoiceXML)
o The seminar takes place simultaneously at UKA and CMU via videoconference (VC)
o First joint course within interACT!
⇒ interACT


8/60 

Teaching at CSL – Summer
Summer semester
o Multilingual Human-Machine Communication
o Introduction to automatic speech recognition and processing
o Signal processing, statistical modeling, practical approaches and methods, multilinguality
o Applications in human-human communication and human-machine interaction
o Application examples
o Practical course (Praktikum): Biosignals
o Hands-on development
o Recording of motion data (in cooperation with the Sports Institute)
o Various biosensors (Vicon, accelerometers, EMG)
o Automatic motion recognition
o Seminar: Human-Machine Emotion (Mensch Maschine Emotion)
o Introduction to modeling and recognizing emotion in the context of human-machine interaction

interACT:
a growing cooperation between
Universität Karlsruhe (TH),
Carnegie Mellon University, Pittsburgh, USA,
and Hong Kong University of Science and Technology
Contact:
interACT Press & Communication, Margit Rödder, e-mail: roedder@ira.uka.de

interACT:
Since 2004, a joint research center of Carnegie Mellon University (CMU) and Universität Karlsruhe (TH)
February 2007: expanded to include Hong Kong University of Science and Technology (HKUST)
Joint research projects between CMU, UKA, and HKUST
Guest lectures, workshops, summer academies, study and research visits
A limited number of scholarships for outstanding students who want to deepen their informatics studies with a research stay in the USA and now also Hong Kong
Photo captions:
Prof. Tom Mitchell, lecture at UKA
UKA Rector H. Hippler with CMU President J.L. Cohon during his visit to UKA, Sept. 2005

interACT Student Exchange 2004 - 2008
Photo caption: CMU President J.L. Cohon with interACT students at Universität Karlsruhe, June 2006
65 exchanges so far:
• 1 postdoc
• 5 PhD
• 34 Diplom theses (Diplomarbeiten)
• 25 student research projects (Studienarbeiten)

interACT Scholarships
Who is funded?
• Students of informatics and of information engineering and management (Informationswirtschaft) at Universität Karlsruhe (TH), Carnegie Mellon University, and Hong Kong University of Science and Technology
What is funded?
• Studienarbeiten and bachelor's theses, up to 3 months
• Diplom and master's theses, up to 8 months
• Doctoral theses, up to 12 months
Application requirements
• At least a completed Vordiplom (intermediate diploma)
• Good English skills
• Supervising professor at UKA and CMU
Application deadlines
Next date: 28 May 2008
Photo caption: interACT students D. Bertram & K. Steinbach in Prof. Kuffner's lab, CMU

interACT Cooperations between UKA and CMU (as of April 2008)
Prof. R. Dillmann – Dr. J. Kuffner,
Prof. R. Simmons,
Prof. J. Dolan,
Prof. Chr. Atkeson
Prof. Ender Finol
Prof. J. Henkel – Prof. M. Lewicki
Prof. W. Juling – Prof. E. Nyberg
Prof. M. Shaw
Prof. P. Schmid – Prof. E. Clarke
Prof. Studer – Prof. Sycara
Photo caption: Prof. R. Dillmann presents his research to President Cohon, UKA, Sept. 2005

interACT Cooperations between UKA and CMU (as of April 2008)

Prof. W. Tichy – Dr. J. Herbsleb


Prof. M. Shaw 

Prof. R. Vollmar – Prof. Sutner 

Prof. D. Wagner – Prof. Miller 

Prof. Ravi 

Prof. Chr. Weinhardt – Prof. R. Krishnan 

– Prof. Ramayya 

Prof. Alex Waibel & Prof. T. Schultz – Prof. Alex Waibel 

Prof. T. Schultz 

Prof. A. Black

interACT Cooperations between UKA and HKUST (as of April 2008)
Prof. Tanja Schultz – Prof. Dekai Wu
Prof. Pascale Fung
Prof. Alex Waibel – Prof. Dekai Wu
Photo caption: Prof. Dekai Wu, talk at UKA, July 2007

The interACT Advisory Board selects interACT scholarship holders twice a year
Members of the interACT Advisory Board:
• Prof. Alex Waibel, Universität Karlsruhe, Carnegie Mellon University (interACT Director)
• Prof. Tanja Schultz, Universität Karlsruhe, Carnegie Mellon University (interACT Ass. Director)
• Prof. Rüdiger Dillmann, Universität Karlsruhe
• Prof. Jörg Henkel, Universität Karlsruhe
• Prof. Wilfried Juling, Universität Karlsruhe
• Prof. Walther Tichy, Universität Karlsruhe

interACT Offices
in Karlsruhe, Pittsburgh, and Hong Kong
• Organization of the interACT exchange program, contact persons for students
• Press and public relations
• Coordination of the distinguished lecture series
• Organization of workshops, academies, and events
Contact persons

Universität Karlsruhe (TH)
Margit Rödder
Am Fasanengarten 5
76131 Karlsruhe
Tel.: +49 721 608 8676
E-mail: roedder@ira.uka.de

Carnegie Mellon University, interACT
Lisa Mauti
407 South Craig Street
Pittsburgh, PA 15221
Tel.: +1 412 268 1461
E-mail: lmauti@cs.cmu.edu

Hong Kong University of Science and Technology
Tel.: +852 2358 7087
E-mail: pascale@ee.ust.hk

Further information about interACT
http://interact.ira.uka.de


19/60 

Attendance List
o Please fill in!
No.   Last name, first name   Subject, semester   Matriculation no.   E-mail
1     SCHULTZ, Tanja          Informatik, 36                          tanja@ira.uka.de
2


20/60 

Literature
Xuedong Huang, Alex Acero and Hsiao-Wuen Hon, Spoken Language Processing, Prentice Hall PTR, NJ, 2001 ($81.90 internet price)
Rabiner and Juang, Fundamentals of Speech Recognition, Prentice Hall Signal Processing Series, Englewood Cliffs, NJ, 1993
Jelinek, Statistical Methods for Speech Recognition, MIT Press, Cambridge, MA, 1997 ($35)
Schultz and Kirchhoff, Multilingual Speech Processing, Elsevier, Academic Press, 2006 (ask the authors for discounts!)
+ various articles (PDF) that we will make available on the web


21/60 

Useful Links, Additional Material
• All slides will be posted on the web as PDF
http://csl.ira.uka.de > Lehre > SS2008 > MMMK
• Dr. Ivica Rogina's lecture (up to and including SS 2006)
http://isl.ira.uka.de/~stueker/sprachVorlesung
• Electronic archive of many proceedings and reports of the most important conferences on "Speech and Language"
• http://csl.ira.uka.de (password required)
• ICASSP (International Conference on Acoustics, Speech, and Signal Processing)
• Interspeech (merger of Eurospeech and ICSLP)
• ASRU (Automatic Speech Recognition and Understanding)
• ACL (Association for Computational Linguistics), NAACL (North American Chapter of the ACL)
• HLT (Human Language Technologies) …


22/60 

Further Courses
1) Seminar + practical course: Multilingual Speech Processing
- Do-it-yourself course, development of a speech interface
- YOU choose the language
- WE have the tools (Rapid Language Adaptation Toolkit)
The course is offered jointly with Carnegie Mellon University, i.e. it takes place LIVE via videoconference with CMU
2) Janus practical course by Sebastian Stüker, chair of Prof. A. Waibel
3) Lecture and practical course on biosignals
Biosignals are signals that the human body produces as a result of physical or biochemical laws
Acquisition and interpretation of electrical biosignals and their application in human-machine communication
- Electroencephalography: measurement of cerebral activity
- Electromyography: measurement of muscle activity
- and much more


23/60 

SS 2008 Lectures on Related Topics
o Human-Machine Emotion (Mensch-Maschine-Emotion) (Schultz/Laskowski)
o Speech and emotion
o Multimodal User Interfaces (Stiefelhagen/Waibel)
o Speech as one modality
o Pattern Recognition (Beyerer)
o Fundamentals of pattern recognition
o Human-Machine Interaction (Wörn/Burghart)
o Human-Robot Cooperation (Wörn/Burghart)
o Focus: the challenge of humanoid robots


24/60 

Syllabus

o TBD


25/60 

General Information: Goals of the Course
Goals of the lecture
o Speech in human-machine communication
o Advantages of speech as an input signal
o Disadvantages of speech as an input signal
o Fundamentals of speech recognition
o Basic concepts
o Speech production and perception
o Digital signal processing, feature extraction
o Statistical modeling, classification
o Acoustic modeling, HMMs
o Language modeling
o Further topics in speech processing
o Dialog modeling, synthesis, translation
o Application examples from research


26/60 

Today: Application Examples
o Speech recognition: from speech input signal to text
o Speech synthesis: from text to speech output signal
o Speech translation (across language boundaries):
from a speech signal in language L1 to a speech signal in L2
= speech recognition + MT + speech synthesis
o Speech understanding, summarization
= from speech input signal to meaning
o But speech activity is not only about what is spoken:
Who is speaking? → speaker identification
Which language is spoken? → language ID
What is being talked about? → topic ID
How is it spoken? → emotion ID
Who is being addressed? → focus of attention
o Translation (across species boundaries)
Example: dolphins
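
The chain "speech recognition + MT + speech synthesis" given on this slide for speech translation can be sketched as a composition of three functions. This is only a minimal illustration under stated assumptions: the function names, the byte string standing in for audio, and the two-word lexicon are hypothetical placeholders, not components of any real system.

```python
from typing import Callable

def make_speech_translator(
    recognize: Callable[[bytes], str],   # speech in L1 -> text in L1 (ASR)
    translate: Callable[[str], str],     # text in L1 -> text in L2 (MT)
    synthesize: Callable[[str], bytes],  # text in L2 -> speech in L2 (TTS)
) -> Callable[[bytes], bytes]:
    """Chain the three engines into one speech-to-speech function."""
    def speech_to_speech(audio_l1: bytes) -> bytes:
        text_l1 = recognize(audio_l1)
        text_l2 = translate(text_l1)
        return synthesize(text_l2)
    return speech_to_speech

# Toy stand-ins (assumptions): "audio" is just UTF-8 text in bytes.
TOY_LEXICON = {"guten": "good", "tag": "day"}

def toy_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")

def toy_translate(text: str) -> str:
    # Word-by-word lookup; unknown words are passed through unchanged.
    return " ".join(TOY_LEXICON.get(w.lower(), w) for w in text.split())

def toy_synthesize(text: str) -> bytes:
    return text.encode("utf-8")

translator = make_speech_translator(toy_recognize, toy_translate, toy_synthesize)
result = translator(b"Guten Tag")
```

Because each stage only sees the output of the previous one, an error made by the recognizer is passed on to translation and synthesis, which is one reason the individual components are studied separately in the lecture.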


27/60 

Introduction 

• Each of the lessons covers one topic from 

“speech recognition and understanding” 

• It covers the most important areas of today’s 

research and also discusses some historic issues 

• The goal of the course is to introduce you to the 

science of automatic speech recognition and 

understanding 

• Today's topic:

o Why are we doing speech recognition?

o What are the advantages and disadvantages?

o Where is it useful? 

o Examples of applications, demos


28/60 

Why Automatic Speech Recognition? 

ADVANTAGES: 

• Natural way of communication for human beings 

• No practicing necessary for users, i.e. speech does not 

require any teaching as opposed to reading/writing 

• High bandwidth (speaking is faster than typing) 

• Additional communication channel (Multimodality) 

• Hands and eyes are free for other tasks 

→ Works in the car / on the run / in the dark 

• Mobility (microphones are smaller than keyboards) 

• Some communication channels (e.g. phone) are designed 

for speech 

• ...


29/60 

Why Automatic Speech Recognition? 

DISADVANTAGES: 

• Unusable where silence/confidentiality is required 

(meetings, library, spoken access codes) 

… we are working on solutions (see later) 

• Still unsatisfactory recognition rate when: 

• Environment is very noisy (party, restaurant, train) 

• Unknown or unlimited domains 

• Uncooperative speakers (whisper, mumble, …) 

• Problems with accents, dialects, code-switching 

• Cultural factors (e.g. collectivism, uncertainty avoidance) 

• Speech input is still more expensive than keyboard


30/60 

Input Speeds (Characters per Minute)

Mode          Standard   Best
Handwriting   200        500
Typewriter    200        1000
Stenography   500        2000
Speech        1000       4000
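
As a small worked example, the figures on this slide can be turned into speed-up ratios. The sketch assumes that the four number pairs belong to the modes in the order listed (Handwriting, Typewriter, Stenography, Speech) and that the two columns are "Standard" and "Best"; the function name is made up for illustration.

```python
# Characters per minute as (standard, best), read off the slide.
INPUT_SPEEDS = {
    "Handwriting": (200, 500),
    "Typewriter": (200, 1000),
    "Stenography": (500, 2000),
    "Speech": (1000, 4000),
}

def speedup(mode: str, baseline: str, column: int = 0) -> float:
    """Ratio of two input speeds; column 0 = standard, 1 = best."""
    return INPUT_SPEEDS[mode][column] / INPUT_SPEEDS[baseline][column]

standard_ratio = speedup("Speech", "Typewriter")        # 1000 / 200
best_ratio = speedup("Speech", "Typewriter", column=1)  # 4000 / 1000
```

Under these assumptions, speech is five times faster than typing for a standard user and four times faster in the best case.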


31/60 

Where is Speech Recognition and Understanding useful 

Human - Machine Interaction: 

1. Remote control applications 

• Operating Machines over the Phone 

2. Hands/Eyes busy or not useful 

• Speech Recognition in cars 

• Help for Physically Challenged, Nurse bots 

3. Authentication 

• Speaker Identification/Verification/Segmentation 

• Language/Accent Identification 

4. Entertainment / Convenience 

• Speech Recognition for Entertainment 

• Gaming 

5. Indexing and Transcribing Acoustic Documents 

• Archive, Summarize, Search and Retrieve


32/60 

Where is Speech Recognition and Understanding useful 

Human - Human Interaction: 

1. Mediate communication across language boundaries 

• Speech Translation 

• Language Learning 

• Synchronization / Sign Language 

2. Support human interaction 

• Meeting and Lecture systems 

• Non-verbal Cue Identification 

• Multimodal applications 

• Speech therapy support


33/60 

Operating Machines over the Phone 

• Remote Controlled Home 

• Operate heating / air conditioning, turn lights on/off, check email 

• Voice-Operated Answering Machine 

• Call answering machine from anywhere and discuss recent calls 

• Access Databases 

• Pittsburgh Bus Information with CMU’s Let’s Go at 412-268-3526 

• Check the weather with MIT’s Jupiter at 1-888-573-8255 

• Train timetable information (Erlangen), directory assistance, airlines, cinema

• Call Center 

• Route or dispatch calls, 911 emergency line 

• AT&T: How may I help you? 

The HMIHY system was deployed in 2001, and according to AT&T 

was handling more than 2 million calls per month by the end of 2001. 

• Use Interactive Services worldwide 

• Plan your next trip with an artificial travel agent


34/60 

Hands-Free / Eyes-Free Tasks 

• Hands and/or Eyes are busy with tools 

• Radio repair 

• Construction site 

• Hands and/or Eyes are needed to operate machines/cars 

• Hold the steering wheel 

• Pull levers, turn knobs, operate switches 

• Watch the street while driving 

• Monitor production line 

• Hands are working on other people 

• Hair stylist cutting hair 

• Surgeon working on a patient 

• Hands and/or Eyes are not helpful in the environment 

• Dark rooms (photography) 

• Outer Space (remote control)


35/60 

Speech Recognition in Cars 

• Use your cellular phone while keeping your hands on the 

wheel and eyes on the street, e.g. voice dialing 

• Operate your audio device while driving 

• Dictate messages (e-mails, SMS) 

TODAY several companies and services 

are emerging which do exactly this 

• Talk to your personal digital assistant 

• Navigation - 

Ask your way through a foreign city 

Find the nearest restaurant


36/60 

Support in everyday life, Help for Elderly and Physically Challenged 

People who are immobile, e.g. lying in bed or in hospital, or who cannot use their hands due to illness or accidents, can
• operate parts of their environment/machines by voice
• ask a robot for help
Figure captions:
Nursebot Pearl and Florence: CMU's robotic assistant for the elderly
ISAC feeding a physically challenged individual (Center for Intelligent Systems, Vanderbilt Univ.)
SFB588, UKA: humanoid robots
Children with speaking disorders make significant improvements by trying to make a speech recognizer understand them
Children with dyslexia and similar problems learn to read faster using automatic speech recognition


37/60 

Information in Speech
Speech Recognition → Words: "Onune baksana be adam!"
Language Recognition → Language name: Turkish
Speaker Recognition → Speaker name: Umut
Emotion Recognition → Emotion: Angry
Accent Recognition → Accent: Istanbul
:
Topic ID: Chemicals
Entity Tracking: Istanbul
Acoustic Scene: Bus Station
Discourse Analysis: Negotiation
Tanja Schultz, Speaker Characteristics, In: C. Müller (Ed.) Speaker Classification, Lecture Notes in Computer Science / Artificial Intelligence, Springer, Heidelberg - Berlin - New York, Volume 4343.


38/60 

Speaker Recognition
Identification: Whose voice is it?
Verification/Detection: Is it Sally's voice?
Segmentation and Clustering: Where are the speaker changes? Which segments are from the same speaker?
[Figure labels: Will, Tim, ?]


39/60 

Speaker Identification/Verification/Recognition 

Verification
Verify someone's claimed identity, i.e. is the person who s/he claims to be?
Instead of a password: say something instead of typing
Identification
"Who is speaking?"
Identifies a speaker from an enrolled population by searching the database
Personalized behavior: customize the machine's reaction automatically to the current user
Recognition
Often used to refer to all problems of verification, identification, segmentation & clustering
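
The difference between identification (search the whole enrolled population) and verification (check one claimed identity) can be illustrated with a toy example. This is a sketch under strong assumptions, not the method used in the lecture: each voice is reduced to a made-up three-number feature vector, cosine similarity serves as the score, and the threshold of 0.9 is arbitrary. The speaker names are borrowed from the preceding slide.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical enrolled population: one reference vector per speaker.
ENROLLED = {
    "Sally": [0.9, 0.1, 0.3],
    "Will": [0.1, 0.8, 0.4],
    "Tim": [0.2, 0.3, 0.9],
}

def identify(voice: list[float]) -> str:
    """Identification: who, among all enrolled speakers, matches best?"""
    return max(ENROLLED, key=lambda name: cosine(voice, ENROLLED[name]))

def verify(voice: list[float], claimed: str, threshold: float = 0.9) -> bool:
    """Verification: does the voice match the one claimed identity?"""
    return cosine(voice, ENROLLED[claimed]) >= threshold

unknown_voice = [0.85, 0.15, 0.35]
who = identify(unknown_voice)
is_sally = verify(unknown_voice, "Sally")
is_tim = verify(unknown_voice, "Tim")
```

Identification needs one comparison per enrolled speaker, while verification needs a single comparison plus a decision threshold that trades false acceptances against false rejections.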


40/60 

Speaker Segmentation and Clustering 

Segmentation: Automatically segment incoming speech by speaker 

Clustering: cluster segments of the same speaker 

Adaptation: use parameters that are optimized for recognizing a specific speaker

Mandarin Broadcast News 

Overlapping speech 

Speaker turn miss 

Speech over noise
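
The two steps named on this slide, segmentation and clustering, can be sketched on a toy input. The sketch assumes that every audio frame already carries a speaker label, which is exactly the hard part a real system has to estimate from acoustic features; the function names and the labels "A" and "B" are hypothetical.

```python
from collections import defaultdict
from itertools import groupby

def segment(frame_labels: list[str]) -> list[tuple[int, int, str]]:
    """Segmentation: cut the frame sequence at every speaker change.

    Returns (start_frame, end_frame_exclusive, speaker) triples.
    """
    segments = []
    start = 0
    for label, run in groupby(frame_labels):
        length = len(list(run))
        segments.append((start, start + length, label))
        start += length
    return segments

def cluster(segments: list[tuple[int, int, str]]) -> dict[str, list[tuple[int, int]]]:
    """Clustering: collect all segments that belong to the same speaker."""
    clusters = defaultdict(list)
    for start, end, label in segments:
        clusters[label].append((start, end))
    return dict(clusters)

frames = ["A", "A", "B", "B", "B", "A"]
segments = segment(frames)
clusters = cluster(segments)
```

The clustered segments of one speaker are what the adaptation step would then use to tune recognition parameters for that speaker.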


41/60 

Language Identification 

o Selecting the recognizer (in multilingual speech recognition)
o Call routing (e.g. 911 emergency line)
o Data analysis, selection
o Special case: accent recognition
o Optimizing all system parameters for the speaker's accent
o E-language learning
Japanese
Tanja Schultz, Identifizierung von Sprachen - Exemplarisch aufgezeigt am Beispiel der Sprachen Deutsch, Englisch und Spanisch (Language identification, illustrated using German, English, and Spanish), Diplomarbeit, Institut für Logik, Komplexität und Deduktionssysteme, Universität Karlsruhe, April 1995


42/60 

FarSID: Far-Field Speaker Recognition
o Original signal
o Effect of echo (medium-sized room, 1 m distance to the microphone):
0.5 sec, 1 sec, 2 sec
o Effect of distance (medium-sized room, 0.5 sec echo):
1 m, 2 m, 4 m
o Effect of room size (1 m distance, 0.5 sec echo):
small, medium, large
Q. Jin, Y. Pan, T. Schultz, Far-Field Speaker Recognition, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP, Toulouse, France, 2006


43/60 

The dream (?) of communicating 

across language boundaries 

- A babelfish for everybody - 

Global Communication 

• Fun, Everyday life: 

• Chat in your mother tongue 

Worldwide 

• Travel without comm. problems 

• Business: 

• Negotiate and be sure that your partner is getting it right
• Computer has no stakes, i.e. neutral translation, not lopsided

• Face-to-Face Communication 

• Over the phone or internet 

• Text-to-Text vs Speech-to-Speech 

"The Building of the Tower of Babel", 1563, by Pieter Brueghel, Kunsthistorisches Museum, Vienna
The building of the Tower of Babel and the Confusion of Tongues (languages) in ancient Babylon are mentioned in Genesis
"Babel" is composed of two words: "bab", meaning "gate", and "el", meaning "god". Hence, "the gate of god". A related word in Hebrew, "balal", means "confusion".


44/60 

GALE = Global Autonomous Language Exploitation: 

Process huge volumes of speech and text data in 

multiple languages (Arabic, Chinese, English) 

• Broadcast News, Shows, Telephone Conversations 

Apply automatic technology to spoken and written language: 

• Absorb, Analyze, and Interpret 

Deliver pertinent information in easy-to-understand 

forms to monolingual analysts 

Three engines: 

- Transcription, 

- Translation, 

- Distillation 

GALE


45/60 

ASR 

SMT 

Demonstration GALE – Chinese TV 

Mandarin 

Broadcast News 

CCTV 

recorded in the US 

over satellite 

Transforming the 

Mandarin speech 

Into Chinese text 

using Automatic 

Speech Recognition 

Translating from 

Chinese text into 

English text 

using Statistical 

Machine Translation 

H. Yu, Y.C. Tam, T. Schaaf, S. Stüker, Q. Jin, M. Noamany, T. Schultz, The ISL RT04 Mandarin Broadcast 

News Evaluation System, EARS Rich Transcription Workshop, Palisades, NY, November 2004


46/60 

PDA Speech Translation in Mobile Scenarios
o Tourism
o Needs in foreign country
o International events
o Conferences
o Business
o Olympics
o Humanitarian needs
o Humanitarian, government
o Emergency line 911
o USA, multicultural population
o Army, peace corps

A. Waibel, A. Badran, A. Black, R. Frederking, D. Gates, A. Lavie, L. Levin, K. Lenzo, L Mayfield Tomokiyo, 

J. Reichert, T. Schultz, D. Wallace, M. Woszczyna, J. Zhang, Speechalator: Two-way Speech-to-Speech Translation 

in your Hand. HLT-NAACL 2003, Edmonton, Alberta, Canada, 2003


47/60 

Verbmobil 

Talk to people (face-to-face) from/in other countries in your own 

language. 

A step towards Star Trek's "Universal Translator"


48/60 

Mobility: Personal Digital Assistants 

Use your PDA or cellular phone to get help 

• Navigation 

• Translation 

• Information (travel, transportation, medical, ...) 

Demo


49/60 

SPICE: Rapid Language Adaptation Tools 

Major Problem: Tremendous costs and time for development 

o Very few languages (≤ 50 out of 6900) with many resources 

o Lack of conventions (e.g. Languages without writing system) 

o Gap between technology and language expertise 

⇒ SPICE: Intelligent system that learns language from user 

o Speech Processing: Interactive Creation and Evaluation toolkit 

o Develop web-based toolkits for Speech Processing: ASR, MT, TTS 

o http://cmuspice.org 

o Interactive efficient learning
Interactive learning:
o Solicit knowledge from the user in the loop
o Rapid adaptation of language-independent models
Efficiency:
o Reduce time and costs by a factor of 10
Demo

T. Schultz, A. Black, S. Badaskar, M. Hornyak, J. Kominek, SPICE: Web-based Tools for Rapid Language 

Adaptation in Speech Processing Systems, Proceedings of Interspeech, Antwerp, Belgium, August 2007


50/60 

Meeting Room 

The Meeting Browser is a powerful tool that allows us to record a new 

meeting, review or summarize an existing meeting or search a set of 

existing meetings for a particular speaker, topic, or idea.


51/60 

Indexing Acoustic Documents 

The world is flooded with 

information. 

More and more 

information is coming 

through audio-visual 

channels. 

Trying to find information 

in acoustic documents 

needs an intelligent 

acoustic search engine.


52/60 

View4You / Informedia 

Automatically records Broadcast News and allows the 

user to retrieve video segments of news items for 

different topics using spoken language input
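
One common way to make recorded news searchable by topic is an inverted index over the automatic transcripts. The sketch below illustrates that general idea only; it is not a description of how View4You or Informedia are implemented, and the segment IDs and transcripts are invented.

```python
from collections import defaultdict

def build_index(segments: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Map every transcript word to the IDs of the segments containing it."""
    index = defaultdict(set)
    for segment_id, transcript in segments:
        for word in transcript.lower().split():
            index[word].add(segment_id)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the segments whose transcript contains every query word."""
    words = query.lower().split()
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits

# Invented transcripts, as a speech recognizer might produce them.
news = [
    ("news_01", "Election results in Germany"),
    ("news_02", "Weather in Germany"),
    ("news_03", "Election debate"),
]
index = build_index(news)
election_germany = search(index, "election germany")
```

With spoken queries, the query string itself would come from a speech recognizer, so recognition errors affect both the index and the query.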


53/60 

Education, Learning Languages 

• LISTEN: Automated reading tutor that listens to a child read a displayed text aloud, and helps where needed.
• CHENGO: web-based language learning in a gaming environment for English and Chinese
• CALL program at CMU on Computer-Assisted Language Learning


54/60 

Robust and Confidential Speech Recognition 

Traditional Speech Recognition: 

• Capture the acoustic sound wave by microphone 

• Transform signal into electrical energy 

Requirements and Challenges: 

• Audibility: 

Speech needs to be perceivable by microphone 

(no low voice or whispering, no silent speech) 

• Interference: Speech disturbs others 

(no speaking in libraries, theaters, meetings) 

• Privacy: Speech signal can be captured by others 

(no confidential phone calls in public places) 

• Robustness: 

Signal is corrupted by noisy environment 

(difficult to recognize in restaurants, bars, cars)


55/60 

Bone-conduction 

o When we speak normally our body is a resonance box 

Skin and bones vibrate when we speak (try this!) 

o Capture this vibration by so-called bone-conducting 

or skin-conducting microphones 

Zheng et al. 

Jou et al. / Intecs 

o Whispered speech is defined as: 

o the articulated production of respiratory sound 

o with little or no vibration of the vocal folds

o produced by the motion of the articulator apparatus 

o transmitted through the soft tissue or bones of the head 

Stethoscopic 

Microphone 

Nakajima


56/60 

Approach: 

Electromyography – Silent Speech 

– Surface Electromyography (EMG) 

– Surface = No needles 

– Electro = electrical activity 

– Myo = muscle 

– Graphy = recording 

- Measure the electrical activity of facial muscles by capturing the electrical potential differences
- MOTION is recorded, not the acoustic signal
⇒ silently moving the lips / articulators is good enough
[Figure: electrodes s1 and s2; EMG signal = s1 – s2. Demo: SILENT SPEECH]
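
The difference s1 – s2 shown on this slide can be illustrated numerically. This is a simplified sketch, assuming that interference reaches both electrodes equally while the muscle activity appears only at s1; real recordings are less clean, and all sample values are invented.

```python
def bipolar_emg(s1: list[int], s2: list[int]) -> list[int]:
    """Differential EMG signal: sample-wise difference s1 - s2."""
    if len(s1) != len(s2):
        raise ValueError("both electrode signals need the same length")
    return [a - b for a, b in zip(s1, s2)]

# Invented samples (arbitrary units).
muscle = [2, 5, -1]          # activity picked up at electrode s1 only
interference = [3, -2, 4]    # disturbance common to both electrodes

s1 = [m + n for m, n in zip(muscle, interference)]
s2 = list(interference)

emg = bipolar_emg(s1, s2)
```

In this idealized case the common interference cancels and the difference equals the muscle activity, which is why the signal is taken between two electrodes rather than from one.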


57/60 

Dolphinese (Delphinisch)
Communication across language boundaries → across species boundaries
• Collaboration with the Wild Dolphin Project
• Free-living Atlantic spotted dolphins
• Identification, behavior, communication
• Communication with dolphins
• Dolphins try to make contact
• Information from a 20-million-year-old species
• "Dolphone" and "Dolphinese"
• Sound production, perception, frequency, medium
• Pattern recognition, extraction, clustering, statistical modeling
• Audio and video indexing, archiving, retrieval
• Audio recording, analysis, synthesis, translation
http://wilddolphinproject.com


58/60 

Even Beyond Human Speech … 

Towards Communication with Dolphins 

Why do we want to talk to Dolphins? 

• They might have a lot to say (20-million-year-old species)
• It is a challenging scientific problem
- Cross language boundaries → cross species boundaries
- Different sound production, perception, …
- Different medium (water), transmission, omni-directional

• Nothing is known about dolphins’ language 

• It involves spending a lot of time in the Bahamas ☺ 

Why do Dolphins want to talk to us? 

We don’t know … 

… but there is evidence that they try hard 

CMU: www.cs.cmu.edu/~tanja 

Wild Dolphin Project 

(http://wilddolphinproject.com)


59/60 

Conclusions 

Speech: 

• Is the most natural way of communication for human beings 

• Does not require any teaching or practicing 

• Has high bandwidth (speaking is faster than typing) 

• Supplements other communication channels (Multimodality) 

Speech Recognition is useful: 

• In hands-busy and eyes-busy environments 

• For mobile / small devices 

• Support in everyday life, Help for physically challenged folks 

Speech Recognition and Understanding: 

• Allows machines to be operated (remotely)

• Supports global communication between humans

• Breaks language (and maybe sometimes cultural) barriers
