CSL - Cognitive Systems Lab
Lecture, Summer Semester (SS) 2008<br />
Multilingual Human-Machine<br />
Communication (Multilinguale Mensch-Maschine Kommunikation, MMMK)<br />
Prof. Dr. Tanja Schultz<br />
Dipl.-Inform. Felix Putze<br />
Universität Karlsruhe, Faculty of Informatics<br />
Tuesday, 15 April 2008
<strong>Cognitive</strong> <strong>Systems</strong> <strong>Lab</strong><br />
Overview<br />
Lecture 1: Overview and Introduction<br />
General information about the lecture<br />
Introduction of the chair<br />
interACT<br />
Introduction to the topic<br />
Content structure of the lecture<br />
Questions and answers
General Information: Lecture<br />
Advanced lecture in the Hauptdiplom (main diploma) phase<br />
o No prior knowledge required<br />
Examination:<br />
o Yes, in Cognitive Systems and Anthropomatics<br />
Frequency:<br />
o Annually in the summer semester, 4+0<br />
o Exam offered in every examination period<br />
Dates:<br />
o 27 lecture sessions<br />
o Tue 14:00 – 15:30 (HS -101) and Thu 14:00 – 15:30 (SR 131)<br />
o Start 15.04.08, end 17.07.08<br />
Lecturers:<br />
o Prof. Dr. Tanja Schultz<br />
o Dipl.-Inform. Felix Putze
General Information: Lecture<br />
o All lecture materials are available at<br />
http://csl.ira.uka.de > Lehre > SS2008 > MMMK<br />
o All slides as PDF (no password protection)<br />
o Current changes, announcements, syllabus<br />
o Additional material (papers) where applicable<br />
o Basis for exams:<br />
o Lecture content, slides, additional material<br />
o Questions, problems, and comments are welcome at any time during the<br />
lecture or in person:<br />
o Tanja Schultz (tanja@ira.uka.de)<br />
o Felix Putze (putze@ira.uka.de)<br />
o Office hours with Tanja Schultz by appointment
General Information: CSL<br />
o Chair for Cognitive Systems since 1 June 2007<br />
o Universität Karlsruhe, Faculty of Informatics<br />
o Institut für Algorithmen und Kognitive Systeme (IAKS)<br />
o Founding of the Cognitive Systems Laboratory (CSL)<br />
o Homepage: http://csl.ira.uka.de<br />
o Informatics building 50.34, 2nd upper floor (2. OG), left wing<br />
o Contact:<br />
o Prof. Dr.-Ing. Tanja Schultz<br />
o tanja@ira.uka.de<br />
o +49 721 608 6300<br />
o Secretary's office: Ms. Helga Scherer<br />
o scherer@ira.uka.de<br />
o +49 721 608 6312
Research: Human-Centered Technologies<br />
Application area: human-machine interaction<br />
Challenges and tasks:<br />
Productivity and usability<br />
Human communication with the environment<br />
in the broadest sense:<br />
Speech, movement, biosignals<br />
Technologies and methods:<br />
Recognizing, understanding, identifying<br />
Statistical modeling, classification, ...<br />
Application area: human-human communication<br />
Challenges and tasks:<br />
Language diversity, cultural barriers<br />
Effort and cost
Teaching at CSL – Winter<br />
Winter semester<br />
o Biosignals and User Interfaces<br />
o 4+0, examinable in Cognitive Systems and Anthropomatics<br />
o Introduction to the acquisition and interpretation of biosignals<br />
o Application examples<br />
o Analysis and Modeling of Human Motion<br />
o Introduction to the analysis, modeling, and recognition of human<br />
motion sequences (jointly with Dr. Annika Wörner)<br />
o 2+0, examinable in Cognitive Systems and Anthropomatics<br />
o Multilingual Speech Processing<br />
o 2+0, seminar, not examinable<br />
o Development of speech translation systems using Rapid<br />
Language Adaptation Tools<br />
o Practical insight (IBM) into standards (VoiceXML)<br />
o The seminar is held simultaneously at UKA and CMU via video conferencing (VC)<br />
o First joint course within the interACT framework!<br />
⇒ interACT
Teaching at CSL – Summer<br />
Summer semester<br />
o Multilingual Human-Machine Communication<br />
o 4+0, examinable in Cognitive Systems and Anthropomatics<br />
o Introduction to automatic speech recognition and processing<br />
o Signal processing, statistical modeling, practical approaches<br />
and methods, multilinguality<br />
o Applications in human-human communication and human-machine<br />
interaction<br />
o Application examples<br />
o Lab course (Praktikum): Biosignals<br />
o Practical development<br />
o Recording of motion data (in cooperation with the sports institute)<br />
o Various biosensors (Vicon, accelerometers, EMG)<br />
o Automatic motion recognition<br />
o Seminar: Human-Machine Emotion<br />
o Introduction to modeling and recognizing emotion in the context<br />
of human-machine interaction
interACT –<br />
a growing cooperation between<br />
Universität Karlsruhe (TH)<br />
&<br />
Carnegie Mellon University, Pittsburgh, USA<br />
&<br />
Hong Kong University of Science and<br />
Technology<br />
CONTACT:<br />
interACT Press & Communication, Margit Rödder, e-mail: roedder@ira.uka.de
interACT:<br />
Since 2004, a joint research center of Carnegie Mellon<br />
University (CMU) and Universität Karlsruhe (TH)<br />
February 2007: extended to include Hong Kong University of Science and<br />
Technology (HKUST)<br />
Joint research projects between CMU, UKA, and HKUST<br />
Guest lectures, workshops, summer academies, study and<br />
research stays<br />
A limited number of scholarships for outstanding students who wish to<br />
deepen their informatics studies with a research stay in the<br />
USA and now also Hong Kong<br />
Prof. Tom Mitchell,<br />
lecture at UKA<br />
UKA rector H. Hippler with CMU<br />
president J.L. Cohon during his<br />
visit to UKA, Sept. 2005
interACT Student Exchange 2004 - 2008<br />
CMU president J.L. Cohon with interACT<br />
students at Universität Karlsruhe, June 2006<br />
65 exchanges so far:<br />
• 1 postdoc<br />
• 5 PhD<br />
• 34 Diplom theses (Diplomarbeiten)<br />
• 25 student research projects (Studienarbeiten)
interACT Scholarships<br />
Who is funded?<br />
• Students of informatics and<br />
Informationswirtschaft (information engineering and management) at Universität Karlsruhe<br />
(TH), Carnegie Mellon University, and Hong<br />
Kong University of Science and Technology<br />
What is funded?<br />
• Studienarbeit and bachelor's theses, up to 3 months<br />
• Diplom and master's theses, up to 8 months<br />
• Doctoral theses, up to 12 months<br />
Application requirements<br />
• At least a completed Vordiplom (intermediate diploma)<br />
• Good English skills<br />
• Supervising professors at UKA and CMU<br />
Application deadlines<br />
Next date: 28 May 2008<br />
interACT students D. Bertram & K.<br />
Steinbach in Prof. Kuffner's lab, CMU
interACT cooperations between UKA and CMU (as of April 2008)<br />
Prof. R. Dillmann – Dr. J. Kuffner,<br />
Prof. R. Simmons,<br />
Prof. J. Dolan,<br />
Prof. Chr. Atkeson<br />
Prof. Ender Finol<br />
Prof. J. Henkel - Prof. M. Lewicki<br />
Prof. W. Juling – Prof. E. Nyberg<br />
Prof. M. Shaw<br />
Prof. P. Schmid – Prof. E. Clarke<br />
Prof. Studer – Prof. Sycara<br />
Prof. R. Dillmann presents his research<br />
to President Cohon, UKA, Sept. 2005
interACT cooperations between UKA and CMU (as of April 2008), continued<br />
Prof. W. Tichy – Dr. J. Herbsleb<br />
Prof. M. Shaw<br />
Prof. R. Vollmar – Prof. Sutner<br />
Prof. D. Wagner – Prof. Miller<br />
Prof. Ravi<br />
Prof. Chr. Weinhardt – Prof. R. Krishnan<br />
– Prof. Ramayya<br />
Prof. Alex Waibel & Prof. T. Schultz – Prof. Alex Waibel<br />
Prof. T. Schultz<br />
Prof. A. Black
interACT cooperations between UKA and HKUST<br />
(as of April 2008)<br />
Prof. Tanja Schultz – Prof. Dekai Wu<br />
Prof. Pascale Fung<br />
Prof. Alex Waibel – Prof. Dekai Wu<br />
Prof. Pascale Fung<br />
Prof. Dekai Wu, talk at UKA, July 2007
The interACT Advisory Board selects interACT<br />
scholarship recipients twice a year<br />
Members of the interACT Advisory Board:<br />
• Prof. Alex Waibel, Universität Karlsruhe, Carnegie Mellon University (interACT Director)<br />
• Prof. Tanja Schultz, Universität Karlsruhe, Carnegie Mellon University (interACT Ass. Director)<br />
• Prof. Rüdiger Dillmann, Universität Karlsruhe<br />
• Prof. Jörg Henkel, Universität Karlsruhe<br />
• Prof. Wilfried Juling, Universität Karlsruhe<br />
• Prof. Walther Tichy, Universität Karlsruhe<br />
interACT Offices<br />
in Karlsruhe, Pittsburgh, and Hong Kong<br />
• Organization of the interACT exchange program, contact persons for students<br />
• Press and public relations<br />
• Coordination of the distinguished lecture series<br />
• Organization of workshops, academies, and events<br />
Contact persons<br />
Universität Karlsruhe (TH)<br />
Margit Rödder<br />
Am Fasanengarten 5<br />
76131 Karlsruhe<br />
Tel.: +49 721 608 8676<br />
E-mail: roedder@ira.uka.de<br />
Carnegie Mellon University, interACT<br />
Lisa Mauti<br />
407, South Craig Street<br />
Pittsburgh, PA 15221<br />
Tel: +1 412 268 1461<br />
lmauti@cs.cmu.edu<br />
Hong Kong University<br />
of Science and Technology<br />
Prof. Pascale Fung<br />
Tel.: +852 2358 7087<br />
pascale@ee.ust.hk
Further information about interACT<br />
http://interact.ira.uka.de
Attendance List<br />
o Please fill in!<br />
No. | Last name, first name | Subject, semester | Matriculation no. | E-mail<br />
1 | SCHULTZ, Tanja | Informatics, 36 | | tanja@ira.uka.de<br />
2
Literature<br />
Xuedong Huang, Alex Acero and Hsiao-wuen Hon, Spoken<br />
Language Processing, Prentice Hall PTR, NJ, 2001<br />
($81.90 internet price)<br />
Rabiner and Juang, Fundamentals of Speech Recognition,<br />
Prentice Hall Signal Processing Series, Englewood<br />
Cliffs, NJ, 1993<br />
Jelinek, Statistical Methods for Speech Recognition, MIT<br />
Press, Cambridge, MA, 1997 ($35)<br />
Schultz and Kirchhoff, Multilingual Speech Processing,<br />
Elsevier, Academic Press, 2006<br />
(ask the authors for discounts!)<br />
+ various articles (PDF) that we will make available on the web
Useful Links, Additional Material<br />
• All slides will be posted on the web as PDF<br />
http://csl.ira.uka.de > Lehre > SS2008 > MMMK<br />
• Dr. Ivica Rogina's lecture (up to and including SS 2006)<br />
http://isl.ira.uka.de/~stueker/sprachVorlesung<br />
• Electronic archive of many proceedings volumes and reports<br />
of the most important conferences on<br />
“Speech and Language”<br />
• http://csl.ira.uka.de (passwd)<br />
• ICASSP (International Conference on Acoustics, Speech, and Signal<br />
Processing)<br />
• Interspeech (merger of Eurospeech and ICSLP)<br />
• ASRU (Automatic Speech Recognition and Understanding)<br />
• ACL (Association for Computational Linguistics), NAACL (North American Chapter of the ACL)<br />
• HLT (Human Language Technologies) …
Further Courses<br />
1) Seminar + lab course: Multilingual Speech Processing<br />
- Do-it-yourself course, development of a speech interface<br />
- YOU choose the language<br />
- WE have the tools (Rapid Language Adaptation Toolkit)<br />
The course is offered jointly with Carnegie Mellon University,<br />
i.e. it takes place LIVE via VC with CMU<br />
2) Janus lab course by Sebastian Stüker, chair of Prof. A. Waibel<br />
3) Lecture and lab course on biosignals<br />
Biosignals are signals that the human body produces as a result of<br />
physical or biochemical laws<br />
Acquisition and interpretation of electrical biosignals and their<br />
application in human-machine communication<br />
- Electroencephalography – measurement of cerebral activity,<br />
- Electromyography – measurement of muscle activity<br />
- and much more.
SS2008 Lectures on Related Topics<br />
o Human-Machine Emotion (Schultz/Laskowski)<br />
o Speech and emotion<br />
o Multimodal User Interfaces (Stiefelhagen/Waibel)<br />
o Speech as one modality<br />
o Pattern Recognition (Beyerer)<br />
o Fundamentals of pattern recognition<br />
o Human-Machine Interaction (Wörn/Burghart)<br />
o Human-Robot Cooperation (Wörn/Burghart)<br />
o Focus: the challenge of humanoid robots
Syllabus<br />
o TBD
General Information: Goal of the Course<br />
Goals of the lecture<br />
o Speech in human-machine communication<br />
o Advantages of speech as an input signal<br />
o Disadvantages of speech as an input signal<br />
o Fundamentals of speech recognition<br />
o Basic concepts<br />
o Speech production and perception<br />
o Digital signal processing, feature extraction<br />
o Statistical modeling, classification<br />
o Acoustic modeling, HMMs<br />
o Language modeling<br />
o Further topics in speech processing<br />
o Dialog modeling, synthesis, translation<br />
o Application examples from research
Today: Application Examples<br />
o Speech recognition: from speech input signal to text<br />
o Speech synthesis: from text to speech output signal<br />
o Speech translation (across language boundaries):<br />
from a speech signal in language L1 to a speech signal in L2<br />
= speech recognition + MT + speech synthesis<br />
o Speech understanding, summarization<br />
= from speech input signal to meaning<br />
o But speech activity is more than just what is spoken<br />
Who is speaking? → speaker identification<br />
Which language is spoken? → language ID<br />
What is being talked about? → topic ID<br />
How is it spoken? → emotion ID<br />
Who is being addressed? → focus of attention<br />
o Translation (across species boundaries)<br />
Example: dolphins
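The speech translation chain on this slide (speech recognition + MT + speech synthesis) is a composition of three stages. The following minimal Python sketch illustrates only that structure. The functions `recognize`, `translate`, and `synthesize`, the file name, and the tiny lexicon are hypothetical placeholders, not components of any system mentioned in the lecture.

```python
# Toy stand-ins for the three stages; every name and value here is an
# illustrative assumption, not a real ASR/MT/TTS component.

def recognize(audio: str) -> str:
    """Toy speech recognition: map an audio identifier to a transcript in L1."""
    transcripts = {"utt1.wav": "guten morgen"}
    return transcripts[audio]


def translate(text: str) -> str:
    """Toy machine translation: word-by-word dictionary lookup from L1 to L2."""
    lexicon = {"guten": "good", "morgen": "morning"}
    return " ".join(lexicon.get(word, word) for word in text.split())


def synthesize(text: str) -> str:
    """Toy speech synthesis: wrap the text in a marker standing for audio in L2."""
    return f"<audio:{text}>"


def speech_to_speech(audio: str) -> str:
    """Speech translation = speech recognition + MT + speech synthesis."""
    return synthesize(translate(recognize(audio)))


print(speech_to_speech("utt1.wav"))
```

Because each stage only consumes the output of the previous one, errors made by the recognizer propagate into translation and synthesis, which is one reason the individual components are studied separately.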
Introduction<br />
• Each lesson covers one topic from<br />
“speech recognition and understanding”<br />
• The course covers the most important areas of today’s<br />
research and also discusses some historic issues<br />
• The goal of the course is to introduce you to the<br />
science of automatic speech recognition and<br />
understanding<br />
• Today’s topic:<br />
o Why are we doing speech recognition?<br />
o What are the advantages and disadvantages?<br />
o Where is it useful?<br />
o Examples of applications, demos
Why Automatic Speech Recognition?<br />
ADVANTAGES:<br />
• Natural way of communication for human beings<br />
• No practicing necessary for users, i.e. speech does not<br />
require any teaching as opposed to reading/writing<br />
• High bandwidth (speaking is faster than typing)<br />
• Additional communication channel (Multimodality)<br />
• Hands and eyes are free for other tasks<br />
→ Works in the car / on the run / in the dark<br />
• Mobility (microphones are smaller than keyboards)<br />
• Some communication channels (e.g. phone) are designed<br />
for speech<br />
• ...
Why Automatic Speech Recognition?<br />
DISADVANTAGES:<br />
• Unusable where silence/confidentiality is required<br />
(meetings, library, spoken access codes)<br />
… we are working on solutions (see later)<br />
• Still unsatisfactory recognition rate when:<br />
• Environment is very noisy (party, restaurant, train)<br />
• Unknown or unlimited domains<br />
• Uncooperative speakers (whisper, mumble, …)<br />
• Problems with accents, dialects, code-switching<br />
• Cultural factors (e.g. collectivism, uncertainty avoidance)<br />
• Speech input is still more expensive than keyboard
Input Speeds (Characters per Minute)<br />
Mode: Standard / Best<br />
Handwriting: 200 / 500<br />
Typewriter: 200 / 1000<br />
Stenography: 500 / 2000<br />
Speech: 1000 / 4000
Where is Speech Recognition and Understanding useful?<br />
Human - Machine Interaction:<br />
1. Remote control applications<br />
• Operating Machines over the Phone<br />
2. Hands/Eyes busy or not useful<br />
• Speech Recognition in cars<br />
• Help for Physically Challenged, Nurse bots<br />
3. Authentication<br />
• Speaker Identification/Verification/Segmentation<br />
• Language/Accent Identification<br />
4. Entertainment / Convenience<br />
• Speech Recognition for Entertainment<br />
• Gaming<br />
5. Indexing and Transcribing Acoustic Documents<br />
• Archive, Summarize, Search and Retrieve
Where is Speech Recognition and Understanding useful?<br />
Human - Human Interaction:<br />
1. Mediate communication across language boundaries<br />
• Speech Translation<br />
• Language Learning<br />
• Synchronization / Sign Language<br />
2. Support human interaction<br />
• Meeting and Lecture systems<br />
• Non-verbal Cue Identification<br />
• Multimodal applications<br />
• Speech therapy support
Operating Machines over the Phone<br />
• Remote Controlled Home<br />
• Operate heating / air conditioning, turn lights on/off, check email<br />
• Voice-Operated Answering Machine<br />
• Call answering machine from anywhere and discuss recent calls<br />
• Access Databases<br />
• Pittsburgh Bus Information with CMU’s Let’s Go at 412-268-3526<br />
• Check the weather with MIT’s Jupiter at 1-888-573-8255<br />
• Train timetable information (Erlangen), directory assistance, airlines, cinema<br />
• Call Center<br />
• Route or dispatch calls, 911 emergency line<br />
• AT&T: How may I help you?<br />
The HMIHY system was deployed in 2001, and according to AT&T<br />
was handling more than 2 million calls per month by the end of 2001.<br />
• Use Interactive Services worldwide<br />
• Plan your next trip with an artificial travel agent
Hands-Free / Eyes-Free Tasks<br />
• Hands and/or Eyes are busy with tools<br />
• Radio repair<br />
• Construction site<br />
• Hands and/or Eyes are needed to operate machines/cars<br />
• Hold the steering wheel<br />
• Pull levers, turn knobs, operate switches<br />
• Watch the street while driving<br />
• Monitor production line<br />
• Hands are working on other people<br />
• Hair stylist cutting hair<br />
• Surgeon working on a patient<br />
• Hands and/or Eyes are not helpful in the environment<br />
• Dark rooms (photography)<br />
• Outer Space (remote control)
Speech Recognition in Cars<br />
• Use your cellular phone while keeping your hands on the<br />
wheel and eyes on the street, e.g. voice dialing<br />
• Operate your audio device while driving<br />
• Dictate messages (e-mails, SMS)<br />
TODAY several companies and services<br />
are emerging which do exactly this<br />
• Talk to your personal digital assistant<br />
• Navigation -<br />
Ask your way through a foreign city<br />
Find the nearest restaurant
Support in everyday life, Help for Elderly and Physically Challenged<br />
People who are immobile, e.g. confined to a bed or hospital, or who cannot use their<br />
hands due to illness or accidents can<br />
• operate parts of their environment/machines by voice<br />
• ask a robot for help<br />
Nursebot Pearl and Florence: CMU's robotic assistant for the elderly<br />
ISAC feeding a physically challenged individual: Center for Intelligent Systems, Vanderbilt Univ<br />
SFB588, UKA: humanoid robots<br />
Children with speech disorders make significant improvements by trying<br />
to make a speech recognizer understand them<br />
Children with dyslexia and similar problems learn to read faster using<br />
automatic speech recognition
Information in Speech<br />
Speech Recognition → Words: Onune baksana be adam!<br />
Language Recognition → Language Name: Turkish<br />
Speaker Recognition → Speaker Name: Umut<br />
Emotion Recognition → Emotion: Angry<br />
Accent Recognition → Accent: Istanbul<br />
Topic ID: Chemicals<br />
Entity Tracking: Istanbul<br />
Acoustic Scene: Bus Station<br />
Discourse Analysis: Negotiation<br />
Tanja Schultz, Speaker Characteristics, In: C. Müller (Ed.) Speaker Classification, Lecture Notes in Computer<br />
Science / Artificial Intelligence, Springer, Heidelberg - Berlin - New York, Volume 4343.
Speaker Recognition<br />
Identification: Whose voice is it?<br />
Verification/Detection: Is it Sally’s voice?<br />
Segmentation and Clustering: Where are the speaker changes? Which segments are from the same speaker?<br />
Figure labels: Will, Tim, ?
Speaker Identification/Verification/Recognition<br />
Verification<br />
verify someone’s claimed identity, i.e.<br />
is the person who s/he claims to be?<br />
Instead of password:<br />
say something instead of typing<br />
Identification<br />
“who is speaking”<br />
Identifies a speaker from an enrolled<br />
population by searching the database<br />
Personalized behavior:<br />
customize machine reaction automatically<br />
to the current user<br />
Recognition<br />
Often used to refer to all problems of<br />
verification, identification,<br />
segmentation&clustering
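The difference between verification (one claimed identity, yes/no answer) and identification (search over the enrolled population) can be made concrete with a small sketch. It assumes, purely for illustration, that each enrolled speaker is represented by a short feature vector and that voices are compared by cosine similarity against a fixed threshold. The vectors, the threshold, and the function names are hypothetical; real systems use statistical speaker models.

```python
import math

# Hypothetical enrolled population: speaker name -> toy voice feature vector.
ENROLLED = {
    "sally": [0.9, 0.1, 0.2],
    "tim": [0.1, 0.8, 0.3],
    "will": [0.2, 0.2, 0.9],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(claimed, sample, threshold=0.8):
    """Verification: does the sample match the ONE claimed identity?"""
    return cosine(ENROLLED[claimed], sample) >= threshold


def identify(sample):
    """Identification: which enrolled speaker matches the sample best?"""
    return max(ENROLLED, key=lambda name: cosine(ENROLLED[name], sample))


sample = [0.85, 0.15, 0.25]
print(verify("sally", sample), verify("tim", sample), identify(sample))
```

Verification costs one comparison regardless of the population size, while identification compares against every enrolled speaker, so its cost and its risk of confusion grow with the database.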
Speaker Segmentation and Clustering<br />
Segmentation: Automatically segment incoming speech by speaker<br />
Clustering: cluster segments of the same speaker<br />
Adaptation: use recognition parameters that are optimized for the specific speaker<br />
Mandarin Broadcast News<br />
Overlapping speech<br />
Speaker turn miss<br />
Speech over noise
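Segmentation and clustering as described on this slide can be illustrated with a deliberately simplified sketch. It assumes that each frame of speech is summarized by a single number, that a speaker change shows up as a large jump between adjacent frames, and that segments of the same speaker have similar mean values. Real systems use multi-dimensional acoustic features and statistical models; the function names and thresholds below are hypothetical.

```python
def segment(frames, change_threshold):
    """Split frames into (start, end) index pairs, end exclusive, at every
    position where adjacent frames differ by more than change_threshold."""
    if not frames:
        return []
    bounds = [0]
    for i in range(1, len(frames)):
        if abs(frames[i] - frames[i - 1]) > change_threshold:
            bounds.append(i)
    bounds.append(len(frames))
    return list(zip(bounds[:-1], bounds[1:]))


def cluster(frames, segments, same_speaker_threshold):
    """Greedy clustering: give each segment the label of the first cluster
    whose reference mean is close enough, otherwise open a new cluster."""
    references = []  # mean of the first segment assigned to each cluster
    labels = []
    for start, end in segments:
        mean = sum(frames[start:end]) / (end - start)
        for label, reference in enumerate(references):
            if abs(mean - reference) <= same_speaker_threshold:
                labels.append(label)
                break
        else:
            references.append(mean)
            labels.append(len(references) - 1)
    return labels


frames = [1.0, 1.1, 0.9, 5.0, 5.2, 1.0, 1.05]
segments = segment(frames, change_threshold=2.0)
print(segments, cluster(frames, segments, same_speaker_threshold=1.0))
```

In this toy input the first and third segments receive the same label, which corresponds to one speaker taking a second turn after another speaker has spoken.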
Language Identification<br />
o Selection of the recognizer (in multilingual speech recognition)<br />
o Call routing (e.g. 911 emergency line)<br />
o Data analysis, selection<br />
o Special case: accent recognition<br />
o Optimization of all system parameters for the speaker's accent<br />
o E-language learning<br />
Japanese<br />
Tanja Schultz, Identifizierung von Sprachen -Exemplarisch aufgezeigt am Beispiel der Sprachen Deutsch, Englisch und Spanisch,<br />
Diplomarbeit, Institut für Logik, Komplexität und Deduktionssysteme, Universität Karlsruhe, April 1995
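The first use case on this slide, selecting the recognizer in a multilingual system, amounts to a routing step: identify the language, then hand the audio to the matching recognizer. The sketch below assumes that some language identification front end has already produced one score per language (higher is better). The scores, the language codes, and the recognizer stubs are hypothetical.

```python
def make_recognizer(language):
    """Build a toy recognizer stub that only reports which language it handles."""
    def recognizer(audio):
        return f"[{language} recognizer] {audio}"
    return recognizer


# Hypothetical monolingual recognizers, keyed by language code.
RECOGNIZERS = {code: make_recognizer(code) for code in ("de", "en", "es")}


def identify_language(scores):
    """Pick the language whose score is highest."""
    return max(scores, key=scores.get)


def route(scores, audio):
    """Language identification followed by recognizer selection."""
    language = identify_language(scores)
    return RECOGNIZERS[language](audio)


# Assumed per-language scores (e.g. log-likelihoods) for one utterance.
scores = {"de": -120.5, "en": -98.2, "es": -110.0}
print(route(scores, "utt42"))
```

The same routing pattern applies to call forwarding, where the selected target is a human operator or service line rather than a recognizer.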
FarSID: Far-Field Speaker Recognition<br />
o Original signal<br />
o Effect of echo (medium-sized room, 1 m distance to the microphone):<br />
0.5 sec 1 sec 2 sec<br />
o Effect of distance (medium-sized room, 0.5 sec echo):<br />
1 m 2 m 4 m<br />
o Effect of room size (1 m distance, 0.5 sec echo):<br />
Small Medium Large<br />
Q. Jin, Y. Pan, T. Schultz, Far-Field Speaker Recognition, Proceedings of the IEEE International<br />
Conference on Acoustics, Speech, and Signal Processing, ICASSP, Toulouse, France, 2006
The dream (?) of communicating<br />
across language boundaries<br />
- A babelfish for everybody -<br />
Global Communication<br />
• Fun, Everyday life:<br />
• Chat in your mother tongue<br />
Worldwide<br />
• Travel without comm. problems<br />
• Business:<br />
• Negotiate and be sure that<br />
your partner is getting it right<br />
• Computer has no stakes, i.e.<br />
neutral translation, not lopsided<br />
• Face-to-Face Communication<br />
• Over the phone or internet<br />
• Text-to-Text vs Speech-to-Speech<br />
“The Building of the Tower of Babel”,<br />
1563, by Pieter Brueghel,<br />
Kunsthistorisches Museum, Vienna<br />
The building of the Tower of Babel<br />
and the Confusion of Tongues<br />
(languages) in ancient Babylon<br />
are mentioned in Genesis<br />
"Babel" is composed of two words:<br />
"bab", meaning "gate", and "el", "god".<br />
Hence, "the gate of god". A related<br />
word in Hebrew, "balal", means<br />
"confusion."
GALE = Global Autonomous Language Exploitation:<br />
Process huge volumes of speech and text data in<br />
multiple languages (Arabic, Chinese, English)<br />
• Broadcast News, Shows, Telephone Conversations<br />
Apply automatic technology to spoken and written language:<br />
• Absorb, Analyze, and Interpret<br />
Deliver pertinent information in easy-to-understand<br />
forms to monolingual analysts<br />
Three engines:<br />
- Transcription,<br />
- Translation,<br />
- Distillation<br />
Demonstration GALE – Chinese TV<br />
Mandarin Broadcast News (CCTV), recorded in the US over satellite<br />
ASR: transforming the Mandarin speech into Chinese text using Automatic Speech Recognition<br />
SMT: translating from Chinese text into English text using Statistical Machine Translation<br />
H. Yu, Y.C. Tam, T. Schaaf, S. Stüker, Q. Jin, M. Noamany, T. Schultz, The ISL RT04 Mandarin Broadcast<br />
News Evaluation System, EARS Rich Transcription Workshop, Palisades, NY, November 2004
PDA Speech Translation in Mobile Scenarios<br />
o Tourism<br />
o Needs in Foreign Country<br />
o International Events<br />
o Conferences<br />
o Business<br />
o Olympics<br />
o Humanitarian Needs<br />
o Humanitarian, Government<br />
o Emergency line 911<br />
o USA, multicultural population<br />
o Army, peace corps<br />
A. Waibel, A. Badran, A. Black, R. Frederking, D. Gates, A. Lavie, L. Levin, K. Lenzo, L Mayfield Tomokiyo,<br />
J. Reichert, T. Schultz, D. Wallace, M. Woszczyna, J. Zhang, Speechalator: Two-way Speech-to-Speech Translation<br />
in your Hand. HLT-NAACL 2003, Edmonton, Alberta, Canada, 2003
Verbmobil<br />
Talk to people (face-to-face) from/in other countries in your own<br />
language.<br />
A step towards Star Trek's "Universal Translator"
Mobility: Personal Digital Assistants<br />
Use your PDA or cellular phone to get help<br />
• Navigation<br />
• Translation<br />
• Information (travel, transportation, medical, ...)<br />
Demo
SPICE: Rapid Language Adaptation Tools<br />
Major Problem: Tremendous costs and time for development<br />
o Very few languages (≤ 50 out of 6900) with many resources<br />
o Lack of conventions (e.g. Languages without writing system)<br />
o Gap between technology and language expertise<br />
⇒ SPICE: Intelligent system that learns language from user<br />
o Speech Processing: Interactive Creation and Evaluation toolkit<br />
o Develop web-based toolkits for Speech Processing: ASR, MT, TTS<br />
o http://cmuspice.org<br />
o Interactive, efficient learning<br />
Interactive learning:<br />
o Solicit knowledge from the user in the loop<br />
o Rapid adaptation of language-independent models<br />
Efficiency:<br />
o Reduce time and costs by a factor of 10<br />
T. Schultz, A. Black, S. Badaskar, M. Hornyak, J. Kominek, SPICE: Web-based Tools for Rapid Language<br />
Adaptation in Speech Processing <strong>Systems</strong>, Proceedings of Interspeech, Antwerp, Belgium, August 2007
Meeting Room<br />
The Meeting Browser is a powerful tool that allows us to record a new<br />
meeting, review or summarize an existing meeting or search a set of<br />
existing meetings for a particular speaker, topic, or idea.
Indexing Acoustic Documents<br />
The world is flooded with<br />
information.<br />
More and more<br />
information is coming<br />
through audio-visual<br />
channels.<br />
Trying to find information<br />
in acoustic documents<br />
needs an intelligent<br />
acoustic search engine.
View4You / Informedia<br />
Automatically records Broadcast News and allows the<br />
user to retrieve video segments of news items for<br />
different topics using spoken language input
Education, Learning Languages<br />
• LISTEN: Automated reading tutor that<br />
listens to a child read a displayed text<br />
aloud, and helps where needed.<br />
• CHENGO: web-based language learning<br />
in a gaming environment for English,<br />
Chinese<br />
• Program CALL at CMU on Computer<br />
Assisted Language Learning
Robust and Confidential Speech Recognition<br />
Traditional Speech Recognition:<br />
• Capture the acoustic sound wave by microphone<br />
• Transform signal into electrical energy<br />
Requirements and Challenges:<br />
• Audibility:<br />
Speech needs to be perceivable by microphone<br />
(no low voice or whispering, no silent speech)<br />
• Interference: Speech disturbs others<br />
(no speaking in libraries, theaters, meetings)<br />
• Privacy: Speech signal can be captured by others<br />
(no confidential phone calls in public places)<br />
• Robustness:<br />
Signal is corrupted by noisy environment<br />
(difficult to recognize in restaurants, bars, cars)
Bone-conduction<br />
o When we speak normally our body is a resonance box<br />
Skin and bones vibrate when we speak (try this!)<br />
o Capture this vibration by so-called bone-conducting<br />
or skin-conducting microphones<br />
Zheng et al.<br />
Jou et al. / Intecs<br />
o Whispered speech is defined as:<br />
o the articulated production of respiratory sound<br />
o with little or no vibration of the vocal folds<br />
o produced by the motion of the articulator apparatus<br />
o transmitted through the soft tissue or bones of the head<br />
Stethoscopic microphone (Nakajima)
Approach:<br />
Electromyography – Silent Speech<br />
– Surface Electromyography (EMG)<br />
– Surface = No needles<br />
– Electro = electrical activity<br />
– Myo = muscle<br />
– Graphy = recording<br />
- Measure the electrical activity of facial<br />
muscles by capturing the electrical<br />
potential differences<br />
[Figure labels: electrodes s1 and s2; EMG signal]<br />
- MOTION is recorded, not acoustic signal<br />
⇒ silently moving the lips / articulators<br />
is good enough<br />
Demo<br />
SILENT SPEECH<br />
s1 – s2
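
The labels s1 and s2 suggest a bipolar (differential) recording, in which the EMG signal is the potential difference between two surface electrodes. The sketch below illustrates that idea on synthetic data and is not the lab's recording pipeline. The function name `differential_emg`, the sampling rate, and all signal values are assumptions made for illustration.

```python
import math

def differential_emg(s1, s2):
    """Return the sample-wise potential difference s1 - s2."""
    if len(s1) != len(s2):
        raise ValueError("electrode signals must have equal length")
    return [a - b for a, b in zip(s1, s2)]

# Synthetic illustration (assumed values, not measured data):
# both electrodes pick up the same 50 Hz interference,
# but only s1 also carries the muscle activity.
fs = 1000  # sampling rate in Hz (assumption)
t = [n / fs for n in range(200)]
hum = [0.5 * math.sin(2 * math.pi * 50 * x) for x in t]
muscle = [0.1 * math.sin(2 * math.pi * 120 * x) for x in t]

s1 = [h + m for h, m in zip(hum, muscle)]
s2 = hum[:]

emg = differential_emg(s1, s2)
```

Because the interference is common to both electrodes, it cancels in the difference and `emg` contains only the muscle component. Real electrodes rarely see identical interference, so the cancellation is only partial in practice.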
Dolphinese (“Delphinisch”)<br />
Communication across language boundaries → across species boundaries<br />
• Collaboration with the Wild Dolphin Project<br />
• Free-ranging Atlantic spotted dolphins<br />
• Identification, behavior, communication<br />
• Communication with dolphins<br />
• Dolphins try to make contact<br />
• Information from a 20-million-year-old species<br />
• “Dolphone” and “Dolphinese”<br />
• Sound production, perception, frequency,<br />
medium<br />
• Pattern recognition, extraction,<br />
clustering, statistical modeling<br />
• Audio and video indexing, archiving, retrieval<br />
• Audio recording, analysis, synthesis, translation<br />
http://wilddolphinproject.com
Even Beyond Human Speech …<br />
Towards Communication with Dolphins<br />
Why do we want to talk to Dolphins?<br />
• They might have a lot to say (20-million-year-old species)<br />
• It is a challenging scientific problem<br />
- Cross language boundaries → cross species boundaries<br />
- Different sound production, perception, …<br />
- Different medium (water), transmission, omni-directional<br />
• Nothing is known about dolphins’ language<br />
• It involves spending a lot of time in the Bahamas ☺<br />
Why do Dolphins want to talk to us?<br />
We don’t know …<br />
… but there is evidence that they try hard<br />
CMU: www.cs.cmu.edu/~tanja<br />
Wild Dolphin Project<br />
(http://wilddolphinproject.com)
Conclusions<br />
Speech:<br />
• Is the most natural way of communication for human beings<br />
• Does not require any teaching or practicing<br />
• Has high bandwidth (speaking is faster than typing)<br />
• Supplements other communication channels (Multimodality)<br />
Speech Recognition is useful:<br />
• In hands-busy and eyes-busy environments<br />
• For mobile / small devices<br />
• For support in everyday life and as help for physically challenged people<br />
Speech Recognition and Understanding:<br />
• Allows (remote) operation of machines<br />
• Supports global communication between humans<br />
• Breaks language (and maybe sometimes cultural) barriers