CSL - Cognitive Systems Lab

Lecture, Summer Semester 2008<br />

Multilingual Human-Machine Communication<br />

Prof. Dr. Tanja Schultz<br />

Dipl.-Inform. Felix Putze<br />

Universität Karlsruhe, Department of Informatics<br />

Tuesday, April 15, 2008


<strong>Cognitive</strong> <strong>Systems</strong> <strong>Lab</strong><br />

2/60<br />

Overview<br />

Lecture 1: Overview and Introduction<br />

General information about the lecture<br />

Introduction of the chair<br />

interACT<br />

Introduction to the topic<br />

Content structure of the lecture<br />

Questions and answers


General Information: Lecture<br />

Advanced lecture in the Hauptdiplom (main diploma phase)<br />

o No prior knowledge required<br />

Examination option:<br />

o Yes, in Cognitive Systems and Anthropomatics<br />

Cycle:<br />

o Annually in the summer semester, 4+0<br />

o Examination in every examination period<br />

Dates:<br />

o 27 lecture sessions<br />

o Tue 14:00 – 15:30 (HS -101) and Thu 14:00 – 15:30 (SR 131)<br />

o Start 15.04.08, end 17.07.08<br />

Lecturers:<br />

o Prof. Dr. Tanja Schultz<br />

o Dipl.-Inform. Felix Putze


General Information: Lecture<br />

o All lecture materials are available at<br />

http://csl.ira.uka.de > Lehre > SS2008 > MMMK<br />

o All slides as PDF (no password protection)<br />

o Current changes, announcements, syllabus<br />

o Additional material where appropriate (papers)<br />

o Basis for examinations:<br />

o Lecture content, slides, additional material<br />

o Questions, problems, and comments are welcome at any time during the lecture or in person:<br />

o Tanja Schultz (tanja@ira.uka.de)<br />

o Felix Putze (putze@ira.uka.de)<br />

o Office hours with Tanja Schultz by appointment


General Information: CSL<br />

o Chair for Cognitive Systems since June 1, 2007<br />

o Universität Karlsruhe, Department of Informatics<br />

o Institute for Algorithms and Cognitive Systems (IAKS)<br />

o Founding of the Cognitive Systems Laboratory (CSL)<br />

o Homepage: http://csl.ira.uka.de<br />

o Computer science building 50.34, 2nd floor, left wing<br />

o Contact:<br />

o Prof. Dr.-Ing. Tanja Schultz<br />

o tanja@ira.uka.de<br />

o +49 721 608 6300<br />

o Secretary's office: Ms. Helga Scherer<br />

o scherer@ira.uka.de<br />

o +49 721 608 6312


Research: Human-Centered Technologies<br />

Application area: human-machine interaction<br />

Challenges and tasks:<br />

Productivity and usability<br />

Human communication with the environment in the broadest sense:<br />

Speech, movement, biosignals<br />

Technologies and methods:<br />

Recognition, understanding, identification<br />

Statistical modeling, classification, ...<br />

Application area: human-human communication<br />

Challenges and tasks:<br />

Language diversity, cultural barriers<br />

Effort and costs


Teaching at CSL – Winter<br />

Winter semester<br />

o Biosignals and User Interfaces<br />

o 4+0, examinable in Cognitive Systems and Anthropomatics<br />

o Introduction to the acquisition and interpretation of biosignals<br />

o Application examples<br />

o Analysis and Modeling of Human Motion<br />

o Introduction to the analysis, modeling, and recognition of human motion sequences (jointly with Dr. Annika Wörner)<br />

o 2+0, examinable in Cognitive Systems and Anthropomatics<br />

o Multilingual Speech Processing<br />

o 2+0, seminar, not examinable<br />

o Development of speech translation systems using Rapid Language Adaptation Tools<br />

o Insights from practice (IBM) on standards (VoiceXML)<br />

o The seminar takes place simultaneously at UKA and CMU via video conference (VC)<br />

o First joint course within the framework of interACT!<br />

⇒ interACT


Teaching at CSL – Summer<br />

Summer semester<br />

o Multilingual Human-Machine Communication<br />

o 4+0, examinable in Cognitive Systems and Anthropomatics<br />

o Introduction to automatic speech recognition and processing<br />

o Signal processing, statistical modeling, practical approaches and methods, multilinguality<br />

o Applications in human-human communication and human-machine interaction<br />

o Application examples<br />

o Lab course: Biosignals<br />

o Hands-on development<br />

o Recording of motion data (in cooperation with the sports institute)<br />

o Various biosensors (Vicon, accelerometers, EMG)<br />

o Automatic motion recognition<br />

o Seminar: Human-Machine Emotion<br />

o Introduction to modeling and recognizing emotion in the context of human-machine interaction


interACT –<br />

a growing cooperation between<br />

Universität Karlsruhe (TH)<br />

&<br />

Carnegie Mellon University, Pittsburgh, USA<br />

&<br />

Hong Kong University of Science and Technology<br />

Contact:<br />

interACT Press & Communication, Margit Rödder, e-mail: roedder@ira.uka.de


interACT:<br />

Since 2004, a joint research center of Carnegie Mellon University (CMU) and Universität Karlsruhe (TH)<br />

February 2007: extended to include Hong Kong University of Science and Technology (HKUST)<br />

Joint research projects between CMU, UKA, and HKUST<br />

Guest lectures, workshops, summer academies, study and research stays<br />

A limited number of scholarships for outstanding students who wish to deepen their computer science studies with a research stay in the USA and now also in Hong Kong<br />

Prof. Tom Mitchell, lecture at UKA<br />

UKA Rector H. Hippler with CMU President J.L. Cohon during his visit to UKA, Sept. 2005


interACT Student Exchange 2004 - 2008<br />

CMU President J.L. Cohon with interACT students at Universität Karlsruhe, June 2006<br />

65 exchanges so far:<br />

• 1 postdoc<br />

• 5 PhD<br />

• 34 Diplom theses<br />

• 25 Studienarbeiten (student research projects)


interACT Scholarships<br />

Who is funded?<br />

• Students of computer science and of Informationswirtschaft (information engineering and management) at Universität Karlsruhe (TH), Carnegie Mellon University, and Hong Kong University of Science and Technology<br />

What is funded?<br />

• Studienarbeit and bachelor's theses, up to 3 months<br />

• Diplom and master's theses, up to 8 months<br />

• Doctoral theses, up to 12 months<br />

Application requirements<br />

• At least a completed Vordiplom (intermediate diploma)<br />

• Good English skills<br />

• Supervising professor at UKA and CMU<br />

Application deadlines<br />

Next deadline: May 28, 2008<br />

interACT students D. Bertram & K. Steinbach in Prof. Kuffner's lab, CMU


interACT Cooperations between UKA and CMU (as of April 2008)<br />

Prof. R. Dillmann – Dr. J. Kuffner,<br />

Prof. R. Simmons,<br />

Prof. J. Dolan,<br />

Prof. Chr. Atkeson<br />

Prof. Ender Finol<br />

Prof. J. Henkel - Prof. M. Lewicki<br />

Prof. W. Juling – Prof. E. Nyberg<br />

Prof. M. Shaw<br />

Prof. P. Schmid – Prof. E. Clarke<br />

Prof. Studer – Prof. Sycara<br />

Prof. R. Dillmann presents his research to President Cohon, UKA, Sept. 2005


interACT Cooperations between UKA and CMU (as of April 2008)<br />

Prof. W. Tichy – Dr. J. Herbsleb<br />

Prof. M. Shaw<br />

Prof. R. Vollmar – Prof. Sutner<br />

Prof. D. Wagner – Prof. Miller<br />

Prof. Ravi<br />

Prof. Chr. Weinhardt – Prof. R. Krishnan<br />

– Prof. Ramayya<br />

Prof. Alex Waibel & Prof. T. Schultz – Prof. Alex Waibel<br />

Prof. T. Schultz<br />

Prof. A. Black


interACT Cooperations between UKA and HKUST (as of April 2008)<br />

Prof. Tanja Schultz – Prof. Dekai Wu<br />

Prof. Pascale Fung<br />

Prof. Alex Waibel – Prof. Dekai Wu<br />

Prof. Pascale Fung<br />

Prof. Dekai Wu, talk at UKA, July 2007


The interACT Advisory Board selects interACT scholarship recipients twice a year<br />

Members of the interACT Advisory Board are:<br />

• Prof. Alex Waibel, Universität Karlsruhe, Carnegie Mellon University (interACT Director)<br />

• Prof. Tanja Schultz, Universität Karlsruhe, Carnegie Mellon University (interACT Ass. Director)<br />

• Prof. Rüdiger Dillmann, Universität Karlsruhe<br />

• Prof. Jörg Henkel, Universität Karlsruhe<br />

• Prof. Wilfried Juling, Universität Karlsruhe<br />

• Prof. Walther Tichy, Universität Karlsruhe<br />


interACT Offices<br />

in Karlsruhe, Pittsburgh, and Hong Kong<br />

• Organization of the interACT exchange program, contact persons for students<br />

• Press and public relations<br />

• Coordination of the distinguished lecture series<br />

• Organization of workshops, academies, and events<br />

Contact persons<br />

Universität Karlsruhe (TH)<br />

Margit Rödder<br />

Am Fasanengarten 5<br />

76131 Karlsruhe<br />

Tel.: +49 721 608 8676<br />

E-mail: roedder@ira.uka.de<br />

Carnegie Mellon University, interACT<br />

Lisa Mauti<br />

407, South Craig Street<br />

Pittsburgh, PA 15221<br />

Tel: +1 412 268 1461<br />

lmauti@cs.cmu.edu<br />

Hong Kong University<br />

of Science and Technology<br />

Prof. Pascale Fung<br />

Tel.: +852 2358 7087<br />

pascale@ee.ust.hk


Further information about interACT<br />

http://interact.ira.uka.de


Attendance List<br />

o Please fill in!<br />

No. | Last name, first name | Subject, semester | Student ID no. | Email<br />

1 | SCHULTZ, Tanja | Computer Science, 36 | | tanja@ira.uka.de<br />

2


Literature<br />

Xuedong Huang, Alex Acero and Hsiao-wuen Hon, Spoken<br />

Language Processing, Prentice Hall PTR, NJ, 2001<br />

($81.90 internet price)<br />

Rabiner and Juang, Fundamentals of Speech Recognition,<br />

Prentice Hall Signal Processing Series, Englewood<br />

Cliffs, NJ, 1993<br />

Jelinek, Statistical Methods for Speech Recognition, MIT<br />

Press, Cambridge, MA, 1997 ($35)<br />

Schultz and Kirchhoff, Multilingual Speech Processing,<br />

Elsevier, Academic Press, 2006<br />

(ask the authors for discounts!)<br />

+ various articles (PDF) that we make available on the web


Useful Links, Additional Material<br />

• All slides will be posted on the web as PDF<br />

http://csl.ira.uka.de > Lehre > SS2008 > MMMK<br />

• Dr. Ivica Rogina's lecture (up to and including SS 2006)<br />

http://isl.ira.uka.de/~stueker/sprachVorlesung<br />

• Electronic archive of many proceedings and reports of the most important conferences on “Speech and Language”<br />

• http://csl.ira.uka.de (passwd)<br />

• ICASSP (International Conference on Acoustics, Speech, and Signal Processing)<br />

• Interspeech (merger of Eurospeech and ICSLP)<br />

• ASRU (Automatic Speech Recognition and Understanding)<br />

• ACL (Association for Computational Linguistics), NAACL (North American Chapter of the ACL)<br />

• HLT (Human Language Technologies) …


Further Courses<br />

1) Seminar + lab course: Multilingual Speech Processing<br />

- Do-it-yourself course, development of a speech interface<br />

- YOU choose the language<br />

- WE have the tools (Rapid Language Adaptation Toolkit)<br />

The course is offered jointly with Carnegie Mellon University, i.e. it takes place LIVE via video conference (VC) with CMU<br />

2) Janus lab course by Sebastian Stüker, chair of Prof. A. Waibel<br />

3) Lecture and lab course on biosignals<br />

Biosignals are signals that the human body produces as a result of physical or biochemical laws<br />

Acquisition and interpretation of electrical biosignals and their application in human-machine communication<br />

- Electroencephalography – measurement of cerebral activity<br />

- Electromyography – measurement of muscle activity<br />

- and much more


SS2008 Lectures on Related Topics<br />

o Human-Machine Emotion (Schultz/Laskowski)<br />

o Speech and emotion<br />

o Multimodal User Interfaces (Stiefelhagen/Waibel)<br />

o Speech as one modality<br />

o Pattern Recognition (Beyerer)<br />

o Fundamentals of pattern recognition<br />

o Human-Machine Interaction (Wörn/Burghart)<br />

o Human-Robot Cooperation (Wörn/Burghart)<br />

o Focus: the challenge of humanoid robots


Syllabus<br />

o TBD


General Information: Goal of the Course<br />

Goals of the lecture<br />

o Speech in human-machine communication<br />

o Advantages of speech as an input signal<br />

o Disadvantages of speech as an input signal<br />

o Fundamentals of speech recognition<br />

o Basic concepts<br />

o Speech production and perception<br />

o Digital signal processing, feature extraction<br />

o Statistical modeling, classification<br />

o Acoustic modeling, HMMs<br />

o Language modeling<br />

o Further topics in speech processing<br />

o Dialog modeling, synthesis, translation<br />

o Application examples from research


Today: Application Examples<br />

o Speech recognition: from speech input signal to text<br />

o Speech synthesis: from text to speech output signal<br />

o Speech translation (across language boundaries):<br />

From speech signal in language L1 to speech signal in L2<br />

= speech recognition + MT + speech synthesis<br />

o Speech understanding, summarization<br />

= from speech input signal to meaning<br />

o But speech activity is more than just what is being said<br />

Who is speaking? → Speaker identification<br />

Which language is being spoken? → Language ID<br />

What is being talked about? → Topic ID<br />

How is it being said? → Emotion ID<br />

Who is being addressed? → Focus of attention<br />

o Translation (across species boundaries)<br />

Example: dolphins
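The decomposition on this slide (speech translation = speech recognition + MT + speech synthesis) can be sketched as a chain of three functions. This is only a toy illustration: the functions `recognize`, `translate`, and `synthesize`, the lookup tables, and the use of tagged strings in place of audio signals are all hypothetical placeholders and not components from the lecture.

```python
from typing import Callable

# Hypothetical stand-ins: a real system would operate on audio samples;
# here "signals" are tagged strings so that the sketch stays runnable.
TOY_ASR = {"audio:guten-tag": "guten tag"}   # L1 speech signal -> L1 text
TOY_MT = {"guten tag": "good day"}           # L1 text -> L2 text


def recognize(signal: str) -> str:
    """Speech recognition: speech input signal -> text (toy lookup)."""
    return TOY_ASR[signal]


def translate(text: str) -> str:
    """Machine translation: text in L1 -> text in L2 (toy lookup)."""
    return TOY_MT[text]


def synthesize(text: str) -> str:
    """Speech synthesis: text -> speech output signal (toy encoding)."""
    return "audio:" + text.replace(" ", "-")


def compose(*stages: Callable[[str], str]) -> Callable[[str], str]:
    """Chain stages left to right: the output of one stage feeds the next."""
    def pipeline(x: str) -> str:
        for stage in stages:
            x = stage(x)
        return x
    return pipeline


speech_to_speech = compose(recognize, translate, synthesize)
result = speech_to_speech("audio:guten-tag")
print(result)
```

Because each stage only sees the output of the previous one, an error made by the recognizer is passed on to translation and synthesis unchanged, which is one reason the individual components matter so much in such a cascade.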


Introduction<br />

• Each of the lessons covers one topic from<br />

“speech recognition and understanding”<br />

• It covers the most important areas of today’s<br />

research and also discusses some historic issues<br />

• The goal of the course is to introduce you to the<br />

science of automatic speech recognition and<br />

understanding<br />

• Today's topics:<br />

o Why are we doing speech recognition?<br />

o What are the advantages and disadvantages?<br />

o Where is it useful?<br />

o Examples of applications, demos


Why Automatic Speech Recognition?<br />

ADVANTAGES:<br />

• Natural way of communication for human beings<br />

• No practicing necessary for users, i.e. speech does not<br />

require any teaching as opposed to reading/writing<br />

• High bandwidth (speaking is faster than typing)<br />

• Additional communication channel (Multimodality)<br />

• Hands and eyes are free for other tasks<br />

→ Works in the car / on the run / in the dark<br />

• Mobility (microphones are smaller than keyboards)<br />

• Some communication channels (e.g. phone) are designed<br />

for speech<br />

• ...


Why Automatic Speech Recognition?<br />

DISADVANTAGES:<br />

• Unusable where silence/confidentiality is required<br />

(meetings, library, spoken access codes)<br />

… we are working on solutions (see later)<br />

• Still unsatisfactory recognition rate when:<br />

• Environment is very noisy (party, restaurant, train)<br />

• Unknown or unlimited domains<br />

• Uncooperative speakers (whisper, mumble, …)<br />

• Problems with accents, dialects, code-switching<br />

• Cultural factors (e.g. collectivism, uncertainty avoidance)<br />

• Speech input is still more expensive than keyboard


Input Speeds (Characters per Minute)<br />

Mode | Standard | Best<br />

Handwriting | 200 | 500<br />

Typewriter | 200 | 1000<br />

Stenography | 500 | 2000<br />

Speech | 1000 | 4000
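To make the bandwidth argument concrete, the ratios between the input modes can be computed directly from the figures on this slide. The sketch below does only that arithmetic. It assumes that the four number pairs belong to the four modes in the order listed; the dictionary layout and the helper name `speedup` are illustrative choices.

```python
from typing import Tuple

# Characters per minute as (standard, best), taken from the slide above,
# assuming the number pairs follow the order in which the modes are listed.
INPUT_SPEEDS = {
    "Handwriting": (200, 500),
    "Typewriter": (200, 1000),
    "Stenography": (500, 2000),
    "Speech": (1000, 4000),
}


def speedup(mode: str, baseline: str) -> Tuple[float, float]:
    """Return how many times faster `mode` is than `baseline`
    as a pair (standard ratio, best ratio)."""
    m_std, m_best = INPUT_SPEEDS[mode]
    b_std, b_best = INPUT_SPEEDS[baseline]
    return m_std / b_std, m_best / b_best


for other in ("Handwriting", "Typewriter", "Stenography"):
    std, best = speedup("Speech", other)
    print(f"Speech vs {other}: {std:.1f}x (standard), {best:.1f}x (best)")
```

With these figures, standard speech is five times faster than standard typing, and even the best typists are still a factor of four behind the best speakers.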


Where is Speech Recognition and Understanding useful<br />

Human - Machine Interaction:<br />

1. Remote control applications<br />

• Operating Machines over the Phone<br />

2. Hands/Eyes busy or not useful<br />

• Speech Recognition in cars<br />

• Help for Physically Challenged, Nurse bots<br />

3. Authentication<br />

• Speaker Identification/Verification/Segmentation<br />

• Language/Accent Identification<br />

4. Entertainment / Convenience<br />

• Speech Recognition for Entertainment<br />

• Gaming<br />

5. Indexing and Transcribing Acoustic Documents<br />

• Archive, Summarize, Search and Retrieve


Where is Speech Recognition and Understanding useful<br />

Human - Human Interaction:<br />

1. Mediate communication across language boundaries<br />

• Speech Translation<br />

• Language Learning<br />

• Synchronization / Sign Language<br />

2. Support human interaction<br />

• Meeting and Lecture systems<br />

• Non-verbal Cue Identification<br />

• Multimodal applications<br />

• Speech therapy support


Operating Machines over the Phone<br />

• Remote Controlled Home<br />

• Operate heating / air conditioning, turn lights on/off, check email<br />

• Voice-Operated Answering Machine<br />

• Call answering machine from anywhere and discuss recent calls<br />

• Access Databases<br />

• Pittsburgh Bus Information with CMU’s Let’s Go at 412-268-3526<br />

• Check the weather with MIT’s Jupiter at 1-888-573-8255<br />

• Zugauskunft (Erlangen), Telefonauskunft, Fluggesellschaften, Kino<br />

• Call Center<br />

• Route or dispatch calls, 911 emergency line<br />

• AT&T: How may I help you?<br />

The HMIHY system was deployed in 2001, and according to AT&T<br />

was handling more than 2 million calls per month by the end of 2001.<br />

• Use Interactive Services worldwide<br />

• Plan your next trip with an artificial travel agent


Hands-Free / Eyes-Free Tasks<br />

• Hands and/or Eyes are busy with tools<br />

• Radio repair<br />

• Construction site<br />

• Hands and/or Eyes are needed to operate machines/cars<br />

• Hold the steering wheel<br />

• Pull levers, turn knobs, operate switches<br />

• Watch the street while driving<br />

• Monitor production line<br />

• Hands are working on other people<br />

• Hair stylist cutting hair<br />

• Surgeon working on a patient<br />

• Hands and/or Eyes are not helpful in the environment<br />

• Dark rooms (photography)<br />

• Outer Space (remote control)


Speech Recognition in Cars<br />

• Use your cellular phone while keeping your hands on the<br />

wheel and eyes on the street, e.g. voice dialing<br />

• Operate your audio device while driving<br />

• Dictate messages (e-mails, SMS)<br />

TODAY several companies and services<br />

are emerging which do exactly this<br />

• Talk to your personal digital assistant<br />

• Navigation -<br />

Ask your way through a foreign city<br />

Find the nearest restaurant


Support in everyday life, Help for Elderly and Physically Challenged<br />

People who are immobile such as lying in bed/hospital or who can‘t use the<br />

hands due to illness or accidents<br />

• operate parts of their environment/machines by voice<br />

• ask a robot for help<br />

Nursebot Pearl and Florence: CMU's robotic assistant for the elderly<br />

ISAC feeding a physically challenged individual: Center for Intelligent Systems, Vanderbilt Univ.<br />

SFB588, UKA: humanoid robots<br />

Children with speaking disorders make significant improvements by trying<br />

to make a speech recognizer understand them<br />

Children with dyslexia and similar problems learn to read faster using<br />

automatic speech recognition


Information in Speech<br />

Speech Recognition → Words: Onune baksana be adam!<br />

Language Recognition → Language name: Turkish<br />

Speaker Recognition → Speaker name: Umut<br />

Emotion Recognition → Emotion: Angry<br />

Accent Recognition → Accent: Istanbul<br />

Topic ID: Chemicals<br />

Entity Tracking: Istanbul<br />

Acoustic Scene: Bus Station<br />

Discourse Analysis: Negotiation<br />

Tanja Schultz, Speaker Characteristics, In: C. Müller (Ed.) Speaker Classification, Lecture Notes in Computer<br />

Science / Artificial Intelligence, Springer, Heidelberg - Berlin - New York, Volume 4343.


Speaker Recognition<br />

Identification: Whose voice is it?<br />

Verification/Detection: Is it Sally’s voice?<br />

Segmentation and Clustering: Where are the speaker changes? Which segments are from the same speaker?


Speaker Identification/Verification/Recognition<br />

Verification<br />

Verify someone’s claimed identity, i.e. is the person who s/he claims to be?<br />

Instead of a password: say something instead of typing<br />

Identification<br />

“Who is speaking?”<br />

Identifies a speaker from an enrolled population by searching the database<br />

Personalized behavior: customize the machine's reaction automatically to the current user<br />

Recognition<br />

Often used to refer to all problems of verification, identification, and segmentation & clustering
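The difference between the two tasks on this slide can be sketched in a few lines: identification searches the whole enrolled population for the best match, while verification checks a single claimed identity against a threshold. This is a toy illustration under stated assumptions. The three-dimensional "voice models", the cosine similarity score, the threshold of 0.9, and the speaker names are all made up for the example; real systems use statistical speaker models trained on speech features.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def identify(sample: Vector, enrolled: Dict[str, Vector]) -> Tuple[str, float]:
    """Identification: search all enrolled speakers, return the best match."""
    best = max(enrolled, key=lambda name: cosine(sample, enrolled[name]))
    return best, cosine(sample, enrolled[best])


def verify(sample: Vector, claimed: str, enrolled: Dict[str, Vector],
           threshold: float = 0.9) -> bool:
    """Verification: accept or reject one claimed identity."""
    return cosine(sample, enrolled[claimed]) >= threshold


# Hypothetical enrolled population with made-up 3-dimensional voice models.
ENROLLED = {
    "Sally": [0.9, 0.1, 0.3],
    "Will": [0.2, 0.8, 0.1],
    "Tim": [0.1, 0.3, 0.9],
}

sample = [0.85, 0.15, 0.35]  # made-up features of an unknown utterance
print(identify(sample, ENROLLED)[0])      # closest enrolled speaker
print(verify(sample, "Sally", ENROLLED))  # claim "I am Sally"
print(verify(sample, "Tim", ENROLLED))    # claim "I am Tim"
```

Note the different cost profile: identification compares against every enrolled speaker, so its effort grows with the population, whereas verification needs only one comparison but depends on a well-chosen threshold.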


Speaker Segmentation and Clustering<br />

Segmentation: Automatically segment incoming speech by speaker<br />

Clustering: cluster segments of the same speaker<br />

Adaptation: use parameters that are optimized for recognizing a specific speaker<br />

Mandarin Broadcast News<br />

Overlapping speech<br />

Speaker turn miss<br />

Speech over noise
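The two steps named on this slide, segmentation and clustering, can be illustrated with a deliberately simple sketch. It assumes a single made-up number per frame as the "voice feature", a fixed jump threshold for detecting speaker changes, and a greedy grouping of segments by their mean value. None of this is the method used in the lecture; it only shows how the two steps fit together.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # (start, end) frame indices, end exclusive


def mean(values: List[float]) -> float:
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def segment(frames: List[float], jump: float) -> List[Segment]:
    """Segmentation: open a new segment whenever a frame deviates from the
    running mean of the current segment by more than `jump`."""
    segments: List[Segment] = []
    start = 0
    for i in range(1, len(frames)):
        if abs(frames[i] - mean(frames[start:i])) > jump:
            segments.append((start, i))
            start = i
    segments.append((start, len(frames)))
    return segments


def cluster(frames: List[float], segments: List[Segment],
            tol: float) -> List[int]:
    """Clustering: give segments with similar means the same speaker label."""
    centers: List[float] = []
    labels: List[int] = []
    for start, end in segments:
        m = mean(frames[start:end])
        for label, center in enumerate(centers):
            if abs(m - center) <= tol:
                labels.append(label)
                break
        else:
            # No existing cluster is close enough: start a new one.
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels


# Made-up 1-D feature per frame: speaker A near 1.0, speaker B near 3.0.
frames = [1.0, 1.1, 0.9, 3.0, 3.1, 2.9, 1.0, 1.05]
segs = segment(frames, jump=1.0)
labels = cluster(frames, segs, tol=0.5)
print(segs)
print(labels)
```

In this toy input the first and third segments receive the same label, which is the "which segments are from the same speaker" question answered by clustering. Overlapping speech or speech over noise would blur the per-frame feature and lead to missed or spurious speaker turns.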


Language Identification<br />

o Selecting the recognizer (in multilingual speech recognition)<br />

o Call routing (e.g. 911 emergency line)<br />

o Data analysis, selection<br />

o Special case: accent recognition<br />

o Optimization of all system parameters for the speaker's accent<br />

o E-language learning<br />

Japanese<br />

Tanja Schultz, Identifizierung von Sprachen -Exemplarisch aufgezeigt am Beispiel der Sprachen Deutsch, Englisch und Spanisch,<br />

Diplomarbeit, Institut für Logik, Komplexität und Deduktionssysteme, Universität Karlsruhe, April 1995


FarSID: Far-Field Speaker Recognition<br />

o Original signal<br />

o Effect of echo (medium-sized room, 1 m distance to microphone):<br />

0.5 sec, 1 sec, 2 sec<br />

o Effect of distance (medium-sized room, 0.5 sec echo):<br />

1 m, 2 m, 4 m<br />

o Effect of room size (1 m distance, 0.5 sec echo):<br />

small, medium, large<br />

Q. Jin, Y. Pan, T. Schultz, Far-Field Speaker Recognition, Proceedings of the IEEE International<br />

Conference on Acoustics, Speech, and Signal Processing, ICASSP, Toulouse, France, 2006


The dream (?) of communicating<br />

across language boundaries<br />

- A babelfish for everybody -<br />

Global Communication<br />

• Fun, Everyday life:<br />

• Chat in your mother tongue<br />

Worldwide<br />

• Travel without comm. problems<br />

• Business:<br />

• Negotiate and be sure that your partner is getting it right<br />

• The computer has no stakes, i.e. a neutral translation, not a lopsided one<br />

• Face-to-Face Communication<br />

• Over the phone or internet<br />

• Text-to-Text vs Speech-to-Speech<br />

"The Building of the Tower of Babel", 1563, by Pieter Brueghel, Kunsthistorisches Museum, Vienna<br />

The building of the Tower of Babel and the Confusion of Tongues (languages) in ancient Babylon are mentioned in Genesis.<br />

"Babel" is composed of two words: "baa", meaning "gate", and "el", meaning "god"; hence, "the gate of god". A related word in Hebrew, "balal", means "confusion".


GALE = Global Autonomous Language Exploitation:<br />

Process huge volumes of speech and text data in<br />

multiple languages (Arabic, Chinese, English)<br />

• Broadcast News, Shows, Telephone Conversations<br />

Apply automatic technology to spoken and written language:<br />

• Absorb, Analyze, and Interpret<br />

Deliver pertinent information in easy-to-understand<br />

forms to monolingual analysts<br />

Three engines:<br />

- Transcription,<br />

- Translation,<br />

- Distillation<br />

GALE


Demonstration GALE – Chinese TV<br />

Mandarin Broadcast News (CCTV), recorded in the US over satellite<br />

ASR: transforming the Mandarin speech into Chinese text using Automatic Speech Recognition<br />

SMT: translating from Chinese text into English text using Statistical Machine Translation<br />

H. Yu, Y.C. Tam, T. Schaaf, S. Stüker, Q. Jin, M. Noamany, T. Schultz, The ISL RT04 Mandarin Broadcast<br />

News Evaluation System, EARS Rich Transcription Workshop, Palisades, NY, November 2004


PDA Speech Translation in Mobile Scenarios<br />

o Tourism<br />

o Needs in Foreign Country<br />

o International Events<br />

o Conferences<br />

o Business<br />

o Olympics<br />

o Humanitarian Needs<br />

o Humanitarian, Government<br />

o Emergency line 911<br />

o USA, multicultural population<br />

o Army, peace corps<br />

A. Waibel, A. Badran, A. Black, R. Frederking, D. Gates, A. Lavie, L. Levin, K. Lenzo, L Mayfield Tomokiyo,<br />

J. Reichert, T. Schultz, D. Wallace, M. Woszczyna, J. Zhang, Speechalator: Two-way Speech-to-Speech Translation<br />

in your Hand. HLT-NAACL 2003, Edmonton, Alberta, Canada, 2003


Verbmobil<br />

Talk to people (face-to-face) from/in other countries in your own<br />

language.<br />

A step towards Star Trek's "Universal Translator"


Mobility: Personal Digital Assistants<br />

Use your PDA or cellular phone to get help<br />

• Navigation<br />

• Translation<br />

• Information (travel, transportation, medical, ...)<br />

Demo


SPICE: Rapid Language Adaptation Tools<br />

Major Problem: Tremendous costs and time for development<br />

o Very few languages (≤ 50 out of 6900) with many resources<br />

o Lack of conventions (e.g. Languages without writing system)<br />

o Gap between technology and language expertise<br />

⇒ SPICE: Intelligent system that learns language from user<br />

o Speech Processing: Interactive Creation and Evaluation toolkit<br />

o Develop web-based toolkits for Speech Processing: ASR, MT, TTS<br />

o http://cmuspice.org<br />

o Interactive efficient learning<br />

Interactive learning:<br />

Demo<br />

o Solicit knowledge from the user in the loop<br />

o Rapid adaptation of language independent models<br />

Efficiency:<br />

o Reduce time and costs by a factor of 10<br />

T. Schultz, A. Black, S. Badaskar, M. Hornyak, J. Kominek, SPICE: Web-based Tools for Rapid Language<br />

Adaptation in Speech Processing <strong>Systems</strong>, Proceedings of Interspeech, Antwerp, Belgium, August 2007


Meeting Room<br />

The Meeting Browser is a powerful tool that allows us to record a new<br />

meeting, review or summarize an existing meeting or search a set of<br />

existing meetings for a particular speaker, topic, or idea.


Indexing Acoustic Documents<br />

The world is flooded with<br />

information.<br />

More and more<br />

information is coming<br />

through audio-visual<br />

channels.<br />

Trying to find information<br />

in acoustic documents<br />

needs an intelligent<br />

acoustic search engine.



View4You / Informedia<br />

Automatically records Broadcast News and allows the<br />

user to retrieve video segments of news items for<br />

different topics using spoken language input



Education, Learning Languages<br />

• LISTEN: Automated reading tutor that<br />

listens to a child read a displayed<br />

text aloud, and helps where needed.<br />

• CHENGO: web-based language learning<br />

in a gaming environment for English and<br />

Chinese<br />

• CALL program at CMU on Computer<br />

Assisted Language Learning



Robust and Confidential Speech Recognition<br />

Traditional Speech Recognition:<br />

• Capture the acoustic sound wave by microphone<br />

• Transform signal into electrical energy<br />

Requirements and Challenges:<br />

• Audibility:<br />

Speech needs to be perceivable by microphone<br />

(no low voice or whispering, no silent speech)<br />

• Interference: Speech disturbs others<br />

(no speaking in libraries, theaters, meetings)<br />

• Privacy: Speech signal can be captured by others<br />

(no confidential phone calls in public places)<br />

• Robustness:<br />

Signal is corrupted by noisy environment<br />

(difficult to recognize in restaurants, bars, cars)



Bone-conduction<br />

o When we speak normally, our body acts as a resonance box<br />

Skin and bones vibrate when we speak (try this!)<br />

o Capture this vibration by so-called bone-conducting<br />

or skin-conducting microphones<br />

Zheng et al.<br />

Jou et al. / Intecs<br />

o Whispered speech is defined as:<br />

o the articulated production of respiratory sound<br />

o with little or no vibration of the vocal folds<br />

o produced by the motion of the articulatory apparatus<br />

o transmitted through the soft tissue or bones of the head<br />

Stethoscopic microphone (Nakajima)



Approach:<br />

Electromyography – Silent Speech<br />

– Surface Electromyography (EMG)<br />

– Surface = No needles<br />

– Electro = electrical activity<br />

– Myo = muscle<br />

– Graphy = recording<br />

- Measure the electrical activity of facial<br />

muscles by capturing the electrical<br />

potential differences<br />

[Figure labels: s1, s2, EMG-Signal]<br />

- MOTION is recorded, not the acoustic signal<br />

⇒ silently moving the lips / articulators<br />

is good enough<br />

Demo<br />

SILENT SPEECH<br />

s1 – s2
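
The "s1 – s2" label on this slide suggests that the EMG signal is taken as the difference between two electrode signals. A minimal sketch of that idea, assuming both electrodes are sampled at the same instants; the function name `differential_emg` and the sample values are hypothetical and chosen only for illustration.

```python
def differential_emg(s1, s2):
    """Return the sample-wise difference s1 - s2 of two electrode signals.

    Interference that reaches both electrodes equally (for example
    power-line hum) cancels out in the difference, while the local
    muscle activity that differs between the electrodes remains.
    """
    if len(s1) != len(s2):
        raise ValueError("both electrode signals need the same length")
    return [a - b for a, b in zip(s1, s2)]


# Hypothetical samples: a shared offset of 1.0 is present on both electrodes.
s1 = [1.5, 2.0, 1.0]
s2 = [1.0, 1.0, 1.0]
emg = differential_emg(s1, s2)
```

Here the shared offset disappears and only the difference between the two electrode sites is left, which is the quantity a recognizer for silent speech would then process further.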



Dolphinese (Delphinisch)<br />

Communication across language boundaries → across species boundaries<br />

• Collaboration with the Wild Dolphin Project<br />

• Free-ranging Atlantic Spotted Dolphins<br />

• Identification, behavior, communication<br />

• Communication with dolphins<br />

• Dolphins try to make contact<br />

• Information from a 20-million-year-old species<br />

• “Dolphone” and “Delphinisch” (Dolphinese)<br />

• Sound production, perception, frequency,<br />

medium<br />

• Pattern recognition, extraction,<br />

clustering, statistical modeling<br />

• Audio and video indexing, archiving, retrieval<br />

• Audio recording, analysis, synthesis, translation<br />

http://wilddolphinproject.com



Even Beyond Human Speech …<br />

Towards Communication with Dolphins<br />

Why do we want to talk to Dolphins?<br />

• They might have a lot to say (20-million-year-old species)<br />

• It is a challenging scientific problem<br />

- Cross language boundaries →<br />

cross species boundaries<br />

- Different sound production, perception, …<br />

- Different medium (water), transmission, omni-directional<br />

• Nothing is known about dolphins’ language<br />

• It involves spending a lot of time in the Bahamas ☺<br />

Why do Dolphins want to talk to us?<br />

We don’t know …<br />

… but there is evidence that they try hard<br />

CMU: www.cs.cmu.edu/~tanja<br />

Wild Dolphin Project<br />

(http://wilddolphinproject.com)



Conclusions<br />

Speech:<br />

• Is the most natural way of communication for human beings<br />

• Does not require any teaching or practicing<br />

• Has high bandwidth (speaking is faster than typing)<br />

• Supplements other communication channels (Multimodality)<br />

Speech Recognition is useful:<br />

• In hands-busy and eyes-busy environments<br />

• For mobile / small devices<br />

• For support in everyday life and as help for physically challenged people<br />

Speech Recognition and Understanding:<br />

• Allows machines to be operated (remotely)<br />

• Supports global communication between humans<br />

• Breaks language (and maybe sometimes cultural) barriers
