
DAGA 2010 Program

Tue. 15:45 Grashof C 20 Auditory-Visual Speech
Image-based Talking Head: Analysis and Synthesis
K. Liu and J. Ostermann
Leibniz Univ. Hannover, Institut für Informationsverarbeitung

The development of modern human-computer interfaces and their applications, such as e-learning and web-based information services, has been a focus of the computer graphics community in recent years. Image-based approaches for animating faces have achieved realistic talking heads. In this paper, our image-based talking head system is presented, which consists of two parts: analysis and synthesis. In the analysis part, a subject reading a predefined corpus is recorded first. The recorded audio-visual data is analyzed in order to create a database containing a large number of normalized mouth images and their related information. The synthesis part generates natural-looking talking heads from phonetic transcripts using a unit selection algorithm. The phonetic transcripts can be extracted from a TTS (text-to-speech) system (for text-driven animation) or from speech by an aligner (for speech-driven animation). The unit selection algorithm selects and concatenates appropriate mouth images from the database by minimizing two costs: lip synchronization and smoothness. The lip synchronization cost measures how well a unit fits the phonetic context, and the smoothness cost measures how well two units join together. Finally, the mouth images are stitched at the correct position onto the face of a recorded video sequence, and the talking head is displayed.
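The unit-selection search described in this abstract can be illustrated with a short sketch. The following Python fragment is a minimal, hypothetical illustration rather than the authors' implementation: the Candidate structure, the cost functions, and the Viterbi-style dynamic program are assumptions chosen only to show how a lip-synchronization (target) cost and a smoothness (concatenation) cost can be minimized jointly over candidate mouth images.

    from dataclasses import dataclass
    from typing import Callable, List, Sequence

    @dataclass
    class Candidate:
        image_id: int              # index of a normalized mouth image in the database
        features: Sequence[float]  # e.g. mouth-shape parameters (assumed representation)

    def select_units(
        targets: Sequence[str],                 # phonetic transcript, one unit per slot
        candidates: Sequence[List[Candidate]],  # candidate mouth images for each slot
        lip_sync_cost: Callable[[str, Candidate], float],          # fit to the phonetic context
        smoothness_cost: Callable[[Candidate, Candidate], float],  # quality of the join
    ) -> List[Candidate]:
        """Viterbi-style search for the candidate sequence with minimal total cost."""
        n = len(targets)
        # best[i][j]: minimal accumulated cost of a path ending in candidate j at slot i
        best = [[lip_sync_cost(targets[0], c) for c in candidates[0]]]
        back = [[-1] * len(candidates[0])]

        for i in range(1, n):
            row, ptr = [], []
            for c in candidates[i]:
                join = [best[i - 1][k] + smoothness_cost(p, c)
                        for k, p in enumerate(candidates[i - 1])]
                k_best = min(range(len(join)), key=join.__getitem__)
                row.append(join[k_best] + lip_sync_cost(targets[i], c))
                ptr.append(k_best)
            best.append(row)
            back.append(ptr)

        # Trace back the cheapest path through the lattice.
        j = min(range(len(best[-1])), key=best[-1].__getitem__)
        path = [candidates[-1][j]]
        for i in range(n - 1, 0, -1):
            j = back[i][j]
            path.append(candidates[i - 1][j])
        return list(reversed(path))

In this simple form the search costs O(n·m²) for n transcript slots and m candidates per slot; pruning or beam search would be needed for large databases, but the two-cost structure matches the description above.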

Tue. 16:10 Grashof C 20 Auditory-Visual Speech
Adaptation of a Talking Head System to a Different Language
M. Zelezny and Z. Krnoul
University of West Bohemia, Plzen (CZ)

This paper presents new techniques that were developed in order to adapt a talking head system to a different language. Originally, the system was developed for the Czech language; the methods for generating and animating a 3D face model, for speech data processing, and for training the whole system were designed for that language. Recently, experiments were carried out on adapting the system to the English and Dutch languages. This paper generalizes these approaches for other languages. The key steps of the methods that need to be followed and the proposed software tools to use are outlined. The process of recording an audio-visual database is summarized, with the aim of obtaining data suitable for the lip-tracking method used in the training system. Requirements for the speaker, for lighting and acoustic conditions, and for data annotation are described. The steps needed for the lip-tracking method are a list of basic speech units for the given language, segmentation of the recorded data, and preparation of speech segments for the settings of the SAT (Selection of Articulatory Targets) method. At the end, requirements for the TTS (text-to-speech) system are mentioned so that from the
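To make the adaptation steps listed above more concrete, the following Python sketch groups the per-language resources the abstract mentions: a list of basic speech units, a segmentation of the recorded data, and the speech segments prepared for the SAT step. The names and structures are assumptions for illustration, not the authors' tools.

    from dataclasses import dataclass, field
    from typing import Dict, List, Tuple

    @dataclass
    class LanguageResources:
        # Hypothetical containers for the per-language inputs described above.
        phoneme_inventory: List[str]                              # basic speech units of the target language
        segmentation: Dict[str, List[Tuple[str, float, float]]]   # utterance id -> (unit, start, end)
        sat_segments: Dict[str, List[str]] = field(default_factory=dict)  # unit -> segment ids for SAT

    def collect_sat_segments(resources: LanguageResources) -> None:
        """Group segmented speech by unit so the SAT (Selection of Articulatory
        Targets) step can pick representative segments for each unit."""
        for utt_id, units in resources.segmentation.items():
            for unit, start, end in units:
                resources.sat_segments.setdefault(unit, []).append(
                    f"{utt_id}:{start:.2f}-{end:.2f}")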
