04.07.2013 Views

Download - CCRMA - Stanford University

Download - CCRMA - Stanford University

Download - CCRMA - Stanford University

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

G.2.6 Toward a High-Quality Singing Synthesizer<br />

Hui-Ling Lu<br />

Naturalness of the sound quality is essential for the singing synthesis. Since 9"/Tj in sinking is voiced<br />

sound, the focus of this research is to improve the naturalness of the vowel tone quality. In this study, we<br />

only focus on the non-nasal voiced sound. To trade off between the complexity of the modeling and the<br />

analysis procedun to acquire the model parameters, we propose to use the source-filter type synthesis<br />

model, based on a simplified human voice production system. The source-filter model decomposes the<br />

human voice production system into three linear systems: glottal source, vocal tract and radiation. The<br />

radiation is simplified as a differencing filter. The vocal tract filter is assumed all-poled for non-nasal<br />

sound. The glottal source and the radiation are then combined as the derivative glottal wave. We shall<br />

call it as the glottal excitation.<br />

The effort is then to estimate the vocal tract filter parameter and glottal excitation to mimic the desired<br />

singing vowels. The de-convolution of the vocal tract filter and glottal excitation was developed via the<br />

convex optimization technique [lj. Through this de-convolution, one could obtain the vocal tract filter<br />

parameters and the glottal excitation waveform.<br />

Since the glottal source modeling has shown to be an important factor for improving the naturalness<br />

of the speech synthesis. We are investigating the glottal source modeling alternatives for singing voice.<br />

Besides the abilities of flexible pitch and volume control, the desired source model is expected to be<br />

capable of controlling the voice quality. The voice quality is restricted to the voice source modification<br />

ranging from laryngealized (pressed^ to normal to breathy phonation. The evaluation will be based on<br />

the flexibility of the control and the ability to mimic the original sound recording of sustained vowels.<br />

Reference<br />

• Hui-Ling Lu and J. O. Smith. "Joint estimation of vocal tract filter and glottal source waveform<br />

via convex optimization."' Proc. IEEE Workshop on application of signal processing to audio and<br />

acoustics. 1999. pp. 79-82. Available online at http://www-ccrma.stanford.edu/~vickylu/research/VTestimation/estimation.htm<br />

6.2.7 Scanned Synthesis, A New Synthesis Technique<br />

Max V. Mathews. Bill Verplank. and Rob Shaw<br />

Developed at the Interval Research Corporation in 1998 and 1999. Scanned Synthesis is a new technique<br />

for the synthesis of musical sounds. We believe it will become as important as existing methods such<br />

as wave table synthesis, additive synthesis. FM synthesis, and physical modeling. Scanned Synthesis is<br />

based on the psychoacoustics of how we hear and appreciate timbres and on our motor control (haptic)<br />

abilities to manipulate timbres during live performance. A unique feature of scanned synthesis is its<br />

emphasis on the performer's control of timbre.<br />

Scanned synthesis involves a slow dynamic system whose frequencies of vibration are below about 15<br />

hz. The system is directly manipulated by motions of the performer. The vibrations of the system are a<br />

function of the initial conditions, the forces applied by the performer, and the dynamics of the system.<br />

Examples include slowly vibrating strings, two dimensional surfaces obeying the wave equation, and a<br />

waterbed. We have simulated the string and surface models on a computer. Our waterbed model is<br />

purely conceptual.<br />

The ear cannot hear the low frequencies of the dynamic system. To make audible frequencies, the<br />

"shape" of the dynamic system, along a closed path, is scanned periodically. The "shape" is converted<br />

to a sound wave whose pitch is determined by the speed of the scanning function. Pitch control is<br />

completely separate from the dynamic system control. Thus timbre and pitch are independent. This<br />

system can be looked upon as a dynamic wave table controlled by the performer.<br />

38

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!