Download - CCRMA - Stanford University
Download - CCRMA - Stanford University
Download - CCRMA - Stanford University
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
G.2.6 Toward a High-Quality Singing Synthesizer<br />
Hui-Ling Lu<br />
Naturalness of the sound quality is essential for the singing synthesis. Since 9"/Tj in sinking is voiced<br />
sound, the focus of this research is to improve the naturalness of the vowel tone quality. In this study, we<br />
only focus on the non-nasal voiced sound. To trade off between the complexity of the modeling and the<br />
analysis procedun to acquire the model parameters, we propose to use the source-filter type synthesis<br />
model, based on a simplified human voice production system. The source-filter model decomposes the<br />
human voice production system into three linear systems: glottal source, vocal tract and radiation. The<br />
radiation is simplified as a differencing filter. The vocal tract filter is assumed all-poled for non-nasal<br />
sound. The glottal source and the radiation are then combined as the derivative glottal wave. We shall<br />
call it as the glottal excitation.<br />
The effort is then to estimate the vocal tract filter parameter and glottal excitation to mimic the desired<br />
singing vowels. The de-convolution of the vocal tract filter and glottal excitation was developed via the<br />
convex optimization technique [lj. Through this de-convolution, one could obtain the vocal tract filter<br />
parameters and the glottal excitation waveform.<br />
Since the glottal source modeling has shown to be an important factor for improving the naturalness<br />
of the speech synthesis. We are investigating the glottal source modeling alternatives for singing voice.<br />
Besides the abilities of flexible pitch and volume control, the desired source model is expected to be<br />
capable of controlling the voice quality. The voice quality is restricted to the voice source modification<br />
ranging from laryngealized (pressed^ to normal to breathy phonation. The evaluation will be based on<br />
the flexibility of the control and the ability to mimic the original sound recording of sustained vowels.<br />
Reference<br />
• Hui-Ling Lu and J. O. Smith. "Joint estimation of vocal tract filter and glottal source waveform<br />
via convex optimization."' Proc. IEEE Workshop on application of signal processing to audio and<br />
acoustics. 1999. pp. 79-82. Available online at http://www-ccrma.stanford.edu/~vickylu/research/VTestimation/estimation.htm<br />
6.2.7 Scanned Synthesis, A New Synthesis Technique<br />
Max V. Mathews. Bill Verplank. and Rob Shaw<br />
Developed at the Interval Research Corporation in 1998 and 1999. Scanned Synthesis is a new technique<br />
for the synthesis of musical sounds. We believe it will become as important as existing methods such<br />
as wave table synthesis, additive synthesis. FM synthesis, and physical modeling. Scanned Synthesis is<br />
based on the psychoacoustics of how we hear and appreciate timbres and on our motor control (haptic)<br />
abilities to manipulate timbres during live performance. A unique feature of scanned synthesis is its<br />
emphasis on the performer's control of timbre.<br />
Scanned synthesis involves a slow dynamic system whose frequencies of vibration are below about 15<br />
hz. The system is directly manipulated by motions of the performer. The vibrations of the system are a<br />
function of the initial conditions, the forces applied by the performer, and the dynamics of the system.<br />
Examples include slowly vibrating strings, two dimensional surfaces obeying the wave equation, and a<br />
waterbed. We have simulated the string and surface models on a computer. Our waterbed model is<br />
purely conceptual.<br />
The ear cannot hear the low frequencies of the dynamic system. To make audible frequencies, the<br />
"shape" of the dynamic system, along a closed path, is scanned periodically. The "shape" is converted<br />
to a sound wave whose pitch is determined by the speed of the scanning function. Pitch control is<br />
completely separate from the dynamic system control. Thus timbre and pitch are independent. This<br />
system can be looked upon as a dynamic wave table controlled by the performer.<br />
38