CSEM Scientific and Technical Report 2008
Vocal Aid for Laryngectomees
P. Renevey, O. Schleusing, R. Vetter, P. Theurillat, M. Correvon, J. M. Solà i Carós
After a laryngectomy (partial or total removal of the vocal folds), the quality of the residual voice is significantly degraded. The project
Larynx aims to develop a non-invasive voice restoration system that improves the quality of speech in real time while preserving the
idiosyncrasies of the original voice. Recent advances in this research activity are presented below.
People who have undergone a laryngectomy to treat laryngeal
cancer lose the ability to speak. The vocal function can be
partially recovered through medical rehabilitation, although
with a loss of speech intensity, a poor quality of the pitch
(fundamental frequency of speech) and large variations in the
energy of the produced speech signal.
To overcome these deficiencies, a vocal aid system has been
designed and developed. This system aims at restoring the
quality of speech in real time by means of digital signal
processing techniques [1]. The proposed restoration approach
is mainly based on an autoregressive analysis of the speech
signal that separates the excitation signal (produced by the
lungs and the larynx) from the articulation (produced by
resonances in the oral and nasal cavities). These components
are then restored separately and merged afterwards to obtain the
restored speech signal (see Figure 1). Such an approach
preserves the idiosyncrasies of the original voice while
improving its overall quality.
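As a rough illustration of the separation described above, the following Python sketch fits an autoregressive (linear prediction) model to one speech frame, obtains the excitation by inverse filtering, and merges the two components again with the all-pole synthesis filter. It is only a sketch under stated assumptions, not the implementation of [1]: the function names, the use of the autocorrelation method with the Levinson-Durbin recursion, and the model order are choices made here for illustration.

```python
import numpy as np

def lpc_coefficients(frame, order):
    """Fit an AR model with the autocorrelation method (Levinson-Durbin).

    Returns a with a[0] == 1 so that the prediction residual is
    e[n] = sum_k a[k] * x[n - k]. The function name is hypothetical.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    r[0] = r[0] * (1.0 + 1e-9) + 1e-12  # tiny regularization for silent frames
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                   # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= (1.0 - k * k)
    return a

def inverse_filter(x, a):
    """Remove the articulation (spectral envelope): returns the excitation."""
    return np.convolve(x, a)[:len(x)]

def synthesis_filter(e, a):
    """Apply the all-pole articulation filter 1/A(z) to an excitation signal."""
    p = len(a) - 1
    y = np.zeros(len(e))
    for n in range(len(e)):
        acc = e[n]
        for k in range(1, min(p, n) + 1):
            acc -= a[k] * y[n - k]
        y[n] = acc
    return y
```

In a restoration chain of this kind, the excitation returned by `inverse_filter` and the coefficients `a` would each be modified separately before `synthesis_filter` recombines them. With both left untouched, the synthesis step reproduces the input frame.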
Figure 1: Block diagram and illustrative example of the targeted vocal
aid system for laryngectomees.
Previous research activities have highlighted two key issues
for restoring voices of good quality: the estimation of the
fundamental frequency during voiced speech (e.g. vowels) and
the detection of these voiced segments. Recent work performed
in the framework of this activity has therefore focused on
these two aspects.
Fundamental frequency refers to the vibration frequency of the
vocal folds during voiced sounds. Such sounds are produced
when the speaker closes the vocal folds. The air expelled from
the lungs forces its way through the folds and makes them
vibrate. The tension applied to the folds controls the frequency
of vibration and introduces variations around its mean value
(e.g. accentuation or singing). The correct estimation of the
fundamental frequency and of its variations is of great
importance for restoring a voice that sounds
natural. After a laryngectomy, these vibrations are no longer
produced by the vocal folds (removed by the surgical
operation) but by the remaining tissue structures of the larynx. This
results in a vibration of poor quality with frequent breakdowns
of energy and frequency. Under these circumstances,
classical approaches fail to estimate the fundamental frequency correctly. Different
methods for pitch estimation in pathological voices have been
developed and analyzed with respect to their performance and
feasibility in a real-time environment. A new method, called
Adaptive Wavetable Oscillators, has been shown in a large variety
of experiments to outperform more complex methods
when applied to synthetic and healthy speech signals,
especially in the presence of varying levels of additive noise.
In subjective listening tests, the presented method performs
similarly to more complex and established methods. Because
it is remarkably inexpensive from the computational
point of view, the method has great potential for
use in mobile, embedded applications [2, 3].
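For readers unfamiliar with pitch estimation, the sketch below shows one classical approach of the kind referred to above: picking the strongest autocorrelation peak within a plausible lag range. It is not the Adaptive Wavetable Oscillators method, whose details are not given here. The function name, the search range of 60 to 400 Hz, and the frame handling are assumptions made for illustration.

```python
import numpy as np

def estimate_f0_autocorr(frame, fs, f_min=60.0, f_max=400.0):
    """Classical autocorrelation pitch estimate for one frame, in Hz.

    Hypothetical helper for illustration. Returns None for a silent frame.
    The frame should span at least two periods of the lowest expected pitch.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - np.mean(frame)
    n = len(frame)
    # Autocorrelation for non-negative lags 0 .. n-1
    ac = np.correlate(frame, frame, mode="full")[n - 1:]
    if ac[0] <= 0.0:
        return None
    lag_min = int(fs / f_max)
    lag_max = min(int(fs / f_min), n - 1)
    # The lag of the strongest peak in the search range is taken as the period
    lag = lag_min + int(np.argmax(ac[lag_min:lag_max + 1]))
    return fs / lag
```

On a clean periodic signal the strongest peak sits at the true period. When the vibration breaks down irregularly, as described for laryngectomees, the peak becomes weak or jumps between lags, which is one way such an estimator can fail.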
An algorithm for the detection of voiced segments, based on
a Hidden Markov Model, has been developed and evaluated.
This classification system, trained specifically for each subject,
has been shown to achieve adequate performance for the
planned system. Classification of pathological voices yields
results that are comparable to those obtained with healthy
voices, apart from an expected small reduction in performance.
The requirement of specific training for each user is only a
minor drawback, because a session of recording and
adjustments would in any case be required to match the patient's voice and
the desired restored voice characteristics.
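The general idea of HMM-based voicing detection can be sketched as a two-state model (unvoiced and voiced) decoded with the Viterbi algorithm. The Python example below is a minimal sketch under assumptions: it uses a single scalar feature per frame with Gaussian emissions, and the means, standard deviations, and transition probabilities are illustrative placeholders, not the subject-specific parameters trained in the project.

```python
import numpy as np

def viterbi_voicing(features, means, stds, trans, prior):
    """Most likely state sequence of a 2-state HMM (0 = unvoiced, 1 = voiced).

    features: one scalar feature per frame (non-empty), e.g. a periodicity
    measure. means, stds: Gaussian emission parameters per state.
    trans[i][j]: probability of moving from state i to state j.
    All parameter values are assumed to come from a prior training step.
    """
    features = np.asarray(features, dtype=float)
    means = np.asarray(means, dtype=float)
    stds = np.asarray(stds, dtype=float)
    # Log-likelihood of each frame under each state, shape (T, S)
    log_b = (-0.5 * ((features[:, None] - means) / stds) ** 2
             - np.log(stds * np.sqrt(2.0 * np.pi)))
    log_a = np.log(np.asarray(trans, dtype=float))
    T, S = log_b.shape
    delta = np.log(np.asarray(prior, dtype=float)) + log_b[0]
    back = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_a   # scores[i, j]: state i -> state j
        back[t] = np.argmax(scores, axis=0)
        delta = scores[back[t], np.arange(S)] + log_b[t]
    # Trace the best path backwards from the best final state
    path = np.zeros(T, dtype=int)
    path[-1] = int(np.argmax(delta))
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path
```

Compared with thresholding each frame independently, the transition probabilities penalize rapid switching, so a single weak frame inside a voiced segment tends to keep its voiced label.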
The presented solutions represent a step forward in
developing a complete operational vocal aid system.
Ongoing work consists of integrating the proposed
solutions into the complete restoration algorithm and
implementing it on a real-time demonstration platform
developed by HEIG-VD.
The partners of the project are the University of Applied
Sciences in Yverdon (HEIG-VD – responsible for the
development of the real-time platform), the University Hospital
in Lausanne (CHUV – responsible for the medical issues) and
the Swiss Federal Institute of Technology in Lausanne (EPFL –
supporting CSEM for the signal processing tasks).
This work was jointly funded by the Gebert Rüf Stiftung and
OncoSuisse. CSEM thanks them for their support.
[1] R. Vetter, J. Cornuz, P. Vuadens, J. M. Solà i Carós, P. Renevey,
“Method and System for Converting Voice”, European Patent
EP 1 710 788 A1, 2006
[2] F. N. Reale, J. M. Solà i Carós, P. Renevey, R. Vetter,
“Restoration of Natural Prosody in Pathological Voices”, Swiss
Society for Biomedical Engineering (SSBE), Annual Meeting
2007, Neuchâtel
[3] O. Schleusing, R. Vetter, P. Renevey, J. Krauss, F. N. Reale,
V. Schweizer and J.-M. Vesin, “Restoration of Prosody in
Tracheoesophageal Speech by a Multi-Resolution Approach”,
submitted to IASTED Software Engineering Conference 2009,
Innsbruck