
CSEM Scientific and Technical Report 2008


Vocal Aid for Laryngectomees

P. Renevey, O. Schleusing, R. Vetter, P. Theurillat, M. Correvon, J. M. Solà i Carós

After a laryngectomy (partial or total removal of the vocal folds), the quality of the residual voice is significantly degraded. The Larynx project aims to develop a non-invasive voice restoration system that improves speech quality in real time while preserving the idiosyncrasies of the original voice. Recent advances in this research activity are presented below.

People who have undergone a laryngectomy to treat laryngeal cancer lose the ability to speak. The vocal function can be partially recovered through medical rehabilitation, although with a loss of speech intensity, poor pitch quality (pitch being the fundamental frequency of speech) and large variations in the energy of the produced speech signal.

To overcome these deficiencies, a vocal aid system has been designed and developed. This system aims to restore speech quality in real time using digital signal processing techniques [1]. The proposed restoration approach is mainly based on an autoregressive analysis of the speech signal that separates the excitation signal (produced by the lungs and the larynx) from the articulation (produced by resonances in the oral and nasal cavities). These components are restored separately and then merged to obtain the restored speech signal (see Figure 1). Such an approach preserves the idiosyncrasies of the original voice while improving its overall quality.
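The separation described above can be illustrated with a minimal linear-prediction sketch. It is an illustration under stated assumptions, not the project's implementation: the autocorrelation method with a Levinson-Durbin recursion is assumed as the autoregressive estimator, and the function names, model order and test signal are hypothetical. The coefficients `a` stand in for the articulation (vocal tract resonances) and the residual of the inverse filter stands in for the excitation.

```python
import random


def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of one analysis frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for A(z) = 1 + a1 z^-1 + ... + ap z^-p.

    Returns the coefficient list a (with a[0] == 1) and the prediction-error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def inverse_filter(x, a):
    """Excitation estimate: e[n] = sum_j a[j] * x[n - j] (zero initial state)."""
    p = len(a) - 1
    return [sum(a[j] * x[n - j] for j in range(p + 1) if n - j >= 0)
            for n in range(len(x))]


def synthesis_filter(e, a):
    """Merge step: x[n] = e[n] - sum_{j>=1} a[j] * x[n - j]."""
    p = len(a) - 1
    x = []
    for n in range(len(e)):
        x.append(e[n] - sum(a[j] * x[n - j] for j in range(1, p + 1) if n - j >= 0))
    return x


if __name__ == "__main__":
    # Hypothetical test signal: white noise shaped by a known two-pole resonance.
    random.seed(0)
    excitation = [random.gauss(0.0, 1.0) for _ in range(4000)]
    speech_like = synthesis_filter(excitation, [1.0, -1.3, 0.8])
    a, err = levinson_durbin(autocorrelation(speech_like, 2), 2)
    residual = inverse_filter(speech_like, a)
    print("estimated coefficients:", a)
```

In a restoration pipeline of the kind shown in Figure 1, the residual would presumably be repaired or replaced before being passed back through `synthesis_filter` with the (possibly corrected) coefficients; passing the unmodified residual through reproduces the input frame.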

Figure 1: Block diagram and illustrative example of the targeted vocal aid system for laryngectomees.

Previous research activities have highlighted two key issues for restoring voices of good quality: the estimation of the fundamental frequency during voiced speech (e.g. vowels) and the detection of these voiced segments. Recent work in this activity has therefore focused on these two aspects.

The fundamental frequency is the vibration frequency of the vocal folds during voiced sounds. Such sounds are produced when the speaker closes the vocal folds: the air expelled from the lungs forces its way through the folds and makes them vibrate. The tension applied to the folds controls the frequency of vibration and introduces variations around its mean value (e.g. accentuation or singing). Correct estimation of the fundamental frequency and of its variations is of great importance for restoring a voice that sounds natural. After a laryngectomy, these vibrations are no longer produced by the vocal folds (removed by the surgical operation) but by the remaining flesh structure of the larynx. The result is a vibration of poor quality with frequent breakdowns of energy and frequency. Under these circumstances, classical approaches fail to estimate the fundamental frequency correctly.

Different methods for pitch estimation in pathological voices have been developed and analyzed with respect to their performance and feasibility in a real-time environment. A new method, called Adaptive Wavetable Oscillators, has shown in a large variety of experiments that it can outperform more complex methods when applied to synthetic and healthy speech signals, especially in the presence of varying levels of additive noise. In subjective listening tests, the presented method performs similarly to more complex and established methods. Because it is remarkably inexpensive computationally, the method has great potential for use in mobile, embedded applications [2, 3].
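For context, the kind of classical estimator that the text says breaks down on pathological voices can be sketched as a simple autocorrelation peak search. This is not the Adaptive Wavetable Oscillators method, whose details are not given here; the function name, the 60 to 400 Hz search range and the test signal are assumptions chosen for illustration.

```python
import math


def estimate_f0_autocorr(frame, fs, f_min=60.0, f_max=400.0):
    """Classical pitch estimate: the lag with the highest normalized autocorrelation.

    Returns the fundamental frequency in Hz, or None for a silent frame.
    The search range f_min..f_max is an assumed range for speech.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [v - mean for v in frame]           # remove any DC offset
    energy = sum(v * v for v in x)
    if energy == 0.0:
        return None
    lag_min = max(1, int(fs / f_max))
    lag_max = min(int(fs / f_min), n - 1)
    best_lag, best_val = None, -1.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_val:
            best_lag, best_val = lag, r
    if best_lag is None:
        return None
    return fs / best_lag


if __name__ == "__main__":
    # Hypothetical clean test tone sampled at 8 kHz.
    fs = 8000
    tone = [math.sin(2.0 * math.pi * 125.0 * i / fs) for i in range(512)]
    print(estimate_f0_autocorr(tone, fs))
```

On a clean periodic frame the strongest autocorrelation peak sits at the pitch period. When the vibration breaks down in energy and frequency, as described above, that peak becomes weak or ambiguous, which is one way to see why more robust estimators were investigated.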

An algorithm for the detection of voiced segments, based on a Hidden Markov Model, has been developed and evaluated. This classification system, trained specifically for each subject, has been shown to perform adequately for the planned system. Classification of pathological voices yields results comparable to those obtained with healthy voices, apart from an expected small reduction in performance. The need for user-specific training is only a minor drawback, because a recording and adjustment session would be required in any case to match the patient's voice to the desired restored voice characteristics.
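As a rough illustration of Hidden Markov Model based voicing detection, the sketch below decodes a two-state model (unvoiced, voiced) with the Viterbi algorithm. Everything specific in it is an assumption: the source does not state which features, emission densities or parameters are used, so a single scalar feature per frame with Gaussian emissions and hand-picked parameters stands in for the subject-specific trained model.

```python
import math


def gaussian_logpdf(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def viterbi_voicing(features, means, variances, log_trans, log_prior):
    """Most likely state sequence for the frame features (0 = unvoiced, 1 = voiced).

    means/variances: per-state Gaussian emission parameters (assumed model).
    log_trans[q][s]: log probability of moving from state q to state s.
    """
    n_states = len(means)
    delta = [log_prior[s] + gaussian_logpdf(features[0], means[s], variances[s])
             for s in range(n_states)]
    backptr = []
    for x in features[1:]:
        new_delta, ptr = [], []
        for s in range(n_states):
            best_prev = max(range(n_states), key=lambda q: delta[q] + log_trans[q][s])
            ptr.append(best_prev)
            new_delta.append(delta[best_prev] + log_trans[best_prev][s]
                             + gaussian_logpdf(x, means[s], variances[s]))
        delta = new_delta
        backptr.append(ptr)
    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(backptr):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    # Hypothetical per-frame feature (e.g. a periodicity or energy measure).
    feats = [0.0, 0.0, 0.0, 3.0, 3.0, 1.4, 3.0, 3.0, 0.0, 0.0]
    stay, switch = math.log(0.9), math.log(0.1)
    print(viterbi_voicing(feats, [0.0, 3.0], [1.0, 1.0],
                          [[stay, switch], [switch, stay]],
                          [math.log(0.5), math.log(0.5)]))
```

With these assumed parameters, the weak frame with value 1.4 is kept as voiced because two state switches cost more than the small emission gain. That smoothing over time is the practical advantage of a Markov model over frame-by-frame thresholding. Per-subject training would amount to fitting the means, variances and transition probabilities to recordings of the individual user.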

The presented solutions represent a step forward in developing a complete, operational vocal aid system. Ongoing work consists of integrating the proposed solutions into the complete restoration algorithm and implementing it on a real-time demonstration platform developed by HEIG-VD.

The partners of the project are the University of Applied Science in Yverdon (HEIG-VD, responsible for the development of the real-time platform), the University Hospital in Lausanne (CHUV, responsible for the medical issues) and the Swiss Federal Institute of Technology in Lausanne (EPFL, supporting CSEM for the signal processing tasks).

This work was jointly funded by the Gebert Rüf Stiftung and OncoSuisse. CSEM thanks them for their support.

[1] R. Vetter, J. Cornuz, P. Vuadens, J. M. Solà i Carós, P. Renevey, “Method and System for Converting Voice”, European Patent EP 1 710 788 A1, 2006

[2] F. N. Reale, J. M. Solà i Carós, P. Renevey, R. Vetter, “Restoration of Natural Prosody in Pathological Voices”, Swiss Society for Biomedical Engineering (SSBE), Annual Meeting 2007, Neuchâtel

[3] O. Schleusing, R. Vetter, P. Renevey, J. Krauss, F. N. Reale, V. Schweizer and J.-M. Vesin, “Restoration of Prosody in Tracheoesophageal Speech by a Multi-Resolution Approach”, submitted to IASTED Software Engineering Conference 2009, Innsbruck
