
6.5.2 Realtime Chord Recognition of Musical Sound: a System Using Common Lisp Music

Takuya Fujishima

I designed an algorithm to recognize musical chords from the input signal of musical sounds. The keys of the algorithm are:

1. use of a "pitch class profile" (PCP), or an intensity map of the twelve semitone pitch classes
2. numerical pattern matching between the PCP and built-in "chord-type templates" to determine the most likely root and chord type (a sketch of both steps follows this list)
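The two steps above can be illustrated with a short sketch. The following Python/NumPy code is not the author's Common Lisp Music implementation: the reference frequency, the normalized dot-product matching score, and the three chord-type templates shown (the system itself has 27 built-in types) are simplifying assumptions for illustration.

```python
import numpy as np

# Illustrative chord-type templates as semitone intervals above the root.
# The actual system has 27 built-in chord types; only three are shown here.
CHORD_TYPES = {
    "maj":  [0, 4, 7],
    "min":  [0, 3, 7],
    "dom7": [0, 4, 7, 10],
}

def pitch_class_profile(frame, sample_rate, f_ref=130.81):
    """Fold the power spectrum of one analysis frame into a 12-bin
    intensity map of semitone pitch classes (a PCP vector).
    f_ref is an assumed reference frequency (C3)."""
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    pcp = np.zeros(12)
    for f, power in zip(freqs[1:], spectrum[1:]):          # skip the DC bin
        pitch_class = int(round(12 * np.log2(f / f_ref))) % 12
        pcp[pitch_class] += power
    return pcp

def recognize_chord(pcp):
    """Match the PCP against every (root, chord type) template and
    return the best-scoring pair."""
    best, best_score = None, -np.inf
    for root in range(12):
        for name, intervals in CHORD_TYPES.items():
            template = np.zeros(12)
            template[[(root + i) % 12 for i in intervals]] = 1.0
            # Normalize so templates with more notes are not favored.
            score = float(template @ pcp) / len(intervals)
            if score > best_score:
                best, best_score = (root, name), score
    return best, best_score
```

A realtime loop would compute one PCP per analysis frame and feed successive frames to recognize_chord; the two heuristics mentioned below are not reproduced in this sketch.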

I introduced two major heuristics and some other improvements to Marc Leman's "Simple Auditory Model" so as to make it a practical one.

I implemented the algorithm using Common Lisp Music. The system worked continuously in realtime on a Silicon Graphics O2 workstation and on Intel PC platforms running Linux. It could take its input sound from the audio input or from a sound file, and could display the recognition results on the fly.

In the experiments, I first used a pure tone and three other timbres from electronic musical instruments to estimate the potential capability of the system. The system could distinguish all 27 built-in chord types for chord tones played in the above-mentioned timbres. Then I input to the system a 50-second audio excerpt from the opening theme of Smetana's Moldau. Without the two heuristics, the accuracy remained at around the 80 percent level. When they were applied, the accuracy rose to 94 percent: 196 out of 208 guesses that the system made were musically correct. These experiments showed that the system could recognize triadic harmonic events, and to some extent more complex chords such as sevenths and ninths, at the signal level.

References

• Leman, Marc. Music and Schema Theory. Springer-Verlag.
• Schottstaedt, William. Machine Tongues XVII: CLM: Music V Meets Common Lisp. Computer Music Journal, Vol. 18, No. 2, pp. 30-38, 1994.
• Fujishima, Takuya. Realtime Chord Recognition of Musical Sound: a System Using Common Lisp Music. Proceedings of the 1999 International Computer Music Conference, Beijing, China, pp. 464-467.

6.5.3 Speaker Normalization for Speech Recognition Based on Maximum-Likelihood Linear Transformation

Yoon Kim

In speaker-independent speech recognition, where a pre-trained model is used to recognize speech uttered by an arbitrary speaker, minimizing the effects of speaker-dependent acoustics is crucial. Acoustic mismatches between the test speakers and the statistical model result in considerable degradation of recognition performance. In this work, we apply a linear transformation to the cepstral space, which can be viewed as the Fourier dual of the log spectral space. The Gaussian mixture is the most popular distribution used for modeling speech segments. If we restrict the feature transformation to a convolution, which is equivalent to filtering in the log spectrum domain, the resulting normalization matrix exhibits a Toeplitz structure, simplifying the parameter estimation. The problem of finding the optimal matrix coefficients that maximize the likelihood of the utterance with respect to the existing Gaussian-based model then reduces to a constrained least-squares problem, which is convex in nature and yields a unique optimum. Applying the optimal linear transformation to the test feature space yielded considerable improvements over the baseline system in frame-based vowel recognition using
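The estimation step can be illustrated with a minimal numerical sketch. The following Python/NumPy code is not the system reported above: it substitutes a single Gaussian with diagonal covariance for the Gaussian mixture, assumes a short causal cepstral filter of length L (whose Toeplitz matrix realizes the convolution constraint), and solves the resulting weighted least-squares problem, which under these simplifying assumptions coincides with the maximum-likelihood estimate. Function names and the filter length are illustrative.

```python
import numpy as np

def filter_matrix(x, L):
    """Design matrix A such that A @ h equals the causal convolution of the
    cepstral vector x with a length-L filter h, i.e. T(h) @ x for the
    Toeplitz matrix T(h)."""
    d = len(x)
    A = np.zeros((d, L))
    for i in range(d):
        for k in range(L):
            if 0 <= i - k < d:
                A[i, k] = x[i - k]
    return A

def estimate_normalization_filter(frames, mu, var, L=3):
    """Estimate a length-L cepstral filter h whose Toeplitz matrix maps the
    test frames toward a single Gaussian (mean mu, diagonal variance var).
    Because the transform is linear in h, the weighted least-squares
    solution below is also the maximum-likelihood one for this model."""
    w = 1.0 / np.sqrt(var)                      # per-dimension weights
    rows, targets = [], []
    for x in frames:
        A = filter_matrix(x, L)
        rows.append(A * w[:, None])             # weight each cepstral dimension
        targets.append(mu * w)
    A_all = np.vstack(rows)
    b_all = np.concatenate(targets)
    h, *_ = np.linalg.lstsq(A_all, b_all, rcond=None)
    return h

def apply_normalization(frames, h):
    """Normalize frames by convolving each cepstral vector with h
    (equivalently, multiplying by the Toeplitz matrix T(h))."""
    return np.array([filter_matrix(x, len(h)) @ h for x in frames])
```

Because the convolution constraint leaves only L free coefficients instead of a full d-by-d matrix, the normalization can be re-estimated cheaply per test speaker before decoding.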
