Download - CCRMA - Stanford University
Download - CCRMA - Stanford University
Download - CCRMA - Stanford University
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
6.5.2 Realtime Chord Recognition of Musical Sound: a System Using Common Lisp<br />
Music<br />
Takuya Fujishima<br />
I designed an algorithm to recognize musical chords from the input signal of musical sounds. The keys<br />
of the algorithm are:<br />
1. use of "pitch class profile (PCPf. or an intensity map of twelve semitone pitch classes<br />
2. numerical pattern matching between the PCP and built-in 'chord-type templates'" to determine<br />
the most likely root and chord type.<br />
I introduced two major heuristics and some other improvements to Marc Leman's "Simple Auditory<br />
Model" so as to make it a practical one.<br />
I implemented the algorithm using Common Lisp Music. The system worked continuously in realtime<br />
on an Silicon Graphics 02 workstation and on Intel PC platforms running Linux. It could take the input<br />
sound from the audio input, or from a sound file, and could display the recognition results on the fly.<br />
In the experiments. I first used the pure tone and three other timbres from electronic musical instruments<br />
to estimate the potential capability of the system. The system could distinguish all of 27 built-in chord<br />
types for chord tones played in the above mentioned timbres. Then I input to the system a 50 second<br />
audio excerpt from the opening theme of Smetanas Moldau. Without the two heuristics, the accuracy<br />
remained around 80 percent level. When they were applied, the accuracy rose to 94 percent ... 196 out<br />
of 208 guesses that the system made were musically correct. These experiments showed that the system<br />
could recognize triadic harmonic events, and to some extent more complex chords such as sevenths and<br />
ninths, at the signal level.<br />
References<br />
• Leman. Marc. Music and Schema Theory. Springer-V'erlag.<br />
• Schottstaedt. William. Machine Tongues XVII. CLM: Music \ Meets Common Lisp. Computer<br />
Music Journal. Vol.18. N'o.2. pp.30-38. 1994.<br />
• Fujishima. Takuya. Realtime Chord Recognition of Musical Sound: a System Using Common Lisp<br />
Music. Proceedings of the 1999 International Computer Music Conference. Beijing. China, pp.<br />
464-467.<br />
6.5.3 Speaker Normalization for Speech Recognition Based On Maximum-Likelihood Linear<br />
Transformation<br />
Yoon Kim<br />
In speaker-independent speech recognition, where a pre-trained model is used to recognize speech uttered<br />
by an arbitrary speaker, minimizing the effects of speaker-dependent acoustics is crucial. Acoustic<br />
mismatches between the test speakers and the statistical model result in considerable degradation of<br />
recognition performance. In this work, we apply linear transformation to the cepstral space, which<br />
can be viewed as the Fourier dual of the log spectral space. Gaussian mixture is the most popular<br />
distribution used for modeling speech segments. If we restrict the feature transformation to be that<br />
of convolution, which is equivalent to filtering in the log spectrum domain, the resulting normalization<br />
matrix exhibits a Toeplitz structure, simplifying the parameter estimation. The problem of finding the<br />
optimal matrix coefficients that maximize the likelihood of the utterance with respect to the existing<br />
Gaussian-based model then becomes nothing but a constrained least-squares problem, which is convex in<br />
nature, yielding a unique optimum. Applying the optimal linear transformation to the test feature space<br />
vielded in considerable improvements over the baseline system in frame-based vowel recognition using<br />
51