LIBRO DE ACTAS (pdf) - Universidad de Sevilla
Melodic Transcription of Flamenco Singing from Monophonic and Polyphonic Music Recordings
1.3 Automatic transcription of sung melodies
Automatic transcription is one of the main research challenges in the field of sound and music
computing. It consists of computing a symbolic musical representation (in terms of Western
notation) from a given musical performance (Klapuri, 2006). For monophonic music material,
the obtained transcription relates to the melody (Gómez et al., 2003), whereas for polyphonic music
material the interest lies in transcribing the predominant melodic line (Klapuri, 2006).
Transcription systems can provide melodic descriptors at different levels. The main melody-related
low-level features are energy, associated with loudness, and fundamental frequency (f0),
related to its perceptual correlate, pitch. From now on, we will use the term pitch to refer to f0.
At a higher structural level, audio streams are segmented into notes, and their duration and pitch
provide a symbolic representation. This representation can be the input to higher-level music
analyses, e.g. ornament detection, melodic contour extraction, or key or scale analysis. Current
systems for automatic transcription are usually composed of three different stages: low-level
(frame-based) descriptor extraction, note segmentation and note labelling.
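The segmentation and labelling stages can be illustrated with a minimal Python sketch. It is
not the system described in this paper: it assumes a frame-wise f0 track is already available
(0.0 marking unvoiced frames), uses a 440 Hz reference, and simply groups consecutive frames
that round to the same semitone. The names hz_to_midi and segment_notes are hypothetical.

```python
import math


def hz_to_midi(f0_hz, ref_hz=440.0):
    """Convert a frequency in Hz to a fractional MIDI note number (A4 = 69)."""
    return 69.0 + 12.0 * math.log2(f0_hz / ref_hz)


def segment_notes(f0_track, hop_s):
    """Group consecutive frames with the same rounded pitch into notes.

    f0_track: per-frame f0 in Hz; 0.0 marks an unvoiced frame (assumption).
    hop_s:    time between consecutive frames, in seconds.
    Returns a list of (onset_s, duration_s, midi_pitch) tuples.
    """
    notes = []
    current = None  # (start_frame, midi_pitch) of the note being built
    # A trailing unvoiced sentinel frame flushes the last open note.
    for i, f0 in enumerate(list(f0_track) + [0.0]):
        pitch = round(hz_to_midi(f0)) if f0 > 0 else None
        if current is not None and pitch != current[1]:
            start, midi = current
            notes.append((start * hop_s, (i - start) * hop_s, midi))
            current = None
        if current is None and pitch is not None:
            current = (i, pitch)
    return notes


if __name__ == "__main__":
    track = [0.0, 440.0, 440.0, 440.0, 493.88, 493.88, 0.0]
    for onset, duration, midi in segment_notes(track, hop_s=0.5):
        print(onset, duration, midi)
```

A quantisation this naive splits a note whenever vibrato or an ornament crosses a semitone
boundary, which is one reason sung melodies need a more careful segmentation stage.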
When dealing with monophonic music signals, existing transcription systems provide
satisfactory results for a large number of musical instruments. Although there are some successful
approaches for the singing voice (Mulder et al., 2003; Ryynänen, 2006), it remains one of the most
complex instruments to transcribe, even in a monophonic context. This is due to several factors,
such as the continuous character of the human voice and the variety of pitch ranges and timbres.
These factors make it difficult to obtain correct f0 estimates, detect note transitions and
label notes in terms of pitch or duration. When dealing with polyphonic music signals,
current state-of-the-art algorithms for predominant f0 estimation yield an overall accuracy of
around 75% according to the 2011 edition of the Music Information Retrieval Evaluation
eXchange (MIREX). Moreover, audio onset detection methods yield an average F-measure of
around 0.78 (MIREX). This F-measure is obtained for a mixed dataset of 85 files, but if we
consider only the 5 tested singing voice excerpts, the maximum F-measure is 0.47. In addition,
current approaches are oriented towards mainstream popular music. This raises the
question of how these algorithms would perform on, e.g., traditional music and, more
particularly, flamenco singing. Additional challenges in flamenco transcription arise from the
quality of existing recordings, the acoustic and expressive particularities of the singing, its ornamental
and improvisational character, and the yet-to-be-formalized musical structures employed (Mora et
al., 2010).
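For readers unfamiliar with the onset-detection F-measure quoted above, the following Python
sketch shows one common way to compute it: detected onsets are matched one-to-one to reference
onsets within a tolerance window, and precision and recall are combined into their harmonic
mean. The 50 ms tolerance and the greedy matching are assumptions made for illustration, and
onset_f_measure is a hypothetical name; this is not the MIREX evaluation code.

```python
def onset_f_measure(reference, detected, tolerance=0.05):
    """F-measure of detected onsets against reference onsets (times in seconds).

    A detection counts as correct if it lies within `tolerance` seconds of a
    reference onset that has not already been matched (greedy, in time order).
    """
    ref = sorted(reference)
    det = sorted(detected)
    matched = 0
    i = j = 0
    while i < len(ref) and j < len(det):
        if abs(ref[i] - det[j]) <= tolerance:
            matched += 1
            i += 1
            j += 1
        elif det[j] < ref[i]:
            j += 1  # spurious detection (false positive)
        else:
            i += 1  # missed reference onset (false negative)
    if matched == 0:
        return 0.0
    precision = matched / len(det)
    recall = matched / len(ref)
    return 2.0 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = [1.0, 2.0, 3.0, 4.0]
    detected = [1.02, 2.5, 3.01]
    print(onset_f_measure(reference, detected))
```

In this toy example two of three detections are correct (precision 2/3) and two of four
reference onsets are found (recall 1/2), so the F-measure is 4/7.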
2. Selected approach<br />
Figure 1 shows an overall diagram of the proposed system, which is based on the one
described by Janer et al. (2008). It consists of four main steps: low-level feature extraction
(fundamental frequency, energy and spectral features), tuning frequency estimation, transcription
into short notes, and an iterative process involving note consolidation and refinement of the
tuning frequency.
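As an illustration of the tuning frequency estimation step, the Python sketch below estimates
a reference frequency from a set of f0 values. The method is an assumption chosen for
simplicity, not necessarily the one used in the proposed system: each f0 is expressed as a
deviation in cents from the equal-tempered grid built on 440 Hz, the deviations are averaged
with a circular mean (they wrap around every 100 cents), and the grid is shifted accordingly.
The name estimate_tuning_hz is hypothetical.

```python
import math


def estimate_tuning_hz(f0_values, ref_hz=440.0):
    """Estimate the tuning frequency (Hz) of a performance from f0 values.

    Assumption: the singer follows an equal-tempered grid that is shifted by
    a constant offset of at most +/-50 cents from the grid built on ref_hz.
    """
    sin_sum = 0.0
    cos_sum = 0.0
    for f0 in f0_values:
        if f0 <= 0:
            continue  # skip unvoiced frames
        cents = 1200.0 * math.log2(f0 / ref_hz)
        # Map the deviation modulo one semitone (100 cents) onto a circle.
        angle = 2.0 * math.pi * (cents % 100.0) / 100.0
        sin_sum += math.sin(angle)
        cos_sum += math.cos(angle)
    if sin_sum == 0.0 and cos_sum == 0.0:
        return ref_hz  # no voiced frames: keep the default reference
    # The circular mean gives the average deviation in (-50, 50] cents.
    deviation = 100.0 * math.atan2(sin_sum, cos_sum) / (2.0 * math.pi)
    return ref_hz * 2.0 ** (deviation / 1200.0)


if __name__ == "__main__":
    # Three pitches on a grid tuned to 446 Hz instead of 440 Hz.
    f0s = [446.0, 446.0 * 2.0 ** (2.0 / 12.0), 446.0 * 2.0 ** (-5.0 / 12.0)]
    print(estimate_tuning_hz(f0s))
```

In an iterative scheme such as the one outlined above, an estimate of this kind could be
recomputed from the consolidated notes rather than from raw frames on each pass.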