06.02.2013 Views

Abstract book (pdf) - ICPR 2010


ness and quality of the tracking compared to the use of the RGB space. This is asserted by the experiments performed on several sequences showing vehicles and pedestrians in various contexts.

13:30-16:30, Paper ThBCT9.44

Signal-To-Signal Ratio Independent Speaker Identification for Co-Channel Speech Signals

Saeidi, Rahim, Univ. of Eastern Finland

Mowlaee, Pejman, Aalborg Univ.

Kinnunen, Tomi, Univ. of Eastern Finland

Tan, Zheng-Hua, Aalborg Univ.

Christensen, Mads Græsbøll, Aalborg Univ.

Jensen, Søren Holdt, Aalborg Univ.

Fränti, Pasi, Univ. of Eastern Finland

In this paper, we consider speaker identification for the co-channel scenario, in which a speech mixture from two speakers is recorded by one microphone only. The goal is to identify both speakers from their mixed signal. High recognition accuracies have already been reported when an accurately estimated signal-to-signal ratio (SSR) is available. In this paper, we approach the problem without estimating the SSR. We show that a simple method based on the fusion of adapted Gaussian mixture models and the Kullback-Leibler divergence calculated between models achieves accuracies of 97% and 93% when the two target speakers are required to be listed among the three and the two most probable speakers, respectively.
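As a rough illustration of how a Kullback-Leibler divergence between adapted Gaussian mixture models can rank candidate speakers, the sketch below uses the matched-component approximation, which upper-bounds the divergence when two models share weights and diagonal variances and differ only in their means. This is an assumption made for illustration: the abstract does not say which approximation or fusion rule the authors use, and the names `kl_adapted_gmms` and `rank_speakers` are hypothetical.

```python
import math


def kl_diag_gaussians(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(p || q) for two diagonal-covariance Gaussians."""
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def kl_adapted_gmms(weights, variances, means_a, means_b):
    """Matched-component upper bound on KL between two GMMs.

    Assumes both models were adapted from the same background model,
    so they share weights and variances and differ only in the means.
    """
    return sum(
        w * kl_diag_gaussians(ma, var, mb, var)
        for w, var, ma, mb in zip(weights, variances, means_a, means_b)
    )


def rank_speakers(weights, variances, test_means, speaker_means, top_n=3):
    """Return the top_n speaker names with the smallest divergence."""
    scores = {
        name: kl_adapted_gmms(weights, variances, test_means, means)
        for name, means in speaker_means.items()
    }
    return sorted(scores, key=scores.get)[:top_n]
```

With a ranked list of this kind, an identification trial counts as correct under the "top three" criterion if both target speakers appear in the output of `rank_speakers(..., top_n=3)`.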

13:30-16:30, Paper ThBCT9.45

Selection of Training Instances for Music Genre Classification

Lopes, Miguel, INESC Porto

Gouyon, Fabien, INESC Porto

Koerich, Alessandro, PUCPR

Oliveira, Luiz, Federal Univ. of Parana

In this paper we present a method for the selection of training instances based on the classification accuracy of an SVM classifier. The instances consist of feature vectors representing short-term, low-level characteristics of music audio signals. The objective is to build, from only a portion of the training data, a music genre classifier whose performance is at least similar to that obtained with the whole data set. The particularity of our approach lies in a pre-classification of instances prior to the main classifier training: we select from the training data those instances that show better discrimination with respect to class membership. On a very challenging dataset of 900 music pieces divided among 10 music genres, the instance selection method slightly improves music genre classification, by 2.4 percentage points. In addition, the resulting classification model is significantly smaller, permitting much faster classification of test data.
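The idea of pre-classifying instances and keeping only the most discriminative ones can be sketched as follows. Note the substitution: the abstract uses an SVM for the pre-classification, whereas this minimal, dependency-free sketch swaps in a nearest-centroid classifier and scores each instance by a simple distance margin. The function names and the `keep_fraction` parameter are hypothetical.

```python
import math
from collections import defaultdict


def centroids(X, y):
    """Mean feature vector of each class."""
    sums, counts = {}, defaultdict(int)
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(x)
        for i, value in enumerate(x):
            sums[label][i] += value
        counts[label] += 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def margin(x, label, cents):
    """Distance to the nearest other-class centroid minus distance to
    the instance's own class centroid (larger = more discriminative).
    Requires at least two classes."""
    own = math.dist(x, cents[label])
    other = min(math.dist(x, c) for k, c in cents.items() if k != label)
    return other - own


def select_instances(X, y, keep_fraction=0.5):
    """Keep the keep_fraction of instances with the largest margin,
    returned in their original order."""
    cents = centroids(X, y)
    order = sorted(
        range(len(X)), key=lambda i: margin(X[i], y[i], cents), reverse=True
    )
    n_keep = max(1, int(len(X) * keep_fraction))
    kept = sorted(order[:n_keep])
    return [X[i] for i in kept], [y[i] for i in kept]
```

The main classifier would then be trained on the returned subset only. A global ranking like this one can under-represent a hard class, so a per-class selection may be preferable in practice.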

13:30-16:30, Paper ThBCT9.46

Semi-Blind Speech-Music Separation using Sparsity and Continuity Priors

Erdogan, Hakan, Sabanci Univ.

Grais, Emad M., Sabanci Univ.

In this paper we propose an approach for the problem of single-channel source separation of speech and music signals. Our approach is based on representing each source’s power spectral density using dictionaries and nonlinearly projecting the mixture signal spectrum onto the combined span of the dictionary entries. We encourage sparsity and continuity of the dictionary coefficients using penalty terms (or log-priors) in an optimization framework. We propose a novel coordinate descent technique for the optimization, which handles nonnegativity constraints and nonquadratic penalty terms. After the corresponding power spectral densities (PSDs) have been estimated for each source, we use an adaptive Wiener filter and spectral subtraction to reconstruct both sources from the mixture data. Using conventional metrics, we measure the performance of the system on simulated mixtures of single-person speech and piano music sources. The results indicate that the proposed method is a promising technique for low speech-to-music ratio conditions and that the sparsity and continuity priors help improve the performance of the proposed system.
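The reconstruction step can be illustrated with the textbook Wiener gain: once a PSD estimate is available for each source, every frequency bin of the mixture spectrum is scaled by the fraction of power attributed to that source. This is only a single-frame sketch of the standard filter, not the authors' adaptive variant or their spectral subtraction step, and the function names are hypothetical.

```python
def wiener_masks(psd_speech, psd_music, eps=1e-12):
    """Per-bin Wiener gain for the speech source: P_s / (P_s + P_m).

    eps guards against division by zero in bins where both PSD
    estimates are zero.
    """
    return [ps / (ps + pm + eps) for ps, pm in zip(psd_speech, psd_music)]


def separate(mixture_spectrum, psd_speech, psd_music):
    """Split one frame of a complex mixture spectrum into speech and
    music estimates. The two estimates sum back to the mixture."""
    speech, music = [], []
    gains = wiener_masks(psd_speech, psd_music)
    for x, g in zip(mixture_spectrum, gains):
        speech.append(g * x)
        music.append((1.0 - g) * x)
    return speech, music
```

Because the gain is real-valued, each estimate keeps the phase of the mixture, which is the usual simplification in single-channel separation.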
