
11041 – Multimodal Music Processing

Source separation could be improved when some additional information about the sources, such as the musical score, beats, or the sung melody, is known. A potential application of active listening involving audio-informed source separation and watermarking was also discussed.
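
To make the notion of score-informed source separation more tangible, the following sketch masks the time-frequency bins of a mixture according to note events taken from an aligned score. The note-event format, the number of harmonics, and the 30 Hz tolerance are illustrative assumptions, not a method presented at the seminar.

```python
import numpy as np
from scipy.signal import stft, istft

def score_informed_separation(mixture, sr, note_events, n_fft=2048, hop=512):
    """Extract one source from a mono mixture using note events
    (onset_sec, offset_sec, midi_pitch) from an aligned score."""
    f, t, X = stft(mixture, fs=sr, nperseg=n_fft, noverlap=n_fft - hop)
    mask = np.zeros(X.shape)
    for onset, offset, midi_pitch in note_events:
        f0 = 440.0 * 2.0 ** ((midi_pitch - 69) / 12.0)   # MIDI pitch to Hz
        active = (t >= onset) & (t <= offset)             # frames where the note sounds
        for h in range(1, 6):                             # fundamental plus four harmonics
            band = np.abs(f - h * f0) < 30.0              # 30 Hz tolerance per partial
            mask[np.ix_(band, active)] = 1.0
    _, source = istft(X * mask, fs=sr, nperseg=n_fft, noverlap=n_fft - hop)
    return source
```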

As a second topic, we discussed possible strategies for combining features that are calculated at different frame rates or that represent different semantic levels. In particular, it was shown that, besides the well-known early and late fusion strategies, it may be appropriate to consider other approaches that iteratively (or jointly) exploit the information given by one modality to improve the estimates for the others. In the same spirit, an example of cross-modality search was discussed. More specifically, it was shown how cross-correlation between semantic functions automatically extracted from each stream can lead to efficient inter-modality search for information retrieval or content authoring. Here, specific examples on music videos were given, including a short demonstration of an audio-based video search prototype.
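
The cross-correlation idea can be illustrated as follows: two semantic functions (for instance, an audio novelty curve and a video motion-activity curve) are resampled to a common frame rate, normalized, and cross-correlated to find the best temporal alignment. The chosen features, rates, and normalization are assumptions for illustration and do not describe the prototype demonstrated at the seminar.

```python
import numpy as np

def resample_to_rate(curve, src_rate, dst_rate):
    """Linearly resample a 1-D semantic function to a common frame rate."""
    n_dst = int(round(len(curve) * dst_rate / src_rate))
    x_src = np.arange(len(curve)) / src_rate
    x_dst = np.arange(n_dst) / dst_rate
    return np.interp(x_dst, x_src, curve)

def best_alignment(audio_curve, audio_rate, video_curve, video_rate, common_rate=10.0):
    """Return the lag (in seconds) that maximizes the cross-correlation
    between the two semantic functions, plus the peak correlation value."""
    a = resample_to_rate(audio_curve, audio_rate, common_rate)
    v = resample_to_rate(video_curve, video_rate, common_rate)
    a = (a - a.mean()) / (a.std() + 1e-9)   # zero-mean, unit-variance
    v = (v - v.mean()) / (v.std() + 1e-9)
    corr = np.correlate(a, v, mode="full")
    lag_frames = int(np.argmax(corr)) - (len(v) - 1)
    return lag_frames / common_rate, float(corr.max())
```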

As a third topic, the strategy of considering the time evolution of audio features simultaneously at different temporal levels was addressed. This so-called multi-level feature integration was discussed with a specific focus on whether it is appropriate to combine early and late fusion strategies. Finally, a set of questions was selected as initial input for further group discussions.
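
As a rough illustration of the two fusion strategies mentioned above, the following sketch contrasts early fusion (concatenating feature summaries from several temporal levels before modelling) with late fusion (combining per-level decisions afterwards); the mean-based summarization and the weighting are arbitrary assumptions.

```python
import numpy as np

def early_fusion(feature_blocks):
    """Early fusion: concatenate (summaries of) features from several
    temporal levels into a single vector before any modelling."""
    return np.concatenate([block.mean(axis=0) for block in feature_blocks])

def late_fusion(level_scores, weights=None):
    """Late fusion: combine decisions that were made separately per level."""
    scores = np.asarray(level_scores, dtype=float)
    w = np.ones_like(scores) if weights is None else np.asarray(weights, dtype=float)
    return float(np.dot(w, scores) / w.sum())
```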

3.31 Personalization in Multimodal Music Retrieval
Markus Schedl (Universität Linz, AT)
License: Creative Commons BY-NC-ND 3.0 Unported license
© Markus Schedl

This talk provided an overview of current research endeavors and existing solutions in multimodal music retrieval, where the term “multimodal” relates to two aspects. Accounting for the context of a piece of music or an artist constitutes the first aspect, while the second one relates to incorporating the user context. Adding “traditional” signal-based audio analysis to these, I proposed a model that incorporates three broad categories of influence factors for music retrieval and music similarity computation: music content, music context, and user context. The music context is introduced as all the information that is relevant to the music but not directly extractable from the audio signal. Such information includes, for example, editorial or collaboratively assembled meta-data, lyrics in textual form, the cultural background of an artist, or images of album covers. The user context, in contrast, is defined by various external factors that influence how a listener perceives music. It is therefore strongly related to user modeling and personalization, two facets of music information research that have so far not gained much attention in the MIR community. In my estimation, adding personalization aspects to existing music retrieval systems (for example, playlist generators, recommender systems, or visual browsers) constitutes one key issue in future MIR research.
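
One possible reading of this three-category model is a weighted combination of component similarities, as in the following minimal sketch; the weights, the normalization to [0, 1], and the combination rule are illustrative assumptions and not part of the proposed model's specification.

```python
from dataclasses import dataclass

@dataclass
class Weights:
    content: float = 0.5   # signal-based audio similarity (music content)
    context: float = 0.3   # meta-data, lyrics, cultural context (music context)
    user: float = 0.2      # listener-dependent factors (user context)

def overall_similarity(sim_content, sim_context, sim_user, w=Weights()):
    """Combine the three influence factors into one similarity score,
    assuming each component similarity is already normalized to [0, 1]."""
    total = w.content + w.context + w.user
    return (w.content * sim_content + w.context * sim_context + w.user * sim_user) / total
```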
