Volume 1, Issue 1, January 2011 - DROPS - Schloss Dagstuhl
Volume 1, Issue 1, January 2011 - DROPS - Schloss Dagstuhl
Volume 1, Issue 1, January 2011 - DROPS - Schloss Dagstuhl
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
92 11041 – Multimodal Music Processing<br />
source separation could be improved when some additional information about the sources<br />
such as the musical score, beats, or the sung melody is known. A potential application<br />
of active listening involving audio-informed source separation and watermarking was also<br />
discussed.<br />
As second topic, we discussed possible strategies for combining features that are calculated<br />
at different frame rates or that represent different semantics levels. In particular, it was<br />
shown that, besides the well known early and late fusion strategies, it may be appropriate to<br />
consider other approaches that iteratively (or jointly) exploit information given by the various<br />
modalities to improve estimates for the other modalities. In the same spirit, an example of<br />
cross-modality search was discussed. More specifically, it was shown how cross-correlation<br />
between semantic functions automatically extracted from each stream can lead to efficient<br />
inter-modality search for information retrieval or content authoring. Here, specific examples<br />
on music videos were given including a short demonstration on an audio-based video search<br />
prototype.<br />
As a third topic, the strategy of considering the time evolution of audio features simultaneously<br />
at different temporal levels was addressed. This so-called multi-level feature<br />
integration was discussed with a specific focus on the appropriateness to combine both early<br />
and late fusion strategies. Finally, a set of questions was selected intended as initial input<br />
for further group discussions.<br />
3.31 Personalization in Multimodal Music Retrieval<br />
Markus Schedl (Universität Linz, AT)<br />
License Creative Commons BY-NC-ND 3.0 Unported license<br />
© Markus Schedl<br />
This talk provided an overview of current research endeavors and existing solutions in<br />
multimodal music retrieval, where the term “multimodal” relates to two aspects. Accounting<br />
for the context of a piece of music or an artist constitutes the first aspect, while the second<br />
one relates to incorporating the user context. Adding “traditional” signal-based audio<br />
analysis, I proposed a model that incorporates three broad categories of influence factors for<br />
music retrieval and music similarity computation: music content, music context, and user<br />
context. The music context is introduced as all the information that is important to the<br />
music, albeit not directly extractable from the audio signal. Such information includes, for<br />
example, editorial or collaboratively assembled meta-data, lyrics in textual form, cultural<br />
background of an artist, or images of album covers. The user context, in contrast, is defined<br />
by various external factors that influence how a listener perceives music. It is therefore<br />
strongly related to user modeling and personalization, both facets of music information<br />
research that have not gained large attention by the MIR community so far. In my estimation,<br />
adding personalization aspects to existing music retrieval systems (for example, playlist<br />
generators, recommender systems, or visual browsers) constitutes one key issue in future<br />
MIR research.