29.06.2013 Views

NUI Galway – UL Alliance First Annual ENGINEERING AND - ARAN ...


Detecting Topics and Events in Video Content via Synchronized Twitter Streams

Smitashree Choudhury & John G. Breslin

DERI, National University of Ireland, Galway, Ireland

2 School of Engineering and Informatics, National University of Ireland, Galway, Ireland

{smitashree.choudhury, john.breslin}@deri.org

Abstract

This work is an attempt to annotate event videos from contextual sources, in this case from user tweets. The problem involves two subtasks: 1) identifying interesting entities and events, and 2) aligning the detected entities to the recorded video. A combination of linguistic processing, statistical data and domain knowledge yields high-quality, time-stamped, concept-based annotations of various event videos. To evaluate our approach we studied four live events in two different domains.

1. Introduction

Automatically annotating videos on the social web is a complex problem. It usually requires combinations of expensive content-processing algorithms, speech recognition techniques or expert manual annotations, none of which scale well. To address this problem, we explored a unique yet extremely lightweight approach that leverages user-generated tweet streams to annotate event videos. Our proposed approach involves the detection of interesting topics using various combinations of statistical and natural language processing techniques, and the synchronization of those topics to a video timeline using simple heuristics. A similar approach has been reported in [1] for detecting audience sentiment during a US presidential debate.

2. Methodology

2.1. Data processing

The experiment started with the collection of tweets from four different events using various related keywords and hashtags. The data was then cleaned in a pre-processing step that separated out hashtags and usernames and distinguished relevant tweets from non-relevant ones.
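The pre-processing step can be illustrated with a minimal sketch. The regular expressions, the function names `preprocess` and `is_relevant`, and the keyword-overlap relevance rule are all assumptions made for illustration; the paper does not specify how relevance was decided.

```python
import re

# Illustrative patterns (assumptions, not taken from the paper).
URL = re.compile(r"https?://\S+")
HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")

def preprocess(tweet):
    """Split a raw tweet into hashtags, usernames and the remaining text."""
    no_urls = URL.sub(" ", tweet)  # drop links first so URL fragments are not read as hashtags
    hashtags = [h.lower() for h in HASHTAG.findall(no_urls)]
    mentions = [m.lower() for m in MENTION.findall(no_urls)]
    text = MENTION.sub(" ", HASHTAG.sub(" ", no_urls))
    text = " ".join(text.split())  # collapse leftover whitespace
    return {"hashtags": hashtags, "mentions": mentions, "text": text}

def is_relevant(parsed, event_terms):
    """Hypothetical rule: keep a tweet if a hashtag or word matches an event term."""
    words = {w.lower().strip(".,!?:;") for w in parsed["text"].split()}
    return bool(set(event_terms) & (set(parsed["hashtags"]) | words))

parsed = preprocess("Loving the new iPhone at #WWDC with @john http://t.co/abc")
print(parsed, is_relevant(parsed, {"wwdc", "iphone"}))
```

In practice the set of event terms would come from the keywords and hashtags used for collection; a trained classifier could replace the overlap rule without changing the rest of the pipeline.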

2.2. Feature Selection

The selected features include statistical features such as tweet volume and the number of unique users, linguistic features such as name variations and hashtags, and domain knowledge such as the event name and participants (for conferences) and players' names (for sports).

2.3. Entity and Topic Detection

Entity and topic detection was performed using multiple approaches, including burst detection, the tf-idf measure and feature-based classification of tweets.
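As a rough illustration of burst detection, the sketch below counts tweets per fixed time window and flags windows whose count exceeds the mean by `k` standard deviations. This thresholding rule, the function name `detect_bursts` and the parameters `window` and `k` are assumptions; the paper does not state which burst detection algorithm was used.

```python
from collections import Counter
from statistics import mean, pstdev

def detect_bursts(timestamps, window=60, k=2.0):
    """Return the start times (in seconds) of windows whose tweet count
    exceeds mean + k * standard deviation of all window counts."""
    if not timestamps:
        return []
    start = min(timestamps)
    # Count tweets per window index, measured from the earliest tweet.
    counts = Counter(int((t - start) // window) for t in timestamps)
    series = [counts.get(i, 0) for i in range(max(counts) + 1)]
    threshold = mean(series) + k * pstdev(series)
    return [start + i * window for i, c in enumerate(series) if c > threshold]

# One tweet per minute for ten minutes, plus 19 extra tweets in the sixth minute.
stamps = [i * 60 for i in range(10)] + [300 + j for j in range(19)]
print(detect_bursts(stamps))
```

Running the same function on the timestamps of tweets that mention a particular term would give term-level bursts, which is one plausible way to surface candidate topics.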


Figure 1: Interface for the time-based annotation of the WWDC keynote video.

3. Results

The experiment was evaluated using IR measures such as recall, precision and F-measure against the manually annotated ground truth.
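For reference, these measures can be computed from a set of detected topics and a set of ground-truth topics as sketched below. The set-based formulation, the function name and the example topic labels are assumptions for illustration; they are not data from the study.

```python
def precision_recall_f1(predicted, ground_truth):
    """Compute precision, recall and balanced F-measure (F1) over two sets of labels."""
    predicted, ground_truth = set(predicted), set(ground_truth)
    true_positives = len(predicted & ground_truth)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Made-up labels: two of four detected topics appear in a three-topic ground truth.
detected = {"iphone", "ios", "siri", "lunch"}
truth = {"iphone", "ios", "icloud"}
print(precision_recall_f1(detected, truth))
```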

Figure 2: Results for topic and event detection from the tweets.

A simple evaluation of the automatic synchronization was performed against the same user-labeled ground truth. The objective of this evaluation was to see whether the heuristics adopted are sufficient to localize the topics on the video timeline.
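The paper does not spell out its alignment heuristics, so the following is only a hypothetical sketch of what such a check could look like: a tweet time is mapped to a video offset by subtracting the recording start time and an assumed reaction lag, and a topic counts as correctly localized if its offset falls within a tolerance of the labeled offset. The names `to_video_offset` and `alignment_accuracy` and the parameters `lag` and `tolerance` are all assumptions.

```python
def to_video_offset(tweet_time, video_start, lag=0.0):
    """Map a wall-clock tweet time (seconds) to an offset into the video.
    `lag` is an assumed delay between an on-screen moment and the tweets about it."""
    return max(0.0, tweet_time - video_start - lag)

def alignment_accuracy(predicted, truth, tolerance=30.0):
    """Fraction of ground-truth topics whose predicted offset lies within
    `tolerance` seconds of the labeled offset; missing topics count as misses."""
    if not truth:
        return 0.0
    hits = sum(
        1 for topic, offset in truth.items()
        if topic in predicted and abs(predicted[topic] - offset) <= tolerance
    )
    return hits / len(truth)

# Made-up offsets: "ios" is within 30 s of its label, "siri" is not.
predicted = {"ios": 95.0, "siri": 400.0}
labeled = {"ios": 100.0, "siri": 300.0}
print(alignment_accuracy(predicted, labeled))
```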

Figure 3: Results for Twitter-to-timeline alignment.

4. References

[1] Shamma, D.A., Kennedy, L. and Churchill, E. “Tweet the Debates”, ACM Multimedia Workshop on Social Media (WSM), 2009.
