29.06.2013 Views

NUI Galway – UL Alliance First Annual ENGINEERING AND - ARAN ...


Context Stamp - A Topic-based Content Abstraction for Visual Concordance Analysis

VinhTuan Thai, Siegfried Handschuh, Stefan Decker

Digital Enterprise Research Institute, National University of Ireland, Galway

firstname.lastname@deri.org

Abstract

Concordance analysis supports users in studying how terms are used in one document vs. another by investigating their usage contexts. This type of analysis is useful in many domains, from literary study to market analysis. However, because current approaches usually present a large set of contexts either in full-text form or as a large frequency-based word cloud, users still need considerable effort to make sense of the complex and dynamic semantic dimensions underlying those contexts. To address this limitation, we propose Context Stamp as a visual representation of the gist of a term's usage contexts. To abstract away the textual details and yet retain the core facets of a term's contexts for visualization, we blend a statistical topic modeling method with a combination of the treemaps and Seesoft visual metaphors. This paper provides a high-level description of the text analysis method and outlines the visual design of Context Stamps.

1. Introduction

Apart from the need to search for and navigate to relevant information, many knowledge workers also need to compare different usages of a word in one document vs. another, or at one point in time vs. another. This comparison can be achieved by looking into the usage contexts of the word. In literature analysis, a concordance is commonly used, which includes an index of the terms in question, their frequencies and the surrounding contexts. Concordance analysis “is intended for understanding properties of language or for analyzing the structure and content of a document for its own sake, rather than search” [2]. It helps users investigate word frequencies, study how terms are used, or see which words tend to occur together. Outside of the literature domain, concordance analysis can also be used for other purposes. For instance, in market analysis, it can be used to track how customers' responses to a product evolve over time.
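
To make the notion of a concordance concrete, the following is a minimal sketch of a keyword-in-context index: it returns a term's frequency together with the words surrounding each occurrence. The function names, the regular-expression tokenizer and the `window` parameter are illustrative assumptions and are not taken from the system described in this paper.

```python
import re
from collections import Counter


def concordance(text, term, window=3):
    """Return (frequency, contexts) for `term`.

    Each context is a (left_words, term, right_words) tuple holding up to
    `window` tokens on either side of one occurrence. The tokenizer is a
    deliberately simple assumption: lowercase runs of letters/apostrophes.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    target = term.lower()
    contexts = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            contexts.append((left, tok, right))
    return len(contexts), contexts


def context_word_counts(contexts):
    """Count how often each word appears in the contexts of the term."""
    counts = Counter()
    for left, _, right in contexts:
        counts.update(left)
        counts.update(right)
    return counts


if __name__ == "__main__":
    review = "The battery is great. The battery drains fast."
    freq, ctxs = concordance(review, "battery", window=2)
    print(freq)
    for left, term, right in ctxs:
        print(" ".join(left), "[" + term + "]", " ".join(right))
```

Running the same function on two documents, or on reviews from two time periods, gives the two sets of contexts that a user would otherwise have to read and compare by hand.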

2. Research Goal

The inherent categorical nature of text and its very high dimensionality make it very challenging to display the contexts graphically [2]. Our goal is to propose a novel approach that makes it easy for users to quickly compare the sets of contexts within which a term is used in one document vs. another.

Here we consider:

• Instead of presenting a term's set of contexts in their original textual form, can the details be abstracted away and only their gist retained to let users make contextual sense of the term?

• Which visualization elements can be used together to convey both the distribution of a term and its contexts at different levels of detail?

3. Text Analysis and Visual Design

As with other visual analytic solutions, our focus is not to propose a new visual metaphor, but to identify a good automated algorithm for the analysis task, and then integrate the results with appropriate visualization and interaction techniques.

To abstract away textual details of a set of contexts and yet retain facets of rich information about them, we rely on statistical topic models to obtain the relationship between the mental representation of language (meaning) and its manifestation in written form. Instead of treating documents as bags of words, a topic model treats documents as mixtures of latent topics, and each topic is a probability distribution over words. A word can be assigned to various topics with different probabilities, depending on its levels of association with various strands of meaning. With a topic model, we have the inferred distribution of topics within each document, and the distribution over words for each topic. The key outcomes are the compositions of the inferred topics, which are coherent clusters of thematically related words. While the inferred model is imperfect, as finding the optimal model parameters for a dataset is non-trivial, these topic-word distributions can be employed to abstract away textual details of a term's set of contexts in a document.
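
One way such topic-word distributions could be used to summarize a term's contexts is sketched below. This is an illustrative assumption, not the method detailed in [4]: each context word is softly assigned to topics by Bayes' rule, P(topic | word) ∝ P(word | topic) P(topic), and the assignments are averaged into a topic profile. The `TOPIC_WORD` table is a hand-written toy example; in practice these values would be inferred by a topic model.

```python
from collections import defaultdict

# Hypothetical toy values for P(word | topic). They are NOT from the paper;
# a real system would infer them with a statistical topic model.
TOPIC_WORD = {
    "power": {"battery": 0.5, "charge": 0.3, "drains": 0.2},
    "opinion": {"great": 0.6, "poor": 0.3, "battery": 0.1},
}


def topic_profile(context_words, topic_word, topic_prior=None):
    """Summarize a bag of context words as a distribution over topics.

    Each word contributes P(topic | word), computed from P(word | topic)
    and a topic prior (uniform if none is given). Words unknown to every
    topic are skipped. The result sums to 1, or is all zeros if no word
    was recognized.
    """
    topics = list(topic_word)
    if topic_prior is None:
        topic_prior = {t: 1.0 / len(topics) for t in topics}
    totals = defaultdict(float)
    for word in context_words:
        joint = {t: topic_word[t].get(word, 0.0) * topic_prior[t] for t in topics}
        evidence = sum(joint.values())
        if evidence == 0.0:
            continue  # word not covered by any topic
        for t in topics:
            totals[t] += joint[t] / evidence
    norm = sum(totals.values())
    return {t: (totals[t] / norm if norm else 0.0) for t in topics}


if __name__ == "__main__":
    print(topic_profile(["charge", "drains", "great"], TOPIC_WORD))
```

The resulting profile has one weight per topic rather than one entry per context word, which is the kind of compact summary that can then be drawn instead of the full text.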

To show both the distributions of a term and the compositions of the core elements of its contexts, we propose a visualization that is an innovative combination of the treemaps metaphor [3] and the Seesoft-based visualization [1].

Further details are available in [4].
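
As a rough illustration of the two visual metaphors, the sketch below computes one level of a slice-style treemap layout, in which each topic receives a strip whose area is proportional to its weight, and a Seesoft-like reduction that maps the token positions of a term's occurrences to rows of a miniature document column. The function names, the input weights and the row mapping are assumptions made for illustration; the actual Context Stamp design is the one described in [4].

```python
def slice_layout(weights, x, y, width, height):
    """Split a rectangle into vertical strips, one per item.

    Each strip's width (and so its area) is proportional to the item's
    weight. Returns {name: (x, y, width, height)} in input order.
    """
    total = sum(weights.values())
    rects = {}
    offset = x
    for name, weight in weights.items():
        strip = width * weight / total if total else 0.0
        rects[name] = (offset, y, strip, height)
        offset += strip
    return rects


def occurrence_rows(positions, n_tokens, n_rows):
    """Map token positions in a document to rows of a miniature column.

    The document of `n_tokens` tokens is compressed into `n_rows` rows,
    so each occurrence can be drawn as a mark at its relative position.
    """
    return [min(n_rows - 1, pos * n_rows // n_tokens) for pos in positions]


if __name__ == "__main__":
    # Hypothetical topic weights for one term's contexts.
    print(slice_layout({"power": 2, "opinion": 1}, 0, 0, 90, 30))
    print(occurrence_rows([0, 50, 99], n_tokens=100, n_rows=10))
```

The first function conveys the composition of a term's contexts, and the second conveys where in the document the term occurs; combining the two in one glyph is the general idea behind pairing treemaps with a Seesoft-style view.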

4. References

[1] S. G. Eick. Graphically displaying text. Journal of Computational & Graphical Statistics, 3:127–142, 1994.

[2] M. A. Hearst. Search User Interfaces. Cambridge University Press, New York, NY, USA, 2009.

[3] B. Shneiderman. Tree visualization with tree-maps: 2-d space-filling approach. ACM Transactions on Graphics, 11:92–99, January 1992.

[4] V. Thai, S. Handschuh. Context Stamp - A Topic-based Content Abstraction for Visual Concordance Analysis. In Proceedings of the 29th International Conference Extended Abstracts on Human Factors in Computing Systems (CHI 2011), ACM, 2011.
