NUI Galway – UL Alliance First Annual ENGINEERING AND - ARAN ...
Context Stamp - A Topic-based Content Abstraction for Visual Concordance Analysis
VinhTuan Thai, Siegfried Handschuh, Stefan Decker
Digital Enterprise Research Institute, National University of Ireland, Galway
firstname.lastname@deri.org
Abstract<br />
Concordance analysis supports users in studying how<br />
terms are used in one document vs. another by
investigating their usage contexts. This type of analysis<br />
is useful in many domains, from literary study to market<br />
analysis. However, as current approaches usually<br />
present a large set of contexts in their full text form or<br />
as a large frequency-based word cloud, they still<br />
require a lot of effort from users to make sense of the<br />
underlying complex and dynamic semantic dimensions<br />
of contexts. To address this limitation, we propose<br />
Context Stamp as a visual representation of the gist of a<br />
term's usage contexts. To abstract away the textual<br />
details and yet retain the core facets of a term's contexts<br />
for visualization, we blend a statistical topic modeling<br />
method with a combination of the treemaps and Seesoft<br />
visual metaphors. This paper provides a high level<br />
description of the text analysis method and outlines the<br />
visual design of Context Stamps.<br />
1. Introduction<br />
Apart from the need to search for and navigate to
relevant information, many knowledge workers also<br />
need to compare different usages of a word in one<br />
document vs. another, or at one point in time vs.<br />
another. This comparison can be achieved by looking<br />
into the usage contexts of the word. In literature<br />
analysis, a concordance is commonly used, which<br />
includes an index of the terms in question, their<br />
frequencies and the surrounding contexts. Concordance<br />
analysis “is intended for understanding properties of<br />
language or for analyzing the structure and content of a<br />
document for its own sake, rather than search” [2]. It<br />
helps users investigate word frequencies, study how<br />
terms are used, or see which words tend to occur together.
Outside of the literature domain, concordance analysis<br />
can also be used for other purposes. For instance, in<br />
market analysis, it can be used to track how customers'<br />
responses to a product evolve over time.<br />
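The concordance described above (a term, its frequency, and its surrounding contexts) can be illustrated with a minimal sketch. The function name `concordance`, the fixed token window, and the two sample texts are assumptions made for illustration only; they are not taken from the paper.

```python
import re

def concordance(text, term, window=3):
    """Return (frequency, contexts) for `term`.

    Each context is a (left, right) pair holding up to `window` tokens
    on either side of one occurrence of the term.
    """
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    target = term.lower()
    contexts = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            contexts.append((left, right))
    return len(contexts), contexts

# Hypothetical customer comments standing in for two documents.
doc_a = "The battery lasts all day. I love the battery life on this phone."
doc_b = "The battery died after an hour. Replacing the battery was expensive."

freq_a, ctx_a = concordance(doc_a, "battery")
freq_b, ctx_b = concordance(doc_b, "battery")
```

Even in this toy case the term occurs equally often in both texts, and only the surrounding contexts reveal that it is used differently, which is the comparison a concordance is meant to support.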
2. Research Goal<br />
The inherent categorical nature of text and its very
high dimensionality make it very challenging to
display the contexts graphically [2]. Our goal is to<br />
propose a novel approach to make it easy for users to<br />
quickly compare the sets of contexts within which a<br />
term is used in one document vs. another.
We consider two questions:
• Instead of presenting a term's set of contexts in their
original textual form, can the details be abstracted away
and only their gist retained to let users make contextual
sense of the term?
• Which visualization elements can be used together
to convey both the distribution of a term and its
contexts at different levels of detail?
3. Text Analysis and Visual Design<br />
As with other visual analytics solutions, our focus is
not to propose a new visual metaphor, but to identify a<br />
good automated algorithm for the analysis task, and<br />
then integrate the results with appropriate visualization<br />
and interaction techniques.<br />
To abstract away textual details of a set of contexts<br />
and yet retain facets of rich information about them, we<br />
rely on statistical topic models to obtain the relationship<br />
between the mental representation of language<br />
(meaning) and its manifestation in written form. Instead<br />
of treating documents as bags of words, a topic model<br />
treats documents as mixtures of latent topics, and each<br />
topic is a probability distribution over words. A word<br />
can be assigned to various topics with different<br />
probabilities, depending on its levels of association with<br />
various strands of meaning. With a topic model, we<br />
have the inferred distributions of topics within<br />
documents, and the distributions of words over the<br />
topics. The key outcomes are the compositions of the<br />
inferred topics, which are coherent clusters of<br />
thematically related words. While the inferred model is
imperfect, as finding the optimal model parameters for a
dataset is non-trivial, these topic-word distributions can
be employed to abstract away textual details of a term's<br />
set of contexts in a document.<br />
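One way to picture how topic-word distributions could abstract a set of contexts is sketched below. This is an assumption-laden illustration, not the method of [4]: the topic names, the hand-written probabilities in `TOPIC_WORD`, the uniform prior over topics, and the averaging scheme in `topic_profile` are all hypothetical. In practice the distributions would be inferred by a topic model rather than written by hand.

```python
# Hypothetical, hand-written topic-word distributions p(word | topic).
# A real system would infer these from a corpus with a topic model.
TOPIC_WORD = {
    "power": {"battery": 0.4, "charge": 0.3, "hour": 0.2, "day": 0.1},
    "cost": {"price": 0.5, "expensive": 0.3, "battery": 0.1, "replace": 0.1},
}

def topic_profile(context_words, topic_word=TOPIC_WORD):
    """Summarize context words as a distribution over topics.

    For each word, p(topic | word) is computed from p(word | topic)
    under an assumed uniform prior over topics, then averaged over all
    words known to at least one topic. Unknown words are skipped.
    """
    totals = {topic: 0.0 for topic in topic_word}
    counted = 0
    for word in context_words:
        likelihoods = {t: dist.get(word, 0.0) for t, dist in topic_word.items()}
        norm = sum(likelihoods.values())
        if norm == 0:
            continue  # word does not appear in any topic
        for topic in totals:
            totals[topic] += likelihoods[topic] / norm
        counted += 1
    if counted == 0:
        return totals
    return {topic: value / counted for topic, value in totals.items()}

# Context words of the same term in two hypothetical documents.
profile_a = topic_profile(["lasts", "all", "day", "charge", "battery"])
profile_b = topic_profile(["expensive", "replace", "price", "battery"])
```

The two profiles replace the raw context words with a handful of topic weights, which is the kind of compact summary a visual representation can then encode.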
To show both the distributions of a term and the<br />
compositions of the core elements of its contexts, we<br />
propose a visualization that is an innovative<br />
combination of the treemaps metaphor [3] and the<br />
Seesoft-based visualization [1].<br />
Further details are available in [4].<br />
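To give a feel for the treemap part of the design, the sketch below applies a single slicing step in the spirit of the space-filling approach of [3]: a rectangle is split into strips whose areas are proportional to a set of weights. The function name `slice_layout`, the topic weights, and the rectangle size are hypothetical, and this is not claimed to be the layout used in Context Stamps, whose actual design is described in [4].

```python
def slice_layout(weights, x, y, width, height, horizontal=True):
    """Split the rectangle (x, y, width, height) into strips.

    Each strip's area is proportional to its weight. Returns a dict
    mapping each label to an (x, y, w, h) tuple. Strips are laid out
    left to right when `horizontal` is True, top to bottom otherwise.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    rects = {}
    offset = 0.0
    for label, weight in weights.items():
        share = weight / total
        if horizontal:
            rects[label] = (x + offset, y, width * share, height)
            offset += width * share
        else:
            rects[label] = (x, y + offset, width, height * share)
            offset += height * share
    return rects

# Hypothetical topic weights for one term's contexts in one document.
stamp = slice_layout({"power": 6, "cost": 3, "other": 1}, 0, 0, 100, 40)
```

Nesting is obtained by calling the function again on each strip with the opposite orientation, so that words inside a topic could in turn be sized by their weight within that topic.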
4. References<br />
[1] S. G. Eick. Graphically displaying text. Journal of<br />
Computational & Graphical Statistics, 3:127–142, 1994.
[2] M. A. Hearst. Search User Interfaces. Cambridge<br />
University Press, New York, NY, USA, 2009.<br />
[3] B. Shneiderman. Tree visualization with tree-maps: 2-d<br />
space-filling approach. ACM Transactions on Graphics,<br />
11:92–99, January 1992.
[4] V. Thai, S. Handschuh. Context Stamp - A Topic-based<br />
Content Abstraction for Visual Concordance Analysis. In<br />
Proceedings of the 29th international conference
extended abstracts on Human factors in computing systems<br />
(CHI 2011), ACM, 2011.