NUI Galway – UL Alliance First Annual ENGINEERING AND - ARAN ...
Personalised Document Corpora Exploration<br />
Abstract<br />
In our research, we explore ways of enhancing<br />
personalised retrieval of relevant documents in open<br />
text document corpora. While a lot of research has been<br />
done in document retrieval, most approaches look at<br />
ways of improving single-step query answering.<br />
Because the users might have more complex<br />
information needs than they can express in a short<br />
query, our scope is to research how a system can<br />
actively assist the users in their exploratory search over<br />
a sequence of steps (i.e., queries).<br />
1. Introduction<br />
The Internet has become one of the main information<br />
access points on the planet due to the vast number of<br />
text documents it provides. Two types of information<br />
search have been identified: (i) look-up search (e.g., fact<br />
retrieval) and (ii) exploratory search (e.g., learning,<br />
investigation, analysis) [1]. Our focus is on enhancing<br />
exploratory search with machine learning techniques.<br />
Currently, there are very few approaches to<br />
information exploration, and most research in the<br />
area of personalised search focuses on look-up<br />
search [1]. However, since many users come to a search<br />
engine for exploratory search, our research is directed<br />
towards a system which is able to assist the user in his<br />
“exploration” over an unlimited sequence of queries.<br />
An important challenge is that new information items<br />
become available every day and the system must be able<br />
to accommodate any dynamic corpus of documents in<br />
an unsupervised fashion. Moreover, different users need<br />
different information even when they ask the same<br />
query, so user modeling is crucial for our work.<br />
2. Our Approach<br />
2.1 Domain modeling<br />
In order to be able to guide the user towards<br />
information items of interest, the system must keep a<br />
“browsable” domain model, and constantly map the<br />
user’s state to that model.<br />
In our approach, the text documents are first<br />
processed to extract probabilistic topic models.<br />
Latent Dirichlet Allocation (LDA) is a well-known<br />
algorithm for this purpose, which we explore in this<br />
setting. The obtained topics are then used to label and<br />
cluster the documents. Many algorithms have been<br />
researched for document clustering. Still, one challenge<br />
is to identify, for each user, the level of granularity he can<br />
internalise or is interested in. For example, the level of<br />
detail an expert feels comfortable with is not the same<br />
as that of a beginner in the same subject.<br />
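The topic extraction and clustering step described above can be illustrated with a minimal sketch. It assumes a collapsed Gibbs sampler for LDA and a simple "dominant topic" clustering rule; the function names (`lda_gibbs`, `cluster_by_topic`), the hyperparameters `alpha` and `beta`, and the toy documents are all hypothetical and are not taken from the system described here. The number of topics stands in, very roughly, for the level of granularity.

```python
import random
from collections import defaultdict

def lda_gibbs(docs, num_topics, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic proportions."""
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]       # tokens of doc d in topic k
    topic_word = [defaultdict(int) for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                     # tokens assigned to topic k
    assignments = []
    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Smoothed topic proportions for each document.
    return [
        [(c + alpha) / (len(doc) + num_topics * alpha) for c in counts]
        for counts, doc in zip(doc_topic, docs)
    ]

def cluster_by_topic(theta):
    """Label each document with its most probable topic and group by that label."""
    clusters = defaultdict(list)
    for d, row in enumerate(theta):
        clusters[max(range(len(row)), key=row.__getitem__)].append(d)
    return dict(clusters)

# Toy corpus (illustrative only): documents are lists of tokens.
docs = [
    "search query retrieval ranking query".split(),
    "retrieval ranking search index".split(),
    "citation author community network".split(),
    "community network citation graph".split(),
]
theta = lda_gibbs(docs, num_topics=2)
clusters = cluster_by_topic(theta)
```

Rerunning the sketch with a larger `num_topics` yields finer clusters, which is one crude way to think about adapting granularity to a beginner versus an expert.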
Ioana Hulpuș, Conor Hayes<br />
DERI, National University of Ireland, Galway<br />
{name.surname@deri.org}<br />
Documents are also connected based on their<br />
citation and bibliographic networks. The resulting<br />
network structure enables the navigation of documents<br />
based on authors and their citations. The bibliographic<br />
links enable the identification of communities, which is<br />
currently a highly researched topic. However, these<br />
communities are created based on citations, and at the<br />
same time, one of the most common ways of scientific<br />
data exploration is by following citation paths. Thus, by<br />
traversing only bibliographic paths, users can get<br />
stuck in the point of view of one community. Our aim<br />
is to build on existing research on<br />
community detection to make the user aware of the<br />
different communities dealing with his topics of interest.
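As a hedged illustration of this aim, the sketch below assumes that community labels have already been produced by some community detection algorithm (not shown) and that each document carries a set of topic labels. It groups the documents on a topic by community and reports the communities the user has not yet visited. All names (`communities_for_topic`, `unseen_communities`) and the sample data are hypothetical.

```python
from collections import defaultdict

def communities_for_topic(doc_topics, doc_community, topic):
    """Group the documents labelled with `topic` by their citation community."""
    groups = defaultdict(list)
    for doc, topics in doc_topics.items():
        if topic in topics and doc in doc_community:
            groups[doc_community[doc]].append(doc)
    return {community: sorted(members) for community, members in groups.items()}

def unseen_communities(groups, visited):
    """Communities covering the topic in which the user has visited no document."""
    return sorted(
        community
        for community, members in groups.items()
        if not any(doc in visited for doc in members)
    )

# Illustrative data: topic labels and community labels are assumed inputs.
doc_topics = {
    "p1": {"clustering"},
    "p2": {"clustering", "retrieval"},
    "p3": {"clustering"},
    "p4": {"retrieval"},
}
doc_community = {"p1": "A", "p2": "A", "p3": "B", "p4": "B"}

groups = communities_for_topic(doc_topics, doc_community, "clustering")
to_suggest = unseen_communities(groups, visited={"p1"})
```

Here a user who has only read `p1` would be pointed to community `B`, whose work on the same topic is not reachable from `p1` within community `A`.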
2.2 Trace models and trace based reasoning<br />
This model captures the “interaction traces” between<br />
the user and the system. Trace-based reasoning (TBR)<br />
has its roots in case-based reasoning, as it solves new<br />
problems by adapting past solutions (i.e., user traces).<br />
Past traces can be used to infer the context of a user’s<br />
search, which is very important for assessing relevance<br />
of information. At the same time, machine learning can<br />
be used on the repository of traces, in order to identify<br />
patterns of information requests. The idea behind this is<br />
that if some documents or queries are frequently<br />
accessed together by various users, they might be<br />
related.<br />
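The co-access idea can be sketched as a simple pair count over the trace repository. This is a minimal illustration, assuming each trace is the list of item identifiers one user accessed in one session; the function names and the `min_support` threshold are hypothetical, and a real system would likely use a more robust pattern mining method.

```python
from collections import Counter
from itertools import combinations

def co_access_counts(traces):
    """Count, for every pair of items, the number of traces containing both."""
    counts = Counter()
    for trace in traces:
        # Deduplicate and sort so each unordered pair is counted once per trace.
        for a, b in combinations(sorted(set(trace)), 2):
            counts[(a, b)] += 1
    return counts

def related_pairs(traces, min_support=2):
    """Pairs that co-occur in at least `min_support` traces."""
    return sorted(
        pair for pair, n in co_access_counts(traces).items() if n >= min_support
    )

# Illustrative traces: each list is one user's session.
traces = [
    ["d1", "d2", "d3"],
    ["d2", "d1"],
    ["d3", "d4"],
]
pairs = related_pairs(traces, min_support=2)
```

Raising `min_support` trades recall for precision: fewer pairs are proposed, but each is backed by more independent sessions.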
2.3 User modeling<br />
User modeling is closely related to trace<br />
modeling, as each user is associated with his past traces. In<br />
addition, the system will try to capture the user’s level of<br />
expertise and his topics of interest through a<br />
mixed-initiative approach.<br />
3. Conclusions<br />
Exploratory search is still an under-studied area of<br />
personalised information retrieval. In order to address<br />
this topic, we plan to use machine learning for (i)<br />
creating a browsable hierarchical structure of the<br />
domain, (ii) inferring the user’s expertise, and (iii)<br />
identifying latent document relations. This knowledge is<br />
then used to guide the user and recommend documents<br />
suitable to his information needs.<br />
4. References<br />
[1] Marchionini, G. (2006) Exploratory search: From<br />
finding to understanding. Communications of the<br />
ACM, 49(4), pp. 41–46.