Programme booklet (pdf)
CLIN 21 – CONFERENCE PROGRAMME<br />
Subtrees as a new type of context in Word Space Models<br />
Smets, Margaux and Speelman, Dirk and Geeraerts, Dirk<br />
QLVL, K.U.Leuven<br />
Abstract<br />
In Word Space Models (WSMs) there are traditionally two types of contexts that can be<br />
used: (i) lexical co-occurrences ('bag-of-words models') and (ii) syntactic dependencies.<br />
In general, models with the second type of contexts seem to perform better. However,<br />
there are some problems with these models. First, a choice has to be made about<br />
which contexts to include: only subject/verb and verb/object relations, or other<br />
dependencies as well. Second, in contrast with bag-of-words models, the syntactic models are<br />
supervised: they require substantial resources (a dependency parser, a manually<br />
annotated corpus, ...), which might not be available for every language.<br />
The contexts we propose for use in WSMs are subtrees as defined in the framework of<br />
Data-Oriented Parsing. Subtrees can capture both bag-of-words (co-occurrence)<br />
information and syntactic information. Moreover, they are not limited to specific types<br />
of dependencies, but rather take entire structures into account.<br />
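As a rough illustration of subtrees as contexts, the sketch below enumerates Data-Oriented Parsing style fragments of a small parse tree and counts those that contain a given target word. The tuple encoding of trees, the function names, and the rule that a fragment is a context of a word whenever the word occurs among its leaves are all assumptions made for this example; the abstract does not specify how subtrees are linked to target words.

```python
from collections import Counter
from itertools import product

# Assumed encoding: a tree is (label, child, ...); a word is a plain string;
# a cut-off (frontier) nonterminal is a one-element tuple such as ("NP",).

def fragments_rooted_at(node, max_depth):
    """Fragments rooted at `node`, at most `max_depth` levels deep.
    Each node keeps either all of its children or none of them."""
    if isinstance(node, str) or max_depth < 1:
        return []
    options = []
    for child in node[1:]:
        if isinstance(child, str):
            options.append([child])  # a word is always kept
        else:
            cut = (child[0],)  # stop at this nonterminal
            options.append([cut] + fragments_rooted_at(child, max_depth - 1))
    return [(node[0],) + combo for combo in product(*options)]

def all_fragments(tree, max_depth):
    """Fragments rooted at every internal node of `tree`."""
    if isinstance(tree, str):
        return []
    found = fragments_rooted_at(tree, max_depth)
    for child in tree[1:]:
        found.extend(all_fragments(child, max_depth))
    return found

def leaves(fragment):
    """Words occurring at the leaves of a fragment."""
    words = []
    for child in fragment[1:]:
        if isinstance(child, str):
            words.append(child)
        else:
            words.extend(leaves(child))
    return words

def subtree_contexts(trees, target, max_depth=3):
    """Count the fragments that contain `target` (hypothetical context rule)."""
    counts = Counter()
    for tree in trees:
        for fragment in all_fragments(tree, max_depth):
            if target in leaves(fragment):
                counts[fragment] += 1
    return counts

tree = ("S", ("NP", "dogs"), ("VP", ("V", "bark")))
contexts = subtree_contexts([tree], "bark")
```

Because a fragment may contain several words as well as the structure above them, the same count vector carries both co-occurrence information (which other words appear in the fragment) and syntactic information (the labels and shape of the fragment).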
At first sight, it might seem that the problem of resources for dependency-WSMs<br />
remains in this framework. After all, we first need the 'correct' tree for a sentence,<br />
before we can extract subtrees from it. However, in our experiments we show how the<br />
entire algorithm can be made unsupervised by using an unsupervised parser as a<br />
preprocessing step.<br />
In the presentation, I will first discuss in detail the workings of this new type of WSM.<br />
Next, I will present some initial results from experiments with parameters such as the<br />
accuracy of the parser in the preprocessing step, the maximum subtree depth, the<br />
minimum subtree frequency, and the restriction to subtrees with the highest variance.<br />
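To make two of these parameters concrete, here is a minimal sketch of how a minimum frequency threshold and a highest-variance selection might be applied to context counts. The abstract does not say over what the variance is computed; this example assumes it is the variance of a context's counts across target words. The function name `select_contexts`, its parameters, and the toy data are hypothetical.

```python
from collections import Counter
from statistics import pvariance

def select_contexts(word_vectors, min_freq=2, top_k=None):
    """Keep contexts seen at least `min_freq` times overall, ordered by the
    variance of their counts across target words (an assumed criterion)."""
    totals = Counter()
    for vector in word_vectors.values():
        totals.update(vector)  # adds the counts of each context
    kept = [context for context, n in totals.items() if n >= min_freq]

    def variance(context):
        return pvariance([vector.get(context, 0) for vector in word_vectors.values()])

    kept.sort(key=variance, reverse=True)
    return kept if top_k is None else kept[:top_k]

# Toy counts: target word -> {context identifier: frequency}
vectors = {
    "dog": {"a": 5, "b": 2, "c": 1},
    "cat": {"a": 1, "b": 2},
}
selected = select_contexts(vectors, min_freq=2, top_k=1)
```

In this toy data, context "c" falls below the frequency threshold, and context "b" has identical counts for both words, so it does not help to tell them apart.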
Corresponding author: margauxsmets@gmail.com