Programme booklet (pdf)
Abstract<br />
CLIN 21 – CONFERENCE PROGRAMME<br />
Computing Semantic Relations from Heterogeneous<br />
Information Sources<br />
Panchenko, Alexander<br />
UCL CENTAL<br />
Computation of semantic relations between terms or concepts is a general problem in<br />
Natural Language Processing and a subtask of automatic thesaurus construction.<br />
This work describes and compares heterogeneous information sources that can be<br />
used for mining semantic relations: texts, electronic dictionaries and<br />
encyclopedias, lexical ontologies and thesauri, folksonomies, surface forms of words,<br />
search engine query logs, and so forth. Most existing algorithms use a single<br />
information source for extracting semantic knowledge: Distributional Analysis relies on<br />
text, Extended Lesk uses dictionary definitions, the Jiang-Conrath distance employs a<br />
semantic network such as WordNet, and so on. We show that different methods<br />
capture different aspects of the terms’ relatedness: while one acquires similarities of<br />
word contexts, others capture similarities of syntactic contexts, term definitions,<br />
surface forms, etc.<br />
In these settings, there is a need for a general model capable of aggregating different<br />
aspects of semantic similarity from all available information sources and methods in an<br />
optimal and consistent way. We discuss how such a model can be implemented with a<br />
linear combination or with tensors (i.e. multi-way arrays). We describe two ways of<br />
using tensors for calculation of semantic relations in the context of multiple<br />
information sources, which we call “adjacency tensor” and “feature tensor”. The sparse<br />
tensor factorization methods PARAFAC, Non-negative Tensor Factorization (NTF), and<br />
Memory-Efficient Tucker (MET) are suggested for fusing information about<br />
terms from different methods and information sources. We conclude that tensors can<br />
be used for representing terms, while tensor factorizations can serve to generalize data<br />
about terms’ relatedness.<br />
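The linear-combination idea and one possible reading of the "adjacency tensor" can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the abstract: the three example terms, the similarity scores, the equal weights, the helper `most_related`, and the terms x terms x sources layout of the tensor. The sketch stacks one pairwise similarity matrix per information source into a 3-way array and obtains the linear combination as a weighted sum along the source mode.

```python
import numpy as np

# Hypothetical vocabulary (illustration only).
terms = ["car", "automobile", "banana"]

# Hypothetical pairwise similarities in [0, 1], one matrix per source.
sim_text = np.array([[1.0, 0.8, 0.1],   # e.g. a text-based measure
                     [0.8, 1.0, 0.2],
                     [0.1, 0.2, 1.0]])
sim_dict = np.array([[1.0, 0.6, 0.0],   # e.g. a dictionary-based measure
                     [0.6, 1.0, 0.0],
                     [0.0, 0.0, 1.0]])

# Assumed layout of an "adjacency tensor": terms x terms x sources.
adjacency = np.stack([sim_text, sim_dict], axis=2)

# Assumed equal weights; in practice they would have to be tuned or learned.
weights = np.array([0.5, 0.5])

# Linear combination = weighted sum over the source mode of the tensor.
combined = np.tensordot(adjacency, weights, axes=([2], [0]))


def most_related(term):
    """Return the other term with the highest combined similarity."""
    i = terms.index(term)
    scores = combined[i].copy()
    scores[i] = -np.inf  # exclude the term itself
    return terms[int(np.argmax(scores))]


print(most_related("car"))
```

A tensor factorization such as PARAFAC would operate on `adjacency` directly instead of collapsing the source mode with fixed weights; that step is omitted here because it depends on a specific library.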
Corresponding author: alexander.panchenko@student.uclouvain.be