Programme booklet (pdf)

CLIN 21 – CONFERENCE PROGRAMME

Computing Semantic Relations from Heterogeneous Information Sources

Panchenko, Alexander
UCL CENTAL

Abstract

Computation of semantic relations between terms or concepts is a general problem in Natural Language Processing and a subtask of automatic thesaurus construction. This work describes and compares available heterogeneous information sources which can be used for mining semantic relations, such as texts, electronic dictionaries and encyclopedias, lexical ontologies and thesauri, folksonomies, surface forms of words, query logs of search engines, and so forth. Most existing algorithms use a single information source for extracting semantic knowledge: Distributional Analysis relies on text, Extended Lesk uses dictionary definitions, the Jiang-Conrath distance employs a semantic network such as WordNet, and so on. We show that different methods capture different aspects of the terms' relatedness: while one acquires similarities of word contexts, others capture similarities of syntactic contexts, term definitions, surface forms, etc.
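As a toy illustration of how different sources yield different relatedness scores, one can contrast cosine similarity over word-context counts (distributional analysis) with Jaccard overlap of definition words (a Lesk-style measure). All terms, counts, and definitions below are invented for illustration; none of them come from the work itself:

```python
import math

# Hypothetical context-count vectors for two terms (assumed data).
ctx_car = {"drive": 4, "road": 3, "engine": 2}
ctx_auto = {"drive": 3, "road": 2, "wheel": 1}

def cosine(a, b):
    # Cosine similarity of sparse count vectors (distributional analysis).
    dot = sum(a[k] * b.get(k, 0) for k in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm

# Hypothetical dictionary definitions (Lesk-style word overlap).
def_car = {"vehicle", "four", "wheels", "engine"}
def_auto = {"vehicle", "self", "propelled", "engine"}

def jaccard(a, b):
    # Overlap of definition word sets.
    return len(a & b) / len(a | b)

print(cosine(ctx_car, ctx_auto))   # context-based relatedness
print(jaccard(def_car, def_auto))  # definition-based relatedness
```

The two scores disagree because they measure different aspects of relatedness, which is exactly why aggregating them is attractive.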

In these settings, there is a need for a general model capable of aggregating different aspects of semantic similarity from all available information sources and methods in an optimal and consistent way. We discuss how such a model can be implemented with a linear combination and with tensors (i.e., multi-way arrays). We describe two ways of using tensors for the calculation of semantic relations in the context of multiple information sources, which we call the "adjacency tensor" and the "feature tensor". The sparse tensor factorization methods PARAFAC, Non-negative Tensor Factorization (NTF), and Memory-Efficient Tucker (MET) are suggested to fuse information about terms from different methods and information sources. We conclude that tensors can be used for representing terms, while tensor factorizations can serve to generalize data about terms' relatedness.
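A minimal numpy sketch of the two aggregation ideas, under assumed data (the similarity values and weights below are invented placeholders, not figures from the work): a linear combination collapses per-source similarity matrices into a single matrix, while an adjacency-tensor construction stacks them along a third mode, producing the kind of multi-way array that factorization methods such as PARAFAC or NTF could then decompose:

```python
import numpy as np

# Hypothetical term-term similarity matrices from two sources
# (values are illustrative only).
sim_text = np.array([[1.0, 0.8, 0.1],
                     [0.8, 1.0, 0.2],
                     [0.1, 0.2, 1.0]])   # e.g. distributional similarity
sim_dict = np.array([[1.0, 0.6, 0.0],
                     [0.6, 1.0, 0.3],
                     [0.0, 0.3, 1.0]])   # e.g. definition overlap

# Linear combination: weighted sum over sources gives one
# aggregated similarity matrix (weights here are assumed).
weights = np.array([0.7, 0.3])
sim_combined = weights[0] * sim_text + weights[1] * sim_dict

# Adjacency-tensor construction: stack the per-source matrices
# along a third axis, giving a terms x terms x sources array
# that tensor factorizations can generalize over.
adjacency_tensor = np.stack([sim_text, sim_dict], axis=2)

print(sim_combined[0, 1])        # aggregated relatedness of terms 0 and 1
print(adjacency_tensor.shape)    # one slice per information source
```

The linear combination is simpler but fixes the weights up front; keeping the sources as separate tensor slices defers the fusion to the factorization step.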

Corresponding author: alexander.panchenko@student.uclouvain.be
