GWC 2008

SemanticNet: a WordNet-based Tool for the Navigation of Semantic…

also extract the specialized semantic keys from the structured part of a document and return the generic semantic key mapped to these keys by means of the conceptual mapping.

Unstructured parts require linguistic analysis and semantic interpretation, performed by means of NLP techniques. The main tools involved are:

• the Syntactic Analyzer and Disambiguator, a module for syntactic analysis integrated with the Link Grammar parser [6], a highly lexical, context-free formalism. This module identifies the syntactic structure of sentences and resolves ambiguity in the roles of terms in natural language;

• the Semantic Analyzer and Disambiguator, a module that analyzes each sentence, identifying roles, meanings of terms and semantic relations, in order to extract part-of-speech information and the synonymy and hypernymy relations from the WordNet semantic net. It also evaluates the terms contained in the document by means of a density function based on the frequency of synonyms and hypernyms [7];

• the Classifier, a module that classifies documents automatically. As proposed in WordNet Domains ([8] and [9]), a lexical resource representing domain associations between terms, the module applies a classification algorithm based on the Dewey Decimal Classification (DDC) and associates a set of categories and a weight with each document.
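
The Classifier's output, a set of categories with a weight for each document, can be illustrated with a minimal sketch. It does not reproduce the DDC-based algorithm, which is not detailed here: the `TERM_DOMAINS` table, its domain labels and the vote-splitting rule are all assumptions made for illustration, standing in for WordNet Domains data.

```python
from collections import Counter

# Hypothetical term-to-domain table standing in for WordNet Domains;
# the labels are illustrative, not real WordNet Domains entries.
TERM_DOMAINS = {
    "bank": ["economy"],
    "loan": ["economy"],
    "interest": ["economy", "psychology"],
    "river": ["geography"],
}


def classify(terms):
    """Return {category: weight}, with weights summing to 1.0."""
    votes = Counter()
    for term in terms:
        domains = TERM_DOMAINS.get(term, [])
        for domain in domains:
            # Assumption: a term with several domains splits its vote evenly.
            votes[domain] += 1.0 / len(domains)
    total = sum(votes.values())
    if total == 0:
        return {}  # no known term, so no category is assigned
    return {domain: count / total for domain, count in votes.items()}


weights = classify(["bank", "loan", "interest", "river", "unknown"])
```

Terms missing from the table are ignored, so a document made only of unknown terms receives no category at all.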

The analysis of structured parts, followed by the linguistic analysis and semantic interpretation of unstructured parts, produces three types of semantic keys:

• a synset ID identifying a particular sense of a WordNet term;

• a category name given by a possible classification;

• a key composed of a word and a document category, used when the semantic key related to the word is not included in the WordNet vocabulary.
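
The three kinds of semantic key listed above can be modelled as a small tagged union. This is only a sketch: the class names, the `key_for_word` helper and the synset ID format used in the usage note are hypothetical and are not taken from the system described.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class SynsetKey:
    synset_id: str  # identifies one WordNet sense (ID format is an assumption)


@dataclass(frozen=True)
class CategoryKey:
    category: str  # category name produced by a classification


@dataclass(frozen=True)
class WordCategoryKey:
    word: str  # word with no key in the WordNet vocabulary
    category: str  # category of the document that contains the word


SemanticKey = Union[SynsetKey, CategoryKey, WordCategoryKey]


def key_for_word(word: str, synset_id: Optional[str], category: str) -> SemanticKey:
    """Use the synset ID when a sense was found, else fall back to word + category."""
    if synset_id is not None:
        return SynsetKey(synset_id)
    return WordCategoryKey(word, category)
```

For example, `key_for_word("bank", "bank.n.01", "economy")` yields a `SynsetKey`, while `key_for_word("blockchain", None, "economy")` yields a `WordCategoryKey`. Frozen dataclasses are hashable, so either kind of key can be used directly as an index entry.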

Finally, all semantic keys are used to index the document, whereas in the search phase they are used to retrieve document descriptors using the SemanticNet through the concept of semantic vicinity.
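
The indexing and retrieval steps can be sketched as an inverted index queried through a neighbourhood expansion. Semantic vicinity is not defined in this passage, so the sketch assumes it means "reachable within a fixed number of hops in the semantic net"; the `NEIGHBOURS` graph, the function names and the `radius` parameter are all hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical fragment of a semantic net: key -> directly related keys.
NEIGHBOURS = {
    "car": ["vehicle"],
    "vehicle": ["car", "truck"],
    "truck": ["vehicle"],
}


def build_index(documents):
    """Map each semantic key to the set of document IDs indexed under it."""
    index = defaultdict(set)
    for doc_id, keys in documents.items():
        for key in keys:
            index[key].add(doc_id)
    return index


def vicinity(key, radius):
    """Return {key: hops} for keys within `radius` hops of `key` (breadth-first)."""
    seen = {key: 0}
    queue = deque([key])
    while queue:
        current = queue.popleft()
        if seen[current] == radius:
            continue  # do not expand beyond the requested radius
        for nxt in NEIGHBOURS.get(current, []):
            if nxt not in seen:
                seen[nxt] = seen[current] + 1
                queue.append(nxt)
    return seen


def search(index, key, radius=1):
    """Return IDs of documents whose keys lie within `radius` hops of the query key."""
    hits = set()
    for near_key in vicinity(key, radius):
        hits |= index.get(near_key, set())
    return hits


index = build_index({"d1": ["car"], "d2": ["truck"], "d3": ["vehicle"]})
```

With this toy data, a query for `"car"` at radius 0 matches only the document indexed under that exact key, and widening the radius pulls in documents indexed under related keys.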

2.2 The Specific Context

In a specific context, the system adopts a formal or terminological ontology describing the specific semantic domain. The semantic keys are the identifiers of concepts in the specialized ontology, and classification is performed using the taxonomy defined in it.

Different analyses are performed on the structured and unstructured parts of the document, as identified in the generic context. Moreover, two types of indexing are performed: one with a set of generic semantic keys and the other with a set of specialized semantic keys.
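
A minimal sketch of this double indexing follows, assuming the conceptual mapping can be represented as a plain lookup table from specialized keys to generic keys. The key names, the `CONCEPTUAL_MAPPING` table and the `index_document` function are hypothetical and chosen only for illustration.

```python
# Hypothetical conceptual mapping: specialized ontology concept -> generic key.
CONCEPTUAL_MAPPING = {
    "onto:Myocardial_Infarction": "generic:heart_attack",
    "onto:Aspirin": "generic:drug",
}


def index_document(doc_id, specialized_keys, generic_index, specialized_index):
    """Index one document under its specialized keys and their generic counterparts."""
    for key in specialized_keys:
        specialized_index.setdefault(key, set()).add(doc_id)
        generic_key = CONCEPTUAL_MAPPING.get(key)
        if generic_key is not None:
            # Only concepts covered by the mapping reach the generic index.
            generic_index.setdefault(generic_key, set()).add(doc_id)


generic_index, specialized_index = {}, {}
index_document("d1", ["onto:Aspirin", "onto:Unmapped"], generic_index, specialized_index)
```

Keeping the two indexes separate lets a query use either vocabulary: a specialized key with no generic counterpart is still retrievable through the specialized index.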

The module extracts all of the specialized semantic keys each time a structured part related to a specific context is identified in a document. Otherwise the system
