
GWC 2008


SemanticNet: a WordNet-based Tool for the Navigation of Semantic…

concepts through relations. When formulating the query, the user can move through the net and extract the concepts of real interest, limiting the search field and obtaining a more specific result. The enriched semantic net can also be used directly by the system without the user being aware of it. In fact, the system receives and processes queries by means of the SemanticNet, extracts from the query the concepts and their relations, then shows the user a result set related to the newly found concepts as well as the categories found.
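The navigation described above can be pictured with a minimal sketch. It assumes a toy in-memory net stored as a dictionary of labelled links; the concepts, the relation labels and the function names (`extract_concepts`, `navigate`, `expand_query`) are all hypothetical and are not taken from the SemanticNet implementation. The sketch only illustrates how concepts found in a query could be followed through their relations, with an optional filter that narrows the search field.

```python
from collections import deque

# Hypothetical toy net: concept -> list of (relation, neighbour) pairs.
# The concepts and relation labels are invented for illustration only.
TOY_NET = {
    "jaguar": [("IS-A", "feline"), ("RELATED-TO", "rainforest")],
    "feline": [("IS-A", "mammal")],
    "rainforest": [("PART-OF", "ecosystem")],
    "mammal": [],
    "ecosystem": [],
}


def extract_concepts(query, net):
    """Return the query words that are nodes of the net, in query order."""
    found = []
    for word in query.lower().split():
        if word in net and word not in found:
            found.append(word)
    return found


def navigate(net, start, max_hops=1, allowed=None):
    """Breadth-first walk from `start`, following at most `max_hops` links.

    `allowed` optionally restricts which relation labels may be followed,
    one possible way of limiting the search field.
    Returns (source, relation, target) triples in visit order.
    """
    triples = []
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # hop budget used up on this branch
        for relation, target in net.get(node, []):
            if allowed is not None and relation not in allowed:
                continue
            triples.append((node, relation, target))
            if target not in visited:
                visited.add(target)
                queue.append((target, depth + 1))
    return triples


def expand_query(query, net, max_hops=1, allowed=None):
    """Map each concept found in the query to the triples reachable from it."""
    return {
        concept: navigate(net, concept, max_hops, allowed)
        for concept in extract_concepts(query, net)
    }
```

Calling `expand_query("jaguar habitat", TOY_NET, max_hops=2, allowed={"IS-A"})` follows only the IS-A links starting from "jaguar"; the word "habitat" is ignored because it is not a node of the toy net.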

The automatic creation of a conceptual knowledge map from Web documents is a highly relevant problem, and a hard one, because it is difficult to distinguish documents with valid content from those with invalid content. We therefore recognized the importance of having access to a multidisciplinary collection of documents, so a large number of the documents included in Wikipedia [12] was used to extract new knowledge and to define a new semantic net that enriches WordNet with new terms, their classification and new associative relations.

In fact WordNet, as a semantic net, is too limited with respect to the vocabulary of the web. WordNet contains about 150,000 words organized in over 115,000 synsets, whereas Wikipedia contains about 1,900,000 encyclopedic entries; the number of connections between words related by topic is limited; and several word “senses” are not included in WordNet. These are only some of the reasons that convinced us to enrich the WordNet semantic net, as emphasized in [13], where the authors identify this and five other weaknesses in the constitution of the WordNet semantic net.

We chose Wikipedia, the free-content encyclopedia, over other solutions such as language-specific thesauri or on-line encyclopedias available only in a specific language. A conceptual map built from Wikipedia pages allows a user to associate a concept with other concepts, enriched with the relations that an author points out. The use of Wikipedia guarantees, with reasonable certainty, that such a conceptual connection is valid, because it is produced by someone who, at least theoretically, has the skills or the experience to justify it. Moreover, the rules and the reviewers' controls in place guarantee the reliability, objectivity and correctness of the inserted topics. The reviewers also check the conformity of added or modified entries.

What we are most interested in are terms and their classification, in order to build an enriched semantic net, called “SemanticNet”, to be used in the searching phase in the general context, while in the specific context we build a specialized Semantic Net (sSN). The reason for this is that the varied mental associations of places, events, persons and things depend on the users' cultural backgrounds and personal histories. In fact, the ability to associate one concept with another differs from person to person. The SemanticNet is by no means exhaustive: it is limited by the dictionary of WordNet, by the contents included in Wikipedia and by the accuracy of the information given by the system.

Starting from the information contained in Wikipedia about a WordNet term, the system can enrich the SemanticNet by adding new nodes, links and attributes, such as IS-A or PART-OF relations. Moreover, the system is able to classify the textual contents of web resources, indexed through the Classifier, which uses WordNet Domains and applies a density function ([1], [2]) based on counting the synsets related to each term of the document. In this way, it is able to retrieve the most frequently used “senses” by extracting the synonymy relations given by the use of similar terms in the document. Through the categorization of the
