24.07.2013

Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...

6.5. Discussion

Another contribution of this part of the work is that TeP, originally built for Brazilian Portuguese, is enriched with words from dictionaries whose entries are mainly from European Portuguese³. Therefore, besides being larger, the new thesaurus has better coverage of European Portuguese than TeP. Also, once again owing to its public-domain status, the resulting thesaurus is another suitable candidate to replace OpenThesaurus.PT as the thesaurus of the OpenOffice word processor.

One limitation of the work presented here is the amount of manual observation required to select the best assignment settings. An alternative would be a procedure that automatically learns the best measures and thresholds for assigning a synpair to a synset. Since a small gold resource is already available, a supervised learning approach would suit this purpose. Given a set of labelled correct and incorrect examples for each assignment, a simple linear classifier such as a perceptron (Rosenblatt, 1958) would probably be enough to learn the best threshold. This is left for future work. Also, to obtain more reliable results, the gold resource should be enlarged as well. As it currently contains only nouns, special attention should be given in the future to the inclusion of verbs and adjectives.
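The threshold-learning idea above can be sketched with a minimal Rosenblatt perceptron. This is only an illustration under stated assumptions, not the procedure used in this work (whose settings were chosen by observation): each candidate assignment of a synpair to a synset is assumed to be described by a vector of similarity scores (hypothetically, one per similarity measure) and labelled +1 (correct) or -1 (incorrect) from the gold resource. The function names `train_perceptron` and `assign`, the features and the toy data are all hypothetical.

```python
from typing import List, Sequence, Tuple

# One training example: (feature vector, label). The features are hypothetical
# similarity scores between a synpair and a candidate synset (one per measure);
# the label is +1 for a correct assignment and -1 for an incorrect one.
Example = Tuple[Sequence[float], int]


def train_perceptron(examples: List[Example], epochs: int = 100,
                     lr: float = 0.1) -> Tuple[List[float], float]:
    """Train a classic perceptron and return its weights and bias."""
    n_features = len(examples[0][0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in examples:
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # misclassified, or exactly on the boundary
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                errors += 1
        if errors == 0:  # every training example is classified correctly
            break
    return w, b


def assign(w: Sequence[float], b: float, x: Sequence[float]) -> bool:
    """Assign the synpair to the synset if the weighted score is positive."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b > 0


# Toy data invented for illustration: a single similarity score per example.
examples = [([0.9], 1), ([0.1], -1), ([0.75], 1),
            ([0.2], -1), ([0.6], 1), ([0.35], -1)]
w, b = train_perceptron(examples)
# With one feature, the decision boundary w*x + b = 0 gives the threshold -b/w.
threshold = -b / w[0]
print(f"learned threshold: {threshold:.2f}")
```

With a single measure, the learned weight and bias reduce to one threshold, which separates the highest-scoring incorrect example from the lowest-scoring correct one. With several measures, the weights would additionally indicate how much each measure contributes to the decision.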

³ Wiktionary.PT covers all variants of Portuguese, and PAPEL contains a minority of words in other variants of Portuguese, including Brazilian, Angolan and Mozambican.
