06.08.2015 Views

A Wordnet from the Ground Up

A Wordnet from the Ground Up - School of Information Technology ...


4.5. Hybrid Combinations

eliminated, but the error of elimination is small. Even so, we are still rather far from a support tool truly valuable from the linguists’ point of view.

Because of the small size of plWordNet, it will be a laborious process to prepare a more demanding training set. For each LU pair we can suspect that it is not yet described in plWordNet, so building the set means expanding the wordnet. Nonetheless, it can be done, and some bootstrapping approach can be applied to improve the classifier and expand the wordnet. The next section presents work along these lines.

In contrast with Snow et al. (2005), who use lexico-syntactic features directly, we proposed a two-step approach. It is intrinsically based on MSR, on whose quality it somewhat depends. On the other hand, a good MSR can introduce a general description of relations among LUs and deliver knowledge derived from a very large number of contexts, not only direct LU co-occurrences. The complex attributes designed for the classifiers are a form of pre-processing. They express condensed information that facilitates the classifiers’ decision processes. The results achieved on the manual test set M show that the present set of attributes does not give enough evidence for distinguishing near-synonyms and close hypernyms from co-hyponyms. More research is necessary on other possible sources of knowledge.

4.5.3 Multicriteria voting in wordnet expansion

A wordnet is built of LUs, synsets and relation links.
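As a rough illustration of these three building blocks, the following minimal Python sketch models LUs, synsets and relation links. All class, field and method names are hypothetical, and attaching links to synset ids is a simplifying assumption made here, not a description of how plWordNet is stored. The two Polish lemmas are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass(frozen=True)
class LexicalUnit:
    """A lexical unit (LU): a lemma paired with a sense number (assumed layout)."""
    lemma: str
    sense: int = 1


@dataclass
class Synset:
    """A set of LUs treated as near-synonyms."""
    units: Set[LexicalUnit] = field(default_factory=set)


@dataclass(frozen=True)
class RelationLink:
    """A typed, directed link between two synsets, referenced by id."""
    relation: str  # e.g. "hypernymy"
    source: int
    target: int


@dataclass
class Wordnet:
    """Container tying the three building blocks together."""
    synsets: Dict[int, Synset] = field(default_factory=dict)
    links: Set[RelationLink] = field(default_factory=set)

    def add_synset(self, *units: LexicalUnit) -> int:
        """Store a new synset and return its numeric id."""
        synset_id = len(self.synsets)
        self.synsets[synset_id] = Synset(set(units))
        return synset_id

    def link(self, relation: str, source: int, target: int) -> None:
        """Record a directed relation link from source to target."""
        self.links.add(RelationLink(relation, source, target))

    def related(self, synset_id: int, relation: str) -> List[Synset]:
        """Return the synsets reached from synset_id via the given relation."""
        return [
            self.synsets[link.target]
            for link in self.links
            if link.source == synset_id and link.relation == relation
        ]


# Illustrative data: "pies" (dog) points to its hypernym "zwierzę" (animal).
wn = Wordnet()
animal = wn.add_synset(LexicalUnit("zwierzę"))
dog = wn.add_synset(LexicalUnit("pies"))
wn.link("hypernymy", dog, animal)
hypernyms = wn.related(dog, "hypernymy")
print([sorted(unit.lemma for unit in synset.units) for synset in hypernyms])
```

In this sketch the link direction runs from the more specific synset to the more general one; a real wordnet may also attach relations directly to LUs.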
After a rather unsuccessful attempt to acquire lemmas for LUs from corpora (Section 2.4), we took an initial batch from a small dictionary (Piotrowski and Saloni, 1999). We tackled the extraction of wordnet relation instances several times. We considered Measures of Semantic Relatedness [MSRs] (Section 3.4), manually constructed patterns (Section 4.1), automatically extracted patterns (Section 4.3) and a classifier-based method (Section 4.5.1). We have not achieved results better than around 30% accuracy, but many indications suggest that a combination of algorithms can improve the accuracy considerably. In this section we will investigate this possibility thoroughly.

The extraction of synsets, on the other hand, seems to be a serious problem, as we noted in Section 3.5. The best clustering algorithm produces interesting results, but is still far from being a source of automatically extracted synsets. Clustering of LUs is a self-organising process and therefore raises expectations which in our case have not been met. From the point of view of a future wordnet user one would expect, if not directly synsets, then some general but intuitively distinguished and useful classes as represented by the higher levels of the hypernymy structure.

Clustering algorithms also tend to produce a flat set of clusters. Changing such a set of clusters into a hierarchy poses two problems: how to identify the right shape of the tree and how to label higher levels of the cluster tree with adequately general LUs. Moreover, most hierarchical clustering methods produce strict trees, while a wordnet
