24.07.2013 Views

Onto.PT: Towards the Automatic Construction of a Lexical Ontology ...


Chapter 6. Thesaurus Enrichment

Measure  Mode  σ     Sets/Pair  P     R     RR    F1    RF1   F0.5  RF0.5
All      –     –     4.31       0.37  1.00  1.00  0.54  0.54  0.42  0.42
Jaccard  Best  0.00  1.07       0.63  0.49  0.81  0.55  0.71  0.59  0.66
Jaccard  Best  0.15  0.52       0.72  0.25  0.45  0.37  0.56  0.53  0.65
Jaccard  All   0.05  3.14       0.45  0.86  0.96  0.59  0.61  0.49  0.50
Jaccard  All   0.1   1.52       0.59  0.52  0.69  0.55  0.63  0.57  0.61
Jaccard  All   0.15  0.77       0.70  0.33  0.48  0.45  0.57  0.57  0.64
Jaccard  All   0.2   0.41       0.78  0.20  0.32  0.32  0.46  0.50  0.61
Overlap  Best  0.00  1.06       0.63  0.48  0.81  0.55  0.71  0.60  0.66
Overlap  Best  0.15  1.04       0.64  0.47  0.80  0.54  0.71  0.60  0.67
Overlap  All   0.1   3.98       0.39  0.96  0.98  0.55  0.56  0.44  0.44
Overlap  All   0.4   1.28       0.63  0.44  0.61  0.52  0.62  0.58  0.63
Overlap  All   0.45  1.05       0.67  0.37  0.53  0.47  0.59  0.57  0.63
Overlap  All   0.5   0.88       0.70  0.32  0.48  0.44  0.57  0.57  0.64
Overlap  All   0.55  0.65       0.74  0.25  0.37  0.37  0.50  0.53  0.62
Overlap  All   0.6   0.50       0.79  0.19  0.31  0.30  0.45  0.48  0.60
Dice     Best  0.00  1.06       0.63  0.49  0.81  0.55  0.71  0.60  0.66
Dice     Best  0.15  0.87       0.64  0.45  0.7   0.53  0.67  0.59  0.65
Dice     All   0.1   2.97       0.46  0.85  0.95  0.60  0.62  0.51  0.51
Dice     All   0.15  2.00       0.54  0.65  0.80  0.59  0.64  0.56  0.58
Dice     All   0.2   1.26       0.62  0.45  0.63  0.52  0.62  0.57  0.62
Dice     All   0.25  0.81       0.71  0.35  0.50  0.47  0.59  0.59  0.66
Dice     All   0.3   0.55       0.77  0.25  0.38  0.38  0.51  0.54  0.64
Dice     All   0.35  0.35       0.81  0.16  0.27  0.26  0.41  0.44  0.58
Cosine   Best  0.00  1.05       0.64  0.48  0.81  0.55  0.71  0.60  0.66
Cosine   Best  0.15  0.94       0.66  0.45  0.75  0.53  0.70  0.60  0.68
Cosine   All   0.1   3.34       0.44  0.89  0.97  0.59  0.60  0.49  0.49
Cosine   All   0.15  2.40       0.52  0.75  0.88  0.61  0.66  0.55  0.57
Cosine   All   0.2   1.58       0.58  0.53  0.70  0.55  0.64  0.57  0.60
Cosine   All   0.25  1.08       0.66  0.41  0.58  0.51  0.61  0.59  0.64
Cosine   All   0.3   0.74       0.74  0.32  0.48  0.45  0.58  0.59  0.67
Cosine   All   0.35  0.48       0.82  0.21  0.35  0.34  0.49  0.52  0.64
Cosine   All   0.4   0.32       0.80  0.15  0.25  0.25  0.37  0.43  0.55

Table 6.3: Evaluation against intersection of annotators 1 and 2.
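The F columns of Table 6.3 appear to be consistent with the standard F-beta measure, with RF1 and RF0.5 obtained by using RR in place of R. This is an inference from the reported values, not a definition given on this page. A minimal Python sketch under that assumption (the function name `f_beta` is illustrative):

```python
def f_beta(precision, recall, beta=1.0):
    """Standard F-beta score: weighted harmonic mean of precision and recall.

    beta = 1 weights both equally; beta = 0.5 favours precision.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Values taken from the Cosine / All / sigma = 0.25 row of Table 6.3.
p, r, rr = 0.66, 0.41, 0.58
print(round(f_beta(p, r, 1.0), 2))   # F1, from P and R
print(round(f_beta(p, r, 0.5), 2))   # F0.5, from P and R
print(round(f_beta(p, rr, 0.5), 2))  # RF0.5, assuming RR replaces R
```

For that row the sketch gives roughly 0.51, 0.59 and 0.64, which agrees with the reported F1, F0.5 and RF0.5. Small differences on other rows are to be expected, since the inputs are already rounded to two decimals.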

1. Create a new sparse matrix M (|V| × |V|).

2. In each cell Mij, put the similarity between the adjacency vector of the word in vi and the adjacency vector of the word in vj, Mij = sim(vi, vj).

3. Extract a cluster Ci from each row Mi, consisting of the words vj where Mij > θ, a selected threshold. A lower θ leads to larger synsets and higher ambiguity, while a larger θ will result in fewer and smaller synsets, or no synsets at all.

4. For each cluster Ci with all its elements included in a larger cluster Cj (Ci ∪ Cj = Cj and Ci ∩ Cj = Ci), Ci and Cj are merged, giving rise to a new cluster Ck with the same elements as Cj.
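The four steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the graph is a dictionary mapping each word to the set of its neighbours, sim is taken to be the cosine between binary adjacency vectors, and each word is counted in its own adjacency vector. None of these choices is fixed by the text above, and the names `cosine` and `cluster_rows` are hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two binary vectors represented as sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def cluster_rows(graph, theta):
    """Return the row clusters of the similarity matrix, minus subsumed ones."""
    words = sorted(graph)
    # Assumption: a word's adjacency vector includes the word itself.
    adj = {w: set(graph[w]) | {w} for w in words}

    # Steps 1-2: sparse matrix M, storing only the non-zero similarities.
    M = {}
    for wi in words:
        for wj in words:
            s = cosine(adj[wi], adj[wj])
            if s > 0.0:
                M[(wi, wj)] = s

    # Step 3: one cluster per row, keeping the words with Mij > theta.
    clusters = set()
    for wi in words:
        row = frozenset(wj for wj in words if M.get((wi, wj), 0.0) > theta)
        if row:
            clusters.add(row)

    # Step 4: a cluster that is a proper subset of another is absorbed by it.
    kept = [c for c in clusters if not any(c < d for d in clusters)]
    return sorted(kept, key=sorted)


# Toy graph: a path a - b - c.
toy = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(cluster_rows(toy, 0.6))
```

In the toy graph, the rows for a and c yield the clusters {a, b} and {b, c}, both of which are contained in the cluster {a, b, c} from the row for b, so only the larger cluster survives step 4.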

After clustering, we will have a thesaurus T with synsets Si and a set of discovered clusters C. A simple thing to do would be to handle the clusters as synsets and add them to the thesaurus. However, some of the clusters might already be included, or partly included, in existing synsets. Therefore, before adding the clusters to T, we compute the similarity between the words in each synset Si and the words in each discovered cluster Cj. For this purpose, we measure the overlap between the two sets, using the overlap coefficient:

\[
\mathrm{Overlap}(S_i, C_j) = \frac{|S_i \cap C_j|}{\min(|S_i|, |C_j|)} \tag{6.1}
\]
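As a small illustration of equation 6.1, the overlap coefficient can be computed directly on Python sets. The function name `overlap` and the choice to return 0 when either set is empty are assumptions made for this sketch, not part of the original definition.

```python
def overlap(synset, cluster):
    """Overlap coefficient: |S ∩ C| / min(|S|, |C|)."""
    if not synset or not cluster:
        # Assumption: treat the undefined empty-set case as no overlap.
        return 0.0
    return len(synset & cluster) / min(len(synset), len(cluster))


# A cluster fully contained in a synset reaches the maximum value.
print(overlap({"car", "auto", "automobile"}, {"car", "auto"}))
# Two sets sharing one word out of a smaller set of two.
print(overlap({"a", "b", "c"}, {"c", "d"}))
```

Because the denominator is the size of the smaller set, the coefficient equals 1 whenever one set is contained in the other, which is the situation described above where a cluster is already included in an existing synset.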
