Comparing the value of Latent Semantic Analysis on two English-to ...
tences. We then show the actual textual context set
with the same notation as above.
(a)
WordNet synset ID: 100319939
Words: chase, following, pursual, pursuit
Gloss: the act of pursuing in an effort to overtake or capture
Example: the culprit started to run and the cop took off in pursuit
Textual context set: {{following, chase}, {the, effort, of, to, or, capture, in, act, pursuing, an}, {the, off, took, to, run, in, culprit, started, and}}
KBBI ID: k39607 - Similarity: 0.804
Sublemma: mengejar
Definition: berlari untuk menyusul menangkap dsb memburu
Example: ia berusaha mengejar dan menangkap saya
Textual context set: {{mengejar}, {memburu, berlari, menangkap, untuk, menyusul}, {berusaha, dan, ia, mengejar, saya, menangkap}}
(b)
WordNet synset ID: 201277784
Words: crease, furrow, wrinkle
Gloss: make wrinkled or creased
Example: furrow one’s brow
Textual context set: {{}, {or, make}, {s, one}}
KBBI ID: k02421 - Similarity: 0.69
Sublemma: alur
Definition: jalinan peristiwa dl karya sastra untuk mencapai efek tertentu pautannya dapat diwujudkan oleh hubungan temporal atau waktu dan oleh hubungan kausal atau sebab-akibat
Example: (none)
Textual context set: {{alur}, {oleh, dan, atau, jalinan, peristiwa, diwujudkan, efek, dapat, karya, hubungan, waktu, mencapai, untuk, tertentu}, {}}
Table 4. Example of (a) Successful and (b) Unsuccessful Concept Mappings
In the first example, the textual context sets
from both the WordNet synset and the KBBI
senses are fairly large, and provide sufficient context
for LSA to choose the correct KBBI sense.
However, in the second example, the textual context
set for the synset is very small, due to the
words not appearing in the training collection. Furthermore,
it does not contain any of the words that
truly convey the concept. As a result, LSA is unable
to identify the correct KBBI sense.
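
The shrinking of the context set in example (b) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the tokenizer, the function name `textual_context_set`, and the toy `vocabulary` are assumptions introduced here. The only behaviour taken from the text is that words absent from the training collection drop out of the set.

```python
import re


def tokenize(text):
    """Lowercase and split on anything that is not a-z (a simplifying assumption)."""
    return [tok for tok in re.split(r"[^a-z]+", text.lower()) if tok]


def textual_context_set(words, gloss, example, vocabulary):
    """Return [word set, gloss set, example set], keeping only in-vocabulary tokens."""
    parts = [" ".join(words), gloss, example]
    return [{tok for tok in tokenize(part) if tok in vocabulary} for part in parts]


# Hypothetical vocabulary standing in for the words seen in the training collection.
vocabulary = {"make", "or", "one", "s", "the", "of"}

context = textual_context_set(
    words=["crease", "furrow", "wrinkle"],
    gloss="make wrinkled or creased",
    example="furrow one's brow",
    vocabulary=vocabulary,
)
print(context)
```

With this toy vocabulary, none of the content words (crease, furrow, wrinkle, brow) survive, which mirrors the nearly empty context set shown in Table 4(b).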
For this experiment, we used the P 1000 training
collection. The results are presented in Table 5. As
a baseline, we select three random suggested Indonesian
word senses as a mapping for an English
word sense. The reported random baseline in Table
5 is an average of 10 separate runs. Another baseline
was computed by comparing English common-based
concepts to their suggestion based on a
full rank word-document matrix. The top 3 Indonesian
concepts with the highest similarity values are designated
as the mapping results. Subsequently, we
compute the Fleiss kappa (Fleiss, 1971) of this result
together with the human judgements.
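
For reference, Fleiss' kappa can be computed from a table of rating counts as sketched below. The formula is the standard one from Fleiss (1971); how the paper encodes its ratings is not stated in this section, so the toy count tables (rows are items, columns are categories, and every item receives the same number of ratings) are purely illustrative assumptions.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for counts[i][j] = number of raters assigning item i to category j.

    Assumes every item is rated by the same number of raters (at least 2)
    and that the ratings are not all in a single category.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every item must have the same number of ratings")
    n_categories = len(counts[0])

    # Proportion of all assignments that fall in each category.
    p_j = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_categories)
    ]
    # Observed agreement per item, then averaged over items.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_items
    # Agreement expected by chance.
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)


# Hypothetical data: 4 candidate mappings, 3 raters, categories (accept, reject).
perfect = [[3, 0], [0, 3], [3, 0], [0, 3]]
mixed = [[2, 1], [1, 2]]
print(fleiss_kappa(perfect), fleiss_kappa(mixed))
```

One plausible reading of "together with the human judgements" is that the system's top 3 output is added as one more rater alongside the human judges; that reading is an assumption and is not confirmed by this section.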
The average level of agreement between the
LSA 10% mappings and the human judges
(0.2713) is not as high as that between the human
judges themselves (0.4831). Nevertheless, in general
it is better than the random baseline (0.2380)
and the frequency baseline (0.2132), which suggests
that LSA is indeed managing to capture some
measure of bilingual semantic information implicit
within the parallel corpus.
Furthermore, LSA mappings with a 10% rank approximation
yield higher levels of agreement than
LSA with other rank approximations. This contrasts
with the word mapping results, where LSA
with larger rank approximations yields higher results
(Section 4.2).
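
The rank approximations compared above can be illustrated with a truncated SVD. This is a minimal sketch using NumPy, not the authors' code: it assumes that "10% rank approximation" means keeping that fraction of the singular values of the word-document matrix, and the function name `rank_approximation` and the random toy matrix are hypothetical.

```python
import numpy as np


def rank_approximation(matrix, fraction):
    """Rank-k approximation via truncated SVD, keeping `fraction` of the
    singular values (k is rounded and at least 1)."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    k = max(1, int(round(fraction * len(s))))
    # Scale the first k left singular vectors by their singular values,
    # then project back with the first k right singular vectors.
    return (u[:, :k] * s[:k]) @ vt[:k, :]


# Toy word-document matrix (6 words x 10 documents), random for illustration only.
rng = np.random.default_rng(0)
matrix = rng.random((6, 10))

approx_10 = rank_approximation(matrix, 0.10)  # keeps 1 singular value here
approx_50 = rank_approximation(matrix, 0.50)  # keeps 3 singular values here
full_rank = rank_approximation(matrix, 1.00)  # reproduces the original matrix

for name, approx in [("10%", approx_10), ("50%", approx_50), ("100%", full_rank)]:
    print(name, np.linalg.norm(matrix - approx))
```

Smaller fractions discard more of the matrix and therefore generalize more aggressively, which is the trade-off behind the differing results for concept mapping and word mapping.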
5 Discussion
Previous work has shown LSA to contribute
positive gains to similar tasks such as Cross Language
Information Retrieval (Rehder et al., 1997).
However, the bilingual word mapping results presented
in Section 4.3 show the basic vector space
model consistently outperforming LSA at that particular
task, despite our initial intuition that LSA
should actually improve precision and recall.
We speculate that the task of bilingual word
mapping may be even harder for LSA than that of
                     Fleiss Kappa Values
Judges    Synsets    Judges only    Judges + RNDM3    Judges + FREQ Top3    Judges + LSA 10% Top3    Judges + LSA 25% Top3    Judges + LSA 50% Top3
≥ 2       144        0.4269         0.1318            0.1667                0.1544                   0.1606                   0.1620
≥ 3       24         0.4651         0.2197            0.2282                0.2334                   0.2239                   0.2185
≥ 4       8          0.5765         0.3103            0.2282                0.3615                   0.3329                   0.3329
≥ 5       4          0.4639         0.2900            0.2297                0.3359                   0.3359                   0.3359
Average              0.4831         0.2380            0.2132                0.2713                   0.2633                   0.2623
Table 5. Results of Concept Mapping