
Comparing the value of Latent Semantic Analysis on two English-to ...


tences. We then show the actual textual context set with the same notation as above.

WordNet synset ID: 100319939, Words: chase, following, pursual, pursuit, Gloss: the act of pursuing in an effort to overtake or capture, Example: the culprit started to run and the cop took off in pursuit, Textual context set: {{following, chase}, {the, effort, of, to, or, capture, in, act, pursuing, an}, {the, off, took, to, run, in, culprit, started, and}}

KBBI ID: k39607 - Similarity: 0.804, Sublemma: mengejar [to chase], Definition: berlari untuk menyusul menangkap dsb memburu [to run in order to catch up with, capture, etc.; to hunt], Example: ia berusaha mengejar dan menangkap saya [he/she tried to chase and catch me], Textual context set: {{mengejar}, {memburu, berlari, menangkap, untuk, menyusul}, {berusaha, dan, ia, mengejar, saya, menangkap}}

(a)

WordNet synset ID: 201277784, Words: crease, furrow, wrinkle, Gloss: make wrinkled or creased, Example: furrow one’s brow, Textual context set: {{}, {or, make}, {s, one}}

KBBI ID: k02421 - Similarity: 0.69, Sublemma: alur [plot], Definition: jalinan peristiwa dl karya sastra untuk mencapai efek tertentu pautannya dapat diwujudkan oleh hubungan temporal atau waktu dan oleh hubungan kausal atau sebab-akibat [the interweaving of events in a literary work to achieve a particular effect; the links between them can be realized through temporal relations and through causal (cause-and-effect) relations], Example: (none), Textual context set: {{alur}, {oleh, dan, atau, jalinan, peristiwa, diwujudkan, efek, dapat, karya, hubungan, waktu, mencapai, untuk, tertentu}, {}}

(b)

Table 4. Examples of (a) Successful and (b) Unsuccessful Concept Mappings

In the first example, the textual context sets from both the WordNet synset and the KBBI sense are fairly large and provide sufficient context for LSA to choose the correct KBBI sense. However, in the second example, the textual context set for the synset is very small because most of its words do not appear in the training collection; its first subset, which would hold crease, furrow, and wrinkle, is empty. Furthermore, the few remaining tokens (or, make, s, one) do not include any of the words that truly convey the concept. As a result, LSA is unable to identify the correct KBBI sense.
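The failure in Table 4(b) follows from how the textual context set is assembled. Below is a minimal sketch, assuming (this section does not spell it out) that each of the three subsets keeps only the distinct lowercase tokens that occur in the training vocabulary. The functions `tokenize` and `build_context_set` and the toy vocabulary are hypothetical, chosen only to reproduce the sets shown in Table 4(b).

```python
import re


def tokenize(text: str) -> list[str]:
    # Lowercase and split on non-word characters, so "one’s" -> ["one", "s"].
    return re.findall(r"\w+", text.lower())


def build_context_set(words, gloss, example, vocabulary):
    """Return ({words}, {gloss tokens}, {example tokens}), keeping only
    tokens found in the training vocabulary (assumed filtering rule)."""
    def keep(tokens):
        return {t for t in tokens if t in vocabulary}

    return (
        keep(w.lower() for w in words),
        keep(tokenize(gloss)),
        keep(tokenize(example)),
    )


# Hypothetical vocabulary in which "crease", "furrow", "wrinkle", "wrinkled",
# "creased" and "brow" never occur, mirroring Table 4(b).
toy_vocabulary = {"make", "or", "one", "s", "the", "act", "of"}
ctx = build_context_set(
    ["crease", "furrow", "wrinkle"],
    "make wrinkled or creased",
    "furrow one’s brow",
    toy_vocabulary,
)
print([len(subset) for subset in ctx])  # -> [0, 2, 2]
```

Under this rule, a synset whose content words are absent from the training collection is left with only incidental tokens, which is exactly the situation in which LSA has too little evidence to pick the right sense.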

For this experiment, we used the P 1000 training collection. The results are presented in Table 5. As a baseline, we randomly select three of the suggested Indonesian word senses as the mapping for an English word sense; the random baseline reported in Table 5 is the average of 10 separate runs. Another baseline was computed by comparing English common-based concepts to their suggestions using a full-rank word-document matrix. The top 3 Indonesian concepts with the highest similarity values are designated as the mapping results. Subsequently, we compute the Fleiss kappa (Fleiss, 1971) of each of these results together with the human judgements.
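The top-3 mapping step described above can be sketched as follows. This is a hedged reconstruction, not the authors' code. We assume a bilingual word-document matrix `X` built from the parallel corpus (rows are English and Indonesian words, columns are aligned document pairs), word vectors taken as the rows of U_k S_k from its SVD with k set to a fraction of the rank (so `rank_fraction=0.10` stands in for "10% rank approximation"), a concept vector equal to the sum of the vectors of its textual-context-set words, and cosine similarity. All function names and the toy matrix are hypothetical.

```python
import random

import numpy as np


def lsa_word_vectors(X: np.ndarray, rank_fraction: float = 1.0) -> np.ndarray:
    """Rows of U_k * S_k from the SVD of the word-document matrix X.

    k = rank_fraction * numerical rank of X (at least 1); assumed definition.
    """
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    full_rank = int(np.sum(s > 1e-10))
    k = max(1, int(round(rank_fraction * full_rank)))
    return U[:, :k] * s[:k]  # scale each kept column by its singular value


def concept_vector(context_words, index, W):
    """Sum of the LSA vectors of the context-set words found in the vocabulary."""
    rows = [index[w] for w in context_words if w in index]
    return W[rows].sum(axis=0) if rows else np.zeros(W.shape[1])


def cosine(a, b):
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return 0.0 if na == 0.0 or nb == 0.0 else float(a @ b / (na * nb))


def top3(english_context, candidates, index, W):
    """Return the 3 candidate KBBI sense ids most similar to the English concept."""
    query = concept_vector(english_context, index, W)
    ranked = sorted(
        candidates,
        key=lambda sid: cosine(query, concept_vector(candidates[sid], index, W)),
        reverse=True,
    )
    return ranked[:3]


def random3(candidates, seed):
    """Random baseline: three suggested senses drawn without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(candidates), k=min(3, len(candidates)))


# Hypothetical toy data: rows follow `vocab`, columns are four parallel document pairs.
vocab = ["chase", "mengejar", "berlari", "alur", "sastra"]
index = {w: i for i, w in enumerate(vocab)}
X = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 1, 1],
              [0, 0, 1, 0]], dtype=float)
candidates = {"k1": {"mengejar"}, "k2": {"berlari"},
              "k3": {"alur", "sastra"}, "k4": {"memburu"}}

W = lsa_word_vectors(X, rank_fraction=1.0)
print(top3({"chase"}, candidates, index, W))  # k1 and k2 rank first
```

With `rank_fraction=1.0` the cosines equal those computed directly on the rows of `X`, because (U S)(U S)^T = X X^T; this is one way to read the full-rank baseline. `random3` mirrors the random baseline, which would be averaged over 10 runs (for example, seeds 0 to 9).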

The average level of agreement between the LSA mappings with 10% rank approximation and the human judges (0.2713) is not as high as that among the human judges themselves (0.4831). Nevertheless, it is generally better than the random baseline (0.2380) and the frequency baseline (0.2132), which suggests that LSA does capture some measure of the bilingual semantic information implicit in the parallel corpus. The exception is the ≥ 2 row, where the frequency baseline (0.1667) exceeds all three LSA settings (0.1544 to 0.1620).
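To make the agreement figures above concrete, here is a minimal implementation of Fleiss' kappa (Fleiss, 1971) in its standard form. How the judgements are encoded as items and categories is not stated in this section: treating each (English synset, suggested KBBI sense) pair as an item with the categories accept/reject, and a system's top 3 as one additional rater, is our assumption. `with_extra_rater` and the data below are hypothetical.

```python
def fleiss_kappa(counts: list[list[int]]) -> float:
    """Fleiss' kappa for an N x k table: counts[i][j] is the number of raters
    who put item i into category j. Every item needs the same n >= 2 raters."""
    if not counts:
        raise ValueError("need at least one item")
    n = sum(counts[0])
    if n < 2 or any(sum(row) != n for row in counts):
        raise ValueError("every item must have the same number (>= 2) of ratings")
    N, k = len(counts), len(counts[0])
    # Observed agreement: per item, the share of rater pairs that agree; then the mean.
    p_bar = sum((sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts) / N
    # Chance agreement from the overall proportion of ratings in each category.
    p_j = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when every rating is in one category")
    return (p_bar - p_e) / (1 - p_e)


def with_extra_rater(counts, choices):
    """Add one rater (e.g. a mapping system) whose category for item i is choices[i]."""
    return [[c + (j == choice) for j, c in enumerate(row)]
            for row, choice in zip(counts, choices)]


# Hypothetical data: 3 candidate (synset, KBBI sense) pairs, categories [accept, reject].
judges = [[2, 0], [0, 2], [1, 1]]
system = [0, 1, 0]  # accept if the sense is in the system's top 3
print(round(fleiss_kappa(judges), 4))                            # -> 0.3333
print(round(fleiss_kappa(with_extra_rater(judges, system)), 4))  # -> 0.55
```

Kappa corrects raw agreement for the agreement expected by chance from the category proportions, which is why a system that merely agrees with the majority category can still score well below the judges' own level.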

Furthermore, LSA mappings with 10% rank approximation yield a higher average level of agreement than LSA with other rank approximations (0.2713, compared with 0.2633 for 25% and 0.2623 for 50%). This contradicts the word mapping results, where LSA with larger rank approximations yields better results (Section 4.2).

5 Discussion

Previous work has shown LSA to contribute positive gains to similar tasks such as Cross-Language Information Retrieval (Rehder et al., 1997). However, the bilingual word mapping results presented in Section 4.3 show the basic vector space model consistently outperforming LSA on that particular task, despite our initial intuition that LSA should improve precision and recall. We speculate that the task of bilingual word mapping may be even harder for LSA than that of

Fleiss kappa values (columns 3 to 8)

Judges | Synsets | Judges only | Judges + RNDM3 | Judges + FREQ Top3 | Judges + LSA 10% Top3 | Judges + LSA 25% Top3 | Judges + LSA 50% Top3
≥ 2 | 144 | 0.4269 | 0.1318 | 0.1667 | 0.1544 | 0.1606 | 0.1620
≥ 3 | 24 | 0.4651 | 0.2197 | 0.2282 | 0.2334 | 0.2239 | 0.2185
≥ 4 | 8 | 0.5765 | 0.3103 | 0.2282 | 0.3615 | 0.3329 | 0.3329
≥ 5 | 4 | 0.4639 | 0.2900 | 0.2297 | 0.3359 | 0.3359 | 0.3359
Average | | 0.4831 | 0.2380 | 0.2132 | 0.2713 | 0.2633 | 0.2623

Table 5. Results of Concept Mapping
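The Average row of Table 5 matches the unweighted mean of the four judge-count rows (≥ 2 to ≥ 5), not a mean weighted by the number of synsets in each row; for the judges-only column, a synset-weighted mean would be about 0.4395 rather than 0.4831. The check below uses only values copied from the table; the helper names are our own.

```python
# Fleiss kappa values copied from Table 5; each list follows the rows >= 2, >= 3, >= 4, >= 5.
TABLE5 = {
    "Judges only": [0.4269, 0.4651, 0.5765, 0.4639],
    "Judges + RNDM3": [0.1318, 0.2197, 0.3103, 0.2900],
    "Judges + FREQ Top3": [0.1667, 0.2282, 0.2282, 0.2297],
    "Judges + LSA 10% Top3": [0.1544, 0.2334, 0.3615, 0.3359],
    "Judges + LSA 25% Top3": [0.1606, 0.2239, 0.3329, 0.3359],
    "Judges + LSA 50% Top3": [0.1620, 0.2185, 0.3329, 0.3359],
}
REPORTED_AVERAGE = {
    "Judges only": 0.4831,
    "Judges + RNDM3": 0.2380,
    "Judges + FREQ Top3": 0.2132,
    "Judges + LSA 10% Top3": 0.2713,
    "Judges + LSA 25% Top3": 0.2633,
    "Judges + LSA 50% Top3": 0.2623,
}
SYNSETS = [144, 24, 8, 4]  # synsets per row, from the Synsets column


def unweighted_mean(values):
    return sum(values) / len(values)


def synset_weighted_mean(values, weights=SYNSETS):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


for name, values in TABLE5.items():
    print(f"{name:22s} reported={REPORTED_AVERAGE[name]:.4f} "
          f"unweighted={unweighted_mean(values):.4f} "
          f"weighted={synset_weighted_mean(values):.4f}")
```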
