
Proceedings of the Australasian Language Technology Workshop 2007, pages 125–133

Dictionary Alignment for Context-sensitive Word Glossing

Willy Yap and Timothy Baldwin
Department of CSSE
University of Melbourne
{willy,tim}@csse.unimelb.edu.au

Abstract

This paper proposes a method for automatically sense-to-sense aligning dictionaries in different languages (focusing on Japanese and English), based on structural data in the respective dictionaries. The basis of the proposed method is the sentence similarity of the sense definition sentences, using a bilingual Japanese-to-English dictionary as a pivot during the alignment process. We experiment with various embellishments to the basic method, including term weighting, stemming/lemmatisation, and ontology expansion.

1 Introduction

In a multilingual environment such as the Internet, users often stumble across webpages authored in an unfamiliar language which potentially contain information of interest. While users can consult dictionaries to help them understand the content of the webpages, the process of looking up words in unfamiliar languages is at best time-consuming, and at worst impossible, for a range of reasons. First, the writing system of the language may be unfamiliar to the user, e.g. the Cyrillic alphabet for a monolingual English speaker. Second, the user may not be familiar with the non-segmenting nature of languages such as Chinese and Japanese, and hence be incapable of delimiting the words to look up in the dictionary in the first place. Third, the user may be unable to lemmatise a word to determine the form in which it is listed in a dictionary.

There are several alternatives to help decipher webpages in unfamiliar languages. The first is to use an online machine translation system such as Altavista's Babel Fish (http://babelfish.altavista.com/) or Google Translate (http://www.google.com/translate_t). While web-based machine translation services occasionally produce good translations for linguistically similar languages such as English and French, they do not perform very well in translating languages which are far removed from one another (Koehn, 2005).

The second alternative is a pop-up glossing application. The application takes raw text or a URL, parses the words, and returns a pop-up translation of each word as the mouse hovers over it. Some example pop-up glossing applications for Japanese source text and English glosses are Rikai (http://www.rikai.com/perl/Home.pl) and POPjisyo (http://www.popjisyo.com). With the aid of these pop-up translations, the manual effort of segmenting words (if necessary) and looking up each one can be avoided. Such applications are also useful as educational aids for learners of the language.

The drawback of these applications is that they display all possible translations of a given word, irrespective of context. Faced with the task of determining the correct translation themselves, users frequently misinterpret words. An illustration of this situation is given in Figure 1.

Figure 1: Multiple translations for the Japanese word [ageru] produced by rikai.com. The correct translation in this context is "to raise".


Word: ryuu    POS: noun
Sense 1:      Lexical-type: noun-lex
              Definition: "An imaginary animal. Dragons are like enormous snakes with 4 legs and horns. Dragons live in the sea, lakes and ponds, and are said to form clouds and cause rain when they fly up into the sky."
              Hypernym: ANIMAL
Senses 2-4:   [...]
Sense 5:      Lexical-type: noun-lex
              Definition: "In shogi, a promoted rook."
              Domain: SHOGI

Figure 2: A partial view of the Lexeed entry for [ryuu] (with English glosses)

3.1 The Lexeed semantic database of Japanese

The Lexeed Semantic Database of Japanese is a machine-readable dictionary consisting of the most commonly-used words in Japanese (Kasahara et al., 2004). In total, there are 28,000 words in Lexeed, and a total of 46,437 senses. Associated with each sense is a set of definition sentences, constructed entirely using the closed vocabulary of the 28,000 words found in Lexeed, such that 60% of the 28,000 words occur in the definition sentences (Tanaka et al., 2006). In addition to the definition sentences, Lexeed also contains the part of speech (POS), lexical relations between the senses (if any) and an example sentence, also based on the closed vocabulary of 28,000 words. All content words in the definition and example sentences are sense-annotated.

Automatic ontology acquisition methods have been applied to Lexeed to induce lexical relations between sense pairs, based on the sense-annotated definition sentences (Nichols et al., 2005) and comparison with both the Goi-Taikei thesaurus and WordNet 2.0.

An example Lexeed entry for the word ryuu is given in Figure 2.

3.2 EDICT

EDICT is a free machine-readable Japanese-to-English dictionary (Breen, 1995). The project is highly active and has been extended to other target languages such as German, French and Russian. EDICT contains more than 170,000 Japanese entries, each of which is associated with one or more English glosses. It also optionally contains information such as the pronunciation of the entry, POS, and domain of application.

3.3 WordNet

WordNet is an electronic semantic lexical database of English (Fellbaum, 1998). It is made up of more than 100,000 synsets, with each synset representing a group of synonyms. Its entries are categorised into four POS categories: nouns, verbs, adjectives and adverbs. Each POS is described in a discrete lexical network.

Every synset in WordNet has a definition sentence, and sample sentence(s) are provided for most of the synsets; in combination, these are termed the WordNet gloss. Semantic relations connect one synset to another, and include relation types such as hypernymy, hyponymy, antonymy and meronymy. The majority of these relations do not cross POS boundaries.

Since we only experiment with hypernyms (and, symmetrically, hyponyms), we provide a simple review of this relation. A synset A is a hypernym of a synset B iff B is a kind of A. For example, vehicle is a hypernym of car, while perceive is a hypernym of hear, sight, touch, smell and taste. (Strictly speaking, hear etc. are troponyms of perceive, i.e. they denote specific ways of perceiving; because WordNet does not distinguish between hyponyms and troponyms, however, we treat the two identically.)
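Since hypernym chains drive the ontology-expansion extension described later (Section 4.6), the relation is easy to inspect programmatically. Below is a minimal sketch using NLTK's WordNet interface; this is a tooling assumption purely for illustration, as the paper itself used WordNet 2.0 and does not specify an implementation:

    # Requires: pip install nltk; then nltk.download("wordnet")
    from nltk.corpus import wordnet as wn

    # "vehicle" is a hypernym of "car": walk one step up from a synset of "car"
    car = wn.synsets("car", pos=wn.NOUN)[0]
    print(car.definition())                     # the WordNet gloss of the synset
    print([h.name() for h in car.hypernyms()])  # e.g. ['motor_vehicle.n.01']

    # Multiword entries are indexed with underscores; missing entries return []
    print(wn.synsets("promoted_rook"))          # [] -- cf. Section 4.1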


Figure 3: Example of normalisation of the translation string; we stop at "rook" as WordNet has a matching entry for it

When building the baseline for our evaluation, we used the SemCor corpus (a subset of the Brown corpus annotated with WordNet senses) to derive the frequency counts of each WordNet synset (Landes et al., 1998). Section 5 discusses this process in more detail.

4 Proposed Methods

Our basic alignment method, along with various extensions, is outlined below.

4.1 Basic alignment method using cosine similarity

In this paper, we align a semantic database of Japanese (Lexeed) with a semantic network of English (WordNet) at the sense level. First, we use Lexeed to find all possible senses of a given word, and retrieve the definition sentences for each.

Since all the definition sentences are in Japanese, we use EDICT as a pivot to convert Lexeed definition sentences into English. In this process, all possible translations of all Japanese words found in the definition sentences are returned, along with their POS classes. For every translation returned, we find entries in WordNet that match the translation and POS category. If there is no match for the given POS, we relax this constraint and search for entries in WordNet that match the translation but not the POS.

Problems arise when WordNet does not have a matching entry for the translation. This situation usually happens when the translation returned by EDICT comprises more than one English word. For a Japanese verb, e.g., the English translation in EDICT almost always begins with the infinitival to (e.g. nomu is translated as to drink). WordNet does not contain a verbal entry for to drink, but does contain an entry for drink. To handle this case of partial match, we locate the longest right word substring of the EDICT translation which is indexed in WordNet.

A related problem is when the translation contains domain or collocational information in parentheses. For example, ryuu is translated as both dragon and promoted rook (shogi). The first translation has a matching entry in WordNet but the second does not. In this case, there is also no right word substring which matches in WordNet, as we end up with rook (shogi) and then (shogi), neither of which is contained in WordNet. In order to deal with this situation, we first normalise the translation string by removing all brackets and query WordNet with the normalised string. Should there be a matching entry, we stop there. If not, we then remove all strings between brackets, and apply the longest right word substring heuristic as above. An illustration of this process is given in Figure 3.

In the worst case of WordNet not having a matching entry for any right word substring, we discard the translation. (A procedural sketch of this back-off is given at the end of this section.)

At this point, we have aligned a given Japanese word with (hopefully) one or more English words, but are still no closer to inducing sense alignment pairs. In order to produce the sense alignments, we generate all pairings of Lexeed senses with WordNet synsets for each WordNet-matched word translation. For each such pair, we compile the Lexeed definition sentence(s) word-translated into English, and the WordNet glosses, and convert each into a simple vector of term frequencies. We then measure the similarity of each vector pair using cosine similarity. An overview of this alignment process is presented in Figure 4.

Figure 4: Overview of the Lexeed–WordNet sense alignment method
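The translation-matching back-off of Section 4.1 is simple to state procedurally. The following is a minimal sketch of our reading of it, using NLTK's WordNet interface as a stand-in for the WordNet index used in the paper; the function names are our own:

    import re
    from nltk.corpus import wordnet as wn

    def in_wordnet(phrase):
        """True if WordNet indexes the phrase (multiword entries use '_')."""
        return bool(wn.synsets(phrase.replace(" ", "_")))

    def match_translation(translation):
        """Map an EDICT gloss onto a WordNet-indexed (sub)string (Section 4.1)."""
        # 1. Remove the bracket characters only, and try the whole string.
        flat = " ".join(re.sub(r"[()]", " ", translation).split())
        if in_wordnet(flat):
            return flat
        # 2. Drop parenthesised material (domain/collocational notes), then
        #    back off to the longest right word substring indexed in WordNet.
        words = re.sub(r"\([^)]*\)", " ", translation).split()
        for i in range(len(words)):
            candidate = " ".join(words[i:])
            if in_wordnet(candidate):
                return candidate
        return None  # worst case: discard the translation

    print(match_translation("promoted rook (shogi)"))  # -> "rook"
    print(match_translation("to drink"))               # -> "drink"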
4.2 Weighting terms using TF-IDF

The basic alignment method does not use any form of term weighting, and thus overemphasises common function words such as the, which and and, and downplays the impact of rare words. As we expect a large amount of noise in the word-translated Lexeed definition sentences, including spurious translations for Japanese function words such as ka, ga and no that have no literal translation in English, we predict that an appropriate form of term weighting should improve the performance of our method.

As a first attempt at term weighting, we experimented with the classic SMART formulation of TF-IDF (Salton, 1971), treating the vector associated with each definition sentence as a single document.
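To make the vector scoring concrete, here is a minimal sketch of TF-IDF-weighted cosine similarity over tokenised definitions. It uses a generic smoothed TF-IDF rather than the exact SMART variant, and the example strings are invented for illustration:

    import math
    from collections import Counter

    def tfidf_vector(doc, collection):
        """TF-IDF weights for one tokenised definition, with IDF computed over
        the collection; log(1 + N/df) is a smoothed simplification."""
        n = len(collection)
        tf = Counter(doc)
        return {t: freq * math.log(1 + n / sum(t in d for d in collection))
                for t, freq in tf.items()}

    def cosine(u, v):
        """Cosine similarity between two sparse term-weight vectors."""
        dot = sum(w * v[t] for t, w in u.items() if t in v)
        norm = math.sqrt(sum(w * w for w in u.values())) \
             * math.sqrt(sum(w * w for w in v.values()))
        return dot / norm if norm else 0.0

    # Hypothetical word-translated Lexeed definition vs. a WordNet gloss
    lexeed_def = "imaginary animal like an enormous snake with horns".split()
    wn_gloss = "a mythical imaginary animal a winged reptile".split()
    docs = [lexeed_def, wn_gloss]
    u, v = tfidf_vector(lexeed_def, docs), tfidf_vector(wn_gloss, docs)
    print(round(cosine(u, v), 3))  # a low but non-zero similarity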


4.3 Word stopping

As mentioned in the previous section, commonly-occurring, semantically-bleached words are a source of noise in the naive cosine similarity scoring method. One conventional way of countering their impact is to filter them out of the vectors, based on a stop word list. For our experiments, we use the stop word list provided by the Snowball project (http://snowball.tartarus.org/).

4.4 POS filtering

Another source of possible noise is the translations of Japanese function words. As all the Lexeed definition sentences are POS-tagged, it is a relatively simple process to filter out all Japanese function words, focusing on prefixes, suffixes and particles.

4.5 Lemmatisation, stemming and normalisation

In its basic form, our vector space model treats each distinct word as a unique term, ignoring even the obvious similarity between inflectional variants of the same word, such as dragon and dragons. To remove such inflectional variation, we experiment with lemmatising all words found in both the Lexeed and WordNet vectors, using morph (Minnen et al., 2001). For similar reasons, we also experiment with the Porter stemmer, noting that stemming will further reduce the set of terms but potentially introduce spurious matches.

As part of this process (with both lemmatisation and stemming), we remove all punctuation from the definition sentences.

4.6 Lexical relations

Both the Lexeed and WordNet sense inventories are described in the form of hierarchies, making it possible to complement the sense definitions with those from neighbouring senses. The intuition behind this is that the sense granularity in the two sense inventories can vary greatly, such that a single sense in Lexeed is split across multiple WordNet synsets, which we can readily uncover by considering each sense not as a single point in WordNet but as a semantic neighbourhood. For example, the second sense of the word kinou in Figure 5, which literally means "near past", should be aligned with the second sense of yesterday, which is defined as "the recent past". This alignment is more self-evident, however, when we observe that the hypernym of each of the two senses is defined as "past".

Figure 5: The output of word-translating Japanese definition sentences to English

In our current experiments, we only look at the utility of hypernymy. For a given Lexeed–WordNet sense pairing, we extract the hypernyms of the respective senses and expand the definition sentences with the definition sentences from the hypernyms. The term vectors are then based on this expanded term set, similar to query expansion in information retrieval. (A procedural sketch of the stopping, normalisation and lemmatisation/stemming extensions follows.)
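As an illustration of how the stopping, normalisation and lemmatisation/stemming extensions compose, here is a minimal sketch of a term preprocessing pipeline. The paper used the Snowball stop list and morph; we substitute NLTK's stop word list, WordNet lemmatiser and Porter stemmer purely as illustrative stand-ins:

    # Requires: nltk.download("stopwords"); nltk.download("wordnet")
    import string
    from nltk.corpus import stopwords
    from nltk.stem import PorterStemmer, WordNetLemmatizer

    STOP = set(stopwords.words("english"))
    lemmatise = WordNetLemmatizer().lemmatize
    stem = PorterStemmer().stem

    def preprocess(tokens, stop=True, lemma=True, stemming=False):
        """Apply the optional extensions of Sections 4.3 and 4.5 to one definition."""
        out = [t.lower().strip(string.punctuation) for t in tokens]  # normalisation
        out = [t for t in out if t]  # drop bare punctuation tokens
        if stop:
            out = [t for t in out if t not in STOP]
        if lemma:
            out = [lemmatise(t) for t in out]
        if stemming:
            out = [stem(t) for t in out]
        return out

    print(preprocess("Dragons are like enormous snakes .".split()))
    # -> ['dragon', 'like', 'enormous', 'snake']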


5 Experimental Setup

5.1 Gold-standard data

To evaluate the performance of our system, we randomly selected 100 words from Lexeed, extracted the Lexeed–WordNet sense pairings as described above, and manually selected the gold-standard alignments from amongst them. The 100 words were associated with a total of 268 Lexeed senses and 772 WordNet senses, creating a total of 206,896 possible alignment pairs. Of these, 259 alignments were selected as our gold-standard.

We encountered a number of partial matches that were caused by the Japanese word being more specific than its English counterparts (as identified by our WordNet matching method). For example, kakkazan is translated as "active volcano". Since WordNet does not have any entry for active volcano, the longest right word substring that matches in WordNet is simply volcano. The definition sentences returned by Lexeed describe kakkazan as "a volcano which still can erupt" and "a volcano that will soon erupt", while volcano is described as "a fissure in the earth's crust (or in the surface of some other planet) through which molten lava and gases erupt" and "a mountain formed by volcanic material". Although there is some similarity between these definitions (namely key words such as erupt and volcano), we do not include this pairing in our gold-standard alignment data.

5.2 Baseline

As a baseline, we take the most-frequent sense of each of the 100 random words from Lexeed, and match it with the synset with the highest SemCor frequency count out of all the candidate synsets.

5.3 Thresholding

All our calculations are based on cosine similarity, which returns a similarity between 0 and 1, with 1 being an exact match. In its simplest form, our method would identify the unique WordNet sense with the highest similarity to each Lexeed sense, irrespective of the magnitude of the similarity. This has the dual disadvantage of allowing only one WordNet sense for each Lexeed sense, and potentially forcing alignments to be made on low similarity values. A more reasonable approach is to apply a threshold x, and treat all WordNet senses with similarity greater than x as being aligned with the Lexeed sense. Thresholding also gives us more flexibility in tuning the performance of our method: at higher threshold values, we can hope to increase precision at the expense of recall, and at lower threshold values, we can hope to increase recall at the expense of precision.

5.4 Evaluation metrics

To evaluate the performance of our system, we use precision, recall and F-score. In an alignment context, precision is defined as the proportion of correct alignments to all alignments returned by the system, and recall is defined as the proportion of the correct alignments returned by our system to all the alignments in our gold-standard. F-score is the harmonic mean of precision and recall, and provides a single figure-of-merit rating of the balance between these two factors. We evaluate our system using the unbiased (β = 1) F-score.
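A minimal sketch of the thresholded alignment selection and the evaluation metrics just defined (all names are ours):

    def align(similarities, x):
        """Keep every Lexeed-WordNet sense pair whose cosine similarity
        exceeds the threshold x; similarities maps pair -> score."""
        return {pair for pair, sim in similarities.items() if sim > x}

    def prf(predicted, gold):
        """Precision, recall and unbiased (beta = 1) F-score over sets of pairs."""
        correct = len(predicted & gold)
        p = correct / len(predicted) if predicted else 0.0
        r = correct / len(gold)
        f = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f

    # Sanity check against the baseline figures of Section 6: 60 of 100
    # predictions correct, against 259 gold-standard alignments
    p, r = 60 / 100, 60 / 259
    print(round(2 * p * r / (p + r), 3))  # -> 0.334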


6 Results

Throughout our experimentation, we evaluate relative to the 100 manually sense-aligned Japanese words.

Our baseline method predicts 100 alignments (as it is guaranteed to produce a unique alignment per source-language word), of which 60 are correct. Hence, the precision is 60/100 = 0.600, the recall is 60/259 = 0.231, and the F-score is 0.334.

With the basic alignment model, the highest F-score achieved with thresholding is 0.228 at a threshold value of 0.19, well below the baseline F-score. The recall and precision values at this threshold are 0.263 and 0.202, respectively.

The basic model with TF-IDF weighting performed considerably better, scoring the highest F-score of 0.292 (recall = 0.382 and precision = 0.236) at a threshold value of 0.04, but this is still well below the baseline F-score. To test whether TF-IDF term weighting is consistently beneficial to overall alignment performance, we took the unweighted model combined with each of the proposed extensions, and compared it with the same extension but with the inclusion of TF-IDF (without lexical relations at this point). The results of these experiments can be found in Table 1. As we can see, TF-IDF weighting consistently improves alignment performance. Also note that, with the exception of simple (punctuation) normalisation, all extensions improve over the basic model both with and without TF-IDF weighting.

                Basic   Stopping   POS filtering   Lemmatisation   Stemming   Normalisation
Basic model     0.228   0.334      0.256           0.240           0.243      0.221
Basic+TF-IDF    0.292   0.335      0.295           0.344           0.330      0.288

Table 1: Best system F-score for each individual extension, using the basic model vs. the basic model with TF-IDF weighting

We extended our experiments by considering all possible combinations of 2 or more proposed extensions (excluding lexical relations for the time being) with TF-IDF weighting (see the sketch below). The purpose of this experiment is to investigate whether the proposed extensions are complementary in improving alignment performance. The 5 top-performing combinations are presented in Table 2.

Method          Precision   Recall   F-score
Baseline        0.600       0.231    0.334
WS+PF+L+N       0.298       0.494    0.372
WS+PF+L         0.326       0.428    0.370
WS+L            0.305       0.455    0.365
WS+PF+L+S+N     0.275       0.540    0.364
WS+L+S          0.301       0.455    0.363

Table 2: Top-5 combinations of extensions, excluding lexical relations (WS = word stopping, PF = POS filtering, L = lemmatisation, S = stemming, N = normalisation)

The best result achieved by combining all the proposed extensions is an F-score of 0.364, which is significantly above baseline. It is also interesting to see that not all methods are fully complementary: by excluding stemming, e.g., the system actually performs better, producing a higher F-score of 0.372.

We then experimented with the addition of lexical relations to the different combinations of extensions explored above. The 5 top-performing combinations are presented in Table 3. The best F-score of 0.408 is achieved with the combination of all the extensions proposed. When lexical relations are used exclusively, or combined with fewer than three of the proposed extensions, performance tends to decline.

In our best-performing combination, we outperform the baseline F-score by 22%. 349 alignments were returned for this F-score, of which 124 matched the gold-standard. The precision and recall scores are 0.355 and 0.478, respectively.
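The exhaustive combination sweep behind Tables 2 and 3 is straightforward to drive programmatically. A hypothetical harness, assuming an evaluate function (not shown) that builds the vectors with the given extensions enabled and returns the best thresholded F-score:

    from itertools import combinations

    # stopping, POS filtering, lemmatisation, stemming, normalisation
    EXTENSIONS = ["WS", "PF", "L", "S", "N"]

    def sweep(evaluate, top_n=5):
        """Score every combination of two or more extensions (TF-IDF always on)
        and return the top_n combinations by F-score, as in Table 2."""
        results = {}
        for k in range(2, len(EXTENSIONS) + 1):
            for combo in combinations(EXTENSIONS, k):
                results[combo] = evaluate(set(combo))
        return sorted(results.items(), key=lambda kv: kv[1], reverse=True)[:top_n]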
We carried out a more detailed analysis of the precision–recall trade-off. While we would expect precision to climb towards 1 as the threshold increases, we found that this is in fact not the case.


Method           Precision   Recall   F-score
Baseline         0.600       0.231    0.334
WS+PF+L+S+N+H    0.355       0.478    0.408
WS+PF+L+S+H      0.344       0.490    0.404
WS+PF+N+S+H      0.342       0.478    0.399
WS+L+S+H         0.361       0.440    0.396
WS+PF+S+H        0.317       0.525    0.396

Table 3: Top-5 performing combinations of extensions, including lexical relations (WS = word stopping, PF = POS filtering, L = lemmatisation, S = stemming, N = normalisation, H = hypernym expansion)

The precision peaks at 0.625 at a threshold value of 0.265. At this level, there are 10 correct alignments out of 16 alignments returned. Upon investigating the six non-matching entries, we found that they all contain similar words, but that the literal meanings of the senses are very different. Below, we present two of the six non-matching entries.

The first relates to a sense of the Japanese word shanpuu "shampoo". The definition sentences for this sense found in Lexeed are directly translated as "shampoo medicine, drug, or dose; detergent or washing material that is used to wash hair or fur". The corresponding match in WordNet is "the act of washing your hair with shampoo". We can see that there are similar terms in the two vectors, such as shampoo, washing and hair, but that the literal meanings of the two senses are quite different.

The second example is very similar to the kakkazan example presented in Section 5. One sense of sengetsu ("last month") is defined as "the previous month", and is aligned to the WordNet synset of month (WordNet does not have an entry for last month). It does not help that the hypernym of sengetsu is tsuki, which translates to "month", boosting the similarity of this alignment.

7 Discussion

In terms of F-score, the best-performing combination of extensions performed better than the baseline. However, recall seems to be the dominant factor in the F-score calculations for the proposed method. This is in sharp contrast to our baseline, where precision dominates the F-score calculation. There are several reasons for the baseline scores. First, there are 259 alignments in our gold-standard for 100 random words, corresponding to approximately 2.6 alignments per word. Given how we created our baseline, with one alignment per word, the maximum recall that the baseline can achieve is 100/259 = 0.386.

On the other hand, the first-sense basis of the baseline method leads to high precision, largely due to the design process for ontologies and dictionaries. Namely, there is usually good coverage of frequent word senses in ontologies and dictionaries, and additionally, the translations for a given word are generally selected to be highly biased towards common senses (i.e. even if a polysemous word is chosen as a translation, its predominant sense is almost always that which corresponds to the source language word, for obvious accessibility/usability reasons). For this reason, there is a very high probability that these frequent senses in each of the two languages align with each other.
8 Conclusion

In this paper, we proposed a cross-lingual sense-to-sense alignment method, based on the similarity of definition sentences as calculated via a bilingual dictionary. We explored various extensions to a simple lexical overlap method, and achieved promising results in preliminary experiments.

In future work, we plan to exploit more lexical relations, such as synonymy and hyponymy. We also plan to experiment with up-weighting alignments where both the sense pairing and the hypernym pairing match well.

Nichols et al. (2005) linked Lexeed senses to WordNet in their evaluation of ontology induction. Comparison with their method would be very interesting, and is an area for future research.

Acknowledgements

We thank Francis Bond for his insights and suggestions for our experiment and Sanae Fujita for her help with the data. We are also grateful to the anonymous reviewers for their helpful comments. This research was supported by NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation.


References

Naoki Asanoma. 2001. Alignment of ontologies: WordNet and Goi-Taikei. In Proc. of the NAACL 2001 Workshop on WordNet and Other Lexical Resources: Applications, Extensions and Customizations, pages 89–94, Pittsburgh, USA.

Timothy Baldwin, Su Nam Kim, Francis Bond, Sanae Fujita, David Martinez, and Takaaki Tanaka. To appear. MRD-based word sense disambiguation: Further #2 extending #1 Lesk. In Proc. of the Third International Joint Conference on Natural Language Processing (IJCNLP 2008), Hyderabad, India.

Jim Breen. 1995. Building an electronic Japanese-English dictionary. In Japanese Studies Association of Australia Conference.

Christiane Fellbaum, editor. 1998. WordNet: An Electronic Lexical Database. MIT Press, Cambridge, USA.

Alfio Massimiliano Gliozzo, Marcello Ranieri, and Carlo Strapparava. 2005. Crossing parallel corpora and multilingual lexical databases for WSD. In Proc. of the 6th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing-2005), pages 242–5, Mexico City, Mexico.

Kaname Kasahara, Hiroshi Sato, Francis Bond, Takaaki Tanaka, Sanae Fujita, Tomoko Kasunagi, and Shigeaki Amano. 2004. Construction of a Japanese semantic lexicon: Lexeed. In Proc. of SIG NLC-159, IPSJ, Tokyo, Japan.

Kevin Knight and Steve K. Luk. 1994. Building a large-scale knowledge base for machine translation. In Proc. of the 12th Annual Conference on Artificial Intelligence (AAAI-94), pages 773–8, Seattle, USA.

Philipp Koehn. 2005. Europarl: A parallel corpus for statistical machine translation. In Proc. of the Tenth Machine Translation Summit (MT Summit X), Phuket, Thailand.

Shari Landes, Claudia Leacock, and Randee Tengi. 1998. Building semantic concordances. In Christiane Fellbaum, editor, WordNet: An Electronic Lexical Database. MIT Press, Cambridge, USA.

Guido Minnen, John Carroll, and Darren Pearce. 2001. Applied morphological processing of English. Natural Language Engineering, 7(3):207–23.

Grace Ngai, Marine Carpuat, and Pascale Fung. 2002. Identifying concepts across languages: A first step towards a corpus-based approach to automatic ontology alignment. In Proc. of the 19th International Conference on Computational Linguistics (COLING 2002), Taipei, Taiwan.

Eric Nichols, Francis Bond, and Daniel Flickinger. 2005. Robust ontology acquisition from machine-readable dictionaries. In Proc. of the 19th International Joint Conference on Artificial Intelligence (IJCAI-2005), pages 1111–6, Edinburgh, UK.

Kyonghee Paik, Francis Bond, and Shirai Satoshi. 2001. Using multiple pivots to align Korean and Japanese lexical resources. In Proc. of the NLPRS-2001 Workshop on Language Resources in Asia, pages 63–70, Tokyo, Japan.

Gerald Salton. 1971. The SMART Retrieval System: Experiments in Automatic Document Processing. Prentice-Hall.

Satoshi Shirai, Kazuhide Yamamoto, and Kyonghee Paik. 2001. Overlapping constraints of two step selection to generate a transfer dictionary. In ICSP-2001, pages 731–736.

Sofia Stamou, Kemal Oflazer, Karel Pala, Dimitris Christoudoulakis, Dan Cristea, Dan Tufiş, Svetla Koeva, George Totkov, Dominique Dutoit, and Maria Grigoriadou. 2002. BALKANET: A multilingual semantic network for the Balkan languages. In Proc. of the International Wordnet Conference, pages 12–4, Mysore, India.

Takaaki Tanaka, Francis Bond, and Sanae Fujita. 2006. The Hinoki sensebank: a large-scale word sense tagged corpus of Japanese. In Proceedings of the Workshop on Frontiers in Linguistically Annotated Corpora 2006, pages 62–9, Sydney, Australia.

Piek Vossen, editor. 1998. EuroWordNet: A Multilingual Database with Lexical Semantic Networks. Kluwer Academic Publishers, Dordrecht, Netherlands.
