Dictionary Alignment for Context-sensitive Word Glossing


Word: ryuu
POS: noun
Sense 1
    Lexical-type: noun-lex
    Definition: An imaginary animal. Dragons are like enormous snakes with 4 legs and horns. Dragons live in the sea, lakes and ponds, and are said to form clouds and cause rain when they fly up into the sky.
    Hypernym: ANIMAL
Senses 2, 3, 4: [ ... ]
Sense 5
    Lexical-type: noun-lex
    Definition: In shogi, a promoted rook.
    Domain: SHOGI

Figure 2: A partial view of the Lexeed entry for [ryuu] (with English glosses)

3.1 The Lexeed semantic database of Japanese

The Lexeed Semantic Database of Japanese is a machine-readable dictionary consisting of the most commonly-used words in Japanese (Kasahara et al., 2004). Lexeed contains 28,000 words and a total of 46,437 senses. Associated with each sense is a set of definition sentences, constructed entirely using the closed vocabulary of the 28,000 words found in Lexeed, such that 60% of the 28,000 words occur in the definition sentences (Tanaka et al., 2006). In addition to the definition sentences, Lexeed also contains part of speech (POS), lexical relations between the senses (if any) and an example sentence, also based on the closed vocabulary of 28,000 words. All content words in the definition and example sentences are sense annotated.

Automatic ontology acquisition methods have been applied to Lexeed to induce lexical relations between sense pairs, based on the sense-annotated definition sentences (Nichols et al., 2005) and comparison with both the Goi-Taikei thesaurus and WordNet 2.0.

An example Lexeed entry for the word ryuu is given in Figure 2.

3.2 EDICT

EDICT is a free machine-readable Japanese-to-English dictionary (Breen, 1995). The project is highly active and has been extended to other target languages such as German, French and Russian. EDICT contains more than 170,000 Japanese entries, each of which is associated with one or more English glosses. It also optionally contains information such as the pronunciation of the entry, POS, and domain of application.

3.3 WordNet

WordNet is an electronic semantic lexical database of English (Fellbaum, 1998). It is made up of more than 100,000 synsets, with each synset representing a group of synonyms. Its entries are categorised into four POS categories: nouns, verbs, adjectives and adverbs. Each POS is described in a discrete lexical network.

Every synset in WordNet has a definition sentence, and sample sentence(s) are provided for most of the synsets; in combination, these are termed the WordNet gloss. Semantic relations connect one synset to another, and include relation types such as hypernymy, hyponymy, antonymy and meronymy. The majority of these relations do not cross POS boundaries.

Since we only experiment with hypernyms (and, symmetrically, hyponyms), we provide a simple review of this relation. A synset A is a hypernym of a synset B iff B is a kind of A. For example, vehicle is a hypernym of car, while perceive is a hypernym of hear, sight, touch, smell, taste.⁵

⁵ Strictly speaking, hear, etc. are troponyms of perceive, i.e. they denote specific ways of perceiving. Because WordNet
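
The hypernym relation reviewed in Section 3.3, and its symmetry with hyponymy, can be illustrated with a minimal sketch. The edge list below is hand-built from the two examples in the text. It is not WordNet data and does not use any WordNet API. The names `HYPERNYM_EDGES`, `build_relations` and `is_hypernym` are hypothetical, introduced only for this illustration, and the check covers direct links only, not inherited (transitive) ones.

```python
from collections import defaultdict

# Toy (hypernym, hyponym) pairs copied from the examples in the text.
# Assumption: a flat, hand-built edge list stands in for real synsets.
HYPERNYM_EDGES = [
    ("vehicle", "car"),
    ("perceive", "hear"),
    ("perceive", "sight"),
    ("perceive", "touch"),
    ("perceive", "smell"),
    ("perceive", "taste"),
]


def build_relations(edges):
    """Index each edge in both directions: child -> parents, parent -> children."""
    hypernyms = defaultdict(set)
    hyponyms = defaultdict(set)
    for parent, child in edges:
        hypernyms[child].add(parent)
        hyponyms[parent].add(child)
    return hypernyms, hyponyms


def is_hypernym(a, b, hypernyms):
    """Return True if a is a direct hypernym of b, i.e. b is a kind of a."""
    return a in hypernyms.get(b, set())


hypernyms, hyponyms = build_relations(HYPERNYM_EDGES)

print(is_hypernym("vehicle", "car", hypernyms))  # vehicle is a hypernym of car
print(is_hypernym("car", "vehicle", hypernyms))  # the relation is not reversible
print(sorted(hyponyms["perceive"]))              # hyponyms listed for perceive
```

Storing both directions makes the symmetry explicit: A appears among the hypernyms of B exactly when B appears among the hyponyms of A.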
