06.08.2015 Views

A Wordnet from the Ground Up

A Wordnet from the Ground Up - School of Information Technology ...



Chapter 3. Discovering Semantic Relatedness

1. mama (mam = mom, case=gen, num=pl),
2. mieć (mam = have (possess), person=1st, num=sg, tense=present),
3. mamić (mam = delude, imperative).

In order to disambiguate base form assignment, we applied the morphosyntactic tagger TaKIPI (Piasecki and Godlewski, 2006). The accuracy of the base form identification by TaKIPI is 99.31% (Piasecki and Radziszewski, 2009), as measured in relation to the manually disambiguated part of IPIC.

MSR extraction methods based on lexico-syntactic constraints assume that the corpus has been preprocessed by a parser. There is no available parser or shallow parser for Polish which could be used for this task: Swigra (Woliński, 2005) is a deep parser that produces many possible detailed analyses for a sentence, the dependency parser of Obrębski (2002) also returns several analyses for a sentence, and the Poleng parser (Graliński, 2005) is a commercial product, whose version available for the plWordNet project caused problems with interpreting the output format.¹⁰

Faced with the lack of a suitable parser, we considered the morphological information encoded by Polish word forms. It has turned out to be rich enough for use in a tool to replace a parser. Lexico-morphosyntactic constraints as context descriptors help identify semantically relevant associations between a target LU and other LUs in the lexicon. In Polish, associations among language expressions very often depend on the morphosyntactic characteristics of their constituents, such as gender/number/case agreement between an adjective and a head noun.
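The agreement test mentioned above can be illustrated with a minimal Python sketch. It is not JOSKIPI and not the authors' implementation: the `Token` class, the `agrees` and `adjective_noun_pairs` functions, the window size and the simplified tag values (loosely modelled on IPI PAN-style tags) are all assumptions made for illustration. The sketch pairs an adjective with a nearby noun only when gender, number and case match, regardless of which of the two comes first.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Token:
    """One tagged word form (hypothetical, simplified tag set)."""
    form: str
    pos: str      # "adj" or "noun" in this sketch
    gender: str   # e.g. "m3", "f"
    number: str   # "sg" or "pl"
    case: str     # e.g. "nom", "acc"


def agrees(a: Token, b: Token) -> bool:
    """True when both tokens share gender, number and case."""
    return (a.gender, a.number, a.case) == (b.gender, b.number, b.case)


def adjective_noun_pairs(sentence: List[Token], window: int = 2) -> List[Tuple[str, str]]:
    """Collect (adjective, noun) pairs that agree within a small window.

    Word order is ignored: the noun may precede or follow the adjective.
    """
    pairs = []
    for i, tok in enumerate(sentence):
        if tok.pos != "adj":
            continue
        lo = max(0, i - window)
        hi = min(len(sentence), i + window + 1)
        for j in range(lo, hi):
            other = sentence[j]
            if j != i and other.pos == "noun" and agrees(tok, other):
                pairs.append((tok.form, other.form))
    return pairs


# "wielki dom" and "dom wielki" (big house): both orders agree.
wielki = Token("wielki", "adj", "m3", "sg", "nom")
dom = Token("dom", "noun", "m3", "sg", "nom")
# "wielką" is feminine accusative, so it does not agree with "dom".
wielka_acc = Token("wielką", "adj", "f", "sg", "acc")

print(adjective_noun_pairs([wielki, dom]))
print(adjective_noun_pairs([dom, wielki]))
print(adjective_noun_pairs([wielka_acc, dom]))
```

A real constraint would also have to handle ambiguous tags and intervening material; the fixed window here is only a stand-in for such conditions.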
In an inflectional language like Polish, the morphosyntactic description of word forms (rather than word order) delivers most of the structural information. For example, an adjective and a noun which are constituents of the same noun phrase can occur in both possible orders,¹¹ but the agreement is necessary. Morphosyntactic associations are also simpler to recognise, since this requires only a tagger and a constraint representation formalism. Morphosyntactic taggers have been created for most European languages; in our experience, a constraint language interpreter can be constructed for a given language in a few person-weeks.

The JOSKIPI language, originally introduced as the language of tagging rules in TaKIPI, was used to implement morphosyntactic constraints. Selected elements of JOSKIPI will be presented as we discuss the constraint examples later in this section. For a detailed description, see (Piasecki, 2006; Piasecki and Radziszewski, 2009). In general, the expressions are used to recognise potential associations between a target LU occurrence and occurrences of other LUs in the given sentence. Each constraint is based on a template that has a marked place for an LU – a lexical element. A set of concrete constraints is generated from a list of lexical elements predefined for the

¹⁰ It was designed as an internal module of a Machine Translation system.
¹¹ Except some fixed collocations.
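The idea of a template with a marked place for a lexical element, instantiated once per element of a predefined list, might be sketched in Python as follows. This is an illustrative assumption, not JOSKIPI syntax and not the constraints used in the book: the `Tok` tuple, the `instantiate`, `generate_constraints` and `describe` functions, the window size and the particular condition (an agreeing adjective with a given base form near the target) are all hypothetical.

```python
from typing import Callable, Dict, List, NamedTuple


class Tok(NamedTuple):
    """One tagged token (hypothetical, simplified)."""
    base: str  # base form chosen by the tagger
    pos: str   # simplified part of speech, e.g. "adj", "noun"
    agr: str   # gender:number:case bundled together, e.g. "m3:sg:nom"


# A concrete constraint takes a sentence and the index of the target LU.
Constraint = Callable[[List[Tok], int], bool]


def instantiate(lexical_element: str, window: int = 2) -> Constraint:
    """Fill the template's marked place with one lexical element.

    Template (assumed): an adjective with the given base form occurs
    within `window` tokens of the target and agrees with it.
    """
    def constraint(sentence: List[Tok], target: int) -> bool:
        lo = max(0, target - window)
        hi = min(len(sentence), target + window + 1)
        return any(
            j != target
            and sentence[j].pos == "adj"
            and sentence[j].base == lexical_element
            and sentence[j].agr == sentence[target].agr
            for j in range(lo, hi)
        )
    return constraint


def generate_constraints(lexical_elements: List[str]) -> Dict[str, Constraint]:
    """Generate one concrete constraint per predefined lexical element."""
    return {le: instantiate(le) for le in lexical_elements}


def describe(sentence: List[Tok], target: int,
             constraints: Dict[str, Constraint]) -> Dict[str, int]:
    """Record, for one target occurrence, which constraints are satisfied."""
    return {le: int(check(sentence, target)) for le, check in constraints.items()}


constraints = generate_constraints(["wielki", "mały"])
sentence = [Tok("wielki", "adj", "m3:sg:nom"), Tok("dom", "noun", "m3:sg:nom")]
print(describe(sentence, 1, constraints))
```

Representing each concrete constraint as a closure keeps the template in one place, so extending the list of lexical elements needs no new code.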
