06.08.2015 Views

A Wordnet from the Ground Up


104 Chapter 4. Extracting Relation Instances

target nominal LUs not only with some constraints but also with the pairs of LUs in focus. That is why we used here the same set of 13285 nominal lemmas which were selected for the construction of nominal MSR and the expansion of the core plWordNet (Section 3.4.5). From the set, we generated all possible pairs. We also reapplied the mechanism of co-incidence matrix construction. Target LUs were assigned to rows, and patterns with the position NLU2 instantiated to subsequent target LUs were assigned to columns. The patterns were run with position 0 representing NLU1³.

Given these assumptions, there is no need to test the presence of NLU1 in the IInne and TakichJak code (Figures 4.1–4.2). We refer the reader to Section 3.4.3 for the details of JOSKIPI. IInne is implemented in two symmetrical parts joined by or for two configurations of the hyponym (NLU1) and hypernym (NLU2). The matrix construction requires that we start with the hyponym in position 0. In the first part, we first test if the potential NLU1 is nominative, then look to the right (till the end of the sentence) for the first verb word form or the first nominal LU and record its position in variable $X. We test if it is a form of the verb być (to be); any other verb or noun means that the sentence does not match the pattern.
We look further to the right for the first verb or the first nominal LU, or a preposition (prep) that requires the instrumental case. The latter is necessary, because NLU2 in the pattern is only identified by the case value induced by the verb być. The token at position $Y is compared with the base form with which the pattern was instantiated⁴. We also test its case and number.

In the pattern TakichJak in Figure 4.2, the iteration goes in the opposite direction. Hyponyms now follow the hypernym, and we wanted to keep the same 〈hyponym, hypernym〉 order of the extracted LU pairs across all the patterns. After the case of NLU1 has been tested, we look to the left till the beginning of the sentence for the sequence taki jak (such as). Next, we test the tokens between 0 and $+2T (the position after jak) for the presence of only LUs of the specified grammatical classes plus the specified punctuation marks and conjunctions; this signals a coordinate sequence of noun phrases. Finally, NLU2 is sought further to the left, and tokens between it and taki are tested. Only modifiers are accepted there, including nouns and pronouns in the genitive case.

The implementation of the other three patterns is similar.

The patterns IInne, WTym and TakichJak are structurally very similar: a hypernym and a list of hyponyms.
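The left-to-right test described above (nominative NLU1, a form of być, then NLU2 reached before any other verb, noun or instrumental-governing preposition) can be illustrated with a minimal Python sketch. It is not JOSKIPI and not the code of Figure 4.1: the Token fields, the tag names ("noun", "verb", "prep", "nom", "inst") and the function names are hypothetical, and the assumption that NLU2 must be instrumental and agree in number with NLU1 is ours, since the text only says that case and number are tested.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Token:
    base: str                      # base form (lemma)
    pos: str                       # hypothetical tags: "noun", "verb", "prep", ...
    case: Optional[str] = None     # e.g. "nom", "inst"
    number: Optional[str] = None   # "sg" or "pl"
    governs: Optional[str] = None  # case required by a preposition, if any


def first_right(sent: List[Token], start: int,
                pred: Callable[[Token], bool]) -> Optional[int]:
    """Index of the first token at or after `start` satisfying `pred`."""
    for i in range(start, len(sent)):
        if pred(sent[i]):
            return i
    return None


def matches_first_part(sent: List[Token], pos0: int, nlu2_base: str) -> bool:
    """Check one configuration: hyponym (NLU1) at pos0, hypernym (NLU2) to the right."""
    nlu1 = sent[pos0]
    # Position 0 is already known to hold a target LU; only its case is tested.
    if nlu1.case != "nom":
        return False
    # $X: first verb form or nominal LU to the right; it must be a form of "być".
    x = first_right(sent, pos0 + 1, lambda t: t.pos in ("verb", "noun"))
    if x is None or sent[x].pos != "verb" or sent[x].base != "być":
        return False
    # $Y: first verb, nominal LU, or preposition requiring the instrumental case.
    y = first_right(
        sent, x + 1,
        lambda t: t.pos in ("verb", "noun")
        or (t.pos == "prep" and t.governs == "inst"),
    )
    if y is None:
        return False
    cand = sent[y]
    # Assumption: NLU2 is instrumental and agrees in number with NLU1.
    return (cand.pos == "noun" and cand.base == nlu2_base
            and cand.case == "inst" and cand.number == nlu1.number)
```

In this sketch a preposition that governs the instrumental case stops the search, so a noun whose instrumental form is induced by the preposition rather than by być is never accepted as NLU2.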
Also, a preliminary evaluation on a part of IPIC showed

³ Multiword LUs were recognised during preprocessing and folded into a one-token representation with the attribute and root set to the values proper for the whole LU. During matrix construction, each target LU occupies exactly one token in the preprocessed representation of the corpus (Broda and Piasecki, 2008b). Recognition of multiword LUs was limited to target LUs (all parts of speech) due to the labour intensity of their syntactic description.

⁴ Technically, each column in the matrix is assigned its own copy of the pattern instantiated to the appropriate nominal LU as NLU2.
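The matrix layout described on this page (target LUs as rows, one copy of the pattern per target LU instantiated as NLU2 as columns, each pattern run with position 0 on NLU1) might be sketched as follows. This is only an illustration under simplifying assumptions: sentences are plain lists of base forms, the matrix is stored sparsely as a Counter keyed by (NLU1, NLU2), and toy_matcher is a hypothetical stand-in for a real lexico-syntactic pattern.

```python
from collections import Counter
from typing import Callable, Iterable, List, Sequence, Tuple

Sentence = Sequence[str]                       # a sentence as a list of base forms
Matcher = Callable[[Sentence, int, str], bool]


def build_matrix(corpus: Iterable[Sentence], targets: List[str],
                 matcher: Matcher) -> Counter:
    """Count pattern matches per (row LU, column LU) pair.

    Rows are target LUs found at position 0; each column corresponds to the
    pattern instantiated with one target LU as NLU2.
    """
    target_set = set(targets)
    counts: Counter = Counter()
    for sent in corpus:
        for pos0, lemma in enumerate(sent):
            if lemma not in target_set:
                continue                       # only target LUs occupy rows
            for nlu2 in targets:               # one pattern copy per column
                if matcher(sent, pos0, nlu2):
                    counts[(lemma, nlu2)] += 1
    return counts


def toy_matcher(sent: Sentence, pos0: int, nlu2: str) -> bool:
    """Hypothetical stand-in pattern: NLU1 immediately followed by 'być' and NLU2."""
    return (pos0 + 2 < len(sent)
            and sent[pos0 + 1] == "być"
            and sent[pos0 + 2] == nlu2)
```

Because every target LU is tried in every column, the cost grows with the square of the number of target LUs per candidate position; a sparse structure keeps memory proportional to the pairs actually observed.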
