06.08.2015 Views

A Wordnet from the Ground Up

A Wordnet from the Ground Up - School of Information Technology ...

4.5. Hybrid Combinations

The next group of attributes is meant to tell the classifier directly about the possibility of hypernymy or co-hyponymy between a and b. The co-ordination attribute (2) is based on the lexico-morphosyntactic constraint NcC used for the MSR_RWF extraction (Section 3.4.3, page 67). NcC looks for occurrences of syntactic co-ordination of a and b as constituents of the same composite noun phrase. It recognises only a limited set of conjunctions: ani (neither, nor), albo (or), czy (whether), i (and), lub (or), oraz (and). All these were manually identified as marking semantic coordination of the linked nominal LUs, possibly indirect co-hyponyms (“coordinated terms” in (Snow et al., 2005)).

During matrix construction, occurrences of NcC are recorded for a LU x with all nominal lexical elements, very often more than 100000. Here, we pay attention only to nominal LUs in the classified pairs – potentially all LUs described by the given MSR. The value of (2) is the frequency with which the constraint is met for a and b co-occurring in the same sentence. [11] We assumed that co-ordination is more frequent for potential co-hyponyms and hypernyms.

A manual investigation of instance pairs of hypernyms in the IPI PAN Corpus of Polish [12] [IPIC] (Przepiórkowski, 2004) showed that, surprisingly, they often occur as the noun phrase head and its noun modifier in the genitive case. Even more frequent is meronymy expressed by the genitive modification. The classifier receives information on the frequency of this syntactic relation in both directions, when a is modified (3) and is the modifier (4).
Both attributes are based on the same lexico-morphosyntactic constraint NmgC used for MSR extraction, presented in Figure 3.6 and discussed in Section 3.4.3. NmgC is based more on the relative positions of both nominal LUs than on agreement. Just as attribute (2), NmgC was used only to detect associations between LUs described by the MSR.

The idea of the precision of repeating b’s features by a’s features, used in attributes 5 and 7, is modelled after the MSR in (Weeds and Weir, 2005). We want to analyse the additive precision with which, using a’s features, we refer to (“retrieve”) b’s features. The precision is defined as follows:

\[
P_{add}(a, b) = \frac{\sum_{i \in F(a) \cap F(b)} M[a, i]}{\sum_{j \in F(a)} M[a, j]} \tag{4.10}
\]

• F(x) is the set of features occurring frequently enough with x, according to a test of statistical significance, e.g., a t-score test,
• M is a co-incidence matrix that represents the given set of features; for attribute

[11] The corpus is processed with the granularity of sentences – identified by a simple sentencer.
[12] Especially in the part called HC (Section 4.1) – sentences that contain pairs of known hypernyms. HC has been extracted to facilitate manual construction of lexico-syntactic patterns.
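To make the additive precision of (4.10) concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that the co-incidence matrix M is stored as a nested dictionary and that F(x) can be approximated by a plain weight threshold standing in for the significance test (the text suggests, e.g., a t-score test). The names `significant_features`, `additive_precision` and `min_weight`, as well as the toy LUs, features and weights, are hypothetical and serve only as illustration.

```python
from typing import Dict, Set

# Hypothetical layout: M[lu][feature] -> co-incidence weight.
Matrix = Dict[str, Dict[str, float]]


def significant_features(M: Matrix, x: str, min_weight: float = 1.0) -> Set[str]:
    """Stand-in for F(x).

    Assumption: a simple threshold replaces the statistical
    significance test (e.g., a t-score test) named in the text.
    """
    return {f for f, w in M.get(x, {}).items() if w >= min_weight}


def additive_precision(M: Matrix, a: str, b: str, min_weight: float = 1.0) -> float:
    """P_add(a, b): the share of a's feature weight that falls on
    features also belonging to F(b), following equation (4.10)."""
    fa = significant_features(M, a, min_weight)
    fb = significant_features(M, b, min_weight)
    denominator = sum(M[a][j] for j in fa)
    if denominator == 0:
        # a has no significant features, so nothing can be "retrieved".
        return 0.0
    numerator = sum(M[a][i] for i in fa & fb)
    return numerator / denominator


# Toy matrix with invented weights, for illustration only.
M: Matrix = {
    "dog": {"barks": 4.0, "runs": 2.0, "eats": 2.0},
    "animal": {"runs": 3.0, "eats": 5.0, "lives": 6.0},
}

print(additive_precision(M, "dog", "animal"))  # 0.5
print(additive_precision(M, "animal", "dog"))
```

The measure is asymmetric: in the toy data, half of the feature weight of "dog" falls on features shared with "animal", while the share computed in the opposite direction differs. This asymmetry is what makes the value potentially informative about the direction of a relation between a and b.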
