06.08.2015 Views

A Wordnet from the Ground Up


    or(
      and(
        equal(cas[0],{nom}),
        rlook(1,end,$X,
          inter(flex[$X],{adjectival participles, noun, pronouns, verbal grammatical classes})),
        equal(base[$X],{"być"}),
        rlook($+1X,end,$Y,
          or(
            inter(flex[$Y],{adjectival passive participle, noun, pronouns, verbal grammatical classes}),
            and(equal(flex[$Y],{prep}), equal(cas[$Y],{inst})))),
        inter(flex[$Y],{subst,ger,depr}),
        equal(base[$Y],{"NP2"}),
        equal(cas[$Y],{inst}),
        equal(nmb[$Y],nmb[0])
      ),
      a symmetrical condition for the right context
    )

Figure 4.1: The essentials of the JestInst pattern implementation in JOSKIPI

that they have very similar accuracy. That is why we decided to merge them into a complex pattern that combines the constraints using the or operator. We will refer to this pattern as mIInne (see, for example, Table 4.1).

4.2 Benefits of Handwritten Patterns for Wordnet Expansion

We ran experiments on the extraction of hypernymic pairs on the same three corpora as those used for MSR extraction (Section 3.4.5): the IPI PAN Corpus [IPIC] (≈ 254 million tokens) (Przepiórkowski, 2004), the Rzeczpospolita corpus [RzCorp] (≈ 113 million tokens) (Rzeczpospolita, 2008), and a corpus of large Polish texts from the Internet [WebCorp] (≈ 214 million tokens). Table 4.1 presents detailed results for three patterns: JestInst, NomToNom and mIInne.

We assessed the accuracy manually on randomly selected samples. As in other manual evaluations (for example, Section 3.4.5), we determined sample sizes following the method discussed in (Israel, 1992), aiming for the 95% confidence level on the whole population. We used a program named Sprawdzacz (Kurc, 2008) that facilitates manual evaluation of the extracted lexico-semantic relation instances.⁵

⁵ We thank Roman Kurc for his great help with the whole plWordNet project.
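The sample-size step mentioned above can be illustrated with a short sketch. It uses two formulas commonly presented in discussions of sample-size determination such as (Israel, 1992): Cochran's formula with a finite population correction, and Yamane's simplified formula. The text states only the 95% confidence level, so everything else here is an assumption made for illustration: the 5% margin of error, the maximum-variability proportion p = 0.5, the function names, and the population size of 10,000 extracted pairs. The text does not say which of the two formulas was actually applied.

```python
import math


def cochran_sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Sample size for estimating a proportion in a finite population.

    z=1.96 corresponds to the 95% confidence level; margin and p are
    assumed values, not taken from the text.
    """
    # Cochran's formula for a very large population.
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Finite population correction.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


def yamane_sample_size(population, margin=0.05):
    """Yamane's simplified formula (assumes 95% confidence and p = 0.5)."""
    return math.ceil(population / (1 + population * margin ** 2))


if __name__ == "__main__":
    extracted_pairs = 10_000  # hypothetical number of extracted pairs
    print(cochran_sample_size(extracted_pairs))
    print(yamane_sample_size(extracted_pairs))
```

Under these assumptions the required sample grows very slowly with population size, which is why a manual evaluation of a few hundred randomly drawn pairs can be used to estimate the accuracy of a much larger set of extracted pairs.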
