06.08.2015 Views

A Wordnet from the Ground Up

A Wordnet from the Ground Up - School of Information Technology ...


Chapter 3. Discovering Semantic Relatedness

Automatic differentiation between words synonymous and not synonymous with a given LU is a natural application for an MSR, especially in the context of generation of suggestions for a linguist. In Latent Semantic Analysis [LSA] (Landauer and Dumais, 1997) the MSR constructed using a statistical analysis of a corpus (cf. Section 3.4.2) was used to make decisions in a synonymy test, a component of the Test of English as a Foreign Language [TOEFL]. This gave 64.4% of hits. Turney (2001) reported 73.75% hits, and Turney et al. (2003) 97.5% hits; the latter practically solved the TOEFL synonymy problem. TOEFL is focused on humans, a big advantage for applications in MSR evaluation. On the other hand, it is manually constructed, hence its main drawbacks: limited size and fixed orientation on synonymy.

Freitag et al. (2005) proposed a WordNet-Based Synonymy Test [WBST], which seems to offer an interesting response to the limitations of TOEFL. WBST has been based on the use of PWN to generate “a large set of questions identical in format to those in the TOEFL”. WBST is discussed in detail in Section 3.3.1, but two of its properties are worth emphasising now. First, it is larger and broader than TOEFL because it is automatically generated from a very large manually constructed resource. Second, with a change in the way of selecting question-answer pairs, a WBST-like test can evolve from a synonymy test to a test oriented toward wordnet relations or in the sense of (Mohammad and Hirst, 2006).

The best reported result for English nouns is 75.8% (Freitag et al., 2005). A slightly modified WBST was used to evaluate an MSR for Polish nouns (Piasecki et al., 2007a) with the result of 86.09%.

The evaluation of an MSR via a synonymy test shows the ability of the MSR to distinguish synonyms from non-synonyms. Since the MSR is the centrepiece of the application, the achieved results can be directly attributed to it. There was, however, a problem: WBST appeared to be too easy, as we show in Section 3.3.1. It is oriented toward testing the main distinction (closely semantically related versus unrelated) because the incorrect answers are selected randomly and on average they are semantically unrelated to the question and the answer. The usefulness of WBST is therefore limited with respect to its use in the development of more sophisticated MSRs focused on semantic similarity and wordnet relations.

In view of these findings, we have explored the possibility of generating more demanding automatic methods of MSR assessment, following the general idea of WBST. We proposed an Enhanced WBST [EWBST] which is precisely a template of WBST-like evaluation methods parameterised by the way in which detractors, i.e. false answers, are selected. We wanted its results to be easily interpreted by people and its feasibility tested on people. We also expected that it would pick the MSR that is a better tool for the recognition of lexico-semantic relations between LUs.
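
The idea of a WBST-like test whose difficulty depends on how detractors are chosen can be illustrated with a minimal Python sketch. It is not the procedure of Freitag et al. (2005) or the EWBST defined later in the chapter. The function names, the three-detractor question format, the toy vocabulary, the toy similarity scores and the "related pool" selector are all assumptions made for illustration only.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# A question: (question word, correct answer, detractors). Hypothetical format.
Question = Tuple[str, str, List[str]]
MSR = Callable[[str, str], float]
Selector = Callable[[str, str, Sequence[str], random.Random], List[str]]


def random_detractors(question: str, answer: str,
                      vocabulary: Sequence[str], rng: random.Random) -> List[str]:
    """WBST-like choice: three detractors drawn at random from the vocabulary."""
    pool = [w for w in vocabulary if w not in (question, answer)]
    return rng.sample(pool, 3)


def pool_detractors(pools: Dict[str, List[str]]) -> Selector:
    """Hypothetical 'harder' choice: detractors come from a pool of words
    assumed to be semantically close to the question word."""
    def select(question: str, answer: str,
               vocabulary: Sequence[str], rng: random.Random) -> List[str]:
        pool = [w for w in pools[question] if w != answer]
        return rng.sample(pool, 3)
    return select


def build_test(pairs: Sequence[Tuple[str, str]], vocabulary: Sequence[str],
               select: Selector, seed: int = 0) -> List[Question]:
    """Turn (word, synonym) pairs into questions using the given selector."""
    rng = random.Random(seed)
    return [(q, a, select(q, a, vocabulary, rng)) for q, a in pairs]


def accuracy(questions: Sequence[Question], msr: MSR) -> float:
    """Fraction of questions where the answer scores strictly above every detractor."""
    hits = sum(
        all(msr(q, a) > msr(q, d) for d in detractors)
        for q, a, detractors in questions
    )
    return hits / len(questions)


# Toy MSR: made-up symmetric scores; unlisted pairs get a low default score.
toy_scores = {
    frozenset(("car", "automobile")): 0.9,
    frozenset(("house", "home")): 0.8,
    frozenset(("car", "truck")): 0.95,  # a close non-synonym that fools the toy MSR
}


def toy_msr(a: str, b: str) -> float:
    return toy_scores.get(frozenset((a, b)), 0.1)


vocabulary = ["car", "automobile", "house", "home", "river", "idea", "spoon", "cloud"]
pairs = [("car", "automobile"), ("house", "home")]
related = {"car": ["truck", "van", "bus"], "house": ["hut", "barn", "shed"]}

easy = build_test(pairs, vocabulary, random_detractors)
hard = build_test(pairs, vocabulary, pool_detractors(related))

print("random detractors:", accuracy(easy, toy_msr))
print("related detractors:", accuracy(hard, toy_msr))
```

In this toy setting the same MSR answers every randomly generated question correctly but fails on one of the two questions whose detractors are close to the question word. Passing the selector as a parameter is the design choice that mirrors the "template" view of the test: the scoring stays fixed while the detractor selection varies.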
