05.03.2013 Views

PhD thesis - School of Informatics - University of Edinburgh

PhD thesis - School of Informatics - University of Edinburgh

PhD thesis - School of Informatics - University of Edinburgh

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

Chapter 5. Parsing English Inclusions 126<br />

FOM<br />

PN<br />

FOM<br />

. . . . . .<br />

(a) Whenever a<br />

FOM is encoun-<br />

tered...<br />

FOM<br />

PN<br />

FP<br />

FOM<br />

. . . . . .<br />

(b) ...a new FP<br />

category is cre-<br />

Figure 5.5: Tree transformation employed in the inclusion entity parser.<br />

<strong>of</strong> the five held-out test sets. For each experiment, parsing performance is reported<br />

in terms <strong>of</strong> the standard PARSEVAL scores (Black et al., 1991), including coverage<br />

(Cov), labelled precision (P) and recall (R) and F-score, the average number <strong>of</strong> crossing<br />

brackets (AvgCB), and the percentage <strong>of</strong> sentences parsed with zero and with two or<br />

fewer crossing brackets (0CB and ≤2CB). In addition, dependency accuracy (Dep) is<br />

also reported. Dependency accuracy is calculated by means <strong>of</strong> the approach described<br />

in Lin (1995), using the head-picking method employed by Dubey (2005a). The la-<br />

belled bracketing figures (P, R and F) and the dependency score are calculated on all<br />

sentences, with those which are out-<strong>of</strong>-coverage getting zero counts. The crossing<br />

bracket scores are calculated only on those sentences which are successfully parsed.<br />

Stratified shuffling is used to determine statistical difference between precision and<br />

recall values <strong>of</strong> different runs. 4 In particular, statistical difference is determined over<br />

the baseline and the perfect tagging model runs for both the inclusion and the random<br />

test sets. In order to differentiate between the different tests, Table 5.2 lists a set <strong>of</strong><br />

diacritics used to indicate a given (in)significance.<br />

4 This approach to statistical testing is described in detail at: http://www.cis.upenn.edu/<br />

˜dbikel/s<strong>of</strong>tware.html<br />

ated

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!