Efficient Combinatory Categorial Grammar Parsing

Bojan Djordjevic and James R. Curran

School of Information Technologies

University of Sydney

NSW 2006, Australia

{bojan,james}@it.usyd.edu.au

Abstract

Efficient wide-coverage parsing is integral to large-scale NLP applications. Unfortunately, parsers for linguistically motivated formalisms, e.g. HPSG and TAG, are often too inefficient for these applications.

This paper describes two modifications to the standard CKY chart parsing algorithm used in the Clark and Curran (2006) Combinatory Categorial Grammar (CCG) parser. The first modification extends the tight integration of the supertagger and parser, so that individual supertags can be added to the chart, which is then repaired rather than rebuilt. The second modification adds constraints to the chart that restrict which constituents can combine.

Parsing speed is improved by 30–35% without a significant accuracy penalty and a small increase in coverage when both of these modifications are used.

1 Introduction

Parsing is the process of determining the syntactic structure of a sentence. It is an integral part of the deep semantic analysis that any sophisticated Natural Language Processing (NLP) system, such as Question Answering and Information Extraction systems, must perform.

The sentences Bob killed Alice, Alice was killed by Bob and Bob was the man who killed Alice convey the same information. If we treat the sentence as a bag or sequence of words by assuming limited structure, the sentences appear to be very different. These examples demonstrate that full parsing is necessary for accurate semantic interpretation. Further, sophisticated linguistic analysis capable of modelling a wider range of phenomena should give us the most information.

Unfortunately, parsing is very inefficient because of the large degree of ambiguity present in natural language. This is particularly true for wide-coverage grammars in linguistically expressive formalisms, especially those automatically extracted from a treebank.

Many NLP systems use shallow parsing because full parsing is too slow (Grishman, 1997). To improve the approximate structure identified by shallow parsers, many systems use domain-specific knowledge to extract dependencies (Grishman, 1997; Cole et al., 1997). Ciravegna et al. (1997) show that the accuracy can be improved by using a better parser. The ability of NLP systems to extract useful and correct information could therefore be improved substantially if the speed of full parsing was acceptable.

The C&C CCG parser (Clark and Curran, 2006) is the fastest linguistically motivated parser in the literature, but it is still limited to about 25 sentences per second on commodity hardware.

This paper describes two modifications to the C&C parser that significantly improve parsing efficiency without reducing accuracy or coverage. The first involves chart repair, where the CKY chart is repaired when new categories are added, instead of rebuilt from scratch. This allows an even tighter integration of the parser and supertagger (described below) which results in an 11% speed improvement over the original parser.

The second modification involves parsing with constraints, that is, requiring certain spans to be constituents. This reduces the search space considerably by eliminating a large number of constituents that cross the boundary of these spans. The best set of constraints results in a 10% improvement over the original parser. These constraints are also useful for other tasks. Finally, when both chart repair and constraints are used, a 30–35% speed improvement is achieved while coverage increases and the accuracy is unchanged.
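The pruning effect of span constraints can be illustrated with a minimal sketch. This is not the C&C parser's implementation: the names `crosses` and `allowed_spans`, and the half-open `(start, end)` span representation, are assumptions made for illustration only. The sketch enumerates every chart cell for a sentence of `n` words and discards those whose span crosses the boundary of a required constituent.

```python
from typing import Iterable, List, Tuple

# A span covers words start .. end-1 (half-open interval); hypothetical representation.
Span = Tuple[int, int]


def crosses(candidate: Span, constraint: Span) -> bool:
    """Return True if the spans overlap without one containing the other."""
    (i, j), (a, b) = candidate, constraint
    return (i < a < j < b) or (a < i < b < j)


def allowed_spans(n: int, constraints: Iterable[Span]) -> List[Span]:
    """List the chart cells of an n-word sentence that respect every constraint."""
    required = list(constraints)
    return [
        (i, j)
        for i in range(n)
        for j in range(i + 1, n + 1)
        if not any(crosses((i, j), c) for c in required)
    ]


if __name__ == "__main__":
    # Require words 1..3 of a five-word sentence to form a constituent.
    total = len(allowed_spans(5, []))
    kept = len(allowed_spans(5, [(1, 4)]))
    print(f"{kept} of {total} chart cells remain")
```

A CKY-style parser could skip the discarded cells entirely, which is where a saving in search space would come from. The speed figures reported above depend on the actual constraints and grammar, which this toy example does not model.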

Proceedings of the 2006 Australasian Language Technology Workshop (ALTW2006), pages 1–8.
