Automatic Mapping Clinical Notes to Medical - RMIT University
Efficient Combinatory Categorial Grammar Parsing
Bojan Djordjevic and James R. Curran
School of Information Technologies
University of Sydney
NSW 2006, Australia
{bojan,james}@it.usyd.edu.au
Abstract

Efficient wide-coverage parsing is integral to large-scale NLP applications. Unfortunately, parsers for linguistically motivated formalisms, e.g. HPSG and TAG, are often too inefficient for these applications.

This paper describes two modifications to the standard CKY chart parsing algorithm used in the Clark and Curran (2006) Combinatory Categorial Grammar (CCG) parser. The first modification extends the tight integration of the supertagger and parser, so that individual supertags can be added to the chart, which is then repaired rather than rebuilt. The second modification adds constraints to the chart that restrict which constituents can combine.

Parsing speed is improved by 30–35% without a significant accuracy penalty and a small increase in coverage when both of these modifications are used.

1 Introduction

Parsing is the process of determining the syntactic structure of a sentence. It is an integral part of the deep semantic analysis that any sophisticated Natural Language Processing (NLP) system, such as Question Answering and Information Extraction systems, must perform.

The sentences Bob killed Alice, Alice was killed by Bob and Bob was the man who killed Alice convey the same information. If we treat the sentence as a bag or sequence of words by assuming limited structure, the sentences appear to be very different. These examples demonstrate that full parsing is necessary for accurate semantic interpretation. Further, sophisticated linguistic analysis capable of modelling a wider range of phenomena should give us the most information.

Unfortunately, parsing is very inefficient because of the large degree of ambiguity present in natural language. This is particularly true for wide-coverage grammars in linguistically expressive formalisms, especially those automatically extracted from a treebank.

Many NLP systems use shallow parsing because full parsing is too slow (Grishman, 1997). To improve the approximate structure identified by shallow parsers, many systems use domain-specific knowledge to extract dependencies (Grishman, 1997; Cole et al., 1997). Ciravegna et al. (1997) show that the accuracy can be improved by using a better parser. The ability of NLP systems to extract useful and correct information could therefore be improved substantially if the speed of full parsing was acceptable.

The C&C CCG parser (Clark and Curran, 2006) is the fastest linguistically motivated parser in the literature, but it is still limited to about 25 sentences per second on commodity hardware.

This paper describes two modifications to the C&C parser that significantly improve parsing efficiency without reducing accuracy or coverage. The first involves chart repair, where the CKY chart is repaired when new categories are added, instead of rebuilt from scratch. This allows an even tighter integration of the parser and supertagger (described below) which results in an 11% speed improvement over the original parser.

The second modification involves parsing with constraints, that is, requiring certain spans to be constituents. This reduces the search space considerably by eliminating a large number of constituents that cross the boundary of these spans. The best set of constraints results in a 10% improvement over the original parser. These constraints are also useful for other tasks. Finally, when both chart repair and constraints are used, a 30–35% speed improvement is achieved while coverage increases and the accuracy is unchanged.
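
The effect of span constraints on the CKY search space can be illustrated with a minimal sketch. This is not the C&C implementation: the names `crosses` and `allowed_spans` are hypothetical, spans are assumed to be half-open word ranges `(start, end)`, and the code only counts which chart cells survive, without building any categories.

```python
from typing import List, Tuple

# A span is a half-open range of word positions: (start, end).
# This representation is an assumption made for the sketch.
Span = Tuple[int, int]


def crosses(span: Span, constraint: Span) -> bool:
    """Return True if the span overlaps the constraint while neither
    contains the other, i.e. it crosses the constraint's boundary."""
    s, e = span
    cs, ce = constraint
    return (s < cs < e < ce) or (cs < s < ce < e)


def allowed_spans(n: int, constraints: List[Span]) -> List[Span]:
    """Enumerate CKY chart cells for an n-word sentence in the usual
    bottom-up order, skipping cells that cross any constraint."""
    cells = []
    for length in range(1, n + 1):
        for start in range(0, n - length + 1):
            span = (start, start + length)
            if not any(crosses(span, c) for c in constraints):
                cells.append(span)
    return cells


if __name__ == "__main__":
    # Hypothetical 5-word sentence where words 1..3 must form a constituent.
    cells = allowed_spans(5, [(1, 4)])
    print(len(cells), "of", 5 * 6 // 2, "cells remain")
```

For a five-word sentence with the single constraint `(1, 4)`, the four cells `(0, 2)`, `(0, 3)`, `(2, 5)` and `(3, 5)` are skipped, leaving 11 of the 15 cells. A real parser would save more than this count suggests, since each skipped cell also removes every combination that would have used it.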
Proceedings of the 2006 Australasian Language Technology Workshop (ALTW2006), pages 1–8.