
[Figure: CCG derivation of "The cat ate the hat that I made". Lexical categories: The := NP/N, cat := N, ate := (S\NP)/NP, the := NP/N, hat := N, that := (NP\NP)/(S/NP), I := NP, made := (S\NP)/NP. The derivation combines these with forward application (>), type raising (>T), forward composition (>B) and backward application (<) to yield S.]

Figure 1: A Combinatory Categorial Grammar derivation

2 Combinatory Categorial Grammar

Context free grammars (CFGs) have traditionally been used for parsing natural language. However, some constructs in natural language require more expressive grammars. Mildly context sensitive grammars, e.g. HPSG and TAG, are powerful enough to describe natural language but, like CFGs (Younger, 1967), are parseable in polynomial time (Vijay-Shanker and Weir, 1990).

Combinatory Categorial Grammar (CCG) is another mildly context sensitive grammar (Steedman, 2000) that has significant advantages over CFGs, especially for analysing constructions involving coordination and long range dependencies. Consider the following sentence:

Give a teacher an apple and a policeman a flower.

There are local dependencies between give and teacher, and between give and apple. The additional dependencies between give, and policeman and flower are long range, but are extracted easily using CCG. The phrase a policeman a flower is a non-standard constituent that CCG deals with very elegantly, allowing a teacher an apple and a policeman a flower to be coordinated before attachment to the verb give.

In CCG each word is assigned a category which encodes sub-categorisation information. Categories are either atomic, such as N, NP and S for noun, noun phrase and sentence respectively; or complex, such as NP/N for a word that combines with a noun on the right to form a noun phrase. NP\N similarly denotes a word that combines with a noun on the left to create an NP. The categories are then combined using combinatory rules such as forward application > (X/Y Y ⇒ X) and forward composition >B (X/Y Y/Z ⇒B X/Z).
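
As a minimal sketch of the notation and rules above (illustrative only; the class and function names here are not taken from the C&C parser), categories and two combinators might be encoded as:

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Atomic:
    """An atomic category such as N, NP or S."""
    name: str

    def __str__(self):
        return self.name


@dataclass(frozen=True)
class Complex:
    """A complex category: result/arg (forward) or result\\arg (backward)."""
    result: "Category"
    slash: str  # "/" for an argument on the right, "\\" for one on the left
    arg: "Category"

    def __str__(self):
        return f"({self.result}{self.slash}{self.arg})"


Category = Union[Atomic, Complex]


def forward_application(x: Category, y: Category) -> Optional[Category]:
    """> : X/Y  Y  =>  X"""
    if isinstance(x, Complex) and x.slash == "/" and x.arg == y:
        return x.result
    return None


def forward_composition(x: Category, y: Category) -> Optional[Category]:
    """>B : X/Y  Y/Z  =>  X/Z"""
    if (isinstance(x, Complex) and x.slash == "/"
            and isinstance(y, Complex) and y.slash == "/"
            and x.arg == y.result):
        return Complex(x.result, "/", y.arg)
    return None


def type_raise(x: Category, t: Category) -> Category:
    """>T : X  =>  T/(T\\X)"""
    return Complex(t, "/", Complex(t, "\\", x))


if __name__ == "__main__":
    N, NP, S = Atomic("N"), Atomic("NP"), Atomic("S")
    the = Complex(NP, "/", N)                      # the  := NP/N
    made = Complex(Complex(S, "\\", NP), "/", NP)  # made := (S\NP)/NP

    print(forward_application(the, N))             # NP
    i_raised = type_raise(NP, S)                   # I =>T S/(S\NP)
    print(forward_composition(i_raised, made))     # (S/NP)
```

The last two lines mirror the >T and >B steps in Figure 1: I is type raised to S/(S\NP) and composed with made to give S/NP.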

An example derivation that uses a number of combination rules is shown in Figure 1. The example demonstrates how CCG handles long range dependencies such as the hat ... I made. Type raising (>T) and forward composition (>B) on I made are used to derive the same predicate-argument structure as if it was written as I made the hat.

A derivation creates predicate-argument dependencies, which are 5-tuples ⟨h_f, f, s, h_a, l⟩, where h_f is the word whose category represents the relationship, f is the category itself, s is the argument slot, h_a is the head word of the argument and l indicates whether the dependency is local. The argument slot is used to identify the different arguments of words like bet in [Alice]_1 bet [Bob]_2 [five dollars]_3 [that they win]_4. The dependency between bet and Bob would be represented as ⟨bet, (((S\NP_1)/NP_2)/NP_3)/S[em]_4, 2, Bob, −⟩.
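
For concreteness, such a dependency could be represented as below; this is a hypothetical sketch whose field names mirror the 5-tuple above, not the parser's internal data structures:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    """A predicate-argument dependency <h_f, f, s, h_a, l>."""
    head: str      # h_f: word whose category represents the relationship
    category: str  # f:   the lexical category itself
    slot: int      # s:   which argument slot is filled
    arg: str       # h_a: head word of the argument
    local: bool    # l:   whether the dependency is local


# The dependency between "bet" and "Bob" from the example above;
# the "-" in the tuple is read here as marking a local dependency.
dep = Dependency(
    head="bet",
    category="(((S\\NP_1)/NP_2)/NP_3)/S[em]_4",
    slot=2,
    arg="Bob",
    local=True,
)
print(dep)
```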

The C&C parser is evaluated by comparing extracted dependencies against the gold standard. Derivations are not compared directly because different derivations can produce the same dependency structure.
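
A minimal sketch of this style of evaluation, assuming dependencies are stored as plain tuples and scored with ordinary labelled precision, recall and F-score (not necessarily the exact conventions of the C&C evaluation scripts):

```python
def dependency_f_score(predicted: set, gold: set) -> tuple:
    """Precision, recall and F-score over sets of dependency tuples.

    Dependencies are compared as whole tuples, so two different
    derivations that yield the same dependency set score identically.
    """
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f


# Toy example: dependencies written as (head, category, slot, argument) tuples.
gold = {("made", "(S\\NP)/NP", 2, "hat"), ("made", "(S\\NP)/NP", 1, "I")}
pred = {("made", "(S\\NP)/NP", 2, "hat"), ("made", "(S\\NP)/NP", 1, "cat")}
print(dependency_f_score(pred, gold))  # (0.5, 0.5, 0.5)
```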

The parser is trained on CCGbank, a version of the Penn Treebank translated semi-automatically into CCG derivations and predicate-argument dependencies (Hockenmaier and Steedman, 2006). The resulting corpus contains 99.4% of the sentences in the Penn Treebank. Hockenmaier and Steedman also describe how a large CCG grammar can be extracted from CCGbank. A grammar automatically extracted from CCGbank is used in the C&C parser and supertagger.

3 C&C CCG Parser

The C&C parser takes one or more syntactic structures (categories) assigned to each word and attempts to build a spanning analysis of the sentence. Typically every category that a word was seen with in the training data is assigned.
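
A rough sketch of that assignment step, using a tag dictionary built from (hypothetical) training data; the actual data structures used by the C&C tools may differ:

```python
from collections import defaultdict

# Tag dictionary: every category a word was seen with in the training data.
# The word/category pairs below are illustrative, not taken from CCGbank.
tag_dict = defaultdict(set)
training_data = [
    ("the", "NP/N"), ("cat", "N"), ("ate", "(S\\NP)/NP"),
    ("ate", "S\\NP"), ("hat", "N"), ("made", "(S\\NP)/NP"),
]
for word, category in training_data:
    tag_dict[word].add(category)


def assign_categories(sentence):
    """Assign every category each word was seen with in training."""
    return [tag_dict[word] for word in sentence]


print(assign_categories(["the", "cat", "ate", "the", "hat"]))
```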
