Automatic Mapping Clinical Notes to Medical - RMIT University

More documents

Recommendations

Info

$journal of latex class files, vol. 6, no. 1, january ... - RMIT University$

wij moeten dit probleem aanpakken wij moeten aanpakken dit probleem aanpakken wij−4 moeten−3 probleem−1 aanpakken wij−2 moeten−1 probleem+1 SHD dit−1 dit−1 2 (node:aanpakken) = SHD2 (node:aanpakken) = (−4) 2 + (−3) 2 + (−1) 2 = 26 (−2) 2 + (−1) 2 + 12 = 6 Figure 1: Reordering based on the sum of the head distances To illustrate the algorithm of this section we present the following two examples: 1. Verb initial in VP: normal order: wij moeten dit probleem aanpakken we must this problem tackle reordered: wij moeten aanpakken dit probleem we must tackle this problem reference: we must tackle this particular problem 2. Verb Particle reordering: normal order: het parlement neemt de resolutie aan the parliament takes the resolution over reordered: het parlement neemt aan de resolutie the parliament takes over the resolution reference: parliament adopted the resolution As an example of the calculation of distances, the left tree of Figure 1 is the dependency tree for the normal order for example 1; nodes are annotated with the distance from the word to its governor. Note that in example 1 probleem gets a value of 1, although the word itself is 2 away from its head; we are measuring the distance from this complete constituent and not this particular word. Based on the distance of the different child nodes we want to define an optimal reordering and pick that one. This means we have to score all the different reorderings. We want a scoring measure to do this that ignores the sign of distances 150 and gives higher weight to longer dependency distances. Thus, similar to various statistical optimisation algorithms such as Least Mean Squares, we calculate the square of the Sum of the Head Distances (SHD2 ) for each node n, defined as: SHD2 (n) = Distance(c,n) 2 c ǫ children(n) Every different ordering of children and head has a SHD2 value; we are interested in minimising this value. We give an SHD2 example in Figure 1. We then reorder the children so that the SHD 2 score of a node is minimised. The righthand tree of Figure 1 gives an example. In example 1 we can see how the Dutch verb aanpakken is moved to the beginning of the verb phrase. In this example we match English word order, even though this is not an explicit goal of the metric. The second example does not match English word order as such, but in Dutch the verb aannemen was split into aan and neemt in the sentence. Our reordering places these two words together so that PSMT can pickup that this is actually one single unit. Note that the two examples also demonstrate two of Collins et al. (2005) hand-written rules. In fact, this principle subsumes all the examples given in that paper in the prototypical cases. 3.2 Selecting Minimal Reorderings In implementing the algorithm, for each node with children we calculate the SHD 2 for all permutations (note that this is not computationally expensive as each node has only a small number of children). We select the collection of sibling orders with a minimal SHD 2 . This is indeed a collection because different orderings can still have the same
SHD 2 value. In these cases we try to match the original sentence order much as possible. There are many different way to calculate which permutation’s order is closer to the original. We first count all the constituents which are on the other side of the head compared to the original. For all permutations from the list with the same minimal SHD2 we count these jumps and keep the collection with the minimal number of jumps. Then we count the breaks where the new order deviates from an ascending ordering, when the constituents are labelled with their original position. For every constituent ordering c this Break Count (BC) is defined as: N BC(c) = max(0, Poso(n − 1) − Poso(n) + 1) n=2 Here Poso(x) is the original position of the xth constituent in the new order, and N is the length of the constituent c. As an example: if we have the five constituents 1,2,3,4,5 which in a new order y are ordered 2,1,3,4,5 we have one break from monotonicity, where from 2 we go to 1. We add the number of breaks to the size of the break. In this case, BC(y) = 2. The original constituent order always gets value 0. From the remaining collection of orderings we can now select that one with the lowest value. This always results in a unique ordering. 3.3 Source Language Parser To derive the dependency trees we used the Alpino (Bouma et al., 2000) parser. 2 Because this grammar comes with dependency information we closely follow their definition of head-dependent relations, deviating from this in only one respect. The Alpino parser marks auxiliary verbs as being the head of a complete sentence, while we took the main verb as the head of sentence, transforming the parse trees accordingly (thus moving closer to semantic dependencies). The Alpino parser does not always produce projective parses. Reading off parse trees of Alpino in some cases already changes the word order. 2 We would like to thank van Noord from the University of Groningen for kindly providing us the parses made by Alpino for most of the Dutch sections for the relevant data. 151 3.4 Four Reordering Models We investigate four models. The first, the ‘Alpino model’, is to measure the impact of the parser used, as Alpino does some reordering that is intended to be linguistically helpful. We want to know what the result is of using a dependency parser which does not generate projective parses i.e. there are parses which do not result into the original sentence if we read off the tree for this parse. In this model we parse the sentences with our parser, and we simply read off the tree. If the tree is projective this results in the original tree. If this is not the case we keep for every node the original order of its children. The second, the ‘full model’, chooses the reordering as described in sections 3.1 and 3.2. We hypothesised the algorithm may be too ‘enthusiastic’ in reordering. For example, when we encounter a ditransitive verb the algorithm usually would put either the direct or the indirect object in front of the subject. Longer constituents were moved to completely different positions in the sentence. This kind of reordering could be problematic for languages, like English, which heavily rely on sentence position to mark the different grammatical functions of the constituents. We therefore defined the ‘limited model’, a restriction on the previous model where only single tokens can be reordered. When analysing previous syntactically motivated reordering (Collins et al., 2005) we realised that in most cases constituents consisting of one token only were repositioned in the sentence. Furthermore since sentence ordering is so important we decided only to reorder if ‘substantial’ parts were changed. To do this we introduced a threshold R and only accepted a new ordering if the new SHD 2 has a reduction of at least R in regard to the original sentence ordering. If it is not possible to reduce SHD 2 that far we would keep the original ordering. Varying R between 0 and 1, in this preliminary work we determined the value R = 0.9. Finally we reimplemented the six rules in Collins et al. (2005) as closely as possible given our language pair and our parser, the Alpino parser. The ‘Collins Model’ will show us the im-
Page 1:
Proceeding of the Australasian Lang
Page 4 and 5:
Program Chairs: Committees Lawrence
Page 6 and 7:
Towards the Evaluation of Referring
Page 8 and 9:
The cat ate the hat that I made NP/
Page 10 and 11:
Figure 2: Cells affected by adding
Page 12 and 13:
were used because the accuracy for
Page 14 and 15:
TIME ACC COVER FAIL RATE % secs % %
Page 16 and 17:
model. We explore different estimat
Page 18 and 19:
S.S. Source Sense Source S.S. Acc.
Page 20 and 21:
acy. The difference in correctly as
Page 22 and 23:
Experiments with Sentence Classific
Page 24 and 25:
datasets with skewed distribution t
Page 26 and 27:
words produced by the feature selec
Page 28 and 29:
Sentence Class NB DT SVM Apology 1.
Page 30 and 31:
Computational Semantics in the Natu
Page 32 and 33:
S[sem = ] -> NP[sem=?subj] VP[sem=?
Page 34 and 35:
Any string which cannot be decompos
Page 36 and 37:
satisfy(Formula,model(D,F),G,pos):n
Page 38 and 39:
Classifying Speech Acts using Verba
Page 40 and 41:
The principles are dichotomous —
Page 42 and 43:
ance. Several additional example ut
Page 44 and 45:
our results. In classifying only th
Page 46 and 47:
Word Relatives in Context for Word
Page 48 and 49:
create a collection of training exa
Page 50 and 51:
Query Sense The nave was rebuilt in
Page 52 and 53:
Algorithm Avg S2LS Avg S3LS Avg S2L
Page 54 and 55:
B. Snyder and M. Palmer. 2004. The
Page 56 and 57:
it is even used as a black box that
Page 58 and 59:
ecognition in QA, so we will not us
Page 60 and 61:
BPER ILOC IPER BLOC BLOC BDATE BLOC
Page 62 and 63:
of noise introduced by the addition
Page 64 and 65:
2 Existing annotated corpora Much o
Page 66 and 67:
Our FUSE|tel spectrum of HD|sta 738
Page 68 and 69:
N Word Correct Tagged 23 OH mol non
Page 70 and 71:
L ATEX typesetting information into
Page 72 and 73:
in English, neither the determiner
Page 74 and 75:
con 4 (Sagot et al., 2006) and Morp
Page 76 and 77:
Feature type Positions/description
Page 78 and 79:
Miriam Butt, Helge Dyvik, Tracy Hol
Page 80 and 81:
ently mostly solved by employing hu
Page 82 and 83:
Figure 1: Augmented SCT Lexicon Tok
Page 84 and 85:
medical terms can be composed by ad
Page 86 and 87:
References Aronson, A. R. (2001). E
Page 88 and 89:
technique to most question types, i
Page 90 and 91:
User Question Analyser QA System Se
Page 92 and 93:
Figure 2: Precision Figure 3: Cover
Page 94 and 95:
Analysis and Review. Springer-Verla
Page 96 and 97:
2 Background 2.1 Pronoun Reference
Page 98 and 99:
ous accounts of the effects of cohe
Page 100 and 101:
Table 1: Overall Results for Object
Page 102 and 103:
References Jennifer Arnold, Janet E
Page 104 and 105:
In order for appropriate documents
Page 106 and 107: y Michael West. He found that his t
Page 108 and 109: low-scoring documents are “Click
Page 110 and 111: a) b) You must complete at least 9
Page 112 and 113: uct). In this paper, we will only c
Page 114 and 115: indicate the categories of substruc
Page 116 and 117: Argument lowering Dependent geach
Page 118 and 119: who [u : np] WH(np, s, s/?gq) λP.
Page 120 and 121: algorithms, and report the performa
Page 122 and 123: 2.3 Coverage of the Human Data Out
Page 124 and 125: and minimal subsets of the data. Of
Page 126 and 127: output. Finally, and related to the
Page 128 and 129: identifies, to avoid overwhelming t
Page 130 and 131: y the student is first parsed, usin
Page 132 and 133: a given word sequence Sorig as a pe
Page 134 and 135: A problem arises with this scheme w
Page 136 and 137: Example 1: Original Headline: Europ
Page 138 and 139: |relations(sentence1) ∩ relations
Page 140 and 141: 2685 True False 1275 Bleu Dep Bleu
Page 142 and 143: Features Acc. C1-prec. C1-recall C1
Page 144 and 145: Given the lack of research in VSD u
Page 146 and 147: ased features: Type 1 Original sibl
Page 148 and 149: Preposition of the adjunct semantic
Page 150 and 151: Algorithm 1 The algorithm of combin
Page 152 and 153: References Timothy Baldwin and Fran
Page 154 and 155: dependencies in German with respect
Page 158 and 159: a brevity penalty of BLEU n = 1 n =
Page 160 and 161: plicitly matching the syntax of the
Page 162 and 163: Probabilities improve stress-predic
Page 164 and 165: Towards Cognitive Optimisation of a
Page 166 and 167: Natural Language Processing and XML
Page 168 and 169: Extracting Patient Clinical Profile
show all

Automatic Mapping Clinical Notes to Medical - RMIT University

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?