Automatic Mapping Clinical Notes to Medical - RMIT University

More documents

Recommendations

Info

$journal of latex class files, vol. 6, no. 1, january ... - RMIT University$

Figure 2: Cells affected by adding a constraint. The axes are the cell indices in the chart with pos the starting position, and span the length of the span, of constituents in the cell. 5 Constraints The original C&C parser uses a supertagger to assign a number of categories to each word, together with the probability of each category (Clark and Curran, 2004). In the first iteration, only categories with β ≥ 0.075 are used, where β is the ratio of the probability of the category and the probability of the most likely category for that word. For example, if the categories for dog are N, NP and N/N with probabilities 0.8, 0.12 and 0.08 then β value of N/N is 0.08/0.8 = 0.1. If there is no spanning tree, a lower value is used for the cutoff β in the next iteration so that more tags are assigned to each word. The previous chart is destroyed and the new chart is built from scratch. Since the assigned categories are always a superset of the previously assigned ones, the derivations in the new chart will include all the derivations in the previous chart. Instead of rebuilding the chart when new categories are added it can be simply repaired by modifying cells that are affected by the new tags. Considering the case where a single tag is added to the ith word in an n word sentence, the new tag can only affect the cells that satisfy pos ≤ i and pos + span > i. These cells are shown in Figure 3. The chart can therefore be repaired bottom up by updating a third of the cells on average. The number of affected cells is (n − pos) × pos and the total number of cells is approximately n2 2 . The average number of affected cells is approxi- 4 n mately 1 n2 n 0 (n − p)p dp = 6 , so on average a third of the cells are affected. The chart is repaired bottom up. A new category is added to one word by adding it to the list of categories in the appropriate cell in the bottom row. The list is marked so that we know which categories are new. For each cell C in the second row we look for each pair of cells A and B whose spans combine to create the span of C. In the original algorithm all categories from A are combined with all categories from B, but during the repair this is only done if at least one of them is new because otherwise the resulting category would already be in C. Again the list of categories in C is marked so that cells higher in the chart know which categories are new. This is repeated for all affected cells. This speeds up the parser not only because previous computations are reused, but also because categories can be added one at a time until a spanning derivation is found. This increases coverage slightly because the number of categories can be varied one by one. In the original parser it was possible to have sentences that a spanning tree cannot be found for using for example 20 categories, but increasing the number of categories to 25 causes the total number of derivations in the chart to exceed a predefined limit, so the sentence does not get parsed even if 23 categories would produce a spanning tree without exceeding the limit.
6 Chart Repair Adding constraints to the parser was originally designed to enable more efficient manual annotation. The parser has been used for semi-automatic parsing of sentences to create gold standard derivations. It is used to produce a guess, which is then manually corrected. Adding a constraint could speed this process up by allowing annotators to quickly tell the parser which part it got wrong. For example if the Royal Air Force contract was parsed as (the Royal (Air Force contract)), meaning that the Air Force contract was royal, instead of manually annotating the category for each word the annotator could simply create a constraint on Royal Air Force requiring it to be a constituent, and thus making the previous derivation impossible. The correct derivation, (the ((Royal Air Force) contract)) would then very likely be produced without further effort from the annotator. However, parsing would be much faster in general if the search space could be constrained by requiring known spans P to be a single constituent. This reduces the search space because P must be the yield of a single cell CP (posP , spanP ), so the cells with yields that cross the boundary of P do not need to be considered at all (grey squares in Figure 2). The problem of what spans are known in advance is described below. In addition, if a cell contains P as a prefix or suffix (wavy pattern cells in Figure 2) then it also has constraints on how it can be created. In Figure 2, P = cell(3, 4) is required, i.e. the span starting at word 3 of length 4 containing words 3, Figure 3: Cells affected by chart repair. 5 4, 5 and 6 is a constituent. Consider cell(3, 7). It includes words 3 to 9 and contains P as the prefix. Normally cell(3, 7) can be created by combining cell(3, 1) with cell(4, 6), . . . , cell(pos, s) with cell(pos + s, span − s), . . . , and cell(3, 6) with cell(9, 1). However the first three of these combinations are not allowed because the second component would cross the boundary of P . This gives a lower limit for the span of the left component. Similarly if P is the suffix of the span of a cell then there is a lower limit on the span of the right component. This eliminates a lot of work during parsing and can provide a significant speed increase. In the quick brown foxes like dogs, the following phrases would all be possible constituents: foxes like dogs, brown foxes like dogs, . . . , and the quick brown foxes like dogs. Since the chart is built bottom up the parser has no knowledge of the surrounding words so each of those appears like a valid constituent and would need to be created. However if the quick brown foxes is known to be a constituent then only the last option becomes possible. 7 Creating Constraints How can we know that specific spans must be yielded by constituents in advance? Surely the parsing is already solved if we have this information? In this paper, we have experimented with constraints determined from shallow parsing and hints in the sentence itself. Chunk tags (gold standard and from the C&C chunker) were used to create constraints. Only NPs
Page 1: Proceeding of the Australasian Lang
Page 4 and 5: Program Chairs: Committees Lawrence
Page 6 and 7: Towards the Evaluation of Referring
Page 8 and 9: The cat ate the hat that I made NP/
Page 12 and 13: were used because the accuracy for
Page 14 and 15: TIME ACC COVER FAIL RATE % secs % %
Page 16 and 17: model. We explore different estimat
Page 18 and 19: S.S. Source Sense Source S.S. Acc.
Page 20 and 21: acy. The difference in correctly as
Page 22 and 23: Experiments with Sentence Classific
Page 24 and 25: datasets with skewed distribution t
Page 26 and 27: words produced by the feature selec
Page 28 and 29: Sentence Class NB DT SVM Apology 1.
Page 30 and 31: Computational Semantics in the Natu
Page 32 and 33: S[sem = ] -> NP[sem=?subj] VP[sem=?
Page 34 and 35: Any string which cannot be decompos
Page 36 and 37: satisfy(Formula,model(D,F),G,pos):n
Page 38 and 39: Classifying Speech Acts using Verba
Page 40 and 41: The principles are dichotomous —
Page 42 and 43: ance. Several additional example ut
Page 44 and 45: our results. In classifying only th
Page 46 and 47: Word Relatives in Context for Word
Page 48 and 49: create a collection of training exa
Page 50 and 51: Query Sense The nave was rebuilt in
Page 52 and 53: Algorithm Avg S2LS Avg S3LS Avg S2L
Page 54 and 55: B. Snyder and M. Palmer. 2004. The
Page 56 and 57: it is even used as a black box that
Page 58 and 59: ecognition in QA, so we will not us
Page 60 and 61:
BPER ILOC IPER BLOC BLOC BDATE BLOC
Page 62 and 63:
of noise introduced by the addition
Page 64 and 65:
2 Existing annotated corpora Much o
Page 66 and 67:
Our FUSE|tel spectrum of HD|sta 738
Page 68 and 69:
N Word Correct Tagged 23 OH mol non
Page 70 and 71:
L ATEX typesetting information into
Page 72 and 73:
in English, neither the determiner
Page 74 and 75:
con 4 (Sagot et al., 2006) and Morp
Page 76 and 77:
Feature type Positions/description
Page 78 and 79:
Miriam Butt, Helge Dyvik, Tracy Hol
Page 80 and 81:
ently mostly solved by employing hu
Page 82 and 83:
Figure 1: Augmented SCT Lexicon Tok
Page 84 and 85:
medical terms can be composed by ad
Page 86 and 87:
References Aronson, A. R. (2001). E
Page 88 and 89:
technique to most question types, i
Page 90 and 91:
User Question Analyser QA System Se
Page 92 and 93:
Figure 2: Precision Figure 3: Cover
Page 94 and 95:
Analysis and Review. Springer-Verla
Page 96 and 97:
2 Background 2.1 Pronoun Reference
Page 98 and 99:
ous accounts of the effects of cohe
Page 100 and 101:
Table 1: Overall Results for Object
Page 102 and 103:
References Jennifer Arnold, Janet E
Page 104 and 105:
In order for appropriate documents
Page 106 and 107:
y Michael West. He found that his t
Page 108 and 109:
low-scoring documents are “Click
Page 110 and 111:
a) b) You must complete at least 9
Page 112 and 113:
uct). In this paper, we will only c
Page 114 and 115:
indicate the categories of substruc
Page 116 and 117:
Argument lowering Dependent geach
Page 118 and 119:
who [u : np] WH(np, s, s/?gq) λP.
Page 120 and 121:
algorithms, and report the performa
Page 122 and 123:
2.3 Coverage of the Human Data Out
Page 124 and 125:
and minimal subsets of the data. Of
Page 126 and 127:
output. Finally, and related to the
Page 128 and 129:
identifies, to avoid overwhelming t
Page 130 and 131:
y the student is first parsed, usin
Page 132 and 133:
a given word sequence Sorig as a pe
Page 134 and 135:
A problem arises with this scheme w
Page 136 and 137:
Example 1: Original Headline: Europ
Page 138 and 139:
|relations(sentence1) ∩ relations
Page 140 and 141:
2685 True False 1275 Bleu Dep Bleu
Page 142 and 143:
Features Acc. C1-prec. C1-recall C1
Page 144 and 145:
Given the lack of research in VSD u
Page 146 and 147:
ased features: Type 1 Original sibl
Page 148 and 149:
Preposition of the adjunct semantic
Page 150 and 151:
Algorithm 1 The algorithm of combin
Page 152 and 153:
References Timothy Baldwin and Fran
Page 154 and 155:
dependencies in German with respect
Page 156 and 157:
wij moeten dit probleem aanpakken w
Page 158 and 159:
a brevity penalty of BLEU n = 1 n =
Page 160 and 161:
plicitly matching the syntax of the
Page 162 and 163:
Probabilities improve stress-predic
Page 164 and 165:
Towards Cognitive Optimisation of a
Page 166 and 167:
Natural Language Processing and XML
Page 168 and 169:
Extracting Patient Clinical Profile
show all

Automatic Mapping Clinical Notes to Medical - RMIT University

Create successful ePaper yourself

Delete template?

Save as template?