
# of sentences    % of questions
                  ANNIE    AFNERs   AFNERm
 5                27.9%    11.6%    27.7%
10                33.0%    13.6%    33.3%
20                41.4%    17.7%    41.9%
30                44.3%    19.0%    45.6%
40                46.2%    19.9%    47.4%
50                47.8%    20.5%    48.8%
60                49.3%    21.3%    51.0%
70                50.5%    21.3%    51.5%

Table 5: Percentage of factoid questions that can still be answered after NE recognition from the top 50 documents

The word overlap sentence selection is not extremely sophisticated: it only looks at words that can be found in both the question and the sentence. In practice, the measure is very coarse-grained. However, we are not particularly interested in perfect answers here; these figures are upper bounds in the experiment.
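For concreteness, the word-overlap preselection can be sketched roughly as follows. This is our own minimal illustration, not AnswerFinder's actual implementation; the tokenisation, function names, and ranking details are assumptions.

```python
def word_overlap(question: str, sentence: str) -> int:
    """Count distinct (lower-cased, whitespace-tokenised) words shared by question and sentence."""
    q_words = set(question.lower().split())
    s_words = set(sentence.lower().split())
    return len(q_words & s_words)

def preselect(question: str, sentences: list[str], n: int) -> list[str]:
    """Return the n sentences with the highest word overlap with the question."""
    ranked = sorted(sentences, key=lambda s: word_overlap(question, s), reverse=True)
    return ranked[:n]
```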

We then extract all named entities from the selected sentences. The results are summarised in Table 5.

The figures of Table 5 approximate recall in that they indicate the questions where the NER has identified a correct answer (among possibly many wrong answers).
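A rough sketch of how such a recall-style figure can be computed is given below; this is our own illustration, not the paper's evaluation code, and the question data structure, the answer_patterns field, and the recognise interface are hypothetical.

```python
import re

def answer_recall(questions, recognise):
    """Fraction of questions for which at least one recognised entity matches a known answer pattern.

    questions: list of dicts with 'sentences' (preselected sentences) and
               'answer_patterns' (regular expressions for accepted answers).
    recognise: a NER function mapping a sentence to a list of entity strings.
    """
    hits = 0
    for q in questions:
        entities = [e for s in q["sentences"] for e in recognise(s)]
        if any(re.search(p, e) for e in entities for p in q["answer_patterns"]):
            hits += 1
    return hits / len(questions) if questions else 0.0
```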

The best results are those provided by AFNERm, closely followed by ANNIE. This is an interesting result in that AFNER has been trained with the Remedia Corpus, which is a very small corpus on a domain that is different from the AQUAINT corpus. In contrast, ANNIE is fine-tuned for the domain. Given a larger training corpus of the same domain, AFNERm's results would presumably be much better than ANNIE's.

The results of AFNERs are much worse than those of the other two NERs. This clearly indicates that some of the additional entities found by AFNERm are indeed correct.

It is expected that precision will differ across the NERs and, in principle, the noise introduced by erroneous labels may impact the results returned by a QA system integrating the NER. We have tested the NERs extrinsically by applying them to a baseline setting of AnswerFinder. In particular, the baseline setting of AnswerFinder applies the sentence preselection methods described above and then simply returns the most frequent entity found in the preselected sentences. If several entities share the top position, one is chosen randomly. In other words, the baseline ignores the question type and the actual context of the entity. We decided to use this baseline setting because it is more closely related to the precision of the NERs than other, more sophisticated settings. The results are shown in Table 6.

# of sentences    % of questions
                  ANNIE    AFNERs   AFNERm
10                 6.2%     2.4%     5.0%
20                 6.2%     1.9%     7.0%
30                 4.9%     1.4%     6.8%
40                 3.7%     1.4%     6.0%
50                 4.0%     1.2%     5.1%
60                 3.5%     0.8%     5.4%
70                 3.5%     0.8%     4.9%

Table 6: Percentage of factoid questions that found an answer in a baseline QA system given the top 50 documents
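For concreteness, the baseline answer selection described above can be sketched as follows. This is our own minimal illustration, not AnswerFinder's code; the function names and the recognise interface are assumptions.

```python
import random
from collections import Counter

def baseline_answer(preselected, recognise):
    """Return the most frequent entity in the preselected sentences, ignoring
    question type and entity context; ties at the top are broken randomly.

    preselected: list of preselected sentences (strings).
    recognise: a NER function mapping a sentence to a list of entity strings.
    """
    counts = Counter(e for s in preselected for e in recognise(s))
    if not counts:
        return None
    top = max(counts.values())
    candidates = [e for e, c in counts.items() if c == top]
    return random.choice(candidates)
```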

The figures show a drastic drop in the results. This is understandable given that the baseline QA system used is very basic. A higher-performance QA system would of course give better results.

The best results are those using AFNERm. This confirms our hypothesis that a NER that allows multiple labels produces data that are more suitable for a QA system than a "traditional" single-label NER. The results suggest that, as long as recall is high, precision does not need to be too high. Thus there is no need to develop a high-precision NER.

The table also indicates a degradation of the performance of the QA system as the number of preselected sentences increases. This suggests that the baseline system is sensitive to noise. The bottom-scoring sentences are less relevant to the question and therefore are more likely not to contain the answer. If these sentences contain highly frequent NEs, those NEs might displace the correct answer from the top position. A high-performance QA system that is less sensitive to noise would probably produce better results as the number of preselected sentences increases (possibly at the expense of speed). The fact that AFNERm, which produces higher recall than AFNERs according to Table 5, still obtains the best results in the baseline QA system according to Table 6, suggests that the amount
