Table 2: Correct Answers on AnswerFinder

Run    Exact
T      19.6%
TQ     28.6%
TQE    23.2%
TQA    28.6%
TQEA   32.1%
Table 2 shows the results obtained by AnswerFinder with the document sets shown before.
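The run labels in Table 2 indicate which sources are combined into the query (T = topic, Q = question, E = entities obtained by feedback, A = known answer, the idealised setting mentioned later). A minimal sketch of how such query variants might be assembled; the function and parameter names are ours, not the paper's:

```python
def build_query(run, topic, question, entities=(), answer=""):
    """Assemble a query string for a run label such as "T" or "TQEA".

    Each letter selects a source to concatenate: T = topic words,
    Q = question words, E = entities from pseudo-relevance feedback,
    A = the known answer (idealised setting). A sketch only; the
    actual query construction of the paper's system is not specified
    in this section.
    """
    parts = {
        "T": topic,
        "Q": question,
        "E": " ".join(entities),
        "A": answer,
    }
    return " ".join(parts[letter] for letter in run if parts[letter])
```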
What can be observed here is that the feedback technique (TQE) offers a better set of documents than the one using only the topics (T). However, these documents are still worse than those obtained using the topic and the question (TQ). An interesting result is that TQEA is the best run, which may show that the inclusion of entities can help improve QA. We have not yet performed a deep analysis of this case to verify its cause. Even though our process did not show improvements over the baseline techniques, it was very important to find that the results of the (intrinsic) evaluations of the IR component do not parallel the results of the (extrinsic) evaluation of the QA system. Although high precision, coverage and redundancy represent a better chance of finding answers, we show that they do not guarantee better performance of a QA system.
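For reference, the intrinsic metrics used here admit a standard formulation in the QA retrieval literature; the notation below ($Q$, $R_q$, $D_{q,n}$) is ours for illustration, not the paper's:

$$
\text{coverage@}n = \frac{\bigl|\{\, q \in Q : R_q \cap D_{q,n} \neq \emptyset \,\}\bigr|}{|Q|},
\qquad
\text{redundancy@}n = \frac{1}{|Q|} \sum_{q \in Q} \bigl| R_q \cap D_{q,n} \bigr|
$$

where $Q$ is the question set, $R_q$ the set of answer-bearing documents for question $q$, and $D_{q,n}$ the top $n$ documents retrieved for $q$. Coverage measures whether at least one answer-bearing document is retrieved; redundancy measures how many are.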
Comparing the results of T and TQ, it is possible to observe that they are very similar under the intrinsic evaluation and quite different on the QA system. Therefore, what appears to help question answering is the presence of more context words, so that the answers not only appear in the document but are also present in the context of the questions. This is mostly because most QA systems tend to work with full discourse units, such as sentences and paragraphs, and the selection of those units is normally based on words from the topic and the question.
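As an illustration of this kind of discourse-unit selection, the following sketch scores a document's sentences by their word overlap with the topic and question. It is a minimal example of the general idea, not the selection method of any particular QA system discussed here:

```python
import re

def tokenize(text):
    """Lower-case word tokens; a deliberately simple tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def rank_sentences(document, topic, question, top_n=3):
    """Rank a document's sentences by word overlap with topic + question.

    Sentences sharing more words with the topic and question are assumed
    more likely to contain the answer in the right context.
    """
    context = tokenize(topic) | tokenize(question)
    sentences = re.split(r"(?<=[.!?])\s+", document)
    scored = [(len(tokenize(s) & context), s) for s in sentences]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [s for score, s in scored[:top_n] if score > 0]
```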
In summary, our experiments did not confirm the hypothesis that named-entity feedback would help improve QA. However, in the ideal situation where the answers are identified and included in the queries, the improvements are clear under an intrinsic evaluation. The differences between the intrinsic and the extrinsic evaluations point out that there are many issues that IR metrics do not currently cover.
5 Concluding Remarks and Future Work
In this paper, we have looked at whether a pseudo-relevance feedback mechanism could help the QA process, on the assumption that a good indication of a document's relevance for use in a QA system is the presence of named entities of the same class as the one required as the answer to a given question. Our assumption was based on the fact that documents not containing those entities are less likely to help provide the correct answer, and that every entity of the right type has some probability of being the answer.
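A minimal sketch of this kind of entity-based feedback loop appears below. The callables `retrieve` and `tag_entities` are hypothetical stand-ins for the IR engine and the named-entity recogniser, since this section does not specify the actual components used:

```python
def entity_feedback_query(question, expected_type, retrieve, tag_entities,
                          top_k=10, max_entities=5):
    """Expand a query with named entities of the expected answer type.

    A sketch of the pseudo-relevance feedback idea evaluated in the paper:
    entities of the answer's class found in the initially retrieved
    documents are fed back into the query for a second retrieval pass.
    `retrieve(query, k)` and `tag_entities(doc)` are hypothetical.
    """
    # Initial retrieval with the plain question text.
    initial_docs = retrieve(question, top_k)

    # Collect entities of the expected class from the top documents.
    entities = []
    for doc in initial_docs:
        for surface, etype in tag_entities(doc):
            if etype == expected_type and surface not in entities:
                entities.append(surface)

    # Feed the entities back into the query (the "E" in runs TQE/TQEA).
    expanded = question + " " + " ".join(entities[:max_entities])
    return retrieve(expanded, top_k)
```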
We have described our evaluation of the hypothesis using known IR metrics and a QA system. Our main conclusions are:
• Because we have not yet obtained satisfactory results, we believe that, even though the method is conceptually sound, it will not produce good results unless more sophisticated control over the introduced noise is achieved; and

• The evaluation of the technique brought to our attention the fact that it is not possible to state that one retrieval technique is better than another simply by relying on conventional IR evaluation metrics. The differences between the intrinsic and extrinsic evaluations demonstrate that there are many hidden variables that are not taken into account by metrics such as precision, coverage and redundancy.
As further work, we plan to repeat our evaluation using different QA systems, since other QA systems may give preference to different text features, offering better insight into how the IR evaluation metrics correlate with the QA results. We are also planning to use the named-entity recogniser that is being developed by our research group, and to extend the system to use a more advanced passage retrieval algorithm.