Automatic Mapping Clinical Notes to Medical - RMIT University

More documents

Recommendations

Info

$journal of latex class files, vol. 6, no. 1, january ... - RMIT University$

output. Finally, and related to the point above, it is not at all obvious that numeric measures like precision and recall make any sense in assessing generation system output. A generation system that replicates most or all of the outputs produced by humans, while overgenerating as little as possible, would clearly be highly adequate. However, we cannot automatically penalise systems for generating outputs that have not, so far, been seen in human-produced data. Our analysis makes it seem likely that the impracticability of constructing a gold standard data set will prove itself as the core problem in designing tasks and metrics for the evaluation of systems for the generation of referring expressions and of NLG systems in general. There are various ways in which we might deal with this difficulty, which will need to be examined in turn. One possible way forward would be to take a more detailed look at the solutions that other tasks with output in the form of natural language, such as machine translation and text summarisation, have found for their evaluation approaches. We might also come to the conclusion that we can make do with a theoretically ‘imperfect’ evaluation task that works well enough to be able to assess any systems conceivably to be developed in the near or medium term. Although we concede that a lot of groundwork still needs to be done, we are convinced that a more standardised evaluation approach is important for the advancement of the field of NLG. References Akiba, Y., Imamura, K., and Sumita, E. 2001. Using multiple edit distances to automatically rank machine translation output. In Proceedings of the MT Summit VIII, 15–20, Santiago de Compostela, Spain. Arts, A. 2004. Overspecification in Instructive Texts. Ph.D. thesis, Tilburg University. Bangalore, S., Rambow, O., and Whittaker, S. 2000. Evaluation metrics for generation. In Proceedings of the First International Natural Language Generation Conference, 1–8, Mitzpe Ramon, Israel. Belz, A. and Kilgarriff, A. 2006. Shared-task evaluations in HLT: Lessons for NLG. In Proceedings of the Fourth International Natural Language Generation Conference, 133–135, Sydney, Australia. Belz, A. and Reiter, E. 2006. Comparing automatic and human evaluation of NLG systems. In Proceedings of the Eleventh Conference of the European 120 Chapter of the Association for Computational Linguistics, 313–320, Trento, Italy. Carletta, J. 1992. Risk-taking and Recovery in Task- Oriented Dialogue. Ph.D. thesis, University of Edinburgh. Dale, R. and Haddock, N. 1991. Generating referring expressions involving relations. In Proceedings of the Fifth Conference of the European Chapter of the ACL, 161–166, Berlin, Germany. Dale, R. and Reiter, E. 1995. Computational interpretations of the Gricean maxims in the generation of referring expressions. Cognitive Science, 19(2):233–263. Dale, R. 1989. Cooking up referring expressions. In Proceedings of the 27th Annual Meeting of the Association for Computational Linguistics, 68–75, Vancouver, British Columbia. Nenkova, A. and Passonneau, R. 2004. Evaluating content selection in summarization: The pyramid method. In Main Proceedings of HLT-NAACL 2004, 145–152, Boston, Massachusetts, USA. Paris, C., Colineau, N., and Wilkinson, R. 2006. Evaluations of nlg systems: Common corpus and tasks or common dimensions and metrics? In Proceedings of the Fourth International Natural Language Generation Conference, 127–129, Sydney, Australia. Association for Computational Linguistics. Reiter, E. and Belz, A. 2006. Geneval: A proposal for shared-task evaluation in NLG. In Proceedings of the Fourth International Natural Language Generation Conference, 136–138, Sydney, Australia. Reiter, E. and Dale, R. 1992. A fast algorithm for the generation of referring expressions. In Proceedings of the 14th International Conference on Computational Linguistics, 232–238, Nantes, France. Reiter, E. and Sripada, S. 2002. Should corpora texts be gold standards for NLG? In Proceedings of the Second International Natural Language Generation Conference, 97–104, New York, USA. van Deemter, K. and Halldórsson, M. M. 2001. Logical form equivalence: The case of referring expressions generation. In Proceedings of the Eighth European Workshop on Natural Language Generation, Toulouse, France. van Deemter, K., van der Sluis, I., and Gatt, A. 2006. Building a semantically transparent corpus for the generation of referring expressions. In Proceedings of the Fourth International Natural Language Generation Conference, 130–132, Sydney, Australia. Viethen, J. and Dale, R. 2006. Algorithms for generating referring expressions: Do they do what people do? In Proceedings of the Fourth International Natural Language Generation Conference, 63–70, Sydney, Australia.
Error correction using utterance disambiguation techniques Peter Vlugter and Edwin van der Ham and Alistair Knott Dept of Computer Science University of Otago Abstract This paper describes a mechanism for identifying errors made by a student during a computer-aided language learning dialogue. The mechanism generates a set of ‘perturbations’ of the student’s original typed utterance, each of which embodies a hypothesis about an error made by the student. Perturbations are then passed through the system’s ordinary utterance interpretation pipeline, along with the student’s original utterance. An utterance disambiguation algorithm selects the best interpretation, performing error correction as a side-effect. 1 Introduction The process of identifying and correcting the errors in an input utterance has been extensively studied. Kukich (1992) discusses three progressively more difficult tasks. The first task is the identification of nonwords in the utterance. A nonword is by definition a word which is not part of the language, and which therefore must must have been misspelled or mistyped. The difficulty in identifying nonwords is due to the impossibility of assembling a list of all the actual words in a language; some classes of actual words (in particular proper names) are essentially unbounded. So the system should have a reliable way of identifying when an unknown word is likely to belong to such a class. The second task is to suggest corrections for individual nonwords, based purely on their similarity to existing words, and a model of the likelihood of different sorts of errors. This task is performed quite well by the current generation of spellcheckers, and is to some extent a solved problem. The third task is to detect and correct valid word errors—that is, errors which have resulted in words (for instance their misspelled as there). This task requires the use of context: the error can only be detected by identifying that a word is out of place in its current context. Probabilistic language models which estimate the likelihood of words based on their neighbouring words have been used quite successfully to identify valid word errors (see e.g. Golding and Schabes (1996); Mangu and Brill (1997); Brill and Moore (2000)); however, there are situations where these techniques are not able to identify the presence of errors, and a richer model of context is needed, making reference to syntax, semantics or pragmatics. In summary, two of the outstanding problems in automated error correction are identifying proper names and detecting and correcting errors which require a sophisticated model of context. In this paper, we consider a domain where these two problems arise with particular force: computer-aided language learning dialogues (or CALL dialogues. In this domain, the system plays the role of a language tutor, and the user is a student learning a target language: the student engages with the system in a dialogue on some preset topic. One of the system’s key roles is to identify errors in the student’s utterances and to correct these errors (either indirectly, by prompting the student, or directly, by reporting what the student should have said). In either case, it is crucial that the system makes correct diagnoses about student errors. While a regular spell-checker is relatively passive, simply identifying possibly misspelled words, a language tutor frequently takes interventions when detecting errors, and initiates subdialogues aimed at correcting them. (Of course, a tutor may choose to ignore some of the errors she Proceedings of the 2006 Australasian Language Technology Workshop (ALTW2006), pages 121–128. 121
Page 1:
Proceeding of the Australasian Lang
Page 4 and 5:
Program Chairs: Committees Lawrence
Page 6 and 7:
Towards the Evaluation of Referring
Page 8 and 9:
The cat ate the hat that I made NP/
Page 10 and 11:
Figure 2: Cells affected by adding
Page 12 and 13:
were used because the accuracy for
Page 14 and 15:
TIME ACC COVER FAIL RATE % secs % %
Page 16 and 17:
model. We explore different estimat
Page 18 and 19:
S.S. Source Sense Source S.S. Acc.
Page 20 and 21:
acy. The difference in correctly as
Page 22 and 23:
Experiments with Sentence Classific
Page 24 and 25:
datasets with skewed distribution t
Page 26 and 27:
words produced by the feature selec
Page 28 and 29:
Sentence Class NB DT SVM Apology 1.
Page 30 and 31:
Computational Semantics in the Natu
Page 32 and 33:
S[sem = ] -> NP[sem=?subj] VP[sem=?
Page 34 and 35:
Any string which cannot be decompos
Page 36 and 37:
satisfy(Formula,model(D,F),G,pos):n
Page 38 and 39:
Classifying Speech Acts using Verba
Page 40 and 41:
The principles are dichotomous —
Page 42 and 43:
ance. Several additional example ut
Page 44 and 45:
our results. In classifying only th
Page 46 and 47:
Word Relatives in Context for Word
Page 48 and 49:
create a collection of training exa
Page 50 and 51:
Query Sense The nave was rebuilt in
Page 52 and 53:
Algorithm Avg S2LS Avg S3LS Avg S2L
Page 54 and 55:
B. Snyder and M. Palmer. 2004. The
Page 56 and 57:
it is even used as a black box that
Page 58 and 59:
ecognition in QA, so we will not us
Page 60 and 61:
BPER ILOC IPER BLOC BLOC BDATE BLOC
Page 62 and 63:
of noise introduced by the addition
Page 64 and 65:
2 Existing annotated corpora Much o
Page 66 and 67:
Our FUSE|tel spectrum of HD|sta 738
Page 68 and 69:
N Word Correct Tagged 23 OH mol non
Page 70 and 71:
L ATEX typesetting information into
Page 72 and 73:
in English, neither the determiner
Page 74 and 75:
con 4 (Sagot et al., 2006) and Morp
Page 76 and 77: Feature type Positions/description
Page 78 and 79: Miriam Butt, Helge Dyvik, Tracy Hol
Page 80 and 81: ently mostly solved by employing hu
Page 82 and 83: Figure 1: Augmented SCT Lexicon Tok
Page 84 and 85: medical terms can be composed by ad
Page 86 and 87: References Aronson, A. R. (2001). E
Page 88 and 89: technique to most question types, i
Page 90 and 91: User Question Analyser QA System Se
Page 92 and 93: Figure 2: Precision Figure 3: Cover
Page 94 and 95: Analysis and Review. Springer-Verla
Page 96 and 97: 2 Background 2.1 Pronoun Reference
Page 98 and 99: ous accounts of the effects of cohe
Page 100 and 101: Table 1: Overall Results for Object
Page 102 and 103: References Jennifer Arnold, Janet E
Page 104 and 105: In order for appropriate documents
Page 106 and 107: y Michael West. He found that his t
Page 108 and 109: low-scoring documents are “Click
Page 110 and 111: a) b) You must complete at least 9
Page 112 and 113: uct). In this paper, we will only c
Page 114 and 115: indicate the categories of substruc
Page 116 and 117: Argument lowering Dependent geach
Page 118 and 119: who [u : np] WH(np, s, s/?gq) λP.
Page 120 and 121: algorithms, and report the performa
Page 122 and 123: 2.3 Coverage of the Human Data Out
Page 124 and 125: and minimal subsets of the data. Of
Page 128 and 129: identifies, to avoid overwhelming t
Page 130 and 131: y the student is first parsed, usin
Page 132 and 133: a given word sequence Sorig as a pe
Page 134 and 135: A problem arises with this scheme w
Page 136 and 137: Example 1: Original Headline: Europ
Page 138 and 139: |relations(sentence1) ∩ relations
Page 140 and 141: 2685 True False 1275 Bleu Dep Bleu
Page 142 and 143: Features Acc. C1-prec. C1-recall C1
Page 144 and 145: Given the lack of research in VSD u
Page 146 and 147: ased features: Type 1 Original sibl
Page 148 and 149: Preposition of the adjunct semantic
Page 150 and 151: Algorithm 1 The algorithm of combin
Page 152 and 153: References Timothy Baldwin and Fran
Page 154 and 155: dependencies in German with respect
Page 156 and 157: wij moeten dit probleem aanpakken w
Page 158 and 159: a brevity penalty of BLEU n = 1 n =
Page 160 and 161: plicitly matching the syntax of the
Page 162 and 163: Probabilities improve stress-predic
Page 164 and 165: Towards Cognitive Optimisation of a
Page 166 and 167: Natural Language Processing and XML
Page 168 and 169: Extracting Patient Clinical Profile
show all

Automatic Mapping Clinical Notes to Medical - RMIT University

Create successful ePaper yourself

Delete template?

Save as template?