Automatic Mapping Clinical Notes to Medical - RMIT University

More documents

Recommendations

Info

$journal of latex class files, vol. 6, no. 1, january ... - RMIT University$

identifies, to avoid overwhelming the student with negative feedback, or to concentrate on a particular educational topic. However, it is in the nature of a tutorial dialogue that the tutor frequently picks up on a student’s errors.) In this domain, therefore, it is particularly important to get error correction right. The focus of the paper is on our system’s mechanism for error correction, which is tightly integrated with the mechanism for utterance disambiguation. Our claim is that a semantically rich utterance disambiguation scheme can be extended relatively easily to support a sophisticated model of error correction, including the syntactic and semantic errors which are hard for surface based ngram models of context. We will begin in Section 2 by reviewing the kinds of error found in language-learning dialogues. In Section 3, we discuss some different approaches to modelling language errors, and outline our own approach. In Section 4 we introduce our dialogue-based CALL system. In Section 5 we discuss our approach to error correction in detail: the basic suggestion is to create perturbations of the original sentence and interpret these alongside the original sentence, letting the regular utterance disambiguation module decide which interpretation is most likely. In Section 6 we discuss some examples of our system in action, and in Section 7, we discuss how the model can be extended with a treatment of unknown words. 2 The types of error found in language-learning dialogues Learners of a language can be expected to make more errors than native speakers. If we restrict our errors to those present in typed utterances, some types of error are essentially the same—in particular, we can expect a similar proportion of typos in learners as in native speakers. Other types of error will be qualitatively similar to those made by native speakers, but quantitatively more prevalent— for instance, we expect to find more spelling mistakes in learners than in native speakers, but the mechanisms for detecting and correcting these are likely to be similar. However, there are some types of error which we are likely to find only in language learners. We will consider two examples here. Firstly, there are grammatical errors. The learner of a language does not have a firm grasp 122 of the grammatical rules of the language, and is likely to make mistakes. These typically result in syntactically ill-formed sentences: (1) T: How are you feeling? 1 S: I feeling well. It may be that a bigram-based technique can identify errors of this kind. But it is less likely that such a technique can reliably correct such errors. Corrections are likely to be locally suitable (i.e. within a window of two or three words), but beyond this there is no guarantee that the corrected sentence will be grammatically correct. In a CALL system, great care must be taken to ensure that any corrections suggested are at least syntactically correct. Another common type of errors are vocabulary errors. Learners often confuse one word for another, either during interpretation of the tutor’s utterances or generation of their own utterances. These can result in utterances which are syntactically correct, but factually incorrect. (2) T: Where is the bucket? S: It is on the flour. [meaning ‘floor’] (Note that vocabulary errors can manifest themselves as grammatical errors if the wrongly used word is of a different syntactic category.) To detect errors of this sort, the system must have a means of checking utterances against a model of relevant facts in the world. Thirdly, there are pragmatic errors, which involve an utterance which is out of place in the current dialogue context. (3) T: How are you feeling? S: You are feeling well. These errors can result from a failure to comprehend something in the preceding dialogue, or from a grammatical or vocabulary error which happens to result in a syntactically well-formed sentence. To detect and correct errors of this type, a model of coherent dialogue is needed—in particular, a model of the relationship between questions and answers. These three types of error are relatively common in language-learning dialogues. Detecting them requires relatively deep syntactic and semantic processing of the utterances in the dialogue. 1 T stands for ‘tutor’ in these examples, and S stands for ‘student’.
Such processing is not yet feasible in an unrestricted dialogue with a native speaker—however, in a language-learning dialogue, there are several extra constraints which make it more feasible. Firstly, a language learner has a much smaller grammar and vocabulary than a native speaker. Thus it may well be feasible to build a grammar which covers all of the constructions and words which the user currently knows. If the system’s grammar is relatively small, it may be possible to limit the explosion of ambiguities which are characteristic of wide-coverage grammars. Secondly, the semantic domain of a CALL dialogue is likely to be quite well circumscribed. For one thing, the topics of conversation are limited by the syntax and vocabulary of the student. In practice, the topic of conversation is frequently dictated by the tutor; there is a convention that the tutor is responsible for determining the content of languagelearning exercises. Students are relatively happy to engage in semantically trivial dialogues when learning a language, because the content of the dialogue is not the main point; it is simply a means to the end of learning the language. In summary, while CALL dialogues create some special problems for an error correction system, they also well suited to the deep utterance interpretation techniques which are needed to provide the solutions to these problems. 3 Alternative frameworks for modelling language errors There are several basic schemes for modelling language errors. One scheme is to construct specialised error grammars, which explicitly express rules governing the structures of sentences containing errors. The parse tree for an errorcontaining utterance then provides very specific information about the error that has been made. We have explored using a system of this kind (Vlugter et al., (2004), and others have pursued this direction quite extensively (see e.g. Michaud et al. (2001); Bender et al. (2004); Foster and Vogel (2004)). This scheme can be very effective— however, creating the error rules is a very specialised job, which has to be done by a grammar writer. We would prefer a system which makes it easy for language teachers to provide input about the most likely types of error made by students. Another scheme is to introduce ways of relaxing the constraints imposed by a grammar if a sen- 123 tence cannot be parsed. The relaxation which results in a successful parse provides information about the type of error which has occurred. This technique has been used effectively by Menzel and Schröder (1998), and similar techniques have been used by Fouvry (2003) for robust parsing. However, as Foster and Vogel note, the technique has problems dealing with errors involving additions or deletions of whole words. In addition, the model of errors is again something considerably more complicated than the models which teachers use when analysing students’ utterances and providing feedback. In the scheme we propose, the parser is left unchanged, only accepting syntactically correct sentences; however, more than one initial input string is sent to the parser. In our scheme, the student’s utterance is first permuted in different ways, in accordance with a set of hypotheses about wordlevel or character-level errors which might have occurred. There are two benefits to this scheme. Firstly, hypotheses are expressed a ‘surface’ level, in a way which is easy for non-specialists to understand. Secondly, creating multiple input strings in this way allows the process of error correction to be integrated neatly with the process of utterance disambiguation, as will be explained below. 4 Utterance interpretation and disambiguation in our dialogue system Our CALL dialogue system, called Te Kaitito (Vlugter et al. (2004); Knott (2004); Slabbers and Knott (2005)) is designed to assist a student to learn M āori. The system can ‘play’ one or more characters, each of which enters the dialogue with a private knowledge base of facts and an agenda of dialogue moves to make (principally questions to ask the student about him/herself). Each lesson is associated with an agenda of grammatical constructions which the student must show evidence of having assimilated. The system supports a mixed-initiative multi-speaker dialogue: system characters generate initiatives which (if possible) are relevant to the current topic, and feature grammatical constructions which the student has not yet assimilated. System characters can also ask ‘checking’ questions, to explicitly check the student’s assimilation of material presented earlier in the dialogue. The system’s utterance interpretation mechanism takes the form of a pipeline. An utterance
Page 1:
Proceeding of the Australasian Lang
Page 4 and 5:
Program Chairs: Committees Lawrence
Page 6 and 7:
Towards the Evaluation of Referring
Page 8 and 9:
The cat ate the hat that I made NP/
Page 10 and 11:
Figure 2: Cells affected by adding
Page 12 and 13:
were used because the accuracy for
Page 14 and 15:
TIME ACC COVER FAIL RATE % secs % %
Page 16 and 17:
model. We explore different estimat
Page 18 and 19:
S.S. Source Sense Source S.S. Acc.
Page 20 and 21:
acy. The difference in correctly as
Page 22 and 23:
Experiments with Sentence Classific
Page 24 and 25:
datasets with skewed distribution t
Page 26 and 27:
words produced by the feature selec
Page 28 and 29:
Sentence Class NB DT SVM Apology 1.
Page 30 and 31:
Computational Semantics in the Natu
Page 32 and 33:
S[sem = ] -> NP[sem=?subj] VP[sem=?
Page 34 and 35:
Any string which cannot be decompos
Page 36 and 37:
satisfy(Formula,model(D,F),G,pos):n
Page 38 and 39:
Classifying Speech Acts using Verba
Page 40 and 41:
The principles are dichotomous —
Page 42 and 43:
ance. Several additional example ut
Page 44 and 45:
our results. In classifying only th
Page 46 and 47:
Word Relatives in Context for Word
Page 48 and 49:
create a collection of training exa
Page 50 and 51:
Query Sense The nave was rebuilt in
Page 52 and 53:
Algorithm Avg S2LS Avg S3LS Avg S2L
Page 54 and 55:
B. Snyder and M. Palmer. 2004. The
Page 56 and 57:
it is even used as a black box that
Page 58 and 59:
ecognition in QA, so we will not us
Page 60 and 61:
BPER ILOC IPER BLOC BLOC BDATE BLOC
Page 62 and 63:
of noise introduced by the addition
Page 64 and 65:
2 Existing annotated corpora Much o
Page 66 and 67:
Our FUSE|tel spectrum of HD|sta 738
Page 68 and 69:
N Word Correct Tagged 23 OH mol non
Page 70 and 71:
L ATEX typesetting information into
Page 72 and 73:
in English, neither the determiner
Page 74 and 75:
con 4 (Sagot et al., 2006) and Morp
Page 76 and 77:
Feature type Positions/description
Page 78 and 79: Miriam Butt, Helge Dyvik, Tracy Hol
Page 80 and 81: ently mostly solved by employing hu
Page 82 and 83: Figure 1: Augmented SCT Lexicon Tok
Page 84 and 85: medical terms can be composed by ad
Page 86 and 87: References Aronson, A. R. (2001). E
Page 88 and 89: technique to most question types, i
Page 90 and 91: User Question Analyser QA System Se
Page 92 and 93: Figure 2: Precision Figure 3: Cover
Page 94 and 95: Analysis and Review. Springer-Verla
Page 96 and 97: 2 Background 2.1 Pronoun Reference
Page 98 and 99: ous accounts of the effects of cohe
Page 100 and 101: Table 1: Overall Results for Object
Page 102 and 103: References Jennifer Arnold, Janet E
Page 104 and 105: In order for appropriate documents
Page 106 and 107: y Michael West. He found that his t
Page 108 and 109: low-scoring documents are “Click
Page 110 and 111: a) b) You must complete at least 9
Page 112 and 113: uct). In this paper, we will only c
Page 114 and 115: indicate the categories of substruc
Page 116 and 117: Argument lowering Dependent geach
Page 118 and 119: who [u : np] WH(np, s, s/?gq) λP.
Page 120 and 121: algorithms, and report the performa
Page 122 and 123: 2.3 Coverage of the Human Data Out
Page 124 and 125: and minimal subsets of the data. Of
Page 126 and 127: output. Finally, and related to the
Page 130 and 131: y the student is first parsed, usin
Page 132 and 133: a given word sequence Sorig as a pe
Page 134 and 135: A problem arises with this scheme w
Page 136 and 137: Example 1: Original Headline: Europ
Page 138 and 139: |relations(sentence1) ∩ relations
Page 140 and 141: 2685 True False 1275 Bleu Dep Bleu
Page 142 and 143: Features Acc. C1-prec. C1-recall C1
Page 144 and 145: Given the lack of research in VSD u
Page 146 and 147: ased features: Type 1 Original sibl
Page 148 and 149: Preposition of the adjunct semantic
Page 150 and 151: Algorithm 1 The algorithm of combin
Page 152 and 153: References Timothy Baldwin and Fran
Page 154 and 155: dependencies in German with respect
Page 156 and 157: wij moeten dit probleem aanpakken w
Page 158 and 159: a brevity penalty of BLEU n = 1 n =
Page 160 and 161: plicitly matching the syntax of the
Page 162 and 163: Probabilities improve stress-predic
Page 164 and 165: Towards Cognitive Optimisation of a
Page 166 and 167: Natural Language Processing and XML
Page 168 and 169: Extracting Patient Clinical Profile
show all

Automatic Mapping Clinical Notes to Medical - RMIT University

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?