Automatic Mapping Clinical Notes to Medical - RMIT University


A problem arises with this scheme when it is used in conjunction with error-correction: wherever it is possible to use a proper name, hypothesising a proper name gives a higher aggregate score than hypothesising an error, all other things being equal. The problem is serious, because grammars typically allow proper names in many different places, in particular as preposed and postposed sentence adverbials functioning as addressee terms (see Knott et al. (2004)). To remedy this problem, it is important to attach a cost to the operation of hypothesising a proper name, comparable to that of hypothesising an error.

In our (as-yet unimplemented) combined unknown-word and error-correction scheme, if there is an unknown word which can be interpreted as a proper name, the lexicon is updated prior to parsing, and perturbations are created as usual. A special unknown word cost is associated with the original utterance and with each of these perturbations, except any perturbations which alter the unknown word (and thus do not rely on the hypothesised lexical item). The unknown word cost is another number between 0 and 1, and the aggregate score of an interpretation is multiplied by this number when deciding amongst alternative interpretations. The number is set to be lower than the average perturbation score. If any perturbations of the unknown word survive the parsing process, they stand a good chance of being preferred over the proper name hypothesis, or at least being presented as alternatives to it. We will experiment with this extension to the error-correction algorithm in future work.

References

E Bender, D Flickinger, S Oepen, A Walsh, and T Baldwin. 2004. Arboretum: Using a precision grammar for grammar checking in CALL. In Proceedings of the InSTIL/ICALL Symposium, pages 83–87.

E Brill and R Moore. 2000. An improved error model for noisy channel spelling correction. In Proceedings of ACL.

A Copestake and D Flickinger. 2000. An open-source grammar development environment and broad-coverage English grammar using HPSG. In Proceedings of LREC 2000, Athens, Greece.

A Copestake. 2000. The (new) LKB system. CSLI, Stanford University.

J Foster and C Vogel. 2004. Parsing ill-formed text using an error grammar. Artificial Intelligence Review, 21(3–4):269–291.

F Fouvry. 2003. Lexicon acquisition with a large-coverage unification-based grammar. In 10th Conference of the European Chapter of the Association for Computational Linguistics, Research notes and demos, Budapest, Hungary.

R Golding and Y Schabes. 1996. Combining trigram-based and feature-based methods for context-sensitive spelling correction. In Proceedings of the Thirty-Fourth Annual Meeting of the Association for Computational Linguistics.

J Hobbs, M Stickel, D Appelt, and P Martin. 1993. Interpretation as abduction. Artificial Intelligence, 63.

A Knott and P Vlugter. 2003. Syntactic disambiguation using presupposition resolution. In Proceedings of the 4th Australasian Language Technology Workshop (ALTW2003), Melbourne.

A Knott, I Bayard, and P Vlugter. 2004. Multi-agent human-machine dialogue: issues in dialogue management and referring expression semantics. In Proceedings of the 8th Pacific Rim Conference on Artificial Intelligence (PRICAI 2004), pages 872–881, Auckland. Springer Verlag: Lecture Notes in AI.

K Kukich. 1992. Techniques for automatically correcting words in text. Computing Surveys, 24(4):377–439.

P Lurcock, P Vlugter, and A Knott. 2004. A framework for utterance disambiguation in dialogue. In Proceedings of the 2004 Australasian Language Technology Workshop (ALTW), pages 101–108, Macquarie University.

P Lurcock. 2005. Techniques for utterance disambiguation in a human-computer dialogue system. MSc thesis, Dept of Computer Science, University of Otago.

L Mangu and E Brill. 1997. Automatic rule acquisition for spelling correction. In Proceedings of the 14th International Conference on Machine Learning.

W Menzel and I Schröder. 1998. Constraint-based diagnosis for intelligent language tutoring systems.

L Michaud, K McCoy, and L Stark. 2001. Modeling the acquisition of English: an intelligent CALL approach. In Proceedings of The 8th International Conference on User Modeling, pages 13–17, Sonthofen, Germany.

N Slabbers. 2005. A system for generating teaching initiatives in a computer-aided language learning dialogue. Technical Report OUCS-2005-02, Department of Computer Science, University of Otago, Dunedin, New Zealand.

E van der Ham. 2005. Diagnosing and responding to student errors in a dialogue-based computer-aided language-learning system. Technical Report OUCS-2005-06, Department of Computer Science, University of Otago, Dunedin, New Zealand.

M van Schagen and A Knott. 2004. Tauira: A tool for acquiring unknown words in a dialogue context. In Proceedings of the 2004 Australasian Language Technology Workshop (ALTW), pages 131–138, Macquarie University.

P Vlugter, A Knott, and V Weatherall. 2004. A human-machine dialogue system for CALL. In Proceedings of InSTIL/ICALL 2004: NLP and speech technologies in Advanced Language Learning Systems, pages 215–218, Venice.
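As an illustration of the unknown word cost described in the closing section of the paper above (before its references), the following is a minimal sketch only. The authors state that the scheme is as yet unimplemented, so every name here (`Interpretation`, `unknown_word_cost`, `adjusted_score`, `rank`), the `factor` parameter, and all scores are hypothetical. The source says only that the cost lies between 0 and 1 and is set lower than the average perturbation score; taking a fixed fraction of that average is an assumption made for this example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Interpretation:
    label: str                    # hypothetical name for the candidate reading
    aggregate_score: float        # score before any unknown-word cost is applied
    relies_on_unknown_word: bool  # True if the reading needs the hypothesised proper name


def unknown_word_cost(perturbation_scores, factor=0.9):
    """Pick a cost below the average perturbation score.

    The multiplicative factor is an assumption of this sketch; the paper
    only requires the cost to be lower than the average.
    """
    return factor * mean(perturbation_scores)


def adjusted_score(interp, cost):
    # Readings that alter the unknown word do not rely on the hypothesised
    # lexical item, so they are not penalised.
    if interp.relies_on_unknown_word:
        return interp.aggregate_score * cost
    return interp.aggregate_score


def rank(interps, cost):
    # Highest adjusted score first.
    return sorted(interps, key=lambda i: adjusted_score(i, cost), reverse=True)


# Illustrative numbers only; they are not taken from the paper.
candidates = [
    Interpretation("original, proper-name reading", 0.8, True),
    Interpretation("perturbation correcting the unknown word", 0.5, False),
    Interpretation("perturbation elsewhere, proper name kept", 0.4, True),
]
cost = unknown_word_cost([0.5, 0.4])
ranked = rank(candidates, cost)
for interp in ranked:
    print(f"{adjusted_score(interp, cost):.3f}  {interp.label}")
```

With these made-up numbers the proper-name reading starts with the highest aggregate score, but once the cost is applied the perturbation that corrects the unknown word is ranked first, which is the behaviour the paragraph describes as the goal of the cost.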
Using Dependency-Based Features to Take the “Para-farce” out of Paraphrase

Stephen Wan †‡, Mark Dras †, Robert Dale †, Cécile Paris ‡

† Center for Language Technology, Div of Information Communication Sciences, Macquarie University, Sydney, NSW 2113
swan,madras,rdale@ics.mq.edu.au

‡ Information and Communication Technologies, CSIRO, Sydney, Australia
Cecile.Paris@csiro.au

Abstract

As research in text-to-text paraphrase generation progresses, it has the potential to improve the quality of generated text. However, the use of paraphrase generation methods creates a secondary problem. We must ensure that generated novel sentences are not inconsistent with the text from which they were generated. We propose that a machine learning approach be used to filter out inconsistent novel sentences, or False Paraphrases. To train such a filter, we use the Microsoft Research Paraphrase corpus and investigate whether features based on syntactic dependencies can aid us in this task. Like Finch et al. (2005), we obtain a classification accuracy of 75.6%, the best known performance for this corpus. We also examine the strengths and weaknesses of dependency-based features and conclude that they may be useful in more accurately classifying cases of False Paraphrase.

1 Introduction

In recent years, interest has grown in paraphrase generation methods. The use of paraphrase generation tools has been envisaged for applications ranging from abstract-like summarisation (see for example, Barzilay and Lee (2003), Daumé and Marcu (2005), Wan et al. (2005)) and question-answering (for example, Marsi and Krahmer (2005)) to Machine Translation Evaluation (for example, Bannard and Callison-Burch (2005) and Yves Lepage (2005)). These approaches all employ a loose definition of paraphrase attributable to Dras (1999), who defines a ‘paraphrase pair’ operationally to be “a pair of units of text deemed to be interchangeable”. Notably, such a definition of paraphrase lends itself easily to corpus-based methods.

Furthermore, what the more modern approaches share is that they often generate new paraphrases from raw text, not from semantic representations. The generation of paraphrases from raw text is a specific type of what is commonly referred to as text-to-text generation (Barzilay and Lee, 2003).

As techniques for generating paraphrases improve and basic concerns such as grammaticality become less of an issue, we are faced with an additional concern: we must validate whether or not the generated novel sentence is in fact a paraphrase. It may be detrimental in some applications, for example abstract-like summarisation, to allow a novel sentence that is inconsistent with the content of the input text to be presented to the end user.

To illustrate the type of inconsistencies that can arise from paraphrase generation, Figure 1 presents two examples of generated sentences. In each example, a sentence pair is presented in which the second sentence was generated statistically from an input news article using a four-gram language model and a probabilistic word selection module. Although other paraphrase generation approaches differ in their underlying mechanisms¹, most generate a novel sentence that cannot be found verbatim in the input text. The generated second sentence of the example is intended to be a paraphrase of the article headline. One might be convinced that the first exam-

¹ The details of the generation algorithm used for this example are peripheral to the focus of this paper and we direct the interested reader to Wan et al. (2005) for more details.

Proceedings of the 2006 Australasian Language Technology Workshop (ALTW2006), pages 129–136.
