Automatic Mapping Clinical Notes to Medical - RMIT University

More documents

Recommendations

Info

$journal of latex class files, vol. 6, no. 1, january ... - RMIT University$

algorithms, and report the performance of these algorithms in the chosen test domain. This leads us to identify three specific issues that arise for the evaluation of referring expression generation algorithms, and for NLG systems in general; we discuss these in the subsequent sections of the paper. Section 3 looks at the problem of input representations; Section 4 explores how the wide variety of acceptable outputs, and the lack of a single correct answer, makes it hard to assess generation algorithms; and Section 5 explores whether we can usefully provide a numeric measure of the performance of a generation algorithm. Finally, in Section 6 we point to some ways forward. 2 An Evaluation Experiment In (Viethen and Dale, 2006), we observed that surprisingly little existing work in natural language generation compares its output with natural language generated by humans, and argued that such a comparison is essential. To this end, we carried out an experiment consisting of three steps: 1. the collection of natural referring expressions for objects in a controlled domain, and the subsequent analysis of the data obtained; 2. the implementation of a knowledge base corresponding to the domain, and the reimplementation of three existing algorithms to operate in that domain; and 3. a detailed assessment of the algorithms’ performance against the set of human-produced referring expressions. In the remainder of this section we briefly describe these three stages. As we are mainly concerned here with the evaluation process, we refer to (Viethen and Dale, 2006) for a more detailed account of the experimental settings and an in-depth discussion of the results for the individual algorithms. 2.1 The Human-Generated Data Our test domain consists of four filing cabinets, each containing four vertically arranged drawers. The cabinets are placed directly next to each other, so that the drawers form a four-by-four grid as shown in Figure 1. Each drawer is labelled with a number between 1 and 16 and is coloured either blue, pink, yellow, or orange. There are four drawers of each colour distributed randomly over the grid. 114 Figure 1: The filing cabinets The human participants were given, on a number of temporally-separated occasions, a random number between 1 and 16, and then asked to provide a description of the corresponding drawer to an onlooker without using any of the numbers; this basically restricted the subjects to using either colour, location, or some combination of both to identify the intended referent. The characterisation of the task as one that required the onlooker to identify the drawer in question meant that the referring expressions produced had to be distinguishing descriptions; that is, each referring expression had to uniquely refer to the intended referent, but not to any of the other objects in the domain. The set of natural data we obtained from this experiment contains 140 descriptions. We filtered out 22 descriptions that were (presumably unintentionally) ambiguous or used reference to sets of drawers rather than only single drawers. As none of the algorithms we wanted to test aims to produce ambiguous referring expressions or handle sets of objects, it is clear that they would not be able to replicate these 22 descriptions. Thus the final set of descriptions used for the evaluation contained 118 distinct referring expressions. Referring expression generation algorithms typically are only concerned with selecting the semantic content for a description, leaving the details of syntactic realisation to a later stage in the language production process. We are therefore only interested in the semantic differences between the descriptions in our set of natural data, and not in superficial syntactic variations. The
primary semantic characteristics of a referring expression are the properties of the referent used to describe it. So, for example, the following two referring expressions for drawer d3 are semantically different: (1) The pink drawer in the first row, third column. (2) The pink drawer in the top. For us these are distinct referring expressions. We consider syntactic variation, on the other hand, to be spurious; so, for example, the following two expressions, which demonstrate the distinction between using a relative clause and a reduced relative, are assumed to be semantically identical: (3) The drawer that is in the bottom right. (4) The drawer in the bottom right. We normalised the human-produced data to remove syntactic surface variations such as these, and also to normalise synonymic variation, as demonstrated by the use of the terms column and cabinet, which in our context carry no difference in meaning. The resulting set of data effectively characterises each human-generated referring expression in terms of the semantic attributes used in constructing those expressions. We can identify four absolute properties that the human participants used for describing the drawers: these are the colour of the drawer; its row and column; and in those cases where the drawer is located in one of the corners of the grid, what we might call cornerhood. A number of participants also made use of relations that hold between two or more drawers to describe the target drawer. The relational properties that occurred in the natural descriptions were: above, below, next to, right of, left of and between. However, relational properties were used a lot less than the other properties: 103 of the 118 descriptions (87.3%) did not use relations between drawers. Many referring expression generation algorithms aim to produce minimal, non-redundant descriptions. For a referring expression to be minimal means that all of the facts about the referent that are contained in the expression are essential for the hearer to be able to uniquely distinguish the referent from the other objects in the domain. If any part of the referring expression was dropped, 115 the description would become ambiguous; if any other information was added, the resulting expression would contain redundancy. Dale and Reiter (1995), in justifying the fact that their Incremental Algorithm would sometimes produce non-minimal descriptions, pointed out that human-produced descriptions are often not minimal in this sense. This observation has been supported more recently by a number of other researchers in the area, notably van Deemter and Halldórsson (2001) and Arts (2004). However, in the data from our experiment it is evident that the participants tended to produce minimal descriptions: only 24.6% of the descriptions (29 out of 118) contain redundant information. 2.2 The Algorithms Many detailed descriptions of algorithms are available in the literature on the generation of referring expressions. For the purpose of our evaluation experiment, we focussed here on three algorithms on which many subsequently developed algorithms have been based: • The Full Brevity algorithm (Dale, 1989) uses a greedy heuristic for its attempt to build a minimal distinguishing description. At each step, it always selects the most discriminatory property available. • The Relational Algorithm from (Dale and Haddock, 1991) uses constraint satisfaction to incorporate relational properties into the framework of the Full Brevity algorithm. It uses a simple mechanism to avoid infinite regress. • The Incremental Algorithm (Reiter and Dale, 1992; Dale and Reiter, 1995) considers the available properties to be used in a description via a predefined preference ordering over those properties. We re-implemented these algorithms and applied them to a knowledge base made up of the properties evidenced collectively in the human-generated data. We then analysed to which extent the output of the algorithms for each drawer was semantically equivalent to the descriptions produced by the human participants. The following section gives a short account of this analysis.
Page 1:
Proceeding of the Australasian Lang
Page 4 and 5:
Program Chairs: Committees Lawrence
Page 6 and 7:
Towards the Evaluation of Referring
Page 8 and 9:
The cat ate the hat that I made NP/
Page 10 and 11:
Figure 2: Cells affected by adding
Page 12 and 13:
were used because the accuracy for
Page 14 and 15:
TIME ACC COVER FAIL RATE % secs % %
Page 16 and 17:
model. We explore different estimat
Page 18 and 19:
S.S. Source Sense Source S.S. Acc.
Page 20 and 21:
acy. The difference in correctly as
Page 22 and 23:
Experiments with Sentence Classific
Page 24 and 25:
datasets with skewed distribution t
Page 26 and 27:
words produced by the feature selec
Page 28 and 29:
Sentence Class NB DT SVM Apology 1.
Page 30 and 31:
Computational Semantics in the Natu
Page 32 and 33:
S[sem = ] -> NP[sem=?subj] VP[sem=?
Page 34 and 35:
Any string which cannot be decompos
Page 36 and 37:
satisfy(Formula,model(D,F),G,pos):n
Page 38 and 39:
Classifying Speech Acts using Verba
Page 40 and 41:
The principles are dichotomous —
Page 42 and 43:
ance. Several additional example ut
Page 44 and 45:
our results. In classifying only th
Page 46 and 47:
Word Relatives in Context for Word
Page 48 and 49:
create a collection of training exa
Page 50 and 51:
Query Sense The nave was rebuilt in
Page 52 and 53:
Algorithm Avg S2LS Avg S3LS Avg S2L
Page 54 and 55:
B. Snyder and M. Palmer. 2004. The
Page 56 and 57:
it is even used as a black box that
Page 58 and 59:
ecognition in QA, so we will not us
Page 60 and 61:
BPER ILOC IPER BLOC BLOC BDATE BLOC
Page 62 and 63:
of noise introduced by the addition
Page 64 and 65:
2 Existing annotated corpora Much o
Page 66 and 67:
Our FUSE|tel spectrum of HD|sta 738
Page 68 and 69:
N Word Correct Tagged 23 OH mol non
Page 70 and 71: L ATEX typesetting information into
Page 72 and 73: in English, neither the determiner
Page 74 and 75: con 4 (Sagot et al., 2006) and Morp
Page 76 and 77: Feature type Positions/description
Page 78 and 79: Miriam Butt, Helge Dyvik, Tracy Hol
Page 80 and 81: ently mostly solved by employing hu
Page 82 and 83: Figure 1: Augmented SCT Lexicon Tok
Page 84 and 85: medical terms can be composed by ad
Page 86 and 87: References Aronson, A. R. (2001). E
Page 88 and 89: technique to most question types, i
Page 90 and 91: User Question Analyser QA System Se
Page 92 and 93: Figure 2: Precision Figure 3: Cover
Page 94 and 95: Analysis and Review. Springer-Verla
Page 96 and 97: 2 Background 2.1 Pronoun Reference
Page 98 and 99: ous accounts of the effects of cohe
Page 100 and 101: Table 1: Overall Results for Object
Page 102 and 103: References Jennifer Arnold, Janet E
Page 104 and 105: In order for appropriate documents
Page 106 and 107: y Michael West. He found that his t
Page 108 and 109: low-scoring documents are “Click
Page 110 and 111: a) b) You must complete at least 9
Page 112 and 113: uct). In this paper, we will only c
Page 114 and 115: indicate the categories of substruc
Page 116 and 117: Argument lowering Dependent geach
Page 118 and 119: who [u : np] WH(np, s, s/?gq) λP.
Page 122 and 123: 2.3 Coverage of the Human Data Out
Page 124 and 125: and minimal subsets of the data. Of
Page 126 and 127: output. Finally, and related to the
Page 128 and 129: identifies, to avoid overwhelming t
Page 130 and 131: y the student is first parsed, usin
Page 132 and 133: a given word sequence Sorig as a pe
Page 134 and 135: A problem arises with this scheme w
Page 136 and 137: Example 1: Original Headline: Europ
Page 138 and 139: |relations(sentence1) ∩ relations
Page 140 and 141: 2685 True False 1275 Bleu Dep Bleu
Page 142 and 143: Features Acc. C1-prec. C1-recall C1
Page 144 and 145: Given the lack of research in VSD u
Page 146 and 147: ased features: Type 1 Original sibl
Page 148 and 149: Preposition of the adjunct semantic
Page 150 and 151: Algorithm 1 The algorithm of combin
Page 152 and 153: References Timothy Baldwin and Fran
Page 154 and 155: dependencies in German with respect
Page 156 and 157: wij moeten dit probleem aanpakken w
Page 158 and 159: a brevity penalty of BLEU n = 1 n =
Page 160 and 161: plicitly matching the syntax of the
Page 162 and 163: Probabilities improve stress-predic
Page 164 and 165: Towards Cognitive Optimisation of a
Page 166 and 167: Natural Language Processing and XML
Page 168 and 169: Extracting Patient Clinical Profile
show all

Automatic Mapping Clinical Notes to Medical - RMIT University

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?