Automatic Mapping Clinical Notes to Medical - RMIT University
model. We explore different estimates of the most frequent super-sense, and find that aggregating the counts of the fine-grained senses is a significantly better heuristic for this task.

We believe that one of the reasons coarse-grained senses are useful is that they largely ameliorate the inter-annotator agreement issues of fine-grained sense tagging. Not only are there fewer senses to choose from, but the senses are more distinct, and therefore should be less easily confused. The super-sense tags are therefore probably less noisy than the fine-grained senses, which would make corpus-based estimates of their frequency more reliable. The best performing fine-grained frequency heuristic we present exploits this property of supersenses, by making the assumption that the most frequent sense of a word will be the most frequent member of the most frequent supersense. Essentially, when the overall most frequent sense of a word is a member of a minority supersense, we find that it is better to avoid selecting that sense. This system scores 63.8% on the SENSEVAL 3 all-words task, significantly higher than the relevant baseline of 62.5%, while using no contextual information at all.
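The heuristic just described can be sketched in a few lines. This is a minimal illustration over an invented toy sense inventory (the word, sense identifiers, supersenses, and counts are all made up); in the real system the counts come from WordNet or SemCor:

```python
from collections import defaultdict

# Toy inventory for one word: (sense_id, supersense, corpus count).
# All identifiers and counts here are invented for illustration.
senses = [
    ("bass.1", "noun.food",   10),  # overall most frequent sense
    ("bass.2", "noun.animal",  6),
    ("bass.3", "noun.animal",  5),
]

def first_sense_via_supersense(senses):
    """Aggregate fine-grained counts per supersense, then return the
    most frequent sense within the most frequent supersense."""
    totals = defaultdict(int)
    for _, supersense, count in senses:
        totals[supersense] += count
    best_supersense = max(totals, key=totals.get)
    candidates = [s for s in senses if s[1] == best_supersense]
    return max(candidates, key=lambda s: s[2])[0]
```

Here noun.animal wins on aggregate (6 + 5 = 11 vs. 10), so the heuristic selects bass.2 and avoids the overall most frequent sense bass.1, which belongs to a minority supersense.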
2 Preliminaries: WordNet Sense Ranks, Frequencies and Supersenses
This paper discusses different methods of selecting a ‘default’ sense of a word from the WordNet (Fellbaum, 1998) sense inventory. These methods draw on four different sources of information associated with the lexicon: WordNet sense ranks, WordNet sense counts, the SemCor sense-tagged corpus, and WordNet lexical file numbers.

Each word entry in WordNet consists of a lemma under a part-of-speech and an inventory of its senses. These senses are ranked by the lexicographers according to their frequencies in “various semantic concordance texts” (Fellbaum, 1998). These frequencies are often given in the database. We refer to them as WordNet counts to distinguish them from the frequencies we obtain from the SemCor corpus. The SemCor corpus is a subset of the semantic concordance texts used to calculate WordNet counts.
Each WordNet sense is categorised under one of forty-five lexicographer files. Each lexicographer file covers only one part of speech. The main categorisation is applied to nouns and verbs, as there is only one file for adverbs and three for adjectives.
Lexical files are interesting because they represent broad, or coarse-grained, semantic categories, and therefore offer a way around the commonly noted problem that WordNet senses are generally too fine-grained. We describe a first-sense heuristic that takes advantage of this property of the lexicographer files (often referred to as ‘supersenses’ (Ciaramita and Johnson, 2003); we use both terms interchangeably). We also discuss first-supersense heuristics, as increasing attention is being paid to supervised supersense tagging (Ciaramita and Altun, 2006).
3 First Order Models for Word Sense Disambiguation
Supervised word sense disambiguation (WSD) is the task of finding the most likely sense s from a sense inventory S given a usage context C:

    arg max_{s ∈ S} P(s | C)    (1)

These models are usually compared to models of the form:

    arg max_{s ∈ S} P(s)    (2)

in order to evaluate how much the context is informing the model. Since we are primarily interested in the argmax, it is usually sufficient to simply define a function that selects the most likely sense, even if a full probability distribution is not defined. Selecting the sense listed first in WordNet is one such function.
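With P(s) estimated from corpus counts, Equation (2) reduces to picking the sense with the highest count. A one-line sketch (the sense labels and counts below are invented for illustration):

```python
def most_frequent_sense(sense_counts):
    """Context-free arg max of P(s): return the sense with the highest
    count. Any full distribution sharing this argmax behaves the same."""
    return max(sense_counts, key=sense_counts.get)

# Invented counts for illustration.
print(most_frequent_sense({"bank.1": 25, "bank.2": 20, "bank.3": 2}))
```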
As a WSD system, a first order model has an inherent upper bound: if a word is used with more than one sense, a first order model cannot get every example correct. However, Figure 1 shows that on the SENSEVAL 3 data, this upper bound is far higher than the performance of state-of-the-art WSD systems. The upper bound was calculated by selecting the most frequent sense of each word in the test data. It is effectively a system with oracle frequency information. Because the correct sense will always be given to words that only occur once, it is interesting to see how the upper bound decays if the system is forced to use the first-sense heuristic instead for words that occur fewer than n times in the test data. For n > 14, the oracle system either falls back to or makes the same prediction as the first-sense system for every instance, and so the systems are effectively identical.
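The decay experiment can be reproduced in outline as follows. This is a sketch over an invented toy test set, not the SENSEVAL 3 data: the oracle picks each word's most frequent gold sense in the test data itself, falling back to the inventory's first sense for words seen fewer than n times.

```python
from collections import Counter

def oracle_upper_bound(instances, first_sense, n=1):
    """instances: list of (word, gold_sense) pairs from the test data.
    first_sense: word -> first sense in the inventory (the fallback)."""
    word_freq = Counter(word for word, _ in instances)
    pair_freq = Counter(instances)
    # Oracle prediction: most frequent gold sense of each word in the test data.
    oracle = {word: max((s for w, s in pair_freq if w == word),
                        key=lambda s: pair_freq[(word, s)])
              for word in word_freq}
    correct = sum(
        (oracle[word] if word_freq[word] >= n else first_sense[word]) == gold
        for word, gold in instances)
    return correct / len(instances)

# Invented toy data: 'run' occurs only once, so it degrades first at n = 2.
instances = [("bank", "b1"), ("bank", "b1"), ("bank", "b2"), ("run", "r2")]
first = {"bank": "b1", "run": "r1"}
print(oracle_upper_bound(instances, first, n=1))  # oracle everywhere
print(oracle_upper_bound(instances, first, n=2))  # 'run' falls back
```

Raising n only ever swaps oracle predictions for first-sense ones, so the curve is monotonically non-increasing until the two systems coincide.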
McCarthy et al. (2004) described a first order word sense disambiguation system that ac-