Automatic Mapping Clinical Notes to Medical - RMIT University


con 4 (Sagot et al., 2006) and Morphalou 5 (Romary et al., 2004), a lexicon of inflected forms, we automatically annotate tokens for which the sources predict an unambiguous gender, and hand-annotate ambiguous tokens using contextual information. These ambiguous tokens are generally animate nouns like collègue, which are masculine or feminine according to their referent, or polysemous nouns like aide, whose gender depends on the applicable sense.

3.3 POS Taggers

Again, TIGER comes annotated with hand-corrected part-of-speech tags, while the BAF does not. For consistency, we tag both corpora with TreeTagger 6 (Schmid, 1994), a decision tree–based probabilistic tagger trained on both German and French text. We were interested in the impact of the tagger's accuracy compared to the corrected judgements in the corpus, as an extrinsic evaluation of tagger performance. The tagger's token accuracy with respect to the TIGER judgements was about 96%, with many of the confusion pairs being common nouns mistaken for proper nouns (German's uniform capitalisation of nouns makes the distinction less predictable).

4 Deep Lexical Acquisition

We use a deep lexical acquisition approach similar to Baldwin (2005) to predict a lexical property. In our case, we predict gender and restrict ourselves to languages other than English. Baldwin examines the relative performance of various linguistically-motivated features based on morphology, syntax, and an ontology at predicting the lexical types in the ERG: the so-called "bang for the buck" of language resources.

To take a similar approach, we extract all of the common nouns (labelled NN) from each of the corpora to form both a token and a type data set. We generate our feature set independently over the token data set and the type data set for both morphological and syntactic features (explained below) without feature selection, then perform 10-fold stratified cross-validation using a nearest-neighbour classifier (TiMBL 5.0: Daelemans et al. (2003)) with the default k = 1 for evaluation. A summary of the corpora appears in Table 1.
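The evaluation setup above can be sketched as follows. This is a minimal pure-Python approximation, not the actual TiMBL implementation: `stratified_folds` and `one_nn` are illustrative names, feature extraction is assumed to have already produced a feature set per instance, and feature overlap stands in for TiMBL's similarity metric.

```python
from collections import defaultdict

def stratified_folds(labels, k=10):
    """Deal the instances of each class round-robin into k folds, so
    that class proportions are roughly preserved in every fold."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for j, i in enumerate(indices):
            folds[j % k].append(i)
    return folds

def one_nn(train, test_feats):
    """Label a test instance with the label of its nearest training
    instance, scoring similarity as feature overlap (a rough stand-in
    for TiMBL's IB1 with the default k = 1)."""
    best_label, best_overlap = None, -1
    for feats, label in train:
        overlap = len(feats & test_feats)
        if overlap > best_overlap:
            best_label, best_overlap = label, overlap
    return best_label
```

Each fold in turn serves as the test set, with the remaining nine as training data, and accuracies are averaged across folds.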

4 http://www.lefff.net
5 http://www.cnrtl.fr/morphalou
6 http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger


Corpus   Tokens   NN Tokens   NN Types
TIGER     900K        180K        46K
BAF       450K        110K         8K

Table 1: A summary of the two corpora: TIGER for German and BAF for French. The comparatively low number of noun types in BAF is caused by domain specificity.

4.1 Morphological DLA

Morphology-based deep lexical acquisition is based on the hypothesis that words with a similar morphology (affixes) have similar lexical properties. Using character n-grams is a simple approach that does not require any language resources other than a set of pre-classified words from which to bootstrap.

For each (token or type) instance, we generate all of the uni-, bi-, and tri-grams from the wordform, taken from the left (prefixes and infixes) and the right (suffixes and infixes), padded to the length of the longest word in the data. For example, the 1-grams for fenêtre above would be f, e, n, ê, t, r, e, #, #, ... from the left (L) and e, r, t, ê, n, e, f, #, #, ... from the right (R).

We evaluate using each of 1-, 2-, and 3-grams, as well as the combination of 1- and 2-grams, and of 1-, 2-, and 3-grams, taken from L, R, and both (LR), making 5 × 3 = 15 experiments.
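The padded n-gram extraction can be sketched as follows; the function name and `pad_len` parameter are illustrative, with `pad_len` standing in for the length of the longest word in the data set.

```python
def ngram_features(word, n, pad_len, pad="#"):
    """Positional character n-grams taken from the left (L) and the
    right (R) of a word, each padded out to pad_len positions."""
    left = word + pad * (pad_len - len(word))        # word read left-to-right
    right = word[::-1] + pad * (pad_len - len(word)) # word read right-to-left
    L = [left[i:i + n] for i in range(pad_len - n + 1)]
    R = [right[i:i + n] for i in range(pad_len - n + 1)]
    return L, R
```

For fenêtre with `pad_len = 10`, the 1-grams from the left come out as f, e, n, ê, t, r, e, #, #, # and from the right as e, r, t, ê, n, e, f, #, #, #, matching the example above.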

Other resources for morphological analysis exist, such as derivational morphological analysers, lemmatisers, and stemmers. We do not include them, or their information where it is available in our corpora, taking the stance that such tools will not be readily available for most languages.

4.2 Syntactic DLA

Syntax-based deep lexical acquisition rests on the hypothesis that a word's lexical properties can be predicted from the words which surround it. Most languages have at least local morphosyntax, meaning that morphosyntactic and syntactico-semantic properties are attested in the surrounding words.

For each token, we examine the four word forms to the left and right, the POS tags of these word forms, and corresponding bi-word and bi-tag features, following Baldwin (2005). For each type, we take the best N of each feature across all of the relevant tokens.
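The token-level feature extraction can be sketched as follows. The function and feature names are illustrative, not the paper's actual implementation, and padding out-of-sentence positions is an assumption about how sentence boundaries are handled.

```python
def context_features(tokens, tags, i, window=4, pad="<pad>"):
    """Word and POS-tag features from the four positions either side
    of token i, plus bi-word and bi-tag features spanning adjacent
    context positions; positions outside the sentence are padded."""
    def at(seq, j):
        return seq[j] if 0 <= j < len(seq) else pad
    feats = {}
    for d in range(1, window + 1):
        feats[f"w-{d}"] = at(tokens, i - d)  # word d positions to the left
        feats[f"w+{d}"] = at(tokens, i + d)  # word d positions to the right
        feats[f"t-{d}"] = at(tags, i - d)    # tag d positions to the left
        feats[f"t+{d}"] = at(tags, i + d)    # tag d positions to the right
        # bi-word and bi-tag features over adjacent context positions
        feats[f"bw-{d}"] = at(tokens, i - d - 1) + "_" + at(tokens, i - d)
        feats[f"bw+{d}"] = at(tokens, i + d) + "_" + at(tokens, i + d + 1)
        feats[f"bt-{d}"] = at(tags, i - d - 1) + "_" + at(tags, i - d)
        feats[f"bt+{d}"] = at(tags, i + d) + "_" + at(tags, i + d + 1)
    return feats
```

A type-level instance would then keep the best N values of each feature key across all tokens of that type.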

We examine both left (preceding, in languages written left-to-right) and right (following) context
