Automatic Mapping Clinical Notes to Medical - RMIT University
create a collection of training examples for each sense, and (iv) use an ML algorithm trained on the acquired collections to tag the test instances. This method has been applied in different works (Mihalcea, 2002; Agirre and Martinez, 2004). We describe here the approach by Agirre and Martinez (2004), which we will apply to the same datasets as the novel method described in Section 4.
In this implementation, the monosemous relatives are obtained using WordNet, and different relevance weights are assigned to these words depending on their distance to the target word (synonyms are the closest, followed by immediate hypernyms and hyponyms). These weights determine an order of preference for constructing the training corpus from the queries, and 1,000 examples are then retrieved for each query. As explained in Agirre and Martinez (2004), the number of examples taken for each sense has a large impact on performance, and information on the expected distribution of senses influences the results. They obtain this information by different means, such as the hand-tagged sense distribution from SemCor, or a prior algorithm such as that of McCarthy et al. (2004). In this paper we present the results of the basic approach, which uses all the retrieved examples per sense and is the best standalone unsupervised alternative.
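As a toy illustration of this preference ordering, the sketch below ranks a hand-coded pool of relatives by relation type. The sense inventory, polysemy counts, and numeric weights here are invented for the example; they are not taken from WordNet or from Agirre and Martinez's implementation:

```python
# Toy fragment modelled on the three Senseval-2 senses of "church".
# Relations and polysemy counts are illustrative, not real WordNet data.
SENSES = {
    "church%1": [("christianity", "synonym"), ("hebraism", "hyponym")],
    "church%2": [("kirk", "synonym"), ("cathedral", "hyponym"),
                 ("building", "hypernym")],
    "church%3": [("church_service", "synonym"),
                 ("religious_service", "hypernym")],
}
POLYSEMY = {"christianity": 1, "hebraism": 1, "kirk": 1, "cathedral": 1,
            "building": 3, "church_service": 1, "religious_service": 1}
# Synonyms are preferred over immediate hypernyms/hyponyms.
WEIGHT = {"synonym": 3, "hypernym": 2, "hyponym": 2}

def ranked_monosemous_relatives(sense):
    """Keep only monosemous relatives and order them by relevance weight,
    giving the order of preference used to build the training corpus."""
    pool = [(rel, WEIGHT[kind]) for rel, kind in SENSES[sense]
            if POLYSEMY[rel] == 1]          # discard polysemous relatives
    return [rel for rel, _ in sorted(pool, key=lambda p: -p[1])]
```

In this toy, `building` is dropped for the second sense because it is polysemous, and `kirk` outranks `cathedral` because synonyms are closer than hyponyms.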
The ML technique Agirre and Martinez (2004) applied is Decision Lists (Yarowsky, 1994). In this method, the sense s_k with the highest-weighted feature f_i is selected, according to its log-likelihood (see Formula 1). For this implementation, they used a simple smoothing method: cases where the denominator is zero are smoothed by the constant 0.1.
\[
\mathrm{weight}(s_k, f_i) = \log \frac{\Pr(s_k \mid f_i)}{\sum_{j \neq k} \Pr(s_j \mid f_i)} \qquad (1)
\]
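A minimal sketch of this weighting and smoothing scheme, assuming feature counts have already been collected from the retrieved training examples (the helper names and the toy counts are ours, not the paper's):

```python
import math
from collections import Counter

def decision_list_weights(counts, smoothing=0.1):
    """counts[(sense, feature)] -> frequency in the training corpus.
    Returns weight(s_k, f_i) = log(Pr(s_k|f_i) / sum_{j!=k} Pr(s_j|f_i)),
    with a zero denominator replaced by the constant 0.1 (Formula 1)."""
    feature_totals = Counter()
    for (sense, feat), n in counts.items():
        feature_totals[feat] += n
    weights = {}
    for (sense, feat), n in counts.items():
        total = feature_totals[feat]
        p_k = n / total                    # Pr(s_k | f_i)
        denom = (total - n) / total        # sum over other senses j != k
        if denom == 0:
            denom = smoothing              # simple zero-denominator smoothing
        weights[(sense, feat)] = math.log(p_k / denom)
    return weights

def classify(context_features, weights):
    """Decision list: pick the sense of the single highest-weighted
    feature that matches the context."""
    best = max(((w, s) for (s, f), w in weights.items()
                if f in context_features), default=None)
    return best[1] if best else None
```

Note that classification uses only the strongest matching feature, which is what makes the learned list easy to inspect by hand.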
The feature set consisted of local collocations (bigrams and trigrams), bag-of-words features (unigrams and salient bigrams), and domain features from the WordNet Domains resource (Magnini and Cavaglià, 2000). The Decision List algorithm showed good comparative performance with the monosemous relatives method, and it had the advantage of allowing hands-on analysis of the different features.
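The local-collocation and bag-of-words parts of such a feature set could be extracted roughly as follows. This is a sketch: the window sizes are our choice, and the salient-bigram and WordNet Domains features are omitted since they need external resources:

```python
def extract_features(tokens, target_index, window=3):
    """Illustrative extractor for two of the paper's feature families:
    local bigram/trigram collocations containing the target word, and
    bag-of-words unigrams from a small context window."""
    feats = set()
    n = len(tokens)
    # Local collocations: every bigram and trigram covering the target.
    for size in (2, 3):
        for start in range(target_index - size + 1, target_index + 1):
            if 0 <= start and start + size <= n:
                feats.add("COL_" + "_".join(tokens[start:start + size]))
    # Bag of words: lowercased unigrams around the target.
    for i in range(max(0, target_index - window),
                   min(n, target_index + window + 1)):
        if i != target_index:
            feats.add("BOW_" + tokens[i].lower())
    return feats
```

For the sentence fragment "the church was rebuilt" with `church` as the target, this yields collocations such as `COL_the_church` and `COL_church_was_rebuilt` alongside bag-of-words features like `BOW_rebuilt`.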
4 Relatives in Context<br />
The goal of this new approach is to use the WordNet relatives and the contexts of the target words to overcome some of the limitations of the "monosemous relatives" technique. One of the main problems is the lack of close monosemous relatives for some senses of the target word. This forces the system to rely on distant relatives whose meaning is far from the intended one. Another problem is that by querying only with the relative word we place no restrictions on the sentences we retrieve. Even if we use words that are listed as monosemous in WordNet, we can find different usages of them in a corpus as big as the Internet (e.g. named entities; see the examples below). Including real contexts of the target word in the queries could alleviate this problem.
For instance, let us assume that we want to classify church with one of the three senses it has in Senseval-2: (1) group of Christians, (2) church building, or (3) church service. When querying the Internet directly with monosemous relatives of these senses, we find the following problems:
• Metaphors: the relative cathedral (2nd sense) appears in very different collocations that are not related to any sense of church, e.g. the cathedral of football.
• Named entities: the relative kirk (2nd sense), which is a name for a Scottish church, will retrieve sentences that use Kirk as a proper noun.
• Frequent words as relatives: relatives like hebraism (1st sense) could provide useful examples, but if the query is not restricted they can also be the source of many noisy examples.
The idea behind the "relatives in context" method is to combine local contexts of the target word with the pool of relatives in order to obtain a better set of examples per sense. With this approach, we gather only those examples that are closely similar to the target contexts, as defined by a set of pre-defined features. We will illustrate this with the following example from the Senseval-2 dataset, where the goal is to disambiguate the word church:

    The church was rebuilt in the 13th century and further modifications and restoration were carried out in the 15th century.
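For a sentence like this, a query in the spirit of the method might keep the immediate local context and substitute each relative for the target word. The construction below is a hypothetical illustration of that idea, not the paper's exact query format:

```python
def build_queries(sentence, target, relatives, context_window=2):
    """Build one phrase query per relative: substitute the relative for
    the target word but keep its surrounding context words, so that only
    sentences resembling the target's real usage are retrieved."""
    tokens = sentence.split()
    i = tokens.index(target)
    left = tokens[max(0, i - context_window):i]
    right = tokens[i + 1:i + 1 + context_window]
    return ['"' + " ".join(left + [rel] + right) + '"' for rel in relatives]

queries = build_queries("The church was rebuilt in the 13th century",
                        "church", ["cathedral", "kirk"])
```

A phrase query such as "The cathedral was rebuilt" is far less likely to match metaphorical uses like the cathedral of football, or Kirk as a proper noun, than a query for the bare relative.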