
English, relying on WordNet as a thesaurus (Fellbaum, 1998).

A well-known approach to unsupervised WSD consists of the automatic acquisition of training data by means of monosemous relatives (Leacock et al., 1998). This technique roughly follows these steps: (i) select a set of monosemous words that are related to the different senses of the target word, (ii) query the Internet to obtain examples for each relative, (iii) create a collection of training examples for each sense, and (iv) use an ML algorithm trained on the acquired collections to tag the test instances. This method has been used to bootstrap large sense-tagged corpora (Mihalcea, 2002; Agirre and Martinez, 2004).
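As an illustration of steps (i)–(iii), the following is a minimal sketch of such an acquisition pipeline, assuming WordNet as the sense inventory (accessed here through NLTK) and a hypothetical `search_snippets` callable standing in for the web queries; it is not the implementation used in the papers cited above.

```python
from nltk.corpus import wordnet as wn

def monosemous_relatives(target, pos=wn.NOUN):
    """Step (i): for each sense of the target word, collect related
    words (synonyms, hypernyms, hyponyms) that have a single sense
    in WordNet, i.e. that are monosemous."""
    relatives = {}
    for sense in wn.synsets(target, pos=pos):
        candidates = set()
        for syn in [sense] + sense.hypernyms() + sense.hyponyms():
            for lemma in syn.lemmas():
                word = lemma.name()
                if word != target and len(wn.synsets(word, pos=pos)) == 1:
                    candidates.add(word)
        relatives[sense.name()] = sorted(candidates)
    return relatives

def acquire_training_data(target, search_snippets):
    """Steps (ii)-(iii): query the web for each relative and label the
    returned snippets with the corresponding sense. `search_snippets`
    is a hypothetical callable mapping a word to a list of text snippets."""
    corpus = []
    for sense, words in monosemous_relatives(target).items():
        for word in words:
            for snippet in search_snippets(word):
                # Substituting the relative with the target word turns the
                # snippet into a training example for that sense.
                corpus.append((snippet.replace(word, target), sense))
    return corpus  # step (iv): train any supervised ML classifier on these pairs
```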

Two important shortcomings of this method are the lack of monosemous relatives for some senses of the target words, and the noise introduced by some distant relatives. In this paper we directly address those problems by developing a new method that makes use of polysemous relatives and relies on the context of the target word to reduce the presence of noisy examples.
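Although the method itself is only detailed in Section 4, one way to picture context-based noise reduction is a simple overlap filter over the acquired examples. The sketch below is purely illustrative and does not reproduce the actual technique; the threshold and bag-of-words scoring are assumptions.

```python
def context_filter(examples, target_context, min_overlap=2):
    """Illustrative noise filter: keep only acquired (snippet, sense)
    examples whose bag of words overlaps the context of the target
    occurrence, discarding examples from distant or misleading relatives."""
    ctx = set(target_context.lower().split())
    kept = []
    for snippet, sense in examples:
        overlap = len(ctx & set(snippet.lower().split()))
        if overlap >= min_overlap:  # threshold chosen arbitrarily here
            kept.append((snippet, sense))
    return kept
```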

The remainder of the paper is organised as follows. In Section 2 we describe related work in this area. Section 3 briefly introduces the monosemous relatives algorithm, and our novel method is explained in Section 4. Section 5 presents our experimental setting, and in Section 6 we report the performance of our technique and the improvement over the monosemous relatives method. Section 7 is devoted to comparing our system with other unsupervised techniques and analysing the prospects for system combination. Finally, we conclude and discuss future work in Section 8.

2 Related Work

The construction of unsupervised WSD systems applicable to all words in context has been the goal of many research initiatives, as can be seen in special journal issues and dedicated books; see for instance (Agirre and Edmonds, 2006) for a recent book. We will now describe the different trends that are being explored.

Some recent techniques seek to alleviate the knowledge acquisition bottleneck by combining training data from different words. Kohomban and Lee (2005) build semantic classifiers by merging data from words in the same semantic class. Once the class is selected, simple heuristics are applied to obtain the fine-grained sense. The classifier follows memory-based learning, and the examples are weighted according to their semantic similarity to the target word. Niu et al. (2005) use all-words training data to build a word-independent model to compute the similarity between two contexts. A maximum entropy algorithm is trained on the all-words corpus, and the model is used for clustering the instances of a given target word. One of the problems of clustering algorithms for WSD is evaluation; in this case the clusters are mapped to Senseval-3 lexical-sample data by looking at 10% of the examples in the training data. One drawback of these systems is that they still require hand-tagged data.
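For intuition only, here is a loose sketch of similarity-weighted memory-based tagging in the spirit of Kohomban and Lee (2005); the feature-overlap scoring and the precomputed per-example weights are assumptions of ours, not their implementation.

```python
from collections import Counter

def weighted_knn_sense(test_feats, training, k=5):
    """Memory-based (k-NN) tagging sketch: `training` holds
    (feature_set, sense, weight) triples, where `weight` encodes the
    stored example word's semantic similarity to the target word."""
    scored = []
    for feats, sense, weight in training:
        overlap = len(test_feats & feats)         # crude feature overlap
        scored.append((overlap * weight, sense))  # similarity-weighted score
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter(sense for _, sense in scored[:k])
    return votes.most_common(1)[0][0]
```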

Parallel corpora have also been widely used to avoid the need for hand-tagged data. Recently, Chan and Ng (2005) built a classifier from English-Chinese parallel corpora. They grouped senses that share the same Chinese translation, and then the occurrences of the word on the English side of the parallel corpora were considered to have been disambiguated and “sense tagged” by the appropriate Chinese translations. The system was successfully evaluated in the all-words task of Senseval-2. However, parallel corpora are an expensive resource to obtain for all target words.
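The core projection step can be pictured as follows; the data structures are assumptions for illustration (a sense-to-translation map and pre-aligned sentence pairs), not Chan and Ng's actual setup.

```python
def project_senses(sense_to_translation, aligned_pairs):
    """Merge senses that share a Chinese translation, then treat each
    aligned English occurrence as tagged with its translation's group."""
    # Invert the map: Chinese translation -> group of senses it covers.
    groups = {}
    for sense, zh in sense_to_translation.items():
        groups.setdefault(zh, set()).add(sense)
    tagged = []
    for en_context, zh_translation in aligned_pairs:
        if zh_translation in groups:
            tagged.append((en_context, frozenset(groups[zh_translation])))
    return tagged  # “sense-tagged” examples at sense-group granularity
```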

A related approach is to use monolingual corpora in a second language together with bilingual dictionaries to translate the training data (Wang and Carroll, 2005). Instead of using bilingual dictionaries, Wang and Martinez (2006) applied machine translation to translate text snippets in foreign languages back into English, achieving good results on English WSD.

Regarding portability, methods that automatically rank the senses of a word given a raw corpus, such as (McCarthy et al., 2004), have shown good flexibility in adapting to different domains, which is a desirable feature of all-words systems. We will compare the performance of the latter two systems and our approach in Section 7.

3 Monosemous Relatives Method

The “monosemous relatives” approach is a technique to acquire training examples automatically and then feed them to a Machine Learning (ML) method. This algorithm is based on (Leacock et al., 1998), and follows these steps: (i) select a set of monosemous words that are related to the different senses of the target word, (ii) query the Internet to obtain examples for each relative, (iii) create a collection of training examples for each sense, and (iv) use an ML algorithm trained on the acquired collections to tag the test instances.
