Automatic Mapping Clinical Notes to Medical - RMIT University

Sentence Class      Frequency   Percentage
APOLOGY                    23        1.5%
INSTRUCTION               126        8.5%
INSTRUCTION-ITEM           94        6.3%
OTHERS                     22        1.5%
QUESTION                   24        1.6%
REQUEST                   146        9.8%
RESPONSE-ACK               63        4.2%
SALUTATION                129        8.7%
SIGNATURE                  32        2.2%
SPECIFICATION              41        2.8%
STATEMENT                 423       28.5%
SUGGESTION                 55        3.7%
THANKING                  228       15.3%
URL                        80        5.4%

sentence classification is emerging as an important task to investigate, due to the increasing interest in detecting intentional units at a sub-document level.

The rest of the paper is organized as follows. In the next section we present our domain of investigation. In Section 3 we discuss the experiments that we carried out, and we conclude the paper in Section 4.

2 Domain

Our corpus consists of 160 email dialogues between customers and operators at Hewlett-Packard’s help-desk. These deal with a variety of issues, including requests for technical assistance, inquiries about products, and queries about how to return faulty products or parts. As an initial step in our study, we decided to focus only on the response emails, as they contain well-formed grammatical sentences, as opposed to the customers’ emails. In future work we intend to extend our study to include both types of emails.

The response emails contain 1486 sentences overall, which we have divided into the classes shown in Table 1. The classes are inspired by the SWBD-DAMSL tag set (Jurafsky et al., 1997), an adaptation of the Dialog Act Markup in Several Layers (DAMSL) annotation scheme (Core and Allen, 1997) for switchboard conversations. For example, RESPONSE-ACK refers to an acknowledgement by the operator of receiving the customer’s request: “Your email was submitted to the HP eServices Commercial Support group”; INSTRUCTION-ITEM is similar to INSTRUCTION but appears as part of a list of instructions.

We can see from Table 1 that the class distribution is highly skewed: some classes are very small, which means that many of the classes have very few positive examples to learn from. We will see various implications of this high skew in our investigation (Section 3).
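To make the skew concrete, the counts in Table 1 imply that a trivial majority-class baseline, one that labels every sentence STATEMENT, already reaches about 28.5% accuracy. The following minimal sketch (class counts transcribed from Table 1) is illustrative only; it is not part of the reported experiments:

```python
# Class frequencies transcribed from Table 1.
counts = {
    "APOLOGY": 23, "INSTRUCTION": 126, "INSTRUCTION-ITEM": 94,
    "OTHERS": 22, "QUESTION": 24, "REQUEST": 146, "RESPONSE-ACK": 63,
    "SALUTATION": 129, "SIGNATURE": 32, "SPECIFICATION": 41,
    "STATEMENT": 423, "SUGGESTION": 55, "THANKING": 228, "URL": 80,
}

total = sum(counts.values())            # 1486 sentences overall
majority = max(counts, key=counts.get)  # the largest class, STATEMENT
baseline_acc = counts[majority] / total # accuracy of always predicting it

print(total, majority, round(baseline_acc, 3))
```

Any learned classifier must beat this baseline to demonstrate that it captures more than the class prior.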

Table 1: Sentence class distribution.


When annotating the sentences, problems arose when a sentence was of compound form, consisting of multiple independent clauses connected by conjunctions such as “and”, “but”, and “or”; for example, “Please send us the error message and we will be able to help.” The two clauses could be labeled as REQUEST and STATEMENT respectively. As our study considered only one tag per sentence, the annotators were asked to tag the sentence as a whole according to its most dominant clause. Another tricky problem concerned complex sentences, which contain one independent clause and one or more dependent clauses, for example “If you see any error message, please forward it to us”. The first clause is a dependent clause, while the second is an independent clause. To solve this problem, the annotators were asked to consider only the independent clause when determining which tag to use. Despite these difficulties, we obtained a high inter-tagger agreement of 0.85, measured with the widely used Kappa statistic (Carletta, 1996). We had three annotators, and we considered only the sentences on which at least two of the annotators agreed. This was the case in all but 21 of the sentences.
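For reference, pairwise Kappa can be computed directly from two annotators’ tag sequences as observed agreement corrected for the agreement expected by chance. The sketch below uses hypothetical tag sequences for illustration; the paper’s 0.85 figure was computed over three annotators and may use a different multi-annotator formulation:

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Pairwise Kappa agreement between two annotators' sentence tags."""
    assert len(labels_a) == len(labels_b)
    n = len(labels_a)
    # Observed agreement: fraction of sentences tagged identically.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: sum over tags of the product of marginal frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[t] * freq_b[t] for t in freq_a.keys() & freq_b.keys()) / n**2
    return (p_o - p_e) / (1 - p_e)

# Hypothetical tag sequences for illustration only.
ann1 = ["REQUEST", "STATEMENT", "REQUEST", "THANKING"]
ann2 = ["REQUEST", "STATEMENT", "STATEMENT", "THANKING"]
print(round(cohen_kappa(ann1, ann2), 3))  # 0.636
```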

3 Experiments

Our experiments involve three classification algorithms: Naive Bayes (NB), Decision Tree (DT), and Support Vector Machine (SVM). The evaluation platform is the machine learning software toolkit WEKA (Witten and Frank, 2005). For the SVM, the multi-class task is implemented as a series of binary classification tasks. We employ a stratified 10-fold cross-validation procedure, where the labelled sentences are randomly allocated to training and testing data splits.
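The stratified allocation can be sketched in plain Python. The experiments themselves were run in WEKA; this standalone sketch only illustrates how stratification keeps each class’s proportion roughly constant across the ten folds, which matters given the skewed distribution in Table 1 (function name and toy labels are ours):

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=10, seed=0):
    """Partition sentence indices into k folds, preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idxs in by_class.values():
        rng.shuffle(idxs)           # random allocation within each class
        for j, i in enumerate(idxs):
            folds[j % k].append(i)  # deal indices round-robin across folds
    return folds

# Toy label set: 40 STATEMENT and 20 REQUEST sentences.
labels = ["STATEMENT"] * 40 + ["REQUEST"] * 20
folds = stratified_folds(labels, k=10)
# Each fold then holds exactly 4 STATEMENT and 2 REQUEST indices.
print([len(f) for f in folds])
```

Without stratification, a plain random split could easily leave a rare class such as APOLOGY (23 sentences) absent from some training folds entirely.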

A standard measure for classification performance is classification accuracy. However, for
