Automatic Mapping Clinical Notes to Medical - RMIT University

More documents

Recommendations

Info

$journal of latex class files, vol. 6, no. 1, january ... - RMIT University$

Extracting Patient Clinical Profiles from Case Reports Abstract This research aims to extract detailed clinical profiles, such as signs and symptoms, and important laboratory test results of the patient from descriptions of the diagnostic and treatment procedures in journal articles. This paper proposes a novel mark-up tag set to cover a wide variety of semantics in the description of clinical case studies in the clinical literature. A manually annotated corpus which consists of 75 clinical reports with 5,117 sentences has been created and a sentence classification system is reported as the preliminary attempt to exploit the fast growing online repositories of clinical case reports. 1 Corpus and Mark-up Tags This paper proposes a mark-up scheme aimed at recovering key semantics of clinical case reports in journal articles. The development of this markup tag set is the result of analysing information needs of clinicians for building a better health information system. During the development of this tag set, domain experts were constantly consulted for their input and advice. 1.1 The Mark-up Tag Set • Sign is a signal that indicates the existence or nonexistence of a disease as observed by clinicians during the diagnostic and treatment procedure. Typical signs of a patient include the appearance of the patient, readings or analytical results of laboratory tests, or responses to a medical treatment. • Symptom is also an indication of disorder or disease but is noted by patients rather than by Yitao Zhang and Jon Patrick School of Information Technology University of Sydney NSW 2006, Australia {yitao, jonpat}@it.usyd.edu.au clinicians. For instance, a patient can experience weakness, fatigue, or pain during the illness. • Medical test is a specific type of sign in which a quantifiable or specific value has been identified by a medical testing procedure, such as blood pressure or white blood cell count. • Diagnostic test gives analytical results for diagnosis purposes as observed by clinicians in a medical testing procedure. It differs from a medical test in that it generally returns no quantifiable value or reading as its result. The expertise of clinicians is required to read and analyse the result of a diagnostic test, such as interpreting an X-ray image. • Diagnosis identifies conditions that are diagnosed by clinicians. • Treatment is the therapy or medication that patients received. • Referral specifies another unit or department to which patients are referred for further examination or treatment. • Patient health profile identifies characteristics of patient health histories, including social behaviors. • Patient demographics outlines the details and backgrounds of a patient. • Causation is a speculation about the cause of a particular abnormal condition, circumstance or case. • Exceptionality states the importance and merits of the reported case. Proceedings of the 2006 Australasian Language Technology Workshop (ALTW2006), pages 162–163. 162
Total articles 75 Total sentences 5,117 Total sentences with tag 2,319 Total tokens 112,382 Total tokens with tag 48,394 Table 1: Statistics of the Corpus • Case recommendations marks the advice for clinicians or other readers of the report. • Exclusion rules out a particular causation or phenomenon in a report. 1.2 The Corpus The corpus described in this paper is a collection of recent research articles that report clinical findings by medical researchers. To make the data representative of the clinical domain, a wide variety of topics have been covered in the corpus, such as cancers, gene-related diseases, viral and bacteria infections, and sports injuries. The articles were randomly selected and downloaded from BioMed Central 1 . During the selection stage, those reports that describe a group of patients are removed. As a result, this corpus is confined to clinical reports on individual patients. A single human annotator (first author) has manually tagged all the articles in the corpus. The statistical profile of the corpus is shown in Table 1. 2 The Sentence Classification Task The patient case studies corpus provides a promising source for automatically extracting knowledge from clinical records. As a preliminary experiment, an information extraction task has been conducted to assign each sentence in the corpus with appropriate tags. Among the total of 2,319 sentences that have tags, there are 544 (23.5%) sentences assigned more than one tag. This overlapping feature of the tag assignment makes a single multi-class classifier approach not appropriate for the task. Instead, each tag has been given a separate machine-learned classifier capable of assigning a binary ’Yes’ or ’No’ label for a sentence according to whether or not the sentence includes the targeted information as defined by the tag set. Meanwhile, a supervised-learning approach was adopted in this experiment. 1 http://www.biomedcentral.com/ 163 Tag Precision Recall F1 Diagnostic test 66.6 46.8 55.0 Medical test 80.4 51.6 62.9 Treatment 67.6 44.7 53.8 Diagnosis 62.5 33.8 43.8 Sign 61.2 50.5 55.4 Symptom 67.8 45.8 54.7 Patient demographics 91.6 73.1 81.3 Patient health profile 53.0 24.1 33.2 Table 2: Sentence Classification Result for Some Semantic Tags A Maximum Entropy (MaxEnt) classifier 2 and a SVM classifier (SVM-light) with tree kernel (Moschitti, 2004; Joachims, 1999) were used in the experiment. The SVM classifier used two different kernels in the experiment: a linear kernel (SVM t=1), and a combination of sub-tree kernel and linear kernel (SVM t=6). The introduction of the tree kernel was an attempt to evaluate the effectiveness of incorporating syntactic clues for the task. The feature set used in the experiment consists of unigrams, bigrams, and title of the current section. The experiment results for selected markup tags are shown in Table 2. Acknowledgments We wish to thank Prof Deborah Saltman for defining the tag categories and Joel Nothman for refining their use on texts. References Thorsten Joachims. 1999. Making Large-scale SVM Learning Practical. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods- Support Vector Learning. Alessandro Moschitti. 2004. A Study on Convolution Kernels for Shallow Semantic Parsing. In Proceedings of the 42-th Conference on Association for Computational Linguistic. 2 Zhang Le http://homepages.inf.ed.ac.uk/s0450736/
Page 1:
Proceeding of the Australasian Lang
Page 4 and 5:
Program Chairs: Committees Lawrence
Page 6 and 7:
Towards the Evaluation of Referring
Page 8 and 9:
The cat ate the hat that I made NP/
Page 10 and 11:
Figure 2: Cells affected by adding
Page 12 and 13:
were used because the accuracy for
Page 14 and 15:
TIME ACC COVER FAIL RATE % secs % %
Page 16 and 17:
model. We explore different estimat
Page 18 and 19:
S.S. Source Sense Source S.S. Acc.
Page 20 and 21:
acy. The difference in correctly as
Page 22 and 23:
Experiments with Sentence Classific
Page 24 and 25:
datasets with skewed distribution t
Page 26 and 27:
words produced by the feature selec
Page 28 and 29:
Sentence Class NB DT SVM Apology 1.
Page 30 and 31:
Computational Semantics in the Natu
Page 32 and 33:
S[sem = ] -> NP[sem=?subj] VP[sem=?
Page 34 and 35:
Any string which cannot be decompos
Page 36 and 37:
satisfy(Formula,model(D,F),G,pos):n
Page 38 and 39:
Classifying Speech Acts using Verba
Page 40 and 41:
The principles are dichotomous —
Page 42 and 43:
ance. Several additional example ut
Page 44 and 45:
our results. In classifying only th
Page 46 and 47:
Word Relatives in Context for Word
Page 48 and 49:
create a collection of training exa
Page 50 and 51:
Query Sense The nave was rebuilt in
Page 52 and 53:
Algorithm Avg S2LS Avg S3LS Avg S2L
Page 54 and 55:
B. Snyder and M. Palmer. 2004. The
Page 56 and 57:
it is even used as a black box that
Page 58 and 59:
ecognition in QA, so we will not us
Page 60 and 61:
BPER ILOC IPER BLOC BLOC BDATE BLOC
Page 62 and 63:
of noise introduced by the addition
Page 64 and 65:
2 Existing annotated corpora Much o
Page 66 and 67:
Our FUSE|tel spectrum of HD|sta 738
Page 68 and 69:
N Word Correct Tagged 23 OH mol non
Page 70 and 71:
L ATEX typesetting information into
Page 72 and 73:
in English, neither the determiner
Page 74 and 75:
con 4 (Sagot et al., 2006) and Morp
Page 76 and 77:
Feature type Positions/description
Page 78 and 79:
Miriam Butt, Helge Dyvik, Tracy Hol
Page 80 and 81:
ently mostly solved by employing hu
Page 82 and 83:
Figure 1: Augmented SCT Lexicon Tok
Page 84 and 85:
medical terms can be composed by ad
Page 86 and 87:
References Aronson, A. R. (2001). E
Page 88 and 89:
technique to most question types, i
Page 90 and 91:
User Question Analyser QA System Se
Page 92 and 93:
Figure 2: Precision Figure 3: Cover
Page 94 and 95:
Analysis and Review. Springer-Verla
Page 96 and 97:
2 Background 2.1 Pronoun Reference
Page 98 and 99:
ous accounts of the effects of cohe
Page 100 and 101:
Table 1: Overall Results for Object
Page 102 and 103:
References Jennifer Arnold, Janet E
Page 104 and 105:
In order for appropriate documents
Page 106 and 107:
y Michael West. He found that his t
Page 108 and 109:
low-scoring documents are “Click
Page 110 and 111:
a) b) You must complete at least 9
Page 112 and 113:
uct). In this paper, we will only c
Page 114 and 115:
indicate the categories of substruc
Page 116 and 117:
Argument lowering Dependent geach
Page 118 and 119: who [u : np] WH(np, s, s/?gq) λP.
Page 120 and 121: algorithms, and report the performa
Page 122 and 123: 2.3 Coverage of the Human Data Out
Page 124 and 125: and minimal subsets of the data. Of
Page 126 and 127: output. Finally, and related to the
Page 128 and 129: identifies, to avoid overwhelming t
Page 130 and 131: y the student is first parsed, usin
Page 132 and 133: a given word sequence Sorig as a pe
Page 134 and 135: A problem arises with this scheme w
Page 136 and 137: Example 1: Original Headline: Europ
Page 138 and 139: |relations(sentence1) ∩ relations
Page 140 and 141: 2685 True False 1275 Bleu Dep Bleu
Page 142 and 143: Features Acc. C1-prec. C1-recall C1
Page 144 and 145: Given the lack of research in VSD u
Page 146 and 147: ased features: Type 1 Original sibl
Page 148 and 149: Preposition of the adjunct semantic
Page 150 and 151: Algorithm 1 The algorithm of combin
Page 152 and 153: References Timothy Baldwin and Fran
Page 154 and 155: dependencies in German with respect
Page 156 and 157: wij moeten dit probleem aanpakken w
Page 158 and 159: a brevity penalty of BLEU n = 1 n =
Page 160 and 161: plicitly matching the syntax of the
Page 162 and 163: Probabilities improve stress-predic
Page 164 and 165: Towards Cognitive Optimisation of a
Page 166 and 167: Natural Language Processing and XML
show all

Automatic Mapping Clinical Notes to Medical - RMIT University

Create successful ePaper yourself

Delete template?

Save as template?