Automatic Mapping Clinical Notes to Medical - RMIT University


Figure 1: Augmented SCT Lexicon

Token Matching Algorithm

The token matching algorithm takes unstructured text and pre-processes it using the same techniques that are applied to the concepts when generating the augmented lexicon. It then attempts to find each SNOMED CT Description that is contained in the input sentence. For each word, the algorithm looks up the Augmented Lexicon and retrieves a list of the descriptions that contain the word.

Figure 2 gives a graphical representation of the data structure used in the algorithm. The elements of the matrix are n-grams from the input sentence, with the diagonal holding sequence runs of two words. Each remaining cell is the cell to its left with the next word appended to it. In this way the algorithm covers every possible sequence of consecutive tokens.

Figure 2: Matching Matrix example

The data stored in each cell is a list of Description IDs (DIDs) that are shared by all the tokens that comprise the cell, i.e. the intersection of the DID sets of the individual words. The score is then calculated using the "atomic term count", which stores the number of tokens that make up that description. The score is the number of tokens in the current cell that have the DID in common divided by the number of tokens in the full description, i.e.:

\[ \text{Score} = \frac{\text{\# of Tokens in Sequence}}{\text{\# of Tokens in Full Description}} \]

The algorithm itself is shown in Figure 3 as pseudo-code. Step 1 builds the Matching Matrix. Step 2 uses the Matching Matrix to find the combination of sequences that gives the highest score.
This final score depends on the number of tokens used to make the match divided by the total number of tokens in the input stream, i.e.:

\[ \text{Score} = \frac{\text{\# of Tokens used in all matches}}{\text{\# of Tokens in total input stream}} \]

    STEP 1
      for each word in list:
        add entry to the Matching Matrix
        for new column:
          intersect new word with cell from matching table
      sort the matching array in descending order based on the scores
      for each row in the matrix:
        start at the right-most cell
    STEP 2
      if the top score for the cell is 1.0:
        add cell details to current best match list, update current match score
        recursively call STEP 2 on cell (row=column+2, column=right)
      else:
        move one column left to the next cell,
        or to the right-most cell of the next row if the left cell is empty
      repeat STEP 2 until all cells have been visited

Figure 3: Matching Algorithm Pseudo-code
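The two steps can be illustrated with a small Python sketch. It is not the authors' implementation: the lexicon, the Description IDs and the names `LEXICON`, `TERM_COUNT`, `build_matrix`, `score_cell` and `best_cover` are all hypothetical, cells are indexed by (start, end) token positions and include single tokens, and Step 2 is replaced by a simple dynamic program that prefers the largest token coverage and, on ties, the fewest matches, rather than the recursive traversal of Figure 3.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy lexicon: token -> Description IDs (DIDs) whose text contains it.
LEXICON: Dict[str, Set[int]] = {
    "back": {1, 2},
    "pain": {1, 3, 4},
    "chest": {4},
}
# Hypothetical "atomic term count": DID -> number of tokens in the full description.
TERM_COUNT: Dict[int, int] = {1: 2, 2: 1, 3: 1, 4: 2}

Match = Tuple[int, int, List[int]]  # (start index, end index, fully matched DIDs)


def build_matrix(tokens: List[str]) -> Dict[Tuple[int, int], Set[int]]:
    """Step 1: cell (i, j) holds the DIDs shared by tokens[i..j]."""
    matrix: Dict[Tuple[int, int], Set[int]] = {}
    for i in range(len(tokens)):
        current = None
        for j in range(i, len(tokens)):
            dids = LEXICON.get(tokens[j], set())
            # Each cell is the cell to its left intersected with the next word.
            current = set(dids) if current is None else current & dids
            matrix[(i, j)] = current
    return matrix


def score_cell(i: int, j: int, dids: Set[int]) -> List[Tuple[float, int]]:
    """Score = tokens in sequence / tokens in full description, best first."""
    length = j - i + 1
    # Assumption: a description shorter than the span cannot match it.
    scored = [(length / TERM_COUNT[d], d) for d in dids if TERM_COUNT[d] >= length]
    return sorted(scored, reverse=True)


def best_cover(tokens: List[str]) -> Tuple[float, List[Match]]:
    """Step 2 stand-in: choose non-overlapping full matches covering most tokens."""
    n = len(tokens)
    matrix = build_matrix(tokens)
    best: List[Tuple[int, List[Match]]] = [(0, [])] * (n + 1)

    def key(c: Tuple[int, List[Match]]) -> Tuple[int, int]:
        return (c[0], -len(c[1]))  # more coverage first, then fewer matches

    for i in range(n - 1, -1, -1):
        candidate = best[i + 1]  # option: leave token i unmatched
        for j in range(i, n):
            full = [d for s, d in score_cell(i, j, matrix[(i, j)]) if s == 1.0]
            if full:
                option = ((j - i + 1) + best[j + 1][0], [(i, j, full)] + best[j + 1][1])
                if key(option) > key(candidate):
                    candidate = option
        best[i] = candidate
    covered, matches = best[0]
    return (covered / n if n else 0.0), matches


score, matches = best_cover(["chest", "pain", "and", "back", "pain"])
print(score, matches)
```

In this toy input, four of the five tokens are covered by the two-token descriptions for "chest pain" and "back pain", so the final score is the number of matched tokens divided by the length of the input.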
Adding Abbreviations

Different sub-domains have different definitions of abbreviations. In the medical domain, abbreviations are highly ambiguous: Liu et al. (2002) show that 33% of abbreviations in UMLS are ambiguous. Different hospitals have their own abbreviation conventions, and the abbreviations used are not the same across sections within the same sub-domain. This makes the abbreviation problem difficult to resolve. As we are processing clinical data in the RPAH (Royal Prince Alfred Hospital) ICU (Intensive Care Unit), we believe that the abbreviations used in their reports are restricted to a sub-domain and are not that ambiguous. We use a list of abbreviations provided by the ICU department and have integrated them into the Augmented Lexicon. The abbreviations were manually mapped to SNOMED CT concepts by two experts in RPAH. The list consists of 1,254 abbreviations, of which 57 (4.5%) are ambiguous. We decided not to disambiguate the abbreviations during token matching, but to return a list of all possible candidates and leave it to a later stage to resolve the ambiguity.

4.3 Negation Identification

In our system, we aim to identify the negation phrases and the scope of the negation. Two kinds of negation are identified: pre-coordinated SNOMED CT concepts, and concepts that are explicitly asserted as negative by negation phrases. A pre-coordinated phrase is a term that exists in the SNOMED CT terminology and represents a negative term, for example no headache. SNOMED CT contains a set of pre-coordinated negative terms under the Clinical Finding Absent (373572006) category that indicate the absence of findings and diseases. However, SNOMED CT is not an exhaustive terminology and is not able to capture all negated terms. Moreover, clinical notes have many negation forms other than "absence", such as "denial of procedures".
For a negative term that has a pre-coordinated mapping in SNOMED CT, we mark up the term using the SNOMED CT concept id (CID). For other negations, we identify the negation phrases and the SNOMED CT concepts to which the negation applies. The following examples show the two different kinds of negation:

    no headache (Pre-coordinated negation term)
      "absent of"
      CID: 162298006 no headache (context-dependent category)

    no evidence of neoplasm malignant (Explicitly asserted negation)
      negation phrase: "no evidence of"
      CID: 363346000 malignant neoplastic disease (disorder)

Figure 4: Examples of Negations

To identify explicitly asserted negation, we implemented a simple rule-based negation identifier similar to (Chapman et al, 2001; Elkin et al, 2005). First, a SNOMED CT concept id is assigned to each medical term; the negation phrases are then identified using the list of negation phrases in (Chapman et al, 2001). A rule base is then applied to each negation phrase to check its left and right contexts and see whether any surrounding concepts have been negated. The algorithm is able to identify negations of the form:

    negation phrase … (SNOMED CT phrase)*
    (SNOMED CT phrase)* … negation phrase

The contexts can be up to 5 non-stopwords long, which allows identification of negation over a coordination structure, for example in the following sentence segment:

    … and pelvis did not reveal retroperitoneal lymphadenopathy or mediastinal lymphadenopathy …

In this sentence segment, the terms retroperitoneal lymphadenopathy and mediastinal lymphadenopathy are both negated.

Whenever there is an overlap between pre-coordinated negation and explicitly asserted negation, we identify the term as a pre-coordinated negation. For example, the term no headache (162298006) will not be identified as the negation of headache (25064002).
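The rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' rule base: the input is assumed to be already segmented into negation phrases, SNOMED CT concepts and plain words; the `Item` class, the `direction` flag, the toy `STOPWORDS` list and the function names are hypothetical; and the 5 non-stopword window is counted in words, with a concept included if it starts inside the window.

```python
from dataclasses import dataclass
from typing import List

STOPWORDS = {"and", "or", "of", "the", "a"}  # hypothetical toy stopword list
WINDOW = 5  # context length in non-stopwords, as stated in the text


@dataclass
class Item:
    kind: str                     # "NEG", "CONCEPT" or "WORD"
    text: str
    direction: str = "forward"    # NEG only (assumed): side the phrase negates
    precoordinated: bool = False  # CONCEPT only: already a negative SNOMED CT term


def content_words(text: str) -> int:
    """Number of non-stopwords in a piece of text."""
    return sum(1 for w in text.lower().split() if w not in STOPWORDS)


def negated_concepts(items: List[Item]) -> List[str]:
    """Return the concepts that fall inside the context of a negation phrase."""
    negated: List[str] = []
    for k, item in enumerate(items):
        if item.kind != "NEG":
            continue
        if item.direction == "forward":
            context = items[k + 1:]            # negation phrase … concepts
        else:
            context = list(reversed(items[:k]))  # concepts … negation phrase
        used = 0
        for other in context:
            if used >= WINDOW:
                break
            # Pre-coordinated negative concepts keep their own CID and are
            # never re-marked as explicitly negated.
            if other.kind == "CONCEPT" and not other.precoordinated:
                negated.append(other.text)
            used += content_words(other.text)
    return negated


segment = [
    Item("WORD", "pelvis"),
    Item("NEG", "did not reveal"),
    Item("CONCEPT", "retroperitoneal lymphadenopathy"),
    Item("WORD", "or"),
    Item("CONCEPT", "mediastinal lymphadenopathy"),
]
print(negated_concepts(segment))
```

Because "or" is a stopword it does not consume any of the window, so both coordinated concepts in the example segment fall within the context of the negation phrase.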
4.4 Qualification and Term Composition

In medical terminology a term may contain an atomic concept or a composition of multiple concepts. For example, the term pain is an atomic concept, and back pain is a composition of the two atomic concepts back and pain. Some composite concepts appear as single concepts in medical terminology; for example, back pain is a single concept in SNOMED CT. Such a concept is called a pre-coordinated concept. However, the
