Automatic Mapping Clinical Notes to Medical - RMIT University

More documents

Recommendations

Info

$journal of latex class files, vol. 6, no. 1, january ... - RMIT University$

Natural Language Processing and XML Retrieval Alan Woodley Faculty of Information Technology Queensland University of Technology awoodley@qut.edu.au Marcus Hassler Department of Informatics Universitdat Klagenfurt Marcus.Hassler@uni-klu.ac.at Abstract Xavier Tannier Dept. Networks, Information, Multi-media Ecole des Mines Saint-Etienne tannier@emse.fr Shlomo Geva Faculty of Information Technology Queensland University of Technology s.geva@qut.edu.au 2 Motivation XML information retrieval (XML-IR) systems respond to user queries with results There are two major problems with formal query more specific than documents. XML-IR languages for XML-IR that could be rectified with queries contain both content and struc- NLQs. First, expressing a structural information tural requirements traditionally expressed need in a formal language is too difficult for many in a formal language. However, an intu- users. O’Keefe and Trotman (2004) investigated itive alternative is natural language queries five structured query languages and concluded that 1 (NLQs). Here, we discuss three approaches for handling NLQs in an XML- IR system that are comparable to, and even outperform formal language queries. Introduction all of them were too complicated to use. In practice, 63% of the expert-built queries queries in the 2003 INEX campaign had major semantic or syntactic errors, requiring up to 12 rounds of corrections. In contrast, users should be able to express their need in NLQs intuitively. Information retrieval (IR) systems respond to user Second, formal query languages require an inti- queries with a ranked list of relevant documents, mate knowledge of a document’s structure. So, in even though only parts of the documents are rel- order to retrieve information from abstracts, secevant. In contrast, XML-IR systems are able to tions or bibliographic items, users need to know exploit the separation of structure and content in their corresponding tags. While this information XML documents by returning relevant portions is contained in the DTD/ Schema, it may not be of documents. To interact with XML-IR sys- publicly available, and is too much information tems users must specify both their content and to remember (INEX, for instance has 192 nodes). structural requirements in structured queries. Cur- The problem extrapolates in a heterogenous colrently, formal languages are used to specify struclection since a single retrieval unit could be extured queries, however, they have proven problempressed in multiple tags. In contrast, since strucatic since they are too difficult to use and are too tures in NLQs are formulated at the conceptional tightly bound to the collection. level users do not have to know their actual tag A promising alternative to formal queries languages is structured natural language queries names. (NLQs). Here, we present justifications for NLQs in XML-IR, and describe three approaches that translate NLQs to an existing formal language 3 The Approaches (NEXI). When used in with an XML-IR system Here, we present three techniques used to translate the approaches perform strongly, at times outper- NLQs to NEXI in INEX 2004 and 2005. The three forming a baseline consisting of manually con- approaches are called Hassler, Tannier (Tannier, structed NEXI expressions. These results show 2005) and Woodley (Woodley and Geva, 2005) af- that NLQs are potentially a viable alternative to ter their authors. While each of the approaches is XML-IR systems. Proceedings of the 2006 Australasian Language Technology different, Workshop they all (ALTW2006), contain four pagesmain 160–161. stages. 160
3.1 Detecting Structural and Content Constraints The first stage is to detect a query’s structural and content constraints. Hassler uses template matching based on words and parts-of-speech. Links between structure and content are not linguistically motivated, and it is assumed that content is the last element. Woodley adds shallow syntactic parsing before applying the same kind of template matching. Tannier uses deep syntactic analysis, complemented by some specific semantic rules concerning query structure. 3.2 Structure Analysis The second stage is to map structural constraints to corresponding XML tags. This requires lexical knowledge about the documents’ structure, since the tags in the XML documents are rarely "real" words or phrases, but abbreviations, acronyms or an amalgamation of two. Furthermore, a single tag can be referred to by different names. Tannier uses grammatical knowledge to recognise some frequent linguistic constructions that imply structure. 3.3 Content Analysis The third stage is to derive users’ content requirements, as either terms or phrases. Noun phrases are particularly useful in information retrieval. They are identified as specific sequences of parts-of-speech. Tannier is also able to use content terms to set up a contextual search along the entire structure of the documents. 3.4 NEXI Query Formulation The final stage of translation is the formulation of NEXI queries. Following NEXI format, content terms are delimitated by spaces, with phrases surrounded by quotation marks. 4 Results Here, we present the ep-gr scores from the 2005 INEX NLQ2NEXI Track. The results correspond to different relevance quantisation and interpretations of structural constraints - a thorough description of which is provided in (Kazai and Lalmas, 2005). The results compare the retrieval performance of a XML-IR system (Geva, 2005) when the 3 natural language approaches and a fourth "baseline" system, which used manually constructed NEXIs queries, were used as input. 161 Baseline Hassler Tannier Woodley Strict 0.0770 0.0740 0.0775 0.0755 Gen 0.1324 0.1531 0.1064 0.1051 Table 1: SSCAS ep-gr scores Baseline Hassler Tannier Woodley Strict 0.0274 0.0267 0.0304 0.0267 Gen 0.0272 0.0287 0.0298 0.0311 Table 2: SVCAS ep-gr scores Baseline Hassler Tannier Woodley Strict 0.0383 0.0338 0.0363 0.0340 Gen 0.0608 0.0641 0.0682 0.0632 Table 3: VSCAS ep-gr scores Baseline Hassler Tannier Woodley Strict 0.0454 0.0372 0.0418 0.0483 Gen 0.0694 0.0740 0.0799 0.0742 Table 4: VVCAS ep-gr scores The results show that the NLP approaches perform comparably - and even outperform - the baseline. 5 Conclusion While the application of NLP XML-IR is in its infancy, it has already produced promising results. But if it is to process to an operational environment it requires an intuitive interface. Here, we describe and presented the performance of three approaches for handling NLQs. The results show that NLQs are potentially a viable alternative to formal query languages and the integration of NLP and XML-IR can be mutually beneficial. References Norbert Fuhr, Mounia Lalmas, Saadia Malik, and Zoltàn Szlàvik, editors. 2006. INEX 2005 Proceedings, Schloss Dagstuhl, Germany, November 28–30, 2005. Shlomo Geva. 2005. GPX - Gardens Point XML Information Retrieval at INEX 2005. In Fuhr et al. (Fuhr et al., 2006), pages 211–223. Gabriella Kazai and Mounia Lalmas. 2005. INEX 2005 Evaluation Metrics. http://inex.is.informatik.uniduisburg.de/2005/inex-2005-metricsv4.pdf. Richard A. O’Keefe and Andrew Trotman. 2004. The Simplest Query Language That Could Possibly Work. In Proceedings of the second Workshop of the INEX, Schloss Dagstuhl, Germany. Xavier Tannier. 2005. From Natural Language to NEXI, an interface for INEX 2005 queries. In INEX05 (Fuhr et al., 2006), pages 289–313. Alan Woodley and Shlomo Geva. 2005. NLPX at INEX 2005. In INEX05 (Fuhr et al., 2006), pages 274–288.
Page 1:
Proceeding of the Australasian Lang
Page 4 and 5:
Program Chairs: Committees Lawrence
Page 6 and 7:
Towards the Evaluation of Referring
Page 8 and 9:
The cat ate the hat that I made NP/
Page 10 and 11:
Figure 2: Cells affected by adding
Page 12 and 13:
were used because the accuracy for
Page 14 and 15:
TIME ACC COVER FAIL RATE % secs % %
Page 16 and 17:
model. We explore different estimat
Page 18 and 19:
S.S. Source Sense Source S.S. Acc.
Page 20 and 21:
acy. The difference in correctly as
Page 22 and 23:
Experiments with Sentence Classific
Page 24 and 25:
datasets with skewed distribution t
Page 26 and 27:
words produced by the feature selec
Page 28 and 29:
Sentence Class NB DT SVM Apology 1.
Page 30 and 31:
Computational Semantics in the Natu
Page 32 and 33:
S[sem = ] -> NP[sem=?subj] VP[sem=?
Page 34 and 35:
Any string which cannot be decompos
Page 36 and 37:
satisfy(Formula,model(D,F),G,pos):n
Page 38 and 39:
Classifying Speech Acts using Verba
Page 40 and 41:
The principles are dichotomous —
Page 42 and 43:
ance. Several additional example ut
Page 44 and 45:
our results. In classifying only th
Page 46 and 47:
Word Relatives in Context for Word
Page 48 and 49:
create a collection of training exa
Page 50 and 51:
Query Sense The nave was rebuilt in
Page 52 and 53:
Algorithm Avg S2LS Avg S3LS Avg S2L
Page 54 and 55:
B. Snyder and M. Palmer. 2004. The
Page 56 and 57:
it is even used as a black box that
Page 58 and 59:
ecognition in QA, so we will not us
Page 60 and 61:
BPER ILOC IPER BLOC BLOC BDATE BLOC
Page 62 and 63:
of noise introduced by the addition
Page 64 and 65:
2 Existing annotated corpora Much o
Page 66 and 67:
Our FUSE|tel spectrum of HD|sta 738
Page 68 and 69:
N Word Correct Tagged 23 OH mol non
Page 70 and 71:
L ATEX typesetting information into
Page 72 and 73:
in English, neither the determiner
Page 74 and 75:
con 4 (Sagot et al., 2006) and Morp
Page 76 and 77:
Feature type Positions/description
Page 78 and 79:
Miriam Butt, Helge Dyvik, Tracy Hol
Page 80 and 81:
ently mostly solved by employing hu
Page 82 and 83:
Figure 1: Augmented SCT Lexicon Tok
Page 84 and 85:
medical terms can be composed by ad
Page 86 and 87:
References Aronson, A. R. (2001). E
Page 88 and 89:
technique to most question types, i
Page 90 and 91:
User Question Analyser QA System Se
Page 92 and 93:
Figure 2: Precision Figure 3: Cover
Page 94 and 95:
Analysis and Review. Springer-Verla
Page 96 and 97:
2 Background 2.1 Pronoun Reference
Page 98 and 99:
ous accounts of the effects of cohe
Page 100 and 101:
Table 1: Overall Results for Object
Page 102 and 103:
References Jennifer Arnold, Janet E
Page 104 and 105:
In order for appropriate documents
Page 106 and 107:
y Michael West. He found that his t
Page 108 and 109:
low-scoring documents are “Click
Page 110 and 111:
a) b) You must complete at least 9
Page 112 and 113:
uct). In this paper, we will only c
Page 114 and 115:
indicate the categories of substruc
Page 116 and 117: Argument lowering Dependent geach
Page 118 and 119: who [u : np] WH(np, s, s/?gq) λP.
Page 120 and 121: algorithms, and report the performa
Page 122 and 123: 2.3 Coverage of the Human Data Out
Page 124 and 125: and minimal subsets of the data. Of
Page 126 and 127: output. Finally, and related to the
Page 128 and 129: identifies, to avoid overwhelming t
Page 130 and 131: y the student is first parsed, usin
Page 132 and 133: a given word sequence Sorig as a pe
Page 134 and 135: A problem arises with this scheme w
Page 136 and 137: Example 1: Original Headline: Europ
Page 138 and 139: |relations(sentence1) ∩ relations
Page 140 and 141: 2685 True False 1275 Bleu Dep Bleu
Page 142 and 143: Features Acc. C1-prec. C1-recall C1
Page 144 and 145: Given the lack of research in VSD u
Page 146 and 147: ased features: Type 1 Original sibl
Page 148 and 149: Preposition of the adjunct semantic
Page 150 and 151: Algorithm 1 The algorithm of combin
Page 152 and 153: References Timothy Baldwin and Fran
Page 154 and 155: dependencies in German with respect
Page 156 and 157: wij moeten dit probleem aanpakken w
Page 158 and 159: a brevity penalty of BLEU n = 1 n =
Page 160 and 161: plicitly matching the syntax of the
Page 162 and 163: Probabilities improve stress-predic
Page 164 and 165: Towards Cognitive Optimisation of a
Page 168 and 169: Extracting Patient Clinical Profile
show all

Automatic Mapping Clinical Notes to Medical - RMIT University

You also want an ePaper? Increase the reach of your titles

Delete template?

Save as template?