Automatic Mapping Clinical Notes to Medical - RMIT University
Die Morphologie (f): Targeted Lexical Acquisition for Languages other than English

Jeremy Nicholson†, Timothy Baldwin†, and Phil Blunsom‡

†‡ Department of Computer Science and Software Engineering
University of Melbourne, VIC 3010, Australia
and
† NICTA Victoria Research Laboratories
University of Melbourne, VIC 3010, Australia

{jeremymn,tim,pcbl}@csse.unimelb.edu.au
Abstract

We examine standard deep lexical acquisition features in automatically predicting the gender of noun types and tokens by bootstrapping from a small annotated corpus. Using a knowledge-poor approach to simulate prediction in unseen languages, we observe results comparable to morphological analysers trained specifically on our target languages of German and French. These results suggest further scope for analysing other properties in languages displaying a more challenging morphosyntax, in order to create language resources in a language-independent manner.
1 Introduction

As a result of incremental annotation efforts and advances in algorithm design and statistical modelling, deep language resources (DLRs, i.e. language resources with a high level of linguistic sophistication) are increasingly being applied to mainstream NLP applications. Examples include analysis of semantic similarity through ontologies such as WordNet (Fellbaum, 1998) or VerbOcean (Chklovski and Pantel, 2004), parsing with precision grammars such as the DELPH-IN (Oepen et al., 2002) or PARGRAM (Butt et al., 2002) grammars, and modelling with richly annotated treebanks such as PropBank (Palmer et al., 2005) or CCGbank (Hockenmaier and Steedman, 2002).
Unfortunately, the increasing complexity of these language resources and the desire for broad coverage have meant that their traditionally manual mode of creation, development and maintenance has become infeasibly labour-intensive. As a consequence, deep lexical acquisition (DLA) has been proposed as a means of automatically learning deep linguistic representations to expand the coverage of DLRs (Baldwin, 2005). DLA research can be divided into two main categories: targeted DLA, in which lexemes are classified according to a given lexical property (e.g. noun countability, or subcategorisation properties); and generalised DLA, in which lexemes are classified according to the full range of lexical properties captured in a given DLR (e.g. the full range of lexical relations in a lexical ontology, or the system of lexical types in an HPSG).
As we attest in Section 2, most work in deep lexical acquisition has focussed on the English language. This can be explained in part by the ready availability of targeted language resources like the ones mentioned above, as well as secondary resources — such as corpora, part-of-speech taggers, chunkers, and so on — with which to aid prediction of the lexical property in question. One obvious question, therefore, is whether the techniques used to perform lexical acquisition in English generalise readily to other languages, where subtle but important differences in morphosyntax might obfuscate the surface cues used for prediction.
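As a rough illustration of how such surface cues can drive knowledge-poor prediction, the sketch below classifies German noun gender from word-final character n-grams using a tiny hand-labelled seed lexicon. The seed entries, suffix lengths, and back-off scheme are illustrative assumptions, not the paper's actual data or feature set.

```python
from collections import Counter, defaultdict

# Tiny hand-labelled seed lexicon (hypothetical sample entries);
# the paper instead bootstraps from a small annotated corpus.
SEED = [
    ("Fenster", "n"), ("Wasser", "n"), ("Zimmer", "n"),
    ("Heizung", "f"), ("Zeitung", "f"), ("Meinung", "f"),
    ("Freiheit", "f"), ("Mehrheit", "f"),
    ("Lehrling", "m"), ("Schmetterling", "m"),
]

def train_suffix_model(lexicon, max_len=4):
    """Count gender labels for each word-final character n-gram."""
    counts = defaultdict(Counter)
    for word, gender in lexicon:
        for n in range(1, max_len + 1):
            counts[word[-n:].lower()][gender] += 1
    return counts

def predict_gender(word, counts, max_len=4, default="f"):
    """Back off from the longest attested suffix to shorter ones."""
    for n in range(max_len, 0, -1):
        suffix = word[-n:].lower()
        if suffix in counts:
            return counts[suffix].most_common(1)[0][0]
    return default  # unseen suffixes fall back to a majority class

model = train_suffix_model(SEED)
print(predict_gender("Wanderung", model))  # "-ung" nouns seen as feminine
print(predict_gender("Frühling", model))   # "-ling" nouns seen as masculine
```

Even this crude scheme exploits exactly the kind of surface regularity (e.g. German "-ung" being reliably feminine) that may or may not transfer across languages.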
In this work, we will examine the targeted prediction of the gender of noun types and tokens in context, for both German and French. As an example, the following phrases display adjectival gender inflection in two of the three languages below.

the open window
das offene Fenster
la fenêtre ouverte

window has no gender in English, Fenster is neuter in German and fenêtre is feminine in French. So
Proceedings of the 2006 Australasian Language Technology Workshop (ALTW2006), pages 65–72.