22.07.2013 Views

Automatic Mapping Clinical Notes to Medical - RMIT University


LaTeX typesetting information into a useable format. Unlike existing work, we have rendered the mathematics in Unicode text rather than just removing it, which is important for further analysis of the data. The resulting corpus is larger than existing resources, such as MUC, but has been annotated with a much more detailed set of over 40 named entity classes.

Finally, we have demonstrated that high-accuracy named entity recognisers can be trained using the initial release of this corpus, and shown how the tagger can be used to iteratively identify potential tagging errors. The quality of the results should only improve as the corpus size and quality are increased.
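As a rough illustration of using a tagger to surface potential tagging errors, the sketch below compares the corpus annotation against a tagger's predictions and lists every token where the two disagree, so that an annotator can review them. It is a minimal sketch under stated assumptions and not the procedure used for this corpus: the function name `flag_disagreements`, the tag labels, and the input layout are all hypothetical.

```python
from typing import List, Tuple

# Assumed (hypothetical) layout: the corpus is a list of (word, gold_tag)
# pairs and the tagger output is a parallel list of predicted tags.
GoldToken = Tuple[str, str]
Disagreement = Tuple[int, str, str, str]  # (index, word, gold_tag, predicted_tag)


def flag_disagreements(gold: List[GoldToken],
                       predicted: List[str]) -> List[Disagreement]:
    """Return the tokens whose gold tag differs from the tagger's prediction.

    A disagreement is only a candidate error: either the annotation or the
    tagger may be wrong, so each flagged token still needs manual review.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted sequences must be the same length")
    return [
        (i, word, gold_tag, pred_tag)
        for i, ((word, gold_tag), pred_tag) in enumerate(zip(gold, predicted))
        if gold_tag != pred_tag
    ]


if __name__ == "__main__":
    # Toy data with made-up tag labels, for illustration only.
    gold = [("M31", "GALAXY"), ("is", "O"), ("bright", "GALAXY")]
    predicted = ["GALAXY", "O", "O"]
    for index, word, gold_tag, pred_tag in flag_disagreements(gold, predicted):
        print(f"token {index} ({word!r}): gold={gold_tag} tagger={pred_tag}")
```

One way to make such a check iterative would be to correct the confirmed errors, retrain the tagger on the revised corpus, and repeat the comparison until few new disagreements appear.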

11 Acknowledgments

We would like to thank the arXiv administrators for giving us access to the astro-ph archive. This research has made use of NASA’s Astrophysics Data System Bibliographic Services. This research has made use of the NASA/IPAC Extragalactic Database (NED) operated by the Jet Propulsion Laboratory, California Institute of Technology.

This research was funded under a University of Sydney Research and Development Grant and ARC Discovery grants DP0453131 and DP0665973.

