Automatic Mapping Clinical Notes to Medical - RMIT University
LaTeX typesetting information into a useable format. Unlike existing work, we have rendered the mathematics in Unicode text rather than just removing it, which is important for further analysis of the data. The resulting corpus is larger than existing resources, such as MUC, but has been annotated with a much more detailed set of over 40 named entity classes.
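
To illustrate what rendering mathematics as Unicode text (rather than deleting it) can look like, here is a minimal Python sketch. It is not the conversion pipeline used to build the corpus: the mapping table, the function names, and the restriction to inline `$...$` spans are all assumptions made for the example.

```python
import re

# Hypothetical, deliberately tiny mapping (an assumption for illustration);
# a real converter would need far broader coverage of LaTeX commands.
LATEX_TO_UNICODE = {
    r"\alpha": "α",
    r"\beta": "β",
    r"\lambda": "λ",
    r"\mu": "μ",
    r"\sigma": "σ",
    r"\odot": "⊙",
    r"\pm": "±",
    r"\times": "×",
    r"\leq": "≤",
    r"\geq": "≥",
}

_COMMAND = re.compile(r"\\[A-Za-z]+")
_INLINE_MATH = re.compile(r"\$([^$]*)\$")


def render_math(expr: str) -> str:
    """Replace known LaTeX commands in one math expression with Unicode."""
    # Unknown commands are left untouched rather than silently dropped.
    return _COMMAND.sub(
        lambda m: LATEX_TO_UNICODE.get(m.group(0), m.group(0)), expr
    )


def latex_to_text(source: str) -> str:
    """Render every inline $...$ span as Unicode text instead of deleting it."""
    return _INLINE_MATH.sub(lambda m: render_math(m.group(1)), source)


if __name__ == "__main__":
    print(latex_to_text(r"a black hole of mass $10^8 M_\odot$"))
```

Keeping the rendered symbols in the text means that quantities and units stay available to later processing, which is the motivation given above for not simply stripping the mathematics.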
Finally, we have demonstrated that high accuracy named entity recognisers can be trained using the initial release of this corpus, and shown how the tagger can be used to iteratively identify potential tagging errors. The quality of the results should only improve as the corpus size and quality increase.
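
One simple way to use a tagger to surface potential tagging errors is to compare its predictions with the existing annotations and send the disagreements back for manual review. The Python sketch below shows only that comparison step, under stated assumptions: the function name, the data layout, and the tag names `OBJ` and `O` are hypothetical, and the procedure used in this work may differ.

```python
from typing import List, Tuple

Token = Tuple[str, str]                 # (word, annotated tag)
Flag = Tuple[int, int, str, str, str]   # (sentence, token, word, gold, predicted)


def flag_disagreements(
    annotated: List[List[Token]], predicted: List[List[str]]
) -> List[Flag]:
    """List every token whose annotated tag differs from the tagger's tag."""
    flagged: List[Flag] = []
    for s_idx, (gold_sent, pred_sent) in enumerate(zip(annotated, predicted)):
        if len(gold_sent) != len(pred_sent):
            raise ValueError(f"length mismatch in sentence {s_idx}")
        for t_idx, ((word, gold), pred) in enumerate(zip(gold_sent, pred_sent)):
            if gold != pred:
                # A disagreement is only a candidate error: either the
                # annotation or the tagger may be wrong.
                flagged.append((s_idx, t_idx, word, gold, pred))
    return flagged


if __name__ == "__main__":
    # Hypothetical tags: "OBJ" for a named object, "O" for outside any entity.
    annotated = [[("NGC", "OBJ"), ("1068", "O"), ("is", "O")]]
    predicted = [["OBJ", "OBJ", "O"]]
    for flag in flag_disagreements(annotated, predicted):
        print(flag)
```

In an iterative setting, the flagged tokens would be checked by an annotator, the corpus corrected, the tagger retrained, and the comparison repeated.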
11 Acknowledgments
We would like to thank the arXiv administrators for giving us access to the astro-ph archive. This research has made use of NASA's Astrophysics Data System Bibliographic Services. This research has made use of the NASA/IPAC Extragalactic Database (NED) operated by the Jet Propulsion Laboratory, California Institute of Technology.
This research was funded under a University of Sydney Research and Development Grant and ARC Discovery grants DP0453131 and DP0665973.