Automatic Mapping Clinical Notes to Medical - RMIT University
Automatic Mapping Clinical Notes to Medical - RMIT University
Automatic Mapping Clinical Notes to Medical - RMIT University
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
it is even used as a black box that is not fine-tuned<br />
<strong>to</strong> the task. In this paper we perform a step <strong>to</strong>wards<br />
such a formal study of the ideal characteristics<br />
of a NER for the task of QA. In particular,<br />
section 2 comments on the desiderata of a NER for<br />
QA. Next, section 3 describes the QA system used<br />
in the paper, while section 4 describes the NER<br />
and its modifications for its use for QA. Section 5<br />
presents the results of various experiments evaluating<br />
variations of the NER, and finally Section 6<br />
presents the concluding remarks and lines of further<br />
research.<br />
2 Named Entity Recognition for<br />
Question Answering<br />
Most QA systems gradually reduce the amount of<br />
data they need <strong>to</strong> consider in several phases. For<br />
example, when the system receives a user question,<br />
it first selects a set of relevant documents,<br />
and then filters out irrelevant pieces of text of these<br />
documents gradually until the answer is found.<br />
The NER is typically used as an aid <strong>to</strong> filter out<br />
strings that do not contain the answer. Thus, after<br />
a question analysis stage the type of the expected<br />
answer is determined and mapped <strong>to</strong> a list of entity<br />
types. The NER is therefore used <strong>to</strong> single out<br />
the entity types appearing in a text fragment. If a<br />
piece of text does not have any entity with a type<br />
compatible with the type of the expected answer,<br />
the text is discarded or heavily penalised. With<br />
this in mind, the desiderata of a NER are related<br />
with the range of entities <strong>to</strong> detect and with the<br />
recall of the system.<br />
2.1 Range of Entities<br />
Different domains require different types of answers.<br />
Typically, the question classification component<br />
determines the type of question and the<br />
type of the expected answer. For example, the<br />
questions used in the QA track of past TREC conferences<br />
can be classified following the taxonomy<br />
shown in Table 1 (Li and Roth, 2002).<br />
The set of entity types recognised by a standalone<br />
NER is typically very different and much<br />
more coarse-grained. For example, a typical set<br />
of entity types recognised by a NER is the one<br />
defined in past MUC tasks and presented in Table<br />
2. The table shows a two-level hierarchy and<br />
the types are much more coarse-grained than that<br />
of Table 1. Within each of the entity types of Table<br />
2 there are several types of questions of Ta-<br />
50<br />
ABBREVIATION<br />
abb, exp<br />
ENTITY<br />
animal, body, color, creative, currency, dis.med., event,<br />
food, instrument, lang, letter, other, plant, product, religion,<br />
sport, substance, symbol, technique, term, vehicle,<br />
word<br />
DESCRIPTION<br />
definition, description, manner, reason<br />
HUMAN<br />
group, ind, title, description<br />
LOCATION<br />
city, country, mountain, other, state<br />
NUMERIC<br />
code, count, date, distance, money, order, other, period,<br />
percent, speed, temp, size, weight<br />
Table 1: Complete taxonomy of Li & Roth<br />
Class Type<br />
ENAMEX Organization<br />
Person<br />
Location<br />
TIMEX Date<br />
Time<br />
NUMEX Money<br />
Percent<br />
Table 2: Entities used in the MUC tasks<br />
ble 1.<br />
A QA system typically uses both a taxonomy<br />
of expected answers and the taxonomy of named<br />
entities produced by its NER <strong>to</strong> identify which<br />
named entities are relevant <strong>to</strong> a question. The<br />
question is assigned a type from a taxonomy such<br />
as defined in Table 1. This type is then used <strong>to</strong> filter<br />
out irrelevant named entities that have types as<br />
defined in Table 2.<br />
A problem that arises here is that the granularity<br />
of the NEs provided by a NER is much coarser<br />
than the ideal granularity for QA, as the named entity<br />
types are matched against the types the question<br />
requires. Consequently, even though a question<br />
classifier could determine a very specific type<br />
of answer, this type needs <strong>to</strong> be mapped <strong>to</strong> the<br />
types provided by the NER.<br />
2.2 Recall<br />
Given that the NER is used <strong>to</strong> filter out candidate<br />
answers, it is important that only wrong answers<br />
are removed, while all correct answers stay in the<br />
set of possible answers. Therefore, recall in a NER<br />
in question answering is <strong>to</strong> be preferred above precision.<br />
Generally, a NER developed for a generic<br />
NE recognition task (or for information extraction)<br />
is fine-tuned for a good balance between recall<br />
and precision, and this is not necessarily what