22.07.2013 Views

Automatic Mapping Clinical Notes to Medical - RMIT University


it is even used as a black box that is not fine-tuned to the task. In this paper we take a step towards such a formal study of the ideal characteristics of a NER for the task of QA. In particular, Section 2 comments on the desiderata of a NER for QA. Next, Section 3 describes the QA system used in the paper, while Section 4 describes the NER and its modifications for use in QA. Section 5 presents the results of various experiments evaluating variations of the NER, and finally Section 6 presents the concluding remarks and lines of further research.

2 Named Entity Recognition for Question Answering

Most QA systems gradually reduce the amount of data they need to consider in several phases. For example, when the system receives a user question, it first selects a set of relevant documents, and then gradually filters out irrelevant pieces of text from these documents until the answer is found.

The NER is typically used as an aid to filter out strings that do not contain the answer. After a question analysis stage, the type of the expected answer is determined and mapped to a list of entity types. The NER is then used to single out the entity types appearing in a text fragment. If a piece of text does not have any entity with a type compatible with the type of the expected answer, the text is discarded or heavily penalised. With this in mind, the desiderata of a NER relate to the range of entities to detect and to the recall of the system.
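
The filtering step described above can be sketched as follows. This is a minimal illustration, not the system evaluated in the paper: the `Entity` class, the `COMPATIBLE_NER_TYPES` mapping, the function names and the `penalty` parameter are all hypothetical, and the NER output is assumed to be already available as a list of typed entities per passage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str
    type: str  # entity type assigned by the NER, e.g. "Person"


# Hypothetical mapping from an expected answer type to the NER types
# considered compatible with it (an assumption made for illustration).
COMPATIBLE_NER_TYPES = {
    "HUMAN:ind": {"Person"},
    "LOCATION:city": {"Location"},
    "NUMERIC:date": {"Date"},
}


def score_passage(entities, answer_type, penalty=0.0):
    """Return 1.0 if the passage has a compatible entity, else `penalty`.

    A penalty of 0.0 amounts to discarding the passage; a small positive
    value amounts to heavily penalising it. If the answer type has no
    mapping, the NER cannot help, so the passage is left untouched.
    """
    compatible = COMPATIBLE_NER_TYPES.get(answer_type)
    if compatible is None:
        return 1.0
    if any(entity.type in compatible for entity in entities):
        return 1.0
    return penalty


def filter_passages(passages, answer_type, penalty=0.0):
    """Score (text, entities) pairs and drop those whose score is zero."""
    scored = [
        (text, score_passage(entities, answer_type, penalty))
        for text, entities in passages
    ]
    return [(text, score) for text, score in scored if score > 0.0]


passages = [
    ("Canberra is the capital of Australia.",
     [Entity("Canberra", "Location"), Entity("Australia", "Location")]),
    ("The vote took place in 1901.",
     [Entity("1901", "Date")]),
]

kept = filter_passages(passages, "LOCATION:city")
softly_kept = filter_passages(passages, "LOCATION:city", penalty=0.1)
```

With the default `penalty=0.0` the second passage is discarded outright, while a small positive penalty keeps it in the candidate set with a low score. An entity missed by the NER would remove a passage that may contain the answer, which is why recall matters for this use.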

2.1 Range of Entities

Different domains require different types of answers. Typically, the question classification component determines the type of question and the type of the expected answer. For example, the questions used in the QA track of past TREC conferences can be classified following the taxonomy shown in Table 1 (Li and Roth, 2002).

The set of entity types recognised by a standalone NER is typically very different and much more coarse-grained. For example, a typical set of entity types recognised by a NER is the one defined in past MUC tasks and presented in Table 2. The table shows a two-level hierarchy, and its types are much more coarse-grained than those of Table 1. Within each of the entity types of Table 2 there are several of the question types of Table 1.

ABBREVIATION   abb, exp
ENTITY         animal, body, color, creative, currency, dis.med., event, food, instrument, lang, letter, other, plant, product, religion, sport, substance, symbol, technique, term, vehicle, word
DESCRIPTION    definition, description, manner, reason
HUMAN          group, ind, title, description
LOCATION       city, country, mountain, other, state
NUMERIC        code, count, date, distance, money, order, other, period, percent, speed, temp, size, weight

Table 1: Complete taxonomy of Li & Roth

Class     Type
ENAMEX    Organization
          Person
          Location
TIMEX     Date
          Time
NUMEX     Money
          Percent

Table 2: Entities used in the MUC tasks

A QA system typically uses both a taxonomy of expected answers and the taxonomy of named entities produced by its NER to identify which named entities are relevant to a question. The question is assigned a type from a taxonomy such as the one defined in Table 1. This type is then used to filter out irrelevant named entities, which carry types such as those defined in Table 2.

A problem that arises here is that the granularity of the NEs provided by a NER is much coarser than the ideal granularity for QA, as the named entity types are matched against the types the question requires. Consequently, even though a question classifier could determine a very specific type of answer, this type needs to be mapped to the types provided by the NER.
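
The loss of granularity can be made concrete with a small sketch that maps some of the question types of Table 1 to the entity types of Table 2. The particular pairs in `FINE_TO_MUC` are an illustrative assumption, not a mapping given in the paper, and the function names are hypothetical.

```python
# Entity types of the MUC tasks (Table 2).
MUC_TYPES = {
    "Organization", "Person", "Location", "Date", "Time", "Money", "Percent",
}

# Assumed mapping from (class, type) pairs of Table 1 to MUC types.
# Question types without an entry have no counterpart in the NER output.
FINE_TO_MUC = {
    ("HUMAN", "ind"): {"Person"},
    ("HUMAN", "group"): {"Organization"},
    ("LOCATION", "city"): {"Location"},
    ("LOCATION", "country"): {"Location"},
    ("LOCATION", "mountain"): {"Location"},
    ("LOCATION", "state"): {"Location"},
    ("NUMERIC", "date"): {"Date"},
    ("NUMERIC", "money"): {"Money"},
    ("NUMERIC", "percent"): {"Percent"},
}


def ner_types_for(question_class, question_type):
    """Return the MUC types compatible with a question type (may be empty)."""
    return FINE_TO_MUC.get((question_class, question_type), set())


def question_types_sharing(ner_type):
    """List the question types that collapse onto one MUC type."""
    return sorted(key for key, value in FINE_TO_MUC.items() if ner_type in value)


city_types = ner_types_for("LOCATION", "city")
animal_types = ner_types_for("ENTITY", "animal")
location_questions = question_types_sharing("Location")
```

Under this assumed mapping, four distinct LOCATION question types all collapse onto the single NER type Location, so a question asking for a city cannot be told apart from one asking for a mountain at the entity level. Question types such as ENTITY:animal map to no MUC type at all, in which case the NER gives no help in filtering.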

2.2 Recall

Given that the NER is used to filter out candidate answers, it is important that only wrong answers are removed, while all correct answers stay in the set of possible answers. Therefore, in question answering, recall in a NER is to be preferred over precision. Generally, a NER developed for a generic NE recognition task (or for information extraction) is fine-tuned for a good balance between recall and precision, and this is not necessarily what
