W10-09
do not “populate databases” by reading; knowledge is not only a product of reading, it is an integral part of it. We require knowledge during reading in order to understand what we read.
One of the central tenets of our machine reading system is the notion that reading is not performed on sentences in isolation. Often, problems in NLP can be resolved simply by waiting for the next sentence, by remembering the results from the previous one, or by incorporating background or domain-specific knowledge. This includes parse ambiguity, coreference, typing of named entities, etc. We call this the Keep Reading principle.
Keep Reading applies to relation extraction as well. Most relation extraction systems are implemented such that a single interpretation is forced on a sentence, based only on features of the sentence itself. In fact, this has been a shortcoming of many NLP systems in the past. Under the Keep Reading principle, by contrast, multiple hypotheses from different parts of the NLP pipeline are maintained, and decisions are deferred until there is enough evidence to make a high-confidence choice between competing hypotheses. Knowledge, such as which entities are already known to participate in a relation and how that relation was expressed, can and should be part of that evidence. We will present many examples of the principle in subsequent sections.
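The deferral strategy behind Keep Reading can be sketched as follows. This is an illustrative reconstruction, not the authors' actual implementation: the class and method names (`KeepReading`, `add_evidence`, `resolve`) and the margin-based commitment rule are assumptions chosen to make the idea concrete.

```python
# Sketch of the Keep Reading principle: rather than forcing a single
# interpretation per sentence, keep competing hypotheses alive and
# commit only once accumulated evidence makes one clearly best.
from dataclasses import dataclass


@dataclass
class Hypothesis:
    label: str          # e.g. a candidate entity type or parse choice
    score: float = 0.0  # accumulated evidence


class KeepReading:
    def __init__(self, margin=1.0):
        self.margin = margin   # lead required before committing (assumed)
        self.hypotheses = {}   # label -> Hypothesis

    def add_evidence(self, label, weight):
        h = self.hypotheses.setdefault(label, Hypothesis(label))
        h.score += weight

    def resolve(self):
        """Return the winning label, or None while evidence is ambiguous."""
        ranked = sorted(self.hypotheses.values(), key=lambda h: -h.score)
        if len(ranked) == 1:
            return ranked[0].label
        if ranked and ranked[0].score - ranked[1].score >= self.margin:
            return ranked[0].label
        return None  # keep reading: defer the decision


reader = KeepReading(margin=1.0)
reader.add_evidence("PERSON", 0.6)   # sentence 1: weak type evidence
reader.add_evidence("CITY", 0.5)     # a competing hypothesis
print(reader.resolve())              # None -> still ambiguous, keep reading
reader.add_evidence("PERSON", 1.2)   # a later sentence adds evidence
print(reader.resolve())              # now PERSON wins by a clear margin
```

The key design point is that `resolve` is allowed to return `None`: downstream components must tolerate unresolved ambiguity until more text, or background knowledge, supplies the deciding evidence.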
2.2 Expressing relations in language
Due to the flexibility and expressive power of natural language, a specific type of semantic relation can usually be expressed in language in a myriad of ways. In addition, semantic relations are often implied by the expression of other relations. For example, all of the following sentences more or less express the same relation between an actor and a movie: (1) “Elijah Wood starred in Lord of the Rings: The Fellowship of the Ring”, (2) “Lord of the Rings: The Fellowship of the Ring’s Elijah Wood, ...”, and (3) “Elijah Wood’s coming of age was clearly his portrayal of the dedicated and noble hobbit that led the eponymous fellowship from the first episode of the Lord of the Rings trilogy.” No human reader would have any trouble recognizing the relation, but clearly this variability of expression presents a major problem for machine reading systems.
To get an empirical sense of the variability of natural language used to express a relation, we studied a few semantic relations: we found sentences that expressed each relation and extracted simple patterns to account for how the relation is expressed between two arguments, mainly by removing the relation arguments (e.g. “Elijah Wood” and “Lord of the Rings: The Fellowship of the Ring” above) and replacing them with variables. We then counted the number of times each pattern was used to express the relation, producing the recognizable long-tail distribution shown in Figure 1 for the top 50 patterns expressing the acted-in-movie relation in 17k sentences. More sophisticated pattern generalization (as discussed in later sections) would significantly fatten the head, bringing it closer to the traditional 50% of the area under the curve, but no amount of generalization will eliminate the tail. The patterns become increasingly esoteric, such as “The movie Death Becomes Her features a brief sequence in which Bruce Willis and Goldie Hawn’s characters plan Meryl Streep’s character’s death by sending her car off of a cliff on Mulholland Drive,” or “The best known Hawksian woman is probably Lauren Bacall, who iconically played the type opposite Humphrey Bogart in To Have and Have Not and The Big Sleep.”
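The pattern-extraction step described above can be sketched in a few lines. The example sentences and the `[X]`/`[Y]` placeholder convention are illustrative assumptions; the actual study presumably used more careful argument matching than plain string replacement.

```python
# Sketch of the pattern-extraction study: strip the two relation
# arguments out of each sentence, replace them with variables, and
# count how often each resulting pattern expresses the relation.
from collections import Counter


def make_pattern(sentence, arg1, arg2):
    """Replace the relation arguments with variable placeholders."""
    return sentence.replace(arg1, "[X]").replace(arg2, "[Y]")


# Toy corpus of (sentence, actor, movie) triples -- illustrative only.
examples = [
    ("Elijah Wood starred in The Fellowship of the Ring",
     "Elijah Wood", "The Fellowship of the Ring"),
    ("Ian McKellen starred in The Fellowship of the Ring",
     "Ian McKellen", "The Fellowship of the Ring"),
    ("The Fellowship of the Ring's Elijah Wood, ...",
     "Elijah Wood", "The Fellowship of the Ring"),
]

counts = Counter(make_pattern(s, a1, a2) for s, a1, a2 in examples)
for pattern, n in counts.most_common():
    print(n, pattern)
# "[X] starred in [Y]" occurs twice, the possessive pattern once --
# a miniature version of the long-tail distribution in Figure 1.
```

Ranking the patterns by count is exactly what produces the head/tail shape: a handful of conventional phrasings dominate, while most patterns occur only once.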
2.3 What relations matter
We do not consider relation extraction to be an end in and of itself, but rather as a component in larger systems that perform some task requiring interoperation between language- and knowledge-based components. Such larger tasks can include question answering, medical diagnosis, intelligence analysis, museum curation, etc. These tasks have evaluation criteria that go beyond measuring relation extraction results. The first step in applying relation detection to these larger tasks is analysis to determine what relations matter for the task and domain.
There are a number of manual and semi-automatic ways to perform such analysis. Repeating the theme of this paper, which is to use pre-existing knowledge bases as resources, we performed this analysis using Freebase and a set of 20k question-answer pairs representing our task domain. For each question, we formed tuples of each entity name in the question (QNE) with the answer, and found all