W10-09

state-of-the-art information extraction engine. A manually determined confidence threshold is used to discard mentions where co-reference certainty is too low. During pattern matching, co-reference is used to find the “best string” for each element of the matched triple. In particular, pronouns and descriptors are replaced by their referents, and abbreviated names are replaced by full names. If any pronoun cannot be resolved to a description or a name, or if the co-reference confidence falls below a manually determined threshold, then the match is discarded.

Pattern scoring requires that we compare instances of triples across the whole corpus. If these instances were compared purely on the basis of strings, in many cases the same entity would appear as distinct (e.g. US, United States). This would interfere with the arity constraints described below. To alleviate this challenge, we use a database of name strings that have been shown to be equivalent with a combination of edit-distance and extraction statistics (Baron & Freedman, 2008). Thus, for a triple tP and hypothesized triples HTi, if tP ∉ HTi but can be mapped via the equivalent names database to some triple tP’ ∈ HTi, then its score and confidence are adjusted towards that of tP’, weighted by the confidence of the equivalence.

4.3 Relation Set and Constraints

We ran experiments using 11 relation types. The relation types were selected as a subset of the relations that have been proposed for DARPA’s machine reading program. In addition to seed examples, we provided the learning system with three types of constraints for each relation:

Symmetry: For relations where R(X,Y) = R(Y,X), the learning (and scoring) process normalizes instances of the relation so that R(X,Y) and R(Y,X) are equivalent.
Arity: For each argument of the relation, provide an expected maximum number of instances per unique instance of the other argument. These numbers are intentionally higher than the expected true value to account for co-reference mistakes. Patterns that violate the arity constraint (e.g. v:accompanied(=, = as a pattern for spouse) are penalized. This is one way of providing negative feedback during the unsupervised training process.

Argument Type: For each argument, specify the expected class of entities for this argument. Entity types are one of the 7 ACE types (Person, Organization, Geo-political entity, Location, Facility, Weapon, Vehicle) or Date. Currently the system only allows instance proposals when the types are correct. Potentially, the system could use pattern matches that violate type constraints as an additional type of negative example. Any implementation would need to account for the fact that in some cases, potentially too general patterns are quite effective when the type constraints are met. For example, for the relation employed(PERSON, ORGANIZATION), ’s is a fairly precise pattern, despite clearly being overly general without the type constraints.

In our relation set, only two relations (sibling and spouse) are symmetric. Table 1 below includes the other constraints. ACE types/dates are in columns labeled with the first letter of the name of the type (A is arity). We have only included those types that are an argument for some relation.

Table 1: Argument types of the test relations

4.4 Corpus and Seed Examples

While many other experiments using this approach have used web-scale corpora, we chose to include Wikipedia articles as well as Gigaword-3 to provide additional instances of information (e.g. birthDate and sibling) that is uncommon in news.
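The three constraint types of Section 4.3 (symmetry, arity, argument type) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `RELATIONS` table, its field names, the arity limits, and the short type labels (`PER`, `ORG`) are all hypothetical values chosen for illustration, and the sketch only flags arity violators rather than applying the paper's (unspecified) penalty.

```python
from collections import defaultdict

# Hypothetical relation specs; the limits below are illustrative, not the paper's values.
RELATIONS = {
    "spouse": {"symmetric": True, "types": ("PER", "PER"),
               "max_x_per_y": 2, "max_y_per_x": 2},
    "employed": {"symmetric": False, "types": ("PER", "ORG"),
                 "max_x_per_y": 1000, "max_y_per_x": 5},
}

def normalize(relation, x, y):
    """Symmetry: map R(X,Y) and R(Y,X) to one canonical instance."""
    if RELATIONS[relation]["symmetric"] and y < x:
        x, y = y, x
    return (relation, x, y)

def types_ok(relation, x_type, y_type):
    """Argument type: only propose instances whose entity types match."""
    return (x_type, y_type) == RELATIONS[relation]["types"]

def arity_violations(relation, pairs):
    """Arity: return argument values paired with more distinct partners than allowed."""
    spec = RELATIONS[relation]
    if spec["symmetric"]:
        # Count partners in both directions for symmetric relations.
        pairs = [p for x, y in pairs for p in ((x, y), (y, x))]
    ys_for_x, xs_for_y = defaultdict(set), defaultdict(set)
    for x, y in pairs:
        ys_for_x[x].add(y)
        xs_for_y[y].add(x)
    violators = {x for x, ys in ys_for_x.items() if len(ys) > spec["max_y_per_x"]}
    violators |= {y for y, xs in xs_for_y.items() if len(xs) > spec["max_x_per_y"]}
    return violators

# A too-general pattern for spouse pairs one person with three partners.
matches = [("Alice", "Bob"), ("Alice", "Carol"), ("Alice", "Dan")]
print(arity_violations("spouse", matches))
```

Under these assumed limits, a pattern whose matches give one person three distinct spouses would be flagged, which is the kind of negative feedback the arity constraint is described as providing.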
For each relation type, 20 seed examples were selected randomly from the corpus by using a combination of keyword search and an ACE extraction system to identify passages that were likely to contain the relations of interest. As such, each seed example was guaranteed to occur at least once in a context that indicated the relation was present.

5 Evaluation Framework

To evaluate system performance, we ran two separate annotation-based evaluations: the first measured precision, and the second measured recall.

To measure overall precision, we ran each system’s patterns over the web-scale corpora and randomly sampled 200 of the instances it found (see footnote 6). These instances were then manually assessed as to whether they conveyed the intended relation or not. The system precision is simply the percentage of instances that were judged to be correct.

To measure recall, we began by randomly selecting 20 test examples from the corpus, using the same process that we used to select the training seeds (but guaranteed to be distinct from the training seeds). We then searched the web-scale corpus for sentences that might possibly link these test examples together (whether directly or via co-reference). We randomly sampled this set of sentences, choosing 10 sentences for each test example, to form a collection of 200 sentences which were likely to convey the desired relation. These sentences were then manually annotated to indicate which sentences actually convey the desired relation; this set of sentences forms the recall test set. Once a recall set had been created for each relation, a system’s recall could be evaluated by running that system’s patterns over the documents in the recall set and checking what percentage of the recall test sentences it correctly identified.

We intentionally chose to sample 10 sentences from each test example, rather than sampling from the set of all sentences found for any of the test examples, in order to prevent one or two very common instances from dominating the recall set. As a result, the recall test set is somewhat biased away from “true” recall, since it places a higher weight on the “long tail” of instances. However, we believe that this gives a more accurate indication of the system’s ability to find novel instances of a relation (as opposed to novel ways of talking about known instances).

Footnote 6: While the system provides estimated precision for each pattern, we do not evaluate over the n-best matches. All patterns with estimated confidence above 50% are treated equally. We sample from the set of matches produced by these patterns.

6 Effect of Pattern Type

As described in 4.1, our system is capable of learning two classes of patterns: surface-strings and text-graphs. We measured our system’s performance on each of the relation types after twenty iterations. In each iteration, the system can learn multiple patterns of either type.
There is currently no penalty for learning overlapping pattern types. For example, in the first iteration for the relation killed(), the system learns both the surface-string pattern killed
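The sampling procedure of the evaluation framework in Section 5 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the function names, the dictionary-based inputs, and the `judge` callable (a stand-in for the manual annotation step) are all hypothetical. The 50% confidence cutoff, the 200-instance sample, and the 10 sentences per test example come from the text.

```python
import random

def sample_precision(patterns, matches_by_pattern, judge, k=200, min_conf=0.5, seed=0):
    """Estimate precision from a random sample of pattern matches.

    patterns: dict mapping pattern -> estimated confidence
    matches_by_pattern: dict mapping pattern -> list of matched instances
    judge: callable standing in for the manual correctness assessment
    """
    # Patterns above the confidence cutoff are treated equally (no n-best ranking).
    kept = [p for p, conf in patterns.items() if conf > min_conf]
    pool = sorted({m for p in kept for m in matches_by_pattern.get(p, [])})
    sample = random.Random(seed).sample(pool, min(k, len(pool)))
    if not sample:
        return 0.0
    return sum(1 for m in sample if judge(m)) / len(sample)

def build_recall_candidates(sentences_by_example, per_example=10, seed=0):
    """Sample up to per_example sentences for each test example, so that no
    single very common instance dominates the recall set."""
    rng = random.Random(seed)
    chosen = []
    for example in sorted(sentences_by_example):
        sents = sentences_by_example[example]
        picked = rng.sample(sents, min(per_example, len(sents)))
        chosen.extend((example, s) for s in picked)
    return chosen

def recall(gold_sentences, found_sentences):
    """Fraction of annotated-positive sentences that the patterns identified."""
    gold = set(gold_sentences)
    return len(gold & set(found_sentences)) / len(gold) if gold else 0.0
```

Sampling per test example rather than from the pooled sentences is what weights the recall set toward the “long tail” of instances, as the text notes.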
Page 1 and 2:
NAACL HLT 2010 First International
