W10-09

Recommendations

Info

ence of an attack. For instance, the system learns patterns with predicates said and arrive. Certainly, governments often make statements on the date of an attack and troops arrive in a location before attacking, but both patterns will produce a large number of spurious instances. • While text-graph graph patterns typically have higher precision than the combined pattern set, su surface-string string patterns provide enough improv improvement in recall that typically the all all-pattern F- score is higher than the text-graph graph FF-score. Figure 6: Text-Graph and Surface-String String Patterns A partial explanation for the higher recall and precision of the text-graph graph patterns is illustrated in Figure 6 which presents a simple surface surface-string pattern and a simple text-graph graph pattern that appear to represent the same information. On the right of the figure are three sentences. The text text-graph pattern correctly identifies the agent and victim in each sentence. However, the surface surface-string pattern, misses the killed() () relation in the first sentence and misidentifies the victim in the second sentence. The false-alarm alarm in the second sentence would have been avoided by a system that restricted itself to matching named instances, but as described above in section 4.2, for the micro-reading reading task described here, detecting ecting relations with pronouns is critical. While we allowed the creation of text text-graph patterns with arbitrarily long paths between the arguments, in practice, the system rarely learned such patterns. For the relation killed(Agent, Vi Vic- tim), we learned 8 patterns that have more than one predicate (compared to 22 that only have a single predicate). For the relation killedInLocation(Victim, Location), , the system learned 28 pa patterns with more than 1 predicate (compared with 20 containing only 1 predicate). In both cases, the precision of the longer patterns was higher, but their recall was significantly lower. In the case of killedInLocation, none of the longer path patterns matched any of the examples in our recall set. One strength of text-graph graph patterns is allowing for intelligent omission of overly specific text text, for example, ignoring ‘during during a buglary’ in Figure 6. 66 Surface string patterns can include wild-cards, but for surface string patterns, the omission is not syntactically defined. Approximately 30% of surface string patterns included one wild-card. wild An additional 17% included two. Figure 7 presents aver- aged precision and recall for text-graph text and surface-string patterns. The final three columns break the surface-string string patterns down by the num- nu ber of wild-cards. It appears that with one, the pat- pa terns remain reasonably precise, but the addition of o a second wild-card drops precision by more than 50%. The presence of wild-card patterns improves recall, but surface-string string patterns do not reach the level of recall of text-graph graph patterns. Text Graph Surface String All No-* No 1-* 2-* Precision 0.75 0.61 0.72 0.69 0.30 Recall 0.32 0.22 0.16 0.10 0.<strong>09</strong> Figure 7: Performance by Number of WildCards (*) 7 Effect of Human Review In addition to allowing the system to self-train self in a completely unsupervised manner, we ran a parallel set of experiments where the system was given limited human guidance. At the end of iterations 1, 5, 10, and 20, , a person provided under 10 minutes of feedback (on average 5 minutes). minutes) The person was presented with five patterns, patterns and five sample matched instances for each pattern. The person was able to provide two types of feedback: • The pattern is correct/incorrect (e.g. said is an incorrect pattern for employ(X,Y)) employ( • The matched instances are correct/incorrect (e.g. ‘Bob received a diploma from MIT’ is a correct instance, even if the pattern that pro- pr duced it is debatable (e.g. v: subj:PERSON, from:ORGANIZATION). from:ORGANIZATION A correct instance can also produce a new knownto-be correct seed. Pattern judgments are stored in the database and incorporated as absolute truth. Instance judgments provide useful input into pattern scoring. Patterns were selected for annotation using a score that combines their estimated f-measure; easure; their the frequency; and their dissimilarity to patterns that were previously chosen hosen for annotation. The matched instances for each pattern are randomly sampled, to ensure that the resulting annotation can be used to derive an unbiased precision estimate. estima
Figure 8, Figure 9, and Figure 10 plot precision, recall, and F-score score at iterations 5 and 20 for the system running in a fully unsupervised manner and one allowing human intervention. Figure 8: : Precision at Iterations 5 and 20 for the Uns Unsupervised System and the System with Intervention Figure 9: Recall at Iterations 5 and 20 for the Unsupervised System and the System with Intervention Figure 10: F-Score Score at Iterations 5 and 20 for the Uns Unsupervised System and the System with Intervention For two relations: child and sibling sibling, recall improved dramatically atically with human intervention. By inspecting the patterns produced by the system, we see that in case of sibling without intervention, the system only learned the relation ‘brother’ brother’ and not the relation ‘sister.’ The limited feedback from a person was enough nough to allow the system to learn patterns for sister as well, causing the significantly improved recall. We see smaller, but frequently significant improvements in recall in a number of other relations. Interestingly, for different relat relations, the recall improvements are seen at different iter iterations. For sibling, the jump in recall appears within the first five iterations. Contrastingly, for attend- School, , there is a minor improvement in recall aaf 67 ter iteration 5, but a much larger improvement after afte iteration 20. For child, there is actually a small de- d crease in recall after 5 iterations, but after 20 itera- iter tions, the system has dramatically improved. The effect on precision is similarly varied. For 9 of the 11, human intervention improves preci- prec sion; but the improvement is never as dramatic as the improvement in recall. For precision, the strongest improvements in performance appear in the early iterations. It is unclear whether this mere- mer ly reflects that bootstrapping is likely to become less precise over time (as it learns more patterns), or if early feedback is truly better for improving precision. In the case of attackOn, even with human intervention, after iteration 10, the system begins to learn very general patterns of the type described in the previous section (e.g. as a pattern indicating ndicating an attack. attack These patterns may be correlated with experiencing an attack but are not themselves evidence of an attack. attack Because the overly general patterns do in fact cor- co relate with the presence of an attack, the positive examples provided by human intervention may in fact produce more such patterns. There here is an interaction between improved preci- prec sion and improved recall. If a system is very im- i precise at iteration n, , the additional instances that it proposes may not reflect the relation and be so dif- di ferent from each other that the system becomes unable to produce good patterns that improve re- r call. Conversely, if recall at iteration n does not produce a sufficiently diverse set of instances, it will be difficult for the system to generate instances that are used to estimate pattern precision. 8 Related Work Much research has been done on concept and relation detection using large amounts of super- supe vised training. This is the typical approach in pro- pr grams like Automatic Content Extraction (ACE), which evaluates luates system performance in detecting a fixed set of concepts and relations in text. In ACE, ACE all participating researchers are given access to a substantial amount of supervised training, e.g., 250k words of annotated data. Researchers have typically used this data to incorporate a great deal of structural syntactic information in their models (e.g. Ramshaw 2001), but the obvious weakness of these approaches is the resulting reliance on the
Page 1 and 2:
NAACL HLT 2010 First International
Page 3:
Introduction It has been a long ter
Page 7:
Table of Contents Machine Reading a
Page 10 and 11:
Sunday, June 6, 2010 (continued) Se
Page 12 and 13:
"coherent", based on criteria such
Page 14 and 15:
(SRL). While there are a number of
Page 16 and 17:
epeat select a clause chain Cu of
Page 18 and 19:
even if those words reflect somethi
Page 20 and 21:
Building an end-to-end text reading
Page 22 and 23:
found to be wrong or correct (by su
Page 24 and 25:
Text Queue Parser 4 Summary Text Mi
Page 26 and 27: formalized procedure to attach elem
Page 28 and 29: Steve_Walsh:throw:pass Steve_Walsh:
Page 30 and 31: NVNPN 2 'person':'intercept':'pass'
Page 32 and 33: 6 Related Work To build the knowled
Page 34 and 35: Large Scale Relation Detection ∗
Page 36 and 37: 1000 900 800 700 600 500 400 300 20
Page 38 and 39: to run against a web-scale corpus a
Page 40 and 41: Relation Prec Rec F1 Tuples Seeds i
Page 42 and 43: of the pattern space for any given
Page 44 and 45: Mining Script-Like Structures from
Page 46 and 47: e1 [ enter, nsubj, {customer, John}
Page 48 and 49: To identify such pairs, the topic s
Page 50 and 51: ≈ 57% 2. More words (bold) were j
Page 52 and 53: References Sergey Brin and Lawrence
Page 54 and 55: strengthsandweaknessesofthesystemin
Page 56 and 57: Fine-grainedRSTparser CollapsedRSTp
Page 58 and 59: Inference accuracy 0.3 0.25 0.2 0.1
Page 60 and 61: wehaveintroducedinferenceprocedures
Page 62 and 63: Semantic Role Labeling for Open Inf
Page 64 and 65: inference. Pruning involves using a
Page 66 and 67: TEXTRUNNER SRL-IE P R F1 P R F1 Bin
Page 68 and 69: that maximizes information gain div
Page 70 and 71: References Eugene Agichtein and Lui
Page 72 and 73: 1. Patterns based on words vs. pred
Page 74 and 75: state of the art information extrac
Page 78 and 79: manually annotated examples, which
Page 80 and 81: Towards Learning Rules from Natural
Page 82 and 83: tions of the learner suit the obser
Page 84 and 85: Accuracy Accuracy (Aggressive−Nov
Page 86 and 87: Accuracy Accuracy 100 95 90 85 80 7
Page 88 and 89: Unsupervised techniques for discove
Page 90 and 91: to categorize them. The article on
Page 92 and 93: ized label) or the most specialized
Page 94 and 95: Similarity Function k (L=2) F (L=2)
Page 96 and 97: References Sören Auer, Christian B
Page 98 and 99: a wide spectrum of solutions to the
Page 100 and 101: tion from the Web corpus at large s
Page 102 and 103: TextRunner, Kylin, KOG, WOE, WPE).
Page 104 and 105: Doug Downey, Matthew Broadhead, and
Page 106 and 107: Analogical Dialogue Acts: Supportin
Page 108 and 109: Heat flows from one place to anothe
Page 110 and 111: Source Text Translation* QRG-CE Tex
Page 112 and 113: 1987). We simplified the syntax of
Page 114 and 115: Gentner, D. (1983). Structure-Mappi
Page 116 and 117: sources can help in tasks like name
Page 118 and 119: werset in the previous step and bas
Page 120 and 121: we were able to construct a list of
Page 122 and 123: found at threshold of 2. There were
Page 124 and 125: Supporting rule-based representatio
Page 126 and 127:
the domain of the spatial argument
Page 128 and 129:
the time of the movement. This link
Page 130 and 131:
don’t seem to be plausible candid
Page 132 and 133:
PRISMATIC: Inducing Knowledge from
Page 134 and 135:
Figure 1: System Overview by a suit
Page 136 and 137:
ferent dimensions. Continuing with
Page 139:
Author Index Barbella, David, 96 Ba
show all

W10-09

Create successful ePaper yourself

Delete template?

Save as template?