needed for promotion. In Figures 1(a) and 1(b), the X- and Y-axes show the fraction of missing predicates and the support threshold for the novelty mention model and aggressive scoring of rules for FOIL and FARMER. The Z-axis shows the accuracy of predictions on the missing data, i.e., the fraction of the initially missing entries that are correctly imputed. We can see that aggressive scoring of rules with the novelty mention model performs very well even for large numbers of missing values, for both FARMER and FOIL. FARMER's performance is more robust than FOIL's because FARMER learns all correct rules and uses whichever rule fires. For example, in the NFL domain, it could infer gameWinner from gameLoser, teamSmallerScore, or teamGreaterScore. In contrast, FOIL's covering strategy prevents it from learning more than one rule if it finds one perfect rule. The results show that FOIL's performance degrades at larger fractions of missing data and larger support thresholds.
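To make the contrast concrete, the following sketch shows how a learner that keeps all correct rules can impute a missing fact from whichever rule fires. The rule set and records are illustrative, not FARMER's actual output; the predicate names follow the paper's NFL examples.

# Illustrative sketch (not FARMER's actual output): imputing a missing
# gameWinner fact using whichever of several redundant learned rules fires.
def rules_for_winner(game):
    """Each rule returns the imputed winner if its body predicates are
    present in the (incomplete) extracted record, else None."""
    teams = game.get("teams")          # e.g., ("Colts", "Saints")
    if teams is None:
        return None
    a, b = teams
    if "gameLoser" in game:            # gameWinner(G, T) :- gameLoser(G, T2), T != T2
        return a if game["gameLoser"] == b else b
    if "teamGreaterScore" in game:     # gameWinner(G, T) :- teamGreaterScore(G, T)
        return game["teamGreaterScore"]
    if "teamSmallerScore" in game:     # gameWinner(G, T) :- teamSmallerScore(G, T2), T != T2
        return a if game["teamSmallerScore"] == b else b
    return None                        # no learned rule fires; leave entry missing

game = {"teams": ("Colts", "Saints"), "teamSmallerScore": "Colts"}
print(rules_for_winner(game))          # -> "Saints"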
Figures 1(c) and 1(d) show the accuracy of prediction vs. the percentage of missing predicates for each of the mention models and scoring methods for FARMER and FOIL at a support threshold of 120. They show that aggressive scoring clearly outperforms conservative scoring on data generated using the novelty mention model. In FOIL, aggressive scoring also seems to outperform conservative scoring on the dataset generated by the random mention model at high levels of missing data. In FARMER, the two methods perform similarly. However, these results should be interpreted cautiously, as they are derived from a single dataset that is governed by deterministic rules. We are working toward a more robust evaluation in multiple domains, as well as on data extracted from natural texts.
4 Ensemble Co-learning with an Implicit Mention Model
One weakness of Multiple-predicate Bootstrapping is its high variance, especially when significant amounts of training data are missing. Aggressive evaluation of rules in this case would amplify the contradictory conclusions of different rules. Thus, picking only one rule among the many possible rules could lead to dangerously large variance.
One way to guard against the variance problem is to use an ensemble approach. In this section we test the hypothesis that an ensemble approach would be more robust and exhibit less variance in the context of learning from incomplete examples with an implicit mention model. For the experiments in this section, we employ a decision tree learner that uses a distributional scoring scheme to handle missing data, as described in (Quinlan, 1986).
While classifying an instance, when a missing value is encountered, the instance is split into multiple pseudo-instances, each with a different value for the missing feature and a weight corresponding to the estimated probability of that value (based on the frequency of values at this split in the training data). Each pseudo-instance is passed down the tree according to its assigned value. After reaching a leaf node, the frequency of the class in the training instances associated with that leaf is returned as the class-membership probability of the pseudo-instance. The overall estimated probability of class membership is calculated as the weighted average of class-membership probabilities over all pseudo-instances. If there is more than one missing value, the process recurses, with the weights combining multiplicatively. The process is similar at training time, except that the information gain at the internal nodes and the class probabilities at the leaves are calculated based on the weights of the relevant pseudo-instances.
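A minimal sketch of this distributional scoring scheme at classification time is given below. The tree representation, feature names, and value frequencies are hypothetical; a full implementation would also use the pseudo-instance weights at training time, as described above.

# Minimal sketch of the distributional scoring scheme (after Quinlan, 1986).
MISSING = None

def classify(node, instance, weight=1.0):
    """Return {class: probability} for one (possibly incomplete) instance."""
    if "leaf" in node:                       # leaf: class frequencies from training
        return {c: weight * p for c, p in node["leaf"].items()}
    feature = node["feature"]
    value = instance.get(feature, MISSING)
    probs = {}
    if value is MISSING:
        # Split into pseudo-instances, one per branch, weighted by the
        # frequency of each value at this split in the training data.
        for v, child in node["children"].items():
            branch = classify(child, instance, weight * node["value_freq"][v])
            for c, p in branch.items():
                probs[c] = probs.get(c, 0.0) + p
    else:
        probs = classify(node["children"][value], instance, weight)
    return probs

# Toy tree over one vote; leaf distributions and frequencies are made up.
tree = {
    "feature": "vote_on_budget",
    "value_freq": {"yea": 0.6, "nay": 0.4},
    "children": {
        "yea": {"leaf": {"democrat": 0.8, "republican": 0.2}},
        "nay": {"leaf": {"democrat": 0.3, "republican": 0.7}},
    },
}
print(classify(tree, {}))   # vote missing -> weighted average of both leaves
# -> {'democrat': 0.6*0.8 + 0.4*0.3, 'republican': 0.6*0.2 + 0.4*0.7}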
We use the confidence level for pruning a decision tree as a proxy for the support of a rule in this case. By setting this parameter to different values, we can obtain different degrees of pruning.
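For instance (an assumption about tooling, not the paper's setup), scikit-learn exposes cost-complexity pruning through the ccp_alpha parameter, which plays a role analogous to C4.5's pruning confidence factor even though the pruning criterion differs; sweeping it yields trees of different sizes.

# Hedged illustration: varying a pruning parameter to obtain different
# degrees of pruning. The data here is a toy set, not the voting records.
from sklearn.tree import DecisionTreeClassifier

X = [[0, 1], [1, 0], [1, 1], [0, 0]] * 10
y = [0, 1, 1, 0] * 10

for alpha in [0.0, 0.01, 0.05]:
    tree = DecisionTreeClassifier(ccp_alpha=alpha).fit(X, y)
    print(alpha, tree.get_n_leaves())   # larger alpha -> more aggressive pruning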
Experimental Results: We use the Congressional Voting Records 2 database for our experiments. The (non-text) database includes the party affiliation and votes on 16 measures for each member of the U.S. House of Representatives. Although this database (just like the NFL database) is complete, we generate two different synthetic versions of it to simulate the facts extracted from typical news stories on this topic. We use all the instances, including those with unknown values, for training, but do not count errors on these unknown values. We ex-
2 http://archive.ics.uci.edu/ml/datasets/Congressional+Voting+Records
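As an illustration (the exact generation procedure is an assumption), a random mention model can be simulated by hiding each attribute value of a complete record independently with probability p; the novelty-based version would instead retain only values that are surprising given the domain's rules.

import random

# Hypothetical sketch of producing a synthetic "extracted" version of a
# complete record with a random mention model: each attribute value is
# mentioned (kept) with probability 1 - p and hidden otherwise.
def random_mention(record, p, rng=random):
    return {attr: val for attr, val in record.items() if rng.random() >= p}

member = {"party": "democrat", "vote_on_budget": "yea", "vote_on_crime": "nay"}
print(random_mention(member, p=0.3))   # a randomly incomplete version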