Machine Learning - DISCo

In the case of noise-free training data, FOIL may continue adding new literals to the rule until it covers no negative examples. To handle noisy data, the search is continued until some tradeoff occurs between rule accuracy, coverage, and complexity. FOIL uses a minimum description length approach to halt the growth of rules, in which new literals are added only when their description length is shorter than the description length of the training data they explain. The details of this strategy are given in Quinlan (1990). In addition, FOIL post-prunes each rule it learns, using the same rule post-pruning strategy used for decision trees (Chapter 3).
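The description-length tradeoff above can be illustrated with a small sketch. The exact encoding FOIL uses is given in Quinlan (1990); the function and parameter names below are hypothetical, and the bit counts are a simplified stand-in for that encoding, not FOIL's actual formula.

```python
import math

def description_length_bits(num_choices: int) -> float:
    """Bits needed to name one option out of num_choices equally likely ones."""
    return math.log2(num_choices) if num_choices > 1 else 0.0

def should_add_literal(candidate_literals: int,
                       literals_in_rule: int,
                       positives_explained: int,
                       total_examples: int) -> bool:
    """Simplified MDL test (illustrative, not Quinlan's exact encoding):
    grow the rule only while it stays cheaper to encode than simply
    listing the positive examples it explains."""
    # Cost of the rule: one choice per literal, including the candidate.
    rule_bits = (literals_in_rule + 1) * description_length_bits(candidate_literals)
    # Cost of listing which of the examples are positive.
    data_bits = positives_explained * description_length_bits(total_examples)
    return rule_bits < data_bits

# A short rule explaining many positives is worth extending...
print(should_add_literal(16, 2, 20, 100))    # True
# ...but a long rule explaining a single example is not.
print(should_add_literal(1024, 10, 1, 100))  # False
```

The second call shows the noisy-data behavior the text describes: once a rule covers only a stray example, encoding the rule costs more bits than encoding the example, so growth halts.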

10.6 INDUCTION AS INVERTED DEDUCTION

A second, quite different approach to inductive logic programming is based on the simple observation that induction is just the inverse of deduction! In general, machine learning involves building theories that explain the observed data. Given some data D and some partial background knowledge B, learning can be described as generating a hypothesis h that, together with B, explains D. Put more precisely, assume as usual that the training data D is a set of training examples, each of the form ⟨xi, f(xi)⟩. Here xi denotes the ith training instance and f(xi) denotes its target value. Then learning is the problem of discovering a hypothesis h, such that the classification f(xi) of each training instance xi follows deductively from the hypothesis h, the description of xi, and any other background knowledge B known to the system.

(∀⟨xi, f(xi)⟩ ∈ D) (B ∧ h ∧ xi) ⊢ f(xi)    (10.2)

The expression X ⊢ Y is read "Y follows deductively from X," or alternatively "X entails Y." Expression (10.2) describes the constraint that must be satisfied by the learned hypothesis h; namely, for every training instance xi, the target classification f(xi) must follow deductively from B, h, and xi.

As an example, consider the case where the target concept to be learned is "pairs of people ⟨u, v⟩ such that the child of u is v," represented by the predicate Child(u, v). Assume we are given a single positive example Child(Bob, Sharon), where the instance is described by the literals Male(Bob), Female(Sharon), and Father(Sharon, Bob). Furthermore, suppose we have the general background knowledge Parent(u, v) ← Father(u, v). We can describe this situation in terms of Equation (10.2) as follows:

xi : Male(Bob), Female(Sharon), Father(Sharon, Bob)

f(xi) : Child(Bob, Sharon)

B : Parent(u, v) ← Father(u, v)

In this case, two of the many hypotheses that satisfy the constraint (B ∧ h ∧ xi) ⊢ f(xi) are

h1 : Child(u, v) ← Father(v, u)

h2 : Child(u, v) ← Parent(v, u)
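The constraint (B ∧ h ∧ xi) ⊢ f(xi) can be checked mechanically for this example. The sketch below (hypothetical helper names; atoms encoded as Python tuples, rules grounded over the two constants Bob and Sharon) forward-chains over ground Horn rules and confirms that both h1 and h2, combined with B and xi, entail Child(Bob, Sharon):

```python
from itertools import product

def ground(rule, constants):
    """Instantiate a rule over variables 'u' and 'v' with every constant pair."""
    head, body = rule
    grounded = []
    for u, v in product(constants, repeat=2):
        sub = lambda atom: tuple(u if t == "u" else v if t == "v" else t
                                 for t in atom)
        grounded.append((sub(head), [sub(b) for b in body]))
    return grounded

def entails(facts, rules, query, constants):
    """Forward-chain ground Horn rules to a fixed point; test if query is derived."""
    ground_rules = [g for r in rules for g in ground(r, constants)]
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in ground_rules:
            if head not in derived and all(b in derived for b in body):
                derived.add(head)
                changed = True
    return query in derived

# The instance description xi, background knowledge B, and hypotheses h1, h2.
xi = {("Male", "Bob"), ("Female", "Sharon"), ("Father", "Sharon", "Bob")}
B  = [(("Parent", "u", "v"), [("Father", "u", "v")])]
h1 = (("Child", "u", "v"), [("Father", "v", "u")])
h2 = (("Child", "u", "v"), [("Parent", "v", "u")])
target = ("Child", "Bob", "Sharon")
constants = ["Bob", "Sharon"]

print(entails(xi, B + [h1], target, constants))  # True: directly from Father(Sharon, Bob)
print(entails(xi, B + [h2], target, constants))  # True: via the derived fact Parent(Sharon, Bob)
```

Note the difference between the two derivations: h1 uses the instance literal Father(Sharon, Bob) directly, while h2 first applies the background rule in B to derive Parent(Sharon, Bob), illustrating how background knowledge participates in satisfying constraint (10.2).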
