
Note that the target literal Child(Bob, Sharon) is entailed by h1 ∧ x1 with no need for the background information B. In the case of hypothesis h2, however, the situation is different. The target Child(Bob, Sharon) follows from B ∧ h2 ∧ x1, but not from h2 ∧ x1 alone. This example illustrates the role of background knowledge in expanding the set of acceptable hypotheses for a given set of training data. It also illustrates how new predicates (e.g., Parent) can be introduced into hypotheses (e.g., h2), even when the predicate is not present in the original description of the instance xi. This process of augmenting the set of predicates, based on background knowledge, is often referred to as constructive induction.
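
To make the role of B concrete, here is a minimal sketch (not from the text) of a ground forward-chaining entailment check for this example. The specific clauses for h1, h2, and B, and the ground facts describing x1, are assumed from the chapter's earlier Bob/Sharon example; the entails helper and its name are purely illustrative.

# Minimal sketch: ground forward chaining to test whether a clause set
# entails the target literal Child(Bob, Sharon). Clauses are already
# grounded for the constants Bob and Sharon (an assumption for brevity).

def entails(facts, rules, goal):
    """Forward-chain over ground Horn rules until no new facts appear."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in known and all(b in known for b in body):
                known.add(head)
                changed = True
    return goal in known

# Ground description of instance x1 and the target literal f(x1)
x1 = {"Male(Bob)", "Female(Sharon)", "Father(Sharon,Bob)"}
target = "Child(Bob,Sharon)"

# Hypotheses and background knowledge, grounded for Bob and Sharon
h1 = [({"Father(Sharon,Bob)"}, "Child(Bob,Sharon)")]       # Child(u,v) <- Father(v,u)
h2 = [({"Parent(Sharon,Bob)"}, "Child(Bob,Sharon)")]       # Child(u,v) <- Parent(v,u)
B  = [({"Father(Sharon,Bob)"}, "Parent(Sharon,Bob)")]      # Parent(u,v) <- Father(u,v)

print(entails(x1, h1, target))       # True:  h1 ∧ x1 entails the target without B
print(entails(x1, h2, target))       # False: h2 ∧ x1 alone does not
print(entails(x1, B + h2, target))   # True:  B ∧ h2 ∧ x1 entails the target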

The significance of Equation (10.2) is that it casts the learning problem in the framework of deductive inference and formal logic. In the case of propositional and first-order logics, there exist well-understood algorithms for automated deduction. Interestingly, it is possible to develop inverses of these procedures in order to automate the process of inductive generalization. The insight that induction might be performed by inverting deduction appears to have been first observed by the nineteenth century economist W. S. Jevons, who wrote:

Induction is, in fact, the inverse operation of deduction, and cannot be conceived to exist without the corresponding operation, so that the question of relative importance cannot arise. Who thinks of asking whether addition or subtraction is the more important process in arithmetic? But at the same time much difference in difficulty may exist between a direct and inverse operation; ... it must be allowed that inductive investigations are of a far higher degree of difficulty and complexity than any questions of deduction... (Jevons 1874)

In the remainder of this chapter we will explore this view of induction as the inverse of deduction. The general issue we will be interested in here is designing inverse entailment operators. An inverse entailment operator, O(B, D), takes the training data D = {⟨xi, f(xi)⟩} and background knowledge B as input and produces as output a hypothesis h satisfying Equation (10.2).

O(B, D) = h such that (∀⟨xi, f(xi)⟩ ∈ D) (B ∧ h ∧ xi) ⊢ f(xi)

Of course there will, in general, be many different hypotheses h that satisfy (∀⟨xi, f(xi)⟩ ∈ D) (B ∧ h ∧ xi) ⊢ f(xi). One common heuristic in ILP for choosing among such hypotheses is to rely on the Minimum Description Length principle (see Section 6.6).
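
As an illustration of how the entailment constraint and a description-length preference might be combined, the following hypothetical sketch (not the text's algorithm) filters candidate hypotheses by checking the constraint on every training example, reusing the entails helper from the previous sketch, and prefers the candidate with the fewest clauses as a crude proxy for minimum description length.

# Hypothetical sketch: keep candidate hypotheses h satisfying
# (for all <xi, f(xi)> in D) (B ∧ h ∧ xi) ⊢ f(xi), then prefer the shortest.
# Reuses entails(), x1, target, h1, h2, and B from the previous sketch.

def consistent(B, h, D):
    """True if B ∧ h ∧ xi entails f(xi) for every training example."""
    return all(entails(xi, B + h, fxi) for xi, fxi in D)

def choose_hypothesis(B, D, candidates):
    """Among candidates meeting the constraint, return one with the fewest
    clauses (a rough stand-in for the Minimum Description Length preference)."""
    acceptable = [h for h in candidates if consistent(B, h, D)]
    return min(acceptable, key=len) if acceptable else None

# Usage with the single training example from the Bob/Sharon case:
D = [(x1, target)]
print(choose_hypothesis(B, D, [h1, h2]))   # both candidates satisfy the constraint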

There are several attractive features to formulating the learning task as finding a hypothesis h that satisfies the relation (∀⟨xi, f(xi)⟩ ∈ D) (B ∧ h ∧ xi) ⊢ f(xi).

• This formulation subsumes the common definition of learning as finding some general concept that matches a given set of training examples (which corresponds to the special case where no background knowledge B is available).

• By incorporating the notion of background information B, this formulation allows a richer definition of when a hypothesis may be said to "fit" the data. Up until now, we have always determined whether a hypothesis
