Consider a related algorithm we might call LEMMA-ENUMERATOR. The LEMMA-ENUMERATOR algorithm simply enumerates all proof trees that conclude the target concept based on assertions in the domain theory B. For each such proof tree, LEMMA-ENUMERATOR calculates the weakest preimage and constructs a Horn clause, in the same fashion as PROLOG-EBG. The only difference between LEMMA-ENUMERATOR and PROLOG-EBG is that LEMMA-ENUMERATOR ignores the training data and enumerates all proof trees.
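A minimal sketch of this idea in Python may help make it concrete. It assumes a propositional domain theory and a depth bound to keep the enumeration finite; the algorithm described in the text operates over first-order Horn clauses with unification, and the toy theory and helper names below are illustrative only, not from the text.

    from itertools import product

    def enumerate_preimages(goal, theory, depth):
        """Yield sets of leaf assertions from which goal follows by some
        proof tree of at most the given depth."""
        yield frozenset([goal])              # the trivial, unexpanded proof
        if depth == 0 or goal not in theory:
            return
        for body in theory[goal]:            # each clause concluding goal
            # A preimage of the goal is the union of one preimage
            # per literal in the clause body.
            for parts in product(*[list(enumerate_preimages(b, theory, depth - 1))
                                   for b in body]):
                yield frozenset().union(*parts)

    def lemma_enumerator(target, theory, depth=5):
        """Return Horn-clause lemmas (preconditions -> target) entailed by
        the theory alone, ignoring any training data."""
        return {pre for pre in enumerate_preimages(target, theory, depth)
                if pre != frozenset([target])}   # drop the trivial lemma

    # Toy theory: each conclusion maps to alternative clause bodies.
    theory = {
        "SafeToStack": [("Lighter",)],
        "Lighter":     [("WeighsLess", "SameUnits")],
    }
    for pre in lemma_enumerator("SafeToStack", theory):
        print(sorted(pre), "->", "SafeToStack")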

Notice LEMMA-ENUMERATOR will output a superset of the Horn clauses output by PROLOG-EBG. Given this fact, several questions arise. First, if its hypotheses follow from the domain theory alone, then what is the role of training data in PROLOG-EBG? The answer is that training examples focus the PROLOG-EBG algorithm on generating rules that cover the distribution of instances that occur in practice. In our original chess example, for instance, the set of all possible lemmas is huge, whereas the set of chess positions that occur in normal play is only a small fraction of those that are syntactically possible. Therefore, by focusing only on training examples encountered in practice, the program is likely to develop a smaller, more relevant set of rules than if it attempted to enumerate all possible lemmas about chess.

The second question that arises is whether PROLOG-EBG can ever learn a hypothesis that goes beyond the knowledge that is already implicit in the domain theory. Put another way, will it ever learn to classify an instance that could not be classified by the original domain theory (assuming a theorem prover with unbounded computational resources)? Unfortunately, it will not. If B ⊢ h, then any classification entailed by h will also be entailed by B. Is this an inherent limitation of analytical or deductive learning methods? No, it is not, as illustrated by the following example.

To produce an instance of deductive learning in which the learned hypothesis h entails conclusions that are not entailed by B, we must create an example where B ⊬ h but where D ∧ B ⊢ h (recall the constraint given by Equation (11.2)). One interesting case is when B contains assertions such as "If x satisfies the target concept, then so will g(x)." Taken alone, this assertion does not entail the classification of any instances. However, once we observe a positive example, it allows generalizing deductively to other unseen instances. For example, consider learning the PlayTennis target concept, describing the days on which our friend Ross would like to play tennis. Imagine that each day is described only by the single attribute Humidity, and the domain theory B includes the single assertion "If Ross likes to play tennis when the humidity is x, then he will also like to play tennis when the humidity is lower than x," which can be stated more formally as

    (∀x) IF ((PlayTennis = Yes) ← (Humidity = x))
         THEN ((PlayTennis = Yes) ← (Humidity ≤ x))

Note that this domain theory does not entail any conclusions regarding which instances are positive or negative instances of PlayTennis. However, once the learner observes a positive example day for which Humidity = .30, the domain theory together with this positive example entails the following general hypothesis:

    (PlayTennis = Yes) ← (Humidity ≤ .30)
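As a rough illustration, this deductive step can be sketched in Python as follows, treating humidity values as fractions and leaving days the learned clause does not cover unclassified; the function names are illustrative, not from the text.

    def learn_from_positive_example(observed_humidity):
        """Given one positive day with Humidity = observed_humidity, B entails
        (PlayTennis = Yes) <- (Humidity <= observed_humidity)."""
        def classify(humidity):
            # The learned clause licenses only positive conclusions; more
            # humid days remain unclassified by B plus this one example.
            return "Yes" if humidity <= observed_humidity else "unknown"
        return classify

    play_tennis = learn_from_positive_example(0.30)  # positive day, Humidity = .30
    print(play_tennis(0.10))   # "Yes": entailed for an unseen, drier day
    print(play_tennis(0.45))   # "unknown": not entailed either way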
