
Learning in First-Order Logic using Greedy Evolutionary Algorithms

Federico Divina
divina@cs.vu.nl

Elena Marchiori
elena@cs.vu.nl

Department of Mathematics and Computer Sciences, Vrije Universiteit, De Boelelaan 1081a, 1081 HV Amsterdam, The Netherlands

Abstract

In evolutionary computation ‘learning’ is a byproduct of the evolutionary process, as successful individuals are retained through stochastic trial and error. This learning process can be rather slow, due to the weak strategy used to guide evolution. A way to overcome this drawback is to incorporate greedy operators in the evolutionary process. This paper investigates the effectiveness of this approach for inductive concept learning in (a fragment of) First-Order Logic (FOL). This is done by means of a new greedy evolutionary algorithm. The algorithm evolves a population of Horn clauses. Randomized greedy operators are employed for generalizing and specializing a clause. The degree of greediness of each operator is determined by a parameter; in this way, the user can control the greediness of the learning process by setting the parameters to specific values. A typical case study in Inductive Logic Programming (the KRK endgame problem) is used for testing the learning method. The effect of the greedy operators on the learning process is analyzed by means of extensive experiments with different values of their parameters. Moreover, the robustness of the method to noise in the training examples is investigated.

1. Introduction

Learning from examples in FOL, also known as Inductive Logic Programming (ILP) (Muggleton & Raedt, 1994), constitutes a central topic in Artificial Intelligence, with relevant applications to problems in complex domains like natural language and molecular computational biology (Muggleton, 1999). Given a FOL description language used to express possible hypotheses, a set of positive examples, and a set of negative examples, one has to find a hypothesis which covers all positive examples and none of the negative ones (cf. (Kubat et al., 1998; Mitchell, 1997)).

Learning hypotheses in FOL is a hard task, because the search space (of all hypotheses) has prohibitive size, even when restrictions on the representation are imposed (e.g. Datalog clauses). A standard approach to tackle this problem, adopted in the majority of FOL learning systems, is to use specific search strategies, like general-to-specific (hill-climbing) search (e.g. (Quinlan, 1990)) and the inverse resolution mechanism (e.g. (Muggleton & Buntine, 1988)).

An alternative approach, based on evolutionary computation, employs a multi-point search strategy where randomized operators (mutation and crossover) are used to move in the search space, and a fitness function is used to guide the search (by means of a probabilistic selection process) (Jong et al., 1993). The resulting learning methods are in general weaker than the ILP ones, but more effective in escaping from attraction basins (local optima) during the search for a best hypothesis.

This paper investigates a learning framework which unites these two approaches. This is done via generalization/specialization operators which are incorporated in the mutation process of a simple evolutionary algorithm. Four randomized greedy operators are introduced, two for generalizing and two for specializing a clause. Each operator is equipped with a parameter which determines its degree of greediness. Different values of the parameters determine different search strategies: low values yield weak learning methods, like standard evolutionary algorithms, while high values yield greedier learning methods, like Inductive Logic Programming systems.

The resulting hybrid evolutionary algorithm is tested on a case study which appears frequently in the ILP literature: learning illegal White-to-move positions in the chess endgame White King and Rook versus Black King. The results of experiments on datasets for this problem indicate that the presence of greediness in the operators is beneficial for the learning process, but that too much greediness may affect it negatively. Satisfactory performance can be obtained also when greediness is used in only one operator. Moreover, the hybrid algorithm exhibits a robust behaviour when different levels of noise are introduced in the training dataset.

This investigation suggests an experimental methodology for designing greedy evolutionary algorithms for inductive learning, where the user constructs a suitable search strategy for a given learning problem by setting the greediness parameters to specific (experimentally determined) values.

2. Evolutionary Approaches

One can distinguish two main inductive learning approaches based on evolutionary computation, called Pittsburgh and Michigan (cf. (Michalewicz, 1996)). In the first approach, an individual represents an entire set of rules, a population of rule sets is maintained, and selection and genetic operators are used to produce new generations of rule sets. Instances of this approach are e.g. GIL (Janikow, 1993), GLPS (Leung & Wong, 1995) and STEPS (Kennedy & Giraud-Carrier, 1999).

In contrast, the Michigan approach employs a representation where an individual represents one rule, and individuals co-operate and compete in the evolutionary process. In this approach, specific strategies have to be designed in order to extract a non-redundant hypothesis from the final population. Systems for FOL learning based on this approach are e.g. REGAL (Giordana & Neri, 1996) and DOGMA (Hekanaho, 1998).

Both approaches present advantages and drawbacks. Encoding a whole hypothesis in each individual allows easier control of the genetic search, but introduces large redundancy, which can lead to populations that are hard to manage and to individuals of enormous size. In the Michigan approach, co-operation/competition between different individuals reduces redundancy and more complex problems can be handled, but more sophisticated strategies may have to be designed to cope with the presence in the population of super-individuals, which lead the evolutionary process to premature convergence.

This paper adopts the Michigan approach.

3. A Greedy Evolutionary Algorithm

This section describes the components of a hybrid evolutionary algorithm for inductive learning in FOL, called GEL (Greedy Evolutionary Learner). The algorithm evolves a population of Horn clauses using a mutation process for moving in the search space, a fitness function for measuring the quality of clauses, and a probabilistic selection/replacement mechanism for culling fitter rules from the population. In order to guide the search, the mutation process uses four greedy operators, described in the next section. A best hypothesis is extracted from the final population using a heuristic procedure.

3.1 Notation and Terminology

The algorithm considers Horn clauses of the form

p(X, Y) ← r(X, Z), q(Y, a).

consisting of atoms whose arguments are either variables (e.g. X, Y, Z) or constants (e.g. a). The left-hand side of the rule is called the head, and the right-hand side the body, of the clause. These rules have a declarative interpretation (universally quantified FOL implications) and a procedural interpretation (in order to solve p(X, Y), solve r(X, Z) and q(Y, a)). A set of Horn clauses forms a logic program, which can be directly (in a slightly different syntax) executed in the programming language Prolog.

Thus the goal of GEL is to induce a logic program from a set of training examples. The number of examples in the training set is denoted by nte, and the number of its positive (negative) examples by pos (neg, respectively). A clause covers an example if the theory formed by the clause and the background knowledge logically entails the example. Given a clause cl, the number of positive (negative) examples covered by cl is denoted by pos_cl (neg_cl, respectively).
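For the ground case, this coverage notion can be sketched directly (a simplified illustration: atoms are represented as tuples, and the membership-based entailment test is an assumption that only works for ground clauses; non-ground clauses require proper logical inference):

```python
def covers_ground(clause, example, background):
    """Ground-clause coverage check: the clause covers the example if
    its head equals the example and every body atom is a fact of the
    background knowledge (so clause + background entail the example)."""
    head, body = clause
    return head == example and all(atom in background for atom in body)

# Hypothetical KRK-style atoms, represented as tuples.
bk = {("adjacent", "c", "d"), ("less_than", 6, 7)}
cl = (("illegal", "g", 6, "c", 7, "c", 8),
      [("adjacent", "c", "d"), ("less_than", 6, 7)])
print(covers_ground(cl, ("illegal", "g", 6, "c", 7, "c", 8), bk))  # True
```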

3.2 Representation and Fitness

The algorithm operates on Horn clauses of the restricted form described above.

The fitness of a clause cl is defined by

fitness(cl) = w1 * neg_cl + (nte − pos_cl)

where the integer w1 > 1 is a weight used to favor clauses covering few negative examples. GEL thus has to evolve clauses with minimum fitness.
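As a minimal sketch, assuming the coverage counts of a clause have already been computed (the function name and the default w1 = 2 are illustrative, not GEL's actual setting):

```python
def fitness(pos_cl, neg_cl, nte, w1=2):
    """Fitness of a clause: w1 * neg_cl + (nte - pos_cl).

    pos_cl / neg_cl: positive / negative training examples covered by
    the clause; nte: total number of training examples; w1 > 1 makes
    covering negative examples costly. Lower fitness is better.
    """
    return w1 * neg_cl + (nte - pos_cl)

# A clause covering all 50 positives and no negatives of a
# 100-example training set reaches the minimum fitness:
print(fitness(pos_cl=50, neg_cl=0, nte=100))   # 50
print(fitness(pos_cl=50, neg_cl=10, nte=100))  # 70
```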


3.3 Initialization

Each clause cl of the initial population is generated in two phases, as follows.

First, a ground (i.e. variable-free) clause is constructed whose head is a positive example randomly selected from the training set, and whose body consists of all atoms in the background knowledge having at most one argument which does not occur in the head. This procedure is similar to the one used in CLINT (Raedt, 1992), but in CLINT each argument of the body also occurs in the head of the clause.

All other elements of the background knowledge having at least one argument occurring in the head are inserted in a list B_cl associated with cl. Atoms of this list may be added to the clause body during the evolutionary process (by a specialization operator).

Next, cl is obtained from this ground clause by applying one of the two generalization operators (described in Sections 4.1 and 4.2), randomly chosen.

3.4 Evolutionary Cycle

At each iteration, the probabilistic tournament selection mechanism (cf. (Blickle, 2000)) is used to select a number of individuals: the better the fitness, the higher the probability of selection. Then every selected individual undergoes the mutation process.

No other evolutionary operator is used. In particular, crossover is not used, in accordance with the conviction that ‘a sum of optimal parts rarely leads to an optimal overall solution’, which is the key to the (philosophical) distinction between evolutionary and genetic algorithms (cf. (Porto, 2000)).

3.5 Mutation Process

The mutation process consists of the repeated application of the greedy operators to an individual until its fitness no longer improves (or a maximum number of iterations is reached).

At each iteration, one of the four greedy operators is applied. The operator is chosen as follows. First, a (randomized) test decides whether it will be a generalization or a specialization operator; next, one of the two operators of the chosen class is randomly selected. The test decides to generalize a clause cl with probability

p_gen(cl) = 1/2 * ((pos_cl − neg_cl) / nte + α)

and otherwise to specialize it (with probability 1 − p_gen(cl)). The constant α = 1 + 0.5 * (neg − pos) is used to slightly bias the decision towards generalization. The probability p_gen(cl) is maximal when cl covers all positive and no negative examples, and minimal in the dual case.
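The decision step above can be sketched as follows (a hedged illustration; the operator names and function signatures are stand-ins, not GEL's implementation):

```python
import random

def p_gen(pos_cl, neg_cl, nte, pos, neg):
    """Probability of choosing a generalization operator for a clause:
    p_gen(cl) = 1/2 * ((pos_cl - neg_cl) / nte + alpha), where
    alpha = 1 + 0.5 * (neg - pos) biases slightly towards generalizing."""
    alpha = 1 + 0.5 * (neg - pos)
    return 0.5 * ((pos_cl - neg_cl) / nte + alpha)

def choose_operator(pos_cl, neg_cl, nte, pos, neg, rng=random):
    """Pick one of the four greedy operators, as in Section 3.5:
    first generalize vs. specialize, then one of the two operators."""
    if rng.random() < p_gen(pos_cl, neg_cl, nte, pos, neg):
        return rng.choice(["constant_into_variable", "atom_deletion"])
    return rng.choice(["variable_into_constant", "atom_addition"])

# With a balanced training set (pos == neg, so alpha == 1), a clause
# covering all positives and no negatives is most likely generalized:
print(p_gen(pos_cl=50, neg_cl=0, nte=100, pos=50, neg=50))  # 0.75
```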

3.6 Hypothesis Extraction

At the end of the evolutionary process, a best logic program, covering all the positive examples and no negative ones, has to be extracted from the final population. This problem can be translated into an instance of the weighted set covering problem as follows. Each individual cl of the final population is a column, with positive weight equal to

weight_cl = neg_cl * fitness(cl) + 1

and each positive example is a row. The problem consists of finding a subset of the set of columns covering all the rows and having minimum total weight. The weight of a column is defined in this way in order to prefer clauses covering few negative examples. A fast (heuristic) algorithm (cf. (Caprara et al., 1998)) is applied to this problem instance to find a best logic program.
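The extraction step can be illustrated with the textbook greedy heuristic for weighted set covering; note that this simple ratio-based heuristic is a stand-in for illustration, not the algorithm of (Caprara et al., 1998) actually used:

```python
def greedy_set_cover(columns, rows):
    """Greedy weighted set covering.

    columns: dict mapping a clause id to (weight, set of positive
    examples it covers); rows: the set of all positive examples.
    Repeatedly picks the column with the lowest weight per newly
    covered row, until every row is covered.
    """
    uncovered, chosen = set(rows), []
    while uncovered:
        best = min(
            (c for c in columns if columns[c][1] & uncovered),
            key=lambda c: columns[c][0] / len(columns[c][1] & uncovered),
        )
        chosen.append(best)
        uncovered -= columns[best][1]
    return chosen

# weight_cl = neg_cl * fitness(cl) + 1, so clauses covering no
# negatives get weight 1 and are strongly preferred:
cols = {"c1": (1, {1, 2, 3}), "c2": (1, {3, 4}), "c3": (5, {1, 2, 3, 4})}
print(greedy_set_cover(cols, {1, 2, 3, 4}))  # ['c1', 'c2']
```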

4. Greedy Operators

A clause cl is generalized either by replacing (all occurrences of) a constant of the clause with a variable, or by deleting an atom from the body of the clause. Dually, cl is specialized either by replacing (all occurrences of) a variable of cl with a constant, or by adding an atom to the body of cl.

The four operators utilize parameters N1, . . . , N4, respectively, in their definition, together with a gain function. Applied to an operator τ and a clause cl, the gain function yields the difference between the clause fitness before and after the application of the operator:

gain(cl, τ) = fitness(cl) − fitness(τ(cl)).

The four operators are defined below.

4.1 Variable into Constant

Consider the set Con consisting of N1 constants of cl, randomly chosen, and the set Var consisting of all the variables of cl plus a fresh variable.

For each a in Con and each X in Var, compute gain(cl, {a/X}), the gain of cl when all occurrences of a are replaced by X.

Choose a substitution {a/X} yielding the highest gain (ties are broken randomly), and generalize cl by replacing all occurrences of a with X.
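A minimal sketch of this operator, assuming clauses are lists of (predicate, arguments) tuples and a fitness callback is available (both representation and names are illustrative); since gain(cl, {a/X}) = fitness(cl) − fitness(cl{a/X}), maximizing the gain amounts to minimizing the fitness of the rewritten clause:

```python
import itertools
import random

def generalize_by_substitution(clause, fitness_of, n1=2, rng=random):
    """Section 4.1 sketch: replace all occurrences of one constant of
    the clause by a variable, keeping the substitution of highest gain.

    clause: list of (predicate, args) atoms; arguments starting with
    an uppercase letter are variables, the rest are constants.
    fitness_of: callback evaluating a candidate clause.
    """
    terms = {a for _, args in clause for a in args}
    consts = sorted(a for a in terms if not a[:1].isupper())
    variables = sorted(a for a in terms if a[:1].isupper()) + ["Z0"]  # fresh
    chosen = rng.sample(consts, min(n1, len(consts)))

    def substitute(const, var):
        return [(p, tuple(var if a == const else a for a in args))
                for p, args in clause]

    candidates = [substitute(c, v)
                  for c, v in itertools.product(chosen, variables)]
    return min(candidates, key=fitness_of)  # highest-gain rewrite

# Toy fitness: count constant occurrences (fewer = more general).
count_consts = lambda cl: sum(not a[:1].isupper()
                              for _, args in cl for a in args)
cl = [("p", ("a", "b")), ("q", ("b", "X"))]
print(count_consts(generalize_by_substitution(cl, count_consts)))  # 1
```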


4.2 Atom Deletion

Consider the set Atm consisting of N2 atoms of cl, randomly chosen.

For each A in Atm, compute gain(cl, −A), the gain of cl when A is deleted from cl.

Choose an atom A yielding the highest gain gain(cl, −A) (ties are broken randomly), and generalize cl by deleting A from its body.

Insert the deleted atom A into a list D_cl containing atoms which have been deleted from cl. Atoms from this list may be added back to the clause during the evolutionary process by means of a specialization operator.

4.3 Constant into Variable

Consider a variable X of cl, randomly selected, and the set Con consisting of N3 constants (of the problem language), randomly chosen.

For each a in Con, compute gain(cl, {X/a}), the gain of cl when all occurrences of X are replaced by a.

Choose a substitution {X/a} yielding the highest gain (ties are broken randomly), and specialize cl by replacing all occurrences of X with a.

4.4 Atom Addition

Consider the set Atm consisting of N4 atoms of B_cl (introduced in the initialization of GEL) and of N4 atoms of D_cl, all randomly chosen.

For each A in Atm, compute gain(cl, +A), the gain of cl when A is added to cl.

Choose an atom A yielding the highest gain gain(cl, +A) (ties are broken randomly), and specialize cl by adding A to its body.

Remove A from its original list (B_cl or D_cl).

5. Experimental Setup

The case study used to test GEL is the problem of learning illegal positions in the chess endgame domain White King and Rook versus Black King (KRK endgame) (Muggleton et al., 1989; Quinlan, 1990). The concept to learn is expressed by the predicate illegal(A, B, C, D, E, F), which states that the position where the White King is at (A, B), the White Rook at (C, D) and the Black King at (E, F) is an illegal White-to-move position. For instance, illegal(g, 6, c, 7, c, 8) is an illegal White-to-move position. The background knowledge consists of facts about the two predicates adjacent(A, B) and less_than(A, B), indicating that rank/file A is adjacent to, and less than, rank/file B, respectively. The dataset originates from (Muggleton et al., 1989), and is available at http://oldwww.comlab.ox.ac.uk/oucl/groups/machlearn/chess.html. It consists of 5 training sets of 100 examples and 1 test set of 5000 examples. The background knowledge contains 50 elements.

The parameter setting of GEL was obtained after performing a small number of tuning experiments; the chosen values reflect a tradeoff between efficiency and performance. The population contains 200 individuals. The tournament size is 4. At each iteration, 15 individuals are selected (using the tournament selection mechanism) and the mutation process is applied to each of them. The algorithm terminates after 30 iterations.

6. Results

The effect of varying the value of one greedy parameter Ni (i in 1, . . . , 4) is investigated while the values of the other parameters are fixed, either all equal to 1 (no greediness) or all equal to their maximum value max (full greediness). In this way, a parameter setting can be described by a pair (Ni = v, w), meaning that Ni has value v and all other parameters have value w.

For each parameter setting, 25 runs of GEL are performed: 5 runs with different random seeds for each of the 5 training sets. One run of GEL takes on average about 2 minutes on a Sun Ultra 250 (UltraSPARC-II, 400 MHz).

For each run, the simplicity (i.e. number of clauses) of the resulting logic program and its accuracy on the test set (i.e. number of correctly classified test examples divided by the total number of test examples) are computed. The average of the results over the 25 runs is taken as the final result (the standard deviation is not reported because the results do not differ much from each other).

The results are illustrated in Figure 1. For each parameter Ni, the (average) values of accuracy (left column) and simplicity (right column) are plotted for different values of Ni in the two configurations w = 1 and w = max.

The results indicate that the best accuracy is obtained when greediness is present in all the parameters. However, satisfactory results are obtained also when greediness is incorporated in only one parameter (e.g. in the parameter setting (N3 = 14, w = 1)). Moreover, too much greediness seems to have a negative influence on the learning process, possibly because it confines the search to limited regions in the neighbourhood of local optima. In general, accuracy does not increase monotonically with greediness, and it exhibits different behaviour for the four parameters.

[Figure 1. Accuracy and number of clauses for different values of the N parameters. For each parameter Ni, accuracy (left column) and number of clauses (right column) are plotted against Ni for the configurations w = 1 and w = max.]

The results concerning simplicity do not allow us to derive any general functional relationship between simplicity and greediness of the operators. Simple programs are obtained for both very low and very high values of the parameters, with somewhat irregular behaviour for the other settings.

7. Noise

It is interesting to test the robustness of GEL when a controlled amount of noise is introduced in the training dataset, as done in (Lavrač & Džeroski, 1994; Lavrač et al., 1996). Three different types of noise are added at different noise levels: noise in the arguments (type na), noise in the class, positive and negative (type nc), and noise in both arguments and class (type nb). For each training set, and for each type of noise, seven datasets with different noise levels p (5, 10, 15, 20, 30, 50, and 80%) are generated, meaning that p% of the examples are corrupted by replacing a value with a random value in the domain.
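The corruption scheme for noise in the arguments (type na) can be sketched as follows (the tuple encoding of examples and the KRK domains are illustrative assumptions):

```python
import random

def add_argument_noise(examples, domains, p, rng=random):
    """Noise of type 'na' (in arguments): corrupt p% of the examples
    by replacing one randomly chosen argument of each selected example
    with a random value from that argument's domain."""
    noisy = [list(e) for e in examples]
    k = round(len(noisy) * p / 100)
    for i in rng.sample(range(len(noisy)), k):
        j = rng.randrange(len(noisy[i]))
        noisy[i][j] = rng.choice(domains[j])
    return [tuple(e) for e in noisy]

# Hypothetical KRK domains: three (file, rank) coordinate pairs.
files, ranks = list("abcdefgh"), list(range(1, 9))
domains = [files, ranks] * 3
data = [("g", 6, "c", 7, "c", 8)] * 100
noisy = add_argument_noise(data, domains, 20)
print(sum(x != data[0] for x in noisy))  # at most 20 examples changed
```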

GEL is tested in a setting with moderate greediness: N1 = 2, N2 = 1, N3 = 2, N4 = 4. The results of the experiments are illustrated in Figure 2 (as in the previous experiments, the average results over all the runs are considered).

[Figure 2. Accuracy and simplicity for different levels of noise: on arguments, on classes, and on both.]

As expected, a rapid decay in accuracy and simplicity is observed when the noise level increases, with programs containing more and more clauses. A comparison with the results of other ILP systems on the same datasets, like FOIL, DOGMA and mFOIL (cf. (Hekanaho, 1998)), indicates that the performance of GEL is comparable to that of FOIL: slightly worse than FOIL for low levels of noise, and slightly better for noise levels of 50 and 80%. These ILP systems use specific mechanisms for noise handling, while GEL's robustness to noise is mainly due to its stochastic nature.

8. Related Work

There are few systems for inductive learning in FOL based on evolutionary algorithms.

REGAL (Giordana & Neri, 1996) and DOGMA (Hekanaho, 1998) are hybrids between the Pittsburgh and Michigan approaches. They use a restricted Horn clause language and an explicit bias for restricting the search space, with simple specialization and generalization operators incorporated in crossover operators (acting on bit representations) and a standard blind mutation operator.

A more compact and complete binary representation is used in a recent framework (Tamaddoni-Nezhad & Muggleton, 2000) for incremental learning in FOL. The encoding of solutions is based on a bottom clause (of a subsumption lattice) constructed according to the background knowledge using ILP methods such as Inverse Entailment. Novel crossover operators for generalization and specialization based on this representation are introduced. The encoding and operators can be interpreted in terms of standard ILP concepts.

GLPS (Leung & Wong, 1995) and STEPS (Kennedy & Giraud-Carrier, 1999) are based on the Pittsburgh approach. These systems evolve a population of logic programs. GLPS uses the same restrictions on the form of the clauses as GEL, represents a logic program by means of an AND-OR tree, and utilizes standard blind two-point crossover.

STEPS uses a tree-like representation employed in Genetic Programming and works on strongly typed (Escher) programs; hence it uses modified genetic operators to handle type constraints. However, as in GLPS, no (other) knowledge is incorporated into the genetic operators.

9. Conclusions<br />

The main contribution of this paper is a new framework for evolving a population of Horn clauses which unites ILP and evolutionary computation. The framework allows the user to experiment with search strategies of different degrees of greediness by setting the parameters of the four greedy operators used in the mutation process of a hybrid evolutionary algorithm.

The research in this paper concerned a possible integration of greedy search strategies into FOL learning methods based on evolutionary computation. The development of a successful inductive learning system based on the framework proposed in this paper, and its application to real-life problems, needs further investigation. Other interesting topics for future research include extending the framework to deal with multiple predicates and with recursion.
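One common way to realize a tunable degree of greediness, shown here as a sketch and not as the paper's actual operators, is to sample several candidate refinements of a clause and keep the fittest: a sample size of 1 degenerates to blind mutation, while larger values give greedier search.

```python
import random

def greedy_mutate(clause, refine, fitness, n_candidates, rng=random):
    """Randomized greedy operator sketch: draw n_candidates random
    refinements of a clause and keep the fittest one.  n_candidates
    is the greediness parameter: 1 means blind mutation, larger
    values mean greedier (and costlier) search.  Illustrative only --
    the paper's four operators differ in detail."""
    candidates = [refine(clause, rng) for _ in range(n_candidates)]
    return max(candidates, key=fitness)

# Toy example: "clauses" are integers, a refinement is a random step,
# and fitness prefers values close to 10 (all names hypothetical).
refine = lambda c, rng: c + rng.choice([-2, -1, 1, 2])
fitness = lambda c: -abs(c - 10)

greedy = greedy_mutate(0, refine, fitness, n_candidates=8,
                       rng=random.Random(1))
blind = greedy_mutate(0, refine, fitness, n_candidates=1,
                      rng=random.Random(1))
```

With the same seed, the greedy call sees the blind call's candidate plus seven more, so its result is never worse; the parameter thus trades extra fitness evaluations for stronger local guidance.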

References

Blickle, T. (2000). Tournament selection. In T. Bäck, D. Fogel and T. Michalewicz (Eds.), Evolutionary Computation 1, 181–187. Bristol and Philadelphia: IoP.

Caprara, A., Fischetti, M., & Toth, P. (1998). Algorithms for the set covering problem (Technical Report). DEIS Operations Research Technical Report, Italy.

Giordana, A., & Neri, F. (1996). Search-intensive concept induction. Evolutionary Computation, 3, 375–416.

Hekanaho, J. (1998). DOGMA: A GA-based relational learner. Proceedings of the 8th International Conference on Inductive Logic Programming (pp. 205–214). Springer-Verlag.

Janikow, C. (1993). A knowledge-intensive genetic algorithm for supervised learning. Machine Learning, 13, 198–228.

Jong, K. D., Spears, W., & Gordon, D. (1993). Using genetic algorithms for concept learning. Machine Learning, 13(1/2), 155–188.

Kennedy, C. J., & Giraud-Carrier, C. (1999). A depth controlling strategy for strongly typed evolutionary programming. GECCO 1999: Proceedings of the First Annual Conference (pp. 1–6). Morgan Kaufmann.

Kubat, M., Bratko, I., & Michalski, R. (1998). A review of machine learning methods. In R. Michalski, I. Bratko and M. Kubat (Eds.), Machine Learning and Data Mining. Chichester: John Wiley and Sons Ltd.

Lavrač, N., & Džeroski, S. (1994). Inductive Logic Programming: Techniques and Applications. Ellis Horwood.

Lavrač, N., Džeroski, S., & Bratko, I. (1996). Handling imperfect data in inductive logic programming. In L. De Raedt (Ed.), Advances in Inductive Logic Programming, 48–64. IOS Press.

Leung, K., & Wong, M. (1995). Genetic logic programming and applications. IEEE Expert, 10(5), 68–76.

Michalewicz, Z. (1996). Genetic Algorithms + Data Structures = Evolution Programs. Berlin: Springer-Verlag.

Mitchell, T. (1997). Machine Learning. Series in Computer Science. McGraw-Hill.

Muggleton, S. (1999). Inductive logic programming: issues, results and the challenge of learning language in logic. Artificial Intelligence, 114, 283–296.

Muggleton, S., Bain, M., Hayes-Michie, J., & Michie, D. (1989). An experimental comparison of human and machine learning formalisms. Proceedings of the 6th International Workshop on Machine Learning (pp. 113–118). Morgan Kaufmann.

Muggleton, S., & Buntine, W. (1988). Machine invention of first-order predicates by inverting resolution. Proceedings of the Fifth International Machine Learning Conference (pp. 339–352). Morgan Kaufmann.

Muggleton, S., & Raedt, L. D. (1994). Inductive logic programming: Theory and methods. Journal of Logic Programming, 19–20, 669–679.

Porto, V. (2000). Evolutionary programming. In T. Bäck, D. Fogel and T. Michalewicz (Eds.), Evolutionary Computation 1, 89–102. Bristol and Philadelphia: IoP.

Quinlan, J. (1990). Learning logical definitions from relations. Machine Learning, 5, 239–266.

Raedt, L. D. (1992). Interactive concept learning and constructive induction by analogy. Machine Learning, 8, 107–150.

Tamaddoni-Nezhad, A., & Muggleton, S. (2000). Searching the subsumption lattice by a genetic algorithm. Proceedings of the 10th International Conference on Inductive Logic Programming (pp. 243–253). Springer-Verlag.
