Learning in First-Order Logic using Greedy Evolutionary Algorithms

Federico Divina
divina@cs.vu.nl
Elena Marchiori
elena@cs.vu.nl
Department of Mathematics and Computer Sciences, Vrije Universiteit, De Boelelaan 1081a, 1081 HV Amsterdam,
The Netherlands

Abstract
In evolutionary computation 'learning' is a byproduct of the evolutionary process, as successful individuals are retained through stochastic trial and error. This learning process can be rather slow, due to the weak strategy used to guide evolution. A way to overcome this drawback is to incorporate greedy operators in the evolutionary process.

This paper investigates the effectiveness of this approach for inductive concept learning in (a fragment of) First-Order Logic (FOL). This is done by means of a new greedy evolutionary algorithm. The algorithm evolves a population of Horn clauses. Randomized greedy operators are employed for generalizing and specializing a clause. The degree of greediness of each operator is determined by a parameter. In this way, the user can control the greediness of the learning process by setting the parameters to specific values.

A typical case study in Inductive Logic Programming (the KRK endgame problem) is used for testing the learning method. The effect of the greedy operators on the learning process is analyzed by means of extensive experiments with different values of their parameters. Moreover, the robustness of the method to noise in the training examples is investigated.
1. Introduction
Learning from examples in FOL, also known as Inductive Logic Programming (ILP) (Muggleton & Raedt, 1994), constitutes a central topic in Artificial Intelligence, with relevant applications to problems in complex domains like natural language and molecular computational biology (Muggleton, 1999). Given a FOL description language used to express possible hypotheses, a set of positive examples, and a set of negative examples, one has to find a hypothesis which covers all positive examples and none of the negative ones (cf. (Kubat et al., 1998; Mitchell, 1997)).
Learning hypotheses in FOL is a hard task because the search space (of all hypotheses) has a prohibitive size, even when restrictions on the representation are imposed (e.g. Datalog clauses). A standard approach to tackle this problem, adopted in the majority of FOL learning systems, is to use specific search strategies, like general-to-specific (hill-climbing) search (e.g. (Quinlan, 1990)) and the inverse resolution mechanism (e.g. (Muggleton & Buntine, 1988)).
An alternative approach, based on evolutionary computation, employs a multi-point search strategy where randomized operators (mutation and crossover) are used to move in the search space, and a fitness function is used to guide the search (by means of a probabilistic selection process) (Jong et al., 1993). The resulting learning methods are in general weaker than the ILP ones, but more effective in escaping from attraction basins (local optima) during the search for a best hypothesis.
This paper investigates a learning framework which unites these two approaches. This is done via generalization/specialization operators which are incorporated in the mutation process of a simple evolutionary algorithm. Four randomized greedy operators are introduced, two for generalizing and two for specializing a clause. Each operator is equipped with a parameter which determines its degree of greediness. Different values of the parameters determine different search strategies: low values yield weak learning methods, like standard evolutionary algorithms, while high values yield greedier learning methods, like Inductive Logic Programming systems.
The resulting hybrid evolutionary algorithm is tested on a case study which appears frequently in the ILP literature: learning illegal White-to-move positions in the chess endgame White King and Rook versus Black King. The results of experiments on datasets for this problem indicate that the presence of greediness in the operators is beneficial for the learning process, but that too much greediness may negatively affect it. Satisfactory performance can also be obtained when greediness is used in only one operator. Moreover, the hybrid algorithm exhibits robust behaviour when different levels of noise are introduced in the training dataset.
This investigation suggests an experimental methodology for designing greedy evolutionary algorithms for inductive learning, where the user constructs a suitable search strategy for a considered learning problem by setting the greediness parameters to specific values (which are experimentally determined).
2. Evolutionary Approaches
One can distinguish two main inductive learning approaches based on evolutionary computation, called Pittsburgh and Michigan (cf. (Michalewicz, 1996)). In the first approach, an individual represents an entire set of rules, a population of rule sets is maintained, and selection and genetic operators are used to produce new generations of rule sets. Instances of this approach are e.g. GIL (Janikow, 1993), GLPS (Leung & Wong, 1995) and STEPS (Kennedy & Giraud-Carrier, 1999). In contrast, the Michigan approach employs a representation where an individual represents one rule, and individuals co-operate and compete in the evolutionary process. In this approach, specific strategies have to be designed in order to extract a non-redundant hypothesis from the final population. Systems for FOL learning based on this approach are e.g. REGAL (Giordana & Neri, 1996) and DOGMA (Hekanaho, 1998).
Both approaches present advantages and drawbacks. Encoding a whole hypothesis in each individual allows easier control of the genetic search, but introduces a large redundancy that can lead to populations which are hard to manage and to individuals of enormous size. In the Michigan approach, co-operation/competition between different individuals reduces redundancy and more complex problems can be handled, but more sophisticated strategies may have to be designed for coping with the presence in the population of super-individuals, which lead the evolutionary process to premature convergence.

This paper adopts the Michigan approach.
3. A Greedy Evolutionary Algorithm

This section describes the components of a hybrid evolutionary algorithm for inductive learning in FOL, called GEL (Greedy Evolutionary Learner). The algorithm evolves a population of Horn clauses using a mutation process for moving in the search space, a fitness function for measuring the quality of clauses, and a probabilistic selection/replacement mechanism for selecting fitter rules from the population. In order to guide the search, the mutation process uses four greedy operators described in the next section. A best hypothesis is extracted from the final population using a heuristic procedure.
3.1 Notation and Terminology

The algorithm considers Horn clauses of the form

p(X, Y) ← r(X, Z), q(Y, a).

consisting of atoms whose arguments are either variables (e.g. X, Y, Z) or constants (e.g. a). The left-hand side of the rule is called the head, and the right-hand side the body, of the clause. These rules have a declarative interpretation (universally quantified FOL implications) and a procedural interpretation (in order to solve p(X, Y), solve r(X, Z) and q(Y, a)). A set of Horn clauses forms a logic program, which can be directly executed (in a slightly different syntax) in the programming language Prolog.
Thus the goal of GEL is to induce a logic program from a set of training examples. The number of examples in the training set is denoted by nte, and the number of its positive (negative) examples by pos (respectively neg).

A clause covers an example if the theory formed by the clause and the background knowledge logically entails the example. Given a clause cl, the number of positive (negative) examples covered by cl is denoted by pos_cl (respectively neg_cl).
3.2 Representation and Fitness

The algorithm operates on Horn clauses of the restricted form described above.

The fitness of a clause cl is defined by

fitness(cl) = w1 * neg_cl + (nte − pos_cl)

where the integer w1 > 1 is a weight used to favor clauses covering few negative examples. So GEL has to evolve clauses with minimum fitness.
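As a concrete illustration, the fitness can be sketched as a small Python function; the weight w1 = 5 and the example counts below are illustrative assumptions, since the paper only requires w1 > 1:

```python
# Sketch of the fitness of Section 3.2. Fitness is to be minimized:
# w1 * neg_cl penalizes covered negative examples, and nte - pos_cl
# penalizes uncovered positive examples. w1 = 5 is an arbitrary
# choice satisfying the paper's only requirement w1 > 1.

def fitness(pos_cl, neg_cl, nte, w1=5):
    """fitness(cl) = w1 * neg_cl + (nte - pos_cl)."""
    return w1 * neg_cl + (nte - pos_cl)

# A clause covering 40 positives and 3 negatives in a 100-example set:
print(fitness(pos_cl=40, neg_cl=3, nte=100))  # 5*3 + 60 = 75
```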
3.3 Initialization

Each clause cl of the initial population is generated in two phases, as follows.

First, a ground (i.e. variable-free) clause is constructed whose head is a positive example randomly selected from the training set, and whose body consists of all atoms in the background knowledge having at most one argument which does not occur in the head. This procedure is similar to the one used in CLINT (Raedt, 1992), but in CLINT each argument of the body also occurs in the head of the clause. All other elements in the background knowledge having at least one argument occurring in the head are inserted in a list B_cl associated with cl. Atoms of this list may be added to the clause body during the evolutionary process (using a specialization operator).

Next, cl is obtained from this ground clause by applying one of the two generalization operators (described in Sections 4.1 and 4.2), randomly chosen.
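The first (ground) phase can be sketched as follows, under an assumed atom representation of (predicate, argument-tuple) pairs; the function and the example facts are illustrative:

```python
import random

# Phase one of initialization (Section 3.3): the head is a random
# positive example; the body holds every background atom with at most
# one argument not occurring in the head. Background atoms sharing at
# least one argument with the head but excluded from the body go into
# the list B_cl.

def ground_clause(positives, background, rng=random):
    head = rng.choice(positives)
    head_args = set(head[1])
    body = [atom for atom in background
            if sum(arg not in head_args for arg in atom[1]) <= 1]
    b_cl = [atom for atom in background
            if atom not in body and head_args & set(atom[1])]
    return head, body, b_cl

head, body, b_cl = ground_clause(
    positives=[("p", ("a", "b"))],
    background=[("q", ("a", "c")), ("r", ("c", "d")), ("s", ("a", "d", "e"))],
)
print(body)  # [('q', ('a', 'c'))] -- only one argument ('c') is new
print(b_cl)  # [('s', ('a', 'd', 'e'))] -- shares 'a' but has two new args
```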
3.4 Evolutionary Cycle

At each iteration, the probabilistic tournament selection mechanism (cf. (Blickle, 2000)) is used to select a number of individuals (the better the fitness, the higher the probability of selection). Then every selected individual undergoes the mutation process.

No other evolutionary operator is used. In particular, crossover is not used, in accordance with the conviction that 'a sum of optimal parts rarely leads to an optimal overall solution', which is the key to the (philosophical) distinction between evolutionary and genetic algorithms (cf. (Porto, 2000)).
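A minimal sketch of tournament selection for a minimization problem follows; returning the tournament winner deterministically is a simplification of the probabilistic variant cited above, and the fitness values are illustrative:

```python
import random

# Tournament selection sketch: draw k individuals at random (with
# replacement) and keep the one with the lowest (best) fitness.
# k = 4 matches the tournament size reported in Section 5.

def tournament_select(population, fitness, k=4, rng=random):
    contenders = [rng.choice(population) for _ in range(k)]
    return min(contenders, key=fitness)

pop = ["cl_a", "cl_b", "cl_c"]
fit = {"cl_a": 12, "cl_b": 7, "cl_c": 30}.__getitem__
winner = tournament_select(pop, fit, k=4, rng=random.Random(0))
print(winner in pop)  # True
```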
3.5 Mutation Process

The mutation process consists of the repeated application of the greedy operators to an individual until its fitness no longer improves (or a maximum number of iterations is reached).

At each iteration, one of the four greedy operators is applied. This operator is chosen as follows. First, a (randomized) test decides whether it will be a generalization or a specialization operator. Next, one of the two operators of the chosen class is randomly selected. The test decides to generalize a clause cl with probability

p_gen(cl) = 1/2 * ((pos_cl − neg_cl) / nte + α)

otherwise it decides to specialize it (with probability 1 − p_gen(cl)). The constant α = 1 + 0.5 * (neg − pos) is used to slightly bias the decision towards generalization. The probability p_gen(cl) is maximal when cl covers all positive and no negative examples, and it is minimal in the dual case.
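Numerically, for a balanced training set (pos = neg, so α = 1) the decision can be sketched as follows; clamping the probability to [0, 1] is our own safeguard, not stated in the paper:

```python
import random

# Sketch of the generalize/specialize test of Section 3.5:
# p_gen(cl) = 1/2 * ((pos_cl - neg_cl)/nte + alpha),
# with alpha = 1 + 0.5 * (neg - pos).

def p_gen(pos_cl, neg_cl, nte, pos, neg):
    alpha = 1 + 0.5 * (neg - pos)
    p = 0.5 * ((pos_cl - neg_cl) / nte + alpha)
    return min(1.0, max(0.0, p))  # clamping is our assumption

def mutate_direction(pos_cl, neg_cl, nte, pos, neg, rng=random):
    if rng.random() < p_gen(pos_cl, neg_cl, nte, pos, neg):
        return "generalize"
    return "specialize"

# Balanced set (pos = neg = 50, nte = 100, so alpha = 1):
print(p_gen(50, 0, 100, 50, 50))  # 0.75 -- covers all positives, no negatives
print(p_gen(0, 50, 100, 50, 50))  # 0.25 -- the dual case
```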
3.6 Hypothesis Extraction

At the end of the evolutionary process, a best logic program covering all the positive examples and no negative ones has to be extracted from the final population. This problem can be translated into an instance of the weighted set covering problem as follows. Each individual cl of the final population is a column with positive weight equal to

weight_cl = neg_cl * fitness(cl) + 1

and each positive example is a row. The problem consists of finding a subset of the set of columns covering all the rows and having minimum total weight. The weight of a column is defined in this way in order to prefer clauses covering few negative examples. A fast (heuristic) algorithm (cf. (Caprara et al., 1998)) is applied to this problem instance to find a best logic program.
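The extraction step can be sketched with the textbook greedy heuristic for weighted set covering (not the Caprara et al. algorithm the paper actually uses); the column names, weights and coverage sets below are illustrative:

```python
# Greedy weighted set covering sketch (Section 3.6): repeatedly pick the
# column (clause) with the lowest weight per newly covered row (positive
# example), until every row is covered. Assumes the columns jointly
# cover all rows.

def greedy_set_cover(columns, weights, rows):
    uncovered, chosen = set(rows), []
    while uncovered:
        best = min(
            (c for c in columns if columns[c] & uncovered),
            key=lambda c: weights[c] / len(columns[c] & uncovered),
        )
        chosen.append(best)
        uncovered -= columns[best]
    return chosen

columns = {"cl1": {1, 2}, "cl2": {2, 3}, "cl3": {1, 2, 3}}
weights = {"cl1": 2.0, "cl2": 2.0, "cl3": 2.5}  # weight_cl = neg_cl * fitness(cl) + 1
print(greedy_set_cover(columns, weights, {1, 2, 3}))  # ['cl3']
```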
4. Greedy Operators

A clause cl is generalized either by replacing (all occurrences of) a constant of the clause with a variable, or by deleting an atom from the body of the clause. Dually, cl is specialized either by replacing (all occurrences of) a variable of cl with a constant, or by adding an atom to the body of cl.

The four operators utilize parameters N1, ..., N4, respectively, in their definition, and a gain function. When applied to operator τ and clause cl, the gain function yields the difference between the clause fitness before and after the application of that operator:

gain(cl, τ) = fitness(cl) − fitness(τ(cl)).

The four operators are defined below.
4.1 Constant into Variable

Consider the set Con consisting of N1 constants of cl randomly chosen, and the set Var consisting of all the variables of cl and of a fresh variable.

For each a in Con and for each X in Var, compute gain(cl, {a/X}), the gain of cl when all occurrences of a are replaced by X.

Choose a substitution {a/X} yielding the highest gain (ties are broken randomly), and generalize cl by replacing all occurrences of a with X.
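Under a simple assumed clause representation (a list of (predicate, args) atoms, with variables written as uppercase strings) and an arbitrary fitness callable, this operator can be sketched as:

```python
import random

# Sketch of the constant-into-variable generalization operator:
# try replacing each of N1 sampled constants by every variable of the
# clause plus one fresh variable, and keep the substitution of highest
# gain = fitness(cl) - fitness(cl'). The representation and the toy
# fitness are illustrative; ties go to the first maximum, not randomly.

def terms(clause):
    return {t for _, args in clause for t in args}

def substitute(clause, old, new):
    return [(p, tuple(new if t == old else t for t in args)) for p, args in clause]

def generalize_const_to_var(clause, fitness, n1=2, rng=random):
    constants = sorted(t for t in terms(clause) if not t[0].isupper())
    variables = {t for t in terms(clause) if t[0].isupper()} | {"Vfresh"}
    base = fitness(clause)
    candidates = [(base - fitness(substitute(clause, c, v)), substitute(clause, c, v))
                  for c in rng.sample(constants, min(n1, len(constants)))
                  for v in variables]
    if not candidates:
        return clause
    _, best = max(candidates, key=lambda pair: pair[0])
    return best

# Toy fitness: fewer constant occurrences = fitter (lower value).
count_constants = lambda cl: len([t for _, args in cl for t in args if not t[0].isupper()])
cl = [("p", ("a", "X")), ("q", ("a", "b"))]
out = generalize_const_to_var(cl, count_constants, n1=1, rng=random.Random(0))
print(count_constants(out) < count_constants(cl))  # True
```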
4.2 Atom Deletion

Consider the set Atm consisting of N2 atoms of cl randomly chosen.

For each A in Atm, compute gain(cl, −A), the gain of cl when A is deleted from cl.

Choose an atom A yielding the highest gain gain(cl, −A) (ties are broken randomly), and generalize cl by deleting A from its body.

Insert the deleted atom A into a list D_cl containing atoms which have been deleted from cl. Atoms from this list may be added back to the clause during the evolutionary process by means of a specialization operator.
4.3 Variable into Constant

Consider a variable X of cl randomly selected, and the set Con consisting of N3 constants (of the problem language) randomly chosen.

For each a in Con, compute gain(cl, {X/a}), the gain of cl when all occurrences of X are replaced by a.

Choose a substitution {X/a} yielding the highest gain (ties are broken randomly), and specialize cl by replacing all occurrences of X with a.
4.4 Atom Addition

Consider the set Atm consisting of N4 atoms of B_cl (introduced in the initialization of GEL) and of N4 atoms of D_cl, all randomly chosen.

For each A in Atm, compute gain(cl, +A), the gain of cl when A is added to cl.

Choose an atom A yielding the highest gain gain(cl, +A) (ties are broken randomly), and specialize cl by adding A to its body. Remove A from its original list (B_cl or D_cl).
5. Experimental Setup

The case study used to test GEL is the problem of learning illegal positions in the chess endgame domain White King and Rook versus Black King (KRK endgame) (Muggleton et al., 1989; Quinlan, 1990). The concept to learn is expressed by the predicate illegal(A, B, C, D, E, F), which states that the position where the White King is at (A, B), the White Rook at (C, D) and the Black King at (E, F) is an illegal White-to-move position. For instance, illegal(g, 6, c, 7, c, 8) is an illegal White-to-move position. The background knowledge consists of facts about the two predicates adjacent(A, B) and less_than(A, B), indicating that rank/file A is adjacent to and less than rank/file B, respectively. The dataset originates from (Muggleton et al., 1989), and is available at http://oldwww.comlab.ox.ac.uk/oucl/groups/machlearn/chess.html. It consists of 5 training sets of 100 examples, and 1 test set of 5000 examples. The background knowledge contains 50 elements.
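The two background predicates can be sketched as fact sets over the files a–h and ranks 1–8; the paper's dataset stores them as Prolog facts, and the Python encoding below is our own illustration:

```python
# adjacent(A, B): rank/file A is adjacent to rank/file B.
# less_than(A, B): rank/file A is less than rank/file B.
# Files are letters a-h, ranks are integers 1-8; the relations are
# taken within a single domain (file-file or rank-rank).

FILES = "abcdefgh"
RANKS = range(1, 9)

def _idx(v):
    # Map a file letter or a rank number to a 0-based index.
    return FILES.index(v) if isinstance(v, str) else v - 1

adjacent = {(a, b) for dom in (FILES, RANKS) for a in dom for b in dom
            if abs(_idx(a) - _idx(b)) == 1}
less_than = {(a, b) for dom in (FILES, RANKS) for a in dom for b in dom
             if _idx(a) < _idx(b)}

print(("c", "d") in adjacent, (7, 6) in adjacent)  # True True
print(("c", "c") in less_than)                     # False
```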
The parameter setting of GEL is obtained after performing a small number of tuning experiments, and the choice is based on a tradeoff between efficiency and performance. The population contains 200 individuals. The tournament size is 4. At each iteration, 15 individuals are selected (using the tournament selection mechanism) and the mutation process is applied to each of them. The algorithm terminates after 30 iterations.
6. Results

The effect of varying the value of one greedy parameter Ni (i = 1, ..., 4) is investigated while the values of the other parameters are fixed, either all equal to 1 (no greediness) or all equal to their maximum value max (full greediness). In this way, a parameter setting can be described by a pair (Ni = v, w), meaning that Ni has value v and all other parameters have value w.

For each parameter setting, 25 runs of GEL are performed: 5 runs with different random seeds for each of the 5 training sets. One run of GEL takes on average about 2 minutes on a Sun Ultra 250, UltraSPARC-II 400MHz.

For each run, the simplicity (i.e. number of clauses) of the resulting logic program, and its accuracy on the test set (i.e. number of correctly classified test examples divided by the total number of test examples), are computed. The average of the results over the 25 runs is considered as the final result (standard deviation is not reported because the results do not differ much from each other).

The results are illustrated in Figure 1. For each parameter Ni, the (average) values of accuracy (left column) and simplicity (right column) are plotted for different values of Ni in the two configurations w = 1 and w = max.

The results indicate that the best accuracy is obtained when greediness is present in all the parameters. However, satisfactory results are obtained also when greediness is incorporated in only one parameter (e.g. in the parameter setting (N3 = 14, w = 1)). Moreover, too much greediness seems to have a negative influence
[Figure 1. Accuracy (left column) and number of clauses (right column) for different values of the parameters N1–N4, each plotted for the configurations Ni = 1 and Ni = max.]
on the learning process, possibly because it confines the search to limited regions in the neighbourhood of local optima. In general, accuracy does not increase monotonically with greediness, and exhibits different behaviour for the four parameters.

The results concerning simplicity do not allow one to derive any general functional relationship between simplicity and greediness of the operators. Simple programs are obtained for both very low and very high values of the parameters, with somewhat irregular behaviour for the other settings.
7. Noise

It is interesting to test the robustness of GEL when a controlled amount of noise is introduced in the training dataset, as done in (Lavrač & Džeroski, 1994; Lavrač et al., 1996). Three different types of noise are added at different noise levels: noise in the arguments (type na), noise in the class (positive or negative) (type nc), and noise in both arguments and class (type nb). For each training set, and for each type of noise, seven datasets with different noise levels (5, 10, 15, 20, 30, 50, and 80%) are generated, meaning that p% of the examples are corrupted by replacing a value with a random value in the domain.
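For example, class noise (type nc) on an assumed list of (example, label) pairs can be sketched as:

```python
import random

# Sketch of class-noise injection (Section 7, type nc): flip the class
# label of p% of the training examples, chosen at random. The
# (example, label) representation is our assumption.

def add_class_noise(examples, p, rng=random):
    noisy = list(examples)
    k = round(len(noisy) * p / 100)
    for i in rng.sample(range(len(noisy)), k):
        ex, label = noisy[i]
        noisy[i] = (ex, "neg" if label == "pos" else "pos")
    return noisy

train = [(f"e{i}", "pos") for i in range(100)]
noisy = add_class_noise(train, 20, rng=random.Random(0))
print(sum(label == "neg" for _, label in noisy))  # 20
```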
GEL is tested in a setting with moderate greediness: N1 = 2, N2 = 1, N3 = 2, N4 = 4. The results of the experiments are illustrated in Figure 2 (as in the previous experiments, the average results over all the runs are considered).

As expected, a rapid decay in accuracy and simplicity is obtained when the noise level increases, with programs containing more and more clauses. A comparison with the results of other ILP systems on the same datasets, like FOIL, DOGMA and mFOIL (cf. (Hekanaho, 1998)), indicates that the performance of GEL is comparable to that of FOIL, being slightly worse than FOIL for low levels of noise, and slightly better for noise levels of 50 and 80%. These ILP systems use specific mechanisms for noise handling, while the robustness of GEL to noise is mainly due to its stochastic nature.
8. Related Work

There are few systems for inductive learning in FOL based on evolutionary algorithms.
[Figure 2. Accuracy and simplicity for different levels of noise (on arguments, on classes, on both).]

REGAL (Giordana & Neri, 1996) and DOGMA (Hekanaho, 1998) are hybrids between the Pittsburgh and Michigan approaches; they use a restricted Horn clause language and an explicit bias for restricting the search space. They use simple specialization and generalization operators incorporated in crossover operators (acting on bit representations) and a standard blind mutation operator.
A more compact and complete binary representation is used in a recent framework (Tamaddoni-Nezhad & Muggleton, 2000) for incremental learning in FOL. The encoding of solutions is based on a bottom clause (of a subsumption lattice) constructed according to the background knowledge using ILP methods such as Inverse Entailment. Novel crossover operators for generalization and specialization based on this representation are introduced. The encoding and operators can be interpreted in terms of standard ILP concepts.
GLPS (Leung & Wong, 1995) and STEPS (Kennedy & Giraud-Carrier, 1999) are based on the Pittsburgh approach. These systems evolve a population of logic programs. GLPS uses the same restrictions on the form of the clauses as GEL, represents a logic program by means of an AND-OR tree, and utilizes standard blind two-point crossover.

STEPS uses a tree-like representation employed in Genetic Programming, and works on strongly typed (Escher) programs, hence it uses modified genetic operators to handle type constraints. However, as in GLPS, no (other) knowledge is incorporated into the genetic operators.
9. Conclusions

The main contribution of this paper is a new framework for evolving a population of Horn clauses, which unites ILP and evolutionary computation. The framework allows the user to experiment with search strategies of different degrees of greediness, by setting the parameters of four greedy operators which are used in the mutation process of a hybrid evolutionary algorithm.

The research in this paper concerned a possible integration of greedy search strategies into FOL learning methods based on evolutionary computation. The development of a successful inductive learning system based on the framework proposed in this paper, and its application to real-life problems, needs further investigation. Other interesting topics to be addressed in future research include the extension of the framework to deal with multiple predicates and recursion.
References

Blickle, T. (2000). Tournament selection. In T. Bäck, D. Fogel and T. Michalewicz (Eds.), Evolutionary Computation 1, 181–187. Bristol and Philadelphia: IoP.

Caprara, A., Fischetti, M., & Toth, P. (1998). Algorithms for the set covering problem (Technical Report). DEIS Operations Research Technical Report, Italy.

Giordana, A., & Neri, F. (1996). Search-intensive concept induction. Evolutionary Computation, 3, 375–416.

Hekanaho, J. (1998). DOGMA: a GA-based relational learner. Proceedings of the 8th International Conference on Inductive Logic Programming (pp. 205–214). Springer Verlag.

Janikow, C. (1993). A knowledge-intensive genetic algorithm for supervised learning. Machine Learning, 13, 198–228.

Jong, K. D., Spears, W., & Gordon, D. (1993). Using genetic algorithms for concept learning. Machine Learning, 13(1/2), 155–188.

Kennedy, C. J., & Giraud-Carrier, C. (1999). A depth controlling strategy for strongly typed evolutionary programming. GECCO 1999: Proceedings of the First Annual Conference (pp. 1–6). Morgan Kaufmann.

Kubat, M., Bratko, I., & Michalski, R. (1998). A review of machine learning methods. In R. Michalski, I. Bratko and M. Kubat (Eds.), Machine Learning and Data Mining. Chichester: John Wiley and Sons Ltd.

Lavrač, N., & Džeroski, S. (1994). Inductive Logic Programming: Techniques and Applications. Ellis Horwood.

Lavrač, N., Džeroski, S., & Bratko, I. (1996). Handling imperfect data in inductive logic programming. In L. De Raedt (Ed.), Advances in Inductive Logic Programming, 48–64. IOS Press.

Leung, K., & Wong, M. (1995). Genetic logic programming and applications. IEEE Expert, 10(5), 68–76.

Michalewicz, Z. (1996). Genetic Algorithms + Data Structures = Evolution Programs. Berlin: Springer-Verlag.

Mitchell, T. (1997). Machine Learning. Series in Computer Science. McGraw-Hill.

Muggleton, S. (1999). Inductive logic programming: issues, results and the challenge of learning language in logic. Artificial Intelligence, 114, 283–296.

Muggleton, S., Bain, M., Hayes-Michie, J., & Michie, D. (1989). An experimental comparison of human and machine learning formalisms. Proceedings of the 6th International Workshop on Machine Learning (pp. 113–118). Morgan Kaufmann.

Muggleton, S., & Buntine, W. (1988). Machine invention of first-order predicates by inverting resolution. Proceedings of the Fifth International Machine Learning Conference (pp. 339–352). Morgan Kaufmann.

Muggleton, S., & Raedt, L. D. (1994). Inductive logic programming: Theory and methods. Journal of Logic Programming, 19-20, 669–679.

Porto, V. (2000). Evolutionary programming. In T. Bäck, D. Fogel and T. Michalewicz (Eds.), Evolutionary Computation 1, 89–102. Bristol and Philadelphia: IoP.

Quinlan, J. (1990). Learning logical definitions from relations. Machine Learning, 5, 239–266.

Raedt, L. D. (1992). Interactive concept learning and constructive induction by analogy. Machine Learning, 8, 107–150.

Tamaddoni-Nezhad, A., & Muggleton, S. (2000). Searching the subsumption lattice by a genetic algorithm. Proceedings of the 10th International Conference on Inductive Logic Programming (pp. 243–253). Springer Verlag.