An empirical study of the efficiency of learning ... - ResearchGate

appear to be much more effective at solving boolean concept learning than much more sophisticated methods employing GP or EP [Koza 94], [Chellapilla 98]. Another objective is to show that a GA is sometimes most efficient when a tiny population size is used. It was this fact which suggested to the author that a PH might be more efficient.

2 CARTESIAN GENETIC PROGRAMMING

In CGP a program is seen as a rectangular array of nodes. The nodes represent any operation on the data seen at their inputs. Each node may implement any convenient programming construct (if, switch, OR, * etc.). All the inputs, whether primary data, node inputs, node outputs, or program outputs, are sequentially indexed by integers. The functions of the nodes are also separately sequentially indexed. The chromosome is just a linear string of these integers.

The idea is best explained with a simple example. Fig 1 shows the genotype and the corresponding phenotype for a program which implements both the difference in volume between two boxes, $V_1 - V_2$, and the sum of the volumes, $V_1 + V_2$, where $V_1 = X_1 X_2 X_3$ and $V_2 = Y_1 Y_2 Y_3$. The particular values of the dimensions of the two boxes, $X_1, X_2, X_3, Y_1, Y_2, Y_3$, are labelled 0-5 and are seen on the left. The function set is nominally {0=plus, 1=minus, 2=multiply, 3=divide, 4=or, 5=xor}; the functions actually used in this example are shown in bold in the genotype and are seen inside the nodes. It is not necessary for the function types to be embedded in the genotype in this way; they could just as well form a contiguous section of the genome. The program outputs are taken from node outputs 10 and 11; $V_1$ and $V_2$ are each re-used in the calculation of the two outputs.
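
The decoding of the Fig 1 example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the genotype is transcribed from Fig 1 and is assumed to be grouped as (input, input, function) triples followed by the two output addresses, and the names `FUNCTIONS`, `NODE_GENES`, `OUTPUT_GENES` and `evaluate` are hypothetical, not taken from the paper.

```python
import operator

# Function set as listed in the text. Only plus, minus and multiply are
# exercised by the Fig 1 genotype.
FUNCTIONS = {
    0: operator.add,
    1: operator.sub,
    2: operator.mul,
    3: operator.truediv,
    4: operator.or_,
    5: operator.xor,
}

# Genotype from Fig 1, grouped (assumption) as (input, input, function).
# Program inputs X1, X2, X3, Y1, Y2, Y3 occupy addresses 0-5, so the six
# nodes produce the values stored at addresses 6-11.
NODE_GENES = [
    (0, 1, 2),  # node 6  = X1 * X2
    (3, 4, 2),  # node 7  = Y1 * Y2
    (6, 2, 2),  # node 8  = node 6 * X3 = V1
    (7, 5, 2),  # node 9  = node 7 * Y3 = V2
    (8, 9, 0),  # node 10 = V1 + V2
    (8, 9, 1),  # node 11 = V1 - V2
]
OUTPUT_GENES = [10, 11]


def evaluate(node_genes, output_genes, inputs):
    """Evaluate a feed-forward CGP genotype on one set of program inputs."""
    data = list(inputs)  # addresses 0..len(inputs)-1 hold the program inputs
    for in_a, in_b, func in node_genes:
        # Each node reads two earlier addresses and appends its result,
        # which implicitly gives it the next free address.
        data.append(FUNCTIONS[func](data[in_a], data[in_b]))
    return [data[address] for address in output_genes]


# Box 1 is 1 x 2 x 3 (V1 = 6), box 2 is 4 x 5 x 6 (V2 = 120).
print(evaluate(NODE_GENES, OUTPUT_GENES, [1, 2, 3, 4, 5, 6]))  # [126, -114]
```

Because every gene is an address into a single data array, the same loop works for any data type the functions accept, which is the data-type independence the representation is meant to provide.
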
[Figure 1: An example CGP genotype and phenotype. Genotype: 0 1 2  3 4 2  6 2 2  7 5 2  8 9 0  8 9 1  10 11. Phenotype: node 6 = 0 * 1, node 7 = 3 * 4, node 8 = 6 * 2, node 9 = 7 * 5, node 10 = 8 + 9, node 11 = 8 - 9; the program outputs are taken from nodes 10 and 11.]

If no sequential behaviour is assumed, then the inputs of vertical lines of nodes can only be connected to the outputs (or program inputs) which are on the left. The number of columns on the left which may be connected to a particular cell is referred to as levels-back. Using levels-back = 1 forces maximum re-use of individual node outputs but hampers large-scale re-use of collections of nodes. However, using levels-back = number of columns with only a single row allows unrestricted connectivity of nodes and program inputs.

One of the advantages of this representation of a program is that the chromosome representation used is independent of the data type used for the problem, as the chromosome consists of addresses where data is stored. Additionally, when the fitness of a chromosome is calculated, no interpretation of the genome is required to obtain the addresses in data arrays. Unlike LISP expressions, there are no syntactical constraints which must be observed when crossover is carried out. Mutation is very simple: one merely has to allow changes to the genes which respect either the functional constraints or the constraints imposed by levels-back. Nodes do not have to be connected and can therefore be redundant; thus the number of nodes used can vary from 0 to the maximum number available. Automatically defined functions emerge quite naturally in this scheme: if a particular collection of gates is very useful, then it may be connected many times. In the example shown in Fig 1, there is good re-use of the sub-trees with outputs 8 and 9. In the example shown all the nodes have the same number of inputs; this is a convenience, not a fundamental requirement. Thus the representation could be readily generalised to accommodate a variable number of inputs and outputs for each node.

Whether the representation discussed offers more efficient evolution of programs in general will have to await further experiments. However, the effectiveness of the closely related PDGP [Poli 97] suggests that the signs are favourable. In this paper a special case of CGP is employed where the data type is binary and the network is allowed to be feedforward only; this is appropriate for Boolean concept learning. The function set for this is shown in Table 1.

Table 1: Allowed cell functions

  Index  Function                     Index  Function
  0      $0$                          10     $a \oplus b$
  1      $1$                          11     $a \oplus \bar{b}$
  2      $a$                          12     $a + b$
  3      $b$                          13     $a + \bar{b}$
  4      $\bar{a}$                    14     $\bar{a} + b$
  5      $\bar{b}$                    15     $\bar{a} + \bar{b}$
  6      $ab$                         16     $a\bar{c} + bc$
  7      $a\bar{b}$                   17     $a\bar{c} + \bar{b}c$
  8      $\bar{a}b$                   18     $\bar{a}\bar{c} + bc$
  9      $\bar{a}\bar{b}$             19     $\bar{a}\bar{c} + \bar{b}c$

All the nodes are assumed to possess three inputs; if the functions require fewer, then some connections are ignored, which introduces an additional redundancy into the genome. In Table 1, $ab$ implies a AND b, $\bar{a}$ indicates NOT a, $\oplus$ represents the exclusive-OR operation and $+$ the OR operation. Functions 0-15 are the basic binary functions of 0, 1 and two inputs. Functions 16-19 are all binary multiplexers with various inputs inverted. The multiplexer (MUX) implements a simple IF-THEN statement (i.e. IF c=0 THEN a ELSE b). These functions (16-19) are called universal logic modules (ULMs). They are well known to be very effective and efficient building blocks for logic circuits [Chen and Hurst 82].

3 CHARACTERISTICS OF THE GENETIC ALGORITHM AND THE PROBABILISTIC HILLCLIMBER

The GA used in this paper is very simple. It is generational in nature, with uniform crossover (50% of genetic material is exchanged), random mutation, and size-two probabilistic tournament selection. In this method of parent selection, the fittest chromosome in a tournament is only accepted with a given probability (in this case 0.7); otherwise, the chromosome with the lower fitness is chosen. The amount of genetic recombination is determined by the breeding rate, which represents the percentage of the population which will take part in recombination. The mutation rate is defined as the percentage of the genes of the entire population which will undergo mutation. The GA always employs simple elitism, where the fittest chromosome of one generation is automatically promoted to the next. There is strong evidence [Miller 98a] that this is extremely beneficial. The fitness of a chromosome is calculated as the ratio of the number of correct output bits to the total number of output bits, taken over all input combinations. The GA terminates after the chosen number of generations, or when 100% correctness is reached (whichever is the sooner).

The PH algorithm begins with a randomly initialised population of chromosomes. The best chromosome is promoted to the next generation; all the remaining population members are mutations of this chromosome.
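
The promote-and-mutate step of the PH can be sketched as follows. This is a minimal illustration, not the author's implementation: the chromosome is a plain bit string rather than a CGP genotype, `TARGET` is a made-up set of desired output bits, fitness is the fraction of matching bits (mirroring the correct-output-bits ratio above), and the mutation rate is applied as a per-gene flip probability. All function names are hypothetical.

```python
import random

TARGET = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical desired output bits


def fitness(chromosome):
    """Fraction of bits that agree with TARGET (1.0 means fully correct)."""
    correct = sum(1 for got, want in zip(chromosome, TARGET) if got == want)
    return correct / len(TARGET)


def random_chromosome(rng):
    return [rng.randint(0, 1) for _ in TARGET]


def mutate(chromosome, rate, rng):
    """Flip each gene independently with probability `rate` (assumption)."""
    return [1 - gene if rng.random() < rate else gene for gene in chromosome]


def probabilistic_hillclimber(population_size, generations, mutation_rate, rng):
    # Start from a randomly initialised population.
    population = [random_chromosome(rng) for _ in range(population_size)]
    best = max(population, key=fitness)
    history = [fitness(best)]
    for _ in range(generations):
        if fitness(best) == 1.0:  # same stopping rule as the GA
            break
        # Promote the best chromosome; every other member is a mutant of it.
        mutants = [mutate(best, mutation_rate, rng)
                   for _ in range(population_size - 1)]
        population = [best] + mutants
        best = max(population, key=fitness)
        history.append(fitness(best))
    return best, history


best, history = probabilistic_hillclimber(
    population_size=5, generations=200, mutation_rate=0.1,
    rng=random.Random(0))
print(fitness(best), len(history) - 1)
```

Since the promoted chromosome is always a member of the next population, the best fitness recorded in `history` can never decrease from one generation to the next.
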
The process is iterated until termination (same conditions as the GA). The only parameters associated with this algorithm are: number of runs, population size, number of generations, and mutation rate. The larger the population, the stronger the selection pressure. The same genotype representation was used for both the GA and PH algorithms.

4 DEFINITIONS AND RESULTS

The problems studied in this paper are the even-parity functions with 3, 4, and 5 inputs, and the 2-bit multiplier. The n-bit parity function has n binary inputs and a single binary output. For even parity, the output is one if there is an even number of ones among the inputs. The even-parity functions of a given number of variables are the most difficult functions to find when carrying out a random search of all GP trees with function set {and, or, nand, nor} [Koza 92]. The n-bit multiplier has two n-bit inputs and one 2n-bit output, which is the binary result of multiplying the two n-bit inputs together. It is a difficult function to evolve even when using the complete set of logic gates shown in Table 1. The reason for studying it here is that it differs markedly from the parity functions in that it is built most efficiently with a variety of gates, unlike the parity functions, which can be easily built with a single gate (xor).

The method used to assess the effectiveness of an algorithm, or a set of parameters, is that favoured by Koza [Koza 92]. It consists of calculating the number of individual chromosomes which would have to be processed to give a certain probability of success. To calculate this figure one must first calculate the cumulative probability of success P(M, i), where M represents the population size and i the generation number.
R(z) represents the number of independent runs required for a probability of success (100% functional), given by z, by generation i. I(M, z, i) represents the minimum number of chromosomes which must be processed to give a probability of success z by generation i. The formulae for these are given below, where $N_s(i)$ represents the number of successful runs at generation i and $N_{total}$ represents the total number of runs:

$$P(M, i) = \frac{N_s(i)}{N_{total}}, \qquad R(z) = \mathrm{ceil}\left\{ \frac{\log(1 - z)}{\log(1 - P(M, i))} \right\}, \qquad I(M, z, i) = M \, R(z) \, i$$

Note that when z = 1.0 the formulae are invalid (all runs successful). In the tables and graphs of this section z takes the value 0.99 unless stated otherwise.

The variation of I(M, z, i) with population size has been investigated for the parity and multiplier functions. The set of primitive functions used for the parity functions (gate set) was {and, or, nand, nor}, unless stated to the contrary, and for the multiplier all gates were allowed. For the 4-bit even-parity function, I(M, z, i) was investigated as a function of M for three different geometry sizes, 16 x 16, 10 x 10, and 3 x 3; the latter two employed the complete set of allowed primitives (Table 1). Three geometries were also chosen for the 2-bit multiplier: 10 x 10, 7 x 7, and 4 x 4. The different geometries were investigated because the difficulty of the boolean concept learning depends on the amount of resources allocated [Miller 98b]; thus it was anticipated that the GA parameters most likely to lead to success would be dependent on this. It should be noted that using I(M, z, i) as a measure of computational effort does not directly equate to CPU time when different geometries are being used.
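
As a worked illustration of the effort measure, the three quantities can be computed directly. This is a sketch with made-up run counts, not data from the paper, and the function names `cumulative_success`, `runs_required` and `effort` are hypothetical.

```python
import math


def cumulative_success(successful_runs, total_runs):
    """P(M, i): fraction of runs that have succeeded by generation i."""
    return successful_runs / total_runs


def runs_required(p, z=0.99):
    """R(z): independent runs needed to reach success probability z."""
    if not 0.0 < p < 1.0:
        # log(1 - p) is zero when p = 0 and undefined when p = 1.
        raise ValueError("P(M, i) must lie strictly between 0 and 1")
    if not 0.0 < z < 1.0:
        raise ValueError("z must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - z) / math.log(1.0 - p))


def effort(population_size, generation, p, z=0.99):
    """I(M, z, i) = M * R(z) * i, following the formula in the text."""
    return population_size * runs_required(p, z) * generation


# Made-up example: 50 of 100 runs succeeded by generation 1000 with M = 5.
p = cumulative_success(50, 100)
print(runs_required(p))      # 7, since log(0.01) / log(0.5) is about 6.64
print(effort(5, 1000, p))    # 5 * 7 * 1000 = 35000
```

The guard on `p` makes explicit that the measure is undefined both when no run has succeeded and when every run has.
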
A more rigorous treatment would take the dependence of CPU time on geometry into account, but the simple object here was to adopt the measure that other researchers have used. It took a great deal of time to collect all the data shown in this section, as hundreds of runs of thousands of generations were required for each data point shown in the graphs. In all the tables the figures in parentheses in the R(z) column refer to the number of successful runs (out of
