An empirical study of the efficiency of learning ... - ResearchGate


appear to be much more effective at solving boolean concept learning problems than far more sophisticated methods employing GP or EP [Koza 94], [Chellapilla 98]. Another objective is to show that a GA is sometimes most efficient when a tiny population size is used. It was this fact which suggested to the author that a PH might be more efficient.

2 CARTESIAN GENETIC PROGRAMMING

In CGP a program is seen as a rectangular array of nodes. Each node represents an operation on the data seen at its inputs and may implement any convenient programming construct (if, switch, OR, *, etc.). All the inputs, whether primary data, node inputs, node outputs, or program outputs, are sequentially indexed by integers. The functions of the nodes are also separately sequentially indexed. The chromosome is just a linear string of these integers.

The idea is best explained with a simple example. Fig. 1 shows the genotype and the corresponding phenotype for a program which implements both the difference in volume between two boxes, V1 − V2, and the sum of the volumes, V1 + V2, where V1 = X1X2X3 and V2 = Y1Y2Y3. The particular values of the dimensions of the two boxes, X1, X2, X3, Y1, Y2, Y3, are labelled 0-5 and are seen on the left. The function set is nominally {0=plus, 1=minus, 2=multiply, 3=divide, 4=or, 5=xor}; the functions actually used in this example are shown in bold in the genotype and appear inside the nodes. It is not necessary for the function types to be embedded in the genotype in this way; they could just as well form a contiguous section of the genome. The program outputs are taken from node outputs 10 and 11, and V1 and V2 are each re-used in the calculation of the two outputs.

Figure 1: An example CGP genotype and phenotype.
Genotype: 0 1 2 | 3 4 2 | 6 2 2 | 7 5 2 | 8 9 0 | 8 9 1 | 10 11
(Each group of three genes encodes one node, two data indices followed by a function index; the nodes are indexed 6-11 and the final pair gives the program output indices. The phenotype diagram cannot be reproduced in plain text.)

If no sequential behaviour is assumed, the inputs of a vertical line of nodes can only be connected to the node outputs (or program inputs) which lie to the left. The number of columns on the left which may be connected to a particular cell is referred to as levels-back. Using levels-back = 1 forces maximum re-use of individual node outputs but hampers large-scale re-use of collections of nodes. However, using levels-back = number of columns, with only a single row, allows unrestricted connectivity of nodes and program inputs.

One of the advantages of this representation of a program is that the chromosome is independent of the data type used for the problem, as the chromosome consists of addresses where data is stored. Additionally, when the fitness of a chromosome is calculated, no interpretation of the genome is required to obtain the addresses in the data arrays. Unlike LISP expressions, there are no syntactic constraints which must be observed when crossover is carried out. Mutation is very simple: one merely has to allow changes to the genes which respect either the functional constraints or the constraints imposed by levels-back. Nodes do not have to be connected and can therefore be redundant, so the number of nodes used can vary from 0 to the maximum number available. Automatically defined functions emerge quite naturally in this scheme: if a particular collection of gates is very useful, it may be connected many times. In the example shown in Fig. 1 there is good re-use of the sub-trees with outputs 8 and 9. In the example all the nodes have the same number of inputs; this is a convenience, not a fundamental requirement, and the representation could readily be generalised to accommodate a variable number of inputs and outputs for each node. Whether the representation discussed here offers more efficient evolution of programs in general must await further experiments; however, the effectiveness of the closely related PDGP [Poli 97] suggests that the signs are favourable. In this paper a special case of CGP is employed in which the data type is binary and the network is allowed to be feedforward only, which is appropriate for Boolean concept learning. The function set for this is shown in Table 1.

Table 1: Allowed cell functions

 0: 0        5: ⎺b         10: a ⊕ b       15: ⎺a + ⎺b
 1: 1        6: ab         11: a ⊕ ⎺b      16: a⎺c + bc
 2: a        7: a⎺b        12: a + b       17: a⎺c + ⎺bc
 3: b        8: ⎺ab        13: a + ⎺b      18: ⎺a⎺c + bc
 4: ⎺a       9: ⎺a⎺b       14: ⎺a + b      19: ⎺a⎺c + ⎺bc

All the nodes are assumed to possess three inputs; if a function requires fewer, the surplus connections are ignored, which introduces an additional redundancy into the genome. In Table 1, ab implies a AND b, ⎺a indicates NOT a, ⊕ represents the exclusive-OR operation, and + the OR operation. Functions 0-15 are the basic binary functions of 0, 1 and two inputs. Functions 16-19 are all binary multiplexers with various inputs inverted. The multiplexer (MUX) implements a simple IF-THEN-ELSE statement (i.e. IF c = 0 THEN a ELSE b). These functions (16-19) are called universal logic modules (ULMs), which are well known to be very effective and efficient building blocks for logic circuits [Chen and Hurst 82].

3 CHARACTERISTICS OF THE GENETIC ALGORITHM AND THE PROBABILISTIC HILLCLIMBER

The GA used in this paper is very simple. It is generational in nature, with uniform crossover (50% of genetic material is exchanged), random mutation, and size-two probabilistic tournament selection. In this method of parent selection the fitter chromosome in a tournament is accepted only with a given probability (in this case 0.7); otherwise the chromosome with the lower fitness is chosen. The amount of genetic recombination is determined by the breeding rate, which represents the percentage of the population that will take part in recombination. The mutation rate is defined as the percentage of the genes of the entire population which will undergo mutation. The GA always employs simple elitism, where the fittest chromosome of one generation is automatically promoted to the next; there is strong evidence [Miller 98a] that this is extremely beneficial. The fitness of a chromosome is calculated as the number of correct output bits divided by the total number of output bits, taken over all input combinations. The GA terminates after the chosen number of generations, or when 100% correctness is reached (whichever is the sooner).

The PH algorithm begins with a randomly initialised population of chromosomes. The best chromosome is promoted to the next generation, and all the remaining population members are mutations of this chromosome. The process is iterated until termination (same conditions as for the GA).
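As a sketch of how a linear CGP chromosome of this kind can be decoded (an illustration only, not the author's implementation; the function and variable names are invented, and only the plus, minus, and multiply entries of the nominal function set are implemented), the Fig. 1 genotype can be evaluated directly:

```python
# Sketch: decoding the Fig. 1 CGP genotype. Each node is three genes
# (two data indices, then a function index); the final genes name the
# node outputs used as program outputs.
FUNCS = {0: lambda a, b: a + b,   # plus
         1: lambda a, b: a - b,   # minus
         2: lambda a, b: a * b}   # multiply

def evaluate(genotype, n_outputs, inputs):
    """Decode a CGP genotype with two-input nodes and return the program outputs."""
    out_genes = genotype[-n_outputs:]
    node_genes = genotype[:-n_outputs]
    data = list(inputs)                          # indices 0-5: the primary inputs
    for i in range(0, len(node_genes), 3):
        a, b, f = node_genes[i:i + 3]
        data.append(FUNCS[f](data[a], data[b]))  # node output gets the next index
    return [data[g] for g in out_genes]

# Genotype from Fig. 1: nodes 6-11, program outputs taken from nodes 10 and 11.
genotype = [0,1,2, 3,4,2, 6,2,2, 7,5,2, 8,9,0, 8,9,1, 10,11]
x1, x2, x3, y1, y2, y3 = 2, 3, 4, 1, 2, 5        # arbitrary box dimensions
print(evaluate(genotype, 2, [x1, x2, x3, y1, y2, y3]))  # [34, 14]
```

With box dimensions 2 x 3 x 4 and 1 x 2 x 5, node 10 yields V1 + V2 = 24 + 10 = 34 and node 11 yields V1 − V2 = 14, matching the two program outputs described above; note also how nodes 8 and 9 (V1 and V2) are each read twice, the re-use the text points out.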
The only parameters associated with this algorithm are the number of runs, the population size, the number of generations, and the mutation rate. The larger the population, the stronger the selection pressure. The same genotype representation was used for both the GA and PH algorithms.

4 DEFINITIONS AND RESULTS

The problems studied in this paper are the even-parity functions with 3, 4, and 5 inputs, and the 2-bit multiplier. The n-bit even-parity function has n binary inputs and a single binary output; the output is one if there is an even number of ones among the inputs. The even-parity functions of a given number of variables are the most difficult functions to find when carrying out a random search of all GP trees with the function set {and, or, nand, nor} [Koza 92]. The n-bit multiplier has two n-bit inputs and one 2n-bit output, which is the binary result of multiplying the two n-bit inputs. It is a difficult function to evolve even when using the complete set of logic gates shown in Table 1. The reason for studying it here is that it differs markedly from the parity functions: it is built most efficiently with a variety of gates, unlike the parity functions, which can easily be built with a single gate type (xor). The method used to assess the effectiveness of an algorithm, or of a set of parameters, is that favoured by Koza [Koza 92]. It consists of calculating the number of individual chromosomes which would have to be processed to give a certain probability of success. To calculate this figure one must first calculate the cumulative probability of success P(M, i), where M represents the population size and i the generation number. R(z) represents the number of independent runs required for a probability of success (100% functional), given by z, by generation i. I(M, i, z) represents the minimum number of chromosomes which must be processed to give a probability of success z by generation i.
The formulae for these are given below, where Ns(i) represents the number of successful runs at generation i, and Ntotal represents the total number of runs:

    P(M, i) = Ns(i) / Ntotal

    R(z) = ceil[ log(1 − z) / log(1 − P(M, i)) ]

    I(M, i, z) = M × R(z) × i

Note that when z = 1.0 the formulae are invalid (all runs successful). In the tables and graphs of this section z takes the value 0.99 unless stated otherwise. The variation of I(M, i, z) with population size has been investigated for the parity and multiplier functions. The set of primitive functions (gate set) used for the parity functions was {and, or, nand, nor}, unless stated to the contrary; for the multiplier all gates were allowed. For the 4-bit even-parity function, I(M, i, z) was investigated as a function of M for three different geometry sizes, 16 x 16, 10 x 10, and 3 x 3, the latter two employing the complete set of allowed primitives (Table 1). Three geometries were also chosen for the 2-bit multiplier: 10 x 10, 7 x 7, and 4 x 4. The different geometries were investigated because the difficulty of boolean concept learning depends on the amount of resources allocated [Miller 98b]; it was therefore anticipated that the GA parameters most likely to lead to success would depend on this. It should be noted that using I(M, i, z) as a measure of computational effort does not directly equate to CPU time when different geometries are being used. A more rigorous treatment would take this into account, but the simple objective here was to adopt the measure that other researchers have used. It took a great deal of time to collect all the data shown in this section, as hundreds of runs of thousands of generations were required for each data point shown in the graphs. In all the tables the figures in parentheses in the R(z) column refer to the number of successful runs (out of
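The success-probability and computational-effort formulae above translate directly into a few lines of code (an illustrative sketch with invented function names and made-up run counts, not code or data from the paper):

```python
import math

# Sketch of Koza's computational-effort measure as defined above.
def prob_success(n_success, n_total):
    """P(M, i): cumulative fraction of runs 100% correct by generation i."""
    return n_success / n_total

def runs_required(p, z=0.99):
    """R(z): independent runs needed for success probability z, given P(M, i) = p."""
    if p >= 1.0:            # formula undefined when every run succeeds
        return 1
    return math.ceil(math.log(1 - z) / math.log(1 - p))

def effort(M, i, n_success, n_total, z=0.99):
    """I(M, i, z) = M * R(z) * i: chromosomes processed for success probability z."""
    return M * runs_required(prob_success(n_success, n_total), z) * i

# Hypothetical figures: population 50, generation 2000, 30 of 100 runs successful.
print(effort(50, 2000, 30, 100))  # 1300000
```

With P(M, i) = 0.3, thirteen independent runs are needed for a 99% chance of at least one success, giving an effort of 50 × 13 × 2000 = 1,300,000 chromosomes.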
