28.01.2015 Views

Optimizing Sorting with Genetic Algorithms - Polaris

Optimizing Sorting with Genetic Algorithms - Polaris

Optimizing Sorting with Genetic Algorithms - Polaris

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

3.2 Optimization of <strong>Sorting</strong> <strong>with</strong> <strong>Genetic</strong> <strong>Algorithms</strong><br />

3.2.1 Encoding<br />

As discussed above we use a tree based schema where the nodes<br />

of the tree are sorting and selection primitives.<br />

3.2.2 Operators<br />

<strong>Genetic</strong> operators are used to derive new offsprings and introduce<br />

changes in the population. Crossover and mutation are the two<br />

operators that most genetic algorithms use. Crossover exchanges<br />

subtrees from different trees. Mutation operator applies changes to<br />

a single tree. Next, we explain how we apply these two operators.<br />

Crossover<br />

The purpose of crossover is to generate new offsprings that have<br />

better performance than their parents. This is likely to happen when<br />

the new offsprings inherit the best subtrees of the parents.<br />

In this paper we use single-point cross-over, since it can preserve<br />

the whole subtree <strong>with</strong>out changing its structure. Figure 6 shows<br />

an example of single-point crossover.<br />

DV<br />

LDV<br />

Parent trees<br />

DP<br />

DR<br />

DR<br />

BE<br />

LDV<br />

LDR<br />

DR<br />

Offsprings<br />

Figure 6: Crossover of sorting trees.<br />

Mutation<br />

Mutation works on a single tree, where it produces some changes,<br />

introducing some diversity in the population. This operator, however,<br />

does not produce new offsprings. Mutation prevents the population<br />

from permanent fixation at any particular generation [1].<br />

This, to some extent, avoids the local optimal problem. Mutation<br />

changes the parameter values hoping to find better ones. Our mutation<br />

operator can perform the following changes:<br />

1. Change the values of the parameters in the sorting and selection<br />

primitive nodes. The parameters are changed randomly but the<br />

new value is close to the old one, <strong>with</strong> the hope that the parameter<br />

will converge to the best values.<br />

2. Flip: Exchange two subtrees. This type of mutation can help<br />

in cases like the one shown in Figure 7-(a) where a subtree that<br />

is good to sort sets of less than 4M records is being applied to<br />

larger sets. By exchanging the subtrees we can correct this type<br />

of misplacement.<br />

3. Grow: Add a new subtree. This type of mutation is helpful<br />

when more partitioning is required along one path in the tree.<br />

Figure 7-(b) shows an example of this mutation.<br />

4. Truncate: Remove a subtree. Unnecessary subtrees can be deleted<br />

<strong>with</strong> this operation.<br />

LDV<br />

DP<br />

DR<br />

BE<br />

DV<br />

LDV<br />

LDR<br />

Optimal subtee<br />

for 4 M records<br />

4M<br />

Subtree A<br />

BS<br />

BS<br />

Optimal subtee<br />

for 4 M records<br />

Exchange subtrees<br />

4M<br />

(a)<br />

Subtree A<br />

Optimal subtee<br />

for 4 M records<br />

Add a new<br />

subtree<br />

DP s=4M records<br />

Optimal subtee<br />

for 4 M records<br />

Figure 7: Mutation operator. (a)-Exchange subtrees. (b)-Add<br />

a new subtree.<br />

3.2.3 Fitness Function<br />

The fitness function determines the probability for an individual<br />

to reproduce in the population. The higher the fitness of an individual,<br />

the higher the chances it will reproduce and mutate.<br />

In our case, performance will be used as the fitness function.<br />

However, the following two considerations have been taken into<br />

account in the design of our fitness function:<br />

1. We are searching for a sorting algorithm that performs well<br />

across all possible inputs. Thus, the average performance of<br />

a tree can be used as its base fitness. However, we also want<br />

the sorting algorithm to consistently perform well across inputs.<br />

Thus, the tree <strong>with</strong> a variable performance is penalized <strong>with</strong> a<br />

factor that depends on the standard deviation of its performance<br />

when sorting the test inputs.<br />

2. In the first generations, the fitness variance of the population<br />

is high, that is, a small number of sorting trees have a much<br />

better performance than the other ones. If our fitness function<br />

was directly proportional to the performance of the tree, most of<br />

the offsprings would be the descendants of these few trees, since<br />

they would have a much higher probability to reproduce. As a result,<br />

these offsprings would soon occupy most of the population.<br />

This could result in premature convergence, which would prevent<br />

the system from exploring areas of the search space outside the<br />

neighborhood of the highly fit trees. To address this problem, our<br />

fitness function uses the performance order or rank of the sorting<br />

trees in the population. By using the performance ranking, the<br />

absolute performance difference between trees is not considered<br />

and the trees <strong>with</strong> lower performance have more probability to<br />

reproduce than if the absolute performance value had been used.<br />

This avoids the problem of early convergence and of convergence<br />

to a local optimum.<br />

3.2.4 Evolution Algorithm<br />

An important decision is to choose the appropriate evolution algorithm.<br />

The evolution algorithm determines how many offsprings<br />

will be generated, how many individuals of the current generation<br />

will be replaced and so on. The search for the optimal sorting algorithm<br />

is an incremental learning problem and, as such, what has<br />

already been learned should be remembered and used later.<br />

In this work we use a steady-state evolution algorithm. For each<br />

generation, only a small number of the least fit individuals in the<br />

current generation are replaced by the new generated offsprings.<br />

(b)

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!