Optimizing Sorting with Genetic Algorithms - Polaris
Optimizing Sorting with Genetic Algorithms - Polaris
Optimizing Sorting with Genetic Algorithms - Polaris
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
3.2 Optimization of <strong>Sorting</strong> <strong>with</strong> <strong>Genetic</strong> <strong>Algorithms</strong><br />
3.2.1 Encoding<br />
As discussed above we use a tree based schema where the nodes<br />
of the tree are sorting and selection primitives.<br />
3.2.2 Operators<br />
<strong>Genetic</strong> operators are used to derive new offsprings and introduce<br />
changes in the population. Crossover and mutation are the two<br />
operators that most genetic algorithms use. Crossover exchanges<br />
subtrees from different trees. Mutation operator applies changes to<br />
a single tree. Next, we explain how we apply these two operators.<br />
Crossover<br />
The purpose of crossover is to generate new offsprings that have<br />
better performance than their parents. This is likely to happen when<br />
the new offsprings inherit the best subtrees of the parents.<br />
In this paper we use single-point cross-over, since it can preserve<br />
the whole subtree <strong>with</strong>out changing its structure. Figure 6 shows<br />
an example of single-point crossover.<br />
DV<br />
LDV<br />
Parent trees<br />
DP<br />
DR<br />
DR<br />
BE<br />
LDV<br />
LDR<br />
DR<br />
Offsprings<br />
Figure 6: Crossover of sorting trees.<br />
Mutation<br />
Mutation works on a single tree, where it produces some changes,<br />
introducing some diversity in the population. This operator, however,<br />
does not produce new offsprings. Mutation prevents the population<br />
from permanent fixation at any particular generation [1].<br />
This, to some extent, avoids the local optimal problem. Mutation<br />
changes the parameter values hoping to find better ones. Our mutation<br />
operator can perform the following changes:<br />
1. Change the values of the parameters in the sorting and selection<br />
primitive nodes. The parameters are changed randomly but the<br />
new value is close to the old one, <strong>with</strong> the hope that the parameter<br />
will converge to the best values.<br />
2. Flip: Exchange two subtrees. This type of mutation can help<br />
in cases like the one shown in Figure 7-(a) where a subtree that<br />
is good to sort sets of less than 4M records is being applied to<br />
larger sets. By exchanging the subtrees we can correct this type<br />
of misplacement.<br />
3. Grow: Add a new subtree. This type of mutation is helpful<br />
when more partitioning is required along one path in the tree.<br />
Figure 7-(b) shows an example of this mutation.<br />
4. Truncate: Remove a subtree. Unnecessary subtrees can be deleted<br />
<strong>with</strong> this operation.<br />
LDV<br />
DP<br />
DR<br />
BE<br />
DV<br />
LDV<br />
LDR<br />
Optimal subtee<br />
for 4 M records<br />
4M<br />
Subtree A<br />
BS<br />
BS<br />
Optimal subtee<br />
for 4 M records<br />
Exchange subtrees<br />
4M<br />
(a)<br />
Subtree A<br />
Optimal subtee<br />
for 4 M records<br />
Add a new<br />
subtree<br />
DP s=4M records<br />
Optimal subtee<br />
for 4 M records<br />
Figure 7: Mutation operator. (a)-Exchange subtrees. (b)-Add<br />
a new subtree.<br />
3.2.3 Fitness Function<br />
The fitness function determines the probability for an individual<br />
to reproduce in the population. The higher the fitness of an individual,<br />
the higher the chances it will reproduce and mutate.<br />
In our case, performance will be used as the fitness function.<br />
However, the following two considerations have been taken into<br />
account in the design of our fitness function:<br />
1. We are searching for a sorting algorithm that performs well<br />
across all possible inputs. Thus, the average performance of<br />
a tree can be used as its base fitness. However, we also want<br />
the sorting algorithm to consistently perform well across inputs.<br />
Thus, the tree <strong>with</strong> a variable performance is penalized <strong>with</strong> a<br />
factor that depends on the standard deviation of its performance<br />
when sorting the test inputs.<br />
2. In the first generations, the fitness variance of the population<br />
is high, that is, a small number of sorting trees have a much<br />
better performance than the other ones. If our fitness function<br />
was directly proportional to the performance of the tree, most of<br />
the offsprings would be the descendants of these few trees, since<br />
they would have a much higher probability to reproduce. As a result,<br />
these offsprings would soon occupy most of the population.<br />
This could result in premature convergence, which would prevent<br />
the system from exploring areas of the search space outside the<br />
neighborhood of the highly fit trees. To address this problem, our<br />
fitness function uses the performance order or rank of the sorting<br />
trees in the population. By using the performance ranking, the<br />
absolute performance difference between trees is not considered<br />
and the trees <strong>with</strong> lower performance have more probability to<br />
reproduce than if the absolute performance value had been used.<br />
This avoids the problem of early convergence and of convergence<br />
to a local optimum.<br />
3.2.4 Evolution Algorithm<br />
An important decision is to choose the appropriate evolution algorithm.<br />
The evolution algorithm determines how many offsprings<br />
will be generated, how many individuals of the current generation<br />
will be replaced and so on. The search for the optimal sorting algorithm<br />
is an incremental learning problem and, as such, what has<br />
already been learned should be remembered and used later.<br />
In this work we use a steady-state evolution algorithm. For each<br />
generation, only a small number of the least fit individuals in the<br />
current generation are replaced by the new generated offsprings.<br />
(b)