Last time^{[1]} we took a look at the simulated annealing global minimisation algorithm which searches for points at which functions return their least possible values and which drew its inspiration from the metallurgical process of annealing which minimises the energy state of the crystalline structure of metals by first heating and then slowly cooling them.
Now as it happens, physics isn't the only branch of science from which we can draw inspiration for global optimisation algorithms. For example, in biology we have the process of evolution through which the myriad species of life on Earth have become extraordinarily well adapted to their environments. Put very simply this happens because offspring differ slightly from their parents and differences that reduce the chances that they will survive to have offspring of their own are less likely to be passed down through the generations than those that increase those chances.
Noting that extraordinarily well adapted is more or less synonymous with near maximally adapted, it's not unreasonable to suppose that we might exploit a mathematical model of evolution to search for global maxima of functions, being those points at which they return their greatest possible values.
In nature an offspring's inheritance of characteristics of its parents is governed by molecules of DNA called chromosomes, which are long chains of the bases adenine, thymine, cytosine and guanine. These chains are comprised of genes, which are relatively short sequences of the bases, that are processed by cells to create proteins and are typically represented by strings of the characters A, T, C and G. Now if we were to replace those characters with the digits 0, 1, 2 and 3 respectively then we could equate each gene with a base four integer and the entire set of them as a parametric description of an organism.
Ignoring the actual biochemistry and focussing purely upon the information contained within the genetic code is admittedly a gross simplification of the mechanics of evolution, but it's good enough for a simple model of it.
The Genetic Algorithm
The genetic algorithm is probably the most widely used of such models for optimisation, although the singular term algorithm is something of a misnomer since there are a great many variants that go by the same name. They do all have some operational characteristics in common, however.
Firstly, they act upon a population of individuals, each of whose genetic code can be transformed into a set of arguments. Secondly, they have a fitness function which measures how well adapted, or fit, those arguments are. Thirdly, they have a crossover operator which exchanges genetic code between a pair of individuals as an analogue of sexual reproduction. Fourthly, they have a mutation operator which makes a small random change to an individual's genetic code. Finally, they have a selection operator which chooses an individual to copy into the next generation of the population according to its fitness in relation to the fitness of the population as a whole.
The Chromosome
In their simplest form, individuals are represented by a single binary chromosome which concatenates a set of binary genes \(g_i\), having \(n_i\) bits apiece, that map to the elements of a vector argument of a multivariate function, having lower and upper bounds of \(x^\mathrm{lo}_i\) and \(x^\mathrm{hi}_i\) respectively, with
\[
x_i = x^\mathrm{lo}_i \times \left(1 - \frac{g_i}{2^{n_i}}\right) + x^\mathrm{hi}_i \times \frac{g_i}{2^{n_i}}
\]
Listing 1 shows how we might initialise an individual whose genes are all the same length using a random bit generator, such as our ak.wardMoltenoRnd, and convert it to an ak.vector argument.
Listing 1: A Simple Chromosome
function chromosome(genes, bits, bitRnd) {
 var b = new Array(genes*bits);
 var i = 0;
 var gene, bit;

 for(gene=0;gene<genes;++gene) {
  for(bit=0;bit<bits;++bit) b[i++] = bitRnd();
 }
 return {bits: b, fitness: undefined};
}

function toArg(chromosome, genes, bits, lb, ub) {
 var a = new Array(genes);
 var i = 0;
 var gene, bit;

 for(gene=0;gene<genes;++gene) {
  a[gene] = 0;
  for(bit=0;bit<bits;++bit) {
   a[gene] += chromosome.bits[i++];
   a[gene] *= 0.5;
  }
  a[gene] = lb.at(gene)*(1-a[gene]) + ub.at(gene)*a[gene];
 }
 return ak.vector(a);
}
Here we are using undefined for the fitness to mean not evaluated. Note that, rather than convert the gene to an integer and then scale it to a value between zero and one, we're directly converting the gene to a number between zero and one by dividing by two after we add each more significant bit. Finally, we're also assuming that the lower and upper bounds are represented by ak.vector objects.
Program 1 demonstrates these functions so that you can confirm for yourself that they behave correctly, but keep in mind that the leftmost bits in the genes are the least significant when you do so.
Program 1: Chromosomes And Arguments
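If you'd like to check the bit ordering outside of program 1, here's a minimal standalone sketch of the decoding in plain JavaScript, with no dependency on the ak library and with plain numbers for the bounds; the helper name decodeGene is mine, not the library's.

```javascript
// Decode a gene, given as an array of bits with the leftmost bit least
// significant, into a number between the lower and upper bounds by
// halving the running total after each more significant bit is added.
function decodeGene(bits, lo, hi) {
  var a = 0;
  for (var i = 0; i < bits.length; ++i) {
    a += bits[i];
    a *= 0.5;
  }
  return lo * (1 - a) + hi * a;
}

// [0,0,1] has only its most significant bit set, mapping to one half
// of the way from the lower to the upper bound.
console.log(decodeGene([0, 0, 1], 0, 8)); // 4
// [1,0,0] has only its least significant bit set, mapping to one eighth.
console.log(decodeGene([1, 0, 0], 0, 8)); // 1
```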
We shall require that fitness is always nonnegative and so whenever the function that we're optimising returns a negative value we replace it with zero, as illustrated by listing 2.
Listing 2: Evaluating The Fitness
function evaluation(f, chromosome, genes, bits, lb, ub) {
 var y;

 if(ak.nativeType(chromosome.fitness)===ak.UNDEFINED_T) {
  y = f(toArg(chromosome, genes, bits, lb, ub));
  chromosome.fitness = y>0 ? y : 0;
 }
}
Note that we don't do anything unless the chromosome's fitness is undefined since that's how we're indicating that it hasn't yet been evaluated. Furthermore, because relative comparisons involving NaNs are always false, this will also replace them with zero.
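The NaN behaviour is easy to verify directly; this one-line function reproduces the clamping rule from listing 2 in isolation.

```javascript
// The clamping rule from listing 2: anything that isn't strictly
// positive, including NaN (for which y>0 is false), becomes zero.
function clampFitness(y) {
  return y > 0 ? y : 0;
}

console.log(clampFitness(2.5));        // 2.5
console.log(clampFitness(-3));         // 0
console.log(clampFitness(Number.NaN)); // 0
```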
The Crossover Operator
In nature there are typically two versions of each chromosome, one of which is copied into reproductive cells such as spermatozoa and ova. During reproduction, an offspring's pair of chromosomes is constructed by swapping all of the bases up to a certain point, or locus, between the parents' reproductive chromosomes.
For example, if we represent the bases of the parents' pairs of chromosomes with \(m_i\) and \(m^\prime_i\) and \(f_i\) and \(f^\prime_i\)
\[
\begin{matrix}
m_0 & m_1 & \dots & m_{n-2} & m_{n-1}\\
m^\prime_0 & m^\prime_1 & \dots & m^\prime_{n-2} & m^\prime_{n-1}\\
\\
f_0 & f_1 & \dots & f_{n-2} & f_{n-1}\\
f^\prime_0 & f^\prime_1 & \dots & f^\prime_{n-2} & f^\prime_{n-1}
\end{matrix}
\]
then, if the former were present in their reproductive cells and the locus of recombination were \(k\), the offspring's pair of chromosomes would be
\[
\begin{matrix}
f_0 & f_1 & \dots & f_{k-1} & f_k & m_{k+1} & \dots & m_{n-2} & m_{n-1}\\
m_0 & m_1 & \dots & m_{k-1} & m_k & f_{k+1} & \dots & f_{n-2} & f_{n-1}
\end{matrix}
\]
Since our simple individuals only have one chromosome we can't create offspring in this way, so the crossover operator instead replaces the parents' chromosomes with this pair of offspring chromosomes, as illustrated by listing 3.
Listing 3: The Crossover Operator
function crossover(chromosome0, chromosome1) {
 var n = chromosome0.bits.length;
 var locus = ak.floor(Math.random()*n);
 var i, tmp;

 for(i=0;i<=locus;++i) {
  tmp = chromosome0.bits[i];
  chromosome0.bits[i] = chromosome1.bits[i];
  chromosome1.bits[i] = tmp;
 }
 chromosome0.fitness = undefined;
 chromosome1.fitness = undefined;
}
As a minor optimisation, we're performing the recombination in place rather than creating offspring bit arrays and replacing the parents' bit arrays with them. Furthermore, since those bit arrays may have consequently changed we're marking the chromosomes as not evaluated by setting their fitnesses to undefined.
Program 2 demonstrates the effect of this crossover operator upon a pair of chromosomes, one of which has all of its bits equal to zero and the other all of its bits equal to one, together with the arguments that they map to.
Program 2: The Effect Of Crossover
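Program 2's output can be reproduced offline with a deterministic variant of listing 3 in which the locus is passed in by the caller rather than drawn at random; crossoverAt is my name for it and the bare bit arrays stand in for the chromosome objects.

```javascript
// A deterministic version of the crossover operator in which the locus
// is supplied by the caller; bits up to and including it are swapped.
function crossoverAt(bits0, bits1, locus) {
  for (var i = 0; i <= locus; ++i) {
    var tmp = bits0[i];
    bits0[i] = bits1[i];
    bits1[i] = tmp;
  }
}

var zeros = [0, 0, 0, 0, 0, 0];
var ones  = [1, 1, 1, 1, 1, 1];
crossoverAt(zeros, ones, 2);
console.log(zeros.join('')); // 111000
console.log(ones.join(''));  // 000111
```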
The Mutation Operator
The mutation operator is the most trivial of them all, simply picking a bit at random in a chromosome, flipping it and then setting the fitness to undefined, as shown by listing 4.
Listing 4: The Mutation Operator
function mutation(chromosome) {
 var locus = ak.floor(Math.random()*chromosome.bits.length);

 chromosome.bits[locus] = 1-chromosome.bits[locus];
 chromosome.fitness = undefined;
}
Program 3 demonstrates its effect on some randomly generated chromosomes and their associated arguments.
Program 3: The Effect Of Mutation
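As with crossover, a deterministic variant with the locus supplied by the caller makes the effect reproducible; mutateAt is my name and it acts on a bare bit array.

```javascript
// A deterministic version of the mutation operator from listing 4 in
// which the locus is supplied rather than picked at random.
function mutateAt(bits, locus) {
  bits[locus] = 1 - bits[locus];
}

var b = [0, 1, 0, 1];
mutateAt(b, 2);
console.log(b.join('')); // 0111
mutateAt(b, 2);
console.log(b.join('')); // 0101, flipping twice restores the bit
```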
The Selection Operator
Finally we come to the selection operator which, in its original form, is definitely more complicated than the operators that we have defined so far. That form is known as roulette wheel selection and sets the probability that the \(i^\mathrm{th}\) individual of the next generation, \(c^\prime_i\), will be a copy of the \(j^{\,\mathrm{th}}\) individual of the current one, \(c_j\), to
\[
\Pr\left(c^\prime_i = c_j\right) = \frac{f_j}{\sum_k f_k}
\]
where \(f_k\) is the fitness of the \(k^\mathrm{th}\) individual in the current population and \(\sum\) is the summation sign. Note that this is the reason why we required fitnesses to be nonnegative; these simply wouldn't be valid probabilities if some of them weren't.
We don't actually need to calculate those probabilities when implementing the selection operator, however. Instead we can draw a uniformly distributed random number greater than or equal to zero and strictly less than the total fitness of the current population and then subtract from it the fitnesses of each individual in turn until it drops below zero. When it does, we have found the individual to copy into the next generation.
Whilst, on the face of it, this seems simple enough, there are a few complications that we must be sure to deal with.
Firstly, if the total fitness of the population is equal to zero then the successive subtractions of the individual's fitnesses, which must all also equal zero, will not take it below zero and so we should instead pick an individual at random.
Secondly, we must remember that floating point numbers are not real numbers and it is therefore possible, however unlikely, that rounding errors during those subtractions will result in the total fitness never falling below zero. In the event of this we should choose the last member of the population with a nonzero fitness, which must exist since we have already dealt with the case that they all have zero fitness.
The, admittedly somewhat terse, selection function given in listing 5 does just this.
Listing 5: The Selection Operator
function selection(population, fitness) {
 var n = population.length;
 var i = 0;

 if(fitness===0) {
  i = ak.floor(Math.random()*n);
 }
 else {
  fitness *= Math.random();
  while(i<n-1 && (fitness-=population[i].fitness)>=0) ++i;
  while(population[i].fitness===0) --i;
 }
 return {bits: population[i].bits.slice(0), fitness: population[i].fitness};
}
Here the population argument is an array of chromosome objects and fitness is the sum of their fitnesses. Note that we subtract the individual fitnesses from the total fitness as we check whether the result is negative so that the while loop will terminate before the index is incremented if it is. Finally, we must be sure to copy the selected chromosome's bits into a new array in the returned chromosome rather than copy a reference to them since we might select it again for a later slot in the next generation and we wouldn't want the crossover and mutation operators to act on both copies simultaneously.
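The subtraction scheme can be isolated from the chromosome bookkeeping; in this sketch the random draw is passed in so that the result is reproducible, and pickIndex is my name for the helper rather than anything in the ak library.

```javascript
// Roulette wheel selection over an array of nonnegative fitnesses with
// a nonzero total, the draw u satisfying 0 <= u < total fitness.
function pickIndex(fitnesses, u) {
  var n = fitnesses.length;
  var i = 0;
  // subtract fitnesses until the draw goes negative
  while (i < n - 1 && (u -= fitnesses[i]) >= 0) ++i;
  // guard against rounding leaving us on a zero fitness individual
  while (fitnesses[i] === 0) --i;
  return i;
}

console.log(pickIndex([1, 2, 3], 0.5)); // 0
console.log(pickIndex([1, 2, 3], 2.5)); // 1
console.log(pickIndex([1, 2, 3], 5.9)); // 2
// a draw of 3 simulates a rounding error taking u past the total of
// [1, 2, 0]; the trailing zero fitness individual is stepped over
console.log(pickIndex([1, 2, 0], 3));   // 1
```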
Generations
With all of these pieces in place we're finally ready to implement the algorithm itself.
Firstly, we shall create an initial population by filling an array with chromosome objects and evaluating them, as done by initialise in listing 6.
Listing 6: The Initial Population
function initialise(f, size, genes, bits, lb, ub) {
 var pop = new Array(size);
 var bitRnd = ak.wardMoltenoRnd();
 var i;

 for(i=0;i<size;++i) {
  pop[i] = chromosome(genes, bits, bitRnd);
  evaluation(f, pop[i], genes, bits, lb, ub);
 }
 return pop;
}
During subsequent generations we shall first apply the crossover operator to some proportion of the population, known as the crossover rate, then apply the mutation operator to some proportion of the total number of bits in the population, known as the mutation rate, before reevaluating any chromosomes that were affected and finally selecting the new population, as done by the generation function given in listing 7.
Listing 7: Subsequent Generations
function generation(f, population, genes, bits, lb, ub, cross, mutate) {
 var n = population.length;
 var next = new Array(n);
 var c = ak.floor(cross*n);
 var m = ak.floor(mutate*genes*bits);
 var fitness = 0;
 var i;

 for(i=1;i<c;i+=2) crossover(population[i-1], population[i]);
 for(i=0;i<m;++i) mutation(population[ak.floor(Math.random()*n)]);
 for(i=0;i<n;++i) evaluation(f, population[i], genes, bits, lb, ub);
 for(i=0;i<n;++i) fitness += population[i].fitness;
 for(i=0;i<n;++i) next[i] = selection(population, fitness);
 return next;
}
Note that we don't need to randomly select individuals for the crossover operator since both initialisation and selection put them in random order.
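The whole scheme can be put together in a self-contained miniature, shown below in plain JavaScript with Math.random standing in for the ak library's random bit generator, a single gene per chromosome and the hypothetical name miniGa; it is a sketch of the algorithm described above rather than a faithful copy of the listings.

```javascript
// A miniature genetic algorithm maximising a univariate function f over
// [lo, hi] with one gene per chromosome, following listings 1 to 7.
function miniGa(f, lo, hi, bits, size, cross, mutate, generations) {
 function decode(b) {
  var a = 0;
  for(var i=0;i<b.length;++i) { a += b[i]; a *= 0.5; }
  return lo*(1-a) + hi*a;
 }

 function evaluate(c) {
  if(c.fitness===undefined) {
   var y = f(decode(c.bits));
   c.fitness = y>0 ? y : 0;
  }
 }

 function select(pop, total) {
  var i = 0;
  if(total===0) i = Math.floor(Math.random()*pop.length);
  else {
   var u = total*Math.random();
   while(i<pop.length-1 && (u-=pop[i].fitness)>=0) ++i;
   while(pop[i].fitness===0) --i;
  }
  return {bits: pop[i].bits.slice(0), fitness: pop[i].fitness};
 }

 var pop = [];
 var g, i, j, tmp, locus, total;

 // the initial population of random chromosomes
 for(i=0;i<size;++i) {
  var b = [];
  for(j=0;j<bits;++j) b.push(Math.random()<0.5 ? 0 : 1);
  pop.push({bits: b, fitness: undefined});
  evaluate(pop[i]);
 }

 for(g=0;g<generations;++g) {
  // cross over adjacent pairs up to the crossover rate
  for(i=1;i<Math.floor(cross*size);i+=2) {
   locus = Math.floor(Math.random()*bits);
   for(j=0;j<=locus;++j) {
    tmp = pop[i-1].bits[j]; pop[i-1].bits[j] = pop[i].bits[j]; pop[i].bits[j] = tmp;
   }
   pop[i-1].fitness = pop[i].fitness = undefined;
  }

  // flip random bits up to the mutation rate
  for(i=0;i<Math.floor(mutate*size*bits);++i) {
   j = Math.floor(Math.random()*size);
   locus = Math.floor(Math.random()*bits);
   pop[j].bits[locus] = 1-pop[j].bits[locus];
   pop[j].fitness = undefined;
  }

  // reevaluate and select the next generation
  total = 0;
  for(i=0;i<size;++i) evaluate(pop[i]);
  for(i=0;i<size;++i) total += pop[i].fitness;
  var next = [];
  for(i=0;i<size;++i) next.push(select(pop, total));
  pop = next;
 }

 var best = pop[0];
 for(i=1;i<size;++i) if(pop[i].fitness>best.fitness) best = pop[i];
 return decode(best.bits);
}

// maximise 1/(1+x*x) over [-10, 10]; runs typically drift towards zero
var result = miniGa(function(x) { return 1/(1+x*x); }, -10, 10, 16, 50, 0.6, 0.01, 30);
console.log(result);
```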
Figure 1: Rosenbrock's Function
As usual, we shall first demonstrate the progress of this algorithm on Rosenbrock's function
\[
f(x, y) = 100 \left(y - x^2\right)^2 + (1 - x)^2
\]
which has a gently sloping curved valley and returns its minimum value of zero when both \(x\) and \(y\) are equal to one, as illustrated by figure 1.
Now since the genetic algorithm maximises rather than minimises functions, in this form at least, we shall instead apply it to its reciprocal
\[
f^\prime(x, y) = \frac{1}{1 + 100 \left(y - x^2\right)^2 + (1 - x)^2}
\]
Note that we're adding one to the denominator so that we don't divide by zero, with the result that it returns a maximum value of one when both \(x\) and \(y\) are equal to one.
Program 4 runs our genetic algorithm on it for thirty generations, plotting the current generation in green and the previous ones in red.
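The transformation is easy to verify numerically; a minimiser of Rosenbrock's function is a maximiser of its reciprocal, as this short check shows.

```javascript
// Rosenbrock's function and the reciprocal that we maximise instead.
function rosenbrock(x, y) {
  return 100 * Math.pow(y - x*x, 2) + Math.pow(1 - x, 2);
}
function rosenbrockReciprocal(x, y) {
  return 1 / (1 + rosenbrock(x, y));
}

console.log(rosenbrock(1, 1));               // 0, the minimum
console.log(rosenbrockReciprocal(1, 1));     // 1, the maximum
console.log(rosenbrockReciprocal(0, 0));     // 0.5, strictly smaller
```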
Program 4: Maximising Rosenbrock's Reciprocal
If you run it several times you will find that, whilst the algorithm typically converges somewhere upon the curved ridge, it frequently does so a fair way off from the maximum.
Figure 2: De Jong's f_{5}
Now you may very well be thinking that this means that the genetic algorithm isn't really much use for optimisation but, to be fair, we wouldn't turn to it for those functions with just one local optimum like Rosenbrock's.
Instead, we should employ it to optimise functions that have many local optima, of which we should like to find one of the best.
For an example of such a function we can use De Jong's \(f_5\), which was part of a test suite used to investigate the best settings for the algorithm^{[2]} and is defined by the formula
\[
f_5(x_0, x_1) = \left(0.002 + \sum_{i=0}^{24} \frac{1}{i+1 + \left(x_0-a_{0i}\right)^6 + \left(x_1-a_{1i}\right)^6}\right)^{-1}
\]
where
\[
\mathbf{a} =
\begin{pmatrix}
-32&-16&0&16&32&-32&-16&0&16&32&\dots&-32&-16&0&16&32\\
-32&-32&-32&-32&-32&-16&-16&-16&-16&-16&\dots&32&32&32&32&32
\end{pmatrix}
\]
This has twenty five narrow local minima and takes its least value of approximately one when both \(x_0\) and \(x_1\) are equal to minus thirty two, which is at the centre of the bottom left dark square in figure 2.
Of course, we shall again need to reciprocate this function since our implementation of the genetic algorithm maximises rather than minimises functions and so program 5 applies it to
\[
f^\prime_5(x_0, x_1) = 0.002 + \sum_{i=0}^{24} \frac{1}{i+1 + \left(x_0-a_{0i}\right)^6 + \left(x_1-a_{1i}\right)^6}
\]
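For anyone who wants to experiment with it directly, here's a transcription of \(f^\prime_5\) in plain JavaScript in which the twenty five columns of \(\mathbf{a}\) are generated from the grid structure rather than written out; f5Reciprocal is my name for it.

```javascript
// De Jong's f5 reciprocal: a sum of 25 sharp peaks centred upon a 5x5
// grid with coordinates drawn from {-32,-16,0,16,32}.
function f5Reciprocal(x0, x1) {
  var s = 0.002;
  for (var i = 0; i < 25; ++i) {
    var a0 = 16 * ((i % 5) - 2);           // cycles -32,-16,0,16,32
    var a1 = 16 * (Math.floor(i / 5) - 2); // -32 five times, then -16, ...
    s += 1 / (i + 1 + Math.pow(x0 - a0, 6) + Math.pow(x1 - a1, 6));
  }
  return s;
}

// the i=0 peak at (-32,-32) dominates, giving a value slightly above one
console.log(f5Reciprocal(-32, -32) > 1);  // true
// far from every grid point the function is tiny
console.log(f5Reciprocal(8, 8) < 0.01);   // true
```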
Program 5: Maximising De Jong's f_{5} Reciprocal
Well it certainly does a much better job for \(f_5\) than it did for Rosenbrock's function, frequently converging upon one of the peaks at the bottom left of the chart. Nevertheless, there are some significant problems with the genetic algorithm as implemented in this post, which is why I haven't made it part of the ak library.
In the next post we shall take a look at a theoretical treatment of the algorithm in order to reveal just what those problems are and how we might go about fixing them.
References
[1] Annealing Down, www.thusspakeak.com, 2017.
[2] De Jong, K., An analysis of the behaviour of a class of genetic adaptive systems, Ph.D. thesis, University of Michigan, 1975.