
The neural network equations would then be used to adapt the weights, thereby considering

that the mutual information has been obtained by the sum equations with the J(.)

function.

We can see the paradox of this method: the error minimization by back-propagation of the error gradient relies on the neural network equations, which assume the absence of cycles, whereas the actual output is the mutual information of messages on the graph of C, which contains cycles and therefore cannot satisfy this hypothesis. This is problematic since we want the weights to balance the message dependencies. This is the reason why we cannot use such a supervised learning approach for error minimization.

Genetic Algorithm to solve the optimization problem

The cost function defined in equation 3.8, which we choose to minimize, has no analytical expression. We therefore need an optimization algorithm that does not require an analytical expression of the cost function, and we have decided to use a genetic algorithm [74]. The flow of the optimization procedure is depicted in figure 3.7. An allele of the population vectors is made of the weights for the $t$-th iteration: weights $w^{(2t)}$ to balance messages going out of variable nodes and weights $w^{(2t+1)}$ to balance messages going out of check nodes. The size of the vectors handled by the genetic algorithm is

$$D = \sum_i \bigl(d_v(i) - 1\bigr)\, d_v(i) + \sum_j \bigl(d_c(j) - 1\bigr)\, d_c(j)$$

where $d_v(i)$ and $d_c(j)$ are the connection degrees of the $i$-th variable node and the $j$-th check node, respectively.
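As a quick check of the vector size, the formula can be evaluated directly from the node degrees. The sketch below is only illustrative: the function name `ga_vector_size` is hypothetical, and the example assumes a regular (3,6) code with 96 variable nodes, which implies 48 check nodes since both sides must account for the same 288 edges.

```python
def ga_vector_size(var_degrees, check_degrees):
    """Return D = sum_i (d_v(i) - 1) * d_v(i) + sum_j (d_c(j) - 1) * d_c(j)."""
    var_term = sum((d - 1) * d for d in var_degrees)
    check_term = sum((d - 1) * d for d in check_degrees)
    return var_term + check_term


# Assumed example: regular (3,6) code, 96 variable nodes, 48 check nodes.
var_degrees = [3] * 96
check_degrees = [6] * 48

D = ga_vector_size(var_degrees, check_degrees)
print(D)  # 96 * 2 * 3 + 48 * 5 * 6 = 576 + 1440 = 2016
```

Even for such a short code, each population vector already holds about two thousand real-valued weights, which helps explain why the search is costly.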

In practice, we have implemented the genetic algorithm using the C library PGApack (Parallel Genetic Algorithm Library), available at [79]. We have tried to find weights for the MacKay (3,6) code with code length $N = 96$ at various SNRs. For a population size of 200 vectors, $N_c = 10000$ and $N_{iter} = 10$, the algorithm takes about a week on a latest-generation CPU.
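To illustrate why a genetic algorithm suits a cost function without analytical expression, here is a minimal sketch in Python. It is not the PGApack implementation and does not use its API: the function `genetic_minimize`, the selection, crossover and mutation operators, and the toy cost are all assumptions chosen for brevity. The only property it relies on is that the cost can be evaluated for any candidate vector.

```python
import random


def genetic_minimize(cost, dim, pop_size=20, generations=50,
                     mutation_rate=0.1, mutation_scale=0.1, seed=0):
    """Minimize a black-box cost over real vectors of length `dim`.

    Only cost evaluations are used; no gradient or closed form is needed.
    """
    rng = random.Random(seed)
    # Initial population: genes drawn uniformly in [0, 1] (an assumption).
    population = [[rng.random() for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=cost)
        parents = ranked[: pop_size // 2]   # truncation selection
        children = [ranked[0][:]]           # elitism: keep the best vector
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each gene comes from one of the two parents.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            # Gaussian mutation, applied gene by gene.
            child = [g + rng.gauss(0.0, mutation_scale)
                     if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
    best = min(population, key=cost)
    return best, cost(best)


# Toy stand-in for the real cost: squared distance to the all-ones vector.
def toy_cost(w):
    return sum((x - 1.0) ** 2 for x in w)


best, best_cost = genetic_minimize(toy_cost, dim=8)
print(best_cost)
```

In the setting described here, `cost` would instead run the weighted decoder on noisy codewords and return the value of the cost function of equation 3.8, so each evaluation is expensive; this is consistent with the long running time reported above.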

3.3.5 Estimating the mutual information

To implement the above approach, we have to evaluate the mutual information averaged over all the edges of the graph at a given iteration. To do so, we use a sample-mean estimator for the expectation in definition 5. We set the SNR and then send a given number, say $N_c$, of noisy codewords. Then we evaluate the mutual information as:

$$1 - \frac{1}{N_c} \sum_{n=1}^{N_c} \log_2\left(1 + e^{-w^{(t,n)}}\right) \tag{3.10}$$

where $w^{(t,n)}$ is any message of the chosen kind (check-to-variable or variable-to-check)

of the graph at the $t$-th iteration when the $n$-th observation is received. This has to

be done to evaluate the cost function for each population vector. For good convergence