At iteration $t$, we denote by $x_{vc}^{(F)}(t)$ the mutual information of messages going out of variable nodes when decoding a code of the ensemble $F$, made of all the possible cycle-free (infinitely long) codes with the same irregularity parameters as $C$. The mutual information of messages going out of variable nodes, averaged over all the edges of the code $C$ with parity-check matrix $H$ at iteration $t$, is denoted by $y_{vc}^{(C)}(t)$. Hence, $y_{vc}^{(C)}(t)$ depends on $C$ and on the weights of the weighted-BP decoder. The cost function at iteration $t$ is:

$$f_{cost}^{(C,t)} = x_{vc}^{(F)}(t) - y_{vc}^{(C)}(t) \qquad (3.8)$$

Thus, the optimization problem amounts to looking for the weights, stored in $w_{opt}^{(C)}$, that minimize the cost function, for each iteration $t$:

$$w_{opt}^{(C,t)} = \arg\min_{w} \left( x_{vc}^{(F)}(t) - y_{vc}^{(C)}(t) \right) \qquad (3.9)$$

We therefore solve the optimization problem iteration by iteration, assuming that the correction applied at stage $t$ depends only on the previous iterations. Let us point out that the mutual information of a message, on a given edge, at a given iteration, quantifies the "quality" of this edge, i.e., how much this edge is involved in bad topologies (such as cycles or combinations of cycles). Experiments showed a clear difference between the mutual information of messages on edges involved in very short cycles and the mutual information of messages on the other edges. This is consistent with the fact that errors are more likely to happen on variable nodes involved in such topologies. The next section deals with the way to handle this optimization problem.

3.3.4 Solving the minimization problem

Backpropagation of the error gradient

To solve the minimization problem, one may think of using the neural network that would process the mutual information. Indeed, we have seen in Section 1 of this chapter that, with binary LDPC codes, at both check and variable node sides, the mutual information of outgoing messages can be expressed as a sum of functions of the mutual information of incoming messages, using the $J(\cdot)$ function, provided that the message independence assumption is fulfilled (see equation 1.20). This expression of the mutual information as sums allows us to consider an ANN of the type of figure 3.4, made only of summator neurons. This ANN would compute the mutual information of messages in the cycle-free case. It would then be a multi-layer perceptron, and it would be possible to apply the well-known backpropagation of the error gradient algorithm in order to find the weights minimizing the cost function. For this supervised learning method, the cost function would be the one of equation (3.8), and the expected value for each output neuron would be the mutual information given by the EXIT curve of the cycle-free code ensemble. Since each neuron corresponds to an edge of the Tanner graph, the output compared to this expected value would be the mutual information measured on that edge by empirical mean, when decoding the code $C$.
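To make this construction concrete, the sketch below evaluates the cycle-free mutual-information recursion that such a network of summator neurons would reproduce, for a regular $(d_v, d_c)$ ensemble on the BI-AWGN channel. It is only an illustration under the usual Gaussian approximation: the closed-form fit of $J(\cdot)$ and its constants come from the standard EXIT-chart literature rather than from equation 1.20, the weights of the weighted-BP decoder are not included, and all function names are illustrative.

```python
import numpy as np

# Closed-form fit of the J(.) function (mutual information of a consistent
# Gaussian LLR with standard deviation sigma) and of its inverse; the
# constants are the ones commonly used in the EXIT-chart literature.
H1, H2, H3 = 0.3073, 0.8935, 1.1064

def J(sigma):
    return (1.0 - 2.0 ** (-H1 * sigma ** (2.0 * H2))) ** H3

def J_inv(I):
    I = np.clip(I, 1e-12, 1.0 - 1e-12)  # numerical safety
    return (-np.log2(1.0 - I ** (1.0 / H3)) / H1) ** (1.0 / (2.0 * H2))

def cycle_free_mi(dv, dc, sigma_ch, n_iter):
    """Mutual information x_vc(t) of variable-to-check messages for a
    regular (dv, dc) cycle-free ensemble: each update is a sum of
    functions of the incoming mutual informations through J(.)."""
    I_cv, trajectory = 0.0, []
    for _ in range(n_iter):
        # variable-node side: (dv - 1) incoming edges plus the channel LLR
        I_vc = J(np.sqrt((dv - 1) * J_inv(I_cv) ** 2 + sigma_ch ** 2))
        # check-node side: duality (reciprocal-channel) approximation
        I_cv = 1.0 - J(np.sqrt((dc - 1) * J_inv(1.0 - I_vc) ** 2))
        trajectory.append(I_vc)
    return trajectory

# Example: (3,6)-regular ensemble (rate 1/2) on the BI-AWGN channel.
EbN0_dB, rate = 1.5, 0.5
sigma_ch = np.sqrt(8.0 * rate * 10.0 ** (EbN0_dB / 10.0))  # std dev of channel LLRs
print(cycle_free_mi(dv=3, dc=6, sigma_ch=sigma_ch, n_iter=10))
```

The values returned by this recursion play the role of the reference trajectory $x_{vc}^{(F)}(t)$ in the cost function of equation (3.8).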
The neural network equations would then be used to adapt the weights, thereby considering that the mutual information had been obtained through the sum equations with the $J(\cdot)$ function. We can see the paradox of this method: the error minimization by backpropagation of the error gradient relies on neural network equations that assume the absence of cycles, whereas the actual output is the mutual information of messages on the cyclic graph of $C$, which therefore cannot satisfy this hypothesis. This is problematic since we precisely want the weights to compensate for the message dependencies. This is the reason why we cannot use such a supervised learning approach for the error minimization.

Genetic Algorithm to solve the optimization problem

The cost function defined in equation (3.8), which we choose to minimize, has no analytical expression. Therefore, we choose an optimization algorithm that does not require an analytical expression of the cost function: we have decided to use a genetic algorithm. The flow of the optimization procedure is depicted in figure 3.7. An allele of the population vectors is made of the weights for the $t$-th iteration: weights $w^{(2t)}$ to balance messages going out of variable nodes and weights $w^{(2t+1)}$ to balance messages going out of check nodes. The size of the vectors handled by the genetic algorithm is

$$D = \sum_{i} \left(d_v^{(i)} - 1\right) d_v^{(i)} + \sum_{j} \left(d_c^{(j)} - 1\right) d_c^{(j)}$$

where $d_v^{(i)}$ and $d_c^{(j)}$ are the connection degrees of the $i$-th variable node and the $j$-th check node, respectively. In practice, we have implemented the genetic algorithm using the C library PGApack (Parallel Genetic Algorithm Library). We have tried to find weights for the MacKay (3,6) code with code length $N = 96$ at various SNRs. For a population size of 200 vectors, $N_c = 10000$ and $N_{iter} = 10$, the algorithm takes about a week on a last-generation CPU.

3.3.5 Estimating the mutual information

To implement the above approach, we have to evaluate the mutual information averaged over all the edges of the graph, at a given iteration. To do so, we use a mean estimator for the expectation of definition 5. We set the SNR and then send a given number, say $N_c$, of noisy codewords. The mutual information is then evaluated as (see the sketch at the end of this section):

$$1 - \frac{1}{N_c} \sum_{n=1}^{N_c} \log_2\left(1 + e^{-w^{(t,n)}}\right) \qquad (3.10)$$

where $w^{(t,n)}$ is any message of the chosen kind (check-to-variable or variable-to-check) of the graph at the $t$-th iteration when the $n$-th observation is received. This has to be done to evaluate the cost function for each population vector. For good convergence
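As an illustration, here is a minimal sketch of the estimator of equation (3.10). It assumes the messages $w^{(t,n)}$ of the chosen kind have already been collected from the decoder (and sign-adjusted according to the transmitted bits, or obtained under the all-zero codeword assumption); the function name and the synthetic test data are illustrative.

```python
import numpy as np

def estimate_mutual_information(messages):
    """Empirical mean estimator of equation (3.10):
    1 - (1/Nc) * sum_n log2(1 + exp(-w(t, n)))."""
    w = np.asarray(messages, dtype=float)
    # logaddexp(0, -w) = log(1 + exp(-w)), computed in a numerically safe way
    return 1.0 - np.mean(np.logaddexp(0.0, -w) / np.log(2.0))

# Quick sanity check on synthetic consistent-Gaussian LLRs
# (mean sigma^2/2, variance sigma^2), for which the true value is J(sigma).
rng = np.random.default_rng(0)
sigma, Nc = 1.5, 10_000
llrs = rng.normal(loc=sigma ** 2 / 2.0, scale=sigma, size=Nc)
print(estimate_mutual_information(llrs))
```

In the full optimization flow, this empirical value plays the role of $y_{vc}^{(C)}(t)$ in the cost function of equation (3.8), and it has to be recomputed for each candidate weight vector of the genetic algorithm.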