the impulse method. This means that we must decide what the value of the messages should be after a given number of iterations. Our heuristic was to penalize edges connected to variable nodes whose value, in the smallest-weight codewords, is zero, in order to optimize the minimum distance of the code. However, this tends to modify the Tanner graph and to turn it into a new code, whose minimum distance may have no relation with the minimum distance we were trying to optimize in the first place.

To see why finding a new Tanner graph by pruning a mother code is not well suited to being solved by pruning an artificial neural network, we can express the above problem of the choice of the cost function in another way. Modeling the belief propagation decoder by an artificial neural network, as done in figure 3.4, leads us to consider the BP decoder as a classifier which, to a given noisy observation of a codeword, associates the most likely sent codeword. However, the above pruning approach aims at finding a Tanner graph. This does not consist in finding a good classifier for a given problem, as neural networks are meant to do, but in finding the classes themselves (the codewords) on which the classifier depends. Thus, due to the difficulty (perhaps the impossibility) of finding the relation between the minimum distance of the mother code and that of its pruned version, we could not find a relevant cost function in such a framework. Instead, we decided to focus on a better-posed problem and to propose a relevant approach.

3.3 Machine Learning Methods for Decoder Design

In this section, we switch from code design to another problem. We consider a given code, which sets the classes, and we look for the best classifier to assign inputs to the right classes. The classifier is the decoder. The approach is detailed below.

3.3.1 Decoding is a classification problem

As aforementioned, the decoding problem can be seen as a classification problem where, for each noisy observation received from the channel, one wants to find the corresponding sent codeword. If we assume a linear code of length $N$ with $K$ information bits and $M = N - K$ redundancy bits, decoding consists in finding to which class the observation belongs, among the $2^K$ classes corresponding to all possible codewords, which span a code space of dimension $K$. Hence, a class corresponds to a codeword and is made of all the noisy variants of this codeword such that, for all $i \in \{1, \dots, N\}$, if the $i$-th bit of the observation differs from the $i$-th bit of the codeword, then the Hamming distance between the codeword and the observation must be lower than $d^{\mathrm{loc}}_{\min}(i)/2$, with $d^{\mathrm{loc}}_{\min}(i)$ being the local minimum distance of bit $i$ in the code, as defined in . In other words, the class of a given codeword $c$ corresponds to the set of all points closer to $c$ than to any other codeword. A class is therefore the interior of a convex polytope (in some cases unbounded) called the Dirichlet domain, or Voronoi cell, of $c$. The set of such polytopes tessellates the whole space and corresponds to the Voronoi tessellation of all codewords (i.e., of the code).
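To make the class structure concrete, the following minimal sketch enumerates the $2^K$ classes of a toy code and computes a local minimum distance for each bit. The (7,4) Hamming code, its generator matrix G, and the working definition of $d^{\mathrm{loc}}_{\min}(i)$ as the minimum weight of the codewords having a 1 at position $i$ are illustrative assumptions on our part, not the precise definition cited above.

```python
import numpy as np
from itertools import product

# Toy (7,4) Hamming code (an assumption for illustration): a standard
# systematic generator matrix.
G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])
K, N = G.shape

# Enumerate all 2^K codewords: these are the classes of the decoding problem.
codewords = np.array([(np.array(m) @ G) % 2 for m in product([0, 1], repeat=K)])

# Local minimum distance of bit i, here taken (as a working definition) as the
# minimum weight of the codewords having a 1 at position i.
weights = codewords.sum(axis=1)
d_loc = [int(weights[codewords[:, i] == 1].min()) for i in range(N)]
print(d_loc)   # [3, 3, 3, 3, 3, 3, 3]: every bit of this code is equivalent
```

For this highly symmetric code all positions share the same local minimum distance; for irregular codes the values of $d^{\mathrm{loc}}_{\min}(i)$ differ from bit to bit, which is what makes the per-bit class condition above meaningful.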
Figure 3.5: Voronoi diagram (or Dirichlet tessellation): the partitioning of a plane with $n$ points into convex polygons such that each polygon contains exactly one generating point and every point in a given polygon is closer to its generating point than to any other.

Hence, we theoretically know the optimal classifier: it corresponds to implementing a Voronoi partition of the Euclidean space $\mathbb{R}^N$ with the $2^K$ codewords of $GF(2)^N$ as cell centroids, as sketched in figure 3.5. However, implementing this partitioning is intractable in practice for long codes, and corresponds exactly to implementing maximum-likelihood (ML) decoding. That is why this classification problem is usually solved with a BP decoder, which actually implements only an approximation of the Voronoi tessellation frontiers, i.e., of ML decoding. Many previous works [19, 20] have characterized the phenomenon which arises when the BP decoder is used on loopy graphs, and which highlights the difference between ML decoding and BP decoding: ML decoding always finds the codeword closest to the observation (even though it makes an error whenever this closest codeword is not the one that was sent), whereas BP decoding may converge to fixed points which are not codewords. These points are usually called pseudo-codewords, and it has been shown that they are of prime importance in the loss of performance of BP decoding compared to maximum-likelihood decoding. To try to improve BP decoding, we focus on pseudo-codewords, but indirectly. Indeed, we make the assumption that pseudo-codewords indicate that the frontiers of the classifier implemented by the BP decoder are not the frontiers of ML decoding. Hence, we are going to try to find a correction to BP decoding by considering it as a classifier.
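To illustrate why the optimal classifier is intractable, here is a minimal sketch of exhaustive ML decoding over an AWGN channel viewed as a Voronoi-cell assignment; the (7,4) Hamming code, the BPSK mapping, and the noise level are our own illustrative assumptions, not the thesis's experimental setup.

```python
import numpy as np
from itertools import product

# ML decoding as Voronoi-cell assignment: BPSK-modulated codewords are the
# cell centroids, and the decoder returns the centroid nearest (in Euclidean
# distance) to the noisy observation.
G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])
K, N = G.shape
codewords = np.array([(np.array(m) @ G) % 2 for m in product([0, 1], repeat=K)])
centroids = 1.0 - 2.0 * codewords          # BPSK mapping: bit 0 -> +1, bit 1 -> -1

def ml_decode(y):
    """Return the codeword whose BPSK centroid is closest to y in R^N,
    i.e., the codeword whose Voronoi cell contains y."""
    d2 = np.sum((centroids - y) ** 2, axis=1)   # exhaustive search over 2^K cells
    return codewords[np.argmin(d2)]

rng = np.random.default_rng(0)
c = codewords[11]
y = (1.0 - 2.0 * c) + 0.5 * rng.standard_normal(N)   # noisy channel observation
print(np.array_equal(ml_decode(y), c))               # True with high probability
```

The exhaustive search over the $2^K$ centroids is precisely the cost that rules this decoder out for long codes, and what a BP decoder trades away when it approximates the Voronoi frontiers on a loopy graph.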