Unsupervised Recursive Sequence Processing - Institute of ...
sequence $s$ is given, with $s_i$ denoting the current entry and $n_{j_0}$ denoting the best matching neuron for this time step. Then the weight correction term is
\[
\triangle w_j = \epsilon \cdot h_\sigma(\mathrm{nhd}(n_{j_0}, n_j)) \cdot (s_i - w_j) .
\]
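This update can be sketched in code. The following is a minimal, illustrative implementation that assumes a Gaussian neighborhood function $h_\sigma$ and a fixed grid of neuron positions; all function and variable names are ours, not from the original:

```python
import numpy as np

def tkm_update(W, s_i, winner, positions, eps, sigma):
    """One TKM weight update: every neuron j moves towards the
    current entry s_i, scaled by a Gaussian neighborhood h_sigma
    around the winner n_j0 (illustrative sketch, not the paper's code)."""
    # nhd: grid distance between each neuron and the winner
    nhd = np.linalg.norm(positions - positions[winner], axis=1)
    h = np.exp(-nhd ** 2 / (2 * sigma ** 2))      # h_sigma(nhd(n_j0, n_j))
    return W + eps * h[:, None] * (s_i - W)       # w_j + triangle(w_j)
```

With a very small $\sigma$ only the winner adapts; with a large $\sigma$ the whole map moves towards $s_i$, matching the usual SOM neighborhood cooperation.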
As discussed in [23], the learning rule of TKM is unstable and leads only to suboptimal results. A more advanced model, the Recurrent SOM (RSOM), uses leaky integration: it first sums up the weighted directions and afterwards computes the distance [39]
\[
d_{\mathrm{RSOM}}(s, n_j) = \Bigl\| \sum_{i=1}^{t} \eta (1-\eta)^{i-1} (s_i - w_j) \Bigr\|^2 .
\]
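The leaky integrated distance can be written out directly as a sum over the already processed entries. The following sketch assumes the unrolled form of the formula above; the function and parameter names are illustrative:

```python
import numpy as np

def rsom_distance(s, w, eta):
    """Leaky integrated RSOM distance: the eta-weighted directions
    (s_i - w) are summed first, and only then is the squared norm
    of the integrated vector taken (direct, unrolled form)."""
    y = sum(eta * (1.0 - eta) ** (i - 1) * (s_i - w)
            for i, s_i in enumerate(s, start=1))
    return float(np.dot(y, y))
```

Note that summing the directions before taking the norm is exactly what distinguishes RSOM from TKM, which integrates scalar distances instead.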
It represents the context in a larger space than TKM, since the vectors of directions are stored instead of the scalar Euclidean distance. More importantly, the training rule is changed: RSOM derives its learning rule directly from the objective of minimizing the distortion error on sequences and thus adapts the weights towards the vector of integrated directions
\[
\triangle w_j = \epsilon \cdot h_\sigma(\mathrm{nhd}(n_{j_0}, n_j)) \cdot y_j(t),
\]
where
\[
y_j(t) = \sum_{i=1}^{t} \eta (1-\eta)^{i-1} (s_i - w_j) .
\]
Again, the already processed part of the sequence produces a notion of context, and the neuron whose weight is most similar to the average of the past entries becomes the winner for the current entry. The training rule of RSOM takes this fact into account by adapting the weights towards this averaged activation. We will not refer to this learning rule in the following. Instead, the way in which sequences are represented within these two models, and the ways to improve the representational capabilities of such maps, will be of interest.
Assuming a vanishing neighborhood influence $\sigma$ for both TKM and RSOM, one can analytically compute the internal representation of sequences for these two models, i.e. the weights with response optimum to a given sequence $s = (s_1, \ldots, s_t)$: that weight $w$ is optimal for which
\[
w = \sum_{i=1}^{t} (1-\eta)^{i-1} s_i \Big/ \sum_{i=1}^{t} (1-\eta)^{i-1}
\]
holds [40]. This explains the encoding scheme of the winner-takes-all dynamics of TKM and RSOM. Sequences are encoded in the weight space by providing a
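The closed-form optimal weight can be checked numerically: it is the $(1-\eta)$-discounted average of the sequence entries, and at this weight the integrated direction vector of the RSOM distance vanishes. A minimal sketch, with illustrative names:

```python
import numpy as np

def optimal_weight(s, eta):
    """Weight with response optimum for sequence s: the
    (1 - eta)-discounted average of the entries, following
    the closed-form expression derived above."""
    s = np.asarray(s, dtype=float)
    coeffs = (1.0 - eta) ** np.arange(len(s))   # (1-eta)^(i-1), i = 1..t
    return coeffs @ s / coeffs.sum()
```

By construction, $\sum_i \eta(1-\eta)^{i-1}(s_i - w) = 0$ at this $w$, so the integrated direction, and hence the RSOM distance, is minimal there.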