Unsupervised Recursive Sequence Processing - Institute of ...
R^N is provided, where N denotes the number of neurons; this vector explicitly represents the map activation of all neurons in the previous time step. The temporal context is thus represented in an N-dimensional vector space. One can think of the context as an explicit storage of the activity profile of the whole map in the previous time step. More precisely, the distance is recursively computed by
d_RecSOM((s_1, …, s_t), n_j) = η_1 ‖s_1 − w_j‖² + η_2 ‖C_RecSOM(s_2, …, s_t) − c_j‖²

where η_1, η_2 > 0, and the vector

C_RecSOM(s) = (exp(−d_RecSOM(s, n_1)), …, exp(−d_RecSOM(s, n_N)))
constitutes the context. Note that this vector is essentially the vector of distances of all neurons computed in the previous time step, transformed exponentially to avoid an explosion of the values. As before, the above distance can be decomposed into two parts: a winner computation similar to the standard SOM and, as in the case of RSOM and TKM, a term which assesses the context match. For RecSOM, the context match compares the current context arising while processing the sequence, i.e. the vector of distances from the previous time step, with the expected context c_j stored at neuron j. That is to say, RecSOM explicitly stores a context vector for each neuron and compares the current map activation to these expected contexts during the recursive computation. Since the entire map activation is taken into account, sequences of any given fixed length can be stored, provided enough neurons are available. Thus, the representation space for context is no longer restricted by the weight space, and its capacity now scales with the number of neurons.
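The recursive distance computation can be illustrated with a short sketch. All names here (recsom_distances, eta1, eta2) are illustrative, and the base case of a zero context for the empty past is an assumption not fixed by the text:

```python
import numpy as np

def recsom_distances(seq, weights, contexts, eta1=1.0, eta2=1.0):
    """Vector of distances d_RecSOM(seq, n_j) over all N neurons.

    seq      : list of input vectors; seq[0] is the current entry s_1,
               seq[1:] the remaining sequence (as in the recursion above).
    weights  : (N, d) array of weight vectors w_j.
    contexts : (N, N) array of context vectors c_j.
    """
    if len(seq) == 1:
        # Assumed base case: the empty past is represented by a zero context.
        context = np.zeros(weights.shape[0])
    else:
        # The context is the exponentially transformed distance vector
        # of the previous time step: C_RecSOM(s) = exp(-d_RecSOM(s, .)).
        context = np.exp(-recsom_distances(seq[1:], weights, contexts,
                                           eta1, eta2))
    d_weight = np.sum((weights - seq[0]) ** 2, axis=1)     # ||s_1 - w_j||^2
    d_context = np.sum((contexts - context) ** 2, axis=1)  # ||C(...) - c_j||^2
    return eta1 * d_weight + eta2 * d_context

rng = np.random.default_rng(0)
N, d = 16, 3                                  # 16 neurons, 3-dim inputs
weights = rng.normal(size=(N, d))
contexts = rng.normal(size=(N, N))
seq = [rng.normal(size=d) for _ in range(4)]  # sequence of length 4
dists = recsom_distances(seq, weights, contexts)
winner = int(np.argmin(dists))                # winner n_j0: smallest distance
```

Note that, as in the formula, the context vectors c_j live in R^N, so the memory per neuron grows with the map size.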
For RecSOM, training is Hebbian for both weights and contexts. Denote by n_j0 the winner for sequence entry s_i; then the weight update is

Δw_j = ε · h_σ(nhd(n_j0, n_j)) · (s_i − w_j)

and the context adaptation is

Δc_j = ε′ · h_σ(nhd(n_j0, n_j)) · (C_RecSOM(s_{i+1}, …, s_t) − c_j)
The latter update rule ensures that the context vectors of the winner neuron and its neighborhood become more similar to the current context vector C_RecSOM, which is computed as the sequence is processed. The learning rates are ε, ε′ ∈ (0, 1). As demonstrated in [41], this richer representation of context allows a better quantization of time series data. In [41], various quantitative measures for evaluating trained recursive maps are proposed, such as the temporal quantization error and the specialization of neurons. With respect to these measures, RecSOM turns out to be clearly superior to TKM and RSOM in the experiments reported in [41].
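One Hebbian training step following the two update rules above can be sketched as follows. The Gaussian neighborhood h_σ over a one-dimensional neuron grid and all identifiers (recsom_update, eps, sigma) are illustrative assumptions, not part of the original formulation:

```python
import numpy as np

def recsom_update(winner, s_i, context, weights, contexts, grid,
                  eps=0.1, eps_ctx=0.1, sigma=1.0):
    """One Hebbian step: move weights toward s_i and contexts toward the
    current context vector C_RecSOM (here passed in as `context`)."""
    # Neighborhood strength h_sigma(nhd(n_j0, n_j)), assumed Gaussian
    # in the squared grid distance to the winner.
    h = np.exp(-np.sum((grid - grid[winner]) ** 2, axis=1) / (2 * sigma ** 2))
    weights += eps * h[:, None] * (s_i - weights)            # Δw_j
    contexts += eps_ctx * h[:, None] * (context - contexts)  # Δc_j
    return weights, contexts

rng = np.random.default_rng(1)
N, d = 16, 3
grid = np.arange(N, dtype=float)[:, None]  # 1-D neuron grid positions
weights = rng.normal(size=(N, d))
contexts = rng.normal(size=(N, N))
s_i = rng.normal(size=d)
context = rng.random(N)                    # stand-in for C_RecSOM
winner = 5

before = np.linalg.norm(weights[winner] - s_i)
w_new, c_new = recsom_update(winner, s_i, context, weights, contexts, grid)
after = np.linalg.norm(w_new[winner] - s_i)
# The winner's weight vector has moved toward s_i (after < before);
# neurons far from the winner on the grid move only marginally.
```

In a full training loop, `context` would be recomputed from the previous time step's distance vector before each update, so weight and context adaptation interleave with the recursive distance computation.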