Unsupervised Recursive Sequence Processing - Institute of ...
2 Unsupervised processing of sequences
Let input sequences be denoted by s = (s_1, ..., s_t) with entries s_i in an alphabet Σ which is embedded in a real vector space R^n. The element s_1 denotes the most recent entry of the sequence and t is the sequence length. The set of sequences of arbitrary length over symbols Σ is Σ*, and Σ^l is the space of sequences of length l.
Popular recursive sequence processing models are the temporal Kohonen map, the recurrent SOM, the recursive SOM, and the SOM for structured data (SOMSD) [8,11,39,41]. The SOMSD was originally proposed for the more general case of tree structure processing; here, only sequences, i.e. trees with a node fan-out of 1, are considered. As for the standard SOM, a recursive neural map is given by a set of neurons n_1, ..., n_N. The neurons are arranged on a grid, often a two-dimensional regular lattice. All neurons are equipped with weights w_i ∈ R^n.
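The map layout just described can be made concrete in a few lines of NumPy. This is an illustrative sketch only; the array names, lattice size, and input dimension are our own choices, not taken from the paper:

```python
import numpy as np

# A standard SOM layout: N = grid_h * grid_w neurons on a regular
# two-dimensional lattice, each equipped with a weight vector w_i in R^n.
grid_h, grid_w, n = 10, 10, 3          # 10x10 lattice, 3-dimensional inputs
rng = np.random.default_rng(0)

# weights[i, j] is the weight vector of the neuron at lattice position (i, j)
weights = rng.uniform(-1.0, 1.0, size=(grid_h, grid_w, n))

# Lattice coordinates of every neuron; these are what neighborhood
# functions operate on later (not the weight vectors themselves).
coords = np.stack(
    np.meshgrid(np.arange(grid_h), np.arange(grid_w), indexing="ij"),
    axis=-1,
)
```

Storing weights as a `(grid_h, grid_w, n)` array keeps the lattice structure explicit, so winner search and neighborhood updates can be expressed directly over lattice indices.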
Two important ingredients have to be defined to specify self-organizing network models: the data metric and the network update. The metric addresses the question of how an appropriate distance can be defined to measure the similarity of possibly sequential input signals to map units. For this purpose, the sequence entries are compared with the weight parameters stored at the neurons. The set of input signals for which a given neuron i is closest is called the receptive field of neuron i; neuron i is the winner and representative for all signals within its receptive field. In the following, we recall the distance computation for the standard SOM and review several ways found in the literature to compute the distance of a neuron from a sequential input. Apart from the metric, the update procedure or learning rule by which neurons adapt to the input is essential. Commonly, Hebbian or competitive¹ learning takes place, following this scheme: the parameters of the winner and its neighborhood within a given lattice structure are adapted such that their response to the current signal is increased. Thereby, neighborhood cooperation ensures a topologically faithful mapping.
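The winner-takes-all assignment and the resulting receptive fields can be sketched as follows, assuming the Euclidean metric on a small toy map (all names and sizes here are illustrative assumptions):

```python
import numpy as np

def winner(weights, s):
    """Lattice index of the neuron whose weight vector is closest to s."""
    d = np.sum((weights - s) ** 2, axis=-1)   # squared distance to every neuron
    return np.unravel_index(np.argmin(d), d.shape)

def receptive_fields(weights, signals):
    """Group input signals by the neuron that wins them."""
    fields = {}
    for s in signals:
        fields.setdefault(winner(weights, s), []).append(s)
    return fields

# Toy usage: a 2x2 lattice of neurons with 2-dimensional weights.
rng = np.random.default_rng(1)
weights = rng.normal(size=(2, 2, 2))
signals = rng.normal(size=(20, 2))
fields = receptive_fields(weights, signals)   # every signal lands in one field
```

The receptive fields partition the input space: each signal is represented by exactly one winning neuron.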
The standard SOM relies on a simple winner-takes-all scheme and does not account for the temporal structure of inputs. For a stimulus s_i ∈ R^n, the winning neuron n_j is the one for which the squared distance

    d_SOM(s_i, w_j) = ‖s_i − w_j‖²

is minimal, where ‖·‖ is the standard Euclidean metric. Training starts with randomly initialized weights w_i and adapts the parameters iteratively as follows: denote by n_0 the index of the winning neuron for the input signal s_i. Assume a fixed function nhd(n_j, n_k) which indicates the degree of neighborhood of neurons n_j and n_k within the chosen lattice structure. Adaptation of all weights w_j takes
¹ We will use these two terms interchangeably in the following.
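Although the adaptation rule is cut off above, it is the standard Hebbian/competitive SOM update: the winner n_0 and its lattice neighbors are moved toward the current signal, with the strength of the move scaled by nhd. A minimal sketch, assuming a Gaussian neighborhood function and a learning rate eta (both concrete choices are ours, not fixed by the text):

```python
import numpy as np

def nhd(c0, c, sigma=1.0):
    """Gaussian degree of neighborhood between lattice positions c0 and c."""
    dist2 = np.sum((np.asarray(c0) - np.asarray(c)) ** 2)
    return np.exp(-dist2 / (2.0 * sigma ** 2))

def som_step(weights, s, eta=0.1, sigma=1.0):
    """One Hebbian update: pull the winner n_0 and its neighbors toward s."""
    d = np.sum((weights - s) ** 2, axis=-1)        # d_SOM for every neuron
    c0 = np.unravel_index(np.argmin(d), d.shape)   # winner n_0
    for c in np.ndindex(d.shape):
        # Each neuron moves toward s, weighted by its neighborhood to n_0;
        # this increases the map's response to the current signal.
        weights[c] += eta * nhd(c0, c, sigma) * (s - weights[c])
    return c0
```

Shrinking sigma and eta over training, as is usual for SOMs, first orders the map globally and then fine-tunes individual neurons; the neighborhood cooperation is what yields the topologically faithful mapping mentioned above.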