Unsupervised Recursive Sequence Processing - Institute of ...
Unsupervised Recursive Sequence Processing - Institute of ...
Unsupervised Recursive Sequence Processing - Institute of ...
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
In a trained map, neurons spread in regions <strong>of</strong> the data space where a high sample<br />
density can be observed, resulting in large U-values at borders between clusters.<br />
Consequently, the U-Matrix forms a 3D landscape on the lattice <strong>of</strong> neurons with<br />
valleys corresponding to meaningful clusters and hills at the cluster borders. The<br />
U-Matrix <strong>of</strong> weight vectors can be constructed also for SOM-S. Based on this matrix,<br />
the sequence entries can be clustered into meaningful categories, based on<br />
which the extraction <strong>of</strong> Markov models as described above is possible. Note that<br />
the U-Matrix is built by using the weights assigned to the neurons only, while the<br />
context information <strong>of</strong> SOM-S is yet ignored. 6 However, since context information<br />
is used for training, clusters emerge which are meaningful with respect to the<br />
temporal structure, and this way they contribute implicitly to the topological ordering<br />
<strong>of</strong> the map and to the U-Matrix. Partially overlapping, noisy, and ambiguous<br />
input elements are separated during the training, because the different temporal<br />
contexts contain enough information to activate and produce characteristic clusters<br />
on the map. Thus, the temporal structure captured by the training allows a reliable<br />
reconstruction <strong>of</strong> the input sequences, which could not have been achieved by the<br />
standard SOM architecture.<br />
5 Experiments<br />
5.1 Mackey-Glass time series<br />
The first task is to learn the dynamic <strong>of</strong> the real-valued chaotic Mackey-Glass time<br />
series dx = bx(τ) + ax(τ−d) using a = 0.2, b = −0.1, d = 17. This is the same<br />
dτ 1+x(τ−d) 10<br />
setup as given in [41] making a comparison <strong>of</strong> the results possible. 7 Three types<br />
<strong>of</strong> maps with 100 neurons have been trained: a 6-neighbor map without context<br />
giving standard SOM, a map with 6 neighbors and with context (SOM-S), and<br />
a 7-neighbor map providing a hyperbolic grid with context utilization (H-SOM-<br />
S). Each run has been computed with 1.5 · 10 5 presentations starting at random<br />
positions within the Mackey-Glass series using a sample period <strong>of</strong> ∆t = 3; the<br />
neuron weights have been initialized white within [0.6, 1.4]. The context has been<br />
considered by decreasing the parameter from η = 1 to η = 0.97. The learning rate<br />
is exponentially decreased from 0.1 to 0.005 for weight and context update. Initial<br />
neighborhood cooperativity is 10 which is annealed to 1 during training.<br />
Figure 2 shows the temporal quantization error for the above setups: the temporal<br />
quantization error is expressed by the average standard deviation <strong>of</strong> the given sequence<br />
and the mean unit receptive field for 29 time steps into the past. Similar<br />
6 Preliminary experiments indicate that the context also orders topologically and yields<br />
meaningful clusters. The number <strong>of</strong> neurons in context clusters is thereby small compared<br />
to the number <strong>of</strong> neurons and statistically significant results could not be obtained.<br />
7 We would like to thank T.Voegtlin for providing data for comparison.<br />
18