Chapter 2. Prehension


392 Appendices

where η is a small positive number defining the traveler's speed.

Gradient descent is a very general technique: first, define some quantity to minimize (a dependent variable) in terms of controllable quantities (the independent variables). Then, take the gradient and use it, as shown above, to ‘tweak’ the independent variables, moving ‘downhill’. The process is repeated until a minimum value is reached, at which time the gradient will be zero.
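This downhill-tweaking loop can be sketched in a few lines. The quadratic E(x) = (x - 3)² and the step size used here are invented for the example, not taken from the text:

```python
# Minimal gradient-descent sketch (an illustration, not the book's code):
# minimize E(x) = (x - 3)^2 by repeatedly stepping against the gradient.
# eta is the small positive step size (the 'traveler's speed' above).

def gradient_descent(grad, x0, eta=0.1, steps=200):
    """Tweak x downhill along -grad(x) until (near) a minimum."""
    x = x0
    for _ in range(steps):
        x = x - eta * grad(x)  # move 'downhill'
    return x

# E(x) = (x - 3)^2 gives dE/dx = 2(x - 3); the minimum is at x = 3,
# where the gradient is zero.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

At the minimum the gradient vanishes, so further updates leave x essentially unchanged.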

In a neural network, the independent variables are the synaptic weights. The dependent value to be minimized is taken as some measure of the difference between the actual and desired performance of the network. Thus, gradient descent is used to decrease the performance error. For example, imagine a network with a single output neuron (neuron i), which is being trained with a pattern p, which is a collection of input/output pairs. Let the desired output value be t_pi, and the actual value o_pi. Define the error to be

E = 1/2 (t_pi - o_pi)²

Then, for all weights w_ij which feed into unit i,

∂E/∂w_ij = -(t_pi - o_pi) ∂o_pi/∂w_ij

Recalling the neuron definition equations,

a_pi = Σ_j w_ij o_pj,   o_pi = f(a_pi)

then, by the chain rule for differentiation,

∂o_pi/∂w_ij = f'(a_pi) o_pj

Now, by the gradient descent definition,

Δw_ij = -η ∂E/∂w_ij = η (t_pi - o_pi) f'(a_pi) o_pj

For a linear neuron f'(a_pi) = 1 for all a_pi, so

Δw_ij = η (t_pi - o_pi) o_pj   (12)

which is the delta rule. (The complete derivation is shown in Rumelhart et al., 1986a.)
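The delta rule above can be sketched as a training loop for a single linear neuron. The patterns and the target function (outputs consistent with weights 2 and 1) are invented for illustration:

```python
# Hedged sketch of delta-rule training for one linear output neuron,
# applying w_j += eta * (t_p - o_p) * o_pj after each pattern.
# The patterns (consistent with weights 2 and 1) are invented examples.

def train_delta_rule(patterns, n_inputs, eta=0.05, epochs=500):
    """patterns: list of (inputs, target) pairs; returns learned weights."""
    w = [0.0] * n_inputs
    for _ in range(epochs):
        for inputs, target in patterns:
            o = sum(wj * xj for wj, xj in zip(w, inputs))  # linear neuron: o_p = a_p
            delta = eta * (target - o)                     # eta * (t_p - o_p)
            for j in range(n_inputs):
                w[j] += delta * inputs[j]                  # delta rule update
    return w

patterns = [((1.0, 0.0), 2.0), ((0.0, 1.0), 1.0), ((1.0, 1.0), 3.0)]
w = train_delta_rule(patterns, n_inputs=2)
```

Because the neuron is linear and the patterns are consistent, repeated updates drive the error toward zero and the weights toward the generating values.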

For networks with hidden layers, the generalized delta rule is used (see Figure C.6), which states:
