Information Theory, Inference, and Learning Algorithms - Inference Group

39 — The Single Neuron as a Classifier

Algorithm 39.5. Octave source code for a gradient descent optimizer of a single neuron, batch learning, with optional weight decay (rate alpha). Octave notation: the instruction a = x * w causes the (N × I) matrix x consisting of all the input vectors to be multiplied by the weight vector w, giving the vector a listing the activations for all N input vectors; x' means x-transpose; the single command y = sigmoid(a) computes the sigmoid function of all elements of the vector a.

global x ;                          # x is an N * I matrix containing all the input vectors
global t ;                          # t is a vector of length N containing all the targets
for l = 1:L                         # loop L times
  a = x * w ;                       # compute all activations
  y = sigmoid(a) ;                  # compute outputs
  e = t - y ;                       # compute errors
  g = - x' * e ;                    # compute the gradient vector
  w = w - eta * ( g + alpha * w ) ; # make step, using learning rate eta
                                    # and weight decay alpha
endfor

function f = sigmoid ( v )
  f = 1.0 ./ ( 1.0 + exp ( - v ) ) ;
endfunction

Figure 39.6. The influence of weight decay on a single neuron's learning. The objective function is M(w) = G(w) + α E_W(w). The learning method was as in figure 39.4. (a) Evolution of weights w0, w1 and w2. (b) Evolution of weights w1 and w2 in weight space, shown by points, contrasted with the trajectory followed in the case of zero weight decay, shown by a thin line (from figure 39.4). Notice that for this problem weight decay has an effect very similar to 'early stopping'. (c) The objective function M(w) and the error function G(w) as a function of number of iterations. (d) The function performed by the neuron after 40 000 iterations.

[Figure 39.6: three columns of plots, one per weight decay setting α = 0.01, α = 0.1 and α = 1, with rows (a)–(d) as described in the caption; the curves are labelled w0, w1, w2 in (a) and G(w), M(w) in (c).]
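Algorithm 39.5 assumes that the data x and t, the initial weight vector w, and the constants eta, alpha and L already exist in the workspace. The following driver is a minimal sketch of one way to set them up; the toy data set and the parameter values are illustrative assumptions, not values taken from the text.

global x ;                         # share the data with algorithm 39.5
global t ;
N = 100 ; I = 3 ;                  # assumed problem size
x = [ ones(N,1) , 10*rand(N,2) ] ; # first column of ones provides the bias input x0 = 1
t = ( x(:,2) + x(:,3) > 10 ) ;     # assumed targets: a linearly separable toy rule
w = zeros(I,1) ;                   # initial weights
eta = 0.01 ;                       # learning rate (illustrative value)
alpha = 0.01 ;                     # weight decay rate (illustrative value)
L = 10000 ;                        # number of batch iterations
# ... then run the loop of algorithm 39.5, which updates w in place;
# afterwards sigmoid( x * w ) gives the neuron's outputs on the training set.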
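The curves of figure 39.6(c) can be traced by evaluating the two functions as the optimization proceeds. Assuming, as elsewhere in this chapter, that G(w) is the cross-entropy error function and that the regularizer is E_W(w) = (1/2) Σ_i w_i², a sketch of the evaluation is:

y = sigmoid( x * w ) ;             # outputs for all N inputs (assumes 0 < y < 1)
G = - ( t' * log(y) + (1 - t)' * log(1 - y) ) ;  # error function G(w)
EW = 0.5 * ( w' * w ) ;            # weight decay term E_W(w)
M = G + alpha * EW ;               # objective M(w) = G(w) + alpha * E_W(w)

Recording G and M every few iterations inside the loop of algorithm 39.5 reproduces plots like figure 39.6(c); with alpha = 0 the two curves coincide.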
