

39.3: Training the single neuron as a binary classifier

[Figure 39.4 appears here: plot panels (a)-(k), described in the caption below.]

Figure 39.4. A single neuron learning to classify by gradient descent. The neuron has two weights w1 and w2 and a bias w0. The learning rate was set to η = 0.01 and batch-mode gradient descent was performed using the code displayed in algorithm 39.5. (a) The training data. (b) Evolution of weights w0, w1 and w2 as a function of number of iterations (on log scale). (c) Evolution of weights w1 and w2 in weight space. (d) The objective function G(w) as a function of number of iterations. (e) The magnitude of the weights E_W(w) as a function of time. (f-k) The function performed by the neuron (shown by three of its contours) after 30, 80, 500, 3000, 10 000 and 40 000 iterations. The contours shown are those corresponding to a = 0, ±1, namely y = 0.5, 0.27 and 0.73. Also shown is a vector proportional to (w1, w2). The larger the weights are, the bigger this vector becomes, and the closer together are the contours.
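Algorithm 39.5, the code the caption refers to, is not reproduced on this page. As a rough illustration only, the following Python sketch implements batch-mode gradient descent for a single sigmoid neuron with weights (w0, w1, w2) and learning rate η = 0.01, matching the caption's description: the activation is a = w0 + w1 x1 + w2 x2 and the output is y = 1/(1 + e^(-a)), which is why the contours a = 0, ±1 correspond to y = 0.5, 0.27 and 0.73. The function name and array shapes here are illustrative assumptions, not taken from the book.

    import numpy as np

    def sigmoid(a):
        # y = 1 / (1 + exp(-a)); a = +/-1 gives y = 0.73 / 0.27
        return 1.0 / (1.0 + np.exp(-a))

    def train_single_neuron(X, t, eta=0.01, n_iters=40000):
        """Batch-mode gradient descent for a single sigmoid neuron.

        X: (N, 2) inputs (x1, x2); t: (N,) binary targets in {0, 1}.
        Returns w = (w0, w1, w2), where w0 is the bias.
        (Illustrative sketch, not the book's algorithm 39.5.)
        """
        N = X.shape[0]
        Xb = np.hstack([np.ones((N, 1)), X])  # prepend bias input x0 = 1
        w = np.zeros(3)
        for _ in range(n_iters):
            y = sigmoid(Xb @ w)               # a = w0 + w1*x1 + w2*x2
            # Gradient of G(w) = -sum_n [ t ln y + (1 - t) ln(1 - y) ]
            g = Xb.T @ (y - t)
            w -= eta * g                      # one batch update over all N examples
        return w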

