39.2: Basic neural network concepts

[Figure 39.2. Output of a simple neural network as a function of its input: the two horizontal axes are the inputs x_1 and x_2, the vertical axis is the output y, and the weight vector is w = (0, 2).]

Input space and weight space

For convenience let us study the case where the input vector x and the parameter vector w are both two-dimensional: x = (x_1, x_2), w = (w_1, w_2). Then we can spell out the function performed by the neuron thus:

\[
y(\mathbf{x}; \mathbf{w}) = \frac{1}{1 + e^{-(w_1 x_1 + w_2 x_2)}} .
\tag{39.10}
\]

Figure 39.2 shows the output of the neuron as a function of the input vector, for w = (0, 2). The two horizontal axes of this figure are the inputs x_1 and x_2, with the output y on the vertical axis. Notice that on any line perpendicular to w, the output is constant; and along a line in the direction of w, the output is a sigmoid function.

We now introduce the idea of weight space, that is, the parameter space of the network. In this case, there are two parameters w_1 and w_2, so the weight space is two-dimensional. This weight space is shown in figure 39.3. For a selection of values of the parameter vector w, smaller inset figures show the function of x performed by the network when w is set to those values. Each of these smaller figures is equivalent to figure 39.2. Thus each point in w-space corresponds to a function of x. Notice that the gain of the sigmoid function (the gradient of the ramp) increases as the magnitude of w increases.

Now, the central idea of supervised neural networks is this. Given examples of a relationship between an input vector x and a target t, we hope to make the neural network 'learn' a model of the relationship between x and t. A successfully trained network will, for any given x, give an output y that is close (in some sense) to the target value t. Training the network involves searching in the weight space of the network for a value of w that produces a function that fits the provided training data well.

Typically an objective function or error function is defined, as a function of w, to measure how well the network with weights set to w solves the task. The objective function is a sum of terms, one for each input/target pair {x, t}, measuring how close the output y(x; w) is to the target t. The training process is an exercise in function minimization – i.e., adjusting w in such a way as to find a w that minimizes the objective function. Many function-minimization algorithms make use not only of the objective function, but also its gradient with respect to the parameters w. For general feedforward neural networks the backpropagation algorithm efficiently evaluates the gradient of the output y with respect to the parameters w, and thence the gradient of the objective function with respect to w.
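A short numerical check makes the picture in figure 39.2 concrete. The sketch below (Python with NumPy; the helper name neuron_output is chosen here for illustration and does not come from the book) evaluates equation (39.10) for w = (0, 2) and confirms that the output is constant along a line perpendicular to w and sigmoidal along the direction of w.

```python
import numpy as np

def neuron_output(x, w):
    """Single-neuron output of equation (39.10): y = 1 / (1 + exp(-(w1*x1 + w2*x2)))."""
    return 1.0 / (1.0 + np.exp(-np.dot(x, w)))

w = np.array([0.0, 2.0])                         # the weight vector used in figure 39.2

# Along a line perpendicular to w (varying x_1 with x_2 fixed) the output is constant...
print(neuron_output(np.array([-5.0, 1.0]), w))   # ~0.881
print(neuron_output(np.array([ 5.0, 1.0]), w))   # ~0.881

# ...while moving along the direction of w traces out a sigmoid.
for x2 in (-2.0, 0.0, 2.0):
    print(neuron_output(np.array([0.0, x2]), w)) # ~0.018, 0.5, ~0.982
```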
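The training procedure just described can be sketched in the same style. The sum-of-squared-errors objective and plain gradient descent below are illustrative assumptions, not necessarily the objective or optimizer used later in the chapter, and the names objective, gradient and train are hypothetical; the point is simply that training amounts to a gradient-guided search in weight space.

```python
import numpy as np

def neuron_output(X, w):
    """Sigmoid neuron of equation (39.10), applied row-wise to an N x 2 input matrix X."""
    return 1.0 / (1.0 + np.exp(-X @ w))

def objective(w, X, t):
    """Illustrative sum-of-squared-errors objective over the input/target pairs {x, t}."""
    return np.sum((neuron_output(X, w) - t) ** 2)

def gradient(w, X, t):
    """Gradient of the squared-error objective, using dy/da = y(1 - y) for the sigmoid."""
    y = neuron_output(X, w)
    return X.T @ (2.0 * (y - t) * y * (1.0 - y))

def train(X, t, eta=0.5, steps=2000):
    """Plain gradient descent: a search in weight space for a w that fits the data."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w = w - eta * gradient(w, X, t)
    return w

# Toy data: the targets depend only on x_2, so the learned w should end up pointing
# roughly along the x_2 axis, illustrating that each point in weight space is a function of x.
X = np.array([[1.0, -2.0], [-1.0, -1.0], [2.0, 1.0], [-2.0, 2.0]])
t = np.array([0.0, 0.0, 1.0, 1.0])
w = train(X, t)
print(w, objective(w, X, t))
```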
