

TANGENTPROP considers the squared error between the specified training derivative and the actual derivative of the learned neural network. The modified error function is

E = \sum_i \Bigg[ \big( f(x_i) - \hat{f}(x_i) \big)^2 + \mu \sum_j \bigg( \frac{\partial f(s_j(\alpha, x_i))}{\partial \alpha} \bigg|_{\alpha=0} - \frac{\partial \hat{f}(s_j(\alpha, x_i))}{\partial \alpha} \bigg|_{\alpha=0} \bigg)^2 \Bigg]

where μ is a constant provided by the user to determine the relative importance of fitting training values versus fitting training derivatives. Notice the first term in this definition of E is the original squared error of the network versus the training values, and the second term is the squared error of the network versus the training derivatives.
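For concreteness, here is a minimal sketch of this error function in JAX. Everything in it is an illustrative assumption rather than part of the original text: the network is taken to be a function net(params, x) with a single real output, tangents[i, j] approximates the derivative of the jth transformation, ∂s_j(α, x_i)/∂α at α = 0, and mu plays the role of the constant μ above.

```python
import jax
import jax.numpy as jnp

def tangent_prop_error(params, net, xs, ys, dys, tangents, mu):
    """Extended error E: squared error on training values plus
    mu times squared error on training derivatives.

    xs:       (n, d)    training inputs x_i
    ys:       (n,)      training values f(x_i)
    dys:      (n, k)    specified training derivatives, one per transformation
    tangents: (n, k, d) tangent vectors d s_j(alpha, x_i)/d alpha at alpha = 0
    """
    f = lambda x: net(params, x)  # the learned network, fhat

    def per_example(x, y, dy, ts):
        y_hat = f(x)
        # jax.jvp returns (fhat(x), directional derivative of fhat at x
        # along t), i.e. d fhat(s_j(alpha, x)) / d alpha at alpha = 0.
        d_hat = jax.vmap(lambda t: jax.jvp(f, (x,), (t,))[1])(ts)
        return (y - y_hat) ** 2 + mu * jnp.sum((dy - d_hat) ** 2)

    return jnp.sum(jax.vmap(per_example)(xs, ys, dys, tangents))
```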

Simard et al. (1992) give the gradient descent rule for minimizing this extended error function E. It can be derived in a fashion analogous to the derivation given in Chapter 4 for the simpler BACKPROPAGATION rule.
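Continuing the sketch above, a single gradient-descent step on E can be written by differentiating the error with jax.grad, in place of the hand-derived rule of Simard et al. (1992); since both descend the same E, they compute the same gradient. The learning rate lr is an assumed hyperparameter, not from the original text.

```python
def tangent_prop_step(params, net, xs, ys, dys, tangents, mu, lr):
    # Gradient of the extended error E with respect to the weights,
    # obtained here by autodiff rather than the explicit update rule.
    grads = jax.grad(tangent_prop_error)(params, net, xs, ys, dys, tangents, mu)
    # Standard gradient descent: move each weight against its gradient.
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
```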

12.4.2 An Illustrative Example

Simard et al. (1992) present results comparing the generalization accuracy of TANGENTPROP and purely inductive BACKPROPAGATION for the problem of recognizing handwritten characters. More specifically, the task in this case is to label images containing a single digit between 0 and 9. In one experiment, both TANGENTPROP and BACKPROPAGATION were trained using training sets of varying size, then evaluated based on their performance over a separate test set of 160 examples. The prior knowledge given to TANGENTPROP was the fact that the classification of the digit is invariant to vertical and horizontal translation of the image (i.e., that the derivative of the target function was 0 with respect to these transformations).
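As an illustration of how such prior knowledge might be supplied, the sketch below builds translation tangent vectors for an image by centered finite differences; the corresponding specified training derivatives are simply zero, since translation does not change the digit's identity. The helper name and array shapes are assumptions for illustration, continuing the JAX sketch above.

```python
def translation_tangents(image):
    """image: (H, W) array; returns a (2, H, W) array holding the
    tangents of the image under horizontal and vertical shift."""
    # Centered finite differences approximate d image / d alpha for an
    # infinitesimal translation along each axis (alpha = shift in pixels).
    dx = (jnp.roll(image, -1, axis=1) - jnp.roll(image, 1, axis=1)) / 2.0
    dy = (jnp.roll(image, -1, axis=0) - jnp.roll(image, 1, axis=0)) / 2.0
    return jnp.stack([dx, dy])

# The matching specified training derivatives for these two tangents
# are zeros: the target classification is invariant to translation.
# dys_i = jnp.zeros(2)
```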

The results, shown in Table 12.4, demonstrate the ability of TANGENTPROP using this prior knowledge to generalize more accurately than purely inductive BACKPROPAGATION.

[Table 12.4 reports percent error on the test set for TANGENTPROP and BACKPROPAGATION at training set sizes of 10, 20, 40, 80, 160, and 320 examples; the individual error values are not preserved in this extraction.]

TABLE 12.4
Generalization accuracy for TANGENTPROP and BACKPROPAGATION, for handwritten digit recognition. TANGENTPROP generalizes more accurately due to its prior knowledge that the identity of the digit is invariant to translation. These results are from Simard et al. (1992).
