
12.4.3 Remarks

To summarize, TANGENTPROP uses prior knowledge in the form of desired derivatives of the target function with respect to transformations of its inputs. It combines this prior knowledge with observed training data, by minimizing an objective function that measures both the network's error with respect to the training example values (fitting the data) and its error with respect to the desired derivatives (fitting the prior knowledge). The value of p determines the degree to which the network will fit one or the other of these two components in the total error. The behavior of the algorithm is sensitive to p, which must be chosen by the designer.
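
To make this objective concrete, the following is a minimal Python sketch of a TANGENTPROP-style error measure for a single linear unit. The translation transformation s, the finite-difference estimate of the derivative, and the name mu (standing in for the constant p above) are illustrative assumptions for this sketch, not the algorithm as published.

    import numpy as np

    def s(alpha, x):
        # Illustrative transformation: translate the 1-D "image" x by alpha
        # positions, using linear interpolation so alpha can vary continuously.
        grid = np.arange(len(x), dtype=float)
        return np.interp(grid - alpha, grid, x)

    def f_hat(w, x):
        # Stand-in for the network: a single linear unit.
        return w @ x

    def d_dalpha(g, x, eps=1e-4):
        # Finite-difference estimate of d g(s(alpha, x)) / d alpha at alpha = 0.
        return (g(s(eps, x)) - g(s(-eps, x))) / (2 * eps)

    def tangentprop_loss(w, X, y, target_derivs, mu):
        # Error on the training values plus mu times the error on the
        # desired (prior-knowledge) derivatives.
        value_err = sum((f_hat(w, x) - t) ** 2 for x, t in zip(X, y))
        deriv_err = sum((d_dalpha(lambda z: f_hat(w, z), x) - d) ** 2
                        for x, d in zip(X, target_derivs))
        return value_err + mu * deriv_err

Performing gradient descent on this combined quantity, rather than on the value error alone, is what distinguishes the TANGENTPROP weight updates from those of BACKPROPAGATION; setting the desired derivatives to zero, for instance, encodes the prior knowledge that the output should be invariant to small translations of the input.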

Although TANGENTPROP succeeds in combining prior knowledge with training data to guide learning of neural networks, it is not robust to errors in the prior knowledge. Consider what will happen when prior knowledge is incorrect, that is, when the training derivatives input to the learner do not correctly reflect the derivatives of the true target function. In this case the algorithm will attempt to fit incorrect derivatives. It may therefore generalize less accurately than if it ignored this prior knowledge altogether and used the purely inductive BACKPROPAGATION algorithm. If we knew in advance the degree of error in the training derivatives, we might use this information to select the constant p that determines the relative importance of fitting training values and fitting training derivatives. However, this information is unlikely to be known in advance. In the next section we discuss the EBNN algorithm, which automatically selects values for p on an example-by-example basis in order to address the possibility of incorrect prior knowledge.

It is interesting to compare the search through hypothesis space (weight space) performed by TANGENTPROP, KBANN, and BACKPROPAGATION. TANGENTPROP incorporates prior knowledge to influence the hypothesis search by altering the objective function to be minimized by gradient descent. This corresponds to altering the goal of the hypothesis space search, as illustrated in Figure 12.6. Like BACKPROPAGATION (but unlike KBANN), TANGENTPROP begins the search with an initial network of small random weights. However, the gradient descent training rule produces different weight updates than BACKPROPAGATION, resulting in a different final hypothesis. As shown in the figure, the set of hypotheses that minimizes the TANGENTPROP objective may differ from the set that minimizes the BACKPROPAGATION objective. Importantly, if the training examples and prior knowledge are both correct, and the target function can be accurately represented by the ANN, then the set of weight vectors that satisfy the TANGENTPROP objective will be a subset of those satisfying the weaker BACKPROPAGATION objective. The difference between these two sets of final hypotheses is the set of incorrect hypotheses that will be considered by BACKPROPAGATION, but ruled out by TANGENTPROP due to its prior knowledge.

Note that one alternative to fitting the training derivatives of the target function is to simply synthesize additional training examples near the observed training examples, using the known training derivatives to estimate training values for these nearby instances. For example, one could take a training image in the above character recognition task, translate it a small amount, and assert that the translated image belongs to the same character class as the original.
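
As a rough illustration of this alternative, the sketch below synthesizes translated copies of each training example and reuses the original labels. The 1-D signals, the translate helper, and the shift amounts are assumptions made for the example, not part of the algorithm described in the text.

    import numpy as np

    def translate(x, alpha):
        # Shift the 1-D signal x by alpha positions (linear interpolation).
        grid = np.arange(len(x), dtype=float)
        return np.interp(grid - alpha, grid, x)

    def augment(X, y, shifts=(-1.0, 1.0)):
        # Return the original data plus slightly translated copies carrying the
        # same labels, on the assumption that a small translation leaves the
        # character class unchanged.
        X_aug, y_aug = list(X), list(y)
        for x, label in zip(X, y):
            for a in shifts:
                X_aug.append(translate(x, a))
                y_aug.append(label)
        return np.array(X_aug), np.array(y_aug)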
