

hold when x = xi. In the more general case where the target function has multiple output units, the gradient is computed for each of these outputs. This matrix of gradients is called the Jacobian of the target function.
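To make this Jacobian concrete, here is a minimal sketch, not from the text, that computes the matrix of output-versus-input derivatives for a small one-hidden-layer sigmoid network; the architecture and the names sigmoid, jacobian, W1, and W2 are illustrative assumptions.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def jacobian(x, W1, W2):
    # Jacobian of a one-hidden-layer sigmoid network's outputs with
    # respect to its inputs, evaluated at the point x.
    # x: (n_inputs,), W1: (n_hidden, n_inputs), W2: (n_outputs, n_hidden).
    # Returns an (n_outputs, n_inputs) matrix whose (k, j) entry is the
    # derivative of output unit k with respect to input component j.
    h = sigmoid(W1 @ x)                       # hidden activations
    o = sigmoid(W2 @ h)                       # output activations
    dh_dx = (h * (1.0 - h))[:, None] * W1     # (n_hidden, n_inputs)
    do_dh = (o * (1.0 - o))[:, None] * W2     # (n_outputs, n_hidden)
    return do_dh @ dh_dx                      # chain rule

rng = np.random.default_rng(0)
J = jacobian(rng.normal(size=3),              # 3 input features
             rng.normal(size=(4, 3)),         # 4 hidden units
             rng.normal(size=(2, 4)))         # 2 output units
print(J.shape)                                # (2, 3): one gradient row per output unit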

To see the importance of these training derivatives in helping to learn the target network, consider the derivative ∂Cup/∂Expensive. If the domain theory encodes the knowledge that the feature Expensive is irrelevant to the target function Cup, then the derivative ∂Cup/∂Expensive extracted from the explanation will have the value zero. A derivative of zero corresponds to the assertion that a change in the feature Expensive will have no impact on the predicted value of Cup. On the other hand, a large positive or negative derivative corresponds to the assertion that the feature is highly relevant to determining the target value. Thus, the derivatives extracted from the domain theory explanation provide important information for distinguishing relevant from irrelevant features. When these extracted derivatives are provided as training derivatives to TANGENTPROP for learning the target network Cup_target, they provide a useful bias for guiding generalization. The usual syntactic inductive bias of neural network learning is replaced in this case by the bias exerted by the derivatives obtained from the domain theory.
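As a hedged illustration of this argument, suppose a hypothetical "domain theory" network gives the feature Expensive zero outgoing weights; reusing the jacobian sketch above, the derivative extracted for Expensive is exactly zero, flagging it as irrelevant. The weight values and feature names here are invented for illustration, not taken from the text.

# Hypothetical domain theory weights: the third input (Expensive) has
# zero outgoing weights, so dCup/dExpensive = 0 at every instance.
W1_dt = np.array([[ 2.0,  1.5, 0.0],
                  [-1.0,  2.0, 0.0]])
W2_dt = np.array([[ 1.2, -0.8]])

x_i = np.array([1.0, 1.0, 0.0])               # one training instance
dCup_dx = jacobian(x_i, W1_dt, W2_dt)[0]      # single Cup output unit

for name, d in zip(["BottomIsFlat", "HasHandle", "Expensive"], dCup_dx):
    tag = "irrelevant (derivative = 0)" if abs(d) < 1e-9 else "relevant"
    print(f"dCup/d{name} = {d:+.4f}  ->  {tag}")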

Above we described how the domain theory prediction can be used to generate a set of training derivatives. To be more precise, the full EBNN algorithm is as follows. Given the training examples and domain theory, EBNN first creates a new, fully connected feedforward network to represent the target function. This target network is initialized with small random weights, just as in BACKPROPAGATION. Next, for each training example (xi, f(xi)) EBNN determines the corresponding training derivatives in a two-step process. First, it uses the domain theory to predict the value of the target function for instance xi. Let A(xi) denote this domain theory prediction for instance xi. In other words, A(xi) is the function defined by the composition of the domain theory networks forming the explanation for xi. Second, the weights and activations of the domain theory networks are analyzed to extract the derivatives of A(xi) with respect to each of the components of xi (i.e., the Jacobian of A(x) evaluated at x = xi). Extracting these derivatives follows a process very similar to calculating the δ terms in the BACKPROPAGATION algorithm (see Exercise 12.5). Finally, EBNN uses a minor variant of the TANGENTPROP algorithm to train the target network to fit the following error function

$$E = \sum_i \left[ \left( f(x_i) - \hat{f}(x_i) \right)^2 + \mu_i \sum_j \left( \frac{\partial A(x)}{\partial x^j} \bigg|_{x = x_i} - \frac{\partial \hat{f}(x)}{\partial x^j} \bigg|_{x = x_i} \right)^2 \right]$$

where

$$\mu_i \equiv 1 - \frac{|A(x_i) - f(x_i)|}{c}$$

Here xi denotes the ith training instance and A(x) denotes the domain theory prediction for input x. The superscript notation xj denotes the jth component of the vector x (i.e., the jth input node of the neural network). The coefficient c is a normalizing constant whose value is chosen to assure that for all i, 0 ≤ μi ≤ 1.
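The following is a minimal sketch of how this error function could be evaluated, assuming the derivatives have already been extracted; the name ebnn_loss and the array layout are our assumptions for illustration, not part of EBNN as published.

import numpy as np

def ebnn_loss(f_true, f_hat, dA_dx, dfhat_dx, A_pred, c):
    # Sketch of the error function above for m examples with n input components.
    # f_true, f_hat, A_pred: shape (m,) -- observed target values, target
    #   network outputs, and domain theory predictions A(xi).
    # dA_dx, dfhat_dx: shape (m, n) -- derivatives with respect to each input
    #   component xj, for the domain theory and the target network.
    # c is assumed to satisfy c >= max_i |A(xi) - f(xi)|, so 0 <= mu_i <= 1.
    mu = 1.0 - np.abs(A_pred - f_true) / c
    value_err = (f_true - f_hat) ** 2
    slope_err = mu * np.sum((dA_dx - dfhat_dx) ** 2, axis=1)
    return float(np.sum(value_err + slope_err))

Note that μi discounts the derivative-matching term for examples on which the domain theory prediction A(xi) differs greatly from the observed value f(xi), so poorly explained examples rely more on the ordinary squared output error.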
