Machine Learning - DISCo

each hypothesis h(x) is of the form

(a) Derive a gradient descent rule that minimizes the same criterion as BACKPROPAGATION; that is, the sum of squared errors between the hypothesis and target values of the training data.

(b) Derive a second gradient descent rule that minimizes the same criterion as TANGENTPROP. Consider only the single transformation s(α, x) = x + α.
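
The hypothesis form is cut off above, so as a minimal sketch (not the book's solution) the following assumes a simple linear hypothesis h(x) = w0 + w1·x; the names train, xs, targets, target_derivs and the constants mu and eta are illustrative, not from the text. For the transformation s(α, x) = x + α, the tangent vector ∂s/∂α at α = 0 is 1, so the derivative of the hypothesis along the transformation reduces to w1.

```python
# Sketch for this exercise, assuming a linear hypothesis h(x) = w0 + w1*x
# (the actual hypothesis form is cut off in the text above).
# (a) minimizes E = sum_i (f_i - h(x_i))^2, as in BACKPROPAGATION.
# (b) adds a TANGENTPROP-style penalty for s(alpha, x) = x + alpha: since
#     ds/dalpha|_0 = 1, we get dh(s(alpha, x))/dalpha|_0 = w1, and the
#     penalty compares w1 to an asserted training derivative d_i.

def train(xs, targets, target_derivs, mu=0.1, eta=0.01, epochs=1000):
    """xs, targets: training pairs (x_i, f(x_i)); target_derivs: asserted
    df/dx values; mu, eta: illustrative penalty weight and learning rate."""
    w0, w1 = 0.0, 0.0
    for _ in range(epochs):
        # dE/dw0 and dE/dw1 for the squared-error criterion of part (a)
        g0 = sum(-2.0 * (f - (w0 + w1 * x)) for x, f in zip(xs, targets))
        g1 = sum(-2.0 * (f - (w0 + w1 * x)) * x for x, f in zip(xs, targets))
        # extra gradient contribution from the tangent penalty of part (b)
        g1 += sum(-2.0 * mu * (d - w1) for d in target_derivs)
        w0 -= eta * g0
        w1 -= eta * g1
    return w0, w1
```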

12.5. EBNN extracts training derivatives from explanations by examining the weights and activations of the neural networks that make up the explanation. Consider the simple example in which the explanation is formed by a single sigmoid unit with n inputs. Derive a procedure for extracting the derivative ∂f(x)/∂x^j evaluated at x = x_i, where x_i is a particular training instance input to the unit, f(x) is the sigmoid unit output, and x^j denotes the jth input to the sigmoid unit. You may wish to use the notation x_i^j to refer to the jth component of x_i. Hint: The derivation is similar to the derivation of the BACKPROPAGATION training rule.
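
As a sketch of where the derivation lands (assuming the unit computes f(x) = σ(w · x) with no bias term, which the exercise leaves implicit), the chain rule together with the sigmoid identity σ′(y) = σ(y)(1 − σ(y)) expresses the requested derivative using only the unit's weights and its activation:

```latex
% Sketch assuming a single sigmoid unit f(x) = \sigma(w \cdot x),
% with \sigma(y) = 1/(1 + e^{-y}); no bias term shown, for brevity.
\[
  \left.\frac{\partial f(x)}{\partial x^j}\right|_{x = x_i}
  = \sigma'\!\bigl(w \cdot x_i\bigr)\, w_j
  = f(x_i)\,\bigl(1 - f(x_i)\bigr)\, w_j
\]
```

This is exactly the kind of quantity EBNN can read off a trained explanation network: the activation f(x_i) is available from the forward pass, and w_j is a stored weight.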

12.6. Consider again the search trace of FOCL shown in Figure 12.8. Suppose that the hypothesis selected at the first level in the search is changed to

Cup ← ¬HasHandle

Describe the second-level candidate hypotheses that will be generated by FOCL as successors to this hypothesis. You need only include those hypotheses generated by FOCL's second search operator, which uses its domain theory. Don't forget to post-prune the sufficient conditions. Use the training data from Table 12.3.
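
Figure 12.8 and Table 12.3 are not reproduced here, so the sketch below only illustrates the mechanics of that second operator on a hypothetical stand-in domain theory and training set: unfold the target concept through the domain theory into its operational sufficient conditions, then greedily post-prune literals whose removal does not reduce training-set accuracy.

```python
# Hypothetical mini domain theory standing in for Figure 12.8's; one clause
# per head, bodies are conjunctions of literals.
DOMAIN_THEORY = {
    "Cup": ["Stable", "Liftable"],
    "Stable": ["BottomIsFlat"],
    "Liftable": ["Graspable", "Light"],
}
OPERATIONAL = {"BottomIsFlat", "Graspable", "Light"}

def unfold(literal):
    """Expand a literal into operational literals via the domain theory."""
    if literal in OPERATIONAL:
        return [literal]
    return [op for lit in DOMAIN_THEORY[literal] for op in unfold(lit)]

def accuracy(conds, examples):
    """Fraction of examples the conjunction of conds classifies correctly."""
    return sum(all(ex[c] for c in conds) == label
               for ex, label in examples) / len(examples)

def post_prune(conds, examples):
    """Greedily drop literals whose removal does not lower accuracy."""
    conds = list(conds)
    for c in list(conds):
        rest = [x for x in conds if x != c]
        if rest and accuracy(rest, examples) >= accuracy(conds, examples):
            conds = rest
    return conds

# Hypothetical training data in place of Table 12.3.
examples = [
    ({"BottomIsFlat": True, "Graspable": True, "Light": True}, True),
    ({"BottomIsFlat": True, "Graspable": False, "Light": True}, False),
    ({"BottomIsFlat": False, "Graspable": True, "Light": True}, False),
]
sufficient = unfold("Cup")               # operational sufficient conditions
print(post_prune(sufficient, examples))  # pruned literals to add as successors
```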

12.7. This chapter discussed three approaches to using prior knowledge to impact the search through the space of possible hypotheses. Discuss your ideas for how these three approaches could be integrated. Can you propose a specific algorithm that integrates at least two of these three for some specific hypothesis representation? What advantages and disadvantages would you anticipate from this integration?

12.8. Consider again the question from Section 12.2.1, regarding what criterion to use for choosing among hypotheses when both data and prior knowledge are available. Give your own viewpoint on this issue.

