Machine Learning - DISCo

simple functions such as XOR could not be represented or learned with single-layer perceptron networks, and work on ANNs receded during the 1970s.

During the mid-1980s work on ANNs experienced a resurgence, caused in large part by the invention of BACKPROPAGATION and related algorithms for training multilayer networks (Rumelhart and McClelland 1986; Parker 1985). These ideas can be traced to related earlier work (e.g., Werbos 1975). Since the 1980s, BACKPROPAGATION has become a widely used learning method, and many other ANN approaches have been actively explored. The advent of inexpensive computers during this same period has allowed experimenting with computationally intensive algorithms that could not be thoroughly explored during the 1960s.

A number of textbooks are devoted to the topic of neural network learning. An early but still useful book on parameter learning methods for pattern recognition is Duda and Hart (1973). The text by Widrow and Stearns (1985) covers perceptrons and related single-layer networks and their applications. Rumelhart and McClelland (1986) produced an edited collection of papers that helped generate the increased interest in these methods beginning in the mid-1980s. Recent books on neural network learning include Bishop (1996); Chauvin and Rumelhart (1995); Freeman and Skapura (1991); Fu (1994); Hecht-Nielsen (1990); and Hertz et al. (1991).

EXERCISES

4.1. What are the values of weights w0, w1, and w2 for the perceptron whose decision surface is illustrated in Figure 4.3? Assume the surface crosses the x1 axis at -1, and the x2 axis at 2.

4.2. Design a two-input perceptron that implements the boolean function A ∧ ¬B. Design a two-layer network of perceptrons that implements A XOR B.
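As a starting point for the first part of this exercise, the sketch below implements a two-input threshold perceptron and checks it against the truth table of A ∧ ¬B. The weights w0 = -0.5, w1 = 1, w2 = -1 are one assumed choice that works, not the unique answer.

```python
def perceptron(w0, w1, w2, x1, x2):
    """Threshold unit: outputs 1 when w0 + w1*x1 + w2*x2 > 0, else 0."""
    return 1 if w0 + w1 * x1 + w2 * x2 > 0 else 0

# A AND (NOT B) should fire only on input (A, B) = (1, 0).
# The weights below are an illustrative choice; any weights that place
# (1, 0) alone on the positive side of the decision surface also work.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", perceptron(-0.5, 1, -1, a, b))
```

Since XOR is not linearly separable, the second part of the exercise requires stacking such units in two layers rather than choosing different weights for a single unit.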

4.3. Consider two perceptrons defined by the threshold expression w0 + w1x1 + w2x2 > 0. Perceptron A has weight values

and perceptron B has the weight values

True or false? Perceptron A is more_general_than perceptron B. (more_general_than is defined in Chapter 2.)

4.4. Implement the delta training rule for a two-input linear unit. Train it to fit the target concept -2 + x1 + 2x2 > 0. Plot the error E as a function of the number of training iterations. Plot the decision surface after 5, 10, 50, 100, . . . , iterations.

(a) Try this using various constant values for η and using a decaying learning rate of η0/i for the ith iteration. Which works better?

(b) Try incremental and batch learning. Which converges more quickly? Consider both number of weight updates and total execution time.
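A minimal incremental sketch of the delta training rule this exercise asks for is given below. The training-set generator, learning rate, and epoch count are illustrative assumptions, and the requested error and decision-surface plots are omitted.

```python
import random

def delta_rule(examples, eta=0.05, epochs=100):
    """Incremental delta rule for a two-input linear unit.

    Each example is ((x1, x2), t); returns weights (w0, w1, w2) such that
    the unit's output is o = w0 + w1*x1 + w2*x2.
    """
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for (x1, x2), t in examples:
            o = w[0] + w[1] * x1 + w[2] * x2        # linear unit output
            for i, xi in enumerate((1.0, x1, x2)):
                w[i] += eta * (t - o) * xi          # delta rule update
    return w

# Assumed setup: random points labeled +1 / -1 by the target concept
# -2 + x1 + 2x2 > 0 (the ranges and sample size are arbitrary choices).
random.seed(0)
data = []
for _ in range(200):
    x1, x2 = random.uniform(-4, 4), random.uniform(-4, 4)
    t = 1.0 if -2 + x1 + 2 * x2 > 0 else -1.0
    data.append(((x1, x2), t))

w = delta_rule(data)
print("learned weights:", w)
```

For part (a), replace the constant `eta` with `eta0 / i` inside the iteration loop; for part (b), accumulate the weight updates over all examples before applying them to obtain the batch variant.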

4.5. Derive a gradient descent training rule for a single unit with output o, where
