which is exactly the multiplication rule that we applied previously.

The two Bayesian networks in Figure 6.20 and Figure 6.21 are fundamentally different. The first (Figure 6.20) makes stronger independence assumptions because for each of its nodes the set of parents is a subset of the corresponding set of parents in the second (Figure 6.21). In fact, Figure 6.20 is almost identical to the simple Naïve Bayes classifier of Section 4.2. (The probabilities are slightly different, but only because each count has been initialized to 0.5 to avoid the zero-frequency problem.) The network in Figure 6.21 has more rows in the conditional probability tables and hence more parameters; it may be a more accurate representation of the underlying domain.

It is tempting to assume that the directed edges in a Bayesian network represent causal effects. But be careful! In our case, a particular value of play may enhance the prospects of a particular value of outlook, but it certainly doesn't cause it; it is more likely to be the other way round. Different Bayesian networks can be constructed for the same problem, representing exactly the same probability distribution. This is done by altering the way in which the joint probability distribution is factorized to exploit conditional independencies. The network whose directed edges model causal effects is often the simplest one with the fewest parameters. Hence, human experts who construct Bayesian networks for a particular domain often benefit by representing causal effects by directed edges. However, when machine learning techniques are applied to induce models from data whose causal structure is unknown, all they can do is construct a network based on the correlations that are observed in the data. Inferring causality from correlation is always a dangerous business.

Learning Bayesian networks

The way to construct a learning algorithm for Bayesian networks is to define two components: a function for evaluating a given network based on the data and a method for searching through the space of possible networks. The quality of a given network is measured by the probability of the data given the network. We calculate the probability that the network accords to each instance and multiply these probabilities together over all instances. In practice, this quickly yields numbers too small to be represented properly (called arithmetic underflow), so we use the sum of the logarithms of the probabilities rather than their product. The resulting quantity is the log-likelihood of the network given the data.

Assume that the structure of the network (the set of edges) is given. It's easy to estimate the numbers in the conditional probability tables: just compute the relative frequencies of the associated combinations of attribute values in the training data. To avoid the zero-frequency problem, each count is initialized with a constant as described in Section 4.2. For example, to find the probability that humidity = normal given that play = yes and temperature = cool (the last number
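To make the parameter-estimation step concrete, here is a minimal sketch in Python (the book's own Weka implementation is in Java) of how the conditional probability table for one node can be computed from relative frequencies, with every count initialized to 0.5 to avoid the zero-frequency problem. The function name, the miniature data set, and the attribute domains are illustrative assumptions, not taken from the book.

```python
from collections import defaultdict
from itertools import product

def estimate_cpt(instances, child, parents, domains, init=0.5):
    """Estimate P(child | parents) from relative frequencies.

    Every count starts at `init` (0.5), so no combination of attribute
    values ever ends up with zero probability."""
    counts = defaultdict(lambda: defaultdict(lambda: init))
    for inst in instances:
        parent_vals = tuple(inst[p] for p in parents)
        counts[parent_vals][inst[child]] += 1.0
    cpt = {}
    for parent_vals in product(*(domains[p] for p in parents)):
        row = {v: counts[parent_vals][v] for v in domains[child]}
        total = sum(row.values())
        cpt[parent_vals] = {v: c / total for v, c in row.items()}
    return cpt

# Hypothetical miniature data set in the spirit of the weather data.
domains = {"play": ["yes", "no"],
           "temperature": ["hot", "mild", "cool"],
           "humidity": ["high", "normal"]}
data = [
    {"play": "yes", "temperature": "cool", "humidity": "normal"},
    {"play": "yes", "temperature": "mild", "humidity": "high"},
    {"play": "no",  "temperature": "hot",  "humidity": "high"},
]
cpt = estimate_cpt(data, "humidity", ["play", "temperature"], domains)
# P(humidity = normal | play = yes, temperature = cool)
# = (1 + 0.5) / (1 + 2 * 0.5) = 0.75 on this toy data
print(cpt[("yes", "cool")]["normal"])
```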
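Likewise, a sketch of the scoring component described above: the log-likelihood of the data given a fixed network structure, obtained by summing the logarithms of the per-instance probabilities instead of multiplying the raw probabilities, which would quickly underflow. The two-node network and its tables are hypothetical; in practice the tables would come from an estimator like the one sketched above.

```python
import math

def log_likelihood(instances, parents, cpts):
    """Log-likelihood of a network given the data.

    `parents` maps each node to its (possibly empty) tuple of parents;
    `cpts` maps each node to its conditional probability table, indexed
    first by the tuple of parent values and then by the node's own value.
    Summing logs instead of multiplying raw probabilities avoids
    arithmetic underflow on data sets of realistic size."""
    ll = 0.0
    for inst in instances:
        for node, pars in parents.items():
            parent_vals = tuple(inst[p] for p in pars)
            ll += math.log(cpts[node][parent_vals][inst[node]])
    return ll

# Hypothetical two-node network: play -> humidity.
parents = {"play": (), "humidity": ("play",)}
cpts = {
    "play": {(): {"yes": 0.6, "no": 0.4}},
    "humidity": {("yes",): {"high": 0.3, "normal": 0.7},
                 ("no",): {"high": 0.8, "normal": 0.2}},
}
data = [{"play": "yes", "humidity": "normal"},
        {"play": "no", "humidity": "high"}]
print(log_likelihood(data, parents, cpts))  # log(0.6 * 0.7) + log(0.4 * 0.8)
```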
