13.07.2015 Views

Data Mining: Practical Machine Learning Tools and ... - LIDeCC

Data Mining: Practical Machine Learning Tools and ... - LIDeCC

Data Mining: Practical Machine Learning Tools and ... - LIDeCC

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

32 CHAPTER 1 | WHAT’S IT ALL ABOUT?eliminate the correct description from the space. The result is that the set ofremaining descriptions becomes empty. This situation is very likely to happenif the examples contain any noise at all, which inevitably they do except inartificial situations.Another way of looking at generalization as search is to imagine it not as aprocess of enumerating descriptions <strong>and</strong> striking out those that don’t apply butas a kind of hill-climbing in description space to find the description that bestmatches the set of examples according to some prespecified matching criterion.This is the way that most practical machine learning methods work. However,except in the most trivial cases, it is impractical to search the whole spaceexhaustively; most practical algorithms involve heuristic search <strong>and</strong> cannotguarantee to find the optimal description.BiasViewing generalization as a search in a space of possible concepts makes it clearthat the most important decisions in a machine learning system are as follows: The concept description language The order in which the space is searched The way that overfitting to the particular training data is avoidedThese three properties are generally referred to as the bias of the search <strong>and</strong> arecalled language bias, search bias, <strong>and</strong> overfitting-avoidance bias. You bias thelearning scheme by choosing a language in which to express concepts, by searchingin a particular way for an acceptable description, <strong>and</strong> by deciding when theconcept has become so complex that it needs to be simplified.Language biasThe most important question for language bias is whether the concept descriptionlanguage is universal or whether it imposes constraints on what concepts canbe learned. If you consider the set of all possible examples, a concept is really justa division of it into subsets. In the weather example, if you were to enumerate allpossible weather conditions, the play concept is a subset of possible weather conditions.A “universal” language is one that is capable of expressing every possiblesubset of examples. In practice, the set of possible examples is generally huge, <strong>and</strong>in this respect our perspective is a theoretical, not a practical, one.If the concept description language permits statements involving logical or,that is, disjunctions, then any subset can be represented. If the description languageis rule based, disjunction can be achieved by using separate rules. Forexample, one possible concept representation is just to enumerate the examples:If outlook = overcast <strong>and</strong> temperature = hot <strong>and</strong> humidity = high<strong>and</strong> windy = false then play = yes

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!