
CS534 Machine Learning
Spring 2013

Lecture 1: Introduction to ML
Course logistics

Reading: The Discipline of Machine Learning by Tom Mitchell


Course Information
• Instructor: Dr. Xiaoli Fern
  KEC 3073, xfern@eecs.oregonstate.edu
• TA: Travis Moore
• Office hours (tentative)
  – Instructor: MW before class, 11-12, or by appointment
  – TA: TBA (see class webpage for updates)
• Class web page: classes.engr.oregonstate.edu/eecs/spring2013/cs534/
• Class email list: cs534-sp13@engr.orst.edu


Course materials
• Textbook:
  – Pattern Recognition and Machine Learning by Chris Bishop (Bishop)
• Slides and reading materials will be provided on the course webpage
• Other good references
  – Machine Learning by Tom Mitchell (TM)
  – Pattern Classification by Duda, Hart and Stork (DHS), 2nd edition
• There are many online resources on machine learning
  – Check the class website for a few links


Prerequisites
(Items shown in green are especially important.)
• Basic probability theory and statistics concepts: distributions, densities, expectation, variance, parameter estimation
  – A brief review is provided on the class website
• Multivariable calculus and linear algebra
  – Basic review slides and links to useful video lectures are provided on the class webpage
• Knowledge of basic CS concepts such as data structures, search strategies, and complexity
Please spend some time reviewing these! It will be tremendously helpful!


Homework Policies
• Homework is generally due at the beginning of class on the due date
• Each student is allowed to hand in one homework late (no more than 48 hours late)
• Collaboration policy
  – Discussions are allowed, but copying of solutions or code is not
  – See the Student Conduct page on the OSU website for information regarding academic dishonesty (http://oregonstate.edu/studentconduct/code/index.php#acdis)


Grading policy
• Written homework will not be graded for correctness. We will record the number of problems that were "completed" (either correctly or incorrectly).
  – Completing a problem requires a non-trivial attempt at solving it. The judgment of whether a problem was "completed" is left to the instructor and the TA.
• Final grade breakdown:
  – Midterm 25%; Final 25%; Final project 25%; Implementation assignments 25%
  – The resulting letter grade will be lowered by one if a student fails to complete at least 80% of the written homework problems


What is Machine Learning?
[Diagram: a Learning Algorithm uses Experience E to improve Performance P at Task T]
Machine learning studies algorithms that
• improve performance P
• at some task T
• based on experience E


Machine Learning in Computer Science
• Machine learning is already the preferred approach to
  – Speech recognition, natural language processing
  – Computer vision
  – Medical outcomes analysis
  – Robot control
  – …
• This trend is growing, driven by
  – Improved machine learning algorithms
  – Increased data capture and new sensors
  – Increasing demand for self-customization to the user and environment


Fields of Study
Machine Learning branches into:
• Supervised Learning
• Semi-supervised Learning
• Unsupervised Learning
• Reinforcement Learning


Supervised Learning
• Learn to predict output from input.
• Output can be
  – Continuous: regression problems
Example: predicting the price of a house based on its square footage.
[Scatter plot: price ($) vs. square footage (feet)]
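
As an illustration of the regression setting, here is a minimal sketch that fits a straight line to made-up (square footage, price) pairs by least squares; the numbers and the use of NumPy are illustrative assumptions, not part of the lecture.

```python
import numpy as np

# Hypothetical (square footage, price in $1000s) training examples, for illustration only.
X = np.array([800, 1200, 1500, 1800, 2200, 2600], dtype=float)
y = np.array([150, 210, 250, 300, 360, 410], dtype=float)

# Fit the line y = w0 + w1*x by least squares.
A = np.vstack([np.ones_like(X), X]).T
w, *_ = np.linalg.lstsq(A, y, rcond=None)

# Predict the price of a previously unseen 2000-square-foot house.
print(w[0] + w[1] * 2000)
```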


Supervised Learning
• Learn to predict output from input.
• Output can be
  – Continuous: regression problems
  – Discrete: classification problems
Example: classify a loan applicant as either high risk or low risk based on income and savings amount.
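
A minimal sketch of the classification setting, using a 1-nearest-neighbour rule on made-up (income, savings) data; both the data and the choice of classifier are illustrative assumptions, not from the slides.

```python
import numpy as np

# Hypothetical training examples: (income, savings) in $1000s; 0 = low risk, 1 = high risk.
X_train = np.array([[80, 40], [60, 30], [30, 2], [25, 5], [90, 60], [20, 1]], dtype=float)
y_train = np.array([0, 0, 1, 1, 0, 1])

def predict(x):
    """Label a new applicant with the class of its nearest training example."""
    distances = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argmin(distances)]

print(predict(np.array([28.0, 3.0])))  # -> 1 (high risk)
```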


Unsupervised Learning
• Given a collection of examples (objects), discover self-similar groups within the data – clustering
Example: clustering artwork


Unsupervised Learning
• Given a collection of examples (objects), discover self-similar groups within the data – clustering
Example: image segmentation


Unsupervised Learning
• Given a collection of examples (objects), discover self-similar groups within the data – clustering
• Learn the underlying distribution that generates the data we observe – density estimation
• Represent high-dimensional data using a low-dimensional representation for compression or visualization – dimension reduction
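
To make the clustering task concrete, here is a minimal k-means sketch; the slides do not prescribe a particular algorithm, so k-means is used here only as one common choice.

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Plain k-means: alternate between assigning points to the nearest
    center and moving each center to the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each example to its closest center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        # Recompute each center (keep the old one if its cluster is empty).
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers
```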


Reinforcement Learning
• Learn to act
• An agent
  – Observes the environment
  – Takes actions
  – With each action, receives rewards/punishments
  – Goal: learn a policy that optimizes rewards
• No examples of optimal outputs are given
• Not covered in this class. Take 533 if you want to learn about this.
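
For completeness, a schematic agent-environment loop; the env.reset / env.step / policy interfaces here are hypothetical and only illustrate the observe-act-reward cycle described above.

```python
def run_episode(env, policy, max_steps=100):
    """Run one episode: observe, act, collect rewards/punishments."""
    observation = env.reset()                         # agent observes the environment
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation)                  # agent takes an action
        observation, reward, done = env.step(action)  # environment responds with a reward
        total_reward += reward
        if done:
            break
    return total_reward
```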


When do we need a computer to learn?


Appropriate Applications for Supervised Learning
• Situations where there is no human expert
  – x: bond graph of a new molecule; f(x): predicted binding strength to the AIDS protease molecule
  – x: nano-modification structure of a fuel cell; f(x): predicted power output of the fuel cell
• Situations where humans can perform the task but can't describe how they do it
  – x: picture of a hand-written character; f(x): ASCII code of the character
  – x: recording of a bird song; f(x): species of the bird
• Situations where the desired function changes frequently
  – x: description of stock prices and trades for the last 10 days; f(x): recommended stock transactions
• Situations where each user needs a customized function f
  – x: incoming email message; f(x): importance score for presenting it to the user (or deleting it without presenting)


Supervised learning
• Given: a set of training examples {(x_1, y_1), (x_2, y_2), …, (x_N, y_N)}
  – x_i: the input of the i-th example (i.e., a vector)
  – y_i: its corresponding output (continuous or discrete)
  – We assume there is some underlying function f that maps from x to y – our target function
• Goal: find a good approximation of f so that accurate predictions can be made for previously unseen x


The underlying function: [figure]


Polynomial curve fitting
• There are infinitely many functions that will fit the training data perfectly.
• In order to learn, we have to focus on a limited set of possible functions
  – We call this our hypothesis space
  – E.g., all M-th order polynomial functions
    y(x, w) = w_0 + w_1 x + w_2 x^2 + … + w_M x^M
  – w = (w_0, w_1, …, w_M) represents the unknown parameters that we wish to learn from the training data

• Learning here means finding a set of parameters w that minimizes some loss function, e.g., the sum-of-squares error
    E(w) = 1/2 Σ_i (y(x_i, w) − y_i)^2
• This optimization problem can be solved easily. We will not focus on solving it at this point; we will revisit it later.
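
A minimal sketch of the curve-fitting step: build the design matrix with columns 1, x, …, x^M and minimize the sum-of-squares error E(w) via linear least squares. The noisy data generator below is an illustrative assumption; it is not specified in the slides.

```python
import numpy as np

def fit_polynomial(x, y, M):
    """Find w minimizing E(w) = 1/2 * sum_i (y(x_i, w) - y_i)^2
    for the M-th order polynomial y(x, w) = w0 + w1*x + ... + wM*x^M."""
    Phi = np.vander(x, M + 1, increasing=True)  # columns: 1, x, x^2, ..., x^M
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w

def poly_predict(x, w):
    return np.vander(x, len(w), increasing=True) @ w

# Illustrative made-up data: noisy samples of an unknown target function.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)
w = fit_polynomial(x, y, M=3)
```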


Important Issue: Model Selection
• The red line shows the function learned with different values of M
• Which M should we choose? This is a model selection problem
• Can we use the E(w) that we defined on the previous slides as a criterion to choose M?


Over-fitting
• As M increases, the loss on the training data decreases monotonically
• However, the loss on the test data starts to increase after a while
• Why? Is this a fluke, or is it generally true?
It turns out this is generally the case – caused by over-fitting
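
A small sketch of the effect described above, under the same illustrative assumptions as the fitting sketch earlier: training error keeps shrinking as M grows, while error on held-out test data eventually rises.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    # Made-up data generator, for illustration only.
    x = rng.uniform(0, 1, n)
    return x, np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=n)

x_train, y_train = make_data(10)
x_test, y_test = make_data(100)

for M in range(10):
    Phi = np.vander(x_train, M + 1, increasing=True)
    w, *_ = np.linalg.lstsq(Phi, y_train, rcond=None)
    train_err = np.mean((Phi @ w - y_train) ** 2)
    test_err = np.mean((np.vander(x_test, M + 1, increasing=True) @ w - y_test) ** 2)
    # Training error decreases with M; test error eventually increases (over-fitting).
    print(M, round(train_err, 4), round(test_err, 4))
```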


Over-fitting
• Over-fitting refers to the phenomenon where the learner adjusts to very specific random features of the training data that differ from the target function
• Real example:
  – In the Bug ID project, x: image of a robotically maneuvered bug; f(x): the species of the bug
  – The initial attempt yielded close-to-perfect accuracy
  – Reason: the different species were imaged in different batches, and one species had a peculiar air bubble in its images


Over-fitting
• Over-fitting happens when
  – There is too little data (or some systematic bias in the data)
  – There are too many parameters


Key Issues in Machine Learning
• What are good hypothesis spaces?
  – Linear functions? Polynomials?
  – Which spaces have been useful in practical applications?
• How do we select among different hypothesis spaces?
  – The model selection problem
  – Trade-off between over-fitting and under-fitting
• How can we optimize accuracy on future data points?
  – This is often called the generalization error – the error on unseen data points
  – Related to the issue of over-fitting, i.e., the model fitting to the peculiarities rather than the generalities of the data
• What level of confidence should we have in the results? (A statistical question)
  – How much training data is required to find an accurate hypothesis with high probability? This is the topic of learning theory
• Are some learning problems computationally intractable? (A computational question)
  – Some learning problems are provably hard
  – Heuristic/greedy approaches are often used when this is the case
• How can we formulate application problems as machine learning problems? (The engineering question)


Terminology
• Training example: an example of the form (x, y)
  – x: feature vector
  – y:
    • a continuous value for regression problems
    • a class label, in {1, 2, …, K}, for classification problems
• Training set: a set of training examples drawn randomly from P(x, y)
• Target function: the true mapping from x to y
• Hypothesis: a proposed function h considered by the learning algorithm to be similar to the target function
• Test set: a set of examples used to evaluate a proposed hypothesis h
• Hypothesis space: the space of all hypotheses that can, in principle, be output by a particular learning algorithm
