CS534 Machine Learning - Classes
<strong>CS534</strong> <strong>Machine</strong> <strong>Learning</strong><br />
Spring 2013<br />
Lecture 1:<br />
Introduction to ML<br />
Course logistics<br />
Reading:<br />
The discipline of <strong>Machine</strong> learning by Tom Mitchell
Course Information<br />
• Instructor: Dr. Xiaoli Fern<br />
Kec 3073, xfern@eecs.oregonstate.edu<br />
• TA: Travis Moore<br />
• Office hour (tentative)<br />
Instructor: MW before class 11‐12 or by appointment<br />
TA: TBA (see class webpage for update)<br />
• Class Web Page<br />
classes.engr.oregonstate.edu/eecs/spring2013/cs534/<br />
• Class email list<br />
cs534‐sp13@engr.orst.edu
Course materials<br />
• Text book:<br />
– Pattern recognition and machine learning by Chris<br />
Bishop (Bishop)<br />
• Slides and reading materials will be provided on<br />
course webpage<br />
• Other good references<br />
– <strong>Machine</strong> learning by Tom Mitchell (TM)<br />
– Pattern Classification by Duda, Hart and Stork (DHS)<br />
2nd edition<br />
• A lot of online resources on machine learning<br />
– Check class website for a few links<br />
Prerequisites<br />
Color Green<br />
means important<br />
• Basic probability theory and statistics<br />
concepts: Distributions, Densities,<br />
Expectation, Variance, parameter estimation<br />
– A brief review is provided on class website<br />
• Multivariable Calculus and linear algebra<br />
– Basic review slides, and links to useful video<br />
lectures provided on class webpage<br />
• Knowledge of basic CS concepts such as data<br />
structures, search strategies, complexity<br />
Please spend some time reviewing these!<br />
It will be tremendously helpful!
Homework Policies<br />
• Homework is generally due at the beginning of<br />
the class on the due date<br />
• Each student has one allowance of handing in<br />
late homework (no more than 48 hours late)<br />
• Collaboration policy<br />
– Discussions are allowed, but copying of solution or<br />
code is not<br />
– See the Student Conduct page on OSU website for<br />
information regarding academic dishonesty<br />
(http://oregonstate.edu/studentconduct/code/ind<br />
ex.php#acdis)
Grading policy<br />
• Grading policy:<br />
Written homework will not be graded based on correctness. We will<br />
record the number of problems that were "completed" (either<br />
correctly or incorrectly).<br />
Completing a problem requires a non‐trivial attempt at solving the<br />
problem. The judgment of whether a problem was "completed" is<br />
left to the instructor and the TA.<br />
• Final grades breakdown:<br />
– Midterm 25%; Final 25%; Final project 25%; Implementation<br />
assignments 25%.<br />
– The resulting letter grade will be decreased by one if a student fails<br />
to complete at least 80% of the written homework problems.
What is <strong>Machine</strong> learning<br />
Task T<br />
Performance P<br />
<strong>Learning</strong> Algorithm<br />
Experience E<br />
<strong>Machine</strong> learning studies algorithms that<br />
• Improve performance P<br />
• at some task T<br />
• based on experience E
<strong>Machine</strong> learning in Computer Science<br />
• <strong>Machine</strong> learning is already the preferred approach to<br />
– Speech recognition, Natural language processing<br />
– Computer vision<br />
– Medical outcomes analysis<br />
– Robot control<br />
– …<br />
• This trend is growing<br />
– Improved machine learning algorithms<br />
– Increased data capture and new sensors<br />
– Increasing demand for self‐customization to user and<br />
environment
Fields of Study<br />
<strong>Machine</strong> <strong>Learning</strong><br />
Supervised<br />
<strong>Learning</strong><br />
Semi‐supervised<br />
learning<br />
Unsupervised<br />
<strong>Learning</strong><br />
Reinforcement<br />
<strong>Learning</strong>
Supervised <strong>Learning</strong><br />
• Learn to predict output from input.<br />
• Output can be<br />
– continuous: regression problems<br />
[Scatter plot: house price ($) vs. square footage (feet)]<br />
Example: Predicting the price of a house based on its square footage
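A minimal sketch of the regression example above, in pure Python. The housing numbers are invented for illustration; the fit is the standard least-squares line.

```python
# Sketch of simple linear regression: predict house price from square
# footage by minimizing squared error (closed-form least-squares line).
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical training data: (square feet, price in $1000s)
feet = [1000, 1500, 2000, 2500]
price = [200, 280, 370, 450]
w1, w0 = fit_line(feet, price)
predicted = w0 + w1 * 1800  # price estimate for an unseen 1800 sq ft house
```

The learned line is then used to predict the continuous output for inputs not seen during training, which is exactly the regression setting on the slide.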
Supervised <strong>Learning</strong><br />
• Learn to predict output from input.<br />
• Output can be<br />
– continuous: regression problems<br />
– Discrete: classification problems<br />
Example: classify a loan<br />
applicant as either high<br />
risk or low risk based on<br />
income and savings<br />
amount.
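A minimal classifier sketch for the loan-risk example, using a 1-nearest-neighbor rule on invented (income, savings) data. This is one simple way to predict a discrete label, not the specific method the course prescribes.

```python
# 1-nearest-neighbor classification sketch: label a new applicant with
# the label of the closest training example (hypothetical data).
def nearest_neighbor(train, query):
    """train: list of ((income, savings), label); return label of closest point."""
    def dist2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return min(train, key=lambda ex: dist2(ex[0], query))[1]

# Hypothetical training set: (income, savings) in $1000s, with risk labels
train = [((20, 5), "high risk"), ((90, 40), "low risk"),
         ((30, 2), "high risk"), ((70, 30), "low risk")]
label = nearest_neighbor(train, (80, 35))  # classify an unseen applicant
```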
Unsupervised <strong>Learning</strong><br />
• Given a collection of examples (objects),<br />
discover self‐similar groups within the data –<br />
clustering<br />
Example: clustering<br />
artwork
Unsupervised learning<br />
• Given a collection of examples (objects),<br />
discover self‐similar groups within the data –<br />
clustering<br />
Image Segmentation<br />
Unsupervised <strong>Learning</strong><br />
• Given a collection of examples (objects),<br />
discover self‐similar groups within the data –<br />
clustering<br />
• Learn the underlying distribution that<br />
generates the data we observe – density<br />
estimation<br />
• Represent high‐dimensional data using a low‐dimensional<br />
representation for compression<br />
or visualization – dimension reduction
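As a concrete instance of "discovering self-similar groups", here is a k-means clustering sketch in pure Python on 1-D data. The points and starting centers are invented for illustration.

```python
# k-means sketch: alternate between assigning points to their nearest
# center and moving each center to the mean of its assigned points.
def kmeans_1d(points, centers, iters=10):
    for _ in range(iters):
        # assignment step: attach each point to its nearest center
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda j: (p - centers[j]) ** 2)
            clusters[i].append(p)
        # update step: move each center to the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Two well-separated groups; centers converge near their means
centers, clusters = kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 10.0], [0.0, 5.0])
```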
Reinforcement <strong>Learning</strong><br />
• Learn to act<br />
• An agent<br />
– Observes the environment<br />
– Takes action<br />
– With each action, receives rewards/punishments<br />
– Goal: learn a policy that optimizes rewards<br />
• No examples of optimal outputs are given<br />
• Not covered in this class. Take 533 if you want<br />
to learn about this.
When do we need computers to learn?
Appropriate Applications for<br />
Supervised <strong>Learning</strong><br />
• Situations where there is no human expert<br />
– x: bond graph of a new molecule, f(x): predicted binding strength to AIDS<br />
protease molecule<br />
– x: nano modification structure to a Fuel cell, f(x): predicted power output<br />
strength by the fuel cell<br />
• Situations where humans can perform the task but can’t describe<br />
how they do it<br />
– x: picture of a hand‐written character, f(x): ascii code of the character<br />
– x: recording of a bird song, f(x): species of the bird<br />
• Situations where the desired function is changing frequently<br />
– x: description of stock prices and trades for last 10 days, f(x): recommended<br />
stock transactions<br />
• Situations where each user needs a customized function f<br />
– x: incoming email message, f(x): importance score for presenting to the user<br />
(or deleting without presenting)<br />
Supervised learning<br />
• Given: a set of training examples<br />
{(x_1, y_1), (x_2, y_2), …, (x_N, y_N)}<br />
– x_i: the input of the i‐th example (i.e., a feature<br />
vector)<br />
– y_i is its corresponding output (continuous or discrete)<br />
– We assume there is some underlying function f that<br />
maps from x to y – our target function<br />
• Goal: find a good approximation h of f so that<br />
accurate prediction can be made for previously<br />
unseen x
The underlying function:
Polynomial curve fitting<br />
• There are infinite functions that will fit the training data perfectly.<br />
• In order to learn, we have to focus on a limited set of possible<br />
functions<br />
– We call this our hypothesis space<br />
– E.g., all M‐th order polynomial functions<br />
y(x, w) = w_0 + w_1 x + w_2 x^2 + … + w_M x^M<br />
– w = (w 0 , w 1 ,…, w M ) represents the unknown parameters that we<br />
wish to learn from the training data<br />
• <strong>Learning</strong> here means to find a good set of parameters<br />
w to minimize some loss function<br />
E(w) = (1/2) Σ_{n=1}^{N} [y(x_n, w) − t_n]^2<br />
This optimization problem can be solved easily.<br />
We will not focus on solving it at this point; we will revisit it later.
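A sketch of the curve-fitting setup above, with invented sin-plus-noise training data (the classic running example for polynomial fitting). `np.polyfit` finds the coefficient vector w that minimizes the sum of squared errors.

```python
# Polynomial curve fitting by minimizing squared error.
# The training data (noisy samples of sin(2*pi*x)) is invented for
# illustration; np.polyfit solves the least-squares problem directly.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.1, size=x.shape)  # noisy targets

M = 3                               # order of the polynomial
w = np.polyfit(x, t, deg=M)         # the M+1 parameters minimizing squared error
y = np.polyval(w, x)                # model predictions on the training inputs
E = 0.5 * np.sum((y - t) ** 2)      # the loss E(w) from the slide
```

Note that `np.polyfit` returns coefficients ordered from the highest power down to the constant term.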
Important Issue: Model Selection<br />
• The red line shows the function learned with different M values<br />
• Which M should we choose – this is a model selection problem<br />
• Can we use the E(w) defined on the previous slide as a criterion to<br />
choose M?
Over‐fitting<br />
• As M increases, loss on the training data<br />
decreases monotonically<br />
• However, the loss on test data starts to<br />
increase after a while<br />
• Why? Is this a fluke or generally true?<br />
It turns out this is<br />
generally the case –<br />
caused by over‐fitting
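The train/test pattern described above can be reproduced in a small experiment (invented sin-plus-noise data, with `np.polyfit` as the learner): training error shrinks as M grows, while test error can start rising once the polynomial has enough freedom to fit the noise.

```python
# Overfitting experiment: compare training and test error as the
# polynomial order M increases (all data invented for illustration).
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    x = rng.uniform(0.0, 1.0, n)
    return x, np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, n)

x_tr, t_tr = make_data(10)   # small training set
x_te, t_te = make_data(100)  # held-out test set

def sse(w, x, t):
    """Half the sum of squared errors of polynomial w on (x, t)."""
    return 0.5 * np.sum((np.polyval(w, x) - t) ** 2)

train_err, test_err = [], []
for M in range(8):           # try polynomial orders 0..7
    w = np.polyfit(x_tr, t_tr, deg=M)
    train_err.append(sse(w, x_tr, t_tr))
    test_err.append(sse(w, x_te, t_te))
# train_err generally decreases with M; test_err typically does not
```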
Over‐fitting<br />
• Over‐fitting refers to the phenomenon when the<br />
learner adjusts to very specific random features<br />
of the training data, which differs from the target<br />
function<br />
• Real example:<br />
– In Bug ID project, x: image of a robotically<br />
maneuvered bug, f(x): the species of the bug<br />
– Initial attempt yields close to perfect accuracy<br />
– Reason: the different species were imaged in different<br />
batches, and one species' batch had a peculiar air<br />
bubble visible in its images.
Overfitting<br />
• Over‐fitting happens when<br />
– There is too little data (or some systematic bias in<br />
the data )<br />
– There are too many parameters
Key Issues in <strong>Machine</strong> <strong>Learning</strong><br />
• What are good hypothesis spaces?<br />
– Linear functions? Polynomials?<br />
– which spaces have been useful in practical applications?<br />
• How to select among different hypothesis spaces?<br />
– The Model selection problem<br />
– Trade‐off between over‐fitting and under‐fitting<br />
• How can we optimize accuracy on future data points?<br />
– This is often called the Generalization Error – error on unseen data points<br />
– Related to the issue of “overfitting”, i.e., the model fitting to the peculiarities<br />
rather than the generalities of the data<br />
• What level of confidence should we have in the results? (A<br />
statistical question)<br />
– How much training data is required to find an accurate hypothesis with high<br />
probability? This is the topic of learning theory<br />
• Are some learning problems computationally intractable? (A<br />
computational question)<br />
– Some learning problems are provably hard<br />
– Heuristic / greedy approaches are often used when this is the case<br />
• How can we formulate application problems as machine learning<br />
problems? (the engineering question)<br />
Terminology<br />
• Training example: an example of the form (x, y)<br />
– x: feature vector<br />
– y<br />
• continuous value for regression problems<br />
• class label, in {1, 2, …, K}, for classification problems<br />
• Training Set: a set of training examples drawn randomly from<br />
P(x, y)<br />
• Target function: the true mapping from x to y<br />
• Hypothesis: a proposed function h considered by the learning<br />
algorithm to be similar to the target function.<br />
• Test Set: a set of examples used to evaluate a<br />
proposed hypothesis h.<br />
• Hypothesis space: the space of all hypotheses that can, in<br />
principle, be output by a particular learning algorithm