CS534 Machine Learning - Classes
<strong>CS534</strong> <strong>Machine</strong> <strong>Learning</strong><br />
Spring 2013<br />
Lecture 1:<br />
Introduction to ML<br />
Course logistics<br />
Reading:<br />
The discipline of <strong>Machine</strong> learning by Tom Mitchell
Course Information<br />
• Instructor: Dr. Xiaoli Fern<br />
Kec 3073, xfern@eecs.oregonstate.edu<br />
• TA: Travis Moore<br />
• Office hour (tentative)<br />
Instructor: MW before class 11‐12 or by appointment<br />
TA: TBA (see class webpage for update)<br />
• Class Web Page<br />
classes.engr.oregonstate.edu/eecs/spring2013/cs534/<br />
• Class email list<br />
cs534‐sp13@engr.orst.edu
Course materials<br />
• Text book:<br />
– Pattern recognition and machine learning by Chris<br />
Bishop (Bishop)<br />
• Slides and reading materials will be provided on<br />
course webpage<br />
• Other good references<br />
– <strong>Machine</strong> learning by Tom Mitchell (TM)<br />
– Pattern Classification by Duda, Hart and Stork (DHS)<br />
2nd edition<br />
• A lot of online resources on machine learning<br />
– Check class website for a few links<br />
Prerequisites<br />
Color Green<br />
means important<br />
• Basic probability theory and statistics<br />
concepts: Distributions, Densities,<br />
Expectation, Variance, parameter estimation<br />
– A brief review is provided on class website<br />
• Multivariable Calculus and linear algebra<br />
– Basic review slides, and links to useful video<br />
lectures provided on class webpage<br />
• Knowledge of basic CS concepts such as data<br />
structures, search strategies, complexity<br />
Please spend some time reviewing these!<br />
It will be tremendously helpful!
Homework Policies<br />
• Homework is generally due at the beginning of<br />
the class on the due date<br />
• Each student has one allowance of handing in<br />
late homework (no more than 48 hours late)<br />
• Collaboration policy<br />
– Discussions are allowed, but copying of solution or<br />
code is not<br />
– See the Student Conduct page on OSU website for<br />
information regarding academic dishonesty<br />
(http://oregonstate.edu/studentconduct/code/ind<br />
ex.php#acdis)
Grading policy<br />
• Grading policy:<br />
Written homework will not be graded based on correctness. We will<br />
record the number of problems that were "completed" (either<br />
correctly or incorrectly).<br />
Completing a problem requires a non‐trivial attempt at solving the<br />
problem. The judgment of whether a problem was "completed" is<br />
left to the instructor and the TA.<br />
• Final grades breakdown:<br />
– Midterm 25%; Final 25%; Final project 25%; Implementation<br />
assignments 25%.<br />
– The resulting letter grade will be decreased by one if a student fails<br />
to complete at least 80% of the written homework problems.
What is <strong>Machine</strong> learning<br />
Task T<br />
Performance P<br />
<strong>Learning</strong> Algorithm<br />
Experience E<br />
<strong>Machine</strong> learning studies algorithms that<br />
• Improve performance P<br />
• at some task T<br />
• based on experience E
<strong>Machine</strong> learning in Computer Science<br />
• <strong>Machine</strong> learning is already the preferred approach to<br />
– Speech recognition, Natural language processing<br />
– Computer vision<br />
– Medical outcomes analysis<br />
– Robot control<br />
– …<br />
• This trend is growing<br />
– Improved machine learning algorithms<br />
– Increased data capture and new sensors<br />
– Increasing demand for self‐customization to user and<br />
environment
Fields of Study<br />
<strong>Machine</strong> <strong>Learning</strong><br />
Supervised<br />
<strong>Learning</strong><br />
Semi‐supervised<br />
learning<br />
Unsupervised<br />
<strong>Learning</strong><br />
Reinforcement<br />
<strong>Learning</strong>
Supervised <strong>Learning</strong><br />
• Learn to predict output from input.<br />
• Output can be<br />
– continuous: regression problems<br />
[Scatter plot: house price ($) vs. square footage (feet)]<br />
Example: Predicting the price of a house based on its square footage
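A minimal sketch of the regression example above, in pure Python. The housing numbers are invented for illustration; the fit is the standard least-squares line.

```python
# Sketch of simple linear regression: predict house price from square
# footage by minimizing squared error (closed-form least-squares line).
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through (xs, ys)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical training data: (square feet, price in $1000s)
feet = [1000, 1500, 2000, 2500]
price = [200, 280, 370, 450]
w1, w0 = fit_line(feet, price)
predicted = w0 + w1 * 1800  # price estimate for an unseen 1800 sq ft house
```

The learned line is then used to predict the continuous output for inputs not seen during training, which is exactly the regression setting on the slide.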
Supervised <strong>Learning</strong><br />
• Learn to predict output from input.<br />
• Output can be<br />
– continuous: regression problems<br />
– Discrete: classification problems<br />
Example: classify a loan<br />
applicant as either high<br />
risk or low risk based on<br />
income and savings<br />
amount.
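A minimal classifier sketch for the loan-risk example, using a 1-nearest-neighbor rule on invented (income, savings) data. This is one simple way to predict a discrete label, not the specific method the course prescribes.

```python
# 1-nearest-neighbor classification sketch: label a new applicant with
# the label of the closest training example (hypothetical data).
def nearest_neighbor(train, query):
    """train: list of ((income, savings), label); return label of closest point."""
    def dist2(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
    return min(train, key=lambda ex: dist2(ex[0], query))[1]

# Hypothetical training set: (income, savings) in $1000s, with risk labels
train = [((20, 5), "high risk"), ((90, 40), "low risk"),
         ((30, 2), "high risk"), ((70, 30), "low risk")]
label = nearest_neighbor(train, (80, 35))  # classify an unseen applicant
```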
Unsupervised <strong>Learning</strong><br />
• Given a collection of examples (objects),<br />
discover self‐similar groups within the data –<br />
clustering<br />
Example: clustering<br />
artwork
Unsupervised learning<br />
• Given a collection of examples (objects),<br />
discover self‐similar groups within the data –<br />
clustering<br />
Image Segmentation<br />
Unsupervised <strong>Learning</strong><br />
• Given a collection of examples (objects),<br />
discover self‐similar groups within the data –<br />
clustering<br />
• Learn the underlying distribution that<br />
generates the data we observe – density<br />
estimation<br />
• Represent high‐dimensional data using a low‐dimensional<br />
representation for compression<br />
or visualization – dimension reduction
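As a concrete instance of "discovering self-similar groups", here is a k-means clustering sketch in pure Python on 1-D data. The points and starting centers are invented for illustration.

```python
# k-means sketch: alternate between assigning points to their nearest
# center and moving each center to the mean of its assigned points.
def kmeans_1d(points, centers, iters=10):
    for _ in range(iters):
        # assignment step: attach each point to its nearest center
        clusters = [[] for _ in centers]
        for p in points:
            i = min(range(len(centers)), key=lambda j: (p - centers[j]) ** 2)
            clusters[i].append(p)
        # update step: move each center to the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Two well-separated groups; centers converge near their means
centers, clusters = kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 10.0], [0.0, 5.0])
```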
Reinforcement <strong>Learning</strong><br />
• Learn to act<br />
• An agent<br />
– Observes the environment<br />
– Takes action<br />
– With each action, receives rewards/punishments<br />
– Goal: learn a policy that optimizes rewards<br />
• No examples of optimal outputs are given<br />
• Not covered in this class. Take 533 if you want<br />
to learn about this.
When do we need computers to learn?
Appropriate Applications for<br />
Supervised <strong>Learning</strong><br />
• Situations where there is no human expert<br />
– x: bond graph of a new molecule, f(x): predicted binding strength to AIDS<br />
protease molecule<br />
– x: nano modification structure to a Fuel cell, f(x): predicted power output<br />
strength by the fuel cell<br />
• Situations where humans can perform the task but can’t describe<br />
how they do it<br />
– x: picture of a hand‐written character, f(x): ascii code of the character<br />
– x: recording of a bird song, f(x): species of the bird<br />
• Situations where the desired function is changing frequently<br />
– x: description of stock prices and trades for last 10 days, f(x): recommended<br />
stock transactions<br />
• Situations where each user needs a customized function f<br />
– x: incoming email message, f(x): importance score for presenting to the user<br />
(or deleting without presenting)<br />
Supervised learning<br />
• Given: a set of training examples<br />
{(x_1, y_1), (x_2, y_2), …, (x_N, y_N)}<br />
– x_i: the input of the i‐th example (i.e., a feature<br />
vector)<br />
– y_i is its corresponding output (continuous or discrete)<br />
– We assume there is some underlying function f that<br />
maps from x to y – our target function<br />
• Goal: find a good approximation h of f so that<br />
accurate prediction can be made for previously<br />
unseen x
The underlying function:
Polynomial curve fitting<br />
• There are infinite functions that will fit the training data perfectly.<br />
• In order to learn, we have to focus on a limited set of possible<br />
functions<br />
– We call this our hypothesis space<br />
– E.g., all M‐th order polynomial functions<br />
y(x, w) = w_0 + w_1 x + w_2 x^2 + … + w_M x^M<br />
– w = (w 0 , w 1 ,…, w M ) represents the unknown parameters that we<br />
wish to learn from the training data<br />
• <strong>Learning</strong> here means to find a good set of parameters<br />
w to minimize some loss function<br />
E(w) = (1/2) Σ_{n=1}^{N} [y(x_n, w) − t_n]^2<br />
This optimization problem can be solved easily.<br />
We will not focus on solving it at this point; we will revisit it later.
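A sketch of the curve-fitting setup above, with invented sin-plus-noise training data (the classic running example for polynomial fitting). `np.polyfit` finds the coefficient vector w that minimizes the sum of squared errors.

```python
# Polynomial curve fitting by minimizing squared error.
# The training data (noisy samples of sin(2*pi*x)) is invented for
# illustration; np.polyfit solves the least-squares problem directly.
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
t = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.1, size=x.shape)  # noisy targets

M = 3                               # order of the polynomial
w = np.polyfit(x, t, deg=M)         # the M+1 parameters minimizing squared error
y = np.polyval(w, x)                # model predictions on the training inputs
E = 0.5 * np.sum((y - t) ** 2)      # the loss E(w) from the slide
```

Note that `np.polyfit` returns coefficients ordered from the highest power down to the constant term.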
Important Issue: Model Selection<br />
• The red line shows the function learned with different M values<br />
• Which M should we choose – this is a model selection problem<br />
• Can we use the E(w) defined on the previous slide as a criterion to<br />
choose M?
Over‐fitting<br />
• As M increases, loss on the training data<br />
decreases monotonically<br />
• However, the loss on test data starts to<br />
increase after a while<br />
• Why? Is this a fluke or generally true?<br />
It turns out this is<br />
generally the case –<br />
caused by over‐fitting
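The train/test pattern described above can be reproduced in a small experiment (invented sin-plus-noise data, with `np.polyfit` as the learner): training error shrinks as M grows, while test error can start rising once the polynomial has enough freedom to fit the noise.

```python
# Overfitting experiment: compare training and test error as the
# polynomial order M increases (all data invented for illustration).
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    x = rng.uniform(0.0, 1.0, n)
    return x, np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, n)

x_tr, t_tr = make_data(10)   # small training set
x_te, t_te = make_data(100)  # held-out test set

def sse(w, x, t):
    """Half the sum of squared errors of polynomial w on (x, t)."""
    return 0.5 * np.sum((np.polyval(w, x) - t) ** 2)

train_err, test_err = [], []
for M in range(8):           # try polynomial orders 0..7
    w = np.polyfit(x_tr, t_tr, deg=M)
    train_err.append(sse(w, x_tr, t_tr))
    test_err.append(sse(w, x_te, t_te))
# train_err generally decreases with M; test_err typically does not
```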
Over‐fitting<br />
• Over‐fitting refers to the phenomenon when the<br />
learner adjusts to very specific random features<br />
of the training data, which differs from the target<br />
function<br />
• Real example:<br />
– In Bug ID project, x: image of a robotically<br />
maneuvered bug, f(x): the species of the bug<br />
– Initial attempt yields close to perfect accuracy<br />
– Reason: the different species were imaged in different<br />
batches, and one species' batch had a peculiar air<br />
bubble visible in its images.
Overfitting<br />
• Over‐fitting happens when<br />
– There is too little data (or some systematic bias in<br />
the data )<br />
– There are too many parameters
Key Issues in <strong>Machine</strong> <strong>Learning</strong><br />
• What are good hypothesis spaces?<br />
– Linear functions? Polynomials?<br />
– which spaces have been useful in practical applications?<br />
• How to select among different hypothesis spaces?<br />
– The Model selection problem<br />
– Trade‐off between over‐fitting and under‐fitting<br />
• How can we optimize accuracy on future data points?<br />
– This is often called the Generalization Error – error on unseen data points<br />
– Related to the issue of “overfitting”, i.e., the model fitting to the peculiarities<br />
rather than the generalities of the data<br />
• What level of confidence should we have in the results? (A<br />
statistical question)<br />
– How much training data is required to find an accurate hypothesis with high<br />
probability? This is the topic of learning theory<br />
• Are some learning problems computationally intractable? (A<br />
computational question)<br />
– Some learning problems are provably hard<br />
– Heuristic / greedy approaches are often used when this is the case<br />
• How can we formulate application problems as machine learning<br />
problems? (the engineering question)<br />
Terminology<br />
• Training example: an example of the form (x, y)<br />
– x: feature vector<br />
– y<br />
• continuous value for regression problems<br />
• class label, in {1, 2, …, K}, for classification problems<br />
• Training Set: a set of training examples drawn randomly from<br />
P(x, y)<br />
• Target function: the true mapping from x to y<br />
• Hypothesis: a proposed function h considered by the learning<br />
algorithm to be similar to the target function.<br />
• Test Set: a set of examples used to evaluate a<br />
proposed hypothesis h.<br />
• Hypothesis space: the space of all hypotheses that can, in<br />
principle, be output by a particular learning algorithm