Gradient Descent

Linear regression with one variable
Model representation

Machine Learning

Andrew Ng


Housing Prices (Portland, OR)

[Figure: scatter plot of house prices; x-axis: Size (feet^2), 0 to 3000; y-axis: Price (in 1000s of dollars), 0 to 500]

Supervised Learning
Given the "right answer" for each example in the data.

Regression Problem
Predict real-valued output

Andrew Ng


Training set of housing prices (Portland, OR)

Size in feet^2 (x)    Price ($) in 1000's (y)
2104                  460
1416                  232
1534                  315
852                   178
...                   ...

Notation:
m = Number of training examples
x's = "input" variable / features
y's = "output" variable / "target" variable

Andrew Ng


Training Set
     |
Learning Algorithm
     |
Size of house  -->  h  -->  Estimated price

How do we represent h?

h_θ(x) = θ_0 + θ_1 x

Linear regression with one variable.
Univariate linear regression.

Andrew Ng


Linear regression with one variable
Cost function

Machine Learning

Andrew Ng


Training Set

Size in feet^2 (x)    Price ($) in 1000's (y)
2104                  460
1416                  232
1534                  315
852                   178
...                   ...

Hypothesis: h_θ(x) = θ_0 + θ_1 x

θ_i's: Parameters

How to choose θ_i's?

Andrew Ng


[Figure: three plots on 0–3 axes showing the line h_θ(x) = θ_0 + θ_1 x for different choices of θ_0 and θ_1]

Andrew Ng


[Figure: scatter plot of training examples (x, y) with a candidate straight-line fit h_θ(x)]

Idea: Choose θ_0, θ_1 so that h_θ(x) is close to y for our training examples (x, y).

Cost function (squared error):

    J(θ_0, θ_1) = (1/2m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²

Goal: minimize J(θ_0, θ_1) over θ_0, θ_1

Andrew Ng
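As a concrete illustration (not part of the original slides), here is a minimal Python sketch of this squared-error cost, evaluated on the four housing examples from the training-set table above; the parameter values tried at the end are arbitrary.

    # Squared-error cost J(theta0, theta1) = (1/2m) * sum_i (h(x_i) - y_i)^2
    # Training examples taken from the table above (size in feet^2, price in $1000s).
    X = [2104.0, 1416.0, 1534.0, 852.0]
    Y = [460.0, 232.0, 315.0, 178.0]

    def h(theta0, theta1, x):
        # Hypothesis: a straight line in one variable.
        return theta0 + theta1 * x

    def J(theta0, theta1, xs, ys):
        m = len(xs)
        return sum((h(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

    # Example: evaluate the cost of an arbitrary candidate line h(x) = 0.2 * x.
    print(J(0.0, 0.2, X, Y))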


Linear regression with one variable
Cost function intuition I

Machine Learning

Andrew Ng


Simplified setting (set θ_0 = 0):

Hypothesis: h_θ(x) = θ_1 x

Parameters: θ_1

Cost Function: J(θ_1) = (1/2m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²

Goal: minimize J(θ_1) over θ_1

Andrew Ng


(for fixed θ_1, this is a function of x)                (function of the parameter θ_1)

[Figure, repeated over three slides: left panel plots the hypothesis h_θ(x) = θ_1 x against x on 0–3 axes together with the training points; right panel plots the cost J(θ_1) against θ_1 on an axis from −0.5 to 2.5, with one point marked per choice of θ_1]

Andrew Ng
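To make the right-hand plot concrete, here is a small Python sketch (not from the slides) that evaluates the simplified cost J(θ_1) for a few values of θ_1. The three data points are an assumption chosen only to match the 0–3 axes in the plots above.

    # Illustrative toy dataset (an assumption, not given in the slides).
    X = [1.0, 2.0, 3.0]
    Y = [1.0, 2.0, 3.0]

    def J(theta1, xs, ys):
        # Simplified cost with theta_0 fixed at 0, so h(x) = theta1 * x.
        m = len(xs)
        return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

    for theta1 in [0.0, 0.5, 1.0]:
        print(theta1, J(theta1, X, Y))
    # theta1 = 1.0 fits these points exactly, so J(1.0) = 0; J grows as theta1
    # moves away from 1, tracing out the bowl shape in the right-hand plot.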


Linear regression with one variable
Cost function intuition II

Machine Learning

Andrew Ng


Hypothesis: h_θ(x) = θ_0 + θ_1 x

Parameters: θ_0, θ_1

Cost Function: J(θ_0, θ_1) = (1/2m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²

Goal: minimize J(θ_0, θ_1) over θ_0, θ_1

Andrew Ng


(for fixed θ_0, θ_1, this is a function of x)                (function of the parameters θ_0, θ_1)

[Figure: left panel shows the housing data with a candidate line h_θ(x); x-axis: Size in feet^2, 0 to 3000; y-axis: Price ($) in 1000's, 0 to 500. Right panel shows the corresponding cost J(θ_0, θ_1).]

Andrew Ng


(for fixed θ_0, θ_1, this is a function of x)                (function of the parameters θ_0, θ_1)

[Figure, repeated over several slides: left panel shows the housing data with the line h_θ(x) for a particular choice of (θ_0, θ_1); right panel shows the surface/contour plot of J(θ_0, θ_1), with the corresponding point marked]

Andrew Ng


Linear regression with one variable
Gradient descent

Machine Learning

Andrew Ng


Have some function J(θ_0, θ_1)
Want min over θ_0, θ_1 of J(θ_0, θ_1)

Outline:

• Start with some θ_0, θ_1 (say θ_0 = 0, θ_1 = 0)
• Keep changing θ_0, θ_1 to reduce J(θ_0, θ_1) until we hopefully end up at a minimum

Andrew Ng


[Figure, two slides: 3D surface plots of J(θ_0, θ_1) over the (θ_0, θ_1) plane]

Andrew Ng


Gradient descent algorithm

repeat until convergence {
    θ_j := θ_j − α ∂/∂θ_j J(θ_0, θ_1)    (for j = 0 and j = 1)
}

Correct: Simultaneous update
    temp0 := θ_0 − α ∂/∂θ_0 J(θ_0, θ_1)
    temp1 := θ_1 − α ∂/∂θ_1 J(θ_0, θ_1)
    θ_0 := temp0
    θ_1 := temp1

Incorrect:
    temp0 := θ_0 − α ∂/∂θ_0 J(θ_0, θ_1)
    θ_0 := temp0
    temp1 := θ_1 − α ∂/∂θ_1 J(θ_0, θ_1)    (uses the already-updated θ_0)
    θ_1 := temp1

Andrew Ng
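The distinction between the correct and incorrect versions is easy to see in code. The following Python sketch (not from the slides) performs one gradient descent step on a generic two-parameter cost; dJ_dtheta0 and dJ_dtheta1 are placeholder names for functions computing the two partial derivatives.

    def gd_step_correct(theta0, theta1, alpha, dJ_dtheta0, dJ_dtheta1):
        # Simultaneous update: both partial derivatives are evaluated at the
        # OLD (theta0, theta1) before either parameter is overwritten.
        temp0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
        temp1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)
        return temp0, temp1

    def gd_step_incorrect(theta0, theta1, alpha, dJ_dtheta0, dJ_dtheta1):
        # theta0 is overwritten first, so the second derivative is evaluated at a
        # mixed point (new theta0, old theta1) -- not a true gradient step.
        theta0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
        theta1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)
        return theta0, theta1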


Linear regression with one variable
Gradient descent intuition

Machine Learning

Andrew Ng


Gradient descent algorithm (illustrated with a single parameter θ_1):

repeat until convergence {
    θ_1 := θ_1 − α (d/dθ_1) J(θ_1)
}

α is the learning rate; (d/dθ_1) J(θ_1) is the derivative term.

Andrew Ng


Andrew Ng


If α is too small, gradient descent can be slow.

If α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.

Andrew Ng


At a local optimum, the derivative term is zero, so a gradient descent step θ_1 := θ_1 − α · 0 leaves the current value of θ_1 unchanged.

Andrew Ng


Gradient descent can converge to a local minimum, even with the learning rate α fixed.

As we approach a local minimum, gradient descent will automatically take smaller steps (because the derivative term shrinks). So there is no need to decrease α over time.

Andrew Ng


Linear regression with one variable
Gradient descent for linear regression

Machine Learning

Andrew Ng


Gradient descent algorithm:

repeat until convergence {
    θ_j := θ_j − α ∂/∂θ_j J(θ_0, θ_1)    (for j = 0 and j = 1)
}

Linear Regression Model:

h_θ(x) = θ_0 + θ_1 x

J(θ_0, θ_1) = (1/2m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))²

Andrew Ng


The partial derivatives of J(θ_0, θ_1) work out to:

∂/∂θ_0 J(θ_0, θ_1) = (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))

∂/∂θ_1 J(θ_0, θ_1) = (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) · x^(i)

Andrew Ng


Gradient descent algorithm (for linear regression)

repeat until convergence {
    θ_0 := θ_0 − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))
    θ_1 := θ_1 − α (1/m) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i)) · x^(i)
}

update θ_0 and θ_1 simultaneously

Andrew Ng
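The two update rules above translate directly into code. Here is a minimal Python sketch (not from the slides) of batch gradient descent on the four housing examples from the earlier table; the feature is rescaled to thousands of square feet, and the learning rate and iteration count are arbitrary illustrative choices.

    # Batch gradient descent for h(x) = theta0 + theta1 * x (one variable).
    # Sizes rescaled to thousands of square feet; prices in $1000s, as in the table.
    X = [2.104, 1.416, 1.534, 0.852]
    Y = [460.0, 232.0, 315.0, 178.0]

    def batch_gradient_descent(xs, ys, alpha=0.1, iterations=5000):
        m = len(xs)
        theta0, theta1 = 0.0, 0.0
        for _ in range(iterations):
            # Each step uses ALL m training examples ("batch" gradient descent).
            errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
            grad0 = sum(errors) / m
            grad1 = sum(e * x for e, x in zip(errors, xs)) / m
            # Simultaneous update of both parameters.
            theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
        return theta0, theta1

    theta0, theta1 = batch_gradient_descent(X, Y)
    print(theta0, theta1)               # fitted intercept and slope
    print(theta0 + theta1 * 1.65)       # prediction for a hypothetical 1650 ft^2 house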


[Figure: 3D surface plot of the cost J(θ_0, θ_1) over the (θ_0, θ_1) plane for the linear regression problem]

Andrew Ng


Andrew Ng


(for fixed θ_0, θ_1, this is a function of x)                (function of the parameters θ_0, θ_1)

[Figure, repeated over a series of slides: left panel shows the housing data with the current line h_θ(x); right panel shows the contour plot of J(θ_0, θ_1) with the trajectory of gradient descent moving toward the minimum, one step per slide]

Andrew Ng


"Batch" Gradient Descent

"Batch": Each step of gradient descent uses all the training examples.

Andrew Ng


• Batch gradient descent
• Stochastic gradient descent (also incremental gradient descent); a sketch of this variant follows below

Andrew Ng
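For contrast with the batch version shown earlier, here is a minimal Python sketch (not from the slides) of the stochastic / incremental variant for the same one-variable model: each parameter update uses a single training example instead of the full sum over all m examples. The learning rate, epoch count, and reuse of the rescaled housing data are illustrative choices.

    import random

    # Stochastic (incremental) gradient descent for h(x) = theta0 + theta1 * x.
    def stochastic_gradient_descent(xs, ys, alpha=0.01, epochs=200, seed=0):
        rng = random.Random(seed)
        theta0, theta1 = 0.0, 0.0
        indices = list(range(len(xs)))
        for _ in range(epochs):
            rng.shuffle(indices)          # visit the examples in a random order
            for i in indices:
                # Update on ONE example at a time.
                error = theta0 + theta1 * xs[i] - ys[i]
                theta0, theta1 = theta0 - alpha * error, theta1 - alpha * error * xs[i]
        return theta0, theta1

    # Same rescaled housing data as in the batch example above.
    X = [2.104, 1.416, 1.534, 0.852]
    Y = [460.0, 232.0, 315.0, 178.0]
    print(stochastic_gradient_descent(X, Y))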


Questions?

Thank you!

Andrew Ng
