Gradient Descent
Linear regression with one variable
Model representation
Machine Learning (Andrew Ng)
Housing Prices (Portland, OR)

Supervised Learning: given the “right answer” for each example in the data.
Regression Problem: predict real-valued output.

[Figure: price (in 1000s of dollars) plotted against size (feet²) for the housing data]
Training set of housing prices (Portland, OR):

Size in feet² (x)    Price ($) in 1000's (y)
2104                 460
1416                 232
1534                 315
852                  178
…                    …

Notation:
m = Number of training examples
x’s = “input” variable / features
y’s = “output” variable / “target” variable
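The training set and notation above can be written down directly; a minimal Python sketch (the variable names are my own):

```python
# Training set from the table above: house sizes (feet^2) and prices ($1000s).
x = [2104, 1416, 1534, 852]   # "input" variable / feature
y = [460, 232, 315, 178]      # "output" / "target" variable

m = len(x)  # m = number of training examples
print(m)    # 4
```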
Training Set → Learning Algorithm → h

h takes the size of a house as input and outputs the estimated price.

How do we represent h? h_θ(x) = θ0 + θ1x

Because there is a single input variable, this is linear regression with one variable, also called univariate linear regression.
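The hypothesis h_θ(x) = θ0 + θ1·x is a one-line function; a sketch with illustrative parameter values (θ0 = 50, θ1 = 0.1 are arbitrary choices for the example, not fitted values):

```python
def h(theta0, theta1, x):
    """Univariate linear regression hypothesis: h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# With theta0 = 50 and theta1 = 0.1, a 1000 ft^2 house is estimated at
# 50 + 0.1 * 1000 = 150, i.e. $150,000, since prices are in $1000s.
print(h(50, 0.1, 1000))  # 150.0
```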
Linear regression with one variable
Cost function
Training Set: the housing data above (size x, price y).

Hypothesis: h_θ(x) = θ0 + θ1x
θ0, θ1: parameters

How do we choose θ0, θ1?
[Figure: three plots on a 0-to-3 grid, each showing the line h_θ(x) for a different choice of θ0 and θ1]
Idea: choose θ0, θ1 so that h_θ(x) is close to y for our training examples (x, y).
Linear regression with one variable
Cost function intuition I
Hypothesis: h_θ(x) = θ0 + θ1x
Simplified: h_θ(x) = θ1x (i.e. θ0 = 0)

Parameters: θ1

Cost Function: J(θ1) = (1/(2m)) Σ_{i=1..m} (h_θ(x^(i)) - y^(i))²

Goal: minimize J(θ1) over θ1
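The simplified cost J(θ1) can be evaluated directly; a sketch using the toy data set (1, 1), (2, 2), (3, 3), which is my reading of the 0-to-3 axes in the plots:

```python
def J(theta1, xs, ys):
    """Cost for the simplified hypothesis h(x) = theta1 * x."""
    m = len(xs)
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

xs, ys = [1, 2, 3], [1, 2, 3]
print(J(1.0, xs, ys))  # 0.0  (the line y = x fits this data exactly)
print(J(0.5, xs, ys))  # ~0.583
print(J(0.0, xs, ys))  # ~2.333
```

Plotting J over a range of θ1 values reproduces the bowl-shaped curve with its minimum at θ1 = 1.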
(for fixed θ1, h_θ is a function of x; J is a function of the parameter θ1)

[Figure: left, the data and h_θ(x) for x from 0 to 3; right, J(θ1) for θ1 from -0.5 to 2.5]
Linear regression with one variable
Cost function intuition II
Hypothesis: h_θ(x) = θ0 + θ1x

Parameters: θ0, θ1

Cost Function: J(θ0, θ1) = (1/(2m)) Σ_{i=1..m} (h_θ(x^(i)) - y^(i))²

Goal: minimize J(θ0, θ1) over θ0, θ1
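The full cost J(θ0, θ1) on the housing data can be computed the same way; a sketch (the θ values compared are arbitrary illustrations):

```python
def cost(theta0, theta1, xs, ys):
    """J(theta0, theta1) = (1/2m) * sum of squared prediction errors."""
    m = len(xs)
    total = sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys))
    return total / (2 * m)

# Housing data: sizes in 1000s of ft^2, prices in $1000s.
xs = [2.104, 1.416, 1.534, 0.852]
ys = [460, 232, 315, 178]

# A plausible line vs. a clearly bad one: lower J means a better fit.
print(cost(0.0, 220.0, xs, ys))   # moderate cost
print(cost(0.0, 1000.0, xs, ys))  # huge cost: the line badly overshoots
```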
(for fixed θ0, θ1, h_θ is a function of x; J is a function of the parameters θ0, θ1)

[Figure: left, the training data and h_θ(x), price ($) in 1000's vs. size in feet²; right, the contour/surface plot of J(θ0, θ1)]
Linear regression with one variable
Gradient descent
Have some function J(θ0, θ1).
Want min over θ0, θ1 of J(θ0, θ1).

Outline:
• Start with some θ0, θ1 (say θ0 = 0, θ1 = 0).
• Keep changing θ0, θ1 to reduce J(θ0, θ1), until we hopefully end up at a minimum.
[Figure: surface plot of J(θ0, θ1) over the (θ0, θ1) plane; starting gradient descent from different initial points can lead to different local minima]
Gradient descent algorithm:

repeat until convergence {
    θ_j := θ_j - α · ∂/∂θ_j J(θ0, θ1)    (for j = 0 and j = 1)
}

Correct: simultaneous update
temp0 := θ0 - α · ∂/∂θ0 J(θ0, θ1)
temp1 := θ1 - α · ∂/∂θ1 J(θ0, θ1)
θ0 := temp0
θ1 := temp1

Incorrect:
temp0 := θ0 - α · ∂/∂θ0 J(θ0, θ1)
θ0 := temp0
temp1 := θ1 - α · ∂/∂θ1 J(θ0, θ1)    (evaluated at the already-updated θ0)
θ1 := temp1
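The difference between the correct and incorrect update orders only shows up when the partial derivatives couple the parameters; a sketch using the toy cost J(θ0, θ1) = (θ0 + θ1)², which is my own illustrative choice, with α = 0.25:

```python
def dJ_dtheta0(theta0, theta1):
    # Partial derivative of the toy cost J = (theta0 + theta1)**2.
    return 2 * (theta0 + theta1)

def dJ_dtheta1(theta0, theta1):
    # Same by symmetry; it depends on theta0, which is the point.
    return 2 * (theta0 + theta1)

alpha = 0.25

# Correct: both derivatives are evaluated at the OLD point, then assigned.
theta0, theta1 = 1.0, 2.0
temp0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
temp1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)
theta0, theta1 = temp0, temp1
print(theta0, theta1)  # -0.5 0.5

# Incorrect: theta0 is overwritten before theta1's derivative is evaluated.
t0, t1 = 1.0, 2.0
t0 = t0 - alpha * dJ_dtheta0(t0, t1)
t1 = t1 - alpha * dJ_dtheta1(t0, t1)  # sees the NEW t0, not the old one
print(t0, t1)  # -0.5 1.25 (theta1 differs from the correct 0.5)
```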
Linear regression with one variable
Gradient descent intuition
Gradient descent algorithm (one parameter): θ1 := θ1 - α · (d/dθ1) J(θ1)

If the derivative (d/dθ1) J(θ1) is positive, the update decreases θ1; if it is negative, the update increases θ1. In both cases θ1 moves toward the minimum.
If α is too small, gradient descent can be slow.
If α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.
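The effect of α is easy to see on the toy cost J(θ) = θ², whose derivative is 2θ: each step computes θ := θ - α·2θ = (1 - 2α)·θ, so the iterates shrink when |1 - 2α| < 1 and blow up when |1 - 2α| > 1. A sketch (the α values and step count are my own):

```python
def run(alpha, theta=1.0, steps=20):
    """Gradient descent on the toy cost J(theta) = theta**2 (derivative 2*theta)."""
    for _ in range(steps):
        theta = theta - alpha * 2 * theta
    return theta

print(run(0.01))  # too small: slow, still far from the minimum at 0
print(run(0.4))   # well chosen: essentially at the minimum
print(run(1.1))   # too large: overshoots every step and diverges
```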
At a local optimum the derivative is zero, so the update θ1 := θ1 - α · 0 leaves the current value of θ1 unchanged.
Gradient descent can converge to a local minimum, even with the learning rate α fixed. As we approach a local minimum, the derivative shrinks, so gradient descent will automatically take smaller steps. So, no need to decrease α over time.
Linear regression with one variable
Gradient descent for linear regression
Gradient descent algorithm:
repeat until convergence {
    θ_j := θ_j - α · ∂/∂θ_j J(θ0, θ1)    (for j = 0 and j = 1)
}

Linear Regression Model:
h_θ(x) = θ0 + θ1x
J(θ0, θ1) = (1/(2m)) Σ_{i=1..m} (h_θ(x^(i)) - y^(i))²

Working out the partial derivatives:
∂/∂θ0 J(θ0, θ1) = (1/m) Σ_{i=1..m} (h_θ(x^(i)) - y^(i))
∂/∂θ1 J(θ0, θ1) = (1/m) Σ_{i=1..m} (h_θ(x^(i)) - y^(i)) · x^(i)
Gradient descent algorithm for linear regression:

repeat until convergence {
    θ0 := θ0 - α · (1/m) Σ_{i=1..m} (h_θ(x^(i)) - y^(i))
    θ1 := θ1 - α · (1/m) Σ_{i=1..m} (h_θ(x^(i)) - y^(i)) · x^(i)
}    (update θ0 and θ1 simultaneously)
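Plugging the partial derivatives into the update rule gives a complete algorithm; a sketch on the housing data, where the learning rate, iteration count, and the rescaling of sizes to 1000s of ft² (to keep a fixed α stable) are my own choices:

```python
def batch_gradient_descent(xs, ys, alpha=0.1, iters=5000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x.

    Each iteration uses ALL m training examples, and theta0, theta1
    are updated simultaneously.
    """
    m = len(xs)
    theta0 = theta1 = 0.0
    for _ in range(iters):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                               # dJ/dtheta0
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m    # dJ/dtheta1
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

# Housing data; sizes rescaled to 1000s of ft^2 so alpha = 0.1 is stable.
xs = [2.104, 1.416, 1.534, 0.852]
ys = [460, 232, 315, 178]
theta0, theta1 = batch_gradient_descent(xs, ys)
print(theta0, theta1)  # approaches the least-squares fit, roughly -42.8 and 229.6
```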
[Figure: surface plot of J(θ0, θ1) for linear regression; the cost is a convex, bowl-shaped function with a single global minimum]
(for fixed θ0, θ1, h_θ is a function of x; J is a function of the parameters θ0, θ1)

[Figure sequence: successive gradient descent steps traced on the contour plot of J(θ0, θ1), with the corresponding fitted line h_θ(x) shown at each step]
“Batch” Gradient Descent

“Batch”: each step of gradient descent uses all the training examples.
• Batch gradient descent
• Stochastic gradient descent (also incremental gradient descent)
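For contrast, a sketch of stochastic (incremental) gradient descent, where each parameter update uses a single training example; the learning rate, epoch count, and shuffling scheme are my own choices:

```python
import random

def sgd(xs, ys, alpha=0.05, epochs=500, seed=0):
    """Stochastic gradient descent for h(x) = t0 + t1 * x:
    each update uses ONE training example, not the whole training set."""
    rng = random.Random(seed)
    t0 = t1 = 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)                     # visit examples in random order
        for i in idx:
            err = t0 + t1 * xs[i] - ys[i]    # error on a single example
            t0, t1 = t0 - alpha * err, t1 - alpha * err * xs[i]
    return t0, t1

xs = [2.104, 1.416, 1.534, 0.852]   # sizes in 1000s of ft^2
ys = [460, 232, 315, 178]           # prices in $1000s
print(sgd(xs, ys))  # hovers near the batch gradient descent solution
```

With a fixed α, SGD oscillates in a neighborhood of the minimum rather than settling exactly; its advantage is that each step is cheap when m is large.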
Questions?

Thank you!