Gradient Descent
Linear regression with one variable
Model representation
Machine Learning (Andrew Ng)
Housing Prices (Portland, OR)

Supervised Learning: given the “right answer” for each example in the data.
Regression Problem: predict real-valued output.

[Figure: price (in 1000s of dollars) plotted against size (feet²) for the housing data]
Training set of housing prices (Portland, OR):

Size in feet² (x)    Price ($) in 1000's (y)
2104                 460
1416                 232
1534                 315
852                  178
…                    …

Notation:
m = Number of training examples
x’s = “input” variable / features
y’s = “output” variable / “target” variable
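The training set and notation above can be written down directly; a minimal Python sketch (the variable names are my own):

```python
# Training set from the table above: house sizes (feet^2) and prices ($1000s).
x = [2104, 1416, 1534, 852]   # "input" variable / feature
y = [460, 232, 315, 178]      # "output" / "target" variable

m = len(x)  # m = number of training examples
print(m)    # 4
```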
Training Set → Learning Algorithm → h

h takes the size of a house as input and outputs the estimated price.

How do we represent h? h_θ(x) = θ0 + θ1x

Because there is a single input variable, this is linear regression with one variable, also called univariate linear regression.
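The hypothesis h_θ(x) = θ0 + θ1·x is a one-line function; a sketch with illustrative parameter values (θ0 = 50, θ1 = 0.1 are arbitrary choices for the example, not fitted values):

```python
def h(theta0, theta1, x):
    """Univariate linear regression hypothesis: h(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x

# With theta0 = 50 and theta1 = 0.1, a 1000 ft^2 house is estimated at
# 50 + 0.1 * 1000 = 150, i.e. $150,000, since prices are in $1000s.
print(h(50, 0.1, 1000))  # 150.0
```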
Linear regression with one variable
Cost function
Training Set: the housing data above (size x, price y).

Hypothesis: h_θ(x) = θ0 + θ1x
θ0, θ1: parameters

How do we choose θ0, θ1?
[Figure: three plots on a 0-to-3 grid, each showing the line h_θ(x) for a different choice of θ0 and θ1]
Idea: choose θ0, θ1 so that h_θ(x) is close to y for our training examples (x, y).
Linear regression with one variable
Cost function intuition I
Hypothesis: h_θ(x) = θ0 + θ1x
Simplified: h_θ(x) = θ1x (i.e. θ0 = 0)

Parameters: θ1

Cost Function: J(θ1) = (1/(2m)) Σ_{i=1..m} (h_θ(x^(i)) - y^(i))²

Goal: minimize J(θ1) over θ1
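The simplified cost J(θ1) can be evaluated directly; a sketch using the toy data set (1, 1), (2, 2), (3, 3), which is my reading of the 0-to-3 axes in the plots:

```python
def J(theta1, xs, ys):
    """Cost for the simplified hypothesis h(x) = theta1 * x."""
    m = len(xs)
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

xs, ys = [1, 2, 3], [1, 2, 3]
print(J(1.0, xs, ys))  # 0.0  (the line y = x fits this data exactly)
print(J(0.5, xs, ys))  # ~0.583
print(J(0.0, xs, ys))  # ~2.333
```

Plotting J over a range of θ1 values reproduces the bowl-shaped curve with its minimum at θ1 = 1.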
(for fixed θ1, h_θ is a function of x; J is a function of the parameter θ1)

[Figure: left, the data and h_θ(x) for x from 0 to 3; right, J(θ1) for θ1 from -0.5 to 2.5]
Linear regression with one variable
Cost function intuition II
Hypothesis: h_θ(x) = θ0 + θ1x

Parameters: θ0, θ1

Cost Function: J(θ0, θ1) = (1/(2m)) Σ_{i=1..m} (h_θ(x^(i)) - y^(i))²

Goal: minimize J(θ0, θ1) over θ0, θ1
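The full cost J(θ0, θ1) on the housing data can be computed the same way; a sketch (the θ values compared are arbitrary illustrations):

```python
def cost(theta0, theta1, xs, ys):
    """J(theta0, theta1) = (1/2m) * sum of squared prediction errors."""
    m = len(xs)
    total = sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys))
    return total / (2 * m)

# Housing data: sizes in 1000s of ft^2, prices in $1000s.
xs = [2.104, 1.416, 1.534, 0.852]
ys = [460, 232, 315, 178]

# A plausible line vs. a clearly bad one: lower J means a better fit.
print(cost(0.0, 220.0, xs, ys))   # moderate cost
print(cost(0.0, 1000.0, xs, ys))  # huge cost: the line badly overshoots
```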
(for fixed θ0, θ1, h_θ is a function of x; J is a function of the parameters θ0, θ1)

[Figure: left, the training data and h_θ(x), price ($) in 1000's vs. size in feet²; right, the contour/surface plot of J(θ0, θ1)]
Linear regression with one variable
Gradient descent
Have some function J(θ0, θ1).
Want min over θ0, θ1 of J(θ0, θ1).

Outline:
• Start with some θ0, θ1 (say θ0 = 0, θ1 = 0).
• Keep changing θ0, θ1 to reduce J(θ0, θ1), until we hopefully end up at a minimum.
[Figure: surface plot of J(θ0, θ1) over the (θ0, θ1) plane; starting gradient descent from different initial points can lead to different local minima]
Gradient descent algorithm:

repeat until convergence {
    θ_j := θ_j - α · ∂/∂θ_j J(θ0, θ1)    (for j = 0 and j = 1)
}

Correct: simultaneous update
temp0 := θ0 - α · ∂/∂θ0 J(θ0, θ1)
temp1 := θ1 - α · ∂/∂θ1 J(θ0, θ1)
θ0 := temp0
θ1 := temp1

Incorrect:
temp0 := θ0 - α · ∂/∂θ0 J(θ0, θ1)
θ0 := temp0
temp1 := θ1 - α · ∂/∂θ1 J(θ0, θ1)    (evaluated at the already-updated θ0)
θ1 := temp1
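The difference between the correct and incorrect update orders only shows up when the partial derivatives couple the parameters; a sketch using the toy cost J(θ0, θ1) = (θ0 + θ1)², which is my own illustrative choice, with α = 0.25:

```python
def dJ_dtheta0(theta0, theta1):
    # Partial derivative of the toy cost J = (theta0 + theta1)**2.
    return 2 * (theta0 + theta1)

def dJ_dtheta1(theta0, theta1):
    # Same by symmetry; it depends on theta0, which is the point.
    return 2 * (theta0 + theta1)

alpha = 0.25

# Correct: both derivatives are evaluated at the OLD point, then assigned.
theta0, theta1 = 1.0, 2.0
temp0 = theta0 - alpha * dJ_dtheta0(theta0, theta1)
temp1 = theta1 - alpha * dJ_dtheta1(theta0, theta1)
theta0, theta1 = temp0, temp1
print(theta0, theta1)  # -0.5 0.5

# Incorrect: theta0 is overwritten before theta1's derivative is evaluated.
t0, t1 = 1.0, 2.0
t0 = t0 - alpha * dJ_dtheta0(t0, t1)
t1 = t1 - alpha * dJ_dtheta1(t0, t1)  # sees the NEW t0, not the old one
print(t0, t1)  # -0.5 1.25 (theta1 differs from the correct 0.5)
```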
Linear regression with one variable
Gradient descent intuition
Gradient descent algorithm (one parameter): θ1 := θ1 - α · (d/dθ1) J(θ1)

If the derivative (d/dθ1) J(θ1) is positive, the update decreases θ1; if it is negative, the update increases θ1. In both cases θ1 moves toward the minimum.
If α is too small, gradient descent can be slow.
If α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.
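The effect of α is easy to see on the toy cost J(θ) = θ², whose derivative is 2θ: each step computes θ := θ - α·2θ = (1 - 2α)·θ, so the iterates shrink when |1 - 2α| < 1 and blow up when |1 - 2α| > 1. A sketch (the α values and step count are my own):

```python
def run(alpha, theta=1.0, steps=20):
    """Gradient descent on the toy cost J(theta) = theta**2 (derivative 2*theta)."""
    for _ in range(steps):
        theta = theta - alpha * 2 * theta
    return theta

print(run(0.01))  # too small: slow, still far from the minimum at 0
print(run(0.4))   # well chosen: essentially at the minimum
print(run(1.1))   # too large: overshoots every step and diverges
```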
At a local optimum the derivative is zero, so the update θ1 := θ1 - α · 0 leaves the current value of θ1 unchanged.
Gradient descent can converge to a local minimum, even with the learning rate α fixed. As we approach a local minimum, the derivative shrinks, so gradient descent will automatically take smaller steps. So, no need to decrease α over time.
Linear regression with one variable
Gradient descent for linear regression
Gradient descent algorithm:
repeat until convergence {
    θ_j := θ_j - α · ∂/∂θ_j J(θ0, θ1)    (for j = 0 and j = 1)
}

Linear Regression Model:
h_θ(x) = θ0 + θ1x
J(θ0, θ1) = (1/(2m)) Σ_{i=1..m} (h_θ(x^(i)) - y^(i))²

Working out the partial derivatives:
∂/∂θ0 J(θ0, θ1) = (1/m) Σ_{i=1..m} (h_θ(x^(i)) - y^(i))
∂/∂θ1 J(θ0, θ1) = (1/m) Σ_{i=1..m} (h_θ(x^(i)) - y^(i)) · x^(i)
Gradient descent algorithm for linear regression:

repeat until convergence {
    θ0 := θ0 - α · (1/m) Σ_{i=1..m} (h_θ(x^(i)) - y^(i))
    θ1 := θ1 - α · (1/m) Σ_{i=1..m} (h_θ(x^(i)) - y^(i)) · x^(i)
}    (update θ0 and θ1 simultaneously)
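Plugging the partial derivatives into the update rule gives a complete algorithm; a sketch on the housing data, where the learning rate, iteration count, and the rescaling of sizes to 1000s of ft² (to keep a fixed α stable) are my own choices:

```python
def batch_gradient_descent(xs, ys, alpha=0.1, iters=5000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x.

    Each iteration uses ALL m training examples, and theta0, theta1
    are updated simultaneously.
    """
    m = len(xs)
    theta0 = theta1 = 0.0
    for _ in range(iters):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                               # dJ/dtheta0
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m    # dJ/dtheta1
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

# Housing data; sizes rescaled to 1000s of ft^2 so alpha = 0.1 is stable.
xs = [2.104, 1.416, 1.534, 0.852]
ys = [460, 232, 315, 178]
theta0, theta1 = batch_gradient_descent(xs, ys)
print(theta0, theta1)  # approaches the least-squares fit, roughly -42.8 and 229.6
```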
[Figure: surface plot of J(θ0, θ1) for linear regression; the cost is a convex, bowl-shaped function with a single global minimum]
(for fixed θ0, θ1, h_θ is a function of x; J is a function of the parameters θ0, θ1)

[Figure sequence: successive gradient descent steps traced on the contour plot of J(θ0, θ1), with the corresponding fitted line h_θ(x) shown at each step]
“Batch” Gradient Descent

“Batch”: each step of gradient descent uses all the training examples.
• Batch gradient descent
• Stochastic gradient descent (also incremental gradient descent)
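For contrast, a sketch of stochastic (incremental) gradient descent, where each parameter update uses a single training example; the learning rate, epoch count, and shuffling scheme are my own choices:

```python
import random

def sgd(xs, ys, alpha=0.05, epochs=500, seed=0):
    """Stochastic gradient descent for h(x) = t0 + t1 * x:
    each update uses ONE training example, not the whole training set."""
    rng = random.Random(seed)
    t0 = t1 = 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)                     # visit examples in random order
        for i in idx:
            err = t0 + t1 * xs[i] - ys[i]    # error on a single example
            t0, t1 = t0 - alpha * err, t1 - alpha * err * xs[i]
    return t0, t1

xs = [2.104, 1.416, 1.534, 0.852]   # sizes in 1000s of ft^2
ys = [460, 232, 315, 178]           # prices in $1000s
print(sgd(xs, ys))  # hovers near the batch gradient descent solution
```

With a fixed α, SGD oscillates in a neighborhood of the minimum rather than settling exactly; its advantage is that each step is cheap when m is large.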
Questions?

Thank you!