
f(x) ≈ f(x∗) + (x − x∗)^T g_f(x∗) + (1/2)(x − x∗)^T H_f(x∗)(x − x∗),   (0.3.53)

Because g_f(x∗) = 0 at a stationary point, this expansion yields a general method of finding a stationary point of the function f(·), called Newton's method. If x is an m-vector, g_f(x) is an m-vector and H_f(x) is an m × m matrix.

Newton’s method is to choose a starting point x^(0), then, for k = 0, 1, . . ., to solve the linear systems

H_f(x^(k)) p^(k+1) = −g_f(x^(k))   (0.3.54)

for p^(k+1), and then to update the point in the domain of f(·) by

x^(k+1) = x^(k) + p^(k+1).   (0.3.55)

The two steps are repeated until there is essentially no change from one iteration to the next. If f(·) is a quadratic function, the solution is obtained in one iteration because equation (0.3.53) is exact. These two steps have a very simple form for a function of one variable.
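As a sketch of the one-variable case mentioned above, the linear system (0.3.54) reduces to a scalar division, so the update is x^(k+1) = x^(k) − f′(x^(k))/f″(x^(k)). The function, starting point, and tolerance below are illustrative assumptions, not from the text.

```python
def newton_1d(fprime, fprime2, x0, tol=1e-10, max_iter=100):
    """Find a stationary point of f by iterating x <- x - f'(x)/f''(x)."""
    x = x0
    for _ in range(max_iter):
        step = -fprime(x) / fprime2(x)  # solve H_f p = -g_f in one dimension
        x += step
        if abs(step) < tol:             # essentially no change: stop
            break
    return x

# Example: f(x) = (x - 2)^2 + 1 is quadratic, so, as the text notes,
# a single iteration reaches the stationary point x = 2 exactly.
x_star = newton_1d(lambda x: 2.0 * (x - 2.0), lambda x: 2.0, x0=10.0)
```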

0.3.4.3 Linear Least Squares

In a least squares fit of a linear model

y = Xβ + ε,   (0.3.56)

where y is an n-vector, X is an n × m matrix, and β is an m-vector, we replace β by a variable b, define the residual vector

r = y − Xb,   (0.3.57)

and minimize its Euclidean norm,

f(b) = r^T r,   (0.3.58)

with respect to the variable b. We can solve this optimization problem by taking the derivative of this sum of squares and equating it to zero. Doing this, we get

d/db (y − Xb)^T (y − Xb) = d/db (y^T y − 2b^T X^T y + b^T X^T Xb)
                         = −2X^T y + 2X^T Xb
                         = 0,

which yields the normal equations

X^T Xb = X^T y.   (0.3.59)

Theory of Statistics ©2000–2013 James E. Gentle
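The normal equations (0.3.59) can be sketched numerically as below. The data, shapes, and random seed are illustrative assumptions; the solution from the normal equations is compared against NumPy's least squares solver on the same problem.

```python
import numpy as np

# Illustrative data for y = X beta + epsilon (shapes and seed are assumptions).
rng = np.random.default_rng(0)
n, m = 50, 3
X = rng.standard_normal((n, m))
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + 0.01 * rng.standard_normal(n)

# Solve the normal equations X^T X b = X^T y directly.
b = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against the library's least squares routine.
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Forming X^T X squares the condition number of X, so in practice a QR- or SVD-based routine such as `np.linalg.lstsq` is preferred for ill-conditioned problems; for well-conditioned X the two solutions agree.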
