
0.3 Some Basics of Linear Algebra

Beyond the basics of differentiation of constant multiples or powers of a variable, the two most important properties of derivatives of expressions are the linearity of the operation and the chaining of the operation. These yield rules that correspond to the familiar rules of the differential calculus. A simple result of the linearity of the operation is the rule for differentiation of the trace:

$$
\frac{\partial}{\partial x}\,\mathrm{tr}(Y) \;=\; \mathrm{tr}\!\left(\frac{\partial}{\partial x} Y\right).
$$
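As a quick numerical sketch (not from the text), the trace rule can be checked with central differences for a hypothetical matrix-valued function $Y(x)$ of a scalar $x$: differentiating the trace directly agrees with taking the trace of the elementwise derivative.

```python
import math

# Hypothetical example matrix Y(x); any differentiable Y(x) would do.
def Y(x):
    return [[x**2,        math.sin(x)],
            [math.exp(x), 3 * x]]

def trace(M):
    return sum(M[i][i] for i in range(len(M)))

def num_deriv(f, x, h=1e-6):
    # Central-difference approximation to df/dx.
    return (f(x + h) - f(x - h)) / (2 * h)

x0 = 0.7
# Left side: d/dx tr(Y(x)).
lhs = num_deriv(lambda x: trace(Y(x)), x0)
# Right side: tr(dY/dx), differentiating each entry of Y.
dY = [[num_deriv(lambda x, i=i, j=j: Y(x)[i][j], x0) for j in range(2)]
      for i in range(2)]
rhs = trace(dY)
# Here tr(Y) = x^2 + 3x, so both sides equal 2x + 3.
```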

0.3.3.15 Higher-Order Derivatives with Respect to Vectors; The Hessian

Higher-order derivatives are derivatives of lower-order derivatives. As we have seen, a derivative of a given function with respect to a vector is a more complicated object than the original function. The simplest higher-order derivative with respect to a vector is the second-order derivative of a scalar-valued function. Higher-order derivatives may become uselessly complicated.

In accordance with the meaning of derivatives of vectors with respect to vectors, the second derivative of a scalar-valued function with respect to a vector is a matrix of the partial derivatives of the function with respect to the elements of the vector. This matrix is called the Hessian, and is denoted by $\mathrm{H}f$ or sometimes by $\nabla\nabla f$ or $\nabla^2 f$:

$$
\mathrm{H}f \;=\; \frac{\partial^2 f}{\partial x\,\partial x^{\mathrm{T}}} \;=\;
\begin{bmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1\,\partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1\,\partial x_m} \\[1ex]
\dfrac{\partial^2 f}{\partial x_2\,\partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2\,\partial x_m} \\[1ex]
\vdots & \vdots & \ddots & \vdots \\[1ex]
\dfrac{\partial^2 f}{\partial x_m\,\partial x_1} & \dfrac{\partial^2 f}{\partial x_m\,\partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_m^2}
\end{bmatrix}
\qquad (0.3.51)
$$

The Hessian is a function of $x$. We write $\mathrm{H}f(x)$ or $\nabla\nabla f(x)$ or $\nabla^2 f(x)$ for the value of the Hessian at $x$.
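A small numerical sketch (the matrix $A$ and evaluation point below are hypothetical, not from the text): for the quadratic form $f(x) = x^{\mathrm{T}} A x$, the matrix of second partials defined in (0.3.51) is the constant matrix $A + A^{\mathrm{T}}$, which a central-difference approximation recovers.

```python
# Hypothetical example matrix for the quadratic form f(x) = x^T A x.
A = [[2.0, 1.0],
     [0.0, 3.0]]

def f(x):
    # x^T A x for a 2-vector x.
    return sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))

def hessian(f, x, h=1e-4):
    # Central-difference approximation to the matrix of second partials,
    # H[i][j] ~ d^2 f / (dx_i dx_j).
    m = len(x)
    H = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            xpp = list(x); xpp[i] += h; xpp[j] += h
            xpm = list(x); xpm[i] += h; xpm[j] -= h
            xmp = list(x); xmp[i] -= h; xmp[j] += h
            xmm = list(x); xmm[i] -= h; xmm[j] -= h
            H[i][j] = (f(xpp) - f(xpm) - f(xmp) + f(xmm)) / (4 * h * h)
    return H

H = hessian(f, [0.5, -1.2])
# For a quadratic form the Hessian is constant and equals A + A^T.
expected = [[4.0, 1.0],
            [1.0, 6.0]]
```

Note that the Hessian is symmetric even though $A$ here is not, since $A + A^{\mathrm{T}}$ is always symmetric.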

0.3.3.16 Summary of Derivatives with Respect to Vectors

As we have seen, the derivatives of functions are complicated by the problem of measuring the change in the function, but often the derivatives of functions with respect to a vector can be determined by using familiar scalar differentiation. In general, we see that

- the derivative of a scalar-valued function (such as a quadratic form) with respect to a vector is a vector, and
- the derivative of a vector with respect to a vector is a matrix.

Table 0.3 lists formulas for the vector derivatives of some common expressions. The derivative $\partial f/\partial x^{\mathrm{T}}$ is the transpose of $\partial f/\partial x$.
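One standard entry of such tables can be verified numerically: for the quadratic form $f(x) = x^{\mathrm{T}} A x$, the derivative with respect to $x$ is the vector $(A + A^{\mathrm{T}})x$. The matrix $A$ and the point $x$ below are hypothetical example values, not taken from the text.

```python
# Hypothetical example values.
A = [[1.0, 2.0],
     [4.0, 3.0]]
x = [0.5, -1.0]

def f(v):
    # Quadratic form v^T A v.
    return sum(v[i] * A[i][j] * v[j] for i in range(2) for j in range(2))

def grad(f, v, h=1e-6):
    # Central-difference approximation to the gradient vector.
    g = []
    for i in range(len(v)):
        vp = list(v); vp[i] += h
        vm = list(v); vm[i] -= h
        g.append((f(vp) - f(vm)) / (2 * h))
    return g

numeric = grad(f, x)
# Closed form from the table: (A + A^T) x.
closed = [sum((A[i][j] + A[j][i]) * x[j] for j in range(2)) for i in range(2)]
```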

Theory of Statistics © 2000–2013 James E. Gentle
