06.06.2013 Views

Theory of Statistics - George Mason University

Theory of Statistics - George Mason University

Theory of Statistics - George Mason University

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

0.4.1.2 Methods<br />

0.4 Optimization 815<br />

• Analytic: yields closed form for all local maxima.<br />

• Iterative: for k = 1, 2, . . ., given x (k−1) choose x (k) so that f(x (k) ) →local<br />

maximum <strong>of</strong> f.<br />

We need<br />

– a starting point: x (0) ;<br />

– a method to choose ˜x with good prospects <strong>of</strong> being x (k) ;<br />

– a method to decide whether ˜x should be x (k) .<br />

How we choose and decide determines the differences between optimization<br />

algorithms.<br />

How to choose may be based on derivative information or on some systematic<br />

method <strong>of</strong> exploring the domain.<br />

How to decide may be based on a deterministic criterion, such as requiring<br />

f(x (k) ) > f(x (k−1) ),<br />

or the decision may be randomized.<br />

0.4.1.3 Metamethods: General Tools<br />

• Transformations (for either analytic or iterative methods).<br />

• Any trick you can think <strong>of</strong> (for either analytic or iterative methods), e.g.,<br />

alternating conditional optimization.<br />

• Conditional bounding functions (for iterative methods).<br />

0.4.1.4 Convergence <strong>of</strong> Iterative Algorithms<br />

In an iterative algorithm, we have a sequence f x (k) , x (k) .<br />

The first question is whether the sequence converges to the correct solution.<br />

If there is a unique maximum, and if x∗ is the point at which the maximum<br />

occurs, the first question can be posed more precisely as, given ɛ1 does there<br />

exist M1 such that for k > M1,<br />

<br />

<br />

f(x (k) <br />

<br />

) − f(x∗) < ɛ1;<br />

or, alternatively, for a given ɛ2 does there exist M2 such that for k > M2,<br />

<br />

<br />

x (k) <br />

<br />

− x∗<br />

< ɛ2.<br />

Recall that f : IR d ↦→ IR, so | · | above is the absolute value, while · is<br />

some kind <strong>of</strong> vector norm.<br />

There are complications if x∗ is not unique as the point at which the<br />

maximum occurs.<br />

Similarly, there are comlications if x∗ is merely a point <strong>of</strong> local maximum.<br />

<strong>Theory</strong> <strong>of</strong> <strong>Statistics</strong> c○2000–2013 James E. Gentle

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!