Relax and Randomize: From Value to Algorithms

Proof of Proposition 5. We would like to show that, with the distribution $q^*_t$ defined in (15),
\[
\max_{y_t \in \{\pm1\}} \Big\{ \mathbb{E}_{\hat{y}_t \sim q^*_t} |\hat{y}_t - y_t| + \mathrm{Rel}_T(\mathcal{F} \mid (x^t, y^t)) \Big\} \le \mathrm{Rel}_T(\mathcal{F} \mid (x^{t-1}, y^{t-1}))
\]
for any $x_t \in \mathcal{X}$. Let $\sigma \in \{\pm1\}^{t-1}$ and $\sigma_t \in \{\pm1\}$. We have
\begin{align*}
&\mathrm{Rel}_T(\mathcal{F} \mid (x^t, y^t)) - 2\lambda(T - t) \\
&= \frac{1}{\lambda} \log \Big( \sum_{(\sigma,\sigma_t) \in \mathcal{F}|_{x^t}} g(\mathrm{Ldim}(\mathcal{F}_t(\sigma,\sigma_t)), T - t) \exp\{-\lambda L_{t-1}(\sigma)\} \exp\{-\lambda |\sigma_t - y_t|\} \Big) \\
&\le \frac{1}{\lambda} \log \Big( \sum_{\sigma_t \in \{\pm1\}} \exp\{-\lambda |\sigma_t - y_t|\} \sum_{\sigma : (\sigma,\sigma_t) \in \mathcal{F}|_{x^t}} g(\mathrm{Ldim}(\mathcal{F}_t(\sigma,\sigma_t)), T - t) \exp\{-\lambda L_{t-1}(\sigma)\} \Big)
\end{align*}

Just as in the proof of Proposition 3, we may think of the two choices of $\sigma_t$ as the two experts whose weighting $q^*_t$ is given by the sum involving the Littlestone's dimension of subsets of $\mathcal{F}$. Introducing the normalization term, we arrive at the upper bound
\begin{align*}
&\frac{1}{\lambda} \log \big( \mathbb{E}_{\sigma_t \sim q^*_t} \exp\{-\lambda |\sigma_t - y_t|\} \big) + \frac{1}{\lambda} \log \Big( \sum_{\sigma_t \in \{\pm1\}} \sum_{\sigma : (\sigma,\sigma_t) \in \mathcal{F}|_{x^t}} g(\mathrm{Ldim}(\mathcal{F}_t(\sigma,\sigma_t)), T - t) \exp\{-\lambda L_{t-1}(\sigma)\} \Big) \\
&\le -\mathbb{E}_{\sigma_t \sim q^*_t} |\sigma_t - y_t| + 2\lambda + \frac{1}{\lambda} \log \Big( \sum_{\sigma_t \in \{\pm1\}} \sum_{\sigma : (\sigma,\sigma_t) \in \mathcal{F}|_{x^t}} g(\mathrm{Ldim}(\mathcal{F}_t(\sigma,\sigma_t)), T - t) \exp\{-\lambda L_{t-1}(\sigma)\} \Big)
\end{align*}
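As a concrete illustration of this two-expert weighting, $q^*_t(\sigma_t)$ is proportional to the total weight of the pairs $(\sigma, \sigma_t)$ sharing that value of $\sigma_t$. A minimal numeric sketch (the dictionary entries are made-up toy stand-ins for the terms $g(\mathrm{Ldim}(\mathcal{F}_t(\sigma,\sigma_t)), T-t)\exp\{-\lambda L_{t-1}(\sigma)\}$, not values from the paper):

```python
# Toy weights for pairs (sigma, sigma_t); each value stands in for
# g(Ldim(F_t(sigma, sigma_t)), T - t) * exp(-lam * L_{t-1}(sigma)).
weights = {
    ((+1, +1), +1): 0.5,
    ((+1, +1), -1): 0.25,
    ((+1, -1), +1): 0.125,
    ((-1, +1), -1): 0.125,
}

def q_star(weights):
    """q*_t(sigma_t) is proportional to the total weight of the
    pairs (sigma, sigma_t) with that value of sigma_t."""
    totals = {+1: 0.0, -1: 0.0}
    for (_sigma, sigma_t), w in weights.items():
        totals[sigma_t] += w
    z = totals[+1] + totals[-1]
    return {s: v / z for s, v in totals.items()}

q = q_star(weights)
# q[+1] = (0.5 + 0.125) / 1.0 = 0.625 and q[-1] = 0.375
```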

The last step is due to Lemma A.1 in [6]. It remains to show that the log normalization term is upper bounded by the relaxation at the previous step; together with the terms $2\lambda(T - t) + 2\lambda = 2\lambda(T - t + 1)$ collected above, this yields the claim:
\begin{align*}
&\frac{1}{\lambda} \log \Big( \sum_{\sigma_t \in \{\pm1\}} \sum_{\sigma : (\sigma,\sigma_t) \in \mathcal{F}|_{x^t}} g(\mathrm{Ldim}(\mathcal{F}_t(\sigma,\sigma_t)), T - t) \exp\{-\lambda L_{t-1}(\sigma)\} \Big) \\
&\le \frac{1}{\lambda} \log \Big( \sum_{\sigma \in \mathcal{F}|_{x^{t-1}}} \exp\{-\lambda L_{t-1}(\sigma)\} \sum_{\sigma_t \in \{\pm1\}} g(\mathrm{Ldim}(\mathcal{F}_t(\sigma,\sigma_t)), T - t) \Big) \\
&\le \frac{1}{\lambda} \log \Big( \sum_{\sigma \in \mathcal{F}|_{x^{t-1}}} \exp\{-\lambda L_{t-1}(\sigma)\} \, g(\mathrm{Ldim}(\mathcal{F}_{t-1}(\sigma)), T - t + 1) \Big) \\
&= \mathrm{Rel}_T(\mathcal{F} \mid (x^{t-1}, y^{t-1})) - 2\lambda(T - t + 1)
\end{align*}

To justify the last inequality, note that $\mathcal{F}_{t-1}(\sigma) = \mathcal{F}_t(\sigma, +1) \cup \mathcal{F}_t(\sigma, -1)$ and at most one of $\mathcal{F}_t(\sigma, +1)$ or $\mathcal{F}_t(\sigma, -1)$ can have Littlestone's dimension $\mathrm{Ldim}(\mathcal{F}_{t-1}(\sigma))$. We now appeal to the recursion
\[
g(d, T - t) + g(d - 1, T - t) \le g(d, T - t + 1),
\]
where $g(d, T - t)$ is the size of the zero cover for a class with Littlestone's dimension $d$ on the worst-case tree of depth $T - t$ (see [14]). This completes the proof of admissibility.
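If one takes the closed form $g(d, n) = \sum_{i=0}^{d} \binom{n}{i}$ for the zero-cover size (an assumption imported from [14], not restated here), the recursion can be checked numerically; Pascal's rule in fact gives equality:

```python
from math import comb

def g(d, n):
    # Zero-cover size bound for Littlestone dimension d at depth n:
    # g(d, n) = sum_{i=0}^{d} C(n, i).
    return sum(comb(n, i) for i in range(d + 1))

# Pascal's rule C(n, i) + C(n, i-1) = C(n+1, i) makes the recursion
# g(d, n) + g(d-1, n) <= g(d, n+1) hold with equality for this form.
for d in range(1, 6):
    for n in range(d, 12):
        assert g(d, n) + g(d - 1, n) == g(d, n + 1)
```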

Alternative Method

Let us now derive the algorithm. Once again, consider the optimization problem
\[
\max_{y_t \in \{\pm1\}} \Big\{ \mathbb{E}_{\hat{y}_t \sim q^*_t} |\hat{y}_t - y_t| + \mathrm{Rel}_T(\mathcal{F} \mid (x^t, y^t)) \Big\}
\]
with the relaxation
\[
\mathrm{Rel}_T(\mathcal{F} \mid (x^t, y^t)) = \frac{1}{\lambda} \log \Big( \sum_{\sigma \in \mathcal{F}|_{x^t}} g(\mathrm{Ldim}(\mathcal{F}_t(\sigma)), T - t) \exp\{-\lambda L_t(\sigma)\} \Big) + 2\lambda(T - t)
\]
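For a finite projected class, this relaxation is straightforward to evaluate once the weights are available. A hedged sketch (the function and argument names are mine; `weights` stands in for the precomputed terms $g(\mathrm{Ldim}(\mathcal{F}_t(\sigma)), T-t)\exp\{-\lambda L_t(\sigma)\}$):

```python
import math

def relaxation(weights, lam, steps_left):
    """Evaluate the relaxation for a finite projected class:
    (1/lam) * log(sum of weights) + 2 * lam * (T - t).
    `weights` maps each sigma in F|_{x^t} to the precomputed term
    g(Ldim(F_t(sigma)), T - t) * exp(-lam * L_t(sigma))."""
    return math.log(sum(weights.values())) / lam + 2.0 * lam * steps_left
```

For instance, two surviving patterns of equal weight at the last step ($T - t = 0$) give $\frac{1}{\lambda}\log 2$.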

The maximum can be written explicitly, as in Section 3:
\begin{align*}
\max \Big\{ \, & 1 - q^*_t + \frac{1}{\lambda} \log \Big( \sum_{(\sigma,\sigma_t) \in \mathcal{F}|_{x^t}} g(\mathrm{Ldim}(\mathcal{F}_t(\sigma,\sigma_t)), T - t) \exp\{-\lambda L_{t-1}(\sigma)\} \exp\{-\lambda(1 - \sigma_t)\} \Big), \\
& 1 + q^*_t + \frac{1}{\lambda} \log \Big( \sum_{(\sigma,\sigma_t) \in \mathcal{F}|_{x^t}} g(\mathrm{Ldim}(\mathcal{F}_t(\sigma,\sigma_t)), T - t) \exp\{-\lambda L_{t-1}(\sigma)\} \exp\{-\lambda(1 + \sigma_t)\} \Big) \Big\}
\end{align*}
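Following the equalization argument of Section 3, the mean prediction $q^*_t$ can be chosen to balance the two branches: writing $A$ and $B$ for the two $\frac{1}{\lambda}\log(\cdot)$ terms, $1 - q + A = 1 + q + B$ gives $q = (A - B)/2$, clipped to $[-1, 1]$. A sketch under that assumption (the helper name is hypothetical):

```python
def predict_mean(log_sum_y_plus, log_sum_y_minus, lam):
    """Mean prediction q*_t in [-1, 1] equalizing the two branches:
    1 - q + A = 1 + q + B  =>  q = (A - B) / 2, then clip to [-1, 1].
    The arguments are the raw log-sums, so A = log_sum_y_plus / lam
    and B = log_sum_y_minus / lam."""
    a = log_sum_y_plus / lam
    b = log_sum_y_minus / lam
    return max(-1.0, min(1.0, (a - b) / 2.0))
```

Since $\hat{y}_t \in \{\pm1\}$, one then predicts $+1$ with probability $(1 + q^*_t)/2$.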
