13.07.2015 Views

Boosted Regression (Boosting): An introductory tutorial and a Stata ...


my experience the cross-validated R² as a function of the number of iterations is unimodal, i.e. there is only one maximum. If bestiter is too close to maxiter, then the number of iterations that maximizes the likelihood may be greater than maxiter. In that case it is recommended to rerun the command with a larger value for maxiter. The command I am giving is:

    boost y x1-x4, distribution(normal) train(0.5) maxiter(4000) seed(1) bag(0.5) interaction(`inter') shrink(0.01)

where the commands differ only in inter, which ranges from 1 through 5. One of these boost commands runs in 8.8 seconds on my laptop. Fixing the seed is only relevant for bagging.

Figure 4 shows a plot of the test R² versus the number of interactions. The test R² is roughly the same regardless of the number of interactions (note the scale of the plot). The fact that the test R² is high even for the main effect model (interaction=1) is not surprising because our model did not contain any interactions. The actual number of iterations that maximizes the likelihood, bestiter, varies. Here the numbers of iterations are (number of interactions in parentheses): 3769 (1), 2171 (2), 2401 (3), 1659 (4), 1156 (5).
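The five runs described above could be scripted as a loop over the interaction depth. The following is a minimal sketch, not the author's own do-file. It assumes that the boost plugin is installed and that the variables y and x1-x4 are in memory. It also assumes that the command leaves its results in e(bestiter) and e(test_R2); running `ereturn list` after one call should confirm the actual names. The 0.9 cutoff for "too close to maxiter" is an illustrative choice and does not come from the text.

```stata
* Sketch: rerun boost for interaction = 1, ..., 5 and flag runs
* whose best iteration lies near the maxiter ceiling.
local maxiter 4000
local cutoff 0.9    // assumed threshold for "too close", not from the source

forvalues inter = 1/5 {
    boost y x1-x4, distribution(normal) train(0.5) maxiter(`maxiter') ///
        seed(1) bag(0.5) interaction(`inter') shrink(0.01)

    * Assumed stored results; verify the names with -ereturn list-.
    display "interaction = `inter': bestiter = " e(bestiter) ///
        ", test R2 = " e(test_R2)

    if e(bestiter) > `cutoff' * `maxiter' {
        display "  bestiter is close to maxiter; consider a larger maxiter()"
    }
}
```

With the values reported in the text, only the main effect model (bestiter of 3769 against a maxiter of 4000) would be flagged under this cutoff.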
