14.03.2014 Views

Modeling and Multivariate Methods - SAS


Chapter 13 Recursively Partitioning Data

Partition Method

Bootstrap Forest

The Bootstrap Forest method grows many trees and averages their predicted values to get the final predicted value. Each tree is grown on a different random sample (with replacement) of observations, and each split on each tree considers only a random sample of candidate columns for splitting. The process can use validation to assess how many trees to grow, not to exceed the specified number of trees.
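The averaging idea can be sketched in a few lines of Python. This is a simplified illustration, not JMP's implementation: each "tree" is a single split (a stump) chosen by squared error, validation-based stopping is omitted, and all function names are hypothetical.

```python
import random

def fit_stump(rows, targets, columns):
    """Pick the one split (column, threshold) with the lowest squared error."""
    best = None
    for col in columns:
        for threshold in sorted({r[col] for r in rows}):
            left = [t for r, t in zip(rows, targets) if r[col] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[col] > threshold]
            if not left or not right:
                continue  # a split must leave rows on both sides
            lmean, rmean = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((t - lmean) ** 2 for t in left)
                   + sum((t - rmean) ** 2 for t in right))
            if best is None or sse < best[0]:
                best = (sse, col, threshold, lmean, rmean)
    if best is None:  # no usable split: predict the sample mean everywhere
        mean = sum(targets) / len(targets)
        return (None, None, mean, mean)
    return best[1:]

def predict_stump(stump, row):
    col, threshold, lmean, rmean = stump
    if col is None:
        return lmean
    return lmean if row[col] <= threshold else rmean

def fit_forest(rows, targets, n_trees, n_cols_per_split, rng):
    n, n_cols = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        # New bootstrap sample of observations (with replacement) per tree.
        idx = [rng.randrange(n) for _ in range(n)]
        # New random set of candidate columns for the split.
        cols = rng.sample(range(n_cols), n_cols_per_split)
        forest.append(fit_stump([rows[i] for i in idx],
                                [targets[i] for i in idx], cols))
    return forest

def predict_forest(forest, row):
    """Final prediction is the average of the individual tree predictions."""
    return sum(predict_stump(s, row) for s in forest) / len(forest)

# Illustrative data: the response steps from 0 to 1 at x = 10.
rng = random.Random(0)
rows = [(x, x % 3) for x in range(20)]
targets = [0.0 if x < 10 else 1.0 for x in range(20)]
forest = fit_forest(rows, targets, n_trees=25, n_cols_per_split=1, rng=rng)
low = predict_forest(forest, (2, 2))
high = predict_forest(forest, (17, 2))
```

Because every tree sees a different sample and a different candidate column, individual predictions vary, and the average smooths them out. Here the averaged prediction for a row with a small first value cannot exceed the one for a row with a large first value.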

Another word for bootstrap-averaging is bagging. Those observations included in the growing of a tree are called the in-bag sample, abbreviated IB. Those not included are called the out-of-bag sample, abbreviated OOB.
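The in-bag and out-of-bag samples for one tree can be illustrated with a short Python sketch. The function name and the row count are hypothetical, and the sketch assumes the bootstrap sample has as many draws as there are rows.

```python
import random

def bootstrap_split(n_rows, rng):
    """Draw n_rows row indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    drawn = set(in_bag)
    # Rows that were never drawn form the out-of-bag sample.
    out_of_bag = [i for i in range(n_rows) if i not in drawn]
    return in_bag, out_of_bag

rng = random.Random(1)
in_bag, out_of_bag = bootstrap_split(1000, rng)
oob_fraction = len(out_of_bag) / 1000
```

Under this assumption, the chance that a given row is never drawn is (1 - 1/n)^n, which approaches 1/e (about 0.37) as n grows, so roughly a third of the rows are typically out-of-bag for any one tree.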

Bootstrap Forest Fitting Options<br />

If the Bootstrap Forest method is selected on the platform launch window, the Bootstrap Forest options window appears after clicking OK. Figure 13.10 shows the window using the Car Poll.jmp data table. The column sex is used as the response, and the other columns are used as the predictors.

Figure 13.10 Bootstrap Forest Fitting Options<br />

The options on the Bootstrap Forest options window are described here:<br />

Number of rows gives the number of observations in the data table.

Number of terms gives the number of columns specified as predictors.

Number of trees in the forest is the number of trees to grow, and then average together.

Number of terms sampled per split is the number of columns to consider as splitting candidates at each split. For each split, a new random sample of columns is taken as the candidate set.

Bootstrap sample rate is the proportion of observations to sample (with replacement) for growing each tree. A new random sample is generated for each tree.

Minimum Splits Per Tree is the minimum number of splits for each tree.

Minimum Size Split is the minimum number of observations needed on a candidate split.
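To make the roles of these options concrete, the following Python sketch collects them in one structure and shows how the sampling options might be applied. The class, field, and method names are hypothetical and are not a JMP API. The values are illustrative and are not taken from Figure 13.10. The interpretation of Minimum Size Split as a requirement on both sides of a split is an assumption.

```python
import random
from dataclasses import dataclass

@dataclass
class ForestOptions:
    # Hypothetical field names mirroring the labels in the options window.
    n_rows: int
    n_terms: int
    n_trees: int
    terms_per_split: int
    bootstrap_sample_rate: float
    min_splits_per_tree: int
    min_size_split: int

    def __post_init__(self):
        if not 1 <= self.terms_per_split <= self.n_terms:
            raise ValueError("terms_per_split must be between 1 and n_terms")
        if self.bootstrap_sample_rate <= 0:
            raise ValueError("bootstrap_sample_rate must be positive")

    def rows_per_tree(self):
        """Number of observations drawn (with replacement) for each tree."""
        return max(1, round(self.bootstrap_sample_rate * self.n_rows))

    def draw_rows(self, rng):
        """New random sample of row indices, generated once per tree."""
        return [rng.randrange(self.n_rows) for _ in range(self.rows_per_tree())]

    def draw_candidate_terms(self, rng):
        """New random set of candidate columns, generated once per split."""
        return rng.sample(range(self.n_terms), self.terms_per_split)

    def split_allowed(self, n_left, n_right):
        # Assumption: both sides of a split need at least min_size_split rows.
        return min(n_left, n_right) >= self.min_size_split

opts = ForestOptions(n_rows=200, n_terms=5, n_trees=100, terms_per_split=2,
                     bootstrap_sample_rate=0.5, min_splits_per_tree=10,
                     min_size_split=5)
rng = random.Random(2)
tree_rows = opts.draw_rows(rng)
split_terms = opts.draw_candidate_terms(rng)
```

With these illustrative values, each tree is grown on 100 sampled rows, and each split chooses among 2 of the 5 predictor columns.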
