2003 IMTA Proceedings - International Military Testing Association


Classification and regression trees (CART; Breiman, Friedman, Olshen, & Stone, 1984) is one algorithm associated with this approach. Through brute force, CART examines all possible binary splits of the data (answers to "yes/no" questions) based on all of the predictor variables. It places the best split at the root of the tree and continues this process until no further splits can be made. The resulting decision tree is then "pruned" according to misclassification rates, user-determined preferences (e.g., the permitted number of cases in a terminal node), or the elimination of redundant nodes. Additionally, competing trees may develop depending on the nature of the data.
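
To make the brute-force splitting step concrete, the following is a minimal sketch of an exhaustive search for the best binary split at a single node. This is an illustration only, not the CART 4.0 implementation used in the paper; the function names and the use of Gini impurity as the splitting criterion are assumptions for this sketch.

```python
# Illustrative sketch of CART's node-splitting step: examine every
# candidate "yes/no" split on every predictor and keep the best one.
import numpy as np

def gini(labels):
    """Gini impurity of a set of 0/1 class labels."""
    if len(labels) == 0:
        return 0.0
    p = np.mean(labels)
    return 2.0 * p * (1.0 - p)

def best_binary_split(X, y):
    """Brute-force search over all predictors and all cut points;
    returns the (predictor, threshold) pair with lowest weighted impurity."""
    n, k = X.shape
    best = (None, None, np.inf)
    for j in range(k):                    # all predictor variables
        for t in np.unique(X[:, j]):      # all possible binary cut points
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, t, score)
    return best  # (predictor index, threshold, weighted Gini impurity)
```

CART applies this search recursively, placing the best split at the root and repeating within each resulting subsample until no further splits can be made.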

To assess classification accuracy, CART uses "v-fold cross-validation," in which the sample is divided into v subsamples; a decision tree is grown on the combined v-1 subsamples, and classification accuracy is assessed on the held-out subsample. This process is repeated so that each of the v subsamples serves as the hold-out sample exactly once. Classification accuracy is estimated as the average classification accuracy across the v hold-out subsamples.
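
As a rough illustration of this procedure, the sketch below implements the v-fold loop directly, assuming NumPy arrays and using scikit-learn's DecisionTreeClassifier as a stand-in for the CART 4.0 package described later:

```python
# Minimal sketch of v-fold cross-validation as described above.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def vfold_accuracy(X, y, v=10):
    """Divide the sample into v subsamples, grow a tree on v-1 of them,
    score the hold-out subsample, and average across the v hold-outs."""
    folds = np.array_split(np.random.permutation(len(y)), v)
    scores = []
    for i in range(v):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(v) if j != i])
        tree = DecisionTreeClassifier().fit(X[train], y[train])
        scores.append(tree.score(X[test], y[test]))  # hold-out accuracy
    return np.mean(scores)
```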

As mentioned earlier, CART may delineate separate "paths" that lead to the same outcome, identifying configural relationships in the data. Also, CART may reuse any number of variables in separate parts of the tree and thus may capture non-linear relationships. Below we describe an investigation that examined whether we could improve upon the Adaptability Composite in predicting attrition with the AIM.

Applying Decision Tree Methodology to AIM Data<br />

Sample, data, and software<br />

A file containing the data from 22,328 enlisted U.S. Army personnel was created, containing the AIM scale scores and an attrition variable at 12 months of service. The data for this file came from the AIM Grand Research Database, managed by ARI and a contractor, the Human Resources Research Organization (HumRRO). This database is the source of much of the recent research surrounding the AIM; for this database, the AIM was administered to these personnel between 1998 and 1999 for research purposes only (Knapp, Heggestad, & Young, in preparation). The 12-month time interval was selected because it provided the CART 4.0 software package (Breiman, Friedman, Olshen, & Stone, 1997) with a sufficient number of respondents with which to "grow" a tree. The six content scales were used as input (predictor variables), and the 12-month attrition variable (criterion) was treated dichotomously (i.e., "stayers" and "leavers").
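
A hypothetical sketch of this setup follows; the file name and scale column names are assumptions for illustration only and do not reflect the actual Grand Research Database fields or the CART 4.0 software:

```python
# Hypothetical sketch of the analysis setup: six content scale scores
# predicting a dichotomous 12-month attrition criterion, evaluated with
# v=10 cross-validation. Column names and file path are assumptions.
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

data = pd.read_csv("aim_scores.csv")              # hypothetical file
scales = ["adjustment", "agreeableness", "dependability",
          "leadership", "physical_conditioning", "work_orientation"]
X = data[scales]                # six content scale scores (predictors)
y = data["attrit_12mo"]         # dichotomous criterion: stayer=0, leaver=1

tree = DecisionTreeClassifier()
print(cross_val_score(tree, X, y, cv=10).mean())  # v-fold accuracy
```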

Results<br />

The analysis yielded 39 trees ranging in complexity from two terminal nodes to a tree with 2,802 terminal nodes and a depth of 51 levels, or tiers. However, the larger trees exhibited high rates of misclassification among the stayers (as much as 60 percent). Of particular interest were five trees resulting from this analysis. Table 1 summarizes the misclassification rates for these five "best" trees using the v-fold cross-validation approach (v = 10).

