
where C(M) is a complexity penalty that increases with the number of basis functions in the model and which is defined as:

$$C(M) = (M + 1) + d\,M$$

where M is the number of basis functions in equation 7 and the parameter d is a penalty for each basis function included in the model; d can also be regarded as a smoothing parameter. In the present research, d equals 2. This value can be chosen by the model user, but it should be noted that a smaller d generates a larger model with more basis functions, whereas a larger d creates a smaller model with fewer basis functions (Kriner, 2007). Further details on the selection of the d parameter can be found in Friedman (1991).
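For illustration only (this code is not from the paper), the complexity penalty and a generalized cross-validation score built on it can be written as a short Python function. The GCV denominator form is the standard one from Friedman (1991); since the GCV equation itself is not reproduced here, that form and all variable names (y, y_hat, M, d) should be read as assumptions.

```python
import numpy as np

def complexity(M, d=2.0):
    # Complexity penalty C(M) = (M + 1) + d * M, with d = 2 as in the text
    return (M + 1) + d * M

def gcv(y, y_hat, M, d=2.0):
    # Generalized cross-validation score for a MARS model with M basis functions:
    # mean squared residual inflated by the complexity penalty (Friedman, 1991)
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    n = y.size
    return np.mean((y - y_hat) ** 2) / (1.0 - complexity(M, d) / n) ** 2
```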

The maximum interaction level of the spline basis functions is restricted to 10, as the application of the MARS model to the studied data set confirmed that the maximum interaction level at the optimal model size M* was 2.
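As a hypothetical configuration (the paper does not state which software was used), the open-source py-earth implementation of MARS exposes both settings discussed above: max_degree caps the interaction level and penalty corresponds to the smoothing parameter d. The data below are synthetic placeholders.

```python
import numpy as np
from pyearth import Earth   # open-source MARS implementation; not cited in the paper

rng = np.random.default_rng(1)
X = rng.uniform(size=(200, 4))                        # placeholder predictors
y = X[:, 0] * X[:, 1] + rng.normal(0.0, 0.1, 200)     # placeholder response

# max_degree caps the interaction level of the hinge basis functions,
# and penalty plays the role of the smoothing parameter d in the GCV criterion.
model = Earth(max_degree=10, penalty=2.0)
model.fit(X, y)
print(model.summary())
```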

The main steps of the MARS algorithm as applied in this research can be summarized as follows (Sekulic and Kowalski, 1992); a simplified code sketch is given after the list:

1. Select the maximum allowed complexity for the model and define the d parameter.
2. Forward stepwise selection:
   a. Start with the simplest model, i.e. with the constant coefficient only.
   b. Explore the space of the basis functions for each explanatory variable.
   c. Determine the number of basis functions (M) that minimizes the prediction error and include them in the model.
   d. Repeat steps 2b and 2c until a model with the predetermined complexity is derived.
3. Backward stepwise selection:
   a. Search the entire set of basis functions (excluding the constant) and delete from the model the one that contributes least to the overall goodness of fit according to the GCV criterion.
   b. Repeat step 3a until GCV reaches its minimum.
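The sketch below is a minimal illustration of these two passes, assuming a fixed dictionary of main-effect hinge candidates; the real MARS forward pass adds basis functions in mirrored pairs and also builds interaction terms, so this is only meant to make the logic of the list concrete. All function and variable names are illustrative, not taken from the paper.

```python
import numpy as np

def gcv(y, y_hat, M, d=2.0):
    # GCV lack-of-fit with the complexity penalty C(M) = (M + 1) + d * M
    n = len(y)
    c = (M + 1) + d * M
    return np.mean((y - y_hat) ** 2) / (1.0 - c / n) ** 2

def ls_fit(y, B):
    # Ordinary least-squares fit of y on the columns of B; returns fitted values
    coef, *_ = np.linalg.lstsq(B, y, rcond=None)
    return B @ coef

def simplified_mars(X, y, max_terms=10, d=2.0):
    n, p = X.shape
    # Candidate hinge functions max(0, x_j - t) and max(0, t - x_j) with knots t
    # at the observed values (main effects only; real MARS also builds interactions)
    candidates = []
    for j in range(p):
        for t in np.unique(X[:, j]):
            candidates.append(np.maximum(0.0, X[:, j] - t))
            candidates.append(np.maximum(0.0, t - X[:, j]))

    # Forward stepwise selection (step 2): start from the constant-only model and
    # greedily add the candidate that most reduces the residual sum of squares
    chosen, B = [], np.ones((n, 1))
    while len(chosen) < max_terms:
        best_rss, best_k = None, None
        for k, b in enumerate(candidates):
            if k in chosen:
                continue
            trial = np.column_stack([B, b])
            rss = np.sum((y - ls_fit(y, trial)) ** 2)
            if best_rss is None or rss < best_rss:
                best_rss, best_k = rss, k
        if best_k is None:
            break
        chosen.append(best_k)
        B = np.column_stack([B, candidates[best_k]])

    # Backward stepwise selection (step 3): repeatedly delete the basis function
    # whose removal yields the lowest GCV, keeping the minimum-GCV model found
    best_B, best_gcv = B, gcv(y, ls_fit(y, B), B.shape[1] - 1, d)
    while B.shape[1] > 1:
        scores = []
        for col in range(1, B.shape[1]):        # the constant term is never deleted
            trial = np.delete(B, col, axis=1)
            scores.append((gcv(y, ls_fit(y, trial), trial.shape[1] - 1, d), col))
        g, col = min(scores)
        B = np.delete(B, col, axis=1)
        if g < best_gcv:
            best_gcv, best_B = g, B
    return best_B, best_gcv
```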

The predetermined complexity of the MARS model during the forward stepwise selection (step 2) should be considerably larger than the optimal (minimal GCV) model size M*, so choosing 2M* as the minimum predetermined complexity for the model is in general sufficient (Friedman and Roosen, 1995).
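Continuing the sketch above (and reusing its simplified_mars function), the rule of thumb of growing the model to at least twice the optimal size could be applied as follows; M* = 2 is taken from the text, while the data are synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(size=(100, 3))                                   # synthetic predictors
y = (np.maximum(0.0, X[:, 0] - 0.5)
     + 2.0 * np.maximum(0.0, 0.3 - X[:, 1])
     + rng.normal(0.0, 0.05, 100))                               # synthetic response

M_star = 2                              # optimal model size reported in the text
B, score = simplified_mars(X, y, max_terms=2 * M_star, d=2.0)    # grow to 2 * M*
print(B.shape[1] - 1, "basis functions retained, GCV =", round(score, 4))
```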

The predictive ability of the MARS model can be evaluated in terms of the root mean squared error of cross-validation (RMSECV) and the squared leave-one-out correlation coefficient (q²). To compute RMSECV, one object is left out from the data set and the model is constructed for the remaining n − 1 objects. Then the model is used to predict the value for the object which is left out. When all objects have been left out once, RMSECV is given by the following expression (Friedman and Roosen, 1995):


$$\mathrm{RMSECV} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_{(i)}\right)^2} \qquad (9)$$

where y_i is the observed value for object i and ŷ_(i) is the value predicted for object i by the model built without it.
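A sketch of this leave-one-out procedure, computing both RMSECV from equation (9) and the squared leave-one-out correlation coefficient q², might look as follows; fit_predict is a placeholder name for whatever modelling routine is refit on each reduced data set, and the py-earth call shown in the comments is only a hypothetical example.

```python
import numpy as np

def loo_rmsecv_q2(X, y, fit_predict):
    """Leave-one-out RMSECV (equation 9) and squared LOO correlation coefficient q^2.

    fit_predict(X_train, y_train, X_test) must return predictions for X_test;
    it stands in for the model-fitting routine (e.g. a MARS fit).
    """
    X, y = np.asarray(X, float), np.asarray(y, float)
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i                      # leave object i out
        preds[i] = fit_predict(X[mask], y[mask], X[i:i + 1])[0]
    rmsecv = np.sqrt(np.mean((y - preds) ** 2))       # root mean squared error of CV
    q2 = np.corrcoef(y, preds)[0, 1] ** 2             # squared LOO correlation
    return rmsecv, q2

# Hypothetical use with the py-earth MARS implementation:
# from pyearth import Earth
# rmsecv, q2 = loo_rmsecv_q2(
#     X, y, lambda Xt, yt, Xs: Earth(penalty=2.0).fit(Xt, yt).predict(Xs))
```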
