Variable Selection Methods for Giga-Bases - INFORMS NY
Correlations and Redundancy. Cooperative Suppression.
Cooperative suppression (no graph): each independent variable correlates positively with the dependent variable, but pairs of independent variables correlate negatively with each other. Thus, when one variable is partialled out from another, all measures of fit are enhanced.
Tentative detection (r_i is the zero-order correlation of independent variable i with the dependent variable):
• If the standardized coefficient is larger than r_i: suppression.
• If r_i is zero or close to it: classical suppression.
• If the standardized coefficient is of opposite sign to the correlation: net suppression.
• If the standardized coefficient is larger than r_i and of the same sign: cooperative suppression.
9/18/03 46
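The tentative detection rules can be sketched as a small classifier. This is only an illustration, not the presenters' code: the function name `classify_suppression`, the threshold `zero_tol` for "close to zero", the reading of "larger than" as a comparison of magnitudes, and the order in which the rules are checked are all assumptions.

```python
def classify_suppression(beta, r, zero_tol=0.05):
    """Label one predictor from its standardized coefficient and zero-order r.

    beta     : standardized regression coefficient of the predictor
    r        : zero-order correlation of the predictor with the dependent variable
    zero_tol : hypothetical cut-off for "r is zero or close to it" (assumption)
    """
    # Classical: r is (nearly) zero, yet the coefficient is larger in magnitude.
    if abs(r) <= zero_tol and abs(beta) > abs(r):
        return "classical"
    # Net: the coefficient has the opposite sign to the correlation.
    if beta * r < 0:
        return "net"
    # Cooperative: same sign, coefficient larger in magnitude than r.
    if abs(beta) > abs(r):
        return "cooperative"
    # Otherwise the rules above do not signal suppression.
    return "none"


if __name__ == "__main__":
    for beta, r in [(0.4, 0.0), (-0.2, 0.3), (0.6, 0.4), (0.3, 0.4)]:
        print(beta, r, classify_suppression(beta, r))
```

Checking the classical case first is a design choice: a near-zero r can carry either sign, so testing the sign rule first would label such a variable as net suppression on the basis of noise.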
Correlations and Redundancy. Algorithm.
With thousands of variables in a Giga-Base, use the additional definition (which is less general than the previous one; Tzelgov, Henik, 1985):
$sr_{YX} = r_{Y(i \cdot 1,2,3,\ldots,(i-1),(i+1),\ldots,n)} > r_{Yi}$, if both coefficients > 0. (1)
$sr_{YX} = r_{Y(i \cdot 1,2,3,\ldots,(i-1),(i+1),\ldots,n)} < r_{Yi}$, if both coefficients < 0. (2)
In plain English: “the semi-partial correlation is larger/smaller than the corresponding zero-order correlation, according to conditions (1) and (2) respectively”.
Is there a way to quickly visualize all these relations?
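Conditions (1) and (2) can be checked for every predictor at once from the correlation matrix. The sketch below is an illustration under stated assumptions, not the presenters' implementation: it assumes NumPy is available, that the predictor correlation matrix is invertible, and it uses the standard identity that the semi-partial correlation of predictor i equals its standardized coefficient divided by the square root of the i-th diagonal element of the inverse predictor correlation matrix. The names `semipartial_correlations` and `flag_suppression` are hypothetical.

```python
import numpy as np


def semipartial_correlations(R_xx, r_xy):
    """Semi-partial correlation of Y with each predictor, given correlations only.

    R_xx : (n, n) correlation matrix among the predictors (assumed invertible)
    r_xy : (n,) zero-order correlations of each predictor with Y
    """
    R_xx = np.asarray(R_xx, dtype=float)
    r_xy = np.asarray(r_xy, dtype=float)
    R_inv = np.linalg.inv(R_xx)
    beta = R_inv @ r_xy                      # standardized coefficients
    return beta / np.sqrt(np.diag(R_inv))    # sr_i = beta_i * sqrt(1 - R_i^2)


def flag_suppression(R_xx, r_xy):
    """Return (sr, flags); flags[i] is True when condition (1) or (2) holds."""
    r = np.asarray(r_xy, dtype=float)
    sr = semipartial_correlations(R_xx, r)
    cond1 = (sr > 0) & (r > 0) & (sr > r)    # condition (1)
    cond2 = (sr < 0) & (r < 0) & (sr < r)    # condition (2)
    return sr, cond1 | cond2


if __name__ == "__main__":
    # Two predictors, each correlated +0.4 with Y and -0.5 with each other.
    sr, flags = flag_suppression([[1.0, -0.5], [-0.5, 1.0]], [0.4, 0.4])
    print(sr, flags)
```

Working from the correlation matrix avoids refitting one regression per variable, which matters when there are thousands of candidates; for near-singular matrices a pseudo-inverse or a regularized solve would be a safer choice than a plain inverse.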
Conclusions. •We have reviewed ma
10 Commandments of Data Exploration
6. BIBLIOGRAPHY (cont. 1) Breiman L
6. BIBLIOGRAPHY (cont. 3) George E.