05.11.2014 Views

national multiple family submetering and allocation billing program ...


particular property (e.g., the monthly rent of units in an individually-owned complex). The cumulative effect of these missing data was often that dramatically fewer properties were included in the analysis than had water use data available. This is because the models calculated through stepwise regression used only the cases in which every variable to be examined was present, even if a variable was ultimately eliminated from the model. As both a test of the appropriateness of the model and a check for any variable that can be significantly associated with a dependent variable even when an automated method such as stepwise regression does not detect it, many regression models were examined using a method that required entry of certain variables. These models were used to choose the most predictive models presented in Chapter 6.
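
The effect of requiring complete cases can be illustrated with a minimal sketch. The variable names (`water_use`, `rent`, `units`, `year_built`) and all values are hypothetical and are not taken from the study data; the sketch only shows how dropping every case with any missing candidate variable shrinks the usable sample.

```python
# Hypothetical records; None marks a missing value. All figures are invented
# for illustration and do not come from the study.
properties = [
    {"water_use": 52.0, "rent": 850.0, "units": 120, "year_built": 1988},
    {"water_use": 47.5, "rent": None, "units": 64, "year_built": 1995},
    {"water_use": 61.2, "rent": 910.0, "units": None, "year_built": 1979},
    {"water_use": 39.8, "rent": 780.0, "units": 200, "year_built": 2001},
]

# Every variable that the stepwise procedure is allowed to examine.
candidates = ["water_use", "rent", "units", "year_built"]


def complete_cases(rows, variables):
    """Keep only the rows in which every listed variable is present."""
    return [r for r in rows if all(r.get(v) is not None for v in variables)]


with_water = complete_cases(properties, ["water_use"])
usable = complete_cases(properties, candidates)

print(len(with_water), "properties have water use data")
print(len(usable), "remain once every candidate variable must be present")
```

A case is dropped here even if the variable it lacks is later eliminated from the model, which is the behavior described above.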

The statistics produced for regression equations include a test of the hypothesis that there is no relationship between the dependent variable and the predictor variables. The results of this test are reported as an F-statistic with an associated p-value. In general, only models with a p-value of 0.05 or less are considered “significant,” meaning that if there were no relationship, the probability of seeing a result as or more extreme than that seen in the sample would be no more than 5%. In addition, an adjusted R-squared is calculated, which can be interpreted as the proportion of the variability in the dependent variable accounted for by the factors included in the regression model, adjusted for the number of predictors.
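
As a rough illustration of these summary statistics, the sketch below fits a one-predictor least-squares line in plain Python and computes the F-statistic, R-squared, and adjusted R-squared from the sums of squares. The data are invented, and the function names are hypothetical. The p-value is not computed here; it would be read from the F distribution with k and n - k - 1 degrees of freedom, typically using a statistics package.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def model_summary(x, y):
    """Return (F-statistic, R-squared, adjusted R-squared) for y ~ x."""
    n, k = len(x), 1  # k = number of predictors
    intercept, slope = fit_simple_ols(x, y)
    mean_y = sum(y) / n
    predicted = [intercept + slope * xi for xi in x]
    sse = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))  # residual SS
    sst = sum((yi - mean_y) ** 2 for yi in y)                  # total SS
    ssr = sst - sse                                            # explained SS
    f_stat = (ssr / k) / (sse / (n - k - 1))
    r2 = 1 - sse / sst
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return f_stat, r2, adj_r2


# Invented illustration data, not study results.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

f_stat, r2, adj_r2 = model_summary(x, y)
print(f"F = {f_stat:.1f}, R^2 = {r2:.4f}, adjusted R^2 = {adj_r2:.4f}")
```

The adjusted value is never larger than the unadjusted R-squared, because the adjustment penalizes each additional predictor.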

Regression coefficients are calculated for each predictor variable in the model. These coefficients can be interpreted as a “slope,” that is, for every unit change in the predictor variable, the dependent variable would change by the amount of the regression coefficient, holding the other predictors constant. A test of statistical significance is calculated for each regression coefficient, with a corresponding p-value.
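
The “slope” reading of a coefficient can be checked numerically. The sketch below, using invented data and hypothetical names, fits a one-predictor model, confirms that raising the predictor by one unit changes the predicted value by exactly the coefficient, and computes the t-statistic used to test that coefficient. The corresponding p-value would come from the t distribution with n - 2 degrees of freedom and is not computed here.

```python
import math

# Invented illustration data, not study results.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

slope = sxy / sxx                      # regression coefficient
intercept = mean_y - slope * mean_x


def predict(value):
    """Predicted dependent variable for a given predictor value."""
    return intercept + slope * value


# A one-unit change in the predictor shifts the prediction by the slope.
change = predict(4.0) - predict(3.0)

# Standard error of the slope and its t-statistic (n - 2 degrees of freedom).
sse = sum((yi - predict(xi)) ** 2 for xi, yi in zip(x, y))
se_slope = math.sqrt(sse / (n - 2) / sxx)
t_stat = slope / se_slope

print(f"slope = {slope:.4f}, change per unit = {change:.4f}, t = {t_stat:.1f}")
```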

The fit of the model and the appropriateness of the variables for inclusion in the model can be tested by examining a scatter plot of the predicted values (usually on the x-axis) against the residual values (usually on the y-axis). A predicted value for the dependent variable can be calculated for each case, given the values of the independent variables in the model for that case. The residual value is the difference between the actual value of the dependent variable for a case and the predicted value. In a perfect model every residual would be zero and all points would lie on the x-axis. If there is not an abnormal distribution of the dependent variable or of the other variables included in the regression model, the scatter plot will resemble a “cloud” or a
