
8.1 INTRODUCTION

In the preceding chapters the basic concepts of statistics have been examined, and they provide a foundation for this and the next several chapters. In this chapter and the three that follow, we provide an overview of two of the most commonly employed analytical tools used by applied statisticians, analysis of variance and linear regression. The conceptual foundations of these analytical tools are statistical models that provide useful representations of the relationships among several variables simultaneously.

Linear Models
A statistical model is a mathematical representation of the relationships among variables. More specifically for the purposes of this book, a statistical model is most often used to describe how random variables are related to one another in a context in which the value of one outcome variable, often referred to with the letter “y,” can be modeled as a function of one or more explanatory variables, often referred to with the letter “x.” In this way, we are interested in determining how much variability in outcomes can be explained by random variables that were measured or controlled as part of an experiment. The linear model can be expanded easily to the more generalized form, in which we include multiple outcome variables simultaneously. These models are referred to as General Linear Models, and can be found in more advanced statistics books.

DEFINITION
An outcome variable is represented by the set of measured values that result from an experiment or some other statistical process. An explanatory variable, on the other hand, is a variable that is useful for predicting the value of the outcome variable.

A linear model is any model that is linear in the parameters that define the model. We can represent such models generically in the form:

$$ Y_j = \beta_0 + \beta_1 X_{1j} + \beta_2 X_{2j} + \cdots + \beta_k X_{kj} + e_j \qquad (8.1.1) $$

In this equation, the β_j represent the coefficients in the model and e_j represents random error. Therefore, any model that can be represented in this form, where the coefficients are constants and the algebraic order of the model is one, is considered a linear model. Though at first glance this equation may seem daunting, it is generally easy to find values for the parameters using basic algebra or calculus, as we shall see as the chapter progresses.
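As a concrete illustration (a minimal sketch, not part of the original text), the following Python/NumPy code finds the coefficients of a model in the form of Equation 8.1.1 by ordinary least squares, the algebra- and calculus-based approach alluded to above. The data values and variable names are hypothetical and serve only to demonstrate the computation.

```python
import numpy as np

# Hypothetical data: one outcome Y and two explanatory variables X1, X2
# (values are illustrative only, not taken from the text).
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y  = np.array([3.1, 4.9, 9.2, 10.8, 15.1, 16.9])

# Design matrix with a leading column of ones for the intercept beta_0,
# matching Y_j = beta_0 + beta_1*X_1j + beta_2*X_2j + e_j (Equation 8.1.1, k = 2).
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares estimates of the coefficients (the solution that minimizes
# the sum of squared errors).
beta_hat, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
print("estimated coefficients (beta_0, beta_1, beta_2):", beta_hat)

# Fitted values and residual errors e_j = Y_j minus the fitted value.
print("residuals:", y - X @ beta_hat)
```

The same estimates could be obtained by solving the normal equations by hand; the library routine is used here only for brevity.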

We will see many representations of linear models in this and other forms in the next several chapters. In particular, we will focus on the use of linear models for analyzing data using the analysis of variance for testing differences among means, regression for making predictions, and correlation for understanding associations among variables. In the context of analysis of variance, the predictor variables are classification variables used to define factors of interest (e.g., differentiating between a control group and a treatment group), and in the context of correlation and linear regression the predictor variables are most often continuous variables, or at least variables at a higher level than nominal classes. Though the underlying purposes of these tasks may seem quite different, studying these techniques and
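As an aside (an illustrative sketch, not part of the original text), the connection between these settings can be made concrete: a control-versus-treatment comparison of the kind handled by the analysis of variance can be written in exactly the form of Equation 8.1.1 by dummy coding the classification variable as 0/1. The data below are hypothetical.

```python
import numpy as np

# Hypothetical two-group example with a dummy-coded classification predictor:
# group = 0 for control, 1 for treatment (values are illustrative only).
group = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])
y     = np.array([10.2, 9.8, 11.1, 10.5, 13.0, 12.4, 13.6, 12.9])

# Same linear-model form as Equation 8.1.1 with a single 0/1 explanatory
# variable: Y_j = beta_0 + beta_1*X_1j + e_j.
X = np.column_stack([np.ones_like(y), group])
beta_0, beta_1 = np.linalg.lstsq(X, y, rcond=None)[0]

# beta_0 estimates the control-group mean; beta_1 estimates the treatment
# effect, i.e., the difference between the two group means.
print("control mean estimate:", beta_0)
print("treatment effect estimate:", beta_1)
```

With a continuous explanatory variable in place of the 0/1 indicator, the same code would carry out a simple linear regression, which is the parallel drawn in the paragraph above.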
