10.12.2012 Views

Understanding the Fundamentals of Epidemiology an evolving text


variables is linear. An r of zero means that the two variables are not at all linearly related (they may nevertheless be associated in some other fashion, e.g., a U-shaped relationship). An r of +1 or -1 means that every pair of observations of the two variables corresponds to a point on a straight line drawn on ordinary graph paper. However, knowing whether or not the relationship is linear tells us nothing about the steepness of the line, e.g., how much increase in blood pressure results from a 5% increase in body mass. Other correlation coefficients (e.g., Spearman) measure the degree to which a relationship is monotonic (i.e., the two variables covary, without regard to whether the pairs of observations correspond to a straight line or a curve).
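The points above can be checked numerically. The following is a minimal sketch in plain Python, using made-up data and hypothetical helper names (`pearson_r`, `ranks`, `spearman_rho`); it is not taken from the text. It illustrates that r is zero for a symmetric U-shaped relationship, that r is 1 for both a steep and a shallow straight line, and that a Spearman coefficient is 1 for a monotonic curve even though r is below 1.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks 1..n of the values (this sketch assumes there are no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman_rho(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(xs), ranks(ys))

# Illustrative (made-up) data
x = [-3, -2, -1, 0, 1, 2, 3]
u_shaped = [v * v for v in x]    # strongly related to x, but not linearly
steep = [10 * v for v in x]      # straight line, slope 10
shallow = [0.5 * v for v in x]   # straight line, slope 0.5
curved = [v ** 3 for v in x]     # monotonic, but not a straight line

print("U-shaped      r   =", pearson_r(x, u_shaped))
print("steep line    r   =", pearson_r(x, steep))
print("shallow line  r   =", pearson_r(x, shallow))
print("curve         r   =", round(pearson_r(x, curved), 3))
print("curve         rho =", spearman_rho(x, curved))
```

The steep and shallow lines give the same r even though their slopes differ twentyfold, which is the sense in which r says nothing about steepness.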

Epidemiologists think of the relationships between variables as indications of mechanistic processes, so for an epidemiologist, strength of association means how large a change in risk or some other outcome results from a given absolute or relative change in an exposure. If the assumption of an underlying mechanism is correct, the strength should not depend upon the range of exposures measured or other aspects of the distribution. In contrast, r is affected by the range and distribution of the two variables and therefore has no epidemiologic interpretation (Rothman, p.303). Standardized regression coefficients are also not recommended for epidemiologic analysis for similar reasons (see Greenland, Schlesselman, and Criqui, 1986).
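To illustrate the dependence of r on the range of exposure, here is a small sketch with made-up data. It assumes a fixed effect of 0.5 outcome units per unit of exposure plus a fixed, repeating "noise" pattern; the names `slope_and_r`, `simulate`, `TRUE_SLOPE`, and `NOISE` are hypothetical. The same relationship is observed once over a narrow exposure range and once over a wide one.

```python
import math

def slope_and_r(xs, ys):
    """Least-squares slope of y on x, and the Pearson correlation r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)

TRUE_SLOPE = 0.5          # assumed effect: 0.5 outcome units per exposure unit
NOISE = [5, -5, -5, 5]    # deterministic pattern, uncorrelated with exposure

def simulate(exposures):
    """Outcome = TRUE_SLOPE * exposure + repeating noise (illustrative only)."""
    return [TRUE_SLOPE * x + NOISE[i % 4] for i, x in enumerate(exposures)]

narrow = list(range(40, 60))   # exposure measured over a narrow range
wide = list(range(0, 100))     # same mechanism, wide exposure range

slope_narrow, r_narrow = slope_and_r(narrow, simulate(narrow))
slope_wide, r_wide = slope_and_r(wide, simulate(wide))

print(f"narrow range: slope = {slope_narrow:.3f}, r = {r_narrow:.3f}")
print(f"wide range:   slope = {slope_wide:.3f}, r = {r_wide:.3f}")
```

Under these assumptions the estimated slope is the same in both samples, while r is much smaller in the narrow-range sample, even though the underlying relationship has not changed.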

Correlation coefficients between dichotomous variables: Correlation coefficients can be particularly problematic when used to quantify the relationship between two dichotomous (binary) factors, especially when one or both of them are rare. The reason is that correlation coefficients between binary variables cannot attain the theoretical minimum (-1) and maximum (+1) values except in the special case when both factors are present half of the time and absent half of the time (Peduzzi, Peter N., Katherine M. Detre, Yick-Kwong Chan. Upper and lower bounds for correlations in 2 x 2 tables–revisited. J Chron Dis 1983;36:491-496). If one or both factors are rare, even if the two variables are very strongly related, the correlation coefficient may be restricted to a modest value. In such a case an apparently small correlation coefficient (e.g., 0.15) may actually be large in comparison with the maximum value obtainable for given marginal proportions.
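The restriction can be made concrete with a short sketch. For two binary factors with marginal proportions p_a and p_b, the correlation (phi) coefficient equals (p_ab - p_a p_b) / sqrt(p_a (1 - p_a) p_b (1 - p_b)), where p_ab is the proportion with both factors, and p_ab can range only from max(0, p_a + p_b - 1) to min(p_a, p_b). The function name `phi_bounds` and the proportions below (2% and 40%) are hypothetical, chosen for illustration; they are not the figures from Peduzzi et al.

```python
import math

def phi_bounds(p_a, p_b):
    """Smallest and largest correlation attainable between two binary
    factors whose marginal proportions are p_a and p_b."""
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("marginal proportions must lie strictly between 0 and 1")
    denom = math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    both_max = min(p_a, p_b)            # most overlap the marginals allow
    both_min = max(0.0, p_a + p_b - 1)  # least overlap the marginals allow
    phi_max = (both_max - p_a * p_b) / denom
    phi_min = (both_min - p_a * p_b) / denom
    return phi_min, phi_max

# Both factors present half of the time: the full range -1 to +1 is attainable
print(phi_bounds(0.5, 0.5))

# Hypothetical rare outcome (2%) and common exposure (40%)
phi_min, phi_max = phi_bounds(0.02, 0.40)
observed = 0.15
print(f"attainable range: {phi_min:.3f} to {phi_max:.3f}")
print(f"an observed {observed} is {observed / phi_max:.0%} of the maximum")
```

With these illustrative marginals, a correlation of 0.15 that looks small on the usual -1 to +1 scale is most of the way to the largest value the marginal proportions permit.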

For example, the correlation coefficient between smoking and lung cancer cannot be large when the proportion of lung cancer cases is small but that of smokers is large, as the following illustration shows (Peduzzi, Detre, and Chan, 1983), based on data from Allegheny County, PA:

_____________________________________________________________________________________________

www.epidemiolog.net, © Victor J. Schoenbach 2000 7. Relating risk factors to health - 181

rev. 7/25/1999, 9/4/1999, 12/22/1999, 10/2/2000, 10/9/2000, 5/8/2001
