06.09.2021 Views

Lies, Damned Lies, or Statistics: How to Tell the Truth with Statistics, 2017a


2. BI-VARIATE STATISTICS: BASICS

2.3. Correlation

As before (in §§1.4 and 1.5), when we moved from describing histograms with words (like symmetric) to describing them with numbers (like the mean), we now will build a numeric measure of the strength and direction of a linear association in a scatterplot.

DEFINITION 2.3.1. Given bivariate quantitative data $\{(x_1,y_1),\dots,(x_n,y_n)\}$, the [Pearson] correlation coefficient of this dataset is

\[
r = \frac{1}{n-1} \sum \frac{(x_i - \bar{x})}{s_x} \, \frac{(y_i - \bar{y})}{s_y}
\]

where $\bar{x}$ and $\bar{y}$ are the means, and $s_x$ and $s_y$ the standard deviations, of the x-values and the y-values, respectively, each taken as a dataset by itself.
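To make the formula concrete, here is a minimal Python sketch that computes r directly from Definition 2.3.1. The function name `pearson_r` and the five sample points are illustrative assumptions, not taken from the text, and the sketch assumes the standard deviations are sample standard deviations (dividing by n − 1), to match the 1/(n − 1) factor in the definition.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed term by term as in Definition 2.3.1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 points")

    # Means of the x-values and y-values, each taken by themselves.
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n

    # Sample standard deviations (assumption: divisor n - 1).
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when all x-values or all y-values are equal")

    # Sum of products of standardized deviations, divided by n - 1.
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)


# Hypothetical example data, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
print(round(r, 4))
```

For this made-up dataset the sum of the products of deviations is 6 and the sums of squared deviations are 10 and 6, so r = 6/√60, a fairly strong positive value.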

We collect some basic information about the correlation coefficient in the following

FACT 2.3.2. For any bivariate quantitative dataset $\{(x_1,y_1),\dots,(x_n,y_n)\}$ with correlation coefficient r, we have

(1) −1 ≤ r ≤ 1 is always true;

(2) if |r| is near 1 – meaning that r is near ±1 – then the linear association between x and y is strong;

(3) if r is near 0 – meaning that r is positive or negative, but near 0 – then the linear association between x and y is weak;

(4) if r > 0 then the linear association between x and y is positive, while if r < 0 then the linear association between x and y is negative.
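The properties in Fact 2.3.2 can be checked numerically on small datasets. The following Python sketch is only an illustration: the helper `pearson_r` and all three datasets are assumptions invented here, and the helper uses sample standard deviations (divisor n − 1). It computes r for points lying exactly on a rising line, exactly on a falling line, and on a symmetric curve that has no linear trend.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient with sample standard deviations (divisor n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    total = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return total / (n - 1)


# Hypothetical x-values shared by all three illustrative datasets.
xs = [-2, -1, 0, 1, 2]

# Points exactly on a rising line: strong, positive linear association.
r_rising = pearson_r(xs, [2 * x + 1 for x in xs])

# Points exactly on a falling line: strong, negative linear association.
r_falling = pearson_r(xs, [-3 * x + 4 for x in xs])

# Points on a symmetric parabola: x and y are related, but not linearly.
r_curved = pearson_r(xs, [x * x for x in xs])

for label, value in [("rising", r_rising), ("falling", r_falling), ("curved", r_curved)]:
    print(label, round(value, 6))
```

Up to floating-point rounding, the rising line gives r = 1, the falling line gives r = −1, and the parabola gives r = 0. The last case is a reminder that r measures only the linear association: a value near 0 says the linear association is weak, not that x and y are unrelated.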
