Math TEKS Algebra 1 - Texas Comprehensive Center

Mathematics TEKS Refinement 2006 – 9-12 Tarleton State University 

Tab 3: Algebra I 

Table of Contents 

Master Materials List 3-ii 

Spaghetti Regression 3-1 

Handout 1: Spaghetti Regression 3-6 

Transparency 1/Handout 2: Scatterplot 3-7 

Handout 3: Activity 1 Goodness-of-Fit 3-8 

Transparency 2 3-11 




Handout 4: Measuring 3-15 


Handout 5: Activity 2 3-17 

Handout 6: Activity 3 Absolute Value vs. Squaring 3-27 

Handout 7: Supplemental Material 3-31 

Understanding Correlation Properties with a Visual Model 3-34 



Handout 3: Activity 3 - Correlation vs. Causation 3-61 

Handout 4: Activity 3, Part B – Headlines 3-65 

Handout 5: Supplemental Reading 3-66 



Tab 3: Algebra I 

Master Materials List 

Graphing calculator 

Spaghetti or linguine 

Tape 

Colored markers 

Straightedge 

Computer with internet access and Java 1.4 

Yard stick 

Spaghetti Regression: Transparencies and handouts 

Correlation: Transparencies and handouts 

The following materials are not in the notebook. They can be accessed on the MTR 

website until the 9-12 MTR CDs are available. 

Java Applet (http://mathteks2006.net/applets) 

PowerPoint presentation: Correlation vs. Causation 

(http://mathteks2006.net/documents/correlation.ppt) 



Activity: Spaghetti Regression 

Overview: Participants will investigate the concept of the “goodness-of-fit” and its 

significance in determining the regression line or best-fit line for the data. 

TEKS: This activity supports teacher content knowledge underlying the following 

TEKS. 

(A.2) Foundations of functions. The student uses the properties and 

attributes of functions. 

The student is expected to: 

(D) collect and organize data, make and interpret scatterplots (including 

recognizing positive, negative, or no correlation for data approximating 

linear situations), and model, predict, and make decisions and critical 

judgments in problem situations. 

Background: 

Fitting the graph of an equation to a data set is covered in all mathematics courses from 

Algebra I to Calculus and beyond. This module explores the concept in-depth, providing 

the participants with an understanding beyond that in ordinary secondary texts. The 

idea is to provide the background knowledge needed to understand the process of 

modeling. 

To enrich the study of functions, the TEKS call for the inclusion of problem situations 

which illustrate how mathematics can model aspects of the world. In real life, functions 

arise from data gathered through observations or experiments. This data rarely falls 

neatly into a straight line or along a curve. There is variability in real data and it is up to 

the student to find the function that best 'fits' the data. Regression, in its many facets, is 

probably the most widely used statistical methodology in existence. It is the basis of 

almost all modeling. 

This activity supports teacher knowledge underlying TEKS A.2.D, wherein students 

create scatterplots to develop an understanding of the relationships of bivariate data. 

This includes studying correlations and creating models from which they will predict and 

make critical judgments. As always, it is beneficial for students to generate their own 

data. This gives them ownership of the data and gives them insight into the process of 

collecting reliable data. Teachers should naturally encourage the students to discuss 

important concepts such as goodness-of-fit. Using the graphing calculator facilitates this 

understanding. Students will be curious about how the linear functions are created, and 

teachers should help students develop this understanding. 



Knuth and Hartmann in Technology-Supported Mathematics Learning Environments 

discuss the common approach to this topic: 

A common instructional practice is to have students plot the data on a 

coordinate plane, and then ask them to use a piece of spaghetti to 

represent the line that they will “fit” to the data. Students are typically 

instructed to position the spaghetti noodle so that it appears to be as 

close as possible to each point—visually determining the “best” fit. At 

this point students might determine the equation for their line and then 

use that equation in making predictions about additional points. 

Alternatively, the objective for the lesson might be to determine a line of 

best fit analytically, usually by using the statistical capabilities of a 

graphing calculator, and then to use the resulting equation in a similar 

fashion (i.e., to make predictions). In the former situation, the line that 

students identified as their line of best fit has not been determined 

mathematically and may or may not be the best fit in reality. In the latter 

example, the line has been determined mathematically, but students 

may not have an understanding of “what the calculator did” in 

determining the equation for the line or why the line is called a least 

squares line of best fit (the most commonly used line of best fit). 

Moreover, teachers often may not attempt to explain the underlying 

ideas, since the focus of the lesson may be on the use of the equation 

for the line. In either situation, ideas underlying the least squares line of 

best fit are not beyond the grasp of students and should be a topic of 

discussion. 

Participants will investigate the concept of the “goodness-of-fit” and its 

significance in determining the regression line or best-fit-line for the data. 

Development sequence: 

Activity 1: What is meant by “best”? 

What are non-analytical methods used by students to determine fit? 

Develop an analytical measure for fit. 

Discuss various measures, including residuals. 

Activity 2: Develop the least squares regression method via absolute value 

regression. 

Activity 3: Explore the effects of squaring the residuals and contrast it with using 

the absolute value of the residuals. 

Appendix: Deriving the regression formula via algebra and then through 

calculus. 

Historical notes. 

Materials: Graphing calculator 



Spaghetti or linguine 

Tape 

Colored markers 

Straightedge 

Computer with internet access (Activity 3) 

Transparencies: 1-6 (pages 3-7, 3-11 – 3-14, 3-16) 

Handout 1 (page 3-6) 


Handout 3 (pages 3-8 – 3-10) 



Handout 6 (pages 3-27 – 3-28) 


Grouping: 4-5 per group 

Time: 1½ -2 hours 

Lesson: 

Procedures Notes 

Activity 1 

Have participants read and discuss Handout 

1, Spaghetti Regression: 

Overview/Learning 

Objectives/Background, (page 3-6). 

Give each participant 3-5 pieces of 

spaghetti, the Transparency 1/Handout 2, 

Scatterplot (page 3-7) and Handout 3, 

Activity 1: Goodness of Fit, (page 3-8 ). 

Have the participants examine the plot and 

visually determine a line of best-fit (or trend 

line) using a piece of spaghetti. They then 

tape the spaghetti line onto their graph. 

Ask: Who has the best line in your group? 

How can we determine this? 

Ask: What is meant by best? 

Ask: What is meant by a close fit? 

See, How Do You Find the Line of Best- 

Fit? (page 3-10), to discuss methods 

students use for placing trend lines. (Do not 

discuss how to measure yet; see below.) 

Discuss the importance of modeling and 

student discussions of concepts such as 

goodness-of-fit (see the Trainer Notes 

Background discussion above.) 

This should be done individually so that 

there is variation in the choice of lines within 

each group. 

This page discusses the general idea behind 

linear regression. To determine a line of 

best fit you must have an agreed upon 

measure of “goodness”. If that measure is 

“closeness of the points to the line”, the best 

line is then the line with the least total 

distance of points to the line. There are 

many methods for measuring “closeness.” 

The most common is the method of least squares. 

Have the participants use a second piece of 

spaghetti to measure the distance from each 

point to the line and break off that length. 

Each member of a group must measure the 

same way. Thus, each group must decide 

their method for measuring before they 

begin. 

Groups may measure vertically, horizontally, 

perpendicularly, etc. 

Have the participants line up their spaghetti 

distances to determine who in their group 

has the closest fit. Then, they replace the 

segments and tape them to their scatterplot. 

Have each group present their method and 

results. A good way to accomplish this is to 

have the “winner” from each table come up 

to the front. They can then be grouped by 

their method of measurement. Have each 

share, discuss, compare, and contrast. 

Distribute Handout 4, Measuring, (page 3- 

15) to discuss three ways (vertically, 

horizontally, perpendicularly) to measure the 

space between a point and the line. Discuss 

the meaning of a residual and why it is used 

in evaluating the accuracy of a model. 

Activity 2 

Intuitively, we think of a close fit as a good 

fit. We look for a line with little space 

between the line and the points it is 

supposed to fit. We would say that the best 

fitting line is the one that has the least 

space between itself and the data points 

which represent actual measurements. 

Encourage diversity in measuring methods 

among the groups to add depth to the 

following discussions. 

This will determine the total error (i.e., total 

distance from their line to the data). 

Discuss the fact that since the groups used 

different methods of measuring, we cannot 

determine best-of-fit for the entire class. 

Discuss accuracy of measurement. Did they 

measure from the edge of each point or the 

middle, etc.? 

Why measure vertically? The sole purpose 

in making a regression line is to use it to 

predict the output for a given input. The 

vertical distances (residuals) represent how 

far off the predictions are from the data we 

actually measured. 




Distribute Handout 5, Activity 2, (pages 3- 

17 – 3-21). 

Tell the participants we will now determine 

who has the best trend line in the class. 

Tell participants to look for “FYI:” in the 

activity for calculator help. 

Have participants stop when they finish #5 

and use overhead 2 to cultivate a class 

discussion of the questions in #5 before 

proceeding. It is important that participants 

understand why the residuals must be 

absolute valued or squared before summing. 

Transparency 6 reproduces the figure on 

page 3-19. 

Activity 3 

Distribute Handout 6, Activity 3, (pages 3- 

27 – 3-28). 

Participants will need a computer with Java 

version 1.4. 

Have participants open the applet 

Regression and work through handout. 

www.mathteks2006.net/applets 

Supplemental Material 

Ask participants to read the Handout 7, 

Supplemental Material, (pages 3-31 – 3- 

33). 

In Activity 1 the groups used different 

measures of goodness-of-fit; thus the best 

trend line of the class could not be 

determined. 

The participants will need a Graphing 

Calculator. 

Encourage the calculator-capable 

participants to help out within their groups. 

In this activity, an interactive java applet is 

used to investigate several data sets and 

contrast geometrically and numerically the 

effect of using the square of the residuals 

vs. the absolute value of the residuals. 

Encourage the participants to test their own 

conjectures and share/discuss with group. 

The Supplemental Material discusses two 

ways to minimize the sum of the squared 

residuals which leads to the formula that the 

calculator uses to find the least squares 

regression line. 

Historical notes are included about who 

originally developed least squares 

regression and where the term regression 

comes from. 



Overview 

Spaghetti Regression 

Participants will investigate the concept of the “goodness-of-fit” and its significance in 

determining the regression line or best-fit line for the data. 

Learning Objectives 

This activity supports Teacher Content Knowledge needed for A.2.D: The student 

is expected to collect and organize data, make and interpret scatterplots 

(including recognizing positive, negative, or no correlation for data approximating 

linear situations), and model, predict, and make decisions and critical judgments 

in problem situations. 

Background 

Fitting the graph of an equation to a data set is covered in all mathematics 

courses from Algebra I to Calculus and beyond. The objective of this module is 

to explore the concept in-depth to provide understanding beyond that in ordinary 

secondary texts. 

To enrich the study of functions, the TEKS call for the inclusion of problem 

situations which illustrate how mathematics can model aspects of the world. In 

real life, functions arise from data gathered through observations or experiments. 

This data rarely falls neatly into a straight line or along a curve. There is 

variability in real data and it is up to the student to find the function that best 'fits' 

the data. Regression, in its many facets, is probably the most widely used 

statistical methodology in existence. It is the basis of almost all modeling. 

This activity supports teacher knowledge underlying TEKS A.2.D, wherein 

students create scatterplots to develop an understanding of the relationships of 

bivariate data; this includes studying correlations and creating models from which 

they will predict and make critical judgments. As always, it is beneficial for 

students to generate their own data. This gives them ownership of the data and 

gives them insight into the process of collecting reliable data. Teachers should 

naturally encourage the students to discuss important concepts such as 

goodness-of-fit. Using the graphing calculator facilitates this understanding. 

Students will be curious about how the linear functions are created, and teachers 

should help students develop this understanding. 

Handout 1 



Scatterplot 

Transparency 1/Handout 2 



Activity 1 Goodness-of-Fit 

Objective: To investigate the concept of goodness of fit and develop an 

understanding of residuals in determining a line of best-fit. 

1. Examine the plot provided and visually determine a line of best-fit (or trend line) 

using a piece of spaghetti. Tape your spaghetti line onto your graph. 

2. Now let us investigate the “goodness” of the fit. Use a second piece of spaghetti to 

measure the distance from the first point to the line. Break off this piece to represent 

that distance. Each person at the table must measure in the same way, so discuss 

the method you will use before starting. Repeat this for each point. 

3. Line up your spaghetti distances to determine who in your group has the closest fit. 

Determine the total error; i.e., total distance from your line to the data. Then replace 

the segments and tape them to your scatterplot. 

Total error = _______ 

Handout 3 



Activity 1 Goodness-of-Fit – Possible Solutions 

Objective: To investigate the concept of goodness of fit and develop an 

understanding of residuals in determining a line of best-fit. 

1. Examine the plot provided and visually determine a line of best-fit (or trend line) 

using a piece of spaghetti. Tape your spaghetti line onto your graph. 

Trainer notes: Use the page titled How Do You Find the Line of Best-Fit? to discuss 

methods students use for placing trend lines. This page discusses the general idea 

behind linear regression. To determine a line of best fit, you must have an agreed-upon 

measure of “goodness.” If that measure is closeness of the points to the line, the best 

line is then the line with the least total distance. There are many methods for measuring 

“closeness.” The most common is the method of least squares. 

2. Now let us investigate the “goodness” of the fit. Use a second piece of spaghetti to 

measure the distance from the first point to the line. Break off this piece to represent 

that distance. Each person at the table must measure in the same way, so discuss 

the method you will use before starting. Repeat this for each point. 

Encourage at least one group to use the shortest distance from the point to the line (i.e., 

the perpendicular distance.) Have each group present their method and results. A good 

way to accomplish this is to have the “winner” from each table come up to the front. 

They can then be grouped by their method of measurement. Have each share, discuss, 

compare, and contrast. 

Discuss the fact that since the groups used different methods of measuring, we 

cannot determine best-of-fit for the entire class. 

Discuss the accuracy of their measurements. Did they measure from the edge of each 

point or the middle, etc.? 

3. Line up your spaghetti distances to determine who in your group has the closest fit. 

Determine the total error (i.e., total distance from your line to the data). Then, 

replace the segments and tape them to your scatterplot. 

Total error = _______ 

Use the page titled Measuring to discuss three ways to measure the space between a 

point and the line. Discuss the meaning of a residual and why it is used in evaluating the 

accuracy of a model. 



How Do You Find the Line of Best Fit? – Possible Solutions 

So you’ve observed some data. You have a set of data points (x,y). You've plotted 

them, and they seem to be pretty much linear. How do you find the line that best fits 

those points? "That’s simple," your students say. "Put them into a TI-83 and look at the 

answer." Okay, but let us ask a deeper question: How does the calculator find the 

answer? 

What is meant by Best? 

First, we have to agree on what we mean by the "best fit" of a line to a set of points. 

Why do we say that the line on the left fits the points better than the line on the right? 

And can we say that some other line might fit them better still? 

Transparency 2 (page 3-11) 

Look at the following students’ responses to the task: draw a line of best fit for the data. 

What reasoning might they have given for their choice of lines? 

Passes through the most points, equal number of points above and below, passes through the 

end points, etc. [Transparencies 3-5 (pages 3-12 – 3-14)] 

Usually we think of a close fit as a good fit. But, what do we mean 

by close? 

How close are these points? 



Discuss criteria that might be used to assess the “closeness” of these points. How many 

different ways might it be done? 

Transparency 2 












Measuring 

There are at least three ways to measure the space between a point and the line: 

vertically in the y direction, horizontally in the x direction, and the shortest distance from 

a point to the line (on a perpendicular to the line.) 

In regression, we usually choose to measure the space vertically. These distances 

are known as residuals. 
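The three ways of measuring can be compared numerically. The following is a minimal sketch, not part of the original handout: the function name `distances_to_line` is hypothetical, and the point (10, 21) and the line y = 1/4 x + 11 are borrowed from the Activity 2 data and sample answers purely for illustration. The horizontal distance is undefined for a horizontal line, so the sketch assumes a nonzero slope.

```python
import math

def distances_to_line(x0, y0, m, b):
    """Distances from the point (x0, y0) to the line y = m*x + b (assumes m != 0)."""
    # Vertical: difference between the observed y and the line's y at the same x.
    vertical = abs(y0 - (m * x0 + b))
    # Horizontal: difference between the observed x and the line's x at the same y.
    horizontal = abs(x0 - (y0 - b) / m)
    # Perpendicular: shortest distance from the point to the line.
    perpendicular = vertical / math.sqrt(1 + m * m)
    return vertical, horizontal, perpendicular

v, h, p = distances_to_line(10, 21, 0.25, 11)
print(f"vertical={v:.3f}  horizontal={h:.3f}  perpendicular={p:.3f}")
```

The perpendicular distance is never larger than the other two, which is one reason groups that measure differently cannot compare their totals directly.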

• Why would you want to measure this way? What do the residuals represent in relation 

to our function? Consider the purpose of the line and the following diagram. 

The purpose of regression is to find a function that can model a data set. The function is 

then used to predict the y values (or outputs, f(x) ) for any given input x. So, the vertical 

distance represents how far off the prediction is from the actual data point (i.e., the 

“error” in each prediction.) Residuals are calculated by subtracting the model’s 

predicted values, f(xi), from the observed values, yi. 

Residual = $y_i - f(x_i)$ 
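As a small illustration of the definition, the sketch below computes one residual per data point. It is an assumption-laden example rather than part of the handout: the helper name `residuals` is hypothetical, the data are the Activity 2 data set, and the line y = 1/4 x + 11 is one of the sample trend lines from the Activity 2 answers.

```python
# Activity 2 data set (x-values and y-values).
xs = [2, 5, 6, 10, 12, 15, 16, 20, 20]
ys = [14, 19, 9, 21, 7, 21, 18, 10, 22]

def residuals(f, xs, ys):
    """Return observed minus predicted, y_i - f(x_i), for each data point."""
    return [y - f(x) for x, y in zip(xs, ys)]

def f(x):
    # Sample trend line; any other line could be substituted here.
    return x / 4 + 11

res = residuals(f, xs, ys)
for x, y, r in zip(xs, ys, res):
    side = "above" if r > 0 else "below" if r < 0 else "on"
    print(f"({x}, {y}): residual {r:+.2f}, point is {side} the line")
```

A positive residual means the observed point lies above the line (the model underpredicted); a negative residual means it lies below (the model overpredicted).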

Handout 4 






Activity 2 

Objective: Investigate various methods of regression. 

Whose model makes the best predictions? Let us compare everyone’s lines using 

the residuals. 

Before we begin, we need to know the equation for your spaghetti function, 

f(x) = mx + b. Assume the lower left corner of the graph is (0,0). 

f(x) = __________________ 

1. Enter your function at Y1= in the calculator. 

2. Enter the actual data into L1 and L2. Put the x-values in L1 and the y-values in L2. 

Make certain that the x’s are typed in correspondence to the y’s. 

x 2 5 6 10 12 15 16 20 20 

y 14 19 9 21 7 21 18 10 22 

3. Place the predicted values, f(xi), created by your function, in L3. To do this, place 

your cursor on L3 and enter your function, using L1 as the inputs of the function. 

(See below.) 

FYI: Y1 can be found under [vars] → [Y-vars] → [1:function] → 1:Y1 

Handout 5-1 



4. Compute the residuals (the distances between the predicted values, f(xi) , and actual 

y values) and place them in L4. This can be done by entering L4 = L2-L3. 

5. On your home screen compute Sum(L4). Record your group’s functions and the 

corresponding sums. 

FYI: Sum can be found under [2nd][stat] → [math] → 5:sum 

Function Sum of the residual errors 

• Examine your values in L4. What is the meaning of a negative residual in terms of the 

graph and in terms of the function’s predictions? What is the meaning of a positive or 

negative total for the functions in #5? 

Handout 5-2 



Examine the following student’s work. 

• In L4 what is the meaning of 39.23? What is the corresponding value in your 

table? Describe its meaning. 

• What is the meaning of a low total residual error? Is it a good measure of fit? 

Why or why not? 

Handout 5-3 



There are two possible ways to fix the above problem. One way is to take the absolute 

value of the residual; the other is to square the residual. Taking the absolute value of 

the residuals is synonymous with using our spaghetti segments to measure the vertical 

error. 

6. Find Sum(abs(L4)). Record your group’s functions and the corresponding sums. 

FYI: abs can be found under [2nd][0] 

Function Sum of the residual error 

• Compare with those in the class to determine who now has the lowest total 

error. 

Note: The calculator’s regression method uses the squared residuals when measuring 

the goodness-of-fit of a regression line. 

Let us compare our lines of best-fit, using the squared residuals. 

7. Find the total of the squared residuals by Sum((L4)²). This is often referred to as 

the Sum of the Squared Errors, noted SSE. 

Function SSE 

Handout 5-4 



• Compare with those in the class to determine who has the lowest sum of the squared 

errors. Did the best line in the group change? Why or why not? 

Let us compare our lines against the calculator’s regression line. 

8. Use your calculator to compute the linear regression function, f(x) = mx + b. 

f(x) = ___________________ 

9. Enter the function into Y1 and place the function’s predicted values f(xi) in L3, i.e., L3 = 

Y1(L1). 

10. Quickly compute the sum of squared errors by using Sum((L2-L3)²). 

SSE = ________ 

• How do the functions in the class compare to this one? 

11. Create a scatterplot and graph your group’s functions and the calculator’s regression 

function. Examine visually the goodness of fit of each in regard to their SSE. 

At least two methods exist for evaluating goodness of fit: taking the absolute value of 

the residuals and squaring the residuals. Although taking the absolute value seems 

most intuitive, relying on squaring does several things. The most desirable one is that it 

simplifies the mathematics needed to guarantee the “best” line. (See the appendix.) 

In Activity 3, you can investigate how squaring the residuals when measuring our 

goodness-of-fit affects the choice of the regression line. 

Understanding what you are looking for is always the toughest part of any problem, so 

the hard part is done. You now know how to measure “goodness” of fit. We can also 

say exactly what the calculator means by the line of best-fit. If we compute the residuals 

(i.e., the error in the y direction), square each one, and add up the squares, we say the 

line of best-fit is the line for which that sum is the least. Since it is a sum of squares, 

the method is called the Method of Least Squares! This is the most commonly used 

method but, as we have seen, it isn’t the only way! 

Handout 5-5 



Activity 2 - Possible solutions 

Objective: Investigate various methods of regression. 

Whose model makes the best predictions? Let us compare everyone’s lines using 

the residuals. 

Before we begin, we need to know the equation for your spaghetti function, 

f(x) = mx + b. Assume the lower left corner of the graph is (0,0). 

f(x) = __1/3 x + 9________________ 

1. Enter your function at Y1= in the calculator. 

2. Enter the actual data into L1 and L2. Put the x-values in L1 and the y-values in L2. Make 

certain that the x’s are typed in correspondence to the y’s. 

x 2 5 6 10 12 15 16 20 20 

y 14 19 9 21 7 21 18 10 22 

3. Place the predicted values, f(xi), created by your function, in L3. To do this, place your 

cursor on L3 and enter your function, using L1 as the inputs of the function. (See below.) 

FYI: Y1 can be found under [vars] → [Y-vars] → [1:function] → 1:Y1 



4. Compute the residuals (the distances between the predicted values, f(xi) , and actual y 

values) and place them in L4. This can be done by entering L4 = L2-L3. 

5. On your home screen compute Sum(L4). Record your group’s functions and the 

corresponding sums. 

FYI: Sum can be found under [2nd][stat] → [math] → 5:sum 

Function Sum of the residual errors 

Y= 1/3 x + 9 24.66 

Y= ¼ x + 11 15.5 

Y= 5/4 x 8.5 

Y= 2x + 3 -98 

• Examine your values in L4. What is the meaning of a negative residual in terms of the graph 

and in terms of the function’s predictions? What is the meaning of a positive or negative total 

for the functions in #5? 

In the graph, a negative residual in L4 means the actual point is below the line. In 

terms of the function’s predictions, a negative residual means the function overpredicted the value. 

A positive sum of the residuals means the underpredictions outweigh the overpredictions 

in total, and vice versa for a negative sum of the residuals. 



Examine the following student’s work. 

• In L4 what is the meaning of 39.23? What is the corresponding value in your table? 

Describe its meaning. 

It means this person’s function underpredicted the value by 39.23. 

• What is the meaning of a low total residual error? Is it a good measure of fit? Why or 

why not? 

This is not a good measure of fit because large underpredictions could be cancelled by 

large overpredictions, making the sum small, as in the above example. 
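The cancellation can also be seen in the sample answers. The sketch below is illustrative only, with a hypothetical helper name: it takes the line Y = 5/4 x from the table in #5, whose plain residual sum is small, and compares that sum with the sum of the absolute residuals.

```python
# Activity 2 data set.
xs = [2, 5, 6, 10, 12, 15, 16, 20, 20]
ys = [14, 19, 9, 21, 7, 21, 18, 10, 22]

def line_residuals(m, b):
    """Residuals y_i - (m*x_i + b) for the Activity 2 data."""
    return [y - (m * x + b) for x, y in zip(xs, ys)]

res = line_residuals(1.25, 0)            # the sample line Y = 5/4 x
raw_total = sum(res)                     # positives and negatives cancel
abs_total = sum(abs(r) for r in res)     # no cancellation

print("residuals:", res)
print("sum of residuals:", raw_total)
print("sum of |residuals|:", abs_total)
```

Residuals as large as 12.75 and -15 nearly cancel in the plain sum, so a small plain sum says little about how close the line is to the points.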



There are two possible ways to fix the above problem. One way is to take the absolute value of 

the residual; the other is to square the residual. Taking the absolute value of the residuals is 

synonymous with using our spaghetti segments to measure the vertical error. 

6. Find Sum(abs(L4)). Record your group’s functions and the corresponding sums. 

FYI: abs can be found under [2nd][0] 

Function Sum of the residual error 

Y= 1/3 x + 9 52 

Y= ¼ x + 11 48.5 

Y= 5/4 x 64.5 

Y= 2x + 3 124 

• Compare with those in the class to determine who now has the lowest total error. 

Note: The calculator’s regression method uses the squared residuals when measuring the 

goodness-of-fit of a regression line. 

Let us compare our lines of best-fit, using the squared residuals. 

7. Find the total of the squared residuals by Sum((L4)²). This is often referred to as the Sum 

of the Squared Errors, noted SSE. 

Function SSE 

Y= 1/3 x + 9 338 

Y= ¼ x + 11 289.375 

Y= 5/4 x 676.375 

Y= 2x + 3 2488 
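The three tables above can be reproduced away from the calculator. This is a minimal sketch under stated assumptions: the dictionary `lines` and the helper `measures` are hypothetical names, and the four lines are the sample functions from the answer tables.

```python
# Activity 2 data set.
xs = [2, 5, 6, 10, 12, 15, 16, 20, 20]
ys = [14, 19, 9, 21, 7, 21, 18, 10, 22]

# Sample trend lines from the answer tables, stored as (slope, intercept).
lines = {
    "1/3 x + 9": (1 / 3, 9),
    "1/4 x + 11": (0.25, 11),
    "5/4 x": (1.25, 0),
    "2x + 3": (2, 3),
}

def measures(m, b):
    """Return (sum of residuals, sum of |residuals|, SSE) for y = m*x + b."""
    res = [y - (m * x + b) for x, y in zip(xs, ys)]
    return sum(res), sum(abs(r) for r in res), sum(r * r for r in res)

table = {name: measures(m, b) for name, (m, b) in lines.items()}
for name, (total, abs_total, sse) in table.items():
    print(f"Y = {name:<11} sum={total:8.2f}  sum|r|={abs_total:7.2f}  SSE={sse:9.3f}")

best_by_abs = min(table, key=lambda name: table[name][1])
best_by_sse = min(table, key=lambda name: table[name][2])
print("lowest sum of |residuals|:", best_by_abs)
print("lowest SSE:", best_by_sse)
```

For these four lines the two measures happen to pick the same line; that agreement is not guaranteed in general. The first plain sum is 24.666..., which the table in #5 lists as 24.66.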



• Compare with those in the class to determine who has the lowest sum of the squared 

errors. Did the best line in the group change? Why or why not? 

The best line could change. Regression using the absolute value and regression using the square 

may not agree, because each defines the best line differently. 

Let us compare our lines against the calculator’s regression line. 

8. Use your calculator to compute the linear regression function, f(x) = mx + b. 

f(x) = _.156 x + 13.83__________________ 

9. Enter the function into Y1 and place the function’s predicted values f(xi) in L3, i.e., L3 = 

Y1(L1). 

10. Quickly compute the sum of squared errors by using Sum((L2-L3)²). 

SSE = _259.67_______ 

• How do the functions in the class compare to this one? 

The calculator’s linear regression function should have a lower SSE than the class’s 

functions. 
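The calculator's answer can be checked with the standard closed-form least squares formulas, slope = Sxy / Sxx and intercept = ȳ − slope · x̄. The sketch below assumes these textbook formulas; the function names are hypothetical, and this section does not show the derivation the Supplemental Material uses, so treat it as an independent check rather than the handout's own method.

```python
# Activity 2 data set.
xs = [2, 5, 6, 10, 12, 15, 16, 20, 20]
ys = [14, 19, 9, 21, 7, 21, 18, 10, 22]

def least_squares_line(xs, ys):
    """Slope and intercept of the line minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b

def sse(m, b, xs, ys):
    """Sum of squared residuals for the line y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

m, b = least_squares_line(xs, ys)
total = sse(m, b, xs, ys)
print(f"f(x) = {m:.3f} x + {b:.2f}, SSE = {total:.2f}")
```

Rounded, the slope, intercept, and SSE agree with the values recorded in #8 and #10, and the SSE is below that of every sample line in the class table.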

11. Create a scatterplot and graph your group’s functions and the calculator’s regression 

function. Examine visually the goodness of fit of each in regard to their SSE. 

At least two methods exist for evaluating goodness of fit: taking the absolute value of 

the residuals and squaring the residuals. Although taking the absolute value seems 

most intuitive, relying on squaring does several things. The most desirable one is that it 

simplifies the mathematics needed to guarantee the “best” line. (See the appendix.) 

In Activity 3, you can investigate how squaring the residuals when measuring our 

goodness-of-fit affects the choice of the regression line. 

Understanding what you are looking for is always the toughest part of any problem, so 

the hard part is done. You now know how to measure “goodness” of fit. We can also 

say exactly what the calculator means by the line of best-fit. If we compute the residuals 

(i.e., the error in the y direction), square each one, and add up the squares, we say the 

line of best-fit is the line for which that sum is the least. Since it is a sum of squares, 

the method is called the Method of Least Squares! This is the most commonly used 

method but, as we have seen, it isn’t the only way! 



Activity 3 Absolute Value vs. Squaring 

OBJECTIVE: It is important to understand the effect squaring has on the residuals and 

the placement of a regression line. In this activity, we will use an interactive java applet 

to investigate several data sets and contrast geometrically and numerically the effect of 

using the square of the residuals vs. the absolute value of the residuals. 

1. Place three points forming a triangle on the graph. Select “plot line” and place a 

trend line on the graph. 

2. Select “Draw residuals.” Using the handle points, adjust your line to visually 

minimize the length of the residuals. 

Select “Show Trend Line Equation.” ____________________ 

3. Select “Draw (residuals)².” Using the handle points, adjust your line to visually 

minimize the area of the squares. 

Equation of the line: ____________________ 

4. Now select “Sum of the |residuals|” and adjust your line to numerically minimize the 

|residuals|. Record the equation and total: ___________________ 

5. Now select “Sum of the (residuals)²” and adjust your line to numerically minimize the 

(residuals)². Record the equation and total: _________________ 

6. Create a situation where the sum of the squares is less than the sum of the absolute 

value. 

7. Create a data set in which the least absolute value and least squares methods agree 

on the line of best fit. 

8. Place the following ordered pairs (4, 1), (4, 4), (-4, 0), and (-4, -3) in the table. Find 

the line of best fit for each method. 

• Compare and contrast these two methods. 

Handout 6-1 



• How does squaring the residuals affect how individual data points contribute to the 

total error? Does squaring increase or decrease the effect of an individual residual on 

the total error? 

• What is the effect of an outlier point on each of the possible trend lines for each 

method? 

Further investigation 

Another method for finding regression lines is Chebyshev’s Best-Fit Line Method, also 

known as the MinMax Method, which finds the line with the minimum maximum 

residual. Chebyshev’s method evaluates each line based on its largest residual and takes the 

line whose largest residual is smallest as the regression line. 

• Use Chebyshev’s method in the previous graphs to determine a line of best fit. How 

does it compare to the least absolute value and least squares methods? How is it 

affected by outliers? 

Handout 6-2 



Activity 3 Absolute Value vs. Squaring – Selected Answers 

OBJECTIVE: It is important to understand the effect squaring has on the residuals and 

the placement of a regression line. In this activity, we will use an interactive java applet 

to investigate several data sets and contrast geometrically and numerically the effect of 

using the square of the residuals vs. the absolute value of the residuals. 

1. Place three points forming a triangle on the graph. Select “plot line” and place a 

trend line on the graph. 

2. Select “Draw residuals.” Using the handle points, adjust your line to visually 

minimize the length of the residuals. 

Select “Show Trend Line Equation.” ____________________ 

3. Select “Draw (residuals)².” Using the handle points, adjust your line to visually 

minimize the area of the squares. 

Equation of the line: ____________________ 

4. Now select “Sum of the |residuals|” and adjust your line to numerically minimize the 

sum of the |residuals|. Record the equation and total: ___________________ 

5. Now select “Sum of the (residuals)²” and adjust your line to numerically minimize the 

sum of the (residuals)². Record the equation and total: _________________ 

6. Create a situation where the sum of the squares is less than the sum of the absolute 

values. Participants should notice the effect squaring has on each residual. Place 

the points close to the line so that the residuals are less than 1. 

7. Create a data set in which the least absolute value and least squares methods agree 

on the line of best fit. Various possible answers 

8. Place the following ordered pairs (4, 1), (4, 4), (-4, 0), and (-4, -3) in the table. Find 

the line of best fit for each method. Note: The absolute value line is not unique. 

• Compare and contrast these two methods. 

Various answers: Note, both methods are valid. However, the absolute value method 

does not always give a unique regression line. 



• How does squaring the residuals affect how individual data points contribute to the 

total error? Does squaring increase or decrease the effect of an individual residual on 

the total error? If the absolute value of the residual is less than one, squaring decreases 

its effect on the total squared residual. If it is greater than one, squaring increases its effect 

on the total squared residual. Thus, the squaring method rewards small errors and 

penalizes large residual errors. This penalizing and rewarding effect of the least 

squares method is often described as desirable by statisticians. The absolute value 

method, however, treats all residuals the same (with equal contempt). 

• What is the effect of an outlier point on each of the possible trend lines for each 

method? Since squaring gives disproportionate weight to an outlier compared with 

the absolute value method, an outlier has a greater effect on the summed error of the least 

squares regression line. 
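
The comparison asked for in step 8 can also be checked numerically. The following is a minimal sketch in Python, not part of the applet; the function names `sum_abs` and `sum_sq` are illustrative assumptions. It totals the absolute and the squared residuals for three candidate lines through the four points from step 8.

```python
# Points from step 8 of the activity.
POINTS = [(4, 1), (4, 4), (-4, 0), (-4, -3)]


def residuals(points, m, b):
    """Vertical distances y - (m*x + b) for each data point."""
    return [y - (m * x + b) for x, y in points]


def sum_abs(points, m, b):
    """Sum of the |residuals| for the line y = m*x + b."""
    return sum(abs(r) for r in residuals(points, m, b))


def sum_sq(points, m, b):
    """Sum of the (residuals)^2 for the line y = m*x + b."""
    return sum(r * r for r in residuals(points, m, b))


if __name__ == "__main__":
    # Three candidate lines; each passes between the two points at x = 4
    # and between the two points at x = -4.
    for m, b in [(0.125, 0.5), (0.5, 0.5), (0.875, 0.5)]:
        print(f"y = {m}x + {b}: "
              f"sum |r| = {sum_abs(POINTS, m, b)}, "
              f"sum r^2 = {sum_sq(POINTS, m, b)}")
```

All three lines give the same absolute total (6), which is one way to see that the absolute value line is not unique for this data set. The squared total separates them: it is 9 for y = 0.5x + 0.5 and 18 for the other two.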

Further investigation 

Another method for finding regression lines is Chebyshev’s Best-Fit Line Method, also known 

as the MinMax Method, which finds the line with the minimum maximum residual. 

Chebyshev’s evaluates each line based on its largest residual and takes the line with 

the smallest (largest residual ) as the regression line. 

• Use Chebyshev’s method in the previous graphs to determine a line of best fit. How 

does it compare to the least absolute value and least squares methods? How is it 

affected by outliers? 
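
As a starting point for this investigation, the Chebyshev criterion can be evaluated for any candidate line by taking its largest absolute residual. The sketch below is a hedged illustration in Python (the helper name `max_abs_residual` is an assumption, and the candidate lines are chosen by hand rather than found by a search), using the four points from step 8.

```python
# Points from step 8 of the activity.
POINTS = [(4, 1), (4, 4), (-4, 0), (-4, -3)]


def max_abs_residual(points, m, b):
    """Largest |y - (m*x + b)| over all points (the Chebyshev criterion)."""
    return max(abs(y - (m * x + b)) for x, y in points)


if __name__ == "__main__":
    # Hand-picked candidate lines, not the result of an optimization.
    for m, b in [(0.125, 0.5), (0.5, 0.5), (0.875, 0.5)]:
        print(f"y = {m}x + {b}: "
              f"largest |residual| = {max_abs_residual(POINTS, m, b)}")
```

For this data set no line can have a largest residual below 1.5, because the two points at each x-value are 3 units apart vertically. The line y = 0.5x + 0.5 reaches that bound, so here the Chebyshev line agrees with the least squares line.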



Supplemental Material 

Two ways to minimize the sum of the squared residuals 

The key to solving this or any problem is understanding exactly for what you are 

looking. Our model, or line of “best fit”, $f(x) = mx + b$, will be one that minimizes the 

sum of the squares of the vertical distances between the actual points and the predicted 

ones, i.e., the residuals $y_i - f(x_i)$. It can be written $L = \sum \left(y - f(x)\right)^2$ or 

$L = \sum \left(y - (mx + b)\right)^2$. 

What kind of equation is $L = \sum \left(y - (mx + b)\right)^2$? That’s right, quadratic. And we 

actually know enough about quadratics from Algebra II to solve this problem. But, one 

of the key words in the above paragraph is minimize, which should also make you think 

Calculus! This gives us an easy alternative approach. 

Let us examine this quadratic more closely. 

$L = \sum \left(y - (mx + b)\right)^2 = \sum \left(m^2x^2 + 2bmx + b^2 - 2mxy - 2by + y^2\right)$ 

It may look daunting, but remember, m and b are the only unknowns here. x and y are 

just numbers supplied by each of the actual points in our scatterplot. 

Expanding L further, $L = m^2\sum x^2 + 2bm\sum x + nb^2 - 2m\sum xy - 2b\sum y + \sum y^2$ 

(You might want to double check all this! Why let someone else have all the fun?) 

Remember that the summations are just constants! So now we have a choice to use 

calculus to find its minimum or use Algebra II to find its vertex. 

Let’s try the Calculus! 

In calculus, the minimum occurs where the derivative is equal to zero. Since we 

have two variables, m and b, we will want to take the derivative with respect to each variable 

separately. (These are called partial derivatives.) 

$\frac{\partial L}{\partial m} = 2m\sum x^2 + 2b\sum x - 2\sum xy = 0$ 

$\frac{\partial L}{\partial b} = 2m\sum x + 2nb - 2\sum y = 0$ 

Handout 7-1 

All that’s left is to solve this system of equations by elimination or substitution. Take 

your pick. 

Using substitution, b in the second equation looks easiest to solve for. So, we get 

$b = \frac{\sum y - m\sum x}{n}$. Substituting for b into the first equation and simplifying, we get 

$m = \frac{n\sum xy - \sum x\sum y}{n\sum x^2 - \left(\sum x\right)^2}$. 

And that’s it. Your calculator or computer just sums the x’s, the y’s, the xy’s, etc. and 

out pops the slope and y-intercept of your regression equation. It is not hard, but 

certainly tedious when done by hand. 

(You may wonder how we know it is a minimum and not a maximum. The second 

derivatives of L with respect to m and b are $2\sum x^2$ and $2n$; both are positive, so L curves 

upward in each direction and the point we found must be a minimum.) 
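
The simplification step is left to the reader in the text. One way to carry it out, sketched here with the same sums used above, is to divide the first equation by 2, substitute the expression for b, and multiply through by n:

```latex
\begin{align*}
m\sum x^2 + b\sum x - \sum xy &= 0, \qquad b = \frac{\sum y - m\sum x}{n} \\
m\sum x^2 + \frac{\left(\sum y - m\sum x\right)\sum x}{n} - \sum xy &= 0 \\
nm\sum x^2 + \sum x\sum y - m\left(\sum x\right)^2 - n\sum xy &= 0 \\
m\left(n\sum x^2 - \left(\sum x\right)^2\right) &= n\sum xy - \sum x\sum y
\end{align*}
```

Dividing both sides of the last line by $n\sum x^2 - \left(\sum x\right)^2$ gives the slope. That quantity is nonzero as long as the x-values are not all equal.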

Let us try it with Algebra! 

Here we go. Remember that we want to find the minimum of 

$L = m^2\sum x^2 + 2bm\sum x + nb^2 - 2m\sum xy - 2b\sum y + \sum y^2$ 

and that all of those summations are just constants. Thus, L is a quadratic with respect 

to m or b. This can be seen easily by rearranging. 

$L(m) = \left(\sum x^2\right)m^2 + \left(2b\sum x - 2\sum xy\right)m + \left(nb^2 - 2b\sum y + \sum y^2\right)$ 

$L(b) = nb^2 + \left(2m\sum x - 2\sum y\right)b + \left(m^2\sum x^2 - 2m\sum xy + \sum y^2\right)$ 

Handout 7-2 


Do they open up or down? The leading coefficients, $\sum x^2$ and $n$, are both positive, so 

the answer is up. 

From Algebra II, we know the vertex of $Ax^2 + Bx + C$ occurs at $x = \frac{-B}{2A}$. 

So $m = \frac{-\left(2b\sum x - 2\sum xy\right)}{2\sum x^2} = \frac{\sum xy - b\sum x}{\sum x^2}$, and 

$b = \frac{-\left(2m\sum x - 2\sum y\right)}{2n} = \frac{\sum y - m\sum x}{n}$. 

Handout 7-3 

Substituting one into the other, we get $m = \frac{n\sum xy - \sum x\sum y}{n\sum x^2 - \left(\sum x\right)^2}$ and 

$b = \frac{\sum y\sum x^2 - \sum x\sum xy}{n\sum x^2 - \left(\sum x\right)^2}$. This is exactly the same result as before. 
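
The closing formulas translate directly into a few lines of code. The sketch below is one possible Python rendering, assuming the data arrive as a list of (x, y) pairs; the function name `least_squares_line` is illustrative and is not taken from any calculator or applet.

```python
def least_squares_line(points):
    """Return (m, b) for the least squares line y = m*x + b.

    Uses the summation formulas for m and b; assumes the x-values are
    not all equal (otherwise the denominator is zero).
    """
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xy = sum(x * y for x, y in points)
    sum_x2 = sum(x * x for x, _ in points)

    denom = n * sum_x2 - sum_x ** 2
    m = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_y * sum_x2 - sum_x * sum_xy) / denom
    return m, b


if __name__ == "__main__":
    # The four points from Activity 3, step 8.
    print(least_squares_line([(4, 1), (4, 4), (-4, 0), (-4, -3)]))  # (0.5, 0.5)
```

For those four points the sums are n = 4, Σx = 0, Σy = 2, Σxy = 32, and Σx² = 64, so both formulas reduce to 128/256 = 0.5.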

Some Historical Notes 

Who invented the method of least squares? It is not clear. Often credit is given to 

Karl Friedrich Gauss (1777–1855), who first published on this subject in 1809. But 

the Frenchman Adrien Marie Legendre (1752–1833) published a clear example of the 

method four years earlier. Legendre was in charge of setting up the new metric system 

of measurement, and the meter was to be one ten-millionth of the distance from the 

North Pole through Paris to the Equator. Surveyors had measured portions of the arc 

but to get the best measurement for the whole arc, Legendre developed the method of 

least squares. He would probably use GPS today, but he was still amazingly accurate. 

Where does the term “regression” come from? The term was first used by Sir 

Francis Galton (1822-1911) in his hereditary studies. He wanted to predict the heights 

of sons from their fathers’ heights. He learned that a tall father tended to have sons 

shorter than himself, and a short father tended to have sons taller than himself. The 

heights of sons thus regressed towards the mean height of the population over several 

generations. The term "regression” is now used for many types of prediction problems, 

and does not merely apply to regression towards the mean.


Activity: Understanding Correlation Properties with a Visual Model 

Overview: This activity encourages participants to visually explore the meaning of 

correlation and to recognize correlation patterns. 


TEKS: This activity supports teacher content knowledge underlying the 

following TEKS: 

§111.32. Algebra I 

(a) Basic understandings. 

(5) Tools for algebraic thinking. Techniques for working with functions 

and equations are essential in understanding underlying 

relationships. Students use a variety of representations (concrete, 

pictorial, numerical, symbolic, graphical, and verbal), tools, and 

technology, (including, but not limited to, calculators with graphing 

capabilities, data collection devices, and computers) to model 

mathematical situations to solve meaningful problems. 

(A.2) Foundations for functions. The student uses the properties and 

attributes of functions. 

The student is expected to: 

(D) collect and organize data, make and interpret scatter plots 

(including recognizing positive, negative, or no correlation for data 

approximating linear situations), and model, predict, and make 

decisions and critical judgments in problem situations. 

Vocabulary: correlation, regression, Pearson Product moment correlation, causation 

Procedure: Participants use a computer to investigate correlation values and to 

practice estimating correlation values for scatterplots. 

After completing the activity, participants should have a visual feel for 

numerical correlation values, and should also be able to relate numerical 

values of correlation to contextual situations. Participants are also 

encouraged to investigate and understand the relationship between 

correlation and causation. 

Materials: Computer with internet access and Java 1.4 

PowerPoint slides: Correlation vs. Causation 






Yard stick 

Photocopy of a forearm. 

Understanding Correlation Properties with a Visual Model 3-34


Grouping: Individually or pairs 

Time: 2 - 2½ hours 

Lesson: 


Activity 1 CSI Correlation 

Part A: 

Participants use a computer to 

investigate how the modeling process is 

used to generate new knowledge. 

Distribute Handouts 1 and 2, Activities 1 

and 2 (pages 3-42 – 3-46 and pages 

3-53 – 3-55). 

Read the crime scene scenario. 

Participants will collect data from 8 

people using a yard stick. 

Participants will use a computer to 

investigate correlation values using the 

applet Correlation. 


Hand out the photocopy of the 

assailant’s forearm. The participants will 

then extrapolate the assailant’s height. 

Part B: A Closer Look: 

Participants use a computer to 

investigate correlation values. Have 

participants open the applet Correlation. 


Changes in the data set are investigated. 

Outliers, changes in scale, and the 

geocenter of a set of data are discussed. 

The forearm should be measured from 

the elbow to the wrist. 

Participants should discuss measuring 

techniques and degree of accuracy. 

You will need a photocopy of the 

assailant’s forearm to distribute to each 

group. If possible use someone who is a 

bit out of the normal range. For 

example, the tallest or shortest 

participant. This will cause the 

participants to extrapolate instead of 

interpolate. 

After completing Activity 1, participants 

should have a visual feel for numerical 

correlation values and should also be 

able to relate numerical values of 

correlation to contextual situations. 




Activity 2 

Part A: 

The goal of this activity is to gain an 

intuitive understanding of r. Using the 

web applet Correlation, scatterplots are 

easily constructed. By clicking and 

dragging points, participants can change 

the data sets and investigate the effect 

on the correlation. 

Part B: The r Game 

Have participants play a game with 

several classmates to develop deeper 

understanding of correlations, leverage 

points, and geocenters. Participants use 

the web applet Correlation, 

http://mathteks2006.net/applets, to create 

scatterplots with a specific correlation. 

(See Part B handout for further 

directions.) 

The participants should play several 

times until they have a good intuition of 

how each point’s relationship with the 

others affects the correlation. 

Activity 3 Correlation vs. Causation 

This activity explores the relationship 

between correlation and causation. 

Part A: 

Give out Handout 3, Activity 3 - Part A, 

(pages 3-61 – 3-64) or use the Power 

Point provided and lead a class 

discussion of correlation and causation. 

The dynamic nature of the applet allows 

you to see how the correlation changes 

when a data point is added or moved. 

Without technology, such intuition would 

take years to develop. 

When interpreting the correlation 

coefficient, you should always look at the 

scatterplot first to see if the relationship 

is linear. If it is, you may calculate the 

correlation coefficient. Always 

remember that a visual analysis of data 

is quite valuable in addition to a 

numerical analysis. 




Correlation vs. 

Causation 

In a Gallup poll, surveyors asked, “Do you 

believe correlation implies causation?” 

64% of Americans answered “Yes.” 

38% replied “No”. 

The other 8% were undecided. 

Ice-cream sales are strongly 

correlated with crime rates. 

Therefore, ice-cream causes 

crime. 

There is a humorous article discussing 

this poll in the appendix. 

If correlation implies causation, this 

would be a fabulous finding! To reduce 

or eliminate crime, all we would have to 

do is stop selling ice cream. Even 

though the two variables are strongly 

correlated, assuming that one causes 

the other would be erroneous. What are 

some possible explanations for the 

strong correlation between the two? 

One possibility might be that high 

temperatures increase crime rates 

(presumably by making people irritable) 

as well as ice-cream sales. 




The Simpsons 

(Season 7, "Much Apu About Nothing") 

Homer: Not a bear in sight. The "Bear Patrol" is working like a charm! 

Lisa: That's specious reasoning, Dad. 

Homer: [uncomprehendingly] Thanks, honey. 

Lisa: By your logic, I could claim that this rock keeps tigers away. 

Homer: Hmm. How does it work? 

Lisa: It doesn't work; it's just a stupid rock! 

Homer: Uh-huh. 

Lisa: But I don't see any tigers around, do you? 

Homer: (pause) Lisa, I want to buy your rock. 

Without proper interpretation, 

causation should not be 

assumed, or even implied. 

Consider the following research 

undertaken by the University of Texas 

Health Science Center at San Antonio 

appearing to show a link between 

consumption of diet soda and weight 

gain. 

The study of more than 600 normal-weight 

people found, eight years later, that they 

were 65 percent more likely to be 

overweight if they drank one diet soda a 

day than if they drank none. And if they 

drank two or more diet sodas a day, they 

were even more likely to become 

overweight or obese. 

An entertaining demonstration of this 

fallacy once appeared in an episode of 

The Simpsons (Season 7, "Much Apu 

About Nothing"). The city had just spent 

millions of dollars creating a highly 

sophisticated "Bear Patrol" in response 

to the sighting of a single bear the week 

before. 

Our students and the general public 

often take such relationships as causal. 

By no means does this state that diet 

soda causes obesity - but there is a 

strange pattern at play here. 

A relationship other than causal might 

exist between the two variables. It is 

possible that there is some other 

variable or factor that is causing the 

outcome. This is sometimes referred to 

as the "third variable" or "missing 

variable" problem. 

• What are some plausible 

alternative explanations 

for our diet soda/obesity research 

example? 




A relationship other than causal 

might exist between the two 

variables. It's possible that there 

is some other variable or factor 

that is causing the outcome. This 

is sometimes referred to as the 

"third variable" or "missing 

variable" problem. 

Ice cream sales and the number of shark 

attacks on swimmers are correlated. 

Skirt lengths and stock prices are highly 

correlated (as stock prices go up, skirt 

lengths get shorter). 

The number of cavities in elementary 

school children and vocabulary size are 

strongly correlated. 

There are two relationships 

which can be mistaken for 

causation: 

1. Common response 

2. Confounding 

We must be very careful in interpreting 

correlation coefficients. Just because 

two variables are highly correlated does 

not mean that one causes the other. In 

statistical terms, we simply say that 

correlation does not imply causation. 

There are many good examples of 

correlation which are nonsensical when 

interpreted in terms of causation. 




1. Common Response: Z → X and Y 

Both X and Y respond to changes in 

some unobserved variable, Z. All 

three of our previous examples are 

examples of common response. 

2. Confounding 

The effect of X on Y is indistinguishable 

from the effects of other explanatory 

variables on Y. When studying medical 

treatments, the “placebo effect” is an 

example of confounding. 

When can we imply 

causation? 

Controlled experiments 

must be performed. 

Unless data have been gathered by experimental 

means and confounding variables have been 

eliminated, correlation never implies causation. 

The placebo effect is the phenomenon 

that a patient's symptoms can be 

alleviated by an otherwise ineffective 

treatment, since the individual expects or 

believes that it will work. 

For example, if we are studying the 

effects of Tylenol on reducing pain, and 

we give a group of pain-sufferers Tylenol 

and record how much their pain is 

reduced, the effect of Tylenol is 

confounded with the effect of giving them 

any pill. Many people will report a 

reduction in pain by simply being given a 

sugar pill with no medication. 

Experimental research attempts to 

understand and predict causal 

relationships. Since correlations can be 

created by an antecedent, Z, which 

causes both X and Y, or by confounding 

variables, controlled experiments are 

performed to remove these possibilities. 

Still the great Scottish philosopher David 

Hume has argued that we can only 

perceive correlation, and causality can 

never truly be known or proven. 
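
The common response pattern described above, where an unobserved Z drives both X and Y, can be imitated with a small simulation. The sketch below uses invented, randomly generated numbers (not real temperature, sales, or crime data), and every name and parameter in it is an assumption chosen for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=200, seed=0):
    """Synthetic data: Z (temperature) drives both X and Y; X never affects Y."""
    rng = random.Random(seed)
    temperature = [rng.uniform(50, 100) for _ in range(n)]   # Z
    ice_cream = [t + rng.gauss(0, 5) for t in temperature]   # X = Z + noise
    crime = [t + rng.gauss(0, 5) for t in temperature]       # Y = Z + noise
    return temperature, ice_cream, crime


if __name__ == "__main__":
    z, x, y = simulate()
    print("r(X, Y):", round(pearson_r(x, y), 2))
    # Subtracting Z leaves only the independent noise in each variable.
    x_left = [xi - zi for xi, zi in zip(x, z)]
    y_left = [yi - zi for yi, zi in zip(y, z)]
    print("r(X, Y) with Z removed:", round(pearson_r(x_left, y_left), 2))
```

In this made-up setting X and Y are strongly correlated even though neither influences the other, and the correlation largely disappears once the shared variable Z is accounted for.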




Part B: Headlines 

Distribute Handout 4, Part B (page 3-65). 

Participants brainstorm common 

causes or confounding variables for 

various headlines and related 

correlations. 

Within your group, brainstorm common 

causes or confounding variables. Write 

your ideas below and be prepared to 

share. 

Handout 5, Supplemental Reading, 

(pages 3-66 – 3-67). 

This is a humorous article discussing the 

correlation vs. causation debate. 

PowerPoint slides of the headlines are 

included to help in a summary 

discussion of this activity. 



ACTIVITY 1 

This module opens with an explanation of the way that paired measurements can be 

plotted in two-dimensional space. Next, positive and negative relationships are 

discussed and participants are asked to predict values using a regression equation. It 

concludes with a discussion of outliers. 

PART A 

Consider the following. 

At approximately 6:45 a.m., Tuesday morning, Principal Espinoza saw something 

strange as he opened the backdoor to B. Wyatt High School. As he entered the 

hallway, he immediately discovered the broken glass from the classroom door. It was a 

9th grade Math classroom. The computers were missing, the desks were overturned, 

and the prized school banner was torn from the wall. The perpetrators were long gone, 

but they had left something behind. Next to the desk, where Mrs. Joe’s computer once 

sat, was the imprint of a forearm on the board. When the police arrived, they 

immediately began to gather forensic evidence. Mr. Espinosa, knowing your love of CSI 

and Numb3rs, asks you to help gather data to help identify the bandits. 

Bones of the arm can reveal interesting facts about an individual. But can they reveal a 

person's height? Forensic anthropologists team up with law enforcers to help solve 

crimes. Let us combine math with forensics to see how. 

Collect data for 8 people. 

Person   Forearm Length (inches)   Height (inches) 

Handout 1-1 



1. From the table, describe any relationships you see between the variables forearm 

length and height. 

Making a scatter plot can provide a useful summary of a set of bivariate data (two 

variables). It gives a good visual picture of the relationship between the two variables 

and aids in the interpretation of the correlation coefficient and regression model. The 

scatterplot should always be drawn before working out a linear correlation coefficient or 

fitting a regression line. 

A positive association is indicated on a scatterplot by an upward trend (positive 

slope), where larger x-values correspond to larger y-values and smaller x-values 

correspond to smaller y-values. A negative association would be indicated by the 

opposite effect (negative slope) where the higher x-values would correspond to lower y-values. 

Or, there might not be any notable linear association. 

2. We will use the web applet Correlation for further investigation in the following 

exercises. Enter the forearm length and height data into the table and examine the 

scatterplot. 

In 1896, Karl Pearson gave the formula for calculating the correlation coefficient known 

as r. (To see it, select show equation for r.) He argued that it was the best indicator of 

linear relationships. It derives its name from linear, meaning “straight line,” and co-relation, 

meaning to "go together." Computing the correlation coefficient 

by hand is quite tedious. However, today’s calculators can easily compute r. It is often 

referred to as the Pearson Product Moment Correlation Coefficient. 

We can generally categorize the strength of correlation as follows: 

• Strong: |r| > 0.8 

• Moderate: 0.5 < |r| < 0.8 


If variables are strongly correlated, we often use one to predict the other. A gross 

example from forensic science is using the size and larval stage of maggots to predict 

time of death. Linear regression is the method used to create these mathematical 

prediction models. Given X, we can predict Y. If the correlation is high enough, record 

the function for the regression line. 

3. Using the information you collected, try predicting the height of our assailant for Mr. 

Espinosa. A copy of the police imprint from our assailant is attached. 

● What would increase your confidence in this prediction? 

In real life, mathematics always begins with a question. What do you want to know? 

This is followed by data collection. If it is bivariate data, scatterplots are drawn to give 

the “big picture.” If the relationship looks linear, the correlation coefficient is calculated 

to quantify the relationship. If the r value is reasonable, a linear function can be found 

that is used to predict what has not been observed; in our case, the height of the 

assailant. 

Handout 1-3 



PART B – A Closer Look 

Now let us look more closely at how we measure the strength of associations between 

data sets. The correlation coefficient can range from -1 to 1. ( ± 1 being a perfect linear 

correlation between the two variables.) If the variables are completely independent, the 

correlation is 0. However, the converse is not true since the correlation coefficient 

detects only linear dependencies between two variables. 
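
A short numerical check can make the last point concrete. The sketch below is a Python illustration with a hand-picked data set (the helper `pearson_r` and the data are assumptions, not part of the applet): y is completely determined by x, yet the correlation coefficient is zero because the dependence is not linear.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    xs = [-2, -1, 0, 1, 2]
    ys = [x * x for x in xs]     # y = x^2, so y depends entirely on x
    print(pearson_r(xs, ys))     # the positive and negative products cancel
```

The points lie exactly on a parabola, so the products on the left of the geocenter cancel those on the right, and r gives no hint of the relationship. This is one reason to look at the scatterplot before calculating r.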

Let us investigate changes in our data set. 

1. Click and drag one point of your scatterplot until the correlation is 0.3. Record the 

coordinates. 

● Is the placement of this point unique? 

● What does the new point represent in terms of the context? 

An outlier is an observation that lies an abnormal distance from other values in a 

sample. In a sense, this definition leaves it up to you, the analyst, to decide what will be 

considered abnormal. Before abnormal observations can be singled out, it is necessary 

to characterize normal observations. If the data point is in error, it should be corrected if 

possible. If there is no reason to believe that the outlying point is in error, it should not 

be deleted without careful consideration. 

● Would you consider your point an outlier? Why? 

2. Suppose a “mistake” was made. All the forearm sizes were reported in centimeters 

(1 in. = 2.54 cm.), and all the heights were recorded in inches. A student tells you 

that the correlation will be too low saying that increasing the forearm data by a factor 

greater than 1 will spread the points in a graph. Do you agree with the student? 

How would you explore this issue? 

Handout 1-4 



● What do you suppose would happen to our correlation value if we changed to a 

different height scale? 

3. Delete the outlying point from your table. Now, add two additional points to make a 

correlation of 0.99. Discuss the placement of your points. 

The geocenter, also called the center of mass or centroid, is the “average” point of the 

data. If we have the points $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$, and $(x_4, y_4)$, then the coordinates of 

the geocenter would be $\left(\frac{x_1 + x_2 + x_3 + x_4}{4}, \frac{y_1 + y_2 + y_3 + y_4}{4}\right)$. The further a point is from the 

geocenter of the data, the more “leverage” it has. (Note: The regression line always 

passes through this point.) 
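
The note that the regression line passes through the geocenter can be verified on any small data set. The following Python sketch uses four made-up points; the function names and the data are illustrative assumptions, and the line is computed with the standard least squares summation formulas.

```python
def geocenter(points):
    """Average x and average y of the data: the 'average' point."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def least_squares_line(points):
    """Slope and intercept of the least squares line y = m*x + b."""
    n = len(points)
    sum_x = sum(x for x, _ in points)
    sum_y = sum(y for _, y in points)
    sum_xy = sum(x * y for x, y in points)
    sum_x2 = sum(x * x for x, _ in points)
    denom = n * sum_x2 - sum_x ** 2
    m = (n * sum_xy - sum_x * sum_y) / denom
    b = (sum_y * sum_x2 - sum_x * sum_xy) / denom
    return m, b


if __name__ == "__main__":
    pts = [(1, 2), (2, 3), (4, 3), (5, 6)]   # hypothetical data
    gx, gy = geocenter(pts)
    m, b = least_squares_line(pts)
    # The line's value at the geocenter's x should match the geocenter's y
    # (up to floating-point rounding).
    print("geocenter:", (gx, gy))
    print("line at geocenter x:", m * gx + b)
```

Here the geocenter is (3, 3.5) and the fitted line is approximately y = 0.8x + 1.1, which gives 3.5 at x = 3.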

Students often have a naïve sense of correlation. We should look to extend their 

understandings. Dynamic applications such as The Geometer’s Sketchpad and web 

applets open up new avenues for exploration and deeper understandings. By allowing 

students to explore and test their own conjectures, they take ownership of their 

mathematical understandings. 

Handout 1-5 



ACTIVITY 1 - Selected Answers 

This module opens with an explanation of the way that paired measurements can be 

plotted in two-dimensional space. Next, positive and negative relationships are 

discussed and participants are asked to predict values using a regression equation. It 

concludes with a discussion of outliers. 

PART A 

Consider the following. 

At approximately 6:45 a.m., Tuesday morning, Principal Espinoza saw something 

strange as he opened the backdoor to B. Wyatt High School. As he entered the 

hallway, he immediately discovered the broken glass from the classroom door. It was a 

9th grade Math classroom. The computers were missing, the desks were overturned, 

and the prized school banner was torn from the wall. The perpetrators were long gone, 

but they had left something behind. Next to the desk, where Mrs. Joe’s computer once 

sat, was the imprint of a forearm on the board. When the police arrived, they 

immediately began to gather forensic evidence. Mr. Espinosa, knowing your love of CSI 

and Numb3rs, asks you to gather data to help identify the bandits. 

Bones of the arm can reveal interesting facts about an individual. But can they reveal a 

person's height? Forensic anthropologists team up with law enforcers to help solve 

crimes. Let us combine math with forensics to see how. 

Collect data for 8 people. (The number can vary; use at least 7.) 

Person   Forearm Length (inches)   Height (inches) 

1        10.5                      63 

2        10                        66 

3        11.5                      72 

4        10.25                     62 

5        10.5                      66 

6        11.5                      71 

7        12.5                      76 



1. From the table, describe any relationships you see between the variables forearm 

length and height. 

As forearm length increases, the height increases. 

Making a scatter plot can provide a useful summary of a set of bivariate data (two 

variables). It gives a good visual picture of the relationship between the two variables 

and aids in the interpretation of the correlation coefficient and regression model. The 

scatterplot should always be drawn before working out a linear correlation coefficient or 

fitting a regression line. 

A positive association is indicated on a scatterplot by an upward trend (positive slope) 

where larger x-values correspond to larger y-values and smaller x-values correspond to 

smaller y-values. A negative association would be indicated by the opposite effect 

(negative slope), where the higher x-values would have lower y-values. Or, there might 

not be any notable linear association. 

2. We will use the web applet Correlation for further investigation in the following 

exercises. Enter the forearm length and height data into the table and examine the 

scatterplot. 



In 1896, Karl Pearson gave the formula for calculating correlation coefficients known as 

r. (To see it, select show equation for r.) He argued that it was the best indicator of 

linear relationships. It derives its name from linear, meaning “straight line,” and co-relation, 

meaning to "go together." Computing correlation coefficients by hand is quite 

tedious. However, today’s calculators can easily compute them. It is often referred to 

as the Pearson Product Moment Correlation. 

We can generally categorize the strength of correlation as follows: 

• Strong: |r| > 0.8 

• Moderate: 0.5 < |r| < 0.8 


their forearm? These questions of reliability are very important in applying the 

ideas of this activity to data from any real life experiment. 

In real life, mathematics always begins with a question. What do you want to know? 

This is followed by data collection. If it is bivariate data, scatterplots are drawn to give 

the “big picture.” If the relationship looks linear, the correlation coefficient is calculated 

to quantify the relationship. If the r value is reasonable, a linear function can be found 

that is used to predict what has not been observed, in our case, the height of the 

assailant. 
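
The process summarized above (collect data, compute r, fit a line, predict) can be sketched in Python using the seven sample measurements from the answer key table. The function names are illustrative, and the forearm length of 11.0 inches used for the prediction is a hypothetical value, not a measurement from the activity.

```python
import math

# Sample (forearm length, height) pairs in inches from the answer key table.
DATA = [(10.5, 63), (10, 66), (11.5, 72), (10.25, 62),
        (10.5, 66), (11.5, 71), (12.5, 76)]


def pearson_r(points):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(points):
    """Slope and intercept of the least squares line y = m*x + b."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    m = sxy / sxx
    return m, mean_y - m * mean_x


def predict_height(points, forearm):
    """Predicted height (inches) for a given forearm length (inches)."""
    m, b = least_squares_line(points)
    return m * forearm + b


if __name__ == "__main__":
    print("r =", round(pearson_r(DATA), 3))
    m, b = least_squares_line(DATA)
    print(f"height = {m:.2f} * forearm + {b:.2f}")
    # 11.0 inches is a hypothetical forearm length chosen for illustration.
    print("predicted height:", round(predict_height(DATA, 11.0), 1))
```

For the sample table, r comes out at roughly 0.93, which falls in the strong category, so a linear prediction is reasonable. A forearm well outside the 10 to 12.5 inch range of the data would require extrapolation and deserves less confidence.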

PART B – A Closer Look -Selected Answers 

Now let us look more closely at how we measure the strength of association between 

data sets. The correlation coefficient can range from -1 to 1. ( ± 1 being a perfect linear 

correlation between the two variables.) If the variables are completely independent, the 

correlation is 0. However, the converse is not true since the correlation coefficient 

detects only linear dependencies between two variables. 


Let us investigate changes in our data set. 

1. Click and drag one point of your scatterplot until the correlation is 0.3. Record the 

coordinates. 

Ex. (8.07, 72.6) Answers may vary 



● Is the placement of this point unique? 

No, it is not. Encourage participants to find several. 

Further Questions: How many places can the point be placed? 3, 4, infinite? 

● What does the new point represent in terms of the context? 

Answers may vary. The example point (8.07, 72.6) represents a person who is 

approximately 6 feet ½ inch tall with 8-inch forearms. 

An outlier is an observation that lies an abnormal distance from other values in a 

sample. In a sense, this definition leaves it up to you, the analyst, to decide what will be 

considered abnormal. Before abnormal observations can be singled out, it is necessary 

to characterize normal observations. If the data point is in error, it should be corrected if 

possible. If there is no reason to believe that the outlying point is in error, it should not 

be deleted without careful consideration. 

● Would you consider your point an outlier? Why? 

Answers may vary. Yes. The point represents an abnormal situation. He is 6 feet 

tall and has the arms of a child or possibly an amputee. 
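The effect of a single unusual point on r can also be seen away from the applet. The sketch below is only an illustration: the forearm and height values are made-up numbers, not the workshop data, and `pearson_r` is a hypothetical helper. It moves one point of a strongly correlated data set to the example location (8.07, 72.6) and compares r before and after.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient (hypothetical helper for this sketch)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up forearm lengths and heights, both in inches (assumed values).
forearm = [9.0, 9.5, 10.0, 10.5, 11.0, 11.5]
height = [62, 65, 66, 69, 70, 73]
r_before = pearson_r(forearm, height)

# "Drag" the first point to the example location from the activity.
forearm_moved = [8.07] + forearm[1:]
height_moved = [72.6] + height[1:]
r_after = pearson_r(forearm_moved, height_moved)

print("r before moving the point:", round(r_before, 2))
print("r after moving the point: ", round(r_after, 2))
```

With these assumed values, one relocated point is enough to turn a strong correlation into a weak one, which is why the decision to keep or delete an outlier deserves careful consideration.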

2. Suppose a “mistake” was made. All the forearm sizes were reported in centimeters (1 in. = 2.54 cm) and all the heights were recorded in inches. A student tells you that the correlation will be too low, saying that multiplying the forearm data by a factor greater than 1 will spread the points in the graph. Do you agree with the student? How would you explore this issue?

Encourage participants to try it. This question should encourage a healthy 

discussion of important misconceptions. How does the visual spread of the data 

affect the correlation? What if both height and forearm are recorded in 

centimeters? How would this dilation affect the correlation? This confronts the 

students’ belief that if we spread the data, the correlation should diminish. 

However, changes in scale do not affect correlations. 

● What do you suppose would happen to our correlation value if we changed to a 

different height scale? 

It should not affect the correlation. Changes in scale do not affect correlations. 
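One way to explore the unit question without the applet is to compute r three ways. The sketch below is a minimal illustration under stated assumptions: the measurements are made-up values and `pearson_r` is a hypothetical helper. It compares r with both variables in inches, with only the forearms converted to centimeters, and with both converted.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient (hypothetical helper for this sketch)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

CM_PER_INCH = 2.54

# Made-up measurements in inches (assumed values, not the workshop data).
forearm_in = [9.0, 9.5, 10.0, 10.5, 11.0, 11.5]
height_in = [62, 65, 66, 69, 70, 73]

forearm_cm = [CM_PER_INCH * x for x in forearm_in]
height_cm = [CM_PER_INCH * y for y in height_in]

r_inches = pearson_r(forearm_in, height_in)   # both in inches
r_mixed = pearson_r(forearm_cm, height_in)    # the "mistake": cm vs. inches
r_cm = pearson_r(forearm_cm, height_cm)       # both in centimeters

print(r_inches, r_mixed, r_cm)
```

The three values agree up to rounding error: multiplying a variable by a positive constant scales the numerator and the denominator of r by the same factor.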

3. Delete the outlying point from your table. Now, add two additional points to make a 

correlation of 0.99. Discuss the placement of your points. 



The geocenter, also called the center of mass or centroid, is the “average” point of the data. If we have the points $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$, and $(x_4, y_4)$, then the coordinates of the geocenter would be

$$\left( \frac{x_1 + x_2 + x_3 + x_4}{4},\ \frac{y_1 + y_2 + y_3 + y_4}{4} \right).$$

The further a point is from the geocenter of the data, the more “leverage” it has. (Note: The regression line always passes through this point.)
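The note about the regression line can be verified on a small example. The sketch below is an illustration only: the four points are assumed values, and the least-squares line is found by solving the two normal equations directly with Cramer's rule rather than by any method the applet may use.

```python
import math

# Four made-up data points (assumed values for illustration).
points = [(1, 2), (2, 3), (4, 7), (5, 8)]
n = len(points)

# Geocenter: the average x and the average y.
cx = sum(x for x, _ in points) / n
cy = sum(y for _, y in points) / n

# Sums needed for the least-squares normal equations:
#   slope * sum_xx + intercept * sum_x = sum_xy
#   slope * sum_x  + intercept * n     = sum_y
sum_x = sum(x for x, _ in points)
sum_y = sum(y for _, y in points)
sum_xx = sum(x * x for x, _ in points)
sum_xy = sum(x * y for x, y in points)

det = sum_xx * n - sum_x * sum_x
slope = (sum_xy * n - sum_x * sum_y) / det
intercept = (sum_xx * sum_y - sum_x * sum_xy) / det

print("geocenter:", (cx, cy))
print("line: y =", slope, "* x +", intercept)
print("line evaluated at the geocenter's x:", slope * cx + intercept)
```

Evaluating the fitted line at the geocenter's x-coordinate returns the geocenter's y-coordinate, up to rounding.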

Students often have a naïve sense of correlation. We should look to extend their understanding. Dynamic applications such as The Geometer's Sketchpad and web applets open up new avenues for exploration and deeper understanding. When students are allowed to explore and test their own conjectures, they take ownership of their mathematical understanding.



ACTIVITY 2 

H.G. Wells once said, “Statistical thinking will one day be as necessary as the ability to read and write.”

The goal of this activity is to gain an intuitive understanding of r. Using the web applet 

Correlation, scatterplots can easily be constructed. The dynamic nature of the applet 

allows you to see how the correlation changes when a data point is added or moved. 

PART A 

1. Clear your table and place two points on the graph. Note the correlation. 

● Would any two points have the same value? Explain. 

A student remarks that “when r is undefined, it means there is no linear model for the 

data.” Do you agree? How would you explore/explain this? 

2. Make a lower left to upper right pattern of 10 points with a correlation of 0.7. 

3. Make a vertical stack of 9 data points on the left side of the window. Add a 10th point somewhere to the right and drag it until the correlation again reaches 0.7. Is its placement unique?

4. Make another scatter plot with 10 data points in a curved pattern that starts at the 

lower left, rises to the right, then falls again at the far right. Adjust the points until 

you have a smooth curve with a correlation close to 0.7. 

● Does any other curved pattern have this same correlation? 

● What can you conclude about the numerical value of a correlation? 

Handout 2-1 



Juan and Kaylee have collected data for a class experiment. They correctly found an r-value of 0.15. Juan claims that no function will model the data well. But Kaylee says that the r-value is wrong because she has found a good model. Is this possible? How would you help these students?

5. Make a data plot with a correlation of 0 by placing 8 to 10 points on the graph. 

6. Enter 4 points in the table to make a perfect rectangle. Note the correlation value. 

7. Create several other data sets with a horizontal or vertical line of symmetry and note 

the correlation value. 

Let us take a closer look at the numerical value of r by investigating the equation that 

produces this quantity. n is the number of points. 

8. Select “Show equation for r” and examine the formula to determine why and when 

the correlation is undefined. (Hint: use two points) 

So far, we have developed some intuitions about r. Its formal definition is quite complex. However, r² is much simpler, so we mention it here. r² is the fraction of total variation in the y-variable that can be explained by the regression equation. The rest of the variation is due to randomness or other factors. For example, if the correlation coefficient is 0.7, then r² = 0.49, meaning that 49% of the variation in the y-variable can be explained by the regression equation. The other 51% is due to other factors. How does this affect your understanding of how the strength of a correlation is categorized in Part A of Activity 1? Consider some other “strong,” “moderate,” and “weak” r-values.

Handout 2-2 



PART B – The r Game 

One basic rule when interpreting the correlation coefficient is to “First look at the 

scatterplot to see if the relationship between variables is linear.” If it is, you may 

calculate the correlation coefficient. Always remember that a visual analysis of data is 

quite valuable in addition to a numerical analysis. 

To understand r, it is important to understand how individual points affect the value of a correlation. The relationship of outliers, leverage points, and non-leverage points to the geocenter of a set of data is explored in this simple exercise.

Use the web applet Correlation at http://mathteks2006.net/applets to practice creating scatterplots with a specific correlation.

1. Challenge your classmates to place seven points on the graph that have a correlation 

of 0.7. 

2. You are not allowed to delete or drag points once they are placed on the graph. 

3. Price is Right Rules – The closest r-value without going over wins! 

Play several times varying the number of points and r-value. 

Handout 2-3 



ACTIVITY 2 -Selected Answers 

H.G. Wells once said, “Statistical thinking will one day be as necessary as the ability to read and write.”

The goal of this activity is to gain an intuitive understanding of r. Using the web applet 

Correlation, scatterplots can easily be constructed. The dynamic nature of the applet 

allows you to see how the correlation changes when a data point is added or moved. 

PART A 

1. Clear your table and place two points on the graph. Note the correlation. 

● Would any two points have the same value? Explain. 

No, depending on the placement of the points, it may be 1, -1, or undefined. 

A student remarks that “when r is undefined, it means there is no linear model for the 

data.” Do you agree? How would you explore/explain this? 

No. Consider the points (1, 4) and (3, 4), modeled by y = 4. The line y = 4 fits these points perfectly, yet r is undefined: when the x-values increase, the y-values neither increase nor decrease, so there is no variation in y for the formula to measure. The correlation is undefined rather than zero.
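The three possible outcomes for two points can be checked with a short computation. This is a minimal sketch, assuming a hypothetical helper `pearson_r` that returns `None` when r is undefined; the point pairs are illustrative values.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient (hypothetical helper for this sketch).

    Returns None when r is undefined, i.e. when x or y has no variation.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)

r_rising = pearson_r([1, 3], [2, 5])      # second point up and to the right
r_falling = pearson_r([1, 3], [5, 2])     # second point down and to the right
r_horizontal = pearson_r([1, 3], [4, 4])  # the points (1, 4) and (3, 4)
r_vertical = pearson_r([2, 2], [1, 6])    # two points stacked vertically

print(r_rising, r_falling, r_horizontal, r_vertical)
```

Two points with different x-values and different y-values always lie exactly on a slanted line, so r is 1 or -1; a horizontal or vertical pair gives an undefined r even though a line still passes through both points.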



2. Make a lower left to upper right pattern of 10 points with a correlation of 0.7. 

3. Make a vertical stack of 9 data points on the left side of the window. Add a 10th point somewhere to the right and drag it until the correlation again reaches 0.7. Is its placement unique?

No.
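A short computation suggests why the placement is not unique. In the sketch below (illustrative coordinates and a hypothetical helper `pearson_r`, not the applet's data), the 10th point is slid horizontally while its height is held fixed. Because the x-values take only two distinct values, moving the point left or right, as long as it stays to the right of the stack, rescales x without changing r; changing its height does change r.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient (hypothetical helper for this sketch)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Nine points stacked vertically at x = 1 (assumed coordinates).
stack_x = [1] * 9
stack_y = list(range(1, 10))

def r_with_tenth_point(x10, y10):
    """r for the stack plus a tenth point at (x10, y10)."""
    return pearson_r(stack_x + [x10], stack_y + [y10])

r_near = r_with_tenth_point(6, 20)   # tenth point at (6, 20)
r_far = r_with_tenth_point(9, 20)    # same height, farther to the right
r_lower = r_with_tenth_point(6, 12)  # same x as the first, lower height

print(round(r_near, 3), round(r_far, 3), round(r_lower, 3))
```

Every point on a horizontal line to the right of the stack gives the same correlation, so once one placement reaches 0.7, infinitely many others do as well.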



4. Make another scatter plot with 10 data points in a curved pattern that starts at the 

lower left, rises to the right, then falls again at the far right. Adjust the points until 

you have a smooth curve with a correlation close to 0.7. 

● Does any other curved pattern have this same correlation? 

● What can you conclude about the numerical value of a correlation? 

Juan and Kaylee have collected data for a class experiment. They correctly found an r-value of 0.15. Juan claims that no function will model the data well. But Kaylee says that the r-value is wrong because she has found a good model. Is this possible? How would you help these students?

It is possible. Since correlation measures only a linear relationship, to have r 

close to or equal to zero does not mean that there is no relationship between X 

and Y! For example, a relationship might be quadratic. 



5. Make a data plot with a correlation of 0 by placing 8 to 10 points on the graph. 

6. Enter 4 points in the table to make a perfect rectangle. Note the correlation value. 



7. Create several other data sets with a horizontal or vertical line of symmetry and note 

the correlation value. 
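A worked example shows why such symmetric data sets give a correlation of 0. The rectangle below is an assumed example with sides parallel to the axes; its geocenter is (3, 2), and the numerator of r is the sum of the products of each point's deviations from the geocenter.

```latex
\text{Points: } (1,1),\ (1,3),\ (5,1),\ (5,3)
\qquad \bar{x} = 3,\quad \bar{y} = 2

\sum (x_i - \bar{x})(y_i - \bar{y})
  = (-2)(-1) + (-2)(1) + (2)(-1) + (2)(1)
  = 2 - 2 - 2 + 2 = 0

\sum (x_i - \bar{x})^2 = 16, \qquad \sum (y_i - \bar{y})^2 = 4
\qquad\Longrightarrow\qquad r = \frac{0}{\sqrt{16 \cdot 4}} = 0
```

Each point has a mirror image across the line of symmetry whose product of deviations has the opposite sign, so the products cancel in pairs.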

Let us take a closer look at the numerical value of r by investigating the equation that 

produces this quantity. n is the number of points. 

8. Select “Show equation for r” and examine the formula to determine why and when the correlation is undefined and when it is zero. (Hint: use two points)
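For reference, one standard computational form of the Pearson correlation coefficient is given below. This is an assumption about what the applet displays; the applet may show a different but equivalent form. Applying it to two points $(x_1, y_1)$ and $(x_2, y_2)$, as the hint suggests, shows the three possible cases.

```latex
r = \frac{n \sum x_i y_i - \sum x_i \sum y_i}
         {\sqrt{\left[ n \sum x_i^2 - \left( \sum x_i \right)^2 \right]
                \left[ n \sum y_i^2 - \left( \sum y_i \right)^2 \right]}}

% Two points (n = 2):
n \sum x_i^2 - \left( \sum x_i \right)^2 = (x_1 - x_2)^2, \qquad
n \sum y_i^2 - \left( \sum y_i \right)^2 = (y_1 - y_2)^2

n \sum x_i y_i - \sum x_i \sum y_i = (x_1 - x_2)(y_1 - y_2)

r = \frac{(x_1 - x_2)(y_1 - y_2)}{\lvert x_1 - x_2 \rvert \, \lvert y_1 - y_2 \rvert} = \pm 1
\quad \text{when } x_1 \neq x_2 \text{ and } y_1 \neq y_2
```

The correlation is undefined whenever the denominator is zero, that is, when all the x-values are equal or all the y-values are equal. It is zero when the numerator is zero and the denominator is not.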

So far, we have developed some intuitions about r. Its formal definition is quite complex. However, r² is much simpler, so we mention it here. r² is the fraction of total variation in the y-variable that can be explained by the regression equation. The rest of the variation is due to randomness or other factors. For example, if the correlation coefficient is 0.7, then r² = 0.49, meaning that 49% of the variation in the y-variable can be explained by the regression equation. The other 51% is due to other factors. How does this affect your understanding of how the strength of a correlation is categorized in Part A of Activity 1? Consider some other “strong,” “moderate,” and “weak” r-values.
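The arithmetic for other r-values, and the claim that r² is the explained fraction of variation, can both be checked with a short sketch. The data set and the helper names below are assumptions chosen for illustration.

```python
import math

def explained_percent(r):
    """Percent of variation in y explained by the regression line: 100 * r^2."""
    return round(100 * r * r)

for r_value in (0.9, 0.8, 0.7, 0.5, 0.3):
    print("r =", r_value, "->", explained_percent(r_value), "% explained")

# Check the interpretation on made-up data (assumed values).
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 1, 4, 3, 6, 5]
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

r_data = sxy / math.sqrt(sxx * syy)

# Least-squares regression line and the variation it leaves unexplained.
slope = sxy / sxx
intercept = mean_y - slope * mean_x
ss_residual = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
explained = 1 - ss_residual / syy  # syy is the total variation in y

print("r^2 =", round(r_data ** 2, 4), " explained fraction =", round(explained, 4))
```

Note how quickly the explained percentage falls: an r-value of 0.5 corresponds to only 25% of the variation in y.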



ACTIVITY 3 - Correlation vs. Causation 

Objective 

To explore the relationship between correlation and causation. 

PART A 

In a Gallup poll, surveyors asked, “Do you believe correlation implies causation?” 64% of Americans answered “Yes” while only 38% replied “No”. The other 8% were undecided.

Consider the following: 

Ice-cream sales are strongly correlated with crime rates. 

Therefore, ice-cream causes crime. 

If correlation implies causation, this would be a fabulous finding! To reduce or eliminate 

crime, all we would have to do is stop selling ice cream. Even though the two variables 

are strongly correlated, assuming that one causes the other would be erroneous. What 

are some possible explanations for the strong correlation between the two? One 

possibility might be that high temperatures increase crime rates (presumably by making 

people irritable) as well as ice-cream sales. 

An entertaining demonstration of this fallacy once appeared in an episode of The 

Simpsons (Season 7, "Much Apu About Nothing"). The city had just spent millions of 

dollars creating a highly sophisticated "Bear Patrol" in response to the sighting of a 

single bear the week before. 

Handout 3-1 



Homer: Not a bear in sight. The "Bear Patrol" is working like a charm! 

Lisa: That's specious reasoning, Dad. 

Homer: [uncomprehendingly] Thanks, honey. 

Lisa: By your logic, I could claim that this rock keeps tigers away. 

Homer: Hmm. How does it work? 

Lisa: It doesn't work; it's just a stupid rock! 

Homer: Uh-huh. 

Lisa: But I don't see any tigers around, do you? 

Homer: (pause) Lisa, I want to buy your rock. 

Correlations are often reported in newspaper articles, magazines, and television news in ways that suggest causation. But without proper interpretation, causation should not be implied or assumed.

Consider the following research undertaken by the University of Texas Health Science 

Center at San Antonio, appearing to show a link between consumption of diet soda and 

weight gain. 

The study of more than 600 normal-weight people found, eight years later, that they were 65 

percent more likely to be overweight if they drank one diet soda a day than if they drank none. 

And if they drank two or more diet sodas a day, they were even more likely to become 

overweight or obese. 

Our students and the general public often take such relationships as causal. By no 

means does this state that diet soda causes obesity - but there is a strange pattern at 

play here. 

A relationship other than causal might exist between the two variables. It is possible that 

there is some other variable or factor that is causing the outcome. This is sometimes 

referred to as the "third variable" or "missing variable" problem. 

• What are some plausible alternative explanations for our diet soda/obesity research example?

We must be very careful in interpreting correlation coefficients. Just because two variables are highly correlated does not mean that one causes the other. In statistical terms, we simply say that correlation does not imply causation. There are many good examples of correlation which are nonsensical when interpreted in terms of causation.

For example, each of the following pairs is strongly correlated:

• Ice cream sales and the number of shark attacks on swimmers. 

• Skirt lengths and stock prices (as stock prices go up, skirt lengths get shorter). 

• The number of cavities in elementary school children and vocabulary size. 

• Peanut butter sales and the economy. 

Two relationships which can be mistaken for causation are: 

1. Common response: Both X and Y respond to changes in some unobserved variable, Z. The examples listed above are all examples of common response.

2. Confounding variables: The effect of X on Y is hopelessly mixed up with the effects 

of other variables on Y. 
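Common response can be imitated with a small simulation. Everything in the sketch below is simulated under stated assumptions, not real data: temperature plays the role of the unobserved variable Z, and the "ice cream sales" and "crime" values are each generated from temperature plus independent random noise, so neither one influences the other. The helper `pearson_r` and all coefficients are hypothetical.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient (hypothetical helper for this sketch)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(2006)  # fixed seed so the simulated numbers are reproducible

# Z: simulated daily high temperatures in degrees Fahrenheit.
temps = [random.uniform(60, 100) for _ in range(200)]

# X and Y each respond to Z plus independent noise (assumed coefficients).
ice_cream = [2.0 * t + random.gauss(0, 6) for t in temps]
crime = [0.5 * t + random.gauss(0, 1.5) for t in temps]

r_xy = pearson_r(ice_cream, crime)

# Subtract the part of X and Y that comes from Z. This is possible here
# only because we built the data and know the coefficients.
ice_cream_left = [x - 2.0 * t for x, t in zip(ice_cream, temps)]
crime_left = [y - 0.5 * t for y, t in zip(crime, temps)]
r_left = pearson_r(ice_cream_left, crime_left)

print("r between X and Y:", round(r_xy, 2))
print("r after removing Z's contribution:", round(r_left, 2))
```

X and Y are strongly correlated even though the simulation contains no link between them other than Z; once Z's contribution is removed, the remaining correlation is close to zero.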

When studying medical treatments, the placebo effect is an example of confounding. 

The placebo effect is the phenomenon that a patient's symptoms can be alleviated by 

an otherwise ineffective treatment, since the individual expects or believes that it will 

work. 

For example, if we are studying the effects of Tylenol on reducing pain, and we give a 

group of pain-sufferers Tylenol and record how much their pain is reduced, the effect of 

Tylenol is confounded with giving them any pill. Many people will report a reduction in 

pain by simply being given a sugar pill with no medication. 

False causes can be ruled out using the scientific method. This is done through a 

designed experiment. 

In practice, three conditions must be met in order to conclude that X causes Y, directly 

or indirectly: 

1) X must precede Y 

2) Y must not occur when X does not occur 

3) Y must occur whenever X occurs 

Handout 3-3 



Experimental research attempts to understand and predict causal relationships (X → Y). 

Since correlations can be created by an antecedent, Z, which causes both X and Y 

(Z → X & Y), or by confounding variables, controlled experiments are performed to 

remove these possibilities. Unless data has been gathered by experimental means and

confounding variables have been eliminated, one cannot infer causation.

Still, the great Scottish philosopher David Hume argued that we can only perceive correlation, and that causality can never truly be known or proven.

Handout 3-4 



PART B - Headlines 

Consider the following headlines and their matching correlations from various sources. 

Accepted uncritically, each might be used to "prove" one’s point of view in an article. 

Within your group, brainstorm common causes or confounding variables. Write your 

ideas below and be prepared to share. 

Correlated variables Causation factors 

1. Kids’ TV Habits Tied to Lower IQ Scores 

IQ scores and hours of TV time (r = -0.54) 

2. Eating Pizza ‘Cuts Cancer Risk’ 

Pizza consumption and cancer rate (r = -0.59) 

3. Gun bill introduced to ward off crime 

Gun ownership and crime (r = 0.71) 

4. Reading Fights Cavities 

Number of cavities in elementary school children and their 

vocabulary size (r = 0.67) 

5. Graffiti Linked to Obesity in City Dwellers 

BMI and amount of graffiti and litter (r = 0.45)

6. Stop Global Warming: Become a Pirate 

Average global temperature and number of pirates (r = -0.93)

Handout 4 



Supplemental Reading 

NEW POLL SHOWS CORRELATION IS CAUSATION 

WASHINGTON (AP) The results of a new survey conducted by pollsters 

suggest that, contrary to common scientific wisdom, correlation does in fact imply causation. 

The highly reputable source, Gallup Polls, Inc., surveyed 1009 Americans during the month of 

October and asked them, "Do you believe correlation implies causation?" An overwhelming 

64% of Americans answered "YES", while only 38% replied "NO". Another 8% were

undecided. This result threatens to shake the foundations of both the scientific and mainstream 

community. 

"It is really a mandate from the people." commented one pundit who wished to remain 

anonymous. "It says that The American People are sick and tired of the scientific mumbo-jumbo 

that they keep trying to shove down our throats, and want some clear rules about what to 

believe. Now that correlation implies causation, not only is everything easier to understand, it 

also shows that even Science must answer to the will of John and Jane Q. Public." 

Others are excited because this new, important result actually gives insight into why the result 

occurred in the first place. "If you look at the numbers over the past two decades, you can see 

that Americans have been placing less and less faith in the old maxim 'Correlation is not 

Causation' as time progresses." explained pollster and pop media icon Sarah Purcell. "Now, 

with the results of the latest poll, we are able to determine that people's lack of belief in 

correlation not being causal has caused correlation to now become causal. It is a real advance 

in the field of meta-epistemology." 

This major philosophical advance is, surprisingly, looked on with skepticism amongst the 

theological community. Rabbi Marvin Pachino feels that the new finding will not affect the plight 

of theists around the world. "You see, those who hold a deep religious belief have a thing called 

faith, and with faith all things are possible. We still fervently believe, albeit contrary to strong 

evidence, that correlation does not imply causation. Our steadfast and determined faith has 

guided us through thousands of years of trials and tribulations, and so we will weather this storm 

and survive, as we have survived before." 

Joining the theologists in their skepticism are the philosophers. "It's really the chicken and the 

egg problem. Back when we had to worry about causation, we could debate which came first. 

Now that correlation IS causation, I'm pretty much out of work." philosopher-king Jesse "The 

Mind" Ventura told reporters. "I've spent the last fifteen years in a heated philosophical debate 

about epistemics, and then all of the sudden Gallup comes along and says, "Average household 

consumption of peanut butter is up, people prefer red to blue, and...by the way, CORRELATION 

IS CAUSATION. Do you know what this means? This means that good looks actually make 

you smarter! This means that Katie Couric makes the sun come up in the morning! This means 

that Bill Gates was right and the Y2K bug is Gregory's fault." Ventura was referring to Pope 

Gregory XIII, the 16th century pontiff who introduced the "Gregorian Calendar" we use today,

and who we now know is to blame for the year 2000. 

The scientific community is deeply divided on this matter. "It sure makes my job a lot easier." 

confided neuroscientist Thad Polk. "Those who criticize my work always point out that, although 

highly correlated, cerebral blood flow is not 'thought'. Now that we know correlation IS causal, I 

can solve that pesky mind-body problem and conclude that thinking is merely the dynamic 

movement of blood within cerebral tissue. This is going to make getting tenure a piece of cake!" 

Handout 5-1 



Anti-correlationist Travis Seymour is more cynical. "What about all the previous correlational 

results? Do they get grandfathered in? Like, the old stock market/hemline Pearson's rho is 

about 0.85. Does this mean dress lengths actually dictated the stock market, even though they 

did it at a time when correlation did not imply causation? And what about negative and 

marginally significant correlations? These questions must be answered before the scientific 

community will accept the results of the poll wholeheartedly. More research is definitely 

needed." 

Whether one welcomes the news or sheds a tear at the loss of the ages-old maxim that hoped 

to eternally separate the highly correlated from the causal, one must admit that the new logic is 

here and it's here to stay. Here to stay, of course, until next October, when Gallup, Inc. plans 

on administering the poll again. But chances are, once Americans begin seeing the 

entrepreneurial and market opportunities associated with this major philosophical advance, there

will be no returning to the darker age when causal relationships were much more difficult to 

detect. 

http://www.obereed.net/hh/correlation.html 

Handout 5-2
