
Statistics for Decision-Making in Business

1st Edition

Milos Podmanik

Foreword: What Is This Book Good For?

You're probably thinking to yourself, “Who does this guy think he is by trying to write his own book?”

The answer is both satisfying and deceiving to those who expect the traditional math course with the traditional instructor. I write this course manual to most closely match my personal teaching philosophy. What might that be? Well, I firmly believe that math education focuses too much on processes, templates, and repetitive, mundane computational skills. Is this of any importance? To some extent, yes, they are important. For the most part, however, students fail to make connections from math to the real world and vice versa. We tend to teach students how to “do” and not how to “think.” As a result, I believe it is far more important to promote a deep level of understanding, engagement, and connections to the planet we live on. After all, do you really want to become a calculator? If your answer is “yes,” then this will come as a major disappointment: a computer could calculate faster and more accurately than you decades ago! Not to mention, computers will only continue to get faster and better than you at computing.

Here's the good news: computers don't understand why they're doing what they're doing! They are simply computing machines. It takes (and most likely will always take) a rational, deep-thinking human being to provide a contextual and meaningful analysis of the inputs and outputs of a numerical process. And that, my friends, is what this book is all about.

Statistics for Decision-Making in Business © Milos Podmanik Page 2

A Note to Students 

This book is far from perfect. In fact, it will never be perfect. There is, however, a lot of blood, 

sweat, and tears put into this book (paper cuts hurt!). I spent much of my 2012 winter break 

thinking, writing, and rewriting contents in this book to make it feel “right” for both you and me. 

As such, I don't believe it's too much to ask that you read the book.

What's my point?

… Read this book!


Table of Contents

Chapter / Section / Concept / Page

1: Fundamentals of Statistics
    1.1  Data and Their Uses                                  5
    1.2  Descriptive VS. Inferential Statistics               12
    1.3  Statistics in Excel                                  21

2: Visual Representations of Data
    2.1  Visualizing Categorical Data                         29
    2.2  Visualizing Quantitative Data                        43
    2.3  Descriptive Statistics – Center and Position         56
    2.4  Descriptive Statistics – Variability                 67

3: Probability and Decision-Making
    3.1  The Idea of Probability                              82
    3.2  Joint Probability                                    89
    3.3  Probability of Unions                                99
    3.4  Conditional Probability                              107
    3.5  Combinations and Permutations                        119
    3.6  Expected Value                                       135

4: Discrete Probability Distributions
    4.1  The Binomial Distribution                            146

5: Continuous Probability Distributions
    5.1  The Ideas Behind the Continuous Distribution         158
    5.2  The Normal Distribution                              172

6: Sampling Distributions and Estimation
    6.1  Sampling Distribution for x̄                          181
    6.2  Confidence Interval for x̄                            191
    6.3  Confidence Interval for p̂                            202

7: Hypothesis Testing
    7.1  The Concept Behind Hypothesis Testing                208

Appendices
    APPENDIX A: Answers to Select Problems                    220

Chapter 1 

Fundamentals of Statistics 

1.1 Data and Their Uses 

Our lives are filled with information. While at one point we didn't have enough data in the world, now we have so much of it that computers need to be upgraded continually in order to keep up with it. Facebook records rich information about hundreds of millions of users. Studies are revealing new conclusions that allow us to make decisions about choosing the right type of treatment for medical conditions. Scientific data is establishing the strong correlation between humans' interaction with the planet and changes in climate. The power of data is limitless. However, the results of studies are often miscommunicated in the media because they are not understood. In order to fully extract the meaning of data, we must understand how to analyze them. We must be accurate and precise in what we measure and how we measure it.

1.1.1 Three Good Reasons to Study Statistics 

In no particular order, these are: 

1. To be informed 

2. To be able to make good decisions based on data and to understand current issues 

3. To be able to evaluate decisions that affect the operations of a business and our personal 

lives 

1. To be informed 

What does it mean to be informed? To be informed we should be able to understand and interpret tables, charts, and graphs. We should be able to make sense of the conclusions of others' research based on their numerical results. Moreover, we should have insight into the gathering, summarization, and analysis of data, and so we should always approach numerical results with a slight bit of doubt. In other words, we ideally want to adopt the attitude of "doubt until there is enough evidence to trust." Let's take a look at some examples of where statistics have helped inform society.

Examples: 

- Does it matter how long children are bottle-fed? An experiment was run to determine differences in iron deficiency and the length of time that a child is bottle-fed.

- In 2005, Medicare candidates faced a decision of which prescription medication plan to choose. A program called PlanFinder was made available online to compare available options. But are senior citizens online?

- A study in 2005 attempted to answer the question, are students ruder today than in the past? A survey was conducted.

- Is domestic violence common? A study in 2005 interviewed about 24,000 women to attempt to answer this question.

- What factors are involved in student achievement in school? Is study time the most important factor in answering this question? A study concluded that things such as prioritizing student achievement and encouraging teacher collaboration may have some impact.

- Do the accounts receivable reported by a business accurately reflect the true accounts receivable? The IRS randomly audits businesses to try to answer this question.

- A stock's share value change has fluctuated between -1.2% and 8.9% over the last year. What predictions should an investor make about the stock over the coming year in order to decide whether to purchase?

- CVS Pharmacy sells 5 lb. bags of 100% Pure Cane Granulated Sugar. As a quality control measure, the company would like to know the amount of variability in the true weight of sugar placed into each of the bags.

2. Making Good Decisions 

How can we ever be sure that the results we're seeing or reading are truly the ones we should believe? Although it is assumed that those who talk about data understand statistics, you'd be surprised how poor some of their conclusions are. We'll definitely see why by the time this course is over. You'll learn how to summarize data, how to analyze it, and, most importantly, how not to make conclusions about it. The title "Making Good Decisions" should not be new to you, hopefully.

3. Evaluating Decisions that Affect Our Lives 

Are you satisfied that the Food and Drug Administration (FDA) has allowed a new patent for the drug Zoloft, which is now also useful for Social Anxiety Disorder (in addition to depression), but which has undergone no additional research to prove the claim? Do you know why you're paying $720 for car insurance every six months, while your roommate is paying only $450? If a mammogram comes back positive for breast cancer, is there any chance that this is a false positive? Should you be surprised that no Hispanic applicants were hired by a company that selected three applicants from a pool of 15 Caucasian and 5 Hispanic applicants? Is there a reason to suspect inequality? It should not surprise you that these can be answered with probability and statistics.
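
As a rough illustration of how probability speaks to the hiring question, the sketch below assumes (this assumption is not stated in the text) that the three hires are drawn completely at random from the 20 applicants. Under that assumption it computes the chance that none of the 5 Hispanic applicants is selected. Counting methods of this kind are covered in Chapter 3; the variable names here are illustrative only.

```python
from math import comb

# Applicant pool from the example: 15 Caucasian, 5 Hispanic, 3 hires.
caucasian, hispanic, hires = 15, 5, 3

# ASSUMPTION: every group of 3 applicants is equally likely to be hired.
# Ways to pick all 3 hires from the 15 Caucasian applicants,
# divided by the ways to pick any 3 of the 20 applicants.
ways_no_hispanic = comb(caucasian, hires)
ways_total = comb(caucasian + hispanic, hires)
p_no_hispanic = ways_no_hispanic / ways_total

print(f"P(no Hispanic applicant hired) = {p_no_hispanic:.3f}")
```

If selection really were random, an all-Caucasian outcome would happen roughly 4 times in 10, so this single result would not by itself be strong evidence of inequality.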

1.1.2 Types of Data 

In order to be able to reach the goals mentioned above, we need to have some sort of information 

about which to make our decisions – we call this information data. 

Data comes in two main categories: quantitative and qualitative/categorical. 

Quantitative variables, as the name implies, deal with numerical quantities. For example, the

average revenue of a Whole Foods market store is considered a quantitative variable, since the 

measurement is a number. 

Qualitative variables, on the other hand, deal with qualities. For example, the type of television 

that a customer is likely to purchase is considered a qualitative variable, since its value will be, 

for instance, plasma, LED, LCD, etc. 

1.1.3 Not All Quantitative Variables Are As They Appear! 

Just because a variable is stated as a numerical value doesn't mean that it can be treated as a numerical value. A variable must be classified according to its scale of measurement.

For instance, suppose you are to test three marketing tactics on customers. You call these Tactics 1, 2, and 3. These tactics have numerical values, but the numbers do not have any ordering significance. That is, Tactic 1 is not necessarily better than Tactic 3. These numbers serve simply as names for the values of the variable and cannot be numerically compared. We call this a variable of nominal scale.

Suppose that a business magazine reports the top three new businesses in the city each month. That is, we have businesses 1, 2, and 3, where 1 is considered the best of the three, 2 the second best, and 3 the third best. In this case, we can talk about 1 being better than 2 and 3, and 3 being worse than 1 and 2. This type of variable has the properties of a nominal scaled variable, but also has the property of order. We call this a variable of ordinal scale.

In another example, consider the variable IQ. Suppose two people have IQs of 100 and 120. Based on this information, we can say that the person with 120 has a higher IQ. However, we can also say that the second person has an IQ that is 20 points higher than the first person. We couldn't really say this for the ranking example above, since the gap between ranks 1 and 2 need not equal the gap between ranks 2 and 3. In addition to being nominal (a person can be identified by their value) and ordinal (we can rank the scores), we can also talk about the differences in scores. This type of variable is of interval scale.


The most powerful type of variable is one that contains all of the above properties, but whose ratio between two values is meaningful and whose value of zero means a complete absence of the characteristic. While IQ is of an interval scale, it does not make much sense to say that the person with the 120 IQ is 20% (120/100 = 1.2) smarter than the person with the 100 IQ. Certainly we cannot say that a person with 0 IQ has no intelligence at all (this person is probably not even alive!). Consider, however, the salaries of different types of employees. One employee makes $100,000 and another makes $120,000. We can definitely say that the second person makes 20% more than the first person, and we can also say that a value of $0 would indicate a person makes no money at all (total absence of that variable). This variable is of ratio scale.
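
The four scales can be summarized by which comparisons are meaningful at each level. The sketch below is only a memory aid: the dictionary and the helper function `is_meaningful` are hypothetical names introduced here, not part of any library, and the numbers are the IQ and salary values from the text.

```python
# Comparisons that are meaningful at each scale of measurement.
# Each scale keeps the properties of the one before it and adds one more.
MEANINGFUL = {
    "nominal":  {"equal"},
    "ordinal":  {"equal", "order"},
    "interval": {"equal", "order", "difference"},
    "ratio":    {"equal", "order", "difference", "ratio"},
}

def is_meaningful(scale, comparison):
    """Return True if the comparison makes sense for a variable of this scale."""
    return comparison in MEANINGFUL[scale]

# IQ is interval scale: a difference is meaningful, a ratio is not.
iq_difference = 120 - 100
print(iq_difference, is_meaningful("interval", "ratio"))

# Salary is ratio scale: $120,000 is 1.2 times $100,000, i.e. 20% more.
salary_ratio = 120_000 / 100_000
print(salary_ratio, is_meaningful("ratio", "ratio"))
```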

1.1.4 How We Obtain Data 

The first question we have after knowing a bit about data is, how do we get it?

Existing Data 

In some instances, this data already exists and is available to the researcher. For instance, one 

can easily go online and find existing data on the U.S. public. We can view things like the 

average credit card debt per person by state, pounds of grains produced in the United States since 

1950, etc. This data is usually available through a number of websites, such as:

- U.S. Statistical Abstract (U.S. Census) - http://www.census.gov/compendia/statab/
- Federal Reserve Board - http://www.federalreserve.org
- Office of Management and Budget - http://www.whitehouse.gov/omb
- Department of Commerce - http://www.doc.gov
- Bureau of Labor Statistics - http://www.bls.gov
- FedStats - http://www.fedstats.gov/

There are literally thousands of other repositories for existing data. Sometimes a little bit of 

research unveils a plethora of results. 

If a company is doing a study of its clients, it may already have a myriad of existing internal 

data. 

Conducting a Study to Obtain Data 

We hear a lot of things coming from our failing media sources. Data is blindly reported, while the method of data collection is ignored. Why do you think there are so many conflicting conclusions reached? One week coffee is linked to cancer, while the next it fights cancer. Which is it?

Many times, observational studies are conducted. There is no experimenter manipulation in this type of study. For example, a zoologist might study elephant eating patterns in various climates to determine whether climate has an effect on caloric intake (the response variable: what is measured). He probably cannot manipulate the climate (the predictor variable: it serves to predict responses) in which the elephant lives, for many reasons, not the least of which is the difficulty of transporting such an animal. Not to mention, there are startling ethical concerns with such an action! He probably cannot dictate how much food is in the environment, either. Certainly, he can get an accurate reading of the elephant's food intake by following the animal for several days. At the end of the day, the zoologist is merely observing what happens. His conclusions are limited.

An experiment, on the other hand, is a type of study in which the experimenter is able to control and manipulate most, if not all, environmental factors. If the experimenter is studying the effects of caffeine on math test scores, for instance, he would have a control group of, perhaps, students to whom he gives no coffee and another, experimental group, to which he gives coffee with 60 mg of caffeine. He then measures each group on test score performance (% of total correct).

Suppose the experimental group does poorly compared to the control group. Can we be sure that it was due to the caffeine? As long as test conditions were the same in each group, yes. If, however, there was something different between the two groups in addition to the presence/absence of caffeine, then the results are not so clear. What if, for instance, they played music with the control group and none with the experimental group? How do we know better performance in the control group wasn't an effect of soothing music calming the nerves? It could even have been a combination of no caffeine and music.

Punchline: In an experiment, we manipulate one factor and hold all other conditions constant. 

Most of the time it is desirable to run an experiment. The number one reason for this is that we can usually collect evidence that leads to a cause-and-effect relationship, assuming the experiment is conducted properly. In an observational study this is generally not possible, because there may be many confounding variables, or variables that are related to both the explanatory and the response variable. Consider this classic example: a researcher counts the number of crimes committed in a city and then the number of churches in that city. She does this for quite a few cities. It is found that there is a positive relationship between the number of crimes committed and the number of churches. That is, as crime increases, so does the number of churches. What gives? Do these people just repent more often for their guilty consciences?


It may not come as a large shock that we're dealing with potentially many confounding variables. 

The simplest one is population. As a city's population increases, more crime is committed and 

more churches are needed. This is but one possible explanation. 
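
To see how a confounding variable can manufacture a relationship, the sketch below simulates made-up cities. Everything in it is an assumption chosen for illustration: the population range, the per-person crime and church rates, and the helper `pearson_r`. Crime and church rates are generated independently of each other, yet the raw counts end up strongly related because both grow with population.

```python
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the simulation is repeatable

# HYPOTHETICAL data: 500 cities with populations from 10,000 to 1,000,000.
populations = [rng.uniform(10_000, 1_000_000) for _ in range(500)]
# Per-person rates are drawn independently: no direct crime-church link.
crimes = [p * rng.uniform(0.003, 0.007) for p in populations]
churches = [p * rng.uniform(0.0005, 0.0015) for p in populations]

# Relationship between raw counts (population is hidden in both).
raw_r = pearson_r(crimes, churches)

# Relationship after removing population by using per-person rates.
per_capita_r = pearson_r(
    [c / p for c, p in zip(crimes, populations)],
    [ch / p for ch, p in zip(churches, populations)],
)

print(f"raw counts: r = {raw_r:.2f}; per person: r = {per_capita_r:.2f}")
```

The raw counts show a strong positive relationship, while the per-person rates show essentially none, which is what we would expect if population alone were driving the pattern.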

Example 1: An educational researcher finds that there is a strong relationship between the number of hours a student studies and his/her grade point average (GPA). List a few possible confounding variables.

SOLUTION: There is no guarantee that studying more causes a higher GPA. There are many factors that might influence a higher GPA:

- More sleep
- Less stress (maybe due to lack of job)
- Less television viewing
- Better study environment
- More support from family/friends

Issues in Planning a Study 

There are many. Let's consider the following scenario to help illustrate a few. 

Scenario: Suppose we want to test whether or not a newly designed Freud circular saw blade runs at a lower temperature, and hence causes fewer burn marks in the wood, than the old blade at 7200 revolutions per minute (RPM).

Can we just run the cuts, take the temperatures, and compare? I think you know the answer to this.

First off, we face many extraneous factors, or variables that are not of interest in the current study but that are thought to affect the response variable. Examples? The person doing the cutting with each blade (same or not?). The type of wood being cut (is one pine and the other oak?). The type of saw (low-power Craftsman, or professional Jet?).

In order to avoid having these types of factors affect our measurement, we must control them. We can do this by having the same person do the cutting, using the same type of wood cut in exactly the same way, and using the same saw for both tests.

Secondly, is it sufficient to cut just one board using each blade? Definitely not. We must expect that there will be some variation or variability in the temperatures we measure. That is, if I run the cut with the old blade four times, I may read temperatures of 205°, 202°, 209°, and 219°. This difference among the measurements is called variability. Thus, to take into account the variability, we must take several replications, or repeated measurements. Then, we would likely use the mean, or average, of the replications.
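
As a quick check of the idea, the four replicated readings quoted above can be averaged directly. This is a minimal sketch; the variable names are illustrative.

```python
# Four replicated temperature readings for the old blade (from the text).
temperatures = [205, 202, 209, 219]

# Mean of the replications: total divided by the number of readings.
mean_temperature = sum(temperatures) / len(temperatures)

# Spread of the readings: largest minus smallest.
spread = max(temperatures) - min(temperatures)

print(mean_temperature)  # 208.75
print(spread)            # 17
```

A single cut could have read anywhere from 202° to 219°, which is why one measurement per blade would not be a fair comparison.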


Although far from last, we will consider here one more important concept. You might not think anything of it at first, but do you suppose that it's a good idea to use just two saw blades for the experiment, one old and one new? What if we happened to get a faulty blade out of the batch? If we run 4 replications with each blade, we might consider having 4 of the old blades and 4 of the new blades.

If you have a total of 8 sheets of wood to be cut, is it okay to cut the first 4 with the old blade and the last 4 with the new blade? Surprisingly, the answer is "no." Why not? Suppose the sheets were delivered freshly cut, and still moist. Well, moisture is subject to gravity, and so the bottom four boards might be more moist than the top four. Thus, we must randomize each board to one of the two types of saw blades. In other words, we randomly assign each board to a blade. We will not consider this any further at this point.
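
One simple way to carry out such a random assignment is to shuffle the board numbers and give the first half to one blade. The sketch below assumes the boards are numbered 1 (top of the stack) through 8 (bottom); the seed value is arbitrary and only makes the example repeatable.

```python
import random

# Boards numbered by their position in the stack: 1 = top, 8 = bottom.
boards = list(range(1, 9))

rng = random.Random(42)  # arbitrary seed, for a repeatable illustration
rng.shuffle(boards)      # put the boards in a random order

# First four boards in the shuffled order go to the old blade,
# the remaining four go to the new blade.
assignment = {
    "old blade": sorted(boards[:4]),
    "new blade": sorted(boards[4:]),
}

print(assignment)
```

Because stack position no longer determines which blade a board receives, any moisture difference between top and bottom boards is spread across both blades by chance rather than piling up on one of them.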

Homework Problems - 1.1 

1. Classify each of the following variables as nominal, ordinal, interval, or ratio scale. 

Justify your answer. 

a. Favorite flavor of ice cream 

b. Temperature (°F)

c. Accounts Receivable Balance 

d. Ranking of Presidential Candidates According to Preference 

2. Based on a study of 2121 children between the ages of one and four, researchers at the Medical College of Wisconsin concluded that there was an association between iron deficiency and the length of time that a child is bottle-fed (Milwaukee Journal Sentinel, November 26, 2005).

a. How many elements does this dataset contain?

b. Is the variable categorical or quantitative? Explain.

3. The student senate at a university with 15,000 students is interested in the proportion of 

students who favor a change in the grading system to allow for plus and minus grades 

(e.g., B+, B, B-, rather than just B). Two hundred students are interviewed to determine 

their attitude toward this proposed change. 

a. How many elements does this dataset contain?

b. Is the variable categorical or quantitative? Explain.

4. An article titled “Guard Your Kids Against Allergies: Get Them a Pet” (San Luis Obispo Tribune, August 28, 2002) described a study that led researchers to conclude that “babies raised with two or more animals were about half as likely to have allergies by the time they turned six.”

a. Is this study an observational study or an experiment? Explain.

b. Describe a potential confounding variable that illustrates why it is unreasonable to 

conclude that being raised with two or more animals is the cause of the observed 

lower allergy rate. 


5. The article “Television's Value to Kids: It's All in How They Use It” (Seattle Times, July 6, 2005) described a study in which researchers analyzed standardized test results and television viewing habits of 1700 children. They found that children who averaged more than two hours of television viewing per day when they were younger than 3 tended to score lower on measures of reading ability and short-term memory.

a. Is the study described an observational study or an experiment?

b. Is it reasonable to conclude that watching two or more hours of television is the cause of lower reading scores? Explain.

6. “More than half of California's doctors say they are so frustrated with managed care they will quit, retire early, or leave the state within three years.” This conclusion from an article titled “Doctors Feeling Pessimistic, Study Finds” (San Luis Obispo Tribune, July 15, 2001) was based on a mail survey conducted by the California Medical Association. Surveys were mailed to 19,000 California doctors, and 2000 completed surveys were returned.

a. Is this study an observational study or an experiment? Explain.

b. Describe any concerns you have regarding the conclusion drawn. 

1.2 Descriptive VS. Inferential Statistics

1.2.1 The Purpose of Statistics and “Statistics” 

Statistics is a branch of mathematics that deals with the analysis of data. This is often confusing to some people, since the lower-case version of this word, statistic, actually means a piece of data. So, we have statistics, which are the data themselves, and we have Statistics, which deals with the analysis of statistics. Confusing, huh? We generally use the word statistics loosely to mean “data.”

A statistician is a special type of mathematician who deals with the analysis of data. Many 

people confuse the profession of the statistician with a person who simply has many statistics 

memorized. While some certainly may, most do not. 

Needless to say, our purpose in the field of Statistics is to understand data. Depending on one's goal, statistics may be used to simply describe an obtained set of data or to extrapolate the data to describe something much larger. These two goals are called descriptive and inferential statistics, respectively.

1.2.2 Descriptive Statistics 

Suppose you work in the accounting department and have collected the following data on 

revenues earned from new and existing customers over the past day: 


Account Type    Revenue ($)
New             $5,296
Old             $2,230
Old             $7,643
Old             $3,897
Old             $9,590
Old             $2,689
Old             $5,890
Old             $9,561
New             $3,643
New             $8,861
Old             $3,946

Your goal is to summarize the data in some meaningful way(s). Descriptive statistics is the method of describing or summarizing data. How could this be done?

We first consider the types of variables we have present:

- Account type: Categorical
    o New, Old
- Revenue: Quantitative
    o Range from $2,230 to $9,590

With categorical variables, we cannot mathematically manipulate the observed values, or 

observations (here we have “New” and “Old” for observations). We can only provide 

descriptions of the values. 

We can provide the relative frequency of these values. A relative frequency is a ratio of the number of observations of a given value to the total number of observations. Here, we could summarize by saying:

Account Type    Relative Frequency
New             3/11 ≈ 0.27
Old             8/11 ≈ 0.73

This allows us to conclude that 27% of the sales came from new clients while 73% came from existing clients. This is very valuable information! This information demonstrates that the company has grown over the course of this one day.
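
The relative frequencies can be tallied by counting each account type and dividing by the total number of observations. The sketch below uses the eleven account types from the table, in the order listed; the variable names are illustrative.

```python
from collections import Counter

# Account type for each of the 11 sales, in the order listed in the table.
account_types = ["New", "Old", "Old", "Old", "Old", "Old",
                 "Old", "Old", "New", "New", "Old"]

counts = Counter(account_types)  # number of observations of each value
total = len(account_types)       # total number of observations

# Relative frequency = count of a value / total number of observations.
relative_frequency = {kind: count / total for kind, count in counts.items()}

for kind, count in counts.items():
    print(f"{kind}: {count}/{total} = {relative_frequency[kind]:.0%}")
```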

We could present these two descriptive statistics to management by either providing the raw 

percentages, or by some visual display, such as a pie chart or a bar graph. A pie chart shows 

the ratios (or all parts of one whole) of the categorical variable and thus the entire circle 

represents 100% of all account types (100% of the categorical variable values): 


[Figure: pie chart of frequency by Account Type: New 27%, Old 73%]

This literally shows the “ingredients” of the pie. A corresponding bar graph might be: 

[Figure: bar graph of Account Type, with bars for New and Old on the horizontal axis and a vertical axis scaled from 0 to 9]

In a similar way, we could describe Revenue, the quantitative variable. Typically, quantitative variables are described by:

- Central tendency: a measure of the “typical” or center-most observation. Examples are mean (average), median (the value that is literally the middle number), and mode (the most frequently occurring number; typically not used, since data sets often do not have one).
- Variability: a measure of how spread out the data values are. A number of possible measures exist, including (but not limited to) range, interquartile range, and standard deviation.


For the present time, we'll proceed to describe one of each of the above descriptive statistics. The rest will be discussed in later sections.

Since we're most used to finding a simple average, or the mean, we will do that here. Recall that the mean can be found by summing the observations and dividing by the number of observations:

\[ \bar{x} = \frac{5{,}296 + 2{,}230 + \cdots + 3{,}946}{11} = \frac{63{,}246}{11} \approx 5{,}750 \]

Recall that when we find an average, we are placing all values into a common “pot.” We then divide the pot into equal parts. That is to say, if each company had spent the same amount of money on each purchase, they would each have spent $5,750. We like to think of this as a measure of the center value. Spending less than this amount puts a company below the average and spending more puts the company above the average.

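Mean (Simple Average)

The mean, or simple average, of a quantitative variable is expressed as:

\[ \bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x_i}{n} \]

where \(x_1, \ldots, x_n\) are the observations and \(n\) is the number of observations. This value represents the amount allocated to each observation, if each observation were to receive an equal share of the total. We think of this as the “center” value.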

In conjunction with measures that summarize the center, it is critical to focus also on how spread out the data is. One such measure is the range. The range is simply the difference between the minimum and maximum values in the dataset. In this instance, we have:

Minimum: $2,230

Maximum: $9,590

The difference is:

\[ 9{,}590 - 2{,}230 = 7{,}360 \]

Thus, the range of the dataset is $7,360. This tells us that the amount spent varied by as much as $7,360 from company to company.

Range

Range, a measure of the variability (or spread) of a dataset, is measured by taking the difference between the largest and smallest observed value. That is,

\[ \text{Range} = \text{maximum} - \text{minimum} \]
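
The mean and range of the eleven revenue figures can be checked with a few lines of code. This is a minimal sketch using only built-in functions; the variable names are illustrative.

```python
# Revenue ($) for the 11 sales, in the order listed in the table.
revenues = [5296, 2230, 7643, 3897, 9590, 2689,
            5890, 9561, 3643, 8861, 3946]

# Mean: sum of the observations divided by the number of observations.
mean_revenue = sum(revenues) / len(revenues)

# Range: largest observation minus smallest observation.
revenue_range = max(revenues) - min(revenues)

print(f"Mean:  ${mean_revenue:,.0f}")   # Mean:  $5,750
print(f"Range: ${revenue_range:,}")     # Range: $7,360
```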


Example 1: For the example considered above, summarize the center and spread of revenue 

by account type. Describe any information revealed by splitting up the data in this fashion. 

SOLUTION: We are being asked to look at values specific to the account type. Thus, we will 

have two means and two ranges. 

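For “New” accounts:

Account Type    Revenue ($)
New             $5,296
New             $3,643
New             $8,861

\[ \bar{x}_{\text{New}} = \frac{5{,}296 + 3{,}643 + 8{,}861}{3} = \frac{17{,}800}{3} \approx 5{,}933, \qquad \text{Range} = 8{,}861 - 3{,}643 = 5{,}218 \]

For “Old” accounts:

Account Type    Revenue ($)
Old             $2,230
Old             $7,643
Old             $3,897
Old             $9,590
Old             $2,689
Old             $5,890
Old             $9,561
Old             $3,946

\[ \bar{x}_{\text{Old}} = \frac{45{,}446}{8} \approx 5{,}681, \qquad \text{Range} = 9{,}590 - 2{,}230 = 7{,}360 \]

We summarize this information in a table:

Row Labels     Average of Revenue ($)    Max of Revenue ($)    Min of Revenue ($)    Range
New            5933                      8861                  3643                  5218
Old            5681                      9590                  2230                  7360
Grand Total    5750                      9590                  2230                  7360

We see that both account types tend to have about the same average purchase amount. However, it appears that the amount spent by old customers is prone to more fluctuation than that of new customers. This might be due simply to the fact that there are only three new customers.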

Technology Note: All of the information above was generated using Microsoft Excel. 
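
The book's summary table came from Excel. For readers who prefer code, the same by-group summary can be sketched in Python as below; this is an alternative illustration and not the method used in the text. The names `records`, `by_type`, and `summary` are introduced here for the example.

```python
# (account type, revenue in $) for the 11 sales, in table order.
records = [
    ("New", 5296), ("Old", 2230), ("Old", 7643), ("Old", 3897),
    ("Old", 9590), ("Old", 2689), ("Old", 5890), ("Old", 9561),
    ("New", 3643), ("New", 8861), ("Old", 3946),
]

# Split the revenues into one list per account type.
by_type = {}
for account_type, revenue in records:
    by_type.setdefault(account_type, []).append(revenue)

# Center (mean) and spread (range) for each account type.
summary = {
    account_type: {
        "mean": sum(values) / len(values),
        "max": max(values),
        "min": min(values),
        "range": max(values) - min(values),
    }
    for account_type, values in by_type.items()
}

for account_type, stats in summary.items():
    print(f"{account_type}: mean={stats['mean']:.0f}, max={stats['max']}, "
          f"min={stats['min']}, range={stats['range']}")
```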

1.2.3 Inferential Statistics 

Descriptive statistics is a great way to describe what you have, but how can we describe data that we do not have?

Let's consider an example. You are the manager of the production branch at Healthy Heart Organic Foods. Due to recent workload increases, you are concerned that your employees' team morale has decreased. You have 864 employees working in your department. You would like to conduct a survey, but you do not have the means to investigate the data in each of the surveys provided. Certainly, you could pay your assistant overtime to analyze them for you, but that would be costly in both his time and your payroll. Instead, you decide to randomly survey 50 of the employees in your department in order to get an idea of the overall morale. This process of collecting data on a smaller portion of the whole in order to generalize to the whole is known as statistical inference. This branch of statistics is called inferential statistics.

It is of utmost importance to make appropriate conclusions when reporting findings of any study, whether a survey or an experiment. For example, if we find that rats die after ingestion of 20 mg of caffeine, does that mean caffeine will kill a human, as well? This brings up the worthwhile discussion of a population versus the sample. Let's consider the figure below:


First off, a researcher must decide who his target population is. That is, is he trying to describe all people in the United States? All Asian children between the ages of 2 and 5? All elk in Minnesota? The population is the set of all people, creatures, things, etc., that we wish to describe.

It is often quite time-consuming and costly to conduct a study based on whole populations. Even presidential polls rarely involve more than a couple thousand participants. Through one of a variety of processes, only a select number of elements of the target population will be selected. This select number is referred to as the sample. The process of selecting a sample from the population that we will consider is simple random sampling (SRS). This process helps to ensure that any differences that we notice among sample elements are entirely due to chance and, importantly, that every element in the target population has an equal chance of being in the sample.

Simple random sampling can be done by many means. You’ve probably heard of the random 

process of drawing a name from a box to declare the winner of a raffle. More sophisticated 

means of this are done by a random number generator on a computer, wherein every element of 

the population is assigned a whole number. Then, a series of random numbers is drawn by a 

computer and those elements are selected to be in the sample. 
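
The random number generator approach can be sketched with Python's standard `random` module. The example assumes the 864 employees from the morale survey are numbered 1 through 864 and draws 50 of them without replacement; the seed is arbitrary and only makes the illustration repeatable.

```python
import random

# Assign every element of the population a whole number: 1 through 864.
population_ids = range(1, 865)

rng = random.Random(2013)  # arbitrary seed, for a repeatable illustration

# Simple random sample: 50 distinct IDs, each equally likely to be chosen.
sample_ids = rng.sample(population_ids, 50)

print(sorted(sample_ids))
```

Each employee whose number appears in `sample_ids` would receive the survey.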

We can see in the illustration above that our goal is to then make inferences about the population based on our observations of the sample. Just as you might hear from Gallup: “55% of voters plan on voting for Candidate X,” we try to make generalizations about the target population.

As another example, consider a lighting company that is hoping to manufacture a light bulb with a new type of filament. As with any light bulb, a consumer would want to know how long the light bulb is expected to last. Unfortunately, not every light bulb will last as long as every other light bulb. This means that an average will have to be taken. To add to this, it is not possible to test every single light bulb to determine how long it will last. So, the company decides to randomly test 200 bulbs that come through the assembly line. They hope to use this sample, since it is random and is assumed to be representative of all light bulbs, to estimate the true average lifespan of a light bulb with this new filament. Here is an overview of their inferential statistics process:


(SOURCE: Essentials of Modern Business Statistics, 4th Edition, Anderson et al.) 

Though it might seem simple enough to conclude that the average light bulb survives for 76 

hours, we have to take into account the variability in the lifetimes. That is to say, we need some 

way to produce a reasonable interval for the true average, since it is the entire population we are 

looking to describe. A discussion of this inference process is left for future sections. 


1. Over its first week in the Box Office (12/14/2012 to 12/20/2012), the movie The Hobbit: 

An Unexpected Journey grossed the following amounts, in millions of dollars (no 

particular order): 

6.9 9.2 1.6 1.9 1.9 1.6 4.9 

(SOURCE: www.the-numbers.com) 

a. Calculate the mean. 

b. Explain the real-world meaning of the mean. 

c. Calculate the range. 

d. Explain the real-world meaning of the range. 

e. Provide a brief written report (summary) to the producers of the film on how the 

film is doing and the stability of gross revenues. 

2. A marketing firm conducts a focus group with eighteen randomly selected college 

students to determine their preference for a variety of clothing lines. 

a. Describe the sample. 

b. Describe the population. 

c. What variables might the marketing firm want to measure?

d. Is the firm's goal to conduct descriptive or inferential statistics?

3. In a quality control process, 250 packages of cheese are randomly selected from an 

assembly line. Each package of cheese will be described as either “pass” or “fail,” 

depending on whether or not it passes the inspection. 

a. Describe the sample. 

b. Describe the population. 

c. Quality control will fail if more than 1% of the packages fail. How many 

packages must pass?

4. Two datasets have a range of 30. Describe how it is possible that one dataset is 

considered to be more spread out than the other dataset. 

5. One hundred randomly selected CGCC students are surveyed and asked, “Do you believe 

that racism is an issue in the college setting?” The survey makers would like to generalize 

to college students. What is wrong with their study?


1.3 Statistics in Excel 

When conducting an analysis of realistic amounts of data, it is tiresome, mundane, and even 

unfeasible to carry out computations by hand. Microsoft Excel is a powerful and widely 

accessible piece of software that does all of this for us. As such, we seek to better understand how it 

works in this section. All images below come from the most recent version of Microsoft Excel. 

Excel is a spreadsheet-based software. This means that each entry, or cell, represents one piece 

of information that is all a part of a larger grid of cells. A cell may contain numerical or textual 

information. 

1.3.1 Sum(), Average(), Min(), and Max() 

Eventually, you will learn to make beautiful spreadsheets, but we are now only concerned with 

some basic features. Let's begin by entering the following accounting data from Section 1.2: 



New $5,296 

Old $2,230 

Old $7,643 

Old $3,897 

Old $9,590 

Old $2,689 

Old $5,890 

Old $9,561 

New $3,643 

New $8,861 

Old $3,946 

We can choose any cell we want to begin entering data. Let's choose cell A1 to type in the 

header. This cell reference means that we are looking at column A and row 1. We will enter our 

second column's label into cell B1. We will list the data vertically, as shown in the table above. 

After clicking on a cell and typing in each entry, simply press ENTER or TAB to move to the 

next cell. Do not press ESC, or the data you are typing will be cancelled. 

In order to see the entire labels in cells A1 and B1, we can expand the column by placing the 

cursor between the grey-shaded labels for columns A and B, clicking, holding, and dragging the 

window to an appropriate size. 


We can make it a bit more presentable by centering and by bolding the labels. 

Excel is extremely useful due to the fact that it allows us to create formulas based on the values 

of existing cells or cell ranges (a collection of one or more cells). 

A formula can either act on a provided value or on a provided set of cells. For example, suppose 

we want to add up the total revenue. We want the result to appear in cell D3. To initiate a 

formula, we must begin with = in the desired formula cell. Thus, we could click cell D3 and 

type:

= 5296+2230+7643+3897+9590+2689+5890+9561+3643+8861+3946

This, however, would defeat the purpose of having entered all the data in already! So, we will 

use the built-in sum function. To use this, we type: 


= sum(B2:B12) 

This tells Excel to sum up the range of values from B2 to B12. The colon indicates that we want 

the full range and not just the two cells B2 and B12. If we were only to have wanted to sum cells 

B2 and B12 (nothing in between), then we would have replaced the colon with a comma. 

NOTE: Excel is not case-sensitive when it comes to formulas. You can type SUM or Sum or 

even sUm and Excel will recognize what you are asking it to do. When you are analyzing 

categorical data, however, it is safest to type each category the same way every time (always “New,” never a mix of “New” and “new”). 

We get: 

(NOTE: It is highly recommended that you label your spreadsheet values. Before or after 

inserting the sum into D3, it is a good idea to label that cell's content, perhaps in cell C3 as 

shown above. This will be very helpful when your spreadsheet is loaded with information.) 

To get the proper formatting, highlight cell D3 and select “Currency” from the Number column 

in the Home Tab. This formatting only applies to the selected cell(s). 

To find the average revenue, we would simply type the following into the desired cell (we'll use 

D4): 

= average(B2:B12) 


For measures such as the range, Excel does not have a built-in range function. Excel does have a 

function to locate the maximum and minimum values in a range of cells. Into cell D5, we will 

type in: 

= max(B2:B12) - min(B2:B12) 

This will find the maximum value from B2 to B12 and subtract away the minimum from B2 to 

B12, giving us precisely the range. If it is desirable to see the max or the min, you can choose a 

cell and simply type in the max portion or the min portion without doing the subtraction, as 

shown below: 


Suppose that this company expects to earn (roughly) the same daily revenue of $63,246 on 

each day of the next 30-day month. To get the month's revenue, we would like to 

multiply this amount by 30. To do this, we would simply type into our desired output cell: 

= 30*D3 

NOTE: To indicate multiplication in Excel formulas, you must use the asterisk (*) as the multiplication sign. 

Parentheses alone, as in 30(D3), will produce an error. 
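For readers who would like to double-check the spreadsheet results outside of Excel, the same four calculations can be sketched in Python. The variable names are our own; the revenue values are the ones from the accounting table above.

```python
# Revenue values from the accounting table in this section (column B).
revenues = [5296, 2230, 7643, 3897, 9590, 2689, 5890, 9561, 3643, 8861, 3946]

total = sum(revenues)                       # like = sum(B2:B12)
average = total / len(revenues)             # like = average(B2:B12)
data_range = max(revenues) - min(revenues)  # like = max(B2:B12) - min(B2:B12)
monthly = 30 * total                        # like = 30*D3

print(total)              # 63246
print(round(average, 2))  # 5749.64
print(data_range)         # 7360
print(monthly)            # 1897380
```

The total matches the $63,246 daily revenue used above.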

There are literally hundreds of functions available through Excel. A very useful tool for learning 

how to do new things in Excel is to Google what you are trying to accomplish. For example, if I 

wanted to find the standard deviation of revenues, I might search Google for “standard deviation 

in Excel.” Thousands of results are bound to pop up. Why stop there? Try YouTube for many 

useful videos. 

1.3.2 Countif() 

It is nice to know that Excel has formulas to operate on quantities, but it could still be 

devastating to have to count categorical values by hand. 

The countif() function is useful for such an act. This function works as follows: you provide a 

range of cells for the function to evaluate. You then provide a condition that it should search for 

and it counts the number of such instances. Suppose we want to count the number of new 

accounts in cells A2 to A12, the column that holds the account types. We would enter: 

= countif(A2:A12, “New”) 

NOTE: we separate the cell range from the search criterion with a comma. After the comma, we type, in quotation marks, the 

word it is to search for. It is safest to type the word with the same spelling and capitalization as the 

entries we want Excel to count. 


We get: 

We can do the same for Old. 

A neat little trick is to modify our formula. Let's say that we want to minimize the number of 

areas in our spreadsheet that we would need to change if, say, we began calling “New” accounts 

“NB” for “New Business.” We would need to change all the account type names, as well as the 

search criteria in the formula. To make this easier, we can tell our formula to search for 

something that is already typed into an existing cell. Since C10 contains the actual word we want 

to search for, we will simply put C10 after the comma instead of the word “New.” 

= countif(A2:A12, C10) 

This tells Excel what cells to count, and it tells it what cell to find the search criteria in. We still 

get the same result. A word of caution: if you modify the entry in C10, your result in D10 will 

change accordingly (or it might produce an error). 
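The counting idea behind countif() can also be sketched in Python. The list below holds the account types from the accounting table; the names `account_types` and `criterion` are our own, with `criterion` playing the role of the cell that stores the search word. Note that Python's `==` comparison on text is case-sensitive.

```python
from collections import Counter

# Account types from the accounting table, in the order listed.
account_types = ["New", "Old", "Old", "Old", "Old", "Old",
                 "Old", "Old", "New", "New", "Old"]

# Count every category at once.
counts = Counter(account_types)
print(counts["New"])  # 3
print(counts["Old"])  # 8

# Keep the search word in one place, much like pointing the formula at a cell.
criterion = "New"
matches = sum(1 for account in account_types if account == criterion)
print(matches)  # 3
```

If the category were later renamed, only the data and the single `criterion` value would need to change.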


1. A new policy prohibiting personal emails being sent is enforced by a telemarketing 

company. A climate survey was then conducted to ask whether a randomly selected 

number of employees agrees with the policy, and the duration of time they've been with 

the company. The results are below: 


Agrees w/Policy Change    Years at Company 

Y 4 

Y 8 

N 3 

Y 10 

N 3 

N 3 

N 6 

Y 3 

Y 5 

N 8 

N 1 

Y 8 

Y 10 

Y 5 

Y 8 

Y 3 

N 8 

N 8 

Y 9 

a. Determine the mean number of years this sample has been with the company. 

b. Determine the minimum and maximum number of years a person from this 

sample has been with the company. 

c. Determine the combined overall number of years this sample has been with the 

company. 

d. Determine the frequency with which people within this sample agreed and 

disagreed with the policy change. 

e. Calculate the mean, the minimum and maximum, and the range for each of the 

two groups (agree and disagree). 

f. Describe any patterns that emerged when considering the two groups separately. 


Chapter 2 

Visual Representations of Data 

2.1 Visualizing Categorical Data 

When summarizing data, it goes without saying that there are appropriate and inappropriate ways to 

display the data. For example, if you collected a person's age and income, you might be 

interested in studying income as a function of age. In this case, you probably would not want to 

build a pie chart, since you're studying quantitative variables (two of them, at that). 

In the previous chapter, the main types of categorical data visualizations were mentioned – bar 

graphs and pie charts. Our aim here is simply to summarize and to show how to use them in 

conjunction with Excel. We'll create three types of representations: 

- Pie Chart 

- Frequency Bar Graph: the vertical axis keeps track of the number of instances of each observation 

- Relative Frequency Bar Graph: the vertical axis keeps track of the ratio of instances of each observation (decimal or percentage, typically) 

2.1.1 Creating a Pie Chart Using Excel 

Suppose a hotel owner asks 20 randomly selected recent guests to respond to the following 

statement regarding their experiences at the new hotel lounge: 

“The dining experience in Harlan’s Hotel Lounge is worth revisiting.” 

Respondents circle one of the following letter combinations: 

- SD - Strongly Disagree 

- D - Disagree 

- A - Agree 

- SA - Strongly Agree 

The resulting data is shown below: 

Participant 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 

Opinion D A SD A SD SA A A A A A A D A A A A A A A 


To represent the data to his shareholders, his marketing team constructs the above visual 

representations. 

Since the participant number is not important, it is okay to ignore that line of the dataset. Our 

focus is on the Opinion row. This is a categorical variable, so we'll begin by counting the 

number of SD, D, A, and SA responses by using Excel's countif() function. Further, we'll calculate 

the relative frequency of each response by dividing the number of responses for each category by 

the total number of observations, which we tally below all the individual frequencies: 

One new trick worth mentioning is Excel's ability to recognize patterns in our formulas. Let's 

say that we typed in our countif() formula for SD in G7 as follows: 

= countif(D6:D25, F7) 


We now have to enter a formula for the three remaining opinions. This can get time-consuming. 

So, we attempt to copy cell G7 and paste it in G8: 

This does work! Note that, since we shifted the formula down one level, F7 turned into F8. That 

is, the search criteria is now being “pulled” from F8, the cell corresponding to an opinion of 'D'. 

However, we have one problem: the counting region also shifted from D6:D25 to D7:D26. We 

don't want that! To tell Excel that we still want the counting region to be D6:D25 and to not 

change when we copy our formula, we “lock” the rows and columns by putting a dollar-sign ($) 

before the column letter and before the row number, as shown below: 

= countif($D$6:$D$25, F7) 

(HINT: If you place your cursor on one of the cell names in the formula and press 

F4 on your keyboard, you will notice the dollar-signs toggle for you.) 

Notice that F7 contains no dollar-signs, so as to indicate to Excel that we wish for the criteria cell 

to adjust down one row (still in column F) as we move down one row. We can now copy-paste 

the formula down the remaining cells: 


In G11, we would like the sum of the frequencies, so we type: 

= sum(G7:G10) 


We know from the data that this value is correct! 

To get the relative frequencies, we want to divide each frequency by the constant 20. For 

instance, the relative frequency of 'SD' would be 2/20 = 0.1. Instead of telling Excel to divide 2 

by 20, we will type the following formula into H7: 

= G7/$G$11 

Note that we lock cell G11 so that, when we copy this formula to the remaining cells, we 

continue to divide by 20, the value in G11. 

It is neat to note that we can copy the formula all the way down to H11, since it will simply take 

20 and divide it by 20, indicating that the total is 1 or 100% of the data. 
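As a cross-check on the spreadsheet, the frequency and relative frequency columns can be sketched in Python. The opinions are the 20 responses listed above; the names `opinions`, `counts`, and `table` are our own.

```python
from collections import Counter

# The 20 guest opinions, in participant order.
opinions = ["D", "A", "SD", "A", "SD", "SA", "A", "A", "A", "A",
            "A", "A", "D", "A", "A", "A", "A", "A", "A", "A"]

counts = Counter(opinions)      # frequency of each opinion
total = sum(counts.values())    # total number of observations

# One row per opinion: (opinion, frequency, relative frequency).
table = []
for category in ["SD", "D", "A", "SA"]:
    frequency = counts[category]
    table.append((category, frequency, frequency / total))

for row in table:
    print(row)
print(("Total", total, total / total))
```

Dividing every frequency by the same total is what the locked reference in = G7/$G$11 accomplishes when the formula is copied down.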


We are now prepared to construct visuals. 

To build a pie chart, we can simply highlight the four opinions and the corresponding 

frequencies (click and drag from cell F7 to G10), select the Insert tab, click on Pie in the 

Charts column, and select the desired pie chart. We'll select the first one. 


Alternatively, it is possible to insert a blank pie chart and to then select the data afterwards. The 

above process saves a couple of steps. 

Now we would like to label the chart. It would be nice to see a title and the percentages for each 

of the slices. To do this, select the chart and click on Design in the Chart Tools tab that appears. 

In the Chart Layouts column, we can select the style of chart most appropriate to our needs. For 

demonstration purposes, the first option will be shown below: 


To add a suitable title, click “Chart Title” and overwrite it with an appropriate name. If the pie 

chart becomes distorted or labels are moved undesirably, the chart box can be adjusted by dragging 

out its corners. 

There are many options when it comes to formatting graphs and charts. This will be left for 

exploration. Note also that many online sources, such as YouTube, offer tutorials on professional 

formatting within Excel. 

2.1.2 Creating a Bar Graph Using Excel 

Depending on what one would like to emphasize, a bar graph may be suitable to meet that need. 

We can create either a frequency bar graph or a relative frequency bar graph, depending on whether we 

want to display the number of times an observation appears or the percentage of observations 

resulting in each of the possible variable values. 

Using our example from above, since the frequencies are in the column adjacent to the opinion 

value, we can simply highlight all observations and frequencies and select the Insert tab, the 

Charts column, and select the first 2-D Column graph from Column. Be careful not to select the 

Total row. 


[Figure: bar graph of opinion frequencies; horizontal axis: SD, D, A, SA; vertical axis: 0 to 16; legend: Series1] 

Since there is only one variable here, we can click on “Series1” in the legend and press DELETE. 

This will free-up some space. 

[Figure: the same bar graph with the legend removed; horizontal axis: SD, D, A, SA; vertical axis: 0 to 16] 

With the graph selected, choose the Layout tab that appears in the Chart Tools area. 

You can label the graph by selecting appropriate options from “Chart Title” and “Axis Titles” on 

the left side of the selected tab. 

[Figure: “Guest Opinions of Harlan's Lounge” frequency bar graph; horizontal axis: Opinion (SD, D, A, SA); vertical axis: Frequency (0 to 16)] 

In the relative frequency bar graph, we wish only to change the measurement on the vertical axis. 

We want to draw the proportions from the third column of our data. 

We can update our current bar graph to reflect this. If you do not want to lose the information in 

your frequency bar graph, you can copy the graph and paste it beside the existing graph. This 

will allow us to modify the data that is being drawn in. 


Select the copied graph. In Chart Tools, select the Design tab. From there, click on Select 

Data. 

Select the “Edit” option above the “Legend Entries” box. 

Beside the “Series values” box, click the icon. This will now allow you to select the values 

of the dependent variable. Click and drag to select all the relative frequencies, except the total 

frequency. Then press the icon to close the dialogue box. After relabeling the vertical axis, 

you should now see: 


[Figure: “Guest Opinions of Harlan's Lounge” relative frequency bar graph; horizontal axis: Opinion (SD, D, A, SA); vertical axis: Relative Frequency (0 to 0.8)] 

We notice that both graphs look nearly identical. This is due to the fact that the relative 

frequencies are proportional to the frequencies (they are the frequencies multiplied by 1/20!). 

2.1.3 Conclusions 

The owner of the hotel can reasonably conclude that 80% of his recent guests enjoyed the lounge 

(enough to consider revisiting!). He can conclude that 20% of his guests either did not care for it 

or absolutely hated it! If he is interested in additional repeat visitors, perhaps he might like to 

determine how to make the experience better for those who seem to be highly dissatisfied. Are 

these descriptive measures demonstrative of the entire population of visitors? To a greater or 

lesser extent – perhaps. 


1. The following dataset represents the meat selection made by individuals at a dinner 

banquet. Attendees selected from beef (B), chicken (C), veal (V), or pork (P). 

B C B C V B C 

C C P P B B C 

a. Is this data categorical or quantitative?

b. Create a table that shows the frequency and relative frequency for each of the 

choices. Use Excel. 

c. Create a frequency bar graph. Label all axes. 

d. Create a relative frequency bar graph. Label all axes. 

e. Create a pie chart. Label the chart. 

f. Write a brief report (summary) describing the meal preferences of these attendees. 

Describe any general trends. Use specific data and make appropriate conclusions. 


2. The following data represents per capita meat consumption (pounds per person) in 2009 

for a variety of meats (SOURCE: U.S. Statistical Abstract, Table 217). 

Meat    Pounds per Person 

Beef 58.1 

Veal 0.3 

Lamb and mutton 0.7 

Pork 46.6 

Chicken 56.0 

Turkey 13.3 

a. Using Excel, find the mean and range of the data. 

b. Explain the real-world meaning of the mean you found. 

c. Explain the real-world meaning of the range you found. 

d. What conclusions can be made about the center and spread of per-capita meat 

consumption?

3. On opening day, the owners of Green Heart Restaurant invited 29 food critics to be a part 

of the culinary experience. Each critic gave a grade of A (Best), B, C, D, or F (Worst) to 

reflect the quality of the overall dining experience. The scores are shown below: 

A B B A C B C B B 

D C B B A A C C C 

C B A D C C B B B 

A B 

a. Generate a relative frequency bar chart. 

b. Generate a pie chart. 

c. What should the owners take away from the experiences of the critics?

4. Consider the scenario in problem 1. 

a. What is the sample?

b. What is the population of interest?

c. What other variable(s) might be of interest to the data analyst to better study 

attendees' eating preferences?

5. Consider the scenario in problem 3. 

a. What is the sample?


c. What other variable(s) might be of interest to the data analyst to better study the 

target demographic?


6. Suppose you are the owner of an accounting firm. You would like to better understand 

the employment of the residents within ten miles of your firm. 

a. What variables would you collect? Which are quantitative and which are 

qualitative?


c. How would you go about collecting data for this study? Be specific. 



2.2 Visualizing Quantitative Data 

To make an assessment of how efficient the technical support department is in helping customers 

solve software issues, management keeps track of the length of each phone call taking place over 

the day. They find the following: 

Length of Call (min.) 

1 2 13 4 12 

4 10 6 6 9 

4 3 4 0 12 

6 4 4 13 15 

0 4 10 4 10 

7 2 10 8 4 

7 0 4 4 4 

Since this data is quantitative, the visual displays discussed in Section 2.1 are not appropriate. However, 

management still would like to visualize the 35 observations. 

One quick, by-hand technique to visualize how the times appear would be a dot plot, or a simple 

number line, with any repeats stacked above others. Given the presence of great technology, we 

will use Excel to create a histogram, which is a graph similar to a bar graph (can be either 

frequency or relative frequency). The difference is that, instead of having nominal categories on 

the horizontal axis, we will create numerical categories. For example, we could simply create 

tick marks for each observation value present in the table and then display the number of times 

it appears. Often, with small amounts of data, the graph may appear spread out. In this case, we 

might decide to create a bar representing, say, all calls that fall between 0 and 3 minutes. Let's 

demonstrate both: 

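[Figure: “Call Times” histogram with one bar per observed value; horizontal axis: Length (min.) with values 0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 12, 13, 15; vertical axis: Frequency (0 to 14)] 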

We clearly see that most calls are between about 4 and 10 minutes (a 4-minute call is most 

frequent – the mode). Alternatively, we might choose to create equal-width categories. Let's say 

we have categories that show the times as 3-minute blocks: 

[Figure: “Call Times” histogram with grouped bins; horizontal axis: Length (min.) with bins 0-2, 3-5, 6-8, 9-11, 12-15; vertical axis: Frequency (0 to 14)] 

Beautiful! Now it is clearer how call times are distributed. This visualization is a bit simpler 

than the one above, as it groups times into more manageable categories. Note that the bars are 

touching. This is what distinguishes a histogram from a bar graph: we want to emphasize that 

times are continuous and that every time length between 0 and 15 is accounted for (even 

fractions of a minute, potentially). 

We can make these categories as wide or narrow as we'd like. We call these categories bins. 

Think about this as you would about sorting recycling materials into one of several bins. 
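To make the idea of bins concrete, here is a minimal Python sketch that sorts the 35 call times into the five bins used in the second graph. The names `call_times`, `bin_labels`, and `bin_counts` are our own, and folding the value 15 into the last bin is simply our way of mirroring the 12-15 label.

```python
# The 35 call lengths (in minutes) from the table above.
call_times = [1, 2, 13, 4, 12,
              4, 10, 6, 6, 9,
              4, 3, 4, 0, 12,
              6, 4, 4, 13, 15,
              0, 4, 10, 4, 10,
              7, 2, 10, 8, 4,
              7, 0, 4, 4, 4]

bin_labels = ["0-2", "3-5", "6-8", "9-11", "12-15"]
bin_counts = [0] * len(bin_labels)

for t in call_times:
    # Whole-number division by 3 gives the 3-minute block a call falls in;
    # min() keeps the value 15 inside the last bin (12-15).
    index = min(t // 3, len(bin_labels) - 1)
    bin_counts[index] += 1

for label, count in zip(bin_labels, bin_counts):
    print(label, count)
```

Each call lands in exactly one bin, so the bin counts add up to the 35 observations.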

2.2.1 Creating a Frequency Histogram Using Excel 

The most time-consuming part of building a histogram by hand is organizing the data and 

counting the number of observations. Excel does this quite easily via the use of a pivot table. A 

pivot table is a “live” table whose values can be formatted in many different ways. 

We must first begin with the dataset in Excel as a raw column or row of data: 

To insert a pivot table, highlight the entire set of data, including the data label. Click on the 

Insert tab and choose the PivotTable option from the Tables column. A data prompt should 

appear with the table range already appearing in the box: 


You can either choose to have Excel place the table within the same worksheet, or you can have 

it create a new one. This choice is up to you. If you choose “Existing Worksheet” you will have 

to specify a cell to paste it to. Choose a cell that is out of the way of any existing data so that it 

doesn't “bump” into it if the pivot table becomes quite large. 

You should now see something similar to the table below: 


When highlighted, a “PivotTable Field List” window should appear to the right of your screen 

with the name(s) of the variable(s) in the “Choose fields to add to report” box. 

This generic template will now allow us to construct a table. From the PivotTable Field List 

window, we will drag the Times variable into the Row Labels box. This will create a series of 

rows with each of the observations appearing only once. Thus, we will not have to see repeats! 


If we had additional variables, the row labels could be any variable desired. For each of these rows, 

we would like to see a frequency count. This is where the “Values” box comes in handy. Drag 

the Times variable into the “Values” box: 

The values of time are, by default, the sums of the times for each of the row labels. This is not 

what we want. We want “Count of Times.” To change the type of value, click the arrow on the 

“Sum of Times” button. Choose “Value Field Settings.” Change “Summarize value field by” 

option to “Count” and close the dialogue box: 


We can double-check that these values are correct by noting that the Grand Total is 35, the same 

as the number of observations. We would like a histogram to show the “Row Labels” along the 

horizontal axis and the “Count of Times” along the vertical axis. To do this, select the pivot table 

and choose the Options tab from the PivotTable Tools menu. 
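The “Count of Times” column of the pivot table can be reproduced with a short Python sketch, which also confirms the Grand Total. The names `call_times` and `count_of_times` are our own.

```python
from collections import Counter

# The 35 call lengths (in minutes) from the table at the start of this section.
call_times = [1, 2, 13, 4, 12,
              4, 10, 6, 6, 9,
              4, 3, 4, 0, 12,
              6, 4, 4, 13, 15,
              0, 4, 10, 4, 10,
              7, 2, 10, 8, 4,
              7, 0, 4, 4, 4]

# One entry per distinct value, like the pivot table's row labels.
count_of_times = Counter(call_times)

for value in sorted(count_of_times):
    print(value, count_of_times[value])

grand_total = sum(count_of_times.values())
print("Grand Total", grand_total)  # 35
```

The most frequent value is 4 minutes, which appears 12 times.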


Select PivotChart and select the first graphing option: 



We make a few adjustments: delete the legend, re-label the chart title, and remove the two grey 

boxes. Now that a graph has been inserted, a PivotChart Tools menu appears when the graph is 

highlighted. This is very similar to inserting a regular graph. Select Layout to add axis labels. To 

remove the grey boxes, right-click either box and select “Hide All Field Buttons on Chart.” 

[Figure: “Histogram of Call Times”; horizontal axis: Times (min.) with values 0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 12, 13, 15; vertical axis: Frequency (0 to 14)] 

To make the gaps between bars disappear, select the graph and choose the eighth graph option 

from the Design tab in the PivotChart Tools menu shown below (NOTE: this option will 

automatically put in axis labels): 



To make solid black lines appear as the outlines for each bar, change the bar styles from “Chart 

Styles.” 

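[Figure: the same histogram with the gaps removed and black bar outlines; horizontal axis values 0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 12, 13, 15; vertical axis: 0 to 14] 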


We now would like to adjust the bin widths. Doing this is simple! 

Select the pivot table. From the Options tab under the PivotTable Tools menu, choose “Group 

Selection” from the Group column. In the dialogue box that appears, the “Starting at” and 

“Ending at” boxes should reflect the smallest and largest values of the variable. You can adjust 

these to be wider or narrower, if you choose to show less than the full dataset. In the “By:” box, 

put the width of the classes. In this case, we chose 3. Press “OK” and you should then see the 

updated pivot table and graph! 


To change frequency to relative frequency, we must now change “Count of Times” in the 

“Values” box of the “PivotTable Field List.” Click on “Count of Times” and select “Value Field 

Settings.” Within the dialogue box, choose the “Show Value As” tab and choose values to show 

as “% of Grand Total.” Press “OK.” Adjust the vertical axis label accordingly. 
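The “% of Grand Total” setting simply divides each count by the total number of observations. A minimal Python sketch, assuming the grouped counts obtained by sorting the 35 call times into the bins used above (the names `bin_counts` and `relative` are our own):

```python
# Counts of call times in each 3-minute bin (35 observations in all).
bin_counts = {"0-2": 6, "3-5": 13, "6-8": 6, "9-11": 5, "12-15": 5}

grand_total = sum(bin_counts.values())

# Relative frequency = count / grand total.
relative = {label: count / grand_total for label, count in bin_counts.items()}

for label, share in relative.items():
    print(f"{label}: {share:.2%}")
```

The tallest bar, 3-5 minutes, accounts for 13 of the 35 calls, or about 37% of them.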




[Figure: relative frequency histogram of call times; horizontal axis bins 0-2, 3-5, 6-8, 9-11, 12-15; vertical axis: 0.00% to 40.00%] 



1. An instructor grades a math test and produces the following histogram: 


[Figure: “Histogram of Test Percentages”; horizontal axis: Percentage Earned with bins 60-64, 65-69, 70-74, 75-79, 85-90; vertical axis: Frequency (0 to 10)] 

a. What can the instructor conclude about the fairness of the test?

b. What appears to be the mean score, based on the histogram?

c. What is the approximate range of scores, and why is it only possible to 

approximate this from the given information?

2. A cashier at a mall retail clothing outlet asked customers their age for an anonymous 

survey. The ages he collected can be found below: 

31 34 30 30 31 27 33 36 

33 30 29 28 20 32 24 30 

32 30 30 22 31 38 28 31 

25 24 25 31 25 24 36 32 

24 31 31 32 31 31 28 31 

33 20 32 32 52 31 27 30 




d. Create a relative frequency histogram for age. Leave your bin width as 1 year. 

e. Create a relative frequency histogram for age with bin width 5 years. 

f. Describe any trends in the age of shoppers at this store. 

g. Based on your answer to e), which age group(s) can be omitted from the 

company's marketing tactics, in an effort to focus only on the regular shoppers?

3. The total number of people (in millions) working in all of the various industries in the 

United States in 2010 is given in the table below: 

2.206 0.731 9.077 14.081 8.789 5.293 3.805 15.934 

7.134 5.88 1.253 3.149 9.35 6.605 2.745 15.253 

9.115 6.138 32.062 13.155 18.907 6.249 9.406 3.252 

12.53 2.966 9.564 6.769 6.102 0.667 6.983 

(SOURCE: U.S. Statistical Abstract, Table 619) 





d. Create a relative frequency histogram for employment. Leave your bin width as 2 million 

people. 

e. Create a relative frequency histogram for employment with bin width 5 million people. 

f. The federal government regularly publishes reports on employment across the 

many industries. Using the information you have gathered, generate a brief report 

detailing your findings, including any trends in employment. 

4. A resort chain that wishes to expand is constantly searching for new sites to add 

properties that will be profitable. A good place to start is by considering climates. 

Suppose Starwood Hotels and Resorts Worldwide obtains the following data from the 

U.S. Census Bureau on highest temperatures ever recorded in various cities in the United 

States: 

112 100 128 120 134 114 106 110 

109 112 100 118 117 116 118 121 

114 114 105 109 107 112 115 115 

118 117 118 125 106 110 122 108 

110 121 113 120 119 111 104 111 

120 113 120 117 107 110 118 112 

114 115 

(SOURCE: U.S. Statistical Abstract, Table 391) 




d. Create a relative frequency histogram for temperature. Leave your bin width as 5 degrees. 

e. Create a relative frequency histogram for temperature with bin width 10 degrees. 

f. What percentage of states can be eliminated from consideration if the company 

will not take any risks with states that have had a record high over 115°F?

g. Summarize the distribution of high temperatures in the U.S. 


2.3 Descriptive Statistics – Center and Position 

Histograms provide us with a great visualization of the overall distribution of values. A 

distribution describes the layout of the values of a quantitative or categorical variable. To further 

describe the differences between two similar distributions, it is helpful to use statistics that 

describe center, location, and spread. 

2.3.1 Mean and Median 

To make peace with some regularly occurring notation in statistics, we will use \(\Sigma\) (“sigma”) to 

mean “the sum of.” For instance, 

let's say that we have a set of \(n\) variable values. To distinguish each of these “\(x\)'s” we'll use 

subscripts, denoting them: 

\[ x_1, x_2, \ldots, x_n \]

Then, to indicate that we want to sum these values across all subscripts, we would write: 

\[ \sum x_i \]

which means, “sum up all \(x\) values in the dataset,” or 

\[ x_1 + x_2 + \cdots + x_n \]

Using this new notation, we already know how to calculate the mean: 

Mean – x-bar notation 

The mean value, or average, of a dataset containing \(n\) values can be written as: 

\[ \bar{x} = \frac{\sum x_i}{n} \]

\(\bar{x}\) is used to denote the mean of a sample and can be read as “x-bar.” 

A common point of confusion for students is the difference in the subscript \(i\) and the 

denominator \(n\). Many people think that the subscript should be \(n\) to match the number of 

elements in the dataset. However, \(x_n\) specifically refers to the very last value in the dataset. We 

treat the \(i\) as an index that goes across all subscripts from 1 all the way up to and including \(n\). To 

account for this discrepancy, mathematicians usually write where the index should start below 

sigma and the maximum value above the sigma. For example, if there are 3 values in the dataset, 

we would write the mean as: 

\[ \bar{x} = \frac{\sum_{i=1}^{3} x_i}{3} \]

As you can see, the sigma notation can quickly become convoluted, and so we typically just 

write \(\sum x\) to indicate the sum of all \(x\)-values. 

Median 

The median value of a dataset is the value that represents the physical center of the data set. To locate the median:

Organize the data values from smallest to largest. Then,

If there is an odd number of values in the data set, the center value can be located by counting in $\frac{n+1}{2}$ positions from the smallest value, including the smallest value. Alternatively, one can count in an equal number of values from the left and right endpoints to locate the center value.

If there is an even number of values in the data set, average the two middle-most values together. The locations of the two middle-most values are $\frac{n}{2}$ and $\frac{n}{2} + 1$ positions from the smallest value, including the smallest value. Once again, these values can be found by counting from the left and the right endpoints of the dataset.

Example 1: Find the mean and median salaries for the company represented by the following 

dataset (in thousands). Explain which measure better reflects the overall company 

demographic. 

SOLUTION: We first find the mean:

$$\bar{x} = \frac{\sum x}{n} = 148.2$$

This means that, on average, employees earn $148,200 per year.

To find the median, we begin by listing the salaries in ascending order:

The two middle values are 48 and 50 (four values lie on either side of this pair). These represent the 10/2 = 5th and 10/2 + 1 = 6th values in the dataset. To find the median, we average them together to get

$$\frac{48 + 50}{2} = 49$$

The median salary is $49,000 per employee per year.

The median is clearly a more viable measure. The mean takes into account all values, including 

the outlier, or “extreme” salary of $1.1 million per year. The median is not influenced by 

extreme outliers. 

To find the mean and median salaries in Excel we use the functions average() and median(). 

The parameter for both functions is the cell range corresponding to the dataset. 
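For readers who want to check the arithmetic outside of Excel, here is a minimal Python sketch of the same two calculations. The salary list is hypothetical: it was invented for this illustration so that it reproduces the mean ($148,200) and median ($49,000) quoted in Example 1, and it should not be read as the actual company data.

```python
from statistics import mean, median

# Hypothetical salaries in thousands of dollars (invented for illustration).
salaries = [20, 22, 24, 30, 48, 50, 55, 60, 73, 1100]

avg = mean(salaries)    # sum of the values divided by the number of values
mid = median(salaries)  # n is even, so this averages the two middle values

print(f"mean = {avg}, median = {mid}")
```

Notice how the single salary of 1100 pulls the mean far above the median, which is the point made in the solution.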


2.3.2 Percentile 

Another useful tool for describing the location of data points is a percentile. 

Percentile 

The $p$th percentile is a value such that $p$ percent of the values in a dataset (of $n$ values) are less than or equal to this value.

To find the location of this value, that is, the index, $i$, first arrange the data in ascending order. The index can be calculated by:

$$i = \left(\frac{p}{100}\right) n$$

That is, find $p$ percent of the number of observations. If the calculated value of $i$ is a decimal, round up to the next whole position. If it is an integer, take the average of the values in positions $i$ and $i + 1$. One of these two actions will always be taken.

Example 2: Find the 50th percentile for the salaries in Example 1. Interpret the real-world meaning of this value.

The values, in ascending order, are:

We take

$$i = \left(\frac{50}{100}\right)(10) = 5$$

Since this is an integer, we average together the values in positions 5 and 6, giving us a value of 49. This means that 50% of employees represented in this dataset make $49,000 or less.

Not surprisingly, the 50th percentile is actually the median of the dataset! This is always true.
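The index rule for percentiles can be sketched in a few lines of Python. The function name `book_percentile` and the salary list are assumptions made for this illustration (the salaries are hypothetical values chosen to be consistent with the results quoted in Examples 1 and 2); the rule itself is the one stated in the definition: round up when the index is a decimal, and average positions $i$ and $i + 1$ when it is an integer.

```python
def book_percentile(values, p):
    """Return the p-th percentile (0 < p <= 100) using the index rule i = (p/100) * n."""
    if not 0 < p <= 100:
        raise ValueError("p must be greater than 0 and at most 100")
    data = sorted(values)
    n = len(data)
    # Divide p * n by 100 with a remainder so integer indexes are detected exactly.
    whole, remainder = divmod(p * n, 100)
    if remainder != 0:
        # Decimal index: round up to position whole + 1 (zero-based index `whole`).
        return data[int(whole)]
    i = int(whole)
    if i == n:
        # 100th percentile: there is no position n + 1, so return the largest value.
        return data[-1]
    # Integer index: average the values in positions i and i + 1.
    return (data[i - 1] + data[i]) / 2


# Hypothetical salaries in thousands of dollars (invented for illustration).
salaries = [20, 22, 24, 30, 48, 50, 55, 60, 73, 1100]
p50 = book_percentile(salaries, 50)
p25 = book_percentile(salaries, 25)
print(p50, p25)
```

Excel's Percentile() function interpolates between neighboring values, so for percentiles other than the 50th its answer may differ slightly from this rule.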

In Excel, we can use the Percentile() function. The set-up of this function’s parameters is:

=percentile(cell range, p/100) 

Thus, for this dataset, we would have: 

2.3.3 Quartiles 

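Oftentimes, data analysts like to think about data in terms of quartiles, or quarters. There are 4 quartiles, which can be represented as follows:

• Quartile 1 = 25th percentile

• Quartile 2 = 50th percentile (the median)

• Quartile 3 = 75th percentile

• Quartile 4 = 100th percentile (the maximum)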





2.3.4 Rank 

What if, on the other hand, an employee wants to know what the rank of his salary is (that is, the percentile that corresponds to his known salary)? This requires reverse-engineering the idea of a percentile. Without the use of any mathematical formulas, we would need to count the number of values that are less than or equal to the salary in question. To make this easier, we can use Excel’s Rank() function. The parameters we will use are as follows:

= rank(value, cell range, 1) 

This will return the number of values that are less than or equal to the value in question. If we 

changed the parameter of 1 to a 0, Excel would return the ranking of that value, treating rankings 

as being similar to the ranks of, say, runners in a race. 

We will then need to divide this output by the number of observations in the dataset. To make 

the counting process more automated, we can take this output and divide it by the output of the 

count() function. This function will simply count the number of entries in the specified range, 

and has the following parameter: 

= count(cell range) 

Let’s say the employee making $24,000 would like to know his salary’s rank. To calculate, we would type the following:

Giving us: 


Thus, his salary is in the 30th percentile. This means that 30% of people represented in this dataset make $24,000 or less.
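The rank-divided-by-count idea can also be sketched in Python. The function name `percentile_rank` and the salary list are assumptions for illustration only (the salaries are hypothetical values chosen so that three of the ten are at or below 24, matching the 30% result above).

```python
def percentile_rank(values, x):
    """Percent of values that are less than or equal to x."""
    at_or_below = sum(1 for v in values if v <= x)  # plays the role of rank(value, range, 1)
    return 100 * at_or_below / len(values)          # len() plays the role of count(range)


# Hypothetical salaries in thousands of dollars (invented for illustration).
salaries = [20, 22, 24, 30, 48, 50, 55, 60, 73, 1100]
rank_24 = percentile_rank(salaries, 24)
print(rank_24)
```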

Another approach would be to use the “Rank and Percentiles” tool in an Excel add-in called 

Analysis ToolPak. This method will show the ranks and percentiles of all values in the dataset 

and is only useful for relatively small, manageable datasets. The Analysis ToolPak will be important later on, so we’ll describe its installation here.

2.3.5 Analysis ToolPak 

To install the Analysis ToolPak, select the File tab within Excel. Then select Options from the 

ribbon that appears. Select the Add-Ins option. Click Analysis ToolPak and press Go. 


Check the “Analysis ToolPak” and “Analysis ToolPak – VBA” features from the pop-up 

window and press OK. 

You now have the ToolPak installed. 


To use the “Rank and Percentile” tool, select the Data tab. Choose Data Analysis from the 

Analysis column. Pick “Rank and Percentile” from the pop-up window and press OK. 

Select the input range: 

You can either specify an output range, or have Excel create a new worksheet with the results. 

This is up to your preferences. Check “Labels in First Row” and be sure that the data label has 

been selected. 

The results are shown below: 


You’ll immediately notice that a salary of $24,000 is shown as being in the 22.2nd percentile, which does not agree with our calculation. Software packages use different conventions for this calculation, and no single agreed-upon method exists. Fortunately, both results are in the same “ballpark.”
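To see how two reasonable conventions can disagree, the sketch below computes a percentile rank in two ways. The first divides the count of values at or below the salary by $n$, as in our own calculation. The second divides the count of values strictly below the salary by $n - 1$. This second convention is common and happens to reproduce a figure of 22.2%, but we are assuming, not asserting, that it is what the ToolPak does internally. The salary list is again hypothetical.

```python
# Hypothetical salaries in thousands of dollars (invented for illustration).
salaries = [20, 22, 24, 30, 48, 50, 55, 60, 73, 1100]
x = 24
n = len(salaries)

at_or_below = sum(1 for v in salaries if v <= x)  # values less than or equal to x
strictly_below = sum(1 for v in salaries if v < x)  # values strictly less than x

rank_by_count = 100 * at_or_below / n            # our convention: divide by n
rank_alternative = 100 * strictly_below / (n - 1)  # assumed alternative: divide by n - 1

print(round(rank_by_count, 1), round(rank_alternative, 1))
```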


1. Suppose your instructor releases scores on a recent project. The scores are as follows: 

83 89 76 41 92 85 76 71 

95 92 80 84 77 78 81 75 

64 30 80 79 78 70 75 81 

99 85 80 82 70 69 71 70 

a. Generate a relative frequency histogram and comment on any interesting 

observations of the distribution. 

b. Compare the mean and median. What causes them to be different in this particular way?

c. What score would be required in order to be in the 80th percentile?

d. In what percentile is a person who scores 71% on this project?

2. In order to make way for new products, a grocery store chain would like to determine whether the Lunch Pack or Family Pack of Flaxem Crackers generates more revenue. The following two datasets show the revenue generated by each over a 10-month period:

Lunch:   450  510  550  330  400  500  550  290  310  300

Family:  500  400  600  310  350  600  200  200  600  430


a. Compare the mean and median of each dataset. What can be said about the middle-most revenues?

b. Find all four quartiles for each dataset. Use this information to make an argument for why this grocer should hang on to the Family Pack.

c. For each of the datasets, determine the top 10% of revenues that can be expected.

d. Find the range of the data. Comment on how this might influence the grocer’s decision.

3. Suppose that Budget Car Rentals assesses a variety of new 2012 and 2013 sedans for its 

new line of rental cars. It finds the following information on city and highway fuel 

efficiencies (mpg) for eight vehicles in consideration: 

Year  Make     Model            City  Highway
2012  Toyota   Prius Hyb.       51    49
2013  Ford     Fusion Hyb.      47    47
2013  Ford     C-Max Hyb.       44    41
2012  Honda    Insight          41    44
2012  Toyota   Camry LE Hyb.    43    39
2012  Toyota   Camry XLE Hyb.   40    38
2012  Hyundai  Sonata Hyb.      34    39
2012  VW       Passat           31    43

(SOURCE: www.fueleconomy.gov) 

a. Find the mean and median fuel efficiency for city and highway mileages of the 

vehicles being considered. Comment on any differences between the two values. 

b. What is the rank percentage of a vehicle that has 43 city mpg?

c. If the company makes its choice based on the top 15% of city and highway for the vehicles being considered, what will be the minimum city and highway mileages they should consider?

d. Make a recommendation for which vehicle(s) should be purchased, if any.

2.4 Descriptive Statistics – Variability 

The measure of center is always a good start. But what does a sample mean not tell us? It fails to describe how far apart the data are from one another. In other words, we need to assess the variability, or variance, of the numbers we have collected.

The simplest way we might go about describing the variability is by simply looking at the range 

of the data, such that: 

Range = largest observation - smallest observation 

However, this still does not help us identify how spread out the data are. For example, suppose we find our range to be 110 units (see dataset below). This might seem rather daunting at first, but what if all values were clumped between 0 and 10, and there existed an outlier of 110? Evidently, the range is often determined by outliers alone.

0 1 3 10 8 7 4 110 
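As a quick check of how much a single outlier drives the range, the following Python sketch computes the range of the dataset above with and without its largest value. The variable names are our own.

```python
data = [0, 1, 3, 10, 8, 7, 4, 110]

# Range = largest observation - smallest observation
data_range = max(data) - min(data)

# Drop the single largest value (the outlier of 110) and recompute.
without_largest = sorted(data)[:-1]
range_without_outlier = max(without_largest) - min(without_largest)

print(data_range, range_without_outlier)
```

One value out of eight accounts for almost all of the range, which is why the range alone says little about how the bulk of the data is spread out.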

2.4.1 Standard Deviation 

To create a better measure of variability that takes all data points into account, just like the mean does, statisticians established the standard deviation. As the name implies, this is a standard tool that measures the average deviation (that is, by how much each value deviates) from the mean. This requires us to find the deviation for each point in our dataset,

$$x - \bar{x}$$

Let’s demonstrate with the above dataset:

Value    $x - \bar{x}$
0        -17.875
1        -16.875
3        -14.875
10       -7.875
8        -9.875
7        -10.875
4        -13.875
110      92.125

Mean: 17.875

The deviations that we observe to be below the mean produce a negative deviation and the one 

above the mean has a positive deviation. To find an average deviation, we would ideally add 


them. However, observe that the sum of the deviations is 0! This is true of any dataset, since the mean represents the “balance” of the dataset. Due to mathematical concerns that we won’t state here, mathematicians decided to square these values, since squaring converts all signed numbers into nonnegative values.

Value    $x - \bar{x}$    $(x - \bar{x})^2$
0        -17.875          319.5156
1        -16.875          284.7656
3        -14.875          221.2656
10       -7.875           62.01563
8        -9.875           97.51563
7        -10.875          118.2656
4        -13.875          192.5156
110      92.125           8487.016

Mean: 17.875              Sum: 9782.88

Great, now they can be summed up to give 9782.88! Thus, we have found the following:

$$\sum (x - \bar{x})^2 = 9782.88$$

One would think that dividing by 8 would now be appropriate to find the average. Due to mathematical properties that are beyond the scope of this course, the division will be by 7, which is $n - 1$. Thus:

$$\frac{\sum (x - \bar{x})^2}{n - 1} = \frac{9782.88}{7} \approx 1397.55$$

This value that we have found is called the variance.

NOTE: The division by $n - 1$ has to do with the fact that we are often dealing with a sample in inferential statistics and hope to make conclusions about a population.

Sample Variance 

The variance of a sample, an uninterpretable measure of variability denoted by $s^2$, can be found by the following formula:

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

To make all of these calculations more meaningful (to have a true average), we should probably “unsquare” the value that we have. When we do this, we get the sample standard deviation:

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{9782.88}{7}} = \sqrt{1397.55} \approx 37.38$$

This is what we can think of as the average deviation of each point from the mean. It is clearly high for this dataset. What is causing it? The outlier of 110!

Conclusion: On average, values in the dataset deviate from the mean by about 37 units.

Sample Standard Deviation

The standard deviation of a sample, denoted $s$, is given by the following formula:

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

Note that this is simply the square root of the variance.

In Excel, the standard deviation can be calculated simply by using the function below: 

= stdev(cell range) 
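The step-by-step calculation above can be mirrored in Python. This is only a sketch using the eight-value dataset from this section; the variable names are our own, and the last line uses the standard library's `statistics.stdev`, which also divides by $n - 1$, as a cross-check.

```python
import math
import statistics

data = [0, 1, 3, 10, 8, 7, 4, 110]
n = len(data)

mean = sum(data) / n                      # 17.875
deviations = [x - mean for x in data]     # x - x-bar for every value
sum_of_deviations = sum(deviations)       # always 0: the mean is the balance point

squared = [d ** 2 for d in deviations]    # squaring removes the signs
sum_of_squares = sum(squared)             # 9782.875

variance = sum_of_squares / (n - 1)       # sample variance: divide by n - 1, not n
std_dev = math.sqrt(variance)             # "unsquare" to get the standard deviation

print(mean, sum_of_deviations, sum_of_squares)
print(round(variance, 2), round(std_dev, 2))
print(round(statistics.stdev(data), 2))   # library cross-check
```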

Example 1: A river with mild current is known to have an average depth of 3 feet with a standard deviation of 3 feet. The bottom is not visible. Is the river safe to cross by foot? Also, what is the variance?

SOLUTION: Since there is a standard deviation of 3 feet, we can conclude that, on average, the river depth deviates by 3 feet from the mean. It would not be unusual to encounter a part of the river with a depth of 6 or more feet. Therefore, the river should not be crossed by foot.

Since the standard deviation is the square root of the variance, the variance is the square of the standard deviation. That is,

$$s^2 = 3^2 = 9$$

Thus, the variance is 9 (in square feet). The variance does not have a valuable interpretation.

2.4.2 How Do We Interpret the Value We Get?

Think about this: $n$ is a fixed value for our sample (8 in the dataset above). The only thing that could make $s^2$ large or small is the numerator. Thus, if the deviations are large (a bad thing!), then the squared deviations will be large, and so the sum of squares will be large. This implies a large standard deviation.

If the deviations are small (good thing!), then the squared deviations will be small, and so the sum of squares will be small. This implies a small standard deviation.

So, a large standard deviation means that there is a lot of variability, or that the values are vastly 

different from one another. A small standard deviation means the values in the data set are quite 

alike. In the near future, you'll see why it is important to have a small standard deviation. In 

general, as the variance and standard deviation get larger, our ability to make precise statements 

about the population quickly evaporates. 

We will be using variance and standard deviation consistently for the rest of the semester. It is 

important to get comfortable with it. 

2.4.3 Do Population Variances and Standard Deviations Fall into Play?

Indeed they do. Do you think that we can find them? Typically not! The population variance requires the use of the population mean, $\mu$. How do we get $\mu$? We take the average of all the values in the entire population. Since we typically don't know this value, we also typically don't know the population variance, so certainly we don't know the population standard deviation (since it's the square root of the population variance).

The table below summarizes the notations we need to recognize:

             Variance      Standard Deviation
Sample       $s^2$         $s$
Population   $\sigma^2$    $\sigma$

The population parameter, $\sigma$, is the lowercase Greek letter “sigma.” (This is as opposed to the sample statistic, $s$.)

2.4.4 Interquartile Range 

The standard deviation, much like the mean, is easily skewed by excessively small or large values. We noticed this in the first example in this section. Using the idea of medians and percentiles is a safe bet for outlier-proofing our spread estimates. The interquartile range (IQR) is the difference between the 3rd quartile and the 1st quartile. Remember, these are simply the 75th and 25th percentiles, respectively. This difference is the width of the middle 50% of the dataset:

$$\text{IQR} = Q_3 - Q_1$$

This gives us a nice measure of how spread out the data are about the median.

Example 2: Consider the following home prices and find both the standard deviation and the 

interquartile range. Describe what conclusions can be drawn from these values. 

Values (thous. $) 95 875 96 89 87 88 93 91 

SOLUTION: Using Excel, we find the following: 

The standard deviation indicates that home prices, on average, vary by $277,100 from the mean 

value. However, we see from the interquartile range that the middle 50% of homes only vary by 

$6,500. The standard deviation is being skewed by the home that is priced at $875,000. The 

interquartile range tells us that the majority of home values stay pretty close to the median value. 

Additionally, we see that most home values are between $88,000 and $96,000. 
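The two figures quoted in this solution can be reproduced with Python's standard library. This is a sketch under one assumption: we use the "inclusive" quantile method, which interpolates between neighboring values and gives the $6,500 figure quoted above; other quartile conventions would give a slightly different interquartile range.

```python
import statistics

# Home prices in thousands of dollars, from Example 2.
prices = [95, 875, 96, 89, 87, 88, 93, 91]

std_dev = statistics.stdev(prices)  # sample standard deviation (divides by n - 1)

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(prices, n=4, method="inclusive")
iqr = q3 - q1

print(round(std_dev, 1))  # in thousands of dollars
print(q1, q3, iqr)
```

The standard deviation is dominated by the single $875,000 home, while the interquartile range ignores it entirely.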

2.4.5 Descriptive Statistics: Analysis ToolPak in Excel 

To generate most of the features we have discussed up until now, we turn to Excel’s Analysis ToolPak for a more automated approach.

Let’s consider the house data above:


Values (thous. $) 

95 

875 

96 

89 

87 

88 

93 

91 

Access the Data Analysis tool from the Data tab in Excel. Select “Descriptive Statistics” from 

the menu and select the data from the spreadsheet containing the data. 

Be sure that you check “Summary Statistics.” 


We can immediately see the mean and the median of the dataset. Additionally, we see the 

standard deviation, variance, range, min/max, sum of the values, and the number of values in the 

dataset, among other tools to ignore for now. We see, as expected, that the dataset does not have 

a mode, or most frequently occurring value. 


2.4.6 Shapes of Distributions 

Now that we have a basis for measuring data in terms of its center and spread, we turn back to 

making connections with the visual shape of the distribution. 

There are many different shapes that we encounter for distributions. Let's discuss a few. First, 

note that the following do not look like the rectangular histograms from earlier on. These are 

smoothed out forms of what we experienced earlier. They are often used to describe the general 

shape of a distribution. And, of course, they are much easier to sketch. 

A histogram is said to be (a) unimodal if it has a single peak, (b) bimodal if it has two peaks, 

and (c) multimodal if it has more than two peaks. 

If we follow the curves from left to right, we begin at the lower tail, move over the peak(s), and 

arrive back down to what is called the upper tail. 

A unimodal histogram is said to be symmetric, if we are able to draw a line down the center 

such that the left side of the line is a mirror image of the right side. Consider the following 

unimodal symmetric histograms: 

A unimodal histogram that is not symmetric is said to be skewed. If the upper tail of the 

histogram stretches out much farther than the lower tail, then the distribution of values is 

positively (right) skewed. On the other hand, if the lower tail is much longer than the upper tail, 

the histogram is negatively (left) skewed. Can you identify the following unimodal histograms as positively or negatively skewed?


Lastly, a normal curve is the most desired type, due to its (in general) nice properties. A normal 

curve occurs quite frequently. It has a bell shape and is sometimes called the Gaussian curve. 

Here are examples of normal curves: 

2.4.7 Skewness 

Excel also produces a nice measure that allows us to make conclusions about the general shape 

of the distribution. This measure is called skewness. 

If the skewness measure is:

• Positive, then the distribution is skewed right

• Negative, then the distribution is skewed left

• Zero, then the distribution is symmetric

The farther from 0 that the skewness measure is, the more skewed in the respective direction the 

distribution will be. 
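For the curious, a skewness measure of this kind can be sketched in Python. The formula below is the adjusted sample skewness, $\frac{n}{(n-1)(n-2)} \sum \left(\frac{x - \bar{x}}{s}\right)^3$, which we are assuming matches what Excel reports; the function name `sample_skewness` is our own, and the dataset is the eight-value example with the outlier of 110 from earlier in this section.

```python
import statistics

def sample_skewness(values):
    """Adjusted sample skewness (assumed here to match Excel's skewness measure)."""
    n = len(values)
    if n < 3:
        raise ValueError("at least three values are needed")
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    # Cubing keeps the sign, so values far above the mean push the result positive.
    cubed = sum(((x - mean) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed


skew_right = sample_skewness([0, 1, 3, 10, 8, 7, 4, 110])  # long upper tail
skew_none = sample_skewness([1, 2, 3, 4, 5])               # perfectly symmetric
print(skew_right > 0, abs(skew_none) < 1e-12)
```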

Consider the following data showing the number of televisions owned by randomly sampled 

individuals in a big city: 


3 4 3 2 3 2 1 1 0 

4 0 4 4 4 3 1 0 1 

4 3 3 0 4 2 1 2 4 

2 4 2 4 0 3 4 3 3 

2 2 0 2 1 1 3 2 2 

0 0 3 1 0 3 4 3 3 

0 1 4 4 2 1 2 0 2 

4 3 2 4 2 4 3 3 3 

1 2 0 3 0 2 3 2 0 

0 2 0 4 4 3 4 1 0 

Using Excel, we produce descriptive statistics using the Analysis ToolPak: 

We notice that the Skewness measure is positive: 0.51. This means the dataset is slightly skewed 

to the right: 


[Figure: Histogram of TV's Owned. Horizontal axis: Number of TV's; vertical axis: Frequency.]

2.4.8 Outlier Detection 

After analyzing a dataset, how do we assess likely values for data and deem other values as outliers?

One approach is to determine how many standard deviations above (positive value) or below (negative value) the mean a given data value is.

For instance, suppose we have a dataset with mean 20 and standard deviation 3. We have an observation of 14. In terms of units, this value is 6 units below the mean. Thus, it has a deviation of -6. This deviation tells us that the data value in question is 2 standard deviations below the mean, since:

$$\frac{14 - 20}{3} = \frac{-6}{3} = -2$$

This measure is often called a z-score. Let’s recap:

z-Score

A $z$-score tells us the number of standard deviations a data point, $x$, is from its mean, $\bar{x}$. Mathematically,

$$z = \frac{x - \bar{x}}{s}$$


The idea of a $z$-score is quite helpful, in that it tells us how far a value is from the mean, relative to the size of the standard deviation (the average spread). If $z$ is very close to 0, then the value is not far from the mean. If $z$ is very large in absolute value, the value is very far from the mean.

A very useful theorem established by the Russian mathematician Pafnuty Lvovich Chebyshev allows us to determine how large is very large:

Chebyshev’s Theorem

For any $k > 1$, at least $\left(1 - \frac{1}{k^2}\right)$ of the data values must be within $k$ standard deviations of the mean (to the left and the right).

This works for any and all distributions.

Example 3:

A data value is 3 standard deviations above the mean. Is this an extreme value?

SOLUTION: Chebyshev’s Theorem states that at least

$$1 - \frac{1}{3^2} = \frac{8}{9} \approx 0.89$$

or 89%, of all data points in this distribution will lie between -3 and +3 standard deviations from the mean. Thus, there is, at most, an 11% chance of observing something higher than +3 standard deviations. This data value is fairly unlikely and might be considered a mild outlier.
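Both calculations from this section are short enough to sketch in Python. The function names `z_score` and `chebyshev_lower_bound` are our own; the numbers are the ones used in the text (mean 20, standard deviation 3, observation 14, and $k = 3$).

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev


def chebyshev_lower_bound(k):
    """Minimum fraction of values within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


z = z_score(14, mean=20, std_dev=3)    # the observation of 14 from the text
within_3 = chebyshev_lower_bound(3)    # at least this fraction lies within 3 SDs
at_most_outside = 1 - within_3         # at most this fraction lies beyond 3 SDs

print(z, round(within_3, 2), round(at_most_outside, 2))
```

Because Chebyshev's Theorem holds for every distribution, the bound is conservative; for well-behaved data the actual fraction beyond 3 standard deviations is usually much smaller.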


1. The Connecticut Agricultural Experiment Station conducted a study of the calorie content 

of different types of beer. The calorie content (calories per 100 mL) for 26 brands of 

light beer are: 

29 28 33 31 30 33 30 28 27 41 39 31 29 

23 32 31 32 19 40 22 34 31 42 35 29 43 

a. Find the standard deviation. Explain the real-world meaning of this value. 

b. Find the interquartile range. Explain the real-world meaning of this value. 

c. Find the skewness. What type of shape does this distribution have?

2. The UNICEF report “Progress for Children” (April, 2005) included the accompanying 

data on the percentage of primary-school-age children who were enrolled in school for 23 

countries in Central Africa. 


58.3 34.6 35.5 45.4 38.6 63.8 53.9 61.9 69.9 43 85 63.4 

58.4 61.9 40.9 73.9 34.8 74.4 97.4 61 66.7 79.6 98.9 

a. Find the range, standard deviation, and interquartile range. Explain what these 

three values tell us about the shape of the distribution. 

b. Explain the real-world meaning of the standard deviation and the interquartile 

range. 

c. Produce descriptive statistics for this dataset with the Analysis ToolPak in Excel. 

d. Is the distribution skewed? If so, in which direction?

e. Create a relative frequency histogram. Describe any trends in the data.

f. Is an observation of 79.6 an outlier? Use Chebyshev’s Theorem to justify your answer.

3. The article “Determination of Most Representative Subdivision” (Journal of Energy 

Engineering [1993]: 43-55) gave data on various characteristics of subdivisions that 

could be used in deciding whether to provide electrical power using overhead lines or 

underground lines. Data on the variable x = total length of streets within a subdivision (in 

feet) are as follows: 

1280 5320 4390 2100 1240 3060 4770 1050 

360 3330 3380 340 1000 960 1320 530 

3350 540 3870 1250 2400 960 1120 2120 

450 2250 2320 2400 3150 5700 5220 500 

1850 2460 5850 2700 2730 1670 100 5770 

3150 1890 510 240 396 1419 2109 

a. Find the range, standard deviation, and interquartile range. Explain what these 

three values tell us about the shape of the distribution. 

b. Explain the real-world meaning of the standard deviation and the interquartile 

range. 

c. Produce descriptive statistics for this dataset with the Analysis ToolPak in Excel. 

d. Is the distribution skewed? If so, in which direction?

e. Find the $z$-score for the observation 79.6. Explain what your answer means in real-world terms.

f. Create a relative frequency histogram. Is an observation of 79.6 an outlier? Use Chebyshev’s Theorem to justify your answer.

4. Using the five class intervals 100 to 120, 120 to 140, . . ., 180 to 200, devise a frequency 

distribution based on 70 observations whose histogram could be described as follows: 

a. symmetric b. bimodal c. positively (right) skewed d. negatively (left) skewed 


5. The Highway Loss Data Institute publishes data on repair costs resulting from a 5-mph 

crash test of a car moving forward into a flat barrier. The following table gives data for 

10 midsize luxury cars tested in October 2002: 

Model Repair Cost 

Audi A6 0 

BMW 328i 0 

Cadillac Catera 900 

Jaguar X 1254 

Lexus ES300 234 

Lexus IS300 979 

Mercedes C320 707 

Saab 9-5 670 

Volvo S60 769 

Volvo S80 4194 

a. Using Analysis ToolPak in Excel, generate all descriptive statistics. Discuss the 

best measure of center and the best measure of spread based on what you see. 

Justify why these measures were selected.

b. Find the $z$-score for the observation 4194. Explain what your answer means in real-world terms.

c. Is $4,194 considered an extreme outlier? Also use Chebyshev’s Theorem to numerically reinforce your answer.

6. Cost-to-charge ratios were reported for the 10 hospitals in California with the lowest 

ratios (San Luis Obispo Tribune, December 15, 2002). The 10 cost-to-charge values 

were 

8.81 10.26 10.2 12.66 12.86 12.96 13.04 13.14 14.7 14.84 

Discuss relevant descriptive statistics and a relative frequency distribution. Use your

information to make a conclusion about the state of hospitals in California. 

7. The technical report “Ozone Season Emissions by State” (U.S. Environmental Protection Agency, 2002) gave the following nitrous oxide emissions (in thousands of tons) for 20 states in the continental United States:

76 22 40 7 30 5 6 136 72 33 

0 89 136 39 92 40 13 27 1 63 

Generate a brief report about the distribution of nitrous oxide emissions in the sampled 

states. Use descriptive measures and visuals to justify your answer. 


Chapter 3 

Probability and Decision Theory 

When you stop, I mean really stop, and think about how often you think in terms of probabilities, I am confident you’ll find you use it more often than not. Do you ever decide to get to work by taking one route as opposed to another? Would you find yourself making health decisions based on your doctor’s advice instead of the advice you might receive from a ten-year-old child? Have you ever purchased a birthday gift for someone after deep contemplation of what that person might like? Do you trust one news network over another? What are your decisions based on in these situations?

Whether or not you’re willing to give in to your inner nerd, you should admit that you think in terms of chances and likelihood. I imagine that you do have a preferred route. I think that you do trust an expert’s medical opinion. I believe that you do make a gift purchase after considering what you think the recipient enjoys. I should think there are some networks that you trust more than others.

In this chapter, we’ll explore the nature of probabilistic thinking. You’ll also notice the phrase “Decision Theory” in the title. Instead of focusing on the trite probability questions involving situations that we don’t ever encounter, we’ll concern ourselves with real-world situations where probabilistic reasoning will help us make a decision.


3.1 The Idea of Probability 

In this section, we’ll address what probability is (and isn’t).

Example 1: A weather report by the National Weather Service (NWS) stated on July 31, 2011 that, overnight, there was a 50% chance of precipitation in the 85225 zip code in which Chandler-Gilbert Community College is located. What does this mean?

(SOURCE: www.crh.noaa.gov/)

SOLUTION: This is actually quite a loaded statement. One might want to say that, out of 100 times, it will rain 50 times. This is a very misleading approach for a couple of different reasons. First off, what is meant by “times”? We are only concerned with one time: overnight on July 31, 2011.

A probability is actually a measure of how likely something is to occur in the long-run. That is, 

if something were to be repeated in trials over and over again then, theoretically, the specified 

outcome would occur a certain percentage of time. Importantly, it must be noted that the 

conditions under which we are measuring a probability must be in place in order for the 

probability to be a valid measure. 

In our case, NWS states that, under the exact same environmental conditions taking place 

throughout the night of July 31, 2011, it would be expected to rain 50% of the time. 

The graph below shows a hypothetical scenario in which there is a 50% chance of precipitation under the set of conditions that occurred on the above night. Notice that it rained on the initial day and so immediately the proportion of rainy days is 100%. As the same conditions occur on different days, sometimes it rains and sometimes it does not, even though any given day has a 50% chance of precipitation. We notice that the proportion is quite unstable at first, jumping from 100% down to nearly 40%; however, as many days with this same set of conditions pass (in the long-run), we notice that the proportion becomes more stable and approaches the theoretical probability of 50%.


[Figure: Proportion of Rainy Days Under July 31, 2011 Overnight Conditions. Horizontal axis: Day with Specific Conditions (0 to 140); vertical axis: Proportion of Rainy Days (0 to 1.2).]

Graph: Based on a random simulation involving the true probability of a 50% chance of precipitation 

and what occurs in the long-run. 
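A simulation of this kind is easy to sketch in Python. The code below is not the simulation used to draw the graph; it is a minimal illustration, with a function name (`running_proportions`) and a seed of our own choosing, of how the running proportion of rainy days settles toward the true probability as the number of days grows.

```python
import random

def running_proportions(p, trials, seed=0):
    """Simulate `trials` days, each rainy with probability p, and return the
    proportion of rainy days observed after each day."""
    rng = random.Random(seed)  # fixed seed so the run can be repeated
    rainy_days = 0
    proportions = []
    for day in range(1, trials + 1):
        if rng.random() < p:   # this day turned out rainy
            rainy_days += 1
        proportions.append(rainy_days / day)
    return proportions


props = running_proportions(0.5, 10_000, seed=2011)
for day in (1, 10, 100, 1_000, 10_000):
    print(day, round(props[day - 1], 3))
```

The early proportions jump around a great deal, while the later ones stay close to 0.5, which is what "in the long-run" means.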

As an interesting note, NWS has sophisticated helium “balloons” that they send up into the air to 

measure properties such as wind speed and direction, humidity, and barometric pressure. Then 

physics is used based on theories of fluid mechanics to make the prediction. 

Among many others that we could begin to state, there is one other major misconception about 

probability: that if the probability that it rains is said to be very small and yet it rains, then the 

probability must be wrong. This is incorrect. Probability is a measure of uncertainty. As in the 

case of meteorology, the predictions are scientific and are based upon prior data. Just because it 

has only rained, say, 10% of the time on days like today, this is not to say that it won’t rain. In fact, it very well might! The moral of the story is that probability talks about likelihood. Only in the instance of 0% and 100% probabilities is anything guaranteed. If there are situations in which something either never happens or always happens, then we’re probably not concerned about understanding probabilities.

Probability 

Probability is a measure of uncertainty, typically expressed as a number between 0 (0%) and 1 

(100%), that describes how likely it is that an event will or will not occur under a specified set of 


conditions in the long-run. 

Measuring Probabilities 

While probability is considerably more complicated than we’ll let on, the basic idea is that a

probability can be calculated by considering the number of times some event occurs relative to 

the total number of “trials,” or observable situations. In simpler terms, it is the number of 

“successes” out of the total number of trials. 

Calculating Probability 

The probability that event $E$ occurs, denoted $P(E)$, is the ratio (or fraction) of successes divided by the number of trials. Mathematically, we write the number of times $E$ occurs by $n(E)$ and the total number of trials as $n(S)$. That is,

$$P(E) = \frac{n(E)}{n(S)}$$

This formula works when all elements in the sample space are equiprobable, that is, each 

individual outcome in the sample space has the same probability of occurring as any other 

outcome. 

As a note, the $n(\,)$ notation stands for “the number of ways” the event in parentheses can occur. The $S$ in the denominator stands for sample space, or the total number of things/situations/trials being considered in the experiment.

Example 2: In a 2009 study of high-fructose corn syrup (HFCS), a corn-based sweetener used in 

a wide variety of foods, beverages, and condiments, 20 samples of HFCS were analyzed. Of 

those, nine of them were found to contain mercury by researchers. Based on the results of this 

study, find the probability that a random sample of HFCS contains mercury and explain what this 

result means. 

SOURCE: http://www.washingtonpost.com/wpdyn/content/article/2009/01/26/AR2009012601831.html 

SOLUTION: The event in this scenario is that mercury is found. Out of the total 20 trials, nine of them contained mercury. Therefore,

$$P(\text{mercury}) = \frac{9}{20} = 0.45$$

This means that if HFCS were sampled randomly and repeatedly, about 45% of all samples would be found to contain traces of mercury in the long-run. This does not guarantee that exactly 45 samples out of 100 will contain mercury.
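To make the "long-run" idea concrete, here is a minimal Python sketch. It assumes the true probability really is the 0.45 estimated above, and the function names, the seed, and the batch sizes are illustrative choices rather than anything from the study.

```python
import random


def estimate_probability(successes: int, trials: int) -> float:
    """Relative-frequency estimate: successes divided by trials."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    return successes / trials


def simulate_batch(p: float, batch_size: int, rng: random.Random) -> int:
    """Count how many of batch_size simulated samples contain mercury."""
    return sum(rng.random() < p for _ in range(batch_size))


# Estimate from Example 2: 9 of 20 samples contained mercury.
p_mercury = estimate_probability(9, 20)

# Assumption: treat 0.45 as the true probability and draw 1,000 batches
# of 100 samples each. The seed is arbitrary and only fixes the output.
rng = random.Random(2009)
batches = [simulate_batch(p_mercury, 100, rng) for _ in range(1000)]
long_run_share = sum(batches) / (100 * len(batches))

print("Estimated probability:", p_mercury)
print("Fewest and most mercury samples in a batch of 100:", min(batches), max(batches))
print("Share across all simulated samples:", round(long_run_share, 3))
```

Individual batches of 100 wander above and below 45, while the share across all simulated samples settles close to 45%. That is the sense in which a probability describes the long-run and not any single batch.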

Example 3: In July 2011, temperatures in Gilbert, Arizona were above 100°F every day (SOURCE: www.weather.com). Based on this data, a researcher concludes that the probability of above-100°F temperatures in Arizona is 100%. Comment on his findings.

SOLUTION: Since temperatures were above 100°F on 31 of the 31 days in July 2011, it is fair to make the experimental observation that approximately 100% of all July days have temperatures exceeding 100°F in the long-run (there have been July days in the past when temperatures were below 100°F). However, because we know that temperatures are periodic, or that they go from low to high and back to low over the course of a year, 100% is not a good estimate for temperatures in Arizona in general (temperatures are reasonably never above 100°F in January!).

This example truly stresses the importance of critical thinking when using probabilities. Probabilities are often used and abused in the media, in education, and in politics, just to name a few. We want to make sure that we are as specific as possible about the conditions a probability applies to.

It is often helpful to display probabilities in tabular form. This type of table is called a contingency table. It not only helps to organize data, but also lets us see the big picture at the same time. Let's consider an example.

Example 4: A 1950 study considered 1,418 hospital patients in London, half with lung cancer and half without, and recorded whether or not they had smoked over the course of their lives. The following was found:

Smoker \ Lung Cancer    Yes    No
Yes                     688    650
No                       21     59

Assuming this data can be used as a representation of the entire population of London residents, analyze the data by discussing the following:

a. What is the probability that a randomly selected participant within this study develops lung cancer?

b. Provided that a person was a smoker, what is the probability that he has lung cancer?

c. Provided that a person was not a smoker, what is the probability that he has lung cancer?

d. Given that a person has lung cancer, what is the probability that he smokes?

SOLUTION: 


When answering these questions, it is fairly useful to fully organize the data by providing all 

totals: 

Smoker \ Lung Cancer    Yes    No     Smoker TOTALS
Yes                     688    650    1,338
No                       21     59       80
Lung Cancer TOTALS:     709    709    1,418

1. Since there is a total of 1,418 individuals being considered and, of those, 709 developed lung cancer,

$$P(\text{lung cancer}) = \frac{709}{1{,}418} = 0.50$$

We must be careful in using this probability as it doesn't really reveal anything about the link between lung cancer and smoking, since 709 patients with lung cancer and 709 without lung cancer were chosen to participate in the study to begin with. This is a probability that was fixed by the researchers.

2. There is a total of 1,338 individuals in the study that smoke (we are limited to the smokers only, per the way the question is stated). Of those individuals, 688 have lung cancer.

$$P(\text{lung cancer, given smoker}) = \frac{688}{1{,}338} \approx 0.514$$

Slightly over half of the patients who are smokers developed lung cancer. This number is 

frighteningly large. Before we jump the gun in assuming that smoking is the culprit here, 

we should probably consider what happens with nonsmokers. 

3. There is a total of 80 nonsmokers in the group. Of them, 21 developed lung cancer.

$$P(\text{lung cancer, given nonsmoker}) = \frac{21}{80} \approx 0.263$$

Slightly more than one-fourth of nonsmokers developed lung cancer. This number is considerably smaller than the one for smokers. We speculate (but have not proven) that smoking increases the likelihood that one will develop lung cancer.

4. There are 709 patients with lung cancer. Of these, 688 smoke.

$$P(\text{smoker, given lung cancer}) = \frac{688}{709} \approx 0.970$$

Are we confident in accusing a lung cancer patient of being a smoker? According to this data, perhaps.

The moral of the story is: analyze the situation from a variety of lenses. What appears to be true 

might be an illusion of what we see immediately! Sometimes, however, it is about what the 

naked eye does not detect. This is what makes good analysts. 
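The four calculations in Example 4 all follow the same pattern: restrict attention to a row or column total, then divide. A short Python sketch of that pattern follows; the dictionary layout and the helper names `total` and `conditional` are illustrative assumptions, not notation from the text.

```python
from fractions import Fraction

# Counts from Example 4, keyed by (smoking status, lung cancer status).
counts = {
    ("smoker", "cancer"): 688,
    ("smoker", "no cancer"): 650,
    ("nonsmoker", "cancer"): 21,
    ("nonsmoker", "no cancer"): 59,
}


def total(smoker=None, cancer=None):
    """Sum the counts that match the given row and/or column label."""
    return sum(
        n
        for (s, c), n in counts.items()
        if (smoker is None or s == smoker) and (cancer is None or c == cancer)
    )


def conditional(event, given):
    """Probability of `event` among only the people matching `given`."""
    both = {**event, **given}
    return Fraction(total(**both), total(**given))


p_cancer = Fraction(total(cancer="cancer"), total())
p_cancer_given_smoker = conditional({"cancer": "cancer"}, {"smoker": "smoker"})
p_cancer_given_nonsmoker = conditional({"cancer": "cancer"}, {"smoker": "nonsmoker"})
p_smoker_given_cancer = conditional({"smoker": "smoker"}, {"cancer": "cancer"})

for label, p in [
    ("P(cancer)", p_cancer),
    ("P(cancer, given smoker)", p_cancer_given_smoker),
    ("P(cancer, given nonsmoker)", p_cancer_given_nonsmoker),
    ("P(smoker, given cancer)", p_smoker_given_cancer),
]:
    print(f"{label} = {p} = {float(p):.3f}")
```

Using `Fraction` keeps each answer exact until it is printed, which makes it easy to see that the denominator changes with the condition (1,418, then 1,338, then 80, then 709) even though the table is the same.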


1. A classmate of yours was absent when this section was discussed. Explain to her what a 

probability is in your own words. 

2. In a study performed by Cambridge University in the United Kingdom, it was found that, 

“One out of three people is overwhelmed by the latest breakthroughs in technology.” 

(SOURCE: http://www.gev.com/2011/07/study-one-out-of-three-people-overwhelmedby-technology/). 

Primarily, individuals are overwhelmed by how much information is 

available through the use of social networks and smartphones, to name just two. Explain 

what is meant by this and explain in terms of probabilistic reasoning. 

3. In a 2007 survey conducted by DDB Worldwide, an internationally known advertising company, the following question was asked of a random group of 217 participants: “Is consistency in branding becoming any more or less important?” The following table displays the results:

Response Number of respondents 

More important 143 

Less important 74 

Find the probability that a respondent believes that consistency in branding is: 

a. More important, then explain what this means. 

b. Less important, then explain what this means. 

4. The probability that a visit to a primary care physician's (PCP) office results in neither lab work nor referral to a specialist is 35%. Of those coming to a PCP's office, 30% are referred to specialists and 40% require lab work.

Determine the probability that a visit to a PCP's office results in both lab work and referral to a specialist. (Video Solution)

5. A public health researcher examines the medical records of a group of 937 men who died 

in 1999 and discovers that 210 of the men died from causes related to heart disease. 

Moreover, 312 of the 937 men had at least one parent who suffered from heart disease, 

and, of these 312 men, 102 died from causes related to heart disease. 


Determine the probability that a man randomly selected from this group died of causes 

related to heart disease, provided that neither of his parents suffered from heart disease. 

(PROBLEM SOURCE: SOA/CAS Exam P Sample Questions, Page 5) (Video Solution) 


3.2 Joint Probability 

In the previous section, we began computing probability using some fairly basic ideas. In calculating probabilities, we made a huge assumption: that the number found represents what will occur in the long-run. For instance, if we conduct a study and find that out of 100 people, 94 respond positively to a new energy drink, can we conclude the drink is effective in providing added energy?

The answer to this question is humbling: it depends upon how the data was collected. Suppose the participants are all college students who tend to consume a large amount of caffeine as it is. Would it be fair for the advertisement to say, “There is a 94% chance that this energy drink will energize you”? Not necessarily, since the result only appeared to be valid in a sample of college students. This means that the population from which the sample was taken must be specified. In this case, the population is the set of all college students and the sample is the 100 students who were selected. Thus, perhaps the advertisement should say, “Are you a college student? If so, there is a 94% chance that this energy drink will energize you.” That is, provided that this sample was a random sample and not a group of college students hand-picked from the respective population.

Okay, so you have a data sample collected from a specific population and your goal is to now 

talk about probabilities. 

Example 1: Imagine that you work for a marketing agency and your goal is to determine the effectiveness of two different branding approaches to a new line of clothing. The first approach involves establishing a group of Facebook followers by offering discounts on clothing as an incentive for becoming a friend of the company. The company hypothesizes that, by seeing the company logo on their Facebook account each week, followers will gain a strong familiarity and comfort level with the company's product. The second approach involves hiring Hollywood actors to endorse the product at film festivals and celebrity appearances. The company then tracks the degree of success of each branding tactic by measuring the number of retail outlets that agree to stock the product based on the branding used. They find that, of the 6 companies exposed to Tactic 1 (T1), 5 agreed to stock the product. Of the 7 companies exposed to Tactic 2 (T2), 5 agreed to stock the product.


Because of the amount of resources involved in selling the product to retail stores, a single marketing analyst can only reach out to about 15 businesses per month; however, if successfully sold, the result is a high level of profit for the clothing company, which, in turn, means you might get that raise after all.

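SOLUTION: Let's start with a simpler question, and first consider T1. We find that the probability of a successful sale is:

$$P(\text{T1 success}) = \frac{5}{6} \approx 0.83$$

To keep the arithmetic simple, we round this to about 80%. This means that we should expect roughly 80% of all companies to sell the clothing line, in the long-run.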

Suppose that a marketing analyst is to offer T1 to two different companies. He would like to know, what is the probability that both companies agree to sell the product? Is the answer 80%? Unfortunately, no. There is an 80% chance that each company agrees to sell the clothing line. We should expect that the probability that both sign on is less.

We know that about 8 out of 10 times, Company 1 (C1) will sign on and that 8 out of 10 times Company 2 (C2) will sign on. Let's compare the possibilities by using a tabular approach:

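                       Company 2 Choices
                       Y  Y  Y  Y  Y  Y  Y  Y  N  N
Company 1 Choices   Y  .  .  .  .  .  .  .  .  .  .
                    Y  .  .  .  .  .  .  .  .  .  .
                    Y  .  .  .  .  .  .  .  .  .  .
                    Y  .  .  .  .  .  .  .  .  .  .
                    Y  .  .  .  .  .  .  .  .  .  .
                    Y  .  .  .  .  .  .  .  .  .  .
                    Y  .  .  .  .  .  .  .  .  .  .
                    Y  .  .  .  .  .  .  .  .  .  .
                    N  .  .  .  .  .  .  .  .  .  .
                    N  .  .  .  .  .  .  .  .  .  .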

Each cell in the table represents a particular combination of C1's choice and C2's choice. So, the 1-1 entry (remember, this means first row, first column) of the table is the situation in which it does indeed turn out that C1 and C2 agree to sell the clothing line. The question was, what is the probability that both sign on? Since the definition of probability is the ratio of the number of ways the event can occur divided by the total number of possible outcomes, let's do a bit of counting by highlighting important features of the table:

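                       Company 2 Choices
                       Y  Y  Y  Y  Y  Y  Y  Y  N  N
Company 1 Choices   Y  #  #  #  #  #  #  #  #  .  .
                    Y  #  #  #  #  #  #  #  #  .  .
                    Y  #  #  #  #  #  #  #  #  .  .
                    Y  #  #  #  #  #  #  #  #  .  .
                    Y  #  #  #  #  #  #  #  #  .  .
                    Y  #  #  #  #  #  #  #  #  .  .
                    Y  #  #  #  #  #  #  #  #  .  .
                    Y  #  #  #  #  #  #  #  #  .  .
                    N  .  .  .  .  .  .  .  .  .  .
                    N  .  .  .  .  .  .  .  .  .  .

(Shaded cells, in which both companies choose Y, are marked with #.)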

The shaded region represents the number of ways in which we can get both companies to sign on. This region is 8 x 8, which creates 64 possibilities. The total number of possibilities is simply the total number of cells in the table. Since the table is 10 x 10, we have 100 possibilities.

So,

$$P(C_1 \text{ and } C_2) = \frac{64}{100} = 0.64$$

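This is, as speculated, less than the probability that any one company signs on. Let's consider what we really did here:

$$P(C_1 \text{ and } C_2) = \frac{8 \times 8}{10 \times 10}$$

Notice that

$$\frac{8 \times 8}{10 \times 10} = \frac{8}{10} \cdot \frac{8}{10} = P(C_1) \cdot P(C_2)$$

Or, in short,

$$P(C_1 \text{ and } C_2) = P(C_1) \cdot P(C_2)$$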
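The counting argument behind the 10 x 10 table can be checked by brute force. The sketch below is only an illustration: it builds every cell of the table with `itertools.product`, counts the cells where both companies say yes, and compares the result with the product of the two individual probabilities. The variable names are assumptions made for this example.

```python
from fractions import Fraction
from itertools import product

# Ten equally likely "choices" per company: 8 yes (Y) and 2 no (N).
company_1 = ["Y"] * 8 + ["N"] * 2
company_2 = ["Y"] * 8 + ["N"] * 2

# Every cell of the 10 x 10 table is one (C1 choice, C2 choice) pair.
cells = list(product(company_1, company_2))

both_yes = sum(1 for c1, c2 in cells if c1 == "Y" and c2 == "Y")
p_both = Fraction(both_yes, len(cells))

p_c1 = Fraction(company_1.count("Y"), len(company_1))
p_c2 = Fraction(company_2.count("Y"), len(company_2))

print("Cells where both sign on:", both_yes, "of", len(cells))
print("P(C1 and C2) by counting:", float(p_both))
print("P(C1) * P(C2):", float(p_c1 * p_c2))
```

Both routes give 0.64. The agreement relies on the two companies deciding independently, which is what lets every row of the table pair up with every column.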


Example 3: The idea of red-light cameras has been disputed quite often in Arizona and all across the United States. While unable to find any specific details, the author will assume that red-light runners have about a 70% chance of being caught by a red-light camera on any given instance. Suppose that on a given day, two cars run through an intersection during separate red lights, setting off the camera. What is the probability that both drivers are caught?

SOLUTION: We can fairly assume that the first driver being caught and the second driver being caught (calling these events $A$ and $B$, respectively) constitute events that do not affect one another. Thus,

$$P(A \text{ and } B) = P(A) \cdot P(B) = (0.70)(0.70) = 0.49$$

There is a 49% chance that both drivers are caught. This is about the likelihood of getting heads on the toss of a coin.

Example 5: In a crop of corn, the Food & Drug Administration (FDA) finds that two of the 20 bushels of corn are potentially contaminated with E. coli. Supposing that two bushels have already gone out for shipment to county marketplaces, how likely is it that both of the contaminated bushels have gone out?

SOLUTION: The question asks about the probability that both have been shipped, that is, the first contaminated bushel and the second contaminated bushel. We will refer to these events as simply $C_1$ and $C_2$. We will first write the “and” probability in the form of dependent events and will then determine whether or not a dependency exists (see Independence Property box above).

$$P(C_1 \text{ and } C_2) = P(C_1) \cdot P(C_2 \text{ given } C_1)$$

We know that $P(C_1) = \frac{2}{20} = 0.10$. Now, since the first shipment “removes” one of the two contaminated bushels and one bushel out of the 20 available, the probability of shipping a second contaminated bushel is slightly changed to:

$$P(C_2 \text{ given } C_1) = \frac{1}{19} \approx 0.053$$

Thus, the events are indeed dependent, and so the probability becomes:

$$P(C_1 \text{ and } C_2) = \frac{2}{20} \cdot \frac{1}{19} = \frac{2}{380} \approx 0.005$$

There is less than a 1% chance that both contaminated bushels went out.

Does this outcome satisfy the farm producing these bushels of corn? Thinking in more detail, the main concern is actually whether one or more (at least one) contaminated bushel went out! In order to address how to find this, it is useful to think about the following, perhaps obvious, characteristics.

Basic Properties of Probability (Kolmogorov Axioms) 

1) A particular event is: guaranteed to not occur, is guaranteed to occur, or lies somewhere 

between these extremes. 

2) In a given situation, or sample space, the likelihood of something occurring (however 

small or insignificant), is guaranteed. 

3) The summed probabilities of all the possible events in a situation constitute the entire, or 

the whole of all possibilities. 

Mathematically, suppose that a sample space consists of n non-overlapping events $E_1, E_2, \ldots, E_n$. Then, the above verbal rules translate into:

1) For any arbitrary event between events 1 and n, let's call this event $E_i$, then:

$$0 \le P(E_i) \le 1$$

2) Using $S$ to denote the sample space,

$$P(S) = 1$$

3) Summing the probabilities gives 100% of all possible outcomes:

$$P(E_1) + P(E_2) + \cdots + P(E_n) = P(S) = 1$$

These basic properties are often referred to as the Kolmogorov axioms, named after the 

mathematician Andrey Kolmogorov. An axiom can be thought of as a necessary assumption. For 

instance, when physicists develop new concepts in physics, they assume that gravity follows 

certain properties. Thus, they have gravity axioms. 

The Kolmogorov axioms are extremely important in probability and the development of new 

ideas. 


In fact, recall Example 5 dealing with the contaminated corn crop. What are all the possibilities for shipping out two bushels from the total of 20? Let's list them out:

0 contaminated bushels and 2 uncontaminated bushels ship (call it $E_0$)

1 contaminated and 1 uncontaminated bushel ship (call it $E_1$)

2 contaminated bushels ship (call it $E_2$)

Are there any others? Not unless there is a possibility we have not considered. Since two bushels are guaranteed to go out, the outcome must fall into one of the three categories listed.

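Let's calculate the probability for each of these by hand:

$P(E_0)$ (no contaminated bushels ship): both bushels come from the 18 uncontaminated ones.

$$P(E_0) = \frac{18}{20} \cdot \frac{17}{19} = \frac{306}{380} \approx 0.805$$

$P(E_1)$ (exactly one contaminated bushel ships): there are two possibilities; either the first is contaminated and the second is not, or vice versa. We must consider both outcomes below:

o Contaminated first, uncontaminated second:

$$\frac{2}{20} \cdot \frac{18}{19} = \frac{36}{380} \approx 0.095$$

o Uncontaminated first, contaminated second:

$$\frac{18}{20} \cdot \frac{2}{19} = \frac{36}{380} \approx 0.095$$

These two possibilities give 9.5% + 9.5% = 19% of the sample space.

$P(E_2)$ (both contaminated bushels ship): $P(E_2) \approx 0.005$ (from previous calculation)

(NOTE: Importantly, summing these three probabilities gives 1, as stated in the axioms!)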


We can now see that the situation in which there is at least one contaminated bushel will occur 19% + 0.5% = 19.5% of the time. This is much higher than when we concerned ourselves with both going out! This is quite a frightening situation!

Needless to say, this was a lot of work; however, we can use the axioms to reduce the amount of work we have to do.

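According to axioms 2 and 3, where $E_0$, $E_1$, and $E_2$ are the events that 0, 1, and 2 contaminated bushels ship:

$$P(E_0) + P(E_1) + P(E_2) = 1$$

Our earlier statement involved wanting to know the likelihood that at least one contaminated bushel went out. That only involves $E_1$ and $E_2$! Solving for the sum of these two probabilities:

$$P(E_1) + P(E_2) = 1 - P(E_0)$$

That is,

$$P(\text{at least one}) = 1 - 0.805 = 0.195$$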

This is the same number we achieved taking the long route! We only had to find the probability of shipping 0 contaminated bushels, which is a little bit of work as compared to a lot of work!
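Both routes to the 19.5% figure can be verified by listing every way two bushels could have shipped. The following Python sketch is an illustration under the assumption that each ordered pair of distinct bushels is equally likely; the names `bushels`, `shipments`, and `prob_contaminated` are made up for this example.

```python
from fractions import Fraction
from itertools import permutations

# 2 contaminated (C) and 18 uncontaminated (U) bushels.
bushels = ["C"] * 2 + ["U"] * 18

# Every ordered way to ship two distinct bushels: 20 * 19 = 380 outcomes.
shipments = list(permutations(range(len(bushels)), 2))


def prob_contaminated(k):
    """Probability that exactly k of the two shipped bushels are contaminated."""
    hits = sum(
        1 for i, j in shipments if (bushels[i], bushels[j]).count("C") == k
    )
    return Fraction(hits, len(shipments))


p0, p1, p2 = (prob_contaminated(k) for k in range(3))

at_least_one_long = p1 + p2    # the long route: add the two cases
at_least_one_short = 1 - p0    # the short route: complement of "none"

print("P(0), P(1), P(2):", float(p0), float(p1), float(p2))
print("Sum of all three:", p0 + p1 + p2)
print("At least one (long route):", float(at_least_one_long))
print("At least one (short route):", float(at_least_one_short))
```

The exact answer is 37/190, or about 0.195, whichever route is used. The short route needs only one of the three probabilities, which is the point of the "at least one" shortcut.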

Probability of At Least One… 

Given any number of events involving quantities, the probability of at least one in quantity is 1 minus the probability of 0 in quantity. That is:

$$P(\text{at least one}) = 1 - P(\text{none})$$

Mathematically, let subscripts $0, 1, 2, \ldots, n$ represent quantity, where corresponding events are denoted $E_0, E_1, E_2, \ldots, E_n$. Then,

$$P(E_1) + P(E_2) + \cdots + P(E_n) = 1 - P(E_0)$$


1. In 2009 the H1N1 virus, commonly referred to as the “Swine Flu,” reportedly infected an 

estimated 10% of New Yorkers (SOURCE: 

http://www.reuters.com/article/2009/08/30/us-flu-newyork-idUSTRE57T26Y20090830). 


Suppose that an emergency room in New York City has two individuals with flu-like 

symptoms. (Video Solution) 

a. What condition(s) do you believe would make it appropriate to assume independence in this situation?

b. By using the tabular approach and assuming independence, find the probability 

that both people have the H1N1 virus. 

c. By using the “and” rule, verify that you get the same answer that you found in 

Part b. 

d. Find the probability that neither of these individuals has the H1N1 virus. 

e. Find the probability that at least one of them has the H1N1 virus. 

f. Exposure to flu germs for even a short period of time can significantly increase one's chances of catching the flu. Suppose that if one is exposed to an individual with the flu virus, their chance of becoming infected is 15 percentage points higher than normal. Find the probability that both individuals have the flu virus.

2. Many fire stations handle emergency calls for medical assistance as well as calls 

requesting firefighting equipment. A particular station says that the probability that an 

incoming call is for medical assistance is .85. This can be expressed as P(call is for 

medical assistance) = .85. 

a. Give a relative frequency interpretation of the given probability. That is, interpret 

what the number .85 means based on the definition of probability. 

b. What is the probability that a call is not for medical assistance?

c. Assuming that successive calls are independent of one another (i.e., knowing that 

one call is for medical assistance doesn't influence our assessment of the 

probability that the next call will be for medical assistance), calculate the 

probability that both of the two successive calls will be for medical assistance. 

d. Still assuming independence, calculate the probability that for two successive 

calls, the first is for medical assistance and the second is not for medical 

assistance. 

e. Still assuming independence, calculate the probability that exactly one of the next 

two calls will be for medical assistance. (Hint: There are two different 

possibilities that you should consider. The one call for medical assistance might 

be the first call, or it might be the second call.) 

f. Do you think it is reasonable to assume that the requests made in successive calls are independent? Explain.

3. "N.Y. Lottery Numbers Come Up 9-1-1 on 9/11" was the headline of an article that 

appeared in the San Francisco Chronicle (September 13, 2002). More than 5600 people 

had selected the sequence 9-1-1 on that date, many more than is typical for that sequence. 

A professor at the University of Buffalo is quoted as saying, "I'm a bit surprised, but I 

wouldn't characterize it as bizarre. It's randomness. Every number has the same chance of 

coming up. People tend to read into these things. I'm sure that whatever numbers come up 

tonight, they will have some special meaning to someone, somewhere." The New York 

state lottery uses balls numbered 0-9 circulating in 3 separate bins. To select the winning 


sequence, one ball is chosen at random from each bin. What is the probability that the sequence 9-1-1 would be the one selected on any particular day?

4. On August 8, 2011, the Dow Jones Industrial Average fell 635 points (5.5%) to 10,810 points, representing the 6th worst point loss ever experienced. On that day, President Obama's approval ratings also suffered tremendously; only 22% of the nation's voters “Strongly Approve” of how he is performing in the presidential role (SOURCE: http://www.rasmussenreports.com/public_content/politics/obama_administration/daily_presidential_tracking_poll).

Suppose presidential hopeful Randall Terry (Democrat) speaks at a rally shortly thereafter and assumes that his approval rating as a candidate will likely closely mirror President Obama's. Suppose there are 40 swing voters (voters that are “on the fence” about who to vote for). (Video Solution)

a. What is the probability that all 40 voters will strongly approve of Terry's plan?

b. What is the probability that none of the 40 voters will strongly approve of Terry's plan?

c. What is the probability that at least one voter will approve of Terry's plan?

5. The following case study is reported in the article "Parking Tickets and Missing 

Women," which appears in an early edition of the book Statistics: A Guide to the 

Unknown. In a Swedish trial on a charge of overtime parking, a police officer testified 

that he had noted the position of the two air valves on the tires of a parked car: To the 

closest hour, one valve was at the 1 o' clock position and the other was at the 6 o' clock 

position. After the allowable time for parking in that zone had passed, the policeman 

returned, noted the valves were in the same position, and ticketed the car. The owner of 

the car claimed that he had left the parking place in time and had returned later. The 

valves just happened by chance to be in the same positions. An "expert" witness computed the probability of this occurring as (1/12)(1/12) = 1/144.

a. What reasoning did the expert use to arrive at the probability of 1/144?

b. Can you spot the error(s) in the reasoning that leads to the stated probability of 1/144? What effect does this error(s) have on the probability of occurrence? Do you think that 1/144 is larger or smaller than the correct probability of occurrence?

6. Jeanie is a bit forgetful, and if she doesn't make a "to do" list, the probability that she 

forgets something she is supposed to do is .1. Tomorrow she intends to run three errands, 

and she fails to write them on her list. 

a. What is the probability that Jeanie forgets all three errands? What assumptions did you make to calculate this probability?

b. What is the probability that Jeanie remembers at least one of the three errands?

c. What is the probability that Jeanie remembers the first errand but not the second or third?

7. One of the myths most commonly believed by students on multiple choice exams is that, as long as they always use letter 'C' as their guess, they increase their chances of guessing correctly. This, of course, is absurd, since there is not usually a set pattern used by instructors in pairing correct answers with certain letters (certainly not for me, anyhow).

Suppose that a multiple-choice quiz has two problems on it and that the student has no 

idea how to answer them, so he guesses. Each problem has letters A-E corresponding to 

the answers to choose from. Using counting techniques discussed in class, find and 

explain how you found the following: (Video Solution) 

a. What is the probability that both guesses are correct?

b. What is the probability that both guesses are incorrect?

c. What is the probability that he receives a 50% on the test?

d. How likely is it that he gets at least one problem correct?

e. What is the probability that he receives a 90% on the exam (assume no partial credit is possible)?

f. How did the idea of “counting tables” allow you to answer these questions without having to do additional work for each subsequent table?


3.3 Probability of Unions 

Imagine that you toss a fair, two-sided quarter. You let it land and take a look at the side facing up. What is the probability that you see heads or tails (assume the toss will be ignored if it happens to land on its side)?

You can probably see fairly quickly that the outcome desired is guaranteed; when a coin is 

tossed, it will result in one of two outcomes: heads or tails. If someone in a bet were to tell you 

that he will win if the toss of a coin results in heads or tails, then you could probably tell him, 

“Congratulations!” 

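Adding to our intuition (no pun intended), we will write the situation in the form of a mathematical probability. The sample space will have two outcomes: $S = \{H, T\}$, where $H$ is heads and $T$ is tails. Then,

$$P(H \text{ or } T) = 1$$

Since we know that

$$P(H) = 0.5 \quad \text{and} \quad P(T) = 0.5$$

we can gladly write:

$$P(H \text{ or } T) = P(H) + P(T) = 0.5 + 0.5 = 1$$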

Simple enough! We feel pretty satisfied and so we hope to 

tackle another problem: 

Example 1: A large company offers a self-insured health 

insurance policy to its employees to help them reduce premium and copay costs. Using its 

historical data from the last two years, the company analyst considers the risk status of the 

employees (low or high) based on preexisting conditions, and the type of claim filed (physical 

health or mental health). He finds that 70% of employees have filed a mental health claim and 

that 40% of employees have been categorized as high risk. Further, he finds that 20% of 

employees are low risk and have filed a physical health claim. The company only insures the 

first claim. All claims thereafter are paid for by a third-party insurer. 


For reporting purposes, he would like to find the probability that a randomly selected employee (or an employee that is to be hired in the future) is high risk or will file a mental health claim. As he is writing his report, he reaches a speed bump:

Letting $H$ be the event that an employee is high risk and $M$ the event that an employee files a mental health claim,

$$P(H \text{ or } M) = P(H) + P(M) = 0.40 + 0.70 = 1.10$$

He quickly realizes that this probability is invalid because a probability cannot be greater than 1, or 100%. What happened?

SOLUTION: 

We first organize his data into a table to help us better see what is happening: 

Claim \ Risk    Low    High    Total
Physical        .20
Mental                         .70
Total                  .40

The probabilities outside of the boxes represent totals for mental health claims and for high risk 

claims. The probability in the 1-1 entry of the table represents the probability of being low risk 

and filing a physical health claim. Since we know that this data represents all of those who have 

filed claims, we know that 100% have filed one type or the other. Additionally, each employee 

considered falls into one of the two risk categories. So we fill in more details: 


Claim \ Risk    Low    High    Total
Physical        .20            .30
Mental                         .70
Total           .60    .40

We can also proceed to fill in the boxes in the table, since each person falls into exactly one of 

the four positions (low physical, low mental, high physical, high mental): 


Claim \ Risk    Low    High    Total
Physical        .20    .10     .30
Mental          .40    .30     .70
Total           .60    .40

Now, the analyst added the second row total to the second column total, as highlighted in the table below (highlighted entries are shown in brackets):

Claim \ Risk    Low    High     Total
Physical        .20    .10      .30
Mental          .40    .30      [.70]
Total           .60    [.40]

The problem seems to be that the .40 and the .70 both include the probability of High Risk and Mental Claim! In other words, it is being counted twice, hence the end probability that is greater than 1.

Instead, let's add up the individual box probabilities as illustrated in the table below (the boxes being added are shown in brackets):

Claim \ Risk    Low      High     Total
Physical        .20      [.10]    .30
Mental          [.40]    [.30]    .70
Total           .60      .40

We find that $P(H \text{ or } M) = 0.10 + 0.40 + 0.30 = 0.80$, which is a number that rests between 0% and 100%. We conclude that, in fact, there is an 80% chance that a claim-filing employee is high risk or files a mental claim (or both!!).

While this does not seem like a huge amount of work, suppose that we instead had three types of 

claims and 3 different statuses. It would probably be convenient to have some sort of 

mathematical approach to the solution. 

Let's go back to the table in which the double-count occurred:

Claim \ Risk    Low    High     Total
Physical        .20    .10      .30
Mental          .40    .30      [.70]
Total           .60    [.40]

We are free to add the two probabilities, $P(H)$ and $P(M)$, but we must be sure to take out the .30 one time, so that it is single-counted and not double-counted:

$$P(H \text{ or } M) = 0.40 + 0.70 - 0.30 = 0.80$$

This is the same answer as before! Notice what we really did:

$$P(H \text{ or } M) = P(H) + P(M) - P(H \text{ and } M)$$

Regardless of the context/application of the probability, this issue can be resolved as shown.

Probability of One Event “Or” the Other 


Given two events, $A$ and $B$, the probability that one or the other occurs is the sum of the individual probabilities with the double-count removed once. Mathematically,

$$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$$

Typically, $\cup$ is used (called a union) to replace the word “or”, making the above equation

$$P(A \cup B) = P(A) + P(B) - P(A \text{ and } B)$$
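As a check on the “Or” formula, the insurance example can be worked three ways in a few lines of Python: the naive sum, the formula with the double-count removed, and a direct sum over the individual boxes. This is a sketch only; the dictionary `joint` and its labels are an assumed way of storing the completed table.

```python
from fractions import Fraction

# Completed table from the insurance example: (risk, claim type) -> probability.
joint = {
    ("low", "physical"): Fraction(20, 100),
    ("high", "physical"): Fraction(10, 100),
    ("low", "mental"): Fraction(40, 100),
    ("high", "mental"): Fraction(30, 100),
}

# Column and row totals.
p_high = sum(p for (risk, _), p in joint.items() if risk == "high")
p_mental = sum(p for (_, claim), p in joint.items() if claim == "mental")
p_high_and_mental = joint[("high", "mental")]

# Naive sum: counts the high-risk mental box twice.
naive = p_high + p_mental

# "Or" formula: remove the double-count once.
p_or_formula = p_high + p_mental - p_high_and_mental

# Direct route: add each box that is high risk, mental, or both.
p_or_boxes = sum(
    p for (risk, claim), p in joint.items() if risk == "high" or claim == "mental"
)

print("Naive sum:", float(naive))
print("Formula:", float(p_or_formula))
print("Adding boxes:", float(p_or_boxes))
```

The naive sum comes out to 1.10, which cannot be a probability, while the formula and the box-by-box sum agree at 0.80.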

At the beginning of this section, we addressed a coin-tossing problem that involved the summation of the probability of heads and the probability of tails. Let's see why we could get away with not subtracting away the double-count. We use the “Or” probability set-up:

$$P(H \text{ or } T) = P(H) + P(T) - P(H \text{ and } T)$$

We already know the first two probabilities on the right-hand side, but what is the third probability value? Let's analyze its meaning:

$$P(H \text{ and } T) = P(\text{the coin shows both heads and tails})$$

Of course, it is impossible to get both heads and tails in one toss of a coin! Any impossible outcome has a probability of 0%. That is:

$$P(H \text{ and } T) = 0$$

So,

$$P(H \text{ or } T) = P(H) + P(T) - P(H \text{ and } T) = 0.5 + 0.5 - 0 = 1$$

We simply “lucked-out” when this problem worked-out according to our intuition. In general, 

you need only to remember the “Or” probability formula for the reasons given to solve any 

problem involving the occurrence of one outcome or another. 

Example 2: It is often interesting to note how political preference (Democrat or Republican) varies within a married couple. Suppose that in a survey of 160 couples it is found that 60 of the couples agree on a preference to vote Democrat and 40 are such that the husband votes Democrat and the wife votes Republican. The total number of wives that vote Democrat is 90. What is the probability that the couple has a husband or a wife that is Republican?


SOLUTION: We first arrange this information in a table: 

Husband \ Wife    Democrat    Republican    Total
Democrat          60          40
Republican
Total             90                        160

Note that the bottom-right corner represents the table total. 

We know that the number of husbands voting Democrat is 60 + 40 = 100. This means that the number of husbands voting Republican is 160 - 100 = 60. Additionally, we conclude that the number of couples where the husband votes Republican and the wife votes Democrat is 90 - 60 = 30. We fill this information in:

Husband \ Wife    Democrat    Republican    Total
Democrat          60          40            100
Republican        30                        60
Total             90                        160

This allows us to fill in the remaining details in the table: 


Democrat 60 40 100 

Republican 30 30 60 

90 70 160 

We convert the totals into percentages by dividing each cell entry by the total number of couples, 

160: 


Democrat .375 .25 .625 

Republican .1875 .1875 .375 

.5625 .4375 

Let 

So, 

( ) ( ) ( ) ( ) 


We find that there is a 62.5% chance that in a couple either the husband votes Republican, the 

wife votes Republican, or both vote Republican. 
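The table-based calculation in Example 2 can be checked with a short Python sketch. The dictionary layout and the helper name `prob` are illustrative choices, not part of the original text; the counts are the four interior cells of the completed table.

```python
from fractions import Fraction

# Counts from the completed table: key = (husband's vote, wife's vote).
counts = {
    ("D", "D"): 60, ("D", "R"): 40,
    ("R", "D"): 30, ("R", "R"): 30,
}
total = sum(counts.values())  # 160 couples

def prob(event):
    """Share of couples for which event(husband, wife) is true."""
    favorable = sum(n for (h, w), n in counts.items() if event(h, w))
    return Fraction(favorable, total)

p_h = prob(lambda h, w: h == "R")                   # husband Republican
p_w = prob(lambda h, w: w == "R")                   # wife Republican
p_both = prob(lambda h, w: h == "R" and w == "R")   # both Republican

# "Or" rule versus counting the qualifying cells directly.
p_either_formula = p_h + p_w - p_both
p_either_direct = prob(lambda h, w: h == "R" or w == "R")
print(float(p_either_formula), float(p_either_direct))  # 0.625 0.625
```

Both routes agree, which is the point of subtracting the double-counted cell.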

At this point you might be wondering why we don't simply draw out the table and ignore the mathematical formulas. When possible, tables are extremely useful, but they might not always be available. Consider the following example.

Example 3: Testing has determined that a particular ballistic missile has an 80% chance of hitting its intended target. Suppose that an enemy jet approaches a military base and so two missiles are fired at the incoming jet. What is the probability that this threat is eliminated?

SOLUTION: This is the probability that one or both missiles hit the target. We only have one probability, so filling out a table would not be possible.

Let $M_1$ = the first missile hits the target and $M_2$ = the second missile hits the target.

We want to know

$$P(M_1 \cup M_2) = P(M_1) + P(M_2) - P(M_1 \text{ and } M_2)$$

We already know the first two probabilities on the right-hand side (.80), but we are not given information on $P(M_1 \text{ and } M_2)$. We can fairly assume that the outcome of one missile has no (or very minimal) impact on the outcome of another missile, and so we assume the events are independent. This allows us to write:

$$P(M_1 \text{ and } M_2) = P(M_1) \cdot P(M_2) = (0.80)(0.80) = 0.64$$

And so,

$$P(M_1 \cup M_2) = 0.80 + 0.80 - 0.64 = 0.96$$

We conclude that there is a 96% chance that the enemy jet is eliminated.
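As a rough check of Example 3, the sketch below applies the “Or” rule with the joint probability taken as a product, which is valid only under the independence assumption made in the solution. The function names are hypothetical helpers introduced for illustration.

```python
def prob_or(p_a, p_b, p_a_and_b):
    """'Or' rule: P(A or B) = P(A) + P(B) - P(A and B)."""
    return p_a + p_b - p_a_and_b

def prob_and_independent(p_a, p_b):
    """Joint probability; only valid when A and B are independent."""
    return p_a * p_b

p_hit = 0.80  # chance that a single missile hits its target
p_both_hit = prob_and_independent(p_hit, p_hit)
p_threat_eliminated = prob_or(p_hit, p_hit, p_both_hit)
print(round(p_both_hit, 4), round(p_threat_eliminated, 4))  # 0.64 0.96
```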



1. A gaming investor is considering becoming a financial partner in a new casino. In deciding to go in on the deal, he reviews gaming revenues for previous years. From experience and industry research, he decides that the gaming industry tends to be successful when total gross revenues for card rooms are above $1 million or when gross revenues for lotteries are above $20 billion. Between 2000 and 2009, he found that 50% of the time, both sectors have been successful and that 0% of the time only card tables were successful (and lotteries were not). Lotteries were unsuccessful 30% of the time (SOURCE: 2011 U.S. Statistical Abstract, Table 1258). What is the probability that the investor's conditions will be met? In your professional opinion, is it likely that he will decide to become a partner in the proposal? (Video Solution)

2. A researcher conducts a study on a total of 600 cats to determine whether or not they tend to be adaptive to danger and whether or not their time to respond to those dangers is fast enough to avoid harm. The animals were exposed to non-harmful stimuli to assist in answering the researcher's questions. In his report he details that, “207 non-adaptive cats were studied and, of them, 180 were found to have response times that were simply not fast enough. By comparison, a total of 300 cats were both adaptive and had response times that were fast enough.” How likely is it that a cat is adaptive to environmental physical dangers or has a response time that is fast enough? (Video Solution)

3. In the March 3, 2011 episode of the Dr. Oz Show entitled “Dangerous Doctors: Is Your MD Hazardous to Your Health?” Dr. Oz mentioned that 20% of the time doctors order scans to protect themselves from a lawsuit. Dr. Oz also said, “Up to 1/3 of all tests and treatments are entirely unnecessary.” (Video Solution)

a. Two patients are given orders for scans from a particular doctor. What is the probability that one patient or the other was given scans to protect the doctor against a lawsuit?

b. One patient is given orders for two different tests/treatments. What is the probability that one or both of them was/were unnecessary?

c. A patient is prescribed a scan and a blood test. What is the probability that an unnecessary prescription was made, through the patient's eyes?

4. In all of his Fall 2010 classes, Milos discovered that 44% of his students earned a 'B' or better on their homework average. He also discovered that 50% of his students had a 'B' or better homework average or a 'B' or better overall grade in the class (SOURCE: Milos' Fall 2010 Grade Spreadsheet). If 30% of all his students received a 'B' or better homework average and a 'B' or better class grade, what percentage of his students earned a 'B' or better in the class? (Video Solution)

5. In all of his Fall 2010 classes, Milos discovered that, of the students who earned a 'C' or better homework average, 87% earned a 'C' or better final class grade. 70% of all students in his classes earned a 'C' or better homework average or earned a 'C' or better final class grade (SOURCE: Milos' Fall 2010 Grade Spreadsheet), while only 49% earned a 'C' or better on homework and as a final class grade (some still did well in the class, but maybe failed to turn in homework). What is the probability that a randomly selected student in his class earned a 'C' or better final class grade? (Video Solution)


3.4 Conditional Probability 

In many cases, a probability depends on what we already know. For instance, would we believe that the likelihood of a car accident changes, provided that the roads are slick from snow? We would probably agree that the likelihood increases if we already know the road conditions.

Suppose a fair, two-sided coin is tossed. You are told that the outcome is not a head. What is the likelihood that the outcome is tails?

The answer is probably obvious: if you know the outcome was not heads, and the only two possibilities are heads and tails, then there is a 100% chance the outcome is tails.

This is a conditional probability. That is, let

$$H = \text{the outcome is heads}, \qquad T = \text{the outcome is tails}$$

Further, to indicate that the outcome is not one of the above, we often put a bar on top of the event name. Then,

$$\bar{H} = \text{the outcome is not heads}, \qquad \bar{T} = \text{the outcome is not tails}, \qquad P(H) = 0.5$$

However, given that we know the outcome was not tails, the probability of heads jumped to 1. We might write:

$$P(H \text{ given } \bar{T}) = 1$$

Instead of using the word “given” we often use a vertical line (called a “pipe”), |. That is,

$$P(H \mid \bar{T}) = 1$$

Conditional Probability 

The conditional probability of event $A$ provided that $B$ already occurred is written as

$$P(A \mid B)$$

and implies that the likelihood of $A$ may be different, knowing that $B$ already took place.

Example 1: Due to wars at sea, shipwrecks, and other such disasters, there are roughly 3,000,000 sunken vessels in all of the seas in the world! Suppose an area of the ocean is mapped out due to the historic ships that have wrecked in that area. There is speculation that, of the estimated 20 ships in that region, 11 are original pirate ships. Given that a pirate ship is the first of the 20 recovered, what is the probability that the next one found will also be a pirate ship?

SOLUTION:

We would like to find the probability that a pirate ship is found, given that one pirate ship has already been removed. If one ship is removed, there are 19 ships left. Since the ship removed was a pirate ship, there are only 10 pirate ships remaining. That is,

$$P(\text{pirate ship} \mid \text{one pirate ship removed}) = \frac{10}{19} \approx 0.526$$

Note that this is different than

$$P(\text{pirate ship})$$

Why? This probability has no condition placed on it. It assumes the very basic information: 20 ships, 11 pirate ships. So,

$$P(\text{pirate ship}) = \frac{11}{20} = 0.55$$

The conditional probability, in this case, is different than the unconditional probability.
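A small Python sketch of the shipwreck example, using exact fractions. The variable names are illustrative assumptions; the counts (20 ships, 11 pirate ships) come from the example.

```python
from fractions import Fraction

total_ships = 20
pirate_ships = 11

# Unconditional: nothing has been recovered yet.
p_pirate = Fraction(pirate_ships, total_ships)

# Conditional: one pirate ship has already been recovered, so both the
# number of pirate ships and the number of ships drop by one.
p_pirate_given_pirate_removed = Fraction(pirate_ships - 1, total_ships - 1)

print(p_pirate, p_pirate_given_pirate_removed)  # 11/20 10/19
```

Knowing that a pirate ship was already removed shrinks the sample space, which is why the two values differ.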

Example 2: Determine whether or not the following situations represent $A$ and $B$ as independent or dependent events.

a) $A$: It rains in Chandler today
   $B$: There is a car accident in Chandler

b) $A$: The Arizona Cardinals make it to the playoffs
   $B$: Subway runs out of whole wheat bread

c) $A$: Dow Jones Industrial reports an enormous loss
   $B$: Microsoft stocks plummet

d) ( ) ( ) ( ) 

e) ( ) ( ) ( ) ) 

f) ( ) ( ) ( ) ) 

SOLUTION: 

a) Dependent; rain likely greatens the likelihood for accidents 

b) Independent; these events probably don‟t have any impact on one another 

c) Dependent; Microsoft is part of the Dow Jones Industrial and so there is a strong 

relationship between the two 

d) Independent; we see that the likelihood of $A$ does not change given that $B$ has occurred – it is still .75

e) Dependent; the likelihood of $A$ does change given that $B$ has occurred – it drops to .3

f) If the product of the two given events does equal the probability of and , then the 

events are independent, as this would mean that ( ) is .75, which is the same as 

( ). We see that , and so we conclude that the events are independent. 

Example 3: An aircraft radar system detects 30 aircraft in a 100-mile radius. Of these, 18 are ally planes, 6 are cargo planes, and 6 are enemy planes. Given that a plane approaching the radar is ruled out as being an enemy plane, what is the probability that it is a cargo plane?

SOLUTION: First off, define $C$ = the plane is a cargo plane and $E$ = the plane is an enemy plane.

We want to know

$$P(C \mid \bar{E})$$

Since it is not an enemy plane, it must be one of the remaining 24 aircraft. Of those, 6 are cargo planes, so

$$P(C \mid \bar{E}) = \frac{6}{24} = 0.25$$


Example 4: Suppose that Company 1 (C1) and Company 2 (C2) are competitors in the clothing business. In fact, they both have locations within Chandler Fashion Center Mall. Given previous business experience, the marketing analyst knows that each company has an 80% chance of agreeing to sell a particular line of clothing; however, if C1 agrees to sell the clothing line, C2 wants to stay competitive and so definitely purchases the clothing line. How is the probability that both will agree affected by this new knowledge?

SOLUTION: In this situation, the decision of C2 is dependent (conditional) upon the decision of C1. Consider a table in which C2's choices will reflect the decision of C1.

Company 1 Choices:                   Y  Y  Y  Y  Y  Y  Y  Y  N  N
Company 2 Choices When C1 Agrees:    Y  Y  Y  Y  Y  Y  Y  Y  Y  Y

$$P(C_2 \mid C_1) = \frac{10}{10} = 1$$

The difference is that C2's decisions are all to agree, provided that C1 has agreed. If C1 does not agree, then we're not really sure how C2 will act, but we don't really care, since the probability we are in search of is when both companies agree!

Here we have:

$$P(C_1 \text{ and } C_2) = P(C_1) \cdot P(C_2 \mid C_1) = \frac{8}{10} \cdot \frac{10}{10} = \frac{8}{10}$$

We could just as well have written

$$P(C_1 \text{ and } C_2) = (0.80)(1.00) = 0.80$$

so as to be using the decimal form instead of the tabular fractions.

If you look back at the reasoning here, you'll notice that we have bolded the word “dependent.” In previous sections, we didn't have to worry about dependency, since we assumed that the choices of C1 and C2 were independent, that is, one outcome did not affect the other, and vice versa.

How do we know whether events are dependent or independent? Oftentimes this is based upon some knowledge of the situation or, perhaps, our intuition. Let's set up the important ideas here and then we'll look at a few examples of dependence versus independence.

Probability of Two Events Occurring Simultaneously

Given two events, $A$ and $B$:

If $A$ and $B$ are independent events, then

$$P(A \text{ and } B) = P(A) \cdot P(B)$$

or, as it is often written,

$$P(A \cap B) = P(A) \cdot P(B)$$

where $\cap$ is a symbol to represent the word “and”. We use this in mathematics often.

And if $A$ and $B$ are dependent events, then

$$P(A \text{ and } B) = P(A) \cdot P(B \mid A)$$

or, as it is often written,

$$P(A \cap B) = P(A) \cdot P(B \mid A)$$

or, equivalently,

$$P(A \cap B) = P(B) \cdot P(A \mid B)$$

In either instance, the end result involves multiplication.

NOTE: $A$ and $B$ are generic names and thus can be attached to an event in an arbitrary order.

As an interesting note, we can make the following conclusion: 


Independence Property

Given two events, $A$ and $B$, if $P(B \mid A) = P(B)$, then $B$ does not depend on $A$, and so the dependence formula reduces to:

$$P(A \text{ and } B) = P(A) \cdot P(B)$$

$$P(A \cap B) = P(A) \cdot P(B)$$

This result is important, because it allows you to only have to remember the “and” rule for 

dependent events. If the next event does not depend on the prior event, then the end probability is 

just a product of the two individual probabilities. 

Though the ideas presented above might at first seem confusing, you'll notice that the idea of joint probabilities has not changed. The only new caution is to take care to acknowledge whether the events are independent or not. We'll consider a few more examples below.

Example 5: The probability that a resistor and capacitor both fail in a portable electronic device in the fifth year of use is 0.95%. The probability that the resistor fails is 1.22% and the probability that the capacitor fails is 1%. Are the events independent? If they are not independent, what is the probability that the capacitor fails given that the resistor fails?

SOLUTION:

Let $R$ = the resistor fails and $C$ = the capacitor fails.

If the two events are independent, then the product of unconditional probabilities should give us the provided joint probability.

We have that

$$P(R) = 0.0122$$

$$P(C) = 0.01$$

If they are independent events, then

$$P(R \cap C) = P(R) \cdot P(C) = (0.0122)(0.01) = 0.000122$$

However, the joint probability under independence would be 0.0122%, while the given joint probability is 0.95%. The events are therefore not independent.

Thus,

$$P(R \cap C) = P(R) \cdot P(C \mid R)$$

That is, the probability that the capacitor fails is dependent upon the resistor failing. Filling in what we know:

$$0.0095 = 0.0122 \cdot P(C \mid R)$$

Dividing gives

$$P(C \mid R) = \frac{0.0095}{0.0122} \approx 0.779$$

Thus, there is a 77.9% chance the capacitor fails if the resistor fails. The resistor is an integral part in this device. The likelihood of the capacitor failing increases if the resistor fails first.
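The two steps of Example 5 (test for independence, then solve for the conditional probability) can be sketched in Python. The function names and the tolerance are illustrative assumptions, not part of the original text.

```python
import math

def are_independent(p_a, p_b, p_a_and_b, tol=1e-12):
    """Events are independent when P(A and B) equals P(A) * P(B)."""
    return math.isclose(p_a * p_b, p_a_and_b, rel_tol=0.0, abs_tol=tol)

def conditional(p_a_and_b, p_given):
    """P(A | B) = P(A and B) / P(B), where B is the event known to occur."""
    return p_a_and_b / p_given

p_resistor = 0.0122   # 1.22%
p_capacitor = 0.01    # 1%
p_both = 0.0095       # 0.95%

print(are_independent(p_resistor, p_capacitor, p_both))  # False
print(round(conditional(p_both, p_resistor), 3))         # 0.779
```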

The above example brings up a useful result.

Calculating the Conditional Probability of A given B

Since $P(A \cap B) = P(B) \cdot P(A \mid B)$,

we have that

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

Example 6: In a demographic study of a small town, it is found that 5% of the adult residents are unemployed and living at or below poverty level. A total of 8% are unemployed. What is the probability that a person in this town is living at or below the poverty level, given that they are unemployed? Interpret the meaning of your answer.

SOLUTION:

Letting $A$ = a person lives at or below the poverty level and $B$ = a person is unemployed, we would like to know $P(A \mid B)$.

We have that $P(A \cap B) = 0.05$ and $P(B) = 0.08$. Thus:

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{0.05}{0.08} = 0.625$$

This says that, if a person is unemployed, there is a 62.5% chance they are living at or below the poverty level. We would probably expect this figure to be quite high.


Conditional probability is quite useful when used in the correct way. The counterintuitive 

problem below will allow us to shed light on how important it really is to think about 

dependencies. 

Example 7: As part of a narcotics checkpoint, officers randomly search freight trucks for shipments of illegal drugs. The officers search a small number of crates in the trucks that are chosen for random inspection. Suppose that, unbeknownst to the officers, there are two trucks ahead, one of which contains one crate with illegal drugs. This truck has a total of 8 crates, while the truck without drugs has a total of 5 crates. One of the two trucks will be randomly chosen. What is the probability that the officers find the drugs?

SOLUTION: At first, it is tempting to say that the probability is $\frac{1}{13}$; however, this is not accurate. The probability that the officers find the crate with drugs is dependent on them choosing the correct truck first!

Let $T$ = the correct truck is chosen and $C$ = the correct crate is chosen.

Two things must happen: they must choose the correct truck and they must choose the correct crate. Randomly choosing one of the two trucks is equiprobable, $P(T) = \frac{1}{2}$. If the correct truck is chosen, then the probability of choosing the correct crate is $\frac{1}{8}$, that is, $P(C \mid T) = \frac{1}{8}$. So,

$$P(T \cap C) = P(T) \cdot P(C \mid T) = \frac{1}{2} \cdot \frac{1}{8} = \frac{1}{16}$$

Why is it not valid to say 1/13? It might appear that probability is simply pulling a “fast one” on our intuition.

A simple way to think about it is as follows: there is not just one random process here. If all the crates were in the same truck, there would indeed be a 1/13 chance that we'd get the right crate. However, there are two random processes here. If you don't choose the correct truck, then choosing the correct crate is impossible. The likelihood of the second random process leading to the correct crate is indeed deeply affected by the outcome of the first random process!

Example 8: Reconsider Example 7. Let's say that the second truck had two crates with shipments of drugs. As before, one of the two trucks will be randomly chosen. What is the probability that the officers find the drugs?

SOLUTION:

This can happen in one of two ways:

• the truck with 8 crates ($T_1$) is selected and the one correct crate is chosen, OR

• the truck with 5 crates ($T_2$) is selected and one of the two correct crates is chosen

We will first create a small tree diagram showing the possible outcomes. 

The beauty of this diagram is that it displays the conditional probabilities on the right “stems” of 

the tree for each initial choosing of the truck. 

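The probability that drugs are found would thus be:

Truck 1: $\frac{1}{2} \cdot \frac{1}{8} = \frac{1}{16}$

Truck 2: $\frac{1}{2} \cdot \frac{2}{5} = \frac{1}{5}$

Since these are distinct outcomes and cannot both occur (there is no overlap in the events), it is okay to add them:

$$\frac{1}{16} + \frac{1}{5} = \frac{21}{80} \approx 0.26$$

Thus, there is about a 26% chance that drugs are found between the two trucks. Again, note that the probability is not simply $\frac{3}{13}$, as our intuition might falsely lead us to believe.

To formalize the tree above, let $T_1$ = truck 1 is chosen, $T_2$ = truck 2 is chosen, and $D$ = drugs are found. Then

$$P(D) = P\big((T_1 \cap D) \cup (T_2 \cap D)\big)$$

$$P(D) = P(T_1 \cap D) + P(T_2 \cap D) - P\big((T_1 \cap D) \cap (T_2 \cap D)\big)$$

Since only one truck will be chosen, the probability of finding drugs in T1 and T2 is 0.

$$P(T_1 \cap D) = P(T_1) \cdot P(D \mid T_1) = \frac{1}{2} \cdot \frac{1}{8} = \frac{1}{16}$$

$$P(T_2 \cap D) = P(T_2) \cdot P(D \mid T_2) = \frac{1}{2} \cdot \frac{2}{5} = \frac{1}{5}$$

Summing these together yields $\frac{21}{80} \approx 0.26$, as with the tree diagram.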
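The tree-diagram reasoning in Examples 7 and 8 amounts to multiplying along each branch and adding across branches. Here is a minimal Python sketch with exact fractions, assuming (as the examples do) that each truck is equally likely to be chosen and that a single crate is then picked at random; the data layout and function name are illustrative.

```python
from fractions import Fraction

def prob_drugs_found(trucks):
    """Sum over trucks of P(truck chosen) * P(drug crate picked | truck)."""
    total = Fraction(0)
    for p_truck, drug_crates, crates in trucks:
        total += p_truck * Fraction(drug_crates, crates)
    return total

half = Fraction(1, 2)

# Each entry: (P(truck chosen), crates with drugs, total crates).
example_7 = [(half, 1, 8), (half, 0, 5)]
example_8 = [(half, 1, 8), (half, 2, 5)]

p_example_7 = prob_drugs_found(example_7)
p_example_8 = prob_drugs_found(example_8)
print(p_example_7, p_example_8, float(p_example_8))  # 1/16 21/80 0.2625
```

Neither value matches the pooled-crate guesses of 1/13 and 3/13, because the truck choice is a separate random step.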


1. A deck of standard playing cards has 52 cards. There are four suits: clubs, diamonds, hearts, and spades. There are two colors of cards – red and black. Diamonds and hearts are red, and clubs and spades are black. The cards are labeled A (Ace), 2-10, J (Jack), Q (Queen), and K (King). To better visualize, consider the illustration below:

Suppose you are given various conditions and that you must determine the probability of the specified draw on the next card. Use the card descriptions above to find the probability that: (Video Solution)

a. Given that one Jack is removed, a Jack is drawn

b. Given that all red cards are removed, a black card is drawn

c. Given that a red Queen is removed, a red Queen is drawn

d. Given that all red Queens are removed, a black Queen is drawn

e. Given that all Kings are removed, a red card is drawn

f. Given that all numerical red cards are removed, a King is drawn

g. Given that a red King is removed, a black King is drawn

2. An auto insurance company finds that there is an 18% chance that a teenager gets into a car accident between ages 16 and 19. There is a 34% chance that a teenager gets a traffic ticket during this same age range. They find that the chance of getting into a car accident and getting a traffic ticket (not necessarily because of the accident) is 10%. (Video Solution)

a. Based on the probabilities provided, are the two events independent? Perform a calculation to justify your answer.

b. Given that a teenager gets into an accident, what is the probability that he gets a traffic ticket?

c. Why did the probability change in this way, as compared to the unconditional probability of getting a traffic ticket?

d. Given that a teenager gets a traffic ticket, what is the probability that he gets into an accident?

e. Explain, in practical terms, what your answer in d) means.

3. Let $A$, $B$, and $C$ be events in a sample space. Do the following: a) explain whether or not the events are independent or dependent, and b) answer the questions below regarding these events with the information provided. Assume the first event listed in each probability statement occurs first (e.g. $P(A \cap B)$ means $A$ occurs first). (Video Solution)

a. ( ) 

b. ( ) 

c. ( ) 

( ) 

( ) 

( ) 

( ) 

( ) 

( ) 

4. Gregor Mendel was a monk who, in 1865, suggested a theory of inheritance based on the science of genetics. He identified heterozygous individuals for flower color that had two alleles (one r = recessive white color allele and one R = dominant red color allele). When these individuals were mated, ¾ of the offspring were observed to have red flowers and ¼ had white flowers. The table summarizes this mating; each parent gives one of its alleles to form the gene of the offspring.

                Parent 2
Parent 1       r       R
   r           rr      rR
   R           Rr      RR

We assume that each parent is equally likely to give either of the two alleles and that, if either one or two of the alleles in a pair is dominant (R), the offspring will have red flowers. (Problem source: Mathematical Statistics with Applications, 6th Ed., Wackerly, et al.) (Video Solution)

a. What is the probability that an offspring has one recessive allele, given that the offspring has red flowers?

b. What is the probability that an offspring has one dominant allele, given that the offspring has white flowers?

c. What is the probability that an offspring has white flowers, given that it has one recessive allele?

d. What is the probability that an offspring has white flowers, given that it has one dominant allele?

e. What is the probability that an offspring has red flowers, given that it has one dominant allele?

5. There are 5 candidates for 2 town council positions. Three of them are for the removal of a landfill just outside of the city limits. The same candidate cannot fill both seats. (Video Solution)

a. What is the probability that one randomly chosen candidate in the group is for the removal of the landfill?

b. Given that one of the positions is filled with a candidate in favor of the landfill removal, what is the probability that the second candidate chosen is also in favor?

c. What is the probability that two candidates in favor of the landfill removal are chosen?

d. What is the probability that only one seat is filled by a candidate in favor of the landfill removal?

e. What is the probability that at least one seat is filled by a candidate in favor of the landfill removal?


3.5 Combinations and Permutations 

Recall from Section 3.2 the problem faced by a corn growing business: the FDA determines that two of the 20 bushels are potentially contaminated with E. coli. Two bushels had been shipped out and the question was: what is the probability that both bushels that were shipped to the local grocer were uncontaminated?

Writing $U_1$ and $U_2$ for the events that the first and second bushels shipped are uncontaminated, we wrote the simultaneous probability as

$$P(U_1 \cap U_2) = P(U_1) \cdot P(U_2 \mid U_1)$$

Due to the fact that one of the uncontaminated bushels was removed from the “pool”, there was now only a 17/19 chance that the second uncontaminated bushel would be pulled. In short, we wrote:

$$P(U_1 \cap U_2) = \frac{18}{20} \cdot \frac{17}{19} = \frac{18 \cdot 17}{20 \cdot 19}$$

We notice that the numerator and denominator both have a product of two sequential numbers. Had they shipped, say, four bushels, the probability that all four were uncontaminated would be:

$$\frac{18 \cdot 17 \cdot 16 \cdot 15}{20 \cdot 19 \cdot 18 \cdot 17}$$

As you might imagine, this pattern continues.

How painful, though, would it be to have to multiply eight or nine probabilities of this nature together? You could certainly do it, but you might think, “It sure would be nice to take advantage of this pattern!” Well, we're in luck!

Let's define an important term:

A factorial is a descending product of whole numbers down to 1, beginning at a specified whole number. To start with a generic whole number, $n$, we denote this product by $n!$, and write:

$$n! = n \cdot (n-1) \cdot (n-2) \cdots 2 \cdot 1$$

Example 1: Find . 

SOLUTION: By definition of factorial, we write 


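This definition is great, but it still does not resolve our crisis: how do we multiply only a specific number of sequential whole numbers?

Here's a little trick: write the factorial out, then divide out the factors that are not needed. For us, this means:

$$18! = \underbrace{18 \cdot 17}_{\text{needed}} \cdot \underbrace{16 \cdot 15 \cdots 2 \cdot 1}_{\text{not needed}}$$

But this is the same thing as:

$$18 \cdot 17 = \frac{18!}{16!}$$

In a similar way, we can write the denominator of our probability by:

$$20 \cdot 19 = \frac{20!}{18!}$$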
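The factorial trick can be sketched in Python with `math.factorial`. The helper name `falling_product` is a hypothetical label for "the first r factors of n!"; the bushel counts (18 uncontaminated out of 20) are those of the corn example.

```python
import math
from fractions import Fraction

def falling_product(n, r):
    """n * (n-1) * ... * (n-r+1), computed as n! / (n-r)!."""
    return math.factorial(n) // math.factorial(n - r)

# Two bushels shipped: (18 * 17) / (20 * 19)
p_two = Fraction(falling_product(18, 2), falling_product(20, 2))

# Four bushels shipped: (18 * 17 * 16 * 15) / (20 * 19 * 18 * 17)
p_four = Fraction(falling_product(18, 4), falling_product(20, 4))

print(falling_product(18, 2), falling_product(20, 2))  # 306 380
print(p_two, p_four)  # 153/190 12/19
```

Dividing one factorial by another leaves exactly the leading factors, so the same helper covers any number of shipped bushels.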

Before we push this too far and get ourselves into a trap, let's consider a different example with a smaller sample space.

Suppose that there are only 3 bushels of corn and that only one is contaminated with E. coli. Again, let's say that two are shipped out. Then,

$$P(U_1 \cap U_2) = \frac{2}{3} \cdot \frac{1}{2} = \frac{2 \cdot 1}{3 \cdot 2} = \frac{2}{6}$$

If you recall the tabular approach to thinking about this, we might show the possibilities for uncontaminated bushels, U1 and U2, and the way in which they can appear:

                       2nd Bushel
1st Bushel          U1           U2
    U1           (U1, U1)     (U1, U2)
    U2           (U2, U1)     (U2, U2)

We know that the pairs (U1, U1) and (U2, U2) for the 1st and 2nd bushels cannot be possible, since that particular bushel is removed from the population. So, we denote that in the table by blacking-out those cells (shown here as XXXX):

                       2nd Bushel
1st Bushel          U1           U2
    U1             XXXX       (U1, U2)
    U2           (U2, U1)       XXXX

Perfect! So we see the remaining two possibilities, right? Well, actually, is there a difference between (U2, U1) and (U1, U2)? Not unless those two bushels are actually different than one another! So, blacking out either one of these pairs leaves:

                       2nd Bushel
1st Bushel          U1           U2
    U1             XXXX         XXXX
    U2           (U2, U1)       XXXX

One possibility!

You might be wondering why we're bothering with this if we've already found the probability. This is a good thing to wonder.

Recall that a probability is the number of ways an event can happen divided by the total number of outcomes. To be consistent with this definition, we really should be putting 1 in the numerator. Does that mean we miscomputed the probability? Not in this particular example, but it can happen.

To make our denominator consistent, let's look at the total number of possibilities for selecting bushels, adding in the contaminated bushel, C:

                            2nd Bushel
1st Bushel          U1           U2           C
    U1           (U1, U1)     (U1, U2)     (U1, C)
    U2           (U2, U1)     (U2, U2)     (U2, C)
    C            (C, U1)      (C, U2)      (C, C)

Again, it is not possible to select the same bushel twice, so we black-out the diagonals:

                            2nd Bushel
1st Bushel          U1           U2           C
    U1             XXXX       (U1, U2)     (U1, C)
    U2           (U2, U1)       XXXX       (U2, C)
    C            (C, U1)      (C, U2)       XXXX

Are we done? Not unless we feel that (U2, U1) is different than (U1, U2). We notice that the three cells to the right of our blacked-out diagonal are duplicates of those to the left. Thus we can cross them out, as well:

                            2nd Bushel
1st Bushel          U1           U2           C
    U1             XXXX         XXXX         XXXX
    U2           (U2, U1)       XXXX         XXXX
    C            (C, U1)      (C, U2)        XXXX

This leaves us with three possibilities. So, our probability should be:

$$P(U_1 \cap U_2) = \frac{1}{3}$$

Wait! This is the same as our earlier calculation of

$$P(U_1 \cap U_2) = \frac{2}{6} = \frac{1}{3}$$

Since we get the same answer, one might think that it must not matter which approach we take. Many times, it doesn't; however, “many” is not satisfying enough, since this leaves us prone to mistakes under different circumstances.

Let's analyze the full situation two different ways. We found that if we don't eliminate order differences, then we can write the probability as:

$$\frac{2 \cdot 1}{3 \cdot 2} = \frac{2}{6}$$

If we did (correctly) eliminate order differences, notice that we cut the number of possibilities in half, that is, divided by 2. You'll notice that 2 is the same thing as $2! = 2 \cdot 1$. So, let's divide out the number of duplicates from top and bottom:

$$\frac{2 \cdot 1}{2!} = 1$$

And

$$\frac{3 \cdot 2}{2!} = 3$$

Which, in its final state gives:

$$\frac{\dfrac{2 \cdot 1}{2!}}{\dfrac{3 \cdot 2}{2!}} = \frac{1}{3}$$

This does look rather complicated, but remember that it follows from some fairly simple things that we have built up on. Also notice that both the top fraction and the bottom fraction have $2!$ in the denominator. Ah, yes! So that's why the order-not-eliminated and order-eliminated answers are the same:

$$\underbrace{\frac{2 \cdot 1}{3 \cdot 2}}_{\text{order not eliminated}} = \underbrace{\frac{(2 \cdot 1)/2!}{(3 \cdot 2)/2!}}_{\text{order eliminated}}$$

While this works out beautifully in this example, it is not always true, and so we must take care to observe whether order difference is important. We will see examples later where this difference will come into play, but those situations are a bit more advanced.
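The blacked-out tables can be reproduced by brute-force enumeration. The sketch below uses `itertools.permutations` for the order-kept table and `itertools.combinations` for the order-eliminated table; the labels `U1`, `U2`, `C` follow the text, while the helper name is an illustrative assumption.

```python
from fractions import Fraction
from itertools import combinations, permutations

bushels = ["U1", "U2", "C"]  # two uncontaminated bushels, one contaminated

def all_uncontaminated(selection):
    """True when every bushel in the selection is uncontaminated."""
    return all(b.startswith("U") for b in selection)

ordered = list(permutations(bushels, 2))    # (U1, U2) and (U2, U1) both count
unordered = list(combinations(bushels, 2))  # each pair counted once

p_ordered = Fraction(sum(all_uncontaminated(s) for s in ordered), len(ordered))
p_unordered = Fraction(sum(all_uncontaminated(s) for s in unordered), len(unordered))

print(len(ordered), len(unordered))  # 6 3
print(p_ordered, p_unordered)        # 1/3 1/3
```

Here the duplicate count (2) divides out of both the numerator and the denominator, which is why the two answers coincide.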

Let's simplify this horrid notation a bit. Suppose that there are a total of $n$ items and $r$ of those are to be drawn.

Permutation – Order Does Matter

If order is not to be eliminated (in cases where order is important), then the number of ways to select $r$ things from the given $n$ is called a permutation and is denoted:

$${}_nP_r = \frac{n!}{(n-r)!}$$

NOTE: $(n-r)! \neq n! - r!$, that is, factorial is not distributable!! Subtract first, then use factorial.

For our numerator, we had selected 2 uncontaminated bushels from a total of 18 uncontaminated bushels. According to our new notation, this can be written as:

$${}_{18}P_2 = \frac{18!}{(18-2)!} = \frac{18!}{16!}$$

And this is precisely what we have written for the numerator!

For our denominator, we had selected 2 (general) bushels from a total of 20 (general) bushels, since we want to know the total number of ways 2 objects can come out of 20.

$${}_{20}P_2 = \frac{20!}{(20-2)!} = \frac{20!}{18!}$$

And this is precisely what we have written for the denominator!

In simplified notation,

$$P(U_1 \cap U_2) = \frac{{}_{18}P_2}{{}_{20}P_2}$$

Calculator Clinic – Using Permutations

To evaluate a permutation, ${}_nP_r$:

1. First enter $n$ in your home screen.

2. Go to MATH and move to the left to the PRB tab.

3. Select 2: nPr. This will return you to your home screen.

4. Enter $r$ and press ENTER.

TIP: Sometimes the value of the numerator or denominator is so large that the calculator throws an overflow error. It is advisable to enter the entire probability, numerator and denominator, in one expression to avoid this potential problem.

Let's now consider the case where it is important to eliminate order.


Combination – Order Does NOT Matter (Eliminated)

If order is to be eliminated (in cases where order is not important), then the number of ways to select $r$ things from the given $n$ is called a combination and is denoted:

$${}_nC_r = \frac{n!}{(n-r)! \cdot r!}$$

NOTE: $(n-r)! \neq n! - r!$, that is, factorial is not distributable!! Subtract first, then use factorial. Additionally, the factorial of a product is not the product of factorials, that is, $(a \cdot b)! \neq a! \cdot b!$.

For our numerator, we had selected 2 uncontaminated bushels from a total of 18 uncontaminated bushels, eliminating the number of repeats, which was 2, or $2!$. According to our new notation, this can be written as:

$${}_{18}C_2 = \frac{18!}{(18-2)! \cdot 2!} = \frac{18!}{16! \cdot 2!}$$

And this is precisely what we have written for the numerator!

For our denominator, we had selected 2 (general) bushels from a total of 20 (general) bushels, since we want to know the total number of ways 2 objects can come out of 20, order aside.

$${}_{20}C_2 = \frac{20!}{(20-2)! \cdot 2!} = \frac{20!}{18! \cdot 2!}$$

And this is precisely what we have written for the denominator!

In simplified notation,

$$P(U_1 \cap U_2) = \frac{{}_{18}C_2}{{}_{20}C_2}$$

Calculator Clinic – Using Combinations

Follow the steps for finding permutations, but in Step 3, use 3: nCr instead.
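For readers working in software rather than on a calculator, Python's standard library offers `math.perm` and `math.comb` (available in Python 3.8 and later) as counterparts to nPr and nCr. The sketch below reruns the corn example both ways; the variable names are illustrative.

```python
import math
from fractions import Fraction

uncontaminated, total, shipped = 18, 20, 2

# Order kept (permutations) versus order eliminated (combinations).
p_ordered = Fraction(math.perm(uncontaminated, shipped), math.perm(total, shipped))
p_unordered = Fraction(math.comb(uncontaminated, shipped), math.comb(total, shipped))

print(math.perm(18, 2), math.perm(20, 2))  # 306 380
print(math.comb(18, 2), math.comb(20, 2))  # 153 190
print(p_ordered, p_unordered)              # 153/190 153/190
```

Because Python integers do not overflow, the numerator and denominator can be computed separately here, unlike on a calculator.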

Example 2: Every week, Cori stops at Chipotle Mexican Grill for lunch with his colleagues. Each time, he drops a business card into the fishbowl for a chance to win lunch for his entire office. After the seventh visit, Cori begins to wonder about his chances of winning. He estimates that there are approximately 40 cards in the bowl. If two were to be drawn, what is the probability Cori wins both draws?

SOLUTION: We first think about what it is that we need to know. Per the question asked, the event is that the first and second cards drawn are both Cori's.

This event occurs when the 2 cards drawn both come out of the 7 he has put in thus far. Since the order in which his two cards are drawn doesn't matter (as the prize is the same), we would like to know the value of

$${}_7C_2 = \frac{7!}{5! \cdot 2!} = 21$$

The sample space is simply the total number of outcomes. Two cards will be drawn from the stack of 40, and since order doesn't matter,

$${}_{40}C_2 = \frac{40!}{38! \cdot 2!} = 780$$

Thus, the probability of this event is

$$P(\text{both cards are Cori's}) = \frac{{}_7C_2}{{}_{40}C_2} = \frac{21}{780} \approx 0.027$$

There is about a 3% chance that both of the cards drawn are Cori's.

Example 3: Probability is often used in police investigations to help determine probable cause. Suppose that in a gang-related report it was stated that three gang members were spotted. In an interrogation room, 20 gang members are suspects, three of whom are certain to have committed the crime. A detective has a suspicion that the three came from a gang of which 5 of its members are present. Just by chance, how likely is it that the three members came from the gang he believes to be behind the crime? Does this give him what you might consider “probable cause” to pursue the group?

SOLUTION: The event is that the three criminals come from a group of five particular gang members. There are

$${}_5C_3 = \frac{5!}{2! \cdot 3!} = 10$$

such groups. The total number of three-criminal groups that can be formed out of the 20 suspects is

$${}_{20}C_3 = \frac{20!}{17! \cdot 3!} = 1140$$

This means,

$$P(\text{all three come from the presumed gang}) = \frac{{}_5C_3}{{}_{20}C_3} = \frac{10}{1140} \approx 0.009$$

There is only a .9% chance that the three gang members all come from the presumed gang. The detective should consider more evidence to narrow down the search results before making assumptions.
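Examples 2 and 3 share one pattern: every item drawn must come from a particular subgroup, and order does not matter. A hedged Python sketch of that pattern follows; the function name `prob_all_from_group` is a hypothetical helper, and `math.comb` requires Python 3.8 or later.

```python
import math
from fractions import Fraction

def prob_all_from_group(group_size, population_size, draws):
    """P(all draws come from the subgroup), drawing without replacement."""
    return Fraction(math.comb(group_size, draws),
                    math.comb(population_size, draws))

p_cori = prob_all_from_group(7, 40, 2)   # Cori's 7 cards among about 40
p_gang = prob_all_from_group(5, 20, 3)   # 5 gang members among 20 suspects

print(p_cori, round(float(p_cori), 3))   # 7/260 0.027
print(p_gang, round(float(p_gang), 3))   # 1/114 0.009
```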

Example 4: A business creates a new system to keep track of client relations, such that information about the client and any particular order placed can be accessed by a nonrepeating, four-character code made up of letters and digits. For instance, KA23 and AK23 are possible codes. Any code containing only letters will be reserved for large clients. How many such codes of non-repeating letters can they make available, and, assuming all such codes will eventually be used up, what percentage of the company's clients will be considered large clients?

SOLUTION: There are 26 letters in the alphabet and, of those, four will comprise a single large-client code. There are

\[ {}_{26}P_{4} = 26 \cdot 25 \cdot 24 \cdot 23 = 358{,}800 \]

different codes without the same letters being repeated, but where order does matter.

In order to know what percentage (or probability) of the total number of possible codes this represents, we need to compute the total number of codes that can be formed, where no letter or number is repeated, but where order does matter. This is precisely what permutations are for. Since there are 26 letters and 10 numbers, a total of 36 different “symbols” can be selected from. The number of permutations is

\[ {}_{36}P_{4} = 36 \cdot 35 \cdot 34 \cdot 33 = 1{,}413{,}720 \]

total different codes (see note 1) without the same letters or numbers being repeated, but where order does matter.

So, the percentage/probability, then, is:

\[ \frac{358{,}800}{1{,}413{,}720} \approx 0.25 \]

We conclude that about 25% of all clients (the large clients) will have completely alphabetical codes.

1 Notice that the increase in the number of possibilities after increasing the size of the sample space is not proportional to the increase amount. Going from 26 to 36 symbols (about 1.4 times as many) produces about 3.9 times as many codes, so the growth is much faster than linear.
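As a quick check of Example 4, the two permutation counts can be computed with Python's standard `math.perm`. This is only a sketch; the variable names are illustrative and not from the text.

```python
from math import perm

letters, symbols, length = 26, 36, 4   # 26 letters; 26 letters + 10 digits

letter_codes = perm(letters, length)   # 26 * 25 * 24 * 23
all_codes = perm(symbols, length)      # 36 * 35 * 34 * 33
share = letter_codes / all_codes

print(letter_codes, all_codes, round(share, 4))  # 358800 1413720 0.2538
```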


Example 5: In Example 4, it was necessary that letters and numbers not be repeated. Recalculate the number of large-client codes and the percentage of them by assuming that numbers and letters actually can be repeated.

SOLUTION: Recall that a permutation or a combination is intended to handle situations in which repeats are not allowed. Recall from the beginning of this section that to find the number of ways in which two bushels of corn could be selected from a crop of 20 (and after one is selected, the sample space reduces in size), we wrote:

\[ 20 \cdot 19 = 380 \]

In this situation, we are allowing repeats. For the number of ways to form a 4-letter code, we have 26 possibilities for each character. That is 26 for the first, the second, the third, and the fourth. Crossing all of these possibilities gives:

\[ 26 \cdot 26 \cdot 26 \cdot 26 = 26^{4} = 456{,}976 \]

which we expect to be larger than in the previous example, since we are allowing repeats.

Similarly, the number of letter/number codes that are possible can be calculated by noting that, in general, each piece of the code has 36 possibilities. So,

\[ 36 \cdot 36 \cdot 36 \cdot 36 = 36^{4} = 1{,}679{,}616 \]

The percentage/probability is

\[ \frac{456{,}976}{1{,}679{,}616} \approx 0.27 \]

The percentage changes: about 27% of all codes will contain only letters.
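To see the effect of allowing repeats side by side, here is a minimal Python sketch. The function `letter_share` is a hypothetical helper written for this comparison; it assumes 26 letters and 10 digits, as in the example.

```python
from math import perm

def letter_share(length: int, repeats_allowed: bool) -> float:
    """Share of letter-only codes among all letter/digit codes of this length."""
    if repeats_allowed:
        letter_codes = 26 ** length        # 26 choices for every position
        all_codes = 36 ** length           # 36 choices for every position
    else:
        letter_codes = perm(26, length)    # choices shrink by one each position
        all_codes = perm(36, length)
    return letter_codes / all_codes

print(26 ** 4, 36 ** 4)                       # 456976 1679616
print(round(letter_share(4, True), 4))        # 0.2721
print(round(letter_share(4, False), 4))       # 0.2538
```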

Moral of the Story with Counting

Solving a counting problem starts with determining some key pieces of information:

1. Are repeats/replacements allowed? If yes, permutations/combinations are likely the incorrect approach.

2. Does order matter? If yes, permutations should be used. If no, combinations should be used.
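The two questions above can be turned into a small decision helper. The sketch below is illustrative only: `count_selections` and its parameter names are made up for this example, and the case of repeats allowed with order ignored is deliberately left out because this section does not cover it.

```python
from math import comb, perm

def count_selections(n: int, r: int, repeats_allowed: bool, order_matters: bool) -> int:
    """Count ways to select r items from n, following the two questions above."""
    if repeats_allowed:
        if order_matters:
            return n ** r              # n choices at every one of r positions
        # Repeats allowed but order ignored is not covered in this section.
        raise NotImplementedError("unordered selection with repeats is outside this sketch")
    if order_matters:
        return perm(n, r)              # permutations: n * (n-1) * ... (r factors)
    return comb(n, r)                  # combinations: permutations / r!

print(count_selections(26, 4, repeats_allowed=False, order_matters=True))   # 358800
print(count_selections(26, 4, repeats_allowed=True, order_matters=True))    # 456976
print(count_selections(40, 2, repeats_allowed=False, order_matters=False))  # 780
```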

You Might Be Wondering:

You might be wondering why we must divide by \(r!\) to remove all repeats. This was probably somewhat obvious when working with two objects. Say there are 5 objects to select from. Once one is selected it is gone, so for the second selection there are only 4. We proceed to cross out everything along and to the right of the diagonal, since those entries are either not possible or are repeats of a pair that already appears below the diagonal.

[Table: rows labeled Object 1 through Object 5, columns labeled Object 1 through Object 4; the entries along and to the right of the diagonal are crossed out.]

We have essentially multiplied the first five possibilities by the next number of possibilities, which is only 4 (this is accounted for by crossing out the diagonal, since this subtracts out five possibilities to give \(5 \cdot 4 = 20\)), and then divided that result by 2, since half of the table is a repeat. That is,

\[ \frac{5 \cdot 4}{2} = 10 \]

What happens when we select a third object? We extend the above table as a multiple of 3, since there are three objects left. Each table represents a pairing with one of the three remaining objects, as shown in the upper-left corner:

[Tables 1 through 3: three copies of the table above (rows Object 1 through Object 5), labeled OBJECT 1, OBJECT 2, and OBJECT 3 in the upper-left corner.]

In the first table, we can cross out the first column (and first row, if it were there), since it is 

not possible to select object 1 for a third time. In the second table, we can cross out the 

second column/row and in the third table we can cross out the third column/row for the 

same reason as table 1. 


[Tables 1 through 3, with the first column/row of table 1, the second column/row of table 2, and the third column/row of table 3 crossed out.]

Also notice that the second column of table 1 and the last three rows of table 2 are the same: (1, 2, 3), (1, 2, 4), and (1, 2, 5). The second column of table 1 can therefore be crossed out. For a similar reason, the third column of table 1 can be crossed out, since it is a repeat of what we have in column 1 of table 3.


[Tables 1 through 3, with the second and third columns of table 1 now also crossed out.]

Nothing else in table 1 can be eliminated, since (1, 4, 5) cannot be found in either of the two 

remaining tables (this is a unique characteristic of the bottom, right-most entry). 

In table 2, we will try to eliminate any entries that can be found in table 3. These 

eliminations will involve any entries that contain Object 3. We can do so with the (2, 1, 3) 

entry and the third column: 


[Tables 1 through 3, with the (2, 1, 3) entry and the third column of table 2 now also crossed out.]

Now, notice that we have 10 white spots left. This happens to be exactly one-third of what we had after we tripled the table. That is,

\[ \frac{1}{3} \cdot \left( 3 \cdot \frac{5 \cdot 4}{2} \right) = 10 \]

which can be simplified to

\[ \frac{5 \cdot 4 \cdot 3}{3 \cdot 2 \cdot 1} = \frac{60}{3!} = 10 \]

Selecting more items allows this process to repeat, ad nauseam, any number of times. Mathematicians discovered that this tabular process could be reduced into the formula we use for combinations. Proving the general case (where we allow \(r\) to be any value between 0 and the number of items we have to choose from) tends to be discussed in more theoretical mathematics courses such as Discrete Mathematical Structures (our MAT227).
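The crossing-out argument can also be checked by brute force. The sketch below uses Python's standard `itertools` to list every ordered and unordered selection of 3 objects from 5; the list name `objects` is just an illustrative label.

```python
from itertools import combinations, permutations
from math import factorial

objects = [1, 2, 3, 4, 5]

ordered = list(permutations(objects, 3))      # order matters: 5 * 4 * 3
unordered = list(combinations(objects, 3))    # each group listed once

print(len(ordered), len(unordered))           # 60 10
# Every unordered group shows up 3! = 6 times among the ordered selections.
print(len(ordered) // factorial(3))           # 10
print((1, 4, 5) in unordered)                 # True
```

Dividing the 60 ordered selections by 3! = 6 gives the same 10 groups that survive in the tables.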


1. If possible, give an imaginary (but realistic) scenario for each of the following. If not 

possible, state why. 

a. 

b. 

c. 

d. 


e. 

2. Your classmate was absent when permutations and combinations were covered. Explain when he should and when he should not use permutations and combinations. (Video Solution)

3. A police officer has been brought before the court on accusations of racial profiling. This occurs when a person of a particular race has been pulled over or detained by the police due to his race. The officer stopped 2 vehicles out of 10 that passed through a freeway tollbooth. Both of the drivers stopped were Asian and there were a total of 3 Asian drivers in the 10. (Video Solution)

a. In how many ways could 2 drivers have been selected from the 10?

b. In how many ways could 2 Asian drivers have been selected from the 3?

c. How likely is it that the 2 selected drivers would both have been Asian if the stops were truly random?

4. In the United States, 20 out of the 50 states spend more than 50% of their state park and recreation areas revenue on keeping the state parks operable (SOURCE: 2012 U.S. Statistical Abstract). Suppose a survey of 10 states is to be conducted next year to see if anything has changed. (Video Solution)

a. In how many ways can 10 states be selected for the survey?

b. In how many ways can 10 states be drawn so that all 10 are operating on more than 50% of their state park and recreation areas revenue?

c. What is the probability that all 10 of the states drawn are operating on more than 50% of their state park and recreation areas revenue?

5. Ten pieces of furniture are to be arranged in a long row in a furniture store. In how many ways can all 10 be arranged? (Video Solution)

6. At Chandler-Gilbert Community College high-school math competitions, students enter into a raffle to win various prizes, including a graphing calculator. There are typically around 200 students. Suppose there are 5 different types of calculators to be given out and that the best is saved for last. (Video Solution)

a. In how many ways can the prizes be distributed among the 200 students?

b. Suppose a school has 5 attendees. In how many ways can all 5 students from this school win a calculator?

c. What is the probability that all 5 students from this school win a calculator?

7. A frequent concern of cautious consumers is the idea of the last four digits of a credit card number being displayed on receipts. Suppose a consumer has a Visa, which has a total of 16 digits, each of which can be between 0 and 9. For the sake of simplicity, suppose any combination is possible. A customer left the following receipt lying around and is now concerned about his identity: (Video Solution)

[Figure: the customer's receipt.]

a. First, how many different credit-card numbers are possible with 16 digits?

b. How many different credit-card numbers can be formed with 6781 as the last four digits?

c. On any one guess by a potential thief, what is the probability that he correctly guesses this person's credit card number?


3.6 Expected Value 

Imagine that you are an insurance salesperson with many years of experience. A new client has requested that your business provide him with auto insurance. He is 20 years old and has never been in an accident before. Considering age alone, you look at industry data and find that, as recently as 2008, there was about a 15% chance that someone his age would get into an accident (SOURCE: U.S. Statistical Abstract, Table 1113). Using your own expertise you find that, of your 20-year-old clients, the typical accident payment for his particular make and model of vehicle is about $3,200. He brings forward a quote from another insurance agency for a $100/month premium with no deductible (nothing to pay when an accident does occur except the running premium).

The question is, do you insure him?

Let's look at the possibilities in a tabular form. Since there's a 15% chance the driver will get into an accident, there is an 85% chance he won't (since it either does happen or it doesn't). If there is no accident, then the insurance company receives $1200 for the entire year. If an accident does occur, the insurer pays out $3200 (hence a negative effect), but still receives the year's premiums. Thus, the net difference is a $2000 loss, which the insurer is responsible for.

Action         Likelihood    Monetary Value to Insurer
Accident       15%           -$2000
No Accident    85%           +$1200

If we now consider 100 years, it is expected that in 15 of those years there would be an accident and in 85 of them there would be no accident, assuming the constant probability. That means the insurer would pay $2000 a total of 15 times and receive $1200 a total of 85 times. Let's consider the net difference:

\[ 85(\$1200) - 15(\$2000) = \$102{,}000 - \$30{,}000 = \$72{,}000 \]


This amount looks very good! In fact, on average, the company received \( \$72{,}000 / 100 = \$720 \) per year. This customer is definitely profitable to the company in the long run. Of course, we know that an accident could occur the first year, in which case a $2000 loss would be incurred right away.

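Notice what we really did here. We took the sum of the amounts and divided by 100:

\[ \frac{85(1200) + 15(-2000)}{100} = 720 \]

By properties of a common denominator we can write:

\[ \frac{85(1200)}{100} + \frac{15(-2000)}{100} = \left(\frac{85}{100}\right)(1200) + \left(\frac{15}{100}\right)(-2000) = (0.85)(1200) + (0.15)(-2000) = 720 \]

In reality, we multiplied each monetary value by its respective probability. This idea is known as expected value, since it is what we expect to happen in the long run.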
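The two views of the insurance calculation (averaging over 100 years, and weighting each dollar amount by its probability) can be compared directly. This is a small illustrative sketch; the variable names are not from the text.

```python
# Outcomes for the insurer in one year: (probability, net dollars).
outcomes = [
    (0.15, 1200 - 3200),   # accident: premiums received minus the payout
    (0.85, 1200),          # no accident: premiums only
]

# Long-run view: 100 years with 15 accident years and 85 accident-free years.
total_over_100_years = 15 * (1200 - 3200) + 85 * 1200
average_per_year = total_over_100_years / 100

# Expected-value view: weight each dollar amount by its probability.
expected_value = sum(p * x for p, x in outcomes)

print(total_over_100_years, average_per_year, round(expected_value, 2))  # 72000 720.0 720.0
```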

Expected Value and Random Variable

Expected value is the expected, or average, quantity that should occur in the long run, provided that each quantity occurs with a certain probability.

Suppose there are \(n\) quantities, \(x_1, x_2, \ldots, x_n\), each of which occurs with a certain probability, \(p_1, p_2, \ldots, p_n\), respectively. Then the expected value, denoted \(E[X]\), is

\[ E[X] = x_1 p_1 + x_2 p_2 + \cdots + x_n p_n \]

A capital \(X\) is used to denote what is called a discrete random variable, a variable that takes on one of \(n\) (a natural number of) values with a certain probability. This value is defined by what it measures in the given situation.

Importantly, \(p_1 + p_2 + \cdots + p_n = 1\); that is, we must account for 100% of all possible outcomes in order for the expected value to be meaningful.
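The definition translates almost word for word into code. The function below is a minimal sketch (the name `expected_value` and its error messages are illustrative choices); it refuses to compute anything unless the probabilities account for 100% of the outcomes.

```python
from math import isclose

def expected_value(values, probabilities):
    """Return x_1*p_1 + ... + x_n*p_n after checking the probabilities sum to 1."""
    if len(values) != len(probabilities):
        raise ValueError("each value needs exactly one probability")
    if not isclose(sum(probabilities), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must account for 100% of outcomes")
    return sum(x * p for x, p in zip(values, probabilities))

# The insurance example: a $2000 loss with probability 0.15, a $1200 gain otherwise.
print(round(expected_value([-2000, 1200], [0.15, 0.85]), 2))  # 720.0
```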


An expected value is actually not something terribly new. To see this more explicitly, suppose a student earns three test scores: 95%, 80%, and 85%. Then the average percentage is:

\[ \frac{95 + 80 + 85}{3} \approx 86.7 \]

Observe that we can use properties of fractions to separate the sum as follows:

\[ \left(\frac{1}{3}\right)(95) + \left(\frac{1}{3}\right)(80) + \left(\frac{1}{3}\right)(85) \approx 86.7 \]

While one-third in this situation is not a probability (since the scores have already been earned), it acts as a “weight”: each test counts for one-third of the overall class grade.

Example 1: A company sells consumer electronics, such as televisions, stereos, and computers. For each product, the company offers the consumer a warranty that protects against any problems that might occur within the first two years, with the exception of accidental damage and theft. For a particular television that runs $1200, it offers a 2-year warranty for $175. The company determines that 3% of these televisions malfunction each year. Is the company offering the warranty at a profitable price? Explain your answer and define the random variable.

SOLUTION: We should determine what will happen, on average. We first see that the warranty is a 2-year warranty and the defect rate is for one year. If 3% malfunction each year, then about 6% of all televisions are expected to malfunction within the first two years. This means that the company will make $175 with a 94% probability and will lose $1200 - $175 = $1025 with a 6% probability, since it will still receive the payment, but will have to either replace the product or offer a credit to the consumer.

Letting \(X\) = the amount the company gains from one warranty sold, the expected amount to be gained, or \(E[X]\), is

\[ E[X] = 175(0.94) + (-1025)(0.06) = 164.50 - 61.50 = 103 \]

This means that, after selling this product for a while, the company should earn an average of $103 from each consumer that purchases the warranty. This is a profitable outcome.

Example 2: The Arizona Lottery has a number of different lottery games that a person 

can play. One in particular is Fantasy 5. The rules of the game are simple: pay $1 per 

ticket and select five numbers between 1 and 41. Five numbers are then selected at 

random. If you correctly selected two or more of these numbers, then you are 

considered a winner. The following table describes the likelihood of winning: 

(SOURCE: www.arizonalottery.com) 

The estimated jackpot for the Wednesday, August 17, 2011 lottery was $54,000. Is the game in your favor? Why or why not?

SOLUTION: 

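We must first consider the fact that these prizes do not take into account that $1 was lost to purchase the ticket; we should subtract $1 from each of the prizes. Additionally, we note that the probabilities do not add to 1:

\[ \frac{1}{749{,}398} + \frac{1}{4163} + \frac{1}{119} + \frac{1}{11} \approx 0.0996 \]

The remainder of the time, it is simply the case that $1 is lost:

\[ 1 - 0.0996 = 0.9004 = \frac{9{,}004}{10{,}000} \]

We rebuild the table to show all of the values and probabilities:

x           53,999       499       4        0       -1
P(X = x)    1/749,398    1/4163    1/119    1/11    9,004/10,000

where \(X\) = the net amount gained, in dollars, from one ticket.

The expected value is:

\[ E[X] = 53{,}999\left(\frac{1}{749{,}398}\right) + 499\left(\frac{1}{4163}\right) + 4\left(\frac{1}{119}\right) + 0\left(\frac{1}{11}\right) + (-1)\left(\frac{9{,}004}{10{,}000}\right) \approx -0.67 \]

This means that if one were to play time after time, taking into consideration the small likelihood of winning occasionally, one would be expected to lose, on average, $0.67 per ticket. The game is not in the player's favor.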
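The lottery arithmetic is easy to get wrong by hand, so here is a sketch that redoes it with exact fractions. The dictionary name `winning_outcomes` is illustrative; the net prizes and odds are the ones listed in the rebuilt table.

```python
from fractions import Fraction

# Net winnings (prize minus the $1 ticket) paired with the stated odds.
winning_outcomes = {
    53_999: Fraction(1, 749_398),
    499: Fraction(1, 4163),
    4: Fraction(1, 119),
    0: Fraction(1, 11),
}

p_win = sum(winning_outcomes.values())
p_lose = 1 - p_win                      # the rest of the time, $1 is lost

expected = sum(x * p for x, p in winning_outcomes.items()) + (-1) * p_lose

print(round(float(p_win), 4), round(float(p_lose), 4))  # 0.0996 0.9004
print(round(float(expected), 2))                        # -0.67
```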


Notice that we represented the outcomes by using a table, in which we listed the outcomes, or the individual values \(x\), along with the probability that each occurs, \(P(X = x)\). This is one way in which to display a probability distribution, or how all probabilities are distributed among the various outcomes.

Example 3: A fair, six-sided die is tossed repeatedly. The number of dots that are facing 

up after each throw is recorded. Define the random variable, find its probability 

distribution, and find and interpret the expected value of the random variable. 

SOLUTION: We define the random variable,

\[ X = \text{the number of dots facing up after a toss} \]

The different values that \(X\) can take on are \(1, 2, 3, 4, 5, 6\), since we know there are six sides. Since this is a fair die, each of these six outcomes has an equally likely chance of appearing, so \(P(X = x) = \frac{1}{6}\) for all values of \(x\). Our probability distribution is thus,

x           1      2      3      4      5      6
P(X = x)    1/6    1/6    1/6    1/6    1/6    1/6

The expected value is the sum of the products of each outcome value and its associated probability.

\[ E[X] = 1\left(\frac{1}{6}\right) + 2\left(\frac{1}{6}\right) + 3\left(\frac{1}{6}\right) + 4\left(\frac{1}{6}\right) + 5\left(\frac{1}{6}\right) + 6\left(\frac{1}{6}\right) = \frac{21}{6} = 3.5 \]

The average value of a die that is repeatedly tossed will be 3.5. If we were to conduct a simulation we would probably see something similar to what we saw in the introductory section of this chapter:

[Figure: line graph titled "Average Die Roll Outcome"; horizontal axis: Number of Times Die Has Been Tossed (0 to 140); vertical axis: average roll (0 to 6).]

As more tosses are made, we see that the average roll becomes more stable and seems to be approaching 3.5, as we have shown mathematically.
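A simulation along these lines can be sketched in a few lines of Python. The function name `running_averages` and the choice of seed are arbitrary assumptions made for this illustration; the exact averages printed depend on the seed, so no particular values are claimed here.

```python
import random

def running_averages(num_tosses: int, seed: int = 0) -> list[float]:
    """Simulate fair die tosses and return the average after each toss."""
    rng = random.Random(seed)        # fixed seed so the run is repeatable
    total = 0
    averages = []
    for toss in range(1, num_tosses + 1):
        total += rng.randint(1, 6)   # each face 1..6 equally likely
        averages.append(total / toss)
    return averages

averages = running_averages(100_000)
for n in (10, 100, 1_000, 100_000):
    print(n, round(averages[n - 1], 3))
```

The early averages can wander quite a bit, while the later ones should settle close to 3.5.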

Example 4: In hopes of understanding the directions in which married couples are naturally inclined to walk at an outdoor mall in Arizona, a marketing group conducts a study. It is the experience of the mall that men and women tend to walk in different directions once they park (and catch up later). The first question is how many individuals within a couple can they expect to start their walk through a street that has one or more clothing stores?


SOLUTION: We first note that there are three paths out of five with one or more clothing stores. We assume there are two people per couple and that each takes a different initial route. The random variable we are interested in is:

\[ X = \text{the number of individuals in the couple who start on a route with a clothing store} \]

The random variable can take on values \(0, 1, 2\), since it is possible that neither of them takes a clothing-store route, only one does, or both do.

We need to find the probability for each of the three events. \(X = 0\) individuals taking a route with a clothing store would occur when, from the three clothing-store routes, none are selected, and both routes without clothing stores are selected. We then must compare this to the number of ways two routes can be chosen from five. That is,

\[ P(X = 0) = \frac{({}_{3}C_{0})({}_{2}C_{2})}{{}_{5}C_{2}} = \frac{(1)(1)}{10} = \frac{1}{10} \]

Similarly, for \(X = 1\), we want to know how many ways one clothing-store route and one non-clothing-store route can be selected. That is,

\[ P(X = 1) = \frac{({}_{3}C_{1})({}_{2}C_{1})}{{}_{5}C_{2}} = \frac{(3)(2)}{10} = \frac{6}{10} \]

For \(X = 2\),

\[ P(X = 2) = \frac{({}_{3}C_{2})({}_{2}C_{0})}{{}_{5}C_{2}} = \frac{(3)(1)}{10} = \frac{3}{10} \]

Our probability distribution is:

x           0       1       2
P(X = x)    1/10    6/10    3/10

We can see that the probabilities sum to 1, which helps confirm that we have accounted for all possibilities.

The number of individuals expected to take a clothing-store route is the expected value of this distribution,

\[ E[X] = 0\left(\frac{1}{10}\right) + 1\left(\frac{6}{10}\right) + 2\left(\frac{3}{10}\right) = 1.2 \]

Thus, it can be expected that, on average, at least one person from the couple will walk along a route that contains a clothing store.
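The three probabilities and the expected value can be generated from the counts in the problem. This sketch assumes, as the solution does, that the two people take two different routes and that every pair of routes is equally likely; the variable names are illustrative.

```python
from math import comb
from fractions import Fraction

clothing_routes, other_routes, people = 3, 2, 2
total_ways = comb(clothing_routes + other_routes, people)   # 5 choose 2 = 10

# P(X = x): choose x clothing-store routes and (people - x) of the others.
distribution = {
    x: Fraction(comb(clothing_routes, x) * comb(other_routes, people - x), total_ways)
    for x in range(people + 1)
}

expected = sum(x * p for x, p in distribution.items())

print({x: str(p) for x, p in distribution.items()})  # {0: '1/10', 1: '3/5', 2: '3/10'}
print(sum(distribution.values()), float(expected))   # 1 1.2
```

Note that `Fraction` reduces 6/10 to 3/5 automatically.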

One additional way to represent a probability distribution is by using a probability histogram. A histogram looks similar to a bar graph, except that it has a numerical horizontal axis and measures the probability along the vertical axis. Additionally, the bars touch in order to show continuity, where applicable. For the above situation, we would expect to see:

[Figure: probability histogram titled "Clothing Store Route Probabilities"; horizontal axis: Number of Individuals (0, 1, 2); vertical axis: Probability (0 to 0.7).]

This is a convenient visual way to view the distribution of probabilities. It is clear to us that it is quite unlikely that neither of the individuals in the couple will walk a route with a clothing store.


1. While working in downtown Phoenix, AZ, the author tracked the number of minutes that the Blue Line bus was late in arriving at a specific bus stop. He discovered the following: (Video Solution)

Minutes late, x    0 (on time)    1       2       3       4
P(X = x)           0.53           0.25    0.18    0.03    0.01

a. Construct a probability histogram.

b. What does the probability histogram reveal?

c. Find and interpret the expected value of the random variable.

(SOURCE: Author's data)

2. A Geico auto insurance policy for a 21-year-old Chandler male driver of a 2012 BMW 

M5 with no previous tickets has a semi-annual premium of $312.41. In the instance of an 

accident, there is a $1,000 deductible that the policyholder must pay before insurance will 

cover the damages (SOURCE: www.geico.com). The vehicle costs about $115,000 to 

replace. From past experience, suppose Geico knows there is a 2.5% chance (annually) 


that this situation will result in an accident. Find the expected payout for Geico and 

comment on its profitability in a situation like this. (Video Solution) 

3. An insurance policy pays $100 per day for up to 3 days of hospitalization and $50 per day for each day of hospitalization thereafter. (Video Solution)

The number of days of hospitalization, \(X\), is a random variable with probability given by the function

\[ P(X = x) = \begin{cases} \dfrac{6 - x}{15} & \text{for } x = 1, 2, 3, 4, 5 \\ 0 & \text{otherwise} \end{cases} \]

a. Define the random variable.

b. Give the probability distribution for \(X\) by using a probability histogram.

c. What does the probability histogram tell you about hospitalization?

d. Determine the expected payment for hospitalization under this policy.

(SOURCE: Society of Actuaries (SOA), Spring 2003 Exam P, #36)

4. You work on a dairy farm and are in charge of quality control for eggs. Your primary concern is that broken eggs do not go out. You know from past experience that about 25% of the outgoing boxes contain one or more broken eggs (based on complaints). If a local restaurant purchases 4 boxes of eggs from you, what is the expected number of boxes with broken eggs that this vendor should receive? (Video Solution)

5. At a major seafood restaurant, shrimp fettuccini is a popular dish. The company is considering adding a family-sized fettuccini dish, but would first like to make sure that it will be a profitable endeavor. The company randomly surveys customers who purchase the original $14.99 dish and finds that 15% would purchase the larger family dish. What should they charge for the family-sized dish so that average revenue from shrimp fettuccini will be $17.00? (Video Solution)



Chapter 4 

Discrete Probability Distributions 

It might seem paradoxical to say that uncertainty occurs in certain ways, but the truth is that it does, provided certain assumptions are satisfied. As we build a probability distribution, whether in the form of a table or histogram, we can oftentimes save ourselves a lot of labor by focusing on the type of experiment that lies before us. The purpose of this chapter is to (hopefully) simplify some of our efforts.

4.1 The Binomial Distribution 

4.1.1 Why Probability Distributions Are Useful

Suppose a friend of yours, let's call him Kyle, tells you that his brother is 6-feet, 9-inches tall. You are most likely wide-eyed and surprised by what he just told you.

Why is this?

You likely have some idea of how tall people generally are. You would probably consider a height of 6-feet, 9-inches to be uncommon in the environment you're used to. In fact, you might even go as far as to call this height an outlier, or a value that falls outside the usual data range.

How can you be absolutely sure that this height is uncommon? What if you live in a region that tends to have shorter people?

The statistician would say that it would be nice to see a probability distribution associated with 

heights of all people living in the region, state, country, or continent on which you live. She 

would argue that, if you are trying to describe the people in the U.S. based on people living in 

Arizona, you are drawing from a biased sample. 

While we will not discuss continuous random variables here (variables that can take on any 

number in a specified range), we will show a theoretical distribution for heights in the U.S. 

below: 


For men, we see that the most frequently occurring height is near 70 inches (5-feet, 10-inches). It is very uncommon to have someone who is 81 inches tall (6-feet, 9-inches). This type of information allows us to conclude that your friend's brother is indeed very tall.

You might be wondering how we know that the shapes of the distributions should look like bells. This is based on the data collection process. It is common in nature for distributions to have a heavily loaded center with lower frequencies out towards the left and right tails. While the histogram of all heights might not have a perfect bell shape as we indicate, assuming this shape allows us to use mathematics to model the curve.

Although many variables do take on a continuous set of values, we will begin with discrete 

random variables, as these are slightly simpler to describe. 

4.1.2 The Binomial Distribution

When we talk about any variable that can take on a finite (as opposed to infinite) number of 

possibilities, we are dealing with a discrete random variable. 

Specifically, a binomial random variable arises when each trial of an experiment takes on one of two possible values, as indicated by the prefix “bi.” We will simply refer to the outcome of a trial as either a “success” or a “failure.”

Consider this example: let's say that you and a friend are tossing a coin (since this is one of the most exciting things to do). Your friend tosses 9 heads out of 10 tosses. Curious about this, you begin to analyze the results: how likely is it that this type of event could take place?

By letting \(H\) and \(T\) represent the events that a head/tail is facing up on a coin toss, respectively, we know that one possible way in which this can happen is:

HHHHHHHHHT

The probability of this particular sequence of 9 heads and 1 tail is:

\[ \left(\frac{1}{2}\right)^{9} \left(\frac{1}{2}\right)^{1} = \left(\frac{1}{2}\right)^{10} \approx 0.000977 \]

This is definitely a small probability, but it is not the only way in which this can happen. The tail can occur first, second, third, fourth, etc., with heads all around it. Another one would be:

HHHHHHHHTH

The probability of this sequence is the same: 9 heads, 1 tail. This is okay, since the outcome of one toss does not affect the probability of getting a head or tail on the next toss. So,

\[ \left(\frac{1}{2}\right)^{8} \left(\frac{1}{2}\right)^{1} \left(\frac{1}{2}\right)^{1} = \left(\frac{1}{2}\right)^{9} \left(\frac{1}{2}\right)^{1} \approx 0.000977 \]

Not surprisingly, there are 8 more places for the tail to have appeared. We'll summarize in the table below:

Arrangement of 9 H, 1 T    Probability
HHHHHHHHHT                 (1/2)^9 (1/2)^1
HHHHHHHHTH                 (1/2)^9 (1/2)^1
HHHHHHHTHH                 (1/2)^9 (1/2)^1
HHHHHHTHHH                 (1/2)^9 (1/2)^1
HHHHHTHHHH                 (1/2)^9 (1/2)^1
HHHHTHHHHH                 (1/2)^9 (1/2)^1
HHHTHHHHHH                 (1/2)^9 (1/2)^1
HHTHHHHHHH                 (1/2)^9 (1/2)^1
HTHHHHHHHH                 (1/2)^9 (1/2)^1
THHHHHHHHH                 (1/2)^9 (1/2)^1

Since these are 10 distinct ways of getting this outcome, each with probability 0.000977 (that is, each takes up 0.0977% of the entire sample space), the probability of getting 9 heads and 1 tail is:

\[ 10(0.000977) \approx 0.00977 \]

As suspected, this particular event is not very likely.
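With only 10 tosses, the whole sample space is small enough to list. The following sketch enumerates all sequences of heads and tails and counts the ones with exactly 9 heads; it assumes a fair coin, so every sequence is equally likely.

```python
from itertools import product

tosses = 10
sequences = list(product("HT", repeat=tosses))          # all 2**10 equally likely sequences
nine_heads = [s for s in sequences if s.count("H") == 9]

print(len(sequences), len(nine_heads))                   # 1024 10
print("".join(nine_heads[0]))                            # HHHHHHHHHT
print(round(len(nine_heads) / len(sequences), 5))        # 0.00977
```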

What if we complicated the problem a little more and asked, what would be the probability of having two tails mixed up in 10 total tosses?

This gets more complicated, since the two tails could occur one after another, two tosses apart, three tosses apart, etc. To simplify our lives, it can be shown that the total number of ways in which a given number of binary “successes” can occur is found with the following combination:

\[ \binom{\text{number of trials}}{\text{number of successes}} \]


So, we had 10 trials and wanted to know the number of ways in which 9 heads (successes) can be included in the mix. We have:

\[ \binom{10}{9} = \frac{10!}{9!\,1!} = 10 \]

Then, we simply need to find the probability of just one of those arrangements and multiply it by the number of different arrangements.

Since we defined a head resulting as a success, then, what we just calculated was:

\[ \binom{10}{9} \left(\frac{1}{2}\right)^{9} \left(\frac{1}{2}\right)^{10 - 9} \approx 0.00977 \]

At first glance, it might seem a little confusing that the second exponent is the number of trials less the number of successes.

Why is this?

Suppose there are 10 trials and you want 6 successes. This necessarily means that the other 4 trials would result in failures. This is precisely \(10 - 6 = 4\), or the number of trials less the number of successes.

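Let's make this formula easier to consider. First off, let's define some variables:

Let
\(n\) = the number of trials,
\(x\) = the number of successes, and
\(p\) = the probability of a success on any one trial.

Now, in any event, success and failure make up the whole sample space. That is:

\[ S = \{\text{success}, \text{failure}\} \]

Since they make up the sample space,

\[ P(\text{success}) + P(\text{failure}) = 1 \]

So,

\[ P(\text{failure}) = 1 - P(\text{success}) = 1 - p \]

We rewrite our formula with the above defined components:

\[ \binom{n}{x} p^{x} (1 - p)^{n - x} \]

This is known as the binomial probability density function, or binomial pdf. (Because the number of successes is a discrete quantity, many texts call this a probability mass function instead.)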

To make this more clear, we first define a random variable, \(X\). In the case of a binomial experiment (one in which there are two possible outcomes for each trial), \(X\) counts the number of successes, so its possible values are the whole numbers between 0 and the number of trials.

For example, if \(n = 10\) in coin tosses, then \(X \in \{0, 1, 2, \ldots, 10\}\). That is, between 0 and 10 heads can possibly be achieved in 10 tosses of the coin (though not all have the same probability). To indicate a binomial pdf calculation, we often write:

The probability that \(X\) takes on \(x\) successes is \(\binom{n}{x} p^{x} (1 - p)^{n - x}\), or,

\[ P(X = x) = \binom{n}{x} p^{x} (1 - p)^{n - x} \]

We summarize a binomial pdf below, along with the necessary assumptions to use this. 

Binomial Probability Density Function (pdf)

If the following assumptions are met:

1) An experiment is carried out with \(n\) trials,

2) Each trial can result in only one of two possible values: a success or a failure,

3) The probability of a success in each trial is \(p\) (it is always the same), and

4) Each trial is independent of all other trials (the outcome of one trial in no way affects the outcome of any other trial),

then the experiment is a binomial experiment and the probability of \(x\) successes can be calculated by

\[ P(X = x) = \binom{n}{x} p^{x} (1 - p)^{n - x} \]
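The boxed formula can be written as a short Python function. The name `binomial_pmf` is an illustrative choice, and the function simply assumes the four conditions above hold; it does not check them.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x): probability of exactly x successes in n independent trials,
    each with success probability p."""
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p**x * (1 - p)**(n - x)

print(round(binomial_pmf(9, 10, 0.5), 5))   # 0.00977
print(round(binomial_pmf(8, 10, 0.5), 4))   # 0.0439
```

The first call reproduces the 9-heads-in-10-tosses result from earlier in this section.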


Example 1: A fair, two-sided coin is tossed 10 times. The goal is to get 8 heads.

a) In how many different ways can this event occur?

b) Verify that all assumptions are met to conduct a binomial experiment.

c) What is the probability of this event?

SOLUTION:

a) Since there are 10 trials and 8 successes desired, there are

\[ \binom{10}{8} = 45 \]

different ways.

b)

1) There are \(n = 10\) trials

2) Each outcome is either a head (success) or a tail (failure)

3) The probability of success on any trial is \(p = \frac{1}{2}\)

4) One toss does not influence the outcome of any other toss

Thus, all assumptions have been met.

c)

\[ P(X = 8) = \binom{10}{8} \left(\frac{1}{2}\right)^{8} \left(\frac{1}{2}\right)^{2} = \frac{45}{1024} \approx 0.044 \]

Thus, there is about a 4.4% chance of tossing 8 heads in 10 tosses.

The fact that the probability of getting 8 heads in 10 tosses is higher than getting 9 heads in 10 

tosses should not surprise us. Getting 9 heads is a rather extreme request. Getting 8 heads, while 

still extreme, is a bit more likely. 

Let's now build the probability distribution histogram for \(X\). We first display the probabilities in a table below by applying the binomial pdf:

Successes Probability 

0 0.001 

1 0.010 

2 0.044 

3 0.117 

4 0.205 

5 0.246 

6 0.205 

7 0.117 

8 0.044 

9 0.010 

10 0.001 

Does this match our expectations? The table indicates that getting 5 heads has the highest likelihood of all 11 possible events. Even more importantly, the probability of getting between 4 and 6 heads in 10 tosses is \(0.205 + 0.246 + 0.205 = 0.656\). The probability of getting very few or very many successes is quite small. This data is displayed in the histogram below:

[Figure: probability histogram titled "Tossing X Heads in 10 Tosses"; horizontal axis: Successes; vertical axis: Probability (0.000 to 0.300).]

This further validates our argument above. 

Additionally, note that all of the event probabilities sum to 1. This is necessary and important in describing the distribution.

Sum of Success Probabilities in a Binomial Experiment

With \(n\) trials in a binomial experiment, the probabilities of 0 up to \(n\) successes must together constitute the sample space and hence sum to 1.

That is,

\[ P(X = 0) + P(X = 1) + P(X = 2) + \cdots + P(X = n) = 1 \]
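As a sketch of how the full distribution and its sum can be produced, the code below rebuilds the coin-toss table for 10 tosses. The helper `binomial_distribution` is a hypothetical name introduced here for illustration.

```python
from math import comb

def binomial_distribution(n: int, p: float) -> list[float]:
    """Return [P(X = 0), P(X = 1), ..., P(X = n)] for a binomial experiment."""
    return [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

coin = binomial_distribution(10, 0.5)
for x, prob in enumerate(coin):
    print(x, f"{prob:.3f}")          # first rows: 0 0.001, 1 0.010, 2 0.044, ...

print(round(sum(coin), 10))          # 1.0
print(round(sum(coin[4:7]), 3))      # 0.656
```

The last line adds the probabilities of 4, 5, and 6 heads, the "middle" of the distribution.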

Example 2: A fair, 6-sided die is rolled 8 times. The goal is to roll a 1 or a 2 four times during the experiment.

a) Is this a binomial experiment?

b) In how many different ways can this event occur?

c) What is the probability of this event?

SOLUTION:

a) A success is classified as rolling a 1 or a 2. A failure is classified as rolling a 3, 4, 5, or 6. Thus, \(p = \frac{2}{6} = \frac{1}{3}\). There are \(n = 8\) trials, the probability of a success is always \(\frac{1}{3}\), and the 8 rolls are independent of one another. Thus, this is indeed a binomial experiment.

b) It is possible to have four successes occur in \(\binom{8}{4} = 70\) different ways.

c) Let \(X\) be the number of successes. Then \(X \in \{0, 1, 2, \ldots, 8\}\).

\[ P(X = 4) = \binom{8}{4} \left(\frac{1}{3}\right)^{4} \left(1 - \frac{1}{3}\right)^{8 - 4} = 70 \left(\frac{1}{3}\right)^{4} \left(\frac{2}{3}\right)^{4} \approx 0.171 \]

There is about a 17% chance of getting a 1 or 2 on four out of 8 die rolls.

A question that follows from Example 2 is, what does the distribution look like? Let's develop the distribution in tabular form first. To do this, we calculate binomial probabilities for each of the 9 possible outcomes (anywhere between 0 and 8 successes possible).

Successes Probability 

0 0.039 

1 0.156 

2 0.273 

3 0.273 

4 0.171 

5 0.068 

6 0.017 

7 0.002 

8 0.000 
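The table above can be reproduced with a few lines of Python. This is only a sketch for illustration (the variable names are our own), using $n = 8$ rolls and a success probability of 1/3, as in Example 2.

```python
from math import comb

n, p = 8, 1 / 3  # 8 rolls; a success is rolling a 1 or a 2

# Map each possible number of successes to its binomial probability.
distribution = {
    k: comb(n, k) * p**k * (1 - p)**(n - k)
    for k in range(n + 1)
}

print("Successes  Probability")
for k, prob in distribution.items():
    print(f"{k:>9}  {prob:.3f}")
```

Rounded to three decimal places, the printed values should agree with the table.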



We see clearly that the number of successes with the highest probability is 2 or 3. The histogram 

follows: 

[Figure: histogram “Rolling a 1 or 2 in 8 Die Rolls”; horizontal axis: Successes; vertical axis: Probability (0.000 to 0.300).]

Notice that this distribution is not symmetric. It is said to be skewed to the right, since the distribution has its probabilities heavily concentrated towards the left and so has a tail to the right (hence the name).

Distribution Types 

There are three types of single-peaked (unimodal) distributions, as illustrated below:

1.1.3 Expected Value 


Expected Value of a Binomial Random Variable 

It can be shown that the expected value of $X$, or the average number of successes we expect to see, given that $X$ is a binomial random variable with $n$ trials and success probability $p$, is:

$E[X] = np$

Example 3: Pristine Air Conditioning uses a digital phonebook to call homeowners in a large 

city regarding a $55.99 A/C maintenance special. In an hour, a telemarketer can make about 

10 calls. If the probability that a randomly called homeowner signs up for the maintenance 

special is 0.40, 

a. what is the probability that the telemarketer gets at least 80% of his hourly customers to sign up?

b. Represent this probability in a histogram. 

c. Find and explain the expected value of the random variable. 

SOLUTION: 

a) We first need to determine whether or not this is a binomial experiment. The probability of success is 0.40 on every one of the 10 trials, and we assume that the population is large enough that removing one potential customer from the callable pool does not noticeably change that probability. We therefore conclude that this is a binomial experiment. Thus, $X$ = the number of called homeowners that accept the offer.

We want to know the probability of getting business from 8, 9, or all 10 of the called individuals. We want:

$P(X \ge 8) = P(X = 8) + P(X = 9) + P(X = 10)$

because each of these accounts for disjoint pieces of the sample space.

With $n = 10$ and $p = 0.40$, we have:

$P(X \ge 8) = \binom{10}{8}(0.40)^8(0.60)^2 + \binom{10}{9}(0.40)^9(0.60)^1 + \binom{10}{10}(0.40)^{10}(0.60)^0 \approx 0.0123$

Thus, there is only about a 1.23% chance that the A/C company gets the business of 80% or more of the homeowners called.

b) The histogram is below. The probability we are looking at is the sum of probabilities after 

7 successes: 


c) The expected value is $E[X] = 10(0.40) = 4$. Thus, we expect that each hour 4 out of 10 homeowners accept the maintenance offer.
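For readers who want to verify the arithmetic in Example 3, the following Python sketch adds up the three tail probabilities and also computes the expected value directly from the definition (the sum of each outcome times its probability). The function name is our own, and Python is simply one convenient choice.

```python
from math import comb

n, p = 10, 0.40  # 10 calls per hour, each with a 0.40 chance of a sign-up

def binomial_pmf(k):
    """Probability that exactly k of the n called homeowners sign up."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# P(X >= 8) is the sum of three disjoint events: 8, 9, or 10 sign-ups.
at_least_8 = sum(binomial_pmf(k) for k in (8, 9, 10))

# The expected value from the definition; it should match n * p.
expected_value = sum(k * binomial_pmf(k) for k in range(n + 1))

print(f"P(X >= 8) = {at_least_8:.4f}")   # 0.0123
print(f"E[X] = {expected_value:.2f}")    # 4.00
```

The second result illustrates why the shortcut $E[X] = np$ is so convenient: the long sum collapses to $10(0.40) = 4$.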

Homework Problems – 4.1

1. Determine whether or not each of the following experiments represents a binomial 

experiment. (Video Solution) 

a. A die is rolled 20 times and the number of 6's is counted.

b. A die is rolled until ten 6's show up.

c. In a stream with 1,500 fish, 700 are Rainbow Trout. A total of 20 fish are caught 

and the number of Rainbow Trout is counted. 

d. About 10% of the U.S. population is suspected to have a form of bacteria. A 

sample of 100 people is drawn from the population and the number of people with 

the strain of bacteria is counted. 

e. A brand of LED light bulb has a 0.5% chance of going out prior to the advertised 

life of 30,000 hours. In the testing phase, 850 bulbs are sampled for quality 

assurance. The number of bulbs that don't die prior to the 30,000 hour life is

counted. 

2. Suppose the outcome of random variable is conducted with trials each with 

independent probability of success, . (Video Solution) 

a. Is this a binomial experiment?

b. What is the probability that 

c. What is the probability that 

d. What is the probability that 

e. What is the probability that 


f. What is the probability that 

g. What is $E[X]$? Does it coincide with the resulting $X$ that has the highest probability?

3. In preparing for a New Year's Eve celebration, police look at past records for arrests due to driving under the influence (DUI). In the U.S., 10.5% of arrests made are for DUI (SOURCE: U.S. Statistical Abstract, Table 324). If it is expected that each police officer makes 10 arrests, what is the probability that all arrests result in DUIs? (Video Solution)

4. Pancreatic cancer is a vicious killer. The 5-year survival rate between 2001 and 2007 was 

only 5.9%, meaning that the majority of people with pancreatic cancer die within 5 years of contracting the cancer. In a group of 25 patients, 5 survive beyond 5 years. How likely is such an event? Assume that the survival of one person is independent of another person.

(SOURCE: U.S. Statistical Abstract, Table 182). (Video Solution) 

5. A new herbal drink blend is being compared to an older blend via a blind taste-test 

comparison. Four judges will taste each of the two drinks and will state their preference. 

It is anticipated that both blends are equally impressive. (Video Solution) 

a. Find the probability distribution for the number of judges that vote in favor of the 

new blend. 

b. Construct a probability histogram. 

c. What is the probability that at least two of the judges prefer the new blend?

d. What is the expected value of this distribution and what is its real-world meaning?

6. Goranson and Hall (1980) explain that the probability of detecting a crack in an airplane wing is the product of $p_1$, the probability of inspecting a plane with a wing crack; $p_2$, the probability of inspecting the detail in which the crack is located; and $p_3$, the probability of detecting the damage. (Problem Source: Mathematical Statistics with Applications, 6th Ed., Wackerly, et al.) (Video Solution)

a. What assumptions justify the multiplication of these probabilities?

b. Suppose and for a certain fleet of planes. If three planes 

are inspected from this fleet, find the probability that a wing crack will be 

detected on at least one of them. 

c. Find the probability distribution for the number of planes in this fleet with 

detected wing cracks. 

d. Construct a probability histogram. 

e. What is the expected value of this distribution and what is its real-world meaning?



Chapter 5 

Continuous Probability Distributions 

Up until this point, we have only considered distributions that have discrete values – non-negative

integers. There are many variables, however, that are continuous in nature. In fact, almost every 

variable you studied in algebra and calculus was continuous! 

Take, for example, heights of NBA basketball players, hourly wage, response time of a database 

server, temperature, depth of a lake, the value of a share of Intel stock, and the lifespan of a car 

engine, to name just a very few. These are all variables that can take on infinitely many values, 

even within a limited range. For example, the response time of a database could fall anywhere between 0 seconds and 1 second. It could be 0.01 seconds, 0.00001 seconds, or 0.98727495 seconds.

5.1 The Ideas Behind the Continuous Distribution 

5.1.1 Conceptual Approach to Continuous Distributions 

Think back to a discrete distribution. The probability of a particular value was found by 

observing the height of the relative frequency bar. While relative frequency represents the 

percentage of observations found to have the value specified, it can also be thought of as a 

probability, if we feel that it accurately models predictions that we might use it for. Consider the example below showing the number of children in a classroom of 30 that are likely to have the flu.

[Figure: histogram “Number of Children with Flu in a Class”; horizontal axis: Number of Children w/Flu (0 to 5); vertical axis: Probability; bar labels as printed: 0.4, 0.2, 0.14, 0.16, 0.1, 0.1.]

For instance, we see that the probability that any 2 children in a classroom have the flu is 0.2. 

Let's call this random variable

$X$ = # of children in a classroom of 30 that have the flu.

Then, we will write the probability that any 2 children have the flu as:

$P(X = 2)$

This reads, “the probability that the number of children that have the flu is 2.”

The output of this statement is:

$P(X = 2) = 0.2$

What would it mean to ask: What is $P(X \le 2)$?

This is asking us to find the probability that 2 or fewer children have the flu. In other words, what is the probability that 0, 1, or 2 children have the flu? To answer this, we simply add the bar heights corresponding to $X = 0$, $X = 1$, and $X = 2$.

$P(X \le 2) = P(X = 0) + P(X = 1) + P(X = 2) = 0.74$

Thus, there is a 74% chance that 2 or fewer children in a class of 30 children have the flu. 

With continuous distributions, we cannot simply read the “height of the bar!” For instance 

consider the following continuous probability distribution that shows the likelihood of various 

wait times in line at a fast-food restaurant: 

[Figure: “Time Spent Waiting in Line”; horizontal axis: Minutes (0 to 5); vertical axis: 0 to 0.25.]


In this case, $X$ = minutes spent waiting in line is a continuous random variable. The reason is that a person doesn't wait a whole number of minutes! It is perfectly okay for a person to wait 1.42 minutes, for example.

In this example, suppose we wish to find $P(X = 2.5)$, that is, the probability that the wait time is 2-and-a-half minutes. At first glance, we might simply decide to locate 2.5 minutes and assess the probability output. We would find:

$P(X = 2.5) = 0.2$

If this were the case, wouldn't it be the case that all wait times have a probability of 0.2? Based on the graph, of course. This, however, would be a logical pitfall: if there are infinitely many different wait times between 0 and 5 minutes, then the sum of all probabilities would be a sum of infinitely many 0.2's, which is far greater than 1. In other words, it is only possible for the wait times to have individual probabilities of 0.2 if the times were discrete. When we deal with continuous random variables, we should actually consider the vertical axis to be density instead of probability. In and of itself, density is not a meaningful value; however, in conjunction with what we will mention next, it will prove to be useful.

Without going into too much detail, the densities over an interval are designed in such a way that the area under the function is 1, or 100%. Let's reconsider the above graph:


[Figure: “Time Spent Waiting in Line”; horizontal axis: minutes (0 to 5); vertical axis: Density (0 to 0.25).]


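We notice that the density is constant at 0.2 for every wait time between 0 and 5 minutes. The region underneath the blue line is rectangular.

To find the area of a rectangle, we simply take

$\text{length} \times \text{height} = 5 \times 0.2 = 1$

And so we are able to confirm that the interval $0 \le X \le 5$ represents all possible wait times this particular store has experienced.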

As you might guess, if we wish to find the probability of a range of values, we would simply find the area under the density function between those two values of time.

One question does remain, however: what is the probability that the wait time is exactly 2.5 minutes?

The answer might not come as too much of a surprise: the probability is 0!

The probability of a single value in a continuous distribution is 0, since there are infinitely many possible values. Thus, 2.5 represents 1 of infinitely many values. Take 1 out of infinitely many and you get 0!

We can only find the probability of a non-zero range of values for a continuous random variable! 

Continuous Random Variables 

A continuous random variable is a random variable that has infinitely many possible values 

within a range of real numbers. 

As a result, the probability that a continuous random variable takes on any one specific value is 

0. 

Probability Density Function (PDF) 

The PDF of a continuous random variable is a continuous function such that the total area 

between the function and the horizontal axis is 1. The function's input values are the values of

the random variable, while the output values are densities. Densities are individually meaningless 

values designed so that the total area equals 1. 

Reconsider the above wait-times example: 

[Figure: “Time Spent Waiting in Line”; horizontal axis: minutes (0 to 5); vertical axis: Density (0 to 0.25).]



Suppose we wish to find $P(2.5 \le X \le 3.5)$, that is, the probability that the waiting time is between 2.5 and 3.5 minutes. To find this, we simply find the area under the PDF between 2.5 and 3.5 minutes:

The area of the rectangular region is:

$(3.5 - 2.5)(0.2) = 0.2$

Thus,

$P(2.5 \le X \le 3.5) = 0.2$

We can expect to wait between 2.5 and 3.5 minutes with a 20% chance. Thus, approximately one 

in five visits, our wait-time will be somewhere within this interval. 

Similarly, suppose we wish to know:

$P(0.3 \le X \le 4.4)$

This is the probability that the wait-time is between 0.3 and 4.4 minutes. We identify this region below:

The area of this region is:

$(4.4 - 0.3)(0.2) = 0.82$

Thus, there is an 82% chance that the wait-time is between 0.3 and 4.4 minutes. 
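The "probability is area" idea for the wait-time example can be sketched in Python as follows. The function name and its default limits are our own; the sketch assumes, as the two worked areas imply, that the density is constant at $1/(5 - 0) = 0.2$ between 0 and 5 minutes and 0 elsewhere.

```python
def uniform_probability(low, high, a=0.0, b=5.0):
    """Area under a constant density 1/(b - a) between low and high."""
    density = 1 / (b - a)
    # Clip the requested interval to [a, b]; the density is 0 outside it.
    low, high = max(low, a), min(high, b)
    return max(high - low, 0) * density

print(round(uniform_probability(2.5, 3.5), 2))  # 0.2
print(round(uniform_probability(0.3, 4.4), 2))  # 0.82
print(uniform_probability(2.5, 2.5))            # a single point has zero width
```

The last call shows why a single value has probability 0: the "rectangle" above exactly 2.5 minutes has no width, so it has no area.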

5.1.2 Uniform Distribution 

Continuous Uniform Distribution 

When the PDF of a random variable is a constant, we call this a uniform distribution. That is, 

values of the random variable are uniformly distributed. 

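The PDF of a random variable, $X$, whose values are in the interval $[a, b]$ is:

$f(x) = \begin{cases} \dfrac{1}{b - a} & \text{if } a \le x \le b \\ 0 & \text{otherwise} \end{cases}$

The expected value of this random variable is:

$E[X] = \dfrac{a + b}{2}$

The variance of this random variable is:

$\mathrm{Var}[X] = \dfrac{(b - a)^2}{12}$

Resulting in a standard deviation of:

$\sigma = \sqrt{\dfrac{(b - a)^2}{12}}$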

Example 1: The amount of revenue that a farmers market generates on a given Saturday is uniformly distributed between $5,000 and $22,000.

a. Find the PDF for this random variable.

b. Find the probability that between $6,000 and $8,000 is generated.

c. Find the expected value of this random variable and explain its real-world meaning.

d. Find the standard deviation of this random variable and explain its real-world meaning.

SOLUTION:

a. The lower limit is $a = 5000$ and the upper limit is $b = 22000$. Thus,

$f(x) = \dfrac{1}{22000 - 5000} = \dfrac{1}{17000} \approx 0.0000588$

This constant function is only valid for values between 5000 and 22000. It is valued as 0 everywhere else.

[Figure: “Revenue PDF”; horizontal axis: Revenue ($), from 5000 to 22000; vertical axis: density (0 to 0.00007).]

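b. We want $P(6000 \le X \le 8000)$. The probability will be the length times the height.

We get:

$P(6000 \le X \le 8000) = (8000 - 6000)\left(\dfrac{1}{17000}\right) \approx 0.118$

There is about a 12% chance that revenue earned will fall between $6,000 and $8,000.

c. The expected value will be:

$E[X] = \dfrac{5000 + 22000}{2} = 13500$

This is a simple average. Thus, on average, the farmers market will make $13,500 on a given Saturday.

d. The standard deviation will be:

$\sigma = \sqrt{\dfrac{(22000 - 5000)^2}{12}} \approx 4907$

On average, revenue will vary by about $4,907 less or more than the mean.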
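The four parts of this farmers market example can be checked with a short Python sketch. The variable names are our own; the formulas are the uniform-distribution formulas stated earlier in this section, with limits of $5,000 and $22,000.

```python
from math import sqrt

a, b = 5_000, 22_000        # lower and upper revenue limits ($)

density = 1 / (b - a)                        # a. constant height of the PDF on [a, b]
prob_6k_to_8k = (8_000 - 6_000) * density    # b. area = length x height
expected_value = (a + b) / 2                 # c. midpoint of the interval
std_dev = sqrt((b - a) ** 2 / 12)            # d. square root of the variance

print(f"density        = {density:.7f}")
print(f"P(6000..8000)  = {prob_6k_to_8k:.3f}")
print(f"expected value = {expected_value:.0f}")
print(f"std deviation  = {std_dev:.0f}")
```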

5.1.3 Other Distributions 


Without going into detail here, continuous random variables have PDFs with area between the function and the horizontal axis equal to 1. Clearly, densities cannot be negative, as it is not possible to have negative probabilities.

As an example, a distribution might look like this: 

[Figure: triangular density curve; horizontal axis: Random Variable Values (0 to 2); vertical axis: Density (0 to 1.2).]

Practically speaking, it appears to be most probable that the random variable will take on a value 

around 1. It is less likely that the random variable will take on values close to 0 or close to 2. 

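This might be handy in situations where such behavior is desired.

Notice that the area is also 1. If you divide the triangle into two right triangles, each with base 1 and height 1, and use the area of a triangle formula $\frac{1}{2}(\text{base})(\text{height})$:

$\frac{1}{2}(1)(1) = \frac{1}{2}$

Then the sum of the two triangular areas is:

$\frac{1}{2} + \frac{1}{2} = 1$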

In the next section, we will focus our attention on the most commonly used continuous random variable: the normally distributed random variable.


The first two questions below involve discrete random variables. The aim of these questions is to 

get you thinking in terms of the probabilities of ranges of values. 

1. A pizza shop sells pizzas in four different sizes. The 1000 most recent orders for a single 

pizza gave the following proportions for the various sizes: 

With $X$ denoting the size of a pizza in a single-pizza order, the given table is an approximation to the population distribution of $X$.

a. Construct a probability (relative frequency) histogram to represent the 

approximate distribution of this variable. 

b. Approximate ( ). 

c. Approximate ( ). 

d. Find the expected value of $X$. What does this value mean?

e. What is the approximate probability that $X$ is within 2 in. of this expected (mean) value?

2. Airlines sometimes overbook flights. Suppose that for a plane with 100 seats, an airline 

takes 110 reservations. Define the variable $X$ as the number of people who actually show up for a sold-out flight. From past experience, the population distribution of $X$ is given in the following table:

a. What is the probability that the airline can accommodate everyone who shows up for the flight?

b. What is the probability that not all passengers can be accommodated?

3. A particular professor never dismisses class early. Let $X$ denote the amount of time past the hour (in minutes) that elapses before the professor dismisses class. Suppose that the density curve shown in the following figure is an appropriate model for the probability distribution of $X$:

[Figure: density curve; horizontal axis: minutes (2 to 10); vertical axis: density (0.05 to 0.20).]

a. Find the probability density function (PDF) for this random variable. 

b. What is the probability that at most 5 minutes elapse before dismissal?

c. Find ( ). Explain what your answer means. 

d. Find the expected value of this distribution and explain its real-world meaning. 

e. Find the standard deviation of this distribution and explain its real-world meaning. 

f. What is the probability that the instructor lets out class within one standard deviation of the average overtime?

4. A delivery service charges a special rate for any package that weighs less than 1 lb. Let $X$ denote the weight of a randomly selected parcel that qualifies for this special rate. The probability distribution of $X$ is specified by the following density curve:

[Figure: density curve for package weight; horizontal axis: 0.0 to 1.2; vertical axis: Density (0.5 to 1.5).]


Use the fact that the figure can be broken up into the area of a rectangle and the area of a triangle, where area of a triangle = $\frac{1}{2}$(base)(height) and the area of a rectangle = (base)(height).

a. What is the probability that a randomly selected package of this type weighs at most 0.5 lb.?

b. What is the probability that a randomly selected package of this type weighs between 0.25 lb. and 0.5 lb.?

c. What is the probability that a randomly selected package of this type weighs at least 0.75 lb.?

d. The probability is defined on the interval . Verify that the area under 

the curve in this region is 1. 

5. A plumbing service responds to off-site emergency calls in a time that is uniformly distributed between 15 and 45 minutes.

a. Find the PDF for this random variable, $X$.

b. Find ( ) 

c. Find ( ) 

d. Why are both of the above probabilities the same?

e. Find ( ). 

f. Find and interpret the real-world meaning of the expected value. 

g. Find and interpret the real-world meaning of the standard deviation. 

h. What is the probability that the service responds within 1.5 standard deviations of the expected time?



5.2 The Normal Distribution 

5.2.1 The Normal Distribution As a Natural Phenomenon

The normal distribution (pictured above), much like the uniform distribution, is a continuous distribution. In fact, this distribution is defined for all real numbers. The curve runs from $-\infty$ to $\infty$. However, as you might observe, the most likely values occur close to where the density

function peaks. Values that occur in either one of the “tails” are highly unlikely and, as it 

appears, the density function is very close to the horizontal axis as it extends farther to the left 

and to the right. 

Why do we use this distribution? Much like the infamous appears in many natural places,

many random variables tend to be normally distributed. That is to say, the bulk of values tend to 

occur near the mean and median (both of which are located directly in the center of the 

distribution, since it is perfectly symmetric). For instance, heights of individuals in the United 

States (roughly) follow a normal distribution – there are many people whose heights are near 

average. There are fewer extremely short and extremely tall people in the United States. Thus, 

we would say that the bulk of people are “normal” with respect to their heights. 

While certainly not all random variables are normally distributed, many are. Weights, IQ, and new-vehicle gas mileages (to name just a very few) are variables that have been known to follow a normal distribution. As we will later see, any distribution can “become” a normal distribution.

This is a beautiful phenomenon that allows us to make some important conclusions (more on this 

idea in a later section). 

As before, the overall area under the normal curve is 1 (50% on either side of the mean/median, as in the image). To find the area, we would need to use some rather unusual shapes in order to apply the same methodology as before. The idea of an integral in calculus would actually allow us to find the area exactly. The normal curve is modeled by the following PDF:

$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$


As you can see, this is a difficult function to work with. Historically, tables have been developed with calculated areas, as the calculus was once quite difficult to do. In order to use such a table, it was often necessary to first convert the desired range of values to $z$-scores. Since every normal distribution has a different mean and standard deviation, it would be impossible to create a table for every possible combination. Instead, since each normal distribution is of the same shape, it made sense to create just one table that represented a mean of $\mu = 0$ and a standard deviation of $\sigma = 1$. That is, we can think about every distribution in terms of the number of standard deviations each score is from the mean. The mean is 0 standard deviations away from the mean (it is the mean!) and each unit represents 1 standard deviation. We can think about any distribution this way!

Normal Distribution Expected Value and Variance 

A normal probability distribution can be modeled by the function

$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$

where the expected value is $\mu$, defined as a standard mean,

$\mu = \dfrac{\sum x_i}{N}$

and the variance is $\sigma^2$, defined as a standard variance,

$\sigma^2 = \dfrac{\sum (x_i - \mu)^2}{N}$

IMPORTANT NOTE: $\mu$ and $\sigma^2$ represent the population mean and variance. $N$ represents the population size. Recall that the sample variance has a divisor of $n - 1$, so that it is an unbiased estimator of the population variance.

Below is an example of what a typical table would look like. We call this a standard normal table, since it requires that values between which we would like to know areas are “standardized.” This means they are converted to $z$-scores prior to using the table:

As we notice, this table only shows positive $z$-scores. A similar table exists for negative $z$-scores, that is, for values that are less than the mean. The image tells us that each of the entries in the center of the table corresponds to the area to the left of the $z$-score we would look up.

1. In an Arizona town, suppose the heights of adult males is such that inches and 

(so the standard deviation is the square root of this value, ). What is the 

probability that a male is shorter than 72 inches (6 feet tall)?

SOLUTION: We wish to find $P(X < 72)$, where $X$ is the normally distributed height of a randomly selected adult male. The normal distribution would look like the following:

We wish to know the area of the shaded region below:

We first convert the value of 72 to a $z$-score:

$z = \dfrac{72 - \mu}{\sigma} \approx 1.14$

We round to two decimal places, since the standard normal table can handle up to two decimal places. Any additional decimal places would not make a substantial difference.

We locate $z = 1.14$ by first locating 1.1 along the rows and 0.04 along the columns (since 1.1 + 0.04 = 1.14).

The value we find is 0.8729. This means that $P(X < 72) = 0.8729$. There is an 87.29% chance that a randomly selected individual will be less than 72 inches in height.

What if we wanted to know an area to the right, such as $P(X > 72)$? The table does not provide these values. However, if we know that $P(X < 72) = 0.8729$, then the probability of a height greater than 72 must be the remaining area, $1 - 0.8729 = 0.1271$.
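The table lookup for $z = 1.14$ can also be reproduced in software. The sketch below uses `NormalDist` from Python's standard `statistics` module; this is our own choice of tool rather than one the text uses, and it works with the standard normal distribution (mean 0, standard deviation 1) directly.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

z = 1.14                               # z-score for a height of 72 inches
area_left = standard_normal.cdf(z)     # area to the left, as in the table
area_right = 1 - area_left             # the remaining area to the right

print(round(area_left, 4))    # 0.8729
print(round(area_right, 4))   # 0.1271
```

The `cdf` method returns the area to the left of a value, which is the same quantity the standard normal table reports.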

Similarly, if we wish to find the area between two points, we must get creative. 

Suppose we wish to know the probability that a height falls between two values. We first need to convert both endpoints to $z$-scores, which here are $z = 0.57$ and $z = 1.00$.

We can easily find that the probability of a $z$-score less than 0.57 is: 0.7157

The probability of a $z$-score less than 1.00 is: 0.8413

The area between them is the difference in their areas:

$0.8413 - 0.7157 = 0.1256$


As technology progresses, there is much less need for by-hand computations of the sort above. Instead, let us use the web applet from which the above PDFs came:

http://www.rossmanchance.com/applets/NormalCalcs/NormalCalculations.html 

As you can see, we enter the mean and standard deviation in the first section. If we would like to 

plot two functions over one another, we could check the box and enter a second mean and 

standard deviation. 


In the second section, we can check up to two boxes, in the event that we would like to find an area between two points. We can either enter values as $z$-scores or as raw data values ($x$). To find the probability of a value greater than a given value, we click the grey box to select:

The probability of such an event is displayed in the “prob” box. If we have two values entered and both boxes checked, then the “probability between” these two values is displayed. Isn't this much more intuitive and convenient than using tables?

NOTE: One limitation of the above applet is that values rounded to two decimal places require a bit of finagling.

Homework Problems – 5.2 

Use the applet mentioned in this section to complete these exercises. You are not required to use 

the standard normal table. 

1. In the United States, IQ‟s are normally distributed with and . 

a. What is the probability that a person has an IQ lower than 130?

b. What is the probability that a person has an IQ between 80 and 110?

c. What is the probability that a person has an IQ between 50 and 70?

d. What is the probability that a person has an IQ above 120?

2. In the UK, birth weights are approximately normally distributed with lbs. and 

lbs. (SOURCE: http://www.healthknowledge.org.uk). 

a. Find and explain the real-world meaning of ( ). 

b. Find and explain the real-world meaning of ( ). 

c. Find and explain the real-world meaning of ( ). 

d. Find and explain the real-world meaning of ( ). 

e. What weight is such that 20% of infants weigh less than this amount? (HINT: You can still use the calculator applet.)

3. In recent years, Scholastic Aptitude Test (SAT) scores for all college-bound seniors in

the United States was such that points and points (SOURCE: 

http://www.collegeboard.com) . 

a. 50% of students scored less than how many points?

b. 50% of students scored more than how many points?

c. In order to be in the top 10% of SAT-takers, what score would one have to achieve?

d. What scores do the lowest 10% score between?

e. The middle 50% of students scored between what two values?

4. Sketch a normal distribution and . Label the mean, standard deviations, 

standard deviations, and standard deviations. 


a. Determine the probability that an observation falls within each of these standard 

deviation ranges. 

b. The Empirical Rule describes the probability of scores within 1, 2, and 3 standard deviations of the mean. Do a web search on this topic and compare it to your answer in the above part. Are the results the same?

5. Suppose a distribution is such that and . 

a. What would happen to the distribution if $\mu$ was changed to 60?

b. What would happen to the distribution if $\sigma$ was changed to 10? There are two effects to describe. Discuss why it makes practical sense that these two things should happen to the curve.

c. What would happen to the distribution if $\sigma$ was changed to 2? There are two effects to describe. Discuss why it makes practical sense that these two things should happen to the curve.

d. Describe the effects, in general, of $\mu$ and $\sigma$ on the shape and location of a normal distribution.


Chapter 6 

Sampling Distributions and Estimation 

When it is only our dataset that is of interest, we use descriptive statistics. This is precisely the trouble we have been up to so far! Oftentimes, however, we cannot collect all elements in the population. Take, for example, a poll to gauge Americans' opinion of a candidate in office. Certainly, you cannot sample all voting-age adults. This is easily resolved with a manageable random sample, but is further complicated by the following idea: sampling variability!

We will work to answer the following question:

How do we estimate true population parameters using a random sample, all the while taking into account the fact that our sample statistic is variable from sample to sample?

This is the purpose of inferential statistics and is a very important aspect of understanding the 

structure of an underlying population. With many advances in statistics, it is possible to make 

precise claims about our population. 

6.1 Sampling Distribution for $\bar{x}$

6.1.1 What is a Sampling Distribution?

The cold, hard truth is that, when working with statistical inference, we likely have no idea what the underlying probability distribution for the population looks like. If we did, then we wouldn't have to draw a random sample and would be nearly done with this course. Since we don't, we can't in good conscience assume that the distribution is normal. So, why spend time studying such a distribution? We will soon experience why.

Let's start with an example that is concrete.

Suppose we roll a die. Without too much effort, we can produce the probability distribution for 

the population of all possible outcomes. Here it is: 


[Figure: bar chart “Probability Distribution for Single Die Roll”; horizontal axis: Die Value (1 to 6); vertical axis: Probability (0.00 to 0.18).]

In words, the probability of getting any one face value on a die roll is 1/6, or about 0.17. The distribution is uniform.

If we found the expected value (the average), we would get:

$E[X] = 1\left(\tfrac{1}{6}\right) + 2\left(\tfrac{1}{6}\right) + 3\left(\tfrac{1}{6}\right) + 4\left(\tfrac{1}{6}\right) + 5\left(\tfrac{1}{6}\right) + 6\left(\tfrac{1}{6}\right) = 3.5$

(NOTE: This is the same as $\frac{1 + 2 + 3 + 4 + 5 + 6}{6}$, since each event is equally likely.)

The variance of this population requires us to use the population variance formula (remember, division by $n - 1$ occurs if we are dealing with a sample, so that we have an unbiased estimate for the population variance). That is:

$\mathrm{Var}[X] = \dfrac{\sum (x_i - \mu)^2}{N}$

Using Excel, with the values 1 through 6 entered in cells A2:A7, the formula =VAR.P(A2:A7) gives 2.916666667.


Thus, the standard deviation would be $\sqrt{2.9167} \approx 1.708$, meaning that, on average, we would expect the die value to deviate by 1.708, or nearly 2 units, from the average (1.5 to 5.5, which is pretty much 1 to 6).

Thus, we have that:

$\mu = 3.5$ and $\sigma \approx 1.708$
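The same population values can be obtained outside of Excel. Here is a minimal Python sketch using the standard `statistics` module (our own choice of tool); `pvariance` and `pstdev` divide by the population size, which makes them the counterparts of Excel's VAR.P and of its square root.

```python
from statistics import mean, pvariance, pstdev

faces = [1, 2, 3, 4, 5, 6]    # the whole population of single-roll outcomes

mu = mean(faces)              # population mean
var = pvariance(faces)        # population variance (divides by N, like VAR.P)
sigma = pstdev(faces)         # population standard deviation

print(mu, round(var, 4), round(sigma, 3))   # 3.5 2.9167 1.708
```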

In reality, keep in mind that we would often not know much about our population. We get the 

luxury of studying something we can fully explain. This is all in an effort to better understand 

sampling distributions. 

Suppose we conducted an experiment of rolling the die 10 times. For one random sequence, we 

might obtain the following result: 

4, 6, 3, 4, 4, 1, 3, 4, 1, 2

Not surprisingly, we get a fairly even spread of values 1 – 6. If we compute the average, we obtain 3.2. That is, if all ten rolls had come up as the same number while producing the same total, each roll would be 3.2.

Suppose we asked 19 other people to roll a die 10 times and to then report back to us the mean. Here is what we might find (based on a computer simulation of rolls):

20 Means

3.1, 3.3, 2.4, 3.5, 2.7, 2.9, 2.9, 3.6, 3, 4.7, 3.6, 3.2, 3.9, 2.8, 3.2, 3.3, 3.9, 3.3, 3.5, 3.1

First off, we notice there is sampling variability. Not every person obtained the same average outcome from 10 tosses each. This is expected, since the process is a random one.

The distribution of these means is called a sampling distribution.

Sampling Distribution

The distribution of sample statistics (such as $\bar{x}$) computed from repeated sampling is called a sampling distribution.

6.1.2 The Central Limit Theorem

We do notice that the means tend to gravitate towards 3.5. Some, as expected, deviate from this 

value. 

Let us now consider a histogram for this sampling distribution of sample means: 


[Figure: histogram titled "Sampling Distribution of x-bar" for the 20 sample means; bins of width 0.25 from 2.4 to 5.15, frequency axis from 0 to 6.]

This is quite interesting… we have obtained a distribution (of means) that appears somewhat 

bell-shaped. 

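Suppose now that we had a total of 1,000 people each roll a die 10 times and then compute the sample mean. Here is what a simulation of this process would look like:

[Figure: histogram of the 1,000 simulated sample means (10 rolls each); bins of width 0.1 from 1.7 to 5.2, frequency axis from 0 to 100.]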


Wow! Our distribution of means for 1000 individuals for experiments of 10 rolls each produces 

something remarkably like a normal distribution. Additionally, it appears that the mean of this 

distribution is around 3.5! 

Let's try this again, but now, let's say that 1,000 individuals each roll a die 20 times, and each individual computes a sample mean. This simulated event would produce the following distribution of die-roll averages:

[Figure: histogram of the 1,000 simulated sample means (20 rolls each); bins of width 0.1 from 2.2 to 4.7, frequency axis from 0 to 120.]

The distribution looks a bit more normal. Upon closer inspection, we also see that the variability 

of these averages is smaller. That is: 

Approximate Range for Means of 10 Tosses: 2.1 to 5.2 

Approximate Range for Means of 20 Tosses: 2.5 to 4.6 

We notice that increasing the sample size ($n$) has decreased the sampling distribution's variability.

In fact, the standard deviation for the distribution of means computed from 10 and 20 tosses is 

about 0.52 and 0.38, respectively. 

Let's do one more experiment. Let's say that 1,000 individuals each roll a die 30 times, and each individual computes the mean of his/her rolls. The sampling distribution of means would look like this (based on simulation):

[Figure: histogram of the 1,000 simulated sample means (30 rolls each); bins of width 0.1 from 2.4 to 4.9, frequency axis from 0 to 140.]

Again, we notice the bell-curved shape and the decreased range of means (about 2.6 to 4.4)! 

Let's summarize:

Distribution           Distribution Type   Distribution Mean   Distribution Standard Deviation
Original Die Values    UNIFORM             3.5                 1.7
Of 10-Roll Means       NORMAL              3.5                 0.52
Of 20-Roll Means       NORMAL              3.5                 0.38
Of 30-Roll Means       NORMAL              3.5                 0.32

We can very easily see that the expected value of the sampling distribution is the same as $\mu$, the expected value of the population distribution. That is:

$E[\bar{x}] = \mu$

But what is the relationship between the standard deviations of the means and the standard deviation of the population of die roll values?

This is not so clear. Statisticians, after much research, found that the standard deviation of each sampling distribution is related to the sample size in the following way:

$SD[\bar{x}] = \frac{\sigma}{\sqrt{n}}$

For example, for our samples of size 10,

$\frac{1.708}{\sqrt{10}} \approx 0.540$

That is very close to the 0.52 we obtained!

Similarly, for our sample of size 20,

$\frac{1.708}{\sqrt{20}} \approx 0.382$

This one happens to be fairly spot-on!

And finally, for our sample of size 30,

$\frac{1.708}{\sqrt{30}} \approx 0.312$

This is again very close to our obtained 0.32!

The small differences are simply due to randomness, and the estimates can be improved further (if desired) by increasing the number of “individuals rolling the die.”

What we have observed here is formally known as the Central Limit Theorem. 

Central Limit Theorem 

Regardless of the distribution of a random variable, $X$, if we take repeated random samples of size $n \geq 30$ from this distribution and compute the mean, $\bar{x}$, for each sample, then the following will hold:

1.) The distribution of $\bar{x}$ will be approximately normal

2.) $E[\bar{x}] = \mu$

3.) $SD[\bar{x}] = \frac{\sigma}{\sqrt{n}}$

(NOTE: A sample size of at least 30 is a rule-of-thumb and can vary slightly depending on the 

severity of skews and abnormalities in the distribution. For even severely skewed distributions, 

the approximate shape is typically normal.) 
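The die-rolling experiments above can be reproduced with a short simulation. The following is a minimal Python sketch, not the applet used in this book: the function name, the fixed seed, and the choice of 1,000 simulated people are assumptions made for illustration. It compares the simulated mean and standard deviation of the sample means against $\mu = 3.5$ and $\sigma/\sqrt{n}$.

```python
import random
import statistics

def simulate_sample_means(rolls_per_person, people=1000, seed=0):
    """Return one sample mean per simulated person."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    return [
        statistics.mean([rng.randint(1, 6) for _ in range(rolls_per_person)])
        for _ in range(people)
    ]

# Population standard deviation of a single roll, about 1.708.
sigma = statistics.pstdev([1, 2, 3, 4, 5, 6])

results = {}
for n in (10, 20, 30):
    means = simulate_sample_means(n)
    # (mean of the means, SD of the means, theoretical sigma / sqrt(n))
    results[n] = (statistics.mean(means), statistics.pstdev(means), sigma / n ** 0.5)
    print(n, [round(v, 3) for v in results[n]])
```

Because the process is random, the simulated standard deviations will not match $\sigma/\sqrt{n}$ exactly, but they should land close to it, just as 0.52, 0.38, and 0.32 did above.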

6.1.3 Why the Central Limit Theorem 


The Central Limit Theorem (CLT) has some very powerful, but subtle results. 

First of all, we do not need to understand the shape of the underlying distribution from which we 

are sampling. This is an amazing result in and of itself, since we usually have little to no information about the population itself (again, if we did, we wouldn't be wasting our time with any of this!).

Secondly, since the resulting sampling distribution is approximately normally distributed, we can 

proceed to calculate probabilities using the normal distribution. This is also great, since we 

already have the background in that process! 
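To make the second point concrete, here is a small Python sketch that treats the mean of 30 die rolls as approximately normal and computes a probability from it. The question asked (the chance that the average of 30 rolls exceeds 4) is an illustrative assumption; the values $\mu = 3.5$ and $\sigma \approx 1.708$ come from the die example.

```python
import math
from statistics import NormalDist

mu, sigma, n = 3.5, 1.708, 30           # die population values from the text
standard_error = sigma / math.sqrt(n)   # SD of the sample mean, per the CLT

# Approximate sampling distribution of the mean of 30 rolls.
sampling_dist = NormalDist(mu=mu, sigma=standard_error)

# P(x-bar > 4): the area to the right of 4 under that normal curve.
p_above_4 = 1 - sampling_dist.cdf(4.0)
print(round(standard_error, 3), round(p_above_4, 3))
```

The same pattern applies to any problem in this section: replace `mu`, `sigma`, and `n`, then read off the area of interest.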

Example 1: After experimentation, researchers believe that the mean lifespan of a strain of 

bacteria is days with days. Due to the complexity of the bacteria, the shape 

of the distribution of bacteria lifespans is unknown. A sample of 60 bacteria strains is 

collected. 

a. Does the CLT apply here?

b. Calculate the probability that the sample mean lifespan, $\bar{x}$, is less than 3 days.

SOLUTION:

a. Since the sample size is 60, we should be safe in assuming that the sampling distribution of all means is approximately normally distributed with mean $\mu$ and standard deviation $\frac{\sigma}{\sqrt{60}}$.

b. We want $P(\bar{x} < 3)$, which we find using our probability calculator. Given the very small level of variability in the sampling distribution of lifespan means, the probability of observing an average smaller than 3 is effectively 0.

6.1.4 Limitations of the CLT 


One major oversight of our excitement with this idea is the notion that we would actually know the true population mean, $\mu$, and the true population standard deviation, $\sigma$. If we have limited information about our population, then we certainly would not know these values. In the next parts of this chapter, we will learn how to use our sample to make these predictions about the population. Though similar in conceptual nature, it is not as straightforward as replacing $\mu$ with $\bar{x}$ and $\sigma$ with $s$.


1. In your own words, what does the Central Limit Theorem tell us?

2. In your own words, why is the Central Limit Theorem a very powerful practical result?

3. A sample of size 36 is taken from a population distribution of unknown shape, though the mean is believed to be 100 with a standard deviation of 18. What is the probability that the sample mean is:

a. Greater than 102?

b. Less than 98?

c. Between 95 and 105?

d. Between what two values will the middle 90% of means be?

4. A stained glass company produces panes of glass with a mean thickness of 0.42 inches and a standard deviation of 0.04 inches, if produced properly. Suppose a random sample of windows reveals a sample mean of 0.43.

a. What is the probability of this average, or a larger average?

b. Given the probability you have computed, what can be said about recent production standards?

5. Promote Marketing has a research team to research new marketing tactics to propose to 

potential clients. A group of 40 clients have been invited for a conference to be put on by 

the marketing firm. The research team usually generates 

in revenues for 

each member of the team with . 

a. What will be the shape of the distribution of $\bar{x}$? How do you know?

b. What is the probability that average sales will exceed $420,000 for this particular event?

c. How would your answer change if 100 clients were to show up?

d. If the team (300 people) have an average revenue that is in the 90th percentile of revenues, they will earn 4 days of paid vacation. What average sales would be required for this?

6. A computer simulation reveals that a distribution of average incomes in a sample of 500 has a standard deviation of $130. What is the standard deviation for the population of all incomes? Interpret the result you get in real-world terms.

7. Use the Excel Sampling Distribution Applet to address this problem. In a population, it is 

found that 30% of homes have 5 rooms, 40% have 4 rooms, and 30% have 3 rooms. You 


can set this up in our applet by having a “die” with 10 values: three 5's, four 4's, and three 3's.

a. What is the average number of rooms a home has in this population? What is the standard deviation in the number of rooms in this population?

b. Now, suppose you take a sample of size 30 from this population. What shape will the distribution of sample means have and how do you know?

c. Take 1,000 random samples each of size and compute the 1,000 sample 

means. According to the applet, what is the average of the average rooms in the sample? What is the standard deviation in the average number of rooms in a house? Compare these two results to what the Central Limit Theorem says we should come up with. That is, find $E[\bar{x}]$ and $SD[\bar{x}]$.

d. Take 1,000 random samples each of size and compute the 1,000 sample 





e. Take 1,000 random samples each of size and compute the 1,000 sample 





f. Why do the values in the population have the highest standard deviation when compared with the distribution of means in the last three parts?

g. What is the probability that, in a sample of 100 homes, the average number of rooms is greater than 5?

h. Explain in practical terms why the standard deviation of any $\bar{x}$ distribution decreases as the sample size increases.

6.2 Confidence Interval for $\bar{x}$

6.2.1 Confidence Interval for $\bar{x}$ Using Sampling Distributions

As discussed previously, our ultimate goal is to make inferences about the population parameter $\mu$. Again, keep in mind that this is the only reason why we are spending time on this! Otherwise, we would have completed our semester early!

When we generate our sampling distribution for $\bar{x}$ we see very vividly that our sample means are subject to sampling variability, depending on which “die values” are “rolled” for each individual sample of size $n$. Thus, we should be very skeptical of concluding that any single $\bar{x}$ is representative of the true population mean. However, if we have many, many “individuals roll the die,” we should get a fairly reasonable understanding of a range of values for the true value of $\mu$. Let's consider an example.

Suppose we want to better understand a population of ages of people in a town. 


1 1 18 22 25 27 30 18 21 2 

3 19 20 32 20 25 29 32 33 40 

29 25 29 24 23 29 29 26 27 1 

31 32 31 31 35 33 30 32 31 33 

19 20 22 21 20 20 19 22 22 9 

$\mu = 23.46$

$\sigma = 9.250319$

But, wait! Let's pretend that we actually don't have access to the entire population of values (yes, we clearly see them in the table above, but we normally do not have that luxury). Due to limited time and money, you are only able to sample 30 of these values. After taking a random sample, here is what you have chosen:

32 31 31 35 19 20 22 21 20 20

20 25 29 32 33 19 19 19 18 22

25 27 30 18 21 33 30 32 31 33

$\bar{x} = 25.56667$

$s = 5.870342$

Again, at this point, we would have no way of telling how close we are to the actual mean of 

23.46. 

To get a good estimate of $\mu$, we will come up with a confidence interval. A confidence interval is a range of values such that there is a specified probability that the true population mean, $\mu$, is between those values.

How do we calculate this? Here is our motivation for what is to come:

There are two ways to think about inferential statistics:

1) Use theoretical results and make conclusions using them

2) Build a sampling distribution for the statistic of choice ($\bar{x}$ or $\hat{p}$) using the Bootstrap Method and make conclusions using this empirical data.

We will draw parallels between the two regularly.

Here is the basic idea of Bootstrap Sampling: 

1) From the population, take a random sample, preferably of size 30 or greater. The larger 

the random sample, the more power we have in making inferences about the population. 

2) If this is a truly representative sample, then we can think of it as a “mini” population that 

acts and behaves according to the population as a whole. This is a key ingredient! 


3) We cannot use this sample to calculate the corresponding parameter because of sampling variability. However, if this sample behaves like the population, then we can resample from it and get an idea of the overall variability. That is, draw a sample of the same sample size from this “mini” population, but do so with replacement. This is the same idea as rolling a die a fixed number of times: we are sampling with replacement from the population {1, 2, 3, 4, 5, 6}. What will this do? It will account for sampling variability, if repeated.

4) Calculate the statistic from this sample and record it.

5) Repeat steps 3) and 4) 1,000 to 10,000 times. We now have a sampling distribution and can make estimates about the true population parameter. And, guess what this distribution will look like? You guessed it: it will be approximately normal, by the Central Limit Theorem.

Below is a diagrammatic representation of steps 1) – 5): 

[Diagram: Population → Random Sample → resamples drawn with replacement: Sample 1, Sample 2, Sample 3, Sample 4, ..., Sample 10,000]

Some of the assumptions we make are indeed dangerous. For example, do we really have a mini population? If the answer is “no,” then theoretical results are equally worthless since they, too, assume that the sample is representative.



Now, back to our example… 

If we have truly collected a random sample, then we should be able to think about the sample as a small population. If this is a small population, then we should be able to sample from it. We will draw random samples of size $n = 30$ from the small “population,” which is also of size 30. This sounds strange, but we will sample with replacement, so it is possible to resample the same value multiple times.

We will draw 1,000 samples of size $n = 30$ from this “population” and, as you might have figured, we will calculate the mean of each and build the sampling distribution for $\bar{x}$.

[Figure: histogram of the 1,000 bootstrap sample means; bins of width 0.5 from about 22.3 to 29.8, frequency axis from 0 to 200.]


As we should expect based on CLT, the distribution of these 1,000 means is approximately 

normal. 

Let's suppose that we want to have an interval within which there is a 95% probability that the true population mean, $\mu$, lies. This is the same as looking for the middle 95% of means!

Thus, we need to find the lower and upper limits for this interval by finding the 2.5th percentile and the 97.5th percentile. In Excel, we can do this by using the percentile() function. We get:

Upper (97.5th percentile): 27.50

Lower (2.5th percentile): 23.60

Thus, we can say that we are 95% confident that the true population mean is between 23.6 years and 27.5 years. In other words, there is a 95% probability that we have “trapped” the population mean between our lower and upper limit. Said one other way, 95% of all sample means, when the variability from sample to sample is taken into account, are between these lower and upper limits. If this is representative of the population, then we should believe that 95% of the time, we will have means between these two values.

What if we wanted to be 99% certain? We would need to find lower and upper limits so that there is only 1% in the tails:

Thus, we would like 0.01/2 = 0.005 (or 0.5%) in each of the two tails. To find the lower and upper limits, we would need to find the 0.5th percentile and the 100 – 0.5 = 99.5th percentile. We get:

Upper (99.5th percentile): 28.17

Lower (0.5th percentile): 22.83

Thus, we are 99% confident that the true population mean age, $\mu$, is between 22.83 years and 28.17 years. In other words, there is a 99% probability that the true mean age is between 22.83 and 28.17 years.

If we want to be more confident, we need to expand our interval of values! 
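The bootstrap procedure and the percentile limits can be sketched in a few lines of Python. This is a minimal illustration, not the Excel simulator used in this book: the function names and the seed are assumptions, and the `percentile` helper uses linear interpolation between sorted values, which is intended to resemble (but is not guaranteed to match) Excel's percentile() function. Since resampling is random, the limits will differ slightly from 23.60 to 27.50 and 22.83 to 28.17.

```python
import random
import statistics

# The random sample of 30 ages from the text.
ages_sample = [32, 31, 31, 35, 19, 20, 22, 21, 20, 20,
               20, 25, 29, 32, 33, 19, 19, 19, 18, 22,
               25, 27, 30, 18, 21, 33, 30, 32, 31, 33]

def bootstrap_means(sample, resamples=1000, seed=1):
    """Resample with replacement (same size as the sample) and record each mean."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    n = len(sample)
    return [statistics.mean(rng.choices(sample, k=n)) for _ in range(resamples)]

def percentile(values, p):
    """Percentile for 0 <= p <= 1, interpolating linearly between sorted values."""
    ordered = sorted(values)
    position = p * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

def percentile_interval(values, confidence):
    """Middle `confidence` share of the values, e.g. 0.95 leaves 2.5% in each tail."""
    tail = (1 - confidence) / 2
    return percentile(values, tail), percentile(values, 1 - tail)

means = bootstrap_means(ages_sample)
ci95 = percentile_interval(means, 0.95)
ci99 = percentile_interval(means, 0.99)
print(ci95, ci99)
```

Raising `resamples` (say, to 10,000) should make the limits more stable from run to run, at the cost of a slightly longer computation.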

Note that in only one of our confidence intervals (99%), we have captured the true mean within 

our range. This is very likely, since our confidence percentage is very high. BUT, keep in mind 

that we never know what the true mean is! Thus, we cannot say that it would have been better to 

stick with the wider 99% interval. After all, there is a 1% chance we might have made an error. 

The level of confidence that we desire depends on the situation and the interval width we are willing to tolerate. More confidence means wider possibilities. In general, we never know whether or not we have captured the true mean in our interval. On the upside, there is a probability associated with it!

As a final note, it is interesting that we actually missed the true mean in our 95% confidence interval, since there is only a 5% chance of error. Keep in mind, however, that this interval was based on simulation. It is based on 1,000 resamples, and it may have been better to increase the number of resamples.

6.2.2 Confidence Interval for $\bar{x}$ Using Theoretical Results – When $\mu$ and $\sigma$ are Unknown

In the previous section, we found that the sampling distribution of $\bar{x}$ with $n \geq 30$ is approximately normal with $E[\bar{x}] = \mu$ and $SD[\bar{x}] = \frac{\sigma}{\sqrt{n}}$. As a bit of notation, if a random variable has a normal distribution with a given mean and standard deviation, we would write:

$\bar{x} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$

This reads, “$x$-bar is normally distributed with mean $\mu$ and standard deviation $\frac{\sigma}{\sqrt{n}}$.”

This, however, assumes that we know something that we probably don't: the population mean and standard deviation!

As you might guess, we will use $\bar{x}$ and $\frac{s}{\sqrt{n}}$ to approximate these. This poses a problem: we are introducing more error. In order to account for this, the normal distribution is not appropriate. When using these approximations, we must use the theoretical Student's $t$-Distribution. This distribution looks much like the normal distribution, but its shape is determined by the sample size, not the mean and standard deviation. Below is a comparison of the $t$-distribution with the standard normal distribution for a given sample size.


We see that the standard deviation of the $t$-distribution (in red) is just slightly larger than that of the standard normal (in blue); it is about 1.0339. So, as sample size gets greater, the $t$-distribution begins to look more like a standard normal. BUT, look at the one below where sample size is 10:

The variability is nearly 14% greater.

As we mentioned, this distribution's shape relies on the sample size. The relationship is called the degrees of freedom and can be calculated as $df = n - 1$; that is, degrees of freedom is equal to one less than the sample size.

So, in our previous example, we had a sample size of 30, so $df = 30 - 1 = 29$.

In a probability calculator, we would enter 29 for the degrees of freedom:

This will work much like the standard normal distribution. It, too, is expressed in numbers of standard deviations. That is, the mean is 0 standard deviations away from the mean. We want to know the number of standard deviations to the left and to the right of the mean we need to travel in order to “trap” 95% of the distribution.

We use the calculator:

Thus, we would expect 95% of sample means to be within 2.045 standard deviations of the mean. In other words:

$\bar{x} \pm 2.045 \cdot \frac{s}{\sqrt{n}}$

Or:

$25.567 \pm 2.045 \cdot \frac{5.870}{\sqrt{30}}$

The lower limit is:

$25.567 - 2.045 \cdot \frac{5.870}{\sqrt{30}} \approx 23.4$

And the upper limit is:

$25.567 + 2.045 \cdot \frac{5.870}{\sqrt{30}} \approx 27.8$

Thus, we are 95% confident that the true average age in this town is between 23.4 and 27.8. 

Notice that this is not very much different than our simulated confidence interval of 23.6 to 27.5. 

So, which is more precise? This is arguable, but it is difficult to argue with empirical data.

Personally, I prefer the bootstrap confidence interval we ran earlier. My reasoning is that a 

distribution of means is asymptotically normal, meaning that, under infinitely many sampled 

units, the distribution would be exactly normal. This is very theoretical and not always valid. 

For now, we will compare both. 

For the 99% confidence interval, theory produces the following:

$\bar{x} \pm t \cdot \frac{s}{\sqrt{n}}$

We would now simply adjust the number of standard deviations to 2.756:

Lower limit: $25.567 - 2.756 \cdot \frac{5.870}{\sqrt{30}} \approx 22.6$

Upper limit: $25.567 + 2.756 \cdot \frac{5.870}{\sqrt{30}} \approx 28.5$

Similarly, there is a 99% chance that the population mean age is between 22.6 and 28.5. Compare this to our empirical result above of 22.8 to 28.2. We are, again, very close.
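The two theoretical intervals can be reproduced with a few lines of Python. This sketch takes the $t$ multipliers 2.045 and 2.756 from the text as given inputs rather than computing them, since the Python standard library has no $t$-distribution; the function name `t_interval` is a hypothetical helper introduced only for illustration.

```python
import math

x_bar = 25.56667   # sample mean from the text
s = 5.870342       # sample standard deviation from the text
n = 30             # sample size, so df = n - 1 = 29

def t_interval(mean, sd, size, t_star):
    """Return (lower, upper) = mean -/+ t_star * sd / sqrt(size)."""
    margin = t_star * sd / math.sqrt(size)
    return mean - margin, mean + margin

ci95 = t_interval(x_bar, s, n, t_star=2.045)  # multiplier for 95%, df = 29
ci99 = t_interval(x_bar, s, n, t_star=2.756)  # multiplier for 99%, df = 29
print([round(v, 1) for v in ci95], [round(v, 1) for v in ci99])
```

Only `t_star` changes between the two intervals; the sample mean and the standard deviation of the means stay the same, which is why more confidence always produces a wider interval.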


1. Describe, in your own words, what a bootstrap distribution is and why we would want to 

use one. Be sure to mention the logical process behind building one, as well as the 

assumptions we are making when we do so. 

2. What is a confidence interval? Explain in your own words.

3. The following is a random sample of 10 labor costs associated with farming for civilian 

consumers (in billions of dollars) since 1970. 

Labor Costs (bill. $)

229.9  303.7
137.9   58.3
 81.5  196.6
 36.6  168.4
122.9  347.4

(SOURCE: Data randomly sampled from U.S. Statistical Abstract, Table 847)

a. Does the Central Limit Theorem apply for this data? Why or why not?

b. Using a bootstrap distribution, calculate a 95% confidence interval for $\mu$, the true population average labor cost.

c. In a complete sentence, interpret the real-world meaning of this interval.

d. Using the bootstrap distribution and percentiles, how likely is it that a sample of labor costs has a mean greater than $190,000,000,000?

4. In Arizona, primarily the Phoenix Metropolitan area, the issue of red-light cameras used to catch red-light runners and speeders was a prominent one for much of the early 2000's. Many studies were carried out over this period of debate to determine whether or not they were effective, and whether or not they used taxpayer money appropriately. Suppose the following data was collected on the revenue generated by randomly sampled red-light cameras across the valley. The goal is to have, on average, each camera generate $750 and no less than $640 per day.

883 522 590 779 887 615 690 771 843 509 

872 840 536 892 880 588 547 770 687 842 

832 840 676 555 884 617 517 586 505 552 

a. Can the state be 95% confident that the desired average is possible?

b. Generate a 99% confidence interval for $\mu$, the population average daily revenue per camera. Explain in a complete sentence what this means.

c. Is the CLT valid in this problem? Explain.

d. Using the assumption that the distribution of $\bar{x}$ is normally distributed, calculate a theoretical 95% confidence interval for $\mu$ (you will need to use $\frac{s}{\sqrt{n}}$ to estimate the standard deviation of $\bar{x}$'s and $\bar{x}$ to estimate $\mu$).

e. In reality, anytime we estimate parameters, like you did above in part d), we actually shouldn't assume a normal distribution. Instead, we should assume what is known as a $t$-distribution, which is symmetrical, though it has more variability to account for the uncertainty in our estimates.

Watch this brief informative video:

http://www.youtube.com/watch?v=yV-0ReCXW64

Pull up the following applet: http://www.stat.tamu.edu/~west/applets/tdemo.html. You can type in the percentile corresponding to means you want to consider. $df$ stands for “degrees of freedom” and can be calculated by taking the sample size minus 1 ($n - 1$). (From the video, we know that, if the sample size is really, really big, then the normal distribution and $t$-distribution become indistinguishable.) The output of this applet will give you the number of standard deviations your endpoints will be on either side of the mean.

For example, you will find that a 99% confidence interval for a sample of size 100 has endpoints that are 2.626 standard deviations from the mean (left and right). Let's say your sample mean is $\bar{x} = 40$ and standard deviation $s = 5$. Then, the confidence interval will be an interval around the sample mean. That is, one standard deviation is $\frac{s}{\sqrt{n}} = \frac{5}{\sqrt{100}} = 0.5$ (remember, the standard deviation of means requires that we take the standard deviation among individual $x$'s and divide by the square root of the sample size). So, 2.626 standard deviations would be 2.626(0.5) = 1.313 units away from the mean. The endpoints would be 40 – 1.313 and 40 + 1.313, or 38.687 to 41.313.

Formulaically, we found:

$\hat{\mu} = \bar{x} \pm t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$

Where $t_{\alpha/2}$ is the number of standard deviations to the endpoints for a confidence interval with total area $\alpha$ in the tails (i.e., $\alpha = 0.01$ for a 99% confidence interval).

Using this “crash course” in theoretical confidence interval-finding, compute the 95% confidence interval using these ideas. Do you get a similar result? How close?

6.3 Confidence Interval for $\hat{p}$

6.3.1 Confidence Interval for $\hat{p}$ Using Sampling Distributions

Suppose that it is of interest to estimate the proportion of recent customers that say they would come back and shop at your store. You take a sample and determine that, of 30 people, 20 said they would and 10 said they wouldn't. You would like to make an inference about the population of all of your customers. In your sample, you know that:

$\hat{p} = \frac{20}{30} \approx 0.667$

is the proportion of your customers that say they will come back and purchase from you again. You are looking to find a confidence interval for $\hat{p}$. How do we do that with the simulator if we have no data?

In reality, we do. We just have to make it numerical. In fact, 20/30 is an average. It is the average of 30 responses, if we let:

$x = \begin{cases} 1 & \text{if the customer would come back} \\ 0 & \text{if the customer would not} \end{cases}$

So, we have a set of twenty 1's and ten 0's. We enter these into our simulator.

We run the bootstrap sample on these 1's and 0's 1,000 times. We will get a variety of sample proportions:

[Figure: histogram titled "Sampling Distribution of p-hat"; bins of width 0.05 from about 0.43 to 0.98, frequency axis from 0 to 350.]

We see that this distribution is approximately normal. No surprise there!

We calculate the 2.5- and 97.5-percentiles to get the middle 95% of sample proportions generated in the bootstrap sample:

               (As %)   Results
Percentile 1:  97.5     0.833
Percentile 2:  2.5      0.500

Thus, we are 95% confident that the proportion of the population of customers that will shop at your store again will be between 0.50 and 0.83. This is quite a wide interval! At least you know what to expect with 95% confidence!


DULY CAUTIONED: The assumptions here are the same as for bootstrapping with $\bar{x}$: a random sample is drawn from the population and is representative of the population. If not, the sample is worthless, in any case.

6.3.2 Confidence Interval for $\hat{p}$ Using Theoretical Results

Without providing the intuition for this method, we will simply state the results for the CLT pertaining to the sampling distribution of $\hat{p}$:

Central Limit Theorem for $\hat{p}$

The sampling distribution of $\hat{p}$ (which is really just an average of 0's and 1's) is approximately normal as long as $n\hat{p}$ and $n(1 - \hat{p})$ are both sufficiently large (similar idea as for the standard CLT), with

$E[\hat{p}] = p$

$SD[\hat{p}] = \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$

NOTE: in polls, the reported margin of error is a multiple of this standard deviation (about 1.96 standard deviations for 95% confidence).

The results above state that,

1. The average proportion of the sampling distribution is the true population proportion.

2. The standard deviation of proportions of the sampling distribution is the above, complex, calculation.

AS LONG AS $n\hat{p} = 30(20/30) = 20$ and $n(1 - \hat{p}) = 30(10/30) = 10$ are sufficiently large, both of which are true statements. We can now proceed:

Here, we get to use the standard normal distribution to calculate the number of standard deviations corresponding to the desired interval. So, we know that:

$\hat{p} = \frac{20}{30} \approx 0.6667$

$SD[\hat{p}] = \sqrt{\frac{(20/30)(10/30)}{30}} \approx 0.0861$

The number of standard deviations corresponding to the middle 95% of a standard normal distribution is calculated below:

Thus, these endpoints are approximately 1.96 standard deviations away from the mean. So, our confidence interval would be:

$\hat{p} \pm 1.96\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$

In our case:

Lower limit: $0.6667 - 1.96(0.0861) \approx 0.498$

Upper limit: $0.6667 + 1.96(0.0861) \approx 0.835$

These limits are nearly identical to the simulation values! 
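Both approaches for the proportion can be compared side by side in Python. This is a minimal sketch under stated assumptions: the seed, the variable names, and the use of simple order statistics (the 25th and 975th of 1,000 sorted resampled proportions) as stand-ins for the 2.5th and 97.5th percentiles are illustrative choices, not the simulator's exact method.

```python
import math
import random
from statistics import NormalDist

responses = [1] * 20 + [0] * 10   # twenty "would return" and ten "would not"
n = len(responses)
p_hat = sum(responses) / n        # the sample proportion, 20/30

# Theoretical interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n).
se = math.sqrt(p_hat * (1 - p_hat) / n)
z = NormalDist().inv_cdf(0.975)   # about 1.96 for the middle 95%
theory_ci = (p_hat - z * se, p_hat + z * se)

# Bootstrap interval: resample the 0's and 1's with replacement 1,000 times.
rng = random.Random(2)            # fixed seed (an assumption) for repeatability
boot = sorted(sum(rng.choices(responses, k=n)) / n for _ in range(1000))
boot_ci = (boot[24], boot[974])   # approximately the 2.5th and 97.5th percentiles

print([round(v, 3) for v in theory_ci], [round(v, 3) for v in boot_ci])
```

Because each resampled proportion is a multiple of 1/30, the bootstrap limits can only take values such as 0.500 or 0.833, which explains the slightly coarser look of the simulated interval.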



1. In a sample of 55 students from Arizona State University taking a political science class, 30 say they would be interested in taking another political science class. The university is interested in determining the proportion of all its students that are interested in taking another political science class.

a. What is the population of interest in this study?

b. Construct a 90% bootstrap confidence interval for $p$, the true proportion.

c. Interpret the real-world meaning of your confidence interval.

2. A software company takes a random sample of recent orders and finds that, of the 250 sampled, 42 resulted in the return of a piece of purchased software.


b. Construct a 99% bootstrap confidence interval for $p$, the true proportion.


3. A batch of apples was inspected prior to shipment for any defects. Each apple was 

marked as either pass (P), re-inspect (R) or fail (F). The following results were reported. 

F P P P P P P P R R 

P P R P R R P R P P 

P R P R P F R R P P 

P P P P P P P R P P 

P P P F P R P P P R 


b. Construct a 95% bootstrap confidence interval for $p$, the true proportion of passing apples.


d. Using the CLT for $\hat{p}$'s, construct a 95% confidence interval (see blue box in this section). How does it compare to the bootstrap confidence interval?

Chapter 7 

Hypothesis Testing 

We are often faced with uncertainty. Specifically, we often want to know whether one product is better than the other, whether one group outperforms another in some type of task, or how one manufacturing process compares to another, among many other things. How can we ever know? The first step would be to conduct a study and collect data. The data must then be compared.

But how do we do so if there exists variability from one sample to the next? This chapter will address this question.

7.1 The Concept Behind Hypothesis Testing 

So, you have a research question… what now? The answer might at first seem obvious: let's run a study. The question, however, needs some special treatment before anything else happens, especially if the study comes at a significant cost.

For instance, suppose we're interested in determining whether pesticides damage the soil in which we grow the majority of our food. This is a loaded curiosity. We first need to fully define how it is that we would conduct such a study. For instance, will we be comparing two regions, one that has been sprayed with pesticides and one that hasn't been sprayed? What is it, exactly, that we will measure in order to determine the level of soil damage?

First and foremost, we need to formulate a hypothesis, or a belief about what it is that we expect 

to see. For example, 

Our hypothesis is that pesticides inflict serious damage on sprayed soils.

Great, so we know what we believe. Did we just state what we wanted to happen? Probably not. We'll usually formulate a hypothesis based on some existing observations. Perhaps we're seeing that plants aren't producing as many edibles as previously thought. Or, maybe we're finding rising levels of cancers. (By the way, all of the above are becoming prominent public concerns in the U.S. and beyond.) So, based on these observations, we're forming an educated belief on the effect of pesticides.

The next critical question: 

How will we measure “soil damage”?

This can be a controversial question, and there may be no consensus on the answer. Will it be measured by the quantities of beneficial microbes present in the soil? By the soil’s pH level? By the amount of nitrogen it contains?

However we choose to measure “soil damage,” we want to be sure that we are being accurate. That is, we need to be sure that we are actually measuring what we say we’re measuring. This sounds trivial, but it happens all the time that researchers say they’re measuring something that they’re not actually measuring.

So, suppose we do some research and conclude that we will test for soil damage by determining the weight of vegetables harvested from the plants and comparing the average weight per plant for the experimental group (sprayed with some determined quantity of pesticides) against that of healthy plants. We find that healthy plants produce about 30 lbs. of some vegetable across their seasonal life span. Will the average plant yield for plants sprayed with pesticides be lower?


Since this is a mathematical question, we would want to formulate our hypothesis into 

mathematical statements. 

Since we are dealing with an average in this scenario, the statistical symbol often used to represent the average plant yield for the entire population of this particular vegetable is the Greek letter mu, $\mu$.

Now, our experimental hypothesis is that pesticides damage the soil, measured by the pounds of vegetables yielded from these plants. If that is the case, we would expect to see a yield of less than 30 lbs. of vegetables per plant. That is, our hypothesis is that

$\mu < 30$

Since this is the experimental hypothesis, we do not yet have evidence to conclude that this is true. Thus, we should probably assume that there is no difference between the yields of pesticide-sprayed and non-sprayed plants. Thus, begin by assuming that:

$\mu = 30$

This second hypothesis is called the null hypothesis, that is, the hypothesis that is assumed until there is sufficient evidence otherwise. Symbolically, this hypothesis is written $H_0$ and is typically read as “null hypothesis,” or “h-naught.”

The hypothesis that we believe is called the alternative hypothesis, and is written $H_a$, or “h-ay.”

To write these two hypotheses, we would write:

$H_0: \mu = 30$
$H_a: \mu < 30$

When evidence is insufficient, we say

“Based on sample data, we fail to reject $H_0$ in favor of $H_a$”

When evidence is sufficient to conclude that the average is really below 30, we say

“Based on sample evidence, we reject $H_0$ in favor of $H_a$”

We are cautious in making these conclusions based on sample data. Certainly, we may have obtained an oddball sample that doesn’t represent the population.

Let’s practice writing some hypotheses. First off, let’s make note of the variety of population characteristics, called population parameters, that we can seek to describe in a study.


Population Parameters 

In a study, we seek to gain information about the target population. There are a number of things we can test about the population parameters, the actual values that describe the population. Two common ones are:

1) Population average, denoted by Greek Mu (“mew”), $\mu$

2) Population percentage, denoted by Greek Pi (“pie”), $\pi$

Unfortunately, we do not know the true values for $\mu$ and $\pi$ and realistically cannot, unless we sample the entire population. We can only estimate them based on the sample we collect. The values we collect from the sample are sample statistics and are estimators for the respective population parameters. These estimators for the values above, respectively, are notated:

1) $\hat{\mu}$ (“mew-hat”)

2) $\hat{\pi}$ (“pie-hat”)

Example 1: Because of variation in the manufacturing process, tennis balls produced by a particular machine do not have identical diameters. Let $\mu$ denote the true average diameter for tennis balls currently being produced. Suppose that the machine was initially calibrated to achieve the design specification $\mu = 3$ in. However, the manufacturer is now concerned that the diameters no longer conform to this specification. If sample evidence suggests that the true average diameter for tennis balls is not 3 inches, the production process will have to be halted while the machine is recalibrated. Because stopping the production is costly, the manufacturer wants to be quite sure that the true average diameter is not 3 inches before undertaking recalibration. What are the competing hypotheses?

SOLUTION:

Under the original assumption, $\mu = 3$. The researcher wants to test whether $\mu \neq 3$. So:

$H_0: \mu = 3$
$H_a: \mu \neq 3$

Example 2: A long-used chemical in a particular carpet-cleaning product has been known to 

successfully remove dark stains 70% of the time. After extensive research, the product's 

formula is modified. The head of production must decide whether or not to sell the new 

product. Write null and alternative hypotheses for conducting an experiment that might help 

him decide. 

SOLUTION: 

Under original specifications, the proportion of time the product works is $\pi = 0.70$. He is concerned that $\pi < 0.70$. If it is truly less effective, then he will not sell the new product. That is,

$H_0: \pi = 0.70$
$H_a: \pi < 0.70$


Example 3: Many older homes have electrical systems that use fuses rather than circuit 

breakers. A manufacturer of 40-amp fuses wants to make sure that the mean amperage at 

which its fuses burn out is in fact 40. If the mean amperage is lower than 40, customers will 

complain because the fuses require replacement too often. If the mean amperage is higher 

than 40, the manufacturer might be liable for damage to an electrical system as a result of 

fuse malfunction. To verify the mean amperage of the fuses, a random sample of fuses is 

selected and tested. If a hypothesis test is performed using the resulting data, what null and 

alternative hypotheses would be of interest to the manufacturer?

SOLUTION:

The fuse is designed and assumed to be 40 amps. That is, on average, $\mu = 40$. He wants to make sure it is not the case that $\mu \neq 40$. So,

$H_0: \mu = 40$
$H_a: \mu \neq 40$

So Your Average IS Different! 

In our pesticide experiment, our target population is all plants of this particular variety. Thus, we will take a random sample of plants from the pesticide group. Once we have that, we will find the sample mean, which is called a sample statistic. That is, we can’t possibly keep track of all the plants in the population, so we will use the mean of the sample to help us describe the entire population. Usually, this sample statistic is written as $\hat{\mu}$ (“mew-hat”). Suppose that you find, from the pesticide group, that $\hat{\mu}$ is somewhat below 30 lbs.

The claim has been proven, right? Maybe, maybe not.

We must remember that this is just one random sample from all plants. Certainly, this sample average is lower, but can it not just be due to random variation that we’re seeing a difference? After all, not all no-pesticide plants will produce exactly 30 lbs. of the vegetable.

What if we collect a sample and $\hat{\mu}$ is lower still?


Without some sort of analysis, we might be tempted to say this is sufficiently lower. However, we need to have some sort of formal way to determine:

When is “low,” low enough? Or, more generally:

The Big Question

When making conclusions about the population based on sample data, we must first ask the question,

When do we conclude that an “extreme” is extreme enough to reject $H_0$?

As you might guess, there is probability involved.

That is, if the probability of observing what we have just seen, or what is more extreme, is small “enough,” then we will reject $H_0$ and conclude that $H_a$ might be a more valid conclusion.

Punchline: We shouldn’t reject the null hypothesis unless the probability of seeing something as or more extreme is very small.

What Happens If I Reject $H_0$ When the Data Provides Insufficient Evidence?

Imagine a medical test to determine whether or not you have some disease. Let’s call this disease Disease X.

As for having the condition, you have one of two possibilities: you have it or you don’t.

As for the test, it will either say that you have it or you don’t.

Now, realistically, we know that there is no way to be omniscient and really know whether or not you have the condition. However, let’s imagine that we are all-knowing and can judge the validity of the test. There are four possibilities:

1) The test is positive, and you do have X (accurate)

2) The test is positive, and you don’t have X (inaccurate)

3) The test is negative, and you do have X (inaccurate)

4) The test is negative, and you don’t have X (accurate)

It is evident that possibilities 2) and 3) represent scenarios where there is an inaccurate result. That is, it would be invalid for the test to tell you that you have the condition when, in fact, you don’t. It would also be invalid for the test to tell you that you don’t have the condition when, in fact, you do.

In contrast, we do want the test to tell us positive when we do have the condition and negative when we don’t.


Medical researchers usually give these four instances names, as summarized in the following table:

                 Truth
Test Says        Have                             Don’t Have
Positive         True Positive                    False Positive (Type II Error)
Negative         False Negative (Type I Error)    True Negative

As can be seen, the True Positive and True Negative cells represent accurate results (true results), and the False Positive and False Negative cells represent inaccurate results (false results).

As a patient, you would probably be quite upset (devastated, even) if you received false results for a terrible condition, such as X!

In a hypothesis test, we are up against the same dilemma: our test result can be either positive or negative. The truth may or may not be accurately represented. Let’s modify our table slightly to represent the hypothesis test scenario:

                               Truth
Hypothesis Test Conclusion     $H_0$ True         $H_0$ False
Don’t Reject $H_0$             True Positive      False Positive
Reject $H_0$                   False Negative     True Negative

In reality, we shouldn’t reject $H_0$ (make it appear false) when it is true. If we do, we have a false negative on our hands. Similarly, we shouldn’t fail to reject $H_0$ (make it appear true) when it is false. These are labeled Type I and Type II errors, respectively.

How Do We Avoid Erroneous Conclusions?

Unfortunately, we are not omniscient. Thus, we can never be sure that our conclusions are accurate. If we knew, there would be no testing necessary!

On the flipside, we can determine how large of an error rate we are willing to tolerate. Earlier, we mentioned that we will reject $H_0$ when the probability of observing something as or more extreme than what we have observed is “small.” This value of small fully determines our probability of a Type I error. As researchers, it is our duty to set this value. This probability of a Type I error is called the criterion, or alpha-level, and is denoted with the Greek letter alpha, $\alpha$.

Criterion/Alpha-Level

Our chosen risk of a Type I error is called the criterion or alpha-level, and is denoted by $\alpha$. Typical values for $\alpha$ are:

$\alpha = 0.01$, $\alpha = 0.05$, $\alpha = 0.10$

That is, rarely will we choose a very small or considerably large alpha-level.

Suppose that we reject $H_0$ when the probability of observing something as or more extreme than what we have observed is 5% (or smaller). We have that $\alpha = 0.05$.

This means that, when the null hypothesis is true, there is still a 5% (or smaller) chance that we observe a value (sample mean, sample proportion, etc.) as or more extreme than what we have observed. That is, if $H_0$ is true, there is a 5% chance that we falsely reject it. Probabilistically,

$P(\text{Type I Error}) = P(\text{Reject } H_0 \mid H_0 \text{ is True}) = \alpha = 0.05$

To visualize this, consider the diagram below. Recall that a conditional probability statement limits us to the event after the “pipe,” |, and then asks the question, “what percentage of the time can we expect the event to occur, out of the times the specified condition occurs?” The modified table below shows that.

                               Truth
Hypothesis Test Conclusion     $H_0$ True
Don’t Reject $H_0$             True Positive      95%
Reject $H_0$                   False Negative      5%
Total                                            100%
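The 5% share in this column can be checked with a short simulation. The sketch below is only an illustration under assumed values: it borrows the 30 lb. null value from the pesticide scenario, but the standard deviation of 4 lbs., the sample size of 25, the normal model, and the function name `type_one_error_rate` are all assumptions made for the example, not figures from any study. Every sample is drawn with $H_0$ true, so every rejection counted is a Type I error.

```python
import random
from statistics import NormalDist

def type_one_error_rate(alpha=0.05, trials=20_000, n=25, mu0=30.0, sigma=4.0, seed=1):
    """Share of samples, all drawn with H0 true, for which a lower-tailed z-test rejects H0."""
    rng = random.Random(seed)
    z_cutoff = NormalDist().inv_cdf(alpha)      # about -1.645 when alpha = 0.05
    std_error = sigma / n ** 0.5                # assumed known sigma (hypothetical value)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]   # H0 really is true here
        z = (sum(sample) / n - mu0) / std_error
        if z <= z_cutoff:                       # "extreme enough", so we reject H0
            rejections += 1
    return rejections / trials

rate = type_one_error_rate()
print(f"Share of true-H0 samples that were rejected: {rate:.3f}")
```

The printed share should land close to the chosen alpha-level, which is the sense in which we, as researchers, set the Type I error risk ourselves.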

At this point we might wonder: why shouldn’t we set $\alpha$ extremely small so that we minimize the Type I error risk?

Good question. Imagine that your alpha is 0.0001. This means you will reject $H_0$ only 0.01% of the time (or 1 out of 10,000 times) when it is true. Certainly, your risk of a Type I error is extremely small.

The trouble is that your decision criterion, or the numerical figure that we later calculate to decide whether or not to reject, will be extremely stringent and difficult to achieve. If this is the case, then you will almost never reject the null hypothesis!

Okay, so if you very rarely reject the null hypothesis, then you are also potentially committing another act of error: not rejecting the null hypothesis, even though it may be false. That is, you increase the likelihood of a Type II error. Recall that,

$P(\text{Type II Error}) = P(\text{Fail to Reject } H_0 \mid H_0 \text{ is False})$

We can see here that rarely rejecting $H_0$ results in potentially failing to reject it even when it should be rejected! Unfortunately, there is no free lunch in hypothesis testing.

[Table: Hypothesis Test Conclusion (Don’t Reject $H_0$ or Reject $H_0$) against the Truth]

Though we cannot yet easily provide numerical support for this claim (which certainly makes sense), we will make the following preliminary conclusion:

Type II Error - $\beta$

The probability of a Type II error, denoted $\beta$, is inversely related to $\alpha$, the probability of a Type I error. That is, with everything else held fixed, decreasing $\alpha$ will increase $\beta$.
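To get a feel for this trade-off, here is a rough numerical sketch. It assumes a lower-tailed test of $H_0: \mu = 30$ in which the sample mean is normally distributed; the true mean of 28, the standard deviation of 4, the sample size of 25, and the function name `type_two_error` are hypothetical choices made only for illustration.

```python
from statistics import NormalDist

def type_two_error(alpha, mu0=30.0, mu_true=28.0, sigma=4.0, n=25):
    """Beta for a lower-tailed z-test of H0: mu = mu0 when the true mean is mu_true."""
    std_error = sigma / n ** 0.5
    # Reject H0 whenever the sample mean falls below this cutoff
    cutoff = mu0 + NormalDist().inv_cdf(alpha) * std_error
    # Beta: chance the sample mean stays at or above the cutoff even though H0 is false
    return 1 - NormalDist(mu_true, std_error).cdf(cutoff)

betas = {alpha: type_two_error(alpha) for alpha in (0.10, 0.05, 0.01, 0.0001)}
for alpha, beta in betas.items():
    print(f"alpha = {alpha:<7} beta = {beta:.3f}  power = {1 - beta:.3f}")
```

Reading down the printed rows, each smaller alpha-level comes with a larger beta under these assumed values, which is the "no free lunch" idea in numbers.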

Important Caution 

Students are often confused that the probability of rejecting $H_0$ when $H_0$ is true and the probability of failing to reject $H_0$ when $H_0$ is true sum to 1. After all, these two possibilities are only two of the four possible results in a test decision.

However, keep in mind that these are the percentages of time we reject and fail to reject out of all the times that $H_0$ is true! This is out of only one column total, not the entire sample space.



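The important caution brings up the following idea:

If $P(\text{Reject } H_0 \mid H_0 \text{ is True}) = \alpha$, then,

$P(\text{Fail to Reject } H_0 \mid H_0 \text{ is True}) = 1 - P(\text{Reject } H_0 \mid H_0 \text{ is True}) = 1 - \alpha$

Similarly,

If $P(\text{Fail to Reject } H_0 \mid H_0 \text{ is False}) = \beta$, then,

$P(\text{Reject } H_0 \mid H_0 \text{ is False}) = 1 - P(\text{Fail to Reject } H_0 \mid H_0 \text{ is False}) = 1 - \beta$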

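The probability that we reject the null hypothesis when it is false is referred to as the power of the test. We summarize these in the table below:

                               Truth
Hypothesis Test Conclusion     $H_0$ True       $H_0$ False
Don’t Reject $H_0$             $1 - \alpha$     $\beta$
Reject $H_0$                   $\alpha$         $1 - \beta$ (power)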

Example 4: The college dropout rate for a particular county is known to be 30%. The educational board of a city within the county believes its dropout rate is significantly lower. The board follows 60 students and, of them, 15 drop out. The board wants to run a statistical hypothesis test with $\alpha = 0.05$ to determine whether their belief is true. Describe the hypothesis test by:

a. Writing competing hypotheses

b. A decision rule for rejecting $H_0$

c. A decision criterion rule

d. A generic conclusion statement

SOLUTION:

a.) Under the null hypothesis, $\pi = 0.30$. We want to test to see if $\pi < 0.30$. Thus:

$H_0: \pi = 0.30$
$H_a: \pi < 0.30$

b.) We will reject $H_0$ if the probability of observing something as or more extreme than 15 out of 60 dropouts ($\hat{\pi} = 15/60 = 0.25$) under the assumption of the null hypothesis is less than or equal to 0.05. That is, we reject $H_0$ if:

$P(\hat{\pi} \le 0.25 \mid \pi = 0.30) \le 0.05$

c.) We will reject $H_0$ if the observed value of $X$, the number of dropouts out of 60, is smaller than some cutoff value of $X$. That is, it might be the case that $X$ would have to be smaller than, say, 13 in order for us to reject the null hypothesis.

d.) Based on sample evidence, we (choose from below)

a. Reject $H_0$ in favor of $H_a$

b. Fail to reject $H_0$. We do not accept $H_0$ as true, but we don’t have evidence to conclude otherwise.
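For readers who want to see what the decision rule in Example 4 could look like numerically, the sketch below treats the number of dropouts out of 60 as a binomial count under the null hypothesis. This exact binomial calculation is one reasonable way to do it and is an assumption of the sketch, not necessarily the method developed later in the book; the helper name `binom_cdf` is made up for the example.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) when X counts successes in n independent trials with success chance p."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k + 1))

n, pi_0, alpha = 60, 0.30, 0.05     # values taken from Example 4
observed = 15                       # dropouts seen by the board

# Probability of a result as or more extreme (as low or lower) if H0 is true
p_value = binom_cdf(observed, n, pi_0)

# Largest dropout count whose lower-tail probability under H0 is still at most alpha
cutoff = max(k for k in range(n + 1) if binom_cdf(k, n, pi_0) <= alpha)

print(f"P(X <= {observed} | pi = {pi_0}) = {p_value:.4f}")
print(f"Reject H0 only if X <= {cutoff}")
print("Reject H0" if p_value <= alpha else "Fail to reject H0")
```

Under this binomial assumption, 15 dropouts is not extreme enough: the tail probability comes out well above 0.05, so the board would fail to reject $H_0$. Notice that the cutoff is computed from $\alpha$, $n$, and the null value alone, before looking at the observed count.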

As we see from the above example, our hypothesis test needs to have a structured layout. We need to know ahead of time what we’ll do.

It is tempting, but we cannot determine our rejection criterion based on what the sample data tells us! In practice, you can adopt this type of philosophy, but you increase the error rate.

Consider, for example, the scenario wherein you take an exam for a biology class. You get the results back and look at what you missed. You say, “oh, of course I should have put that! I knew that!” If you told that to the instructor, she may say, “sorry, you didn’t demonstrate that on the exam.” Without surprise, we expect this response. Why? Because it is the test that helps to determine our level of understanding! It is not the other way around. If the instructor allowed you to change your answer, then the test wouldn’t really be demonstrating what you knew at the time of the test. A hypothesis test is quite analogous. We carry one out because we have a hunch.

Always think back to this statement: 

If you dig long enough in your data, you will find something! 

This statement, however, casts the digging process in a negative light, since digging after the fact does not answer the questions we set out to decide. In fact, it creates a high likelihood that we are observing a coincidence and not a solid finding at all! Thus, we greatly increase the probability of error!


Structure of a Hypothesis Test 

The following should be included in all hypothesis tests: 

1. A statement of competing hypotheses ($H_0$ vs. $H_a$)

2. A decision rule for rejecting $H_0$ (based on $\alpha$)

3. A decision criterion rule (the physical value of the random variable that represents the required “extremeness” of our observed sample value)

4. A conclusion statement (what the sample data tells you to conclude)

As an important note: we never say, “accept $H_0$ as true.” Instead, we remain accurate and say that there is simply not enough evidence to reject it. Think about this as “innocent” vs. “not guilty.” Just because a court cannot prove that someone is guilty, they don’t say that he is innocent. Instead, they give the verdict of “not guilty.”


1. In your own words, explain the difference between the null and alternative hypotheses. 

Also, explain how to identify each in a research study. 

2. Explain why we assume that the null hypothesis is true before testing a hypothesis. 

3. It is believed that 7% ($\pi = 0.07$) of an organic corn crop is lost to insect infestations. An

organic farmer has devised a system that may result in less insect destruction. He would 

like to test this idea with a hypothesis test. Write the competing hypotheses. 

4. A high school statistics class typically gets an average of scores out of 5 on an 

Advanced Placement (AP) exam. Over the recent several years, he has found that his 

students’ scores were higher. He would like to test this hypothesis. Write the competing

hypotheses. 

5. A snack dispenser has a failure rate of over a 5-year span. After changes to the 

machine, the manufacturer would like to know whether or not this has changed. Write 

competing hypotheses. 

6. What does it mean to say that when describing a Type I error 

7. Based on the “Structure of a Hypothesis Test” blue box, fully describe the hypothesis test 

for the scenario in question 3, assuming and that he finds that only 52 out of 

1000 bushels of his crop are lost to insect infestations. 


for the scenario in question 4, assuming and that he finds his students have 

been averaging ̅ on the test. 



for the scenario in question 5, assuming and that she finds the failure rate is 16 

out of 1000 machines. 

10. In real-world terms, describe what Type I and II errors would mean for each of questions 

3, 4, and 5. 

11. Why does the risk of a Type II error increase as we decrease $\alpha$?


APPENDIX A 

Answers to Select Problems 

1.1 Data and Their Uses 

1.
   a. Nominal; ice cream names cannot be ordered, in general.
   b. Interval; temperatures have order and the differences in temperature can be reasonably discussed. For example, to talk about a difference is meaningful.
   c. Ratio; absolute 0 exists since there can be no balance at all. Additionally, it makes sense to talk about ratios. For instance, accounts receivable balances can be, say, 20% higher this month as compared to last.
   d. Ordinal; there is an ordering, though we can’t talk about the number 1 candidate as being 2 better than the number 3 candidate. This is because the difference of 1 might not necessarily be the same from 1 to 2 as it would be from 2 to 3. Maybe candidate 3 is a far third.

2.
   a. 2,121 elements in the sample
   b. Length of time is a quantitative variable, since it is a numerical measure.

3.
   a. 15,000 elements in the sample
   b. A proportion is a quantitative variable, since it is a ratio.

4.
   a. Observational; the number of animals a family has is not being assigned. Instead, families are simply being asked about how many animals they have.
   b. The study might have considered families with horses. People with horses likely live on the outskirts of a big city, perhaps being exposed to less pollen. Also, maybe more families have pets because their children do not seem to have allergies to them.

5.
   a. Observational; the researchers are looking at preexisting habits. They are not attempting to alter the habits to determine what effect doing so might have on measures of reading ability and short-term memory.
   b. No; perhaps those who watch more television also have other habits that lead them to scoring poorly on such assessments.

6.
   a. Observational; the opinions of the doctors are not being altered in any way.
   b. There is a nonresponse bias since not all participants responded. Thus, it might be the case that those with the strongest opinions decided to come forward, whereas the other 17,000 who didn’t respond might have influenced the poll in a different way.


1.2 Descriptive VS. Inferential Statistics 

1.
   a. $4 million/day
   b. If all days had the same gross revenue, $4 million would be earned each day.
   c. $7.6 million
   d. The amount of gross revenue earned on a given day differs from that of another day by as much as $7.6 million.
   e. The film has generated an average of $4 million/day. There is much instability in this average in that the actual gross revenue has varied from $1.6 million to $9.2 million, a range of $7.6 million. It is dangerous to place too many bets on what might happen next, due to the extreme variability in revenues.

2.
   a. 18 randomly selected college students
   b. All college students
   c. Answers vary; spending on clothing, style preference, etc.
   d. Inferential; they wish to make conclusions about the population of all college students

3.
   a. 250 packages of cheese selected
   b. All packages of cheese produced by the company
   c. 248 or more must pass

4. Consider the following two datasets with a range of 30: 

0, 1, 2, 2, 3, 2, 28, 29, 30 

0, 1, 2, 3, 4, 3, 4, 2, 1, 30

While both have a range of 30, the first dataset has most of its data towards the outer ends of the dataset. In the second dataset, there appears to be tightly spaced data, followed by one outlier of 30. The second dataset is, overall, less spread out.

5. The researchers are trying to use CGCC students as a representative population of all 

college students. This presents a bias, in that CGCC probably does not accurately 

represent all college students. 

2.4 Descriptive Statistics – Variability 

1.
   a. Standard deviation = 5.9; on average, beers in this sample are within 5.9 calories of the average calorie content.
   b. Q3 – Q1 = 4.75. The middle 50% of beer calories in this sample have a range of 4.75 calories. Specifically, they range from 29 calories (first quartile) to 33.75 calories (third quartile).
   c. The skewness value is 0.14. This means the distribution is slightly skewed to the right.

2.

a. Range = 64.3; Interquartile Range = 27.7 (71.9 – 44.2); Standard Deviation = 

18.7. The difference between the highest and lowest percentage is 64.3%, telling 

us that the percentage of school enrollees varies greatly across Central Africa. 

However, this does not ensure that there is not a single outlier creating this wide 

spread. The interquartile range is 27.7%, telling us that the middle 50% of 

percentages span from 44.2% to 71.9%, still a considerable spread. The standard 

deviation verifies that percentages are quite variable, since, on average, the 

percentage of school enrollees varies by 18.7% points about the mean. 

b. The interquartile range is 27.7%, telling us that the middle 50% of percentages 

span from 44.2% to 71.9%, still a considerable spread. The standard deviation 

verifies that percentages are quite variable, since, on average, the percentage of 

school enrollees varies by 18.7% points about the mean. 

c. 

Enrollment 

Mean 60.9 

Standard Error 3.9 

Median 61.9 

Mode 61.9 

Standard Deviation 18.7 

Sample Variance 351.2 

Kurtosis -0.4 

Skewness 0.4 

Range 64.3 

Minimum 34.6 

Maximum 98.9 

Sum 1401.2 

Count 23.0 

d. Yes, it is skewed to the right, since the skewness value is 0.4, a positive value. 

e. [Histogram: Percent Enrolled; vertical axis: Percentage, 0% to 35%]

The majority of people in Central Africa are not enrolled in school, since it is 

predominantly the case that fewer than 50% of people in each nation attend school. 

f. We know that $\bar{x} = 60.9$ and $s = 18.7$. A percentage of 79.6% is $\frac{79.6 - 60.9}{18.7} = 1$ standard deviation from the mean. We would expect that at least

( ) 

of all enrollment percentages would be within one standard deviation of the mean. 

This is considered to be a very normal percentage (it is still within the “average” 

spread). 

3. 

a. The range is 5750, which tells us that there is a difference of 5,750 feet from the 

shortest street to the longest street. The interquartile range is 2170, telling us that 

the middle 50% of all street lengths range from 980 feet to 3,150 feet. The 

standard deviation is 1634, telling us that, on average, a street varies by 1,634 feet 

from the mean street length. 

b. The interquartile range is 2170, telling us that the middle 50% of all street lengths 

range from 980 feet to 3,150 feet. The standard deviation is 1634, telling us that, 

on average, a street varies by 1,634 feet from the mean street length. 

c. 



Street Lengths 

Mean 2231.4 


Median 2100.0 

Mode 960.0 



Kurtosis -0.2 

Skewness 0.8 

Range 5750.0 


Maximum 5850.0 

Sum 104874.0 

Count 47.0 

d. The distribution is strongly skewed to the right. 

e. [Histogram: Street Length; horizontal axis: Feet, in classes 100-1099, 1100-2099, 2100-3099, 3100-4099, 4100-5099, 5100-6099; vertical axis: 0.00% to 35.00%]

f. $z = \frac{79.6 - 2231.4}{1634} \approx -1.3$. This means that a street length of 79.6 feet would be about 1.3 standard deviations below the mean. By C.T., at least $\left(1 - \frac{1}{1.3^2}\right) \approx 41\%$ of all street lengths in the sample are guaranteed to fall within 1.3 standard deviations of the mean. This is not unusual.

4. Answers vary;


Symmetric: [Histogram over the classes 100 to 120, 120 to 140, 140 to 160, 160 to 180, 180 to 200; vertical axis: 0 to 35]

Bimodal (two peaks): [Histogram over the same five classes; vertical axis: 0 to 30]

Right Skewed: [Histogram over the same five classes; vertical axis: 0 to 35]

Left Skewed: [Histogram over the same five classes; vertical axis: 0 to 35]

5. 

a. 


Repair Cost 

Mean 971 

Standard Error 382 

Median 738 

Mode - 

Standard Deviation 1,207 

Sample Variance 1,455,875 

Kurtosis 7 

Skewness 2 

Range 4,194 

Minimum - 

Maximum 4,194 

Sum 9,707 

Count 10 

Due to the great variability in repair costs, it would be most appropriate to use the median as the measure of center. It also reflects the fact that most repair costs, if there are any, tend to be between $600 and $1000. Since the standard deviation describes movement about the mean, it is not appropriate to use it in combination with a median. Thus, we should probably use the interquartile range to describe the middle 50% of repair costs.

b. $z = \frac{4194 - 971}{1207} \approx 2.7$. The repair cost of $4,194 is nearly 3 standard deviations above the mean. This means that it is an outlier cost.

c. According to C.T., at least $\left(1 - \frac{1}{2.7^2}\right) \approx 86\%$ of the data in this data set should be within 2.7 standard deviations of the mean. Thus, at most about 14% of the data can lie outside of 2.7 standard deviations of the mean. This tells us that a repair cost of $4,194 is fairly unusual.
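The arithmetic behind this Chebyshev’s Theorem (C.T.) answer can be sketched in a few lines. The mean, standard deviation, and maximum below are read from the Repair Cost summary table above; the function name `chebyshev_lower_bound` is just a label chosen for this illustration.

```python
def chebyshev_lower_bound(k):
    """Minimum share of any data set lying within k standard deviations of the mean (k > 1)."""
    return 1 - 1 / k**2

mean_cost, sd_cost, high_cost = 971, 1207, 4194   # from the Repair Cost summary table

k = (high_cost - mean_cost) / sd_cost             # standard deviations above the mean
bound = chebyshev_lower_bound(k)

print(f"k = {k:.1f} standard deviations")
print(f"At least {bound:.0%} of the data lie within k standard deviations of the mean")
print(f"At most {1 - bound:.0%} of the data lie farther out")
```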

6. 



CC Ratios 

Mean 12.35 


Median 12.91 

Mode #N/A 



Kurtosis -0.50 

Skewness -0.60 

Range 6.03 


Maximum 14.84 

Sum 123.47 

Count 10.00 

There do not appear to be extreme outliers, since the mean and median are close. However, based on the mean being smaller than the median, and the skewness value being negative, there is a slight left-skew to the distribution. The standard deviation tells us that average CC ratios are within 0.62, or 62% points, of the mean. We verify these notions by considering the histogram.

[Histogram: CC Ratio Distribution; horizontal axis: CC Ratio; vertical axis: 0.00% to 45.00%]

We should also be careful to note that there is not very much data available, which is why we don’t distinctly see a skew.

7. 



Nitrous Oxide (thous. Tons) 

Mean 46.35 


Median 36 

Mode 40 



Kurtosis 0.09474 

Skewness 0.949789 

Range 136 

Minimum 0 

Maximum 136 

Sum 927 

Count 20 

[Histogram: Nitrous Oxide Distribution; horizontal axis: Nitrous Oxide (thous. Tons); vertical axis: 0% to 30%]

The distribution of nitrous oxide emissions is skewed to the right, indicating that most states have relatively low emissions, whereas fewer states have relatively high emissions. We note that the median is a good measure, indicating that 36 thousand tons is the 50th percentile. There are two outliers of 136 thousand tons. For this value, $z \approx 2.1$, indicating that at least around 75% of all values in the data set are within 2.1 standard deviations of the mean. Thus, 136 can be considered a mild outlier.

3.2 Joint Probability 


1. See Video Solution 

2. 

a. About 85% of all the past calls were for medical assistance. 

b. P(call is not for medical assistance) = 1 – 0.85 = 0.15. 

c. P(two successive calls are both for medical assistance) = (0.85)(0.85) = 0.7225. 

d. P(first call is for medical assistance and second call is not for medical assistance) 

= (0.85)(0.15) = 0.1275 

e. P(exactly one of two calls is for medical assistance) = P(first call is for medical 

assistance and the second is not) + P(first call is not for medical assistance but the 

second is) = (0.85)(0.15) + (0.15)(0.85) = 0.255. 

f. Probably not. There are likely to be several calls related to the same event - 

several reports of the same accident or fire that would be received close together 

in time. 
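As a quick check of the arithmetic in parts b through e, the sketch below multiplies the probabilities directly. It assumes, as those answers do, that successive calls are independent (the very assumption that part f questions); the variable names are made up for the illustration.

```python
p_medical = 0.85                 # share of past calls that were for medical assistance
p_not_medical = 1 - p_medical    # complement rule

# Independence assumed: joint probabilities are products of the individual ones
both_medical = p_medical * p_medical
first_only = p_medical * p_not_medical
exactly_one = first_only + p_not_medical * p_medical

print(f"P(not medical)         = {p_not_medical:.2f}")
print(f"P(both medical)        = {both_medical:.4f}")
print(f"P(first only medical)  = {first_only:.4f}")
print(f"P(exactly one medical) = {exactly_one:.3f}")
```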

3. (“ ” “ ” “ ”) . / . / . / 

4. See Video Solution 

5. 

a. The "expert" assumed that the positions of the two valves were independent. 

b. The position of the two valves is not independent but rather dependent. The 

effect of the error makes the probability much smaller. The actual probability is 

compared to . 

6. 

a. Assuming that whether Jeanie forgets to do one of her “to do” list items is 

independent of whether or not she forgets any other of her “to do” list items, the 

probability that she forgets all three errands = (0.1)(0.1)(0.1) = 0.001. 

b. ( ) 

( ) 

c. P(remembers the first errand, but not the second or the third) = (0.9)(0.1)(0.1) = 

0.009. 

5.1 The Ideas Behind the Continuous Distribution 

1. 

a. 


[Bar chart: Pizza Size Distribution; horizontal axis: Size (inches), with values 12, 14, 16, 18; vertical axis: Probability, 0 to 0.6]

b. ( ) 

c. ( ) 

d. , - ( ) ( ) ( ) ( ) inches per pizza, on 

average. 

e. ( ) (doesn‟t include the 12-inch pizza!) 

2. 

3. 

4. 

a. ( ) 

b. ( ) 

a. , so ( ) for 

b. ( ) 

c. ( ) 

d. $\mu = 5$; on average, the professor dismisses class 5 minutes after the hour.

e. $\sigma \approx 2.9$; on average, the amount of time after the hour at which the professor dismisses the class varies by 2.9 minutes about the mean.

f. ( ) ( ) 

a. ( ) 


. ( ) ( ) 

c. ( ) ( ) ( ) 

d. ( ) 

5. 

a. , so ( ) for 

b. ( ) 

c. ( ) 

d. Both probabilities are equal, P(X < c) = P(X ≤ c), because, in a continuous distribution, the 

probability that X equals any single value c is 0. 


e. ( ) 

f. E[X] = 26; the average response time is 26 minutes. 

g. SD[X] ≈ 4.6; on average, wait times deviate from the mean wait time by 4.6 minutes. 

h. . Thus, we want ( ) ( 

) . 

5.2 The Normal Distribution 

1. 

a. 

b. 


c. 

d. 


2. 

a. The long-run proportion of all children born in the U.K. expected to weigh more 

than 10 lbs. is 0.0186. 

b. The long-run proportion of all children born in the U.K. expected to weigh at 

most 10 lbs. is 0.9814. 


c. The long-run proportion of all children born in the U.K. expected to weigh 

between 5 and 6.5 lbs. is 0.1837. 

d. The long-run proportion of all children born in the U.K. expected to weigh 

between 1 and 2 lbs. is 0.0000. 


e. 20% of all children are expected to be born weighing less than 6.5 lbs. 

6. In a recent year, Scholastic Aptitude Test (SAT) scores for all college-bound seniors in 

the United States were such that points and points (SOURCE: 

http://www.collegeboard.com). 

a. 50% of students scored less than how many points? 

b. 50% of students scored more than how many points? 

c. In order to be in the top 10% of SAT-takers, what score would one have to 

achieve? 

d. What score do the lowest 10% score below? 

e. The middle 50% of students scored between what two values? 


3. 

a. 50% of students score less than 1518 on the test. 

b. By complementary probability, 50% of students should score more than 1518. 

c. You would have to score about 1913 points. 

d. About 1123. 


e. The middle 50% score between about 1310 and 1726. 
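These percentile answers can be reproduced with an inverse normal calculation. The sketch below takes the mean of 1518 from part (a); the standard deviation of 308 is an assumption, chosen only because it reproduces the answers above, since the value given in the problem is missing from this text.

```python
from statistics import NormalDist

# mu comes from answer 3a; sigma = 308 is an ASSUMED value chosen because it
# reproduces the answers above (the problem's own sigma is missing here).
sat = NormalDist(mu=1518, sigma=308)

median = sat.inv_cdf(0.50)                           # parts a and b
top_10_cutoff = sat.inv_cdf(0.90)                    # part c
bottom_10_cutoff = sat.inv_cdf(0.10)                 # part d
middle_50 = (sat.inv_cdf(0.25), sat.inv_cdf(0.75))   # part e

print(round(median), round(top_10_cutoff), round(bottom_10_cutoff),
      [round(x) for x in middle_50])
```

`inv_cdf` plays the role of the "probability calculator" run in reverse: it takes an area to the left and returns the score.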

4. 

a. 


b. The Empirical Rule is a summary of what we have done above. It is a nice rule of thumb. 
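As a quick check of the Empirical Rule, the areas within 1, 2, and 3 standard deviations of the mean can be computed directly from the standard normal distribution. This is a small illustrative sketch using Python's standard library, not the calculator used in the course.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

areas = {}
for k in (1, 2, 3):
    # Area within k standard deviations of the mean.
    areas[k] = z.cdf(k) - z.cdf(-k)
    print(k, round(areas[k], 4))
```

The three areas come out to roughly 68%, 95%, and 99.7%, which is the rule of thumb itself.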

5. 

a. The distribution would maintain its exact shape, though would be shifted 10 units 

to the right. 

b. The distribution would become wider and have a lower peak. This must happen to 

make sure the area is still 1 when the distribution becomes wider. 

c. The distribution would become narrower and have a higher peak. If a distribution 

becomes narrower, its height must increase to maintain an area of 1. 

d. The mean, μ, determines where the distribution is centered without altering its 

shape. The standard deviation, σ, will make a distribution wide and low-peaked if 

it is large, and will make a distribution narrow and high-peaked if it is small. 

6.1 The Sampling Distribution for x̄ 

1. Answers vary 

2. Answers vary – emphasis on the ability to have a population distribution with any 

unknown shape. 

3. 

a. 0.2525 

b. 0.2514 

c. 0.9044 

d. 95.1 and 104.9 

4. 

a. 0.0272 

b. This might indicate that the production process is outside of the norm. This type 

of average is unlikely in a sample of size where The company 

should investigate why the average thickness of its glass samples is so high. 

5. 

a. It should be approximately normal, regardless of the distribution of revenues. 

b. 0.3869 

c. The standard deviation of means would change from $6,957 to $4,400. This 

would change ( ) . This makes sense, since the 

distribution of means is less spread, and so there will be fewer mean sales 

amounts beyond $420,000. 

d. $421,255.50; If the team averages more than this amount for each team member, 

then they will receive the paid vacation days. 
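The effect described in part (c) can be illustrated numerically. The sketch below assumes a population mean of $418,000; that figure is not given in this key and is used only because it reproduces the 0.3869 reported in part (b). The function name `prob_mean_exceeds` is likewise illustrative.

```python
from statistics import NormalDist

# ASSUMPTION: a population mean of $418,000 is not given in this key; it is
# used here only because it reproduces the 0.3869 reported in part (b).
mu = 418_000
threshold = 420_000

def prob_mean_exceeds(sd_of_means):
    """P(sample mean > threshold) when the sample mean is approximately normal."""
    return 1 - NormalDist(mu, sd_of_means).cdf(threshold)

p_before = prob_mean_exceeds(6_957)   # original standard deviation of means
p_after = prob_mean_exceeds(4_400)    # smaller standard deviation of means

print(round(p_before, 4), round(p_after, 4))
```

Under these assumptions the second probability is smaller than the first, which matches the reasoning in part (c): a tighter distribution of means leaves less area beyond $420,000.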

6. We know that SD[x̄] = σ/√n, so σ = SD[x̄] · √n ≈ 2,906.89. A person's 

income varies, on average, by about $2,906.89 from the population average of incomes. 

7. 

a. rooms and rooms (NOTE: be sure to use STDEV.P() since this is a 

population standard deviation we want) 

b. It should be approximately normal based on the Central Limit Theorem; the 

sample size of 30 satisfies the minimum required sample size to meet normality 

assumptions. 

c. Answers will vary slightly due to sampling variability of the simulation process. 

We see that E[x̄] ≈ μ, as expected. We also see that SD[x̄] ≈ σ/√n, which is what we 

obtained via simulation. 

d. Answers will vary slightly due to sampling variability of the simulation process; 

again, SD[x̄] ≈ σ/√n. 

e. Answers will vary slightly due to sampling variability of the simulation process; 

again, SD[x̄] ≈ σ/√n. 

f. The population standard deviation can be thought of as the standard deviation of the 

distribution of means from samples of size n = 1. That is, SD[x̄] = σ/√1 = σ. Since it is the smallest 

possible sample size, it will have the highest degree of variability. 

g. 0.000 or about 0% chance 

h. As with tossing a coin repeatedly, when something is repeated over and over 

again, the amount of variation in the outcomes becomes relatively small. That is, 

any mild outliers get averaged into a large sample of typical values, and their effect 

is dispersed. In small samples, the opposite holds: deviant values can strongly 

distort the sample mean. 
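The relationship SD[x̄] ≈ σ/√n that runs through this problem can be seen in a small simulation. The sketch below uses a hypothetical, deliberately skewed population rather than the room data from the exercise, and the helper `simulate_sd_of_means` is an illustrative name; results vary with the seed.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical, deliberately skewed population (not the book's room data).
population = [random.expovariate(1 / 50) for _ in range(20_000)]
sigma = statistics.pstdev(population)   # population SD, like STDEV.P in Excel

def simulate_sd_of_means(n, reps=2_000):
    """Draw `reps` samples of size n (with replacement); return the SD of their means."""
    means = [statistics.fmean(random.choices(population, k=n)) for _ in range(reps)]
    return statistics.pstdev(means)

for n in (1, 30, 100):
    # Simulated SD of the sample means next to the theoretical sigma / sqrt(n).
    print(n, round(simulate_sd_of_means(n), 2), round(sigma / n ** 0.5, 2))
```

With n = 1 the simulated value should sit close to σ itself, and it should shrink as n grows, in line with parts (f) and (h).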


6.2 Confidence Interval for x̄ 




3. 

a. No, the sample size is 10, which is less than the minimum required (30). 

b. ( ) 

c. We are 95% confident that the population average labor cost is between $109.6 

billion and $227.6 billion. 

d. About 0.213 

4. 

a. Yes, since they can be 95% confident that the average revenue per camera will be 

between $654.51 and $752.44. 

b. No, since they can be 99% confident that the average revenue per camera will be 

between $637.42 and $768.01, which includes the possibility of the average being 

lower than $640. 

c. Yes, the sample size is 30, which is the minimum required sample size for the 

CLT results to be applied. 

d. We know that E[x̄] = μ, which we are estimating by x̄. That is, we are assuming 

the sample mean is the population mean for the basis of our interval. Here, 

x̄ is the center of our interval. Similarly, SD[x̄] = σ/√n. We are using s to estimate σ. Thus, our 

estimate of SD[x̄] is s/√n. Using our probability calculator, we find: 

Our 95% confidence interval would be 652.1 to 755.1, which is close to our 

bootstrap confidence interval. It is a bit wider than we would like. 

e. Here we have that df = n − 1 = 29. We have 5% to split between the tails. 

Thus, 2.5% is in each tail. We find that t ≈ ±2.045 (same number of standard 

deviations from the mean to each tail, since the distribution is symmetric). 

Our estimate of the mean is x̄ and our estimate of SD[x̄] is s/√n. So our interval will be 

x̄ ± t · s/√n, 

where t ≈ 2.045. 

This is a bit wider, accounting for the extra variability in estimating μ and σ. 
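To see why the t-based interval is wider, the two intervals can be computed side by side. In the sketch below the sample mean and standard error are assumptions backed out from the 652.1-to-755.1 interval in part (d), and 2.045 is the usual table value for a t distribution with 29 degrees of freedom and 2.5% in each tail.

```python
from statistics import NormalDist

z_crit = NormalDist().inv_cdf(0.975)   # about 1.96
t_crit = 2.045                         # t table value, 29 df, 2.5% per tail

# ASSUMED inputs, backed out from the 652.1-to-755.1 interval in part (d):
# the midpoint as the sample mean and half-width / z_crit as the standard error.
x_bar = (652.1 + 755.1) / 2
std_err = (755.1 - 652.1) / 2 / z_crit

def interval(center, crit, se):
    """Symmetric interval: center plus or minus crit standard errors."""
    return (center - crit * se, center + crit * se)

z_int = interval(x_bar, z_crit, std_err)
t_int = interval(x_bar, t_crit, std_err)
print([round(v, 1) for v in z_int], [round(v, 1) for v in t_int])
```

The only difference between the two intervals is the multiplier, so the t interval is wider by the ratio 2.045 to about 1.96.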

6.3 Confidence Interval for p̂ 

7.1 The Concept Behind Hypothesis Testing 

1. The null hypothesis is assumed to be true and is usually based on what has been observed 

before. The alternative hypothesis is what we would like to test, which is something that 

would challenge past observations or assumptions about a population. 

2. We assume it is true because it is based on past observations or research. For example, if 

the Census Bureau finds that 35% of Americans enjoy hypothesis testing, then this is 

typically based on some fairly extensive research. If a researcher believes this rate is 

greater in his community, then he can test his alternative hypothesis. 

3. 

4. 

5. 


6. This is the probability that we reject the null hypothesis when it is, in fact, true. That is, 

α = P(reject H0 | H0 is true) = 0.05. This allows us to be 95% confident that we fail to reject 

H0 when it is true, a correct decision. 

7. 

1) Hypotheses: H0: p = 0.07 versus Ha: p < 0.07, where p is the proportion of crop 

destroyed by insects under the farmer's new method. 

2) Decision Rule: We will reject the null hypothesis when the likelihood of 

observing something as small or smaller than 52 out of 1000 bushels is no 

larger than a 1% probability, under the assumption of the null hypothesis. That is, 

P(p̂ ≤ 52/1000 | H0 is true) ≤ 0.01. 

3) We will reject H0 if the observed value of p̂ is smaller than some cutoff 

value of p̂. 

4) Based on the sample evidence, we will either: 

a. Reject H0 in favor of Ha: less than 7% of insect-related crop destruction for the 

farmer's new method. 

b. Fail to reject H0. We do not have sufficient evidence to conclude that the 

farmer's new method is better than his old method. 

8. 

1) Hypotheses: 

2) Decision Rule: We will reject the null hypothesis when the likelihood of 

observing something as large or larger than the observed x̄ is no larger than a 5% 

probability, under the assumption of the null hypothesis. That is, 

P(x̄ ≥ observed value | H0 is true) ≤ 0.05. 

3) We will reject H0 if the observed average, x̄, is larger than some cutoff 

value of x̄. 

4) Based on the sample evidence, we will either: 

a. Reject H0 in favor of Ha: out of 5 questions are answered correctly by 

his students (as of recent observations). 

b. Fail to reject H0. We do not have sufficient evidence to conclude that the 

instructor's more recent students do better on the AP exam than his former 

students. 

9. 

1) Hypotheses: H0: p = 0.015 versus Ha: p ≠ 0.015, where p is the proportion of 

machines that fail. 

2) Decision Rule: We will reject the null hypothesis when the likelihood of 

observing something as small/large or smaller/larger than 16 out of 1000 

machines is no larger than a 1% probability, under the assumption of the null 

hypothesis. That is, 

P(p̂ is at least as far from 0.015 as 16/1000 | H0 is true) ≤ 0.01. 

3) We will reject H0 if the observed value of p̂ is smaller or larger than some 

cutoff values of p̂. That is, if it is smaller than some value, say a, or larger than 

some value, say b, then we will reject H0. Remember, we set up a hypothesis 

first, then do the test. Even though 16 is larger than 15 out of 1000, we did not 

know this to begin with. We are still testing whether or not this value is 

significantly different and do not care about the direction of the difference. 

4) Based on the sample evidence, we will either: 

a) Reject H0 in favor of Ha: p ≠ 1.5% of machines fail. That is, either 

significantly fewer of them fail, or significantly more 

of them fail. 

b) Fail to reject H0. We do not have sufficient evidence to conclude that new 

machines fail more or less often when compared to the old machines. 
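The decision rule in problem 7 can be sketched with an exact binomial calculation. The numbers 52, 1000, and 1% come from the wording of that problem; the null rate of 7% is an assumption taken from problem 10, part 1, and the function name `binom_cdf` is illustrative. Treating each bushel as an independent trial is also an assumption of this sketch.

```python
from math import comb

# Setup from problem 7's wording: 52 damaged bushels out of 1000 and a 1%
# significance level. The null rate of 7% is ASSUMED from problem 10, part 1.
n, p0, alpha, observed = 1000, 0.07, 0.01, 52

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# p-value: chance of seeing 52 or fewer if the null hypothesis is true.
p_value = binom_cdf(observed, n, p0)

# Cutoff: the largest count whose lower-tail probability is still <= alpha.
cutoff, running = -1, 0.0
for k in range(n + 1):
    running += comb(n, k) * p0**k * (1 - p0)**(n - k)
    if running > alpha:
        break
    cutoff = k

decision = "reject H0" if observed <= cutoff else "fail to reject H0"
print(round(p_value, 4), cutoff, decision)
```

Comparing the observed count with the cutoff and comparing the p-value with the significance level are two views of the same rule, so they always lead to the same decision.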

10. 

1) Type I: We conclude the farmer's method reduces crop destruction, when there is 

no difference; Type II: We conclude the farmer's method is no different than the 

old method, when in fact there is less than 7% crop destruction with his new 

method. 

2) Type I: We conclude the instructor's students perform better than his former 

students, when in fact there is no difference; Type II: We conclude that his new 

students perform just as well as his former students, when in fact they do better. 

3) Type I: We conclude that the new machines fail more or less than the former 

machines, when in fact there is no difference; Type II: We conclude that there is 

no difference between the failure rates of the new and old machines, when in fact 

there is a significant difference. 

11. Increasing the confidence level, 1 − α (that is, decreasing α), means we will reject H0 less often, as we set more stringent conditions upon 

the rejection process. If we reject less often, then there is an elevated likelihood that we 

may fail to reject when, in fact, we should. This is precisely what a Type II error is.
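The trade-off described in problem 11 can be illustrated numerically: the stricter the rejection rule (the smaller the significance level), the larger the chance of a Type II error. The sketch below uses a normal approximation for a one-sided test of a proportion; the sample size, null rate, and especially the "true" rate of 0.05 are hypothetical values chosen for illustration.

```python
from statistics import NormalDist

# Hypothetical one-sided test of H0: p = 0.07 against Ha: p < 0.07 with
# n = 1000, where the true rate is ASSUMED to be 0.05 for illustration.
n, p0, p_true = 1000, 0.07, 0.05
z = NormalDist()

def type_ii_error(alpha):
    """P(fail to reject H0) when the true proportion is p_true (normal approx.)."""
    se_null = (p0 * (1 - p0) / n) ** 0.5
    se_true = (p_true * (1 - p_true) / n) ** 0.5
    cutoff = p0 + z.inv_cdf(alpha) * se_null      # reject when p-hat < cutoff
    return 1 - NormalDist(p_true, se_true).cdf(cutoff)

for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(type_ii_error(alpha), 3))
```

As the significance level drops from 10% to 1%, the cutoff moves further from the null value, so a true improvement is missed more often.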
