Stat 20: Intro to Probability and Statistics - Lecture 3: Types of Data ...
Stat 20: Intro to Probability and Statistics
Lecture 3: Types of Data and Displays
Tessa L. Childers-Day
UC Berkeley
28 January 2014
Today’s Goals Kinds of Data Displaying Qualitative Data
Outline
1 Today’s Goals
2 Kinds of Data
3 Displaying Qualitative Data
By the end of this lecture...
You will be able to:
Define data types
Classify a given response as a type of data
Compare types of data
Comprehend displays of qualitative data
Example: High School Hobbies
(Hypothetical:) Imagine we are interested in exploring what kinds of hobbies high school students have. We construct a survey:
(1) Circle your gender: Female Male
(2) Circle your class level: Fresh Soph Jr Sr
(3) What is your GPA (0.0-5.0)?
(4) What is your height in inches?
(5) What is your weight in pounds?
(6) What is your favorite hobby?
How do the questions look? How should we sample students?
Now what?
We have seen how to design experiments, observational studies, and surveys. After they are carried out, what do we have?
Data:
Singular: an individual bit of information or a single response collected, e.g. a single person’s height is a data point (or data value)
Plural: a set or collection of information or responses, e.g. all of the heights collected in the course of an experiment form a data set
Example: High School Hobbies (cont.)
A selection of the hypothetical raw data appear below:
obs  gender  class level  gpa   height  weight  hobbies
1    male    jr           3.80  71      122     watching tv
2    female  soph         2.80  67      132     facebook
3    female  jr           3.90  62      148     baseball
4    male    jr           3.50  60      139     soccer
5    male    sr           2.60  66      133     yearbook
6    female  jr           1.60  65      147     baseball
7    male    sr           0.70  71      138     facebook
8    female  fresh        4.80  60      143     gymnastics
9    male    sr           4.00  72      132     hanging out
10   male    jr           1.30  72      146     reading
Ways to classify data
Level of Measurement:
Qualitative: Nominal, Ordinal
Quantitative: Interval, Ratio
Countability: Discrete, Continuous
Qualitative Data
In general, qualitative data are:
Categorical
Linguistic
Non-numeric
May be subjective or objective
Do we have categorical data in our example?
Nominal Data
A type of data that are simply names or labels. Cannot be ranked or ordered by value objectively.
Rocks: igneous, metamorphic, sedimentary
Clouds: cumulus, stratus, cirrus, nimbus
Names: Adam, Andrew, Bethany, Christopher
College Degree: yes/no
Currently Married: yes/no
Coin flip: heads/tails
Do we have nominal data in our example?
Ordinal Data
A type of data that are names or labels which can be sensibly ordered or ranked. Nominal data plus an ordered scale. The “distances” between each category are not known, but their ordering is known. Ties are allowed.
Cooking Ability: poor, adequate, fair, good, excellent
Letter Grade: A, B, C, D, F
Likert Scale: strongly disagree, disagree, neutral, agree, strongly agree
Horse Race Results: first, second, third, fourth
Frequency: never, sometimes, always
Movie Review: 0 stars to 5 stars
Do we have ordinal data in our example?
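To make "ordering is known, distances are not" concrete, here is a minimal Python sketch using the Likert scale listed above. The responses are invented for illustration, and the helper names (`RANK`, `sort_responses`, `median_response`) are hypothetical, not part of the lecture.

```python
# Ordered categories of the five-point Likert scale from the slide.
LIKERT_ORDER = [
    "strongly disagree",
    "disagree",
    "neutral",
    "agree",
    "strongly agree",
]

# Map each label to its rank. Only the order of the ranks is meaningful;
# the gaps between them say nothing about "distance" between categories.
RANK = {label: position for position, label in enumerate(LIKERT_ORDER)}


def sort_responses(responses):
    """Sort ordinal responses by category rank rather than alphabetically."""
    return sorted(responses, key=RANK.__getitem__)


def median_response(responses):
    """Return the (lower) median category; repeated responses are allowed."""
    ordered = sort_responses(responses)
    return ordered[(len(ordered) - 1) // 2]


# Hypothetical responses, invented for illustration only.
responses = ["agree", "neutral", "strongly agree", "disagree", "agree"]

print(sort_responses(responses))
print(median_response(responses))
```

A median is available because it needs only the ordering. A mean of the ranks would treat the categories as equally spaced, which ordinal data do not guarantee.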
Quantitative Data
In general, quantitative data are:
Numeric
Non-linguistic
Objectively Measurable
Do we have quantitative data in our example?
Interval Data
A type of data that are numeric and are naturally ordered, but have no meaningful zero. Distances are meaningful.
Temperature: 20° is hotter than 15° by the same amount that 15° is hotter than 10°, but 20° is not twice as hot as 10°
Time: The same amount of time elapses between 4:05 and 4:10 as between 5:12 and 5:17. 8:00 is not twice 4:00.
Do we have interval data in our example?
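A short Python sketch of why ratios fail on an interval scale. It assumes the temperatures above are in degrees Celsius (the slide does not name the unit); the conversion functions are standard, but their names are chosen here for illustration.

```python
def celsius_to_kelvin(celsius):
    """Convert to Kelvin, a scale whose zero is a true physical zero."""
    return celsius + 273.15


def celsius_to_fahrenheit(celsius):
    """Convert to Fahrenheit, another scale with an arbitrary zero."""
    return celsius * 9 / 5 + 32


# Assumption: the slide's temperatures are in degrees Celsius.
hot, mild, cool = 20, 15, 10

# Differences are meaningful on an interval scale: both gaps are 5 degrees.
gap_high = hot - mild
gap_low = mild - cool

# The ratio of the raw numbers changes with the choice of zero point,
# so "twice as hot" has no fixed meaning for Celsius or Fahrenheit.
celsius_ratio = hot / cool
fahrenheit_ratio = celsius_to_fahrenheit(hot) / celsius_to_fahrenheit(cool)
kelvin_ratio = celsius_to_kelvin(hot) / celsius_to_kelvin(cool)

print(gap_high, gap_low)
print(celsius_ratio, fahrenheit_ratio, round(kelvin_ratio, 4))
```

The same two temperatures give three different ratios depending on the scale. Only Kelvin, which has a meaningful zero, behaves like ratio data, and there 20 °C is only about 3.5% hotter than 10 °C.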
Ratio Data
A type of data that are numeric, naturally ordered, and have a meaningful zero. Distances and ratios are meaningful. Often physical in nature.
Speed in km/h, m/h, etc.
Money/debt in dollars, euros, etc.
Electrical charge in coulombs
Do we have ratio data in our example?
Countability
Applies only to numeric data
Discrete: countable, exactly measurable, equally spaced; e.g. number of coin flips that land heads, number of queens in a deck of cards, integers, money
Continuous: not countable, not exactly measurable, arbitrarily close, infinitely divisible; e.g. weight and length are continuous, but our ability to measure is finite, so recorded values are mostly rounded or approximate
Visual Displays of Information
The next step is to aggregate the data and display them in a picture or summary of some sort.
What sorts of displays have you seen before?
Visual Displays of Information (cont.)
Today: Qualitative Data
Data Tables: Distribution Tables, Contingency Tables
Pie Charts
Bar Charts
Word Clouds
Can work for quantitative data too, though not ideal
Data Tables
A frequency table (distribution table) simply displays the number (or percentage) of observations which meet a certain criterion.
Gender       Female  Male  Total
Frequency    247     253   500
Class        Fresh.  Soph.  Junior  Senior  Total
Frequency    122     125    130     123     500
GPA          0-1  1-2  2-3  3-4  4-5  Total
Frequency    51   80   167  123  79   500
Height (in)  60-63  63-66  66-69  69-72  Total
Frequency    106    173    141    80     500
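A frequency table is just a count per category. The sketch below tallies gender and class level for the ten sample rows shown earlier in the lecture (not the full 500 observations summarized above). The function name `frequency_table` is a hypothetical helper; `collections.Counter` is from the Python standard library.

```python
from collections import Counter

# (gender, class level) for the ten sample rows shown earlier in the lecture.
rows = [
    ("male", "jr"), ("female", "soph"), ("female", "jr"), ("male", "jr"),
    ("male", "sr"), ("female", "jr"), ("male", "sr"), ("female", "fresh"),
    ("male", "sr"), ("male", "jr"),
]


def frequency_table(values):
    """Return {category: (count, percentage of all observations)}."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: (n, 100 * n / total) for category, n in counts.items()}


gender_table = frequency_table(gender for gender, _ in rows)
class_table = frequency_table(level for _, level in rows)

for name, table in (("Gender", gender_table), ("Class", class_table)):
    print(name)
    for category, (count, percent) in table.items():
        print(f"  {category:<7} {count:>2}  {percent:.0f}%")
```

Either the count or the percentage can be displayed. The percentage is the count divided by the total number of observations.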
Data Tables (cont.)
Contingency tables (or cross-tabulations) display two or more variables at once. The frequency listed at the intersection of the column and row tells the number of observations which meet both criteria, while the totals along the sides (marginal totals) provide frequency distributions.
Gender (rows) by Class (columns):
Gender  Fresh.  Soph.  Junior  Senior  Total
Female  58      61     63      65      247
Male    64      64     67      58      253
Total   122     125    130     123     500
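The marginal totals can be recomputed from the interior cells, which is a quick consistency check on any contingency table. This is a minimal sketch using the cell counts above; the variable names are illustrative.

```python
# Interior cells of the Gender-by-Class contingency table above.
classes = ["Fresh.", "Soph.", "Junior", "Senior"]
cells = {
    "Female": [58, 61, 63, 65],
    "Male": [64, 64, 67, 58],
}

# Row margins: the frequency distribution of gender.
row_totals = {gender: sum(counts) for gender, counts in cells.items()}

# Column margins: the frequency distribution of class level.
col_totals = {
    level: sum(column) for level, column in zip(classes, zip(*cells.values()))
}

grand_total = sum(row_totals.values())

print(row_totals)
print(col_totals)
print(grand_total)
```

The row margins reproduce the Gender frequency table and the column margins reproduce the Class frequency table, so each margin is itself a frequency distribution.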
Pie Charts
Pie charts represent proportion (percentage of total) as sections of a circular “pie”. The area of each “slice” is the same percentage of the whole “pie” as its category is of the total.
[Figure: pie chart of GPA with slices Above A, A, B, C, D/F]
Pie Charts (cont.)
The usage of pie charts is discouraged in statistics for the following reasons:
Areas are difficult to compare between different shapes
Angles are difficult to compare
Enlarging a pie chart does not yield additional information
Small portions are hard to see/compare
What proportion do you think have a D/F compared with a C?
Pie Charts (cont.)
[Figure: pie chart of GPA with slices Above A, A, B, C, D/F]
Pie Charts (cont.)
[Figure: pie chart of GPA with slices Above A, A, B, C, D/F, now labeled with percentages; labels in extraction order: 16%, 10%, 16%, 25%, 33%]
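A slice's share of the pie is its category's share of the total. The sketch below turns the GPA counts from the frequency table earlier in the lecture into percentages and slice angles. It keeps the numeric bin labels, because the slides do not state how the bins map to the letter categories in the chart; that the pie chart was drawn from these same counts is an assumption.

```python
# GPA counts from the frequency table earlier in the lecture.
gpa_counts = {"0-1": 51, "1-2": 80, "2-3": 167, "3-4": 123, "4-5": 79}

total = sum(gpa_counts.values())

# Percentage of the total for each bin.
percentages = {gpa_bin: 100 * n / total for gpa_bin, n in gpa_counts.items()}

# Central angle (in degrees) of the matching pie slice.
angles = {gpa_bin: 360 * n / total for gpa_bin, n in gpa_counts.items()}

# Whole-number percentages, as they would appear as slice labels.
rounded = {gpa_bin: round(p) for gpa_bin, p in percentages.items()}

for gpa_bin in gpa_counts:
    print(f"{gpa_bin}: {percentages[gpa_bin]:.1f}%  {angles[gpa_bin]:.1f} degrees")
```

The rounded values (10, 16, 33, 25, 16) are the same set of numbers as the percentage labels on the pie chart, which is consistent with the assumption above. The two 16% slices correspond to counts of 80 and 79, a difference that would be hard to judge from the unlabeled chart.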
Bar Charts
Bar charts represent frequency by the height of the bar. Higher bar ⇒ more observations in that category
[Figure: bar chart of GPA; frequency axis from 0 to 180 in steps of 20; bars for Above A, A, B, C, D/F]
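As a rough illustration of "bar length encodes frequency", here is a plain-text bar chart in Python. It uses the GPA counts from the frequency table earlier in the lecture with their numeric bin labels, and draws the bars horizontally because that is simpler in text. The function name and the `scale` parameter (observations per `#` character) are hypothetical choices for this sketch.

```python
# GPA counts from the frequency table earlier in the lecture.
gpa_counts = {"0-1": 51, "1-2": 80, "2-3": 167, "3-4": 123, "4-5": 79}


def text_bar_chart(counts, scale=5):
    """Draw one row per category; each '#' stands for about `scale` observations."""
    width = max(len(label) for label in counts)
    lines = []
    for label, n in counts.items():
        bar = "#" * round(n / scale)
        lines.append(f"{label.ljust(width)} | {bar} {n}")
    return "\n".join(lines)


chart = text_bar_chart(gpa_counts)
print(chart)
```

Only one dimension (bar length) has to be compared, so even the smallest category stays readable.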
Bar Charts (cont.)
Compare this with the pie chart:
Only comparing height: not shape, not angle, not area
Horizontal lines can aid in height comparison
Enlarging a bar chart makes the difference in height bigger
Small portions still visible
Word Clouds
Word clouds display words or phrases, with font size proportional to usage/frequency. Not ideal, but phrases are hard to display in other ways.
Why not ideal? What could we do instead?
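One possible answer to "what could we do instead" is a frequency table of the phrases. The sketch below counts the hobbies in the ten sample rows shown earlier in the lecture and also computes the font sizes a word cloud would use. The base font size of 12 points is an arbitrary assumption for illustration.

```python
from collections import Counter

# The "hobbies" column of the ten sample rows shown earlier in the lecture.
hobbies = [
    "watching tv", "facebook", "baseball", "soccer", "yearbook",
    "baseball", "facebook", "gymnastics", "hanging out", "reading",
]

counts = Counter(hobbies)

# Word cloud rule: font size proportional to frequency.
BASE_POINTS = 12  # assumed size, in points, for a phrase that appears once
font_sizes = {hobby: BASE_POINTS * n for hobby, n in counts.items()}

# Alternative display: a frequency table, most common phrases first.
for hobby, n in counts.most_common():
    print(f"{hobby:<12} {n}  ({font_sizes[hobby]} pt in a word cloud)")
```

The table gives exact counts, while in a word cloud the reader has to judge frequency from font size, and longer phrases take up more space at the same frequency.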
Important Takeaways
There are different kinds of data: Qualitative/Quantitative; Nominal/Ordinal/Interval/Ratio; Discrete/Continuous
Need to display data: raw data are hard to understand, especially in large amounts
Qualitative Data Displays: Tables (distribution and contingency), Pie Charts, Bar Charts, Word Clouds
Next time: Quantitative Data Displays