Stat 20: Intro to Probability and Statistics

Lecture 3: Types of Data and Displays

Tessa L. Childers-Day

UC Berkeley

28 January 2014


Today’s Goals | Kinds of Data | Displaying Qualitative Data

Outline

1 Today’s Goals

2 Kinds of Data

3 Displaying Qualitative Data


By the end of this lecture...

You will be able to:

Define data types

Classify a given response as a type of data

Compare types of data

Comprehend displays of qualitative data


Example: High School Hobbies<br />

(Hypothetical:) Imagine we are interested in exploring what kinds of hobbies high school students have. We construct the survey below:

(1) Circle your gender: Female Male<br />

(2) Circle your class level: Fresh Soph Jr Sr<br />

(3) What is your GPA (0.0-5.0)?<br />

(4) What is your height in inches?<br />

(5) What is your weight in pounds?<br />

(6) What is your favorite hobby?<br />

How do the questions look? How should we sample students?<br />


Now what?<br />

We have seen how to design experiments, observational studies, and surveys. After they are carried out, what do we have?

Data:

the individual bits of information or responses collected, e.g. a single person’s height is a data point (or data value), singular

a set or collection of information or responses, e.g. all of the heights collected in the course of an experiment are a data set, plural


Example: High School Hobbies (cont.)<br />

A selection of the hypothetical raw data appear below:

obs  gender  class level  gpa   height  weight  hobbies
1    male    jr           3.80  71      122     watching tv
2    female  soph         2.80  67      132     facebook
3    female  jr           3.90  62      148     baseball
4    male    jr           3.50  60      139     soccer
5    male    sr           2.60  66      133     yearbook
6    female  jr           1.60  65      147     baseball
7    male    sr           0.70  71      138     facebook
8    female  fresh        4.80  60      143     gymnastics
9    male    sr           4.00  72      132     hanging out
10   male    jr           1.30  72      146     reading


Ways to classify data

Level of Measurement:
  Qualitative
    Nominal
    Ordinal
  Quantitative
    Interval
    Ratio

Countability:
  Discrete
  Continuous


Qualitative Data

In general, qualitative data are:<br />

Categorical<br />

Linguistic<br />

Non-numeric<br />

May be subjective or objective<br />

Do we have categorical data in our example?<br />


Nominal Data

A type of data that are simply names or labels. Cannot be ranked or ordered by value objectively.

Rocks: igneous, metamorphic, sedimentary

Clouds: cumulus, stratus, cirrus, nimbus

Names: Adam, Andrew, Bethany, Christopher

College Degree: yes/no<br />

Currently Married: yes/no<br />

Coin flip: heads/tails<br />

Do we have nominal data in our example?<br />


Ordinal Data

A type of data that are names or labels which can be sensibly ordered or ranked. Nominal data plus an ordered scale. The “distances” between each category are not known, but their ordering is known. Ties are allowed.

Cooking Ability: poor, adequate, fair, good, excellent

Letter Grade: A, B, C, D, F

Likert Scale: strongly disagree, disagree, neutral, agree, strongly agree

Horse Race Results: first, second, third, fourth

Frequency: never, sometimes, always

Movie Review: 0 stars to 5 stars

Do we have ordinal data in our example?<br />
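
As a rough illustration of what "ordering without distances" means, the sketch below sorts the class levels from the ten sample rows of the hobbies example. The list `CLASS_ORDER` and the helper `rank` are names invented here, and treating class level as ordinal is our reading of the survey, not something the slides state outright.

```python
# Hypothetical ordering of the survey's class levels, lowest to highest.
CLASS_ORDER = ["fresh", "soph", "jr", "sr"]

# Class levels of the ten sample observations in the hobbies example.
sample = ["jr", "soph", "jr", "jr", "sr", "jr", "sr", "fresh", "sr", "jr"]


def rank(level: str) -> int:
    """Return the position of a class level in the assumed ordering."""
    return CLASS_ORDER.index(level)


# Sorting and comparing are meaningful for ordinal data; ties are allowed.
ordered = sorted(sample, key=rank)
print(ordered)

# Ranks say which label comes first, not how far apart the labels are,
# so differences or averages of ranks should not be read as measurements.
print(rank("soph") < rank("sr"))
```

The ranks exist only to support sorting and comparison; nothing here claims that the gap between "fresh" and "soph" equals the gap between "jr" and "sr".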


Quantitative Data

In general, quantitative data are:<br />

Numeric<br />

Non-linguistic<br />

Objectively Measurable<br />

Do we have quantitative data in our example?<br />


Interval Data

A type of data that are numeric and are naturally ordered, but have no meaningful zero. Distances are meaningful.

Temperature: 20° is the same amount hotter than 15° as 15° is than 10°, but 20° is not twice as hot as 10°

Time: The same amount of time elapses between 4:05 and 4:10 as between 5:12 and 5:17. 8:00 is not twice 4:00.

Do we have interval data in our example?<br />
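
The temperature example can be checked numerically. The sketch below assumes the slide's temperatures are in degrees Celsius (the slide does not name the scale) and converts them to kelvins, where zero is absolute zero and ratios are therefore meaningful. The function name `celsius_to_kelvin` is ours.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert Celsius to kelvins, a scale whose zero is absolute zero."""
    return celsius + 273.15


low, mid, high = 10.0, 15.0, 20.0

# Differences are meaningful on an interval scale: both gaps are 5 degrees.
same_gap = (high - mid) == (mid - low)
print(same_gap)

# The ratio of the raw Celsius readings depends on where zero was placed.
naive_ratio = high / low

# On the kelvin scale the ratio has a physical meaning.
kelvin_ratio = celsius_to_kelvin(high) / celsius_to_kelvin(low)
print(naive_ratio, round(kelvin_ratio, 3))
```

Under that assumption the Celsius readings give a ratio of 2, while the same two temperatures in kelvins differ by only about 3.5 percent.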


Ratio Data

A type of data that are numeric, naturally ordered, and have a meaningful zero. Distances and ratios are meaningful. Often physical in nature.

Speed in km/h, m/h, etc<br />

Money/debt in dollars, euros, etc<br />

Electrical charge in coulombs<br />

Do we have ratio data in our example?<br />


Countability

Applies only to numeric data

Discrete: countable, exactly measurable, equally spaced; e.g. number of coin flips that land heads, number of queens in a deck of cards, integers, money

Continuous: not countable, not exactly measurable, arbitrarily close, infinitely divisible; e.g. weight, length are continuous, but our ability to measure is finite, so measurements are mostly rounded/approximate


Example: High School Hobbies (cont.)<br />

A selection of the hypothetical raw data appear below:

obs  gender  class level  gpa   height  weight  hobbies
1    male    jr           3.80  71      122     watching tv
2    female  soph         2.80  67      132     facebook
3    female  jr           3.90  62      148     baseball
4    male    jr           3.50  60      139     soccer
5    male    sr           2.60  66      133     yearbook
6    female  jr           1.60  65      147     baseball
7    male    sr           0.70  71      138     facebook
8    female  fresh        4.80  60      143     gymnastics
9    male    sr           4.00  72      132     hanging out
10   male    jr           1.30  72      146     reading


Visual Displays of Information

Aggregating the data and displaying it in a picture or summary of some sort is in order.

What sorts of displays have you seen before?


Visual Displays of Information (cont.)

Today: Qualitative Data
  Data Tables
    Distribution Tables
    Contingency Tables
  Pie Charts
  Bar Charts
  Word Clouds

Can work for quantitative data too, though not ideal


Data Tables

A frequency table (distribution table) simply displays the number (or percentage) of observations which meet a certain criterion.

Gender     Female  Male  Total
Frequency  247     253   500

Class      Fresh.  Soph.  Junior  Senior  Total
Frequency  122     125    130     123     500

GPA        0-1  1-2  2-3  3-4  4-5  Total
Frequency  51   80   167  123  79   500

Height (in)  60-63  63-66  66-69  69-72  Total
Frequency    106    173    141    80     500
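
A binned frequency table like the GPA one can be built by hand in a few lines. The sketch below uses the ten sample GPAs from the raw data shown earlier, not the full 500 responses, so its counts will not match the table above. The slides do not say which bin a boundary value such as 4.00 belongs to; the convention used here (each bin includes its lower edge) is an assumption.

```python
# GPAs of the ten sample observations in the hobbies example.
gpas = [3.80, 2.80, 3.90, 3.50, 2.60, 1.60, 0.70, 4.80, 4.00, 1.30]

labels = ["0-1", "1-2", "2-3", "3-4", "4-5"]
counts = [0] * len(labels)

for gpa in gpas:
    # Assumed convention: each bin includes its lower edge, so 4.00 falls
    # in "4-5"; a GPA of exactly 5.0 is placed in the last bin.
    index = min(int(gpa), len(labels) - 1)
    counts[index] += 1

total = sum(counts)

# Print each bin with its frequency and its percentage of the total.
for label, count in zip(labels, counts):
    print(f"{label:>5} {count:>3} {100 * count / total:5.1f}%")
print(f"{'Total':>5} {total:>3}")
```

Whatever convention is chosen for the bin edges, it should be stated alongside the table, since it changes which bin a value like 4.00 is counted in.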


Data Tables (cont.)

Contingency tables (or cross-tabulations) display two or more variables at once. The frequency listed at the intersection of the column and row tells the number of observations which meet both criteria, while the totals along the sides (marginal totals) provide frequency distributions.

                Class
Gender  Fresh.  Soph.  Junior  Senior  Total
Female  58      61     63      65      247
Male    64      64     67      58      253
Total   122     125    130     123     500
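
The marginal totals can be recomputed from the interior cells, which is a quick consistency check on any contingency table. The sketch below copies the eight cell counts from the table above; the variable names are ours.

```python
# Interior cell counts from the gender-by-class contingency table.
classes = ["Fresh.", "Soph.", "Junior", "Senior"]
cells = {
    "Female": [58, 61, 63, 65],
    "Male": [64, 64, 67, 58],
}

# Row margins: the frequency distribution of gender.
row_totals = {gender: sum(counts) for gender, counts in cells.items()}

# Column margins: the frequency distribution of class level.
column_totals = [sum(column) for column in zip(*cells.values())]

grand_total = sum(row_totals.values())

print(row_totals)
print(dict(zip(classes, column_totals)))
print(grand_total)
```

The row and column margins reproduce the Gender and Class frequency tables from the previous slide, which is what "marginal totals provide frequency distributions" means in practice.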


Pie Charts

Pie charts represent proportion (percentage of total) as sections of a circular “pie”. The area of each “slice” is the same percentage of the total “pie” as its category is of the total observations.

[Figure: pie chart of GPA; legend: Above A, A, B, C, D/F]


Pie Charts (cont.)<br />

The usage of pie charts is discouraged in statistics for the following reasons:

Areas are difficult to compare between different shapes

Angles are difficult to compare

Enlarging a pie chart does not yield additional information<br />

Small portions are hard <strong>to</strong> see/compare<br />

What proportion do you think have a D/F compared with a C?<br />


Pie Charts (cont.)<br />

[Figure: pie chart of GPA; legend: Above A, A, B, C, D/F]


Pie Charts (cont.)<br />

[Figure: pie chart of GPA with percentage labels 16%, 10%, 16%, 25%, 33%; legend: Above A, A, B, C, D/F]
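
The percentage labels appear to be consistent with the GPA frequency table from the Data Tables slide. The sketch below assumes the pie chart summarizes those same 500 responses; that link, and any mapping from GPA bins to letter categories, is our inference and is not stated in the slides.

```python
# GPA bin frequencies copied from the frequency table (500 responses).
gpa_counts = {"0-1": 51, "1-2": 80, "2-3": 167, "3-4": 123, "4-5": 79}

total = sum(gpa_counts.values())

# Each slice's share of the pie is its count as a percentage of the total,
# rounded to the nearest whole percent as on the labelled chart.
percentages = {
    bin_label: round(100 * count / total)
    for bin_label, count in gpa_counts.items()
}

for bin_label, percent in percentages.items():
    print(f"{bin_label}: {percent}%")
```

The rounded shares come out as 10, 16, 33, 25, and 16 percent, the same five values that label the chart.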


Bar Charts

Bar charts represent frequency by the height of the bar. Higher bar ⇒ more observations in that category

[Figure: bar chart of GPA categories (Above A, A, B, C, D/F) against frequency; vertical axis from 0 to 180 in steps of 20]


Bar Charts (cont.)<br />

Compare this with the pie chart:<br />

Only comparing height, not shape, angle, or area

Horizontal lines can aid in height comparison<br />

Enlarging a bar chart makes the difference in height bigger<br />

Small portions still visible<br />
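
To see the "only comparing one length" point without a plotting library, here is a minimal text-only sketch. It draws horizontal bars rather than the vertical ones on the slide, uses the GPA bin counts from the frequency table, and the scale of one `#` per ten observations is an arbitrary choice made here.

```python
# GPA bin frequencies copied from the frequency table (500 responses).
gpa_counts = {"0-1": 51, "1-2": 80, "2-3": 167, "3-4": 123, "4-5": 79}

UNITS_PER_MARK = 10  # hypothetical scale: one '#' per ten observations


def bar(count: int) -> str:
    """Return a run of '#' whose length is proportional to the count."""
    return "#" * round(count / UNITS_PER_MARK)


# One line per category: label, bar, and the exact count for reference.
lines = [f"{label:>3} | {bar(count)} {count}" for label, count in gpa_counts.items()]
print("\n".join(lines))
```

Even at this coarse scale the smallest category remains visible and the categories can be ranked by eye, which is harder to do with slices of a pie.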


Word Clouds<br />

Word clouds display words or phrases, with font size proportional to usage/frequency. Not ideal, but phrases are hard to display any other way.

Why not ideal? What could we do instead?<br />
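
One possible answer to "What could we do instead?" (ours, not necessarily the lecturer's) is a frequency table of the phrases, sorted by count. The sketch below tallies the hobbies from the ten sample rows using `collections.Counter` from the Python standard library.

```python
from collections import Counter

# Hobbies of the ten sample observations in the hobbies example.
hobbies = [
    "watching tv", "facebook", "baseball", "soccer", "yearbook",
    "baseball", "facebook", "gymnastics", "hanging out", "reading",
]

counts = Counter(hobbies)

# most_common() lists phrases from most to least frequent.
for hobby, count in counts.most_common():
    print(f"{hobby:<12} {count}")
```

A sorted table gives exact counts and makes ties obvious, both of which are hard to judge from font sizes alone.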


Important Takeaways

There are different kinds of data
  Qualitative/Quantitative
  Nominal/Ordinal/Interval/Ratio
  Discrete/Continuous

Need to display data: raw data are hard to understand, especially in large amounts

Qualitative Data Displays
  Tables (distribution and contingency)
  Pie Charts
  Bar Charts
  Word Clouds

Next time: Quantitative Data Displays
