27.10.2014 Views

Russel-Research-Method-in-Anthropology



interquartile range for this variable is between 96 and 100 men per 100 women. The mean is 100.2 and the median is 98.0, slightly skewed to the left but quite close to a normal distribution.

Figure 19.5b shows MFRATIO with the extreme outlier excluded. Now the mean and the median for this variable are almost identical (98.0 and 98.8) and the distribution looks more-or-less normal. This is not a case for throwing out outliers. It is a case for examining every variable in your data set to get a feel for the distributions as the first step in analysis.
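To see how much a single extreme case can pull the mean away from the median, here is a minimal sketch in Python. The numbers are hypothetical sex ratios invented for illustration, not the data behind figure 19.5, and the cutoff of 150 used to drop the outlier is likewise an assumption.

```python
from statistics import mean, median

def center_summary(values):
    """Return (mean, median) for a list of numbers."""
    return mean(values), median(values)

# Hypothetical sex ratios (men per 100 women); NOT the book's data.
ratios = [96, 97, 97, 98, 98, 99, 100, 100, 101, 190]

# Summary with the extreme case included.
with_outlier = center_summary(ratios)

# Summary with the extreme case set aside (illustrative cutoff of 150).
without_outlier = center_summary([r for r in ratios if r < 150])

print("with outlier:   ", with_outlier)
print("without outlier:", without_outlier)
```

In this made-up example the median barely moves when the extreme case is set aside, while the mean drops by about nine points. Comparing the two summaries this way is a quick check on a distribution, not a reason to discard the case.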

Figure 19.5c shows that 50% of the countries in the world had a very low per capita gross national product. There is almost no whisker to the left of the box, indicating that only a very few countries were below the interquartile range. There is a long whisker to the right of the box, as well as two outliers (asterisks) and six extreme outliers (circles). The circle farthest to the right is Switzerland, with over $35,000 in per capita GNP. To put this in perspective, Luxembourg had the world’s highest PCGDP in 2000, at $43,475. The United States was seventh, at $31,746. Mozambique was at the bottom, at $92. Mozambique came up to $1,200 in 2004, but by then, the United States had gone up to nearly $38,000 (http://www.odci.gov/cia/publications/factbook/). The shape of the box-and-whisker plot in figure 19.5c reflects the fact that the world’s population is divided between the haves and the have-nots.
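The pieces of a box-and-whisker plot can be computed directly. The sketch below assumes one common convention, which this passage does not spell out: an outlier lies more than 1.5 interquartile ranges beyond the box, and an extreme outlier lies more than 3. The function name `box_plot_summary` and the data are hypothetical, and different programs compute quartiles in slightly different ways.

```python
from statistics import quantiles

def box_plot_summary(values):
    """Quartiles, IQR, whisker ends, and outlier flags for a list of numbers.

    Assumes the 1.5 x IQR (outlier) and 3 x IQR (extreme outlier) convention.
    """
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_low, outer_high = q1 - 3.0 * iqr, q3 + 3.0 * iqr

    extreme = [v for v in values if v < outer_low or v > outer_high]
    mild = [v for v in values
            if (v < inner_low or v > inner_high) and v not in extreme]
    inside = [v for v in values if inner_low <= v <= inner_high]

    return {
        "q1": q1, "median": q2, "q3": q3, "iqr": iqr,
        # Whiskers run to the most distant cases that are not outliers.
        "whisker_low": min(inside), "whisker_high": max(inside),
        "outliers": mild, "extreme_outliers": extreme,
    }

# Hypothetical per capita incomes in dollars; NOT the book's data.
incomes = [100, 200, 300, 400, 500, 600, 700, 800, 2000, 5000]
summary = box_plot_summary(incomes)
for key, value in summary.items():
    print(key, value)
```

With these made-up incomes, most cases sit in a narrow band at the low end and two cases fall far to the right, one flagged as an outlier and one as an extreme outlier. That is the same general pattern the text describes for per capita income.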

Finally, figure 19.5d shows that total fertility in 2000 ranged between about 2 and about 5, and that there were no outliers (that is, no cases beyond the whiskers). This is another variable that looks like it might be normally distributed, but notice that the median (the midline in the box) is off center to the left. The mean for TFR is 3.14, lower than the median, which is 3.47. We need more information. Read on.

Histograms and Frequency Polygons<br />

Two other graphic methods are useful in univariate analysis: histograms and frequency polygons (also called frequency curves). Frequency polygons are line drawings made by connecting the tops of the bars of a histogram and displaying the result without the bars. What you get is pure shape: less information than you get with box plots or stem-and-leaf plots, but easily interpreted. Figure 19.6 shows the histograms and frequency polygons for MFRATIO, PCGDP, and TFR.
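The step from histogram to frequency polygon can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure used to draw figure 19.6: the bins are equal in width, the polygon's points sit at the midpoint of each bar top, and the function names and data are hypothetical.

```python
def histogram_counts(values, low, width, n_bins):
    """Count how many values fall in each of n_bins equal-width bins.

    Bins start at `low`; the last bin also includes its upper edge.
    Values outside the overall range are ignored.
    """
    counts = [0] * n_bins
    top = low + width * n_bins
    for v in values:
        idx = int((v - low) // width)
        if v == top:
            idx = n_bins - 1  # put the maximum in the last bin
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts

def frequency_polygon(counts, low, width):
    """Return (bin midpoint, count) pairs: the tops of the bars."""
    return [(low + width * (i + 0.5), c) for i, c in enumerate(counts)]

# Hypothetical fertility rates; NOT the book's data.
rates = [1.2, 1.8, 2.1, 2.4, 2.9, 3.3, 3.6, 4.1, 5.5, 6.0]

counts = histogram_counts(rates, low=1, width=1, n_bins=5)
polygon = frequency_polygon(counts, low=1, width=1)

print(counts)
print(polygon)
```

Drawing straight lines between successive points in `polygon` gives the frequency polygon. The bars themselves are no longer needed, which is why only the shape of the distribution is left.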

The histogram and frequency polygon for MFRATIO nail down what we know from the box-plot of that variable: Most countries are in a narrow range for this variable, but a few are real outliers. The histogram/polygon for PCGDP removes all doubt about the shape of this variable: It is very highly skewed to the right, with most countries at the bottom and the long tail of
