
Rudiments of Numeracy by A.S.C.Ehrenberg - School of Mathematics

1977] EHRENBERG - Rudiments of Numeracy 295

were being looked at for the first time ever. Good data presentation then takes a fair amount of work, since one seldom gets a new table completely right the first time round. But the real problem with new data is not that of presenting it well, but of having first to understand it. Luckily this is (or should be) relatively rare. Most situations faced by professional or frequent users of data are repetitive, in that they have already seen a good deal of similar data before and therefore know their probable structure (Ehrenberg, 1976a).

To take Table 15 as an example, one's usual task is not to discover the basic duplication pattern for the first time (that can strictly happen at most once, and did so about 10 years ago in this instance: Goodhardt, 1966; Ehrenberg and Twyman, 1967). Instead, one needs to assess these particular data against one's prior knowledge of the duplication law, to establish and understand any apparent anomalies, to communicate the results to others, and to use the results (e.g. for theoretical model-building, practical decision-making, prediction or control). In such well-understood repetitive situations the rules of data presentation can be applied routinely, and their use becomes highly efficient.

The Quality of the Data.

It is often said that the "quality" of the data should affect how they are presented. This presumably refers to outliers, sampling errors and basic measurement problems. But given that certain numbers are to be reported at all, it is better to present them clearly rather than obscurely, so the rules still apply.
Good data presentation makes outliers and misprints stand out: Twyman's Law (that any reading which looks interesting or different is probably wrong) can only be applied if we first see that a reading is out of step.

Sampling errors occur if sample sizes are small. Most modern statisticians are of course highly trained to deal with this (if with nothing else), and in a paper before this Society the existence of such issues can, I hope, largely be taken as read. I only add that some analysts' habit of attaching a standard error to every reading in the body of a table is both visually obnoxious and statistically naive. If standard errors or other devices of statistical inference need to be quoted explicitly, this should be done either in a separate display, or in footnotes, or in the text.

The basic problem with data is what the variables in question actually measure. In our unemployment example the figures are for registered unemployed (with a good deal of small print in the definitions), and do not properly represent "unemployment", whatever that may be. Female unemployed tend, for example, to be markedly under-represented, especially at times of high general unemployment. Learning to understand what one's variables mean usually depends on comparing different types of measurement (e.g. the official figures of registered unemployed with sample survey data on supposedly "actual" unemployed). This is usually a complex task, and the need for effective data presentation remains. Even if our measurements are known to be biased, that is no reason for leaving the numerical results obscure.

For the Record or . . . ?
Three main types of empirically-based data tables can be distinguished:

working tables, for the use of the analyst and his immediate colleagues, with no wider communication in mind;
the final presentation to a more or less specific audience, to support or illustrate some specific conclusion or findings;
tables set out "for the record" (as in official statistics) in case someone wants to use the data.

In the first two cases the structure of the data needs to be apparent both to the analyst himself and to others. Hence the rules of this paper apply. With data presented "for the record", however, it is sometimes argued that the data will contain so many different stories, for different kinds of uses and users, that their presentation must vary accordingly. But few real instances have been quoted, and such cases seem to be the exception rather than the rule. In any case, it does not follow that the data must be presented to tell no story, as is so often the case.