15.01.2013 Views

Oracle Database 11g The Complete Reference (Osborne ORACLE ...


NOTE

AVG is not immune to missing data, and there can be cases where it will be significantly off (such as when missing data is not randomly distributed), but these cases will be less common.

Chapter 9: Playing the Numbers 165

The relative insensitivity of AVG to missing data needs to be contrasted with, for instance, SUM. How close to correct is the SUM of the ages of only 20 friends to the SUM of all 100 friends? Not close at all. So if you had a table of friends, but only 20 out of 100 supplied their age, and 80 out of 100 had NULL for their age, which one would be a more reliable statistic about the whole group and less sensitive to the absence of data: the AVG age of those 20 friends, or the SUM of them? Note that this is an entirely different issue from whether it is possible to estimate the sum of all 100 based on only 20 (the natural estimate is the AVG of the 20, times 100). The point is, if you don’t know how many rows are NULL, you can use the following to provide a fairly reasonable result:

select AVG(Age) from BIRTHDAY;

You cannot get a reasonable result from this, however:

select SUM(Age) from BIRTHDAY;
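The estimate mentioned above (the AVG of the known ages, times the total number of friends) can be sketched as a single query. This is only an illustration: it assumes BIRTHDAY holds one row per friend, and the estimate is only as good as the assumption that the missing ages are randomly distributed.

```sql
-- Sketch only: assumes BIRTHDAY has one row per friend.
-- AVG(Age) and SUM(Age) ignore NULL ages; COUNT(*) counts every row.
select AVG(Age)            as Avg_Known_Age,
       SUM(Age)            as Sum_Known_Ages,
       AVG(Age) * COUNT(*) as Estimated_Total
  from BIRTHDAY;
```

With 20 known ages out of 100 rows, Sum_Known_Ages covers only those 20 friends, while Estimated_Total scales the average up to all 100 rows.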

This same test of whether or not results are reasonable defines how the other group functions respond to NULLs. STDDEV and VARIANCE are measures of dispersion (how widely the values spread around the mean); they, too, are relatively insensitive to missing data. (These will be shown in “STDDEV and VARIANCE,” later in this chapter.)

MAX and MIN measure the extremes of your data. They can fluctuate wildly while AVG stays relatively constant: If you add a 100-year-old man to a group of 99 people who are 50 years old, the average age only goes up to 50.5, but the maximum age has doubled. Add a newborn baby, and the average goes back to 50, but the minimum age is now 0. It’s clear that missing or unknown (NULL) values can profoundly affect MAX, MIN, and SUM, so be cautious when using them, particularly if a significant percentage of the data is NULL.
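The arithmetic in this example can be checked with a throwaway query. The rows below are generated purely for illustration and are not part of the BIRTHDAY table; the sketch uses the common Oracle idiom of connect by LEVEL against DUAL to produce them.

```sql
-- 99 generated rows with Age 50, plus one row with Age 100.
-- Returns AVG = 50.5, MAX = 100, MIN = 50.
select AVG(Age), MAX(Age), MIN(Age)
  from (select 50 as Age from DUAL connect by LEVEL <= 99
        union all
        select 100 from DUAL);
```

Adding one more branch, union all select 0 from DUAL, for the newborn brings AVG back to 50 (5050 divided by 101) and drops MIN to 0.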

Is it possible to create functions that also take into account how sparse the data is and how many values are NULL, compared to how many have real values, and make good guesses about MAX, MIN, and SUM? Yes, but such functions would be statistical projections, which must make explicit their assumptions about a particular set of data. This is not an appropriate task for a general-purpose group function. Some statisticians would argue that these functions should return NULL if they encounter any NULLs because returning any value can be misleading. Oracle returns something rather than nothing, but leaves it up to you to decide whether the result is reasonable.
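If you prefer the stricter behavior those statisticians argue for, you can request it explicitly. The following is a minimal sketch, not a built-in Oracle feature; the column alias Strict_Sum is made up for illustration.

```sql
-- COUNT(*) counts all rows; COUNT(Age) counts only rows with a non-NULL Age.
-- When the two differ, at least one Age is NULL, and the CASE expression
-- (which has no ELSE branch) evaluates to NULL instead of a partial sum.
select case when COUNT(*) = COUNT(Age) then SUM(Age) end as Strict_Sum
  from BIRTHDAY;
```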

COUNT is a special case. It can go either way with NULL values, but it always returns a number; it will never evaluate to NULL. The format and usage for COUNT will be shown shortly, but to simply contrast it with the other group functions: it will count the rows in which a column is not NULL, or it will count all the rows. In other words, if asked to count the ages of 100 friends, COUNT will return a value of 20 (because only 20 of the 100 gave their age). If asked to count the rows in the table of friends without specifying a column, it will return 100. An example of these differences is given in “DISTINCT in Group Functions,” later in this chapter.
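As a quick sketch of the two behaviors, assuming the table of friends is BIRTHDAY with one row per friend:

```sql
-- COUNT(Age) counts only rows where Age is not NULL;
-- COUNT(*) counts every row, whether or not Age is NULL.
select COUNT(Age), COUNT(*) from BIRTHDAY;

-- When no rows qualify, COUNT still returns a number (0),
-- while SUM returns NULL.
select COUNT(Age), SUM(Age) from BIRTHDAY where 1 = 0;
```

With the data described above, the first query would return 20 and 100.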

Examples of Single- and Group-Value Functions<br />

Neither the group-value functions nor the single-value functions are particularly difficult to understand, but a practical overview of how each function works is helpful in fleshing out some of the options and consequences of their use.
