Visual Analytics - An Interaction of Sight and Thought - JMP
Visual Analytics - An Interaction of Sight and Thought - JMP
Visual Analytics - An Interaction of Sight and Thought - JMP
Create successful ePaper yourself
Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.
<strong>Visual</strong> <strong><strong>An</strong>alytics</strong><br />
<strong>An</strong> <strong>Interaction</strong> <strong>of</strong> <strong>Sight</strong> <strong>and</strong> <strong>Thought</strong><br />
Stephen Few<br />
Perceptual Edge<br />
www.PerceptualEdge.com<br />
Copyright © 2012 Stephen Few
The world is not yet prepared for a true information age. Most people lack the basic<br />
skills <strong>and</strong> the good tools that are required to use information in effective ways. I recently<br />
discovered an organization named the 21st Century Fluency Project that is trying to<br />
help people develop the skills that are needed to survive <strong>and</strong> thrive in a true information<br />
age.
Hurray data!<br />
Hurray technology!<br />
We celebrate technology. We live for the latest gadgets.
But the abundance <strong>of</strong> data has left people overwhelmed <strong>and</strong> frustrated. You can spend<br />
millions <strong>of</strong> dollars to build the most robust <strong>and</strong> pristine data warehouse in the world,<br />
running on the most powerful hardware, <strong>and</strong> accessed by state-<strong>of</strong>-the-art business<br />
intelligence s<strong>of</strong>tware, but if the people who work with the data don’t know how to make<br />
sense <strong>of</strong> it <strong>and</strong> then clearly present what they find to decision makers, your investment<br />
is wasted.
Upon this gifted age, in its dark hour,<br />
rains from the sky a meteoric shower<br />
<strong>of</strong> facts…they lie, unquestioned, uncombined.<br />
Wisdom enough to leach us <strong>of</strong> our ill<br />
is daily spun; but there exists no loom<br />
to weave it into a fabric.<br />
“Huntsman, What Quarry?”, 1939, Edna St. Vincent Millay<br />
The poet Edna St. Vincent Millay anticipated our situation today way back in 1939 when<br />
she wrote these words. Our problem is not a lack <strong>of</strong> data, but rather an inability to make<br />
sense <strong>of</strong> <strong>and</strong> use what we have.
Information<br />
Data<br />
Sensemaking<br />
Data – all those facts that we’ve accumulated – do not qualify as information until they<br />
inform. To use data effectively, we make first make sense <strong>of</strong> them. Today, sensemaking<br />
is our primary task.
We must tap into that steady stream <strong>of</strong> information that is coming at us in a way that<br />
allows us to make sense <strong>of</strong> it <strong>and</strong> make use <strong>of</strong> it. Endless columns <strong>and</strong> rows <strong>of</strong><br />
numbers don’t reveal the stories that live in our data. Data needs to be expressed in a<br />
way that our eyes can easily see <strong>and</strong> our brains can easily underst<strong>and</strong>.
Only then will the important stories that live in our data come to light.
Data analysts<br />
<strong>JMP</strong><br />
are now sexy.<br />
Data analysts are now beginning to get the respect they deserve. No longer are they<br />
kept in some back room huddled over your computers. The role <strong>of</strong> data analyst is<br />
becoming the sexiest role in the job market. Data analysts will become the priests <strong>of</strong> the<br />
modern world – those who talk with God <strong>and</strong> divine the mysteries <strong>of</strong> data.
Hal Varian<br />
U.C. Berkeley <strong>and</strong> Google<br />
I keep saying the sexy job in the next ten years will be statisticians. People think I’m joking, but<br />
who would’ve guessed that computer engineers would’ve been the sexy job <strong>of</strong> the 1990s? The<br />
ability to take data—to be able to underst<strong>and</strong> it, to process it, to extract value from it, to<br />
visualize it, to communicate it—that’s going to be a hugely important skill in the next decades,<br />
not only at the pr<strong>of</strong>essional level but even at the educational level for elementary school kids,<br />
for high school kids, for college kids. Because now we really do have essentially free <strong>and</strong><br />
ubiquitous data. So the complimentary scarce factor is the ability to underst<strong>and</strong> that data <strong>and</strong><br />
extract value from it.<br />
I think statisticians are part <strong>of</strong> it, but it’s just a part. You also want to be able to visualize the<br />
data, communicate the data, <strong>and</strong> utilize it effectively. But I do think those skills—<strong>of</strong> being able<br />
to access, underst<strong>and</strong>, <strong>and</strong> communicate the insights you get from data analysis—are going to<br />
be extremely important. Managers need to be able to access <strong>and</strong> underst<strong>and</strong> the data<br />
themselves.<br />
You always have this problem <strong>of</strong> being surrounded by “yes men” <strong>and</strong> people who want to<br />
predigest everything for you. In the old organization, you had to have this whole army <strong>of</strong> people<br />
digesting information to be able to feed it to the decision maker at the top. But that’s not the<br />
way it works anymore: the information can be available across the ranks, to everyone in the<br />
organization. <strong>An</strong>d what you need to ensure is that people have access to the data they need to<br />
make their day-to-day decisions. <strong>An</strong>d this can be done much more easily than it could be done<br />
in the past. <strong>An</strong>d it really empowers the knowledge workers to work more effectively.<br />
“Hal Varian on how the Web challenges managers”, www.McKinseyQuarterly.com
To do this we need<br />
the right skills<br />
augmented by<br />
the right tools<br />
To explore <strong>and</strong> make sense <strong>of</strong> data, we need the right skills <strong>and</strong> good tools.
Skills<br />
Tools<br />
Sensemaking requires a close collaboration between our brains <strong>and</strong> computer-based<br />
tools that are designed to augment our brains.
Two <strong>of</strong> the right skills<br />
• Statistical thinking<br />
• <strong>Visual</strong> thinking<br />
Two <strong>of</strong> the important skill sets that we need include statistical thinking – the ability to<br />
find the stories that live in numbers <strong>and</strong> make sense <strong>of</strong> them – <strong>and</strong> visual thinking – the<br />
ability to see meaningful patterns in number by representing <strong>and</strong> interacting with them<br />
visually.
Three modes <strong>of</strong> sensemaking<br />
Verbal<br />
A B C<br />
Complementary<br />
Numerical<br />
1 2 3<br />
<strong>Visual</strong><br />
We humans developed three powerful modes <strong>of</strong> sensemaking during the course <strong>of</strong> our<br />
evolution: verbal, numerical, <strong>and</strong> visual. These modes developed separately to some<br />
degree, but evolved to work in complementary <strong>and</strong> collaborative ways.
What is visual thinking?<br />
Temple Gr<strong>and</strong>in<br />
We may consider “visual thinking” as that form <strong>of</strong> thought in which images are generated or<br />
recalled in the mind <strong>and</strong> are manipulated, overlaid, translated, associated with other similar<br />
forms (as with a metaphor), rotated, increased or reduced in size, distorted, or otherwise<br />
transformed gradually from one familiar image into another…“Pattern recognition” has been a<br />
major concern in artificial intelligence research <strong>and</strong> has proved to be quite complex. It has<br />
turned out to be one <strong>of</strong> those things which is very difficult for computers but easy for most<br />
human beings…For our purposes, we will consider pattern recognition to be the ability to<br />
discern similarities <strong>of</strong> form among two or more things, whether these be textile designs, facial<br />
resemblance <strong>of</strong> family members, graphs <strong>of</strong> repeating biological growth cycles, or similarities<br />
between historical epochs…From pattern recognition it is but a short step to “problem solving,”<br />
since, at least for its more common aspects, problem solving generally involves the recognition<br />
<strong>of</strong> a developing or repeating pattern <strong>and</strong> the carrying out <strong>of</strong> actions to obtain desired results<br />
based on one’s underst<strong>and</strong>ing <strong>of</strong> this pattern. (In the Mind’s Eye, Thomas G. West, p. 36)<br />
Temple Gr<strong>and</strong>in is an amazing visual thinker. Perhaps because she’s autistic, the visual<br />
thinking parts <strong>of</strong> her brain have developed more fully than in most <strong>of</strong> us. A recent HBO movie<br />
about her starring Claire Danes does a wonderful job <strong>of</strong> explaining her world <strong>and</strong> the way she’s<br />
put her unique resources to work to make the world better, especially the life <strong>of</strong> cattle. One <strong>of</strong><br />
the most revealing lines in that trailer was when Gr<strong>and</strong>in said, “I know my system will work<br />
because I’ve been through it a thous<strong>and</strong> times in my head.” The most gifted visual thinkers can<br />
see complex images <strong>and</strong> manipulate them in their mind’s eye.<br />
Most <strong>of</strong> us can’t hold complex images in our heads <strong>and</strong> manipulate them like Temple Gr<strong>and</strong>in. We<br />
can, however, use complex images that are in front <strong>of</strong> our eyes on computers <strong>and</strong> manipulate<br />
them.
We need greater balance.<br />
Modes <strong>of</strong> reasoning through history<br />
Numerical<br />
<strong>Visual</strong><br />
During our hunter-gatherer days, before symbolic reasoning developed, which led to<br />
language <strong>and</strong> mathematics, we were primarily visual, with just a crude sense <strong>of</strong> number.
We need greater balance.<br />
Modes <strong>of</strong> reasoning through history<br />
Verbal<br />
Numerical<br />
<strong>Visual</strong><br />
After language developed, however, especially after the development <strong>of</strong> writing, those in<br />
power favored verbal skills <strong>and</strong> numerical skills to a lesser degree, but visual skills lost<br />
favor as the low-brow reasoning <strong>of</strong> the masses.
We need greater balance.<br />
Modes <strong>of</strong> reasoning through history<br />
Verbal<br />
Numerical<br />
<strong>Visual</strong><br />
With the problems that we face today, this lack <strong>of</strong> balance remains, but we need to<br />
correct it.
Leonardo da Vinci<br />
For some four hundred or five hundred years we have had our schools teaching<br />
basically the skills <strong>of</strong> the medieval clerk—reading, writing, counting, <strong>and</strong><br />
memorizing texts. Now it seems that we might be on the verge <strong>of</strong> a new era, when<br />
we will wish to, <strong>and</strong> be required to, emphasize a very different set <strong>of</strong> skills—those<br />
<strong>of</strong> a Renaissance man such as Leonardo da Vinci. With such a change, traits that<br />
are considered desirable today might very well be obsolete <strong>and</strong> unwanted<br />
tomorrow. In place <strong>of</strong> the qualities desired in a well-trained clerk, we might,<br />
instead, find preferable a habit <strong>of</strong> innovation in many diverse fields, the<br />
perspective <strong>of</strong> the global generalist rather than the narrowly focused specialist,<br />
<strong>and</strong> an emphasis on visual content <strong>and</strong> analysis over parallel verbal modes.<br />
If we continue to turn out people who primarily have the skills (<strong>and</strong> outlook) <strong>of</strong> the<br />
clerk, however well trained, we may increasingly be turning out people who will,<br />
like the unskilled laborer <strong>of</strong> the last century, have less <strong>and</strong> less to sell in the<br />
marketplace. Sometime in the not too distant future machines will be the best<br />
clerks. It will be left to humans to maximize what is most valued among human<br />
capabilities <strong>and</strong> what machines cannot do—<strong>and</strong> increasingly these are likely to<br />
involve the insightful <strong>and</strong> integrative capacities associated with visual modes <strong>of</strong><br />
thought.<br />
(In the Mind’s Eye, Thomas G. West, 2009)
Many people who struggle with dyslexia (difficulty reading) are exceptional visual<br />
thinkings, <strong>of</strong>ten because they develop visual thinking skills to to a greater degree than<br />
the rest <strong>of</strong> us to make up for their inability to read. Thomas West does a wonderful job<br />
<strong>of</strong> describing this in his book In the Mind’s Eye.
Complex problems<br />
require visual<br />
thinking.<br />
Everyone agrees that we have a problem. Our technological culture is drowning in its own success.<br />
Masses <strong>of</strong> data <strong>and</strong> information are accumulating everywhere. Up to now, the basic strategy for dealing<br />
with these growing masses <strong>of</strong> information has been long, mind-numbing education <strong>and</strong> reckless,<br />
blinkered specialization. That this strategy has been effective in a great many respects, so far, there can<br />
be little debate. The problems we are discussing are a tribute to its ample <strong>and</strong> abundant success, so far.<br />
However, after long success, it is becoming increasingly clear that this strategy may be entering a<br />
phase <strong>of</strong> diminishing return. It has long been recognized that this strategy has always had built-in<br />
problems. The more one knows in one’s own, increasingly narrow area, the more one is ignorant in<br />
other areas, the more difficult is effective communication between unrelated areas, <strong>and</strong> the more<br />
unlikely it is that the larger whole will be properly perceived or understood. Like the student who reads<br />
too much small print, the specialist’s habitual near focus <strong>of</strong>ten promotes the myopic perspective that<br />
precludes the comprehension <strong>of</strong> larger, more important patterns. The distant view <strong>of</strong> the whole is blurred<br />
<strong>and</strong> unclear. If you focus only on a small group <strong>of</strong> stars at the edge <strong>of</strong> the Milky Way, you will not<br />
perceive the larger structure <strong>of</strong> the whole galaxy <strong>of</strong> which the group is one tiny part.<br />
The specialist strategy breeds its own limits. Pieces <strong>of</strong> the puzzle in separate areas remain far apart, or<br />
come together only after decades <strong>of</strong> specialist resistance, or success in one area leads to great<br />
problems in another. Material abundance produces waste-disposal problems; cars <strong>and</strong> aircraft produce<br />
wonderful mobility for many people, but also deplete resources, produce accidental fatalities, <strong>and</strong><br />
increase pollution; success in vaccination, hygiene, <strong>and</strong> health care lead to all the problems <strong>of</strong> great<br />
concentrations <strong>of</strong> human population.<br />
As the specialist strategy continues to be pursued, a sense <strong>of</strong> the whole is increasingly lost. Many know<br />
their areas; few see the whole. Many are expert; few are wise. But the visual thinkers, late bloomers,<br />
<strong>and</strong> creative dyslexics we have been dealing with have <strong>of</strong>ten been outsiders or reluctant participants in<br />
this specialist culture—especially those who are energetic, <strong>and</strong> globally minded, who seem always to be<br />
interested in everything, unable to settle down to a “serious” (that is, highly specialized) area <strong>of</strong> study.<br />
(In the Mind’s Eye, Thomas G. West, p. 298)
Only visual thinking can connect the dots.<br />
Connecting the dots <strong>of</strong> complex problems requires visual thinking.
Tools extend our reach<br />
when they’re well designed<br />
<strong>and</strong> properly used.<br />
Technologies extend our reach, but only when it’s well designed <strong>and</strong> properly used. Bad<br />
tools or good tools used in the wrong ways can actually narrow our reach.
Tools<br />
When tools are bad or misused, our reliance on them can make us lazy, weak, <strong>and</strong><br />
stupid.
Can computers replace us?<br />
Our brains <strong>and</strong> computer processors work differently <strong>and</strong> therefore have different<br />
strengths. We’ve designed computers to do things that we don’t do very well, which<br />
makes perfect sense, because we use them to complement <strong>and</strong> extend our abilities.<br />
They don’t, however, replace our abilities. Even though IBM’s supercomputer named<br />
Watson can beat the best human champions <strong>of</strong> Jeopardy, we can all do with relative<br />
ease in a second what Watson can’t figure out with endless effort: recognize that that<br />
this is the word “captcha.”
The right tools support<br />
• Statistical calculations<br />
• Interactive visualizations<br />
Good data sensemaking tools support statistical calculations, using the strength <strong>of</strong><br />
computers to perform those calculations quickly <strong>and</strong> accurately, <strong>and</strong> interactive<br />
visualizations, making it possible to find <strong>and</strong> underst<strong>and</strong> the meaningful patterns in our<br />
data.
<strong>Visual</strong>ization empowers statistics.<br />
"The greatest value <strong>of</strong> a picture is when it forces<br />
us to notice what we never expected to see."<br />
Back in the 1970s, Princeton statistician John Tukey recognized the important<br />
collaboration <strong>of</strong> statistical calculations <strong>and</strong> visual exploration. Even sophisticated<br />
statisticians <strong>of</strong>ten need to use their eyes to find <strong>and</strong> underst<strong>and</strong>s the patterns that<br />
reside in data.
Why the eyes?<br />
I became fascinated by data visualization — the use <strong>of</strong> our eyes in close collaboration<br />
with our brains to make sense <strong>of</strong> information — not because I’m a visually-oriented<br />
person (I’m actually much more verbal than visual) but because it <strong>of</strong>fers solutions to real<br />
analytical problems that concern us, solutions that st<strong>and</strong> apart in their ability to<br />
enlighten.
“I see!”<br />
There is an intimate connection between seeing <strong>and</strong> thinking. For this reason, when we<br />
talking about underst<strong>and</strong>ing, we tend to use terms that are related to vision. When we<br />
suddenly come to underst<strong>and</strong> something we say “I see!” It’s hard to think <strong>of</strong> terms that<br />
describe underst<strong>and</strong>ing that don’t refer to sight.
Vision is our dominant sense.<br />
50%<br />
<strong>of</strong> the brain’s resources<br />
If you think the brain has to devote to vision a lot <strong>of</strong> its precious thinking<br />
resources, you are right on the money. It takes up about half <strong>of</strong> everything you<br />
do, in fact. (Brain Rules, John Medina, p. 231)<br />
When it comes to memory, researchers have known for more than 100 years that<br />
pictures <strong>and</strong> text follow very different rules. Put simply, the more visual the input<br />
becomes, the more likely it is to be recognized — <strong>and</strong> recalled. The<br />
phenomenon is so pervasive, it has been given its own name: the pictorial<br />
superiority effect, <strong>of</strong> PSE. (Ibid, p. 233)
Vision is our dominant sense.<br />
30%<br />
70%<br />
Human perception is amazing. I cherish all five <strong>of</strong> the senses that connect us to the world, that allow<br />
to experience beauty <strong>and</strong> an inexhaustible <strong>and</strong> diverse wealth <strong>of</strong> sensation. But <strong>of</strong> all the senses, one<br />
st<strong>and</strong>s out dramatically as our primary <strong>and</strong> most powerful channel <strong>of</strong> input from the world around us,<br />
that is vision. Approximately 70% <strong>of</strong> the body’s sense receptors reside in the eye.<br />
Perhaps the world’s top expert in visual perception <strong>and</strong> how its power can be harnessed for the effec<br />
display <strong>of</strong> information is Colin Ware, who has convincingly described the importance <strong>of</strong> data<br />
visualization. He asks:<br />
Why should we be interested in visualization? Because the human visual system is a pattern seeke<br />
enormous power <strong>and</strong> subtlety. The eye <strong>and</strong> the visual cortex <strong>of</strong> the brain form a massively parallel<br />
processor that provides the highest-b<strong>and</strong>width channel into human cognitive centers. At higher lev<br />
<strong>of</strong> processing, perception <strong>and</strong> cognition are closely interrelated, which is the reason why the words<br />
‘underst<strong>and</strong>ing’ <strong>and</strong> ‘seeing’ are synonymous. However, the visual system has its own rules. We ca<br />
easily see patterns presented in certain ways, but if they are presented in other ways, they become<br />
invisible…The more general point is that when data is presented in certain ways, the patterns can<br />
readily perceived. If we can underst<strong>and</strong> how perception works, our knowledge can be translated in<br />
rules for displaying information. Following perception-based rules, we can present our data in such<br />
way that the important <strong>and</strong> informative patterns st<strong>and</strong> out. If we disobey the rules, our data will be<br />
incomprehensible or misleading.<br />
(Information <strong>Visual</strong>ization: Perception for Design, Second Edition, Colin Ware, Morgan Kaufmann<br />
Publishers, 2004, page xxi)<br />
Perhaps the best known expert in data visualization, Edward Tufte, says: “Clear <strong>and</strong> precise seeing<br />
becomes as one with clear <strong>and</strong> precise thinking.” (<strong>Visual</strong> Explanations, Edward R. Tufte, Graphics Pr<br />
Cheshire, CT.1997 page 53)
<strong>Visual</strong>ization weaves numbers into pictures.<br />
Data visualization is the loom that will weave the data that we collect into the fabric <strong>of</strong><br />
underst<strong>and</strong>ing. Pictures <strong>of</strong> data can make visible the meanings that might forever<br />
otherwise remain hidden.
Trends, patterns, <strong>and</strong> exceptions come to light.<br />
The presentation <strong>of</strong> data as text, such as you see in this table, is perfect when you need<br />
precise values or when the purpose is to look up or compare individual values, but not<br />
when you wish to see patterns, trends, <strong>and</strong> exceptions, or to make comparisons. When<br />
this is your goal, visualizations work best.<br />
When data is presented visually, it is given visible form, <strong>and</strong> from this we can easily<br />
glean insights that would take a long time to piece together from the same data<br />
presented textually, if ever. This graph <strong>of</strong> the same data that appears in the table makes<br />
brings to light several <strong>of</strong> the stories contained in the data that weren’t obvious before,<br />
<strong>and</strong> it did so instantly.<br />
When] we visualize the data effectively <strong>and</strong> suddenly, there is what Joseph Berkson<br />
called ‘interocular traumatic impact’: a conclusion that hits us between the eyes.<br />
(<strong>Visual</strong>izing Data, William S. Clevel<strong>and</strong>, Hobart Press, 1993, page 12)<br />
Modern data graphics can do much more than simply substitute for small statistical<br />
tables. At their best, graphics are instruments for reasoning about quantitative<br />
information. Often the most effective way to describe, explore, <strong>and</strong> summarize a set<br />
<strong>of</strong> numbers – even a very large set – is to look at pictures <strong>of</strong> those numbers.<br />
Furthermore, <strong>of</strong> all methods for analyzing <strong>and</strong> communicating statistical information,<br />
well-designed data graphics are usually the simplest <strong>and</strong> at the same time the most<br />
powerful.<br />
(The <strong>Visual</strong> Display <strong>of</strong> Quantitative Information, Edward R. Tufte, Graphics Press:<br />
Cheshire, CT 1983, Introduction)
A picture David is worth visits a thous<strong>and</strong> America. words.<br />
As we all know, the right picture can sometimes tell a story in a way that no amount <strong>of</strong><br />
words could ever match. To take advantage <strong>of</strong> visualization’s great potential, we first<br />
must know when pictures rather than words are needed.
This visualization fails to do this. This series <strong>of</strong> circles within circles––blue for the market<br />
values <strong>of</strong> banks in quarter 2 <strong>of</strong> 2007, before the recent financial meltdown, <strong>and</strong> green for<br />
declined values as <strong>of</strong> January <strong>of</strong> 2009––was published by Bloomberg. You would never<br />
guess its purpose, however, which was to show that J. P. Morgan’s decline in market<br />
value was less severe than all other major banks except one: Sant<strong>and</strong>er.<br />
This picture <strong>of</strong> the data doesn’t tell the story clearly, simply, or accurately. The<br />
comparative sizes <strong>of</strong> the circles are far from the comparative market values. Even if the<br />
sizes <strong>of</strong> the circles were accurate, we would still struggle with this chart because visual<br />
perception isn’t well-tuned to h<strong>and</strong>le size comparisons, but it is tuned to h<strong>and</strong>le length<br />
comparisons,...
...such as the lengths <strong>of</strong> these bars in my redesign <strong>of</strong> the chart. We can now easily see<br />
that J. P. Morgan lost roughly half <strong>of</strong> its market value during this period, but the fact that<br />
its losses were less severe than all by one bank––Sant<strong>and</strong>er––still isn’t obvious. The<br />
right addition to the picture, however, such as this one in the bottom half that displays<br />
the losses directly, can make this part <strong>of</strong> the story clear as well.
The right balance<br />
38<br />
Information visualization makes possible an ideal balance between unconscious<br />
perceptual <strong>and</strong> conscious cognitive processes. With the proper tools, we can shift much<br />
<strong>of</strong> the analytical process from conscious processes in the brain to pre-attentive<br />
processes <strong>of</strong> visual perception, letting our eyes do what they do extremely well.
Confused?<br />
Unfortunately, so much <strong>of</strong> “data visualization” today gives it a bad name <strong>and</strong> causes<br />
confusion about what it is, how it works, <strong>and</strong> what can be accomplished when it is<br />
properly done.
Many visualization tools<br />
feature decoration,<br />
not substance.<br />
Rather than building useful tools, most s<strong>of</strong>tware vendors are competing to out dazzle<br />
one another with silly visual effects that treat data visualization like it’s a video game.
Pie!!!<br />
Most so-called data analysis tools – especially data visualization tools – so far show<br />
little underst<strong>and</strong>ing <strong>of</strong> the people who use them <strong>and</strong> little respect for their intelligence.<br />
They treat people as if they’re dumb, <strong>and</strong> by doing so encourage them to become<br />
dumb, assuming that they wish merely to be entertained by cute pictures <strong>and</strong> pretty<br />
colors.
Bling your graph.<br />
Many s<strong>of</strong>tware vendors promote the notion that data visualization is not about<br />
underst<strong>and</strong>ing <strong>and</strong> communication, but about “bling.”
Flashy nonsense<br />
Dashboards like this have become a popular form <strong>of</strong> display in the last few years, but<br />
while they give the appearance <strong>of</strong> data density, most <strong>of</strong> them present little <strong>and</strong> the what<br />
they display is presented poorly. But they look cool, <strong>and</strong> for many that’s all that matters,<br />
but it isn’t enough.
3-D graphs are rarely useful.<br />
One <strong>of</strong> the common themes <strong>of</strong> most s<strong>of</strong>tware vendors is that 2-D displays are boring;<br />
never as good as 3D.
Adding a third dimension <strong>of</strong> depth to the bars on the right without adding a<br />
corresponding third variable, however, is not only meaningless, it makes it more difficult<br />
to decode the data.
Even when there’s a 3 rd variable.<br />
Even when a third dimension <strong>of</strong> meaning exists, such as the regions along the Z-axis <strong>of</strong><br />
these graphs, problems with 3-D graphs persist.
The graph on the left illustrate the common problem <strong>of</strong> occlusion--the fact the some<br />
objects, such as these bars, become hidden behind others. In a graph, we should be<br />
able to see all the data at once to make the necessary connections <strong>and</strong> comparisons.<br />
Looking at the graph on the right, I bet you can’t even determine which line represents<br />
the South region. If we can’t figure out which line is which region, what good is the<br />
graph?
We don’t want to confuse people with graphical puzzles.<br />
This chart <strong>of</strong> Escher’s changing popularity through time was created by B. Brucker. I<br />
found it at www.GraphJam.com.
Data visualization builds a bridge<br />
from data to knowledge.<br />
Data <strong>Visual</strong>ization<br />
Knowledge<br />
Data<br />
Built on an underst<strong>and</strong>ing<br />
<strong>of</strong><br />
visual perception<br />
<strong>and</strong><br />
cognition<br />
Data visualization can help us build a bridge from data to knowledge, but only if the<br />
tools that we use were built on an underst<strong>and</strong>ing <strong>of</strong> visual perception (how we see) <strong>and</strong><br />
cognition (how we think).
Good visualizations <strong>and</strong> good visualization tools are carefully designed to take<br />
advantage <strong>of</strong> human perceptual <strong>and</strong> cognitive strengths <strong>and</strong> to augment human abilities<br />
that are weak. If the goal is to count the number <strong>of</strong> circles, this visualization isn’t well<br />
designed. It is difficult to remember what you have <strong>and</strong> have not counted.<br />
50
This visualization, which shows the same number <strong>of</strong> circles, however, is well designed<br />
for the counting task. Because the circles are grouped into small sets <strong>of</strong> five each, it is<br />
easy to remember which groups have <strong>and</strong> have not been counted, easy to quickly count<br />
the number <strong>of</strong> circles in each group, <strong>and</strong> easy to discover with little effort that each <strong>of</strong><br />
the five groups contains the same number <strong>of</strong> circles (i.e., five), resulting in a total count<br />
<strong>of</strong> 25 circles.<br />
51
This arrangement is even better yet.<br />
52
<strong>Visual</strong> perception works according to<br />
its own rules.<br />
Many <strong>of</strong> the ways that visual perception work are not intuitive. Looking at these two sets<br />
<strong>of</strong> objects, we naturally see those on the left as convex <strong>and</strong> on those the right as<br />
concave.
<strong>Visual</strong> perception works according to<br />
its own rules.<br />
The effect has now been reversed: we see the objects on the left as concave <strong>and</strong> those<br />
on the right as convex. All I did, however, was turn each sets <strong>of</strong> objects upside down—I<br />
didn’t switch them. The reason that we now see those on the left as concave is<br />
because, through eons <strong>of</strong> evolution, visual perception learned to assume that light was<br />
shining from above, which causes us to see the objects on the left as concave, because<br />
the shadows are on the top, <strong>and</strong> those on the right as convex, because the shadows<br />
are on the bottom.
<strong>Visual</strong> perception is not just camera work.<br />
Unlike a camera, visual perception does not record absolute values <strong>of</strong> the things that<br />
we see, but differences between them.
<strong>Visual</strong> perception is not just camera work.<br />
Despite how differently they look in the original image, squares A <strong>and</strong> B are exactly the<br />
same color. What we see is not a simple recording <strong>of</strong> what is actually out there. Seeing<br />
is an active process that involves interpretations by our brains <strong>of</strong> data that is sensed by<br />
our eyes in an effort to make sense <strong>of</strong> it in context. The presence <strong>of</strong> the cylinder <strong>and</strong> its<br />
shadow in the image <strong>of</strong> the checkerboard triggers an adjustment in our minds to<br />
perceive the square labeled B as lighter than it actually is. The illusion is also created by<br />
the fact that the sensors in our eyes do not register actual color but rather the difference<br />
in color between something <strong>and</strong> what’s nearby. The contrast between square A <strong>and</strong> the<br />
light squares that surround it <strong>and</strong> square B <strong>and</strong> the dark squares that surround it cause<br />
us to perceive squares A <strong>and</strong> B quite differently, even though they are actually the same<br />
color, as you can clearly see above after all <strong>of</strong> the surrounding context has been<br />
removed.<br />
The ability to use graphs effectively requires a basic underst<strong>and</strong>ing <strong>of</strong> how we<br />
unconsciously interpret what we see.
Context affects what we see. This image illustrates the surprising effect that a simple<br />
change in the lightness <strong>of</strong> the background alone has on our perception <strong>of</strong> color. The<br />
large rectangle displays a simple color gradient <strong>of</strong> a gray-scale from fully light to fully<br />
dark. The small rectangle is the same exact color everywhere it appears, but it doesn’t<br />
look that way because our brains perceive visual differences rather than absolute<br />
values, in this case between the color <strong>of</strong> the small rectangle <strong>and</strong> the color that<br />
immediately surrounds it.<br />
Among other things, underst<strong>and</strong>ing this should tell us that using a color gradient as the<br />
background <strong>of</strong> a graph should be avoided.
Two essential activities<br />
• Multi-faceted views<br />
• Comparisons<br />
Data sensemaking revolves primarily around two essential activities: viewing data from<br />
multiple perspectives <strong>and</strong> comparing things.<br />
No one view <strong>of</strong> the data will suffice. To underst<strong>and</strong> your data thoroughly, you must view<br />
it from every possible perspective. With each new view, more than anything else, you<br />
make compare things.
Multiple views <strong>of</strong> change through time<br />
Have you ever noticed that time-series data can look quite different if you change the<br />
interval <strong>of</strong> time that you’ve aggregated it to in a graph? For example, if you are<br />
examining a year’s worth <strong>of</strong> visits to your website with one value per quarter<br />
(aggregated to quarterly intervals), <strong>and</strong> then switch to a monthly aggregation <strong>of</strong> the<br />
same data, <strong>and</strong> then switch to a daily aggregation, the patterns <strong>of</strong> change might look<br />
quite different.<br />
All three versions <strong>of</strong> the graph are useful <strong>and</strong> correct, but the daily version reveals<br />
details that aren’t visible when viewing the same data by month or quarter. On the other<br />
h<strong>and</strong>, the overall trend is difficult to discern from the daily view. One view isn’t better<br />
than the other in general, but one is definitely better than the other when you have a<br />
specific analytical purpose in mind. For this reason, don’t restrict your view <strong>of</strong> timeseries<br />
data to a single interval <strong>of</strong> time, especially when looking for anything that seems<br />
interesting. Switch the level <strong>of</strong> aggregation from year to quarter, quarter to month,<br />
month to week, week to day, <strong>and</strong> so on—back <strong>and</strong> forth—to tease out the insights that<br />
can only arise when this is done.<br />
S<strong>of</strong>tware products that allow you to quickly <strong>and</strong> easily switch between various intervals<br />
<strong>of</strong> time while viewing data graphically are what’s needed to encourage this practice. The<br />
ability to switch time intervals with a mouse click or two, or by using something as<br />
simple as an interval slider control, will set you free to explore without distraction from<br />
onerous operations.
Data analysis requires rapid interaction.<br />
?Think<br />
See<br />
Modify<br />
Direct dynamic interaction with the properly visualized data allows us to see discover<br />
meaningful patterns, trends, <strong>and</strong> exceptions in the display <strong>and</strong> to interact with it directly<br />
to filter out what we don’t need, drill into details, combine multiple variables for<br />
comparison, etc., in ways that promote a smooth flow between seeing something,<br />
thinking about it, <strong>and</strong> manipulating it, with no distracting lags in between. This is what I<br />
call “visual analysis at the speed <strong>of</strong> thought.”<br />
Great analysts, like great scientists, great artists, great people <strong>of</strong> all sorts, accept the<br />
call to serve as a voice for data. Important stories can be found in data. We can learn to<br />
discern the meanings that live in information <strong>and</strong> to unravel the stories that are woven<br />
through it. What I do isn’t just work; it is my mission. I work hard to learn the world’s<br />
stories <strong>and</strong> to tell them truthfully. I believe that there is no higher calling. I fight against<br />
those who try to hide or alter the truth. I believe that the truth really can set us free.
“Compared to what?”<br />
Edward Tufte once said that the essential question <strong>of</strong> data sensemaking is “Compared<br />
to what?” This involves comparing the magnitudes <strong>of</strong> values, but not just one to another.<br />
We must compare many values. To do so, we must see how they relate to one another<br />
to form patterns. We not only compare the magnitudes <strong>of</strong> values; we also compare<br />
patterns formed by sets <strong>of</strong> values. We look for how they are similar <strong>and</strong> we look for how<br />
they are different, especially differences that appear to be dramatic departures from the<br />
norm. When we spot these visual characteristics in the data, we then interact with the<br />
data to find out why these things have happened.
Key interactions <strong>of</strong> visual analysis<br />
• Sorting<br />
• Filtering<br />
• Adding/removing variables<br />
• Highlighting<br />
• Aggregating/Disaggregating<br />
• Grouping<br />
• Zooming/Panning<br />
• Re-visualizing<br />
• Re-expressing<br />
• Re-scaling<br />
The process <strong>of</strong> visual data analysis involves several common interactions with data to uncover what’s meaningful. Here<br />
are some <strong>of</strong> the primary interactions:<br />
• Sorting. The act <strong>of</strong> sorting data, especially by the magnitude <strong>of</strong> the values from high to low or low to high, features the<br />
ranking relationship between those values <strong>and</strong> makes it easier to compare the magnitude <strong>of</strong> value to the next.<br />
• Adding/removing variables. You might need to view different variable at different times during the analysis process,<br />
so it is common to add or remove field <strong>of</strong> data from view as necessary<br />
• Filtering. When you want to focus on a subset <strong>of</strong> data, nothing makes it easier to do so than filtering—the removal<br />
from view <strong>of</strong> everything your not interested in at the moment.<br />
• Highlighting. Sometimes you want to focus on a subset <strong>of</strong> information, but do so in a way that allows you to maintain<br />
a sense <strong>of</strong> how that subset relates to the whole. Rather than filtering out the data that falls outside your range <strong>of</strong> focus,<br />
you can simply reduce its visual salience or increase the visual salience <strong>of</strong> the data you wish to focus on. This allows<br />
you to focus on the subset with less distraction from the whole in a way that allow you to remain aware <strong>of</strong> the whole.<br />
This is one way <strong>of</strong> achieving what’s called a focus+context view.<br />
• Aggregating/Disaggregating. <strong>An</strong>alysis <strong>of</strong>ten requires that you examine data a different levels <strong>of</strong> detail. Aggregation<br />
involves viewing data at a higher level <strong>of</strong> summarization. Disaggregation involves viewing data at a lower level <strong>of</strong><br />
detail.<br />
• Grouping. Sometimes it is useful to combine members <strong>of</strong> a variable together, treating them as a single member <strong>of</strong> the<br />
variable. This may take the form <strong>of</strong> combining some members <strong>and</strong> leaving others as they are, or <strong>of</strong> creating an entirely<br />
new variable that combines all members <strong>of</strong> an existing variable into a groups to form members <strong>of</strong> a higher level<br />
variable.<br />
• Zooming/Panning. When a data visualization contains so much that it is difficult to clearly see all the data at once, it is<br />
useful to zoom in on that portion that you want to see more clearly. Panning involves moving around (for example, up,<br />
down, right, or left) in a zoomed view to focus on a different part <strong>of</strong> the larger visualization.<br />
• Re-visualizing. No one visual representation <strong>of</strong> data can show you everything there is to see, so visual analysis<br />
involves shifting from one type <strong>of</strong> visualization to another to explore data from various perspectives.<br />
• Re-expressing. Sometimes it is useful to express a quantitative variable as a different unit <strong>of</strong> measure, such as<br />
expressing dollars as percentages.<br />
• Re-scaling. No single quantitative scale on a graph can serve every analytical need. Rescaling involves changing the<br />
range <strong>of</strong> the quantitative scale to make it easier to see particular patterns <strong>and</strong> sometimes even changing the nature <strong>of</strong><br />
the scale, such as from a normal scale to a logarithmic scale.
Dynamic data filters support analytical flow.<br />
Direct dynamic interaction with the data allows you to manipulate the data easily <strong>and</strong><br />
immediately (such as by filtering it), without interrupting your stream <strong>of</strong> thought.
Shneiderman’s Mantra<br />
“Overview first,<br />
zoom <strong>and</strong> filter,<br />
then details-on-dem<strong>and</strong>.”<br />
When new recruits by intelligence organizations are trained in spy craft, they are taught<br />
a method <strong>of</strong> observation that begins by getting an overview <strong>of</strong> the scene around them<br />
while being sensitive to things that appear abnormal, not quite right, which they should<br />
then focus in on for close observation <strong>and</strong> analysis.<br />
A visual information-seeking mantra for designers: ‘Overview first, zoom <strong>and</strong> filter,<br />
then details-on-dem<strong>and</strong>.’<br />
(Readings in Information <strong>Visual</strong>ization: Using Vision to Think, Stuart K. Card, Jock D.<br />
Mackinlay, <strong>and</strong> Ben Shneiderman, Academic Press, San Diego, California, 1999,<br />
page 625)<br />
Having an overview is very important. It reduces search, allows the detection <strong>of</strong><br />
overall patterns, <strong>and</strong> aids the user in choosing the next move. A general heuristic <strong>of</strong><br />
visualization design, therefore, is to start with an overview. But it is also necessary for<br />
the user to access details rapidly. One solution is overview + detail: to provide<br />
multiple views, an overview for orientation, <strong>and</strong> a detailed view for further work.<br />
(Ibid., page 285)<br />
Users <strong>of</strong>ten try to make a ‘good’ choice by deciding first what they do not want, i.e.<br />
they first try to reduce the data set to a smaller, more manageable size. After some<br />
iterations, it is easier to make the final selection(s) from the reduced data set. This<br />
iterative refinement or progressive querying <strong>of</strong> data sets is sometimes known as<br />
hierarchical decision-making.<br />
(Ibid., page 295)
Overview first, zoom <strong>and</strong> filter, then details-on-dem<strong>and</strong><br />
Shneiderman’s technique begins with an overview <strong>of</strong> the data — the big picture. Let<br />
your eyes search for particular points <strong>of</strong> interest in the whole.
Overview first, zoom <strong>and</strong> filter, then details-on-dem<strong>and</strong><br />
When you see a particular point <strong>of</strong> interest, then zoom in on it.
Overview first, zoom <strong>and</strong> filter, then details-on-dem<strong>and</strong><br />
Once you’ve zoomed in on it, you can examine it more closely <strong>and</strong> in greater detail.
Overview first, zoom <strong>and</strong> filter, then details-on-dem<strong>and</strong><br />
Often you must remove data that is extraneous to your investigation to better focus on<br />
the relevant data.
Overview first, zoom <strong>and</strong> filter, then details-on-dem<strong>and</strong><br />
Filtering out extraneous data removes distractions from the data under investigation.
Overview first, zoom <strong>and</strong> filter, then details-on-dem<strong>and</strong><br />
Jan 21, 2011<br />
$25,193<br />
<strong>Visual</strong> data analysis relies mostly on the shape <strong>of</strong> the data to provide needed insights,<br />
but there are still times when you need to see the details behind the shape <strong>of</strong> the data.<br />
Having a means to easily see the details when you need them, without having them in<br />
the way when you don’t works best.
Data visualization must<br />
focus attention<br />
<strong>and</strong><br />
augment memory.<br />
To achieve this goal, tools must be designed to focus our attention on what’s meaningful<br />
in data <strong>and</strong> to help us work around the limitations <strong>of</strong> memory.
Attention is limited.<br />
Ever saw a magic show <strong>and</strong> wondered just how the magician took your watch<br />
without you even noticing? Ever wonder why is it that you can search for a set <strong>of</strong><br />
misplaced keys for a long time, only to later find them sitting in the exact place<br />
where you were looking? Research has shown that we don’t always see everything<br />
we’re looking at, <strong>and</strong> that attention plays a big part in what consciously registers to<br />
us. The effect where we’re blind to things we don’t attend to is known as<br />
‘Inattentional Blindness’.<br />
(Source: This demonstration <strong>of</strong> inattentional blindness <strong>and</strong> the explanation above<br />
was prepared under the direction <strong>of</strong> Ronald A. Rensink <strong>of</strong> the University <strong>of</strong> British<br />
Columbia. Several other examples <strong>of</strong> this visual phenomenon can be found at http://<br />
psyclab1.psych.ubc.ca/~viscoglab/.)
Memory is limited.<br />
What’s not stored in working memory is lost.<br />
In addition to underst<strong>and</strong>ing visual perception, visual analysis tools must also be rooted<br />
in an underst<strong>and</strong>ing <strong>of</strong> how people think. Only then can they recognize <strong>and</strong> support the<br />
cognitive operations that are necessary to make sense <strong>of</strong> information.<br />
Memory plays an important role in human cognition. Because memory suffers from<br />
certain limitations, visual analysis tools must be able to augment memory.<br />
The example above illustrates one <strong>of</strong> the limitations <strong>of</strong> working memory. We only<br />
remember that to which we attend. <strong>An</strong>y part <strong>of</strong> this image that never gets our attention<br />
will not be missed when we shift to another version <strong>of</strong> the image that lacks that<br />
particular part. If we don’t attend to it, we might notice the change from one version <strong>of</strong><br />
the image to the next, but only if the transition shift immediately from one to another,<br />
without even a split second <strong>of</strong> blank space between them.<br />
In addition to not remembering, we also don’t clearly see that on which we don’t focus.<br />
To see something clearly, we must focus on it, for only a small area <strong>of</strong> receptors on the<br />
retinas <strong>of</strong> our eyes are designed for high-resolution vision.<br />
(Source: This demonstration <strong>of</strong> change blindness was prepared by Ronald A. Rensink<br />
<strong>of</strong> the University <strong>of</strong> British Columbia. Several other examples <strong>of</strong> this visual phenomenon<br />
can be found at http://www.psych.ubc.ca/%7erensink/flicker/download/index.html.)
Capacity = 3 or 4<br />
chunks at a time<br />
Imagination<br />
The<br />
World<br />
Working<br />
Memory<br />
Long-term<br />
Memory<br />
When we think about things, trying to make sense <strong>of</strong> them, information is temporarily<br />
held in working memory. Only three or four chunks <strong>of</strong> visual information can be stored in<br />
working memory at any one time. Information that comes in through our eyes or that is<br />
retrieved from long-term memory in the moment <strong>of</strong> thought is extremely limited in<br />
capacity. When you release information from working memory, it can take one <strong>of</strong> two<br />
possible routes on its way out: 1) it can be stored permanently in long-term memory by<br />
means <strong>of</strong> a rehearsal process that we call memorization, or 2) it can simply be<br />
forgotten.
External aids can augment memory.<br />
To compare facts, you must hold them in working memory simultaneously. Because we<br />
can hold so little in working memory at any one time, however, to do analysis effectively,<br />
we must rely on external aids to memory. This is an ideal job for a computer. Even a<br />
piece <strong>of</strong> paper that you jot down notes on to keep track <strong>of</strong> information as you’re<br />
analyzing data is an external memory aid that is quite powerful despite being low-tech.<br />
A computer running properly designed s<strong>of</strong>tware, however, can augment our ability to<br />
think about information much better than pencil <strong>and</strong> paper.
Good analysis tools augment memory by<br />
exp<strong>and</strong>ing...<br />
• Quantity<br />
• Dimensions<br />
• Perspectives<br />
Good visual analysis s<strong>of</strong>tware can help us overcome the limitations <strong>of</strong> working memory<br />
in several ways. The goal is to enable as many meaningful comparisons as possible.<br />
Good tools can help us increase:<br />
• The amount <strong>of</strong> information that we can compare (that is, greater quantity)<br />
• The range <strong>of</strong> information that we can compare (that is, more dimensions)<br />
• The different views <strong>of</strong> the information that we can compare (that is, multiple<br />
perspectives)
Greater quantity through visual encoding<br />
1 chunk<br />
Traditional BI relies mostly on tabular data displays. Tables are wonderful if you need to<br />
look up individual values, compare a single value to another, or know values precisely,<br />
but they don’t display patterns or trends. This is a problem, because data analysis relies<br />
heavily on our ability to spot <strong>and</strong> make sense <strong>of</strong> patterns <strong>and</strong> trends in data. Take a look<br />
at the table <strong>and</strong> compare it to this line graph, which displays the same data. Relying on<br />
the table to discern the ups <strong>and</strong> downs <strong>of</strong> sales through time <strong>and</strong> to compare the<br />
patterns <strong>of</strong> change from region to region would yield very little <strong>of</strong> the information that is<br />
obvious in this graph. <strong>Visual</strong> representations give form to data, making pattern, trends,<br />
<strong>and</strong> exceptions easy to see.<br />
<strong>An</strong>other advantage <strong>of</strong> properly designed graphs over tables for analytical purposes is<br />
less obvious. If you needed to remember information in the table, you could hold only<br />
about four <strong>of</strong> the values (that is, four <strong>of</strong> the monthly sales numbers) in working memory<br />
at any one time. But by relying on the graph, 12 values are combined into each <strong>of</strong> the<br />
four lines to form a pattern that you could hold entirely as a single chunk in working<br />
memory. Simply by giving visual form to the values, you can hold much more<br />
information in memory.
More dimensions through multiple visual<br />
encodings <strong>and</strong> small multiples<br />
You can increase the number <strong>of</strong> dimensions (variables) that can be viewed<br />
simultaneously in two ways: 1) by using multiple visual encodings, <strong>and</strong> 2) by breaking<br />
the data into a series <strong>of</strong> small graphs that share common scales so they can be<br />
compared to one another. In the bubble plot on the right, we can view four variables at<br />
once: the rate <strong>of</strong> property crime based in position along the X-axis, the rate <strong>of</strong> violent<br />
crime based on position along the Y-axis, the population <strong>of</strong> states based on the size <strong>of</strong><br />
the bubbles, <strong>and</strong> the region in which each state resides based on color. The series <strong>of</strong><br />
small multiples on the left allows us to revenue, costs <strong>of</strong> goods sold, <strong>and</strong> gross pr<strong>of</strong>it<br />
along the Y-axis, quarters along the X-axis, <strong>and</strong> different product lines per graph. If we<br />
tried to display all <strong>of</strong> this information in a single graph, it would be far too cluttered <strong>and</strong><br />
difficult to perceive.
More perspectives through coordinated<br />
views<br />
There’s an old folktale that you’ve probably all heard about three blind men who encounte<br />
elephant one day for the first time <strong>and</strong> do their best to learn about it by touch alone. The<br />
experience <strong>of</strong> each is unique because each touches a different part <strong>of</strong> the elephant. This<br />
ancient story, originally from China, can teach us something important today about busine<br />
intelligence (BI). According to the original Chinese tale, the first man touches the elephant<br />
ear, the second his legs, <strong>and</strong> the third his tail. From this point, here’s how the story goes:<br />
The three blind men then went their way. Each one was secretly excited over the<br />
experience <strong>and</strong> had a lot to say, yet all walked rapidly without saying a word.<br />
"Let's sit down <strong>and</strong> have a discussion about this queer animal," the second blind man sa<br />
breaking the silence.<br />
"A very good idea. Very good." the other two agreed for they also had this in mind. With<br />
waiting for anyone to be properly seated, the second one blurted out, "This queer anima<br />
like our straw fans swinging back <strong>and</strong> forth to give us a breeze. However, it's not so big<br />
well made. The main portion is rather wispy."<br />
"No, no!" the first blind man shouted in disagreement. "This queer animal resembles two<br />
big trees without any branches."<br />
"You're both wrong." the third man replied. "This queer animal is similar to a snake; it's l<br />
<strong>and</strong> round, <strong>and</strong> very strong."<br />
How they argued! Each one insisted that he alone was correct. Of course, there was no<br />
conclusion for not one had thoroughly examined the whole elephant. How can anyone<br />
describe the whole until he has learned the total <strong>of</strong> the parts.<br />
If I retold this story today to teach a lesson about BI, I might call it “Three blind analysts an<br />
data warehouse.” Business people struggle every day to make sense <strong>of</strong> data, stumbling<br />
blindly, touching only small parts <strong>of</strong> the information, <strong>and</strong> coming away with a narrow <strong>and</strong><br />
fragmented underst<strong>and</strong>ing <strong>of</strong> what it means.
Coordinated views exp<strong>and</strong> what we can<br />
connect <strong>and</strong> compare.<br />
To underst<strong>and</strong> something, we <strong>of</strong>ten have to examine it from many angles <strong>and</strong> focus on<br />
many parts. Too much business data analysis involves looking only for one thing in<br />
particular. Is revenue going up? The answer is “yes” or “no” — end <strong>of</strong> story. Perhaps,<br />
however, you ought to look at revenues, expenses, pr<strong>of</strong>its, marketing campaigns,<br />
seasonality, composition <strong>of</strong> the sales force, new product introductions, <strong>and</strong> the<br />
competition to underst<strong>and</strong> the richer story that your data has to tell.
Who is qualified to analyze data?<br />
Data<br />
<strong>An</strong>alysis<br />
Knowledge<br />
?<br />
You are the key that opens the door for good data to result in good decisions. S<strong>of</strong>tware,<br />
no matter how sophisticated, is useless if you don’t possess the fundamental skills <strong>of</strong><br />
data analysis. Data analysis is for one purpose: to enable good decisions. Do you need<br />
to be an Einstein to make sense <strong>of</strong> your business data? For most business data<br />
analysis, the answer is “No”, but when statistical sophistication required, you need a<br />
statistically sophisticated tool.
<strong>An</strong>alytical challenges dem<strong>and</strong><br />
statistical horsepower.<br />
<strong>JMP</strong> is one <strong>of</strong> best tools available today for combining robust statistical functionality with<br />
rich interactive visualizations.
Predictive analysis is one <strong>of</strong> these<br />
challenges.<br />
U.S. Dollars<br />
(thous<strong>and</strong>s)<br />
4,000<br />
Sales Revenues<br />
Sales Revenues<br />
3,500<br />
3,000<br />
2,500<br />
2,000<br />
1,500<br />
1,000<br />
500<br />
0<br />
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec<br />
2012<br />
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec<br />
2013<br />
One <strong>of</strong> the analytical challenges that requires a high degree <strong>of</strong> statistical training is its<br />
use for predicting what might happen in the future. Let’s clarify the meaning <strong>of</strong><br />
predictive analytics. Like many terms that get tossed around by s<strong>of</strong>tware vendors <strong>and</strong><br />
even by thought leaders in the field <strong>of</strong> business intelligence, predictive analytics seems<br />
to mean whatever’s convenient at the moment. It might be useful for marketing<br />
purposes to keep definitions loose <strong>and</strong> adaptable to the occasion, but it doesn’t help us<br />
at all. We’ll start by turning to everyone’s favorite dictionary these days: Wikipedia.<br />
Predictive analytics encompass a variety <strong>of</strong> techniques from statistics <strong>and</strong> data<br />
mining that analyze current <strong>and</strong> historical data to make predictions about future<br />
events. Such predictions rarely take the form <strong>of</strong> absolute statements, <strong>and</strong> are more<br />
likely to be expressed as values that correspond to the odds <strong>of</strong> a particular event or<br />
behavior taking place in the future.<br />
In business, predictive models exploit patterns found in historical <strong>and</strong> transactional<br />
data to identify risks <strong>and</strong> opportunities. Models capture relationships among many<br />
factors to allow assessment <strong>of</strong> risk or potential associated with a particular set <strong>of</strong><br />
conditions, guiding decision making for c<strong>and</strong>idate transactions.<br />
Wikipedia entry for “predictive analytics” as <strong>of</strong> December 15, 2008.
Predictions rely on statistical models.<br />
Don’t let the terms statistical model or predictive model throw you. In concept, they’re<br />
quite simple, just like more familiar models <strong>of</strong> other types. Generally speaking, models<br />
are representations <strong>of</strong> things or events, which we use to examine <strong>and</strong> underst<strong>and</strong> those<br />
things or events when it isn’t possible or practical to observe or interact with them<br />
directly. Statistical models represent mathematical relationships between the parts that<br />
make up the thing or event. Predictive models are those that we can interact with to<br />
investigate the results <strong>of</strong> hypothetical conditions, such as by changing the values <strong>of</strong><br />
particular variables.<br />
Predictive models make it possible for us to do what some people call what-if analysis.<br />
What if such <strong>and</strong> such a condition existed or event occurred? What would happen as a<br />
result? Predictive models give us the means to predict what would probably happen<br />
(the probable outcomes <strong>of</strong> dependent variables) if particular conditions arose naturally<br />
or by intention (specified input values to one or more independent variables).
Good models don’t hide variables <strong>and</strong><br />
their relationships.<br />
Inputs<br />
?<br />
Outputs<br />
Most <strong>of</strong> the applications that I’ve seen marketed by business intelligence s<strong>of</strong>tware<br />
vendors for predictive analytics allow data to be entered on one end (inputs) <strong>and</strong> then<br />
results (outputs) pop out the other; what goes on in between remains hidden in a black<br />
box. Unfortunately, without seeing what goes on in that black box, our brains aren’t fully<br />
engaged in the process <strong>and</strong> too much is missed.<br />
Predictive analytics are most revealing when they allow us to see how all the variables<br />
that contribute either directly or indirectly to the outcomes that concern us relate to<br />
those outcomes <strong>and</strong> to one another. To underst<strong>and</strong> these relationships, we must see<br />
them; we must watch how changes in one variable directly cause or indirectly influence<br />
changes in the others. For this to happen, predictive models must be displayed visually<br />
in a way that allows: (1) our eyes to see the relationships <strong>and</strong> changes; <strong>and</strong> (2) our<br />
minds to make sense <strong>of</strong> them.
Finding <strong>and</strong> underst<strong>and</strong>ing the<br />
stories in data is not enough.<br />
You must tell them.<br />
As a data analyst, your job isn’t done once you’ve found <strong>and</strong> made sense <strong>of</strong> the stories<br />
that live in your data. Unless you can tell those stories in ways that others who need the<br />
information can underst<strong>and</strong>, your work is wasted.
Hans Rosling <strong>of</strong> www.GapMinder.org<br />
Few people tell quantitative stories as compellingly as Hans Rosling. When he spoke at<br />
the TED Conference (Technology, Entertainment, <strong>and</strong> Design) for he first time in 2006, it<br />
was perhaps the first time in history that a large audience <strong>of</strong> people found themselves<br />
on the edge <strong>of</strong> their seats watching a bubble plot.
Wisdom<br />
Knowledge<br />
Information<br />
Information cannot speak for itself. It needs our help. It relies on us to give it a voice.<br />
When we do, information can tell its story, <strong>and</strong> will thus become knowledge. The<br />
ultimate goal, however, isn’t knowledge; it is wisdom. Knowledge becomes wisdom<br />
when it is used to do something good. Only when we use what we know to make the<br />
world a better place has information served its purpose <strong>and</strong> we have done our job.<br />
Our networks are awash in data. A little <strong>of</strong> it is information. A smidgen <strong>of</strong> this shows<br />
up as knowledge. Combined with ideas, some <strong>of</strong> that is actually useful. Mix in<br />
experience, context, compassion, discipline, humor, tolerance, <strong>and</strong> humility, <strong>and</strong><br />
perhaps knowledge becomes wisdom.<br />
Turning Numbers into Knowledge, Jonathan G. Koomey, 2001, <strong><strong>An</strong>alytics</strong> Press:<br />
Oakl<strong>and</strong>, CA page 5, quoting Clifford Stoll.
O perpetual revolution <strong>of</strong> configured stars,<br />
O perpetual recurrence <strong>of</strong> determined seasons,<br />
O world <strong>of</strong> spring <strong>and</strong> autumn, birth <strong>and</strong> dying!<br />
The endless cycle <strong>of</strong> idea <strong>and</strong> action,<br />
Endless invention, endless experiment,<br />
Brings knowledge <strong>of</strong> motion, but not <strong>of</strong> stillness;<br />
Knowledge <strong>of</strong> speech, but not <strong>of</strong> silence;<br />
Knowledge <strong>of</strong> words, <strong>and</strong> ignorance <strong>of</strong> The Word.<br />
All our knowledge brings us nearer to our ignorance,<br />
All our ignorance brings us nearer to death,<br />
But nearness to death no nearer to God.<br />
Where is the Life we have lost in living?<br />
Where is the wisdom we have lost in knowledge?<br />
Where is the knowledge we have lost in information?<br />
Excerpt from The Rock, 1934, T.S. Elliot<br />
[Image source: www.irishastronomy.org]
O perpetual revolution <strong>of</strong> configured stars,<br />
O perpetual recurrence <strong>of</strong> determined seasons,<br />
O world <strong>of</strong> spring <strong>and</strong> autumn, birth <strong>and</strong> dying!<br />
The endless cycle <strong>of</strong> idea <strong>and</strong> action,<br />
Endless invention, endless experiment,<br />
Brings knowledge <strong>of</strong> motion, but not <strong>of</strong> stillness;<br />
Knowledge <strong>of</strong> speech, but not <strong>of</strong> silence;<br />
Knowledge <strong>of</strong> words, <strong>and</strong> ignorance <strong>of</strong> The Word.<br />
All our knowledge brings us nearer to our ignorance,<br />
All our ignorance brings us nearer to death,<br />
But nearness to death no nearer to God.<br />
Where is the Life we have lost in living?<br />
Where is the wisdom we have lost in knowledge?<br />
Where is the knowledge we have lost in information?<br />
Excerpt from The Rock, 1934, T.S. Elliot<br />
[Image source: www.trekvisual.com]
O perpetual revolution <strong>of</strong> configured stars,<br />
O perpetual recurrence <strong>of</strong> determined seasons,<br />
O world <strong>of</strong> spring <strong>and</strong> autumn, birth <strong>and</strong> dying!<br />
The endless cycle <strong>of</strong> idea <strong>and</strong> action,<br />
Endless invention, endless experiment,<br />
Brings knowledge <strong>of</strong> motion, but not <strong>of</strong> stillness;<br />
Knowledge <strong>of</strong> speech, but not <strong>of</strong> silence;<br />
Knowledge <strong>of</strong> words, <strong>and</strong> ignorance <strong>of</strong> The Word.<br />
All our knowledge brings us nearer to our ignorance,<br />
All our ignorance brings us nearer to death,<br />
But nearness to death no nearer to God.<br />
Where is the Life we have lost in living?<br />
Where is the wisdom we have lost in knowledge?<br />
Where is the knowledge we have lost in information?<br />
Excerpt from The Rock, 1934, T.S. Elliot<br />
[Image source: www.i.pbase.com]
O perpetual revolution <strong>of</strong> configured stars,<br />
O perpetual recurrence <strong>of</strong> determined seasons,<br />
O world <strong>of</strong> spring <strong>and</strong> autumn, birth <strong>and</strong> dying!<br />
The endless cycle <strong>of</strong> idea <strong>and</strong> action,<br />
Endless invention, endless experiment,<br />
Brings knowledge <strong>of</strong> motion, but not <strong>of</strong> stillness;<br />
Knowledge <strong>of</strong> speech, but not <strong>of</strong> silence;<br />
Knowledge <strong>of</strong> words, <strong>and</strong> ignorance <strong>of</strong> The Word.<br />
All our knowledge brings us nearer to our ignorance,<br />
All our ignorance brings us nearer to death,<br />
But nearness to death no nearer to God.<br />
Where is the Life we have lost in living?<br />
Where is the wisdom we have lost in knowledge?<br />
Where is the knowledge we have lost in information?<br />
Excerpt from The Rock, 1934, T.S. Elliot<br />
[Image source: www.]
Truth<br />
O perpetual revolution <strong>of</strong> configured stars,<br />
O perpetual recurrence <strong>of</strong> determined seasons,<br />
O world <strong>of</strong> spring <strong>and</strong> autumn, birth <strong>and</strong> dying!<br />
The endless cycle <strong>of</strong> idea <strong>and</strong> action,<br />
Endless invention, endless experiment,<br />
Brings knowledge <strong>of</strong> motion, but not <strong>of</strong> stillness;<br />
Knowledge <strong>of</strong> speech, but not <strong>of</strong> silence;<br />
Knowledge <strong>of</strong> words, <strong>and</strong> ignorance <strong>of</strong> The Word.<br />
All our knowledge brings us nearer to our ignorance,<br />
All our ignorance brings us nearer to death,<br />
But nearness to death no nearer to God.<br />
Where is the Life we have lost in living?<br />
Where is the wisdom we have lost in knowledge?<br />
Where is the knowledge we have lost in information?<br />
Excerpt from The Rock, 1934, T.S. Elliot<br />
[Image source: www.shepherdpics.com]
O perpetual revolution <strong>of</strong> configured stars,<br />
O perpetual recurrence <strong>of</strong> determined seasons,<br />
O world <strong>of</strong> spring <strong>and</strong> autumn, birth <strong>and</strong> dying!<br />
The endless cycle <strong>of</strong> idea <strong>and</strong> action,<br />
Endless invention, endless experiment,<br />
Brings knowledge <strong>of</strong> motion, but not <strong>of</strong> stillness;<br />
Knowledge <strong>of</strong> speech, but not <strong>of</strong> silence;<br />
Knowledge <strong>of</strong> words, <strong>and</strong> ignorance <strong>of</strong> The Word.<br />
All our knowledge brings us nearer to our ignorance,<br />
All our ignorance brings us nearer to death,<br />
But nearness to death no nearer to God.<br />
Where is the Life we have lost in living?<br />
Where is the wisdom we have lost in knowledge?<br />
Where is the knowledge we have lost in information?<br />
Excerpt from The Rock, 1934, T.S. Elliot<br />
[Image source: www.i163.photobucket.com]
The value <strong>of</strong> information depends on how it’s used.<br />
Use it wisely.<br />
O perpetual revolution <strong>of</strong> configured stars,<br />
O perpetual recurrence <strong>of</strong> determined seasons,<br />
O world <strong>of</strong> spring <strong>and</strong> autumn, birth <strong>and</strong> dying!<br />
The endless cycle <strong>of</strong> idea <strong>and</strong> action,<br />
Endless invention, endless experiment,<br />
Brings knowledge <strong>of</strong> motion, but not <strong>of</strong> stillness;<br />
Knowledge <strong>of</strong> speech, but not <strong>of</strong> silence;<br />
Knowledge <strong>of</strong> words, <strong>and</strong> ignorance <strong>of</strong> The Word.<br />
All our knowledge brings us nearer to our ignorance,<br />
All our ignorance brings us nearer to death,<br />
But nearness to death no nearer to God.<br />
Where is the Life we have lost in living?<br />
Where is the wisdom we have lost in knowledge?<br />
Where is the knowledge we have lost in information?<br />
Excerpt from The Rock, 1934, T.S. Elliot<br />
[Image source: www.jamin.org]