Better to think of generalized linear models as non-Gaussian regression models. If you have an outcome you wish to model as a result of a multivariate process, but the outcome is a distance, duration, count, rank or other non-Gaussian measure, then GLMs provide a practical and flexible approach. These models assume non-Gaussian stochastic nodes, but they use the same strategy of replacing parameters of the stochastic node with additive models that incorporate predictor variables. So most of what you've already learned about building and interpreting Gaussian regression models remains relevant for non-Gaussian regression. There are some new wrinkles, as you'll see. But the additional power and predictive precision are worth it.

In this book, I focus mainly on GLMs that use a family of outcome distributions known as the exponential family. Indeed, many people actually define GLMs as those regressions that use exponential family distributions. The normal distribution you've come to know and love is the most famous member of the family, but there are others which are just as useful and natural. These distributions are scientifically important and mathematically convenient, partly because they arise naturally from many real-world processes, just like the normal distribution. The naturalness of these distributions helps us appreciate why, for example, the exponential distribution is a fundamental distribution of distances and durations.

But the overall strategy used in building GLMs is not constrained to the exponential family. I'll provide a couple of important applied examples, in the form of rank and ordered regression models. These are not exactly exponential family distributions, although they do build on top of them. Still, by learning how these distributions are built up from more basic considerations of probability, it'll help open your mind to the broader approach of seriously considering how to predict outcomes, even when they are measured on idiosyncratic scales. The goal is to build a model that addresses the measurement design of the data.

Here's the plan. Much of the remainder of this chapter will teach you the basics of modeling distances and durations, kinds of measures I like to think of as displacements. Displacements are continuous measures that contain only information about distance from a point of reference. They are always positive (or zero, in special cases). In the next chapter, I focus on count distributions and models. And then in the next, I discuss special distributions for awkward measures like ordered categories and ranks, as well as ways to mix exponential family distributions together to begin to model variability in underlying processes. All of this work is on a path towards multilevel models, in Chapter 13.

But before settling into the serious work of developing these GLMs, it will pay first to meet the most prominent members of the family. The important point to make, before moving on, is that these distributions are not arbitrary choices. Rather, they relate naturally to the kinds of measures we wish to model. Then I want to foreground some of the additional issues in fitting and interpretation that come with GLMs. You'll meet these issues in detail as you develop the applied cases in the chapters to follow. But it'll help now to outline these issues for you, so you can understand how most of them arise from the same basic characteristic of generalized linear models: the variance of most non-Gaussian outcomes is not independent of the mean.
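To make the "same strategy, different stochastic node" point concrete, here is a minimal base-R sketch. It is not code from this book, which builds its models differently; it simulates a Gaussian outcome and a positive (exponential) outcome from the same kind of linear predictor, then fits each with glm(). The variable names (x, y_gauss, y_exp) and parameter values are illustrative assumptions only.

    # Minimal sketch (not the book's code): the same linear-model strategy
    # applied first to a Gaussian outcome and then to a non-Gaussian one.
    # Variable names and parameter values are illustrative assumptions.
    set.seed(1)
    N <- 100
    x <- runif(N)

    # Gaussian regression: mean mu_i = 2 + 3*x_i, constant variance
    y_gauss <- rnorm(N, mean = 2 + 3 * x, sd = 1)
    m_gauss <- glm(y_gauss ~ x, family = gaussian())

    # Positive outcome (a duration, say): the mean is exp(0.5 + 1.5*x_i),
    # attached to the linear model through a log link, and the variance
    # is a function of the mean rather than a separate free parameter.
    y_exp <- rexp(N, rate = 1 / exp(0.5 + 1.5 * x))
    m_exp <- glm(y_exp ~ x, family = Gamma(link = "log"))

    summary(m_gauss)
    summary(m_exp)

In the second fit, the outcome's variance is tied to its mean instead of being estimated as an independent parameter, which previews the fitting and interpretation issues flagged at the end of the paragraph above.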
9.1.1. Meet the family. FIGURE 9.1 illustrates the representative shapes of the densities for the most common exponential family distributions used in GLMs. The horizontal axis in each plot represents values of a variable, and the vertical axis represents density. For each distribution, the figure also provides the notation for its stochastic node (above each density plot) and the name of R's corresponding built-in density function (below each density plot).
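Since the figure itself is not reproduced in this excerpt, the same idea can be sketched with R's built-in density functions. The five distributions below (normal, exponential, gamma, Poisson, binomial) are the standard exponential-family members used in GLMs and are assumed to correspond to the figure's panels; the parameter values are arbitrary display choices, not the ones used in the figure.

    # Rough sketch of the densities in FIGURE 9.1, using R's built-in
    # density functions. Parameter values are arbitrary display choices.
    par(mfrow = c(1, 5))
    curve(dnorm(x, mean = 0, sd = 1), from = -4, to = 4,
          main = "normal (dnorm)", xlab = "value", ylab = "density")
    curve(dexp(x, rate = 1), from = 0, to = 5,
          main = "exponential (dexp)", xlab = "value", ylab = "density")
    curve(dgamma(x, shape = 2, rate = 1), from = 0, to = 8,
          main = "gamma (dgamma)", xlab = "value", ylab = "density")
    plot(0:10, dpois(0:10, lambda = 2.5), type = "h",
         main = "Poisson (dpois)", xlab = "count", ylab = "probability")
    plot(0:10, dbinom(0:10, size = 10, prob = 0.3), type = "h",
         main = "binomial (dbinom)", xlab = "count", ylab = "probability")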
