Chapter 10 Continuous probability distributions - Ugrad.math.ubc.ca

Chapter 10
Continuous probability distributions

10.1 Introduction

We call x a continuous random variable in a ≤ x ≤ b if x can take on any value in this interval. An example of a random variable is the height of an adult human male, selected randomly from a population. (This takes on values in a range 0.5 ≤ x ≤ 3 meters, say, so a = 0.5 and b = 3.) If we select a male subject at random from a large population and measure his height, we might expect to get a result in the proximity of 1.7-1.8 meters most often; thus, such heights will be associated with a larger value of probability than heights in some other interval of equal length, e.g. heights in the range 2.7 < x < 2.8 meters.

Unlike the case of discrete probability, however, the measured height can take on any real number within the interval of interest. This leads us to redefine our idea of probability, using a continuous function in place of the discrete bar graph seen in the previous chapter.

10.2 Basic definitions and properties

Definition
A function p(x) is a probability density provided it satisfies the following conditions:

1. p(x) ≥ 0 for all x.
2. ∫_a^b p(x) dx = 1, where the possible range of values of x is a ≤ x ≤ b.

The probability that a random variable x takes on values in the interval a_1 ≤ x ≤ a_2 is defined as

∫_{a_1}^{a_2} p(x) dx.

v.2005.1 - January 5, 2009
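These two conditions are easy to check numerically for any candidate density. The sketch below is our own illustration (not part of the notes); the function name, grid size, and tolerance are arbitrary choices:

```python
def is_probability_density(p, a, b, n=50_000):
    """Numerically check that p(x) >= 0 on [a, b] and that its integral is 1."""
    dx = (b - a) / n
    xs = [a + (i + 0.5) * dx for i in range(n)]   # midpoint grid
    if any(p(x) < 0 for x in xs):
        return False                              # condition 1 fails
    total = sum(p(x) for x in xs) * dx            # midpoint Riemann sum
    return abs(total - 1.0) < 1e-4                # condition 2, within tolerance

# The uniform density p(x) = 1/2 on [0, 2] qualifies; p(x) = 1 on [0, 2] does not.
print(is_probability_density(lambda x: 0.5, 0.0, 2.0))  # True
print(is_probability_density(lambda x: 1.0, 0.0, 2.0))  # False
```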


Math 103 Notes - Chapter 10

Unlike our previous discrete probability, we will not ask "what is the probability that x takes on some exact value?" Rather, we ask for the probability that x is within some range of values, and this is computed by performing an integral. (Remark: the probability that x is exactly equal to b is the integral ∫_b^b p(x) dx = 0; the value is zero, by properties of the definite integral.)

Definition
The cumulative distribution function F(x) represents the probability that the random variable takes on any value up to x, i.e.

F(x) = ∫_a^x p(s) ds.

The cumulative distribution is simply the area under the probability density between a and x.

The above definition has several implications:

Properties of continuous probability

1. Since p(x) ≥ 0, the cumulative distribution is an increasing function.
2. The connection between the probability density and its cumulative distribution can be written (using the Fundamental Theorem of Calculus) as
   p(x) = F′(x).
3. F(a) = 0. This follows from the fact that
   F(a) = ∫_a^a p(s) ds.
   By a property of the definite integral, this is zero.
4. F(b) = 1. This follows from the fact that
   F(b) = ∫_a^b p(s) ds = 1
   by property 2 of p(x).
5. The probability that x takes on a value in the interval a_1 ≤ x ≤ a_2 is the same as
   F(a_2) − F(a_1).
   This follows from the additive property of integrals:
   ∫_a^{a_2} p(s) ds − ∫_a^{a_1} p(s) ds = ∫_{a_1}^{a_2} p(s) ds.
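The properties listed above can be illustrated numerically. The following sketch (our own addition; the helper name and grid size are arbitrary) builds a table of F from p with the trapezoid rule and checks properties 1, 3, and 4 for a uniform density:

```python
def cdf_table(p, a, b, n=10_000):
    """Tabulate F(x_i) = integral of p from a to x_i, via the trapezoid rule."""
    dx = (b - a) / n
    xs = [a + i * dx for i in range(n + 1)]
    ys = [p(x) for x in xs]
    F = [0.0]                                  # property 3: F(a) = 0
    for i in range(n):
        F.append(F[-1] + (ys[i] + ys[i + 1]) / 2 * dx)
    return xs, F

# uniform density on [0, 2]: F rises linearly from 0 to 1
xs, F = cdf_table(lambda x: 0.5, 0.0, 2.0)
print(F[0], round(F[-1], 6))                             # 0.0 1.0
print(all(F[i] <= F[i + 1] for i in range(len(F) - 1)))  # True: F is increasing
```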


Finding the normalization constant

Not every function can represent a probability density. For one thing, the function must be non-negative everywhere. Further, the total area under its graph should be 1, by property 2 of a probability density. However, in many cases we can convert a function to a probability density by simply multiplying it by a constant C, equal to the reciprocal of the total area under its graph over the interval of interest. This process is called "normalization", and the constant C is called the normalization constant.

10.2.1 Example

Consider the function f(x) = sin(πx/6) for 0 ≤ x ≤ 6.

(a) Normalize the function so that it describes a probability density.
(b) Find the cumulative distribution function, F(x).

Solution

The function is non-negative in the interval 0 ≤ x ≤ 6, so we can define the desired probability density. Let

p(x) = C sin(πx/6).

(a) We must find the normalization constant C such that

∫_0^6 p(x) dx = 1.

Here is how we find the constant:

1 = ∫_0^6 C sin(πx/6) dx = C (6/π) (−cos(πx/6)) |_0^6

1 = C (6/π) (−cos(π) + cos(0)) = C (12/π).

Thus, we find that the desired constant of normalization is

C = π/12.

Once we rescale our function by this constant, we get the desired probability density,

p(x) = (π/12) sin(πx/6).

This density has the property that the total area under its graph over the interval 0 ≤ x ≤ 6 is 1. A graph of this probability density function is shown in Figure 10.1(a).

(b)

F(x) = ∫_0^x p(s) ds = (π/12) ∫_0^x sin(πs/6) ds


F(x) = (π/12) (−6/π) cos(πs/6) |_0^x = (1/2) (1 − cos(πx/6)).

This cumulative distribution function is shown in Figure 10.1(b).

Figure 10.1: (a) The probability density p(x) (left) and (b) the cumulative distribution F(x) (right) for Example 10.2.1.

10.3 Mean and median

Recall that we defined the mean of a distribution of grades or mass in a previous chapter. For a mass density ρ(x), the idea of the mean coincides with the center of mass of the distribution,

¯x = ∫_a^b x ρ(x) dx / ∫_a^b ρ(x) dx.

This definition can also be applied to a probability density, but in this case the integral in the denominator is simply 1 (by property 2), i.e. ∫_a^b p(x) dx = 1. (The simplification is analogous to an observation we made for the expected value of a discrete probability distribution.)

We define the mean of a probability density as follows:

Definition
For a random variable in a ≤ x ≤ b and a probability density p(x) defined on this interval, the mean or average value of x (also called the expected value), denoted ¯x, is given by

¯x = ∫_a^b x p(x) dx.

The idea of the median encountered previously in grade distributions also has a parallel here. Simply put, the median is the value of x that splits the probability distribution into two portions whose areas are identical.


Definition
The median x_m of a probability distribution is a value of x in the interval a ≤ x_m ≤ b such that

∫_a^{x_m} p(x) dx = ∫_{x_m}^b p(x) dx = 1/2.

It follows from this definition that the median is the value of x for which the cumulative distribution satisfies

F(x_m) = 1/2.

10.3.1 Example

Find the mean and the median of the probability distribution found in Example 10.2.1.

Solution

To find the mean we compute

¯x = (π/12) ∫_0^6 x sin(πx/6) dx.

Integration by parts is required here. Let u = x and dv = sin(πx/6) dx. Then du = dx and v = −(6/π) cos(πx/6). The result is

¯x = (π/12) [ −x (6/π) cos(πx/6) |_0^6 + (6/π) ∫_0^6 cos(πx/6) dx ]

¯x = (1/2) [ −x cos(πx/6) |_0^6 + (6/π) sin(πx/6) |_0^6 ]

¯x = (1/2) [ −6 cos(π) + (6/π) sin(π) − (6/π) sin(0) ] = 6/2 = 3.

To find the median, x_m, we set

F(x_m) = 1/2.

Using the form of the cumulative distribution found in Example 10.2.1, we find that

(1/2) (1 − cos(πx_m/6)) = 1/2,

1 − cos(πx_m/6) = 1.


It follows that cos(πx_m/6) = 0. The angles whose cosine is zero are ±π/2, ±3π/2, etc. We select the angle in the relevant interval, i.e. π/2. This leads to

(π/6) x_m = π/2,

so the median is

x_m = 3.

Remark

A glance at the original probability distribution should convince us that it is symmetric about the value x = 3. Thus we should have anticipated that the mean and median of this distribution would both occur at the same place, i.e. at the midpoint of the interval. This will be true in general for symmetric probability distributions, just as it was for symmetric mass or grade distributions.

10.3.2 How is the mean different from the median?

Figure 10.2: A symmetric probability density (left) and an asymmetric one (right); the vertical line marks the median and the triangular "pivot point" marks the mean.

We have seen in Example 10.3.1 that for symmetric distributions the mean and the median are the same. Is this always the case? When are the two different, and how can we understand the distinction?

Recall that the mean is closely associated with the idea of a center of mass, a concept from physics that describes the location of a pivot point at which the entire "mass" would exactly balance. It is worth remembering that

mean of p(x) = expected value of x = average value of x.

This concept is not to be confused with the average value of a function, which is an average of the y coordinate.

The median simply indicates the place at which the "total mass" is subdivided into two equal portions. (In the case of a probability density, each of those portions represents an equal area, A_1 = A_2 = 1/2, since the total area under the graph is 1 by definition.)
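Both claims are easy to check numerically. The sketch below is our own illustration: it verifies that the symmetric density of Example 10.3.1 has mean = median = 3, and contrasts it with the asymmetric density p(x) = 3x² on [0, 1] (our own example, not from the notes), for which the two differ:

```python
import math

# Symmetric case: the density of Example 10.3.1, p(x) = (pi/12) sin(pi x/6) on [0, 6].
p = lambda x: math.pi / 12 * math.sin(math.pi * x / 6)
F = lambda x: 0.5 * (1 - math.cos(math.pi * x / 6))

n = 100_000
dx = 6.0 / n
mean_sym = sum((i + 0.5) * dx * p((i + 0.5) * dx) for i in range(n)) * dx

lo, hi = 0.0, 6.0                      # bisection for the median: F(x) = 1/2
while hi - lo > 1e-10:
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if F(mid) < 0.5 else (lo, mid)

print(round(mean_sym, 4), round((lo + hi) / 2, 4))   # 3.0 3.0 (they coincide)

# Asymmetric case (our own example): p(x) = 3x^2 on [0, 1] piles its mass near x = 1.
mean_asym = 3 / 4               # integral of x * 3x^2 over [0, 1]
median_asym = 0.5 ** (1 / 3)    # solve F(x) = x^3 = 1/2
print(mean_asym, round(median_asym, 4))              # 0.75 0.7937 (they differ)
```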


Figure 10.2 shows how the two concepts of median (indicated by a vertical line) and mean (indicated by a triangular "pivot point") differ. At the left, for a symmetric probability density, the mean and the median coincide, just as they did in Example 10.3.1. At the right, a small portion of the distribution has been moved off to the far right. This change does not affect the location of the median, since the areas to the right and to the left of the vertical line are still equal. However, the fact that part of the mass is farther away to the right leads to a shift in the mean of the distribution.

Simply put, the mean contains more information about the way that the distribution is arranged spatially. This stems from the fact that the mean of the distribution is a "sum" (i.e. integral) of terms of the form x p(x) ∆x. Thus the location along the x axis, x, not just the "mass" p(x) ∆x, affects the contribution of parts of the distribution to the value of the mean.

10.4 Radioactive decay

Radioactive decay is a probabilistic phenomenon: an atom spontaneously emits a particle and changes into a new form. We cannot predict exactly when a given atom will undergo this event, but we can study a large collection of atoms and draw some interesting conclusions.

We can define a probability density function that represents the probability that an atom decays at time t. This function represents the fraction of the atoms that decay per unit time.
It turns out that a good candidate for such a function is

p(t) = C e^(−kt),

where k is a constant that represents the rate of decay of the specific radioactive material. In principle, this function is defined over the interval 0 ≤ t ≤ ∞; that is, it is possible that we would have to wait a "very long time" to have all of the atoms decay. This means that these integrals have to be evaluated "at infinity", introducing a complication that we will learn how to handle in the context of this example. Using this function we can characterize the mean and median decay time for the material.

Normalization

We first find the constant of normalization, i.e. we need to ensure that

∫_0^∞ p(t) dt = 1, that is, ∫_0^∞ C e^(−kt) dt = 1.

An integral of this sort, in which one of the endpoints is at infinity, is called an improper integral. We must see if it makes sense by computing it as a limit, i.e. by calculating

I_T = ∫_0^T C e^(−kt) dt = (C/k) (1 − e^(−kT))

and letting T → ∞. Since e^(−kT) → 0, the limit is C/k, and setting C/k = 1 gives the normalization constant C = k, so that

p(t) = k e^(−kt).
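The limiting behaviour of I_T can be watched directly. In this sketch (our own; the value k = 0.5 and the step count are arbitrary) we approximate I_T = ∫_0^T k e^(−kt) dt for growing T and see it approach 1, consistent with the normalization C = k:

```python
import math

k = 0.5   # arbitrary decay rate, chosen only for this demonstration

def I_T(T, n=100_000):
    """Midpoint-rule approximation of the integral of k e^(-kt) from 0 to T."""
    dt = T / n
    return sum(k * math.exp(-k * (i + 0.5) * dt) for i in range(n)) * dt

for T in (1, 5, 20, 80):
    print(T, round(I_T(T), 6))   # the values climb toward 1 as T grows
```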


Median decay time

We can use the cumulative distribution function, F(t) = ∫_0^t p(s) ds = 1 − e^(−kt), to determine the median decay time t_m, the time at which half of the atoms have decayed. Setting F(t_m) = 1/2 gives

1 − e^(−k t_m) = 1/2,
e^(−k t_m) = 1/2,
e^(k t_m) = 2,
k t_m = ln 2.

So we find that

t_m = ln 2 / k.

Thus half of the atoms have decayed by this time. (Remark: this is easily recognized as the half-life of the radioactive process, from previous familiarity with exponentially decaying functions.)

Mean decay time

The mean decay time ¯t is given by

¯t = ∫_0^∞ t p(t) dt.

We compute this integral again as an improper integral, by taking a limit as the top endpoint increases to infinity: we first find

I_T = ∫_0^T t p(t) dt = ∫_0^T t k e^(−kt) dt = k ∫_0^T t e^(−kt) dt,

and then set

¯t = lim_{T→∞} I_T.

To compute I_T we use integration by parts. Let u = t and dv = e^(−kt) dt. Then du = dt and v = e^(−kt)/(−k), so that

I_T = k [ t e^(−kt)/(−k) − ∫ e^(−kt)/(−k) dt ] |_0^T
    = [ −t e^(−kt) + ∫ e^(−kt) dt ] |_0^T
    = [ −t e^(−kt) − e^(−kt)/k ] |_0^T.


Evaluating at the endpoints gives

I_T = −T e^(−kT) − e^(−kT)/k + 1/k.

Now as T → ∞ we have T e^(−kT) → 0 and e^(−kT) → 0, so that

¯t = lim_{T→∞} I_T = 1/k.

Thus the mean or expected decay time is

¯t = 1/k.

10.5 Discrete versus continuous probability

In an earlier chapter, we compared the treatment of two types of mass distributions. We first explored a set of discrete masses strung along a "thin wire". Later, we considered a single "bar" with a continuous distribution of density along its length. In the first case, there was an unambiguous meaning to the concept of "mass at a point". In the second case, we could assign a mass to some section of the bar between, say, x = a and x = b. (To do so we had to integrate the mass density on the interval a ≤ x ≤ b.) In the first case, we talked about the mass of the objects, whereas in the latter case, we were interested in the idea of density (mass per unit distance; note that the units of mass density are not the same as the units of mass).

The same dichotomy exists in the topic of probability. In an earlier chapter, we were concerned with the probability of discrete events whose outcomes belong to some finite set of possibilities (e.g. Head or Tail for a coin toss, allele A or a in genetics). But many random processes lead to an infinite, continuous set of possible outcomes. We need the notion of continuous probability to deal with such cases. In continuous probability, we consider the probability density, analogous to mass density. We can assign a probability to some range of values of the outcome between x = a and x = b.
(To do so we have to integrate the probability density on the interval a ≤ x ≤ b.)

The examples below provide some further insight into the connection between continuous and discrete probability. In particular, we will see that one can arrive at the idea of a probability density by refining a set of measurements and applying the appropriate scaling. We explore this connection in more detail below.

10.5.1 Example: Student heights

Suppose we measure the heights of all UBC students. This would produce about 30,000 data values. We could make a graph and show how these heights are distributed. For example, we could subdivide the student body into those students between 0 and 1.5 meters, and those between 1.5 and 3 meters. Our bar graph would contain two bars, with the number of students in each height category represented by the height of the bar, as shown in Figure 10.3(a).

Suppose we want to keep more detail. We could divide the population into smaller groups by shrinking the size of the interval, or "bin", into which height is subdivided. (An example is shown in Figure 10.3(b).) Here, by a "bin" we mean a little interval of width ∆h, where h is height, i.e. a height interval. For example, we could keep track of the heights in increments of 50 cm. If we were to plot the number of students in each height category, then as the size of the bins gets smaller, so would the height of the bars: there would be fewer students in each category if we increase the number of categories.

Figure 10.3: Refining a histogram by increasing the number of bins leads (eventually) to the idea of a continuous probability density.

To keep the bar height from shrinking, we might reorganize the data slightly. Instead of plotting the number of students in each bin, we might plot

(number of students in bin) / ∆h.

If we do this, then both numerator and denominator decrease as the size of the bins is made smaller, so that the shape of the distribution is preserved (i.e. it does not get flatter).

We observe that in this case, the number of students in a given height category is represented by the area of the bar corresponding to that category:

Area of bin = ((number of students in bin) / ∆h) ∆h = number of students in bin.

The important point to consider is that the height of each bar in the plot represents the number of students per unit height.

This type of plot is precisely what leads us to the idea of a density distribution. As ∆h shrinks, we get a continuous graph. If we "normalize", i.e. divide by the total area under the graph, we get a probability density, p(h), for the height of the population. As noted, p(h) represents the fraction of students per unit height whose height is h.
It is thus a density, and has the appropriate units. More generally,

p(x) ∆x

represents the fraction of individuals whose height is in the range x ≤ h ≤ x + ∆x.

10.5.2 Example: Age-dependent mortality

Let p(a) be a probability density for the probability of mortality of a female Canadian non-smoker at age a, where 0 ≤ a ≤ 120. (We have chosen an upper endpoint of age 120 since practically no Canadian female lives past this age at present.) Let F(a) be the cumulative distribution corresponding to this probability density.

(a) What is the probability of dying by age a?
(b) What is the probability of surviving to age a?
(c) Suppose that we are told that F(75) = 0.8 and that F(80) differs from F(75) by 0.11. What is the probability of surviving to age 80? Which is larger, F(75) or F(80)?
(d) Use the information in part (c) to estimate the probability of dying between the ages of 75 and 80 years old. Further, estimate p(80) from this information.

Solution

(a) The probability of dying by age a is the same as the probability of dying any time up to age a, i.e. it is

F(a) = ∫_0^a p(s) ds,

the cumulative distribution for this probability density. That, precisely, is the interpretation of the cumulative function.

(b) The probability of surviving to age a is the same as the probability of not dying before age a. By the elementary properties of probability discussed in the previous chapter, this is

1 − F(a).

(c) From the properties of probability, we know that the cumulative distribution is an increasing function, so it must be true that F(80) > F(75). Then F(80) = F(75) + 0.11 = 0.8 + 0.11 = 0.91. Thus the probability of surviving to age 80 is 1 − 0.91 = 0.09.
This means that 9% of the population will make it to their 80th birthday, according to this analysis.

(d) The probability of dying between the ages of 75 and 80 years old is exactly

∫_75^80 p(x) dx.

However, we can also state this in terms of the cumulative function, since

∫_75^80 p(x) dx = ∫_0^80 p(x) dx − ∫_0^75 p(x) dx = F(80) − F(75) = 0.11.

Thus the probability of death between the ages of 75 and 80 is 0.11.

To estimate p(80), we use the connection between the probability density and the cumulative distribution:

p(x) = F′(x).

Then it is approximately true that

p(x) ≈ (F(x) − F(x − ∆x)) / ∆x.
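The arithmetic of parts (c) and (d) amounts to a few difference quotients on F; a minimal sketch using the values given in the example:

```python
F75, F80 = 0.8, 0.91            # given: F(75), and F(80) = F(75) + 0.11

survive_80 = 1 - F80            # probability of surviving to age 80
die_75_to_80 = F80 - F75        # probability of dying between ages 75 and 80
p80 = die_75_to_80 / (80 - 75)  # difference-quotient estimate of p(80)

print(round(survive_80, 2))     # 0.09
print(round(die_75_to_80, 2))   # 0.11
print(round(p80, 3))            # 0.022 (per year)
```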


(Recall the definition of the derivative: the limit of the slope of the secant line as the width increment ∆x approaches 0.) Thus

p(80) ≈ (F(80) − F(75)) / 5 = 0.11 / 5 = 0.022 per year.

10.5.3 Example: Raindrop size distribution

During a Vancouver rainstorm, the density function which describes the radii of raindrops is constant over the range 0 ≤ r ≤ 4 (where r is measured in mm) and zero for larger r.

(a) What is the density function p(r)?
(b) What is the cumulative distribution F(r)?
(c) In terms of the volume, what is the cumulative distribution F(V)?
(d) In terms of the volume, what is the density function p(V)?
(e) What is the average volume of a raindrop?

Solution

This problem is challenging because one may be tempted to think that the uniform distribution of drop radii should give a uniform distribution of drop volumes. This is not the case, as the following argument shows! The sequence of steps is illustrated in Figure 10.4.

Figure 10.4: Probability densities for raindrop radius and raindrop volume (left panels) and the cumulative distributions (right panels) of each.
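Before working through the algebra, a quick simulation suggests what to expect. In this sketch (our own; the sample size and seed are arbitrary) we draw radii uniformly from [0, 4] mm, convert each to a volume, and tally how many drops fall in each quarter of the volume range; the counts are far from uniform:

```python
import math
import random

random.seed(0)                      # arbitrary seed, for reproducibility
V_max = 256 * math.pi / 3           # largest possible drop volume (r = 4 mm)

# draw radii uniformly from [0, 4] and convert each to a volume
volumes = [4 / 3 * math.pi * random.uniform(0, 4) ** 3 for _ in range(100_000)]

# fraction of drops in each quarter of the volume range
fracs = []
for j in range(4):
    lo, hi = j * V_max / 4, (j + 1) * V_max / 4
    fracs.append(sum(lo <= v < hi for v in volumes) / len(volumes))
print([round(f, 2) for f in fracs])   # most of the mass sits in the lowest quarter
```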


(a) The density function is p(r) = 1/4 for 0 ≤ r ≤ 4. This means that the probability per unit radius of finding a drop of radius r is the same for all radii in 0 ≤ r ≤ 4. Some of these drops will correspond to small volumes, and others to very large volumes. We will see that the probability per unit volume of finding a drop of given volume will be quite different.

(b) The cumulative distribution function is

F(r) = ∫_0^r (1/4) ds = r/4, 0 ≤ r ≤ 4.

(c) The cumulative distribution function is proportional to the radius of the drop. We use the connection between the radius and the volume of a sphere to rewrite that function in terms of the volume of the drop: since

V = (4/3) π r³,

we have

r = (3V / (4π))^(1/3),

and so

F(V) = r/4 = (1/4) (3V / (4π))^(1/3).

We find the range of values of V by substituting r = 0 and r = 4 into the equation V = (4/3) π r³, to get V = 0 and V = (4/3) π 4³. Therefore the interval is 0 ≤ V ≤ (4/3) π 4³ = (256/3) π.

(d) We now use the connection between the probability density and the cumulative distribution, namely that p is the derivative of F. Now that the variable has been converted to volume, that derivative is a little more "interesting":

p(V) = F′(V) = (1/4) (3 / (4π))^(1/3) (1/3) V^(−2/3).

Thus the probability per unit volume of finding a drop of volume V in 0 ≤ V ≤ (256/3) π is not at all uniform. This results from the fact that the differential quantity dr behaves very differently from dV, and reinforces the fact that we are dealing with a density, not with a probability per se. We note that this distribution has smaller values at larger values of V.

(e) The range of values of V is 0 ≤ V ≤ (4/3) π 4³ = 256π/3.


Therefore the mean volume is

¯V = ∫_0^{256π/3} V p(V) dV
   = (1/12) (3/(4π))^(1/3) ∫_0^{256π/3} V · V^(−2/3) dV
   = (1/12) (3/(4π))^(1/3) ∫_0^{256π/3} V^(1/3) dV
   = (1/12) (3/(4π))^(1/3) (3/4) V^(4/3) |_0^{256π/3}
   = (1/16) (3/(4π))^(1/3) (256π/3)^(4/3)
   = 64π/3 ≈ 67 mm³.

10.6 Moments of a probability distribution

We are now familiar with some of the properties of probability distributions. In this section we will introduce a set of numbers that describe various properties of such distributions. Some of these have already been encountered in our previous discussion, but now we will see that they fit into a pattern of quantities called moments of the distribution.

Moments

Let f(x) be any function which is defined and positive on an interval [a, b]. We might refer to the function as a distribution, whether or not we consider it to be a probability density. Then we define the following moments of this function:

zeroth moment:  M_0 = ∫_a^b f(x) dx
first moment:   M_1 = ∫_a^b x f(x) dx
second moment:  M_2 = ∫_a^b x² f(x) dx
...
n-th moment:    M_n = ∫_a^b x^n f(x) dx.

Observe that moments of any order are defined by integrating the distribution f(x) with a suitable power of x over the interval [a, b]. However, in practice we will see that usually only moments up to the second are needed to describe common attributes of a distribution.

10.6.1 Moments of a probability density distribution

In the particular case that the distribution is a probability density p(x), defined on the interval a ≤ x ≤ b, we have already established the following:

M_0 = ∫_a^b p(x) dx = 1.

(This follows from the basic property of a probability density.) Thus:

The zeroth moment of any probability density is 1.

Further,

M_1 = ∫_a^b x p(x) dx = ¯x = µ.

That is:

The first moment of a probability density is the same as the mean (i.e. expected value) of that probability density.

So far, we have used the symbol ¯x to represent the mean or average value of x, but the symbol µ is also often used to denote the mean.

The second moment of a probability density also has a useful interpretation. From the above definitions, the second moment of p(x) over the interval a ≤ x ≤ b is

M_2 = ∫_a^b x² p(x) dx.

We will shortly see that the second moment helps describe the way that the density is distributed about the mean. For this purpose, we must describe the notion of variance or standard deviation.

Variance and standard deviation

Two kids of approximately the same size can balance on a teeter-totter by sitting very close to the point at which the beam pivots. They can also achieve a balance by sitting at the very ends of the beam, equally far away. In both cases, the center of mass of the distribution is at the same place: precisely at the pivot point. However, the mass is distributed very differently in these two cases. In the first case, the mass is clustered close to the center, whereas in the second, it is distributed further away.
We may want to be able to describe this distinction, and we could do so by considering higher moments of the mass distribution.

Similarly, if we want to describe how a probability density is distributed about its mean, we consider moments higher than the first. We use the idea of the variance to describe whether the distribution is clustered close to its mean, or spread out over a great distance from the mean.
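Moments of this kind are straightforward to approximate numerically. The helper below is our own sketch (midpoint rule; the step count is arbitrary); applied to the density of Example 10.2.1 it recovers M_0 = 1 and M_1 = 3, the mean found earlier:

```python
import math

def moment(n, f, a, b, steps=100_000):
    """n-th moment M_n = integral of x^n f(x) over [a, b], by a midpoint sum."""
    dx = (b - a) / steps
    return sum((a + (i + 0.5) * dx) ** n * f(a + (i + 0.5) * dx)
               for i in range(steps)) * dx

# the density of Example 10.2.1
p = lambda x: math.pi / 12 * math.sin(math.pi * x / 6)

print(round(moment(0, p, 0, 6), 4))   # 1.0 (zeroth moment of any density)
print(round(moment(1, p, 0, 6), 4))   # 3.0 (the mean)
print(round(moment(2, p, 0, 6), 4))   # second moment, used for the variance
```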


Variance

The variance is defined as the average value of the quantity (distance from mean)², where the average is taken over the whole distribution. (The reason for the square is that we would not like values to the left and right of the mean to cancel out.)

For a discrete probability distribution with mean µ, we define the variance by

V = Σ_i (x_i − µ)² p_i.

For a continuous probability density with mean µ, we define the variance by

V = ∫_a^b (x − µ)² p(x) dx.

The standard deviation

The standard deviation is defined as

σ = √V.

Let us see what this implies about the connection between the variance and the moments of the distribution.

Relationship of variance to second moment

From the equation for the variance we calculate that

V = ∫_a^b (x − µ)² p(x) dx = ∫_a^b (x² − 2µx + µ²) p(x) dx.

Expanding the integral leads to

V = ∫_a^b x² p(x) dx − ∫_a^b 2µx p(x) dx + ∫_a^b µ² p(x) dx
  = ∫_a^b x² p(x) dx − 2µ ∫_a^b x p(x) dx + µ² ∫_a^b p(x) dx.

We recognize the integrals in the above expression, since they are simply moments of the probability distribution. Plugging in these facts, we arrive at


V = M_2 − 2µ·µ + µ².

Thus

V = M_2 − µ².

Thus the variance is clearly related to the second moment and to the mean of the distribution.

Relationship of the standard deviation to the second moment

From the definitions given above, we find that

σ = √V = √(M_2 − µ²).

10.6.2 Example

Consider the continuous distribution in which the probability is constant, p(x) = C, for values of x in the interval [a, b] and zero for values outside this interval. Such a distribution is called a uniform distribution. (It has the shape of a rectangular band of height C and base (b − a).) It is easy to see that the value of the constant C should be 1/(b − a), so that the area under this rectangular band is 1, in keeping with the property of a probability distribution. Thus the equation of this probability density is

p(x) = 1/(b − a).

We compute the first moments of this probability density:

M_0 = ∫_a^b p(x) dx = (1/(b − a)) ∫_a^b 1 dx = 1.

(This was already known, since we have determined that the zeroth moment of any probability density is 1.) We also find that

M_1 = ∫_a^b x p(x) dx = (1/(b − a)) ∫_a^b x dx = (1/(b − a)) (x²/2) |_a^b = (b² − a²) / (2(b − a)).

This last expression can be simplified by factoring, leading to

µ = M_1 = (b − a)(b + a) / (2(b − a)) = (b + a)/2.

Thus we have found that the mean µ is in the center of the interval [a, b], as expected. The median is at the same place by a simple symmetry argument: half the area is to the left and half the area is to the right of this point.
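As a quick numerical check of this result (our own sketch; the interval [2, 10] is an arbitrary choice), a midpoint-rule approximation of M_1 for a uniform density lands on (a + b)/2:

```python
a, b = 2.0, 10.0                  # arbitrary demo interval
n = 100_000
dx = (b - a) / n
# midpoint Riemann sum for M_1 = integral of x * 1/(b - a) over [a, b]
M1 = sum((a + (i + 0.5) * dx) / (b - a) for i in range(n)) * dx
print(round(M1, 6), (a + b) / 2)  # 6.0 6.0
```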


Math <strong>10</strong>3 Notes <strong>Chapter</strong> <strong>10</strong>To find the variance we might first <strong>ca</strong>lculate the second moment,M 2 = x 2 p(x) dx = 1 ∫ bx 2 dxab − a aIt <strong>ca</strong>n be shown by simple integration that this yields the result∫ bwhich <strong>ca</strong>n be simplified toM 2 = b3 − a 33(b − a) ,M 2 = (b − a)(b2 + ab + a 2 )3(b − a)= b2 + ab + a 2.3We would then compute the varianceAfter simplifi<strong>ca</strong>tion, we getThe standard deviation is thenV = M 2 − µ 2 = b2 + ab + a 2−3V = b2 − 2ab + a 212σ ==(b − a)2 √ 3 .(b − a)2.12(b + a)2.4v.2005.1 - January 5, 2009 19
