<strong>Calibration</strong><br />

Learning Objectives<br />

After completing this module, the student will be able to<br />

• explain the purpose of calibration<br />

• find a calibration curve using the Excel function trendline<br />

• write a macro in Excel<br />

• explain the meaning of $R^2$<br />

• explain sources of error when estimating the independent variable value<br />

• find a confidence interval for the independent variable value<br />

Knowledge and Skills<br />

• trendline calculation<br />

• linear regression<br />

• coefficient of determination<br />

• calibration<br />

Prerequisites<br />

• linear equation<br />

• average and standard deviation<br />

• normal distribution<br />

Citation: Neuhauser, C. <strong>Calibration</strong><br />

Created: October 18, 2009 Revisions:<br />

Copyright: © 2009 Neuhauser. This is an open‐access article distributed under the terms of the Creative Commons Attribution<br />

Non‐Commercial Share Alike License, which permits unrestricted use, distribution, and reproduction in any medium, and allows<br />

others to translate, make remixes, and produce new stories based on this work, provided the original author and source are<br />

credited and the new work will carry the same license.<br />

Funding: This work was partially supported by an HHMI Professors grant from the Howard Hughes Medical Institute.<br />


Pre‐assessment<br />

Before completing the module, test whether you have mastered the prerequisites.<br />

Linear Equation<br />

1. Find the equation of a horizontal line that goes through the point (2,4).<br />

2. Find the equation of a vertical line that goes through the point (‐1,3).<br />

3. Determine the equation of the line passing through (‐2,1) and (3,‐1/2).<br />

4. Determine the equation of the line passing through (1,‐2) and (‐2,4).<br />

5. Determine the equation of the line with slope 3 and vertical intercept (0,2).<br />

6. Determine the equation of the line passing through (‐1,‐1) and parallel to the line passing through<br />

(0,1) and (3,0).<br />

7. Graph the line given by the equation y = 2x + 1.<br />

8. Graph the line given by the equation 3x − 4y + 1 = 0.<br />

Average and Standard Deviation<br />

9. Find the average and sample standard deviation of the following data set: 2,4,5,6,6,7,8<br />

10. Write down the equations for calculating the average and the sample standard deviation of a data set<br />

of size n: $x_1, x_2, \ldots, x_n$.<br />

Normal Distribution<br />

11. Suppose X is normally distributed with mean 2 and standard deviation 1. Find (a) the 75th percentile,<br />

(b) the 95th percentile, and (c) the 99th percentile.<br />

12. Suppose X is normally distributed with mean 3 and variance 4. Find the probability that X is between<br />

1 and 4, that is, find P(1 ≤ X ≤ 4).<br />

13. Suppose X is normally distributed with mean ‐1 and standard deviation 4. Find an interval centered<br />

about the mean so that with probability 0.95 X is contained in that interval.<br />

14. Suppose that the number of seeds a plant produces is normally distributed with mean 142 and<br />

standard deviation 31. Find the probability that a randomly sampled plant will produce more than<br />

200 seeds.<br />



<strong>Calibration</strong><br />

According to the NIST handbook<br />

(http://www.itl.nist.gov/div898/handbook/pmd/section1/pmd133.htm), “[t]he goal of calibration is to<br />

quantitatively convert measurements made on one of two measurement scales to the other<br />

measurement scale.” The relationship between two measurements is used to convert one measurement<br />

into the other measurement. You saw one such example in your chemistry lab where you measured<br />

absorbance to find the concentration of an unknown sample. In this case, the relationship between<br />

absorbance and concentration was linear. You derived the relationship by measuring absorbance of<br />

standard samples of known concentration. The resulting line is called a calibration curve. The basis for the<br />

calibration curve is Beer’s Law, which states that there is a direct linear relationship between<br />

absorbance (A) and concentration (c): if we graph absorbance as a function of concentration, a<br />

straight line with positive slope provides a good fit. To illustrate this, the following table lists<br />

absorbance measurements of standard samples:<br />

Concentration [μmol L⁻¹]    Absorbance<br />
0     0<br />
20    0.2356<br />
40    0.4725<br />
60    0.7127<br />
80    0.9507<br />

If we graph the data points and fit a straight line through the points (Figure 1), we find that the equation<br />

of the straight line is $A = 0.0119c - 0.0014$.<br />

Figure 1: Straight line fit<br />



This curve is called a standard curve and is used to infer the unknown concentration of a solution. For<br />

instance, if we find that the absorbance A of an unknown solution is 0.6386, we find for the<br />

concentration c<br />

$$c = \frac{0.6386 - (-0.0014)}{0.0119} = 53.8$$<br />
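The calculation above can be checked with a short script. This is a minimal sketch in Python rather than Excel; it uses the least-squares formulas given later in this module, and the function names `fit_line` and `estimate_concentration` are illustrative choices, not part of the original workbook.<br />

```python
# Standard samples from the table above: concentration and absorbance.
concentration = [0, 20, 40, 60, 80]
absorbance = [0, 0.2356, 0.4725, 0.7127, 0.9507]


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xj - x_mean) * (yj - y_mean) for xj, yj in zip(x, y))
    sxx = sum((xj - x_mean) ** 2 for xj in x)
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def estimate_concentration(a_unknown, intercept, slope):
    """Invert the calibration line A = intercept + slope * c to solve for c."""
    return (a_unknown - intercept) / slope


intercept, slope = fit_line(concentration, absorbance)
c_unknown = estimate_concentration(0.6386, intercept, slope)
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
print(f"estimated concentration = {c_unknown:.1f}")
```

Because the script keeps the unrounded slope and intercept, its estimate can differ from the hand calculation in the second decimal place.<br />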

The data in our example fit Beer’s Law extremely well. The data were generated using a Virtual Lab on<br />

Spectrophotometry (http://www.chm.davidson.edu/vce/Spectrophotometry/UnknownSolution.html).<br />

When data are obtained in actual lab experiments, measurement errors need to be taken into account.<br />

A Model for Linear <strong>Calibration</strong><br />

We assume in the following that we measure a signal y that depends linearly on a quantity x. We call the<br />

quantity x the independent variable and the quantity y the dependent variable. We assume that we<br />

measure x without error and that the quantity y is measured with an error ε that is normally distributed<br />

with mean 0 and standard deviation σ. The relationship between the two quantities is then<br />

$$y = a + bx + \varepsilon$$<br />

To get a sense for the measurement uncertainty when inferring the quantity x from the measurement y,<br />

we begin with simulating an experiment in which we have a set of n standard samples and for each<br />

sample we measure the signal m times.<br />

In‐class Activity 1<br />

In the spreadsheet <strong>Calibration</strong>Workbook under the tab “Simulation,” you will find the simulation of<br />

standard samples with values x = 10,20,40,60,80 and 90 and where the intercept a = 0 and the slope<br />

b = 1. Each signal is measured 3 times. The simulated data are in the gray‐colored box. The input<br />

parameters for the slope, the intercept, and the standard deviation s.d. for the error are in the yellow‐colored<br />

box. The trendline is calculated using the Excel function LINEST. (This function is difficult to use<br />

and you will not need to learn how at this point.)<br />

To investigate how the estimated value of the independent variable x depends on the error ε, we<br />

proceed as follows. We assume that the (unknown) value of the independent variable x is equal to 50<br />



(Cell F11). Using the equation y = a + bx + ε with intercept a = 0, slope b = 1, and s.d. 1 for the error, we can calculate the<br />

measured value of the quantity y (Cell F12). We can then use the estimated trendline to find the<br />

estimate for x (Cell F13). The graph displays the simulated data from the calibration experiment, the<br />

trendline, and the data point corresponding to the unknown sample.<br />

When you press F9, you will see that Excel runs another simulation. By repeatedly pressing F9, you can<br />

get a sense for the variability of the estimated value of x in our simulation experiment. It is tedious to<br />

record manually the values of repeated simulations. Excel has a feature, called Macro, that records<br />

repeated key strokes. Let’s write a macro to record the outcome of repeated simulations for the<br />

estimate of x.<br />

(a) To write a macro to simulate values of x, proceed as follows:<br />

1. Open the Developer tab and click on Record Macro in the Code group.<br />

2. Give the macro a name and select a key, for instance, Ctrl‐a works.<br />

3. Select the Home tab.<br />

4. Copy the value of x from Cell F13.<br />

5. Paste the value of x into Cell Q3 as Paste Value.<br />

6. Click on Insert in the Cells group and click on Shift cells down in the Insert window.<br />

7. Go to the Developer tab and click on Stop Recording in the Code group.<br />

If you press Ctrl‐a, the simulated values will be copied into your spreadsheet in Column Q. Repeat the<br />

simulation 100 times. (The numbers in Column P help you keep track of the simulations.) Sort the<br />

simulated values from Smallest to Largest. Find the middle 90%, that is, the range that remains after dropping the 5 smallest and the 5 largest values.<br />

(b) Repeat the simulation when x = 15. Are the inferred values of x more or less spread out compared<br />

to when x = 50?<br />

(c) Change the standard deviation to see how an increase/decrease in the measurement error affects<br />

the uncertainty in the calculation of x.<br />
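The repeated simulation in this activity can also be sketched outside Excel. The following Python script is an illustration only, assuming the same setup as the workbook (standards at x = 10, 20, 40, 60, 80, 90, three measurements each, a = 0, b = 1, s.d. 1); the function names and the seed value are hypothetical choices, and the numbers it produces will not match the workbook's random draws.<br />

```python
import random


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxy = sum((xj - x_mean) * (yj - y_mean) for xj, yj in zip(x, y))
    sxx = sum((xj - x_mean) ** 2 for xj in x)
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def simulate_estimate(x_unknown, rng, a=0.0, b=1.0, sd=1.0, replicates=3):
    """Simulate one calibration experiment and return the estimate of x."""
    standards = [10, 20, 40, 60, 80, 90]
    xs, ys = [], []
    for x in standards:
        for _ in range(replicates):
            xs.append(x)
            ys.append(a + b * x + rng.gauss(0.0, sd))  # y = a + bx + error
    a_hat, b_hat = fit_line(xs, ys)
    # Measure the unknown sample once, then invert the fitted line.
    y_unknown = a + b * x_unknown + rng.gauss(0.0, sd)
    return (y_unknown - a_hat) / b_hat


def middle_90(values):
    """Return the bounds left after dropping the lowest and highest 5% of values."""
    ordered = sorted(values)
    k = round(0.05 * len(ordered))
    return ordered[k], ordered[-1 - k]


rng = random.Random(2009)  # fixed seed so the run is repeatable
estimates = [simulate_estimate(50, rng) for _ in range(100)]
low, high = middle_90(estimates)
print(f"middle 90% of estimates for x = 50: {low:.2f} to {high:.2f}")
```

Parts (b) and (c) correspond to calling `simulate_estimate` with a different `x_unknown` or a different `sd`.<br />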



Figure 2: Screenshot of the simulation. The input parameters are listed in the yellow box; the simulated data are<br />

listed in the gray box; the estimated values of the slope and vertical intercept are listed in the green box together<br />

with the calculation of the unknown quantity x based on the measurement of the unknown sample y. The graph<br />

displays the simulated data (blue symbols), the trendline (black line), and the unknown measurement (red data<br />

point).<br />

Linear Regression<br />

When two quantities are linearly related, such as absorbance and concentration, a straight line provides<br />

a good fit. In Excel, a straight line can be fitted using the Trendline option. The Trendline option is on<br />

the Layout tab under Chart Tools. When clicking on the blue triangle under Trendline and choosing More<br />

Trendline Options, a window opens that offers additional options, such as Display Equation on chart<br />

and Display R‐squared value on chart. We already know the meaning of the Equation. We will now look<br />

at the meaning of R‐squared.<br />

Assume a linear model $y = a + bx + \varepsilon$ where the error has mean 0 and standard deviation $\sigma$. We<br />

obtained data points $(x_j, y_j)$, $j = 1, 2, \ldots, n$, and used the Trendline option to fit a straight line. This results<br />

in estimates for the slope and the intercept. We denote the estimated value of the intercept by $\hat{a}$ and<br />

the estimated value of the slope by $\hat{b}$.<br />



How does Excel estimate the slope and the intercept?<br />

The method that Excel uses to estimate the slope and the intercept is called the method of least squares.<br />

The method says: Find $\hat{a}$ and $\hat{b}$ so that the expression<br />

$$\sum_{j=1}^{n} \left[ y_j - (\hat{a} + \hat{b} x_j) \right]^2$$<br />

is as small as possible. We say that the sum of the squared deviations is minimized. Expressions for the<br />

estimated intercept and slope can be given. It is not important to memorize the expressions.<br />

The least squares line (or linear regression line) is given by<br />

$$y = \hat{a} + \hat{b} x$$<br />

with<br />

$$\hat{b} = \frac{\sum_{j=1}^{n} (x_j - \bar{x})(y_j - \bar{y})}{\sum_{j=1}^{n} (x_j - \bar{x})^2}, \qquad \hat{a} = \bar{y} - \hat{b}\,\bar{x}$$<br />

where $\bar{x}$ and $\bar{y}$ denote the averages of the x‐values and the y‐values.<br />
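To make the least-squares idea concrete, the sketch below computes the intercept and slope from the formulas above for the absorbance data of the earlier table, and then checks that nudging either estimate makes the sum of squared deviations larger. It is written in Python as an illustration; the function names and the perturbation sizes are arbitrary choices.<br />

```python
x = [0, 20, 40, 60, 80]                    # concentration of the standards
y = [0, 0.2356, 0.4725, 0.7127, 0.9507]    # measured absorbance


def least_squares(x, y):
    """Return (a_hat, b_hat) computed from the closed-form expressions."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    b_hat = (sum((xj - x_mean) * (yj - y_mean) for xj, yj in zip(x, y))
             / sum((xj - x_mean) ** 2 for xj in x))
    a_hat = y_mean - b_hat * x_mean
    return a_hat, b_hat


def sum_of_squares(a, b, x, y):
    """Sum of squared deviations of the data from the line y = a + b x."""
    return sum((yj - (a + b * xj)) ** 2 for xj, yj in zip(x, y))


a_hat, b_hat = least_squares(x, y)
best = sum_of_squares(a_hat, b_hat, x, y)

# Shift the intercept or the slope slightly in either direction.
perturbed = [
    sum_of_squares(a_hat + da, b_hat + db, x, y)
    for da, db in [(0.001, 0), (-0.001, 0), (0, 0.0001), (0, -0.0001)]
]
print(f"minimum sum of squares: {best:.3e}")
print("perturbed sums:", [f"{value:.3e}" for value in perturbed])
```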

To measure how good the fit is we calculate a quantity called the coefficient of determination, which is<br />

abbreviated as $R^2$. For each data point $(x_j, y_j)$, we can define $\hat{y}_j = \hat{a} + \hat{b} x_j$. We introduce the deviation of<br />

the measured y‐values from their mean, $y_j - \bar{y}$, which we can write as<br />

$$y_j - \bar{y} = (y_j - \hat{y}_j) + (\hat{y}_j - \bar{y})$$<br />

A somewhat lengthy calculation shows that the total sum of squared deviations $\sum_{j=1}^{n} (y_j - \bar{y})^2$ can be<br />

written as a part that is explained by the linear model (Explained) and a part that reflects the stochastic<br />

errors (Unexplained)<br />

$$\underbrace{\sum_{j=1}^{n} (y_j - \bar{y})^2}_{\text{Total}} = \underbrace{\sum_{j=1}^{n} (\hat{y}_j - \bar{y})^2}_{\text{Explained}} + \underbrace{\sum_{j=1}^{n} (y_j - \hat{y}_j)^2}_{\text{Unexplained}}$$<br />

The ratio between the explained variation and the total variation is the coefficient of determination<br />



$$R^2 = \frac{\text{Explained}}{\text{Total}} = \frac{\sum_{j=1}^{n} (\hat{y}_j - \bar{y})^2}{\sum_{j=1}^{n} (y_j - \bar{y})^2}$$<br />

The coefficient of determination $R^2$ is the proportion of variation that is explained by the model.<br />
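The decomposition into explained and unexplained variation can be verified numerically. The sketch below, in Python, does this for the absorbance data from the beginning of the module; the function name `variation` is an illustrative choice.<br />

```python
x = [0, 20, 40, 60, 80]                    # concentration of the standards
y = [0, 0.2356, 0.4725, 0.7127, 0.9507]    # measured absorbance


def variation(x, y):
    """Return (total, explained, unexplained) sums of squares for the fitted line."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    b_hat = (sum((xj - x_mean) * (yj - y_mean) for xj, yj in zip(x, y))
             / sum((xj - x_mean) ** 2 for xj in x))
    a_hat = y_mean - b_hat * x_mean
    y_fit = [a_hat + b_hat * xj for xj in x]   # fitted values on the line
    total = sum((yj - y_mean) ** 2 for yj in y)
    explained = sum((fj - y_mean) ** 2 for fj in y_fit)
    unexplained = sum((yj - fj) ** 2 for yj, fj in zip(y, y_fit))
    return total, explained, unexplained


total, explained, unexplained = variation(x, y)
r_squared = explained / total
print(f"total = {total:.6f}")
print(f"explained + unexplained = {explained + unexplained:.6f}")
print(f"R^2 = {r_squared:.6f}")
```

For these nearly error-free data the unexplained part is tiny, so $R^2$ comes out very close to 1.<br />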

In‐class Activity 2<br />

Return to the spreadsheet <strong>Calibration</strong>Workbook. Under the tab “Simulation,” you have already worked<br />

on the simulation of standard samples with values x = 10,20,40,60,80 and 90 and where the intercept<br />

a = 0 and the slope b = 1. Each signal is measured 3 times. The simulated data are in the gray‐colored<br />

box. The graph has a small textbox where the equation of the trendline and the coefficient of<br />

determination are listed. You will see that when you increase the standard deviation, the coefficient of<br />

determination decreases. Give a verbal explanation as to why you would expect this.<br />
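The effect described in this activity can also be observed outside the workbook. The following Python sketch averages $R^2$ over repeated simulated experiments for a few error standard deviations; the values 1, 5 and 20, the 200 repetitions, and the seed are arbitrary choices made for illustration.<br />

```python
import random


def r_squared(x, y):
    """Coefficient of determination of the least-squares line through (x, y)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    b_hat = (sum((xj - x_mean) * (yj - y_mean) for xj, yj in zip(x, y))
             / sum((xj - x_mean) ** 2 for xj in x))
    a_hat = y_mean - b_hat * x_mean
    explained = sum((a_hat + b_hat * xj - y_mean) ** 2 for xj in x)
    total = sum((yj - y_mean) ** 2 for yj in y)
    return explained / total


def mean_r_squared(sd, rng, runs=200):
    """Average R^2 over repeated simulated calibration experiments."""
    standards = [10, 20, 40, 60, 80, 90]
    xs = [x for x in standards for _ in range(3)]  # three measurements each
    running_sum = 0.0
    for _ in range(runs):
        ys = [x + rng.gauss(0.0, sd) for x in xs]  # intercept 0, slope 1
        running_sum += r_squared(xs, ys)
    return running_sum / runs


rng = random.Random(2009)  # fixed seed so the run is repeatable
mean_r2 = {sd: mean_r_squared(sd, rng) for sd in (1, 5, 20)}
for sd, value in mean_r2.items():
    print(f"s.d. = {sd:>2}: average R^2 = {value:.3f}")
```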

The Chemistry <strong>Calibration</strong> Lab<br />

In your <strong>Calibration</strong> Lab, you were asked to prepare a calibration curve. The spreadsheet<br />

<strong>Calibration</strong>Lab.xlsx will help you do the analysis. Open the spreadsheet. The <strong>Calibration</strong> Lab Analysis<br />

sheet is set up so that you can enter your data into the yellow cells. To calculate the calibration curve,<br />

enter the data from the absorbance measurements of the standard samples into C4:C21 (Step 2). The<br />

spreadsheet will calculate the slope and intercept in the cells I19 and I20, respectively, (see blue cells<br />

and Step 3). In Step 4, the spreadsheet calculates the coefficient of determination. Compare the values<br />

in the cell to the textbox in the figure that has the same information.<br />

(a) To include the uncertainty of the calibration curve in your lab report, record the coefficient of<br />

determination together with the equation of the trendline. Explain in words the meaning of the<br />

coefficient of determination.<br />

(b) In the chemistry lab, you then determined the concentration of an unknown sample based on the<br />

calibration curve. Enter the three measurements into cells B25‐B27 (Step 5). The spreadsheet is set up<br />

so that it calculates the estimated concentration. Use paper and pencil to verify the result in Cell B31<br />

(estimated concentration) of the spreadsheet.<br />

(c) While the theory is beyond this course, the spreadsheet is set up to calculate a confidence interval<br />

for the estimated concentration $x^*$. In Cell K25, you can set the confidence level, for instance 95%. The<br />

lower and upper limits of the confidence interval are listed in Cells K27 and K28, respectively. Record the<br />



confidence interval. Cell K26 contains the value of half the length of the confidence interval, which<br />

we denote by $C_x$. We can thus also report the result as $x^* \pm C_x$.<br />
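For readers curious how such an interval can be obtained, here is a minimal Python sketch based on a common textbook approximation for linear calibration. This is an assumption: the formula used inside the spreadsheet is not shown in this module and may differ. The half-length is taken as $t \cdot (s/|\hat{b}|) \sqrt{1/m + 1/n + (\bar{y}_u - \bar{y})^2 / (\hat{b}^2 \sum_j (x_j - \bar{x})^2)}$, where s is the residual standard deviation, n the number of calibration points, m the number of measurements of the unknown, and $\bar{y}_u$ their average. The critical value t (with n − 2 degrees of freedom) is passed in from a t‐table, and the function name is hypothetical.<br />

```python
import math


def inverse_confidence_interval(x, y, y_unknown, t_crit):
    """Return (x_star, half_length) for an unknown measured via a linear calibration.

    Assumes the textbook approximation described in the lead-in; t_crit is the
    two-sided critical value of the t-distribution with len(x) - 2 degrees of freedom.
    """
    n = len(x)
    m = len(y_unknown)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xj - x_mean) ** 2 for xj in x)
    b_hat = sum((xj - x_mean) * (yj - y_mean) for xj, yj in zip(x, y)) / sxx
    a_hat = y_mean - b_hat * x_mean
    # Residual standard deviation of the calibration points around the line.
    residual_ss = sum((yj - (a_hat + b_hat * xj)) ** 2 for xj, yj in zip(x, y))
    s = math.sqrt(residual_ss / (n - 2))
    y_u = sum(y_unknown) / m
    x_star = (y_u - a_hat) / b_hat
    half_length = t_crit * (s / abs(b_hat)) * math.sqrt(
        1 / m + 1 / n + (y_u - y_mean) ** 2 / (b_hat ** 2 * sxx)
    )
    return x_star, half_length


# Absorbance standards from the beginning of the module, one unknown measurement.
x = [0, 20, 40, 60, 80]
y = [0, 0.2356, 0.4725, 0.7127, 0.9507]
# 3.182 is the 95% two-sided t value for 5 - 2 = 3 degrees of freedom.
x_star, c_x = inverse_confidence_interval(x, y, [0.6386], t_crit=3.182)
print(f"x* = {x_star:.1f} ± {c_x:.2f}")
```

Passing the critical value as an argument keeps the sketch free of third-party libraries; a statistics package could compute it from the confidence level instead.<br />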

If you want to read more about Linear <strong>Calibration</strong>, consult the statistics and data analysis paper by<br />

Burke, S. Regression and <strong>Calibration</strong>. LC GC Europe Online Supplement.<br />

Homework<br />

1. Find a linear regression line through the given points and compute the coefficient of determination.<br />

x ‐3.0 ‐2.0 ‐1.0 0.0 1.0 2.0<br />

y ‐6.3 ‐5.6 ‐3.3 0.1 1.7 2.1<br />

2. To determine whether the frequency of chirping crickets depends on temperature, the following<br />

data were obtained by Pierce, 1949 (The Songs of Insects. Cambridge, Mass. Harvard University<br />

Press):<br />

Temperature (F) 69 70 72 75 81 82 83 84 89 93<br />

Chirps/sec 15 15 16 16 17 17 16 18 20 29<br />

Fit a linear trendline and find the coefficient of determination.<br />

3. To determine the glucose concentration in a wine sample, an enzyme spectroscopy method is used. The calibration<br />

curve is obtained from the following data:<br />

Added glucose, [glucose] (mM)    0.000 0.050 0.100 0.200 0.300 0.400<br />

Absorbance    0.231 0.279 0.314 0.423 0.540 0.665<br />

(a) Find the equation of the calibration curve and the coefficient of determination.<br />

(b) Suppose the absorbance of an unknown sample is measured as 0.356. Use the calibration curve<br />

to estimate the glucose level.<br />
