CLIN. CHEM. 27/1, 5-9 (1981)

A Clinical Evaluation of Various Delta Check Methods

Lawrence A. Wheeler1 and Lewis B. Sheiner2
To evaluate the performance of delta check techniques, we analyzed 707 unselected pairs of continuous-flow test results, using three different delta check methods. If any of the test results (plus the urea nitrogen/creatinine ratio and the anion gap) failed one of the checks, the reason for the failure was sought by examining subsequent test results, retesting specimens, and (or) reviewing the patient's chart. Each delta check failure was accordingly classified as a true or false positive. The percentage of positives we judged to be true positives ranged from 5 to 29%. Each of the three methods had test types with low and high percentages of true positives. We conclude that with the delta check methods one can detect errors otherwise overlooked, but at the cost of investigating many false positives, because, in the population we studied, disease processes or therapy often caused large changes in a series of test results for a patient.
The concept of using prior test results from a patient to determine whether a newly obtained test result is likely to be in error ("delta checks") is very attractive (1-4). First, it is a direct approach in which the test results of interest are themselves evaluated, rather than an indirect method such as traditional quality-control techniques. The latter methods only evaluate the performance of the test procedure on a quality-control specimen; errors in specimen identification, test performance, and test-result reporting for the clinical specimen cannot be detected.
Second, the magnitude of the delta can be chosen so that the delta check will always fail when the change, if it is not artifactual, is clinically important. This process will alert laboratory personnel to test results that, if incorrect, could result in inappropriate therapy. If an effective procedure is implemented to follow up delta check failures, two important advantages are realized. Many or all (depending on the extent of the follow-up procedure) incorrect test results, where the error has resulted in a test result that differs significantly from a previous value, can be detected and not released. In addition, those test results that represent actual clinically important changes and pass the laboratory's follow-up procedure for delta check failure can be so indicated on the test-results report, thus alerting the clinician to clinically important changes and increasing his or her confidence that these changes are not simply laboratory errors. This should eliminate some unnecessary retesting and allow appropriate clinical steps to be taken more promptly.
These potential benefits of delta check techniques have prompted proposals for their adoption by many groups, including the College of American Pathologists, which in its Inspection and Accreditation Program specifies the absence of delta check techniques to be a Phase I (minor) deficiency for laboratories with laboratory computer systems. Unfortunately, while the concept of delta checking has been accepted, no clinical trial has tested the relative effectiveness of the delta check methods that have been proposed for clinical chemistry tests.

1 Department of Pathology, Indiana University, Indianapolis, IN 46223.
2 Department of Medicine, Division of Clinical Pharmacology, and Department of Laboratory Medicine, University of California, San Francisco, CA 94143.
Received July 14, 1980; accepted Sept. 5, 1980.
Our purpose was to evaluate the three currently proposed delta check procedures (1-3) that are applicable to some or all of the SMA 6 continuous-flow analysis results. This evaluation involved using subsequent test results, repeat determinations, and chart review to classify delta check failures into true positives (errors made in specimen identification, test performance, or test-result reporting) and false positives (changes ascribable to physiological responses to disease or therapy).
Materials and Methods

The test results used in this study were collected with the Community Health Computing laboratory computer system of the California Medical Center. For five consecutive days (Monday-Friday), all of the SMA 6 (Technicon Instruments Corp., Tarrytown, NY 10591) tests done during the morning hours were evaluated by using the delta check methods of Ladenson (1), Whitehurst et al. (2), and Wheeler and Sheiner (3). The Wheeler-Sheiner method uses points on two probability density functions of delta values. One probability density function was obtained when the two results used to form the delta were 0.9 to 1.5 days apart, and the other when they were 1.5 to 2.5 days apart. To allow this method to be evaluated, we included in the study only those SMA 6 results for which another set of SMA 6 results had been obtained 0.9 to 2.5 days previously. In addition, the set of values from the probability density functions that corresponded to the 0.05 and 0.95 points was used in the study (i.e., nominally 10% of the test results in any particular group could be expected to fail the Wheeler-Sheiner delta check).
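The limit-setting step can be pictured as taking quantile points of an empirical delta distribution. The sketch below is a loose illustration of that idea, not the authors' published tables; the crude quantile rule and the synthetic data are assumptions.

```python
# Illustrative sketch: deriving delta check limits from an empirical
# distribution of deltas (differences between paired results obtained a
# fixed interval apart), in the spirit of a Wheeler-Sheiner-style method.
# The quantile computation and example data are assumptions, not the
# published tables.

def delta_limits(deltas, lower_q=0.05, upper_q=0.95):
    """Return (lower, upper) limits at the given quantile points."""
    s = sorted(deltas)
    n = len(s)
    lo = s[max(0, int(lower_q * n) - 1)]   # crude quantile, no interpolation
    hi = s[min(n - 1, int(upper_q * n))]
    return lo, hi

def fails_delta_check(delta, limits):
    """A delta outside the limits fails the check."""
    lo, hi = limits
    return delta < lo or delta > hi
```

With limits at the 0.05 and 0.95 points, roughly one result in ten in the reference population fails by construction, which matches the nominal 10% failure rate quoted above.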
We designed an algorithm to classify each test result that failed a delta check as a true or false positive. A test result that failed a delta check because of an actual change in the patient's analyte value was defined to be a false positive. A positive that was due to any other reason was defined as a true positive (i.e., an error was made in identifying the specimen in the patient-care area, a specimen-identification error had occurred in the laboratory, the SMA 6 had malfunctioned, or the test result was incorrectly entered into the laboratory computer). Note that this algorithm operates at the individual test (e.g., Na+) level. If two or more test results from a specimen failed delta checks, each was independently classified as a true or false positive.
We believed that having laboratory personnel immediately collect another specimen and re-run it on the SMA 6 would be the best approach to determine whether or not a test result that failed a delta check was an error, although in some cases even this process might not yield definitive evidence.
The first step in the algorithm (see Figure 1) is an approximation of this method. In the discussion that follows, the previous test result that was compared with the current test result will be designated TR1; the current test result, TR2; and a subsequent test result, obtained within 24 h, TR3. If a TR3 was available, an arbitrary rule was used to judge whether it indicated that TR2 represented an actual change in the patient's serum, not an error: if TR3 was nearer to TR2 than to TR1, TR2 was considered to be correct and the delta check failure was specified to be a false positive.

Fig. 1. Delta check evaluation algorithm
If the TR3 was closer to TR1 than to TR2, or if a TR3 was not obtained, the second step in the algorithm was carried out. This step included performing a repeat SMA 6 determination on the TR2 specimen (when there was sufficient specimen). The result of this determination is designated TR2R. When the difference between TR2 and TR2R exceeded three times the standard deviation of the corresponding test method, a laboratory test-performance error was judged to have occurred and the delta check failure was designated a true positive. If TR2R validated TR2, or if sufficient specimen was not available to perform a repeat determination, the third step of the algorithm was performed.
The third step of the algorithm was to review the patient's chart. This review focused on a search for the etiology of the delta. Examples of some clinical situations that we accepted as being the cause of large changes in SMA 6 test results are:
1. Renal dialysis done between the times the two specimens were drawn.
2. Potassium supplementation given (as a reason for a potassium increase).
3. Recent renal transplant, as a reason for decreasing urea nitrogen and creatinine (normal pattern) or increasing urea nitrogen and creatinine (rejection).
4. Intravenous therapy with an electrolyte-containing fluid.
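The three steps above can be sketched as a small decision procedure. This is a paraphrase of the Figure 1 logic as stated in the text; the function name and the way the chart-review finding is passed in are our assumptions.

```python
# Sketch of the three-step classification algorithm described in the text.
# sd is the analytical standard deviation of the test method; tr3, tr2r,
# and chart_explains_delta are supplied only when that step was possible.

def classify_delta_failure(tr1, tr2, sd, tr3=None, tr2r=None,
                           chart_explains_delta=None):
    """Classify a delta check failure as a false positive (real change),
    a true positive (laboratory error), or leave it unjudged."""
    # Step 1: a subsequent result (TR3) obtained within 24 h.
    if tr3 is not None and abs(tr3 - tr2) < abs(tr3 - tr1):
        return "false positive"        # TR3 confirms TR2: actual change
    # Step 2: repeat determination (TR2R) on the TR2 specimen.
    if tr2r is not None and abs(tr2 - tr2r) > 3 * sd:
        return "true positive"         # repeat contradicts TR2: lab error
    # Step 3: chart review for a clinical explanation of the delta.
    if chart_explains_delta is True:
        return "false positive"        # disease or therapy explains it
    if chart_explains_delta is False:
        return "true positive"         # no explanation found
    return "no judgment"
```

For example, for a K+ delta check failure with TR1 = 4.0 and TR2 = 6.0 mmol/L, a next-day TR3 of 5.9 mmol/L would classify the failure as a false positive.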
Because we investigated only those test results that failed one or more of the delta check methods, we cannot divide the test results that did not fail a particular delta check into true and false negatives with total accuracy. However, in several cases a test result that did not fail one method did fail one or both of the other methods and was therefore investigated, so we can assign these delta check failures to be true or false negatives. Further, if it is assumed that all “medically significant” changes in test result values will be detected by one of the three methods, true- and false-negative percentages can be computed for all the test result delta check combinations except the urea nitrogen/creatinine ratio and anion gap, which were evaluated only by the Wheeler-Sheiner method. This assumption yields a lower bound for the false-negative rates, because two classes of undetected test result errors will not be included. The first is the “small” error (e.g., reporting a K+ result that was actually determined to be 3.2 mmol/L as 3.4 mmol/L). This clearly is an error, but it should have no impact on patient care. Another example of this type of error would be a switch of labels on specimens from two patients with “normal” values for electrolytes, urea nitrogen, and creatinine. All the results would be in error, but they might not differ sufficiently to trigger a delta check failure.

Fig. 2. Delta check evaluation algorithm results for the Wheeler-Sheiner delta check method applied to urea nitrogen test results
The second type of test result error that would not be included in the false-negative calculation is one in which the true test result differs greatly from the previous test result, but the erroneous test result is nearly the same. For example, yesterday's K+ result was 4.0 mmol/L and the true value for today's K+ is 6.0 mmol/L, but an error is made such that a value of 3.9 mmol/L is reported. Delta check methods are by definition unable to detect this type of error.

Despite the above problems, we believe that method performance estimates based on this assumption are useful, because "small" errors are not of clinical importance and the second type of error should be relatively rare.
Results

A total of 707 sets of SMA 6 results (including the urea nitrogen/creatinine ratio and anion gap) were included in the study. Of these, 253 had an SMA 6 analysis performed 0.9 to 2.5 days previously and therefore satisfied the criterion for delta check evaluation. Of these 253 sets of SMA 6 results, 150 (59%) had at least one test result that failed one or more of the three delta check methods. The algorithm described in Figure 1 was followed in all the cases. Figure 2 presents the results of this process for urea nitrogen test results that failed the Wheeler-Sheiner delta check. A total of 27 test results failed the Wheeler-Sheiner delta check. Of these, 17 were judged to be false positives because the result of a determination made on a specimen on the next day (TR3) was nearer to TR2 than TR1. Three results had TR3 values nearer to TR1 than TR2, and in seven cases a urea nitrogen determination was not done the next day. The 10 specimens in the two latter groups were candidates for repeat tests (i.e., obtaining TR2R values).
In one case the repeat determination was not done. The reason repeat determinations were not done was not recorded for each case; however, the reason was usually that an insufficient volume of specimen was available. In two cases the TR2R value exceeded three test-method standard deviations from TR2. In these cases the repeat value was judged to indicate that the TR2 value was in error, and therefore these two cases were deemed to represent true positives (i.e., an error had taken place in the performance or reporting of the test yielding TR2). In seven cases the absolute magnitude of the difference between TR2 and TR2R was less than or equal to three test-method standard deviations.
In the eight cases for which a judgment could not be made on the basis of repeat test values, the third step in the algorithm (chart review) was performed, if possible. In one case the chart was not reviewed, because it could not be located at the time. In six cases a clinical reason for the change in the test results was found on chart review. These six cases were judged to represent false positives; that is, an actual physiological change had occurred in the patient. In the remaining case we could find no reason for the change in the chart, and therefore this case was specified to be a true positive.

Table 1 presents the performance of the three delta check methods in this study. The true- and false-positive results were obtained by use of the algorithm described and illustrated above.
The “No judgment made” column lists the number of test results of each type that failed one of the delta checks and that we were unable to assign to one of the other classes. This amounts to 13 of 193 (7%) of the Wheeler-Sheiner method values, nine of 91 (9%) of the Ladenson method values, and seven of 141 (5%) of the Whitehurst method values.
The “Predictive value” column presents data on the relative efficiency of the methods in practice. The values range from a low of 5% for K+ by the Whitehurst method to 29% for creatinine by the Wheeler-Sheiner method.
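Predictive value here is simply the percentage of delta check failures that turned out to be true positives. A one-line sketch follows; the example counts (six true and 15 false positives for creatinine by the Wheeler-Sheiner method) are our reading of Table 1 and should be treated as illustrative.

```python
# Predictive value of a delta check: percentage of failures judged to be
# true positives. The example counts are illustrative readings of Table 1.

def predictive_value(true_positives, false_positives):
    return 100 * true_positives / (true_positives + false_positives)

round(predictive_value(6, 15))  # about 29% for the example counts
```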
The “Error incidence rate” column presents information on the (inferred) error rate for the tests included in the study. The error rate ranged from 1.2% for Na+ to 4.0% for urea nitrogen.
We examined the values of the deltas that we judged to be true or false positives, to see if different choices of the delta check limits would yield better performance. For example, it could have been the case that most or all of the deltas that were judged to be true positives were larger in magnitude than the false positives. We found that if the delta check limits were selected to be as large as possible in magnitude while still having all the true positives fail the delta check (e.g., for Na+ the three deltas that were judged to be true positives were 7, -11, and -11 mmol/L, and therefore the Na+ delta check limits were selected to be -10 and 6 mmol/L), the delta check performance was not greatly improved. The percentage of the deltas failing this delta check that corresponded to true positives ranged from 14% for K+ to 37% for creatinine. These results show that, for the population considered in this study, no adjustment in the delta check limits would have resulted in greatly reduced false-positive rates.
An important but little-discussed version of the delta check is to combine results of delta checks of several separate tests performed on a single specimen, to determine whether a specimen-identification error has occurred. Table 2 presents the data obtained in this study. Each array of combinations of true and false positives adds up to 150, the total number of specimens for which one or more test results failed one or more of the delta check methods. The entry under zero false positives and zero true positives represents the number of specimens that had no test results failing that delta check method. Not surprisingly, the number of specimens in this category is inversely related to the number of tests included in the delta check method.
The Wheeler-Sheiner method has a total of 16 specimens with only true positives, five with both true and false positives, and 91 with only false positives. If a specimen is judged to be a true positive when at least one true positive is detected for it, then the percentage of positive specimens with true positives is 19% (21/112). This means that in this study approximately one fifth of the specimens that had one or more tests fail the Wheeler-Sheiner delta check were found to have at least one test result in error. By the Ladenson method the true-positive rate was 16% (11/70); by the Whitehurst method, 18% (16/90).
Discussion

We evaluated the performance of three delta check methods in clinical use by applying them to 707 sets of SMA 6 results and attempting to determine the etiology of each delta check failure (Figure 1).

Examination of the column of Table 1 that gives the total number of tests failing the delta check tells us about the relative stringency of the delta check methods. In general, the three delta check methods prescribe a more complicated decision procedure than simply being positive whenever a maximum difference between the current value and the most recent previous value is exceeded. For example, for the serum urea nitrogen (UN) test, the criteria used by the three methods are given below:
Method: Delta failure criteria for UN

Wheeler-Sheiner: Δ < -7 or Δ > 3; Δ < -11 or Δ > 8; Δ < -38 or Δ > 39

Ladenson: |UN1 - UN2| / UN1 > 0.5

Whitehurst: |UN1 - UN2| / UN2 > 0.5 when UN2 ≤ 25; |UN1 - UN2| / UN2 > 0.25 when UN2 > 25
Clearly the Whitehurst rule is the most stringent and the Ladenson rule the least stringent. The Wheeler-Sheiner rule is more complex and falls between these two extremes. This property of the rules is reflected by the total number of UN tests failing each method (Ladenson 10, Wheeler-Sheiner 25, and Whitehurst 56).
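As a rough sketch, the UN rules can be coded as predicates. The thresholds follow the criteria listed above; pairing the two Wheeler-Sheiner limit pairs with the two sampling-interval bands is our assumption (the method's full published tables are not reproduced here), and the third Wheeler-Sheiner condition is omitted because its form is unclear in the source.

```python
# Tentative sketches of the urea nitrogen (UN) delta check rules. Assigning
# the two Wheeler-Sheiner limit pairs to the two time intervals is an
# assumption; the published tables contain the authoritative values.

def wheeler_sheiner_un(un1, un2, days_apart):
    delta = un2 - un1
    lo, hi = (-7, 3) if days_apart < 1.5 else (-11, 8)
    return delta < lo or delta > hi

def ladenson_un(un1, un2):
    # Fails on a relative change of more than 50% of the previous result.
    return abs(un1 - un2) / un1 > 0.5

def whitehurst_un(un1, un2):
    # Tighter relative limit (25%) at concentrations above 25.
    limit = 0.5 if un2 <= 25 else 0.25
    return abs(un1 - un2) / un2 > limit
```

For example, a rise from 20 to 24 one day later fails the Wheeler-Sheiner check (delta of 4 exceeds 3) but passes the other two rules.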
Next consider the true-positive and false-positive columns of Table 1. These values indicate the yield in error detection of each of the test/delta check method combinations. The predictive value column of Table 1 gives the percentage of positives that we judged to be true positives. The values range from very low values for K+ by the Whitehurst method (5%), Na+ by the Wheeler-Sheiner method (6%), K+ by the Wheeler-Sheiner method (6%), and urea nitrogen by the Ladenson method (6%) to relatively high values for chloride by the Whitehurst method (20%), bicarbonate by the Wheeler-Sheiner method (20%), urea nitrogen/creatinine ratio by the Wheeler-Sheiner method (28%), and creatinine by the Wheeler-Sheiner method (29%). Note that even with the best of these, more than 70% of the delta check failures are false positives. This result is a consequence of the fact that in the patient population to which these methods were applied (i.e., patients for whom two SMA 6's were ordered within 2.5 days), large variations in these types of test results are commonly seen that are attributable to disease or therapy. These results indicate the price, in follow-up of false positives, that the laboratory will have to pay to detect errors, when it is possible to do so by using delta check methods.
It is entirely possible that if our patient population had been composed predominantly of healthy people (e.g., marines taking periodic physical examinations) or of patients with relatively well-controlled diseases (e.g., medicine clinic patients), the results would have been different. In such populations the large changes in test values that we attributed to disease processes or therapy in the current study should be infrequent.

Table 1. Delta Check Methods: Performance Characteristics

| Method | Test | No. failing delta check | True positives (TR2R indicates error / no reason in chart) | False positives (TR3 confirms TR2 / reason in chart) | False neg. | True neg. | No judgment made | Predictive value, % | Error incidence rate, % |
| Wheeler-Sheiner (3) | Na+ | 18 | 0 / 1 | 13 / 4 | 2 | 233 | 0 | 6 | 1.2 |
| | K+ | 35 | 0 / 2 | 24 / 6 | 4 | 214 | 3 | 6 | 2.4 |
| | Cl- | 24 | 3 / 0 | 15 / 3 | 1 | 228 | 3 | 14 | 1.6 |
| | HCO3 | 25 | 3 / 2 | 14 / 5 | 0 | 228 | 1 | 21 | 2.0 |
| | UN | 27 | 2 / 1 | 17 / 6 | 7 | 219 | 1 | 12 | 4.0 |
| | Cr | 23 | 6 / 0 | 10 / 5 | 1 | 229 | 2 | 29 | 2.8 |
| | UN/Cr | 26 | 5 / 2 | 14 / 4 | a | 227 | 1 | a | a |
| | anion gap | 15 | 1 / 0 | 8 / 4 | a | 238 | 2 | a | a |
| Ladenson (1) | Na+ | 14 | 1 / 1 | 10 / 2 | 1 | 238 | 0 | 14 | 1.2 |
| | K+ | 49 | 1 / 5 | 26 / 10 | 0 | 204 | 7 | 12 | 2.4 |
| | UN | 17 | 0 / 1 | 13 / 2 | 9 | 227 | 1 | 6 | 4.0 |
| | Cr | 11 | 2 / 0 | 8 / 0 | 5 | 237 | 1 | 20 | 2.8 |
| Whitehurst (2) | Na+ | 18 | 1 / 2 | 11 / 4 | 0 | 235 | 0 | 17 | 1.2 |
| | K+ | 22 | 0 / 1 | 15 / 4 | 5 | 226 | 2 | 5 | 2.4 |
| | Cl- | 10 | 1 / 1 | 5 / 3 | 2 | 241 | 0 | 20 | 1.6 |
| | HCO3 | 17 | 2 / 1 | 10 / 3 | 2 | 234 | 1 | 19 | 2.0 |
| | UN | 60 | 5 / 5 | 36 / 10 | 0 | 193 | 4 | 18 | 4.0 |
| | Cr | 14 | 2 / 0 | 11 / 1 | 5 | 234 | 0 | 14 | 2.8 |

a False negatives could not be detected for these test results; therefore these calculations were not performed.
Cr, creatinine; UN, serum urea nitrogen.
Examination of the true-positive and false-negative columns of Table 1 indicates that relatively few test results of each type that we evaluated were judged to be in error. The error incidence rate column gives the exact values. The range of error incidence, from 1.2 to 4.0%, is somewhat higher than laboratorians would expect to find, and may be due to our algorithm "overdiagnosing" changes as being due to errors rather than to physiology.
One approach to utilizing delta checks in a computerized laboratory would be to have a message that the delta check has failed provided to the technologist at the time the test result is recorded. The technologist would check for transcription errors at that time. If the technologist could not find a reason for the delta check failure, the result should be referred to a pathologist for review. What happened next should depend on the pathologist's judgment of the potential adverse effect of the test result being in error. Follow-up actions could include repeating the test or calling the physician who ordered the test to determine if the patient's clinical condition or therapy provided an explanation for the large change in the test result value.

If it is decided to release the test result, it should be marked with an appropriate symbol so that the clinician will be aware that it represents a large change, that it has been reviewed in the laboratory by a pathologist, and that the review process did not reveal that the test result was in error.

Table 2. Specimen Level True Positive (TP) and False Positive (FP) Results
(rows: TP per specimen; columns: FP per specimen)

Wheeler-Sheiner method (3):
| TP \ FP | 0 | 1 | 2 | 3 | 4 | 5 |
| 0 | 38 | 58 | 21 | 9 | 2 | 1 |
| 1 | 11 | 0 | 2 | 1 | 0 | 0 |
| 2 | 5 | 0 | 1 | 1 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 |

Ladenson method (1):
| TP \ FP | 0 | 1 | 2 | 3 | 4 | 5 |
| 0 | 80 | 47 | 12 | 0 | 0 | 0 |
| 1 | 11 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 |

Whitehurst method (2):
| TP \ FP | 0 | 1 | 2 | 3 | 4 | 5 |
| 0 | 60 | 51 | 13 | 7 | 2 | 0 |
| 1 | 10 | 1 | 3 | 0 | 0 | 0 |
| 2 | 2 | 0 | 0 | 0 | 0 | 0 |
| 3 | 1 | 0 | 0 | 0 | 0 | 0 |

We can consider the performance of delta check methods that require two or more test results to fail delta checks for the specimen to fail a delta check by examining Table 2. Interestingly, many specimens had more than one test fail a delta check. By the Wheeler-Sheiner method, 43 (38%) of the specimens that failed one delta check also failed two or more. The Whitehurst method had 43% (39/90) and the Ladenson method 17% (12/70). Consider a delta check rule specifying that a specimen be judged as failing a delta check if two or more test results fail delta checks. Using this rule with the Wheeler-Sheiner method results, we find that 23% (10/43) of our specimens fail this specimen-level delta check and include at least one test result delta check failure that was a true positive. This is a slight improvement over the rule that one test is needed to fail a delta check; however, this rule missed 52% (11/21) of the specimens that had at least one true positive. The number of specimens that would have to be evaluated has been reduced from 112 to 43; on the other hand, missing 11 of 21 specimens with erroneous test results is clearly undesirable.
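The specimen-level rule discussed above reduces to counting failed per-test checks. A minimal sketch follows; the function name and the example outcomes are illustrative, not taken from the study data.

```python
# Sketch of the specimen-level rule: flag a specimen only when at least
# `min_failing` of its individual test results fail their delta checks.
# The example outcomes below are illustrative.

def specimen_fails(per_test_failures, min_failing=2):
    """per_test_failures maps test name -> whether its delta check failed."""
    return sum(per_test_failures.values()) >= min_failing

example = {"Na": False, "K": True, "Cl": False, "HCO3": True,
           "UN": False, "Cr": False, "UN/Cr": False, "anion gap": False}
specimen_fails(example)  # True: two of the eight checks failed
```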
In a recent paper (4) we evaluated the performance of delta check methods by using a simulation approach, and found, using the Wheeler-Sheiner method as discussed above (i.e., a specimen fails if two or more of the eight test results fail delta checks), that the true-positive rate was 84% and the false-positive rate 20%. Therefore, if the specimen errors in this study were of the same type studied previously (i.e., mislabeled specimens), we would expect 44 specimens to have failed the "at least two of eight" delta check (18 true positives and 26 false positives). In fact 43 specimens failed (10 true positives and 33 false positives), most likely because many errors other than specimen mislabeling can occur. In general, specimen mislabeling causes all the test results to be in error, while other error sources, such as machine malfunction and test-result transcription errors, tend to cause only one to be in error. Therefore the latter errors will not be detected by a delta check method that requires two or more test results to fail individual delta checks. Note that 11 specimens with one true positive and no false positives are missed by using this rule. These specimens probably represent cases of the second type of error.
In summary, we show that the three delta check methods as applied to individual tests can detect erroneous test results, but unfortunately deltas of similar magnitude occurred as a result of disease or therapy two to 15 times as often as those due to errors (for this group of tests and this patient population). This result limits the efficiency of delta check methods in this setting, because the major effort in delta check failure follow-up would be spent on false positives. Nevertheless, we believe that delta check methods can serve a useful role. First, they detect errors that escape standard quality-control techniques, and of course a primary goal of a clinical laboratory is to eliminate erroneous results. Second, flagging test results that fail the delta check but then pass the laboratory's review procedure should increase the clinician's confidence in these results and decrease unnecessary repeat tests.
References

1. Ladenson, J. H., Patients as their own controls: Use of the computer to identify "laboratory error." Clin. Chem. 21, 1648-1653 (1975).
2. Whitehurst, P., DeSilvio, T. V., and Boyadjian, G., Evaluation of discrepancies in patients' results: an aspect of computer-assisted quality control. Clin. Chem. 21, 87-92 (1975).
3. Wheeler, L. A., and Sheiner, L. B., Delta check tables for the Technicon SMA 6 continuous-flow analyzer. Clin. Chem. 23, 216-219 (1977).
4. Sheiner, L. B., Wheeler, L. A., and Moore, J. K., The performance of delta check methods. Clin. Chem. 25, 2034-2037 (1979).