An Approach to Combining Results From Multiple Methods Motivated by the ISO GUM

M. S. Levenson; D. L. Banks; K. R. Eberhardt; L. M. Gill; W. F. Guthrie; H. K. Liu; M. G. Vangel; J. H. Yen; N. F. Zhang

doi:10.6028/jres.105.047

2000-08-01

Volume 105, Number 4, July–August 2000 Journal of Research of the National Institute of Standards and Technology [J. Res. Natl. Inst. Stand. Technol. 105, 571 (2000)] An Approach to Combining Results From Multiple Methods Motivated by the ISO GUM Volume 105 Number 4 July–August 2000 M. S. Levenson, D. L. Banks, K. R. The problem of determining a consensus Key words: Bayes; reference materials; value and its uncertainty from the results uncertainty. Eberhardt, L. M. Gill, W. F. of multiple methods or laboratories is dis- Guthrie, H. K. Liu, M. G. Vangel, cussed. Desirable criteria of a solution J. H. Yen, and N. F. Zhang are presented. A solution motivated by the ISO Guide to the Expression of Uncer- Accepted: February 4, 2000 National Institute of Standards and tainty in Measurement (ISO GUM) is intro- Technology, duced and applied in a detailed worked example. A Bayesian hierarchical model Gaithersburg, MD 20899-8120 motivated by the proposed solution is [email protected] presented and compared to the solution. Available online: http://www.nist.gov/jres [email protected] [email protected] [email protected] [email protected] [email protected] 1. Introduction Often a reference material is certified based on data Suppose X and s are the sample mean and sample stan- from more than one measurement method (or from dard deviation of the results of n methods. The interval more than one laboratory). This situation occurs when X t s /n is a 95 % confidence interval on the n1,95 no single method can provide the necessary level of population mean of the methods. Here t is the two- n1,95 accuracy and/or when there is no single method whose sided 95 percentile point of a t -distribution with n 1 sources of uncertainty are well understood and quanti- degrees of freedom. fied. The intent of using multiple methods is to realize There are two problems with the use of the t -interval. 
systematic effects (biases) of individual methods as First, it rests on the assumptions that there is a popula- variation across the multiple methods results. The multi- tion of methods whose biases are centered around zero ple methods should be chosen to avoid common sources and that the chosen methods are a random sample from of biases, which would invalidate the use of the variation the population. Second, when the number of methods is in estimation of the uncertainty of the systematic effects. small, the factor t can be very large. For example, n1,95 If the biases are statistically independent and are cen- if n = 2, then t = 12.7 and if n = 3, then t = 4.3. n1,95 n1,95 tered around zero, then the certified value and the ex- For comparison, if n is large, the value is close to 2. panded uncertainty can be based on a t -interval [1]. 571 Volume 105, Number 4, July–August 2000 Journal of Research of the National Institute of Standards and Technology To further explore the issues related to the certifica- It is the purpose of this paper to propose and justify tion from multiple methods, we present an example. a solution to the problem of certifying reference materi- Figure 1 summarizes the measurement results of two als based on a small number of methods in which the analytes for a reference material. The analyte Cd was systematic effects are not completely understood. We analyzed by two methods. The mean and expanded un- call this problem the two-method problem , although the certainty interval (coverage factor k = 2) [2,3] of each number of methods may be three or four and laborato- method are displayed on the top plot. Similarly, the ries may play the role of methods. Section 2 motivates a analyte Hg was analyzed by two laboratories and the set of desirable criteria for a solution and reviews some results are displayed in the bottom plot. In the Cd case, of the existing solutions to the problem. 
Section 3 pre- there appears to be agreement between the two methods. sents a solution, called BOB, based on a Type B model It may be reasonable to assume that there are no biases [2,3] of the bias and discusses some implementation between the two methods. issues and related concerns. Section 4 gives a detailed However, in the Hg case, there appears to be disagree- worked example of BOB. Finally, Sec. 5 provides some ment between the two laboratories. In the certification concluding remarks. Appendix A covers some degrees of this analyte, an uncertainty component for the sys- of freedom issues. Appendix B presents a Bayesian jus- tematic effects of the laboratories must be considered. tification of BOB based on a hierarchical model. For a The two problems in using a t -interval for this uncer- review of the context of the problem in chemical refer- tainty component, discussed above, are present in the Hg ence materials, see Ref. [4]. data. Fig. 1. Examples of measurement results. ICPMS means inductively coupled plasma mass spectrometer and ID-ICPMS means isotope dilution inductively cou- pled plasma mass spectrometry. The numbers in parenthesis are the number of measurements on which the results are based. The uncertainty intervals indicate expanded uncertainties with coverage factors k =2. 572 Volume 105, Number 4, July–August 2000 Journal of Research of the National Institute of Standards and Technology 2. Criteria for a Solution degrees of freedom are large enough simply to use a coverage factor of k =2. An important practical property for a solution to the Finally, the solution should be based on a rigorous two-method problem is that it is flexible enough to han- statistical model. A statistical model grounds the solu- dle a wide variety of settings in a straightforward way. tion on a strong base. The formulation of such a model The variety of settings includes the following: (1) the clarifies the assumptions of the solution. 
It also makes existence and nonexistence of systematic effects in the available a large literature of properties and results. Ap- methods; (2) the availability of two to four methods or pendix B addresses this issue. laboratories and (3) the existence and nonexistence of a Before moving on to the proposed solution, we review valid uncertainty evaluation for each method (i.e., currently available procedures. The t -interval approach within-method uncertainty). The alternatives in setting has already been discussed. It has most of the above (1) are exemplified by the Cd and Hg results shown in properties. However, as mentioned above, it depends on Fig. 1. The Hg results are also relevant to setting (3). In assumptions that may not be valid and may produce this study, based on knowledge of the laboratories, there impractically large intervals when there are a small is reason to believe that the expanded uncertainty for number of methods. Any similar procedure that esti- Laboratory 2 is not valid. mates the uncertainties associated with the systematic A property often considered desirable for a solution is effects of the methods based solely on the observed data that it should produce an expanded uncertainty interval will suffer from the same problems. This constraint was that contains the measurement result of each of the one of the guiding principles in the derivation of the methods. The justification for this property is that any of proposed solution. the methods may be the “correct” one since the biases The Schiller-Eberhardt procedure [6] has been used are unknown. From a statistical point of view, this prop- for some time with acceptable results. It is motivated by erty is not necessary. Statistically, one requires that the the desire for the expanded uncertainty interval to con- expanded uncertainty interval is believed to include the tain each of the individual method means. 
It does not fit unknown value of the quantity being measured (i.e., into the ISO guidelines and is not based on a rigorous measurand [5]) with a stated level of confidence. Under statistical model. It has an undesirable scaling property the assumptions described in Sec. 1, the t -interval has in that the uncertainty can only increase as the number the correct level of confidence. However, as stated of methods increases. above, if the number of methods is small, the interval Paule-Mandel [7] was developed as an ad hoc proce- may be impractically large. dure to produce a summary value of results from meth- The solution should possess certain continuity and ods with differing biases and precisions. Recently, it has scaling properties. For example, if the solution has been been given a firmer statistical foundation [8]. However, applied in the two-method case and a third method there are unresolved issues related to the uncertainty of becomes available, then the result should not change by the estimate. Additionally, it emphasizes methods with a large amount. Related to the setting (1) described high precision. High precision does not imply low bias. above, the result should not change abruptly as the sys- One final “solution” is to not combine the results if tematic effect goes to zero. there is an indication of systematic effects that are not In the interest of consistency with current interna- understood. tional practice, the solution should not be at odds with the ISO uncertainty guidelines (ISO GUM) [2,3]. Briefly, the ISO guidelines involve expressing the mea- 3. Type B Model of Bias surement result as a function of quantities whose uncer- tainties can be evaluated. The uncertainties of these In this section, we present a framework for a solution quantities are expressed as standard uncertainties, which to the two-method problem. The framework is ex- are propagated to derive the standard uncertainty of the pressed in terms of the language of the ISO guidelines. 
measurement result. The notation u (X ) is used for the The model has two components. The first component is standard uncertainty of the quantity X . Along with the the estimate of the population mean of the multiple standard uncertainties are associated degrees of free- methods. The second component is the deviation of this dom, which are propagated by the Welch-Satterthwaite population mean from the unknown value of the mea- formula [2,3]. From the degrees of freedom, a coverage surand, i.e., the unknown bias of the population mean. factor k is determined based on the t -distribution. The The possible bias is modeled via a Type B distribution expanded uncertainty is equal to the product of the [2,3]. (The name BOB comes from Type B On Bias). standard uncertainty and the coverage factor, resulting Type B distributions present a means of incorporating in an interval with a given level of confidence. Often the the available information on the problem. Because they 573 Volume 105, Number 4, July–August 2000 Journal of Research of the National Institute of Standards and Technology are distributions, they can account for uncertainty in the information. Distributional forms should be chosen that capture the information in an effective and straightfor- ward way. These aspects will become more apparent in the specifics that follow. The measurement model is given by = + , (1) where is the unknown value of the measurand, is the equally weighted mean of the population means of the methods, and is the possible bias of as an estimate of . We define as an equally weighted mean, because in the majority of reference material applications, it is difficult to quantify the relative biases of the the meth- ods. (Greek symbols are used here to emphasize that the quantities are unobserved and unknown.) Both and require estimates and uncertainties of these estimates. The natural estimate of is the sample mean of the set of method results. 
Standard statistical theory gives the uncertainty of this quantity (see example of Sec. 4). For it is most often the case in the present setting to assume that the best estimate is zero. However, it is recognized that there is uncertainty in the estimate. If the best estimate were not zero, then according to the ISO guidelines the measurement result should be ad- Fig. 2. The rectangular (or uniform) distribution. justed by the nonzero amount. What is required is a procedure to produce the uncer- tainty estimate of . To do this, the analyst places a probability distribution on the value that best summa- rizes the available information. The top plot in Fig. 2 displays a simple and useful distribution for this pur- pose, called the rectangular (also called uniform) distri- bution. The distribution models the bias as (1) centered at zero; (2) bounded between a ; and (3) equally likely to be anywhere between a . Under this assumption, the standard uncertainty of the bias estimate is equal to a /3. The bottom plot in Fig. 2 in conjunction with the top Fig. 3. The normal distribution. plot justifies a reasonable choice of a . Here the X , X , 1 2 and X represent, respectively, the results of the two unbounded meaning that unlike the rectangular distribu- methods and the mean of the two results. Thus, a is tion any value is possible. These qualities are repre- equal to (X X )/2. Under the measurement model of 2 1 sented by the shape of the distribution. There are several Eq. (1), this choice of a is equivalent to saying that the ways of employing the normal distribution. If the analyst unknown value of the measurand is believed to be (1) believes that there is a 95 % chance that the bias is centered at the mean of the two method results; (2) bounded between a , then the standard uncertainty of bounded between the two method results; and (3) the bias is a /2. 
As described above, a reasonable value equally likely to be anywhere between the two method for a is equal to (X X )/2. Note that although the 2 1 results. normal distribution is unbounded, the use of it described There are other useful Type B distributions that can above results in a smaller uncertainty for the bias than be placed on the bias. Another simple distribution is the the rectangular assumption described above. It is impor- normal distribution (see Fig. 3). The normal distribution tant to note that in the ISO uncertainty procedure only places higher probability on values near the center of the the standard uncertainty matters and not the actual form distribution than values far from the center. It is also of the distribution. 574 Volume 105, Number 4, July–August 2000 Journal of Research of the National Institute of Standards and Technology 3.1 Implementation Issues into a single GC result with an associated uncertainty. Using the combined GC result and the INAA result, the The previous section described the general frame- analyst can apply the Type B modeling described in this work of the proposed solution to the two-method prob- paper. lem. This section discusses some specific details and The Cd results of Fig. 1 display another important implementation issues that will arise in application. We case. In this case, there does not appear to be a between- emphasize that although the use of the rectangular dis- method effect. The question arises when to apply the tribution was highlighted in the last section as a model procedures described in this paper and when one can for the possible bias, other distributions may be used in assume that there is not a between-method effect. One the general framework of BOB. 
The particular distribu- way of answering this question is to perform a t -test (or tion is best determined by the experimenter based on the an F -test if the number of methods is greater than 2) on knowledge of the measurement process, previous exam- the difference between the two results [1]. The t -test, as ples, or assistance from a statistician experienced in the typically employed with an -level of 0.05, may favor area. the conclusion that there does not exist a between- Often when there are multiple methods used, the method effect. This conclusion may result in underesti- methods are related. The top plot of Fig. 4 illustrates mating the uncertainty. We recommend that if the t -test such a situation. There are four methods, but three of the is used, that the analyst use an -level of 0.5. Alterna- four are related to each other. In this example, three of tively, the use of BOB with the rectangular distribution, the methods are gas chromatography (GC) analyses and as described above, may be effective. If there is not a the forth method is neutron activation (INAA). It is between-method effect, then the results of the multiple likely that the three GC analyses are more related to methods should tend to be close to each other. In such each other than to the INAA analysis. The naive use of a case the width of the distribution on the bias (and its the t -interval approach would be misleading because uncertainty) will be small. Thus, there will be little these are not four independent methods. One procedure penalty for including the effect when it is small. for handling this case is to combine the three GC results The last case we consider is displayed in the bottom plot of Fig. 4. Here the result of Method 1 (represented by the dot) has the lowest value among the four methods. However, the expanded uncertainty interval of Method 2 extends below the intervals of the other three methods. 
In this case it may make more sense to define the Type B distribution of the bias based on the limits of the expanded uncertainties. In Appendix A, the presence of large within-method uncertainties is addressed with de- grees of freedom considerations. 4. Example This section presents a worked example that displays the details of the BOB procedure using the rectangular distribution. The example is based on the Hg data dis- cussed in the body of the paper. Before starting the example, we review some neces- sary statistical results. Suppose W , W , , W are n 1 2 n independent measurements. Let W and s(W ) denote the sample mean and sample standard deviation, respec- tively. The standard uncertainty of a sample mean, from the random variation in the measurements, is equal to s(W )/n . (2) Fig. 4. Multimethod examples. GC1, GC2, and GC3 represent gas The associated degrees of freedom for this uncertainty chromatography using three different columns. INAA means instru- is n 1. In addition to the uncertainty from the random mental neutron activation analysis. The uncertainty intervals indicate variation, there may exist uncertainty from systematic expanded uncer tainties with coverage factors k =2. effects. 575 Volume 105, Number 4, July–August 2000 Journal of Research of the National Institute of Standards and Technology We will make multiple uses of the linear measure- Step 0: The Measurement Equation ment equation given by The measurement equation model is given by Eq. (1), repeated below: Y = aW + bZ , (3) = + , (6) where a and b are fixed constants with no uncertainty and W and Z are quantities with uncertainty. Let the where is the unknown value of the concentration, is standard uncertainties of W and Z be u (W)and u (Z)and the equally weighted mean of the population means of the associated degrees of freedom and . In all that the methods, and is the bias of as an estimate of . W Z follows, assume that W and Z are independent. 
From Each quantity in the model must be estimated. (We use propagation of uncertainties [2,3], the standard uncer- Latin letters to distinguish the estimates, which are ob- tainty of Y is equal to servable, from the unobservable unknown values. Un- certainties will be associated with the estimates, as op- 2 2 2 2 u (Y)= a u (W)+ b u (Z ) . (4) posed to the unknown values.) The measurement equation relating the estimates is The associated degrees of freedom derived from the Welch-Satterthwaite formula [2,3], is Y = X + B , (7) u (Y ) where Y is the final measurement result, X is the sample = . (5) 4 4 4 4 a u (W )/ + b u (Z )/ mean of X and X , and B is equal to zero. The final W Z 1 2 measurement result is Returning to the example, Table 1 gives the relevant summary statistics for the results from the two laborato- 1 – Y = X + B = (X + X ) 1 2 ries. For notation, let X , s (X ), and n be the summary 2 1 1 1 statistics for Laboratory 1 and likewise, X , s (X ), and n 1 2 2 2 +0= (0.368 + 0.310) mg/kg = 0.339 mg/kg. (8) be the summary statistics for Laboratory 2. In order to 2 make certain relationships explicit, we use the notation X and X to refer to the two laboratory results including We point out here that although the number of measure- 1 2 all corrections. ments for the two methods are not the same, we weight the results equally because there is no reason to believe Table 1. Summary statistics for Hg results one result is more accurate than the other. The next steps are the calculation of the uncertainties of X and B Lab 1 2 and their combination to obtain the uncertainty of Y . X 0.368 mg/kg 0.310 mg/kg Step 1: Within-Method Uncertainty s (X ) 0.011 mg/kg 0.0086 mg/kg For each laboratory result, calculate the standard n 420 u (S ) 0.006 mg/kg uncertainty. For Laboratory 2, the laboratory result is X = X . The standard uncertainty u (X ) is given by the 2 2 2 result for the sample mean [see Eq. (2)]. 
It is equal to Laboratory 1, in addition to the measurement varia- tion, has a possible systematic effect. The uncertainty of – 0.0086 u (X )= u (X )=s (X )/n = mg/kg 2 2 2 2 the effect is quantified as a Type B source of uncer- tainty, referred to as u (S ). We assume that this uncer- tainty has infinite degrees of freedom. If it were possi- = 0.0019 mg/kg. (9) ble to identify all the systematic effects in each laboratory’s measurement process and quantify the re- and the degrees of freedom is equal to =20 1 spective uncertainties then there would be no need to = 19. use the BOB procedure. For Laboratory 1, the Type B uncertainty associated Note in the following calculations, many more digits with the systematic effect must be included in the un- are maintained in the intermediate steps than are shown. certainty. The systematic effect is assumed to be an This will lead to apparent discrepancies in the equations additive effect. The resulting measurement equation is that follow, in which only a small number of digits are displayed. X = X + S , (10) 1 1 1 576 Volume 105, Number 4, July–August 2000 Journal of Research of the National Institute of Standards and Technology where S is a correction that accounts for the possible Step 3: Combining Uncertainties 1 1 systematic effect. The uncertainty of X is equal to First, we calculate u (X ). Recall X = (X + X )= X 1 1 2 1 2 2 u (X )=s (X )/n = 0.011 mg/kg/4 = 0.0055 mg/kg + X . 1 1 1 2 and has =4 1 = 3 degrees of freedom. Although Using Eqs. (3)-(5), with a = b = 1/2, u (S ) is non-zero, the best estimate of S is zero. Using 1 1 2 2 the results of Eqs. 
(3)-(5), with a = b = 1 and W = X 1 1 1 2 2 u (X)= u (X )+ u (X ) 1 2 and Z = S , the standard uncertainty of the Laboratory 1 2 2 result is 1 1 2 2 – = 0.0081 + 0.0019 mg/kg = 0.0042 mg/kg (15) 2 2 2 2 u (X )= u (X )+ u (S )= 0.0055 + 0.006 mg/kg 4 4 1 1 1 = 0.0081 mg/kg, (11) and the degrees of freedom of u (X ) is equal to with associated degrees of freedom u (X ) 1 4 4 1 4 4 ( ) u (X )/ +( ) u (X )/ 1 X 2 X 2 1 2 2 4 4 u (X ) 0.0081 = – = 1 4 4 4 4 4 u (X )/ + u (S )/ 0.0055 /3 + 0.006 / 0.0042 1 X 1 S 1 1 = = 16.0. (16) 1 4 4 1 4 4 ( ) 0.0081 /14.4 + ( ) 0.0019 /19 2 2 = 14.4. (12) Finally, from the measurement equation, Eq. (7), Note that the term 0.006 / is equal to zero. Table 2 2 2 2 2 summarizes the within-laboratory uncertainties and de- u (Y)= u (X)+ u (B)= 0.0042 + 0.0167 mg/kg grees of freedom. = 0.017 mg/kg (17) Table 2. Within-method uncertainties and the corresponding degrees of freedom is equal to Lab 1 2 u (Y ) u (X ) 0.0081 mg/kg 0.0019 mg/kg = Y 4 4 u (X )/ + u (B )/ X B 14.4 19 0.017 = = 27.0. (18) 4 4 0.0042 /16.0 + 0.0167 /24.0 Step 2: Between-Method Uncertainty In the BOB procedure, a Type B distribution is used The final summary value and its standard uncertainty to account for the possible bias B in the average of the for the results of the two-laboratory study are 0.339 results of the methods. In this example, we use the mg/kg and 0.017 mg/kg. The degrees of freedom is 27. rectangular distribution bounded by the two laboratory The multiplier for a 95 % level of confidence interval is results for B , as described in Sec. 3, for this purpose. 2.1, which is based on a t -multiplier with 27 degrees of The standard uncertainty based on this distribution is freedom (see Table B.1 of Ref. [3]). The expanded un- equal to certainty is equal to (2.1)(0.017) mg/kg = 0.036 mg/kg. |X X | |0.368 0.310| 1 2 u (B)= = mg/kg 23 33 5. Conclusion = 0.0167 mg/kg. (13) It was stated in Sec. 
2 that a guiding principle in the derivation of BOB was the constraint that solutions that Using Eq. 20 of Appendix A, the degrees of freedom for are based solely on the observed results will produce this quantity is intervals whose widths are comparable to the t -interval 2 2 with one degree of freedom, i.e., very large. In other 1 (X X ) 1 (0.368 0.310) 2 1 = = B 2 2 2 2 words, two disparate methods give you effectively only 2 u (X )+ u (X ) 2 0.0081 + 0.0019 1 2 two observations of information. BOB does not pull any more information out of the data. BOB overcomes the = 24.0. (14) limitation by bringing in outside information about the 577 Volume 105, Number 4, July–August 2000 Journal of Research of the National Institute of Standards and Technology measurement processes and quantifying this information uncertainty in the uncertainty of the bias term. If X and in terms of a Type B distribution. The particular distri- X are normal, an exact formula for u (|X X |) is 2 2 1 bution is best determined by the experimenter based on possible based on the folded normal distribution [9]. the knowledge of the measurement process, previous examples, or assistance from a statistician experienced in the area. In any given application, a reviewer of the 7. Appendix B. Bayesian Model uncertainty may disagree with the result. However, in BOB, the outside information appears explicitly and This appendix presents a Bayesian justification for the concretely and is open to evaluation. We believe this BOB procedure. It is more technical than the rest of the explicitness, which Bayesian approaches share, is a ma- paper and uses standard notation for Bayesian statistics. jor strength of BOB. See Ref. [10] for an introduction to Bayesian statistics BOB also possesses many of the desirable criteria and the notation used in this section. discussed in Sec. 2. 
In particular, it fits in the ISO Let [x ¯ , s (x ), n ]and [x ¯ , s (x ), n ) be the summary 1 1 1 2 2 2 framework, it is simple to implement, and it is related to statistics for the two methods. Let and be the 1 2 a rigorous statistical model (see Appendix B). population means of the two methods. These latter quantities represent the sample means of a conceptually infinite number of measurements. Let be the unknown 6. Appendix A. Degrees of Freedom value of ultimate interest. One natural approach would be to build a hierarchical The lower plot of Fig. 4 displays an example in which model around the conditional distribution of , | . 1 2 one of the within-method uncertainties is very large. In We do not follow that path here, because the resulting the basic use of the rectangular distribution presented, uncertainty in would reflect the one degree of free- the values of the multiple method results are the input dom problem we are trying to escape. Instead, we re- into the uncertainty evaluation, that is, u (B)=|X X |/ verse the situation and build a model around the distri- 2 1 12. If these method results have large uncertainties, bution | , . What this model will imply is that if one 1 2 the uncertainty evaluation of the possible bias may not knew and , then there is no more information on 1 2 be reliable. Degrees of freedom may be used to over- in the observed data. In other words, [x ¯ , s (x ), n ]and 1 1 1 come this problem. Degrees of freedom can be thought [x ¯ , s (x ), n ) only provide information on and , 2 2 2 1 2 of as the uncertainty in the uncertainty. Low degrees of which in turn provide information on . freedom correspond to high uncertainty in the uncer- It is up to the scientists to answer the question: If you tainty. Formula G.3 of Ref. [2] provides an approxima- knew the results of an infinite number of measurements, tion to the degrees of freedom of an estimated standard i.e., and , what is the distribution that reflects the 1 2 uncertainty. 
Using this formula for $u(B) = |X_2 - X_1|/\sqrt{12}$, the degrees of freedom is

$$\frac{1}{2}\,\frac{(X_2 - X_1)^2}{u^2(|X_1 - X_2|)}. \tag{19}$$

We suggest the use of the approximation $u^2(|X_2 - X_1|) \approx u^2(X_1) + u^2(X_2)$. Using this approximation, the degrees of freedom is equal to

$$\frac{1}{2}\,\frac{(X_2 - X_1)^2}{u^2(X_1) + u^2(X_2)}. \tag{20}$$

The approximation is good when $|X_2 - X_1|$ is large relative to $u(X_1)$ and $u(X_2)$. Under this condition, $|X_2 - X_1|$ is equal to $X_2 - X_1$ with high probability or equal to $X_1 - X_2$ with high probability. If the condition does not hold, the approximation may be poor. Moreover, when the condition is not met, the approximation will inappropriately yield very small degrees of freedom. We recommend that the degrees of freedom for the bias be at least 3. A value of 3 is equivalent to a 42 %

uncertainty in $\gamma$, the value of interest? In this appendix, we model $p(\gamma \mid \mu_1, \mu_2)$ as a uniform distribution centered on $(\mu_1 + \mu_2)/2$ and with full width $|\mu_1 - \mu_2|$.

We use the conjugate normal model with reference priors for the parameters as the models for the results of the two methods. The basic result of the conjugate normal model is that $p[\mu_1 \mid \bar{x}_1, s(x_1)]$ is the distribution of the quantity $\bar{x}_1 + [s(x_1)/\sqrt{n_1}]\, t_{n_1 - 1}$, where $t_{n_1 - 1}$ has a t-distribution with $n_1 - 1$ degrees of freedom. A similar result holds for $p[\mu_2 \mid \bar{x}_2, s(x_2)]$.

With $p[\mu_1 \mid \bar{x}_1, s(x_1)]$, $p[\mu_2 \mid \bar{x}_2, s(x_2)]$, and $p(\gamma \mid \mu_1, \mu_2)$ given, the posterior distribution $p[\gamma \mid \bar{x}_1, s(x_1), \bar{x}_2, s(x_2)]$ is completely specified. Since all the components are basic distributions, standard statistical software can be used to simulate from this posterior distribution. Figure 5 shows the resulting posterior distribution for the Hg data of the paper based on a simulation of 10 values. The sample mean and standard deviation from the simulation are 0.339 mg/kg and 0.018 mg/kg, respectively, compared with 0.339 mg/kg and 0.017 mg/kg from the results for the BOB procedure in Sec. 4.

Fig. 5. Simulated posterior distribution from Hg data.

An exact comparison of the mean and uncertainty for the BOB procedure and the Bayesian model is possible. In the following derivations, we suppress the dependence on the observed quantities.

$$E(\gamma) = E[E(\gamma \mid \mu_1, \mu_2)] = E\!\left[\frac{\mu_1 + \mu_2}{2}\right] = \frac{\bar{x}_1 + \bar{x}_2}{2} \tag{21}$$

$$\mathrm{Var}(\gamma) = E[\mathrm{Var}(\gamma \mid \mu_1, \mu_2)] + \mathrm{Var}[E(\gamma \mid \mu_1, \mu_2)] \tag{22}$$

$$= E\!\left[\frac{(\mu_1 - \mu_2)^2}{12}\right] + \mathrm{Var}\!\left[\frac{\mu_1 + \mu_2}{2}\right] \tag{23}$$

$$= \frac{1}{12}\left[E^2(\mu_1 - \mu_2) + \mathrm{Var}(\mu_1 - \mu_2)\right] + \frac{1}{4}\,\mathrm{Var}(\mu_1 + \mu_2) \tag{24}$$

$$= \frac{1}{12}\,E^2(\mu_1 - \mu_2) + \frac{1}{3}\,\mathrm{Var}(\mu_1 + \mu_2) \tag{25}$$

$$= \frac{(\bar{x}_1 - \bar{x}_2)^2}{12} + \frac{1}{3}\left[\frac{n_1 - 1}{n_1 - 3}\,\frac{s^2(x_1)}{n_1} + \frac{n_2 - 1}{n_2 - 3}\,\frac{s^2(x_2)}{n_2}\right]. \tag{26}$$

The mean from the BOB procedure is identical to that of the Bayes model. The variance from BOB is

$$\frac{(\bar{x}_1 - \bar{x}_2)^2}{12} + \frac{1}{4}\left[\frac{s^2(x_1)}{n_1} + \frac{s^2(x_2)}{n_2}\right], \tag{27}$$

which differs from the Bayes model in the second term. Future work will explore the Bayes model and generalizations of it.

8. References

[1] M. G. Natrella, Experimental Statistics, Handbook 91, NBS, Gaithersburg, MD (1963).
[2] International Organization for Standardization (ISO), Guide to the Expression of Uncertainty in Measurement, International Organization for Standardization (ISO), Geneva, Switzerland, 1993 (corrected and reprinted 1995).
[3] B. N. Taylor and C. E. Kuyatt, Guidelines for Evaluating and Expressing Uncertainty in NIST Measurement Results, NIST TN 1297, NIST, Gaithersburg, MD (1994).
[4] S. B. Schiller, Standard Reference Materials: Statistical Aspects of the Certification of Chemical SRMs, NIST SP 260-125, NIST, Gaithersburg, MD (1996).
[5] International Vocabulary of Basic and General Terms in Metrology (second edition), International Organization for Standardization (ISO), Geneva, Switzerland, 1993.
[6] S. B. Schiller and K. E. Eberhardt, Combining Data from Independent Analysis Methods, Spectrochim. Acta 46 (12) (1991).
[7] R. Paule and J. Mandel, Consensus Values and Weighting Factors, J. Res. Natl. Bur. Stand. (U.S.) 87 (5) (1982).
[8] A. L. Rukhin, B. J. Biggerstaff, and M. G. Vangel, Restricted Maximum Likelihood Estimation of a Common Mean and the Mandel-Paule Algorithm, to be published.
[9] J. K. Patel and C. B. Read, Handbook of the Normal Distribution, Marcel Dekker, New York (1982).
[10] P. M. Lee, Bayesian Statistics: An Introduction, Oxford University Press, New York (1989).

About the authors: Mark S. Levenson, William F. Guthrie, Hung-kung Liu, Mark G. Vangel, James H. Yen, and Nien-fan Zhang are mathematical statisticians in the Statistical Engineering Division of the Information Technology Laboratory at NIST. David L. Banks, Keith R. Eberhardt, and Lisa M. Gill are former members of the Statistical Engineering Division of the Information Technology Laboratory at NIST. The National Institute of Standards and Technology is an agency of the Technology Administration, U.S. Department of Commerce.
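The simulation described in the Bayesian appendix, and the comparison of the two variance expressions in Eqs. (26) and (27), can be illustrated with a short script. The following is a minimal sketch, not the authors' code: it assumes NumPy, all function names are hypothetical, and the numerical inputs used in the test are made-up illustration values rather than the Hg data of Sec. 4. The degrees-of-freedom helper applies Eq. (20) together with the recommended lower bound of 3.

```python
import numpy as np


def bias_dof(x1, x2, u1, u2, floor=3.0):
    """Approximate degrees of freedom for the bias uncertainty, Eq. (20).

    u1 and u2 are the standard uncertainties of the two method results.
    The result is floored at 3, following the recommendation in the text.
    """
    nu = 0.5 * (x2 - x1) ** 2 / (u1 ** 2 + u2 ** 2)
    return max(nu, floor)


def bob_variance(xbar1, s1, n1, xbar2, s2, n2):
    """Variance of the consensus value from the BOB procedure, Eq. (27)."""
    return (xbar1 - xbar2) ** 2 / 12 + 0.25 * (s1 ** 2 / n1 + s2 ** 2 / n2)


def bayes_variance(xbar1, s1, n1, xbar2, s2, n2):
    """Posterior variance under the Bayes model, Eq. (26).

    Requires n1 > 3 and n2 > 3 so that the t-distribution variances exist.
    """
    if n1 <= 3 or n2 <= 3:
        raise ValueError("Eq. (26) needs more than 3 observations per method")
    t1 = (n1 - 1) / (n1 - 3) * s1 ** 2 / n1
    t2 = (n2 - 1) / (n2 - 3) * s2 ** 2 / n2
    return (xbar1 - xbar2) ** 2 / 12 + (t1 + t2) / 3


def simulate_posterior(xbar1, s1, n1, xbar2, s2, n2, size, seed=0):
    """Draw from the posterior of the value of interest by composition.

    Step 1: draw each method mean from its shifted, scaled t posterior.
    Step 2: draw the value of interest uniformly between the two means,
            i.e. centered on their midpoint with full width |mu1 - mu2|.
    """
    rng = np.random.default_rng(seed)
    mu1 = xbar1 + s1 / np.sqrt(n1) * rng.standard_t(n1 - 1, size)
    mu2 = xbar2 + s2 / np.sqrt(n2) * rng.standard_t(n2 - 1, size)
    low = np.minimum(mu1, mu2)
    high = np.maximum(mu1, mu2)
    return rng.uniform(low, high)
```

With enough draws, the sample mean and variance of the simulated values should approach Eq. (21) and Eq. (26), respectively. Because $(n - 1)/(3(n - 3)) > 1/4$ for any $n > 3$, the Bayes variance is always somewhat larger than the BOB variance in the second term.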

ISSN: 1044-677X
eISSN: 2165-7254
DOI: 10.6028/jres.105.047