On the Evaluation of Discrepant Scientific Data with Unrecognized Errors
Dan G. Cacuci, Mihaela Ionescu-Bujor
When n measurements and/or computations of the same (unknown) quantity yield data points x_j with corresponding standard deviations (uncertainties) σ_j such that the distances |x_j − x_k| between any two data points are smaller than or comparable to the sum (σ_j + σ_k) of their respective uncertainties, the respective data points are considered to be consistent or to agree within error bars. However, when the distances |x_j − x_k| are larger than (σ_j + σ_k), the respective data are considered to be inconsistent or discrepant. Inconsistencies can be caused by unrecognized or ill-corrected experimental effects (e.g., background corrections, dead time of the counting electronics, instrumental resolution, sample impurities, calibration errors). Although there is a nonzero probability that genuinely discrepant data could occur (for example, for a Gaussian sampling distribution with standard deviation σ, the probability that two equally precise measurements would be separated by more than 2σ is erfc(1) ≈ 0.157), it is much more likely that apparently discrepant data actually indicate the presence of unrecognized errors.
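The consistency criterion and the quoted probability can be illustrated with a minimal sketch; the data values below are hypothetical, chosen only to show one consistent and two discrepant pairs. The erfc(1) figure follows because the difference of two equally precise, unbiased Gaussian measurements (standard deviation σ each) has standard deviation σ√2, so P(|x_j − x_k| > 2σ) = erfc(1) ≈ 0.157.

```python
import itertools
import math

def consistent(xj, sigma_j, xk, sigma_k):
    """Two data points agree within error bars when their distance
    |x_j - x_k| does not exceed the sum of their uncertainties."""
    return abs(xj - xk) <= sigma_j + sigma_k

# Hypothetical measurements (value, standard deviation) of one quantity.
data = [(10.2, 0.3), (10.9, 0.2), (9.8, 0.4)]
for (xj, sj), (xk, sk) in itertools.combinations(data, 2):
    tag = "consistent" if consistent(xj, sj, xk, sk) else "discrepant"
    print(f"|{xj} - {xk}| = {abs(xj - xk):.2f} vs {sj + sk:.2f}: {tag}")

# Difference of two equally precise Gaussian measurements has standard
# deviation sigma*sqrt(2), so P(|x_j - x_k| > 2*sigma) = erfc(1):
print(math.erfc(1.0))  # ~0.1573
```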
This work addresses the treatment of unrecognized errors by applying the maximum entropy principle under quadratic loss to the discrepant data. Novel results are obtained for the posterior distribution that determines the unknown mean value (i.e., the unknown location parameter) of the data, as well as for the marginal posterior distribution of the unrecognized errors. These results are considerably more rigorous and accurate, and have a wider range of applicability, than extant recipes for handling discrepant data.
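As general background for the method named here (a standard result, not the paper's specific posterior, which the abstract does not reproduce): maximizing the Shannon entropy of a distribution subject to normalization and a quadratic constraint on the deviation from a location parameter yields a Gaussian.

```latex
% Maximum entropy under a quadratic constraint (standard result):
\begin{align}
  \max_{p}\; & -\int p(x)\,\ln p(x)\,dx \\
  \text{s.t.}\; & \int p(x)\,dx = 1,
    \qquad \int (x-\mu)^{2}\,p(x)\,dx = \sigma^{2} \\
  \Rightarrow\; & p(x) = \frac{1}{\sqrt{2\pi\sigma^{2}}}
    \exp\!\left[-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right].
\end{align}
```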