Agreement Among Numerical Values

April 8, 2021

Note that, in this context, the terms trueness and precision as defined in ISO 5725-1 do not apply. One reason is that there is no single "true value" of the quantity: each case has two possible true values, whereas accuracy is in every case an average and therefore takes both values into account. In this context, however, "precision" denotes a different metric, one that comes from the field of information retrieval (see below).

(5) Data which seem to be in poor agreement can produce quite high correlations. For example, Serfontein and Jaroszewicz [2] compared two methods of estimating gestational age. Babies with a gestational age of 35 weeks by one method had gestational ages between 34 and 39.5 weeks by the other, yet r was high (0.85). Similarly, Oldham et al. [3] compared the Wright mini and standard peak flow meters and found a correlation of 0.992. They then connected the meters in series, so that both measured the same flow, and obtained a "material improvement" (0.996).

If a correlation coefficient of 0.99 can be materially improved upon, we need to reconsider what counts as a strong correlation in this context. As we show below, the strong correlation of 0.94 for our own data masks considerable disagreement between the two instruments.

If we have repeated measurements by each of the two methods on the same subjects, we can calculate the mean for each method for each subject and use these pairs of means to compare the two methods by the analysis described above. The estimate of bias will be unaffected, but the estimate of the standard deviation of the differences will be too small, because some of the effect of repeated measurement error has been removed. We can correct for this. Suppose we have two measurements by each method, as in the table. We find the standard deviation of the differences between the repeated measurements for each method separately, s1 and s2, and the standard deviation of the differences between the subject means of the two methods, sD. The corrected standard deviation of the differences, sc, is √(sD² + ¼s1² + ¼s2²). This is roughly √(2sD²); but if there are differences between the two methods beyond those due to repeatability error (that is, a subject-by-method interaction), the approximation √(2sD²) will overestimate sc.
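The correction can be sketched as a small function. This is an illustrative helper under the assumptions above (two replicates per method per subject); the function name and argument layout are ours, not from the original.

```python
import math
import statistics

def corrected_sd_of_differences(a1, a2, b1, b2):
    """Corrected SD of differences, sc = sqrt(sD^2 + s1^2/4 + s2^2/4),
    from two replicates per subject by method A (a1, a2) and method B (b1, b2)."""
    # s1, s2: SDs of the differences between repeated measurements, per method
    s1 = statistics.stdev(x - y for x, y in zip(a1, a2))
    s2 = statistics.stdev(x - y for x, y in zip(b1, b2))
    # sD: SD of the differences between the per-subject means of the two methods
    mean_a = [(x + y) / 2 for x, y in zip(a1, a2)]
    mean_b = [(x + y) / 2 for x, y in zip(b1, b2)]
    sD = statistics.stdev(a - b for a, b in zip(mean_a, mean_b))
    return math.sqrt(sD**2 + s1**2 / 4 + s2**2 / 4)
```

Note that sc is the spread expected between *single* measurements by the two methods, recovered from the narrower spread of the per-subject means.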

For the PEFR data we have sD = 33.2, s1 = 21.6 and s2 = 28.2 l/min. Hence sc = √(33.2² + ¼ × 21.6² + ¼ × 28.2²) = 37.7 l/min, whereas the approximation √(2sD²) overestimates it (47.0 l/min).

The preceding analysis assumed that the differences do not vary systematically over the range of measurement. This may not be the case. Figure 4 compares measurements of mean velocity of circumferential fibre shortening (VCF) obtained from the long-axis and short-axis views in M-mode echocardiography [6]. The scatter of the differences increases as the VCF increases. We could ignore this, but the limits of agreement would then be wider apart than necessary for small VCFs and narrower than they should be for large VCFs.
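The PEFR figures above can be checked directly:

```python
import math

# Figures from the PEFR example: sD = 33.2, s1 = 21.6, s2 = 28.2 (l/min)
sD, s1, s2 = 33.2, 21.6, 28.2

sc = math.sqrt(sD**2 + s1**2 / 4 + s2**2 / 4)  # corrected SD of differences
approx = math.sqrt(2 * sD**2)                  # rough approximation

print(f"sc = {sc:.1f} l/min")        # 37.7
print(f"sqrt(2*sD^2) = {approx:.1f} l/min")  # 47.0, an overestimate
```

The gap between 37.7 and 47.0 reflects genuine subject-by-method differences that the √(2sD²) shortcut wrongly attributes to repeatability error.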

If the differences are proportional to the mean, a logarithmic transformation should give a picture more like those of Figures 2 and 4, and we can then apply the analysis described above to the transformed data.
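A minimal sketch of this log-transformation approach (the function name and the use of ±2 SD limits are our assumptions for illustration): analyse the differences of the logarithms, then back-transform the limits, which then describe the *ratio* of the two methods rather than their difference.

```python
import math
import statistics

def ratio_limits_of_agreement(method_a, method_b):
    """Limits of agreement computed on the log scale and back-transformed.

    When differences grow in proportion to the mean, work with
    d = log(a) - log(b), take mean(d) +/- 2 SD(d), and exponentiate;
    the result bounds the ratio a/b for most subjects.
    """
    d = [math.log(a) - math.log(b) for a, b in zip(method_a, method_b)]
    mean_d = statistics.mean(d)
    sd_d = statistics.stdev(d)
    lower, upper = mean_d - 2 * sd_d, mean_d + 2 * sd_d
    return math.exp(lower), math.exp(upper)  # limits for the ratio a/b
```

Because the limits are ratios, they automatically widen (in absolute terms) for large measurements and narrow for small ones, which is exactly the behaviour the untransformed analysis fails to capture.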