6.5 Reliability

Reliability is a one-number summary of the accuracy of an instrument. Statisticians define it as the proportion of the total variance that is attributable to variation between children’s abilities. More specifically, the reliability \(R\) of a test is written as

\[R \equiv \frac{\sigma_{\beta}^2}{\sigma_{\beta}^2 + \sigma_{e}^2},\]

where \(\sigma_{\beta}^2\) is the variance of true scores and \(\sigma_{e}^2\) is the error variance.

In general, high reliability is desirable. We often use reliability to decide between instruments. Cronbach’s \(\alpha\) is a widely used estimate of the lower bound of the reliability of a test. In the Rasch model, we can estimate reliability by the ratio

\[\hat{R} = \frac{\hat\sigma_{\hat\beta}^2 - \hat\sigma_{\hat e}^2}{\hat\sigma_{\hat\beta}^2}.\]
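
The step from the definition of \(R\) to this estimator can be sketched as follows. The sketch assumes the classical decomposition in which an estimated ability equals the true ability plus an error that is uncorrelated with it, \(\hat\beta = \beta + e\); the source does not state this assumption explicitly. Under it, the variance of the estimated abilities is the sum of the true-score variance and the error variance, so that

\[\sigma_{\hat\beta}^2 = \sigma_{\beta}^2 + \sigma_{e}^2 \quad\Longrightarrow\quad R = \frac{\sigma_{\beta}^2}{\sigma_{\beta}^2 + \sigma_{e}^2} = \frac{\sigma_{\hat\beta}^2 - \sigma_{e}^2}{\sigma_{\hat\beta}^2}.\]

Replacing both variances by their estimates gives \(\hat{R}\).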

For a given model, we can calculate \(\hat\sigma_{\hat\beta}^2\) directly as the sample variance of the estimated abilities. Getting an estimate for \(\hat\sigma_{\hat e}^2\) is more complicated. We use the modelled person abilities and item difficulties to generate a hypothetical data set of the same size and with the same missing data pattern, and then re-estimate the person abilities from the simulated data. Then \(\hat\sigma_{\hat e}^2\) is computable as the variance of the differences between the modelled and re-estimated person abilities.
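
The simulation procedure described above can be illustrated with a minimal Python sketch. It is not the code used for the analysis in this chapter. All function names, the sample sizes, the item difficulties and the 80% observation rate are hypothetical, and the sketch makes several assumptions the text does not state: item difficulties are held fixed during re-estimation, abilities are re-estimated by maximum likelihood with a damped Newton step, and estimates are clipped to \(\pm 6\) logits to handle all-pass and all-fail response patterns.

```python
import math
import random
import statistics


def rasch_prob(beta, delta):
    """Probability of passing an item of difficulty delta at ability beta."""
    return 1.0 / (1.0 + math.exp(-(beta - delta)))


def simulate(betas, deltas, observed, rng):
    """Draw 0/1 responses where observed is True; keep None where data are missing."""
    return [
        [int(rng.random() < rasch_prob(b, d)) if seen else None
         for d, seen in zip(deltas, row)]
        for b, row in zip(betas, observed)
    ]


def estimate_ability(responses, deltas, n_iter=50, bound=6.0):
    """ML ability estimate with item difficulties held fixed (an assumption)."""
    items = [(x, d) for x, d in zip(responses, deltas) if x is not None]
    beta = 0.0
    for _ in range(n_iter):
        probs = [rasch_prob(beta, d) for _, d in items]
        score = sum(x - p for (x, _), p in zip(items, probs))  # first derivative
        info = sum(p * (1.0 - p) for p in probs)               # Fisher information
        if info < 1e-8:
            break
        step = max(-1.0, min(1.0, score / info))     # damped Newton step
        beta = max(-bound, min(bound, beta + step))  # clip extreme patterns
    return beta


# Hypothetical inputs standing in for the modelled abilities and difficulties.
rng = random.Random(2024)
deltas = [-3.0 + 6.0 * j / 19 for j in range(20)]           # 20 item difficulties
betas = [rng.gauss(0.0, 2.0) for _ in range(200)]           # 200 person abilities
observed = [[rng.random() < 0.8 for _ in deltas] for _ in betas]  # missing pattern

# Generate a data set with the same size and missing pattern, then re-estimate.
simulated = simulate(betas, deltas, observed, rng)
re_estimated = [estimate_ability(row, deltas) for row in simulated]

# Variance of the abilities and of the modelled-minus-re-estimated differences.
var_beta = statistics.variance(betas)
var_err = statistics.variance([r - b for r, b in zip(re_estimated, betas)])
reliability = (var_beta - var_err) / var_beta
print(f"var_beta={var_beta:.2f} var_err={var_err:.2f} reliability={reliability:.3f}")
```

The clipping bound affects the error variance for children with extreme response patterns, so a real analysis would need to choose it, or an alternative estimator, with care.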

The estimated variance of the modelled abilities is \(\hat\sigma_{\hat\beta}^2 = 76.6\), and the variance of the difference between modelled and re-estimated abilities is \(\hat\sigma_{\hat e}^2 = 1.74\). The corresponding standard error of measurement (sem) is \(\hat\sigma_{\hat e} = 1.32\) logits.

The estimated reliability in the SMOCC data is equal to \((76.6-1.74)/76.6 = 0.977\). We may interpret this estimate in the same way as Cronbach’s \(\alpha\), for which typically any value above 0.9 is classified as excellent. Note that the reliability is very high because of the large variation in D-scores. Newborns are very different from 2-year-old toddlers, which helps to increase reliability. In practice, it is perhaps more useful to use a measure of accuracy that is less dependent on the variation within the sample. The sem, as explained above, seems to be a more relevant measure of precision.
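
The arithmetic behind these two numbers, and the dependence of reliability on the spread of abilities, can be checked with a short Python sketch. The two variances are taken from the text. The function name and the narrower variance of 10 are hypothetical and serve only to illustrate what happens when the same sem is combined with a more homogeneous sample.

```python
import math


def reliability(var_beta, var_err):
    """Estimated reliability from ability variance and error variance."""
    return (var_beta - var_err) / var_beta


var_beta = 76.6  # variance of modelled abilities (from the text)
var_err = 1.74   # variance of modelled minus re-estimated abilities (from the text)

rel_smocc = reliability(var_beta, var_err)
sem = math.sqrt(var_err)
print(f"reliability = {rel_smocc:.3f}")  # prints "reliability = 0.977"
print(f"sem = {sem:.2f}")                # prints "sem = 1.32"

# Hypothetical narrower sample: same error variance, ability variance of 10.
rel_narrow = reliability(10.0, var_err)
print(f"reliability (narrow sample) = {rel_narrow:.3f}")  # prints 0.826
```

With an unchanged sem, the reliability drops from 0.977 to 0.826 in the hypothetical narrower sample, which is why the sem is less sensitive to the composition of the sample.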