## 6.2 Precision of the D-score

The EAP algorithm estimates the D-score from a set of PASS/FAIL scores. The standard deviation of the posterior distribution (or *sem*: standard error of measurement) quantifies the imprecision of the D-score estimate. The *sem* is inversely related to the number of items. Thus, when we administer more milestones, the *sem* of the D-score drops.
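As a rough illustration of how the posterior standard deviation shrinks with test length, the sketch below computes an EAP estimate on a grid. It is not the implementation used for the D-score: it assumes a plain Rasch model on the logit scale, a standard normal prior, and made-up item difficulties. The function name `eap_estimate` is hypothetical.

```python
import numpy as np

def eap_estimate(difficulties, responses, prior_mean=0.0, prior_sd=1.0, n_grid=401):
    """Return (posterior mean, posterior sd) of ability on a grid.

    Assumptions for illustration only: Rasch model, normal prior,
    logit scale (not D-score units).
    """
    difficulties = np.asarray(difficulties, dtype=float)
    responses = np.asarray(responses, dtype=float)  # 1 = PASS, 0 = FAIL

    # Evaluate the posterior on an equally spaced grid around the prior mean.
    grid = np.linspace(prior_mean - 6 * prior_sd, prior_mean + 6 * prior_sd, n_grid)
    log_post = -0.5 * ((grid - prior_mean) / prior_sd) ** 2

    # Rasch log-likelihood, written with logaddexp for numerical stability.
    z = grid[:, None] - difficulties[None, :]
    log_pass = -np.logaddexp(0.0, -z)
    log_fail = -np.logaddexp(0.0, z)
    log_post += (responses * log_pass + (1.0 - responses) * log_fail).sum(axis=1)

    # Normalise and take the first two moments of the posterior.
    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    mean = float((grid * post).sum())
    sd = float(np.sqrt((((grid - mean) ** 2) * post).sum()))
    return mean, sd

# Hypothetical tests of increasing length: half the items passed, half failed.
for n_items in (4, 16, 64):
    difficulties = np.zeros(n_items)
    responses = np.tile([1, 0], n_items // 2)
    estimate, sem = eap_estimate(difficulties, responses)
    print(n_items, round(estimate, 3), round(sem, 3))
```

In this toy setting the posterior standard deviation decreases as items are added, which is the behaviour described above for the *sem*.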

Figure 6.3 shows that the *sem* drops off rapidly when the number of items is low and stabilises after about 35 items. Apart from test length, the precision of the D-score also depends on item information (cf. section 5.8). Administering items that are too easy or too difficult for the child does not improve precision. The figure suggests that, in practice, the *sem* of a single D-score does not drop below about 0.5 D-score units.
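The diminishing returns of longer tests, and the role of item targeting, can be sketched with a textbook approximation. Under a Rasch model an item with pass probability \(p\) contributes information \(p(1-p)\), and the posterior standard deviation is roughly the inverse square root of the prior precision plus the summed item information. The function `approx_sem`, the unit prior, and the logit scale are assumptions made for illustration. The numbers are not in D-score units and will not reproduce Figure 6.3.

```python
import math

def approx_sem(n_items, pass_probability, prior_sd=1.0):
    """Approximate posterior sd for n_items items that share one pass probability.

    Illustrative assumptions: Rasch item information p * (1 - p),
    normal prior with standard deviation prior_sd, logit scale.
    """
    information = n_items * pass_probability * (1.0 - pass_probability)
    return 1.0 / math.sqrt(1.0 / prior_sd ** 2 + information)

# Well-targeted items (p = 0.5) versus very easy items (p = 0.95).
for n_items in (5, 10, 20, 35, 50):
    targeted = approx_sem(n_items, 0.5)
    too_easy = approx_sem(n_items, 0.95)
    print(n_items, round(targeted, 3), round(too_easy, 3))
```

In this approximation the gain from each additional item shrinks as the test grows, and very easy items reduce the *sem* far more slowly than well-targeted ones.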

One may wonder whether the *sem* depends on age. Figure 6.4 suggests that this is not the case. The average DAZ is close to zero everywhere, as expected. The interval DAZ \(\pm\) *sem* will cover the true, but unknown, DAZ in about 68% of the cases. While the interval varies somewhat across ages, there is no systematic age trend.
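The 68% figure corresponds to an interval of one *sem* on either side of the estimate when the posterior is approximately normal. Normality is an assumption made here for illustration. The check below uses only the Python standard library, and the helper name `normal_coverage` is hypothetical.

```python
import math

def normal_coverage(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2.0))

print(round(normal_coverage(1.0), 3))   # interval of one sem on either side
print(round(normal_coverage(1.96), 3))  # conventional 95% interval, for comparison
```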

Does precision vary across studies? The answer is yes. Figure 6.5 plots the same information as before, but now broken down by cohort. Individual data points are added to give a feel for the design. The Colombian cohort `GCDG-COL-LT45M` administered the Bayley-III, where each child answered on average 45 items, so the *sem* is small. In contrast, the Dutch cohort `GCDG-NLD-SMOCC` collected data on a screener consisting of about ten relatively easy milestones, so the *sem* is relatively large. As a result, the Colombian D-scores are much more precise than the Dutch ones.

| Cohort | Test length | Pass probability |
|:-----------------|------------:|-----------------:|
| GCDG-ETH | 39 | 0.66 |
| GCDG-CHL-1 | 32 | 0.67 |
| GCDG-COL-LT45M | 45 | 0.64 |
| GCDG-COL-LT42M | 61 | 0.62 |
| GCDG-JAM-LBW | 43 | 0.55 |
| GCDG-CHN | 27 | 0.50 |
| GCDG-JAM-STUNTED | 38 | 0.65 |
| GCDG-CHL-2 | 33 | 0.48 |
| GCDG-BGD-7MO | 14 | 0.38 |
| GCDG-MDG | 8 | 0.35 |
| GCDG-BRA-1 | 18 | 0.89 |
| GCDG-NLD-SMOCC | 10 | 0.80 |
| GCDG-NLD-2 | 11 | 1.00 |
| GCDG-ECU | 3 | 0.67 |
| GCDG-ZAF | 12 | 1.00 |

The ordering of studies depends on test length and item information. Table 6.1 shows, for each cohort, the median number of items per child (test length) and the median probability of passing an item. The Ethiopian cohort `GCDG-ETH` administered 39 milestones with a median pass probability of 0.66. In contrast, the South African cohort `GCDG-ZAF` administered 12 items, all of which were very easy for the sample at hand (median pass probability of 1.0). Test length and item information thus go a long way towards explaining the extremes.
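To make the interplay between test length and item information concrete, the sketch below combines the two columns of Table 6.1 into a crude index: test length times \(p(1-p)\), the information of a single Rasch item with pass probability \(p\). This index is an assumption introduced here for illustration, not a quantity computed in the text. It treats the median pass probability as if it applied to every item, so it is not expected to reproduce the ordering in Figure 6.5.

```python
# Values copied from Table 6.1: cohort -> (test length, median pass probability).
table_6_1 = {
    "GCDG-ETH": (39, 0.66),
    "GCDG-CHL-1": (32, 0.67),
    "GCDG-COL-LT45M": (45, 0.64),
    "GCDG-COL-LT42M": (61, 0.62),
    "GCDG-JAM-LBW": (43, 0.55),
    "GCDG-CHN": (27, 0.50),
    "GCDG-JAM-STUNTED": (38, 0.65),
    "GCDG-CHL-2": (33, 0.48),
    "GCDG-BGD-7MO": (14, 0.38),
    "GCDG-MDG": (8, 0.35),
    "GCDG-BRA-1": (18, 0.89),
    "GCDG-NLD-SMOCC": (10, 0.80),
    "GCDG-NLD-2": (11, 1.00),
    "GCDG-ECU": (3, 0.67),
    "GCDG-ZAF": (12, 1.00),
}

def crude_information(test_length, pass_probability):
    """Hypothetical index: number of items times the Rasch item information p * (1 - p)."""
    return test_length * pass_probability * (1.0 - pass_probability)

index = {cohort: crude_information(n, p) for cohort, (n, p) in table_6_1.items()}

# List the cohorts from most to least informative according to this index.
ranking = sorted(index.items(), key=lambda pair: pair[1], reverse=True)
for cohort, value in ranking:
    print(f"{cohort:18s} {value:6.2f}")
```

On this index, long tests with pass probabilities near 0.5 come out on top, while cohorts with a reported pass probability of 1.00 contribute essentially no information regardless of test length.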

In general, the design of the study has a substantial impact on the precision of the measurement. Our ongoing work addresses the question of how one may construct a measurement instrument that is optimally precise given the goals of the research.