9.1 D-score from multiple instruments

We developed the initial D-score methodology for just one instrument. In practice, however, we need to deal with data collected with multiple, partially overlapping tools. This chapter addresses the problem of how to define and calculate the D-score from data coming from various sources, using multiple instruments administered at varying ages.

We had longitudinal data available from 16 cohorts, collected with 15 tools to measure child development at various ages. Our analytic strategy to define a D-score from these data consists of the following steps:

  1. Make an inventory of instruments and cohorts;
  2. Combine all measurements into one dataset;
  3. Find out which shared instruments connect cohorts;
  4. Place similar items from different instruments into equate groups;
  5. Find the best set of active equate groups;
  6. Estimate item difficulties using a restricted Rasch model that constrains the difficulty estimates of all items within an active equate group to be identical;
  7. Weed out items that do not fit the model.
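Step 3 amounts to checking which cohorts are linked, directly or indirectly, through shared instruments. The sketch below illustrates the idea with a hypothetical cohort-to-instrument mapping (the cohort and instrument names are made up for illustration); it groups cohorts into connected components using a small union-find.

```python
# Sketch, not the authors' code: group cohorts that are connected
# through shared instruments. The mapping below is hypothetical.
administered = {
    "cohort_A": {"ddi", "bayley2"},
    "cohort_B": {"bayley2", "griffiths"},
    "cohort_C": {"griffiths"},
    "cohort_D": {"denver"},
}

def connected_components(administered):
    """Group cohorts linked by at least one shared instrument."""
    parent = {c: c for c in administered}

    def find(x):
        # Follow parent pointers to the root, with path compression.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    # Link every pair of cohorts that shares an instrument.
    cohorts = list(administered)
    for i, a in enumerate(cohorts):
        for b in cohorts[i + 1:]:
            if administered[a] & administered[b]:
                union(a, b)

    groups = {}
    for c in cohorts:
        groups.setdefault(find(c), set()).add(c)
    return list(groups.values())

print(connected_components(administered))
```

In this toy example, cohorts A, B and C end up in one component (A and B share `bayley2`, B and C share `griffiths`), while cohort D is isolated; an isolated cohort cannot be placed on the common scale through shared instruments alone, which is where equate groups (step 4) come in.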

Steps 5, 6 and 7 need to be performed iteratively. Depending on the results, we may also need to redefine, combine, or break up equate groups (step 4).
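The key constraint in step 6 is that items in the same active equate group receive one shared difficulty parameter. The following sketch, which is not the authors' implementation, shows this constraint in a minimal Rasch fit: each item is mapped to a parameter index, tied items point to the same index, and the shared difficulties are estimated by maximum likelihood. For brevity, person abilities are treated as known, which is a simplification; in practice abilities and difficulties are handled jointly or via conditional methods.

```python
# Sketch under stated assumptions (known abilities, simulated data):
# a restricted Rasch model where items in an active equate group
# share a single difficulty parameter.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Hypothetical setup: 4 items; items 1 and 2 form one equate group,
# so param_index maps both to the same difficulty parameter.
param_index = np.array([0, 1, 1, 2])
true_delta = np.array([-1.0, 0.5, 2.0])   # one value per free parameter
n_persons = 500
theta = rng.normal(0.0, 1.5, n_persons)   # abilities, assumed known here

# Simulate pass/fail responses from the Rasch model.
logits = theta[:, None] - true_delta[param_index][None, :]
y = rng.random(logits.shape) < 1.0 / (1.0 + np.exp(-logits))

def neg_loglik(delta):
    # Expanding delta via param_index enforces the equate-group tie.
    lg = theta[:, None] - delta[param_index][None, :]
    p = 1.0 / (1.0 + np.exp(-lg))
    return -np.sum(y * np.log(p) + (~y) * np.log1p(-p))

fit = minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
print(np.round(fit.x, 2))  # recovers true_delta; items 1 and 2 tied by design
```

Step 7 (weeding out misfitting items) would then inspect item-fit statistics from such a fit and drop items whose responses deviate systematically from the model, after which the estimation is repeated.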

These techniques are well known within psychometrics and educational research. Our approach builds upon a well-grounded and robust theory of psychological measurement. We therefore expect that repeating our method on other data will lead to very similar results.

A novel aspect of our methodology is the systematic formation of candidate equate groups by subject-matter experts, based on similarity in concept and content. Our subsequent testing and tailoring of each equate group against the data provide empirical evidence of its quality for connecting instruments. While anchoring tests is not in itself novel, we are not aware of any prior work aimed at identifying the best set of active equate groups at this scale.