## 3.1 Are instruments connected?

The ultimate goal is to compare child development across populations and cultures. A complication is that different cohorts measure development with different instruments. To deal with this issue, we harmonize the data included in the GCDG cohorts. In particular, we process the milestone responses so that the following requirements hold:

• Every milestone in an instrument has a unique name and a descriptive label;
• Every milestone occupies one column in the dataset;
• Item scores are (re)coded as: 1 = PASS; 0 = FAIL;
• Every row in the dataset corresponds to a unique cohort-child-age combination.
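The requirements above can be sketched in Python. This is a minimal illustration, not the GCDG processing pipeline. It assumes that responses arrive in long format as `(cohort, child, age, item, response)` records, with raw answers given as strings such as "pass" or "fail". The record layout, the recoding table `RECODE`, the function `harmonize` and the milestone names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical recoding table: raw answer -> harmonized score (1 = PASS, 0 = FAIL).
RECODE = {"pass": 1, "yes": 1, "fail": 0, "no": 0}

def harmonize(records):
    """Turn long records into wide rows, one per cohort-child-age combination.

    Each record is (cohort, child, age, item, response), where `item` is the
    unique milestone name. Every milestone becomes one column (dict key).
    """
    rows = defaultdict(dict)
    for cohort, child, age, item, response in records:
        key = (cohort, child, age)          # one row per cohort-child-age
        score = RECODE[response.lower()]    # recode to 1 = PASS, 0 = FAIL
        if item in rows[key] and rows[key][item] != score:
            raise ValueError(f"conflicting scores for {item} in row {key}")
        rows[key][item] = score             # one column per milestone
    return dict(rows)

# Hypothetical records for a single child measured at two ages.
records = [
    ("Brazil 1", 17, 12, "den_walk_alone", "pass"),
    ("Brazil 1", 17, 12, "den_say_word", "FAIL"),
    ("Brazil 1", 17, 24, "den_walk_alone", "Pass"),
]
wide = harmonize(records)
for key, scores in sorted(wide.items()):
    print(key, scores)
```

The sketch raises an error on conflicting duplicates rather than silently keeping the last one. The point is that a cohort-child-age row must hold one score per milestone.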

Cohorts and milestones need to be connected. Cohorts can be connected in two ways:

• Two cohorts are directly connected if they use the same instrument;
• Two cohorts are indirectly connected if both connect, directly or indirectly, to a third cohort.
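These two rules define connected groups of cohorts, which can be computed with a union-find structure. The sketch below is an illustration under stated assumptions. It assumes the linkage information is available as a mapping from each cohort to the set of instruments it uses. The function name `cohort_components` and the cohorts `A` to `D` with instruments `i1` to `i3` are hypothetical.

```python
def cohort_components(uses):
    """Group cohorts into connected components.

    `uses` maps each cohort to the set of instruments it administers.
    Cohorts sharing an instrument are directly connected; chains of such
    links make them indirectly connected. Union-find merges both cases.
    """
    parent = {c: c for c in uses}

    def find(c):
        # Follow parent pointers to the root, halving the path as we go.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    first_user = {}  # instrument -> first cohort seen using it
    for cohort, instruments in uses.items():
        for instrument in instruments:
            if instrument in first_user:
                # Shared instrument => direct connection: merge the two groups.
                parent[find(cohort)] = find(first_user[instrument])
            else:
                first_user[instrument] = cohort

    groups = {}
    for cohort in uses:
        groups.setdefault(find(cohort), set()).add(cohort)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical design: A and C share no instrument but both connect to B.
uses = {"A": {"i1"}, "B": {"i1", "i2"}, "C": {"i2"}, "D": {"i3"}}
print(cohort_components(uses))
```

In this example A and C end up in the same group because both connect to B, so they are indirectly connected. D forms a group of its own.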

Likewise, instruments can be connected:

• Two instruments are directly connected if at least one cohort administers both;
• Two instruments are indirectly connected if both connect, directly or indirectly, to a third instrument.
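Connections between instruments mirror connections between cohorts. A convenient way to see this is a bipartite graph with an edge between a cohort and every instrument it uses. The sketch below assumes the same hypothetical cohort-to-instruments mapping as before. It runs a breadth-first search from each unvisited instrument. The function name `instrument_components` and the example data are illustrative only.

```python
from collections import deque

def instrument_components(uses):
    """Group instruments into connected components.

    Builds a bipartite graph with an edge cohort-instrument whenever the
    cohort administers the instrument. Two instruments measured by the same
    cohort are two steps apart (direct connection); longer paths through
    other cohorts and instruments give indirect connections.
    """
    adj = {}
    for cohort, instruments in uses.items():
        for instrument in instruments:
            adj.setdefault(("cohort", cohort), set()).add(("instr", instrument))
            adj.setdefault(("instr", instrument), set()).add(("cohort", cohort))

    seen, components = set(), []
    for node in adj:
        if node in seen or node[0] != "instr":
            continue
        # Breadth-first search from an unvisited instrument.
        queue, members = deque([node]), set()
        seen.add(node)
        while queue:
            current = queue.popleft()
            if current[0] == "instr":
                members.add(current[1])
            for neighbour in adj[current]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        components.append(sorted(members))
    return sorted(components)

# Hypothetical design: i1 and i3 are linked only through i2.
uses = {"A": {"i1"}, "B": {"i1", "i2"}, "C": {"i2", "i3"}, "D": {"i4"}}
print(instrument_components(uses))
```

Because cohorts and instruments are nodes in the same graph, each component contains both the connected cohorts and the connected instruments. The sketch reports only the instruments.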

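Table 3.1: Linkage table of cohorts (rows) by instruments (columns). An x marks that a cohort uses an instrument. Instruments: aqi, bar, bat, by1, by2, by3, ddi, den, gri, mac, peg, sbi, sgr, tep, vin. Cohorts: Bangladesh, Brazil 1, Brazil 2, Chile 1, Chile 2, China, Colombia 1, Colombia 2, Ecuador, Ethiopia, Jamaica 1, Jamaica 2, Madagascar, Netherlands1, Netherlands2, South Africa. Colombia 2 and South Africa each use four instruments, Madagascar three, Chile 2 two, and every other cohort one.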

An x in Table 3.1 marks which cohorts use which instruments. The linkage table shows that the studies from China, Colombia, and Ethiopia are directly connected through by3. Brazil 1 connects indirectly to these studies through den. Some cohorts (e.g., Chile 1 and Ecuador) do not link to any other study. Likewise, the instruments aqi, bat, by3, and den are directly connected. No other instrument connects to this group, either directly or indirectly.
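Reading such statements off a linkage table amounts to classifying each pair of cohorts. The sketch below is hypothetical and does not reproduce Table 3.1. It assumes a cohort-to-instruments mapping and uses made-up cohorts `P`, `Q`, `R` and `S` that mimic the pattern described above: two cohorts sharing by3, a third reached through den, and one isolated cohort using an instrument labelled `other`. The function `connection` returns "direct", "indirect" or "none".

```python
from collections import deque

def connection(uses, a, b):
    """Classify the link between cohorts a and b as "direct", "indirect" or "none".

    `uses` maps each cohort to the set of instruments it administers.
    """
    if uses[a] & uses[b]:
        return "direct"  # at least one shared instrument
    # Breadth-first search over cohorts, stepping along shared instruments.
    seen, queue = {a}, deque([a])
    while queue:
        current = queue.popleft()
        for other in uses:
            if other not in seen and uses[current] & uses[other]:
                if other == b:
                    return "indirect"  # reached b via an intermediate cohort
                seen.add(other)
                queue.append(other)
    return "none"

# Hypothetical cohorts mimicking the pattern described in the text.
uses = {"P": {"by3"}, "Q": {"by3", "den"}, "R": {"den"}, "S": {"other"}}
print(connection(uses, "P", "Q"))
print(connection(uses, "P", "R"))
isolated = [c for c in uses
            if all(connection(uses, c, o) == "none" for o in uses if o != c)]
print(isolated)
```

Here P and Q are directly connected, P and R are indirectly connected through Q, and S links to no other cohort.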

Table 3.1 simplifies the linkage pattern. As we saw in section 2.2, there are substantial age differences between the cohorts. A more detailed instrument linkage table shows the number of registered scores per age group. What appears in Table 3.1 as one instrument may consist of two disjoint subsets of milestones, for example when two cohorts administer it at different ages, so some cohorts may not be connected after all.
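One rough way to account for age is to link two cohorts only if both have scores on the same instrument within the same age group. This is a simplification: the real issue is overlap in milestones, and a shared age group is only a proxy for it. The sketch assumes score counts keyed by `(cohort, instrument, age_group)`. The function `age_aware_links`, the threshold `min_count`, the instruments `i1` and `i2`, the age groups and the counts are all hypothetical.

```python
from itertools import combinations

def age_aware_links(counts, min_count=1):
    """Find cohort pairs that share an instrument within the same age group.

    `counts` maps (cohort, instrument, age_group) to the number of registered
    scores. Two cohorts count as linked only if both have at least
    `min_count` scores for the same instrument in the same age group.
    """
    by_cell = {}
    for (cohort, instrument, age_group), n in counts.items():
        if n >= min_count:
            by_cell.setdefault((instrument, age_group), set()).add(cohort)
    links = set()
    for cohorts in by_cell.values():
        for a, b in combinations(sorted(cohorts), 2):
            links.add((a, b))
    return sorted(links)

# Hypothetical counts: P and Q both use i1, but at non-overlapping ages.
counts = {
    ("P", "i1", "0-1y"): 120,
    ("Q", "i1", "3-4y"): 95,
    ("Q", "i2", "3-4y"): 80,
    ("R", "i2", "3-4y"): 60,
}
print(age_aware_links(counts))
```

At the level of Table 3.1, P and Q would look connected through i1. Once age groups are taken into account, only Q and R remain linked.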

Connectedness is a necessary, though not sufficient, requirement for parameter identification. If two cohorts are not connected, we cannot distinguish between the following two explanations:

• Any differences between the studies are due to differences in the abilities of the children;
• Any differences between the studies are due to differences in the difficulties of the instruments.
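A small numerical illustration shows why the two explanations cannot be told apart. It assumes a Rasch model, which this section does not specify. Under that model, a child with ability b passes an item with difficulty d with probability exp(b − d) / (1 + exp(b − d)). Suppose cohort A uses only instrument X. Adding the same constant to all abilities in A and all difficulties in X then leaves every probability, and hence the likelihood, unchanged. All children, items, parameter values and responses below are hypothetical.

```python
import math

def p_pass(ability, difficulty):
    """Rasch probability of passing: exp(b - d) / (1 + exp(b - d))."""
    return 1.0 / (1.0 + math.exp(difficulty - ability))

def log_likelihood(abilities, difficulties, responses):
    """Sum of log-probabilities for (child, item, score) triples, score in {0, 1}."""
    total = 0.0
    for child, item, score in responses:
        p = p_pass(abilities[child], difficulties[item])
        total += math.log(p if score == 1 else 1.0 - p)
    return total

# Hypothetical unconnected design: cohort A (children a1, a2) takes only
# items x1, x2 of instrument X; cohort B takes only items y1, y2 of instrument Y.
abilities = {"a1": 0.5, "a2": -0.2, "b1": 1.0, "b2": 0.3}
difficulties = {"x1": 0.0, "x2": 0.8, "y1": -0.4, "y2": 0.6}
responses = [("a1", "x1", 1), ("a1", "x2", 0), ("a2", "x1", 1), ("a2", "x2", 0),
             ("b1", "y1", 1), ("b1", "y2", 1), ("b2", "y1", 1), ("b2", "y2", 0)]

# Shift cohort A's abilities and instrument X's difficulties by the same amount:
# cohort A now looks more able, and instrument X looks harder.
shift = 2.0
abilities_alt = {c: b + shift if c.startswith("a") else b for c, b in abilities.items()}
difficulties_alt = {i: d + shift if i.startswith("x") else d for i, d in difficulties.items()}

ll_1 = log_likelihood(abilities, difficulties, responses)
ll_2 = log_likelihood(abilities_alt, difficulties_alt, responses)
print(math.isclose(ll_1, ll_2))
```

Now suppose a child from cohort B also answered an item of X. The shift would change that child's pass probability, so the two parameter sets would no longer fit equally well. A shared instrument is what makes the comparison informative.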

The data do not contain the necessary information to discriminate between these two explanations. Since many cohorts in Table 3.1 are unconnected, it seems that we are stuck.

The next section suggests a way out of the dilemma.