## 5.1 Identify nature of the problem

The SMOCC dataset, introduced in Section 4.1.2, contains DDI milestone scores of Dutch children aged 0-2 years, collected during nine visits.

Table 5.1: SMOCC DDI milestones, first three children, 0-2 years.
29 30 31 33 34 36 37 39 41 43 44 16 36 41 48 01 02 03 04 05 07 08 09 10 11 12 13 14 15 16 17 18 19 54 06 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 46 68
1 0 1 0 1 1 1
1 1
1 0 1 1 0 0 1 1 1 1 1
0 1 1 1 0 1 1 1 1 1 0 1 1
1 1 0 1 1 1 1 1
1 1 1 1 1 1 0 1
1 1 1 1 0 0
1 0 1 0 0
1 1 1 0 1
1 1 1 1 1 1 1
1 1
1 0 1 1 0 0 1 0 1 0 0
0 1 1 0 0 1 1 1 1 0 1 0 1
1 1 1 1 1 1 1 0 1
1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1
1 0 1 1
1 0 1 0 1 1 1
1 0
1 0 1 1 0 0 1 0 1 1 0
1 1 1 0 0 1 0 1 1 1 1 1 1
1 1 1 1 1 1 1 1 0 0
1 1 1 1 1 0
0 0 1 1 1 1
1 1 0 1 1 1 0 1 1
1 1 1 1 1 1 1 1 1 1 1 1

Table 5.1 contains the data of three children, measured at nine visits between the ages of 0 and 2 years. DDI scores take the values 0 (FAIL) and 1 (PASS). To save horizontal space, we truncated the column headers to the last two digits of the item names. As a result, some headers (e.g., 16, 36 and 41) appear twice.
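A sparse table like this can be stored without inventing FAILs by using a long format, with one record per administered item. The following is a minimal Python sketch. The item names, children and scores are hypothetical and are not taken from the SMOCC data. `None` marks an item that was not administered at that visit.

```python
from typing import Optional

# Hypothetical wide layout: one row per (child, visit), one column per item.
# 1 = PASS, 0 = FAIL, None = item not administered at that visit.
ITEMS = ["item_a", "item_b", "item_c", "item_d"]
WIDE: dict[tuple[str, int], list[Optional[int]]] = {
    ("child1", 1): [1, 0, None, None],
    ("child1", 2): [None, 1, 1, None],
    ("child2", 1): [1, None, None, 0],
}


def to_long(wide: dict[tuple[str, int], list[Optional[int]]],
            items: list[str]) -> list[tuple[str, int, str, int]]:
    """Return one (child, visit, item, score) record per observed cell."""
    records = []
    for (child, visit), scores in wide.items():
        for item, score in zip(items, scores):
            if score is not None:  # empty cell: no record, and certainly not a FAIL
                records.append((child, visit, item, score))
    return records


for record in to_long(WIDE, ITEMS):
    print(record)
```

The long form keeps every observed FAIL (0) and drops only the cells that were never administered, so no zeroes are imputed.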

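Because the selection of milestones depends on age, the dataset contains a large number of empty cells: an item that was not administered at a visit has no score. Naive use of sum scores as a proxy for ability is therefore problematic, because children measured at different ages answer different sets of items. An empty cell is not a FAIL, so it is incorrect to impute those cells with zeroes.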
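The effect of zero imputation can be illustrated with a minimal Python sketch. It uses a made-up row, not SMOCC data. The sketch contrasts the proportion of items passed when empty cells are imputed as zeroes with the proportion among the items actually administered.

```python
from typing import Optional

Row = list[Optional[int]]  # 1 = PASS, 0 = FAIL, None = not administered


def proportion_zero_imputed(row: Row) -> float:
    """Incorrect: every empty cell is counted as a FAIL (0)."""
    return sum(0 if x is None else x for x in row) / len(row)


def proportion_observed(row: Row) -> float:
    """Proportion PASS among the items that were actually administered."""
    observed = [x for x in row if x is not None]
    return sum(observed) / len(observed)


# Hypothetical visit: 3 of 6 items administered, 2 of those passed.
row = [1, 1, 0, None, None, None]
print(round(proportion_zero_imputed(row), 3))  # 0.333
print(round(proportion_observed(row), 3))      # 0.667
```

Even the second proportion is not comparable across visits. Different visits administer different items, and those items differ in difficulty.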

Note that some rows contain only 1's, for example, row 2. Many computer programs for Rasch analysis routinely remove such perfect scores before fitting. Unless the number of perfect scores is very small, however, this is not recommended, because removing them can severely distort the estimated ability distribution.
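One reason programs drop perfect scores is that the maximum likelihood ability estimate for an all-PASS pattern does not exist under the Rasch model with known item difficulties. The log-likelihood keeps increasing as ability grows. The minimal Python sketch below shows this, assuming hypothetical item difficulties in logits.

```python
import math


def rasch_loglik(theta: float, responses: list[int], deltas: list[float]) -> float:
    """Rasch log-likelihood of 0/1 responses, given ability theta and item difficulties."""
    ll = 0.0
    for x, d in zip(responses, deltas):
        eta = theta - d
        # numerically stable log(1 + exp(eta))
        softplus = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
        # log P(x=1) = eta - softplus, log P(x=0) = -softplus
        ll += x * eta - softplus
    return ll


deltas = [-1.0, 0.0, 1.5]  # hypothetical item difficulties (logits)
perfect = [1, 1, 1]        # all items passed
mixed = [1, 0, 1]          # one FAIL among the administered items

for theta in [0.0, 2.0, 5.0, 10.0]:
    print(theta,
          round(rasch_loglik(theta, perfect, deltas), 4),
          round(rasch_loglik(theta, mixed, deltas), 4))
```

For the mixed pattern the log-likelihood has a finite maximum. For the perfect pattern it rises monotonically towards 0, so no finite maximum likelihood estimate exists.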

To handle the missing data effectively and to keep all persons in the analysis, we separate the estimation of item difficulties (cf. Section 5.2) from the estimation of person abilities (cf. Section 5.3).
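With item difficulties fixed in a first step, each person's ability can be estimated from only the items that person was given. One common choice is the expected a posteriori (EAP) estimate under a normal prior, which stays finite even for perfect scores. The Python sketch below illustrates that idea only. It is not necessarily the procedure of Section 5.3. The prior mean and SD, the grid settings and the difficulties are all assumptions.

```python
import math
from typing import Optional


def eap_ability(responses: list[Optional[int]], deltas: list[float],
                prior_mean: float = 0.0, prior_sd: float = 1.0,
                n_points: int = 201, width: float = 6.0) -> float:
    """EAP ability under the Rasch model, with item difficulties treated as known.

    Items with response None (not administered) are skipped, so they add
    nothing to the likelihood instead of counting as FAILs.
    """
    lo = prior_mean - width * prior_sd
    step = 2 * width * prior_sd / (n_points - 1)
    grid = [lo + k * step for k in range(n_points)]

    def log_posterior(theta: float) -> float:
        # normal log prior, up to an additive constant
        lp = -0.5 * ((theta - prior_mean) / prior_sd) ** 2
        for x, d in zip(responses, deltas):
            if x is None:
                continue
            eta = theta - d
            lp += x * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
        return lp

    logs = [log_posterior(t) for t in grid]
    top = max(logs)  # subtract the maximum to avoid underflow
    weights = [math.exp(v - top) for v in logs]
    return sum(t * w for t, w in zip(grid, weights)) / sum(weights)


deltas = [-1.0, 0.0, 1.5, 2.0]   # hypothetical difficulties from item calibration
perfect = [1, 1, None, None]     # passed every administered item
print(round(eap_ability(perfect, deltas), 3))
```

Because the prior pulls the estimate towards its mean, a child with a perfect score receives a finite ability. That child can therefore stay in the analysis.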