6.3 Differential item functioning (DIF)

6.3.1 Relevance of DIF for cross-cultural equivalence

An essential assumption in the Rasch model is that a given item has the same difficulty in different subgroups of respondents. Climbing stairs is an example where this assumption is suspect. The exposure to stairs, and hence the opportunity for a child to practice, varies across cultures. It could thus be that two children with the same ability but from different cultures have different success probabilities for climbing stairs. When these probabilities vary systematically between subgroups, we say there is Differential Item Functioning, or DIF (Holland and Wainer 1993). DIF is undesirable since it can make the instrument culturally biased.
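In formula form, the assumption can be sketched as follows. The notation is an assumption made for this illustration: θ_n is the ability of child n, δ_i is the difficulty of item i, and g indexes the subgroup.

```latex
% Rasch model: the success probability depends only on ability minus difficulty
P(X_{ni} = 1 \mid \theta_n, \delta_i)
  = \frac{\exp(\theta_n - \delta_i)}{1 + \exp(\theta_n - \delta_i)}

% No DIF: item i has one difficulty that holds in every subgroup g
\delta_{ig} = \delta_i \quad \text{for all } g

% DIF: the difficulty differs for at least two subgroups g and g'
\delta_{ig} \neq \delta_{ig'}
```

Under DIF, two children with the same θ_n but from different subgroups have different success probabilities on item i, which is the stair-climbing situation described above.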

6.3.2 How to detect DIF?

Zumbo (1999) provided a clear definition of DIF:

DIF occurs when examinees from different groups show differing probabilities of success on (or endorsing) the item after matching on the underlying ability that the item is intended to measure.

There are various ways to detect DIF. Here we model the probability of endorsing an item by logistic regression, using the observed item responses as the outcome. Predictors include the ability, the grouping variable, and the ability-grouping interaction. If the latter two terms explain a statistically significant part of the variation in the item responses that remains after adjusting for ability, the item shows DIF with respect to that grouping variable. DIF can also be inspected visually by plotting the item response curves for the subgroups separately.
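The logistic regression approach can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used for the DDI: the data are simulated, the function names (`solve`, `fit_logistic`, `dif_tests`, `simulate`) are hypothetical, the true ability is treated as known, and the two DIF terms are tested one at a time with likelihood ratio tests on nested models.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_logistic(rows, y, max_iter=50, tol=1e-10):
    """Fit a logistic regression by Newton-Raphson; return (coefficients, log-likelihood)."""
    p = len(rows[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for x, yi in zip(rows, y):
            eta = sum(b * v for b, v in zip(beta, x))
            mu = 1.0 / (1.0 + math.exp(-eta))
            for j in range(p):
                grad[j] += (yi - mu) * x[j]
                for k in range(p):
                    hess[j][k] += mu * (1.0 - mu) * x[j] * x[k]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    loglik = 0.0
    for x, yi in zip(rows, y):
        eta = sum(b * v for b, v in zip(beta, x))
        loglik += yi * eta - math.log1p(math.exp(eta))
    return beta, loglik


def dif_tests(ability, group, y):
    """Likelihood ratio tests for uniform and non-uniform DIF (1 df each)."""
    m0 = [[1.0, a] for a in ability]                          # ability only
    m1 = [[1.0, a, g] for a, g in zip(ability, group)]        # + group
    m2 = [[1.0, a, g, a * g] for a, g in zip(ability, group)]  # + interaction
    _, ll0 = fit_logistic(m0, y)
    coef1, ll1 = fit_logistic(m1, y)
    _, ll2 = fit_logistic(m2, y)
    lr_uniform = max(0.0, 2.0 * (ll1 - ll0))
    lr_nonuniform = max(0.0, 2.0 * (ll2 - ll1))
    # Upper tail of the chi-square distribution with 1 df: erfc(sqrt(x / 2)).
    return {
        "group_effect": coef1[2],
        "p_uniform": math.erfc(math.sqrt(lr_uniform / 2.0)),
        "p_nonuniform": math.erfc(math.sqrt(lr_nonuniform / 2.0)),
    }


def simulate(n=2000, shift=-0.8, seed=1):
    """Simulate one item with uniform DIF: group 1 is shifted by `shift` logits."""
    rng = random.Random(seed)
    ability, group, y = [], [], []
    for _ in range(n):
        a = rng.gauss(0.0, 1.0)
        g = 1.0 if rng.random() < 0.5 else 0.0
        prob = 1.0 / (1.0 + math.exp(-(a + shift * g)))
        ability.append(a)
        group.append(g)
        y.append(1.0 if rng.random() < prob else 0.0)
    return ability, group, y


ability, group, y = simulate()
print(dif_tests(ability, group, y))
```

In the simulated data the item is harder for group 1 at every ability level, so the test for the group term should flag DIF, while the interaction term has no true effect. In real applications the ability would be an estimate based on all items, and a standard statistical package would normally replace the hand-written fitting routine.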

There are two forms of DIF:

  • Uniform DIF: The item response curves differ between groups in location, but are parallel;
  • Non-uniform DIF: The item response curves differ between groups in location, in slope and possibly in other characteristics.

These forms correspond to, respectively, the main effect of group and the ability-group interaction in the logistic regression model.
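Written out, the correspondence looks as follows. The symbols are assumptions chosen for this sketch: θ is the ability, g is a 0/1 group indicator, and b_0, ..., b_3 are the regression coefficients.

```latex
% Logistic regression model for the response X to a single item
\operatorname{logit} P(X = 1 \mid \theta, g)
  = b_0 + b_1 \theta + b_2 g + b_3 \theta g

% No DIF:          b_2 = 0 and b_3 = 0  (one curve for both groups)
% Uniform DIF:     b_2 \neq 0 and b_3 = 0  (parallel curves, shifted in location)
% Non-uniform DIF: b_3 \neq 0  (slopes b_1 and b_1 + b_3 differ between groups)
```

With b_3 = 0, both groups share the slope b_1, so the curves are parallel on the logit scale and differ only by a horizontal shift. With b_3 ≠ 0, the slopes differ and the curves are no longer parallel.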

6.3.3 Examples of DIF

Figure 6.6: Two milestones from the DDI with similar item response curves for boys and girls. There is no DIF for sex.

Figure 6.6 shows an example comparing boys and girls. For both milestones, the item response curves are similar for boys and girls, so we see no evidence of DIF here.

Figure 6.7: Two milestones from the DDI with different item response curves for boys and girls. There is evidence for uniform DIF.

Figure 6.7 displays two milestones with DIF between boys and girls. Provided that the ability estimate (as estimated from all items in the model) is fair for both boys and girls, we see that milestone ddifmm019 (“Takes off shoes and socks”) is easier for girls by about 0.86 logits. This value is the horizontal distance between the two curves at a success probability of 50 per cent. Conversely, milestone ddigmm064 (“Crawls forward, abdomen on the floor”) is easier for boys by about 0.84 logits. These are the most substantial differences found for sex in the DDI. Both are cases of uniform DIF.
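The size of a uniform DIF effect can be read from a fitted logistic regression as the horizontal distance between the group curves at a success probability of 50 per cent. The sketch below shows the arithmetic. The function name `location_at_half` and the coefficient values are hypothetical and chosen only for illustration; they are not estimates from the DDI.

```python
def location_at_half(b0, b1, b2, group):
    """Ability at which the success probability equals 0.5 for a group (0 or 1).

    Assumes the uniform DIF model logit(p) = b0 + b1 * ability + b2 * group.
    """
    return -(b0 + b2 * group) / b1


# Hypothetical coefficients: intercept, ability slope, group effect.
b0, b1, b2 = -3.0, 1.5, 0.9

# Positive shift: group 1 reaches 50 per cent at a lower ability,
# so the item is easier for group 1 by `shift` units of the ability scale.
shift = location_at_half(b0, b1, b2, 0) - location_at_half(b0, b1, b2, 1)
print(round(shift, 3))
```

For this model the shift simplifies to b2 / b1, so the group coefficient has to be divided by the ability slope before it can be interpreted on the ability scale.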

In practice, milestones with DIF in opposite directions within the same instrument tend to cancel one another out in the ability estimate, so one need not be overly concerned in that case. However, we should be careful when the tool consists of milestones that all have DIF in the same direction, since the bias then accumulates.

The DDI did not contain items for which the ability-group interaction was statistically significant, so we conclude that there is no non-uniform DIF in the DDI.