This vignette provides an overview of the default prior settings and demonstrates how to customize the prior mean and standard deviation for D-score calculations. This is an advanced topic that requires a basic understanding of the D-score calculation process. If you are unfamiliar with the D-score methodology, we recommend reviewing the introductory vignettes before proceeding.

The default prior mean and standard deviation for the `dscore()` function are determined by the `key` argument. The function looks up the corresponding `base_population` field in the `builtin_keys` data frame, which contains several columns including the following:

```
##         key       base_population
## 1     dutch                 dutch
## 2      gcdg                  gcdg
## 3  gsed1912                  gcdg
## 8     293_0                phase1
## 10 gsed2212                phase1
## 11 gsed2406 preliminary_standards
```
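A single key can be looked up in the same way. The sketch below assumes the dscore package is installed and that `builtin_keys` is available once the package is attached. It uses only the two columns shown above, and the variable names `key` and `base_pop` are illustrative.

```r
library(dscore)

# Look up the base population that belongs to one key
key <- "gsed2406"
base_pop <- builtin_keys$base_population[builtin_keys$key == key]
base_pop

# Same lookup, keeping the data frame layout of the listing above
builtin_keys[builtin_keys$key == key, c("key", "base_population")]
```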

For instance, for `key = "gsed2406"`, the `base_population` is identified as `"preliminary_standards"`. The `get_mu()` function returns the prior mean for the specified `key` at different ages:

```
## [1] 9.731266 14.922704 19.282654 23.155214 26.707086 30.031809 33.187157
## [8] 36.211330 39.130903 41.965115 45.158164 47.235287 49.186179
```
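Output of this form can be obtained with a call like the one below. This is a sketch that assumes ages are passed in decimal years, as in the `a` column of the `dscore()` output later in this vignette, so that 0 to 12 months becomes `0:12 / 12`. The names `ages` and `mu` are illustrative.

```r
library(dscore)

# Ages 0, 1, ..., 12 months expressed in decimal years
ages <- 0:12 / 12

# Prior mean at each age for the gsed2406 key
mu <- get_mu(t = ages, key = "gsed2406")
mu
```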

The values shown are the prior means for ages 0 to 12 months. Each value is the median of the D-score distribution of the `base_population` under the current `key`.

If the standard deviation of the prior is not specified, the `dscore()` function defaults to a value of 5.0 at all ages. By comparison, the age-specific standard deviation in the `base_population` is typically around 2.5 to 3.5. A standard deviation of 5.0 therefore amounts to a relatively broad prior distribution, regardless of age.

Note that altering the `key` parameter changes both the prior mean and standard deviation. Since these parameters affect the D-score, comparisons should generally be made only between D-scores calculated using the same key, prior mean, and standard deviation.

In certain situations, you may want to define your own prior mean and standard deviation for the D-score calculations. This can be done by setting the `prior_mean` and `prior_sd` arguments in the `dscore()` function. Below are a few examples that demonstrate how to customize these priors.

In this example, we add a value of 5 to the default prior mean for each child, which results in higher D-scores.

```
library(dscore)

# Calculate the custom prior mean by adding 5 to the default prior mean
data <- milestones
mymean <- get_mu(t = data$age, key = "gsed2406") + 5
# Calculate default D-scores
def <- dscore(data)
head(def)
```

```
##        a  n      p     d      sem    daz
## 1 0.4873 11 0.9091 30.76 3.751319 -0.633
## 2 0.6571 14 0.6429 29.06 2.518082 -2.716
## 3 1.1800 19 0.9474 53.35 3.414966 -0.006
## 4 1.9055 13 0.8462 63.88 2.971594 -0.094
## 5 0.5503 11 0.8182 28.75 3.476988 -1.863
## 6 0.7666 14 0.7857 34.21 3.088920 -2.377
```
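The adjusted D-scores `adj1`, whose first rows are printed next, come from the direct specification. A minimal sketch of that call, assuming `prior_mean` accepts a numeric vector with one value per row (the same form used together with `prior_sd` further on in this vignette):

```r
library(dscore)

data <- milestones
mymean <- get_mu(t = data$age, key = "gsed2406") + 5

# Custom prior, direct specification: one prior mean per row of data
adj1 <- dscore(data, prior_mean = mymean)
head(adj1)
```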

```
##        a  n      p     d      sem    daz
## 1 0.4873 11 0.9091 33.88 4.146874  0.310
## 2 0.6571 14 0.6429 30.38 2.629683 -2.423
## 3 1.1800 19 0.9474 55.93 3.774148  0.787
## 4 1.9055 13 0.8462 65.78 3.203216  0.438
## 5 0.5503 11 0.8182 31.43 3.840857 -1.155
## 6 0.7666 14 0.7857 36.30 3.395857 -1.867
```

```
# Custom prior, column specification
adj2 <- dscore(cbind(data, mymean), prior_mean = "mymean")
head(adj2)
```

```
##        a  n      p     d      sem    daz
## 1 0.4873 11 0.9091 33.88 4.146874  0.310
## 2 0.6571 14 0.6429 30.38 2.629683 -2.423
## 3 1.1800 19 0.9474 55.93 3.774148  0.787
## 4 1.9055 13 0.8462 65.78 3.203216  0.438
## 5 0.5503 11 0.8182 31.43 3.840857 -1.155
## 6 0.7666 14 0.7857 36.30 3.395857 -1.867
```

`identical(adj1, adj2)`

`## [1] TRUE`

In this code, the `prior_mean` argument is specified in two forms. The first form passes the custom prior means directly as a vector, while the second form names an additional column in the data frame that contains the user-specified prior means. Both specifications yield identical results. The user can also specify a single scalar value for the `prior_mean` argument, which is then applied to all observations, but this option is rarely sensible when ages vary across observations.

The next snippet compares the adjusted and default D-scores as a function of the proportion of items passed by the child.

```
# Plot the difference between adjusted and default D-scores
plot(y = adj1$d - def$d, x = def$p,
xlab = "Proportion of items passed by the child",
ylab = "Upward drift of D-score",
pch = 16, main = "Impact of Custom Prior Mean on D-score")
# Add a smoothed line to visualize the trend
lines(lowess(x = def$p, y = adj1$d - def$d, f = 0.5), col = "grey", lwd = 2)
```

The plot illustrates that the upward bias is more pronounced when less informative items are administered, i.e., when the proportion of items passed is either very low (not shown) or very high. The bias is relatively mild (one D-score unit increase) when the child can perform about half of the items.
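The pattern can be reproduced with a toy calculation. The sketch below is not the `dscore()` implementation: it assumes a simple Rasch-type logistic model with hypothetical item difficulties `tau` and an arbitrary `scale`, and computes the posterior mean on a grid under a normal prior. The function `toy_eap()` and all values are illustrative.

```r
# Toy posterior mean (EAP) on a grid: normal prior times logistic likelihood.
# Illustrative only; model, scale and item difficulties are assumptions.
toy_eap <- function(scores, tau, prior_mean, prior_sd, scale = 2,
                    grid = seq(0, 80, by = 0.1)) {
  log_prior <- dnorm(grid, mean = prior_mean, sd = prior_sd, log = TRUE)
  # Log-likelihood of the pass (1) / fail (0) pattern at each grid point
  log_lik <- vapply(grid, function(d) {
    z <- (d - tau) / scale
    sum(scores * plogis(z, log.p = TRUE) +
          (1 - scores) * plogis(-z, log.p = TRUE))
  }, numeric(1))
  log_post <- log_prior + log_lik
  post <- exp(log_post - max(log_post))  # unnormalised posterior weights
  sum(grid * post) / sum(post)
}

tau <- seq(24, 36, by = 2)            # seven hypothetical item difficulties
half_pass <- c(1, 1, 1, 1, 0, 0, 0)   # about half of the items passed
all_pass <- rep(1, 7)                 # every item passed

# Change in the estimate when the prior mean moves from 30 to 35
shift_half <- toy_eap(half_pass, tau, 35, 5) - toy_eap(half_pass, tau, 30, 5)
shift_all <- toy_eap(all_pass, tau, 35, 5) - toy_eap(all_pass, tau, 30, 5)
c(half = shift_half, all = shift_all)
```

In this toy setting the estimate moves further when every item is passed, because such a response pattern bounds the ability only from below and leaves more room for the prior.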

In some situations, we may have strong prior beliefs about the variability of the D-scores based on factors such as the trajectory of a child’s D-score or expert knowledge. Incorporating this information can lead to more robust or smooth results by better reflecting our understanding of the variability.

The following code snippet demonstrates how to set a custom prior standard deviation. Here, the `prior_sd` argument is specified using a constant value or values derived from the data.

```
# Filter data for a specific child
boy <- milestones[milestones$id == 111, ]
# Calculate default D-scores
def <- dscore(boy)
def
```

```
##        a  n      p     d      sem    daz
## 1 0.4873 11 0.9091 30.76 3.751319 -0.633
## 2 0.6571 14 0.6429 29.06 2.518082 -2.716
## 3 1.1800 19 0.9474 53.35 3.414966 -0.006
## 4 1.9055 13 0.8462 63.88 2.971594 -0.094
```

Suppose we want the previous observation to inform the estimate at the next time point. We can take the location of the previous observation (in DAZ units) and derive an informative prior mean and standard deviation for the next time point as follows:

```
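# Prior mean: convert the previous visit's DAZ to a D-score at the current age.
# The first visit has no predecessor, so it starts at DAZ = 0.
exp_d <- zad(z = c(0, def$daz[1:3]), x = def$a)
# Prior SD: the previous visit's SEM, with the default of 5 for the first visit
exp_sd <- c(5, def$sem[1:3])
# Calculate adjusted D-scores using the custom prior mean and standard deviation
adj1 <- dscore(boy, prior_mean = exp_d, prior_sd = exp_sd)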
```
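The shifting step in this snippet can be isolated in a small helper. The function `lag_prior()` below is hypothetical and not part of the dscore package. It moves each value one visit forward and fills the first visit, which has no predecessor, with a chosen starting value. The numbers are the DAZ and SEM values printed above for child 111.

```r
# Hypothetical helper: the value at visit i becomes the prior input at visit i + 1
lag_prior <- function(x, first) {
  c(first, head(x, -1))
}

daz <- c(-0.633, -2.716, -0.006, -0.094)
sem <- c(3.751319, 2.518082, 3.414966, 2.971594)

prior_daz <- lag_prior(daz, first = 0)  # first visit starts at DAZ = 0
prior_sd <- lag_prior(sem, first = 5)   # first visit uses the default SD of 5
prior_daz
prior_sd
```

For four visits this gives the same vectors as `c(0, def$daz[1:3])` and `c(5, def$sem[1:3])`, but it does not hard-code the number of visits.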

The code snippet below plots the raw and informed DAZ trajectories for child 111:

```
# Plotting the raw and informed DAZ trajectories
plot(x = def$a, y = def$daz, type = "b", pch = 16,
ylab = "DAZ", xlab = "Age (years)",
main = "Standard (black) and Informed (red) DAZ-trajectory for child 111")
points(x = adj1$a, y = adj1$daz, col = "red", type = "b", lwd = 2, pch = 16)
```

This plot illustrates the DAZ trajectory using standard estimates (in black) and the adjusted estimates (in red) for child 111, highlighting the impact of incorporating more informative prior knowledge into the analysis.

Of course, the examples provided here are simplified and may not fully capture the complexity of real-world scenarios. However, they demonstrate how to customize the prior mean and standard deviation in the `dscore()` function to better reflect your prior knowledge and improve the accuracy of the D-score estimates.

By default, the D-score of an observation with a missing age is `NA`. You can force D-score calculation by setting `prior_mean_NA` and `prior_sd_NA` to specific values. The documentation for the `dscore()` function suggests `prior_mean_NA = 50` and `prior_sd_NA = 20` as reasonable choices for samples aged 0-3 years. If these values do not suit your data, you can choose others that better reflect your expectations.

```
# Set missing ages for specific observations
boy$age[2:3] <- NA
# Calculate D-scores using default
def <- dscore(boy)
def
```

```
##        a  n      p     d      sem    daz
## 1 0.4873 11 0.9091 30.76 3.751319 -0.633
## 2     NA 14 0.6429    NA       NA     NA
## 3     NA 19 0.9474    NA       NA     NA
## 4 1.9055 13 0.8462 63.88 2.971594 -0.094
```

This call to `dscore()` produces a D-score of `NA` when age is missing, which effectively excludes these cases from downstream analyses. This is the safest option, and the default behavior.

```
# Calculate D-scores for missing ages using age-independent priors
adj1 <- dscore(boy, prior_mean_NA = 50, prior_sd_NA = 20)
adj1
```

```
##        a  n      p     d      sem    daz
## 1 0.4873 11 0.9091 30.76 3.751319 -0.633
## 2     NA 14 0.6429 26.51 2.693178     NA
## 3     NA 19 0.9474 54.25 5.061741     NA
## 4 1.9055 13 0.8462 63.88 2.971594 -0.094
```

This call to `dscore()` uses the custom settings `prior_mean_NA = 50` and `prior_sd_NA = 20`, which are the suggested age-independent values for children between 0 and 3 years whose age is missing.

```
# Forcing D-scores for missing ages to value -1
adj2 <- dscore(boy, prior_mean_NA = -1, prior_sd_NA = 0.001)
adj2
```

```
##        a  n      p     d      sem    daz
## 1 0.4873 11 0.9091 30.76 3.751319 -0.633
## 2     NA 14 0.6429 -1.00 0.000000     NA
## 3     NA 19 0.9474 -1.00 0.000000     NA
## 4 1.9055 13 0.8462 63.88 2.971594 -0.094
```

This call sets the prior mean and standard deviation to `prior_mean_NA = -1` and `prior_sd_NA = 0.001`, which effectively forces the D-score to a constant value (note that `prior_sd_NA = 0` produces missing values).

Note that the `prior_mean_NA` and `prior_sd_NA` arguments are ignored when `prior_mean` and `prior_sd` are set per observation (either by direct or column specification). Those options give full control over the handling of missing ages on a case-by-case basis.
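As a rough sketch of such case-by-case handling, the code below builds the prior vectors by hand. It assumes that `prior_mean` and `prior_sd` accept one value per row, as in the earlier example for child 111. The prior values chosen for the two visits without age (35 and 50, each with a standard deviation of 10) are purely hypothetical, and the names `missing_age`, `mu`, `sdev` and `adj3` are illustrative.

```r
library(dscore)

boy <- milestones[milestones$id == 111, ]
boy$age[2:3] <- NA
missing_age <- is.na(boy$age)

# Age-based prior mean where age is known, default SD of 5 everywhere
mu <- rep(NA_real_, nrow(boy))
mu[!missing_age] <- get_mu(t = boy$age[!missing_age], key = "gsed2406")
sdev <- rep(5, nrow(boy))

# Hypothetical per-visit priors for the two visits without age
mu[missing_age] <- c(35, 50)
sdev[missing_age] <- c(10, 10)

adj3 <- dscore(boy, prior_mean = mu, prior_sd = sdev)
adj3
```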