Title: | D-Score for Child Development |
---|---|
Description: | The D-score summarizes the child's performance on a set of milestones into a single number. The package implements four Rasch model keys to convert milestone scores into a D-score. It provides tools to calculate the D-score and its precision from the child's milestone scores, to convert the D-score into the Development-for-Age Z-score (DAZ) using age-conditional references, and to map milestone names into a generic 9-position item naming convention. |
Authors: | Stef van Buuren [cre, aut], Iris Eekhout [aut], Arjan Huizing [aut], Jonathan Seiden [aut] |
Maintainer: | Stef van Buuren <[email protected]> |
License: | AGPL-3 |
Version: | 1.9.8 |
Built: | 2025-02-09 05:16:40 UTC |
Source: | https://github.com/d-score/dscore |
A data frame with administrative information per item with difficulty
estimates (tau) from the Rasch model. The item bank provides the basic
information to calculate D-scores. The items in the item bank
are a subset of all items as collected in builtin_itemtable.
builtin_itembank
A data.frame with variables:
Name | Label |
key | String indicating a specific Rasch model |
item | Item name, gsed lexicon |
tau | Difficulty estimate |
label | Label (English) |
instrument | Instrument code |
domain | Domain code |
mode | Administration mode |
number | Item number |
The difficulty estimates were estimated by a Rasch model. The key
indicates the specific Rasch model used to estimate the difficulty.
Strictly speaking, one can only compare D-scores calculated from the
same key.
Updates:
Dec 01, 2022 - Overwrite labels of gto by correct item order.
Dec 05, 2022 - Adds key gsed2212, adding instruments gl1 and gs1, and defining correct order for gto
Jan 05, 2023 - Adds instrument gh1 to key gsed2212
dscore(), get_tau(), builtin_itemtable()
# count number of items per instrument in each key
table(builtin_itembank$instrument, builtin_itembank$key)
The built-in variable builtin_itemtable
contains the name and label
of items for measuring early child development.
builtin_itemtable
A data.frame with variables:
Name | Label |
item | Item name, gsed lexicon |
equate | Equate group |
label | Label (English) |
The builtin_itemtable is created by script data-raw/R/save_builtin_itemtable.R.
Updates:
May 30, 2022 - added gto (LF) and gpa (SF) items
June 1, 2022 - added seven gsd items
Nov 24, 2022 - Added instruments gs1, gs2
Dec 01, 2022 - Labels of gto replaced by correct order. Incorrect item order affects analyses done on LF between 20220530 - 20221201 !!!
Dec 05, 2022 - Redefines gs1 and instrument for Phase 2, removes gs2 (139) Adds gl1 (Long Form Phase 2 items 155)
Jan 05, 2023 - Adds 55 items from GSED-HF
Compiled by Stef van Buuren using different sources
A key contains the item difficulty estimates from a given Rasch model.
The difficulty estimates (tau) are used to calculate D-scores.
D-scores can only be compared when calculated with the same key.
builtin_keys
builtin_keys is a data.frame with variables:
Name | Label |
key | String. Name of the key indicating the Rasch model |
base_population | String. Name of the base population for the key |
n_items | Number of items in the key |
n_instruments | Number of instruments in the key |
intercept | Intercept to convert logit into D-score |
slope | Slope to convert logit into D-score |
from | Starting value of the quadrature points |
to | Stopping value of the quadrature points |
by | Increment of the quadrature points |
retired | Has the key been retired? |
20240609 SvB: Added builtin_keys table by data-raw\data\R\save_builtin_keys.R
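To illustrate how the intercept and slope columns might be applied, here is a minimal base R sketch of a linear transform between the logit scale and the D-score scale. The intercept and slope values are made up for illustration and are not taken from any built-in key; the helper names logit_to_dscore() and dscore_to_logit() are hypothetical and not part of the package.

```r
# Hypothetical transform parameters (NOT taken from builtin_keys)
intercept <- 50
slope <- 2

# linear transform from the logit scale to the D-score scale
logit_to_dscore <- function(logit, intercept, slope) {
  intercept + slope * logit
}

# inverse transform, from the D-score scale back to the logit scale
dscore_to_logit <- function(d, intercept, slope) {
  (d - intercept) / slope
}

logit <- c(-5, 0, 5)
d <- logit_to_dscore(logit, intercept, slope)
back <- dscore_to_logit(d, intercept, slope)
```

Because the transform is linear, values computed under different intercepts and slopes are not on the same scale, which is one reason to compare D-scores only within a key.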
A data frame containing the age-dependent distribution of the D-score for children aged 0-5 years. The distribution is modelled after the LMS distribution (Cole & Green, 1992) or BCT model (Stasinopoulos & Rigby, 2022) and is equal for both boys and girls. The LMS/BCT values can be used to graph reference charts and to calculate age-conditional Z-scores, also known as the Development-for-Age Z-score (DAZ).
builtin_references
A data.frame with the following variables:
Name | Label |
population | Name of the reference population |
key | D-score key, e.g., "dutch", "gcdg" or "gsed" |
distribution | Distribution family: "LMS" or "BCT" |
age | Decimal age in years |
mu | M-curve, median D-score, P50 |
sigma | S-curve, spread expressed as coefficient of variation |
nu | L-curve, the lambda coefficient of the LMS/BCT model for skewness |
tau | Kurtosis parameter in the BCT model |
P3 | P3 percentile |
P10 | P10 percentile |
P25 | P25 percentile |
P50 | P50 percentile |
P75 | P75 percentile |
P90 | P90 percentile |
P97 | P97 percentile |
SDM2 | -2SD centile |
SDM1 | -1SD centile |
SD0 | 0SD centile, median |
SDP1 | +1SD centile |
SDP2 | +2SD centile |
Here are more details on the reference populations:
The "dutch" references were calculated from the SMOCC data, and cover
age range 0-2.5 years (van Buuren, 2014).
The "gcdg" references were calculated from the 15 cohorts of the
GCDG-study, and cover age range 0-5 years (Weber, 2019).
The "phase1" references were calculated from the GSED Phase 1 validation
data (GSED-BGD, GSED-PAK, GSED-TZA) and cover age range 2w-3.5 years. The
age range 3.5-5 yrs is linearly extrapolated and is only indicative.
The "preliminary_standards" were calculated from the GSED Phase 1 validation
data (GSED-BGD, GSED-PAK, GSED-TZA) using a subset of children with
covariates indicating healthy development.
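As a rough illustration of how LMS values translate into an age-conditional Z-score, the sketch below applies the standard LMS formula of Cole & Green (1992) at a single age. The mu, sigma and nu values are invented for the example and do not come from builtin_references; lms_z() and lms_d() are hypothetical helpers, not package functions, and the sketch covers the LMS family only (not BCT).

```r
# Z-score under the LMS model for scalar mu, sigma, nu (one age)
lms_z <- function(d, mu, sigma, nu) {
  if (abs(nu) < 1e-8) {
    log(d / mu) / sigma            # limiting case for nu = 0
  } else {
    ((d / mu)^nu - 1) / (nu * sigma)
  }
}

# Inverse: D-score that corresponds to a given Z-score
lms_d <- function(z, mu, sigma, nu) {
  if (abs(nu) < 1e-8) {
    mu * exp(sigma * z)
  } else {
    mu * (1 + nu * sigma * z)^(1 / nu)
  }
}

# Hypothetical reference values at one age (NOT from builtin_references)
mu <- 45
sigma <- 0.08
nu <- 1.5

z <- lms_z(c(40, 45, 50), mu, sigma, nu)
d <- lms_d(z, mu, sigma, nu)
```

A D-score equal to the median mu maps to a Z-score of zero, and applying lms_d() to the result recovers the original D-scores.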
Cole TJ, Green PJ (1992). Smoothing reference centile curves: The LMS method and penalized likelihood. Statistics in Medicine, 11(10), 1305-1319.
Van Buuren S (2014). Growth charts of human development. Stat Methods Med Res, 23(4), 346-368. https://stefvanbuuren.name/publication/van-buuren-2014-gc/
Weber AM, Rubio-Codina M, Walker SP, van Buuren S, Eekhout I, Grantham-McGregor S, Caridad Araujo M, Chang SM, Fernald LCH, Hamadani JD, Hanlon A, Karam SM, Lozoff B, Ratsifandrihamanana L, Richter L, Black MM (2019). The D-score: a metric for interpreting the early development of infants and toddlers across global settings. BMJ Global Health, 4: e001724. https://gh.bmj.com/content/bmjgh/4/6/e001724.full.pdf
Stasinopoulos M, Rigby R (2022). gamlss.dist: Distributions for Generalized Additive Models for Location Scale and Shape, R package version 6.0-3, https://CRAN.R-project.org/package=gamlss.dist
# get an overview of available references per key
table(builtin_references$population, builtin_references$key)
If the tauj is not within the range rello - relhi from the dynamic EAP, the procedure ignores the score of item j.
calculate_posterior(scores, tau, qp, scale, mu, sd, relhi, rello)
scores |
A vector with PASS/FAIL observations.
Scores are coded numerically as |
tau |
A vector containing the item difficulties for the item
scores in |
qp |
Numeric vector of equally spaced quadrature points. |
scale |
Scale expansion |
mu |
Numeric scalar. The mean of the prior. |
sd |
Numeric scalar. Standard deviation of the prior. |
relhi |
Positive numeric scalar. Upper end of the relevance interval |
rello |
Negative numeric scalar. Lower end of the relevance interval |
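The relevance rule can be pictured with a small base R sketch. It assumes that rello and relhi are offsets added to the current EAP and that the interval bounds are inclusive; both points are assumptions made for illustration, and is_relevant() is a hypothetical helper rather than a package function.

```r
# TRUE when the item difficulty lies inside the relevance interval
# [eap + rello, eap + relhi]; inclusive bounds are an assumption
is_relevant <- function(tau, eap, rello, relhi) {
  tau >= eap + rello & tau <= eap + relhi
}

# Made-up difficulties and a made-up current EAP
tau <- c(12, 20, 31, 45)
keep <- is_relevant(tau, eap = 25, rello = -6, relhi = 6)

# With an unbounded interval every item is used
keep_all <- is_relevant(tau, eap = 25, rello = -Inf, relhi = Inf)
```

Items flagged FALSE would have their scores ignored when updating the posterior.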
A list with three elements:
Name | Label |
eap | Mean of the posterior |
gp | Vector of quadrature points |
posterior | Vector with posterior distribution. |
Since dscore V40.1 the function does not return the "start" element.
Stef van Buuren, Arjan Huizing, 2020
Returns the age-interpolated median of the D-score of the default reference for a given key.
count_mu(t, key, prior_mean_NA = NA_real_)
t |
Decimal age, numeric vector |
key |
Character, key of the reference population |
prior_mean_NA |
Numeric, prior mean when age is missing |
Do not use this function if you want the median D-score for a specific reference.
DEPRECATED in dscore 1.9.6
A vector of length length(t)
with the median of the default reference
population for the key.
Returns the age-interpolated median of the Dutch references (van Buuren 2014).
The working range is 0-3 years. This function is used
to set prior mean under key "dutch".
count_mu_dutch(t)
t |
Decimal age, numeric vector |
A vector of length length(t)
with the median of the Dutch references.
Internal function. Called by dscore()
dscore:::count_mu_dutch(0:2)
Returns the age-interpolated median of the GCDG references (Weber
et al, 2019). The working range is 0-4 years. This function is used
to set prior mean under keys "gcdg" and "gsed1912".
count_mu_gcdg(t)
t |
Decimal age, numeric vector |
A vector of length length(t)
with the median of the GCDG references.
Internal function. Called by dscore()
dscore:::count_mu_gcdg(0:2)
Returns the age-interpolated median of the phase1 references
based on LF & SF in GSED-BGD, GSED-PAK, GSED-TZA. This function is used
to set prior mean under keys "293_0" and "gsed2212".
count_mu_phase1(t)
t |
Decimal age, numeric vector |
The interpolation is done in two rounds. Round 1: calculate D-scores using the gcdg prior mean, calculate references, and estimate the round 1 parameters. Round 2: calculate D-scores using the round 1 estimates as the prior mean (most differences are within 0.1 D-score points), recalculate references, and estimate the round 2 parameters used in this function. Here t is decimal age in years and log is the natural logarithm.
Round 1:
Count model, t <= 9 months: 21.3449 + 26.4916 t + 7.0251 log(t + 0.2)
Count model, 9 months < t <= 3.5 years: 14.69947 - 12.18636 t + 69.11675 log(t + 0.92)
Linear model, t > 3.5 years: 61.40956 + 3.80904 t
Round 2:
Count model, t < 9 months: 20.5883 + 27.3376 t + 6.4254 log(t + 0.2)
Count model, 9 months < t < 3.5 years: 14.63748 - 12.11774 t + 69.05463 log(t + 0.92)
Linear model, t > 3.5 years: 61.37967 + 3.83513 t
The working range is 0-3.5 years. After the age of 3.5 years, the function will increase at an arbitrary rate of 3.8 D-score points per year.
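The round 2 model can be sketched in base R as a piecewise function of age. This is an illustration under two assumptions, not the package's own code: the third term of each count model is taken to be a natural logarithm, log(t + c), which is what makes the pieces join almost continuously at 9 months and 3.5 years, and the handling of ages exactly at 0.75 and 3.5 years is a guess. The name count_mu_sketch() is hypothetical, and the sketch is meant for ages of 0 and above.

```r
# Piecewise median D-score by decimal age t (years), round 2 coefficients
count_mu_sketch <- function(t) {
  mu <- rep(NA_real_, length(t))
  young <- !is.na(t) & t < 0.75              # below 9 months
  mid   <- !is.na(t) & t >= 0.75 & t <= 3.5  # 9 months to 3.5 years
  old   <- !is.na(t) & t > 3.5               # linear extrapolation

  mu[young] <- 20.5883 + 27.3376 * t[young] + 6.4254 * log(t[young] + 0.2)
  mu[mid]   <- 14.63748 - 12.11774 * t[mid] + 69.05463 * log(t[mid] + 0.92)
  mu[old]   <- 61.37967 + 3.83513 * t[old]
  mu
}

ages <- c(0, 0.75, 3.5, 4, 5, NA)
m <- count_mu_sketch(ages)
```

Missing ages yield NA, and beyond 3.5 years the curve rises by the linear slope of about 3.8 D-score points per year.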
A vector of length length(t)
with the median of the phase1 references.
Internal function. Called by dscore()
Stef van Buuren, on behalf of GSED project
dscore:::count_mu_phase1(0:5)
Returns the age-interpolated median of the preliminary_standards
based on LF & SF in GSED-BGD, GSED-PAK, GSED-TZA. This function is used
to set prior mean under key "gsed2406".
count_mu_preliminary_standards(t)
t |
Decimal age, numeric vector |
A vector of length length(t)
with the median of the preliminary_standards references.
Internal function. Called by dscore()
Stef van Buuren, on behalf of GSED project
dscore:::count_mu_preliminary_standards(0:5)
The daz() function calculates the Development-for-Age Z-score (DAZ).
The DAZ represents a child's D-score after adjusting for age by an
external age-conditional reference.
daz(d, x, reference_table = NULL, dec = 3, verbose = FALSE)
zad(z, x, reference_table = NULL, dec = 2, verbose = FALSE)
d |
Vector of D-scores |
x |
Vector of ages (decimal age) |
reference_table |
A |
dec |
The number of decimals (default |
verbose |
Print out the used reference table (default |
z |
Vector of standard deviation scores (DAZ) |
The zad() function is the inverse of daz(): Given age and
the Z-score, it finds the raw D-score.
Note 1: The Box-Cox Cole and Green (BCCG) and Box-Cox t (BCT)
distributions model only positive D-score values. To increase
robustness, the daz() and zad() functions will round up any
D-scores lower than 1.0 to 1.0.
Note 2: The daz() and zad() functions call modified versions of the
pBCT() and qBCT() functions from gamlss for better handling
of NA's and rounding.
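The rounding rule in Note 1 can be mimicked in base R with pmax(). This is only a sketch of the stated behaviour, with made-up D-scores; how daz() and zad() implement the rule internally is not shown here.

```r
# Made-up D-scores, including values below 1.0 and a missing value
d <- c(-2.3, 0.4, 1, 35.2, NA)

# Raise any D-score lower than 1.0 to 1.0; NA stays NA
d_clamped <- pmax(d, 1)
```

Values at or above 1.0 pass through unchanged, so the clamp only affects the extreme lower tail.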
daz(): Unnamed numeric vector with Z-scores of length length(d).
zad(): Unnamed numeric vector with D-scores of length length(z).
Stef van Buuren
Cole TJ, Green PJ (1992). Smoothing reference centile curves: The LMS method and penalized likelihood. Statistics in Medicine, 11(10), 1305-1319.
# using default reference and key
daz(d = c(35, 50), x = c(0.5, 1.0))
# print out names of the used reference table
daz(d = c(35, 50), x = c(0.5, 1.0), verbose = TRUE)
# using the default reference in key gcdg
reftab <- get_reference(key = "gcdg")
daz(d = c(35, 50), x = c(0.5, 1.0), reference_table = reftab)
# using Dutch reference in default key
reftab <- get_reference(population = "dutch", verbose = TRUE)
daz(d = c(35, 50), x = c(0.5, 1.0), reference_table = reftab)
# population median at ages 0.5, 1 and 2 years, default reference
zad(z = rep(0, 3), x = c(0.5, 1, 2))
# population median at ages 0.5, 1 and 2 years, gcdg key
reftab <- get_reference(key = "gcdg", verbose = TRUE)
zad(z = rep(0, 3), x = c(0.5, 1, 2), reference_table = reftab)
# population median at ages 0.5, 1 and 2 years, dutch key
reftab <- get_reference(key = "dutch", verbose = TRUE)
zad(z = rep(0, 3), x = c(0.5, 1, 2), reference_table = reftab)
This utility function decomposes item names into components: instrument, domain, mode and number
decompose_itemnames(x)
x |
A character vector containing item names (gsed lexicon) |
The gsed naming convention is as follows. Positions 1-3 code the instrument, positions 4-5 code the domain, position 6 codes the mode (direct/caregiver/message), and positions 7-9 are an item sequence number.
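The positional layout of the naming convention can be illustrated with plain substr() calls. This is a base R stand-in written for this explanation, not the implementation of decompose_itemnames(); the name decompose_sketch() is hypothetical, and it does not validate its input, so the real function may treat empty or malformed names differently.

```r
# Split 9-position item names into their four components by position
decompose_sketch <- function(x) {
  data.frame(
    instrument = substr(x, 1, 3),  # positions 1-3
    domain     = substr(x, 4, 5),  # positions 4-5
    mode       = substr(x, 6, 6),  # position 6
    number     = substr(x, 7, 9),  # positions 7-9, kept as character
    stringsAsFactors = FALSE
  )
}

parts <- decompose_sketch(c("aqigmc028", "by1mdd157"))
```

Keeping number as a character string preserves leading zeros such as "028".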
A data.frame with length(x) rows and four columns, named:
instrument, domain, mode, and number.
Stef van Buuren
https://docs.google.com/spreadsheets/d/1zLsSW9CzqshL8ubb7K5R9987jF4YGDVAW_NBw1hR2aQ/edit#gid=0
itemnames <- c("aqigmc028", "grihsd219", "", "by1mdd157", "mdsgmd006")
decompose_itemnames(itemnames)
The dscore() function estimates the following quantities: the D-score,
a numeric score that quantifies child development by one number; the
Development-for-Age Z-score (DAZ), which corrects the D-score for age; and
the standard error of measurement (SEM) of the D-score.
dscore(
  data,
  items = names(data),
  key = NULL,
  population = NULL,
  xname = "age",
  xunit = c("decimal", "days", "months"),
  prepend = NULL,
  itembank = NULL,
  metric = c("dscore", "logit"),
  prior_mean = NULL,
  prior_mean_NA = NULL,
  prior_sd = NULL,
  prior_sd_NA = NULL,
  transform = NULL,
  qp = NULL,
  dec = c(2L, 3L),
  relevance = c(-Inf, Inf),
  algorithm = c("current", "1.8.7"),
  verbose = FALSE
)
dscore_posterior(
  data,
  items = names(data),
  key = NULL,
  population = NULL,
  xname = "age",
  xunit = c("decimal", "days", "months"),
  prepend = NULL,
  itembank = NULL,
  metric = c("dscore", "logit"),
  prior_mean = NULL,
  prior_mean_NA = NULL,
  prior_sd = NULL,
  prior_sd_NA = NULL,
  transform = NULL,
  qp = NULL,
  dec = c(2L, 3L),
  relevance = c(-Inf, Inf),
  algorithm = c("current", "1.8.7"),
  verbose = FALSE
)
data |
A |
items |
A character vector containing names of items to be
included into the D-score calculation. Milestone scores are coded
numerically as |
key |
String. The key identifies 1) the difficulty estimates
pertaining to a particular Rasch model, and 2) the prior mean and standard
deviation of the prior distribution for calculating the D-score.
The default key |
population |
String. The name of the reference population to calculate
DAZ.
Use |
xname |
A string with the name of the age variable in
|
xunit |
A string specifying the unit in which age is measured
(either |
prepend |
Character vector with column names in |
itembank |
A |
metric |
A string, either |
prior_mean |
|
prior_mean_NA |
|
prior_sd |
|
prior_sd_NA |
|
transform |
Numeric vector, length 2, containing the intercept
and slope of the linear transform from the logit scale into
the D-score scale. The default ( |
qp |
Numeric vector of equally spaced quadrature points.
This vector should span the range of all D-score or logit values.
The default ( |
dec |
A vector of two integers specifying the number of
decimals for rounding the D-score and DAZ, respectively.
The default is |
relevance |
A numeric vector of length 2 with the lower and
upper bounds of the relevance interval. The procedure calculates
a dynamic EAP for each item. If the difficulty level (tau) of the
next item is outside the relevance interval around EAP, the procedure
ignores the score on the item. The default is |
algorithm |
Computational method, for backward compatibility.
Either |
verbose |
Logical. Print settings. |
The scoring algorithm is based on the method by Bock and Mislevy (1982). The method uses Bayes rule to update a prior ability into a posterior ability.
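The idea of updating a prior into a posterior over quadrature points can be sketched in a few lines of base R. The sketch assumes a Rasch-type pass probability plogis((qp - tau) / scale), a normal prior, and no relevance filtering; all numeric inputs are made up, and eap_sketch() is a hypothetical helper that is not the package's calculate_posterior().

```r
# Minimal EAP sketch; scores are 1 (pass), 0 (fail) or NA (missing)
eap_sketch <- function(scores, tau, qp, mu, sd, scale = 1) {
  # discretized normal prior on the quadrature points, normalized to sum to 1
  posterior <- dnorm(qp, mean = mu, sd = sd)
  posterior <- posterior / sum(posterior)
  for (j in seq_along(scores)) {
    if (is.na(scores[j])) next  # skip missing responses
    # probability of passing item j at each quadrature point
    p_pass <- plogis((qp - tau[j]) / scale)
    likelihood <- if (scores[j] == 1) p_pass else 1 - p_pass
    # Bayes rule: posterior is proportional to prior times likelihood
    posterior <- posterior * likelihood
    posterior <- posterior / sum(posterior)
  }
  eap <- sum(qp * posterior)                  # posterior mean
  sem <- sqrt(sum(posterior * (qp - eap)^2))  # posterior standard deviation
  list(eap = eap, sem = sem, posterior = posterior)
}

# Made-up difficulties, prior and quadrature grid
qp <- seq(-10, 100, by = 1)
tau <- c(10, 14, 20)
fit_pass <- eap_sketch(c(1, 1, 1), tau, qp, mu = 15, sd = 5, scale = 2)
fit_fail <- eap_sketch(c(0, 0, 0), tau, qp, mu = 15, sd = 5, scale = 2)
```

Passing all three items pulls the posterior mean above the value obtained when all three are failed, while the prior keeps both estimates near the age-appropriate range.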
The item names should correspond to the "gsed" lexicon.
A key is defined by the set of estimated item difficulties.
Key | Model | Quadrature | Instruments | Direct/Caregiver | Reference |
"dutch" | 75_0 | -10:80 | 1 | direct | Van Buuren, 2014/2020 |
"gcdg" | 565_18 | -10:100 | 13 | direct | Weber, 2019 |
"gsed1912" | 807_17 | -10:100 | 21 | mixed | GSED Team, 2019 |
"293_0" | 293_0 | -10:100 | 2 | mixed | GSED Team, 2022 |
"gsed2212" | 818_6 | -10:100 | 27 | mixed | GSED Team, 2022 |
"gsed2406" | 818_6 | -10:100 | 27 | mixed | GSED Team, 2024 |
As a general rule, one should only compare D-scores
that are calculated using the same key and the same
set of quadrature points. For calculating D-scores on new data,
the advice is to use the default, which currently is "gsed2406".
The default starting prior is a mean calculated from a so-called
"Count model" that describes mean D-score as a function of age.
The Count models are implemented in the function get_mu().
By default, the spread of the starting prior
is 5 D-score points around the mean D-score, which corresponds to
approximately 1.5 to 2 times the normal spread of children of a given age. The
starting prior is informative for very short tests (say <5 items), but has
little impact on the posterior for larger tests.
The dscore() function returns a data.frame with nrow(data) rows.
Optionally, the first block of columns can be copied to the
result by using prepend. The second block consists of the
following columns:
Name | Label |
a | Decimal age (years) |
n | Number of items with valid (0/1) data |
p | Percentage of passed milestones |
d | D-score, mean of posterior distribution |
sem | Standard error of measurement, standard deviation of the posterior |
daz | D-score corrected for age, calculated in Z-scale (for metric "dscore") |
The D-score in column d is a linear scale, with values usually ranging
from 0 to 100. The D-score is NA if age is missing or if age is lower
than -1/12. It is possible to calculate D-scores for cases with missing ages
by setting prior_mean_NA and prior_sd_NA to some reasonable value, e.g.,
prior_mean_NA = 50 and prior_sd_NA = 20, for the sample at hand.
The SEM is a positive number that quantifies the uncertainty of the D-score.
It is NA if the D-score is NA.
The DAZ in column daz is a Z-score that corrects the D-score for age. It
is NA when there are no reference values for the given age, or when
the D-score is extremely unlikely to be valid at the given age.
Advanced applications: The dscore_posterior() function returns a
data frame with nrow(data) rows and length(qp) plus prepended columns
with the full posterior density of the D-score at each quadrature point.
If no valid responses are found, dscore_posterior() returns the
prior density. Versions prior to 1.8.5 returned a matrix (instead of
a data.frame). Code that depends on the result being a matrix may break
and may need adaptation.
Stef van Buuren, Iris Eekhout, Arjan Huizing (2022)
Bock DD, Mislevy RJ (1982). Adaptive EAP Estimation of Ability in a Microcomputer Environment. Applied Psychological Measurement, 6(4), 431-444.
Van Buuren S (2014). Growth charts of human development. Stat Methods Med Res, 23(4), 346-368. https://stefvanbuuren.name/publication/van-buuren-2014-gc/
Weber AM, Rubio-Codina M, Walker SP, van Buuren S, Eekhout I, Grantham-McGregor S, Caridad Araujo M, Chang SM, Fernald LCH, Hamadani JD, Hanlon A, Karam SM, Lozoff B, Ratsifandrihamanana L, Richter L, Black MM (2019). The D-score: a metric for interpreting the early development of infants and toddlers across global settings. BMJ Global Health, 4: e001724. https://gh.bmj.com/content/bmjgh/4/6/e001724.full.pdf
builtin_keys(), builtin_itembank(), builtin_itemtable(),
builtin_references(), get_tau(), posterior(), milestones()
# using all defaults and properly formatted data
ds <- dscore(milestones)
head(ds)
# step-by-step example
data <- data.frame(
  id = c(
    "Jane", "Martin", "ID-3", "No. 4", "Five", "6",
    NA_character_, as.character(8:10)
  ),
  age = rep(round(21 / 365.25, 4), 10),
  ddifmd001 = c(NA, NA, 0, 0, 0, 1, 0, 1, 1, 1),
  ddicmm029 = c(NA, NA, NA, 0, 1, 0, 1, 0, 1, 1),
  ddigmd053 = c(NA, 0, 0, 1, 0, 0, 1, 1, 0, 1)
)
items <- names(data)[3:5]
# third item is not part of the default key
get_tau(items, verbose = TRUE)
# calculate D-score
dscore(data)
# prepend id variable to output
dscore(data, prepend = "id")
# or prepend all data
# dscore(data, prepend = colnames(data))
# calculate full posterior
p <- dscore_posterior(data)
# check that rows sum to 1
rowSums(p)
# plot full posterior for measurement 7
barplot(as.matrix(p[7, 12:36]),
  names = 1:25,
  xlab = "D-score", ylab = "Density", col = "grey",
  main = "Full D-score posterior for measurement in row 7",
  sub = "D-score (EAP) = 11.58, SEM = 3.99")
# plot P10, P50 and P90 of D-score references
g <- expand.grid(age = seq(0.1, 4, 0.1), p = c(0.1, 0.5, 0.9))
d <- zad(z = qnorm(g$p), x = g$age, verbose = TRUE)
matplot(
  x = matrix(g$age, ncol = 3), y = matrix(d, ncol = 3), type = "l",
  lty = 1, col = "blue", xlab = "Age (years)", ylab = "D-score",
  main = "D-score preliminary standards: P10, P50 and P90")
abline(h = seq(10, 80, 10), v = seq(0, 4, 0.5), col = "gray", lty = 2)
# add measurements made on very preterms, ga < 32 weeks
ds <- dscore(milestones)
points(x = ds$a, y = ds$d, pch = 19, col = "red")
This function calculates the ages at which a certain percent in the reference population passes the items.
get_age_equivalent(
  items,
  pct = c(10, 50, 90),
  key = NULL,
  population = NULL,
  transform = NULL,
  itembank = dscore::builtin_itembank,
  xunit = c("decimal", "days", "months"),
  verbose = FALSE
)
items |
A character vector containing names of items to be
included into the D-score calculation. Milestone scores are coded
numerically as |
pct |
Numeric vector with requested percentiles (0-100). The
default is |
key |
String. The key identifies 1) the difficulty estimates
pertaining to a particular Rasch model, and 2) the prior mean and standard
deviation of the prior distribution for calculating the D-score.
The default key |
population |
String. The name of the reference population to calculate
DAZ.
Use |
transform |
Numeric vector, length 2, containing the intercept
and slope of the linear transform from the logit scale into
the D-score scale. The default ( |
itembank |
A |
xunit |
A string specifying the unit in which age is measured
(either |
verbose |
Logical. Print settings. |
A data.frame with four columns: item, d (D-score),
pct (percentile), and a (age-equivalent, in xunit units).
The function internally defines a scale factor given the key.
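One way to picture an age equivalent is in two steps: find the D-score at which a given percentage passes an item under a Rasch-type model, then look up the age at which the reference median reaches that D-score. The sketch below follows that reading. It is an assumption about the general idea, not the implementation of get_age_equivalent(); the median curve, item difficulty and scale are invented, and both helper names are hypothetical.

```r
# D-score at which a percentage pct passes an item with difficulty tau,
# assuming pass probability plogis((d - tau) / scale)
d_at_pct <- function(tau, pct, scale = 1) {
  tau + scale * qlogis(pct / 100)
}

# Hypothetical, strictly increasing median D-score by age (years)
ref_age <- c(0.25, 0.5, 1, 2, 3)
ref_d   <- c(20, 30, 45, 60, 70)

# Age at which the median curve equals d, by linear interpolation
age_equivalent <- function(d) {
  approx(x = ref_d, y = ref_age, xout = d)$y
}

d <- d_at_pct(tau = 45, pct = c(10, 50, 90), scale = 2)
a <- age_equivalent(d)
```

At 50 percent the D-score equals the item difficulty, so the age equivalent is simply the age at which the median curve crosses tau.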
get_age_equivalent(c("gpagmc018", "gtogmd026", "ddicmm050"))
The get_itemnames()
function matches names against the 9-code
template. This is useful for quickly selecting names of items from a larger
set of names.
get_itemnames(
  x,
  instrument = NULL,
  domain = NULL,
  mode = NULL,
  number = NULL,
  strict = FALSE,
  itemtable = NULL,
  order = "idnm"
)
x |
A character vector, |
instrument |
A character vector with 3-position codes of instruments
that should match. The default |
domain |
A character vector with 2-position codes of domains
that should match. The default |
mode |
A character vector with 1-position codes of the mode
of administration. The default |
number |
A numeric or character vector with item numbers.
The default |
strict |
A logical specifying whether the resulting item
names must conform to one of the built-in names. The default is
|
itemtable |
A |
order |
A four-letter string specifying the sorting order.
The four letters are: |
The gsed naming convention is as follows. Positions 1-3 code the instrument, positions 4-5 code the domain, position 6 codes the mode (direct/caregiver/message), and positions 7-9 are an item sequence number.
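Matching names against a 9-position template can be approximated with a regular expression. The pattern below is an assumption made for illustration: six lowercase letters or digits followed by a three-digit sequence number. The actual template used by get_itemnames() may be stricter or looser, and looks_like_gsed_name() is a hypothetical helper.

```r
# TRUE for names that fit the assumed 9-position pattern
looks_like_gsed_name <- function(x) {
  grepl("^[a-z0-9]{6}[0-9]{3}$", x)
}

itemnames <- c("aqigmc028", "grihsd219", "", "age", "mdsgmd999")
ok <- looks_like_gsed_name(itemnames)

# Keep only well-formed names from selected instruments (positions 1-3)
selected <- itemnames[ok & substr(itemnames, 1, 3) %in% c("aqi", "mds")]
```

A purely syntactic check like this cannot tell whether a name such as "mdsgmd999" exists in the item table, which is the kind of case the strict argument addresses.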
A vector with names of items
Stef van Buuren 2020
itemnames <- c("aqigmc028", "grihsd219", "", "age", "mdsgmd999")
# filter out impossible names
get_itemnames(itemnames)
get_itemnames(itemnames, strict = TRUE)
# only items from specific instruments
get_itemnames(itemnames, instrument = c("aqi", "mds"))
get_itemnames(itemnames, instrument = c("aqi", "mds"), strict = TRUE)
# get all items from the se domain of iyo instrument
get_itemnames(domain = "se", instrument = "iyo")
# get all items from the se domain with direct assessment mode
get_itemnames(domain = "se", mode = "d")
# get all items numbered 70 and 73 from gm domain
get_itemnames(number = c(70, 73), domain = "gm")
The builtin_itemtable
object in the dscore
package
contains basic meta-information about items: a name, the equate group,
and the item label.
The get_itemtable()
function returns a subset of items
in the itemtable.
get_itemtable(items = NULL, itemtable = NULL, decompose = FALSE)
items |
A logical or character vector of item names to return. The
default ( |
itemtable |
A |
decompose |
If |
A data.frame
with seven columns.
head(get_itemtable(), 3)
get_itemtable(LETTERS[1:3], "")
The get_labels()
function obtains the item labels for a
specified set of items.
get_labels(items = NULL, trim = NULL, itemtable = NULL)
items |
A character vector of item names to return. The
default ( |
trim |
The maximum number of characters in the label. The
default |
itemtable |
A |
A named character vector with length(items) elements with
item labels, in the same order as in items.
builtin_itemtable(), get_itemnames()
# get labels of first two Macarthur items
get_labels(get_itemnames(instrument = "mac", number = 1:2), trim = 40)
Returns the age-interpolated median of the D-score of the default reference for a given key.
get_mu(t, key, prior_mean_NA = NA_real_)
t |
Decimal age, numeric vector |
key |
Character, key of the reference population |
prior_mean_NA |
Numeric, prior mean when age is missing |
Use get_reference()
for more options.
A vector of length length(t)
with the median of the default reference
population for the key.
The get_reference()
function selects the D-score reference
distribution.
get_reference(
  population = NULL,
  key = NULL,
  references = dscore::builtin_references,
  verbose = FALSE,
  ...
)
population |
String. The name of the reference population to calculate
DAZ.
Use |
key |
String. The key identifies 1) the difficulty estimates
pertaining to a particular Rasch model, and 2) the prior mean and standard
deviation of the prior distribution for calculating the D-score.
The default key |
references |
A |
verbose |
Logical. Print settings. |
... |
Used to test whether the call contained the deprecated argument
|
A data.frame with the LMS reference values.
No references for population "gsed" exist. The function will silently rewrite population = "gsed" into population = "gsed".
The "dutch" reference was published in Van Buuren (2014).
The "gcdg" reference was calculated from 15 cohorts with direct observations (Weber et al., 2019).
The "phase1" references were calculated from the GSED Phase 1 validation data (GSED-BGD, GSED-PAK, GSED-TZA) and cover the age range 2 weeks to 3.5 years. The values for the age range 3.5-5 years are linearly extrapolated and are only indicative.
The "preliminary_standards" references were calculated from the GSED Phase 1 validation data using a subset of children with healthy development.
Van Buuren S (2014). Growth charts of human development. Stat Methods Med Res, 23(4), 346-368.
Weber AM, Rubio-Codina M, Walker SP, van Buuren S, Eekhout I, Grantham-McGregor S, Caridad Araujo M, Chang SM, Fernald LCH, Hamadani JD, Hanlon A, Karam SM, Lozoff B, Ratsifandrihamanana L, Richter L, Black MM (2019). The D-score: a metric for interpreting the early development of infants and toddlers across global settings. BMJ Global Health, 4(6), e001724. https://gh.bmj.com/content/bmjgh/4/6/e001724.full.pdf.
# see key-population combinations of builtin_references
table(builtin_references$key, builtin_references$population)

# get the default reference
reftab <- get_reference()
head(reftab, 2)

# get the default reference for the key "gsed2212"
reftab <- get_reference(key = "gsed2212", verbose = TRUE)

# get dutch reference for default key
reftab <- get_reference(population = "dutch", verbose = TRUE)

# loading a non-existing reference yields zero rows
reftab <- get_reference(population = "france", verbose = TRUE)
nrow(reftab)
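LMS reference values are what allow a D-score to be expressed as an age-conditional Z-score (DAZ). As a rough sketch of the idea, the block below applies the standard LMS transformation to a few values. The function name `lms_z()` and all numbers are hypothetical, and the package's own DAZ computation may use other distributions or column names for some references.

```r
# Standard LMS transformation of a measurement y into a Z-score,
# given L (skewness), M (median) and S (coefficient of variation).
# When L is (numerically) zero, the log form is used instead.
lms_z <- function(y, L, M, S) {
  ifelse(abs(L) < 1e-8, log(y / M) / S, ((y / M)^L - 1) / (L * S))
}

# Hypothetical reference values at a single age
lms_z(y = c(45, 50, 55), L = 1, M = 50, S = 0.1)
```

A value equal to the median M maps to a Z-score of zero regardless of L and S.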
Searches the item bank for matching items, and returns the difficulty estimates. Matching is done by item name. Comparisons are done in lower case.
get_tau(items, key = NULL, itembank = dscore::builtin_itembank, verbose = FALSE)
items |
A character vector containing names of items to be
included into the D-score calculation. Milestone scores are coded
numerically as |
key |
String. The key identifies 1) the difficulty estimates
pertaining to a particular Rasch model, and 2) the prior mean and standard
deviation of the prior distribution for calculating the D-score.
The default key |
itembank |
A |
verbose |
Logical. Print settings. |
A named vector with the difficulty estimate per item with length(items) elements.
Stef van Buuren 2020
# difficulty levels in the GHAP lexicon
get_tau(items = c("ddifmd001", "DDigmd052", "xyz"))
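The lower-case matching described above can be illustrated with a small base R sketch. The mini item bank `bank`, its tau values, and the helper `lookup_tau()` are hypothetical and are not part of the package. The sketch only shows how a case-insensitive lookup by item name can return NA for unmatched items.

```r
# Hypothetical mini item bank; the tau values are made up.
bank <- data.frame(
  key  = "demo",
  item = c("ddifmd001", "ddigmd052"),
  tau  = c(10.0, 20.0),
  stringsAsFactors = FALSE
)

# Look up difficulties by item name, comparing in lower case.
# Items that are not in the bank yield NA.
lookup_tau <- function(items, bank) {
  idx <- match(tolower(items), tolower(bank$item))
  stats::setNames(bank$tau[idx], items)
}

lookup_tau(c("ddifmd001", "DDigmd052", "xyz"), bank)
```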
A demo dataset with developmental scores at the item level for 10 random children from the GSED Phase 1 data.
gsample
A data.frame with 10 rows and 295 variables:
Name | Label |
id | Integer, child ID |
agedays | Integer, age in days |
gpalac001 | Integer, Cry when hungry...: 1 = yes, 0 = no, NA = not administered |
gpalac002 | Integer, Look at/focus...: 1 = yes, 0 = no, NA = not administered |
... | and so on.. |
There are 138 gpa items from GSED SF (item gpamoc008 (clench fists) was removed) and 155 gto items from GSED LF.
head(gsample)
A demo dataset with developmental scores at the item level for a set of 27 preterm children.
milestones
A data.frame with 100 rows and 62 variables:
Name | Label |
id | Integer, child ID |
agedays | Integer, age in days |
age | Numeric, decimal age in years |
sex | Character, "male", "female" |
gagebrth | Integer, gestational age in days |
ddifmd001 | Integer, Fixates eyes: 1 = yes, 0 = no |
... | and so on.. |
head(milestones)
Normalizes the distribution so that the total mass equals 1.
normalize(d, qp)
d |
A vector with |
qp |
Vector of equally spaced quadrature points. |
A vector of length(d) elements with the prior density estimate at each quadrature point.
Note: Internal function
dscore:::normalize(c(5, 10, 5), qp = c(0, 1, 2))
sum(dscore:::normalize(rnorm(5), qp = 1:5))
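To make "total mass equals 1" concrete, here is a minimal base R sketch of normalizing weights on an equally spaced grid. The name `normalize_sketch()` is hypothetical. It illustrates the idea under the assumption that mass is the sum of the densities times the grid spacing, and is not the package's internal code.

```r
# Rescale non-negative weights d so that they integrate to 1
# over the equally spaced quadrature points qp.
normalize_sketch <- function(d, qp) {
  stopifnot(length(d) == length(qp), length(qp) >= 2)
  delta <- qp[2] - qp[1]        # spacing between quadrature points
  d / (sum(d) * delta)
}

qp <- seq(0, 2, by = 0.5)
dens <- normalize_sketch(c(1, 2, 4, 2, 1), qp)
sum(dens) * (qp[2] - qp[1])     # total mass after normalization
```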
Calculate posterior for one item given score, difficulty and prior
posterior(score, tau, prior, qp, scale)
score | Integer, either 0 (fail) or 1 (pass) |
tau | Numeric, difficulty parameter |
prior | Vector of prior values on quadrature points |
qp | Vector of equally spaced quadrature points |
scale | Expansion relative to the logit scale |
This function assumes that the difficulties have been estimated by a binary Rasch model, e.g. by rasch.pairwise.itemcluster() of the sirt package.
A vector of length length(prior).
Note: Internal function
Stef van Buuren, Arjan Huizing, 2020
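The update for a single item can be sketched in base R as follows, assuming the usual binary Rasch likelihood with the logit expressed as (qp - tau) / scale. The function name `posterior_sketch()`, the prior, and the numbers are hypothetical, and the package's internal implementation may differ in detail.

```r
# One Bayesian update step for a single pass/fail item under a binary
# Rasch model, evaluated on a grid of quadrature points.
posterior_sketch <- function(score, tau, prior, qp, scale = 1) {
  stopifnot(score %in% c(0, 1), length(prior) == length(qp))
  p_pass <- plogis((qp - tau) / scale)           # P(pass | ability = qp)
  lik <- if (score == 1) p_pass else 1 - p_pass  # likelihood of the score
  post <- prior * lik
  post / (sum(post) * (qp[2] - qp[1]))           # renormalize to mass 1
}

qp <- seq(-10, 100, by = 1)
prior <- dnorm(qp, mean = 40, sd = 5)            # hypothetical prior
post_pass <- posterior_sketch(1, tau = 45, prior = prior, qp = qp, scale = 2)
post_fail <- posterior_sketch(0, tau = 45, prior = prior, qp = qp, scale = 2)

# Posterior means: a pass moves the mean up, a fail moves it down
c(pass = sum(qp * post_pass), fail = sum(qp * post_fail))
```

Applying such a step once per administered item, each time using the previous posterior as the new prior, gives a running estimate of the child's ability on the grid.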
Function rename_gcdg_gsed() translates item names in the gcdg lexicon to item names in the gsed lexicon.
rename_gcdg_gsed(x, copy = TRUE)
x |
A character vector containing item names in the gcdg lexicon |
copy |
A logical indicating whether any unmatched names should
be copied ( |
The gsed naming convention is as follows: positions 1-3 code the instrument, positions 4-5 code the domain, position 6 codes the mode (direct/caregiver/message), and positions 7-9 hold the item sequence number.
The function currently supports ASQ-I (aqi), Barrera-Moncada (bar), Battelle (bat), Bayley I (by1), Bayley II (by2), Bayley III (by3), Dutch Development Instrument (ddi), Denver (den), Griffiths (gri), MacArthur (mac), WHO milestones (mds), Mullen (mul), pegboard (peg), South African Griffiths (sgr), Stanford Binet (sbi), Tepsi (tep), Vineland (vin).
In cases where the domain of the items isn't clear (vin, bar), the domain is coded as 'xx'.
A character vector of length length(x) with gcdg item names replaced by gsed item names.
Iris Eekhout, Stef van Buuren
https://docs.google.com/spreadsheets/d/1zLsSW9CzqshL8ubb7K5R9987jF4YGDVAW_NBw1hR2aQ/edit#gid=0
from <- c(
  "ag28", "gh2_19", "a14ps4", "b1m157", "mil6", "bm19",
  "a16fm4", "n22", "ag9", "gh6_5"
)
to <- rename_gcdg_gsed(from, copy = FALSE)
to
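The 9-position naming convention described above can be unpacked with plain substr() calls. The helper `decompose_sketch()` below is a hypothetical illustration in base R and is not a package function.

```r
# Split 9-character gsed item names into their four components:
# instrument (1-3), domain (4-5), mode (6) and sequence number (7-9).
decompose_sketch <- function(x) {
  data.frame(
    item       = x,
    instrument = substr(x, 1, 3),
    domain     = substr(x, 4, 5),
    mode       = substr(x, 6, 6),
    number     = substr(x, 7, 9),
    stringsAsFactors = FALSE
  )
}

decompose_sketch(c("ddifmd001", "aqigmc028", "mdsgmd006"))
```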
A demo dataset with developmental scores at the item level for 10 random children from the GSED Phase 1 data.
sample_hf
A data.frame with 10 rows and 57 variables:
Name | Label |
subjid | Integer, child ID |
agedays | Integer, age in days |
hf001 | Integer, ...: 1 = yes, 0 = no, NA = not administered |
hf002 | Integer, ...: 1 = yes, 0 = no, NA = not administered |
... | and so on.. |
Sample data for 55 gpa items forming GSED HF V1.
head(sample_hf)
A demo dataset with developmental scores at the item level for 10 random children from the GSED Phase 1 data.
sample_lf
A data.frame with 10 rows and 157 variables:
Name | Label |
subjid | Integer, child ID |
agedays | Integer, age in days |
lf001 | Integer, ...: 1 = yes, 0 = no, NA = not administered |
lf002 | Integer, ...: 1 = yes, 0 = no, NA = not administered |
... | and so on.. |
Sample data for 155 gto items from GSED LF.
head(sample_lf)
A demo dataset with developmental scores at the item level for 10 random children from the GSED Phase 1 data.
sample_sf
A data.frame with 10 rows and 141 variables:
Name | Label |
subjid | Integer, child ID |
agedays | Integer, age in days |
sf001 | Integer, Cry when hungry...: 1 = yes, 0 = no, NA = not administered |
sf002 | Integer, Look at/focus...: 1 = yes, 0 = no, NA = not administered |
... | and so on.. |
Sample data for 139 gpa items from GSED SF.
head(sample_sf)
This function sorts the item names according to instrument, domain, mode and number. The user can specify the sorting order.
sort_itemnames(x, order = "idnm")
order_itemnames(x, order = "idnm")
x |
A character vector containing item names (gsed lexicon) |
order |
A four-letter string specifying the sorting order.
The four letters are: |
sort_itemnames() returns a character vector with length(x) sorted elements. order_itemnames() returns an integer vector of length length(x) with positions of the sorted elements.
Stef van Buuren
itemnames <- c("aqigmc028", "grihsd219", "", "by1mdd157", "mdsgmd006")
sort_itemnames(itemnames)
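How a four-letter order string can drive a multi-key sort is sketched below in base R. The helper `order_sketch()` is hypothetical, and the letter codes (i = instrument, d = domain, n = number, m = mode) are an assumption inferred from the description above. The package functions may treat edge cases such as empty names differently.

```r
# Order 9-character gsed item names by their components.
# Assumed letter codes: i = instrument, d = domain, n = number, m = mode.
order_sketch <- function(x, order = "idnm") {
  parts <- list(
    i = substr(x, 1, 3),
    d = substr(x, 4, 5),
    m = substr(x, 6, 6),
    n = as.integer(substr(x, 7, 9))
  )
  keys <- strsplit(order, "")[[1]]
  stopifnot(all(keys %in% names(parts)))
  # Sort by the components in the order given by the letters
  do.call(base::order, unname(parts[keys]))
}

x <- c("ddigmd002", "ddifmd001", "aqigmc028")
x[order_sketch(x)]            # instrument first, then domain
x[order_sketch(x, "nidm")]    # item number first
```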