| Title: | D-Score for Child Development |
|---|---|
| Description: | The D-score summarizes a child's performance on developmental milestones into a single number. Its key feature is its generic nature. The method does not depend on a specific measurement instrument. The statistical method underlying the D-score is described in van Buuren et al. (2025) <doi:10.1177/01650254241294033>. This package implements model keys to convert milestone scores to D-scores; maps instrument-specific item names to a generic 9-position naming convention; computes D-scores and their precision from a child's milestone scores; and converts D-scores to Development-for-Age Z-scores (DAZ) using age-conditional reference standards. |
| Authors: | Stef van Buuren [cre, aut], Iris Eekhout [aut], Arjan Huizing [aut], Jonathan Seiden [aut] |
| Maintainer: | Stef van Buuren <[email protected]> |
| License: | Apache License (>= 2) |
| Version: | 2.1.0 |
| Built: | 2026-05-22 06:08:18 UTC |
| Source: | https://github.com/d-score/dscore |
The built-in variable builtin_domaintable contains the domain and weight
of items for measuring early child development.
builtin_domaintable
A data.frame with variables:
| Name | Label |
|---|---|
| set | Item name, gsed lexicon |
| domain | Name of the developmental domain |
| item | Item name, gsed lexicon |
| weight | Proportional weight in domain based on expert votes |
The builtin_domaintable is created by script
data-raw/R/save_builtin_domaintable.R.
Votes were collected by Gareth McCray in surveys administered to several subject matter experts (SMEs). Each SME could attribute one or more domains to an item, distributing their vote over multiple domains where applicable. For each item, the SME votes were converted to a proportional distribution over the domains.
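The weighting step can be illustrated with a small sketch. The vote counts, item names and the `votes_to_weights()` helper below are hypothetical and not part of the package; the sketch only assumes that the weight of an item in a domain is its share of the votes cast for that item.

```r
# Hypothetical vote counts: one row per item, one column per domain
votes <- matrix(
  c(4, 1, 0,
    0, 3, 3,
    5, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    item = c("item_a", "item_b", "item_c"),
    domain = c("grossmotor", "finemotor", "language")
  )
)

# Convert votes to proportional weights per item (each row sums to 1)
votes_to_weights <- function(votes) {
  votes / rowSums(votes)
}

weights <- votes_to_weights(votes)
round(weights, 2)

# Item-domain pairs that hold at least half of the votes,
# comparable in spirit to a minimum vote weight threshold
weights >= 0.5
```

An item such as `item_b`, with votes split evenly over two domains, would count for both domains at a threshold of 0.5 and for neither at a higher threshold.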
Compiled by Iris Eekhout using different sources
A data frame with administrative information per item with difficulty
estimates (tau) from the Rasch model. The item bank provides the basic
information to calculate D-scores. The items in the item bank
are a subset of all items as collected in builtin_itemtable.
builtin_itembank
A data.frame with variables:
| Name | Label |
|---|---|
| key | String indicating a specific Rasch model |
| item | Item name, gsed lexicon |
| tau | Difficulty estimate |
| label | Label (English) |
| instrument | Instrument code |
| domain | Domain code |
| mode | Administration mode |
| number | Item number |
The difficulty estimates were estimated by a Rasch model. The key
indicates the specific Rasch model used to estimate the difficulty.
Strictly speaking, one can only compare D-scores calculated from the
same key.
Updates:
Dec 01, 2022 - Overwrite labels of gto by correct item order.
Dec 05, 2022 - Adds key gsed2212, adding instruments gl1 and gs1, and
defining correct order for gto
Jan 05, 2023 - Adds instrument gh1 to key gsed2212
Oct 10, 2025 - Adds key gsed2510 for instruments gl1 and gs1 (281 items)
Oct 21, 2025 - Updates keys gsed2212, gsed2406 for gh1 (55 -> 48 items)
Oct 21, 2025 - Adds gh1 extension to key gsed2510 (48 items)
Oct 23, 2025 - Adds by3 extension to key gsed2510 (242 items)
dscore(), get_tau(), builtin_itemtable()
    # count number of items per instrument in each key
    table(builtin_itembank$instrument, builtin_itembank$key)
The built-in variable builtin_itemtable contains the name and label
of items for measuring early child development.
builtin_itemtable
A data.frame with variables:
| Name | Label |
|---|---|
| item | Item name, gsed lexicon |
| equate | Equate group |
| label | Label (English) |
The builtin_itemtable is created by script
data-raw/R/save_builtin_itemtable.R.
Updates:
May 30, 2022 - added gto (LF) and gpa (SF) items
June 1, 2022 - added seven gsd items
Nov 24, 2022 - Added instruments gs1, gs2
Dec 01, 2022 - Labels of gto replaced by correct order. Incorrect item order affects analyses done on LF between 20220530 - 20221201 !!!
Dec 05, 2022 - Redefines gs1 and instrument for Phase 2, removes gs2 (139). Adds gl1 (Long Form Phase 2, 155 items)
Jan 05, 2023 - Adds 55 items from GSED-HF
Jul 15, 2025 - Renames gpaclc088 -> gpaclc089 (Can your child say five or more separate words) and gpasec089 -> gpasec088 (Is your child able to pee and poo)
Oct 20, 2025 - Replaces the HF 55-item list by the HF 48-item list
Compiled by Stef van Buuren using different sources
A key contains the item difficulty estimates from a given Rasch model.
The difficulty estimates (tau) within a given key are used to
calculate D-scores. D-scores can only be compared when calculated
from the same key.
builtin_keys
builtin_keys is a data.frame with variables:
| Name | Label |
|---|---|
| key | String. Name of the key indicating the Rasch model |
| base_population | String. Name of the base population for the key |
| n_items | Number of items in the key |
| n_instruments | Number of instruments in the key |
| intercept | Intercept to convert logit into D-score |
| slope | Slope to convert logit into D-score |
| from | Starting value of the quadrature points |
| to | Stopping value of the quadrature points |
| by | Increment of the quadrature points |
| retired | Has the key been retired? |
Updated: 20251023 SvB: Added builtin_keys table by
data-raw\data\R\save_builtin_keys.R
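The roles of `intercept`, `slope`, `from`, `to` and `by` can be sketched in a few lines of base R. The numeric values below are hypothetical placeholders, not the values stored in `builtin_keys`, and the helper names are illustrative. The sketch assumes the conversion is the linear map D = intercept + slope * logit.

```r
# Hypothetical key settings (NOT taken from builtin_keys)
key <- list(intercept = 50, slope = 4, from = -10, to = 100, by = 1)

# Linear transform from the logit scale to the D-score scale
logit_to_d <- function(logit, intercept, slope) {
  intercept + slope * logit
}

# Inverse transform, from the D-score scale back to the logit scale
d_to_logit <- function(d, intercept, slope) {
  (d - intercept) / slope
}

logit <- c(-2, 0, 1.5)
d <- logit_to_d(logit, key$intercept, key$slope)
d

# Equally spaced quadrature points spanning the D-score range
qp <- seq(from = key$from, to = key$to, by = key$by)
length(qp)
range(qp)
```

Because both the transform and the quadrature grid belong to the key, scores computed under different keys live on different grids and scalings, which is one reason to compare only D-scores from the same key.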
A data frame containing the age-dependent distribution of the D-score for children aged 0-5 years. The distribution is modelled after the LMS distribution (Cole & Green, 1992) or BCT model (Stasinopoulos & Rigby, 2022) and is equal for both boys and girls. The LMS/BCT values can be used to graph reference charts and to calculate age-conditional Z-scores, also known as the Development-for-Age Z-score (DAZ).
builtin_references
A data.frame with the following variables:
| Name | Label |
|---|---|
| population | Name of the reference population |
| key | D-score key, e.g., "dutch", "gcdg" or "gsed" |
| distribution | Distribution family: "LMS" or "BCT" |
| age | Decimal age in years |
| mu | M-curve, median D-score, P50 |
| sigma | S-curve, spread expressed as coefficient of variation |
| nu | L-curve, the lambda coefficient of the LMS/BCT model for skewness |
| tau | Kurtosis parameter in the BCT model |
| P3 | P3 percentile |
| P10 | P10 percentile |
| P25 | P25 percentile |
| P50 | P50 percentile |
| P75 | P75 percentile |
| P90 | P90 percentile |
| P97 | P97 percentile |
| SDM2 | -2SD centile |
| SDM1 | -1SD centile |
| SD0 | 0SD centile, median |
| SDP1 | +1SD centile |
| SDP2 | +2SD centile |
Here are more details on the reference population:
The "dutch" references were calculated from the SMOCC data, and cover
age range 0-2.5 years (van Buuren, 2014).
The "gcdg" references were calculated from the 15 cohorts of the
GCDG-study, and cover age range 0-5 years (Weber, 2019).
The "phase1" references were calculated from the GSED Phase 1 validation
data (GSED-BGD, GSED-PAK, GSED-TZA) and cover age range 2w-3.5 years. The
age range 3.5-5 yrs is linearly extrapolated and is only indicative.
The "preliminary_standards" were calculated from the GSED Phase 1 validation
data (GSED-BGD, GSED-PAK, GSED-TZA) using a subset of children whose
covariates indicate healthy development.
The "descriptive" references were calculated from the GSED Phase 1 & 2
validation data (GSED-BGD, GSED-BRA, GSED-CHN, GSED-CIV, GSED-NLD, GSED-PAK,
GSED-TZA) and cover age range 2w-3.5 years. The age range 3.5-5 yrs is linearly
extrapolated and is only indicative. The source code for the relevant
calculations can be found in https://github.com/D-score/gsedscripts/blob/main/inst/scripts/phase2/models/purify.R
and https://github.com/D-score/gsedscripts/blob/main/inst/scripts/phase2/models/fit_core_model.R.
The GSED site references are specific to the site (GSED-BGD, GSED-PAK, GSED-TZA, GSED-BRA, GSED-CHN, GSED-CIV, GSED-NLD) and cover age range 0-3.5 years. NOTE: Between ages 3.5 and 4.0, the GSED site references are linearly extrapolated. Values beyond age 3.5 years are only indicative.
Cole TJ, Green PJ (1992). Smoothing reference centile curves: The LMS method and penalized likelihood. Statistics in Medicine, 11(10), 1305-1319. https://doi.org/10.1002
Van Buuren S (2014). Growth charts of human development. Stat Methods Med Res, 23(4), 346-368. https://doi.org/10.1177/0962280212473300
Weber AM, Rubio-Codina M, Walker SP, van Buuren S, Eekhout I, Grantham-McGregor S, Caridad Araujo M, Chang SM, Fernald LCH, Hamadani JD, Hanlon A, Karam SM, Lozoff B, Ratsifandrihamanana L, Richter L, Black MM (2019). The D-score: a metric for interpreting the early development of infants and toddlers across global settings. BMJ Global Health, 4: e001724. https://doi.org/10.1136/bmjgh-2019-001724
van Buuren S, Eekhout I, McCray G, Lancaster GA, Waldman MR, McCoy DC, Gladstone M, Cavallera, V, Dua T, Black MM, GSED Team (2025). Enhancing comparability in early child development assessment with the D-score. International Journal of Behavioral Development, 49(4), 348-364, https://doi.org/10.1177/01650254241294033
Stasinopoulos M, Rigby R (2022). gamlss.dist: Distributions for Generalized Additive Models for Location Scale and Shape, R package version 6.0-3, https://CRAN.R-project.org/package=gamlss.dist
    # get an overview of available references per key
    table(builtin_references$population, builtin_references$key)
The built-in variable builtin_translate contains a table for
translating item names between naming schemes (lexicons).
builtin_translate
A data.frame with variables:
| Name | Label |
|---|---|
| phase1 | Item names, Phase 1 data |
| phase2 | Item names, Phase 2 data |
| gsed | gsed lexicon |
| gsed2 | gto/gpa lexicon for LF/SF |
| gsed3 | gl1/gs1 lexicon for LF/SF |
| short1 | Short item name, phase 1 order |
| short2 | Short item name, phase 2 order |
| instrument | Instrument code |
| seq_phase1 | Phase 1 order |
| seq_phase2 | Phase 2 order |
| label | Item label (English) |
The builtin_translate is created by script
data-raw/R/save_builtin_translate.R.
Updates:
July 2025 - Transferred from gsedread package
Compiled by Stef van Buuren
If the difficulty tau_j of item j lies outside the relevance interval from EAP + rello to EAP + relhi around the dynamic EAP, the procedure ignores the score of item j.
calculate_posterior(scores, tau, qp, scale, mu, sd, relhi, rello)
scores |
A vector with PASS/FAIL observations.
Scores are coded numerically as |
tau |
A vector containing the item difficulties for the item
scores in |
qp |
Numeric vector of equally spaced quadrature points. |
scale |
Scale expansion |
mu |
Numeric scalar. The mean of the prior. |
sd |
Numeric scalar. Standard deviation of the prior. |
relhi |
Positive numeric scalar. Upper end of the relevance interval |
rello |
Negative numeric scalar. Lower end of the relevance interval |
A list with three elements:
| Name | Label |
|---|---|
| eap | Mean of the posterior |
| gp | Vector of quadrature points |
| posterior | Vector with posterior distribution. |
Since dscore V40.1 the function does not return the "start" element.
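The updating scheme can be sketched in base R. This is a simplified illustration under stated assumptions, not the package implementation: the function name `eap_sketch()` is hypothetical, the pass probability is assumed to follow a Rasch model of the form `plogis((qp - tau) / scale)`, the prior is assumed to be normal, and items are assumed to be processed in the order given.

```r
# Sketch of EAP estimation on a grid of quadrature points.
# ASSUMPTIONS (not taken from the package source): Rasch pass probability
# plogis((qp - tau) / scale), a normal prior, items processed in order.
eap_sketch <- function(scores, tau, qp, scale = 1, mu = 0, sd = 5,
                       relhi = Inf, rello = -Inf) {
  # Normal prior, normalized to sum to 1 over the quadrature points
  posterior <- dnorm(qp, mean = mu, sd = sd)
  posterior <- posterior / sum(posterior)
  for (j in seq_along(scores)) {
    if (is.na(scores[j])) next
    # Dynamic EAP: mean of the current posterior
    eap <- sum(qp * posterior)
    # Skip items whose difficulty lies outside the relevance interval
    if (tau[j] < eap + rello || tau[j] > eap + relhi) next
    p_pass <- plogis((qp - tau[j]) / scale)
    likelihood <- if (scores[j] == 1) p_pass else 1 - p_pass
    # Bayes rule: posterior is proportional to prior times likelihood
    posterior <- posterior * likelihood
    posterior <- posterior / sum(posterior)
  }
  eap <- sum(qp * posterior)
  sem <- sqrt(sum((qp - eap)^2 * posterior))
  list(eap = eap, sem = sem, qp = qp, posterior = posterior)
}

qp <- seq(-10, 100, by = 1)
pass <- eap_sketch(c(1, 1, 1), tau = c(18, 20, 22), qp = qp, mu = 20)
fail <- eap_sketch(c(0, 0, 0), tau = c(18, 20, 22), qp = qp, mu = 20)
c(pass = pass$eap, fail = fail$eap)
```

Passing all three items moves the posterior mean above the prior mean, and failing all three moves it below. An item far outside the relevance interval leaves the posterior unchanged.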
Stef van Buuren, Arjan Huizing, 2020
Returns the age-interpolated median of the D-score of the default reference for a given key.
count_mu(t, key, prior_mean_NA = NA_real_)
| Argument | Description |
|---|---|
| t | Decimal age, numeric vector |
| key | Character, key of the reference population |
| prior_mean_NA | Numeric, prior mean when age is missing |
Do not use this function if you want the median D-score for a specific reference.
DEPRECATED in dscore 1.9.6
A vector of length length(t) with the median of the default reference
population for the key.
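As a minimal sketch of what an age-interpolated median means, the example below interpolates linearly in a small reference table with `approx()`. The table values and the helper `interpolate_mu()` are hypothetical; the package functions use their own reference curves, and the handling of missing ages shown here is an assumption based on the description of `prior_mean_NA`.

```r
# Hypothetical reference medians by age (NOT actual reference values)
ref <- data.frame(
  age = c(0, 0.5, 1, 2),
  mu  = c(10, 30, 45, 60)
)

# Age-interpolated median; missing ages receive prior_mean_NA,
# ages outside the table range return NA
interpolate_mu <- function(t, ref, prior_mean_NA = NA_real_) {
  mu <- rep(prior_mean_NA, length(t))
  ok <- !is.na(t)
  mu[ok] <- approx(x = ref$age, y = ref$mu, xout = t[ok])$y
  mu
}

interpolate_mu(c(0.25, 1.5, NA), ref, prior_mean_NA = 50)
```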
Returns the age-interpolated median of the Dutch references (van Buuren 2014).
The working range is 0-3 years. This function is used
to set prior mean under key "dutch".
count_mu_dutch(t)
| Argument | Description |
|---|---|
| t | Decimal age, numeric vector |
A vector of length length(t) with the median of the Dutch references.
Internal function. Called by dscore()
    dscore:::count_mu_dutch(0:2)
Returns the age-interpolated median of the GCDG references (Weber
et al, 2019). The working range is 0-4 years. This function is used
to set prior mean under keys "gcdg" and "gsed1912".
count_mu_gcdg(t)
| Argument | Description |
|---|---|
| t | Decimal age, numeric vector |
A vector of length length(t) with the median of the GCDG references.
Internal function. Called by dscore()
    dscore:::count_mu_gcdg(0:2)
Returns the age-interpolated median of the phase1 references
based on LF & SF in GSED-BGD, GSED-PAK, GSED-TZA. This function is used
to set prior mean under keys "293_0" and "gsed2212".
count_mu_phase1(t)
| Argument | Description |
|---|---|
| t | Decimal age, numeric vector |
The interpolation is done in two rounds. First round: Calculate D-scores using .gcdg prior-mean, calculate reference, estimate round 1 parameters used in this function. Round 2: Calculate D-score using round 1 estimates as the prior mean (most differences are within 0.1 D-score points), recalculate references, estimate round 2 parameters used in this function.
Round 1:
Count model, <= 9 months: $21.3449 + 26.4916\,t + 7.0251 \log(t + 0.2)$
Count model, > 9 months and <= 3.5 years: $14.69947 - 12.18636\,t + 69.11675 \log(t + 0.92)$
Linear model, > 3.5 years: $61.40956 + 3.80904\,t$
Round 2:
Count model, < 9 months: $20.5883 + 27.3376\,t + 6.4254 \log(t + 0.2)$
Count model, > 9 months and < 3.5 years: $14.63748 - 12.11774\,t + 69.05463 \log(t + 0.92)$
Linear model, > 3.5 years: $61.37967 + 3.83513\,t$
The working range is 0-3.5 years. After the age of 3.5 years, the function will increase at an arbitrary rate of 3.8 D-score points per year.
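The round 2 model can be written as a piecewise function of age. The sketch below is an illustration, not the package code: the name `count_model_round2()` is hypothetical, and it assumes that t is decimal age in years, that 9 months equals 0.75 years, that the third term of each Count model is a natural logarithm, i.e. `log(t + 0.2)` and `log(t + 0.92)`, and that each boundary age belongs to the lower age segment. The logarithm assumption is supported by the fact that the segments then join almost continuously at 0.75 and 3.5 years.

```r
# Sketch of the round 2 Count model as a piecewise function of age.
# ASSUMPTIONS: t in decimal years, 9 months = 0.75 years, natural log,
# boundary ages assigned to the lower age segment.
count_model_round2 <- function(t) {
  mu <- rep(NA_real_, length(t))
  young  <- !is.na(t) & t <= 0.75
  middle <- !is.na(t) & t > 0.75 & t <= 3.5
  old    <- !is.na(t) & t > 3.5
  mu[young]  <- 20.5883 + 27.3376 * t[young] + 6.4254 * log(t[young] + 0.2)
  mu[middle] <- 14.63748 - 12.11774 * t[middle] +
    69.05463 * log(t[middle] + 0.92)
  mu[old]    <- 61.37967 + 3.83513 * t[old]
  mu
}

round(count_model_round2(c(0.5, 1, 2, 3.5, 4)), 2)

# Size of the jump between segments at the two boundaries
count_model_round2(0.75 + 1e-8) - count_model_round2(0.75)
count_model_round2(3.5 + 1e-8) - count_model_round2(3.5)
```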
A vector of length length(t) with the median of the phase1 references.
Internal function. Called by dscore()
Stef van Buuren, on behalf of GSED project
    dscore:::count_mu_phase1(0:5)
Returns the age-interpolated median of the preliminary_standards
based on LF & SF in seven GSED countries. This function is used
to set prior mean under keys "gsed2406" and "gsed2510".
count_mu_preliminary_standards(t, key = NULL)
| Argument | Description |
|---|---|
| t | Decimal age, numeric vector |
| key | Character, key name |
A vector of length length(t) with the median of the preliminary standards.
Internal function. Called by dscore()
Stef van Buuren, on behalf of GSED project
    dscore:::count_mu_preliminary_standards(0:5)
The daz() function calculates the Development-for-Age Z-score (DAZ).
The DAZ represents a child's D-score after adjusting for age by an
external age-conditional reference.
    daz(d, x, reference_table = NULL, dec = 3, verbose = FALSE)
    zad(z, x, reference_table = NULL, dec = 2, verbose = FALSE)
d |
Vector of D-scores |
x |
Vector of ages (decimal age) |
reference_table |
A |
dec |
The number of decimals (default |
verbose |
Print out the used reference table (default |
z |
Vector of standard deviation scores (DAZ) |
zad() is the inverse of daz(): given age and
the Z-score, it finds the raw D-score.
Note 1: The Box-Cox Cole and Green (BCCG) and Box-Cox t (BCT)
distributions model only positive D-score values. To increase
robustness, the daz() and zad() functions will round up any
D-scores lower than 1.0 to 1.0.
Note 2: The daz() and zad() functions call modified versions of the
pBCT() and qBCT() functions from gamlss for better handling
of NAs and rounding.
daz(): Unnamed numeric vector with Z-scores of length length(d).
zad(): Unnamed numeric vector with D-scores of length length(z).
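For a reference of the LMS family, the relation between a D-score and its Z-score follows the formula of Cole and Green (1992). The sketch below illustrates that formula and its inverse for a single age. The helper names `lms_z()` and `lms_y()` and the L, M and S values are hypothetical. The package itself calls modified pBCT() and qBCT() functions, which this sketch does not reproduce.

```r
# Z-score under the LMS model (Cole & Green, 1992), for scalar L, M, S:
#   z = ((y / M)^L - 1) / (L * S)   if L != 0
#   z = log(y / M) / S              if L == 0
lms_z <- function(y, L, M, S) {
  y <- pmax(y, 1)  # round up values below 1.0, as described in Note 1
  if (L == 0) log(y / M) / S else ((y / M)^L - 1) / (L * S)
}

# Inverse: the measurement y that corresponds to Z-score z
lms_y <- function(z, L, M, S) {
  if (L == 0) M * exp(S * z) else M * (1 + L * S * z)^(1 / L)
}

# Hypothetical LMS values at one age (NOT actual reference values)
L <- 1.2
M <- 45
S <- 0.08

z <- lms_z(c(38, 45, 52), L, M, S)
round(z, 2)

# Round trip: converting the Z-scores back recovers the D-scores
lms_y(z, L, M, S)
```

A D-score equal to the median M maps to a Z-score of zero, and the round trip shows that the two functions are inverses of each other for values above 1.0.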
Stef van Buuren
Cole TJ, Green PJ (1992). Smoothing reference centile curves: The LMS method and penalized likelihood. Statistics in Medicine, 11(10), 1305-1319.
    # using default reference and key
    daz(d = c(35, 50), x = c(0.5, 1.0))

    # print out names of the used reference table
    daz(d = c(35, 50), x = c(0.5, 1.0), verbose = TRUE)

    # using the default reference in key gcdg
    reftab <- get_reference(key = "gcdg")
    daz(d = c(35, 50), x = c(0.5, 1.0), reference_table = reftab)

    # using Dutch reference in default key
    reftab <- get_reference(population = "dutch", verbose = TRUE)
    daz(d = c(35, 50), x = c(0.5, 1.0), reference_table = reftab)

    # population median at ages 0.5, 1 and 2 years, default reference
    zad(z = rep(0, 3), x = c(0.5, 1, 2))

    # population median at ages 0.5, 1 and 2 years, gcdg key
    reftab <- get_reference(key = "gcdg", verbose = TRUE)
    zad(z = rep(0, 3), x = c(0.5, 1, 2), reference_table = reftab)

    # population median at ages 0.5, 1 and 2 years, dutch key
    reftab <- get_reference(key = "dutch", verbose = TRUE)
    zad(z = rep(0, 3), x = c(0.5, 1, 2), reference_table = reftab)
Domain-specific D-score
    ddomain(
      data, set, domain = NULL, vote_weight = NULL,
      items = names(data), key = NULL, population = NULL,
      xname = "age", xunit = c("decimal", "days", "months"),
      prepend = NULL, itembank = NULL, metric = c("dscore", "logit"),
      prior_mean = NULL, prior_mean_NA = NULL,
      prior_sd = NULL, prior_sd_NA = NULL,
      transform = NULL, qp = NULL, dec = c(2L, 3L),
      relevance = c(-Inf, Inf), algorithm = c("current", "1.8.7"),
      verbose = FALSE
    )
data |
A |
set |
String. The name of the set of domains to use. See
|
domain |
character vector of the name of the domain(s) for which to
compute the domain score. Per default all domains in the |
vote_weight |
minimum proportion of votes (weight) for a domain that an item needs to have to count for that domain. |
items |
A character vector containing names of items to be
included into the D-score calculation. Milestone scores are coded
numerically as |
key |
String. The key identifies 1) the difficulty estimates
pertaining to a particular Rasch model, and 2) the prior mean and standard
deviation of the prior distribution for calculating the D-score.
The default key |
population |
String. The name of the reference population to calculate
DAZ.
Use |
xname |
A string with the name of the age variable in
|
xunit |
A string specifying the unit in which age is measured
(either |
prepend |
Character vector with column names in |
itembank |
A |
metric |
A string, either |
prior_mean |
|
prior_mean_NA |
|
prior_sd |
|
prior_sd_NA |
|
transform |
Numeric vector, length 2, containing the intercept
and slope of the linear transform from the logit scale into the
D-score scale. The default ( |
qp |
Numeric vector of equally spaced quadrature points.
This vector should span the range of all D-score or logit values.
The default ( |
dec |
A vector of two integers specifying the number of
decimals for rounding the D-score and DAZ, respectively.
The default is |
relevance |
A numeric vector of length 2 with the lower and
upper bounds of the relevance interval. The procedure calculates
a dynamic EAP for each item. If the difficulty level (tau) of the
next item is outside the relevance interval around EAP, the procedure
ignores the score on the item. The default is |
algorithm |
Computational method, for backward compatibility.
Either |
verbose |
Logical. Print settings. |
The ddomain() function returns a named list of data.frame objects, one per
domain, each with nrow(data) rows. The list names are the domain names.
Each data.frame consists of the following columns:
| Name | Label |
|---|---|
| a | Decimal age (years) |
| n | Number of items with valid (0/1) data |
| p | Percentage of passed milestones |
| d | D-score, mean of posterior distribution |
| sem | Standard error of measurement, standard deviation of the posterior |
| daz | D-score corrected for age, calculated in Z-scale (for metric "dscore") |
The D-score in column d is a linear scale, with values usually ranging
from 0 to 100. The D-score is NA if age is missing or if age is lower
than -1/12. It is possible to calculate D-scores for cases with missing ages
by setting prior_mean_NA and prior_sd_NA to some reasonable value, e.g.,
prior_mean_NA = 50 and prior_sd_NA = 20, for the sample at hand.
The SEM is a positive number that quantifies the uncertainty of the D-score.
It is NA if the D-score is NA.
The DAZ in column daz is a Z-score that corrects the D-score for age. It
is NA when there are no reference values for the given age, or when
the D-score is extremely unlikely to be valid at the given age.
dscore(), builtin_domaintable()
    sample <- dscore::gsample
    colnames(sample) <- dscore::rename_vector(colnames(sample),
      lexin = "gsed2", lexout = "gsed3")
    sample <- sample |>
      dplyr::select(subjid, agedays, starts_with("gs1")) |>
      dplyr::mutate(age = agedays / 365.25)
    ddomain(sample, set = "GFCLS")
    ddomain(sample, set = "GFCLS", domain = c("finemotor", "grossmotor"))
    ddomain(sample, set = "GFCLS", domain = c("language"))
This utility function decomposes item names into components: instrument, domain, mode and number
decompose_itemnames(x)
| Argument | Description |
|---|---|
| x | A character vector containing item names (gsed lexicon) |
The gsed naming convention is as follows. Positions 1-3 code the instrument, positions 4-5 code the domain, position 6 codes the mode (direct/caregiver/message), and positions 7-9 are an item sequence number.
A data.frame with length(x) rows and
four columns, named: instrument, domain, mode,
and number.
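The positional convention can be illustrated with a base R sketch built on `substr()`. The helper `split_itemnames()` is hypothetical and is not the package function. Returning empty strings for names that are not exactly nine characters long is an assumption made for this sketch.

```r
# Split 9-character gsed item names by position (base R sketch)
split_itemnames <- function(x) {
  ok <- !is.na(x) & nchar(x) == 9
  part <- function(start, stop) ifelse(ok, substr(x, start, stop), "")
  data.frame(
    instrument = part(1, 3),
    domain     = part(4, 5),
    mode       = part(6, 6),
    number     = part(7, 9),
    stringsAsFactors = FALSE
  )
}

split_itemnames(c("aqigmc028", "grihsd219", "", "by1mdd157"))
```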
Stef van Buuren
https://docs.google.com/spreadsheets/d/1zLsSW9CzqshL8ubb7K5R9987jF4YGDVAW_NBw1hR2aQ/edit#gid=0
    itemnames <- c("aqigmc028", "grihsd219", "", "by1mdd157", "mdsgmd006")
    decompose_itemnames(itemnames)
The dscore() function estimates the following quantities: D-score,
a numeric score that quantifies child development by one number,
Development-for-Age Z-score (DAZ) that corrects the D-score for age,
standard error of measurement (SEM) of the D-score.
    dscore(
      data, items = names(data), key = NULL, population = NULL,
      xname = "age", xunit = c("decimal", "days", "months"),
      prepend = NULL, itembank = NULL, metric = c("dscore", "logit"),
      prior_mean = NULL, prior_mean_NA = NULL,
      prior_sd = NULL, prior_sd_NA = NULL,
      transform = NULL, qp = NULL, dec = c(2L, 3L),
      relevance = c(-Inf, Inf), algorithm = c("current", "1.8.7"),
      verbose = FALSE
    )

    dscore_posterior(
      data, items = names(data), key = NULL, population = NULL,
      xname = "age", xunit = c("decimal", "days", "months"),
      prepend = NULL, itembank = NULL, metric = c("dscore", "logit"),
      prior_mean = NULL, prior_mean_NA = NULL,
      prior_sd = NULL, prior_sd_NA = NULL,
      transform = NULL, qp = NULL, dec = c(2L, 3L),
      relevance = c(-Inf, Inf), algorithm = c("current", "1.8.7"),
      verbose = FALSE
    )
data |
A |
items |
A character vector containing names of items to be
included into the D-score calculation. Milestone scores are coded
numerically as |
key |
String. The key identifies 1) the difficulty estimates
pertaining to a particular Rasch model, and 2) the prior mean and standard
deviation of the prior distribution for calculating the D-score.
The default key |
population |
String. The name of the reference population to calculate
DAZ.
Use |
xname |
A string with the name of the age variable in
|
xunit |
A string specifying the unit in which age is measured
(either |
prepend |
Character vector with column names in |
itembank |
A |
metric |
A string, either |
prior_mean |
|
prior_mean_NA |
|
prior_sd |
|
prior_sd_NA |
|
transform |
Numeric vector, length 2, containing the intercept
and slope of the linear transform from the logit scale into the
D-score scale. The default ( |
qp |
Numeric vector of equally spaced quadrature points.
This vector should span the range of all D-score or logit values.
The default ( |
dec |
A vector of two integers specifying the number of
decimals for rounding the D-score and DAZ, respectively.
The default is |
relevance |
A numeric vector of length 2 with the lower and
upper bounds of the relevance interval. The procedure calculates
a dynamic EAP for each item. If the difficulty level (tau) of the
next item is outside the relevance interval around EAP, the procedure
ignores the score on the item. The default is |
algorithm |
Computational method, for backward compatibility.
Either |
verbose |
Logical. Print settings. |
The scoring algorithm is based on the method by Bock and Mislevy (1982). The method uses Bayes rule to update a prior ability into a posterior ability.
The item names should correspond to the "gsed" lexicon.
A key is defined by the set of estimated item difficulties.
| Key | Model | Quadrature | Instruments | Direct/Caregiver | Reference |
|---|---|---|---|---|---|
| "dutch" | 75_0 | -10:80 | 1 | direct | Van Buuren, 2014/2020 |
| "gcdg" | 565_18 | -10:100 | 13 | direct | Weber, 2019 |
| "gsed1912" | 807_17 | -10:100 | 21 | mixed | GSED Team, 2019 |
| "293_0" | 293_0 | -10:100 | 2 | mixed | GSED Team, 2022 |
| "gsed2212" | 818_6 | -10:100 | 27 | mixed | GSED Team, 2022 |
| "gsed2406" | 818_6 | -10:100 | 27 | mixed | GSED Team, 2024 |
| "gsed2510" | 281_0 | -10:125 | 3 | mixed | GSED Team, 2025 |
As a general rule, one should only compare D-scores
that are calculated using the same key and the same
set of quadrature points. For calculating D-scores on new data,
the advice is to use the default, which currently is "gsed2510".
Currently, key "gsed2510" is defined for instrument codes gs1
(GSED SF), gl1 (GSED LF) and gh1 (GSED HF). If you
have another instrument, use the key "gsed2406".
The default starting prior is a mean calculated from a so-called
"Count model" that describes mean D-score as a function of age.
The Count models are implemented in the function get_mu().
By default, the spread of the starting prior
is 5 D-score points around the mean D-score, which corresponds to
approximately 1.5 to 2 times the normal spread of children of a given age. The
starting prior is informative for very short tests (say <5 items), but has
little impact on the posterior for larger tests.
The dscore() function returns a data.frame with nrow(data) rows.
Optionally, the first block of columns can be copied to the
result by using prepend. The second block consists of the
following columns:
| Name | Label |
|---|---|
| a | Decimal age (years) |
| n | Number of items with valid (0/1) data |
| p | Percentage of passed milestones |
| d | D-score, mean of posterior distribution |
| sem | Standard error of measurement, standard deviation of the posterior |
| daz | D-score corrected for age, calculated in Z-scale (for metric "dscore") |
The D-score in column d is a linear scale, with values usually ranging
from 0 to 100. The D-score is NA if age is missing or if age is lower
than -1/12. It is possible to calculate D-scores for cases with missing ages
by setting prior_mean_NA and prior_sd_NA to some reasonable value, e.g.,
prior_mean_NA = 50 and prior_sd_NA = 20, for the sample at hand.
The SEM is a positive number that quantifies the uncertainty of the D-score.
It is NA if the D-score is NA.
The DAZ in column daz is a Z-score that corrects the D-score for age. It
is NA when there are no reference values for the given age, or when
the D-score is extremely unlikely to be valid at the given age.
Advanced applications: The dscore_posterior() function returns a
data frame with nrow(data) rows and length(qp) plus prepended columns
with the full posterior density of the D-score at each quadrature point.
If no valid responses are found, dscore_posterior() returns the
prior density. Versions prior to 1.8.5 returned a matrix (instead of
a data.frame). Code that depends on the result being a matrix may break
and may need adaptation.
Stef van Buuren, Iris Eekhout, Arjan Huizing (2022)
Bock DD, Mislevy RJ (1982). Adaptive EAP Estimation of Ability in a Microcomputer Environment. Applied Psychological Measurement, 6(4), 431-444.
Van Buuren S (2014). Growth charts of human development. Stat Methods Med Res, 23(4), 346-368. https://doi.org/10.1177/0962280212473300
Weber AM, Rubio-Codina M, Walker SP, van Buuren S, Eekhout I, Grantham-McGregor S, Caridad Araujo M, Chang SM, Fernald LCH, Hamadani JD, Hanlon A, Karam SM, Lozoff B, Ratsifandrihamanana L, Richter L, Black MM (2019). The D-score: a metric for interpreting the early development of infants and toddlers across global settings. BMJ Global Health, 4: e001724. https://stefvanbuuren.name/publications/#weber-2019-1
builtin_keys(), builtin_itembank(), builtin_itemtable(),
builtin_references(), get_tau(), posterior(), milestones()
# using all defaults and properly formatted data
sf <- dscore::triple[, 1:141]
ds <- dscore(sf)
head(ds)

# step-by-step example demonstrating
# all possible response vectors for 3 items
data <- data.frame(
  id = c(
    "Jane", "Martin", "ID-3", "No. 4", "Five", "6",
    NA_character_, as.character(8:10)),
  age = rep(round(21 / 365.25, 4), 10),
  gs1sec001 = c(NA, NA, 0, 0, 0, 1, 0, 1, 1, 1),
  gs1moc002 = c(NA, NA, NA, 0, 1, 0, 1, 0, 1, 1),
  gs1sec003 = c(NA, 0, 0, 1, 0, 0, 1, 1, 0, 1)
)

# what are these items?
items <- names(data)[3:5]
get_labels(items)

# difficulty parameter in default key
get_tau(items, verbose = TRUE)

# calculate D-score
# the same sumscore leads to the same D-score (column d)
dscore(data)

# prepend id variable to output
dscore(data, prepend = "id")

# or prepend all data
# dscore(data, prepend = colnames(data))

# calculate full posterior
p <- dscore_posterior(data)

# check that rows sum to 1
rowSums(p)

# plot full posterior for measurement 7
barplot(as.matrix(p[7, 12:36]),
  names = 1:25,
  xlab = "D-score", ylab = "Density", col = "grey",
  main = "Full D-score posterior for measurement in row 7",
  sub = "D-score (EAP) = 11.58, SEM = 3.99")

# plot P10, P50 and P90 of D-score references
g <- expand.grid(age = seq(0.1, 4, 0.1), p = c(0.1, 0.5, 0.9))
d <- zad(z = qnorm(g$p), x = g$age, verbose = TRUE)
matplot(
  x = matrix(g$age, ncol = 3), y = matrix(d, ncol = 3), type = "l",
  lty = 1, col = "blue", xlab = "Age (years)", ylab = "D-score",
  main = "D-score preliminary standards: P10, P50 and P90")
abline(h = seq(10, 80, 10), v = seq(0, 4, 0.5), col = "gray", lty = 2)

# add measurements made on very preterms, ga < 32 weeks
# we need key = "gsed2406" because DDI is not yet in key "gsed2510"
ds <- dscore(milestones, key = "gsed2406")
points(x = ds$a, y = ds$d, pch = 19, col = "red")
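The example above reports an EAP estimate and an SEM read from the full posterior. As a minimal base-R sketch of how such summaries can be computed from a discrete posterior, the grid `qp` and the masses `post` below are made-up values, and treating the SEM as the posterior standard deviation is an assumption rather than a statement about the package internals.

```r
# Hypothetical discrete posterior on a small grid of quadrature points.
qp <- seq(0, 20, by = 5)                 # quadrature points (D-score units)
post <- c(0.05, 0.20, 0.50, 0.20, 0.05)  # posterior mass, sums to 1

# Expected a posteriori (EAP) estimate: the posterior mean.
eap <- sum(qp * post)

# Assumed definition of the SEM: the posterior standard deviation.
sem <- sqrt(sum(post * (qp - eap)^2))

c(eap = eap, sem = sem)
```

With a symmetric posterior such as this one, the EAP coincides with the centre of the grid.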
This function calculates the ages at which a certain percent in the reference population passes the items.
get_age_equivalent(
  items,
  pct = c(10, 50, 90),
  key = NULL,
  population = NULL,
  transform = NULL,
  itembank = dscore::builtin_itembank,
  xunit = c("decimal", "days", "months"),
  verbose = FALSE
)
items |
A character vector containing names of items to be
included into the D-score calculation. Milestone scores are coded
numerically as |
pct |
Numeric vector with requested percentiles (0-100). The
default is |
key |
String. The key identifies 1) the difficulty estimates
pertaining to a particular Rasch model, and 2) the prior mean and standard
deviation of the prior distribution for calculating the D-score.
The default key |
population |
String. The name of the reference population to calculate
DAZ.
Use |
transform |
Numeric vector, length 2, containing the intercept
and slope of the linear transform from the logit scale into
the D-score scale. The default ( |
itembank |
A |
xunit |
A string specifying the unit in which age is measured
(either |
verbose |
Logical. Print settings. |
data.frame with four columns: item, d (D-score),
pct (percentile), and a (age-equivalent, in xunit units).
The function internally defines a scale factor given the key.
get_age_equivalent(c("gpagmc018", "gtogmd026", "ddicmm050"),
  key = "gsed2406", population = "dutch", verbose = TRUE)
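A rough sketch of the idea behind an age equivalent, not the package code: under a Rasch model, the D-score at which a given percentage of children pass an item follows from the logistic quantile function, and the age equivalent is the age at which a reference median reaches that D-score. The difficulty `tau`, the scale factor `scale` and the median table `ref` are hypothetical values chosen for illustration.

```r
tau <- 30            # hypothetical item difficulty (D-score units)
scale <- 2           # hypothetical scale factor
pct <- c(10, 50, 90)

# D-score at which pct percent pass: solve plogis((d - tau) / scale) = pct / 100
d <- tau + scale * qlogis(pct / 100)

# Hypothetical reference median of the D-score by age (years), increasing in age
ref <- data.frame(age = c(0.25, 0.5, 1, 2), mu = c(20, 28, 40, 55))

# Age equivalent: interpolate age as a function of the median D-score
a <- approx(x = ref$mu, y = ref$age, xout = d)$y

data.frame(pct = pct, d = d, a = a)
```

Because the median curve is monotone in age, swapping the roles of `x` and `y` in `approx()` is enough to invert it.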
The get_itemnames() function matches names against the 9-code
template. This is useful for quickly selecting names of items from a larger
set of names.
get_itemnames(
  x,
  instrument = NULL,
  domain = NULL,
  mode = NULL,
  number = NULL,
  strict = FALSE,
  itemtable = NULL,
  order = "idnm"
)
x |
A character vector, |
instrument |
A character vector with 3-position codes of instruments
that should match. The default |
domain |
A character vector with 2-position codes of domains
that should match. The default |
mode |
A character vector with 1-position codes of the mode
of administration. The default |
number |
A numeric or character vector with item numbers.
The default |
strict |
A logical specifying whether the resulting item
names must conform to one of the built-in names. The default is
|
itemtable |
A |
order |
A four-letter string specifying the sorting order.
The four letters are: |
The gsed naming convention is as follows. Positions 1-3 code the instrument, positions 4-5 code the domain, position 6 codes direct/caregiver/message, and positions 7-9 are an item sequence number.
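The 9-position convention can be illustrated with plain string slicing. This is a base-R sketch, independent of the package; the regular expression is an assumed approximation of a well-formed name, not the template that `get_itemnames()` uses.

```r
itemnames <- c("aqigmc028", "grihsd219", "", "age", "mdsgmd999")

# Assumed pattern: 3 alphanumerics, 2 letters, 1 letter, 3 digits
ok <- grepl("^[a-z0-9]{3}[a-z]{2}[a-z][0-9]{3}$", itemnames)
valid <- itemnames[ok]

# Split each valid name into its four components
parts <- data.frame(
  item = valid,
  instrument = substr(valid, 1, 3),
  domain = substr(valid, 4, 5),
  mode = substr(valid, 6, 6),
  number = as.integer(substr(valid, 7, 9)),
  stringsAsFactors = FALSE
)
parts
```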
A vector with names of items
Stef van Buuren
itemnames <- c("aqigmc028", "grihsd219", "", "age", "mdsgmd999")

# filter out impossible names
get_itemnames(itemnames)
get_itemnames(itemnames, strict = TRUE)

# only items from specific instruments
get_itemnames(itemnames, instrument = c("aqi", "mds"))
get_itemnames(itemnames, instrument = c("aqi", "mds"), strict = TRUE)

# get all items from the se domain of iyo instrument
get_itemnames(domain = "se", instrument = "iyo")

# get all items from the se domain with direct assessment mode
get_itemnames(domain = "se", mode = "d")

# get all items with numbers 70 and 73 from gm domain
get_itemnames(number = c(70, 73), domain = "gm")

# get item names from GSED SF (2023 version) in published order
items_sf <- get_itemnames(instrument = "gs1", order = "indm")

# get item names from GSED LF (2023 version) in published order
items_lf <- get_itemnames(instrument = "gl1")
items_lf <- items_lf[c(55:155, 1:54)]
The builtin_itemtable object in the dscore package
contains basic meta-information about items: a name, the equate group,
and the item label.
The get_itemtable() function returns a subset of items
in the itemtable.
get_itemtable(items = NULL, itemtable = NULL, decompose = FALSE)
items |
A logical or character vector of item names to return. The
default ( |
itemtable |
A |
decompose |
If |
A data.frame with seven columns.
head(get_itemtable(), 3)
get_itemtable(LETTERS[1:3], "")
The get_labels() function obtains the item labels for a
specified set of items.
get_labels(items = NULL, trim = NULL, itemtable = NULL)
items |
A character vector of item names to return. The
default ( |
trim |
The maximum number of characters in the label. The
default |
itemtable |
A |
A named character vector with length(items) elements with
item labels, in the same order as in items.
builtin_itemtable(), get_itemnames()
# get labels of first two MacArthur items
get_labels(get_itemnames(instrument = "mac", number = 1:2), trim = 40)
Returns the age-interpolated median of the D-score of the default reference for a given key.
get_mu(t, key, prior_mean_NA = NA_real_)
t |
Decimal age, numeric vector |
key |
Character, key of the reference population |
prior_mean_NA |
Numeric, prior mean when age is missing |
Use get_reference() for more options.
A vector of length length(t) with the median of the default reference
population for the key.
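To make "age-interpolated median" concrete, here is a minimal base-R sketch that uses linear interpolation. The reference table `ref` and the helper `mu_sketch()` are hypothetical; the package may interpolate differently.

```r
# Hypothetical reference table: median D-score (mu) at selected decimal ages
ref <- data.frame(age = c(0, 0.5, 1, 2, 3), mu = c(10, 28, 40, 55, 64))

# Sketch of an age-interpolated median with a fallback for missing ages
mu_sketch <- function(t, ref, prior_mean_NA = NA_real_) {
  mu <- approx(x = ref$age, y = ref$mu, xout = t)$y
  mu[is.na(t)] <- prior_mean_NA   # only missing ages get the fallback value
  mu
}

res <- mu_sketch(c(0.25, 1.5, NA), ref, prior_mean_NA = 50)
res
```

Ages outside the range of the table stay `NA` in this sketch, because `approx()` does not extrapolate by default.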
Returns the age-interpolated median of the GSED cohorts based on LF & SF in seven GSED countries. This function is used to set the prior mean for populations "GSED-BGD", "GSED-BRA", etc.
get_mu_gsed_cohorts(
  t,
  key = "gsed2510",
  cohort = c("GSED-BGD", "GSED-BRA", "GSED-CHN", "GSED-CIV", "GSED-NLD",
    "GSED-PAK", "GSED-TZA")
)
t |
Decimal age, numeric vector |
key |
Character, key name |
cohort |
Character, cohort name |
A vector of length length(t) with the median of the specified GSED cohort
references.
The get_reference() function selects the D-score reference
distribution.
get_reference(
  population = NULL,
  key = NULL,
  references = dscore::builtin_references,
  verbose = FALSE,
  ...
)
population |
String. The name of the reference population to calculate
DAZ.
Use |
key |
String. The key identifies 1) the difficulty estimates
pertaining to a particular Rasch model, and 2) the prior mean and standard
deviation of the prior distribution for calculating the D-score.
The default key |
references |
A |
verbose |
Logical. Print settings. |
... |
Used to test whether the call contained the deprecated argument
|
A data.frame with the LMS reference values.
No references for population "gsed" exist.
The function will silently rewrite population = "gsed"
into population = "gsed".
The "dutch" reference was published in Van Buuren (2014).
The "gcdg" was calculated from 15 cohorts with direct
observations (Weber, 2019).
The "phase1" references were calculated from the GSED Phase 1 validation
data (GSED-BGD, GSED-PAK, GSED-TZA) and cover the age range 2 weeks to 3.5 years. The
age range 3.5-5 years is linearly extrapolated and is only indicative.
(Van Buuren et al, 2025)
The "preliminary_standards" references were calculated from the GSED
Phase 1 validation using a subset of children with healthy development.
(Van Buuren et al, 2025)
The "descriptive" references were calculated from the GSED
Phase 1 + 2 (Seven countries) validation study using the "gsed2510" key.
It is a descriptive reference, i.e., no selection of children growing
up in healthy environments was made. (In preparation for publication).
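Since `get_reference()` is described as returning LMS reference values, the sketch below shows the standard LMS transformation between a measurement and a Z-score. It is an illustration under assumptions: the values of `L`, `M` and `S` are made up, the helper names are hypothetical, and some references in the package may rely on a different distribution.

```r
# Hypothetical LMS values at one age:
# L = Box-Cox power, M = median, S = coefficient of variation
L <- 1.2
M <- 40
S <- 0.1
d <- c(36, 40, 44)   # D-scores

# LMS Z-score: z = ((d / M)^L - 1) / (L * S), with the log form when L is 0
lms_z <- function(d, L, M, S) {
  if (isTRUE(all.equal(L, 0))) log(d / M) / S else ((d / M)^L - 1) / (L * S)
}

# Inverse transformation: D-score at a given Z-score
lms_d <- function(z, L, M, S) {
  if (isTRUE(all.equal(L, 0))) M * exp(S * z) else M * (1 + L * S * z)^(1 / L)
}

z <- lms_z(d, L, M, S)
round(z, 3)
lms_d(z, L, M, S)    # recovers the original D-scores
```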
Van Buuren S (2014). Growth charts of human development. Stat Methods Med Res, 23(4), 346-368.
Weber AM, Rubio-Codina M, Walker SP, van Buuren S, Eekhout I, Grantham-McGregor S, Caridad Araujo M, Chang SM, Fernald LCH, Hamadani JD, Hanlon A, Karam SM, Lozoff B, Ratsifandrihamanana L, Richter L, Black MM (2019). The D-score: a metric for interpreting the early development of infants and toddlers across global settings. BMJ Global Health, 4: e001724. https://gh.bmj.com/content/bmjgh/4/6/e001724.full.pdf.
van Buuren S, Eekhout I, McCray G, Lancaster GA, Waldman MR, McCoy DC, Gladstone M, Cavallera V, Dua T, Black MM, GSED Team (2025). Enhancing comparability in early child development assessment with the D-score. International Journal of Behavioral Development, 49(4), 348-364. https://doi.org/10.1177/01650254241294033
# see key-population combinations of builtin_references
table(builtin_references$key, builtin_references$population)

# get the default reference
reftab <- get_reference()
head(reftab, 2)

# get the default reference for the key "gsed2212"
reftab <- get_reference(key = "gsed2212", verbose = TRUE)

# get dutch reference for default key
reftab <- get_reference(population = "dutch", verbose = TRUE)

# loading a non-existing reference yields fallback to default
reftab <- get_reference(population = "france", verbose = TRUE)

# if user specifies a builtin population (e.g. descriptive) and the key
# is not found, then it returns the specified reference for its most recent key
reftab <- get_reference(key = "none", population = "preliminary_standards",
  verbose = TRUE)
nrow(reftab)
Searches the item bank for matching items, and returns the difficulty estimates. Matching is done by item name. Comparisons are done in lower case.
get_tau(
  items,
  key = NULL,
  itembank = dscore::builtin_itembank,
  verbose = FALSE
)
items |
A character vector containing names of items to be
included into the D-score calculation. Milestone scores are coded
numerically as |
key |
String. The key identifies 1) the difficulty estimates
pertaining to a particular Rasch model, and 2) the prior mean and standard
deviation of the prior distribution for calculating the D-score.
The default key |
itembank |
A |
verbose |
Logical. Print settings. |
A named vector with the difficulty estimate per item with
length(items) elements.
Stef van Buuren 2020
# difficulty levels in the GHAP lexicon
get_tau(items = c("ddifmd001", "DDigmd052", "xyz"))
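The lookup described above, matching by item name in lower case, can be sketched in base R as follows. The miniature item bank `bank`, its difficulty values and the helper `tau_sketch()` are hypothetical; the real item bank is `dscore::builtin_itembank`.

```r
# Hypothetical miniature item bank with made-up difficulties
bank <- data.frame(
  key = "demo",
  item = c("ddifmd001", "ddigmd052"),
  tau = c(1.5, 4.2),
  stringsAsFactors = FALSE
)

# Case-insensitive lookup that returns NA for names not in the bank
tau_sketch <- function(items, bank) {
  idx <- match(tolower(items), tolower(bank$item))
  setNames(bank$tau[idx], items)
}

res <- tau_sketch(c("ddifmd001", "DDigmd052", "xyz"), bank)
res
```

The result keeps the order and the original spelling of `items`, so unmatched names remain visible as `NA` entries.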
A demo dataset with developmental scores at the item level for 10 random children from the GSED Phase 1 data.
gsample
A data.frame with 10 rows and 295 variables:
| Name | Label |
id |
Integer, child ID |
agedays |
Integer, age in days |
gpalac001 |
Integer, Cry when hungry...: 1 = yes, 0 = no, NA = not administered |
gpalac002 |
Integer, Look at/focus...: 1 = yes, 0 = no, NA = not administered |
... |
and so on.. |
There are 138 gpa items (item gpamoc008 (clench fists) removed) from GSED SF
and 155 gto items from GSED LF.
On July 15, 2025, the item gpaclc088 was renamed to gpaclc089
(Can your child say five or more separate words) and gpasec089 was renamed
to gpasec088 (Is your child able to pee and poo).
head(gsample)
A demo dataset with developmental scores at the item level for a set of 27 preterm children.
milestones
A data.frame with 100 rows and 62 variables:
| Name | Label |
id |
Integer, child ID |
agedays |
Integer, age in days |
age |
Numeric, decimal age in years |
sex |
Character, "male", "female" |
gagebrth |
Integer, gestational age in days |
ddifmd001 |
Integer, Fixates eyes: 1 = yes, 0 = no |
... |
and so on.. |
head(milestones)
Normalizes the distribution so that the total mass equals 1.
normalize(d, qp)
d |
A vector with |
qp |
Vector of equally spaced quadrature points. |
A vector of length(d) elements with
the prior density estimate at each quadrature point.
Note: Internal function
dscore:::normalize(c(5, 10, 5), qp = c(0, 1, 2))
sum(dscore:::normalize(rnorm(5), qp = 1:5))
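As an illustration of what normalizing to total mass 1 means on an equally spaced grid, here is a small base-R sketch. `normalize_sketch()` is a hypothetical stand-in written for this example, not the internal function, and it assumes the result is a density, so that the sum of values times the grid spacing equals 1.

```r
normalize_sketch <- function(d, qp) {
  stopifnot(length(d) == length(qp), length(qp) >= 2)
  width <- qp[2] - qp[1]   # spacing of the equally spaced grid
  d / (sum(d) * width)     # density values: sum(result) * width == 1
}

dens1 <- normalize_sketch(c(5, 10, 5), qp = c(0, 1, 2))
dens2 <- normalize_sketch(c(5, 10, 5), qp = c(0, 2, 4))

sum(dens1) * 1   # total mass with spacing 1
sum(dens2) * 2   # total mass with spacing 2
```

With unit spacing the density values and the probability masses coincide, which is why the values then sum to 1 directly.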
Calculate posterior for one item given score, difficulty and prior
posterior(score, tau, prior, qp, scale)
score |
Integer, either 0 (fail) or 1 (pass) |
tau |
Numeric, difficulty parameter |
prior |
Vector of prior values on quadrature points |
qp |
vector of equally spaced quadrature points |
scale |
expansion relative to the logit scale |
This function assumes that the difficulties have been estimated by
a binary Rasch model, e.g. by rasch.pairwise.itemcluster() of
the sirt package.
A vector of length length(prior)
Note: Internal function
Stef van Buuren, Arjan Huizing, 2020
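The one-item update can be sketched in base R as a Bayes step with a binary Rasch likelihood. `posterior_sketch()` is a hypothetical illustration, not the internal function: it renormalizes to total mass 1 by simple summation, and the prior, difficulty and scale are made-up values.

```r
posterior_sketch <- function(score, tau, prior, qp, scale) {
  p <- plogis((qp - tau) / scale)        # Rasch pass probability at each point
  lik <- if (score == 1) p else 1 - p    # likelihood of the observed score
  post <- lik * prior                    # Bayes rule, up to a constant
  post / sum(post)                       # renormalize to total mass 1
}

qp <- seq(-10, 80, by = 1)               # equally spaced quadrature points
prior <- dnorm(qp, mean = 30, sd = 5)
prior <- prior / sum(prior)

post_pass <- posterior_sketch(1, tau = 30, prior = prior, qp = qp, scale = 2)
post_fail <- posterior_sketch(0, tau = 30, prior = prior, qp = qp, scale = 2)

# A pass shifts the posterior mean up, a fail shifts it down
m <- c(prior = sum(qp * prior),
       pass = sum(qp * post_pass),
       fail = sum(qp * post_fail))
m
```

For several items, the posterior after one item would serve as the prior for the next.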
Function rename_gcdg_gsed() translates item names in the
gcdg lexicon to item names in the gsed lexicon.
rename_gcdg_gsed(x, copy = TRUE)
x |
A character vector containing item names in the gcdg lexicon |
copy |
A logical indicating whether any unmatched names should
be copied ( |
The gsed naming convention is as follows. Positions 1-3 code the instrument, positions 4-5 code the domain, position 6 codes direct/caregiver/message, and positions 7-9 are an item sequence number.
The function currently supports ASQ-I (aqi), Barrera-Moncada (bar), Battelle (bat), Bayley I (by1), Bayley II (by2), Bayley III (by3), Dutch Development Instrument (ddi), Denver (den), Griffiths (gri), MacArthur (mac), WHO milestones (mds), Mullen (mul), pegboard (peg), South African Griffiths (sgr), Stanford Binet (sbi), Tepsi (tep), Vineland (vin).
In cases where the domain of the items isn't clear (vin, bar), the domain is coded as 'xx'.
A character vector of length length(x) with gcdg
item names replaced by gsed item name.
Iris Eekhout, Stef van Buuren
https://docs.google.com/spreadsheets/d/1zLsSW9CzqshL8ubb7K5R9987jF4YGDVAW_NBw1hR2aQ/edit#gid=0
from <- c(
  "ag28", "gh2_19", "a14ps4", "b1m157", "mil6", "bm19", "a16fm4",
  "n22", "ag9", "gh6_5"
)
to <- rename_gcdg_gsed(from, copy = FALSE)
to
Translates names between different lexicons (naming schema).
rename_vector(
  input,
  lexin = c("phase2", "phase1", "short1", "short2", "gsed", "gsed2", "gsed3"),
  lexout = c("gsed3", "gsed2", "gsed", "short2", "short1", "phase1", "phase2"),
  notfound = "copy",
  contains = c("", "Ma_SF_", "Ma_LF_", "bsid_"),
  underscore = TRUE,
  trim = "Ma_",
  lowercase = TRUE,
  force_subjid_agedays = FALSE
)
input |
A character vector with names to be translated |
lexin |
A string indicating the input lexicon. One of |
lexout |
A string indicating the output lexicon. One of |
notfound |
A string indicating what to do when an input value is not found |
contains |
A string to filter the translation table prior to matching. Needed to prevent double matches. The default ("") does not filter. |
underscore |
Replaces space (" ") and dash ("-") by underscore ("_") |
trim |
A substring to be removed from |
lowercase |
Sets all variables in lower case.
in |
force_subjid_agedays |
If |
The recommended approach for reading new data is to name the columns
according to the names defined by "short2" and then apply rename_vector()
to translate the names to the "gsed3" lexicon.
The lexicons "phase1", "short1", "gsed" and "gsed2" are included
for backward compatibility, and are not recommended for use with new
data.
A character vector of the same length as input with processed
names.
# Using Ma_SF_Cxx as input names, 2023 SF/LF version
input <- c("file", "GSED_ID", "Ma_SF_Parent ID", "Ma_SF_C01", "Ma_SF_C02")
rename_vector(input)
rename_vector(input, lexout = "short2", lowercase = FALSE)
rename_vector(input, lexout = "gsed3", trim = "Ma_SF_")

# Convert short names to gsed names
input <- c("file", "GSED_ID", "Ma_SF_Parent ID", paste0("SF00", 1:3))
rename_vector(input, lexin = "short2", lowercase = TRUE)
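The core of a lexicon translation is a table lookup in which unmatched names are copied through, in the spirit of `notfound = "copy"`. The base-R sketch below shows that idea only. The two-column table `lex` and the helper `rename_sketch()` are hypothetical, and the name pairs are illustrative rather than taken from the package's translation table.

```r
# Hypothetical translation table between a short and a long lexicon
lex <- data.frame(
  short = c("sf001", "sf002", "sf003"),
  long = c("gs1sec001", "gs1moc002", "gs1sec003"),
  stringsAsFactors = FALSE
)

rename_sketch <- function(input, from, to, lowercase = TRUE) {
  x <- if (lowercase) tolower(input) else input
  idx <- match(x, from)
  found <- !is.na(idx)
  out <- x                        # names that are not found are copied
  out[found] <- to[idx[found]]    # names that are found are translated
  out
}

res <- rename_sketch(c("GSED_ID", "SF001", "SF003"), lex$short, lex$long)
res
```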
A demo dataset with developmental scores at the item level for 10 random children from the GSED Phase 1 data.
sample_hf
A data.frame with 10 rows and 50 variables:
| Name | Label |
subjid |
Integer, child ID |
agedays |
Integer, age in days |
hf001 |
Integer, ...: 1 = yes, 0 = no, NA = not administered |
hf002 |
Integer, ...: 1 = yes, 0 = no, NA = not administered |
... |
and so on.. |
Sample data for 48 gpa items forming GSED HF V1
The HF item set was revised on October 20, 2025 to contain 48 items. This dataset reflects that change.
head(sample_hf)
A demo dataset with developmental scores at the item level for 10 random children from the GSED Phase 1 data.
sample_lf
A data.frame with 10 rows and 157 variables:
| Name | Label |
subjid |
Integer, child ID |
agedays |
Integer, age in days |
lf001 |
Integer, ...: 1 = yes, 0 = no, NA = not administered |
lf002 |
Integer, ...: 1 = yes, 0 = no, NA = not administered |
... |
and so on.. |
Sample data for 155 gto items from GSED LF
head(sample_lf)
A demo dataset with developmental scores at the item level for 10 random children from the GSED Phase 1 data.
sample_sf
A data.frame with 10 rows and 141 variables:
| Name | Label |
subjid |
Integer, child ID |
agedays |
Integer, age in days |
sf001 |
Integer, Cry when hungry...: 1 = yes, 0 = no, NA = not administered |
sf002 |
Integer, Look at/focus...: 1 = yes, 0 = no, NA = not administered |
... |
and so on.. |
Sample data for 139 gpa items from GSED SF
On July 15, 2025, the item gpaclc088 was renamed to gpaclc089
(Can your child say five or more separate words) and gpasec089 was renamed
to gpasec088 (Is your child able to pee and poo).
head(sample_sf)
This function sorts the item names according to instrument, domain, mode and number. The user can specify the sorting order.
sort_itemnames(x, order = "idnm")
order_itemnames(x, order = "idnm")
x |
A character vector containing item names (gsed lexicon) |
order |
A four-letter string specifying the sorting order.
The four letters are: |
sort_itemnames() returns a character vector with
length(x) sorted elements. order_itemnames() returns
an integer vector of length length(x) with positions of
the sorted elements.
Stef van Buuren
itemnames <- c("aqigmc028", "grihsd219", "", "by1mdd157", "mdsgmd006")
sort_itemnames(itemnames)
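Sorting by name components can be sketched in base R with `order()` on the sliced positions. The sketch assumes that the letters of `order = "idnm"` stand for instrument (i), domain (d), number (n) and mode (m). That reading is an assumption, since the argument description above is truncated, and the code is not the package implementation.

```r
itemnames <- c("mdsgmd006", "aqigmc028", "by1mdd157", "aqigmc003")

# Slice the 9-position names into their components
instrument <- substr(itemnames, 1, 3)
domain <- substr(itemnames, 4, 5)
mode <- substr(itemnames, 6, 6)
number <- as.integer(substr(itemnames, 7, 9))

# Sort by instrument, then domain, then number, then mode
idx <- order(instrument, domain, number, mode)
sorted <- itemnames[idx]
sorted
```

Here `idx` plays the role of the positions returned by an ordering function, and `sorted` the role of the sorted names.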
An example dataset with developmental scores at the item level for
50 random children from the GSED Validation Study (Cavallera et al., 2023).
Each child has measurements from GSED SF (gs1), GSED LF (gl1) and
BSID-III (by3).
triple
A data.frame with 50 rows and 559 variables:
| Name | Label |
id |
Integer, child ID |
age |
Numeric, age in decimal years |
agedays |
Integer, age in days |
gs1sec001 |
Integer, SF001 Does your child smile? |
gs1moc002 |
Integer, SF002 When lying on his/her back, ... |
... |
and so on.. |
The dataset contains 138 items from GSED SF (gs1),
(item gs1moc028 was skipped), 155 items from GSED LF (gl1),
and 263 (out of 326) items from BSID-III (by3).
Cavallera et al. (2023). Protocol for validation of the Global Scales for Early Development (GSED) for children under 3 years of age in seven countries. BMJ Open, 13(1), e062562. DOI: 10.1136/bmjopen-2022-062562. https://bmjopen.bmj.com/content/13/1/e062562
World Health Organization (WHO) (2023). Global Scales for Early Development (GSED) V1.0: Technical Report. Geneva: World Health Organization. https://www.who.int/publications/i/item/WHO-MSD-GSED-package-v1.0-2023.1
# calculate D-score from all instruments
ds_all <- dscore(triple)
head(ds_all)

# calculate D-score from only GSED SF items
ds_sf <- dscore(triple, items = get_itemnames(instrument = "gs1"))
head(ds_sf)