The goal of gsedread is to read validation data of the project Global Scales for Early Development (GSED).
Installation
Install the gsedread package from GitHub as follows:
install.packages("remotes")
remotes::install_github("d-score/gsedread")
There is no CRAN version.
Example
You need access to the proper SharePoint site and must sync the data to a local OneDrive. In the file .Renviron in your home directory, add a line that specifies the location of your synced OneDrive, of the form ONEDRIVE_GSED="<path to your synced OneDrive directory>".
After setting the environment variable ONEDRIVE_GSED, restart R and manually check whether you are able to read the OneDrive directory.
head(dir(Sys.getenv("ONEDRIVE_GSED")), 3)
#> [1] "Archive Data dictionaries" "Data Merge"
#> [3] "Final Data Analysis"
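The manual check can also be scripted. The following is a minimal sketch in base R; the helper name check_onedrive() is hypothetical and is not part of gsedread.

```r
# Hypothetical helper (not part of gsedread): report whether an
# environment variable points to an existing directory.
check_onedrive <- function(var = "ONEDRIVE_GSED") {
  path <- Sys.getenv(var)
  if (!nzchar(path)) {
    message("Environment variable ", var,
            " is not set; edit ~/.Renviron and restart R.")
    return(FALSE)
  }
  if (!dir.exists(path)) {
    message("Directory not found: ", path)
    return(FALSE)
  }
  TRUE
}

check_onedrive()
```

If the function returns FALSE, the accompanying message indicates whether the variable is missing or the directory cannot be found.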
The read_sf() function reads all SF data from the SharePoint directory GSED Final Collated Phase 1 Data Files 18_05_22 and returns a tibble with one record per administration.
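A minimal sketch of such a call is given below. It assumes that read_sf() can be called without arguments once ONEDRIVE_GSED is set, and that the result is stored in an object named data, as used in the remainder of this example. Consult ?read_sf for the actual arguments.

```r
library(gsedread)

# Assumption: read_sf() locates the files through ONEDRIVE_GSED and
# needs no further arguments; see ?read_sf for the actual signature.
data <- read_sf()
```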
Count the number of records per file:
table(data$file)
#>
#>                ban_sf_2021_11_03 ban_sf_new_enrollment_17_05_2022
#>                             1543                               72
#>     ban_sf_predictive_17_05_2022                pak_sf_2022_05_17
#>                              473                             1761
#> pak_sf_new_enrollment_2022_05_17     pak_sf_predictive_2022_05_17
#>                               72                              459
#>                tza_sf_2021_11_01 tza_sf_new_enrollment_10_05_2022
#>                             1427                               74
#>     tza_sf_predictive_10_05_2022
#>                              469
Convert variable names to a user-friendly alternative:
rename_vector(colnames(data)[c(1:3, 19, 21:25)], lexout = "gsed2", trim = "Ma_SF_")
#> [1] "file" "gsed_id" "parent_id" "date" "gpalac001" "gpacgc002"
#> [7] "gpafmc003" "gpasec004" "gpamoc005"
Operations
The package reads and processes GSED data. It does not store data. The read_sf() and read_lf() functions take the following actions:
- Constructs the paths to the files in the local OneDrive sync directory;
- Reads all specified datasets into a list;
- Internally specifies the desired format for each column;
- Specifies the available date and date-time formats per file;
- Recodes empty, NA, -8888, -8,888.00 and -9999 values as NA;
- Repairs problems with mixed date-time formats in the adaptive Pakistan data;
- Stacks the datasets to one tibble and adds columns file and adm;
- Removes records without a GSED_ID.
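The recoding, stacking and filtering steps in the list above can be illustrated with a small base R sketch. This is not the package's internal code: the toy datasets, the object names and the helper recode_missing() are hypothetical, and the adm column is omitted because its definition is not given here.

```r
# Illustrative sketch only; gsedread's internal code may differ.
# Two toy "files" with character columns, as if read from disk.
files <- list(
  file_a = data.frame(GSED_ID = c("001", "002", ""),
                      item1 = c("1", "-8888", "0"),
                      stringsAsFactors = FALSE),
  file_b = data.frame(GSED_ID = c("003", "NA"),
                      item1 = c("-9999", "-8,888.00"),
                      stringsAsFactors = FALSE)
)

# Codes treated as missing, taken from the list above.
missing_codes <- c("", "NA", "-8888", "-8,888.00", "-9999")

# Hypothetical helper: replace every missing code in a data frame by NA.
recode_missing <- function(df) {
  df[] <- lapply(df, function(x) {
    x[x %in% missing_codes] <- NA
    x
  })
  df
}

# Recode each dataset, add the source name as column file, then stack.
stacked <- do.call(rbind, lapply(names(files), function(nm) {
  data.frame(file = nm, recode_missing(files[[nm]]),
             stringsAsFactors = FALSE)
}))

# Remove records without a GSED_ID.
stacked <- stacked[!is.na(stacked$GSED_ID), ]
rownames(stacked) <- NULL
stacked
```

Recoding before stacking means that a record whose GSED_ID is empty or the literal string NA is treated as having no identifier and is dropped in the final step.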
Item renaming with rename_variables() relies on the item translation table at https://github.com/D-score/gsedread/blob/main/inst/extdata/itemnames_translate.tsv.
Acknowledgement
This study was supported by the Bill & Melinda Gates Foundation. The contents are the sole responsibility of the authors and may not necessarily represent the official views of the Bill & Melinda Gates Foundation or other agencies that may have supported the primary data studies used in the present study.