A synthetic data set of person records that need correction. The dataset has missing values.
Format
A data frame with 4 rows and 6 variables:
- income
monthly income, in US dollars
- age
age of a person, in years
- gender
gender of a person
- year
year of measurement
- smokes
whether a person smokes ("yes" or "no")
- cigarettes
how many cigarettes a person smokes
...
The dataset is also available as an SQLite database at
system.file("db/person.db", package="dcmodifydb")
Examples
# load modification rules and apply:
library(dcmodify)
library(dcmodifydb)  # provides modify() for database tables
rules <- modifier(.file = system.file("db/corrections.yml", package="dcmodifydb"))
con <- DBI::dbConnect(RSQLite::SQLite(), dbname=system.file("db/person.db", package="dcmodifydb"))
person <- dplyr::tbl(con, "person")
print(person)
#> # Source:   table<person> [4 x 6]
#> # Database: sqlite 3.38.5 [/Users/runner/work/_temp/Library/dcmodifydb/db/person.db]
#>   income   age gender  year smokes cigarettes
#>    <int> <int> <chr>  <int> <chr>       <int>
#> 1   2000    12 M       2020 no             10
#> 2   2010    14 f       2019 yes             4
#> 3   2010    25 v         19 no             NA
#> 4   1010    65 M         20 yes            NA
person2 <- modify(person, rules, copy=TRUE)
print(person2)
#> # Source:   table<dcmodifydb_847354> [4 x 7]
#> # Database: sqlite 3.38.5 [/Users/runner/work/_temp/Library/dcmodifydb/db/person.db]
#>   income   age gender  year smokes cigarettes ageclass
#>    <int> <int> <chr>  <int> <chr>       <int> <chr>
#> 1      0    12 M       2020 yes            10 child
#> 2      0    14 F       2019 yes             4 child
#> 3   2010    25 F       2019 no              0 adult
#> 4   1010    65 M       2020 yes            NA adult
# close the database connection when done
DBI::dbDisconnect(con)
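
The rule file itself is not reproduced on this page. As a rough sketch of the kind of rules that could produce some of the corrections shown above, the following applies four hand-written dcmodify rules to an in-memory copy of the printed table. The rules and the names person_df and corrected are assumptions made for illustration, not the contents of corrections.yml. The gender and ageclass changes are left out.

```r
library(dcmodify)

# In-memory copy of the uncorrected table printed above (illustrative only).
person_df <- data.frame(
  income     = c(2000L, 2010L, 2010L, 1010L),
  age        = c(12L, 14L, 25L, 65L),
  gender     = c("M", "f", "v", "M"),
  year       = c(2020L, 2019L, 19L, 20L),
  smokes     = c("no", "yes", "no", "yes"),
  cigarettes = c(10L, 4L, NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical rules, guessed from the before/after output; the real
# corrections.yml may differ.
rules <- modifier(
  if (age < 16) income <- 0,            # children have no income
  if (year < 100) year <- year + 2000,  # expand two-digit years
  if (cigarettes > 0) smokes <- "yes",  # a positive count implies smoking
  if (smokes == "no") cigarettes <- 0   # non-smokers smoke zero cigarettes
)

# Rules are applied in order; a condition that evaluates to NA is
# treated as not met under dcmodify's default options.
corrected <- modify(person_df, rules)
print(corrected)
```

Working on a plain data frame keeps the sketch independent of the SQLite file, so the same rules can be tried out before they are applied to a database table.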