A synthetic data set of person records that need to be corrected. The dataset contains missing values.

Usage

person

Format

A data frame with 4 rows and 6 variables:

income

monthly income, in US dollars

age

age of a person, in years

gender

gender of a person

year

year of measurement

smokes

whether a person smokes ("yes" or "no")

cigarettes

how many cigarettes a person smokes

...

The dataset is also available as a SQLite database at system.file("db/person.db", package="dcmodifydb").
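The SQLite copy can also be inspected directly with DBI before any rules are applied. The following is a minimal sketch, assuming the dcmodifydb, DBI and RSQLite packages are installed; the variable names (db_path, person_df, na_counts) are illustrative and not part of the package.

```r
# Locate the SQLite file shipped with dcmodifydb (empty string if not found).
db_path <- system.file("db/person.db", package = "dcmodifydb")
stopifnot(nzchar(db_path))

# Open a connection and list the tables stored in the database.
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = db_path)
print(DBI::dbListTables(con))

# Read the whole table into a local data frame and count missing values per column.
person_df <- DBI::dbReadTable(con, "person")
na_counts <- colSums(is.na(person_df))
print(na_counts)

# Close the connection when done.
DBI::dbDisconnect(con)
```

Reading the table into memory is only sensible here because the example data is tiny; for larger tables, a lazy reference such as dplyr::tbl(), as used in the Examples, avoids copying the data.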

Examples


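# load the modification rules shipped with the package
library(dcmodify)
library(dcmodifydb)  # provides the modify() method for database tables
rules <- modifier(.file = system.file("db/corrections.yml", package="dcmodifydb"))

# connect to the SQLite copy of the dataset and reference the person table
con <- DBI::dbConnect(
  RSQLite::SQLite(),
  dbname = system.file("db/person.db", package="dcmodifydb")
)
person <- dplyr::tbl(con, "person")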
print(person)
#> # Source:   table<person> [4 x 6]
#> # Database: sqlite 3.38.5 [/Users/runner/work/_temp/Library/dcmodifydb/db/person.db]
#>   income   age gender  year smokes cigarettes
#>    <int> <int> <chr>  <int> <chr>       <int>
#> 1   2000    12 M       2020 no             10
#> 2   2010    14 f       2019 yes             4
#> 3   2010    25 v         19 no             NA
#> 4   1010    65 M         20 yes            NA

# apply the rules to a copy of the table, leaving the original untouched
person2 <- modify(person, rules, copy=TRUE)
print(person2)
#> # Source:   table<dcmodifydb_847354> [4 x 7]
#> # Database: sqlite 3.38.5 [/Users/runner/work/_temp/Library/dcmodifydb/db/person.db]
#>   income   age gender  year smokes cigarettes ageclass
#>    <int> <int> <chr>  <int> <chr>       <int> <chr>   
#> 1      0    12 M       2020 yes            10 child   
#> 2      0    14 F       2019 yes             4 child   
#> 3   2010    25 F       2019 no              0 adult   
#> 4   1010    65 M       2020 yes            NA adult