The Data Validation Cookbook
Statistics Netherlands
2020-12-22 | validate version 1.0.1
This version of the book was rendered with
validate version 1.0.1. The latest release of
validate can be installed
from CRAN.
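For reference, installing a package from CRAN is normally done with base R's install.packages() function. The snippet below is a minimal sketch and assumes an internet connection and a configured CRAN mirror.

```r
# Install the latest CRAN release of validate
# (assumes internet access and a configured CRAN mirror)
install.packages("validate")

# Load the package and report which version was installed
library(validate)
packageVersion("validate")
```

The reported version may be newer than 1.0.1, the version this book was rendered with.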
The purposes of this book include demonstrating the main tools and workflows of the
validate package, giving examples of common data validation tasks, and
showing how to analyze data validation results.
The book is organized as follows. Chapter 1 discusses the bare
necessities to be able to follow the rest of the book. Chapters
2 to 5 form the ‘cookbook’
part of the book and discuss many different ways to check your data by example.
Chapter 6 is devoted to deriving plausibility measures with the
validate package. Chapters 7 and
8 treat working with validate in-depth. Chapter
9 discusses how to compare two or more versions of a
dataset, possibly automated through the
lumberjack package. The
section with Bibliographical Notes lists some references and points out some
literature for further reading.
Readers of this book are expected to have some knowledge of R. In particular, you should know how to import data into R and know a little about working with data frames and vectors.
Citing this work
To cite the
validate package please use the following citation.
MPJ van der Loo and E de Jonge (2020). Data Validation Infrastructure for R. Journal of Statistical Software, Accepted for publication.
To cite this cookbook, please use the following citation.
MPJ van der Loo (2020) The Data Validation Cookbook version 1.0.1. https://data-cleaning.github.io/validate
This work was partially funded by European Grant Agreement 88287–NL-VALIDATION of the European Statistical System.
I am greatly indebted to Dr. Olav ten Bosch for carefully reviewing the manuscript of version 1.0.1. Any mistakes are of course entirely the author’s fault.
If you find a mistake or have suggestions, please file an issue or a pull request on the GitHub page of the package: https://github.com/data-cleaning/validate. If you do not have or want a GitHub account, you can contact the author via the e-mail address that is listed with the package.
This work is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).