The Data Validation Cookbook
Statistics Netherlands, mpj.vanderloo@cbs.nl
2023-05-01 | validate version 1.1.3
Preface
This book is about checking data with the validate package for R.
This version of the book was rendered with validate version 1.1.3. The latest release of validate can be installed from CRAN as follows.
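The installation step is the standard CRAN one. A minimal sketch, assuming an internet connection and a configured CRAN mirror:

```r
# Install the latest CRAN release of validate (needs network access).
install.packages("validate")
```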
The purposes of this book include demonstrating the main tools and workflows of the validate package, giving examples of common data validation tasks, and showing how to analyze data validation results.
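To give a first impression of such a workflow, the sketch below defines a few rules, confronts a dataset with them, and summarizes the outcome. It assumes validate is installed; the built-in cars dataset and the three rules are illustrative choices made here, not requirements of the package.

```r
library(validate)

# Define validation rules as R expressions over the column names.
# The thresholds are illustrative assumptions, not domain knowledge.
rules <- validator(
  speed >= 0,
  dist >= 0,
  speed / dist <= 1.5
)

# Confront the data with the rules; each rule is evaluated per record.
out <- confront(cars, rules)

# One row per rule, with counts of passes, fails, and missing results.
summary(out)
```

The same three steps (define rules, confront data, inspect results) recur throughout the book with more elaborate rules and datasets.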
The book is organized as follows. Chapter 1 discusses the bare necessities to be able to follow the rest of the book. Chapters 2 to 5 form the ‘cookbook’ part of the book and discuss many different ways to check your data by example. Chapter 6 is devoted to deriving plausibility measures with the validate package. Chapters 7 and 8 treat working with validate in depth. Chapter 10 discusses how to compare two or more versions of a dataset, possibly automated through the lumberjack package. The section with Bibliographical Notes lists some references and points out some literature for further reading.
Prerequisites
Readers of this book are expected to have some knowledge of R. In particular, you should know how to import data into R and know a little about working with data frames and vectors.
Citing this work
To cite the validate package please use the following citation.
MPJ van der Loo and E de Jonge (2021). Data Validation Infrastructure for R. Journal of Statistical Software, 97(10).
To cite this cookbook, please use the following citation.
MPJ van der Loo (2023) The Data Validation Cookbook version 1.1.3. https://data-cleaning.github.io/validate
Acknowledgements
This work was partially funded by European Grant Agreement 88287–NL-VALIDATION of the European Statistical System.
Contributing
If you find a mistake or have suggestions, please file an issue or a pull request on the GitHub page of the package: https://github.com/data-cleaning/validate. If you do not have or want a GitHub account, you can contact the author via the e-mail address that is listed with the package.
License
This work is licensed under the Creative Commons Attribution BY-NC 4.0 International License.