Data is at the heart of any analysis, so keeping it tidy and well structured matters from the very start. If the initial structure and format of your data are poorly designed, every subsequent step risks taking you down the wrong path. Hence, it is worth taking note of some best practices.
Tidy data format
Tidy data has a specific format, as described by Hadley Wickham:
- Each variable is a column
- Each observation is a row
- Each type of observational unit is a table
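To make the three rules above concrete, here is a minimal sketch in plain Python. The table, the column names (`country`, `year`, `cases`), and the helper `to_tidy` are all made up for illustration. The "wide" table stores one column per year, so the variable "year" is hidden in the column headers; the helper reshapes it so that each variable becomes a column and each observation becomes a row.

```python
# Hypothetical "wide" table: one row per country, one column per year.
# The year is a variable, but here it is stored in the column names.
wide = [
    {"country": "A", "2019": 10, "2020": 12},
    {"country": "B", "2019": 7, "2020": 9},
]


def to_tidy(rows, id_col, var_name, value_name):
    """Turn every non-id column into its own (id, variable, value) row."""
    tidy_rows = []
    for row in rows:
        for key, value in row.items():
            if key == id_col:
                continue  # the id column is repeated on every output row
            tidy_rows.append({id_col: row[id_col], var_name: key, value_name: value})
    return tidy_rows


# Each output row is now a single observation: one country in one year.
tidy = to_tidy(wide, id_col="country", var_name="year", value_name="cases")
for record in tidy:
    print(record)
```

In practice a data-frame library would usually handle this reshaping; the loop is written out here only to show what the tidy layout looks like.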
For text data, Julia Silge and David Robinson propose the following definition of the tidy text format: a table with one token per row.
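As a rough illustration of the one-token-per-row idea, the sketch below splits a few lines of text into word tokens and stores each token on its own row. The sample sentences, the helper name `one_token_per_row`, and the simple regular-expression tokenizer are assumptions made for this example; they are not the method used in the book, which works in R.

```python
import re

# Hypothetical input: a small document, one string per line.
lines = [
    "Tidy data is easy to work with",
    "Each token gets its own row",
]


def one_token_per_row(text_lines):
    """Return a list of rows, each holding a line number and a single token."""
    rows = []
    for line_number, text in enumerate(text_lines, start=1):
        # Naive tokenizer: lowercase the text and keep runs of letters/apostrophes.
        for token in re.findall(r"[a-z']+", text.lower()):
            rows.append({"line": line_number, "token": token})
    return rows


rows = one_token_per_row(lines)
for row in rows:
    print(row)
```

Here a token is a single word, but the same layout applies if a token is defined as a sentence, an n-gram, or another meaningful unit of text.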
- Tidy Data, Hadley Wickham, 2014 (link)
- Text mining with R – A Tidy Approach, Julia Silge and David Robinson, 2017 (link)
Note: this blog will be progressively updated to capture the core components of tidy data format.