Brief on tidy data format

Data is essential to any analysis, and keeping it tidy and well-structured matters just as much. If the initial data structure and format are poorly designed, every subsequent step risks taking you down the wrong path. Hence, it is worth taking note of some best practices.

Tidy data format

Tidy data has a specific format, as described by Hadley Wickham:

  • Each variable is a column
  • Each observation is a row
  • Each type of observational unit is a table
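
To make the three rules concrete, here is a minimal sketch in plain Python. The table, its column names (`patient`, `treatment_a`, `treatment_b`) and the helper `to_tidy` are all made up for illustration; they do not come from Wickham's paper. The "wide" table stores one variable (the treatment) in its column headers, and the helper reshapes it so that each row holds exactly one observation.

```python
# Hypothetical "wide" table: the treatment variable is hidden in the column names.
wide = [
    {"patient": "A", "treatment_a": 12, "treatment_b": 9},
    {"patient": "B", "treatment_a": 15, "treatment_b": 11},
]


def to_tidy(rows, id_key, var_name, value_name):
    """Reshape wide rows so that each output row is a single observation."""
    tidy_rows = []
    for row in rows:
        for key, value in row.items():
            if key == id_key:
                continue  # the identifier is repeated on every output row
            tidy_rows.append({id_key: row[id_key], var_name: key, value_name: value})
    return tidy_rows


# Each variable (patient, treatment, result) is now a column,
# and each patient/treatment measurement is its own row.
tidy = to_tidy(wide, "patient", "treatment", "result")
for row in tidy:
    print(row)
```

In practice a data-frame library would usually do this reshaping for you; the loop above is only meant to show what "one observation per row" looks like.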

For text data, Julia Silge and David Robinson define the tidy text format as a table with one token per row, where a token is a meaningful unit of text, such as a word.
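
As a rough illustration of the one-token-per-row idea, the sketch below splits two made-up sentences into words and stores each word on its own row. The sample documents, the simple regular-expression tokenizer and the helper name `unnest_tokens` are assumptions for this example (the name is loosely borrowed from the tidytext R package, but this is not its implementation).

```python
import re

# Hypothetical input: one row per document.
documents = [
    {"doc_id": 1, "text": "Tidy data is easy to work with."},
    {"doc_id": 2, "text": "Each token gets its own row."},
]


def unnest_tokens(docs):
    """Return one row per (document, word) pair, lowercased, punctuation dropped."""
    rows = []
    for doc in docs:
        # Naive word tokenizer: runs of letters and apostrophes.
        for token in re.findall(r"[a-z']+", doc["text"].lower()):
            rows.append({"doc_id": doc["doc_id"], "token": token})
    return rows


tokens = unnest_tokens(documents)
for row in tokens:
    print(row)
```

Once text is in this shape, counting or filtering words becomes an ordinary operation on rows and columns.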


Information sources:

  1. Tidy Data, Hadley Wickham, 2014 (link)
  2. Text mining with R – A Tidy Approach, Julia Silge and David Robinson, 2017 (link)

Note: this post will be updated progressively to cover the core components of the tidy data format.
