Tidy Data

Khalid Gharib
2 min read · Jul 14, 2020


A concept you will hear about whilst studying data science is tidy data.

The term tidy data was coined by Hadley Wickham, a very prominent and active creator of R packages. It describes a specific structure of data that makes analysis easy, built on three rules:

  1. Each variable forms a column
  2. Each observation forms a row
  3. Each type of observational unit forms a table

Any data set that doesn't meet these three conditions is considered messy.

This is an example of messy data: a table of average arrival delays with one row per airline and one column per origin airport.

Although it is readable, the main issue with this data set is that some of the column names (the origin airports) are themselves values of a variable.

To make the example above tidy, look at how each variable is currently stored:

  • The airlines are already in a single column, so they can stay as they are.
  • The origin airports are column names and need to be stacked into a single column.
  • The average arrival delays are spread across the rows and airport columns and need to be gathered into a single column.
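To make this concrete, here is a minimal sketch of a table with the same messy shape. The airline names, airport codes (JFK, LAX, ORD) and delay values are hypothetical, made up for illustration; they are not the data from the original example.

```python
import pandas as pd

# Hypothetical messy table: one row per airline, one column per origin airport.
# Every cell holds an (invented) average arrival delay in minutes.
messy = pd.DataFrame({
    "airline": ["Airline A", "Airline B"],
    "JFK": [5.0, 12.5],
    "LAX": [-3.0, 7.0],
    "ORD": [10.0, 2.5],
})

print(messy)
```

The airport codes sit in the header rather than in a column of their own, which is exactly what breaks the "each variable forms a column" rule.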

An easy way to do this is with the pandas .melt() method.

The .melt() method takes a set of columns and stacks them one on top of another, turning a wide table into a long one. It has two important parameters:

  • id_vars: a list (or single string) of column names that you want to keep as identifier columns.
  • value_vars: a list (or single string) of column names that you would like to reshape into one column. If it is omitted, every column not listed in id_vars is used.
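A minimal sketch of these two parameters, using a small hypothetical table (invented airlines, airport codes and delays). Here "airline" is kept as the identifier and the three airport columns are stacked:

```python
import pandas as pd

# Hypothetical messy table (made-up values for illustration only).
messy = pd.DataFrame({
    "airline": ["Airline A", "Airline B"],
    "JFK": [5.0, 12.5],
    "LAX": [-3.0, 7.0],
    "ORD": [10.0, 2.5],
})

# Keep "airline" as a column; stack the airport columns one after another.
long = messy.melt(id_vars="airline", value_vars=["JFK", "LAX", "ORD"])

# Leaving out value_vars melts every column not listed in id_vars,
# which here is the same three airport columns.
same = messy.melt(id_vars="airline")

print(long)
```

Without var_name and value_name, pandas names the two new columns "variable" (the former column names) and "value" (their contents). The result has 2 airlines × 3 airports = 6 rows, with all JFK rows first, then LAX, then ORD.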

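Melting the above data set gives one row per airline and origin airport, with the origin airport and the average arrival delay each in a column of their own.

The basic layout of the call is:

df.melt(id_vars='column_to_keep',
        value_vars='column_reshaped_into_values',
        var_name='new_variable_column_name',
        value_name='new_value_column_name')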

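The .melt() method also gives you two additional parameters, var_name and value_name. var_name names the new column that holds the former column names, and value_name names the new column that holds their values; otherwise pandas falls back to "variable" and "value". Set these parameters to column names of your choice.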
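Putting it together, a minimal sketch of tidying the hypothetical airline table (invented values; "origin" and "avg_arrival_delay" are illustrative column names chosen here, not names from the original data):

```python
import pandas as pd

# Hypothetical messy table (made-up values for illustration only).
messy = pd.DataFrame({
    "airline": ["Airline A", "Airline B"],
    "JFK": [5.0, 12.5],
    "LAX": [-3.0, 7.0],
    "ORD": [10.0, 2.5],
})

tidy = messy.melt(
    id_vars="airline",               # stays as its own column
    value_vars=["JFK", "LAX", "ORD"],  # airport columns to stack
    var_name="origin",               # new column for the former column names
    value_name="avg_arrival_delay",  # new column for the cell values
)

# columns: airline, origin, avg_arrival_delay
print(tidy)
```

Each row is now one observation (one airline at one origin airport) and each variable has its own column.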

.melt() is just one of the many tools you can use to tidy your data. To use it, or any other method or function, correctly, you must first understand the data set you are dealing with.
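One way to act on that advice is to sanity-check the result against what you know about the data. The sketch below assumes a small hypothetical airline table (invented values) and checks two expectations: each airline and origin pair appears exactly once, and the row count equals the number of airlines times the number of airport columns.

```python
import pandas as pd

# Hypothetical messy table (made-up values for illustration only).
messy = pd.DataFrame({
    "airline": ["Airline A", "Airline B"],
    "JFK": [5.0, 12.5],
    "LAX": [-3.0, 7.0],
    "ORD": [10.0, 2.5],
})

tidy = messy.melt(id_vars="airline", var_name="origin",
                  value_name="avg_arrival_delay")

# Each (airline, origin) pair should appear exactly once after melting.
has_duplicates = tidy.duplicated(subset=["airline", "origin"]).any()

# Row count should equal airlines x airport columns (all columns but "airline").
n_airports = messy.shape[1] - 1
expected_rows = len(messy) * n_airports

print(has_duplicates, len(tidy), expected_rows)
```

If either check fails, it usually means the table was not shaped the way you assumed before melting.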
