Tips and Tricks with Pandas

Khalid Gharib
Aug 29, 2020

In this blog, I want to go over some pandas tips and tricks that didn't warrant stand-alone posts but are very helpful for anyone who wants to advance their pandas skills. They can also improve your presentations and simplify the EDA (exploratory data analysis) process.

Highlighting

This is something I don't see often, and it makes a really big difference when you are presenting a table where specific values are important to take note of.

Let's say you have a DataFrame, df, and you have built a pivot table from it:

# mean salary per department and race, rounded to the nearest thousand
pivot_df = df.pivot_table(index='dept', columns='race',
                          values='salary', aggfunc='mean').round(-3)
print(pivot_df)

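You can highlight the maximum value in each column or in each row like so (the styled table renders in a Jupyter notebook):

pivot_df.style.highlight_max(axis=0, color='orange')  # axis=0 highlights the max of each column

To highlight the maximum of each row instead, change it to axis=1.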

You can go further and shade every cell according to its value:

pivot_df.style.background_gradient(cmap='YlOrRd')  # requires matplotlib

Pandas Display

Sometimes a DataFrame can look messy when displayed, because pandas limits the number of rows and columns it shows, and the width of each column, to default values. As a result, long strings can be cut off inside cells and some rows or columns can be hidden.

You can manually set the number of columns and rows you want to display, or the maximum column width, through the pandas options like so:

pd.options.display.max_columns = 100
pd.options.display.max_rows = 300
pd.options.display.max_colwidth = 50

These numbers are just placeholders; choose your own values depending on your data.
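Setting pd.options changes the display for the rest of the session. If you only need wider output for a single table, a sketch like the following may be useful. It uses pd.option_context, which restores the previous values when the block ends; the small DataFrame and the value 20 are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical data with one long text column.
df = pd.DataFrame({
    "id": [1, 2],
    "comment": ["a fairly long free-text comment " * 3, "short"],
})

before = pd.get_option("display.max_colwidth")

# The options apply only inside the with-block.
with pd.option_context("display.max_colwidth", 20,
                       "display.max_rows", 300):
    inside = pd.get_option("display.max_colwidth")
    print(df)  # long strings are truncated to 20 characters here

after = pd.get_option("display.max_colwidth")

# To undo a permanent change made via pd.options, reset that option.
pd.reset_option("display.max_rows")
```

Either approach works; the context manager is simply less likely to leave surprising settings behind in a long notebook.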

Query

This is something that I have hardly ever seen online, as people often just reach for a SQL library in Python instead, which works in a very similar way. I find the query method very straightforward and quick to apply, and I feel it is underrated and underused.

It is as simple as calling the .query() method on your DataFrame and passing the condition you want to apply as a string. The expression uses Python-like syntax, but it reads much like the WHERE clause of a SQL query.

bikes.query('tripduration > 1000')

This looks at the data set and returns only the rows where the column tripduration is greater than 1000.

Similarly, you can combine more than one condition in a single query using 'and', 'or' and 'not', just as you would in a SQL query.

bikes.query('tripduration > 1000 and temperature > 85')
bikes.query('gender == "Female" and tripduration > 2000')
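For comparison, here is a small sketch of the same kind of filters written both with .query() and with ordinary boolean masks. The bikes data below is invented for illustration; only the column names come from the examples above.

```python
import pandas as pd

# Hypothetical bike-share trips; the values are made up for this sketch.
bikes = pd.DataFrame({
    "tripduration": [500, 1200, 2500, 800, 3000],
    "temperature": [70, 90, 88, 60, 30],
    "gender": ["Male", "Female", "Female", "Male", "Female"],
})

# The query string...
long_hot = bikes.query('tripduration > 1000 and temperature > 85')

# ...and the equivalent boolean-mask version.
long_hot_mask = bikes[(bikes["tripduration"] > 1000) & (bikes["temperature"] > 85)]

# 'or' works the same way.
long_or_cold = bikes.query('tripduration > 2000 or temperature < 50')

# String values are quoted inside the query string.
female_long = bikes.query('gender == "Female" and tripduration > 2000')

print(long_hot)
print(long_or_cold)
print(female_long)
```

Both styles return the same rows. The query version avoids repeating the DataFrame name and the extra parentheses that masks need.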

You can even use 'in' to match a column against several values, such as:

bikes.query('events in ["snow", "rain"]')
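As a rough illustration, 'in' inside a query behaves like the isin method on a column. The sketch below uses made-up weather values; only the events column name comes from the example above.

```python
import pandas as pd

# Hypothetical trips with a weather label; values are invented.
bikes = pd.DataFrame({
    "tripduration": [500, 1200, 2500, 800, 3000],
    "events": ["clear", "rain", "cloudy", "snow", "snow"],
})

# Rows whose events value is one of the listed options.
wet = bikes.query('events in ["snow", "rain"]')

# The same filter written with isin and a boolean mask.
wet_mask = bikes[bikes["events"].isin(["snow", "rain"])]

# 'not in' selects the complement.
dry = bikes.query('events not in ["snow", "rain"]')

print(wet)
print(dry)
```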

This is not meant to be an in-depth explanation of the .query() method, but rather to bring it to your attention as a simple, easy-to-use method that can take the place of SQL when filtering in pandas.

Apply method

The apply method is an easy way to apply a function to a specific column while keeping your code short and clean. An example of this can be seen in the Titanic dataset, which has a column called Embarked that holds the port of embarkation as a single letter.

Now we want to change the single letters to the name of the port to make the column clearer and easier to read and analyze. In this example we would change:
S → Southampton
C → Cherbourg
Q → Queenstown

A possible solution is:

df['Embarked'] = df['Embarked'].map({'S':'Southampton', 'C':'Cherbourg', 'Q':'Queenstown'})

You can see how this one-liner will start to get very long and messy if the column contains more than three ports.
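One detail worth knowing about map with a dictionary: any value that is not a key in the dictionary becomes NaN. The sketch below shows this on a small made-up Series; the code "X" is a hypothetical unknown port added purely for illustration, and keeping the dictionary in its own variable is just one way to keep the line short.

```python
import pandas as pd

# Hypothetical Embarked values, including a missing entry and an unknown code.
embarked = pd.Series(["S", "C", "Q", "S", None, "X"], name="Embarked")

# Keeping the lookup table in a variable keeps the map call short.
port_names = {"S": "Southampton", "C": "Cherbourg", "Q": "Queenstown"}

mapped = embarked.map(port_names)

# Codes without a dictionary entry turn into NaN; fall back to the
# original code so unknown values are not silently lost.
mapped_with_fallback = mapped.fillna(embarked)

print(mapped)
print(mapped_with_fallback)
```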

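This is where the apply method comes in handy. You can write a function that does the above and then apply it to the column you want.

def get_full_city_name(city_code):
    """Return the full port name for a one-letter Embarked code."""
    if city_code == "S":
        return "Southampton"
    elif city_code == "C":
        return "Cherbourg"
    elif city_code == "Q":
        return "Queenstown"
    return city_code  # leave missing or unknown codes unchanged

# apply returns a new Series, so assign it back to keep the result
df["Embarked"] = df["Embarked"].apply(get_full_city_name)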

What is happening here is that the function is called once for each individual value in the Embarked column, and the return values are collected into a new Series. As you can see, the code is much cleaner and easier to read. It is as simple as applying the function to the column.
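The following self-contained sketch shows the value-by-value behaviour on a tiny made-up DataFrame. The helper name port_name and the new column EmbarkedName are assumptions for the example, not part of the Titanic data.

```python
import pandas as pd

def port_name(city_code):
    """Hypothetical helper: translate a one-letter port code to its name."""
    if city_code == "S":
        return "Southampton"
    elif city_code == "C":
        return "Cherbourg"
    elif city_code == "Q":
        return "Queenstown"
    return city_code  # unknown or missing codes pass through unchanged

# Small stand-in for the Titanic data.
df = pd.DataFrame({"Embarked": ["S", "C", "Q", "S"]})

# apply calls port_name once per value and builds a new Series.
df["EmbarkedName"] = df["Embarked"].apply(port_name)

# The same result written as an explicit loop over the values.
by_hand = [port_name(code) for code in df["Embarked"]]

print(df)
```

Writing the result to a new column keeps the original codes around, which can be handy while you are still checking the output.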

Hopefully, these few tips and tricks with pandas can help you become more efficient and effective at performing EDA.

25 Tricks for Pandas (KDnuggets)
Last week, Kevin Markham (@justmarkham) of DataSchool.io posted a handy video and a companion Jupyter notebook titled…
www.kdnuggets.com

10 Python Pandas tricks to make data analysis more enjoyable
If one has not yet fallen in love with Pandas, it may be because he/she has not seen enough cool examples
towardsdatascience.com

Learn Python, Data Science & Machine Learning with expert instruction
Start learning data science and machine learning using python today with hands-on courses, comprehensive books, and…
www.dunderdata.com

Titanic dataset: https://www.kaggle.com/c/titanic/data?select=train.csv
