Data Visualizations Best Practices Tutorial

When to Use Heatmaps

Heatmaps are great for visualizing table-like data with variations in coloring.

If you're unfamiliar with heatmaps, please scroll down to see an example.

Heatmaps help reveal patterns of similar values next to one another based on their color.

I'll illustrate a few examples of heatmaps below.

Import Modules

In [1]:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
from datetime import date
% matplotlib inline

Example: Heatmap of Bike Rides by Day of Week and Hour of Day

In the San Francisco Bay Area, a company Motivate operates a network of bikes across several cities. You can walk up to a bike, pay and unlock it from a dock, ride it to your destination, and park it in another nearby dock.

For each ride, Motivate records data on the start time, end time and more.

I'm curious to see riding patterns by day of week and hour of day. Are they different or similar?

Acquire Data and Organize for Heatmap Visualization

Read in CSV.

In [2]:
df = pd.read_csv('201805-fordgobike-tripdata.csv')

Preview data

In [3]:
df[['duration_sec', 'start_time', 'end_time']].head()
duration_sec start_time end_time
0 56791 2018-05-31 21:41:51.4750 2018-06-01 13:28:22.7220
1 52797 2018-05-31 18:39:53.7690 2018-06-01 09:19:51.5410
2 43204 2018-05-31 21:09:48.0150 2018-06-01 09:09:52.4850
3 67102 2018-05-31 14:09:54.9720 2018-06-01 08:48:17.8150
4 58883 2018-05-31 16:07:23.8570 2018-06-01 08:28:47.2020

Convert start_time field into a datetime type with new field name called start_time_datetime

In [4]:
df['start_time_datetime'] = pd.to_datetime(df['start_time'])

Keep records of rides only before May 29th so we have the same number of occurrences for all days of the week.

In [5]:
df = df[df['start_time_datetime']<date(2018, 5, 29)]

Make new column start_time_day_name to be the day name of start of the ride such as Saturday.

In [6]:
df['start_time_day_name'] = df['start_time_datetime'].dt.weekday_name;

Make new column start_time_hour for the start time hour of rides.

In [7]:
df['start_time_hour'] = df['start_time_datetime'].dt.hour;

Make new dataframe df_rides_day_hour2 to pivot our data and by day and hour, get the count of rides.

In [8]:
df_rides_day_hour2 = pd.pivot_table(df[['start_time_day_name', 'start_time_hour', 'duration_sec']], index=['start_time_day_name', 'start_time_hour'], aggfunc='count')

Unstack below puts days on the x-axis and hour of day on the y-axis.

In [9]:
df_rides_day_hour3 = df_rides_day_hour2.unstack(level=0)

Re-index axis so day of week appears in logical progression.

In [10]:
df_rides_day_hour3 = df_rides_day_hour3.reindex_axis(labels=['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'], axis=1, level=1)

Create new axis tick labels for easy to read hours and days of the week.

In [11]:
morning_hours = []
for hour in range(1, 12):
    detailed_hour = str(hour) + "am"
In [12]:
afternoon_hours = []
for hour in range(1, 12):
    detailed_hour = str(hour) + "pm"
In [13]:
detailed_hours = ["12am"] + morning_hours + ["12pm"] + afternoon_hours
In [14]:
day_short_names = ['Mon', 'Tues', 'Wed', 'Thurs', 'Fri', 'Sat', 'Sun']

Plot Heatmap of Ride Counts by Day and Hour of Day

In [31]:
f, ax = plt.subplots(figsize=(11, 15))
ax = sns.heatmap(df_rides_day_hour3, annot=True, fmt="d", linewidths=.5, ax=ax, xticklabels=day_short_names, yticklabels=detailed_hours)
ax.axes.set_title("Heatmap of Ride Counts by Day and Hour of Day", fontsize=24, y=1.01)
ax.set(xlabel='Day of Week', ylabel='Starting Hour of Ride');

Interpretation of Heat Map of Rides

On weekdays, Monday to Friday, most rides are taken during typical commuting hours, from 7am to 9am and 4pm - 7pm.

On weekends, there's fairly consistent amount of rides per hour from 11am - 6pm.

Example: Heatmap of Flights By Month and Year

The Seaborn visualization library provides an example dataset of the count of flights per month over the years 1949 to 1960. I want to easily visualize this data and see if there are any patterns.

Acquire the Flights Dataset

In [21]:
flights_long = sns.load_dataset("flights")

Pivot the Data to Heatmap Format

In [22]:
flights = flights_long.pivot("month", "year", "passengers")

Draw a Heatmap of Flight Counts by Month and Year

In [32]:
f, ax = plt.subplots(figsize=(14, 13))
sns.heatmap(flights, annot=True, fmt="d", linewidths=.5, ax=ax, cmap="Greens")
ax.axes.set_title("Heatmap of Count of Flights by Month and Year", fontsize=24, y=1.01);

Interpret Heatmap of Flights Over Time

Over the years, there's a trend towards more flights by month.

The most frequent flight months of any year are typically July and August.