When to Use Heatmaps

Date published: 2018-06-14

Category: Data Visualizations

Subcategory: Best Practices

Tags: heatmaps

Heatmaps are great for visualizing table-like data: each cell is shaded according to its value, so variations in color show variations in the data.

If you're unfamiliar with heatmaps, please scroll down to see an example.

Because cells with similar values get similar colors, heatmaps help reveal patterns among values that sit next to one another.

I'll illustrate a few examples of heatmaps below.

Import Modules

In [1]:

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
from datetime import date
%matplotlib inline

Example: Heatmap of Bike Rides by Day of Week and Hour of Day

In the San Francisco Bay Area, a company called Motivate operates a network of shared bikes across several cities. You can walk up to a bike, pay and unlock it from a dock, ride it to your destination, and park it in another nearby dock.

For each ride, Motivate records data on the start time, end time and more.

I'm curious to see riding patterns by day of week and hour of day. Are they different or similar?

Acquire Data and Organize for Heatmap Visualization

Read in CSV.

In [2]:

df = pd.read_csv('201805-fordgobike-tripdata.csv')

Preview the data.

In [3]:

df[['duration_sec', 'start_time', 'end_time']].head()

Out[3]:

	duration_sec	start_time	end_time
0	56791	2018-05-31 21:41:51.4750	2018-06-01 13:28:22.7220
1	52797	2018-05-31 18:39:53.7690	2018-06-01 09:19:51.5410
2	43204	2018-05-31 21:09:48.0150	2018-06-01 09:09:52.4850
3	67102	2018-05-31 14:09:54.9720	2018-06-01 08:48:17.8150
4	58883	2018-05-31 16:07:23.8570	2018-06-01 08:28:47.2020

Convert the start_time field to a datetime type and store it in a new column called start_time_datetime.

In [4]:

df['start_time_datetime'] = pd.to_datetime(df['start_time'])

Keep only rides that started before May 29th. May 1st through May 28th covers exactly four weeks, so every day of the week occurs the same number of times.

In [5]:

# Compare against a pandas Timestamp; .copy() avoids a SettingWithCopyWarning when columns are added later
df = df[df['start_time_datetime'] < pd.Timestamp('2018-05-29')].copy()
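
The reasoning behind the May 29th cutoff can be checked without the ride data. The sketch below uses only the standard library to count how often each weekday falls in May 1 to May 28, 2018, and in the full month. The helper `weekday_counts` and the variable names are illustrative, not part of the original notebook.

```python
from collections import Counter
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def weekday_counts(first_day, n_days):
    """Count how many times each weekday occurs in n_days starting at first_day."""
    days = (first_day + timedelta(days=i) for i in range(n_days))
    return Counter(DAY_NAMES[d.weekday()] for d in days)

trimmed = weekday_counts(date(2018, 5, 1), 28)     # May 1 through May 28
full_month = weekday_counts(date(2018, 5, 1), 31)  # May 1 through May 31

print(trimmed)
print(full_month)
```

In the trimmed range every weekday appears four times. The full month contains a fifth Tuesday, Wednesday, and Thursday, which would inflate the ride counts for those days.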

Make a new column, start_time_day_name, holding the name of the day on which each ride started, such as Saturday.

In [6]:

# .dt.weekday_name is not available in current pandas releases; .dt.day_name() is the replacement
df['start_time_day_name'] = df['start_time_datetime'].dt.day_name()

Make new column start_time_hour for the start time hour of rides.

In [7]:

df['start_time_hour'] = df['start_time_datetime'].dt.hour
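
As a quick check of the two derived columns, here is a minimal sketch that applies the same steps to two start times copied from the preview table. It uses `.dt.day_name()`, the accessor available in current pandas releases; the `sample` dataframe is an illustrative stand-in for the full dataset.

```python
import pandas as pd

# Two start times copied from the preview table in this notebook.
sample = pd.DataFrame(
    {"start_time": ["2018-05-31 21:41:51.4750", "2018-05-31 18:39:53.7690"]}
)

# Parse the strings, then derive the day name and the hour of day.
sample["start_time_datetime"] = pd.to_datetime(sample["start_time"])
sample["start_time_day_name"] = sample["start_time_datetime"].dt.day_name()
sample["start_time_hour"] = sample["start_time_datetime"].dt.hour

print(sample[["start_time_day_name", "start_time_hour"]])
```

Both rides started on a Thursday, at hours 21 and 18, so they would be counted in the Thursday 9pm and Thursday 6pm cells of the heatmap if they fell inside the date range kept above.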

Make a new dataframe, df_rides_day_hour2, that pivots the data to count the rides for each combination of day and hour.

In [8]:

df_rides_day_hour2 = pd.pivot_table(df[['start_time_day_name', 'start_time_hour', 'duration_sec']], index=['start_time_day_name', 'start_time_hour'], aggfunc='count')

Unstacking the day level puts days of the week on the x-axis (columns) and hour of day on the y-axis (rows).

In [9]:

df_rides_day_hour3 = df_rides_day_hour2.unstack(level=0)
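
To see what the pivot and unstack steps do to the shape of the data, here is a minimal sketch on six made-up rides. The toy values and the names `toy`, `counts`, and `wide` are illustrative assumptions, not data from the bike-share file.

```python
import pandas as pd

# Six hypothetical rides: three on Monday, three on Tuesday.
toy = pd.DataFrame({
    "start_time_day_name": ["Monday", "Monday", "Monday", "Tuesday", "Tuesday", "Tuesday"],
    "start_time_hour": [8, 8, 17, 8, 17, 17],
    "duration_sec": [600, 720, 900, 480, 660, 1200],
})

# Long format: one row per (day, hour) pair, holding the number of rides.
counts = pd.pivot_table(
    toy, index=["start_time_day_name", "start_time_hour"], aggfunc="count"
)

# Wide format: hours stay as rows, day names move to the columns.
wide = counts.unstack(level=0)

print(counts)
print(wide)
```

The `duration_sec` column is only used as something to count; after `unstack(level=0)` the result has one row per hour and one column per day, which is the grid layout a heatmap needs.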

Reindex the columns so the days of the week appear in calendar order rather than alphabetical order.

In [10]:

# reindex_axis is not available in current pandas releases; reindex accepts the same labels, axis, and level arguments
df_rides_day_hour3 = df_rides_day_hour3.reindex(labels=['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'], axis=1, level=1)
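
The effect of reordering one level of the column index is easier to see on a tiny table. This is a minimal sketch with a hypothetical one-row dataframe; it uses `DataFrame.reindex` with `axis=1` and `level=1`, and the values are made up for illustration.

```python
import pandas as pd

# Columns mimic the unstacked table: (value name, day name), in alphabetical order.
columns = pd.MultiIndex.from_product([["duration_sec"], ["Friday", "Monday", "Wednesday"]])
wide = pd.DataFrame([[3, 1, 2]], index=[8], columns=columns)

# Reorder the day-name level into calendar order; values move with their labels.
ordered = wide.reindex(labels=["Monday", "Wednesday", "Friday"], axis=1, level=1)

print(list(wide.columns.get_level_values(1)))
print(list(ordered.columns.get_level_values(1)))
```

The order matters here because the heatmap's x-axis tick labels are supplied as a separate list, so the columns have to be in the same Monday-to-Sunday order as that list.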

Create new axis tick labels so the hours and days of the week are easy to read.

In [11]:

morning_hours = []
for hour in range(1, 12):
    detailed_hour = str(hour) + "am"
    morning_hours.append(detailed_hour)

In [12]:

afternoon_hours = []
for hour in range(1, 12):
    detailed_hour = str(hour) + "pm"
    afternoon_hours.append(detailed_hour)

In [13]:

detailed_hours = ["12am"] + morning_hours + ["12pm"] + afternoon_hours
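
The same 24 labels can also be built in one pass. The sketch below is an alternative to the loops above, not the notebook's own method; the helper name `hour_label` is hypothetical.

```python
def hour_label(hour):
    """Convert an hour in 0..23 to a 12-hour label such as '12am' or '3pm'."""
    suffix = "am" if hour < 12 else "pm"
    hour_12 = hour % 12 or 12  # hours 0 and 12 are both displayed as 12
    return f"{hour_12}{suffix}"

# One label per row of the heatmap, in the same order as hours 0 through 23.
detailed_hours = [hour_label(hour) for hour in range(24)]

print(detailed_hours)
```

Mapping directly from the hour number keeps the label order tied to the row order of the pivoted table, where hour 0 is midnight and hour 12 is noon.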

In [14]:

day_short_names = ['Mon', 'Tues', 'Wed', 'Thurs', 'Fri', 'Sat', 'Sun']

Plot Heatmap of Ride Counts by Day and Hour of Day

In [31]:

sns.set_context("talk")
f, ax = plt.subplots(figsize=(11, 15))
ax = sns.heatmap(df_rides_day_hour3, annot=True, fmt="d", linewidths=.5, ax=ax, xticklabels=day_short_names, yticklabels=detailed_hours)
ax.axes.set_title("Heatmap of Ride Counts by Day and Hour of Day", fontsize=24, y=1.01)
ax.set(xlabel='Day of Week', ylabel='Starting Hour of Ride');

Interpretation of Heatmap of Rides

On weekdays (Monday to Friday), most rides are taken during typical commuting hours: 7am to 9am and 4pm to 7pm.

On weekends, the number of rides per hour is fairly consistent from 11am to 6pm.

Example: Heatmap of Flights By Month and Year

The Seaborn visualization library provides an example dataset, flights, that records the number of airline passengers per month over the years 1949 to 1960. I want to easily visualize this data and see if there are any patterns.

Acquire the Flights Dataset

In [21]:

flights_long = sns.load_dataset("flights")

Pivot the Data to Heatmap Format

In [22]:

# Current pandas releases require keyword arguments for pivot
flights = flights_long.pivot(index="month", columns="year", values="passengers")
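
For readers who have not used `pivot` before, here is a minimal sketch of the long-to-wide reshaping on four hypothetical rows. The column names match the flights dataset, but the passenger numbers are made up for illustration and are not real values.

```python
import pandas as pd

# Hypothetical long-format rows: one row per (year, month) observation.
toy_long = pd.DataFrame({
    "year": [1949, 1949, 1950, 1950],
    "month": ["Jan", "Feb", "Jan", "Feb"],
    "passengers": [10, 20, 30, 40],
})

# Wide format: one row per month, one column per year, passengers as cell values.
toy_wide = toy_long.pivot(index="month", columns="year", values="passengers")

print(toy_wide)
```

Each cell of the wide table corresponds to one cell of the heatmap, so `toy_wide.loc["Jan", 1950]` holds the value recorded for January 1950.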

Draw a Heatmap of Passenger Counts by Month and Year

In [32]:

f, ax = plt.subplots(figsize=(14, 13))
sns.heatmap(flights, annot=True, fmt="d", linewidths=.5, ax=ax, cmap="Greens")
# The cell values are passenger counts, so the title says so
ax.axes.set_title("Heatmap of Passenger Counts by Month and Year", fontsize=24, y=1.01);

Interpret Heatmap of Flights Over Time

Over the years, there's a trend towards more passengers in every month.

The busiest months of any year are typically July and August.