When to Use Vertical Grouped Barplots¶
Date published: 2018-06-11
Category: Data Visualizations
Subcategory: Best Practices
Tags: grouped barplots
Vertical bar charts are useful to illustrate sizes of data using different bar heights. A vertical grouped barplot often illustrates the sizes of multiple categories using different bar heights.
For example, let's say we had a service that rented out scooters in San Francisco, California. Customers can make a one-time rental or pay a monthly subscription fee and get unlimited rides for under 30 minutes. So, for each ride, we log if the customer was a non-subscriber or subscriber. With these two categories for types of riders, we can see how they compare to one another with a multiple vertical bar chart.
I'll illustrate a few examples below of when vertical grouped bar plots are useful.
Import Modules¶
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import random
%matplotlib inline
Set visualization styles to have all fonts scaled larger and figure sizes larger.
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context('talk')
Example: Scooter Rides Per Month Over Time by Account Type¶
Let's continue with the example mentioned above. I'm curious about the trend of individual versus subscriber scooter rides over the past year. To get a high-level perspective, I think it would be helpful to look at the count of rides per month for each of these account types.
A sample of our original data would look like:
Date | Miles Ridden | Account Type |
---|---|---|
6/9/2018 | 2.1 | non-subscriber |
6/10/2018 | 1.5 | subscriber |
6/10/2018 | 3.9 | subscriber |
We'd like get a count of rides on the monthly level broken out by account type. Therefore, given the data above, we'd perform a group by operation on a month-year field, then group by account type, and then count the number of rides. We'd end up with a sample of data looking like:
Month Year | Count Non-Subscriber Rides | Count Subscriber Rides |
---|---|---|
May 2017 | 31100 | 900 |
June 2017 | 33900 | 1100 |
July 2017 | 36380 | 1300 |
Generate Scooter Ride Data¶
month_list = [i.strftime("%b %Y") for i in pd.date_range(start='5-2017', end='5-2018', freq='MS')]
monthly_count_rides = [32000, 35000, 37680, 41500, 43300, 44000, 44350, 41000, 39000, 39500, 48000, 50000, 52000]
monthly_count_rides_subscribers = [900, 1100, 1300, 1800, 2860, 3300, 3350, 3480, 4005, 4790, 4980, 5150, 5290]
monthly_count_rides_individual = [month[0]-month[1] for month in zip(monthly_count_rides, monthly_count_rides_subscribers)]
Plot Scooter Rides¶
The reason a vertical grouped barplot works well in this scenario is because we're most interested to see the change in count of rides by each type - rather than a change in total rides of the account types combined. So, with separate bars for individual and subscription accounts, we can easily visualize the trend over time.
df = pd.DataFrame({'month_year': month_list, 'count_scooter_rides': monthly_count_rides,
'count_scooter_rides_subscribers': monthly_count_rides_subscribers,
'count_scooter_rides_non-subscribers': monthly_count_rides_individual})
Preview df
df.head()
count_scooter_rides | count_scooter_rides_non-subscribers | count_scooter_rides_subscribers | month_year | |
---|---|---|---|---|
0 | 32000 | 31100 | 900 | May 2017 |
1 | 35000 | 33900 | 1100 | Jun 2017 |
2 | 37680 | 36380 | 1300 | Jul 2017 |
3 | 41500 | 39700 | 1800 | Aug 2017 |
4 | 43300 | 40440 | 2860 | Sep 2017 |
df.set_index('month_year')[['count_scooter_rides_subscribers', 'count_scooter_rides_non-subscribers']].plot(kind='bar', figsize=(14, 10))
plt.xticks(rotation=60)
plt.title("Count of Scooter Rides Per Month By Account Type Over Time", fontsize=18, y=1.01)
plt.xlabel("date [month year]", labelpad=15)
plt.ylabel("count [rides]", labelpad=15)
plt.legend(["subscribers", "non-subscribers"], fontsize=16, title="rider type");
Explanation of Scooter Rides Plot¶
There is a clear trend that the amount of rides by subscribers has increased every month since May 2017. Also, the increase has been pretty significant as the count of subscriber rides per month has doubled from May 2017 to May 2018.
However, non-subscriber scooter rides increased during the warmer months of 2017 from May to November. During the winter season, count of non-subscriber scooter rides decreased per month over several months. By March 2018, when the weather was much warmer, the count of non-subscriber rides per month drastically increased.
Example: Favorite Sport to Play by Gender¶
Let's imagine we surveyed a group of friends to ask them their favorite sport. We recorded each person's response. We only allowed responses for the following sports:
- basketball
- baseball
- soccer
A sample of our data would look like:
Name | Gender | Favorite Sport |
---|---|---|
Jake | Male | Basketball |
Michele | Female | Baseball |
Elizabeth | Female | Soccer |
We want to visualize the data in two ways: 1) see favorite sports by gender and 2) see breakdown of a specific sport's interest by gender.
Generate Fictional Data¶
genders = []
random.seed(a=1) # ensures choices generated each time below are the same
for i in range(0, 26):
gender = random.choice(['Female', 'Male'])
genders.append(gender)
sports = ['Basketball']*6 + ['Baseball']*8 + ['Soccer']*12
df2 = pd.DataFrame({'favorite_sport': sports, 'gender': genders})
Preview df2
.
df2.head()
favorite_sport | gender | |
---|---|---|
0 | Basketball | Female |
1 | Basketball | Female |
2 | Basketball | Male |
3 | Basketball | Female |
4 | Basketball | Male |
Plot Categorized by Gender¶
ax = sns.countplot(x="gender", hue="favorite_sport", data=df2)
ax.axes.set_title("Breakdown of Genders' Favorite Sports", fontsize=22)
plt.legend(fontsize=15, title="sport");
Explanation of Plot Categorized by Gender¶
Our female friends majorly prefer soccer and our male friends nearly prefer all 3 sports equally.
Plot Categorized by Sport¶
ax = sns.countplot(x="favorite_sport", hue="gender", data=df2)
ax.axes.set_title("Breakdown of Favorite Sports by Gender", fontsize=22)
plt.xlabel("sport", labelpad=9);