When to Use Vertical Grouped Barplots¶

Date published: 2018-06-11

Category: Data Visualizations

Subcategory: Best Practices

Tags: grouped barplots

Vertical bar charts are useful to illustrate sizes of data using different bar heights. A vertical grouped barplot often illustrates the sizes of multiple categories using different bar heights.

For example, let's say we had a service that rented out scooters in San Francisco, California. Customers can make a one-time rental or pay a monthly subscription fee and get unlimited rides for under 30 minutes. So, for each ride, we log if the customer was a non-subscriber or subscriber. With these two categories for types of riders, we can see how they compare to one another with a multiple vertical bar chart.

I'll illustrate a few examples below of when vertical grouped bar plots are useful.

Import Modules¶

In [128]:

                
                    Copied!
                    
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import random
%matplotlib inline
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import random
%matplotlib inline

Set visualization styles to have all fonts scaled larger and figure sizes larger.

In [129]:

                
                    Copied!
                    
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context('talk')
sns.set(rc={'figure.figsize':(11.7,8.27)})
sns.set_context('talk')

Example: Scooter Rides Per Month Over Time by Account Type¶

Let's continue with the example mentioned above. I'm curious about the trend of individual versus subscriber scooter rides over the past year. To get a high-level perspective, I think it would be helpful to look at the count of rides per month for each of these account types.

A sample of our original data would look like:

Date	Miles Ridden	Account Type
6/9/2018	2.1	non-subscriber
6/10/2018	1.5	subscriber
6/10/2018	3.9	subscriber

We'd like get a count of rides on the monthly level broken out by account type. Therefore, given the data above, we'd perform a group by operation on a month-year field, then group by account type, and then count the number of rides. We'd end up with a sample of data looking like:

Month Year	Count Non-Subscriber Rides	Count Subscriber Rides
May 2017	31100	900
June 2017	33900	1100
July 2017	36380	1300

Generate Scooter Ride Data¶

In [130]:

                
                    Copied!
                    
month_list = [i.strftime("%b %Y") for i in pd.date_range(start='5-2017', end='5-2018', freq='MS')]
month_list = [i.strftime("%b %Y") for i in pd.date_range(start='5-2017', end='5-2018', freq='MS')]

In [131]:

                
                    Copied!
                    
monthly_count_rides = [32000, 35000, 37680, 41500, 43300, 44000, 44350, 41000, 39000, 39500, 48000, 50000, 52000]
monthly_count_rides_subscribers = [900, 1100, 1300, 1800, 2860, 3300, 3350, 3480, 4005, 4790, 4980, 5150, 5290]
monthly_count_rides = [32000, 35000, 37680, 41500, 43300, 44000, 44350, 41000, 39000, 39500, 48000, 50000, 52000]
monthly_count_rides_subscribers = [900, 1100, 1300, 1800, 2860, 3300, 3350, 3480, 4005, 4790, 4980, 5150, 5290]

In [132]:

                
                    Copied!
                    
monthly_count_rides_individual = [month[0]-month[1] for month in zip(monthly_count_rides, monthly_count_rides_subscribers)]
monthly_count_rides_individual = [month[0]-month[1] for month in zip(monthly_count_rides, monthly_count_rides_subscribers)]

Plot Scooter Rides¶

The reason a vertical grouped barplot works well in this scenario is because we're most interested to see the change in count of rides by each type - rather than a change in total rides of the account types combined. So, with separate bars for individual and subscription accounts, we can easily visualize the trend over time.

In [133]:

                
                    Copied!
                    
df = pd.DataFrame({'month_year': month_list, 'count_scooter_rides': monthly_count_rides, 
                    'count_scooter_rides_subscribers': monthly_count_rides_subscribers,
                   'count_scooter_rides_non-subscribers': monthly_count_rides_individual})
df = pd.DataFrame({'month_year': month_list, 'count_scooter_rides': monthly_count_rides, 
                    'count_scooter_rides_subscribers': monthly_count_rides_subscribers,
                   'count_scooter_rides_non-subscribers': monthly_count_rides_individual})

Preview df

In [134]:

                
                    Copied!
                    
df.head()
df.head()

Out[134]:

	count_scooter_rides	count_scooter_rides_non-subscribers	count_scooter_rides_subscribers	month_year
0	32000	31100	900	May 2017
1	35000	33900	1100	Jun 2017
2	37680	36380	1300	Jul 2017
3	41500	39700	1800	Aug 2017
4	43300	40440	2860	Sep 2017

In [135]:

                
                    Copied!
                    
                        
                        
                    
                    

            
df.set_index('month_year')[['count_scooter_rides_subscribers', 'count_scooter_rides_non-subscribers']].plot(kind='bar', figsize=(14, 10))
plt.xticks(rotation=60)
plt.title("Count of Scooter Rides Per Month By Account Type Over Time", fontsize=18, y=1.01)
plt.xlabel("date [month year]", labelpad=15)
plt.ylabel("count [rides]", labelpad=15)
plt.legend(["subscribers", "non-subscribers"], fontsize=16, title="rider type");
df.set_index('month_year')[['count_scooter_rides_subscribers', 'count_scooter_rides_non-subscribers']].plot(kind='bar', figsize=(14, 10))
plt.xticks(rotation=60)
plt.title("Count of Scooter Rides Per Month By Account Type Over Time", fontsize=18, y=1.01)
plt.xlabel("date [month year]", labelpad=15)
plt.ylabel("count [rides]", labelpad=15)
plt.legend(["subscribers", "non-subscribers"], fontsize=16, title="rider type");

Explanation of Scooter Rides Plot¶

There is a clear trend that the amount of rides by subscribers has increased every month since May 2017. Also, the increase has been pretty significant as the count of subscriber rides per month has doubled from May 2017 to May 2018.

However, non-subscriber scooter rides increased during the warmer months of 2017 from May to November. During the winter season, count of non-subscriber scooter rides decreased per month over several months. By March 2018, when the weather was much warmer, the count of non-subscriber rides per month drastically increased.

Example: Favorite Sport to Play by Gender¶

Let's imagine we surveyed a group of friends to ask them their favorite sport. We recorded each person's response. We only allowed responses for the following sports:

basketball
baseball
soccer

A sample of our data would look like:

Name	Gender	Favorite Sport
Jake	Male	Basketball
Michele	Female	Baseball
Elizabeth	Female	Soccer

We want to visualize the data in two ways: 1) see favorite sports by gender and 2) see breakdown of a specific sport's interest by gender.

Generate Fictional Data¶

In [136]:

                
                    Copied!
                    
genders = []
random.seed(a=1) # ensures choices generated each time below are the same
for i in range(0, 26):
    gender = random.choice(['Female', 'Male'])
    genders.append(gender)
genders = []
random.seed(a=1) # ensures choices generated each time below are the same
for i in range(0, 26):
    gender = random.choice(['Female', 'Male'])
    genders.append(gender)

In [137]:

                
                    Copied!
                    
sports = ['Basketball']*6 + ['Baseball']*8 + ['Soccer']*12
sports = ['Basketball']*6 + ['Baseball']*8 + ['Soccer']*12

In [138]:

                
                    Copied!
                    
df2 = pd.DataFrame({'favorite_sport': sports, 'gender': genders})
df2 = pd.DataFrame({'favorite_sport': sports, 'gender': genders})

Preview df2.

In [139]:

                
                    Copied!
                    
df2.head()
df2.head()

Out[139]:

	favorite_sport	gender
0	Basketball	Female
1	Basketball	Female
2	Basketball	Male
3	Basketball	Female
4	Basketball	Male

Plot Categorized by Gender¶

In [140]:

                
                    Copied!
                    
ax = sns.countplot(x="gender", hue="favorite_sport", data=df2)
ax.axes.set_title("Breakdown of Genders' Favorite Sports", fontsize=22)
plt.legend(fontsize=15, title="sport");
ax = sns.countplot(x="gender", hue="favorite_sport", data=df2)
ax.axes.set_title("Breakdown of Genders' Favorite Sports", fontsize=22)
plt.legend(fontsize=15, title="sport");

Explanation of Plot Categorized by Gender¶

Our female friends majorly prefer soccer and our male friends nearly prefer all 3 sports equally.

Plot Categorized by Sport¶

In [141]:

                
                    Copied!
                    
ax = sns.countplot(x="favorite_sport", hue="gender", data=df2)
ax.axes.set_title("Breakdown of Favorite Sports by Gender", fontsize=22)
plt.xlabel("sport", labelpad=9);
ax = sns.countplot(x="favorite_sport", hue="gender", data=df2)
ax.axes.set_title("Breakdown of Favorite Sports by Gender", fontsize=22)
plt.xlabel("sport", labelpad=9);