When to Use Horizontal Bar Charts

Date published: 2018-06-14

Category: Data Visualizations

Subcategory: Best Practices

Tags: horizontal bar chart

Horizontal bar charts illustrate sizes of data using different bar lengths.

When are horizontal bar charts preferred over vertical bar charts? I find horizontal bar charts useful for displaying a list of categories (usually 4 to 20) that have long names; with the category names on the left-hand side, the chart is easy to read and interpret.

I'll walk through a few examples of horizontal bar charts below.

Import Modules

In [61]:

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
from random import choice
from random import sample
from random import seed
%matplotlib inline

Example: Age of Family Members

In this example, I want to visualize the ages of a large group of family members. A horizontal bar chart is useful here because names are often long and are easier to read when listed horizontally than when squeezed in along the bottom of a vertical bar chart.

Generate Family Data

In [74]:

first_names = ['Dan', 'Joe', 'Abe', 'Jess', 'Lauren', 'Jamie', 'Matt', 'Penelope', 'Charlotte']

In [75]:

potential_last_names = ['Williams', 'Johnson']

In [76]:

seed(9001) # seed so we get the same results every time
full_names = [name + " " + choice(potential_last_names) for name in first_names]
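To illustrate why the cell above calls `seed` before drawing random last names, here is a minimal sketch using only the standard library. The helper `draw_last_names` is a hypothetical name introduced for this illustration; it is not part of the original notebook.

```python
from random import choice, seed

last_names = ['Williams', 'Johnson']

def draw_last_names(n, seed_value=9001):
    """Hypothetical helper: re-seed the generator, then draw n last names."""
    seed(seed_value)
    return [choice(last_names) for _ in range(n)]

first_run = draw_last_names(9)
second_run = draw_last_names(9)

# Re-seeding with the same value replays the same sequence of draws
print(first_run == second_run)
```

Because the generator's state carries over between cells, later random calls such as `sample` are only reproducible when the notebook cells are run in the same order after seeding.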

In [77]:

ages = sample(range(12, 85), len(full_names))

In [78]:

df = pd.DataFrame({'names': full_names, 'ages': ages})
df.sort_values('ages', inplace=True, ascending=False)

Plot Horizontal Bar Plot of Family Members' Ages

In [79]:

sns.set_context("talk")
ax = sns.barplot(x='ages', y='names', data=df, orient='h', saturation=0.7)
ax.axes.set_title("Horizontal Bar Chart of Family Members' Age", fontsize=20, y=1.01)
ax.set(xlabel='Age (Years)', ylabel='Name');
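For comparison, a similar chart can be sketched with plain matplotlib's `Axes.barh`. This is a rough equivalent under stated assumptions, not the notebook's method: the names and ages below are hypothetical stand-ins for the generated data, and the `Agg` backend is chosen only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (an assumption for headless runs)
import matplotlib.pyplot as plt

# Hypothetical names and ages, already sorted from oldest to youngest
names = ["Penelope Williams", "Charlotte Johnson", "Lauren Williams", "Dan Johnson"]
ages = [78, 64, 41, 23]

fig, ax = plt.subplots(figsize=(8, 4))
bars = ax.barh(names, ages)  # one horizontal bar per name; bar length encodes age
ax.invert_yaxis()  # barh draws the first item at the bottom; flip so the oldest is on top
ax.set_title("Horizontal Bar Chart of Family Members' Age")
ax.set_xlabel("Age (Years)")
ax.set_ylabel("Name")
fig.tight_layout()  # leave room for the long names on the left
```

The long names sit on the y-axis, where they read left to right without rotation, which is the main reason to prefer this orientation for such data.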

Interpretation of Bar Plot of Family Members' Age

The oldest family member is Abe Johnson while the youngest is Joe Johnson.

There's a wide spread of ages from nearly 21 years old to 81 years old.
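Readings like these can also be checked against the data directly. The sketch below uses a small hypothetical DataFrame (`people`, with made-up names and ages) rather than the notebook's `df`, and relies on pandas' `idxmax` and `idxmin` to find the rows with the largest and smallest ages.

```python
import pandas as pd

# Hypothetical data for illustration only
people = pd.DataFrame({
    'names': ['Person A', 'Person B', 'Person C'],
    'ages': [34, 71, 19],
})

oldest = people.loc[people['ages'].idxmax(), 'names']    # name on the row with the largest age
youngest = people.loc[people['ages'].idxmin(), 'names']  # name on the row with the smallest age
age_range = people['ages'].max() - people['ages'].min()  # spread between the extremes

print(oldest, youngest, age_range)
```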

In the San Francisco Bay Area, a company called Motivate operates a bike-share network across several cities. You can walk up to a bike, pay and unlock it from a dock, ride it to your destination, and park it in another nearby dock.

For each ride, Motivate records data on the starting dock station. I want to visualize the most popular starting dock stations.

Load Dataset on May 2018 Rides

In [84]:

df2 = pd.read_csv('201805-fordgobike-tripdata.csv')

Preview Some Data

In [86]:

df2[['start_time', 'end_time', 'duration_sec', 'member_birth_year', 'member_gender']].head()

Out[86]:

	start_time	end_time	duration_sec	member_birth_year	member_gender
0	2018-05-31 21:41:51.4750	2018-06-01 13:28:22.7220	56791	NaN	NaN
1	2018-05-31 18:39:53.7690	2018-06-01 09:19:51.5410	52797	1983.0	Male
2	2018-05-31 21:09:48.0150	2018-06-01 09:09:52.4850	43204	NaN	NaN
3	2018-05-31 14:09:54.9720	2018-06-01 08:48:17.8150	67102	1979.0	Male
4	2018-05-31 16:07:23.8570	2018-06-01 08:28:47.2020	58883	1986.0	Male

Plot Horizontal Bar Chart of Count of Rides by Starting Stations

Below I limit my horizontal bar chart to just show the 15 most frequent starting docks.

In [96]:

ax2 = sns.countplot(y='start_station_name', data=df2, orient='h', order=df2['start_station_name'].value_counts().iloc[:15].index)
ax2.axes.set_title("Horizontal Bar Chart of Count of Rides by Starting Docks", fontsize=20, y=1.01)
ax2.set(xlabel='Count of Rides', ylabel='Start Station Name');
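The `order=` argument in the cell above is what restricts the chart to the most frequent docks. As a minimal sketch of that step on its own, the example below uses a tiny hypothetical DataFrame (`rides`, with made-up station names) and keeps the top 2 instead of the top 15.

```python
import pandas as pd

# Hypothetical ride records; station names are made up for illustration
rides = pd.DataFrame({
    'start_station_name': ['Dock A', 'Dock B', 'Dock A', 'Dock C',
                           'Dock A', 'Dock B', 'Dock D'],
})

# value_counts() returns counts sorted from most to least frequent
counts = rides['start_station_name'].value_counts()

# Slicing the first rows and taking the index yields the top station names
top_two = counts.iloc[:2].index.tolist()
print(top_two)  # ['Dock A', 'Dock B']
```

Passing such a list as `order` both filters the categories and fixes their top-to-bottom order, so the most frequent dock appears first.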

Interpretation of Horizontal Bar Chart of Count of Rides by Starting Docks

The most frequent starting dock is the San Francisco Ferry Building.

The most frequent starting docks are all in the SoMa neighborhood and typically near high-speed public transit options like BART and Caltrain.