When to Use Horizontal Bar Charts¶
Date published: 2018-06-14
Category: Data Visualizations
Subcategory: Best Practices
Tags: horizontal bar chart
Horizontal bar charts show the size of each value through the length of its bar.
When are horizontal bar charts preferred over vertical bar charts? I find horizontal bar charts useful for displaying a list of categories (usually 4 to 20) that have long names; placing the category names on the left-hand side makes the chart easy to read and interpret.
I'll walk through a few examples of horizontal bar charts below.
Import Modules¶
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
from random import choice, sample, seed
%matplotlib inline
Example: Age of Family Members¶
In this example, I want to visualize the age of a large group of family members. A horizontal bar chart is useful here because names are often long and are easier to read when laid out horizontally than when squeezed in beneath the bars of a vertical bar chart.
Generate Family Data¶
first_names = ['Dan', 'Joe', 'Abe', 'Jess', 'Lauren', 'Jamie', 'Matt', 'Penelope', 'Charlotte']
potential_last_names = ['Williams', 'Johnson']
seed(9001) # seed so we get the same results every time
full_names = [name + " " + choice(potential_last_names) for name in first_names]
ages = sample(range(12, 85), len(full_names))
df = pd.DataFrame({'names': full_names, 'ages': ages})
df.sort_values('ages', inplace=True, ascending=False)
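As a quick check on the seeding step, the sketch below re-seeds the generator and confirms that the same names and ages come back. The helper name `make_family` is hypothetical and exists only for this illustration.

```python
from random import choice, sample, seed

first_names = ['Dan', 'Joe', 'Abe', 'Jess', 'Lauren', 'Jamie', 'Matt', 'Penelope', 'Charlotte']
potential_last_names = ['Williams', 'Johnson']


def make_family(seed_value):
    """Hypothetical helper: rebuild the names and ages from a given seed."""
    seed(seed_value)  # reset the generator so the draws below are repeatable
    full_names = [name + " " + choice(potential_last_names) for name in first_names]
    ages = sample(range(12, 85), len(full_names))  # drawn without replacement
    return full_names, ages


first_run = make_family(9001)
second_run = make_family(9001)

print(first_run == second_run)  # True: same seed, same data
print(len(set(first_run[1])) == len(first_run[1]))  # True: sample() never repeats an age
```

The exact names and ages produced by a given seed may differ between Python versions, so results are only guaranteed to repeat within the same environment.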
Plot Horizontal Bar Plot of Family Members' Ages¶
sns.set_context("talk")
ax = sns.barplot(x='ages', y='names', data=df, orient='h', saturation=0.7)
ax.axes.set_title("Horizontal Bar Chart of Family Members' Age", fontsize=20, y=1.01)
ax.set(xlabel='Age (Years)', ylabel='Name');
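For readers without seaborn, a roughly equivalent chart can be sketched with matplotlib's `Axes.barh`. The names and ages below are made-up placeholder values, not the output of the seeded code, and the styling will not match seaborn's exactly.

```python
import matplotlib.pyplot as plt

# Placeholder data for illustration only; not the article's generated ages
names = ['Penelope Williams', 'Charlotte Johnson', 'Matt Williams', 'Jess Johnson']
ages = [72, 55, 34, 19]

fig, ax = plt.subplots()
bars = ax.barh(names, ages)  # one horizontal bar per name; bar length encodes age
ax.invert_yaxis()  # barh draws the first item at the bottom; flip so it sits on top
ax.set_title("Horizontal Bar Chart of Family Members' Age")
ax.set_xlabel('Age (Years)')
ax.set_ylabel('Name')
fig.tight_layout()  # leave room for the long name labels
```

Outside a notebook, a call to `plt.show()` after this block would display the figure.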
Interpretation of Bar Plot of Family Members' Age¶
The oldest family member is Abe Johnson, while the youngest is Joe Johnson.
The ages span a wide range, from nearly 21 to 81 years old.
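The same reading can be pulled from the DataFrame directly rather than from the chart. This is a minimal sketch on placeholder rows (the names and ages are invented for illustration), using pandas' `idxmax` and `idxmin`.

```python
import pandas as pd

# Placeholder rows for illustration only
df_demo = pd.DataFrame({
    'names': ['Ann Example', 'Bob Example', 'Cy Example'],
    'ages': [47, 81, 21],
})

oldest = df_demo.loc[df_demo['ages'].idxmax()]    # row with the largest age
youngest = df_demo.loc[df_demo['ages'].idxmin()]  # row with the smallest age
age_spread = oldest['ages'] - youngest['ages']    # range between the two extremes

print(oldest['names'], youngest['names'], age_spread)  # Bob Example Cy Example 60
```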
Example: Bay Area Bike Share Popular Start Stations¶
In the San Francisco Bay Area, a company called Motivate operates a network of shared bikes across several cities. You can walk up to a bike, pay and unlock it from a dock, ride it to your destination, and park it in another nearby dock.
For each ride, Motivate records data on the starting dock station. I want to visualize the most popular starting dock stations.
Load Dataset on May 2018 Rides¶
df2 = pd.read_csv('201805-fordgobike-tripdata.csv')
Preview Some Data¶
df2[['start_time', 'end_time', 'duration_sec', 'member_birth_year', 'member_gender']].head()
|   | start_time | end_time | duration_sec | member_birth_year | member_gender |
|---|---|---|---|---|---|
| 0 | 2018-05-31 21:41:51.4750 | 2018-06-01 13:28:22.7220 | 56791 | NaN | NaN |
| 1 | 2018-05-31 18:39:53.7690 | 2018-06-01 09:19:51.5410 | 52797 | 1983.0 | Male |
| 2 | 2018-05-31 21:09:48.0150 | 2018-06-01 09:09:52.4850 | 43204 | NaN | NaN |
| 3 | 2018-05-31 14:09:54.9720 | 2018-06-01 08:48:17.8150 | 67102 | 1979.0 | Male |
| 4 | 2018-05-31 16:07:23.8570 | 2018-06-01 08:28:47.2020 | 58883 | 1986.0 | Male |
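By default `pd.read_csv` loads `start_time` and `end_time` as plain strings, so it may help to parse them as timestamps. The sketch below rebuilds the first two preview rows as an in-memory CSV (an assumption made so the example runs without the data file) and checks that the elapsed time matches `duration_sec`. The names `rides` and `elapsed_sec` are introduced here for illustration.

```python
import io
import pandas as pd

# First two rows of the preview, recreated in memory so no file is needed
csv_text = """start_time,end_time,duration_sec
2018-05-31 21:41:51.4750,2018-06-01 13:28:22.7220,56791
2018-05-31 18:39:53.7690,2018-06-01 09:19:51.5410,52797
"""

rides = pd.read_csv(io.StringIO(csv_text), parse_dates=['start_time', 'end_time'])

# Elapsed seconds between start and end, truncated to whole seconds
elapsed = (rides['end_time'] - rides['start_time']).dt.total_seconds()
rides['elapsed_sec'] = elapsed.astype(int)

print((rides['elapsed_sec'] == rides['duration_sec']).all())  # True
```

For these two rows, `duration_sec` equals the elapsed time rounded down to whole seconds; whether that holds for every row in the full file is not checked here.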
Plot Horizontal Bar Chart of Count of Rides by Starting Stations¶
Below I limit my horizontal bar chart to just show the 15 most frequent starting docks.
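# Names of the 15 most frequent starting stations, most frequent first
top_15_stations = df2['start_station_name'].value_counts().iloc[:15].index
ax2 = sns.countplot(y='start_station_name', data=df2, orient='h', order=top_15_stations)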
ax2.axes.set_title("Horizontal Bar Chart of Count of Rides by Starting Docks", fontsize=20, y=1.01)
ax2.set(xlabel='Count of Rides', ylabel='Start Station Name');
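The `order` argument works here because `value_counts()` returns counts sorted from most to least frequent, so slicing its index gives the top categories in chart order. A small sketch with invented dock labels (not taken from the bike share data):

```python
import pandas as pd

# Invented dock labels for illustration; not taken from the bike share data
starts = pd.Series(['Dock A', 'Dock B', 'Dock A', 'Dock C', 'Dock A', 'Dock B'],
                   name='start_station_name')

counts = starts.value_counts()   # sorted descending by count
top_2 = counts.iloc[:2].index    # labels of the two most frequent docks

print(list(top_2))      # ['Dock A', 'Dock B']
print(counts.tolist())  # [3, 2, 1]
```

Passing an index like `top_2` as `order` both filters the chart to those categories and fixes the top-to-bottom order of the bars.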
Interpretation of Horizontal Bar Chart of Count of Rides by Starting Docks¶
The most frequent starting dock is the San Francisco Ferry Building.
The most frequent starting docks are concentrated in and around the SoMa neighborhood, typically near major public transit options like BART and Caltrain.