Data Visualizations Best Practices Tutorial

When to Use Horizontal Bar Charts

Horizontal bar charts represent the size of each value by the length of its bar.

When are horizontal bar charts preferred over vertical bar charts? I find horizontal bar charts useful for displaying a list of categories (usually 4 to 20) that have long names; placing the category names on the left-hand side makes the chart easy to read and interpret.

I'll walk through a few examples of horizontal bar charts below.

Import Modules

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
from random import choice, sample, seed
%matplotlib inline

Example: Age of Family Members

In this example, I want to visualize the ages of a large group of family members. A horizontal bar chart is useful here because names are often long, and they are easier to read when written horizontally beside each bar than when squeezed in under the bars of a vertical bar chart.

Generate Family Data

first_names = ['Dan', 'Joe', 'Abe', 'Jess', 'Lauren', 'Jamie', 'Matt', 'Penelope', 'Charlotte']
potential_last_names = ['Williams', 'Johnson']
seed(9001) # seed so we get the same results every time
full_names = [name + " " + choice(potential_last_names) for name in first_names]
ages = sample(range(12, 85), len(full_names))
df = pd.DataFrame({'names': full_names, 'ages': ages})
df = df.sort_values('ages', ascending=False)  # oldest first, so the longest bar is drawn at the top
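The comment on seed(9001) says that seeding gives the same results every time. A minimal sketch of that behavior, using only the standard library, is below. The helper make_family is a hypothetical name introduced for illustration and is not part of the original notebook. The specific names and ages are not shown because they may differ between Python versions.

```python
from random import choice, sample, seed

first_names = ['Dan', 'Joe', 'Abe', 'Jess', 'Lauren', 'Jamie', 'Matt', 'Penelope', 'Charlotte']
potential_last_names = ['Williams', 'Johnson']


def make_family(seed_value):
    """Hypothetical helper: rebuild the names and ages from a given seed."""
    seed(seed_value)  # reset the random generator to a known state
    full_names = [name + " " + choice(potential_last_names) for name in first_names]
    ages = sample(range(12, 85), len(full_names))  # distinct ages between 12 and 84
    return full_names, ages


first_run = make_family(9001)
second_run = make_family(9001)
print(first_run == second_run)  # True: the same seed reproduces the same data
```

Because sample draws without replacement, every family member also gets a distinct age.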

Plot Horizontal Bar Plot of Family Members' Ages

sns.set_context("talk")
ax = sns.barplot(x='ages', y='names', data=df, orient='h', saturation=0.7)
ax.set_title("Horizontal Bar Chart of Family Members' Age", fontsize=20, y=1.01)
ax.set(xlabel='Age (Years)', ylabel='Name');

[Figure: horizontal bar chart of family members' ages]
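For readers without seaborn, a similar chart can be sketched with matplotlib alone. This is an alternative under stated assumptions, not the method used above: the names and ages are made up for illustration, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up values for illustration only; not the notebook's generated data.
names = ["Penelope Williams", "Charlotte Johnson", "Lauren Williams"]
ages = [64, 47, 30]

fig, ax = plt.subplots()
bars = ax.barh(names, ages)  # one horizontal bar per name
ax.invert_yaxis()            # barh draws the first item at the bottom; flip so it is on top
ax.set_title("Horizontal Bar Chart of Family Members' Age")
ax.set_xlabel("Age (Years)")
ax.set_ylabel("Name")
fig.tight_layout()           # leave room for the long names on the left
```

One design difference worth noting: seaborn's barplot lists the first category at the top, while matplotlib's barh starts at the bottom, which is why the sketch inverts the y-axis.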

Interpretation of Bar Plot of Family Members' Age

The oldest family member is Abe Johnson, while the youngest is Joe Johnson.

The ages are widely spread, from roughly 21 years old to 81 years old.
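Readings like these can also be checked directly against the data instead of by eye. The sketch below uses pandas idxmax and idxmin on a small made-up DataFrame; the rows are illustrative only and are not the generated family data.

```python
import pandas as pd

# Made-up rows for illustration only; not the notebook's generated data.
people = pd.DataFrame({
    'names': ['Ana Example', 'Ben Example', 'Cy Example'],
    'ages': [34, 71, 19],
})

oldest = people.loc[people['ages'].idxmax()]    # row with the largest age
youngest = people.loc[people['ages'].idxmin()]  # row with the smallest age
age_range = people['ages'].max() - people['ages'].min()

print(oldest['names'], youngest['names'], age_range)
```

For these made-up rows, the script prints Ben Example, Cy Example, and a range of 52 years. Running the same three expressions on df would confirm the interpretation above.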

Example: Bay Area Bike Share Popular Start Stations

In the San Francisco Bay Area, a company called Motivate operates a bike share network across several cities. You can walk up to a bike, pay for it and unlock it from a dock, ride it to your destination, and park it in a nearby dock.

For each ride, Motivate records data on the starting dock station. I want to visualize the most popular starting dock stations.

Load Dataset on May 2018 Rides

df2 = pd.read_csv('201805-fordgobike-tripdata.csv')

Preview Some Data

df2[['start_time', 'end_time', 'duration_sec', 'member_birth_year', 'member_gender']].head()
                 start_time                  end_time  duration_sec  member_birth_year  member_gender
0  2018-05-31 21:41:51.4750  2018-06-01 13:28:22.7220         56791                NaN            NaN
1  2018-05-31 18:39:53.7690  2018-06-01 09:19:51.5410         52797             1983.0           Male
2  2018-05-31 21:09:48.0150  2018-06-01 09:09:52.4850         43204                NaN            NaN
3  2018-05-31 14:09:54.9720  2018-06-01 08:48:17.8150         67102             1979.0           Male
4  2018-05-31 16:07:23.8570  2018-06-01 08:28:47.2020         58883             1986.0           Male

Plot Horizontal Bar Chart of Count of Rides by Starting Stations

Below I limit my horizontal bar chart to just show the 15 most frequent starting docks.

# Station names of the 15 most frequent starting docks, busiest first
top_stations = df2['start_station_name'].value_counts().iloc[:15].index
ax2 = sns.countplot(y='start_station_name', data=df2, orient='h', order=top_stations)
ax2.set_title("Horizontal Bar Chart of Count of Rides by Starting Docks", fontsize=20, y=1.01)
ax2.set(xlabel='Count of Rides', ylabel='Start Station Name');

[Figure: horizontal bar chart of count of rides by starting dock]
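The order argument passed to countplot does the filtering: value_counts() sorts stations by ride count in descending order, .iloc[:15] keeps the first 15, and .index extracts their names. A standard-library analogue of that top-N step is sketched below, using made-up station names and not rows from the ride data.

```python
from collections import Counter

# Made-up station names for illustration only; not rows from the ride data.
start_stations = ["Dock A", "Dock B", "Dock A", "Dock C", "Dock A", "Dock B"]

counts = Counter(start_stations)                       # rides per starting dock
top_two = [name for name, _ in counts.most_common(2)]  # most frequent first
print(top_two)  # ['Dock A', 'Dock B']
```

Stations outside the top N are simply left out of the chart, so the bars show only the busiest docks and not the full distribution.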

Interpretation of Horizontal Bar Chart of Count of Rides by Starting Docks

The most frequent starting dock is the San Francisco Ferry Building.

Most of the busiest starting docks are in or near the SoMa neighborhood, typically close to public transit options like BART and Caltrain.