Teaching Answer Guide

Sample Learning Exercise 0 with Answers

Import Modules

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

Set Visualization Styles

sns.set_context("talk")
sns.set_style("darkgrid")

Grading Rubric

Each visualization is worth 3 points. All other questions are worth 1 point each. Partial credit is possible on the visualizations if your work is close to the best answer.

Flights Dataset

This dataset includes the total count of passengers on airline flights for each month from 1949 to 1960.

Question 1: Read in dataset using Seaborn and assign to variable

You can use this code snippet to read in the dataset: sns.load_dataset('flights')

df = sns.load_dataset('flights')

Question 2: How many records/rows are in this dataset?

Please programmatically print the value.

len(df)
144

Print a DataFrame of 3 columns: year, month and count of passengers but only for the months of December

You will need to filter the DataFrame to view only the rows where a condition is met (month is equal to December). You can learn how to do that in Chris Albon's tutorial.

df[df['month']=='December']
     year     month  passengers
11   1949  December         118
23   1950  December         140
35   1951  December         166
47   1952  December         194
59   1953  December         201
71   1954  December         229
83   1955  December         278
95   1956  December         306
107  1957  December         336
119  1958  December         337
131  1959  December         405
143  1960  December         432
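
The filter above relies on a boolean mask. As a minimal sketch of the same idea on a small hand-made DataFrame (the name `toy` is hypothetical, and the values are copied from rows shown in this guide), `.loc` can apply the mask and select the three columns explicitly in one step:

```python
import pandas as pd

# Toy frame; values are copied from rows shown in this guide
toy = pd.DataFrame({
    "year": [1949, 1949, 1950, 1950],
    "month": ["February", "December", "February", "December"],
    "passengers": [118, 118, 126, 140],
})

# Comparing a column to a value yields a boolean Series (the mask)
is_december = toy["month"] == "December"

# .loc keeps the rows where the mask is True and only the listed columns
december_rows = toy.loc[is_december, ["year", "month", "passengers"]]
print(december_rows)
```

Naming the columns explicitly is optional here because the DataFrame only has these three columns, but it keeps the result stable if more columns are added later.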

Plot the count of passengers on the flights in December (of each year) over time

  • This should be a bar graph because each year's December count of passengers essentially represents a summed-up or total value of all passengers. (Alternatively, it could be a line plot but I prefer bar plots for this situation.)
  • x-axis ticks each represent December for a specific year
  • on the x-axis, years should progress from left to right in increasing order from 1949 to 1960
  • y-axis should be the count of passengers for flights
  • please make the figure size larger than the default
  • please provide a proper label for the title, x-label and y-label
  • please make the font larger than the default on the x-ticks, y-ticks, x-label, y-label and title so it's more easily readable
  • change the color of the bars to a different dark color (not the default blue)

Hint: you can use Pandas Plot from the documentation at this page. Use the kind argument to pass in a value of bar.

Hint: if you use Matplotlib, Pandas Plot or Seaborn, you can use sns.set_context("poster") or sns.set_context('talk') to make nearly all parts of the plot larger. See the documentation here.

Hint: you can see available colors for bars at this documentation page.

df[df['month']=='December'].set_index('year')['passengers'].plot(kind='bar', color='cadetblue', figsize=(14, 10), rot=0)
plt.xlabel("Year (for the month of December only)", labelpad=12)
plt.ylabel("Count of Passengers on Flights", labelpad=12)
plt.title("Count of Passengers on Flights in the Months of December Over the Years", y=1.01);

[Figure: bar plot of December passenger counts by year, 1949 to 1960]

Based on the plot above, how would you describe the trend of count of passengers on flights in the month of December over time?

The count of passengers on December flights increases every year from 1949 to 1960.

Based on the plot above, which year had the highest number of passengers on flights for the month of December? How many passengers were on all flights then?

df[df['month']=='December'].sort_values(by='passengers', ascending=False).iloc[0]
year              1960
month         December
passengers         432
Name: 143, dtype: object

1960 with 432 passengers.
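
Sorting and taking the first row works. A possible alternative, sketched here on a hand-made three-row frame (the name `december` is hypothetical, and the values are copied from the December output earlier in this guide), is to look up the row label of the maximum with `idxmax()`:

```python
import pandas as pd

# Toy frame with the last three December rows from this guide
december = pd.DataFrame({
    "year": [1958, 1959, 1960],
    "month": ["December"] * 3,
    "passengers": [337, 405, 432],
})

# idxmax returns the index label of the largest value in the column
busiest = december.loc[december["passengers"].idxmax()]
print(busiest["year"], busiest["passengers"])
```

This avoids sorting the whole DataFrame when only the single largest row is needed.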

Given your original DataFrame, create a new column that's the percent change year over year for specific months

The count of passengers for flights in December of 1949 is 118 and the count of passengers for flights in December of 1950 is 140. Therefore, the % change for count of passengers from the previous year's same-month value is:

(140-118)/118*100 = 18.64

You can read more about this metric here:

Hint: you can use the pct_change() method in Pandas. You'll have to use the periods argument and set it to a value other than the default. If you're not sure you used pct_change() correctly, you can compare the value above, 18.64%, with the value you see in your new column for December of 1950.
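
As a quick arithmetic check of the December 1949 to December 1950 example, here is a minimal sketch in plain Python. The helper name `pct_change_from_previous` is hypothetical; it divides the difference by the previous year's value, which is the convention pct_change() follows.

```python
def pct_change_from_previous(previous, current):
    """Percent change of `current` relative to `previous`."""
    return (current - previous) / previous * 100


dec_1949 = 118  # passengers in December 1949
dec_1950 = 140  # passengers in December 1950

change = pct_change_from_previous(dec_1949, dec_1950)
print(round(change, 2))
```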

df['passengers_yearly_month_pct_change'] = df['passengers'].pct_change(periods=12)*100
df[df['month']=='December']
     year     month  passengers  passengers_yearly_month_pct_change
11   1949  December         118                                 NaN
23   1950  December         140                           18.644068
35   1951  December         166                           18.571429
47   1952  December         194                           16.867470
59   1953  December         201                            3.608247
71   1954  December         229                           13.930348
83   1955  December         278                           21.397380
95   1956  December         306                           10.071942
107  1957  December         336                            9.803922
119  1958  December         337                            0.297619
131  1959  December         405                           20.178042
143  1960  December         432                            6.666667
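
Note that pct_change(periods=12) gives the same-month comparison only because the rows are ordered by year and month with exactly 12 rows per year. As a hedged sketch of an alternative that does not depend on that spacing, the rows can be grouped by month first. The frame `toy` and the column name `pct_change_vs_prev_year` are hypothetical, and the values are copied from tables in this guide.

```python
import pandas as pd

# Toy frame with only two months per year (values copied from this guide)
toy = pd.DataFrame({
    "year": [1949, 1949, 1950, 1950, 1951, 1951],
    "month": ["February", "December"] * 3,
    "passengers": [118, 118, 126, 140, 150, 166],
})

# Rows must be in year order so "previous row in the group" means "previous year"
toy = toy.sort_values("year")

# Within each month, compare every row with the previous row of that same month
toy["pct_change_vs_prev_year"] = (
    toy.groupby("month")["passengers"].pct_change() * 100
)
print(toy)
```

The first year of each month has no earlier value to compare against, so it comes out as NaN, just as in the table above.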

Plot the pct_change_increase column over time for the months of February. % change values > 0 should be a dark blue and % change values < 0 should be red (to denote negative)

  • Should be a bar plot. (Alternatively, you could just put a horizontal line to represent the % change values rather than a bar - but I find a shaded bar easier to see on a plot.)
  • x-axis ticks each represent the months of February over all years provided in the dataset
  • on the x-axis, years should progress from left to right in increasing order from 1949 to 1960
  • y-axis should be % change value from the column pct_change_increase
  • please provide a proper label for the title, x-label and y-label
  • please make the figure size larger
  • please make the font larger than the default on the x-ticks, y-ticks, x-label, y-label and title so it's more easily readable
  • for pct_change_increase values greater than 0, please make the bars a dark blue, and for values less than 0, please make them red (see the hint on changing bar colors below)
  • only show years 1950 to 1960 (not 1949) because we have no numerical value for 1949.

Hint: you can use Pandas Plot from the documentation at this page. You should use the kind argument to pass in a value of bar.

Hint: if you use Matplotlib, Pandas Plot or Seaborn, you can use sns.set_context("poster") to make all parts of the plot larger.

Hint: you can see available colors for bars on this documentation page.

Hint: how to change bar colors

Pandas Plot has an argument for color. You could pass a single value like red or a list of values. Hypothetically, let's say you had a bar plot with 3 bars. If you pass 3 colors to the color argument such as ['blue', 'red', 'blue'], the 1st and 3rd bars will be blue and the middle bar will be red. We can use similar logic for our plot below.

How do we get a list of colors like that for our plot below? First create an empty Python list to store our color values. For the pct_change_increase values in the month of February, we can assess if the value is greater than or less than 0. Loop over the pct_change_increase values; if it's greater than 0, append a value to your list of a dark blue color, otherwise append a value of red. Then, use that Python list as the value to the color argument in our plot method.

Hint: the initial code to loop over a field is simply for value in df['column_name']:


Side Example: Various Colors in Pandas Plot

new_colors = ['blue', 'green', 'red', 'yellow', 'orange']
feb_after_1955 = df[(df['month']=='February') & (df['year']>1955)]
feb_after_1955.set_index('year')['passengers'].plot(kind='bar', color=new_colors);

[Figure: bar plot of February passenger counts for 1956 to 1960, one color per bar]


df[df['month']=='February']
     year     month  passengers  passengers_yearly_month_pct_change
1    1949  February         118                                 NaN
13   1950  February         126                            6.779661
25   1951  February         150                           19.047619
37   1952  February         180                           20.000000
49   1953  February         196                            8.888889
61   1954  February         188                           -4.081633
73   1955  February         233                           23.936170
85   1956  February         277                           18.884120
97   1957  February         301                            8.664260
109  1958  February         318                            5.647841
121  1959  February         342                            7.547170
133  1960  February         391                           14.327485

bar_colors = []

for pct_change in df[(df['month']=='February') & (df['year']>1949)]['passengers_yearly_month_pct_change']:
    if pct_change > 0:
        bar_color='darkslateblue'
    else:
        bar_color='red'
    bar_colors.append(bar_color)
bar_colors
['darkslateblue',
 'darkslateblue',
 'darkslateblue',
 'darkslateblue',
 'red',
 'darkslateblue',
 'darkslateblue',
 'darkslateblue',
 'darkslateblue',
 'darkslateblue',
 'darkslateblue']
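
The loop above can also be written as a list comprehension. A minimal sketch in plain Python, using the February % change values for 1950 to 1960 copied from the table above (the names `feb_pct_changes` and `bar_colors_compact` are hypothetical):

```python
# February % change values for 1950 to 1960, copied from the table in this guide
feb_pct_changes = [6.779661, 19.047619, 20.0, 8.888889, -4.081633,
                   23.936170, 18.884120, 8.664260, 5.647841, 7.547170,
                   14.327485]

# One color per value: dark blue for positive changes, red otherwise
bar_colors_compact = [
    "darkslateblue" if value > 0 else "red" for value in feb_pct_changes
]
print(bar_colors_compact)
```

Either form produces one color per bar, in the same order as the years on the x-axis.
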
df_feb = df[(df['month']=='February') & (df['year']>1949)]
df_feb.set_index('year')['passengers_yearly_month_pct_change'].plot(kind='bar',
                                                                    figsize=(14, 10),
                                                                    ylim=(-5, 25),
                                                                    color=bar_colors,
                                                                    rot=0)
plt.xlabel("Year [February month data only]", labelpad=12)
plt.ylabel("Percentage change from same month of previous year", labelpad=12)
plt.title("Historical % Change for Count of Passengers on Flights in February Months", y=1.01);

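[Figure: bar plot of year-over-year % change in February passenger counts, 1950 to 1960; positive bars dark blue, negative bar red]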