Data Analysis Data Wrangling Tutorial

melt() Method: Unpivot a DataFrame

In pandas, we can "unpivot" a DataFrame - turn it from a wide format - many columns - to a long format - few columns but many rows. We can accomplish this with the pandas melt() method. This can be helpful for further analysis of our new unpivoted DataFrame.

Import Module

import pandas as pd

Example: Pivot Tesla Car Acceleration Details

Here is fictional acceleration tests for three popular Tesla car models. In order to verify acceleration of the cars, I figured a third-party may make three runs to test the three models alongside one another.

s = 'Tesla Model S P100D'
x = 'Tesla Model X P100D'
three = 'Tesla Model 3 AWD Dual Motor'

s_data = [s, 2.5, 2.51, 2.54]
x_data = [x, 2.92, 2.91, 2.93]
three_data = [three, 3.33, 3.31, 3.35]

data = [s_data, x_data, three_data] 
df = pd.DataFrame(data, columns=['car_model', 'Sept 1 9am', 'Sept 1 10am', 'Sept 1 11am'])
df
car_model Sept 1 9am Sept 1 10am Sept 1 11am
0 Tesla Model S P100D 2.50 2.51 2.54
1 Tesla Model X P100D 2.92 2.91 2.93
2 Tesla Model 3 AWD Dual Motor 3.33 3.31 3.35

Notice how this DataFrame features four columns, one for the car model name, and three for acceleration runs of a car. If they were to continue with this trend of data collection and do far more runs, this dataset would have lots of columns - perhaps making it daunting to visualize and analyze.

I want to "unpivot" this data from a wide format to a long format using the pandas melt() method.

On the df DataFrame, we'll call the melt() method and set the following arguments:

  • id_vars to ['car_model'] since each row from df is identified by the car model name
  • var_name to 'date' since this new column needs a name
  • value_name to '0-60mph_in_seconds' since this new column needs a name
df_unpivoted = df.melt(id_vars=['car_model'], var_name='date', value_name='0-60mph_in_seconds')
df_unpivoted
car_model date 0-60mph_in_seconds
0 Tesla Model S P100D Sept 1 9am 2.50
1 Tesla Model X P100D Sept 1 9am 2.92
2 Tesla Model 3 AWD Dual Motor Sept 1 9am 3.33
3 Tesla Model S P100D Sept 1 10am 2.51
4 Tesla Model X P100D Sept 1 10am 2.91
5 Tesla Model 3 AWD Dual Motor Sept 1 10am 3.31
6 Tesla Model S P100D Sept 1 11am 2.54
7 Tesla Model X P100D Sept 1 11am 2.93
8 Tesla Model 3 AWD Dual Motor Sept 1 11am 3.35

After this "unpivot", we can easily calculate the minimum (essentially the fastest) 0-60 time that we'd publish in a final report. To do so, we take our df_unpivoted DataFrame, group by the car_model column, and find the minimum value in the 0-60mph_in_seconds column.

df_unpivoted.groupby('car_model')['0-60mph_in_seconds'].min()
car_model
Tesla Model 3 AWD Dual Motor    3.31
Tesla Model S P100D             2.50
Tesla Model X P100D             2.91
Name: 0-60mph_in_seconds, dtype: float64