Data Analysis Data Wrangling Tutorial

melt() Method: Unpivot a DataFrame

In pandas, we can "unpivot" a DataFrame - turn it from a wide format - many columns - to a long format - few columns but many rows. We can accomplish this with the pandas melt() method. This can be helpful for further analysis of our new unpivoted DataFrame.

Import Module

In [20]:
import pandas as pd

Example: Pivot Tesla Car Acceleration Details

Here is fictional acceleration tests for three popular Tesla car models. In order to verify acceleration of the cars, I figured a third-party may make three runs to test the three models alongside one another.

In [21]:
s = 'Tesla Model S P100D'
x = 'Tesla Model X P100D'
three = 'Tesla Model 3 AWD Dual Motor'

s_data = [s, 2.5, 2.51, 2.54]
x_data = [x, 2.92, 2.91, 2.93]
three_data = [three, 3.33, 3.31, 3.35]

data = [s_data, x_data, three_data] 
df = pd.DataFrame(data, columns=['car_model', 'Sept 1 9am', 'Sept 1 10am', 'Sept 1 11am'])
df
Out[21]:
car_model Sept 1 9am Sept 1 10am Sept 1 11am
0 Tesla Model S P100D 2.50 2.51 2.54
1 Tesla Model X P100D 2.92 2.91 2.93
2 Tesla Model 3 AWD Dual Motor 3.33 3.31 3.35

Notice how this DataFrame features four columns, one for the car model name, and three for acceleration runs of a car. If they were to continue with this trend of data collection and do far more runs, this dataset would have lots of columns - perhaps making it daunting to visualize and analyze.

I want to "unpivot" this data from a wide format to a long format using the pandas melt() method.

On the df DataFrame, we'll call the melt() method and set the following arguments:

  • id_vars to ['car_model'] since each row from df is identified by the car model name
  • var_name to 'date' since this new column needs a name
  • value_name to '0-60mph_in_seconds' since this new column needs a name
In [22]:
df_unpivoted = df.melt(id_vars=['car_model'], var_name='date', value_name='0-60mph_in_seconds')
df_unpivoted
Out[22]:
car_model date 0-60mph_in_seconds
0 Tesla Model S P100D Sept 1 9am 2.50
1 Tesla Model X P100D Sept 1 9am 2.92
2 Tesla Model 3 AWD Dual Motor Sept 1 9am 3.33
3 Tesla Model S P100D Sept 1 10am 2.51
4 Tesla Model X P100D Sept 1 10am 2.91
5 Tesla Model 3 AWD Dual Motor Sept 1 10am 3.31
6 Tesla Model S P100D Sept 1 11am 2.54
7 Tesla Model X P100D Sept 1 11am 2.93
8 Tesla Model 3 AWD Dual Motor Sept 1 11am 3.35

After this "unpivot", we can easily calculate the minimum (essentially the fastest) 0-60 time that we'd publish in a final report. To do so, we take our df_unpivoted DataFrame, group by the car_model column, and find the minimum value in the 0-60mph_in_seconds column.

In [23]:
df_unpivoted.groupby('car_model')['0-60mph_in_seconds'].min()
Out[23]:
car_model
Tesla Model 3 AWD Dual Motor    3.31
Tesla Model S P100D             2.50
Tesla Model X P100D             2.91
Name: 0-60mph_in_seconds, dtype: float64