melt() Method: Unpivot a DataFrame¶
Date published: 2018-11-25
Category: Data Analysis
Subcategory: Data Wrangling
Tags: unpivot, melt, python, pandas
In pandas, we can "unpivot" a DataFrame - turn it from a wide format - many columns - to a long format - few columns but many rows. We can accomplish this with the pandas melt() method. This can be helpful for further analysis of our new unpivoted DataFrame.
Import Module¶
import pandas as pd
Example: Pivot Tesla Car Acceleration Details¶
Here is fictional acceleration tests for three popular Tesla car models. In order to verify acceleration of the cars, I figured a third-party may make three runs to test the three models alongside one another.
s = 'Tesla Model S P100D'
x = 'Tesla Model X P100D'
three = 'Tesla Model 3 AWD Dual Motor'
s_data = [s, 2.5, 2.51, 2.54]
x_data = [x, 2.92, 2.91, 2.93]
three_data = [three, 3.33, 3.31, 3.35]
data = [s_data, x_data, three_data]
df = pd.DataFrame(data, columns=['car_model', 'Sept 1 9am', 'Sept 1 10am', 'Sept 1 11am'])
df
car_model | Sept 1 9am | Sept 1 10am | Sept 1 11am | |
---|---|---|---|---|
0 | Tesla Model S P100D | 2.50 | 2.51 | 2.54 |
1 | Tesla Model X P100D | 2.92 | 2.91 | 2.93 |
2 | Tesla Model 3 AWD Dual Motor | 3.33 | 3.31 | 3.35 |
Notice how this DataFrame features four columns, one for the car model name, and three for acceleration runs of a car. If they were to continue with this trend of data collection and do far more runs, this dataset would have lots of columns - perhaps making it daunting to visualize and analyze.
I want to "unpivot" this data from a wide format to a long format using the pandas melt()
method.
On the df
DataFrame, we'll call the melt()
method and set the following arguments:
id_vars
to['car_model']
since each row fromdf
is identified by the car model namevar_name
to'date'
since this new column needs a namevalue_name
to'0-60mph_in_seconds'
since this new column needs a name
df_unpivoted = df.melt(id_vars=['car_model'], var_name='date', value_name='0-60mph_in_seconds')
df_unpivoted
car_model | date | 0-60mph_in_seconds | |
---|---|---|---|
0 | Tesla Model S P100D | Sept 1 9am | 2.50 |
1 | Tesla Model X P100D | Sept 1 9am | 2.92 |
2 | Tesla Model 3 AWD Dual Motor | Sept 1 9am | 3.33 |
3 | Tesla Model S P100D | Sept 1 10am | 2.51 |
4 | Tesla Model X P100D | Sept 1 10am | 2.91 |
5 | Tesla Model 3 AWD Dual Motor | Sept 1 10am | 3.31 |
6 | Tesla Model S P100D | Sept 1 11am | 2.54 |
7 | Tesla Model X P100D | Sept 1 11am | 2.93 |
8 | Tesla Model 3 AWD Dual Motor | Sept 1 11am | 3.35 |
After this "unpivot", we can easily calculate the minimum (essentially the fastest) 0-60 time that we'd publish in a final report. To do so, we take our df_unpivoted
DataFrame, group by the car_model
column, and find the minimum value in the 0-60mph_in_seconds
column.
df_unpivoted.groupby('car_model')['0-60mph_in_seconds'].min()
car_model Tesla Model 3 AWD Dual Motor 3.31 Tesla Model S P100D 2.50 Tesla Model X P100D 2.91 Name: 0-60mph_in_seconds, dtype: float64