Data Analysis Data Wrangling Tutorial

query() method: Query/Filter Columns

In pandas, we can query the columns of DataFrames with boolean expressions using the query() method. I'll walk through lots of simple examples.

Import Modules

import pandas as pd
import seaborn as sns

Get Flights Data

Let's get the flights dataset included in the seaborn library and assign it to the DataFrame df_flights.

df_flights = sns.load_dataset('flights')

Preview the first few rows of df_flights.

Each row represents a month's flight history details. The passengers column represents that total number of passengers that flew that month.

df_flights.head()
year month passengers
0 1949 January 112
1 1949 February 118
2 1949 March 132
3 1949 April 129
4 1949 May 121

This dataset spans 1949 to 1960.

Practice Filtering Rows and Columns

Query for rows in which year is equal to 1949

df_flights.query('year==1949')
year month passengers
0 1949 January 112
1 1949 February 118
2 1949 March 132
3 1949 April 129
4 1949 May 121
5 1949 June 135
6 1949 July 148
7 1949 August 148
8 1949 September 136
9 1949 October 119
10 1949 November 104
11 1949 December 118

Query for rows in which month is equal to January

Notice how 'January' is in single quotes because it's a string.

df_flights.query("month=='January'")
year month passengers
0 1949 January 112
12 1950 January 115
24 1951 January 145
36 1952 January 171
48 1953 January 196
60 1954 January 204
72 1955 January 242
84 1956 January 284
96 1957 January 315
108 1958 January 340
120 1959 January 360
132 1960 January 417

Query for rows in which year is equal to 1949 and month is equal to January

df_flights.query("year==1949 and month=='January'")
year month passengers
0 1949 January 112

Query for rows in which month is January or February

df_flights.query("month==['January', 'February']")
year month passengers
0 1949 January 112
1 1949 February 118
12 1950 January 115
13 1950 February 126
24 1951 January 145
25 1951 February 150
36 1952 January 171
37 1952 February 180
48 1953 January 196
49 1953 February 196
60 1954 January 204
61 1954 February 188
72 1955 January 242
73 1955 February 233
84 1956 January 284
85 1956 February 277
96 1957 January 315
97 1957 February 301
108 1958 January 340
109 1958 February 318
120 1959 January 360
121 1959 February 342
132 1960 January 417
133 1960 February 391

Query for rows in which month equals January and year is less than 1955

df_flights.query("month=='January' and year<1955")
year month passengers
0 1949 January 112
12 1950 January 115
24 1951 January 145
36 1952 January 171
48 1953 January 196
60 1954 January 204

Query for rows in which month equals January and year is greater than 1955

df_flights.query("month=='January' and year>1955")
year month passengers
84 1956 January 284
96 1957 January 315
108 1958 January 340
120 1959 January 360
132 1960 January 417

Query for rows in which month equals January and the year is not 1955

df_flights.query("month=='January' and year!=1955")
year month passengers
0 1949 January 112
12 1950 January 115
24 1951 January 145
36 1952 January 171
48 1953 January 196
60 1954 January 204
84 1956 January 284
96 1957 January 315
108 1958 January 340
120 1959 January 360
132 1960 January 417

Query for rows in which month equals January or year equals 1955

df_flights.query("month=='January' or year==1955")
year month passengers
0 1949 January 112
12 1950 January 115
24 1951 January 145
36 1952 January 171
48 1953 January 196
60 1954 January 204
72 1955 January 242
73 1955 February 233
74 1955 March 267
75 1955 April 269
76 1955 May 270
77 1955 June 315
78 1955 July 364
79 1955 August 347
80 1955 September 312
81 1955 October 274
82 1955 November 237
83 1955 December 278
84 1956 January 284
96 1957 January 315
108 1958 January 340
120 1959 January 360
132 1960 January 417