query() method: Query/Filter Columns¶
Date published: 2018-11-24
Category: Data Analysis
Subcategory: Data Wrangling
Tags: query, python, pandas
In pandas, we can query the columns of DataFrames with boolean expressions using the query() method. I'll walk through lots of simple examples.
Import Modules¶
In [2]:
Copied!
import pandas as pd
import seaborn as sns
import pandas as pd
import seaborn as sns
Get Flights Data¶
Let's get the flights
dataset included in the seaborn
library and assign it to the DataFrame df_flights
.
In [3]:
Copied!
df_flights = sns.load_dataset('flights')
df_flights = sns.load_dataset('flights')
Preview the first few rows of df_flights
.
Each row represents a month's flight history details. The passengers
column represents that total number of passengers that flew that month.
In [4]:
Copied!
df_flights.head()
df_flights.head()
Out[4]:
year | month | passengers | |
---|---|---|---|
0 | 1949 | January | 112 |
1 | 1949 | February | 118 |
2 | 1949 | March | 132 |
3 | 1949 | April | 129 |
4 | 1949 | May | 121 |
This dataset spans 1949 to 1960.
Practice Filtering Rows and Columns¶
Query for rows in which year is equal to 1949¶
In [5]:
Copied!
df_flights.query('year==1949')
df_flights.query('year==1949')
Out[5]:
year | month | passengers | |
---|---|---|---|
0 | 1949 | January | 112 |
1 | 1949 | February | 118 |
2 | 1949 | March | 132 |
3 | 1949 | April | 129 |
4 | 1949 | May | 121 |
5 | 1949 | June | 135 |
6 | 1949 | July | 148 |
7 | 1949 | August | 148 |
8 | 1949 | September | 136 |
9 | 1949 | October | 119 |
10 | 1949 | November | 104 |
11 | 1949 | December | 118 |
Query for rows in which month is equal to January¶
Notice how 'January'
is in single quotes because it's a string.
In [6]:
Copied!
df_flights.query("month=='January'")
df_flights.query("month=='January'")
Out[6]:
year | month | passengers | |
---|---|---|---|
0 | 1949 | January | 112 |
12 | 1950 | January | 115 |
24 | 1951 | January | 145 |
36 | 1952 | January | 171 |
48 | 1953 | January | 196 |
60 | 1954 | January | 204 |
72 | 1955 | January | 242 |
84 | 1956 | January | 284 |
96 | 1957 | January | 315 |
108 | 1958 | January | 340 |
120 | 1959 | January | 360 |
132 | 1960 | January | 417 |
Query for rows in which year is equal to 1949 and month is equal to January¶
In [7]:
Copied!
df_flights.query("year==1949 and month=='January'")
df_flights.query("year==1949 and month=='January'")
Out[7]:
year | month | passengers | |
---|---|---|---|
0 | 1949 | January | 112 |
Query for rows in which month is January or February¶
In [8]:
Copied!
df_flights.query("month==['January', 'February']")
df_flights.query("month==['January', 'February']")
Out[8]:
year | month | passengers | |
---|---|---|---|
0 | 1949 | January | 112 |
1 | 1949 | February | 118 |
12 | 1950 | January | 115 |
13 | 1950 | February | 126 |
24 | 1951 | January | 145 |
25 | 1951 | February | 150 |
36 | 1952 | January | 171 |
37 | 1952 | February | 180 |
48 | 1953 | January | 196 |
49 | 1953 | February | 196 |
60 | 1954 | January | 204 |
61 | 1954 | February | 188 |
72 | 1955 | January | 242 |
73 | 1955 | February | 233 |
84 | 1956 | January | 284 |
85 | 1956 | February | 277 |
96 | 1957 | January | 315 |
97 | 1957 | February | 301 |
108 | 1958 | January | 340 |
109 | 1958 | February | 318 |
120 | 1959 | January | 360 |
121 | 1959 | February | 342 |
132 | 1960 | January | 417 |
133 | 1960 | February | 391 |
Query for rows in which month equals January and year is less than 1955¶
In [9]:
Copied!
df_flights.query("month=='January' and year<1955")
df_flights.query("month=='January' and year<1955")
Out[9]:
year | month | passengers | |
---|---|---|---|
0 | 1949 | January | 112 |
12 | 1950 | January | 115 |
24 | 1951 | January | 145 |
36 | 1952 | January | 171 |
48 | 1953 | January | 196 |
60 | 1954 | January | 204 |
Query for rows in which month equals January and year is greater than 1955¶
In [10]:
Copied!
df_flights.query("month=='January' and year>1955")
df_flights.query("month=='January' and year>1955")
Out[10]:
year | month | passengers | |
---|---|---|---|
84 | 1956 | January | 284 |
96 | 1957 | January | 315 |
108 | 1958 | January | 340 |
120 | 1959 | January | 360 |
132 | 1960 | January | 417 |
Query for rows in which month equals January and the year is not 1955¶
In [11]:
Copied!
df_flights.query("month=='January' and year!=1955")
df_flights.query("month=='January' and year!=1955")
Out[11]:
year | month | passengers | |
---|---|---|---|
0 | 1949 | January | 112 |
12 | 1950 | January | 115 |
24 | 1951 | January | 145 |
36 | 1952 | January | 171 |
48 | 1953 | January | 196 |
60 | 1954 | January | 204 |
84 | 1956 | January | 284 |
96 | 1957 | January | 315 |
108 | 1958 | January | 340 |
120 | 1959 | January | 360 |
132 | 1960 | January | 417 |
Query for rows in which month equals January or year equals 1955¶
In [12]:
Copied!
df_flights.query("month=='January' or year==1955")
df_flights.query("month=='January' or year==1955")
Out[12]:
year | month | passengers | |
---|---|---|---|
0 | 1949 | January | 112 |
12 | 1950 | January | 115 |
24 | 1951 | January | 145 |
36 | 1952 | January | 171 |
48 | 1953 | January | 196 |
60 | 1954 | January | 204 |
72 | 1955 | January | 242 |
73 | 1955 | February | 233 |
74 | 1955 | March | 267 |
75 | 1955 | April | 269 |
76 | 1955 | May | 270 |
77 | 1955 | June | 315 |
78 | 1955 | July | 364 |
79 | 1955 | August | 347 |
80 | 1955 | September | 312 |
81 | 1955 | October | 274 |
82 | 1955 | November | 237 |
83 | 1955 | December | 278 |
84 | 1956 | January | 284 |
96 | 1957 | January | 315 |
108 | 1958 | January | 340 |
120 | 1959 | January | 360 |
132 | 1960 | January | 417 |