Filtering Values in a Pandas Series: Methods and Examples
Are you tired of manually skimming through your data tables to find the specific values you need? If you’re using pandas, you don’t have to resort to that.
In this article, we’ll discuss various methods you can use to filter values in a pandas series efficiently.
Method 1: Filter Values Based on One Condition
The most straightforward way to filter values in a pandas series is to use a single condition.
For instance, you may want to filter all rows where the value in a particular column is greater than 50. Here’s how you can do it:
import pandas as pd
# dataframe with column 'A'
df = pd.DataFrame({'A': [10, 20, 30, 40, 50, 60, 70, 80]})
# filter rows where column 'A' > 50
filtered_df = df[df['A'] > 50]
print(filtered_df)
Output:
A
5 60
6 70
7 80
Method 2: Filter Values Using “OR” Condition
Another method you can use to filter values is by using the OR operator. You may want to filter rows based on two or more unrelated conditions without necessarily needing them to be true together.
Here’s how you can use “OR” to achieve that:
# dataframe with columns 'A' and 'B'
df = pd.DataFrame({'A': [5, 15, 25, 35, 45, 55, 65, 75],
'B': [10, 20, 30, 40, 50, 60, 70, 80]})
# filter rows where column 'A' is 15 or 'B' is 60
filtered_df = df[(df['A'] == 15) | (df['B'] == 60)]
print(filtered_df)
Output:
A B
1 15 20
5 55 60
Method 3: Filter Values Using “AND” Condition
Using the “AND” operator, you can filter data based on multiple related conditions. That is, you need both conditions to be true for the row to be filtered.
Here’s an example of how to filter data using “AND”:
# dataframe with columns 'A' and 'B'
df = pd.DataFrame({'A': [10, 20, 30, 40, 50, 60, 70, 80],
'B': [5, 15, 35, 45, 55, 65, 75, 85]})
# filter rows where column 'A' is greater than 40 and 'B' is between 30 and 70
filtered_df = df[(df['A'] > 40) & (df['B'].between(30, 70))]
print(filtered_df)
Output:
A B
4 50 55
Method 4: Filter Values Contained in List
You can also filter data based on values in a list instead of using inequality or logical operators. Let’s say you have a list of values and want to filter rows with values equal to those in the list.
Here’s how you can do it:
# dataframe with columns 'A' and 'B'
df = pd.DataFrame({'A': [10, 20, 30, 40, 50, 60, 70, 80],
'B': [5, 15, 25, 35, 45, 55, 65, 75]})
# filter rows where column 'A' values are 20, 40, and 80
filtered_df = df[df['A'].isin([20, 40, 80])]
print(filtered_df)
Output:
A B
1 20 15
3 40 35
7 80 75
Example 1: Filter Values Based on One Condition
Let’s use a more elaborate example to illustrate how to filter values based on a single condition. Assume you have a dataset of cars, and you need to filter cars that have a mileage of over 25 miles per gallon.
Here’s how you can filter the data:
import pandas as pd
# create dataframe
data = {'car_model': ['BMW', 'Audi', 'Mercedes', 'Toyota'],
'mileage': [26, 18, 33, 23]}
df = pd.DataFrame(data)
# filter cars with mileage > 25
filtered_df = df[df['mileage'] > 25]
print(filtered_df)
Output:
car_model mileage
0 BMW 26
2 Mercedes 33
Example 2: Filter Values Using “OR” Condition
Using the “OR” operator is very useful when you want to filter for data based on two or more unrelated conditions. For example, you may want to filter a dataset for all vehicles that have an engine size of over 3.0 liters, or that are priced under $50,000.
Here’s how you can filter such data:
import pandas as pd
# create a dataframe of cars
data = {'car_model': ['BMW', 'Audi', 'Mercedes', 'Toyota'],
'engine_size': [2.0, 2.0, 3.5, 3.0],
'price': [40000, 50000, 65000, 35000]}
df = pd.DataFrame(data)
# filter cars where the engine size is larger than 3.0 OR price is less than $50,000
filtered_df = df[(df['engine_size'] > 3.0) | (df['price'] < 50000)]
print(filtered_df)
Output:
car_model engine_size price
1 Audi 2.0 50000
2 Mercedes 3.5 65000
3 Toyota 3.0 35000
By using the |
operator, we tell pandas to filter the data based on either condition being true.
Example 3: Filter Values Using “AND” Condition
Using the “AND” operator is also useful when filtering data.
Imagine you have a dataset of flights that includes information on flight duration and cost, and you want to filter for flights that are less than six hours long and cost less than $500. Here’s how you can do it:
import pandas as pd
# create a dataframe of flights
data = {'flight_number': [101, 102, 103, 104, 105],
'flight_duration': [5.5, 7.5, 3.5, 4.0, 6.0],
'flight_cost': [450, 600, 300, 400, 550]}
df = pd.DataFrame(data)
# filter flights where the duration is less than 6 hours and cost is less than $500
filtered_df = df[(df['flight_duration'] < 6) & (df['flight_cost'] < 500)]
print(filtered_df)
Output:
flight_number flight_duration flight_cost
0 101 5.5 450
3 104 4.0 400
By using the &
operator, we tell pandas to filter the data based on both conditions being true.
Example 4: Filter Values Contained in List
Lists are a fundamental data type in Python, and it’s easy to filter data based on them.
Imagine you are managing a dataset of movies, and you want to filter for all the thriller movies. Here’s how you can filter the data:
import pandas as pd
# create a dataframe of movies
data = {'movie_title': ['The Shining', 'Psycho', 'Seven', 'Us', 'Get Out'],
'genre': ['horror', 'horror', 'thriller', 'horror', 'thriller']}
df = pd.DataFrame(data)
# create a list of thriller movies
thriller_movies = ['Seven', 'Get Out']
# filter data for all thriller movies
filtered_df = df[df['movie_title'].isin(thriller_movies)]
print(filtered_df)
Output:
movie_title genre
2 Seven thriller
4 Get Out thriller
By using .isin()
method, we can filter for specific values contained in the list.
Lists can also be useful when filtering data based on ranges.
Imagine you have a dataset of employees, and you want to filter for all employees whose salary falls within a range. Here’s how you can do this:
import pandas as pd
# create a dataframe of employees
data = {'name': ['John', 'Jane', 'Bob', 'Alice', 'Jack'],
'salary': [50000, 60000, 75000, 90000, 110000]}
df = pd.DataFrame(data)
# create a list of salaries for the desired range
salary_range = list(range(60000, 100000, 10000))
# filter data for all employees within the salary range
filtered_df = df[df['salary'].isin(salary_range)]
print(filtered_df)
Output:
name salary
1 Jane 60000
2 Bob 75000
3 Alice 90000
Note that we used the range()
method to create a list of salary values between 60,000 and 100,000, incrementing in steps of 10,000.
In conclusion, filtering values in a pandas series is an essential task when analyzing data, and using the pandas library makes this process more efficient. In this article, we discussed different methods to filter values in a pandas series, including filtering for values based on single or multiple conditions using the “OR” and “AND” operators, and filtering for values contained in lists.
These methods have been exemplified with practical examples to help you understand how to implement them in your projects. By learning these methods, you will be better equipped to filter data and extract relevant information from your datasets with ease.
In conclusion, mastering data filtering in pandas opens new possibilities for manipulating data efficiently and making more informed decisions.