Filtering Rows Using “OR” Operator in Pandas
Pandas is a popular data analysis library in Python. One of its key features is the ability to filter rows based on specific conditions.
The “OR” operator, written as | in Pandas, lets us keep the rows that satisfy at least one of several conditions, and those conditions can compare numeric or string values. In this article, we will cover how to filter rows using the “OR” operator based on numeric and string values.
Filtering Based on Numeric Values
The “OR” operator can be used to filter rows based on multiple conditions that involve numeric values. For instance, if we have a dataframe that contains information about the age and height of individuals, we may want to filter rows where the age is less than 30 or the height is above 6 feet.
We can do this by creating a filter that includes two or more conditions delimited by the “OR” operator.
Syntax for Filtering
The general syntax for filtering using the “OR” operator based on numeric values is as follows:
df[(condition1) | (condition2) | ...]
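Here, df is the dataframe we want to filter, and each condition is a logical expression that evaluates to True or False for every row. The conditions are joined by the “OR” operator (|), and each one must be wrapped in parentheses because | binds more tightly than comparison operators such as < and >.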
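As a minimal sketch of the syntax above, the snippet below uses a small made-up dataframe (the name scores and its columns a and b are hypothetical). It also shows why Python's own or keyword cannot stand in for |: or expects a single True/False value, not a whole column of them.

```python
import pandas as pd

# Hypothetical toy data, used only for illustration.
scores = pd.DataFrame({"a": [1, 5, 10], "b": [10, 5, 1]})

# Element-wise OR: each condition is parenthesized because `|`
# binds more tightly than comparison operators such as `>`.
mask = (scores["a"] > 8) | (scores["b"] > 8)
either_high = scores[mask]

# Python's `or` keyword needs a single True/False value, so it
# raises ValueError when it is handed a whole Series.
try:
    (scores["a"] > 8) or (scores["b"] > 8)
    or_keyword_failed = False
except ValueError:
    or_keyword_failed = True

print(either_high)
print("`or` raised ValueError:", or_keyword_failed)
```

The mask is an ordinary boolean Series, so it can be stored, inspected, or reused before it is applied to the dataframe.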
DataFrame and Example
Let’s create a simple dataframe to demonstrate how to filter based on numeric values with the “OR” operator.
import pandas as pd
data = {
'name': ['Anna', 'John', 'Maria', 'Richard', 'Sophie', 'Hannah'],
'age': [24, 35, 29, 41, 22, 27],
'height': [5.6, 5.9, 6.2, 5.8, 6.1, 5.4],
'weight': [140, 180, 150, 175, 130, 160]
}
df = pd.DataFrame(data)
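# Boolean mask: True for rows where age < 30 or height > 6.
# Named `mask` so that it does not shadow Python's built-in filter().
mask = (df['age'] < 30) | (df['height'] > 6)
filtered_df = df[mask]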
print(filtered_df)
Output:
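     name  age  height  weight
0    Anna   24     5.6     140
2   Maria   29     6.2     150
4  Sophie   22     6.1     130
5  Hannah   27     5.4     160
In this example, we have created a filter that selects rows where the age is less than 30 or the height is greater than 6. Anna and Hannah qualify on age alone, while Maria and Sophie satisfy both conditions; a row only needs to meet one of them to be kept. We then applied the filter to the original dataframe using the indexing notation.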
Finally, we printed the resulting filtered dataframe.
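The same selection can be written in other ways. The sketch below is one possible alternative, not the only idiom: it rebuilds the age and height columns from the example and compares the boolean mask with DataFrame.query, which takes the condition as a string, and with .loc, which can pick columns in the same step. The variable names by_mask, by_query, and names_only are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Anna", "John", "Maria", "Richard", "Sophie", "Hannah"],
    "age": [24, 35, 29, 41, 22, 27],
    "height": [5.6, 5.9, 6.2, 5.8, 6.1, 5.4],
})

# The condition from the example, written as a boolean mask.
by_mask = df[(df["age"] < 30) | (df["height"] > 6)]

# DataFrame.query takes the condition as a string; inside the
# string the keyword `or` is allowed and parentheses are optional.
by_query = df.query("age < 30 or height > 6")

# .loc with the same mask can also select a column in one step.
names_only = df.loc[(df["age"] < 30) | (df["height"] > 6), "name"]

print(by_query)
print(names_only.tolist())
```

Which form to prefer is mostly a matter of readability: the mask is explicit and composable, while query is often shorter when there are many conditions.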
Filtering Based on String Values
We can also filter rows based on conditions that involve string values using the “OR” operator. For example, if we have a dataframe that contains information about products and we want to select rows that meet specific criteria, such as category, manufacturer, or name, we can create a filter that includes two or more conditions delimited by the “OR” operator.
Syntax for Filtering
The general syntax for filtering using the “OR” operator based on string values is as follows:
df[(condition1) | (condition2) | ...]
Here, df is the dataframe we want to filter, and the conditions are represented by one or more logical expressions separated by the “OR” operator.
DataFrame and Example
Let’s create a simple dataframe to demonstrate how to filter based on string values with the “OR” operator.
import pandas as pd
data = {
'category': ['Toys', 'Books', 'Electronics', 'Toys', 'Sports', 'Electronics'],
'manufacturer': ['Mattel', 'Penguin', 'Sony', 'Lego', 'Nike', 'Samsung'],
'name': ['Barbie', 'The Catcher in the Rye', 'PlayStation', 'Lego Classic', 'Running Shoes', 'TV']
}
df = pd.DataFrame(data)
# Boolean mask: True for rows in the Toys category or made by Sony.
mask = (df['category'] == 'Toys') | (df['manufacturer'] == 'Sony')
filtered_df = df[mask]
print(filtered_df)
Output:
      category  manufacturer          name
0         Toys        Mattel        Barbie
2  Electronics          Sony   PlayStation
3         Toys          Lego  Lego Classic
In this example, we have created a filter that selects rows where the category is “Toys” or the manufacturer is “Sony”. We then applied the filter to the original dataframe using the indexing notation.
Finally, we printed the resulting filtered dataframe.
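String conditions do not have to be exact equality checks. As a hedged sketch, the snippet below combines a case-insensitive match with a substring match using the pandas .str accessor. The dataframe is the product data from the example, renamed products here, and the two conditions are made up for illustration.

```python
import pandas as pd

products = pd.DataFrame({
    "category": ["Toys", "Books", "Electronics", "Toys", "Sports", "Electronics"],
    "manufacturer": ["Mattel", "Penguin", "Sony", "Lego", "Nike", "Samsung"],
    "name": ["Barbie", "The Catcher in the Rye", "PlayStation",
             "Lego Classic", "Running Shoes", "TV"],
})

# Case-insensitive exact match on one column ...
is_sports = products["category"].str.lower() == "sports"
# ... OR a plain (non-regex) substring match on another column.
name_has_lego = products["name"].str.contains("Lego", regex=False)

# The conditions are already stored as Series, so no extra
# parentheses are needed around them here.
matches = products[is_sports | name_has_lego]
print(matches)
```

Storing each condition in a named variable keeps long filters readable and makes it easy to check each condition separately.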
Conclusion
In conclusion, the “OR” operator is a useful tool when filtering rows in Pandas. It lets us build filters from two or more conditions on numeric or string columns and keeps every row that satisfies at least one of them.
With each condition wrapped in parentheses and joined by |, a single expression is enough to pull the rows of interest out of a dataset for analysis.
Example 2: Filtering Based on String Values
Apart from filtering based on numeric values, we can also filter rows based on conditions that involve strings. For instance, if we have a dataset containing information about movies and we want to filter movies that fall under the action or the comedy genre, we can do that with the “OR” operator.
In this section, we will cover how to filter based on string values with the “OR” operator.
Syntax for Filtering
The general syntax for filtering using the “OR” operator based on string values is as follows:
df[(condition1) | (condition2) | ...]
Here, df is the dataframe we want to filter, and the conditions are represented by one or more logical expressions separated by the “OR” operator.
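The trailing ... in the syntax means any number of conditions can be chained. When the conditions are collected in a list, for example because they are built at run time, one possible approach (an assumption of this sketch, not something the syntax requires) is to fold them together with functools.reduce and operator.or_. The small dataframe and the three conditions are hypothetical.

```python
from functools import reduce
import operator

import pandas as pd

# Hypothetical toy data, used only for illustration.
films = pd.DataFrame({
    "genre": ["Action", "Drama", "Comedy", "Animation"],
    "year": [2012, 1994, 2009, 1994],
})

# Any number of boolean conditions, collected in a list.
conditions = [
    films["genre"] == "Action",
    films["genre"] == "Drama",
    films["year"] < 2000,
]

# Equivalent to conditions[0] | conditions[1] | conditions[2].
combined = reduce(operator.or_, conditions)
result = films[combined]
print(result)
```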
DataFrame and Example
Let’s create a simple dataframe to demonstrate how to filter based on string values with the “OR” operator.
import pandas as pd
data = {
'title': ['The Dark Knight Rises', 'The Avengers', 'The Lion King', 'The Hangover', 'Forrest Gump', 'Mrs. Doubtfire'],
'genre': ['Action', 'Action', 'Animation', 'Comedy', 'Drama', 'Comedy'],
'year': [2012, 2012, 1994, 2009, 1994, 1993],
'rating': [8.4, 8.1, 8.5, 7.7, 8.8, 6.9]
}
df = pd.DataFrame(data)
# Boolean mask: True for rows whose genre is Action or Comedy.
mask = (df['genre'] == 'Action') | (df['genre'] == 'Comedy')
filtered_df = df[mask]
print(filtered_df)
Output:
                   title   genre  year  rating
0  The Dark Knight Rises  Action  2012     8.4
1           The Avengers  Action  2012     8.1
3           The Hangover  Comedy  2009     7.7
5         Mrs. Doubtfire  Comedy  1993     6.9
In this example, we have created a filter that selects rows where the genre is “Action” or “Comedy”.
We then applied the filter to the original dataframe using the indexing notation. Finally, we printed the resulting filtered dataframe.
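When every condition tests the same column for equality, as in this example, Series.isin offers a shorter spelling of the same “OR” filter. The sketch below rebuilds only the title and genre columns and checks that both forms select the same rows; the names movies, with_or, and with_isin are chosen here for illustration.

```python
import pandas as pd

movies = pd.DataFrame({
    "title": ["The Dark Knight Rises", "The Avengers", "The Lion King",
              "The Hangover", "Forrest Gump", "Mrs. Doubtfire"],
    "genre": ["Action", "Action", "Animation", "Comedy", "Drama", "Comedy"],
})

# Two equality checks on the same column joined by `|` ...
with_or = movies[(movies["genre"] == "Action") | (movies["genre"] == "Comedy")]

# ... select the same rows as a single membership test.
with_isin = movies[movies["genre"].isin(["Action", "Comedy"])]

print(with_isin)
print("Same rows:", with_or.equals(with_isin))
```

isin scales more comfortably as the list of accepted values grows, while | remains necessary when the conditions involve different columns or different comparisons.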
Additional Resources
Filtering is just one of the many operations that can be performed on a Pandas dataframe. There are several other common operations that are very useful for data analysis and manipulation.
To learn more about these operations, including merging, sorting, and grouping, we recommend checking out the official documentation for Pandas. The Pandas documentation provides a lot of examples that cover various functions and operations.
Several online tutorials and courses also cover the basics of Pandas as well as more advanced topics. Some popular resources include:
- DataCamp’s Pandas tutorial (https://www.datacamp.com/courses/pandas-foundations)
- Pandas documentation (https://pandas.pydata.org/pandas-docs/stable/)
- Python for Data Science Handbook by Jake VanderPlas (https://jakevdp.github.io/PythonDataScienceHandbook/)
By utilizing these resources, you can gain a deeper understanding of Pandas and become more proficient in data analysis.
Filtering rows is an essential operation for data analysis in Pandas, and the “OR” operator lets us select rows that meet any one of several conditions on numeric or string values.
The tutorials and documentation listed above cover other common operations, such as merging, sorting, and grouping, in more depth.
Whether you’re a beginner or an experienced user, being comfortable with “OR” filters makes it easier to select exactly the rows an analysis needs.