Dropping Rows Based on Conditions in Pandas DataFrame
Dropping rows that meet a condition is a common step in data analysis and manipulation. Pandas, a popular data manipulation library for Python, makes it straightforward to filter out and remove rows based on one or more conditions.
This article discusses two methods for dropping rows based on conditions and provides an example DataFrame to illustrate the concepts.
1. Drop Rows Based on One Condition
The first method drops the rows that meet a single condition.
In Pandas, this is done by building a boolean condition, selecting the index labels of the rows where it is true, and passing those labels to the drop method.
To illustrate this method, consider the following example DataFrame:
import pandas as pd
df = pd.DataFrame({
'name': ['John', 'Bob', 'Alice', 'Sam'],
'age': [25, 30, 28, 22],
'gender': ['M', 'M', 'F', 'M']
})
print(df)
This will create and display the following table:
name age gender
0 John 25 M
1 Bob 30 M
2 Alice 28 F
3 Sam 22 M
Suppose the user wants to drop rows where the individual's age is less than 25. To accomplish this, they can select the matching rows with the loc indexer and pass their index to drop, as follows:
df.drop(df.loc[df['age'] < 25].index, inplace=True)
print(df)
The output will be the following table:
name age gender
0 John 25 M
1 Bob 30 M
2 Alice 28 F
In this method, df['age'] < 25 produces a boolean Series that is True for each row whose age is less than 25. df.loc[df['age'] < 25].index returns the index labels of those rows, and df.drop() removes the rows with those labels. Only Sam is removed here; John's age is exactly 25, so he does not meet the condition.
Finally, inplace=True modifies df directly instead of returning a new DataFrame (with inplace=True, drop itself returns None).
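If the original DataFrame should stay intact, the same rows can be removed without inplace=True. A minimal sketch, assuming a fresh copy of the example data: boolean indexing with the negated mask (~) keeps every row the condition does not match, and drop without inplace returns a new DataFrame. The names mask, kept and dropped are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Bob', 'Alice', 'Sam'],
    'age': [25, 30, 28, 22],
    'gender': ['M', 'M', 'F', 'M']
})

# Boolean mask: True for the rows that should be dropped
mask = df['age'] < 25

# Option 1: keep only the rows where the mask is False
kept = df[~mask]

# Option 2: drop by index label; returns a new DataFrame and leaves df unchanged
dropped = df.drop(df[mask].index)

print(kept)
print(kept.equals(dropped))  # True
print(len(df))  # 4: the original DataFrame still has all rows
```

Note that drop(..., inplace=True) returns None, so writing df = df.drop(..., inplace=True) would replace df with None. Use either inplace=True or reassignment, not both.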
2. Drop Rows Based on Multiple Conditions
The second method drops the rows that meet a combination of conditions.
Conditions can be combined with the & operator for AND and the | operator for OR. To illustrate this method, consider the same example DataFrame extended with a city column:
import pandas as pd
df = pd.DataFrame({
'name': ['John', 'Bob', 'Alice', 'Sam'],
'age': [25, 30, 28, 22],
'gender': ['M', 'M', 'F', 'M'],
'city': ['New York', 'Los Angeles', 'New York', 'Chicago']
})
print(df)
This will create and display the following table:
name age gender city
0 John 25 M New York
1 Bob 30 M Los Angeles
2 Alice 28 F New York
3 Sam 22 M Chicago
Suppose the user wants to remove all rows where the individual's age is less than or equal to 25 and they live in New York. One way to achieve this is to use the query method and keep only the rows that do not meet that combined condition, as follows:
df = df.query('age > 25 | city != "New York"')
print(df)
The output will be the following table:
name age gender city
1 Bob 30 M Los Angeles
2 Alice 28 F New York
3 Sam 22 M Chicago
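This method uses query to keep the rows where age > 25 or city != "New York". By De Morgan's law, that is exactly the complement of the drop condition (age <= 25 and city == "New York"), so only John, who is 25 and lives in New York, is removed. Alice lives in New York but is older than 25, and Sam is younger than 25 but lives in Chicago, so both are kept. The result is a new DataFrame, which is assigned back to df.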
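The same rows can also be removed with drop, matching the approach from the first section. A sketch: build one boolean mask per condition, combine the masks with & (or | for OR), and pass the matching index to drop. Each comparison must be wrapped in parentheses, because & and | bind more tightly than comparison operators in Python. The OR condition below (age under 25 or living in Los Angeles) is an illustrative choice, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Bob', 'Alice', 'Sam'],
    'age': [25, 30, 28, 22],
    'gender': ['M', 'M', 'F', 'M'],
    'city': ['New York', 'Los Angeles', 'New York', 'Chicago']
})

# AND: rows where age <= 25 and the city is New York (parentheses are required)
to_drop = (df['age'] <= 25) & (df['city'] == 'New York')
result_and = df.drop(df[to_drop].index)

# The query() version keeps the complement of the same condition
result_query = df.query('age > 25 | city != "New York"')
print(result_and.equals(result_query))  # True

# OR: rows where age < 25 or the city is Los Angeles
either = (df['age'] < 25) | (df['city'] == 'Los Angeles')
result_or = df.drop(df[either].index)
print(result_or)
```

Both routes produce the same rows; query reads more compactly for long conditions, while boolean masks are easier to reuse and inspect.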
Example DataFrame
When analyzing data in Pandas, the first step is often to load the data. In this example, the user will instead generate a small sample dataset with the pd.DataFrame constructor:
import pandas as pd
data = {
'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
'score': [63, 79, 92, 85, 95],
'gender': ['F', 'M', 'M', 'M', 'F'],
'major': ['Biology', 'Mathematics', 'Computer Science', 'History', 'English'],
'grad_year': [2019, 2020, 2021, 2022, 2021]
}
df = pd.DataFrame(data)
print(df)
This will create and display the following table:
name score gender major grad_year
0 Alice 63 F Biology 2019
1 Bob 79 M Mathematics 2020
2 Charlie 92 M Computer Science 2021
3 David 85 M History 2022
4 Emily 95 F English 2021
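As a sketch of how the two techniques from the previous sections apply to this dataset: the thresholds below (a score under 80, a graduation year before 2021) are arbitrary choices for illustration, not part of the original example.

```python
import pandas as pd

data = {
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'score': [63, 79, 92, 85, 95],
    'gender': ['F', 'M', 'M', 'M', 'F'],
    'major': ['Biology', 'Mathematics', 'Computer Science', 'History', 'English'],
    'grad_year': [2019, 2020, 2021, 2022, 2021]
}
df = pd.DataFrame(data)

# One condition: drop students who scored below 80 (illustrative threshold)
high_scores = df.drop(df[df['score'] < 80].index)

# Two conditions combined with &: drop male students who graduated before 2021
recent_or_female = df.drop(df[(df['gender'] == 'M') & (df['grad_year'] < 2021)].index)

print(high_scores)
print(recent_or_female)
```

Neither call uses inplace=True, so df itself keeps all five rows.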
Viewing the DataFrame
After creating a DataFrame, it is essential to view the data to ensure it was properly read in and to get a sense of the data's structure. There are several ways to view a Pandas DataFrame.
In Jupyter Notebook, users can display the DataFrame by placing its name on the last line of a cell, as follows:
df
This will display the full table:
name score gender major grad_year
0 Alice 63 F Biology 2019
1 Bob 79 M Mathematics 2020
2 Charlie 92 M Computer Science 2021
3 David 85 M History 2022
4 Emily 95 F English 2021
Users can also use the head method to view the first few rows of the DataFrame:
df.head()
By default, head returns the first five rows. Because this DataFrame has exactly five rows, the output is the full table:
name score gender major grad_year
0 Alice 63 F Biology 2019
1 Bob 79 M Mathematics 2020
2 Charlie 92 M Computer Science 2021
3 David 85 M History 2022
4 Emily 95 F English 2021
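head also accepts an optional n argument (5 by default), so a specific number of rows can be requested. A short sketch, using a trimmed copy of the example data with only the name and score columns:

```python
import pandas as pd

# Trimmed copy of the example data (name and score columns only)
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'score': [63, 79, 92, 85, 95],
})

# Request only the first two rows
first_two = df.head(2)
print(first_two)

# An n larger than the number of rows returns the whole DataFrame
print(len(df.head(10)))  # 5
```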
The tail method works the same way but displays the last few rows:
df.tail()
By default, tail returns the last five rows, which again is the full table here:
name score gender major grad_year
0 Alice 63 F Biology 2019
1 Bob 79 M Mathematics 2020
2 Charlie 92 M Computer Science 2021
3 David 85 M History 2022
4 Emily 95 F English 2021
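Like head, tail takes an optional n argument. A negative n returns every row except the first |n| rows. A short sketch on a trimmed copy of the example data (name and score columns only):

```python
import pandas as pd

# Trimmed copy of the example data (name and score columns only)
df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'score': [63, 79, 92, 85, 95],
})

# Last two rows
last_two = df.tail(2)
print(last_two)

# Negative n: all rows except the first one
print(df.tail(-1))
```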
Conclusion
In conclusion, this article covered two ways to drop rows based on conditions in Pandas: dropping rows that meet a single condition, and dropping rows that meet a combination of conditions. Both rely on boolean filtering and make it easy to trim a dataset before exploratory data analysis.
Additionally, it is important to properly view the loaded DataFrame to get a sense of the data's contents. By following these practices, users can effectively wrangle and analyze their data.