Using fillna() function in Pandas DataFrame
Data analysis and management are essential to any business. A common problem when working with data is missing values.
Missing values can arise for various reasons, such as poor-quality data, unrecorded entries, or measurement errors. Left unhandled, they can hinder analysis and produce erroneous results.
However, with Pandas, it is possible to handle missing values with ease. Pandas provides the fillna() method to replace these missing values with appropriate values.
In this article, we will discuss three ways of using the fillna() method on a Pandas DataFrame to fill NaN values with the median: in one column, in multiple columns, and in all columns.
Method 1: Fill NaN Values in One Column with Median
The fillna() method fills NaN values with a specified value or fill technique.
Here is its basic syntax:
DataFrame.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None)
Note that this is the signature from older Pandas releases. Newer releases deprecate the method parameter (in favor of ffill() and bfill()) and the downcast parameter.
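To make the value parameter concrete, here is a minimal sketch of two forms it can take: a scalar applied everywhere, and a dict mapping column names to fill values. The toy DataFrame and its column names are hypothetical and are not part of the article's data.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, used only for illustration.
df = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, 5.0, np.nan]})

# A scalar fills every NaN in the DataFrame with the same value.
filled_scalar = df.fillna(0)

# A dict maps column names to a separate fill value for each column.
filled_dict = df.fillna({"a": -1.0, "b": -2.0})

print(filled_scalar)
print(filled_dict)
```

Because inplace defaults to False, both calls return new DataFrames and leave df unchanged.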
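To fill NaN values in one column with the median, we can use the following code:
df['column_name'] = df['column_name'].fillna(df['column_name'].median())
Here, df is the DataFrame and 'column_name' is the name of the column whose NaN values we want to fill with the median. Assigning the result back to the column is more reliable than calling fillna() with inplace=True on a selected column, which recent Pandas versions warn may not update the original DataFrame.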
Example 1:
Let’s say we have a DataFrame with a column ‘Age’ that has some NaN values.
We can fill these NaN values with the median using the following code.
import pandas as pd
# Load the data (assumes data.csv contains an 'Age' column)
df = pd.read_csv('data.csv')
# Print the original DataFrame
print("Original DataFrame:\n", df)
# Fill NaN values in the Age column with the column median
df['Age'] = df['Age'].fillna(df['Age'].median())
# Print the updated DataFrame
print("\nUpdated DataFrame:\n", df)
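Since the contents of data.csv are not shown, the following is a self-contained sketch of the same single-column fill on a small in-memory DataFrame. The names and ages are made-up sample values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for data.csv.
df = pd.DataFrame({
    "Name": ["Ann", "Bob", "Cy", "Di"],
    "Age": [25.0, np.nan, 31.0, 40.0],
})

# median() skips NaN by default, so it is computed from 25, 31, and 40.
age_median = df["Age"].median()

# Assign the filled column back to the DataFrame.
df["Age"] = df["Age"].fillna(age_median)

print(age_median)
print(df)
```

Here the median of the three known ages is 31.0, so the missing age for the second row becomes 31.0.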
Method 2: Fill NaN Values in Multiple Columns with Median
Using the same fillna() method, we can fill NaN values in multiple columns with their respective medians. To do so, select the columns of interest and fill them with their column medians:
df[['col1', 'col2']] = df[['col1', 'col2']].fillna(df[['col1', 'col2']].median())
Here, df is the DataFrame, and 'col1' and 'col2' are the names of the numeric columns that contain NaN values.
Example 2:
Let’s consider an example where we have a DataFrame with multiple columns ‘Age’, ‘Salary’, and ‘Contribution’ with several NaN values. We can fill these NaN values in all three columns by applying the same fillna() method to the DataFrame.
import pandas as pd
df = pd.read_csv('data.csv')
# Print the original DataFrame
print("Original DataFrame:\n", df)
# Fill NaN values in the selected columns with each column's median
cols = ['Age', 'Salary', 'Contribution']
df[cols] = df[cols].fillna(df[cols].median())
# Print the updated DataFrame
print("\nUpdated DataFrame:\n", df)
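As a sketch of how a column subset behaves, the example below fills only two of three columns and leaves the third untouched. The data values are hypothetical, and the choice of which columns to fill is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the values are made up.
df = pd.DataFrame({
    "Age": [25.0, np.nan, 31.0, 40.0],
    "Salary": [50000.0, 62000.0, np.nan, 58000.0],
    "Contribution": [np.nan, 4.0, 6.0, np.nan],
})

# Fill only the selected columns, each with its own median.
cols = ["Age", "Salary"]
df[cols] = df[cols].fillna(df[cols].median())

print(df)
```

Passing a Series of medians to fillna() matches each median to its column by name, so 'Age' and 'Salary' receive different fill values, while 'Contribution' keeps its NaN entries.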
Method 3: Fill NaN Values in All Columns with Median
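In some cases, it may be useful to replace every NaN value in the entire DataFrame with its column's median. To do this, we can pass the result of the built-in median() method to fillna(). Only numeric columns have a median, so passing numeric_only=True to median() avoids errors when the DataFrame also contains non-numeric columns.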
Example 3:
Let’s consider an example where the DataFrame has NaN values in every column. We can replace all NaN values with median as shown below.
import pandas as pd
df = pd.read_csv('data.csv')
# Print the original DataFrame
print("Original DataFrame:\n", df)
# Fill NaN values in all numeric columns with each column's median
df.fillna(df.median(numeric_only=True), inplace=True)
# Print the updated DataFrame
print("\nUpdated DataFrame:\n", df)
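Real files often mix numeric and text columns, so here is a minimal sketch of the whole-DataFrame fill on mixed data. The column names and values are hypothetical. Using numeric_only=True is one way to restrict the median to numeric columns.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed-type data: one text column and two numeric columns.
df = pd.DataFrame({
    "Name": ["Ann", None, "Cy"],
    "Age": [25.0, np.nan, 35.0],
    "Salary": [np.nan, 60000.0, 70000.0],
})

# Compute medians for the numeric columns only.
medians = df.median(numeric_only=True)

# Fill each numeric column with its own median; columns without a median are left as-is.
df = df.fillna(medians)

print(df)
```

The missing 'Name' stays missing because no median exists for a text column. It would need a separate fill value, such as a placeholder string, if one is wanted.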
Conclusion
In this article, we discussed how to use the fillna() method in a Pandas DataFrame to replace NaN values with the median. Method 1 focused on updating NaN values in a single column, Method 2 updated NaN values in multiple columns, and Method 3 updated NaN values in the entire dataset.
There are other ways to fill NaN values in a DataFrame, such as the interpolate() method or forward and backward fill. Nevertheless, median replacement with fillna() is a common approach: it is simple, and the median is less sensitive to outliers than the mean.
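For comparison, the alternatives mentioned above can be sketched on a single small Series. The values are hypothetical, and interpolate() is used with its default linear behavior.

```python
import numpy as np
import pandas as pd

# Hypothetical series with a gap of two missing values.
s = pd.Series([1.0, np.nan, np.nan, 4.0])

# Linear interpolation fills the gap with evenly spaced values.
interpolated = s.interpolate()

# Forward fill repeats the last known value; backward fill uses the next known value.
forward = s.ffill()
backward = s.bfill()

# Median replacement uses one value, the median of the known entries, for every gap.
median_filled = s.fillna(s.median())

print(interpolated.tolist())
print(forward.tolist())
print(backward.tolist())
print(median_filled.tolist())
```

Which option fits best depends on the data: interpolation and forward or backward fill assume the row order is meaningful, while median replacement ignores order entirely.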
We hope this article has provided useful insights into managing NaN values encountered while working with Pandas DataFrame. By adopting the right approach, data quality can be maintained, and accurate results can be obtained while performing data analysis.