Pandas is a popular Python library for data manipulation and analysis. When working with real data, you will inevitably encounter missing values, which Pandas typically represents as NaN (Not a Number).
In this article, we will explore how to deal with NaN values in Pandas DataFrames using the fillna() function.
Using the fillna() function on a Pandas DataFrame
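The fillna() function fills null or NaN values in a Pandas DataFrame. It replaces missing values with a value you specify, such as a single scalar or a dictionary of per-column values. By default it returns a new DataFrame and leaves the original unchanged; pass inplace=True to modify the original DataFrame instead.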
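The sketch below, using a small made-up DataFrame, checks what happens when fillna() is called without inplace: the result is a separate object, and the original DataFrame keeps its NaN. The names df and filled are only for illustration.

```python
import pandas as pd

# Small illustrative DataFrame with one missing value in column B
df = pd.DataFrame({'A': [1, 2, 3], 'B': [1.0, None, 3.0]})

# Without inplace=True, fillna() returns a new DataFrame
filled = df.fillna(0)

print(filled['B'].tolist())  # [1.0, 0.0, 3.0]
print(df['B'].isna().sum())  # 1 -> the original still contains its NaN
```

If you want to keep the result, assign it (for example, df = df.fillna(0)) or use inplace=True.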
Let’s explore how to replace NaN values in one, multiple, or all columns using the fillna() function.
Replace NaN values in one column
Suppose we have a DataFrame and want to replace the NaN values in a particular column. We can use the fillna() function with the inplace parameter set to True to replace the NaN values in place.
Here’s an example:
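import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [1, 2, 3, None, 5]})
print(df)
# Output:
#    A    B
# 0  1  1.0
# 1  2  2.0
# 2  3  3.0
# 3  4  NaN
# 4  5  5.0
# Fill only column B by calling fillna() on the DataFrame itself.
# df['B'].fillna(0, inplace=True) is chained assignment: recent pandas 2.x
# releases warn about it, and under Copy-on-Write it does not modify df.
df.fillna({'B': 0}, inplace=True)
print(df)
# Output:
#    A    B
# 0  1  1.0
# 1  2  2.0
# 2  3  3.0
# 3  4  0.0
# 4  5  5.0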
In the above example, column B of the DataFrame contains a missing value. We use the fillna() function with the inplace parameter set to True to replace that missing value with 0.
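The replacement does not have to be a constant. As a sketch on the same example data, the code below fills column B with the mean of its existing values; assigning the result back to df['B'] is an alternative to inplace=True. Whether the mean is a sensible fill value depends on your data.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [1, 2, 3, None, 5]})

# mean() skips NaN by default, so this is (1 + 2 + 3 + 5) / 4 = 2.75
df['B'] = df['B'].fillna(df['B'].mean())

print(df['B'].tolist())  # [1.0, 2.0, 3.0, 2.75, 5.0]
```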
Replace NaN values in multiple columns
Suppose we have a DataFrame and want to replace the NaN values in several columns. We can pass fillna() a dictionary whose keys are column names and whose values are the corresponding replacement values.
Here’s an example:
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [1, None, 3, None, 5], 'C': [None, 2, 3, None, 5]})
print(df)
# Output:
#    A    B    C
# 0  1  1.0  NaN
# 1  2  NaN  2.0
# 2  3  3.0  3.0
# 3  4  NaN  NaN
# 4  5  5.0  5.0
# Keys are column names; values are the replacements for those columns
df.fillna({'B': 0, 'C': 'missing'}, inplace=True)
print(df)
# Output:
#    A    B        C
# 0  1  1.0  missing
# 1  2  0.0      2.0
# 2  3  3.0      3.0
# 3  4  0.0  missing
# 4  5  5.0      5.0
In the above example, columns B and C contain missing values. We pass fillna() a dictionary that maps each column name to its replacement value.
Here we replace NaN values in column B with 0 and in column C with 'missing'. Because 'missing' is a string, column C changes from float64 to object dtype; use a numeric replacement if the column needs to stay numeric.
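Besides a dictionary, fillna() also accepts a Series whose index holds column names. One convenient case, sketched below with the same illustrative data, is filling each column with its own mean, which keeps every column numeric.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [1, None, 3, None, 5], 'C': [None, 2, 3, None, 5]})

# df.mean() returns a Series indexed by column name; fillna() matches
# those labels to the columns, so each column is filled with its own mean
column_means = df.mean()
filled = df.fillna(column_means)

print(filled['B'].tolist())  # [1.0, 3.0, 3.0, 3.0, 5.0]
```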
Replace NaN values in all columns
Suppose we have a DataFrame and want to replace the NaN values in all columns. Passing a single replacement value to fillna() replaces every NaN in the DataFrame with that value.
Here’s an example:
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, None, 5], 'B': [1, None, 3, None, 5], 'C': [None, 2, 3, None, 5]})
print(df)
# Output:
#      A    B    C
# 0  1.0  1.0  NaN
# 1  2.0  NaN  2.0
# 2  3.0  3.0  3.0
# 3  NaN  NaN  NaN
# 4  5.0  5.0  5.0
# A single scalar replaces the NaN values in every column
df.fillna(0, inplace=True)
print(df)
# Output:
#      A    B    C
# 0  1.0  1.0  0.0
# 1  2.0  0.0  2.0
# 2  3.0  3.0  3.0
# 3  0.0  0.0  0.0
# 4  5.0  5.0  5.0
In the above example, we have a DataFrame with missing values in all columns. We use the fillna() function to replace all NaN values in the DataFrame with 0.
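Instead of one constant, you can also fill every column from neighboring values. The sketch below reuses the example data with the DataFrame.ffill() and DataFrame.bfill() methods, which recent pandas versions recommend over the deprecated fillna(method='ffill') form.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, None, 5], 'B': [1, None, 3, None, 5], 'C': [None, 2, 3, None, 5]})

# ffill() copies the last valid value downward within each column
forward = df.ffill()
print(forward['B'].tolist())  # [1.0, 1.0, 3.0, 3.0, 5.0]

# A leading NaN has no earlier value, so ffill() leaves it in place;
# chaining bfill() fills it from the next valid value below
both = df.ffill().bfill()
print(both['C'].tolist())  # [2.0, 2.0, 3.0, 3.0, 5.0]
```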
Pandas DataFrame with NaN values
To understand how the fillna() method works, we need a DataFrame with NaN values. Let’s see how to create a DataFrame with NaN values and how to view it.
Create a DataFrame with NaN values
We can create a DataFrame with NaN values by passing a list of lists or a dictionary containing None or numpy.nan entries to the DataFrame constructor, or by reading a CSV file with empty fields, which read_csv() loads as NaN. Here's an example of creating a DataFrame with NaN values using a dictionary:
import pandas as pd
data = {'A': [1, None, 3, None, 5], 'B': [1, None, 3, None, 5], 'C': [None, 2, 3, None, None]}
df = pd.DataFrame(data)
print(df)
# Output:
#      A    B    C
# 0  1.0  1.0  NaN
# 1  NaN  NaN  2.0
# 2  3.0  3.0  3.0
# 3  NaN  NaN  NaN
# 4  5.0  5.0  NaN
In the above example, the lists in the dictionary contain None entries, which the DataFrame constructor converts to NaN in these numeric columns. Passing the dictionary to the constructor therefore produces a DataFrame with NaN values.
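The other two sources mentioned above produce NaN values the same way. This sketch builds one small DataFrame from a list of lists and another from CSV text; the CSV is held in an in-memory io.StringIO buffer purely so the example runs without a file on disk.

```python
import io

import numpy as np
import pandas as pd

# From a list of lists: each inner list is a row; np.nan marks a missing value
rows = [[1, np.nan], [np.nan, 2], [3, 3]]
df_rows = pd.DataFrame(rows, columns=['A', 'B'])

# From CSV text: read_csv() treats empty fields as NaN by default
csv_text = "A,B\n1,\n,2\n3,3\n"
df_csv = pd.read_csv(io.StringIO(csv_text))

print(df_rows.isna().sum().tolist())  # [1, 1]
print(df_csv.isna().sum().tolist())   # [1, 1]
```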
View a DataFrame with NaN values
We can view a DataFrame with NaN values using the head() function or the info() function. Here’s an example:
import pandas as pd
data = {'A': [1, None, 3, None, 5], 'B': [1, None, 3, None, 5], 'C': [None, 2, 3, None, None]}
df = pd.DataFrame(data)
print(df.head())
# Output:
#      A    B    C
# 0  1.0  1.0  NaN
# 1  NaN  NaN  2.0
# 2  3.0  3.0  3.0
# 3  NaN  NaN  NaN
# 4  5.0  5.0  NaN
# info() prints its summary directly and returns None, so call it without print()
df.info()
# Output:
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 5 entries, 0 to 4
# Data columns (total 3 columns):
#  #   Column  Non-Null Count  Dtype
# ---  ------  --------------  -----
#  0   A       3 non-null      float64
#  1   B       3 non-null      float64
#  2   C       2 non-null      float64
# dtypes: float64(3)
# memory usage: 248.0 bytes
In the above example, the head() function displays the first five rows of the DataFrame (here, the whole DataFrame), including its NaN values. The info() function summarizes the column names, non-null counts, and data types; comparing each Non-Null Count with the 5 total entries shows that columns A and B each have two missing values and column C has three.
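If you only need the number of missing values per column, a common idiom is to combine isna() with sum(). The sketch below applies it to the same example data.

```python
import pandas as pd

data = {'A': [1, None, 3, None, 5], 'B': [1, None, 3, None, 5], 'C': [None, 2, 3, None, None]}
df = pd.DataFrame(data)

# isna() marks each missing cell as True; summing counts them per column
missing_per_column = df.isna().sum()
print(missing_per_column.tolist())  # [2, 2, 3]

# Total number of missing cells in the whole DataFrame
print(int(missing_per_column.sum()))  # 7
```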
Accessing the Complete Online Documentation for the fillna() Function
Pandas is a powerful library that makes data manipulation easier in Python. The fillna() function is one of the many vital functions of the Pandas library.
To use fillna() effectively, we need to understand its parameters, return value, and capabilities. The official online documentation is the most reliable source for this information: it describes how fillna() behaves and how to apply it in your own projects.
Here is how to find the documentation for the fillna() method:
- Go to the official pandas website (pandas.pydata.org)
- Open the documentation and navigate to the 'User Guide'
- Open the 'Working with missing data' page
- Look up DataFrame.fillna in the 'API reference' section
- Work through the examples on both pages
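The same reference text is also available offline from inside Python. As a small sketch using the standard inspect module and the docstring attached to pd.DataFrame.fillna, you can list the parameters your installed pandas version accepts; help(pd.DataFrame.fillna) prints the full text.

```python
import inspect

import pandas as pd

# Parameter names of DataFrame.fillna in the installed pandas version
params = list(inspect.signature(pd.DataFrame.fillna).parameters)
print(params)

# First line of the docstring that the online API reference is generated from
summary = (pd.DataFrame.fillna.__doc__ or "").strip().split("\n")[0]
print(summary)
```

The exact parameter list differs between pandas versions, which is why checking your installed version can be useful.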
This documentation also covers several ways of filling missing values, such as:
- Filling with a scalar value
- Filling each column with its own value from a dictionary
- Forward and backward filling, which copy the previous or next valid value in the same column
- Interpolating between known values with the related interpolate() method
- Limiting how many NaN values are filled with the limit parameter
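Two of these techniques can be tried in a couple of lines. The sketch below uses a made-up Series; note that interpolation is done by the separate interpolate() method rather than by fillna() itself, while limit is a fillna() parameter.

```python
import pandas as pd

s = pd.Series([1, None, 3, None, 5])

# Linear interpolation (the default) estimates each NaN from its neighbors
interpolated = s.interpolate()
print(interpolated.tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0]

# limit=1 fills only the first NaN; the second one stays missing
limited = s.fillna(0, limit=1)
print(limited.tolist())  # [1.0, 0.0, 3.0, nan, 5.0]
```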
The API reference also gives a precise explanation of each parameter, such as value, axis, inplace, and limit.
In addition to these parameter descriptions, the documentation includes practical examples that help you understand the fundamentals of the method.
These examples walk you through handling missing values in your data, including how to use fillna() in realistic situations.
Conclusion
In conclusion, fillna() lets you replace NaN values in one column, in several columns, or across an entire DataFrame, and the official online documentation is the best place to learn its optional parameters and related techniques in detail.
Remember, there is no single correct way to handle missing data, but by combining the Pandas fillna() method with other Pandas functions, you can deal with missing values and draw reliable insights from your data sets. Keep practicing and exploring Pandas functions to master data manipulation and analysis.