Data analysis and pre-processing tasks involve converting variables to different data types, ensuring datasets are free from NULL values and creating backup copies for future reference. In Python, Pandas is a reliable library providing flexible data structures and analysis tools for effective data management.
In this article, we will explore Pandas Conversion Functions. We will dive into four subtopics with practical applications of each to help readers attain a better understanding.
Let’s get started!
1. Pandas astype() function
The astype() function is the primary Pandas conversion function for changing the data type of a variable. Numeric values in a dataset are often stored as strings, and analysts need to convert them to integer, float, or category data types.
Here, the astype() function comes in handy. Let’s dive into an example demonstrating the functionality of astype():
Example: Changing data type of variable to integer
import pandas as pd
data = {'Name': ['Ali', 'Ahmed', 'Neha', 'Sara'], 'Age': ['21', '22', '23', '21'], 'Gender': ['M', 'M', 'F', 'F']}
df = pd.DataFrame(data)
print(df)
print(df.dtypes)
df['Age'] = df['Age'].astype(int)
print(df)
print(df.dtypes)
In this example, we convert the data type of the ‘Age’ column from string to integer via the astype() function. Printing df.dtypes before and after the conversion shows the change.
The original dataset looks like this:
| | Name | Age | Gender |
|---|---|---|---|
| 0 | Ali | 21 | M |
| 1 | Ahmed | 22 | M |
| 2 | Neha | 23 | F |
| 3 | Sara | 21 | F |
After conversion, the printed values look the same, but df.dtypes now reports an integer type for the ‘Age’ column instead of a string type:
| | Name | Age | Gender |
|---|---|---|---|
| 0 | Ali | 21 | M |
| 1 | Ahmed | 22 | M |
| 2 | Neha | 23 | F |
| 3 | Sara | 21 | F |
The astype() function can also be used to convert a column to the category data type, to Boolean, or to any other data type that Pandas supports.
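As a rough sketch of these other conversions, astype() also accepts a dictionary that maps column names to target types, so several columns can be converted in one call. The ‘Enrolled’ column below is a hypothetical 0/1 flag added only for illustration; it is not part of the original dataset.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': ['21', '22', '23', '21'],
    'Gender': ['M', 'M', 'F', 'F'],
    'Enrolled': [1, 0, 1, 1],  # hypothetical 0/1 flag, assumed for this sketch
})

# A dict passed to astype() converts each listed column to its target type.
converted = df.astype({'Age': 'int64', 'Gender': 'category', 'Enrolled': 'bool'})

print(converted.dtypes)
print(converted)
```

Converting a low-cardinality text column such as ‘Gender’ to category can reduce memory use, although the saving depends on the data.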
2. Checking for NULL values using isna() function
Handling missing or NULL values is a crucial pre-processing task in data analysis. In Pandas, the isna() function checks for NULL (missing) values, such as None or NaN, in a dataset.
Called on a DataFrame, it returns a DataFrame of the same shape filled with Boolean values: True where a value is missing and False otherwise. Let’s see an example:
Example: Checking for null values
import pandas as pd
import numpy as np
# np.nan marks a missing value; the string 'NaN' would not be treated as missing
data = {'Name': ['Ali', 'Ahmed', np.nan, 'Sara'], 'Age': ['21', '22', '23', np.nan], 'Gender': ['M', np.nan, 'F', 'F']}
df = pd.DataFrame(data)
print(df.isna())
In this example, we use the isna() function to check for NULL values in the ‘df’ dataset.
After execution, this prints a table of Boolean values with the same shape as the dataset, where True marks a NULL value. Its output will be as follows:
| | Name | Age | Gender |
|---|---|---|---|
| 0 | False | False | False |
| 1 | False | False | True |
| 2 | True | False | False |
| 3 | False | True | False |
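Because isna() returns a Boolean table rather than totals, a common follow-up is to aggregate it. The sketch below, which rebuilds the same dataset with np.nan as the missing-value marker, counts NULL values per column with sum() and flags rows that contain at least one NULL with any(axis=1). The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Ali', 'Ahmed', np.nan, 'Sara'],
    'Age': ['21', '22', '23', np.nan],
    'Gender': ['M', np.nan, 'F', 'F'],
})

# True counts as 1 when summed, so this gives the number of NULLs per column.
null_counts = df.isna().sum()

# any(axis=1) is True for each row that has at least one NULL value.
rows_with_nulls = df.isna().any(axis=1)

print(null_counts)
print(rows_with_nulls)
```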
3. Segregating non-null values using notna() function
Sometimes, analysts may need to segregate non-NULL values from a dataset. In Pandas, the notna() function is the inverse of isna(): it returns a Boolean object of the same shape as its input, with True at the locations of non-missing values.
Here’s an example demonstrating the function:
Example: Segregating non-null values
import pandas as pd
import numpy as np
data = {'Name': ['Ali', 'Ahmed', np.nan, 'Sara'], 'Age': ['21', '22', '23', np.nan], 'Gender': ['M', np.nan, 'F', 'F']}
df = pd.DataFrame(data)
print(df.notna())
In this example, we use the notna() function to locate non-null values in the ‘df’ dataset. After execution, this prints a table of Boolean values in which True marks a non-null value and False marks a NULL value.
Its output will be as follows:
| | Name | Age | Gender |
|---|---|---|---|
| 0 | True | True | True |
| 1 | True | True | False |
| 2 | False | True | True |
| 3 | True | False | True |
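To actually separate the non-null data, the Boolean result of notna() can be used as a row filter. The following is a minimal sketch on the same dataset, with np.nan as the missing-value marker; the names complete_gender and complete_rows are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Ali', 'Ahmed', np.nan, 'Sara'],
    'Age': ['21', '22', '23', np.nan],
    'Gender': ['M', np.nan, 'F', 'F'],
})

# Keep only the rows where 'Gender' is present.
complete_gender = df[df['Gender'].notna()]

# Keep only the rows where every column is present.
complete_rows = df[df.notna().all(axis=1)]

print(complete_gender)
print(complete_rows)
```

For the all-columns case, df.dropna() gives the same rows; the explicit mask is shown here because it makes the role of notna() visible.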
4. Creating a backup of dataset using dataframe.copy() function
Manipulating datasets can cause data loss; therefore, it is essential to create backup copies to avoid losing important information.
In Pandas, the dataframe.copy() function creates a copy of the original dataset. By default it makes a deep copy (deep=True), so modifications made to the copy do not affect the original dataset, and vice versa, which keeps the backup reliable.
Here’s an example to demonstrate the function:
Example: Creating a backup of dataset
import pandas as pd
data = {'Name': ['Ali', 'Ahmed', 'Neha', 'Sara'], 'Age': [21, 22, 23, 21], 'Gender': ['M', 'M', 'F', 'F']}
df = pd.DataFrame(data)
# Create a backup of the dataset
df_copy = df.copy()
print(df_copy)
In this example, we create a backup copy of df using dataframe.copy() function and store it in df_copy. When df_copy is printed, it will contain the same information as df.
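The independence of the backup can be checked directly. In the sketch below, the original is modified after the backup is taken; the backup keeps the old value, while a plain assignment (alias = df) is just a second name for the same object and so reflects the change. The names backup and alias are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Ali', 'Ahmed', 'Neha', 'Sara'],
    'Age': [21, 22, 23, 21],
})

backup = df.copy()  # independent copy (deep=True is the default)
alias = df          # plain assignment: no copy is made

# Modify the original after the backup was taken.
df.loc[0, 'Age'] = 99

print(backup.loc[0, 'Age'])  # 21, the backup is unchanged
print(alias.loc[0, 'Age'])   # 99, the alias is the same object as df
```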
Conclusion
In this article, we covered four Pandas conversion functions: astype(), isna(), notna() and dataframe.copy(). We presented a practical example for each so that readers can see how it behaves on a small dataset.
Now that you understand how each of these functions works, you can use them independently or together when performing data analysis or pre-processing tasks. Data management is vital in all data analysis projects, so it’s essential to use suitable tools such as Pandas conversion functions to efficiently and reliably handle data.