Adventures in Machine Learning

Mastering Data Management with Pandas Conversion Functions

Data analysis and pre-processing tasks involve converting variables to different data types, ensuring datasets are free from NULL values and creating backup copies for future reference. In Python, Pandas is a reliable library providing flexible data structures and analysis tools for effective data management.

In this article, we will explore Pandas Conversion Functions. We will dive into four subtopics with practical applications of each to help readers attain a better understanding.

Let’s get started!

1. Pandas astype() function

The astype() function is the primary Pandas function for changing the data type of a column. Numeric values are often stored as strings when a dataset is loaded, and analysts need to convert them to types such as integer, float, or category.

This is where astype() comes in handy. Let’s walk through an example demonstrating how it works:

Example: Changing data type of variable to integer

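import pandas as pd
# 'Age' holds numbers stored as strings
data = {'Name': ['Ali', 'Ahmed', 'Neha', 'Sara'], 'Age': ['21', '22', '23', '21'], 'Gender': ['M', 'M', 'F', 'F']}
df = pd.DataFrame(data)
print(df)
print(df.dtypes)  # 'Age' is reported as a string/object column
# Convert the 'Age' column from strings to integers
df['Age'] = df['Age'].astype(int)
print(df)
print(df.dtypes)  # 'Age' is now reported as an integer column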

In this example, we convert the ‘Age’ column of the dataframe from string to integer using the astype() function.

The original dataset looks like this:

    Name Age Gender
0    Ali  21      M
1  Ahmed  22      M
2   Neha  23      F
3   Sara  21      F

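After conversion, the printed dataframe looks almost the same, because only the data type has changed. The difference shows up in the output of df.dtypes, where ‘Age’ is now reported as an integer type (typically int64):

    Name  Age Gender
0    Ali   21      M
1  Ahmed   22      M
2   Neha   23      F
3   Sara   21      F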

The astype() function can also convert a column to the category data type, to Boolean, or to any other data type that Pandas supports.
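As a minimal sketch of those other conversions, the snippet below passes a dictionary to astype() so that several columns are converted in one call. The ‘Member’ column is a hypothetical addition for illustration and is not part of the original dataset.

```python
import pandas as pd

df = pd.DataFrame({
    "Gender": ["M", "M", "F", "F"],
    "Age": ["21", "22", "23", "21"],
    "Member": [1, 0, 1, 1],  # hypothetical 0/1 flag column, added for illustration
})

# A dict maps each column name to its target dtype
converted = df.astype({"Gender": "category", "Age": "int64", "Member": "bool"})

print(converted.dtypes)
print(converted["Gender"].cat.categories)  # the distinct labels stored by the category column
```

Note that astype() raises an error if a value cannot be converted, for example a non-numeric string in a column being cast to an integer type.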

2. Checking for NULL values using isna() function

Handling missing or NULL values is a crucial pre-processing task in data analysis. In Pandas, the isna() function checks a dataset for NULL (missing) values.

Called on a dataframe, it returns a dataframe of the same shape filled with boolean values: True where a value is missing and False where it is present. Let’s see an example:

Example: Checking for null values

import pandas as pd
import numpy as np
# np.nan marks a missing value; the string 'NaN' would be treated as ordinary text
data = {'Name': ['Ali', 'Ahmed', np.nan, 'Sara'], 'Age': ['21', '22', '23', np.nan], 'Gender': ['M', np.nan, 'F', 'F']}
df = pd.DataFrame(data)
print(df.isna())

In this example, we use the isna() function to check for NULL values in the ‘df’ dataset.

The result is a boolean mask with one entry per cell, where True marks a missing value. Its output will be as follows:

    Name    Age  Gender
0  False  False   False
1  False  False    True
2   True  False   False
3  False   True   False
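To get the number of missing values per column rather than a cell-by-cell mask, the mask can be summed, since True counts as 1. The following is a small sketch along those lines, using np.nan for every missing entry; it also shows that the literal string 'NaN' is not treated as missing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Ali", "Ahmed", np.nan, "Sara"],
    "Age": ["21", "22", "23", np.nan],
    "Gender": ["M", np.nan, "F", "F"],
})

# Summing the boolean mask counts the True (missing) cells in each column
null_counts = df.isna().sum()
print(null_counts)

# The text 'NaN' is an ordinary string, so isna() does not flag it
text_nan_flags = pd.Series(["21", "NaN"]).isna()
print(text_nan_flags.tolist())
```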

3. Segregating non-null values using notna() function

Sometimes, analysts need to separate the non-NULL values in a dataset from the missing ones. In Pandas, the notna() function is the inverse of isna(): it returns a boolean mask that is True wherever a value is present and False wherever it is missing.

Here’s an example demonstrating the function:

Example: Segregating non-null values

import pandas as pd
import numpy as np
# np.nan marks a missing value; the string 'NaN' would be treated as ordinary text
data = {'Name': ['Ali', 'Ahmed', np.nan, 'Sara'], 'Age': ['21', '22', '23', np.nan], 'Gender': ['M', np.nan, 'F', 'F']}
df = pd.DataFrame(data)
print(df.notna())

In this example, we use the notna() function to locate the non-null values in the ‘df’ dataset. The function does not print the values themselves; it returns a boolean mask in which True marks a value that is present.

Its output will be as follows:

    Name    Age  Gender
0   True   True    True
1   True   True   False
2  False   True    True
3   True  False    True
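A mask like this is usually passed back into the dataframe to keep only the rows of interest. Here is a brief sketch of that pattern; the variable names complete_gender and complete_rows are illustrative choices, not part of the original example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Ali", "Ahmed", np.nan, "Sara"],
    "Age": ["21", "22", "23", np.nan],
    "Gender": ["M", np.nan, "F", "F"],
})

# Keep rows where 'Gender' is present
complete_gender = df[df["Gender"].notna()]
print(complete_gender)

# Keep rows where every column is present
complete_rows = df[df.notna().all(axis=1)]
print(complete_rows)
```

In this dataset each of rows 1, 2, and 3 is missing one value, so only row 0 survives the second filter.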

4. Creating a backup of dataset using dataframe.copy() function

Manipulating datasets can cause data loss; therefore, it is essential to create backup copies to avoid losing important information.

In Pandas, the dataframe.copy() function creates a copy of the original dataset. By default it makes a deep copy (deep=True), so the copy has its own data and index. Modifications made to the copy will not affect the original dataset, and vice versa, which makes the copy a reliable backup.

Here’s an example to demonstrate the function:

Example: Creating a backup of dataset

import pandas as pd
data = {'Name': ['Ali', 'Ahmed', 'Neha', 'Sara'], 'Age': [21, 22, 23, 21], 'Gender': ['M', 'M', 'F', 'F']}
df = pd.DataFrame(data)
# Create a backup of the dataset
df_copy = df.copy()
print(df_copy)

In this example, we create a backup copy of df using dataframe.copy() function and store it in df_copy. When df_copy is printed, it will contain the same information as df.
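To see why copy() matters, it helps to compare it with plain assignment, which only creates a second name for the same dataframe. The sketch below uses a shortened two-row dataset; the names alias and backup are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Ali", "Ahmed"], "Age": [21, 22]})

alias = df          # plain assignment: a second name for the same object
backup = df.copy()  # an independent copy (deep by default)

# Modify the original dataframe
df.loc[0, "Age"] = 99

print(alias.loc[0, "Age"])   # follows the change, because alias is df
print(backup.loc[0, "Age"])  # still holds the original value
```

Taking the backup before any modification is what preserves the original values.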

Conclusion

In this article, we demonstrated four Pandas functions: astype(), isna(), notna(), and dataframe.copy(). We presented a practical example for each so that readers can see how they work.

Now that you understand how each of these functions works, you can use them independently or together when performing data analysis or pre-processing tasks. Data management is vital in all data analysis projects, so it’s essential to use suitable tools such as Pandas conversion functions to efficiently and reliably handle data.
