Adventures in Machine Learning

Mastering Data Type Conversion with Python’s astype() Method

Data Type Conversion with Python astype() Method

Data science and machine learning work requires data to be processed, transformed, and modeled in the correct format. A popular tool for this in Python is the pandas astype() method, which converts data from one type to another.

This article explains the astype() function and its syntax, and gives practical examples of converting data types, first in a DataFrame built in code and then in a dataset loaded from a CSV file.

Understanding astype() function

astype() is a method of the pandas library that converts the data type of a Series or of one or more DataFrame columns.

Syntax of astype() function

The general syntax of astype() function is as follows:

DataFrame.astype(dtype, copy=True, errors='raise')

Here, dtype: the data type to convert to. It can be a string alias such as 'float64', a Python type such as float, a NumPy dtype, or a dictionary that maps column names to data types.
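The different forms that dtype can take can be sketched as follows. The DataFrame and its column names are hypothetical and exist only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"count": [1, 2, 3], "price": ["1.5", "2.0", "3.25"]})

# dtype given as a string alias
as_string_alias = df["count"].astype("float64")

# dtype given as a Python type
as_python_type = df["count"].astype(float)

# dtype given as a NumPy dtype
as_numpy_dtype = df["count"].astype(np.float64)

# dtype given as a dict: convert several columns in one call
converted = df.astype({"count": "int32", "price": "float64"})

print(converted.dtypes)
```

The dictionary form is convenient when several columns need different target types, since columns that are not listed keep their current type.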

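copy: a boolean value that indicates whether to return a copy of the data (the default) rather than, where possible, a view of the original.

errors: controls what happens when a conversion fails. It accepts two values: 'raise' (the default), which raises an exception, and 'ignore', which suppresses the exception and returns the original object. astype() has no 'coerce' option; that value belongs to functions such as pandas.to_numeric().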
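As a rough sketch of the default error handling, the snippet below tries to convert a hypothetical column that contains one non-numeric value. It then shows pandas.to_numeric(), a separate function rather than part of astype(), for the case where invalid values should become NaN instead of stopping the conversion.

```python
import pandas as pd

# Hypothetical column containing one value that is not a number.
raw = pd.Series(["10", "20", "n/a"])

# Default errors="raise": the bad value triggers a ValueError.
try:
    raw.astype("int64")
    conversion_failed = False
except ValueError:
    conversion_failed = True
print(conversion_failed)

# pd.to_numeric (not astype) supports errors="coerce",
# which replaces unparseable values with NaN.
coerced = pd.to_numeric(raw, errors="coerce")
print(coerced)
```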

astype() returns a copy of the data converted to the specified type. Each value is cast to the corresponding data type.
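Because the result is a new object, the original data keeps its type unless the result is assigned back. A minimal sketch, using a hypothetical Series of ages:

```python
import pandas as pd

# Hypothetical data; the dtype is set explicitly so the starting point is clear.
ages = pd.Series([23, 45, 32, 19], name="Age", dtype="int64")

# astype() returns a new Series; `ages` itself is unchanged.
ages_float = ages.astype("float64")

print(ages.dtype)
print(ages_float.dtype)
```

This is why the examples in this article assign the result back to the column, as in df['col'] = df['col'].astype(...).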

If the conversion is not possible, an error is raised.

Example 1: astype() with a DataFrame

Let’s consider the following DataFrame containing information about people.

import pandas as pd

data = {'Name': ['Anna', 'John', 'Sarah', 'Chris'],
        'Age': [23, 45, 32, 19],
        'Gender': ['Female', 'Male', 'Female', 'Male']}

df = pd.DataFrame(data, columns=['Name', 'Age', 'Gender'])

print(df)

Output:

    Name  Age  Gender
0   Anna   23  Female
1   John   45    Male
2  Sarah   32  Female
3  Chris   19    Male

We can use the astype() function to change the data type of the 'Gender' column from object to category.

df['Gender'] = df['Gender'].astype('category')
print(df)

Output:

    Name  Age  Gender
0   Anna   23  Female
1   John   45    Male
2  Sarah   32  Female
3  Chris   19    Male

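The printed DataFrame looks the same as before, because print(df) shows values rather than data types. Inspecting df.dtypes (or df['Gender'].dtype) confirms that the 'Gender' column is now of the category type.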
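A quick way to confirm the conversion is to inspect the column's dtype and its categories. The sketch below rebuilds the Example 1 DataFrame so that it runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Anna", "John", "Sarah", "Chris"],
    "Age": [23, 45, 32, 19],
    "Gender": ["Female", "Male", "Female", "Male"],
})
df["Gender"] = df["Gender"].astype("category")

# dtypes lists the type of each column; Gender should now be category.
print(df.dtypes)

# The distinct labels stored by the categorical column.
print(list(df["Gender"].cat.categories))

# Integer codes that pandas stores internally for each row.
print(df["Gender"].cat.codes.tolist())
```

A categorical column stores each distinct label once and refers to it by an integer code, which can reduce memory use when a column has few distinct values.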

Example 2: astype() with a Dataset

Let's take an example of a dataset containing temperature and humidity readings for different seasons, recorded by day and by night. Suppose we have the following data in a CSV file named Weather.csv.

Season,Day/Night,Temperature (C),Humidity (%)
Winter,Day,18,55
Winter,Night,14,65
Spring,Day,22,45
Spring,Night,17,63
Summer,Day,28,40
Summer,Night,24,50
Autumn,Day,20,75
Autumn,Night,15,80

First, we load the file into a DataFrame with read_csv().

import pandas as pd

data = pd.read_csv('Weather.csv', delimiter=',')

print(data)

Output:

   Season Day/Night  Temperature (C)  Humidity (%)
0  Winter       Day               18            55
1  Winter     Night               14            65
2  Spring       Day               22            45
3  Spring     Night               17            63
4  Summer       Day               28            40
5  Summer     Night               24            50
6  Autumn       Day               20            75
7  Autumn     Night               15            80

Now, we can use the astype() function to convert the data types of different columns in the dataset.

# Convert the Season column to category
data['Season'] = data['Season'].astype('category')

# Convert the Day/Night column to category
data['Day/Night'] = data['Day/Night'].astype('category')

# Convert the Temperature (C) column to float64
data['Temperature (C)'] = data['Temperature (C)'].astype('float64')

# Convert the Humidity (%) column to float64
data['Humidity (%)'] = data['Humidity (%)'].astype('float64')

print(data.dtypes)

Output:

Season             category
Day/Night          category
Temperature (C)     float64
Humidity (%)        float64
dtype: object

In the output, we can see that the data types of columns have been converted to the required format.
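The same conversions can also be written as a single astype() call with a dictionary that maps column names to types. This is a sketch rather than the only way to do it. To stay self-contained it reads the first rows of the weather data from an in-memory string (the csv_text variable is an assumption made for illustration) instead of from Weather.csv.

```python
import io
import pandas as pd

# Inline copy of the first rows of the weather data, for illustration only.
csv_text = """Season,Day/Night,Temperature (C),Humidity (%)
Winter,Day,18,55
Winter,Night,14,65
Spring,Day,22,45
Spring,Night,17,63
"""

data = pd.read_csv(io.StringIO(csv_text))

# One astype() call with a column-to-dtype mapping.
data = data.astype({
    "Season": "category",
    "Day/Night": "category",
    "Temperature (C)": "float64",
    "Humidity (%)": "float64",
})

print(data.dtypes)
```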

Conclusion

The astype() function converts the data types of pandas Series and DataFrame columns. It helps in data pre-processing, as different models and operations expect different data types.

It works the same way whether the DataFrame is built in code or loaded from a file such as a CSV, which makes it a routine tool in data science and machine learning work. By understanding the syntax and practical examples of the astype() function, we can modify, preprocess, and transform data accurately.

In summary, astype() takes the parameters dtype, copy, and errors, and returns a copy of the data converted to the specified type. The two examples show how to use it on a DataFrame created in code and on a dataset read from a CSV file.
