Data Type Conversion with the pandas astype() Method
Data science and machine learning workflows require data to be stored in the correct data types before it is processed, transformed, and modeled. In Python, a popular tool for this is the pandas astype() method, which converts data from one type to another.
This article explains the astype() method and its syntax, and provides practical examples of converting data types in a DataFrame built in code and in a dataset loaded from a CSV file.
Understanding astype() function
astype() is a method of the pandas library that converts the data type of a Series, of a single DataFrame column, or of an entire DataFrame.
Syntax of astype() function
The general syntax of astype() function is as follows:
DataFrame.astype(dtype, copy=True, errors='raise')
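Here:
- dtype: the data type to convert to. It can be a string such as 'float64', a NumPy dtype, a Python type such as int, or a dictionary that maps column names to data types.
- copy: a boolean that indicates whether astype() always returns a copy of the data. With copy=False, changes to the result may propagate to the original object. Recent pandas releases with Copy-on-Write change how this keyword behaves, so check the documentation for your version.
- errors: controls what happens when a conversion fails. It accepts two values: 'raise' (the default), which raises an exception, and 'ignore', which suppresses the exception and returns the original object. 'coerce' is not a valid value for astype(); it belongs to functions such as pd.to_numeric().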
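The parameters above can be sketched on a small made-up DataFrame. The column names and values here are illustrative assumptions, not data from this article. The sketch passes dtype as a dictionary, shows the default errors='raise' behaviour, and uses pd.to_numeric(), a separate pandas function, as one possible way to turn unparseable values into NaN.

```python
import pandas as pd

# Hypothetical sample data: numbers stored as strings, one value malformed.
df = pd.DataFrame({"qty": ["1", "2", "3"], "price": ["1.5", "2.0", "oops"]})

# dtype can be a dict that maps column names to target types.
converted = df.astype({"qty": "int64"})
print(converted["qty"].dtype)

# With the default errors='raise', an impossible conversion raises ValueError.
try:
    df.astype({"price": "float64"})
    raised = False
except ValueError as exc:
    raised = True
    print("Conversion failed:", exc)

# pd.to_numeric (not astype) accepts errors='coerce' and yields NaN instead.
price = pd.to_numeric(df["price"], errors="coerce")
print(price.isna().sum())
```

Whether to fail loudly or coerce bad values to NaN depends on the dataset; failing loudly is usually the safer default while exploring unfamiliar data.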
astype() returns a copy of the data converted to the specified type. Any scalar value is cast to the corresponding data type.
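As a small illustration of the "returns a copy" behaviour, the following sketch converts a Series of ages (the values are reused from the example data in this article, and the variable names are assumptions) and checks that the original Series keeps its dtype.

```python
import pandas as pd

ages = pd.Series([23, 45, 32, 19], name="Age")

# astype() returns a new Series; the original keeps its dtype.
ages_float = ages.astype("float64")

print(ages.dtype)        # int64
print(ages_float.dtype)  # float64
print(ages_float.tolist())
```

This is why the examples in this article assign the result back to the column rather than calling astype() on its own.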
If the conversion is not possible, an error is raised.
Example 1: astype() with a DataFrame
Let’s consider the following DataFrame containing information about people.
import pandas as pd
data = {'Name': ['Anna', 'John', 'Sarah', 'Chris'],
        'Age': [23, 45, 32, 19],
        'Gender': ['Female', 'Male', 'Female', 'Male']}
df = pd.DataFrame(data, columns=['Name', 'Age', 'Gender'])
print(df)
Output:
    Name  Age  Gender
0   Anna   23  Female
1   John   45    Male
2  Sarah   32  Female
3  Chris   19    Male
We can use the astype() function to change the data type of the ‘Gender’ column from object to category data type.
df['Gender'] = df['Gender'].astype('category')
print(df)
Output:
    Name  Age  Gender
0   Anna   23  Female
1   John   45    Male
2  Sarah   32  Female
3  Chris   19    Male
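The printed table looks the same as before, because print(df) does not display column data types. Checking df['Gender'].dtype or df.dtypes confirms that the ‘Gender’ column’s data type is now category.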
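Because print(df) does not show column data types, one way to confirm the conversion is to inspect the dtype directly. The sketch below rebuilds the same DataFrame and checks the ‘Gender’ column; the use of the .cat accessor is an addition here, shown only to illustrate what a category column stores.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Anna", "John", "Sarah", "Chris"],
    "Age": [23, 45, 32, 19],
    "Gender": ["Female", "Male", "Female", "Male"],
})
df["Gender"] = df["Gender"].astype("category")

# The dtype is not visible in print(df), so inspect it directly.
print(df["Gender"].dtype)                 # category
print(list(df["Gender"].cat.categories))  # ['Female', 'Male']
```

A category column stores each distinct label once and refers to it by an integer code, which can reduce memory use when a column has few distinct values.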
Example 2: astype() with a Dataset
Let's take an example of a dataset containing temperature and humidity readings for each season, by day and by night. Suppose we have the following data in a CSV file named Weather.csv.
Season,Day/Night,Temperature (C),Humidity (%)
Winter,Day,18,55
Winter,Night,14,65
Spring,Day,22,45
Spring,Night,17,63
Summer,Day,28,40
Summer,Night,24,50
Autumn,Day,20,75
Autumn,Night,15,80
First, we load the file into a DataFrame with read_csv() and print it.
import pandas as pd
data = pd.read_csv('Weather.csv', delimiter=',')
print(data)
Output:
   Season Day/Night  Temperature (C)  Humidity (%)
0  Winter       Day               18            55
1  Winter     Night               14            65
2  Spring       Day               22            45
3  Spring     Night               17            63
4  Summer       Day               28            40
5  Summer     Night               24            50
6  Autumn       Day               20            75
7  Autumn     Night               15            80
Now we can use the astype() function to convert the data types of the columns as follows:
- Converting the data type of Season column to a category
- Converting the data type of Day/Night column to a category
- Converting the data type of Temperature (C) column to float64
- Converting the data type of Humidity (%) column to float64
data['Season'] = data['Season'].astype('category')
data['Day/Night'] = data['Day/Night'].astype('category')
data['Temperature (C)'] = data['Temperature (C)'].astype('float64')
data['Humidity (%)'] = data['Humidity (%)'].astype('float64')
print(data.dtypes)
Output:
Season             category
Day/Night          category
Temperature (C)     float64
Humidity (%)        float64
dtype: object
In the output, we can see that the data types of columns have been converted to the required format.
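The four column-by-column conversions can also be written as a single astype() call with a dictionary that maps column names to data types. The sketch below is one way to do that; it reads the first rows of the data from an inline string (an assumption made so the example runs without Weather.csv on disk).

```python
import io
import pandas as pd

# Inline copy of the first rows of Weather.csv so the sketch is self-contained.
csv_text = """Season,Day/Night,Temperature (C),Humidity (%)
Winter,Day,18,55
Winter,Night,14,65
Spring,Day,22,45
Spring,Night,17,63
"""
data = pd.read_csv(io.StringIO(csv_text))

# One astype() call with a column-to-dtype mapping.
data = data.astype({
    "Season": "category",
    "Day/Night": "category",
    "Temperature (C)": "float64",
    "Humidity (%)": "float64",
})
print(data.dtypes)
```

Columns that are not listed in the dictionary keep their existing data types, so the mapping only needs to name the columns being changed.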
Conclusion
The astype() method of the pandas library converts the data type of a Series, a DataFrame column, or a whole DataFrame. It is a common step in data pre-processing for data science and machine learning, because different analyses and models expect different data types.
Its syntax includes the dtype, copy, and errors parameters, and the method returns a copy of the data converted to the specified type. The two examples above show it applied to a DataFrame built in code and to a dataset loaded from a CSV file. Understanding astype() helps you modify, preprocess, and transform data accurately for data analysis.