Adventures in Machine Learning

Mastering Data Type Conversion with Python’s astype() Method


Data science and machine learning workflows need values to be stored in the correct data type before they can be processed, transformed, and modeled. In pandas, a common tool for this is the astype() method, which converts data from one type to another.

This article explains the astype() method and its syntax, then walks through practical examples of converting data types in a DataFrame built in memory and in one loaded from a CSV file.

Understanding the astype() function

astype() is a method of the pandas library that converts the data type of a Series (a single column) or of one or more columns of a DataFrame.

Syntax of the astype() function

The general syntax of the astype() function is as follows:

DataFrame.astype(dtype, copy=True, errors='raise')

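Here:

dtype: the data type to convert to. It can be a string such as 'int64' or 'category', a Python type such as float, a NumPy or pandas dtype, or, for a DataFrame, a dict that maps column names to data types.

copy: a boolean. With copy=True, astype() returns a new object that does not share data with the original.

errors: either 'raise' (the default), which raises an exception when a value cannot be converted, or 'ignore', which suppresses the exception and returns the original object. astype() has no 'coerce' option; that value belongs to functions such as pd.to_numeric().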
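As a rough illustration of the dtype and errors parameters, here is a minimal sketch. The Series contents and the variable names (s, bad, as_int, as_float, conversion_failed) are made up for this example and are not part of the article's data.

```python
import pandas as pd

# Hypothetical sample data: numbers stored as strings
s = pd.Series(['1', '2', '3'])

# dtype given as a string
as_int = s.astype('int64')

# dtype given as a Python type
as_float = s.astype(float)

# With the default errors='raise', a value that cannot be
# converted causes astype() to raise an exception
bad = pd.Series(['1', 'two', '3'])
try:
    bad.astype('int64')
    conversion_failed = False
except ValueError:
    conversion_failed = True

print(as_int.dtype)
print(as_float.dtype)
print(conversion_failed)
```

Catching the exception, as above, is one way to handle a column that may contain values that cannot be converted; whether that fits depends on how the data should be cleaned.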

astype() returns a copy of the data converted to the specified type; the original object is not modified. Each value is cast to the corresponding data type. If a conversion is not possible, an error is raised by default.

Example 1: astype() with a DataFrame

Let’s consider the following DataFrame containing information about people.

import pandas as pd
data = {'Name': ['Anna', 'John', 'Sarah', 'Chris'],
        'Age': [23, 45, 32, 19],
        'Gender': ['Female', 'Male', 'Female', 'Male']}
df = pd.DataFrame(data, columns=['Name', 'Age', 'Gender'])
print(df)

Output:

        Name  Age  Gender
    0   Anna   23  Female
    1   John   45    Male
    2  Sarah   32  Female
    3  Chris   19    Male

We can use the astype() function to change the data type of the ‘Gender’ column from object to category data type.

df['Gender'] = df['Gender'].astype('category')
print(df)

Output:

        Name  Age  Gender
    0   Anna   23  Female
    1   John   45    Male
    2  Sarah   32  Female
    3  Chris   19    Male

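The printed table looks the same as before because the values themselves are unchanged. What has changed is the data type of the ‘Gender’ column, which is now category instead of object; printing df.dtypes shows the new type.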
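To confirm the conversion, the column's dtype can be inspected directly. The following is a small sketch that rebuilds the same DataFrame so it runs on its own; the .cat accessor used here is the standard pandas accessor for categorical columns.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Anna', 'John', 'Sarah', 'Chris'],
                   'Age': [23, 45, 32, 19],
                   'Gender': ['Female', 'Male', 'Female', 'Male']})
df['Gender'] = df['Gender'].astype('category')

# The dtype of the converted column
print(df['Gender'].dtype)

# The distinct categories pandas inferred from the values
print(list(df['Gender'].cat.categories))
```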

Example 2: astype() with a Dataset

Let’s take a dataset containing temperature and humidity readings for each season, recorded by day and by night. Suppose we have the following data in a CSV file named Weather.csv.

Season,Day/Night,Temperature (C),Humidity (%)
Winter,Day,18,55
Winter,Night,14,65
Spring,Day,22,45
Spring,Night,17,63
Summer,Day,28,40
Summer,Night,24,50
Autumn,Day,20,75
Autumn,Night,15,80

First, we load the file into a DataFrame with read_csv().

import pandas as pd
data = pd.read_csv('Weather.csv', delimiter=',')
print(data)

Output:

       Season Day/Night  Temperature (C)  Humidity (%)
    0  Winter       Day               18            55
    1  Winter     Night               14            65
    2  Spring       Day               22            45
    3  Spring     Night               17            63
    4  Summer       Day               28            40
    5  Summer     Night               24            50
    6  Autumn       Day               20            75
    7  Autumn     Night               15            80

Now we can use the astype() function to convert the data types of individual columns in the dataset.

# Convert the Season column to category
data['Season'] = data['Season'].astype('category')
# Convert the Day/Night column to category
data['Day/Night'] = data['Day/Night'].astype('category')
# Convert the Temperature (C) column to float64
data['Temperature (C)'] = data['Temperature (C)'].astype('float64')
# Convert the Humidity (%) column to float64
data['Humidity (%)'] = data['Humidity (%)'].astype('float64')
print(data.dtypes)

Output:

Season             category
Day/Night          category
Temperature (C)     float64
Humidity (%)        float64
dtype: object

In the output, we can see that the data types of columns have been converted to the required format.
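The same conversions can also be written as a single call by passing astype() a dict that maps column names to data types. The sketch below inlines the rows of Weather.csv through io.StringIO so that it runs without the file; the names csv_text and converted are chosen only for this example.

```python
import io
import pandas as pd

# Same rows as Weather.csv, inlined so the sketch is self-contained
csv_text = """Season,Day/Night,Temperature (C),Humidity (%)
Winter,Day,18,55
Winter,Night,14,65
Spring,Day,22,45
Spring,Night,17,63
Summer,Day,28,40
Summer,Night,24,50
Autumn,Day,20,75
Autumn,Night,15,80
"""
data = pd.read_csv(io.StringIO(csv_text))

# One astype() call converts all four columns;
# it returns a new DataFrame and leaves `data` unchanged
converted = data.astype({
    'Season': 'category',
    'Day/Night': 'category',
    'Temperature (C)': 'float64',
    'Humidity (%)': 'float64',
})
print(converted.dtypes)
```

Compared with assigning each column separately, the dict form keeps all the target types in one place, which can be easier to review when many columns are involved.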

Conclusion

The astype() function is a method of the pandas library that converts the data type of a Series or of DataFrame columns. It is a routine part of data pre-processing in data science and machine learning, since different models expect different data types.

It works the same way whether the DataFrame is built in memory or loaded from a file. Its syntax includes dtype, copy, and errors, and it returns a copy of the data converted to the specified type. The two practical examples above show how to use it to convert columns to category and float64, so that data can be modified, preprocessed, and transformed accurately for analysis.
