Adventures in Machine Learning

Mastering Data Types in Pandas: A Beginner’s Guide

Get to Know Your Pandas: Checking Data Type in Pandas DataFrame

If you’re working with data, chances are Pandas is on your side. The Pandas library is one of the most powerful tools in Python, particularly for data analysis and manipulation.

Among its many functions, Pandas allows you to create dataframes, which are two-dimensional tables used for storing and manipulating data.

When working with dataframes, it’s crucial to understand the data you’re dealing with.

One of the first things you need to know is the data type of each column in your dataframe. The data type determines how the data is stored, manipulated, and used.

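In this article, we will look at two ways of checking data types in a Pandas dataframe: the .dtypes attribute for the whole dataframe and the .dtype attribute for a single column. We will then see how to convert a column to a different type.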

Gathering Data for a Pandas DataFrame

Before we can create a dataframe and check its data types, we need to gather some data. In this example, we will be using a dataset of basketball players’ performances in the NBA playoffs.

import pandas as pd
data = {'Player Name': ['LeBron James','Stephen Curry','Kevin Durant','James Harden'],
        'Points per Game': [26.4,31.1,28.0,28.6],
        'Assists per Game': [10.9,5.4,4.0,8.0],
        'Rebounds per Game': [8.3,5.4,7.4,5.2],
        'Field Goal Percentage': [55.2,49.5,49.7,44.9],
        'Blocks per Game': [1.0,0.3,1.3,0.7],
        'Steals per Game': [1.5,1.6,1.3,1.5]}
df = pd.DataFrame(data)

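With pd.DataFrame(), we create a dataframe from the dictionary: each key becomes a column name and each list supplies that column's values. Because we did not specify an index, the rows get the default integer index (0 to 3), and Player Name is an ordinary column alongside the stats.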
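To see how the rows are labeled, the short sketch below inspects df.index. It uses a trimmed two-column copy of the data so that it runs on its own, and the variable name indexed is only illustrative. It also shows set_index(), which is one way to use the player names as row labels if that is what you want.

```python
import pandas as pd

# Trimmed copy of the article's data so the example is self-contained.
data = {'Player Name': ['LeBron James', 'Stephen Curry', 'Kevin Durant', 'James Harden'],
        'Points per Game': [26.4, 31.1, 28.0, 28.6]}
df = pd.DataFrame(data)

# Without an explicit index, pandas labels the rows 0, 1, 2, ...
print(list(df.index))         # [0, 1, 2, 3]

# set_index() returns a new dataframe that uses a column as the row labels.
indexed = df.set_index('Player Name')
print(list(indexed.index))
print(list(indexed.columns))  # ['Points per Game']
```

Note that set_index() does not modify df unless you assign the result back.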

Checking Data Types in a Pandas DataFrame

Once we have our dataframe, we can check its data types using the .dtypes attribute.

print(df.dtypes)

This will output the data types of each column in the dataframe:

Player Name               object
Points per Game          float64
Assists per Game         float64
Rebounds per Game        float64
Field Goal Percentage    float64
Blocks per Game          float64
Steals per Game          float64
dtype: object

From this output, we can see that the Player Name column has the object data type, which Pandas has traditionally used for text (depending on your Pandas version, a dedicated string type may be reported instead). The rest of the columns are float64, which is used for decimal numbers. We can also check the data type of a single column using the .dtype attribute.

print(df['Points per Game'].dtype)

As a result, we’ll get the data type of the Points per Game column:

float64
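When a script needs to act on the data type rather than just print it, a yes/no check can be more convenient than reading the output by eye. The sketch below is one possible approach, using is_numeric_dtype() from pandas.api.types and the select_dtypes() method on a small two-row copy of the data. The variable name numeric_cols is an assumption made for illustration.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Small copy of the article's data so the example is self-contained.
df = pd.DataFrame({'Player Name': ['LeBron James', 'Stephen Curry'],
                   'Points per Game': [26.4, 31.1]})

# is_numeric_dtype() answers a yes/no question about a column's dtype.
print(is_numeric_dtype(df['Points per Game']))  # True
print(is_numeric_dtype(df['Player Name']))      # False

# select_dtypes() keeps only the columns that match the requested kind.
numeric_cols = df.select_dtypes(include='number').columns.tolist()
print(numeric_cols)  # ['Points per Game']
```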

Converting Data Types in a Pandas DataFrame

There might be occasions where you will need to convert one or more columns to a different data type. Here are some cases where you may need to convert data types:

  • The data type is incorrect
  • The data type is not compatible with another function
  • The data needs to be cleaned or manipulated in some way

The astype() method can be used to convert data types in Pandas. It returns a converted copy, so the result must be assigned back to the column to take effect.
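As a sketch of the first case in the list above, suppose a numeric column arrived as text, for example from a file export. The dataframe raw and its values below are hypothetical and only mirror the article's data.

```python
import pandas as pd

# Hypothetical column where the numbers were loaded as text.
raw = pd.DataFrame({'Points per Game': ['26.4', '31.1', '28.0', '28.6']})

# Convert the text values to floats so numeric operations behave as expected.
converted = raw['Points per Game'].astype(float)
print(converted.dtype)   # float64
print(converted.mean())  # average of the four values
```

If any value cannot be parsed as a number, astype(float) raises a ValueError. In that situation pd.to_numeric() with errors='coerce' is an alternative that turns unparseable values into NaN.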

In our dataframe, the Steals per Game column is a float. Let’s convert it to an integer data type using astype().

df['Steals per Game'] = df['Steals per Game'].astype(int)
print(df.dtypes)

Output:

Player Name               object
Points per Game          float64
Assists per Game         float64
Rebounds per Game        float64
Field Goal Percentage    float64
Blocks per Game          float64
Steals per Game            int64
dtype: object

We have successfully converted the Steals per Game column from float to int. Note that astype(int) truncates the fractional part rather than rounding, so 1.5, 1.6, and 1.3 all become 1.
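Because the conversion discards everything after the decimal point, it can be worth comparing it with rounding first. The following is a minimal sketch on a standalone Series holding the same steals values. The names truncated and rounded are illustrative.

```python
import pandas as pd

steals = pd.Series([1.5, 1.6, 1.3, 1.5], name='Steals per Game')

# astype(int) drops the fractional part (truncates toward zero).
truncated = steals.astype(int)
print(truncated.tolist())  # [1, 1, 1, 1]

# Rounding first keeps the integers closer to the original values.
# round() uses round-half-to-even, so 1.5 becomes 2 (and 2.5 would also become 2).
rounded = steals.round().astype(int)
print(rounded.tolist())    # [2, 2, 1, 2]
```

Which behavior is appropriate depends on what the integers will be used for.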

Another example is converting the Field Goal Percentage column from a float into an integer. The values are already expressed as percentages (for example, 55.2), so no scaling is needed. We can apply astype() directly, which drops the decimal part.

df['Field Goal Percentage'] = df['Field Goal Percentage'].astype(int)

print(df)

Output (the columns may wrap differently depending on your display width):

     Player Name  Points per Game  Assists per Game  Rebounds per Game  Field Goal Percentage  Blocks per Game  Steals per Game
0   LeBron James             26.4              10.9                8.3                     55              1.0                1
1  Stephen Curry             31.1               5.4                5.4                     49              0.3                1
2   Kevin Durant             28.0               4.0                7.4                     49              1.3                1
3   James Harden             28.6               8.0                5.2                     44              0.7                1

The Field Goal Percentage column now holds integers.
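When several columns need converting, astype() also accepts a dictionary that maps column names to target types. The sketch below applies it to a fresh three-column copy of the data, so the starting values are the original floats.

```python
import pandas as pd

# Fresh copy of three of the article's columns, still holding floats.
df = pd.DataFrame({'Field Goal Percentage': [55.2, 49.5, 49.7, 44.9],
                   'Steals per Game': [1.5, 1.6, 1.3, 1.5],
                   'Blocks per Game': [1.0, 0.3, 1.3, 0.7]})

# Map each column name to its target dtype; columns not listed are left unchanged.
df = df.astype({'Field Goal Percentage': 'int64', 'Steals per Game': 'int64'})
print(df.dtypes)
```

Here Blocks per Game stays float64 because it is not named in the dictionary.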

Conclusion

Checking data types is an essential first step in understanding and manipulating the data in a Pandas dataframe, because the data type determines how values are stored and which operations apply to them. In this article, we looked at two ways to check data types, the .dtypes attribute for a whole dataframe and the .dtype attribute for a single column, and at how to convert a column with astype() when needed.

Being comfortable with data types helps you handle whatever data comes your way, which is an important skill for deriving insights and making accurate predictions in data analysis.
