Get to Know Your Pandas: Checking Data Type in Pandas DataFrame
If you’re working with data, chances are Pandas is on your side. The Pandas library is one of the most powerful tools in Python, particularly for data analysis and manipulation.
Among its many functions, Pandas allows you to create dataframes, which are two-dimensional tables used for storing and manipulating data.
When working with dataframes, it’s crucial to understand the data you’re dealing with.
One of the first things you need to know is the data type of each column in your dataframe. The data type determines how the data is stored, manipulated, and used.
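In this article, we will look at two ways of checking data types in a Pandas dataframe, the .dtypes attribute for the whole dataframe and the .dtype attribute for a single column, and then at how to convert a column to a different data type.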
Gathering Data for a Pandas DataFrame
Before we can create a dataframe and check its data types, we need to gather some data. In this example, we will be using a dataset of basketball players’ performances in the NBA playoffs.
import pandas as pd
data = {'Player Name': ['LeBron James','Stephen Curry','Kevin Durant','James Harden'],
'Points per Game': [26.4,31.1,28.0,28.6],
'Assists per Game': [10.9,5.4,4.0,8.0],
'Rebounds per Game': [8.3,5.4,7.4,5.2],
'Field Goal Percentage': [55.2,49.5,49.7,44.9],
'Blocks per Game': [1.0,0.3,1.3,0.7],
'Steals per Game': [1.5,1.6,1.3,1.5]}
df = pd.DataFrame(data)
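With the pd.DataFrame() function, we create a dataframe from the dictionary: each key becomes a column, and because no index is specified, the rows get a default integer index (0 to 3). The player names are stored in an ordinary column, not in the index.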
Checking Data Types in a Pandas DataFrame
Once we have our dataframe, we can check its data types using the .dtypes attribute.
print(df.dtypes)
This will output the data type of each column in the dataframe:
Player Name object
Points per Game float64
Assists per Game float64
Rebounds per Game float64
Field Goal Percentage float64
Blocks per Game float64
Steals per Game float64
dtype: object
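From this output, we can see that the Player Name column has the object data type, which Pandas has traditionally used for strings (recent versions may report a dedicated string data type instead), and the rest of the columns are float64, which is used for decimal numbers. The final line, dtype: object, describes the Series that .dtypes returns, not a column of the dataframe. We can also check the data type of a single column using the .dtype attribute.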
print(df['Points per Game'].dtype)
As a result, we’ll get the data type of the Points per Game column:
float64
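Data types can also be tested in code instead of being read off a printout. The following is a minimal sketch on a trimmed copy of the data above; the names sample, points_is_float, and float_columns are illustrative and not part of the original example. It uses pd.api.types.is_float_dtype to test one column and DataFrame.select_dtypes to list all float64 columns.

```python
import pandas as pd

# Trimmed copy of the article's data (illustrative only)
sample = pd.DataFrame({
    'Player Name': ['LeBron James', 'Stephen Curry'],
    'Points per Game': [26.4, 31.1],
    'Steals per Game': [1.5, 1.6],
})

# True when the column holds a floating-point dtype such as float64
points_is_float = pd.api.types.is_float_dtype(sample['Points per Game'])

# Names of the columns whose dtype is float64
float_columns = sample.select_dtypes(include='float64').columns.tolist()

print(points_is_float)
print(float_columns)
```

Checks like these are handy as a guard before running numeric operations on a column.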
Converting Data Types in a Pandas DataFrame
There might be occasions where you will need to convert one or more columns to a different data type. Here are some cases where you may need to convert data types:
- The data type is incorrect
- The data type is not compatible with another function
- The data needs to be cleaned or manipulated in some way
The astype() method can be used to convert data types in Pandas.
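As an illustration of the first case in the list (an incorrect data type), here is a small sketch in which numbers were loaded as text. The dataframe raw and its values are made up for this example; only astype() itself comes from the article.

```python
import pandas as pd

# Hypothetical input: numeric values that arrived as strings
raw = pd.DataFrame({'Points per Game': ['26.4', '31.1', '28.0']})

# Convert the text values to floating-point numbers
raw['Points per Game'] = raw['Points per Game'].astype(float)

after = raw['Points per Game'].dtype
total = raw['Points per Game'].sum()  # arithmetic now works as expected

print(after)
print(total)
```

Note that astype(float) raises an error if any value cannot be parsed as a number, so messy input may need cleaning first.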
In the previous example, the Steals per Game column is a float. Let’s convert it into an integer data type using astype().
df['Steals per Game'] = df['Steals per Game'].astype(int)
print(df.dtypes)
Output:
Player Name object
Points per Game float64
Assists per Game float64
Rebounds per Game float64
Field Goal Percentage float64
Blocks per Game float64
Steals per Game int64
dtype: object
We have successfully converted the Steals per Game column from float to int. Note that astype(int) truncates rather than rounds: the fractional part is discarded, so values such as 1.5 and 1.6 both become 1.
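To see how astype(int) treats the fractional part, the sketch below compares a direct conversion with rounding first. The Series steals reuses the article's values; the names truncated and rounded are illustrative.

```python
import pandas as pd

steals = pd.Series([1.5, 1.6, 1.3, 1.5], name='Steals per Game')

# Direct conversion drops everything after the decimal point
truncated = steals.astype(int)

# Rounding to the nearest whole number first, then converting
rounded = steals.round().astype(int)

print(truncated.tolist())
print(rounded.tolist())
```

Which of the two is appropriate depends on the data; for a per-game average such as this one, keeping the float is often the better choice.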
Another example is converting the Field Goal Percentage column from a float (a percentage with a fractional part) into an integer (a whole-number percentage). The values are already expressed as percentages (55.2 means 55.2%), so no scaling is needed; we simply apply astype(), which again truncates the fractional part.
df['Field Goal Percentage'] = df['Field Goal Percentage'].astype(int)
print(df)
Output:
Index | Player Name | Points per Game | Assists per Game | Rebounds per Game | Field Goal Percentage | Blocks per Game | Steals per Game |
---|---|---|---|---|---|---|---|
0 | LeBron James | 26.4 | 10.9 | 8.3 | 55 | 1.0 | 1 |
1 | Stephen Curry | 31.1 | 5.4 | 5.4 | 49 | 0.3 | 1 |
2 | Kevin Durant | 28.0 | 4.0 | 7.4 | 49 | 1.3 | 1 |
3 | James Harden | 28.6 | 8.0 | 5.2 | 44 | 0.7 | 1 |
With that, the Field Goal Percentage column now holds integers.
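When several columns need converting, astype() also accepts a dictionary that maps column names to target types, so the conversions can be done in one call. The sketch below assumes a small stand-in dataframe named stats; the choice of float32 for the second column is arbitrary and only shows that each column can get its own type.

```python
import pandas as pd

# Stand-in data with two of the article's columns (illustrative only)
stats = pd.DataFrame({
    'Field Goal Percentage': [55.2, 49.5],
    'Steals per Game': [1.5, 1.6],
})

# Map each column name to the type it should be converted to
converted = stats.astype({
    'Field Goal Percentage': int,
    'Steals per Game': 'float32',
})

print(converted.dtypes)
```

Columns not named in the dictionary keep their existing data type, and astype() returns a new dataframe instead of modifying stats in place.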
Conclusion
In conclusion, checking data types in a Pandas dataframe is essential for understanding and manipulating your data, because the data type determines how the data is stored, manipulated, and used. In this article, we looked at two ways of checking data types, the .dtypes attribute for a whole dataframe and the .dtype attribute for a single column, and at how to convert them with the astype() method when needed.
Being able to work with different data types is an important skill in data analysis. Knowing how to handle data types in Pandas is a foundation for deriving insights and making accurate predictions from your data.