Adventures in Machine Learning

Mastering Data Types in Pandas: A Beginner’s Guide

Get to Know Your Pandas: Checking Data Type in Pandas DataFrame

If you're working with data, chances are Pandas is on your side. The Pandas library is one of the most powerful tools in Python, particularly for data analysis and manipulation.

Among its many functions, Pandas allows you to create dataframes, which are two-dimensional tables used for storing and manipulating data.

When working with dataframes, it's crucial to understand the data you're dealing with.

One of the first things you need to know is the data type of each column in your dataframe. The data type determines how the data is stored, manipulated, and used.

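In this article, we will look at two ways to check data types in a Pandas dataframe, the `.dtypes` attribute of a dataframe and the `.dtype` attribute of a single column, and then at how to convert a column to a different type.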

Gathering Data for a Pandas DataFrame

Before we can create a dataframe and check its data types, we need to gather some data. In this example, we will be using a dataset of basketball players’ performances in the NBA playoffs.

```python
import pandas as pd

data = {'Player Name': ['LeBron James', 'Stephen Curry', 'Kevin Durant', 'James Harden'],
        'Points per Game': [26.4, 31.1, 28.0, 28.6],
        'Assists per Game': [10.9, 5.4, 4.0, 8.0],
        'Rebounds per Game': [8.3, 5.4, 7.4, 5.2],
        'Field Goal Percentage': [55.2, 49.5, 49.7, 44.9],
        'Blocks per Game': [1.0, 0.3, 1.3, 0.7],
        'Steals per Game': [1.5, 1.6, 1.3, 1.5]}

df = pd.DataFrame(data)
```

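With the `pd.DataFrame()` constructor, we create a dataframe from the dictionary: each key becomes a column, and because no index is specified, the rows get the default integer index (0 to 3). `Player Name` is an ordinary column, not the index.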
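
To see how the rows are labeled, we can inspect the index directly. The sketch below uses a shortened, illustrative copy of the data (the names `small` and `by_name` are assumptions made for this example, not part of the article's code). It shows that the default index is a range of integers, and that `set_index()` can promote the player names to row labels if that layout is preferred.

```python
import pandas as pd

# Shortened, illustrative copy of the playoff data
small = pd.DataFrame({
    'Player Name': ['LeBron James', 'Stephen Curry'],
    'Points per Game': [26.4, 31.1],
})

# No index was passed, so pandas assigns a default integer index
print(small.index)
print(list(small.columns))

# set_index() returns a new dataframe that uses the names as row labels
by_name = small.set_index('Player Name')
print(by_name.loc['Stephen Curry', 'Points per Game'])
```

After `set_index()`, `Player Name` no longer appears among the columns, so it will not show up in the `.dtypes` listing either.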

Checking Data Types in a Pandas DataFrame

Once we have our dataframe, we can check its data types using the `.dtypes` attribute.

```python
print(df.dtypes)
```

This will output the data types of each column in the dataframe:

```
Player Name               object
Points per Game          float64
Assists per Game         float64
Rebounds per Game        float64
Field Goal Percentage    float64
Blocks per Game          float64
Steals per Game          float64
dtype: object
```

From this output, we can see that the `Player Name` column has the `object` data type, which Pandas has traditionally used for strings (newer versions may report a dedicated string type instead), and the rest of the columns are `float64`, a 64-bit floating-point type used for decimal numbers. We can also check the data type of a single column using the `.dtype` attribute.

```python
print(df['Points per Game'].dtype)
```

As a result, we’ll get the data type of the `Points per Game` column:

```
float64
```
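
Beyond printing the types, it can be handy to act on them in code. The following is a minimal sketch, assuming a small illustrative dataframe named `stats` (a stand-in for the article's `df`): `select_dtypes()` picks columns by type, and `pd.api.types.is_numeric_dtype()` tests a single column.

```python
import pandas as pd

# Illustrative two-row subset of the playoff data
stats = pd.DataFrame({
    'Player Name': ['LeBron James', 'Stephen Curry'],
    'Points per Game': [26.4, 31.1],
    'Assists per Game': [10.9, 5.4],
})

# Keep only the numeric columns and list their names
numeric_cols = stats.select_dtypes(include='number').columns.tolist()
print(numeric_cols)

# Test individual columns
print(pd.api.types.is_numeric_dtype(stats['Points per Game']))
print(pd.api.types.is_numeric_dtype(stats['Player Name']))
```

Checks like these are useful before running arithmetic or aggregations that only make sense on numeric columns.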

Converting Data Types in a Pandas DataFrame

There might be occasions where you will need to convert one or more columns to a different data type. Here are some cases where you may need to convert data types:

– The data type is incorrect

– The data type is not compatible with another function

– The data needs to be cleaned or manipulated in some way
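
The first case in the list, an incorrect data type, often shows up as numbers that were read in as text. Here is a hedged sketch using a hypothetical `raw` dataframe (the `'n/a'` entry is made up for illustration): `pd.to_numeric()` with `errors='coerce'` converts what it can and turns unparseable entries into `NaN`.

```python
import pandas as pd

# Hypothetical raw input: numbers that arrived as text, one entry unparseable
raw = pd.DataFrame({'Points per Game': ['26.4', '31.1', 'n/a']})

# errors='coerce' replaces values that cannot be parsed with NaN
points = pd.to_numeric(raw['Points per Game'], errors='coerce')

print(points.dtype)
print(points.isna().sum())
```

A direct `astype(float)` would raise an error on the `'n/a'` entry, which is why `pd.to_numeric()` is the gentler choice for messy input.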

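The `astype()` method converts a column (or a whole dataframe) to a given data type.

In our dataframe, the `Steals per Game` column is a float. Let’s convert it into an integer data type using `astype()`. Note that converting floats to integers truncates the fractional part instead of rounding, so 1.5 becomes 1.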

```python
df['Steals per Game'] = df['Steals per Game'].astype(int)
print(df.dtypes)
```

Output:

```
Player Name               object
Points per Game          float64
Assists per Game         float64
Rebounds per Game        float64
Field Goal Percentage    float64
Blocks per Game          float64
Steals per Game            int64
dtype: object
```

We have successfully converted the `Steals per Game` column from `float64` to `int64`. Because the conversion truncates, every value in the column is now 1.
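
If truncation is not what you want, one option is to round before converting. The sketch below works on a standalone copy of the steals values (the names `steals`, `truncated`, and `rounded` are illustrative) and compares the two approaches.

```python
import pandas as pd

steals = pd.Series([1.5, 1.6, 1.3, 1.5], name='Steals per Game')

truncated = steals.astype(int)          # drops the fractional part
rounded = steals.round().astype(int)    # rounds to the nearest integer first

print(truncated.tolist())  # [1, 1, 1, 1]
print(rounded.tolist())    # [2, 2, 1, 2]
```

Which behavior is right depends on the data; for per-game averages, keeping the float column is often the safer choice.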

Another example is the `Field Goal Percentage` column, which stores percentages with one decimal place (for example, 55.2). Since the values are already percentages, there is no need to rescale them; we can drop the fractional part by calling `astype()` directly.

```python
df['Field Goal Percentage'] = df['Field Goal Percentage'].astype(int)
print(df)
```

Output:

| | Player Name | Points per Game | Assists per Game | Rebounds per Game | Field Goal Percentage | Blocks per Game | Steals per Game |
|---|---|---|---|---|---|---|---|
| 0 | LeBron James | 26.4 | 10.9 | 8.3 | 55 | 1.0 | 1 |
| 1 | Stephen Curry | 31.1 | 5.4 | 5.4 | 49 | 0.3 | 1 |
| 2 | Kevin Durant | 28.0 | 4.0 | 7.4 | 49 | 1.3 | 1 |
| 3 | James Harden | 28.6 | 8.0 | 5.2 | 44 | 0.7 | 1 |

The `Field Goal Percentage` column now holds integers, with the fractional part of each percentage truncated (55.2 became 55).
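
When several columns need converting, `astype()` also accepts a dictionary that maps column names to target types. The following is a small sketch on an illustrative dataframe named `stats`; the choice of `float32` here is only an example of a different target type, not a recommendation.

```python
import pandas as pd

# Illustrative two-row subset of the playoff data
stats = pd.DataFrame({
    'Points per Game': [26.4, 31.1],
    'Blocks per Game': [1.0, 0.3],
    'Steals per Game': [1.5, 1.6],
})

# Map column name -> target dtype; columns not listed keep their type
converted = stats.astype({'Points per Game': 'float32',
                          'Steals per Game': 'int64'})

print(converted.dtypes)
```

`astype()` returns a new dataframe and leaves `stats` unchanged, so the original types remain available for comparison.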

Conclusion

In conclusion, checking data types in a Pandas dataframe is essential to understanding your data and manipulating it correctly, because the data type determines how each column is stored and which operations apply to it. In this article, we looked at two ways to check data types, `.dtypes` for a whole dataframe and `.dtype` for a single column, and at how to convert a column with `astype()`.

Being able to work with different data types means you can handle more of the data that comes your way, which is an important skill in data analysis.
