Adventures in Machine Learning

Counting Unique Values in Pandas DataFrame: A Simple Guide

Are you new to Pandas and struggling to count unique values in your DataFrame? Don’t worry, you’re not alone! Counting unique values is a common task in data analysis and can be accomplished easily with Pandas.

In this article, we’ll cover how to count unique values by group in a Pandas DataFrame. Additionally, we’ll explore some examples where we group by one or multiple columns and count unique values.

Let’s dive in!

Counting Unique Values by Group in a Pandas DataFrame

Counting unique values in a DataFrame can be helpful in analyzing data. For instance, if you have a table of employees with their team and age, you may want to know how many distinct ages are represented on each team.

You can accomplish this with Pandas by grouping the DataFrame by the relevant column and counting unique values. To count unique values by group in a Pandas DataFrame, you can use the .nunique() method.

When applied to a single column, this returns a Series with the count of unique values for each group. Here’s an example:

import pandas as pd
data = {'team': ['Team A', 'Team A', 'Team B', 'Team B', 'Team B', 'Team C'],
        'age': [25, 28, 35, 35, 32, 26]}
df = pd.DataFrame(data)
print(df.groupby('team')['age'].nunique())

In this example, we have a DataFrame with two columns, ‘team’ and ‘age’. We then group the DataFrame by ‘team’ and count the number of unique ‘age’ values for each team.

The output will be:

team
Team A    2
Team B    2
Team C    1
Name: age, dtype: int64

This tells us that Team A has two unique ages, Team B has two unique ages (35 appears twice but is counted once), and Team C has one unique age.
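To make the difference between counting rows and counting unique values concrete, here is a minimal sketch that computes both for the same data. The column labels 'count' and 'nunique' come from the aggregation names passed to .agg(); the variable name summary is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    'team': ['Team A', 'Team A', 'Team B', 'Team B', 'Team B', 'Team C'],
    'age': [25, 28, 35, 35, 32, 26],
})

# 'count' tallies non-null rows per group; 'nunique' tallies distinct values
summary = df.groupby('team')['age'].agg(['count', 'nunique'])
print(summary)

# Team B has three rows but only two distinct ages (35 is repeated)
print(summary.loc['Team B', 'count'], summary.loc['Team B', 'nunique'])
```

By default .nunique() ignores missing values; passing dropna=False would count NaN as one more distinct value.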

Grouping By One Column & Counting Unique Values

Let’s take a look at another example where we group by one column and count unique values.

In this example, let’s assume we have a DataFrame that contains information about a group of people and their favorite colors. We want to know how many people chose each color as their favorite.

Here’s an example:

import pandas as pd
data = {'name': ['John', 'Mary', 'Mike', 'Alex', 'Sarah', 'Lena'],
        'favorite_color': ['blue', 'green', 'red', 'blue', 'yellow', 'green']}
df = pd.DataFrame(data)
print(df.groupby('favorite_color')['name'].nunique())

In this code example, we group the DataFrame by the ‘favorite_color’ column and count the number of unique ‘name’ values for each color. The output will be:

favorite_color
blue      2
green     2
red       1
yellow    1
Name: name, dtype: int64

This tells us that two people chose blue, two chose green, one chose red, and one chose yellow.
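Counting unique names equals counting people only when every name is distinct, as it is in the data above. The sketch below uses a small hypothetical dataset (not from the example above) in which two different people share a name, and compares .nunique() with .size(), which counts rows per group.

```python
import pandas as pd

# Hypothetical data: two different people happen to share the name 'Alex'
df = pd.DataFrame({
    'name': ['John', 'Alex', 'Alex', 'Mary'],
    'favorite_color': ['blue', 'blue', 'blue', 'green'],
})

# Distinct names per color: the two 'Alex' rows collapse into one
unique_names = df.groupby('favorite_color')['name'].nunique()

# Rows per color: every person is counted, regardless of name
row_counts = df.groupby('favorite_color').size()

print(unique_names)
print(row_counts)
```

If the goal is a head count, .size() is probably the safer choice; .nunique() is the right tool when repeated values should be counted once.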

Grouping By Multiple Columns & Counting Unique Values

Now, let’s take a look at grouping by multiple columns and counting unique values.

In this example, let’s assume we have a DataFrame that contains information about a group of people: their age, their position in a company, and their team. We want to know how many distinct teams each combination of age and position appears on.

Here’s an example:

import pandas as pd
data = {'age': [25, 28, 35, 35, 32, 26],
        'position': ['Manager', 'Assistant', 'Manager', 'Assistant', 'CFO', 'CFO'],
        'team': ['Team A', 'Team A', 'Team B', 'Team B', 'Team B', 'Team C']}
df = pd.DataFrame(data)
print(df.groupby(['age', 'position'])['team'].nunique())

In this code example, we group the DataFrame by both ‘age’ and ‘position’ and count the number of unique ‘team’ values for each group. The output will be:

age  position  
25   Manager       1
26   CFO           1
28   Assistant     1
32   CFO           1
35   Assistant     1
     Manager       1
Name: team, dtype: int64

This tells us that every combination of age and position appears on exactly one team. For example, the 35-year-old assistant and the 35-year-old manager form separate groups, and each group has one unique team. Note that .nunique() counts distinct ‘team’ values here, not the number of people in each group.
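The grouped result has a two-level index, which can be awkward to filter or export. As a rough sketch, .reset_index() turns it into a flat DataFrame; the column name unique_teams is an illustrative choice, not something from the example above. Grouping by 'position' alone is also shown, since it gives a less uniform result on this data.

```python
import pandas as pd

df = pd.DataFrame({
    'age': [25, 28, 35, 35, 32, 26],
    'position': ['Manager', 'Assistant', 'Manager', 'Assistant', 'CFO', 'CFO'],
    'team': ['Team A', 'Team A', 'Team B', 'Team B', 'Team B', 'Team C'],
})

# Flatten the (age, position) index into ordinary columns
flat = (
    df.groupby(['age', 'position'])['team']
      .nunique()
      .reset_index(name='unique_teams')
)
print(flat)

# Grouping by position alone: each position spans two distinct teams
by_position = df.groupby('position')['team'].nunique()
print(by_position)
```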

Conclusion

In this article, we covered how to count unique values by group in a Pandas DataFrame. We explored examples of grouping by one or multiple columns and counting unique values.

By following the examples, you can apply these techniques to your own data analysis projects. Happy coding!

To summarize: by grouping on one or more columns and calling the .nunique() method, we can count the number of unique values in each group. The three examples above used different combinations of columns to show how versatile this method is.

For those new to Pandas, this article serves as a guide to counting unique values by group, and for those already familiar, it may provide a quick refresher or show alternate ways of performing the same task. Ultimately, understanding how to count unique values is essential for data analysis, and with Pandas, it can be accomplished quickly and efficiently.