Adventures in Machine Learning

Efficiently Clean up Your Data: Finding Unique Values in Pandas DataFrame

Finding Unique Values in Pandas DataFrame

When working with large datasets, you may come across a situation where you need to find unique values within a Pandas DataFrame. This can be especially helpful when analyzing data or cleaning up a dataset.

In this article, we’ll explore two methods for finding unique values in a Pandas DataFrame.

Method 1: Using the pandas unique() function

The first method that we’ll explore is using the pandas unique() function.

This function returns an array of unique values within a specified column of the DataFrame. Here’s an example:

```python
import pandas as pd

data = {'Name': ['John', 'Mary', 'Peter', 'John'],
        'Age': [25, 30, 40, 25]}
df = pd.DataFrame(data)

# unique() is called on a single column (a Series)
unique_names = df['Name'].unique()
print(unique_names)
```

Output:

```
['John' 'Mary' 'Peter']
```

As you can see, unique() returns an array of the unique values in the ‘Name’ column, in order of first appearance. Note that unique() works on one column at a time and does not accept a list of column names. To find unique values across multiple columns, call it on each column separately or flatten the columns first, as in Method 2.
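For several columns, one option is to call unique() on each column in turn. The following is a minimal sketch using the same sample data; the name unique_per_column is only illustrative and not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Mary", "Peter", "John"],
    "Age": [25, 30, 40, 25],
})

# Series.unique() handles one column, so loop over the columns of interest
# and collect each column's unique values in a plain dict.
unique_per_column = {col: df[col].unique().tolist() for col in ["Name", "Age"]}
print(unique_per_column)
```

Each entry keeps the values in the order they first appear in that column.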

Method 2: Using the ravel() function

The second method uses NumPy’s ravel() method. ravel() is not called on the DataFrame itself but on its underlying array (df.values), which it flattens into a one-dimensional array. The unique values of that flattened array can then be extracted with the pandas unique() function.

Here’s an example:

```python
import pandas as pd

data = {'Name': ['John', 'Mary', 'Peter', 'John'],
        'Age': [25, 30, 40, 25]}
df = pd.DataFrame(data)

# Flatten all values into one dimension, then take the unique ones
unique_values = pd.Series(df.values.ravel()).unique()
print(unique_values)
```

Output:

```
['John' 25 'Mary' 30 'Peter' 40]
```

As you can see, ravel() flattens the DataFrame’s values row by row, and unique() then returns the distinct values across all columns. Because the columns hold different types, the result is an object array that mixes names and ages.
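The order of the result depends on how the values are flattened. The sketch below, which assumes the same sample data, compares the default row-major order with NumPy’s column-major order (order="F"); the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Mary", "Peter", "John"],
    "Age": [25, 30, 40, 25],
})

values = df.to_numpy()  # 2-D object array, one row per DataFrame row

# Default (row-major) flattening walks the frame row by row.
row_major = pd.unique(values.ravel()).tolist()

# order="F" (column-major) walks it column by column instead.
column_major = pd.unique(values.ravel(order="F")).tolist()

print(row_major)
print(column_major)
```

The row-major result interleaves names and ages, while the column-major result lists all the names before the ages. The set of unique values is the same either way.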

Return Array of Unique Values

If you just want to return an array of unique values, you can use either the pandas unique() function or the ravel() function in combination with the unique() function. Here’s an example:

```python
import pandas as pd

data = {'Name': ['John', 'Mary', 'Peter', 'John'],
        'Age': [25, 30, 40, 25]}
df = pd.DataFrame(data)

unique_names = df['Name'].unique()
print(unique_names)
```

Output:

```
['John' 'Mary' 'Peter']
```

Return DataFrame of Unique Values

If you want to return a DataFrame of unique values, you can use the pandas drop_duplicates() function. Here’s an example:

```python
import pandas as pd

data = {'Name': ['John', 'Mary', 'Peter', 'John'],
        'Age': [25, 30, 40, 25]}
df = pd.DataFrame(data)

# Drop rows that repeat an earlier row in every column
unique_df = df.drop_duplicates()
print(unique_df)
```

Output:

```
    Name  Age
0   John   25
1   Mary   30
2  Peter   40
```

As you can see, drop_duplicates() returns a DataFrame with the duplicate rows removed. The fourth row (John, 25) repeats the first row, so it is dropped.
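By default drop_duplicates() compares entire rows. Its subset and keep parameters let you judge uniqueness by selected columns only. In the sketch below, the last age is changed to 26 (an assumption made purely for illustration) so that the two “John” rows differ.

```python
import pandas as pd

# Sample data with the last age changed to 26 (hypothetical) so that
# the two "John" rows are no longer identical.
df = pd.DataFrame({
    "Name": ["John", "Mary", "Peter", "John"],
    "Age": [25, 30, 40, 26],
})

# Comparing whole rows: nothing is dropped, since no row repeats exactly.
all_columns = df.drop_duplicates()

# Comparing only 'Name': the second "John" row is dropped.
by_name = df.drop_duplicates(subset=["Name"])

# keep="last" retains the last occurrence of each name instead of the first.
by_name_last = df.drop_duplicates(subset=["Name"], keep="last")

print(len(all_columns), by_name.index.tolist(), by_name_last.index.tolist())
```

The original row labels are preserved, so the index shows which rows survived in each case.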

Return Number of Unique Values

If you just want to return the number of unique values within a column, you can use the pandas nunique() function. Here’s an example:

```python
import pandas as pd

data = {'Name': ['John', 'Mary', 'Peter', 'John'],
        'Age': [25, 30, 40, 25]}
df = pd.DataFrame(data)

unique_names_count = df['Name'].nunique()
print(unique_names_count)
```

Output:

```
3
```

As you can see, the nunique() function returns the number of unique values within the ‘Name’ column of the DataFrame.
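nunique() can also be called on the whole DataFrame to get one count per column, and it skips missing values unless dropna=False is passed. The sketch below adds a row with a missing name to the sample data, an assumption made only to show that behavior.

```python
import pandas as pd

# Sample data plus one extra row with a missing name (hypothetical).
df = pd.DataFrame({
    "Name": ["John", "Mary", "Peter", "John", None],
    "Age": [25, 30, 40, 25, 31],
})

# One count per column; missing values are ignored by default.
counts = df.nunique()

# Count the missing value as its own category.
names_including_missing = df["Name"].nunique(dropna=False)

print(counts)
print(names_including_missing)
```

Here the ‘Name’ column has 3 unique values by default and 4 when the missing entry is counted.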

Conclusion

In this article, we explored two methods for finding unique values in a Pandas DataFrame. The first calls unique() on a single column. The second flattens the DataFrame’s values with ravel() and then calls unique() to get the distinct values across all columns.

We also saw how to get a DataFrame of unique rows with drop_duplicates() and how to count the unique values in a column with nunique(). Finding unique values is a common step when analyzing or cleaning a large dataset, and these methods make it quick to do.
