Finding Unique Values in a Pandas DataFrame
When working with large datasets, you may come across a situation where you need to find unique values within a Pandas DataFrame. This can be especially helpful when analyzing data or cleaning up a dataset.
In this article, we’ll explore two different methods that you can use to find unique values within a Pandas DataFrame.
Method 1: Using the pandas unique() function
The first method uses the unique() method of a pandas Series.
It returns a NumPy array holding the unique values of a single column, in order of first appearance. Here’s an example:
import pandas as pd
data = {'Name': ['John', 'Mary', 'Peter', 'John'],
'Age': [25, 30, 40, 25]}
df = pd.DataFrame(data)
unique_names = df['Name'].unique()
print(unique_names)
Output:
['John' 'Mary' 'Peter']
As you can see, unique() returns an array of the unique values within the ‘Name’ column of the DataFrame. Note that unique() takes no arguments and works on one column at a time. To find unique values across several columns, flatten those columns first, as shown in Method 2.
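If you need the unique values of several columns kept separate, one option is to call unique() on each column in turn. The following is a minimal sketch that reuses the example data; the name unique_per_column is only illustrative and is not part of pandas.

```python
import pandas as pd

data = {'Name': ['John', 'Mary', 'Peter', 'John'],
        'Age': [25, 30, 40, 25]}
df = pd.DataFrame(data)

# Call unique() on each column separately and collect the results in a dict.
# unique_per_column is an illustrative name, not a pandas API.
unique_per_column = {col: df[col].unique().tolist() for col in ['Name', 'Age']}
print(unique_per_column)
```

Each entry in the dict holds the unique values of one column, so names and ages stay apart instead of being mixed in one array.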
Method 2: Using the ravel() function
The second method uses NumPy’s ravel() function. Calling df.values.ravel() flattens the DataFrame’s underlying two-dimensional array into a one-dimensional array, row by row, which can then be passed to the pandas unique() function.
Here’s an example:
import pandas as pd
data = {'Name': ['John', 'Mary', 'Peter', 'John'],
'Age': [25, 30, 40, 25]}
df = pd.DataFrame(data)
unique_values = pd.Series(df.values.ravel()).unique()
print(unique_values)
Output:
['John' 25 'Mary' 30 'Peter' 40]
As you can see, ravel() flattens the DataFrame’s values into a single array, and unique() then returns the unique values across all columns. Because the columns hold different types, the result is an object array that mixes names and ages.
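To make the flattening step easier to follow, here is a small sketch that separates it from the call to unique(). It assumes the same example data; the names row_major and column_major are illustrative. It also shows NumPy’s order="F" option, which flattens column by column instead of row by row.

```python
import pandas as pd

data = {'Name': ['John', 'Mary', 'Peter', 'John'],
        'Age': [25, 30, 40, 25]}
df = pd.DataFrame(data)

# to_numpy() returns a 2-D array with one row per DataFrame row.
values = df.to_numpy()

# ravel() flattens row by row by default (order="C").
row_major = values.ravel()
# order="F" flattens column by column instead.
column_major = values.ravel(order="F")

# pd.unique() keeps values in order of first appearance.
unique_row_major = pd.unique(row_major).tolist()
unique_column_major = pd.unique(column_major).tolist()
print(unique_row_major)
print(unique_column_major)
```

Both calls find the same six values; only the order differs, because unique() reports values in the order it first meets them.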
Return Array of Unique Values
If you just want an array of unique values, either approach above works: unique() for a single column, or ravel() combined with unique() for the whole DataFrame. Here’s the single-column version again:
import pandas as pd
data = {'Name': ['John', 'Mary', 'Peter', 'John'],
'Age': [25, 30, 40, 25]}
df = pd.DataFrame(data)
unique_names = df['Name'].unique()
print(unique_names)
Output:
['John' 'Mary' 'Peter']
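If a plain Python list is more convenient than a NumPy array, the result can be converted. This is a short sketch under the same example data; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Mary', 'Peter', 'John'],
                   'Age': [25, 30, 40, 25]})

# unique() returns a NumPy array in order of first appearance.
unique_ages = df['Age'].unique()

# tolist() converts the array to a plain Python list.
unique_age_list = unique_ages.tolist()

# sorted() accepts the array directly and returns a sorted list.
names_descending = sorted(df['Name'].unique(), reverse=True)

print(type(unique_ages).__name__)
print(unique_age_list)
print(names_descending)
```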
Return DataFrame of Unique Values
If you want a DataFrame that contains only the unique rows, you can use the pandas drop_duplicates() method. Here’s an example:
import pandas as pd
data = {'Name': ['John', 'Mary', 'Peter', 'John'],
'Age': [25, 30, 40, 25]}
df = pd.DataFrame(data)
unique_df = df.drop_duplicates()
print(unique_df)
Output:
    Name  Age
0   John   25
1   Mary   30
2  Peter   40
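As you can see, drop_duplicates() returns a DataFrame of unique rows. The second ‘John’ row is dropped because both its name and its age match the first one; by default, rows are compared across all columns.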
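By default, drop_duplicates() compares entire rows. Its subset and keep parameters let you compare on chosen columns only and decide which occurrence survives. The sketch below uses slightly modified, illustrative data in which the two ‘John’ rows have different ages.

```python
import pandas as pd

# Illustrative data: two rows share a name but differ in age.
df = pd.DataFrame({'Name': ['John', 'Mary', 'Peter', 'John'],
                   'Age': [25, 30, 40, 26]})

# No two rows are fully identical, so nothing is dropped here.
all_columns = df.drop_duplicates()

# Compare on 'Name' only and keep the last occurrence of each name.
by_name = df.drop_duplicates(subset=['Name'], keep='last')

print(len(all_columns))
print(by_name)
```

With subset=['Name'] and keep='last', the first ‘John’ row is removed and the remaining rows keep their original index labels.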
Return Number of Unique Values
If you just want to return the number of unique values within a column, you can use the pandas nunique() function. Here’s an example:
import pandas as pd
data = {'Name': ['John', 'Mary', 'Peter', 'John'],
'Age': [25, 30, 40, 25]}
df = pd.DataFrame(data)
unique_names_count = df['Name'].nunique()
print(unique_names_count)
Output:
3
As you can see, the nunique() function returns the number of unique values within the ‘Name’ column of the DataFrame.
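nunique() can also be called on the whole DataFrame to get one count per column, and its dropna parameter controls whether missing values are counted. The following sketch uses illustrative data with one missing name.

```python
import pandas as pd

# Illustrative data with one missing name.
df = pd.DataFrame({'Name': ['John', 'Mary', None, 'John'],
                   'Age': [25, 30, 40, 25]})

# On a DataFrame, nunique() returns a Series with one count per column.
# Missing values are ignored by default.
per_column = df.nunique().to_dict()

# dropna=False counts the missing value as a distinct entry.
with_missing = df['Name'].nunique(dropna=False)

print(per_column)
print(with_missing)
```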
Conclusion
In this article, we explored two methods for finding unique values in a Pandas DataFrame. The first calls unique() on a single column. The second flattens the DataFrame’s values with ravel() and then calls unique() to get the unique values across all columns.
We also saw how to keep only the unique rows of a DataFrame with drop_duplicates() and how to count the unique values in a column with nunique().
Finding unique values is a common step when exploring or cleaning a large dataset, and these methods make it quick to do.