Adventures in Machine Learning

Customizing Bar Charts in Pandas: Sort Plot and Visualize Data

Are you struggling to analyze data and make sense of it all? If you’re using pandas for your data analysis, there’s a useful function that can help you count the occurrences of values in a given column of a DataFrame and plot them in a bar chart.

In this article, we’ll explore the functionality of the value_counts() function in pandas and the different methods you can use to plot the results.

Functionality of value_counts() function in pandas

Counting occurrences of values in a given column of a DataFrame

The value_counts() function in pandas counts how often each unique value occurs in a column of a DataFrame. It returns a Series object whose index holds the unique values of that column and whose values hold the frequency of each one.

This means that if a particular value appears three times in the column, its frequency count in the output will be 3. Here’s some sample code to help illustrate the functionality of the value_counts() function:

import pandas as pd
# create a sample DataFrame
data = {'Fruit': ['apple', 'banana', 'orange', 'banana', 'apple', 'pear', 'kiwi']}
df = pd.DataFrame(data)
# count the occurrences of each fruit value in the 'Fruit' column
fruit_counts = df['Fruit'].value_counts()
# print the resulting Series object
print(fruit_counts)

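The output of this code will be a Series object that shows the frequency of each unique fruit value in the ‘Fruit’ column. The listing below is what pandas versions before 2.0 print; from 2.0 onward the last line reads Name: count and the index is labeled Fruit: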

apple     2
banana    2
orange    1
pear      1
kiwi      1
Name: Fruit, dtype: int64
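
Because the result is an ordinary Series, the unique values and their counts can be read back separately. The following is a minimal sketch that reuses the sample data above; the variable names (labels, counts, apple_count, total) are illustrative only.

```python
import pandas as pd

# same sample data as in the example above
df = pd.DataFrame({'Fruit': ['apple', 'banana', 'orange', 'banana', 'apple', 'pear', 'kiwi']})
fruit_counts = df['Fruit'].value_counts()

# the index holds the unique values, the values hold their counts
labels = fruit_counts.index.tolist()
counts = fruit_counts.tolist()

# look up a single count by its label
apple_count = fruit_counts['apple']

# the counts add up to the number of non-missing rows in the column
total = fruit_counts.sum()

print(labels, counts, apple_count, total)
```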

Plotting the values produced by the value_counts() function

Once you have the frequency counts of the unique values in a DataFrame column using the value_counts() function, you can use the plot() method to visualize them in a bar chart. This is a great way to quickly see patterns in your data and identify common trends.

Here’s the sample code to create a bar chart from the fruit_counts Series object above:

import matplotlib.pyplot as plt
# create a bar chart from the fruit_counts Series object
fruit_counts.plot(kind='bar') 
# set the chart title and axis labels
plt.title('Fruit Count')
plt.xlabel('Fruit')
plt.ylabel('Count')
# show the chart
plt.show()

This code will generate a bar chart that displays the frequency counts of each unique fruit value in the ‘Fruit’ column:

Example 1 Bar Chart
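
The plot() method also accepts an ax argument, so the chart can be drawn on an explicit matplotlib Axes and labeled through that object instead of the plt functions. A minimal sketch of that variant, assuming the same sample data; it leaves out plt.show() so it also runs in a script without a display.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'Fruit': ['apple', 'banana', 'orange', 'banana', 'apple', 'pear', 'kiwi']})
fruit_counts = df['Fruit'].value_counts()

# create the figure and axes explicitly, then draw the bars on them
fig, ax = plt.subplots()
fruit_counts.plot(kind='bar', ax=ax)

# set the title and axis labels on the Axes object
ax.set_title('Fruit Count')
ax.set_xlabel('Fruit')
ax.set_ylabel('Count')

# one rectangle per unique value; each height is a count
bar_heights = [patch.get_height() for patch in ax.patches]
print(bar_heights)

# call plt.show() here to display the chart interactively
```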

Methods for plotting value counts in pandas

Plotting value counts in descending order

By default, the value_counts() function sorts the unique values in descending order based on their frequency count. However, if you want to customize the sorting order of the bars in your bar chart, you can use the sort_values() function to sort the output of the value_counts() function by ascending or descending order.

Here’s the code to sort the fruit_counts Series object from the previous example by descending order:

# sort the fruit_counts Series object by descending order
fruit_counts_sorted = fruit_counts.sort_values(ascending=False)
# create a bar chart from the sorted Series object
fruit_counts_sorted.plot(kind='bar')
# set the chart title and axis labels
plt.title('Fruit Count')
plt.xlabel('Fruit')
plt.ylabel('Count')
# show the chart
plt.show()

This code will produce a bar chart that displays the frequency counts of each unique fruit value in the ‘Fruit’ column sorted by descending order:

Example 2 Bar Chart

Plotting value counts in ascending order

If you want to sort the bars in your bar chart by ascending order instead of descending order, you can use the sort_values() function with the ascending=True parameter. Here’s the code to sort the fruit_counts Series object by ascending order:

# sort the fruit_counts Series object by ascending order
fruit_counts_sorted = fruit_counts.sort_values(ascending=True)
# create a bar chart from the sorted Series object
fruit_counts_sorted.plot(kind='bar')
# set the chart title and axis labels
plt.title('Fruit Count')
plt.xlabel('Fruit')
plt.ylabel('Count')
# show the chart
plt.show()

This code will produce a bar chart that displays the frequency counts of each unique fruit value in the ‘Fruit’ column sorted by ascending order:

Example 3 Bar Chart
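
value_counts() also takes an ascending argument of its own, so the separate sort_values() call is optional. A small sketch comparing the two routes on the same sample data; the relative order of fruits with equal counts may differ between them, so only the counts are compared here.

```python
import pandas as pd

df = pd.DataFrame({'Fruit': ['apple', 'banana', 'orange', 'banana', 'apple', 'pear', 'kiwi']})

# let value_counts() return the counts in ascending order directly
counts_direct = df['Fruit'].value_counts(ascending=True)

# the two-step version: count first, then sort
counts_two_step = df['Fruit'].value_counts().sort_values(ascending=True)

print(counts_direct.tolist())
print(counts_two_step.tolist())
```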

Plotting value counts in order they appear in DataFrame

If you want to display the frequency counts of unique values in a column of a DataFrame in the order they appear, you can use the value_counts() function with the sort=False parameter. This returns a Series object of unique values with their frequency counts that is not sorted by frequency; recent pandas versions keep the values in the order in which they first appear in the DataFrame column.

Here’s the code to create a bar chart that displays the frequency counts of each fruit value in the ‘Fruit’ column in the order they appear in the DataFrame:

# count the occurrences of each fruit value in the 'Fruit' column
fruit_counts = df['Fruit'].value_counts(sort=False)
# create a bar chart from the fruit_counts Series object
fruit_counts.plot(kind='bar')
# set the chart title and axis labels
plt.title('Fruit Count')
plt.xlabel('Fruit')
plt.ylabel('Count')
# show the chart
plt.show()

This code will produce a bar chart that displays the frequency counts of each unique fruit value in the ‘Fruit’ column in the order they appear:

Example 4 Bar Chart
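
If the order has to be guaranteed regardless of how value_counts() arranges its result, one option is to reorder the counts explicitly. This sketch relies on unique(), which lists values in the order they first occur, and on reindex() to apply that order; the name appearance_order is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Fruit': ['apple', 'banana', 'orange', 'banana', 'apple', 'pear', 'kiwi']})

# unique() lists the values in order of first appearance
appearance_order = df['Fruit'].unique()

# reorder the counts so they follow that order explicitly
fruit_counts = df['Fruit'].value_counts().reindex(appearance_order)

print(fruit_counts.index.tolist())
print(fruit_counts.tolist())
```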

Conclusion

In conclusion, the value_counts() function in pandas provides a quick and easy method for counting the frequency of unique values in a column of a DataFrame. With the plot() method of the resulting Series, which draws the chart with matplotlib, you can also create bar charts of these frequency counts to visualize patterns and trends in your data.

By using the sort_values() function, you can customize the sorting order of the bars in your bar chart to better analyze your data. Finally, using the sort=False parameter in the value_counts() function allows you to display the frequency counts of unique values in the order they appear in the DataFrame column.

With these tools, you can gain valuable insight into your data and make more informed decisions for your business or research needs.

Pandas is one of the most popular data manipulation libraries in Python. It allows you to easily explore, clean, and analyze data with its powerful tools and functions.

In this article, we’ll explore the functionality of the sort_values() and unique() functions in pandas, and how they can be used to manipulate and customize data in a DataFrame. We’ll also cover the customization of bar charts in pandas, including creating horizontal bar charts.

Example 2: Plot Value Counts in Ascending Order

Functionality of sort_values() function in pandas

Sorting data in ascending order

The sort_values() function in pandas allows you to sort a DataFrame by one or more columns, or a Series by its values, in either ascending or descending order. By default, sort_values() sorts data in ascending order (ascending=True).

You can still pass ascending=True explicitly to make the intent clear. Here’s some sample code to illustrate this:

import pandas as pd
import matplotlib.pyplot as plt
# create a sample DataFrame
data = {'Fruit': ['apple', 'banana', 'orange', 'banana', 'apple', 'pear', 'kiwi']}
df = pd.DataFrame(data)
# count the occurrences of each fruit value in the 'Fruit' column
fruit_counts = df['Fruit'].value_counts()
# sort the fruit counts in ascending order
fruit_counts_sorted = fruit_counts.sort_values(ascending=True)
# create a bar chart from the sorted fruit counts
fruit_counts_sorted.plot(kind='bar')
# set the chart title and axis labels
plt.title('Fruit Count (Ascending Order)')
plt.xlabel('Fruit')
plt.ylabel('Count')
# show the chart
plt.show()

This code sorts the fruit_counts Series object by ascending order and creates a bar chart that reflects the new sorting.
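
To check the default direction of sort_values() directly, the sketch below calls it once with no arguments and once with ascending=False on the same counts. It prints only the counts, since the order among fruits with equal counts is not the point here.

```python
import pandas as pd

df = pd.DataFrame({'Fruit': ['apple', 'banana', 'orange', 'banana', 'apple', 'pear', 'kiwi']})
fruit_counts = df['Fruit'].value_counts()

# no arguments: sort_values() uses ascending=True
default_sorted = fruit_counts.sort_values()

# explicit descending order
descending_sorted = fruit_counts.sort_values(ascending=False)

print(default_sorted.tolist())     # [1, 1, 1, 2, 2]
print(descending_sorted.tolist())  # [2, 2, 1, 1, 1]
```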

Reversing the order of data using sort_values() function

If you want to reverse the order of the sorting performed by the sort_values() function, you can set the ascending parameter to False. Here’s the updated code:

import pandas as pd
import matplotlib.pyplot as plt
# create a sample DataFrame
data = {'Fruit': ['apple', 'banana', 'orange', 'banana', 'apple', 'pear', 'kiwi']}
df = pd.DataFrame(data)
# count the occurrences of each fruit value in the 'Fruit' column
fruit_counts = df['Fruit'].value_counts()
# sort the fruit counts in descending order
fruit_counts_sorted = fruit_counts.sort_values(ascending=False)
# create a bar chart from the sorted fruit counts
fruit_counts_sorted.plot(kind='bar')
# set the chart title and axis labels
plt.title('Fruit Count (Descending Order)')
plt.xlabel('Fruit')
plt.ylabel('Count')
# show the chart
plt.show()

This code creates a bar chart that displays the fruit counts in descending order, which was accomplished by setting ascending=False.

Customization of bar chart in pandas

Customizing the appearance of a bar chart

In addition to the sorting and filtering capabilities of pandas, you can use matplotlib to customize the appearance of your bar charts. For example, you can change the color of the bars or the font size of the chart labels.

Here’s some sample code to illustrate this:

import pandas as pd
import matplotlib.pyplot as plt
# create a sample DataFrame
data = {'Fruit': ['apple', 'banana', 'orange', 'banana', 'apple', 'pear', 'kiwi']}
df = pd.DataFrame(data)
# count the occurrences of each fruit value in the 'Fruit' column
fruit_counts = df['Fruit'].value_counts()
# create a bar chart from the fruit counts
# one color per bar, in the order the bars are drawn (here: apple, banana, orange, pear, kiwi)
fruit_counts.plot(kind='bar', color=['green', 'yellow', 'orange', 'brown', 'green'])
# set the chart title and axis labels
plt.title('Fruit Count')
plt.xlabel('Fruit')
plt.ylabel('Count')
# set the font size of the x- and y-axis tick labels
plt.xticks(fontsize=10)
plt.yticks(fontsize=12)
# show the chart
plt.show()

This code adds color to the bars in the bar chart and adjusts the font size of the tick labels on the x- and y-axis.
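
A positional list of colors depends on the order of the bars, which changes whenever the counts are re-sorted. One alternative, sketched here, is to keep a mapping from label to color and build the list from the index of the counts. The dictionary fruit_colors and the colors in it are illustrative assumptions, not part of pandas or matplotlib.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'Fruit': ['apple', 'banana', 'orange', 'banana', 'apple', 'pear', 'kiwi']})
fruit_counts = df['Fruit'].value_counts()

# hypothetical mapping from each fruit to a bar color
fruit_colors = {'apple': 'green', 'banana': 'yellow', 'orange': 'orange',
                'pear': 'brown', 'kiwi': 'olive'}

# build the color list in the same order as the bars
bar_colors = [fruit_colors[fruit] for fruit in fruit_counts.index]

fig, ax = plt.subplots()
fruit_counts.plot(kind='bar', ax=ax, color=bar_colors)
ax.set_title('Fruit Count')
ax.set_xlabel('Fruit')
ax.set_ylabel('Count')

# tick label size for both axes
ax.tick_params(axis='both', labelsize=10)

# call plt.show() here to display the chart interactively
```

With this approach each fruit keeps its color even if the counts are sorted in a different order before plotting.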

Creating a horizontal bar chart in pandas

You can also create horizontal bar charts by passing kind='barh' to plot() instead of kind='bar'. Here’s some sample code to illustrate this:

import pandas as pd
import matplotlib.pyplot as plt
# create a sample DataFrame
data = {'Fruit': ['apple', 'banana', 'orange', 'banana', 'apple', 'pear', 'kiwi']}
df = pd.DataFrame(data)
# count the occurrences of each fruit value in the 'Fruit' column
fruit_counts = df['Fruit'].value_counts()
# create a horizontal bar chart from the fruit counts
fruit_counts.plot(kind='barh')
# set the chart title and axis labels
plt.title('Fruit Count')
plt.xlabel('Count')
plt.ylabel('Fruit')
# show the chart
plt.show()

This code passes kind='barh' to plot() to create a horizontal bar chart, with the counts on the x-axis and the fruits on the y-axis.
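
The same kind of chart is also available through the plot accessor as plot.barh(). In a horizontal chart the first row is drawn at the bottom, so sorting the counts in ascending order first, as in this sketch, places the largest bar at the top. The sketch assumes the same sample data and leaves out plt.show() so it runs without a display.

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'Fruit': ['apple', 'banana', 'orange', 'banana', 'apple', 'pear', 'kiwi']})

# ascending order, so the largest count ends up as the top bar
fruit_counts = df['Fruit'].value_counts().sort_values(ascending=True)

fig, ax = plt.subplots()
fruit_counts.plot.barh(ax=ax)
ax.set_title('Fruit Count')
ax.set_xlabel('Count')
ax.set_ylabel('Fruit')

# for horizontal bars the count is the width of each rectangle
bar_widths = [patch.get_width() for patch in ax.patches]
print(bar_widths)

# call plt.show() here to display the chart interactively
```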

Example 3: Plot Value Counts in Order They Appear in DataFrame

Functionality of unique() function in pandas

Retrieving unique values from a pandas DataFrame

The unique() method in pandas returns an array of the unique values present in a particular column of a DataFrame, in the order in which they first appear. That order can then be used to arrange value counts by order of appearance in that column.

Here’s the code to retrieve unique values:

import pandas as pd
# create a sample DataFrame
data = {'Fruit': ['apple', 'banana', 'orange', 'banana', 'apple', 'pear', 'kiwi']}
df = pd.DataFrame(data)
# retrieve the unique values in the 'Fruit' column
unique_fruits = df['Fruit'].unique()

This code uses the unique() method to retrieve the unique values in the ‘Fruit’ column.
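
Printing the result makes the ordering visible. In this small sketch the values come back in the order they first occur in the column, not alphabetically and not by frequency; nunique() is used only as a cross-check on the number of distinct values.

```python
import pandas as pd

df = pd.DataFrame({'Fruit': ['apple', 'banana', 'orange', 'banana', 'apple', 'pear', 'kiwi']})

# unique values, in order of first appearance
unique_fruits = df['Fruit'].unique()
print(list(unique_fruits))
# ['apple', 'banana', 'orange', 'pear', 'kiwi']

# number of distinct values, computed two ways
n_unique = len(unique_fruits)
print(n_unique, df['Fruit'].nunique())
```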

Using unique values to sort data in pandas

By using the unique() method in combination with the loc[] indexer, you can arrange the value counts in the order in which the values first appear in a specific column. Here’s the code to do this for the ‘Fruit’ column:

import pandas as pd
import matplotlib.pyplot as plt
# create a sample DataFrame
data = {'Fruit': ['apple', 'banana', 'orange', 'banana', 'apple', 'pear', 'kiwi']}
df = pd.DataFrame(data)
# retrieve the unique values in the 'Fruit' column, in order of first appearance
unique_fruits = df['Fruit'].unique()
# count the occurrences of each fruit value in the 'Fruit' column
fruit_counts = df['Fruit'].value_counts()
# reorder the counts to follow the order of the unique values
fruit_counts_sorted = fruit_counts.loc[unique_fruits]
# create a bar chart from the reordered fruit counts
fruit_counts_sorted.plot(kind='bar')
# set the chart title and axis labels
plt.title('Fruit Count')
plt.xlabel('Fruit')
plt.ylabel('Count')
# show the chart
plt.show()

This code counts the fruits with value_counts() and then uses loc[] to select those counts in the order given by unique(). The bars therefore follow the order in which each fruit first appears in the DataFrame, rather than the frequency order that value_counts() produces by default.

Manipulation of DataFrame in pandas

Reordering rows in a pandas DataFrame

In addition to sorting data in a DataFrame, you can also reorder the rows to better organize your data. This can be done using the reindex() method, which returns a new DataFrame with the row labels reorganized according to a desired order.

Here’s some sample code to illustrate this:

import pandas as pd
# create a sample DataFrame
data = {'Fruit': ['apple', 'banana', 'orange', 'banana', 'apple', 'pear', 'kiwi']}
df = pd.DataFrame(data)
# reorder rows in the DataFrame
df_reorder = df.reindex([2, 4, 1, 0, 3, 6, 5])
# count the occurrences of each fruit value in
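
The listing above breaks off in the source. As a separate, self-contained sketch (not a reconstruction of the missing lines), the following shows what reindex() does to the row order of the sample data, and that reordering rows leaves the counts themselves unchanged. The names counts_before and counts_after are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Fruit': ['apple', 'banana', 'orange', 'banana', 'apple', 'pear', 'kiwi']})

# rows come back in the order of the labels passed to reindex()
df_reorder = df.reindex([2, 4, 1, 0, 3, 6, 5])
print(df_reorder['Fruit'].tolist())
# ['orange', 'apple', 'banana', 'apple', 'banana', 'kiwi', 'pear']

# reordering rows does not change how often each value occurs
counts_before = df['Fruit'].value_counts().sort_index()
counts_after = df_reorder['Fruit'].value_counts().sort_index()
print(counts_before.to_dict() == counts_after.to_dict())
```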
