Adventures in Machine Learning

Streamline Data Analysis: Creating a Pareto Chart in Python

Have you ever needed to analyze a large amount of data and work out which factors contribute most to a problem? If so, a Pareto Chart built in Python can simplify the process.

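A Pareto Chart is a statistical tool based on the Pareto principle, often called the 80/20 rule, which holds that a small share of causes typically accounts for most of an effect. The chart shows the frequency or size of each cause as bars sorted from largest to smallest, together with a line for the cumulative percentage, which makes it easier to see the critical areas that require attention. In this article, we will guide you through creating a Pareto Chart in Python using survey data.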

Data Creation

Before creating a Pareto Chart, we need to have data to work with. For this example, we will use survey data obtained from 100 customers about their satisfaction levels with the products and services of a particular company.

Here’s how we can create the data as a pandas DataFrame:

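import pandas as pd

# Survey results: number of customers (out of 100) at each satisfaction level
df = pd.DataFrame({
   "Satisfaction": ["Very Satisfied", "Satisfied", "Neutral", "Unsatisfied", "Very Unsatisfied"],
   "Count": [30, 35, 15, 10, 10]
})
# Use the satisfaction level as the index so it labels the bars
df.set_index("Satisfaction", inplace=True)
# A Pareto chart needs the categories ordered from largest to smallest
df = df.sort_values("Count", ascending=False)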

In the above code, we have created a DataFrame with two columns, Satisfaction and Count. Satisfaction represents the satisfaction levels of the customers, while Count represents the number of customers in each category.

We then set the Satisfaction column as the index of the DataFrame and sort the DataFrame in descending order by the Count column, assigning the result back to df. The descending order matters: a Pareto Chart places the largest category first, so the cumulative line rises most steeply at the start.
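To make the arithmetic behind the chart concrete, here is a minimal sketch that adds a cumulative percentage column to the same data. The column name "Cumulative %" is our own choice for illustration and is not used elsewhere in this article.

```python
import pandas as pd

# Same survey data as above, indexed by satisfaction level and sorted by count
df = pd.DataFrame({
    "Satisfaction": ["Very Satisfied", "Satisfied", "Neutral",
                     "Unsatisfied", "Very Unsatisfied"],
    "Count": [30, 35, 15, 10, 10],
})
df = df.set_index("Satisfaction").sort_values("Count", ascending=False)

# Running total of the counts as a percentage of all responses.
# "Cumulative %" is a hypothetical column name chosen for this sketch.
df["Cumulative %"] = 100 * df["Count"].cumsum() / df["Count"].sum()

print(df)
```

With these counts the cumulative column runs 35, 65, 80, 90, 100, so the three largest categories together account for 80% of the responses.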

Pareto Chart Creation

Now that we have our data, we can proceed to create a Pareto Chart using the matplotlib library in Python. Here’s how we can do that:

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
# Bars: frequency of each satisfaction level on the left y-axis
ax.bar(df.index, df["Count"], color="blue")
ax.set_ylabel("Frequency")
ax.tick_params(axis="y", labelcolor="blue")
# Line: cumulative percentage on a second y-axis that shares the x-axis
ax2 = ax.twinx()
ax2.plot(df.index, df["Count"].cumsum() / df["Count"].sum() * 100,
         color="red", marker="D", ms=7)
ax2.set_ylabel("Cumulative Frequency (%)")
ax2.tick_params(axis="y", labelcolor="red")
plt.title("Pareto Chart of Customer Satisfaction")
plt.show()

In the above code, ax.bar() draws the frequency of each satisfaction level as blue bars against the left y-axis.

We then create a second y-axis, ax2, with the twinx() method. It shares the x-axis with ax, so the cumulative percentage can use its own scale on the right.

On ax2 we plot the cumulative percentage, computed as the running total of Count divided by the overall total and multiplied by 100, as a red line. The marker and ms parameters add diamond markers of size 7 at each point.

Finally, we set the title of the chart to “Pareto Chart of Customer Satisfaction” and display the chart using the show() function.
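If you expect to draw more than one Pareto Chart, the steps above can be wrapped in a small function. The sketch below is one possible way to do that. The name pareto_chart and its parameters are hypothetical (this is not a matplotlib or pandas function), and the fixed 0 to 110 range and the PercentFormatter tick labels on the right axis are our own additions rather than part of the code above.

```python
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import PercentFormatter


def pareto_chart(counts, title="Pareto Chart"):
    """Draw a Pareto chart from a Series of category counts.

    Hypothetical helper for illustration; not part of matplotlib or pandas.
    """
    # Largest category first, then the running share of the total
    counts = counts.sort_values(ascending=False)
    cum_pct = 100 * counts.cumsum() / counts.sum()

    fig, ax = plt.subplots()
    ax.bar(counts.index, counts.values, color="blue")
    ax.set_ylabel("Frequency")

    # Second y-axis for the cumulative percentage line
    ax2 = ax.twinx()
    ax2.plot(counts.index, cum_pct.values, color="red", marker="D", ms=7)
    ax2.set_ylim(0, 110)  # assumption: start the percentage axis at zero
    ax2.yaxis.set_major_formatter(PercentFormatter())
    ax2.set_ylabel("Cumulative Frequency (%)")

    ax.set_title(title)
    return fig, ax, ax2


counts = pd.Series(
    [30, 35, 15, 10, 10],
    index=["Very Satisfied", "Satisfied", "Neutral",
           "Unsatisfied", "Very Unsatisfied"],
)
fig, ax, ax2 = pareto_chart(counts, title="Pareto Chart of Customer Satisfaction")
```

Call plt.show() afterwards to display the figure, or fig.savefig() to write it to a file. Returning the figure and both axes leaves room for further customization by the caller.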

Customizing a Pareto Chart

If you want to make your Pareto Chart more visually appealing or emphasize certain aspects of the chart, you can customize it based on your preferences.

Changing Colors

One way to customize the Pareto Chart is by changing the colors of the bars or line graph. Here’s how you can do that:

import matplotlib.pyplot as plt
fig, ax = plt.subplots()
# One color per bar, in the same order as the sorted categories
ax.bar(df.index, df["Count"], color=["pink", "purple", "green", "orange", "yellow"])
ax.set_ylabel("Frequency")
ax2 = ax.twinx()
ax2.plot(df.index, df["Count"].cumsum() / df["Count"].sum() * 100,
         color="black", marker="D", ms=7, lw=2)
ax2.set_ylabel("Cumulative Frequency (%)")
# Match the right-hand tick labels to the black line
ax2.tick_params(axis="y", labelcolor="black")
plt.title("Custom Pareto Chart of Customer Satisfaction")
plt.show()

In the above code, we pass a list to the color parameter so that the bars are drawn in pink, purple, green, orange, and yellow, in that order.

We also set the color of the line graph to black and its thickness to 2 points using lw, which is matplotlib’s short alias for linewidth.
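Colors can also carry meaning rather than just decoration. As a rough sketch, the example below highlights the categories that together reach an 80% cumulative share and greys out the rest. The 80% threshold, the variable names, and the color choices are all assumptions made for this illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Counts already sorted from largest to smallest
counts = pd.Series(
    [35, 30, 15, 10, 10],
    index=["Satisfied", "Very Satisfied", "Neutral",
           "Unsatisfied", "Very Unsatisfied"],
)
cum_pct = 100 * counts.cumsum() / counts.sum()

# Assumed threshold: highlight bars up to and including the first one
# whose cumulative share reaches 80%
threshold = 80
n_vital = min(int((cum_pct < threshold).sum()) + 1, len(counts))
bar_colors = ["steelblue"] * n_vital + ["lightgray"] * (len(counts) - n_vital)

fig, ax = plt.subplots()
ax.bar(counts.index, counts.values, color=bar_colors)
ax.set_ylabel("Frequency")

ax2 = ax.twinx()
ax2.plot(counts.index, cum_pct.values, color="black", marker="D", ms=7, lw=2)
ax2.axhline(threshold, color="gray", linestyle="--")  # reference line at 80%
ax2.set_ylabel("Cumulative Frequency (%)")
```

Call plt.show() to display the result. For this data the first three bars are highlighted, because their cumulative share reaches exactly 80%.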

Changing Line Size

You can also make the line thicker or thinner with the linewidth parameter, which is the full name of the lw alias used in the previous example. Here’s an example of how you can do that:

import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.bar(df.index, df["Count"], color=["pink", "purple", "green", "orange", "yellow"])
ax.set_ylabel("Frequency")
ax2 = ax.twinx()
# linewidth is given in points; 5 draws a noticeably thicker line
ax2.plot(df.index, df["Count"].cumsum() / df["Count"].sum() * 100,
         color="black", marker="D", ms=7, linewidth=5)
ax2.set_ylabel("Cumulative Frequency (%)")
# Match the right-hand tick labels to the black line
ax2.tick_params(axis="y", labelcolor="black")
plt.title("Custom Pareto Chart of Customer Satisfaction")
plt.show()

In the above code, we use the linewidth parameter to set the thickness of the line to 5.
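Since lw and linewidth are two spellings of the same line property, the short check below may help confirm that they behave identically. The data points and variable names are placeholders chosen for this sketch.

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()

# Two lines with placeholder data: one uses the alias, one the full name
(line_a,) = ax.plot([0, 1, 2], [35, 65, 80], lw=5)
(line_b,) = ax.plot([0, 1, 2], [35, 65, 80], linewidth=5)

# Both spellings set the same underlying property
widths = (line_a.get_linewidth(), line_b.get_linewidth())

# The width can also be changed after the line has been drawn
line_a.set_linewidth(2)
```

Adjusting the width after plotting with set_linewidth() can be handy when a chart is built by a helper function and styled afterwards.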

Conclusion

In this article, we created a Pareto Chart in Python from survey data, using a pandas DataFrame to hold the data and the matplotlib library to plot it. We also customized the chart by changing the bar colors, the line color, and the line width.

The key takeaway is that a Pareto Chart ranks the causes of a problem and shows their cumulative contribution, which makes it a practical decision-making tool for spotting the areas that need attention first. With pandas and matplotlib, it takes only a few lines of Python to build one.
