Mastering Data Visualization with Matplotlib’s Pyplot Library
If you want to create clear, informative visualizations in Python, matplotlib.pyplot is a natural place to start. With matplotlib.pyplot, you can create a wide variety of charts to present your data in a meaningful and effective way.
One of the most popular plots that you can create with matplotlib.pyplot is a scatterplot. Scatterplots are ideal for displaying the relationship between two variables.
They are also great for identifying trends, clusters, and outliers.
Syntax of matplotlib.pyplot.scatter()
Before we dive into creating our scatterplot, let’s first take a look at the syntax for the matplotlib.pyplot.scatter() function.
The syntax for the function is:
matplotlib.pyplot.scatter(x, y, s=None, c=None, marker=None, cmap=None, norm=None, vmin=None, vmax=None, alpha=None, linewidths=None, *, edgecolors=None, plotnonfinite=False, data=None, **kwargs)
The ‘x’ and ‘y’ parameters are the data positions that we want to plot. The ‘s’ parameter sets the size of the markers (as an area, in points squared), and the ‘c’ parameter controls the color of the markers.
Other parameters include ‘marker’, which sets the shape of the markers, and ‘cmap’, which is a colormap instance or registered name that maps data values to colors; it only takes effect when ‘c’ is a sequence of numbers. The ‘norm’, ‘vmin’, and ‘vmax’ parameters control how those numbers are scaled onto the colormap’s range, while the ‘alpha’ parameter sets the transparency of the markers, from 0 (fully transparent) to 1 (fully opaque).
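As a minimal sketch of how these parameters fit together, the call below plots made-up data. The arrays x, y, values, and sizes are illustrative assumptions, not part of any dataset used in this article.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative data only: 50 random points (an assumption for this sketch)
rng = np.random.default_rng(seed=0)
x = rng.random(50)
y = rng.random(50)
values = x + y                       # a third variable, used for color
sizes = 20 + 200 * rng.random(50)    # marker areas, in points squared

points = plt.scatter(
    x, y,
    s=sizes,             # one size per marker
    c=values,            # numbers, mapped to colors through cmap
    marker='o',          # circular markers
    cmap='viridis',      # colormap used to turn values into colors
    alpha=0.7,           # slightly transparent markers
    edgecolors='black',  # outline color of each marker
    linewidths=0.5,      # outline width
)
plt.xlabel('x')
plt.ylabel('y')
```

Calling plt.show() afterwards displays the figure. The object returned by scatter() (named points here) can later be passed to plt.colorbar() to add a color scale.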
Coloring the markers based on a third variable
Scatterplots are not limited to just plotting two variables. We can add a third variable by shading the markers on the plot.
In this way, we can visualize the relationship between three variables at once. The third variable can be nominal, ordinal, or interval.
We can use several colormaps to shade the markers on the plot. The available colormaps include ‘viridis’, ‘magma’, ‘plasma’, ‘inferno’, ‘cividis’, ‘cool’, ‘coolwarm’, ‘Wistia’, ‘RdPu’, ‘PuRd’, and many others.
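To see which colormap names are available in your installation, something like the following sketch may help. It relies on plt.colormaps() and plt.get_cmap(); the variable names are illustrative.

```python
import matplotlib.pyplot as plt

# List of registered colormap names (the exact count depends on the
# matplotlib version installed)
names = plt.colormaps()
print(len(names), 'colormaps available')

# Reversed variants are registered under the same name plus an "_r" suffix
has_reversed = 'viridis' in names and 'viridis_r' in names

# A colormap is callable: it maps a number in [0, 1] to an RGBA tuple
cmap = plt.get_cmap('viridis')
low_color = cmap(0.0)    # color used for the lowest values
high_color = cmap(1.0)   # color used for the highest values
```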
Example 1: Creating a scatterplot with a gray colormap
Now we will create a scatterplot with gray markers. We will use a pandas DataFrame for this purpose.
The DataFrame will contain information on the sales performance of a company’s products in terms of units sold and revenue generated. We will then assign each marker a color based on the region where the products were sold.
Setting up the pandas DataFrame
import pandas as pd
import matplotlib.pyplot as plt
# create the DataFrame
df = pd.DataFrame({'Region': ['North', 'South', 'East', 'West', 'North', 'South',
                              'East', 'West', 'North', 'South', 'East', 'West'],
                   'Units Sold': [232, 240, 342, 310, 215, 234, 323, 292,
                                  195, 198, 320, 290],
                   'Revenue': [19136, 20160, 30232, 27920, 18795, 20288,
                               28900, 26940, 18185, 18612, 30130, 27880]})
Creating the scatterplot with gray colormap and shaded markers
Once we have our DataFrame ready, we can start creating the scatterplot. We will first create a dictionary ‘color’ that maps each value in the ‘Region’ column to the color we want for its markers.
In this case, each region gets its own shade of gray, using color names that matplotlib recognizes. We will then create our scatterplot using matplotlib.pyplot.scatter().
# map each region to a different shade of gray
color = {'North':'black', 'South':'dimgray', 'East':'darkgray', 'West':'lightgray'}
# create the scatterplot, looking up each row's color from its region
plt.scatter(df['Units Sold'], df['Revenue'], c=df['Region'].map(color), s=100)
# add a title and axis labels
plt.title('Sales Performance by Region')
plt.xlabel('Units Sold')
plt.ylabel('Revenue')
# show the plot
plt.show()
In the above code, we set the marker size to 100 (s=100). We then look up a color for every row by mapping the ‘Region’ column through the ‘color’ dictionary (c=df[‘Region’].map(color)).
This shades the markers based on the region where the products were sold. Finally, we add a title and axis labels to our plot and display it using plt.show().
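A plot colored by region is easier to read with a legend. One possible approach, sketched below, is to draw each region with its own scatter() call and a label. The shades dictionary and its particular gray tones are illustrative assumptions.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Same sales data as in the example above
df = pd.DataFrame({
    'Region': ['North', 'South', 'East', 'West'] * 3,
    'Units Sold': [232, 240, 342, 310, 215, 234, 323, 292,
                   195, 198, 320, 290],
    'Revenue': [19136, 20160, 30232, 27920, 18795, 20288,
                28900, 26940, 18185, 18612, 30130, 27880]})

# Assumed mapping for this sketch: one gray tone per region
shades = {'North': 'black', 'South': 'dimgray',
          'East': 'darkgray', 'West': 'lightgray'}

fig, ax = plt.subplots()
# One scatter call per region, so each region gets its own legend entry
for region, group in df.groupby('Region'):
    ax.scatter(group['Units Sold'], group['Revenue'],
               color=shades[region], s=100, label=region)

ax.set_title('Sales Performance by Region')
ax.set_xlabel('Units Sold')
ax.set_ylabel('Revenue')
legend = ax.legend(title='Region')
```

Calling plt.show() afterwards displays the figure. Because groupby() iterates over the regions in sorted order, the legend entries appear alphabetically.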
Example 2: Using a different colormap
In Example 1, we created a scatterplot with gray markers.
However, we can also let a colormap choose the marker colors from numerical data. In this example, we will create a scatterplot using the Greens colormap and then reverse the shading of the markers.
Creating the scatterplot with Greens colormap
To create the scatterplot, we will use the same DataFrame from Example 1, which contains units sold and revenue generated for a company’s products. We will shade the markers according to the revenue generated, using the Greens colormap.
# create the scatterplot
plt.scatter(df['Units Sold'], df['Revenue'], c=df['Revenue'], cmap='Greens', s=100)
# add a title and axis labels
plt.title('Sales Performance')
plt.xlabel('Units Sold')
plt.ylabel('Revenue')
# show the plot
plt.show()
In the above code, we set the ‘c’ parameter to the ‘Revenue’ column of our DataFrame to shade the markers based on revenue generated. We also set the ‘cmap’ parameter to ‘Greens’ to use the Greens colormap, so higher revenue appears as a darker green.
We set the marker size to 100 with the ‘s’ parameter. We then add a title and axis labels to our plot and display it using plt.show().
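When color encodes a numerical variable, a colorbar tells the reader which shade corresponds to which value. The sketch below adds one to the Greens scatterplot; the gray marker outline is an optional assumption that keeps the lightest markers visible.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Same sales data as in Example 1 (only the columns needed here)
df = pd.DataFrame({
    'Units Sold': [232, 240, 342, 310, 215, 234, 323, 292,
                   195, 198, 320, 290],
    'Revenue': [19136, 20160, 30232, 27920, 18795, 20288,
                28900, 26940, 18185, 18612, 30130, 27880]})

fig, ax = plt.subplots()
points = ax.scatter(df['Units Sold'], df['Revenue'],
                    c=df['Revenue'], cmap='Greens', s=100,
                    edgecolors='gray')   # outline keeps pale markers visible

# The colorbar reads its colormap and value range from the scatter result
cbar = fig.colorbar(points, ax=ax)
cbar.set_label('Revenue')

ax.set_title('Sales Performance')
ax.set_xlabel('Units Sold')
ax.set_ylabel('Revenue')
```

Calling plt.show() afterwards displays the figure. Without ‘vmin’ and ‘vmax’, the color scale spans the smallest to the largest value in the ‘Revenue’ column.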
Reversing the colormap shading
By default, a colormap runs from its first color at the lowest value to its last color at the highest value; for Greens, that means light green for low values and dark green for high values. In some cases, we may want to reverse the order of the shading.
We can achieve this by appending ‘_r’ to the colormap name, which selects the reversed version of the colormap.
# create the scatterplot with reversed shading
plt.scatter(df['Units Sold'], df['Revenue'], c=df['Revenue'], cmap='Greens_r', s=100)
# add a title and axis labels
plt.title('Sales Performance with Reversed Shading')
plt.xlabel('Units Sold')
plt.ylabel('Revenue')
# show the plot
plt.show()
In the above code, we set the ‘cmap’ parameter to ‘Greens_r’, the reversed version of the Greens colormap, so the lowest revenue now gets the darkest green and the highest revenue the lightest. Swapping ‘vmin’ and ‘vmax’ is not a reliable alternative, because matplotlib’s normalization expects ‘vmin’ to be no greater than ‘vmax’.
This reverses the shading of the colormap. We then add a title and axis labels to our plot and display it using plt.show().
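To check what a reversed colormap does, we can compare it with the original directly. The sketch below looks up the registered ‘Greens_r’ colormap and also uses the reversed() method of colormap objects; the brightness measure (sum of the RGB channels) is a rough assumption chosen for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

greens = plt.get_cmap('Greens')
greens_r = plt.get_cmap('Greens_r')   # reversed variant, registered by name

# A colormap maps a position in [0, 1] to an RGBA tuple.
# The start of the original should match the end of the reversed map.
ends_match = np.allclose(greens(0.0), greens_r(1.0))

# Sum of the RGB channels as a rough brightness measure (smaller is darker)
brightness_low = sum(greens_r(0.0)[:3])
brightness_high = sum(greens_r(1.0)[:3])
lowest_is_darkest = brightness_low < brightness_high

# Colormap objects can also be reversed directly
greens_reversed = greens.reversed()
```

Either form can be passed as the ‘cmap’ argument of scatter(): the name ‘Greens_r’ or the object returned by reversed().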
Example 3: Using categorical variables for color
In Example 2, we shaded the markers based on a numerical variable. However, we can also color the markers based on a categorical variable, giving each category its own distinct color.
In this example, we will create a scatterplot that is colored by a categorical variable.
Setting up the pandas DataFrame with categorical variable
We will create a pandas DataFrame that contains information on the rating of different movies. The DataFrame will include columns for movie title, director, rating, and genre.
We will shade the markers on the scatterplot based on the movie genre.
# create the DataFrame
df = pd.DataFrame({'Movie Title': ['Titanic', 'The Shawshank Redemption', 'The Godfather', 'Jurassic Park', 'Forrest Gump', 'Jaws', 'Grease', 'Star Wars: A New Hope', 'The Lion King', 'The Dark Knight'],
                   'Director': ['James Cameron', 'Frank Darabont', 'Francis Ford Coppola', 'Steven Spielberg', 'Robert Zemeckis', 'Steven Spielberg', 'Randal Kleiser', 'George Lucas', 'Roger Allers and Rob Minkoff', 'Christopher Nolan'],
                   'Rating': [7.8, 9.3, 9.2, 8.1, 8.8, 8.0, 7.2, 8.6, 8.6, 9.0],
                   'Genre': ['Romance', 'Drama', 'Crime', 'Science Fiction', 'Comedy', 'Thriller', 'Musical', 'Science Fiction', 'Animation', 'Action/Adventure']})
Creating the scatterplot with categorical variable
Once we have our DataFrame ready, we can start creating the scatterplot. We will first create a dictionary ‘color’ that maps each value in the ‘Genre’ column to the color we want for its markers.
In this case, every genre gets its own named color. We will then create our scatterplot using matplotlib.pyplot.scatter().
# create a variable with the labels to shade the markers
color = {'Romance':'red', 'Drama':'blue', 'Crime':'black', 'Science Fiction':'purple', 'Comedy':'magenta', 'Thriller':'green', 'Musical':'orange', 'Animation':'yellow', 'Action/Adventure':'cyan'}
# create the scatterplot
plt.scatter(df['Director'], df['Rating'], c=df['Genre'].map(color), s=100)
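# add a title and axis labels
plt.title('Movie Ratings by Genre')
plt.xlabel('Director')
plt.ylabel('Rating')
# rotate the director names so they do not overlap
plt.xticks(rotation=45, ha='right')
plt.tight_layout()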
# show the plot
plt.show()
In the above code, the ‘Genre’ column holds a categorical variable, so it cannot be passed to the ‘c’ parameter directly. Instead, we map the ‘Genre’ column through the ‘color’ dictionary (c=df[‘Genre’].map(color)), which produces one color name per row.
This shades the markers based on the genre of the movie. Finally, we add a title and axis labels to our plot and display it using plt.show().
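Writing the color dictionary by hand becomes tedious when there are many categories. As an alternative sketch, the colors can be generated from the data: pd.factorize() gives each genre an integer code, and a qualitative colormap supplies one color per code. The choice of ‘tab10’ is an assumption for illustration; it provides ten distinct colors, enough for the nine genres here.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Same movie data as above (only the columns needed here)
df = pd.DataFrame({
    'Director': ['James Cameron', 'Frank Darabont', 'Francis Ford Coppola',
                 'Steven Spielberg', 'Robert Zemeckis', 'Steven Spielberg',
                 'Randal Kleiser', 'George Lucas',
                 'Roger Allers and Rob Minkoff', 'Christopher Nolan'],
    'Rating': [7.8, 9.3, 9.2, 8.1, 8.8, 8.0, 7.2, 8.6, 8.6, 9.0],
    'Genre': ['Romance', 'Drama', 'Crime', 'Science Fiction', 'Comedy',
              'Thriller', 'Musical', 'Science Fiction', 'Animation',
              'Action/Adventure']})

# codes: one integer per row; labels: the unique genres, in order of
# first appearance
codes, labels = pd.factorize(df['Genre'])
cmap = plt.get_cmap('tab10')   # qualitative colormap with 10 colors

fig, ax = plt.subplots()
# One scatter call per genre, so each genre gets a legend entry
for code, genre in enumerate(labels):
    rows = df[codes == code]
    ax.scatter(rows['Director'], rows['Rating'],
               color=cmap(code), s=100, label=genre)

ax.set_title('Movie Ratings by Genre')
ax.set_xlabel('Director')
ax.set_ylabel('Rating')
ax.tick_params(axis='x', labelrotation=45)   # keep long names readable
legend = ax.legend(title='Genre', fontsize='small')
```

Calling plt.show() afterwards displays the figure. With more than ten categories, ‘tab10’ would start repeating colors, so a larger qualitative colormap such as ‘tab20’ would be needed.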
Conclusion
In this article, we explored how to create scatterplots using the matplotlib.pyplot library in Python. We looked at the syntax for the matplotlib.pyplot.scatter() function and how to shade the markers based on a third variable using a colormap.
We worked through three examples: gray markers keyed to a region, a numerical variable shaded with the Greens colormap and its reversed version, and a categorical variable mapped to distinct colors.
The main takeaway is that scatterplots can be a powerful tool for gaining insights into complex data. By using different colormaps and shading methods, we can visualize three or more variables in a single plot.
I hope that this tutorial has been helpful to you and that you can use scatterplots and colormaps to gain valuable insights from your data.