Introduction to Matplotlib
Data visualization is an essential aspect of data science. It helps in identifying patterns and trends in the data, which would otherwise be difficult to understand.
Data visualization tools allow us to represent data in a graphical format, making it easier for us to draw conclusions and communicate insights. One of the most popular data visualization tools is Matplotlib.
Matplotlib is a powerful and flexible library for creating static, animated, and interactive visualizations in Python. It is built on top of NumPy, and its pyplot module provides a plotting interface modeled on MATLAB.
In this article, we will cover the basics of Matplotlib, starting with installation and importing, and then moving on to the most common types of graphs and charts. We will begin with line plots: how to create them and how to display them on screen.
Getting started with Matplotlib
Before diving into the world of Matplotlib, you need to install the library. The easiest way to install Matplotlib is through the Anaconda distribution.
It comes pre-installed in Anaconda and can be accessed through your preferred Python environment. Alternatively, you can install Matplotlib with pip by running pip install matplotlib.
Once you have installed the Matplotlib library, you can import it into your Python script. The most commonly used module in Matplotlib is the pyplot module, which provides an interface for creating various types of plots and charts.
To import pyplot, you can use the following code:
import matplotlib.pyplot as plt
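A quick way to confirm that the installation and the import both worked is to print the library version. This is only a minimal sketch; the version string you see depends on the release installed on your machine.

```python
import matplotlib
import matplotlib.pyplot as plt

# If both imports succeed, Matplotlib is installed and usable.
# The printed version depends on the installed release.
print(matplotlib.__version__)
```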
Types of graphs and charts in Matplotlib
Matplotlib provides us with a variety of options for creating visuals. Here are some of the most commonly used types of graphs and charts in Matplotlib:
-
Line plot
A line plot is used to visualize data points connected by straight lines. It is often used to represent time-series data.
-
Scatter plot
A scatter plot is used to visualize the relationship between two continuous variables.
Each data point is represented by a dot on the chart.
-
Histogram
A histogram is used to represent the distribution of a continuous variable. It is used to understand the shape and spread of the data.
-
Bar chart
A bar chart is used to compare values across discrete categories of data.
-
Pie chart
A pie chart is used to show the proportion of different categories in a dataset. It is used to represent the composition of a whole.
Let’s take a closer look at line plots.
Line plot
A line plot is a type of graph that is used to represent data points connected by straight lines. It is commonly used to visualize time-series data, where the x-axis represents time, and the y-axis represents the value of a variable.
Creating a line plot in Matplotlib is relatively straightforward. The first step is to create two NumPy arrays (or lists) containing the x-axis and y-axis values.
For example, let’s create two NumPy arrays for the number of hours spent studying and the corresponding test scores.
import numpy as np
study_hours = np.array([2, 3, 4, 5, 6, 7, 8])
test_scores = np.array([60, 70, 75, 80, 90, 95, 98])
Once the arrays are created, we can use pyplot’s plot() function to create a line graph.
plt.plot(study_hours, test_scores)
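The plot() function is called here with two arguments: the x-axis values and the y-axis values. It also accepts optional arguments, such as color and label, that control how the line looks.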
The above code creates a line plot showing the relationship between the number of hours spent studying and the corresponding test scores.
Displaying the plot
Once we have created the plot, we need to display it using pyplot’s show() function.
plt.show()
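The show() function displays the plot on the screen.
In a standalone script, include this call at the end of the code; without it, the figure is created but no window opens. Interactive environments such as Jupyter notebooks usually render the figure automatically.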
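Putting the steps above together, a complete line-plot script might look like the following sketch. The marker style and the axis label text are illustrative choices, not part of the original example, and the show() call is commented out only so the sketch also runs on a machine without a display.

```python
import numpy as np
import matplotlib.pyplot as plt

study_hours = np.array([2, 3, 4, 5, 6, 7, 8])
test_scores = np.array([60, 70, 75, 80, 90, 95, 98])

plt.figure()  # start from a fresh figure

# plot() returns a list with one Line2D object per plotted line.
# marker="o" is an illustrative extra that marks each data point.
lines = plt.plot(study_hours, test_scores, marker="o")

# Illustrative labels so the axes are self-explanatory.
plt.xlabel("Hours studied")
plt.ylabel("Test score")
plt.title("Test score vs. study time")

# In a script, uncomment the next line to open the plot window.
# plt.show()
```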
Conclusion
In this section, we discussed the basics of Matplotlib, a powerful data visualization library in Python. We introduced the most common types of graphs and charts.
We also looked at line plots, how to create them, and how to display them. Data visualization is an essential component of data science, and Matplotlib provides us with a flexible and powerful interface for creating visuals.
Scatter Plot
Scatter plots are used to visualize the relationship between two continuous variables. They are commonly used to identify patterns and trends in the data, such as a positive or negative correlation, outliers, or clustering.
Plotting a scatter plot in Matplotlib is similar to plotting a line plot, with one key difference. Instead of using pyplot’s plot() function, we will use pyplot’s scatter() function.
To create a scatter plot, we need to have two arrays containing the x-axis values and the y-axis values. For example, let’s consider the following data:
import numpy as np
x_values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y_values = np.array([5, 7, 8, 9, 10, 13, 12, 15, 17, 20])
To create a scatter plot, we can use the following code:
plt.scatter(x_values, y_values)
The scatter() function is called here with two arguments: the x-axis values and the y-axis values. It also accepts optional arguments that control the size and color of the points. The above code creates a scatter plot showing the relationship between the x-values and y-values.
Displaying the plot
Once we have created the scatter plot, we need to display it using pyplot’s show() function.
plt.show()
The show() function displays the scatter plot on the screen.
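The scatter plot above suggests a positive correlation, and it can be useful to back the visual impression with a number. The sketch below draws the same points and computes the Pearson correlation coefficient with NumPy's corrcoef(). The marker size (s) and transparency (alpha) values are illustrative assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

x_values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
y_values = np.array([5, 7, 8, 9, 10, 13, 12, 15, 17, 20])

plt.figure()

# s sets the marker area and alpha the transparency (illustrative values).
points = plt.scatter(x_values, y_values, s=40, alpha=0.8)
plt.xlabel("x")
plt.ylabel("y")

# corrcoef() returns a 2x2 matrix; the off-diagonal entry is the
# correlation between x and y. A value close to 1 indicates a strong
# positive linear relationship.
r = np.corrcoef(x_values, y_values)[0, 1]
print(f"Pearson r = {r:.2f}")
```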
Histogram
Histograms are used to represent data distribution. They are commonly used to identify the shape of the data, such as whether it is normally distributed or skewed.
Histograms visualize the data by dividing it into intervals or bins and plotting the number of data points that fall into each bin. Plotting a histogram in Matplotlib is straightforward.
We first need to have an array containing the data points. For example, let’s consider the following data:
import numpy as np
data = np.array([1, 1, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5])
To create a histogram, we can use the following code:
plt.hist(data)
The hist() function needs only one argument: the data array. By default it divides the data into 10 equal-width bins. The above code creates a histogram showing the distribution of the data.
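Besides drawing the bars, hist() also returns the numbers behind them, which is handy for checking what was plotted. The following sketch unpacks that return value; the variable names are our own.

```python
import numpy as np
import matplotlib.pyplot as plt

data = np.array([1, 1, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5])

plt.figure()

# hist() returns the count per bin, the bin edges, and the drawn bars.
counts, bin_edges, patches = plt.hist(data)

# Every data point lands in exactly one bin, so the counts add up
# to the number of data points.
print(int(counts.sum()))

# There is always one more edge than there are bins.
print(len(bin_edges) - len(counts))
```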
Customizing the histogram
Matplotlib’s hist() function provides several parameters that allow us to customize the histogram. Some of the most commonly used parameters are:
-
bins
the number of bins or ranges to divide the data into. A higher number of bins can provide more detail in the distribution.
-
range
the minimum and maximum values of the data that will be included in the histogram.
-
density
if set to True, the bar heights are normalized so that the total area of the histogram equals 1, which makes it an estimate of the probability density function (PDF) of the data.
Here’s an example of how to customize the histogram using the bins parameter:
plt.hist(data, bins=5)
The above code creates a histogram with five bins, which divides the data into five ranges.
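To see what the bins and density parameters do to the numbers, the sketch below draws the histogram twice, once with raw counts and once with density=True, and checks that the normalized version has a total area of 1. The variable names are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

data = np.array([1, 1, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5])

# Five bins: each bar height is the number of points in that range.
plt.figure()
counts, edges, _ = plt.hist(data, bins=5)
print(counts)

# density=True rescales the heights so that the bar areas sum to 1.
plt.figure()
density, density_edges, _ = plt.hist(data, bins=5, density=True)

# Area = sum of (height * bin width) over all bins.
area = np.sum(density * np.diff(density_edges))
print(area)
```

With this data set, each of the five bins ends up holding exactly one of the distinct values 1 through 5.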
Conclusion
In this section, we discussed scatter plots and histograms in Matplotlib. Scatter plots are used to visualize the relationship between two continuous variables, while histograms are used to represent data distribution.
We covered how to create scatter plots and histograms using Matplotlib and how to customize histograms using parameters such as bins. Data visualization is a crucial aspect of data science, and by mastering Matplotlib, we can create powerful visuals that can help us understand and communicate data insights.
Bar Chart
Bar charts are one of the oldest and most popular tools for data visualization. They are mainly used to compare discrete categories of data.
Each category is represented by a rectangular bar, and the height of the bar represents the value of the category. Creating a bar chart in Matplotlib involves three main steps: creating the x-axis and y-axis values, plotting the bars, and customizing the chart.
Plotting a bar chart
To create a bar chart, we first need to have two arrays containing the x-axis and y-axis values. The x-axis values represent the categories, while the y-axis values represent the values for each category.
For example, let’s consider the following data:
import numpy as np
category = np.array(["A", "B", "C", "D", "E"])
values = np.array([20, 35, 30, 25, 45])
To create a bar chart, we can use the following code:
x = np.arange(len(category))
plt.bar(x, values)
plt.xticks(x, category)
plt.ylabel("Values")
plt.title("Bar Chart")
The arange() function creates an array of integers from 0 up to, but not including, the length of the category array (here 0 through 4), which serves as the x-axis positions. The bar() function creates the bars and is called here with two arguments: the x-axis positions and the bar heights.
The xticks() function sets the category labels on the x-axis. The ylabel() function sets the label for the y-axis, and the title() function sets the chart’s title.
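As an alternative to building numeric positions with arange() and relabeling them with xticks(), bar() also accepts the category names directly and places one bar per name. The sketch below shows this shorter form with the same data; it is a variation on the code above, not a replacement for it.

```python
import matplotlib.pyplot as plt

category = ["A", "B", "C", "D", "E"]
values = [20, 35, 30, 25, 45]

plt.figure()

# Passing strings as x values makes Matplotlib treat the axis as
# categorical and label each bar with its category name.
bars = plt.bar(category, values)
plt.ylabel("Values")
plt.title("Bar Chart")

# bar() returns a container holding one rectangle per bar.
heights = [bar.get_height() for bar in bars]
print(heights)
```

The arange() version remains useful when the bar positions need to be adjusted numerically, for example to place several groups of bars side by side.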
Customizing the bar chart
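Matplotlib’s bar() function has several parameters that allow us to customize the chart, and the legend() function has its own parameters for the legend. Some of the most commonly used parameters are:
-
yerr
the size of the error bar drawn on each bar (a bar() parameter).
-
align
the alignment of the bars with their x-axis positions, either "center" (the default) or "edge" (a bar() parameter).
-
width
the width of each bar, 0.8 by default (a bar() parameter).
-
loc
the position of the legend (a legend() parameter).
-
frameon
whether to draw a frame around the legend (a legend() parameter).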
Here’s an example of how to customize the bar chart using the align parameter and the legend function:
plt.bar(x, values, align="center")
plt.xticks(x, category)
plt.ylabel("Values")
plt.title("Bar Chart")
plt.legend(["Values"], loc="upper right", frameon=False)
The above code creates a bar chart with centered bars, which is also the default alignment, and a frameless legend positioned in the upper right corner of the chart.
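The yerr and width parameters from the list above can be combined in the same call. In the sketch below, the error values are hypothetical numbers invented for illustration, and capsize is an additional bar() parameter that draws small caps on the error bars. The legend text is supplied through the label parameter of bar().

```python
import numpy as np
import matplotlib.pyplot as plt

category = np.array(["A", "B", "C", "D", "E"])
values = np.array([20, 35, 30, 25, 45])
errors = np.array([2, 3, 1, 2, 4])  # hypothetical uncertainties

x = np.arange(len(category))

plt.figure()

# width narrows the bars, yerr adds a vertical error bar to each one,
# and capsize draws small caps at the ends of the error bars.
bars = plt.bar(x, values, width=0.6, yerr=errors, capsize=4,
               align="center", label="Values")
plt.xticks(x, category)
plt.ylabel("Values")
plt.title("Bar Chart")

# The legend picks up the text passed to label above.
legend = plt.legend(loc="upper right", frameon=False)
```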
Pie Chart
A pie chart is used to represent the proportion of different categories in a dataset. Each category is represented by a slice or wedge of the pie, and the size of the slice represents the proportion of the category in the data.
Plotting a pie chart
To create a pie chart, we need to have an array containing the values for each category, as well as an array containing the labels for each category. For example, let’s consider the following data:
import numpy as np
values = np.array([20, 35, 30, 25, 45])
labels = np.array(["A", "B", "C", "D", "E"])
To create a pie chart, we can use the following code:
plt.pie(values, labels=labels, autopct="%1.1f%%", shadow=True, startangle=90)
The pie() function takes several arguments – the values and labels arrays, the autopct parameter to format the percentage labels, the shadow parameter to add a shadow effect, and the startangle parameter to set the starting angle of the first slice.
Customizing the pie chart
Matplotlib’s pie() function has several parameters that allow us to customize the chart. Some of the most commonly used parameters are:
-
explode
an array of values to offset the slices from the center of the pie.
-
colors
an array of colors for each slice.
-
autopct
a format string for the percentage labels.
-
shadow
whether to add a shadow effect.
-
startangle
the starting angle of the first slice.
Here’s an example of how to customize the pie chart using the explode and colors parameters and the axis() function:
colors = ["red", "orange", "yellow", "green", "blue"]
explode = [0.1, 0, 0, 0, 0]
plt.pie(values, labels=labels, colors=colors, explode=explode, autopct="%1.1f%%", shadow=True, startangle=90)
plt.axis("equal")
plt.title("Pie Chart")
The above code creates a pie chart with an array of colors for each slice and an array of values to offset the first slice from the center of the pie.
Finally, the axis() function sets the aspect ratio of the chart to “equal,” ensuring that the pie chart is circular.
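The percentages that autopct prints on each slice are simply each value divided by the total. The sketch below computes them with NumPy next to a pie chart drawn from the same data, so the labels on the chart can be checked against the numbers.

```python
import numpy as np
import matplotlib.pyplot as plt

values = np.array([20, 35, 30, 25, 45])
labels = ["A", "B", "C", "D", "E"]

# Each slice's share of the whole, in percent.
percentages = values / values.sum() * 100
for label, pct in zip(labels, percentages):
    print(f"{label}: {pct:.1f}%")

plt.figure()

# The first entry of explode is non-zero, so slice "A" is pulled out.
explode = [0.1, 0, 0, 0, 0]
plt.pie(values, labels=labels, explode=explode,
        autopct="%1.1f%%", startangle=90)
plt.axis("equal")  # keep the pie circular
plt.title("Pie Chart")
```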
Conclusion
In this section, we discussed bar charts and pie charts in Matplotlib. Bar charts are used to compare discrete categories of data, and pie charts are used to represent the proportion of different categories in a dataset.
We covered how to create bar charts and pie charts using Matplotlib and how to customize them using various parameters. By mastering these types of charts, we can create powerful visuals that can help us understand and communicate data insights.
Adding Features to Charts in Matplotlib
Matplotlib provides many features to enhance the look and feel of charts. These features are critical for creating professional-looking charts that are easy to read and comprehend.
In this section, we will discuss how to add labels, colors, and legends to Matplotlib charts.
Adding labels and colors
Adding colors and labels to a Matplotlib chart is as simple as specifying them as parameters. For example, let’s consider the following data:
import numpy as np
import matplotlib.pyplot as plt
x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 9, 11, 13])
To add colors and labels to this chart, we can use the following code:
plt.plot(x, y, color="red", label="Data")
plt.xlabel("x-axis")
plt.ylabel("y-axis")
plt.title("Chart")
plt.legend()
plt.show()
The color parameter specifies the color of the line chart, while the label parameter specifies the text that appears in the legend. The xlabel() and ylabel() functions add labels to the x-axis and y-axis, respectively.
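Colors and labels become more useful once a chart contains more than one line, because the legend then tells the lines apart. The sketch below adds a second series to the example; that series and its label are hypothetical and exist only to illustrate the idea.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 9, 11, 13])
y_shifted = y - 2  # hypothetical second series for illustration

plt.figure()

# Each call to plot() adds one line with its own color and label.
plt.plot(x, y, color="red", label="Data")
plt.plot(x, y_shifted, color="blue", linestyle="--", label="Data minus 2")

plt.xlabel("x-axis")
plt.ylabel("y-axis")
plt.title("Chart")

# legend() collects the labels of all lines drawn so far.
legend = plt.legend()
legend_labels = [text.get_text() for text in legend.get_texts()]
print(legend_labels)
```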
Adding information and legends
Adding legends to Matplotlib charts is simple. The legend() function takes an optional parameter, loc, which specifies the location of the legend.
For example:
plt.legend(loc="upper left")
The loc parameter can take values such as “upper left,” “lower right,” “center,” and others. In addition to adding legends, Matplotlib provides several other features to add information to charts.
For instance, we can add titles to charts by using the title() function:
plt.title("My Chart")
The title text will appear at the top of the chart. Finally, we can remove the frame around the legend by setting the frameon parameter to False:
plt.legend(frameon=False)
This will remove the border around the legend, giving the chart a cleaner look.
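One detail worth keeping in mind is that each call to legend() replaces the legend drawn by the previous call, so options such as loc and frameon are best passed together. A minimal sketch, reusing the x and y data from earlier in this section:

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 9, 11, 13])

plt.figure()
plt.plot(x, y, color="red", label="Data")
plt.title("My Chart")

# Pass both options in a single call; a second legend() call would
# replace this legend rather than add to it.
legend = plt.legend(loc="upper left", frameon=False)

print(legend.get_frame_on())
```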
Plotting using Object-oriented API in Matplotlib
Matplotlib’s object-oriented API provides an alternative syntax to the functional API. It is more flexible and allows for more advanced customization.
The following are the steps to create a graph using Matplotlib’s object-oriented API.
Creating a graph object
The first step in creating a graph object is to call pyplot’s subplots() function. This function creates a figure and an axes object.
fig, ax = plt.subplots()
The fig object represents the entire figure, while the ax object represents a single plot within the figure.
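The distinction between the figure and its axes is easiest to see when a figure holds more than one plot. The sketch below asks subplots() for one row and two columns; the figsize value and the choice of plots are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 9, 11, 13])

# One figure containing two axes side by side.
# With more than one plot, subplots() returns an array of axes.
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(8, 3))

axes[0].plot(x, y, color="red")   # left plot: line
axes[0].set_title("Line")

axes[1].scatter(x, y)             # right plot: scatter
axes[1].set_title("Scatter")

print(len(fig.axes))
```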
Plotting the values
Once we have created the figure and axes objects, we can plot our data on the axes object. For example:
ax.plot(x, y, color="red")
The ax.plot() method works the same way as plt.plot() in the functional API.
It takes the x-values, y-values, and any additional parameters for customizing the line chart.
Displaying the plot
Once we have plotted our data, we can call the figure’s tight_layout() method, which adjusts the padding around the subplots so that labels and titles fit within the figure. The plot is then displayed with plt.show().
fig.tight_layout()
plt.show()
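The object-oriented steps above can be combined into one short script. In this sketch the labels and title are set through the axes methods set_xlabel(), set_ylabel(), and set_title(), which are the counterparts of the pyplot functions used earlier. As an assumption for machines without a display, the figure is written to an in-memory PNG with savefig() and the show() call is left commented out.

```python
import io

import numpy as np
import matplotlib.pyplot as plt

x = np.array([1, 2, 3, 4, 5])
y = np.array([5, 7, 9, 11, 13])

fig, ax = plt.subplots()

# Plot and decorate through the axes object instead of pyplot.
ax.plot(x, y, color="red", label="Data")
ax.set_xlabel("x-axis")
ax.set_ylabel("y-axis")
ax.set_title("Chart")
ax.legend()

# Adjust padding so labels and title fit inside the figure.
fig.tight_layout()

# Render to an in-memory PNG (works without a display).
buffer = io.BytesIO()
fig.savefig(buffer, format="png")

# In a script, uncomment the next line to open the plot window.
# plt.show()
```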