Adventures in Machine Learning

Mastering Data Visualization with Bokeh in Python

Data visualization is the graphical representation of data and information. Because data analysis informs many decisions in organizations, visualization is an essential tool for deriving insights and communicating results to stakeholders.

This article introduces data visualization with Bokeh in Python and guides you through the basics of generating your first visualization.

From Data to Visualization

Data handling is the first step in a visualization project. Pandas and NumPy are popular data manipulation tools in Python.

Clean and transform the data so that it is structured and ready for visualization. With Pandas, you can read, merge, and group data to build a single dataset.
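As a rough illustration of the merge-and-group step, here is a minimal sketch. The two small tables and their column names are hypothetical stand-ins for data you would normally load with a reader such as pd.read_csv().

```python
import pandas as pd

# Hypothetical per-game records; the column names are illustrative only.
games = pd.DataFrame({
    "player_id": [1, 1, 2, 2],
    "points": [20, 30, 10, 14],
    "assists": [5, 7, 2, 4],
})
players = pd.DataFrame({
    "player_id": [1, 2],
    "name": ["Player A", "Player B"],
})

# Merge the two tables on their shared key, then average per player.
merged = games.merge(players, on="player_id", how="left")
per_player = merged.groupby("name", as_index=False)[["points", "assists"]].mean()
```

The result, per_player, has one row per player and is the kind of tidy table that is easy to hand to a plotting library.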

Once the dataset is ready, determine where the visualization will be rendered. Bokeh is a popular Python plotting library that renders visualizations in a web page or a Jupyter notebook.

output_file() directs the visualization to a standalone HTML file that opens in the browser, while output_notebook() renders it inline in a Jupyter notebook. Both require that the Bokeh library is installed.

Set up the Figure(s)

A figure is the container for a visualization: it holds the glyphs, axes, and tools that make up a plot. Use a figure object to customize the visualization with parameters like height, width, title, axis labels, and background color.

The toolbar is another feature of the figure object; it displays the available interaction tools.

Connect to and Draw Your Data

A glyph is a visual symbol that represents data points on a plot. There are several types of glyphs that are used for representing different types of data.

For example, a point marker represents an individual data point and is used to draw scatter plots. Bar glyphs draw vertical or horizontal bars and suit categorical or tabular data.

Line glyphs are used to display data that varies continuously over an interval. Choose the glyph type that best suits your data.
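To make the difference between glyph types concrete, the sketch below draws a bar glyph and a line glyph. It assumes a recent Bokeh release (one where figure() accepts width and height), and the data values are invented for illustration.

```python
from bokeh.plotting import figure

months = ["Jan", "Feb", "Mar"]
sales = [12, 18, 9]

# A categorical x-axis is declared by passing the labels as x_range.
bar_fig = figure(x_range=months, height=250, title="Bars for categories")
bars = bar_fig.vbar(x=months, top=sales, width=0.6)

# A line glyph suits values that vary continuously over an interval.
line_fig = figure(height=250, title="Line for a continuous series")
line = line_fig.line(x=[0, 1, 2, 3], y=[1.0, 1.5, 1.2, 2.0], line_width=2)

# show(bar_fig) or show(line_fig) (from bokeh.plotting) would display them.
```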

Organizing the Layout

After creating your figures, you can arrange them in a layout, such as a grid or a set of tabs, so the viewer can analyze the data from different angles. The hover tool gives the viewer additional information about data points by displaying a tooltip when the mouse moves over a glyph.

Moreover, adding widgets, like sliders, checkboxes, dropdown menus, and radio buttons, can enhance the user experience.
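As one possible example of a widget, the sketch below connects a slider to the marker size of a scatter glyph. It uses js_link(), which wires two properties together in the browser; the data and the slider range are arbitrary choices for illustration.

```python
from bokeh.layouts import column
from bokeh.models import Slider
from bokeh.plotting import figure

fig = figure(height=250, title="Marker size controlled by a slider")
points = fig.scatter(x=[1, 2, 3], y=[2, 5, 3], size=10)

slider = Slider(start=2, end=30, value=10, step=1, title="Marker size")
# js_link copies the slider's value to the glyph's size in the browser,
# so no running Python server is needed.
slider.js_link("value", points.glyph, "size")

# Stack the slider above the figure.
layout = column(slider, fig)
```

Checkboxes, dropdown menus, and radio buttons follow the same pattern, although they usually need a small callback rather than a direct property link.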

Preview and Save Your Beautiful Data Creation

Once you are done customizing and organizing your visualization, the next step is to inspect it visually. The show() function displays the figure in your chosen output.

Save your creation as an HTML file with save(), or as a PNG or SVG image using the export functions in bokeh.io, such as export_png().
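A minimal sketch of saving to HTML follows. The temporary directory and file name are placeholders chosen so the example leaves no clutter behind; in practice you would pick your own path.

```python
import tempfile
from pathlib import Path

from bokeh.plotting import figure, save
from bokeh.resources import CDN

fig = figure(height=250, title="Saved figure")
fig.line(x=[0, 1, 2], y=[0, 1, 4])

# Write a standalone HTML file that loads BokehJS from the CDN.
out_path = Path(tempfile.mkdtemp()) / "saved_figure.html"
save(fig, filename=str(out_path), resources=CDN, title="Saved figure")

# Image export (for example bokeh.io.export_png) additionally requires
# selenium and a browser driver, so it is not shown here.
```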

Generating Your First Figure

Outputting the Visualization

The following is a quick tutorial on generating your first Bokeh visualization. When you use output_file(), the visualization is written to an .html file at the path you provide, which is relative to the current working directory by default.

output_notebook() displays the visualization inline in the notebook's output cell.

Getting Your Figure Ready for Data

Create a figure object to hold and customize our visualization. Initialize the figure object, specify the height and width of our visualization, and set the title and background color.
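A figure set up along these lines could be sketched as follows. The sizes, labels, and color are arbitrary, and the width and height argument names assume a recent Bokeh release.

```python
from bokeh.plotting import figure

fig = figure(
    width=600,
    height=400,
    title="My First Figure",
    x_axis_label="X values",
    y_axis_label="Y values",
    background_fill_color="#f5f5f5",
    toolbar_location="right",  # where the toolbar is drawn
)
```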

Drawing Data with Glyphs

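The next step is to draw the desired glyph with our data. For example, a scatter plot can be drawn by using the circle glyph (recent Bokeh releases recommend the more general scatter() method for markers with a size).

Define each marker's position with x and y, and its appearance with size, fill color, and alpha. Calling the glyph method on the figure object creates the glyph and adds it to the figure in one step.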
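Here is a small sketch of that step with made-up coordinates. It uses scatter(), which draws circular markers by default, rather than circle(), because newer Bokeh versions warn when circle() is given a size.

```python
from bokeh.plotting import figure

x = [1, 2, 3, 4, 5]
y = [6, 7, 2, 4, 5]

fig = figure(width=500, height=350, title="Scatter of sample points")

# The glyph method both creates the glyph and attaches it to the figure.
points = fig.scatter(
    x=x, y=y,
    size=12,             # marker size in screen units
    fill_color="navy",
    line_color="white",
    alpha=0.6,           # applies to both fill and outline
)
```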

A Quick Aside About Data

Finding the right combination of data and visualizations is subjective and context-driven. For example, the dataset used for this tutorial came from the National Basketball Association (NBA) and contained each player’s average points, rebounds, and assists per game from the 2017-18 season to the 2020-21 season.

Different datasets and visualization types can showcase analytical insights from other sectors. Whatever the domain, data visualization is an essential tool for organizations and individuals who want to use data to gain insights and make informed decisions.

So far, this article has covered the basics of generating your first visualization, from data handling and glyph selection to display and organization. With some additional Bokeh techniques, you can enhance the user experience by adding interaction tools and widgets and by exploring other types of datasets.

By following this guide, you are well on your way to showcasing beautiful and informative data visuals.

Using the ColumnDataSource Object

Introducing ColumnDataSource

The ColumnDataSource object is a central feature of the Bokeh library. It is used to connect the source data to the visualization components like the glyph.

Moreover, you can create a ColumnDataSource from several Python data structures, such as dictionaries of lists or NumPy arrays, and Pandas DataFrames. One advantage is that glyphs in different plots can share a single ColumnDataSource, which links them: selecting points in one plot highlights the same rows in the others.
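The sketch below builds a ColumnDataSource from a dictionary and from a DataFrame, then draws a glyph by naming columns. The column names and values are invented for illustration.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

df = pd.DataFrame({"points": [25.0, 12.0, 18.5], "assists": [6.0, 3.0, 9.5]})

# A ColumnDataSource can be built from a dict of columns or from a DataFrame.
source_from_dict = ColumnDataSource(data={"x": [1, 2, 3], "y": [4, 5, 6]})
source = ColumnDataSource(df)

fig = figure(height=300, title="Points vs. assists")
# Glyph arguments name columns of the source instead of passing raw lists.
renderer = fig.scatter(x="points", y="assists", size=10, source=source)
```

When the source comes from a DataFrame, the DataFrame index is added as a column too (named "index" here because the index is unnamed).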

Customizing Glyphs With ColumnDataSource

The ColumnDataSource object can also be used to customize glyphs. For example, you can use it to change the color of each glyph based on the value of a specific column in the data frame.

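If the column already contains color values, pass the column name to the fill_color parameter when creating the glyph. If it contains categories or numbers instead, map it to colors with a helper such as factor_cmap() or linear_cmap() from bokeh.transform. Either way, glyphs take different colors based on the values of the specified column.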
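One way to color glyphs by a categorical column is sketched below with factor_cmap(). The "position" column, its categories, and the palette are hypothetical.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure
from bokeh.transform import factor_cmap

source = ColumnDataSource(data={
    "points": [25.0, 12.0, 18.5, 9.0],
    "assists": [6.0, 3.0, 9.5, 2.0],
    "position": ["Guard", "Center", "Guard", "Forward"],
})

positions = ["Guard", "Forward", "Center"]
palette = ["navy", "olive", "firebrick"]  # one color per category, same order

fig = figure(height=300, title="Color by position")
renderer = fig.scatter(
    x="points", y="assists", size=12, source=source,
    # Map each value in the "position" column to its palette color.
    fill_color=factor_cmap("position", palette=palette, factors=positions),
    legend_field="position",
)
```

If the source already had a column of color names, passing fill_color="that_column" would be enough and no mapper would be needed.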

Leveraging Patch Glyph for Efficiency

The patch glyph draws a single filled polygon from an ordered sequence of vertices, which makes it suited to regions such as shaded areas, bands, or outlines rather than to images or individual points. Suppose you want to shade a region whose boundary is defined by a large number of points.

Instead of drawing many separate shapes, you can pass all the vertices to one patch glyph. The patch is a single object that connects any number of points with line segments and fills the enclosed area, so the whole region is rendered by one glyph. To draw several polygons in one call, use the patches glyph.
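A minimal patch example, with arbitrary vertex coordinates, might look like this:

```python
from bokeh.plotting import figure

# Vertices of one closed region, listed in order around its boundary.
xs = [1, 2, 3, 2]
ys = [1, 3, 1, 0]

fig = figure(height=300, title="A single patch")
region = fig.patch(
    x=xs, y=ys,
    fill_color="lightsteelblue",
    fill_alpha=0.6,
    line_color="navy",
)
```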

Organizing Multiple Visualizations With Layouts

Creating a Grid Layout

Grid layouts provide an excellent way to organize multiple visualizations to create complex and detailed dashboards. The gridplot() function can be used to create a grid of plots arranged in rows and columns.

To use this function, pass it either a list of rows, where each row is a list of plots, or a flat list of plots together with the ncols argument. You can control the width and height of each visualization by setting parameters when creating the plots.

This function works well for presenting multiple plots at once, and by default it merges the individual plot toolbars into a single shared toolbar.
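The following sketch arranges four small placeholder plots in a 2 x 2 grid. The plot contents are arbitrary and serve only to fill the grid.

```python
from bokeh.layouts import gridplot
from bokeh.plotting import figure

plots = []
for i in range(4):
    p = figure(width=250, height=250, title=f"Plot {i + 1}")
    p.line(x=[0, 1, 2], y=[0, i + 1, 2 * (i + 1)])
    plots.append(p)

# A flat list plus ncols wraps the plots into rows of two.
grid = gridplot(plots, ncols=2)

# Equivalent nested form; a None entry would leave a cell empty:
# grid = gridplot([[plots[0], plots[1]], [plots[2], plots[3]]])
```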

Creating a Tabbed Layout

In addition to the grid layout, tabbed layouts can be used to display multiple visualizations in one web page or notebook. The Tabs class creates the tabbed layout, while the Panel class (named TabPanel in Bokeh 3.0 and later) wraps the content of each individual tab.

When creating a Tabs object, pass it a list of panel objects through its tabs argument. Each panel takes a child, such as a single plot or a layout, and a title that is shown on the tab.
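A tabbed layout could be sketched as follows. The try/except import is a convenience assumed here so that the same code runs on both Bokeh 3.x (TabPanel) and Bokeh 2.x (Panel); the two figures are placeholders.

```python
from bokeh.models import Tabs
from bokeh.plotting import figure

try:  # Bokeh 3.x
    from bokeh.models import TabPanel
except ImportError:  # Bokeh 2.x, where the class is named Panel
    from bokeh.models import Panel as TabPanel

scatter_fig = figure(height=300, title="Scatter view")
scatter_fig.scatter(x=[1, 2, 3], y=[3, 1, 2], size=10)

line_fig = figure(height=300, title="Line view")
line_fig.line(x=[1, 2, 3], y=[3, 1, 2], line_width=2)

# Each panel wraps one child and supplies the label shown on its tab.
tabs = Tabs(tabs=[
    TabPanel(child=scatter_fig, title="Scatter"),
    TabPanel(child=line_fig, title="Line"),
])
```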

When building an application dashboard, it often helps to combine a grid layout with a tabbed layout. The grid layout is useful for composing the overall dashboard, and tabbed layouts help to organize the visualizations and user interface components within it.

To recap, the ColumnDataSource object is a central part of the Bokeh library that connects the source data to visualization components such as glyphs. The patch glyph, in turn, draws an entire filled region with a single glyph, however many vertices define it.

Grid and tabbed layouts are the building blocks of a dashboard, and understanding how to use gridplot() and Tabs is key to presenting multiple plots together.

Adding Interaction

Data visualization is all about communicating information to the user. Interaction in data visualization is one way of enabling users to explore the data beyond the static visualizations.

In this section, we will discuss some of the ways to add interaction in Bokeh.

Configuring the Toolbar

The toolbar is a vital part of the visualization as it provides the tools a user needs to interact with it. By default, Bokeh adds a standard set of tools (pan, wheel zoom, box zoom, save, reset, and help), which may not match what your visualization needs.

Customize your toolbar by including only the tools relevant to your visualization. You can do this by passing the tools argument when creating the figure, and you can move or hide the toolbar with the toolbar_location argument.
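For instance, a figure limited to three tools might be set up like this sketch. The choice of tools and the toolbar position are arbitrary.

```python
from bokeh.plotting import figure

fig = figure(
    height=300,
    title="Custom toolbar",
    tools="pan,box_zoom,reset",   # comma-separated tool names
    toolbar_location="below",     # or "above", "left", "right", None to hide
)

# Inspect which tool types ended up on the toolbar.
tool_names = sorted(type(tool).__name__ for tool in fig.toolbar.tools)
```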

Selecting Data Points

Selection is one of the key interactions in data visualization. Bokeh includes several tools for selection, including the lasso select, box select, and tap tools.

The tap tool is an example of the glyph-specific tools. It allows you to highlight data points on a visualization by simply clicking on them.

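Selected points are highlighted automatically, and you can control the highlighting by setting selection and nonselection properties on the glyph, such as selection_color or nonselection_alpha, to change color or transparency. The hover tool works independently of selection: it displays information about whichever data point is under the cursor.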
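A sketch of selection styling is shown below with made-up data. The last line sets a selection from Python purely to illustrate that selections are stored as row indices on the data source; in a browser the viewer would make the selection with the tools.

```python
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

source = ColumnDataSource(data={"x": [1, 2, 3, 4], "y": [4, 1, 3, 2]})

fig = figure(
    height=300,
    title="Select points",
    tools="tap,box_select,lasso_select,reset",
)
renderer = fig.scatter(
    x="x", y="y", size=14, source=source,
    color="gray",
    selection_color="firebrick",   # style applied to selected points
    nonselection_alpha=0.2,        # fade points that are not selected
)

# Selections are stored as row indices of the data source.
source.selected.indices = [0, 2]
```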

Adding Hover Actions

The hover tool (HoverTool) gives the user additional information about a data element by displaying a tooltip when the cursor is over a glyph. By default, the tooltip shows the index and coordinates of the point being hovered over.

The hover tool is useful for displaying specific information about an individual data point. Using its tooltips parameter, you can specify the text or HTML to display when the cursor hovers over a data point, with @column placeholders that pull values from the data source.
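A custom tooltip could be configured along these lines. The player names and statistics are invented, and the tools="" argument simply starts from an empty toolbar so the hover tool is the only one present.

```python
from bokeh.models import ColumnDataSource, HoverTool
from bokeh.plotting import figure

source = ColumnDataSource(data={
    "name": ["Player A", "Player B", "Player C"],
    "points": [25.0, 12.0, 18.5],
    "assists": [6.0, 3.0, 9.5],
})

fig = figure(height=300, title="Hover for details", tools="")
renderer = fig.scatter(x="points", y="assists", size=14, source=source)

# "@column" pulls a value from the data source; "{0.0}" formats the number.
hover = HoverTool(
    renderers=[renderer],
    tooltips=[
        ("Player", "@name"),
        ("Points", "@points{0.0}"),
        ("Assists", "@assists{0.0}"),
    ],
)
fig.add_tools(hover)
```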

Linking Axes and Selections

Linking creates a relationship between two or more plots. For example, sharing a range object between plots links their axes so that they pan and zoom together, and sharing a ColumnDataSource links selections so that points selected in one plot are highlighted in the others, a technique known as linked brushing.

This relationship enables you to manipulate one plot and have the same changes applied to the other linked plots.
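Both kinds of linking can be sketched with two side-by-side plots. The column names and values are hypothetical.

```python
from bokeh.layouts import row
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

source = ColumnDataSource(data={
    "points": [25.0, 12.0, 18.5, 9.0],
    "assists": [6.0, 3.0, 9.5, 2.0],
    "rebounds": [4.0, 11.0, 5.5, 7.0],
})

tools = "pan,wheel_zoom,box_select,reset"
left = figure(width=300, height=300, tools=tools, title="Points vs. assists")
left_points = left.scatter(x="points", y="assists", size=10, source=source)

# Reusing left.x_range links panning and zooming along the x-axis.
right = figure(width=300, height=300, tools=tools,
               title="Points vs. rebounds", x_range=left.x_range)
# Reusing the same source links selections between the two plots.
right_points = right.scatter(x="points", y="rebounds", size=10, source=source)

layout = row(left, right)
```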

Highlighting Data Using the Legend

Adding a legend to a visualization makes it easier for the viewer to interpret the graph and also provides a way of highlighting particular data components. Bokeh legends can be made interactive: setting the legend's click_policy to "hide" or "mute" lets the viewer click a legend entry to hide or fade the corresponding glyphs.

This can be achieved by creating a plot with all your data, giving each glyph a legend label, and then letting viewers click legend entries to focus on the components they want.
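An interactive legend might be set up as in this sketch, which uses two arbitrary series and the "mute" policy.

```python
from bokeh.plotting import figure

x = [0, 1, 2, 3]
fig = figure(height=300, title="Click a legend entry to mute its line")

# muted_alpha sets how faint a line becomes once its entry is clicked.
fig.line(x=x, y=[0, 1, 4, 9], legend_label="Squares",
         line_color="navy", line_width=2, muted_alpha=0.15)
fig.line(x=x, y=[0, 2, 4, 6], legend_label="Doubles",
         line_color="firebrick", line_width=2, muted_alpha=0.15)

fig.legend.location = "top_left"
# "mute" fades a glyph when its entry is clicked; "hide" removes it from view.
fig.legend.click_policy = "mute"
```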

In conclusion, adding interaction is an important aspect of data visualization.

Without interaction, it is hard for viewers to explore the underlying meaning in large amounts of data. Configuring the toolbar, selecting data points, adding hover actions, linking axes and selections, and highlighting data using the legend are all effective ways of adding interaction.

These features improve interactivity in visualizations, which enhances the user experience by providing flexibility in data exploration. This article covered the basics of data visualization using Python’s Bokeh library.

It discussed preparing data, selecting visualization outputs, setting up figures, customizing glyphs with ColumnDataSource, creating layouts with grids and tabs, adding interactions, and more. The importance of data visualization in conveying insights and information was emphasized.

By using features such as the hover and tap tools and interactive legends, you can make visualizations more interactive and informative. Overall, a well-planned approach to data visualization can improve data interpretation, leading to better decision making and communication.