Adventures in Machine Learning

Unleashing the Power of Crosstab Function: Creating Pivot Tables in PostgreSQL

Today’s business world is data-driven, where companies need information to make decisions. When companies have large amounts of data that need organizing, the Pivot table is a lifesaver.

The Crosstab function is a useful tool used in PostgreSQL to create pivot tables that group data according to selected categories. The purpose of this article is to educate readers on the terms Crosstab function, Pivot table, and how to enable and use them in PostgreSQL.

Tablefunc Extension and Pivot Table

PostgreSQL is a free and open-source Relational Database Management System (RDBMS) used by many companies worldwide. RDBMS’s function in a structured manner and create tables that store information.

The tablefunc PostgreSQL extension comes bundled in the package when PostgreSQL is installed. This extension provides several functions that create Pivot tables.

A Pivot table, also known as a Cross-Tabulation table, is a summary table that groups, organizes, and aggregates information based on categories. Pivot tables format data into a way that is easy to read and interpreted.

For illustration purposes, imagine a table with the following data on the sales of different toys made by a toy company.

Name Gender Region Toy Type Units Sold
Tom Male North Puzzle 275
Danny Male South Lego 130
Jennifer Female East Rubik 75
Mike Male West Puzzle 200
Emily Female North Lego 150
William Male South Rubik 50
Leia Female East Puzzle 100
Benny Male West Lego 400

If we wanted to know how many units of each toy type were sold in each region, a Pivot table would be the solution.

With the Crosstab function, we can manipulate tables to display data in this format:

East North South West
Lego 150 0 130 400
Puzzle 100 275 0 200
Rubik 75 0 50 0

The Pivot table quickly shows us the total number of units sold for each toy type in each region.

Enabling and Using the Crosstab Function

To use the Crosstab function, we first need to enable the tablefunc extension in PostgreSQL. To do this, we use the CREATE EXTENSION command:


CREATE EXTENSION IF NOT EXISTS tablefunc;

The tablefunc extension creates a Pivot table-like query using the Crosstab function.

However, we need to specify certain parameters for the Crosstab function to work correctly. The Crosstab function requires a SELECT statement that returns four columns, which include a category column, a column showing the values we want to pivot, and the creates the column headings for the Pivot table.

Using the sales data example above, below is an example of a query that can create the Pivot table:


SELECT *
FROM crosstab(
'SELECT "Toy Type",
"Region",
SUM("Units Sold")
FROM sales
GROUP BY 1,2
ORDER BY 1,2',
'SELECT DISTINCT "Region"
FROM sales
ORDER BY 1'
) AS sales("Toy Type" varchar, "East" int, "North" int, "South" int, "West" int);

In the Crosstab function, the SELECT statement returns two columns with “Toy Type” and “Region” data, which forms the categories we want to pivot our data by. The third column, “Units Sold” that we want values for, to summarize in the Pivot table.

The fourth column is the column heading, “Region”, which serves as a reference to create column headings in the Pivot table. The final query returns a table with “Toy Type” and column headings on all the regions with the total units sold.

Conclusion

The Crosstab function is an essential tool for organizing and presenting information in a Pivot table format. As this article has shown, the Crosstab function requires a certain structure for the SELECT statement, which is vital to providing useful information.

Keep in mind that the Crosstab function also allows several helpful options for customizing the Pivot table, like ordering data and some more advanced formations. Hopefully, the information in this article has given you a good foundation for using the Crosstab function in PostgreSQL.

Creating Multiple Pivot Tables from One Data Set

The Crosstab function provides a simple way of creating Pivot tables. However, in some cases, we may need to create more than one Pivot table from the same data set.

This can save time and effort and gives us broader insight into the data, as we can analyze different aspects of the same data set.

In this article expansion, we will explore different examples of creating multiple Pivot tables from one data set, using the Crosstab function.

Example 1: Monthly Evaluation Averages

Suppose we manage a company, and we have a monthly evaluation report for our employees. We want to create a Pivot table that shows the average evaluation score for each month of the year.

In the previous section, we used sales data as an example. The general structure of the Crosstab function creates a template that is broad enough to work with evaluations:


SELECT *
FROM crosstab(
'SELECT extract(MONTH FROM "Evaluation Date") as month,
AVG("Evaluation Score")
FROM employee_evaluations
GROUP BY 1
ORDER BY 1',
'SELECT DISTINCT extract(MONTH FROM "Evaluation Date")
FROM employee_evaluations
ORDER BY 1'
) AS monthly_evaluations("Month" int, "Average Score" decimal);

In this example, the first query extracts the month from the “Evaluation Date” column and calculates the average score. The second query provides a unique list of months used in the Pivot table headings.

When creating a new Pivot table from the same data set, adjust the SELECT statement queries in the Crosstab function to reflect the information required.

Example 2: Handling Incomplete Data

When working with Crosstab functions, one challenge we may face is missing categories.

Imagine we have sales data for four regions: North, East, South, and West. However, due to data errors, we have incomplete data that excludes the South region.

When creating a Pivot table, we still want to include the South region, even if there is no sales data for that region at the moment.

To handle this, we can add a second parameter to the Crosstab function that specifies the expected column headings.

Here is an example of how one could modify the Crosstab function query to include all four regions, even when there is incomplete data:


SELECT *
FROM crosstab(
'SELECT "Toy Type",
"Region",
SUM("Units Sold")
FROM sales
GROUP BY 1,2
ORDER BY 1,2',
'SELECT "Region"
FROM (VALUES ('North'),('East'),('South'),('West'))
AS pivoted ("Region")'
) AS monthly_evaluations("Toy Type" varchar, "North" int, "East" int, "South" int, "West" int);

In this example, we provide the four regions as an array of VALUES with the expected column heading in double inverted commas. The query converts the four headings into a subquery that is used to pivot the table, even when there are no sales for the South region.

The outcome is a Pivot table with each region as a column heading that we can work with.

Exercise: Creating Pivot Table from Raw Data

To further understand how to use the Crosstab function in PostgreSQL, let’s do a quick exercise.

We will use sample weather data over 12 months, gathered from three different cities, to create a Pivot table showing the maximum temperature each month for every city.

Here is a sample table of raw weather data:

City Date Max Temperature
New York 2021-01-01 3C
New York 2021-02-01 2C
New York 2021-03-01 5C
Toronto 2021-01-01 -2C
Toronto 2021-02-01 -3C
Toronto 2021-03-01 0C
San Diego 2021-01-01 15C
San Diego 2021-02-01 14C
San Diego 2021-03-01 18C

To create the Pivot table, we need to extract the month from the “Date” column and create separate columns for each city.

Here’s how we can use the Crosstab function to achieve this:


SELECT *
FROM crosstab(
'SELECT extract(MONTH FROM "Date") as month,
"City",
MAX("Max Temperature")
FROM weather_data
GROUP BY 1,2
ORDER BY 1,2',
'SELECT DISTINCT "City"
FROM weather_data
ORDER BY 1'
) AS monthly_weather("Month" int, "New York" varchar, "Toronto" varchar, "San Diego" varchar);

In this query, we extract the month from the date using the extract function, and we select and group the information by month and city. We then use the Crosstab function, specifying the SELECT statement as the first parameter and the column headings as the second parameter.

The final output of the Pivot table shows the maximum temperature for each month in each respective city.

Conclusion

In conclusion, the Crosstab function in PostgreSQL allows for the creation of pivot tables that display essential information from data sets. With the Crosstab function queries, we can create different styles of Pivot tables, using the same data sets without making any changes in the original data format.

In this article extension, we learned how to create multiple Pivot tables from one data set and handled incomplete data when creating Pivot tables. We also performed an exercise that involved creating a Pivot table from raw data to provide another example that readers could use to further understand the Crosstab function.

In conclusion, the Crosstab function in PostgreSQL is a useful tool for creating Pivot tables that organize data according to selected categories. This article emphasized the importance of enabling and using the Crosstab function, including the tablefunc extension, setting restrictions and parameters for creating Pivot tables, and creating multiple Pivot tables using the same data set.

The importance of handling incomplete data was also highlighted. By understanding the Crosstab function, users can create pivot tables that are essential for decision-making purposes and highlighting insights that might not otherwise be visible.

Overall, working with Crosstab functions is a must-have skill set for anyone working with data analysis and management.

Popular Posts