Adventures in Machine Learning

Mastering Recursive CTEs in SQL: Empowering Your Data Analysis Skills

Common Table Expressions (CTEs): A Comprehensive Guide

What are CTEs?

If you’re a database developer, chances are you’ve come across Common Table Expressions (CTEs). CTEs are a powerful tool that can help you write more efficient and readable SQL queries.

A Common Table Expression is a named temporary result set that is defined within the context of a single SQL query and encapsulated within a WITH clause. It works like a subquery that can be referred to multiple times within a single query. CTEs are also known as “WITH query” syntax in SQL.

Types of CTEs

A CTE can be either non-recursive or recursive. Non-recursive CTEs are straightforward. They are used to define a temporary result set that can be reused within a single query. Recursive CTEs are slightly more complex, but also more powerful. They are used to build hierarchies or recursive sequences within a table.

Non-Recursive CTE Syntax

Syntax

Non-recursive CTEs can be written using the following syntax:


WITH CTE_name (column_list) AS (SELECT column_names FROM table_name WHERE condition)

The CTE_name is the name of the CTE, while column_list is an optional list of columns that define the temporary result set. The SELECT statement at the end of the query defines the data that will be contained within the CTE.

Recursive CTE Syntax

Syntax

Recursive CTEs are a bit more complex, but they provide a lot of functionality. Recursive CTEs are used to define hierarchies within a table.


WITH RECURSIVE CTE_name (column_list) AS (
SELECT column_names FROM table_name WHERE condition
UNION ALL
SELECT column_names FROM CTE_name WHERE condition
)

The initial SELECT statement selects the initial data set from which the hierarchy will be constructed, while the second SELECT statement recursively selects data based on the previous result set from the CTE. Note the use of the UNION ALL keyword between the two SELECT statements. This is crucial for recursive CTEs.

Examples of Recursive CTEs

Recursive CTEs can be used to build complex hierarchies within tables. For example, imagine you have an employee table with columns for employee_id, employee_name, and manager_id. You could use a recursive CTE to define the reporting structure of employees within your organization.

Example 1 – Finding Bosses and Hierarchical Level for All Employees

Using Data from the Employee Table

The employee table contains the following data:

employee_id employee_name manager_id
1 James NULL
2 Sally 1
3 Tom 2
4 Susan 2
5 Bob NULL

Writing the Recursive CTE to List Employees and Direct Bosses

Here’s the SQL query to list all employees and their direct bosses, along with the hierarchical level of each employee:


WITH RECURSIVE boss_hierarchy AS (
SELECT employee_name, manager_id, 1 AS hierarchy_level
FROM employee
WHERE manager_id IS NULL
UNION ALL
SELECT employee.employee_name, employee.manager_id, boss_hierarchy.hierarchy_level + 1
FROM employee
JOIN boss_hierarchy ON employee.manager_id = boss_hierarchy.employee_id
)
SELECT * FROM boss_hierarchy;

Let’s break this query down and see how it works.

  • We start by defining a recursive CTE called “boss_hierarchy”.
  • We select the columns we want to include in our final result set: employee_name, manager_id, and hierarchy_level. We set the initial value of hierarchy_level to 1.
  • In the first SELECT statement, we select all employees who don’t have a manager (i.e., those with a NULL value in the manager_id column). We set their hierarchy_level to 1, since they are at the top of the hierarchy.
  • In the second SELECT statement, we join the employee table back to the boss_hierarchy CTE on the manager_id column. This will give us all employees who have a direct manager and their corresponding boss from the previous level of the hierarchy.
  • We increment the hierarchy_level by 1 for each level of the hierarchy.

And that’s it! We’ve constructed a recursive CTE that outputs a hierarchical list of all employees and their bosses, along with the hierarchical level of each employee.

Example 2 – Finding the Investment Amount by Investor

Using Data from the Investment Table

Let’s say we have an investment table that contains the following data:

investment_id investment_name investment_amount number_of_investors
1 Investment A 500,000 2
2 Investment B 750,000 3
3 Investment C 1,000,000 4

Calculating the Amount per Investor Depending on Their Number

We want to calculate the investment amount per investor for each investment in the table. Since the number of investors can vary for each investment, we’ll use a recursive CTE to do so.

Here’s the SQL query to calculate the amount per investor:


WITH RECURSIVE investment_analysis AS (
SELECT investment_id, investment_name, investment_amount, number_of_investors, investment_amount / number_of_investors AS individual_amount
FROM investment
WHERE number_of_investors > 0
UNION ALL
SELECT investment_id, investment_name, investment_amount, number_of_investors - 1, investment_amount / (number_of_investors - 1) AS individual_amount
FROM investment_analysis
WHERE number_of_investors > 1
)
SELECT * FROM investment_analysis;

Let’s break down this query step-by-step:

  • We start by defining a recursive CTE called “investment_analysis”.
  • In the first SELECT statement, we select all investments with a positive number of investors and calculate the investment amount per investor for each investment. We store the result in a new column called “individual_amount”.
  • In the second SELECT statement, we recursively select from the investment_analysis CTE, subtracting one from the number of investors for each row. We also recalculate the investment amount per investor based on the new number of investors.
  • We keep repeating the second SELECT statement until the number_of_investors is 1, at which point the recursion stops.
  • Finally, we select all rows from the investment_analysis CTE to display the investment_id, investment_name, investment_amount, number_of_investors, and individual_amount for each investment.

By using the recursive CTE, we can calculate the investment amount per investor for each investment in the table, regardless of the number of investors.

Example 3 – Finding Routes Between Cities

Using Data from the Cities_Route Table

Let’s say we have a cities_route table that contains the following data:

start_city end_city distance
Groningen Assen 30
Assen Zwolle 60
Zwolle Amersfoort 50
Amersfoort Utrecht 25
Utrecht Amsterdam 20
Amsterdam Haarlem 25

Finding All Possible Routes from Groningen to Haarlem

We want to find all possible routes from Groningen to Haarlem, based on the distance data in the cities_route table. To do so, we’ll use a recursive CTE.

Here’s the SQL query to find all possible routes:


WITH RECURSIVE route_analysis AS (
SELECT start_city, end_city, distance, CAST(start_city AS TEXT) AS route
FROM cities_route
WHERE start_city = 'Groningen'
UNION ALL
SELECT cities_route.start_city, cities_route.end_city, cities_route.distance, route_analysis.route || ' -> ' || cities_route.end_city
FROM route_analysis
JOIN cities_route ON route_analysis.end_city = cities_route.start_city
)
SELECT * FROM route_analysis WHERE end_city = 'Haarlem' ORDER BY distance;

Let’s break down this query step-by-step:

  • We start by defining a recursive CTE called “route_analysis”.
  • In the first SELECT statement, we select all routes that start in Groningen. We also create a new column called “route” that will store the route concatenated as a string.
  • In the second SELECT statement, we recursively select from the route_analysis CTE, joining on the cities_route table where the start_city of the new row matches the end_city of the previous row. We also concatenate the end_city of the new row to the existing route string.
  • We keep repeating the second SELECT statement until we reach the end_city of Haarlem, at which point the recursion stops.
  • Finally, we select all routes that end in Haarlem and sort them by distance.

By using the recursive CTE, we can find all possible routes from Groningen to Haarlem, based on the distance data stored in the cities_route table.

Learning More About Recursive CTEs

If you’re interested in learning more about recursive CTEs in SQL and further developing your database management and analysis skills, there are several online courses and learning opportunities available.

Recursive Queries Course

One excellent resource for learning more about recursive queries in SQL is the Recursive Queries course on the online learning platform, Udemy. The course is taught by a SQL and database expert, and it offers a comprehensive introduction to recursive CTEs, their syntax, and how they can be used to solve complex analytical problems.

The course covers a wide range of topics related to recursive queries, including:

  • Understanding the structure of recursive queries and how they differ from other types of SQL queries.
  • Building recursive CTEs to solve common problems, such as hierarchies, relationships, and sequences.
  • Using recursive CTEs to optimize queries and improve performance.
  • Creating complex and powerful queries using recursion and other advanced SQL techniques.

The course is suitable for beginners and intermediate students who are interested in learning how to use recursive queries and CTEs in SQL. It includes a mix of video lectures, exercises, and quizzes to help you test your knowledge and master the material.

Advanced SQL Course Track

If you’re interested in taking your SQL skills to the next level and learning more advanced SQL techniques, you may want to consider pursuing an advanced SQL course track. There are several online learning platforms and courses that offer advanced training in SQL, including topics such as window functions, GROUP BY extensions, and beyond.

Some popular advanced SQL courses and training programs include:

  • SQL for Data Analysis – This course, offered by Udacity, is designed for SQL beginners who want to build advanced analytic skills using SQL. It covers a wide range of topics, including SQL syntax, window functions, and advanced SQL techniques such as recursive CTEs.
  • Advanced SQL for Business Intelligence – This course, offered by Pluralsight, is designed for experienced SQL developers who want to take their skills to the next level. It covers advanced SQL topics such as window functions, subqueries, and GROUP BY extensions, and provides hands-on exercises and real-world examples to help you master the material.
  • Advanced SQL Course – This course, offered by Khan Academy, is designed for intermediate SQL learners who want to build advanced SQL skills for data analysis and management. It covers a wide range of advanced SQL topics, including CTEs, window functions, and advanced subqueries.

By taking an advanced SQL course or training program, you can develop the skills and knowledge you need to tackle more complex data management and analysis problems, and become a more versatile and valuable member of your organization’s data team.

Conclusion

Mastering recursive CTEs in SQL is a powerful tool that can help you to build increasingly complex and powerful queries and become a more skilled and valuable database developer. By taking advantage of online courses and training opportunities, such as the Recursive Queries course on Udemy or advanced SQL course tracks, you can deepen your understanding of CTEs and other advanced SQL techniques and take your skills to the next level.

In conclusion, the article explores the benefits of Common Table Expressions (CTEs), with a particular focus on recursive CTEs. Recursive CTEs are a powerful tool that enables developers to build complex hierarchies and perform advanced data analysis. The examples provided demonstrate how they can be used to perform calculations and find routes between cities.

Enrolling in online courses such as the Recursive Queries course on Udemy or the Advanced SQL course track can deepen understanding and proficiency in using CTEs in SQL. Mastering recursive CTEs in SQL helps developers to build better, more efficient queries and become more valuable members of their organizations’ data teams.

Popular Posts