Adventures in Machine Learning

Organize and Simplify SQL Queries with Common Table Expressions

Common Table Expressions: Improving Query Organization and Readability

Are you tired of complicated and unreadable SQL queries? Look no further than common table expressions (CTEs).

CTEs serve as a temporary result set in a WITH query, allowing for convenient and logical organization of query logic. In this article, we will explore the syntax and benefits of CTEs, as well as provide a number of business use cases for this powerful tool.

What are Common Table Expressions?

A common table expression (CTE) is a temporary result set that can be defined in the FROM clause of a SELECT, UPDATE, INSERT, or DELETE statement.

CTEs are defined using the WITH query, which begins with the keyword “WITH” followed by the name of the CTE, and the body of the CTE.

Here is an example of the syntax for a CTE:

WITH cte_name (column1, column2, ...) AS (
    SELECT column1, column2, ...
    FROM table_name
    WHERE condition
  )
  SELECT *
  FROM cte_name;

CTEs are incredibly useful for organizing and breaking down complex queries. By gathering all relevant data into a temporary table, we can more easily understand, test and modify our queries.

Why Choose CTEs Over Subqueries?

Many developers opt to use subqueries instead of CTEs, but they often come with a few downsides.

Subqueries can quickly become confusing and unreadable and can also lead to performance issues.

On the other hand, CTEs provide several benefits, such as improved query organization, readability, and better alignment with human logic.

With CTEs, we can write queries that are much easier to read and maintain, allowing us to devote our time to other tasks. CTE’s also offer the unique advantage of easily handling recursive queries, where a query refers back to itself.

Due to its unique syntax and table structure, CTEs can help developers think through these recursive queries and execute them with precision.

Business Use Cases for CTEs

Example 1: Calculating Average Bonus for Each Employee Position

Let’s assume we have a bonus_jan table with information about employee bonuses for a retail bank.

We want to calculate the average bonus for each position using a CTE.

First, we’ll define the CTE called ‘avg_position‘:

WITH avg_position AS (

    SELECT position, AVG(bonus) AS avg_bonus
    FROM bonus_jan
    GROUP BY position
  )
  SELECT *
  FROM avg_position;

By breaking up our query into distinct CTEs, we can clearly understand each step of our calculation, increasing readability and reducing potential errors.

Example 2: Using Multiple CTEs to Compare Bonus Averages

In this case, we’ll extend the previous example and compare the bonus for each employee to the average for their position and region.

We’ll do so by using multiple CTEs. We start by defining the CTEs named ‘avg_position‘ and ‘avg_region‘:

WITH avg_position AS (

    SELECT position, AVG(bonus) AS avg_bonus
    FROM bonus_jan
    GROUP BY position
  ), 
  
  avg_region AS (
    SELECT region, AVG(bonus) AS avg_bonus
    FROM bonus_jan
    GROUP BY region
  )

Next, we join the two CTEs with the inner join:

WITH avg_position AS (

    ... ), 
  
  avg_region AS (
    ... )
  SELECT b.employee_id, b.position, b.region, b.bonus, 
    ap.avg_bonus AS avg_pos_bonus, ar.avg_bonus AS avg_reg_bonus
  
  FROM bonus_jan b
  INNER JOIN avg_position ap ON b.position = ap.position
  INNER JOIN avg_region ar ON b.region = ar.region;

By using CTEs, we can easily break down our query, making it more readable and straightforward.

Example 3: Evaluating Performance of Outlets in Retail Network

This example involves a bit of nested CTEs. Let’s say we have a sales table containing information about the sales performance of outlets in a retail network.

We want to evaluate the average performance of each outlet, as well as determine which outlets had the minimum and maximum bonuses. We’ll begin by defining a CTE named ‘avg_per_outlet‘, which calculates the average monthly bonus for each outlet:

WITH avg_per_outlet AS (

    SELECT outlet_id, AVG(bonus) AS avg_bonus
    FROM sales
    GROUP BY outlet_id
  )

Next, we’ll use nested CTEs to determine which outlet had the minimum and maximum bonuses. We start by defining a separate CTE called ‘min_bonus_outlet‘, which calculates the minimum bonus value for each outlet:

WITH avg_per_outlet AS (

    ... ), 
  
  min_bonus_outlet AS (
    SELECT outlet_id, MIN(bonus) AS min_bonus
    FROM sales
    GROUP BY outlet_id
  )

Then, we define another CTE called ‘max_bonus_outlet‘, which calculates the maximum bonus value for each outlet:

WITH avg_per_outlet AS (

    ... ), 
  
  min_bonus_outlet AS (
    ... ), 
  
  max_bonus_outlet AS (
    SELECT outlet_id, MAX(bonus) AS max_bonus
    FROM sales
    GROUP BY outlet_id
  )

Finally, we join these three CTEs using CROSS JOIN to obtain the desired results:

WITH avg_per_outlet AS (

    ... ), 
  
  min_bonus_outlet AS (
    ... ), 
  
  max_bonus_outlet AS (
    ... )
  SELECT a.outlet_id, a.avg_bonus, b.min_bonus, c.max_bonus 
  
  FROM avg_per_outlet a 
  CROSS JOIN min_bonus_outlet b 
  CROSS JOIN max_bonus_outlet c
  WHERE a.outlet_id = b.outlet_id AND a.outlet_id = c.outlet_id;

Through the use of nested CTEs and CROSS JOIN, we can break down a complex query into logical smaller parts that are easier to understand and modify.

Conclusion

In many business scenarios, large datasets and complex queries are the norm. However, by utilizing common table expressions, we can make our queries more readable, easier to understand, and maintainable.

By introducing the concept and benefits of CTEs, along with tangible examples, we hope we have laid the groundwork for you to explore their usefulness in your own projects and queries.

3) Recursive CTEs

Recursive CTEs are a type of CTE that allow queries to reference itself, enabling hierarchical queries. Hierarchical data structures are ubiquitous in business use cases – for example, a company’s organizational chart, a product’s category hierarchy, or a social network’s graph.

Before the advent of recursive CTEs, it was challenging to work with hierarchical data as queries required many self-joins or customized solutions. However, using a Recursive CTE, we can transform this spaghetti code mess into something that is much more straightforward, easy to understand, and even elegant.

A recursive CTE is a type of common table expression that references itself within its definition. It is defined using a UNION operator that combines two queries, one of which is the original query, and the other recursively refers to the CTE:

WITH RECURSIVE cte_name (column1, column2, ...) AS (
    SELECT column1, column2, ...
    FROM table_name
    WHERE condition
    UNION
    SELECT column1, column2, ...     FROM cte_name
    WHERE condition
  )
  SELECT *
  FROM cte_name;

Recursive CTEs work by breaking down a query into a base case and a recursive case.

The base case refers to the query that provides the initial data, while the recursive case refers to the query that cites the CTE, generating new rows until the statement is false.

Recursive CTEs are widely used in PostgreSQL and Oracle databases to apply recursive queries to hierarchical data.

A great example of hierarchical data that Recursive CTEs in Oracle can be used for is the employee data in a company.

To understand Recursive CTEs better, it is often helpful to practice with interactive exercises, explore various examples, and gain a deeper understanding of the concept.

LearnSQL.com offers a free course on Recursive Queries that is perfect for learning more about how to use Recursive CTEs in query building. In the Recursive Queries course, students will tackle real-world examples and learn how to use Recursive CTEs to solve challenging problems.

They are guided through the interactive exercises that provide instant feedback and help build a strong understanding of how Recursive CTEs work.

4) Reminder to Practice

To recap, Common Table Expressions (CTEs) allow for more organized, readable, and maintainable queries, as well as improved alignment with human logic. By utilizing CTEs, we can unlock our potential to write efficient SQL queries that maximize the value of the underlying databases.

Recursion introduces a powerful feature that extends the reach of CTEs by enabling hierarchical queries. Recursive CTEs provide a simple, elegant solution to problems that are notoriously difficult to solve.

We recommend the LearnSQL course on Recursive Queries which provides interactive exercises to improve your proficiency with the concept. At this point, it’s essential to keep in mind that the best way to develop proficiency with CTEs is through practice.

Learners should take the time to experiment with CTEs, develop projects that utilize them, and solve complex problems.

In conclusion, we hope this article serves as an excellent resource to help learners better understand the advantages of Common Table Expressions and Recursive CTEs, and the importance of practice for anyone looking to improve their SQL skills.

In summary, Common Table Expressions (CTEs) are powerful tools that help organize and simplify SQL queries. They provide several benefits such as improved query organization, readability, and better alignment with human logic.

Recursive CTEs, a type of CTE, enable hierarchical queries, which are essential for working with hierarchical data found in various business scenarios. By practicing with CTEs, developers can build their proficiency and unlock the potential of their databases.

It is vital to understand the advantages of CTEs and Recursive CTEs for anyone looking to improve their SQL skills and develop more efficient query solutions.

Popular Posts