Adventures in Machine Learning

Mastering Common Table Expressions (CTEs) in SQL: A Comprehensive Guide

When working with SQL queries, there are instances where you need to perform multiple operations on different datasets before combining the results to produce the final output. For example, you may want to filter records based on certain criteria, group them by certain attributes, and then apply an aggregate function to get the desired output.

Common Table Expressions (CTEs)

This is where Common Table Expressions (CTEs) come in handy. CTEs allow you to define a temporary result set within a SQL query that you can then reference multiple times in subsequent operations.

This article explores the basics of CTEs, including their definition, types, and syntax.

1. What is a CTE?

A Common Table Expression (CTE) is a temporary named result set that you can reference multiple times in a SQL statement.

The most common use of CTEs is to simplify complex queries that involve multiple operations on a single dataset. CTEs are defined using the WITH statement, followed by the expression_name, which is the identifier for the CTE.

2. The Syntax for Defining a CTE

The syntax for defining a CTE looks like this:

WITH expression_name AS (subquery)

The subquery includes the SELECT statement that defines the result set of the CTE. For example, let’s say we have a table called “sales” that contains data on sales made by different employees in a company.

We can create a CTE that filters out sales made by employees who are no longer active in the company, like this:

WITH active_sales AS ( SELECT * FROM sales WHERE employee_status = 'active' )

We can then reference the “active_sales” CTE in subsequent operations, such as grouping and aggregating the data to get the total sales made by each active employee.

3. Types of CTEs

There are two types of CTEs: non-recursive CTEs and recursive CTEs.

  • Non-recursive CTEs are the most common type of CTE. They are defined using a simple SELECT statement that does not reference the CTE itself.
  • The subquery in a non-recursive CTE is evaluated only once and the result set is used in subsequent operations. Recursive CTEs, on the other hand, are used to define a result set that references itself.

Recursive CTEs are useful when you need to perform operations on hierarchies or tree-like structures. For example, let’s say we have a table called “employees” that contains data on the employees in a company, including their ID, name, and manager ID.

We can use a recursive CTE to construct a hierarchy of employees based on their reporting structure. The CTE would look something like this:

WITH employee_hierarchy (id, name, manager_id, level) AS ( SELECT id, name, manager_id, 0 FROM employees WHERE manager_id IS NULL -- select top-level employees UNION ALL SELECT e.id, e.name, e.manager_id, level + 1 FROM employees e INNER JOIN employee_hierarchy eh ON e.manager_id = eh.id )

The first SELECT statement selects all top-level employees (those who do not have a manager). The UNION ALL operator combines the result set with the result set of the second SELECT statement, which selects all employees who report to the top-level employees.

The JOIN condition ensures that only the direct reports of each manager are selected. The “level” column is used to keep track of the depth of each employee in the hierarchy.

We can then use this hierarchy to perform operations such as aggregating the total number of employees at each level, or calculating the total salary of all employees in the hierarchy.

4. Syntax of CTEs

Basic CTE syntax

As mentioned earlier, CTEs are defined using the WITH statement, followed by the expression_name and the subquery that defines the result set of the CTE. Here is an example of a basic CTE syntax:

WITH expression_name AS (subquery)

Using CTEs in SELECT queries

Once you have defined a CTE, you can reference it in subsequent SQL statements just like you would any other table or view. Here is an example of a SELECT query that references a CTE:

WITH active_sales AS ( SELECT * FROM sales WHERE employee_status = 'active' ) SELECT employee_name, SUM(sales_amount) as total_sales FROM active_sales GROUP BY employee_name

Note that in this example, we are grouping the data by the “employee_name” column, which is a column in the “sales” table, not the “active_sales” CTE. This is because the CTE only defines a subset of the “sales” table, and the subsequent SELECT statement operates on this subset.

Using multiple CTEs in a query

You can use multiple CTEs in a single SQL statement by separating them with commas, like this:

WITH cte1 AS (subquery1), cte2 AS (subquery2), cte3 AS (subquery3) SELECT

Note that you can also omit the WITH statement when defining multiple CTEs in a query, like this:

WITH cte1 AS (subquery1), cte2 AS (subquery2) SELECT

This syntax can make your SQL queries easier to read and understand, particularly if you are working with complex hierarchical structures or other advanced data models.

5. Benefits and Uses of CTEs

Advantages of using CTEs

  • CTEs can improve the readability and maintainability of SQL code by allowing you to define named result sets that can be referenced throughout the code. This makes it easier to break down complex queries into smaller, more manageable parts.
  • Using CTEs provides you with more flexibility in how you structure your SQL code. By separating out different parts of the query into separate CTEs, you can experiment with different combinations and refine your data model without having to rewrite your entire SQL statement.
  • Using CTEs also allows you to reuse the same result set multiple times within a query. This can be particularly useful when working with complex hierarchical structures or when performing multi-level aggregation.

Common uses for CTEs

CTEs are commonly used for hierarchical structures, providing an efficient way to organize complex queries into manageable sections and are ideal for performing multi-level aggregation on large data sets.

6. Examples of CTEs

Example 1: Calculating average grades for exceptional students

Let’s say we have a database with tables for students, subjects, and their respective exams. We want to find out the average grades of students who have scored above a certain threshold in all subjects.

WITH exceptional_students AS ( SELECT s.student_id, AVG(e.exam_score) AS avg_score FROM students s JOIN exams e ON s.student_id = e.student_id GROUP BY s.student_id HAVING MIN(e.exam_score) > 90 ) SELECT student_id, avg_score FROM exceptional_students;

In this example, we’re using a CTE to first filter out students who have scored less than 90 in any subject. We then group the result set by student_id and calculate the average score for each student.

Finally, we select the student_id and avg_score columns from the CTE.

Example 2: Finding subjects with passing grades and showing their averages

Let’s say we have a similar database with tables for students, subjects, and exams. This time we want to find out the average grade for each subject that has at least one student who has passed the subject.

WITH passing_subjects AS ( SELECT subject_id FROM exams WHERE exam_score >= 60 GROUP BY subject_id ) SELECT s.subject_id, AVG(e.exam_score) AS avg_score FROM subjects s JOIN exams e ON s.subject_id = e.subject_id WHERE s.subject_id IN (SELECT subject_id FROM passing_subjects) GROUP BY s.subject_id;

In this example, we’re using a CTE to first find all subjects where at least one student has passed the subject. We then join the subjects table with exams table and filter the result set by passing subjects’ IDs. Finally, we calculate the average score for each subject and display it in the output.

Example 3: Using CTEs for multi-level aggregation

In addition to recursive and non-recursive CTEs, we can also use CTEs to perform multi-level aggregation. This is particularly useful when working with hierarchical structures like trees or graphs.

WITH sales_hierarchy AS ( SELECT manager_id, employee_id, sales_amount FROM sales WHERE manager_id = 1 -- top-level manager UNION ALL SELECT sh.manager_id, s.employee_id, s.sales_amount FROM sales s JOIN sales_hierarchy sh ON s.manager_id = sh.employee_id ) SELECT manager_id, SUM(sales_amount) AS total_sales FROM sales_hierarchy GROUP BY manager_id;

In this example, we’re using a CTE to first select all sales made by employees who report directly to a top-level manager (manager_id = 1). We then join the sales_hierarchy CTE with the sales table to select the sales made by employees who report to the first-level employees.

This process can be repeated to construct multi-level hierarchies.

7. Conclusion and Further Resources

In this article, we’ve explored the basics of Common Table Expressions (CTEs) in SQL.

We’ve seen how CTEs can simplify complex queries that involve multiple operations on a single dataset, and how they can be used to perform hierarchical and recursive operations. We’ve also covered the syntax and types of CTEs and provided examples of their applications.

To recap, a CTE is a temporary named result set that you can reference multiple times within a SQL statement. It is defined using the WITH statement, followed by the expression_name and the subquery that defines the result set of the CTE.

CTEs come in two types: non-recursive and recursive. Non-recursive CTEs are used to simplify complex queries that involve multiple operations on a single dataset.

Recursive CTEs are used when you need to perform operations on hierarchies or tree-like structures. We’ve also seen how CTEs can be used in practical situations.

With practice, you can learn to identify the business problems that CTEs can solve and discover new ways to use them in your queries. The more you use CTEs, the more efficient and organized your code will become.

Further resources on CTEs can be found in SQL documentation provided by Database Management Systems (DBMS) such as Oracle, SQL Server, MySQL, and PostgreSQL. There are also numerous online courses and tutorials that provide more in-depth information on CTEs and their applications.

By taking advantage of these resources, you’ll be able to improve your SQL skills and become an expert at using CTEs in your queries.

Popular Posts