Mastering Common Table Expressions (CTEs): A Guide for Developers and Data Analysts

Common Table Expressions (CTEs): A Comprehensive Guide

As data analysts or developers, we often find ourselves writing complex SQL queries with several subqueries, joins, or nested queries. While these queries may get the job done, they can be cumbersome to read and may take a long time to execute.

Common Table Expressions (CTEs) provide a solution by allowing us to create temporary result sets that we can reference within a query. In this article, we’ll be taking a deep dive into CTEs, their syntax, best practices for organizing code, and using CTEs for data analysis.

Defining Common Table Expressions (CTEs)

A Common Table Expression (CTE) is a query expression that can be referred to within another query. It starts with the WITH keyword, followed by the CTE name and a SELECT statement that defines the result set.

Here’s an example:

WITH sales AS (
    SELECT order_date, SUM(total) AS revenue
    FROM orders
    GROUP BY order_date
)
SELECT order_date, revenue
FROM sales

In this example, we’ve created a CTE named sales that calculates the total revenue for each order date from a table named orders. We then reference this CTE within the outer query to get a list of order dates and their corresponding revenues.

Comparing CTEs and Subqueries

Subqueries are another way of creating temporary result sets within a query. So, what’s the difference between CTEs and subqueries?

CTEs are more readable, easier to maintain, and can be reused within the same query. On the other hand, subqueries can only be used once and can make the code messy and difficult to read.

Let’s take a look at an example to illustrate this point. Suppose we need to list all freelancers and their job position types from a table named employees.

Here’s how we could do it with a subquery:

SELECT fullname, position_type 
FROM employees
WHERE employee_id IN (
    SELECT employee_id 
    FROM freelancers
)

While this query may give us the desired result, it uses a subquery in the WHERE clause, which makes it harder to read and maintain. Here’s how we could rewrite this query using a CTE:

WITH freelancers AS (
    SELECT employee_id 
    FROM employees 
    WHERE position_type = 'Freelancer'
)
SELECT fullname, position_type

FROM employees
WHERE employee_id IN (
    SELECT employee_id
    FROM freelancers
)

This query achieves the same result as the previous one, but it’s much easier to read and maintain. We’ve created a CTE named freelancers that contains all employee IDs with a position type of Freelancer.

We’ve then referenced this CTE within the outer query to get the desired result.

Best Practices for Organizing CTEs

Now that we’ve seen the benefits of using CTEs over subqueries, let’s discuss some best practices for organizing CTEs in code.

Making Code Readable and Organized

The main advantage of using CTEs is that they make code more readable and organized. You can use descriptive names for your CTEs that explain what they do.

Make sure to indent your code properly, use white spaces, and add comments where necessary. This will make it easier for other developers to understand and modify your code.

Avoiding Repeating Subqueries

Repeating subqueries can slow down the execution time of your queries, especially if the subquery is complex. With CTEs, you can create a temporary result set once and reference it multiple times within the same query.

This can significantly improve the performance of your queries.

Replacing Subqueries in the FROM Clause with CTEs

Subqueries in the FROM clause can be difficult to read and maintain, especially if they return a large result set. With CTEs, you can create a temporary result set and join it with other tables in the outer query.

Using CTEs for Data Analysis

Let’s explore some scenarios where we can use CTEs for data analysis.

Scenario 1: Listing Freelancers and Job Position Types

Suppose we have a table named employees that contains information about employees, including their position types.

We want to list all freelancers and their job position types. Here’s how we could do it with a subquery:

SELECT fullname, position_type 
FROM employees
WHERE employee_id IN (
    SELECT employee_id 
    FROM employees 
    WHERE position_type = 'Freelancer'
)

This query works fine, but as we’ve discussed earlier, it can be messy and difficult to read. Here’s how we could rewrite this query using a CTE:

WITH freelancers AS (
    SELECT employee_id 
    FROM employees 
    WHERE position_type = 'Freelancer'
)
SELECT fullname, position_type

FROM employees
WHERE employee_id IN (
    SELECT employee_id
    FROM freelancers
)

This query uses a CTE named freelancers to get all employee IDs with a position type of Freelancer and then joins it with the employees table to get the desired result. This query is much cleaner and easier to read.

Scenario 2: Calculating Freelancers’ Average Daily Hours Worked

Suppose we have a table named work_hours that contains the number of hours worked by each freelancer on each day. We want to calculate the average daily hours worked by each freelancer.

Here’s how we could do it with a subquery:

SELECT employee_id, AVG(daily_hours)
FROM (
    SELECT employee_id, DATE(work_date) AS work_date, SUM(hours_worked) AS daily_hours
    FROM work_hours
    GROUP BY employee_id, DATE(work_date)
) subquery
GROUP BY employee_id

This query works fine, but the subquery in the FROM clause can be difficult to read and maintain. Here’s how we could rewrite this query using a CTE:

WITH daily_hours AS (
    SELECT employee_id, DATE(work_date) AS work_date, SUM(hours_worked) AS daily_hours
    FROM work_hours
    GROUP BY employee_id, DATE(work_date)
)
SELECT employee_id, AVG(daily_hours)
FROM daily_hours
GROUP BY employee_id

This query uses a CTE named daily_hours to calculate the daily hours worked by each freelancer and then joins it with the employees table to get the desired result. This approach is much cleaner and easier to read than using a subquery.

Conclusion

In conclusion, CTEs are a powerful tool for creating temporary result sets within a query. They make code more readable and maintainable and can significantly improve the performance of your queries.

By following best practices for organizing CTEs and using them for data analysis, you can write SQL queries that are efficient, easy to read, and easy to maintain. In summary, Common Table Expressions (CTEs) are a useful tool for creating temporary result sets within a query that are easy to read, maintain, and reuse.

CTEs offer several advantages over subqueries, including improved performance and readability. By following best practices for organizing CTEs and using them for data analysis, developers can optimize their code and achieve their desired results efficiently.

With the help of CTEs, developers and data analysts can write SQL queries that are well-organized, efficient, and easy to understand. Employing CTEs can save time and effort and improve the overall quality of the code.

Adventures in Machine Learning