Adventures in Machine Learning

Mastering Ranking in SQL: RANK and DENSE_RANK Functions Explained

Ranking Rows in SQL: The RANK and DENSE_RANK Functions

As a data analyst or SQL developer, ranking rows in a database is a routine task. You may want to know the top-performing salesperson, the most profitable customer, or the highest-ranked product.

SQL database management systems provide two functions to partition data based on a specified column and then rank the rows in the partition. These functions are called RANK and DENSE_RANK, respectively.

In this article, we’ll explore how to use these functions and what makes them different.

Overview of the RANK and DENSE_RANK Functions

The RANK function assigns a unique integer value to each distinct value in a specific column, based on the order specified in the ORDER BY clause. The ranking starts from one and increases sequentially.

If two or more rows have the same value, they are assigned the same rank, and the next rank is skipped—the number of skipped ranks equals the number of tied rows. For example,

Suppose we have a table of students and their scores in a math exam.

Student Score
Alice 90
Bob 80
Claire 70
David 80
Emily 90

To rank the students by their scores, we can use the RANK function as follows:

SELECT Student, Score, RANK() OVER (ORDER BY Score DESC) AS 'Rank'
FROM Students;

The result will be:

Student Score Rank
Alice 90 1
Emily 90 1
Bob 80 3
David 80 3
Claire 70 5

Both Alice and Emily are tied for first place, so their RANK values are 1, and the next rank is skipped. Bob and David are ranked third, so the next rank (fourth) is skipped.

The DENSE_RANK function works similarly, but it does not skip any rank when there are tied rows. Instead, it assigns the same rank to all the rows with the same value and then assigns the next rank to the following distinct value.

For example:

SELECT Student, Score, DENSE_RANK() OVER (ORDER BY Score DESC) AS 'Rank'
FROM Students;
Student Score Rank
Alice 90 1
Emily 90 1
Bob 80 2
David 80 2
Claire 70 3

In this result, both Alice and Emily have a rank of 1, and the next rank is assigned to Bob and David, who have a score of 80.

Difference between RANK and DENSE_RANK

The primary difference between RANK and DENSE_RANK is handling equal ranking rows. RANK skips the next rank after assigning a non-unique rank to equal rows, which may cause gaps in the rankings.

DENSE_RANK assigns the same rank to the equal rows and does not skip any ranks, which means that the sequence of ranks is dense and continuous.

For example, in the previous student table, let’s assume that there are two students who scored 85, instead of having two students who scored 90.

Student Score
Alice 90
Bob 80
Claire 70
David 80
Emily 85
Frank 85

If we use the RANK function, the result will be:

SELECT Student, Score, RANK() OVER (ORDER BY Score DESC) AS 'Rank'
FROM Students;
Student Score Rank
Alice 90 1
Emily 85 2
Frank 85 2
Bob 80 4
David 80 4
Claire 70 6

We can see that RANK assigns a non-unique rank to equal rows and skips the next rank. In this case, the ranks of 3 and 5 are skipped.

On the other hand, if we use the DENSE_RANK function, the result will be:

SELECT Student, Score, DENSE_RANK() OVER (ORDER BY Score DESC) AS 'Rank'
FROM Students;
Student Score Rank
Alice 90 1
Emily 85 2
Frank 85 2
Bob 80 3
David 80 3
Claire 70 4

The DENSE_RANK assigns the same rank to the equal rows and does not skip any rank. In this case, all the ranks are dense and continuous.

Partitioning records into groups

In some cases, you may need to partition the records into groups by a specific column (usually categorical) and then rank the records within each group. For example, you may want to find the top-performing salesperson in each region or the most profitable product in each category.

You can partition the data into groups using the PARTITION BY clause in the OVER() clause. For example:

Suppose we have a table of sales data with columns: region, salesperson, product, and revenue.

Region Salesperson Product Revenue
East Alice Apples 1000
East Alice Oranges 2000
East Bob Apples 1500
East Bob Oranges 1200
West Claire Apples 2200
West Claire Oranges 2300

To rank the salespersons by their revenue within each region, we can use the RANK function with the PARTITION BY clause as follows:

SELECT Region, Salesperson, Revenue, RANK() OVER(PARTITION BY Region ORDER BY Revenue DESC) AS 'Rank'
FROM Sales;
Region Salesperson Revenue Rank
East Alice 2000 1
East Bob 1500 2
East Alice 1000 3
East Bob 1200 4
West Claire 2300 1
West Claire 2200 2

In this result, we can see that the ranking is done for each region separately based on revenue.

Ranking records within each group

Similar to the previous example, we can utilize the DENSE_RANK function to rank the records within each group with the same syntax.

SELECT Region, Salesperson, Revenue, DENSE_RANK() OVER(PARTITION BY Region ORDER BY Revenue DESC) AS 'Rank'
FROM Sales;
Region Salesperson Revenue Rank
East Alice 2000 1
East Bob 1500 2
East Alice 1000 3
East Bob 1200 4
West Claire 2300 1
West Claire 2200 2

By using the DENSE_RANK function, we can see that there are no skipped ranks, and the ranks are continuous and dense within each group.

Conclusion

SQL provides functions to partition data based on a specified column and rank the rows within each partition. The RANK and DENSE_RANK functions are widely used for ranking tasks.

RANK assigns a unique rank to each distinct value in an ordered set of values and skips the next rank if there are equal values. DENSE_RANK assigns the same rank, without skipping any rank if there are equal values.

We can also use these functions to rank the records within each group defined by partitioning the data based on a specified column. The syntax of these functions is simple, by using the OVER() clause with the PARTITION BY clause and the ORDER BY clause.

Applying RANK and DENSE_RANK to a Database Example: Displaying Relevant Columns in a Table

Let’s take a deeper dive into the application of the RANK and DENSE_RANK functions in SQL queries. In this section, we will demonstrate how these functions can be used to display relevant columns in a table.

Displaying Relevant Columns in the Table

When working with large data sets, it’s often necessary to filter out unnecessary information to get a better understanding of the data. For example, let’s consider a database that keeps track of customer orders, including the customer’s name, order date, and the products they ordered.

We may want to rank the most popular products sold during a specified month.

Let’s assume we have a table named orders with the following structure:

Order_Id Customer_Name Order_Date Product_Name Quantity
1111 Alice 2022-02-01 Apples 2
1112 Bob 2022-02-02 Pears 4
1113 Claire 2022-02-03 Oranges 6
1114 David 2022-02-04 Apples 3
1115 Emily 2022-02-05 Apples 5

We can filter the table using the WHERE clause to select only the orders made in February 2022.

SELECT *
FROM orders
WHERE Order_Date >= '2022-02-01'
AND Order_Date < '2022-03-01';

The result will be:

Order_Id Customer_Name Order_Date Product_Name Quantity
1111 Alice 2022-02-01 Apples 2
1112 Bob 2022-02-02 Pears 4
1113 Claire 2022-02-03 Oranges 6
1114 David 2022-02-04 Apples 3
1115 Emily 2022-02-05 Apples 5

Now, let’s say we want to rank the products by the total quantity sold. To do this, we need to aggregate the quantity column for each distinct product, sort the result in descending order, and then apply the RANK or DENSE_RANK function.

However, we don’t need all the columns displayed in the result. We just need the product name and the corresponding rank.

We can achieve this by modifying the SELECT clause as follows:

SELECT Product_Name, RANK() OVER(ORDER BY SUM(Quantity) DESC) AS 'Rank'
FROM orders
WHERE Order_Date >= '2022-02-01'
AND Order_Date < '2022-03-01'
GROUP BY Product_Name;

The result will be:

Product_Name Rank
Apples 1
Oranges 2
Pears 3

Now we have a table containing only the product names and their corresponding ranks, which is the information we required.

Implementing RANK and DENSE_RANK Functions in SQL Query

Now, let’s consider the implementation of the RANK and DENSE_RANK functions in SQL queries.

The RANK and DENSE_RANK functions are window functions that operate on a subset of data defined by an OVER() clause.

The OVER() clause defines the partitioning of the data and the ordering of the rows within each partition. The syntax of the RANK and DENSE_RANK functions is as follows:

RANK() OVER ( [PARTITION BY partition_expression, ...
] order_by_clause )

DENSE_RANK() OVER ( [PARTITION BY partition_expression, ... ] order_by_clause )

The partition_expression specifies the column or columns that we use to partition the data into distinct subsets.

The order_by_clause specifies the column or columns that we use to order the rows within each partition. The RANK function assigns a unique integer value to each distinct value in the order_by_clause, based on the specified order.

The ranking starts from one and increases sequentially, skipping the next rank when there are ties. The DENSE_RANK function operates similarly to the RANK function, but it assigns the same rank to all the rows with the same value, and the next rank is assigned to the following unique value.

This means that the sequence of ranks is dense and continuous, without any skipped ranks. Let’s explore how to implement these functions to rank the products sold during a specified month:

SELECT Product_Name, RANK() OVER(ORDER BY SUM(Quantity) DESC) AS 'Rank'
FROM orders
WHERE Order_Date >= '2022-02-01'
AND Order_Date < '2022-03-01'
GROUP BY Product_Name;

This query partitions the data by the Product_Name and orders the rows by the sum of the Quantity column in descending order. The RANK function assigns a unique rank to each distinct sum of the Quantity column.

SELECT Product_Name, DENSE_RANK() OVER(ORDER BY SUM(Quantity) DESC) AS 'Rank'
FROM orders
WHERE Order_Date >= '2022-02-01'
AND Order_Date < '2022-03-01'
GROUP BY Product_Name;

This query uses the DENSE_RANK function to rank the data in a dense, continuous sequence, without any skipped ranks.

Conclusion

In conclusion, the RANK and DENSE_RANK functions are useful tools in SQL queries to perform ranking tasks, especially when working with large datasets. These functions allow for easy filtering of unnecessary information while presenting the relevant information in a clear and concise manner.

Appropriate use of the ORDER BY, OVER, and PARTITION BY clauses can help to further refine queries and achieve more accurate or specific results. In conclusion, the RANK and DENSE_RANK functions are crucial tools in ranking tasks when dealing with extensive datasets in SQL queries.

These functions partition data based on specified columns and rank rows within each partition, allowing for easy filtering of irrelevant information. By utilizing ORDER BY, OVER, and PARTITION BY clauses, SQL developers and data analysts can achieve more accurate and specific outcomes.

Using these functions can help in finding the highest, lowest or any other sorted data in the table of a specified database. A clear understanding of these functions can provide insights into the data and help make better and informed decisions.

Popular Posts