Mastering Ranking in SQL: RANK and DENSE_RANK Functions Explained

Ranking Rows in SQL: The RANK and DENSE_RANK Functions

For a data analyst or SQL developer, ranking rows in a database is a routine task. You may want to know the top-performing salesperson, the most profitable customer, or the highest-ranked product.

SQL database management systems provide two window functions for this task: RANK and DENSE_RANK. Both rank rows according to a specified ordering, and both can optionally divide the data into partitions first and rank the rows within each partition.

In this article, we’ll explore how to use these functions and what makes them different.

Overview of the RANK and DENSE_RANK Functions

The RANK function assigns an integer rank to each row, based on the order specified in the ORDER BY clause inside OVER. The ranking starts from one, and rows with equal values receive the same rank.

When two or more rows are tied, the ranks those extra rows would otherwise have occupied are skipped, so the number of skipped ranks equals the number of tied rows minus one.

For example, suppose we have a table of students and their scores in a math exam.

Student	Score
Alice	90
Bob	80
Claire	70
David	80
Emily	90

To rank the students by their scores, we can use the RANK function as follows:

SELECT Student, Score, RANK() OVER (ORDER BY Score DESC) AS 'Rank'
FROM Students;

The result will be:

Student	Score	Rank
Alice	90	1
Emily	90	1
Bob	80	3
David	80	3
Claire	70	5

Alice and Emily are tied for first place, so both receive rank 1 and rank 2 is skipped. Bob and David are tied for third place, so rank 4 is skipped and Claire receives rank 5.

The DENSE_RANK function works similarly, but it does not skip any rank when there are tied rows. Instead, it assigns the same rank to all the rows with the same value and then assigns the next rank to the following distinct value.

For example:

SELECT Student, Score, DENSE_RANK() OVER (ORDER BY Score DESC) AS 'Rank'
FROM Students;

Student	Score	Rank
Alice	90	1
Emily	90	1
Bob	80	2
David	80	2
Claire	70	3

In this result, both Alice and Emily have a rank of 1, and the next rank is assigned to Bob and David, who have a score of 80.
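The two queries above can be tried side by side without a database server. The following is a minimal sketch that assumes Python's built-in sqlite3 module and an in-memory SQLite database (window functions need SQLite 3.25 or newer). The aliases score_rank and score_dense_rank are illustrative names chosen here, not part of the original queries.

```python
import sqlite3

# In-memory SQLite database (assumption: SQLite 3.25+ for window functions).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (Student TEXT, Score INTEGER)")
conn.executemany(
    "INSERT INTO Students VALUES (?, ?)",
    [("Alice", 90), ("Bob", 80), ("Claire", 70), ("David", 80), ("Emily", 90)],
)

# Both functions use the same window, so they can share one SELECT.
# The outer ORDER BY only fixes the display order of the rows.
query = """
SELECT Student,
       Score,
       RANK()       OVER (ORDER BY Score DESC) AS score_rank,
       DENSE_RANK() OVER (ORDER BY Score DESC) AS score_dense_rank
FROM Students
ORDER BY Score DESC, Student
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)

conn.close()
```

Each printed tuple holds the student, the score, the RANK value, and the DENSE_RANK value, so the gap after each tie is visible in the third column only.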

Difference between RANK and DENSE_RANK

The primary difference between RANK and DENSE_RANK is how they handle tied rows. RANK gives tied rows the same rank and then skips the ranks those rows would otherwise have used, which leaves gaps in the ranking.

DENSE_RANK assigns the same rank to the equal rows and does not skip any ranks, which means that the sequence of ranks is dense and continuous.

For example, let’s change the previous student table so that Emily scored 85 instead of 90 and a sixth student, Frank, also scored 85. The tie is now at 85 rather than at 90.

Student	Score
Alice	90
Bob	80
Claire	70
David	80
Emily	85
Frank	85

If we use the RANK function, the result will be:

SELECT Student, Score, RANK() OVER (ORDER BY Score DESC) AS 'Rank'
FROM Students;

Student	Score	Rank
Alice	90	1
Emily	85	2
Frank	85	2
Bob	80	4
David	80	4
Claire	70	6

We can see that RANK gives tied rows the same rank and skips the rank that would have followed each tie. In this case, ranks 3 and 5 are skipped.

On the other hand, if we use the DENSE_RANK function, the result will be:

SELECT Student, Score, DENSE_RANK() OVER (ORDER BY Score DESC) AS 'Rank'
FROM Students;

Student	Score	Rank
Alice	90	1
Emily	85	2
Frank	85	2
Bob	80	3
David	80	3
Claire	70	4

DENSE_RANK also gives tied rows the same rank, but it does not skip any rank afterwards. In this case, the ranks run from 1 to 4 with no gaps.
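One way to see where the two sets of numbers come from is to compute them by hand. The sketch below is plain Python, not how a database engine implements the functions: it treats RANK as one plus the number of rows with a strictly higher score, and DENSE_RANK as one plus the number of distinct higher scores. The helper names rank and dense_rank are hypothetical.

```python
# Scores from the six-student table above.
scores = {
    "Alice": 90, "Bob": 80, "Claire": 70,
    "David": 80, "Emily": 85, "Frank": 85,
}


def rank(score, all_scores):
    # RANK: one plus the number of rows that sort strictly ahead of this one.
    return 1 + sum(1 for s in all_scores if s > score)


def dense_rank(score, all_scores):
    # DENSE_RANK: one plus the number of distinct values that sort ahead.
    return 1 + len({s for s in all_scores if s > score})


all_scores = list(scores.values())
ranks = {
    name: (rank(score, all_scores), dense_rank(score, all_scores))
    for name, score in scores.items()
}

# Print students from highest to lowest score with both rank values.
for name, (r, d) in sorted(ranks.items(), key=lambda item: item[1]):
    print(name, scores[name], r, d)
```

Counting rows produces the gaps (three rows beat a score of 80, so its rank is 4), while counting distinct values does not (only two distinct scores beat 80, so its dense rank is 3).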

Partitioning records into groups

In some cases, you may need to partition the records into groups by a specific column (usually categorical) and then rank the records within each group. For example, you may want to find the top-performing salesperson in each region or the most profitable product in each category.

You can partition the data into groups using the PARTITION BY clause in the OVER() clause. For example:

Suppose we have a table of sales data with columns: region, salesperson, product, and revenue.

Region	Salesperson	Product	Revenue
East	Alice	Apples	1000
East	Alice	Oranges	2000
East	Bob	Apples	1500
East	Bob	Oranges	1200
West	Claire	Apples	2200
West	Claire	Oranges	2300

To rank the individual sales by revenue within each region, we can use the RANK function with the PARTITION BY clause as follows:

SELECT Region, Salesperson, Revenue, RANK() OVER(PARTITION BY Region ORDER BY Revenue DESC) AS 'Rank'
FROM Sales;

Region	Salesperson	Revenue	Rank
East	Alice	2000	1
East	Bob	1500	2
East	Bob	1200	3
East	Alice	1000	4
West	Claire	2300	1
West	Claire	2200	2

In this result, we can see that the ranking is done for each region separately based on revenue.
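The partitioned query can be checked with a small script. This is a sketch under the same assumptions as before: Python's sqlite3 module, an in-memory database, and SQLite 3.25 or newer. The alias region_rank is an illustrative name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales (Region TEXT, Salesperson TEXT, Product TEXT, Revenue INTEGER)"
)
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?)",
    [
        ("East", "Alice", "Apples", 1000),
        ("East", "Alice", "Oranges", 2000),
        ("East", "Bob", "Apples", 1500),
        ("East", "Bob", "Oranges", 1200),
        ("West", "Claire", "Apples", 2200),
        ("West", "Claire", "Oranges", 2300),
    ],
)

# PARTITION BY restarts the ranking at 1 for every region.
# The outer ORDER BY only controls the order in which rows are returned.
query = """
SELECT Region,
       Salesperson,
       Revenue,
       RANK() OVER (PARTITION BY Region ORDER BY Revenue DESC) AS region_rank
FROM Sales
ORDER BY Region, Revenue DESC
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)

conn.close()
```

Within the East region the revenues come back as 2000, 1500, 1200, 1000 with ranks 1 through 4, and the ranking starts again at 1 for the West region.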

Ranking records within each group

Similar to the previous example, we can utilize the DENSE_RANK function to rank the records within each group with the same syntax.

SELECT Region, Salesperson, Revenue, DENSE_RANK() OVER(PARTITION BY Region ORDER BY Revenue DESC) AS 'Rank'
FROM Sales;

Region	Salesperson	Revenue	Rank
East	Alice	2000	1
East	Bob	1500	2
East	Bob	1200	3
East	Alice	1000	4
West	Claire	2300	1
West	Claire	2200	2

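Because no two rows within a region share the same revenue, DENSE_RANK returns the same ranks as RANK here. The two functions differ within a partition only when it contains ties, in which case DENSE_RANK leaves no gaps while RANK does.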
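A common follow-up, mentioned at the start of the partitioning discussion, is to keep only the top row in each group. Window functions cannot be used directly in a WHERE clause, so one approach is to compute the rank in a subquery and filter on it outside. The sketch below assumes Python's sqlite3 module with SQLite 3.25 or newer; the names ranked and region_rank are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales (Region TEXT, Salesperson TEXT, Product TEXT, Revenue INTEGER)"
)
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?)",
    [
        ("East", "Alice", "Apples", 1000),
        ("East", "Alice", "Oranges", 2000),
        ("East", "Bob", "Apples", 1500),
        ("East", "Bob", "Oranges", 1200),
        ("West", "Claire", "Apples", 2200),
        ("West", "Claire", "Oranges", 2300),
    ],
)

# The inner query ranks sales within each region; the outer query keeps rank 1.
# If several sales tied for the top revenue, all of them would be returned.
query = """
SELECT Region, Salesperson, Product, Revenue
FROM (
    SELECT Region, Salesperson, Product, Revenue,
           DENSE_RANK() OVER (PARTITION BY Region ORDER BY Revenue DESC) AS region_rank
    FROM Sales
) AS ranked
WHERE region_rank = 1
ORDER BY Region
"""
top_sales = conn.execute(query).fetchall()
for row in top_sales:
    print(row)

conn.close()
```

Changing the filter to region_rank <= 2 would return the two highest revenue levels per region. With DENSE_RANK that always means two distinct revenue values, whereas RANK could skip rank 2 entirely after a tie for first place.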

Conclusion

SQL provides functions to partition data based on a specified column and rank the rows within each partition. The RANK and DENSE_RANK functions are widely used for ranking tasks.

RANK assigns the same rank to rows with equal values and then skips the ranks those tied rows would otherwise have used. DENSE_RANK also assigns the same rank to equal values but never skips a rank.

We can also use these functions to rank the records within each group by partitioning the data on a specified column. The syntax is simple: the function is followed by an OVER() clause containing an optional PARTITION BY clause and an ORDER BY clause.

Applying RANK and DENSE_RANK to a Database Example: Displaying Relevant Columns in a Table

Let’s take a deeper dive into the application of the RANK and DENSE_RANK functions in SQL queries. In this section, we will demonstrate how these functions can be used to display relevant columns in a table.

Displaying Relevant Columns in the Table

When working with large data sets, it’s often necessary to filter out unnecessary information to get a better understanding of the data. For example, let’s consider a database that keeps track of customer orders, including the customer’s name, order date, and the products they ordered.

We may want to rank the most popular products sold during a specified month.

Let’s assume we have a table named orders with the following structure:

Order_Id	Customer_Name	Order_Date	Product_Name	Quantity
1111	Alice	2022-02-01	Apples	2
1112	Bob	2022-02-02	Pears	4
1113	Claire	2022-02-03	Oranges	6
1114	David	2022-02-04	Apples	3
1115	Emily	2022-02-05	Apples	5

We can filter the table using the WHERE clause to select only the orders made in February 2022.

SELECT *
FROM orders
WHERE Order_Date >= '2022-02-01'
AND Order_Date < '2022-03-01';

The result will be:

Order_Id	Customer_Name	Order_Date	Product_Name	Quantity
1111	Alice	2022-02-01	Apples	2
1112	Bob	2022-02-02	Pears	4
1113	Claire	2022-02-03	Oranges	6
1114	David	2022-02-04	Apples	3
1115	Emily	2022-02-05	Apples	5

Now, let’s say we want to rank the products by the total quantity sold. To do this, we need to aggregate the quantity column for each distinct product, sort the result in descending order, and then apply the RANK or DENSE_RANK function.

However, we don’t need all the columns displayed in the result. We just need the product name and the corresponding rank.

We can achieve this by modifying the SELECT clause as follows:

SELECT Product_Name, RANK() OVER(ORDER BY SUM(Quantity) DESC) AS 'Rank'
FROM orders
WHERE Order_Date >= '2022-02-01'
AND Order_Date < '2022-03-01'
GROUP BY Product_Name;

The result will be:

Product_Name	Rank
Apples	1
Oranges	2
Pears	3

Now we have a table containing only the product names and their corresponding ranks, which is the information we required.
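The aggregate-then-rank steps can also be written so that each stage is visible. The sketch below computes the per-product totals in a common table expression and ranks them in the outer query, which is an equivalent formulation rather than the query shown above. It assumes Python's sqlite3 module with SQLite 3.25 or newer, and it stores Order_Date as ISO 8601 text so that string comparison matches date order. The names totals, total_quantity, and quantity_rank are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE orders (
           Order_Id INTEGER, Customer_Name TEXT, Order_Date TEXT,
           Product_Name TEXT, Quantity INTEGER)"""
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?)",
    [
        (1111, "Alice", "2022-02-01", "Apples", 2),
        (1112, "Bob", "2022-02-02", "Pears", 4),
        (1113, "Claire", "2022-02-03", "Oranges", 6),
        (1114, "David", "2022-02-04", "Apples", 3),
        (1115, "Emily", "2022-02-05", "Apples", 5),
    ],
)

# Step 1 (CTE): filter to February 2022 and total the quantity per product.
# Step 2 (outer query): rank the totals from highest to lowest.
query = """
WITH totals AS (
    SELECT Product_Name, SUM(Quantity) AS total_quantity
    FROM orders
    WHERE Order_Date >= '2022-02-01'
      AND Order_Date <  '2022-03-01'
    GROUP BY Product_Name
)
SELECT Product_Name,
       total_quantity,
       RANK() OVER (ORDER BY total_quantity DESC) AS quantity_rank
FROM totals
ORDER BY total_quantity DESC
"""
product_ranks = conn.execute(query).fetchall()
for row in product_ranks:
    print(row)

conn.close()
```

Keeping total_quantity in the output makes it easy to confirm why each product received its rank; it can be dropped from the SELECT list when only the name and rank are needed.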

Implementing RANK and DENSE_RANK Functions in SQL Query

Now, let’s consider the implementation of the RANK and DENSE_RANK functions in SQL queries.

The RANK and DENSE_RANK functions are window functions that operate on a subset of data defined by an OVER() clause.

The OVER() clause defines the partitioning of the data and the ordering of the rows within each partition. The syntax of the RANK and DENSE_RANK functions is as follows:

RANK() OVER ( [PARTITION BY partition_expression, ... ] order_by_clause )

DENSE_RANK() OVER ( [PARTITION BY partition_expression, ... ] order_by_clause )

The partition_expression specifies the column or columns that we use to partition the data into distinct subsets.

The order_by_clause specifies the column or columns that we use to order the rows within each partition. The RANK function assigns an integer rank to each row according to that order, and rows with equal ordering values receive the same rank.

The ranking starts from one and increases with each new value, skipping ranks after ties. The DENSE_RANK function also assigns the same rank to all the rows with the same value, but the next distinct value always receives the next consecutive rank.

This means that the sequence of ranks is dense and continuous, without any skipped ranks. Let’s explore how to implement these functions to rank the products sold during a specified month:

SELECT Product_Name, RANK() OVER(ORDER BY SUM(Quantity) DESC) AS 'Rank'
FROM orders
WHERE Order_Date >= '2022-02-01'
AND Order_Date < '2022-03-01'
GROUP BY Product_Name;

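This query groups the rows by Product_Name, so each product becomes a single row carrying the sum of its Quantity values. Because the OVER clause has no PARTITION BY, the whole grouped result is treated as one partition, and RANK orders the products by total quantity in descending order. Products with equal totals would share a rank.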

SELECT Product_Name, DENSE_RANK() OVER(ORDER BY SUM(Quantity) DESC) AS 'Rank'
FROM orders
WHERE Order_Date >= '2022-02-01'
AND Order_Date < '2022-03-01'
GROUP BY Product_Name;

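This query uses the DENSE_RANK function to rank the same totals in a dense, continuous sequence, without any skipped ranks. With the February data above, the three totals (10, 6, and 4) are all different, so both queries return the same ranks.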

Conclusion

In conclusion, the RANK and DENSE_RANK functions are useful tools for ranking tasks in SQL queries, especially when working with large datasets. Both rank rows within each partition and differ only in how they number the rows that follow a tie: RANK leaves gaps, while DENSE_RANK does not.

Appropriate use of the ORDER BY, OVER, and PARTITION BY clauses, together with a SELECT list limited to the relevant columns, helps refine queries and return more specific results.

Using these functions can help in finding the highest, lowest, or any other ranked rows in a table. A clear understanding of them can provide insights into the data and support better-informed decisions.
