T-SQL: Ranking Functions

Please see my other Database Development articles.

Ranking Functions

There are several new impressive advancements in the latest version of t-sql; this article will focus on “Ranking” functions.

Prior to the release of SQL Server 2005 with its t-sql enhancements, working with blocks of related data was clunky at best. T-sql authors mostly relied on “GROUP BY” statements and then had to perform “CURSOR” and/or “SORT BY” acrobatics in order to return contiguous blocks of data in any meaningful manner.

Ranking functions overcome previous limitations to working with subsets of data within implicit groups.
In other words, SQL Server developers are now able to in essence, perform “GROUP BY” operations within existing groups.
We can achieve such actions by choosing from several segregation actions according to the desired result along with the familiar “SORT BY” finishing touches.

To begin these examples, first I’ll show you the source data.
It’s simply a series of records reflecting average data about people such as would be common to most databases .
For example, for all of our example will be working with the following fields: FirstName, Age, and Gender.

screen_rank_rawtable

ROW_NUMBER()
Our first example will illustrate the most basic ranking function, “ROW_NUMBER().”
The primary purpose of this function is simply to add sequential numbering unto an existing t-sql feature, “ORDER BY.” In other words, just think of how you were already using the “ORDER BY” clause, but with an additional column of ordered numbers and you’ll understand the “ROW_NUMBER()” function.
In this example, I wanted to order everyone according to their “Age” and then see that list order by a sequential “Row Number by Age” column.

screen_rank_row1_code

screen_rank_row1_results

ROW_NUMBER(), part 2
In this example, you’ll see the same function being called by I wanted the data returned in its natural state – the row order left unchanged.
In this manner, I can have the same sequential numbered column according to “Age,” but without the rowset modified.

screen_rank_row2_code

screen_rank_row2_results

ROW_NUMBER() with PARTITION
Here’s where the magic of ranking functions really come into focus – with PARTITIONing!
Have you ever wanted to perform a “GROUP BY” within itself? Now you can!
Use the “ROW_NUMBER()” function just as in the other example, except this time specify by which field you’d like the inner grouping to occur.
In this example, I chose to separate the age groups according to gender. Then, I perform a “ORDER BY” on each subset.

screen_rank_row_partition_code

screen_rank_row_partition_results

RANK()
This function is similar to “ROW_NUMBER()” except instead of simply sequentially numbering records, the sequence is assigned by a field’s data, in this case “Age.”
In the very first example, you saw the first three rows numbered “1, 2, 3,” according to the sequence of the rows.
However, that told me nothing of the “Age” groupings naturally occurring within the result set.
In this example, the sequence only increments when a new value is encountered. Therefore, I get a more meaningful sequential order when I instruct the ranking function to “RANK()” “OVER,” that is “Using” the “Age” column.

screen_rank_rank_code

screen_rank_rank_results

RANK() with PARTITION
In my last example, you may have noticed I lost my PARITIONing according to the natural grouping according to the person’s “Gender.”
This can easily be reapplied using the same “PARTITION” clause.

screen_rank_rank_partition_code

screen_rank_rank_partition_results

DENSE_RANK() with PARTITION
You may have noticed in the two previous examples, the sequence occasionally skips a number.
This anomaly is due in part to the number of records sharing the same value within the field used by the “RANK().”
To avoid breaks in our sequence, we’ll simply use this function with instructs the result set sequence to be more “dense” in its numbering.

screen_rank_denserank_partition_code

screen_rank_denserank_partition_results

NTILE()
In this final example, you can see how the sequencing may be determined by an entire subset as opposed to individual records within each group.
In this example, I’ve instructed the result set it may have only 3 “Age Groups.”
When the query is executed, each record is assigned one of the three groups according to its statistical relevance to other records within the returning result set.

screen_rank_ntile_codescreen_rank_ntile_results

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s