Please see my other Database Development articles.
Ranking Functions
There are several new impressive advancements in the latest version of t-sql; this article will focus on “Ranking” functions.
Prior to the release of SQL Server 2005 with its t-sql enhancements, working with blocks of related data was clunky at best. T-sql authors mostly relied on “GROUP BY” statements and then had to perform “CURSOR” and/or “SORT BY” acrobatics in order to return contiguous blocks of data in any meaningful manner.
Ranking functions overcome previous limitations to working with subsets of data within implicit groups.
In other words, SQL Server developers are now able to in essence, perform “GROUP BY” operations within existing groups.
We can achieve such actions by choosing from several segregation actions according to the desired result along with the familiar “SORT BY” finishing touches.
To begin these examples, first I’ll show you the source data.
It’s simply a series of records reflecting average data about people such as would be common to most databases .
For example, for all of our example will be working with the following fields: FirstName, Age, and Gender.
ROW_NUMBER()
Our first example will illustrate the most basic ranking function, “ROW_NUMBER().”
The primary purpose of this function is simply to add sequential numbering unto an existing t-sql feature, “ORDER BY.” In other words, just think of how you were already using the “ORDER BY” clause, but with an additional column of ordered numbers and you’ll understand the “ROW_NUMBER()” function.
In this example, I wanted to order everyone according to their “Age” and then see that list order by a sequential “Row Number by Age” column.
ROW_NUMBER(), part 2
In this example, you’ll see the same function being called by I wanted the data returned in its natural state – the row order left unchanged.
In this manner, I can have the same sequential numbered column according to “Age,” but without the rowset modified.
ROW_NUMBER() with PARTITION
Here’s where the magic of ranking functions really come into focus – with PARTITIONing!
Have you ever wanted to perform a “GROUP BY” within itself? Now you can!
Use the “ROW_NUMBER()” function just as in the other example, except this time specify by which field you’d like the inner grouping to occur.
In this example, I chose to separate the age groups according to gender. Then, I perform a “ORDER BY” on each subset.
RANK()
This function is similar to “ROW_NUMBER()” except instead of simply sequentially numbering records, the sequence is assigned by a field’s data, in this case “Age.”
In the very first example, you saw the first three rows numbered “1, 2, 3,” according to the sequence of the rows.
However, that told me nothing of the “Age” groupings naturally occurring within the result set.
In this example, the sequence only increments when a new value is encountered. Therefore, I get a more meaningful sequential order when I instruct the ranking function to “RANK()” “OVER,” that is “Using” the “Age” column.
RANK() with PARTITION
In my last example, you may have noticed I lost my PARITIONing according to the natural grouping according to the person’s “Gender.”
This can easily be reapplied using the same “PARTITION” clause.
DENSE_RANK() with PARTITION
You may have noticed in the two previous examples, the sequence occasionally skips a number.
This anomaly is due in part to the number of records sharing the same value within the field used by the “RANK().”
To avoid breaks in our sequence, we’ll simply use this function with instructs the result set sequence to be more “dense” in its numbering.
NTILE()
In this final example, you can see how the sequencing may be determined by an entire subset as opposed to individual records within each group.
In this example, I’ve instructed the result set it may have only 3 “Age Groups.”
When the query is executed, each record is assigned one of the three groups according to its statistical relevance to other records within the returning result set.