I can teach you analytics! I have never seen a database do analytics better than Snowflake. These analytic lessons are the perfect opportunity for you to learn analytics like the back of your hand. If you want your career stock to soar, then invest in yourself and learn these analytic lessons. You are about to be introduced to “promotion heaven.” Each week I am going to teach you analytics until you become the next Einstein, or even better, the next Tera-Tom.
The best place to start is with the ROW_NUMBER command because it is simple enough to teach you the analytic fundamentals, but it is so incredibly powerful that you will be able to build powerful reports instantly.
In each example, you will see an ORDER BY statement, but it will not come at the end of the query. The ORDER BY keywords is always within the analytic calculation. All of these analytics will take the data set and sort it first with an ORDER BY statement. Once the data sort happens in step 1 of the process, the calculations begin.
In its most simple explanation, a ROW_NUMBER will sort the data first with an ORDER BY statement, and then place the number 1 on the first row, a number 2 on the second row, and so on. Check out the example below.
In the picture above, notice that the keyword ROW_NUMBER is immediately followed by an open and close parenthesis. The open and close parenthesis insinuates a function. There is never anything inside the parenthesis, but it is required. The keyword OVER follows which represents that this analytic is an ordered analytic, which is interchangeable with the term window function. The term ordered analytic means that the data set will be put in a specific order before the calculation begins. We sort the data using the ORDER BY statement and in this example, we are sorting by the column Product_ID, and within Product_ID by the column Sale_Date. Product_ID is the major sort, and any ties further sort by Sale_Date, which is the minor sort. Once the data set sorts, the first row gets a one and the number increments sequentially.
Each analytic example will have an ORDER BY statement, but sometimes you will also have a PARTITION statement. If you see the keyword PARTITION, it means the analytic calculation will reset and start over. Check out the next example below.
Now, get ready for something you have never seen before, which is the QUALIFY statement. QUALIFY acts like a filter, but it is different than a WHERE clause filter. The QUALIFY statement waits until all of the analytic calculations finish. When the report finishes and is ready to return, the QUALIFY steps in and filters the rows further.
In the example below, we are attempting to find the smarted two students in each class code. We want to find the top two students with the highest-grade points for the freshman, sophomores, juniors, and seniors.
The brilliance behind this query is that you control the sort through the ORDER BY statement. You also control when you want the calculation to reset through the PARTITION statement. Once the sort, reset, and the calculations finish, the QUALIFY comes in and states, “I only want the top two rows per class code.”
The only two systems I have ever seen that use the QUALIFY statements are Snowflake and Teradata. If you are working with a system that does not support QUALIFY, you can place your SQL statement in a derived table, and then use a WHERE clause. In the example below, all of the SQL using the color blue is part of the derived table. The results are exactly the same as the previous example, but it is done with a derived table and not a QUALIFY statement.
If you want to move data to snowflake or you want to use the greatest query tool known to humankind, then use the Nexus. Download your free Nexus trial at www.CoffingDW.com.
I hope you enjoyed today’s Snowflake analytic lesson. See you next week.
CEO, Coffing Data Warehousing
Direct: 513 300-0341
Youtube channel: CoffingDW
Tom Coffing, better known as Tera-Tom, is the founder of Coffing Data Warehousing where he has been CEO for the past 25 years. Tom has written over 75 books on all aspects of Teradata, Netezza, Yellowbrick, Snowflake, Redshift, Aurora, Vertica, SQL Server, and Greenplum. Tom has taught over 1,000 classes worldwide, and he is the designer of the Nexus Product Line.