I can teach you analytics! The Yellowbrick database does analytics lightning-fast. These analytic lessons are the perfect opportunity for you to become an analytic guru. If you want to be number one in your department then the Row_Number command is what you need. Get ready for your career to soar like a bird. Each week I am going to teach you analytics until you become the next Tera-Tom.
The best place to start is with the ROW_NUMBER command because it is simple enough to teach you the analytic fundamentals, but it is so incredibly powerful that you will be able to build powerful reports instantly.
In each analytic example for the next 12 blogs, you will see an ORDER BY statement. However, the ORDER BY statement will not come at the end of the query. The ORDER BY statement in an ordered analytic comes within the analytic function. All analytics will take the data set and sort it first with an ORDER BY statement. Once the data sort happens in step one of the process, the calculations begin. That is why they call these functions Ordered Analytics.
In its most simple explanation, a ROW_NUMBER will sort the data first with an ORDER BY statement, and then place the number one on the first row, a number two on the second row, and so on. Check out the example below.
In the picture above, notice that the keyword ROW_NUMBER is immediately followed by an open and close parenthesis. The open and close parenthesis insinuates a function. There is never anything inside the parenthesis, but it is required. The keyword OVER follows which represents that this analytic is an ordered analytic, which is interchangeable with the term window function. The term ordered analytic means that the data set will be put in a specific order before the calculation begins. We sort the data using the ORDER BY statement and in this example, we are sorting by the column Product_ID, and within Product_ID by the column Sale_Date. Product_ID is the major sort, and any ties further sort by Sale_Date, which is the minor sort. Once the data set sorts, the first row gets a one and the number increments sequentially.
Each analytic example will have an ORDER BY statement, but sometimes you will also have a PARTITION statement. If you see the keyword PARTITION, it means the analytic calculation will reset and start over. Check out the next example below.
Now, get ready for something a little more advanced. What if you wanted to know the best three daily_sales for each product_id? The example below has a partition statement so the calculations reset with each product_id break. The example also has an ORDER BY daily_sales desc clause so that the data will be sorted by the highest daily_sales value within each product_id. Finally, we put parentheses around our SQL Statement, and we create and select from a derived table.
The true power of the Row_Number comes from the fact that you control how the data sorts through the ORDER BY statement within the calculation. I am going to show you two examples. The first example will bring back the best student in each class_code, and the second example will bring back the worst student in each class_code. Can you spot the only difference in each example?
Here is the best student in each class.
Here is the worst student in each class.
The difference between the two examples is in the ORDER BY statement. In our first example, we order by grade_pt DESC. In the second example, we order by grade_pt ASC.
The Row_Number allows you to control the sort, so you can determine the type of information you want to come back first.
If you want to move data to Yellowbrick or you want to use the greatest query tool known to humankind, then use the Nexus. Download your free Nexus trial at www.CoffingDW.com.
Watch the video of the Nexus moving data to Yellowbrick and all other systems.
I hope you enjoyed today’s Yellowbrick analytic lesson. See you next week.
Yellowbrick – The only modern data warehouse for hybrid cloud
Yellowbrick is the world’s only modern data warehouse for hybrid cloud. Enterprises rely on Yellowbrick to do the impossible in data analytics: get answers to the hardest business questions for improved profitability, better customer loyalty, and faster innovation in near real time, and at a fraction of the cost of alternatives.
Yellowbrick offers superior price/performance for thousands of concurrent users on petabytes of data, along with the unique ability to run analytic workloads on-premises, in a private cloud, and/or any public cloud and manage them in a simple, consistent way–all with predictable pricing via an annual subscription. Learn more at www.yellowbrick.com
Tom Coffing, better known as Tera-Tom, is the founder of Coffing Data Warehousing where he has been CEO for the past 25 years. Tom has written over 75 books on all aspects of Teradata, Netezza, Yellowbrick, Snowflake, Redshift, Aurora, Vertica, SQL Server, and Greenplum. Tom has taught over 1,000 classes worldwide, and he is the designer of the Nexus Product Line.