I can teach you analytics! I have never seen a database do analytics better than Snowflake. This week we are working on the Cumulative Sum, which is interchangeable with the word CSUM. You use a Cumulative Sum (CSUM) to get a running total with subtotals and grand totals.
In each example, you will see an ORDER BY statement, but it will not come at the end of the query. The ORDER BY keywords is always within the CSUM calculation. It is the ORDER BY statement that runs first, and once the system orders the data, the totals calculate. It is the initial ordering of the data set that gives these analytics the name “Ordered Analytics.”
Notice in the picture below that we are doing a SUM(Daily_Sales), but this is not an aggregate of Daily_Sales because of the OVER keyword. The OVER keyword made the SUM(Daily_Sales) an ordered analytic and not a mere SUM of Daily_Sales.
The most difficult part of the example below is the keywords “ROWS UNBOUNDED PRECEDING.” The ROWS UNBOUNDED PRECEDING is something we refer to as the moving window, or in other words, how many rows should we calculate? The ROWS UNBOUNDED PRECEDING means that after the system orders the data with the ORDER BY clause, calculate the running total from the first row to the last, always adding up the preceding totals with the current row’s Daily_Sales value. Check out the example below.
Let me summarize. Order the data first by the column Sale_Date and after the data sorts begin with row one and Sum the Daily_Sales. Then, continue to the next row and add to the sum of that row’s Daily_Sales to the running total. Continue to do this until you finish with the last row.
In the picture below, we are running the same calculation, but ordering the data using two columns in the ORDER BY clause. The only reason the grand totals are different between the above and below picture is that we are not displaying all of the data.
Each CSUM example will have an ORDER BY statement, but sometimes you will also have a PARTITION statement. In the example below, you see the keywords PARTITION BY Product_ID, and that means the CSUM function calculates within each Product_ID only. Snowflake does not place the Product_ID partitions in ascending or descending order. Check out the next example below.
If you want to move data from any system to Snowflake, you should use the Nexus and NexusCore Server for the data movement. You can move these systems to Snowflake:
- SQL Server
- Azure SQL Data Warehouse
Below is an example of how pretty and easy-to-use the NexusCore Server is to move data from another system to Snowflake. You can run this job immediately or you can schedule it to run.
If you want to move data to snowflake or you want to use the greatest query tool known to humankind, then use the Nexus and NexusCore Server. Download your free Nexus trial at www.CoffingDW.com or contact me for a demo. I hope you enjoyed today’s Snowflake analytic lesson. See you next week.
CEO, Coffing Data Warehousing
Direct: 513 300-0341
YouTube channel: CoffingDW
Tom Coffing, better known as Tera-Tom, is the founder of Coffing Data Warehousing where he has been CEO for the past 25 years. Tom has written over 75 books on all aspects of Teradata, Netezza, Yellowbrick, Snowflake, Redshift, Aurora, Vertica, SQL Server, and Greenplum. Tom has taught over 1,000 classes worldwide, and he is the designer of the Nexus Product Line.