I can teach you analytics! I have never seen a database do analytics better than Snowflake. This week we are working on the Moving Average, which is interchangeable with the word MAVG. You use a Moving Average (MAVG) to look for trends. A MAVG shows you the AVG of values in a moving window, which means a certain amount of rows. If the moving window is a three, then the MAVG calculates the AVG for the values for three rows. You can then see trends by analyzing which results were higher and which were lower.
Here is the idea in simpler terms. If you had a product (i.e., Product_ID 1000), and every day you captured the Daily_Sales and found each day that you made $1. After three rows, your AVG would be $1. If this continued for several weeks, then your averages would all be $1, but if you had a 3-day period where you made $100 a day, you would see an average of $100 on that 3rd great day. You would soon realize that you needed to look into this trend. What happened that we skyrocketed sales during this time? And vice versa, if you averaged $100 a day, and then ran into a 3-day period where you averaged only $1, you would want to know what went wrong.
In each example, you will see an ORDER BY statement, but it will not come at the end of the query. The ORDER BY keywords is always within the MAVG calculation. It is the ORDER BY statement that runs first, and once the system orders the data, the totals calculate. It is the initial ordering of the data set that gives these analytics the name “Ordered Analytics.” The other name is “Window Functions,” because they calculate within a certain window of rows.
Notice in the picture below that we are doing an AVG(Daily_Sales), but this is not an aggregate of Daily_Sales because of the OVER keyword. The OVER keyword made the AVG(Daily_Sales) an ordered analytic and not a mere AVG of Daily_Sales.
The most difficult part of the example below is the keywords “ROWS 2 PRECEDING.” The ROWS 2 PRECEDING is something we refer to as the moving window, or in other words, how many rows should we calculate? The ROWS 2 PRECEDING means that after the system orders the data with the ORDER BY clause, calculate the AVG(Daily_Sales) for the current row and the previous two rows.
Let me summarize. Order the data first by the columns Product_ID and Sale_Date and after the data sorts, then begin with row one and Average the Daily_Sales. The calculation for the average for the first row is an outlier because it does not have two-rows preceding. The second row is also an outlier because it averages the current row (2nd row), and it only has one row preceding, which is only a 2-row average. However, from the 3rd row on, we can get the average for the current row’s Daily_Sales and the previous two rows. We don’t look at the trending information for the first two rows, but from the 3rd row on, we can see the average of 3-rows so we can determine trends where we had spikes of great and spikes of horrible.
Each MAVG example will have an ORDER BY statement, but sometimes you will also have a PARTITION statement. In the example below, you see the keywords PARTITION BY Product_ID, and that means the MAVG function calculates within each Product_ID only. Snowflake does not place the Product_ID partitions in ascending or descending order.
In the picture above, we have a moving window of 2 so we will average the current row and the previous row. The only two rows that are not 2-row averages are the outliers, which in this case, is the first row in each Product_ID partition. The outliers are only one-row averages because they do not have a row preceding within their Product_ID partition. The keywords “Partition by Product_ID,” means to start the AVG calculation over (reset) on each Product_ID break.
If you want to move data from any system to Snowflake, you should use the Nexus and NexusCore Server for the data movement. You can move these systems to Snowflake:
- SQL Server
- Azure SQL Data Warehouse
Below is an example of how pretty and easy-to-use the NexusCore Server is to move data from another system to Snowflake. Below is an example of how pretty and easy-to-use the NexusCore Server is to move data from another system to Snowflake. You can run this job immediately, or you can schedule it daily, weekly, monthly, yearly, or custom.
If you want to move data to snowflake or you want to use the greatest query tool known to humankind, then use the Nexus and NexusCore Server. Download your free Nexus trial at www.CoffingDW.com or contact me for a demo.
I hope you enjoyed today’s Snowflake analytic lesson. See you next week.
CEO, Coffing Data Warehousing
Direct: 513 300-0341
YouTube channel: CoffingDW
Tom Coffing, better known as Tera-Tom, is the founder of Coffing Data Warehousing where he has been CEO for the past 25 years. Tom has written over 75 books on all aspects of Teradata, Netezza, Yellowbrick, Snowflake, Redshift, Aurora, Vertica, SQL Server, and Greenplum. Tom has taught over 1,000 classes worldwide, and he is the designer of the Nexus Product Line.