I can teach you analytics!  I have never seen a database do analytics faster than Yellowbrick.  This week we are working on the Moving Average, which is interchangeable with the word MAVG. You use a Moving Average (MAVG) to look for trends.  A MAVG shows you the AVG of values in a moving window, which means a certain number of rows.  If the moving window is a three, then the MAVG calculates the AVG for the values for three rows.  You can then see trends by analyzing which results were higher and which were lower.

Here is the idea in simpler terms.  If you had a product (i.e., Product_ID 1000), and every day you captured the Daily_Sales and found each day that you made $1.  After three rows, your AVG would be $1.  If this continued for several weeks, then your averages would all be $1, but if you had a 3-day period where you made $100 a day, you would see an average of $100 on that 3rd great day.  You would soon realize that you needed to look into this trend.  What happened that we skyrocketed sales during this time?  And vice versa, if you averaged $100 a day, and then ran into a 3-day period where you averaged only $1, you would want to know what went wrong.

In each example, you will see an ORDER BY statement, but it will not come at the end of the query.  The ORDER BY keywords is always within the MAVG calculation.  It is the ORDER BY statement that runs first, and once the system orders the data, the totals calculate.  It is the initial ordering of the data set that gives these analytics the name “Ordered Analytics.”  The other name is “Window Functions,” because they calculate within a certain window of rows.

Notice in the picture below that we are doing an AVG(Daily_Sales), but this is not an aggregate of Daily_Sales because of the OVER keyword.    The OVER keyword made the AVG(Daily_Sales) an ordered analytic and not a mere AVG of Daily_Sales. 

The most difficult part of the example below is the keywords “ROWS 2 PRECEDING.”  The ROWS 2 PRECEDING is something we refer to as the moving window, or in other words, how many rows should we calculate?  The ROWS 2 PRECEDING means that after the system orders the data with the ORDER BY clause, it calculates the AVG(Daily_Sales) for the current row and the previous two rows.  So, a ROWS 2 PRECEDING is a moving window of three!

Let me summarize.  Order the data first by the columns Product_ID and Sale_Date and after the data sorts, then begin with row one and Average the Daily_Sales.  The calculation for the average for the first row is an outlier because it does not have two-rows preceding.  The second row is also an outlier because it averages the current row (2nd row), and it only has one row preceding, which is only a 2-row average.  However, from the 3rd row on, we can get the average for the current row’s Daily_Sales and the previous two rows.  We don’t look at the trending information for the first two rows, but from the 3rd row on, we can see the average of 3-rows so we can determine trends where we had spikes of great and spikes of horrible.

Each MAVG example will have an ORDER BY statement, but sometimes you will also have a PARTITION statement.  In the example below, you see the keywords PARTITION BY Product_ID, and that means the MAVG function calculates within each Product_ID only.  Yellowbrick does not place the Product_ID partitions in ascending or descending order. 

In the picture above, we have a moving window of three so we will average the current row and the previous two rows.  The only two rows that are not 3-row averages are the outliers, which in this case, is the first and second row in each Product_ID partition.  The outliers are only one and two-row averages because they do not have a row preceding within their Product_ID partition.  The keywords “Partition by product_id,” means to start the AVG calculation over (reset) on each product_id break.

If you want to move data from any system to Yellowbrick, you should use the Nexus and NexusCore Server for the data movement.  You can move these systems to Yellowbrick:

  • Teradata
  • Oracle
  • SQL Server
  • DB2
  • Greenplum
  • Redshift
  • Azure SQL Data Warehouse
  • Postgres
  • MySQL
  • Netezza
  • Snowflake

Below is an example of how pretty and easy-to-use the NexusCore Server is to move data from another system to Yellowbrick.  Below is an example of how pretty and easy-to-use the NexusCore Server is to move data from another system to Yellowbrick.  You can run this job immediately, or you can schedule it daily, weekly, monthly, yearly, or custom.

If you want to move data to Yellowbrick or you want to use the greatest query tool known to humankind, then use the Nexus.  Download your free Nexus trial at www.CoffingDW.com.

Watch the video of the Nexus moving data to Yellowbrick and all other systems.

Watch the video: https://youtu.be/mqNQ65H7lps

I hope you enjoyed today’s Yellowbrick analytic lesson.  See you next week. 

Yellowbrick – The only modern data warehouse for hybrid cloud

Yellowbrick is the world’s only modern data warehouse for hybrid cloud. Enterprises rely on Yellowbrick to do the impossible in data analytics: get answers to the hardest business questions for improved profitability, better customer loyalty, and faster innovation in near real-time, and at a fraction of the cost of alternatives. Yellowbrick offers superior price/performance for thousands of concurrent users on petabytes of data, along with the unique ability to run analytic workloads on-premises, in a private cloud, and/or any public cloud and manage them in a simple, consistent way–all with predictable pricing via an annual subscription. Learn more at http://www.yellowbrick.com.

If you want to learn more about modern data analytics, sign up for the ANSWERS seminar that Yellowbrick is holding on September 29, 2020.  Coffing Data is a sponsor.

I hope you enjoyed today’s Yellowbrick analytic lesson.  See you next week. 

Thank you,

Tera-Tom

Tom Coffing
CEO, Coffing Data Warehousing
Direct: 513 300-0341
Website: www.CoffingDW.com
YouTube channel: CoffingDW
Email: Tom.Coffing@CoffingDW.com