Netezza Analytics – Cumulative Sum
I can teach you analytics! Netezza is built for analytics, and I will show you the cumulative sum this week. A cumulative sum is a running total of values progressively increasing or accumulating over time or across a dataset. It represents the summation of all values encountered up to a certain point or row in the dataset, where each new value adds to the total of the previous values. This blog will teach you how to get subtotals and grand totals, so your new nickname is “total star.”
All of these examples have come from my books and training classes. Please do me a favor and tell your training coordinator that you know the best technical trainer in the World. Ask them to hire me to train at your company, either on-site or with a virtual class. They can see our classes, outlines, and a sample of my teaching at this link on our website. I hope to meet you and say thanks in my next class at your company:)
https://coffingdw.com/education/
The query below sorts the dataset rows by sale_date and then adds up the daily_sales from the first row to the last.
The sum of daily_sales first appears as a simple aggregate function. However, the keyword OVER means this is an ordered analytic, not an aggregate. When you see the keyword OVER, the system will have an ORDER BY statement in parenthesis that sorts the rows before the analytic calculation. Once the rows are ordered by sale_date, each new daily_sales value will be added to the total of the previous values.
We also have a WHERE clause, which only places rows with a product_id value of 1000 or 2000 in the data set to be ordered and calculated.
If you took a calculator and added up all of the daily_sales values you see below, you would find that the sum of these values is 442,962.75.
The query below sorts the dataset rows by product_id (major sort) and sale_date (minor sort) and then adds up the daily_sales from the first row to the last.
I want you to focus on the importance of the ORDER BY statement, which is ORDER BY product_id and then sale_date. Once the data is ordered, the cumulative sum adds up the daily_sales from the first row to the last.
If you took a calculator and added up all of the daily_sales values you see below, you would find that the sum of these values is 525,793.92.
The only reason the final total is different from our previous example is that I am not showing all of the rows.
Our next example uses a PARTITION BY statement, which means performing the analytic within each product_id separately. I like to think of it as resetting the cumulative with the beginning of each product_id. Notice that we are ordering the data by sale_date, but it appears we are ordering by product_id and then sale_date because of the combination of partitioning and ordering within the cumulative sum analytic.
Our example adds up all of the daily_sales values for product_id 1000 and then does the same for product_id 2000.
One of the brilliant capabilities of ordered analytics and window functions is mixing and matching many columns on the report with multiple analytics. Below is a report showing the product_id, sale_date, and daily_sales values mixed with two ordered analytics. Our top cumulative sum has a partition statement; the one below does not. However, both analytics were sorted the same way: first by product_id and then by sale_date.
Because the analytics sorted the rows similarly, the report appears exactly as designed. The top analytic provides a cumulative sum within each product_id, and the bottom is a cumulative sum from the first row to the last. We have subtotals and grand totals within the same query.
Did you know that Coffing Data Warehousing was the first to create software that joins data across all systems? Download your free Nexus trial at www.CoffingDW.com.
Watch how Nexus performs federated queries to join a Netezza table to tables from twenty systems in a single query. Watch the video.
To find out more about Nexus, you can watch our 60-second videos here: