I can teach you Yellowbrick analytics!  I have never seen a database perform analytics faster than Yellowbrick.  Today you are going to smile when you learn about NTile.

NTILE is an analytic function. It divides an ordered data set into a number of buckets indicated by expr and assigns the appropriate bucket number to each row. The buckets are numbered 1 through expr.

I will show you an example of an NTile, and then explain the theory.

NTile functions are used to group the values in an ordered list (ascending or descending) into one of several buckets or NTiles.  

For example, if you have 1,000 employees, and each has a different salary, and you did an NTile of 4 (refers to a quantile). The NTile sorts the salaries (i.e., descending), and then places 250 of the top salaries in the first bucket (1).  This means that each of these high salary rows has a bucket of one.  Then, 250 of the next highest salaries go in the next bucket (2), 250 of the next highest salaries in the next bucket (3), and the lowest 250 salaries in the fourth bucket.

For the NTile function, the buckets are calculated so that each bucket has exactly the same number of rows assigned to it or at most one row more than the others (the exception is that identical value rows are placed in the same bucket). These buckets are referred to as equiheight buckets.

Depending on the number of buckets the NTile result uses the names below:

2 tiles = median

3 tiles = terciles

4 tiles = quartiles

5 tiles = quintiles

6 tiles = sextiles

7 tiles = septiles

8 tiles = octiles

10 tiles = deciles

12 tiles = duodeciles

20 tiles = vigintiles

100 tiles = percentiles

1000 tiles = permilles

The next example shows a NTile using five (quintile), and orders the data in DESC mode.  Also, notice the “nulls last” in the ORDER BY statement.    

Now, we are going to take the next example even farther by using a NTile of three and adding a partition statement.  The partition statement resets the calculation, so each partition groups within their partition.

The next example uses a derived table to pull only the top bucket for each product_id.  The example below is using a PARTITION BY statement in a derived table with a WHERE clause to return the rows in the first bucket for each product_id.  We want to see the highest daily_sales values for each product_id.

If you want to move data from any system to Yellowbrick, you should use the Nexus and NexusCore Server for the data movement.  You can move these systems to Yellowbrick:

  • Teradata
  • Oracle
  • SQL Server
  • DB2
  • Greenplum
  • Redshift
  • Azure SQL Data Warehouse
  • Postgres
  • MySQL
  • Netezza
  • Snowflake

How do companies use the Nexus Server?  Some use the Nexus Server to migrate their legacy databases to the cloud.  Thousands of tables can move by merely using the Nexus Pro to choose your source system, database, and tables and then choose your target system, database, and schema.  When you press “Schedule on Server,” the users’ credentials and the load scripts are sent to the Nexus Server, and the load utilities initiate.

Many companies use the Nexus Server to move single tables into the user’s sandbox, while many utilize the Nexus Super Join builder to join data across platforms. User’s point-and-click on the Nexus Pro system tree to join tables across any cloud or on-premises database, and the Nexus Server moves the data to a common platform where it performs the join.

If you want to move data to Yellowbrick or you want to use the most fantastic query tool known to humankind, then use the Nexus.  Download your free Nexus trial at www.CoffingDW.com.

Watch the video of the Nexus moving data to Yellowbrick and all other systems.

I hope you enjoyed today’s Yellowbrick analytic lesson.  See you next week. 

Yellowbrick – The only modern data warehouse for hybrid cloud

Yellowbrick is the world’s only modern data warehouse for hybrid cloud. Enterprises rely on Yellowbrick to do the impossible in data analytics: get answers to the most challenging business questions for improved profitability, better customer loyalty, and faster innovation in near real-time, and at a fraction of the cost of alternatives.

Yellowbrick offers superior price/performance for thousands of concurrent users on petabytes of data, along with the unique ability to run analytic workloads on-premises, in a private cloud, and/or any public cloud and manage them in a simple, consistent way–all with predictable pricing via an annual subscription. Learn more at www.yellowbrick.com.

Thank you,

Tera-Tom

Tom Coffing
CEO, Coffing Data Warehousing
Direct: 513 300-0341
Website: www.CoffingDW.com
YouTube channel: CoffingDW
Email: Tom.Coffing@CoffingDW.com