Tera-Tom here! If you don’t have it now, you will soon.  Why? For the same reason that sometimes you drive a car places, and other times you fly in a plane.  A car is designed for short trips and a plane for long trips.  Columnar is for short trips and rows are for longer trips.  Data warehouses have traditionally parallel processed rows.  A columnar database parallel processes columns.  It dominates on queries that are analytic-based and queries where only a few columns need to be analyzed.  The dominant players right now in Columnar are Amazon’s Redshift, HP’s Vertica, and ParAccel.  Teradata has also implemented Columnar in their V14 systems.

 

Let’s talk about both row-based and column-based systems.  A row-based system stores entire rows in data blocks on disk. If the table has only a few columns, there might be thousands of rows in a block, but if the rows have hundreds of columns, there might be 10.  The real tragedy is that, if you want to analyze just a single column in that table, the entire block must be moved into memory.  It’s like driving a bus every time you want to go to the grocery store.  It is unnecessary overkill on certain queries!

 

A columnar system stores an entire row, but when it physically stores it on disk, each column goes in its own data block. They have to keep the data blocks in sync, and their Achilles Heel is when you want to select every column from a table. But, when you only need to run analytics on 1, 2, or 3 columns, the data block movement slices through traffic like a motorcycle because you’re never taking any unneeded waste from disk to memory.  When you need all the information in a table, a row-based system is much better.  When you need analysis of just a few columns, a columnar system can’t be beat.  That’s why you need both!

 

I’m currently writing my next book on the Amazon Redshift product, a brilliant columnar database.  It is over 600 pages long, and it shows you how to set it up, tune it, load and go.  My book also contains all the SQL you need to query it with ease.  Consider trying out a columnar system for specific applications, and you are going to see why this is one of the great computer inventions of all time.  Look for it on the top 27 book websites around the world!

 

And always remember that the Nexus can query every system in the world including Redshift.  You can purchase the Nexus directly from me or go to Amazon.com and rent it there!

 

Sincerely,

 

Tom

 

Tom Coffing

CEO, Coffing Data Warehousing

Direct: 513 300-0341

www.CoffingDW.com

Email: Tom.Coffing@CoffingDW.com