Teradata vs Netezza vs Hadoop – Software connecting all databases

Tera-Tom here! Isaac Newton changed the world when he brought us the concept of gravity, and he once wrote, “If I have seen further, it is by standing on the shoulders of giants.” The real gravity of big data is that there are no bigger giants than Teradata, Netezza, and Hadoop. Each changed the world forever, and each stood on the shoulders of its predecessors. This email will discuss the strengths of each as well as their differences.

Let’s first discuss their similarities and strengths. Each of these was born to be parallel. The design goal of parallel processing is to break up the data so that each processor performs an equal amount of work. When the data is spread evenly, doubling your processors comes close to doubling your speed. It’s like doing your laundry. If you go to the laundromat on a Saturday night, all of the machines are empty, so you can run every load at once and finish all of your washing and drying in less than an hour and a half. But if you go on a Sunday afternoon, you can only get one machine, and you are there all day long.

Teradata and Netezza have great scalability, but all of the hardware needs to be in the same room and purchased from the manufacturer. Hadoop, on the other hand, allows you to buy any hardware, place each piece of hardware in a separate location around the world, and still gives you the ability to spread the data in any configuration you want. It stood on the shoulders of giants. That’s why people believe that 90% of the world’s data will be on Hadoop within the next 5 years.

Each Teradata table designates a column as its primary index, and Teradata distributes the rows by hashing that key. This allows Teradata to master 2 extremes. Parallel processing can analyze petabytes of data, and if you use the primary index in the WHERE clause, Teradata runs the value through the same hashing algorithm to find that row, typically in under a second. To avoid full table scans, Teradata offers secondary indexes, partitioning, columnar design, and in-memory enhancements for performance tuning, which make Teradata the most sophisticated data warehouse solution the world has ever seen.
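
To make the hashing idea concrete, here is a minimal sketch in Python. It is not Teradata's actual hashing algorithm: the MD5-based hash, the count of 4 parallel units, and the names `unit_for`, `distribute`, and `lookup` are all assumptions made for illustration. It only shows why the same hash that spreads the rows can also find one row without touching the other units.

```python
import hashlib
from collections import defaultdict

NUM_UNITS = 4  # hypothetical number of parallel units


def unit_for(key, num_units=NUM_UNITS):
    """Map a primary index value to a parallel unit with a stable hash.

    MD5 is a stand-in here, not the hash a real system uses.
    """
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_units


def distribute(rows, key_column, num_units=NUM_UNITS):
    """Place each row on the unit chosen by hashing its primary index value."""
    units = defaultdict(list)
    for row in rows:
        units[unit_for(row[key_column], num_units)].append(row)
    return units


def lookup(units, key_column, value, num_units=NUM_UNITS):
    """Re-hash the value and search only the one unit that can hold it."""
    target = unit_for(value, num_units)
    return [row for row in units.get(target, []) if row[key_column] == value]


# Hypothetical sample table with customer_id as the primary index.
rows = [{"customer_id": i, "name": f"customer_{i}"} for i in range(1000)]
units = distribute(rows, "customer_id")
match = lookup(units, "customer_id", 42)
```

A full-table question (such as counting every row) would visit all units in parallel, while `lookup` goes straight to a single unit. Those are the 2 extremes described above.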

Netezza took a different approach. Instead of a primary index, Netezza has a distribution key, which also allows it to master the 2 extremes. But instead of indexing, Netezza uses a Zone Map and an FPGA Card. The FPGA Card sits on top of the disk, and before any blocks move, the Zone Map is checked. The Zone Map lists the min and max value for each column that the block stores. This tells Netezza where the data is not, so it can skip blocks that cannot contain the requested values. Teradata’s philosophy is to use extensive indexing to find where the data is; without a usable index, Teradata must move every data block from disk just to evaluate whether the block is needed. Netezza takes the opposite approach of knowing where the data is not, so it almost never has to read every block. The bottom line is that in some environments, Netezza outperforms Teradata without any tuning or DBA intervention.
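
The Zone Map idea can be sketched in a few lines of Python. This is an illustration of min/max block skipping in general, not Netezza's implementation: the `Block` class, the block size of 100, and the function names are assumptions, and a single integer column stands in for a real table.

```python
from dataclasses import dataclass


@dataclass
class Block:
    values: list     # the column values stored in this block
    min_value: int   # zone map entry: smallest value in the block
    max_value: int   # zone map entry: largest value in the block


def build_blocks(values, block_size):
    """Split a column into fixed-size blocks and record min/max per block."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def scan_equal(blocks, target):
    """Return matching values and how many blocks actually had to be read."""
    matches, blocks_read = [], 0
    for block in blocks:
        if target < block.min_value or target > block.max_value:
            continue  # the zone map proves the value is not in this block
        blocks_read += 1
        matches.extend(v for v in block.values if v == target)
    return matches, blocks_read


# Hypothetical column loaded in sorted order, 100 values per block.
order_ids = list(range(1000))
blocks = build_blocks(order_ids, block_size=100)
matches, blocks_read = scan_equal(blocks, 250)
```

In this sketch only 1 of the 10 blocks is read, because the data arrived in order. If the same values were loaded in random order, most blocks would span a wide min/max range and far fewer could be skipped.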

In a Teradata environment, you had better know your data, know your processes, and then be ready to spend a lot of time and money tuning the system. A Netezza system literally allows you to load and go from day one.

Hadoop isn’t a relational database like Teradata and Netezza, but instead a file system. It isn’t owned by any company, so the open source concept can save companies millions. You won’t get sub-second response times, but you can run MapReduce queries on enormous amounts of data and get speeds that eliminate all relational databases from consideration. MapReduce can also cut to minutes development work that might take an SQL developer months. When customers hear ‘save millions’, ‘cut development time’, and ‘gain speeds that are orders of magnitude faster’, they pay attention. The only problem with Hadoop is that the technology is still maturing, so converting tables to Hadoop, querying Hadoop, and understanding how to build the Hadoop environment are all part of a learning process.
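
For readers who have not seen MapReduce, the pattern can be sketched in plain Python. This does not use the Hadoop API and runs in a single process. The function names and the sample lines are assumptions for illustration. It only shows the three steps (map, shuffle, reduce) that Hadoop spreads across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


# Hypothetical input; on a real cluster each line could live on a different node.
lines = [
    "Teradata Netezza Hadoop",
    "Hadoop stores files",
    "hadoop scales out",
]
mapped = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
```

Because each call to `map_phase` looks at only one line and each call to `reduce_phase` looks at only one key, both steps can run on many machines at once, which is where the speed on enormous data sets comes from.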

These three technologies are probably the reason that Nexus has become the most recognized query tool in the world. Nexus can query every database in the world simultaneously, can convert table structures from any database to Hadoop, and lets every user see tables and views visually while guiding them, like a GPS, to the tables that join together. Nexus then builds the SQL automatically as users click on the columns they want to see on their report. Without a doubt, the Nexus is the best query tool for any and all flavors of Hadoop.

The new wave of technology utilizes Teradata, Netezza, and Hadoop in conjunction with each other, and it is the Nexus that ties them together seamlessly!

Sincerely,

Tom

Tom Coffing

CEO, Coffing Data Warehousing

Direct: 513 300-0341

www.CoffingDW.com

Email: Tom.Coffing@CoffingDW.com
