I am fortunate to have the experience of working with one of the largest insurance companies in the world on a big data cloud migration project. This particular company, which I will call big insurance, has one side of their company utilizing a Teradata enterprise data warehouse and the other a Netezza enterprise data warehouse. They also have Oracle, SQL Server, MySQL, and DB2 systems.
They are migrating to three cloud technologies: Amazon Redshift, Snowflake on AWS, and the Teradata cloud. Because of the sheer amount of data, including hundreds of databases and schemas and thousands of tables within each database and/or schema, they have 12 employees in charge of moving the data.
Where do I come into the picture? 30 years ago, I got a call from them mentioning to me that they were paying Teradata $50,000.00 per week for consulting services to manage and run their Teradata system. I told them they misspoke because they meant to say $50,000.00 per month, but they reiterated that they were paying $50,000.00 per week. They wanted me to train their employees to manage their Teradata system, so I put together a two-week Teradata boot camp for 30 employees. I taught them Teradata SQL, the Teradata architecture, performance tuning, DBA responsibilities, and how to build data loading scripts.
To say the training was successful is an understatement because it saved them $50,000.00 per week, $200,00.00 per month, and $600,000.00 per year. And thus, the Tera-Tom name was born!
I have since provided Netezza training boot camps and Amazon Redshift and Snowflake training boot camps. After writing over 85 books, I am an expert on all big data systems architecture and SQL.
Earning their trust, they were one of the first companies to purchase my Nexus Query Chameleon software to replace Teradata SQL Assistant. Nexus helped their IT and business users rapidly develop applications, and the business users loved that they can point-and-click on tables and let Nexus build the SQL automatically. The DBA team asked us to automate the process of compressing their Teradata tables to save perm space, so we built algorithms that scanned their tables and altered the table’s DDL to save them an average of 20% space savings using multi-value compression.
However, they also purchased my Nexus Server software three years ago for data migrations. Previously, they used Amazon Migration Services to migrate data, but after testing the Nexus Server, they found it was faster, easier to use, and more reliable.
The biggest lesson we learned about migrating data was that many tables contain hundreds of millions of rows, so efficiency and speed are imperative. In the first 15 years of Nexus development, we were proud that anyone could log on to Nexus on their PC and automatically migrate data. However, we use load utilities from each vendor, and Nexus builds and executes the load scripts, so data moves from the source system through the user’s PC and onto the target system, which takes too much time. Ouch!
We went back to the drawing board five years ago and built the Nexus Server, where a company can install the Nexus Server software on any windows server or VM. We then pair the user’s Nexus on their PC with the Nexus Server so users can still set up migrations from their PC, but they can schedule the jobs to run on a Nexus Server. Now, 500 million row tables often move in minutes because the data moves from the source system through the high-speed Nexus Server and onto the target system. The ingenious part about this design is that big insurance can place these Nexus Servers on-premise or on any cloud. And they can scale up to have as many Servers as they need in any location.
Interestingly enough, they first needed to migrate their on-premises Teradata system to the Teradata private Intellicloud. The Nexus Server uses over 100 different utilities depending on the source and target system, so they asked us to add the ability to use Teradata’s QueryGrid as an option to move data between their on-premises Teradata prod and test systems. We added QueryGrid as an option, but they found it was faster to use the Nexus Server and choose Teradata Parallel Transport (TPT) Load scripts. Although both QueryGrid and the TPT options of insert, stream, load, and update were available on the Nexus Server, the TPT load was 10 times faster than the QueryGrid option. They also use the Nexus Server to migrate from their on-premises Teradata production system to the private Teradata cloud.
Fascinating to me, they also requested the ability to move Teradata, Netezza, Oracle, SQL Server, and DB2 to Amazon Redshift and Snowflake. But even more surprising, they also needed to move data from Redshift and Snowflake back to Netezza and Teradata. That is when we made the decision to move to and from all database platforms in the Nexus arsenal, including Excel, Microsoft Access, Teradata, Oracle, SQL Server, DB2, SAP Hana, Netezza, MySQL, Postgres, Snowflake, Azure Synapse, Amazon Redshift, Yellowbrick, and Google BigQuery.
They also asked us to improve Nexus’s “Compare and Sync” feature to include an analytic comparison. After each table loads, the Nexus runs a comparison of both tables and compares row counts and sums, and averages on specific column types to see if the tables are an exact match.
They also asked us to allow them to convert the table structures (DDL) from the source and target systems for tens of thousands of tables without migrating the data. They now run the conversions, produce the tables on the target system, and test for any errors or adjust the data types.
We also allowed them to transform the tables before migration or run a script afterward, essentially ETL and ELT, but also with the ability to exclude certain columns and specific rows.
Another surprise was the difficulties of moving all of the data because, with such a variety of data types and the sheer volume of data, we needed to give them options for record delimiters, null character conversions, and more because, in insurance, you are going to see everything from dirty data to hidden characters in some tables.
They then told us that they want to run a query on Teradata that produces 500 million rows, but they want the answer set to be saved automatically on a MySQL system. We were able to build that functionality, which allows them to save money by utilizing a free system because it is open source. Their group responsible for pricing uses the MySQL data for analysis and machine learning.
Finally, they told us that they need to test the data movement for all tables first, but want the ability to save all migration jobs and make them editable for later when the migration needs to officially happen.
They were so pleased with the functionality and customization that they purchased 100 Nexus desktop licenses and four Nexus Server licenses for query, ad hoc, migration, and federated query capabilities.
Here are the 10 lessons learned:
- Big insurance needs to set up and schedule thousands of tables to migrate in a point-and-click fashion taking only minutes with all DDL and data type conversions, load script building, and options standards to be completely automated by Nexus.
- Big insurance needs to migrate specific tables from the cloud back to their legacy systems from various users.
- Big insurance needs its users to be able to work from anywhere but schedule migration and large queries on their high-speed Nexus Server within their firewall.
- Big insurance needs to automate the compression of their Teradata tables to save space and money.
- Big insurance needs to run SQL on one system and have the result set saved to another system.
- Big insurance needs to test migrations to ensure success and then officially move the data on a specific date.
- Big insurance needs to convert table structures by the thousands in minutes to create the tables on the target system.
- Big insurance needs to compare and sync the moving tables to ensure the data is the same on the source and target system.
- During the migration process, big insurance needs to join data from legacy and cloud systems and determine which system will process the join (federated queries).
- Big insurance needs multiple employees and high-speed Servers to perform large migrations in parallel.
The great news is that we are now working with large financial institutions. I will let you know how that goes as we help them complete their migrations.
Tom Coffing, better known as Tera-Tom, is the founder of Coffing Data Warehousing where he has been CEO for the past 25 years. Tom has written over 75 books on all aspects of Teradata, Netezza, Yellowbrick, Snowflake, Redshift, Aurora, Vertica, SQL Server, and Greenplum. Tom has taught over 1,000 classes worldwide, and he is the designer of the Nexus Product Line.