Amazon has announced that in 2020 they will provide cross-system join capability, which many people call a federated query. Users can even save the results as a table on S3.
Here is exactly what the Amazon documentation says about the announcement:
“You can query data in Amazon RDS for PostgreSQL and Amazon Aurora with PostgreSQL compatibility from Amazon Redshift using external schemas. This type of query is called a federated query. You can use this capability to combine the data queried from one or more Amazon RDS PostgreSQL and Amazon Aurora PostgreSQL databases with data already in Amazon Redshift. You can also combine such data with data in Amazon S3 tables.
Amazon Redshift exposes RDS PostgreSQL and Aurora PostgreSQL metadata with the information schema (information_schema) in Amazon Redshift. Doing this enables business intelligence (BI) tools, such as Tableau and Micro-strategy, to view data referenced by the external schema.”
I never doubt anything that Amazon does, and they are once again being brilliant, which is exactly why they are the most dominant company in the world. Although this is their first attempt at federated queries, they will bring the future of all computing to the masses at an accelerated rate.
However, Amazon only allows federated queries that run from Amazon Redshift against Amazon RDS for PostgreSQL and Amazon Aurora PostgreSQL.
Who do you think will be the first company to allow federated queries across any combination of Amazon Redshift, Azure SQL Data Warehouse, Greenplum, Teradata, Oracle, SQL Server, DB2, Snowflake, MySQL, Postgres, Hadoop, Netezza, and SAP HANA?
I already know the answer. Coffing Data Warehousing!
So, why am I so happy about a powerhouse competitor like Amazon announcing support for federated queries?
Because it is all my team has been working on for 15 years! Major companies are not going to make a major paradigm shift to adopt a federated query approach because of Tera-Tom. Still, they will stand up and take notice when Amazon shows them the future of all computing.
I consider Amazon much more of a partner than a competitor.
Our 15-year development of federated queries compares to Thomas Edison testing the light bulb 10,000 times before he got it right. We test hundreds of federated queries every night, and we are years ahead of everyone else. At the end of this article, I am going to show you two diagrams that show how easy it will be in 2020 for your company to perform federated queries better than any of your competitors.
Someone once asked a guy in New York, “How do you get to Carnegie Hall?” He replied, “Practice, man, practice!”
How is it possible that Coffing Data Warehousing is years ahead on federated queries compared with giants like Amazon, Azure, Google, Teradata, Oracle, IBM, Dell, HP, Tableau, and MicroStrategy?
The answer is vision, engineering genius, expertise on every system, fifteen years of work, no stockholder pressure, and practice, man, practice!
Why have these brilliant companies so far not been able to pull it off? Because it is so difficult. Oracle doesn’t talk to Teradata, and Redshift doesn’t talk with Azure SQL Data Warehouse. Nobody talks to anybody because no vendor is interested in what their competitor is doing.
Database vendors are like the Hotel California: “You can check in your data any time you like, but you can never leave!”
Check out the picture below to see the future of what all federated query companies will be attempting to do.
The following picture shows (top right) an on-premises site with systems from Teradata, Oracle, SQL Server, Greenplum, DB2, Netezza, PostgreSQL, SAP HANA, Hadoop, and MySQL. On the top left, we have the Amazon AWS cloud. We have placed a server at each location and loaded the NexusCore Software. The user can now be anywhere in the world with an internet connection. They build the federated query by dragging and dropping tables into the Nexus Super Join Builder, defining the join condition, and placing a checkmark on the columns they want on their report.
So, to be able to perform federated queries, you need to solve these problems:
- Convert the table structures (DDL) and the data types between systems.
- Allow the user to drag-and-drop tables, views, and Excel worksheets from any on-premises or cloud system in a graphical user interface and then build the SQL automatically as they point-and-click on columns for their report.
- Move the data to a common place where it can join by mastering and automating every load utility from each vendor involved.
- Perform the join, drop the tables, and return the answer set.
- Give the user the option of saving the result as a table, Excel, or Tableau.
- Allow the user to save the join and share it with peers.
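To make the first problem on that list concrete, here is a minimal sketch of converting column types and generating DDL for a target system. The Teradata-to-Redshift type map and the function names `convert_type` and `convert_ddl` are assumptions made up for illustration. They are not the mapping or code that Nexus uses, and a real converter would also have to handle many more types, constraints, and vendor-specific options.

```python
import re

# Illustrative Teradata -> Redshift type map (an assumption for this sketch,
# not the mapping Nexus actually uses).
TERADATA_TO_REDSHIFT = {
    "BYTEINT": "SMALLINT",        # Redshift has no 1-byte integer type
    "SMALLINT": "SMALLINT",
    "INTEGER": "INTEGER",
    "BIGINT": "BIGINT",
    "FLOAT": "DOUBLE PRECISION",
    "DECIMAL": "DECIMAL",
    "CHAR": "CHAR",
    "VARCHAR": "VARCHAR",
    "DATE": "DATE",
    "TIMESTAMP": "TIMESTAMP",
}


def convert_type(source_type: str) -> str:
    """Translate one column type, keeping any (length) or (precision, scale)."""
    match = re.fullmatch(r"\s*([A-Za-z ]+?)\s*(\(.*\))?\s*", source_type)
    if match is None:
        raise ValueError(f"cannot parse type {source_type!r}")
    base = match.group(1).upper()
    args = match.group(2) or ""
    if base not in TERADATA_TO_REDSHIFT:
        raise ValueError(f"no mapping for type {base!r}")
    return TERADATA_TO_REDSHIFT[base] + args


def convert_ddl(table: str, columns: list[tuple[str, str]]) -> str:
    """Build a CREATE TABLE statement for the target from (name, type) pairs."""
    body = ",\n".join(f"    {name} {convert_type(ctype)}" for name, ctype in columns)
    return f"CREATE TABLE {table} (\n{body}\n);"


if __name__ == "__main__":
    print(convert_ddl("Orders", [("Order_No", "INTEGER"),
                                 ("Order_Total", "DECIMAL(10,2)"),
                                 ("Priority", "BYTEINT")]))
```

Failing loudly on an unmapped type is a deliberate choice in this sketch: silently guessing a type is how data gets truncated during a cross-system load.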
And wouldn’t it be a bonus if users could use powerful servers on wide area networks to execute the federated queries as fast as physically possible?
Coffing Data Warehousing and the Nexus Chameleon Desktop and NexusCore Servers do all of the above!
We have learned that small data needs to join differently than large data, and the joins need to be even more intelligent for humongous data. Handling all of the different scenarios is nearly impossible, but we have accounted for them all.
We have automated countless options to handle every scenario.
The most advanced work we have done is to master the table conversions between each system and master each vendor’s data movement utilities. We now convert between all systems and can move the data using 80 different load techniques.
We then allow the user to decide if they want to execute the federated query from their PC, or any of hundreds of NexusCore Servers. We then let the user decide which system should process the joining of the data.
How can we go wrong by allowing the user to process the join on their PC, any NexusCore Server, or any database system? This versatile approach gives a company the ability to spin up a cloud system of any size and process the federated queries there, or to process them on Teradata, Oracle, Redshift, Greenplum, Snowflake, etc.
On smaller data (less than 1,000,000 rows), a user can execute the federated query and make their PC the processing hub. The Nexus queries each table in the join separately and brings back only the columns needed to satisfy the query, including those used in the join condition and in the ORDER BY, HAVING, and WHERE clauses. The Nexus builds the join inside the user’s PC, and the report comes back almost instantly.
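The idea of a PC acting as the hub can be sketched in a few lines. In the example below, two in-memory SQLite databases stand in for two separate systems, each source is queried on its own for only the columns the report and the join need, and the join itself is done locally with a dictionary lookup. The table names follow the Super Join Builder example later in this article. The `Customer_No` join column, the extra columns, and the sample rows are assumptions invented for illustration, and this is not how Nexus is implemented internally.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create an in-memory SQLite database standing in for a remote system."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


# Stand-in for the first system (hypothetical columns and data).
orders_db = make_source(
    "CREATE TABLE Orders (Order_No INTEGER, Customer_No INTEGER, "
    "Order_Total REAL, Order_Date TEXT)",
    "INSERT INTO Orders VALUES (?, ?, ?, ?)",
    [(1, 10, 250.0, "2019-11-01"),
     (2, 11, 75.5, "2019-11-02"),
     (3, 10, 20.0, "2019-11-03")],
)

# Stand-in for the second system (hypothetical columns and data).
customers_db = make_source(
    "CREATE TABLE Customer_Table (Customer_No INTEGER, Customer_Name TEXT, Phone TEXT)",
    "INSERT INTO Customer_Table VALUES (?, ?, ?)",
    [(10, "Acme", "555-0100"), (11, "Globex", "555-0101"), (12, "Initech", "555-0102")],
)

# Step 1: query each source separately, fetching only the report columns plus
# the join column, and applying the WHERE filter at the source.
orders = orders_db.execute(
    "SELECT Order_No, Customer_No, Order_Total FROM Orders WHERE Order_Total > ?",
    (50,),
).fetchall()
customers = customers_db.execute(
    "SELECT Customer_No, Customer_Name FROM Customer_Table"
).fetchall()

# Step 2: join on the hub. Build a lookup on the join key, then probe it.
name_by_customer = dict(customers)
report = sorted(
    (order_no, order_total, name_by_customer[customer_no])
    for order_no, customer_no, order_total in orders
    if customer_no in name_by_customer
)

if __name__ == "__main__":
    for row in report:
        print(row)
```

Note that `Order_Date` and `Phone` never leave their source systems. Pruning columns and filtering rows before the data moves is what keeps a join like this small enough to run on a PC.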
On larger data, a user can execute the federated query and make the processing hub any NexusCore server. If a company were to buy several NexusCore Server software modules, they could place one at each data center and cloud region and always ensure they were processing the federated queries in the fastest way. The only information coming back to the user’s PC is the final answer set.
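The choice of hub described in the last two paragraphs can be written down as a simple rule. The sketch below is only an illustration under stated assumptions: the 1,000,000-row limit is read as the total rows involved in the join, the site names and server names are hypothetical, and picking the server at the site that holds the most rows is one plausible heuristic for reducing data movement, not a description of how Nexus decides.

```python
PC_HUB_ROW_LIMIT = 1_000_000  # the article's threshold for "smaller data"


def choose_hub(rows_by_site: dict[str, int], server_by_site: dict[str, str]) -> str:
    """Pick a processing hub for a federated query.

    rows_by_site   -- estimated rows the query pulls from each site
    server_by_site -- hypothetical map of site name -> NexusCore server name
    """
    total_rows = sum(rows_by_site.values())
    if total_rows < PC_HUB_ROW_LIMIT:
        return "MyPC"  # small enough to join on the user's PC

    # Larger data: prefer the server co-located with the largest share of
    # rows, so that the biggest table travels the shortest distance.
    candidates = [site for site in rows_by_site if site in server_by_site]
    if not candidates:
        raise ValueError("no NexusCore server at any site involved in the join")
    biggest_site = max(candidates, key=lambda site: rows_by_site[site])
    return server_by_site[biggest_site]


if __name__ == "__main__":
    servers = {"on_prem": "core-onprem", "aws_us_east": "core-aws"}
    print(choose_hub({"on_prem": 400_000, "aws_us_east": 100_000}, servers))
    print(choose_hub({"on_prem": 5_000_000, "aws_us_east": 200_000}, servers))
```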
It is one thing for a data scientist to be able to pull off federated queries, but it has always been my dream to make it so simple that everyone can do it. Check out the picture below and see how simple the Super Join Builder of Nexus makes federated queries. We automate everything!
The picture that follows shows the Nexus Super Join Builder. Let me summarize. We are joining the Orders table from Teradata to the Customer_Table from Redshift. We want the Order_No, Order_Total, and Customer_Name on our report. We are going to join and process the data using the Hub MyPC and execute the entire job from the NexusCore Server. The NexusCore Server will query each table separately, and in the background, perform the join. Nexus automatically converts the table structures (DDL) and data types, SQL, and creates the load utilities. All the user does is point-and-click, and press Execute.
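To give a feel for what "building the SQL automatically" involves, here is a minimal sketch that turns the point-and-click selections from this example into one SELECT statement per source system. The `TableChoice` class, the `build_source_query` function, and the `Customer_No` join column are hypothetical names introduced for illustration. The SQL that Nexus actually generates is not shown in this article and will differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableChoice:
    """One table dropped into the join builder (hypothetical structure)."""
    system: str
    table: str
    checked_columns: tuple[str, ...]  # columns the user ticked for the report
    join_column: str                  # column used in the join condition


def build_source_query(choice: TableChoice) -> str:
    """Build the SELECT sent to one source: checked columns plus the join key."""
    columns = list(choice.checked_columns)
    if choice.join_column not in columns:
        columns.append(choice.join_column)
    return f"SELECT {', '.join(columns)} FROM {choice.table}"


orders = TableChoice("Teradata", "Orders", ("Order_No", "Order_Total"), "Customer_No")
customers = TableChoice("Redshift", "Customer_Table", ("Customer_Name",), "Customer_No")

queries = {choice.system: build_source_query(choice) for choice in (orders, customers)}

if __name__ == "__main__":
    for system, sql in queries.items():
        print(f"{system}: {sql}")
```

The join column is added even when the user did not tick it, because the hub needs it to match rows even though it never appears on the report.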
The next paradigm shift in our industry is federated queries. You can have the best federated query system in the world by January 2020!
Please give me a call or drop me an email to see a demo. I will help you set everything up and then put the software in your hands for a free trial. I look forward to our partnership.
Tom Coffing, better known as Tera-Tom, is the founder of Coffing Data Warehousing, where he has been CEO for the past 25 years. Tom has written over 75 books on all aspects of Teradata, Netezza, Yellowbrick, Snowflake, Redshift, Aurora, Vertica, SQL Server, and Greenplum. Tom has taught over 1,000 classes worldwide, and he is the designer of the Nexus Product Line.