Best Practices For Federated Queries
Introduction to Federated Queries
Almost every developer or database user has joined two or more tables on a database using SQL.
A federated query joins two or more tables or views on separate database platforms and runs as a single query. The layman’s term is a cross-system join, but experts refer to these as federated queries.
I’ve done a 30-table join spanning 30 database systems in a single query.
This blog will show you the techniques we mastered that are considered best practices for federated queries.
The Components of a Federated Query
Joins within a database are difficult and resource hogs on massively parallel processing systems because two rows joining must be in the same memory pool.
So, the first fundamental for any join to happen is that the rows must join inside the memory of the database engine.
However, we cheat and allow the join to happen inside the memory of the user’s PC, laptop, or any server!
So, let me give you the exact menu of components for a federated query, regardless of whether you are joining two tables from two database systems or 30 tables from 30 systems.
You must figure out which system will perform the final join. We call this the Join Hub.
The SQL must be built in the case and SQL that the Join Hub will execute.
The foreign tables not residing on the Join Hub system must be converted, as well as the data types, so that they can be created on the Join Hub.
Database utilities must be used to move the foreign table(s) data to the Join Hub system.
The join is executed, and the foreign tables are dropped.
And it all has to happen within seconds.
Best Practice – Build the SQL Automatically
We believe we built the business’s best automated SQL join builder, which we named The Super Join Builder. Why make users write the SQL themselves when you can automate the process? Then, you allow the user to edit the SQL if they desire.
When you automate the SQL in a join builder, the user can easily change the column order in the select list. Users can instantly add an ORDER BY clause by dragging and dropping columns.
Most importantly, users can change the Join Hub to another system, and the join builder makes all the necessary changes in seconds.
Another reason to use an automated join builder is that these joins can be saved and executed on a scheduler and shared with others in the company.
But the icing is that you can use a WHERE clause on a foreign table that will eliminate rows before the data is moved, thus allowing tables with millions or billions of rows to work efficiently.
Check out the video below to see all of these best practices come into play.
Best Practice – Perform the Join On the User’s PC
The biggest asset every user has that only they use is their PC or laptop. 90% of federated queries can perform joins within the user’s PC or laptop.
The tables are simultaneously queried separately using an ODBC connection. Then, the PC or Laptop is instructed to join the data using the memory and CPU of the PC or laptop.
Every database platform has a CPU and memory to process joins and analytics. A user’s PC or laptop has a CPU and memory that work similarly.
A database platform indeed has more CPU power and memory, but it is being inundated with hundreds to thousands of queries from other users.
A user’s PC or laptop has no traffic, so it is faster to join the data using the user’s PC or laptop, and it takes traffic off of an already overburdened database system.
Check out the video below to see all of these best practices come into play.
Best Practice – Convert Tables and Build Load Scripts Automatically
The greatest nightmare for anyone using data is to convert a table definition (Data Definition Language – DDL) from one system to another. The second greatest nightmare is to convert the data types.
For a big migration project, this process often takes months to years, but we spent over a decade automatically converting tables across 25 different relational databases. It only takes seconds.
What almost nobody in the world can do is build load scripts that move the data between the source and the target system. We spent another decade mastering 250 load script languages. Hence, the data movement is automated and gives you options to use the load utility best fit for the task, which is usually based on table size.
Check out the video below to see all of these best practices come into play.
Best Practice – Use a WHERE Clause that Filters Before a Table Moves
When you write SQL with a WHERE clause, the data is filtered before the join takes place, but this does you no good if you move the table from the source to the target system.
The only choice that makes sense is to build a WHERE clause on any table moving so only the necessary rows will move from the source to the target, thus exponentially eliminating the data movement volume.
Check out the video below to see this best practice.
Best Practice – Schedule the join to be Executed
When you have built a federated query using a Super Join Builder, you can execute it from the command line and schedule it to run on your PC. If the data volume is large, you can schedule it on a server.
Scheduling allows you to control important factors around company business rules and efficiently controls data movement.
Check out the video below to see this best practice.
Here is a 60-second video explaining the philosophy behind using a Server to execute data movement and federated query jobs.
Best Practice – Save the Result Set as a Table on Another System
You can save enormous money by running a federated query and saving the result set to the database platform of your choice.
Saving the data as a table to another platform is the perfect solution for your Tableau and Power BI users.
Some companies save the result to free systems such as MySQL or Postgres and then have teams analyze the data on these inexpensive systems.
Check out the video below to see this best practice in action.
30-Table Join Spans 30 Systems
Watch the ultimate federated query in a 30-table join that spans 30 systems in a single query. What some thought once impossible is performed right before your eyes.
60-Second Videos of Nexus
The link below will show you 60-second videos of the best features of Nexus.