Federated Queries – Comparing Presto to Nexus
Federated Querying: Comparing Presto and Nexus
Federated queries are essential for organizations needing to analyze data across disparate systems without centralizing it into a single repository. Both Presto and Nexus offer solutions for executing federated queries, but their approaches, architectures, and capabilities differ significantly. While Presto provides a powerful distributed query engine, Nexus offers unparalleled flexibility and automation, performing many of the same functions as Presto while extending well beyond its limitations.
In the picture below, Nexus performs a 20-table join spanning 20 systems in a single query.
Architectural Philosophy
Presto
Presto is a distributed SQL query engine that executes federated queries across multiple data sources. It relies on worker nodes that query data from source systems in parallel and move it into Presto’s memory for processing. Presto excels in environments with high scalability needs, as its worker nodes can be added to handle increased query workloads.
Nexus
Nexus takes a more user-centric and adaptive approach. Rather than defaulting to processing all joins in a dedicated query engine, Nexus empowers users to choose where joins are executed:
• User’s PC (MyPC Hub): Ideal for smaller datasets (up to 2 million rows) with sufficient local memory and processing power.
• Nexus Server: Optimized for larger datasets, the server acts as a scalable intermediary for data movement and join execution.
• Database Hubs: Users can designate any database platform as the hub, leveraging its native SQL capabilities to perform the join.
This flexibility allows Nexus to adapt to data sizes, system constraints, and user access rights.
Below is a conceptual picture of the interaction of Nexus desktops, a Nexus Server, and database platforms. Nexus desktops can connect directly to database platforms, but when large data needs to be moved, the user can execute federated queries or data migrations through the Nexus High-Speed Server. Enterprises can have unlimited Nexus desktops and servers for perfect linear scalability.
Query Execution and Data Movement
Presto
Presto queries data sources through connectors and retrieves only the necessary columns and rows, applying pushdown optimizations where possible. However, when performing joins across systems:
• Data Movement: Presto does move data from the source systems to its worker nodes, where the join is executed in memory. This movement is required for any multi-system join.
• Processing Location: All operations, including joins, are performed exclusively within Presto’s memory. The original database platforms play no role beyond providing the queried data.
Nexus
Nexus also queries data sources through ODBC connections, moving data from source systems to the designated hub in parallel streams. However, Nexus provides additional options:
- Data Movement Optimization:
• Only the necessary rows and columns are transferred, reducing network overhead.
• WHERE clauses are pushed down to source systems before data is retrieved, ensuring efficient queries.
- Join Flexibility:
• Joins can occur on a user’s PC (MyPC hub), a Nexus Server, or within a hub database system.
• Nexus automatically generates SQL for the designated hub, ensuring compatibility with its syntax and optimizing performance.
This flexibility ensures data joins occur in the most logical and efficient location based on the query’s requirements. Watch a 60-second video showing the Nexus perform a federated query.
Scalability and Performance
Presto
Presto’s distributed design allows it to scale horizontally by adding worker nodes. This scalability makes Presto well-suited for environments with large, complex queries requiring high concurrency and performance.
Nexus
Nexus provides scalability through its Nexus Server, which:
• Acts as an intermediary for large-scale joins.
• Processes data using its own memory and CPU resources, offloading work from the user’s PC.
• Can be deployed on-premises or in the cloud to align with organizational infrastructure.
• Companies can have unlimited Nexus desktops and servers, with each server physically located near the databases they access.
While Nexus may not match Presto’s worker node scalability, its flexibility in processing location (PC, server, or hub database) allows it to achieve comparable performance for a wide range of query sizes.
Watch the 60-second video that makes the Nexus Server so advantageous.
Key Differentiators
What Presto Can Do That Nexus Can Also Do:
• Perform distributed federated queries.
• Retrieve data from multiple sources in parallel.
• Execute joins on its central query engine (similar to Nexus’s MyPC hub or Nexus Server hub).
What Nexus Can Do That Presto Can Not:
• Allow users to designate any system as the hub (e.g., Oracle, Teradata) and perform joins there.
• Automatically generate SQL optimized for the hub system, including syntax, case sensitivity, and DDL conversions.
• Push joins to the user’s PC or Nexus Server for smaller or larger datasets, providing local or cloud-based flexibility.
• Automate the creation of load scripts and WHERE clause optimizations to reduce unnecessary data movement.
• Provide a guided, automated user experience that identifies joinable tables and simplifies query execution.
Conclusion
Presto and Nexus are both powerful tools for federated querying, but their design philosophies and capabilities differ significantly. Presto excels in high-scale distributed query environments where worker node scalability is critical. However, Nexus provides unmatched flexibility and ease of use, allowing users to adapt query execution to the data’s size, location, and system constraints.
For organizations requiring distributed, on-demand querying, Presto remains a strong choice. However, for those prioritizing user-friendly automation, hub-based query flexibility, and strategic data movement, Nexus is the more versatile and capable solution. Nexus matches Presto in many core functionalities and extends beyond its limitations, offering a federated querying solution that truly puts the user in control.
The picture below shows Nexus performing a DB2 Federated query. Notice there are three DB2 tables and a table from Oracle, SQL Server, Postgres, and MySQL. Because the Hub (upper right corner) is set to DB2, only the foreign tables move to DB2 (temporarily), where Nexus performs the join. Presto would have moved each table to Presto to perform the join, even if one of the DB2 tables had a billion rows.
Watch Nexus perform a 20-table join spanning 20 systems in a single query and see the conversions when the hub changes to varying systems.
You can also learn more about the capabilities of Nexus on our CoffingDW.com website or at this link.