Federated Query | Factory Labs Learn

Definition

A federated query is a single query that reads data from multiple sources at runtime, joining or combining the results without copying the data into a central store first. The query engine (a database, a SQL warehouse, an MCP-aware AI assistant) is responsible for translating the unified query into per-source sub-queries, executing them, and stitching the results.

The architectural opposite is ETL (Extract, Transform, Load): copy data from sources into one central store, then query the central store.

Why federated matters

For two reasons:

Freshness. Federated queries read live source data. ETL queries read whatever the last extract loaded.
No copy cost. ETL has storage, compute, and pipeline-maintenance costs. Federated does not (within reason; some federated queries are slow).

The trade-off is performance: ETL queries can be highly optimized (the data is local, the indexes are tuned). Federated queries are bounded by the slowest source. For analytical workloads with petabytes of data, federated is often impractical. For mixed workloads where freshness matters more than maximum scale, federated wins.

Where federated queries are used in B2B SaaS

Three common patterns:

CRM + lakehouse. The CRM AI assistant asks "what is the lifetime gross margin for this account?" and the query federates between the CRM (relationship data) and the lakehouse (analytics-grade calculations).
CRM + ERP. Functionally similar to live ERP reads, but at the query layer rather than the entity layer. The CRM SQL surface federates joins between local CRM data and live ERP data.
Across lakehouses. A query that needs both Snowflake (finance data) and Databricks (data-science data) federates across both with the query engine translating per-source.

The lakehouse + CRM federation is the case Factory Labs invests in heaviest because it is where AI-driven CRM use cases compound.

How federated query works (mechanically)

The query engine has to:

Know the schemas of every source. Either through pre-registered catalogs (Unity Catalog, Iceberg REST) or runtime discovery.
Translate the query. Decompose a unified SQL query into per-source sub-queries that each source can execute. Some operations push down to the source (filters, projections, sorts), others have to happen at the federation layer (cross-source joins, multi-source aggregations).
Execute. Run sub-queries in parallel where possible.
Merge. Stitch results in the federation layer.

Modern federation engines (Trino, Databricks Lakehouse Federation, Snowflake's federated query surface) do this with reasonable efficiency for most workloads.

Federated query through MCP

For AI assistants, the federated pattern often happens through the Model Context Protocol. The assistant has access to multiple MCP servers (one per data source); it generates a query plan that calls each server's tools, and synthesizes the results.

This is structurally similar to a SQL federation engine, but the orchestration happens in the LLM's reasoning loop rather than in a query planner. The trade-off is flexibility (the LLM can decide what to call dynamically) vs predictability (the LLM's plans are non-deterministic).

In practice both patterns coexist: pre-built federated SQL for analytical reads, MCP-driven federation for ad-hoc AI conversations.

How Factory Labs uses federated query

The Factory Labs Assistant:

Federates against Databricks via the Genie MCP server for analytics queries.
Federates against Snowflake via the Snowpark MCP server.
Federates against the ERP gateway for live operational data.
Synthesizes across all three when a question requires it (e.g., "what is the projected gross margin on this open opportunity given the current inventory cost basis?").

The query plan is generated by the LLM, executed through MCP, with results returned to the user with citations to each source.

Trade-offs

Latency bounded by slowest source. A federated query is no faster than the slowest sub-query. If the ERP is slow, the federated query is slow.
Auth and governance complexity. Each source has its own permissions; the federation layer has to enforce the right scope across all of them.
Some operations are awkward. Large cross-source joins are slow because the join data has to flow to the federation layer.

These are usually fine for the AI-driven mixed-source workloads federated query enables; they are show-stoppers for analytical workloads at warehouse scale.

Model Context Protocol. The protocol that makes AI-driven federation practical.
Delta Sharing. The complementary pattern; share data, do not federate. Sometimes one, sometimes both.
Lakehouse-Native CRM. The architectural pattern that makes federated query a first-class feature of the CRM.
ETL / ELT. The alternative pattern.