Lakehouse-Native CRM

A lakehouse-native CRM treats your Databricks or Snowflake estate as a first-class data source, reading from it via MCP and exposing its own data via Delta Sharing, without a sync engine in between.

Definition

A lakehouse-native CRM is a CRM that treats your lakehouse (Databricks, Snowflake, or any Delta / Iceberg-compatible warehouse) as a first-class data source. Two architectural decisions make a CRM lakehouse-native:

  1. The CRM can read from the lakehouse at conversation time, typically via the Model Context Protocol (MCP), so the AI assistant can synthesize answers from warehouse data without a pre-built integration for each question.
  2. The CRM exposes its own data back to the lakehouse via Delta Sharing (or Iceberg REST), so analytics consumers see CRM records as governed tables in their existing catalog without operating a connector.

The defining characteristic is the absence of a CRM-to-lakehouse sync engine. The CRM does not copy lakehouse data into itself; the lakehouse does not copy CRM data into itself. Both sides read from each other directly.

Why the distinction matters

Most "Databricks integration" or "Snowflake integration" CRMs are sync-engine based. A Fivetran (or equivalent) connector batch-extracts CRM records into the lakehouse on a schedule. This is the same architectural pattern as middleware-bridged CRM-to-ERP integration, just running in the other direction. It produces the same costs:

  • Staleness of the data in the lakehouse, bounded by the connector's schedule.
  • Operational overhead of running and monitoring the connector.
  • Drift between the CRM's view and the lakehouse's view of the same records.
  • A vendor bill for the connector itself, often comparable in size to the CRM license.

A lakehouse-native CRM eliminates all four because there is no connector.

What "lakehouse-native" specifically requires

The test cuts through marketing language with three concrete questions:

  1. Can the AI assistant write a SQL query against your warehouse, run it federated, and render the result inline? Not "can it open a notebook." Not "can it link to a dashboard." Can it actually execute a federated query at conversation time? If yes, federated read is real.
  2. Does your data engineering team need to install a connector to get CRM data into the lakehouse? If no, and the data just appears as Unity Catalog (or Snowflake / Iceberg) tables under the CRM's catalog, outbound share is real.
  3. Does the CRM hold a local copy of any of your warehouse data? If yes, the integration is the sync-engine pattern in disguise.

A CRM that answers all three correctly (yes / no / no) is lakehouse-native. Anything else is some intermediate state.
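
The three-question test can be written down as a small check. This is only an illustrative sketch: the class and field names are invented here and are not part of any product API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LakehouseTestAnswers:
    """Answers to the three questions above (field names are hypothetical)."""
    federated_query_at_conversation_time: bool  # question 1: expected "yes"
    connector_required_for_outbound: bool       # question 2: expected "no"
    holds_local_copy_of_warehouse_data: bool    # question 3: expected "no"


def classify(answers: LakehouseTestAnswers) -> str:
    """Return 'lakehouse-native' only for the yes / no / no combination."""
    if (
        answers.federated_query_at_conversation_time
        and not answers.connector_required_for_outbound
        and not answers.holds_local_copy_of_warehouse_data
    ):
        return "lakehouse-native"
    return "intermediate"


# A CRM that federates reads but still needs a connector for outbound data.
verdict = classify(LakehouseTestAnswers(True, True, False))
```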

The MCP and Delta Sharing pieces

Two protocols do the work, with a third as the Iceberg-side equivalent of Delta Sharing:

  • Model Context Protocol (MCP). The emerging standard for LLM tool use. A lakehouse-native CRM acts as both an MCP client and an MCP server. Outbound, it speaks MCP to Databricks Genie, Snowflake's MCP server, or any third-party MCP server. Inbound, it exposes an MCP server so external assistants (Claude, Cursor, Mosaic AI) can drive the CRM as a tool. See /docs/lakehouse/mcp-bridge.
  • Delta Sharing. Databricks' open protocol for sharing Delta tables across organizations and engines. A lakehouse-native CRM exposes its records as Delta tables consumable by any Delta-compatible engine (Databricks, DuckDB, Trino, Apache Spark) without a connector. See /docs/lakehouse/delta-sharing.
  • Iceberg REST catalog. For Snowflake-first shops, Iceberg REST conformance provides the equivalent: the same data, accessible via the catalog protocol Snowflake supports natively. See /docs/lakehouse/overview.
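
To make the two wire formats concrete, here is a minimal sketch of what each side exchanges. MCP tool invocations are JSON-RPC 2.0 messages with the method `tools/call`, and Delta Sharing clients address a table as `<profile-file>#<share>.<schema>.<table>`. The tool name `run_sql` and the share, schema, and table names are assumptions made for illustration, not names taken from any vendor's server.

```python
import json


def mcp_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    })


def delta_sharing_table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the `<profile>#<share>.<schema>.<table>` address a Delta Sharing client reads."""
    return f"{profile_path}#{share}.{schema}.{table}"


# Hypothetical tool name and query, for illustration only.
request = mcp_tool_call(1, "run_sql", {"query": "SELECT 1"})

# Hypothetical profile file, share, schema, and table names.
url = delta_sharing_table_url("crm.share", "crm", "sales", "accounts")
```

With the open-source `delta-sharing` Python package installed and a real profile file issued by the provider, a URL of this shape can be passed to `delta_sharing.load_as_pandas(url)` to read the shared table.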

What it looks like for a user

A specific example. A sales rep at a distributor asks the CRM assistant: "What is the lifetime gross margin by account for the Cleveland territory?"

In a lakehouse-native CRM:

  1. The assistant identifies that gross margin is in the lakehouse, not the CRM.
  2. The assistant calls the Databricks MCP server with a generated SQL query against the analytics mart.
  3. Databricks runs the federated query, returns rows.
  4. The assistant renders the table in the CRM, attached to the account context.
  5. The rep clicks one of the accounts; that opens the account record, which itself live-reads order history from the ERP.

Three live reads (CRM, ERP, lakehouse), composed in one answer, no copies.
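
Steps 1 through 4 above can be sketched as a small orchestration function. Everything named here is hypothetical: the metric catalog, the `analytics.account_margin` table, the `?` placeholder style, and the two callables standing in for the MCP call and the CRM renderer. Step 5 (opening the account record) is a UI action and is left out.

```python
from typing import Callable, Dict, List

Rows = List[Dict[str, object]]

# Hypothetical catalog recording which system owns which metric.
METRIC_SOURCES = {"gross_margin": "lakehouse", "open_opportunities": "crm"}


def answer_metric_question(
    metric: str,
    territory: str,
    run_lakehouse_sql: Callable[[str, list], Rows],
    render: Callable[[Rows], str],
) -> str:
    # Step 1: identify where the metric lives.
    if METRIC_SOURCES.get(metric) != "lakehouse":
        raise ValueError(f"{metric!r} is not served from the lakehouse in this sketch")
    # Step 2: generate SQL against the (hypothetical) analytics mart.
    sql = (
        "SELECT account_id, SUM(gross_margin) AS lifetime_gross_margin "
        "FROM analytics.account_margin WHERE territory = ? GROUP BY account_id"
    )
    # Step 3: the lakehouse runs the query and returns rows; nothing is copied locally.
    rows = run_lakehouse_sql(sql, [territory])
    # Step 4: render the result inside the CRM.
    return render(rows)


def fake_lakehouse(sql: str, params: list) -> Rows:
    """Stand-in for the MCP call to the lakehouse; returns canned rows."""
    return [{"account_id": "A-1", "lifetime_gross_margin": 1250.0}]


def render_table(rows: Rows) -> str:
    """Stand-in for the CRM's inline table renderer."""
    return "\n".join(f"{r['account_id']}: {r['lifetime_gross_margin']:.2f}" for r in rows)


result = answer_metric_question("gross_margin", "Cleveland", fake_lakehouse, render_table)
```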

In a sync-engine CRM, only step 1 happens. The assistant then has no way to answer the question, because the gross margin data is not present in the CRM's database and the integration vendor did not anticipate this specific question.

Trade-offs

  • Lakehouse uptime dependency. Federated queries fail if the lakehouse is unreachable. (Mitigation: gateway-side retry logic, graceful fallback to "the lakehouse is currently unavailable.")
  • Query cost visibility. Federated queries incur lakehouse compute costs. The CRM has to surface this so the team can manage spend; Factory's Lakehouse module shows per-query cost estimates and per-tenant budget caps.
  • Latency budget. A federated query typically takes 200 ms to 2 s, depending on warehouse size. The CRM has to design its UX around that delay, typically with optimistic loading and streaming results.

These costs are real but smaller than the costs of operating a CRM-to-lakehouse connector at distributor scale.
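
The first two mitigations (retry with a graceful fallback, and a budget cap checked before the query runs) might look roughly like the sketch below. The function name, parameters, and exception are assumptions for illustration and do not describe how any particular gateway implements them.

```python
import time
from typing import Callable

FALLBACK_MESSAGE = "The lakehouse is currently unavailable."


class BudgetExceeded(Exception):
    """Raised when a query's estimated cost exceeds the remaining budget."""


def run_federated(
    query_fn: Callable[[], object],
    estimated_cost: float,
    remaining_budget: float,
    retries: int = 3,
    base_delay: float = 0.2,
    sleep: Callable[[float], None] = time.sleep,
) -> object:
    # Budget cap: refuse to spend lakehouse compute the tenant has not budgeted for.
    if estimated_cost > remaining_budget:
        raise BudgetExceeded(f"estimated {estimated_cost} > remaining {remaining_budget}")
    for attempt in range(retries):
        try:
            return query_fn()
        except ConnectionError:
            # Exponential backoff between attempts, none after the last one.
            if attempt < retries - 1:
                sleep(base_delay * 2 ** attempt)
    # Graceful fallback once all attempts have failed.
    return FALLBACK_MESSAGE
```

Passing `sleep` as a parameter keeps the backoff testable without real delays.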

Related terms

  • ERP-Native CRM. The same architectural principle applied to operational data (the ERP) rather than analytical data (the lakehouse).
  • Delta Sharing. The open protocol from Databricks for sharing Delta tables across organizations and engines.
  • Iceberg REST Catalog. The equivalent protocol from the Apache Iceberg ecosystem for catalog-level sharing.
  • MCP (Model Context Protocol). The emerging standard for exposing tools (and data sources) to LLM assistants.

Further reading