Definition

Delta Sharing

Delta Sharing is an open protocol from Databricks for live, secure sharing of Delta Lake tables across organizations, clouds, and tools without copying the data. It is the outbound half of a lakehouse-native data sharing strategy.

Definition

Delta Sharing is an open protocol developed by Databricks for sharing live data in Delta Lake format across organizations, cloud providers, and analytics tools without making a copy. The recipient authenticates to the sharing server with a bearer token issued by the sharing side, then reads the data files directly from object storage through short-lived pre-signed URLs. Recipients can be on a different cloud, a different tool (Databricks, Snowflake, DuckDB, Trino, Spark), or no platform at all (the protocol has Python and Java clients).
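As a rough illustration of the recipient side, the open-source delta-sharing Python connector addresses a table as <profile-file>#<share>.<schema>.<table>. The sketch below builds that address and wraps the connector's load_as_pandas call. The share, schema, and table names are hypothetical placeholders, and the actual read needs the delta-sharing package plus a real profile file, so it is defined here but not executed.

```python
from typing import Optional


def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the table address the delta-sharing Python connector expects."""
    for part in (profile_path, share, schema, table):
        if not part:
            raise ValueError("profile path, share, schema, and table are all required")
    return f"{profile_path}#{share}.{schema}.{table}"


def read_shared_table(url: str, limit: Optional[int] = None):
    """Load a shared table into a pandas DataFrame.

    Requires `pip install delta-sharing`. The import is lazy so that this
    module still loads on machines without the connector installed.
    """
    import delta_sharing

    return delta_sharing.load_as_pandas(url, limit=limit)


# Hypothetical names: a share "crm_share" with schema "default" and table "accounts".
url = table_url("config.share", "crm_share", "default", "accounts")
print(url)
```

With a valid profile file in place, read_shared_table(url, limit=10) would return the first rows as a DataFrame.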

What problem it solves

Before Delta Sharing, sharing analytical data between organizations meant either:

  1. Copying it. Export to CSV / Parquet, ship it, the recipient ingests it on their side. The copy goes stale immediately.
  2. Granting database credentials. Risky and breaks every governance model.
  3. Building a custom API. Expensive, slow, custom auth, custom rate limiting, custom schema evolution.

Delta Sharing collapses these into one protocol. The data stays in the source's object storage. The recipient queries it like any other Delta table. Updates on the source side are visible to the recipient on the next query.

For B2B software, this means a CRM (or any SaaS) can expose its data to a customer's analytics stack with no per-customer integration code.

How it works (at protocol level)

  1. Server. A Delta Sharing server (Databricks via Unity Catalog, the open-source reference server, or a custom implementation) advertises one or more shares. Each share contains one or more schemas, and each schema contains one or more tables (Delta Lake format, stored in object storage).
  2. Credentials. The recipient gets a profile file (conventionally named config.share) containing the server endpoint, a bearer token, and an optional expiration time. The provider can rotate or revoke a token at any time; after a rotation, the recipient only needs to swap in the new profile file.
  3. Client. The recipient's tool (Databricks, Snowflake, DuckDB, etc.) reads the share. The actual data access happens through pre-signed URLs to the underlying object storage, scoped per query.
  4. Schema evolution. The protocol handles schema additions and Delta-format changes natively; clients re-read the schema on each query.
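The steps above can be sketched at the HTTP level. The example below is a minimal sketch, not a full client: it parses a profile file and builds (without sending) the REST requests for listing shares and querying a table. The endpoint, token, and share/schema/table names are made-up placeholders, and build_request is an illustrative helper rather than part of any library.

```python
import json
import urllib.request

# Hypothetical profile contents. The field names follow the profile file
# format; the values are placeholders.
PROFILE_JSON = """
{
  "shareCredentialsVersion": 1,
  "endpoint": "https://sharing.example.com/delta-sharing/",
  "bearerToken": "example-token"
}
"""


def build_request(profile: dict, path: str, body: dict = None) -> urllib.request.Request:
    """Build an authenticated request against the sharing server (not sent)."""
    url = profile["endpoint"].rstrip("/") + path
    data = json.dumps(body).encode("utf-8") if body is not None else None
    method = "POST" if body is not None else "GET"
    req = urllib.request.Request(url, data=data, method=method)
    # Every call carries the bearer token from the profile file.
    req.add_header("Authorization", f"Bearer {profile['bearerToken']}")
    if data is not None:
        req.add_header("Content-Type", "application/json; charset=utf-8")
    return req


profile = json.loads(PROFILE_JSON)

# Step 1: discover what the server advertises.
list_shares = build_request(profile, "/shares")

# Step 3: ask for the files behind one table; limitHint is an optional hint.
query = build_request(
    profile,
    "/shares/crm_share/schemas/default/tables/accounts/query",
    body={"limitHint": 100},
)

print(list_shares.get_method(), list_shares.full_url)
print(query.get_method(), query.full_url)
```

Passing either request to urllib.request.urlopen would perform the call against a real server.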

The protocol is open and any party can implement either side.
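On the client side, much of an implementation comes down to handling the table query response, which is newline-delimited JSON: a protocol line, a metadata line carrying the schema, then one line per data file with its pre-signed URL. The sketch below parses a hand-written sample response; the URLs, IDs, and sizes are invented for illustration, and parse_query_response is a hypothetical helper name.

```python
import json

# Hand-written sample of a query response (newline-delimited JSON).
SAMPLE_RESPONSE = "\n".join([
    json.dumps({"protocol": {"minReaderVersion": 1}}),
    json.dumps({"metadata": {
        "id": "table-id-123",
        "format": {"provider": "parquet"},
        "schemaString": json.dumps({"type": "struct", "fields": []}),
        "partitionColumns": [],
    }}),
    json.dumps({"file": {
        "url": "https://bucket.example.com/part-0000.parquet?sig=abc",
        "id": "f1", "partitionValues": {}, "size": 1024,
    }}),
    json.dumps({"file": {
        "url": "https://bucket.example.com/part-0001.parquet?sig=def",
        "id": "f2", "partitionValues": {}, "size": 2048,
    }}),
])


def parse_query_response(text: str):
    """Split a query response into protocol, metadata, and file entries."""
    protocol, metadata, files = None, None, []
    for line in text.splitlines():
        if not line.strip():
            continue
        action = json.loads(line)
        if "protocol" in action:
            protocol = action["protocol"]
        elif "metadata" in action:
            metadata = action["metadata"]
        elif "file" in action:
            files.append(action["file"])
    return protocol, metadata, files


protocol, metadata, files = parse_query_response(SAMPLE_RESPONSE)
# The schema arrives with every response, so the client never caches it blindly.
schema = json.loads(metadata["schemaString"])
urls = [f["url"] for f in files]
total_bytes = sum(f["size"] for f in files)
print(len(urls), "files,", total_bytes, "bytes")
```

A real client would then fetch each URL and read the Parquet files before the pre-signed links expire.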

How Factory Labs uses Delta Sharing

Factory Labs exposes CRM data (Accounts, Contacts, Opportunities, Activities, Conversations, Cases) as Delta Sharing tables in a per-tenant share. Customers mount the share in Databricks (via Unity Catalog), Snowflake (via Iceberg-Delta interop), or any other Delta-compatible consumer.

Operationally this means:

  • Data engineers stop building CRM-to-warehouse pipelines.
  • Data is fresh within minutes (snapshot cadence is configurable).
  • Per-tenant HKDF-derived encryption keys mean every share is cryptographically isolated.
  • Revoking a token kills access without breaking other shares.
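To make the per-tenant key idea concrete, here is a minimal sketch of HKDF (RFC 5869) with HMAC-SHA256 deriving one key per tenant from a single master key. This is not Factory Labs' actual implementation: the master key, the "share-encryption:" context label, the empty salt, and the 32-byte key length are all assumptions made for illustration.

```python
import hashlib
import hmac

HASH_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF as defined in RFC 5869, using HMAC-SHA256: extract, then expand."""
    if not 0 < length <= 255 * HASH_LEN:
        raise ValueError("requested key length out of range")
    # Extract: condense the input keying material into a pseudorandom key.
    # An empty salt is replaced by HASH_LEN zero bytes, per the RFC.
    prk = hmac.new(salt or b"\x00" * HASH_LEN, ikm, hashlib.sha256).digest()
    # Expand: chain HMAC blocks, binding each one to the context in `info`.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def tenant_key(master_key: bytes, tenant_id: str) -> bytes:
    """Derive a tenant-specific key; the context label is a hypothetical choice."""
    info = b"share-encryption:" + tenant_id.encode("utf-8")
    return hkdf_sha256(master_key, salt=b"", info=info)


MASTER_KEY = bytes(range(32))  # placeholder; a real key would come from a KMS or HSM
key_a = tenant_key(MASTER_KEY, "tenant-a")
key_b = tenant_key(MASTER_KEY, "tenant-b")
print(key_a.hex())
print(key_b.hex())
```

Because the tenant ID is bound into the derivation, a key derived for one tenant reveals nothing usable about another tenant's key, which is the isolation property the bullet above describes.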

See /docs/lakehouse/delta-sharing for setup details.

Why it matters for B2B software

Before Delta Sharing, getting operational data into the customer's analytics stack was the customer's problem: they had to install a connector such as Fivetran, pay fees based on monthly active rows (MAR), and maintain the pipeline. The vendor was off the hook beyond shipping a REST API.

Delta Sharing flips the responsibility: the vendor publishes a share, the customer consumes it with whatever lakehouse tool they prefer. No connector, no per-row fee, no pipeline maintenance. For analytics-heavy B2B use cases, this is a structural improvement.

Trade-offs

  • Read-only. Delta Sharing is one-directional. Bidirectional flows need a separate mechanism (commonly MCP for AI-tool patterns or a write API for human-driven flows).
  • Eventual consistency. The source publishes snapshots on a cadence (typically every 1 to 15 minutes), so data is fresh enough for analytics but does not offer transactional read-after-write semantics.
  • Storage cost on the source side. The source maintains the Delta tables; that storage is paid for by the source, not the recipient.

These trade-offs are usually fine for analytics use cases and the wrong fit for transactional ones.
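Because snapshots land on a cadence, a recipient may want to poll cheaply and re-read a table only when something changed. The protocol's table version endpoint reports the current version in a Delta-Table-Version response header. The sketch below is a hypothetical caching check built on that header; table_version, needs_refresh, and the cached-version bookkeeping are illustrative names, not part of any client library.

```python
from typing import Mapping, Optional


def table_version(headers: Mapping[str, str]) -> int:
    """Extract the table version from response headers (names are case-insensitive)."""
    for name, value in headers.items():
        if name.lower() == "delta-table-version":
            return int(value)
    raise KeyError("Delta-Table-Version header missing")


def needs_refresh(cached_version: Optional[int], headers: Mapping[str, str]) -> bool:
    """Re-read the table only if the source has published a newer version."""
    return cached_version is None or table_version(headers) > cached_version


# Simulated responses from the version endpoint, one per polling interval.
first_poll = {"Delta-Table-Version": "41"}
second_poll = {"delta-table-version": "41"}
third_poll = {"Delta-Table-Version": "42"}

cached = None
refreshes = []
for headers in (first_poll, second_poll, third_poll):
    if needs_refresh(cached, headers):
        cached = table_version(headers)
        refreshes.append(cached)

print(refreshes)
```

A check like this keeps polling cost low between snapshots, but it does not change the consistency model: readers still see data only as of the last published version.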

Related terms

  • Iceberg REST Catalog. The Apache Iceberg-flavored equivalent; comparable goals, different protocol.
  • Lakehouse-Native CRM. The architectural pattern that Delta Sharing enables on the outbound side.
  • Model Context Protocol. The inbound complement to Delta Sharing, used for AI-driven federated reads.
  • Fivetran. The conventional alternative (pipelined extracts to the warehouse).

Further reading