Delta Sharing Publish
Stream CRM facts (accounts, contacts, leads, opportunities, activities, cases) back to your Databricks, Snowflake, or PyIceberg lake every 15 minutes as Delta tables.
The Delta Sharing publish stream mirrors core CRM facts back to your lakehouse as Delta tables — written by Factory, read by your data team in Databricks, Snowflake, or PyIceberg with one bearer token. No copy of data inside Factory; no destructive operations on your side.
What gets published
Six tables go into the share, refreshed roughly every 15 minutes:
| Table | Columns | Notes |
|---|---|---|
| `crm.accounts` | id, organization_id, name, domain, industry, employees, revenue, country, created_at, updated_at | No PII columns included |
| `crm.contacts` | id, account_id, first_name, last_name, email_hash, phone_hash, title, owner_id, created_at, updated_at | Email/phone are SHA-256 hashed for privacy (see the hashing sketch below) |
| `crm.leads` | id, organization_id, source, status, owner_id, score, created_at, updated_at | |
| `crm.opportunities` | id, account_id, name, stage, amount, close_date, owner_id, won_at, lost_reason, created_at, updated_at | |
| `crm.activities` | id, related_to_type, related_to_id, type, subject, completed_at, owner_id, created_at | Bodies excluded — joinable to CRM via id if needed |
| `crm.cases` | id, account_id, contact_id, status, priority, queue, sla_due_at, resolved_at, created_at, updated_at | |
The schema is fixed (`PUBLISH_TABLES.<entity>` in code) and tracked as part of the platform contract — additions are non-breaking; removals are versioned.
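Because `crm.contacts` ships `email_hash` / `phone_hash` rather than raw values, a consumer who wants to join on email has to hash their side the same way. The normalization applied before hashing is not spelled out here, so the snippet below is a minimal sketch that assumes trim-and-lowercase before SHA-256; verify it against one known contact before relying on it.

```python
import hashlib

def email_hash(email: str) -> str:
    """Hex SHA-256 of a normalized email.

    Assumption (not confirmed by this page): the publisher trims and
    lowercases the address before hashing. Check one known contact first.
    """
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# Hash your own contact list, then join against crm.contacts.email_hash
print(email_hash("Ada.Lovelace@example.com"))
```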
How publish works
The publisher emits real Delta Lake protocol v1.x tables — proper transaction logs (`_delta_log/00000000000000000001.json`), Parquet data files, and `state.json` watermarks per table. Recipients consume via any Delta-aware client without needing a custom connector.
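For orientation, a commit file under `_delta_log` is newline-delimited JSON, one action per line, declaring the reader/writer protocol, the table metadata, and the Parquet files added in that cycle. The values below are illustrative placeholders, not Factory's actual output:

```python
import json

# Illustrative only: the kinds of actions a _delta_log commit file contains.
# Real commits carry the table's actual schema and the Parquet paths written
# in that publish cycle.
actions = [
    {"protocol": {"minReaderVersion": 1, "minWriterVersion": 2}},
    {"metaData": {"id": "<table-uuid>", "format": {"provider": "parquet"},
                  "schemaString": "<json schema>", "partitionColumns": []}},
    {"add": {"path": "part-00000-<uuid>.parquet", "size": 1048576,
             "modificationTime": 1700000000000, "dataChange": True}},
]
print("\n".join(json.dumps(a) for a in actions))
```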
Step 1 — Activate the share
Go to Settings → Integrations → Publish to Lakehouse:
`/settings/integrations/share`

The "Your share" card shows:

- Share name — defaults to `organizations.delta_share_name` (or your tenant slug)
- Schema — `crm` (fixed)
- Endpoint — `https://<your-domain>/api/v1/delta-sharing`
- Tables — the 6 tables listed above
Step 2 — Add a recipient
A recipient is a bearer-token-scoped reader of your share. You typically create one per consuming environment (one for the data team's Databricks workspace, one for an analyst Snowflake account, etc.).
- Type a recipient name (e.g. `pilot-databricks`).
- Click Add recipient.
- A modal pops with a bearer token — `dsh_…` (68 characters). Copy it now — once you close the modal, the token is gone forever (only its SHA-256 hash + last-4 suffix is stored).
- Click Download .share file. A standard Delta Sharing profile JSON downloads:
```json
{
  "shareCredentialsVersion": 1,
  "endpoint": "https://app.factorylabs.ai/api/v1/delta-sharing",
  "bearerToken": "dsh_…",
  "expirationTime": null
}
```

Hand the .share file (or just the token + endpoint) to the consumer.
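Before handing the file off, it is easy to sanity-check the token against the endpoint: the Delta Sharing protocol exposes GET /shares, the same route the troubleshooting section below uses. A minimal check with Python's requests, using the profile values above:

```python
import json
import requests

# Values copied from the downloaded .share profile
endpoint = "https://app.factorylabs.ai/api/v1/delta-sharing"
token = "dsh_…"  # the recipient's bearer token

resp = requests.get(
    f"{endpoint}/shares",
    headers={"Authorization": f"Bearer {token}"},
    timeout=10,
)
resp.raise_for_status()  # a 401 here means the token is wrong or revoked
print(json.dumps(resp.json(), indent=2))  # should list exactly one share
```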
Step 3 — Trigger the first publish
Publish runs on a 15-minute cron (configured in vercel.json). To force an immediate run for your tenant:
```bash
curl -H "Authorization: Bearer $CRON_SECRET" \
  https://<your-domain>/api/cron/warehouse-publish
```

The response includes a per-tenant summary:
```json
{
  "ok": true,
  "summaries": [{
    "slug": "<tenant>",
    "status": "ok",
    "rowsPerTable": { "accounts": 1234, "opportunities": 5678, ... },
    "bytesWritten": 12345678
  }]
}
```

Subsequent runs are incremental — only rows with `updated_at` > watermark get re-emitted. The watermark is persisted per table in `share_publish_runs`, so a missed cycle catches up automatically on the next run.
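The incremental rule is simple enough to sketch: read the table's stored watermark, keep only rows whose `updated_at` is newer, and advance the watermark after a successful write. The snippet below uses in-memory stand-ins for the rows and the persisted watermark; it illustrates the rule, not the publisher's actual code.

```python
from datetime import datetime, timezone

# Stand-in data: two rows, one already published and one changed since.
rows = [
    {"id": "a1", "updated_at": datetime(2024, 5, 1, tzinfo=timezone.utc)},
    {"id": "a2", "updated_at": datetime(2024, 5, 3, tzinfo=timezone.utc)},
]
# Watermark as persisted per table (share_publish_runs in the real system)
watermark = datetime(2024, 5, 2, tzinfo=timezone.utc)

changed = [r for r in rows if r["updated_at"] > watermark]
print(f"re-emitting {len(changed)} row(s)")  # -> re-emitting 1 row(s)

if changed:  # advance the watermark only once the write succeeded
    watermark = max(r["updated_at"] for r in changed)
```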
Consume from Databricks
In Databricks Unity Catalog:
- Catalog → Sharing → Add provider → Bearer token.
- Paste the contents of the `.share` file.
- Click the new provider → Add catalog. Pick your share name. Name the catalog (e.g. `factory_crm`).
- The catalog appears with a single schema `crm` and 6 tables.
Query as if it were a native Unity Catalog table:
```sql
SELECT count(*) FROM factory_crm.crm.opportunities;

-- Join warehouse data with CRM facts inside Databricks:
SELECT a.name, COUNT(o.id) AS orders
FROM factory_crm.crm.accounts a
LEFT JOIN analytics.public.orders o
  ON o.account_id = a.id
GROUP BY 1 ORDER BY 2 DESC LIMIT 10;
```

After a fresh publish cycle, run `REFRESH SCHEMA factory_crm.crm` to pick up new rows.
Consume from Snowflake
Snowflake reads Delta Sharing natively via CREATE CATALOG INTEGRATION:
```sql
USE ROLE ACCOUNTADMIN;

CREATE CATALOG INTEGRATION factory_crm_share
  TYPE = DELTA_SHARING
  TABLE_FORMAT = DELTA
  INTEGRATION = 'https://app.factorylabs.ai/api/v1/delta-sharing'
  BEARER = 'dsh_…'
  ENABLED = TRUE;

SHOW SHARES IN CATALOG INTEGRATION factory_crm_share;
CREATE DATABASE FACTORY_CRM FROM SHARE provider.<share_name>;

USE FACTORY_CRM.crm;
SELECT count(*) FROM accounts;
```

Cross-stack JOINs work without either side egressing data — both sides run inside Snowflake:

```sql
SELECT a.name, COUNT(o.id) AS orders
FROM FACTORY_CRM.crm.accounts a
LEFT JOIN ANALYTICS.PUBLIC.orders o
  ON o.account_id = a.id
GROUP BY 1 ORDER BY 2 DESC LIMIT 10;
```

Consume from PyIceberg / Python
Any Delta-aware client works. Quick sanity check from Python:
```python
import delta_sharing

# Point the client at the downloaded .share profile
profile = "factory_crm.share"
client = delta_sharing.SharingClient(profile)
print(client.list_all_tables())

# Load one table as a pandas DataFrame
df = delta_sharing.load_as_pandas(f"{profile}#<share>.crm.accounts")
print(df.head())
```

Iceberg REST
For tools that prefer the Iceberg REST catalog API over Delta Sharing, Factory exposes a read-only conformance subset of Iceberg REST at:
`/api/v1/iceberg/v1/*`

Discovery surface only — `list_namespaces`, `list_tables`, `load_table` (returns metadata + schema). Full table scans require an Iceberg metadata writer (separate roadmap item). Useful for cataloging / lineage tools.

Enable with the `ICEBERG_REST_ENABLED=true` deployment flag. Auth uses the same recipient bearer tokens as Delta Sharing.
```bash
BASE=https://app.factorylabs.ai/api/v1/iceberg/v1
TOKEN="dsh_…"
SHARE="<your share name>"

curl -H "Authorization: Bearer $TOKEN" "$BASE/$SHARE/namespaces" | jq
# { "namespaces": [["crm"]] }

curl -H "Authorization: Bearer $TOKEN" "$BASE/$SHARE/namespaces/crm/tables" | jq
# { "identifiers": [{"namespace":["crm"],"name":"accounts"}, ...] }
```

PyIceberg conformance:
```python
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "factory",
    type="rest",
    uri="https://app.factorylabs.ai/api/v1/iceberg",
    token="dsh_…",
    warehouse="<share>",
)
print(catalog.list_namespaces())  # [("crm",)]

tbl = catalog.load_table(("crm", "accounts"))
print(tbl.schema())
```

Revoke a recipient
In Settings → Integrations → Publish to Lakehouse, click Revoke on the recipient row. Status flips to revoked immediately. Bearer tokens are checked on every Delta Sharing request (no caching), so the next read attempt fails within seconds with HTTP 401. No grace period — revocation is instant by design.
To rotate a token, revoke the recipient and create a new one with the same name.
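The instant-revocation behavior follows from how tokens are checked: each request's dsh_… token is SHA-256-hashed and compared to the stored hash of an active recipient. A minimal sketch of that check (the recipient record shape here is hypothetical):

```python
import hashlib

# Hypothetical recipient record: only the token's SHA-256 hash and last-4
# suffix are stored, never the token itself.
recipient = {
    "name": "pilot-databricks",
    "token_sha256": hashlib.sha256(b"dsh_example").hexdigest(),
    "token_last4": "mple",
    "status": "active",  # flips to "revoked" when you click Revoke
}

def is_authorized(presented_token: str) -> bool:
    """Evaluated on every Delta Sharing request; no caching, so revocation is instant."""
    hashed = hashlib.sha256(presented_token.encode("utf-8")).hexdigest()
    return recipient["status"] == "active" and hashed == recipient["token_sha256"]

print(is_authorized("dsh_example"))   # True while active
recipient["status"] = "revoked"
print(is_authorized("dsh_example"))   # False; the next read returns HTTP 401
```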
Operational dashboard
The same page shows the publish history per tenant:
- Last 10 publish runs with status, duration, rows per table, bytes written
- Per-recipient: `last_seen_at`, `request_count_24h`, `bytes_egressed_24h`
- Failed runs surface the `share_publish_runs.error` column inline
Errors in publish are logged but do not page — the next 15-min cycle retries with the same watermark, so a transient blob outage self-heals.
Plan & feature flags
Delta Sharing publish is gated by three deployment env vars:
| Variable | Required | Notes |
|---|---|---|
| `BLOB_READ_WRITE_TOKEN` | Yes | Vercel Blob token — destination for Parquet + log files |
| `CRON_SECRET` | Yes | Auth for the cron trigger endpoint |
| `DELTA_SHARING_BASE_URL` | Yes | Public origin for the share endpoint (e.g. `https://app.factorylabs.ai/api/v1/delta-sharing`) |
When DELTA_SHARING_BASE_URL is unset, the Publish to Lakehouse page renders a "not configured" notice and the cron skips silently.
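A rough sketch of that gate, using the variable names from the table above (the skip behavior is as described on this page; the snippet itself is only an illustration):

```python
import os

# Variable names from the table above; nothing runs unless all three are set.
REQUIRED = ("BLOB_READ_WRITE_TOKEN", "CRON_SECRET", "DELTA_SHARING_BASE_URL")

missing = [name for name in REQUIRED if not os.environ.get(name)]
if missing:
    print(f"Publish to Lakehouse not configured; cron skips ({', '.join(missing)})")
else:
    print("Delta Sharing publish enabled")
```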
Troubleshooting
.share file downloads but Databricks says "no shares"
The endpoint URL in the .share file must be reachable from Databricks. If you're using a custom subdomain, confirm it resolves and serves a 200 from GET /api/v1/delta-sharing/shares (with the bearer token).
Publish cron returns ok: true but bytes_written is 0
First run with no data, or the watermark is already at the latest updated_at. Insert a test row in CRM and re-run.
Recipient last_seen_at never updates
Either the consumer hasn't queried yet, or the bearer token they're using doesn't match the SHA-256 hash on file. Re-create the recipient and hand them a fresh token.
Snowflake CREATE CATALOG INTEGRATION returns "invalid endpoint"
Snowflake validates the endpoint by hitting /shares with the bearer. Verify with `curl -H "Authorization: Bearer dsh_…" https://<your-domain>/api/v1/delta-sharing/shares` — it should return a JSON array with one share.
Related guides
- Connect Databricks (warehouse) — the inbound federated reads stream.
- Connect Snowflake — Snowflake-native pilots typically run both this stream and Delta Sharing publish together.
- Governance — revocation guarantees, token storage, audit trail.