Factory Labs

Delta Sharing Publish

Stream CRM facts (accounts, contacts, leads, opportunities, activities, cases) back to your Databricks, Snowflake, or PyIceberg lake every 15 minutes as Delta tables.

The Delta Sharing publish stream mirrors core CRM facts back to your lakehouse as Delta tables — written by Factory, read by your data team in Databricks, Snowflake, or PyIceberg with one bearer token. No copy of data inside Factory; no destructive operations on your side.

What gets published

Six tables go into the share, refreshed roughly every 15 minutes:

  • crm.accounts — id, organization_id, name, domain, industry, employees, revenue, country, created_at, updated_at. No PII columns leaked.
  • crm.contacts — id, account_id, first_name, last_name, email_hash, phone_hash, title, owner_id, created_at, updated_at. Email/phone are SHA-256 hashed for privacy.
  • crm.leads — id, organization_id, source, status, owner_id, score, created_at, updated_at.
  • crm.opportunities — id, account_id, name, stage, amount, close_date, owner_id, won_at, lost_reason, created_at, updated_at.
  • crm.activities — id, related_to_type, related_to_id, type, subject, completed_at, owner_id, created_at. Bodies excluded — joinable to CRM via id if needed.
  • crm.cases — id, account_id, contact_id, status, priority, queue, sla_due_at, resolved_at, created_at, updated_at.

The schema is fixed (PUBLISH_TABLES.<entity> in code) and tracked as part of the platform contract — additions are non-breaking; removals are versioned.
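As a sketch, the fixed contract can be pictured as a plain mapping from entity to column list (the column lists mirror the table above; the real PUBLISH_TABLES definition in the codebase may differ in shape):

```python
# Illustrative sketch of the fixed publish schema. The authoritative definition
# is PUBLISH_TABLES in the Factory codebase; this mirrors the documented columns.
PUBLISH_TABLES = {
    "accounts": ["id", "organization_id", "name", "domain", "industry",
                 "employees", "revenue", "country", "created_at", "updated_at"],
    "contacts": ["id", "account_id", "first_name", "last_name", "email_hash",
                 "phone_hash", "title", "owner_id", "created_at", "updated_at"],
    "leads": ["id", "organization_id", "source", "status", "owner_id",
              "score", "created_at", "updated_at"],
    "opportunities": ["id", "account_id", "name", "stage", "amount", "close_date",
                      "owner_id", "won_at", "lost_reason", "created_at", "updated_at"],
    "activities": ["id", "related_to_type", "related_to_id", "type", "subject",
                   "completed_at", "owner_id", "created_at"],
    "cases": ["id", "account_id", "contact_id", "status", "priority", "queue",
              "sla_due_at", "resolved_at", "created_at", "updated_at"],
}

# Every table carries created_at; most also carry updated_at, which drives
# the incremental watermark described below.
```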

How publish works


The publisher emits real Delta Lake protocol v1.x tables — proper transaction logs (_delta_log/00000000000000000001.json), Parquet data files, and state.json watermarks per table. Recipients consume via any Delta-aware client without needing a custom connector.
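For reference, each commit file in _delta_log/ is newline-delimited JSON, one action per line. A minimal illustrative sketch of a first commit (field values here are placeholders, not real output):

```json
{"protocol": {"minReaderVersion": 1, "minWriterVersion": 2}}
{"metaData": {"id": "<table-uuid>", "format": {"provider": "parquet"}, "schemaString": "<JSON-encoded struct schema>", "partitionColumns": []}}
{"add": {"path": "part-00000-<uuid>.snappy.parquet", "partitionValues": {}, "size": 102400, "modificationTime": 1700000000000, "dataChange": true}}
```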

Step 1 — Activate the share

Go to Settings → Integrations → Publish to Lakehouse:

/settings/integrations/share

The "Your share" card shows:

  • Share name — defaults to organizations.delta_share_name (or your tenant slug)
  • Schema — crm (fixed)
  • Endpoint — https://<your-domain>/api/v1/delta-sharing
  • Tables — the 6 tables listed above

Step 2 — Add a recipient

A recipient is a bearer-token-scoped reader of your share. You typically create one per consuming environment (one for the data team's Databricks workspace, one for an analyst Snowflake account, etc.).

  1. Type a recipient name (e.g. pilot-databricks).
  2. Click Add recipient.
  3. A modal opens with a bearer token — dsh_… (68 characters). Copy it now — once you close the modal, the token is gone forever (only its SHA-256 hash and last-4 suffix are stored).
  4. Click Download .share file. A standard Delta Sharing profile JSON downloads:
{
  "shareCredentialsVersion": 1,
  "endpoint": "https://app.factorylabs.ai/api/v1/delta-sharing",
  "bearerToken": "dsh_…",
  "expirationTime": null
}

Hand the .share file (or just the token + endpoint) to the consumer.
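Before handing it over, you can smoke-test a freshly issued token against the Delta Sharing discovery endpoint. A minimal sketch using only the standard library (the endpoint and token values are placeholders to fill in from your .share file):

```python
import json
import urllib.request

def parse_share_names(body: dict) -> list[str]:
    """Extract share names from a Delta Sharing GET /shares response body."""
    return [s["name"] for s in body.get("items", [])]

def list_shares(endpoint: str, token: str) -> list[str]:
    """Call the Delta Sharing discovery endpoint with a recipient bearer token."""
    req = urllib.request.Request(
        f"{endpoint}/shares",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return parse_share_names(json.load(resp))

# Values come from the downloaded .share file; expect exactly one share back:
# print(list_shares("https://app.factorylabs.ai/api/v1/delta-sharing", "dsh_..."))
```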

Step 3 — Trigger the first publish

Publish runs on a 15-minute cron (configured in vercel.json). To force an immediate run for your tenant:

curl -H "Authorization: Bearer $CRON_SECRET" \
  https://<your-domain>/api/cron/warehouse-publish

The response includes a per-tenant summary:

{
  "ok": true,
  "summaries": [{
    "slug": "<tenant>",
    "status": "ok",
    "rowsPerTable": { "accounts": 1234, "opportunities": 5678, ... },
    "bytesWritten": 12345678
  }]
}

Subsequent runs are incremental — only rows with updated_at > watermark get re-emitted. The watermark is persisted per table in share_publish_runs so a missed cycle catches up automatically on the next run.

Consume from Databricks

In Databricks Unity Catalog:

  1. Catalog → Sharing → Add provider → Bearer token.
  2. Paste the contents of the .share file.
  3. Click the new provider → Add catalog. Pick your share name. Name the catalog (e.g. factory_crm).
  4. The catalog appears with a single schema crm and 6 tables.

Query as if it were a native Unity Catalog table:

SELECT count(*) FROM factory_crm.crm.opportunities;

-- Join warehouse data with CRM facts inside Databricks:
SELECT a.name, COUNT(o.id) AS orders
FROM factory_crm.crm.accounts a
LEFT JOIN analytics.public.orders o
  ON o.account_id = a.id
GROUP BY 1 ORDER BY 2 DESC LIMIT 10;

After a fresh publish cycle, run REFRESH SCHEMA factory_crm.crm to pick up new rows.

Consume from Snowflake

Snowflake reads Delta Sharing natively via CREATE CATALOG INTEGRATION:

USE ROLE ACCOUNTADMIN;

CREATE CATALOG INTEGRATION factory_crm_share
  TYPE = DELTA_SHARING
  TABLE_FORMAT = DELTA
  INTEGRATION = 'https://app.factorylabs.ai/api/v1/delta-sharing'
  BEARER = 'dsh_…'
  ENABLED = TRUE;

SHOW SHARES IN CATALOG INTEGRATION factory_crm_share;

CREATE DATABASE FACTORY_CRM FROM SHARE provider.<share_name>;

USE FACTORY_CRM.crm;
SELECT count(*) FROM accounts;

Cross-stack JOINs work without egressing data from either stack — the whole query runs inside Snowflake:

SELECT a.name, COUNT(o.id) AS orders
FROM FACTORY_CRM.crm.accounts a
LEFT JOIN ANALYTICS.PUBLIC.orders o
  ON o.account_id = a.id
GROUP BY 1 ORDER BY 2 DESC LIMIT 10;

Consume from PyIceberg / Python

Any Delta-aware client works. Quick sanity check from Python:

import delta_sharing

profile = "factory_crm.share"  # the downloaded .share file
share = delta_sharing.SharingClient(profile)
print(share.list_all_tables())

df = delta_sharing.load_as_pandas(f"{profile}#<share>.crm.accounts")
print(df.head())

Iceberg REST

For tools that prefer the Iceberg REST catalog API over Delta Sharing, Factory exposes a read-only conformance subset of Iceberg REST at:

/api/v1/iceberg/v1/*

Discovery surface only — list_namespaces, list_tables, load_table (returns metadata + schema). Full table scans require an Iceberg metadata writer (separate roadmap item). Useful for cataloging / lineage tools.

Enable with the ICEBERG_REST_ENABLED=true deployment flag. Auth uses the same recipient bearer tokens as Delta Sharing.

BASE=https://app.factorylabs.ai/api/v1/iceberg/v1
TOKEN="dsh_…"
SHARE="<your share name>"

curl -H "Authorization: Bearer $TOKEN" "$BASE/$SHARE/namespaces" | jq
# { "namespaces": [["crm"]] }

curl -H "Authorization: Bearer $TOKEN" "$BASE/$SHARE/namespaces/crm/tables" | jq
# { "identifiers": [{"namespace":["crm"],"name":"accounts"}, ...] }

PyIceberg conformance:

from pyiceberg.catalog import load_catalog
catalog = load_catalog(
    "factory",
    type="rest",
    uri="https://app.factorylabs.ai/api/v1/iceberg",
    token="dsh_…",
    warehouse="<share>",
)
print(catalog.list_namespaces())  # [("crm",)]
tbl = catalog.load_table(("crm", "accounts"))
print(tbl.schema())

Revoke a recipient

In Settings → Integrations → Publish to Lakehouse, click Revoke on the recipient row. Status flips to revoked immediately. Bearer tokens are checked on every Delta Sharing request (no caching), so the next read attempt fails within seconds with HTTP 401. No grace period — revocation is instant by design.
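The per-request check can be sketched like this (illustrative Python, not Factory's actual implementation): hash the presented token, compare against the stored SHA-256, and fail revoked recipients outright.

```python
import hashlib
import hmac

def token_matches(presented: str, stored_sha256_hex: str, status: str) -> bool:
    """Per-request bearer-token check: the raw token is never stored, only
    its SHA-256 hash; revoked recipients fail immediately (no grace period)."""
    if status != "active":
        return False
    digest = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    # Constant-time comparison avoids leaking hash prefixes via timing.
    return hmac.compare_digest(digest, stored_sha256_hex)
```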

To rotate a token, revoke the recipient and create a new one with the same name.

Operational dashboard

The same page shows the publish history per tenant:

  • Last 10 publish runs with status, duration, rows per table, bytes written
  • Per-recipient: last_seen_at, request_count_24h, bytes_egressed_24h
  • Failed runs surface the share_publish_runs.error column inline

Errors in publish are logged but do not page — the next 15-min cycle retries with the same watermark, so a transient blob outage self-heals.

Plan & feature flags

Delta Sharing publish is gated by three deployment env vars:

  • BLOB_READ_WRITE_TOKEN (required) — Vercel Blob token; destination for parquet + log files
  • CRON_SECRET (required) — auth for the cron trigger endpoint
  • DELTA_SHARING_BASE_URL (required) — public origin for the share endpoint (e.g. https://app.factorylabs.ai/api/v1/delta-sharing)

When DELTA_SHARING_BASE_URL is unset, the Publish to Lakehouse page renders a "not configured" notice and the cron skips silently.
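A trivial preflight check for the three variables might look like this (a sketch; the variable names come from the table above, the helper is hypothetical):

```python
import os

REQUIRED_VARS = ("BLOB_READ_WRITE_TOKEN", "CRON_SECRET", "DELTA_SHARING_BASE_URL")

def missing_publish_vars(env: dict) -> list[str]:
    """Return the deployment variables that must be set before publish activates."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

# Example: check the current process environment.
# print(missing_publish_vars(dict(os.environ)))
```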

Troubleshooting

.share file downloads but Databricks says "no shares"
The endpoint URL in the .share file must be reachable from Databricks. If you're using a custom subdomain, confirm it resolves and serves a 200 from GET /api/v1/delta-sharing/shares (with the bearer token).

Publish cron returns ok: true but bytesWritten is 0
First run with no data, or the watermark is already at the latest updated_at. Insert a test row in CRM and re-run.

Recipient last_seen_at never updates
Either the consumer hasn't queried yet, or the bearer token they're using doesn't match the SHA-256 hash on file. Re-create the recipient and hand them a fresh token.

Snowflake CREATE CATALOG INTEGRATION returns "invalid endpoint"
Snowflake validates the endpoint by hitting /shares with the bearer. Verify with curl -H "Authorization: Bearer dsh_…" https://<your-domain>/api/v1/delta-sharing/shares — should return a JSON array with one share.