Factory Labs

Lakehouse Overview

Federated SQL, Delta Sharing publish, bidirectional MCP, and Vector Search RAG — Factory Labs as a lakehouse-native CRM for Databricks and Snowflake customers.

Factory Labs is the first lakehouse-native CRM. Your warehouse stays where it is: Factory speaks Delta Sharing, MCP, and SQL natively, in both directions. No ETL, no proprietary connector, no copy of your warehouse data.

This section walks through the four data streams that connect Factory Labs to your lakehouse, how to set each one up, and how the same governance model (encrypted credentials, per-tenant key derivation, query-guard, audit trails) applies to all of them.
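To make "per-tenant key derivation" concrete, here is a minimal sketch of HKDF (RFC 5869) with SHA-256, written with the Python standard library only. It is an illustration under stated assumptions, not Factory's implementation: the `tenant_key` helper, the salt label, and the use of the tenant ID as the `info` input are all hypothetical. The 32-byte output is the size an AES-256-GCM key needs; the encryption step itself is left out.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract, then expand to `length` bytes."""
    if length > 255 * 32:
        raise ValueError("requested key is too long for HKDF-SHA256")
    # Extract: concentrate the input keying material into a pseudorandom key.
    prk = hmac.new(salt or b"\x00" * 32, ikm, hashlib.sha256).digest()
    # Expand: chain HMAC blocks until enough output bytes exist.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def tenant_key(master_key: bytes, tenant_id: str) -> bytes:
    """Derive a 32-byte key bound to one tenant (hypothetical helper)."""
    # Assumption: the salt label and the tenant ID as `info` are illustrative only.
    return hkdf_sha256(master_key, salt=b"factory-credentials", info=tenant_id.encode())
```

Because the tenant ID feeds the derivation, two tenants never share a key even though both keys come from the same master secret.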

The four streams

Stream | Direction | What it does
Federated reads | Factory → your warehouse | The AI Assistant queries your Databricks or Snowflake tables live as governed skills. No copy.
Delta Sharing publish | Factory → your lake | CRM facts (accounts, contacts, leads, opportunities, activities, cases) streamed back as Delta tables. Recipients consume in Databricks, Snowflake, or PyIceberg.
MCP (both ways) | Bidirectional | Inbound: register a Databricks Genie space (or any MCP server), and its tools become Assistant skills. Outbound: expose /api/mcp so Claude, Cursor, and Mosaic AI agents can drive Factory.
Vector Search RAG | Factory → your vector index | Federate to your Databricks Vector Search or Mosaic AI embeddings index for RAG without re-embedding.
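For the outbound MCP direction, the messages an agent sends to /api/mcp are JSON-RPC 2.0 requests as defined by the Model Context Protocol. The sketch below only builds the `tools/list` and `tools/call` payloads; it does not perform the `initialize` handshake or any HTTP transport, which an MCP client SDK would normally handle. The tool name `search_accounts` and its arguments are hypothetical, not tools Factory is known to expose.

```python
import json
from itertools import count
from typing import Optional

_request_ids = count(1)  # JSON-RPC requests need a unique id per request


def jsonrpc_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request object."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


def list_tools() -> dict:
    """Ask the MCP server which tools it offers."""
    return jsonrpc_request("tools/list")


def call_tool(name: str, arguments: dict) -> dict:
    """Invoke one tool by name with a dictionary of arguments."""
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments})


if __name__ == "__main__":
    # Hypothetical tool name and arguments, for illustration only.
    print(json.dumps(list_tools()))
    print(json.dumps(call_tool("search_accounts", {"query": "acme"})))
```

The same message shapes apply in the inbound direction, where Factory is the client and a Databricks Genie space (or another MCP server) answers the `tools/list` call.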

What you need to know

  • Zero data movement. Federated reads, RAG queries, and MCP tool calls all execute against your warehouse — Factory only sees the result rows it asked for, scoped by allow-list.
  • Open standards. Delta Lake protocol v1.x for Sharing publish, Iceberg REST catalog (read-only conformance) for discovery, Model Context Protocol for both MCP directions.
  • Same governance, every byte. Query-guard enforces single-SELECT, table & column allow-lists, row filters, and payload caps. Credentials are encrypted with AES-256-GCM and a per-tenant HKDF key.
  • No vendor lock-in. You bring Databricks or Snowflake. Factory adapts. The same wizard shape works for both — only the auth flow differs.
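The query-guard rules above (single SELECT, table and column allow-lists) can be sketched as a small validator. This is a deliberately conservative illustration, not Factory's query-guard: the `ALLOW_LIST` contents, the `GuardError` type, and the `guard_query` function are hypothetical, and the regex only accepts plain `SELECT columns FROM table` queries. It does not inspect columns used in a WHERE clause and omits row filters and payload caps; a production guard would use a real SQL parser.

```python
import re

# Hypothetical allow-list: table name -> columns the Assistant may read.
ALLOW_LIST = {"orders": {"order_id", "status", "total"}}

_SIMPLE_SELECT = re.compile(
    r"^select\s+(?P<cols>.+?)\s+from\s+(?P<table>[A-Za-z_][\w.]*)(?P<rest>.*)$",
    re.IGNORECASE | re.DOTALL,
)


class GuardError(ValueError):
    """Raised when a query violates the guard policy."""


def guard_query(sql: str) -> str:
    """Return the cleaned statement if it passes every check, else raise."""
    statement = sql.strip().rstrip(";").strip()
    # Any semicolon left over means more than one statement was submitted.
    if ";" in statement:
        raise GuardError("only a single statement is allowed")
    match = _SIMPLE_SELECT.match(statement)
    if match is None:
        raise GuardError("only SELECT ... FROM ... queries are allowed")
    # Reject constructs that could reach tables outside the allow-list.
    if re.search(r"\b(select|join|union)\b", match["rest"], re.IGNORECASE):
        raise GuardError("subqueries, joins, and unions are not allowed")
    table = match["table"].lower()
    if table not in ALLOW_LIST:
        raise GuardError(f"table not allow-listed: {table}")
    columns = [col.strip().lower() for col in match["cols"].split(",")]
    blocked = [col for col in columns if col not in ALLOW_LIST[table]]
    if blocked:
        raise GuardError(f"columns not allow-listed: {blocked}")
    return statement
```

Note that `SELECT *` is rejected by design here, since `*` is never an allow-listed column name; callers have to name the columns they want.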

Setup paths

Pick the stream(s) you need and follow the setup guide:

If you want to… | Set up | Time
Query Databricks tables from the AI Assistant | Connect Databricks (warehouse) | ~5 min
Query Snowflake tables from the AI Assistant | Connect Snowflake | ~10 min
Publish CRM facts back to your lake | Delta Sharing publish | ~3 min
Register Databricks Genie or another MCP server as a skill | MCP bridge — inbound clients | ~5 min
Federate to your Databricks Vector Search index | Vector Search RAG | ~5 min
Let Claude / Cursor / Mosaic agents drive Factory | Connect Databricks Mosaic AI Agents | ~10 min

How the four streams compose

A typical pilot deployment uses three of the four streams together:

  1. Federated reads expose ~3–10 customer-warehouse tables as Assistant skills (orders, inventory, pricing, usage events).
  2. Delta Sharing publish mirrors core CRM entities back to the lake every ~15 min so the customer's data team can join CRM facts with the warehouse facts they already have.
  3. Bidirectional MCP turns the customer's existing Databricks Genie agents into Assistant skills (inbound) and lets their Claude/Cursor/Mosaic agents drive Factory (outbound).

Vector Search RAG is opt-in for customers with an existing embeddings investment — it's strictly an Assistant-grounding fallback, not a primary stream.
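On the consuming side of the Delta Sharing publish step, a data team could read a published CRM table with the open-source `delta-sharing` Python connector. The sketch below assumes the recipient has a profile file from the share provider; the share name `factory_crm`, the schema name `crm`, and the `load_crm_table` helper are hypothetical placeholders, not names Factory is known to use.

```python
def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Build the '<profile>#<share>.<schema>.<table>' address Delta Sharing expects."""
    return f"{profile_path}#{share}.{schema}.{table}"


def load_crm_table(
    profile_path: str,
    table: str,
    share: str = "factory_crm",  # hypothetical share name
    schema: str = "crm",         # hypothetical schema name
):
    """Load one shared table into a pandas DataFrame."""
    # Imported lazily so the URL helper works without the connector installed.
    import delta_sharing  # pip install delta-sharing

    return delta_sharing.load_as_pandas(table_url(profile_path, share, schema, table))
```

Once loaded, the resulting DataFrame can be joined with warehouse data in the usual way, for example on an account identifier that both sides carry.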

Plan & feature flags

The Lakehouse stack is available on the Enterprise plan. Per-feature flags (set on the deployment, not per-tenant):

Flag | Default | Controls
WAREHOUSE_INTEGRATION_ENABLED | false | Federated reads + Vector Search + inbound MCP
DELTA_SHARING_BASE_URL | unset | Outbound Delta Sharing publish
ICEBERG_REST_ENABLED | false | Iceberg REST catalog discovery (read-only conformance)

See Governance for the full list.
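As an illustration of how the three flags and their defaults combine, the sketch below reads them from a mapping such as the process environment. It assumes the flags are supplied as environment variables and that values like `true` or `1` count as enabled; both are assumptions, and the `LakehouseFlags` and `load_flags` names are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LakehouseFlags:
    warehouse_integration: bool            # federated reads, Vector Search, inbound MCP
    delta_sharing_base_url: Optional[str]  # None means Delta Sharing publish is off
    iceberg_rest: bool                     # Iceberg REST catalog discovery


def _enabled(value: Optional[str]) -> bool:
    """Assumed truthy spellings; anything else, including unset, is false."""
    return (value or "").strip().lower() in {"1", "true", "yes", "on"}


def load_flags(env: Mapping[str, str] = os.environ) -> LakehouseFlags:
    """Read the flags, falling back to the documented defaults."""
    base_url = (env.get("DELTA_SHARING_BASE_URL") or "").strip()
    return LakehouseFlags(
        warehouse_integration=_enabled(env.get("WAREHOUSE_INTEGRATION_ENABLED")),
        delta_sharing_base_url=base_url or None,
        iceberg_rest=_enabled(env.get("ICEBERG_REST_ENABLED")),
    )
```

With an empty environment every stream stays off, which matches the defaults in the table above.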

Next steps

  • New to the platform? Start with Connect Databricks — the federated SQL stream is the fastest "wow" and the foundation for everything else.
  • Already have a Databricks Genie agent? Skip ahead to MCP bridge — inbound clients.
  • Care about the security model first? Governance covers query-guard, encryption, and revocation.