Lakehouse Overview
Federated SQL, Delta Sharing publish, bidirectional MCP, and Vector Search RAG — Factory Labs as a lakehouse-native CRM for Databricks and Snowflake customers.
Factory Labs is the first lakehouse-native CRM. Your warehouse stays where it is — Factory speaks Delta Sharing, MCP, and SQL natively, in both directions. No ETL, no proprietary connector, no copy of your data.
This section walks through the four data streams that connect Factory Labs to your lakehouse, how to set each one up, and how the same governance model (encrypted credentials, per-tenant key derivation, query-guard, audit trails) applies to all of them.
The four streams
| Stream | Direction | What it does |
|---|---|---|
| Federated reads | Factory → your warehouse | The AI Assistant queries your Databricks or Snowflake tables live as governed skills. No copy. |
| Delta Sharing publish | Factory → your lake | CRM facts (accounts, contacts, leads, opportunities, activities, cases) streamed back as Delta tables. Recipients consume in Databricks, Snowflake, or PyIceberg. |
| MCP — both ways | Bidirectional | Inbound: register a Databricks Genie space (or any MCP server) — its tools become Assistant skills. Outbound: expose /api/mcp so Claude, Cursor, and Mosaic AI agents can drive Factory. |
| Vector Search RAG | Factory → your vector index | Federate to your Databricks Vector Search or Mosaic AI embeddings index — RAG without re-embedding. |
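Because the Sharing publish stream rides the open Delta Sharing protocol, any recipient can discover shares with plain REST before wiring up a full client. A minimal sketch of listing shares; the endpoint and token values are illustrative, not real Factory credentials:

```python
import urllib.request

# Delta Sharing is plain REST: a recipient lists available shares by calling
# GET {endpoint}/shares with a bearer token. Values below are illustrative.
endpoint = "https://crm.example.com/delta-sharing"
token = "dss-example-token"

req = urllib.request.Request(
    f"{endpoint}/shares",
    headers={"Authorization": f"Bearer {token}"},
)
# urllib.request.urlopen(req) would return a JSON body of the form
# {"items": [{"name": "..."}]} per the Delta Sharing protocol.
```

The same request shape works against any Delta Sharing server, which is the point: recipients need no Factory-specific SDK.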
What you need to know
- Zero data movement. Federated reads, RAG queries, and MCP tool calls all execute against your warehouse — Factory only sees the result rows it asked for, scoped by allow-list.
- Open standards. Delta Lake protocol v1.x for Sharing publish, Iceberg REST catalog (read-only conformance) for discovery, Model Context Protocol for both MCP directions.
- Same governance, every byte. Query-guard enforces single-SELECT, table & column allow-lists, row filters, and payload caps. Credentials are encrypted with AES-256-GCM and a per-tenant HKDF key.
- No vendor lock-in. You bring Databricks or Snowflake; Factory adapts. The same setup wizard works for both — only the auth flow differs.
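To make the query-guard rules concrete, here is a minimal Python sketch of single-SELECT enforcement, a table allow-list, and a payload cap. The allow-list contents, row cap, and function name are hypothetical, not Factory's actual implementation:

```python
import re

# Illustrative allow-list and row cap; real deployments configure these
# per connection.
ALLOWED_TABLES = {"sales.orders", "sales.inventory"}
MAX_ROWS = 1000

def guard(sql: str) -> str:
    """Reject anything other than a single SELECT against allow-listed tables."""
    stmt = sql.strip().rstrip(";")
    if ";" in stmt:
        raise ValueError("multiple statements are not allowed")
    if not re.match(r"(?is)^\s*select\b", stmt):
        raise ValueError("only SELECT statements are allowed")
    for table in re.findall(r"(?i)\b(?:from|join)\s+([\w.]+)", stmt):
        if table.lower() not in ALLOWED_TABLES:
            raise ValueError(f"table not in allow-list: {table}")
    # Enforce the payload cap by appending a LIMIT when none is present.
    if not re.search(r"(?i)\blimit\s+\d+\s*$", stmt):
        stmt += f" LIMIT {MAX_ROWS}"
    return stmt
```

Column allow-lists and row filters follow the same pattern: validate before the statement ever reaches the warehouse.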
Setup paths
Pick the stream(s) you need and follow the setup guide:
| If you want to… | Set up | Time |
|---|---|---|
| Query Databricks tables from the AI Assistant | Connect Databricks (warehouse) | ~5 min |
| Query Snowflake tables from the AI Assistant | Connect Snowflake | ~10 min |
| Publish CRM facts back to your lake | Delta Sharing publish | ~3 min |
| Register Databricks Genie or another MCP server as a skill | MCP bridge — inbound clients | ~5 min |
| Federate to your Databricks Vector Search index | Vector Search RAG | ~5 min |
| Let Claude / Cursor / Mosaic agents drive Factory | Connect Databricks Mosaic AI Agents | ~10 min |
How the four streams compose
A typical pilot deployment uses three of the four streams together:
- Federated reads expose ~3–10 customer-warehouse tables as Assistant skills (orders, inventory, pricing, usage events).
- Delta Sharing publish mirrors core CRM entities back to the lake every ~15 min so the customer's data team can join CRM facts with the warehouse facts they already have.
- Bidirectional MCP turns the customer's existing Databricks Genie agents into Assistant skills (inbound) and lets their Claude/Cursor/Mosaic agents drive Factory (outbound).
Vector Search RAG is opt-in for customers with an existing embeddings investment — it's strictly an Assistant-grounding fallback, not a primary stream.
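For a concrete sense of the outbound direction: MCP is JSON-RPC 2.0 under the hood, so an external agent drives Factory by POSTing requests like the one below to /api/mcp. The tool name and arguments here are hypothetical examples, not Factory's actual tool catalog:

```python
import json

# A JSON-RPC 2.0 "tools/call" request, as an MCP client (Claude, Cursor, a
# Mosaic AI agent) would POST it to /api/mcp. Tool name and arguments are
# illustrative only.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "create_lead",
        "arguments": {"company": "Acme Corp", "source": "webinar"},
    },
}

body = json.dumps(request)
```

Inbound works the same way in reverse: Factory issues tools/list and tools/call against the Genie space's MCP server and surfaces the results as Assistant skills.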
Plan & feature flags
The Lakehouse stack is available on the Enterprise plan. Per-feature flags (set on the deployment, not per-tenant):
| Flag | Default | Controls |
|---|---|---|
| WAREHOUSE_INTEGRATION_ENABLED | false | Federated reads + Vector Search + inbound MCP |
| DELTA_SHARING_BASE_URL | unset | Outbound Delta Sharing publish |
| ICEBERG_REST_ENABLED | false | Iceberg REST catalog discovery (read-only conformance) |
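As an example, a deployment that enables all three streams might set the flags like this (values are illustrative, not defaults):

```shell
# Deployment-level flags; values below are examples only
WAREHOUSE_INTEGRATION_ENABLED=true                              # federated reads, Vector Search, inbound MCP
DELTA_SHARING_BASE_URL=https://crm.example.com/delta-sharing    # setting any URL enables outbound publish
ICEBERG_REST_ENABLED=true                                       # Iceberg REST catalog discovery
```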
See Governance for the full list.
Next steps
- New to the platform? Start with Connect Databricks — the federated SQL stream is the fastest "wow" and the foundation for everything else.
- Already have a Databricks Genie agent? Skip ahead to MCP bridge — inbound clients.
- Care about the security model first? Governance covers query-guard, encryption, and revocation.
ERP Integrations
Connect SAP, NetSuite, Dynamics 365, or Infor CloudSuite to surface real-time orders, pricing, and inventory in the CRM.
Connect Databricks (Warehouse)
Expose Databricks SQL warehouse tables as governed AI Assistant skills via federated reads. OAuth M2M or PAT, ~5 minutes through the wizard.