Connect Databricks (Warehouse)
Expose Databricks SQL warehouse tables as governed AI Assistant skills via federated reads. OAuth M2M or PAT, ~5 minutes through the wizard.
This guide sets up the federated reads stream — the AI Assistant queries your Databricks SQL warehouse live as governed skills, with no copy of your data.
Two complementary modes. This page is the inbound direction (Factory → Databricks SQL warehouse). For the outbound direction (a Databricks Mosaic AI agent calling into Factory), see Connect Databricks Mosaic AI Agents.
Prerequisites
- A Databricks workspace with at least one SQL Warehouse running.
- One of:
- A Personal Access Token (PAT) scoped to the schema(s) you want to expose. Generate one in User Settings → Developer → Access tokens.
- OAuth M2M credentials (a service principal with client_id/client_secret and the warehouse permissions). Recommended for production.
- Admin role in your Factory Labs workspace.
The PAT or service principal needs USE CATALOG + USE SCHEMA + SELECT on every table you plan to expose, and CAN_USE on the SQL warehouse.
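If you prefer to script the grants, a minimal sketch using the databricks-sql-connector package is below. The catalog, schema, table, and principal names are placeholders; substitute your own. Note that CAN_USE on the warehouse itself is assigned in the warehouse's Permissions UI, not via SQL GRANT.

```python
# Sketch: grant the service principal the reads this guide needs.
# Requires `pip install databricks-sql-connector`. All names are placeholders.
from databricks import sql

with sql.connect(
    server_hostname="<workspace>.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    access_token="<admin-pat>",  # a principal allowed to GRANT
) as conn:
    with conn.cursor() as cur:
        for stmt in (
            "GRANT USE CATALOG ON CATALOG sales TO `svc_warehouse`",
            "GRANT USE SCHEMA ON SCHEMA sales.analytics TO `svc_warehouse`",
            "GRANT SELECT ON TABLE sales.analytics.orders TO `svc_warehouse`",
        ):
            cur.execute(stmt)
        # CAN_USE on the SQL warehouse is set separately, in the warehouse's
        # Permissions UI (or the Permissions API), not through SQL.
```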
Step 1 — Open the wizard
Go to Settings → Integrations → Databricks (Warehouse) → Connect, or directly to:
/settings/integrations/warehouse/databricks
The 6-step wizard renders. Total wall-clock target: ≤ 5 minutes for the first table, < 2 min for each additional table on the same connection.
Step 2 — Drop in credentials
| Field | Where to find it |
|---|---|
| Workspace URL | https://<workspace>.cloud.databricks.com (or your custom subdomain) |
| Warehouse ID | SQL Warehouses → your warehouse → "Connection details" tab |
| Auth | Pick PAT or OAuth M2M |
| PAT | The token you generated in User Settings |
| OAuth M2M | client_id, client_secret, and the OAuth token_endpoint (defaults to https://<workspace>/oidc/v1/token) |
Click Save & continue. A row is created in warehouse_connections with status untested. The credentials are encrypted at rest with AES-256-GCM and a per-tenant key derived via HKDF — never round-tripped to the browser. See Governance → Encrypted credentials.
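For intuition, here is an illustrative sketch of that scheme (a per-tenant key derived with HKDF, AES-256-GCM for the ciphertext) using the Python cryptography package. It is not the platform's actual implementation.

```python
# Illustrative only: per-tenant key via HKDF, AES-256-GCM at rest.
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def tenant_key(master_key: bytes, tenant_id: str) -> bytes:
    # Derive a 256-bit key per tenant from a single master secret.
    return HKDF(
        algorithm=hashes.SHA256(), length=32,
        salt=None, info=f"tenant:{tenant_id}".encode(),
    ).derive(master_key)

def encrypt_credential(master_key: bytes, tenant_id: str, secret: bytes) -> bytes:
    nonce = os.urandom(12)  # 96-bit GCM nonce
    ct = AESGCM(tenant_key(master_key, tenant_id)).encrypt(nonce, secret, None)
    return nonce + ct       # store the nonce alongside the ciphertext
```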
Step 3 — Test the connection
Click Run test. The platform issues a single SELECT 1 through your warehouse and:
- Verifies auth resolves (OAuth M2M issues a fresh token; PAT is checked directly).
- Captures the round-trip latency.
- Surfaces the principal that authenticated (e.g.
principalHint = svc_warehouse). - Flips the connection status from
untested→ok.
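To reproduce the test outside the wizard, a rough equivalent looks like this. The hostname, warehouse ID, and credentials are placeholders; the OAuth call follows Databricks' standard client-credentials flow against the token endpoint from Step 2. Skip the token fetch and pass a PAT directly if you chose PAT auth.

```python
# Rough equivalent of "Run test": fetch an OAuth M2M token, then time SELECT 1.
import time
import requests
from databricks import sql

WORKSPACE = "<workspace>.cloud.databricks.com"

# OAuth M2M: client-credentials grant against the workspace token endpoint.
tok = requests.post(
    f"https://{WORKSPACE}/oidc/v1/token",
    auth=("<client_id>", "<client_secret>"),
    data={"grant_type": "client_credentials", "scope": "all-apis"},
    timeout=30,
).json()["access_token"]

with sql.connect(
    server_hostname=WORKSPACE,
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    access_token=tok,
) as conn, conn.cursor() as cur:
    t0 = time.monotonic()
    cur.execute("SELECT 1")
    cur.fetchall()
    print(f"ok: round trip {(time.monotonic() - t0) * 1000:.0f} ms")
```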
If the test fails, the wizard shows a structured error code and a fix hint:
| Code | Likely cause |
|---|---|
| AUTH_FAILED | PAT expired or revoked; OAuth client lacks the warehouse permission |
| WAREHOUSE_OFFLINE | The SQL Warehouse is STOPPED — start it from the Databricks UI |
| CATALOG_MISSING | Default catalog/schema couldn't be resolved with the principal's grants |
| RATE_LIMITED | Too many concurrent test calls — wait 30s |
Step 4 — Pick a table
Browse catalog → schema → table using SHOW CATALOGS / SCHEMAS / TABLES. The picker only shows what the principal can see, so a least-privilege PAT will surface a focused shortlist.
Pick a table, and Factory Labs:
- Renders the first 3 rows in a preview grid (sampled, with type badges).
- Captures the column metadata snapshot — name, type, nullability — for the allow-list editor.
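Under the hood, the picker is doing roughly the following (a sketch with placeholder names):

```python
# Sketch of what the picker does: enumerate what the principal can see,
# then sample a 3-row preview with column metadata.
from databricks import sql

with sql.connect(
    server_hostname="<workspace>.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    access_token="<token>",
) as conn, conn.cursor() as cur:
    cur.execute("SHOW CATALOGS")
    print([r[0] for r in cur.fetchall()])

    cur.execute("SHOW SCHEMAS IN sales")           # placeholder catalog
    cur.execute("SHOW TABLES IN sales.analytics")  # placeholder schema

    cur.execute("SELECT * FROM sales.analytics.orders LIMIT 3")
    columns = [(d[0], d[1]) for d in cur.description]  # (name, type) per column
    preview = cur.fetchall()                           # the 3-row preview grid
```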
Step 5 — Define the Assistant skill
Each warehouse table becomes one AI Assistant skill keyed warehouse_table__<uuid>. This step controls what the Assistant can see and do with the table:
| Field | What it does |
|---|---|
| Description | Free-text summary the AI uses to decide when to call this skill. Click Suggest with AI for a one-paragraph draft based on the column metadata. |
| Allow-listed columns | Tick which columns the Assistant may SELECT. Anything unticked is invisible — and any SQL that mentions an un-allow-listed column is rejected by query-guard before it reaches your warehouse. |
| Row filter (optional) | A SQL WHERE clause that's AND-merged into every Assistant query — e.g. region = 'NA' or tenant_id = 'acme'. Useful for multi-tenant tables you want to expose only one slice of. |
Trim columns to the minimum needed (typically 5–10) — narrower allow-lists make the AI's queries faster and safer.
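Conceptually, this step produces a per-table definition like the sketch below. The field names are illustrative, not the platform's schema; only the three concepts (description, column allow-list, row filter) come from this page.

```python
# Hypothetical shape of a skill definition collected by the wizard.
skill = {
    "description": "Daily order totals per account, updated hourly.",
    "allowed_columns": ["account_id", "order_date", "total_usd"],  # ticked = visible
    "row_filter": "region = 'NA'",  # AND-merged into every Assistant query
}
```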
Step 6 — Suggest a semantic link
This step is optional but high-leverage. Click Get an AI suggestion and the platform:
- Runs a heuristic pass — looks for column names like account_id, customer_uuid, account_email that obviously map to a CRM entity. Confidence ≥ 0.5 surfaces a recommendation.
- Falls through to an AI pass if no heuristic match — the LLM proposes a join target (accounts, contacts, opportunities, etc.) and the join key.
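A toy version of the heuristic pass, to make the mechanics concrete. The patterns and confidence scores here are invented for illustration; only the 0.5 threshold comes from this page.

```python
# Toy heuristic: score column names against CRM entity patterns and
# surface a recommendation at confidence >= 0.5.
import re

PATTERNS = [
    (re.compile(r"^(account|acct)_(id|uuid)$"), "accounts", 0.9),
    (re.compile(r"^customer_(id|uuid)$"),       "contacts", 0.7),
    (re.compile(r"_email$"),                    "contacts", 0.5),
]

def suggest_link(columns: list[str]):
    best = None
    for col in columns:
        for pat, target, score in PATTERNS:
            if pat.search(col) and (best is None or score > best[2]):
                best = (col, target, score)
    return best if best and best[2] >= 0.5 else None  # None -> fall through to AI pass

print(suggest_link(["order_id", "account_uuid", "total_usd"]))
# -> ('account_uuid', 'accounts', 0.9)
```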
Accept the suggestion to teach the Assistant how to JOIN this warehouse table with CRM data:
"Show me opportunities for accounts in the warehouse with > $10k LTV." → Assistant emits a single SELECT joining
crm.accounts(Postgres) with the warehouse table on the accepted link key, all guarded.
You can skip this step and accept the link later — call PATCH /api/v1/admin/warehouse/data-sources/<id> with { semantic_link: ... }.
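A sketch of that call is below. The endpoint is as documented above; the host, bearer token, and the exact semantic_link payload shape are assumptions.

```python
# Accept a semantic link after the fact via the admin API (shape assumed).
import requests

requests.patch(
    "https://<your-factory-host>/api/v1/admin/warehouse/data-sources/<id>",
    headers={"Authorization": "Bearer <admin-api-token>"},
    json={"semantic_link": {"target": "accounts", "join_key": "account_uuid"}},
    timeout=30,
).raise_for_status()
```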
Step 7 — Activate as Assistant skill
Click Activate as Assistant skill. The platform:
- Inserts a row in warehouse_data_sources with enabled_for_assistants = 'all'.
- Generates a typed AI tool keyed warehouse_table__<uuid> with the description, columns, row filter, and semantic link baked in.
- Deep-links you to Settings → AI Agents so you can scope the new skill per agent.
The tool is now live for every default agent. To restrict it to specific agents, untick enabled_for_assistants = 'all' on the data source and tick the tool individually inside each custom agent.
Repeat for more tables
Steps 4–7 take < 2 min per additional table on the same connection. Each Databricks connection can host any number of data sources; each becomes its own Assistant skill.
What the Assistant sees
When the Assistant decides a question needs warehouse data, it picks the relevant warehouse_table__* tool, generates SQL from the description + columns + semantic link, and submits it. Before the SQL touches your warehouse, query-guard enforces:
- Single SELECT statement (no DDL, DML, or multi-statement)
- Allow-listed table & columns only
- Row filter AND-merged into the WHERE
- LIMIT clamp (default 200 rows)
- Payload cap (default 1 GiB scanned per call)
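To make the rewrites concrete, here is a conceptual, string-level sketch of the row-filter merge and the LIMIT clamp. The real guard parses the SQL rather than pattern-matching it, and also enforces the table/column allow-list and the bytes-scanned cap, which are omitted here.

```python
# Conceptual sketch of two guard rewrites: AND-merge the row filter,
# then clamp (or add) the LIMIT. Not the platform's implementation.
import re

def reject(reason: str) -> None:
    raise ValueError(f"SQL_REJECTED: {reason}")

def guard(stmt: str, row_filter: str, max_rows: int = 200) -> str:
    stmt = stmt.strip().rstrip(";")
    if ";" in stmt:
        reject("multi-statement")
    if not stmt.upper().startswith("SELECT"):
        reject("only a single SELECT is allowed")

    # AND-merge the row filter into the WHERE clause.
    if re.search(r"\bWHERE\b", stmt, flags=re.IGNORECASE):
        stmt = re.sub(r"\bWHERE\b\s*",
                      lambda _: f"WHERE ({row_filter}) AND ",
                      stmt, count=1, flags=re.IGNORECASE)
    else:
        tail = re.search(r"\b(GROUP\s+BY|ORDER\s+BY|LIMIT)\b", stmt,
                         flags=re.IGNORECASE)
        pos = tail.start() if tail else len(stmt)
        stmt = f"{stmt[:pos].rstrip()} WHERE {row_filter} {stmt[pos:]}".strip()

    # Clamp an existing LIMIT, or append one.
    clamped, n = re.subn(r"\bLIMIT\s+(\d+)",
                         lambda m: f"LIMIT {min(int(m.group(1)), max_rows)}",
                         stmt, count=1, flags=re.IGNORECASE)
    return clamped if n else f"{stmt} LIMIT {max_rows}"

print(guard("SELECT account_id, total_usd FROM orders WHERE total_usd > 10000",
            row_filter="region = 'NA'"))
# SELECT account_id, total_usd FROM orders WHERE (region = 'NA') AND total_usd > 10000 LIMIT 200
```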
Every call is logged to warehouse_query_log with the rewritten SQL, principal, latency, rows returned, and bytes scanned. See Governance → Audit trail.
Connected Warehouses dashboard
Open Data Lake in the sidebar (or /data-lake) for the operator view:
- Connection cards with status, last-tested time, and a one-click Test button.
- KPI strip: connections, tables, last 24h queries, last 24h rejections.
- Per-data-source query log with SQL, latency, and rejection reasons.
- "Disable" toggle to instantly revoke an entire connection from the Assistant.
Troubleshooting
Wizard step 3 reports AUTH_FAILED
PAT expired or scoped too narrowly. Generate a fresh PAT in User Settings → Developer → Access tokens with no expiration (or rotate it via the wizard's Edit credentials button without re-running steps 4–7).
Wizard step 4 shows no schemas
The principal has USE CATALOG but lacks USE SCHEMA on every schema in the catalog. Grant USE SCHEMA on the specific schema(s) you want to expose, or pick a catalog where the principal does have access.
Assistant tool exists but every call returns SQL_REJECTED
The model emitted SQL that violated the guard — usually mentioning an un-allow-listed column or trying a multi-statement. Inspect the rejection reason in warehouse_query_log (or the Data Lake page) and either widen the allow-list or tighten the skill description so the model picks a different approach.
Tool calls succeed but counts don't match Databricks UI
The row filter (Step 5) is AND-merged into every query — verify the filter matches your expectation.
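To compare directly, run the count with and without the filter against the warehouse yourself (placeholders as in the earlier sketches):

```python
# Quick check: count rows with and without the skill's row filter.
from databricks import sql

with sql.connect(
    server_hostname="<workspace>.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    access_token="<token>",
) as conn, conn.cursor() as cur:
    cur.execute("SELECT COUNT(*) FROM sales.analytics.orders")
    total = cur.fetchone()[0]
    cur.execute("SELECT COUNT(*) FROM sales.analytics.orders WHERE region = 'NA'")
    filtered = cur.fetchone()[0]
    print(total, filtered)  # the Assistant only ever sees `filtered`
```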
Related guides
- Connect Snowflake — same wizard shape, different auth.
- MCP bridge — inbound clients — register Databricks Genie as an Assistant skill.
- Vector Search RAG — federate to Databricks Vector Search indexes.
- Governance — query-guard, encryption, audit, revocation.