# Vector Search RAG (Federated)

Federate the AI Assistant to your Databricks Vector Search or Mosaic AI embeddings index — RAG fallback without re-embedding or maintaining a parallel store.
The Vector Search RAG stream lets the AI Assistant retrieve from your existing Databricks Vector Search (or Mosaic AI embeddings) index as a RAG fallback — without re-embedding documents into Factory's local Pinecone/pgvector store. Use it when you already have a substantial domain corpus indexed in Databricks (product manuals, contracts, runbooks, support tickets) and don't want a parallel embeddings investment.
## When to use it
| Situation | Use Vector Search RAG? |
|---|---|
| You have a Databricks Vector Search index with ≥ 10k indexed chunks | Yes — federate, don't duplicate |
| You're starting from scratch with a small (~1k chunk) corpus | No — use Factory's built-in knowledge base (faster, cheaper) |
| You need < 100 ms p95 latency on retrieval | No — federated calls are ~200–500 ms; use the local index |
| Your corpus is in S3 / Postgres / Pinecone, not Databricks | No — register that source separately, or wait for the generic Vector Search adapter |
The federated path is strictly a fallback layered alongside the local `search_knowledge` tool: the Assistant calls both and ranks results by similarity score regardless of source.
## Prerequisites
- A Databricks workspace with Mosaic AI Vector Search enabled.
- A Vector Search index of either type:
  - Direct Access — you control the embeddings; Factory queries the index by vector
  - Delta Sync — Databricks maintains the index from a Delta table; Factory queries by text or vector
- The index must have:
  - A text column (the source chunk text)
  - An embedding column (Databricks-managed or BYO)
  - Optional metadata columns to surface as `returnColumns` (e.g. `doc_id`, `title`, `url`, `published_at`)
- A Databricks PAT (or service principal) with SQL + Vector Search privileges.
- An existing federated warehouse data source (Phase 1 of the Connect Databricks flow) — vector indexes are scoped to a warehouse connection so they share auth and per-tenant encryption.
## Step 1 — Register the index

Vector indexes are registered via API today (UI shipping next). With the `connectionId` of an active warehouse connection:
```bash
curl -X POST https://app.factorylabs.ai/api/v1/admin/warehouse/vector-indexes \
  -H "Cookie: <session>" \
  -H "Content-Type: application/json" \
  -d '{
    "connectionId": "<conn_uuid>",
    "indexName": "main.knowledge.product_specs_idx",
    "description": "Product specifications, install guides, and runbooks indexed by Databricks Vector Search.",
    "textColumn": "text",
    "returnColumns": ["doc_id", "title", "url"],
    "enabled_for_assistants": true
  }'
```

| Field | Purpose |
|---|---|
| `connectionId` | Existing warehouse connection (uses its encrypted PAT for auth) |
| `indexName` | Fully qualified `catalog.schema.index_name` from Databricks |
| `description` | Free-text summary the Assistant uses to decide when to call this index |
| `textColumn` | Column name that holds the source chunk text |
| `returnColumns` | Metadata columns to include alongside text in each hit (used for citations) |
| `enabled_for_assistants` | If `true`, the index becomes a tool for every default agent |
The response includes `data.id`, the id of the new row in `warehouse_vector_indexes` (created with `status='active'`).
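If you script registration, you can capture the new id from the response. A minimal sketch, assuming the body is a JSON envelope shaped like `{"data": {"id": ..., "status": ...}}`; only `data.id` and `status='active'` are stated above, so the exact envelope shape is an assumption:

```python
import json

def index_id_from_response(body: str) -> str:
    """Pull the new vector-index id out of the registration response.

    Assumes a {"data": {"id": ..., "status": ...}} envelope; only data.id
    and status='active' are documented, the rest is illustrative.
    """
    data = json.loads(body)["data"]
    if data.get("status") != "active":
        raise RuntimeError(f"index not active yet: {data.get('status')!r}")
    return data["id"]

# Hypothetical response body for illustration:
sample = '{"data": {"id": "4f9c-example-uuid", "status": "active"}}'
print(index_id_from_response(sample))
```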
## Step 2 — Tick the tool in the agent

Open **Settings → AI Agents → <your agent>**. Under **Warehouse vector indexes**, the new index appears keyed `warehouse_vector__<uuid>`. Tick it.
Default agents auto-include all `enabled_for_assistants=true` indexes. Custom agents start empty — tick explicitly.
## Step 3 — Use from the Assistant

> "Find product spec sheets for SKU XYZ-123."
The Assistant:

- Sees the question is grounding-style and picks the relevant retrieval tools — typically both `search_knowledge` (local) and `warehouse_vector__<uuid>` (federated).
- Calls them in parallel — sub-second cumulative because the local hit returns immediately.
- Receives:
  - From `search_knowledge`: top-k chunks from Pinecone/pgvector with `score`, `text`, and a citation anchor.
  - From `warehouse_vector__<uuid>`: top-k chunks from Databricks with `score`, `text`, and the `returnColumns` you configured.
- Ranks results by score regardless of source, synthesizes an answer, and cites the highest-scoring chunks (with the `url` column from your warehouse hits as the citation link).
## RAG ranking behavior
The Assistant doesn't pre-emptively pick one source over the other — both are tools, both get called, the model ranks. Three sub-cases worth knowing:
- Local + warehouse both have hits — both surfaced, model prefers higher similarity. Citations interleave naturally.
- Only warehouse has hits — the Assistant answers from warehouse only; the trace shows zero `search_knowledge` results.
- You disable `search_knowledge` for this agent — the Assistant relies solely on the federated index. Useful for agents whose domain is exclusively in your lakehouse corpus.
To force one source only, untick the other in the Custom Agent builder.
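The score-based merge described above can be sketched in a few lines. The hit shape and `source` tag below are illustrative, not Factory's internal types:

```python
def merge_and_rank(local_hits, warehouse_hits, top_k=5):
    """Interleave local and federated hits purely by similarity score.

    Hits are {"score": float, "text": str, ...} dicts; a `source` tag is
    attached so citations can point at the right store. Illustrative only.
    """
    tagged = [dict(h, source="search_knowledge") for h in local_hits]
    tagged += [dict(h, source="warehouse_vector") for h in warehouse_hits]
    # Rank by score regardless of which store produced the hit.
    return sorted(tagged, key=lambda h: h["score"], reverse=True)[:top_k]

ranked = merge_and_rank(
    [{"score": 0.91, "text": "local chunk"}],
    [{"score": 0.95, "text": "warehouse chunk"}, {"score": 0.42, "text": "weak hit"}],
)
print([h["source"] for h in ranked])
# → ['warehouse_vector', 'search_knowledge', 'warehouse_vector']
```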
## How the federated call works

The Factory tool wraps the Databricks Vector Search REST API. For Delta Sync indexes, the call uses `query_text` (Databricks generates the embedding server-side). For Direct Access indexes, you can either pass `query_text` (if Factory has an embedder configured for the index's model) or `query_vector` (Factory generates the embedding locally first).
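To make the text-vs-vector split concrete, here is a sketch of a payload builder for the Databricks Vector Search query call. The `columns`, `num_results`, `query_text`, and `query_vector` field names come from the public Databricks REST API; the helper itself is illustrative, not Factory's wrapper:

```python
def build_query_payload(columns, *, query_text=None, query_vector=None, num_results=5):
    """Build a query body for a Databricks Vector Search index.

    Delta Sync indexes take query_text (embedding computed server-side).
    Direct Access indexes take query_vector, or query_text when an embedder
    is configured for the index's model. Exactly one of the two must be set.
    """
    if (query_text is None) == (query_vector is None):
        raise ValueError("pass exactly one of query_text or query_vector")
    payload = {"columns": columns, "num_results": num_results}
    if query_text is not None:
        payload["query_text"] = query_text      # Delta Sync / embedder-backed path
    else:
        payload["query_vector"] = query_vector  # Direct Access, locally embedded
    return payload

payload = build_query_payload(["doc_id", "title", "url", "text"],
                              query_text="install guide for XYZ-123")
print(sorted(payload))  # → ['columns', 'num_results', 'query_text']
```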
## Token budget guard
Vector Search indexes can return arbitrarily large chunks — a single hit could be a 50KB document. To keep the Assistant context window manageable, Factory enforces a per-call token budget:
| Limit | Default | Notes |
|---|---|---|
| `num_results` | 5 | Tunable per index; max 20 |
| `max_chars_per_result` | 2,000 | Truncates each chunk's `text` field |
| `total_chars_per_call` | 8,000 | Across all hits combined |
When `total_chars_per_call` is exceeded, the lowest-score hits drop until the call is under budget. Token budget rejections are logged to `warehouse_vector_call_log.budget_exceeded`.
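The guard can be pictured as a two-step filter: truncate each chunk, then drop the lowest-score hits until the call fits. A hypothetical reimplementation of the documented limits, not Factory's actual code:

```python
MAX_CHARS_PER_RESULT = 2_000
TOTAL_CHARS_PER_CALL = 8_000

def apply_budget(hits, per_result=MAX_CHARS_PER_RESULT, total=TOTAL_CHARS_PER_CALL):
    """Enforce the per-call character budget on a list of hits.

    `hits` are {"score": float, "text": str} dicts. Returns (kept, dropped)
    where `dropped` marks that lower-score hits were discarded; this mirrors
    one plausible reading of the budget_exceeded flag in the call log.
    """
    # Step 1: truncate every chunk to the per-result cap.
    trimmed = [dict(h, text=h["text"][:per_result]) for h in hits]
    # Step 2: keep the highest-score prefix that fits the total budget.
    trimmed.sort(key=lambda h: h["score"], reverse=True)
    kept, used = [], 0
    for h in trimmed:
        if used + len(h["text"]) > total:
            break  # everything after this is lower-scored; drop it
        kept.append(h)
        used += len(h["text"])
    return kept, len(kept) < len(hits)

hits = [{"score": 0.9 - i * 0.1, "text": "x" * 5_000} for i in range(5)]
kept, dropped = apply_budget(hits)
print(len(kept), dropped)  # → 4 True  (4 × 2,000 chars fits; the 5th hit is dropped)
```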
## Per-index allow-list

Each index registration carries its own `enabled_for_assistants` flag and per-agent ticking. To restrict an index to one or two custom agents (e.g. only the "Support specialist" agent gets the warranty docs index):
1. Set `enabled_for_assistants = false` on the index registration:

   ```bash
   curl -X PATCH https://app.factorylabs.ai/api/v1/admin/warehouse/vector-indexes/<id> \
     -H "Cookie: <session>" -H "Content-Type: application/json" \
     -d '{ "enabled_for_assistants": false }'
   ```

2. Tick the tool only inside the agents that should have it.
## Audit trail

Every federated vector call is logged to `warehouse_vector_call_log`:

- `query_text` (the prompt sent to Vector Search)
- `result_count` and `bytes_returned`
- `latency_ms` and HTTP status
- `budget_exceeded` flag if the guard truncated results
- `principal` (the encrypted PAT's resolved user, surfaced from the test step)
The same data surfaces in the Data Lake operator dashboard alongside warehouse SQL queries.
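For operators scripting against this table, a sketch of assembling one such row; the column names mirror the list above, while the helper and the `logged_at` field are illustrative, not Factory's schema:

```python
import time

def call_log_row(query_text, hits, latency_ms, http_status, principal,
                 budget_exceeded=False):
    """Assemble a warehouse_vector_call_log-shaped record (illustrative)."""
    return {
        "query_text": query_text,
        "result_count": len(hits),
        "bytes_returned": sum(len(h["text"].encode("utf-8")) for h in hits),
        "latency_ms": latency_ms,
        "http_status": http_status,
        "budget_exceeded": budget_exceeded,
        "principal": principal,
        "logged_at": time.time(),  # illustrative extra field, not in the documented schema
    }

row = call_log_row("install guide for XYZ-123", [{"text": "chunk one"}],
                   240, 200, "svc-assistant")
print(row["result_count"], row["bytes_returned"])  # → 1 9
```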
## Cost considerations
Federated Vector Search calls hit your Databricks Vector Search endpoint — usage shows up in your Databricks bill, not Factory's. Rough orders of magnitude:
- Direct Access index: ~$0.0001 per query (mostly serving compute)
- Delta Sync index: same query cost + the standing sync compute (varies by source table churn)
If you don't want every Assistant chat to hit Databricks, scope the tool to specific agents (set `enabled_for_assistants` to `false` and tick the index per agent, as in the per-index allow-list) so only intentional flows trigger federated retrieval.
## Troubleshooting
**Tool registers but every call returns 0 results**

The `textColumn` value is wrong — verify the column actually contains text in Databricks: `SELECT <textColumn> FROM main.knowledge.product_specs_idx LIMIT 5`. Also confirm the index is `READY` (not still building).
**`AUTH_FAILED` on every call**

The warehouse connection's PAT lacks Vector Search privileges. In Databricks, grant the PAT user `USE CATALOG` + `USE SCHEMA` + `SELECT` on the index's catalog/schema. Re-test the warehouse connection (the vector index reuses its credentials — no separate test step).
**Hits come back but citations show no `url`**

The index doesn't have a `url` column, or you didn't include it in `returnColumns` at registration. Re-register with the corrected `returnColumns` list.
**Latency > 2 seconds per call**

Either the index is on a serverless endpoint that has cold-started, or `num_results` is too high for the embedding model. Lower `num_results` to 5 (the default) and confirm the endpoint type in Databricks. For consistent latency, switch to a provisioned serving endpoint.
## Related guides
- Connect Databricks (warehouse) — federated SQL is a prerequisite (vector indexes share connection credentials).
- MCP bridge — inbound clients — alternative federation path if your retrieval is exposed as an MCP server rather than a Vector Search index.
- Governance — per-call budget enforcement, audit, encryption.