Apache Iceberg REST Catalog

The Iceberg REST catalog is the open API specification that lets tools discover and query Apache Iceberg tables without a vendor-specific catalog client. It is how Snowflake, Databricks, Trino, and Spark all read the same Iceberg tables.

Definition

The Apache Iceberg REST Catalog is the open REST API specification that lets any tool discover, read, and (in some cases) write Apache Iceberg tables without a vendor-specific catalog client. It is the standardization point that allows Snowflake, Databricks, Trino, DuckDB, Spark, Flink, and other query engines to read the same Iceberg tables from the same metadata source.

The spec is maintained as part of the Apache Iceberg project. It is the catalog API used in large Iceberg deployments such as those at Netflix and Apple, and it is exposed by managed catalogs such as AWS Glue and by most modern lakehouse platforms.

What problem it solves

Iceberg as a table format separates three things:

  1. The data files (Parquet, ORC, Avro) in object storage.
  2. The metadata files that describe schema, partition layout, and snapshot history.
  3. The catalog that points at the latest metadata snapshot.
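
The catalog layer is the smallest of the three, and a toy model can make its role concrete. The sketch below is a simplified in-memory illustration, not the REST protocol itself: ToyCatalog, CommitConflict, and the s3:// paths are hypothetical names chosen for the example. It shows the one job the catalog has, which is holding a pointer to the current metadata file and swapping it atomically on commit.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


class CommitConflict(Exception):
    """Raised when another writer moved the pointer first."""


@dataclass
class ToyCatalog:
    """Layer 3 only: maps a table name to its current metadata file location."""

    pointers: Dict[str, str] = field(default_factory=dict)

    def current_metadata(self, table: str) -> str:
        return self.pointers[table]

    def commit(self, table: str, expected: Optional[str], new: str) -> None:
        # Compare-and-swap: succeed only if the pointer is still where
        # this writer last saw it; otherwise the writer must retry.
        if self.pointers.get(table) != expected:
            raise CommitConflict(table)
        self.pointers[table] = new


# Hypothetical metadata file locations (layer 2) in object storage.
V1 = "s3://example-bucket/crm/accounts/metadata/v1.metadata.json"
V2 = "s3://example-bucket/crm/accounts/metadata/v2.metadata.json"

catalog = ToyCatalog()
catalog.commit("crm.accounts", expected=None, new=V1)  # table creation
catalog.commit("crm.accounts", expected=V1, new=V2)    # a later snapshot
```

A writer that still believes V1 is current would get CommitConflict on its next commit, which is how concurrent writers are kept from silently overwriting each other.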

Before the REST catalog spec, each catalog backend (Hive Metastore, AWS Glue, Nessie, Snowflake's catalog, Databricks Unity Catalog) needed its own client implementation. Reading the same Iceberg tables from multiple tools meant configuring every tool against the same backend, with vendor-specific client code in each one.

The REST catalog spec collapses this: any tool that speaks the open REST API can query any compliant catalog. The catalog is the discovery surface: it returns each table's current metadata, and that metadata in turn points at the data files in object storage.

How it works

  1. Catalog server. Exposes a REST API per the Iceberg REST spec: list namespaces, list tables, get table metadata, commit table updates.
  2. Authentication. Bearer-token auth is standard; OAuth 2.0 client credentials flow is the production pattern.
  3. Client. The query engine (Snowflake, Databricks, Trino, Spark, and so on) registers the catalog as an external Iceberg catalog and queries its tables natively.
  4. File access. The catalog returns metadata that points at data files in object storage; clients read those files directly (commonly via pre-signed URLs or direct cloud credentials).
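
The first two steps can be sketched with the standard library alone. The example below builds, but does not send, the HTTP requests a client would issue for the list-namespaces, list-tables, and load-table operations, each carrying a bearer token. The route shapes follow the public REST spec as I understand it; the base URL, token, namespace, and table name are placeholders, and the helper function names are hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request

# The spec's default separator for multi-level namespaces is the unit
# separator character, which is percent-encoded as %1F in the URL path.
NAMESPACE_SEPARATOR = "\x1f"


def encode_namespace(levels):
    """Encode a namespace such as ("crm", "prod") as one URL path segment."""
    return quote(NAMESPACE_SEPARATOR.join(levels), safe="")


def catalog_url(base, *segments, prefix=""):
    """Build a /v1/{prefix}/... URL; the prefix is optional and catalog-specific."""
    parts = ["v1"] + ([prefix] if prefix else []) + list(segments)
    return base.rstrip("/") + "/" + "/".join(parts)


def bearer_request(url, token, method="GET"):
    """Prepare (but do not send) a request carrying a bearer token."""
    return Request(url, headers={"Authorization": f"Bearer {token}"}, method=method)


BASE = "https://catalog.example.com"  # hypothetical catalog server
TOKEN = "example-token"               # placeholder, e.g. from an OAuth 2.0 flow

list_namespaces = bearer_request(catalog_url(BASE, "namespaces"), TOKEN)
list_tables = bearer_request(
    catalog_url(BASE, "namespaces", encode_namespace(("crm",)), "tables"), TOKEN
)
load_table = bearer_request(
    catalog_url(BASE, "namespaces", encode_namespace(("crm", "prod")), "tables", "accounts"),
    TOKEN,
)
```

Passing any of these objects to urllib.request.urlopen would perform the call against a real server. Committing a table update uses the same table URL with method="POST" and a JSON body, which is omitted here.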

The protocol keeps catalog, storage, and compute cleanly separated, which is why so many engines can share the same Iceberg estate.
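
As an illustration of the client step, here is roughly what registering a REST catalog looks like for Spark, expressed as the configuration properties the Iceberg Spark runtime reads. This is a sketch under assumptions: the catalog name, URI, credential, and warehouse values are placeholders, and spark_rest_catalog_conf is a hypothetical helper, not part of any library.

```python
from typing import Dict, Optional


def spark_rest_catalog_conf(
    name: str,
    uri: str,
    credential: Optional[str] = None,
    warehouse: Optional[str] = None,
) -> Dict[str, str]:
    """Return Spark properties that register an Iceberg REST catalog as `name`."""
    key = f"spark.sql.catalog.{name}"
    conf = {
        key: "org.apache.iceberg.spark.SparkCatalog",
        f"{key}.type": "rest",  # selects the REST catalog client
        f"{key}.uri": uri,      # where the catalog server lives
    }
    if credential:
        # "client-id:client-secret" for the OAuth 2.0 client credentials flow
        conf[f"{key}.credential"] = credential
    if warehouse:
        conf[f"{key}.warehouse"] = warehouse
    return conf


conf = spark_rest_catalog_conf(
    "crm",                          # placeholder catalog name
    "https://catalog.example.com",  # placeholder URI
    credential="my-client-id:my-client-secret",
)
```

Each key/value pair can be passed to Spark with --conf or SparkSession.builder.config. Nothing in the configuration names a storage vendor or a compute vendor; the only coupling is the catalog URI.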

How Factory Labs uses Iceberg REST

Factory Labs exposes its CRM data as Iceberg tables via the REST catalog. The Snowflake integration uses this directly: Snowflake mounts Factory's Iceberg REST catalog as an external catalog and queries CRM data natively, with no Snowpipe and no Fivetran.

The same Iceberg catalog is also reachable from Databricks (via Unity Catalog's Iceberg federation), Trino, DuckDB, and any other Iceberg-aware tool.

See /integrations/snowflake for the Snowflake-specific setup.

Iceberg vs Delta Sharing

These are complementary, not competing:

  • Delta Sharing is a protocol for sharing Delta Lake tables. The receiver does not need to know about catalogs; they get a profile file (config.share) containing an endpoint and a bearer token, and start reading.
  • Iceberg REST is a catalog protocol. The receiver mounts the catalog and discovers tables; access is governed by the catalog's auth model.
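
The difference shows up in what the receiver actually holds. The sketch below puts the two side by side: a Delta Sharing profile in the shape of a config.share file, and the handful of properties a client typically needs to mount an Iceberg REST catalog. All endpoints, tokens, and credentials are placeholders invented for the example.

```python
import json

# Delta Sharing: the receiver is handed a small JSON profile file.
delta_sharing_profile = {
    "shareCredentialsVersion": 1,
    "endpoint": "https://sharing.example.com/delta-sharing/",  # placeholder
    "bearerToken": "<token>",                                  # placeholder
}

# Iceberg REST: the receiver configures a catalog, then discovers
# namespaces and tables through it.
iceberg_rest_properties = {
    "type": "rest",
    "uri": "https://catalog.example.com",          # placeholder
    "credential": "<client-id>:<client-secret>",   # placeholder
}

# What would be written to disk as config.share.
profile_text = json.dumps(delta_sharing_profile, indent=2)
```

In both cases the receiver ends up with an endpoint and a secret; what differs is whether the endpoint serves a fixed set of shared tables or a browsable catalog.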

Factory Labs supports both because the consumer ecosystem is split: Databricks customers tend to prefer Delta Sharing, while Snowflake customers tend to prefer Iceberg. Same underlying CRM data, two outbound surfaces.

Why it matters

For B2B software, Iceberg REST means:

  • The vendor can ship one catalog and reach every modern lakehouse consumer.
  • No per-consumer connector to build or maintain.
  • Schema evolution is handled by the table format: Iceberg tracks columns by ID rather than by name or position, so adding fields does not break existing readers.
  • Snapshot history gives time-travel queries on the receiver side, useful for audit and reproducibility.

It is the standardization that lakehouse interoperability needed.
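
The schema-evolution and time-travel points above both come down to simple lookups in Iceberg's table metadata. The sketch below is a simplified model, not an Iceberg reader: rows are plain dicts keyed by field ID, the field names and values are invented, and the snapshot log mirrors only the snapshot-id and timestamp-ms entries of the real metadata.

```python
from typing import Any, Dict, List, Optional


def project_row(row_by_field_id: Dict[int, Any], read_schema: Dict[int, str]) -> Dict[str, Any]:
    """Resolve a stored row against the reader's schema by field ID.

    Fields added after the row was written are simply absent from the
    stored data and come back as None.
    """
    return {name: row_by_field_id.get(field_id) for field_id, name in read_schema.items()}


def snapshot_as_of(snapshot_log: List[Dict[str, int]], timestamp_ms: int) -> Optional[int]:
    """Return the ID of the latest snapshot committed at or before timestamp_ms."""
    candidates = [entry for entry in snapshot_log if entry["timestamp-ms"] <= timestamp_ms]
    if not candidates:
        return None
    return max(candidates, key=lambda entry: entry["timestamp-ms"])["snapshot-id"]


# A row written before the "industry" column (field ID 3) existed.
old_row = {1: "Acme", 2: "acme.com"}
schema_v2 = {1: "name", 2: "domain", 3: "industry"}
projected = project_row(old_row, schema_v2)

snapshot_log = [
    {"timestamp-ms": 1000, "snapshot-id": 11},
    {"timestamp-ms": 2000, "snapshot-id": 22},
]
as_of_1500 = snapshot_as_of(snapshot_log, 1500)
```

Because resolution is by ID, renaming a column changes only the name in the read schema; old files remain readable without being rewritten.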

Trade-offs

  • Data freshness. Iceberg snapshots are not real-time; clients see data as of the latest committed snapshot, and new snapshots typically land every few minutes. It is not the right protocol for sub-second freshness.
  • Catalog operation. The vendor (us) operates the catalog and the underlying storage; that is a real cost we absorb.
  • Engine coverage. Not every BI tool reads Iceberg yet; Tableau and Looker need a query engine (Snowflake, Databricks, Trino) in between rather than reading Iceberg directly.

These trade-offs are usually acceptable for analytical use cases.

Further reading