XTDB 2.1: ATTACH to integrate

We released XTDB 2.0 back in June with a clear mission: to simplify how organizations build and maintain auditable systems in the modern era of near-limitless object storage, using our cloud native SQL database. Today we have released 2.1, the first incremental update, which brings a whole slew of new capabilities.

This release is also a good milestone for stability and performance, with huge thanks to our Design Partners and early adopters for all their feedback and collaboration over recent months.

Multi-Database Support

First and foremost, XTDB now supports hosting and querying across many logical databases. This ability, driven by a desire to isolate datasets across operational domains (think 'Data Mesh' architecture), is presented to users via an ATTACH DATABASE command (with a corresponding DETACH DATABASE).

The ATTACH command allows teams to access 'secondary' databases containing shared domain data, aligning your data model with your organization’s structure while keeping compute independent. For example, you can easily query across an orders database and a sales database (i.e. two attached secondary databases):

SELECT c.c_name, SUM(o.o_totalprice) AS total_spend
FROM sales_db.customers c
JOIN orders_db.orders o ON c.c_custkey = o.o_custkey
GROUP BY c.c_name

Users can query across multiple databases in a single query by specifying the absolute catalog path when referencing each table (e.g. sales_db.customers above).

In a traditional 'mutable' SQL database, this kind of cross-database querying comes with significant limitations, and users must commit to non-trivial consistency and performance tradeoffs when organising such an architecture. Most crucially, they must choose whether a cross-database query should either (A) enforce snapshot isolation by snapshotting the data across all data sources, which can affect concurrent writes and operations in unforeseen ways (particularly depending on how long the underlying snapshots need to be retained), or (B) relax consistency within the query and tolerate a degree of 'tearing' in the results as the state of the underlying data sources mutates.

This tradeoff is reflected in Postgres' Foreign Data Wrappers (FDWs), where each remote data source is queried independently without a unified snapshot, meaning a single query can see inconsistent states across sources. With FDWs there’s no straightforward way to enforce cross-source consistency on a per-query basis.
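To make the (A)/(B) tradeoff concrete, here is a toy Python model of a join across two sources while a concurrent writer commits. It is not how XTDB or Postgres FDWs are implemented; it simply assumes each source retains every past state, as an immutable, append-only store does, so reading 'as of' a captured vector of per-source versions is just a lookup. The names Source and orphaned_orders are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Source:
    """A toy data source that keeps every past state (append-only history)."""
    history: list = field(default_factory=lambda: [{}])

    def write(self, key, value):
        # Each write appends a new state instead of mutating the old one in place.
        new_state = dict(self.history[-1])
        new_state[key] = value
        self.history.append(new_state)

    def version(self):
        return len(self.history) - 1

    def read(self, version=None):
        # version=None models a mutable database: you only ever see "now".
        return self.history[-1] if version is None else self.history[version]


def orphaned_orders(customers, orders):
    """Orders whose customer is missing from the customers view (a torn result)."""
    return sorted(oid for oid, cust in orders.items() if cust not in customers)


customers_db, orders_db = Source(), Source()
customers_db.write("c1", "Alice")
orders_db.write("o1", "c1")

# Snapshot vector: one as-of version per source, captured when the query starts.
snapshot = {"customers": customers_db.version(), "orders": orders_db.version()}

# (B) Relaxed: the query reads customers, a concurrent writer commits a new
# customer and an order for them, and only then does the query read orders.
customers_view = customers_db.read()
customers_db.write("c2", "Bob")
orders_db.write("o2", "c2")
orders_view = orders_db.read()
print("torn:", orphaned_orders(customers_view, orders_view))

# (A) Snapshot: both sources are read as of the captured vector.
print("snapshot:", orphaned_orders(customers_db.read(snapshot["customers"]),
                                    orders_db.read(snapshot["orders"])))
```

Running it prints torn: ['o2'] followed by snapshot: []. The relaxed read sees an order whose customer it never saw, while the snapshot read stays consistent even though the writes landed mid-query. In a mutable store, option (A) would mean keeping those old states around somewhere, which is where the retention cost described above comes from.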

In contrast, XTDB’s approach is to make the consistency guarantees of multi-database usage universal and straightforward to reason about. This is largely possible because of XTDB’s simple replication model and shared storage architecture: no single instance of XTDB 'owns' the storage, so many instances can cooperate transparently by connecting to the same storage, simply by specifying the same bucket and log configuration:

-- XTDB Node A (CMS app)

-- attach the shared products database
ATTACH DATABASE products WITH $$
  log: !Kafka
    cluster: 'prod-kafka'
    topic: 'xtdb.products'

  storage: !S3
    bucket: 'products-bucket'
$$

-- find content pages with out-of-stock products
SELECT cp._id, cp.page_url, pi.product_name
FROM content_pages cp
  JOIN products.inventory pi ON cp.featured_product_id = pi._id
WHERE pi.stock_level = 0

Then sometime later on an otherwise totally separate XTDB instance…

-- XTDB Node B (Customer Support app)

-- attach the shared products database
ATTACH DATABASE products WITH $$
  log: !Kafka
    cluster: 'prod-kafka'
    topic: 'xtdb.products'

  storage: !S3
    bucket: 'products-bucket'
$$

-- enrich support tickets with product details for the dashboard
SELECT st._id, st.customer_email, pi.product_name, pi.category
FROM support_tickets st
  JOIN products.inventory pi ON st.product_id = pi._id
WHERE st.status = 'open'

In addition to querying this same secondary products database, both XTDB instances can also write to it as needed (although allowing many apps to write to a single database unrestricted is generally not a good idea, at least not without first thinking through the consequences!).

Under the hood, attaching a database involves downloading the latest snapshot of metadata about that database and initialising the representation of the database in the XTDB catalog hierarchy. Once initialised, a secondary database is available to connect to using Postgres-style connection parameters, and users can transact to the newly attached database right away. The configuration of currently attached secondary databases is stored within the primary database as catalog metadata, ensuring that all nodes agree on which secondary databases are available.

Once a secondary database is attached, there’s no need to wait for data to 'copy' before running analysis or beginning data integration tasks: just attach your data sources and start querying! Caches may take some time to warm, but this is transparent to your queries.

Consistency is backed by a snapshot model that retains a vector of as-of states, one per attached database, which can be serialized into a small string token for use by the application:

SHOW SNAPSHOT_TOKEN;
-- "ChYKBHh0ZGISDgoMCKHqs8cGEPCP2poB"

-- usable right away on the current connection,
-- but also later on a separate XTDB node that shares the same primary database
BEGIN READ ONLY
  WITH (SNAPSHOT_TOKEN = 'ChYKBHh0ZGISDgoMCKHqs8cGEPCP2poB');

You can then use this token to achieve read-your-writes consistency and repeatable reads (potentially all within a single read-only transaction, as in the example above), regardless of which XTDB node you are connected to. Crucially, the token does not hold open any resources and is usable indefinitely, so you can use it as the basis for auditability when constructing application queries.
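As a rough mental model of such a token, the Python sketch below encodes a vector of per-database log positions into a short string and checks whether a node has caught up to it before serving a read. This is purely illustrative: XTDB's actual token format and catch-up mechanism are its own, and the JSON/base64 encoding, the position numbers and the helper names (encode_token, decode_token, caught_up) are all assumptions.

```python
import base64
import json


def encode_token(positions):
    """Serialize a hypothetical {database: log position} vector into a compact string."""
    raw = json.dumps(positions, sort_keys=True, separators=(",", ":")).encode()
    return base64.urlsafe_b64encode(raw).decode().rstrip("=")


def decode_token(token):
    # Restore the base64 padding stripped by encode_token before decoding.
    padded = token + "=" * (-len(token) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def caught_up(node_positions, token):
    """True once this node has indexed at least up to every position in the token."""
    wanted = decode_token(token)
    return all(node_positions.get(db, -1) >= pos for db, pos in wanted.items())


# Node A has just committed a write to 'products' and hands out a token
# (the positions here are made-up illustrative values).
token = encode_token({"xtdb": 1042, "products": 377})

# Node B is slightly behind on 'products', so it would wait before reading...
print(caught_up({"xtdb": 1042, "products": 375}, token))  # False
# ...and can serve the read-only transaction once it reaches position 377.
print(caught_up({"xtdb": 1050, "products": 377}, token))  # True
```

Because a token of this shape is just data (a handful of positions), it holds no server-side resources, which is what makes it reasonable to store and reuse indefinitely.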

For more details on how this works, see the docs pages on Databases in XTDB and Read-Only Transactions.

Connectivity

In the 2.0 release we recommended that applications use XTDB’s Postgres wire protocol compatibility wherever possible, along with standard Postgres drivers to aid stability and usability. We have since furthered this commitment by testing and validating our driver support across 11 languages, as demonstrated in our driver-examples repository.

This approach of relying on another database’s connectivity ecosystem is not without its challenges and tradeoffs, but it has enabled our team to collaborate effectively with a range of Design Partners on their applications (themselves written in a range of languages: TypeScript, Clojure, Python, Kotlin) without getting bogged down in a tangled matrix of client library implementation and support decisions.

If there’s a language you want to use that isn’t covered by the drivers we’ve tested so far, please do let us know. Likewise, there are many Postgres-centric ORMs and database-related frameworks that may be possible to get working with some more effort. These are not nearly as straightforward to validate as drivers, but we’re happy to discuss and help.

The main requirement for a Postgres driver to work well (i.e. fast) with XTDB is that it lets you specify the 'OID' type of parameters explicitly. In practice, the only language ecosystem we’ve explored so far where we haven’t found a way of doing this is C#.
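'Specifying the OID type of parameters' refers to the Postgres extended query protocol, in which the frontend Parse message can declare a type OID for each parameter (an OID of 0 leaves the type unspecified for the server to infer). Drivers usually expose this through typed parameter-binding APIs rather than raw bytes; the sketch below builds the message by hand in Python purely to show what ends up on the wire. The OIDs are standard entries from Postgres' pg_type catalog, the query reuses the support_tickets example from earlier, and parse_message is a hypothetical helper.

```python
import struct

# A few well-known type OIDs from Postgres' pg_type catalog.
OID_INT4 = 23
OID_INT8 = 20
OID_TEXT = 25
OID_TIMESTAMPTZ = 1184


def parse_message(query, param_oids, statement_name=""):
    """Build a frontend Parse ('P') message that declares parameter types up front."""
    body = (
        statement_name.encode() + b"\x00"      # prepared statement name ("" = unnamed)
        + query.encode() + b"\x00"             # query text, NUL-terminated
        + struct.pack("!h", len(param_oids))   # Int16: number of declared parameter types
        + b"".join(struct.pack("!I", oid) for oid in param_oids)  # Int32 OID per parameter
    )
    # The length field counts itself (4 bytes) plus the body, but not the 'P' type byte.
    return b"P" + struct.pack("!i", 4 + len(body)) + body


msg = parse_message(
    "SELECT * FROM support_tickets WHERE product_id = $1 AND status = $2",
    [OID_INT8, OID_TEXT],  # $1 declared as int8, $2 as text
)
print(msg[:1], len(msg))
```

With the types declared in Parse, the server does not have to infer them from the query text, which is the capability the driver needs to expose.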

The out-of-the-box compatibility has also unlocked critical tooling with little effort, such as the popular BI tool Metabase (which relies on PgJDBC).

Plus OIDC, OTel, EXPLAIN ANALYZE, and more…

For the full details, you can read through the 2.1 release notes over on GitHub now, and we will continue our discussion of its many new features in follow-up posts over the coming days and weeks.

Office Hours

If you would like to discuss any of these things live with the team, we are running the next Office Hours session on 2nd Dec at 1600 UTC on Discord to review this week’s developments and walk through some of the 2.1 functionality. Come say hello!

The routine Office Hours drop-in happens twice per week (usually on Tuesday and Thursday), but the team is always online and keen to chat. Take a look at the event listings in the Discord server for the latest info.

Finally, last week we delivered a 1-hour seminar titled "Reconstructing History with XTDB" to Andy Pavlo and his students at Carnegie Mellon University, and the recording is online. We would be happy to chat about any of that too:

👋 hello@xtdb.com
