Development Diary #5

A lot has happened since our last dev diary entry back in June. The latter half of 2020 was enormously busy for us, and if you’d indulge us, we’d like to give a tour of what’s been going on.

Features

Throughout 2020 we addressed a number of popular feature requests from our users.

In the last entry we discussed Transaction Functions, and we also mentioned a couple of particularly exciting roadmap items, both hotly demanded, that shipped soon after: Speculative Transactions available via the withTx() API, and support for SQL queries.

Both of these features are arguably essential for a modern database system - withTx() allows users to run arbitrary queries and complex integrity checks on draft changes (before committing them!), and SQL provides a familiar on-ramp for a very broad spectrum of users, without the need for any additional infrastructure.
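To make the idea of a speculative transaction concrete, here is a minimal, library-agnostic sketch in Python. It does not use the XTDB API: Db, with_tx and total_balance are hypothetical names invented purely for illustration. What it shows is the general pattern, assuming an immutable database value: draft changes are applied to a copy, integrity checks run against that copy, and the original is untouched until you decide to commit.

```python
class Db:
    """Hypothetical immutable database value: entity id -> document."""

    def __init__(self, docs=None):
        self.docs = dict(docs or {})

    def with_tx(self, tx_ops):
        """Return a NEW Db with the ops applied; self is left unchanged."""
        draft = dict(self.docs)
        for op, doc in tx_ops:
            if op == "put":
                draft[doc["id"]] = doc
            elif op == "delete":
                draft.pop(doc["id"], None)
            else:
                raise ValueError(f"unknown op: {op}")
        return Db(draft)


def total_balance(db):
    """Input to an example integrity check: sum of all account balances."""
    return sum(d["balance"] for d in db.docs.values())


db = Db().with_tx([("put", {"id": "alice", "balance": 100}),
                   ("put", {"id": "bob", "balance": 50})])

# Draft a transfer and check invariants before anything is committed.
draft = db.with_tx([("put", {"id": "alice", "balance": 70}),
                    ("put", {"id": "bob", "balance": 80})])
money_conserved = total_balance(draft) == total_balance(db)
no_overdraft = all(d["balance"] >= 0 for d in draft.docs.values())
```

Only if both checks pass would an application go on to submit the real transaction; the speculative value can simply be discarded otherwise.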

Similarly, in response to user demand, we recently released Full Text Search support as a secondary index module, powered by Lucene. The module is in alpha as we gather user feedback on our implementation. In this module you can use a new built-in Datalog predicate to perform full-text matches against an attribute. There is also another predicate allowing you to search against whole documents using a Lucene query string that can refer to multiple fields.

However, beyond responding to feature requests for core functionality, a key theme of 2020 was making XTDB more consumable for mainstream audiences and generally lowering the barriers to adoption. Adding SQL support was a significant step, and with the release of 1.13.0 we brought JSON support to the HTTP module. This means that, purely from the command-line, you can now spin up a Docker container for a new node (using xtdb-build), submit a JSON document via curl, and then query against it using SQL - all without touching Clojure or Java.

Another pillar for increasing adoption is strengthening our reach into the Java ecosystem. Work continues on a strongly-typed Java API and we are preparing an xtdb-kotlin Kotlin API shim, with an upcoming blog post where our resident Kotlin enthusiasts will share some enlightenment. We also plan to start pushing release artefacts to Maven Central soon.

To briefly recap the other noteworthy developments in the XTDB repo since the last review:

Index checkpointing, which allows the nodes in a cluster to share checkpoints using a central 'checkpoint store' (e.g. S3), so that nodes joining a cluster can replay from a checkpoint rather than having to replay the whole transaction history (1.13.0)
Destructuring of tuples from input relations (provided via :in), predicates and subqueries (1.13.0)
Native index support (i.e. encoded & internally sorted for efficient range scans) for additional value types: byte arrays, Java’s BigDecimal and BigInteger, and java.time’s Duration, Instant, LocalDateTime, LocalDate and LocalTime (1.13.0)
A revamped module system for simple yet endlessly flexible node configuration - this was a major overhaul that now enables users to create new modules, similar to the HTTP API or SQL module (1.12.0)
Datalog aggregations (1.12.0)
Azure Blob storage for documents via azure-blobs (which borrows from the blueprint established by modules/s3), kindly contributed by @luposlip (1.12.0)
Query observability to track active/recent/slow queries in conjunction with existing monitoring (1.11.0)
The addition of Strings and Longs as supported Entity ID types - very handy for JSON (1.10.0)
EQL projection (pull) syntax for tree-structured retrieval within Datalog queries (1.10)
Support for creating non-polling transaction log backends using the TxIngester protocol (1.10)
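The index checkpointing item in the list above can be pictured with a small, library-agnostic sketch. This is not XTDB's implementation: replay, checkpoint_store and the in-memory tx_log are hypothetical stand-ins. It only illustrates why restoring a checkpoint and replaying the tail of the log gives the same index as replaying the entire history.

```python
def replay(tx_log, index=None, start=0):
    """Apply transactions tx_log[start:] to an index (entity id -> doc).

    Returns the new index and the position of the next unprocessed transaction.
    """
    index = dict(index or {})
    for tx in tx_log[start:]:
        for doc in tx:
            index[doc["id"]] = doc
    return index, len(tx_log)


# A tiny transaction log: each transaction is a list of documents to put.
tx_log = [
    [{"id": 1, "name": "a"}],
    [{"id": 2, "name": "b"}],
    [{"id": 1, "name": "a2"}],
    [{"id": 3, "name": "c"}],
]

# An existing node saves a checkpoint after the first three transactions.
ckpt_index, ckpt_pos = replay(tx_log[:3])
checkpoint_store = {"latest": {"index": ckpt_index, "next_tx": ckpt_pos}}

# A joining node restores the checkpoint and replays only the tail of the log.
ckpt = checkpoint_store["latest"]
fast_index, _ = replay(tx_log, ckpt["index"], ckpt["next_tx"])

# Replaying the whole history from scratch yields the same index.
full_index, _ = replay(tx_log)
```

In this toy example the joining node processes one transaction instead of four; with a long transaction history the saving is proportionally larger.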

Performance

We have invested significantly in performance, improving both ingestion throughput and query times, whilst strengthening our benchmarking capability in the process. We’ve also reduced the disk footprint that individual XTDB nodes require, which is an area we will certainly revisit in the year ahead as XTDB transitions to an architecture that aligns with our strategy to 'separate storage from compute'.

Future

2020 was a year of stabilising: adding the highest-priority features requested by users, concentrating on performance, and making XTDB more accessible.

We’ve added people to the team, so that in 2021 we can aim to complete more of our objectives.

We have exciting plans for 2021. Focusing our efforts on 'separating storage from compute' will mean that an XTDB node no longer needs the complete universe of data available to it locally; instead, it can pull in data from a central store as needed.

Whilst doing this, we will be upgrading our temporality featureset, including the ability for users to efficiently discover 'when something is true', as opposed to relying on timeslice queries at a known point-in-time.
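The difference between the two kinds of temporal question can be sketched as follows. This is an illustrative toy only, not a preview of any XTDB API: the history data, as_of and when_true are all hypothetical. A timeslice query asks for the value at a known time, whereas a "when is it true" query scans the history and returns the intervals in which a predicate holds.

```python
from bisect import bisect_right

# Hypothetical valid-time history for one entity: (valid_from, value),
# sorted by valid_from. Each value holds until the next entry begins.
history = [(0, "draft"), (10, "active"), (25, "suspended"), (40, "active")]


def as_of(history, t):
    """Timeslice query: the value at a known time t (None if t precedes the history)."""
    i = bisect_right([start for start, _ in history], t)
    return history[i - 1][1] if i else None


def when_true(history, pred):
    """Return merged [start, end) intervals where pred(value) holds.

    An end of None means the interval is still open.
    """
    intervals = []
    for i, (start, value) in enumerate(history):
        end = history[i + 1][0] if i + 1 < len(history) else None
        if not pred(value):
            continue
        if intervals and intervals[-1][1] == start:
            # Adjacent to the previous matching interval: extend it.
            intervals[-1] = (intervals[-1][0], end)
        else:
            intervals.append((start, end))
    return intervals


status_at_12 = as_of(history, 12)
active_periods = when_true(history, lambda v: v == "active")
```

The linear scan in when_true is the naive approach; the "efficiently" in the paragraph above is about answering such questions without walking the full history.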

The team is excited for XTDB 2021. Aside from the new features, our aim is for XTDB to be simpler and leaner, more accessible and straightforward for our users to adopt.

Once we’ve closed the loop on separating-storage-from-compute and temporality++, we will then aim to move formally out of beta, with the addition of new modules such as GraphQL to keep the accessibility story going.

Community

The community forums are busier than ever, particularly Zulip. It’s great to see lively conversations and debate, and hear your feedback! Thank you to everyone who helps us track down bugs.

re:Clojure

In December we were lucky enough to open up the re:Clojure virtual conference with a 25-minute video about the XTDB journey so far, including a tour of the internals and a glimpse of what we’re thinking about more generally. It was a great couple of days - many thanks to the other speakers and organisers!

(Because of the immutable nature of YouTube videos, you’ll note XTDB is still referred to as "Crux" in the video and the title.)

Client libraries

It has been exciting to see a lot of interest in building new client libraries around the XTDB HTTP API. Great work by @naomijub & co. on these two edn-based libraries in particular:

transistor, a Rust client underpinned by edn-rs
translixir, an Elixir client

With the changes to the HTTP module for JSON support there is now also the potential to generate JSON-based client libraries using our OpenAPI specification.

Modules

In addition to @luposlip’s contribution of Azure Blob support mentioned above, and the various modules by Avisi, we’ve also seen other new modules crop up in the wild:

@severeoverfl0w created a Redis module for using Redis as a durable doc-store and Redis Streams as a durable tx-log. A clone of that repo now also lives under the crux-labs organisation for easy discovery. ("Crux" has since been renamed to "XTDB" but this repository hasn’t been migrated yet.)
@keytiong shared a new KV module for HBase as a remote index-store

Biff

Jacob O’Bryant is a startup founder and enthusiastic Clojure advocate who has cooked up Biff, a new web framework & self-hosted deployment solution for Clojure, with XTDB at its heart (by default, at least). It even includes a Firebase-like authorization model that allows you to securely submit application-level transactions through to XTDB from the front-end.

If you’re planning to get started with building a full-stack application with Clojure and XTDB then Biff is definitely worth a look. Jacob delivered a talk and hosted a workshop on using Biff at re:Clojure.

Get in touch!

We hope this gives a flavour of what to expect in 2021.

Please give XTDB a whirl and get in touch with the team if you have any questions or issues. Happy New Year!