Development Diary #11

Welcome to our third dev diary of 2024!

In our previous dev diary we talked about the indexing structure in XT 2.x. This month, we’ve been busy working on the ingest and query performance, as well as starting to support a wider range of SQL-standard tooling.

Today, we’ve released another early-access snapshot with these changes 🚀

For early-access-users: this contains breaking changes to the transaction log format - you’ll need to re-import your data into XT2. As far as we know, this shouldn’t be an issue for the use cases we’ve seen so far (i.e. we’ve not been made aware of anyone having deployed the early-access to production, nor would we recommend it!) but if it is an issue, please do let us know and we’ll see if we can help. Once we move through alpha and beta, as a marker of stability, we’ll both reduce the frequency of changes of this magnitude, and provide more tooling/guidance on migrations.

Thanks to everyone who’s gotten in touch so far - and please do keep your thoughts and ideas coming!

Performance: a story of 'dense unions'

We’ve spent a fair proportion of this month with our heads buried in the profiler!

One area that we’ve particularly focused on is our use of Arrow’s 'union' vectors, which accounted for a not-insignificant proportion of both ingest and query speed. Union vectors are how XT2 gets its polymorphism.

'Normal' columnar use-cases benefit hugely from monomorphic vectors - knowing that you’ve got a whole column of (for example) fixed-width 32-bit integers ('i32s') arranged contiguously in memory gains you a lot of performance. In contrast with row-oriented storage (where you need to first locate the start of each row, then scan through the row’s other values to find the right integer column), in columnar-oriented storage, I know that to get the value of a certain i32 column from the 34th row, I go straight to byte 34 * 4 = 136 and read my four-byte integer. Similarly, if I’m summing the column, I don’t have to skip over any superfluous data for the rest of each row - I know the start byte offset of each column, I simply start summing from there.

In XT2, though, we’ve elected to give users more flexibility in the data they put in their documents. In Arrow, this means using 'union' vectors, which allow values of different types in a given column. (We use 'dense' unions, which we shorten to 'DUVs'; there are also 'sparse' unions, which I won’t cover here.)

Dense unions do this by storing two additional columns, as well as a column for each type. So, for a column that might either contain i32s or strings, say, we’d store four columns:

'type-id' - for each row, this stores which type that row contains. e.g. it might use 0 for i32s, 1 for strings.
'offset' - again for each row, this stores the value’s offset within the specific type vector
then one vector of i32s, one vector of strings.

For example, to store the vector [14, 18, "foo", 16, "bar"], we’d represent this as follows:

'type-id': [0, 0, 1, 0, 1]
'offset': [0, 1, 0, 2, 1] - e.g. 18 is at offset 1 in the i32s, "foo" is at offset 0 in the strings, etc.
'i32s': [14, 18, 16]
'strings': ["foo", "bar"]

A promotion

Initially, to get XT2 being able to store data and query it out again, we made the choice to store data pessimistically in dense unions. At any point, the user could insert a value of a different type into an existing column - so even though we expect the vast majority of data to be monomorphic, we chose to store everything in DUVs, just in case. However, the cost of both writing and reading this extra indirection, for every value in every column, is not trivial - safe to say, it was lighting up our profiler.

Our aim, therefore, was to get XT performing as closely as possible to fast monomorphic vectors for data that is uniformly typed, while accepting the DUV overhead for polymorphic data.

In this release, we now store data optimistically in monomorphic vectors, unless and until we’re given data of a different type. When this happens, we then 'promote' the monomorphic vector to be the first leg of a new DUV, set up the type-ids and offsets retrospectively, update our metadata as required, and then accept the new value.

I’m pleased to say that the performance improvement from this change has been significant.

Transaction submission (i.e. taking the documents from 'submit-tx' and writing them to the transaction log) is now faster by a factor of ~3x.
Reading the transaction log and indexing the documents is now faster by a factor of ~2x.
This is enough such that neither of these are now the ingestion bottleneck - this dubious honour is held by the metadata calculation at the end of each block of rows, which will be the subject of our next iteration.
On OLAP queries, this improves performance by an average of ~40%.

Compared to XT1, our document ingestion is now around 10-12x faster in XT2, our OLAP query benchmark is ~8x faster - and that’s even after having made significant incremental improvements to XT1’s performance between 1.22 and 1.24.

Absence and null

One breaking change that we’ve introduced as part of this work is the coalescing of absent and null values. Previously, we endeavoured to distinguish between (e.g.) {a: 1, b: null} and {a: 1}, in order to round-trip documents where possible.

(Notably, neither Arrow nor the SQL standard make this distinction.)

The cost of this decision, though, comes from the fact that Arrow therefore only has one type of 'null' - so we had to create our own Arrow extension type in order to separate the two. We used Arrow’s 'null' for the former case, and added an explicit 'absent' type for the latter, which had to go in a union vector.

This meant (and hands up if you can see where this is going! ✋) that whenever users inserted documents with different sets of keys, we had to fall back to DUVs, completely nullifying (ha!) the above performance improvements.

We’ve now reversed this decision, and represent both absence and null using Arrow’s built-in 'null' flag, which doesn’t carry this tradeoff. We’d recommend users that did previously depend on this distinction (i.e. attached implicit meaning to an explicitly provided 'null' in a map) instead use a separate enum column to represent the underlying reason for the 'null'.

More generally, putting my introspection hat on for a moment, this hopefully gives some insight into our decision making process for XTDB. While in certain areas we’re very keen to innovate (in the areas that make XTDB unique and special), in others, we very much value familiarity, consistency, prior art, and a lack of surprise.

We’re heavily influenced by what Arrow can store, and the meaning it attaches to the data in various vector types.
We aim to conform to the SQL specification wherever possible, but also take Postgres’s extensions into account if the spec is unclear.
Finally, we run on the JVM, and try to use its standard library where possible, which adds another dimension.

The fun comes when these four sources of prior art disagree with each other, as they often do!

SQL coverage and tooling

Separately, we’ve been improving our SQL coverage. This month’s focus has mainly been on casting, parsing and rendering operations for various temporal types. Temporal types are without doubt one of the gnarliest areas when it comes to the aforementioned disagreements between Arrow, SQL, Postgres and Java - we even ended up writing a large matrix of how each of them treat instants, dates and timestamps, as well as periods, durations and intervals, and the subtle differences between them.

Keep your eyes peeled for our upcoming documentation improvements on this subject!

On the SQL tooling integration side, we’ve been investigating popular SQL tools (DBeaver, Metabase etc.) to find out how best to get them talking to XT2 - we’d love it if you could Just™ use all of your existing tooling. Our approach here (not dissimilar to CockroachDB or CrateDB, for example) is to implement a Postgres wire protocol server, so that we can capitalise on the existing availability of Postgres support.

We have the initial seeds of a solution here, in our (even more) experimental pgwire-server module - you can already psql -h your-xtdb-server.com, submit transactions and run queries.

Our current iteration is focused on the initial connection queries that these tools send to Postgres (or what they think is Postgres, at least! 🕵). So far we’ve implemented some of information_schema and pg_catalog, so you can query those metadata tables to find out what tables/columns are present in your database.

Those tools also rely on some SQL language constructs that we haven’t yet implemented - for example, some access control function calls, and information about the current user/session/connection. Once we have, this should mean that these tools will then connect to XTDB and query it in the normal way - watch this space!

In other news

We have produced a quickstart tutorial that attempts to introduce XT2’s main concepts to an audience who is already familiar with SQL.

For new users who would prefer to avoid downloading XT2 before they can try using it, we have also created XT Play (in a similar vein to https://www.db-fiddle.com/) - a stateless backend which creates fresh instances of XT2 upon each request. This service is also particularly useful for sharing both working and non-working examples as simple URLs.

Feedback on any of these things would be very welcome over on this forum post.

For anyone curious about some of the motivations for our work on XTQL (and Datalog previously), we spoke with Prof. Viktor Leis for the latest episode of the JUXT Cast, which you can listen to anywhere or watch here:

Get in touch!

As always, we’d love to hear from you - we’ve already had many insightful conversations with people who’ve given XT2 a spin which have influenced our roadmap, and it’d be great to get your thoughts too:

Discuss XTDB (public forums)
hello@xtdb.com

Similarly, if you’d like a demo of XT2, or to be kept informed about its progress, come say hello at hello@xtdb.com.

Cheers,

James

Development Diary #11

Performance: a story of 'dense unions'

A promotion

Absence and null

SQL coverage and tooling

In other news

Get in touch!

Similar posts

Development Diary #1

Development Diary #2

Development Diary #7