CruzDB architecture part 1: it's a log-structured database
15 February 2018
This post is part of the following series:
CruzDB is a distributed shared-data key-value store that manages all its data
in a single, high-performance distributed shared-log. It’s been one the most
challenging and interesting projects I’ve hacked on, and this post is the first
in a series that will explore the current implementation of the system, and
critically, where things are headed.
Throughout this series we will dig into all the details, but here is a brief,
high-level description of CruzDB. The system uses multi-version concurrency
control for managing transactions, and each transaction in CruzDB reads from an
immutable snapshot of the database. This design is a direct result of being
built on top of an immutable log. When a transaction finishes it doesn’t
commit immediately. All of the information required to replay the
transaction—a reference to its snapshot, and a record of the its reads and
writes—are packaged into an object called an intention which is then appended
to the log. Any node with access to the log can append new transaction
intentions, and replay the intentions in the order that they appear in the log.
This process of replaying the log is what determines if a transaction commits
or aborts, and because the log is totally ordered, every node in the system
deterministically makes the same decision about every transaction.
This is a high-level architectural view of the system. At the top is CruzDB which
can be linked directly into an application, or accessed through a proxy. It’s API
is similar to RocksDB. In the middle is a distributed shared-log that stores all
of the data from CruzDB. The implementation of a shared-log that we use is ZLog
which is designed to run on top of a distributed object storage system. While ZLog
is designed to run on a variety of distributed storage systems, we use the Ceph distributed
object storage system, represented at the lowest level in the diagram.