DynamoDB API in Scylla

Jay Taylor's notes

Original source (docs.google.com)

Tags: database aws distributed-systems self-hosted dynamo dynamodb alternator design-doc docs.google.com

Clipped on: 2019-09-11

Alternator: DynamoDB API in Scylla

August 2019, Nadav Har’El

Overview

Amazon DynamoDB is a popular NoSQL database that runs as a service, and only on Amazon’s cloud. db-engines ranks it below Cassandra (18th place vs. 10th), but it is quickly gaining traction (see this graph), both because Amazon pushes it and because it is convenient to use: there is nothing to install, and you pay either for your actual use or for a performance reservation that you can change at any time.

However, while users find it easy to start with DynamoDB, they soon find themselves in vendor lock-in, because this API is supported only on Amazon. And when it quickly gets very expensive (see for example this post), one cannot install one’s own DynamoDB setup, let alone move an application that uses DynamoDB to another cloud or a private data center. Amazon itself provides an installable version of DynamoDB, DynamoDB Local, but it is oriented towards testing, not production usage. There have been previous attempts to solve this vendor lock-in problem by providing a DynamoDB API implementation on top of other databases, so that it can be installed on any cloud or data center, but the only one I found is Dynalite, a DynamoDB implementation on top of LevelDB.

The goal of this proposal is to provide a DynamoDB implementation on top of Scylla. The benefits to users are avoiding vendor lock-in (Scylla is open-source and can be installed on any cloud) and reduced cost, thanks to Scylla’s world-famous efficiency and to avoiding Amazon’s cut. The reduced cost will be relevant even for users who choose to stay with Amazon: in our blog post from several months ago, we demonstrated that Scylla is at least 7 times cheaper than DynamoDB (considering all costs, including VMs and even a commercial Scylla license). So even if a DynamoDB front-end adds some overhead (and surely it will), we still expect a significant cost advantage over DynamoDB. The biggest risk to Scylla’s performance (or cost) advantage is the need to accurately implement DynamoDB’s query isolation guarantees, such as those for concurrent conditional writes. This may require us to use lightweight transactions (LWT), which may be slower than Scylla’s traditional read and write operations. We’ll go into the concurrency issue in more detail below.
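To make the concern about concurrent conditional writes concrete, here is a toy Python sketch of the guarantee involved: a write that takes effect only if the item's current value matches an expected value, with the check and the write happening as one atomic step. Everything in it is an assumption made for illustration. `ToyTable`, `put_if`, and `claim_concurrently` are hypothetical names, not part of DynamoDB, Scylla, or this proposal, and the in-process lock merely stands in for whatever coordination (such as LWT) a distributed implementation would need.

```python
import threading


class ToyTable:
    """In-memory stand-in for a table. Hypothetical; not Scylla or DynamoDB code."""

    def __init__(self):
        self._items = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._items.get(key)

    def put_if(self, key, new_value, expected):
        """Store new_value only if the current value equals `expected`.

        The comparison and the write happen under one lock, so no other
        writer can slip in between them. Returns True if the write happened.
        """
        with self._lock:
            if self._items.get(key) != expected:
                return False
            self._items[key] = new_value
            return True


def claim_concurrently(table, key, n_writers):
    """Have n_writers threads each try to create `key` only if it is absent."""
    outcomes = [None] * n_writers

    def worker(i):
        # expected=None means "write only if the item does not exist yet".
        outcomes[i] = table.put_if(key, f"writer-{i}", expected=None)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return outcomes


if __name__ == "__main__":
    table = ToyTable()
    outcomes = claim_concurrently(table, "user#1", 8)
    print("successful writers:", outcomes.count(True))
    print("stored value:", table.get("user#1"))
```

However the threads interleave, exactly one writer succeeds. If the check and the write were two separate steps, several writers could all observe the item as absent and all "succeed", which is the anomaly an accurate implementation has to rule out across replicas rather than within a single process.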

The goal of this document is to provide a detailed overview of DynamoDB and all its features, with a focus on how it differs from Scylla, what we need to implement to provide DynamoDB API support in Scylla, and which features Scylla is missing for complete DynamoDB support. Our understanding of DynamoDB stems from Amazon’s public documentation (we’ll include links below) and public presentations made by Amazon engineers (notably this AWS re:Invent talk).

Table of Contents:

Overview
The API server
The wire protocol
The data model
DynamoDB data storage
Data size limits
Number encoding
Tables and table operations
Operations on individual items (CRUD)
UpdateItem’s update expressions
Expression parsers in DynamoDB
Concurrent updates
Conditional updates using LWT
Atomic counters
Batch operations
Full-table scans
Partition or partition-slice scan
Consistency levels
On-demand backup and restore
Continuous backup (point-in-time recovery)
Transactions
Condition expressions
Single-region and multi-region tables
Secondary indexes
Streams (“Change Data Capture”, or CDC)
Encryption at rest
Time To Live (TTL)
Resources and tags
Pricing or throughput limits
DAX
AWS management console
Metrics
Service Level Agreement (SLA)
AWS integration
Benchmarking
