Jay Taylor's notes

DynamoDB API in Scylla

Original source (docs.google.com)
Tags: database aws distributed-systems self-hosted dynamo dynamodb alternator design-doc docs.google.com
Clipped on: 2019-09-11

Alternator: DynamoDB API in Scylla
August 2019, Nadav Har’El
Amazon DynamoDB is a popular NoSQL database that runs as a service, and only on Amazon’s
cloud. db-engines ranks it below Cassandra (18th place vs. 10th), but it is quickly gaining
traction (see this graph), thanks both to Amazon’s promotion and to its convenience: there is
nothing to install, and you pay either for your actual use or for a performance reservation
that you can change at any time.
However, while users find it easy to get started with DynamoDB, they soon find themselves in
vendor lock-in, because this API is supported only on Amazon. When DynamoDB quickly becomes very
expensive (see for example this post), they cannot install their own DynamoDB setup, let
alone move the application that uses DynamoDB to another cloud or a private data center.
Amazon itself provides an installable version of DynamoDB, DynamoDB Local, but it is oriented
towards testing, not production use. There have been previous attempts to solve this vendor
lock-in problem by providing a DynamoDB API implementation on top of other databases, so that it
can be installed on any cloud or data center, but the only one I found is Dynalite, a DynamoDB
implementation on top of LevelDB.
The goal of this proposal is to provide a DynamoDB implementation on top of Scylla. The
benefits to users are avoiding vendor lock-in (Scylla is open-source and can be installed on
any cloud) and reduced cost, thanks to Scylla’s world-famous efficiency and to avoiding
Amazon’s cut. The reduced cost is relevant even for users who choose to stay on
Amazon: in our blog post from several months ago, we demonstrated that Scylla is at least 7
times cheaper than DynamoDB (considering all costs, including VMs and even a commercial
Scylla license). So even if a DynamoDB front-end adds some overhead (and surely it will), we
expect to retain a significant cost advantage over DynamoDB. The biggest risk to Scylla’s
performance (or cost) advantage is the need to accurately implement
DynamoDB’s query isolation guarantees, for example for concurrent conditional writes. These may
require us to use LWT (lightweight transactions), which may be slower than Scylla’s traditional
read and write operations. We’ll go into the concurrency issue in more detail below.
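To make the isolation concern concrete, here is a minimal toy sketch of a conditional write ("store this item only if the key does not exist yet") under concurrency. Everything in it is hypothetical and for illustration only: `ToyTable`, `put_if_absent` and `race` are invented names, the table is an in-memory dictionary, and the lock merely stands in for whatever mechanism serializes the check and the write. It is not how DynamoDB or Scylla’s LWT work internally.

```python
import threading


class ToyTable:
    """Hypothetical in-memory table; not DynamoDB's or Scylla's implementation."""

    def __init__(self):
        self._items = {}
        # Stand-in for the mechanism that makes check-and-write atomic.
        self._lock = threading.Lock()

    def put_if_absent(self, key, value):
        """Conditional write: store value only if key does not exist yet.

        The existence check and the write happen under one lock, so no
        other writer can slip in between them.
        """
        with self._lock:
            if key in self._items:
                return False  # condition failed, nothing written
            self._items[key] = value
            return True

    def get(self, key):
        with self._lock:
            return self._items.get(key)


def race(table, key, writers):
    """Run `writers` threads that all try to claim `key`; count the winners."""
    results = []

    def attempt(i):
        results.append(table.put_if_absent(key, f"writer-{i}"))

    threads = [threading.Thread(target=attempt, args=(i,)) for i in range(writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)  # True counts as 1


if __name__ == "__main__":
    table = ToyTable()
    print("successful conditional writes:", race(table, "user#1", 8))
    print("stored value:", table.get("user#1"))
```

If the check and the write were not performed as one atomic step, two writers could both observe the key as missing and both report success. In a distributed database, providing this atomicity across replicas is what costs extra coordination compared with ordinary writes.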
The goal of this document is to provide a detailed overview of DynamoDB and all its features,
with a focus on how it differs from Scylla, what we need to implement to provide DynamoDB
API support in Scylla, and which features Scylla is missing for complete DynamoDB
support. Our understanding of DynamoDB stems from Amazon’s public documentation (we’ll
include links below) and public presentations made by Amazon engineers (notably this AWS
Table of Contents:
Alternator: DynamoDB API in Scylla
The API server
The wire protocol
The data model
DynamoDB data storage
Data size limits
Number encoding
Tables and table operations
Operations on individual items (CRUD)
UpdateItem’s update expressions
Expression parsers in DynamoDB
Concurrent updates
Conditional updates using LWT
Atomic counters
Batch operations
Full-table scans
Partition or partition-slice scan
Consistency levels
On-demand backup and restore
Continuous backup (point-in-time recovery)
Condition expressions
Single-region and multi-region tables
Secondary indexes
Streams (“Change Data Capture”, or CDC)
Encryption at rest
Time To Live (TTL)
Resources and tags
Pricing or throughput limits
AWS management console
Service Level Agreement (SLA)
AWS integration