Jay Taylor's notes

DynamoDB API in Scylla

Original source (docs.google.com)
Tags: database aws distributed-systems self-hosted dynamo dynamodb alternator design-doc docs.google.com
Clipped on: 2019-09-11

Alternator: DynamoDB API in Scylla
August 2019, Nadav Har’El
Overview
Amazon DynamoDB is a popular NoSQL database that runs only as a service on Amazon’s
cloud. db-engines has it ranked below Cassandra (18th place vs 10th), but it’s quickly gaining
traction (see this graph) thanks to being pushed by Amazon and thanks to how convenient it is
to use: there is no need to install anything, and you just pay for your actual use or for a
performance reservation you can change at any time.
However, while users find it easy to start with DynamoDB, they soon find themselves in
vendor lock-in, because this API is only supported on Amazon. And when it quickly gets very
expensive (see for example this post), they cannot install their own setup of DynamoDB, let
alone move the application using DynamoDB to another cloud or a private data center.
Amazon itself provides an installable version of DynamoDB, DynamoDB Local, but it is oriented
towards testing, not production usage. There have been previous attempts to solve this vendor
lock-in problem by providing a DynamoDB API implementation on top of other databases, so it
can be installed on any cloud or data center, but the only one I found is Dynalite, a DynamoDB
implementation on top of LevelDB.
The goal of this proposal is to provide a DynamoDB implementation on top of Scylla. The
benefits to users are avoiding vendor lock-in (Scylla is open-source and can be installed on
any cloud) and reduced cost, thanks to Scylla’s world-famous efficiency and to avoiding
Amazon’s cut. The reduced cost will be relevant even for users who choose to stay with
Amazon: in our blog post from several months ago, we demonstrated that Scylla is at least 7
times cheaper than DynamoDB (considering all costs, including VMs and even a commercial
Scylla license). So even if a DynamoDB front-end adds some overhead (and surely it will), we
expect to still have a significant cost advantage over DynamoDB. The biggest risk to the
performance (or cost) advantage of Scylla is the need to accurately implement
DynamoDB’s query isolation guarantees, such as those for concurrent conditional writes. This may
require us to use LWT (lightweight transactions), which may be slower than Scylla’s traditional
read and write operations. We’ll go into the concurrency issue in more detail below.
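To make the isolation concern concrete, here is a minimal sketch of a conditional write in a toy in-memory table. It is not Scylla or DynamoDB code: the names ToyTable, put_if_absent and race are hypothetical, and the lock merely stands in for whatever serialization a mechanism such as LWT would provide. The point is only that the check and the write must happen as one atomic step, so that exactly one of several concurrent conditional writers succeeds.

```python
import threading


class ToyTable:
    """Hypothetical in-memory stand-in for a table; not a real Scylla or DynamoDB API."""

    def __init__(self):
        self._items = {}
        # Assumption: this lock plays the role of the serialization that
        # a mechanism like LWT would provide in a real cluster.
        self._lock = threading.Lock()

    def put_if_absent(self, key, value):
        """Conditional write: store value only if key does not exist yet."""
        with self._lock:
            # The condition check and the write happen under one lock,
            # so no other writer can slip in between them.
            if key in self._items:
                return False
            self._items[key] = value
            return True

    def get(self, key):
        return self._items.get(key)


def race(table, key, writers):
    """Run several concurrent conditional writes; return how many succeeded."""
    results = []

    def attempt(i):
        results.append(table.put_if_absent(key, f"writer-{i}"))

    threads = [threading.Thread(target=attempt, args=(i,)) for i in range(writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results.count(True)
```

Without the lock, two writers could both observe the key as absent and both report success. Preventing that outcome across replicas is what makes conditional writes more expensive than plain writes.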
The goal of this document is to provide a detailed overview of DynamoDB and all its features,
with a focus on how it differs from Scylla, what we need to implement to provide DynamoDB
API support in Scylla, and which features are missing in Scylla for complete DynamoDB
support. Our understanding of DynamoDB stems from Amazon’s public documentation (we’ll
include links below) and public presentations made by Amazon engineers (notably this AWS
  
Table of Contents:
Alternator: DynamoDB API in Scylla
Overview
The API server
The wire protocol
The data model
DynamoDB data storage
Data size limits
Number encoding
Tables and table operations
Operations on individual items (CRUD)
UpdateItem’s update expressions
Expression parsers in DynamoDB
Concurrent updates
Conditional updates using LWT
Atomic counters
Batch operations
Full-table scans
Partition or partition-slice scan
Consistency levels
On-demand backup and restore
Continuous backup (point-in-time recovery)
Transactions
Condition expressions
Single-region and multi-region tables
Secondary indexes
Streams (“Change Data Capture”, or CDC)
Encryption at rest
Time To Live (TTL)
Resources and tags
Pricing or throughput limits
DAX
AWS management console
Metrics
Service Level Agreement (SLA)
AWS integration
Benchmarking