Peloton – a relational database designed for autonomous operation

Jay Taylor's notes

Original source (news.ycombinator.com)

Tags: database peloton autonomous news.ycombinator.com

Clipped on: 2017-04-22

OP here. Peloton has been posted here before, but didn't get any attention.

I think this database is very interesting even if you don't care about the time saving part of it, since it claims to be a hybrid (OLAP and OLTP), it implements postgres' wire protocol and it claims to compile queries to machine code using LLVM [1].

[1]: https://www.youtube.com/watch?v=mzMnyYdO8jk (slideshow: http://www.cs.cmu.edu/~pavlo/slides/selfdriving-nov2016.pdf)

arielweisberg 90 days ago [-]

Andy Pavlo always tries to spice things up and his lectures and presentations are a treat.

He is on a list of people of mine that fits on 10 fingers. James Mickens is in there.

His work on H-Store was great. I spent 6 years working on VoltDB, which is a commercial spinoff of H-Store, and it was a formative experience for me.

hokkos 90 days ago [-]

Strange talk and slides, I am not sure if he is serious or not.

sceadu 90 days ago [-]

He's joking about the rehab I'm sure...that's his style (I like it personally).

BTW, here are the video lectures to the graduate database course he mentioned in the presentation, where students were developing features for Peloton as part of the course (they're great IMO):

https://www.youtube.com/watch?v=MyQzjba1beA&list=PLSE8ODhjZX...

aerioux 90 days ago [-]

both in person, lectures, and videos - he has a very different style than many others

buremba 90 days ago [-]

The lectures are great.

jarulraj 90 days ago [-]

Thanks for the shout-out!

Jweb_Guru 90 days ago [-]

Side note, but I really dislike the current trend (in in-memory databases, to be clear) of not bothering to include any real provisions for durability and justifying it by saying "NVRAM exists." It effectively doesn't for anyone who needs to be able to deploy to off-the-shelf environments, and it's super expensive (and if you're going for performance, like most of the research projects are, countering by using the database in a clustered configuration would be counterproductive). Are there any cloud providers who provide NVRAM in any configuration?

quickben 90 days ago [-]

But it provides a dead easy way to publish a research paper, claim insane speedups, and not worry about disk journals, caches, in-flight data corner cases when a VM is snapshotted, etc.

fulafel 90 days ago [-]

Flash storage is nvram, so yes, hosting companies offer it.

allyraza 83 days ago [-]

There are many types of NVM, and not all of them are available on most hosting providers. One of the big players offering SSDs in the cloud is DigitalOcean.

Jweb_Guru 90 days ago [-]

Not in the sense that people mean in these papers, and you know it. It doesn't have even close to the same performance characteristics.

inconclusive 90 days ago [-]

The idea of write-behind logging is slick.

http://www.cs.cmu.edu/~pavlo/papers/p337-arulraj.pdf

jarulraj 90 days ago [-]

Thanks :) We believe that non-volatile memory (NVM) will be a game-changer for database management systems [1].

[1] https://www-ssl.intel.com/content/www/us/en/architecture-and...

tmd83 88 days ago [-]

Does anyone know what happens after the query plan is generated in most databases? I'm assuming individual steps, like index scan and hash join, are coded already, and the plan steps are iterated and the respective methods are called? So the execution steps are already compiled but the step traversal is kind of interpreted. With Peloton's LLVM engine, is everything merged together into a single sequence of machine code?

How much advantage does this give you? Are there really so many steps in the execution plan (the visible steps are usually < 50), and what about the internal, actually compiled steps? Unless this allows merging and further simplification steps that identify redundant operations to be trimmed off, I'm not sure where the 100x performance improvement comes from.

Though I remember seeing a Scala-based in-memory query engine that was doing this sort of simplification of the actual steps and doing very well in benchmarks, so maybe this is similar.
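To make the question above concrete, here is a minimal Python sketch. It is not Peloton's code: the operator functions, the sample table, and the query are all made up for illustration. The first version chains precompiled operators as pull-based iterators, so every row is handed from one operator to the next. The second version is the kind of single fused loop that a query compiler would aim to emit for the same plan. A real engine would generate LLVM IR rather than Python, and this sketch says nothing about how large the speedup is.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, int]


def scan(table: Iterable[Row]) -> Iterator[Row]:
    """Leaf operator: yield every row of the table."""
    for row in table:
        yield row


def select(child: Iterator[Row], pred: Callable[[Row], bool]) -> Iterator[Row]:
    """Filter operator: pull rows from the child and keep those matching pred."""
    for row in child:
        if pred(row):
            yield row


def project(child: Iterator[Row], column: str) -> Iterator[int]:
    """Projection operator: pull rows from the child and emit one column."""
    for row in child:
        yield row[column]


def run_interpreted(table: List[Row]) -> int:
    """SELECT SUM(price) WHERE qty > 10, built by wiring generic operators."""
    plan = project(select(scan(table), lambda r: r["qty"] > 10), "price")
    return sum(plan)


def run_fused(table: List[Row]) -> int:
    """The same query written as one loop, with no per-operator hand-offs."""
    total = 0
    for row in table:
        if row["qty"] > 10:
            total += row["price"]
    return total


# Hypothetical sample data, used only to show both versions agree.
ORDERS: List[Row] = [
    {"qty": 5, "price": 100},
    {"qty": 20, "price": 30},
    {"qty": 11, "price": 7},
]

if __name__ == "__main__":
    print(run_interpreted(ORDERS), run_fused(ORDERS))
```

Both functions compute the same answer; the difference is only in how much per-row dispatch sits between the scan and the sum.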

buremba 90 days ago [-]

I wonder why they try to support both OLTP and OLAP workloads. Supporting both of these workloads requires too much work (both row and columnar storage types, different algorithms for both storage and querying etc) and they didn't even prove that autonomous systems (which is the main point of the project) can replace the existing databases.

jarulraj 90 days ago [-]

Great question! There happens to be an autonomous mechanism for supporting hybrid workloads (OLTP & OLAP). Peloton supports hybrid storage layouts that are automatically and dynamically adapted over time based on the workload patterns. Row and columnar storage types are special cases of hybrid storage layouts.

This is a promising area of ongoing research. If you are curious about this kind of autonomous tuning of storage layout, you might want to check this out [1].

[1] https://www.cs.cmu.edu/~jarulraj/papers/2016.tile.sigmod.pdf
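As a rough illustration of the statement that row and columnar storage are special cases of a hybrid layout, here is a small Python sketch. It is an assumption-laden toy, not Peloton's implementation: the `TiledTable` class, its methods, and the helper functions are hypothetical names, and it only models vertical grouping of columns, with no horizontal partitioning and no workload-driven tuning.

```python
from typing import Dict, List, Sequence, Tuple


class TiledTable:
    """Toy table whose columns are split into groups ("tiles").

    One group holding every column behaves like a row store; one group
    per column behaves like a column store; anything else is a hybrid.
    """

    def __init__(self, columns: Sequence[str], groups: Sequence[Sequence[str]]):
        flat = [c for g in groups for c in g]
        if sorted(flat) != sorted(columns):
            raise ValueError("groups must cover every column exactly once")
        self.columns: List[str] = list(columns)
        self.groups: List[Tuple[str, ...]] = [tuple(g) for g in groups]
        # One list of tuples per group; index i in each list is row i.
        self.tiles: List[List[tuple]] = [[] for _ in self.groups]

    def __len__(self) -> int:
        return len(self.tiles[0]) if self.tiles else 0

    def insert(self, row: Dict[str, object]) -> None:
        """Split the row and append one fragment to each tile."""
        for group, tile in zip(self.groups, self.tiles):
            tile.append(tuple(row[c] for c in group))

    def get_row(self, i: int) -> Dict[str, object]:
        """Reassemble row i by stitching its fragments from every tile."""
        row: Dict[str, object] = {}
        for group, tile in zip(self.groups, self.tiles):
            row.update(zip(group, tile[i]))
        return row

    def scan_column(self, column: str) -> List[object]:
        """Read one column; only the tile that contains it is touched."""
        for group, tile in zip(self.groups, self.tiles):
            if column in group:
                idx = group.index(column)
                return [fragment[idx] for fragment in tile]
        raise KeyError(column)

    def relayout(self, groups: Sequence[Sequence[str]]) -> "TiledTable":
        """Copy the data into a new table that uses a different grouping."""
        new = TiledTable(self.columns, groups)
        for i in range(len(self)):
            new.insert(self.get_row(i))
        return new


def row_layout(columns: Sequence[str]) -> List[List[str]]:
    """All columns in a single group (row-store special case)."""
    return [list(columns)]


def column_layout(columns: Sequence[str]) -> List[List[str]]:
    """One group per column (column-store special case)."""
    return [[c] for c in columns]
```

In this toy, "adapting the layout" is just a call to `relayout` with a new grouping; how a real system decides when and how to regroup is exactly the research question discussed in the linked paper.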

allyraza 83 days ago [-]

I guess it is currently a trend for modern MMDBs (MemSQL, HyperDb, etc.) to support both OLTP & OLAP workloads. You can check out the git repo and give it a spin to see if it holds up to the claims.

manigandham 90 days ago [-]

http://www.memsql.com/ does this today. Fast, distributed, rowstore + columnstore, relational database with mysql protocol.

buremba 90 days ago [-]

However, Peloton also aims to be an autonomous system. That's a lot for undergrad and grad students, so I'm not sure if he wants Peloton to be stable in the near future.

aerioux 90 days ago [-]

Also fits in the niche between people who want both possibilities - though the onus is on the authors to show that it actually is just as good

gigatexal 90 days ago [-]

This sure has a lot to live up to: trying to do two things and do them well isn't very unix-y. There's a reason relational databases are set up to have OLTP schemas (highly normalized tables for supporting transactions etc.) and OLAP schemas (star schemas for example, large sometimes flat fact and dimension tables etc.). Also I'm not sure about the learning part: any decent database these days will cache frequently used data, and tables can be built as in-memory ones.

aerioux 90 days ago [-]

> addressing your caching point

So from my understanding, the learning part isn't about frequently used data and caching; it's (attempting to be) generalized workload learning, the kind of understanding that every DBA should build but usually doesn't.

If that is successful and is even marginally able to predict workload skews, then the scheduling of operations can be significantly more efficient -- you're essentially reducing entropy in your database massively.

gigatexal 90 days ago [-]

Any team of database admins/engineers worth their salary plans for capacity, fixes inefficient queries, and works with development on future goals for what they want out of the database layer.

michaelmior 90 days ago [-]

And you don't think it would be valuable to be able to automate many of those tasks?

gigatexal 90 days ago [-]

I agree it would but my premise is that I doubt it can be.

mamcx 90 days ago [-]

It is very rare to have a DB that does not need both OLTP and OLAP workloads.

All DB-based apps quickly exhaust their requirements for transactional code and move into "infinite reporting requests".

For a certain ERP I worked on in the past, there were at least 300 reports in the base package. Most requests were for more reports specialized for each customer. And additions to the transactional code were in part driven by the need to add more data for the reports!

So, I think having both styles is exactly what "everyone" wants. Even folks that get stuck with NoSQL databases.

---

I have been thinking a lot about this, and I consider the ideal architecture to be a relational DB with decoupled modules that work like this:

Write:

Commands -> WAL -> WaLProcessorAndRejector -> EventLog -> EventLogDispatchToOneOrMoreOf:

- Nothing. EventLog just is history
- Caches
- Relational Tables for up-to-date view on data
- Columnar/Index for speed up part of the reports

Read:

ReadRequest -> ReadDispatchToOneOf:

- EventLog
- Caches
- Relational Tables
- Columnar/Index

The reason it needs to be modular is that what is needed can change over time.
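The write and read paths described above can be sketched in a few lines of Python. This is only one possible reading of the idea, not an existing system: the `Store` class, the command format (a dict with `key` and `values`), the validation rule, and the source names passed to `read` are all assumptions made for the example, and caches are left out to keep it short.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List


class Store:
    """Toy pipeline: command -> WAL -> validate -> event log -> projections."""

    def __init__(self) -> None:
        self.wal: List[dict] = []        # every command, logged before validation
        self.event_log: List[dict] = []  # accepted commands only; this is the history
        self.rows: Dict[Any, dict] = {}  # relational view: key -> latest values
        self.columns: Dict[str, list] = defaultdict(list)  # columnar view
        # Projections are pluggable: dropping one or adding another does
        # not change the write path itself.
        self.projections: List[Callable[[dict], None]] = [
            self._apply_rows,
            self._apply_columns,
        ]

    def write(self, command: dict) -> bool:
        """Log the command, reject it if invalid, otherwise fan it out."""
        self.wal.append(command)
        if not self._is_valid(command):
            return False
        self.event_log.append(command)
        for apply in self.projections:
            apply(command)
        return True

    def _is_valid(self, command: dict) -> bool:
        # Hypothetical rule: a command needs a key and a dict of values.
        return "key" in command and isinstance(command.get("values"), dict)

    def _apply_rows(self, event: dict) -> None:
        self.rows[event["key"]] = event["values"]

    def _apply_columns(self, event: dict) -> None:
        for name, value in event["values"].items():
            self.columns[name].append(value)

    def read(self, source: str, arg: Any = None) -> Any:
        """Dispatch a read to exactly one of the available views."""
        if source == "event_log":
            return list(self.event_log)
        if source == "rows":
            return self.rows.get(arg)
        if source == "columns":
            return list(self.columns.get(arg, []))
        raise ValueError(f"unknown source: {source}")
```

Because each view is rebuilt purely from accepted events, a view could in principle be thrown away and replayed from the event log, which is one way to get the modularity the comment asks for.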

jarulraj 90 days ago [-]

That's correct! This is the reason why we support both OLTP and OLAP workloads in Peloton.

gigatexal 89 days ago [-]

We do just fine with a data warehouse and a bunch of traditional OLTP databases.

jarulraj 90 days ago [-]

We certainly do :) There happens to be an autonomous mechanism for supporting hybrid workloads (OLTP & OLAP). Peloton supports hybrid storage layouts that are automatically and dynamically adapted over time based on the workload patterns. Row and columnar storage types are special cases of hybrid storage layouts. This is a promising area of ongoing research. If you are curious about this kind of autonomous tuning of storage layout, you might want to check this out [1].

[1] https://www.cs.cmu.edu/~jarulraj/papers/2016.tile.sigmod.pdf

leftnode 90 days ago [-]

How old is this project? I wouldn't be surprised to see a cease and desist from the maker of the exercise bike.

wcarron 90 days ago [-]

Peloton is in fact the French word for platoon. I'd be highly surprised if the bike maker had the legal standing to issue infringement claims. Just as you can't copyright the word "bicycle", peloton is used widely enough that they should be fine. Then again, Jade the preprocessor was forced to rebrand as Pug.

rasjani 90 days ago [-]

Peloton is also a Finnish word, and it means 'fearless'.

kod 90 days ago [-]

Copyright has nothing to do with this. You certainly can register bicycle as a trademark, bicycle brand playing cards, for instance.

lumista 90 days ago [-]

Peloton is also a Finnish word and means fearless.

bjterry 90 days ago [-]

Peloton is a word referring to the main group in a bicycle endurance race, so it's not the same as calling your software "Wal-Mart." In this case it seems like the fact they are in completely different markets would be sufficient.

daenney 90 days ago [-]

The first commit was:

  commit 35823950d500314811212282bd68c101e34b9a06
  Author: jarulraj <jarulraj@cs.cmu.edu>
  Date:   Thu Dec 18 16:41:48 2014 -0500

Take a look at the different graphs on GitHub, like code frequency, to get a better idea: https://github.com/cmu-db/peloton/graphs/code-frequency

zirok 90 days ago [-]

Peloton also means fearless in Finnish, so the name could be based on that.

rosser 90 days ago [-]

That's not how trademarks work.
