Jay Taylor's notes


Writing Rust the Elixir Way | Hacker News

Original source (news.ycombinator.com)
Tags: programming erlang concurrency elixir beam rust bernard-kolobara wasm news.ycombinator.com
Clipped on: 2020-12-18

> Now you can write regular blocking code, but the executor will take care of moving your process off the execution thread if you are waiting, so you never block a thread.

This is super important and awesome. It's a big part of what makes Erlang simpler to write code for than other highly concurrent environments.

Yeah, it's because the Erlang VM has a preemptive scheduler, and each process has a mailbox and gets a designated amount of time on the CPU.

You can write an infinite loop and it won't hog the CPU, because it gets preempted. The problem is when you have a large amount of problematic mail in the mailbox...

So having written production Erlang, I have literally never had an issue with large amounts of problematic mail in the mailbox. Yeah, technically it's possible, since there are no enforced constraints on what kinds of messages a given process can be sent (which is intentional; after all, the process may enter a later receive statement that knows how to handle those kinds of messages), but it's really easy to do correctly. And if you want to limit the kinds of messages you respond to, you can always program defensively by putting a wildcard pattern match at the end of a receive.

If you're not running into overfull mailboxes, you're not trying hard enough! :D

Usually, it's some process that's gotten itself into a bad state and isn't making any progress (which happens; we're human), so ideally you had a watchdog on it to kill it if it makes no progress, flag it for human intervention, or stop sending it work, because it's not doing any of it.

But sometimes you get into a bad place where much more work is coming in than your process can handle, and the mailbox just grows and grows. This can get worse if some of the processing involves a send + receive to another process and the pattern doesn't trigger the selective receive optimization, where the mailbox position is marked at make_ref() and the receive only scans from that marker instead of the whole mailbox.

If you miss that optimization, each receive scans the whole mailbox, which takes a long time when the mailbox is large. That makes you fall further behind, which makes the scans take even longer, and so on, until you lose all hope of catching up and eventually run out of memory. That can kill the process if you've set max_heap_size on it (where I was using Erlang, we didn't set that); kill the BEAM OS process when an allocation fails or the OOM killer steps in; or take down the whole OS, if it has trouble when BEAM sucks in all the memory and can't muster its OOM killer in time, and just gets stuck or OOM-kills the wrong thing.
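A toy model of that cost, assuming a mailbox is nothing more than a FIFO queue that a selective receive scans from the front (BEAM's real implementation differs in detail). `Msg`, `Mailbox`, `mark`, `receive_from` and `scan_cost` are hypothetical names for this sketch; `mark` plays the role of the position recorded at make_ref().

```rust
use std::collections::VecDeque;

/// Hypothetical message type for this sketch.
#[derive(Debug, Clone, PartialEq)]
enum Msg {
    Work(u32),
    Reply { tag: u64, value: u32 },
}

/// A toy mailbox: a FIFO queue that supports selective receive.
struct Mailbox {
    queue: VecDeque<Msg>,
}

impl Mailbox {
    fn new() -> Self {
        Mailbox { queue: VecDeque::new() }
    }

    fn send(&mut self, msg: Msg) {
        self.queue.push_back(msg);
    }

    /// Records the current end of the queue, loosely analogous to the marker
    /// set when a fresh reference is created.
    fn mark(&self) -> usize {
        self.queue.len()
    }

    /// Selective receive: starting at `start`, remove and return the first
    /// message matching `pred`, plus how many messages were inspected.
    /// Non-matching messages stay in the queue, in order.
    fn receive_from<F: Fn(&Msg) -> bool>(&mut self, start: usize, pred: F) -> (Option<Msg>, usize) {
        let mut inspected = 0;
        for i in start..self.queue.len() {
            inspected += 1;
            if pred(&self.queue[i]) {
                return (self.queue.remove(i), inspected);
            }
        }
        (None, inspected)
    }
}

/// Fills a mailbox with `backlog` unmatched messages, then sends a reply that
/// arrives after the marker, and returns how many messages the receive inspected.
/// Starting at the marker is safe because a reply carrying a freshly created
/// tag cannot have arrived before the tag existed.
fn scan_cost(backlog: u32, use_marker: bool) -> usize {
    let mut mb = Mailbox::new();
    for i in 0..backlog {
        mb.send(Msg::Work(i));
    }
    let tag = 7; // stands in for a freshly created reference
    let marker = mb.mark();
    mb.send(Msg::Reply { tag, value: 42 });
    let start = if use_marker { marker } else { 0 };
    let (got, inspected) =
        mb.receive_from(start, |m| matches!(m, Msg::Reply { tag: t, .. } if *t == tag));
    assert_eq!(got, Some(Msg::Reply { tag, value: 42 }));
    inspected
}
```

With a backlog of 10,000 unmatched messages, the receive that starts at the front inspects 10,001 messages to find the reply, while the one that starts at the marker inspects 1. Paying the first cost on every request/reply round trip is what makes a large backlog snowball.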

Yep; as I mentioned elsewhere, I'm not discounting mailboxes filling faster than you can handle them, but rather the idea that you have 'problematic mails in the mailbox'. I've never had problematic mail in the mailbox. I've certainly seen mailboxes grow, because they were receiving messages faster than I was processing them, and I had to rethink my design. But that isn't an issue with the mail going into the mailbox; that's an issue with my not scaling out my receivers to fit the load. As I mentioned elsewhere, that may seem like semantics, but to me it isn't; it means the language working that way makes sense, and the issue is rather that I created a bottleneck with no way to relieve the pressure (as compared to messages that can't be handled and just sit there causing receives to take longer and longer over time).

Oh, I think I see. I don't think there's such thing as a problematic mail for BEAM really, mails are just terms, and BEAM handles terms, no problem. A mail that contained a small slice of a very large binary, would of course keep the whole referenced binary hanging around, which could be bad, or a large tuple could be bad because if you tried to inspect the message queue through process_info in another process, that would be a big copy.

But I think maybe the original poster just meant lots of bad mail in the mailbox to mean mail that would take a long time to process, because of how the receiving process will handle it.

Or, possibly bad mail meaning (as you suggest, perhaps), mail that won't be matched by any receive, resulting in longer and longer receives when they need to start from the top.

"But I think maybe the original poster just meant lots of bad mail in the mailbox to mean mail that would take a long time to process, because of how the receiving process will handle it."

Yeah; just, if he meant that, it seems like a...weird call out. Since that's not particular to Erlang's messaging model; that's true in any system where you have a synchronization point that is being hit faster than it can execute. Seems weird to call that out as a notable problem, as such.

What's unique to Erlang, and -could- be prevented if you wanted to change the model (by limiting a process to a single receive block, and having a type system prevent sending any messages not handled in that block), is the fact that I can send a process arbitrary messages that it will never match on, and, because the mailbox is a queue, they will delay the handling of all following messages. Hence my focusing on that: yes, that's a potential problem; no, it's not a particularly likely one.
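A sketch of that alternative model in Rust, assuming a "process" is a thread that owns a std `mpsc::Receiver` and its messages are a single enum: the channel's type rejects anything the receive loop does not handle, and the `match` must be exhaustive. `CounterMsg`, `counter` and `run_counter` are hypothetical names.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;

/// Every message the counter process understands. A Sender<CounterMsg>
/// cannot carry anything else, so unmatchable messages cannot be sent.
enum CounterMsg {
    Add(i64),
    Get(Sender<i64>),
    Stop,
}

/// The process body: a single receive loop with an exhaustive match.
/// Adding a variant to CounterMsg without handling it here is a compile error.
fn counter(inbox: Receiver<CounterMsg>) {
    let mut total: i64 = 0;
    for msg in inbox {
        match msg {
            CounterMsg::Add(n) => total += n,
            CounterMsg::Get(reply) => {
                // The caller may have gone away; ignore a failed reply.
                let _ = reply.send(total);
            }
            CounterMsg::Stop => break,
        }
    }
}

/// Spawns the counter, sends it some work, and asks for the result.
fn run_counter(values: &[i64]) -> i64 {
    let (tx, rx) = channel();
    let handle = thread::spawn(move || counter(rx));
    for &v in values {
        tx.send(CounterMsg::Add(v)).unwrap();
    }
    let (reply_tx, reply_rx) = channel();
    tx.send(CounterMsg::Get(reply_tx)).unwrap();
    let result = reply_rx.recv().unwrap();
    tx.send(CounterMsg::Stop).unwrap();
    handle.join().unwrap();
    result
}
```

The trade-off is the one described in the comment: a process now has exactly one receive block, so Erlang's pattern of entering a later receive that handles different messages is no longer available.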

I've also never had the issue of process mailboxes filling faster than the messages were being consumed. If I ran into that problem, I would question whether Erlang/Elixir was the right tool for the task, but in my experience there's always been a way to spread the load across multiple processes so that work happens concurrently, and eventually across multiple nodes if the throughput demands keep increasing. If the workload truly does have to be synchronized, my experience has been that sending a message to another tool was the right answer: maybe a database query or another service, for example.
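A minimal sketch of spreading load by partitioning it, assuming each work item carries a key and that items with the same key should always be handled by the same worker. Threads with std channels stand in for processes, and `sharded_sums` is a hypothetical name.

```rust
use std::sync::mpsc::channel;
use std::thread;

/// Routes each (key, value) item to one of `workers` threads by key and
/// returns each worker's total. `workers` must be non-zero.
fn sharded_sums(items: &[(u64, u64)], workers: usize) -> Vec<u64> {
    let mut senders = Vec::new();
    let mut handles = Vec::new();
    for _ in 0..workers {
        let (tx, rx) = channel::<(u64, u64)>();
        senders.push(tx);
        // Each worker owns its own state; nothing is shared between workers.
        handles.push(thread::spawn(move || rx.iter().map(|(_, v)| v).sum::<u64>()));
    }
    for &(key, value) in items {
        let shard = (key % workers as u64) as usize;
        senders[shard].send((key, value)).unwrap();
    }
    drop(senders); // closing every channel lets the workers' loops end
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Adding workers raises throughput as long as the keys are spread evenly; one very hot key still lands on a single worker, which is the case where the comment suggests moving the synchronized part to another tool.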

Overfull mailboxes can be problematic even with a wildcard `receive` and can happen if some branch of the control flow is slower than expected. We have some places in our code that essentially do manual mailbox handling by merging similar messages and keeping them in the process state until it's ready to actually handle them.
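One way to do that kind of manual mailbox handling, sketched in Rust with a std `mpsc` channel in place of an Erlang mailbox: drain whatever is already queued without blocking, and merge updates for the same key into the process state so the expensive handling later runs once per key. `Update` and `drain_and_coalesce` are hypothetical names, and "newest value wins" is an assumed merge rule.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{Receiver, TryRecvError};

/// Hypothetical update message: a newer value for a key supersedes older ones.
struct Update {
    key: String,
    value: i64,
}

/// Drains everything currently queued without blocking and merges it into
/// `pending`, keeping only the newest value per key.
/// Returns true once all senders have hung up.
fn drain_and_coalesce(rx: &Receiver<Update>, pending: &mut HashMap<String, i64>) -> bool {
    loop {
        match rx.try_recv() {
            Ok(Update { key, value }) => {
                pending.insert(key, value); // later values overwrite earlier ones
            }
            Err(TryRecvError::Empty) => return false,
            Err(TryRecvError::Disconnected) => return true,
        }
    }
}
```

Four queued updates for two keys collapse into two pending entries, so the slow branch does two units of work instead of four.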

Right, but I wouldn't characterize that as problematic messages; rather problematic handling, or systemic issues with load. I.e., my fix does not change the format of the messages. Semantics, perhaps, but there's a difference between "this design decision of Erlang's caused me problems" and "this architectural/design decision of mine caused me problems".

async-std has been doing this for a while. The runtime detects blocking tasks automatically and offloads them to a separate thread.

EDIT: The automatic blocking detection of async-std was abandoned after further discussion

There is an important difference here. In Lunatic all syscalls are non-blocking, they are not just offloaded to a separate thread. However, they look like regular "blocking" functions from the developer's perspective. Under the hood Lunatic uses async-wormhole[0] to make this work.

[0]: https://github.com/bkolobara/async-wormhole

Did that actually go into a release? The last I heard was that they tried it on a branch about a year ago and then abandoned it.

I thought it had been merged, but it looks like they removed the auto blocking detection and merged a revised version of the runtime [1].

1: https://github.com/async-rs/async-std/pull/733

I love Erlang and Elixir, and BEAM's concurrency model is quite interesting. This Rust variant seems very interesting as well, but I wanted to discuss this part:

> In most cases it's enough to use a thread pool, but this approach fails once the number of concurrent tasks outgrows the number of threads in the pool

I think something has been lost in all the dialogue about concurrency. The only way the number of concurrent tasks can outgrow the number of threads is when you're handling IO. What I rarely see discussed here is non-blocking IO, and why it hasn't become the norm. Why are we all trying to work around models that involve blocking on IO? I almost never hear about non-blocking IO as an alternative.

For all other tasks, which involve computation, thread pools are needed to improve throughput and performance, since only threads can execute in parallel. Yes, you can still multiplex many more small compute tasks over many threads if you want a really responsive but overall slower program, but I think most people going for a concurrent approach aren't doing it for that; they just want to leverage their IO cards to the fullest.

So my question is, what's wrong with having a thread pool combined with non-blocking IO?

You can do non-blocking IO (and, of course, that's what's happening behind the scenes here, and in Erlang). The thing is, a program written with non-blocking IO has a very different structure than one written with blocking IO.

It can feel clearer and easier to reason about what happens on a single connection if the code is like:

accept, read, process, write and somewhere else handles the concurrency

Especially if the one system does a lot of different things, but they're all related enough to be in one OS process, mashing together a lot of isolated non-blocking IO can be pretty tricky, and hard to keep track of.
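A sketch of that shape with Rust's standard library, assuming a line-based protocol with uppercasing as placeholder work: `handle_connection` is plain blocking code (read, process, write), and `serve` is the "somewhere else" that handles concurrency, here naively with one OS thread per connection. Both names are hypothetical; the handler is generic over its reader and writer so it can be exercised without a socket.

```rust
use std::io::{self, BufRead, BufReader, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

/// Per-connection logic as straight-line blocking code:
/// read a line, process it, write the answer, repeat until EOF.
fn handle_connection<R: BufRead, W: Write>(reader: R, mut writer: W) -> io::Result<()> {
    for line in reader.lines() {
        let line = line?; // read
        let reply = line.to_uppercase(); // process (placeholder work)
        writeln!(writer, "{}", reply)?; // write
    }
    writer.flush()
}

/// The concurrency lives here, not in the handler: one OS thread per
/// accepted connection.
#[allow(dead_code)]
fn serve(listener: TcpListener) -> io::Result<()> {
    for stream in listener.incoming() {
        let stream: TcpStream = stream?;
        thread::spawn(move || {
            let reader = match stream.try_clone() {
                Ok(s) => BufReader::new(s),
                Err(_) => return,
            };
            let _ = handle_connection(reader, stream);
        });
    }
    Ok(())
}
```

The appeal of lightweight-process runtimes like the one in the article is, roughly, that only the `thread::spawn` line in `serve` would change while `handle_connection` keeps its blocking style.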

> What I see rarely discussed here is forms of non-blocking IO and why those haven't become the norm. Why are we all trying to work around models that involve blocking on IO? I feel I almost never hear about using non-blocking IO as an alternative?

Maybe I don't understand what you mean by non-blocking I/O, because it seems to me that it is the norm in many environments, for better or worse. Async/await syntax is now available in C#, JavaScript, Python, Rust, and possibly others. And before that we had callbacks in JavaScript and in Python frameworks like Twisted and Tornado. So it seems to me that non-blocking I/O has been widely used and discussed for quite a while now.

I think I don't often hear it discussed in the context of C# and Rust, or rather, in the context of a multi-threaded environment, i.e. in combination with a thread pool.

Say you use the thread pool model, you receive 100 requests, and you have a thread pool of 50. Your first 50 requests go to your 50 threads, and the other 50 get queued up. When any of the 50 running requests makes a non-blocking IO call, it releases its thread back to the pool and yields itself into an IO waiting queue. Then the scheduler takes from either the request queue or the IO waiting queue, prioritizing one over the other as it sees fit. If either queue is full, the system blocks and waits, dropping requests that come in during that time.

Or some similar models.

Is this just async/await that I described?

Yes, this is how async/await typically works in Rust, and in C# when using a server-side framework like ASP.NET Core.

At least for C# ASP.NET, it's basically what you described. There are a few optimizations; for example, the thread pool has pretty smart autoscaling to maximize throughput.
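A sketch of the request-queue half of that design using only Rust's std: a fixed set of worker threads pulls jobs from one bounded queue, `submit` blocks while the queue is full, and `try_submit` drops the request instead. It does not model the IO waiting queue; handing a thread back to the pool while a job waits on IO is the part an async runtime adds. `Pool`, `submit` and `try_submit` are hypothetical names.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

type Job = Box<dyn FnOnce() + Send + 'static>;

/// A fixed number of worker threads fed from one bounded request queue.
struct Pool {
    queue: Option<SyncSender<Job>>,
    _inbox: Arc<Mutex<Receiver<Job>>>, // keeps the queue open even with 0 workers
    workers: Vec<JoinHandle<()>>,
}

impl Pool {
    fn new(threads: usize, queue_capacity: usize) -> Pool {
        let (tx, rx) = sync_channel::<Job>(queue_capacity);
        let inbox = Arc::new(Mutex::new(rx));
        let workers: Vec<JoinHandle<()>> = (0..threads)
            .map(|_| {
                let inbox = Arc::clone(&inbox);
                thread::spawn(move || loop {
                    // The lock guard is a temporary dropped at the end of this
                    // statement, so other workers can take jobs while this one runs.
                    let next = inbox.lock().unwrap().recv();
                    match next {
                        Ok(job) => job(),
                        Err(_) => break, // queue closed: shut down
                    }
                })
            })
            .collect();
        Pool { queue: Some(tx), _inbox: inbox, workers }
    }

    /// Blocks while the queue is full (back-pressure on the caller).
    fn submit(&self, job: Job) {
        self.queue.as_ref().unwrap().send(job).expect("pool is shut down");
    }

    /// Never blocks: returns false, dropping the request, when the queue is full.
    fn try_submit(&self, job: Job) -> bool {
        match self.queue.as_ref().unwrap().try_send(job) {
            Ok(()) => true,
            Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => false,
        }
    }
}

impl Drop for Pool {
    fn drop(&mut self) {
        self.queue.take(); // dropping the sender closes the queue
        for worker in self.workers.drain(..) {
            let _ = worker.join();
        }
    }
}
```

Dropping the pool closes the queue, lets the workers finish the jobs already queued, and joins them.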

There are two main ways of doing that.

One is the way GoLang does it. In GoLang, whenever you execute a blocking IO function, the language under the hood executes the non-blocking equivalent and reuses the thread for some other task by literally switching stacks. This is a conceptually easy approach, but it has some downsides.

One big downside is a greater cost when doing FFI calls. Because you have many stacks, they are small by default, and you need to grow them significantly before entering FFI code, since C code assumes a large stack is available. You may also need to dynamically change the size of the thread pool if too many threads are currently running FFI code that is blocking. (A logically simpler but more costly option is to have a separate FFI thread pool with large stacks, and all calls to FFI functions are treated as blocking I/O calls on the main pool, where the FFI function gets scheduled to run in the FFI pool, and when it completes the task that called the FFI gets queued back up to continue running, just like asynchronous IO had completed.)

The other approach that most async-await languages use is called "stackless". In this approach, all functions that can block have a different return type, which is some form of promise-like object (these functions are known as "async"). In order to get the result (or ensure all side effects have been completed), you need to "await" the promise, which is a keyword that causes the function to logically pause execution until the result is available.

However rather than stack switch, using await makes the compiler transform the function. The parameters and "stack variables" for the function are instead stored in an object on the heap, along with information about where in the function we paused. The function is transformed such that under the hood, you pass in this heap object. (Alternatively, the function becomes a method on the heap object.) When the function is called, it looks in the heap object to find which pause location we are at, and more or less does a GOTO to that point in the code.

When an await is encountered, if the value of the awaited promise is not yet available, the function will write its current position into the heap object. It will inform the scheduler that it needs to be called again when the promise is complete. It will return a promise associated with the logical invocation to the caller. If this is the first time it returns the caller will be user code. If it had previously returned, then its direct caller will be the scheduled task runner. When the function finally reaches the end (e.g. a return statement), it will update its promise, and return it (again or for the first time).
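A hand-written sketch of that transformation in Rust, whose `Future` trait plays the role of the promise-like object. `YieldOnce` is a hypothetical leaf future that is pending on its first poll, `AddLater` is roughly the shape of what the compiler generates for the `add_later` async fn next to it (the real generated type is anonymous and more involved), and `block_on` is a toy busy-polling executor, not a real scheduler.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Leaf future: "not ready" on the first poll, ready on the second,
/// standing in for IO that completes later.
struct YieldOnce {
    value: u32,
    yielded: bool,
}

impl Future for YieldOnce {
    type Output = u32;
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        if self.yielded {
            Poll::Ready(self.value)
        } else {
            self.yielded = true;
            cx.waker().wake_by_ref(); // ask the scheduler to poll us again
            Poll::Pending
        }
    }
}

/// Hand-written state machine for roughly
///     async fn add_later(a: u32, b: u32) -> u32 { YieldOnce { .. }.await + b }
/// The enum holds the live "stack variables" and which await we paused at.
enum AddLater {
    Start { a: u32, b: u32 },
    Awaiting { inner: YieldOnce, b: u32 },
    Done,
}

impl Future for AddLater {
    type Output = u32;
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        loop {
            match &mut *self {
                AddLater::Start { a, b } => {
                    let (a, b) = (*a, *b);
                    *self = AddLater::Awaiting { inner: YieldOnce { value: a, yielded: false }, b };
                }
                AddLater::Awaiting { inner, b } => {
                    let b = *b;
                    match Pin::new(inner).poll(cx) {
                        // Our position is already saved in `self`, so just return.
                        Poll::Pending => return Poll::Pending,
                        Poll::Ready(x) => {
                            *self = AddLater::Done;
                            return Poll::Ready(x + b);
                        }
                    }
                }
                AddLater::Done => panic!("AddLater polled after completion"),
            }
        }
    }
}

/// The compiler-generated equivalent, for comparison.
async fn add_later(a: u32, b: u32) -> u32 {
    YieldOnce { value: a, yielded: false }.await + b
}

/// Waker that does nothing; enough for a busy-polling toy executor.
struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls `fut` until it completes and reports how many polls it took.
fn block_on<F: Future + Unpin>(mut fut: F) -> (F::Output, usize) {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(out) = Pin::new(&mut fut).poll(&mut cx) {
            return (out, polls);
        }
    }
}
```

Both versions return 42 for `(40, 2)` after two polls. One Rust-specific detail: the state object is not necessarily on the heap; it lives wherever the future value is stored, and boxing it (for example with `Box::pin`) is what puts it there.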

The net effect of all this is that each thread has a single stack. The transformations result in the call stack being pretty much the same as it would be with a callback-based approach, if callbacks got scheduled to run on the thread pool (instead of being executed directly). But the code flow as written by the programmer looks more like normal blocking I/O code, making it much more readable than a whole bunch of callback functions.

The downside is needing to add the async and await keywords all over the codebase, and having more heap churn.

Of course, there are more downsides and upsides to each approach, and there can be variations. It is not impossible to utilize the async/await keywords with a stack switching approach for example.

There were some great articles posted here recently about async I/O in the context of modern storage hardware:



For languages that support async/await (TypeScript, C#, Python), you can write code that looks blocking but gets executed as a non-blocking state machine.

I'm extremely surprised that creating 2k threads makes mac os reboot. Sure that's a lot for one application but not a totally crazy amount.

Yeah, me too, I found that pretty shocking actually. So shocking that I basically didn't believe it, so I tried myself (in C++, but it doesn't really make a difference). With 2000 threads, the computer had no problem whatsoever, the process only took around 16 megabytes of memory and not very much CPU.

So I bumped it up to 20,000, thinking that would probably work as well: the computer immediately crashed and rebooted. I didn't even get the "kernel panic" screen; it just died (this is on a 2018 Mac mini with a 3.2 GHz Core i7). When it turned back on, this was the first line of the error message:

Panic(CPU 0, time 1921358616513346): NMIPI for unresponsive processor: TLB flush timeout, TLB state:0x0

Weird. I really thought that this wouldn't be an issue at all. And: if it was an issue, that the OS would be able to handle it and kill the process or something, not kernel panic.

Yeah, that’s pretty worrying that the OS just punts. I always thought limiting the potential damage a user space process could do was one of the main jobs of an OS.

If you have any more OSes laying around to run the test on, I’d be interested to hear how well windows and Linux handle the same thing.

Because on the face of it this seems like a serious bug in the OS. I’m only used to seeing this sort of thing with bad drivers

I just tested this on my Linux workstation.

2000 threads does nothing - everything's still responsive, and the process is shown as using 0% of the CPU.

16,000 threads uses ~30% of a core, with a ~136MB RSS. The system still handles it fine, though, and everything stays responsive.

At 20,000 the program panics when spawning threads, with the message "failed to set up alternative stack guard page" due to EWOULDBLOCK. I'm not sure exactly what limit it's hitting, though.

Sounds like it's having trouble allocating memory for the stack and stack guard. Whatever limit it's hitting though, Linux seems to be able to handle it correctly, which is to kill the process instead of a kernel panic.

Thanks, much appreciated.

That’s the sort of thing I expect - the OS starves the program, not the other way around.

It's probably about system resources: VM reservations for each stack and heap, etc. There aren't a lot of checks inside kernel thread creation code, and not a lot to do about it if anything fails. My friend Mike Rowe said it this way: it's like having an altimeter on your car, so if you go off a cliff, you know how far it is to the ground. When hard limits on system resources are exhausted, it can be very hard to write code to recover. Imagine if the kernel call failure recovery code needed to create a thread to handle the exception!

I believe this is related to the operating system being overwhelmed by waking up 2k threads every 100ms. This example is not that great, though. Depending on the OS and CPU, you should be able to run a much higher number of threads.

I'm not too surprised. mac os isn't tuned out of the box for high loads, and in some areas really can't be (there's no synflood protection, since they forked FreeBSD tcp from months before that was added, for example)

Not that 2k threads is really that high; but it's probably high enough to break something.

You’d expect it to kill the offending user space process rather than the OS though, right?

If you hit an explicit limit, I'd expect the thread spawn to fail, and most processes to panic if thread spawning failed, sure.

But if you run below the explicit limit, but above the implicit limit of whatever continues to work, it's not surprising to me that the OS just freaked out.

You could report it to Apple, but their reporting process is pretty opaque, especially if you're not a registered developer (because why would you be, if you're just using mac os because you like the desktop environment, or whatever). Who knows if they'll fix it, but it's not worth making a big deal over, because you weren't really wanting to run 2k threads on mac os anyway, because it would suck, even if it did work.
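A sketch of failing gracefully at spawn time in Rust, using `std::thread::Builder`, whose `spawn` returns an `io::Result` instead of panicking when the OS refuses to create a thread; `stack_size` also shrinks each thread's stack reservation. `spawn_sleepers` is a hypothetical helper, and it can only catch errors reported when the thread is created, not failures inside the new thread or an OS that falls over before returning an error.

```rust
use std::io;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Spawns up to `n` sleeping threads with a reduced stack size and stops at
/// the first spawn error instead of panicking. Returns how many threads were
/// started (all of them joined) plus the error, if any.
fn spawn_sleepers(n: usize, stack_bytes: usize, nap: Duration) -> (usize, Option<io::Error>) {
    let mut handles: Vec<JoinHandle<()>> = Vec::with_capacity(n);
    let mut error = None;
    for i in 0..n {
        let spawned = thread::Builder::new()
            .name(format!("sleeper-{}", i))
            .stack_size(stack_bytes) // smaller per-thread reservation than the default
            .spawn(move || thread::sleep(nap));
        match spawned {
            Ok(handle) => handles.push(handle),
            Err(e) => {
                // Resource limit hit: report it rather than panic.
                error = Some(e);
                break;
            }
        }
    }
    let started = handles.len();
    for handle in handles {
        let _ = handle.join();
    }
    (started, error)
}
```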

From the other message on the thread; it looks like too many threads is causing a watchdog timer to fail, leading to the panic.

Sure, it just seems odd that the OS lets itself get into a situation where it can't recover - even in a corner case like this.

Like, can I just start 20k threads in WebAssembly and reboot anyone who visits my site?

Obviously I'd expect the browser to guard against that... but I expected the OS to as well, so my expectations may be way off!

I mean, I would expect most OSes to today. If you run a couple thousand threads on windows 10, or Android 10, or a recent Linux or BSD, it's probably fine (ish).

But probably not on Windows 9x, maybe not even on NT4 (although, NT4 was pretty solid), and I wouldn't expect good results on Linux or a BSD from 2000 either.

But macOS has lowered my expectations. It's a mash of FreeBSD from the turn of the century, with Mach from earlier, and whatever stuff they've fiddled with since then, plus a nice UI layer. They don't regularly pull in updates from FreeBSD, and they killed their server line (which was mediocre at best anyway), so when it panics if you do something weird, it's not unexpected.

A quick stab on my linux laptop has it hitting at most 25% CPU utilisation, and consuming almost zero memory. Seems really odd that this would nuke a Mac somehow.

Author here! I will take some time to answer any questions.

Coming from Go which introduced me to the concept of lightweight threads, I really miss goroutines and Go's concurrency model with channels/select/cancellation/scheduling.

However, targeting WASM seems like a pretty huge compromise to doing this natively in Rust. Conversely, it looks like what you're trying to do wouldn't currently be possible without the WASM/WASI stack. Curious what your thoughts are about not implementing it as a Rust runtime target? Did you investigate this and why did you rule it out if so?

Note, the Go model falls short against the Erlang/Lunatic one. You can't externally kill a goroutine at arbitrary points (you need to cancel them with a cancel chan / context with manual checking). You can't prioritize a goroutine over others. You can't serialize a goroutine state and send it over the network. You can't fork a goroutine. You can't sandbox a goroutine. Etc.

Using WASM is a tradeoff, but WASM is very fast, surely faster than BEAM processes.
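A minimal sketch of that manual style, written in Rust with an `AtomicBool` standing in for Go's cancel channel or `context`: the worker can only be stopped at the points where it checks the flag, and a worker that never checks cannot be stopped at all. `spawn_worker` is a hypothetical name.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// A worker that can only be cancelled cooperatively: it does a small unit
/// of work, then checks the shared flag. Nothing can stop it between checks.
/// Returns how many units of work it completed.
fn spawn_worker(cancel: Arc<AtomicBool>) -> thread::JoinHandle<u64> {
    thread::spawn(move || {
        let mut steps = 0u64;
        while !cancel.load(Ordering::Relaxed) {
            steps += 1; // stand-in for one unit of real work
            thread::sleep(Duration::from_millis(1));
        }
        steps
    })
}
```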

> You can't externally kill a goroutine at arbitrary points

Why not? Isn't that just an artifact of Go's implementation, or is there fundamentally something in CSP that prevents this?

It's essentially cultural. A fundamental assumption of the Go language, implementation, libraries, and community is that goroutines share memory. You don't write concurrent Go code assuming that a Goroutine could arbitrarily disappear, so breaking this assumption will break things.

Lunatic is not like that:

> Each Lunatic process gets their own heap, stack and syscalls.

Similarly, Rust code that uses threads and/or async isn't going to work in Lunatic (or at least won't get the advantages of its concurrency model) without being rewritten using Lunatic's concurrency primitives. The concurrency model is more like JavaScript workers or Dart isolates, though hopefully lighter weight.

I'm guessing that the Rust language might work well because move semantics assumes that values that aren't pinned can be moved, and that's a better fit for transmitting messages between processes. But there will probably be a lot of crates that you can't use with Lunatic. If it became popular, it would be forking the community at a pretty low level. You'd have some crates that only work with Lunatic and others that use async or threads.

I don't see it as a compromise. Using Wasm allows for some really nice properties, like compiling once and running on any operating system and CPU architecture. Lunatic JIT compiles this code either way to machine code so the performance hit should not be too big.

When you say "implementing it as a Rust runtime target", I assume you mean targeting x86/64. In this case it would be impossible to provide fault tolerance. The sandboxing capability of Wasm is necessary to separate one process's memory from another's.

Go's concurrency model seems like a compromise:




Also, I can't think of what scheduling capabilities Go's concurrency has?

I'm not sure I understand your second example qualm. That's logical described-on-the-box behavior.

Because the behavior is merely described, not enforced. It's relying on everyone to obey the two design rules that: 1) only the writing side should close the channel and 2) only one side should write.

In this sense, it's as safe as mutexes: everything works as long as everyone (including any 3rd party library) does exactly what they're supposed to. When compared to what modern and robust concurrent programming looks like (e.g. Erlang), it seems like a compromise to me.
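For contrast, a sketch with Rust's std `mpsc` channels, where the closing rule is enforced by ownership rather than convention: there is no close call, the receiver sees the channel as closed once every `Sender` has been dropped, and sending after the `Receiver` is gone returns an `Err` (in Go, sending on a closed channel panics). `mpsc` still allows many senders, so only the single-consumer side is enforced. The helper names are hypothetical.

```rust
use std::sync::mpsc::channel;
use std::thread;

/// The producer "closes" the channel simply by dropping its Sender;
/// the receiving loop ends once every Sender is gone.
fn produce_and_collect(n: u32) -> Vec<u32> {
    let (tx, rx) = channel();
    let producer = thread::spawn(move || {
        for i in 0..n {
            tx.send(i).unwrap();
        }
        // `tx` is dropped when this closure returns, closing the channel.
    });
    let received: Vec<u32> = rx.iter().collect();
    producer.join().unwrap();
    received
}

/// Sending into a channel whose Receiver was dropped is a recoverable error.
fn send_after_receiver_dropped() -> bool {
    let (tx, rx) = channel::<u32>();
    drop(rx);
    tx.send(1).is_err()
}
```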

You can have millions of coroutines in flight in Rust using async/await without issues.

This app explicitly wants to use threads, for some reason.

Recently started my journey with Rust having worked with Elixir in production for about two years, but I keep an ear out on Rust and Elixir/Erlang development

My question: are you familiar with the Lumen project? https://github.com/lumen/lumen#about. Both projects appear to have some overlap. Secondly, what, if any, will be the primary differentiators here? (I've not looked at either projects in too much detail)

You're probably familiar with one of its creators, Paul Schoenfelder (Bitwalker), as he's authored a few popular Elixir libraries. The other core developer, Luke Imhoff, is, if I recall correctly, part of one of the organizations or committees working on WASM, making sure it takes players like Elixir/Erlang into account.

Lunatic seems very enticing to me. I remembered it, but not its name, when I read the intro to this blog post and my question was going to be if you’d tried it and what you thought of it but then a couple of paragraphs further into the blog post I learned that you are the author of Lunatic :P

But instead I would like to ask about the future of Lunatic. What is your vision for it? Is it a hobby project that you are doing for fun, or is it an endeavor that you intend to power big systems in production?

Furthermore, how will you fund the development of Lunatic? And where will other contributors come from? Will they be people scratching their own itches only or will you hire people to work on it with a common roadmap?

I started working on Lunatic around a year ago in my free time. Until now I just wanted to see if it was even technically possible to build such a system. Now that I have the foundation in place and you can build some "demo" apps, like a TCP server, I'm starting to explore options for funding. From this point on I think the progress is going to be much faster, especially if others find it useful enough to contribute to it.

Here's a possible funding idea: See if you can get one of the big cloud providers or CDNs to notice the project, so they can hire you to build something that goes head to head with Cloudflare Workers.

We're hiring.... https://careers.microsoft.com/us/en/job/915502/Senior-Softwa...

And definitely looking at doing something like this next year.

Maybe you can apply at https://prototypefund.de/en/

I think the Lunatic architecture can enable shared memory regions between processes, and between processes and the host, in addition to the Erlang-like shared-nothing approach. That would be an advantage over (non-NIF) Erlang for some use cases. Do you plan something like that? Can you easily map a data structure in Rust? (I think it's doable between WASM processes, but I'm not sure about between WASM and the native host.)

Another question: What about sending code between process? Like sending a fun between Erlang process.

IMHO this architecture has the potential to go beyond BEAM, good work!

Thank you! This is something I want to support and there is some WIP code for it. Currently I'm only waiting on Wasmtime (the WASM runtime used by Lunatic) to get support for shared memory.

Regarding the question about sending code, this also can be implemented. Wasm allows you to share function indexes (pointers) and dynamically call them.

> working with async Rust is not as simple as writing regular Rust code

Working with async Rust is very simple if you are not writing futures yourself. Tokio and async-std provide APIs that mirror std, except that you have to add `.await` to the end of the operation and add the `async` keyword to your function. With proc-macros, `async main` is just a matter of adding a small annotation to the top of the function. Async functions in traits are blocked due to compiler limitations, but `async-trait` makes it easy. What part of async Rust is more complicated than synchronous Rust?

> and it just doesn't provide you the same features as Elixir Processes do.

Can you explain this? How is tokio's `task` different from Elixir processes? From the tokio docs:

> Another name for this general pattern is green threads. If you are familiar with Go's goroutines, Kotlin's coroutines, or Erlang's processes, you can think of Tokio's tasks as something similar.

In general, thinking about concurrent code in terms of threads is easier than thinking in terms of async code (it's a lower-level abstraction).

> Can you explain this? How is tokio's task different from Elixir processes?

Tokio's tasks, Go's goroutines and Kotlin's coroutines are cooperatively scheduled, i.e. an infinite loop can block other tasks from running.

Erlang and Lunatic have preemptive schedulers (similar to an OS scheduler) that schedule processes fairly by giving them time slices.

The BEAM scheduler is not quite preemptive as I understand it, but in practice it gets close because instead of yield points being defined manually with “await” they are inserted automatically when you return from a function or call a function or do some number of other fundamental operations, and the pure functional nature of the language means that functions are short and consist almost solely of other function calls.

A favorite HN comment that discusses this: https://news.ycombinator.com/item?id=13503951

Go read it all but here's a relevant quote:

""" So how does Erlang ensure that processes don't hog a core forever, given that you could theoretically just write a loop that spins forever? Well, in Erlang, you can't write a loop. Instead of loops, you have tail-calls with explicit accumulators, ala Lisp. Not because they make Erlang a better language to write in. Not at all. Instead, because they allow for the operational/architectural decision of reduction-scheduling. """
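A toy model of reduction counting, assuming each call of a process's step closure counts as one reduction (BEAM instead counts function calls and other operations in compiled code): a round-robin scheduler gives every process the same budget and then moves it to the back of the run queue, so a process that never finishes cannot stop the others from completing. `Step`, `run`, `countdown` and `spinner` are hypothetical names.

```rust
use std::collections::VecDeque;

/// One reduction of a toy process; returns false when the process has finished.
type Step = Box<dyn FnMut() -> bool>;

/// Round-robin scheduler: each process runs for at most `budget` reductions,
/// then is preempted. Stops after `max_slices` time slices and returns the
/// order in which processes finished.
fn run(procs: Vec<(&'static str, Step)>, budget: usize, max_slices: usize) -> Vec<&'static str> {
    let mut queue: VecDeque<(&'static str, Step)> = procs.into_iter().collect();
    let mut finished = Vec::new();
    for _ in 0..max_slices {
        let (name, mut step) = match queue.pop_front() {
            Some(entry) => entry,
            None => break, // every process has finished
        };
        let mut done = false;
        for _ in 0..budget {
            if !step() {
                done = true;
                break;
            }
        }
        if done {
            finished.push(name);
        } else {
            queue.push_back((name, step)); // budget used up: preempt
        }
    }
    finished
}

/// A process that does `remaining` reductions of work and then exits.
fn countdown(mut remaining: u32) -> Step {
    Box::new(move || {
        if remaining == 0 {
            return false;
        }
        remaining -= 1;
        true
    })
}

/// A process that never finishes: the loop that would otherwise hog a core.
fn spinner() -> Step {
    Box::new(|| true)
}
```

With a budget of 4, `b` (3 reductions) and then `a` (10 reductions) both finish even though `spinner` is scheduled first and never ends. The preemption only happens between steps: a single step that never returned would still hog the thread, which is roughly why long-running native code is a known hazard on the BEAM.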

That is a fantastic comment. I also recommend the BEAM book to those who want to go deeper: https://blog.stenmans.org/theBeamBook/

I haven’t finished it yet but the chapters on scheduling are great.

There's also BEAM Wisdoms: http://beam-wisdoms.clau.se/en/latest/

That is how haskell's scheduler works, but I was not aware that it was the same with BEAM. Makes sense.

lol, so BEAM is to scheduling what Rust is to memory allocation?

I'm not familiar with erlang/elixir so I assumed that processes were similar to goroutines:

> In other languages they are also known as green threads and goroutines, but I will call them processes to stay close to Elixir's naming convention.

they're similar, but the developer ergonomics around processes are way better. It's difficult to mess up, coding in the BEAM feels like bowling with those rubber bumpers in the gutter lanes, especially around very difficult concepts like concurrency and failure domains.

Go makes it easy to mess up because the abstractions are superficially simple, but it's a pretty thin layer that you punch through.

> What part of async Rust is more complicated than synchronous Rust?

The Rust part, of course :) Seriously though, the compiler error messages alone make it a major pain - although I can't figure out if it's an issue of maturity (language and ecosystem), a fundamental tradeoff with Rust's borrow checker, or me just getting way ahead of myself.

I can rarely go a few days of async programming in Rust before running into some esoteric type or borrow checker issue that takes hours to solve because I borrowed some non-Send/Sync value across an await point, and the compiler decides to dump a hot mess in my lap with multi-page-long type signatures (R2D2/Diesel, looking at you).
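A minimal sketch of the usual fix for one common case of that error, assuming the culprit is a non-`Send` value such as an `Rc` that is still alive at an `.await`: end its scope before the await point so it is not part of the state the future keeps. `checkpoint` is a hypothetical stand-in for a real await point, `assert_send` mimics the `Send` bound that multi-threaded executors such as `tokio::spawn` require, and `block_on` is a toy executor.

```rust
use std::future::Future;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for any await point (e.g. a database call).
async fn checkpoint() {}

/// Produces a Send future: the Rc lives in an inner block that ends before
/// the await, so it is not held across the await.
async fn scoped() -> usize {
    let len = {
        let shared = Rc::new(vec![1, 2, 3]);
        shared.len()
    }; // `shared` is dropped here
    checkpoint().await;
    len
}

// The version that triggers the error: `shared` is still alive at the await,
// so the future holds an Rc and is not Send.
//
// async fn unscoped() -> usize {
//     let shared = Rc::new(vec![1, 2, 3]);
//     checkpoint().await;
//     shared.len()
// }

/// Compile-time check that a value is Send.
fn assert_send<T: Send>(_: &T) {}

struct NoopWaker;
impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Minimal busy-polling executor, enough to run the example.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```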

Those subpar diagnostics are a bug. The underlying reasons are that

- The async/await feature is an MVP

- async/await desugars to code you can write yourself that leverages a bunch of relatively advanced features, namely trait bounds and associated types

- It also leverages a nightly-only feature (generators)

- Async programming is inherently harder :)

- We've had less time to watch how people misuse the features and where they trip up, so that we can clean up those cases and make the errors as user-friendly as they can be

Put all of those together and the experience of writing async code in Rust is similar to the experience of writing regular Rust code maybe two or three years ago. When you encounter things like this, please file a ticket at https://github.com/rust-lang/rust/issues so that we can fix them.
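To make the "desugars to code you can write yourself" point concrete: an `async` block becomes a type implementing the `Future` trait, whose associated type `Output` is the value `.await` yields and whose `poll` method takes `Pin<&mut Self>` and a `Context` carrying a waker. Below is a hand-written future (the hypothetical `Countdown`, which reports `Pending` a few times before finishing) plus a deliberately naive driver, `drive`, that busy-polls with a no-op waker. It is a sketch only; real executors sleep until a waker fires instead of spinning:

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hand-written future: returns `Pending` `remaining` times, then
/// `Ready` with the total number of times it was polled.
struct Countdown {
    remaining: u32,
    polls: u32,
}

impl Future for Countdown {
    // What `.await` yields; `async fn f() -> u32` returns `impl Future<Output = u32>`.
    type Output = u32;

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<u32> {
        self.polls += 1;
        if self.remaining == 0 {
            return Poll::Ready(self.polls);
        }
        self.remaining -= 1;
        // Ask to be polled again; a real future would hand the waker
        // to whatever event (socket, timer) it is waiting on.
        cx.waker().wake_by_ref();
        Poll::Pending
    }
}

/// `.await` is what repeatedly calls `poll` on the inner future.
async fn doubled(n: u32) -> u32 {
    Countdown { remaining: n, polls: 0 }.await * 2
}

/// A waker that does nothing when woken.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Naive driver: polls in a loop until the future is ready.
fn drive<F: Future>(fut: F) -> F::Output {
    let mut fut = Box::pin(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}
```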

> Working with async Rust is very simple if you are not writing futures yourself.

> Tokio and async-std provide APIs that mirror std, except that you have to add `.await` to the end of the operation and add the `async` keyword to your function.

There are absolute pain-in-the-ass problems with `async` currently, in large part due to async closures not being a thing, which means it's very hard to create anything other than an `FnOnce` unless you desugar the future entirely.

Async `Fn*` traits are possible on nightly with the `unboxed_closures` and `async_closure` features.

I don't consider things enabled by nightly features to be "possible" today, only "potentially possible at some time in the future". The days of using nightly because of a single needed feature in a popular crate are (in my eyes) gone.

Yes, by "not being a thing" I meant "not being stable", I thought I'd edited before posting but apparently I only thought of doing so, sorry 'bout that.
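Until async closures are stable, a common workaround is a plain closure that returns a future, expressed with two generic parameters (`F: Fn(T) -> Fut` and `Fut: Future<Output = U>`). Because `Fut` cannot borrow from the closure, any shared state has to be cloned into each `async move` block, which is what keeps the closure `Fn` rather than `FnOnce`. A sketch with hypothetical names (`apply_twice`, `add_offset_twice`, `poll_once`); the futures here complete immediately, so a single poll is enough to run them:

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Calls an "async callback" twice. The callback is an ordinary `Fn`
/// closure that returns a fresh future each time it is called.
async fn apply_twice<F, Fut>(f: F, x: u32) -> u32
where
    F: Fn(u32) -> Fut,
    Fut: Future<Output = u32>,
{
    let once = f(x).await;
    f(once).await
}

/// Example caller: the closure clones the shared `Arc` into every
/// `async move` block, so no future borrows from the closure itself.
async fn add_offset_twice(offset: Arc<u32>, x: u32) -> u32 {
    apply_twice(
        move |n| {
            let offset = Arc::clone(&offset); // each future owns its own handle
            async move { n + *offset }
        },
        x,
    )
    .await
}

/// A waker that does nothing when woken.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Polls once; enough here because nothing above ever returns `Pending`.
fn poll_once<F: Future>(fut: F) -> Option<F::Output> {
    let fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    match fut.poll(&mut cx) {
        Poll::Ready(out) => Some(out),
        Poll::Pending => None,
    }
}
```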

And what do you do with functions from external crates that take callback functions but are not async themselves?

You are now limited to non-async functions and if the operation of the crate depends on the return values of those functions, you will need some extreme measures to make it work.

Erlang processes are not green threads.

Green threads can share memory while Erlang processes cannot, they are strictly forbidden to do it.

Also, the Erlang scheduler is highly optimized for soft real-time operation: processes never run for an infinite amount of time, they never block, and they never (at least that's the goal) bring down the entire application. The worst that can happen is that everything slows down, but it stays functional and responsive.

I don't know about Tokio.

> Erlang processes are not green threads. Green threads can share memory while Erlang processes cannot, they are strictly forbidden to do it.

So message passing is the only way to communicate between processes? I guess that makes sense with Elixir being an FP language. This was not clear in the article:

> Lunatic takes the same approach as Go, Erlang and the earlier implementation of Rust based on green threads.

> So message passing is the only way to communicate between processes?

There are escape hatches. Obviously, you can do something really heavy like open a network port or a file descriptor, but it also provides ETS, which is basically "really fast Redis for intra-VM stuff", and you can transact over shared memory if you drop down to FFI.

Basically only message passing. As another commenter pointed out, you can use FFI calls, Erlang Term Storage, and possibly some other means to communicate, but the central feature is that each process has an ID, and you send() a message to it.

Each process also has a receive() block where you essentially listen to everything that ends up in your mailbox and pattern-match on it to take action.
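For readers coming from Rust, a rough analogue of a process with a mailbox is a thread that owns its state and reads from a `std::sync::mpsc` channel, with a `match` playing the role of `receive`. This is only an approximation: the channel is typed, so "unexpected" messages must be an explicit enum variant, and messages are taken strictly in order, whereas Erlang's `receive` can selectively match messages out of order. `Msg` and `spawn_counter` are hypothetical names:

```rust
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Messages the counter process understands. `Other` stands in for
/// anything unexpected, since a Rust channel is typed.
enum Msg {
    Add(i64),
    Get(Sender<i64>), // carries a reply address, like `{get, From}` in Erlang
    Other(String),
    Stop,
}

/// Spawns a "process" that owns its state; the only way to reach that
/// state is to send it a message. Returns its address and a join handle.
fn spawn_counter() -> (Sender<Msg>, JoinHandle<i64>) {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        let mut total = 0;
        // The `receive` loop: take the next message and pattern-match on it.
        for msg in rx {
            match msg {
                Msg::Add(n) => total += n,
                Msg::Get(reply) => {
                    let _ = reply.send(total); // ignore a caller that went away
                }
                Msg::Other(tag) => eprintln!("ignoring unexpected message: {tag}"),
                Msg::Stop => break,
            }
        }
        total
    });
    (tx, handle)
}
```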

Is there a reason you used threads in the Rust example instead of tasks? I think it would have been more useful to compare against proper Rust async concurrency: I ran 200k sleep tasks over 12 threads using `async-std` and each thread only used ~30% CPU (~60% on `tokio`), and <10% with 20k tasks.

What is the proper way to call `Process::sleep` in your example? I don't see it in the lunatic crate or documentation, and I can't compare results without running the same test in lunatic.

Edit: I guess async rust is mentioned, but it doesn't really explain in detail what lunatic is solving that async rust doesn't provide, besides a vague "more simple" and "extra features," which the example doesn't really show.

So must the `crashing_c_function()` in your example be compiled to WASM before it can be used in Lunatic? Another comment elsewhere asked:

> Well then that is not comparable to NIFs at all. In fact it is an extremely misleading example...A lot of the time you can not compile the C code to wasm and what do you do then? How do you interface with OS libraries?


Yes, it must be compiled to Wasm to work.

This is a bit out of left field, and perhaps more to motivate others to try, but...

Have you considered making this a port driver for BEAM? Then you could call some function from Elixir to launch a wasm actor (that happens to be written in rust)?

Your BEAM would still be imperiled if the Lunatic? layer violates conventions, of course; but it may (or may not) be simpler than reinventing the rest of OTP?

What do you see as the advantage of running Rust compiled to WASM in Lunatic processes, versus developing in Elixir?

Using C bindings without fear would be a big one. If you use NIFs in Elixir, a segmentation fault can bring down the whole VM, while Lunatic limits the failure to the affected process. WASM was designed to run inside the browser, so its sandboxing capabilities are much stronger.

Another would be raw compute power: WASM comes close to native speed in many cases.

Other advantages over BEAM: Static typing, multiple languages with a WASM target available, live migration of process to the browser.

One question that I have is at what point do you call something a VM?

Also, I haven’t looked at your code, so maybe I'm not understanding things correctly, but with regard to targeting Wasm, could your work ever make it into the actual rust runtime?

> into the actual rust runtime?

Rust has the same level of runtime as C, with no plans to expand it. So, regardless of how awesome this project is, this is unlikely.

(Notably, there is no "Rust runtime" for async, you have to bring your own if you want to do it.)
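To illustrate "bring your own runtime": std provides only the `Future` trait and the waker plumbing, so the smallest possible runtime is a `block_on` that polls a future on the current thread and parks until woken. This sketch follows the pattern shown in the std docs for the `Wake` trait; it has no timers, I/O reactor, or task spawning, which is what Tokio and async-std add. `YieldNow` is a hypothetical future that exercises the `Pending` path:

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Waking the task simply unparks the thread that is blocked on it.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

/// Runs a future to completion on the current thread.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            // Sleep until a waker calls `unpark` (spurious wakeups just re-poll).
            Poll::Pending => thread::park(),
        }
    }
}

/// Returns `Pending` once (waking itself immediately), then `Ready(())`.
struct YieldNow(bool);

impl Future for YieldNow {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            Poll::Ready(())
        } else {
            self.0 = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    }
}
```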

Bastion [1] is also inspired by Erlang and provides many of the features described in this post. Although it does not compile to Wasm, it provides:

- Message-based communication

- Runtime fault-tolerance - a lean mesh of actor system.

- A Supervision system that makes it easy to kill or restart subprocesses

Does anyone have experience with Bastion or other Rust actor systems?

1: https://bastion.rs/
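The supervision idea from the list above (and from OTP) can be sketched in miniature with std threads alone: a panic inside a spawned thread comes back as an `Err` from `JoinHandle::join`, so a supervisor can treat it as a crash and restart the worker, roughly like OTP's `one_for_one` strategy with a restart limit. This is not Bastion's API; `supervise`, `Outcome`, and `max_restarts` are hypothetical:

```rust
use std::thread;

/// Result of supervising a worker.
#[derive(Debug, PartialEq)]
enum Outcome<T> {
    Finished { value: T, restarts: u32 },
    GaveUp { restarts: u32 },
}

/// Runs `work` in a fresh thread, restarting it after each panic, up to
/// `max_restarts` times (a crude restart-intensity limit). `work` receives
/// the attempt number so callers can simulate flaky behavior.
fn supervise<T, F>(max_restarts: u32, work: F) -> Outcome<T>
where
    T: Send + 'static,
    F: Fn(u32) -> T + Send + Clone + 'static,
{
    let mut restarts = 0;
    loop {
        let job = work.clone();
        let attempt = restarts;
        match thread::spawn(move || job(attempt)).join() {
            Ok(value) => return Outcome::Finished { value, restarts },
            Err(_panic) => {
                if restarts == max_restarts {
                    return Outcome::GaveUp { restarts };
                }
                // The crash stayed inside the worker thread: start a new one.
                restarts += 1;
            }
        }
    }
}
```

A worker that panics on its first two attempts and succeeds on the third comes back as `Finished { restarts: 2, .. }` when `max_restarts` is 3.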

So I am very fascinated by both languages and work in Elixir professionally, with a pretty minimal knowledge of Rust. I was just wondering what the motivation for creating this project was rather than just writing the application in Elixir with NIFs in Rust whenever higher performance is needed?

If a NIF fails, it takes down the whole Erlang VM, so you lose all the guarantees that make Elixir great once you resort to NIFs. There are many other gotchas, like spending too much time inside a NIF, which affects the stability of your whole system. Some of the design choices in Lunatic eliminate these problems.

But as I mentioned in the blog, if you can use Elixir/Erlang do it! It's a great language and runtime.

I haven't used it professionally, but it seems like the Rustler project mitigates some of the problems with NIFs. Is that an accurate impression?

Very interesting and good to know! I've been keeping a closer eye on Rust lately, as it's beginning to look more and more like it would be foolish to ignore. Rust and Go have really made it hard to conceptualize where Elixir fits into the dev world these days. Go just seems to be a faster Elixir in a lot of ways. I'll keep an eye on this project, and if you need any help, feel free to tag me on GitHub (same username as this one).

> Go just seems to be a faster Elixir in a lot of ways

If you haven't worked with both, you just don't know. The developer experience in Go is nowhere near the developer experience in Elixir. It's just so much easier to write scalable code in Elixir than in Go. Sure, it's not as fast, but it's not god-awful slow (speed-wise, you're doing better than Python/Django, for example), and for 90% of people shopping for what Elixir and Go offer, the network is the bottleneck.

To expand on that, the Go runtime is nowhere near the BEAM. To see why, check out this rant by a developer who has done a lot of work with Erlang [1], particularly starting at "Where it got really ugly for me".

[1]: https://vagabond.github.io/rants/2014/05/30/a-week-with-go

I've only worked in Elixir. I think I just recently experienced some Go "fomo" because it seems to be more or less attempting to solve the same problem. I can't quite justify moving to Go from Elixir + Rust, but I'm always interested in hearing the community at-large take. Thanks for the response!

I've done both (Go first, actually), and I hated Go. And all of my dev friends who are still in goland have nothing but complaints. Small sample size (n ~ 3), but still.

I'd find it interesting to know what those complaints are about.

Complaints I've heard: JSON handling is broken, there's no good way to do ORM, the HTTP library isn't great, there's no good framework for HTTP web servers, etc. Channels/goroutines are deceptively hard and make it easy to cause resource leaks.

Not the person you're responding to, but this blog post has some good, in-depth points: https://fasterthanli.me/articles/i-want-off-mr-golangs-wild-...

Thanks. After reading it, I wonder if the problems he mentions only occur in Go, or if they occur in most languages that are not Rust.

None of these problems are problems in Elixir or Zig, the two languages I'm most familiar with currently (they are handled differently, of course).

Is this the perfect Hacker News title?

They should've included the WASM part in it

Wait, why does calling a crashing C function not crash the process?

The C code is compiled to WASM bytecode, which runs inside a WASM VM. If the C function crashes, it does so inside the VM, so the crash is contained.

Well then that is not comparable to NIFs at all. In fact it is an extremely misleading example.

Why is it misleading? If BEAM could isolate crashes in NIFs, it absolutely would. Either way the point is that you are able to leverage C code (though in Rust's case the point is just to leverage existing code, since you don't need to drop into C for performance).

Because you are not actually calling a native compiled C lib you are calling wasm code. A lot of the time you can not compile the C code to wasm and what do you do then? How do you interface with OS libraries?

You may be focusing too closely on the C aspect; the point of the demonstration in the OP is to show that Lunatic can gracefully sandbox and interoperate with any code that can be compiled to WASM, of which C code is just one example. Ideally you wouldn't need to access any OS libs from within the sandbox (indeed, much of the point of the sandbox is to provide alternatives to OS facilities), and even if you did you could still access those libs in an un-sandboxed form from within your ordinary Rust code (and yes, crashing at that point would take down the process, but it's still strictly better to have an option to run C code that doesn't take down the process when crashing).

Agreed. It's not a "foreign" function if everything is in WASM. All the VM sees is WASM code; it doesn't make a difference if it was originally written in C or Rust.

"Foreignness" is an incidental property, the actual goal is interoperation. Being able to interoperate with C code from within the sandbox is both useful and something that BEAM doesn't do. In the meantime, there's nothing preventing anyone from doing regular FFI from within the part of the Rust program that lives outside of Lunatic, if for whatever reason the sandbox is insufficient.
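A toy model of why the crash stays contained: code compiled to Wasm can only address its own linear memory, and every load and store is bounds-checked, so a wild pointer that would segfault a native process instead raises a trap, which the runtime reports to the host as an error (in Lunatic's case, as described above, only the affected process dies). This is not a real Wasm runtime; `LinearMemory`, `Trap`, and this `crashing_c_function` are hypothetical stand-ins:

```rust
/// Error the "runtime" reports instead of crashing the host.
#[derive(Debug, PartialEq)]
enum Trap {
    OutOfBounds { addr: u32, len: usize },
}

/// A guest's private memory: guest "pointers" are just offsets into it.
struct LinearMemory {
    bytes: Vec<u8>,
}

impl LinearMemory {
    fn new(size: usize) -> Self {
        LinearMemory { bytes: vec![0; size] }
    }

    /// Every load is bounds-checked; there is no way to reach host memory.
    fn load(&self, addr: u32) -> Result<u8, Trap> {
        self.bytes
            .get(addr as usize)
            .copied()
            .ok_or(Trap::OutOfBounds { addr, len: self.bytes.len() })
    }

    /// Every store is bounds-checked too.
    fn store(&mut self, addr: u32, value: u8) -> Result<(), Trap> {
        let len = self.bytes.len();
        let slot = self
            .bytes
            .get_mut(addr as usize)
            .ok_or(Trap::OutOfBounds { addr, len })?;
        *slot = value;
        Ok(())
    }
}

/// Stand-in for compiled C that writes through a garbage pointer.
fn crashing_c_function(mem: &mut LinearMemory) -> Result<(), Trap> {
    mem.store(0, 42)?; // fine: inside linear memory
    mem.store(0xDEAD_BEEF, 1)?; // a segfault natively; here it traps
    Ok(())
}
```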

I don't know much about WASM, but how does this work with shared libraries? Is it even possible to call a shared library without any safety guarantees?

WASM runs in a sandbox; it's not possible to call a shared library directly the way you would in C. All syscalls are "imported" functions that a host exposes to the running WASM code, and the host can do anything around a syscall, which is likely how they manage to forbid access to the network/filesystem, for example.

The host can be the browser when running on the browser, or it can be one of the WASM runtimes (wasmtime, wasmer, Lucet)
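A toy model of the import mechanism described above: a Wasm module declares the host functions it needs, and instantiation fails if the host does not supply them, so an embedder can deny networking simply by not providing it. `Host`, `Instance`, `HostFn`, and the import names `log` and `net_connect` are hypothetical; real runtimes such as wasmtime expose their own linker APIs, which this does not reproduce:

```rust
use std::collections::HashMap;

/// A host function. Wasm imports pass plain numbers, so this does too.
type HostFn = Box<dyn Fn(&[i64]) -> Result<i64, String>>;

/// The embedder's side: the imports it is willing to provide.
struct Host {
    imports: HashMap<&'static str, HostFn>,
}

/// A linked guest. It can only call imports it declared and the host supplied.
struct Instance<'h> {
    host: &'h Host,
    declared: Vec<&'static str>,
}

impl Host {
    fn new() -> Self {
        Host { imports: HashMap::new() }
    }

    fn provide(&mut self, name: &'static str, f: HostFn) {
        self.imports.insert(name, f);
    }

    /// Linking fails up front if the guest declares an import the host lacks,
    /// mirroring how a Wasm module with unresolved imports cannot be instantiated.
    fn instantiate(&self, declared: Vec<&'static str>) -> Result<Instance<'_>, String> {
        for name in &declared {
            if !self.imports.contains_key(name) {
                return Err(format!("unresolved import `{name}`"));
            }
        }
        Ok(Instance { host: self, declared })
    }
}

impl Instance<'_> {
    /// The guest's only way to affect anything outside its sandbox.
    fn call(&self, name: &str, args: &[i64]) -> Result<i64, String> {
        if !self.declared.contains(&name) {
            return Err(format!("`{name}` is not an import of this module"));
        }
        let f = &self.host.imports[name];
        f(args)
    }
}
```

For instance, a host that registers only `log` can link a guest that declares just `log`, but refuses one that also declares `net_connect`.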

Hi, can you please explain the Wasm part? Is there any particular reason to use Wasm? I don't know Wasm, but does it use a specific threading/concurrency mechanism?

Looking at the Lunatic readme: https://github.com/lunatic-lang/lunatic#architecture

"Lunatic treats WebAssembly instances as actors. Using WebAssembly in this context allows us to have per actor sandboxing and removes some of the most common drawbacks in other actor implementations."

And it would be something more lightweight than a process/thread/green thread, I presume?

Can your runtime handle non rust wasm code?

Yes! I only provide a higher-level library for Rust, but you can use the lower-level host functions from any WASM code if you run it on Lunatic.

The submitted link no longer works for me; here's a link to the original article: https://dev.to/bkolobara/writing-rust-the-elixir-way-2lm8

To me, this looks extremely promising from a performance and developer-ergonomics point of view, and a fantastic use case for WASM and WASI. The only limiting factor seems to be the rollout of WASI networking support.

>"Even though the threads don't do anything, just running this on my MacBook forces it to reboot after a few seconds. This makes it impractical to have massive concurrency with threads."

That is a problem with your MacBook. When testing servers, I can sustainably run thousands of threads with no problems on my Windows and Linux laptops, never mind desktops and real servers. So it is quite practical. Whether it makes sense depends on what you are doing in particular. Your conclusion means zilch without context.

Someone else in the HN comments reproduced it so it's definitely an OS problem.

I did not mean it was a hardware problem. Of course it is OS-related.

Running this test on my Linux desktop takes 9.9 GB of virtual memory (although very little is actually resident) and 42% of my CPU. So, it doesn't crash my computer, but this seems like it's way too much resource usage.

Forgot to add: with 2000 threads repeatedly doing a very short calculation (added for fun) and sleeping 100 ms in a loop, CPU consumption is around 2% (fluctuating between 1% and 2.6%) and RAM use is 36 MB. So I don't know what is wrong with your system, or maybe with whatever language/library is being used.

That's reserved stack space, which can be adjusted.
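The stack-reservation point in Rust terms: every OS thread reserves its own stack (per the `std::thread` docs, currently 2 MiB by default on Tier-1 platforms for threads spawned by std), and `std::thread::Builder::stack_size` lets you request less. A sketch; `spawn_sleepers` is a hypothetical helper and the 128 KiB figure is arbitrary, while the OS may round the size up and still imposes its own thread limits:

```rust
use std::thread;
use std::time::Duration;

/// Spawns `n` threads that each sleep for `nap`, reserving `stack_bytes`
/// of stack per thread instead of the default, then waits for all of them.
/// Returns how many threads completed without panicking.
fn spawn_sleepers(n: usize, stack_bytes: usize, nap: Duration) -> std::io::Result<usize> {
    let mut handles = Vec::with_capacity(n);
    for i in 0..n {
        let handle = thread::Builder::new()
            .name(format!("sleeper-{i}"))
            .stack_size(stack_bytes) // the OS may round this up to its minimum
            // If the OS refuses a new thread, return the io::Error;
            // threads already spawned keep running detached.
            .spawn(move || thread::sleep(nap))?;
        handles.push(handle);
    }
    let mut completed = 0;
    for h in handles {
        if h.join().is_ok() {
            completed += 1;
        }
    }
    Ok(completed)
}
```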

What's the performance like? I think this would be really interesting if comparable to something like Go.

> but I will call them processes to stay close to Elixir's naming convention.

That seems like an extremely confusing mistake given that there is already a very closely related concept called a "process"?

I would consider calling them "workers" or "isolates" since unless I'm mistaken, you have basically recreated WebWorkers (AKA isolates in V8/Dart)?

Presumably this also means you can't share memory? Very neat idea anyway!
