Jay Taylor's notes

thread-pools.md

Original source (gist.github.com)
Tags: java jvm concurrency parallelism multithreading thread-pools gist.github.com
Clipped on: 2021-07-19

Thread Pools

Thread pools on the JVM should usually be divided into the following three categories:

  1. CPU-bound
  2. Blocking IO
  3. Non-blocking IO polling

Each of these categories has a different optimal configuration and usage pattern.

For CPU-bound tasks, you want a bounded thread pool which is pre-allocated and fixed to exactly the number of CPUs. The only work you will be doing on this pool will be CPU-bound computation, and so there is no sense in exceeding the number of CPUs unless you happen to have a really particular workflow that is amenable to hyperthreading (in which case you could go with double the number of CPUs). Note that the old wisdom of "number of CPUs + 1" comes from mixed-mode thread pools where CPU-bound and IO-bound tasks were merged. We won't be doing that.
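A minimal Java sketch of such a pool, assuming `Runtime.availableProcessors()` is an acceptable stand-in for "the number of CPUs" (it reports logical processors, so hyperthreads are already counted). The class and method names are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; not code from the original gist.
final class CpuPoolSketch {
    // Assumption: logical processors as seen by the JVM.
    static final int CPUS = Runtime.getRuntime().availableProcessors();

    static ThreadPoolExecutor newCpuPool() {
        // Core size == max size, so the pool never grows or shrinks.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                CPUS, CPUS, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        // Pre-allocate every thread up front instead of creating them lazily.
        pool.prestartAllCoreThreads();
        return pool;
    }

    // Pure computation only: nothing submitted here may block.
    static long sumOfSquares(int n) {
        ThreadPoolExecutor pool = newCpuPool();
        try {
            List<CompletableFuture<Long>> parts = new ArrayList<>();
            for (int i = 1; i <= n; i++) {
                final long x = i;
                parts.add(CompletableFuture.supplyAsync(() -> x * x, pool));
            }
            long total = 0;
            for (CompletableFuture<Long> part : parts) {
                total += part.join();
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(10)); // 1^2 + ... + 10^2 = 385
    }
}
```

Note that the executor's internal task queue is still unbounded here; the sketch only fixes the thread count.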

The problem with a fixed thread pool is that any blocking IO operation (well, any blocking operation at all) will eat a thread, which is an extremely finite resource. Thus, we want to avoid blocking at all costs on the CPU-bound pool. Unfortunately, this isn't always possible (e.g. when being forced to use a blocking IO library). When this is the case, you should always push your blocking operations (IO or otherwise) over to a separate thread pool. This separate thread pool should be caching and unbounded with no pre-allocated size. To be clear, this is a very dangerous type of thread pool. It isn't going to prevent you from just allocating more and more threads as the others block, which is a very dangerous state of affairs. You need to make sure that any data flow which results in running actions on this pool is externally bounded, meaning that you have semantically higher-level checks in place to ensure that only a fixed number of blocking actions may be outstanding at any point in time (this is often done with a non-blocking bounded queue).
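One way to sketch the "unbounded pool, externally bounded" arrangement in Java is shown below. It swaps in a `Semaphore` for the non-blocking bounded queue mentioned above, purely to keep the example short; the class name `BlockingGateway` and its methods are hypothetical. The point is that the limit lives in application code, which can see it and react to it, rather than inside the pool.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;

// Hypothetical illustration; not code from the original gist.
final class BlockingGateway {
    // Caching and unbounded, with no pre-allocated threads.
    private final ExecutorService blockingPool = Executors.newCachedThreadPool();
    // The external bound: at most this many blocking actions outstanding.
    private final Semaphore permits;

    BlockingGateway(int maxOutstanding) {
        this.permits = new Semaphore(maxOutstanding);
    }

    // Returns Optional.empty() when the bound is hit, without blocking the caller.
    // The caller decides what that means: shed load, retry later, scale out, etc.
    <T> Optional<CompletableFuture<T>> trySubmit(Callable<T> blockingTask) {
        if (!permits.tryAcquire()) {
            return Optional.empty();
        }
        CompletableFuture<T> result = new CompletableFuture<>();
        try {
            blockingPool.execute(() -> {
                try {
                    result.complete(blockingTask.call());
                } catch (Throwable t) {
                    result.completeExceptionally(t);
                } finally {
                    permits.release();
                }
            });
        } catch (RejectedExecutionException e) {
            // Pool already shut down: give the permit back and report the failure.
            permits.release();
            result.completeExceptionally(e);
        }
        return Optional.of(result);
    }

    void shutdown() {
        blockingPool.shutdown();
    }

    public static void main(String[] args) {
        BlockingGateway gate = new BlockingGateway(2);
        Optional<CompletableFuture<String>> r = gate.trySubmit(() -> "done");
        System.out.println(r.map(CompletableFuture::join).orElse("rejected"));
        gate.shutdown();
    }
}
```

A caller that gets an empty result knows immediately that the system is saturated and can propagate that upstream.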

The final category of useful threads (assuming you're not a Swing/SWT application) is asynchronous IO polls. These threads basically just sit there asking the kernel whether or not there is a new outstanding async IO notification, and forward that notification on to the rest of the application. You want to handle this with a very small number of fixed, pre-allocated threads. Many applications handle this task with just a single thread! These threads should be given the maximum priority, since the application latency will be bounded around their scheduling. You need to be careful though to never do any work whatsoever on this thread pool! Never ever ever. The moment you receive an async notification, you should be immediately shifting back to the CPU pool. Every nanosecond you spend on the async IO thread(s) is added latency on your application. For this reason, some applications may find slightly better performance by making their async IO pool 2 or 4 threads in size, rather than the conventional 1.
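The hand-off described above might look roughly like this in Java NIO. This is a sketch under several assumptions: a `Pipe` stands in for a real socket so the example runs without a network, the poller handles a single notification and then exits, and the message fits in one 256-byte buffer. All class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration; not code from the original gist.
final class PollerSketch {
    static CompletableFuture<String> readOnce(Pipe.SourceChannel source,
                                              ExecutorService cpuPool) throws IOException {
        source.configureBlocking(false);
        Selector selector = Selector.open();
        source.register(selector, SelectionKey.OP_READ);
        CompletableFuture<String> result = new CompletableFuture<>();

        Thread poller = new Thread(() -> {
            try {
                ByteBuffer buf = ByteBuffer.allocate(256);
                int n;
                do {
                    selector.select();             // wait for the kernel's notification
                    selector.selectedKeys().clear();
                    n = source.read(buf);          // non-blocking copy out of the channel
                } while (n == 0);
                buf.flip();
                // Shift to the CPU pool immediately; decoding is "work".
                cpuPool.execute(() ->
                        result.complete(StandardCharsets.UTF_8.decode(buf).toString()));
            } catch (Throwable t) {
                result.completeExceptionally(t);
            } finally {
                try { selector.close(); } catch (IOException ignored) { }
            }
        }, "io-poller");
        poller.setPriority(Thread.MAX_PRIORITY);   // latency is bounded by this thread
        poller.setDaemon(true);
        poller.start();
        return result;
    }

    // Writes a short message into a pipe and reads it back through the poller.
    static String demo(String message) {
        ExecutorService cpuPool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            Pipe pipe = Pipe.open();
            CompletableFuture<String> decoded = readOnce(pipe.source(), cpuPool);
            pipe.sink().write(ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8)));
            String text = decoded.join();
            pipe.sink().close();
            pipe.source().close();
            return text;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            cpuPool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo("hello"));
    }
}
```

A real poller would loop over many channels and keys; the only part meant to carry over is that the poller thread does nothing beyond receiving the notification and handing off.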

Global Thread Pools

I've seen a lot of advice floating around about not using global thread pools, such as scala.concurrent.ExecutionContext.global. This advice is rooted in the fact that global thread pools can be accessed by arbitrary code (often library code) and you cannot (easily) ensure that this code is using the thread pool appropriately. How much of a concern this is for you depends a lot on your classpath. Global thread pools are pretty darn convenient, but by the same token, it also isn't all that hard to have your own application-internal global pools. So… it doesn't hurt.

On that note, view with extreme suspicion any framework or library which either a) makes it difficult to configure the thread pool, or b) just straight-up defaults to a pool that you cannot control.

Either way, you're almost always going to have some sort of singleton object somewhere in your application which just has these three pools, pre-configured for use. If you subscribe to the "implicit ExecutionContext pattern", then you should make the CPU pool the implicit one, while the others must be explicitly selected.
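As a rough Java sketch of such a singleton (Java has no implicits, so every pool is selected explicitly here), assuming daemon threads and the thread-name prefixes shown, all of which are illustrative choices rather than anything prescribed above:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; not code from the original gist.
final class Pools {
    private Pools() { }

    // Names threads "<prefix>-<n>" so thread dumps show which pool ran what.
    private static ThreadFactory named(String prefix, int priority) {
        AtomicInteger counter = new AtomicInteger();
        return runnable -> {
            Thread t = new Thread(runnable, prefix + "-" + counter.incrementAndGet());
            t.setDaemon(true);
            t.setPriority(priority);
            return t;
        };
    }

    // 1. CPU-bound: fixed to the number of CPUs.
    static final ExecutorService CPU = Executors.newFixedThreadPool(
            Runtime.getRuntime().availableProcessors(),
            named("cpu", Thread.NORM_PRIORITY));

    // 2. Blocking IO: caching and unbounded; callers must bound it externally.
    static final ExecutorService BLOCKING =
            Executors.newCachedThreadPool(named("blocking", Thread.NORM_PRIORITY));

    // 3. Non-blocking IO polling: one maximum-priority thread.
    static final ExecutorService IO_POLL =
            Executors.newSingleThreadExecutor(named("io-poll", Thread.MAX_PRIORITY));

    // Runs a stand-in "blocking" step on BLOCKING, then continues on CPU,
    // and reports which threads did the work.
    static String demo() {
        return CompletableFuture
                .supplyAsync(() -> Thread.currentThread().getName(), BLOCKING)
                .thenApplyAsync(from -> from + " -> " + Thread.currentThread().getName(), CPU)
                .join();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```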

@djspiewak Thanks a lot for sharing. The only point which wasn't clear is "You need to make sure that any data flow which results in running actions on this pool is externally bounded...". Could you explain what is the difference between "unbounded thread pool with bounded queue" and "bounded thread pool"?

djspiewak (gist owner and author) commented on Jan 28, 2020

Thanks a lot for sharing. The only point which wasn't clear is "You need to make sure that any data flow which results in running actions on this pool is externally bounded...". Could you explain what is the difference between "unbounded thread pool with bounded queue" and "bounded thread pool"?

Bounded thread pools contain unbounded task queues that are entirely outside your control. You can't see how many outstanding tasks there are, reschedule them, cancel them, change your semantics, etc. When you start running out of scarce resources, you need to be able to propagate that information back upstream in the form of temporary connection drops, or even better, trigger autoscaling to create more resources. You want to do this at the highest possible level in your stack, because that gives you the greatest semantic control. A thread pool hitting its thread count limit cannot interact with Kubernetes to allocate a new pod.

Basically it's just about control over resource management.

aludwiko commented on May 6, 2020

All sounds very reasonable, although I didn't catch one thing. If I have an externally bounded unbounded thread pool, why do I need an unbounded thread pool in the first place? In other words, what is the difference between:

  1. externally bounded + unbounded thread pool
    and
  2. externally bounded + bounded thread pool
    ?

calvinlfer commented on May 6, 2020
edited

I believe that unbounded thread pools will guarantee no deadlocks for blocking tasks and externally bounding it will mean that you can only feed in N blocking tasks. If the task happens to somehow bypass the external guard (which would be considered as a bug), at least there is no potential of deadlock (in the case with the unbounded thread pool but that would not be the case with a bounded thread pool)

djspiewak commented on May 6, 2020
edited

You can make a bounded pool where the bound is significantly beyond the upper limit on your external bounding system, but… why? Also that means you have to maintain tuning parameters in multiple places, where the consequences for them falling out of sync are poor throughput and high memory churn (or at worst, starvation).

ashwinbhaskar commented on Jan 10

@djspiewak a noob doubt. I have been trying to understand why a separate thread pool for blocking IO is required?
Part 1 of state of loom says

The result is the proliferation of asynchronous APIs, from asynchronous NIO in the JDK, through asynchronous servlets, to the many so-called “reactive” libraries that do just that — return the thread to the pool while the task is waiting, and go to great lengths to not block threads.

Why are we blocking a thread from the blocking thread pool when it can be accomplished by not blocking any threads at all?

djspiewak commented on Jan 12

Why are we blocking a thread from the blocking thread pool when it can be accomplished by not blocking any threads at all?

It cannot be. :-)

That section of the Loom documentation is rather deceptive. The short version is that no, you cannot avoid hard-blocking threads for certain operations. Even an extremely well-written application will, in practice, have at least a handful of points where threads are blocked. This may be something like file IO (which is fundamentally blocking except on Windows or Linux kernels which support io_uring), or even something seemingly benign like constructing an InetAddress or comparing two URLs for equality (both of which do DNS lookups, which will almost certainly be hard-blocking even after Loom's implementation).

This is kind of the dirty secret here, too: Loom itself promises to remove thread blockage as a concern, which in turn removes the scarcity of threads as a resource from something we need to think about. This would be lovely! Except it won't work without a lot of footnotes and exceptions. Certain things do fundamentally block because of capabilities of the underlying operating system, and our only option is to build facilities which make it easy to do the right thing in this circumstance: shunt the blocking task out of the limited compute pool and onto its own (unlimited) resource set.

ashwinbhaskar commented on Jan 17

@djspiewak when asynchronous libraries do async IO (file/network) operations (with a call back), they aren't actually blocking any thread. Or are they?

djspiewak commented on Jan 17

they aren't actually blocking any thread

Usually not. However, there are a distressing number of libraries out there which actually cheat. The first version of the AWS SDK (well, most components thereof) was a good example of this. The underlying implementation was blocking, but they faked an asynchronous version by hiding a thread pool within their implementation. So literally the worst of both worlds. An even better example of this is the Java NIO2 file APIs. AsynchronousFileChannel, on any platform that isn't Windows, is actually backed by conventional blocking IO and a hidden thread pool that you cannot control.
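To make the "cheating" pattern concrete, here is a deliberately simplified Java sketch. It is not the code of the AWS SDK or of `AsynchronousFileChannel`; `FakeAsyncClient`, `blockingFetch`, and the 10 ms sleep are all hypothetical stand-ins. The API looks asynchronous, but every call still parks a thread, and that thread comes from a pool the caller can neither size nor bound.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration of the anti-pattern; not any real library's code.
final class FakeAsyncClient {
    // Hidden inside the library: callers cannot configure, bound, or observe it.
    private final ExecutorService hidden = Executors.newCachedThreadPool(runnable -> {
        Thread t = new Thread(runnable, "hidden-pool");
        t.setDaemon(true);
        return t;
    });

    // Stand-in for a blocking network or file call.
    private String blockingFetch(String key) {
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return key + " served by " + Thread.currentThread().getName();
    }

    // Looks non-blocking to the caller, but a hidden thread is blocked for
    // the whole duration of every outstanding call.
    CompletableFuture<String> fetchAsync(String key) {
        return CompletableFuture.supplyAsync(() -> blockingFetch(key), hidden);
    }

    public static void main(String[] args) {
        System.out.println(new FakeAsyncClient().fetchAsync("k").join());
    }
}
```

A library that instead accepted an executor from the caller would at least let the application route these calls onto its own, externally bounded, blocking pool.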

ashwinbhaskar commented on Jan 17
edited

@djspiewak oh okay, One last question: is async (from the Async type class) method in cats-effect-2 and cats-effect-3 blocking any thread? Is it even the right question to ask because in this example, we have a callback for future completion inside async block. But if the future completion itself is blocking a thread underneath then would it matter if async blocked a thread or not?

djspiewak commented on Jan 17

It's a good question to ask. :-) async does not block a thread; that's basically the whole point. However, if the underlying effect blocks the thread, then obviously async can't really save you. So in this case, if the Future is non-blocking (as it should be!) then wrapping it up with async will convert it into an IO which runs the Future and produces the result, without any blocking whatsoever. If the Future does block, then neither async nor IO will make the problem any worse, but the thread will be blocked nonetheless.

Does that mostly answer the question?

ashwinbhaskar commented on Jan 17

@djspiewak yes, that answers it:) thank you for your patience and explanation!:))
