Jay Taylor's notes

Oh yes, I’ve seen several programs that spend ~30% of their CPU cycles formattin... | Hacker News

Original source (news.ycombinator.com)
Tags: logging java cpu log4j verbosity cpu-wasted-rendering-debug-messages news.ycombinator.com
Clipped on: 2021-12-20

Oh yes, I’ve seen several programs that spend ~30% of their CPU cycles formatting strings that are immediately thrown away because the log level is not high enough. Now you can also include vulnerabilities with no extra effort!




Years ago we had a system that was performing really badly. It turns out that it spent over 50% of its time on

log.debug(xmldoc)

The debug method took a string, and Java was converting the xmldoc to a string. This was back in the 1.4/1.5 days.
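
For reference, a minimal sketch of that pitfall and of the classic guard that avoids it (the names are illustrative; isDebugEnabled() is the standard log4j/commons-logging idiom):

    // The argument is evaluated before debug() is ever entered, so the
    // document is rendered to a String even when DEBUG is disabled.
    log.debug("parsed document: " + xmldoc);

    // The classic workaround: guard the call so the conversion only
    // happens when the level is actually enabled.
    if (log.isDebugEnabled()) {
        log.debug("parsed document: " + xmldoc);
    }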


In Java? Surely one of the benefits of the JIT is to compile logger.debug() into a no-op if the log level is not high enough.

You would still be doing the job of concatenating together the error message, possibly rendering some complex objects to a string, that is then fed to the no-op. The point parent was making is that there is also a performance aspect to this, in addition to the security aspect.

> You would still be doing the job of concatenating together the error message, possibly rendering some complex objects to a string, that is then fed to the no-op.

Why would you still be doing that work? If a value goes into a no-op, then the value isn't computed. (In theory - I'm sure it doesn't always work out 100% of the time in practice.)


Unless the JVM is sure the logic is not effectful, it can't eliminate it.

There's an even simpler reason why the JIT compiler can't prune it: it's possible to dynamically change the logging level at run time.

That is actually a reason that a JIT compiler can prune where an AOT compiler can't. JIT can deoptimize when assumptions change.

One of the bigger wins with the JVM is assuming that a virtual method can be called statically if there are no derived classes. That's a huge win for every leaf class in the class tree. The assumption can change every time a class is loaded of course. The optimization is called devirtualization and it can be combined with inlining to get even bigger wins.
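
Roughly the shape of code devirtualization helps, as a sketch (the class names are illustrative): while Circle is the only loaded implementation of Shape, the JIT can bind s.area() directly and inline it; loading another implementation later invalidates that assumption and the compiled code is thrown away.

    interface Shape { double area(); }

    final class Circle implements Shape {
        private final double radius;
        Circle(double radius) { this.radius = radius; }
        public double area() { return Math.PI * radius * radius; }
    }

    class Totals {
        static double total(Shape[] shapes) {
            double sum = 0;
            // Monomorphic call site: with a single loaded implementation the
            // virtual dispatch can be replaced by a direct, inlinable call.
            for (Shape s : shapes) sum += s.area();
            return sum;
        }
    }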


Fair point... but it seems like, even so, doing that to monitor whether a single integer variable might change sounds like a lot of added complexity. Are there other use cases that would help to justify it? Reducing the impact of failing to follow logging best practices doesn't seem like an obviously sufficient cause.


The JVM knows what string concatenation does.

"hello" + myFooObject.toString() Can you eliminate that? It's impossible to say without knowing whether or not myFooObject.toString() is effectful. Maybe the JVM can make that assessment. Maybe not. I don't know, but it's not as trivial as "knowing what string concatenation does".

A straightforward way to do this is for Java to assume built in types like strings are side effect free, but custom types may not be. This would likely cover a good portion of log lines but not all, obviously.

But what if one of the arguments is a function call? Then it isn't easy to prove there are no side effects

If it's too big to inline, then yes.

That says absolutely nothing about whether the function has a side effect.

If you can inline it into the compilation unit, then you can see if it has side effects or not.

The no-op determination is made at runtime within the log method that is called. So with LOG.info(obj.toString()), that method inspects the log level setting, then decides whether to ignore the message.

Are you saying that Java does or should "optimize" in such a way that it branches to the logic within each method called during runtime, inspects which code would be executed per possibly mutable runtime values (like log level), considers possible side effects (in-method and in parameters), and then decides whether to even invoke the method at all?


Isn't the point that you don't generally stuff your log with things that are already strings, but with things that can become strings? (Asking since I know of your work with Truffle)

I assumed things like the names of resources that are already strings because you’re using them in the actual program?

A typical logger.info("user {} did the thing", user) can skip the actual string interpolation and user value stringification, if the log level is not > info. However, logger.info("user " + user + " did the thing") cannot avoid at least the execution of user.toString(), even after jit optimizations, unless the jit could prove that toString does not have side effects. But I don't believe the jvm jit tries to do that. Am I wrong?
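
In SLF4J-style terms, the two forms being compared (a sketch; the user object is illustrative):

    // Parameterized form: the placeholder is only substituted, and
    // user.toString() only called, if INFO is actually enabled.
    logger.info("user {} did the thing", user);

    // Concatenation form: user.toString() and the string building run
    // before info() is invoked, whatever the current level.
    logger.info("user " + user + " did the thing");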

If user is a string, then #toString() is a trivial method, will inline, and become a no-op.

However, if user happened to be a user-defined User object, it would be hard to guarantee that it could be inlined.

Yes

Logger levels are mutable to allow switching them at runtime, JIT can not make an assumption that the log level will stay the same.

> JIT can not make an assumption

The whole point of a JIT is that it can make assumptions. 'Assumption' is literally the term used in the JIT to track things like this.


Just because a JIT is "making assumptions", it does not mean it could or should make the assumptions needed for what you are calling for here. Generally speaking, optimizers should only be making semantically-neutral assumptions (optionally under the further assumption that the code fully conforms with any constraints the language spec. puts on program behavior, which is somewhat controversial in C.) Furthermore, a JIT can only spend a limited amount of resources in determining whether skipping the evaluation of an argument expression would be semantically neutral. (At this point, functional-programming enthusiasts are rolling their eyes...)

If logging was part of the language, one could simply rule by fiat that arguments shall not be evaluated unless logging is enabled, but log4j is just another collection of user-defined classes, and gets no special treatment.


You might want to refresh yourself on who you're talking to here on this one. I'm sure that there are certainly people more familiar with the JIT behavior of HotSpot and Graal out there, but I don't know that any of them post here on the topic.

I'm not saying that when it comes to the JVM's optimization behaviors that I'd jump without checking that there's water on Chris's assurance, but I'm not saying I wouldn't either.


It occurred to me today that, somewhat ironically, JNDI+LDAP calls appear (at least to me on first sight) to be prime examples of expressions that would be difficult for a JIT optimizer to analyze for side-effects, given that this is a mechanism for making remote calls. If I am mistaken in so thinking, I would be genuinely interested in learning about how it is done.

To be clear, I am not suggesting that these sort of expressions are inherently insecure - that would depend, I think, on whether they involve user input in a way that allows an attacker to take control of the call. A hard-coded JNDI lookup from the arguments list of a log4j call might be inefficient, but no less secure than if made anywhere else.


I am duly chastened, and ready to learn what boneheaded mistakes I am making. For one thing, I take it that the analysis is tractable for most practical cases.

You might want to google who Chris Seaton is lol.

It can only make safe assumptions.

Assuming a global mutable variable will never change is generally not a safe assumption.

I suppose, at least in principle, that something like the GraalVM AOT compiler has the option to scan all of the code to verify that it never is mutated. But HotSpot cannot, because it only JITs code as it is loaded.


Actually this is not true. Hotspot can deoptimize code. It optimizes making some assumptions. If these assumptions stop holding, code gets deoptimized, measured for a while, and reoptimized with new assumptions.

For example, an interface implemented by only 1 class might get the class inlined. If a second implementer pops up (which can happen at runtime for e.g. some dynamically generated class), all of this will get undone.

There was a series of small articles with all these things, but I can't seem to find them right now.


> Assuming a global mutable variable will never change is generally not a safe assumption.

Put your logging level behind a switch point and C2 will treat it as a constant but still let it be changed - that works today.
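
A minimal sketch of that idea, assuming java.lang.invoke.SwitchPoint and illustrative method names (a SwitchPoint is one-shot, so turning debug back off would mean binding a fresh one):

    import java.lang.invoke.MethodHandle;
    import java.lang.invoke.MethodHandles;
    import java.lang.invoke.MethodType;
    import java.lang.invoke.SwitchPoint;

    class SwitchPointLogging {
        // While this SwitchPoint is valid, debug() routes to the no-op and
        // the JIT can fold the guard away entirely in compiled callers.
        private static final SwitchPoint debugDisabled = new SwitchPoint();
        private static final MethodHandle debugHandle;

        static {
            try {
                MethodHandles.Lookup l = MethodHandles.lookup();
                MethodType t = MethodType.methodType(void.class, String.class);
                MethodHandle noop = l.findStatic(SwitchPointLogging.class, "ignore", t);
                MethodHandle emit = l.findStatic(SwitchPointLogging.class, "emit", t);
                debugHandle = debugDisabled.guardWithTest(noop, emit);
            } catch (ReflectiveOperationException e) {
                throw new ExceptionInInitializerError(e);
            }
        }

        private static void ignore(String msg) { /* debug off: do nothing */ }
        private static void emit(String msg)   { System.out.println(msg); }

        static void debug(String msg) throws Throwable {
            debugHandle.invokeExact(msg);
        }

        // Enabling debug invalidates the SwitchPoint; the JIT deoptimizes
        // compiled callers and recompiles them against the emit target.
        static void enableDebug() {
            SwitchPoint.invalidateAll(new SwitchPoint[] { debugDisabled });
        }
    }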


Obviously with your background you know this offhand, but is there a mere-mortals reference for JVM JIT behaviors and capabilities to keep in one's back pocket? I have a pretty strong understanding of a lot of the Java universe (particularly around the JMM) but modern JVM JIT behaviors are a black box to me.

I believe the problem is just that the toString methods implicitly called when concatenating strings and objects can execute side effects. And the JIT has to keep those effects if it cannot prove they are just heating up your CPU.

That struck me at first as bizarre - but is it because one can override a toString method with something that has side effects? (Or give a user-defined class a toString() with side effects, I suppose.)

Well, yes, you could simply put a System.out.println in there and it would be valid Java.
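
For example, this is perfectly legal Java (a contrived sketch):

    class User {
        @Override
        public String toString() {
            System.out.println("toString was called");  // an observable side effect
            return "User@" + Integer.toHexString(hashCode());
        }
    }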

I'm confused. Do people not override toString() on a regular basis along with .equals() and .hashCode() anymore? Common case was prettyprinting of deep object structures, and yes, technically, you could update state while doing it, but dear God, why? Hell, API method names don't actually mean squat as far as the JVM is concerned. It's just the consensus of the community to go with convention over config. If everyone does the same thing with methods of the same name, it decreases cognitive overhead.

Did I fall into an alternate dimension where 90% of Java seems to only use preexisting objects now, and people hardly ever write their own code?

(No disrespect to the asker; it's just such a surprising question to stumble across that, mentally speaking, I had to deoptimize my mental model of the world to accommodate people who may not have worked much with Java).

I get functional is all the rage, but wow. The default toString() impl is inherited from Object, and basically gives you the class name and an identity hash code.


>Logger levels are mutable to allow switching them at runtime, JIT can not make an assumption that the log level will stay the same.

Java HotSpot can make and verify that assumption. And switch code when the level changes.


So you've moved the IF statement that was checking the logging level from the program code into the JIT. Great. No saving. (And I doubt it actually does this - as it's not gaining anything)

You didn't move the IF; you literally removed it.

if (a) DoExpensiveStuff();

Becomes

When modifyA() RecompileIfWithNewAValue();

As long as you call the initial if more often than you modify (a), you’re fine (and the JVM was able to see that you called your if 10k times without modifying the value of (a) even once)


Respectfully, your post is very wrong :) JIT compilers are very fast and run once to prep the code that will run many times. If you had flat code that was always run exactly once the gains would be lesser, yes.

Sure it can, since the JIT has access to runtime information and all the code. It could reorder the steps so first the log level is checked, then that block is run.

log.debug("foo count: " + fooHandler.incrementAndReturnFooCount());

Should the JIT call incrementAndReturnFooCount if debug logs are disabled? This is a long-recognized pitfall of C preprocessor macros that look like function calls but may simply be defined away, causing unexpected behavior.
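
One way around that particular pitfall is to make the laziness explicit at the API level instead of hoping the JIT can prove anything; a sketch assuming the log4j 2 style Supplier overloads (fooHandler is from the example above):

    // The lambda is only invoked, and the counter only incremented, if DEBUG
    // is enabled; "may not run" is now an explicit part of the call's contract.
    logger.debug("foo count: {}", () -> fooHandler.incrementAndReturnFooCount());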


It should call it if it has side effects, otherwise not. Java isn't C; the runtime can be a lot smarter.

I guess I’m not up on the state of the art with respect to the JRE’s ability to determine whether arbitrary code has side effects. That’s quite a broad problem. If I write fizzbuzz but use log.debug for output, does it optimize my main() to be empty unless I run it with debug enabled?

How could it? The log level is not a constant. It is evaluated at runtime, every time. You can’t compile that out. Am I wrong?

The JIT works by making assumptions using runtime information, and by discarding compiled code when the conditions change and the assumptions are not valid anymore.

JVM is not that smart.


