Cap'n'Proto remote vuln: pointer overflow check optimized away by compiler | Hacker News

Original source (news.ycombinator.com)
Tags: protobuf protocol-buffers capnproto cve news.ycombinator.com
Clipped on: 2017-08-03

Amusingly, the "technically-correct solution" is also incorrect, albeit probably well-behaved on most widely used systems. Rather than comparing pointers, they cast to uintptr_t and compared those, but as I commented on the commit in github:

    This is not guaranteed to work. C guarantees that casting a pointer
    to a uintptr_t and back results in a pointer which compares equal
    to the original; but it does not guarantee that (uintptr_t)p + 1 ==
    (uintptr_t)((char *)p + 1), that p1 < p2 is equivalent to
    (uintptr_t)(p1) < (uintptr_t)(p2), or even that (uintptr_t)(p) ==
    (uintptr_t)(p); an evil but standard-compliant compiler could
    implement casting from pointer to uintptr_t as "stash the pointer
    in a table and return the table index" and casting back as "look
    up the index in the table".


A concrete example would be the Transputer architecture. Transputers had signed addresses (from -2^(k-1) to 2^(k-1)-1). The null pointer was in the middle of the address space (which already wasn't without problems) and an unsigned comparison would have created a wrong ordering of addresses.

The compiler for such a machine does not even have to be evil.


The unsigned comparisons in this code snippet would work fine as long as the object bounds didn't cross the negative-positive boundary (i.e., zero), which they couldn't, because the C/C++ standards require that zero is not a valid address for any object. So, I don't think there's a problem with the architecture you describe.

(That said, as mentioned elsewhere, we're talking about a debug check in a non-security-sensitive code path...)


> the C/C++ standards require that zero is not a valid address for any object. So, I don't think there's a problem with the architecture you describe.

IIRC it's a bit more complicated than that...? Isn't it more along the lines that no object may occupy the NULL/nullptr 'address' and that 0 (literal) in a pointer context must be interpreted as NULL/nullptr? IIRC it doesn't specifically say anything about the address 0x00000000 (add bits to taste).


Agreed. I don't see any comments on the commit though (assuming "the commit in github" is the first link in the article, i.e. https://github.com/sandstorm-io/capnproto/commit/52bc956459a...). Is there another commit?

Also, I'd be very interested in hearing a suggestion as to how this should be implemented. Of course you might have mentioned that in the comment I can't find, so don't repeat it if a link suffices of course.

I'd look into working explicitly with uint64_t, and stop doing pointer subtractions.

EDIT: Here's the technically correct code from the bottom of the advisory, that I failed to see: https://github.com/sandstorm-io/capnproto/commit/2ca8e41140e....


For the simplified example given in the CVE:

  word* target = segmentStart + farPointer.offset;
  if (target < segmentStart || target >= segmentEnd) {
    throwBoundsError();
  }
  doSomething(*target);
Here's how I'd do it. (Caveat emptor, not knowing all the relevant types in the example, etc.)

  size_t segmentLength = segmentEnd - segmentStart;
  if (farPointer.offset >= segmentLength) {
    throwBoundsError();
  }
  word* target = segmentStart + farPointer.offset;
  doSomething(*target);
In general you can never compare pointers unless they point into the same array or object. In fact, even creating an invalid pointer is UB; you don't even have to compare or dereference it.


There's a link at the bottom of the article to "A technically-correct solution has been implemented in the next commit".

Also, I'd be very interested in hearing a suggestion as to how this should be implemented

I don't know enough about the problem space to say for certain, but in general I'd say that "work with unsigned integers and convert them to pointers only after performing all necessary sanitization" is good advice.


Ah, thanks! Didn't see that. That's some hairy code, for sure. :|

I don't even know if these things are the same in C and C++ any more (the code in question is in C++).

/me runs back to C.


I wondered about that, so I checked the latest C++ standard. What it says about uintptr_t is basically "this should do what it does in C".


Hi Colin,

This line you've commented on is a debug assert, meant to catch a bug that used to exist in the code but doesn't anymore. This check is not required for security and could just as well have been deleted altogether. It is compiled out of opt builds.


That's not readily readable on a cellphone, unfortunately - this is:

> This is not guaranteed to work. C guarantees that casting a pointer to a uintptr_t and back results in a pointer which compares equal to the original; but it does not guarantee that (uintptr_t)p + 1 == (uintptr_t)((char * )p + 1), that p1 < p2 is equivalent to (uintptr_t)(p1) < (uintptr_t)(p2), or even that (uintptr_t)(p) == (uintptr_t)(p); an evil but standard-compliant compiler could implement casting from pointer to uintptr_t as "stash the pointer in a table and return the table index" and casting back as "look up the index in the table".


For people claiming that this is a compiler bug, it really is not.

I am no expert, but what I understood is that the C standard defines some situations (such as an overflow error) which result in "undefined behaviour". In this situation, the compiler is free to do whatever it wants. This is in fact what happens here.

This is clearly annoying to programmers. In this case, it is hard to avoid undefined behaviour even when its source (a pointer overflow) is known.

Why is this done? This way the compiler can do some aggressive optimizations which are only valid if there is no undefined behaviour (e.g. no overflow, no null pointer access...).

There are various good articles about this.

To make the situation more complex for programmers, it is possible to write programs which exhibit undefined behaviour but work fine in practice. Until the compiler tries to do a specific optimization.

This is basically why you shouldn't use C. Long-time C programmers know a list of operations which are undefined behaviour, and are usually able to point out a few of these in a program written by a newbie (ironically, these programs may run and pass tests just fine).


To be specific, only certain types of overflow are undefined behaviour. Signed integers and pointers have undefined behaviour on overflow. Unsigned integers, on the other hand, are defined to wrap around (the result is reduced modulo 2^N).
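
For illustration, here is a minimal sketch of the difference (hypothetical function names, not code from the article):

    /* Signed overflow is undefined: the compiler may assume it never
       happens, so this after-the-fact check can legally be folded to
       "return 0". */
    int signed_check(int x) {
        return x + 1 < x;      /* undefined when x == INT_MAX */
    }

    /* Unsigned arithmetic is defined to wrap modulo 2^N, so this check
       is meaningful and has to be preserved. */
    int unsigned_check(unsigned x) {
        return x + 1u < x;     /* true exactly when x == UINT_MAX */
    }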

For those interested, good places to start learning more about this are John Regehr's blog [0], or the llvm blog [1].

[0] http://blog.regehr.org/archives/213

[1] http://blog.llvm.org/2011/05/what-every-c-programmer-should-...


Technically unsigned integer wrapping isn't considered to be "overflow".


Replying to myself because it's too late to edit the comment: I'm not sure why people are voting that down. Section 6.2.5 paragraph 9 of the C99 standard states that:

A computation involving unsigned operands can never overflow, because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting type.


I'm no expert either, but please read this opinion about the problem [0]. I think he has a good point saying that undefined behavior != the compiler has the right to do whatever it wants. [0] http://article.gmane.org/gmane.os.plan9.general/76989


I think it's a major problem with our industry that so many programmers are amused by being technically correct.

Tradition notwithstanding, if a compiler deletes all my files because of the position of a comma in nested brackets, then I'm going to simply not use that compiler. I don't give a fuck if it's technically correct.

"Oh it's not a bug". Good for you. I don't care if you want to call it a feature or a bug. It's wrong. Press the issue and I'll find someone smarter.


It's true that I like to be technically correct. I don't consider this a problem itself. In fact, it can be quite important for programmers and mathematicians to be technically correct, as this bug in Cap'n Proto illustrates quite nicely.

However, this is not the only reason I'm pointing out that this is not a compiler bug. The other reason is that there are certain implications:

1. The compiler developers are likely to declare that this is not their problem.

2. The problem may exist for other compilers as well: even if a compiler complies with the standard, it may do this. Even worse, it may only show up in a new version of the compiler, or in specific situations.

I am not saying that this is a good situation, or that programmers should just be more careful. Everybody makes mistakes.

The root of the problem is at the specification. Ideally, there would be no undefined behavior. From an optimizer's point of view it is very sensible to assume that the programmer will not invoke undefined behavior and to apply optimizations based on this principle. People surely love a compiler which produces fast code, so it may not be desirable to totally eradicate undefined behavior (and end this kind of optimization).

The most practical remedy I see is to add a debug flag which crashes or somehow indicates undefined behavior. Indeed, GCC has done this. So ultimately, we seem to agree that the compilers should change to improve this situation.
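
For example, a hedged sketch of how such a flag is typically used today (recent GCC and Clang spell it -fsanitize=undefined; the file name here is made up):

    /* overflow.c -- compile with:  cc -O2 -fsanitize=undefined overflow.c
       With the sanitizer enabled, the signed overflow below is reported at
       runtime instead of being silently assumed away by the optimizer. */
    #include <limits.h>
    #include <stdio.h>

    int main(void) {
        int x = INT_MAX;
        x = x + 1;             /* signed overflow: undefined behaviour */
        printf("%d\n", x);
        return 0;
    }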


Agreed - I do not think the issue is with being technically correct, but in thinking that it always settles the matter and ends the discussion.

It would seem particularly unfortunate if a compiler were to do static analysis for undefined behavior, but only use it in optimization. From a cursory search, I found this blog post [1] explaining why LLVM does not warn about such things (at least in 2011, when it was written - attitudes may have changed, and the Clang project now has the UndefinedBehaviorSanitizer [2]). One of the arguments is that it is difficult to explain what the problem is, but in my view, that is an argument for making doing so a priority.

[1] http://blog.llvm.org/2011/05/what-every-c-programmer-should-... [2] http://releases.llvm.org/3.8.0/tools/clang/docs/UndefinedBeh...


IMHO, more work should be done on proving properties of programs. If one could make a programming language like Dafny [1] less quirky and more developer-friendly, you could have programs which are both good-performing and proven to be consistent with their specification (but again, quirks in the specification may cause trouble).


Thanks for bringing Dafny to my attention. I agree with your first sentence, which is why I think it is a big deal that it can be a great deal harder, in practice, to reason about the behavior of a C program when optimized than about the same program un-optimized. For one thing, unless you successfully reason about what the optimized code does, then when you deploy an optimized build you are not deploying the program that you analyzed or inspected.

I think there is a sort of analogy with a large black hole here [1]: you can slip across the "event horizon" into undefined behavior without noticing, but, especially if you are using an optimizing compiler, there may be no escape.

[1] Maybe not, if the black-hole firewall hypothesis is correct.


I'm totally unfamiliar with Dafny; the front page didn't do much more than specify the goals.

Any chance you could briefly compare it to Rust or another systems level language trying to remove undefined behavior?


It's more like SPARK Ada. It's a limited language designed for easy analysis by an automated prover. You can do preconditions, postconditions, and invariants in it. It was used in ExpressOS for mobile and I think IronClad Apps, too.


Are those pre and post conditions always executed? Or are they optional assertions that can be enabled/disabled?


If I'm reading the description correctly, the pre and post conditions are analyzed statically during compilation, and not added to the output code. So it's a similar idea to type checking, where the program won't even compile until the verifier is sure your code meets the conditions.


Thanks! I didn't know about this GCC feature.


I don't know why you think this has anything to do with lack of intelligence.


I'd agree with "undefined behavior != the compiler should do whatever it wants" - but I feel I must admit that the standard lets it, that many compiler authors apparently disagree with me, and that even those who are sympathetic to my point of view or agree with me may have a point when they say that reasoning about programmer intention is a difficult bag of worms - and that preserving such intention, while maintaining "critical" optimizations, may not be as easy as it sounds.

There's always -O0.


Of course, if the compiler has a certain behavior I have to deal with it. I just feel that certain optimizations based on undefined behavior are too aggressive; later this evening I will maybe write about this in more detail in another comment below.


My favourite UB-related links:

"These people simply don't understand what C programmers want": https://groups.google.com/forum/#!msg/boring-crypto/48qa1kWi...

"please don't do this, you're not producing value": http://blog.metaobject.com/2014/04/cc-osmartass.html

"Everyone is fired": http://web.archive.org/web/20160309163927/http://robertoconc... (EDIT: this one just gets better every time I read it...)

I also found a new one thanks to the comments here, which you can find elsewhere in the comments - but I'll add a link to it here anyway, for good measure:

"No sane compiler writer would ever assume it allowed the compiler to 'do anything' with your code": http://article.gmane.org/gmane.os.plan9.general/76989


OMG, that first link is golden. It starts off with an email from DJB no less, which includes this snippet:

    I should note that this plan, throwing away gcc and clang in favor of a 
    boring C compiler, isn't the only possible response to these types of 
    security holes. Here are several other responses that I've seen: 
    
       * Attack the messenger. "This code that you've written is undefined, 
         so you're not allowed to comment on compiler behavior!" The most 
         recent time I saw this, another language lawyer then jumped in to 
         argue that the code in question _wasn't_ undefined---as if this 
         side discussion had any relevance to the real issue. 
And to top it off, below Kurt Roeckx explains to someone what part of the problem is:

    The undefined behaviour of C is deliberate so that compilers can 
    make optimazations.  They assume you write code that only has 
    defined meaning and generate code for that defined meaning.  That 
    means for instance that if you add 2 signed integers they're going 
    to assume it doesn't overflow and then for instance make 
    assumptions based on that on wether other code ever going to be 
    executed or not. 
Niiice. This exact scenario is what happened and it's given as an example of problematic optimization of undefined behavior by compilers. If I felt it was pointless to really expect any change before, I definitely do now.

I'm definitely favoriting your comment so I can find it easily later.


There is a difference between "complies with the standard" and "is bug free".

There were C compilers before there was a C standard. They had bugs.


> There is a difference between "complies with the standard" and "is bug free".

You're not wrong - but the dividing line between bug and feature request is subjective, and just how much (meaningful) difference between the two there is depends on the authors of the compiler.

If you compile with -fno-strict-aliasing and the optimizer breaks your code based solely on strict aliasing violations anyways, by all means, report that as a bug.

If you use a compiler that has no -fno-strict-aliasing equivalent, by all means, switch to a better compiler when they WONTFIX your feature request.


undefined behavior = the compiler has the right to do whatever it wants

This is exactly what the C standard says. And yes, that is problematic, especially since there is no way to detect undefined behaviour. I consider this to be the main problem. If C compilers simply checked for undefined behaviour in debug builds, there would be a lot fewer problems.


What the compiler does in the face of undefined behaviour is out of scope for the standard, by definition.

The problem is that the standard was written as a minimum that programmers could rely on even the worst compilers to implement, with the intention that compiler writers would come up with improvements that would then be standardised (the same way that happens with e.g. web standards). Instead compilers regressed to doing the minimum permitted by the standard.


> undefined behavior = the compiler has the right to do whatever it wants

It would also be allowed to follow the principle of least surprise. Most instances of UB have a specific expected outcome; for signed overflow, one would probably expect the architecture-specific signed overflow handling to happen.

It is a bit of a paradox that some hard errors for which no such expected outcome exists, such as an address violation, can easily be caught by the programmer (SIGSEGV), while errors (UB) for which a specific, sensible reaction exists are not only silently tolerated, but actively exploited in the search for optimizations.


It's not actively exploited, just ignored when checking whether an optimization is valid. Putting values in loops into registers can require making assumptions about different types of pointers pointing to different things.


The issue there is that you pretty much have to use undefined behaviour in C and certain low-level applications of C++. C is therefore a broken language IMHO, but... there we have it.


That's wrong. All firmware I worked with used assembly for early initialization (startup.s) in those cases where the only alternative would be to rely on UB. I've worked with firmware that only worked with -O0 and failed with -O2, but these were always bad implementations, not due to some inherent requirement to rely on UB (most common example: failing to recognize that using volatile does not guarantee a memory barrier).


> I've worked with firmware that only worked with -O0 and failed with -O2

But as we can see in this vuln, there's functionality that works in the normal case but where important checks are elided, and compilers might decide not to do anything with a piece of undefined behaviour in one version but then exploit it in the next. The fact that a piece of code works at -O2 doesn't prove it doesn't contain undefined behaviour.


I don't think one ever has to rely on undefined behavior.

If you think you do, you can usually rewrite your code to avoid it, or rely on implementation defined behavior for specific compilers.

And if that is not low level enough, you should be using assembly.


> This is exactly what the C standard says.

No, please read the page that I linked: there are historical reasons to believe that the meaning of undefined behavior is different.


I have read it. The rant explicitly does not quote the C standard, and instead informs us of how compilers should work according to the author. Historical context does not change the content of a standard.

From the 2007 draft for the C99 standard[1]: "undefined behavior: behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements"

[1] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf


The International Standard imposes no requirements != the compiler can use flawed logic while making assumptions. By flawed logic I'm referring to the argument "this situation is undefined, so it never happens", in particular applied to the example of pointer dereferencing: when GCC sees a pointer being dereferenced it immediately assumes that it isn't null, so if the programmer later explicitly checks for the value being not null, GCC ignores it and doesn't translate that part of the program. This can introduce severe vulnerabilities only because the compiler is trying to do an optimization that I find too aggressive: it decides not to compile an entire part of a program!
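
A minimal illustration of that pattern (hypothetical code, not taken from Cap'n Proto):

    /* Because *p is dereferenced first, the compiler may assume p != NULL
       from that point on, so the later check -- and everything guarded by
       it -- can be treated as provably dead and removed. */
    int read_checked(int *p) {
        int v = *p;            /* compiler infers: p is non-null here */
        if (p == NULL) {       /* provably false under that inference */
            return -1;         /* ...so this branch may be dropped    */
        }
        return v;
    }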


C is unusual in having an explicit concept of undefined behaviour. But that doesn't mean that other languages don't have undefined behaviour... it just means that nobody has gone through the standard (if there even is a standard beyond "the language does what this implementation produces") and identified all of the gaps.


Is it that unusual? I've seen undefined behavior be specifically mentioned in both the Java and CLR standards. Rust's (admittedly squishy) "reference" also specifies a lot of behavior as undefined explicitly (along with a general "if you break mandated invariants behavior is undefined").


In Rust's case, this is limited to unsafe code, at least; safe code should not have UB. (Though, IIRC there is one case right now, but that's a bug) UB is basically "breaking the invariants safe code expects."

Overflow is an example of something that's not UB in Rust, for example. (It's a "program error" and well-defined as two's complement wrapping.)


Maybe. I really want to be writing code for a two's-complement CPU with 64-bit registers and a unified memory model, where addresses are unsigned integers and address 0 is NULL. I really, really want a portable-assembly flavour of C which would be machine target-dependent and would e.g. throw an error when compiling a pointer comparison for a machine with a nonuniform memory model, if the pointers cannot be proven to be part of the same object. The current model where parts of my code can just get deleted SILENTLY is really hard to reason about. At least add a -Werror-deleted-code (or -Werror-tautology for code that's tautological due to UB), so I can be alerted if that optimisation is done.


I think if you had a way to enable warnings about deleted code you would be flooded with warnings from macro expansions, function inlining, template instantiations, and other valid and desirable optimizations.


I don't think it's that hard to reason about.

Checking for overflow (and pointer-out-of-object-bounds, which is a special case of overflow) in C is like defusing a bomb: you can't cut the red wire and then see if it exploded, you have to check that it won't explode before you cut the wire.
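
In code, "check that it won't explode first" looks roughly like this (a sketch with made-up names):

    #include <limits.h>
    #include <stddef.h>

    /* Wrong order: compute a + b and then inspect the result -- if it
       overflowed, the undefined behaviour has already happened and the
       test may be optimized away. Right order: test the operands first. */
    int add_would_overflow(int a, int b) {
        if (b > 0 && a > INT_MAX - b) return 1;   /* would overflow upward   */
        if (b < 0 && a < INT_MIN - b) return 1;   /* would overflow downward */
        return 0;
    }

    /* Same idea for pointers: compare the offset against the object's
       length before forming the pointer, never after. */
    int offset_in_bounds(size_t object_length, size_t offset) {
        return offset < object_length;   /* no out-of-bounds pointer is ever created */
    }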


> I am no expert, but what I understood is that the C standard defines some situations (such as an overflow error) which result in "undefined behaviour". In this situation, the compiler is free to do whatever it wants.

Except the proper answer to this scenario is to refuse to compile with an appropriate error, not to eliminate your code.


You are asking for the impossible. It is not generally possible for the compiler to detect whether undefined behaviour can occur.

Take a function that returns the square of a signed integer: As long as it's called with arguments that are small enough, the behaviour is perfectly defined. But the compiler cannot necessarily know which values might get passed to it at runtime.

Now, it could in principle add runtime checks that do something defined, like abort the program, when the preconditions for defined behaviour are not met--but that often makes the generated code slow.

That is why the standard says that the compiler can do anything it wants in the case of undefined behaviour: It's just a different way of saying that it's the responsibility of the programmer to make sure that the preconditions for defined behaviour are never violated, so the compiler can generate code based on the assumption that that is the case, instead of littering the code with tons of overhead for the case that the programmer did something wrong.

Compilers don't detect undefined behaviour and then use those detected instances of UB in order to mess up your software, they simply operate on the assumption that there is no UB in your code--which means that the code they generate is correct if there is indeed no UB, but there are no guarantees as to what happens if you don't hold up your implicit promise to not invoke UB.


Any time a compiler has elided some code, it's because it has detected that it can do so because of undefined behavior; so in the context of this argument, saying that the compiler cannot determine undefined behavior is not useful, as we are only talking about the times it already has.

The obvious interpretation of the request to have the compiler error is to have the compiler error in any case it would elide statements. That may be impractical in many or most cases, but it's a different problem than what you are talking about. Presumably there would be some cases where the compiler could warn or abort (if a flag was used) if branches were elided based on undefined behavior and those same branches also included an abort of some type shortly thereafter. Or maybe some better heuristic that I'm not thinking of.

Hard is not the same as impossible, and imperfect is often better than nothing.


> Any time a compiler has elided some code, it's because it has detected that it can do so because of undefined behavior; so in the context of this argument, saying that the compiler cannot determine undefined behavior is not useful, as we are only talking about the times it already has.

No, you have it all backwards, and that's not what's happening. The compiler doesn't elide code because it is undefined. The compiler elides code because it would never be executed unless it happens to be called with arguments that would produce undefined behaviour anyway.

Take this, for example:

    void foo(int *x){
      *x=5;
      if(x==NULL)abort();
    }
The operation that is potentially UB here is the *x=5--if x happens to be NULL. But the compiler doesn't remove that assignment, it removes the check following it. And it does so because execution could never reach that check in a defined manner unless it is false: Either x is NULL, then the behaviour of the assignment is undefined (and commonly would cause a segfault, for example), or it is not NULL, then the check is superfluous.

And there is nothing necessarily wrong with that code: It could be (and such code is common) that it is actually never called with a NULL pointer, in which case the behaviour is perfectly defined.

If a compiler actually determines that your code will always exhibit undefined behaviour, then chances are it will indeed warn you, but that just isn't what usually happens, and it's not the cause of such bugs. The compiler doesn't say "this is undefined, therefore, let's screw it up", it's exactly the opposite: It says "if we assume that this code is never called with arguments that would cause undefined behaviour, what is the most efficient machine code that we could map it to?"

> The obvious interpretation of the request to have the compiler error is to have the compiler error in any case it would elide statements.

That would be plain idiotic. It is perfectly normal to have tons of dead code, erroring out in that case would just make it impossible to compile anything.

You have to consider that the compiler isn't interested in the elegant structure of your code, so it mashes it all up to figure out the most efficient machine code for the whole thing. So, you might have code that calls lots of functions on an array, say, where each function does a bounds check. Now, the compiler potentially will inline it all into one big spaghetti function. And then it will figure out that many of the bounds checks are actually redundant (again, under the assumption that the code doesn't invoke undefined behaviour), so it will remove any checks and corresponding error handling paths that it can prove to be implied by preceding checks, or it might try and combine multiple checks into one.

That does not mean that there is anything wrong with your code, or that you should manually remove all the redundant checks (which might not be redundant for other call sites, after all). The code is perfectly fine, and the compiler tries its best to remove anything that you don't actually need.

> Hard is not the same as impossible, and imperfect is often better than nothing.

True, but also beside the point. Compilers do issue warnings for lots of stuff, and more are added regularly, but much of what you are suggesting is actually not in any way even a coherent idea, and as such, is indeed not just hard, but impossible.


> No, you have it all backwards, and that's not what's happening. The compiler doesn't elide code because it is undefined. The compiler elides code because it would never be executed unless it happens to be called with arguments that would produce undefined behaviour anyway.

You're right, it's the interaction between the undefined behavior and the ability to reason about what's possible that's the problem. It's also not entirely pertinent to the overall point, which is that the compiler does know when it elides code, and optimizing out instructions is specifically what this particular issue is about.

> That would be plain idiotic. It is perfectly normal to have tons of dead code, erroring out in that case would just make it impossible to compile anything.

The compiler doesn't need to error on all instances of this, but a flag to have the compiler warn or error if it would remove a unique branch that may exit prior to returning control could trigger the desired behavior. If the compiler has enough knowledge to determine redundant code, then it has enough knowledge to know whether it is removing code that is not due to duplication.

C compilers have historically chosen to err on the side of speed, not on the side of safety, and we've backed ourselves into a corner. Any new language introduced today that said "well, there's some interesting interactions sometimes if you don't pay close attention, and the compiler/VM might remove statements you write because they are testing those same weird interactions[1] as it thinks they can't happen" would be laughed out of town.

That the optimizations that C compilers do are complex and have many stages of optimization is not a suitable counter for the criticism that those same optimizations sometimes cause non-obvious interactions with safety checks meant to test the same edge cases that those optimizations take advantage of. That's like someone saying for safety reasons you need to see at least 20 feet of road in front of you at all times per 10 MPH on the highway, and people complaining about how that's not feasible because it would force you to slow down 10-20 MPH occasionally as you went around some turns. Yes. Yes it would. Just because you can do an optimization, doesn't mean you should.

> much of what you are suggesting is actually not in any way even a coherent idea

I'm not the original commenter, but if you're referring to my suggestion of at least warning when entire unique branches of the original code are removed, that's entirely possible. That it might require reworking or even removing portions of the current optimization pipelines, or cause compilation speed to slow considerably, is irrelevant to this particular aspect, because right now we aren't having a discussion of whether it's worth it, but whether it's even possible.

1: As happened here in this case.


> The compiler doesn't need to error on all instances of this, but a flag to have the compiler warn or error if it would remove a unique branch that may exit prior to returning control could trigger the desired behavior. If the compiler has enough knowledge to determine redundant code, then it has enough knowledge to know whether it is removing code that is not due to duplication.

I wrote "redundant", not "duplicate", intentionally. A check is redundant if other code implies that it cannot possibly ever end up true. It being a duplicate check is not the only way for that to happen, and it's actually the exception. Also, no, the compiler most likely doesn't have that knowledge. A compiler doesn't work the way you seem to think.

> Any new language introduced today that said "well, there's some interesting interactions sometimes if you don't pay close attention, and the compiler/VM might remove statements you write because they are testing those same weird interactions[1] as it thinks they can't happen" would be laughed out of town.

Or, more likely, it wouldn't. These are good reasons to not write high-level software in C, but there are also good reasons why people who actually need high speed still do use C. And it's not that we like the fact that writing correct C is hard.

You seem to imply that there is no real reason for C behaving the way it does, and that those rules for undefined behaviour only exist to make the life of programmers miserable. Those things are undefined because making them defined would actually be expensive in terms of performance.

> That the optimizations that C compilers do are complex and have many stages of optimization is not a suitable counter for the criticism that those same optimizations sometimes cause non-obvious interactions with safety checks meant to test the same edge cases that those optimizations take advantage of.

The way you phrase things suggests that you might be confused about how the compiler "reasons". The compiler doesn't read your program, see what you mean, and then try to find loopholes in order to misunderstand you. The compiler reads your program, and only understands what your program means according to the formal specification of the language that you claim it is written in. In the example I gave above, you seem to think that there is a call to abort() that the compiler "removes". There isn't. According to the C spec, that call is unreachable, and as such the semantics of that statement is a noop, which is what the compiler will correctly map to machine code, somehow. Explicit dead code removal on some intermediate representation of the AST is just one way that could happen, and it's an implementation detail of the compiler.

> That's like someone saying for safety reasons you need to see at least 20 feet of road in front of you at all times per 10 MPH on the highway, and people complaining about how that's not feasible because it would force you to slow down 10-20 MPH occasionally as you went around some turns. Yes. Yes it would.

Yep, that's a perfect analogy for people complaining that, when they talk to a C compiler, they should maybe be writing C, and not a language that they themselves made up, if they expect the compiler to understand them. Even if it's sometimes difficult.

> Just because you can do an optimization, doesn't mean you should.

I agree. But for the most part, that's not what's happening. For the most part, optimizations are not intended to break your code, but rather it so happens that, in order to optimize some code, you have to rely on all code having certain correctness properties (that it should have if it is C code, according to the C spec), which then, unfortunately, happens to break some code that doesn't have those properties. Often it's somewhere between infeasible and impossible to distinguish those cases that are correct and can thus be optimized without introducing unwanted behaviour from those that are not correct and thus break as a result.

> I'm not the original commenter, but if you're referring to my suggestion of at least warning when entire unique branches of the original code are removed, that's entirely possible. That it might require reworking or even removing portions of the current optimization pipelines, or cause compilation speed to slow considerably, is irrelevant to this particular aspect, because right now we aren't having a discussion of whether it's worth it, but whether it's even possible.

First, equivalence between pieces of code is undecidable, and second, "unique branches" is not in any way a useful concept anyway.

Sure, you can try to make a compiler detect certain instances of what seems like a safety check that doesn't ever trigger. But that will either be completely ineffective (as it only detects a small minority of cases), or it will produce tons of bogus warnings (because there are tons of cases where it is perfectly sensible to have "unique branches" in your code that are provably never taken, even ones that abort the program, and it is logically impossible to distinguish those from ones that were written with the intent to catch a runtime exception that the programmer expects to actually happen at runtime).


You seem to be arguing "It's not possible to catch all cases of this". I'm arguing "It's possible to catch some cases of this, and those should be caught and fixed".

In this specific case, the compiler inferred that, since an offset was added to a pointer and pointer overflow is undefined, it couldn't possibly be that the programmer specified undefined behavior, so it must be that there was no overflow. Since there was "no overflow", the compiler decided that the conditional statement testing for overflow could never be true, and removed that branch.

In the specific example submitted to HN, the compiler has a few options:

First, it can assume the programmer is infallible and will never make a mistake, and use that to actually change the code as defined for performance reasons.

Second, it can assume the programmer is fallible, and that without additional information, it is unsafe to alter the code based on assumptions of programmer infallibility for safety reasons.

Finally, it can assume the programmer is fallible, but try to keep the information present in some manner (tag as "maybe true") so that it can be reported on later, while not using it for additional optimizations, trading performance for safety. It would be trivial to note optimizations that it could do, but will not, because it cannot reason adequately about one or more required assumptions. It would also be trivial to note, using that same data, occurrences of statements where it knows undefined behavior may be encountered if the programmer is not vigilant about the inputs (that is, it need not report everything, just what it can definitively find). That is, it's trivial because the hard work of detecting the problem cases is already being done.

Currently many C compilers default to optimizing for performance, the first option above. They could, and many think should, at a minimum default to safety instead. We know people aren't infallible, so acting like they are has no basis in reality.

That's not to say every optimization has to be thrown out. There should be a distinction between something that can be assumed because of mathematical properties and constraints the compiler enforces, compared to constraints it's assumed the programmer will correctly follow. There are clearly cases where the compiler can optimize based on knowledge it has. If you cast an unsigned char to an int, and add another unsigned char to it prior to doing any other operation, you can assume there will be no overflow. You can assume the same of short on platforms where int is twice as big as a short.
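
For instance, a tiny example of that kind of provably safe assumption (hypothetical helper):

    /* Both operands are in [0, 255], so the sum is in [0, 510], which always
       fits in an int (an int must hold at least [-32767, 32767]). The
       compiler can prove from the types alone that no signed overflow occurs. */
    int sum_bytes(unsigned char a, unsigned char b) {
        return (int)a + (int)b;
    }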

The bottom line is that compilers are assuming truthfulness of expressions that aren't necessarily true, and that require extraordinary effort from programmers to make sure they avoid, as evidenced by their continual discovery in enterprise grade libraries and applications.


> First, it can assume the programmer is infallible and will never make a mistake, and use that to actually change the code as defined for performance reasons.

You keep saying that it changed the code. It didn't. It compiled exactly what the code said. You dislike C, and that's OK, but that doesn't mean that a C compiler compiling C code into machine language that is semantically equivalent to the C code is somehow changing the program. It's not.

> That is, it's trivial because the hard work of detecting the problem cases is already being done.

No, it's not, you are still committing the same fallacy as before. The compiler doesn't "collect knowledge about undefined behaviour in the program", because that is useless knowledge for the compiler.

The compiler collects knowledge that helps it reason about the program. A big part of that is tracking the range of values variables can take on. That is information that is useful in selecting how to express certain code in machine code. Undefined behaviour plays into this because operations that are defined to have undefined behaviour have no results that need to be considered as possible values of the variable that the result is stored in. So, if the compiler sees an if(x>0&&y>0){ x+=y;x/=2; }, with x and y being ints, it can derive that x will be positive, and therefore, for example, the division can be compiled to a right-shift.
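
As a sketch of that example (hypothetical function name):

    /* Inside the branch, x and y are known positive and the addition is
       assumed not to overflow (overflow would be undefined), so x stays
       non-negative and the division by 2 can be lowered to a plain right
       shift instead of the longer signed-division sequence needed when x
       might be negative. */
    int halve_sum(int x, int y) {
        if (x > 0 && y > 0) {
            x += y;
            x /= 2;
        }
        return x;
    }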

There is no code in the compiler that goes "well, there is this pointer dereference, so let's remove the NULL-check branch". It's rather that the dereference limits the range of possible values of the pointer to non-NULL values, which a later stage then uses to determine that the branch cannot ever be taken, and thus can be eliminated. The same "knowledge" could be inferred from an assignment of a constant, for example, or from a preceding check ... the compiler tracks the value, not whether undefined behaviour could happen.

> Currently many C compilers default to optimizing for performance, the first option above. They could, and many think should, at a minimum default to safety instead. We know people aren't infallible, so acting like they are has no basis in reality.

That's completely tautological. Every programming language assumes the programmer to be infallible. Every compiler and interpreter does what the program means according to the language spec, and if the programmer fails to express what they mean in the language, the program will do the wrong thing.

Now, there is an argument to be had over what kind of semantics of a language are easier to reason about than others, and to construct languages that make reasoning about the code as easy as possible, and some of that can be applied to adding additional restraints in a C compiler on top of the language specification, to make the language as understood by the compiler easier to reason about (while still staying within what the C spec defines, so as to stay compatible with existing, correct C code).

But there are two major problems with your reasoning here:

(1) Distinguishing programming mistakes and legitimately optimizable code is far harder than you think. You are just handwaving through that part, but that's actually the hard part. You will either miss a lot of optimization opportunity, or you will catch close to none of the relevant mistakes. If you think you have the solution to tell those cases apart much better than current compilers do, please write a paper about it, compiler writers certainly will be interested.

(2) Performance is actually kind of central to C. If you don't need performance, you probably should just not be writing C in the first place. And if you actually need performance, just erring on the side of safety isn't necessarily gonna cut it. The question is not whether you could add all of the safety features of, I dunno, Python, to C. The question is what you would expect the result to look like. There is a reason why C code tends to be faster than Python, and part of that is the lack of safety.

To maybe get an idea of why compilers do assume signed overflow to be undefined behaviour, this article seems to give a good overview: https://kristerw.blogspot.de/2016/02/how-undefined-signed-ov...


> You keep saying that it changed the code. It didn't. It compiled exactly what the code said.

From the submitted article: "Thus, the compiler removes this part of the check." It did not compile what the code said, it compiled what it determined it needed to compile. That determination included an assumption about what values a variable could hold, based on whether they could overflow, which would be undefined. That's the whole point.

The code specified to test whether "target < segmentStart", and the compiler determined that could never be true and removed that check. We have in this bug report direct evidence that the compiler was too aggressive in its assumptions, as it is indeed possible. It was too aggressive for exactly the reasons I have been going over, which is to say that the condition can only never be true as long as the programmer protects the actual values being used from being large enough to cause an overflow in the prior statement.

> It's rather that the dereference limits the range of possible values of the pointer to non-NULL values, which a later stage then uses to determine that the branch cannot ever be taken, and thus can be eliminated.

The check in question (in the compiler in question) specifically uses an assumption that the programmer will prevent an overflow which would be undefined. That is the assumption of infallibility I'm referring to.

> That's completely tautological. Every programming language assumes the programmer to be infallible.

No, they don't. If they did, Java, Rust and just about every dynamic language would never do bounds checks. One of the main reasons for a type system is to force the programmer to follow rules to prevent mistakes.

> Distinguishing programming mistakes and legitimately optimizable code is far harder than you think.

At this point I'm referring to a specific type of optimization that they are doing that I think rests on shaky presumptions. Removing that optimization is not hard work. It may be hard for the community to stomach, depending on how much performance impact it has.

> Performance is actually kind of central to C. If you don't need performance, you probably should just not be writing C in the first place. And if you actually need performance, just erring on the side of safety isn't necessarily gonna cut it.

I gave a specific example for how to get the same performance if this particular type of optimization was made more conservative. Performance is important, but when it comes to performance or correctness, correctness should win. Full stop.

Very little of what I'm referring to at this point is theoretical. I'm referring to real world situations, mostly the one these comments are in response to. Your comments seem to indicate you think this situation isn't possible. Can you clarify on whether you think the bug report is wrong, or whether I'm incorrect in my assessment of what the bug report is saying, or whether I'm misinterpreting your point? At this point, I'm under the impression that much of what I'm stating is fact, so I'm not sure how to interpret statements such as "It compiled exactly what the code said." as anything but wrong, but that's not getting us anywhere.


> It did not compile what the code said,

Yes, it did--that seems to be your fundamental confusion.

What that code means is determined by the specification of the C language, and only the specification of the C language. You constantly keep implying stuff that you think, or hope, or would prefer the code means, but that is completely irrelevant for the question of what the code actually means. Just because some intuitive reading of the characters that make up the code makes you assume that it should mean a certain thing does not make it so.

That code does not mean "check for overflow", no matter how much you wish it did. And because it doesn't mean that, the compiler didn't translate it as that either.

> No, they don't. If they did, Java, Rust and just about every dynamic language would never do bounds checks. One of the main reasons for a type system is to force the programmer to follow rules to prevent mistakes.

You are completely missing the point, essentially due to the same confusion as above. I didn't say that those languages didn't have bounds checks. I said that they assume that the programmer is infallible. Every programming language specifies exactly what each syntactic construct means, and which syntactic constructs don't mean anything, and what the runtime behaviour is, and where it is undefined. That is what makes a programming language a programming language. It is the job of the programmer to translate what they mean into the syntax of the respective programming language. If the programmer makes a mistake in this translation, the program will be wrong, and it will not do what the programmer meant it to do, no matter which programming language they are using--in that sense, every programming language expects the programmer to be infallible.

The difference between programming languages is not whether they let you make mistakes (they all do, and always will), but how difficult it is (mentally) to avoid making them.

> I gave a specific example for how to get the same performance if this particular type of optimization was made more conservative. Performance is important, but when it comes to performance or correctness, correctness should win. Full stop.

That's completely beside the point. Nobody is saying we should have incorrect code (well, ok, some misguided people probably do, but they aren't really part of this discussion). The question is how we are going to achieve that, and that is ultimately a question of economics: What is the easiest/cheapest way to get the greatest amount of software into a state where its execution matches what the programmer intended? Just claiming that we should throw infinite resources at the problem doesn't actually help the problem disappear.

> Can you clarify on whether you think the bug report is wrong, or whether I'm incorrect in my assessment of what the bug report is saying, or whether I'm misinterpreting your point?

Really none of those, I think. I think the way you think about the problem is just confused, which makes it difficult to nail down why exactly your suggested solutions aren't really solutions.

> I'm under the impression that much of what I'm stating is fact, so I'm not sure how to interpret statements such as "It compiled exactly what the code said." as anything but wrong, but that's not getting us anywhere.

I hope I maybe managed to explain that above? I think that's really at the core of your confusion: You are mixing up what you intuitively think things mean and what things mean according to the appropriate formal definition in the respective context. But code in particular does not mean anything, except for what the formal specification of the respective language defines, and that can deviate arbitrarily far from your intuitive understanding.

It's a bit like false friends in natural languages: Just because you know a word from one language doesn't mean the same word cannot mean something completely different in another language, and it's just confused to use the vocabulary of one language to determine the meaning of a sentence in a different language.


> Yes, it did--that seems to be your fundamental confusion.

No, it compiled what it determined it had to, based on the C standard. There is a difference. The code, as written, specified a certain set of actions to be taken. The compiler determined that some of those directions need not be translated to machine code, and thus did not translate them, but they were specified nonetheless.

To say that the compiler did not remove any code, or directions to be carried out, when translating to machine code, is to subscribe to a torturous and unuseful definition of the terms we have been using.

Of the actions specified by the programmer in the source file, one was optimized out in the translation of that source specification to machine code. This change alters the execution path of the program when it is present, and to such a degree that without the optimization the program would halt almost immediately, but with the optimization it allows an out-of-bounds memory access.

We are not arguing whether the C standard allows this. We are arguing whether the C compilers should do this. There is a distinct difference. Stating that no code was removed has been extremely unhelpful to this conversation, regardless of whether you think it is a technically correct statement. In the generated machine code, a condition of a branch statement does not exist in the version with optimization, but does without it.

The fact that this particular optimization relied on a case where the programmer specified a statement that, depending on values not knowable to the compiler at the time of compilation, may or may not have resulted in undefined behavior makes this a poor optimization to carry out.

> Just because you know a word from one language, doesn't mean the same word cannot mean something completely different in another language, and it's just confused to use the vocabulary of one language to determine the meaning of a sentence in a different lanugage.

Perhaps you could actually address a point I've made instead of arguing over the words used. You are arguing over a technicality instead of the topic at hand.

Feel free to reply, I'll read it, but I'm done with this conversation beyond that.


> The code, as written, specified a certain set of actions to be taken.

That's a nonsensical statement. "The code, as written" doesn't have any meaning, other than perhaps what you make up in your mind, which is not a useful reference for discussion, unless you also explain what you interpret it to mean.

I understand that maybe you do not actually mean this literally, and that you maybe are just using somewhat imprecise language to get the idea across--the problem is that exactly in the details that you are not spelling out are the problems that this discussion is all about.

> To say that the compiler did not remove any code, or directions to be carried out, when translating to machine code, is to subscribe to a torturous and unuseful definition of the terms we have been using.

No, quite to the contrary. Those definitions might not be useful for day-to-day programming work, but they are exactly the definitions that you need to clearly discuss compiler behaviour, because those are the definitions that the compiler is using, and the compiler is using those definitions because they match the concepts of how you build a compiler.

> Of the actions specified by the programmer in the source file, one was optimized out in the translation of that source specification to machine code. This change alters the execution path of the program when it is present

No, there is no "change", that's just confused language. There is a difference between compilation results, but neither of those is in any way the "real" thing, while the other is "changed", they are both equally valid mappings from C to machine code, with one arguably being closer to the intention of the programmer and thus maybe more useful in this specific case.

> We are not arguing whether the C standard allows this. We are arguing whether the C compilers should do this.

The problem is that those are inextricably interlinked, because the compiler must still stay within the bounds of the standard, and still produce code with reasonably good performance.

> Stating that no code was removed has been extremely unhelpful to this conversation, regardless of whether you think it is a technically correct statement.

The point is not that it's a technically correct statement, the point is that that's not necessarily how the compiler "thinks", so it's often unhelpful in discussing compiler behaviour to talk about "removing code".

> In the generated machine code, a condition of a branch statement does not exist in the version with optimization, but does without it.

It just so happens that in this case, the compilation result without optimization was closer to the programmer's intention than with optimization. But the usefulness of this observation is severely limited because in other cases the exact opposite could be true. The programmer wrote something different than what they meant, and the compiler in some situation produced code that still matched the intention of the programmer ...

> The fact that this particular optimization relied on a case where the programmer specified a statement that, depending on values not knowable to the compiler at the time of compilation, may or may not have resulted in undefined behavior makes this a poor optimization to carry out.

Except that if a C compiler avoided all optimizations for which this is true, a lot of code would be a lot slower. You seem to only be seeing some specific cases in which the performance difference is negligible and the risk of the optimization is obvious to you, and your imprecise use of language doesn't make discussing this any easier. What you don't seem to realize is how much optimization a C compiler does that is perfectly safe but that the compiler cannot easily, if at all, distinguish from this arguably dangerous case, which is why the compiler can only choose between producing unnecessarily slow code in many cases, or using the current strategy and occasionally producing code that does something other than what the programmer had in mind.

> Perhaps you could actually address a point I've made instead of arguing over the words used. You are arguing over a technicality instead of the topic at hand.

Your point is incoherent because you are using imprecise language, which makes it difficult to address. That's why I am addressing your imprecise use of language first.


I think there's a misunderstanding here—the undefined behavior occurs at runtime, not at compile time, so rejecting it outright at compile time is not a sensible solution (it's not sensible to reject something if you don't even know that it happens). The modern purpose of undefined behavior is to enable these kinds of optimizations anyway, so I can write:

    int *x = some_call();
    ...
    *x = 5;
    ...
    if (x == NULL)
        abort();
The compiler will optimize out the abort(), which is a good optimization since obviously x can't be NULL otherwise *x = 5; would be wrong. You might say that the programmer should be responsible for removing the abort()—but it might be inlined from a different function or something like that. Rejecting the program with an error would seriously piss me off. We want our compiler to be aggressive about optimization under the assumption that if we write clean, standards-conformant code, the compiler will optimize out any bounds checks that it can prove will pass, so we can be free to put bounds checks everywhere and get good performance. That's hard, and it's not obvious that it's hard, and people routinely underestimate how hard it is to write standards-conformant C. There are a lot of alternative ideas in the pipeline but we don't automatically get to blame the compiler vendor, even though sometimes it really has been the compiler's fault.

People get rather riled up about undefined behavior. There is a good subset of people who rather like fast C code (optimizing out paths with UB can make code much faster), and a good subset of people who think C should just be safe and predictable. If you want safe and predictable, C is a tough sell.


Arguing the extremes of this, as people tend to do, is not constructive. A compiler flag that warned or errored when a branch is optimized out but that branch includes an abort or exit within a certain distance (AST nodes?) would be useful. Possibly even disallowing optimizing out that conditional branch if it can be determined to contain any sort of exit or abort before returning control flow.


> I think there's a misunderstanding here—the undefined behavior occurs at runtime, not at compile time, so rejecting it outright at compile time is not a sensible solution (it's not sensible to reject something if you don't even know that it happens).

Except the compiler clearly knows that it does happen since it elides the code because of it.

> The compiler will optimize out the abort(), which is a good optimization since obviously x can't be NULL otherwise *x = 5; would be wrong.

This isn't an optimization based on undefined behaviour, it's actually a perfectly standard dataflow analysis that even languages with fully defined behaviour would implement.


> Except the compiler clearly knows that it does happen since it elides the code because of it.

That's definitely the opposite of what's happening here. The compiler believes that it does not happen and therefore removes the code.


It's actually an interaction between those two aspects which causes the problem. The compiler makes an assumption that something can't be true because that would mean undefined behavior (and of course the programmer isn't doing something that's undefined), and then a later optimization uses that fact to determine that code testing for that exact undefined behavior could never be run, and removes it. The reasoning by itself is not problematic without its later use in optimization decisions, and the optimization decisions are not problematic without the unfounded reasoning about undefined behavior.

I think that's why people are so caught up on this. Technically, neither action is a problem, but together they obviously caused a problem. Either one in isolation doesn't look all that horrible (even if the undefined behavior reasoning is pushing it).
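
To make that interaction concrete, here is a stripped-down, hypothetical sketch of the pattern (my own example, not the actual Cap'n Proto code):

    #include <stdlib.h>

    void read_at(const char *buf, const char *buf_end, unsigned long offset) {
        const char *p = buf + offset;     /* step 1: overflowing this pointer
                                             arithmetic is UB, so the compiler
                                             may assume p >= buf              */
        if (p < buf || p >= buf_end)      /* step 2: dead-code elimination then
                                             drops the p < buf half, which was
                                             the guard against that overflow  */
            abort();
        /* ... use *p ... */
    }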


> For people claiming that this is a compiler bug, it really is not.

In fact TFA specifically points that out multiple times.


Claims don't become true by repetition.


Does anyone know how hard it would be to have a compiler flag -Wundefined that tells me when I write code with undefined behaviour? This way I can fix these issues myself and don't have to rely on the compiler inferring my intentions.


GCC has -fsanitize=undefined. I don't think it detects all kinds of undefined behavior (I haven't tried it myself). You should be able to find more information if you look for it.

Edit: mannykannot noted somewhere else that clang has the same flag.
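
For anyone who wants to try it, something along these lines works with both GCC and Clang (the program is just an illustration):

    /* ub.c -- signed integer overflow is undefined, and UBSan reports it
     * at runtime.  Compile and run with:
     *     cc -fsanitize=undefined -g ub.c -o ub && ./ub
     */
    #include <limits.h>
    #include <stdio.h>

    int main(void) {
        int x = INT_MAX;
        x += 1;                 /* ubsan: "signed integer overflow" here */
        printf("%d\n", x);
        return 0;
    }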


These are runtime sanitizers designed to aid testing and fuzzing, not meant to run in production software. They work by instrumenting the code.


This isn't the first security vulnerability caused by optimizing compilers assuming programs don't exhibit undefined behavior and it won't be the last. Some relevant literature:

https://people.csail.mit.edu/nickolai/papers/wang-undef-2012...

http://www.complang.tuwien.ac.at/kps2015/proceedings/KPS_201...


And the claim that this is not a bug in the compiler is laughable. All this is stuff that used to work. The compiler changed and now removes code. That's a compiler bug.


The standard says the compiler is allowed to make that change. The standard says the programmer writing that piece of code is wrong. So no, it's not a bug in the compiler, the compiler is complying with the standard.

You could make a case that it's a bug in the standard though :-)

mpweiher 103 days ago [flagged] [dead]

No, the compiler is interpreting the standard in a particular way. It doesn't have to do this.

I had a C compiler (paid for it) before there was a C standard. So in your opinion, this compiler could do anything at all and nothing could possibly ever be a bug.

UPDATE: A lot of mindless downvotes, but no actual substantive rebuttals. As expected.


Well, of course the author of the compiler could argue that the compiler is bug-free, and all behavior is intentional. While this would be laughable in most cases, there is no 'hard evidence' that this is not the case.

A standard is needed in some cases where there is no 'obvious' behavior.

I think this is the way that the 'Ruby standard' is defined, by a reference implementation which is assumed to be bug-free.

About the interpretation of the standard: this is probably true, but this kind of technical document is meant to allow only one interpretation, if you know what I mean. Ultimately there might be a few limitations which you don't encounter in practice (if you give your variables names of more than 4k characters, I doubt it will compile), so you might argue this is an 'interpretation of the standard'.

I would argue that in such a case, technically, the compiler doesn't comply with the standard, but for all intents and purposes, it does (and so, in practice, no one would doubt that the compiler complies with the standard).


It would be complying with the standard if it compiled the whole program into one single call to abort! And yet it didn't do that. I wonder why not.


The code in question is not ill-defined over all inputs - it has well-defined behaviour as long as the offset in question doesn't exceed the size of the object that it is being used to address.


Actually, invoking undefined behavior anywhere in the code, even unreachable code, means the entire program is invalid. Including for all inputs. The reason it doesn't simply abort is because compiler authors don't go out of their way to do extra work for no reason.


> Actually, invoking undefined behavior anywhere in the code, even unreachable code, means the entire program is invalid.

Erm ... nope. If code is unreachable, it can, by definition, not exhibit undefined behaviour.

> The reason it doesn't simply abort is because compiler authors don't go out of their way to do extra work for no reason.

Well, true, but the primary reason is that there are parts of the input domain for which the behaviour of the code is actually defined, and it's not defined to mean a call to abort(). The mere possibility to call some code at runtime with values for which the behaviour is undefined does not make the code's semantics completely undefined.


Sounds to me more like it's a bug in the language. Too much undefined behavior is necessary for a performant program.


OK, well it's a bug that you can expect to exist in compilers in the future because every mainstream C compiler is going to optimize around UB. So... who cares?


It'd be nice if there was a way that when compilers "re-write" or eliminate parts, those could be highlighted or otherwise nicely conveyed in our IDEs.

I don't know why we expect phones and apps to have modern great UX, solving all kinds of problems, but our text editors should remain essentially the same as in the 70s.


The 70s era bit of this is the thinking that compilers 'sometimes re-write parts' of your program. Really they completely break your entire program down and then build it up from the ground again. Everything would show up as having been rewritten.


Most C/C++ compilers support outputting line number information even for optimized code (DWARF for example has a specific mechanism for identifying inlined code in stack traces). That's more than enough location data to provide something useful.
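
For example, one common way to see it (exact commands vary by toolchain):

    /* example.c -- debug info can be kept even at high optimization levels:
     *     cc -O2 -g -c example.c
     * and `objdump -dS example.o` then interleaves the original source lines
     * with the optimized machine code, using exactly that location data.
     */
    int square(int x) {
        return x * x;
    }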


Seeing e.g. the original source in the disassembly window is useful. And the same info lets you do things like: https://gcc.godbolt.org/

Taking that information into any sort of "yeah, this still represents the original intent of the author" highlight/lack of highlight/summary as it sounds like azinman2 wants might be a little more difficult. (After all, if you can implement that, why not have your compiler only emit 'sane' code in the first place?)


In this specific case an IDE could conceivably highlight a check that the compiler will omit, or point at undefined behaviour directly. The general aim is probably unachievable as you suggest - if 3 functions are rewritten to one it'd be hard to point out where the change was applied.


That would be a nice research project for a PhD or two, but I think it will be very difficult to present optimizations in a way that is useful for the programmer. In this case for example the optimization concerns an inline function. The compiler might optimize it in completely different ways depending on the context. How do you present that in the editor? It would also make compilation much slower, I think, because you have to schlepp around all the information that you use for the optimizations.

There are some tools that go in that direction. For example clang can help you help the optimizer auto vectorize loops: http://llvm.org/docs/Vectorizers.html#diagnostics


but our text editors should remain essentially the same as in the 70s

Not everybody expects that:

I use Sublime Text for smaller editing tasks (although VS Code and Notepad++ are kind of ok too) and Visual Studio or, in my case, preferably NetBeans for anything bigger (although I accept that this might sound weird).

Only when I connect over ssh or work on a console do I prefer vim.

Edit: also my preferred IDE already does something similar by telling me whenever it thinks something can obviously be simplified or written in a more idiomatic way.


To both replies so far, I'd say open your mind a bit. Think about it this way... email was invented long, long ago, yet people have moved on to Facebook and Snapchat and Instagram and all these completely different experiences that are more nuanced and richer. If you squint your eyes, programming is basically the same... there are vast opportunities to change how it fundamentally works. Having feedback loops from the compiler into your source code (it can be visualized in a wide number of ways) is just one piece of low-hanging fruit that could also help with such security vulnerabilities.


Hi, compiler author here. You're not being downvoted for your POV, but because you're not grokking that the thing that you are asking for is fundamentally difficult.

Nearly all serious compilers have at least one representation which is not isomorphic for all programs. (It is homomorphic, which is required for the compiler to be correct.) As a consequence, at least some programs will be mangled beyond recognition.


Hi, creative programmer here.

Just because it's hard doesn't mean it isn't worthwhile to do, or that everything the compiler does must be preserved and shown in the IDE. I'm also not suggesting that this compiler feedback is the only IDE improvement we make. I'm observing that the entire way we develop software hasn't fundamentally changed in decades, yet every other aspect of computing has. Even something that roughly seems the same, like word processing, is now collaborative and in the cloud with real-time updates and social threads.

Light Table is one IDE that is trying to question some assumptions -- like that files are important and that everything must be single-spaced lines in a monospaced font. This is the kind of progress I'd like to see.

It does seem reasonable that there be a way, even if difficult, to highlight a bounds check that gets optimized away. If we start from the problem we're trying to solve (show eliminated code that poses security risks... such as bounds checks or erasing memory) rather than the general problem (reverse all compiler changes), then things become more tractable.


We need a -Wundefined-behavior, which can be used in combination with -Werror. It is practically impossible to keep enough of the standard in your head while working to get this right without assistance.


> We need a -Wundefined-behavior, which can be used in combination with -Werror.

The problem is that you probably won't be able to use any optimising compiler if you do that. As a result of a bunch of optimisations (mainly inlining) the compiler "uncovers" lots of UB which it can use to pare down the code; that's why inlining is one of the most important intermediate/early optimisations.

Here's an old example by mikeash:

    int ComputeStuff(int *value) {
        if(value == NULL) {
            // long and complex computation for a NULL value
            return 0;           /* placeholder result */
        } else {
            // long and complex computation using the data pointed to by value
            return *value * 2;  /* placeholder result */
        }
    }

    void DoStuff(int *value) {
        int pointedTo = *value; // value *must* be non-NULL
        // do some work with pointedTo
        int computedResult = ComputeStuff(value);
        // do some more work with whatever
    }
Here the compiler can inline ComputeStuff, at which point it can rely on UB to assert that half the inlined code is dead and can remove it. What would it even do on "-Werror -Wundefined-behavior", fault on the original DoStuff? On the inlined one? On both?

Would -Wundefined-behaviour forbid assuming a pointer is non-null ever, requiring (and outputting) a check before each use of a pointer in a given scope/lifetime?
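
For concreteness, after inlining the optimizer is entitled to treat DoStuff roughly as if it had been written like this (my sketch, reusing the placeholder bodies from the example above):

    void DoStuff(int *value) {
        int pointedTo = *value;   /* dereference first => value assumed non-NULL */
        // do some work with pointedTo
        /* ComputeStuff inlined; the value == NULL branch is dead code and gone: */
        // long and complex computation using the data pointed to by value
        int computedResult = *value * 2;
        // do some more work with whatever
    }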


> Would -Wundefined-behaviour forbid assuming a pointer is non-null ever, requiring (and outputting) a check before each use of a pointer in a given scope/lifetime

Either emit the if, or throw an error that the code is internally inconsistent. I mean, it IS inconsistent - first it dereferences the pointer, then it checks if it's NULL or not! This is pretty much obviously an error! I'd much rather add a bunch of assert(value != NULL); to my code to signal to the compiler that yes, this pointer is really not NULL within this scope, than have to deal with the current UB-hell.

Another acceptable alternative would be to rewrite the if(value == NULL) branch to call terminate() since if we get there, then the program has executed UB and cannot continue. Since UB is rare, we can even arrange things such that the branch doesn't slow down the CPU in the general case.
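
For instance, a sketch of the first option (ComputeStuff being the function from the example upthread):

    #include <assert.h>

    int ComputeStuff(int *value);    /* as defined upthread */

    void DoStuff(int *value) {
        assert(value != NULL);       /* state the contract explicitly, for the
                                        compiler and for human readers, instead
                                        of letting the dereference below imply it */
        int pointedTo = *value;
        // do some work with pointedTo
        int computedResult = ComputeStuff(value);
        // do some more work with whatever
    }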


> This is pretty much obviously an error!

No, in most cases that's the result of defensive programming and macros or inlining, where there is a perfectly useful sanity check in some inlined function, but the compiler can see that in some specific context where it was inlined, the check is actually redundant, and thus removes it.

> Since UB is rare, we can even arrange things such that the branch doesn't slow down the CPU in the general case.

It's an additional instruction, so it always has the potential to slow the code down, if only because it makes the code bigger than the cache. Also, it's usually not one such branch, but lots and lots of them throughout the program. That's why the optimizer tries to eliminate them.


For such inlining optimization it is the compiler that 'introduces' the UB in its intermediate step. That is not a fault of the input code, so no warning should be emitted?

If the compiler can know that the pointer is non-null, why would it need to rely on UB to optimize it? The problem, I guess, is the ever-present possibility of pointer aliasing and the like, which nearly allows anything to change anything else at any time... It's not easy to make things more sound on such a foundation.


If the compiler can know that the pointer is non-null, why would it need to rely on UB to optimize it?

You've got this backwards. The compiler "knows" the pointer is non-null because you dereferenced it, and dereferencing null would have been undefined behavior. That's what allows the compiler to assume that didn't happen.

There are many operations which will produce undefined behavior for some inputs (e.g., every pointer dereference, every signed integer arithmetic operation). Figuring out if it is possible for a program to encounter such inputs at runtime is equivalent to the halting problem.

This is why any attempt to craft a -Wundefined-behavior that works on non-trivial programs is doomed to failure. It may be possible to prove a program will definitely invoke undefined behavior in some useful subset of cases. Compilers already do this in some of them, even. But those cases (and many more, besides) would also be caught with ubsan and a unit test.


You're right, it is an intractable problem in general. I guess just use Rust if one cares about this..


> For such inlining optimization it is the compiler that 'introduces' the UB in its intermediate step.

Technically the UB is in the original code, it's the dereferencing of a pointer without checking it. The compiler can merely use that to assert dead code (checking for a null pointer after having already deref'd the pointer is nonsensical) and optimise it away.


I don't support this unsafe compiler behavior. There should be a check on the pointer in sight, or a manual declaration that "this is safe, trust me". Otherwise, emit a UB warning...


Then you want to use a different language. Java for example throws NullPointerException instead of having UB for null pointer accesses.


Object Pascal just calls the function with this == null (well self = nil)


ON ERROR RESUME NEXT


The presence of UB is not statically detectable. Global analysis may help, but it's probably reducible to the halting problem. (of course, you can detect some common mistakes, which is what many analysis tools do)

UB is not a property of the local code. It's a property of the code and the values you feed to it. Without explicit annotations of contracts you won't be able to statically detect this for C/C++.

At runtime UBSan exists to detect this.


This isn't about statically detecting undefined behavior in general, it's about using detected undefined behavior to make assumptions that don't play well with later stages in the optimization pipeline.

Deciding that a statement must be true because otherwise it would contain undefined behavior (which is assuming the developer won't cause undefined behavior), and then using its truthiness to elide the check the developer included to attempt to make sure they weren't causing undefined behavior, is some contorted reasoning that I'm confident isn't something compiler authors decided specifically should be included, but is instead an unfortunate interaction between two separate aspects of compilation.


I don't think it's that clear cut. It's not "an unfortunate interaction between two separate aspects of compilation", it's how it's supposed to work -- the optimizer gathers invariants about the control flow and then makes optimizations based on it. The problem here is that the intent of the code is not something the optimizer can pick up on. Regular bounds check elimination works pretty much exactly the same way.


There is a difference between whether something is an invariant based entirely on mathematical properties and rules that the compiler ensures, and one that also includes expected programmer behavior.

In this case, the optimizer made an assumption that required the programmer to have ensured that an addition could not overflow, since otherwise it would be undefined behavior. The compiler assuming that holds as true when it isn't necessarily so (and in this case was not so) is a problem. Optimizations that assume a programmer was diligent enough to follow those rules should not be attempted, period, if they result in the removal of some code (or possibly ever, just to be safe).

That's not to say those optimizations are forever lost to us. It just requires the programmer to be more careful about the actions they take, so that no assumptions need to be made. For example, if you are adding two values whose sum could overflow, and that overflow is undefined, you can cast to larger types first, or rely on compiler knowledge. Say a value was originally an unsigned char and hasn't been modified since being cast to an unsigned int: a sufficiently smart compiler could then, for the purpose of proving the addition can't overflow, still treat it as an unsigned char even though its type is now unsigned int.
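
As a hypothetical illustration of the "cast to larger types earlier on" approach:

    #include <stdint.h>

    /* Do the arithmetic in a 64-bit type, where the sum of two 32-bit values
       cannot overflow, then compare -- no assumption about UB is needed. */
    int fits_in_buffer(uint32_t base, uint32_t extra, uint32_t buffer_len) {
        uint64_t end = (uint64_t)base + (uint64_t)extra;   /* cannot overflow */
        return end <= (uint64_t)buffer_len;
    }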


I still don't think it's that clear cut, most optimizations are based on these assumptions, it's just that some assumptions are more reasonable than others. A lot of compilers do this to some extent already, avoiding using certain kinds of information for optimization even if they could.


Sure, I'm specifically talking about assumptions that rest on programmers having followed some constraint purposefully. Humans are fallible, and if the compiler can't actually prove it, it shouldn't be taking any action that might remove statements.

That's not to say programmers can't take steps to make it provable for the compiler.


Right, my point is that most of the assumptions made are like that. Some are more obvious than others.

Yes, this could help construct a "warn on assumption made" mode, but you'd likely be inundated with a lot of mundane assumptions as well. You could filter by kind of assumption, but scary-UB-triggering assumptions like the one in the post happen all the time too, and they usually depend on the runtime properties of the program, which are hard to figure out statically. Basically, this is a very nontrivial problem, and quite likely intractable to solve in a way that doesn't produce a deluge of unnecessary info.

(In the presence of annotations to help the compiler -- like the ISOC++ core guidelines -- the problem becomes significantly easier because you have local information on the runtime properties)


> Right, my point is that most of the assumptions made are like that. Some are more obvious than others.

My argument is that if an optimization requires the compiler to infer intent rather than make concrete decisions based on known facts, and applying it results in a change in the output of the program, then that optimization is irresponsible to apply. The only part of this that is hard to know is how it affects execution, as the rest is already done currently. If that's too hard to determine, the correct stance is to disallow the optimization in that instance.

It is not irresponsible to have undefined behavior that the programmer can leverage.

It is not irresponsible to optimize out instructions that do not affect the result.

It is irresponsible to allow undefined behavior, and then change the output based on how that undefined behavior is interpreted, but only if a specific optimization is applied. Optimizations should never change deterministic output. Code that is non-deterministic purely because of undefined behavior needs to be noted.

That people have gotten used to some speed improvements at the expense of consistency, but not necessarily to their knowledge, is no excuse not to fix it. In many cases the code could be changed to once again take advantage of the same optimizations which are more strict, or to avoid undefined behavior. In this case, that would mean either casting to a type with defined overflow prior to addition or using a large data type to accumulate the values and test if it's too large. Neither of those allows the optimization, because that optimization is actually wrong in this case. In the cases where the optimization would be correct, correct use of types and casting should yield the same result.
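
A sketch of the "cast to a type with defined overflow" variant (with the usual caveats about pointer-to-integer casts raised elsewhere in the thread):

    #include <stdint.h>

    /* Do the addition on an unsigned integer type, where wraparound is well
       defined, and detect the wrap explicitly instead of assuming it away. */
    int add_would_wrap(uintptr_t base, uintptr_t offset) {
        uintptr_t sum = base + offset;   /* defined: wraps modulo 2^N */
        return sum < base;               /* true iff the addition wrapped */
    }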

I would love to see a specific counter-example where this would be unworkable. I do not consider having to change code from what was previously possible undefined behavior to definitively not undefined behavior as unworkable. I don't see how C could be considered a systems language without this. I understand how this is an unpopular stance with C programmers, but without it, there's actually a bunch of non-deterministic source code in the wild that's one allowed compiler tweak away from changing how it functions, not just what instructions it uses to achieve that function.


I thought Sandstorm was supposed to be a security-focused project? But they're writing new C code and connecting it to the Internet. And surprise surprise, this happens.

Fuzzers will catch a certain proportion of this type of problem, sure. But for a ground-up project like this there's really no excuse not to use a better language.


Dynamic languages are susceptible to their own kind of security bugs. I've found lots of bugs in Node apps where they accepted some JSON from the client and then passed it directly into a Mongo query without type-checking it. Or look at Ruby's YAML security issues.

Nearly every language is vulnerable to integer overflow. C++ is one of the few languages (possibly the only popular language) where you can reasonably check for overflows at compile time, as Cap'n Proto now does: https://capnproto.org/news/2015-03-02-security-advisory-and-...

So, I don't accept the assertion that C++ is inherently a security problem.

In any case, Sandstorm's low-level container management bits pretty much had to be written in C/C++ since they interact closely with the operating system. Or if we were starting over today, Rust might now be an option, but it wasn't when we started.


> Dynamic languages are susceptible to their own kind of security bugs. I've found lots of bugs in Node apps where they accepted some JSON from the client and then passed it directly into a Mongo query without type-checking it. Or look at Ruby's YAML security issues.

I don't advocate dynamic languages. Indeed I regard this class of vulnerabilities as evidence of a lack of type safety. Still, C++'s undefined behaviour semantics promote almost all bugs into security bugs, which is not a great property to have in your language.

> C++ is one of the few languages (possibly the only popular language) where you can reasonably check for overflows at compile time, as Cap'n Proto now does: https://capnproto.org/news/2015-03-02-security-advisory-and-....

C++ using that kind of template metaprogramming technique has nowhere near the overall popularity of C++, so that's not really a popular approach either (in the sense that e.g. few tools understand it, people who can work on it are hard to hire...). And any language with a reasonably advanced type system (or macros) could do the same thing.


One doesn't need to use complex template techniques; a class like CSafeInt should cover all integer overflow bugs with run-time checks.

The problem is that almost no one is thinking about this class of bugs. Furthermore, few languages offer tools to detect and handle integer overflows; I read that Rust for example only has run-time checks in the debug version, which to me is disappointing.
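
In that spirit, a minimal sketch of what such a run-time-checked helper can look like (not the actual SafeInt/CSafeInt code; __builtin_add_overflow is a GCC/Clang extension):

    #include <stdint.h>
    #include <stdlib.h>

    /* Checked addition: abort on overflow instead of wrapping silently
       or invoking undefined behaviour. */
    static uint32_t checked_add_u32(uint32_t a, uint32_t b) {
        uint32_t out;
        if (__builtin_add_overflow(a, b, &out))
            abort();
        return out;
    }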


> I read that Rust for example only has run-time checks in the debug version, which to me is disappointing.

Only by default; you can turn them on in release builds too if you want. (And Rust 1.17 will have a stable `-C overflow-checks=y` flag which allows you to turn on the overflow checks without other debug assertions.)


> a class like CSafeInt should cover all integer overflow bugs with run-time checks

If that's all you want it's easy to do that in any language? What's so special about C++? The reason this gets talked about more in C++ is that the consequences of integer overflow in C++ are dreadful (undefined behaviour i.e. instant security bug) whereas in most languages adding two integers will evaluate to an integer.


Or Ada 2012 + SPARK 2014 wrapping the C API behind safer interfaces. One can catch most problems, including integer overflow, at compile time or with the runtime checks it inserts. The other can prove their absence automatically in more static code. Rust can prevent temporal errors at compile time, so it's on the table, too.


I would love to see capnproto rewritten in Rust. Is there an ongoing project for this?


Cap'n Proto has a high-quality Rust implementation: https://github.com/dwrensha/capnproto-rust


I linked to the Rust implementation of capnproto elsewhere in this thread.


There is also a Rust implementation, which is the one Sandstorm uses as far as I know.


Is there a source for this? I was unaware.



Thanks, this is what I was looking for (found through your link): https://github.com/sandstorm-io/collections-app

So it does seem to be the case that they are using the rust version here. That's nice.


I'm usually not in the No-C camp, but your argument (new software with focus on secure infrastructure) is actually solid.


Is there a way to make the compiler list the applied optimizations for a specific function, in e.g. the .lst output? Would be very helpful when the compiler is doing something creative and you can't figure out how to make it do what you intended.


With clang you can use the -Rpass family of flags to get information about missed and applied optimization opportunities, although it will not print absolutely everything (trivial optimizations are omitted). Other compilers have other options; ICC for example is able to generate optimization reports.
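
For instance, something along these lines reports the loop vectorizer's decisions (flag names are clang's):

    /* vec.c -- compile with:
     *     clang -O2 -Rpass=loop-vectorize -Rpass-missed=loop-vectorize -c vec.c
     * to see which loops were vectorized and which were not, and why.
     */
    void axpy(float *restrict y, const float *restrict x, float a, int n) {
        for (int i = 0; i < n; i++)
            y[i] += a * x[i];
    }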


You can ask Clang to print the compiler IR after each optimisation, but you would need to know that you were looking for something before that was useful.


You could start by not relying on undefined behavior ;)


You mean like approximately zero percent of non-trivial C or C++ programs in actual existence?


You could of course switch to Ada.


Except Ada has UB (they used to call them bounded errors, but the semantics are basically identical), and reading the musings of ADAists[0]

> I would say, having looked at hundreds of thousands of lines of Ada, that I have had a disappointingly frequent experience in finding the UNCHECKED_CONVERSION generic used in production Ada code.

> I agree that this is often a mark of poor craftsmanship and unhesitatingly discourage its use, but it is there

[0] https://www.cs.york.ac.uk/hise/safety-critical-archive/2011/...


If finding potential UB in Ada programs is as simple as grepping for unchecked_conversion, then that's a big step forward compared to C, don't you think?


1. your claim was essentially that Ada = no UB, so that's hardly relevant

2. that's not Ada's only UB, the 2005 spec has 35 or 36, which granted is an improvement over C's circa 200, but a far cry from being UB free


The claim is that it's impossible to avoid UB in real world C, but not impossible in real world $SAFERLANGUAGE, with Ada as an example. Any language that allows you to poke around in memory (a feature you need for embedded systems) will contain unsafe constructs. The question is whether you can tell by looking whether a particular piece of code is safe or not.


Compiler writers could/should start by not treating undefined behavior as a blank check.

This is so idiotic, and no, it's not a bug in Cap'n'Proto, it is very definitely a bug in the compiler (and/or the current version of the C spec. if it actually allows this).


The C standard allows the compiler to do this, so it's definitely not a bug in the compiler.


As I wrote earlier:

I had a C compiler before there was a C standard. So by your definition, this compiler could not have bugs?

As an exercise, try to express the difference between "standards-compliant" and "bug-free".


I think the main point is that if there is a standard, an implementation should comply. If there is no standard, you have to use intuition and common sense to say what's a bug.

Different people may have different expectations, so bugs become subjective.

Ideally, there is a water-tight standard. In this case every deviation from the standard is a bug and vice versa.


> if there is a standard, an implementation should comply

Yes. However, we are talking specifically about the parts where the standard is undefined. The standard most emphatically does not require a compiler to do these crazy optimizations that remove safety critical code.


So rewriting code in order to "do what I mean" makes sense, since not doing what you mean is a bug in the language itself?

Not sure that's a very productive approach. The spec is what it is, and it's of course rather deliberately done the way it is in order to make it possible for compilers to optimize and extract more performance.


Not so much "do what I mean" as "do what the underlying hardware architecture does". For example, on pretty much any machine your code is likely to run on these days, pointer arithmetic is the same as unsigned arithmetic, and so it's quite reasonable to expect C compilers to treat it that way. The spec can't mandate this because there are some obscure historic systems where pointer arithmetic is bounds-checked in hardware or has other oddities, but people not writing code for those systems shouldn't have to deal with that.


We're seeing these new security bugs because abusing portability wiggle-room as a corner to intentionally cut changes the meaning that a human is trying to convey in a logical statement.

I still assert that at the very least if logical operations are eliminated they should be verbosely enumerated in the output of the compiler.

It might also be nice if there were a way of marking a section as /security/ rather than /speed/ critical, and thus transformations on the highlighted section would be far more limited.


> "do what I mean"

Well, the code is right there, there is no magic "do what I mean" needed. Just don't magically remove code that I wrote.

> spec is [..] rather deliberately done the way it is in order to make it possible for compilers to optimize and extract more performance

Yes, and it is wrong. Breaking existing programs with magic that simply removes code that is clearly there in order to eke out a bit of performance is wrong.


> Just don't magically remove code that I wrote.

Compile with -O0.

> Yes, and it is wrong. Breaking existing programs with magic that simply removes code that is clearly there in order to eke out a bit of performance is wrong.

Well welcome to writing C? It's been this way for a long time and that won't change - UB is critical to performance and doesn't always lead to unsafety. If you're programming in C, don't blame the compiler when your UB leads to remote code exec, you should have known what you were getting into.


> Well welcome to writing C?

I've been writing C for 30 years.

> UB is critical to performance

Not really, no. See

"What every compiler writer should know about programmers or “Optimization” based on undefined behaviour hurts performance"

(referenced elsewhere in this thread)

http://www.complang.tuwien.ac.at/kps2015/proceedings/KPS_201...


C optimizers have and will optimize around UB. Feel free to call it a bug in C, the compiler, or the programmer, or the code - when you write C, when you use a common compiler, you're asking for vulnerabilities.

Thanks for the paper. It's very interesting but I think it has no real impact on my initial statement - these optimizations aren't going away, you can continue to expect problems like this vuln in the future for the same reasons.


Having something wrong be or become reality doesn't change it being wrong.


Ok... so what? I don't care if you blame the compiler or the spec, and no one else should either - the end result is the same, vulnerable code.


> Well, the code is right there, there is no magic "do what I mean" needed. Just don't magically remove code that I wrote.

You can already do that: use a non-optimising compiler (tcc and the like) or compile without optimisations.


C/C++ coding standards I've seen always warned against using unsigned to mean "cannot be negative", because of precisely this kind of bug.

The standards I'm recalling were in-house standards, but a quick Google shows that Google have published their standard and they warn against unsigned too: https://google.github.io/styleguide/cppguide.html#Integer_Ty...

Bottom line: avoid unsigned for numbers. Use unsigned only for bags of bits.
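
The classic illustration of the pitfall that guideline is aimed at (deliberately broken code, shown only to make the failure mode concrete):

    #include <stddef.h>

    /* Counting down with an unsigned index: i >= 0 is always true, so the
       loop never terminates (i wraps around to SIZE_MAX instead of -1). */
    void zero_backwards(char *buf, size_t n) {
        for (size_t i = n - 1; i >= 0; i--)
            buf[i] = 0;
    }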


I think the situation is a lot more nuanced than that. The issue was that addition overflow is undefined behavior for both signed ints and pointers (but it is defined for unsigned ints).

The fix commit changes it to use an unsigned integer type instead, so it seems like a concrete example for the exact opposite of your suggestion.
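
To spell out the asymmetry with a standalone illustration (not code from the fix):

    #include <limits.h>

    unsigned int u_next(unsigned int x) {
        return x + 1u;      /* defined: wraps to 0 when x == UINT_MAX */
    }

    int s_next(int x) {
        return x + 1;       /* undefined behaviour when x == INT_MAX */
    }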


There is no undefined behavior here? Unsigned overflow wraps around and this is correct. Always has been in C++ for as long as I can remember.


Unsigned overflow wraps around, but signed and pointer overflow are UB. In this case, the code was depending on pointer overflow wrapping around.


I don't think pointer arithmetic is defined to behave as unsigned. Unsigned is the exception in that its overflow behavior is defined.


edit: oh gosh I totally misread something

> Since farPointer.offset is an unsigned number, the compiler is able to conclude that target < segmentStart always evaluates false. Thus, the compiler removes this part of the check. Unfortunately, in the case of overflow, this is exactly the part of the check that we need.

This certainly seems like a compiler bug!


I'm going to be honest here: the more I read things like this, the more I'm convinced C belongs in the trash can.

I would accept that pointer overflow is implementation dependent. Allowing it to be "undefined behaviour" is sincerely BS.

Every processor has a "pointer register". Increment it until it overflows. What happens then? Most of the time it will roll over. I suspect this happens even in early RISC architectures that faint if you look at them too hard. There's your answer.

It's right, it's not a compiler bug. C tries to solve all problems and cops out on the real issues. The language is broken.


No, it's definitely a compiler bug here.

Pxtl 103 days ago [flagged] [dead]

Please tell me this reply is a joke. If not, the irony is hilarious.


We detached this subthread from https://news.ycombinator.com/item?id=14164763 and marked it off-topic.


I don't know why you think my reply is a joke. The person I replied to basically said (paraphrasing) "compilers which optimize based on undefined behavior and follow the standards are written by idiots", and I don't think that's accurate by any stretch of the imagination.

Now, if you want to talk about changes that should perhaps be made to the standards, that's a different discussion altogether (and also has nothing to do with intelligence, since the UB in the standard today is there to facilitate portability and optimizations, both of which are highly important to C and C++).


A major point of the comment was that being technically correct isn't useful if all you've done is ignore the actual problem in favor of a small subset of it that doesn't actually help in any way.

You then proceeded to call out a small subset of the comment and yes, you were technically correct. You also ignored the actual point of the comment and didn't really contribute anything because of it, thus doing exactly what the comment was referring to.

> The person I replied to basically said (paraphrasing) "compilers which optimize based on undefined behavior and follow the standards are written by idiots"

That's not what was said. What was said is that if someone continues to hide behind technicalities rather than addressing the actual problem, then that person is not worth talking to. If you view your original reply in that light...


> then that person is not worth talking to

That's a highly charitable reading of what was written.

I responded to only part of the comment because only part of the comment offended me: "I'll find someone smarter."

Sure, if one ignores the hyperbole (like a compiler deleting all of your files while compiling your UB source code, as if that ever happened), some of the idea has merit. But the expression of that core idea was not done well, and I ignored all of that.

The idea that a compiler author must be mentally deficient in some way for following the language standard is absurd.

Technical correctness should not be the final word in any discussion (and I never said it was), but calling someone "not smart" in a comment for being technically correct, in a situation where technical correctness has a lot of value (even if you are focused on the drawbacks), doesn't really deserve a complete and thoughtful reply.


> That's a highly charitable reading of what was written.

It is. I also think a case could be made that if someone can't get past a technicality when repeatedly asked to, they are lacking in intelligence in one aspect or another, or are purposefully being obstructive. Note that "find someone smarter" doesn't necessarily mean "you are stupid". It does imply that further discussion with this person on this topic will be pointless though.

> I responded to only part of the comment because only part of the comment offended me: "I'll find someone smarter."

I submit that perhaps because of the greater context and your own views, you might have read more into the words used than strictly called for. I think the original comment is essentially saying "If I keep saying I have a real problem and you keep deferring that it doesn't matter because of some technicality, talking to you is a waste of my time," albeit in slightly more colorful language.


> Note that "find someone smarter" doesn't necessarily mean "you are stupid". It does imply that further discussion with this person on this topic will be pointless though.

It also implies "Someone smarter will see it my way" which implies "I'm smarter than you" which is getting to part of the root of what's offensive about that wording.

> "talking to you is a waste of my time"

That's slightly less offensive than what was said because it no longer implies a "smarter" person would be less of a "waste of my time", but it's still fairly offensive.


> It also implies "Someone smarter will see it my way" which implies "I'm smarter than you" which is getting to part of the root of what's offensive about that wording.

Yes, but in this case, "my way" is referring to the understanding that real problems need real solutions and that continuous deflection based on technicalities doesn't help with that. I would count that as an indicator that one person was smarter than the other in that specific context. To be clear, I don't believe intelligence can be measured along a single axis in any useful way (I've met plenty of "smart" people that acted very stupidly, and plenty of "stupid" people that showed amazing amount of competence and intelligence about certain things), so I didn't read the initial statement as indicating intelligence overall of an individual.

> That's slightly less offensive than what was said because it no longer implies a "smarter" person would be less of a "waste of my time", but it's still fairly offensive.

It's also fairly subjective. It's your right to be offended at what you want, but I would caution against being offended without confirming the intent and/or meaning behind the statements. I've shown how I interpreted it somewhat differently than you, so there's at least some ambiguity.


> I would caution against being offended without confirming the intent and/or meaning behind the statements.

Which is why I merely replied with "I don't know why you think this has anything to do with lack of intelligence."

I could have said "you are wrong and it has nothing to do with lack of intelligence" (which is what I believe), or I could have gone further and said or implied I'm smarter than he is (which I didn't do, but which the original commenter said I did for some reason), but I didn't.

So I believe you are preaching to the choir here.


Understood. In that respect, it's unfortunate that your original reply happened on cursory review to match what the original comment was talking about in general, even if it didn't really in intent. Without that context, there would have been nothing for anyone to object to (or more accurately, misinterpret and call negative attention to).


You made a correction on a technicality of a post complaining about our discipline's fastidious obsession with correcting technicalities.

geocar 103 days ago [flagged] [dead]

> The person I replied to basically said (paraphrasing) "compilers which optimize based on undefined behavior and follow the standards are written by idiots"

That's not what I said, it's not what I meant, and I don't think a reasonable person reading my statement would interpret it that way.

That you think I'm stupid simply because you didn't understand what I meant, is a common programmer trait, and I don't think it's a good one.


Where did I say I thought you were stupid?


I think I understand your sentiment, but it doesn't apply here. Folks arguing technical correctness in the context of compilers and the C standards are not doing so in order to be pedantic over hypotheticals. Being "technically correct" is important in C because compiler authors reasonably take many liberties in order to improve both compile-time and run-time performance, not to mention reducing the cost and complexity of developing the compiler itself. C makes certain guarantees, and compilers will take those for granted where it benefits them, rather than conservatively analyze for situations that are theoretically impossible. You can't say a compiler is dumb because the user gave it invalid code, and in no way is this hiding behind technicalities.

It's also not much of a stretch to say that invoking undefined behavior can do things like delete everything on your hard drive. In fact, I've seen it happen many times. Any memory corruption exploit fundamentally relies on the fact that the author invoked undefined behavior. Although it may be undefined, it's still deterministic, so an attacker can construct an input that abuses the failure modes of your particular implementation to run arbitrary code of their choosing, like spawning a shell process and binding it to a port, or fetching some malware.

C guarantees that none of this can happen, but when you don't follow the rules of C, they really do mean it when they say anything can happen. It's not because the compiler authors are being trolls and feel entitled to delete your files or launch a rocket out of spite. It's because your machine is now executing who-knows-what.


Now it's definitely specified that a compiler can interpret this:

    void *x = trk->x;
    if(!trk) return 0;
    return doit(x);
as:

    return doit(trk->x);
but is that right?

Compiler writers can argue that this kind of optimization is useful because it makes other kinds of optimizations easier to implement, but the argument isn't that it's "technically correct" so they must. They have to argue on the merits of the decision, and if they fail to be convincing, people will find other solutions.

The DNS specification required clients use source port 53. I think this is dumb because it made it easy to spoof DNS requests -- simply generate 30k UDP packets in a short period of time for a domain name that is commonly requested (like yahoo.com) on a DNS server that's popular (like opendns).

If only clients would use a random source port, the number of packets goes into the billions.

A few DNS clients ignored the specification because it was wrong, but a few DNS clients stood their ground and insisted that it was specified so it was right. I remember the smear campaign referring to one of those secure DNS clients as "not standard compliant". Brutish, but it was effective: Some people remained insecure for a very long time simply because they trusted the standardization process as right and true.


This is a broken compiler. It might well meet the "standard", but it's still broken because it doesn't do what any reasonable person would expect it to do.


Which means the standard is broken, not the compiler.



