back to listing index

runtime: FuncForPC/FileLine/Caller/Callers api interferes with implementation of better inlining · Issue #11432 · golang/go

[web search]
Original source (github.com)
Tags: golang tail-call tail-call-optimization
Clipped on: 2016-04-27

Skip to content

runtime: FuncForPC/FileLine/Caller/Callers api interferes with implementation of better inlining #11432

Open
dr2chase opened this Issue on Jun 26, 2015 · 20 comments

Labels

None yet

Milestone

  Go1.7

Assignee

No one assigned

Notifications

You’re not receiving notifications from this thread.

7 participants

Image (Asset 2/18) alt= dr2chase Image (Asset 3/18) alt= randall77 Image (Asset 4/18) alt= ianlancetaylor Image (Asset 5/18) alt= josharian Image (Asset 6/18) alt= ChrisHines Image (Asset 7/18) alt= rsc Image (Asset 8/18) alt= gopherbot
Image (Asset 9/18) alt=

I'm fishing for comments. The current API gives away too much from the POV of a compiler-writer.

And I need to be clear on one point -- if we don't resolve this, there's well-understood optimizations that Go compilers won't do, and these are sufficiently important optimizations that they affect the way people write programs.


Current (1.<=5) Go only inlines when it can inline entire call trees all the way to the leaves; it cannot, for example, inline a simple function Small that wraps a much larger function Huge, because Huge is not eligible for inlining and thus all callers of Huge are not eligible for inlining. The current total-inlining policies mean that Go programs never observe the call sites at which the inlining occurs, and thus are never presented with the dilemma that there can be a stack of "callers" corresponding to a single return PC.

If we allow more flexible inlining, and if A calls B, B calls runtime.Callers(1, pcslice) and B is inlined into A, the "return pc" for Callers (pcslice[0]) will be an address within A (according to nm) and there will be no return PC identifying B or the file of B or the line number within B.

And unfortunately, this return PC = single caller identity is exposed in an interface -- a "return pc" is a uintptr. We could in theory hack on that integer to embed inline depth information in some unused bits, but in practice that is likely to break some code.

So, if we intend to improve inlining in the future, we need to do something about this. The options I see (and I may be myopic, other options are welcome) include:

(a) simply ignore intermediate callers. In the example, the caller of Callers will be A, and any mention of B is lost. It's a systems programming language, optimizers do this stuff all the time in other languages and you love it when they do, put on your grown-up pants and deal with it.

pros: easy.
And it will prepare people for tail-call elimination.
And it will discourage them from thinking that they can use this for implementing Java-style caller-sensitive security, which is slow, yet difficult to reason about and hard to maintain. Seriously, this has been a source of numerous security holes in Java, in JDK 8 they simplified it as much as possible to reduce their risk (see e.g. http://openjdk.java.net/jeps/176 and https://bugs.openjdk.java.net/browse/JDK-8046166 ).

cons: people might already critically rely on exact stack traces.

(b) hack on the encoding of the return PC to embed an index. I'd propose to grab the upper 2-3 bits on an Intel box or the lower 2 bits on a RISC architecture (this imposes an obvious restriction on inline depth, and I assume that I can grab these bits even on a 32-bit machine), and have the counter indicate the inline depth, thus the 00 case will match what is seen in the nm output (in the above example, "A") and the 01 case corresponds to B. The symbol table information for "A" will need to provide the inlining depth at this site. In the example above, if it is IA32 and the raw RPC is 0x01234567, then the caller of Callers (B) is return PC 0x41234567 and B's caller ("A") is return PC 0x01234567. On a RISC, raw PC 0x03217654 is the same as the RPC for "A" and 0x03217655 is the RPC for "B" (called by A).

pros: no change to interface, we never promised that "returnpc" was good for anything, did we?

cons: people thought we promised "returnpc" was a real live returnpc, and acted accordingly.
Fails to adequately discourage Java-style caller-sensitive security.
Also fails to prepare people for tail-call elimination.

(c) = (a) + extend the interface with a more generalized notion of program counter. Most old code will continue to sort-of work without change and no risk of gigantic surprise (e.g., illegal address dereference from believing that an option-b PC was a real address), new code will use the improved interface.

Note that there's nothing novel about abstracting on the "return PC" to get something that allows you to talk precisely about backtraces in the presence of inlining -- Java implementations have been doing this for more than 15 years.

Image (Asset 10/18) alt= dr2chase added this to the Go1.6 milestone on Jun 26, 2015
Image (Asset 11/18) alt=
On Fri, Jun 26, 2015 at 12:55 PM, dr2chase ***@***.***> wrote: I'm fishing for comments. The current API gives away too much from the POV of a compiler-writer. And I need to be clear on one point -- if we don't resolve this, there's well-understood optimizations that Go compilers won't do, and these are sufficiently important optimizations that they affect the way people write programs. Current (1.<=5) Go only inlines when it can inline entire call trees all the way to the leaves; it cannot, for example, inline a simple function Small that wraps a much larger function Huge, because Huge is not eligible for inlining and thus all callers of Huge are not eligible for inlining. The current total-inlining policies means that go programs never observe the call sites at which the inlining occurs, and thus are never presented with the dilemma that there can be a stack of "callers" corresponding to a single return PC. If we allow more flexible inlining, and if A calls B, B calls runtime.Callers(1, pcslice) and B is inlined into A, the "return pc" for Callers (pcslice[0]) will be an address within A (according to nm) and there will be no return PC identifying B or the file of B or the line number within B. And unfortunately, this return PC = single caller identity is exposed in an interface -- a "return pc" is a uintptr. We could in theory hack on that integer to embed inline depth information in some unused bits, but in practice that is likely to break some code. So, if we intend to improve inlining in the future, we need to do something about this. The options I see (and I may be myopic, other options are welcome) include: (a) simply ignore intermediate callers. In the example, the caller of Callers will be A, and any mention of B is lost. It's a systems programming language, optimizers do this stuff all the time in other languages and you love it when they do, put on your grown-up pants and deal with it. pros: easy. And it will prepare people for tail-call elimination. And it will discourage them from thinking that they can use this for implementing Java-style caller-sensitive security, which is slow, yet difficult to reason about and hard to maintain. Seriously, this has been a source of numerous security holes in Java, in JDK 8 they simplified it as much as possible to reduce their risk (see e.g. http://openjdk.java.net/jeps/176 and https://bugs.openjdk.java.net/browse/JDK-8046166 ). cons: people might already critically rely on exact stack traces.
I'd rather not do this. Not because people might rely programmatically on exact stack traces, but because stack traces with holes in them are hard to understand. I care less about the actual PCs that show up, but somehow text backtraces should make sense in the original (uninlined) program if at all possible.
We did promise it would be a valid argument to FuncForPC/FileLine. There's no need to steal bits - we can assign arbitrary PCs to encode inlined bodies. At one (virtual address) byte per PC, it shouldn't be too expensive.
I think this is what I was describing above. We should do this one.
Image (Asset 12/18) alt=

I struggle with this as well in gccgo. gccgo does of course implement arbitrary inlining (but currently does not do tail calls). Gccgo does full correct unwinding of inlined functions, using the debug info to track inline sites. However, this only works with runtime.Caller. If you use runtime.Callers, and then pass the resulting PC values to FuncForPC, you don't get a proper stack. At an inlining point, you get the same file/line multiple times.

I agree with Keith that the best approach is to encode inline positions in the PC value, somehow.

Image (Asset 13/18) alt=

I'm happy with encoding into "PC" values. The 3 options that quickly come to mind are:

  1. take existing "return PC" and hack into known-unused bits (high order on x86, low order on PPC/MIPS/ARM/SPARC). The advantage of this is that if the "bogus" PCs are filtered out of the stack trace, the ones that remain are the ones that you would see in a machine-level debugger, and also correspond to what you'd see in nm (e.g., A inlines B inlines C, the "good" PC belongs to A, and that's what nm shows).

  2. repurpose other PCs within the image, perhaps correspond to rough inline boundaries (never mind what optimization might do). All PCs are "real", only the innermost one corresponds to a real call site.
    We might even leave marker instructions (nops) in the stream. The advantage is all PCs are real, the disadvantage is that the name of the one that looks like a call site won't match the addresses you see in nm.

  3. make up something completely different that is just designed to get you the information that we promised, that does not correspond to nm at all. Advantage here is that there's no risk of subtle confusion -- these things won't look like PCs at all, and we won't run into any weird corner cases based on attempts to fake it. (This is what we did years ago in a static Java-bytecodes compiler, but we had no installed base to confuse, and sadly, never did).

I think I have an anti-preference against 2. That seems most-confusing and also most likely to lead to non-intuitive hair in things like dead code elimination and code motion.

Where do we stand on tail-call elimination? Are we more nearly "this is an important optimization, users must deal with it" or "stack traces are really important, think about the time added to debugging/diagnosis when information is lost"? There have been proposals/hacks over the years for figuring out ways to deal with TCE, but none of them come close to the (cache, stack size, simplicity) efficiency of the real thing. Maybe we can think of new hacks???

Image (Asset 14/18) alt=

With regard to tail call optimization, one idea that has been discussed in the past (e.g., https://groups.google.com/d/msg/golang-nuts/JZZrFjFKkuM/MV9sSJsFlFEJ) is to add an explicit statement to the language to implement tail calls ("become f()"). Only for Go 2, though.

I think that in the absence of some sort of explicit indication that a tail call is acceptable, the compiler should not implement them. That is, I'm on the side of "stack traces are really important."

That said, let me also mention http://www.dwarfstd.org/ShowIssue.php?issue=100909.2 which is an approach used by GCC to permit tail calls to be reconstructed from the debug info. So one option is to only make a tail call when the call sequence can be reconstructed.

(As far as fake PC values goes I think I would tentatively vote for your option 3, in the sense of use real PC values for simple addresses and completely different values for inlined addresses, but I have no strong feelings about it.)

Image (Asset 15/18) alt=

A few thoughts:

  • Having real PC values as often as possible is valuable. Within the last fortnight, Austin asked for  nm  output to correlate to a stack trace to help debug a runtime bug.
  • Relatedly, if we use invented PCs, it would be nice if they were roughly comprehensible given only the binary and appropriate tooling.
  • Unsurprisingly given the above, I'm also on the side of "stack traces are really important".
  • #4899 might be an interesting use case to look at in this connection. It is entangled with testing API design, but entangled in relevant way, I think.
  • I'm disinclined to shove stuff into unused bits, as it doesn't fail as reliably as you would want when you miss a spot--and you always miss a spot.
Image (Asset 17/18) alt=

Suppose for the return PCs we try the following: if A inlines B inlines C, the single PC in the "real" stack trace will appear within A. Annotate that PC appropriately to indicate the inlining patterns, and for the two synthesized PCs, choose the appropriate in site in non-inlined B where C is called, and the appropriate site in non-inlined C where the backtrace event occurred.

Also write the api and implementation that we wish we had started with, and in the old code recommend the new api and note that sometimes the stack of "PCs" contains white lies.

An improvement?

Image (Asset 18/18) alt=

Your PC choices sound good but they sound hard to implement. If A, B, and C are all in different packages, it's not going to be easy for the compilation of A to know the PC values to use to represent the inlining of B and C. If we can implement them with no extra cost, then, yes, sounds great.

The difficulty in my example is "B" -- if A inlines B and C, then it's likely that B inlines C, and thus there is not necessarily (unless we extend the fake) any good place to say that "B calls C". It would also lead to a proliferation of symbols (most likely). Russ seemed to think (I hope I am not putting words in his mouth) that it would be perhaps okay to just drop PCs from the old interface, as long as we provide a new way to obtain all the funcs/files/lines that apply at a given call site and use that for usual-case user-visible stack traces. I.e., it's not entirely clear that the customers for slices of caller-PC have much use for fake PCs anyhow.

As the author/maintainer of logging packages (github.com/go-kit/kit/log and gopkg.in/inconshreveable/log15.v2) and a standalone package that consumes  runtime.Callers  (gopkg.in/stack.v1) I would like to provide my perspective on this and related issues.

For my use cases I don't care about the actual PC values. I care that I can use them with  runtime.FuncForPC  and get accurate file/line-number/function-name information. I also want my code to report on call-sites rather than return-sites. It is rather cumbersome to deal with the special cases documented by  runtime.Callers  to convert the return-site PCs into call-site PCs. As I said in the mailing list recently, if a new API comes out of this discussion it would be nice if it could simplify the logging/debugging use case.

Dropping inlined call-sites would pose problems to the libraries I build, putting me on the side of "stack traces are really important". Furthermore, the opposite problem—extra entries for  <generated>  functions that appear in stack traces, but not in the code—are also a source of confusion that add little or no value to my use cases. For an example of code that was changed specifically to deal with that issue, see: go-kit/kit#106, and in particular this comment describing the nuances of what's going on. I wonder if a mechanism to provide stack trace data in the face of inlined functions could also hide the  <generated>  functions from stack traces as well?

Hiding generated functions from stack traces is not too far off of #4899 .
I think stack traces are really important, but it's not clear to me if we're better off inserting fake PCs into the existing interface, or making a new interface that allows us to preserve the illusion of no-inlining. I don't have enough experience with Go or the users of this API to say what's best.

As far as runtime.Callers goes, we should probably think in terms of a new interface. Though the transition will be slow. Ideally runtime.Caller will consider inlining.

Of course the most important thing is that stack dumps consider inlining.

rsc commented on Nov 5, 2015

We're not going to eliminate tail calls. Let's just get that out of the way.

rsc changed the title from runtime: FuncForPC/FileLine/Caller/Callers api interferes with implementation of better inlining (and tail-call elimination) to runtime: FuncForPC/FileLine/Caller/Callers api interferes with implementation of better inlining on Nov 5, 2015
rsc modified the milestone: Unplanned, Go1.6 on Nov 5, 2015
rsc commented on Nov 5, 2015

Let's also take "fake PCs" off the table. That will introduce significant complexity and confusion.

New API is needed, but probably not much.

It's true that a single PC may in general correspond to a tiny call stack of func/file/line entries. Assume this:

normaltype Frame struct {
    Func string
    File string
    Line int
}

type Frames struct { ... opaque ... }

func (*Frames) Len() int
func (*Frames) Frame(int) Frame // 0 is inner-most frame
normal

Then:

  1. FuncForPC can be left alone: each byte of code does belong to just one function.
  2. (*Func).FileLine(pc) can continue to return the top-most (outer-most) file/line in the call stack for pc.
  3. (*Func).Frames(pc) can return the more complete Frames information.
  4. Callers can be left alone: there really is a stack of program counters at run time, and Callers can continue to return them. The skip parameter counts frames, not PCs (today those are the same).
  5. Caller can count frames instead of PCs but otherwise is fine. This means that if you have a PC corresponding to two frames, Callers(i) and Callers(i+1) may both return that PC, just with different file, line information. It would be nice if it returned a function name too, but oh well.

People who report full stacks using repeated calls to Callers are already writing quadratic code, so it seems okay that it's imperfect in a different way now too. People who want to report the full call stack more carefully will use Callers to get some PCs and then call FuncForPC+Frames for each of those.

@rsc Have you perhaps written "Callers" in a few places where you meant "Caller"? Specifically in (5) and the last paragraph?

I'm looking at this in the context of also supporting stack tracebacks for C code called by cgo. I would like to support those tracebacks in Callers and get good information for them in runtime/pprof and friends. Currently it's hard to correctly turn the Callers information into a good stack traceback, as can be seen by the fact that the implementation in runtime/pprof is subtly incorrect (it checks for runtime.gopanic where it should check for runtime.sigpanic).

Russ's  Frames  suggestion makes sense for Go, but it's hard to implement for C code, and it doesn't address the complexities of when and how much to adjust the PC value.

I propose that rather than referring to sigpanic in Callers, we instead say

normal// To easily look up file/line information for the call sequence, use
// CallersIterate.
normal

and then we implement

normal// CallersIterator may be used to get function/file/line information
// for a slice of PC values returned by Callers.
type CallersIterator struct ...

// CallersIterate takes a slice of PC returned by Callers and prepares
// to return function/file/line information.
// Do not change the slice until you are done with the CallersIterator.
func CallersIterate(callers []uintptr) *CallersIterator {}

// Next returns function/function entry point/file/line information
// for the next caller.
// If more is false, there are no more callers (the other results are valid).
// This method may return multiple times for a single PC value in the
// original slice, to represent inlined functions.
// If no information is available, function and/or file may be empty,
// and line may be zero.
func (ci* CallersIterator) Next() (pc uintptr, function string, entry uintptr, file string, line int, more bool) {}
normal
ianlancetaylor modified the milestone: Go1.7, Unplanned on Feb 19
ChrisHines referenced this issue in go-kit/kit on Feb 22
Closed

Regarding use of gopkg.in #197

ChrisHines commented on Feb 22

@ianlancetaylor I like your proposed new API for  CallersIterate . I have two questions:

  1. Would the new API also skip  <generated>  functions as I described in my comment above?
  2. Would  CallersIterator.Next  return line numbers for call-sites or return-sites?
  1. I see the handling of generated functions as an orthogonal issue. Personally I think it's best to include them at this level and filter them out at a higher level. But we can decide that separately.
  2. Call-sites.
dr2chase commented on Feb 22

Should the per-"call" information include some information about the function, such as "was generated", "was inlined"? There is certainly a point in the compilation where we know about that, so we could just choose not to forget it.

ChrisHines commented on Feb 22

Combining these ideas, the signature of  CallersIterator.Next  could look like:

func (ci* CallersIterator) Next() (f Frame, more bool)

type Frame struct {
    Func      string
    File      string
    PC        uintptr
    Entry     uintptr
    Line      int
    Generated bool // maybe
    Inlined   bool // maybe
}

Returning a struct type allows future additions and seems less cumbersome to use for API clients.

gopherbot commented on Feb 24

CL https://golang.org/cl/19869 mentions this issue.

Attach files by dragging & dropping, , or pasting from the clipboard.