Jay Taylor's notes

back to listing index

Add variables to Python traceback | Hacker News

[web search]
Original source (news.ycombinator.com)
Tags: python traceback minority-report-enhance stackprinter dev-feedback-loop news.ycombinator.com
Clipped on: 2020-12-18

Image (Asset 1/2) alt=
Image (Asset 2/2) alt=
I use stackprinter[1] for the same. It's advisable to suppress printing of library functions (`suppressed_paths=["lib/python"])`) otherwise deep stack traces (as some libraries---hello pandas---like to produce) will cause the actually-problematic code to be elided. Long reprs are also truncated so your 2GB JSON buffer isn't going to blow up your trace.

this:

    Traceback (most recent call last):
      File "demo.py", line 12, in <module>
        dangerous_function(somelist + anotherlist)
      File "demo.py", line 6, in dangerous_function
        return sorted(blub, key=lambda xs: sum(xs))
      File "demo.py", line 6, in <lambda>
        return sorted(blub, key=lambda xs: sum(xs))
    TypeError: unsupported operand type(s) for +: 'int' and 'str'
becomes:

    File demo.py, line 12, in <module>
        9        somelist = [[1,2], [3,4]]
        10       anotherlist = [['5', 6]]
        11       spam = numpy.zeros((3,3))
    --> 12       dangerous_function(somelist + anotherlist)
        13   except:
        ..................................................
         somelist = [[1, 2, ], [3, 4, ], ]
         anotherlist = [['5', 6, ], ]
         spam = 3x3 array([[0. 0. 0.]
                           [0. 0. 0.]
                           [0. 0. 0.]])
        ..................................................

    [...]

    File demo.py, line 6, in <lambda>
        3
        4
        5    def dangerous_function(blub):
    --> 6        return sorted(blub, key=lambda xs: sum(xs))
        7
        ..................................................
         xs = ['5', 6, ]
        ..................................................

    TypeError: unsupported operand type(s) for +: 'int' and 'str'
[1] https://github.com/cknd/stackprinter


Some self advertisement: I implemented some very similar Python module a while ago (more than 9 years ago actually, but updated all the time):

better_exchook: https://pypi.org/project/better-exchook/ / https://github.com/albertz/py_better_exchook

It's just a single file, so you can easily copy it into your project.

It's a bit more advanced than this as well, as it has some further logic about what variables it actually shows:

* It shows only those which are used in the statement of the active line. In bigger functions, you often have other unrelated locals, and this would become incredibly spammy otherwise. Also, the use locals in the current line are almost always exactly those you are interested in.

* It also shows globals (this one only shows locals).

Like the original traceback and this one, it shows the current active line, and not the context around (e.g. like in IPython or stackprinter).

However, it has an extension: If the current statement goes over multiple lines, it will print them as well.

It also has an extension for DomTerm that you can fold away the variables.


This reminds me of a trick Monotone VCS in C++ had to print out specific variables upon failure. Usually they'd do nothing, but upon hitting an invariant or similar it would dump all the listed ones out. A bit harder to do cheaply in a safe language like Rust I guess...

  MM(variabletodump);
example https://github.com/graydon/monotone/blob/master/graph.cc#L49... docs https://github.com/graydon/monotone/blob/master/HACKING#L252

The implementation was a bit hairy https://github.com/graydon/monotone/blob/master/sanity.hh#L4...


The usual Rust answer would be to build a structured wrapper error (which you'd usually have to do anyway, once you have more than one error source). Then you can just add whatever debug data you want to it.


An alternative is to reuse pytest version:

    exc_info = ExceptionInfo.from_current()
    print(exc_info.getrepr(style="long", showlocals=True))


Odd that cgitb.text is not mentioned anywhere as the core feature (format tracebacks including local variables referenced from the expression the exception comes from) is what it does.


Opening an issue to ask for a FAQ comparison might help people evaluate the differences.

For those like me who didn’t know about this: https://docs.python.org/3/library/cgitb.html

While the function seems similar, the interfaces seem very different. cgitb seems to need to be wrapped and configured to match the fully headless/logger use case. I think they could focus on that in the FAQ.


Neat idea but it doesn’t look like something I’d be comfortable using in production. A useful feature to have would be to selectively tag the variables I want to track (in the decorator perhaps?). This way one can prevent printing sensitive info or potentially large variables in the logs.


Agree - which is why I use Sentry, which has sensible default and configurable PII filters [0] for staging/production.

For debugging, eh, why not - but I've not previously had the need to enable this, let's say, application-wide (where I see the most advantage compared to just adding dumps of locals() per stack).

[0] https://docs.sentry.io/platforms/python/configuration/option...


Blind logging of all variables (as the writeup implies) is definitely a nightmare. You can end up with things like passwords, credit card details etc in the logs. If you're running a log concentrator, now they're in the concentrator - and did you remember to set ACLs on who can read from the concentrator (assuming the servers have good ACLs already for reading logs directly from prod servers)?

In dev, sure (assuming local disk only). In prod, not in the form documented.


Do you think having a set of opt-out functions/decorators for sensitive variables would be comforting?

I could see it going either way. I would probably not use this globally on programs using secrets, but would consider opting-in occasionally where chances are low and value high.


Consider the scenario where you develop a daemon that does something, has no secrets that you worry about. You implement this concept of automatic logging of variables, and all is well in the world.

You leave. New person takes over ownership, doesn't notice the import. Urgent business need says "add this feature to the daemon" and now it's processing secrets. Now, you can argue that the person who took ownership should have read the code more carefully, but perhaps it's their 5th day on the job when that request comes in, or they misread the import as something else.

I suppose you could do a code standard of "variables that containe a secret must always have 'secret' as a prefix", and could then do pattern matching, but humans are going to human and make a mistake.

I'd probably lean towards opt-in, decorator/context only - maybe "@traceback_with_variables_can_log_secrets" as the name...


> Do you think having a set of opt-out functions/decorators for sensitive variables would be comforting?

Opt-out is risky and I’d not want to use it in sensitive applications.

Instead, opt-in would be as good as deliberately logging variables but without the additional log/print statements and try/catch blocks.


I wrote https://github.com/skorokithakis/tbvaccine years ago for the same purpose, and the fix is simple: Just don't install it on production.

It requires no changes to your code, you can install it system-wide and all tracebacks will be prettified.


+1 this seems like a GDPR/CCPA nightmare when it comes to potentially spewing user data into logs.


I wonder what would happen if extracting value from a 2GB JSON "variable" went wrong....


Yea, seems like a neat library but I'm not sure if it would be a good idea to turn it on for all stack traces globally. Could see issues like the one you mentioned and others like variables that are secrets that shouldn't be logged.


Or.. I guess for your example you can set a max string length.

max_value_str_len max length of each variable string, -1 to disable, default=1000


Does this have runtime overhead when not printing stack traces? I imagine some optimizations might not be do-able if you need to keep around all this data in case it needs to be shown, but perhaps pythons optimizations aren't that advanced...


It shouldn't, what you do is hook the traceback function, though the method is fiddly to get right. I wrote a utility to do the same years ago:

https://github.com/skorokithakis/tbvaccine


Funny, I discovered that I could uses sysexcepthook to do some funny stuff long ago, documented it in my twitter, and everytime I have stuff like this I go to my twitter to copy paste.

https://twitter.com/txomon/status/1128330566531788800


A former colleague of mine did something similar several years ago for Java, intended to be used for failing unit tests: https://github.com/dodie/scott


I usually use

    import ipdb; ipdb.pm()
for quick post mortem debugging, which also allows to see the variables at the moment where the exception was raised (Command locals()). Is there a scenario where this adds value?


Maybe I missed it when reading through, but does this library have a method similar to traceback.format_exc()?


I guess the iterator example would give similar results. I usually like to include an easy to read log message when I catch an exception to give more context about the error.

    return '\n'.join(iter_tb_lines(e))


Neat idea. I wonder if it's possible to do something like this for JavaScript/NodeJS?


Looks interesting for dev mode. But will keep sentry for production environments.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: