
lcamtuf's blog: Automatically inferring file syntax with afl-analyze

Original source (lcamtuf.blogspot.com)
Tags: cyber-security context-triggered-piecewise-hashes fuzzy-hashes afl-analyze american-fuzzy-lop file-syntax-inference lcamtuf.blogspot.com
Clipped on: 2016-04-27


February 09, 2016

Automatically inferring file syntax with afl-analyze

The nice thing about the control flow instrumentation used by American Fuzzy Lop is that it allows you to do much more than just, well, fuzzing stuff. For example, the suite has long shipped with a standalone tool called afl-tmin, capable of automatically shrinking test cases while still making sure that they exercise the same functionality in the targeted binary (or that they trigger the same crash). Another tool, afl-cmin, uses a similar trick to eliminate redundant files from large testing corpora.

The latest release of AFL features another nifty new addition along these lines: afl-analyze. The tool takes an input file, sequentially flips bytes in this data stream, and then observes the behavior of the targeted binary after every flip. From this information, it can infer several things:

  • No-op blocks whose contents do not elicit any changes to control flow (say, comments, pixel data, etc.).
  • Checksums, magic values, and other short, atomically compared tokens where any bit flip causes the same change to program execution.
  • Longer blobs exhibiting this property, almost certainly corresponding to checksummed or encrypted data.
  • "Pure" data sections, where analyzer-injected changes consistently elicit differing changes to control flow.

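The classification scheme above can be sketched with a toy model. This is hypothetical code, not the actual AFL implementation: the real tool observes an edge-coverage map from instrumented binaries, whereas the parse() stand-in below returns a crude behavior signature for an invented two-byte "OK" magic followed by space-separated tokens.

```python
def parse(data: bytes) -> tuple:
    """Stand-in for an instrumented target: returns a crude behavior
    signature -- whether the magic matches, and the token count."""
    magic_ok = data[:2] == b"OK"
    tokens = data[2:].split(b" ")
    return (magic_ok, len(tokens))

def classify(data: bytes):
    """Flip every byte in turn, re-run the 'target', and bucket each
    byte by how the observed behavior changes relative to baseline."""
    baseline = parse(data)
    labels = []
    for i in range(len(data)):
        flipped = bytearray(data)
        flipped[i] ^= 0xFF                # flip all bits of byte i
        sig = parse(bytes(flipped))
        if sig == baseline:
            labels.append("no-op")        # change has no visible effect
        elif sig[0] != baseline[0]:
            labels.append("magic")        # any flip breaks the magic check
        else:
            labels.append("data")         # flip perturbs later parsing
    return labels

print(classify(b"OKa b"))
# -> ['magic', 'magic', 'no-op', 'data', 'no-op']
```

The first two bytes classify as an atomically compared token, the space as syntax-bearing data, and the remaining letters as no-ops — a miniature version of the annotations afl-analyze emits.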
This gives us some remarkable and quick insights into the syntax of the file and the behavior of the underlying parser. It may sound too good to be true, but actually seems to work in practice. For a quick demo, let's see what afl-analyze has to say about running cut -d ' ' -f1 on a text file:

[Image: afl-analyze output for cut -d ' ' -f1 on a text file]

We see that cut really only cares about spaces and newlines. Interestingly, it also appears that the tool always tokenizes the entire line, even if it's just asked to return the first token. Neat, right?
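That observation is easy to reproduce outside of AFL. Assuming a POSIX cut is on PATH, a quick check from Python:

```python
import subprocess

def first_field(line: str) -> str:
    """Run `cut -d ' ' -f1` on a single line and return its output."""
    out = subprocess.run(["cut", "-d", " ", "-f1"],
                         input=line, capture_output=True, text=True)
    return out.stdout.rstrip("\n")

print(first_field("hello world"))    # -> hello
print(first_field("hello_world"))    # -> hello_world (no delimiter: whole line)
```

Only the space changes the result; every other byte passes through untouched, which is exactly the no-op-heavy map afl-analyze draws.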

Of course, the value of afl-analyze is greater for incomprehensible binary formats than for simple text utilities; perhaps even more so when dealing with black-box parsers (which can be analyzed thanks to the runtime QEMU instrumentation supported in AFL). To try out the tool's ability to deal with binaries, let's check out libpng:

[Image: afl-analyze output for a PNG file parsed with libpng]

This looks pretty damn good: we have two four-byte signatures, followed by chunk length, four-byte chunk name, some image metadata, and then a comment section. All in a matter of seconds: no configuration needed and no knobs to turn.
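The layout afl-analyze recovers matches the PNG specification: an 8-byte file signature, then a sequence of chunks, each [4-byte big-endian length][4-byte type][data][4-byte CRC]. A minimal chunk walker (an illustrative sketch, not part of AFL) that builds a tiny, deliberately incomplete PNG and lists its chunks:

```python
import struct, zlib

def chunks(png: bytes):
    """Yield (type, data length) for each chunk after the PNG signature."""
    assert png[:8] == b"\x89PNG\r\n\x1a\n", "bad signature"
    pos = 8
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        ctype = png[pos + 4:pos + 8].decode("ascii")
        yield ctype, length
        pos += 12 + length        # 4 length + 4 type + data + 4 CRC

def chunk(ctype: bytes, data: bytes) -> bytes:
    """Serialize one chunk: length, type, data, CRC over type+data."""
    return (struct.pack(">I", len(data)) + ctype + data
            + struct.pack(">I", zlib.crc32(ctype + data)))

# Signature + IHDR (1x1, 8-bit grayscale) + a tEXt comment chunk.
ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
png = (b"\x89PNG\r\n\x1a\n" + chunk(b"IHDR", ihdr)
       + chunk(b"tEXt", b"Comment\x00hello"))
print(list(chunks(png)))   # -> [('IHDR', 13), ('tEXt', 13)]
```

Every field the walker relies on — the magic, the length words, the chunk names, the CRCs — shows up as a distinct region in the afl-analyze map above, with the CRCs flagged as checksummed data.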

Of course, the tool shipped just moments ago and is still very much experimental; expect some kinks. Field testing and feedback welcome!


  1. I'm trying to understand where this fits in. Do I need to run the target with afl-analyze first and then feed the results back to afl-fuzz as additional samples? Or is this something afl-fuzz now does to inputs by default before fuzzing? Or does it just affect the mutation strategy?

    What is the intended real-world usage of this new feature?

    1. See http://lcamtuf.coredump.cx/afl/technical_details.txt for a complete explanation of what AFL does and how afl-analyze leverages that information. It's a very simple bit flip technique, but it probably helps to understand the broader context of AFL instrumentation and its properties.

      The usage is very simple, something like:

      $ ./afl-analyze -i input_file_to_annotate ./target_binary

      [ or ./target_binary @@ if the program expects a file name, rather than reading from stdin. ]

      The target binary needs to be compiled with afl-gcc / afl-clang beforehand. See http://lcamtuf.coredump.cx/afl/README.txt for more.

  2. OK, got it. So this tool will just help me get the best minimization of the needed input files?
    Is there an output directory so that, if I give it all my inputs, it will write out the best minimized versions? Currently I only see it printing the analysis and leaving it to me to decide whether the result is better or not.

    1. The tool that lets you reduce file size is called afl-tmin.
