Notes on the M4 Macro Language

Original source (mbreen.com)
Tags: programming programming-languages macros templates build-pipeline m4 mbreen.com
Clipped on: 2020-04-09

Michael Breen © 2008



About this document

Which m4?

This document describes GNU m4, as included with Linux; areas of potential incompatibility of which I am aware are mentioned as they arise and highlighted with a boldface “GNU”.

This was originally based on GNU m4 version 1.4.5; it has been updated for version 1.4.10.

Who should read this?

You may find this helpful if

  • you want to decide whether m4 is the tool you need for some task (once you get a rough idea of what the language is about, you might want to skip down to the comparative assessment)
  • you need to quickly get up to speed on m4, or revise or (perhaps) learn more about the language

You should already be familiar with fundamental programming concepts (e.g., recursion).

How is this different from the manual?

There is a substantial overlap between the GNU m4 info pages and this document. The info pages are designed to be a comprehensive reference. This document is a much shorter “m4 by example” which is still “practically” complete – that is, I have tried to include:

  • everything helpful in using m4 effectively
  • anything that might cause a problem if you weren't aware of it

Examples of the kind of details omitted are:

  • experimental features that may disappear in future versions
  • the ways different versions of m4 handle the changequote macro (in practice, all you need to know are the restrictions to observe in order to ensure compatibility)
  • details on the myriad debugging flags: effective debugging is possible using just two or three flags and macros

There is also some original material here:

  • tips, e.g., macros to protect unbalanced quote characters inside quotes
  • different examples

What is m4?

M4 can be called a “template language”, a “macro language” or a “preprocessor language”. The name “m4” also refers to the program which processes texts in this language: this “preprocessor” or “macro processor” takes as input an m4 template and sends this to the output, after acting on any embedded directives, called macros.

At its most basic, it can be used for simple embedded text replacement. If m4 receives the input

  define(AUTHOR, William Shakespeare)
  A Midsummer Night's Dream
  by AUTHOR

then it outputs

  A Midsummer Night's Dream
  by William Shakespeare

While similar in principle to the better-known C preprocessor, it is a far more powerful, general-purpose tool. Some significant uses are:

  • sendmail: sendmail's rather cryptic configuration file (/etc/mail/sendmail.cf) is generated using m4 from a template file that is much easier to read and edit (/etc/mail/sendmail.mc).
  • GNU Autoconf: m4 macros are used to produce “configure” scripts which make source code packages portable across different Unix-like platforms.
  • Security Enhanced Linux: SELinux policy files are (at time of writing) processed using m4. (In fact, m4 is the source of some difficulties here because its flexibility allows abuses and makes automated policy analysis difficult to apply.)

Basics: Simple macros, whitespace, quoting, comments

M4 is a Unix filter program. Its arguments, if any, are the files it is to read; if none is specified then it reads from stdin. The resulting text is sent to stdout.

M4 comes with an initial set of built-in macros, often simply called “builtins”. The most basic of these, define, is used to create new macros:

  define(AUTHOR, W. Shakespeare)

After this definition, the word “AUTHOR” is recognized as a macro that expands to “W. Shakespeare”.

The define macro itself – including its two arguments – expands to an empty string, that is, it produces no output. However, the newline at the end of the AUTHOR definition above would be echoed to the output. If a blank line added to the output is a problem then you can suppress it using dnl, the “delete to newline” macro:

  define(AUTHOR, W. Shakespeare)dnl

There is no space between the end of the macro and the dnl: if there were then that space would be echoed to the output.

No whitespace is allowed between a macro name and the opening parenthesis. Any whitespace before the beginning of a parameter is discarded. Thus the following definition is equivalent to the one above:

  define(
     AUTHOR,W. Shakespeare)dnl

It's also possible to pass definitions on the command line using the -D option, for example:

  m4 -DAUTHOR="W. Shakespeare" -DYEAR=1587 input_file.m4

Quoting a string suppresses macro expansion. The default quote characters are the backtick (`) and apostrophe ('). M4 strips off these delimiters before outputting the string. Thus

  define(AUTHOR, W. Shakespeare)dnl
  `AUTHOR' is AUTHOR

produces the output

  AUTHOR is W. Shakespeare

For conciseness, most examples will show m4's output in the following way:

  `AUTHOR' is AUTHOR       # -> AUTHOR is W. Shakespeare

In m4, the hash character # is the default opening delimiter of a comment. A comment lasts up to and including the following newline character. The contents of a comment are not examined by m4; however, contrary to what you might expect, comments are echoed to the output. Thus, the previous line, if entered in full, would actually produce the output

  AUTHOR is W. Shakespeare       # -> AUTHOR is W. Shakespeare

Opening comment delimiters can be protected by quotes:

  `#' AUTHOR              # -> # W. Shakespeare

Nested quotes are recognized as such:

  ``AUTHOR'' is AUTHOR     # -> `AUTHOR' is W. Shakespeare

Quoted strings can include newlines:

  define(newline,`line
  break')
  a newline here

outputs

  a line
  break here

Without a matching opening quote character (`), a closing quote (') is simply echoed to the output. Thus

  `AUTHOR
   ' is AUTHOR.''

produces

  AUTHOR
    is W. Shakespeare.''

M4 also understands nested parentheses within a macro's argument list:

  define(PARENS, ())
  brackets: PARENS         # -> brackets: ()

Unbalanced parentheses can be quoted to protect them:

  define(LPAREN,`(')
  define(RPAREN,`)')
  LPAREN bracketed RPAREN  # -> ( bracketed )

(Unbalanced quote characters are more problematic; a solution is given later.)

Pitfall: In fact, quoting of the macro name is also recommended. Consider the following:

  define(LEFT, [)
  LEFT                     # -> [
  define(LEFT, {)
  LEFT                     # -> [

Why didn't the second define work? The problem is that, within the second define, the macro LEFT was expanded before the define macro itself took effect:

  define(LEFT, {)          # -> define([, {) ->

That is, instead of redefining the macro LEFT, a new macro named [ was defined. GNU m4 allows macros to have non-standard names, including punctuation characters like [. In fact, the new macro doesn't seem to work either:

  [                        # -> [

That's because GNU m4 doesn't ordinarily recognize a macro as a macro unless it has a valid name – that is, a sequence of ASCII letters, underscores, or digits, beginning with an underscore or letter. For example, my_macro1 and _1stMacro are both valid names; my.macro1 and 1stMacro are not. (We will see later how the ability to define macros with invalid names can be useful.)

Quoting the macro's arguments avoids this problem:

  define(`LEFT',`[')
  LEFT                     # -> [
  define(`LEFT',`{')
  LEFT                     # -> {

For the same reason, the undefine macro will normally work as expected only if its argument is quoted:

  define(`RIGHT', `]')
  undefine(RIGHT)          # -> undefine(]) ->
  RIGHT                    # -> ]
  undefine(`RIGHT')
  RIGHT                    # -> RIGHT

(Note that undefine does not complain if it is given the name of a non-existent macro; it simply does nothing.)

How m4 works

M4's behaviour can be mystifying. It is best to get an early understanding of how it works. This should save you time figuring out what's going on when it doesn't do what you expect.

First, m4 looks for tokens in its input – roughly speaking, it divides it into quoted strings, macro arguments, names (i.e., identifiers), numbers and other symbols (punctuation characters). Whitespace (including newlines), numbers and punctuation usually mark token boundaries; exceptions are when they appear within a quoted string or a macro argument.

  define( `Version2', A – 1 )99Version2:Version2_   Version22
  # -> 99A – 1 :Version2_   Version22

Above, since a valid name can include digits but cannot begin with one, the names seen after the definition are Version2, Version2_, and Version22; only the first of these corresponds to a defined macro.

Continuing:

  Version2(arg1, arg2) Version2 (junk) garbage(trash)Version2()
  # -> A – 1  A – 1  (junk) garbage(trash)A – 1

If the name of a macro is followed immediately by a '(' then m4 reads in a list of arguments. The Version2 macro we have defined ignores its arguments – but that doesn't matter to m4: it swallows up the arguments and outputs only the macro's expansion “A – 1 ”.

In general, m4 passes input tokens and separators straight through to the output, making no change except to remove the quotes surrounding quoted string tokens. When it encounters a macro name, however, it stops echoing to the output. Instead:

  1. it reads in the macro's arguments (if any)
  2. it determines the expansion of the macro and inserts this expansion at the beginning of its input
  3. m4 continues scanning the input, starting with the expansion

If while reading in a macro's arguments, m4 encounters another macro then it repeats this process for the nested macro.

An example makes this clearer:

  define(`definenum', `define(`num', `99')')
  num                      # -> num
  definenum num            # -> define(`num', `99') num ->  99

As soon as m4 gets to the end of “definenum” on the last line above, it recognizes it as a macro and replaces it with “define(`num', `99')” – however, instead of outputting this expansion, it sticks it back on the beginning of its input buffer and starts again from there. Thus, the next thing it reads in is “define(`num', `99')”. As the define macro expands to an empty string, nothing is output; however, the new macro num is now defined. Then m4 reads in a space which it echoes to the output, followed by the macro num, which it replaces with its expansion. The last line therefore results in the output “ 99”.

Unless a nested macro is quoted, it is expanded immediately:

  define(`definenum', define(`num', `99'))
  num                      # -> 99
  definenum                # ->

Here, when m4 reads in the nested define macro, it immediately defines num; it also replaces the macro “define(`num', `99')” with its expansion – an empty string. Thus, “definenum” ends up being defined as an empty string.

Arbitrary nesting is possible – with (ordinarily) an extra layer of protective quotes at each level of nesting:

  define(`definedefineX',`define(`defineX',`define(`X',`xxx')')')
  defineX X           # -> defineX X
  definedefineX X     # ->  X
  defineX X           # ->  xxx

If rescanning of a macro's expansion is not what you want then just add more quotes:

  define(`stmt',``define(`Y',`yyy')'')
  stmt                # -> define(`Y',`yyy')
  Y                   # -> Y

Above, the outermost quotes are removed when the nested macro is being read in – so stmt expands first to `define(`Y',`yyy')'; m4 then rescans this as a string token and removes the second layer of quotes before sending it to the output.

Now consider the definition

  define(`plus', `+')

Suppose we want to use this plus macro twice in succession with no intervening space. Clearly, plusplus doesn't work – it is read as a single token, plusplus, not two plus tokens:

  plusplus       # -> plusplus

We can use an argument list as a separator:

  plus()plus     # -> ++

But watch what happens with an extra level of indirection:

  define(`oper', `plus')
  oper()oper     # -> plusoper

Here, oper() expands to plus; but then rescanning of the input starts from the beginning of the expansion. Thus, the next thing read in is the token plusoper. As it doesn't correspond to a macro, it is copied straight to the output.

The problem can be solved by adding an empty quote as a separator:

  oper`'oper     # -> plus`'oper -> +`'oper -> ... -> ++

It is a good idea to include such a separator in macro definitions as a matter of policy:

  define(`oper',`plus`'')
  oper()oper     # -> plus`'oper -> +`'oper -> +oper -> ... -> ++

If ever m4 seems to hang or stop working, it is probably because a faulty macro has sent it into an infinite loop:

  define(`Bye', `Bye for now')
  Hello.         # -> Hello.
  Bye.           # -> Bye for now. -> Bye for now for now. -> ...

Such an error is not always this obvious: the cycle may involve more than one macro.

Finally, look at this example:

  define(`args', ``NAME', `Marie'')
  define(args)                   # -> define(`NAME', `Marie') ->
  NAME                           # -> Marie
  
  args(define(`args',`Rachel'))  # -> args() -> `NAME', `Marie' -> NAME, Marie
  args                           # -> Rachel

In the second part of the example, although args doesn't take an argument, we can still pass it one. In this case the argument redefines the macro that's currently being expanded. However, it is the expansion that was in force when the macro identifier was read in that is output.

Similarly, it is possible to define a self-modifying macro or even a self-destructing macro:

  define(`msg', `undefine(`msg')Secret message.')
  msg            # -> Secret message.
  msg            # -> msg

Recursive macros can also be defined.

Quotes, escaping and non-ASCII characters

A deficiency of m4 is that there is no escape character. This means that if you want to use the backtick (`) for anything other than an opening quote delimiter you need to take care. Sometimes you can just add an extra layer of quotes:

  I said, ``Quote me.''     # -> I said, `Quote me.'

However, in other cases, you might need an opening quote without m4 interpreting it as such.

The general way around this problem is to use the changequote macro, e.g.,

  changequote(<!,!>)
  a `quo<!ted str!>ing'

outputs

  a `quoted string'

Without parameters, changequote restores the default delimiters.
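
As a minimal sketch of switching delimiters and then restoring the defaults (the macro name GREETING is invented for illustration), the following input should behave as annotated:

```m4
changequote([,])dnl
define([GREETING], [hello])dnl
[GREETING] is GREETING      # -> GREETING is hello
changequote`'dnl
`GREETING' is GREETING      # -> GREETING is hello
[GREETING] is GREETING      # -> [hello] is hello
```

Once the defaults are back, the square brackets are ordinary characters again, which is why the GREETING between them is expanded on the last line.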

In general, it is best to avoid using changequote. You can define macros to insert literal quotes should you need them.

Sometimes, however, it is necessary to change the quote character globally, e.g., because the backtick character is not available on some keyboards or because the text being processed makes extensive use of the default quote characters. If you do use changequote then be aware of the pitfalls:

GNU m4's changequote can differ from other implementations of m4 and from earlier versions of GNU m4. For portability, call changequote only with two arguments – or with no arguments, i.e.,

  changequote`'    # (trailing `' is separator if needed)

Note that changequote changes how existing macros are interpreted, e.g.,

  define(x,``xyz'')
  x                    # -> xyz
  changequote({,})
  x                    # -> `xyz'

Don't choose the same delimiter for the left and right quotes: doing so makes it impossible to have nested quotes.

Don't change a quote delimiter to anything that begins with a letter or underscore or a digit; m4 won't complain but it only recognizes a delimiter if it starts with a punctuation character. A digit may be recognized as a delimiter but not if it is scanned as part of the preceding token.

While later versions of GNU m4 have a greater tolerance for non-ASCII characters (e.g., the pound sign or an accented character) it is better to avoid them, certainly in macro names and preferably in delimiters too. If you do use 8-bit characters and m4 is not behaving quite as you expect, this may be the reason. Where multibyte character encoding is used, m4 should not be used at all.

Comments

As mentioned above, line comments are echoed to the output, e.g.,

  define(`VERSION',`A1')
  VERSION # VERSION `quote' unmatched`

expands to

  A1 # VERSION `quote' unmatched`

Comments are not very useful. However, even if you don't use them you need to remember to quote any hash character in order to prevent it being interpreted as the beginning of a comment:

  `#' VERSION              # -> # A1

You can change the opening comment delimiter, e.g., changecom(`@@') – as with changequote, the new delimiter should start with a punctuation character.
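
A short sketch of the effect, assuming GNU m4 and reusing the VERSION macro from above: after changecom(`@@') the hash is an ordinary character, so text following it is expanded, while text following @@ is not. Expected output is noted in dnl lines, which produce no output themselves.

```m4
define(`VERSION',`A1')dnl
changecom(`@@')dnl
VERSION # VERSION @@ VERSION
dnl prints: A1 # A1 @@ VERSION
changecom(`#')dnl
VERSION # VERSION
dnl prints: A1 # VERSION
```

The second changecom call asks for the hash delimiter explicitly, so the final line treats everything after # as a comment again.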

If you want echoing block comments, you can also change the closing delimiter, e.g., for C-like comments,

  changecom(/*,*/)
  VERSION `quote' /* VERSION
  `quote' ` */ VERSION
  # ->
  # A1 quote /* VERSION
  # `quote' ` */ A1

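Without arguments, changecom does not restore the defaults in GNU m4: it disables comments altogether. To get the default delimiters back, ask for them explicitly with changecom(`#').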

Alternatives to comments

For a comment that should not be echoed to the output, use dnl: this macro not only prevents the following newline from being output (as we saw above), it also discards everything up to the newline.

  dnl These two lines will not result
  dnl in any output.

Non-echoing block comments: multiline comments that are not echoed to the output can be written like this

  ifelse(`
  This is a comment
  spanning more than
  one line.
  ')dnl

This is a hack which takes advantage of the fact that the ifelse macro (described below) has no effect if it is passed only one argument. Some versions of m4 may therefore issue a warning about insufficient arguments; GNU m4 doesn't.

Be sure there are no unmatched quotes in the comment text.

Conditionals

ifdef(`a',b) outputs b if a is defined; ifdef(`a',b,c) outputs c if a is not defined. The definition being tested may be empty, e.g.,

  define(`def')
  `def' is ifdef(`def', , not )defined.
  # -> def is defined.
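
One common use of ifdef is to supply a default for a macro that may or may not have been defined already, for example with -D on the command line. A small sketch (TARGET is an invented name):

```m4
ifdef(`TARGET', , `define(`TARGET', `all')')dnl
building TARGET       # -> building all
```

If m4 is invoked with -DTARGET=docs, the same input should expand TARGET to docs instead, because the ifdef then selects its empty second argument and the default is never defined.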

ifelse(a,b,c,d) compares the strings a and b. If they match, the macro expands to string c; if not, string d.

This can be extended to multiple else-ifs:

  ifelse(a,b,c,d,e,f,g)

means that if a matches b, then return (expand to) c; else if d matches e, then return f; else return g. In other words, it's shorthand for

  ifelse(a,b,c,ifelse(d,e,f,g))
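
To make the multi-branch form concrete, here is a small sketch in which a macro (FORMAT, a name chosen only for this example) selects one of three strings:

```m4
define(`FORMAT', `html')dnl
ifelse(FORMAT, `html', `<hr>', FORMAT, `text', `----', `???')   # -> <hr>
define(`FORMAT', `text')dnl
ifelse(FORMAT, `html', `<hr>', FORMAT, `text', `----', `???')   # -> ----
define(`FORMAT', `pdf')dnl
ifelse(FORMAT, `html', `<hr>', FORMAT, `text', `----', `???')   # -> ???
```

The final, unpaired argument acts as the “else” branch; if it is omitted and nothing matches, ifelse expands to nothing.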

Numbers

M4 normally treats numbers as strings. However, the eval macro allows access to integer arithmetic; expressions can include these operators (in decreasing order of precedence):

  + -                  unary plus and minus
  **                   exponent
  * / %                multiplication, division, modulo (eval(8/-5) -> -1)
  + -                  addition and subtraction
  << >>                shift up or down (eval(-8>>1) -> -4)
  == != < <= >= >      relational
  !                    logical not (converts non-zero to 0, 0 to 1)
  ~                    bitwise not (eval(~0) -> -1)
  &                    bitwise and (eval(6&5) -> 4)
  ^                    bitwise exclusive or (eval(3^2) -> 1)
  |                    bitwise or (eval(1|2) -> 3)
  &&                   logical and
  ||                   logical or

The above table is for GNU m4; unfortunately, the operators and precedence are version-dependent. Some versions of m4 incorrectly treat ^ the same as ** (exponent). For maximum compatibility, make liberal use of parentheses to enforce precedence.

Should you need it, octal, hexadecimal and indeed arbitrary radix arithmetic are available. It's also possible to specify the width of eval's output. (See the m4 info pages for details on these.)

  eval(7*6)        # -> 42
  eval(7/3+100)    # -> 102
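
In GNU m4 the radix and width mentioned above are given as optional second and third arguments to eval, and C-style prefixes are accepted on input. A brief sketch, assuming GNU m4:

```m4
eval(`0x1f')        # -> 31
eval(010)           # -> 8
eval(255, 16)       # -> ff
eval(5, 2)          # -> 101
eval(5, 2, 8)       # -> 00000101
```

Note that a leading zero makes a number octal, which can be a surprise when a value such as a zero-padded month or day is passed to eval.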

There are also incr and decr builtins as shortcuts which expand to the argument plus or minus one, e.g., incr(x) is equivalent to eval(x+1):

  define(`n', 0)
  n # -> 0
  define(`n', incr(n))
  n # -> 1

Beware of silent integer overflow, e.g., on my machine, the integer range is -2**31 ... 2**31-1; eval(2**31) erroneously expands to -2147483648.

Logical conditions can be checked like this:

  `n' is ifelse(eval(n < 2), 1, less than ,
     eval(n == 2), 1, , greater than )2
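
The same test can be wrapped in a macro so that it is easy to try with several values. This is only a sketch; cmp2 is an invented name, and it relies on macro arguments ($1), which are described later in this document:

```m4
define(`cmp2', `$1 is ifelse(eval($1 < 2), 1, `less than ',
   eval($1 == 2), 1, `', `greater than ')2')dnl
cmp2(1)    # -> 1 is less than 2
cmp2(2)    # -> 2 is 2
cmp2(5)    # -> 5 is greater than 2
```

Each eval expands to 1 or 0, and ifelse compares that result with the string 1.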

Strings

len:

  len(`hello')                     # -> 5

substr:

  substr(`hello', 1, 3)            # -> ell
  substr(`hello', 2)               # -> llo

index:

  index(`hello',`llo')             # -> 2
  index(`not in string', `xyz')    # -> -1

translit:

  define(`ALPHA', `abcdefghijklmnopqrstuvwxyz')
  define(`ALPHA_UPR', `ABCDEFGHIJKLMNOPQRSTUVWXYZ')
  define(`ROT13', `nopqrstuvwxyzabcdefghijklm')
  
  translit(`abc ebg13', ALPHA, ALPHA_UPR)
  # -> ABC EBG13
  
  translit(`abc ebg13', ALPHA, ROT13)
  # -> nop rot13

GNU m4 includes some additional string macros: regexp, to search for a regular expression in a string, and patsubst, to do find and replace.
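
A brief sketch of both macros, assuming GNU m4 (the sample strings are made up). With two arguments, regexp expands to the index of the first match; with a third argument, it expands to that replacement text, in which \1, \2, ... refer to the parenthesized groups:

```m4
regexp(`version 1.4.10', `[0-9]+')      # -> 8
regexp(`version 1.4.10', `\([0-9]+\)\.\([0-9]+\)', `major \1, minor \2')
# -> major 1, minor 4
patsubst(`hello world', `o', `0')       # -> hell0 w0rld
```

Groups are written with backslashed parentheses, and as with index, regexp expands to -1 when nothing matches.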

Unfortunately, m4's usual approach of rescanning the expansion of a macro can be a problem with macros that operate on strings:

  define(`eng',`engineering')
  substr(`engineer',0,3)           # -> eng -> engineering
  translit(`rat', ALPHA, ROT13)    # -> eng -> engineering

This is not normally the desired behaviour and is arguably a design bug in m4: the builtins should at least provide some way to allow us to prevent the extracted or transformed substring from being expanded. A workaround is suggested below.

Defining macros with arguments; a recursive macro

In standard m4 (Unix), a macro can have up to 9 arguments; within the macro definition, these are referenced as $1 ... $9. (GNU m4 has no fixed limit on the number of arguments.) Arguments default to the empty string, e.g., if 2 arguments are passed then $3 will be empty.

Going in at the deep end, here is a reimplementation of the len builtin (replacing it) as a recursive macro.

  define(`len',`ifelse($1,,0,`eval(1+len(substr($1,1)))')')

In a macro definition, argument references like $1 expand immediately, regardless of surrounding quotes. For example, len(`xyz') above would expand (at the first step) to

  ifelse(xyz,,0,`eval(1+len(substr(xyz,1)))')

Where necessary, this immediate expansion can be prevented by breaking up the reference with inside quotes, e.g., $`'1.

The name of the macro is given by $0; $# expands to the number of arguments. Note in the following example that empty parentheses are treated as delimiting a single argument, an empty string:

  define(`count', ``$0': $# args')
  count        # -> count: 0 args
  count()      # -> count: 1 args
  count(1)     # -> count: 1 args
  count(1,)    # -> count: 2 args

$* expands to the list of arguments; $@ does the same but protects each one with quotes to prevent them being expanded:

  define(`list',`$`'*: $*; $`'@: $@')
  list(len(`abc'),`len(`abc')')
  # -> $*: 3,3; $@: 3,len(`abc')

A common requirement is to process a list of arguments where we don't know in advance how long the list will be. Here, the shift macro comes in useful – it expands to the same list of arguments with the first one removed:

  shift(1,2, `abc', 4)       # -> 2,abc,4
  shift(one)                 # ->
  define(`echolast',`ifelse(eval($#<2),1,`$1`'',
    `echolast(shift($@))')')
  echolast(one,two,three)    # -> three
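
The same shift-and-recurse pattern can accumulate a result instead of discarding all but the last item. As a sketch, here is a macro that adds up its arguments (sum is an invented name, not a builtin):

```m4
define(`sum', `ifelse($#, 0, 0, $#, 1, `eval($1 + 0)',
  `eval($1 + sum(shift($@)))')')dnl
sum(1, 2, 3, 4)    # -> 10
sum(7)             # -> 7
sum                # -> 0
```

Each level of recursion handles $1 and passes the remaining arguments on via shift($@); using $@ rather than $* keeps each remaining argument quoted as it is handed to the next call.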

Scope of macros; local variables

All macros have global scope.

What if we want a “local variable” – a macro that is used only within the definition of another macro? In particular, suppose we want to avoid accidentally redefining a macro used somewhere else.

One possibility is to prefix “local” macro names with the name of the containing macro. Unfortunately, this isn't entirely satisfactory – and it won't work at all in a recursive macro. A better approach is described in the next section.

Pushing and popping macro definitions

For each macro, m4 actually creates a stack of definitions – the current definition is just the one on top of the stack. It's possible to temporarily redefine a macro by using pushdef to add a definition to the top of the stack and, later, popdef to destroy only the topmost definition:

  define(`USED',1)
  define(`proc',
    `pushdef(`USED',10)pushdef(`UNUSED',20)dnl
  `'`USED' = USED, `UNUSED' = UNUSED`'dnl
  `'popdef(`USED',`UNUSED')')
  proc     # -> USED = 10, UNUSED = 20
  USED     # -> 1

If the macro hasn't yet been defined then pushdef is equivalent to define. As with undefine, it is not an error to popdef a macro which isn't currently defined; it simply has no effect.

In GNU m4, define(X,Y) works like popdef(X)pushdef(X,Y), i.e., it replaces only the topmost definition on the stack; in some implementations, define(X,Y) is equivalent to undefine(X)define(X,Y), i.e., the new definition replaces the whole stack.
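
The stack behaviour can be seen directly in a short sketch, assuming GNU m4 (COLOR is an invented macro name):

```m4
define(`COLOR', `red')dnl
pushdef(`COLOR', `green')dnl
COLOR                  # -> green
define(`COLOR', `blue')dnl  in GNU m4 this replaces only the top entry (green)
COLOR                  # -> blue
popdef(`COLOR')dnl
COLOR                  # -> red
popdef(`COLOR')dnl
COLOR                  # -> COLOR
```

In an implementation where define replaces the whole stack, the first popdef would leave COLOR undefined, so the third COLOR would be output unchanged rather than as red.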

Macros that don't expand without arguments

When GNU m4 encounters a word such as “define” that corresponds to a builtin that requires arguments, it leaves the word unchanged unless it is immediately followed by an opening parenthesis.

  define(`MYMACRO',`text')    # ->
  define a macro              # -> define a macro

Actually, we can say that m4 does expand the macro – but that it expands only to the same literal string. We can make our own macros equally intelligent by adding an ifelse – or an extra clause to an existing “ifelse”:

  define(`reverse',`ifelse($1,,,
   `reverse(substr($1,1))`'substr($1,0,1)')')
  reverse drawer: reverse(`drawer')     # ->  drawer: reward
  
  define(`reverse',`ifelse($#,0,``$0'',$1,,,
   `reverse(substr($1,1))`'substr($1,0,1)')')
  reverse drawer: reverse(`drawer')     # -> reverse drawer: reward

Name clashes: making macro names safe

Unfortunately, some macros do not require arguments and so m4 has no way of knowing whether a word corresponding to a macro name is intended to be a macro call or just accidentally present in the text being processed.

Also, other versions of m4, and older versions of GNU m4, may expand macro names which are not followed by arguments even where GNU m4 does not:

  # GNU m4 1.4.10
  we shift the responsibility    # -> we shift the responsibility
  # GNU m4 1.4.5
  we shift the responsibility    # -> we  the responsibility

In general, the problem is dealt with by quoting any word that corresponds to a macro name:

  we `shift' the responsibility  # -> we shift the responsibility

However, if you are not fully in control of the text being passed to m4 this can be troublesome. Many macro names, like “changequote”, are unlikely to occur in ordinary text. Potentially more problematic are dictionary words that are recognized as macros even without arguments:

  • divert, undivert (covered below)
  • windows

(“windows” – as well as “unix” and “os2” – is defined in some versions of m4 as a way of testing the platform on which m4 is running; by default it is not defined in GNU m4.)

An alternative to quoting macro names is to change all m4's macro names so that they won't clash with anything. Invoking m4 with the -P command-line option prefixes all builtins with “m4_”:

  define(`M1',`text1')M1          # -> define(M1,text1)M1
  m4_define(`M1',`text1')M1       # -> text1

On the basis that unnecessary changes to a language are generally undesirable, I suggest not using the -P option if you can comfortably avoid it.

However, if you are writing a set of m4 macros that may be included by others as a module, do add some kind of prefix to your own macros to reduce the possibility of clashes.

Loops

Although m4 provides no builtins for iteration, it is not difficult to create macros which use recursion to do this. Various implementations can be found on the web. This author's “for” loop is:

  define(`for',`ifelse($#,0,``$0'',`ifelse(eval($2<=$3),1,
    `pushdef(`$1',$2)$4`'popdef(`$1')$0(`$1',incr($2),$3,`$4')')')')
  
  for n = for(`x',1,5,`x,')...      # -> for n = 1,2,3,4,5,...
  
  for(`x',1,3,`for(`x',0,4,`eval(5-x)') ')
  # -> 54321 54321 54321

Note the use of pushdef and popdef to prevent loop variables clobbering any existing variable; in the nested for loop, this causes the second x to hide (shadow) the first one during execution of the inner loop.

A “for each” macro might be written:

  define(`foreach',`ifelse(eval($#>2),1,
    `pushdef(`$1',`$3')$2`'popdef(`$1')dnl
  `'ifelse(eval($#>3),1,`$0(`$1',`$2',shift(shift(shift($@))))')')')
  
  foreach(`X',`Open the X. ',`door',`window')
  # -> Open the door. Open the window.
  
  foreach(`X',`foreach(`Y',`Y the X. ',`Open',`Close')',`door',`window')
  # -> Open the door. Close the door. Open the window. Close the window.
  
  define(`OPER',``$2 the $1'')
  foreach(`XY',`OPER(XY). ', ``window',`Open'', ``door',`Close'')
  # -> Open the window. Close the door.

In a “for” loop of either kind, it can be useful to know when you've reached the last item in the sequence:

  define(`foreach',`ifelse(eval($#>2),1,
    `pushdef(`last_$1',eval($#==3))dnl
  `'pushdef(`$1',`$3')$2`'popdef(`$1')dnl
  `'popdef(`last_$1')dnl
  `'ifelse(eval($#>3),1,`$0(`$1',`$2',shift(shift(shift($@))))')')')
  
  define(`everyone',``Tom',`Dick',`Harry'')
  foreach(`one',`one`'ifelse(last_one,0,` and ')',everyone).
  # -> Tom and Dick and Harry.

Finally, a simple “while” loop macro:

  define(`while',`ifelse($#,0,``$0'',eval($1+0),1,`$2`'$0($@)')')
  
  define(`POW2',2)
  while(`POW2<=1000',`define(`POW2',eval(POW2*2))')
  POW2                             # -> 1024

Here, the apparently redundant +0 in eval($1+0) does have a purpose: without it, a while without arguments expands to

  ifelse(0,0,``while'',eval() ...

whereupon eval() produces an empty argument warning.

Suspending and discarding output: Buffers and redirection

To discard output – in particular, to prevent newlines in a set of definitions being output – use divert:

  divert(-1)
  <definitions...>
  divert(0)dnl

Unlike the contents of a comment, the definitions (and any other macros) are still processed by m4; divert(-1) merely causes m4 to do this silently, without sending anything to the output.

The last line above, with its dnl to prevent the following newline being echoed, could also have been written:

  divert`'dnl
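
Put together, a complete input file might look like the following sketch (NAME and GREETING are invented macro names):

```m4
divert(-1)
This text is discarded, but the definitions still take effect.
define(`NAME', `World')
define(`GREETING', `Hello')
divert(0)dnl
GREETING, NAME!
```

The only output should be the single line “Hello, World!”; neither the free text nor the newlines after each define reach stdout, so no dnl is needed inside the diverted block.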

divnum expands to the number of the currently active diversion; 0, the default, means standard output (stdout); positive numbers are temporary buffers which are output in numeric order at the end of processing. Standard m4 has 9 buffers (1..9); in GNU m4 there is no fixed limit.
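
As a small sketch of the ordering, the following input writes to diversions 2 and 1 before writing to stdout:

```m4
divert(2)dnl
text held in diversion divnum
divert(1)dnl
text held in diversion divnum
divert(0)dnl
text sent straight to diversion divnum
```

The expected output is the line for diversion 0 first, then the line for diversion 1, then the line for diversion 2: the two buffers are flushed in numeric order at the end of input, regardless of the order in which they were filled. Note that divnum is expanded when the text is diverted, not when it is flushed.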

undivert(num) appends the contents of diversion num to the current diversion (normally stdout), emptying it; without arguments, undivert retrieves all diversions in numeric order. Note that undivert() is the same as undivert(0) and has no effect: diversion 0 is stdout which is effectively an empty buffer.

The contents of the buffer are not interpreted when undivert is run; they are simply output as raw text, e.g., the following code results in Z Z Z being output (not 9 9 9):

  divert(1)
  Z Z Z
  divert
  define(`Z',9)
  undivert(1)

There is an implicit divert and undivert when m4 reaches the end of the input, i.e., all buffers are flushed to the standard output. If you want to avoid this for any reason, you can of course discard the contents of the buffers by putting the following line at the end of your input

  divert(-1)undivert

or by exiting using the m4exit builtin.

Including files

include(filename.m4) causes the contents of the named file to be read and interpreted as if it was part of the current file (just like #include in the C preprocessor).

GNU m4 allows for an include file search path. To specify directories to be searched for include files use the -I option on the command line, e.g.,

  m4 -I ~/mydir -Ilocaldir/subdir

or use the environment variable M4PATH, e.g. (bash shell)

  export M4PATH=~/mydir:localdir/subdir
  m4 test.m4

sinclude(nonexistentfile) (silent include) is a version of include that doesn't complain if the file doesn't exist.

To include a file uninterpreted, GNU m4 allows undivert to be passed a filename argument. If inc.m4 contains

  define(`planet',`jupiter')

then

  undivert(`inc.m4')       # -> define(`planet',`jupiter')
  planet                   # -> planet
  include(`inc.m4')planet  # -> jupiter

Accessing the shell; creating temporary files

A system command can be passed to the shell, e.g.,

  syscmd(`date --iso-8601|sed s/-/./g')

outputs something like 2007.10.16.

The output from the command sent to syscmd is not interpreted:

  syscmd(`echo "define(\`AUTHOR',\`Orwell')"')
                          # -> define(`AUTHOR',`Orwell')
  AUTHOR                  # -> AUTHOR

However, GNU m4 provides another macro, esyscmd, that does process the output of the shell command:

  esyscmd(`echo "define(\`AUTHOR',\`Orwell')"')
                          # ->
  AUTHOR                  # -> Orwell

The macro sysval expands to the exit status of the last shell command issued (0 for success):

  sysval                  # -> 0
  esyscmd(`ls /no-dir/')
  sysval                  # -> 2

Naturally, m4 can be used as a filter in shell scripts or interactively:

  echo "eval(98/3)"|m4

outputs 32.

Temporary files can be created to store the output of shell commands: maketemp(prefixXXXXXX) creates a temporary file and expands to the filename. This name will be the (optional) prefix with the six X's replaced by six random letters and digits. In older versions of GNU m4 and in other implementations of m4, the X's are generated from the process ID, which in certain contexts may be a security hole. Another macro, mkstemp, is available in newer versions of m4; it always generates a random filename suffix.

  define(`FILENAME',mkstemp(`/tmp/myscriptXXXXXX'))

The temporary file can be read in using include (perhaps in conjunction with divert).
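
For instance, the following sketch writes a definition to a temporary file from the shell, reads it back with include, and then deletes the file. It assumes a GNU m4 recent enough to provide mkstemp and a Unix shell; the names TMP and HOST and the /tmp/m4notes prefix are made up for illustration:

  define(`TMP', mkstemp(`/tmp/m4notesXXXXXX'))dnl
  syscmd(`echo "define(\`HOST', \`example')" > 'TMP)dnl
  include(TMP)dnl
  HOST                    # -> example
  syscmd(`rm -f 'TMP)dnl

Note that TMP is deliberately left outside the quotes in the syscmd calls so that it is expanded to the filename before the command is passed to the shell.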

Debugging

Most bugs relate to problems with quoting so check that first.

If you want to see step-by-step what m4 is doing, either invoke it with the -dV option or, to limit full debug output to one part of the file,

  debugmode(V)
  ...problematic section...
  debugmode

The V flag is for full debugging; other flags for finer control are described in the info pages.

dumpdef(`macro', ...) outputs to standard error the formatted definition of each argument – or just <macro> if macro is a builtin; dumpdef without arguments dumps all definitions to stderr. Nothing is sent to stdout.

For user-defined macros, defn(`macro') expands to the definition string (i.e., not prefixed by the macro name).

errprint(`this message goes to standard error (stderr)')
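
Putting these together, a short sketch using a hypothetical macro, greet (the text after "stderr:" is what appears on standard error; GNU m4 separates the name from the definition with a tab):

  define(`greet', `Hello, $1!')
  dumpdef(`greet')        # stderr: greet:  `Hello, $1!'
  dumpdef(`define')       # stderr: define:  <define>
  defn(`greet')           # -> Hello, $1!
  errprint(`reached the checkpoint')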

Aliasing and renaming macros (including builtins)

Suppose we want to allow strlen to be used instead of len. This won't work:

  define(`strlen',`len')
  strlen(`hello')           # -> len

because we forgot to relay the arguments:

  define(`strlen',`len($@)')
  strlen(`hello')           # -> 5

OK, but suppose we want to replace len altogether. Clearly, this doesn't work:

  define(`strlen',`len($@)')undefine(`len')
  strlen(`hello')           # -> len(hello)

since expansion now stops at len.

However, using the builtin defn to access the definition of a macro, it's possible to alias or rename macros quite simply. For user-defined macros, defn expands to the text of the macro (protected with quotes before being output). The defn of a builtin expands in most contexts to the empty string – but when passed as an argument to “define” it expands to a special token that has the desired effect:

  define(`rename', `define(`$2',defn(`$1'))undefine(`$1')')
  rename(`define',`create')
  create(`vehicle',`truck')
  vehicle                   # -> truck
  define(`fuel',`diesel')   # -> define(fuel,diesel)
  fuel                      # -> fuel

And, because the intelligence is built into the macro definition, m4 is still smart enough not to expand the word “create” unless it is followed by arguments – compare the indirect approach, where defn is not used:

  create a macro            # -> create a macro
  create(`new',`create($@)')
  new(`wheels', 6)
  new wheels                # ->  6
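
By contrast, a plain alias, which keeps the original name available, needs only the define half of rename. A minimal sketch, assuming a fresh m4 session in which define and len still have their usual meanings:

  define(`strlen', defn(`len'))
  strlen(`hello')           # -> 5
  len(`hello')              # -> 5
  strlen                    # -> strlen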

Accessing internal builtins

Even when you undefine a builtin or define another macro with the same name, GNU m4 still keeps the internal definition which can be called indirectly via the macro builtin:

  define(`TREE',`maple')
  undefine(`define',`undefine')
  undefine(`TREE')             # -> undefine(TREE)
  TREE                         # -> maple
  builtin(`undefine',`TREE')
  TREE                         # -> TREE
  builtin(`define',`create',`builtin'(``define'',$`'@))
  create(`TREE',`ash')
  TREE                         # -> ash

(Note the judicious use of quotes for the last argument to the call to builtin which defines the create macro above. Because of the use of inner quotes, the usual approach of surrounding the whole argument with quotes, i.e.,

  builtin(`define',`create',`builtin(`define',$`'@)')

would not have worked as desired: instead, any call to the create macro would have ended up defining a macro called “$@”.)

Because they can be accessed only indirectly and so don't need to be protected, the names of these internal macros are not changed by the -P flag.

Macros for literal quotes

The obvious way to prevent the characters ` and ' being interpreted as quotes is to change m4's quote delimiters as described above. This has some drawbacks: for example, to ensure the new delimiters don't accidentally occur anywhere else, more than one character may be needed for each delimiter, and if there's a lot of quoting, the code will become more verbose and perhaps more difficult to read.

Another approach is to keep m4's existing quote delimiters and define macros which hide the backtick and apostrophe from m4. The trick is to balance the quotes while m4 still sees them as nested quotes, temporarily change the quoting, and then prevent one of the quotes being output:

  define(`LQ',`changequote(<,>)`dnl'
  changequote`'')
  define(`RQ',`changequote(<,>)dnl`
  'changequote`'')
  
  define(myne, `It`'RQ()s mine!')
  LQ()LQ()myne''                 # -> ``It's mine!''

Indirect macro calls

GNU m4 allows any macro to be called indirectly using the macro indir:

  indir(`define',`SIZE',78)
  SIZE                           # -> 78
  indir(`SIZE')                  # -> 78

This is useful where the name of the macro to be called is derived dynamically or where it does not correspond to a token (i.e., a macro name with spaces or punctuation).
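
As a sketch of the dynamic case (the handler-... macro names and the dispatch macro are invented for this example), a macro name containing a hyphen can be built from an argument and then called through indir:

  define(`handler-start', `starting $1')
  define(`handler-stop',  `stopping $1')
  define(`dispatch', `indir(`handler-$1', `$2')')
  dispatch(`start', `engine')   # -> starting engine
  dispatch(`stop', `engine')    # -> stopping engine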

Compared to an ordinary call, there are two differences to be aware of:

  • the called macro must exist, otherwise m4 issues an error
  • the arguments are processed before the definition of the macro being called is retrieved

  indir(`define(`SIZE')',67)
  # -> m4: undefined macro `define(`SIZE')'
  indir(`SIZE', indir(`define',`SIZE',53))    # -> 53
  indir(`SIZE', indir(`undefine',`SIZE'))
  # -> m4: undefined macro `SIZE'

We can of course define our own higher-order macros. For example, here is a macro, do, roughly similar to indir above:

  define(do, $1($2, $3, $4, $5))
  do(`define', ``x'', 4)
  x                              # -> 4

Since extra arguments are normally ignored, do works for any macro taking up to 4 arguments. Note however that the example here, which expands to define(`x', 4, , ), does generate a warning: “excess arguments to builtin `define' ignored”.
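
One way to avoid both the argument limit and the warning, sketched here as an alternative definition using the standard shift builtin, is to pass on exactly the arguments that were given:

  define(`do', `$1(shift($@))')
  do(`define', `y', 5)
  y                              # -> 5

Here a single level of quoting around y is enough, because $@ and shift each re-quote the arguments they pass on.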

Recursion pitfall: nesting limits

Pretend we don't know that the sum n + (n-1) + ... + 1 is given by n*(n+1)/2 and so we define a recursive macro to calculate it:

  define(`sigma',`ifelse(eval($1<=1),1,$1,`eval($1+sigma(decr($1)))')')

If too large a number is passed to this macro then m4 may crash with a message like

  ERROR: recursion limit of 1024 exceeded

(for GNU m4 1.4.10). In fact, the problem is not that sigma is recursive but the degree of nesting in the expansion, e.g., sigma(1000) will expand to

  eval(1000 + eval(999 + eval(998 + eval(997 + ...

The nesting limit could be increased using a command line option (-L). However, we do better to avoid the problem by performing the calculation as we go using an extra parameter as an accumulator:

  define(`sigma',`ifelse(eval($1<1),1,$2,`sigma(decr($1),eval($2+$1))')')

Now, no matter how many steps in the expansion, the amount of nesting is limited at every step, e.g., sigma(1000) becomes

  ifelse(eval(1000<1),1,,`sigma(decr(1000),eval(+1000))')

which becomes sigma(999,1000) which in turn expands to

  ifelse(eval(999<1),1,1000,`sigma(decr(999),eval(1000+999))')

and so on.

Here, the default value of the added parameter (an empty string) worked OK. In other cases, an auxiliary macro may be required: the auxiliary macro will then be the recursive one; the main macro will call it, passing the appropriate initial value for the extra parameter.
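
A factorial macro is one such case, since the accumulator must start at 1 rather than at the empty string. In this sketch (the names fact and _fact are chosen for illustration), fact is the main macro and _fact is the recursive auxiliary:

  define(`fact', `_fact($1, 1)')
  define(`_fact', `ifelse(eval($1<=1),1,$2,`_fact(decr($1),eval($2*$1))')')
  fact(5)                      # -> 120
  fact(10)                     # -> 3628800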

Using unexpanding macros for arrays and hashes

Although it is not standard, GNU m4 allows any text string to be defined as a macro. Since only valid identifiers are checked against macros, macros whose names include spaces or punctuation characters will not be expanded. However, they can still be accessed as variables using the defn macro:

  define(`my var', `a strange one')
  my var is defn(`my var').    # -> my var is a strange one.

This feature can be used to implement arrays and hashes (associative arrays):

  define(`_set', `define(`$1[$2]', `$3')')
  define(`_get', `defn(`$1[$2]')')
  _set(`myarray', 1, `alpha')
  _get(`myarray', 1)                  # -> alpha
  _set(`myarray', `alpha', `omega')
  _get(`myarray', _get(`myarray',1))  # -> omega
  defn(`myarray[alpha]')              # -> omega
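
Since each element is an ordinary macro definition, the usual builtins apply to it. As a sketch, a hypothetical _isset macro can test for an element with ifdef, and undefine removes one:

  define(`_set', `define(`$1[$2]', `$3')')
  define(`_isset', `ifdef(`$1[$2]', `yes', `no')')
  _set(`color', `sky', `blue')
  _isset(`color', `sky')        # -> yes
  _isset(`color', `grass')      # -> no
  undefine(`color[sky]')
  _isset(`color', `sky')        # -> no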

String macro problem workaround

Above, we noted a problem with the string macros: it's not possible to prevent the string that's returned from being expanded.

Steven Simpson wrote a patch for m4 which fixes the problem by allowing an extra parameter to be passed to string macros – however this of course means using a non-standard m4.

A less radical fix is to redefine the substr macro as follows. It works by extracting the substring one letter at a time, thus avoiding any unwanted expansion (assuming, of course, that no one-letter macros have been defined):

  define(`substr',`ifelse($#,0,``$0'',
  $#,2,`substr($@,eval(len(`$1')-$2))',
  `ifelse(eval($3<=0),1,,
  `builtin(`substr',`$1',$2,1)`'substr(
  `$1',eval($2+1),eval($3-1))')')')dnl
  
  define(`eng',`engineering')
  substr(`engineer',0,3)       # -> eng

To keep it simple, this definition assumes reasonably sensible arguments, e.g., it doesn't allow for substr(`abcdef', -2) or substr(`abc'). Note that, as with the corresponding builtin substr, you may have problems where a string contains quotes, e.g., substr(``quoted'',0,3).
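
For comparison, with the definitions above still in effect, calling the original through the builtin macro shows the unwanted expansion that the redefinition avoids:

  builtin(`substr',`engineer',0,3)   # -> engineering
  substr(`engineer',0,3)             # -> eng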

The new version of substr can in turn be used to implement a new version of translit:

  define(`translit',`ifelse($#,0,``$0'',
  len(`$1'),0,,
  `builtin(`translit',substr(`$1',0,1),`$2',`$3')`'translit(
  substr(`$1',1),`$2',`$3')')')dnl
  
  define(`ALPHA', `abcdefghijklmnopqrstuvwxyz')
  define(`ALPHA_UPR', `ABCDEFGHIJKLMNOPQRSTUVWXYZ')
  translit(`alpha', ALPHA, ALPHA_UPR)
  # -> ALPHA

M4: Assessment

M4's general character as a macro language can be seen by comparing it to another, very different macro language: FreeMarker.

GNU m4 and FreeMarker are both free in both senses of the word: FreeMarker is covered by a BSD-style license. They are more-or-less equally “powerful”, e.g., both languages support recursive macros.

In some respects, m4 has an edge over FreeMarker:

  • m4 is a standalone tool, FreeMarker requires Java.
  • On Unix platforms, m4 is a standard tool with a long heritage – e.g., a Makefile can reasonably expect to be able to invoke it as a filter in a processing sequence.
  • m4 scripts can interact with the Unix shell.
  • m4 is arguably a simpler, “cleaner”, macro language.

The two languages are quite different in appearance and how they work. In m4, macros are ordinary identifiers; FreeMarker uses XML-like markup for the <#opening> and </#closing> delimiters of macros. While m4's textual rescanning approach is conceptually elegant, it can be confusing in practice and demands careful attention to layers of nested quotes. FreeMarker, in comparison, works like a conventional structured programming language, making it much easier to read, write and debug. On the other hand, FreeMarker markup is more verbose and might seem intrusive in certain contexts, for example, where macros are used to extend an existing programming language.

FreeMarker has several distinct advantages:

  • it has an associated tool, FMPP, which can read in data from different sources (e.g., in XML or CSV format) and incorporate it into the template output.
  • FreeMarker has a comprehensive set of builtin macros and better data handling capabilities.
  • No compatibility issues: there is a single, cross-platform implementation that is quite stable and mature (whereas even recent GNU m4 versions are not strictly backward compatible with one another).
  • FreeMarker supports Unicode; m4 is generally limited to ASCII, or at best 8-bit character sets.

Ultimately, which language is “better” depends on the importance of their relative advantages in different contexts. This author has very positive experience of using FreeMarker/FMPP for automatic code generation where, for several reasons, m4 was unsuitable. On the other hand, m4 is clearly a more sensible and appropriate choice for Unix sendmail's configuration macros.