Notes on the M4 Macro Language
Michael Breen © 2008-
- About this document
- What is m4?
- Basics: Simple macros, whitespace, quoting, comments
- How m4 works
- Quotes, escaping and non-ASCII characters
- Comments
- Alternatives to comments
- Conditionals
- Numbers
- Strings
- Defining macros with arguments; a recursive macro
- Scope of macros; local variables
- Pushing and popping macro definitions
- Macros that don't expand without arguments
- Name clashes: making macro names safe
- Loops
- Suspending and discarding output: Buffers and redirection
- Including files
- Accessing the shell; creating temporary files
- Debugging
- Aliasing and renaming macros (including builtins)
- Accessing internal builtins
- Macros for literal quotes
- Indirect macro calls
- Recursion pitfall: nesting limits
- Using unexpanding macros for arrays and hashes
- String macro problem workaround
- M4: Assessment
About this document
Which m4?
This document describes GNU m4, as included with Linux; areas of potential incompatibility of which I am aware are mentioned as they arise and highlighted with a boldface “GNU”.
This was originally based on GNU m4 version 1.4.5; it has been updated for version 1.4.10.
Who should read this?
You may find this helpful if
- you want to decide whether m4 is the tool you need for some task (once you get a rough idea of what the language is about, you might want to skip down to the comparative assessment)
- you need to quickly get up to speed on m4, or revise or (perhaps) learn more about the language
You should already be familiar with fundamental programming concepts (e.g., recursion).
How is this different from the manual?
There is a substantial overlap between the GNU m4 info pages and this document. The info pages are designed to be a comprehensive reference. This document is a much shorter “m4 by example” which is still “practically” complete – that is, I have tried to include:
- everything helpful in using m4 effectively
- anything that might cause a problem if you weren't aware of it
Examples of the kind of details omitted are:
- experimental features that may disappear in future versions
- the ways different versions of m4 handle the changequote macro (in practice, all you need to know are the restrictions to observe in order to ensure compatibility)
- details on the myriad debugging flags: effective debugging is possible using just two or three flags and macros
There is also some original material here:
- tips, e.g., macros to protect unbalanced quote characters inside quotes
- different examples
What is m4?
M4 can be called a “template language”, a “macro language” or a “preprocessor language”. The name “m4” also refers to the program which processes texts in this language: this “preprocessor” or “macro processor” takes as input an m4 template and sends this to the output, after acting on any embedded directives, called macros.
At its most basic, it can be used for simple embedded text replacement. If m4 receives the input
define(AUTHOR, William Shakespeare)
A Midsummer Night's Dream by AUTHOR
then it outputs
A Midsummer Night's Dream by William Shakespeare
While similar in principle to the better-known C preprocessor, it is a far more powerful, general-purpose tool. Some significant uses are:
- sendmail: sendmail's rather cryptic configuration file (/etc/mail/sendmail.cf) is generated using m4 from a template file that is much easier to read and edit (/etc/mail/sendmail.mc).
- GNU Autoconf: m4 macros are used to produce “configure” scripts which make source code packages portable across different Unix-like platforms.
- Security Enhanced Linux: SELinux policy files are (at time of writing) processed using m4. (In fact, m4 is the source of some difficulties here because its flexibility allows abuses and makes automated policy analysis difficult to apply.)
Basics: Simple macros, whitespace, quoting, comments
M4 is a Unix filter program. Its arguments, if any, are the files it is to read; if none is specified then it reads from stdin. The resulting text is sent to stdout.
M4 comes with an initial set of built-in macros, often simply called “builtins”. The most basic of these, define, is used to create new macros:
define(AUTHOR, W. Shakespeare)
After this definition, the word “AUTHOR” is recognized as a macro that expands to “W. Shakespeare”.
The define macro itself – including its two arguments – expands to an empty string, that is, it produces no output. However, the newline at the end of the AUTHOR definition above is still echoed to the output. If a blank line added to the output is a problem then you can suppress it using the “delete to newline” macro, dnl:
define(AUTHOR, W. Shakespeare)dnl
There is no space between the end of the macro and the dnl: if there were then that space would be echoed to the output.
No whitespace is allowed between a macro name and the opening parenthesis. Unquoted whitespace at the beginning of an argument is discarded (whitespace at the end of an argument is kept). Thus the following definition is equivalent to the one above:
define( AUTHOR,W. Shakespeare)dnl
It's also possible to pass definitions on the command line using the -D option, for example:
m4 -DAUTHOR="W. Shakespeare" -DYEAR=1587 input_file.m4
Quoting a string suppresses macro expansion. The default quote characters are the backtick (`) and apostrophe ('). M4 strips off these delimiters before outputting the string. Thus
define(AUTHOR, W. Shakespeare)dnl
`AUTHOR' is AUTHOR
produces the output
AUTHOR is W. Shakespeare
For conciseness, most examples will show m4's output in the following way:
`AUTHOR' is AUTHOR # -> AUTHOR is W. Shakespeare
In m4, the hash character # is the default opening delimiter of a comment. A comment lasts up to and including the following newline character. The contents of a comment are not examined by m4; however, contrary to what you might expect, comments are echoed to the output. Thus, the previous line, if entered in full, would actually produce the output
AUTHOR is W. Shakespeare # -> AUTHOR is W. Shakespeare
Opening comment delimiters can be protected by quotes:
`#' AUTHOR # -> # W. Shakespeare
Nested quotes are recognized as such:
``AUTHOR'' is AUTHOR # -> `AUTHOR' is W. Shakespeare
Quoted strings can include newlines:
define(newline,`line
break')
a newline here
outputs
a line
break here
Without a matching opening quote character (`), a closing quote (') is simply echoed to the output. Thus
`AUTHOR' is AUTHOR.''
produces
AUTHOR is W. Shakespeare.''
M4 also understands nested parentheses within a macro's argument list:
define(PARENS, ())
brackets: PARENS # -> brackets: ()
Unbalanced parentheses can be quoted to protect them:
define(LPAREN,`(')
define(RPAREN,`)')
LPAREN bracketed RPAREN # -> ( bracketed )
(Unbalanced quote characters are more problematic; a solution is given later.)
Pitfall: In fact, quoting of the macro name is also recommended. Consider the following:
define(LEFT, [)
LEFT # -> [
define(LEFT, {)
LEFT # -> [
Why didn't the second define work? The problem is that, within the second define, the macro LEFT was expanded before the define macro itself took effect:
define(LEFT, {) # -> define([, {) ->
That is, instead of redefining the macro LEFT, a new macro named [ was defined.
GNU m4 allows macros to have non-standard names, including punctuation characters like [.
In fact, the new macro doesn't seem to work either:
[ # -> [
That's because GNU m4 doesn't ordinarily recognize a macro as a macro unless it has a valid name – that is, a sequence of ASCII letters, underscores, or digits, beginning with an underscore or letter. For example, my_macro1 and _1stMacro are both valid names; my.macro1 and 1stMacro are not. (We will see later how the ability to define macros with invalid names can be useful.)
Quoting the macro's arguments avoids this problem:
define(`LEFT',`[')
LEFT # -> [
define(`LEFT',`{')
LEFT # -> {
For the same reason, the undefine macro will normally work as expected only if its argument is quoted:
define(`RIGHT', `]')
undefine(RIGHT) # -> undefine(]) ->
RIGHT # -> ]
undefine(`RIGHT')
RIGHT # -> RIGHT
(Note that undefine does not complain if it is given the name of a non-existent macro; it simply does nothing.)
How m4 works
M4's behaviour can be mystifying. It is best to get an early understanding of how it works. This should save you time figuring out what's going on when it doesn't do what you expect.
First, m4 looks for tokens in its input – roughly speaking, it divides it into quoted strings, macro arguments, names (i.e., identifiers), numbers and other symbols (punctuation characters). Whitespace (including newlines), numbers and punctuation usually mark token boundaries; exceptions are when they appear within a quoted string or a macro argument.
define( `Version2', A – 1 )99Version2:Version2_ Version22 # -> 99A – 1 :Version2_ Version22
Above, since a valid name can include digits but cannot begin with one, the names seen after the definition are Version2, Version2_, and Version22; only the first of these corresponds to a defined macro.
Continuing:
Version2(arg1, arg2) Version2 (junk) garbage(trash)Version2() # -> A – 1 A – 1 (junk) garbage(trash)A – 1
If the name of a macro is followed immediately by a '(' then m4 reads in a list of arguments. The Version2 macro we have defined ignores its arguments, but that doesn't matter to m4: it swallows up the arguments and outputs only the macro's expansion “A – 1 ”.
In general, m4 passes input tokens and separators straight through to the output, making no change except to remove the quotes surrounding quoted string tokens. When it encounters a macro name, however, it stops echoing to the output. Instead:
- it reads in the macro's arguments (if any)
- it determines the expansion of the macro and inserts this expansion at the beginning of its input
- m4 continues scanning the input, starting with the expansion
If while reading in a macro's arguments, m4 encounters another macro then it repeats this process for the nested macro.
An example makes this clearer:
define(`definenum', `define(`num', `99')')
num # -> num
definenum num # -> define(`num', `99') num -> 99
As soon as m4 gets to the end of “definenum” on the last line above, it recognizes it as a macro and replaces it with “define(`num', `99')”; however, instead of outputting this expansion, it sticks it back on the beginning of its input buffer and starts again from there. Thus, the next thing it reads in is “define(`num', `99')”. As the define macro expands to an empty string, nothing is output; however, the new macro num is now defined. Then m4 reads in a space which it echoes to the output, followed by the macro num, which it replaces with its expansion. The last line therefore results in the output “ 99”.
Unless a nested macro is quoted, it is expanded immediately:
define(`definenum', define(`num', `99'))
num # -> 99
definenum # ->
Here, when m4 reads in the nested define macro, it immediately defines num; it also replaces the macro “define(`num', `99')” with its expansion – an empty string. Thus, “definenum” ends up being defined as an empty string.
Arbitrary nesting is possible -- with (ordinarily) an extra layer of protective quotes at each level of nesting:
define(`definedefineX',`define(`defineX',`define(`X',`xxx')')')
defineX X # -> defineX X
definedefineX X # -> X
defineX X # -> xxx
If rescanning of a macro's expansion is not what you want then just add more quotes:
define(`stmt',``define(`Y',`yyy')'')
stmt # -> define(`Y',`yyy')
Y # -> Y
Above, the outermost quotes are removed when the nested macro is being read in – so stmt expands first to `define(`Y',`yyy')'; m4 then rescans this as a string token and removes the second layer of quotes before sending it to the output.
Now consider the definition
define(`plus', `+')
Suppose we want to use this plus macro twice in succession with no intervening space. Clearly, plusplus doesn't work – it is read as a single token, plusplus, not two plus tokens:
plusplus # -> plusplus
We can use an argument list as a separator:
plus()plus # -> ++
But watch what happens with an extra level of indirection:
define(`oper', `plus')
oper()oper # -> plusoper
Here, oper() expands to plus; but then rescanning of the input starts from the beginning of the expansion. Thus, the next thing read in is the token plusoper. As it doesn't correspond to a macro, it is copied straight to the output.
The problem can be solved by adding an empty quote as a separator:
oper`'oper # -> plus`'oper -> +`'oper -> ... -> ++
It is a good idea to include such a separator in macro definitions as a matter of policy:
define(`oper',`plus`'')
oper()oper # -> plus`'oper -> +`'oper -> +oper -> ... -> ++
If ever m4 seems to hang or stop working, it is probably because a faulty macro has sent it into an infinite loop:
define(`Bye', `Bye for now')
Hello. # -> Hello.
Bye. # -> Bye for now. -> Bye for now for now. -> ...
Such an error is not always this obvious: the cycle may involve more than one macro.
Finally, look at this example:
define(`args', ``NAME', `Marie'')
define(args) # -> define(`NAME', `Marie') ->
NAME # -> Marie
args(define(`args',`Rachel')) # -> args() -> `NAME', `Marie' -> NAME, Marie
args # -> Rachel
In the second part of the example, although args doesn't take an argument, we can still pass it one. In this case the argument redefines the macro that's currently being expanded. However, it is the expansion that was in force when the macro identifier was read in that is output.
Similarly, it is possible to define a self-modifying macro or even a self-destructing macro:
define(`msg', `undefine(`msg')Secret message.')
msg # -> Secret message.
msg # -> msg
Recursive macros can also be defined.
Quotes, escaping and non-ASCII characters
A deficiency of m4 is that there is no escape character. This means that if you want to use the backtick (`) for anything other than an opening quote delimiter you need to take care. Sometimes you can just add an extra layer of quotes:
I said, ``Quote me.'' # -> I said, `Quote me.'
However, in other cases, you might need an opening quote without m4 interpreting it as such.
The general way around this problem is to use the changequote macro, e.g.,
changequote(<!,!>)
a `quo<!ted str!>ing'
outputs
a `quoted string'
Without parameters, changequote restores the default delimiters.
In general, it is best to avoid using changequote. You can define macros to insert literal quotes should you need them.
Sometimes, however, it is necessary to change the quote character globally, e.g., because the backtick character is not available on some keyboards or because the text being processed makes extensive use of the default quote characters.
If you do use changequote then be aware of the pitfalls:
GNU m4's changequote can differ from other implementations of m4 and from earlier versions of GNU m4. For portability, call changequote only with two arguments – or with no arguments, i.e.,
changequote`' # (trailing `' is separator if needed)
Note that changequote changes how existing macros are interpreted, e.g.,
define(x,``xyz'')
x # -> xyz
changequote({,})
x # -> `xyz'
Don't choose the same delimiter for the left and right quotes: doing so makes it impossible to have nested quotes.
Don't change a quote delimiter to anything that begins with a letter, an underscore or a digit; m4 won't complain, but it only recognizes a delimiter if it starts with a punctuation character. A digit may be recognized as a delimiter but not if it is scanned as part of the preceding token.
While later versions of GNU m4 have a greater tolerance for non-ASCII characters (e.g., the pound sign or an accented character) it is better to avoid them, certainly in macro names and preferably in delimiters too. If you do use 8-bit characters and m4 is not behaving quite as you expect, this may be the reason. Where multibyte character encoding is used, m4 should not be used at all.
Comments
As mentioned above, line comments are echoed to the output, e.g.,
define(`VERSION',`A1')
VERSION # VERSION `quote' unmatched`
expands to
A1 # VERSION `quote' unmatched`
Comments are not very useful. However, even if you don't use them you need to remember to quote any hash character in order to prevent it being interpreted as the beginning of a comment:
`#' VERSION # -> # A1
You can change the opening comment delimiter, e.g., changecom(`@@') – as with changequote, the new delimiter should start with a punctuation character.
If you want echoing block comments, you can also change the closing delimiter, e.g., for C-like comments,
changecom(/*,*/)
VERSION `quote' /* VERSION
`quote' ` */ VERSION
# ->
# A1 quote /* VERSION
# `quote' ` */ A1
Without arguments, changecom restores the default comment delimiters.
Alternatives to comments
For a comment that should not be echoed to the output, use dnl: this macro not only prevents the following newline from being output (as we saw above), it also discards everything up to the newline.
dnl These two lines will not result
dnl in any output.
Non-echoing block comments: multiline comments that are not echoed to the output can be written like this
ifelse(`
This is a comment
spanning more than one line.
')dnl
This is a hack which takes advantage of the fact that the ifelse macro (described below) has no effect if it is passed only one argument. Some versions of m4 may therefore issue a warning about insufficient arguments; GNU m4 doesn't. Be sure there are no unmatched quotes in the comment text.
Conditionals
ifdef(`a',b) outputs b if a is defined; ifdef(`a',b,c) outputs c if a is not defined.
The definition being tested may be empty, e.g.,
define(`def')
`def' is ifdef(`def', , not )defined. # -> def is defined.
ifelse(a,b,c,d) compares the strings a and b. If they match, the macro expands to string c; if not, string d.
This can be extended to multiple else-ifs:
ifelse(a,b,c,d,e,f,g)
means that if a matches b, then return (expand to) c; else if d matches e, then return f; else return g. In other words, it's shorthand for
ifelse(a,b,c,ifelse(d,e,f,g))
Numbers
M4 normally treats numbers as strings. However, the eval macro allows access to integer arithmetic; expressions can include these operators (in order of precedence, highest first):
Operator            Meaning
+ -                 unary plus and minus
**                  exponent
* / %               multiplication, division, modulo (eval(8/-5) -> -1)
+ -                 addition and subtraction
<< >>               shift up or down (eval(-8>>1) -> -4)
== != < <= >= >     relational
!                   logical not (converts non-zero to 0, 0 to 1)
~                   bitwise not (eval(~0) -> -1)
&                   bitwise and (eval(6&5) -> 4)
^                   bitwise exclusive or (eval(3^2) -> 1)
|                   bitwise or (eval(1|2) -> 3)
&&                  logical and
||                  logical or
The above table is for GNU m4; unfortunately, the operators and precedence are version-dependent. Some versions of m4 incorrectly treat ^ the same as ** (exponent). For maximum compatibility, make liberal use of parentheses to enforce precedence.
Should you need it, octal, hexadecimal and indeed arbitrary radix arithmetic are available. It's also possible to specify the width of eval's output. (See the m4 info pages for details on these.)
eval(7*6) # -> 42
eval(7/3+100) # -> 102
There are also incr and decr builtins as shortcuts which expand to the argument plus or minus one, e.g., incr(x) is equivalent to eval(x+1):
define(`n', 0)
n # -> 0
define(`n', incr(n))
n # -> 1
Beware of silent integer overflow, e.g., on my machine, the integer range is -2**31 ... 2**31-1; eval(2**31) erroneously expands to -2147483648.
Logical conditions can be checked like this:
`n' is ifelse(eval(n < 2), 1, less than , eval(n == 2), 1, , greater than )2
Strings
len:
len(`hello') # -> 5
substr:
substr(`hello', 1, 3) # -> ell
substr(`hello', 2) # -> llo
index:
index(`hello',`llo') # -> 2
index(`not in string', `xyz') # -> -1
translit:
define(`ALPHA', `abcdefghijklmnopqrstuvwxyz')
define(`ALPHA_UPR', `ABCDEFGHIJKLMNOPQRSTUVWXYZ')
define(`ROT13', `nopqrstuvwxyzabcdefghijklm')
translit(`abc ebg13', ALPHA, ALPHA_UPR) # -> ABC EBG13
translit(`abc ebg13', ALPHA, ROT13) # -> nop rot13
GNU m4 includes some additional string macros: regexp, to search for a regular expression in a string, and patsubst, to do find and replace.
Unfortunately, m4's usual approach of rescanning the expansion of a macro can be a problem with macros that operate on strings:
define(`eng',`engineering')
substr(`engineer',0,3) # -> eng -> engineering
translit(`rat', ALPHA, ROT13) # -> eng -> engineering
This is not normally the desired behaviour and is arguably a design bug in m4: the builtins should at least provide some way to prevent the extracted or transformed substring from being expanded. A workaround is suggested below.
Defining macros with arguments; a recursive macro
In standard m4 (Unix), a macro can have up to 9 arguments; within the macro definition, these are referenced as $1 ... $9. (GNU m4 has no fixed limit on the number of arguments.) Arguments default to the empty string, e.g., if 2 arguments are passed then $3 will be empty.
Going in at the deep end, here is a reimplementation of the len builtin (replacing it) as a recursive macro.
define(`len',`ifelse($1,,0,`eval(1+len(substr($1,1)))')')
In a macro definition, argument references like $1 expand immediately, regardless of surrounding quotes. For example, len(`xyz') above would expand (at the first step) to
ifelse(xyz,,0,`eval(1+len(substr(xyz,1)))')
Where necessary, this immediate expansion can be prevented by breaking up the reference with an empty quoted string, e.g., $`'1.
The name of the macro is given by $0; $# expands to the number of arguments. Note in the following example that empty parentheses are treated as delimiting a single argument, an empty string:
define(`count', ``$0': $# args')
count # -> count: 0 args
count() # -> count: 1 args
count(1) # -> count: 1 args
count(1,) # -> count: 2 args
$* expands to the list of arguments; $@ does the same but protects each one with quotes to prevent them being expanded:
define(`list',`$`'*: $*; $`'@: $@')
list(len(`abc'),`len(`abc')') # -> $*: 3,3; $@: 3,len(`abc')
A common requirement is to process a list of arguments where we don't know in advance how long the list will be. Here, the shift macro comes in useful – it expands to the same list of arguments with the first one removed:
shift(1,2, `abc', 4) # -> 2,abc,4
shift(one) # ->
define(`echolast',`ifelse(eval($#<2),1,`$1`'', `echolast(shift($@))')')
echolast(one,two,three) # -> three
Scope of macros; local variables
All macros have global scope.
What if we want a “local variable” – a macro that is used only within the definition of another macro? In particular, suppose we want to avoid accidentally redefining a macro used somewhere else.
One possibility is to prefix “local” macro names with the name of the containing macro. Unfortunately, this isn't entirely satisfactory – and it won't work at all in a recursive macro. A better approach is described in the next section.
Pushing and popping macro definitions
For each macro, m4 actually creates a stack of definitions – the current definition is just the one on top of the stack. It's possible to temporarily redefine a macro by using pushdef to add a definition to the top of the stack and, later, popdef to destroy only the topmost definition:
define(`USED',1)
define(`proc',
  `pushdef(`USED',10)pushdef(`UNUSED',20)dnl
`'`USED' = USED, `UNUSED' = UNUSED`'dnl
`'popdef(`USED',`UNUSED')')
proc # -> USED = 10, UNUSED = 20
USED # -> 1
If the macro hasn't yet been defined then pushdef is equivalent to define. As with undefine, it is not an error to popdef a macro which isn't currently defined; it simply has no effect.
In GNU m4, define(X,Y) works like popdef(X)pushdef(X,Y), i.e., it replaces only the topmost definition on the stack; in some implementations, define(X,Y) is equivalent to undefine(X)define(X,Y), i.e., the new definition replaces the whole stack.
Macros that don't expand without arguments
When GNU m4 encounters a word such as “define” that corresponds to a builtin that requires arguments, it leaves the word unchanged unless it is immediately followed by an opening parenthesis.
define(`MYMACRO',`text') # ->
define a macro # -> define a macro
Actually, we can say that m4 does expand the macro – but that it expands only to the same literal string. We can make our own macros equally intelligent by adding an ifelse – or an extra clause to an existing “ifelse”:
define(`reverse',`ifelse($1,,,
`reverse(substr($1,1))`'substr($1,0,1)')')
reverse drawer: reverse(`drawer') # -> drawer: reward
define(`reverse',`ifelse($#,0,``$0'',$1,,,
`reverse(substr($1,1))`'substr($1,0,1)')')
reverse drawer: reverse(`drawer') # -> reverse drawer: reward
Name clashes: making macro names safe
Unfortunately, some macros do not require arguments and so m4 has no way of knowing whether a word corresponding to a macro name is intended to be a macro call or just accidentally present in the text being processed.
Also, other versions of m4, and older versions of GNU m4, may expand macro names which are not followed by arguments even where GNU m4 does not:
# GNU m4 1.4.10
we shift the responsibility # -> we shift the responsibility
# GNU m4 1.4.5
we shift the responsibility # -> we the responsibility
In general, the problem is dealt with by quoting any word that corresponds to a macro name:
we `shift' the responsibility # -> we shift the responsibility
However, if you are not fully in control of the text being passed to m4 this can be troublesome. Many macro names, like “changequote”, are unlikely to occur in ordinary text. Potentially more problematic are dictionary words that are recognized as macros even without arguments: divert and undivert (covered below), and windows (“windows” – as well as “unix” and “os2” – is defined in some versions of m4 as a way of testing the platform on which m4 is running; by default it is not defined in GNU m4).
An alternative to quoting macro names is to change all m4's macro names so that they won't clash with anything. Invoking m4 with the -P command-line option prefixes all builtins with “m4_”:
define(`M1',`text1')M1 # -> define(M1,text1)M1
m4_define(`M1',`text1')M1 # -> text1
On the basis that unnecessary changes to a language are generally undesirable, I suggest not using the -P option if you can comfortably avoid it.
However, if you are writing a set of m4 macros that may be included by others as a module, do add some kind of prefix to your own macros to reduce the possibility of clashes.
Loops
Although m4 provides no builtins for iteration, it is not difficult to create macros which use recursion to do this. Various implementations can be found on the web. This author's “for” loop is:
define(`for',`ifelse($#,0,``$0'',`ifelse(eval($2<=$3),1,
`pushdef(`$1',$2)$4`'popdef(`$1')$0(`$1',incr($2),$3,`$4')')')')
for n = for(`x',1,5,`x,')... # -> for n = 1,2,3,4,5,...
for(`x',1,3,`for(`x',0,4,`eval(5-x)') ') # -> 54321 54321 54321
Note the use of pushdef and popdef to prevent loop variables clobbering any existing variable; in the nested for loop, this causes the second x to hide (shadow) the first one during execution of the inner loop.
A “for each” macro might be written:
define(`foreach',`ifelse(eval($#>2),1,
`pushdef(`$1',`$3')$2`'popdef(`$1')dnl
`'ifelse(eval($#>3),1,`$0(`$1',`$2',shift(shift(shift($@))))')')')
foreach(`X',`Open the X. ',`door',`window') # -> Open the door. Open the window.
foreach(`X',`foreach(`Y',`Y the X. ',`Open',`Close')',`door',`window')
# -> Open the door. Close the door. Open the window. Close the window.
define(`OPER',``$2 the $1'')
foreach(`XY',`OPER(XY). ', ``window',`Open'', ``door',`Close'') # -> Open the window. Close the door.
In a “for” loop of either kind, it can be useful to know when you've reached the last item in the sequence:
define(`foreach',`ifelse(eval($#>2),1,
`pushdef(`last_$1',eval($#==3))dnl
`'pushdef(`$1',`$3')$2`'popdef(`$1')dnl
`'popdef(`last_$1')dnl
`'ifelse(eval($#>3),1,`$0(`$1',`$2',shift(shift(shift($@))))')')')
define(`everyone',``Tom',`Dick',`Harry'')
foreach(`one',`one`'ifelse(last_one,0,` and ')',everyone). # -> Tom and Dick and Harry.
Finally, a simple “while” loop macro:
define(`while',`ifelse($#,0,``$0'',eval($1+0),1,`$2`'$0($@)')')
define(`POW2',2)
while(`POW2<=1000',`define(`POW2',eval(POW2*2))')
POW2 # -> 1024
Here, the apparently redundant +0 in eval($1+0) does have a purpose: without it, a while without arguments expands to
ifelse(0,0,``while'',eval() ...
whereupon eval() produces an empty argument warning.
Suspending and discarding output: Buffers and redirection
To discard output – in particular, to prevent newlines in a set of definitions being output – use divert:
divert(-1)
<definitions...>
divert(0)dnl
Unlike the contents of a comment, the definitions (and any other macros) are still processed by m4; divert(-1) merely causes m4 to do this silently, without sending anything to the output.
The last line above, with its dnl to prevent the following newline being echoed, could also have been written:
divert`'dnl
divnum expands to the number of the currently active diversion; 0, the default, means standard output (stdout); positive numbers are temporary buffers which are output in numeric order at the end of processing. Standard m4 has 9 buffers (1..9); in GNU m4 there is no fixed limit.
undivert(num) appends the contents of diversion num to the current diversion (normally stdout), emptying it; without arguments, undivert retrieves all diversions in numeric order. Note that undivert() is the same as undivert(0) and has no effect: diversion 0 is stdout, which is effectively an empty buffer.
The contents of the buffer are not interpreted when undivert is run; they are simply output as raw text. For example, the following code results in Z Z Z being output (not 9 9 9):
divert(1)
Z Z Z
divert
define(`Z',9)
undivert(1)
There is an implicit divert and undivert when m4 reaches the end of the input, i.e., all buffers are flushed to the standard output. If you want to avoid this for any reason, you can of course discard the contents of the buffers by putting the following line at the end of your input
divert(-1)undivert
or by exiting using the m4exit builtin.
Including files
include(filename.m4) causes the contents of the named file to be read and interpreted as if it were part of the current file (just like #include in the C preprocessor).
GNU m4 allows for an include file search path. To specify directories to be searched for include files use the -I option on the command line, e.g.,
m4 -I ~/mydir -Ilocaldir/subdir
or use the environment variable M4PATH, e.g. (bash shell)
export M4PATH=~/mydir:localdir/subdir
m4 test.m4
sinclude(nonexistentfile) (silent include) is a version of include that doesn't complain if the file doesn't exist.
To include a file uninterpreted, GNU m4 allows undivert to be passed a filename argument. If inc.m4 contains
define(`planet',`jupiter')
then
undivert(`inc.m4') # -> define(`planet',`jupiter')
planet # -> planet
include(`inc.m4')planet # -> jupiter
Accessing the shell; creating temporary files
A system command can be passed to the shell, e.g.,
syscmd(`date --iso-8601|sed s/-/./g')
outputs something like 2007.10.16.
The output from the command sent to syscmd is not interpreted:
syscmd(`echo "define(\`AUTHOR',\`Orwell')"') # -> define(`AUTHOR',`Orwell')
AUTHOR # -> AUTHOR
However, GNU m4 provides another macro, esyscmd, that does process the output of the shell command:
esyscmd(`echo "define(\`AUTHOR',\`Orwell')"') # ->
AUTHOR # -> Orwell
The macro sysval expands to the exit status of the last shell command issued (0 for success):
sysval # -> 0
esyscmd(`ls /no-dir/')
sysval # -> 2
Naturally, m4 can be used as a filter in shell scripts or interactively:
echo "eval(98/3)"|m4
outputs 32.
Temporary files can be created to store the output of shell commands: maketemp(prefixXXXXXX) creates a temporary file and expands to the filename – this name will be the (optional) prefix with the six X's replaced by six random letters and digits. In older versions of GNU m4 and in other implementations of m4, the X's are generated from the process ID. In certain contexts, this may be a security hole.
Another macro, mkstemp
, is available in newer m4's
which always generates a random filename extension.
define(`FILENAME',mkstemp(`/tmp/myscriptXXXXXX'))
The temporary file can be read in using include (perhaps in conjunction with divert).
Debugging
Most bugs relate to problems with quoting, so check that first.
If you want to see step by step what m4 is doing, either invoke it with the -dV option or, to limit full debug output to one part of the file, use
debugmode(V)
...problematic section...
debugmode
The V flag is for full debugging; other flags for finer control are described in the info pages.
dumpdef(`macro', ...) outputs to standard error the formatted definition of each argument (or just <macro> if macro is a builtin); dumpdef without arguments dumps all definitions to stderr.
Nothing is sent to stdout.
For user-defined macros, defn(`macro') expands to the definition string (i.e., not prefixed by the macro name).
errprint(`this message goes to standard error (stderr)')
Aliasing and renaming macros (including builtins)
Suppose we want to allow strlen to be used instead of len.
This won't work:
define(`strlen',`len')
strlen(`hello') # -> len
because we forgot to relay the arguments:
define(`strlen',`len($@)')
strlen(`hello') # -> 5
OK, but suppose we want to replace len altogether.
Clearly, this doesn't work:
define(`strlen',`len($@)')undefine(`len')
strlen(`hello') # -> len(hello)
since expansion now stops at len.
However, using the builtin defn to access the definition of a macro, it's possible to alias or rename macros quite simply.
For user-defined macros, defn expands to the text of the macro (protected with quotes before being output).
The defn of a builtin expands in most contexts to the empty string, but when passed as an argument to “define” it expands to a special token that has the desired effect:
define(`rename', `define(`$2',defn(`$1'))undefine(`$1')')
rename(`define',`create')
create(`vehicle',`truck')
vehicle # -> truck
define(`fuel',`diesel') # -> define(fuel,diesel)
fuel # -> fuel
And, because the intelligence is built into the macro definition, m4 is still smart enough not to expand the word “create” unless it is followed by arguments; compare the indirect approach, where defn is not used:
create a macro # -> create a macro
create(`new',`create($@)')
new(`wheels', 6)
new wheels # -> 6
Accessing internal builtins
Even when you undefine a builtin or define another macro with the same name, GNU m4 still keeps the internal definition, which can be called indirectly via the macro builtin:
define(`TREE',`maple')
undefine(`define',`undefine')
undefine(`TREE') # -> undefine(TREE)
TREE # -> maple
builtin(`undefine',`TREE')
TREE # -> TREE
builtin(`define',`create',`builtin'(``define'',$`'@))
create(`TREE',`ash')
TREE # -> ash
(Note the judicious use of quotes for the last argument to the call to builtin which defines the create macro above.
Because of the use of inner quotes, the usual approach of surrounding the whole argument with quotes, i.e.,
builtin(`define',`create',`builtin(`define',$`'@)')
would not have worked as desired: instead, any call to the create macro would have ended up defining a macro called “$@”.)
Because they can be accessed only indirectly and so don't need to be protected, the names of these internal macros are not changed by the -P flag.
Macros for literal quotes
The obvious way to prevent the characters ` and ' being interpreted as quotes is to change m4's quote delimiters as described above. This has some drawbacks: for example, to ensure the new delimiters don't accidentally occur anywhere else, more than one character may be needed for each delimiter, and if there's a lot of quoting, the code will become more verbose and perhaps more difficult to read.
Another approach is to keep m4's existing quote delimiters and define macros which hide the backtick and apostrophe from m4. The trick is to balance the quotes while m4 still sees them as nested quotes, temporarily change the quoting, and then prevent one of the quotes from being output:
define(`LQ',`changequote(<,>)`dnl'
changequote`'')
define(`RQ',`changequote(<,>)dnl`
'changequote`'')
define(myne, `It`'RQ()s mine!')
LQ()LQ()myne'' # -> ``It's mine!''
Indirect macro calls
GNU m4 allows any macro to be called indirectly using the macro indir:
indir(`define',`SIZE',78)
SIZE # -> 78
indir(`SIZE') # -> 78
This is useful where the name of the macro to be called is derived dynamically or where it does not correspond to a token (i.e., a macro name with spaces or punctuation).
Compared to an ordinary call, there are two differences to be aware of:
- the called macro must exist, otherwise m4 issues an error
- the arguments are processed before the definition of the macro being called is retrieved
indir(`define(`SIZE')',67) # -> m4: undefined macro `define(`SIZE')'
indir(`SIZE', indir(`define',`SIZE',53)) # -> 53
indir(`SIZE', indir(`undefine',`SIZE')) # -> m4: undefined macro `SIZE'
We can of course define our own higher-order macros.
For example, here is a macro, do, roughly similar to indir above:
define(do, $1($2, $3, $4, $5))
do(`define', ``x'', 4)
x # -> 4
Since extra arguments are normally ignored, do works for any macro taking up to 4 arguments.
Note, however, that the example here, which expands to define(`x', 4, , , ), does generate a warning: “excess arguments to builtin `define' ignored”.
Recursion pitfall: nesting limits
Pretend we don't know that the sum n + (n-1) + ... + 1 is given by n*(n+1)/2, and so we define a recursive macro to calculate it:
define(`sigma',`ifelse(eval($1<=1),1,$1,`eval($1+sigma(decr($1)))')')
If too large a number is passed to this macro then m4 may crash with a message like
ERROR: recursion limit of 1024 exceeded
(for GNU m4 1.4.10).
In fact, the problem is not that sigma is recursive; it is the degree of nesting in the expansion, e.g., sigma(1000) will expand to
eval(1000 + eval(999 + eval(998 + eval(997 + ...
The nesting limit could be increased using a command-line option (-L).
However, we do better to avoid the problem by performing the calculation as we go, using an extra parameter as an accumulator:
define(`sigma',`ifelse(eval($1<1),1,$2,`sigma(decr($1),eval($2+$1))')')
Now, no matter how many steps there are in the expansion, the amount of nesting is limited at every step, e.g., sigma(1000) becomes
ifelse(eval(1000<1),1,,`sigma(decr(1000),eval(+1000))')
which becomes sigma(999,1000), which in turn expands to
ifelse(eval(999<1),1,1000,`sigma(decr(999),eval(1000+999))')
and so on.
Here, the default value of the added parameter (an empty string) worked OK. In other cases, an auxiliary macro may be required: the auxiliary macro will then be the recursive one; the main macro will call it, passing the appropriate initial value for the extra parameter.
Using unexpanding macros for arrays and hashes
Although it is not standard, GNU m4 allows any text string to be defined as a macro.
Since only valid identifiers are checked against macros, macros whose names include spaces or punctuation characters will not be expanded.
However, they can still be accessed as variables using the defn macro:
define(`my var', `a strange one')
my var is defn(`my var'). # -> my var is a strange one.
This feature can be used to implement arrays and hashes (associative arrays):
define(`_set', `define(`$1[$2]', `$3')')
define(`_get', `defn(`$1[$2]')')
_set(`myarray', 1, `alpha')
_get(`myarray', 1) # -> alpha
_set(`myarray', `alpha', `omega')
_get(`myarray', _get(`myarray',1)) # -> omega
defn(`myarray[alpha]') # -> omega
String macro problem workaround
Above, we noted a problem with the string macros: it's not possible to prevent the string that's returned from being expanded.
Steven Simpson wrote a patch for m4 which fixes the problem by allowing an extra parameter to be passed to string macros – however this of course means using a non-standard m4.
A less radical fix is to redefine the substr macro as follows.
It works by extracting the substring one letter at a time, thus avoiding any unwanted expansion (assuming, of course, that no one-letter macros have been defined):
define(`substr',`ifelse($#,0,``$0'',
$#,2,`substr($@,eval(len(`$1')-$2))',
`ifelse(eval($3<=0),1,,
`builtin(`substr',`$1',$2,1)`'substr(
`$1',eval($2+1),eval($3-1))')')')dnl
define(`eng',`engineering')
substr(`engineer',0,3) # -> eng
To keep it simple, this definition assumes reasonably sensible arguments, e.g., it doesn't allow for substr(`abcdef', -2) or substr(`abc').
Note that, as with the corresponding builtin substr, you may have problems where a string contains quotes, e.g.,
substr(``quoted'',0,3)
The new version of substr can in turn be used to implement a new version of translit:
define(`translit',`ifelse($#,0,``$0'',
len(`$1'),0,,
`builtin(`translit',substr(`$1',0,1),`$2',`$3')`'translit(
substr(`$1',1),`$2',`$3')')')dnl
define(`ALPHA', `abcdefghijklmnopqrstuvwxyz')
define(`ALPHA_UPR', `ABCDEFGHIJKLMNOPQRSTUVWXYZ')
translit(`alpha', ALPHA, ALPHA_UPR) # -> ALPHA
M4: Assessment
M4's general character as a macro language can be seen by comparing it to another, very different macro language: FreeMarker.
GNU m4 and FreeMarker are both free in both senses of the word: FreeMarker is covered by a BSD-style license. They are more-or-less equally “powerful”, e.g., both languages support recursive macros.
In some respects, m4 has an edge over FreeMarker:
- m4 is a standalone tool, FreeMarker requires Java.
- On Unix platforms, m4 is a standard tool with a long heritage: a Makefile, for example, can reasonably expect to be able to invoke it as a filter in a processing sequence.
- m4 scripts can interact with the Unix shell.
- m4 is arguably a simpler, “cleaner”, macro language.
The two languages are quite different in appearance and how they work.
In m4, macros are ordinary identifiers; FreeMarker uses XML-like markup for the <#opening> and </#closing> delimiters of macros.
While m4's textual rescanning approach is conceptually elegant, it can be confusing in practice and demands careful attention to layers of nested quotes.
FreeMarker, in comparison, works like a conventional structured programming language, making it much easier to read, write and debug.
On the other hand, FreeMarker markup is more verbose and might seem intrusive in certain contexts, for example, where macros are used to extend an existing programming language.
FreeMarker has several distinct advantages:
- it has an associated tool, FMPP, which can read in data from different sources (e.g., in XML or CSV format) and incorporate it into the template output.
- FreeMarker has a comprehensive set of builtin macros and better data handling capabilities.
- No compatibility issues: there is a single, cross-platform implementation that is quite stable and mature (whereas even recent GNU m4 versions are not strictly backward compatible with one another).
- FreeMarker supports Unicode; m4 is generally limited to ASCII, or at best 8-bit character sets.
Ultimately, which language is “better” depends on the importance of their relative advantages in different contexts. This author has had very positive experience using FreeMarker/FMPP for automatic code generation where, for several reasons, m4 was unsuitable. On the other hand, m4 is clearly a more sensible and appropriate choice for Unix sendmail's configuration macros.