Diffstat (limited to 'doc/m4.texinfo')
-rw-r--r-- | doc/m4.texinfo | 8729 |
1 files changed, 8729 insertions, 0 deletions
diff --git a/doc/m4.texinfo b/doc/m4.texinfo new file mode 100644 index 0000000..1dcc8e4 --- /dev/null +++ b/doc/m4.texinfo @@ -0,0 +1,8729 @@ +\input texinfo @c -*- texinfo -*- +@comment ======================================================== +@comment %**start of header +@setfilename m4.info +@include version.texi +@settitle GNU M4 @value{VERSION} macro processor +@setchapternewpage odd +@ifnothtml +@setcontentsaftertitlepage +@end ifnothtml +@finalout + +@c @tabchar{} +@c ---------- +@c The testsuite expects literal tab output in some examples, but +@c literal tabs in texinfo lead to formatting issues. +@macro tabchar +@ @c +@end macro + +@c @ovar{ARG} +@c ------------------- +@c The ARG is an optional argument. To be used for macro arguments in +@c their documentation (@defmac). +@macro ovar{varname} +@r{[}@var{\varname\}@r{]}@c +@end macro + +@c @dvar{ARG, DEFAULT} +@c ------------------- +@c The ARG is an optional argument, defaulting to DEFAULT. To be used +@c for macro arguments in their documentation (@defmac). +@macro dvar{varname, default} +@r{[}@var{\varname\} = @samp{\default\}@r{]}@c +@end macro + +@comment %**end of header +@comment ======================================================== + +@copying + +This manual (@value{UPDATED}) is for GNU M4 (version +@value{VERSION}), a package containing an implementation of the m4 macro +language. + +Copyright @copyright{} 1989-1994, 2004-2011 Free Software Foundation, +Inc. + +@quotation +Permission is granted to copy, distribute and/or modify this document +under the terms of the GNU Free Documentation License, +Version 1.3 or any later version published by the Free Software +Foundation; with no Invariant Sections, no Front-Cover Texts, and no +Back-Cover Texts. A copy of the license is included in the section +entitled ``GNU Free Documentation License.'' +@end quotation +@end copying + +@dircategory Text creation and manipulation +@direntry +* M4: (m4). A powerful macro processor. 
+@end direntry + +@titlepage +@title GNU M4, version @value{VERSION} +@subtitle A powerful macro processor +@subtitle Edition @value{EDITION}, @value{UPDATED} +@author by Ren@'e Seindal, Fran@,{c}ois Pinard, +@author Gary V. Vaughan, and Eric Blake +@author (@email{bug-m4@@gnu.org}) + +@page +@vskip 0pt plus 1filll +@insertcopying +@end titlepage + +@contents + +@ifnottex +@node Top +@top GNU M4 +@insertcopying +@end ifnottex + +GNU @code{m4} is an implementation of the traditional UNIX macro +processor. It is mostly SVR4 compatible, although it has some +extensions (for example, handling more than 9 positional parameters +to macros). @code{m4} also has builtin functions for including +files, running shell commands, doing arithmetic, etc. Autoconf needs +GNU @code{m4} for generating @file{configure} scripts, but not for +running them. + +GNU @code{m4} was originally written by Ren@'e Seindal, with +subsequent changes by Fran@,{c}ois Pinard and other volunteers +on the Internet. All names and email addresses can be found in the +files @file{m4-@value{VERSION}/@/AUTHORS} and +@file{m4-@value{VERSION}/@/THANKS} from the GNU M4 +distribution. + +This is release @value{VERSION}. It is now considered stable: future +releases in the 1.4.x series are only meant to fix bugs, increase speed, +or improve documentation. However@dots{} + +An experimental feature, which would improve the usefulness of +@code{m4}, allows changing the syntax of what counts as a @dfn{word} in +@code{m4}. You should use: +@comment ignore +@example +./configure --enable-changeword +@end example +@noindent +if you want this feature compiled in. The current implementation +slows down @code{m4} considerably and is hardly acceptable. In the +future, @code{m4} 2.0 will come with a different set of new features +that provide similar capabilities, but without the inefficiencies, so +changeword will go away and @emph{you should not count on it}.
+ +@menu +* Preliminaries:: Introduction and preliminaries +* Invoking m4:: Invoking @code{m4} +* Syntax:: Lexical and syntactic conventions + +* Macros:: How to invoke macros +* Definitions:: How to define new macros +* Conditionals:: Conditionals, loops, and recursion + +* Debugging:: How to debug macros and input + +* Input Control:: Input control +* File Inclusion:: File inclusion +* Diversions:: Diverting and undiverting output + +* Text handling:: Macros for text handling +* Arithmetic:: Macros for doing arithmetic +* Shell commands:: Macros for running shell commands +* Miscellaneous:: Miscellaneous builtin macros +* Frozen files:: Fast loading of frozen state + +* Compatibility:: Compatibility with other versions of @code{m4} +* Answers:: Correct version of some examples + +* Copying This Package:: How to make copies of the overall M4 package +* Copying This Manual:: How to make copies of this manual +* Indices:: Indices of concepts and macros + +@detailmenu + --- The Detailed Node Listing --- + +Introduction and preliminaries + +* Intro:: Introduction to @code{m4} +* History:: Historical references +* Bugs:: Problems and bugs +* Manual:: Using this manual + +Invoking @code{m4} + +* Operation modes:: Command line options for operation modes +* Preprocessor features:: Command line options for preprocessor features +* Limits control:: Command line options for limits control +* Frozen state:: Command line options for frozen state +* Debugging options:: Command line options for debugging +* Command line files:: Specifying input files on the command line + +Lexical and syntactic conventions + +* Names:: Macro names +* Quoted strings:: Quoting input to @code{m4} +* Comments:: Comments in @code{m4} input +* Other tokens:: Other kinds of input tokens +* Input processing:: How @code{m4} copies input to output + +How to invoke macros + +* Invocation:: Macro invocation +* Inhibiting Invocation:: Preventing macro invocation +* Macro Arguments:: Macro arguments +* 
Quoting Arguments:: On Quoting Arguments to macros +* Macro expansion:: Expanding macros + +How to define new macros + +* Define:: Defining a new macro +* Arguments:: Arguments to macros +* Pseudo Arguments:: Special arguments to macros +* Undefine:: Deleting a macro +* Defn:: Renaming macros +* Pushdef:: Temporarily redefining macros + +* Indir:: Indirect call of macros +* Builtin:: Indirect call of builtins + +Conditionals, loops, and recursion + +* Ifdef:: Testing if a macro is defined +* Ifelse:: If-else construct, or multibranch +* Shift:: Recursion in @code{m4} +* Forloop:: Iteration by counting +* Foreach:: Iteration by list contents +* Stacks:: Working with definition stacks +* Composition:: Building macros with macros + +How to debug macros and input + +* Dumpdef:: Displaying macro definitions +* Trace:: Tracing macro calls +* Debug Levels:: Controlling debugging output +* Debug Output:: Saving debugging output + +Input control + +* Dnl:: Deleting whitespace in input +* Changequote:: Changing the quote characters +* Changecom:: Changing the comment delimiters +* Changeword:: Changing the lexical structure of words +* M4wrap:: Saving text until end of input + +File inclusion + +* Include:: Including named files +* Search Path:: Searching for include files + +Diverting and undiverting output + +* Divert:: Diverting output +* Undivert:: Undiverting output +* Divnum:: Diversion numbers +* Cleardivert:: Discarding diverted text + +Macros for text handling + +* Len:: Calculating length of strings +* Index macro:: Searching for substrings +* Regexp:: Searching for regular expressions +* Substr:: Extracting substrings +* Translit:: Translating characters +* Patsubst:: Substituting text by regular expression +* Format:: Formatting strings (printf-like) + +Macros for doing arithmetic + +* Incr:: Decrement and increment operators +* Eval:: Evaluating integer expressions + +Macros for running shell commands + +* Platform macros:: Determining the platform +* Syscmd:: 
Executing simple commands +* Esyscmd:: Reading the output of commands +* Sysval:: Exit status +* Mkstemp:: Making temporary files + +Miscellaneous builtin macros + +* Errprint:: Printing error messages +* Location:: Printing current location +* M4exit:: Exiting from @code{m4} + +Fast loading of frozen state + +* Using frozen files:: Using frozen files +* Frozen file format:: Frozen file format + +Compatibility with other versions of @code{m4} + +* Extensions:: Extensions in GNU M4 +* Incompatibilities:: Facilities in System V m4 not in GNU M4 +* Other Incompatibilities:: Other incompatibilities + +Correct version of some examples + +* Improved exch:: Solution for @code{exch} +* Improved forloop:: Solution for @code{forloop} +* Improved foreach:: Solution for @code{foreach} +* Improved copy:: Solution for @code{copy} +* Improved m4wrap:: Solution for @code{m4wrap} +* Improved cleardivert:: Solution for @code{cleardivert} +* Improved capitalize:: Solution for @code{capitalize} +* Improved fatal_error:: Solution for @code{fatal_error} + +How to make copies of the overall M4 package + +* GNU General Public License:: License for copying the M4 package + +How to make copies of this manual + +* GNU Free Documentation License:: License for copying this manual + +Indices of concepts and macros + +* Macro index:: Index for all @code{m4} macros +* Concept index:: Index for many concepts + +@end detailmenu +@end menu + +@node Preliminaries +@chapter Introduction and preliminaries + +This first chapter explains what GNU @code{m4} is, where @code{m4} +comes from, how to read and use this documentation, how to call the +@code{m4} program, and how to report bugs about it. It concludes by +giving tips for reading the remainder of the manual. + +The following chapters then detail all the features of the @code{m4} +language. 
+ +@menu +* Intro:: Introduction to @code{m4} +* History:: Historical references +* Bugs:: Problems and bugs +* Manual:: Using this manual +@end menu + +@node Intro +@section Introduction to @code{m4} + +@cindex overview of @code{m4} +@code{m4} is a macro processor, in the sense that it copies its +input to the output, expanding macros as it goes. Macros are either +builtin or user-defined, and can take any number of arguments. +Besides just doing macro expansion, @code{m4} has builtin functions +for including named files, running shell commands, doing integer +arithmetic, manipulating text in various ways, performing recursion, +etc. @code{m4} can be used either as a front-end to a compiler, +or as a macro processor in its own right. + +The @code{m4} macro processor is widely available on UNIX systems, and has +been standardized by POSIX. +Usually, only a small percentage of users are aware of its existence. +However, those who find it often become committed users. The +popularity of GNU Autoconf, which requires GNU +@code{m4} for @emph{generating} @file{configure} scripts, is an incentive +for many to install it, even though many of them will never +program in @code{m4} themselves. GNU @code{m4} is mostly compatible with the +System V, Release 3 version, except for some minor differences. +@xref{Compatibility}, for more details. + +Some people find @code{m4} to be fairly addictive. They first use +@code{m4} for simple problems, then take on bigger and bigger challenges, +learning how to write complex sets of @code{m4} macros along the way. +Once really addicted, users write sophisticated @code{m4} +applications even to solve simple problems, devoting more time to +debugging their @code{m4} scripts than to doing real work. Beware that +@code{m4} may be dangerous for the health of compulsive programmers. + +@node History +@section Historical references + +@cindex history of @code{m4} +@cindex GNU M4, history of +@code{GPM} was an important ancestor of @code{m4}.
See +C. Strachey: ``A General Purpose Macrogenerator'', Computer Journal +8,3 (1965), pp.@: 225 ff. @code{GPM} is also succinctly described in +David Gries' classic ``Compiler Construction for Digital Computers''. + +The classic B. Kernighan and P.J. Plauger: ``Software Tools'', +Addison-Wesley, Inc.@: (1976) describes and implements a Unix +macro-processor language, which inspired Dennis Ritchie to write +@code{m3}, a macro processor for the AP-3 minicomputer. + +Kernighan and Ritchie then joined forces to develop the original +@code{m4}, as described in ``The M4 Macro Processor'', Bell +Laboratories (1977). It had only 21 builtin macros. + +While @code{GPM} was more @emph{pure}, @code{m4} is meant to deal with +the true intricacies of real life: macros can be recognized without +being pre-announced, skipping whitespace or end-of-lines is easier, +more constructs are builtin instead of derived, etc. + +Originally, the Kernighan and Plauger macro-processor, and then +@code{m3}, formed the engine for the Rational FORTRAN preprocessor, +that is, the @code{Ratfor} equivalent of @code{cpp}. Later, @code{m4} +was used as a front-end for @code{Ratfor}, @code{C} and @code{Cobol}. + +Ren@'e Seindal released his implementation of @code{m4}, GNU +@code{m4}, +in 1990, with the aim of removing the artificial limitations in many +of the traditional @code{m4} implementations, such as maximum line +length, macro size, or number of macros. + +The late Professor A. Dain Samples described and implemented a further +evolution in the form of @code{M5}: ``User's Guide to the M5 Macro +Language: 2nd edition'', Electronic Announcement on comp.compilers +newsgroup (1992). + +Fran@,{c}ois Pinard took over maintenance of GNU @code{m4} in +1992, until 1994 when he released GNU @code{m4} 1.4, which was +the stable release for 10 years.
It was at this time that GNU +Autoconf decided to require GNU @code{m4} as its underlying +engine, since all other implementations of @code{m4} had too many +limitations. + +More recently, in 2004, Paul Eggert released 1.4.1 and 1.4.2, which +addressed some long-standing bugs in the venerable 1.4 release. Then in +2005, Gary V. Vaughan collected the many patches to +GNU @code{m4} 1.4 that were floating around the net and +released 1.4.3 and 1.4.4. And in 2006, Eric Blake joined the team and +prepared patches for the release of 1.4.5, 1.4.6, 1.4.7, and 1.4.8. +More bug fixes were incorporated in 2007, with releases 1.4.9 and +1.4.10. Eric continued with some portability fixes for 1.4.11 and +1.4.12 in 2008, 1.4.13 in 2009, 1.4.14 and 1.4.15 in 2010, and 1.4.16 in +2011. + +Meanwhile, development has continued on new features for @code{m4}, such +as dynamic module loading and additional builtins. When complete, +GNU @code{m4} 2.0 will start a new series of releases. + +@node Bugs +@section Problems and bugs + +@cindex reporting bugs +@cindex bug reports +@cindex suggestions, reporting +If you have problems with GNU M4 or think you've found a bug, +please report it. Before reporting a bug, make sure you've actually +found a real bug. Carefully reread the documentation and see if it +really says you can do what you're trying to do. If it's not clear +whether you should be able to do something or not, report that too; it's +a bug in the documentation! + +Before reporting a bug or trying to fix it yourself, try to isolate it +to the smallest possible input file that reproduces the problem. Then +send us the input file and the exact results @code{m4} gave you. Also +say what you expected to occur; this will help us decide whether the +problem was really in the documentation. + +Once you've got a precise problem, send e-mail to +@email{bug-m4@@gnu.org}. Please include the version number of @code{m4} +you are using.
You can get this information with the command +@kbd{m4 --version}. Also provide details about the platform you are +executing on. + +Non-bug suggestions are always welcome as well. If you have questions +about things that are unclear in the documentation or are just obscure +features, please report them too. + +@node Manual +@section Using this manual + +@cindex examples, understanding +This manual contains a number of examples of @code{m4} input and output, +and a simple notation is used to distinguish input, output and error +messages from @code{m4}. Examples are set out from the normal text, and +shown in a fixed-width font, like this: + +@comment ignore +@example +This is an example of an example! +@end example + +To distinguish input from output, all output from @code{m4} is prefixed +by the string @samp{@result{}}, and all error messages by the string +@samp{@error{}}. When showing how command line options affect matters, +the command line is shown with a prompt @samp{$ @kbd{like this}}; +otherwise, you can assume that a simple @kbd{m4} invocation will work. +Thus: + +@comment ignore +@example +$ @kbd{command line to invoke m4} +Example of input line +@result{}Output line from m4 +@error{}and an error message +@end example + +The sequence @samp{^D} in an example indicates the end of the input +file. The sequence @samp{@key{NL}} refers to the newline character. +The majority of these examples are self-contained, and you can run them +with similar results by invoking @kbd{m4 -d}. In fact, the testsuite +that is bundled in the GNU M4 package consists of the examples +in this document!
Some of the examples assume that your current +directory is located where you unpacked the installation, so if you plan +on following along, you may find it helpful to do this now: + +@comment ignore +@example +$ @kbd{cd m4-@value{VERSION}} +@end example + +As each of the predefined macros in @code{m4} is described, a prototype +call of the macro will be shown, giving descriptive names to the +arguments, e.g., + +@deffn Composite example (@var{string}, @dvar{count, 1}, @ + @ovar{argument}@dots{}) +This is a sample prototype. There is not really a macro named +@code{example}, but this documents that if there were, it would be a +Composite macro, rather than a Builtin. It requires at least one +argument, @var{string}. Remember that in @code{m4}, there must not be a +space between the macro name and the opening parenthesis, unless it was +intended to call the macro without any arguments. The brackets around +@var{count} and @var{argument} show that these arguments are optional. +If @var{count} is omitted, the macro behaves as if @var{count} were @samp{1}, +whereas if @var{argument} is omitted, the macro behaves as if it were +the empty string. A blank argument is not the same as an omitted +argument. For example, @samp{example(`a')}, @samp{example(`a',`1')}, +and @samp{example(`a',`1',)} would behave identically with @var{count} +set to @samp{1}, while @samp{example(`a',)} and @samp{example(`a',`')} +would explicitly pass the empty string for @var{count}. The ellipsis +(@samp{@dots{}}) shows that the macro processes additional arguments +after @var{argument}, rather than ignoring them. +@end deffn + +@cindex numbers +All macro arguments in @code{m4} are strings, but some are given +special interpretation, e.g., as numbers, file names, regular +expressions, etc. The documentation for each macro will state how the +parameters are interpreted, and what happens if the argument cannot be +parsed according to the desired interpretation.
Unless specified +otherwise, a parameter specified to be a number is parsed as a decimal, +even if the argument has leading zeros; and parsing the empty string as +a number results in 0 rather than an error, although a warning will be +issued. + +This document consistently writes and uses @dfn{builtin}, without a +hyphen, as if it were an English word. This is how the @code{builtin} +primitive is spelled within @code{m4}. + +@node Invoking m4 +@chapter Invoking @code{m4} + +@cindex command line +@cindex invoking @code{m4} +The format of the @code{m4} command is: + +@comment ignore +@example +@code{m4} @r{[}@var{option}@dots{}@r{]} @r{[}@var{file}@dots{}@r{]} +@end example + +@cindex command line, options +@cindex options, command line +@cindex @env{POSIXLY_CORRECT} +All options begin with @samp{-}, or if long option names are used, with +@samp{--}. A long option name need not be written completely; any +unambiguous prefix is sufficient. POSIX requires @code{m4} to +recognize options intermixed with files, even when +@env{POSIXLY_CORRECT} is set in the environment. Most options take +effect at startup regardless of their position, but some are documented +below as taking effect after any files that occurred earlier in the +command line. The argument @option{--} is a marker to denote the end of +options. + +With short options, options that do not take arguments may be combined +into a single command line argument with subsequent options; options +with mandatory arguments may be provided either as a single command line +argument or as two arguments; and options with optional arguments must +be provided as a single argument. In other words, +@kbd{m4 -QPDfoo -d a -df} is equivalent to +@kbd{m4 -Q -P -D foo -d -df -- ./a}, although the latter form is +considered canonical.
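+
+As an illustration (the macro names @code{greeting} and @code{who} are
+arbitrary), the single argument @option{-PDgreeting=hello} below should
+be parsed as @option{-P} followed by @option{-D greeting=hello}, so the
+builtin must be spelled @code{m4_define}:
+
+@comment ignore
+@example
+$ @kbd{m4 -PDgreeting=hello}
+m4_define(`who', `world')greeting, who!
+@result{}hello, world!
+@end example
+
+@noindent
+Spelling the same options canonically, as
+@kbd{m4 -P -D greeting=hello}, should produce the same output.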
+ +With long options, options with mandatory arguments may be provided with +an equal sign (@samp{=}) in a single argument, or as two arguments, and +options with optional arguments must be provided as a single argument. +In other words, @kbd{m4 --def foo --debug a} is equivalent to +@kbd{m4 --define=foo --debug= -- ./a}, although the latter form is +considered canonical (not to mention more robust, in case a future +version of @code{m4} introduces an option named @option{--default}). + +@code{m4} understands the following options, grouped by functionality. + +@menu +* Operation modes:: Command line options for operation modes +* Preprocessor features:: Command line options for preprocessor features +* Limits control:: Command line options for limits control +* Frozen state:: Command line options for frozen state +* Debugging options:: Command line options for debugging +* Command line files:: Specifying input files on the command line +@end menu + +@node Operation modes +@section Command line options for operation modes + +Several options control the overall operation of @code{m4}: + +@table @code +@item --help +Print a help summary on standard output, then immediately exit +@code{m4} without reading any input files or performing any other +actions. + +@item --version +Print the version number of the program on standard output, then +immediately exit @code{m4} without reading any input files or +performing any other actions. + +@item -E +@itemx --fatal-warnings +@cindex errors, fatal +@cindex fatal errors +Controls the effect of warnings. If unspecified, then execution +continues and exit status is unaffected when a warning is printed. If +specified exactly once, warnings become fatal; when one is issued, +execution continues, but the exit status will be non-zero. If specified +multiple times, then execution halts with non-zero status the first time +a warning is issued. 
The introduction of behavior levels is new to M4 +1.4.9; for behavior consistent with earlier versions, you should specify +@option{-E} twice. + +@item -i +@itemx --interactive +@itemx -e +Makes this invocation of @code{m4} interactive. This means that all +output will be unbuffered, and interrupts will be ignored. The +spelling @option{-e} exists for compatibility with other @code{m4} +implementations, and issues a warning because it may be withdrawn in a +future version of GNU M4. + +@item -P +@itemx --prefix-builtins +Internally modify @emph{all} builtin macro names so they all start with +the prefix @samp{m4_}. For example, using this option, one should write +@samp{m4_define} instead of @samp{define}, and @samp{m4___file__} +instead of @samp{__file__}. This option has no effect if @option{-R} +is also specified. + +@item -Q +@itemx --quiet +@itemx --silent +Suppress warnings, such as missing or superfluous arguments in macro +calls, or treating the empty string as zero. + +@item --warn-macro-sequence@r{[}=@var{regexp}@r{]} +Issue a warning if the regular expression @var{regexp} has a non-empty +match in any macro definition (either by @code{define} or +@code{pushdef}). Empty matches are ignored; therefore, supplying the +empty string as @var{regexp} disables any warning. If the optional +@var{regexp} is not supplied, then the default regular expression is +@samp{\$\(@{[^@}]*@}\|[0-9][0-9]+\)} (a literal @samp{$} followed by +multiple digits or by an open brace), since these sequences will +change semantics in the default operation of GNU M4 2.0 (due +to a change in how more than 9 arguments in a macro definition will be +handled, @pxref{Arguments}). Providing an alternate regular +expression can provide a useful reverse lookup feature of finding +where a macro is defined to have a given definition. + +@item -W @var{regexp} +@itemx --word-regexp=@var{regexp} +Use @var{regexp} as an alternative syntax for macro names. 
This +experimental option will not be present in all GNU @code{m4} +implementations (@pxref{Changeword}). +@end table + +@node Preprocessor features +@section Command line options for preprocessor features + +@cindex macro definitions, on the command line +@cindex command line, macro definitions on the +@cindex preprocessor features +Several options allow @code{m4} to behave more like a preprocessor. +Macro definitions and deletions can be made on the command line, the +search path can be altered, and the output file can track where the +input came from. These features occur with the following options: + +@table @code +@item -D @var{name}@r{[}=@var{value}@r{]} +@itemx --define=@var{name}@r{[}=@var{value}@r{]} +This enters @var{name} into the symbol table. If @samp{=@var{value}} is +missing, the value is taken to be the empty string. The @var{value} can +be any string, and the macro can be defined to take arguments, just as +if it was defined from within the input. This option may be given more +than once; order with respect to file names is significant, and +redefining the same @var{name} loses the previous value. + +@item -I @var{directory} +@itemx --include=@var{directory} +Make @code{m4} search @var{directory} for included files that are not +found in the current working directory. @xref{Search Path}, for more +details. This option may be given more than once. + +@item -s +@itemx --synclines +@cindex synchronization lines +@cindex location, input +@cindex input location +Generate synchronization lines, for use by the C preprocessor or other +similar tools. Order is significant with respect to file names. This +option is useful, for example, when @code{m4} is used as a +front end to a compiler. Source file name and line number information +is conveyed by directives of the form @samp{#line @var{linenum} +"@var{file}"}, which are inserted as needed into the middle of the +output. 
Such directives mean that the following line originated or was +expanded from the contents of input file @var{file} at line +@var{linenum}. The @samp{"@var{file}"} part is often omitted when +the file name did not change from the previous directive. + +Synchronization directives are always given on complete lines by +themselves. When a synchronization discrepancy occurs in the middle of +an output line, the associated synchronization directive is delayed +until the next newline that does not occur in the middle of a quoted +string or comment. + +@comment options: -s +@example +define(`twoline', `1 +2') +@result{}#line 2 "stdin" +@result{} +changecom(`/*', `*/') +@result{} +define(`comment', `/*1 +2*/') +@result{}#line 5 +@result{} +dnl no line +hello +@result{}#line 7 +@result{}hello +twoline +@result{}1 +@result{}#line 8 +@result{}2 +comment +@result{}/*1 +@result{}2*/ +one comment `two +three' +@result{}#line 10 +@result{}one /*1 +@result{}2*/ two +@result{}three +goodbye +@result{}#line 12 +@result{}goodbye +@end example + +@item -U @var{name} +@itemx --undefine=@var{name} +This deletes any predefined meaning @var{name} might have. Obviously, +only predefined macros can be deleted in this way. This option may be +given more than once; undefining a @var{name} that does not have a +definition is silently ignored. Order is significant with respect to +file names. +@end table + +@node Limits control +@section Command line options for limits control + +There are some limits within @code{m4} that can be tuned. For +compatibility, @code{m4} also accepts some options that control limits +in other implementations, but which are automatically unbounded (limited +only by your hardware and operating system constraints) in GNU +@code{m4}. + +@table @code +@item -g +@itemx --gnu +Enable all the extensions in this implementation. In this release of +M4, this option is always on by default; it is currently only useful +when overriding a prior use of @option{--traditional}. 
However, having +GNU behavior as default makes it impossible to write a +strictly POSIX-compliant client that avoids all incompatible +GNU M4 extensions, since such a client would have to use the +non-POSIX command-line option to force full POSIX +behavior. Thus, a future version of M4 will be changed to implicitly +use the option @option{--traditional} if the environment variable +@env{POSIXLY_CORRECT} is set. Projects that intentionally use +GNU extensions should consider using @option{--gnu} to state +their intentions, so that the project will not mysteriously break if the +user upgrades to a newer M4 and has @env{POSIXLY_CORRECT} set in their +environment. + +@item -G +@itemx --traditional +Suppress all the extensions made in this implementation, compared to the +System V version. @xref{Compatibility}, for a list of these. + +@item -H @var{num} +@itemx --hashsize=@var{num} +Make the internal hash table for symbol lookup be @var{num} entries big. +For better performance, the number should be prime, but this is not +checked. The default is 509 entries. It should not be necessary to +increase this value, unless you define an excessive number of macros. + +@item -L @var{num} +@itemx --nesting-limit=@var{num} +@cindex nesting limit +@cindex limit, nesting +Artificially limit the nesting of macro calls to @var{num} levels, +stopping program execution if this limit is ever exceeded. When not +specified, nesting defaults to unlimited on platforms that can detect +stack overflow, and to 1024 levels otherwise. A value of zero means +unlimited; but then heavily nested code could potentially cause a stack +overflow. + +The precise effect of this option is more correctly associated +with textual nesting than dynamic recursion. It has been useful +when some complex @code{m4} input was generated by mechanical means, and +also in diagnosing recursive algorithms that do not scale well. +Most users never need to change this option from its default. 
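+
+As a sketch of the difference (the macro @code{countdown} is a
+hypothetical example, not a builtin), each recursive call below is made
+while rescanning the previous expansion, after that expansion has
+finished. Only the @code{decr} call nested inside the argument of
+@code{countdown} adds a second level of nesting, so a limit of 2 should
+suffice for any non-negative starting value:
+
+@comment ignore
+@example
+$ @kbd{m4 -L 2}
+define(`countdown', `ifelse(`$1', `0', `done', `countdown(decr($1))')')
+@result{}
+countdown(`10')
+@result{}done
+@end example
+
+@noindent
+With @option{-L 1}, the same input would instead stop with an error as
+soon as @code{decr} is expanded inside the argument list of
+@code{countdown}.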
+ +@cindex rescanning +This option does @emph{not} have the ability to break endless +rescanning loops, since these do not necessarily consume much memory +or stack space. Through clever use of rescanning loops, one can +request complex, time-consuming computations from @code{m4} with useful +results. Putting limitations in this area would cripple the power of @code{m4}. +There are many pathological cases: @w{@samp{define(`a', `a')a}} is +only the simplest example (but @pxref{Compatibility}). Expecting GNU +@code{m4} to detect these would be a little like expecting a compiler +system to detect and diagnose endless loops: it is quite a @emph{hard} +problem in general, if not undecidable! + +@item -B @var{num} +@itemx -S @var{num} +@itemx -T @var{num} +These options are present for compatibility with System V @code{m4}, but +do nothing in this implementation. They issue a warning to that effect +and may disappear in future releases. + +@item -N @var{num} +@itemx --diversions=@var{num} +These options are present only for compatibility with previous +versions of GNU @code{m4}, where they used to control the number of +possible diversions that could be used at the same time. They do nothing, +because there is no fixed limit anymore. They issue a warning to that +effect and may disappear in future releases. +@end table + +@node Frozen state +@section Command line options for frozen state + +GNU @code{m4} comes with a feature of freezing internal state +(@pxref{Frozen files}). This can be used to speed up @code{m4} +execution when reusing a common initialization script. + +@table @code +@item -F @var{file} +@itemx --freeze-state=@var{file} +Once execution is finished, write out the frozen state to the specified +@var{file}. It is conventional, but not required, for @var{file} to end +in @samp{.m4f}. + +@item -R @var{file} +@itemx --reload-state=@var{file} +Before execution starts, recover the internal state from the specified +frozen @var{file}.
The options @option{-D}, @option{-U}, and +@option{-t} take effect after state is reloaded, but before the input +files are read. +@end table + +@node Debugging options +@section Command line options for debugging + +Finally, there are several options for aiding in debugging @code{m4} +scripts. + +@table @code +@item -d@r{[}@var{flags}@r{]} +@itemx --debug@r{[}=@var{flags}@r{]} +Set the debug-level according to the flags @var{flags}. The debug-level +controls the format and amount of information presented by the debugging +functions. @xref{Debug Levels}, for more details on the format and +meaning of @var{flags}. If omitted, @var{flags} defaults to @samp{aeq}. + +@item --debugfile@r{[}=@var{file}@r{]} +@itemx -o @var{file} +@itemx --error-output=@var{file} +Redirect @code{dumpdef} output, debug messages, and trace output to the +named @var{file}. Warnings, error messages, and @code{errprint} output +are still printed to standard error. If these options are not used, or +if @var{file} is unspecified (only possible for @option{--debugfile}), +debug output goes to standard error; if @var{file} is the empty string, +debug output is discarded. @xref{Debug Output}, for more details. The +option @option{--debugfile} may be given more than once, and order is +significant with respect to file names. The spellings @option{-o} and +@option{--error-output} are misleading and inconsistent with other +GNU tools; for now they are silently accepted as synonyms of +@option{--debugfile} and only recognized once, but in a future version +of M4, using them will cause a warning to be issued. 
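+
+In the following hypothetical invocation, the file names
+@file{input.m4} and @file{trace.log} are placeholders. Trace output
+for the macro @code{foo} is collected in @file{trace.log`. The expanded
+text still goes to standard output, and any warnings or @code{errprint}
+text still go to standard error:
+
+@comment ignore
+@example
+$ @kbd{m4 --debugfile=trace.log -tfoo input.m4}
+@end example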
+
+@ignore
+@comment not worth including in the manual, but provides a good test
+
+@comment examples
+@comment options: -Dbar=hello -tbar --debugfile= foo --debugfile -
+@example
+$ @kbd{m4 -d -Iexamples -Dbar=hello -tbar --debugfile= foo --debugfile -}
+@result{}hello
+errprint(`hi
+')dnl
+@error{}hi
+bar
+@error{}m4trace: -1- bar -> `hello'
+@result{}hello
+@end example
+@end ignore
+
+@item -l @var{num}
+@itemx --arglength=@var{num}
+Restrict the size of the output generated by macro tracing to @var{num}
+characters per trace line. If unspecified or zero, output is
+unlimited. @xref{Debug Levels}, for more details.
+
+@item -t @var{name}
+@itemx --trace=@var{name}
+This enables tracing for the macro @var{name}, at any point where it is
+defined. @var{name} need not be defined when this option is given.
+This option may be given more than once, and order is significant with
+respect to file names. @xref{Trace}, for more details.
+@end table
+
+@node Command line files
+@section Specifying input files on the command line
+
+@cindex command line, file names on the
+@cindex file names, on the command line
+The remaining arguments on the command line are taken to be input file
+names. If no names are present, standard input is read. A file
+name of @file{-} is taken to mean standard input. It is
+conventional, but not required, for input files to end in @samp{.m4}.
+
+The input files are read in the sequence given. Standard input can be
+read more than once, so the file name @file{-} may appear multiple times
+on the command line; this makes a difference when input is from a
+terminal or other special file type. It is an error if an input file
+ends in the middle of argument collection, a comment, or a quoted
+string.
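+
+In the following hypothetical invocation, all three file names are
+placeholders. @code{m4} reads @file{header.m4}, then the contents of
+@file{body.m4} supplied on standard input, and finally
+@file{footer.m4}. Macros defined while reading an earlier file remain
+defined for the files read after it:
+
+@comment ignore
+@example
+$ @kbd{m4 header.m4 - footer.m4 < body.m4}
+@end example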
+
+The options @option{--define} (@option{-D}), @option{--undefine}
+(@option{-U}), @option{--synclines} (@option{-s}), and @option{--trace}
+(@option{-t}) only take effect after processing input from any file
+names that occur earlier on the command line. For example, assume the
+file @file{foo} contains:
+
+@comment ignore
+@example
+$ @kbd{cat foo}
+bar
+@end example
+
+The text @samp{bar} can then be redefined over multiple uses of
+@file{foo}:
+
+@comment options: -Dbar=hello foo -Dbar=world foo
+@example
+$ @kbd{m4 -Dbar=hello foo -Dbar=world foo}
+@result{}hello
+@result{}world
+@end example
+
+If none of the input files invoked @code{m4exit} (@pxref{M4exit}), the
+exit status of @code{m4} will be 0 for success, 1 for general failure
+(such as problems with reading an input file), and 63 for version
+mismatch (@pxref{Using frozen files}).
+
+If you need to read a file whose name starts with a @file{-}, you can
+specify it as @samp{./-file}, or use @option{--} to mark the end of
+options.
+
+@ignore
+@comment Test that 'm4 file/' detects that file is not a directory; we
+@comment can assume that the current directory contains a Makefile.
+@comment mingw fails with EINVAL rather than ENOTDIR.
+
+@comment status: 1
+@comment xerr: ignore
+@comment options: Makefile/
+@example
+@error{}m4: cannot open `Makefile/': Not a directory
+@end example
+
+@comment Test that closed stderr does not cause a crash. Not all
+@comment systems have the same message for EBADF.
+
+@comment xerr: ignore
+@example
+ifdef(`__unix__', ,
+ `errprint(` skipping: syscmd does not have unix semantics
+')m4exit(`77')')dnl
+changequote(`[', `]')dnl
+syscmd([echo | ']__program__[' >&-])dnl
+@error{}m4: write error: Bad file descriptor
+sysval
+@result{}1
+@end example
+
+@example
+ifdef(`__unix__', ,
+ `errprint(` skipping: syscmd does not have unix semantics
+')m4exit(`77')')dnl
+changequote(`[', `]')dnl
+syscmd([echo 'esyscmd(echo hi >&2 && echo err"print(bye
+)d"nl)dnl' > tmp.m4 \
+ && ']__program__[' tmp.m4 <&- >&- \
+ && rm tmp.m4])sysval
+@error{}hi
+@error{}bye
+@result{}0
+@end example
+
+@comment Test that we obey POSIX semantics with -D interspersed with
+@comment files, even with POSIXLY_CORRECT (BSD getopt gets it wrong).
+
+@example
+ifdef(`__unix__', ,
+ `errprint(` skipping: syscmd does not have unix semantics
+')m4exit(`77')')dnl
+changequote(`[', `]')dnl
+syscmd([POSIXLY_CORRECT=1 ']__program__[' -Dbar=hello foo -Dbar=world foo])dnl
+@result{}hello
+@result{}world
+sysval
+@result{}0
+@end example
+@end ignore
+
+@node Syntax
+@chapter Lexical and syntactic conventions
+
+@cindex input tokens
+@cindex tokens
+As @code{m4} reads its input, it separates it into @dfn{tokens}. A
+token is either a name, a quoted string, or any single character that
+is not a part of either a name or a string. Input to @code{m4} can also
+contain comments. GNU @code{m4} does not yet understand
+multibyte locales; all operations are byte-oriented rather than
+character-oriented (although if your locale uses a single byte
+encoding, such as @sc{ISO-8859-1}, you will not notice a difference).
+However, @code{m4} is eight-bit clean, so you can
+use non-@sc{ascii} characters in quoted strings (@pxref{Changequote}),
+comments (@pxref{Changecom}), and macro names (@pxref{Indir}), with the
+exception of the @sc{nul} character (the zero byte @samp{'\0'}).
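+
+The following small example illustrates these rules. It borrows
+@code{define} (@pxref{Define}) to give @samp{foo} a definition, and
+then shows which parts of a single input line are recognized as the
+name @samp{foo}. The longest-sequence rule (@pxref{Names}) keeps
+@samp{foo_bar} whole. The quoted string @samp{`foo'} is output with
+its quotes stripped. Names are case-sensitive, and a leading digit is
+a token of its own:
+
+@example
+define(`foo', `X')
+@result{}
+foo_bar foo-bar `foo' Foo 1foo
+@result{}foo_bar X-bar foo Foo 1X
+@end example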
+
+@menu
+* Names:: Macro names
+* Quoted strings:: Quoting input to @code{m4}
+* Comments:: Comments in @code{m4} input
+* Other tokens:: Other kinds of input tokens
+* Input processing:: How @code{m4} copies input to output
+@end menu
+
+@node Names
+@section Macro names
+
+@cindex names
+@cindex words
+A name is any sequence of letters, digits, and the character @samp{_}
+(underscore), where the first character is not a digit. @code{m4} will
+use the longest such sequence found in the input. If a name has a
+macro definition, it will be subject to macro expansion
+(@pxref{Macros}). Names are case-sensitive.
+
+Examples of valid names are: @samp{foo}, @samp{_tmp}, and @samp{name01}.
+
+@node Quoted strings
+@section Quoting input to @code{m4}
+
+@cindex quoted string
+@cindex string, quoted
+A quoted string is a sequence of characters surrounded by quote
+strings, defaulting to
+@samp{`} and @samp{'}, where the nested begin and end quotes within the
+string are balanced. The value of a string token is the text, with one
+level of quotes stripped off. Thus
+
+@comment ignore
+@example
+`'
+@result{}
+@end example
+
+@noindent
+is the empty string, and double-quoting turns into single-quoting.
+
+@comment ignore
+@example
+``quoted''
+@result{}`quoted'
+@end example
+
+The quote characters can be changed at any time, using the builtin macro
+@code{changequote}. @xref{Changequote}, for more information.
+
+@node Comments
+@section Comments in @code{m4} input
+
+@cindex comments
+Comments in @code{m4} are normally delimited by the characters @samp{#}
+and newline. The characters between the comment delimiters are not
+scanned for macro calls, but the entire comment (including the
+delimiters) is passed through to the output; comments are @emph{not}
+discarded by @code{m4}.
+
+Comments cannot be nested, so the first newline after a @samp{#} ends
+the comment. The commenting effect of the begin-comment string
+can be inhibited by quoting it.
+
+@example
+$ @kbd{m4}
+`quoted text' # `commented text'
+@result{}quoted text # `commented text'
+`quoting inhibits' `#' `comments'
+@result{}quoting inhibits # comments
+@end example
+
+The comment delimiters can be changed to any string at any time, using
+the builtin macro @code{changecom}. @xref{Changecom}, for more
+information.
+
+@ignore
+@comment Detect regression in 1.4.10b in regards to reparsing comments.
+@comment Not worth including in the manual.
+@example
+define(`e', `$@@')define(`q', ``$@@'')define(`foo', `bar')
+@result{}
+q(e(`one
+',#two ' foo
+))
+@result{}`one
+@result{}',`#two bar
+@result{}''
+changecom(`<', `>')define(`n', `$#')
+@result{}
+n(e(<`>, <'>))
+@result{}1
+len(e(<`>, ,<'>))
+@result{}12
+@end example
+@end ignore
+
+@node Other tokens
+@section Other kinds of input tokens
+
+@cindex tokens, special
+Any character that is not part of a name, a quoted string, or a
+comment is a token by itself. When not in the context of macro
+expansion, all of these tokens are just copied to output. However,
+during macro expansion, whitespace characters (space, tab, newline,
+formfeed, carriage return, vertical tab), parentheses (@samp{(} and
+@samp{)}), comma (@samp{,}), and dollar (@samp{$}) have additional
+roles, explained later.
+
+@node Input processing
+@section How @code{m4} copies input to output
+
+As @code{m4} reads the input token by token, it copies each token
+directly to the output.
+
+The exception is when it finds a word with a macro definition. In that
+case @code{m4} will calculate the macro's expansion, possibly reading
+more input to get the arguments. It then inserts the expansion in front
+of the remaining input. In other words, the resulting text from a macro
+call will be read and parsed into tokens again.
+
+@code{m4} expands a macro as soon as possible. If it finds a macro call
+when collecting the arguments to another, it will expand the second call
+first. This process continues until there are no more macro calls to
+expand and all the input has been consumed.
+
+For a running example, examine how @code{m4} handles this input:
+
+@comment ignore
+@example
+format(`Result is %d', eval(`2**15'))
+@end example
+
+@noindent
+First, @code{m4} sees that the token @samp{format} is a macro name, so
+it collects the tokens @samp{(}, @samp{`Result is %d'}, @samp{,},
+and @samp{@w{ }}, before encountering another potential macro. Sure
+enough, @samp{eval} is a macro name, so the nested argument collection
+picks up @samp{(}, @samp{`2**15'}, and @samp{)}, invoking the eval macro
+with the lone argument of @samp{2**15}. The expansion of
+@samp{eval(2**15)} is @samp{32768}, which is then rescanned as the five
+tokens @samp{3}, @samp{2}, @samp{7}, @samp{6}, and @samp{8}; and
+combined with the next @samp{)}, the format macro now has all its
+arguments, as if the user had typed:
+
+@comment ignore
+@example
+format(`Result is %d', 32768)
+@end example
+
+@noindent
+The format macro expands to @samp{Result is 32768}, and we have another
+round of scanning for the tokens @samp{Result}, @samp{@w{ }},
+@samp{is}, @samp{@w{ }}, @samp{3}, @samp{2}, @samp{7}, @samp{6}, and
+@samp{8}. None of these are macros, so the final output is
+
+@comment ignore
+@example
+@result{}Result is 32768
+@end example
+
+As a more complicated example, we will contrast an actual code
+example from the Gnulib project@footnote{Derived from a patch in
+@uref{http://lists.gnu.org/archive/html/bug-gnulib/@/2007-01/@/msg00389.html},
+and a followup patch in
+@uref{http://lists.gnu.org/archive/html/bug-gnulib/@/2007-02/@/msg00000.html}},
+showing both a buggy approach and the desired results. The user wants
+to output a shell assignment statement that takes its argument and turns
+it into a shell variable by converting it to uppercase and prepending a
+prefix. The original attempt looks like this:
+
+@example
+changequote([,])dnl
+define([gl_STRING_MODULE_INDICATOR],
+  [
+    dnl comment
+    GNULIB_]translit([$1],[a-z],[A-Z])[=1
+  ])dnl
+  gl_STRING_MODULE_INDICATOR([strcase])
+@result{} @w{ }
+@result{}        GNULIB_strcase=1
+@result{} @w{ }
+@end example
+
+Oops -- the argument did not get capitalized. And although the manual
+is not able to easily show it, both lines that appear empty actually
+contain two trailing spaces. By stepping through the parse, it is easy
+to see what happened. First, @code{m4} sees the token
+@samp{changequote}, which it recognizes as a macro, followed by
+@samp{(}, @samp{[}, @samp{,}, @samp{]}, and @samp{)} to form the
+argument list. The macro expands to the empty string, but changes the
+quoting characters to something more useful for generating shell code
+(unbalanced @samp{`} and @samp{'} appear all the time in shell scripts,
+but unbalanced @samp{[]} tend to be rare). Also in the first line,
+@code{m4} sees the token @samp{dnl}, which it recognizes as a builtin
+macro that consumes the rest of the line, resulting in no output for
+that line.
+
+The second line starts a macro definition. @code{m4} sees the token
+@samp{define}, which it recognizes as a macro, followed by a @samp{(},
+@samp{[gl_STRING_MODULE_INDICATOR]}, and @samp{,}. Because an unquoted
+comma was encountered, the first argument is known to be the expansion
+of the single-quoted string token, or @samp{gl_STRING_MODULE_INDICATOR}.
+Next, @code{m4} sees @samp{@key{NL}}, @samp{ }, and @samp{ }, but this
+whitespace is discarded as part of argument collection. Then comes a
+rather lengthy single-quoted string token, @samp{[@key{NL}@ @ @ @ dnl
+comment@key{NL}@ @ @ @ GNULIB_]}. This is followed by the token
+@samp{translit}, which @code{m4} recognizes as a macro name, so a nested
+macro expansion has started.
+
+The arguments to @code{translit} are found by the tokens @samp{(},
+@samp{[$1]}, @samp{,}, @samp{[a-z]}, @samp{,}, @samp{[A-Z]}, and finally
+@samp{)}. All three string arguments are expanded (or in other words,
+the quotes are stripped), and since neither @samp{$} nor @samp{1} needs
+capitalization, the result of the macro is @samp{$1}. This expansion is
+rescanned, resulting in the two literal characters @samp{$} and
+@samp{1}.
+
+Scanning of the outer macro resumes, and picks up with
+@samp{[=1@key{NL}@ @ ]}, and finally @samp{)}. The collected pieces of
+expanded text are concatenated, with the end result that the macro
+@samp{gl_STRING_MODULE_INDICATOR} is now defined to be the sequence
+@samp{@key{NL}@ @ @ @ dnl comment@key{NL}@ @ @ @ GNULIB_$1=1@key{NL}@ @ }.
+Once again, @samp{dnl} is recognized and avoids a newline in the output.
+
+The final line is then parsed, beginning with @samp{ } and @samp{ }
+that are output literally. Then @samp{gl_STRING_MODULE_INDICATOR} is
+recognized as a macro name, with an argument list of @samp{(},
+@samp{[strcase]}, and @samp{)}. Since the definition of the macro
+contains the sequence @samp{$1}, that sequence is replaced with the
+argument @samp{strcase} prior to starting the rescan. The rescan sees
+@samp{@key{NL}} and four spaces, which are output literally, then
+@samp{dnl}, which discards the text @samp{ comment@key{NL}}. Next
+come four more spaces, also output literally, and the token
+@samp{GNULIB_strcase}, which resulted from the earlier parameter
+substitution. Since that is not a macro name, it is output literally,
+followed by the literal tokens @samp{=}, @samp{1}, @samp{@key{NL}}, and
+two more spaces. Finally, the original @samp{@key{NL}} seen after the
+macro invocation is scanned and output literally.
+
+Now for a corrected approach. This rearranges the use of newlines and
+whitespace so that less whitespace is output (which, although harmless
+to shell scripts, can be visually unappealing), and fixes the quoting
+issues so that the capitalization occurs when the macro
+@samp{gl_STRING_MODULE_INDICATOR} is invoked, rather than when it is
+defined. It also adds another layer of quoting to the first argument of
+@code{translit}, to ensure that the output will be rescanned as a string
+rather than a potential uppercase macro name needing further expansion.
+
+@example
+changequote([,])dnl
+define([gl_STRING_MODULE_INDICATOR],
+  [dnl comment
+  GNULIB_[]translit([[$1]], [a-z], [A-Z])=1dnl
+])dnl
+  gl_STRING_MODULE_INDICATOR([strcase])
+@result{}    GNULIB_STRCASE=1
+@end example
+
+The parsing of the first line is unchanged. The second line sees the
+name of the macro to define, then sees the discarded @samp{@key{NL}}
+and two spaces, as before. But this time, the next token is
+@samp{[dnl comment@key{NL}@ @ GNULIB_[]translit([[$1]], [a-z],
+[A-Z])=1dnl@key{NL}]}, which includes nested quotes, followed by
+@samp{)} to end the macro definition and @samp{dnl} to skip the
+newline. No early expansion of @code{translit} occurs, so the entire
+string becomes the definition of the macro.
+
+The final line is then parsed, beginning with two spaces that are
+output literally, and an invocation of
+@code{gl_STRING_MODULE_INDICATOR} with the argument @samp{strcase}.
+Again, the @samp{$1} in the macro definition is substituted prior to
+rescanning. Rescanning first encounters @samp{dnl}, and discards
+@samp{ comment@key{NL}}. Then two spaces are output literally. Next
+comes the token @samp{GNULIB_}, but that is not a macro, so it is
+output literally. The token @samp{[]} is an empty string, so it does
+not affect output. Then the token @samp{translit} is encountered.
+
+This time, the arguments to @code{translit} are parsed as @samp{(},
+@samp{[[strcase]]}, @samp{,}, @samp{ }, @samp{[a-z]}, @samp{,}, @samp{ },
+@samp{[A-Z]}, and @samp{)}. The two spaces are discarded, and
+@code{translit} produces the desired result @samp{[STRCASE]}. This is
+rescanned, but since it is a string, the quotes are stripped and the
+only output is a literal @samp{STRCASE}.
+Then the scanner sees @samp{=} and @samp{1}, which are output
+literally, followed by @samp{dnl} which discards the rest of the
+definition of @code{gl_STRING_MODULE_INDICATOR}. The newline at the
+end of the output is the literal @samp{@key{NL}} that appeared after the
+invocation of the macro.
+
+The order in which @code{m4} expands the macros can be further explored
+using the trace facilities of GNU @code{m4} (@pxref{Trace}).
+
+@node Macros
+@chapter How to invoke macros
+
+This chapter covers macro invocation, macro arguments and how macro
+expansion is treated.
+
+@menu
+* Invocation:: Macro invocation
+* Inhibiting Invocation:: Preventing macro invocation
+* Macro Arguments:: Macro arguments
+* Quoting Arguments:: On Quoting Arguments to macros
+* Macro expansion:: Expanding macros
+@end menu
+
+@node Invocation
+@section Macro invocation
+
+@cindex macro invocation
+@cindex invoking macros
+A macro invocation has one of the forms
+
+@comment ignore
+@example
+name
+@end example
+
+@noindent
+which is a macro invocation without any arguments, or
+
+@comment ignore
+@example
+name(arg1, arg2, @dots{}, arg@var{n})
+@end example
+
+@noindent
+which is a macro invocation with @var{n} arguments. Macros can have any
+number of arguments. All arguments are strings, but different macros
+might interpret the arguments in different ways.
+
+The opening parenthesis @emph{must} follow the @var{name} directly, with
+no spaces in between. If it does not, the macro is called with no
+arguments at all.
+
+For a macro call to have no arguments, the parentheses @emph{must} be
+left out. The macro call
+
+@comment ignore
+@example
+name()
+@end example
+
+@noindent
+is a macro call with one argument, which is the empty string, not a call
+with no arguments.
+
+@node Inhibiting Invocation
+@section Preventing macro invocation
+
+An innovation of the @code{m4} language, compared to some of its
+predecessors (like Strachey's @code{GPM}, for example), is the ability
+to recognize macro calls without resorting to any special, prefixed
+invocation character. While generally useful, this feature might
+sometimes be the source of spurious, unwanted macro calls. GNU
+@code{m4} therefore offers several mechanisms or techniques for
+inhibiting the recognition of names as macro calls.
+
+@cindex GNU extensions
+@cindex blind macro
+@cindex macro, blind
+First of all, many builtin macros cannot meaningfully be called without
+arguments. As a GNU extension, for any of these macros,
+whenever an opening parenthesis does not immediately follow their name,
+the builtin macro call is not triggered. This solves the most common
+cases, such as @samp{include} or @samp{eval}. Later in this document,
+the sentence ``This macro is recognized only with parameters'' refers to
+this specific provision of GNU M4, also known as a blind
+builtin macro. For the builtins defined by POSIX that bear
+this disclaimer, POSIX specifically states that invoking those
+builtins without arguments is unspecified, because many other
+implementations simply invoke the builtin as though it were given one
+empty argument instead.
+
+@example
+$ @kbd{m4}
+eval
+@result{}eval
+eval(`1')
+@result{}1
+@end example
+
+There is also a command line option (@option{--prefix-builtins}, or
+@option{-P}, @pxref{Operation modes, , Invoking m4}) that renames all
+builtin macros with a prefix of @samp{m4_} at startup. The option has
+no effect whatsoever on user defined macros. For example, with this option,
+one has to write @code{m4_dnl} and even @code{m4_m4exit}. It also has
+no effect on whether a macro requires parameters.
+
+@comment options: -P
+@example
+$ @kbd{m4 -P}
+eval
+@result{}eval
+eval(`1')
+@result{}eval(1)
+m4_eval
+@result{}m4_eval
+m4_eval(`1')
+@result{}1
+@end example
+
+Another alternative is to rename problematic macros to names less
+likely to cause conflicts (@pxref{Definitions}).
+
+If your version of GNU @code{m4} has the @code{changeword} feature
+compiled in, it offers far more flexibility in specifying the
+syntax of macro names, both builtin and user-defined. @xref{Changeword},
+for more information on this experimental feature.
+
+Of course, the simplest way to prevent a name from being interpreted
+as a call to an existing macro is to quote it. The remainder of
+this section studies a little more deeply how quoting affects macro
+invocation, and how quoting can be used to inhibit macro invocation.
+
+Although quoting is usually applied to the whole macro name, it can also
+be applied to only a few characters of this name (provided, of course,
+that the unquoted portions are not also a macro). It is also possible
+to quote the empty string, but this works only @emph{inside} the name.
+For example:
+
+@example
+`divert'
+@result{}divert
+`d'ivert
+@result{}divert
+di`ver't
+@result{}divert
+div`'ert
+@result{}divert
+@end example
+
+@noindent
+all yield the string @samp{divert}. However, in both:
+
+@example
+`'divert
+@result{}
+divert`'
+@result{}
+@end example
+
+@noindent
+the @code{divert} builtin macro will be called, which expands to the
+empty string.
+
+@cindex rescanning
+The output of macro evaluations is always rescanned. In the following
+example, the input @samp{x`'y} yields the string @samp{bCD}, exactly as
+if @code{m4} had been given @w{@samp{substr(ab`'cde, `1', `3')}} as input:
+
+@example
+define(`cde', `CDE')
+@result{}
+define(`x', `substr(ab')
+@result{}
+define(`y', `cde, `1', `3')')
+@result{}
+x`'y
+@result{}bCD
+@end example
+
+@ignore
+@comment Similar, but with argument references, to ensure good test
+@comment coverage.
+@example
+define(`x1', `len(`$1'')
+@result{}
+define(`y1', ``$1')')
+@result{}
+x1(`01234567890123456789')y1(`98765432109876543210')
+@result{}40
+@end example
+@end ignore
+
+Unquoted strings on either side of a quoted string are subject to
+being recognized as macro names. In the following example, quoting the
+empty string allows for the second @code{macro} to be recognized as such:
+
+@example
+define(`macro', `m')
+@result{}
+macro(`m')macro
+@result{}mmacro
+macro(`m')`'macro
+@result{}mm
+@end example
+
+Quoting may prevent recognizing as a macro name the concatenation of a
+macro expansion with the surrounding characters. In this example:
+
+@example
+define(`macro', `di$1')
+@result{}
+macro(`v')`ert'
+@result{}divert
+macro(`v')ert
+@result{}
+@end example
+
+@noindent
+the input will produce the string @samp{divert}. When the quotes were
+removed, the @code{divert} builtin was called instead.
+
+@node Macro Arguments
+@section Macro arguments
+
+@cindex macros, arguments to
+@cindex arguments to macros
+When a name is seen, and it has a macro definition, it will be expanded
+as a macro.
+
+If the name is followed by an opening parenthesis, the arguments will be
+collected before the macro is called. If too few arguments are
+supplied, the missing arguments are taken to be the empty string.
+However, some builtins are documented to behave differently for a
+missing optional argument than for an explicit empty string. If there
+are too many arguments, the excess arguments are ignored. Unquoted
+leading whitespace is stripped off all arguments, but whitespace
+generated by a macro expansion or occurring after a macro that expanded
+to an empty string remains intact. Whitespace includes space, tab,
+newline, carriage return, vertical tab, and formfeed.
+
+@example
+define(`macro', `$1')
+@result{}
+macro( unquoted leading space lost)
+@result{}unquoted leading space lost
+macro(` quoted leading space kept')
+@result{} quoted leading space kept
+macro(
+ divert `unquoted space kept after expansion')
+@result{} unquoted space kept after expansion
+macro(macro(`
+')`whitespace from expansion kept')
+@result{}
+@result{}whitespace from expansion kept
+macro(`unquoted trailing whitespace kept'
+)
+@result{}unquoted trailing whitespace kept
+@result{}
+@end example
+
+@cindex warnings, suppressing
+@cindex suppressing warnings
+Normally @code{m4} will issue warnings if a builtin macro is called
+with an inappropriate number of arguments, but these warnings can be
+suppressed with the @option{--quiet} command line option (or
+@option{--silent}, or @option{-Q}, @pxref{Operation modes, , Invoking
+m4}). For user defined macros, there is no check of the number of
+arguments given.
+
+@example
+$ @kbd{m4}
+index(`abc')
+@error{}m4:stdin:1: Warning: too few arguments to builtin `index'
+@result{}0
+index(`abc',)
+@result{}0
+index(`abc', `b', `ignored')
+@error{}m4:stdin:3: Warning: excess arguments to builtin `index' ignored
+@result{}1
+@end example
+
+@comment options: -Q
+@example
+$ @kbd{m4 -Q}
+index(`abc')
+@result{}0
+index(`abc',)
+@result{}0
+index(`abc', `b', `ignored')
+@result{}1
+@end example
+
+Macros are expanded normally during argument collection, and any
+commas, quotes, and parentheses that show up in the resulting
+expanded text will serve to define the arguments as well. Thus, if
+@var{foo} expands to @samp{, b, c}, the macro call
+
+@comment ignore
+@example
+bar(a foo, d)
+@end example
+
+@noindent
+is a macro call with four arguments, which are @samp{a }, @samp{b},
+@samp{c} and @samp{d}. To understand why the first argument contains
+whitespace, remember that unquoted leading whitespace is never part
+of an argument, but trailing whitespace always is.
+
+It is possible for a macro's definition to change during argument
+collection, in which case the expansion uses the definition that was in
+effect at the time the opening @samp{(} was seen.
+
+@example
+define(`f', `1')
+@result{}
+f(define(`f', `2'))
+@result{}1
+f
+@result{}2
+@end example
+
+It is an error if the end of file occurs while collecting arguments.
+
+@comment status: 1
+@example
+hello world
+@result{}hello world
+define(
+^D
+@error{}m4:stdin:2: ERROR: end of file in argument list
+@end example
+
+@node Quoting Arguments
+@section On Quoting Arguments to macros
+
+@cindex quoted macro arguments
+@cindex macros, quoted arguments to
+@cindex arguments, quoted macro
+Each argument has unquoted leading whitespace removed. Within each
+argument, all unquoted parentheses must match. For example, if
+@var{foo} is a macro,
+
+@comment ignore
+@example
+foo(() (`(') `(')
+@end example
+
+@noindent
+is a macro call with one argument, whose value is @samp{() (() (}.
+Commas separate arguments, except when they occur inside quotes,
+comments, or unquoted parentheses. @xref{Pseudo Arguments}, for
+examples.
+
+It is common practice to quote all arguments to macros, unless you are
+sure you want the arguments expanded. Thus, in the above
+example with the parentheses, the `right' way to do it is like this:
+
+@comment ignore
+@example
+foo(`() (() (')
+@end example
+
+@cindex quoting rule of thumb
+@cindex rule of thumb, quoting
+It is, however, in certain cases necessary (because nested expansion
+must occur to create the arguments for the outer macro) or convenient
+(because it uses fewer characters) to leave out quotes for some
+arguments, and there is nothing wrong with doing so. It just makes life
+a bit harder if you are not careful to follow a consistent quoting style.
+For consistency, this manual follows the rule of thumb that each layer
+of parentheses introduces another layer of single quoting, except when
+showing the consequences of quoting rules. This is done even when the
+quoted string cannot be a macro, such as with integers when you have not
+changed the syntax via @code{changeword} (@pxref{Changeword}).
+
+The quoting rule of thumb of one level of quoting per pair of
+parentheses has a nice property: when a macro name appears inside
+parentheses, you can determine when it will be expanded. If it is not
+quoted, it will be expanded prior to the outer macro, so that its
+expansion becomes the argument. If it is single-quoted, it will be
+expanded after the outer macro. And if it is double-quoted, it will be
+used as literal text instead of a macro name.
+
+@example
+define(`active', `ACT, IVE')
+@result{}
+define(`show', `$1 $1')
+@result{}
+show(active)
+@result{}ACT ACT
+show(`active')
+@result{}ACT, IVE ACT, IVE
+show(``active'')
+@result{}active active
+@end example
+
+@node Macro expansion
+@section Macro expansion
+
+@cindex macros, expansion of
+@cindex expansion of macros
+When the arguments, if any, to a macro call have been collected, the
+macro is expanded, and the expansion text is pushed back onto the input
+(unquoted), and reread. The expansion text from one macro call might
+therefore result in more macros being called, if the calls are included,
+completely or partially, in the first macro call's expansion.
+
+Taking a very simple example, if @var{foo} expands to @samp{bar}, and
+@var{bar} expands to @samp{Hello}, the input
+
+@comment options: -Dbar=Hello -Dfoo=bar
+@example
+$ @kbd{m4 -Dbar=Hello -Dfoo=bar}
+foo
+@result{}Hello
+@end example
+
+@noindent
+will expand first to @samp{bar}, and when this is reread and
+expanded, into @samp{Hello}.
+
+@ignore
+@comment not worth documenting, but test that the command line can
+@comment define macros that take parameters
+
+@comment options: -Dfoo -Decho=$@@
+@example
+$ @kbd{m4 -Dfoo -Decho='$@@'}
+foo
+@result{}
+foo(`silently ignored')
+@result{}
+echo(`1', `2')
+@result{}1,2
+@end example
+@end ignore
+
+@node Definitions
+@chapter How to define new macros
+
+@cindex macros, how to define new
+@cindex defining new macros
+Macros can be defined, redefined and deleted in several different ways.
+Also, it is possible to redefine a macro without losing a previous
+value, and bring back the original value at a later time.
+
+@menu
+* Define:: Defining a new macro
+* Arguments:: Arguments to macros
+* Pseudo Arguments:: Special arguments to macros
+* Undefine:: Deleting a macro
+* Defn:: Renaming macros
+* Pushdef:: Temporarily redefining macros
+
+* Indir:: Indirect call of macros
+* Builtin:: Indirect call of builtins
+@end menu
+
+@node Define
+@section Defining a macro
+
+The normal way to define or redefine macros is to use the builtin
+@code{define}:
+
+@deffn Builtin define (@var{name}, @ovar{expansion})
+Defines @var{name} to expand to @var{expansion}. If
+@var{expansion} is not given, it is taken to be empty.
+
+The expansion of @code{define} is void.
+The macro @code{define} is recognized only with parameters.
+@end deffn
+
+The following example defines the macro @var{foo} to expand to the text
+@samp{Hello world.}.
+ +@example +define(`foo', `Hello world.') +@result{} +foo +@result{}Hello world. +@end example + +The empty line in the output is there because the newline is not +a part of the macro definition, and it is consequently copied to +the output. This can be avoided by use of the macro @code{dnl}. +@xref{Dnl}, for details. + +The first argument to @code{define} should be quoted; otherwise, if the +macro is already defined, you will be defining a different macro. This +example shows the problems with underquoting, since we did not want to +redefine @code{one}: + +@example +define(foo, one) +@result{} +define(foo, two) +@result{} +one +@result{}two +@end example + +@cindex GNU extensions +GNU @code{m4} normally replaces only the @emph{topmost} +definition of a macro if it has several definitions from @code{pushdef} +(@pxref{Pushdef}). Some other implementations of @code{m4} replace all +definitions of a macro with @code{define}. @xref{Incompatibilities}, +for more details. + +As a GNU extension, the first argument to @code{define} does +not have to be a simple word. +It can be any text string, even the empty string. A macro with a +non-standard name cannot be invoked in the normal way, as the name is +not recognized. It can only be referenced by the builtins @code{indir} +(@pxref{Indir}) and @code{defn} (@pxref{Defn}). + +@cindex arrays +Arrays and associative arrays can be simulated by using non-standard +macro names. + +@deffn Composite array (@var{index}) +@deffnx Composite array_set (@var{index}, @ovar{value}) +Provide access to entries within an array. @code{array} reads the entry +at location @var{index}, and @code{array_set} assigns @var{value} to +location @var{index}. +@end deffn + +@example +define(`array', `defn(format(``array[%d]'', `$1'))') +@result{} +define(`array_set', `define(format(``array[%d]'', `$1'), `$2')') +@result{} +array_set(`4', `array element no. 4') +@result{} +array_set(`17', `array element no. 
17') +@result{} +array(`4') +@result{}array element no. 4 +array(eval(`10 + 7')) +@result{}array element no. 17 +@end example + +Change the @samp{%d} to @samp{%s} and it is an associative array. + +@node Arguments +@section Arguments to macros + +@cindex macros, arguments to +@cindex arguments to macros +Macros can have arguments. The @var{n}th argument is denoted by +@code{$n} in the expansion text, and is replaced by the @var{n}th actual +argument, when the macro is expanded. Replacement of arguments happens +before rescanning, regardless of how many nesting levels of quoting +appear in the expansion. Here is an example of a macro with +two arguments. + +@deffn Composite exch (@var{arg1}, @var{arg2}) +Expands to @var{arg2} followed by @var{arg1}, effectively exchanging +their order. +@end deffn + +@example +define(`exch', `$2, $1') +@result{} +exch(`arg1', `arg2') +@result{}arg2, arg1 +@end example + +This can be used, for example, if you like the arguments to +@code{define} to be reversed. + +@example +define(`exch', `$2, $1') +@result{} +define(exch(``expansion text'', ``macro'')) +@result{} +macro +@result{}expansion text +@end example + +@xref{Quoting Arguments}, for an explanation of the double quotes. +(You should try and improve this example so that clients of @code{exch} +do not have to double quote; or @pxref{Improved exch, , Answers}). + +As a special case, the zeroth argument, @code{$0}, is always the name +of the macro being expanded. + +@example +define(`test', ``Macro name: $0'') +@result{} +test +@result{}Macro name: test +@end example + +If you want quoted text to appear as part of the expansion text, +remember that quotes can be nested in quoted strings. Thus, in + +@example +define(`foo', `This is macro `foo'.') +@result{} +foo +@result{}This is macro foo. +@end example + +@noindent +The @samp{foo} in the expansion text is @emph{not} expanded, since it is +a quoted string, and not a name. 
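+ +The following sketch, using a throwaway definition of @code{foo} chosen only for illustration, shows that argument substitution ignores quoting depth: both occurrences of @samp{$1} are replaced, although the second one sits inside doubly nested quotes, and rescanning then strips one level of quotes from each copy. + +@example +define(`foo', ``$1' and ``$1''') +@result{} +foo(`x') +@result{}x and `x' +@end example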
+ +@cindex GNU extensions +@cindex nine arguments, more than +@cindex more than nine arguments +@cindex arguments, more than nine +@cindex positional parameters, more than nine +GNU @code{m4} allows the number following the @samp{$} to +consist of one or more digits, allowing macros to have any number of +arguments. The extension of accepting multiple digits is incompatible +with POSIX, and is different than traditional implementations +of @code{m4}, which only recognize one digit. Therefore, future +versions of GNU M4 will phase out this feature. To portably +access beyond the ninth argument, you can use the @code{argn} macro +documented later (@pxref{Shift}). + +POSIX also states that @samp{$} followed immediately by +@samp{@{} in a macro definition is implementation-defined. This version +of M4 passes the literal characters @samp{$@{} through unchanged, but M4 +2.0 will implement an optional feature similar to @command{sh}, where +@samp{$@{11@}} expands to the eleventh argument, to replace the current +recognition of @samp{$11}. Meanwhile, if you want to guarantee that you +will get a literal @samp{$@{} in output when expanding a macro, even +when you upgrade to M4 2.0, you can use nested quoting to your +advantage: + +@example +define(`foo', `single quoted $`'@{1@} output') +@result{} +define(`bar', ``double quoted $'`@{2@} output'') +@result{} +foo(`a', `b') +@result{}single quoted $@{1@} output +bar(`a', `b') +@result{}double quoted $@{2@} output +@end example + +To help you detect places in your M4 input files that might change in +behavior due to the changed behavior of M4 2.0, you can use the +@option{--warn-macro-sequence} command-line option (@pxref{Operation +modes, , Invoking m4}) with the default regular expression. This will +add a warning any time a macro definition includes @samp{$} followed by +multiple digits, or by @samp{@{}. 
The warning is not enabled by +default, because it triggers a number of warnings in Autoconf 2.61 (and +Autoconf uses @option{-E} to treat warnings as errors), and because it +will still be possible to restore older behavior in M4 2.0. + +@comment options: --warn-macro-sequence +@example +$ @kbd{m4 --warn-macro-sequence} +define(`foo', `$001 $@{1@} $1') +@error{}m4:stdin:1: Warning: definition of `foo' contains sequence `$001' +@error{}m4:stdin:1: Warning: definition of `foo' contains sequence `$@{1@}' +@result{} +foo(`bar') +@result{}bar $@{1@} bar +@end example + +@node Pseudo Arguments +@section Special arguments to macros + +@cindex special arguments to macros +@cindex macros, special arguments to +@cindex arguments to macros, special +There is a special notation for the number of actual arguments supplied, +and for all the actual arguments. + +The number of actual arguments in a macro call is denoted by @code{$#} +in the expansion text. + +@deffn Composite nargs (@dots{}) +Expands to a count of the number of arguments supplied. +@end deffn + +@example +define(`nargs', `$#') +@result{} +nargs +@result{}0 +nargs() +@result{}1 +nargs(`arg1', `arg2', `arg3') +@result{}3 +nargs(`commas can be quoted, like this') +@result{}1 +nargs(arg1#inside comments, commas do not separate arguments +still arg1) +@result{}1 +nargs((unquoted parentheses, like this, group arguments)) +@result{}1 +@end example + +Remember that @samp{#} defaults to the comment character; if you forget +quotes to inhibit the comment behavior, your macro definition may not +end where you expected. + +@example +dnl Attempt to define a macro to just `$#' +define(underquoted, $#) +oops) +@result{} +underquoted +@result{}0) +@result{}oops +@end example + +The notation @code{$*} can be used in the expansion text to denote all +the actual arguments, unquoted, with commas in between. 
For example: + +@example +define(`echo', `$*') +@result{} +echo(arg1, arg2, arg3 , arg4) +@result{}arg1,arg2,arg3 ,arg4 +@end example + +Often each argument should be quoted, and the notation @code{$@@} handles +that. It is just like @code{$*}, except that it quotes each argument. +A simple example of that is: + +@example +define(`echo', `$@@') +@result{} +echo(arg1, arg2, arg3 , arg4) +@result{}arg1,arg2,arg3 ,arg4 +@end example + +Where did the quotes go? Of course, they were eaten when the expanded +text was reread by @code{m4}. To show the difference, try + +@example +define(`echo1', `$*') +@result{} +define(`echo2', `$@@') +@result{} +define(`foo', `This is macro `foo'.') +@result{} +echo1(foo) +@result{}This is macro This is macro foo.. +echo1(`foo') +@result{}This is macro foo. +echo2(foo) +@result{}This is macro foo. +echo2(`foo') +@result{}foo +@end example + +@noindent +@xref{Trace}, if you do not understand this. As another example of the +difference, remember that comments encountered in arguments are passed +untouched to the macro, and that quoting disables comments. + +@example +define(`echo1', `$*') +@result{} +define(`echo2', `$@@') +@result{} +define(`foo', `bar') +@result{} +echo1(#foo'foo +foo) +@result{}#foo'foo +@result{}bar +echo2(#foo'foo +foo) +@result{}#foobar +@result{}bar' +@end example + +@ignore +@comment Not worth putting in the manual, but this example is needed for +@comment good test coverage of copying large strings across recursion +@comment levels.
+ +@example +define(`echo', `$@@')dnl +echo(echo(`01234567890123456789', `01234567890123456789') +echo(`98765432109876543210', `98765432109876543210')) +@result{}01234567890123456789,01234567890123456789 +@result{}98765432109876543210,98765432109876543210 +len((echo(`01234567890123456789', + `01234567890123456789')echo(`98765432109876543210', + `98765432109876543210'))) +@result{}84 +indir(`echo', indir(`echo', `01234567890123456789', + `01234567890123456789') +indir(`echo', `98765432109876543210', `98765432109876543210')) +@result{}01234567890123456789,01234567890123456789 +@result{}98765432109876543210,98765432109876543210 +define(`argn', `$#')dnl +define(`echo1', `-$@@-')define(`echo2', `,$@@,')dnl +echo1(`1', `2', `3') argn(echo1(`1', `2', `3')) +@result{}-1,2,3- 3 +echo2(`1', `2', `3') argn(echo2(`1', `2', `3')) +@result{},1,2,3, 5 +@end example +@end ignore + +A @samp{$} sign in the expansion text, that is not followed by anything +@code{m4} understands, is simply copied to the macro expansion, as any +other text is. + +@example +define(`foo', `$$$ hello $$$') +@result{} +foo +@result{}$$$ hello $$$ +@end example + +@cindex rescanning +@cindex literal output +@cindex output, literal +If you want a macro to expand to something like @samp{$12}, the +judicious use of nested quoting can put a safe character between the +@code{$} and the next character, relying on the rescanning to remove the +nested quote. This will prevent @code{m4} from interpreting the +@code{$} sign as a reference to an argument. 
+ +@example +define(`foo', `no nested quote: $1') +@result{} +foo(`arg') +@result{}no nested quote: arg +define(`foo', `nested quote around $: `$'1') +@result{} +foo(`arg') +@result{}nested quote around $: $1 +define(`foo', `nested empty quote after $: $`'1') +@result{} +foo(`arg') +@result{}nested empty quote after $: $1 +define(`foo', `nested quote around next character: $`1'') +@result{} +foo(`arg') +@result{}nested quote around next character: $1 +define(`foo', `nested quote around both: `$1'') +@result{} +foo(`arg') +@result{}nested quote around both: arg +@end example + +@node Undefine +@section Deleting a macro + +@cindex macros, how to delete +@cindex deleting macros +@cindex undefining macros +A macro definition can be removed with @code{undefine}: + +@deffn Builtin undefine (@var{name}@dots{}) +For each argument, remove the macro @var{name}. The macro names must +necessarily be quoted, since they will be expanded otherwise. + +The expansion of @code{undefine} is void. +The macro @code{undefine} is recognized only with parameters. +@end deffn + +@example +foo bar blah +@result{}foo bar blah +define(`foo', `some')define(`bar', `other')define(`blah', `text') +@result{} +foo bar blah +@result{}some other text +undefine(`foo') +@result{} +foo bar blah +@result{}foo other text +undefine(`bar', `blah') +@result{} +foo bar blah +@result{}foo bar blah +@end example + +Undefining a macro inside that macro's expansion is safe; the macro +still expands to the definition that was in effect at the @samp{(}. + +@example +define(`f', ``$0':$1') +@result{} +f(f(f(undefine(`f')`hello world'))) +@result{}f:f:f:hello world +f(`bye') +@result{}f(bye) +@end example + +It is not an error for @var{name} to have no macro definition. In that +case, @code{undefine} does nothing. 
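+ +For instance, with the hypothetical name @code{no_such_macro}, which has never been defined, @code{undefine} should simply expand to nothing, and the name keeps being copied to the output as plain text: + +@example +undefine(`no_such_macro') +@result{} +no_such_macro +@result{}no_such_macro +@end example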
+ +@node Defn +@section Renaming macros + +@cindex macros, how to rename +@cindex renaming macros +@cindex macros, displaying definitions +@cindex definitions, displaying macro +It is possible to rename an already defined macro. To do this, you need +the builtin @code{defn}: + +@deffn Builtin defn (@var{name}@dots{}) +Expands to the @emph{quoted definition} of each @var{name}. If an +argument is not a defined macro, the expansion for that argument is +empty. + +If @var{name} is a user-defined macro, the quoted definition is simply +the quoted expansion text. If, instead, there is only one @var{name} +and it is a builtin, the +expansion is a special token, which points to the builtin's internal +definition. This token is only meaningful as the second argument to +@code{define} (and @code{pushdef}), and is silently converted to an +empty string in most other contexts. Combining a builtin with anything +else is not supported; a warning is issued and the builtin is omitted +from the final expansion. + +The macro @code{defn} is recognized only with parameters. +@end deffn + +Its normal use is best understood through an example, which shows how to +rename @code{undefine} to @code{zap}: + +@example +define(`zap', defn(`undefine')) +@result{} +zap(`undefine') +@result{} +undefine(`zap') +@result{}undefine(zap) +@end example + +In this way, @code{defn} can be used to copy macro definitions, and also +definitions of builtin macros. Even if the original macro is removed, +the other name can still be used to access the definition. 
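+ +The same holds for user-defined macros. In this sketch the names @code{original} and @code{copy} are hypothetical, chosen only for illustration; the copy made through @code{defn} keeps working after the original has been removed: + +@example +define(`original', `Original text.') +@result{} +define(`copy', defn(`original')) +@result{} +undefine(`original') +@result{} +copy +@result{}Original text. +original +@result{}original +@end example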
+ +The fact that macro definitions can be transferred also explains why you +should use @code{$0}, rather than retyping a macro's name in its +definition: + +@example +define(`foo', `This is `$0'') +@result{} +define(`bar', defn(`foo')) +@result{} +bar +@result{}This is bar +@end example + +Macros used as string variables should be referred to through @code{defn}, +to avoid unwanted expansion of the text: + +@example +define(`string', `The macro dnl is very useful +') +@result{} +string +@result{}The macro@w{ } +defn(`string') +@result{}The macro dnl is very useful +@result{} +@end example + +@cindex rescanning +However, it is important to remember that @code{m4} rescanning is purely +textual. If an unbalanced end-quote string occurs in a macro +definition, the rescan will see that embedded quote as the termination +of the quoted string, and the remainder of the macro's definition will +be rescanned unquoted. Thus it is a good idea to avoid unbalanced +end-quotes in macro definitions or arguments to macros. + +@example +define(`foo', a'a) +@result{} +define(`a', `A') +@result{} +define(`echo', `$@@') +@result{} +foo +@result{}A'A +defn(`foo') +@result{}aA' +echo(foo) +@result{}AA' +@end example + +On the other hand, it is possible to exploit the fact that @code{defn} +can concatenate multiple macros prior to the rescanning phase, in order +to join the definitions of macros that, in isolation, have unbalanced +quotes. This is particularly useful when one has used several macros to +accumulate text that M4 should rescan as a whole. In the example below, +note how the use of @code{defn} on @code{l} in isolation opens a string, +which is not closed until the next line; but used on @code{l} and +@code{r} together results in nested quoting.
+ +@example +define(`l', `<[>')define(`r', `<]>') +@result{} +changequote(`[', `]') +@result{} +defn([l])defn([r]) +]) +@result{}<[>]defn([r]) +@result{}) +defn([l], [r]) +@result{}<[>][<]> +@end example + +@cindex builtins, special tokens +@cindex tokens, builtin macro +Using @code{defn} to generate special tokens for builtin macros outside +of expected contexts can sometimes trigger warnings. But most of the +time, such tokens are silently converted to the empty string. + +@example +$ @kbd{m4 -d} +defn(`defn') +@result{} +define(defn(`divnum'), `cannot redefine a builtin token') +@error{}m4:stdin:2: Warning: define: invalid macro name ignored +@result{} +divnum +@result{}0 +len(defn(`divnum')) +@result{}0 +@end example + +Also note that @code{defn} with multiple arguments can only join text +macros, not builtins, although a future version of GNU M4 may +lift this restriction. + +@example +$ @kbd{m4 -d} +define(`a', `A')define(`AA', `b') +@result{} +traceon(`defn', `define') +@result{} +defn(`a', `divnum', `a') +@error{}m4:stdin:3: Warning: cannot concatenate builtin `divnum' +@error{}m4trace: -1- defn(`a', `divnum', `a') -> ``A'`A'' +@result{}AA +define(`mydivnum', defn(`divnum', `divnum'))mydivnum +@error{}m4:stdin:4: Warning: cannot concatenate builtin `divnum' +@error{}m4:stdin:4: Warning: cannot concatenate builtin `divnum' +@error{}m4trace: -2- defn(`divnum', `divnum') +@error{}m4trace: -1- define(`mydivnum', `') +@result{} +traceoff(`defn', `define') +@result{} +@end example + +@node Pushdef +@section Temporarily redefining macros + +@cindex macros, temporary redefinition of +@cindex temporary redefinition of macros +@cindex redefinition of macros, temporary +@cindex definition stack +@cindex pushdef stack +@cindex stack, macro definition +It is possible to redefine a macro temporarily, reverting to the +previous definition at a later time. 
This is done with the builtins +@code{pushdef} and @code{popdef}: + +@deffn Builtin pushdef (@var{name}, @ovar{expansion}) +@deffnx Builtin popdef (@var{name}@dots{}) +Analogous to @code{define} and @code{undefine}. + +These macros work in a stack-like fashion. A macro is temporarily +redefined with @code{pushdef}, which replaces an existing definition of +@var{name}, while saving the previous definition, before the new one is +installed. If there is no previous definition, @code{pushdef} behaves +exactly like @code{define}. + +If a macro has several definitions (of which only one is accessible), +the topmost definition can be removed with @code{popdef}. If there is +no previous definition, @code{popdef} behaves like @code{undefine}. + +The expansion of both @code{pushdef} and @code{popdef} is void. +The macros @code{pushdef} and @code{popdef} are recognized only with +parameters. +@end deffn + +@example +define(`foo', `Expansion one.') +@result{} +foo +@result{}Expansion one. +pushdef(`foo', `Expansion two.') +@result{} +foo +@result{}Expansion two. +pushdef(`foo', `Expansion three.') +@result{} +pushdef(`foo', `Expansion four.') +@result{} +popdef(`foo') +@result{} +foo +@result{}Expansion three. +popdef(`foo', `foo') +@result{} +foo +@result{}Expansion one. +popdef(`foo') +@result{} +foo +@result{}foo +@end example + +If a macro with several definitions is redefined with @code{define}, the +topmost definition is @emph{replaced} with the new definition. If it is +removed with @code{undefine}, @emph{all} the definitions are removed, +and not only the topmost one. However, POSIX allows other +implementations that treat @code{define} as replacing an entire stack +of definitions with a single new definition, so to be portable to other +implementations, it may be worth explicitly using @code{popdef} and +@code{pushdef} rather than relying on the GNU behavior of +@code{define}. + +@example +define(`foo', `Expansion one.') +@result{} +foo +@result{}Expansion one. 
+pushdef(`foo', `Expansion two.') +@result{} +foo +@result{}Expansion two. +define(`foo', `Second expansion two.') +@result{} +foo +@result{}Second expansion two. +undefine(`foo') +@result{} +foo +@result{}foo +@end example + +@cindex local variables +@cindex variables, local +Local variables within macros are made with @code{pushdef} and +@code{popdef}. At the start of the macro a new definition is pushed, +within the macro it is manipulated and at the end it is popped, +revealing the former definition. + +It is possible to temporarily redefine a builtin with @code{pushdef} +and @code{defn}. + +@node Indir +@section Indirect call of macros + +@cindex indirect call of macros +@cindex call of macros, indirect +@cindex macros, indirect call of +@cindex GNU extensions +Any macro can be called indirectly with @code{indir}: + +@deffn Builtin indir (@var{name}, @ovar{args@dots{}}) +Results in a call to the macro @var{name}, which is passed the +rest of the arguments @var{args}. If @var{name} is not defined, an +error message is printed, and the expansion is void. + +The macro @code{indir} is recognized only with parameters. +@end deffn + +This can be used to call macros with computed or ``invalid'' +names (@code{define} allows such names to be defined): + +@example +define(`$$internal$macro', `Internal macro (name `$0')') +@result{} +$$internal$macro +@result{}$$internal$macro +indir(`$$internal$macro') +@result{}Internal macro (name $$internal$macro) +@end example + +The point is, here, that larger macro packages can have private macros +defined, that will not be called by accident. They can @emph{only} be +called through the builtin @code{indir}. + +One other point to observe is that argument collection occurs before +@code{indir} invokes @var{name}, so if argument collection changes the +value of @var{name}, that will be reflected in the final expansion. 
+This is different than the behavior when invoking macros directly, +where the definition that was in effect before argument collection is +used. + +@example +$ @kbd{m4 -d} +define(`f', `1') +@result{} +f(define(`f', `2')) +@result{}1 +indir(`f', define(`f', `3')) +@result{}3 +indir(`f', undefine(`f')) +@error{}m4:stdin:4: undefined macro `f' +@result{} +@end example + +When handed the result of @code{defn} (@pxref{Defn}) as one of its +arguments, @code{indir} defers to the invoked @var{name} for whether a +token representing a builtin is recognized or flattened to the empty +string. + +@example +$ @kbd{m4 -d} +indir(defn(`defn'), `divnum') +@error{}m4:stdin:1: Warning: indir: invalid macro name ignored +@result{} +indir(`define', defn(`defn'), `divnum') +@error{}m4:stdin:2: Warning: define: invalid macro name ignored +@result{} +indir(`define', `foo', defn(`divnum')) +@result{} +foo +@result{}0 +indir(`divert', defn(`foo')) +@error{}m4:stdin:5: empty string treated as 0 in builtin `divert' +@result{} +@end example + +@node Builtin +@section Indirect call of builtins + +@cindex indirect call of builtins +@cindex call of builtins, indirect +@cindex builtins, indirect call of +@cindex GNU extensions +Builtin macros can be called indirectly with @code{builtin}: + +@deffn Builtin builtin (@var{name}, @ovar{args@dots{}}) +Results in a call to the builtin @var{name}, which is passed the +rest of the arguments @var{args}. If @var{name} does not name a +builtin, an error message is printed, and the expansion is void. + +The macro @code{builtin} is recognized only with parameters. +@end deffn + +This can be used even if @var{name} has been given another definition +that has covered the original, or been undefined so that no macro +maps to the builtin. 
+ +@example +pushdef(`define', `hidden') +@result{} +undefine(`undefine') +@result{} +define(`foo', `bar') +@result{}hidden +foo +@result{}foo +builtin(`define', `foo', defn(`divnum')) +@result{} +foo +@result{}0 +builtin(`define', `foo', `BAR') +@result{} +foo +@result{}BAR +undefine(`foo') +@result{}undefine(foo) +foo +@result{}BAR +builtin(`undefine', `foo') +@result{} +foo +@result{}foo +@end example + +The @var{name} argument only matches the original name of the builtin, +even when the @option{--prefix-builtins} option (or @option{-P}, +@pxref{Operation modes, , Invoking m4}) is in effect. This is different +from @code{indir}, which only tracks current macro names. + +@comment options: -P +@example +$ @kbd{m4 -P} +m4_builtin(`divnum') +@result{}0 +m4_builtin(`m4_divnum') +@error{}m4:stdin:2: undefined builtin `m4_divnum' +@result{} +m4_indir(`divnum') +@error{}m4:stdin:3: undefined macro `divnum' +@result{} +m4_indir(`m4_divnum') +@result{}0 +@end example + +Note that @code{indir} and @code{builtin} can be used to invoke builtins +without arguments, even when they normally require parameters to be +recognized; but it will provoke a warning, and result in a void expansion. + +@example +builtin +@result{}builtin +builtin() +@error{}m4:stdin:2: undefined builtin `' +@result{} +builtin(`builtin') +@error{}m4:stdin:3: Warning: too few arguments to builtin `builtin' +@result{} +builtin(`builtin',) +@error{}m4:stdin:4: undefined builtin `' +@result{} +builtin(`builtin', ``' +') +@error{}m4:stdin:5: undefined builtin ``' +@error{}' +@result{} +indir(`index') +@error{}m4:stdin:7: Warning: too few arguments to builtin `index' +@result{} +@end example + +@ignore +@comment This example is not worth putting in the manual, but it is +@comment needed for full coverage. Autoconf's m4_include relies heavily +@comment on this feature. + +@example +builtin(`include', `foo')dnl +@result{}bar +@end example + +@comment And this example triggers a regression present in 1.4.10b. 
+ +@example +define(`s', `builtin(`shift', $@@)')dnl +define(`loop', `ifelse(`$2', `', `-', `$1$2: $0(`$1', s(s($@@)))')')dnl +loop(`1') +@result{}- +loop(`1', `2') +@result{}12: - +loop(`1', `2', `3') +@result{}12: 13: - +loop(`1', `2', `3', `4') +@result{}12: 13: 14: - +loop(`1', `2', `3', `4', `5') +@result{}12: 13: 14: 15: - +@end example +@end ignore + +@node Conditionals +@chapter Conditionals, loops, and recursion + +Macros, expanding to plain text, perhaps with arguments, are not quite +enough. We would like to have macros expand to different things, based +on decisions taken at run-time. For that, we need some kind of conditionals. +Also, we would like to have some kind of loop construct, so we could do +something a number of times, or while some condition is true. + +@menu +* Ifdef:: Testing if a macro is defined +* Ifelse:: If-else construct, or multibranch +* Shift:: Recursion in @code{m4} +* Forloop:: Iteration by counting +* Foreach:: Iteration by list contents +* Stacks:: Working with definition stacks +* Composition:: Building macros with macros +@end menu + +@node Ifdef +@section Testing if a macro is defined + +@cindex conditionals +There are two different builtin conditionals in @code{m4}. The first is +@code{ifdef}: + +@deffn Builtin ifdef (@var{name}, @var{string-1}, @ovar{string-2}) +If @var{name} is defined as a macro, @code{ifdef} expands to +@var{string-1}, otherwise to @var{string-2}. If @var{string-2} is +omitted, it is taken to be the empty string (according to the normal +rules). + +The macro @code{ifdef} is recognized only with parameters. 
+@end deffn + +@example +ifdef(`foo', ``foo' is defined', ``foo' is not defined') +@result{}foo is not defined +define(`foo', `') +@result{} +ifdef(`foo', ``foo' is defined', ``foo' is not defined') +@result{}foo is defined +ifdef(`no_such_macro', `yes', `no', `extra argument') +@error{}m4:stdin:4: Warning: excess arguments to builtin `ifdef' ignored +@result{}no +@end example + +@node Ifelse +@section If-else construct, or multibranch + +@cindex comparing strings +@cindex discarding input +@cindex input, discarding +The other conditional, @code{ifelse}, is much more powerful. It can be +used as a way to introduce a long comment, as an if-else construct, or +as a multibranch, depending on the number of arguments supplied: + +@deffn Builtin ifelse (@var{comment}) +@deffnx Builtin ifelse (@var{string-1}, @var{string-2}, @var{equal}, @ + @ovar{not-equal}) +@deffnx Builtin ifelse (@var{string-1}, @var{string-2}, @var{equal-1}, @ + @var{string-3}, @var{string-4}, @var{equal-2}, @dots{}, @ovar{not-equal}) +Used with only one argument, the @code{ifelse} simply discards it and +produces no output. + +If called with three or four arguments, @code{ifelse} expands into +@var{equal}, if @var{string-1} and @var{string-2} are equal (character +for character), otherwise it expands to @var{not-equal}. A final fifth +argument is ignored, after triggering a warning. + +If called with six or more arguments, and @var{string-1} and +@var{string-2} are equal, @code{ifelse} expands into @var{equal-1}, +otherwise the first three arguments are discarded and the processing +starts again. + +The macro @code{ifelse} is recognized only with parameters. +@end deffn + +Using only one argument is a common @code{m4} idiom for introducing a +block comment, as an alternative to repeatedly using @code{dnl}. This +special usage is recognized by GNU @code{m4}, so that in this +case, the warning about missing arguments is never triggered. 
+ +@example +ifelse(`some comments') +@result{} +ifelse(`foo', `bar') +@error{}m4:stdin:2: Warning: too few arguments to builtin `ifelse' +@result{} +@end example + +Using three or four arguments provides decision points. + +@example +ifelse(`foo', `bar', `true') +@result{} +ifelse(`foo', `foo', `true') +@result{}true +define(`foo', `bar') +@result{} +ifelse(foo, `bar', `true', `false') +@result{}true +ifelse(foo, `foo', `true', `false') +@result{}false +@end example + +@cindex macro, blind +@cindex blind macro +Notice how the first argument was used unquoted; it is common to compare +the expansion of a macro with a string. With this macro, you can now +reproduce the behavior of blind builtins, where the macro is recognized +only with arguments. + +@example +define(`foo', `ifelse(`$#', `0', ``$0'', `arguments:$#')') +@result{} +foo +@result{}foo +foo() +@result{}arguments:1 +foo(`a', `b', `c') +@result{}arguments:3 +@end example + +For an example of a way to make defining blind macros easier, see +@ref{Composition}. + +@cindex multibranches +@cindex switch statement +@cindex case statement +The macro @code{ifelse} can take more than four arguments. If given more +than four arguments, @code{ifelse} works like a @code{case} or @code{switch} +statement in traditional programming languages. If @var{string-1} and +@var{string-2} are equal, @code{ifelse} expands into @var{equal-1}, otherwise +the procedure is repeated with the first three arguments discarded. 
This +calls for an example: + +@example +ifelse(`foo', `bar', `third', `gnu', `gnats') +@error{}m4:stdin:1: Warning: excess arguments to builtin `ifelse' ignored +@result{}gnu +ifelse(`foo', `bar', `third', `gnu', `gnats', `sixth') +@result{} +ifelse(`foo', `bar', `third', `gnu', `gnats', `sixth', `seventh') +@result{}seventh +ifelse(`foo', `bar', `3', `gnu', `gnats', `6', `7', `8') +@error{}m4:stdin:4: Warning: excess arguments to builtin `ifelse' ignored +@result{}7 +@end example + +@ignore +@comment Stress tests, not worth documenting. + +@comment Ensure that references compared to strings work regardless of +@comment similar prefixes. +@example +define(`e', `$@@')define(`long', `01234567890123456789') +@result{} +ifelse(long, `01234567890123456789', `yes', `no') +@result{}yes +ifelse(`01234567890123456789', long, `yes', `no') +@result{}yes +ifelse(long, `01234567890123456789-', `yes', `no') +@result{}no +ifelse(`01234567890123456789-', long, `yes', `no') +@result{}no +ifelse(e(long), `01234567890123456789', `yes', `no') +@result{}yes +ifelse(`01234567890123456789', e(long), `yes', `no') +@result{}yes +ifelse(e(long), `01234567890123456789-', `yes', `no') +@result{}no +ifelse(`01234567890123456789-', e(long), `yes', `no') +@result{}no +ifelse(-e(long), `-01234567890123456789', `yes', `no') +@result{}yes +ifelse(-`01234567890123456789', -e(long), `yes', `no') +@result{}yes +ifelse(-e(long), `-01234567890123456789-', `yes', `no') +@result{}no +ifelse(`-01234567890123456789-', -e(long), `yes', `no') +@result{}no +ifelse(-e(long)-, `-01234567890123456789-', `yes', `no') +@result{}yes +ifelse(-`01234567890123456789-', -e(long)-, `yes', `no') +@result{}yes +ifelse(-e(long)-, `-01234567890123456789', `yes', `no') +@result{}no +ifelse(`-01234567890123456789', -e(long)-, `yes', `no') +@result{}no +ifelse(`-'e(long), `-01234567890123456789', `yes', `no') +@result{}yes +ifelse(-`01234567890123456789', `-'e(long), `yes', `no') +@result{}yes +ifelse(`-'e(long), 
`-01234567890123456789-', `yes', `no') +@result{}no +ifelse(`-01234567890123456789-', `-'e(long), `yes', `no') +@result{}no +ifelse(`-'e(long)`-', `-01234567890123456789-', `yes', `no') +@result{}yes +ifelse(-`01234567890123456789-', `-'e(long)`-', `yes', `no') +@result{}yes +ifelse(`-'e(long)`-', `-01234567890123456789', `yes', `no') +@result{}no +ifelse(`-01234567890123456789', `-'e(long)`-', `yes', `no') +@result{}no +@end example +@end ignore + +Naturally, the normal case will be slightly more advanced than these +examples. A common use of @code{ifelse} is in macros implementing loops +of various kinds. + +@node Shift +@section Recursion in @code{m4} + +@cindex recursive macros +@cindex macros, recursive +There is no direct support for loops in @code{m4}, but macros can be +recursive. There is no limit on the number of recursion levels, other +than those enforced by your hardware and operating system. + +@cindex loops +Loops can be programmed using recursion and the conditionals described +previously. + +There is a builtin macro, @code{shift}, which can, among other things, +be used for iterating through the actual arguments to a macro: + +@deffn Builtin shift (@var{arg1}, @dots{}) +Takes any number of arguments, and expands to all its arguments except +@var{arg1}, separated by commas, with each argument quoted. + +The macro @code{shift} is recognized only with parameters. +@end deffn + +@example +shift +@result{}shift +shift(`bar') +@result{} +shift(`foo', `bar', `baz') +@result{}bar,baz +@end example + +An example of the use of @code{shift} is this macro: + +@cindex reversing arguments +@cindex arguments, reversing +@deffn Composite reverse (@dots{}) +Takes any number of arguments, and reverses their order. 
+@end deffn + +It is implemented as: + +@example +define(`reverse', `ifelse(`$#', `0', , `$#', `1', ``$1'', + `reverse(shift($@@)), `$1'')') +@result{} +reverse +@result{} +reverse(`foo') +@result{}foo +reverse(`foo', `bar', `gnats', `and gnus') +@result{}and gnus, gnats, bar, foo +@end example + +While not a very interesting macro, it does show how simple loops can be +made with @code{shift}, @code{ifelse} and recursion. It also shows +that @code{shift} is usually used with @samp{$@@}. Another example of +this is an implementation of a short-circuiting conditional operator. + +@cindex short-circuiting conditional +@cindex conditional, short-circuiting +@deffn Composite cond (@var{test-1}, @var{string-1}, @var{equal-1}, @ + @ovar{test-2}, @ovar{string-2}, @ovar{equal-2}, @dots{}, @ovar{not-equal}) +Similar to @code{ifelse}, where an equal comparison between the first +two strings results in the third, otherwise the first three arguments +are discarded and the process repeats. The difference is that each +@var{test-<n>} is expanded only when it is encountered. This means that +every third argument to @code{cond} is normally given one more level of +quoting than the corresponding argument to @code{ifelse}. +@end deffn + +Here is the implementation of @code{cond}, along with a demonstration of +how it can short-circuit the side effects in @code{side}. Notice how +all the unquoted side effects happen regardless of how many comparisons +are made with @code{ifelse}, compared with only the relevant effects +with @code{cond}. 
+ +@example +define(`cond', +`ifelse(`$#', `1', `$1', + `ifelse($1, `$2', `$3', + `$0(shift(shift(shift($@@))))')')')dnl +define(`side', `define(`counter', incr(counter))$1')dnl +define(`example1', +`define(`counter', `0')dnl +ifelse(side(`$1'), `yes', `one comparison: ', + side(`$1'), `no', `two comparisons: ', + side(`$1'), `maybe', `three comparisons: ', + `side(`default answer: ')')counter')dnl +define(`example2', +`define(`counter', `0')dnl +cond(`side(`$1')', `yes', `one comparison: ', + `side(`$1')', `no', `two comparisons: ', + `side(`$1')', `maybe', `three comparisons: ', + `side(`default answer: ')')counter')dnl +example1(`yes') +@result{}one comparison: 3 +example1(`no') +@result{}two comparisons: 3 +example1(`maybe') +@result{}three comparisons: 3 +example1(`feeling rather indecisive today') +@result{}default answer: 4 +example2(`yes') +@result{}one comparison: 1 +example2(`no') +@result{}two comparisons: 2 +example2(`maybe') +@result{}three comparisons: 3 +example2(`feeling rather indecisive today') +@result{}default answer: 4 +@end example + +@cindex joining arguments +@cindex arguments, joining +@cindex concatenating arguments +Another common task that requires iteration is joining a list of +arguments into a single string. + +@deffn Composite join (@ovar{separator}, @ovar{args@dots{}}) +@deffnx Composite joinall (@ovar{separator}, @ovar{args@dots{}}) +Generate a single-quoted string, consisting of each @var{arg} separated +by @var{separator}. While @code{joinall} always outputs a +@var{separator} between arguments, @code{join} avoids the +@var{separator} for an empty @var{arg}. 
+@end deffn + +Here are some examples of its usage, based on the implementation +@file{m4-@value{VERSION}/@/examples/@/join.m4} distributed in this +package: + +@comment examples +@example +$ @kbd{m4 -I examples} +include(`join.m4') +@result{} +join,join(`-'),join(`-', `'),join(`-', `', `') +@result{},,, +joinall,joinall(`-'),joinall(`-', `'),joinall(`-', `', `') +@result{},,,- +join(`-', `1') +@result{}1 +join(`-', `1', `2', `3') +@result{}1-2-3 +join(`', `1', `2', `3') +@result{}123 +join(`-', `', `1', `', `', `2', `') +@result{}1-2 +joinall(`-', `', `1', `', `', `2', `') +@result{}-1---2- +join(`,', `1', `2', `3') +@result{}1,2,3 +define(`nargs', `$#')dnl +nargs(join(`,', `1', `2', `3')) +@result{}1 +@end example + +Examining the implementation shows some interesting points about several +m4 programming idioms. + +@comment examples +@example +$ @kbd{m4 -I examples} +undivert(`join.m4')dnl +@result{}divert(`-1') +@result{}# join(sep, args) - join each non-empty ARG into a single +@result{}# string, with each element separated by SEP +@result{}define(`join', +@result{}`ifelse(`$#', `2', ``$2'', +@result{} `ifelse(`$2', `', `', ``$2'_')$0(`$1', shift(shift($@@)))')') +@result{}define(`_join', +@result{}`ifelse(`$#$2', `2', `', +@result{} `ifelse(`$2', `', `', ``$1$2'')$0(`$1', shift(shift($@@)))')') +@result{}# joinall(sep, args) - join each ARG, including empty ones, +@result{}# into a single string, with each element separated by SEP +@result{}define(`joinall', ``$2'_$0(`$1', shift($@@))') +@result{}define(`_joinall', +@result{}`ifelse(`$#', `2', `', ``$1$3'$0(`$1', shift(shift($@@)))')') +@result{}divert`'dnl +@end example + +First, notice that this implementation creates helper macros +@code{_join} and @code{_joinall}. 
This division of labor makes it +easier to output the correct number of @var{separator} instances: +@code{join} and @code{joinall} are responsible for the first argument, +without a separator, while @code{_join} and @code{_joinall} are +responsible for all remaining arguments, always outputting a separator +when outputting an argument. + +Next, observe how @code{join} decides to iterate to itself, because the +first @var{arg} was empty, or to output the argument and swap over to +@code{_join}. If the argument is non-empty, then the nested +@code{ifelse} results in an unquoted @samp{_}, which is concatenated +with the @samp{$0} to form the next macro name to invoke. The +@code{joinall} implementation is simpler since it does not have to +suppress empty @var{arg}; it always executes once then defers to +@code{_joinall}. + +Another important idiom is the idea that @var{separator} is reused for +each iteration. Each iteration has one less argument, but rather than +discarding @samp{$1} by iterating with @code{$0(shift($@@))}, the macro +discards @samp{$2} by using @code{$0(`$1', shift(shift($@@)))}. + +Next, notice that it is possible to compare more than one condition in a +single @code{ifelse} test. The test of @samp{$#$2} against @samp{2} +allows @code{_join} to iterate for two separate reasons---either there +are still more than two arguments, or there are exactly two arguments +but the last argument is not empty. + +Finally, notice that these macros require exactly two arguments to +terminate recursion, but that they still correctly result in empty +output when given no @var{args} (i.e., zero or one macro argument). On +the first pass when there are too few arguments, the @code{shift} +results in no output, but leaves an empty string to serve as the +required second argument for the second pass. Put another way, +@samp{`$1', shift($@@)} is not the same as @samp{$@@}, since only the +former guarantees at least two arguments. 
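The split between `join` and its `_join` helper is easier to follow outside m4's quoting rules. Below is a rough Python model of `join`, `_join`, `joinall` and `_joinall`. It is only an illustrative sketch, not the distributed `join.m4`. Python can distinguish "no arguments" from "one empty argument" directly, so the `` `$1', shift($@@) `` trick for guaranteeing a second argument has no counterpart here. The recursion mirrors the m4 macros' control flow.

```python
# Rough Python model of the join.m4 helpers (illustrative only, not m4 code).


def join(sep: str, *args: str) -> str:
    """Like m4 `join': skip leading empty ARGs, then hand off to _join."""
    if not args:
        return ""
    first, rest = args[0], args[1:]
    if first == "":
        # Empty first element: iterate to ourselves, emitting no separator.
        return join(sep, *rest)
    # Non-empty first element: emit it bare, then swap over to _join.
    return first + _join(sep, *rest)


def _join(sep: str, *args: str) -> str:
    """Emit SEP before every remaining non-empty ARG; skip empty ones."""
    if not args:
        return ""
    first, rest = args[0], args[1:]
    return ("" if first == "" else sep + first) + _join(sep, *rest)


def joinall(sep: str, *args: str) -> str:
    """Like m4 `joinall': first ARG bare, then SEP before every other ARG."""
    if not args:
        return ""
    return args[0] + _joinall(sep, *args[1:])


def _joinall(sep: str, *args: str) -> str:
    """Emit SEP before every remaining ARG, even an empty one."""
    if not args:
        return ""
    return sep + args[0] + _joinall(sep, *args[1:])


print(join("-", "", "1", "", "", "2", ""))     # prints 1-2
print(joinall("-", "", "1", "", "", "2", ""))  # prints -1---2-
```

These two calls reproduce the corresponding lines of the m4 session shown earlier for `join.m4`.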
+ +@cindex quote manipulation +@cindex manipulating quotes +Sometimes, a recursive algorithm requires adding quotes to each element, +or treating multiple arguments as a single element: + +@deffn Composite quote (@dots{}) +@deffnx Composite dquote (@dots{}) +@deffnx Composite dquote_elt (@dots{}) +Takes any number of arguments, and adds quoting. With @code{quote}, +only one level of quoting is added, effectively removing whitespace +after commas and turning multiple arguments into a single string. With +@code{dquote}, two levels of quoting are added, one around each element, +and one around the list. And with @code{dquote_elt}, two levels of +quoting are added around each element. +@end deffn + +An actual implementation of these three macros is distributed as +@file{m4-@value{VERSION}/@/examples/@/quote.m4} in this package. First, +let's examine their usage: + +@comment examples +@example +$ @kbd{m4 -I examples} +include(`quote.m4') +@result{} +-quote-dquote-dquote_elt- +@result{}---- +-quote()-dquote()-dquote_elt()- +@result{}--`'-`'- +-quote(`1')-dquote(`1')-dquote_elt(`1')- +@result{}-1-`1'-`1'- +-quote(`1', `2')-dquote(`1', `2')-dquote_elt(`1', `2')- +@result{}-1,2-`1',`2'-`1',`2'- +define(`n', `$#')dnl +-n(quote(`1', `2'))-n(dquote(`1', `2'))-n(dquote_elt(`1', `2'))- +@result{}-1-1-2- +dquote(dquote_elt(`1', `2')) +@result{}``1'',``2'' +dquote_elt(dquote(`1', `2')) +@result{}``1',`2'' +@end example + +The last two lines show that when given two arguments, @code{dquote} +results in one string, while @code{dquote_elt} results in two. Now, +examine the implementation. Note that @code{quote} and +@code{dquote_elt} make decisions based on their number of arguments, so +that when called without arguments, they result in nothing instead of a +quoted empty string; this is so that it is possible to distinguish +between no arguments and an empty first argument. 
@code{dquote}, on the +other hand, results in a string no matter what, since it is still +possible to tell whether it was invoked without arguments based on the +resulting string. + +@comment examples +@example +$ @kbd{m4 -I examples} +undivert(`quote.m4')dnl +@result{}divert(`-1') +@result{}# quote(args) - convert args to single-quoted string +@result{}define(`quote', `ifelse(`$#', `0', `', ``$*'')') +@result{}# dquote(args) - convert args to quoted list of quoted strings +@result{}define(`dquote', ``$@@'') +@result{}# dquote_elt(args) - convert args to list of double-quoted strings +@result{}define(`dquote_elt', `ifelse(`$#', `0', `', `$#', `1', ```$1''', +@result{} ```$1'',$0(shift($@@))')') +@result{}divert`'dnl +@end example + +It is worth pointing out that @samp{quote(@var{args})} is more efficient +than @samp{joinall(`,', @var{args})} for producing the same output. + +@cindex nine arguments, more than +@cindex more than nine arguments +@cindex arguments, more than nine +One more useful macro based on @code{shift} allows portably selecting +an arbitrary argument (usually greater than the ninth argument), without +relying on the GNU extension of multi-digit arguments +(@pxref{Arguments}). + +@deffn Composite argn (@var{n}, @dots{}) +Expands to argument @var{n} out of the remaining arguments. @var{n} +must be a positive number. Usually invoked as +@samp{argn(`@var{n}',$@@)}. +@end deffn + +It is implemented as: + +@example +define(`argn', `ifelse(`$1', 1, ``$2'', + `argn(decr(`$1'), shift(shift($@@)))')') +@result{} +argn(`1', `a') +@result{}a +define(`foo', `argn(`11', $@@)') +@result{} +foo(`a', `b', `c', `d', `e', `f', `g', `h', `i', `j', `k', `l') +@result{}k +@end example + +@node Forloop +@section Iteration by counting + +@cindex for loops +@cindex loops, counting +@cindex counting loops +Here is an example of a loop macro that implements a simple for loop. 
+ +@deffn Composite forloop (@var{iterator}, @var{start}, @var{end}, @var{text}) +Takes the name in @var{iterator}, which must be a valid macro name, and +successively assigns it each integer value from @var{start} to @var{end}, +inclusive. For each assignment to @var{iterator}, append @var{text} to +the expansion of the @code{forloop}. @var{text} may refer to +@var{iterator}. Any definition of @var{iterator} prior to this +invocation is restored. +@end deffn + +It can, for example, be used for simple counting: + +@comment examples +@example +$ @kbd{m4 -I examples} +include(`forloop.m4') +@result{} +forloop(`i', `1', `8', `i ') +@result{}1 2 3 4 5 6 7 8@w{ } +@end example + +For-loops can be nested, like: + +@comment examples +@example +$ @kbd{m4 -I examples} +include(`forloop.m4') +@result{} +forloop(`i', `1', `4', `forloop(`j', `1', `8', ` (i, j)') +') +@result{} (1, 1) (1, 2) (1, 3) (1, 4) (1, 5) (1, 6) (1, 7) (1, 8) +@result{} (2, 1) (2, 2) (2, 3) (2, 4) (2, 5) (2, 6) (2, 7) (2, 8) +@result{} (3, 1) (3, 2) (3, 3) (3, 4) (3, 5) (3, 6) (3, 7) (3, 8) +@result{} (4, 1) (4, 2) (4, 3) (4, 4) (4, 5) (4, 6) (4, 7) (4, 8) +@result{} +@end example + +The implementation of the @code{forloop} macro is fairly +straightforward. The @code{forloop} macro itself is simply a wrapper, +which saves the previous definition of the first argument, calls the +internal macro @code{@w{_forloop}}, and re-establishes the saved +definition of the first argument. + +The macro @code{@w{_forloop}} expands the fourth argument once, and +tests to see if the iterator has reached the final value. If it has +not finished, it increments the iterator (using the predefined macro +@code{incr}, @pxref{Incr}), and recurses.
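The wrapper/helper structure just described can be modeled in Python. This is a minimal sketch under stated assumptions, not the distributed `forloop.m4`. It assumes a definition stack can be represented as a dict mapping each name to a list whose last element is the current definition. The helper functions `pushdef`, `popdef` and `define` in the sketch are hypothetical Python stand-ins for the m4 builtins of the same names. A `while` loop stands in for the tail recursion in `_forloop`.

```python
from typing import Callable, Dict, List

# Hypothetical model of m4 definition stacks: each name maps to a list whose
# last element is the current (topmost) definition.
Defs = Dict[str, List[str]]


def pushdef(defs: Defs, name: str, value: str) -> None:
    defs.setdefault(name, []).append(value)


def popdef(defs: Defs, name: str) -> None:
    stack = defs.get(name)
    if stack:
        stack.pop()
        if not stack:
            del defs[name]  # no definitions left: the name is undefined


def define(defs: Defs, name: str, value: str) -> None:
    # Like m4 `define': replace only the topmost definition.
    stack = defs.setdefault(name, [])
    if stack:
        stack[-1] = value
    else:
        stack.append(value)


def forloop(defs: Defs, iterator: str, start: int, end: int,
            text: Callable[[Defs], str]) -> str:
    """Wrapper: save any old definition, run the helper, restore it."""
    pushdef(defs, iterator, str(start))
    result = _forloop(defs, iterator, end, text)
    popdef(defs, iterator)
    return result


def _forloop(defs: Defs, iterator: str, end: int,
             text: Callable[[Defs], str]) -> str:
    """Expand TEXT once, stop if ITERATOR equals END, else increment.

    As with the simple m4 version, START greater than END never terminates.
    """
    pieces: List[str] = []
    while True:
        pieces.append(text(defs))
        current = int(defs[iterator][-1])
        if current == end:
            return "".join(pieces)
        define(defs, iterator, str(current + 1))  # plays the role of incr


defs: Defs = {}
print(forloop(defs, "i", 1, 8, lambda d: d["i"][-1] + " "))
# prints 1 2 3 4 5 6 7 8 (with a trailing space); defs is empty again
```

Because `forloop` pushes before the loop and pops afterward, a nested call on a different iterator leaves the outer iterator untouched. A prior definition of the iterator also survives the loop.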
+ +Here is an actual implementation of @code{forloop}, distributed as +@file{m4-@value{VERSION}/@/examples/@/forloop.m4} in this package: + +@comment examples +@example +$ @kbd{m4 -I examples} +undivert(`forloop.m4')dnl +@result{}divert(`-1') +@result{}# forloop(var, from, to, stmt) - simple version +@result{}define(`forloop', `pushdef(`$1', `$2')_forloop($@@)popdef(`$1')') +@result{}define(`_forloop', +@result{} `$4`'ifelse($1, `$3', `', `define(`$1', incr($1))$0($@@)')') +@result{}divert`'dnl +@end example + +Notice the careful use of quotes. Certain macro arguments are left +unquoted, each for its own reason. Try to find out @emph{why} these +arguments are left unquoted, and see what happens if they are quoted. +(As presented, these two macros are useful but not very robust for +general use. They lack even basic error handling for cases like +@var{start} greater than @var{end}, @var{end} not numeric, or +@var{iterator} not being a macro name. See if you can improve these +macros; or @pxref{Improved forloop, , Answers}). + +@node Foreach +@section Iteration by list contents + +@cindex for each loops +@cindex loops, list iteration +@cindex iterating over lists +Here is an example of a loop macro that implements list iteration. + +@deffn Composite foreach (@var{iterator}, @var{paren-list}, @var{text}) +@deffnx Composite foreachq (@var{iterator}, @var{quote-list}, @var{text}) +Takes the name in @var{iterator}, which must be a valid macro name, and +successively assigns it each value from @var{paren-list} or +@var{quote-list}. In @code{foreach}, @var{paren-list} is a +comma-separated list of elements contained in parentheses. In +@code{foreachq}, @var{quote-list} is a comma-separated list of elements +contained in a quoted string. For each assignment to @var{iterator}, +append @var{text} to the overall expansion. @var{text} may refer to +@var{iterator}. Any definition of @var{iterator} prior to this +invocation is restored.
+@end deffn + +As an example, this displays each word in a list inside of a sentence, +using an implementation of @code{foreach} distributed as +@file{m4-@value{VERSION}/@/examples/@/foreach.m4}, and @code{foreachq} +in @file{m4-@value{VERSION}/@/examples/@/foreachq.m4}. + +@comment examples +@example +$ @kbd{m4 -I examples} +include(`foreach.m4') +@result{} +foreach(`x', (foo, bar, foobar), `Word was: x +')dnl +@result{}Word was: foo +@result{}Word was: bar +@result{}Word was: foobar +include(`foreachq.m4') +@result{} +foreachq(`x', `foo, bar, foobar', `Word was: x +')dnl +@result{}Word was: foo +@result{}Word was: bar +@result{}Word was: foobar +@end example + +It is possible to be more complex; each element of the @var{paren-list} +or @var{quote-list} can itself be a list, to pass as further arguments +to a helper macro. This example generates a shell case statement: + +@comment examples +@example +$ @kbd{m4 -I examples} +include(`foreach.m4') +@result{} +define(`_case', ` $1) + $2=" $1";; +')dnl +define(`_cat', `$1$2')dnl +case $`'1 in +@result{}case $1 in +foreach(`x', `(`(`a', `vara')', `(`b', `varb')', `(`c', `varc')')', + `_cat(`_case', x)')dnl +@result{} a) +@result{} vara=" a";; +@result{} b) +@result{} varb=" b";; +@result{} c) +@result{} varc=" c";; +esac +@result{}esac +@end example + +The implementation of the @code{foreach} macro is a bit more involved; +it is a wrapper around two helper macros. First, @code{@w{_arg1}} is +needed to grab the first element of a list. Second, +@code{@w{_foreach}} implements the recursion, successively walking +through the original list. 
Here is a simple implementation of +@code{foreach}: + +@comment examples +@example +$ @kbd{m4 -I examples} +undivert(`foreach.m4')dnl +@result{}divert(`-1') +@result{}# foreach(x, (item_1, item_2, ..., item_n), stmt) +@result{}# parenthesized list, simple version +@result{}define(`foreach', `pushdef(`$1')_foreach($@@)popdef(`$1')') +@result{}define(`_arg1', `$1') +@result{}define(`_foreach', `ifelse(`$2', `()', `', +@result{} `define(`$1', _arg1$2)$3`'$0(`$1', (shift$2), `$3')')') +@result{}divert`'dnl +@end example + +Unfortunately, that implementation is not robust to macro names as list +elements. Each iteration of @code{@w{_foreach}} is stripping another +layer of quotes, leading to erratic results if list elements are not +already fully expanded. The first cut at implementing @code{foreachq} +takes this into account. Also, when using quoted elements in a +@var{paren-list}, the overall list must be quoted. A @var{quote-list} +has the nice property of requiring fewer characters to create a list +containing the same quoted elements. 
To see the difference between the +two macros, we attempt to pass double-quoted macro names in a list, +expecting the macro name on output after one layer of quotes is removed +during list iteration and the final layer removed during the final +rescan: + +@comment examples +@example +$ @kbd{m4 -I examples} +define(`a', `1')define(`b', `2')define(`c', `3') +@result{} +include(`foreach.m4') +@result{} +include(`foreachq.m4') +@result{} +foreach(`x', `(``a'', ``(b'', ``c)'')', `x +') +@result{}1 +@result{}(2)1 +@result{} +@result{}, x +@result{}) +foreachq(`x', ```a'', ``(b'', ``c)''', `x +')dnl +@result{}a +@result{}(b +@result{}c) +@end example + +Obviously, @code{foreachq} did a better job; here is its implementation: + +@comment examples +@example +$ @kbd{m4 -I examples} +undivert(`foreachq.m4')dnl +@result{}include(`quote.m4')dnl +@result{}divert(`-1') +@result{}# foreachq(x, `item_1, item_2, ..., item_n', stmt) +@result{}# quoted list, simple version +@result{}define(`foreachq', `pushdef(`$1')_foreachq($@@)popdef(`$1')') +@result{}define(`_arg1', `$1') +@result{}define(`_foreachq', `ifelse(quote($2), `', `', +@result{} `define(`$1', `_arg1($2)')$3`'$0(`$1', `shift($2)', `$3')')') +@result{}divert`'dnl +@end example + +Notice that @code{@w{_foreachq}} had to use the helper macro +@code{quote} defined earlier (@pxref{Shift}), to ensure that the +embedded @code{ifelse} call does not go haywire if a list element +contains a comma. Unfortunately, this implementation of @code{foreachq} +has its own severe flaw. Whereas the @code{foreach} implementation was +linear, this macro is quadratic in the number of list elements, and is +much more likely to trip up the limit set by the command line option +@option{--nesting-limit} (or @option{-L}, @pxref{Limits control, , +Invoking m4}). Additionally, this implementation does not expand +@samp{defn(`@var{iterator}')} very well, when compared with +@code{foreach}. 
+ +@comment examples +@example +$ @kbd{m4 -I examples} +include(`foreach.m4')include(`foreachq.m4') +@result{} +foreach(`name', `(`a', `b')', ` defn(`name')') +@result{} a b +foreachq(`name', ``a', `b'', ` defn(`name')') +@result{} _arg1(`a', `b') _arg1(shift(`a', `b')) +@end example + +It is possible to have robust iteration with linear behavior and sane +@var{iterator} contents for either list style. See if you can learn +from the best elements of both of these implementations to create robust +macros (or @pxref{Improved foreach, , Answers}). + +@node Stacks +@section Working with definition stacks + +@cindex definition stack +@cindex pushdef stack +@cindex stack, macro definition +Thanks to @code{pushdef}, manipulation of a stack is an intrinsic +operation in @code{m4}. Normally, only the topmost definition in a +stack is important, but sometimes, it is desirable to manipulate the +entire definition stack. + +@deffn Composite stack_foreach (@var{macro}, @var{action}) +@deffnx Composite stack_foreach_lifo (@var{macro}, @var{action}) +For each of the @code{pushdef} definitions associated with @var{macro}, +invoke the macro @var{action} with a single argument of that definition. +@code{stack_foreach} visits the oldest definition first, while +@code{stack_foreach_lifo} visits the current definition first. +@var{action} should not modify or dereference @var{macro}. There are a +few special macros, such as @code{defn}, which cannot be used as the +@var{macro} parameter. +@end deffn + +A sample implementation of these macros is distributed in the file +@file{m4-@value{VERSION}/@/examples/@/stack.m4}. + +@comment examples +@example +$ @kbd{m4 -I examples} +include(`stack.m4') +@result{} +pushdef(`a', `1')pushdef(`a', `2')pushdef(`a', `3') +@result{} +define(`show', ``$1' +') +@result{} +stack_foreach(`a', `show')dnl +@result{}1 +@result{}2 +@result{}3 +stack_foreach_lifo(`a', `show')dnl +@result{}3 +@result{}2 +@result{}1 +@end example + +Now for the implementation. 
Note the definition of a helper macro, +@code{_stack_reverse}, which destructively swaps the contents of one +stack of definitions into the reverse order in the temporary macro +@samp{tmp-$1}. By calling the helper twice, the original order is +restored back into the macro @samp{$1}; since the operation is +destructive, this explains why @samp{$1} must not be modified or +dereferenced during the traversal. The caller can then inject +additional code to pass the definition currently being visited to +@samp{$2}. The choice of helper names is intentional; since @samp{-} is +not valid as part of a macro name, there is no risk of conflict with a +valid macro name, and the code is guaranteed to use @code{defn} where +necessary. Finally, note that any macro used in the traversal of a +@code{pushdef} stack, such as @code{pushdef} or @code{defn}, cannot be +handled by @code{stack_foreach}, since the macro would temporarily be +undefined during the algorithm. + +@comment examples +@example +$ @kbd{m4 -I examples} +undivert(`stack.m4')dnl +@result{}divert(`-1') +@result{}# stack_foreach(macro, action) +@result{}# Invoke ACTION with a single argument of each definition +@result{}# from the definition stack of MACRO, starting with the oldest. +@result{}define(`stack_foreach', +@result{}`_stack_reverse(`$1', `tmp-$1')'dnl +@result{}`_stack_reverse(`tmp-$1', `$1', `$2(defn(`$1'))')') +@result{}# stack_foreach_lifo(macro, action) +@result{}# Invoke ACTION with a single argument of each definition +@result{}# from the definition stack of MACRO, starting with the newest. 
+@result{}define(`stack_foreach_lifo', +@result{}`_stack_reverse(`$1', `tmp-$1', `$2(defn(`$1'))')'dnl +@result{}`_stack_reverse(`tmp-$1', `$1')') +@result{}define(`_stack_reverse', +@result{}`ifdef(`$1', `pushdef(`$2', defn(`$1'))$3`'popdef(`$1')$0($@@)')') +@result{}divert`'dnl +@end example + +@node Composition +@section Building macros with macros + +@cindex macro composition +@cindex composing macros +Since m4 is a macro language, it is possible to write macros that +can build other macros. First on the list is a way to automate the +creation of blind macros. + +@cindex macro, blind +@cindex blind macro +@deffn Composite define_blind (@var{name}, @ovar{value}) +Defines @var{name} as a blind macro, such that @var{name} will expand to +@var{value} only when given explicit arguments. @var{value} should not +be the result of @code{defn} (@pxref{Defn}). This macro is only +recognized with parameters, and results in an empty string. +@end deffn + +Defining a macro to define another macro can be a bit tricky. We want +to use a literal @samp{$#} in the argument to the nested @code{define}. +However, if @samp{$} and @samp{#} are adjacent in the definition of +@code{define_blind}, then it would be expanded as the number of +arguments to @code{define_blind} rather than the intended number of +arguments to @var{name}. The solution is to pass the difficult +characters through extra arguments to a helper macro +@code{_define_blind}. When composing macros, it is a common idiom to +need a helper macro to concatenate text that forms parameters in the +composed macro, rather than interpreting the text as a parameter of the +composing macro. + +As for the limitation against using @code{defn}, there are two reasons. +If a macro was previously defined with @code{define_blind}, then it can +safely be renamed to a new blind macro using plain @code{define}; using +@code{define_blind} to rename it just adds another layer of +@code{ifelse}, occupying memory and slowing down execution. 
And if a +macro is a builtin, then it would result in an attempt to define a macro +consisting of both text and a builtin token; this is not supported, and +the builtin token is flattened to an empty string. + +With that explanation, here's the definition, and some sample usage. +Notice that @code{define_blind} is itself a blind macro. + +@example +$ @kbd{m4 -d} +define(`define_blind', `ifelse(`$#', `0', ``$0'', +`_$0(`$1', `$2', `$'`#', `$'`0')')') +@result{} +define(`_define_blind', `define(`$1', +`ifelse(`$3', `0', ``$4'', `$2')')') +@result{} +define_blind +@result{}define_blind +define_blind(`foo', `arguments were $*') +@result{} +foo +@result{}foo +foo(`bar') +@result{}arguments were bar +define(`blah', defn(`foo')) +@result{} +blah +@result{}blah +blah(`a', `b') +@result{}arguments were a,b +defn(`blah') +@result{}ifelse(`$#', `0', ``$0'', `arguments were $*') +@end example + +@cindex currying arguments +@cindex argument currying +Another interesting composition tactic is argument @dfn{currying}, or +factoring a macro that takes multiple arguments for use in a context +that provides exactly one argument. + +@deffn Composite curry (@var{macro}, @dots{}) +Expand to a macro call that takes exactly one argument, then appends +that argument to the original arguments and invokes @var{macro} with the +resulting list of arguments. +@end deffn + +A demonstration of currying makes the intent of this macro a little more +obvious. The macro @code{stack_foreach} mentioned earlier is an example +of a context that provides exactly one argument to a macro name. But +coupled with currying, we can invoke @code{reverse} with two arguments +for each definition of a macro stack. This example uses the file +@file{m4-@value{VERSION}/@/examples/@/curry.m4} included in the +distribution. 
+ +@comment examples +@example +$ @kbd{m4 -I examples} +include(`curry.m4')include(`stack.m4') +@result{} +define(`reverse', `ifelse(`$#', `0', , `$#', `1', ``$1'', + `reverse(shift($@@)), `$1'')') +@result{} +pushdef(`a', `1')pushdef(`a', `2')pushdef(`a', `3') +@result{} +stack_foreach(`a', `:curry(`reverse', `4')') +@result{}:1, 4:2, 4:3, 4 +curry(`curry', `reverse', `1')(`2')(`3') +@result{}3, 2, 1 +@end example + +Now for the implementation. Notice how @code{curry} leaves off with a +macro name but no open parenthesis, while still in the middle of +collecting arguments for @samp{$1}. The macro @code{_curry} is the +helper macro that takes one argument, then adds it to the list and +finally supplies the closing parenthesis. The use of a comma inside the +@code{shift} call allows currying to also work for a macro that takes +one argument, although it often makes more sense to invoke that macro +directly rather than going through @code{curry}. + +@comment examples +@example +$ @kbd{m4 -I examples} +undivert(`curry.m4')dnl +@result{}divert(`-1') +@result{}# curry(macro, args) +@result{}# Expand to a macro call that takes one argument, then invoke +@result{}# macro(args, extra). +@result{}define(`curry', `$1(shift($@@,)_$0') +@result{}define(`_curry', ``$1')') +@result{}divert`'dnl +@end example + +Unfortunately, with M4 1.4.x, @code{curry} is unable to handle builtin +tokens, which are silently flattened to the empty string when passed +through another text macro. This limitation will be lifted in a future +release of M4. + +@cindex renaming macros +@cindex copying macros +@cindex macros, copying +Putting the last few concepts together, it is possible to copy or rename +an entire stack of macro definitions. + +@deffn Composite copy (@var{source}, @var{dest}) +@deffnx Composite rename (@var{source}, @var{dest}) +Ensure that @var{dest} is undefined, then define it to the same stack of +definitions currently in @var{source}. 
@code{copy} leaves @var{source} +unchanged, while @code{rename} undefines @var{source}. There are only a +few macros, such as @code{copy} or @code{defn}, which cannot be copied +via this macro. +@end deffn + +The implementation is relatively straightforward (although since it uses +@code{curry}, it is unable to copy builtin macros, such as the second +definition of @code{a} as a synonym for @code{divnum}. See if you can +design a version that works around this limitation, or @pxref{Improved +copy, , Answers}). + +@comment examples +@example +$ @kbd{m4 -I examples} +include(`curry.m4')include(`stack.m4') +@result{} +define(`rename', `copy($@@)undefine(`$1')')dnl +define(`copy', `ifdef(`$2', `errprint(`$2 already defined +')m4exit(`1')', + `stack_foreach(`$1', `curry(`pushdef', `$2')')')')dnl +pushdef(`a', `1')pushdef(`a', defn(`divnum'))pushdef(`a', `2') +@result{} +copy(`a', `b') +@result{} +rename(`b', `c') +@result{} +a b c +@result{}2 b 2 +popdef(`a', `c')c a +@result{} 0 +popdef(`a', `c')a c +@result{}1 1 +@end example + +@node Debugging +@chapter How to debug macros and input + +@cindex debugging macros +@cindex macros, debugging +When writing macros for @code{m4}, they often do not work as intended on +the first try (as is the case with most programming languages). +Fortunately, there is support for macro debugging in @code{m4}. + +@menu +* Dumpdef:: Displaying macro definitions +* Trace:: Tracing macro calls +* Debug Levels:: Controlling debugging output +* Debug Output:: Saving debugging output +@end menu + +@node Dumpdef +@section Displaying macro definitions + +@cindex displaying macro definitions +@cindex macros, displaying definitions +@cindex definitions, displaying macro +@cindex standard error, output to +If you want to see what a name expands into, you can use the builtin +@code{dumpdef}: + +@deffn Builtin dumpdef (@ovar{names@dots{}}) +Accepts any number of arguments. 
If called without any arguments, +it displays the definitions of all known names; otherwise it displays +the definitions of the @var{names} given. The output is printed to the +current debug file (usually standard error), and is sorted by name. If +an unknown name is encountered, a warning is printed. + +The expansion of @code{dumpdef} is void. +@end deffn + +@example +$ @kbd{m4 -d} +define(`foo', `Hello world.') +@result{} +dumpdef(`foo') +@error{}foo:@tabchar{}`Hello world.' +@result{} +dumpdef(`define') +@error{}define:@tabchar{}<define> +@result{} +@end example + +The last example shows how definitions of builtin macros are displayed. +The definition that is dumped corresponds to what would occur if the +macro were to be called at that point, even if other definitions are +still live due to redefining a macro during argument collection. + +@example +$ @kbd{m4 -d} +pushdef(`f', ``$0'1')pushdef(`f', ``$0'2') +@result{} +f(popdef(`f')dumpdef(`f')) +@error{}f:@tabchar{}``$0'1' +@result{}f2 +f(popdef(`f')dumpdef(`f')) +@error{}m4:stdin:3: undefined macro `f' +@result{}f1 +@end example + +@xref{Debug Levels}, for information on controlling the details of the +display. + +@node Trace +@section Tracing macro calls + +@cindex tracing macro expansion +@cindex macro expansion, tracing +@cindex expansion, tracing macro +@cindex standard error, output to +It is possible to trace macro calls and expansions through the builtins +@code{traceon} and @code{traceoff}: + +@deffn Builtin traceon (@ovar{names@dots{}}) +@deffnx Builtin traceoff (@ovar{names@dots{}}) +When called without any arguments, @code{traceon} and @code{traceoff} +will turn tracing on and off, respectively, for all currently defined +macros. + +When called with arguments, only the macros listed in @var{names} are +affected, whether or not they are currently defined. + +The expansion of @code{traceon} and @code{traceoff} is void.
+@end deffn + +Whenever a traced macro is called and the arguments have been collected, +the call is displayed. If the expansion of the macro call is not void, +the expansion can be displayed after the call. The output is printed +to the current debug file (defaulting to standard error, @pxref{Debug +Output}). + +@example +$ @kbd{m4 -d} +define(`foo', `Hello World.') +@result{} +define(`echo', `$@@') +@result{} +traceon(`foo', `echo') +@result{} +foo +@error{}m4trace: -1- foo -> `Hello World.' +@result{}Hello World. +echo(`gnus', `and gnats') +@error{}m4trace: -1- echo(`gnus', `and gnats') -> ``gnus',`and gnats'' +@result{}gnus,and gnats +@end example + +The number between dashes is the depth of the expansion. It is one most +of the time, signifying an expansion at the outermost level, but it +increases when macro arguments contain unquoted macro calls. The +maximum number that will appear between dashes is controlled by the +option @option{--nesting-limit} (or @option{-L}, @pxref{Limits control, +, Invoking m4}). Additionally, the option @option{--trace} (or +@option{-t}) can be used to invoke @code{traceon(@var{name})} before +parsing input. + +@comment The explicit -dp neutralizes the testsuite default of -d. +@comment options: -dp -L3 -tifelse +@comment status: 1 +@example +$ @kbd{m4 -L 3 -t ifelse} +ifelse(`one level') +@error{}m4trace: -1- ifelse +@result{} +ifelse(ifelse(ifelse(`three levels'))) +@error{}m4trace: -3- ifelse +@error{}m4trace: -2- ifelse +@error{}m4trace: -1- ifelse +@result{} +ifelse(ifelse(ifelse(ifelse(`four levels')))) +@error{}m4:stdin:3: recursion limit of 3 exceeded, use -L<N> to change it +@end example + +Tracing by name is an attribute that is preserved whether the macro is +defined or not. This allows the selection of macros to trace before +those macros are defined. 
+ +@example +$ @kbd{m4 -d} +traceoff(`foo') +@result{} +traceon(`foo') +@result{} +foo +@result{}foo +defn(`foo') +@result{} +define(`foo', `bar') +@result{} +foo +@error{}m4trace: -1- foo -> `bar' +@result{}bar +undefine(`foo') +@result{} +ifdef(`foo', `yes', `no') +@result{}no +indir(`foo') +@error{}m4:stdin:9: undefined macro `foo' +@result{} +define(`foo', `blah') +@result{} +foo +@error{}m4trace: -1- foo -> `blah' +@result{}blah +traceoff +@result{} +foo +@result{}blah +@end example + +Tracing even works on builtins. However, @code{defn} (@pxref{Defn}) +does not transfer tracing status. + +@example +$ @kbd{m4 -d} +traceon(`traceon') +@result{} +traceon(`traceoff') +@error{}m4trace: -1- traceon(`traceoff') +@result{} +traceoff(`traceoff') +@error{}m4trace: -1- traceoff(`traceoff') +@result{} +traceoff(`traceon') +@result{} +traceon(`eval', `m4_divnum') +@result{} +define(`m4_eval', defn(`eval')) +@result{} +define(`m4_divnum', defn(`divnum')) +@result{} +eval(divnum) +@error{}m4trace: -1- eval(`0') -> `0' +@result{}0 +m4_eval(m4_divnum) +@error{}m4trace: -2- m4_divnum -> `0' +@result{}0 +@end example + +@xref{Debug Levels}, for information on controlling the details of the +display. The format of the trace output is not specified by +POSIX, and varies between implementations of @code{m4}. 
+ +@ignore +@comment not worth including in the manual, but this tests a trace code +@comment path that was temporarily broken +@comment options: -de --trace ifelse +@example +$ @kbd{m4 -de --trace ifelse} +define(`e', `ifelse(`$1', `$2', `ifelse(`$1', `$2', `e(shift($@@))')')') +@result{} +e(`1', `1') +@error{}m4trace: -1- ifelse -> ifelse(`1', `1', `e(shift(`1',`1'))') +@error{}m4trace: -1- ifelse -> e(shift(`1',`1')) +@error{}m4trace: -1- ifelse +@result{} +@end example +@end ignore + +@node Debug Levels +@section Controlling debugging output + +@cindex controlling debugging output +@cindex debugging output, controlling +The @option{-d} option to @code{m4} (or @option{--debug}, +@pxref{Debugging options, , Invoking m4}) controls the amount of detail +presented in three +categories of output. Trace output is requested by @code{traceon} +(@pxref{Trace}), and each line is prefixed by @samp{m4trace:} in +relation to a macro invocation. Debug output tracks useful events not +associated with a macro invocation, and each line is prefixed by +@samp{m4debug:}. Finally, @code{dumpdef} (@pxref{Dumpdef}) output is +affected, with no prefix added to the output lines. + +The @var{flags} following the option can be one or more of the +following: + +@table @code +@item a +In trace output, show the actual arguments that were collected before +invoking the macro. This applies to all macro calls if the @samp{t} +flag is used, otherwise only the macros covered by calls of +@code{traceon}. Arguments are subject to length truncation specified by +the command line option @option{--arglength} (or @option{-l}). + +@item c +In trace output, show several trace lines for each macro call. A line +is shown when the macro is seen, but before the arguments are collected; +a second line when the arguments have been collected and a third line +after the call has completed. + +@item e +In trace output, show the expansion of each macro call, if it is not +void.
This applies to all macro calls if the @samp{t} flag is used, +otherwise only the macros covered by calls of @code{traceon}. The +expansion is subject to length truncation specified by the command line +option @option{--arglength} (or @option{-l}). + +@item f +In debug and trace output, include the name of the current input file in +the output line. + +@item i +In debug output, print a message each time the current input file is +changed. + +@item l +In debug and trace output, include the current input line number in the +output line. + +@item p +In debug output, print a message when a named file is found through the +path search mechanism (@pxref{Search Path}), giving the actual file name +used. + +@item q +In trace and dumpdef output, quote actual arguments and macro expansions +in the display with the current quotes. This is useful in connection +with the @samp{a} and @samp{e} flags above. + +@item t +In trace output, trace all macro calls made in this invocation of +@code{m4}, regardless of the settings of @code{traceon}. + +@item x +In trace output, add a unique `macro call id' to each line of the trace +output. This is useful in connection with the @samp{c} flag above. + +@item V +A shorthand for all of the above flags. +@end table + +If no flags are specified with the @option{-d} option, the default is +@samp{aeq}. The examples throughout this manual assume the default +flags. + +@cindex GNU extensions +There is a builtin macro @code{debugmode}, which allows on-the-fly control of +the debugging output format: + +@deffn Builtin debugmode (@ovar{flags}) +The argument @var{flags} should be a subset of the letters listed above. +As special cases, if the argument starts with a @samp{+}, the flags are +added to the current debug flags, and if it starts with a @samp{-}, they +are removed. If no argument is present, all debugging flags are cleared +(as if no @option{-d} was given), and with an empty argument the flags +are reset to the default of @samp{aeq}. 
+ +The expansion of @code{debugmode} is void. +@end deffn + +@comment The explicit -dp neutralizes the testsuite default of -d. +@comment options: -dp +@example +$ @kbd{m4} +define(`foo', `FOO') +@result{} +traceon(`foo') +@result{} +debugmode() +@result{} +foo +@error{}m4trace: -1- foo -> `FOO' +@result{}FOO +debugmode +@result{} +foo +@error{}m4trace: -1- foo +@result{}FOO +debugmode(`+l') +@result{} +foo +@error{}m4trace:8: -1- foo +@result{}FOO +@end example + +The following example demonstrates the behavior of length truncation, +when specified on the command line. Note that each argument and the +final result are individually truncated. Also, the special tokens for +builtin functions are not truncated. + +@comment options: -l6 +@example +$ @kbd{m4 -d -l 6} +define(`echo', `$@@')debugmode(`+t') +@result{} +echo(`1', `long string') +@error{}m4trace: -1- echo(`1', `long s...') -> ``1',`l...' +@result{}1,long string +indir(`echo', defn(`changequote')) +@error{}m4trace: -2- defn(`change...') +@error{}m4trace: -1- indir(`echo', <changequote>) -> ``'' +@result{} +@end example + +This example shows the effects of the debug flags that are not related +to macro tracing. 
+ +@comment examples +@comment options: -dip +@example +$ @kbd{m4 -dip -I examples} +@error{}m4debug: input read from stdin +include(`foo')dnl +@error{}m4debug: path search for `foo' found `examples/foo' +@error{}m4debug: input read from examples/foo +@result{}bar +@error{}m4debug: input reverted to stdin, line 1 +^D +@error{}m4debug: input exhausted +@end example + +@node Debug Output +@section Saving debugging output + +@cindex saving debugging output +@cindex debugging output, saving +@cindex output, saving debugging +@cindex GNU extensions +Debug and tracing output can be redirected to files using either the +@option{--debugfile} option to @code{m4} (@pxref{Debugging options, , +Invoking m4}), or with the builtin macro @code{debugfile}: + +@deffn Builtin debugfile (@ovar{file}) +Sends all further debug and trace output to @var{file}, opened in append +mode. If @var{file} is the empty string, debug and trace output are +discarded. If @code{debugfile} is called without any arguments, debug +and trace output are sent to standard error. This does not affect +warnings, error messages, or @code{errprint} output, which are +always sent to standard error. If @var{file} cannot be opened, the +current debug file is unchanged, and an error is issued. + +The expansion of @code{debugfile} is void. +@end deffn + +@example +$ @kbd{m4 -d} +traceon(`divnum') +@result{} +divnum(`extra') +@error{}m4:stdin:2: Warning: excess arguments to builtin `divnum' ignored +@error{}m4trace: -1- divnum(`extra') -> `0' +@result{}0 +debugfile() +@result{} +divnum(`extra') +@error{}m4:stdin:4: Warning: excess arguments to builtin `divnum' ignored +@result{}0 +debugfile +@result{} +divnum +@error{}m4trace: -1- divnum -> `0' +@result{}0 +@end example + +@node Input Control +@chapter Input control + +This chapter describes various builtin macros for controlling the input +to @code{m4}. 
+ +@menu +* Dnl:: Deleting whitespace in input +* Changequote:: Changing the quote characters +* Changecom:: Changing the comment delimiters +* Changeword:: Changing the lexical structure of words +* M4wrap:: Saving text until end of input +@end menu + +@node Dnl +@section Deleting whitespace in input + +@cindex deleting whitespace in input +@cindex discarding input +@cindex input, discarding +The builtin @code{dnl} stands for ``Discard to Next Line'': + +@deffn Builtin dnl +All characters, up to and including the next newline, are discarded +without performing any macro expansion. A warning is issued if the end +of the file is encountered without a newline. + +The expansion of @code{dnl} is void. +@end deffn + +It is often used in connection with @code{define}, to remove the +newline that follows the call to @code{define}. Thus + +@example +define(`foo', `Macro `foo'.')dnl A very simple macro, indeed. +foo +@result{}Macro foo. +@end example + +The input up to and including the next newline is discarded, as opposed +to the way comments are treated (@pxref{Comments}). + +Usually, @code{dnl} is immediately followed by an end of line or some +other whitespace. GNU @code{m4} will produce a warning diagnostic if +@code{dnl} is followed by an open parenthesis. In this case, @code{dnl} +will collect and process all arguments, looking for a matching close +parenthesis. All predictable side effects resulting from this +collection will take place. @code{dnl} will return no output. The +input following the matching close parenthesis, up to and including the +next newline on the line that contains that parenthesis, is still discarded. + +@example +dnl(`args are ignored, but side effects occur', +define(`foo', `like this')) while this text is ignored: undefine(`foo') +@error{}m4:stdin:1: Warning: excess arguments to builtin `dnl' ignored +See how `foo' was defined, foo? +@result{}See how foo was defined, like this?
+@end example + +If the end of file is encountered without a newline character, a +warning is issued and @code{dnl} stops consuming input. + +@example +m4wrap(`m4wrap(`2 hi +')0 hi dnl 1 hi') +@result{} +define(`hi', `HI') +@result{} +^D +@error{}m4:stdin:1: Warning: end of file treated as newline +@result{}0 HI 2 HI +@end example + +@node Changequote +@section Changing the quote characters + +@cindex changing quote delimiters +@cindex quote delimiters, changing +@cindex delimiters, changing +The default quote delimiters can be changed with the builtin +@code{changequote}: + +@deffn Builtin changequote (@dvar{start, `}, @dvar{end, '}) +This sets @var{start} as the new begin-quote delimiter and @var{end} as +the new end-quote delimiter. If both arguments are missing, the default +quotes (@code{`} and @code{'}) are used. If @var{start} is void, then +quoting is disabled. Otherwise, if @var{end} is missing or void, the +default end-quote delimiter (@code{'}) is used. The quote delimiters +can be of any length. + +The expansion of @code{changequote} is void. +@end deffn + +@example +changequote(`[', `]') +@result{} +define([foo], [Macro [foo].]) +@result{} +foo +@result{}Macro foo. +@end example + +The quotation strings can safely contain eight-bit characters. +@ignore +@comment Yuck. I know of no clean way to render an 8-bit character in +@comment both info and dvi. This example uses the `open-guillemot' and +@comment `close-guillemot' characters of the Latin-1 character set. + +@example +define(`a', `b') +@result{} +«a» +@result{}«b» +changequote(`«', `»') +@result{} +«a» +@result{}a +@end example +@end ignore +If no single character is appropriate, @var{start} and @var{end} can be +of any length. Other implementations cap the delimiter length to five +characters, but GNU has no inherent limit. + +@example +changequote(`[[[', `]]]') +@result{} +define([[[foo]]], [[[Macro [[[[[foo]]]]].]]]) +@result{} +foo +@result{}Macro [[foo]].
+@end example + +Calling @code{changequote} with @var{start} as the empty string will +effectively disable the quoting mechanism, leaving no way to quote text. +However, using an empty string is not portable, as some other +implementations of @code{m4} revert to the default quoting, while others +preserve the prior non-empty delimiter. If @var{start} is not empty, +then an empty @var{end} will use the default end-quote delimiter of +@samp{'}, as otherwise, it would be impossible to end a quoted string. +Again, this is not portable, as some other @code{m4} implementations +reuse @var{start} as the end-quote delimiter, while others preserve the +previous non-empty value. Omitting both arguments restores the default +begin-quote and end-quote delimiters; fortunately this behavior is +portable to all implementations of @code{m4}. + +@example +define(`foo', `Macro `FOO'.') +@result{} +changequote(`', `') +@result{} +foo +@result{}Macro `FOO'. +`foo' +@result{}`Macro `FOO'.' +changequote(`,) +@result{} +foo +@result{}Macro FOO. +@end example + +There is no way in @code{m4} to quote a string containing an unmatched +begin-quote, except using @code{changequote} to change the current +quotes. + +If the quotes should be changed from, say, @samp{[} to @samp{[[}, +temporary quote characters have to be defined. To achieve this, two +calls of @code{changequote} must be made, one for the temporary quotes +and one for the new quotes. + +Macros are recognized in preference to the begin-quote string, so if a +prefix of @var{start} can be recognized as part of a potential macro +name, the quoting mechanism is effectively disabled. Unless you use +@code{changeword} (@pxref{Changeword}), this means that @var{start} +should not begin with a letter, digit, or @samp{_} (underscore). +However, even though quoted strings are not recognized, the quote +characters can still be discerned in macro expansion and in trace +output. 
+ +@example +define(`echo', `$@@') +@result{} +define(`hi', `HI') +@result{} +changequote(`q', `Q') +@result{} +q hi Q hi +@result{}q HI Q HI +echo(hi) +@result{}qHIQ +changequote +@result{} +changequote(`-', `EOF') +@result{} +- hi EOF hi +@result{} hi HI +changequote +@result{} +changequote(`1', `2') +@result{} +hi1hi2 +@result{}hi1hi2 +hi 1hi2 +@result{}HI hi +@end example + +Quotes are recognized in preference to argument collection. In +particular, if @var{start} is a single @samp{(}, then argument +collection is effectively disabled. For portability with other +implementations, it is a good idea to avoid @samp{(}, @samp{,}, and +@samp{)} as the first character in @var{start}. + +@example +define(`echo', `$#:$@@:') +@result{} +define(`hi', `HI') +@result{} +changequote(`(',`)') +@result{} +echo(hi) +@result{}0::hi +changequote +@result{} +changequote(`((', `))') +@result{} +echo(hi) +@result{}1:HI: +echo((hi)) +@result{}0::hi +changequote +@result{} +changequote(`,', `)') +@result{} +echo(hi,hi)bye) +@result{}1:HIhibye: +@end example + +However, if you are not worried about portability, using @samp{(} and +@samp{)} as quoting characters has an interesting property---you can use +it to compute a quoted string containing the expansion of any quoted +text, as long as the expansion results in both balanced quotes and +balanced parentheses. The trick is realizing @code{expand} uses +@samp{$1} unquoted, to trigger its expansion using the normal quoting +characters, but uses extra parentheses to group unquoted commas that +occur in the expansion without consuming whitespace following those +commas. Then @code{_expand} uses @code{changequote} to convert the +extra parentheses back into quoting characters. Note that it takes two +more @code{changequote} invocations to restore the original quotes. +Contrast the behavior on whitespace when using @samp{$*}, via +@code{quote}, to attempt the same task. 
+ +@example +changequote(`[', `]')dnl +define([a], [1, (b)])dnl +define([b], [2])dnl +define([quote], [[$*]])dnl +define([expand], [_$0(($1))])dnl +define([_expand], + [changequote([(], [)])$1changequote`'changequote(`[', `]')])dnl +expand([a, a, [a, a], [[a, a]]]) +@result{}1, (2), 1, (2), a, a, [a, a] +quote(a, a, [a, a], [[a, a]]) +@result{}1,(2),1,(2),a, a,[a, a] +@end example + +If @var{end} is a prefix of @var{start}, the end-quote will be +recognized in preference to a nested begin-quote. In particular, +changing the quotes to have the same string for @var{start} and +@var{end} disables nesting of quotes. When quote nesting is disabled, +it is impossible to double-quote strings across macro expansions, so +using the same string is not done very often. + +@example +define(`hi', `HI') +@result{} +changequote(`""', `"') +@result{} +""hi"""hi" +@result{}hihi +""hi" ""hi" +@result{}hi hi +""hi"" "hi" +@result{}hi" "HI" +changequote +@result{} +`hi`hi'hi' +@result{}hi`hi'hi +changequote(`"', `"') +@result{} +"hi"hi"hi" +@result{}hiHIhi +@end example + +@ignore +@comment And another stress test, not worth documenting in the manual. +@example +define(`aaaaaaaaaaaaaaaaaaaa', `A')define(`q', `"$@@"') +@result{} +changequote(`"', `"') +@result{} +q(q("aaaaaaaaaaaaaaaaaaaa", "a")) +@result{}A,a +@end example +@end ignore + +It is an error if the end of file occurs within a quoted string. 
+ +@comment status: 1 +@example +`hello world' +@result{}hello world +`dangling quote +^D +@error{}m4:stdin:2: ERROR: end of file in string +@end example + +@comment status: 1 +@example +ifelse(`dangling quote +^D +@error{}m4:stdin:1: ERROR: end of file in string +@end example + +@node Changecom +@section Changing the comment delimiters + +@cindex changing comment delimiters +@cindex comment delimiters, changing +@cindex delimiters, changing +The default comment delimiters can be changed with the builtin +macro @code{changecom}: + +@deffn Builtin changecom (@ovar{start}, @dvar{end, @key{NL}}) +This sets @var{start} as the new begin-comment delimiter and @var{end} +as the new end-comment delimiter. If both arguments are missing, or +@var{start} is void, then comments are disabled. Otherwise, if +@var{end} is missing or void, the default end-comment delimiter of +newline is used. The comment delimiters can be of any length. + +The expansion of @code{changecom} is void. +@end deffn + +@example +define(`comment', `COMMENT') +@result{} +# A normal comment +@result{}# A normal comment +changecom(`/*', `*/') +@result{} +# Not a comment anymore +@result{}# Not a COMMENT anymore +But: /* this is a comment now */ while this is not a comment +@result{}But: /* this is a comment now */ while this is not a COMMENT +@end example + +@cindex comments, copied to output +Note how comments are copied to the output, much as if they were quoted +strings. If you want the text inside a comment expanded, quote the +begin-comment delimiter. + +Calling @code{changecom} without any arguments, or with @var{start} as +the empty string, will effectively disable the commenting mechanism. To +restore the original comment start of @samp{#}, you must explicitly ask +for it. If @var{start} is not empty, then an empty @var{end} will use +the default end-comment delimiter of newline, as otherwise, it would be +impossible to end a comment. 
However, this is not portable, as some +other @code{m4} implementations preserve the previous non-empty +delimiters instead. + +@example +define(`comment', `COMMENT') +@result{} +changecom +@result{} +# Not a comment anymore +@result{}# Not a COMMENT anymore +changecom(`#', `') +@result{} +# comment again +@result{}# comment again +@end example + +The comment strings can safely contain eight-bit characters. +@ignore +@comment Yuck. I know of no clean way to render an 8-bit character in +@comment both info and dvi. This example uses the `open-guillemot' and +@comment `close-guillemot' characters of the Latin-1 character set. + +@example +define(`a', `b') +@result{} +«a» +@result{}«b» +changecom(`«', `»') +@result{} +«a» +@result{}«a» +@end example +@end ignore +If no single character is appropriate, @var{start} and @var{end} can be +of any length. Other implementations cap the delimiter length to five +characters, but GNU has no inherent limit. + +Comments are recognized in preference to macros. However, this is not +compatible with other implementations, where macros and even quoting +take precedence over comments, so it may change in a future release. +For portability, this means that @var{start} should not begin with a +letter, digit, or @samp{_} (underscore), and that neither the +start-quote nor the start-comment string should be a prefix of the +other. + +@example +define(`hi', `HI') +@result{} +define(`hi1hi2', `hello') +@result{} +changecom(`q', `Q') +@result{} +q hi Q hi +@result{}q hi Q HI +changecom(`1', `2') +@result{} +hi1hi2 +@result{}hello +hi 1hi2 +@result{}HI 1hi2 +@end example + +Comments are recognized in preference to argument collection. In +particular, if @var{start} is a single @samp{(}, then argument +collection is effectively disabled. For portability with other +implementations, it is a good idea to avoid @samp{(}, @samp{,}, and +@samp{)} as the first character in @var{start}.
+ +@example +define(`echo', `$#:$*:$@@:') +@result{} +define(`hi', `HI') +@result{} +changecom(`(',`)') +@result{} +echo(hi) +@result{}0:::(hi) +changecom +@result{} +changecom(`((', `))') +@result{} +echo(hi) +@result{}1:HI:HI: +echo((hi)) +@result{}0:::((hi)) +changecom(`,', `)') +@result{} +echo(hi,hi)bye) +@result{}1:HI,hi)bye:HI,hi)bye: +changecom +@result{} +echo(hi,`,`'hi',hi) +@result{}3:HI,,HI,HI:HI,,`'hi,HI: +echo(hi,`,`'hi',hi`'changecom(`,,', `hi')) +@result{}3:HI,,`'hi,HI:HI,,`'hi,HI: +@end example + +It is an error if the end of file occurs within a comment. + +@comment status: 1 +@example +changecom(`/*', `*/') +@result{} +/*dangling comment +^D +@error{}m4:stdin:2: ERROR: end of file in comment +@end example + +@node Changeword +@section Changing the lexical structure of words + +@cindex lexical structure of words +@cindex words, lexical structure of +@cindex syntax, changing +@cindex changing syntax +@cindex regular expressions +@quotation +The macro @code{changeword} and all associated functionality is +experimental. It is only available if the @option{--enable-changeword} +option was given to @command{configure}, at GNU @code{m4} +installation +time. The functionality will go away in the future, to be replaced by +other new features that are more efficient at providing the same +capabilities. @emph{Do not rely on it}. Please direct your comments +about it the same way you would do for bugs. +@end quotation + +A file being processed by @code{m4} is split into quoted strings, words +(potential macro names) and simple tokens (any other single character). +Initially a word is defined by the following regular expression: + +@comment ignore +@example +[_a-zA-Z][_a-zA-Z0-9]* +@end example + +Using @code{changeword}, you can change this regular expression: + +@deffn {Optional builtin} changeword (@var{regex}) +Changes the regular expression for recognizing macro names to be +@var{regex}. If @var{regex} is empty, use +@samp{[_a-zA-Z][_a-zA-Z0-9]*}. 
@var{regex} must obey the constraint +that every prefix of the desired final pattern is also accepted by the +regular expression. If @var{regex} contains grouping parentheses, the +macro invoked is the portion that matched the first group, rather than +the entire matching string. + +The expansion of @code{changeword} is void. +The macro @code{changeword} is recognized only with parameters. +@end deffn + +Relaxing the lexical rules of @code{m4} might be useful (for example) if +you wanted to apply translations to a file of numbers: + +@example +ifdef(`changeword', `', `errprint(` skipping: no changeword support +')m4exit(`77')')dnl +changeword(`[_a-zA-Z0-9]+') +@result{} +define(`1', `0')1 +@result{}0 +@end example + +Tightening the lexical rules is less useful, because it will generally +make some of the builtins unavailable. You could use it to prevent +accidental calls of builtins, for example: + +@example +ifdef(`changeword', `', `errprint(` skipping: no changeword support +')m4exit(`77')')dnl +define(`_indir', defn(`indir')) +@result{} +changeword(`_[_a-zA-Z0-9]*') +@result{} +esyscmd(`foo') +@result{}esyscmd(foo) +_indir(`esyscmd', `echo hi') +@result{}hi +@result{} +@end example + +Because @code{m4} constructs its words a character at a time, there +is a restriction on the regular expressions that may be passed to +@code{changeword}. The restriction is that if your regular expression accepts +@samp{foo}, it must also accept @samp{f} and @samp{fo}. + +@example +ifdef(`changeword', `', `errprint(` skipping: no changeword support +')m4exit(`77')')dnl +define(`foo +', `bar +') +@result{} +dnl This example wants to recognize changeword, dnl, and `foo\n'. +dnl First, we check that our regexp will match.
+regexp(`changeword', `[cd][a-z]*\|foo[ +]') +@result{}0 +regexp(`foo +', `[cd][a-z]*\|foo[ +]') +@result{}0 +regexp(`f', `[cd][a-z]*\|foo[ +]') +@result{}-1 +foo +@result{}foo +changeword(`[cd][a-z]*\|foo[ +]') +@result{} +dnl Even though `foo\n' matches, we forgot to allow `f'. +foo +@result{}foo +changeword(`[cd][a-z]*\|fo*[ +]?') +@result{} +dnl Now we can call `foo\n'. +foo +@result{}bar +@end example + +@ignore +@comment One more test of including newline in a macro name; but this +@comment does not need to be displayed in the manual. This ensures +@comment that line numbering is correct when dnl cuts across include +@comment file boundaries, and when __file__ or __line__ is the last +@comment token in an include file. + +@example +ifdef(`changeword', `', `errprint(` skipping: no changeword support +')m4exit(`77')')dnl +define(`bar +', defn(`dnl'))dnl +define(`baz', `dnl +include(`foo') ignored +dnl')dnl +changeword(`\([_a-zA-Z][_a-zA-Z0-9]*\|bar +\)') +@result{} +__file__:__line__ +@result{}stdin:10 +include(`foo') ignored +__file__:__line__ +@result{}stdin:12 +baz ignored +__file__:__line__ +@result{}stdin:14 +define(`bar +', defn(`__file__')) +@result{} +include(`foo') +@result{}examples/foo +define(`bar +', defn(`__line__')) +@result{} +include(`foo') +@result{}1 +__file__:__line__ +@result{}stdin:21 +@end example +@end ignore + +@code{changeword} has another function. If the regular expression +supplied contains any grouped subexpressions, then text outside +the first of these is discarded before symbol lookup. 
So: + +@example +ifdef(`changeword', `', `errprint(` skipping: no changeword support +')m4exit(`77')')dnl +ifdef(`__unix__', , + `errprint(` skipping: syscmd does not have unix semantics +')m4exit(`77')')dnl +changecom(`/*', `*/')dnl +define(`foo', `bar')dnl +changeword(`#\([_a-zA-Z0-9]*\)') +@result{} +#esyscmd(`echo foo \#foo') +@result{}foo bar +@result{} +@end example + +@code{m4} now requires a @samp{#} mark at the beginning of every +macro invocation, so one can use @code{m4} to preprocess plain +text without losing various words like @samp{divert}. + +In @code{m4}, macro substitution is based on text, while in @TeX{}, it +is based on tokens. @code{changeword} can throw this difference into +relief. For example, here is the same idea represented in @TeX{} and +@code{m4}. First, the @TeX{} version: + +@comment ignore +@example +\def\a@{\message@{Hello@}@} +\catcode`\@@=0 +\catcode`\\=12 +@@a +@@bye +@result{}Hello +@end example + +@noindent +Then, the @code{m4} version: + +@example +ifdef(`changeword', `', `errprint(` skipping: no changeword support +')m4exit(`77')')dnl +define(`a', `errprint(`Hello')')dnl +changeword(`@@\([_a-zA-Z0-9]*\)') +@result{} +@@a +@result{}errprint(Hello) +@end example + +In the @TeX{} example, the first line defines a macro @code{a} to +print the message @samp{Hello}. The second line defines @key{@@} to +be usable instead of @key{\} as an escape character. The third line +defines @key{\} to be a normal printing character, not an escape. +The fourth line invokes the macro @code{a}. So, when @TeX{} is run +on this file, it displays the message @samp{Hello}. + +When the @code{m4} example is passed through @code{m4}, it outputs +@samp{errprint(Hello)}. The reason for this is that @TeX{} does +lexical analysis of macro definition when the macro is @emph{defined}. +@code{m4} just stores the text, postponing the lexical analysis until +the macro is @emph{used}. 
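The prefix restriction on @code{changeword} expressions follows from recognizing words one character at a time. The following is a simplified model in Python, not GNU @code{m4}'s actual scanner, and the real scanner may differ in details. It makes the following assumptions:
@itemize @bullet
@item
It uses Python @code{re} syntax, so the @samp{\|} alternation from the changeword example is written @samp{|} and the bracketed literal newline is written @samp{\n}.
@item
A candidate word keeps growing only while the entire candidate still matches the expression.
@item
The helper names @code{scan_word} and @code{prefix_closed} are hypothetical.
@end itemize

```python
import re


def scan_word(pattern: str, text: str, start: int = 0) -> str:
    """Hypothetical helper: a simplified model of changeword-style scanning.

    The candidate word grows one character at a time and is kept only while
    the entire candidate still matches pattern; scanning stops at the first
    character that would break the match.
    """
    rx = re.compile(pattern)
    end = start
    while end < len(text) and rx.fullmatch(text, start, end + 1):
        end += 1
    return text[start:end]


def prefix_closed(pattern: str, word: str) -> bool:
    """Hypothetical helper: True if every non-empty prefix of word matches."""
    rx = re.compile(pattern)
    return all(rx.fullmatch(word, 0, i) for i in range(1, len(word) + 1))


# Python spellings (an assumption) of the two expressions from the
# changeword example: `\|` becomes `|`, the bracketed newline becomes `\n`.
TOO_STRICT = r"[cd][a-z]*|foo\n"
RELAXED = r"[cd][a-z]*|fo*\n?"

if __name__ == "__main__":
    print(prefix_closed(TOO_STRICT, "foo\n"))     # False: "f" alone fails
    print(repr(scan_word(TOO_STRICT, "foo\n")))   # ''
    print(prefix_closed(RELAXED, "foo\n"))        # True
    print(repr(scan_word(RELAXED, "foo\n")))      # 'foo\n'
    print(scan_word(RELAXED, "changeword(`x')"))  # changeword
```

In this model, the first expression rejects even the one-character prefix @samp{f}, so @samp{foo} followed by a newline never grows into a word. The second expression accepts every prefix along the way, which matches the outcome of the manual's changeword example.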
+ +You should note that using @code{changeword} will slow @code{m4} down +by a factor of about seven, once it is changed to something other +than the default regular expression. You can invoke @code{changeword} +with the empty string to restore the default word definition, and regain +the parsing speed. + +@node M4wrap +@section Saving text until end of input + +@cindex saving input +@cindex input, saving +@cindex deferring expansion +@cindex expansion, deferring +It is possible to `save' some text until the end of the normal input has +been seen. Text can be saved, to be read again by @code{m4} when the +normal input has been exhausted. This feature is normally used to +initiate cleanup actions before normal exit, e.g., deleting temporary +files. + +To save input text, use the builtin @code{m4wrap}: + +@deffn Builtin m4wrap (@var{string}, @dots{}) +Stores @var{string} in a safe place, to be reread when end of input is +reached. As a GNU extension, additional arguments are +concatenated with a space to the @var{string}. + +The expansion of @code{m4wrap} is void. +The macro @code{m4wrap} is recognized only with parameters. +@end deffn + +@example +define(`cleanup', `This is the `cleanup' action. +') +@result{} +m4wrap(`cleanup') +@result{} +This is the first and last normal input line. +@result{}This is the first and last normal input line. +^D +@result{}This is the cleanup action. +@end example + +The saved input is only reread when the end of normal input is seen, and +not if @code{m4exit} is used to exit @code{m4}. + +@comment FIXME: this contradicts POSIX, which requires that "If the +@comment m4wrap macro is used multiple times, the arguments specified +@comment shall be processed in the order in which the m4wrap macros were +@comment processed." +It is safe to call @code{m4wrap} from saved text, but then the order in +which the saved text is reread is undefined. 
If @code{m4wrap} is not used +recursively, the saved pieces of text are reread in the reverse of the order +in which they were saved (LIFO---last in, first out). However, this +behavior is likely to change in a future release, to match +POSIX, so you should not depend on this order. + +It is possible to emulate POSIX behavior even +with older versions of GNU M4 by including the file +@file{m4-@value{VERSION}/@/examples/@/wrapfifo.m4} from the +distribution: + +@comment examples +@example +$ @kbd{m4 -I examples} +undivert(`wrapfifo.m4')dnl +@result{}dnl Redefine m4wrap to have FIFO semantics. +@result{}define(`_m4wrap_level', `0')dnl +@result{}define(`m4wrap', +@result{}`ifdef(`m4wrap'_m4wrap_level, +@result{} `define(`m4wrap'_m4wrap_level, +@result{} defn(`m4wrap'_m4wrap_level)`$1')', +@result{} `builtin(`m4wrap', `define(`_m4wrap_level', +@result{} incr(_m4wrap_level))dnl +@result{}m4wrap'_m4wrap_level)dnl +@result{}define(`m4wrap'_m4wrap_level, `$1')')')dnl +include(`wrapfifo.m4') +@result{} +m4wrap(`a`'m4wrap(`c +', `d')')m4wrap(`b') +@result{} +^D +@result{}abc +@end example + +It is likewise possible to emulate LIFO behavior without resorting to +the GNU M4 extension of @code{builtin}, by including the file +@file{m4-@value{VERSION}/@/examples/@/wraplifo.m4} from the +distribution. (Unfortunately, both examples shown here share some +subtle bugs. See if you can find and correct them; or @pxref{Improved +m4wrap, , Answers}). + +@comment examples +@example +$ @kbd{m4 -I examples} +undivert(`wraplifo.m4')dnl +@result{}dnl Redefine m4wrap to have LIFO semantics.
+@result{}define(`_m4wrap_level', `0')dnl +@result{}define(`_m4wrap', defn(`m4wrap'))dnl +@result{}define(`m4wrap', +@result{}`ifdef(`m4wrap'_m4wrap_level, +@result{} `define(`m4wrap'_m4wrap_level, +@result{} `$1'defn(`m4wrap'_m4wrap_level))', +@result{} `_m4wrap(`define(`_m4wrap_level', incr(_m4wrap_level))dnl +@result{}m4wrap'_m4wrap_level)dnl +@result{}define(`m4wrap'_m4wrap_level, `$1')')')dnl +include(`wraplifo.m4') +@result{} +m4wrap(`a`'m4wrap(`c +', `d')')m4wrap(`b') +@result{} +^D +@result{}bac +@end example + +Here is an example of implementing a factorial function using +@code{m4wrap}: + +@example +define(`f', `ifelse(`$1', `0', `Answer: 0!=1 +', eval(`$1>1'), `0', `Answer: $2$1=eval(`$2$1') +', `m4wrap(`f(decr(`$1'), `$2$1*')')')') +@result{} +f(`10') +@result{} +^D +@result{}Answer: 10*9*8*7*6*5*4*3*2*1=3628800 +@end example + +Invocations of @code{m4wrap} at the same recursion level are +concatenated and rescanned as usual: + +@example +define(`aa', `AA +') +@result{} +m4wrap(`a')m4wrap(`a') +@result{} +^D +@result{}AA +@end example + +@noindent +however, the transition between recursion levels behaves like an end of +file condition between two input files. + +@comment status: 1 +@example +m4wrap(`m4wrap(`)')len(abc') +@result{} +^D +@error{}m4:stdin:1: ERROR: end of file in argument list +@end example + +@node File Inclusion +@chapter File inclusion + +@cindex file inclusion +@cindex inclusion, of files +@code{m4} allows you to include named files at any point in the input. + +@menu +* Include:: Including named files +* Search Path:: Searching for include files +@end menu + +@node Include +@section Including named files + +There are two builtin macros in @code{m4} for including files: + +@deffn Builtin include (@var{file}) +@deffnx Builtin sinclude (@var{file}) +Both macros cause the file named @var{file} to be read by +@code{m4}. When the end of the file is reached, input is resumed from +the previous input file. 
+ +The expansion of @code{include} and @code{sinclude} is therefore the +contents of @var{file}. + +If @var{file} does not exist, is a directory, or cannot otherwise be +read, the expansion is void, +and @code{include} will fail with an error while @code{sinclude} is +silent. The empty string counts as a file that does not exist. + +The macros @code{include} and @code{sinclude} are recognized only with +parameters. +@end deffn + +@comment status: 1 +@example +include(`none') +@error{}m4:stdin:1: cannot open `none': No such file or directory +@result{} +include() +@error{}m4:stdin:2: cannot open `': No such file or directory +@result{} +sinclude(`none') +@result{} +sinclude() +@result{} +@end example + +The rest of this section assumes that @code{m4} is invoked with the +@option{-I} option (@pxref{Preprocessor features, , Invoking m4}) +pointing to the @file{m4-@value{VERSION}/@/examples} +directory shipped as part of the GNU @code{m4} package. The +file @file{m4-@value{VERSION}/@/examples/@/incl.m4} in the distribution +contains the lines: + +@comment ignore +@example +$ @kbd{cat examples/incl.m4} +@result{}Include file start +@result{}foo +@result{}Include file end +@end example + +Normally file inclusion is used to insert the contents of a file +into the input stream. The contents of the file will be read by +@code{m4} and macro calls in the file will be expanded: + +@comment examples +@example +$ @kbd{m4 -I examples} +define(`foo', `FOO') +@result{} +include(`incl.m4') +@result{}Include file start +@result{}FOO +@result{}Include file end +@result{} +@end example + +The fact that @code{include} and @code{sinclude} expand to the contents +of the file can be used to define macros that operate on entire files. 
+Here is an example, which defines @samp{bar} to expand to the contents +of @file{incl.m4}: + +@comment examples +@example +$ @kbd{m4 -I examples} +define(`bar', include(`incl.m4')) +@result{} +This is `bar': >>bar<< +@result{}This is bar: >>Include file start +@result{}foo +@result{}Include file end +@result{}<< +@end example + +This use of @code{include} is not trivial, though, as files can contain +quotes, commas, and parentheses, which can interfere with the way the +@code{m4} parser works. GNU @code{m4} seamlessly concatenates +the file contents with the next character, even if the included file +ended in the middle of a comment, string, or macro call. These +conditions are only treated as end of file errors if specified as input +files on the command line. + +In GNU @code{m4}, an alternative method of reading files is +using @code{undivert} (@pxref{Undivert}) on a named file. + +@ignore +@comment Test that include(`file/') detects that file is not a +@comment directory; we can assume that the current directory contains a +@comment Makefile. mingw fails with EINVAL rather than ENOTDIR. + +@comment status: 1 +@comment xerr: ignore +@example +include(`Makefile/') +@error{}m4:stdin:1: cannot open `Makefile/': Not a directory +@result{} +@end example + +@comment POSIX allows, but doesn't require, failure on reading +@comment directories. But since they aren't text files, it never makes +@comment sense, so we globally forbid it even if fopen doesn't. mingw +@comment fails with EACCES rather than EISDIR. + +@comment status: 1 +@comment xerr: ignore +@example +include(`.') +@error{}m4:stdin:1: cannot open `.': Is a directory +@result{} +@end example + +@comment Meanwhile, ignore errors with sinclude. 
+ +@example +sinclude(`Makefile/') +@result{} +sinclude(`.') +@result{} +@end example +@end ignore + +@node Search Path +@section Searching for include files + +@cindex search path for included files +@cindex included files, search path for +@cindex GNU extensions +GNU @code{m4} allows included files to be found in directories other +than the current working directory. + +@cindex @env{M4PATH} +If the @option{--prepend-include} or @option{-B} command-line option was +provided (@pxref{Preprocessor features, , Invoking m4}), those +directories are searched first, in the reverse of the order in which those options were +listed on the command line. Then @code{m4} looks in the current working +directory. Next come the directories specified with the +@option{--include} or @option{-I} option, in the order found on the +command line. Finally, if the @env{M4PATH} environment variable is set, +it is expected to contain a colon-separated list of directories, which +will be searched in order. + +If the automatic search for include-files causes trouble, the @samp{p} +debug flag (@pxref{Debug Levels}) can help isolate the problem. + +@node Diversions +@chapter Diverting and undiverting output + +@cindex deferring output +Diversions are a way of temporarily saving output. The output of +@code{m4} can at any time be diverted to a temporary file, and be +reinserted into the output stream, @dfn{undiverted}, again at a later +time. + +@cindex @env{TMPDIR} +Numbered diversions are counted from 0 upwards, diversion number 0 +being the normal output stream. GNU +@code{m4} tries to keep diversions in memory. However, there is a +limit to the overall memory usable by all diversions taken together +(512K, currently). When this maximum is about to be exceeded, +a temporary file is opened to receive the contents of the biggest +diversion still in memory, freeing this memory for other diversions.
+When creating the temporary file, @code{m4} honors the value of the +environment variable @env{TMPDIR}, and falls back to @file{/tmp}. +Thus, the amount of available disk space provides the only real limit on +the number and aggregate size of diversions. + +@ignore +@comment We need to test spilled diversions, but don't need to expose +@comment this highly repetitive test in the manual. + +@example +divert(`-1')define(`f', `.') +define(`f', defn(`f')defn(`f')) +define(`f', defn(`f')defn(`f')) +define(`f', defn(`f')defn(`f')) +define(`f', defn(`f')defn(`f')) +define(`f', defn(`f')defn(`f')) +define(`f', defn(`f')defn(`f')) +define(`f', defn(`f')defn(`f')) +define(`f', defn(`f')defn(`f')) +define(`f', defn(`f')defn(`f')) +define(`f', defn(`f')defn(`f')) +define(`f', defn(`f')defn(`f')) +define(`f', defn(`f')defn(`f')) +define(`f', defn(`f')defn(`f')) +define(`f', defn(`f')defn(`f')) +define(`f', defn(`f')defn(`f')) +define(`f', defn(`f')defn(`f')) +define(`f', defn(`f')defn(`f')) +define(`f', defn(`f')defn(`f')) +define(`f', defn(`f')defn(`f')) +define(`f', defn(`f')defn(`f')) +divert`'dnl +len(f) +@result{}1048576 +divert(`1') +f +divert(`2') +f +divert(`-1')undivert +divert(`1')bye +^D +@result{}bye +@end example + +@comment Another test of spilled diversions. 
+ +@example +divert(`-1')define(`f', `.') +define(`f', defn(`f')defn(`f')) +define(`f', defn(`f')defn(`f')) +define(`f', defn(`f')defn(`f')) +define(`f', defn(`f')defn(`f')) +define(`f', defn(`f')defn(`f')) +define(`f', defn(`f')defn(`f')) +define(`f', defn(`f')defn(`f')) +define(`f', defn(`f')defn(`f')) +define(`f', defn(`f')defn(`f')) +define(`f', defn(`f')defn(`f')) +define(`f', defn(`f')defn(`f')) +define(`f', defn(`f')defn(`f')) +define(`f', defn(`f')defn(`f')) +define(`f', defn(`f')defn(`f')) +define(`f', defn(`f')defn(`f')) +define(`f', defn(`f')defn(`f')) +define(`f', defn(`f')defn(`f')) +define(`f', defn(`f')defn(`f')) +define(`f', defn(`f')defn(`f')) +define(`f', defn(`f')defn(`f')) +divert`'dnl +len(f) +@result{}1048576 +divert(`1') +f +m4exit +@end example + +@comment Catch regression in 1.4.10 with spilled diversions. + +@example +ifdef(`__unix__', , + `errprint(` skipping: syscmd does not have unix semantics +')m4exit(`77')')dnl +changequote(`[', `]')dnl +syscmd([echo 'divert(1)hi +format(%1000000d, 1)' | ']__program__[' | sed -n 1p])dnl +@result{}hi +sysval +@result{}0 +@end example + +@comment Avoid quadratic copying time when transferring diversions; +@comment test both in-memory and spilled to file. + +@comment examples +@example +$ @kbd{m4 -I examples} +include(`forloop2.m4')dnl +divert(`1')format(`%10000s', `')dnl +forloop(`i', `1', `10000', + `divert(incr(i))undivert(i)')dnl +divert(`9001')format(`%1000000s', `')dnl +forloop(`i', `9001', `10000', + `divert(incr(i))undivert(i)')dnl +divert(`-1')undivert +@end example +@end ignore + +Diversions make it possible to generate output in a different order than +the input was read. They can even be used to implement topological sorting of +dependencies.
For example, GNU Autoconf makes use of +diversions under the hood to ensure that the expansion of a prerequisite +macro appears in the output prior to the expansion of a dependent macro, +regardless of the order in which the two macros were invoked in the user's +input file. + +@menu +* Divert:: Diverting output +* Undivert:: Undiverting output +* Divnum:: Diversion numbers +* Cleardivert:: Discarding diverted text +@end menu + +@node Divert +@section Diverting output + +@cindex diverting output to files +@cindex output, diverting to files +@cindex files, diverting output to +Output is diverted using @code{divert}: + +@deffn Builtin divert (@dvar{number, 0}) +The current diversion is changed to @var{number}. If @var{number} is left +out or empty, it is assumed to be zero. If @var{number} cannot be +parsed, the diversion is unchanged. + +The expansion of @code{divert} is void. +@end deffn + +When all the @code{m4} input has been processed, all existing +diversions are automatically undiverted, in numerical order. + +@example +divert(`1') +This text is diverted. +divert +@result{} +This text is not diverted. +@result{}This text is not diverted. +^D +@result{} +@result{}This text is diverted. +@end example + +Several calls of @code{divert} with the same argument do not overwrite +the previous diverted text, but append to it. Diversions are printed +after any wrapped text is expanded. + +@example +define(`text', `TEXT') +@result{} +divert(`1')`diverted text.' +divert +@result{} +m4wrap(`Wrapped text precedes ') +@result{} +^D +@result{}Wrapped TEXT precedes diverted text. +@end example + +@cindex discarding input +@cindex input, discarding +If output is diverted to a negative diversion, it is simply discarded. +This can be used to suppress unwanted output. A common example of +unwanted output is the trailing newlines after macro definitions. Here +is a common programming idiom in @code{m4} for avoiding them.
+ +@example +divert(`-1') +define(`foo', `Macro `foo'.') +define(`bar', `Macro `bar'.') +divert +@result{} +@end example + +@cindex GNU extensions +Traditional implementations only supported ten diversions. But as a +GNU extension, diversion numbers can be as large as positive +integers will allow, rather than treating a multi-digit diversion number +as a request to discard text. + +@example +divert(eval(`1<<28'))world +divert(`2')hello +^D +@result{}hello +@result{}world +@end example + +Note that @code{divert} is an English word, but also an active macro +without arguments. When processing plain text, the word might appear in +normal text and be unintentionally swallowed as a macro invocation. One +way to avoid this is to use the @option{-P} option to rename all +builtins (@pxref{Operation modes, , Invoking m4}). Another is to write +a wrapper that requires a parameter to be recognized. + +@example +We decided to divert the stream for irrigation. +@result{}We decided to the stream for irrigation. +define(`divert', `ifelse(`$#', `0', ``$0'', `builtin(`$0', $@@)')') +@result{} +divert(`-1') +Ignored text. +divert(`0') +@result{} +We decided to divert the stream for irrigation. +@result{}We decided to divert the stream for irrigation. +@end example + +@node Undivert +@section Undiverting output + +Diverted text can be undiverted explicitly using the builtin +@code{undivert}: + +@deffn Builtin undivert (@ovar{diversions@dots{}}) +Undiverts the numeric @var{diversions} given by the arguments, in the +order given. If no arguments are supplied, all diversions are +undiverted, in numerical order. + +@cindex file inclusion +@cindex inclusion, of files +@cindex GNU extensions +As a GNU extension, @var{diversions} may contain non-numeric +strings, which are treated as the names of files to copy into the output +without expansion. A warning is issued if a file could not be opened. + +The expansion of @code{undivert} is void. 
+@end deffn + +@example +divert(`1') +This text is diverted. +divert +@result{} +This text is not diverted. +@result{}This text is not diverted. +undivert(`1') +@result{} +@result{}This text is diverted. +@result{} +@end example + +Notice the last two blank lines. One of them comes from the newline +following @code{undivert}, the other from the newline that followed the +@code{divert}! A diversion often starts with a blank line like this. + +When diverted text is undiverted, it is @emph{not} reread by @code{m4}, +but rather copied directly to the current output, and it is therefore +not an error to undivert into a diversion. Undiverting the empty string +is the same as specifying diversion 0; in either case nothing happens +since the output has already been flushed. + +@example +divert(`1')diverted text +divert +@result{} +undivert() +@result{} +undivert(`0') +@result{} +undivert +@result{}diverted text +@result{} +divert(`1')more +divert(`2')undivert(`1')diverted text`'divert +@result{} +undivert(`1') +@result{} +undivert(`2') +@result{}more +@result{}diverted text +@end example + +When a diversion has been undiverted, the diverted text is discarded, +and it is not possible to bring back diverted text more than once. + +@example +divert(`1') +This text is diverted first. +divert(`0')undivert(`1')dnl +@result{} +@result{}This text is diverted first. +undivert(`1') +@result{} +divert(`1') +This text is also diverted but not appended. +divert(`0')undivert(`1')dnl +@result{} +@result{}This text is also diverted but not appended. +@end example + +Attempts to undivert the current diversion are silently ignored. Thus, +when the current diversion is not 0, the current diversion does not get +rearranged among the other diversions. 
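The following Python class is a small, hypothetical model of the diversion rules described so far. It is a sketch under stated assumptions, not the way GNU @code{m4} actually stores diversions. It ignores spilling to temporary files and undiverting named files, and the class and method names are invented for illustration. The model encodes these rules:
@itemize @bullet
@item
Text sent to diversion 0 goes straight to the output.
@item
Text sent to a negative diversion is discarded.
@item
Repeated diversions to the same number append rather than overwrite.
@item
Undiverting copies the saved text without rescanning and then empties that diversion.
@item
Undiverting 0, or the current diversion, does nothing.
@item
Whatever remains at end of input is flushed in numerical order.
@end itemize

```python
class Diversions:
    """Toy model of m4 diversion bookkeeping (hypothetical, for illustration)."""

    def __init__(self):
        self.current = 0
        self.output = []       # text that reached the real output stream
        self.saved = dict()    # diversion number -> list of saved text pieces

    def divert(self, number=0):
        self.current = number

    def emit(self, text):
        """Send text to the current diversion."""
        if self.current == 0:
            self.output.append(text)
        elif self.current > 0:
            # Repeated diversions to the same number append, never overwrite.
            self.saved.setdefault(self.current, []).append(text)
        # A negative diversion silently discards the text.

    def undivert(self, *numbers):
        """Copy saved text, without rescanning, into the current diversion."""
        if not numbers:
            numbers = tuple(sorted(self.saved))  # all of them, numerically
        for n in numbers:
            if n == self.current or n not in self.saved:
                continue  # current diversion, or nothing saved (e.g. 0): no-op
            text = "".join(self.saved.pop(n))  # saved text is emitted only once
            self.emit(text)

    def end_of_input(self):
        """At end of input, remaining diversions are flushed in numerical order."""
        self.divert(0)
        self.undivert()
        return "".join(self.output)
```

You can replay the @samp{one}/@samp{two}/@samp{three} example that follows through this model. Calling @code{undivert()} while diversion 2 is current skips diversion 2 and appends 1 and 3 after its existing text. The final output is therefore @samp{two}, @samp{one}, @samp{three}. Undiverting everything while a negative diversion is current discards all saved text, which is the idea behind @code{cleardivert}.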
+ +@example +divert(`1')one +divert(`2')two +divert(`3')three +divert(`2')undivert`'dnl +divert`'undivert`'dnl +@result{}two +@result{}one +@result{}three +@end example + +@cindex GNU extensions +@cindex file inclusion +@cindex inclusion, of files +GNU @code{m4} allows named files to be undiverted. Given a +non-numeric argument, the contents of the named file are copied, +uninterpreted, to the current output. This complements the builtin +@code{include} (@pxref{Include}). To illustrate the difference, assume +the file @file{foo} contains: + +@comment ignore +@example +$ @kbd{cat foo} +bar +@end example + +@noindent +then + +@example +define(`bar', `BAR') +@result{} +undivert(`foo') +@result{}bar +@result{} +include(`foo') +@result{}BAR +@result{} +@end example + +If the file is not found (or cannot be read), an error message is +issued, and the expansion is void. It is possible to intermix files +and diversion numbers. + +@example +divert(`1')diversion one +divert(`2')undivert(`foo')dnl +divert(`3')diversion three +divert`'dnl +undivert(`1', `2', `foo', `3')dnl +@result{}diversion one +@result{}bar +@result{}bar +@result{}diversion three +@end example + +@node Divnum +@section Diversion numbers + +@cindex diversion numbers +The current diversion is tracked by the builtin @code{divnum}: + +@deffn Builtin divnum +Expands to the number of the current diversion. +@end deffn + +@example +Initial divnum +@result{}Initial 0 +divert(`1') +Diversion one: divnum +divert(`2') +Diversion two: divnum +^D +@result{} +@result{}Diversion one: 1 +@result{} +@result{}Diversion two: 2 +@end example + +@node Cleardivert +@section Discarding diverted text + +@cindex discarding diverted text +@cindex diverted text, discarding +Often it is not known, when output is diverted, whether the diverted +text is actually needed. Since all non-empty diversions are brought back +on the main output stream when the end of input is seen, a method of +discarding a diversion is needed.
If all diversions should be +discarded, the easiest way is to end the input to @code{m4} with +@samp{divert(`-1')} followed by an explicit @samp{undivert}: + +@example +divert(`1') +Diversion one: divnum +divert(`2') +Diversion two: divnum +divert(`-1') +undivert +^D +@end example + +@noindent +No output is produced at all. + +Clearing selected diversions can be done with the following macro: + +@deffn Composite cleardivert (@ovar{diversions@dots{}}) +Discard the contents of each of the listed numeric @var{diversions}. +@end deffn + +@example +define(`cleardivert', +`pushdef(`_n', divnum)divert(`-1')undivert($@@)divert(_n)popdef(`_n')') +@result{} +@end example + +It is called just like @code{undivert}, but the effect is to clear the +diversions given by the arguments. (This macro has a nasty bug! You +should try to see if you can find it and correct it; or @pxref{Improved +cleardivert, , Answers}). + +@node Text handling +@chapter Macros for text handling + +There are a number of builtins in @code{m4} for manipulating text in +various ways: extracting substrings, searching, substituting, and so on. + +@menu +* Len:: Calculating length of strings +* Index macro:: Searching for substrings +* Regexp:: Searching for regular expressions +* Substr:: Extracting substrings +* Translit:: Translating characters +* Patsubst:: Substituting text by regular expression +* Format:: Formatting strings (printf-like) +@end menu + +@node Len +@section Calculating length of strings + +@cindex length of strings +@cindex strings, length of +The length of a string can be calculated by @code{len}: + +@deffn Builtin len (@var{string}) +Expands to the length of @var{string}, as a decimal number. + +The macro @code{len} is recognized only with parameters.
+@end deffn + +@example +len() +@result{}0 +len(`abcdef') +@result{}6 +@end example + +@node Index macro +@section Searching for substrings + +@cindex substrings, locating +Searching for substrings is done with @code{index}: + +@deffn Builtin index (@var{string}, @var{substring}) +Expands to the index of the first occurrence of @var{substring} in +@var{string}. The first character in @var{string} has index 0. If +@var{substring} does not occur in @var{string}, @code{index} expands to +@samp{-1}. + +The macro @code{index} is recognized only with parameters. +@end deffn + +@example +index(`gnus, gnats, and armadillos', `nat') +@result{}7 +index(`gnus, gnats, and armadillos', `dag') +@result{}-1 +@end example + +Omitting @var{substring} evokes a warning, but still produces output; +contrast this with an empty @var{substring}. + +@example +index(`abc') +@error{}m4:stdin:1: Warning: too few arguments to builtin `index' +@result{}0 +index(`abc', `') +@result{}0 +index(`abc', `b') +@result{}1 +@end example + +@ignore +@comment Expose a bug in the strstr() algorithm present in glibc +@comment 2.9 through 2.12 and in gnulib up to Sep 2010. + +@example +index(`;:11-:12-:12-:12-:12-:12-:12-:12-:12.:12.:12.:12.:12.:12.:12.:12.:12-', +`:12-:12-:12-:12-:12-:12-:12-:12-') +@result{}-1 +@end example + +@comment Expose a bug in the gnulib replacement strstr() algorithm +@comment present from Jun 2010 to Feb 2011, including m4 1.4.15. + +@example +index(`..wi.d.', `.d.') +@result{}4 +@end example +@end ignore + +@node Regexp +@section Searching for regular expressions + +@cindex basic regular expressions +@cindex regular expressions +@cindex expressions, regular +@cindex GNU extensions +Searching for regular expressions is done with the builtin +@code{regexp}: + +@deffn Builtin regexp (@var{string}, @var{regexp}, @ovar{replacement}) +Searches for @var{regexp} in @var{string}. 
The syntax for regular +expressions is the same as in GNU Emacs, which is similar to +BRE (Basic Regular Expressions) in POSIX. +@ifnothtml +@xref{Regexps, , Syntax of Regular Expressions, emacs, The GNU Emacs +Manual}. +@end ifnothtml +@ifhtml +See +@uref{http://www.gnu.org/@/software/@/emacs/@/manual/@/emacs.html#Regexps, +Syntax of Regular Expressions} in the GNU Emacs Manual. +@end ifhtml +Support for ERE (Extended Regular Expressions) is not +available, but will be added in GNU M4 2.0. + +If @var{replacement} is omitted, @code{regexp} expands to the index of +the first match of @var{regexp} in @var{string}. If @var{regexp} does +not match anywhere in @var{string}, it expands to -1. + +If @var{replacement} is supplied, and there was a match, @code{regexp} +changes the expansion to this argument, with @samp{\@var{n}} substituted +by the text matched by the @var{n}th parenthesized sub-expression of +@var{regexp}, up to nine sub-expressions. The escape @samp{\&} is +replaced by the text of the entire regular expression matched. For +all other characters, @samp{\} treats the next character literally. A +warning is issued if there were fewer sub-expressions than the +@samp{\@var{n}} requested, or if there is a trailing @samp{\}. If there +was no match, @code{regexp} expands to the empty string. + +The macro @code{regexp} is recognized only with parameters.
+@end deffn + +@example +regexp(`GNUs not Unix', `\<[a-z]\w+') +@result{}5 +regexp(`GNUs not Unix', `\<Q\w*') +@result{}-1 +regexp(`GNUs not Unix', `\w\(\w+\)$', `*** \& *** \1 ***') +@result{}*** Unix *** nix *** +regexp(`GNUs not Unix', `\<Q\w*', `*** \& *** \1 ***') +@result{} +@end example + +Here are some more examples on the handling of backslash: + +@example +regexp(`abc', `\(b\)', `\\\10\a') +@result{}\b0a +regexp(`abc', `b', `\1\') +@error{}m4:stdin:2: Warning: sub-expression 1 not present +@error{}m4:stdin:2: Warning: trailing \ ignored in replacement +@result{} +regexp(`abc', `\(\(d\)?\)\(c\)', `\1\2\3\4\5\6') +@error{}m4:stdin:3: Warning: sub-expression 4 not present +@error{}m4:stdin:3: Warning: sub-expression 5 not present +@error{}m4:stdin:3: Warning: sub-expression 6 not present +@result{}c +@end example + +Omitting @var{regexp} evokes a warning, but still produces output; +contrast this with an empty @var{regexp} argument. + +@example +regexp(`abc') +@error{}m4:stdin:1: Warning: too few arguments to builtin `regexp' +@result{}0 +regexp(`abc', `') +@result{}0 +regexp(`abc', `', `\\def') +@result{}\def +@end example + +@node Substr +@section Extracting substrings + +@cindex extracting substrings +@cindex substrings, extracting +Substrings are extracted with @code{substr}: + +@deffn Builtin substr (@var{string}, @var{from}, @ovar{length}) +Expands to the substring of @var{string}, which starts at index +@var{from}, and extends for @var{length} characters, or to the end of +@var{string}, if @var{length} is omitted. The starting index of a string +is always 0. The expansion is empty if there is an error parsing +@var{from} or @var{length}, if @var{from} is beyond the end of +@var{string}, or if @var{length} is negative. + +The macro @code{substr} is recognized only with parameters. 
+@end deffn + +@example +substr(`gnus, gnats, and armadillos', `6') +@result{}gnats, and armadillos +substr(`gnus, gnats, and armadillos', `6', `5') +@result{}gnats +@end example + +Omitting @var{from} evokes a warning, but still produces output. + +@example +substr(`abc') +@error{}m4:stdin:1: Warning: too few arguments to builtin `substr' +@result{}abc +substr(`abc',) +@error{}m4:stdin:2: empty string treated as 0 in builtin `substr' +@result{}abc +@end example + +@node Translit +@section Translating characters + +@cindex translating characters +@cindex characters, translating +Character translation is done with @code{translit}: + +@deffn Builtin translit (@var{string}, @var{chars}, @ovar{replacement}) +Expands to @var{string}, with each character that occurs in +@var{chars} translated into the character from @var{replacement} with +the same index. + +If @var{replacement} is shorter than @var{chars}, the excess characters +of @var{chars} are deleted from the expansion; if @var{chars} is +shorter, the excess characters in @var{replacement} are silently +ignored. If @var{replacement} is omitted, all characters in +@var{string} that are present in @var{chars} are deleted from the +expansion. If a character appears more than once in @var{chars}, only +the first instance is used in making the translation. Only a single +translation pass is made, even if characters in @var{replacement} also +appear in @var{chars}. + +As a GNU extension, both @var{chars} and @var{replacement} can +contain character-ranges, e.g., @samp{a-z} (meaning all lowercase +letters) or @samp{0-9} (meaning all digits). To include a dash @samp{-} +in @var{chars} or @var{replacement}, place it first or last in the +entire string, or as the last character of a range. Back-to-back ranges +can share a common endpoint. It is not an error for the last character +in the range to be `smaller' than the first. In that case, the range +runs backwards, i.e., @samp{9-0} means the string @samp{9876543210}.
+The expansion of a range is dependent on the underlying encoding of +characters, so using ranges is not always portable between machines. + +The macro @code{translit} is recognized only with parameters. +@end deffn + +@example +translit(`GNUs not Unix', `A-Z') +@result{}s not nix +translit(`GNUs not Unix', `a-z', `A-Z') +@result{}GNUS NOT UNIX +translit(`GNUs not Unix', `A-Z', `z-a') +@result{}tmfs not fnix +translit(`+,-12345', `+--1-5', `<;>a-c-a') +@result{}<;>abcba +translit(`abcdef', `aabdef', `bcged') +@result{}bgced +@end example + +In the @sc{ascii} encoding, the first example deletes all uppercase +letters, the second converts lowercase to uppercase, and the third +`mirrors' all uppercase letters, while converting them to lowercase. +The first two cases are by far the most common, even though they are not +portable to @sc{ebcdic} or other encodings. The fourth example shows a +range ending in @samp{-}, as well as back-to-back ranges. The final +example shows that @samp{a} is mapped to @samp{b}, not @samp{c}; the +resulting @samp{b} is not further remapped to @samp{g}; the @samp{d} and +@samp{e} are swapped, and the @samp{f} is discarded. + +@ignore +@comment No need to fight 8-bit characters, as it is difficult to get +@comment rendering right in both info and dvi. + +@example +translit(`«abc~', `~-»') +@result{}abc +@end example + +@comment Stress test short arguments, since they use a different code +@comment path.
+@example +translit(`abcdeabcde', `a') +@result{}bcdebcde +translit(`abcdeabcde', `ab') +@result{}cdecde +translit(`abcdeabcde', `a', `f') +@result{}fbcdefbcde +translit(`abcdeabcde', `a', `f') +@result{}fbcdefbcde +translit(`abcdeabcde', `a', `fg') +@result{}fbcdefbcde +translit(`abcdeabcde', `ab', `f') +@result{}fcdefcde +translit(`abcdeabcde', `ab', `fg') +@result{}fgcdefgcde +translit(`abcdeabcde', `ab', `ba') +@result{}bacdebacde +translit(`abcdeabcde', `e', `f') +@result{}abcdfabcdf +translit(`abc', `', `cde') +@result{}abc +translit(`', `a', `bc') +@result{} +@end example +@end ignore + +Omitting @var{chars} evokes a warning, but still produces output. + +@example +translit(`abc') +@error{}m4:stdin:1: Warning: too few arguments to builtin `translit' +@result{}abc +@end example + +@node Patsubst +@section Substituting text by regular expression + +@cindex basic regular expressions +@cindex regular expressions +@cindex expressions, regular +@cindex pattern substitution +@cindex substitution by regular expression +@cindex GNU extensions +Global substitution in a string is done by @code{patsubst}: + +@deffn Builtin patsubst (@var{string}, @var{regexp}, @ovar{replacement}) +Searches @var{string} for matches of @var{regexp}, and substitutes +@var{replacement} for each match. The syntax for regular expressions +is the same as in GNU Emacs (@pxref{Regexp}). + +The parts of @var{string} that are not covered by any match of +@var{regexp} are copied to the expansion. Whenever a match is found, the +search proceeds from the end of the match, so a character from +@var{string} will never be substituted twice. If @var{regexp} matches a +string of zero length, the start position for the search is incremented, +to avoid infinite loops. + +When a replacement is to be made, @var{replacement} is inserted into +the expansion, with @samp{\@var{n}} substituted by the text matched by +the @var{n}th parenthesized sub-expression of @var{regexp}, for up to +nine sub-expressions.
The escape @samp{\&} is replaced by the text of +the entire regular expression matched. For all other characters, +@samp{\} treats the next character literally. A warning is issued if +there were fewer sub-expressions than the @samp{\@var{n}} requested, or +if there is a trailing @samp{\}. + +The @var{replacement} argument can be omitted, in which case the text +matched by @var{regexp} is deleted. + +The macro @code{patsubst} is recognized only with parameters. +@end deffn + +@example +patsubst(`GNUs not Unix', `^', `OBS: ') +@result{}OBS: GNUs not Unix +patsubst(`GNUs not Unix', `\<', `OBS: ') +@result{}OBS: GNUs OBS: not OBS: Unix +patsubst(`GNUs not Unix', `\w*', `(\&)') +@result{}(GNUs)() (not)() (Unix)() +patsubst(`GNUs not Unix', `\w+', `(\&)') +@result{}(GNUs) (not) (Unix) +patsubst(`GNUs not Unix', `[A-Z][a-z]+') +@result{}GN not@w{ } +patsubst(`GNUs not Unix', `not', `NOT\') +@error{}m4:stdin:6: Warning: trailing \ ignored in replacement +@result{}GNUs NOT Unix +@end example + +Here is a slightly more realistic example, which capitalizes individual +words or whole sentences, by substituting calls of the macros +@code{upcase} and @code{downcase} into the strings. + +@deffn Composite upcase (@var{text}) +@deffnx Composite downcase (@var{text}) +@deffnx Composite capitalize (@var{text}) +Expand to @var{text}, but with capitalization changed: @code{upcase} +changes all letters to upper case, @code{downcase} changes all letters +to lower case, and @code{capitalize} changes the first character of each +word to upper case and the remaining characters to lower case. +@end deffn + +First, an example of their usage, using implementations distributed in +@file{m4-@value{VERSION}/@/examples/@/capitalize.m4}. 
+ +@comment examples +@example +$ @kbd{m4 -I examples} +include(`capitalize.m4') +@result{} +upcase(`GNUs not Unix') +@result{}GNUS NOT UNIX +downcase(`GNUs not Unix') +@result{}gnus not unix +capitalize(`GNUs not Unix') +@result{}Gnus Not Unix +@end example + +Now for the implementation. There is a helper macro @code{_capitalize} +which puts only its first word in mixed case. Then @code{capitalize} +merely parses out the words, and replaces them with an invocation of +@code{_capitalize}. (As presented here, the @code{capitalize} macro has +some subtle flaws. You should try to see if you can find and correct +them; or @pxref{Improved capitalize, , Answers}). + +@comment examples +@example +$ @kbd{m4 -I examples} +undivert(`capitalize.m4')dnl +@result{}divert(`-1') +@result{}# upcase(text) +@result{}# downcase(text) +@result{}# capitalize(text) +@result{}# change case of text, simple version +@result{}define(`upcase', `translit(`$*', `a-z', `A-Z')') +@result{}define(`downcase', `translit(`$*', `A-Z', `a-z')') +@result{}define(`_capitalize', +@result{} `regexp(`$1', `^\(\w\)\(\w*\)', +@result{} `upcase(`\1')`'downcase(`\2')')') +@result{}define(`capitalize', `patsubst(`$1', `\w+', `_$0(`\&')')') +@result{}divert`'dnl +@end example + +While @code{regexp} replaces the whole input with the replacement as +soon as there is a match, @code{patsubst} replaces each +@emph{occurrence} of a match and preserves non-matching pieces: + +@example +define(`patreg', +`patsubst($@@) +regexp($@@)')dnl +patreg(`bar foo baz Foo', `foo\|Foo', `FOO') +@result{}bar FOO baz FOO +@result{}FOO +patreg(`aba abb 121', `\(.\)\(.\)\1', `\2\1\2') +@result{}bab abb 212 +@result{}bab +@end example + +Omitting @var{regexp} evokes a warning, but still produces output; +contrast this with an empty @var{regexp} argument. 
+ +@example +patsubst(`abc') +@error{}m4:stdin:1: Warning: too few arguments to builtin `patsubst' +@result{}abc +patsubst(`abc', `') +@result{}abc +patsubst(`abc', `', `\\-') +@result{}\-a\-b\-c\- +@end example + +@node Format +@section Formatting strings (printf-like) + +@cindex formatted output +@cindex output, formatted +@cindex GNU extensions +Formatted output can be made with @code{format}: + +@deffn Builtin format (@var{format-string}, @dots{}) +Works much like the C function @code{printf}. The first argument +@var{format-string} can contain @samp{%} specifications which are +satisfied by additional arguments, and the expansion of @code{format} is +the formatted string. + +The macro @code{format} is recognized only with parameters. +@end deffn + +Its use is best described by a few examples: + +@comment This test is a bit fragile, if someone tries to port to a +@comment platform without infinity. +@example +define(`foo', `The brown fox jumped over the lazy dog') +@result{} +format(`The string "%s" uses %d characters', foo, len(foo)) +@result{}The string "The brown fox jumped over the lazy dog" uses 38 characters +format(`%*.*d', `-1', `-1', `1') +@result{}1 +format(`%.0f', `56789.9876') +@result{}56790 +len(format(`%-*X', `5000', `1')) +@result{}5000 +ifelse(format(`%010F', `infinity'), ` INF', `success', + format(`%010F', `infinity'), ` INFINITY', `success', + format(`%010F', `infinity')) +@result{}success +ifelse(format(`%.1A', `1.999'), `0X1.0P+1', `success', + format(`%.1A', `1.999'), `0X2.0P+0', `success', + format(`%.1A', `1.999')) +@result{}success +format(`%g', `0xa.P+1') +@result{}20 +@end example + +Using the @code{forloop} macro defined earlier (@pxref{Forloop}), this +example shows how @code{format} can be used to produce tabular output. 
+ +@comment examples +@example +$ @kbd{m4 -I examples} +include(`forloop.m4') +@result{} +forloop(`i', `1', `10', `format(`%6d squared is %10d +', i, eval(i**2))') +@result{} 1 squared is 1 +@result{} 2 squared is 4 +@result{} 3 squared is 9 +@result{} 4 squared is 16 +@result{} 5 squared is 25 +@result{} 6 squared is 36 +@result{} 7 squared is 49 +@result{} 8 squared is 64 +@result{} 9 squared is 81 +@result{} 10 squared is 100 +@result{} +@end example + +The builtin @code{format} is modeled after the ANSI C @samp{printf} +function, and supports these @samp{%} specifiers: @samp{c}, @samp{s}, +@samp{d}, @samp{o}, @samp{x}, @samp{X}, @samp{u}, @samp{a}, @samp{A}, +@samp{e}, @samp{E}, @samp{f}, @samp{F}, @samp{g}, @samp{G}, and +@samp{%}; it supports field widths and precisions, and the flags +@samp{+}, @samp{-}, @samp{ }, @samp{0}, @samp{#}, and @samp{'}. For +integer specifiers, the length modifiers @samp{hh}, @samp{h}, and +@samp{l} are recognized, and for floating point specifiers, the length +modifier @samp{l} is recognized. Items not yet supported include +positional arguments, the @samp{n}, @samp{p}, @samp{S}, and @samp{C} +specifiers, the @samp{z}, @samp{t}, @samp{j}, @samp{L} and @samp{ll} +modifiers, and any platform extensions available in the native +@code{printf}. For more details on the functioning of @code{printf}, +see the C Library Manual or the POSIX specification. Note that +@samp{%a} is supported even on platforms that haven't yet implemented +C99 hexadecimal floating point output natively. + +Unrecognized specifiers result in a warning. It is anticipated that a +future release of GNU @code{m4} will support more specifiers, +and give better warnings when various problems such as overflow are +encountered. Likewise, escape sequences are not yet recognized.
+ +@example +format(`%p', `0') +@error{}m4:stdin:1: Warning: unrecognized specifier in `%p' +@result{} +@end example + +@ignore +@comment Expose a crash with a bad format string fixed in 1.4.15. +@comment Unfortunately, 8-bit bytes are hard to check for; but the +@comment exit status is enough to sniff the crash in broken versions. + +@comment xerr: ignore +@example +format(`%'format(`%c', `128')) +@result{} +@end example +@end ignore + +@node Arithmetic +@chapter Macros for doing arithmetic + +@cindex arithmetic +@cindex integer arithmetic +Integer arithmetic is included in @code{m4}, with a C-like syntax. As +convenient shorthands, there are builtins for simple increment and +decrement operations. + +@menu +* Incr:: Decrement and increment operators +* Eval:: Evaluating integer expressions +@end menu + +@node Incr +@section Decrement and increment operators + +@cindex decrement operator +@cindex increment operator +Increment and decrement of integers are supported using the builtins +@code{incr} and @code{decr}: + +@deffn Builtin incr (@var{number}) +@deffnx Builtin decr (@var{number}) +Expand to the numerical value of @var{number}, incremented +or decremented, respectively, by one. Except for the empty string, the +expansion is empty if @var{number} could not be parsed. + +The macros @code{incr} and @code{decr} are recognized only with +parameters. +@end deffn + +@example +incr(`4') +@result{}5 +decr(`7') +@result{}6 +incr() +@error{}m4:stdin:3: empty string treated as 0 in builtin `incr' +@result{}1 +decr() +@error{}m4:stdin:4: empty string treated as 0 in builtin `decr' +@result{}-1 +@end example + +@node Eval +@section Evaluating integer expressions + +@cindex integer expression evaluation +@cindex evaluation, of integer expressions +@cindex expressions, evaluation of integer +Integer expressions are evaluated with @code{eval}: + +@deffn Builtin eval (@var{expression}, @dvar{radix, 10}, @ovar{width}) +Expands to the value of @var{expression}.
The expansion is empty +if a problem is encountered while parsing the arguments. If specified, +@var{radix} and @var{width} control the format of the output. + +Calculations are done with 32-bit signed numbers. Overflow silently +results in wraparound. A warning is issued if division by zero is +attempted, or if @var{expression} could not be parsed. + +Expressions can contain the following operators, listed in order of +decreasing precedence. + +@table @samp +@item () +Parentheses +@item + - ~ ! +Unary plus and minus, and bitwise and logical negation +@item ** +Exponentiation +@item * / % +Multiplication, division, and modulo +@item + - +Addition and subtraction +@item << >> +Shift left or right +@item > >= < <= +Relational operators +@item == != +Equality operators +@item & +Bitwise and +@item ^ +Bitwise exclusive-or +@item | +Bitwise or +@item && +Logical and +@item || +Logical or +@end table + +The macro @code{eval} is recognized only with parameters. +@end deffn + +All binary operators, except exponentiation, are left associative. C +operators that perform variable assignment, such as @samp{+=} or +@samp{--}, are not implemented, since @code{eval} only operates on +constants, not variables. Attempting to use them results in an error. +However, since traditional implementations treated @samp{=} as an +undocumented alias for @samp{==} as opposed to an assignment operator, +this usage is supported as a special case. Be aware that a future +version of GNU M4 may support assignment semantics as an +extension when POSIX mode is not requested, and that using +@samp{=} to check equality is not portable. 
+ +@comment status: 1 +@example +eval(`2 = 2') +@error{}m4:stdin:1: Warning: recommend ==, not =, for equality operator +@result{}1 +eval(`++0') +@error{}m4:stdin:2: invalid operator in eval: ++0 +@result{} +eval(`0 |= 1') +@error{}m4:stdin:3: invalid operator in eval: 0 |= 1 +@result{} +@end example + +Note that some older @code{m4} implementations use @samp{^} as an +alternate operator for exponentiation, although POSIX +requires the C behavior of bitwise exclusive-or. The precedence of the +negation operators, @samp{~} and @samp{!}, was traditionally lower than +equality. The unary operators could not be used reliably more than once +on the same term without intervening parentheses. The traditional +precedence of the equality operators @samp{==} and @samp{!=} was the +same as, rather than lower than, that of the relational operators such as +@samp{<}, even through GNU M4 1.4.8. Starting with version +1.4.9, GNU M4 correctly follows POSIX precedence +rules. M4 scripts designed to be portable between releases must be +aware that parentheses may be required to enforce C precedence rules. +Likewise, division by zero, even in the unused branch of a +short-circuiting operator, is not always well-defined in other +implementations. + +Following are some examples where the current version of M4 follows C +precedence rules, but where older versions and some other +implementations of @code{m4} require explicit parentheses to get the +correct result: + +@example +eval(`1 == 2 > 0') +@result{}1 +eval(`(1 == 2) > 0') +@result{}0 +eval(`! 0 * 2') +@result{}2 +eval(`! (0 * 2)') +@result{}1 +eval(`1 | 1 ^ 1') +@result{}1 +eval(`(1 | 1) ^ 1') +@result{}0 +eval(`+ + - ~ !
~ 0') +@result{}1 +eval(`2 || 1 / 0') +@result{}1 +eval(`0 || 1 / 0') +@error{}m4:stdin:9: divide by zero in eval: 0 || 1 / 0 +@result{} +eval(`0 && 1 % 0') +@result{}0 +eval(`2 && 1 % 0') +@error{}m4:stdin:11: modulo by zero in eval: 2 && 1 % 0 +@result{} +@end example + +@cindex GNU extensions +As a GNU extension, the operator @samp{**} performs integral +exponentiation. The operator is right-associative, and if evaluated, +the exponent must be non-negative, and at least one of the arguments +must be non-zero, or a warning is issued. + +@example +eval(`2 ** 3 ** 2') +@result{}512 +eval(`(2 ** 3) ** 2') +@result{}64 +eval(`0 ** 1') +@result{}0 +eval(`2 ** 0') +@result{}1 +eval(`0 ** 0') +@result{} +@error{}m4:stdin:5: divide by zero in eval: 0 ** 0 +eval(`4 ** -2') +@error{}m4:stdin:6: negative exponent in eval: 4 ** -2 +@result{} +@end example + +Within @var{expression} (but not @var{radix} or @var{width}), numbers +without a special prefix are decimal. A simple @samp{0} prefix +introduces an octal number. @samp{0x} introduces a hexadecimal number. +As GNU extensions, @samp{0b} introduces a binary number. +@samp{0r} introduces a number expressed in any radix between 1 and 36: +the prefix should be immediately followed by the decimal expression of +the radix, a colon, then the digits making the number. For radix 1, +leading zeros are ignored, and all remaining digits must be @samp{1}; +for all other radices, the digits are @samp{0}, @samp{1}, @samp{2}, +@dots{}. Beyond @samp{9}, the digits are @samp{a}, @samp{b} @dots{} up +to @samp{z}. Lower and upper case letters can be used interchangeably +in number prefixes and as number digits. + +Parentheses may be used to group subexpressions whenever needed. For the +relational operators, a true relation returns @code{1}, and a false +relation returns @code{0}. + +Here are a few examples of use of @code{eval}.
+ +@example +eval(`-3 * 5') +@result{}-15 +eval(`-99 / 10') +@result{}-9 +eval(`-99 % 10') +@result{}-9 +eval(`99 % -10') +@result{}9 +eval(index(`Hello world', `llo') >= 0) +@result{}1 +eval(`0r1:0111 + 0b100 + 0r3:12') +@result{}12 +define(`square', `eval(`($1) ** 2')') +@result{} +square(`9') +@result{}81 +square(square(`5')` + 1') +@result{}676 +define(`foo', `666') +@result{} +eval(`foo / 6') +@error{}m4:stdin:11: bad expression in eval: foo / 6 +@result{} +eval(foo / 6) +@result{}111 +@end example + +As the last two lines show, @code{eval} does not handle macro +names, even if they expand to a valid expression (or part of a valid +expression). Therefore all macros must be expanded before they are +passed to @code{eval}. + +Some calculations are not portable to other implementations, since they +have undefined semantics in C, but GNU @code{m4} has +well-defined behavior on overflow. When shifting, an out-of-range shift +amount is implicitly brought into the range 0 through 31 by a bit-wise +and with 0x1f, so that a shift by 33 acts like a shift by 1. + +@example +define(`max_int', eval(`0x7fffffff')) +@result{} +define(`min_int', incr(max_int)) +@result{} +eval(min_int` < 0') +@result{}1 +eval(max_int` > 0') +@result{}1 +ifelse(eval(min_int` / -1'), min_int, `overflow occurred') +@result{}overflow occurred +min_int +@result{}-2147483648 +eval(`0x80000000 % -1') +@result{}0 +eval(`-4 >> 1') +@result{}-2 +eval(`-4 >> 33') +@result{}-2 +@end example + +If @var{radix} is specified, it specifies the radix to be used in the +expansion. The default radix is 10; this is also the case if +@var{radix} is the empty string. A warning results if the radix is +outside the range of 1 through 36, inclusive. The result of @code{eval} +is always taken to be signed. No radix prefix is output, and for +radices greater than 10, the digits are lower case. The @var{width} +argument specifies the minimum output width, excluding any negative +sign.
The result is zero-padded to extend the expansion to the +requested width. A warning results if the width is negative. If +@var{radix} or @var{width} is out of bounds, the expansion of +@code{eval} is empty. + +@example +eval(`666', `10') +@result{}666 +eval(`666', `11') +@result{}556 +eval(`666', `6') +@result{}3030 +eval(`666', `6', `10') +@result{}0000003030 +eval(`-666', `6', `10') +@result{}-0000003030 +eval(`10', `', `0') +@result{}10 +`0r1:'eval(`10', `1', `11') +@result{}0r1:01111111111 +eval(`10', `16') +@result{}a +eval(`1', `37') +@error{}m4:stdin:9: radix 37 in builtin `eval' out of range +@result{} +eval(`1', , `-1') +@error{}m4:stdin:10: negative width to builtin `eval' +@result{} +eval() +@error{}m4:stdin:11: empty string treated as 0 in builtin `eval' +@result{}0 +@end example + +@node Shell commands +@chapter Macros for running shell commands + +@cindex UNIX commands, running +@cindex executing shell commands +@cindex running shell commands +@cindex shell commands, running +@cindex commands, running shell +There are a few builtin macros in @code{m4} that allow you to run shell +commands from within @code{m4}. + +Note that the definition of a valid shell command is system dependent. +On UNIX systems, this is the typical @command{/bin/sh}. But on other +systems, such as native Windows, the shell has a different syntax of +commands that it understands. Some examples in this chapter assume +@command{/bin/sh}, and also demonstrate how to quit early with a known +exit value if this is not the case. + +@menu +* Platform macros:: Determining the platform +* Syscmd:: Executing simple commands +* Esyscmd:: Reading the output of commands +* Sysval:: Exit status +* Mkstemp:: Making temporary files +@end menu + +@node Platform macros +@section Determining the platform + +@cindex platform macros +Sometimes it is desirable for an input file to know which platform +@code{m4} is running on. 
GNU @code{m4} provides several +macros that are predefined to expand to the empty string; checking for +their existence will confirm platform details. + +@deffn {Optional builtin} __gnu__ +@deffnx {Optional builtin} __os2__ +@deffnx {Optional builtin} os2 +@deffnx {Optional builtin} __unix__ +@deffnx {Optional builtin} unix +@deffnx {Optional builtin} __windows__ +@deffnx {Optional builtin} windows +Each of these macros is conditionally defined as needed to describe the +environment of @code{m4}. If defined, each macro expands to the empty +string. For now, these macros silently ignore all arguments, but in a +future release of M4, they might warn if arguments are present. +@end deffn + +When GNU extensions are in effect (that is, when you did not +use the @option{-G} option, @pxref{Limits control, , Invoking m4}), +GNU @code{m4} will define the macro @code{@w{__gnu__}} to +expand to the empty string. + +@example +$ @kbd{m4} +__gnu__ +@result{} +__gnu__(`ignored') +@result{} +Extensions are ifdef(`__gnu__', `active', `inactive') +@result{}Extensions are active +@end example + +@comment options: -G +@example +$ @kbd{m4 -G} +__gnu__ +@result{}__gnu__ +__gnu__(`ignored') +@result{}__gnu__(ignored) +Extensions are ifdef(`__gnu__', `active', `inactive') +@result{}Extensions are inactive +@end example + +On UNIX systems, GNU @code{m4} will define @code{@w{__unix__}} +by default, or @code{unix} when the @option{-G} option is specified. + +On native Windows systems, GNU @code{m4} will define +@code{@w{__windows__}} by default, or @code{windows} when the +@option{-G} option is specified. + +On OS/2 systems, GNU @code{m4} will define @code{@w{__os2__}} +by default, or @code{os2} when the @option{-G} option is specified. + +If GNU @code{m4} does not provide a platform macro for your system, +please report that as a bug. 
+ +@example +define(`provided', `0') +@result{} +ifdef(`__unix__', `define(`provided', incr(provided))') +@result{} +ifdef(`__windows__', `define(`provided', incr(provided))') +@result{} +ifdef(`__os2__', `define(`provided', incr(provided))') +@result{} +provided +@result{}1 +@end example + +@node Syscmd +@section Executing simple commands + +Any shell command can be executed using @code{syscmd}: + +@deffn Builtin syscmd (@var{shell-command}) +Executes @var{shell-command} as a shell command. + +The expansion of @code{syscmd} is void, @emph{not} the output from +@var{shell-command}! Output or error messages from @var{shell-command} +are not read by @code{m4}. @xref{Esyscmd}, if you need to process the +command output. + +Prior to executing the command, @code{m4} flushes its buffers. +The default standard input, output and error of @var{shell-command} are +the same as those of @code{m4}. + +By default, the @var{shell-command} will be used as the argument to the +@option{-c} option of the @command{/bin/sh} shell (or the version of +@command{sh} specified by @samp{command -p getconf PATH}, if your system +supports that). If you prefer a different shell, the +@command{configure} script can be given the option +@option{--with-syscmd-shell=@var{location}} to set the location of an +alternative shell at GNU @code{m4} installation; the +alternative shell must still support @option{-c}. + +The macro @code{syscmd} is recognized only with parameters. +@end deffn + +@example +define(`foo', `FOO') +@result{} +syscmd(`echo foo') +@result{}foo +@result{} +@end example + +Note how the command writes @samp{foo} directly to the output, without +macro expansion, followed by its own trailing newline and then by the +newline that appeared after the macro call.
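+
+Because the expansion is void, using @code{syscmd} inside an argument
+of another macro captures nothing. The command's output is written
+directly to the output of @code{m4} while the argument is being
+collected. Compare @code{esyscmd} (@pxref{Esyscmd}), whose expansion
+is the command's output. This example assumes that shell commands are
+run by a POSIX @command{/bin/sh}. The macro names @code{out1} and
+@code{out2} are only illustrative.
+
+@example
+define(`out1', syscmd(`echo hello'))
+@result{}hello
+@result{}
+len(defn(`out1'))
+@result{}0
+define(`out2', esyscmd(`echo hello'))
+@result{}
+len(defn(`out2'))
+@result{}6
+@end example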
+ +The following is an example of @var{shell-command} using the same +standard input as @code{m4}: + +@comment ignore +@example +$ @kbd{echo "m4wrap(\`syscmd(\`cat')')" | m4} +@result{} +@end example + +@ignore +@comment If the user types the example below with stdin being an +@comment interactive terminal, then cat will hang waiting for additional +@comment input after m4 has exited. But the testsuite is using a pipe +@comment for stdin. Hence, we have two versions - the one we feed the +@comment testsuite below, and the one we display to the user above that +@comment more accurately shows what the testsuite is really doing but +@comment which the testsuite cannot parse. + +@example +m4wrap(`syscmd(`cat')') +@result{} +^D +@end example +@end ignore + +It tells @code{m4} to read all of its input before executing the wrapped +text, then hand a valid (albeit emptied) pipe as standard input for the +@code{cat} subcommand. Therefore, you should be careful when using +standard input (either by specifying no files, or by passing @samp{-} as +a file name on the command line, @pxref{Command line files, , Invoking +m4}), and also invoking subcommands via @code{syscmd} or @code{esyscmd} +that consume data from standard input. When standard input is a +seekable file, the subprocess will pick up with the next character not +yet processed by @code{m4}; when it is a pipe or other non-seekable +file, there is no guarantee how much data will already be buffered by +@code{m4} and thus unavailable to the child. + +@node Esyscmd +@section Reading the output of commands + +@cindex GNU extensions +If you want @code{m4} to read the output of a shell command, use +@code{esyscmd}: + +@deffn Builtin esyscmd (@var{shell-command}) +Expands to the standard output of the shell command +@var{shell-command}. + +Prior to executing the command, @code{m4} flushes its buffers. +The default standard input and standard error of @var{shell-command} are +the same as those of @code{m4}. 
The error output of @var{shell-command} +is not part of the expansion: it will appear along with the error +output of @code{m4}. + +By default, the @var{shell-command} will be used as the argument to the +@option{-c} option of the @command{/bin/sh} shell (or the version of +@command{sh} specified by @samp{command -p getconf PATH}, if your system +supports that). If you prefer a different shell, the +@command{configure} script can be given the option +@option{--with-syscmd-shell=@var{location}} to set the location of an +alternative shell at GNU @code{m4} installation; the +alternative shell must still support @option{-c}. + +The macro @code{esyscmd} is recognized only with parameters. +@end deffn + +@example +define(`foo', `FOO') +@result{} +esyscmd(`echo foo') +@result{}FOO +@result{} +@end example + +Note how the expansion of @code{esyscmd} keeps the trailing newline of +the command's output, followed by the newline that appeared after the +macro call. Because the expansion is rescanned, the @samp{foo} printed +by the command is in turn expanded to @samp{FOO}. + +Just as with @code{syscmd}, care must be exercised when sharing standard +input between @code{m4} and the child process of @code{esyscmd}. + +@node Sysval +@section Exit status + +@cindex UNIX commands, exit status from +@cindex exit status from shell commands +@cindex shell commands, exit status from +@cindex commands, exit status from shell +@cindex status of shell commands +To see whether a shell command succeeded, use @code{sysval}: + +@deffn Builtin sysval +Expands to the exit status of the last shell command run with +@code{syscmd} or @code{esyscmd}. Expands to 0 if no command has been +run yet.
+@end deffn + +@example +sysval +@result{}0 +syscmd(`false') +@result{} +ifelse(sysval, `0', `zero', `non-zero') +@result{}non-zero +syscmd(`exit 2') +@result{} +sysval +@result{}2 +syscmd(`true') +@result{} +sysval +@result{}0 +esyscmd(`false') +@result{} +ifelse(sysval, `0', `zero', `non-zero') +@result{}non-zero +esyscmd(`echo dnl && exit 127') +@result{} +sysval +@result{}127 +esyscmd(`true') +@result{} +sysval +@result{}0 +@end example + +@code{sysval} results in 127 if there was a problem executing the +command, for example, if the system-imposed argument length is exceeded, +or if there were not enough resources to fork. It is not possible to +distinguish between failed execution and successful execution that had +an exit status of 127, unless there was output from the child process. + +On UNIX platforms, where it is possible to detect when command execution +is terminated by a signal, rather than a normal exit, the result is the +signal number shifted left by eight bits. + +@comment This test has difficulties being portable, even on platforms +@comment where syscmd invokes /bin/sh. Kill is not portable with signal +@comment names. According to autoconf, the only portable signal numbers +@comment are 1 (HUP), 2 (INT), 9 (KILL), 13 (PIPE) and 15 (TERM). But +@comment all shells handle SIGINT, and ksh handles HUP (as in, the shell +@comment exits normally rather than letting the signal terminate it). +@comment Also, TERM is flaky, as it can also kill the running m4 on +@comment systems where /bin/sh does not create its own process group. +@comment And PIPE is unreliable, since people tend to run with it +@comment ignored, with m4 inheriting that choice. That leaves KILL as +@comment the only signal we can reliably test. +@example +dnl This test assumes kill is a shell builtin, and that signals are +dnl recognizable. 
+ifdef(`__unix__', , + `errprint(` skipping: syscmd does not have unix semantics +')m4exit(`77')')dnl +syscmd(`kill -9 $$') +@result{} +sysval +@result{}2304 +syscmd() +@result{} +sysval +@result{}0 +esyscmd(`kill -9 $$') +@result{} +sysval +@result{}2304 +@end example + +@node Mkstemp +@section Making temporary files + +@cindex temporary file names +@cindex files, names of temporary +Commands specified to @code{syscmd} or @code{esyscmd} might need a +temporary file, for output or for some other purpose. There is a +builtin macro, @code{mkstemp}, for making a temporary file: + +@deffn Builtin mkstemp (@var{template}) +@deffnx Builtin maketemp (@var{template}) +Expands to the quoted name of a new, empty file, made from the string +@var{template}, which should end with the string @samp{XXXXXX}. The six +@samp{X} characters are then replaced with random characters matching +the regular expression @samp{[a-zA-Z0-9._-]}, in order to make the file +name unique. If fewer than six @samp{X} characters are found at the end +of @var{template}, the result will be longer than the template. The +created file will have access permissions as if by @kbd{chmod =rw,go=}, +meaning that the current umask of the @code{m4} process is taken into +account, and at most only the current user can read and write the file. + +The traditional behavior, standardized by POSIX, is that +@code{maketemp} merely replaces the trailing @samp{X} with the process +id, without creating a file or quoting the expansion, and without +ensuring that the resulting +string is a unique file name. In part, this means that using the same +@var{template} twice in the same input file will result in the same +expansion. This behavior is a security hole, as it is very easy for +another process to guess the name that will be generated, and thus +interfere with a subsequent use of @code{syscmd} trying to manipulate +that file name.
Hence, POSIX has recommended that all new +implementations of @code{m4} provide the secure @code{mkstemp} builtin, +and that users of @code{m4} check for its existence. + +The expansion is void and an error is issued if a temporary file could +not be created. + +The macros @code{mkstemp} and @code{maketemp} are recognized only with +parameters. +@end deffn + +If you try this next example, you will most likely get different output +for the two file names, since the replacement characters are randomly +chosen: + +@comment ignore +@example +$ @kbd{m4} +define(`tmp', `oops') +@result{} +maketemp(`/tmp/fooXXXXXX') +@result{}/tmp/fooa07346 +ifdef(`mkstemp', `define(`maketemp', defn(`mkstemp'))', + `define(`mkstemp', defn(`maketemp'))dnl +errprint(`warning: potentially insecure maketemp implementation +')') +@result{} +mkstemp(`doc') +@result{}docQv83Uw +@end example + +@cindex GNU extensions +Unless you use the @option{--traditional} command line option (or +@option{-G}, @pxref{Limits control, , Invoking m4}), the GNU +version of @code{maketemp} is secure. This means that using the same +template in multiple calls will generate multiple files. However, we +recommend that you use the new @code{mkstemp} macro, introduced in +GNU M4 1.4.8, which is secure even in traditional mode. Also, +as of M4 1.4.11, the secure implementation quotes the resulting file +name, so that you are guaranteed to know what file was created even if +the random file name happens to match an existing macro. Notice that +this example is careful to use @code{defn} to avoid unintended expansion +of @samp{foo}. + +@example +$ @kbd{m4} +define(`foo', `errprint(`oops')') +@result{} +syscmd(`rm -f foo-??????')sysval +@result{}0 +define(`file1', maketemp(`foo-XXXXXX'))dnl +ifelse(esyscmd(`echo \` foo-?????? \''), ` foo-??????
', + `no file', `created') +@result{}created +define(`file2', maketemp(`foo-XX'))dnl +define(`file3', mkstemp(`foo-XXXXXX'))dnl +ifelse(len(defn(`file1')), len(defn(`file2')), + `same length', `different') +@result{}same length +ifelse(defn(`file1'), defn(`file2'), `same', `different file') +@result{}different file +ifelse(defn(`file2'), defn(`file3'), `same', `different file') +@result{}different file +ifelse(defn(`file1'), defn(`file3'), `same', `different file') +@result{}different file +syscmd(`rm 'defn(`file1') defn(`file2') defn(`file3')) +@result{} +sysval +@result{}0 +@end example + +@ignore +@c Not worth documenting, but make sure we don't leave trailing NUL in +@c the expansion. + +@example +syscmd(`rm -rf foodir')sysval +@result{}0 +syscmd(`mkdir foodir')sysval +@result{}0 +len(mkstemp(`foodir/fooXXXXX')) +@result{}16 +syscmd(`rm -r foodir')sysval +@result{}0 +@end example + +@c Likewise, and ensure that traditional mode leaves the result unquoted +@c without creating a file. + +@comment options: -G +@example +syscmd(`rm -f foo-*')sysval +@result{}0 +len(maketemp(`foo-XXXXX')) +@error{}m4:stdin:2: recommend using mkstemp instead +@result{}9 +define(`abc', `def') +@result{} +maketemp(`foo-abc') +@result{}foo-def +@error{}m4:stdin:4: recommend using mkstemp instead +syscmd(`test -f foo-*')ifelse(sysval, `0', `0', `1') +@result{}1 +@end example +@end ignore + +@node Miscellaneous +@chapter Miscellaneous builtin macros + +This chapter describes various builtins that do not really belong in +any of the previous chapters.
+ +@menu +* Errprint:: Printing error messages +* Location:: Printing current location +* M4exit:: Exiting from @code{m4} +@end menu + +@node Errprint +@section Printing error messages + +@cindex printing error messages +@cindex error messages, printing +@cindex messages, printing error +@cindex standard error, output to +You can print error messages using @code{errprint}: + +@deffn Builtin errprint (@var{message}, @dots{}) +Prints @var{message} and the rest of the arguments to standard error, +separated by spaces. Standard error is used, regardless of the +@option{--debugfile} option (@pxref{Debugging options, , Invoking m4}). + +The expansion of @code{errprint} is void. +The macro @code{errprint} is recognized only with parameters. +@end deffn + +@example +errprint(`Invalid arguments to forloop +') +@error{}Invalid arguments to forloop +@result{} +errprint(`1')errprint(`2',`3 +') +@error{}12 3 +@result{} +@end example + +A trailing newline is @emph{not} printed automatically, so it should be +supplied as part of the argument, as in the example. Unfortunately, the +exact output of @code{errprint} is not very portable to other @code{m4} +implementations: POSIX requires that all arguments be printed, +but some implementations of @code{m4} only print the first. +Furthermore, some BSD implementations always append a newline +for each @code{errprint} call, regardless of whether the last argument +already had one, and POSIX is silent on whether this is +acceptable. + +@node Location +@section Printing current location + +@cindex location, input +@cindex input location +To make it possible to specify the location of an error, three +utility builtins exist: + +@deffn Builtin __file__ +@deffnx Builtin __line__ +@deffnx Builtin __program__ +Expand to the quoted name of the current input file, the +current input line number in that file, and the quoted name of the +current invocation of @code{m4}. 
+@end deffn + +@example +errprint(__program__:__file__:__line__: `input error +') +@error{}m4:stdin:1: input error +@result{} +@end example + +Line numbers start at 1 for each file. If the file was found due to the +@option{-I} option or @env{M4PATH} environment variable, that is +reflected in the file name. The syncline option (@option{-s}, +@pxref{Preprocessor features, , Invoking m4}), and the +@samp{f} and @samp{l} flags of @code{debugmode} (@pxref{Debug Levels}), +also use this notion of current file and line. Redefining the three +location macros has no effect on syncline, debug, warning, or error +message output. + +This example reuses the file @file{incl.m4} mentioned earlier +(@pxref{Include}): + +@comment examples +@example +$ @kbd{m4 -I examples} +define(`foo', ``$0' called at __file__:__line__') +@result{} +foo +@result{}foo called at stdin:2 +include(`incl.m4') +@result{}Include file start +@result{}foo called at examples/incl.m4:2 +@result{}Include file end +@result{} +@end example + +The location of macros invoked during the rescanning of macro expansion +text corresponds to the location in the file where the expansion was +triggered, regardless of how many newline characters the expansion text +contains. As of GNU M4 1.4.8, the location of text wrapped +with @code{m4wrap} (@pxref{M4wrap}) is the point at which the +@code{m4wrap} was invoked. Previous versions, however, behaved as +though wrapped text came from line 0 of the file ``''. + +@example +define(`echo', `$@@') +@result{} +define(`foo', `echo(__line__ +__line__)') +@result{} +echo(__line__ +__line__) +@result{}4 +@result{}5 +m4wrap(`foo +') +@result{} +foo(errprint(__line__ +__line__ +)) +@error{}8 +@error{}9 +@result{}8 +@result{}8 +__line__ +@result{}11 +m4wrap(`__line__ +') +@result{} +^D +@result{}12 +@result{}6 +@result{}6 +@end example + +The @code{@w{__program__}} macro behaves like @samp{$0} in shell +terminology. 
If you invoke @code{m4} through an absolute path or a link +with a different spelling, rather than by relying on a @env{PATH} search +for plain @samp{m4}, it will affect how @code{@w{__program__}} expands. +The intent is that you can use it to produce error messages with the +same formatting that @code{m4} produces internally. It can also be used +within @code{syscmd} (@pxref{Syscmd}) to pick the same version of +@code{m4} that is currently running, rather than whatever version of +@code{m4} happens to be first in @env{PATH}. It was first introduced in +GNU M4 1.4.6. + +@node M4exit +@section Exiting from @code{m4} + +@cindex exiting from @code{m4} +@cindex status, setting @code{m4} exit +If you need to exit from @code{m4} before the entire input has been +read, you can use @code{m4exit}: + +@deffn Builtin m4exit (@dvar{code, 0}) +Causes @code{m4} to exit, with exit status @var{code}. If @var{code} is +left out, the exit status is zero. If @var{code} cannot be parsed, or +is outside the range of 0 to 255, the exit status is one. No further +input is read, and all wrapped and diverted text is discarded. +@end deffn + +@example +m4wrap(`This text is lost due to `m4exit'.') +@result{} +divert(`1') So is this. +divert +@result{} +m4exit And this is never read. +@end example + +A common use of this is to abort processing: + +@deffn Composite fatal_error (@var{message}) +Abort processing with an error message and non-zero status. Prefix +@var{message} with details about where the error occurred, and print the +resulting string to standard error. +@end deffn + +@comment status: 1 +@example +define(`fatal_error', + `errprint(__program__:__file__:__line__`: fatal error: $* +')m4exit(`1')') +@result{} +fatal_error(`this is a BAD one, buster') +@error{}m4:stdin:4: fatal error: this is a BAD one, buster +@end example + +After this macro call, @code{m4} will exit with exit status 1. 
This macro +is only intended for error exits, since the normal exit procedures are +not followed, i.e., diverted text is not undiverted, and saved text +(@pxref{M4wrap}) is not reread. (This macro could be made more robust +to earlier versions of @code{m4}. You should try to see if you can find +weaknesses and correct them; or @pxref{Improved fatal_error, , Answers}). + +Note that it is still possible for the exit status to be different from +what was requested by @code{m4exit}. If @code{m4} detects some other +error, such as a write error on standard output, the exit status will be +non-zero even if @code{m4exit} requested zero. + +If standard input is seekable, then the file will be positioned at the +next unread character. If it is a pipe or other non-seekable file, +then there is no guarantee how much data @code{m4} might have read +into buffers, and thus discarded. + +@node Frozen files +@chapter Fast loading of frozen state + +Some bigger @code{m4} applications may be built over a common base +containing hundreds of definitions and other costly initializations. +Usually, the common base is kept in one or more declarative files, +which are either listed on each @code{m4} invocation prior to the +user's input file, or pulled in by each input file with @code{include}. + +Reading the common base of a big application, over and over again, may +be time consuming. GNU @code{m4} offers some machinery to +speed up the start of an application using lengthy common bases.
+ +@menu +* Using frozen files:: Using frozen files +* Frozen file format:: Frozen file format +@end menu + +@node Using frozen files +@section Using frozen files + +@cindex fast loading of frozen files +@cindex frozen files for fast loading +@cindex initialization, frozen state +@cindex dumping into frozen file +@cindex reloading a frozen file +@cindex GNU extensions +Suppose a user has a library of @code{m4} initializations in +@file{base.m4}, which is then used with multiple input files: + +@comment ignore +@example +$ @kbd{m4 base.m4 input1.m4} +$ @kbd{m4 base.m4 input2.m4} +$ @kbd{m4 base.m4 input3.m4} +@end example + +Rather than spending time parsing the fixed contents of @file{base.m4} +every time, the user might instead execute: + +@comment ignore +@example +$ @kbd{m4 -F base.m4f base.m4} +@end example + +@noindent +once, and further execute, as often as needed: + +@comment ignore +@example +$ @kbd{m4 -R base.m4f input1.m4} +$ @kbd{m4 -R base.m4f input2.m4} +$ @kbd{m4 -R base.m4f input3.m4} +@end example + +@noindent +with the varying input. The first call, containing the @option{-F} +option, only reads and executes file @file{base.m4}, defining +various application macros and computing other initializations. +Once the input file @file{base.m4} has been completely processed, GNU +@code{m4} produces in @file{base.m4f} a @dfn{frozen} file, that is, a +file which contains a kind of snapshot of the @code{m4} internal state. + +Later calls, containing the @option{-R} option, are able to reload +the internal state of @code{m4} from @file{base.m4f}, +@emph{prior} to reading any other input files. This means that, +instead of starting with a fresh copy of @code{m4}, input will be +read after having effectively recovered the effect of a prior run. +In our example, the effect is the same as if file @file{base.m4} had +been read anew. However, this effect is achieved a lot faster. + +Only one frozen file may be created or read in any one @code{m4} +invocation.
It is not possible to recover two frozen files at once. +However, frozen files may be updated incrementally, by using the +@option{-R} and @option{-F} options simultaneously. For example, if +some care is taken, the command: + +@comment ignore +@example +$ @kbd{m4 file1.m4 file2.m4 file3.m4 file4.m4} +@end example + +@noindent +could be broken down in the following sequence, accumulating the same +output: + +@comment ignore +@example +$ @kbd{m4 -F file1.m4f file1.m4} +$ @kbd{m4 -R file1.m4f -F file2.m4f file2.m4} +$ @kbd{m4 -R file2.m4f -F file3.m4f file3.m4} +$ @kbd{m4 -R file3.m4f file4.m4} +@end example + +Some care is necessary because this incremental approach has not yet +been made to work in all cases. In particular, the trace attribute of +macros is not handled, nor the current setting of @code{changeword}. +Currently, @code{m4wrap} and @code{sysval} also have problems. +Also, the interactions of options that are used in one call +but not in the next have not been fully analyzed yet. On the other +hand, you may be confident that stacks of @code{pushdef} definitions +are handled correctly, as well as undefined or renamed builtins, and +changed strings for quotes or comments. Future releases of +GNU M4 will improve on the utility of frozen files. + +@ignore +@c This example is not worth putting in the manual, but caused core +@c dumps in all versions prior to 1.4.11. + +@comment options: -F /dev/null +@example +traceon(`undefined')dnl +@end example + +@c Make sure freezing is successful. + +@example +ifdef(`__unix__', , + `errprint(` skipping: syscmd does not have unix semantics +')m4exit(`77')')dnl +changequote(`[', `]')dnl +syscmd([echo 'changequote([,])pushdef([divnum],[hi])dnl' \ + | ']__program__[' -F in.m4f \ + && echo 'divnum popdef([divnum])divnum' \ + | ']__program__[' -R in.m4f \ + && rm in.m4f])status sysval +@result{}hi 0 +@result{}status 0 +@end example + +@c Detect inability to freeze.
+@c Some systems harden /, and fail with EACCES rather than ENOENT. + +@comment options: -F /none/such +@comment xerr: ignore +@comment status: 1 +@example +$ @kbd{m4 -F /none/such} +^D +@error{}m4: cannot open `/none/such': No such file or directory +@end example +@end ignore + +When an @code{m4} run is to be frozen, the automatic undiversion +which takes place at end of execution is inhibited. Instead, all +positively numbered diversions are saved into the frozen file. +The active diversion number is also transmitted. + +A frozen file to be reloaded need not reside in the current directory. +It is looked up the same way as an @code{include} file (@pxref{Search +Path}). + +If the frozen file was generated with a newer version of @code{m4}, and +contains directives that an older @code{m4} cannot parse, attempting to +load the frozen file with option @option{-R} will cause @code{m4} to +exit with status 63 to indicate version mismatch. + +@node Frozen file format +@section Frozen file format + +@cindex frozen file format +@cindex file format, frozen file +Frozen files are sharable across architectures. It is safe to write +a frozen file on one machine and read it on another, given that the +second machine uses the same or newer version of GNU @code{m4}. +It is conventional, but not required, to give a frozen file the suffix +of @code{.m4f}. + +These are simple (editable) text files, made up of directives, +each starting with a capital letter and ending with a newline +(@key{NL}). Wherever a directive is expected, the character +@samp{#} introduces a comment line; empty lines are also ignored if they +are not part of an embedded string. +In the following descriptions, each @var{len} refers to the length of +the corresponding strings @var{str} in the next line of input. Numbers +are always expressed in decimal. There are no escape characters. 
The +directives are: + +@table @code +@item C @var{len1} , @var{len2} @key{NL} @var{str1} @var{str2} @key{NL} +Uses @var{str1} and @var{str2} as the begin-comment and +end-comment strings. If omitted, then @samp{#} and @key{NL} are the +comment delimiters. + +@item D @var{number}, @var{len} @key{NL} @var{str} @key{NL} +Selects diversion @var{number}, making it current, then copies +@var{str} into the current diversion. @var{number} may be a negative +number for a non-existing diversion. To merely specify an active +selection, use this command with an empty @var{str}. With 0 as the +diversion @var{number}, @var{str} will be issued on standard output +at reload time. GNU @code{m4} will not produce the @samp{D} +directive with non-zero length for diversion 0, but this can be done +with manual edits. This directive may +appear more than once for the same diversion, in which case the +diversion is the concatenation of the various uses. If omitted, then +diversion 0 is current. + +@item F @var{len1} , @var{len2} @key{NL} @var{str1} @var{str2} @key{NL} +Defines, through @code{pushdef}, a definition for @var{str1} +expanding to the function whose builtin name is @var{str2}. If the +builtin does not exist (for example, if the frozen file was produced by +a copy of @code{m4} compiled with changeword support, but the version +of @code{m4} reloading was compiled without it), the reload is silent, +but any subsequent use of the definition of @var{str1} will result in +a warning. This directive may appear more than once for the same name, +and its order, along with @samp{T}, is important. If omitted, you will +have no access to any builtins. + +@item Q @var{len1} , @var{len2} @key{NL} @var{str1} @var{str2} @key{NL} +Uses @var{str1} and @var{str2} as the begin-quote and end-quote +strings. If omitted, then @samp{`} and @samp{'} are the quote +delimiters.
+ +@item T @var{len1} , @var{len2} @key{NL} @var{str1} @var{str2} @key{NL} +Defines, through @code{pushdef}, a definition for @var{str1} +expanding to the text given by @var{str2}. This directive may appear +more than once for the same name, and its order, along with @samp{F}, is +important. + +@item V @var{number} @key{NL} +Confirms the format of the file. @code{m4} @value{VERSION} only creates +and understands frozen files where @var{number} is 1. This directive +must be the first non-comment in the file, and may not appear more than +once. +@end table + +@node Compatibility +@chapter Compatibility with other versions of @code{m4} + +@cindex compatibility +This chapter describes many of the differences between this +implementation of @code{m4} and other implementations found under +UNIX, such as System V Release 3, Solaris, and BSD flavors. +In particular, it lists the known differences and extensions to +POSIX. However, the list is not necessarily comprehensive. + +At the time of this writing, POSIX 2001 (also known as IEEE +Std 1003.1-2001) is the latest standard, although a new version of +POSIX is under development and includes several proposals for +modifying what @code{m4} is required to do. The requirements for +@code{m4} are shared between SUSv3 and POSIX, and +can be viewed at +@uref{http://www.opengroup.org/onlinepubs/@/000095399/@/utilities/@/m4.html}. + +@menu +* Extensions:: Extensions in GNU M4 +* Incompatibilities:: Facilities in System V m4 not in GNU M4 +* Other Incompatibilities:: Other incompatibilities +@end menu + +@node Extensions +@section Extensions in GNU M4 + +@cindex GNU extensions +@cindex POSIX +This version of @code{m4} contains a few facilities that do not exist +in System V @code{m4}. These extra facilities are all suppressed by +using the @option{-G} command line option (@pxref{Limits control, , +Invoking m4}), unless overridden by other command line options.
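+
+As a minimal illustration, input that must behave sensibly both with
+and without these extensions can test for the macro @code{__gnu__}.
+GNU @code{m4} predefines this macro, expanding to the empty string,
+only while GNU extensions are enabled. When @code{m4} is started with
+@option{-G}, the same input takes the other branch:
+
+@example
+ifdef(`__gnu__', `GNU extensions on', `GNU extensions off')
+@result{}GNU extensions on
+@end example
+
+@comment options: -G
+@example
+ifdef(`__gnu__', `GNU extensions on', `GNU extensions off')
+@result{}GNU extensions off
+@end example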
+ +@itemize @bullet +@item +In the @code{$@var{n}} notation for macro arguments, @var{n} can contain +several digits, while the System V @code{m4} only accepts one digit. +This allows macros in GNU @code{m4} to take any number of +arguments, and not only nine (@pxref{Arguments}). + +This means that @code{define(`foo', `$11')} is ambiguous between +implementations. To portably choose between grabbing the first +parameter and appending 1 to the expansion, or grabbing the eleventh +parameter, you can do the following: + +@example +define(`a1', `A1') +@result{} +dnl First argument, concatenated with 1 +define(`_1', `$1')define(`first1', `_1($@@)1') +@result{} +dnl Eleventh argument, portable +define(`_9', `$9')define(`eleventh', `_9(shift(shift($@@)))') +@result{} +dnl Eleventh argument, GNU style +define(`Eleventh', `$11') +@result{} +first1(`a', `b', `c', `d', `e', `f', `g', `h', `i', `j', `k') +@result{}A1 +eleventh(`a', `b', `c', `d', `e', `f', `g', `h', `i', `j', `k') +@result{}k +Eleventh(`a', `b', `c', `d', `e', `f', `g', `h', `i', `j', `k') +@result{}k +@end example + +@noindent +Also see the @code{argn} macro (@pxref{Shift}). + +@item +The @code{divert} (@pxref{Divert}) macro can manage more than 9 +diversions. GNU @code{m4} treats all positive numbers as valid +diversions, rather than discarding diversions greater than 9. + +@item +Files included with @code{include} and @code{sinclude} are sought in a +user specified search path, if they are not found in the working +directory. The search path is specified by the @option{-I} option and the +@env{M4PATH} environment variable (@pxref{Search Path}). + +@item +Arguments to @code{undivert} can be non-numeric, in which case the named +file will be included uninterpreted in the output (@pxref{Undivert}). + +@item +Formatted output is supported through the @code{format} builtin, which +is modeled after the C library function @code{printf} (@pxref{Format}). 
+ +@item +Searches and text substitution through basic regular expressions are +supported by the @code{regexp} (@pxref{Regexp}) and @code{patsubst} +(@pxref{Patsubst}) builtins. Some BSD implementations use +extended regular expressions instead. + +@item +The output of shell commands can be read into @code{m4} with +@code{esyscmd} (@pxref{Esyscmd}). + +@item +There is indirect access to any builtin macro with @code{builtin} +(@pxref{Builtin}). + +@item +Macros can be called indirectly through @code{indir} (@pxref{Indir}). + +@item +The name of the program, the current input file, and the current input +line number are accessible through the builtins @code{@w{__program__}}, +@code{@w{__file__}}, and @code{@w{__line__}} (@pxref{Location}). + +@item +The format of the output from @code{dumpdef} and macro tracing can be +controlled with @code{debugmode} (@pxref{Debug Levels}). + +@item +The destination of trace and debug output can be controlled with +@code{debugfile} (@pxref{Debug Output}). + +@item +The @code{maketemp} (@pxref{Mkstemp}) macro behaves like @code{mkstemp}, +creating a new file with a unique name on every invocation, rather than +following the insecure behavior of replacing the trailing @samp{X} +characters with the @code{m4} process id. + +@item +POSIX only requires support for the command line options +@option{-s}, @option{-D}, and @option{-U}, so all other options accepted +by GNU M4 are extensions. @xref{Invoking m4}, for a +description of these options. + +The debugging and tracing facilities in GNU @code{m4} are much +more extensive than in most other versions of @code{m4}. +@end itemize + +@node Incompatibilities +@section Facilities in System V @code{m4} not in GNU @code{m4} + +The version of @code{m4} from System V contains a few facilities that +have not been implemented in GNU @code{m4} yet. Additionally, +POSIX requires some behaviors that GNU @code{m4} has not +implemented yet. 
Relying on these behaviors is non-portable, as a +future release of GNU @code{m4} may change them. + +@itemize @bullet +@item +POSIX requires support for multiple arguments to @code{defn}, +without any clarification on how @code{defn} behaves when one of the +multiple arguments names a builtin. System V @code{m4} and some other +implementations allow mixing builtins and text macros into a single +macro. GNU @code{m4} only supports joining multiple text +arguments, although a future implementation may lift this restriction to +behave more like System V@. The only portable way to join text macros +with builtins is via helper macros and implicit concatenation of macro +results. + +@item +POSIX requires an application to exit with non-zero status if +it wrote an error message to stderr. This has not yet been consistently +implemented for the various builtins that are required to issue an error +(such as @code{eval} (@pxref{Eval}) when an argument cannot be parsed). + +@item +Some traditional implementations only allow reading standard input +once, but GNU @code{m4} correctly handles multiple instances +of @samp{-} on the command line. + +@item +POSIX requires @code{m4wrap} (@pxref{M4wrap}) to act in FIFO +(first-in, first-out) order, but GNU @code{m4} currently uses +LIFO order. Furthermore, POSIX states that only the first +argument to @code{m4wrap} is saved for later evaluation, but +GNU @code{m4} saves and processes all arguments, with output +separated by spaces. + +@item +POSIX states that builtins that require arguments, but are +called without arguments, have undefined behavior. Traditional +implementations simply behave as though empty strings had been passed. +For example, @code{a`'define`'b} would expand to @code{ab}. But +GNU @code{m4} ignores certain builtins if they have missing +arguments, giving @code{adefineb} for the above example.
+ +@item +Traditional implementations handle @code{define(`f',`1')} (@pxref{Define}) +by undefining the entire stack of previous definitions, as if doing +@code{undefine(`f')} first. GNU @code{m4} replaces just the top +definition on the stack, as if doing @code{popdef(`f')} followed by +@code{pushdef(`f',`1')}. POSIX allows either behavior. + +@item +POSIX 2001 requires @code{syscmd} (@pxref{Syscmd}) to evaluate +command output for macro expansion, but this was a mistake that is +anticipated to be corrected in the next version of POSIX. +GNU @code{m4} follows traditional behavior in @code{syscmd} +where output is not rescanned, and provides the extension @code{esyscmd} +that does scan the output. + +@item +At one point, POSIX required @code{changequote(@var{arg})} +(@pxref{Changequote}) to use newline as the close quote, but this was a +bug, and the next version of POSIX is anticipated to state +that using empty strings or just one argument is unspecified. +Meanwhile, the GNU @code{m4} behavior of treating an empty +end-quote delimiter as @samp{'} is not portable, as Solaris treats it as +repeating the start-quote delimiter, and BSD treats it as leaving the +previous end-quote delimiter unchanged. For predictable results, never +call @code{changequote} with just one argument, or with empty strings for +arguments. + +@item +At one point, POSIX required @code{changecom(@var{arg},)} +(@pxref{Changecom}) to make it impossible to end a comment, but this was +a bug, and the next version of POSIX is anticipated to state +that using empty strings is unspecified. Meanwhile, the GNU +@code{m4} behavior of treating an empty end-comment delimiter as newline +is not portable, as BSD treats it as leaving the previous end-comment +delimiter unchanged. It is also impossible in BSD implementations to +disable comments, even though that is required by POSIX. For +predictable results, never call @code{changecom} with empty strings for +arguments.
+ +@item +Most implementations of @code{m4} give macros a higher precedence than +comments when parsing, meaning that if the start delimiter given to +@code{changecom} (@pxref{Changecom}) starts with a macro name, comments +are effectively disabled. POSIX does not specify what the +precedence is, so the parser in this version of GNU @code{m4} +recognizes comments, then macros, then quoted strings. + +@item +Traditional implementations allow argument collection, but not string +and comment processing, to span file boundaries. Thus, if @file{a.m4} +contains @samp{len(}, and @file{b.m4} contains @samp{abc)}, +@kbd{m4 a.m4 b.m4} outputs @samp{3} with traditional @code{m4}, but +gives an error message that the end of file was encountered inside a +macro with GNU @code{m4}. On the other hand, traditional +implementations do end of file processing for files included with +@code{include} or @code{sinclude} (@pxref{Include}), while GNU +@code{m4} seamlessly integrates the content of those files. Thus +@code{include(`a.m4')include(`b.m4')} will output @samp{3} instead of +giving an error. + +@item +Traditional @code{m4} treats @code{traceon} (@pxref{Trace}) without +arguments as a global variable, independent of named macro tracing. +Also, once a macro is undefined, named tracing of that macro is lost. +On the other hand, when GNU @code{m4} encounters +@code{traceon} without +arguments, it turns tracing on for all existing definitions at the time, +but does not trace future definitions; @code{traceoff} without arguments +turns tracing off for all definitions regardless of whether they were +also traced by name; and tracing by name, such as with @option{-tfoo} at +the command line or @code{traceon(`foo')} in the input, is an attribute +that is preserved even if the macro is currently undefined. + +Additionally, while POSIX requires trace output, it makes no +demands on the formatting of that output.
Parsing trace output is not +guaranteed to be reliable, even between different releases of +GNU M4; however, the intent is that any future changes in +trace output will only occur under the direction of additional +@code{debugmode} flags (@pxref{Debug Levels}). + +@item +POSIX requires @code{eval} (@pxref{Eval}) to treat all +operators with the same precedence as C@. However, earlier versions of +GNU @code{m4} followed the traditional behavior of other +@code{m4} implementations, where bitwise and logical negation (@samp{~} +and @samp{!}) had lower precedence than equality operators; and where +equality operators (@samp{==} and @samp{!=}) had the same precedence as +relational operators (such as @samp{<}). Use explicit parentheses to +ensure proper precedence. As extensions to POSIX, +GNU @code{m4} gives well-defined semantics to operations that +C leaves undefined, such as when overflow occurs, when shifting negative +numbers, or when performing division by zero. POSIX also +requires @samp{=} to cause an error, but many traditional +implementations allowed it as an alias for @samp{==}. + +@item +POSIX 2001 requires @code{translit} (@pxref{Translit}) to +treat each character of the second and third arguments literally. +However, it is anticipated that the next version of POSIX will +allow the GNU @code{m4} behavior of treating @samp{-} as a +range operator. + +@item +POSIX requires @code{m4} to honor the locale environment +variables of @env{LANG}, @env{LC_ALL}, @env{LC_CTYPE}, +@env{LC_MESSAGES}, and @env{NLSPATH}, but this has not yet been +implemented in GNU @code{m4}. + +@item +POSIX states that only unquoted leading newlines and blanks +(that is, space and tab) are ignored when collecting macro arguments. +However, this appears to be a bug in POSIX, since most +traditional implementations also ignore all whitespace (formfeed, +carriage return, and vertical tab). GNU @code{m4} follows +tradition and ignores all leading unquoted whitespace.
+ +@item +@cindex @env{POSIXLY_CORRECT} +A strictly-compliant POSIX client is not allowed to use +command-line arguments not specified by POSIX. However, since +this version of M4 ignores @env{POSIXLY_CORRECT} and enables the option +@code{--gnu} by default (@pxref{Limits control, , Invoking m4}), a +client desiring to be strictly compliant has no way to disable +GNU extensions that conflict with POSIX when +directly invoking the compiled @code{m4}. A future version of +GNU M4 will honor the environment variable @env{POSIXLY_CORRECT}, +implicitly enabling @option{--traditional} if it is set, in order to +allow a strictly-compliant client. In the meantime, a client needing +strict POSIX compliance can use the workaround of invoking a +shell script wrapper, where the wrapper then adds @option{--traditional} +to the arguments passed to the compiled @code{m4}. +@end itemize + +@node Other Incompatibilities +@section Other incompatibilities + +There are a few other incompatibilities between this implementation of +@code{m4} and the System V version. + +@itemize @bullet +@item +GNU @code{m4} implements sync lines differently from System V +@code{m4} when text is being diverted. GNU @code{m4} outputs +the sync lines when the text is being diverted, and System V @code{m4} +outputs them when the diverted text is being brought back. + +The problem is which lines and file names should be attached to text +that is being, or has been, diverted. System V @code{m4} regards all +the diverted text as being generated by the source line containing the +@code{undivert} call, whereas GNU @code{m4} regards the +diverted text as being generated at the time it is diverted. + +The sync line option is used mostly when using @code{m4} as +a front end to a compiler. If a diverted line causes a compiler error, +the error messages should most probably refer to the place where the +diversion was made, and not where it was inserted again.
+ +@comment options: -s +@example +divert(2)2 +divert(1)1 +divert`'0 +@result{}#line 3 "stdin" +@result{}0 +^D +@result{}#line 2 "stdin" +@result{}1 +@result{}#line 1 "stdin" +@result{}2 +@end example + +The current @code{m4} implementation has a limitation that the syncline +output at the start of each diversion occurs no matter what, even if the +previous diversion did not end with a newline. This goes contrary to +the claim that synclines appear on a line by themselves, so this +limitation may be corrected in a future version of @code{m4}. In the +meantime, when using @option{-s}, it is wisest to make sure all +diversions end with a newline. + +@item +GNU @code{m4} makes no attempt at prohibiting self-referential +definitions like: + +@example +define(`x', `x') +@result{} +define(`x', `x ') +@result{} +@end example + +@cindex rescanning +There is nothing inherently wrong with defining @samp{x} to +return @samp{x}. The wrong thing is to expand @samp{x} unquoted, +because that would cause an infinite rescan loop. +In @code{m4}, one might use macros to hold strings, as we do for +variables in other programming languages, further checking them with: + +@comment ignore +@example +ifelse(defn(`@var{holder}'), `@var{value}', @dots{}) +@end example + +@noindent +In cases like this one, a prohibition against a macro holding its own +name would be a useless limitation. Of course, this leaves more rope for the +GNU @code{m4} user to hang himself! Rescanning hangs may be +avoided through careful programming, much as endless loops are avoided in +traditional programming languages. +@end itemize + +@node Answers +@chapter Correct version of some examples + +Some of the examples in this manual are buggy or not very robust, for +demonstration purposes. Improved versions of these composite macros are +presented here.
+ +@menu +* Improved exch:: Solution for @code{exch} +* Improved forloop:: Solution for @code{forloop} +* Improved foreach:: Solution for @code{foreach} +* Improved copy:: Solution for @code{copy} +* Improved m4wrap:: Solution for @code{m4wrap} +* Improved cleardivert:: Solution for @code{cleardivert} +* Improved capitalize:: Solution for @code{capitalize} +* Improved fatal_error:: Solution for @code{fatal_error} +@end menu + +@node Improved exch +@section Solution for @code{exch} + +The @code{exch} macro (@pxref{Arguments}) as presented requires clients +to double quote their arguments. A nicer definition, which lets +clients follow the rule of thumb of one level of quoting per level of +parentheses, involves adding quotes in the definition of @code{exch}, as +follows: + +@example +define(`exch', ``$2', `$1'') +@result{} +define(exch(`expansion text', `macro')) +@result{} +macro +@result{}expansion text +@end example + +@node Improved forloop +@section Solution for @code{forloop} + +The @code{forloop} macro (@pxref{Forloop}) as presented earlier can go +into an infinite loop if given an iterator that is not parsed as a macro +name. It does not do any sanity checking on its numeric bounds, and +only permits decimal numbers for bounds. Here is an improved version, +shipped as @file{m4-@value{VERSION}/@/examples/@/forloop2.m4}; this +version also optimizes overhead by calling four macros instead of six +per iteration (excluding those in @var{text}), by not dereferencing the +@var{iterator} in the helper @code{@w{_forloop}}. 
+ +@comment examples +@example +$ @kbd{m4 -d -I examples} +undivert(`forloop2.m4')dnl +@result{}divert(`-1') +@result{}# forloop(var, from, to, stmt) - improved version: +@result{}# works even if VAR is not a strict macro name +@result{}# performs sanity check that FROM is larger than TO +@result{}# allows complex numerical expressions in TO and FROM +@result{}define(`forloop', `ifelse(eval(`($2) <= ($3)'), `1', +@result{} `pushdef(`$1')_$0(`$1', eval(`$2'), +@result{} eval(`$3'), `$4')popdef(`$1')')') +@result{}define(`_forloop', +@result{} `define(`$1', `$2')$4`'ifelse(`$2', `$3', `', +@result{} `$0(`$1', incr(`$2'), `$3', `$4')')') +@result{}divert`'dnl +include(`forloop2.m4') +@result{} +forloop(`i', `2', `1', `no iteration occurs') +@result{} +forloop(`', `1', `2', ` odd iterator name') +@result{} odd iterator name odd iterator name +forloop(`i', `5 + 5', `0xc', ` 0x`'eval(i, `16')') +@result{} 0xa 0xb 0xc +forloop(`i', `a', `b', `non-numeric bounds') +@error{}m4:stdin:6: bad expression in eval (bad input): (a) <= (b) +@result{} +@end example + +One other change to notice is that the improved version uses @samp{_$0} +rather than @samp{_forloop} to invoke the helper routine. In general, +this is a good practice to follow, because then the set of macros can be +uniformly transformed. The following example shows a transformation +that doubles the current quoting and appends a suffix @samp{2} to each +transformed macro. If @code{forloop} refers to the literal +@samp{_forloop}, then @code{forloop2} invokes @code{_forloop} instead of +the intended @code{_forloop2}, and the mixing of quoting paradigms leads +to an infinite recursion loop in this example.
+ +@comment options: -L9 +@comment status: 1 +@comment examples +@example +$ @kbd{m4 -d -L 9 -I examples} +define(`arg1', `$1')include(`forloop2.m4')include(`quote.m4') +@result{} +define(`double', `define(`$1'`2', + arg1(patsubst(dquote(defn(`$1')), `[`']', `\&\&')))') +@result{} +double(`forloop')double(`_forloop')defn(`forloop2') +@result{}ifelse(eval(``($2) <= ($3)''), ``1'', +@result{} ``pushdef(``$1'')_$0(``$1'', eval(``$2''), +@result{} eval(``$3''), ``$4'')popdef(``$1'')'') +forloop(i, 1, 5, `ifelse(')forloop(i, 1, 5, `)') +@result{} +changequote(`[', `]')changequote([``], ['']) +@result{} +forloop2(i, 1, 5, ``ifelse('')forloop2(i, 1, 5, ``)'') +@result{} +changequote`'include(`forloop.m4') +@result{} +double(`forloop')double(`_forloop')defn(`forloop2') +@result{}pushdef(``$1'', ``$2'')_forloop($@@)popdef(``$1'') +forloop(i, 1, 5, `ifelse(')forloop(i, 1, 5, `)') +@result{} +changequote(`[', `]')changequote([``], ['']) +@result{} +forloop2(i, 1, 5, ``ifelse('')forloop2(i, 1, 5, ``)'') +@error{}m4:stdin:12: recursion limit of 9 exceeded, use -L<N> to change it +@end example + +One more optimization is still possible. Instead of repeatedly +assigning a variable then invoking or dereferencing it, it is possible +to pass the current iterator value as a single argument. Coupled with +@code{curry} if other arguments are needed (@pxref{Composition}), or +with helper macros if the argument is needed in more than one place in +the expansion, the output can be generated with three, rather than four, +macros of overhead per iteration. Notice how the file +@file{m4-@value{VERSION}/@/examples/@/forloop3.m4} rearranges the +arguments of the helper @code{_forloop} to take two arguments that are +placed around the current value. By splitting a balanced set of +parentheses across multiple arguments, the helper macro can now be +shared by @code{forloop} and the new @code{forloop_arg}.
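To see what the shared helper hands to the rescanner, here is a small Python model (not m4 code) of the text that @code{_forloop} in @file{forloop3.m4} generates. For each value it emits its third argument, then the value wrapped in one level of quotes, then its fourth argument. The function names are hypothetical. The model deliberately leaves out the empty @samp{`'} that the m4 definition appends after each step, and it does not model the rescanning that follows.

```python
from __future__ import annotations

# Python model (not m4) of the text produced by the shared helper
# _forloop(from, to, pre, post) in forloop3.m4.  For each value it emits
# PRE, the value wrapped in one level of m4 quotes, then POST.  The empty
# `' that m4 appends after each step, and the rescanning that follows,
# are deliberately left out.


def forloop_helper_text(start: int, end: int, pre: str, post: str) -> str:
    """Concatenate PRE`value'POST for every value from START to END."""
    pieces = []
    for value in range(start, end + 1):
        pieces.append(pre + "`" + str(value) + "'" + post)
    return "".join(pieces)


def forloop_arg_text(start: int, end: int, macro: str) -> str:
    # forloop_arg splits the parentheses of MACRO(value) across PRE/POST.
    return forloop_helper_text(start, end, macro + "(", ")")


def forloop_text(var: str, start: int, end: int, stmt: str) -> str:
    # forloop splits define(`VAR', value) across PRE/POST and appends STMT.
    return forloop_helper_text(start, end, "define(`" + var + "',", ")" + stmt)
```

With @code{forloop_arg_text(1, 3, " echo")}, the generated text spells out three complete @code{echo} calls. @code{forloop_text} instead yields one @code{define} followed by the statement for each value. In both cases the parentheses only balance once the pre and post pieces are joined around the value.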
+ +@comment examples +@example +$ @kbd{m4 -I examples} +include(`forloop3.m4') +@result{} +undivert(`forloop3.m4')dnl +@result{}divert(`-1') +@result{}# forloop_arg(from, to, macro) - invoke MACRO(value) for +@result{}# each value between FROM and TO, without define overhead +@result{}define(`forloop_arg', `ifelse(eval(`($1) <= ($2)'), `1', +@result{} `_forloop(`$1', eval(`$2'), `$3(', `)')')') +@result{}# forloop(var, from, to, stmt) - refactored to share code +@result{}define(`forloop', `ifelse(eval(`($2) <= ($3)'), `1', +@result{} `pushdef(`$1')_forloop(eval(`$2'), eval(`$3'), +@result{} `define(`$1',', `)$4')popdef(`$1')')') +@result{}define(`_forloop', +@result{} `$3`$1'$4`'ifelse(`$1', `$2', `', +@result{} `$0(incr(`$1'), `$2', `$3', `$4')')') +@result{}divert`'dnl +forloop(`i', `1', `3', ` i') +@result{} 1 2 3 +define(`echo', `$@@') +@result{} +forloop_arg(`1', `3', ` echo') +@result{} 1 2 3 +include(`curry.m4') +@result{} +forloop_arg(`1', `3', `curry(`pushdef', `a')') +@result{} +a +@result{}3 +popdef(`a')a +@result{}2 +popdef(`a')a +@result{}1 +popdef(`a')a +@result{}a +@end example + +Of course, it is possible to make even more improvements, such as +adding an optional step argument, or allowing iteration through +descending sequences. GNU Autoconf provides some of these +additional bells and whistles in its @code{m4_for} macro. + +@node Improved foreach +@section Solution for @code{foreach} + +The @code{foreach} and @code{foreachq} macros (@pxref{Foreach}) as +presented earlier each have flaws. 
First, we will examine and fix the +quadratic behavior of @code{foreachq}: + +@comment examples +@example +$ @kbd{m4 -I examples} +include(`foreachq.m4') +@result{} +traceon(`shift')debugmode(`aq') +@result{} +foreachq(`x', ``1', `2', `3', `4'', `x +')dnl +@result{}1 +@error{}m4trace: -3- shift(`1', `2', `3', `4') +@error{}m4trace: -2- shift(`1', `2', `3', `4') +@result{}2 +@error{}m4trace: -4- shift(`1', `2', `3', `4') +@error{}m4trace: -3- shift(`2', `3', `4') +@error{}m4trace: -3- shift(`1', `2', `3', `4') +@error{}m4trace: -2- shift(`2', `3', `4') +@result{}3 +@error{}m4trace: -5- shift(`1', `2', `3', `4') +@error{}m4trace: -4- shift(`2', `3', `4') +@error{}m4trace: -3- shift(`3', `4') +@error{}m4trace: -4- shift(`1', `2', `3', `4') +@error{}m4trace: -3- shift(`2', `3', `4') +@error{}m4trace: -2- shift(`3', `4') +@result{}4 +@error{}m4trace: -6- shift(`1', `2', `3', `4') +@error{}m4trace: -5- shift(`2', `3', `4') +@error{}m4trace: -4- shift(`3', `4') +@error{}m4trace: -3- shift(`4') +@end example + +@cindex quadratic behavior, avoiding +@cindex avoiding quadratic behavior +Each successive iteration was adding more quoted @code{shift} +invocations, and the entire list contents were passing through every +iteration. In general, when recursing, it is a good idea to make the +recursion use fewer arguments, rather than adding additional quoted +uses of @code{shift}. By doing so, @code{m4} uses less memory, invokes +fewer macros, is less likely to run into machine limits, and most +importantly, performs faster. 
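The difference in growth can be sketched with a rough cost model, written in Python rather than m4. It counts only @code{shift} calls and the arguments they receive, under two simplifying assumptions:

- the nested style rebuilds the tail from the full list on every iteration;
- each nested @code{shift} is expanded only once.

The original @file{foreachq.m4} actually expands its accumulated text more than once per iteration, so its real trace shows more calls than this model predicts. The function names are hypothetical.

```python
from __future__ import annotations

# Rough cost model (Python, not m4) of two ways to recurse over a list of
# n elements.  Only shift() calls and the arguments they receive are
# counted; quoting and rescanning costs are ignored.


def nested_shift_cost(n: int) -> tuple[int, int]:
    """Nested style: iteration k rebuilds its tail by applying k nested
    shift() calls to the full list, so the list never gets shorter."""
    calls = 0
    args = 0
    for k in range(n):
        for j in range(k):
            calls += 1
            args += n - j  # the j-th shift from the inside sees n - j args
    return calls, args


def shrinking_list_cost(n: int) -> tuple[int, int]:
    """Shrinking style: each iteration calls shift() once on what is
    left, so every recursive call has one fewer argument."""
    calls = 0
    args = 0
    for remaining in range(n, 1, -1):
        calls += 1
        args += remaining
    return calls, args
```

For four elements, @code{shrinking_list_cost} gives three calls on 4, 3 and 2 arguments. This matches the three traced @code{shift} calls of the fixed @file{foreachq2.m4}. At 100 elements the nested model already needs 4950 calls, where the shrinking one needs 99.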
The fixed version of @code{foreachq} can +be found in @file{m4-@value{VERSION}/@/examples/@/foreachq2.m4}: + +@comment examples +@example +$ @kbd{m4 -I examples} +include(`foreachq2.m4') +@result{} +undivert(`foreachq2.m4')dnl +@result{}include(`quote.m4')dnl +@result{}divert(`-1') +@result{}# foreachq(x, `item_1, item_2, ..., item_n', stmt) +@result{}# quoted list, improved version +@result{}define(`foreachq', `pushdef(`$1')_$0($@@)popdef(`$1')') +@result{}define(`_arg1q', ``$1'') +@result{}define(`_rest', `ifelse(`$#', `1', `', `dquote(shift($@@))')') +@result{}define(`_foreachq', `ifelse(`$2', `', `', +@result{} `define(`$1', _arg1q($2))$3`'$0(`$1', _rest($2), `$3')')') +@result{}divert`'dnl +traceon(`shift')debugmode(`aq') +@result{} +foreachq(`x', ``1', `2', `3', `4'', `x +')dnl +@result{}1 +@error{}m4trace: -3- shift(`1', `2', `3', `4') +@result{}2 +@error{}m4trace: -3- shift(`2', `3', `4') +@result{}3 +@error{}m4trace: -3- shift(`3', `4') +@result{}4 +@end example + +Note that the fixed version calls unquoted helper macros in +@code{@w{_foreachq}} to trim elements immediately; those helper macros +in turn must re-supply the layer of quotes lost in the macro invocation. +Contrast the use of @code{@w{_arg1q}}, which quotes the first list +element, with @code{@w{_arg1}} of the earlier implementation that +returned the first list element directly. Additionally, by calling the +helper method immediately, the @samp{defn(`@var{iterator}')} no longer +contains unexpanded macros. + +The astute m4 programmer might notice that the solution above still uses +more memory and macro invocations, and thus more time, than strictly +necessary. Note that @samp{$2}, which contains an arbitrarily long +quoted list, is expanded and rescanned three times per iteration of +@code{_foreachq}. Furthermore, every iteration of the algorithm +effectively unboxes then reboxes the list, which costs a couple of macro +invocations. 
It is possible to rewrite the algorithm for a bit more +speed by swapping the order of the arguments to @code{_foreachq} in +order to operate on an unboxed list in the first place, and by using the +fixed-length @samp{$#} instead of an arbitrary-length list as the key to +end recursion. The result is an overhead of six macro invocations per +loop (excluding any macros in @var{text}), instead of eight. This +alternative approach is available as +@file{m4-@value{VERSION}/@/examples/@/foreachq3.m4}: + +@comment examples +@example +$ @kbd{m4 -I examples} +include(`foreachq3.m4') +@result{} +undivert(`foreachq3.m4')dnl +@result{}divert(`-1') +@result{}# foreachq(x, `item_1, item_2, ..., item_n', stmt) +@result{}# quoted list, alternate improved version +@result{}define(`foreachq', `ifelse(`$2', `', `', +@result{} `pushdef(`$1')_$0(`$1', `$3', `', $2)popdef(`$1')')') +@result{}define(`_foreachq', `ifelse(`$#', `3', `', +@result{} `define(`$1', `$4')$2`'$0(`$1', `$2', +@result{} shift(shift(shift($@@))))')') +@result{}divert`'dnl +traceon(`shift')debugmode(`aq') +@result{} +foreachq(`x', ``1', `2', `3', `4'', `x +')dnl +@result{}1 +@error{}m4trace: -4- shift(`x', `x +@error{}', `', `1', `2', `3', `4') +@error{}m4trace: -3- shift(`x +@error{}', `', `1', `2', `3', `4') +@error{}m4trace: -2- shift(`', `1', `2', `3', `4') +@result{}2 +@error{}m4trace: -4- shift(`x', `x +@error{}', `1', `2', `3', `4') +@error{}m4trace: -3- shift(`x +@error{}', `1', `2', `3', `4') +@error{}m4trace: -2- shift(`1', `2', `3', `4') +@result{}3 +@error{}m4trace: -4- shift(`x', `x +@error{}', `2', `3', `4') +@error{}m4trace: -3- shift(`x +@error{}', `2', `3', `4') +@error{}m4trace: -2- shift(`2', `3', `4') +@result{}4 +@error{}m4trace: -4- shift(`x', `x +@error{}', `3', `4') +@error{}m4trace: -3- shift(`x +@error{}', `3', `4') +@error{}m4trace: -2- shift(`3', `4') +@end example + +In the current version of M4, every instance of @samp{$@@} is rescanned +as it is encountered.
Thus, the @file{foreachq3.m4} alternative uses +much less memory than @file{foreachq2.m4}, and executes as much as 10% +faster, since each iteration encounters fewer @samp{$@@}. However, the +implementation of rescanning every byte in @samp{$@@} is quadratic in +the number of bytes scanned (for example, making the broken version in +@file{foreachq.m4} cubic, rather than quadratic, in behavior). A future +release of M4 will improve the underlying implementation by reusing +results of previous scans, so that both styles of @code{foreachq} can +become linear in the number of bytes scanned. Notice how the +implementation injects an empty argument prior to expanding @samp{$2} +within @code{foreachq}; the helper macro @code{_foreachq} then ignores +the third argument altogether, and ends recursion when there are three +arguments left because there was nothing left to pass through +@code{shift}. Thus, each iteration only needs one @code{ifelse}, rather +than the two conditionals used in the version from @file{foreachq2.m4}. + +@cindex nine arguments, more than +@cindex more than nine arguments +@cindex arguments, more than nine +So far, all of the implementations of @code{foreachq} presented have +been quadratic with M4 1.4.x. But @code{forloop} is linear, because +each iteration parses a constant number of arguments. So, it is +possible to design a variant that uses @code{forloop} to do the +iteration, then uses @samp{$@@} only once at the end, giving a linear +result even with older M4 implementations. This implementation relies +on the GNU extension that @samp{$10} expands to the tenth +argument rather than the first argument concatenated with @samp{0}. The +trick is to define an intermediate macro that repeats the text +@code{define(`$1', `$@var{n}')$2`'}, with @var{n} set to successive +integers corresponding to each argument.
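The text-generation step can be illustrated with a short Python model (again, not m4). The two functions are:

- @code{intermediate_body}, which builds the body that @code{_foreachq} assembles for a given @samp{$#};
- @code{substitute}, which approximates parameter substitution and treats a multi-digit reference such as @samp{$14} as the fourteenth argument, as the GNU extension described above does.

Both names are hypothetical, and quoting and rescanning are not modelled.

```python
from __future__ import annotations

import re

# Python model (not m4) of the intermediate macro body built by
# foreachq4.m4.  A call _foreachq(`x', `stmt', item_1, ..., item_k) has
# $# equal to k + 2; forloop visits N = 3 .. $# and emits the literal text
# define(`$1', `$N')$2`' for each N, followed by popdef(`$1').


def intermediate_body(argc: int) -> str:
    """Return the body generated when _foreachq receives ARGC arguments."""
    parts = ["define(`$1', `$" + str(n) + "')$2`'" for n in range(3, argc + 1)]
    parts.append("popdef(`$1')")
    return "".join(parts)


def substitute(body: str, args: list[str]) -> str:
    """Replace $N with the Nth argument; like GNU M4 1.4.x, a multi-digit
    reference such as $14 means the fourteenth argument."""

    def repl(match: re.Match) -> str:
        index = int(match.group(1))
        if 1 <= index <= len(args):
            return args[index - 1]
        return ""  # $0 and out-of-range references are not modelled

    return re.sub(r"\$(\d+)", repl, body)
```

In @file{foreachq4.m4` this text is pushed as a temporary definition of the iterator name. It is then invoked through @code{indir} with the original arguments, so each @samp{$@var{n}} picks up one list element. The trailing @code{popdef} restores whatever definition the iterator had before the loop.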
The helper macro +@code{_foreachq_} is needed in order to generate the literal sequences +such as @samp{$1} into the intermediate macro, rather than expanding +them as the arguments of @code{_foreachq}. With this approach, no +@code{shift} calls are even needed! Even though there are seven macros +of overhead per iteration instead of six in @file{foreachq3.m4}, the +linear scaling is apparent at relatively small list sizes. However, +this approach will need adjustment when a future version of M4 follows +POSIX by no longer treating @samp{$10} as the tenth argument; +the anticipation is that @samp{$@{10@}} can be used instead, although +that alternative syntax is not yet supported. + +@comment examples +@example +$ @kbd{m4 -I examples} +include(`foreachq4.m4') +@result{} +undivert(`foreachq4.m4')dnl +@result{}include(`forloop2.m4')dnl +@result{}divert(`-1') +@result{}# foreachq(x, `item_1, item_2, ..., item_n', stmt) +@result{}# quoted list, version based on forloop +@result{}define(`foreachq', +@result{}`ifelse(`$2', `', `', `_$0(`$1', `$3', $2)')') +@result{}define(`_foreachq', +@result{}`pushdef(`$1', forloop(`$1', `3', `$#', +@result{} `$0_(`1', `2', indir(`$1'))')`popdef( +@result{} `$1')')indir(`$1', $@@)') +@result{}define(`_foreachq_', +@result{}``define(`$$1', `$$3')$$2`''') +@result{}divert`'dnl +traceon(`shift')debugmode(`aq') +@result{} +foreachq(`x', ``1', `2', `3', `4'', `x +')dnl +@result{}1 +@result{}2 +@result{}3 +@result{}4 +@end example + +For yet another approach, the improved version of @code{foreach}, +available in @file{m4-@value{VERSION}/@/examples/@/foreach2.m4}, simply +overquotes the arguments to @code{@w{_foreach}} to begin with, using +@code{dquote_elt}. 
Then @code{@w{_foreach}} can just use +@code{@w{_arg1}} to remove the extra layer of quoting that was added up +front: + +@comment examples +@example +$ @kbd{m4 -I examples} +include(`foreach2.m4') +@result{} +undivert(`foreach2.m4')dnl +@result{}include(`quote.m4')dnl +@result{}divert(`-1') +@result{}# foreach(x, (item_1, item_2, ..., item_n), stmt) +@result{}# parenthesized list, improved version +@result{}define(`foreach', `pushdef(`$1')_$0(`$1', +@result{} (dquote(dquote_elt$2)), `$3')popdef(`$1')') +@result{}define(`_arg1', `$1') +@result{}define(`_foreach', `ifelse(`$2', `(`')', `', +@result{} `define(`$1', _arg1$2)$3`'$0(`$1', (dquote(shift$2)), `$3')')') +@result{}divert`'dnl +traceon(`shift')debugmode(`aq') +@result{} +foreach(`x', `(`1', `2', `3', `4')', `x +')dnl +@error{}m4trace: -4- shift(`1', `2', `3', `4') +@error{}m4trace: -4- shift(`2', `3', `4') +@error{}m4trace: -4- shift(`3', `4') +@result{}1 +@error{}m4trace: -3- shift(``1'', ``2'', ``3'', ``4'') +@result{}2 +@error{}m4trace: -3- shift(``2'', ``3'', ``4'') +@result{}3 +@error{}m4trace: -3- shift(``3'', ``4'') +@result{}4 +@error{}m4trace: -3- shift(``4'') +@end example + +It is likewise possible to write a variant of @code{foreach} that +performs in linear time on M4 1.4.x; the easiest method is probably +writing a version of @code{foreach} that unboxes its list, then invokes +@code{_foreachq} as previously defined in @file{foreachq4.m4}. + +In summary, recursion over list elements is trickier than it appeared at +first glance, but provides a powerful idiom within @code{m4} processing. +As a final demonstration, both list styles are now able to handle +several scenarios that would wreak havoc on one or both of the original +implementations. This points out one other difference between the +list styles. 
@code{foreach} evaluates unquoted list elements only once, +in preparation for calling @code{@w{_foreach}}, and the same is true for +@code{foreachq} as provided by @file{foreachq3.m4} or +@file{foreachq4.m4}. But +@code{foreachq}, as provided by @file{foreachq2.m4}, +evaluates unquoted list elements twice while visiting the first list +element, once in @code{@w{_arg1q}} and once in @code{@w{_rest}}. When +deciding which list style to use, one must take into account whether +repeating the side effects of unquoted list elements will have any +detrimental effects. + +@comment examples +@example +$ @kbd{m4 -I examples} +include(`foreach2.m4') +@result{} +include(`foreachq2.m4') +@result{} +dnl 0-element list: +foreach(`x', `', `<x>') / foreachq(`x', `', `<x>') +@result{} /@w{ } +dnl 1-element list of empty element +foreach(`x', `()', `<x>') / foreachq(`x', ``'', `<x>') +@result{}<> / <> +dnl 2-element list of empty elements +foreach(`x', `(`',`')', `<x>') / foreachq(`x', ``',`'', `<x>') +@result{}<><> / <><> +dnl 1-element list of a comma +foreach(`x', `(`,')', `<x>') / foreachq(`x', ``,'', `<x>') +@result{}<,> / <,> +dnl 2-element list of unbalanced parentheses +foreach(`x', `(`(', `)')', `<x>') / foreachq(`x', ``(', `)'', `<x>') +@result{}<(><)> / <(><)> +define(`ab', `oops')dnl using defn(`iterator') +foreach(`x', `(`a', `b')', `defn(`x')') /dnl + foreachq(`x', ``a', `b'', `defn(`x')') +@result{}ab / ab +define(`active', `ACT, IVE') +@result{} +traceon(`active') +@result{} +dnl list of unquoted macros; expansion occurs before recursion +foreach(`x', `(active, active)', `<x> +')dnl +@error{}m4trace: -4- active -> `ACT, IVE' +@error{}m4trace: -4- active -> `ACT, IVE' +@result{}<ACT> +@result{}<IVE> +@result{}<ACT> +@result{}<IVE> +foreachq(`x', `active, active', `<x> +')dnl +@error{}m4trace: -3- active -> `ACT, IVE' +@error{}m4trace: -3- active -> `ACT, IVE' +@result{}<ACT> +@error{}m4trace: -3- active -> `ACT, IVE' +@error{}m4trace: -3- active -> `ACT, IVE' +@result{}<IVE>
+@result{}<ACT> +@result{}<IVE> +dnl list of quoted macros; expansion occurs during recursion +foreach(`x', `(`active', `active')', `<x> +')dnl +@error{}m4trace: -1- active -> `ACT, IVE' +@result{}<ACT, IVE> +@error{}m4trace: -1- active -> `ACT, IVE' +@result{}<ACT, IVE> +foreachq(`x', ``active', `active'', `<x> +')dnl +@error{}m4trace: -1- active -> `ACT, IVE' +@result{}<ACT, IVE> +@error{}m4trace: -1- active -> `ACT, IVE' +@result{}<ACT, IVE> +dnl list of double-quoted macro names; no expansion +foreach(`x', `(``active'', ``active'')', `<x> +')dnl +@result{}<active> +@result{}<active> +foreachq(`x', ```active'', ``active''', `<x> +')dnl +@result{}<active> +@result{}<active> +@end example + +@ignore +@comment Not worth putting in the manual, but make sure that foreach +@comment implementations behave, and that final implementation is +@comment linear. + +@comment boxed recursion + +@comment examples +@comment options: -Dlimit=10 -Dverbose +@example +$ @kbd {m4 -I examples -Dlimit=10 -Dverbose} +include(`loop.m4')dnl +@result{} 1 2 3 4 5 6 7 8 9 10 +@end example + +@comment unboxed recursion + +@comment examples +@comment options: -Dlimit=10 -Dverbose -Dalt +@example +$ @kbd {m4 -I examples -Dlimit=10 -Dverbose -Dalt} +include(`loop.m4')dnl +@result{} 1 2 3 4 5 6 7 8 9 10 +@end example + +@comment foreach via forloop recursion + +@comment examples +@comment options: -Dlimit=10 -Dverbose -Dalt=4 +@example +$ @kbd {m4 -I examples -Dlimit=10 -Dverbose -Dalt=4} +include(`loop.m4')dnl +@result{} 1 2 3 4 5 6 7 8 9 10 +@end example + +@comment examples +@comment options: -Dlimit=2500 -Dalt=4 +@example +$ @kbd {m4 -I examples -Dlimit=2500 -Dalt=4} +include(`loop.m4')dnl +@end example + +@comment examples +@comment options: -Dlimit=10000 -Dalt=4 +@example +$ @kbd {m4 -I examples -Dlimit=10000 -Dalt=4} +define(`foo', `divert`'len(popdef(`_foreachq')_foreachq($@@))')dnl +define(`debug', `pushdef(`_foreachq', defn(`foo'))') +@result{} +include(`loop.m4')dnl +@result{}48894 
+@end example + +@end ignore + +@node Improved copy +@section Solution for @code{copy} + +The macro @code{copy} presented above +is unable to handle builtin tokens with M4 1.4.x, because it tries to +pass the builtin token through the macro @code{curry}, where it is +silently flattened to an empty string (@pxref{Composition}). Rather +than using the problematic @code{curry} to work around the limitation +that @code{stack_foreach} expects to invoke a macro that takes exactly +one argument, we can write a new macro that lets us form the exact +two-argument @code{pushdef} call sequence needed, so that we are no +longer passing a builtin token through a text macro. + +@deffn Composite stack_foreach_sep (@var{macro}, @var{pre}, @var{post}, @ + @var{sep}) +@deffnx Composite stack_foreach_sep_lifo (@var{macro}, @var{pre}, @ + @var{post}, @var{sep}) +For each of the @code{pushdef} definitions associated with @var{macro}, +expand the sequence @samp{@var{pre}`'definition`'@var{post}}. +Additionally, expand @var{sep} between definitions. +@code{stack_foreach_sep} visits the oldest definition first, while +@code{stack_foreach_sep_lifo} visits the current definition first. The +expansion may dereference @var{macro}, but should not modify it. There +are a few special macros, such as @code{defn}, which cannot be used as +the @var{macro} parameter. +@end deffn + +Note that @code{stack_foreach(`@var{macro}', `@var{action}')} is +equivalent to @code{stack_foreach_sep(`@var{macro}', `@var{action}(', +`)')}. By supplying explicit parentheses, split among the @var{pre} and +@var{post} arguments to @code{stack_foreach_sep}, it is now possible to +construct macro calls with more than one argument, without passing +builtin tokens through a macro call. It is likewise possible to +directly reference the stack definitions without a macro call, by +leaving @var{pre} and @var{post} empty. 
Thus, in addition to fixing +@code{copy} on builtin tokens, it also executes with fewer macro +invocations. + +The new macro also adds a separator that is only output after the first +iteration of the helper @code{_stack_reverse_sep}, implemented by +prepending the original @var{sep} to @var{pre} and omitting a @var{sep} +argument in subsequent iterations. Note that the empty string that +separates @var{sep} from @var{pre} is provided as part of the fourth +argument when originally calling @code{_stack_reverse_sep}, and not by +writing @code{$4`'$3} as the third argument in the recursive call; while +the other approach would give the same output, it does so at the expense +of increasing the argument size on each iteration of +@code{_stack_reverse_sep}, which results in quadratic instead of linear +execution time. The improved stack walking macros are available in +@file{m4-@value{VERSION}/@/examples/@/stack_sep.m4}: + +@comment examples +@example +$ @kbd{m4 -I examples} +include(`stack_sep.m4') +@result{} +define(`copy', `ifdef(`$2', `errprint(`$2 already defined +')m4exit(`1')', + `stack_foreach_sep(`$1', `pushdef(`$2',', `)')')')dnl +pushdef(`a', `1')pushdef(`a', defn(`divnum')) +@result{} +copy(`a', `b') +@result{} +b +@result{}0 +popdef(`b') +@result{} +b +@result{}1 +pushdef(`c', `1')pushdef(`c', `2') +@result{} +stack_foreach_sep_lifo(`c', `', `', `, ') +@result{}2, 1 +undivert(`stack_sep.m4')dnl +@result{}divert(`-1') +@result{}# stack_foreach_sep(macro, pre, post, sep) +@result{}# Invoke PRE`'defn`'POST with a single argument of each definition +@result{}# from the definition stack of MACRO, starting with the oldest, and +@result{}# separated by SEP between definitions. 
+@result{}define(`stack_foreach_sep', +@result{}`_stack_reverse_sep(`$1', `tmp-$1')'dnl +@result{}`_stack_reverse_sep(`tmp-$1', `$1', `$2`'defn(`$1')$3', `$4`'')') +@result{}# stack_foreach_sep_lifo(macro, pre, post, sep) +@result{}# Like stack_foreach_sep, but starting with the newest definition. +@result{}define(`stack_foreach_sep_lifo', +@result{}`_stack_reverse_sep(`$1', `tmp-$1', `$2`'defn(`$1')$3', `$4`'')'dnl +@result{}`_stack_reverse_sep(`tmp-$1', `$1')') +@result{}define(`_stack_reverse_sep', +@result{}`ifdef(`$1', `pushdef(`$2', defn(`$1'))$3`'popdef(`$1')$0( +@result{} `$1', `$2', `$4$3')')') +@result{}divert`'dnl +@end example + +@ignore +@comment Not worth putting in the manual, but make sure that +@comment stack_foreach_sep has linear performance. + +@comment examples +@example +$ @kbd {m4 -I examples} +include(`forloop3.m4')include(`stack_sep.m4')dnl +forloop(`i', `1', `10000', `pushdef(`s', i)') +@result{} +define(`colon', `:')define(`dash', `-') +@result{} +len(stack_foreach_sep(`s', `dash', `', `colon')) +@result{}58893 +@end example +@end ignore + +@node Improved m4wrap +@section Solution for @code{m4wrap} + +The replacement @code{m4wrap} versions presented above, designed to +guarantee FIFO or LIFO order regardless of the underlying M4 +implementation, share a bug when dealing with wrapped text that looks +like parameter expansion. Note how the invocation of +@code{m4wrap@var{n}} interprets these parameters, while using the +builtin preserves them for their intended use. 
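The following Python model (not m4) shows why the parameters get consumed. When the wrapped text becomes the body of a placeholder such as @code{m4wrap0}, invoking that placeholder with no arguments substitutes @samp{$0}, @samp{$1}, @samp{$*} and @samp{$#} before @code{foo} is ever defined. The helper name @code{substitute_params} is hypothetical. Only these parameter forms plus @samp{$@@} are handled, and quoting and rescanning of the result are not modelled.

```python
from __future__ import annotations

import re

# Python model (not m4) of parameter substitution in a macro body.
# $0, $N, $#, $* and $@ are replaced; quoting and the rescanning of the
# result are not modelled.


def substitute_params(body: str, name: str, args: list[str]) -> str:
    def repl(match: re.Match) -> str:
        token = match.group(1)
        if token == "#":
            return str(len(args))
        if token == "*":
            return ",".join(args)
        if token == "@":
            return ",".join("`" + arg + "'" for arg in args)
        index = int(token)
        if index == 0:
            return name
        return args[index - 1] if index <= len(args) else ""

    return re.sub(r"\$(\d+|#|\*|@)", repl, body)


# The wrapped text, as stored in the body of the placeholder m4wrap0.
WRAPPED = "define(`foo', ``$0:'-$1-$*-$#-')foo(`a', `b')"
```

Substituting into the stored text gives a definition of @code{foo} that already reads @samp{m4wrap0:---0-}, which is the output shown in the example that follows. The builtin path instead leaves the parameters intact for @code{bar} to consume with its own arguments.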
+ +@comment examples +@example +$ @kbd{m4 -I examples} +include(`wraplifo.m4') +@result{} +m4wrap(`define(`foo', ``$0:'-$1-$*-$#-')foo(`a', `b') +') +@result{} +builtin(`m4wrap', ``'define(`bar', ``$0:'-$1-$*-$#-')bar(`a', `b') +') +@result{} +^D +@result{}bar:-a-a,b-2- +@result{}m4wrap0:---0- +@end example + +Additionally, the computation of @code{_m4wrap_level} and creation of +multiple @code{m4wrap@var{n}} placeholders in the original examples are +more expensive in time and memory than strictly necessary. Notice how +the improved version grabs the wrapped text via @code{defn} to avoid +parameter expansion, then undefines @code{_m4wrap_text}, before +stripping a level of quotes with @code{_arg1} to expand the text. That +way, each level of wrapping reuses the single placeholder, which starts +each nesting level in an undefined state. + +Finally, it is worth emulating the GNU M4 extension of saving +all arguments to @code{m4wrap}, separated by a space, rather than saving +just the first argument. This is done with the @code{joinall} macro +documented previously (@pxref{Shift}). The improved LIFO example is +shipped as @file{m4-@value{VERSION}/@/examples/@/wraplifo2.m4}, and can +easily be converted to a FIFO solution by swapping the adjacent +invocations of @code{joinall} and @code{defn}. + +@comment examples +@example +$ @kbd{m4 -I examples} +include(`wraplifo2.m4') +@result{} +undivert(`wraplifo2.m4')dnl +@result{}dnl Redefine m4wrap to have LIFO semantics, improved example.
+@result{}include(`join.m4')dnl +@result{}define(`_m4wrap', defn(`m4wrap'))dnl +@result{}define(`_arg1', `$1')dnl +@result{}define(`m4wrap', +@result{}`ifdef(`_$0_text', +@result{} `define(`_$0_text', joinall(` ', $@@)defn(`_$0_text'))', +@result{} `_$0(`_arg1(defn(`_$0_text')undefine(`_$0_text'))')dnl +@result{}define(`_$0_text', joinall(` ', $@@))')')dnl +m4wrap(`define(`foo', ``$0:'-$1-$*-$#-')foo(`a', `b') +') +@result{} +m4wrap(`lifo text +m4wrap(`nested', `', `$@@ +')') +@result{} +^D +@result{}lifo text +@result{}foo:-a-a,b-2- +@result{}nested $@@ +@end example + +@node Improved cleardivert +@section Solution for @code{cleardivert} + +The @code{cleardivert} macro (@pxref{Cleardivert}) cannot, as it stands, be +called without arguments to clear all pending diversions. That is +because using @code{undivert} with an empty string for an argument is different +from using it with no arguments at all. Compare the earlier definition +with one that takes the number of arguments into account: + +@example +define(`cleardivert', + `pushdef(`_n', divnum)divert(`-1')undivert($@@)divert(_n)popdef(`_n')') +@result{} +divert(`1')one +divert +@result{} +cleardivert +@result{} +undivert +@result{}one +@result{} +define(`cleardivert', + `pushdef(`_num', divnum)divert(`-1')ifelse(`$#', `0', + `undivert`'', `undivert($@@)')divert(_num)popdef(`_num')') +@result{} +divert(`2')two +divert +@result{} +cleardivert +@result{} +undivert +@result{} +@end example + +@node Improved capitalize +@section Solution for @code{capitalize} + +The @code{capitalize} macro (@pxref{Patsubst}) as presented earlier does +not allow clients to follow the quoting rule of thumb.
Consider the +three macros @code{active}, @code{Active}, and @code{ACTIVE}, and the +difference between calling @code{capitalize} with the expansion of a +macro, expanding the result of a case change, and changing the case of a +double-quoted string: + +@comment examples +@example +$ @kbd{m4 -I examples} +include(`capitalize.m4')dnl +define(`active', `act1, ive')dnl +define(`Active', `Act2, Ive')dnl +define(`ACTIVE', `ACT3, IVE')dnl +upcase(active) +@result{}ACT1,IVE +upcase(`active') +@result{}ACT3, IVE +upcase(``active'') +@result{}ACTIVE +downcase(ACTIVE) +@result{}act3,ive +downcase(`ACTIVE') +@result{}act1, ive +downcase(``ACTIVE'') +@result{}active +capitalize(active) +@result{}Act1 +capitalize(`active') +@result{}Active +capitalize(``active'') +@result{}_capitalize(`active') +define(`A', `OOPS') +@result{} +capitalize(active) +@result{}OOPSct1 +capitalize(`active') +@result{}OOPSctive +@end example + +First, when @code{capitalize} is called with more than one argument, it +was throwing away later arguments, whereas @code{upcase} and +@code{downcase} used @samp{$*} to collect them all. The fix is simple: +use @samp{$*} consistently. + +Next, with single-quoting, @code{capitalize} outputs a single character, +a set of quotes, then the rest of the characters, making it impossible +to invoke @code{Active} after the fact, and allowing the alternate macro +@code{A} to interfere. Here, the solution is to use additional quoting +in the helper macros, then pass the final over-quoted output string +through @code{_arg1} to remove the extra quoting and finally invoke the +concatenated portions as a single string. + +Finally, when passed a double-quoted string, the nested macro +@code{_capitalize} is never invoked because it ended up nested inside +quotes. This one is the toughest to fix. In short, we have no idea how +many levels of quotes are in effect on the substring being altered by +@code{patsubst}. 
If the replacement string cannot be expressed entirely +in terms of literal text and backslash substitutions, then we need a +mechanism to guarantee that the helper macros are invoked outside of +quotes. In other words, this sounds like a job for @code{changequote} +(@pxref{Changequote}). By changing the active quoting characters, we +can guarantee that replacement text injected by @code{patsubst} always +occurs in the middle of a string that has exactly one level of +over-quoting using alternate quotes; so the replacement text closes the +quoted string, invokes the helper macros, then reopens the quoted +string. In turn, that means the replacement text has unbalanced quotes, +necessitating another round of @code{changequote}. + +In the fixed version below (also shipped as +@file{m4-@value{VERSION}/@/examples/@/capitalize2.m4}), @code{capitalize} +uses the alternate quotes of @samp{<<[} and @samp{]>>} (the longer +strings are chosen so as to be less likely to appear in the text being +converted). The helpers @code{_to_alt} and @code{_from_alt} merely +reduce the number of characters required to perform a +@code{changequote}, since the definition changes twice. The outermost +pair means that @code{patsubst} and @code{_capitalize_alt} are invoked +with alternate quoting; the innermost pair is used so that the third +argument to @code{patsubst} can contain an unbalanced +@samp{]>>}/@samp{<<[} pair. Note that @code{upcase} and @code{downcase} +must be redefined as @code{_upcase_alt} and @code{_downcase_alt}, since +they contain nested quotes but are invoked with the alternate quoting +scheme in effect.
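Setting the quoting machinery aside, the transformation that @code{capitalize} is meant to perform on already-expanded text fits in a few lines of Python (not m4). Every run of word characters matched by @samp{\w+} gets its first character upcased and the rest downcased, mirroring the @samp{^\(\w\)\(\w*\)} split in @code{_capitalize_alt}. The function name is hypothetical and the model assumes ASCII input. It also does not reproduce how m4 collects unquoted arguments.

```python
from __future__ import annotations

import re

# Python sketch (not m4) of the per-word rule capitalize applies once all
# macro expansion is done: in every run of word characters (\w+), upcase
# the first character and downcase the rest.  Quoting, changequote and
# m4's collection of unquoted arguments are not modelled; ASCII input is
# assumed.


def capitalize_words(text: str) -> str:
    def fix(match: re.Match) -> str:
        word = match.group(0)
        return word[:1].upper() + word[1:].lower()

    return re.sub(r"\w+", fix, text)
```

Running it on the expansion of @code{active} gives @samp{Act1, Ive}. The hard part in m4 is not this per-word rule but making it work inside arbitrary levels of quoting.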
+ +@comment examples +@example +$ @kbd{m4 -I examples} +include(`capitalize2.m4')dnl +define(`active', `act1, ive')dnl +define(`Active', `Act2, Ive')dnl +define(`ACTIVE', `ACT3, IVE')dnl +define(`A', `OOPS')dnl +capitalize(active; `active'; ``active''; ```actIVE''') +@result{}Act1,Ive; Act2, Ive; Active; `Active' +undivert(`capitalize2.m4')dnl +@result{}divert(`-1') +@result{}# upcase(text) +@result{}# downcase(text) +@result{}# capitalize(text) +@result{}# change case of text, improved version +@result{}define(`upcase', `translit(`$*', `a-z', `A-Z')') +@result{}define(`downcase', `translit(`$*', `A-Z', `a-z')') +@result{}define(`_arg1', `$1') +@result{}define(`_to_alt', `changequote(`<<[', `]>>')') +@result{}define(`_from_alt', `changequote(<<[`]>>, <<[']>>)') +@result{}define(`_upcase_alt', `translit(<<[$*]>>, <<[a-z]>>, <<[A-Z]>>)') +@result{}define(`_downcase_alt', `translit(<<[$*]>>, <<[A-Z]>>, <<[a-z]>>)') +@result{}define(`_capitalize_alt', +@result{} `regexp(<<[$1]>>, <<[^\(\w\)\(\w*\)]>>, +@result{} <<[_upcase_alt(<<[<<[\1]>>]>>)_downcase_alt(<<[<<[\2]>>]>>)]>>)') +@result{}define(`capitalize', +@result{} `_arg1(_to_alt()patsubst(<<[<<[$*]>>]>>, <<[\w+]>>, +@result{} _from_alt()`]>>_$0_alt(<<[\&]>>)<<['_to_alt())_from_alt())') +@result{}divert`'dnl +@end example + +@node Improved fatal_error +@section Solution for @code{fatal_error} + +The @code{fatal_error} macro (@pxref{M4exit}) is not robust to versions +of GNU M4 earlier than 1.4.8, where invoking +@code{@w{__file__}} (@pxref{Location}) inside @code{m4wrap} would result +in an empty string, and @code{@w{__line__}} resulted in @samp{0} even +though all files start at line 1. Furthermore, versions earlier than +1.4.6 did not support the @code{@w{__program__}} macro. 
If you want +@code{fatal_error} to work across the entire 1.4.x release series, a +better implementation would be: + +@comment status: 1 +@example +define(`fatal_error', + `errprint(ifdef(`__program__', `__program__', ``m4'')'dnl +`:ifelse(__line__, `0', `', + `__file__:__line__:')` fatal error: $* +')m4exit(`1')') +@result{} +m4wrap(`divnum(`demo of internal message') +fatal_error(`inside wrapped text')') +@result{} +^D +@error{}m4:stdin:6: Warning: excess arguments to builtin `divnum' ignored +@result{}0 +@error{}m4:stdin:6: fatal error: inside wrapped text +@end example + +@c ========================================================== Appendices + +@node Copying This Package +@appendix How to make copies of the overall M4 package +@cindex License, code + +This appendix covers the license for copying the source code of the +overall M4 package. This manual is under a different set of +restrictions, covered later (@pxref{Copying This Manual}). + +@menu +* GNU General Public License:: License for copying the M4 package +@end menu + +@node GNU General Public License +@appendixsec License for copying the M4 package +@cindex GPL, GNU General Public License +@cindex GNU General Public License +@cindex General Public License (GPL), GNU +@include gpl-3.0.texi + +@node Copying This Manual +@appendix How to make copies of this manual +@cindex License, manual + +This appendix covers the license for copying this manual. Note that +some of the longer examples in this manual are also distributed in the +directory @file{m4-@value{VERSION}/@/examples/}, where a more +permissive license is in effect when copying just the examples. 
+ +@menu +* GNU Free Documentation License:: License for copying this manual +@end menu + +@node GNU Free Documentation License +@appendixsec License for copying this manual +@cindex FDL, GNU Free Documentation License +@cindex GNU Free Documentation License +@cindex Free Documentation License (FDL), GNU +@include fdl-1.3.texi + +@node Indices +@appendix Indices of concepts and macros + +@menu +* Macro index:: Index for all @code{m4} macros +* Concept index:: Index for many concepts +@end menu + +@node Macro index +@appendixsec Index for all @code{m4} macros + +This index covers all @code{m4} builtins, as well as several useful +composite macros. References are exclusively to the places where a +macro is introduced the first time. + +@printindex fn + +@node Concept index +@appendixsec Index for many concepts + +@printindex cp + +@bye + +@c Local Variables: +@c coding: iso-8859-1 +@c fill-column: 72 +@c ispell-local-dictionary: "american" +@c indent-tabs-mode: nil +@c whitespace-check-buffer-indent: nil +@c End: |