General-purpose macro processor
from Wikipedia

A general-purpose macro processor or general purpose preprocessor is a macro processor that is not tied to or integrated with a particular language or piece of software.

A macro processor is a program that copies a stream of text from one place to another, making a systematic set of replacements as it does so. Macro processors are often embedded in other programs, such as assemblers and compilers. Sometimes they are standalone programs that can be used to process any kind of text.

Macro processors have been used for language expansion (defining new language constructs that can be expressed in terms of existing language components), for systematic text replacements that require decision making, and for text reformatting (e.g. conditional extraction of material from an HTML file).

Examples of general purpose macro processors

Name Year Description
GPM 1960s One of the earliest macro processors was GPM (the General Purpose Macrogenerator).[1] This was developed at the University of Cambridge, UK, in the mid 1960s, under the direction of Christopher Strachey.
ML/I 1960s One particularly important general purpose macro processor was (and still is) ML/I (Macro Language One). This was developed as part of PhD research by a Cambridge postgraduate, Peter J. Brown. ML/I operates on a character stream, and requires no special format for its input, nor any special flag characters to introduce macros.
STAGE2 1960s A contemporary of ML/I was STAGE2,[2] part of William Waite's Mobile Programming System.[3] This too is a general purpose macro processor, but it processes input a line at a time, matching each line against specified patterns; it is notable in that it is independent of character set, requiring only that the digits 0-9 are contiguous and in that order (a condition not met by some of the 6-bit and BCD character codes of the era).
M6 1960s Early macro processor developed at AT&T Bell Laboratories by Douglas McIlroy, Robert Morris and Andrew Hall. It was influenced by GPM and TRAC. Implemented in FORTRAN IV,[4] it was ported to Version 2 Unix.
SNOBOL 1960s SNOBOL is a string processing language which is capable of doing most of the pre-processing which can be done by a macro processor.
XPOP 1960s XPOP was another attempt at a general macro processing language, by Mark Halpern at IBM in the 1960s.
TTM 1968 TTM is a recursive, interpretive language designed primarily for string manipulation, text editing, macro definition and expansion, and other applications generally classified as systems programming. It was developed in 1968 by Steven Caine and E. Kent Gordon at the California Institute of Technology. It is derived, primarily, from GAP[5] and GPM.[1]
GMP 1970s Another attempt was the GMP (General Macro Processor), developed in the mid-1970s by M. Boule in the DLB/GC department of the CII Company, following ideas from R.J. Chevance. Tested in association with Bordeaux I University, the first version ran on the SIRIS8/IRIS80 system. It was ported to mini6 systems and was the main component involved in system generation for this family of computers. The GMP processor used Chomsky type-2 grammars to define the syntax of macros and an imperative language to perform computations and carry out macro expansion.
M4 1977 m4 was designed and written in C for Unix by Dennis Ritchie and converted to Ratfor by Brian Kernighan.[6]
ELENA 1984 ELENA is a general-purpose macro processor described in Software: Practice and Experience, Vol. 14, pp. 519–531, June 1984.
gema 1995 gema is a contextual macro processor based on pattern matching, written by David N. Gray. It replaces and extends the concept of regular expressions with contexts, which roughly correspond to named sets of patterns. As a consequence, macros in gema closely resemble an EBNF description.[7]
GPP 1996 gpp is another general macro processor written by Denis Auroux. It resembles a C preprocessor, but has more general semantics and allows for customized syntax (for instance, TeX, XHTML, and Prolog-like scripts are definable).[8]
M5 1999 m5 is a general-purpose macro processor written by William A. Ward, Jr. Unlike many macro processors, m5 does not directly interpret its input. Instead it uses a two-pass approach in which the first pass translates the input to an awk program, and the second pass executes that awk program to produce the final output.
pyexpander 2011 pyexpander is a general-purpose macro processor based on the Python programming language. In addition to simple macro replacement it allows evaluation of arbitrary Python expressions and execution of Python code.
Text Assembler 2014 Text Assembler is a general-purpose text/macro processor based on the JavaScript programming language. Beyond simple macro replacement, it allows evaluating arbitrary JavaScript expressions and executing JavaScript code. It can also load JSON data models for more complex data-driven text processing tasks.[9]
PP 2016 PP is a text preprocessor designed for Pandoc (and more generally Markdown and reStructuredText). PP implements: Macros, literate programming, GraphViz, PlantUML and ditaa diagrams, Bash, Cmd, PowerShell, Python and Haskell scripts.[10]
minimac minimac is a minimalist general purpose macro processor. It operates as a character stream filter, recursively expanding macros as they are encountered. It is unusual for a macro processor in that it uses an explicit argument stack, and user functions are defined by concatenation (similar to the Forth language).[11]
aa_macro 2017 aa_macro is an open-source character-stream-based text processing language written in Python. Text is processed in a left-to-right, inside-to-outside manner. A selection of pre-defined built-in functions provide fundamental processing mechanisms that may be used directly or as elements of user-defined styles. The language is user extensible, and wtfm, an open-source web-based document preparation wrapper for the language, is available.[12][13]

from Grokipedia
A general-purpose macro processor (GMP) is a text-manipulation tool that reads input streams, identifies macro invocations—specialized placeholders or commands—and replaces them with predefined or dynamically generated text expansions, enabling reusable patterns and automation in a language-independent manner. Unlike language-specific preprocessors, a GMP operates independently, allowing it to preprocess source code, configuration files, or any textual data by performing substitutions, arithmetic computations, and conditional logic during expansion.

The concept of general-purpose macro processing emerged in the mid-20th century as a way to extend and simplify programming languages through automated text transformation. In the 1950s, Alan Perlis advocated for independent macro facilities to enhance code modularity, laying foundational ideas for systematic text replacement. This evolved into practical implementations, such as Christopher Strachey's General Purpose Macrogenerator (GPM) in 1965, which formalized macro expansion as a standalone process for generating code from higher-level descriptions. By the 1970s, tools like M6 facilitated porting complex systems, leading to the influential m4 processor developed by Brian Kernighan and Dennis Ritchie in 1977, which became a Unix standard for versatile preprocessing.

Key features of GMPs include macro definition via commands that associate names with replacement text (often supporting up to nine arguments for parameterization), recursive expansion for nested macros, and built-in functions for tasks like string manipulation, integer arithmetic (e.g., incrementing a value or evaluating an expression), file inclusion, and conditional branching based on macro-time variables. These capabilities support debugging aids, such as tracing expansions or dumping definitions, ensuring robust handling of complex inputs without altering the underlying syntax of the target text. Modern implementations, like GNU m4 (released in 1990), extend these with enhanced portability and performance optimizations while preserving core extensibility.

GMPs have broad applications, serving as front-ends for compilers (e.g., preprocessing C or assembly code), generating documentation, automating build scripts, or even creating domain-specific languages by layering custom syntax over base ones. Notable examples include ML/I (1967), a compact system for user-defined language extensions implemented on early computers like the PDP-7; Gema, a pattern-matching utility for arbitrary text processing; and persistent tools like m4, which remains integral to Unix-like environments for tasks requiring repeatable text generation.

Overview

Definition

A general-purpose macro processor is a standalone text-processing tool that performs macro expansion through pattern matching and substitution, operating independently of any specific programming language or application. Unlike language-specific preprocessors, it manipulates plain text streams by replacing macro invocations with their defined expansions, enabling flexible text transformation without being embedded in a compiler or interpreter.

Key characteristics include support for user-defined macros with optional parameters, recursive expansion to allow nested macro calls, and a design intended for broad applicability across domains such as code generation, documentation, and configuration files. Such a processor works on input character by character or token by token, accommodating diverse syntax rules such as comments and identifier formation through user-specified directives, which distinguishes it from specialized macro systems tied to particular languages. The term gained prominence in the 1960s with tools like the General Purpose Macrogenerator (GPM), emphasizing the language independence first proposed by Alan Perlis in the 1950s.

In its basic workflow, the processor scans input text for macro invocations, identifies parameters if present, substitutes them into the macro's predefined body, and outputs the expanded result, potentially iterating recursively until no further expansions are possible.
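To make this workflow concrete, the short m4 fragment below is a minimal sketch; the macro names inner and outer are invented for illustration. It shows a definition, an invocation, and the re-scanning of the expanded result:

    define(`inner', `expanded text')dnl
    define(`outer', `[inner]')dnl
    outer

Scanning the last line, the processor replaces outer with its stored body [inner], re-scans that output, finds the nested invocation of inner, and finally emits [expanded text]; expansion stops once a scan yields no further macro names. The trailing dnl built-in discards the remainder of each definition line, so the definitions themselves produce no output.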

Purpose and Applications

General-purpose macro processors are primarily designed to automate the handling of repetitive text patterns in source files, enabling developers to define macros that expand into larger blocks of code or text during preprocessing. This approach facilitates the generation of repetitive or boilerplate code, which is particularly useful in scenarios where similar structures need to be repeated across multiple files or projects, thereby streamlining development workflows. By allowing users to create configurable templates, these processors support techniques that abstract away low-level details, making it easier to adapt code for different environments or requirements without manual duplication.

In practice, general-purpose macro processors find key applications in preprocessing for languages such as C and assembly, where they substitute macro invocations with equivalent instructions to simplify complex routines like register saving or loop constructs. They are also employed in generating documentation and reports by transforming structured input into formatted output, as well as in building configuration files for software systems, exemplified by tools like Autoconf that produce portable setup scripts from macro definitions. Additionally, these processors aid text transformation tasks, such as code generation, by performing string manipulations, conditional inclusions, and arithmetic operations on text streams to produce customized results.

The advantages of using general-purpose macro processors include improved code maintainability through centralized definitions that reduce the need for scattered repetitions, minimization of errors in repetitive tasks via automated expansions, and the ability to abstract complex patterns in a reusable, language-independent manner, allowing reuse across diverse text-based workflows. However, their purpose is limited to static preprocessing stages, where they operate solely on text-level substitutions and manipulations rather than supporting runtime execution or deep semantic analysis, which distinguishes them from full-fledged compilers or interpreters.
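As a concrete sketch of this kind of repetition removal (the getter macro, the config struct, and its field names are all invented for illustration), a single m4 template can stamp out a family of similar C accessor functions:

    define(`getter',
    `int get_$1(const struct config *c) { return c->$1; }')dnl
    getter(width)
    getter(height)
    getter(depth)

Run through m4, this emits three one-line C functions; editing the template in one place updates every generated function, which is the maintainability benefit described above.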

History

Early Developments

The origins of general-purpose macro processors trace back to the 1950s, when computer scientist Alan Perlis advocated for a macro language independent of the target programming language to enhance flexibility in code generation and processing. This idea addressed the limitations of early computing environments, where hardware constraints like small memory capacities and slow processing speeds necessitated tools for language extensibility, particularly in assemblers and nascent high-level languages that required efficient code reuse without bloating source programs.

A pivotal innovation in this era was the introduction of recursive macro expansion and parameter substitution, first realized in the MACRO assembler developed for MIT's TX-0 computer in the late 1950s. This system allowed macros to call themselves and substitute arguments dynamically, enabling nested definitions and reducing repetitive coding in resource-scarce settings; it later evolved into the MACRO assembler for the PDP-1 in the early 1960s, further refining these capabilities for broader use in early programming.

The 1960s marked key milestones in general-purpose macro processor development. The General Purpose Macrogenerator (GPM), created by Christopher Strachey at the University of Cambridge around 1965, was one of the earliest standalone macro processors, compact enough to fit in just 250 machine instructions and designed for arbitrary text manipulation independent of specific languages. In 1964, Calvin Mooers introduced TRAC, a string-processing macro system implemented on the PDP-1, emphasizing procedure description for text reckoning and compiling through generalized macro concepts. Later that decade, Peter J. Brown developed ML/I at the University of Cambridge in 1966, a versatile macro processor for general text manipulation that supported conditional processing and became influential for its simplicity and portability across early systems. These advancements, driven by the era's push for programmable tools to automate code transformation amid hardware limitations, laid the groundwork for later general-purpose macro processing paradigms.

Modern Evolution

In the late 1960s, extensions to early macro processing concepts emerged with the development of the M6 macro processor by Douglas McIlroy and Robert Morris at Bell Laboratories, with implementation by Andrew D. Hall. Inspired by Christopher Strachey's General Purpose Macrogenerator (GPM), M6 was implemented in FORTRAN IV in the early 1970s and served primarily as a tool for porting code within the Altran computer algebra system, demonstrating early versatility beyond simple text substitution.

A pivotal advancement occurred in 1977 with the creation of m4 by Brian Kernighan and Dennis Ritchie at Bell Laboratories, as detailed in their technical report introducing built-in macros for conditional processing, file inclusion, and arithmetic operations, alongside tight integration with the UNIX operating system. This design emphasized portability and simplicity, influencing subsequent tools and leading to the open-source GNU m4 implementation in 1990 by René Seindal, which expanded compatibility across UNIX variants and early PCs while maintaining the core syntax.

Into the late 20th and early 21st centuries, innovations like Gema, developed by David N. Gray in the mid-1990s, shifted focus toward advanced pattern-matching capabilities, allowing contextual substitutions that enhanced flexibility for text transformation tasks without relying on traditional regular expressions. Meanwhile, classic processors such as ML/I, originally conceived by Peter Brown in 1966, received ongoing maintenance, including ports to modern platforms such as macOS in 2020 and a further port in 2021, ensuring continued relevance through improved error handling and cross-OS portability.

Broader trends from the 1990s onward reflect a move toward open-source availability, with tools like GNU m4 fostering community-driven enhancements and widespread adoption in open-source projects. Enhanced portability across operating systems became standard, enabling seamless use in diverse environments from UNIX to Windows. Integration with build systems such as GNU Make grew common, allowing macro processors to automate configuration and code generation in software development pipelines. Adaptations for web technologies, including CGI scripts for dynamic content generation, further extended their utility, though post-1980s evolution emphasized incremental improvements like robust error diagnostics rather than radical paradigm shifts. As of May 2025, GNU m4 had reached version 1.4.20, incorporating portability enhancements and optimizations.

Key Features

Macro Definition and Parameters

In general-purpose macro processors, macros are defined through a directive that binds a name to a body of text, which is later substituted or processed upon invocation of the name. Syntax varies across implementations; for example, influential tools like m4 use define(name, body), where name is an identifier and body is the literal text or template to expand, defaulting to an empty string if no body is provided. This mechanism allows for the creation of reusable text fragments independent of any specific programming language, as introduced in early designs for language-independent processing. Other GMPs, such as ML/I, employ different syntax, such as MCDEF with custom delimiters.

Parameterized macros extend this by incorporating formal parameters, enabling dynamic substitution based on arguments supplied at invocation. Placeholders in the body denote the substitution points, such as positional arguments (e.g., the first argument as $1 in m4) or keyword parameters in some systems. Arguments are typically delimited during calls, with leading whitespace stripped and excess arguments ignored unless handled; advanced systems support named arguments via key-value pairs or default values for optional parameters to enhance flexibility. For instance, in m4, a definition like define(add, $1 + $2) allows invocation as add(3, 4) to yield 3 + 4. In assembly-style macro processors, keyword parameters like TYPE=DIRECT are used. These features facilitate the construction of macro-based functions, including conditionals, without tying them to a host language's syntax. Delimiters and parameter handling vary, with m4 using comma-separated arguments in parentheses, while ML/I allows flexible delimiters.

Macros generally operate within a global scope, making definitions visible throughout the input stream unless explicitly managed. Visibility can be controlled through stacking mechanisms, such as pushdef to add a new definition atop an existing one (creating a temporary override) and popdef to restore the prior version, as in m4, simulating local scopes in nested contexts. To remove a macro entirely, an undefine(name) directive is used, which safely ignores nonexistent names and can be invoked even during expansions. Nesting of macro definitions and invocations is permitted, allowing macros to define or call others, but implementations impose limits—often configurable—to prevent infinite loops from self-referential or cyclic definitions; for example, GNU m4 defaults to 1024 levels on platforms without stack-overflow detection (as of 2023).

A simple non-parameterized example in m4 is define(PI, 3.14), which substitutes 3.14 wherever PI appears, while a parameterized one like define(square, $1 * $1) with square(5) produces 5 * 5, illustrating how definitions underpin text expansion for code generation or configuration.
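The m4 fragment below is a minimal sketch of the mechanisms just described—a plain definition, a parameterized one, temporary overriding with pushdef/popdef, and removal with undefine (the names PI and square are illustrative):

    define(`PI', `3.14')dnl
    define(`square', `$1 * $1')dnl
    square(PI)
    pushdef(`PI', `3.14159265')dnl
    PI
    popdef(`PI')dnl
    PI
    undefine(`square')dnl

The three expanding lines produce 3.14 * 3.14, 3.14159265, and 3.14 in turn; each dnl discards the remainder of its definition line, so the definitions themselves contribute no output.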

Expansion and Substitution

In general-purpose macro processors, the expansion process begins with scanning the input text stream for macro invocations, which typically consist of a macro name followed by optional arguments delimited according to the implementation's rules. Upon recognizing a defined macro name—often as an alphanumeric token—the processor retrieves the corresponding macro body from its definition table and initiates substitution by replacing the invocation with the body, inserting the provided arguments in place of formal parameters. This textual replacement occurs positionally or by keyword, depending on the processor's conventions, ensuring that the expanded output preserves the surrounding text while incorporating the arguments. For instance, an invocation might expand to a sequence of commands referencing the arguments within the macro body.

Substitution rules emphasize literal text replacement to preserve the macro's intent, but include mechanisms to handle special characters and prevent unintended expansions. Quoting conventions, such as enclosing text in quotes or using escape sequences, allow users to defer evaluation or protect parameters from immediate processing, avoiding premature substitution of embedded macro names. A key side effect is the re-scanning of expanded text: after substitution, the resulting output is fed back into the input stream for further macro detection and expansion, enabling nested or recursive invocations but requiring careful design to manage complexity. This re-scanning ensures that dynamically generated content can invoke additional macros, promoting flexibility in text generation.

Advanced mechanisms extend basic substitution to support programmatic control within expansions. Conditional expansion evaluates parameter values or expressions to selectively include portions of the macro body, such as using if-else constructs to branch based on argument equality or arithmetic comparisons. Some processors incorporate looping via while-do structures or recursive macro calls, allowing repetitive substitution until a termination condition is met, though this is often limited to avoid infinite cycles. Error handling typically involves detecting undefined macro names during scanning, triggering warnings or halting expansion, while built-in functions may divert output or report issues to facilitate debugging of complex inputs. These features enable macros to perform computations or logic during processing, beyond simple templating.

Efficiency in expansion is achieved through streamlined strategies, balancing completeness with performance. One-pass processors scan and expand in a single traversal, immediately substituting and re-scanning as needed, which suits sequential inputs but demands forward definitions to avoid unresolved references. Multi-pass approaches separate definition collection from expansion, allowing backward references but increasing overhead for large files. To mitigate risks like unbounded recursion from deeply nested expansions, processors often impose expansion depth limits or optimize re-scanning by tracking substitution history, ensuring scalability for practical applications such as code generation or configuration scripting.
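A short m4 sketch of these mechanisms follows: quoting defers expansion, while ifelse combined with recursion gives conditional and repeated substitution (the countdown macro is an illustrative construction, not a built-in):

    define(`NAME', `world')dnl
    `NAME' versus NAME
    define(`countdown',
    `$1 ifelse(`$1', `0', `', `countdown(decr($1))')')dnl
    countdown(3)

The first expanding line prints NAME versus world, showing how one level of quoting suppresses expansion; countdown(3) re-scans each intermediate result and prints 3 2 1 0 before the ifelse test terminates the recursion.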

Notable Implementations

m4

The m4 macro processor was developed by Brian W. Kernighan and Dennis M. Ritchie at Bell Laboratories in 1977 as a general-purpose tool for text manipulation and code generation in UNIX environments. The original implementation, detailed in their technical report "The M4 Macro Processor," provided a foundation for macro expansion with support for user-defined macros and built-in functions, making it suitable as a front-end for languages like Ratfor and C. GNU m4, the primary open-source reimplementation, was first released in 1990 and has since become the de facto standard, with its latest stable version, 1.4.20, issued in May 2025.

At its core, m4 features a set of built-in macros that enable powerful text processing, including define for creating new macros, undefine for removing definitions, ifelse for conditional logic, divert for redirecting output to temporary buffers, include for incorporating external files, and arithmetic operations via incr for incrementing integers and eval for evaluating expressions. These primitives allow m4 to handle complex transformations, such as parameter substitution where arguments are referenced as $1, $2, and so on up to $9 in standard implementations, with GNU m4 extending support beyond nine parameters. For instance, the macro definition define(`HELLO', `Hello, $1!') followed by the invocation HELLO(`world') expands to Hello, world!, demonstrating how m4 passes and substitutes arguments during expansion.

m4 is commonly employed in UNIX toolchains for tasks like generating C code from higher-level descriptions, scripting in Autoconf to produce portable configure scripts, and preprocessing Makefiles to handle conditional builds and variable expansions. Its integration into the Autotools suite underscores its role in automating software configuration across POSIX-compliant systems.

While m4's extensibility allows users to build sophisticated abstractions on top of its primitives, its syntax—requiring explicit quoting with backquotes (`) and apostrophes (') to delimit arguments and prevent premature expansion—can be verbose and demands careful management to avoid errors. This quirk, inherited from the original design, contributes to a steep learning curve but ensures precise control over macro hygiene. Despite these challenges, m4's standardization in POSIX.1-2008 guarantees its widespread availability on Unix-like systems, solidifying its enduring influence as a foundational macro processing tool.
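The fragment below is a brief sketch combining several of the built-ins named above; the counter and next macros are illustrative constructions rather than part of m4, and the ** exponentiation operator in eval is a GNU m4 extension:

    define(`counter', `0')dnl
    define(`next', `define(`counter', incr(counter))counter')dnl
    Step next, step next, step next.
    eval(2 ** 10) bytes
    divert(1)printed last
    divert(0)printed first

GNU m4 emits Step 1, step 2, step 3., then 1024 bytes, then printed first, and finally flushes diversion 1 (printed last) when the input ends, since diversions are written out in numeric order at the end of processing.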

Other Processors

One notable general-purpose macro processor is ML/I, developed in 1966 by Peter J. Brown as part of his PhD work at the University of Cambridge. It features robust error diagnostics, multi-pass processing for handling complex dependencies, and a syntax that prioritizes readability through structured MACRO blocks for definition and expansion. Originally implemented on systems like the Elliott 903 and later ported to various platforms including IBM mainframes and Unix, ML/I has been employed in environments requiring precise text manipulation, such as extending programming languages or formatting reports. Modern ports continue to support its use in legacy system maintenance, where its diagnostic capabilities aid in debugging and updating older codebases.

Another significant example is Gema, a general-purpose text processing utility introduced in the mid-1990s by David N. Gray. Unlike traditional substitution-based processors, Gema emphasizes pattern-matching rules with a simple, regex-like syntax for defining transformations, enabling complex operations like conditional replacements and structural rewrites on input streams. Available as open-source software hosted on SourceForge, it operates as a command-line tool across Unix, Linux, Windows, and Macintosh systems, making it suitable for pipeline-based workflows. Its niche lies in advanced text filtering and conversion tasks, such as generating code from templates or processing configuration files in automated build environments.

An early foundational implementation is the General Purpose Macrogenerator (GPM), developed by Christopher Strachey in 1965. Designed as a standalone macro expansion tool, GPM formalized the process of generating code from higher-level descriptions and influenced subsequent processors with its support for parameterized macros and recursive expansions. Implemented on the Atlas 2 computer, it consisted of approximately 250 machine instructions and served as a precursor to more advanced systems.

Early innovations include TRAC, a string-oriented macro processor developed between 1959 and 1964 by Calvin N. Mooers and first implemented on the PDP-1. Designed for interactive text reckoning and compiling, TRAC treats all data as character strings, facilitating early applications in non-numeric processing like document preparation and simple scripting on limited hardware.

In 1972, Andrew D. Hall developed M6 at Bell Laboratories, building on ideas from GPM for a compact macro processor implemented in FORTRAN IV. Its concise design, fitting core functionalities into minimal code, made it ideal for porting Fortran-based systems like the Altran computer algebra software across different machines.

The SLAC macro processor, specifically MAC74, emerged in the 1970s at the Stanford Linear Accelerator Center under Joseph C. H. Park. Tailored for scientific computing, it offers versatility in handling nested expansions and symbol manipulation, with an emphasis on easy modifiability to adapt to domain-specific needs in physics simulations and related tasks.

Comparisons

With Language-Specific Preprocessors

Language-specific preprocessors, such as the C preprocessor (CPP), are macro processing tools designed exclusively for a particular programming language, with directives and behaviors tightly coupled to that language's syntax and semantics. For instance, CPP operates on C and C++ source code, handling features like #include for file incorporation and #ifdef for conditional compilation, all integrated into the compilation pipeline to ensure compatibility with the language's tokenization rules. In contrast, general-purpose macro processors like m4 function as standalone, language-agnostic utilities that treat input as plain text, enabling expansion and substitution without reliance on any specific compiler or language parser. This fundamental difference results in limited portability for language-specific tools—CPP, for example, assumes C-like syntax and can produce unexpected results when applied to non-C files—while general-purpose processors offer broad applicability across diverse text-based tasks.

General-purpose macro processors provide key advantages in reusability and independence, allowing deployment across multiple projects and languages without compiler dependencies; m4, for example, powers tools like Autoconf for generating portable configuration scripts usable in various build environments, a flexibility not inherent in CPP's C-centric design. However, this generality comes at the cost of missing language-aware safeguards, such as CPP's implicit alignment with C's tokenization rules for directive processing, potentially requiring more manual oversight to avoid expansion errors that only surface during later compilation stages.
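The difference is easiest to see on non-C input. In the sketch below (the host name, port, and configuration lines are invented for illustration), m4 performs its usual blind textual substitution with no notion of C tokens or string literals:

    define(`HOST', `db.example.com')dnl
    define(`PORT', `5432')dnl
    connection = "HOST:PORT"
    listen on HOST port PORT

m4 rewrites both lines—including the text inside the double quotes—yielding connection = "db.example.com:5432" and listen on db.example.com port 5432, which is exactly what a configuration generator wants; CPP, given equivalent #define directives, would never expand HOST or PORT inside the C string literal, because it treats its input as a stream of C tokens rather than plain text.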

With Template Engines

Template engines are software tools designed to combine templates with dynamic data models to generate output documents, such as HTML, often embedding logic like conditionals and loops that are evaluated at runtime within a host programming language. For instance, Jinja2 in Python allows placeholders for Python-like code that render final content by inserting runtime variables and executing control structures based on application data.

In contrast to general-purpose macro processors, which perform purely static text substitution and expansion before execution—replacing macro names with predefined text without runtime evaluation—template engines support dynamic insertion of data, runtime-dependent conditionals, and integration with host languages for flexible content generation. Macro processors like m4 operate on textual input to produce fixed output through pre-execution expansion, lacking the ability to handle variables or logic that change during program runtime, whereas template engines such as Mustache or Jinja2 enable conditional rendering based on live data feeds.

Macro processors are typically employed for build-time code generation and configuration, as seen in tools like Autotools where m4 expands macros to create portable build scripts from templates without runtime dependencies. Template engines, however, excel in server-side rendering scenarios, such as generating personalized pages from database queries in web applications, where dynamic data like user information drives the output. Unlike macro processors, which do not provide built-in mechanisms for context-specific escaping to prevent issues like injection attacks, template engines often include automatic escaping features to ensure safe output when rendering user-supplied data.

While some overlaps exist—such as m4 being repurposed for simple templating tasks in early web development due to its text manipulation capabilities—general-purpose macro processors prioritize simplicity by avoiding runtime dependencies, unlike template engines that rely on host environments for dynamic behavior. This distinction has led to migrations where static macro approaches inspire but are supplanted by more adaptable templating systems in modern dynamic applications.
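As a small sketch of that overlap (the USER and TITLE macros and the page itself are invented), m4 can stand in for a template engine only when every value is known at expansion time:

    define(`USER', `Alice')dnl
    define(`TITLE', `Welcome')dnl
    <html><head><title>TITLE</title></head>
    <body><p>Hello, USER!</p></body></html>

The page is fixed the moment m4 runs—there is no request context, no conditional rendering on live data, and no automatic HTML escaping—whereas a Jinja2 template would fill an equivalent {{ user }} placeholder from runtime data on each request.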
