PdfTeX
| pdfTeX | |
|---|---|
| Original author | Hàn Thế Thành |
| Developer | The pdfTeX team |
| Stable release | 1.40.27[1] |
| Repository | |
| Operating system | Multiplatform |
| Type | Typesetting |
| License | GNU General Public License |
| Website | www |
The computer program pdfTeX, sometimes typeset as pdfTeX, is an extension of Knuth's typesetting program TeX, and was originally written and developed into a publicly usable product by Hàn Thế Thành as a part of the work for his PhD thesis at the Faculty of Informatics, Masaryk University, Brno, Czech Republic. The idea of making this extension to TeX was conceived during the early 1990s, when Jiří Zlatuška and Phil Taylor discussed some developmental ideas with Donald Knuth at Stanford University. Knuth later met Hàn Thế Thành in Brno during his visit to the Faculty of Informatics to receive an honorary doctorate from Masaryk University.
Two prominent characteristics of pdfTeX are character protrusion, which generalizes the concept of hanging punctuation, and font expansion, an implementation of Hermann Zapf's ideas for improving the grayness of a typeset page. Both extend the core paragraph breaking routine. They are discussed in Thành's PhD thesis.[2]
pdfTeX is included in most modern distributions of LaTeX and ConTeXt (including TeX Live, MacTeX, and MiKTeX)[3] and used as the default TeX engine.[4][5] The main difference between TeX and pdfTeX is that whereas TeX outputs DVI files, pdfTeX can output PDF files directly. This allows tight integration of PDF features such as hypertext links and tables of contents, using packages such as hyperref. On the other hand, packages (such as PSTricks) which exploit the earlier conversion process of DVI-to-PostScript may fail, although replacements such as PGF/TikZ have been written. Direct embedding of PostScript graphics is no longer functional, and one has to use a program such as eps2pdf to convert EPS files to PDF, which can then be directly inserted by pdfTeX.
It is possible to obtain DVI output from pdfTeX. This DVI output should be identical to that of TeX, unless pdfTeX's extra microtypography features have been activated. Moreover, since LaTeX, ConTeXt et al. are simply macro packages for TeX, they work equally well with pdfTeX. Hence pdflatex, for example, calls the pdfTeX program using the standard LaTeX macros to typeset LaTeX documents. pdfTeX was also formerly the default rendering engine for ConTeXt documents; current versions of ConTeXt use LuaMetaTeX as the default rendering engine.[6]
Features
pdfTeX has several features not available in standard TeX:
- Native TrueType and Type 1 font embedding
- Micro-typographic extensions such as margin kerning and font expansion
- Direct access to PDF-specific features such as hyperlinks, tables of contents and document information
References
- ^ "NEWS".
- ^ "Micro-typographic extensions to the TEX typesetting system" (PDF). pragma-ade.com. October 2000. Retrieved 2025-01-09.
- ^ "TeX catalogue online". Archived from the original on 2013-09-04. Retrieved 2007-09-12.
- ^ "Documentation - TeX Live - TeX Users Group". www.tug.org. Retrieved 2020-11-14.
- ^ Christian Schenk : MiKTeX 2.5: pdfetex becomes default engine. dojo.miktex.org. Archived 2007-09-07 at the Wayback Machine
- ^ "LuaMetaTeX - README". GitHub. Retrieved 2024-08-18.
External links
- pdfTeX project page
- pdfTeX manual
- Micro-typographic extensions to the TeX typesetting system - dissertation by Hàn Thế Thành
- 2008 interview
PdfTeX
tex2pdf to address the need for native PDF generation in TeX workflows.[3] The project evolved through community contributions, including from Sebastian Rahtz, Hans Hagen, and Hartmut Henkel, and was integrated into distributions like teTeX by Thomas Esser before becoming a standard component of TeX Live and MiKTeX.[3][1] Donald Knuth, the creator of TeX, endorsed pdfTeX, affirming its compatibility with the TeX ecosystem.[3]
Key features of pdfTeX include support for embedded Type 1 fonts, virtual fonts, hyperlinks, and compression algorithms such as LZW (later supplemented by ZIP), which improve PDF file efficiency.[3] It also incorporates micro-typographic extensions from Thành's PhD work, completed in 2001, enabling advanced line-breaking and font expansion for superior typesetting precision, often utilized via packages like microtype.[3][2] pdfTeX extends TeX's backend in C for performance, leveraging Web2C and Kpathsea libraries, and supports formats like Plain TeX and LaTeX.[3][1]
Maintained primarily by Hàn Thế Thành and the pdfTeX Team as part of the TeX Live releases, pdfTeX remains a foundational tool in the TeX community, widely used for high-quality document preparation, including multilingual support such as for Vietnamese through related projects like vnTeX.[1][3] Its source code is available under the GNU General Public License, with ongoing development hosted in the TeX Users Group repository.[1][2]
History
Conception and Early Development
The conception of pdfTeX emerged in the early 1990s amid the growing adoption of Adobe's Portable Document Format (PDF), which was first released in June 1993 as a standard for digital document distribution.[4] Jiří Zlatuška and Phil Taylor, in collaboration with Donald Knuth at Stanford University, discussed ideas for extending TeX to generate PDF output directly, aiming to address the limitations of TeX's traditional Device Independent (DVI) format and the reliance on external conversion tools such as dvipdfm.[3] This initiative sought to leverage PDF's advantages, including native support for font embedding and hyperlinks, thereby streamlining workflows for TeX users in an era of increasing digital publishing needs.[3]
The primary development of pdfTeX was led by Hàn Thế Thành as part of his doctoral research at Masaryk University in Brno, Czech Republic, under the supervision of Jiří Zlatuška.[3] Beginning in 1994 during his Master's work, Thành initially explored rewriting TeX in a high-level language like Prolog but pivoted to integrating a PDF backend directly into TeX's typesetting engine.[3] This effort built on the foundational discussions with Knuth and others, focusing on bridging TeX's output constraints with PDF's capabilities for self-contained, portable documents without intermediate processing steps.[5]
Early prototypes achieved initial success in 1994 by generating basic PDF files, such as a simple "Hello, world!" output, while Thành's Master's thesis completed in 1996 incorporated core PDF features like Type 1 font handling and hyperlink support.[3] The project culminated in Thành's 2000 PhD thesis, formally titled Micro-typographic extensions to the TeX typesetting system, which was published in May 2001 and detailed the integration of PDF generation as a foundational extension to TeX.[5][6] This academic work, supported by the TeX community including early contributions from Sebastian Rahtz in promotion and mailing list setup, established pdfTeX as a viable tool for direct PDF typesetting from TeX sources.[3]
Major Releases
pdfTeX's first public release occurred in 1996, developed by Hàn Thế Thành, and introduced the capability to generate PDF output directly from TeX sources, marking a significant advancement in integrating TeX with the emerging PDF format. A major milestone came with the completion of Thành's PhD thesis on micro-typographic extensions in 2001, solidifying pdfTeX's foundation as a robust extension of TeX.[7] pdfTeX was integrated into TeX Live distributions starting around 2003, broadening its accessibility and adoption within the TeX community through standardized packaging and updates.[8]
The engine evolved to support advanced PDF features by 2007, including enhancements like JBIG2 compression in version 1.40.0, in response to advancing PDF standards for better document portability and efficiency.[9] Microtypographic enhancements, such as improved font expansion and protrusion, were added in the early 2000s, as part of the 2001 PhD thesis, to refine typesetting quality. Version 1.40.0, released in 2007, introduced shell escape support for pipes and other primitives that facilitated compatibility with emerging engines like LuaTeX through later hooks and shared primitives.[9]
Following Hàn Thế Thành's primary involvement, maintenance transitioned to a collaborative effort by the pdfTeX team, including Martin Schröder and others, under the oversight of the TeX Users Group (TUG), ensuring ongoing stability and compatibility.[10] The latest release as of November 2025 is version 1.40.28, issued around May 2025, which addressed bug fixes for font handling (particularly overlapping text issues with certain fonts like ptmr8r) and improved support for PDF/A compliance through enhanced output controls.[11]
Technical Overview
Extension of TeX
pdfTeX serves as a direct derivative of Donald Knuth's original TeX typesetting engine, achieved by extending its WEB source code to incorporate PDF output capabilities while retaining compatibility with the conventional DVI format. This modification enables the engine to produce fully formed PDF files straight from TeX input files, eliminating the reliance on external tools like dvips for PostScript conversion or distillation to PDF. The extension maintains the foundational structure of TeX, ensuring that documents processed by pdfTeX remain fully interchangeable with those generated by the original engine in DVI mode.[5]
A primary architectural change involves the introduction of specialized TeX primitives tailored for PDF handling, such as \pdfoutput to toggle between PDF (value 1) and DVI (value 0) output modes, and \pdfcompresslevel to adjust the compression intensity of PDF object streams from 0 (none) to 9 (maximum). Additional primitives like \pdfpagewidth, \pdfpageheight, \pdfhorigin, and \pdfvorigin allow precise control over page geometry and positioning in the PDF backend. These additions integrate seamlessly into TeX's macro language without disrupting its core typesetting mechanisms, preserving algorithms for paragraph breaking (based on Knuth's total-fit method) and mathematical formula rendering exactly as in the original TeX. This design ensures that pdfTeX processes standard TeX and LaTeX documents without requiring alterations to their source code for basic functionality.[5]
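As a rough illustration of how these primitives appear in a source file, the following Python sketch assembles the assignments a build script might prepend to a plain TeX document. The function name `pdf_preamble` and its default page size are assumptions made for this example; only the primitive names and the 0 to 9 compression range come from the description above.

```python
def pdf_preamble(pdf_output=True, compress_level=9,
                 page_width="210mm", page_height="297mm"):
    """Return TeX assignments for the output primitives described above.

    The helper itself is hypothetical; it only formats text and does not
    run pdfTeX.
    """
    if not 0 <= compress_level <= 9:
        raise ValueError("\\pdfcompresslevel must be between 0 and 9")
    lines = [
        f"\\pdfoutput={1 if pdf_output else 0}",   # 1 = PDF mode, 0 = DVI mode
        f"\\pdfcompresslevel={compress_level}",    # 0 = none, 9 = maximum
        f"\\pdfpagewidth={page_width}",
        f"\\pdfpageheight={page_height}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(pdf_preamble(compress_level=0), end="")
```

Setting the compression level to 0, as in the usage line, is mainly useful when the resulting PDF is to be inspected in a text editor.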
The engine upholds TeX's device-independent philosophy by transitioning the default output backend from DVI to PDF, which supports the direct embedding of page descriptions, fonts, and graphics as vector elements within the file structure. Unlike DVI's reliance on device drivers for final rendering, PDF's self-contained format allows for immediate, high-fidelity viewing and printing across platforms. In adapting TeX's box and glue model to this PDF context, pdfTeX accommodates scalable output resolutions through mechanisms like vector font embedding (e.g., Type 1 and TrueType formats) and primitives such as \pdfimageresolution for raster inclusions, enabling consistent quality at arbitrary scales without pixelation. This handling extends the flexibility of TeX's horizontal and vertical spacing glues to PDF's coordinate system, supporting precise positioning of typesetting elements in a resolution-independent manner.[5]
PDF Output Mechanism
pdfTeX generates PDF files directly from TeX source input by extending the core TeX engine to produce PDF objects and streams instead of the traditional DVI format. When the primitive \pdfoutput is set to a positive value, pdfTeX activates PDF mode, processing the TeX input through the standard typesetting pipeline while diverting output to PDF structures. This direct approach eliminates the need for intermediate DVI files and subsequent conversion tools like dvipdfm, enabling seamless integration of PDF-specific features during compilation.
The output pipeline begins with TeX's standard processing of input tokens into horizontal and vertical material, assembled into boxes via primitives like \hbox and \vbox. These boxes are accumulated until a \shipout command is invoked, typically at page breaks. In PDF mode, pdfTeX translates each shipout box into a PDF page object, defined as an indirect object with /Type /Page, which references a content stream for drawing operations and a resources dictionary for fonts, colors, and other elements. The content stream consists of PostScript-like PDF operators, such as q for saving the graphics state, BT and ET for delimiting text blocks, Tf for font selection, Td for text positioning, and TJ for showing text, generated from the box's internal representation of glyphs, rules, and images. This translation occurs via whatsit nodes inserted during TeX's horizontal and vertical list building, ensuring that PDF code is emitted at shipout time without altering TeX's core logic.[12]
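To make the operator sequence concrete, here is a minimal Python sketch that writes the content stream for one line of text. It is not pdfTeX's own code: the resource name `F1`, the coordinates, and both function names are assumptions chosen for illustration.

```python
def escape_pdf_string(text):
    """Escape the characters that are special inside a PDF literal string."""
    return (text.replace("\\", "\\\\")
                .replace("(", "\\(")
                .replace(")", "\\)"))


def text_content_stream(text, font="F1", size=10, x=72, y=720):
    """Build a tiny page content stream that shows one string."""
    ops = [
        "q",                                   # save graphics state
        "BT",                                  # begin text object
        f"/{font} {size} Tf",                  # select font resource and size
        f"{x} {y} Td",                         # move to the text position
        f"[({escape_pdf_string(text)})] TJ",   # show text (array form)
        "ET",                                  # end text object
        "Q",                                   # restore graphics state
    ]
    return "\n".join(ops) + "\n"


if __name__ == "__main__":
    print(text_content_stream("Hello, world!"), end="")
```

A real page stream interleaves many such text runs with kerning adjustments inside the TJ array, plus rules and image placements.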
pdfTeX constructs the PDF file structure on-the-fly, numbering objects sequentially (e.g., page objects as even-numbered indirect objects starting from 2) and writing them as direct or indirect entities. Content streams and other compressible elements, such as font dictionaries and image data, are generated as byte streams supporting Flate compression (zlib-based, levels 0-9 controlled by \pdfcompresslevel, default 9) for efficient file sizes; LZW compression is not natively supported in modern versions. Object references are managed using primitives like \pdfobj for creation and \pdfrefobj for indirect referencing, culminating in a cross-reference table (xref) at the file's end. This table lists byte offsets for each object, pointed to by a startxref keyword, enabling random access in PDF viewers. The trailer dictionary references the root catalog (object 1), which links to the page tree for navigation.[12][13]
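The relationship between indirect objects, the cross-reference table, and startxref can be sketched in a few lines of Python. This is an illustrative skeleton rather than a model of pdfTeX's writer: the object layout, the Helvetica font entry, and the helper names `flate_stream` and `build_pdf` are all assumptions.

```python
import zlib


def flate_stream(data, level=9):
    """Wrap data as a Flate-compressed PDF stream object body."""
    compressed = zlib.compress(data, level)
    header = b"<< /Length %d /Filter /FlateDecode >>\nstream\n" % len(compressed)
    return header + compressed + b"\nendstream"


def build_pdf(objects):
    """Serialize object bodies (numbered 1..n) and append xref and trailer."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))               # byte offset of "n 0 obj"
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"            # entry for the free object 0
    for off in offsets:
        out += b"%010d 00000 n \n" % off       # one 20-byte entry per object
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out), offsets


pdf, offsets = build_pdf([
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
    b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 595 842] "
    b"/Resources << /Font << /F1 5 0 R >> >> /Contents 4 0 R >>",
    flate_stream(b"BT /F1 10 Tf 72 720 Td (Hello, world!) Tj ET"),
    b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
])
```

Because every offset is recorded as the object is written, the table can be emitted in a single pass at the end of the file, which is why a viewer can locate any object by reading only the last few lines first.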
Font embedding in pdfTeX ensures PDF portability by converting TeX's native Packed Font (PK) and virtual fonts into embeddable subsets of Type 1 PostScript or TrueType formats. During font loading, pdfTeX consults the pdftex.map file, which maps TeX font names (TFM) to physical font files, encodings, and subsetting instructions (e.g., cmr10 CMR10 <cmr10.pfb for embedding a Type 1 subset). For PK bitmap fonts generated from Metafont, pdfTeX embeds them at a default resolution (e.g., 600 dpi, adjustable via \pdfpkresolution), but prefers vector outlines: Type 1 fonts are embedded directly if available (often via tools like mf2pt1 for outline conversion), while TrueType fonts support native embedding with glyph subsetting to include only used characters. Glyph outlining for compatibility is handled by including only necessary paths in the font dictionary, referenced in page resources via /Font arrays, preventing rasterization in viewers.[12][13]
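A map entry of the simple kind quoted above can be split into its parts with a few lines of Python. This is a simplified sketch: real map files also carry quoted PostScript instructions and other flags that this hypothetical `parse_map_line` helper does not interpret.

```python
def parse_map_line(line):
    """Split a simple map entry into TFM name, PostScript name, and files.

    Tokens starting with "<" name files to embed (fonts or encodings);
    the first other token after the TFM name is taken as the PostScript
    font name. Anything else is ignored in this simplified sketch.
    """
    tokens = line.split()
    entry = {"tfm": tokens[0], "ps_name": None, "embed": []}
    for tok in tokens[1:]:
        if tok.startswith("<"):
            entry["embed"].append(tok.lstrip("<["))
        elif entry["ps_name"] is None:
            entry["ps_name"] = tok
    return entry


if __name__ == "__main__":
    print(parse_map_line("cmr10 CMR10 <cmr10.pfb"))
```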
pdfTeX supports PDF versions up to 1.7 in recent releases, set via \pdfmajorversion (default 1) and \pdfminorversion (default 7 in TeX Live 2025), allowing features like transparency and JPEG2000 images while maintaining backward compatibility. Optimizations include object compression (\pdfobjcompresslevel, 0-3) to reduce file size by Flate-encoding dictionary streams, though primary use remains full-document generation for efficiency.[13][14][12]
Features
Microtypography
pdfTeX introduces advanced microtypographic features that refine typesetting for improved optical margins and readability, extending beyond standard TeX capabilities by incorporating subtle adjustments to character positioning and font scaling. These enhancements, developed by Hàn Thế Thành, draw inspiration from the hz-program created by typographer Hermann Zapf and digital font expert Peter Karow, which emphasized algorithmic justification through optical testing and discrete manipulations.[5]
Character protrusion in pdfTeX generalizes the concept of hanging punctuation, allowing glyphs such as quotation marks, parentheses, hyphens, and periods to extend slightly into the left or right margins based on predefined font metrics. This adjustment creates visually straighter text edges without altering the overall line length, particularly beneficial for ragged-right or justified alignments. The feature is controlled by the primitive \pdfprotrudechars, where a value of 0 or negative disables it, 1 enables basic protrusion without affecting line breaking, and values of 2 or higher integrate it into the paragraph-breaking algorithm for optimal results. Protrusion factors are specified using \lpcode for left margins and \rpcode for right margins, with values in thousandths of an em unit (e.g., \rpcode<FontId><Character>=700 shifts a hyphen 70% of its width into the right margin). These factors are derived from extensive optical tests involving hundreds of participants, ensuring perceptual evenness; for instance, opening quotes often receive a left protrusion of around 700 units, while closing quotes get 500 units on the right.[15][5]
Font expansion enables algorithmic stretching or shrinking of entire fonts in small increments to achieve more even justification, thereby reducing the frequency of hyphenation and improving interword spacing uniformity. This is activated via \pdfadjustspacing, with values analogous to protrusion levels: 0 disables it, 1 applies post-line-breaking adjustments, and ≥2 incorporates it during breaking to minimize badness penalties. The primitive \pdffontexpand defines the parameters for a font, including maximum expansion (e.g., 20 thousandths or 2%), maximum condensation (e.g., 10 thousandths), step size (e.g., 5 thousandths), and a scaling factor (typically 1000); this results in discrete variants, such as 20 levels spanning from 0.97 to 1.03 em-widths for fine-grained control. Expansion is supported for scalable fonts like those from METAFONT, Type 1, or Multiple Masters, where widths are adjusted via font matrix scaling or axis interpolation, and character-specific limits can be set with \efcode (default 1000 for full participation). Inspired by Zapf and Karow's work on variance limits, pdfTeX's implementation caps changes at around 3% to avoid noticeable distortion, prioritizing aesthetic balance over exact metrics.[15][5]
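The discrete set of variants implied by a stretch limit, shrink limit, and step can be enumerated directly. The Python sketch below assumes that the usable levels are the multiples of the step lying between the shrink and stretch limits, all in thousandths; the function names are hypothetical.

```python
def expansion_levels(stretch, shrink, step):
    """List expansion levels (in thousandths) between -shrink and +stretch.

    Assumes levels are multiples of the step, which is how the parameters
    are described above; this is a sketch, not pdfTeX's internal logic.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    return [k * step for k in range(-(shrink // step), stretch // step + 1)]


def scale_factor(level):
    """Convert a level in thousandths to a horizontal scale factor."""
    return 1 + level / 1000


if __name__ == "__main__":
    for level in expansion_levels(stretch=20, shrink=10, step=5):
        print(level, scale_factor(level))
```

With the example parameters from the paragraph (20, 10, and 5 thousandths) this gives seven levels, from 1% narrower to 2% wider than the unmodified font.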
Margin kerning in pdfTeX fine-tunes inter-glyph spacing specifically at line beginnings and ends to better align with page edges, enhancing the overall ragged or justified text aesthetics by optically straightening margins. This builds on protrusion by applying targeted kerns between the last (or first) character and the margin, accounting for shapes like serifs or counters in letters such as 'A' or 'V' that create uneven visual boundaries. The algorithm operates in two modes similar to protrusion and expansion, using dynamic programming to evaluate breakpoints with adjusted badness scores that include excess width from kerning (e.g., badness ≈ 100ρ³, where ρ incorporates kerning contributions). Factors are calibrated through optical tests, with discrete steps of 100 units or finer, ensuring adjustments remain imperceptible yet effective for readability; for example, a slight inward kern on protruding elements prevents overextension. These kerning values, like protrusion, stem from Zapf and Karow's empirical methods, validated in pdfTeX through user studies at TeX user group meetings.[5][15]
Implementation details in pdfTeX emphasize efficiency and compatibility, with protrusion and kerning factors stored in font metric files (e.g., TFM extensions) for quick access during typesetting, and expansion generating on-the-fly variants without full font regeneration. The system uses TeX's existing horizontal list processing, inserting kerns or selecting expanded glyphs post- or during paragraph formation, with a total algorithmic complexity of O(n²) or optimized O(mn) via dynamic programming for breakpoint selection. Optimal settings, such as 3% maximum manipulation, were refined from optical tests with 300 participants across NTG, DANTE, and GUST events, confirming reduced hyphenation needs and straighter margins without compromising legibility.[5]
PDF-Specific Capabilities
pdfTeX introduces a suite of primitives that enable the creation of interactive PDF elements directly within TeX documents, surpassing the limitations of traditional DVI-to-PostScript workflows. These features allow users to embed hyperlinks, annotations, and navigational structures, making documents more dynamic and user-friendly for digital viewing. By integrating these capabilities at the engine level, pdfTeX ensures that PDFs are self-contained and portable across PDF viewers without requiring additional post-processing tools.[16]
For hyperlink and annotation support, pdfTeX provides primitives such as \pdfdest to define named destinations within the document, which serve as targets for internal navigation. Users can create clickable links using \pdfstartlink and \pdfendlink, specifying actions like "goto" for jumping to destinations or "uri" for external web addresses, along with attributes for borders and colors to customize appearance. These primitives also support annotations, including notes and highlights, through link actions that can trigger PDF annotation objects, thereby facilitating threaded comments and interactive feedback in documents. Additionally, bookmarks are generated via \pdfoutline, which builds a hierarchical outline tree for quick access to sections, enhancing document navigability in PDF readers.[16]
Metadata embedding is handled through commands like \pdfinfo, which populates the PDF's Info dictionary with fields such as title, author, subject, keywords, creation date, and producer information. The \pdfcatalog primitive further allows customization of the document catalog, including compliance flags for standards like PDF/A by setting entries for viewer preferences and open actions. This ensures that essential document metadata is preserved and searchable, promoting long-term archival and interoperability. For outline and table of contents generation, \pdfoutline can be combined with LaTeX's \tableofcontents machinery by assigning outline entries to section headings, creating a navigable structure that reflects the document's hierarchy without manual intervention.[16]
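As a hedged illustration of what a \pdfinfo call looks like in source form, the Python sketch below formats Info-dictionary fields as PDF name and literal-string pairs. The helper name `pdfinfo` is an assumption, and the function deliberately rejects characters that would need TeX-level or PDF-level escaping instead of trying to escape them.

```python
# Characters that are special to TeX or to PDF literal strings; handling
# them correctly needs more care than this sketch attempts.
UNSAFE = set("\\(){}%#~")


def pdfinfo(**fields):
    """Format keyword arguments as a \\pdfinfo{...} call.

    Keys become PDF names (for example Title, Author) and values become
    PDF literal strings in parentheses.
    """
    parts = []
    for key, value in fields.items():
        if UNSAFE & set(value):
            raise ValueError(f"unsupported character in {key!r}")
        parts.append(f"/{key} ({value})")
    return "\\pdfinfo{" + " ".join(parts) + "}"


if __name__ == "__main__":
    print(pdfinfo(Title="pdfTeX notes", Author="A. Writer"))
```

In LaTeX documents this step is usually left to a package such as hyperref, which takes care of the escaping that the sketch avoids.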
In terms of advanced graphics, pdfTeX supports transparency via the alpha channel in PNG images and direct PDF literals using \pdfliteral to apply blending modes and opacity to vector elements. Shading patterns, such as axial or radial gradients, are implemented through \pdfliteral for embedding raw PDF shading dictionaries, enabling sophisticated visual effects like smooth color transitions. Embedded images are managed with \pdfximage, which incorporates formats like JPEG, PNG, JBIG2, and even PDF pages directly into the output, with options for scaling and resolution to maintain quality while bypassing PostScript distillation. These graphics features, combined with native font embedding during PDF generation, contribute to compact, high-fidelity documents that retain visual integrity across platforms.[16]
Usage and Integration
Command-Line Usage
pdfTeX is invoked from the command line using the pdftex executable, which processes a TeX source file and generates a PDF output directly. The basic command is pdftex filename.tex, where filename.tex is the input TeX file; if the extension is omitted, pdfTeX assumes .tex. This compiles the source to filename.pdf by default, along with auxiliary files.[13]
Several common flags allow customization of the processing behavior. The -output-directory=DIR option specifies a directory for writing output files such as the PDF and logs, enabling organized file management in multi-file projects. For error handling, -interaction=mode sets the interaction level, with modes including batchmode for non-interactive runs, nonstopmode to continue past errors, scrollmode for partial feedback, and errorstopmode to halt at errors. The -jobname=STRING flag customizes the base name for output files, overriding the input filename. Additionally, -halt-on-error enforces strict execution by stopping at the first error encountered.[13]
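The flags above compose in a straightforward way. The following Python sketch builds an argument list from them; the function name `build_pdftex_command` is an assumption, and the code only constructs the command without running it.

```python
INTERACTION_MODES = {"batchmode", "nonstopmode", "scrollmode", "errorstopmode"}


def build_pdftex_command(tex_file, output_dir=None, interaction=None,
                         jobname=None, halt_on_error=False):
    """Assemble a pdftex argument list from the options described above."""
    cmd = ["pdftex"]
    if output_dir is not None:
        cmd.append(f"-output-directory={output_dir}")
    if interaction is not None:
        if interaction not in INTERACTION_MODES:
            raise ValueError(f"unknown interaction mode: {interaction}")
        cmd.append(f"-interaction={interaction}")
    if jobname is not None:
        cmd.append(f"-jobname={jobname}")
    if halt_on_error:
        cmd.append("-halt-on-error")
    cmd.append(tex_file)                      # input file goes last
    return cmd


if __name__ == "__main__":
    print(" ".join(build_pdftex_command(
        "report.tex", output_dir="build",
        interaction="nonstopmode", halt_on_error=True)))
```

On a machine with a TeX distribution installed, the returned list could be passed to `subprocess.run`; keeping it as a list avoids shell quoting problems with file names that contain spaces.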
pdfTeX handles inputs through standard TeX mechanisms, such as the \input{file} primitive, which incorporates additional TeX files during processing without requiring command-line specification. Standard outputs include the primary .pdf file, a .log file detailing the compilation process, and a .aux file for auxiliary data used in multi-pass compilations.[13]
Error diagnostics in pdfTeX feature extended logging tailored to PDF generation, with the .log file capturing details on issues like font subsetting failures, where atypical fonts may not embed properly. This logging aids troubleshooting by reporting warnings and errors specific to PDF output, such as embedding constraints or subsetting limitations.[13]
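A build script can pull the relevant lines out of a .log file with a simple scan. The sketch below assumes the usual conventions that error lines begin with "!" and that warnings from the PDF backend contain the text "pdfTeX warning"; the function name and the sample log text are made up for illustration.

```python
def log_problems(log_text):
    """Return (kind, line) pairs for error and warning lines in a log.

    Assumes errors start with "!" and backend warnings contain
    "pdfTeX warning"; other diagnostics are ignored in this sketch.
    """
    problems = []
    for line in log_text.splitlines():
        if line.startswith("!"):
            problems.append(("error", line))
        elif "pdfTeX warning" in line:
            problems.append(("warning", line))
    return problems


if __name__ == "__main__":
    # Synthetic log excerpt, not output from a real run.
    sample = (
        "This is pdfTeX\n"
        "! Undefined control sequence.\n"
        "l.3 \\foo\n"
        "pdfTeX warning: example message\n"
    )
    for kind, line in log_problems(sample):
        print(kind, line)
```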
