Fat binary
A fat binary (or multiarchitecture binary) is a computer executable program or library which has been expanded (or "fattened") with code native to multiple instruction sets, and which can consequently be run on multiple processor types.[1] This results in a file larger than a normal one-architecture binary file, hence the name.
The usual method of implementation is to include a version of the machine code for each instruction set, preceded by a single entry point with code compatible with all operating systems, which executes a jump to the appropriate section. Alternative implementations store different executables in different forks, each with its own entry point that is directly used by the operating system.
The use of fat binaries is not common in operating system software; there are several alternatives to solve the same problem, such as the use of an installer program to choose an architecture-specific binary at install time (such as with Android multiple APKs), selecting an architecture-specific binary at runtime (such as with Plan 9's union directories and GNUstep's fat bundles),[2][3] distributing software in source code form and compiling it in-place, or the use of a virtual machine (such as with Java) and just-in-time compilation.
Apollo
Apollo's compound executables
In 1988, Apollo Computer's Domain/OS SR10.1 introduced a new file type, "cmpexe" (compound executable), that bundled binaries for Motorola 680x0 and Apollo PRISM executables.[4]
Apple
Apple's fat binary
A fat-binary scheme smoothed the Apple Macintosh's transition, beginning in 1994, from 68k microprocessors to PowerPC microprocessors. Many applications for the old platform ran transparently on the new platform under an evolving emulation scheme, but emulated code generally runs slower than native code. Applications released as "fat binaries" took up more storage space, but they ran at full speed on either platform. This was achieved by packaging both a 68000-compiled version and a PowerPC-compiled version of the same program into their executable files.[5][6] The older 68K code (CFM-68K or classic 68K) continued to be stored in the resource fork, while the newer PowerPC code was contained in the data fork, in PEF format.[7][8][9]
Fat binaries were larger than programs supporting only the PowerPC or 68k, which led to the creation of a number of utilities that would strip out the unneeded version.[5][6] In the era of small hard drives, when 80 MB hard drives were a common size, these utilities were sometimes useful, as program code was generally a large percentage of overall drive usage, and stripping the unneeded members of a fat binary would free up a significant amount of space on a hard drive.
NeXT's/Apple's multi-architecture binaries
NeXTSTEP Multi-Architecture Binaries
Fat binaries were a feature of NeXT's NeXTSTEP/OPENSTEP operating system, starting with NeXTSTEP 3.1. In NeXTSTEP, they were called "Multi-Architecture Binaries". Multi-Architecture Binaries were originally intended to allow software to be compiled to run both on NeXT's Motorola 68k-based hardware and on Intel IA-32-based PCs running NeXTSTEP, with a single binary file for both platforms.[10] The format was later used to allow OPENSTEP applications to run on PCs and the various RISC platforms OPENSTEP supported. Multi-Architecture Binary files are in a special archive format, in which a single file stores one or more Mach-O subfiles, one for each architecture supported by the Multi-Architecture Binary. Every Multi-Architecture Binary starts with a structure (struct fat_header) containing two unsigned integers. The first integer ("magic") is used as a magic number to identify the file as a fat binary. The second integer ("nfat_arch") defines how many Mach-O files the archive contains (how many instances of the same program for different architectures). After this header follow nfat_arch fat_arch structures (struct fat_arch); each defines the offset (from the start of the file) at which its Mach-O subfile is found, along with its size, its alignment, and the CPU type and subtype it targets.
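The header layout described above can be sketched in a few lines of Python. The constants follow the published mach-o/fat.h definitions (all fields are big-endian); the two slices built here are synthetic examples, not a real executable.

```python
import struct

FAT_MAGIC = 0xCAFEBABE          # magic number of a fat/universal binary
CPU_TYPE_X86_64 = 0x01000007    # CPU_TYPE_X86 | CPU_ARCH_ABI64
CPU_TYPE_ARM64 = 0x0100000C     # CPU_TYPE_ARM | CPU_ARCH_ABI64

def parse_fat_header(data):
    """Parse struct fat_header plus its nfat_arch fat_arch entries."""
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a fat binary")
    archs = []
    for i in range(nfat_arch):
        # struct fat_arch: cputype, cpusubtype, offset, size, align
        cputype, cpusubtype, offset, size, align = struct.unpack_from(
            ">IIIII", data, 8 + i * 20)
        archs.append({"cputype": cputype, "cpusubtype": cpusubtype,
                      "offset": offset, "size": size, "align": align})
    return archs

# Build a synthetic two-slice header for illustration only
header = struct.pack(">II", FAT_MAGIC, 2)
header += struct.pack(">IIIII", CPU_TYPE_X86_64, 3, 0x4000, 0x1000, 14)
header += struct.pack(">IIIII", CPU_TYPE_ARM64, 0, 0x8000, 0x2000, 14)
archs = parse_fat_header(header)
```

Each dictionary in `archs` corresponds to one Mach-O subfile that a loader would seek to and map.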
The version of the GNU Compiler Collection shipped with the Developer Tools was able to cross-compile source code for the different architectures on which NeXTSTEP was able to run. The target architectures could be selected by passing multiple '-arch' options (each taking an architecture as its argument). This was a convenient way to distribute a program for NeXTSTEP running on different architectures.
It was also possible to create libraries (e.g. using NeXTSTEP's libtool) combining object files built for different targets.
Mach-O and Mac OS X
Apple Computer acquired NeXT in 1996 and continued to work with the OPENSTEP code. Mach-O became the native object file format in Apple's free Darwin operating system (2000) and Apple's Mac OS X (2001), and NeXT's Multi-Architecture Binaries continued to be supported by the operating system. Under Mac OS X, Multi-Architecture Binaries can be used to support multiple variants of an architecture, for instance to have different versions of 32-bit code optimized for the PowerPC G3, PowerPC G4, and PowerPC 970 generations of processors. It can also be used to support multiple architectures, such as 32-bit and 64-bit PowerPC, or PowerPC and x86, or x86-64 and ARM64.[11]
Apple's Universal binary
In 2005, Apple announced another transition, from PowerPC processors to Intel x86 processors. Apple promoted the distribution of new applications that support both PowerPC and x86 natively by using executable files in the Multi-Architecture Binary format.[12] Apple calls such programs "Universal applications" and calls the file format "Universal binary", perhaps to distinguish the new transition from the previous one, or from other uses of the Multi-Architecture Binary format.
The Universal binary format was not necessary for forward migration of pre-existing native PowerPC applications; from 2006 to 2011, Apple supplied Rosetta, a PowerPC (PPC)-to-x86 dynamic binary translator, to play this role. However, Rosetta had a fairly steep performance overhead, so developers were encouraged to offer both PPC and Intel binaries, using Universal binaries. The obvious cost of a Universal binary is that every installed executable file is larger, but in the years since the introduction of the PPC, hard-drive capacities have greatly outstripped executable sizes; while a Universal binary might be double the size of a single-platform version of the same application, free space generally dwarfs the code size, making this a minor issue. In fact, a Universal-binary application is often smaller than two single-architecture applications would be, because program resources can be shared rather than duplicated. If not all of the architectures are required, the lipo and ditto command-line applications can be used to remove versions from the Multi-Architecture Binary image, thereby creating what is sometimes called a thin binary.
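A lipo-style thinning step amounts to copying out the one slice whose fat_arch entry matches the requested CPU type. A minimal sketch, using the fat-header layout from mach-o/fat.h on a synthetic file (the payloads are placeholder bytes, and CPU types 18 and 7 are the documented PowerPC and i386 values):

```python
import struct

FAT_MAGIC = 0xCAFEBABE

def thin(fat_data, wanted_cputype):
    """Copy out the single-architecture slice for one CPU type."""
    magic, nfat_arch = struct.unpack_from(">II", fat_data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a fat binary")
    for i in range(nfat_arch):
        cputype, _sub, offset, size, _align = struct.unpack_from(
            ">IIIII", fat_data, 8 + i * 20)
        if cputype == wanted_cputype:
            return fat_data[offset:offset + size]
    raise LookupError("requested architecture not present")

# Synthetic two-slice file: 48-byte header padded out, then two payloads
CPU_PPC, CPU_I386 = 18, 7
header = struct.pack(">II", FAT_MAGIC, 2)
header += struct.pack(">IIIII", CPU_PPC, 0, 64, 4, 2)
header += struct.pack(">IIIII", CPU_I386, 3, 128, 4, 2)
blob = header.ljust(64, b"\0") + b"PPC!".ljust(64, b"\0") + b"X86!"
```

Calling `thin(blob, CPU_I386)` returns just the second payload, which is essentially what writing a thin binary to disk does.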
In addition, Multi-Architecture Binary executables can contain code for both 32-bit and 64-bit versions of PowerPC and x86, allowing applications to be shipped in a form that supports 32-bit processors but that makes use of the larger address space and wider data paths when run on 64-bit processors.
In versions of the Xcode development environment from 2.1 through 3.2 (running on Mac OS X 10.4 through Mac OS X 10.6), Apple included utilities which allowed applications to be targeted for both Intel and PowerPC architecture; universal binaries could eventually contain up to four versions of the executable code (32-bit PowerPC, 32-bit x86, 64-bit PowerPC, and 64-bit x86). However, PowerPC support was removed from Xcode 4.0 and is therefore not available to developers running Mac OS X 10.7 or greater.
In 2020, Apple announced another transition, this time from Intel x86 processors to Apple silicon (ARM64 architecture). To smooth the transition, Apple added support for the Universal 2 binary format; Universal 2 binary files are Multi-Architecture Binary files containing both x86-64 and ARM64 executable code, allowing the binary to run natively on both 64-bit Intel and 64-bit Apple silicon. Additionally, Apple introduced Rosetta 2, an x86-64-to-ARM64 dynamic binary translator, to allow users to run applications that do not have Universal binary variants.
Apple Fat EFI binary
In 2006, Apple switched from PowerPC to Intel CPUs, and replaced Open Firmware with EFI. However, by 2008, some of their Macs used 32-bit EFI and some used 64-bit EFI. For this reason, Apple extended the EFI specification with "fat" binaries that contained both 32-bit and 64-bit EFI binaries.[13]
CP/M and DOS
Combined COM-style binaries for CP/M-80 and DOS
CP/M-80, MP/M-80, Concurrent CP/M, CP/M Plus, Personal CP/M-80, SCP and MSX-DOS executables for the Intel 8080 (and Zilog Z80) processor families use the same .COM file extension as DOS-compatible operating systems for Intel 8086 binaries.[nb 1] In both cases programs are loaded at offset +100h and executed by jumping to the first byte in the file.[14][15] As the opcodes of the two processor families are not compatible, attempting to start a program under the wrong operating system leads to incorrect and unpredictable behaviour.
In order to avoid this, some methods have been devised to build fat binaries which contain both a CP/M-80 and a DOS program, preceded by initial code which is interpreted correctly on both platforms.[15] The methods either combine two fully functional programs, each built for its corresponding environment, or add stubs which cause the program to exit gracefully if started on the wrong processor. For this to work, the first few instructions (sometimes also called gadget headers[16]) in the .COM file have to be valid code for both 8086 and 8080 processors, causing the two processors to branch to different locations within the code.[16] For example, the utilities in Simeon Cran's emulator MyZ80 start with the opcode sequence EBh, 52h, EBh.[17][18] An 8086 sees this as a jump and reads its next instruction from offset +154h, whereas an 8080 or compatible processor goes straight through and reads its next instruction from +103h. A similar sequence used for this purpose is EBh, 03h, C3h.[19][20] John C. Elliott's FATBIN[21][22][23] is a utility to combine a CP/M-80 and a DOS .COM file into one executable.[17][24] His derivative of the original PMsfx modifies archives created by Yoshihiko Mino's PMarc to be self-extracting under both CP/M-80 and DOS, starting with EBh, 18h, 2Dh, 70h, 6Dh, 73h, 2Dh so as to also include the "-pms-" signature for self-extracting PMA archives,[25][17][24][18] thereby also representing a form of executable ASCII code.
Another method to keep a DOS-compatible operating system from erroneously executing .COM programs for CP/M-80 and MSX-DOS machines[15] is to start the 8080 code with C3h, 03h, 01h, which is decoded as a "RET" instruction by x86 processors, thereby gracefully exiting the program,[nb 2] while it is decoded as a "JP 103h" instruction by 8080 processors, which simply jump to the next instruction in the program. Similarly, the CP/M assembler Z80ASM+ by SLR Systems would display an error message when erroneously run under DOS.[17]
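The arithmetic behind these dual-decode headers is easy to check. A .COM image loads at offset +100h on both platforms; an 8086 decodes EBh as a two-byte relative short jump, while an 8080 decodes C3h as a three-byte absolute jump. A small sketch verifying the offsets quoted above:

```python
LOAD = 0x100  # .COM programs load at offset 100h under both CP/M-80 and DOS

def x86_jmp_short_target(rel8):
    """8086: EB rel8 jumps relative to the end of the 2-byte instruction."""
    return LOAD + 2 + rel8

def i8080_jp_target(lo, hi):
    """8080: C3 lo hi jumps to the absolute little-endian address."""
    return (hi << 8) | lo

# MyZ80's header EB 52 EB: the 8086 jumps to +154h, while the 8080
# reads the same bytes as harmless instructions and falls through.
x86_target = x86_jmp_short_target(0x52)

# The C3 03 01 trick: the x86 sees C3 as RET and exits gracefully,
# while the 8080 sees JP 0103h, i.e. a jump to the next instruction.
i8080_target = i8080_jp_target(0x03, 0x01)
```

Both computed targets match the offsets given in the text (+154h and +103h respectively).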
Some CP/M-80 3.0 .COM files may have one or more RSX overlays attached to them by GENCOM.[26] If so, they start with an extra 256-byte header (one page). In order to indicate this, the first byte in the header is set to magic byte C9h, which works both as a signature identifying this type of COM file to the CP/M 3.0 executable loader, as well as a "RET" instruction for 8080-compatible processors which leads to a graceful exit if the file is executed under older versions of CP/M-80.[nb 2]
C9h is never appropriate as the first byte of a program for any x86 processor (it has different meanings for different generations,[nb 3] but is never a meaningful first byte); the executable loader in some versions of DOS rejects COM files that start with C9h, avoiding incorrect operation.
Similar overlapping code sequences have also been devised for combined Z80/6502,[17] 8086/68000[17] or x86/MIPS/ARM binaries.[16]
Combined binaries for CP/M-86 and DOS
CP/M-86 and DOS do not share a common file extension for executables.[nb 1] Thus, it is not normally possible to confuse executables. However, early versions of DOS had so much in common with CP/M in terms of architecture that some early DOS programs were developed to share binaries containing executable code. One program known to do this was WordStar 3.2x, which used identical overlay files in its ports for CP/M-86 and MS-DOS,[27] and used dynamically fixed-up code to adapt to the differing calling conventions of these operating systems at runtime.[27]
Digital Research's GSX for CP/M-86 and DOS also shares binary-identical 16-bit drivers.[28]
Combined COM and SYS files
DOS device drivers (typically with file extension .SYS) start with a file header whose first four bytes are FFFFFFFFh by convention, although this is not a requirement.[29] This field is fixed up dynamically by the operating system when the driver loads (typically in the DOS BIOS when it executes DEVICE statements in CONFIG.SYS). Since DOS does not refuse to load files with a .COM extension via DEVICE and does not test for FFFFFFFFh, it is possible to combine a COM program and a device driver into the same file[30][29] by placing a jump instruction to the entry point of the embedded COM program within the first four bytes of the file (three bytes are usually sufficient).[29] If the embedded program and the device-driver sections share a common portion of code or data, the shared code must cope with being loaded at offset +100h as a .COM-style program and at +0000h as a device driver.[30] For shared code loaded at the "wrong" offset but not designed to be position-independent, this requires an internal address fix-up[30] similar to what would otherwise have been carried out by a relocating loader, except that in this case it has to be done by the loaded program itself; this is similar to the situation with self-relocating drivers, but with the program already loaded at the target location by the operating system's loader.
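The trick with the first four bytes can be sketched as follows: a three-byte near jump (E9h rel16) fits inside the space normally occupied by the FFFFFFFFh driver link field, and rel16 is relative to the byte after the instruction. The entry-point offset below is a hypothetical value chosen only for illustration.

```python
import struct

COM_ENTRY = 0x40   # hypothetical file offset of the embedded COM entry point

# Bytes 0..2: E9 rel16, executed only when the file runs as a .COM
# program; DOS rewrites the four-byte link field when loading it as a
# driver and never executes these bytes. Byte 3 is a pad byte.
jump = struct.pack("<BH", 0xE9, COM_ENTRY - 3) + b"\xFF"

# rel16 is relative to the end of the 3-byte instruction, so the
# decoded target (as a file offset) is 3 + rel16.
target = 3 + struct.unpack_from("<H", jump, 1)[0]
```

The jump fills exactly the four-byte link field and lands on the embedded program's entry point, as the text describes.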
Crash-protected system files
Under DOS, some files, by convention, have file extensions which do not reflect their actual file type.[nb 4] For example, COUNTRY.SYS[31] is not a DOS device driver,[nb 5] but a binary NLS database file for use with the CONFIG.SYS COUNTRY directive and the NLSFUNC driver.[31] Likewise, the PC DOS and DR-DOS system files IBMBIO.COM and IBMDOS.COM are special binary images loaded by bootstrap loaders, not COM-style programs.[nb 5] Trying to load COUNTRY.SYS with a DEVICE statement or executing IBMBIO.COM or IBMDOS.COM at the command prompt will cause unpredictable results.[nb 4][nb 6]
It is sometimes possible to avoid this by utilizing techniques similar to those described above. For example, DR-DOS 7.02 and higher incorporate a safety feature developed by Matthias R. Paul:[32] If these files are called inappropriately, tiny embedded stubs will just display some file version information and exit gracefully.[33][32][34][31] Additionally, the message is specifically crafted to follow certain "magic" patterns recognized by the external NetWare & DR-DOS VERSION file identification utility.[31][32][nb 7]
A similar protection feature was the 8080 instruction C7h ("RST 0") at the very start of Jay Sage's and Joe Wright's Z-System type-3 and type-4 "Z3ENV" programs[35][36] as well as "Z3TXT" language overlay files,[37] which would result in a warm boot (instead of a crash) under CP/M-80 if loaded inappropriately.[35][36][37][nb 2]
In a distantly similar fashion, many (binary) file formats by convention include a 1Ah byte (ASCII ^Z) near the beginning of the file. This control character is interpreted as a "soft" end-of-file (EOF) marker when a file is opened in non-binary mode, and thus, under many operating systems (including the PDP-6 monitor[38] and RT-11, VMS, TOPS-10,[39] CP/M,[40][41] DOS,[42] and Windows[43]), it prevents "binary garbage" from being displayed when a file is accidentally printed at the console.
Linux
[edit]FatELF: Universal binaries for Linux
FatELF[44] was a fat binary implementation for Linux and other Unix-like operating systems. Technically, a FatELF binary is a concatenation of ELF binaries with some metadata indicating which binary to use on what architecture.[45] In addition to abstracting over the CPU architecture (byte order, word size, CPU instruction set, etc.), FatELF binaries can support multiple kernel ABIs and versions.
FatELF has several use-cases, according to developers:[44]
- Distributions no longer need separate downloads for the various platforms.
- Separate /lib, /lib32 and /lib64 trees are no longer required in the OS directory structure.
- The correct binary and libraries are chosen centrally by the system instead of by shell scripts.
- If the ELF ABI changes someday, legacy users can still be supported.
- Distribution of web-browser plug-ins that work out of the box on multiple platforms.
- Distribution of one application file that works across Linux and BSD OS variants, without a platform compatibility layer on them.
- One hard-drive partition can be booted on machines with different CPU architectures, for development and experimentation: same root file system, different kernel and CPU architecture.
- Applications provided on a network share or USB stick will work on multiple systems. This is also helpful for creating portable applications and cloud-computing images for heterogeneous systems.[46]
A proof-of-concept Ubuntu 9.04 image is available.[47] As of 2021, FatELF has not been integrated into the mainline Linux kernel.[48][49]
Windows
[edit]Fatpack
Although the Portable Executable format used by Windows cannot hold code for more than one architecture in a single executable, it is still possible to make a loader program that dispatches based on architecture. This works because desktop versions of Windows on ARM support 32-bit x86 emulation, making 32-bit x86 a useful "universal" machine-code target. Fatpack is a loader that demonstrates the concept: it is a 32-bit x86 program that tries to run the executables packed into its resource sections one by one.[50]
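Fatpack's dispatch idea (try each packed executable until one starts on the current machine) can be sketched as a loop over candidate commands. The candidates below are placeholders, not Fatpack's actual payloads:

```python
import subprocess
import sys

def run_first_that_works(candidates):
    """Try each candidate command in turn, in the spirit of Fatpack's
    x86 stub trying the executables packed into its resources."""
    for argv in candidates:
        try:
            return subprocess.run(argv, check=True)
        except (OSError, subprocess.CalledProcessError):
            continue  # wrong architecture or missing: try the next one
    raise RuntimeError("no packed executable runs on this machine")

# Placeholder candidates: a nonexistent "native" build first, then a
# fallback that is known to start (standing in for the emulated payload).
result = run_first_that_works([
    ["/nonexistent/app-arm64.exe"],
    [sys.executable, "-c", "pass"],
])
```

A real loader would extract the payloads to disk first and could also consult the reported processor architecture instead of relying purely on trial and error.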
Arm64X
When developing Windows 11 for ARM64, Microsoft introduced a new way to extend the Portable Executable format, called Arm64X.[51] An Arm64X binary contains all the content that would otherwise be in separate x64/Arm64EC and Arm64 binaries, merged into one more efficient file on disk. The Visual C++ toolset has been upgraded to support producing such binaries. Where building full Arm64X binaries is technically difficult, developers can build Arm64X pure forwarder DLLs instead.[52]
Similar concepts
The following approaches are similar to fat binaries in that multiple versions of machine code serving the same purpose are provided in the same file.
Heterogeneous computing
Since 2007, some specialized compilers for heterogeneous platforms have produced code files for parallel execution on multiple types of processors. For example, the CHI (C for Heterogeneous Integration) compiler from the Intel EXOCHI (Exoskeleton Sequencer) development suite extends the OpenMP pragma concept for multithreading to produce fat binaries containing code sections for different instruction set architectures (ISAs), from which the runtime loader can dynamically initiate parallel execution on the multiple available CPU and GPU cores in a heterogeneous system environment.[53][54]
Introduced in 2006, Nvidia's parallel computing platform CUDA (Compute Unified Device Architecture) is software enabling general-purpose computing on GPUs (GPGPU). Its LLVM-based compiler NVCC can create ELF-based fat binaries containing so-called PTX virtual assembly (as text), which the CUDA runtime driver can later just-in-time compile into SASS (Streaming Assembler) binary executable code for the target GPU actually present. The executables can also include so-called CUDA binaries (aka cubin files) containing dedicated executable code sections for one or more specific GPU architectures, from which the CUDA runtime can choose at load time.[55][56][57][58][59][60] Fat binaries are also supported by GPGPU-Sim, a GPU simulator introduced in 2007.[61][62]
Multi2Sim (M2S), an OpenCL heterogeneous system simulator framework (originally only for either MIPS or x86 CPUs, but later extended to also support ARM CPUs and GPUs like the AMD/ATI Evergreen & Southern Islands as well as Nvidia Fermi & Kepler families)[63] supports ELF-based fat binaries as well.[64][63]
Fat objects
GNU Compiler Collection (GCC) and LLVM do not have a fat binary format, but they do have fat object files for link-time optimization (LTO). Since LTO involves delaying compilation to link time, the object files must store the intermediate representation (IR), but machine code may need to be stored too (for speed or compatibility). An LTO object containing both IR and machine code is known as a fat object.[65]
Function multi-versioning
Even in a program or library intended for the same instruction set architecture, a programmer may wish to make use of some newer instruction set extensions while keeping compatibility with an older CPU. This can be achieved with function multi-versioning (FMV): versions of the same function are written into the program, and a piece of code decides which one to use by detecting the CPU's capabilities (such as through CPUID). Intel C++ Compiler, GCC, and LLVM all have the ability to automatically generate multi-versioned functions.[66] This is a form of dynamic dispatch without any semantic effects.
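The FMV dispatch pattern can be sketched as a load-time resolver that picks one of several equivalent implementations. The feature names and function variants below are illustrative stand-ins for CPUID-guarded machine-code versions:

```python
def dot_generic(a, b):
    """Baseline implementation that runs on any CPU."""
    return sum(x * y for x, y in zip(a, b))

def dot_avx2(a, b):
    """Stand-in for a hand-vectorized AVX2 variant of the same function.
    In real FMV this would be a separately compiled code path."""
    return sum(x * y for x, y in zip(a, b))

def resolve(features):
    """Pick the best implementation once, like the resolver an FMV
    compiler emits to inspect CPUID at program load time."""
    if "avx2" in features:
        return dot_avx2
    return dot_generic

# On a CPU reporting only SSE2, the resolver falls back to the baseline;
# callers always invoke the same name and never see the dispatch.
dot = resolve({"sse2"})
result = dot([1, 2, 3], [4, 5, 6])
```

Because every variant computes the same result, the dispatch has no semantic effect, exactly as the text notes.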
Many math libraries feature hand-written assembly routines that are automatically chosen according to CPU capability. Examples include glibc, Intel MKL, and OpenBLAS. In addition, the library loader in glibc supports loading from alternative paths for specific CPU features.[67]
A similar, but byte-level granular, approach originally devised by Matthias R. Paul and Axel C. Frinke is to embed a small self-discarding, relaxing and relocating loader into the executable file alongside any number of alternative binary code snippets. At load time, the loader conditionally builds a size- or speed-optimized runtime image of the program or driver necessary to perform (or not perform) a particular function in a particular target environment, through a form of dynamic dead code elimination (DDCE).[68][69][70][71]
See also
- Cross-platform software
- DOS stub
- Fat pointer
- Linear Executable (LX)
- New Executable (NE)
- Portable Executable (PE)
- Position-independent code (PIC)
- Side effect
- Universal hex format, a "fat" hex file format targeting multiple platforms
- Alphanumeric executable, executable code camouflaged as (sometimes even readable) text
- Multi-architecture shellcode, shellcode targeting multiple platforms (and sometimes even camouflaged as alphanumeric text)
Notes
- ^ a b This isn't a problem for CP/M-86-style executables under CP/M-86, CP/M-86 Plus, Personal CP/M-86, S5-DOS, Concurrent CP/M-86, Concurrent DOS, Concurrent DOS 286, FlexOS, Concurrent DOS 386, DOS Plus, Multiuser DOS, System Manager and REAL/32, because these systems use the file extension .CMD rather than .COM for such files. (The .CMD extension, however, conflicts with the file extension for batch jobs written for the command-line processor CMD.EXE under the OS/2 and Windows NT operating system families.)
- ^ a b c This works because a (suitable) return instruction can be used to exit programs under CP/M-80, CP/M-86 and DOS, although the opcodes, exact conditions and underlying mechanisms differ: Under CP/M-80, programs can terminate (that is, warm boot into the BIOS) by jumping to 0 in the zero page, either directly with RST 0 (8080/8085/Z80 opcode C7h), or by calling BDOS function 0 through the CALL 5 interface. Alternatively, as the stack is prepared to hold a 0 return address before passing control to a loaded program, they can, for as long as the stack is flat, also be exited by issuing a RET (opcode C9h) instruction, thereby falling into the terminating code at offset 0 in the zero page. Although DOS has a dedicated INT 20h interrupt as well as INT 21h API sub-functions to terminate programs (which are preferable for more complicated programs), for machine-translated programs DOS also emulates CP/M's behaviour to some extent: A program can terminate itself by jumping to offset 0 in its PSP (the equivalent to CP/M's zero page), where the system had previously planted an INT 20h instruction. Also, a loaded program's initial stack is prepared to hold a word of 0, so that a program issuing a near return RETN (8088/8086 opcode C3h) will implicitly jump to the start of its code segment as well, thereby eventually reaching the INT 20h as well.[a] In CP/M-86, the zero page is structured differently and there is no CALL 5 interface, but the stack return method and BDOS function 0 (but now through INT E0h) both work as well.
- ^ On 8088/8086 processors, the opcode C9h is an undocumented alias for CBh ("RETF", popping CS:IP from the stack), whereas it decodes as "LEAVE" (set SP to BP and pop BP) on 80188/80186 and newer processors.
- ^ a b This problem could have been avoided by choosing non-conflicting file extensions, but, once introduced, these particular file names were retained from very early versions of MS-DOS/PC DOS for compatibility reasons with (third-party) tools hard-wired to expect these specific file names.
- ^ a b Other DOS files of this type are KEYBOARD.SYS, a binary keyboard layout database file for the keyboard driver KEYB under MS-DOS and PC DOS, IO.SYS containing the DOS BIOS under MS-DOS, and MSDOS.SYS, a text configuration file under Windows 95/MS-DOS 7.0 and higher, but originally a binary system file containing the MS-DOS kernel. However, MS-DOS and PC DOS do not provide crash-protected system files at all, and these file names are neither used nor needed in DR-DOS 7.02 and higher, which otherwise does provide crash-protected system files.
- ^ This is the reason why these files have the hidden attribute set, so that they are not listed by default, thereby reducing the risk of being invoked accidentally.
- ^ The COUNTRY.SYS file formats supported by the MS-DOS/PC DOS and the DR-DOS families of operating systems contain similar data but are organized differently and are incompatible. Since the entry points into the data structures are at different offsets in the file, it is possible to create "fat" COUNTRY.SYS databases which could be used under both DOS families.[b] However, DR-DOS 7.02 and its NLSFUNC 4.00 (and higher) include an improved parser capable of reading both types of formats (and variants), even at the same time, so that Janus-headed files are not necessary.[c][d] The shipped files are nevertheless "fat" in that they include a tiny executable stub which just displays an embedded message when invoked inappropriately.[d][b]
References
- ^ Devanbu, Premkumar T.; Fong, Philip W. L.; Stubblebine, Stuart G. (19–25 April 1998). "Techniques for trusted software engineering" (PDF). Proceedings of the 20th International Conference on Software Engineering. IEEE. pp. 126–135 [131]. doi:10.1109/ICSE.1998.671109. ISBN 0-8186-8368-6. ISSN 0270-5257. Archived (PDF) from the original on 2014-01-16. Retrieved 2021-09-29. (10 pages)
- ^ Pero, Nicola (2008-12-18). "gnustep/tools-make: README.Packaging". GitHub. Archived from the original on 2022-05-25. Retrieved 2022-05-26.
- ^ "PackagingDrafts/GNUstep". Fedora Project Wiki. 2009-02-25. Archived from the original on 2022-05-25. Retrieved 2022-05-26.
- ^ "Domain System Software Release Notes, Software Release 10.1" (PDF) (first printing ed.). Chelmsford, Massachusetts, USA: Apollo Computer Inc. December 1988. p. 2-16. Order No. 005809-A03. Archived (PDF) from the original on 2023-05-26. Retrieved 2022-07-24. (256 pages)
- ^ a b Engst, Adam C. (1994-08-22). "Should Fat Binaries Diet?". TidBITS. No. 240. TidBITS Publishing Inc. ISSN 1090-7017. Archived from the original on 2021-09-29. Retrieved 2021-09-29.
- ^ a b Engst, Adam C. (1994-08-29). "Fat Binary Comments". TidBITS. No. 241. TidBITS Publishing Inc. ISSN 1090-7017. Archived from the original on 2021-09-29. Retrieved 2021-09-29.
- ^ "Chapter 1 - Resource Manager / Resource Manager Reference - Resource File Format". Inside Macintosh: Mac OS Runtime Architectures. Apple Computer. 1996-07-06. Archived from the original on 2021-09-29. Retrieved 2021-09-29.
- ^ "Chapter 7 - Fat Binary Programs - Creating Fat Binary Programs". Inside Macintosh: Mac OS Runtime Architectures. Apple Computer. 1997-03-11. Archived from the original on 2021-09-29. Retrieved 2011-06-20. [1]
- ^ "Chapter 8 - PEF Structure". Inside Macintosh: Mac OS Runtime Architectures. Apple Computer. 1997-03-11. Archived from the original on 2021-09-29. Retrieved 2021-09-29.
- ^ Tevanian, Avadis; DeMoney, Michael; Enderby, Kevin; Wiebe, Douglas; Snyder, Garth (1995-07-11) [1993-08-20]. "Method and apparatus for architecture independent executable files" (PDF). Redwood City, California, USA: NeXT Computer, Inc. US patent 5432937A. Archived (PDF) from the original on 2020-12-14. Retrieved 2022-05-26. [2] (9 pages); Tevanian, Avadis; DeMoney, Michael; Enderby, Kevin; Wiebe, Douglas; Snyder, Garth (1997-02-18) [1995-02-28]. "Method and apparatus for architecture independent executable files" (PDF). Redwood City, California, USA: NeXT Computer, Inc. US patent 5604905A. Archived (PDF) from the original on 2022-05-26. Retrieved 2022-05-26. (9 pages)
- ^ "Universal Binaries and 32-bit/64-bit PowerPC Binaries". Mac OS X ABI Mach-O File Format Reference. Apple Inc. 2009-02-04 [2003]. Archived from the original on 2012-04-27.
- ^ Singh, Amit (2006-06-19). "2.6.2 Fat Binaries". Mac OS X Internals - A Systems Approach. Pearson Education. p. 66. ISBN 978-0-13270226-3. Retrieved 2021-09-28.
- ^ "rEFIt - EFI Fat Binaries". refit.sourceforge.net. Retrieved 2022-10-18.
- ^ Paul, Matthias R. (2002-10-07) [2000]. "Re: Run a COM file". Newsgroup: alt.msdos.programmer. Archived from the original on 2017-09-03. Retrieved 2017-09-03. [3] (NB. Has details on DOS COM program calling conventions.)
- ^ a b c Wilkinson, William "Bill" Albert (2005-04-02) [2003, 1999-02-16, February 1987, 1986-11-15, 1986-11-10]. Written at Heath Company, USA. "Something COMmon About MS-DOS and CP/M". REMark. Vol. 8, no. 2. St. Joseph, Michigan, USA: Heath/Zenith Users' Group (HUG). pp. 55–57. #85. P/N 885-2085. Archived from the original on 2021-12-13. [4]
- ^ a b c Cha, Sang Kil; Pak, Brian; Brumley, David; Lipton, Richard Jay (2010-10-08) [2010-10-04]. Platform-Independent Programs (PDF). Proceedings of the 17th ACM conference on Computer and Communications Security (CCS'10). Chicago, Illinois, USA: Carnegie Mellon University, Pittsburgh, Pennsylvania, USA / Georgia Institute of Technology, Atlanta, Georgia, USA. pp. 547–558. doi:10.1145/1866307.1866369. ISBN 978-1-4503-0244-9. Archived (PDF) from the original on 2022-05-26. Retrieved 2022-05-26. [5] (12 pages) (See also: [6]) (NB. Does not address the scenario specifically for 8080 vs. 8086 instruction set architectures (as for CP/M and DOS), but describes the general "self-identifying program" concept of platform-independent programs (PIPs) through what the authors call a gadget header (that is, chunks of program logic not to be confused with ROP gadgets) for x86, MIPS and ARM: i.e. 0Eh, B2h, 02h, A9h, 0Eh, B2h, 02h, 3Ah, 24h, 77h, 01h, 04h or 90h, EBh, 20h, 2Ah, 90h, EBh, 20h, 3Ah, 24h, 77h, 01h, 04h.)
- ^ a b c d e f Wilkinson, William "Bill" Albert; Seligman, Cory; Drushel, Richard F.; Harston, Jonathan Graham; Elliott, John C. (1999-02-17). "MS-DOS & CP/M-Compatible Binaries". Newsgroup: comp.os.cpm. Archived from the original on 2021-12-13. Retrieved 2021-12-13. (NB. Some of the opcodes in Elliott's example code (EBh, 44h, EBh and EBh, 04h, ...) might be mixed up.)
- ^ a b Elliott, John C. (2009-10-27). "CP/M info program". Newsgroup: comp.os.cpm. Archived from the original on 2021-12-13. Retrieved 2021-12-13.
[…] DOS protection feature […] The idea is based on the utilities in Simeon Cran's MYZ80 emulator; the DOS-protection header in those goes one better by not changing any Z80 registers. The magic sequence is EB 52 EB: […] XCHG […] MOV D,D […] XCHG […] but that means the DOS code ends up quite a way away from the start of the program. […] More fun can be had with self-extract PMArc archives. Start one with […] defb 0EBh, 018h, '-pms-' […] and it's treated as a valid archive by the PMA utilities, sends 8086 processors to 011Ah, and Z80 processors to 0130h. […]
- ^ ChristW (2012-11-14) [2012-11-13]. Chen, Raymond (ed.). "Microsoft Money crashes during import of account transactions or when changing a payee of a downloaded transaction". The New Old Thing. Archived from the original on 2018-07-05. Retrieved 2018-05-19.
[…] byte sequence […] EB 03 C3 yy xx […] If you create a .COM file with those 5 bytes as the first ones […] you'll see 'JMP SHORT 3', followed by 3 garbage bytes. […] If you look at a Z80 disassembly […] that translates to 'EX DE,HL; INC BC;' […] The 3rd byte is 'JUMP' followed by the 16-bit address specified as yy xx […] you'll have a .COM file that runs on MS-DOS and […] CP/M […]
(NB. While the author speaks about the Z80, this sequence also works on the 8080 and compatible processors.) - ^ Brehm, Andrew J. (2016). "CP/M and MS-DOS Fat Binary". DesertPenguin.org. Archived from the original on 2018-05-19. Retrieved 2018-05-19. (NB. While the article speaks about the Z80, the code sequence also works on the 8080 and compatible processors.)
- ^ Elliott, John C. (1996-06-13). "Upload to micros.hensa.ac.uk". Newsgroup: comp.os.cpm. Archived from the original on 2021-12-13. Retrieved 2021-12-13.
[…] FATBIN 1.00 - combine a CP/M .COM file and a DOS .COM file to create one which runs on both platforms. […] It was used to create: […] MSODBALL 2.05 - convert floppy discs between Amstrad 706k format and a DOS 706k format. […] Both the programs run under CP/M-80 and DOS. […]
- ^ Elliott, John C. (1998-06-28) [1997-04-01]. "FATBIN v1.01". Archived from the original on 1998-06-28. (NB. FATBN101.COM 22k 1997-04-01 FATBIN v1.01. Creates fat binary files which will run under both CP/M and DOS. Distributed in a self-extracting archive for CP/M-80 and DOS.)
- ^ Elliott, John C. (2002-03-11). "DSKWRITE v1.00". Fossies - the Fresh Open Source Software Archive. Archived from the original on 2021-12-12. Retrieved 2021-12-12.
[…] DSKWRITE.Z80 contains source for the CP/M version. […] DSKWRITE.ASM contains source for the DOS version. […] To get the single .COM file, you need to use FBMAKE: […]
[7] (NB. Mentions FBMAKE from the FATBNSEA.COM package.) - ^ a b Elliott, John C. (2012-06-20) [2005-01-05]. "Generic CP/M". Seasip.info. Archived from the original on 2021-11-17. Retrieved 2021-12-12.
[…] Self-extracting archives are .COM files containing a number of smaller files. When you run one, it will create its smaller files […] The self-extract archive programs will run under DOS (2 or later) or CP/M, with identical effects. To extract them under Unix, you can use ZXCC […] FATBNSEA.COM […] FATBIN combines a CP/M-80 .COM file and a DOS .COM file to produce one that will work on both systems. […] M3C4SEA.COM […] M3CONV version 4 - converts Spectrum snapshots in the .Z80 or .SNA format to or from Multiface 3 format (Multiface 3 -> Z80 only on a PC). […] PMSFX21X.COM […] PMSFX is the program that was used to generate these self-unpacking archives. This version (2.11) can generate archives which unpack themselves under CP/M or DOS. You will need PMARC to use PMSFX. New: Under DOS, it supports exact file sizes. […] SP2BMSEA.COM […] Converts a Stop Press Canvas file to a Windows .BMP […]
[8] - ^ Elliott, John C. (1997-01-18) [1997-01-11]. "PMSFX 2". Newsgroup: comp.os.cpm. Archived from the original on 2021-12-13. Retrieved 2021-12-13.
[…] I've written a version of PMSFX that produces .COM files unpackable under DOS and CP/M (the first three bytes are both legal Z80 code, legal 8086 code and legal PMA header) […] as a self-extracting archive. […]
- ^ Elliott, John C.; Lopushinsky, Jim (2002) [1998-04-11]. "CP/M 3 COM file header". Seasip.info. Archived from the original on 2016-08-30. Retrieved 2016-08-29.
- ^ a b Necasek, Michal (2018-01-30) [2018-01-28, 2018-01-26]. "WordStar Again". OS/2 Museum. Archived from the original on 2019-07-28. Retrieved 2019-07-28.
[…] The reason to suspect such difference is that version 3.2x also supported CP/M-86 (the overlays are identical between DOS and CP/M-86, only the main executable is different) […] the .OVR files are 100% identical between DOS and CP/M-86, with a flag (clearly shown in the WordStar 3.20 manual) switching between them at runtime […] the OS interface in WordStar is quite narrow and well abstracted […] the WordStar 3.2x overlays are 100% identical between the DOS and CP/M-86 versions. There is a runtime switch which chooses between calling INT 21h (DOS) and INT E0h (CP/M-86). WS.COM is not the same between DOS and CP/M-86, although it's probably not very different either. […]
- ^ Lineback, Nathan. "GSX Screen Shots". Toastytech.com. Archived from the original on 2020-01-15. Retrieved 2020-01-15.
- ^ a b c Paul, Matthias R. (2002-04-11). "Re: [fd-dev] ANNOUNCE: CuteMouse 2.0 alpha 1". freedos-dev. Archived from the original on 2020-02-21. Retrieved 2020-02-21.
[…] FreeKEYB is […] a true .COM and .SYS driver (tiny model) in one. You can safely overwrite the first JMP, that's part of what I meant by "tricky header". […] you can replace the FFFFh:FFFFh by a 3-byte jump and a pending DB FFh. It works with MS-DOS, PC DOS, DR-DOS, and most probably any other DOS issue as well. […]
- ^ a b c Paul, Matthias R. (2002-04-06). "Re: [fd-dev] ANNOUNCE: CuteMouse 2.0 alpha 1". freedos-dev. Archived from the original on 2020-02-07. Retrieved 2020-02-07.
[…] Add a SYS device driver header to the driver, so that CTMOUSE could be both in one, a normal TSR and a device driver - similar to our FreeKEYB advanced keyboard driver. […] This is not really needed under DR DOS because INSTALL= is supported since DR DOS 3.41+ and DR DOS preserves the order of [D]CONFIG.SYS directives […] but it would […] improve the […] flexibility on MS-DOS/PC DOS systems, which […] always execute DEVICE= directives prior to any INSTALL= statements, regardless of their order in the file. […] software may require the mouse driver to be present as a device driver, as mouse drivers have always been device drivers back in the old times. These mouse drivers have had specific device driver names depending on which protocol they used ("PC$MOUSE" for Mouse Systems Mode for example), and some software may search for these drivers in order to find out the correct type of mouse to be used. […] Another advantage would be that device drivers usually consume less memory (no environment, no PSP) […] It's basically a tricky file header, a different code to parse the command line, a different entry point and exit line, and some segment magics to overcome the ORG 0 / ORG 100h difference. Self-loadhighing a device driver is a bit more tricky as you have to leave the driver header where it is and only relocate the remainder of the driver […]
- ^ a b c d Paul, Matthias R. (2001-06-10) [1995]. "DOS COUNTRY.SYS file format" (COUNTRY.LST file) (1.44 ed.). Archived from the original on 2016-04-20. Retrieved 2016-08-20.
- ^ a b c Paul, Matthias R. (1997-07-30) [1994-05-01]. "Chapter II.4. Undokumentierte Eigenschaften externer Kommandos - SYS.COM". NWDOS-TIPs — Tips & Tricks rund um Novell DOS 7, mit Blick auf undokumentierte Details, Bugs und Workarounds. MPDOSTIP (in German) (3 ed.). Archived from the original on 2017-09-10. Retrieved 2014-08-06.
For a future update of Caldera's OpenDOS 7.01, I have modified the startup code of IBMBIO.COM so that, if erroneously started as a normal program, it returns to the command line without crashing. When this safety feature will make it into the official release, however, is not yet foreseeable.
(NB. NWDOSTIP.TXT is a comprehensive work on Novell DOS 7 and OpenDOS 7.01, including the description of many undocumented features and internals. It is part of the author's yet larger MPDOSTIP.ZIP collection maintained up to 2001 and distributed on many sites at the time. The provided link points to an HTML-converted older version of the NWDOSTIP.TXT file.) [9] - ^ Paul, Matthias R. (1997-10-02). "Caldera OpenDOS 7.01/7.02 Update Alpha 3 IBMBIO.COM README.TXT". Archived from the original on 2003-10-04. Retrieved 2009-03-29. [10]
- ^ DR-DOS 7.03 WHATSNEW.TXT - Changes from DR-DOS 7.02 to DR-DOS 7.03. Caldera, Inc. 1998-12-24. Archived from the original on 2019-04-08. Retrieved 2019-04-08.
- ^ a b Sage, Jay (May–June 1988). Carlson, Art (ed.). "ZCPR 3.4 - Type-4 Programs". The Computer Journal (TCJ) - Programming, User Support, Applications. ZCPR3 Corner (32). Columbia Falls, Montana, USA: 10–17 [16]. ISSN 0748-9331. ark:/13960/t1wd4v943. Retrieved 2021-11-29. [11][12]
- ^ a b Sage, Jay (May–June 1992) [March–June 1992]. Carlson, Art; McEwen, Chris (eds.). "Type-3 and Type-4 Programs". The Computer Journal (TCJ) - Programming, User Support, Applications. Z-System Corner - Some New Applications of Type-4 Programs (55). S. Plainfield, New Jersey, USA: Socrates Press: 13–19 [14, 16]. ISSN 0748-9331. ark:/13960/t4dn54d22. Retrieved 2021-11-29. [13][14]
- ^ a b Sage, Jay (November–December 1992). Carlson, Art; Kibler, Bill D. (eds.). "Regular Feature, ZCPR Support, Language Independence, part 2". The Computer Journal (TCJ) - Programming, User Support, Applications. The Z-System Corner (58). Lincoln, CA, USA: 7–10. ISSN 0748-9331. ark:/13960/t70v9g87h. Retrieved 2020-02-09.
[…] there was an opcode of "RST 0", which, if executed, would result in a warm boot. A file containing a Z3TXT module should never be executed, but at a cost of one byte we could protect ourself against that outside chance. The header also contained the string of characters "Z3TXT" followed by a null (0) byte. Many Z-System modules include such identifiers. In this category are resident command packages (RCPs), flow command packages (FCPs), and environment descriptor modules (Z3ENVs). Programs, such as Bridger Mitchell's […] JETLDR.COM, that load these modules from files into memory can use the ID string to validate the file, that is, to make sure that it is the kind of module that the user has stated it to be. User mistakes and damaged files can thus be detected. […] The header, thus, now stands as follows: […] rst […] db 'Z3TXT',0 ; null-terminated ID […] ; 12345678 ; must be 8 characters, […] db 'PROGNAME' ; pad with spaces […] ; 123 ; must be 3 characters […] db 'ENG' ; name of language […] dw LENGTH ; length of module […]
[15][16] - ^ "Table of IO Device Characteristics - Console or Teletypewriters". PDP-6 Multiprogramming System Manual (PDF). Maynard, Massachusetts, USA: Digital Equipment Corporation (DEC). 1965. p. 43. DEC-6-0-EX-SYS-UM-IP-PRE00. Archived (PDF) from the original on 2014-07-14. Retrieved 2014-07-10. (1+84+10 pages)
- ^ "5.1.1.1. Device Dependent Functions - Data Modes - Full-Duplex Software A(ASCII) and AL(ASCII Line)". PDP-10 Reference Handbook: Communicating with the Monitor - Time-Sharing Monitors (PDF). Vol. 3. Digital Equipment Corporation (DEC). 1969. pp. 5-3–5-6 [5-5 (431)]. Archived (PDF) from the original on 2011-11-15. Retrieved 2014-07-10. (207 pages)
- ^ "2. Operating System Call Conventions". CP/M 2.0 Interface Guide (PDF) (1 ed.). Pacific Grove, California, USA: Digital Research. 1979. p. 5. Archived (PDF) from the original on 2020-02-28. Retrieved 2020-02-28.
[…] The end of an ASCII file is denoted by a control-Z character (1AH) or a real end of file, returned by the CP/M read operation. Control-Z characters embedded within machine code files (e.g., COM files) are ignored, however, and the end of file condition returned by CP/M is used to terminate read operations. […]
(56 pages) - ^ Hogan, Thom (1982). "3. CP/M Transient Commands". Osborne CP/M User Guide - For All CP/M Users (2 ed.). Berkeley, California, USA: A. Osborne/McGraw-Hill. p. 74. ISBN 0-931988-82-9. Retrieved 2020-02-28.
[…] CP/M marks the end of an ASCII file by placing a CONTROL-Z character in the file after the last data character. If the file contains an exact multiple of 128 characters, in which case adding the CONTROL-Z would waste 127 characters, CP/M does not do so. Use of the CONTROL-Z character as the end-of-file marker is possible because CONTROL-Z is seldom used as data in ASCII files. In a non-ASCII file, however, CONTROL-Z is just as likely to occur as any other character. Therefore, it cannot be used as the end-of-file marker. CP/M uses a different method to mark the end of a non-ASCII file. CP/M assumes it has reached the end of the file when it has read the last record (basic unit of disk space) allocated to the file. The disk directory entry for each file contains a list of the disk records allocated to that file. This method relies on the size of the file, rather than its content, to locate the end of the file. […]
[17][18] - ^ BC_Programmer (2010-01-31) [2010-01-30]. "Re: Copy command which merges several files tags the word SUB at the end". Computer Hope Forum. Archived from the original on 2020-02-26. Retrieved 2020-02-26.
- ^ "What are the differences between Linux and Windows .txt files (Unicode encoding)". Superuser. 2011-08-03 [2011-06-08]. Archived from the original on 2020-02-26. Retrieved 2020-02-26.
- ^ a b Gordon, Ryan C. (October 2009). "FatELF: Universal Binaries for Linux". icculus.org. Archived from the original on 2020-08-27. Retrieved 2010-07-13.
- ^ Gordon, Ryan C. (November 2009). "FatELF specification, version 1". icculus.org. Archived from the original on 2020-08-27. Retrieved 2010-07-25.
- ^ Windisch, Eric (2009-11-03). "Subject: Newsgroups: gmane.linux.kernel, Re: FatELF patches..." gmane.org. Archived from the original on 2016-11-15. Retrieved 2010-07-08.
- ^ Gordon, Ryan C. (2009). "FatELF: Universal Binaries for Linux. - The proof-of-concept virtual machine download page". icculus.org. Archived from the original on 2022-05-21. Retrieved 2022-05-26. (NB. VM image of Ubuntu 9.04 with Fat Binary support.)
- ^ Holwerda, Thom (2009-11-05). "Ryan Gordon Halts FatELF Project". Linux. osnews.com. Archived from the original on 2022-05-26. Retrieved 2010-07-05.
- ^ Brockmeier, Joe "Zonker" (2010-06-23). "SELF: Anatomy of an (alleged) failure". LWN.net. Linux Weekly News. Archived from the original on 2022-05-26. Retrieved 2011-02-06.
- ^ Mulder, Sijmen J. (2021-03-06) [2018-04-25]. "sjmulder/fatpack - Build multi-architecture 'fat' binaries for Windows". GitHub. Archived from the original on 2022-05-26. Retrieved 2022-05-26.
- ^ "Arm64X PE files". learn.microsoft.com. Microsoft. 2022-08-13. Archived from the original on 2023-08-20. Retrieved 2023-03-31.
- ^ "Build Arm64X binaries". learn.microsoft.com. Microsoft. 2023-03-10. Archived from the original on 2023-08-20. Retrieved 2023-03-31.
- ^ Wang, Perry H.; Collins, Jamison D.; Chinya, Gautham N.; Jiang, Hong; Tian, Xinmin; Girkar, Milind; Yang, Nick Y.; Lueh, Guei-Yuan; Wang, Hong (June 2007). "EXOCHI: architecture and programming environment for a heterogeneous multi-core multithreaded system". ACM SIGPLAN Notices. 42 (6): 156–166. doi:10.1145/1273442.1250753. (11 pages)
- ^ Wang, Perry H.; Collins, Jamison D.; Chinya, Gautham N.; Jiang, Hong; Tian, Xinmin; Girkar, Milind; Pearce, Lisa; Lueh, Guei-Yuan; Yakoushkin, Sergey; Wang, Hong (2007-08-22). "Accelerator Exoskeleton" (PDF). Intel Technology Journal. 11: Tera-scale Computing (3). Intel Corporation: 185–196. doi:10.1535/itj.1103. ISSN 1535-864X. Archived (PDF) from the original on 2022-05-26. Retrieved 2022-05-26. (12 of 1+vii+90+1 pages)
- ^ "cudaFatFormat.h / ptxomp.c". 1.13. Nvidia Corporation. 2004-11-15. Archived from the original on 2022-05-26. Retrieved 2022-05-26.
- ^ Harris, Mark J. (2014-05-08) [2013-06-05]. "Technical Walkthrough: CUDA Pro Tip: Understand Fat Binaries and JIT Caching". Nvidia Developer. Nvidia. Archived from the original on 2022-03-23. Retrieved 2022-05-26.
- ^ "CUDA Binary Utilities" (PDF) (Application Note). 6.0. Nvidia. February 2014. DA-06762-001_v6.0. Archived (PDF) from the original on 2022-05-25. Retrieved 2022-05-25.
- ^ "fatbinary - help". helpmanual.io. 8.0. 2016. Archived from the original on 2022-05-25. Retrieved 2022-05-25.
- ^ "CUDA Compiler Driver NVCC - Reference Guide" (PDF). 11.7. Nvidia. May 2022. TRM-06721-001_v11.7. Archived (PDF) from the original on 2022-05-25. Retrieved 2022-05-25.
- ^ Braun, Lorenz; Fröning, Holger (2019-11-18). "CUDA Flux: A Lightweight Instruction Profiler for CUDA Applications" (PDF). 2019 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS). IEEE. pp. 73–81. doi:10.1109/PMBS49563.2019.00014. ISBN 978-1-7281-5977-5. Archived (PDF) from the original on 2022-03-21. Retrieved 2022-05-26.
- ^ Fung, Wilson W. L.; Sham, Ivan; Yuan, George; Aamodt, Tor M. (2007). "Dynamic Warp Formation and Scheduling for Efficient GPU Control Flow" (PDF). Vancouver, British Columbia, Canada. Archived (PDF) from the original on 2022-05-26. Retrieved 2022-05-26. (12 pages)
- ^ Bakhoda, Ali; Yuan, George L.; Fung, Wilson W. L.; Wong, Henry; Aamodt, Tor M. (2009-04-28) [2009-04-26]. "Analyzing CUDA workloads using a detailed GPU simulator" (PDF). 2009 IEEE International Symposium on Performance Analysis of Systems and Software. pp. 163–174. doi:10.1109/ISPASS.2009.4919648. ISBN 978-1-4244-4184-6. Archived (PDF) from the original on 2022-05-26. Retrieved 2022-05-06. [19]
- ^ a b "13.4 The AMD Compiler Wrapper: Fat binaries". The Multi2Sim Simulation Framework - A CPU-GPU Model for Heterogeneous Computing (PDF). v4.2. Multi2Sim. 2013. pp. 173–176 [176]. Archived (PDF) from the original on 2022-05-25. Retrieved 2022-05-25. (4 of 210 pages)
- ^ Ubal, Rafael; Jang, Byunghyun; Mistry, Perhaad; Schaa, Dana; Kaeli, David R. (2012-09-23) [2012-09-19]. "Multi2Sim: A Simulation Framework for CPU-GPU Computing" (PDF). 21st International Conference on Parallel Architectures and Compilation Techniques (PACT). Minneapolis, Minnesota, USA: IEEE. ISBN 978-1-4503-1182-3. Archived (PDF) from the original on 2022-05-25. Retrieved 2022-05-25. (10 pages)
- ^ "LTO Overview (GNU Compiler Collection (GCC) Internals)". gcc.gnu.org. Archived from the original on 2021-09-12. Retrieved 2021-09-12.
- ^ Wennborg, Hans (2018). "Attributes in Clang". Clang 7 documentation. Archived from the original on 2022-04-07. Retrieved 2022-05-26.
- ^ Bahena, Victor Rodriguez (2018-04-03). "Transparent use of library packages optimized for Intel architecture". Power and Performance. Clear Linux Project. Intel Corporation. Archived from the original on 2022-05-26. Retrieved 2022-05-26.
- ^ Paul, Matthias R.; Frinke, Axel C. (1997-10-13) [1991], FreeKEYB - Enhanced DOS keyboard and console driver (User Manual) (v6.5 ed.) [20] (NB. FreeKEYB is a Unicode-based dynamically configurable successor of K3PLUS supporting most keyboard layouts, code pages, and country codes. Utilizing an off-the-shelf macro assembler as well as a framework of automatic pre- and post-processing analysis tools to generate dependency and code morphing meta data to be embedded into the executable file alongside the binary code and a self-discarding, relaxing and relocating loader, the driver implements byte-level granular dynamic dead code elimination and relocation techniques at load-time as well as self-modifying code and reconfigurability at run-time to minimize its memory footprint down to close to the canonical form depending on the underlying hardware, operating system, and driver configuration as well as the selected feature set and locale (about sixty configuration switches with hundreds of options for an almost unlimited number of possible combinations). This complexity and the dynamics are hidden from users, who deal with a single executable file just like they would do with a conventional driver.)
- ^ Paul, Matthias R. (2002-04-06). "[fd-dev] Ctrl+Alt+Del". freedos-dev. Archived from the original on 2019-04-27. Retrieved 2019-04-27.
[…] FreeKEYB builds the driver's runtime image at initialization time depending on the type of machine it is being loaded on, the type of keyboard, layout, country and code page used, the type of mouse and video adapter(s) installed, the other drivers loaded on that system, the operating system and the load and relocation method(s) used, the individual features included, and the configuration options specified in the command line. Due to the large number of command line switches and options supported […] (around fifty switches […] with multiple possible settings) there is a high number of feature combinations with uncountable dependencies […] resulting in […] endless number of […] different target images. FreeKEYB's Dynamic Dead Code Elimination technique manages to resolve […] these […] dependencies and […] remove dead code and data […] is not restricted to […] include or exclude a somewhat limited number of modules or whole sub-routines and fix up some dispatch tables as in classical TSR programming, but […] works […] at […] byte level […] able to remove […] individual instructions in the middle of larger routines […] distributed all over the code to handle a particular case or support a specific feature […] special tools are used to analyze the code […] and create […] fixup tables […] automated […] using conditional defines […] to declare the various cases […] not only optional at assembly time but at initialization time […] without the […] overhead of having at least some amount of dead code left in the runtime image […] to keep track of all the dependencies between […] these conditionals, dynamically build and relocate the runtime image, fix up all the references between these small, changing, and moving binary parts […] still allowing to use the tiny .COM/.SYS style […] model […] is done at initialization time […]
- ^ Paul, Matthias R. (2001-08-21). "[fd-dev] Changing codepages in FreeDOS". freedos-dev. Archived from the original on 2019-04-19. Retrieved 2019-04-20.
[…] a […] unique feature […] we call dynamic dead code elimination, so you can at installation time […] specify which components of the driver you want and which you don't. This goes to an extent of dynamic loadable modularization and late linkage I have not seen under DOS so far. If you do not like the screen saver, macros, the calculator, or mouse support, or <almost anything else>, you can specify this at the command line, and FreeKEYB, while taking all the dependencies between the routines into account, will completely remove all the code fragments, which deal with that feature and are not necessary to provide the requested functionality, before the driver relocates the image into the target location and makes itself resident. […]
- ^ Paul, Matthias R. (2001-04-10). "[ANN] FreeDOS beta 6 released" (in German). Newsgroup: de.comp.os.msdos. Archived from the original on 2017-09-09. Retrieved 2017-07-02.
[…] brand-new feature, dynamic dead code elimination, which assembles and relocates the necessary components of the driver only at installation time, so that no unused code or data areas remain resident (e.g. if someone does not need a particular FreeKEYB feature). […]
Further reading
- Tunney, Justine Alexandra Roberts (2021-02-11). "How Fat Does a Fat Binary Need to Be?". cosmopolitan libc - your build-once run-anywhere c library / Cosmopolitan Communiqué. Archived from the original on 2021-09-12. Retrieved 2021-09-12.; Tunney, Justine Alexandra Roberts (2021-02-11). "How Fat Does a Fat Binary Need to Be?". Hacker News. Archived from the original on 2021-06-01. Retrieved 2021-09-12.
- Tunney, Justine Alexandra Roberts (2020-08-24). "αcτµαlly pδrταblε εxεcµταblε (Ape)". Archived from the original on 2021-09-12. Retrieved 2021-09-12.
- Gotham, Frederick (2020-10-22). "Making a Fat Binary for Linux and Mac". Narkive. Archived from the original on 2021-09-12. Retrieved 2021-09-12.
- Gotham, Frederick (2020-10-24). "Fat Binary - MS-Windows and four Linux". Narkive. Archived from the original on 2021-09-12. Retrieved 2021-09-12.
- Gotham, Frederick (2020-11-02). "Fat Binary - DOS Windows Linux". Narkive. Archived from the original on 2021-09-12. Retrieved 2021-09-12.
- "We develop to WarpUP the Amiga - StormC for PowerPC and p-OS". Haage & Partner GmbH. September 1996. Archived from the original on 2017-12-06. Retrieved 2021-09-29.
- Münch, Matthias (2006) [2005]. "AmigaOS 3.9 - Features". AmigaOS: multimedia, multi-threaded, multi-tasking. Archived from the original on 2021-09-29. Retrieved 2021-09-29.
Fat binary
Overview
Definition
A fat binary is a single executable file or library that incorporates multiple distinct versions of machine code, each compiled for a different CPU architecture, instruction set, or operating environment, enabling execution on diverse hardware by selecting the suitable code variant at runtime. Unlike a thin (single-architecture) binary, which contains code for only one target platform, a fat binary bundles these variants into one file to streamline distribution and deployment across heterogeneous systems.

In the Mach-O format used by Apple ecosystems, the file begins with a header that records the total number of architecture-specific "slices" and their properties, followed by the slices' binary data concatenated in sequence. The header carries a magic number, such as the 32-bit value 0xCAFEBABE in big-endian byte order, so that loaders can identify and parse the format.[4] Other implementations, such as FatELF for Linux, use different structures to achieve similar multi-architecture support.[5] Each per-slice record in a Mach-O fat header contains an architecture identifier giving the CPU type and subtype (enumerated constants for processors such as x86 or ARM), the byte offset at which the slice begins in the file, the slice's size, and an alignment specification for memory efficiency.[6] At runtime, the operating system's loader queries the host CPU's capabilities, walks the header's offset table, and loads and executes the matching slice, so compatibility requires no user intervention.[7]

The term "fat binary" originated with Apple's adoption of multi-architecture executables in 1994 to support the transition from 68k to PowerPC processors, though it has since been applied more broadly to similar formats across computing platforms.[8]
Purpose and Benefits
Fat binaries serve as a unified executable format that encapsulates multiple architecture-specific code variants within a single file, allowing software to operate across diverse hardware platforms without separate distributions or manual reconfiguration by users. This addresses the challenges of multi-architecture environments by enabling seamless execution on systems with differing instruction sets, such as during processor transitions in computing ecosystems.[3]

For developers, the primary benefit is streamlined deployment: a single build artifact supports multiple target architectures, removing the need for parallel compilation pipelines and per-architecture version management. Updates can be packaged once to serve broad compatibility needs, which simplifies maintenance and distribution. For end users, fat binaries automatically select and execute the optimal code slice at runtime, providing native performance without intervention and improving the experience in heterogeneous computing setups. They also facilitate backward compatibility, letting legacy applications run alongside newer hardware, such as during shifts from 32-bit to 64-bit systems or from x86 to ARM architectures.[3]

In practice, fat binaries prove particularly valuable for software updates during CPU migrations, ensuring continuity in enterprise or consumer environments where hardware diversity is common, such as in embedded systems or cross-operating-system tools. Runtime selection relies on the host system's loader, such as a dynamic linker, to parse the binary's header, identify the architecture matching the current CPU, and load the corresponding code slice for execution. This occurs transparently, running native instructions for performance while maintaining portability.[3]
Limitations
One primary limitation of fat binaries is their significantly increased file size, as they embed complete copies of the executable code for each supported architecture. A fat binary can be roughly twice as large as a single-architecture equivalent, or more, depending on the number of architectures, raising demands on storage, download bandwidth, and initial memory allocation.[9]

The enlarged size also introduces loading overhead: the operating system must parse the fat header and extract the relevant architecture slice before execution. If unused code for the other architectures is not properly stripped from memory after loading, it contributes unnecessary bloat and prolonged runtime memory usage.[3]

Creating fat binaries adds complexity to the development workflow, since each target architecture must be compiled separately, often with cross-compilation tools, and the outputs merged with utilities like the lipo command into a unified file. This is straightforward in integrated environments like Xcode but becomes more intricate for projects relying on custom makefiles or scripts, where manual architecture specification and merging steps are necessary. Debugging multi-architecture binaries presents additional challenges, as not all slices can be executed or inspected on every development machine; for example, Apple silicon (arm64) slices cannot be debugged on Intel-based Macs without emulation, limiting comprehensive testing.[3]
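The merge step that tools like lipo perform can be illustrated with a short sketch. This is not Apple's actual implementation; it only mimics the fat-header layout described earlier (big-endian magic, slice count, then per-slice records of CPU type/subtype, file offset, size, and alignment), and the "code" bytes are placeholders rather than real machine code.

```python
import struct

FAT_MAGIC = 0xCAFEBABE  # magic number of a Mach-O fat header (big-endian)

def make_fat(slices, align_pow2=14):
    """Concatenate (cputype, cpusubtype, code) slices behind a fat header,
    roughly what a `lipo -create`-style merge does conceptually.
    2**14 (16 KiB) is a typical slice alignment on Apple platforms."""
    align = 1 << align_pow2
    header = struct.pack(">II", FAT_MAGIC, len(slices))
    base = len(header) + 20 * len(slices)   # 20 bytes per fat_arch record
    records = b""
    body = b""
    for cputype, cpusubtype, code in slices:
        # Round the next free file position up to the alignment boundary.
        offset = (base + len(body) + align - 1) & ~(align - 1)
        records += struct.pack(">IIIII", cputype, cpusubtype,
                               offset, len(code), align_pow2)
        body += b"\0" * (offset - base - len(body)) + code
    return header + records + body

# Hypothetical slices: cputype 7 denotes x86 and 0x0100000C arm64 in
# Mach-O headers; the slice contents here are placeholder bytes.
fat = make_fat([(7, 3, b"\x90" * 8), (0x0100000C, 0, b"\xd4" * 8)])
```

The alignment padding explains part of the size overhead discussed above: each slice starts on a page-aligned boundary, so even tiny slices occupy whole aligned regions in the file.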
In Apple implementations, fat binaries expand the potential attack surface by incorporating multiple distinct code paths, which may introduce architecture-specific vulnerabilities that are harder to audit uniformly. Challenges in code signing and verification arise because all embedded slices must be individually signed and validated, and inconsistencies can occur during the process; for instance, a 2007 vulnerability in Universal Mach-O binary parsing allowed maliciously crafted files to cause system crashes or arbitrary code execution.[10] More recent issues, as of 2018, have enabled attackers to bypass code-signing checks on universal binaries, potentially masquerading malicious code as legitimate.[11] Similar security considerations may apply to other platforms depending on their implementation.
To mitigate these drawbacks, developers often limit inclusion to only the most essential architecture slices during builds, reducing unnecessary bloat while maintaining broad compatibility. Compression techniques applied to individual segments or the entire binary can further decrease file sizes without altering functionality. Hybrid approaches, such as app thinning in distribution platforms like the App Store—where a universal binary is submitted but only the device-specific slice is delivered—or on-demand resource loading via thin wrappers, address size and performance issues by deferring delivery of unused components until needed.[9]
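The loader-side selection described in the sections above can also be sketched in a few lines. This is an illustrative parser for the big-endian fat-header layout, not the code of any real loader; the hand-built blob at the end is a minimal stand-in for a fat file.

```python
import struct

FAT_MAGIC = 0xCAFEBABE

def select_slice(fat_bytes, host_cputype):
    """Walk the fat header the way a loader would: verify the magic,
    read the slice count, scan the per-slice records, and return the
    raw bytes of the slice whose CPU type matches the host CPU."""
    magic, nfat = struct.unpack_from(">II", fat_bytes, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a fat binary")
    for i in range(nfat):
        cputype, cpusubtype, offset, size, align = struct.unpack_from(
            ">IIIII", fat_bytes, 8 + 20 * i)
        if cputype == host_cputype:
            return fat_bytes[offset:offset + size]
    raise ValueError("no slice for this CPU")

# A tiny hand-built blob with one slice (cputype 7, i.e. x86) at offset 28.
blob = (struct.pack(">II", FAT_MAGIC, 1)
        + struct.pack(">IIIII", 7, 3, 28, 4, 0)
        + b"ABCD")
```

A real loader would additionally check the CPU subtype and prefer the best-matching slice when several are compatible; this sketch only shows the header walk and offset lookup.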
Early implementations
CP/M and DOS combined binaries
In the mid-1980s, combined binaries emerged as an early form of fat binaries to enable software portability between CP/M-80 on Z80 processors and MS-DOS on x86 processors, allowing executables to run without recompilation across these 8-bit and 16-bit environments.[12] These binaries addressed the need for cross-compatibility in the microcomputer era, where users often transitioned between CP/M systems like those on S-100 buses and emerging IBM PC-compatible machines running MS-DOS. The approach leveraged the similar memory loading conventions of both operating systems, where .COM files are loaded starting at memory address 0x100, facilitating shared code segments while accommodating OS-specific initialization.[12][13]

COM-style combined binaries typically consist of a single .COM file incorporating dual entry points through carefully crafted initial bytes that are interpreted differently by each OS. Under CP/M, the loader places the file at the start of the transient program area (TPA) at address 0x100, executing from the beginning of the file; under MS-DOS, a short prefix (often the first 256 bytes) detects the environment via instructions like INT 11h to query the equipment configuration, then branches to the DOS-specific code at file offset 0x100 (loaded to memory address 0x200). This structure allows the bulk of the program, such as algorithmic logic or data processing, to be shared, with OS detection enabling a jump to the appropriate handler for system calls, file I/O, or console operations. For example, the first three bytes might encode a CP/M BDOS call (e.g., CALL 0x2411) but disassemble as an MS-DOS interrupt followed by an AND operation in 8080/8088-compatible assembly.

The technique was published in Remark magazine in February 1987 by Bill Wilkinson.[12] Tools like DMERGE, a C program, or DEBUG automated the merging of separate CP/M-80 and MS-DOS .COM files into this format.
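The merging step that tools like DMERGE performed can be sketched in C. This is a minimal illustration of the file layout only, assuming the inputs are plain .COM images; the three prefix bytes are placeholders, not genuine dual-decoding 8080/8086 machine code, and the function name is hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sketch of a DMERGE-style combiner for the layout described above: the
 * CP/M-80 code occupies the start of the output .COM file (padded to 0x100)
 * and the MS-DOS code begins at file offset 0x100.  The 3-byte prefix is a
 * PLACEHOLDER; a real combined binary needs bytes that decode usefully on
 * both 8080 and 8086 processors. */

static unsigned char *read_file(const char *path, long *len) {
    FILE *f = fopen(path, "rb");
    if (!f) { perror(path); exit(1); }
    fseek(f, 0, SEEK_END);
    *len = ftell(f);
    fseek(f, 0, SEEK_SET);
    unsigned char *buf = malloc((size_t)*len);
    if (fread(buf, 1, (size_t)*len, f) != (size_t)*len) exit(1);
    fclose(f);
    return buf;
}

/* Merge the CP/M and DOS images into out_path; returns 0 on success. */
int merge_com(const char *cpm_path, const char *dos_path, const char *out_path) {
    long cpm_len, dos_len;
    unsigned char *cpm = read_file(cpm_path, &cpm_len);
    unsigned char *dos = read_file(dos_path, &dos_len);
    if (cpm_len + 3 > 0x100) {          /* CP/M slice must fit below the DOS slice */
        fprintf(stderr, "CP/M portion too large\n");
        return -1;
    }
    unsigned char head[0x100] = {0};    /* zero padding up to offset 0x100 */
    const unsigned char prefix[3] = {0, 0, 0};  /* placeholder dual-decode bytes */
    memcpy(head, prefix, sizeof prefix);
    memcpy(head + sizeof prefix, cpm, (size_t)cpm_len);
    FILE *out = fopen(out_path, "wb");
    if (!out) { perror(out_path); return -1; }
    fwrite(head, 1, sizeof head, out);    /* CP/M slice, padded to 0x100 */
    fwrite(dos, 1, (size_t)dos_len, out); /* DOS slice at file offset 0x100 */
    fclose(out);
    free(cpm);
    free(dos);
    return 0;
}
```

The historical tools worked on the same principle but chose prefix bytes that executed meaningfully under both processors, which is the part this sketch deliberately leaves abstract.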
Later utilities, such as FATBIN around 2000, also supported creating such binaries.[12][13]

Apollo Domain executables
Apollo Computer Inc. developed compound executables, known as "cmpexe" files, for its Domain/OS operating system starting with Software Release 10.1 in 1988.[14] These were designed for Motorola 68000-based workstations (m68k architecture) and extended to support the PRISM architecture (a88k) in Apollo's Series 10000 systems, enabling a single file to contain multiple architecture-specific versions of a program.[14][15] Cmpexe files bundled multiple object formats, primarily in COFF for both architectures, to address variations in library dependencies and minor hardware differences, such as floating-point processing options across workstation models.[14] The file structure included architecture-specific executable segments within a unified container, with a header implicitly defining offsets and version tags for each component; the runtime loader dynamically selected the appropriate segment based on the host system's instruction set processor (ISP) type and configuration.[14][15] Tools like the Compound Executable Archiver (xar) or mrgri with the -cmpexe option were used to create these files by merging separately compiled and linked versions.[14][15]
The primary purpose was to manage hardware diversity in Apollo's networked workstation environments, allowing software to run without machine-specific rebuilds or separate distributions, thus supporting the evolution from the earlier AEGIS OS to Domain/OS across heterogeneous nodes.[14][15] This approach facilitated diskless booting and cross-development, with installation utilities automatically resolving the correct variant during deployment.[15]
A key innovation was the load-time dynamic resolution mechanism, which extended to shared libraries by selecting compatible versions at runtime, anticipating features in later universal binary formats.[14] However, cmpexe support was incompatible with pre-SR10.1 systems and required pre-stripped components.[14]
Following Hewlett-Packard's acquisition of Apollo in 1989 for $476 million, cmpexe functionality was gradually phased out as Domain/OS integration with HP's systems progressed, though it influenced subsequent multi-architecture tools in enterprise environments.[16][15]
Apple and NeXT implementations
NeXTSTEP multi-architecture binaries
NeXTSTEP introduced multi-architecture binaries, commonly known as fat binaries, with the release of version 3.1 on May 25, 1993, marking the first support for Intel i386 processors alongside the existing Motorola 68000 (m68k) architecture used in NeXT's proprietary hardware.[2] This feature allowed a single executable file to contain compiled code for multiple instruction set architectures (ISAs), enabling seamless execution across different hardware platforms without requiring separate builds or distributions. The binaries used a fat header structure that encapsulated multiple "slices" of code, each tailored to a specific architecture such as m68k and i386, with metadata specifying the offsets, sizes, and CPU types for each slice to facilitate identification and selection.[17] The build process for these multi-architecture binaries relied on NeXT's GNU-based toolchain, which compiled separate object files for each target ISA before combining them into a single fat file using the lipo utility. Developers would invoke lipo to merge the architecture-specific variants, creating a unified executable that preserved the integrity of each slice while adding the overarching fat header for multi-architecture support. At runtime, the Mach kernel's loader examined the host CPU type upon execution and dynamically selected the appropriate slice from the fat header, loading only that portion into memory while ignoring the others to optimize performance and resource usage.[18][17] This innovation was pivotal in NeXT's strategic shift from proprietary m68k-based hardware to more affordable Intel-compatible systems starting in 1993, significantly broadening the operating system's accessibility and accelerating developer and user adoption by eliminating architecture-specific barriers. 
The approach evolved through subsequent releases, reaching NeXTSTEP 3.3 in early 1995, which extended support to Sun SPARC and HP PA-RISC architectures for "quad-fat" binaries, further enhancing cross-platform compatibility. These multi-architecture binaries laid the foundational portability mechanisms that influenced OPENSTEP, NeXT's later platform-agnostic specification released in 1994.[19][2]

Mach-O universal binaries
The Mach-O file format was introduced with Mac OS X 10.0 in 2001 as the native executable format for the operating system, building on multi-architecture concepts from NeXTSTEP while providing a formalized structure for binaries on Darwin-based systems.[20][21] This format replaced earlier a.out-style executables and enabled efficient handling of object code, shared libraries, and executables across Apple's evolving hardware landscape.[7] In Mach-O universal binaries, also known as fat binaries, a top-level "fat" header encapsulates multiple architecture-specific Mach-O files, allowing a single file to contain code for diverse processors such as PowerPC and x86. The fat header begins with a magic number (0xCAFEBABE in big-endian byte order) followed by the number of contained architectures (nfat_arch), and an array of fat_arch structures that detail each slice's CPU type (cputype), CPU subtype (cpusubtype), file offset, size, and alignment requirements.[7] Each embedded Mach-O thin file then includes its own header (mach_header for 32-bit or mach_header_64 for 64-bit), load commands, and segments tailored to the target architecture, such as PPC for PowerPC or i386/x86_64 for Intel. This design supports hybrid 32/64-bit configurations within the same binary, ensuring compatibility without runtime emulation.[22]

Apple provides command-line tools for managing these binaries: the lipo utility allows developers to create universal binaries by combining thin files, extract specific architectures, or verify contents, while otool enables detailed inspection of headers, symbols, and segments.[7] At runtime, the kernel and the dynamic linker dyld examine the host CPU and select the matching slice by its cputype field (for instance, value 18 identifies a PowerPC slice, while 0x01000007 identifies x86_64) before loading and executing the appropriate code.
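The fat header layout just described can be decoded with a few lines of C. The structure mirrors the fat_header and fat_arch definitions in Apple's <mach-o/fat.h>, but is parsed by hand here so the sketch compiles on any platform; the function name and struct name are illustrative.

```c
#include <stdint.h>
#include <stddef.h>

/* Minimal reader for the Mach-O fat header described above.  All
 * fat-header fields are stored big-endian regardless of host byte order. */

#define FAT_MAGIC 0xCAFEBABEu

struct fat_arch_info {
    uint32_t cputype;     /* e.g. 0x01000007 = x86_64, 0x0100000C = arm64 */
    uint32_t cpusubtype;
    uint32_t offset;      /* file offset of the embedded thin Mach-O */
    uint32_t size;
    uint32_t align;       /* expressed as a power of two */
};

static uint32_t be32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Parse up to `max` fat_arch records; returns the slice count, or -1 if
 * the buffer is not a fat binary or is truncated. */
int parse_fat_header(const uint8_t *buf, size_t len,
                     struct fat_arch_info *out, int max) {
    if (len < 8 || be32(buf) != FAT_MAGIC)
        return -1;
    int nfat = (int)be32(buf + 4);
    if (len < 8 + 20 * (size_t)nfat)
        return -1;
    for (int i = 0; i < nfat && i < max; i++) {
        const uint8_t *rec = buf + 8 + 20 * i;  /* each fat_arch is five u32s */
        out[i].cputype    = be32(rec);
        out[i].cpusubtype = be32(rec + 4);
        out[i].offset     = be32(rec + 8);
        out[i].size       = be32(rec + 12);
        out[i].align      = be32(rec + 16);
    }
    return nfat;
}
```

A loader would compare each slice's cputype against the host CPU and map only the matching slice, which is the selection step the surrounding text attributes to the kernel and dyld.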
This mechanism is integral to the Darwin kernel, which employs universal binaries to maintain cross-architecture compatibility, such as between PowerPC and Intel processors in early macOS releases.[7][21] Support for universal binaries in Mach-O was significantly expanded with Mac OS X 10.4 Tiger in 2005 to support Apple's transition from PowerPC to Intel architectures, enabling seamless execution on mixed hardware without recompilation.[23] The format remains in active use in modern macOS as of 2025, accommodating ongoing shifts like the move to Apple silicon while preserving backward compatibility for legacy code.[24]

Apple Universal Binaries
Apple coined the term "Universal Binary" in 2005 at the Worldwide Developers Conference to support the transition from PowerPC to Intel processors in Mac OS X Tiger (version 10.4).[25] This approach allowed developers to package applications with executable code for multiple architectures in a single file, simplifying compatibility during the hardware shift announced that year.[26] Universal Binaries are implemented as Mach-O fat files containing separate "slices" for architectures such as PowerPC (PPC), i386 (32-bit Intel), and later x86_64 (64-bit Intel).[27] Building on the Mach-O universal binary format introduced earlier, Apple's version integrated seamlessly with the operating system, enabling the runtime to select the appropriate slice based on the host hardware. The adoption of Universal Binaries had a profound impact on Apple's ecosystem, with all major software products—including developer tools like Xcode and consumer applications such as iLife—shipped in this format starting in 2006.[28] This widespread implementation facilitated a smooth hardware transition, allowing users to upgrade to Intel-based Macs without needing separate application versions or emulation for most software. By providing native performance across architectures, it minimized disruptions for developers and end-users alike during the pivotal shift. Over time, support for PowerPC slices was phased out with the release of OS X Lion (version 10.7) in 2011, which dropped Rosetta emulation and required all applications to be Intel-native.[29] The concept was revived in 2020 for the transition to Apple Silicon, with macOS Big Sur (version 11) introducing Universal 2 Binaries that combine x86_64 (Intel) and arm64 slices.[30] This evolution, supported by Xcode 12 and later, ensured continued compatibility as Apple phased in ARM-based M-series chips. 
As of 2025, Universal Binaries remain integral to Apple's developer ecosystem, particularly for tools like Xcode, which are distributed with multi-architecture support to accommodate mixed hardware environments.[3] iOS applications, however, are distributed as thin binaries optimized solely for ARM64 devices, while iOS simulators on macOS employ universal binaries to enable cross-architecture testing on both Intel and Apple Silicon systems.[31] Developers create these binaries using Xcode build settings to target multiple architectures (e.g., via the "Architectures" option set to "Standard Architectures") or command-line tools like xcodebuild with flags such as -arch arm64 -arch x86_64, followed where necessary by lipo to merge separately built outputs into a single fat file.[3]
Fat EFI binaries
Fat EFI binaries were introduced by Apple in 2006 with the transition to Intel-based Macs, utilizing EFI 1.1 firmware to bundle both 32-bit x86 and 64-bit x86_64 code within the boot.efi bootloader, enabling compatibility across varying hardware configurations during the early adoption phase.[32] This approach extended the fat binary concept to firmware level, allowing a single file to support multiple instruction sets without requiring separate installations.[33] The structure of these fat EFI binaries is based on the Portable Executable/Common Object File Format (PE/COFF), augmented with Apple-specific EFI headers for multi-architecture support. The file begins with a custom fat header featuring a magic number of 0x0ef1fab9 in little-endian format, followed by the number of embedded architecture slices. Each slice is described by five 32-bit integers: CPU type (e.g., 0x07 for x86 or 0x01000007 for x86_64), CPU subtype (e.g., 0x03 for generic), offset, size, and alignment (typically 0x00). The embedded PE/COFF executables follow in sequence, loaded by the EFI firmware which selects the appropriate slice based on the system's architecture before proceeding to the kernel.[33][34] The primary purpose of fat EFI binaries was to provide a unified bootloader for hybrid Intel systems during the shift from PowerPC, ensuring seamless booting on machines with mixed 32-bit and 64-bit EFI implementations without additional user intervention. This facilitated smoother transitions in enterprise and consumer environments where hardware diversity was common.[32] A key example is the boot.efi file located at /System/Library/CoreServices, which incorporates slices for supported CPU architectures to handle initial system loading. 
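Based on the layout described above, a minimal parser for the Apple EFI fat header might look like the following. The field names are illustrative, since Apple has not published headers for this format; unlike the Mach-O fat header, the fields here are little-endian.

```c
#include <stdint.h>
#include <stddef.h>

/* Sketch of the Apple fat EFI header described above: a little-endian
 * magic of 0x0ef1fab9, a slice count, and five 32-bit values per slice. */

#define EFI_FAT_MAGIC 0x0ef1fab9u

struct efi_fat_slice {
    uint32_t cputype;     /* 0x07 = x86, 0x01000007 = x86_64 */
    uint32_t cpusubtype;  /* e.g. 0x03 = generic */
    uint32_t offset;      /* file offset of the embedded PE/COFF image */
    uint32_t size;
    uint32_t align;
};

static uint32_t le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the slice count, or -1 if the magic or length is wrong. */
int parse_efi_fat(const uint8_t *buf, size_t len,
                  struct efi_fat_slice *out, int max) {
    if (len < 8 || le32(buf) != EFI_FAT_MAGIC)
        return -1;
    int n = (int)le32(buf + 4);
    if (len < 8 + 20 * (size_t)n)
        return -1;
    for (int i = 0; i < n && i < max; i++) {
        const uint8_t *rec = buf + 8 + 20 * i;
        out[i].cputype    = le32(rec);
        out[i].cpusubtype = le32(rec + 4);
        out[i].offset     = le32(rec + 8);
        out[i].size       = le32(rec + 12);
        out[i].align      = le32(rec + 16);
    }
    return n;
}
```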
For security, these binaries are cryptographically signed using Apple's root certificates, enabling verification during the boot chain and supporting Secure Boot to prevent unauthorized modifications or malware injection across architectures.[34][35] Over time, fat EFI binaries evolved alongside macOS updates, maintaining their role in dual-boot scenarios and recovery processes as of 2025.

Linux implementations
FatELF specification
FatELF is a proposed file format extension to the Executable and Linkable Format (ELF) designed to package multiple architecture-specific ELF binaries into a single file, enabling universal binaries for Linux systems. It was introduced in 2009 by Ryan C. Gordon, with contributions from others in the open-source community, as an open alternative to proprietary fat binary formats like those used in macOS. The format aims to address the challenges of distributing software across diverse hardware architectures without requiring separate builds or repositories.[36][37] The core of the FatELF specification consists of a custom header followed by a segment table that catalogs embedded ELF "slices," each tailored to a specific architecture such as x86, ARM, or MIPS. The header begins with a 32-bit magic number (0x1F0E70FA, represented in little-endian as FA 70 0E 1F) to identify the file as FatELF, followed by a 16-bit version field (currently set to 1), an 8-bit record count indicating the number of slices, and a reserved 8-bit field set to zero. The segment table then lists one record per slice, where each 24-byte record includes a 16-bit machine architecture identifier (corresponding to the ELF e_machine field), a 32-bit composite for OS ABI, OS ABI version, word size, and byte order, a reserved 2-byte (16-bit) field set to zero, and two 64-bit integers specifying the offset and size of the respective ELF binary within the file. These offsets point from the start of the file, and slices are padded with null bytes to meet platform-specific alignment requirements, such as 4096-byte boundaries, ensuring no overlap between binaries.[38][39]
Key features of FatELF include support for heterogeneous instruction set architectures (ISAs) by embedding complete, independent ELF binaries without shared sections, allowing the format to handle variations in endianness, word sizes, and operating system ABIs. At runtime, the ELF loader—such as ld.so—inspects the system's architecture via mechanisms like uname() or /proc/cpuinfo to select and load the matching slice, ignoring others to optimize performance and memory usage. The format maintains backward compatibility with standard ELF tools; if the magic number is absent or unrecognized, the file can be treated as a plain ELF binary, and utilities can extract individual slices for processing with existing tools like objdump or readelf.[38][39]
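The header walk and slice selection a FatELF-aware loader would perform can be sketched as follows. The exact field ordering within the 24-byte record is an assumption drawn from the description above, and the function name is illustrative.

```c
#include <stdint.h>
#include <stddef.h>

/* Sketch of FatELF record parsing and slice selection per the draft
 * specification summarized above.  All multi-byte fields are little-endian;
 * the record layout (machine, OSABI bytes, reserved field, offset, size)
 * follows the textual description. */

#define FATELF_MAGIC 0x1F0E70FAu

struct fatelf_record {          /* 24 bytes on disk */
    uint16_t machine;           /* ELF e_machine, e.g. 62 = EM_X86_64 */
    uint8_t  osabi;
    uint8_t  osabi_version;
    uint8_t  word_size;         /* 32 or 64 */
    uint8_t  byte_order;
    /* u16 reserved, then u64 offset and u64 size follow on disk */
    uint64_t offset;
    uint64_t size;
};

static uint16_t le16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Return the index of the slice matching the host's machine and word size,
 * mimicking what a FatELF-aware loader would do, or -1 on no match. */
int fatelf_select(const uint8_t *buf, size_t len,
                  uint16_t host_machine, uint8_t host_word_size) {
    if (len < 8 || le32(buf) != FATELF_MAGIC)
        return -1;
    uint8_t nrecs = buf[6];     /* u16 version at offset 4, count at 6 */
    if (len < 8 + 24 * (size_t)nrecs)
        return -1;
    for (int i = 0; i < nrecs; i++) {
        const uint8_t *r = buf + 8 + 24 * i;
        uint16_t machine   = le16(r);
        uint8_t  word_size = r[4];
        if (machine == host_machine && word_size == host_word_size)
            return i;           /* loader would mmap this slice's offset/size */
    }
    return -1;
}
```

A real loader would additionally check OS ABI and byte order before committing to a slice, per the record fields listed above.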
To build FatELF files, a hypothetical utility called fatelf was outlined to merge multiple ELF binaries, reading their headers to validate targets and appending them according to the specification while computing offsets and padding. This tool would integrate with build systems like CMake, and accompanying utilities such as fatelf-extract and fatelf-glue facilitate creation and disassembly, ensuring compatibility with standard ELF workflows by allowing extraction of slices for verification or modification.[39][38]
The primary goals of FatELF were to enable universal Linux distribution packages that work across multiple architectures without maintaining platform-specific repositories, thereby simplifying software deployment for distributions and easing migrations, such as from x86 to ARM during hardware transitions. By consolidating binaries into one file, it sought to reduce storage needs and streamline updates in multi-architecture environments.[36][37]
As of 2025, the FatELF specification has not been merged into core components like glibc, and its adoption remains limited to niche projects and experimental implementations, with the original development effort halted in 2009 due to community resistance.[40][41]
Adoption and challenges
The adoption of FatELF in Linux has remained largely experimental and confined to niche applications, with no integration into major distributions such as Ubuntu or Fedora as of 2025.[37][39] It has seen limited use in projects involving ports to alternative operating systems like Haiku OS, where community discussions have explored its potential for multi-architecture support during RISC-V porting efforts, and in some embedded Linux tools, such as proposals for combining 32-bit and 64-bit ARM binaries in Debian packages for resource-limited devices.[42][41] Despite these cases, FatELF has not achieved widespread deployment due to its status as a non-standard extension requiring custom tooling and lacking official endorsement from Linux kernel maintainers.[43][44]

Key challenges hindering FatELF's uptake include the absence of native kernel and loader integration, necessitating patches to the Linux kernel and glibc for full functionality, or reliance on workarounds like the binfmt_misc kernel module to register and handle FatELF files as a user-space hack.[37][41] Additionally, concerns over binary bloat arise in resource-constrained environments, as embedding multiple architecture-specific ELF binaries into a single file significantly increases storage requirements without proportional benefits in most scenarios.[37][45]

The Linux ecosystem has instead favored alternatives such as containerization with Docker for environment portability, cross-compilation to target specific architectures, QEMU-based emulation for multi-ABI support, and distribution formats like separate per-architecture packages, Flatpak, or AppImage, which provide cross-platform deployment without the overhead of fat binaries.[41][46] Recent developments have sparked minor interest in reviving FatELF, particularly in 2023 discussions around multi-architecture needs for emerging hardware like RISC-V, though these have not led to any standardization efforts or kernel inclusion.[41] Tools like PyInstaller offer a partial analogue for Python applications on Linux, bundling dependencies into standalone executables that enhance portability across systems without multi-architecture embedding. The FatELF community remains active through maintenance on GitHub, where it continues to influence concepts of universal binaries, but the approach has not been adopted in major runtimes such as Android's ART, which opts for architecture-specific optimizations instead.[39][37]

Windows implementations
Fatpack bundling
Fatpack is an open-source command-line tool designed to create multi-architecture "fat" binaries for Windows executables, embedding multiple versions of a program into a single portable file to enhance distribution simplicity. Developed by Sijmen J. Mulder and released around 2018 under the 2-clause BSD license, it addresses the challenge of supporting diverse hardware and operating system environments by combining binaries targeted at different CPU architectures and Windows versions.[47] The tool's mechanism involves wrapping the primary Portable Executable (PE) file with additional binaries appended as resources within a lightweight 32-bit Intel loader stub. Upon execution, the loader sequentially tests and extracts the embedded binaries to a temporary directory, attempting to run each until one compatible with the host system succeeds; this leverages Windows features like WoW64 for 32-bit on 64-bit systems and ARM emulation. While it embeds the core executables to reduce the need for separate downloads, it does not natively bundle external DLL dependencies or assets, limiting its scope to the main program files and focusing instead on portability across Windows versions from XP onward and architectures including 32-bit Intel, 64-bit Intel, and 64-bit ARM (Windows 10 and later). This approach differs from true multi-architecture formats like Apple's Mach-O universal binaries, as it relies on runtime extraction rather than native linker support for multiple instruction sets within a single binary.[47] Common use cases for Fatpack include shareware distribution and creating cross-compatible applications that run seamlessly on varied Windows setups, such as from Windows 7 to Windows 11, without requiring users to select or install architecture-specific versions or full installers. 
For instance, developers can package a single executable that adapts to Intel or ARM systems, simplifying deployment for portable tools or utilities.[47] Despite its utility, Fatpack has notable limitations: the resulting file size grows substantially with each added architecture, potentially exceeding several megabytes for comprehensive bundles. The extraction to temporary directories can disrupt programs expecting fixed paths, and the self-modifying loader behavior may trigger false positives in antivirus software. Additionally, since DLL dependencies remain external and architecture-specific, the tool requires manual handling of libraries for full portability, and its ARM support remains largely untested as a proof-of-concept implementation. As of 2025, Fatpack remains available on GitHub with no major updates since its initial release, serving primarily as an experimental bundler that has informed broader concepts in Windows executable packaging tools.[47]

Arm64X hybrid binaries
Arm64X hybrid binaries represent a specialized Portable Executable (PE) format introduced by Microsoft in the Windows 11 SDK to enhance x64 application compatibility on Arm64-based devices.[48] This format builds on the Arm64EC (Emulation Compatible) application binary interface, first announced in June 2021, allowing developers to create binaries that seamlessly integrate native Arm64 code with emulated x64 components.[49] Designed for Windows 11 on Arm, Arm64X addresses the challenge of running the vast x86/x64 software ecosystem on hardware like Qualcomm's Snapdragon X Elite processors, where unmodified x64 applications are emulated through the WoW64 layer.[50] Preview support appeared in developer tools in 2021 alongside Arm64EC, and ecosystem enhancements followed, including the Prism emulator in Windows 11 version 24H2 in 2024, enabling wider adoption of Arm64 PCs.[51] At its core, an Arm64X binary is a single PE file that merges native Arm64 code sections with Arm64EC-compatible elements, including embedded x64 thunk layers for interoperability.[48] The Windows loader, primarily through ntdll.dll, dynamically dispatches execution based on the host process architecture: native Arm64 code runs directly in Arm64 processes, while x64 or Arm64EC calls are routed to the emulation layer if necessary.[52] This structure eliminates the need for separate binaries, reducing disk space and simplifying deployment for system components, middleware, and plugins.
The Prism emulator, integrated into Windows 11 24H2, performs just-in-time (JIT) translation of x86/x64 instruction blocks into optimized Arm64 equivalents, supporting complex workloads including DirectX graphics APIs and system calls without developer intervention.[51][53] The primary purpose of Arm64X is to bridge the legacy x86 application ecosystem to modern Arm hardware, allowing end-users to run unmodified x64 software transparently while encouraging incremental native optimization.[48] Key features include its "chameleon-like" adaptability, where a single binary loads into either Arm64 or x64/Arm64EC processes without recompilation for existing x86 apps, and built-in support for API forwarding via pure forwarder DLLs.[52] Native Arm64 portions execute at full hardware speed, while emulated x86/x64 code benefits from Prism's optimizations, delivering up to a 2x performance improvement over prior emulation layers and approaching near-parity in many scenarios as of late 2025 updates.[54] Nearly all 64-bit system binaries in Windows 11 on Arm are built as Arm64X, ensuring consistent compatibility across architectures.[55]

Related concepts
Heterogeneous computing
Heterogeneous computing encompasses systems that integrate diverse processing units, such as central processing units (CPUs), graphics processing units (GPUs), digital signal processors (DSPs), and neural processing units (NPUs), to leverage specialized hardware for improved performance and energy efficiency.[56][57] In these environments, fat binaries play a crucial role by packaging multiple variants of executable code tailored to different accelerators, enabling seamless dispatch to the most suitable hardware component at runtime.[58] The connection between fat binaries and heterogeneous computing lies in the creation of multi-kernel executables that bundle device-specific code segments. For instance, in frameworks like OpenCL and CUDA, these binaries encapsulate kernels optimized for various architectures, while runtime application programming interfaces (APIs) query the system's hardware capabilities to select and load the appropriate code path.[59][58] This mechanism avoids the need for separate builds per device, streamlining deployment across heterogeneous setups. 
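The runtime dispatch idea described above can be illustrated with a small selection routine. The slice table and the capability encoding (major*10 + minor) are simplifications for illustration, not any actual framework's container format.

```c
/* Sketch of the selection policy a fatbin-style loader applies: prefer the
 * newest precompiled (native) slice whose architecture does not exceed the
 * device's capability, otherwise fall back to a JIT-able intermediate
 * representation slice.  Types and values are illustrative. */

enum slice_kind { SLICE_NATIVE, SLICE_IR };

struct fatbin_slice {
    enum slice_kind kind;
    int arch;               /* e.g. 70 meaning capability 7.0 */
};

/* Returns the index of the chosen slice, or -1 if nothing is usable. */
int select_slice(const struct fatbin_slice *s, int n, int device_cap) {
    int best = -1, best_arch = -1, ir = -1;
    for (int i = 0; i < n; i++) {
        if (s[i].kind == SLICE_NATIVE &&
            s[i].arch <= device_cap && s[i].arch > best_arch) {
            best = i;               /* newest native code the device can run */
            best_arch = s[i].arch;
        } else if (s[i].kind == SLICE_IR && s[i].arch <= device_cap) {
            ir = i;                 /* JIT candidate if no native slice fits */
        }
    }
    return best >= 0 ? best : ir;
}
```

This mirrors, in simplified form, how a driver can serve both older devices (native slice) and newer ones (JIT from the intermediate representation) from the same fat binary.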
A prominent example is NVIDIA's CUDA fatbins, which embed Parallel Thread Execution (PTX) as an intermediate representation alongside SASS (Shader Assembly) native code for specific GPU architectures; when no precompiled SASS slice matches the device, the driver just-in-time compiles the PTX, preserving forward compatibility with newer GPUs.[58] These are widely employed in artificial intelligence and machine learning workloads, where computational demands vary by available accelerators.[60] The primary advantages of fat binaries in heterogeneous computing include hardware-agnostic optimization, as they permit execution tailored to the present configuration without recompilation, reducing development overhead.[58] This is especially vital in mobile platforms, such as those using Qualcomm Snapdragon processors, which combine CPU clusters with integrated GPUs, DSPs, and NPUs for tasks like on-device AI inference.[61] The evolution of fat binaries in this context accelerated in the 2010s with the introduction of the Heterogeneous System Architecture (HSA), a standard that unified memory access and programming models across CPU and GPU domains, inspiring extensions to fat binary formats for enhanced interoperability in diverse accelerator ecosystems.[62]

Fat objects
Fat objects refer to compiler-generated object files (.o) that embed multiple variants of code or representations, enabling selection of appropriate implementations during the build process rather than requiring separate compilations for each target or optimization scenario. In systems like Apple's Mach-O format, fat object files combine object code for multiple architectures—such as i386 and x86_64—using tools like lipo, allowing the linker to extract and use only the relevant slice at build time. This approach serves as a precursor to full fat binaries by facilitating multi-architecture support at the object level, where the linker performs the selection based on the target specified during linking.[6][7] In compilers such as GCC and LLVM/Clang, fat objects often incorporate both native object code and intermediate representations (IR), particularly through Link Time Optimization (LTO) features. For instance, GCC's -ffat-lto-objects option, introduced in GCC 4.7 in 2012, produces object files containing GIMPLE IR alongside compiled code, enabling whole-program optimizations during linking without recompiling sources.[63][64][65] Similarly, LLVM's FatLTO support, introduced in LLVM 17 in 2023, embeds LTO-compatible IR in object files to allow deferred optimization decisions.[66][67] These structures permit the linker to apply optimizations across modules, selecting or generating code paths tailored to specific targets or levels, such as variants using SSE instructions versus non-SSE for x86 compatibility.[63] This mechanism integrates with profile-guided optimization (PGO), where runtime execution profiles collected from instrumented builds inform link-time decisions, mimicking runtime selection but performed at build time to produce more efficient code without embedding all variants in the final binary. 
An example is the GNU indirect function (ifunc) mechanism in glibc, which embeds multiple CPU-specific versions (such as SSE-optimized versus baseline) in object files, typically shared libraries, with the resolver selecting the best at runtime based on detected features, though the inclusion occurs at the object compilation stage.[68] The primary benefits of fat objects include reduced distribution overhead compared to full fat binaries, as a single file supports multiple build configurations, and smaller final executable sizes, since unused variants are discarded during linking rather than bundled entirely. This enables efficient just-in-time-like adaptation at build time, avoiding the need for comprehensive multi-architecture executables while supporting diverse hardware targets.[63]

Function multi-versioning
Function multi-versioning is a compiler technique that generates multiple implementations of the same function, each optimized for specific hardware features such as instruction set extensions, with a runtime dispatcher selecting the appropriate version based on the executing CPU's capabilities.[69] This approach enables dynamic optimization without requiring separate binaries for different architectures, focusing on vectorization (e.g., SSE or AVX instructions) or speculative execution paths tailored to CPU variants.[70] In contrast to full fat binaries that embed entire executables for multiple architectures, function multi-versioning operates at a finer granularity, embedding only variant implementations of individual functions within a single binary to minimize overhead.[69] Compilers like GCC support this through attributes such as __attribute__((target_clones)), which instruct the compiler to produce clones of a function for specified targets, for example, "default", "arch=x86-64-v2" (SSE4.2), or "arch=x86-64-v3" (AVX2).[71] On x86-64 platforms, separate definitions annotated with __attribute__((target("avx"))) and a default version likewise let the compiler emit SSE and AVX variants, allowing seamless execution on older or newer CPUs without crashes from unsupported instructions.[70] The generated code includes a dispatcher that resolves the optimal version at runtime, often using the GNU Indirect Function (ifunc) mechanism defined in the ELF standard extension.
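A minimal target_clones example, with the function and target list chosen for illustration; the preprocessor guard is an added precaution so the file still compiles as a plain function on toolchains without the feature.

```c
#include <stddef.h>

/* Sketch of GCC's target_clones attribute.  On x86-64 Linux, GCC emits one
 * clone per listed target plus an ifunc-based dispatcher; newer GCC also
 * accepts micro-architecture levels such as "arch=x86-64-v2" and
 * "arch=x86-64-v3" as clone targets. */

#if defined(__x86_64__) && defined(__GNUC__) && !defined(__clang__)
__attribute__((target_clones("default", "avx2")))
#endif
double dot(const double *a, const double *b, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)  /* loop can auto-vectorize per clone */
        sum += a[i] * b[i];
    return sum;
}
```

Callers simply invoke dot(); the clone actually executed is chosen once at load time by the generated dispatcher, not per call.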
In the ifunc implementation, a resolver function—marked with __attribute__((ifunc("resolver_name")))—is invoked once during program loading by the dynamic linker (rtld).[71] The resolver executes feature tests, typically via the CPUID instruction on x86, to query available extensions like AVX or FMA, storing results in global variables such as __cpu_features for quick lookup.[69] It then returns a pointer to the matching function version, ensuring subsequent calls bypass further resolution for efficiency.[71] This setup maps symbols dynamically, avoiding static linking decisions that might mismatch hardware.
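The resolver flow can be sketched directly with the ifunc attribute. The feature test here is a stand-in that always picks the portable version, so the sketch stays self-contained without CPUID intrinsics; the function names are hypothetical, and a guard falls back to plain dispatch on toolchains without ELF ifunc support.

```c
/* Sketch of the GNU ifunc mechanism described above.  The resolver runs
 * once when the dynamic linker binds the symbol and returns the chosen
 * implementation; subsequent calls go straight to that function. */

typedef int (*impl_fn)(int);

static int add_one_generic(int x) { return x + 1; }
static int add_one_fancy(int x)   { return x + 1; }  /* stand-in for an AVX path */

static int cpu_has_fancy(void) { return 0; }  /* stand-in for a CPUID check */

#if defined(__ELF__) && defined(__GNUC__) && !defined(__clang__)
/* Resolver: invoked by the dynamic linker before main(), not on each call. */
static impl_fn resolve_add_one(void) {
    return cpu_has_fancy() ? add_one_fancy : add_one_generic;
}
int add_one(int x) __attribute__((ifunc("resolve_add_one")));
#else
/* Fallback for toolchains without ifunc: dispatch on every call instead. */
int add_one(int x) {
    return (cpu_has_fancy() ? add_one_fancy : add_one_generic)(x);
}
#endif
```

The key design point, as the text notes, is that the feature test runs once at load time rather than on every call, so hot paths pay only an indirect-call cost.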
A prominent use case is performance tuning in system libraries, where multi-versioning targets hot-path functions to leverage CPU-specific optimizations without inflating the entire library. In glibc, math functions like sin, cos, exp, and log have been enhanced with AVX2 and FMA variants since version 2.27, yielding over 50% speedups on Intel Skylake processors for floating-point operations.[72] By versioning only computationally intensive routines, this avoids the bloat of full fat binaries while enabling portable deployment across x86-64 baselines.[73]
The advantages of function multi-versioning include its granularity, which keeps binary sizes smaller than comprehensive fat architectures by limiting variants to performance-critical code, and its runtime adaptability, which can deliver measurable gains like 3% overall speedup in applications such as NumPy on Haswell CPUs.[69] It simplifies developer workflows compared to manual CPU dispatching or conditional compilation, promoting compatibility without SIGILL signals on mismatched hardware.[69] However, drawbacks involve increased linker complexity, as the ELF format must handle indirect symbols and resolvers, potentially complicating debugging or integration with non-supporting tools; additionally, automatic version generation requires explicit attributes, limiting its seamlessness in legacy codebases.[71]