Object file
from Wikipedia

An object file is a file that contains machine code or bytecode, as well as other data and metadata, generated by a compiler or assembler from source code during the compilation or assembly process. The machine code that is generated is known as object code.

The object code is usually relocatable, and not usually directly executable. There are various formats for object files, and the same machine code can be packaged in different object file formats. An object file may also work like a shared library.

The metadata that object files may include can be used for linking or debugging; it includes information to resolve symbolic cross-references between different modules, relocation information, stack unwinding information, comments, program symbols, and debugging or profiling information. Other metadata may include the date and time of compilation, the compiler name and version, and other identifying information.

The term "object program" dates from at least the 1950s:

A term in automatic programming for the machine language program produced by the machine by translating a source program written by the programmer in a language similar to algebraic notation.[1]

A linker is used to combine object code into one executable program or library, pulling in precompiled system libraries as needed.

Object file formats


There are many different object file formats; originally each type of computer and supporting software had its own unique format, such as the OS/360 Object File Format, but with the advent of Unix and other portable operating systems, some formats, such as COFF, ELF, and Mach-O, have been defined and used on different kinds of systems.

Some systems make a distinction between formats which are directly executable and formats which require processing by the linker. For example, OS/360 and successors call the first format a load module and the second an object module. In this case the files have entirely different formats.[2] DOS and Windows also have different file formats for executable files and object files, such as Portable Executable for executables and COFF for object files in 32-bit and 64-bit Windows.

Unix and Unix-like systems have used the same format for executable and object files, starting with the original a.out format. Some formats can contain machine code for different processors, with the correct one chosen by the operating system when the program is loaded.[3][4]

The design and/or choice of an object file format is a key part of overall system design. It affects the performance of the linker and thus programmer turnaround while a program is being developed. If the format is used for executables, the design also affects the time programs take to begin running, and thus the responsiveness for users.

The GNU Project's Binary File Descriptor library (BFD library) provides a common API for the manipulation of object files in a variety of formats.

Absolute files


Many early computers, or small microcomputers, support only an absolute object format. Programs are not relocatable; they need to be assembled or compiled to execute at specific, predefined addresses. The file contains no relocation or linkage information. These files can be loaded into read/write memory, or stored in read-only memory. For example, the Motorola 6800 MIKBUG monitor contains a routine to read an absolute object file (SREC Format) from paper tape.[5] DOS COM files are a more recent example of absolute object files.[6]

Segmentation


Most object file formats are structured as separate sections of data, each section containing a certain type of data. These sections are known as "segments" due to the term "memory segment", which was previously a common form of memory management. When a program is loaded into memory by a loader, the loader allocates various regions of memory to the program. Some of these regions correspond to sections of the object file, and thus are usually known by the same names. Others, such as the stack, only exist at run time. In some cases, relocation is done by the loader (or linker) to specify the actual memory addresses. However, for many programs or architectures, relocation is not necessary, due to being handled by the memory management unit or by position-independent code. On some systems the segments of the object file can then be copied (paged) into memory and executed, without needing further processing. On these systems, this may be done lazily, that is, only when the segments are referenced during execution, for example via a memory-mapped file backed by the object file.

Typical object file formats support several types of data, including executable code (text), initialized data, and uninitialized (BSS) data.[7]

Segments in different object files may be combined by the linker according to rules specified when the segments are defined. Conventions exist for segments shared between object files; for instance, in DOS there are different memory models that specify the names of special segments and whether or not they may be combined.[8]
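The combining rule can be illustrated with a small sketch. The following Python toy (file names, section sizes, and alignments are all made up) places same-named sections from several object files one after another, padding so each starts on its required alignment boundary, roughly as a linker lays out merged segments:

```python
def layout(sections, base=0):
    """Assign addresses to same-named sections from several object files,
    padding so each starts on its required alignment boundary."""
    addr, placed = base, []
    for name, size, align in sections:
        addr = (addr + align - 1) & ~(align - 1)  # round up to alignment
        placed.append((name, addr, size))
        addr += size
    return placed

# .text contributions from three hypothetical object files.
print(layout([("a.o", 0x34, 4), ("b.o", 0x19, 4), ("c.o", 0x10, 16)]))
# [('a.o', 0, 52), ('b.o', 52, 25), ('c.o', 80, 16)]
```

Note that c.o's chunk is pushed from 0x4D up to 0x50 to satisfy its 16-byte alignment; real linkers apply the same rounding per the rules recorded in each section header.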

The debugging data format of debugging information may either be an integral part of the object file format, as in COFF, or a semi-independent format which may be used with several object formats, such as stabs or DWARF.

from Grokipedia
An object file is a binary file produced by a compiler or assembler during the compilation process, containing machine-readable code, initialized and uninitialized data, symbol information, and relocation directives that enable the linker to combine multiple such files into an executable program, shared library, or other loadable module. These files are relocatable, meaning their contents can be positioned at different memory addresses during linking, and they serve as an essential intermediate step in modular software development, allowing modules to be compiled independently before final assembly. The structure of an object file varies by platform but generally includes a header that specifies the file type, target architecture, and entry points; sections or segments that organize code, data, and metadata; and auxiliary tables for symbols, relocations, and debugging information. For instance, in the Executable and Linking Format (ELF), commonly used in Unix-like systems such as Linux, the file begins with an ELF header followed by section headers describing content units like .text (code) and .data, and the raw section data itself. Program headers for loadable segments are present in executable and shared object files. Similarly, Windows object files adhere to the Common Object File Format (COFF), featuring a COFF header with machine type and section count, followed by section headers and raw data including relocations for address adjustments during linking. On Apple platforms, the Mach-O format organizes content into segments such as __TEXT for read-only code and constants, and __DATA for modifiable variables, with sections within segments to fine-tune memory mapping and sharing. Object files play a critical role in the build process by preserving unresolved references to external symbols, which the linker resolves by matching them across files and libraries, thus supporting features like separate compilation and code optimization.
They also include attributes for security, such as executable permissions on code sections, and can embed resources or debug symbols to aid in development and debugging without affecting runtime performance. The use of standardized formats ensures portability within compatible ecosystems, though cross-platform linking often requires format conversion tools.

Overview

Definition and Characteristics

An object file is a file containing object code, typically in the form of machine code or bytecode, along with metadata and ancillary data such as symbol tables and relocation information, generated by a compiler or assembler from source code during the compilation process. These files hold sections for instructions, data, and other elements suitable for subsequent processing. Key characteristics of object files include their relocatability, meaning they contain relocation information that allows the code and data to be positioned at any memory address by adjusting addresses during linking, facilitating combination with other files. They are not directly executable, as they contain unresolved references to external symbols and lack the complete structure needed for standalone execution. Object files also incorporate symbols for undefined references, enabling the linker to resolve dependencies, and support multiple processor architectures through standardized formats like ELF. In contrast to source code, which is human-readable and written in high-level languages by programmers, object files consist of low-level, machine-readable instructions that a CPU can process directly but require additional steps to become functional. Unlike fully linked executables, which are complete, runnable programs with resolved addresses and no external dependencies, object files serve as intermediate artifacts in the build pipeline. Common examples include the .o files produced by the GNU Compiler Collection (GCC) when invoked with the -c option, which compiles source code into relocatable object code without performing linking. Similarly, Microsoft's CL compiler generates object files whose names and directories can be specified via the /Fo option.

Historical Development

The concept of an object file originated in the 1950s alongside the development of early high-level compilers, particularly at IBM. The term "object program" was used to describe the machine-language output generated by the FORTRAN compiler from source code, emphasizing efficiency in producing code for systems like the IBM 704. This marked a shift from manual assembly of machine code to automated translation, with FORTRAN's first implementation in 1957 setting a precedent for separating compilation from final loading. In the 1960s, object files evolved with the rise of more structured formats, exemplified by IBM's OS/360 Object File Format introduced for the System/360 mainframe architecture announced in 1964. Early object representations often relied on punch-card systems, where compiled code was stored as sequences of punched cards representing binary data, symbols, and relocation information for loading on machines like the IBM 7090. These formats supported absolute addressing but were limited by the physical constraints of cards and tapes, facilitating the linkage of modules in large-scale scientific and business applications. The 1970s brought significant advancements with the advent of Unix at Bell Labs, driving a transition from absolute to relocatable object files to support separate compilation and dynamic memory allocation. The initial Unix versions, starting around 1971 on the PDP-11, used the simple a.out format for relocatable objects, allowing code sections to be positioned anywhere in memory during linking. A key milestone was the development of the ld linker in early Unix (First Edition, 1971), which automated the resolution of external references and relocation, enabling efficient assembly of multiple object files into executables. This shift was influenced by the demands of emerging multitasking operating systems, which required flexible segmentation to isolate processes and manage memory without fixed addresses. Subsequent standardization in the late 1970s and 1980s further refined object files for portability across architectures.
The Common Object File Format (COFF), designed by AT&T around 1982 and adopted in UNIX System V Release 3 (1986), introduced extensible sections for symbols, relocations, and debugging, replacing simpler formats like a.out in many systems. By the 1990s, the Executable and Linking Format (ELF), specified in the System V ABI (1990) and first implemented in Solaris 2.0 (1992), became the dominant standard, offering improved support for dynamic linking and shared libraries in Unix environments.

Purpose and Role

In the Compilation and Linking Process

In the software build pipeline, object files serve as an intermediate artifact generated during compilation from source code. A compiler translates high-level source code, such as C or C++, through stages including preprocessing, parsing, semantic analysis, code generation, and assembly into relocatable machine code stored in an object file, typically with extensions like .o or .obj. For example, the GNU Compiler Collection (GCC) produces such files when the -c flag is specified, halting the process after assembly to output one object file per source file without performing linking. Similarly, Clang from the LLVM project follows this approach, generating object files in formats like ELF for Unix-like systems or COFF for Windows. The linking phase follows compilation, where a linker tool combines multiple object files, along with static libraries, into a single executable program or shared library. The GNU linker (ld) reads these inputs, performs symbol resolution by matching external references (uses of functions or variables) in one object file to their definitions in another or in libraries, and allocates sections into a unified layout. Microsoft's LINK.EXE operates analogously for COFF object files, integrating modules to form .exe or .dll outputs while resolving inter-module dependencies. This step enables modular development, as separate compilation units can be built independently before final assembly. Central to linking is the relocation process, which resolves symbolic addresses in object files to their final runtime locations. Object files contain relocation records that mark instructions or data elements requiring address adjustments, such as jumps to external functions or global variables; the linker computes offsets based on the assigned positions and patches these sites accordingly. This ensures code portability across different load addresses, with static relocation occurring fully at link time for executables.
Common tools in this workflow include compilers like GCC and Clang for object file generation, and linkers such as ld or LINK.EXE for integration, often invoked automatically by build systems like Make or CMake. Errors during linking, such as undefined symbols where a reference lacks a corresponding definition, prompt the linker to halt and report failures, ensuring incomplete programs are not produced. Relocation errors, including mismatched types or overflow in address calculations, are similarly flagged to prevent invalid executables.
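The symbol-resolution step described above can be sketched in miniature. In this Python toy (all file and symbol names are fabricated, and real linkers track far more state), each object file is modeled as a set of defined and referenced symbols, and the function reports which references no input satisfies, the situation that produces an "undefined symbol" error:

```python
def resolve_symbols(object_files):
    """Return the set of referenced symbols that no input file defines."""
    defined = set()
    referenced = set()
    for obj in object_files:
        defined |= set(obj["defines"])
        referenced |= set(obj["references"])
    return referenced - defined

# main.o calls helper(), which util.o defines; printf stays unresolved
# and would have to come from a library such as libc.
objs = [
    {"defines": ["main"], "references": ["helper", "printf"]},
    {"defines": ["helper"], "references": []},
]
print(sorted(resolve_symbols(objs)))  # ['printf']
```

A real linker would next search the listed libraries for `printf` and fail the build only if it remained undefined after all inputs were scanned.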

Applications and Usage

Object files are integral to static linking, where the linker merges multiple relocatable object files along with static libraries to produce a standalone executable. This approach embeds all required code and data directly into the final binary, eliminating external dependencies at runtime and ensuring portability across systems without additional library installations. In dynamic linking, object files compiled as position-independent code serve as the foundation for shared object files (.so), which the operating system's dynamic linker loads into memory only when needed. This enables code sharing among multiple processes, reduces memory usage, and allows updates to libraries without recompiling applications. Object files act as the core components for constructing libraries, with static libraries (.a) formed by archiving groups of object files for inclusion in executables during linking. Dynamic libraries, built from object files generated with position-independent flags such as -fPIC, support runtime loading and promote modular software design. Cross-compilation leverages object files to target diverse architectures, such as generating ARM-compatible files from an x86-based host using specialized toolchains, facilitating development for embedded devices, mobile platforms, and other specialized environments. Modern build systems like Make and CMake utilize object files to enable incremental compilation, where only changed source files are recompiled into new object files before relinking, significantly accelerating development cycles for large-scale projects. In embedded systems, object files are essential for producing optimized firmware, as compilers generate architecture-specific files that link into minimal executables tailored for microcontrollers with limited resources. Within containerized workflows, such as Docker multi-stage builds, object files emerge as transient intermediates during compilation in builder stages, yielding final binaries or libraries copied to runtime images for efficient, portable deployment.

Internal Components

Headers and Metadata

Object file headers and metadata provide the foundational structure for interpreting the file's contents, serving as the initial point of access for tools like assemblers, linkers, and loaders. These elements encapsulate critical descriptors that identify the file's format, target environment, and organizational layout, ensuring compatibility and proper handling across diverse systems. By embedding this information at the file's outset, headers facilitate rapid validation and navigation, preventing misinterpretation of subsequent data sections. The core of an object file header typically includes magic numbers, which are unique byte sequences at the file's beginning used to identify the format. For instance, in the Executable and Linkable Format (ELF), the magic number consists of the bytes 0x7F followed by 'E', 'L', and 'F' in the e_ident array, confirming it as an ELF file. Version fields specify the format iteration, such as ELF's e_version set to 1 for the original specification, allowing tools to handle evolving standards. Architecture indicators denote the target processor, distinguishing between variants like 32-bit (e.g., EM_386 for x86) and 64-bit (e.g., EM_X86_64), as seen in ELF's e_machine field. An entry point address, if present, marks the virtual location for program execution startup, such as ELF's e_entry field, which is zero for non-executable object files. Additional metadata elements include timestamps recording file creation or modification time, offsets to key structures like the section header table (e.g., ELF's e_shoff), and derived file size information from header aggregates. Endianness specifications clarify byte ordering, either little-endian (ELFDATA2LSB) or big-endian (ELFDATA2MSB) in ELF via e_ident[EI_DATA], while ABI details, such as ELF's e_ident[EI_OSABI], identify the operating system and calling conventions to ensure interoperability.
In the Common Object File Format (COFF), used for Windows object files, the file header includes a timestamp (TimeDateStamp) and a machine type identifying the target architecture (e.g., 0x8664 for AMD64). The PE format for executables extends COFF with an optional header containing fields such as the entry point (AddressOfEntryPoint) and image size (SizeOfImage). These header components collectively enable loaders and linkers to parse, validate, and process the object file efficiently. For example, ELF headers supply offsets like e_phoff for the program header table and e_shoff for the section header table, allowing sequential reading without scanning the entire file. In COFF-based formats, the 20-byte file header provides section counts, pointers, and flags for relocatability, guiding linking operations. This structured metadata ensures that disparate tools can reliably assemble relocatable code into executables or libraries, maintaining system integrity across architectures.
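The ELF header fields described above are fixed-size and easy to decode directly. The following Python sketch builds a minimal, fabricated ELF64 little-endian header in memory (type ET_REL = 1, machine EM_X86_64 = 62) and parses the identification bytes plus the first few fields with the standard struct module; it is a demonstration of the layout, not a full ELF reader:

```python
import struct

def parse_elf_header(data):
    """Decode the magic, class, endianness, and leading ELF64 fields."""
    assert data[:4] == b"\x7fELF", "bad magic number"
    ei_class = {1: "ELF32", 2: "ELF64"}[data[4]]
    ei_data = {1: "little-endian", 2: "big-endian"}[data[5]]
    # ELF64 little-endian layout after the 16-byte e_ident array:
    # e_type, e_machine, e_version, e_entry, e_phoff, e_shoff
    e_type, e_machine, e_version, e_entry, e_phoff, e_shoff = \
        struct.unpack_from("<HHIQQQ", data, 16)
    return {"class": ei_class, "data": ei_data, "type": e_type,
            "machine": e_machine, "entry": e_entry, "shoff": e_shoff}

# Fabricated header: relocatable (e_entry = 0), section headers at offset 64.
ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
rest = struct.pack("<HHIQQQIHHHHHH", 1, 62, 1, 0, 0, 64, 0, 64, 0, 0, 64, 0, 0)
hdr = parse_elf_header(ident + rest)
print(hdr["class"], hdr["machine"])  # ELF64 62
```

Note that e_entry is zero here, consistent with the convention that relocatable object files have no entry point.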

Code, Data, and Symbol Sections

Object files primarily consist of sections that store the program's code, data, and symbolic references, forming the core content for linking and loading. These sections are organized to separate instructions from modifiable data and metadata, enabling efficient processing by linkers and loaders. The code, data, and symbol sections, along with the section table that catalogs them, provide the essential content while maintaining modularity for relocation and combination during the build process. The code section, conventionally named .text, contains the machine instructions generated by the compiler or assembler for the program's execution. This section holds the compiled machine code in a read-only format to prevent unintended modifications during runtime, and it is marked with attributes that allow execution and allocation in memory. For instance, in typical object file structures, the .text section's size is defined to encompass the exact byte length of the instructions, with alignment requirements ensuring proper addressing, often to boundaries like 4 or 8 bytes. Permissions restrict it to read and execute operations only, safeguarding the integrity of the program's logic. Data sections store the program's variables and constants, divided into initialized and uninitialized variants to optimize file size and runtime initialization. The initialized data section, often called .data, includes explicitly set values such as global variables with initial assignments, and it is designated as read-write to allow updates during program execution. This section occupies space in the file proportional to the total size of the initialized data, with alignment attributes ensuring compatibility with the target architecture's access patterns. In contrast, the uninitialized data section, typically .bss, reserves space for variables that start with zero or undefined values; it does not consume file storage but specifies a size for zero-filling by the loader at runtime, also marked as read-write.
Both data sections include permissions for allocation and writing, distinguishing them from the immutable code section. The symbol table serves as a catalog of identifiers used in the program, such as functions, global variables, and labels, each entry detailing the symbol's name, type (e.g., function or object), scope (local or external), and either a resolved address within the object file or a marker for external resolution during linking. This table facilitates the linker's ability to connect references across multiple object files, supporting both static and dynamic linking processes. Symbols are stored in a structured table where each entry includes binding information to indicate visibility and relocation needs, ensuring that undefined symbols trigger appropriate error handling or library resolution. The section table acts as a directory that enumerates all sections in the object file, providing metadata for each including its name, size, offset within the file, alignment constraints, and permission flags such as readable, writable, or executable. This table, typically an array of fixed-size entries, allows tools to navigate and manipulate the file's contents without parsing the entire file. For example, size attributes specify the byte length allocated for each section, while alignment ensures sections start at optimal addresses to avoid access penalties, and permissions define runtime access controls enforced by the operating system loader. The ELF header points to this table to enable quick access during processing.
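A symbol-table entry of the kind described above can be decoded the same way. This Python sketch unpacks one fabricated 24-byte ELF64 symbol entry (layout: st_name, st_info, st_other, st_shndx, st_value, st_size) and splits the st_info byte into its binding and type halves; the string table and symbol are invented for illustration:

```python
import struct

def parse_sym(entry, strtab):
    """Decode one 24-byte ELF64 symbol-table entry against a string table."""
    st_name, st_info, st_other, st_shndx, st_value, st_size = \
        struct.unpack("<IBBHQQ", entry)
    name = strtab[st_name:strtab.index(b"\0", st_name)].decode()
    return {"name": name,
            "binding": st_info >> 4,   # high nibble, e.g. STB_GLOBAL = 1
            "type": st_info & 0xF,     # low nibble, e.g. STT_FUNC = 2
            "defined": st_shndx != 0,  # section index 0 (SHN_UNDEF) = external
            "value": st_value, "size": st_size}

# Fabricated entry: a GLOBAL FUNC symbol named "helper" that is still
# undefined (st_shndx = 0), exactly as a linker would see an external call.
strtab = b"\0helper\0"
entry = struct.pack("<IBBHQQ", 1, (1 << 4) | 2, 0, 0, 0, 0)
sym = parse_sym(entry, strtab)
print(sym["name"], sym["defined"])  # helper False
```

The "defined" flag is what drives symbol resolution: entries with SHN_UNDEF must be matched against definitions in other object files or libraries.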

Relocation and Debugging Information

Object files include relocation tables to facilitate the adjustment of addresses during linking and loading, allowing code and data references to be resolved relative to a base address without recompilation. These tables consist of entries that specify the offsets within sections where modifications are needed, along with the type of relocation to apply. For instance, in the Executable and Linkable Format (ELF), a relocation entry like R_386_PC32 indicates a 32-bit PC-relative relocation for the x86 architecture, where the linker adjusts the instruction to account for the difference between the target address and the program counter at runtime. Each entry typically includes fields such as the offset to the location requiring adjustment, the index into the symbol table for the referenced entity, and the relocation type, enabling the linker to compute the final address by adding offsets, section alignments, or symbol values as needed. In multi-file scenarios, relocation tables support linking by referencing symbols defined in other object files, allowing the linker to patch references across modules during the creation of an executable or shared library. This process resolves external dependencies, such as function calls or global variables, by matching relocation entries to definitions and applying the appropriate adjustments, which is essential for modular program development. The purpose of these tables extends to runtime loading, where dynamic linkers like ld.so use them to fix up addresses in position-independent code for shared libraries, ensuring correct execution regardless of the memory location assigned by the operating system. Debugging information in object files provides source-level mappings that enable developers to analyze program behavior using tools like GDB, without altering the compiled code.
Common formats include DWARF, a standardized debugging data format that encodes details such as line numbers, local variables, function scopes, and type information in a compact, hierarchical structure stored in dedicated sections like .debug_info and .debug_line. In contrast, the older stabs format uses a simpler, symbol-table-based approach to record similar data, often embedded in string tables, though it is less efficient and has been largely superseded by DWARF in modern systems. These formats allow debuggers to correlate machine instructions with original source code, supporting features like breakpoints, stack traces, and variable inspection during execution or post-mortem analysis of core dumps. By including debugging information directly in object files, compilers enable seamless integration into the linking process, where it is preserved or stripped as needed, facilitating development workflows that retain debug data during development without burdening production binaries. This auxiliary data ensures that tools can perform symbolic debugging across linked modules, mapping relocations and symbols back to high-level constructs for effective troubleshooting.
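The PC-relative fixup mentioned above reduces to simple arithmetic: the patched value is the symbol address S plus the addend A minus the address P of the field being patched. This Python sketch applies that formula to a buffer of fabricated code bytes (the offsets and target address are invented; real R_386_PC32 handling also involves section base addresses):

```python
def apply_pc_relative(section, offset, symbol_addr, addend=0):
    """Patch a 32-bit PC-relative relocation: value = S + A - P."""
    value = (symbol_addr + addend - offset) & 0xFFFFFFFF
    section[offset:offset + 4] = value.to_bytes(4, "little")

# A call instruction whose 4-byte displacement field starts at offset 0x11;
# the target function has been placed at 0x40. The addend of -4 accounts
# for the CPU computing the jump relative to the *next* instruction.
code = bytearray(0x20)
apply_pc_relative(code, 0x11, 0x40, addend=-4)
print(hex(int.from_bytes(code[0x11:0x15], "little")))  # 0x2b
```

The linker performs exactly this kind of patch for every relocation entry once final section addresses are known.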

Types

Relocatable Object Files

Relocatable object files are intermediate binary files produced by compilers or assemblers that contain code and data suitable for linking with other such files to form an executable or shared object. These files feature unresolved addresses, allowing the linker to assign final locations during the linking process, thereby enabling the code and data to be placed at any position in the final program's address space. Key features of relocatable object files include symbol tables that record external references to functions and variables defined in other modules, as well as relocation entries that mark locations in the code or data requiring address adjustments. Relocatable object files compiled for shared libraries may incorporate position-independent code (PIC), which avoids hard-coded absolute addresses to facilitate loading at arbitrary memory locations. Relocation tables within these files provide the necessary information for the linker to resolve these references, though detailed structures are handled during the linking phase. The primary advantages of relocatable object files lie in supporting modular compilation, where individual source modules can be compiled independently and later combined, accelerating development and build times for large projects by allowing recompilation of only modified components. This also enhances maintainability, as changes to one module do not necessitate rebuilding the entire program, and facilitates the creation of reusable components like static and shared libraries that can be linked across multiple applications. However, relocatable object files have limitations, including the necessity of a linker to resolve references and perform address relocation, preventing direct execution without further processing. Additionally, they tend to be larger than final executables due to embedded metadata such as symbol tables, relocation information, and debugging data, which increases storage requirements during development.

Absolute Object Files

Absolute object files are files that contain machine code, data, and metadata with all addresses resolved to fixed, absolute locations, enabling direct loading into memory without requiring relocation or additional linking processes. These files are typically the final output of a linker in embedded or legacy toolchains, where all symbolic references have been replaced with concrete addresses based on a predefined memory map. Unlike relocatable formats, absolute object files omit relocation tables, which specify how addresses should be adjusted during loading, making them suitable for environments with static memory allocation. Historically, absolute object files played a key role in early computing systems with limited resources and fixed memory models. In DOS, .COM files functioned as simple absolute executables, comprising raw machine code loaded at a predetermined offset of 0x0100 in memory, without headers, symbol tables, or relocation information, keeping the format compact for 8086-based systems. Similarly, the S-record (SREC) format, introduced by Motorola in the 1970s for the MC6800 microprocessor, represents absolute object data in an ASCII-hexadecimal encoding, where each record specifies data bytes along with their exact absolute addresses, facilitating programming of EPROMs and ROMs in embedded applications. The Absolute Object Module Format (AOMF), a variant of the 8051 Object Module Format (OMF), further exemplifies this era, structuring records such as content and debug data with absolute offsets and excluding relocation directives to produce a single, self-contained module. Key features of absolute object files include the absence of symbol tables for external references and relocation records, which reduces file size, often by eliminating overhead metadata, and simplifies loading but renders the code inflexible for reuse or modular assembly. This results in a streamlined layout, such as contiguous code blocks with segment identifiers set to zero and explicit offsets, ensuring all elements are positioned for immediate execution in a fixed address space.
In modern niches, absolute object files remain relevant in resource-constrained embedded systems, particularly for bootloaders and firmware images where memory is preallocated and non-relocatable, allowing direct conversion to formats like SREC for flashing microcontrollers without runtime address resolution. For instance, toolchains like those from Renesas generate absolute S-record outputs from linked binaries to represent images at specific flash addresses.
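The S-record encoding just described is simple enough to sketch. This Python example builds one S1 record (data with a 16-bit absolute address) from fabricated bytes; the byte count covers the address, data, and checksum, and the checksum is the ones' complement of the low byte of the sum of all preceding record bytes:

```python
def make_s1_record(address, data):
    """Build a Motorola S1 record: count, 16-bit address, data, checksum."""
    payload = bytes([len(data) + 3, address >> 8, address & 0xFF]) + data
    checksum = (~sum(payload)) & 0xFF  # ones' complement of the low byte
    return "S1" + (payload + bytes([checksum])).hex().upper()

# Two fabricated code bytes placed at absolute address 0x0100.
rec = make_s1_record(0x0100, bytes([0x7E, 0x01]))
print(rec)  # S10501007E017A
```

A loader or EPROM programmer reverses the process, verifying the checksum before writing each record's data bytes to the stated address.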

Common Formats

Executable and Linkable Format (ELF)

The Executable and Linkable Format (ELF) is a standard object file format designed for executables, relocatable object files, shared libraries, and core dumps in Unix-like operating systems. It provides a flexible, extensible structure that supports multiple architectures and facilitates both static and dynamic linking processes. ELF files are particularly prevalent in environments requiring efficient loading and execution, such as Linux distributions, Solaris, Android, and embedded systems. Developed by UNIX System Laboratories (USL) in the early 1990s as part of the System V Application Binary Interface (ABI), ELF was introduced with UNIX System V Release 4 to standardize binary formats across Unix variants. The format was later refined by the Tool Interface Standard (TIS) Committee, with Version 1.2 published in 1995, emphasizing portability for 32-bit Intel architectures while allowing extensions for broader use. The ELF specification continues to evolve; as of September 2025, a draft for version 4.3 is under public review. Its adoption extended rapidly due to its support for relocatable objects (for linking), executables (for direct execution), and shared objects (for dynamic libraries), making it a cornerstone for modern Unix-derived systems. In Android, ELF serves as the exclusive format for native binaries and libraries, enabling cross-compilation via the Native Development Kit (NDK). Similarly, embedded Linux platforms leverage ELF for its lightweight structure in resource-constrained environments. The core structure of an ELF file begins with the ELF header, a fixed-size structure at offset zero that identifies the file type and provides metadata such as the target architecture (e.g., the machine type for x86 or ARM), object file class (32-bit or 64-bit), data encoding (endianness), version, and offsets to subsequent tables. Following the ELF header are the program header table (present in executables and shared objects, absent in relocatable files) and the section header table, which describe loadable segments and individual sections, respectively.
Program headers define memory segments for runtime loading, including types like PT_LOAD for loadable code and data segments and PT_DYNAMIC for dynamic-linker information. Section headers, in contrast, catalog granular components such as .text (executable code), .data (initialized data), .bss (uninitialized data), and specialized sections like .dynsym (the dynamic symbol table) and .strtab (the string table). A distinguishing feature of ELF is its support for two object file classes: ELF-32 for 32-bit systems using 32-bit addresses, and ELF-64 for 64-bit systems accommodating larger address spaces and extended data types.

Dynamic linking is enabled through the .dynamic section, which holds an array of tag-value pairs (e.g., DT_NEEDED for shared library dependencies, DT_PLTRELSZ for the size of the procedure linkage table relocations) that the runtime linker uses to resolve symbols at load time or lazily during execution. This design supports position-independent executables (PIE) and shared objects (.so files), promoting modularity and reducing memory footprint in multi-process environments like those in Linux and Solaris.
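The tag-value scan that the runtime linker performs over the .dynamic array can be illustrated with a short Python sketch. The tag constants are from the System V gABI; the .dynamic entries and string table below are fabricated for the example:

```python
import struct

# 64-bit .dynamic entries are (d_tag, d_val) pairs of 8-byte words;
# the tag values are defined by the System V gABI.
DT_NULL, DT_NEEDED = 0, 1
DYN_ENTRY = struct.Struct("<qQ")

def needed_libraries(dynamic: bytes, strtab: bytes) -> list:
    """Collect DT_NEEDED names, resolving each d_val as an offset
    into the dynamic string table (as the runtime linker does)."""
    names = []
    for off in range(0, len(dynamic), DYN_ENTRY.size):
        d_tag, d_val = DYN_ENTRY.unpack_from(dynamic, off)
        if d_tag == DT_NULL:      # DT_NULL marks the end of the array
            break
        if d_tag == DT_NEEDED:
            end = strtab.index(b"\x00", d_val)
            names.append(strtab[d_val:end].decode())
    return names

# Synthetic string table and .dynamic array with two dependencies.
strtab = b"\x00libc.so.6\x00libm.so.6\x00"
dynamic = b"".join(DYN_ENTRY.pack(t, v) for t, v in
                   [(DT_NEEDED, 1), (DT_NEEDED, 11), (DT_NULL, 0)])
print(needed_libraries(dynamic, strtab))  # ['libc.so.6', 'libm.so.6']
```

This mirrors what `readelf -d` reports as the "NEEDED" entries of a shared object.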

Common Object File Format (COFF)

The Common Object File Format (COFF) is a format originally developed by AT&T in the early 1980s for use in Unix System V, particularly for non-VAX 32-bit platforms such as the 3B20 computer. It was designed to store object code, supporting both relocatable object files—which allow linking with other modules—and absolute object files for direct execution or loading. COFF provided improvements over earlier formats like a.out, including better support for debugging and portability across architectures. In the Microsoft Windows ecosystem, COFF forms the foundation of the Portable Executable (PE) format, adapted for Windows NT and subsequent operating systems to handle executables, object files, and dynamic-link libraries (DLLs).

The structure of a COFF file begins with a 20-byte file header that specifies essential metadata: the target machine type (e.g., 0x014C for Intel 386-compatible processors), the number of sections, a timestamp, a file pointer to the symbol table (which the string table immediately follows), the number of symbols, the size of the optional header, and characteristics flags indicating file properties such as whether the file is executable or relocatable. Following the file header is an optional header, which is mandatory for PE images but can be absent in pure COFF object files; it includes fields such as the image base address, the entry point, and data directories for additional structures in executables. An array of section headers then describes each section (up to 96 in a PE image), with each 40-byte header detailing the section name (e.g., .text for code or .data for initialized data), virtual and raw sizes, offsets, and characteristics like readability, writability, or executability. The actual section data follows, containing machine code, initialized data, or other content. Relocation entries and line number entries are associated with sections to facilitate address fixups during linking and source-level debugging, respectively.
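As a sketch, the 20-byte file header can be packed and decoded with Python's `struct` module; the field order follows the PE/COFF specification, and the sample values (an x64 object with two sections) are invented for illustration:

```python
import struct

# The 20-byte COFF file header, little-endian, as laid out in the
# PE/COFF specification: machine, section count, timestamp, symbol
# table pointer, symbol count, optional header size, characteristics.
COFF_HEADER = struct.Struct("<HHIIIHH")
IMAGE_FILE_MACHINE_I386 = 0x014C
IMAGE_FILE_MACHINE_AMD64 = 0x8664

def parse_coff_header(data: bytes) -> dict:
    (machine, nsections, timestamp, symtab_ptr,
     nsymbols, opt_size, flags) = COFF_HEADER.unpack_from(data)
    return {
        "machine": {IMAGE_FILE_MACHINE_I386: "i386",
                    IMAGE_FILE_MACHINE_AMD64: "x64"}.get(machine, hex(machine)),
        "sections": nsections,
        "symbols": nsymbols,
        "symbol_table_offset": symtab_ptr,
        "optional_header_size": opt_size,  # 0 for a plain object file
        "characteristics": flags,
    }

# A plain relocatable object: no optional header, two 40-byte
# section headers directly after the file header.
raw = COFF_HEADER.pack(IMAGE_FILE_MACHINE_AMD64, 2, 0, 20 + 2 * 40, 3, 0, 0)
hdr = parse_coff_header(raw)
print(hdr["machine"], hdr["sections"], hdr["symbols"])  # x64 2 3
```

A zero `optional_header_size` is how tools distinguish a bare object file from a PE image sharing the same header layout.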
A key component is the symbol table, an array of 18-byte entries located after the sections, each representing a symbol such as a function, variable, or file, with fields for the name (stored inline or via an index into a trailing string table), the value (e.g., an offset within a section), the section number, the type (combining a base type such as integer with a derived type such as function), the storage class (e.g., external or static), and the number of auxiliary entries. Auxiliary entries provide supplementary information, such as function lengths, section checksums, or weak external references, in formats tailored to the symbol type.

For relocations, COFF defines machine-specific types; for example, IMAGE_REL_I386_DIR32 (value 0x0006) specifies a 32-bit absolute virtual address relocation for x86 architectures, while IMAGE_REL_AMD64_ADDR64 (0x0001) handles 64-bit absolute addresses on x64. Line number tables map machine instructions to source line numbers, consisting of pairs with a type field (a symbol table index or a relative virtual address) and the line number, aiding debuggers in correlating binary code with source code.

Over time, COFF was largely superseded by the Executable and Linkable Format (ELF) in Unix systems starting in the early 1990s, as ELF offered greater flexibility for dynamic loading and extensibility. However, Microsoft extended COFF into the PE/COFF specification for Win32 in the early 1990s and further adapted it for 64-bit Windows (PE32+), retaining core COFF elements while adding Windows-specific features like import/export tables and resource sections to support the operating system's loader and runtime environment. This evolution ensured compatibility with legacy tools while enabling modern Windows applications.
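The inline-versus-string-table name convention for the 18-byte symbol records can be demonstrated in Python; the symbol names, values, and string-table contents here are synthetic:

```python
import struct

# An 18-byte COFF symbol record: 8-byte name field, value, section
# number, type, storage class, and auxiliary-entry count.
SYMBOL = struct.Struct("<8sIhHBB")
IMAGE_SYM_CLASS_EXTERNAL = 2

def symbol_name(raw_name: bytes, strtab: bytes) -> str:
    """Names up to 8 bytes are stored inline; longer names are
    referenced by a 4-byte offset into the string table, signalled
    by the first four name bytes being zero."""
    if raw_name[:4] == b"\x00\x00\x00\x00":
        offset = struct.unpack_from("<I", raw_name, 4)[0]
        end = strtab.index(b"\x00", offset)
        return strtab[offset:end].decode()
    return raw_name.rstrip(b"\x00").decode()

# The string table begins with its own 4-byte size field, so the
# first usable offset is 4.
strtab = b"\x00\x00\x00\x00a_rather_long_symbol\x00"
entries = [
    SYMBOL.pack(b"_main\x00\x00\x00", 0x10, 1, 0x20,
                IMAGE_SYM_CLASS_EXTERNAL, 0),          # inline name
    SYMBOL.pack(struct.pack("<II", 0, 4), 0x40, 1, 0x20,
                IMAGE_SYM_CLASS_EXTERNAL, 0),          # string-table name
]
for raw in entries:
    name, value, sect, typ, cls, naux = SYMBOL.unpack(raw)
    print(symbol_name(name, strtab), hex(value), sect)
# _main 0x10 1
# a_rather_long_symbol 0x40 1
```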

Mach-O

The Mach-O file format, short for Mach Object, serves as the native binary format for executables, relocatable object code, shared libraries, dynamically loaded modules, and core dumps in Apple's operating systems, including macOS and iOS. Originally developed by NeXT in the late 1980s as part of the NeXTSTEP operating system, it succeeded the older a.out format and was designed to support the Mach microkernel's flexible addressing and dynamic linking needs. Following Apple's acquisition of NeXT in 1997, Mach-O became the foundation for Mac OS X (now macOS) and subsequent platforms, evolving to accommodate architectures such as PowerPC, x86, and ARM.

At its core, a Mach-O file consists of a header, followed by load commands and the data they reference. The header includes a magic number identifying the file type (e.g., MH_MAGIC for 32-bit or MH_MAGIC_64 for 64-bit), the target CPU type (such as CPU_TYPE_X86_64 or CPU_TYPE_ARM64), the file type (e.g., MH_OBJECT for relocatable files, MH_EXECUTE for executables, or MH_DYLIB for dynamic libraries), the number of load commands, and their total size. Load commands then detail the file's structure, including segment mappings, symbol tables, and dynamic library dependencies; for instance, LC_SEGMENT_64 specifies a segment's virtual memory address, size, and permissions, while LC_LOAD_DYLIB lists required shared libraries. The file's content is organized into segments—logical groupings like __TEXT (read-only code and constants) and __DATA (writable variables)—each containing one or more sections, such as __text for machine code or __bss for uninitialized data. Segments align to page boundaries (typically 4 KB or 16 KB) for efficient memory mapping.

Key features distinguish Mach-O from other formats, particularly in supporting Apple's multi-architecture environments.
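A minimal sketch of decoding the 64-bit Mach-O header with Python's `struct` module follows; the constants match Apple's `mach-o/loader.h` definitions, and the sample header bytes are synthetic:

```python
import struct

# The 32-byte mach_header_64, little-endian on current Apple hardware:
# magic, cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags, reserved.
MACH_HEADER_64 = struct.Struct("<IiiIIIII")
MH_MAGIC_64 = 0xFEEDFACF
CPU_TYPE_X86_64 = 0x01000007
CPU_TYPE_ARM64 = 0x0100000C
FILETYPES = {1: "MH_OBJECT", 2: "MH_EXECUTE", 6: "MH_DYLIB"}

def parse_mach_header(data: bytes) -> dict:
    (magic, cputype, cpusubtype, filetype,
     ncmds, sizeofcmds, flags, _reserved) = MACH_HEADER_64.unpack_from(data)
    if magic != MH_MAGIC_64:
        raise ValueError("not a 64-bit Mach-O file")
    return {
        "cpu": {CPU_TYPE_X86_64: "x86_64",
                CPU_TYPE_ARM64: "arm64"}.get(cputype, hex(cputype)),
        "filetype": FILETYPES.get(filetype, hex(filetype)),
        "ncmds": ncmds,            # load commands that follow the header
        "sizeofcmds": sizeofcmds,  # total bytes occupied by load commands
    }

# A synthetic arm64 executable header with 16 load commands.
raw = MACH_HEADER_64.pack(MH_MAGIC_64, CPU_TYPE_ARM64, 0, 2, 16, 1928, 0, 0)
hdr = parse_mach_header(raw)
print(hdr["cpu"], hdr["filetype"], hdr["ncmds"])  # arm64 MH_EXECUTE 16
```

After this header, a real parser would iterate `ncmds` load commands, dispatching on each command's type field to find LC_SEGMENT_64, LC_LOAD_DYLIB, and the rest.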
Universal binaries, or "fat" files, encapsulate multiple binaries within a single container using a fat_header and an array of fat_arch structures, each specifying the offset and length of a slice for a particular CPU type and subtype; this allows a single binary to run on diverse hardware, such as Intel x86_64 and ARM-based Apple silicon, without separate distributions. Additionally, Mach-O employs a two-level namespace for symbol resolution, in which undefined symbols are qualified by their originating library (via an ordinal index recorded in the symbol table entry), reducing naming conflicts between libraries and enabling features like prebinding for faster startup. These elements facilitate robust dynamic linking via dyld, Apple's dynamic linker/loader, which interprets load commands to bind libraries at launch or lazily at runtime.

Mach-O's adoption remains central to the Apple ecosystem, powering all native applications, frameworks, and system components in macOS and iOS since their inception. Its design accommodates security features such as code signing and address space layout randomization, while dyld handles just-in-time symbol resolution and caching to enhance launch performance across devices. As of 2025, it continues to evolve with support for modern extensions, ensuring compatibility through the Intel-to-ARM transition.
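Walking the fat_header and its fat_arch array can be sketched in Python; note that, unlike the Mach-O slices they describe, these headers are always stored big-endian. The slice offsets and sizes below are invented for illustration:

```python
import struct

# Fat (universal) binary headers are always big-endian.
FAT_MAGIC = 0xCAFEBABE
FAT_HEADER = struct.Struct(">II")    # magic, nfat_arch
FAT_ARCH = struct.Struct(">iiIII")   # cputype, cpusubtype, offset, size, align
CPU_NAMES = {0x01000007: "x86_64", 0x0100000C: "arm64"}

def list_slices(data: bytes) -> list:
    """Return (cpu, offset, size) for each embedded Mach-O slice."""
    magic, nfat = FAT_HEADER.unpack_from(data)
    if magic != FAT_MAGIC:
        raise ValueError("not a fat binary")
    slices = []
    for i in range(nfat):
        cputype, _sub, offset, size, _align = FAT_ARCH.unpack_from(
            data, FAT_HEADER.size + i * FAT_ARCH.size)
        slices.append((CPU_NAMES.get(cputype, hex(cputype)), offset, size))
    return slices

# Two page-aligned slices, roughly how lipo would lay them out.
raw = FAT_HEADER.pack(FAT_MAGIC, 2)
raw += FAT_ARCH.pack(0x01000007, 3, 0x4000, 0x8000, 14)   # x86_64 slice
raw += FAT_ARCH.pack(0x0100000C, 0, 0x10000, 0x9000, 14)  # arm64 slice
print(list_slices(raw))
# [('x86_64', 16384, 32768), ('arm64', 65536, 36864)]
```

The loader picks the slice matching the running CPU and maps only that region, which is why a fat file costs disk space but not memory at runtime.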

References
