Intel HEX
from Wikipedia

Intel hex
Filename extensions:
  • General-purpose: .hex,[1] .mcs,[2] .int,[3] .ihex, .ihe, .ihx[4]
  • Platform-specific: .h80, .h86,[5][6] .a43,[7][4] .a90[7][4]
  • Split, banked, or paged: .hxl–.hxh,[8] .h00–.h15, .p00–.pff[9]
  • Binary or Intel hex: .obj, .obl,[8] .obh,[8] .rom, .eep

Intel hexadecimal object file format, Intel hex format or Intellec Hex is a file format that conveys binary information in ASCII text form,[10] making it possible to store the data on non-binary media such as paper tape or punch cards, to display it on text terminals, or to print it on line-oriented printers.[11] The format is commonly used for programming microcontrollers, EPROMs, and other types of programmable logic devices and hardware emulators. In a typical application, a compiler or assembler converts a program's source code (such as in C or assembly language) to machine code and outputs it into an object or executable file in hexadecimal (or binary) format. In some applications, the Intel hex format is also used as a container format holding packets of stream data.[12] Common file extensions used for the resulting files are .HEX[1] or .H86.[5][6] The HEX file is then read by a programmer to write the machine code into a PROM or is transferred to the target system for loading and execution.[11][13] There are various tools to convert files from hexadecimal to binary format (e.g. HEX2BIN), and vice versa (e.g. OBJHEX, OH, OHX, BIN2HEX).

History

The Intel hex format was originally designed for Intel's Intellec Microcomputer Development Systems[14]: 10–11  (MDS) in 1973 in order to load and execute programs from paper tape. It was also used to specify memory contents to Intel for ROM production,[15] which previously had to be encoded in the much less efficient BNPF (Begin-Negative-Positive-Finish) format.[14]: 11  In 1973, Intel's "software group" consisted only of Bill Byerly and Kenneth Burgett, and Gary Kildall as an external consultant doing business as Microcomputer Applications Associates (MAA) and founding Digital Research in 1974.[16][17][18][9] Beginning in 1975, the format was utilized by Intellec Series II ISIS-II systems supporting diskette drives, with files using the file extension HEX.[19] Many PROM and EPROM programming devices accept this format.

Format

Intel HEX consists of lines of ASCII text that are separated by line feed or carriage return characters or both. Each text line contains uppercase hexadecimal characters that encode multiple binary numbers. The binary numbers may represent data, memory addresses, or other values, depending on their position in the line and the type and length of the line. Each text line is called a record.

Record structure

A record (line of text) consists of six fields (parts) that appear in order from left to right:[11]

  1. Start code, one character, an ASCII colon ':'. All characters preceding this symbol in a record should be ignored.[15][5][20][21][22][23] In fact, very early versions of the specification even asked for a minimum of 25 NUL characters to precede the first record and follow the last one, owing to the format's origins as a paper tape format, which required some tape lead-in and lead-out for handling.[15][24][21][22] However, as this was a little-known part of the specification, not all software copes with it correctly. This rule makes it possible to store other related information in the same file (and even on the same line),[15][23] a facility used by various software development utilities to store symbol tables or additional comments,[25][15][21][26][9][27] and by third-party extensions that use other characters as start codes, such as the digits '0'..'9' by Intel[28] and Keil,[26] '$' by Mostek,[29][30] or '!', '@', '#', '\', '&' and ';' by TDL.[30][31] By convention, '//' is often used for comments.[32][33] None of these extensions may contain any ':' characters as part of the payload.
  2. Byte count, two hex digits (one hex digit pair), indicating the number of bytes (hex digit pairs) in the data field. The maximum byte count is 255 (0xFF). The values of 8 (0x08),[9] 16 (0x10)[9] and 32 (0x20) are commonly used byte counts. Not all software copes with counts larger than 16.[2]
  3. Address, four hex digits, representing the 16-bit beginning memory address offset of the data. The physical address of the data is computed by adding this offset to a previously established base address, thus allowing memory addressing beyond the 64 kilobyte limit of 16-bit addresses. The base address, which defaults to zero, can be changed by various types of records. Base addresses and address offsets are always expressed as big endian values.
  4. Record type (see record types below), two hex digits, 00 to 05, defining the meaning of the data field.
  5. Data, a sequence of n bytes of data, represented by 2n hex digits. Some records omit this field (n equals zero). The meaning and interpretation of data bytes depends on the application. (4-bit data will either have to be stored in the lower or upper half of the bytes, that is, one byte holds only one addressable data item.[15])
  6. Checksum, two hex digits, a computed value that can be used to verify the record has no errors.
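
The six fields listed above can be separated with a few lines of code. The following is a minimal sketch in Python, not a reference implementation; the names `Record` and `split_record` are hypothetical and chosen only for illustration. It ignores anything before the colon, as the format requires, and does not validate the checksum.

```python
from typing import NamedTuple


class Record(NamedTuple):
    """One decoded Intel HEX record (hypothetical helper type)."""
    byte_count: int
    address: int
    record_type: int
    data: bytes
    checksum: int


def split_record(line: str) -> Record:
    """Split one text line into its fields; the checksum is not verified here."""
    start = line.find(":")            # characters before the start code are ignored
    if start < 0:
        raise ValueError("no ':' start code found")
    raw = bytes.fromhex(line[start + 1:].strip())
    if len(raw) < 5:                  # count + 2 address bytes + type + checksum
        raise ValueError("record too short")
    byte_count = raw[0]
    if len(raw) != byte_count + 5:
        raise ValueError("byte count does not match record length")
    return Record(
        byte_count=byte_count,
        address=int.from_bytes(raw[1:3], "big"),   # address field is big endian
        record_type=raw[3],
        data=raw[4:4 + byte_count],
        checksum=raw[-1],
    )
```

For the record :0300300002337A1E this yields a byte count of 3, address 0030, record type 00, data bytes 02 33 7A and checksum 1E.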

Checksum calculation

A record's checksum byte is the two's complement of the least significant byte (LSB) of the sum of all decoded byte values in the record preceding the checksum. It is computed by summing the decoded byte values and extracting the LSB of the sum (i.e., the data checksum), and then calculating the two's complement of the LSB (e.g., by inverting its bits and adding one).

For example, in the case of the record :0300300002337A1E, the sum of the decoded byte values is 03 + 00 + 30 + 00 + 02 + 33 + 7A = E2, which has LSB value E2. The two's complement of E2 is 1E, which is the checksum byte appearing at the end of the record.

The validity of a record can be checked by computing its checksum and verifying that the computed checksum equals the checksum appearing in the record; an error is indicated if the checksums differ. Since the record's checksum byte is the two's complement — and therefore the additive inverse — of the data checksum, this process can be reduced to summing all decoded byte values, including the record's checksum, and verifying that the LSB of the sum is zero. When applied to the preceding example, this method produces the following result: 03 + 00 + 30 + 00 + 02 + 33 + 7A + 1E = 100, which has LSB value 00.
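
Both the calculation and the shortcut check can be written compactly. This is a small Python sketch under the assumption that the record has already been decoded from hex digit pairs into bytes; the function names `checksum` and `is_valid` are illustrative only.

```python
def checksum(payload: bytes) -> int:
    """Two's complement of the low byte of the sum of all bytes before the checksum."""
    return (-sum(payload)) & 0xFF


def is_valid(record_bytes: bytes) -> bool:
    """True when all decoded bytes, checksum included, sum to zero modulo 256."""
    return (sum(record_bytes) & 0xFF) == 0


# The example record :0300300002337A1E, without and with its checksum byte.
payload = bytes.fromhex("0300300002337A")
full = bytes.fromhex("0300300002337A1E")
print(f"{checksum(payload):02X}", is_valid(full))
```

Masking with 0xFF after negating gives the same result as inverting the bits of the LSB and adding one.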

Text line terminators

Intel HEX records are usually separated by one or more ASCII line termination characters so that each record appears alone on a text line. This enhances readability by visually delimiting the records and it also provides padding between records that can be used to improve machine parsing efficiency. However, the line termination characters are optional, as the ':' is used to detect the start of a record.[15][5][24][20][21][22][23]

Programs that create HEX records typically use line termination characters that conform to the conventions of their operating systems. For example, Linux programs use a single LF (line feed, hex value 0A) character to terminate lines, whereas Windows programs use a CR (carriage return, hex value 0D) followed by a LF.
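
Because the ':' start code marks each record, a reader does not have to rely on any particular line terminator. The sketch below, in Python, is one possible approach and assumes that the input contains no stray ':' characters outside of records; the name `find_records` is hypothetical.

```python
import re

# A record is a colon followed by a run of hexadecimal digits.
RECORD_RE = re.compile(r":([0-9A-Fa-f]+)")


def find_records(text: str) -> list[str]:
    """Return the hex digits of every record, located by its ':' start code."""
    return [m.group(1).upper() for m in RECORD_RE.finditer(text)]


# The same two records with CR LF, with LF, and with no terminators at all.
for sample in (":0300300002337A1E\r\n:00000001FF\r\n",
               ":0300300002337A1E\n:00000001FF\n",
               ":0300300002337A1E:00000001FF"):
    print(find_records(sample))
```

All three inputs produce the same two records, which is the behaviour the optional-terminator rule implies.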

Record types

Intel HEX has six standard record types:[11]

| Hex code | Record type | Description | Example |
|---|---|---|---|
| 00 | Data | The byte count specifies the number of data bytes in the record. The example has 0B (eleven) data bytes. The address field holds the 16-bit starting address for the data (0010 in the example), and the data field holds the data bytes (61, 64, 64, 72, 65, 73, 73, 20, 67, 61, 70). | :0B0010006164647265737320676170A7 |
| 01 | End Of File | Must occur exactly once per file, in the last record of the file. The byte count is 00, the address field is typically 0000 and the data field is omitted. | :00000001FF |
| 02 | Extended Segment Address | The byte count is always 02, the address field (typically 0000) is ignored and the data field contains a 16-bit segment base address. This is multiplied by 16 and added to each subsequent data record address to form the starting address for the data. This allows addressing up to one mebibyte (1048576 bytes) of address space. | :020000021200EA |
| 03 | Start Segment Address | For 80x86 processors, specifies the starting execution address. The byte count is always 04, the address field is 0000, the first two data bytes are the CS value and the latter two are the IP value. Execution should start at this address. | :0400000300003800C1 |
| 04 | Extended Linear Address | Allows for 32-bit addressing (up to 4 GiB). The byte count is always 02 and the address field is ignored (typically 0000). The two data bytes (big endian) specify the upper 16 bits of the 32-bit absolute address for all subsequent type 00 records; these upper address bits apply until the next 04 record. The absolute address for a type 00 record is formed by combining the upper 16 address bits of the most recent 04 record with the low 16 address bits of the 00 record. If a type 00 record is not preceded by any type 04 records, its upper 16 address bits default to 0000. | :020000040800F2 |
| 05 | Start Linear Address | The byte count is always 04 and the address field is 0000. The four data bytes represent a 32-bit address value (big endian). In the case of CPUs that support it, this 32-bit address is the address at which execution should start. | :04000005000000CD2A |
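
The two base-address rules for record types 02 and 04 reduce to a shift and an add. The following Python sketch illustrates the arithmetic only; the function names are hypothetical, and no wrap-around at segment or address-space boundaries is modelled.

```python
def segment_address(segment: int, offset: int) -> int:
    """Type 02 rule: 16-bit segment value times 16, plus the record's 16-bit offset."""
    return (segment << 4) + offset


def linear_address(upper16: int, offset: int) -> int:
    """Type 04 rule: upper 16 address bits combined with the record's low 16 bits."""
    return (upper16 << 16) | offset


# Values taken from the example records :020000021200EA and :020000040800F2,
# combined with a data record offset of 0010.
print(hex(segment_address(0x1200, 0x0010)))
print(hex(linear_address(0x0800, 0x0010)))
```

With the segment value 1200 from the type 02 example, a data record at offset 0010 lands at 12010; with the upper bits 0800 from the type 04 example, the same offset lands at 08000010.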

Other record types have been used for variants, including 06 ('blinky' messages / transmission protocol container) by Wayne and Layne,[34] 0A (block start), 0B (block end), 0C (padded data), 0D (custom data) and 0E (other data) by the BBC/Micro:bit Educational Foundation,[35] and 81 (data in code segment), 82 (data in data segment), 83 (data in stack segment), 84 (data in extra segment), 85 (paragraph address for absolute code segment), 86 (paragraph address for absolute data segment), 87 (paragraph address for absolute stack segment) and 88 (paragraph address for absolute extra segment) by Digital Research.[6][20]

Named formats

The original 4-bit/8-bit Intellec Hex Paper Tape Format and Intellec Hex Computer Punched Card Format in 1973/1974 supported only one record type, 00.[36][37][25] This was expanded around 1975 to also support record type 01.[15] Sometimes called symbolic hexadecimal format,[38] it could include an optional header containing a symbol table for symbolic debugging;[25][28][26][9] all characters in a record preceding the colon are ignored.[15][5]

Around 1978, Intel introduced the new record types 02 and 03 (to add support for the segmented address space of the then-new 8086/8088 processors) in their Extended Intellec Hex Format.

Special names are sometimes used to denote the formats of HEX files that employ specific subsets of record types. For example:

  • I8HEX (aka HEX-80) files use only record types 00 and 01
  • I16HEX (aka HEX-86) files use only record types 00 through 03[10]
  • I32HEX (aka HEX-386) files use only record types 00, 01, 04, and 05

File example

This example shows a file that has four data records followed by an end-of-file record:

:10010000214601360121470136007EFE09D2190140
:100110002146017E17C20001FF5F16002148011928
:10012000194E79234623965778239EDA3F01B2CAA7
:100130003F0156702B5E712B722B732146013421C7
:00000001FF
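
As a rough illustration of how such a file can be checked, the Python sketch below walks the five records of this example, verifies each checksum, and reports the load address, byte count and record type. The names `FILE_EXAMPLE` and `summarize` are hypothetical.

```python
FILE_EXAMPLE = """\
:10010000214601360121470136007EFE09D2190140
:100110002146017E17C20001FF5F16002148011928
:10012000194E79234623965778239EDA3F01B2CAA7
:100130003F0156702B5E712B722B732146013421C7
:00000001FF
"""


def summarize(text: str) -> list[tuple[int, int, int]]:
    """Return (address, byte count, record type) per record; raise on a bad checksum."""
    summary = []
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith(":"):
            continue                              # skip anything that is not a record
        raw = bytes.fromhex(line[1:])
        if sum(raw) & 0xFF:                       # all bytes must sum to 0 modulo 256
            raise ValueError(f"checksum mismatch in {line}")
        summary.append((int.from_bytes(raw[1:3], "big"), raw[0], raw[3]))
    return summary


for address, count, rtype in summarize(FILE_EXAMPLE):
    print(f"type {rtype:02X}: {count:2d} bytes at {address:04X}")
```

The four data records cover the contiguous range 0100 to 013F (64 bytes), followed by the end-of-file record.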

Variants

Besides Intel's own extension, several third parties have also defined variants and extensions of the Intel hex format, including Digital Research (as in the so-called "Digital Research hex format"[6][20]), Zilog, Mostek,[29][30] TDL,[30][31] Texas Instruments, Microchip,[39][40] c't, Wayne and Layne,[34] and BBC/Micro:bit Educational Foundation (with its "Universal Hex Format"[35]). These can have information on program entry points and register contents, a swapped byte order in the data fields, fill values for unused areas, fuse bits, and other differences.

The Digital Research hex format for 8086 processors supports segment information by adding record types to distinguish between code, data, stack, and extra segments.[5][6][20]

Most assemblers for CP/M-80 (and also XASM09 for the Motorola 6809) don't use record type 01h to indicate the end of a file, but use a zero-length data type 00h entry instead.[41][1] This eases the concatenation of multiple hex files.[42][43][1]

Texas Instruments defines a variant where addresses are based on the bit-width of a processor's registers, not bytes.

Microchip defines variants INTHX8S[44] (INHX8L,[1] INHX8H[1]), INHX8M,[44][1][45] INHX16[44] (INHX16M[1]) and INHX32[46] for their PIC microcontrollers.

Alfred Arnold's cross-macro-assembler AS,[1] Werner Hennig-Roleff's 8051-emulator SIM51,[26] and Matthias R. Paul's cross-converter BINTEL[47] are also known to define extensions to the Intel hex format.

from Grokipedia
The Intel HEX file format, also known as Intel hexadecimal format, is an ASCII-based standard for representing binary program and data files in a human-readable text form, primarily used to load code and data into read-only memories (ROMs), erasable programmable read-only memories (EPROMs), and other non-volatile storage in microprocessors and embedded systems. Developed by Intel Corporation in 1973, originally for its Intellec Microcomputer Development Systems, to support memory addressing up to 64 kilobytes, the format evolved to accommodate larger address spaces: the original 8-bit variant (I8HEX) for 16-bit linear addressing, the 16-bit segmented variant (I16HEX) for 20-bit addressing, and the 32-bit linear variant (I32HEX) for up to 4 gigabytes of addressable space. It was designed for transmission over non-binary media, such as paper tape or early terminals, and remains widely compatible with PROM programmers, emulators, and development tools from vendors such as Microchip and Renesas.

Each file consists of one or more records, each beginning with a colon (:) followed by a one-byte record length (the number of data bytes), a two-byte load offset address, a one-byte record type, optional data bytes, and a one-byte checksum for error detection, with every byte encoded as a pair of hexadecimal ASCII characters. The primary record types include data records (type 00) for binary content, end-of-file records (type 01) to terminate the file, extended segment address records (type 02) for shifting the base address in segmented modes, extended linear address records (type 04) for the upper 16 address bits in linear modes, and start address records (types 03 and 05) to specify execution entry points. This modular structure allows efficient handling of large binaries while maintaining backward compatibility across processor architectures.

Introduction

Definition and Purpose

The Intel HEX format is an ASCII-based representation of binary object files, enabling the embedding of code and data within human-readable text files. Developed for Intel's 8-bit, 16-bit, and 32-bit microprocessors, it encodes each byte as a pair of ASCII hexadecimal characters to facilitate storage and manipulation without the limitations of raw binary formats. Its primary purpose is to store and transfer program code, ROM images, or memory contents for programming microcontrollers and embedded systems. This format serves as a standard input for PROM programmers and hardware emulators, allowing reliable loading of code and data into target devices across various address spaces, including 16-bit linear for 8-bit processors, 20-bit segmented for 16-bit processors, and 32-bit linear for 32-bit processors.

The historical motivation for Intel HEX arose from the challenges of handling binary files in environments reliant on text-based media and displays: the format avoids issues such as line wrapping, corruption during transmission, and incompatibility with non-binary storage and display media like paper tape, punch cards, or CRT terminals. By converting binary data to ASCII hexadecimal, the format ensures both human readability and machine parsability, making it suitable for editing, printing, and archiving without specialized binary tools.

Fundamentally, Intel HEX files are structured as a sequence of records, each prefixed by a colon (:) and comprising a byte count, starting address, record type, data bytes, and a checksum for integrity verification. Record types such as data records and end-of-file records organize the content to represent complete memory images.

Key Features

The Intel HEX format employs ASCII encoding, where each byte of data is represented by two hexadecimal characters (0-9 and A-F), roughly doubling the file size compared to binary but enabling transmission over text-based channels. This encoding ensures compatibility with standard text editors and printers, as the values are stored as printable ASCII characters.

Its block-based structure organizes data into discrete records, each beginning with a colon (:) and containing fields for byte count, address, record type, data, and checksum, which collectively guard against corruption during transmission over non-binary media such as paper tape or serial lines. By delimiting content into these self-contained lines terminated by carriage return and line feed, the format minimizes errors from line breaks or partial reads in text streams.

The format supports absolute addressing up to 64 kilobytes in its base 16-bit configuration, with extended record types allowing expansion to 20-bit segmented or 32-bit linear address spaces for larger memory requirements. Each record includes a checksum, computed as the negated (two's complement) modulo-256 sum of all preceding bytes, providing self-validation to detect transmission errors without external verification tools.

Intel HEX achieves platform independence by treating data as byte sequences in hexadecimal form. The endianness of the payload is not a concern, since multi-byte data values are not assembled within the file but interpreted by the loading application; the format's own address fields are always big endian. This byte-oriented representation, combined with its ASCII nature, facilitates human readability, permitting manual inspection, editing, and verification of firmware content using any text viewer.

Historical Development

Origins

The Intel HEX format was originally developed by Intel Corporation in 1973 for its Intellec Microcomputer Development Systems (MDS) to load and execute programs, particularly over non-binary media like paper tape. The format's initial publication appeared in Intel's technical documentation, such as the MCS-48 User's Manual and PROM programmer guides. Its motivation stemmed from the limitations of binary files on teletype-era terminals, where the hexadecimal representation enabled better readability for human operators and built-in error detection via checksums. Early implementations of hexadecimal-formatted data loading occurred in Intel's hardware tools, such as the Universal PROM Programmer (UPP), for programming tasks.

Adoption and Evolution

The Intel HEX format saw rapid adoption among companies developing programming tools, particularly for 8-bit processors, where it facilitated the transfer of object code to PROMs and development systems. This uptake was driven by the format's ASCII-based readability and compatibility with early loaders and emulators, making it a common vehicle for code distribution in embedded applications of that era.

The format originated in 1973 for 16-bit linear addressing in Intel's MDS tools. Around 1978, record types 02 (extended segment address) and 03 (start segment address) were added to support the 20-bit segmented addressing of the 8086 processor family. Intel formalized the specification in 1988 through the "Hexadecimal Object File Format Specification (Revision A)," which defined the record structure for 8-bit, 16-bit, and 32-bit microprocessors and emphasized its use with programmers and hardware debuggers. This document solidified the format's structure, including mechanisms for segmented addressing, ensuring compatibility across Intel's evolving processor families.

In response to growing memory demands in PCs and embedded systems, the 1988 specification introduced extensions such as the Extended Linear Address Record (type 04) and Start Linear Address Record (type 05), enabling full 32-bit addressing up to 4 GiB and supporting the transition from 16-bit to 32-bit architectures. These enhancements addressed limitations in earlier 20-bit segmented addressing, allowing the format to handle larger codebases without fragmentation.

The format has remained relevant in the decades since, becoming integrated into commercial integrated development environments (IDEs) such as Keil µVision, which generates Intel HEX output via project options. Similarly, IAR Embedded Workbench supports Intel HEX as an output format through linker settings for microcontrollers like those from Microchip. Open-source tools like avrdude also rely on it for programming AVR devices, parsing records to upload firmware over serial interfaces. Although binary formats like ELF have supplanted it in full-fledged operating systems for their richer metadata, Intel HEX endures in legacy embedded programming and hobbyist projects due to its lightweight nature and direct compatibility with flash programmers.

Core Format

Record Structure

The Intel HEX format organizes binary data into discrete records, each represented as a single line of ASCII text encoded in hexadecimal notation. This structure ensures compatibility with text-based transmission and storage systems, allowing reliable transfer of code or data to devices like microcontrollers or programmers. Every record follows a consistent layout with fixed-position fields for metadata and a variable-length field for the data, enabling parsers to systematically decode the content.

The record begins with a mandatory prefix consisting of a single ASCII colon (:) character, which serves as the start code. Immediately following the colon, the byte count field spans the next two hexadecimal digits (character positions 1-2 after the colon), specifying the number of data bytes contained in the record; this value ranges from 00 to FF, corresponding to 0 through 255 bytes. The address field then occupies the subsequent four hexadecimal digits (positions 3-6), providing a 16-bit (two-byte) load offset where the data bytes are to be stored in memory. Next, the record type field uses two hexadecimal digits (positions 7-8) to indicate the record's purpose, such as 00 for a standard data record that carries the actual bytes in its data field.

The data field itself is variable in length, consisting of twice the byte count number of hexadecimal digits (each pair representing one byte of data), and follows immediately after the record type field. For instance, a byte count of 0A (decimal 10) results in 20 hexadecimal digits for this field, encoding 10 bytes of program or configuration data. Concluding the record is the checksum field, comprising the final two hexadecimal digits, which provides a validation mechanism to detect transmission errors.

The overall length of a record varies with the byte count: the minimum is 11 characters for a record with no data bytes (e.g., an end-of-file record), while the maximum reaches 521 characters when the record carries 255 data bytes. A skeletal representation of the structure is :NNAAAATT[DDDD...]CC, where NN is the byte count, AAAA the address, TT the type, DDDD... the optional data pairs, and CC the checksum.
| Field | Position (after colon) | Length (characters) | Description |
|---|---|---|---|
| Byte Count | 1-2 | 2 (hex digits) | Number of data bytes (00-FF). |
| Address | 3-6 | 4 (hex digits) | 16-bit load offset. |
| Record Type | 7-8 | 2 (hex digits) | Type identifier (e.g., 00 for data). |
| Data | 9 to 8 + 2×(byte count) | Variable (2×byte count hex digits) | Data bytes. |
| Checksum | Final 2 | 2 (hex digits) | Error detection value. |
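
Going in the other direction, the :NNAAAATT[DDDD...]CC layout can be assembled from its parts. The Python sketch below is one way to do so and is an illustration rather than a definitive encoder; `make_record` is a hypothetical name.

```python
def make_record(address: int, record_type: int, data: bytes = b"") -> str:
    """Build ':NNAAAATT[DD...]CC' from a 16-bit offset, a type and up to 255 data bytes."""
    if not 0 <= address <= 0xFFFF:
        raise ValueError("address must fit in 16 bits")
    if not 0 <= record_type <= 0xFF:
        raise ValueError("record type must fit in one byte")
    if len(data) > 0xFF:
        raise ValueError("at most 255 data bytes per record")
    body = bytes([len(data)]) + address.to_bytes(2, "big") + bytes([record_type]) + data
    check = (-sum(body)) & 0xFF          # two's complement of the sum, modulo 256
    return ":" + (body + bytes([check])).hex().upper()


print(make_record(0x0000, 0x01))                               # end-of-file record
print(make_record(0x0000, 0x00, bytes.fromhex("FEEFFFF0")))    # four data bytes
print(len(make_record(0x0000, 0x00, bytes(255))))              # longest possible record
```

The record lengths produced this way match the limits stated above: 11 characters with no data and 521 characters with 255 data bytes.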

Record Types

The Intel HEX format defines six standard record types, each identified by a one-byte type code that determines how the record's fields are interpreted within the common structure of byte count, address, data, and checksum. These types enable the specification of data placement, address extensions, file termination, and execution starting points, supporting various addressing modes from 16-bit to 32-bit systems.

  • Type 00 (Data Record) is the primary record for loading program code or data into memory. It specifies a variable number of data bytes (up to 255, indicated by the byte count field) to be stored sequentially starting at the 16-bit load offset provided in the address field; the address increments by one for each subsequent data byte, potentially rolling over from FFFF to 0000 without affecting higher bits. This type forms the bulk of most Intel HEX files, directly contributing the code or data content.
  • Type 01 (End of File Record) signals the completion of the Intel HEX file, instructing the loader to cease processing further records. It contains no data bytes (byte count must be 00) and ignores the address field, which is conventionally set to 0000; with that address the checksum is always FF. This record is mandatory and typically appears as the final line in the file.
  • Type 02 (Extended Segment Address Record) establishes the upper 16 bits (bits 4 through 19) of a 20-bit segmented base address for subsequent data records, enabling addressing up to 1 MB in legacy 16-bit systems. The byte count is fixed at 02, the address field is 0000 (unused), and the two data bytes hold the 16-bit segment value; this value is shifted left by four bits (multiplied by 16), leaving bits 3-0 of the base zero, and added to the load offsets of following type 00 records until another extended address record replaces it.
  • Type 03 (Start Segment Address Record) provides the initial execution address in segmented mode for 16-bit processors, such as the 8086, by specifying the code segment (CS) and instruction pointer (IP) registers. It uses a byte count of 04, an unused address field of 0000, and four data bytes: the first two for the 16-bit CS (MSB first) followed by two for the 16-bit IP (MSB first). This record is optional and serves runtime initialization rather than file loading.
  • Type 04 (Extended Linear Address Record) sets the upper 16 bits (bits 16 through 31) of a 32-bit linear base address for subsequent data records, supporting up to 4 GB of address space. The byte count is 02, the address field is 0000 (unused), and the two data bytes hold the upper address value (MSB first), which is combined with the 16-bit offsets from type 00 records to form full 32-bit addresses until another such record overrides it.
  • Type 05 (Start Linear Address Record) specifies the 32-bit linear execution start address, typically for the extended instruction pointer (EIP) in 32-bit Intel architectures like the 80386. It features a byte count of 04, an unused address field of 0000, and four data bytes representing the full 32-bit address (MSB first); like type 03, it is optional and defines the program entry point after loading rather than transferring data.
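
The two start address records differ only in how their four data bytes are split. The Python sketch below shows one way to decode them, assuming the data field has already been extracted as bytes; `decode_start` and the dictionary keys are hypothetical names used for illustration.

```python
def decode_start(record_type: int, data: bytes) -> dict[str, int]:
    """Decode the four data bytes of a type 03 or type 05 record (MSB first)."""
    if len(data) != 4:
        raise ValueError("start address records carry exactly four data bytes")
    if record_type == 0x03:                      # segmented: CS then IP
        return {"CS": int.from_bytes(data[:2], "big"),
                "IP": int.from_bytes(data[2:], "big")}
    if record_type == 0x05:                      # linear: one 32-bit address
        return {"EIP": int.from_bytes(data, "big")}
    raise ValueError("not a start address record")


print(decode_start(0x03, bytes.fromhex("00003800")))
print(decode_start(0x05, bytes.fromhex("000000CD")))
```

A loader would typically store the decoded value and hand it to the target only after all data records have been written.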

Checksum Mechanism

The checksum in the Intel HEX format serves to detect transmission or storage errors by ensuring the integrity of each record's contents. It is an 8-bit value appended to the end of every record, allowing parsers to verify that the byte count, address, record type, and data fields have not been corrupted. This mechanism provides a simple yet effective error-detection capability, commonly used in embedded systems programming where files are transferred serially or loaded into memory devices.

The checksum is calculated by summing the binary values of all bytes in the record from the byte count field through the last data byte (excluding the leading colon and the checksum itself). This sum is taken modulo 256 to obtain an 8-bit result, and the checksum byte is then the two's complement of that value, ensuring the total sum of all bytes including the checksum is zero modulo 256. Equivalently, the checksum byte C satisfies:

$$C = 256 - (S \bmod 256) \quad \text{if } S \bmod 256 \neq 0, \quad \text{else } C = 0$$

where S is the sum of the bytes from the byte count to the last data byte. This two's complement approach, also expressible as the bitwise NOT of the sum modulo 256 followed by adding 1, ensures that any single-bit error, and most common transmission faults, produce a non-zero total sum.

To verify a record, a parser recomputes the sum of all bytes from the byte count through the checksum byte and checks whether the result is zero modulo 256. If the total sum is not zero, the record is considered invalid, typically causing the parsing process to abort, log an error, or flag the affected record for manual correction, thereby preventing corrupted data from being loaded into target memory.

For example, consider the record :04000000FEEFFFF020, where the byte count is 04 (4 bytes), the address is 0000, the type is 00, and the data is FE EF FF F0. The binary bytes are: 0x04, 0x00, 0x00, 0x00, 0xFE, 0xEF, 0xFF, 0xF0. Their sum is 992 (0x3E0 in hex), and 992 mod 256 = 224 (0xE0). The checksum is then 256 - 224 = 32 (0x20), confirming the record's integrity, since including 0x20 yields a total sum of 1024 (0x400), which is 0 mod 256.

Line Encoding and Termination

Intel HEX files are encoded as a series of ASCII text lines, where each line represents a single record in the format. The records consist of hexadecimal digits encoded in ASCII characters, with each byte of data represented by two hexadecimal digits. By convention, these hexadecimal digits are uppercase (A-F), though lowercase (a-f) is also accepted by most parsers for flexibility in implementation. This ASCII-based encoding ensures that the file can be safely transmitted over text-based channels without corruption from binary data issues, as it avoids embedding null bytes (0x00) or other control characters in the representation.

Each record must occupy exactly one line, with no padding, wrapping, or spanning across multiple lines, to maintain parseability. Whitespace characters, such as spaces or tabs, are not permitted within the record fields; all hexadecimal pairs are contiguous following the initial colon (:) marker. Line terminators follow each record, typically a carriage return followed by a line feed (CRLF, hexadecimal 0D 0A) for broad compatibility across systems. In Unix-like environments, a line feed (LF, 0x0A) alone is common, but CRLF is recommended to ensure reliable parsing on Windows and other platforms. These terminators are not included in the record's checksum calculation.

At the file level, an Intel HEX file comprises multiple such records, beginning with either an extended address record or a data record, and concluding with an end-of-file record (type 01) to signal completion. To accommodate traditional terminal display widths and editing tools, records are conventionally limited to a maximum line length of approximately 256 characters, though the format technically supports up to 521 characters per line (corresponding to 255 bytes of data). This structure allows the file to be processed line by line, facilitating straightforward sequential reading and validation.

Examples and Parsing

Basic File Example

A basic Intel HEX file example demonstrates standard data loading using type 00 records for sequential memory filling within the 16-bit address space. The following minimal file loads 32 bytes starting at address 0100h:

:10010000214601360121470136007EFE09D2190140
:100110001C0200036C0001001F0000000000000032
:00000001FF

The first record loads 16 bytes of data at address 0100h using record type 00. The second record continues loading the next 16 bytes at address 0110h, also using type 00. The final record of type 01 marks the end of the file. Parsing involves reading each line as ASCII hexadecimal text, converting pairs of characters to binary byte values, applying the specified address to place the data in memory sequentially, and verifying each record's checksum by summing the byte count, address bytes, type byte, and data bytes, then confirming that the provided checksum is the negation of that sum modulo 256. This example represents a short routine, resulting in 32 bytes loaded into memory starting at 0100h.

Extended Address Example

The extended segment address record (type 02) in the Intel HEX format enables addressing of locations beyond 64 kilobytes by establishing a 16-bit segment base value from which subsequent data records (type 00) are offset, supporting up to 1 MB of addressable space in segmented architectures. This mechanism is essential for representing data or code in larger memory models where the standard 16-bit address field alone is insufficient. A representative example of an Intel HEX file using an extended segment address record is the following:

:020000020008F4
:10080000AABBCCDDEEFF00112233445566778899F0
:10081000BBCCDDEEFF00112233445566778899008A
:00000001FF

In this file, the first record :020000020008F4 is a type 02 extended segment address record with data bytes 00 08, setting the segment base to 0008h. The checksum F4 is the two's complement of the sum of all preceding byte fields (02 + 00 + 00 + 02 + 00 + 08 = 0Ch, negated to F4h). The subsequent type 00 data records load 16 bytes each at offsets 0800h and 0810h from the current base, while the final type 01 record :00000001FF marks the end of the file.

The address of a data record that follows the extension is found by shifting the segment base left by 4 bits (multiplying by 16) and adding the record's 16-bit address field: full address = (segment base × 16) + record address. In the example, the base 0008h × 16 = 00080h; thus the first data record loads at 00080h + 0800h = 00880h, and the second at 00080h + 0810h = 00890h. This approach allows sparse or non-contiguous memory loading without requiring continuous addressing from zero, and it is commonly applied in firmware for systems with more than 64 KB of memory. During parsing, the base address remains active for all following records until a new extension record (type 02 or 04) replaces it, ensuring correct sequential interpretation of the file.
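The segment arithmetic can be checked with a short sketch. The function name is illustrative, and the 20-bit mask is an assumption that mirrors the 8086 address space rather than something the record itself enforces.

```python
def segment_to_physical(segment_base: int, offset: int) -> int:
    """Combine a type 02 segment base with a data record's 16-bit offset."""
    if not 0 <= segment_base <= 0xFFFF or not 0 <= offset <= 0xFFFF:
        raise ValueError("segment base and offset are 16-bit values")
    # Shift the base left by 4 bits (multiply by 16), add the offset,
    # and keep the result within the 20-bit (1 MB) address space (assumption).
    return ((segment_base << 4) + offset) & 0xFFFFF


for offset in (0x0800, 0x0810):
    print(f"{segment_to_physical(0x0008, offset):05X}")
```

Because different segment/offset pairs can map to the same physical address, a loader that checks for overlapping records has to compare the resolved addresses rather than the raw fields.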

Extended Linear Address Example

The extended linear address record (type 04) specifies the upper 16 bits of a 32-bit linear base address, allowing data records to address up to 4 gigabytes. Subsequent type 00 records use this base shifted left by 16 bits plus their offset. A simple example:

:0200000400F00A
:10000000AABBCCDDEEFF00112233445566778899F8
:00000001FF

Here, :0200000400F00A sets the upper 16 bits of the address to 00F0h. Its checksum is the two's complement of the field sum: 02 + 00 + 00 + 04 + 00 + F0 = F6h, negated to 0Ah. The full address = (upper base << 16) + record offset, so with an upper base of 00F0h the data record at offset 0000h loads to 00F00000h. This record type is used in 32-bit systems such as modern embedded devices.
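As a minimal sketch of the linear address calculation (the function name is an illustrative assumption), the upper base from a type 04 record and the offset of a data record combine as follows:

```python
def linear_address(upper_base: int, offset: int) -> int:
    """Combine a type 04 upper base with a data record's 16-bit offset."""
    if not 0 <= upper_base <= 0xFFFF or not 0 <= offset <= 0xFFFF:
        raise ValueError("upper base and offset are 16-bit values")
    # The upper base supplies bits 16-31, the record offset bits 0-15.
    return (upper_base << 16) | offset


print(f"{linear_address(0x00F0, 0x0000):08X}")
print(f"{linear_address(0x0080, 0x1234):08X}")
```

Unlike the segmented scheme, each 32-bit address has exactly one upper-base and offset pair, so the mapping is unambiguous.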

Variants and Extensions

Standard Variant

The standard variant of the Intel HEX format, also known as INHX16 or I16HEX, was introduced for Intel's Intellec development systems and 16-bit processors such as the 8086. It provides a textual representation of binary data using ASCII hexadecimal characters, primarily for loading programs into ROMs and EPROMs. This variant employs 16-bit addressing within segments, limiting the addressable memory to a maximum of 64 KB per segment without further extensions, and relies mainly on type 00 records for data and type 01 records to signal the end of the file. Intel's Hexadecimal Object File Format Specification (Revision A, January 6, 1988) specifies record types 00 through 05, though the core functionality of this variant centers on types 00, 01, 02, and 03 for compatibility with 16-bit segmented architectures like the 8086.

A key limitation of this variant is its lack of native support for memory spaces exceeding 1 MB in a flat model, as the 16-bit address field in type 00 records can only reference offsets within a 64 KB segment. To address larger segmented memory in 16-bit processors like the 8086, type 02 records optionally set a 16-bit segment base address, which is shifted left by 4 bits (multiplied by 16) and added to subsequent type 00 offsets. This segment addressing is non-linear and can lead to gaps or overlaps if not managed carefully, as it reflects the processor's 20-bit address space rather than a flat model. Type 03 records specify the starting execution address within this segmented scheme, completing the basic loading mechanism.

This format is broadly compatible with legacy development tools, including Intel's In-Circuit Emulators (ICE) such as the ICE-186/188, which directly load standard Intel HEX files for debugging 8086-family processors. Modern flash programming utilities, such as those in the Microchip ecosystem, also support it for flash and EPROM applications because of its simplicity and widespread adoption.

Standard files impose strict constraints to ensure reliable parsing: they must conclude with exactly one type 01 record (format: :00000001FF), which carries no data or address and serves solely as the terminator, and no duplicate addresses are permitted across type 00 records, to prevent unintended data overwrites during loading. Common pitfalls include assuming fully linear addressing throughout the file, which fails when type 02 records introduce segment shifts and results in misaligned memory placement. In addition, tools that expect extended linear addressing may report errors when type 04 records are absent, although such records fall outside this baseline variant.

Extended Linear Address Variant

The Extended Linear Address variant of the Intel HEX format was introduced in 1988 to support 32-bit addressing for processors like the Intel 80386, enabling access to a full 4 GB address space by specifying the upper 16 bits of the linear base address. This extension addresses the limitations of the earlier 16-bit addressing schemes, allowing code and data to be placed beyond the 64 KB boundary in a linear manner without relying on segmented memory models. Also known as INHX32 or I32HEX, it primarily uses record types 00, 01, 04, and 05.

The mechanism relies on type 04 records, which consist of a fixed byte count of 02, an ignored address field (typically 0000), the record type 04, and two data bytes representing the upper linear base address (ULBA). Subsequent data records (type 00) have their effective addresses calculated as (ULBA << 16) | record_address, where record_address is the 16-bit address field of the standard record structure. The ULBA remains in effect until overridden by another type 04 record and defaults to 0000 at the start of the file. This approach maintains compatibility with the core format while extending the addressable range to 4 GB.

For example, the record :0200000400807A sets the ULBA to 0080h, establishing a base of 00800000h. A following data record with byte count 0A and offset 0000, such as :0A0000000123456789ABCDEF012312, would then load 10 bytes starting at absolute address 00800000h. Older parsers designed for 8-bit or 16-bit systems typically ignore type 04 records, treating them as no-ops and falling back to 16-bit addressing, which may lead to incorrect or incomplete loading of files exceeding 64 KB. In contrast, modern tools fully support this variant, often under the name i32hex, and handle 32-bit addresses correctly during conversion and loading. The variant is detailed in Intel's Hexadecimal Object File Format Specification (Revision A, January 6, 1988), which formalized the 32-bit extensions.

It is essential for programming 32-bit microcontrollers and ARM-based systems whose firmware images exceed 64 KB, providing a straightforward way to distribute large binaries across memory regions.
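A generator has to perform the reverse step: split a 32-bit target address into a ULBA and a 16-bit offset, then emit the type 04 record. The following is a sketch under that reading of the format; both function names are illustrative assumptions.

```python
def split_linear_address(address: int) -> tuple[int, int]:
    """Split a 32-bit address into (ULBA, 16-bit record offset)."""
    if not 0 <= address <= 0xFFFFFFFF:
        raise ValueError("address does not fit in 32 bits")
    return address >> 16, address & 0xFFFF


def extended_linear_record(ulba: int) -> str:
    """Build a type 04 record: count 02, address 0000, type 04, two ULBA bytes."""
    body = bytes([0x02, 0x00, 0x00, 0x04]) + ulba.to_bytes(2, "big")
    checksum = (-sum(body)) & 0xFF   # two's complement of the field sum
    return ":" + body.hex().upper() + f"{checksum:02X}"


ulba, offset = split_linear_address(0x00800000)
print(extended_linear_record(ulba), f"offset={offset:04X}")
```

The ULBA bytes are written most significant byte first, matching the big-endian layout of the address field in data records.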

Other Specialized Variants

In addition to the standard segmented and extended linear address variants, several specialized formats related to the Intel HEX file structure have been developed for specific hardware architectures or early applications, though many are now obsolete.

The I8HEX format, also known as Intel-8, HEX-80, or INHX8M, is a linear variant designed for 8-bit processors like the Intel 8080. It supports only a 64 KB address space and uses only record types 00 (data) and 01 (end of file). The absence of segment or extended address records makes it suitable for simple embedded systems where larger memory mapping is unnecessary.

An early precursor to Intel HEX is the Signetics HEX format from the 1970s, used for programming devices from Signetics (now part of Philips/NXP). It shares the colon-prefixed record structure but limits addressing to 64 KB and employs a distinct checksum mechanism: an address checksum (XOR of the address bytes and count, rotated left) followed by a data checksum (XOR of the data bytes, rotated left). While influential in establishing ASCII encoding for data transfer, it is incompatible with Intel HEX because of the checksum differences and the lack of record type extensions.

These specialized variants, including I8HEX and Signetics HEX, have largely been phased out in favor of the more versatile standard segmented and extended linear address formats, which suffice for the vast majority of contemporary programming needs.
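Since the variants differ mainly in which record types they use, a tool can make a rough guess at the variant from the set of types found in a file. The sketch below is a heuristic under that assumption, not a rule from any specification; the function name and the labels "mixed" and "unknown" are invented for illustration.

```python
def classify_variant(record_types: set[int]) -> str:
    """Guess the Intel HEX variant from the record types present in a file."""
    if not record_types <= {0x00, 0x01, 0x02, 0x03, 0x04, 0x05}:
        return "unknown"                 # record type outside 00..05
    if record_types <= {0x00, 0x01}:
        return "I8HEX"                   # data and end-of-file only
    if record_types <= {0x00, 0x01, 0x02, 0x03}:
        return "I16HEX"                  # adds segment address records
    if record_types <= {0x00, 0x01, 0x04, 0x05}:
        return "I32HEX"                  # adds linear address records
    return "mixed"                       # both segment and linear records


print(classify_variant({0x00, 0x01}))
print(classify_variant({0x00, 0x01, 0x04}))
```

A file that uses only types 00 and 01 is also a valid I16HEX or I32HEX file, so the result is the most restrictive variant that fits rather than a definitive identification.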

Applications

Primary Uses

The Intel HEX format is widely employed for programming firmware into the non-volatile memory of microcontrollers, such as flash or EEPROM in devices like Microchip's PIC series and Atmel/Microchip's AVR family. Development tools, including Microchip's MPLAB X Integrated Development Environment (IDE) and its Integrated Programming Environment (IPE), directly support importing Intel HEX files and loading the compiled code onto these microcontrollers via hardware programmers such as the AVR Dragon. This process transfers binary program data in a human-readable ASCII form, which makes verification and error checking possible before flashing.

In reverse-engineering contexts, Intel HEX serves as a standard for dumping the contents of ROM or EPROM chips into an editable text representation. Programmers or emulators read the binary memory from legacy hardware, such as older embedded systems or automotive components, and encode it as Intel HEX files for analysis, modification, or archival purposes. This text-based output allows engineers to inspect firmware without proprietary binary tools.

For bootloader-mediated updates, particularly in Internet of Things (IoT) applications, Intel HEX files are serialized and transmitted over networks for over-the-air (OTA) firmware upgrades. Bootloaders in devices like wireless sensors parse these files to update firmware remotely, over serial protocols such as UART or wireless standards like Bluetooth Low Energy (BLE). This approach minimizes downtime in deployed systems while relying on the format's built-in addressing and checksum features for reliable delivery.

Within embedded development workflows, Intel HEX acts as an intermediary output from compilers and assemblers, converting object files (e.g., ELF) into a programmer-ready format. For instance, GNU toolchains for embedded targets use the objcopy utility (part of GNU Binutils) with the -O ihex option to generate Intel HEX from a linked program, which then feeds into debuggers or in-circuit emulators. This integration streamlines the pipeline from source code to device deployment across toolchains.

The format remains prevalent in legacy systems, including automotive electronic control units (ECUs) and industrial programmable logic controllers (PLCs), where it interfaces with established flashing tools and diagnostic equipment. Having originated in the 1970s, it integrates readily with aging infrastructure in these domains. In hobbyist and educational embedded projects, such as those involving Arduino boards, Intel HEX is the default output for uploading sketches, underscoring its accessibility for prototyping and experimentation.

Implementation Considerations

When implementing software to parse Intel HEX files, the process begins by reading the file line by line and stripping line terminators such as carriage returns and line feeds. Each line must start with a colon (:), followed by two hexadecimal digits giving the byte count of the data field, four digits for the 16-bit load offset, two digits for the record type, a variable number of data bytes as hexadecimal pairs, and finally two digits for the checksum. The hexadecimal pairs are decoded into binary bytes, and the load address is formed by combining the record's offset with the base established by any prior extended segment address (type 02) or extended linear address (type 04) record. The checksum is verified by summing all bytes from the byte count through the data (excluding the colon), taking the two's complement modulo 256, and checking that it matches the provided value; a mismatch indicates corruption or invalid data.

When generating Intel HEX files, data records (type 00) are normally written in ascending address order without overlaps; if overlapping data exists, it must be merged before the records are written. For files exceeding 64 KB, extended linear address records (type 04) are inserted before the data they apply to in order to set the upper 16 bits of the address, allowing up to 4 GB of coverage. Records commonly carry 16 data bytes, though larger counts are permissible, and the file must always conclude with an end-of-file record (type 01: :00000001FF) to signal completeness; omitting it can lead to incomplete loads.

Common errors during parsing include address overlaps, where multiple records target the same location, requiring implementers to merge or prioritize the data to prevent corruption. Invalid characters or mismatched record lengths (e.g., data bytes not equaling the stated count) can cause decoding failures, while missing colons or checksum mismatches often stem from transmission errors or malformed generation. A missing end-of-file record is frequent in truncated files and results in partial data extraction.
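The parsing steps and error cases above can be combined into a compact loader. This is a sketch under stated assumptions, not a reference implementation: the function name `parse_intel_hex` is invented, memory is modeled as a dictionary from address to byte value, overlapping data is treated as an error rather than merged, and start address records (types 03 and 05) are accepted but ignored.

```python
def parse_intel_hex(text: str) -> dict[int, int]:
    """Parse Intel HEX text into a {address: byte} mapping."""
    memory: dict[int, int] = {}
    base = 0            # set by type 02 (segment) or type 04 (linear) records
    seen_eof = False
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        if seen_eof:
            raise ValueError(f"line {lineno}: record after end-of-file record")
        if not line.startswith(":"):
            raise ValueError(f"line {lineno}: missing ':' start code")
        try:
            raw = bytes.fromhex(line[1:])
        except ValueError:
            raise ValueError(f"line {lineno}: invalid hex digits") from None
        # count(1) + offset(2) + type(1) + checksum(1) = 5 bytes of overhead
        if len(raw) < 5 or raw[0] != len(raw) - 5:
            raise ValueError(f"line {lineno}: byte count mismatch")
        if sum(raw) & 0xFF:
            raise ValueError(f"line {lineno}: checksum mismatch")
        offset = int.from_bytes(raw[1:3], "big")
        rtype, data = raw[3], raw[4:-1]
        if rtype == 0x00:                       # data
            for i, value in enumerate(data):
                # the offset wraps within the current 64 KB window
                address = base + ((offset + i) & 0xFFFF)
                if address in memory:
                    raise ValueError(f"line {lineno}: overlap at {address:#x}")
                memory[address] = value
        elif rtype == 0x01:                     # end of file
            seen_eof = True
        elif rtype in (0x02, 0x04):             # extended address records
            if len(data) != 2:
                raise ValueError(f"line {lineno}: expected 2 data bytes")
            shift = 4 if rtype == 0x02 else 16
            base = int.from_bytes(data, "big") << shift
        elif rtype not in (0x03, 0x05):         # start addresses are ignored
            raise ValueError(f"line {lineno}: unknown record type {rtype:02X}")
    if not seen_eof:
        raise ValueError("missing end-of-file record")
    return memory


image = parse_intel_hex(
    ":0200000400807A\n:0A0000000123456789ABCDEF012312\n:00000001FF\n"
)
print(len(image), hex(min(image)))
```

A dictionary keeps sparse images cheap to represent; a loader for a real device would more likely write into a fixed-size buffer or stream each record to the programmer.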
Established libraries simplify implementation: the SRecord C++ library supports reading, writing, and manipulating Intel HEX alongside other formats, using polymorphic classes for flexible filtering and conversion. In Python, the intelhex library enables loading, modifying, and dumping HEX data, and handles case-insensitivity in hex digits automatically. For ARM-specific applications, pyOCD integrates intelhex for flashing and debugging, supporting binary conversion after parsing.

Security considerations emphasize validating the checksum of every record to detect bit errors during transfer. The checksum is not cryptographic, so it does not protect against deliberate tampering; files from untrusted sources need separate authentication. Parsers should bound input lengths to prevent buffer overflows from excessively long records, especially in embedded environments with limited stack space. For performance with large files exceeding 1 MB, implementers should keep in mind that the format's ASCII encoding inflates size by about 2-3 times compared to binary.
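The generation rules in this section (fixed-size data records, a type 04 record whenever the upper 16 address bits change, and a closing end-of-file record) can be sketched as follows. The function names and the default of 16 data bytes per record are illustrative assumptions; the sketch handles one contiguous block only and always uses linear addressing.

```python
def make_record(offset: int, rtype: int, data: bytes = b"") -> str:
    """Format one record, including its checksum, as uppercase hex."""
    body = bytes([len(data)]) + offset.to_bytes(2, "big") + bytes([rtype]) + data
    checksum = (-sum(body)) & 0xFF
    return ":" + (body + bytes([checksum])).hex().upper()


def dump_intel_hex(start: int, payload: bytes, row: int = 16) -> str:
    """Encode one contiguous block as Intel HEX text with CRLF line endings."""
    lines = []
    upper = 0                      # the upper base defaults to 0000
    pos = 0
    while pos < len(payload):
        address = start + pos
        if address >> 16 != upper:
            upper = address >> 16
            # raises OverflowError if the address exceeds 32 bits
            lines.append(make_record(0, 0x04, upper.to_bytes(2, "big")))
        # never let a data record cross a 64 KB boundary
        room = 0x10000 - (address & 0xFFFF)
        chunk = payload[pos:pos + min(row, room)]
        lines.append(make_record(address & 0xFFFF, 0x00, chunk))
        pos += len(chunk)
    lines.append(make_record(0, 0x01))          # end-of-file record
    return "\r\n".join(lines) + "\r\n"


print(dump_intel_hex(0xFFFF, b"\xAA\xBB"), end="")
```

Splitting records at 64 KB boundaries avoids relying on how a given loader treats offsets that wrap past FFFFh within a record.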

Comparisons

With S-Record Format

The Intel HEX and S-Record formats are both ASCII-based representations of binary data used for programming memory devices, but they differ in structure, addressing capabilities, and verification mechanisms. Intel HEX records begin with a colon (:) character, followed by fields for byte count, offset, record type, data, and checksum. In contrast, S-Record lines start with an 'S' character followed by a single-digit type identifier (e.g., 0 for header, 1/2/3 for data records with 16/24/32-bit addresses, 5/6 for record counts, and 7/8/9 for termination).

Addressing in Intel HEX relies on 16-bit offsets in standard data records, with extensions like the Extended Linear Address record (type 04) enabling 32-bit linear addressing by specifying the upper 16 bits of the address. S-Records handle addressing more directly through type-specific variants: S1 for 16-bit addresses, S2 for 24-bit addresses, and S3 for 32-bit addresses, without needing separate extension records.

Checksum computation also varies. Intel HEX employs the two's complement of the sum of all bytes from the count through the data fields, so that the total sum including the checksum is zero modulo 256. S-Records use the one's complement of the least significant byte of the sum of the count, address, and data bytes, so that the total sum including the checksum has a low byte of FFh.

Intel HEX defines six record types: data (00), end-of-file (01), extended segment address (02), start segment address (03), extended linear address (04), and start linear address (05). S-Records include types for header (S0), data with varying address lengths (S1/S2/S3), optional counts of data records (S5/S6), and termination with execution start address (S7/S8/S9).

In terms of file size, Intel HEX files tend to be somewhat larger because of the per-record header and checksum fields and the additional extended address records, with typical records holding 16 bytes (about 45 characters per line including the line terminator). S-Record files are generally more compact, as the count field covers the address, data, and checksum bytes and records conventionally carry up to 32 data bytes (up to 78 characters for an S3 record), reducing overhead for dense data.

Adoption patterns reflect their origins: Intel HEX is prevalent in x86-based systems and many embedded applications, particularly for EPROM/ROM programming in Intel-derived architectures. S-Records are commonly used in Motorola- and Freescale-derived ecosystems, such as PowerPC and ColdFire microprocessors, for downloading memory images in debuggers and linkers. Conversion between the formats is supported by tools like srec_cat from the SRecord package, which enables bidirectional transformation while preserving the data, with Intel HEX often favored for human readability and S-Records for storage efficiency.
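The difference between the two checksum rules is easiest to see side by side. In this sketch the function names are illustrative, and the sample S1 record body (count 13h, address 7AF0h, sixteen data bytes) is an assumed example chosen for the demonstration.

```python
def intel_hex_checksum(fields: bytes) -> int:
    """Two's complement of the byte sum: fields plus checksum sum to 00h."""
    return (-sum(fields)) & 0xFF


def srecord_checksum(fields: bytes) -> int:
    """One's complement of the byte sum: fields plus checksum sum to FFh."""
    return (~sum(fields)) & 0xFF


# Intel HEX: count, address, type and data of an extended segment address record
ihex_fields = bytes.fromhex("020000020008")
# S-Record: count, address and data of a sample S1 record (assumed example)
srec_fields = bytes.fromhex("137AF0" + "0A0A0D" + "00" * 13)

print(f"{intel_hex_checksum(ihex_fields):02X}")
print(f"{srecord_checksum(srec_fields):02X}")
```

For the same input bytes the two results always differ by one, which is why a record converted between the formats must have its checksum recomputed rather than copied.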

Advantages and Limitations

The Intel HEX format offers several advantages rooted in its ASCII-based design, particularly in embedded systems development. Its human-readable structure allows developers to inspect and edit data directly using standard text editors, facilitating manual verification without specialized binary tools. This readability is especially beneficial during debugging sessions, where quick identification of memory contents or addressing issues can accelerate development cycles. Additionally, each record includes a one-byte checksum computed as the two's complement of the sum of all preceding bytes in the record, providing built-in error detection to verify data integrity during transmission or loading. The format's text nature also supports straightforward transmission over serial ports or other ASCII-compatible channels, which was particularly valuable in early embedded workflows and remains useful for simple field updates. Widespread tool support further enhances its practicality, as it is natively handled by prominent embedded development environments such as Keil MDK and by various device programmers, ensuring compatibility across legacy and modern ecosystems.

However, these strengths come with notable limitations stemming from the format's age and textual overhead. Intel HEX files are verbose, typically expanding to more than twice the size of the binary because hexadecimal encoding uses two ASCII characters per byte and every record adds a start code, byte count, address, record type, checksum, and line terminator; for instance, a 700 KB binary firmware might result in a 1.8 MB HEX file. The format has no native compression, which increases storage and bandwidth demands, particularly in resource-constrained environments. Address management becomes complex for large files, as the standard format relies on 16-bit offsets with optional extended linear address records for 32-bit support; a new extended address record is needed at every 64 KB boundary, which fragments contiguous data blocks and complicates parsing logic.

In high-volume production scenarios, text-based parsing introduces overhead, with conversion times ranging from 10-50 ms for a 1 MB file on modern CPUs, making it less efficient than direct binary loading. The format is also outdated for contemporary operating system loaders, which favor structured executables like ELF or PE for their richer metadata and relocation support; Intel HEX serves primarily as a raw memory image without such capabilities. Furthermore, its plain-text representation complicates integration with encrypted firmware, which must be carried as opaque data bytes or wrapped in an additional container.

Despite these drawbacks, Intel HEX persists in legacy and embedded niches where simplicity and tool availability outweigh efficiency, though its adoption is declining for new 64-bit systems in favor of more compact or metadata-rich alternatives. To mitigate the limitations, developers often convert to binary intermediates for loading and employ validation scripts to check checksums and address continuity before deployment. Emerging hybrids, such as JSON-wrapped HEX payloads, are appearing in IoT applications to add metadata while retaining compatibility.
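The size overhead can be estimated from the record layout. The sketch below makes several assumptions that are not from the source: the image is contiguous and starts at address 0, records carry a fixed number of data bytes that divides 65536 evenly, a type 04 record is emitted at each 64 KB boundary after the first, and lines end in CRLF. The function name is illustrative.

```python
def hex_file_size(payload_bytes: int, row: int = 16, newline: int = 2) -> int:
    """Estimate the size in bytes of an Intel HEX file for a contiguous image."""
    if payload_bytes < 0 or row < 1 or 0x10000 % row:
        raise ValueError("row must divide 65536 and payload must be non-negative")
    overhead = 11 + newline                   # ':' + count/address/type/checksum digits
    full, rest = divmod(payload_bytes, row)
    size = full * (overhead + 2 * row)        # full data records
    if rest:
        size += overhead + 2 * rest           # final partial data record
    # one type 04 record (two data bytes) per 64 KB boundary crossed
    boundaries = (payload_bytes - 1) // 0x10000 if payload_bytes else 0
    size += boundaries * (overhead + 4)
    return size + overhead                    # end-of-file record


one_mib = 1 << 20
print(hex_file_size(one_mib), round(hex_file_size(one_mib) / one_mib, 2))
```

Under these assumptions the ratio approaches (11 + 2 × row + newline) / row, so doubling the record length from 16 to 32 data bytes reduces the overhead noticeably.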
