Byte
from Wikipedia
byte
Unit system: unit derived from bit
Unit of: digital information, data size
Symbol: B, o (when 8 bits)

The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer[1][2] and for this reason it is the smallest addressable unit of memory in many computer architectures. To disambiguate arbitrarily sized bytes from the common 8-bit definition, network protocol documents such as the Internet Protocol (RFC 791) refer to an 8-bit byte as an octet.[3] Those bits in an octet are usually counted with numbering from 0 to 7 or 7 to 0 depending on the bit endianness.

The size of the byte has historically been hardware-dependent and no definitive standards existed that mandated the size. Sizes from 1 to 48 bits have been used.[4][5][6][7] The six-bit character code was an often-used implementation in early encoding systems, and computers using six-bit and nine-bit bytes were common in the 1960s. These systems often had memory words of 12, 18, 24, 30, 36, 48, or 60 bits, corresponding to 2, 3, 4, 5, 6, 8, or 10 six-bit bytes, and persisted, in legacy systems, into the twenty-first century. In this era, bit groupings in the instruction stream were often referred to as syllables[a] or slab, before the term byte became common.

The modern de facto standard of eight bits, as documented in ISO/IEC 2382-1:1993, is a convenient power of two permitting the binary-encoded values 0 through 255 for one byte, as 2 to the power of 8 is 256.[8] The international standard IEC 80000-13 codified this common meaning. Many types of applications use information representable in eight or fewer bits and processor designers commonly optimize for this usage. The popularity of major commercial computing architectures has aided in the ubiquitous acceptance of the 8-bit byte.[9] Modern architectures typically use 32- or 64-bit words, built of four or eight bytes, respectively.
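As a quick numeric check of the 2^8 = 256 figure and the word sizes mentioned above, the short C sketch below (an illustration only, assuming a typical platform with 8-bit bytes and the fixed-width types of <stdint.h>) prints the value range of one byte and the byte counts of 32- and 64-bit words.

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        /* An 8-bit byte can hold 2^8 = 256 distinct values, 0 through 255. */
        printf("values per 8-bit byte: %d (0..%d)\n", 1 << 8, (1 << 8) - 1);

        /* Typical modern word sizes expressed in bytes. */
        printf("32-bit word: %zu bytes, 64-bit word: %zu bytes\n",
               sizeof(uint32_t), sizeof(uint64_t));
        return 0;
    }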

The unit symbol for the byte was designated as the upper-case letter B by the International Electrotechnical Commission (IEC) and Institute of Electrical and Electronics Engineers (IEEE).[10] Internationally, the unit octet explicitly defines a sequence of eight bits, eliminating the potential ambiguity of the term "byte".[11][12] The symbol for octet, 'o', also conveniently eliminates the ambiguity in the symbol 'B' between byte and bel.

Etymology and history


The term byte was coined by Werner Buchholz in June 1956,[4][13][14][b] during the early design phase for the IBM Stretch[15][16][1][13][14][17][18] computer, which had addressing to the bit and variable field length (VFL) instructions with a byte size encoded in the instruction.[13] It is a deliberate respelling of bite to avoid accidental mutation to bit.[1][13][19][c]

Another origin of byte for bit groups smaller than a computer's word size, and in particular groups of four bits, was recorded by Louis G. Dooley, who claimed he coined the term while working with Jules Schwartz and Dick Beeler on an air defense system called SAGE at MIT Lincoln Laboratory in 1956 or 1957, which was jointly developed by Rand, MIT, and IBM.[20][21] Schwartz's language JOVIAL later used the term, but he recalled only vaguely that it was derived from the AN/FSQ-31.[22][21]

Early computers used a variety of four-bit binary-coded decimal (BCD) representations and the six-bit codes for printable graphic patterns common in the U.S. Army (FIELDATA) and Navy. These representations included alphanumeric characters and special graphical symbols. These sets were expanded in 1963 to seven bits of coding, called the American Standard Code for Information Interchange (ASCII), as the Federal Information Processing Standard, which replaced the incompatible teleprinter codes in use by different branches of the U.S. government and universities during the 1960s. ASCII included the distinction of upper- and lowercase alphabets and a set of control characters to facilitate the transmission of written language as well as printing device functions, such as page advance and line feed, and the physical or logical control of data flow over the transmission media.[18] During the early 1960s, while also active in ASCII standardization, IBM introduced the eight-bit Extended Binary Coded Decimal Interchange Code (EBCDIC) in its System/360 product line, an expansion of its six-bit binary-coded decimal (BCDIC) representations[d] used in earlier card punches.[23] The prominence of the System/360 led to the ubiquitous adoption of the eight-bit storage size,[18][16][13] although the EBCDIC and ASCII encoding schemes differ in detail.

In the early 1960s, AT&T introduced digital telephony on long-distance trunk lines. These used the eight-bit μ-law encoding. This large investment promised to reduce transmission costs for eight-bit data.

In Volume 1 of The Art of Computer Programming (first published in 1968), Donald Knuth uses byte in his hypothetical MIX computer to denote a unit which "contains an unspecified amount of information ... capable of holding at least 64 distinct values ... at most 100 distinct values. On a binary computer a byte must therefore be composed of six bits".[24] He notes that "Since 1975 or so, the word byte has come to mean a sequence of precisely eight binary digits...When we speak of bytes in connection with MIX we shall confine ourselves to the former sense of the word, harking back to the days when bytes were not yet standardized."[24]

The development of eight-bit microprocessors in the 1970s popularized this storage size. Microprocessors such as the Intel 8080, the direct predecessor of the 8086, could also perform a small number of operations on the four-bit pairs in a byte, such as the decimal-add-adjust (DAA) instruction. A four-bit quantity is often called a nibble, also nybble, which is conveniently represented by a single hexadecimal digit.
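The nibble-to-hex-digit correspondence is easy to see in code. The following C fragment is an illustrative sketch (not drawn from any particular processor manual): it splits one byte into its high and low four-bit halves with a shift and a mask, and each half prints as a single hexadecimal digit.

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        uint8_t byte = 0x5A;               /* one byte = two hexadecimal digits */
        uint8_t high = (byte >> 4) & 0x0F; /* upper nibble: 0x5 */
        uint8_t low  = byte & 0x0F;        /* lower nibble: 0xA */
        printf("byte 0x%02X -> high nibble 0x%X, low nibble 0x%X\n",
               byte, high, low);
        return 0;
    }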

The term octet unambiguously specifies a size of eight bits.[18][12] It is used extensively in protocol definitions.

Historically, the term octad or octade was also used to denote eight bits, at least in Western Europe;[25][26] however, this usage is no longer common. The exact origin of the term is unclear, but it can be found in British, Dutch, and German sources of the 1960s and 1970s, and throughout the documentation of Philips mainframe computers.

Unit symbol


The unit symbol for the byte is specified in IEC 80000-13, IEEE 1541 and the Metric Interchange Format[10] as the upper-case character B.

In the International System of Quantities (ISQ), B is also the symbol of the bel, a unit of logarithmic power ratio named after Alexander Graham Bell, creating a conflict with the IEC specification. However, little danger of confusion exists, because the bel is a rarely used unit. It is used primarily in its decadic fraction, the decibel (dB), for signal strength and sound pressure level measurements, while a unit for one-tenth of a byte, the decibyte, and other fractions, are only used in derived units, such as transmission rates.

The lowercase letter o for octet is defined as the symbol for octet in IEC 80000-13 and is commonly used in languages such as French[27] and Romanian, and is also combined with metric prefixes for multiples, for example ko and Mo.

Multiple-byte units

Decimal
Value      Symbol  Metric
1000       kB      kilobyte
1000^2     MB      megabyte
1000^3     GB      gigabyte
1000^4     TB      terabyte
1000^5     PB      petabyte
1000^6     EB      exabyte
1000^7     ZB      zettabyte
1000^8     YB      yottabyte
1000^9     RB      ronnabyte
1000^10    QB      quettabyte

Binary
Value      IEC              Memory
1024       KiB  kibibyte    KB  kilobyte
1024^2     MiB  mebibyte    MB  megabyte
1024^3     GiB  gibibyte    GB  gigabyte
1024^4     TiB  tebibyte    TB  terabyte
1024^5     PiB  pebibyte
1024^6     EiB  exbibyte
1024^7     ZiB  zebibyte
1024^8     YiB  yobibyte
1024^9     RiB  robibyte
1024^10    QiB  quebibyte

More than one system exists to define unit multiples based on the byte. Some systems are based on powers of 10, following the International System of Units (SI), which defines for example the prefix kilo as 1000 (103); other systems are based on powers of two. Nomenclature for these systems has led to confusion. Systems based on powers of 10 use standard SI prefixes (kilo, mega, giga, ...) and their corresponding symbols (k, M, G, ...). Systems based on powers of 2, however, might use binary prefixes (kibi, mebi, gibi, ...) and their corresponding symbols (Ki, Mi, Gi, ...) or they might use the prefixes K, M, and G, creating ambiguity when the prefixes M or G are used.

While the difference between the decimal and binary interpretations is relatively small for the kilobyte (about 2% smaller than the kibibyte), the systems deviate increasingly as units grow larger (the relative deviation grows by 2.4% for each three orders of magnitude). For example, a power-of-10-based terabyte is about 9% smaller than power-of-2-based tebibyte.
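The growth of this deviation can be checked directly. The C sketch below (illustrative only; link with the math library where required) computes how much smaller each power-of-10 prefix is than its power-of-2 counterpart, reproducing the roughly 2% gap at kilo/kibi and roughly 9% at tera/tebi.

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        /* Compare decimal (1000^n) and binary (1024^n) prefixes for n = 1..8. */
        const char *names[] = {"kilo/kibi", "mega/mebi", "giga/gibi", "tera/tebi",
                               "peta/pebi", "exa/exbi", "zetta/zebi", "yotta/yobi"};
        for (int n = 1; n <= 8; n++) {
            double decimal = pow(1000.0, n);
            double binary  = pow(1024.0, n);
            printf("%-10s: decimal unit is %.1f%% smaller than binary unit\n",
                   names[n - 1], 100.0 * (1.0 - decimal / binary));
        }
        return 0;
    }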

Units based on powers of 10


Definition of prefixes using powers of 10—in which 1 kilobyte (symbol kB) is defined to equal 1,000 bytes—is recommended by the International Electrotechnical Commission (IEC).[28] The IEC standard defines eight such multiples, up to 1 yottabyte (YB), equal to 1000^8 bytes.[29] The additional prefixes ronna- for 1000^9 and quetta- for 1000^10 were adopted by the International Bureau of Weights and Measures (BIPM) in 2022.[30][31]

This definition is most commonly used for data-rate units in computer networks, internal bus, hard drive and flash media transfer speeds, and for the capacities of most storage media, particularly hard drives,[32] flash-based storage,[33] and DVDs.[34] Operating systems that use this definition include macOS,[35] iOS,[35] Ubuntu,[36] and Debian.[37] It is also consistent with the other uses of the SI prefixes in computing, such as CPU clock speeds or measures of performance.

The IBM System/360 and its related disk and tape systems set the byte at 8 bits and documented capacities in decimal units.[38] Early 8-, 5.25- and 3.5-inch floppies gave capacities in multiples of 1024, using "KB" rather than the more accurate "KiB". Later, larger floppies in these formats gave capacities in a hybrid notation, i.e., multiples of 1,024,000 bytes, using "KB" = 1024 B and "MB" = 1,024,000 B. Early 5.25-inch disks used decimal capacities even though they used 128-byte and 256-byte sectors.[39] Hard disks mostly used 256-byte and then 512-byte sectors before 4096-byte blocks became standard.[40] RAM has always been sold in powers of 2.

Units based on powers of 2


A system of units based on powers of 2 in which 1 kibibyte (KiB) is equal to 1,024 (i.e., 2^10) bytes is defined by international standard IEC 80000-13 and is supported by national and international standards bodies (BIPM, IEC, NIST). The IEC standard defines eight such multiples, up to 1 yobibyte (YiB), equal to 1024^8 bytes. The natural binary counterparts to ronna- and quetta- were given in a consultation paper of the International Committee for Weights and Measures' Consultative Committee for Units (CCU) as robi- (Ri, 1024^9) and quebi- (Qi, 1024^10), but have not yet been adopted by the IEC or ISO.[41]

An alternative system of nomenclature for the same units (referred to here as the customary convention), in which 1 kilobyte (KB) is equal to 1,024 bytes,[42][43][44] 1 megabyte (MB) is equal to 1024^2 bytes and 1 gigabyte (GB) is equal to 1024^3 bytes, is mentioned by a 1990s JEDEC standard. Only the first three multiples (up to GB) are mentioned by the JEDEC standard, which makes no mention of TB and larger. While confusing and incorrect,[45] the customary convention is used by the Microsoft Windows operating system[46] and for random-access memory capacity, such as main memory and CPU cache size, and in marketing and billing by telecommunication companies, such as Vodafone,[47] AT&T,[48] Orange[49] and Telstra.[50]

For storage capacity, the customary convention was used by macOS and iOS through Mac OS X 10.5 Leopard and iOS 10, after which they switched to units based on powers of 10.[35]

Parochial units


Various computer vendors have coined terms for data of various sizes, sometimes with different sizes for the same term even within a single vendor. These terms include double word, half word, long word, quad word, slab, superword and syllable. There are also informal terms, e.g., half byte and nybble for 4 bits, and octal K for 1000₈ (512).

History of the conflicting definitions

Figure: Percentage difference between decimal and binary interpretations of the unit prefixes grows with increasing storage size.

When I see a disk advertised as having a capacity of one megabyte, what is this telling me? There are three plausible answers, and I wonder if anybody knows which one is correct ... Now this is not a really vital issue, as there is just under 5% difference between the smallest and largest alternatives. Nevertheless, it would [be] nice to know what the standard measure is, or if there is one.

— Allan D. Pratt of Small Computers in Libraries, 1982[51]

Contemporary[e] computer memory has a binary architecture making a definition of memory units based on powers of 2 most practical. The use of the metric prefix kilo for binary multiples arose as a convenience, because 1024 is approximately 1000.[27] This definition was popular in early decades of personal computing, with products like the Tandon 5¼-inch DD floppy format (holding 368,640 bytes) being advertised as "360 KB", following the 1024-byte convention. It was not universal, however. The Shugart SA-400 5¼-inch floppy disk held 109,375 bytes unformatted,[52] and was advertised as "110 Kbyte", using the 1000 convention.[53] Likewise, the 8-inch DEC RX01 floppy (1975) held 256,256 bytes formatted, and was advertised as "256k".[54] Some devices were advertised using a mixture of the two definitions: most notably, floppy disks advertised as "1.44 MB" have an actual capacity of 1440 KiB, the equivalent of 1.47 MB or 1.41 MiB.

In 1995, the International Union of Pure and Applied Chemistry's (IUPAC) Interdivisional Committee on Nomenclature and Symbols attempted to resolve this ambiguity by proposing a set of binary prefixes for the powers of 1024, including kibi (kilobinary), mebi (megabinary), and gibi (gigabinary).[55][56]

In December 1998, the IEC addressed such multiple usages and definitions by adopting the IUPAC's proposed prefixes (kibi, mebi, gibi, etc.) to unambiguously denote powers of 1024.[57] Thus one kibibyte (1 KiB) is 1024^1 bytes = 1,024 bytes, one mebibyte (1 MiB) is 1024^2 bytes = 1,048,576 bytes, and so on.

In 1999, Donald Knuth suggested calling the kibibyte a "large kilobyte" (KKB).[58]

Modern standard definitions


The IEC adopted the IUPAC proposal and published the standard in January 1999.[59][60] The IEC prefixes are part of the International System of Quantities. The IEC further specified that the kilobyte should only be used to refer to 1000 bytes.[61]

Lawsuits over definition


Lawsuits arising from alleged consumer confusion over the binary and decimal definitions of multiples of the byte have generally ended in favor of the manufacturers, with courts holding that the legal definition of gigabyte or GB is 1 GB = 1,000,000,000 (10^9) bytes (the decimal definition), rather than the binary definition (2^30, i.e., 1,073,741,824 bytes). Specifically, the United States District Court for the Northern District of California held that "the U.S. Congress has deemed the decimal definition of gigabyte to be the 'preferred' one for the purposes of 'U.S. trade and commerce' [...] The California Legislature has likewise adopted the decimal system for all 'transactions in this state.'"[62]

Earlier lawsuits had ended in settlement with no court ruling on the question, such as a lawsuit against drive manufacturer Western Digital.[63][64] Western Digital settled the challenge and added explicit disclaimers to products that the usable capacity may differ from the advertised capacity.[63] Seagate was sued on similar grounds and also settled.[63][65]

Practical examples

Unit        Approximate equivalent
bit         a Boolean variable indicating true (1) or false (0)
byte        a basic Latin character
kilobyte    text of "Jabberwocky"; a typical favicon
megabyte    text of Harry Potter and the Goblet of Fire[66]
gigabyte    about half an hour of DVD video;[67] CD-quality uncompressed audio of The Lamb Lies Down on Broadway
terabyte    the largest consumer hard drive in 2007;[68] 75 hours of video encoded at 30 Mbit/s
petabyte    2000 years of MP3-encoded music[69]
exabyte     global monthly Internet traffic in 2004[70]
zettabyte   global yearly Internet traffic in 2016 (known as the Zettabyte Era)[71]

Common uses


Many programming languages define the data type byte.

The C and C++ programming languages define byte as an "addressable unit of data storage large enough to hold any member of the basic character set of the execution environment" (clause 3.6 of the C standard). The C standard requires that the integral data type unsigned char must hold at least 256 different values, and is represented by at least eight bits (clause 5.2.4.2.1). Various implementations of C and C++ reserve 8, 9, 16, 32, or 36 bits for the storage of a byte.[72][73][f] In addition, the C and C++ standards require that there be no gaps between two bytes. This means every bit in memory is part of a byte.[74]
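The paragraph above can be probed on any given implementation with a few lines of C. This sketch is illustrative and its output depends on the platform it is compiled on: it reports CHAR_BIT and UCHAR_MAX from <limits.h> and shows that sizeof measures storage in bytes, with sizeof(char) equal to 1 by definition.

    #include <limits.h>
    #include <stdio.h>

    int main(void) {
        /* CHAR_BIT is the number of bits in a byte on this implementation;
           the C standard guarantees only that it is at least 8. */
        printf("bits per byte (CHAR_BIT): %d\n", CHAR_BIT);
        printf("unsigned char range: 0..%u\n", (unsigned)UCHAR_MAX);

        /* sizeof is measured in bytes, and sizeof(char) is 1 by definition. */
        printf("sizeof(char) = %zu, sizeof(int) = %zu bytes\n",
               sizeof(char), sizeof(int));
        return 0;
    }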

Java's primitive data type byte is defined as eight bits. It is a signed data type, holding values from −128 to 127.

.NET programming languages, such as C#, define byte as an unsigned 8-bit type holding values from 0 to 255, and sbyte as a signed 8-bit type holding values from −128 to 127.
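For comparison with the Java and C# ranges above, the optional fixed-width types of C's <stdint.h> expose the same signed/unsigned split of an 8-bit byte; the snippet below is a small illustration, assuming a platform that provides int8_t and uint8_t.

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        int8_t  s = INT8_MIN;   /* -128, the most negative signed byte value */
        uint8_t u = UINT8_MAX;  /*  255, the largest unsigned byte value */

        printf("int8_t range:  %d..%d (example value %d)\n",
               INT8_MIN, INT8_MAX, (int)s);
        printf("uint8_t range: 0..%u\n", (unsigned)u);
        return 0;
    }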

In data transmission systems, the byte is used as a contiguous sequence of bits in a serial data stream, representing the smallest distinguished unit of data. For asynchronous communication, a full transmission unit usually also includes a start bit, 1 or 2 stop bits, and possibly a parity bit, so its size may vary from seven to twelve bits for five to eight bits of actual data.[75] For synchronous communication, error checking usually uses bytes at the end of a frame.
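The seven-to-twelve-bit range quoted above follows directly from adding the framing bits. The C sketch below illustrates that arithmetic only, not any specific UART's API: it totals start, data, parity, and stop bits for a few common asynchronous settings.

    #include <stdio.h>

    /* Bits on the wire for one asynchronous character:
       start bit + data bits + optional parity bit + stop bits. */
    static int frame_bits(int data_bits, int parity, int stop_bits) {
        return 1 + data_bits + (parity ? 1 : 0) + stop_bits;
    }

    int main(void) {
        printf("5 data bits, no parity, 1 stop: %d bits\n", frame_bits(5, 0, 1)); /*  7 */
        printf("8 data bits, no parity, 1 stop: %d bits\n", frame_bits(8, 0, 1)); /* 10 */
        printf("8 data bits, parity, 2 stop:    %d bits\n", frame_bits(8, 1, 2)); /* 12 */
        return 0;
    }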

from Grokipedia
A byte is a unit of digital information in computing and digital communications that most commonly consists of eight bits. A single byte is capable of representing 256 distinct values, ranging from 0 to 255 in decimal notation, making it suitable for encoding individual characters, small integers, or binary states. The term "byte" was coined in July 1956 by Werner Buchholz, a German-born American computer scientist, during the early design phase of the IBM 7030 Stretch supercomputer. Buchholz deliberately respelled "bite" as "byte" to denote an ordered collection of bits while avoiding confusion with the existing term "bit." Initially, the size of a byte varied across systems—for instance, early computers used 4-bit or 6-bit groupings—but it was standardized as 8 bits in the 1960s with the IBM System/360 mainframe series, which adopted the 8-bit Extended Binary Coded Decimal Interchange Code (EBCDIC) for character encoding. In modern computing, bytes form the basic building block for data storage, memory allocation, and transmission, enabling the representation of text, images, and executable code. They underpin character encoding schemes such as ASCII, which assigns 128 characters to the first 7 bits of a byte (with the eighth bit often used for parity or extension), and variable-length Unicode formats like UTF-8, where ASCII-compatible characters occupy one byte and others use multiple bytes. Larger data volumes are quantified using binary multiples of the byte, including the kilobyte (1 KiB = 1,024 bytes in computing contexts, though sometimes approximated as 1,000 bytes in decimal systems), megabyte (1 MiB = 1,048,576 bytes), and higher units up to yottabytes. This hierarchical structure is essential for measuring file sizes, bandwidth, and storage capacity in digital systems.

Definition and Fundamentals

Core Definition

A byte is a unit of digital information typically consisting of eight bits, enabling the representation of 256 distinct values ranging from 0 to 255 in decimal notation. This structure allows bytes to serve as a fundamental building block for data storage, processing, and transmission in computing systems. A bit, the smallest unit of digital information, represents a single binary digit that can hold either a value of 0 or 1. By grouping eight such bits into a byte, computers can encode more complex data efficiently, supporting operations like arithmetic calculations and character representation that exceed the limitations of individual bits. The international standard IEC 80000-13:2008 formally defines one byte as exactly eight bits, using the term "byte" (symbol B) as a synonym for "octet" to denote this eight-bit quantity and recommending its use to avoid ambiguity with historical variations. For example, a single byte can store one ASCII character, such as 'A', which corresponds to the decimal value 65.

Relation to Bits

A byte is an ordered collection of bits, standardized in modern computing to eight bits, that is typically treated as a single binary number representing integer values from 00000000 (0 in decimal) to 11111111 (255 in decimal). This structure allows a byte to encode 256 distinct states, as each bit can independently be 0 or 1, yielding 2^8 possible combinations. The numerical value of a byte is determined by its binary representation using positional notation, where each bit's position corresponds to a power of 2. The value V of an 8-bit byte is calculated as V = \sum_{i=0}^{7} b_i \cdot 2^i, where b_i is the value of the i-th bit (either 0 or 1) and i = 0 denotes the least significant bit. For example, the binary byte 10101010 converts to 170 in decimal, computed as 1 \cdot 2^7 + 0 \cdot 2^6 + 1 \cdot 2^5 + 0 \cdot 2^4 + 1 \cdot 2^3 + 0 \cdot 2^2 + 1 \cdot 2^1 + 0 \cdot 2^0 = 128 + 32 + 8 + 2 = 170. In computing systems, bytes play a crucial role by serving as the smallest addressable unit of memory, enabling efficient referencing and manipulation of data in larger aggregates beyond individual bits. This byte-addressable design facilitates operations on contiguous blocks of memory, such as loading instructions or storing variables, which would be impractical at the bit level due to the granularity mismatch.
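The positional formula above maps directly to a few lines of C. The function below is an illustrative sketch: it folds an eight-character bit string (most significant bit first) into its integer value, reproducing the worked example (10101010 = 170 decimal).

    #include <stdio.h>

    /* Evaluate an 8-character bit string (MSB first) as V = sum of b_i * 2^i. */
    static unsigned byte_value(const char *bits) {
        unsigned value = 0;
        for (int i = 0; i < 8; i++) {
            value = (value << 1) | (unsigned)(bits[i] - '0');
        }
        return value;
    }

    int main(void) {
        printf("10101010 -> %u\n", byte_value("10101010")); /* prints 170 */
        return 0;
    }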

History and Etymology

Origins of the Term

The term "byte" was coined in July 1956 by IBM engineer Werner Buchholz during the early design phase of the IBM Stretch computer, a pioneering supercomputer project aimed at advancing high-performance computing. Buchholz introduced the word as a more concise alternative to cumbersome phrases like "binary digit group" or "bit string," which were used to describe groupings of bits in data processing. Etymologically, "byte" derives from "bit" with the addition of the suffix "-yte," intentionally respelled from the more intuitive "bite" to prevent confusion with the existing term "bit" while evoking the idea of a larger "bite" of information. This playful yet practical choice reflected the need for a unit that signified a meaningful aggregation of bits, larger than a single binary digit but suitable for computational operations. In its early conceptual role, the byte was proposed as a flexible data-handling unit larger than a bit, specifically to encode characters, perform arithmetic on variable-length fields, and manage instructions in the bit-addressable architecture of mainframes like the Stretch. This addressed the limitations of processing data solely in isolated bits, enabling more efficient handling of textual and numerical information in early computer systems. The first documented use of "byte" appeared in the June 1959 technical paper "Processing Data in Bits and Pieces" by Buchholz, Frederick P. Brooks Jr., and Gerrit A. Blaauw, published in the IRE Transactions on Electronic Computers, where it described a unit for variable-length data operations in the context of Stretch's design. Although the term originated three years earlier in internal IBM discussions, this publication marked its entry into the broader technical literature, predating its adoption in the IBM System/360 architecture.

Evolution of Byte Size

In the early days of computing, the size of a byte varied across systems to suit specific hardware architectures and data encoding needs. The IBM 7030 Stretch supercomputer, introduced in 1959, employed a variable-length byte concept, but typically utilized 6-bit bytes for binary-coded decimal (BCD) character representation, allowing efficient packing of decimal digits within its 64-bit words. Similarly, 7-bit bytes were common in telegraphic and communication systems, aligning with the structure of early character codes like the International Telegraph Alphabet No. 5, a 7-bit code supporting 128 characters. Some minicomputers, such as the DEC PDP-10 from the late 1960s, adopted 9-bit bytes to divide 36-bit words into four equal units, facilitating operations on larger datasets like those in time-sharing systems. The transition to an 8-bit byte gained momentum in the mid-1960s, propelled by emerging character encoding standards that required more robust representation. The American Standard Code for Information Interchange (ASCII), standardized in 1963, defined 7 bits for 128 characters, but practical implementations often added an 8th parity bit for error checking in transmission, effectively establishing an 8-bit structure. IBM's Extended Binary Coded Decimal Interchange Code (EBCDIC), developed in 1964 for the System/360 mainframe series, natively used 8 bits to encode 256 possible values, including control characters and punched-card compatibility, influencing enterprise computing architectures. The IBM System/360, announced in 1964, played a crucial role in this standardization by adopting a consistent 8-bit byte across its compatible family of computers, facilitating data interchange and software portability. This shift aligned with the growing need for international character support and efficient data processing beyond decimal-centric designs. By the 1970s, the 8-bit byte had become the de facto standard, driven by advancements in semiconductor technology and microprocessor design. Early dynamic random-access memory (DRAM) chips, such as Intel's 1103 introduced in 1970, provided 1-kilobit capacities in a 1024 × 1 bit organization. Systems using these chips often combined multiple devices to form 8-bit bytes, aligning with emerging standards for compatibility and efficiency. The Intel 8080 microprocessor, released in 1974, further solidified this by processing data in 8-bit units across its 16-bit architecture, enabling the proliferation of affordable personal computers and embedded systems. This standardization improved memory efficiency, as 8-bit alignments reduced overhead in addressing and arithmetic operations compared to uneven sizes like 6 or 9 bits. Formal standardization affirmed the 8-bit byte in international norms during the late 20th century. The IEEE 754 standard for binary floating-point arithmetic, published in 1985, implicitly relied on 8-bit bytes by defining single-precision formats as 32 bits (four bytes) and double-precision as 64 bits (eight bytes), ensuring portability across hardware. The ISO/IEC 2382-1 vocabulary standard, revised in 1993, explicitly defined a byte as a sequence of eight bits, providing a consistent terminology for information technology. This was reinforced by the International Electrotechnical Commission (IEC) in 1998 through amendments to IEC 60027-2, which integrated the 8-bit byte into binary prefix definitions for data quantities, resolving ambiguities in storage and transmission metrics.

Notation and Standards

Unit Symbols and Abbreviations

The official unit symbol for the byte is the uppercase letter B, as established by international standards to represent a sequence of eight bits. This symbol is defined in IEC 80000-13:2025, which specifies that the byte is synonymous with the octet and uses B to denote this unit in information science and technology contexts. The standard also aligns with earlier guidelines in IEC 60027-2 (2000), which incorporated conventions for binary multiples introduced in 1998 and emphasized consistent notation for bytes and bits. To prevent ambiguity, particularly in data rates and storage metrics, the lowercase b is reserved for the bit or its multiples (e.g., kbit for kilobit), while B exclusively denotes the byte. The National Institute of Standards and Technology (NIST) reinforces this distinction in its guidelines on SI units and binary prefixes, stating that one byte equals 1 B = 8 bits, and recommending B for all byte-related quantities to avoid confusion with bit-based units. Similarly, the International Electrotechnical Commission (IEC) advises against using non-standard symbols like "o" for octet, as it deviates from the unified B notation and could lead to errors in technical documentation. In formal writing and standards-compliant contexts, abbreviations should use B without periods or pluralization (e.g., 8 B for eight bytes), following general SI symbol rules for upright roman type and no modification for plurality. Informal usage in prose often spells out "byte" fully or employs B inline, but avoids ambiguous lowercase "b" for bytes to maintain clarity. For example, storage capacities are expressed as 1 KB = 1024 B in binary contexts, distinguishing from kbit or kb for kilobits (1000 bits). Guidelines from authoritative bodies like NIST and the IEC continue to prioritize B to ensure unambiguous communication in computing and measurement applications. These conventions promote standardized unit symbols to support global interoperability.

Definition of Multiples

Multiples of bytes provide a standardized way to express larger quantities of digital information, commonly applied in contexts such as data storage, memory capacity, and bandwidth measurement. These multiples incorporate prefixes that scale the base unit of one byte (8 bits) by powers of either 10, aligning with the decimal system used in general scientific measurement, or powers of 2, which correspond to the binary nature of computing architectures. In 1998, the International Electrotechnical Commission (IEC) established binary prefixes through the amendment to International Standard IEC 60027-2 to clearly denote multiples based on powers of 2, avoiding ambiguity in computing applications, with the latest revision in IEC 80000-13:2025 adding new prefixes for binary multiples. Under this system, the prefix "kibi" (Ki) represents 2^10 bytes, so 1 KiB = 2^10 bytes = 1,024 bytes; "mebi" (Mi) represents 2^20 bytes, so 1 MiB = 2^20 bytes = 1,048,576 bytes; and the scale extends through prefixes like gibi (Gi, 2^30), tebi (Ti, 2^40), pebi (Pi, 2^50), exbi (Ei, 2^60), zebi (Zi, 2^70), up to yobi (Yi, 2^80), where 1 YiB = 2^80 bytes. Concurrently in 1998, the International System of Units (SI) prefixes were endorsed for decimal multiples of bytes to maintain consistency with metric conventions, defining scales based on powers of 10. For instance, the prefix "kilo" (k) denotes 10^3 bytes, so 1 kB = 10^3 bytes = 1,000 bytes; "mega" (M) denotes 10^6 bytes, so 1 MB = 10^6 bytes = 1,000,000 bytes; and the progression continues with giga (G, 10^9), tera (T, 10^12), peta (P, 10^15), exa (E, 10^18), zetta (Z, 10^21), yotta (Y, 10^24), ronna (R, 10^27), quetta (Q, 10^30), where 1 QB = 10^30 bytes. In general, the value of a byte multiple can be expressed as value = prefix factor × byte size, where the byte size is 1 byte and the prefix factor equals 10^n for decimal prefixes (n being the power, e.g., n = 3 for kilo) or (2^10)^k = 2^{10k} for binary prefixes (k being the level, e.g., k = 1 for kibi, corresponding to 2^10).
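The prefix-factor relationship can be tabulated mechanically. The following C sketch (illustrative only; link with the math library where required) evaluates 10^{3n} and 2^{10n} side by side for the first four decimal and binary prefixes.

    #include <math.h>
    #include <stdio.h>

    int main(void) {
        /* Decimal prefix factor: 10^(3n); binary prefix factor: 2^(10n). */
        const char *decimal[] = {"kB", "MB", "GB", "TB"};
        const char *binary[]  = {"KiB", "MiB", "GiB", "TiB"};
        for (int n = 1; n <= 4; n++) {
            printf("1 %-3s = %15.0f bytes    1 %-3s = %15.0f bytes\n",
                   decimal[n - 1], pow(10.0, 3.0 * n),
                   binary[n - 1],  pow(2.0, 10.0 * n));
        }
        return 0;
    }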

Variations and Conflicts in Multiples

Binary-Based Units

Binary-based units, also referred to as binary prefixes, are measurement units for digital information that are multiples of powers of 2, aligning with the fundamental binary architecture of computers. These units were formalized by the International Electrotechnical Commission (IEC) in its 1998 standard IEC 60027-2, which defines prefixes such as kibi (Ki), mebi (Mi), and gibi (Gi) to denote exact binary multiples of the byte. For instance, 1 kibibyte (KiB) equals 2^10 = 1,024 bytes, while 1 gibibyte (GiB) equals 2^30 = 1,073,741,824 bytes. This standardization was later incorporated into the updated IEC 80000-13:2008, emphasizing their role in data processing and transmission. The adoption of binary-based units gained traction for their precision in contexts like random access memory (RAM) capacities and file size reporting, where alignment with hardware addressing is crucial. Operating systems such as Microsoft Windows commonly report file sizes using these binary multiples (for example, displaying 1 KB as 1,024 bytes in File Explorer) to reflect actual storage allocation in binary systems. The IEC promoted these units to eliminate ambiguity in computing applications, ensuring that measurements for volatile memory like RAM and non-volatile storage like files accurately represent binary-scaled data. A key advantage of binary-based units lies in their seamless integration with computer memory addressing, where locations are numbered in powers of 2; for example, 2^20 addressable bytes precisely equals 1 mebibyte (MiB), facilitating efficient hardware design and software calculations without conversion overhead. The general formula for calculating the size in bytes is 2^{10n}, where n is the prefix order (e.g., n = 1 for kibi, n = 2 for mebi). Thus, 1 tebibyte (TiB) = 2^40 = 1,099,511,627,776 bytes. Common binary prefixes are summarized below:
Prefix name   Symbol   Factor   Bytes
kibibyte      KiB      2^10     1,024
mebibyte      MiB      2^20     1,048,576
gibibyte      GiB      2^30     1,073,741,824
tebibyte      TiB      2^40     1,099,511,627,776
pebibyte      PiB      2^50     1,125,899,906,842,624
These units provide conceptual clarity for computational efficiency, in contrast to decimal-based units that scale by powers of 10 for metric consistency.

Decimal-Based Units

Decimal-based units for byte multiples adhere to the International System of Units (SI) prefixes, employing powers of 10 for scalability in information storage and transfer. Per ISO/IEC 80000-13:2008, the kilobyte (kB) is defined as exactly 1,000 bytes, or 10^3 bytes, establishing the foundational decimal progression. This system scales linearly: the megabyte (MB) equals 10^6 bytes (1,000,000 bytes), the gigabyte (GB) equals 10^9 bytes (1,000,000,000 bytes), and the petabyte (PB) equals 10^15 bytes (1,000,000,000,000,000 bytes). The general expression for these units is size in bytes = 10^{3n}, where n denotes the prefix order (n = 1 for kilo-, n = 2 for mega-, up to n = 5 for peta- and n = 6 for exa-). For example, 1 exabyte (EB) comprises 10^18 bytes. These decimal conventions are prevalent in hard drive manufacturing and networking protocols, prioritizing consumer familiarity with metric measurements over computational binary alignments. ISO/IEC 80000-13:2008 further endorses this for information technology, recommending SI prefixes to enhance clarity in storage and data rate expressions. A key distinction arises when comparing decimal units to their binary counterparts: 1 GB (decimal) totals 1,000,000,000 bytes, while 1 GiB equals 1,073,741,824 bytes (2^30), yielding roughly a 7% variance that manifests as reduced apparent capacity in binary-displaying operating systems. For a 1 TB drive labeled in decimal terms (1,000,000,000,000 bytes), systems report approximately 931 GB, illustrating this practical implication.
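The "1 TB reported as roughly 931 GB" discrepancy quoted above is simple arithmetic, sketched here in C for illustration: divide the decimal capacity by 2^30 to get the figure a binary-reporting operating system will display.

    #include <stdio.h>

    int main(void) {
        const double tb_bytes = 1e12;                     /* "1 TB" as labeled: 10^12 bytes */
        const double gib      = 1024.0 * 1024.0 * 1024.0; /* 2^30 bytes */
        const double gb       = 1e9;                      /* 10^9 bytes */

        printf("1 TB = %.0f GB (decimal) = %.1f GiB (binary)\n",
               tb_bytes / gb, tb_bytes / gib);            /* 1000 GB, about 931.3 GiB */
        return 0;
    }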

Historical Disputes and Resolutions

During the 1980s and 1990s, significant ambiguity surrounded the definition of byte multiples like the kilobyte (KB), with computing hardware and software conventionally interpreting 1 KB as 1024 bytes based on binary powers of two, while hard disk drive (HDD) manufacturers increasingly adopted decimal interpretations of 1000 bytes to inflate advertised capacities. This divergence, driven by HDD marketing strategies to highlight larger storage sizes, caused widespread consumer frustration as operating systems reported usable space closer to 93% of the labeled amount due to binary calculations. To address the growing confusion, the International Electrotechnical Commission (IEC) approved a set of binary prefixes in December 1998, including "kibi" (symbol Ki) for 2^10 or 1,024, "mebi" (Mi) for 2^20 or 1,048,576, and similar terms up to "yobi" (Yi) for 2^80, explicitly distinguishing them from decimal SI prefixes. In 2000, the U.S. National Institute of Standards and Technology (NIST) endorsed these IEC binary prefixes, recommending their use in technical contexts to avoid ambiguity and aligning with international standards for data storage and memory specifications. Legal actions further highlighted the issue. In 2006, Western Digital settled a class-action lawsuit alleging deceptive advertising of gigabyte (GB) capacities on drives like the 80 GB and 120 GB models, where actual binary-displayed space fell short; the settlement required the company to disclose its decimal definition (1 GB = 1,000,000,000 bytes) on product packaging, websites, and software for five years, along with providing free data recovery tools to affected customers. A similar 2007 class-action against Seagate resulted in a settlement offering refunds equivalent to 5% of purchase price (up to $7 per drive) to millions of customers and mandating clearer labeling of decimal versus binary interpretations to prevent future misleading claims. In the European Union, the Unfair Commercial Practices Directive (2005/29/EC) has prohibited misleading advertisements on storage capacities, enabling national authorities to pursue cases against vendors for deceptive decimal labeling without binary disclaimers, thereby reinforcing consumer protections against such discrepancies. More recently, the 2025 edition of ISO/IEC 80000-13 on quantities and units in information technology reaffirms the IEC binary prefixes and urges their consistent adoption alongside decimal ones to fully resolve lingering ambiguities in byte multiple definitions across global standards.

Applications in Computing

Storage and Memory

In data storage devices such as hard disk drives (HDDs) and solid-state drives (SSDs), the byte serves as the fundamental addressable unit, with data organized into sectors that are typically multiples of bytes. Traditional HDD sectors measure 512 bytes, representing the smallest unit for reading or writing data since the early 1980s. Modern HDDs often employ Advanced Format technology with 4,096-byte (4 KiB) physical sectors to enhance storage density and error correction on high-capacity drives exceeding 1 terabyte. SSDs, while internally using pages and blocks rather than traditional tracks and sectors, emulate 512-byte or 4,096-byte sectors for compatibility with operating systems and software. This emulation, known as 512e for 512-byte presentation, allows seamless integration but can introduce overhead in read-modify-write operations on native 4,096-byte structures. Computer memory, particularly random-access memory (RAM), is structured in bytes, enabling fine-grained access in byte-addressable architectures prevalent in modern systems. In byte-addressable memory, each unique address corresponds to a single byte (8 bits), allowing the CPU to directly read or write individual bytes without unnecessary overhead. The x86 architecture, widely used in personal computers, employs byte-addressable memory, where 32-bit or 64-bit addresses reference bytes in RAM. For instance, an 8 GB RAM module, as defined by manufacturers in decimal notation, equates to 8 × 10^9 bytes. File systems allocate storage space in bytes by grouping sectors into larger clusters, ensuring efficient management of data on disks. A cluster, the basic allocation unit, consists of one or more consecutive 512-byte sectors, such as 4 sectors totaling 2,048 bytes, with files occupying whole clusters even if partially filled. In the File Allocation Table (FAT) filesystem, cluster size is calculated as the product of bytes per sector (e.g., 512) and sectors per cluster (a power of 2, up to 128), limiting maximum sizes to 32 KB for broad compatibility, though larger clusters up to 256 KB are supported in modern implementations. This byte-based allocation minimizes fragmentation while tracking file sizes and locations precisely in bytes. In processor caches, which bridge the speed gap between CPU and memory, data is transferred in fixed-size lines typically measuring 64 bytes to optimize bandwidth and exploit spatial locality. Intel's IA-32 and Intel 64 architectures specify 64-byte cache lines, where accessing any byte fetches the entire line into the cache. Similarly, AMD's Zen microarchitecture uses 64-byte cache lines, enabling efficient prefetching of adjacent data. Storage capacities have evolved dramatically in byte terms, from the 1970s-era 3.5-inch high-density floppy disks holding 1.44 MB (1,474,560 bytes in binary notation, derived from 2,880 sectors of 512 bytes each) to contemporary SSDs offering 1 TB (1 × 10^12 bytes in decimal notation). This progression reflects advances in density, with early floppies limited by magnetic media constraints and modern SSDs leveraging flash memory for terabyte-scale byte storage.
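The byte arithmetic behind these allocation units is straightforward. The C sketch below uses illustrative figures taken from the paragraph above, not from any particular drive or file system: it multiplies sectors by sector size to get a cluster size, and reproduces the classic "1.44 MB" floppy capacity.

    #include <stdio.h>

    int main(void) {
        /* Cluster size = bytes per sector * sectors per cluster. */
        const unsigned long bytes_per_sector    = 512;
        const unsigned long sectors_per_cluster = 4;
        printf("cluster size: %lu bytes\n",
               bytes_per_sector * sectors_per_cluster);   /* 2048 */

        /* Classic "1.44 MB" floppy: 2,880 sectors of 512 bytes each. */
        const unsigned long floppy_bytes = 2880UL * 512UL;
        printf("floppy capacity: %lu bytes = %lu KiB\n",
               floppy_bytes, floppy_bytes / 1024UL);       /* 1,474,560 B = 1440 KiB */
        return 0;
    }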

Data Processing and Encoding

In data processing, bytes serve as the fundamental unit for encoding and manipulating information within computational systems. Character encoding schemes, such as UTF-8, represent Unicode characters using a variable number of bytes, typically ranging from 1 to 4 bytes per character to accommodate the full range of code points up to U+10FFFF. This variable-width approach ensures backward compatibility with ASCII, where the first 128 characters (0x00 to 0x7F) are encoded in a single byte, effectively utilizing 7 bits while reserving the eighth bit for extension. ASCII, formalized as a 7-bit standard, thus fits entirely within one byte in modern 8-bit systems, enabling efficient handling of basic Latin text without additional overhead. Central processing units (CPUs) process bytes through arithmetic logic units (ALUs) that perform operations on 8-bit values, such as addition, subtraction, and bitwise manipulations, forming the basis for more complex computations on multi-byte data types. For instance, low-level assembly instructions like BSWAP on x86 architectures reverse the byte order within a 32-bit or 64-bit register, facilitating data conversion between different endian formats during processing. These operations highlight the byte's role in granular data handling, where registers and ALUs treat sequences of bytes as building blocks for integers, floating-point numbers, and other structures. In network protocols, bytes define the structure and transmission of data packets. Ethernet frames, as specified in IEEE 802.3, carry a payload with a maximum transmission unit (MTU) of 1500 bytes, excluding headers, to balance efficiency and error detection in local area networks. Bandwidth is commonly measured in bytes per second, with Gigabit Ethernet theoretically supporting up to approximately 125 MB/s (megabytes per second), though practical rates like 100 MB/s account for protocol overhead and real-world conditions. Encoding techniques further illustrate byte manipulation in data processing. Base64, a method for representing binary data in an ASCII-compatible format, converts every 3 bytes of input into 4 characters from a 64-symbol alphabet, increasing the data size by about 33% to ensure safe transmission over text-based protocols. Endianness, the ordering of bytes in multi-byte integers, affects storage and processing; for example, a little-endian system like x86 stores the least significant byte at the lowest memory address, which can require byte swaps when interfacing with big-endian networks like IP protocols.
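Two of the byte-level manipulations mentioned above, reversing byte order and Base64 size expansion, are compact enough to sketch directly. The C code below is illustrative only: bswap32 mirrors the effect of a BSWAP-style instruction in portable code, and base64_len applies the 3-bytes-to-4-characters rule.

    #include <stdint.h>
    #include <stdio.h>

    /* Reverse the byte order of a 32-bit value (the effect of x86 BSWAP). */
    static uint32_t bswap32(uint32_t x) {
        return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
               ((x << 8) & 0x00FF0000u) | (x << 24);
    }

    /* Base64 output length: every 3 input bytes become 4 output characters. */
    static size_t base64_len(size_t n) {
        return 4 * ((n + 2) / 3);
    }

    int main(void) {
        printf("0x%08X byte-swapped is 0x%08X\n",
               0x12345678u, bswap32(0x12345678u));   /* 0x78563412 */
        printf("Base64 of 3000 bytes: %zu characters (about 33%% larger)\n",
               base64_len(3000));                    /* 4000 */
        return 0;
    }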
