Units of information

from Wikipedia

A unit of information is any unit of measure of digital data size. In digital computing, a unit of information is used to describe the capacity of a digital data storage device. In telecommunications, a unit of information is used to describe the throughput of a communication channel. In information theory, a unit of information is used to measure information contained in messages and the entropy of random variables.

Because data sizes range from very small to very large, units of information cover a correspondingly wide range. Units are defined as multiples of a smaller unit, except for the smallest unit, which is based on convention and hardware design. Multiplier prefixes are used to describe relatively large sizes.

For binary hardware, by far the most common hardware today, the smallest unit is the bit, a portmanteau of binary digit,[1] which represents a value that is one of two possible values, typically shown as 0 and 1. The nibble, 4 bits, represents the value of a single hexadecimal digit. The byte, 8 bits, 2 nibbles, is possibly the most commonly known and used base unit to describe data size. The word is a size that varies by, and has special importance for, a particular hardware context. On modern hardware, a word is typically 2, 4 or 8 bytes, but the size varies dramatically on older hardware. Larger sizes can be expressed as multiples of a base unit via SI metric prefixes (powers of ten) or the newer IEC binary prefixes (powers of two), which avoid ambiguity.

Information theory

Comparison of units of information: bit, trit, nat, ban. Quantity of information is the height of bars. Dark green level is the "nat" unit.

In 1928, Ralph Hartley observed a fundamental storage principle,[2] which was further formalized by Claude Shannon in 1945: the information that can be stored in a system is proportional to the logarithm of N possible states of that system, denoted logb N. Changing the base of the logarithm from b to a different number c has the effect of multiplying the value of the logarithm by a fixed constant, namely logc N = (logc b) logb N. Therefore, the choice of the base b determines the unit used to measure information. In particular, if b is a positive integer, then the unit is the amount of information that can be stored in a system with b possible states.

When b is 2, the unit is the shannon, equal to the information content of one "bit". A system with 8 possible states, for example, can store up to log2 8 = 3 bits of information. Other units that have been named include:

Base b = 3
the unit is called "trit", and is equal to log2 3 (≈ 1.585) bits.[3]
Base b = 10
the unit is called decimal digit, hartley, ban, decit, or dit, and is equal to log2 10 (≈ 3.322) bits.[2][4][5][6]
Base b = e, the base of natural logarithms
the unit is called a nat, nit, or nepit (from Neperian), and is worth log2 e (≈ 1.443) bits.[2]

The trit, ban, and nat are rarely used to measure storage capacity; but the nat, in particular, is often used in information theory, because natural logarithms are mathematically more convenient than logarithms in other bases.
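
The base-change rule above is easy to verify numerically. The short Python sketch below (the helper names are illustrative, not from any standard library) measures the information in an N-state system and converts a quantity between bits, trits, bans, and nats.

```python
import math

def info_content(n_states: int, base: float = 2) -> float:
    """Information stored by a system with n equally likely states, in units of log base `base`."""
    return math.log(n_states, base)

def convert(amount: float, from_base: float, to_base: float) -> float:
    """Re-express information measured in base `from_base` units in base `to_base` units."""
    return amount * math.log(from_base, to_base)

print(info_content(8))           # 3.0 bits in an 8-state system
print(convert(1, 3, 2))          # 1 trit  ~ 1.585 bits
print(convert(1, 10, 2))         # 1 ban   ~ 3.322 bits
print(convert(1, math.e, 2))     # 1 nat   ~ 1.443 bits
```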

Units derived from bit


Several conventional names are used for collections or groups of bits.

Byte


Historically, a byte was the number of bits used to encode a character of text in the computer, which depended on computer hardware architecture, but today it almost always means eight bits – that is, an octet. An 8-bit byte can represent 256 (28) distinct values, such as non-negative integers from 0 to 255, or signed integers from −128 to 127. The IEEE 1541-2002 standard specifies "B" (upper case) as the symbol for byte (IEC 80000-13 uses "o" for octet in French, but also allows "B" in English). Bytes, or multiples thereof, are almost always used to specify the sizes of computer files and the capacity of storage units. Most modern computers and peripheral devices are designed to manipulate data in whole bytes or groups of bytes, rather than individual bits.
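
As a small illustration of the 256 values a byte can hold, the Python sketch below reads the same 8-bit pattern as an unsigned value (0 to 255) and as a two's-complement signed value (−128 to 127); it is a minimal example, not part of any referenced standard.

```python
raw = bytes([0xFF])   # one byte, bit pattern 1111 1111

unsigned = int.from_bytes(raw, byteorder="big", signed=False)   # 255
signed = int.from_bytes(raw, byteorder="big", signed=True)      # -1 (two's complement)

print(unsigned, signed)          # 255 -1
print(2 ** 8)                    # 256 distinct values per byte
```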

Nibble


A group of four bits, or half a byte, is sometimes called a nibble, nybble or nyble. This unit is most often used in the context of hexadecimal number representations, since a nibble has the same number of possible values as one hexadecimal digit has.[7]
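
Because a nibble has exactly the range of one hexadecimal digit, splitting a byte into its high and low nibbles recovers the two hex digits of its usual notation, as the illustrative Python sketch below shows.

```python
value = 0xB7   # one byte, 183 in decimal

high_nibble = (value >> 4) & 0xF   # upper 4 bits -> 0xB (11)
low_nibble = value & 0xF           # lower 4 bits -> 0x7 (7)

# Each nibble corresponds to exactly one hexadecimal digit of the byte.
print(f"{high_nibble:X}{low_nibble:X}")   # B7
print(f"{value:02X}")                     # B7, the same two digits
```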

Word, block, and page


Computers usually manipulate bits in groups of a fixed size, conventionally called words. The number of bits in a word is usually defined by the size of the registers in the computer's CPU, or by the number of data bits that are fetched from its main memory in a single operation. In the IA-32 architecture more commonly known as x86-32, a word is 32 bits, but other past and current architectures use words with 4, 8, 9, 12, 13, 16, 18, 20, 21, 22, 24, 25, 29, 30, 31, 32, 33, 35, 36, 38, 39, 40, 42, 44, 48, 50, 52, 54, 56, 60, 64, 72[8] bits or others.

Some machine instructions and computer number formats use two words (a "double word" or "dword"), or four words (a "quad word" or "quad").

Computer memory caches usually operate on blocks of memory that consist of several consecutive words. These units are customarily called cache blocks, or, in CPU caches, cache lines.

Virtual memory systems partition the computer's main storage into even larger units, traditionally called pages.
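
Word and page sizes are properties of the hardware and operating system rather than fixed constants; the Python sketch below probes them indirectly (a native pointer as a rough proxy for the word size, and the virtual-memory page size), so the printed values vary by platform.

```python
import mmap
import struct

# Native pointer size is a common proxy for the machine word size:
# 8 bytes (64 bits) on typical 64-bit systems, 4 bytes on 32-bit systems.
word_bytes = struct.calcsize("P")

# Page size used by the virtual memory system (commonly 4096 bytes).
page_bytes = mmap.PAGESIZE

print(f"word: {word_bytes} bytes ({word_bytes * 8} bits)")
print(f"page: {page_bytes} bytes ({page_bytes // 1024} KiB)")
```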

Multiplicative prefixes


A unit for a large amount of data can be formed using either a metric or binary prefix with a base unit. For storage, the base unit is typically a byte. For communication throughput, a base unit of bit is common. For example, using the metric kilo prefix, a kilobyte is 1000 bytes and a kilobit is 1000 bits.

Use of metric prefixes is common. In the context of computing, some of these prefixes (primarily kilo, mega and giga) are used to refer to the nearest power of two. For example, 'kilobyte' often refers to 1024 bytes even though the standard meaning of kilo is 1000.

Symbol Prefix Multiple
k kilo 1000
M mega 1000^2
G giga 1000^3
T tera 1000^4
P peta 1000^5
E exa 1000^6
Z zetta 1000^7
Y yotta 1000^8
R ronna 1000^9
Q quetta 1000^10

The International Electrotechnical Commission (IEC) standardized prefixes for binary multiples to avoid the ambiguity caused by using the standard metric terms for powers of two. These prefixes are based on powers of 1024, which is a power of 2.[9]

Symbol Prefix Multiple Example
Ki kibi 2^10, 1024 kibibyte (KiB)
Mi mebi 2^20, 1024^2 mebibyte (MiB)
Gi gibi 2^30, 1024^3 gibibyte (GiB)
Ti tebi 2^40, 1024^4 tebibyte (TiB)
Pi pebi 2^50, 1024^5 pebibyte (PiB)
Ei exbi 2^60, 1024^6 exbibyte (EiB)
Zi zebi 2^70, 1024^7 zebibyte (ZiB)
Yi yobi 2^80, 1024^8 yobibyte (YiB)
Ri robi 2^90, 1024^9 robibyte (RiB)
Qi quebi 2^100, 1024^10 quebibyte (QiB)

The JEDEC memory standard JESD88F notes that its definitions of kilo (K), mega (M) and giga (G) based on powers of two are included only to reflect common usage, and are otherwise deprecated.[10]
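
The practical difference between the two prefix systems can be shown with a small helper; the Python sketch below (the function and prefix lists are illustrative, not from any library) renders the same byte count with metric, power-of-ten prefixes and IEC, power-of-two prefixes.

```python
METRIC = ["B", "kB", "MB", "GB", "TB", "PB"]
BINARY = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]

def format_size(n_bytes: int, base: int, units: list[str]) -> str:
    """Render n_bytes using the largest prefix that keeps the value below `base`."""
    value = float(n_bytes)
    for unit in units[:-1]:
        if value < base:
            return f"{value:.2f} {unit}"
        value /= base
    return f"{value:.2f} {units[-1]}"

print(format_size(2**30, 1000, METRIC))   # 1.07 GB  (decimal prefixes)
print(format_size(2**30, 1024, BINARY))   # 1.00 GiB (binary prefixes)
print(format_size(10**9, 1024, BINARY))   # 953.67 MiB -- a "gigabyte" in binary units
```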

Size examples

  • 1 bit: Answer to a yes/no question
  • 1 byte: A number from 0 to 255
  • 90 bytes: Enough to store a typical line of text from a book
  • 512 bytes = 0.5 KiB: The typical sector size of an old style hard disk drive (modern Advanced Format sectors are 4096 bytes).
  • 1024 bytes = 1 KiB: A block size in some older UNIX filesystems
  • 2048 bytes = 2 KiB: A CD-ROM sector
  • 4096 bytes = 4 KiB: A memory page in x86 (since Intel 80386) and many other architectures, also the modern Advanced Format hard disk drive sector size.
  • 4 kB: About one page of text from a novel
  • 120 kB: The text of a typical pocket book
  • 1 MiB: A 1024×1024 pixel bitmap image with 256 colors (8 bpp color depth)
  • 3 MB: A three-minute song (133 kbit/s)
  • 650–900 MB: A CD-ROM
  • 1 GB: 114 minutes of uncompressed CD-quality audio at 1.4 Mbit/s
  • 16 GB: DDR5 DRAM laptop memory under $40 (as of early 2024)
  • 32/64/128 GB: Three common sizes of USB flash drives
  • 1 TB: The size of a $30 hard disk (as of early 2024)
  • 6 TB: The size of a $100 hard disk (as of early 2022)
  • 16 TB: The size of a small/cheap $130 (as of early 2024) enterprise SAS hard disk drive
  • 24 TB: The size of a $440 (as of early 2024) "video" hard disk drive
  • 32 TB: Largest hard disk drive (as of mid-2024)
  • 100 TB: Largest commercially available solid-state drive (as of mid-2024)
  • 200 TB: Largest solid-state drive constructed (prediction for mid-2022)
  • 1.6 PB (1600 TB): Amount of possible storage in one 2U server (world record as of 2021, using 100 TB solid-state drives).[11]
  • 1.3 ZB: Prediction of the volume of the whole internet in 2016

Obsolete and unusual units


The following notable unit names are today obsolete or used only in limited contexts.

  • 5 bits: pentad, pentade[23]
  • 7 bits: heptad, heptade[23]
  • 9 bits: nonet,[27] rarely used
  • 18 bits: chomp, chawmp (on a 36-bit machine)[38]
  • 256 bytes: page (on Intel 4004,[44] 8080 and 8086 processors,[42] also many other 8-bit processors – typically much larger on many 16-bit/32-bit processors)

from Grokipedia
Units of information are standardized measures that quantify the amount of data stored, transmitted, or processed in digital systems, as well as the uncertainty of probabilistic events in information theory. The concept originated with foundational work in the 1920s and 1940s, where information was formalized as a logarithmic function of probability, enabling precise quantification of informational content. These units underpin everything from data compression and error correction to capacity limits in communication systems.

In information theory, the basic unit is the bit (binary digit or shannon), defined as the information conveyed by an event with probability 1/2, equivalent to −log2(1/2) = 1 bit. This unit arises from using the base-2 logarithm in entropy calculations, H = −Σ pi log2 pi, where H measures average uncertainty in bits. Alternative units include the nat (natural unit), based on the natural logarithm (base e), where 1 nat equals approximately 1.4427 bits and represents the information of an event with probability 1/e. Another is the hartley (or ban/dit), using the base-10 logarithm, where 1 hartley equals about 3.3219 bits and stems from early work on message selection from finite sets. These units are interconvertible via logarithmic base changes, with bits being the most common in digital contexts due to binary hardware.

In computing and telecommunications, the bit remains the atomic unit, but practical units build upon it for efficiency. A nibble (or half-byte) consists of 4 bits, often used to represent hexadecimal digits in low-level programming and hardware design. The byte, defined as 8 bits, serves as the standard grouping for character encoding (e.g., ASCII) and memory addressing, capable of representing 256 distinct values (0 to 255). Larger quantities employ binary prefixes to avoid ambiguity with decimal multiples: for example, 1 kibibyte (KiB) = 2^10 bytes = 1,024 bytes, while 1 kilobyte (kB) means 1,000 bytes in decimal usage, and NIST recommends binary prefixes for powers of 2 in computing. Common multiples extend to the mebibyte (MiB, 2^20 bytes), gibibyte (GiB, 2^30 bytes), and beyond, underpinning storage capacities in devices from RAM to hard drives. These units ensure interoperability in standards such as network protocols and cryptographic algorithms.

Theoretical Foundations

Information Theory Basics

Information theory, pioneered by Claude Shannon, defines information as the measure of uncertainty reduction in a communication system, quantifying how much knowledge is gained from receiving a message. This foundational concept emerged from Shannon's efforts to optimize the encoding and transmission of signals, addressing the fundamental question of how to reliably transmit messages over noisy channels. By formalizing information in probabilistic terms, Shannon provided a framework independent of the message's meaning, focusing solely on its statistical properties to ensure efficient encoding and decoding.

Central to this theory is the concept of entropy, which calculates the average information content of a random variable. For a discrete random variable X with possible outcomes x and probabilities p(x), the entropy H(X) is given by H(X) = −Σx p(x) log2 p(x), where the logarithm base 2 yields a result in bits, representing the expected number of yes/no questions needed to identify an outcome. This formula captures the uncertainty inherent in the source: higher entropy indicates greater unpredictability and thus more information required per symbol, while zero entropy corresponds to complete certainty. Shannon derived this measure by extending earlier work on thermodynamic entropy to communication, ensuring it satisfies key axioms such as additivity for independent sources.

Shannon's framework deliberately emphasizes syntactic information (the quantifiable amount of data) over semantic information, which pertains to the interpreted meaning or value of the content. This distinction allows the theory to apply universally to any symbol system, from electrical signals to text, without delving into subjective interpretations. By prioritizing syntax, the theory enables practical applications in compression and error correction, where the goal is to minimize redundancy while preserving message content. The bit, as the fundamental unit in this system, arises directly from binary decision processes modeled in Shannon's work. Developed during World War II at Bell Laboratories, where Shannon analyzed cryptography and switching circuits, this approach revolutionized communication engineering by establishing a rigorous metric for information.

The bit, short for binary digit, serves as the fundamental unit of information in this theory, quantifying the uncertainty resolved by a choice between two equally probable alternatives, such as a coin flip. This unit, introduced by Shannon in his seminal 1948 paper, corresponds to the entropy of a binary random variable with equal probabilities and is equivalent to one shannon, mathematically expressed as log2 2 = 1. In contrast, the nat (natural unit of information) employs the natural logarithm for measuring entropy, particularly in the expression H(X) = −Σ p(x) ln p(x), where the resulting value is in nats. One nat represents the information content of an event with probability 1/e, and it is approximately equal to 1.4427 bits due to the base conversion factor log2 e. The hartley, also known as a ban or decit, is a logarithmic unit based on the common (base-10) logarithm, originally proposed by Ralph Hartley in his 1928 work on information transmission. Defined such that one hartley equals log10 10 = 1, it measures the information in an event with probability 1/10 and converts to approximately 3.3219 bits via the factor log2 10. This unit aligns with decimal systems and is formalized in international standards for information quantities.

Conversions between these units follow from the change of logarithmic base: one bit equals ln 2 nats (approximately 0.693 nats), while one hartley equals log2 10 bits. These relationships ensure consistent quantification across different bases, with a unit defined in base b equal to log2 b bits. In practice, bits predominate in digital computing and communication systems due to their alignment with binary hardware, as established by Shannon's framework. Nats find application in theoretical analysis and continuous probability models, where the natural logarithm simplifies derivations. Hartleys, though less common today, were historically used in early communication engineering to assess channel capacities in decimal terms.
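
For concreteness, the Python sketch below (an illustrative helper, not a library function) computes the entropy of one small distribution in bits, nats, and hartleys and confirms the conversion factors quoted above.

```python
import math

def entropy(probs, base=2.0):
    """H = -sum p * log_base(p), skipping zero-probability outcomes."""
    return -sum(p * math.log(p, base) for p in probs if p > 0)

p = [0.5, 0.25, 0.125, 0.125]

h_bits = entropy(p, 2)            # 1.75 bits (shannons)
h_nats = entropy(p, math.e)       # the same quantity in nats
h_hartleys = entropy(p, 10)       # the same quantity in hartleys

print(h_bits, h_nats, h_hartleys)
print(h_nats * math.log2(math.e))  # ~1.75: one nat is log2(e) ~ 1.4427 bits
print(h_hartleys * math.log2(10))  # ~1.75: one hartley is log2(10) ~ 3.3219 bits
```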

Binary-Derived Units

Nibble and Byte

A nibble, sometimes spelled nybble, is a unit of digital information equal to four consecutive bits. This grouping allows a nibble to encode 16 possible values, ranging from 0 to 15 in decimal or 0 to F in hexadecimal notation, making it equivalent to a single hexadecimal digit. In computing, nibbles are commonly used in contexts like binary-coded decimal (BCD) representations on early mainframes, where each nibble stores one decimal digit for efficient numerical processing. The term "nibble" emerged in the late 1950s as a playful extension of the byte, referring to half the size of a byte and evoking the idea of a small bite. While its exact coining is attributed to informal usage among researchers, such as a 1958 remark by Professor David B. Benson during discussions on encoding, it gained traction in technical literature in the decades that followed.

The byte is a fundamental unit of digital information, consisting of eight bits in modern systems, and serves as the smallest addressable unit of memory in most processors. A byte can represent 256 distinct values (2^8), from 0 to 255 in decimal or 00 to FF in hexadecimal. This size enables the encoding of a wide range of characters and symbols, including the full American Standard Code for Information Interchange (ASCII), which assigns 128 characters to the lower seven bits, with the eighth bit often reserved for error-checking parity in early implementations. The ASCII standard, developed by the American National Standards Institute (ANSI) and first published in 1968 (building on proposals from 1963), fits precisely within one byte, facilitating text storage and transmission in byte-oriented environments.

Historically, the byte's size varied across early computers; for instance, systems in the 1950s and 1960s, such as those using BCD, employed six-bit bytes to encode 64 characters, sufficient for alphanumeric data in business applications. The term "byte" was coined in June 1956 by IBM engineer Werner Buchholz during the design of the IBM Stretch computer, deliberately respelled from "bite" to distinguish it from "bit" while suggesting a larger unit of data. Standardization to eight bits occurred in the early 1960s, solidified by IBM's System/360 architecture announced in 1964, which adopted the eight-bit byte to support international character sets, efficient addressing, and compatibility with emerging standards like ASCII. This shift marked a pivotal moment in computing, as the System/360's widespread adoption led the industry to converge on the eight-bit byte as the de facto standard.
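
The role of the nibble in binary-coded decimal can be made concrete with a short sketch: each decimal digit of a two-digit number is stored in one 4-bit half of a byte. The helpers below are illustrative only.

```python
def pack_bcd(n: int) -> int:
    """Pack a two-digit decimal number (0-99) into one BCD byte."""
    if not 0 <= n <= 99:
        raise ValueError("BCD byte holds exactly two decimal digits")
    tens, ones = divmod(n, 10)
    return (tens << 4) | ones          # high nibble = tens, low nibble = ones

def unpack_bcd(byte: int) -> int:
    return (byte >> 4) * 10 + (byte & 0xF)

packed = pack_bcd(59)
print(f"{packed:08b}")     # 01011001 -- nibbles 0101 (5) and 1001 (9)
print(f"{packed:#04x}")    # 0x59 -- the hex digits mirror the decimal digits
print(unpack_bcd(packed))  # 59
```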

Word, Block, and Page

In computer architecture, a word represents the natural unit of data handled by the processor for most operations, such as arithmetic, logic, and data movement, with its size varying by architecture to match the width of registers and data paths. Typical word sizes include 16 bits in early minicomputers, 32 bits in many 32-bit processors, and 64 bits in contemporary 64-bit systems, allowing efficient processing of integers, addresses, and instructions aligned to that width. For instance, in the x86 family originating from the Intel 8086, a word is defined as 16 bits, serving as the basic unit for early operations like string moves and comparisons.

Extensions of the word size provide larger units for expanded data handling in modern architectures. A double word (dword) consists of two words, typically 32 bits in 16-bit-based systems like x86, and is used for full-width registers such as EAX in 32-bit mode, enabling operations on larger integers and memory addresses up to 4 GB. Similarly, a quadruple word (qword) comprises four words or two double words, equaling 64 bits, which supports 64-bit registers like RAX in x86-64 mode for addressing vast memory spaces up to 2^64 bytes and SIMD instructions in extensions like SSE and AVX. These multiples maintain backward compatibility while scaling with hardware generations, from 16-bit words in systems like the PDP-11, where registers and ALU operations processed 16-bit data, to 64-bit words in current CPUs.

Blocks extend word-based units into fixed-size chunks optimized for input/output (I/O) transfers and caching, minimizing overhead in data movement between storage and memory. Historically, blocks were often 512 bytes to align with disk sectors, facilitating efficient reads and writes in early storage systems. In modern contexts, block sizes commonly reach 4 KB, matching filesystem and cache line aggregates to reduce I/O latency and improve throughput in operations like buffering and prefetching.

Pages serve as the fundamental unit for virtual memory allocation and management in operating systems, enabling efficient mapping between virtual and physical addresses via page tables. The standard page size in contemporary systems is 4 KB, balancing translation overhead, TLB efficiency, and fragmentation for workloads spanning gigabytes of RAM. Historical variations included smaller sizes like 512 bytes in some early systems to accommodate limited memory, and early UNIX implementations used 512-byte disk blocks and 512-byte swap units alongside smaller 64-byte core allocations for file-system integration. This evolution ties page sizes to hardware capabilities, with larger options like 2 MB or 1 GB now available to reduce page-table entries in memory-intensive applications.
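
The word, dword, and qword widths described above correspond to the fixed-size integer formats of Python's struct module; the sketch below simply reports their byte and bit widths under the x86 convention (16, 32, and 64 bits).

```python
import struct

# Standard-size struct formats ("<" disables native padding and sizes):
# H = 16-bit word, I = 32-bit double word, Q = 64-bit quad word.
layouts = {"word": "<H", "dword": "<I", "qword": "<Q"}

for name, fmt in layouts.items():
    n_bytes = struct.calcsize(fmt)
    print(f"{name}: {n_bytes} bytes, {n_bytes * 8} bits, "
          f"max unsigned value {2 ** (n_bytes * 8) - 1}")
```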

Binary Multiplicative Prefixes

Binary multiplicative prefixes, also known as binary prefixes, are standardized terms used to denote powers of two when scaling units of information, particularly bits and bytes, in computing and data transmission. Unlike the decimal-based SI prefixes, which represent powers of ten (e.g., kilo for 10^3 = 1000), binary prefixes are specifically designed for the binary nature of digital systems, where memory is organized in powers of two. The International Electrotechnical Commission (IEC) introduced these prefixes to eliminate longstanding ambiguities in the specification of storage and memory capacities.

The binary prefixes were approved by IEC Technical Committee 25 in December 1998 and formally published in Amendment 2 to IEC 60027-2 in 1999, with incorporation into the standard's second edition in November 2000. This standardization effort addressed the historical debate over the interpretation of prefixes like "kilo-" in computing contexts, where it had conventionally meant 2^10 = 1024 rather than the SI definition of 1000. The confusion arose because early computer engineers adopted 1024 (a power of two) for convenience in addressing and storage, leading to discrepancies that escalated with larger scales; for instance, a "gigabyte" could mean either 10^9 = 1,000,000,000 bytes (decimal) or 2^30 = 1,073,741,824 bytes (binary), resulting in about a 7% difference.

To resolve this, the IEC defined a set of binary prefixes with names ending in "-bi" (and symbols ending in "i"), explicitly tied to powers of 2^10. The primary ones include:
Factor Name Symbol Value
2^10 kibi Ki 1,024
2^20 mebi Mi 1,048,576
2^30 gibi Gi 1,073,741,824
2^40 tebi Ti 1,099,511,627,776
2^50 pebi Pi 1,125,899,906,842,624
2^60 exbi Ei 1,152,921,504,606,846,976
These prefixes are applied to base units like the bit (e.g., 1 Kibit = 1024 bits) or byte (e.g., 1 MiB = 2^20 bytes), providing precise scaling without overlap with the decimal interpretations. In practice, traditional decimal-style prefixes persist in many applications: for data rates, "kbps" typically denotes kilobits per second as 1000 bits per second (decimal), while for memory capacities, "MB" often implies 2^20 bytes (binary). However, standardization bodies like the IEEE (in IEEE 1541-2002) and ISO/IEC 80000-13:2008 endorse binary prefixes for technical documentation, promoting a shift toward terms like GiB in rigorous contexts to avoid errors in large-scale data handling.
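
The roughly 7% gigabyte discrepancy noted earlier grows with every prefix step. The short Python sketch below (prefix lists are used only for labeling) tabulates the gap between the decimal and binary interpretations at each scale.

```python
DECIMAL = ["kilo", "mega", "giga", "tera", "peta", "exa"]
BINARY = ["kibi", "mebi", "gibi", "tebi", "pebi", "exbi"]

for power, (dec, binp) in enumerate(zip(DECIMAL, BINARY), start=1):
    decimal_value = 1000 ** power
    binary_value = 1024 ** power          # 2 ** (10 * power)
    gap = (binary_value - decimal_value) / decimal_value * 100
    print(f"{binp} is {gap:.1f}% larger than {dec}")
# The gibi/giga gap is about 7.4%; by exbi/exa it exceeds 15%.
```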

Alternative Units

Non-Binary Base Units

Non-binary base units encompass digits defined in numeral systems with radices greater than 2, allowing each digit to represent multiple states beyond the binary choice of 0 or 1. These units are particularly relevant in theoretical contexts where higher radices can increase information density per digit, though practical implementations are limited by hardware preferences for binary logic.

A key example is the trit, the fundamental unit in base-3 (ternary) systems, which can take three values: 0, 1, and 2 in unbalanced ternary or −1, 0, and +1 in balanced ternary. Each trit encodes log2 3 ≈ 1.585 bits of information, providing roughly 58% more capacity than a single bit while using comparable physical resources in certain designs. Balanced ternary, with its symmetric states, facilitates efficient arithmetic operations, including natural handling of negative values without additional sign bits, which contrasts with binary representations that often require extra bits for signed integers. Historically, ternary logic found application in the Setun computer, developed at Moscow State University in the late 1950s. This machine employed balanced ternary with 18-trit words and ternary logic elements, achieving simpler circuitry and lower production costs compared to contemporary binary computers, using about one-third fewer relays per bit equivalent, yet it remained a niche experiment due to the entrenched binary infrastructure in global standards.

More broadly, non-binary units generalize to q-ary digits in classical systems of arbitrary base q > 2, where each digit spans q possible symbols and conveys log2 q bits. For instance, a digit in base-10 encoding carries log2 10 ≈ 3.322 bits, making it suitable for human-readable representations but inefficient for binary hardware without conversion overhead. These units are rare in modern computing owing to the ubiquity of binary transistors and logic gates, which favor power-of-two efficiencies, though they persist in specialized domains like optical signaling or multi-level memory cells where higher densities justify the complexity.

In information theory, the capacity of non-binary alphabets is quantified through entropy measures adapted to the base q. The entropy H of a discrete source with probabilities pi over q symbols is given by H = −Σi pi logq pi, expressed in q-its (e.g., trits for q = 3), which directly corresponds to the average number of such digits needed to encode messages from the source. This formulation, a straightforward extension of binary entropy, underscores how non-binary bases can optimize coding efficiency for sources with multi-state outputs, though conversion to bits remains standard for cross-system comparisons.
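
A balanced-ternary converter makes the trit concrete: every digit is −1, 0, or +1, and negative numbers need no separate sign. The Python sketch below is a minimal illustration, not drawn from any referenced implementation.

```python
import math

def to_balanced_ternary(n: int) -> list[int]:
    """Balanced-ternary digits of n (-1, 0, +1), least significant first."""
    if n == 0:
        return [0]
    digits = []
    while n != 0:
        n, r = divmod(n, 3)
        if r == 2:          # rewrite 2 as 3 - 1: emit -1 and carry 1
            r = -1
            n += 1
        digits.append(r)
    return digits

print(to_balanced_ternary(8))    # [-1, 0, 1]  ->  9*1 + 3*0 + 1*(-1) = 8
print(to_balanced_ternary(-8))   # [1, 0, -1]  ->  9*(-1) + 3*0 + 1*1 = -8
print(math.log2(3))              # ~1.585 bits of information per trit
```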

Specialized Information Units

In quantum information theory, the qubit (quantum bit) serves as the fundamental unit of quantum information, analogous to the classical bit but capable of existing in a superposition of states, represented mathematically as α|0⟩ + β|1⟩ where |α|^2 + |β|^2 = 1. Unlike a classical bit, which holds exactly 1 bit of information, a single qubit can yield at most 1 bit of classical information upon measurement, but when entangled with others, a system of n qubits can represent exponentially more complex correlations, enabling computational advantages beyond classical limits. The concept of the qubit emerged in the late 1980s as part of foundational work on quantum information, with the term coined by Benjamin Schumacher in 1995 during discussions on quantum data compression.

Extending this, the qutrit is a three-level quantum system that generalizes the qubit, allowing superposition across the states |0⟩, |1⟩, and |2⟩, and serving as the quantum analog to the classical trit in higher-dimensional quantum computing. Qutrits enable denser information encoding and more efficient quantum gates in certain algorithms, such as generalizations of Bernstein-Vazirani, though they introduce greater experimental challenges in realization due to the need for precise control over additional energy levels.

A key specialized unit in quantum contexts is the ebit (entangled bit), which quantifies bipartite entanglement as the resource contained in one maximally entangled pair of qubits, such as the Bell state (1/√2)(|00⟩ + |11⟩).
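
The normalization condition for a qubit, and the Bell state used to define the ebit, can be checked numerically; the NumPy sketch below is purely illustrative and assumes the standard computational-basis ordering |00⟩, |01⟩, |10⟩, |11⟩.

```python
import numpy as np

# Single-qubit state a|0> + b|1>; here an equal superposition.
a = b = 1 / np.sqrt(2)
qubit = np.array([a, b], dtype=complex)
print(np.abs(qubit) ** 2)                            # [0.5 0.5] measurement probabilities
print(np.isclose(np.sum(np.abs(qubit) ** 2), 1.0))   # normalization |a|^2 + |b|^2 = 1

# Two-qubit Bell state (|00> + |11>) / sqrt(2): one ebit of entanglement.
bell = np.zeros(4, dtype=complex)
bell[0] = bell[3] = 1 / np.sqrt(2)                   # amplitudes of |00> and |11>
print(np.abs(bell) ** 2)                             # [0.5 0.  0.  0.5]
```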