Hubbry Logo
Integer (computer science)Integer (computer science)Main
Open search
Integer (computer science)
Community hub
Integer (computer science)
logo
7 pages, 0 posts
0 subscribers
Be the first to start a discussion here.
Be the first to start a discussion here.
Contribute something
Integer (computer science)
Integer (computer science)
from Wikipedia

In computer science, an integer is a datum of integral data type, a data type that represents some range of mathematical integers.[1] Integral data types may be of different sizes and may or may not be allowed to contain negative values. Integers are commonly represented in a computer as a group of binary digits (bits). The size of the grouping varies so the set of integer sizes available varies between different types of computers. Computer hardware nearly always provides a way to represent a processor register or memory address as an integer.

Value and representation

[edit]

The value of an item with an integral type is the mathematical integer that it corresponds to. Integral types may be unsigned (capable of representing only non-negative integers) or signed (capable of representing negative integers as well).[2]

An integer value is typically specified in the source code of a program as a sequence of digits optionally prefixed with + or −. Some programming languages allow other notations, such as hexadecimal (base 16) or octal (base 8). Some programming languages also permit digit group separators.[3]

The internal representation of this datum is the way the value is stored in the computer's memory. Unlike mathematical integers, a typical datum in a computer has some minimal and maximum possible value.

The most common representation of a positive integer is a string of bits, using the binary numeral system. The order of the memory bytes storing the bits varies; see endianness. The width, precision, or bitness[4] of an integral type is the number of bits in its representation. An integral type with n bits can encode 2n numbers; for example an unsigned type typically represents the non-negative values 0 through 2n − 1. Other encodings of integer values to bit patterns are sometimes used, for example binary-coded decimal or Gray code, or as printed character codes such as ASCII.

There are four well-known ways to represent signed numbers in a binary computing system. The most common is two's complement, which allows a signed integral type with n bits to represent numbers from −2(n−1) through 2(n−1) − 1. Two's complement arithmetic is convenient because there is a perfect one-to-one correspondence between representations and values (in particular, no separate +0 and −0), and because addition, subtraction and multiplication do not need to distinguish between signed and unsigned types. Other possibilities include offset binary, sign-magnitude, and ones' complement.

Some computer languages define integer sizes in a machine-independent way; others have varying definitions depending on the underlying processor word size. Not all language implementations define variables of all integer sizes, and defined sizes may not even be distinct in a particular implementation. An integer in one programming language may be a different size in a different language, on a different processor, or in an execution context of different bitness; see § Words.

Some older computer architectures used decimal representations of integers, stored in binary-coded decimal (BCD) or other format. These values generally require data sizes of 4 bits per decimal digit (sometimes called a nibble), usually with additional bits for a sign. Many modern CPUs provide limited support for decimal integers as an extended datatype, providing instructions for converting such values to and from binary values. Depending on the architecture, decimal integers may have fixed sizes (e.g., 7 decimal digits plus a sign fit into a 32-bit word), or may be variable-length (up to some maximum digit size), typically occupying two digits per byte (octet).

Common integral data types

[edit]
Bits Name Range (assuming two's complement for signed) Decimal digits Uses Implementations
C/C++ C# Pascal and Delphi Java SQL[a] FORTRAN D Rust
4 nibble, semioctet Signed: From −8 to 7, from −(23) to 23 − 1 0.9 Binary-coded decimal, single decimal digit repre­sen­ta­tion
Unsigned: From 0 to 15, which equals 24 − 1 1.2
8 byte, octet, i8, u8 Signed: From −128 to 127, from −(27) to 27 − 1 2.11 ASCII characters, code units in the UTF-8 character encoding int8_t, signed char[b] sbyte Shortint byte tinyint INTEGER[c] byte i8
Unsigned: From 0 to 255, which equals 28 − 1 2.41 uint8_t, unsigned char[b] byte Byte unsigned tinyint ubyte u8
16 halfword, word, short, i16, u16 Signed: From −32,768 to 32,767, from −(215) to 215 − 1 4.52 UCS-2 characters, code units in the UTF-16 character encoding int16_t, short,[b] int[b] short Smallint short smallint INTEGER[c] short i16
Unsigned: From 0 to 65,535, which equals 216 − 1 4.82 uint16_t, unsigned,[b] unsigned int[b] ushort Word char[d] unsigned smallint ushort u16
32 word, long, doubleword, longword, int, i32, u32 Signed: From −2,147,483,648 to 2,147,483,647, from −(231) to 231 − 1 9.33 UTF-32 characters, true color with alpha, FourCC, pointers in 32-bit computing int32_t, int,[b] long[b] int LongInt; Integer[e] int int INTEGER[c] int i32
Unsigned: From 0 to 4,294,967,295, which equals 232 − 1 9.63 uint32_t, unsigned,[b] unsigned int,[b] unsigned long[b] uint LongWord; DWord; Cardinal[e] unsigned int uint u32
64 word, doubleword, longword, long, long long, quad, quadword, qword, int64, i64, u64 Signed: From −(263) to 263 − 1 18.96 Time (e.g. milli­seconds since the Unix epoch), pointers in 64-bit computing int64_t, long,[b] long long[b] long Int64 long bigint INTEGER[c] long i64
Unsigned: From 0 to 264 − 1 19.27 uint64_t, unsigned long long[b] ulong UInt64; QWord unsigned bigint ulong u64
128 octaword, double quadword, i128, u128 Signed: From −(2127) to 2127 − 1 38.23 Complex scientific cal­cula­tions,

IPv6 addresses, GUIDs

Only available as non-standard or compiler-specific extensions cent[f] i128
Unsigned: From 0 to 2128 − 1 38.53 ucent[f] u128
n n-bit integer
(general case)
Signed: −(2n−1) to (2n−1 − 1) (n − 1) log10 2 C23: _BitInt(n), signed _BitInt(n) Ada: range -2**(n-1)..2**(n-1)-1
Unsigned: 0 to (2n − 1) n log10 2 C23: unsigned _BitInt(n) Ada: range 0..2**n-1, mod 2**n; standard libraries' or third-party arbitrary arithmetic libraries' BigDecimal or Decimal classes in many languages such as Python, C++, etc.

Different CPUs support different integral data types. Typically, hardware will support both signed and unsigned types, but only a small, fixed set of widths.

The table above lists integral type widths that are supported in hardware by common processors. High-level programming languages provide more possibilities. It is common to have a 'double width' integral type that has twice as many bits as the biggest hardware-supported type. Many languages also have bit-field types (a specified number of bits, usually constrained to be less than the maximum hardware-supported width) and range types (that can represent only the integers in a specified range).

Some languages, such as Lisp, Smalltalk, REXX, Haskell, Python, and Raku, support arbitrary precision integers (also known as infinite precision integers or bignums). Other languages that do not support this concept as a top-level construct may have libraries available to represent very large numbers using arrays of smaller variables, such as Java's java.math.BigInteger class or Perl's "bigint" package.[7] These use as much of the computer's memory as is necessary to store the numbers; however, a computer has only a finite amount of storage, so they, too, can only represent a finite subset of the mathematical integers. These schemes support very large numbers; for example one kilobyte of memory could be used to store numbers up to 2466 decimal digits long.

A Boolean type is a type that can represent only two values: 0 and 1, usually identified with false and true respectively. This type can be stored in memory using a single bit, but is often given a full byte for convenience of addressing and speed of access.

A four-bit quantity is known as a nibble (when eating, being smaller than a bite) or nybble (being a pun on the form of the word byte). One nibble corresponds to one digit in hexadecimal and holds one digit or a sign code in binary-coded decimal.

Bytes and octets

[edit]

The term byte initially meant 'the smallest addressable unit of memory'. In the past, 5-, 6-, 7-, 8-, and 9-bit bytes have all been used. There have also been computers that could address individual bits ('bit-addressed machine'), or that could only address 16- or 32-bit quantities ('word-addressed machine'). The term byte was usually not used at all in connection with bit- and word-addressed machines.

The term octet always refers to an 8-bit quantity. It is mostly used in the field of computer networking, where computers with different byte widths might have to communicate.

In modern usage byte almost invariably means eight bits, since all other sizes have fallen into disuse; thus byte has come to be synonymous with octet.

Words

[edit]

The term 'word' is used for a small group of bits that are handled simultaneously by processors of a particular architecture. The size of a word is thus CPU-specific. Many different word sizes have been used, including 6-, 8-, 12-, 16-, 18-, 24-, 32-, 36-, 39-, 40-, 48-, 60-, and 64-bit. Since it is architectural, the size of a word is usually set by the first CPU in a family, rather than the characteristics of a later compatible CPU. The meanings of terms derived from word, such as longword, doubleword, quadword, and halfword, also vary with the CPU and OS.[8]

Practically all new desktop processors are capable of using 64-bit words, though embedded processors with 8- and 16-bit word size are still common. The 36-bit word length was common in the early days of computers.

One important cause of non-portability of software is the incorrect assumption that all computers have the same word size as the computer used by the programmer. For example, if a programmer using the C language incorrectly declares as int a variable that will be used to store values greater than 215−1, the program will fail on computers with 16-bit integers. That variable should have been declared as long, which has at least 32 bits on any computer. Programmers may also incorrectly assume that a pointer can be converted to an integer without loss of information, which may work on (some) 32-bit computers, but fail on 64-bit computers with 64-bit pointers and 32-bit integers. This issue is resolved by C99 in stdint.h in the form of intptr_t.

The bitness of a program may refer to the word size (or bitness) of the processor on which it runs, or it may refer to the width of a memory address or pointer, which can differ between execution modes or contexts. For example, 64-bit versions of Microsoft Windows support existing 32-bit binaries, and programs compiled for Linux's x32 ABI run in 64-bit mode yet use 32-bit memory addresses.[9]

Standard integer

[edit]

The standard integer size is platform-dependent.

In C, it is denoted by int and required to be at least 16 bits. Windows and Unix systems have 32-bit ints on both 32-bit and 64-bit architectures.

Short integer

[edit]

A short integer can represent a whole number that may take less storage, while having a smaller range, compared with a standard integer on the same machine.

In C, it is denoted by short. It is required to be at least 16 bits, and is often smaller than a standard integer, but this is not required.[10][11] A conforming program can assume that it can safely store values between −(215 − 1)[12] and 215 − 1,[13] but it may not assume that the range is not larger. In Java, a short is always a 16-bit integer. In the Windows API, the datatype SHORT is defined as a 16-bit signed integer on all machines.[8]

Common short integer sizes
Programming language Data type name Signedness Size in bytes Minimum value Maximum value
C and C++ short signed 2 −32,767[g] +32,767
unsigned short unsigned 2 0 65,535
C# short signed 2 −32,768 +32,767
ushort unsigned 2 0 65,535
Java short signed 2 −32,768 +32,767
SQL smallint signed 2 −32,768 +32,767

Long integer

[edit]

A long integer can represent a whole integer whose range is greater than or equal to that of a standard integer on the same machine.

In C, it is denoted by long. It is required to be at least 32 bits, and may or may not be larger than a standard integer. A conforming program can assume that it can safely store values between −(231 − 1)[12] and 231 − 1,[13] but it may not assume that the range is not larger.

Common long integer sizes
Programming language Approval Type Platforms Data type name Storage in bytes Signed range Unsigned range
C ISO/ANSI C99 International Standard Unix, 16/32-bit systems[8]
Windows, 16/32/64-bit systems[8]
long 4
(minimum require­ment 4)
−2,147,483,647 to +2,147,483,647 0 to 4,294,967,295
(minimum require­ment)
C ISO/ANSI C99 International Standard Unix,
64-bit systems[8][11]
long 8
(minimum require­ment 4)
−9,223,372,036,854,775,807 to +9,223,372,036,854,775,807 0 to 18,446,744,073,709,551,615
C++ ISO/ANSI International Standard Unix, Windows,
16/32-bit system
long 4 [14]
(minimum require­ment 4)
−2,147,483,648 to +2,147,483,647
0 to 4,294,967,295
(minimum require­ment)
C++/CLI International Standard
ECMA-372
Unix, Windows,
16/32-bit systems
long 4 [15]
(minimum require­ment 4)
−2,147,483,648 to +2,147,483,647
0 to 4,294,967,295
(minimum require­ment)
VB Company Standard Windows Long 4 [16] −2,147,483,648 to +2,147,483,647
VBA Company Standard Windows, Mac OS X Long 4[17] −2,147,483,648 to +2,147,483,647
SQL Server Company Standard Windows BigInt 8 −9,223,372,036,854,775,808 to +9,223,372,036,854,775,807 0 to 18,446,744,073,709,551,615
C#/ VB.NET ECMA International Standard Microsoft .NET long or Int64 8 −9,223,372,036,854,775,808 to +9,223,372,036,854,775,807 0 to 18,446,744,073,709,551,615
Java International/Company Standard Java platform long 8 −9,223,372,036,854,775,808 to +9,223,372,036,854,775,807
Pascal ? Windows, UNIX int64 8 −9,223,372,036,854,775,808 to +9,223,372,036,854,775,807 0 to 18,446,744,073,709,551,615 (Qword type)

Long long

[edit]

In the C99 version of the C programming language and the C++11 version of C++, a long long type is supported that has double the minimum capacity of the standard long. This type is not supported by compilers that require C code to be compliant with the previous C++ standard, C++03, because the long long type did not exist in C++03. For an ANSI/ISO compliant compiler, the minimum requirements for the specified ranges, that is, −(263 − 1)[12] to 263 − 1 for signed and 0 to 264 − 1 for unsigned,[13] must be fulfilled; however, extending this range is permitted.[18][19] This can be an issue when exchanging code and data between platforms, or doing direct hardware access. Thus, there are several sets of headers providing platform independent exact width types. The C standard library provides stdint.h; this was introduced in C99 and C++11.

Syntax

[edit]

Integer literals can be written as regular Arabic numerals, consisting of a sequence of digits and with negation indicated by a minus sign before the value. However, most programming languages disallow use of commas or spaces for digit grouping. Examples of integer literals are:

  • 42
  • 10000
  • -233000

There are several alternate methods for writing integer literals in many programming languages:

  • Many programming languages, especially those influenced by C, prefix an integer literal with 0X or 0x to represent a hexadecimal value, e.g. 0xDEADBEEF. Other languages may use a different notation, e.g. some assembly languages append an H or h to the end of a hexadecimal value.
  • Perl, Ruby, Java, Julia, D, Go, C#, Rust, Python (starting from version 3.6), and PHP (from version 7.4.0 onwards[20]) allow embedded underscores for clarity, e.g. 10_000_000, and fixed-form Fortran ignores embedded spaces in integer literals. C (starting from C23) and C++ use single quotes for this purpose.
  • In C and C++, a leading zero indicates an octal value, e.g. 0755. This was primarily intended to be used with Unix modes; however, it has been criticized because normal integers may also lead with zero.[21] As such, Python, Ruby, Haskell, and OCaml prefix octal values with 0O or 0o, following the layout used by hexadecimal values.
  • Several languages, including Java, C#, Scala, Python, Ruby, OCaml, C (starting from C23) and C++ can represent binary values by prefixing a number with 0B or 0b.

Extreme values

[edit]

In many programming languages, there exist predefined constants representing the least and the greatest values representable with a given integer type.

Names for these include

See also

[edit]

Notes

[edit]

References

[edit]
Revisions and contributorsEdit on WikipediaRead on Wikipedia
from Grokipedia
In computer science, an integer is a fundamental data type that represents whole numbers without fractional parts, typically encompassing positive, negative, and zero values within a defined range determined by the number of bits allocated for storage. These values are encoded in binary format using a fixed bit length, such as 8, 16, 32, or 64 bits, allowing computers to perform arithmetic operations efficiently on them. Integers can be classified as signed or unsigned, with signed variants supporting both positive and negative numbers through dedicated representation schemes, while unsigned types are limited to non-negative values to maximize the positive range. The most prevalent method for signed integers is two's complement representation, where the most significant bit serves as the sign bit (0 for positive or zero, 1 for negative), and negative values are derived by inverting all bits of the corresponding positive magnitude and adding 1. This approach eliminates ambiguities like dual representations of zero found in alternatives such as sign-magnitude or one's complement, and it simplifies hardware-level addition and subtraction by treating signed and unsigned operations similarly. Common integer sizes vary by programming language and architecture but follow standard conventions, as exemplified in languages like C++ where the int type is typically 32 bits, supporting signed ranges from -2,147,483,648 to +2,147,483,647. For broader portability, extended types like long long use 64 bits, extending the signed range to approximately -9.22 × 10¹⁸ to +9.22 × 10¹⁸. Overflow occurs when operations exceed these bounds, potentially wrapping around in two's complement systems—for instance, adding 1 to the maximum 8-bit signed value of 127 yields -128—necessitating careful handling in software to avoid undefined behavior. Byte order, or endianness, also influences multi-byte integer storage, with big-endian placing the most significant byte first and little-endian the reverse, affecting data interchange across systems.

Fundamentals

Definition and Purpose

In , an is a fundamental that represents discrete whole numbers, encompassing positive, negative, and zero values without any fractional or components. This type is designed to model exact counts and quantities in computational processes, forming a core building block for programming languages and hardware implementations. Typically, integers are encoded using binary (base-2) representation to align with the binary nature of digital systems, enabling efficient storage and manipulation on processors. The primary purpose of integers lies in their utility for precise, error-free numerical tasks that demand exactness, such as indexing elements in arrays, incrementing loop counters in iterative algorithms, addressing locations in , and performing arithmetic operations in routines. For instance, in array-based structures, integers serve as indices to access specific elements reliably, while in mechanisms like for-loops, they track iteration counts to ensure predictable execution. In , unsigned integers often represent byte addresses, allowing programs to navigate vast address spaces without ambiguity. These applications leverage integers' inherent exactness for counting occurrences—such as tallying frequencies in datasets—or assigning unique identifiers in databases, where fractional precision is irrelevant and could introduce complications. Unlike floating-point types, which approximate real numbers through binary fractions and thus incur rounding errors in storage and operations, integers maintain absolute precision within their defined range, avoiding any representational inaccuracies during , , or of supported values. This distinction ensures that integer-based computations yield deterministic results, critical for applications requiring reliability, such as financial tallies or cryptographic hashing, where even minor deviations could propagate significant issues. The Goldberg analysis underscores that while suits continuous approximations like scientific simulations, integers excel in discrete, verifiable calculations inherent to algorithmic logic.

Historical Context

The handling of integers in early computers began with the , completed in 1945, which represented numbers in a 10-digit decimal format using ring counters and vacuum tubes for arithmetic operations. This decimal approach facilitated human-readable input and output but proved cumbersome for electronic implementation. In contrast, John von Neumann's 1945 report on the architecture proposed a shift to binary representation for all data, including integers, enabling more efficient stored-program computing with 27-bit binary numbers as standard for precision in calculations. A key advancement in the 1950s was the development and adoption of representation for signed integers, which simplified hardware arithmetic by allowing through and eliminating the need for separate positive and negative zero values. This method became widespread due to its efficiency in binary systems. By the mid-1960s, the family further standardized integer handling by adopting an 8-bit byte and 32-bit word architecture across compatible models, establishing de facto industry norms for data sizes and enabling modular, interchangeable computing components. The standard, ratified in 1989, formalized integer types in programming by defining the basic int type as at least 16 bits wide, accommodating the growing prevalence of 32-bit systems while ensuring portability. Meanwhile, the standard for binary floating-point arithmetic, published in 1985, indirectly shaped handling by specifying exact conversions between integers and floating-point formats, promoting consistent behavior in mixed-precision computations. This evolution from fixed-size representations in 1950s machines culminated in modern languages like Python 3.0 in 2008, which unified integers to support arbitrary precision seamlessly, allowing handling of numbers beyond hardware limits without explicit programmer intervention.

Representation

Binary Encoding

In , integers are encoded in , representing each value as a of bits—binary digits that are either 0 or 1—where the position of each bit corresponds to a power of 2, starting from the least significant bit (LSB) at position 0 with value 20=12^0 = 1. For instance, the integer 5 is encoded as 101 in binary, equivalent to 1×22+0×21+1×20=4+0+1=51 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 = 4 + 0 + 1 = 5. The number of bits allocated to an integer determines its representational capacity: with nn bits, exactly 2n2^n unique non-negative values can be distinguished, ranging from 0 to 2n12^n - 1. This binary encoding leverages the efficiency of digital hardware, where bits map directly to electrical states in memory cells. To convert a non-negative integer to binary, the standard involves repeated division by 2, recording the (0 or 1) at each step; the binary representation is then read from the remainders in reverse order, from the last (LSB) to the first (most significant bit, MSB). For example, converting 13: 13 ÷ 2 = 6 1, 6 ÷ 2 = 3 0, 3 ÷ 2 = 1 1, 1 ÷ 2 = 0 1, yielding 1101 (i.e., 1×23+1×22+0×21+1×20=8+4+0+1=131 \times 2^3 + 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 = 8 + 4 + 0 + 1 = 13). When storing multi-byte integers in memory, the order of bytes affects interpretation: in big-endian format, the MSB byte is stored at the lowest , while in little-endian, the LSB byte is at the lowest address. For the 16-bit value 0x1234 (decimal 4660), big-endian storage places 0x12 at address 0 and 0x34 at address 1; little-endian reverses this to 0x34 at address 0 and 0x12 at address 1. This convention ensures consistent data portability across systems, with little-endian common in x86 architectures and big-endian in network protocols like TCP/IP. For performance reasons, integers are typically aligned in at boundaries that are multiples of their (e.g., a 4-byte integer at addresses divisible by 4), often through bytes, to enable efficient access by the processor's word-sized operations and avoid multiple memory fetches. Misaligned access can lead to exceptions or slowdowns on certain architectures.

Signed vs. Unsigned Integers

In computer science, integers are classified as signed or unsigned based on their ability to represent negative values. Unsigned integers are restricted to non-negative values, utilizing all available bits to encode the magnitude, which maximizes the positive range for a given bit width. For an n-bit unsigned integer, the representable values range from $0toto2^n - 1$; for example, an 8-bit unsigned integer (often denoted as uint8_t in languages like C) spans 0 to 255. Signed integers, by contrast, incorporate a sign bit, typically the most significant bit, to indicate positive or negative values, thereby reducing the range available for magnitude compared to unsigned integers of the same size. Common representation schemes for signed integers include sign-magnitude, where the sign bit precedes the absolute value encoded in the remaining bits; one's complement, which inverts all bits of the positive counterpart for negatives; and two's complement, which inverts bits and adds one for negatives (detailed further in the dedicated section). These methods generally yield a symmetric range around zero, such as -2^{n-1} to 2^{n-1} - 1 for n bits in two's complement, though sign-magnitude and one's complement introduce dual representations for zero (+0 and -0), complicating arithmetic. Unsigned integers are commonly used in programming for quantities that cannot be negative, such as array indices, memory addresses, counters, or buffer sizes, where the full positive range is beneficial. Signed integers suit scenarios involving potential negatives, like coordinate systems, financial calculations with debits, or temperature measurements. In languages like and C++, this distinction is explicit through type declarations: int represents a signed integer (typically 32 bits, ranging from -2,147,483,648 to 2,147,483,647 on most platforms), while unsigned int uses the same bit width for 0 to 4,294,967,295, with operations behaving differently for overflow—unsigned wraps modularly without error, aiding loop counters but risking subtle bugs if mixed with signed types. The primary trade-offs involve range and interpretation risks: unsigned integers double the maximum positive value at the cost of excluding negatives, and their modular overflow can prevent unexpected results in bounded operations but lead to misinterpretation if code assumes signed semantics, such as treating a large unsigned value as negative when cast. Signed integers enable negative representation essential for general-purpose arithmetic but halve the positive range and introduce sign-related complexities in bitwise operations or comparisons across types.

Two's Complement System

The two's complement system is a method for representing signed integers in binary form, where positive numbers are encoded in their standard binary representation and negative numbers are derived by inverting all bits of the corresponding positive value and adding 1. For example, to represent -5 in 4 bits, first encode +5 as 0101, invert the bits to obtain 1010, and add 1 to yield 1011. This process ensures that the sign bit (the most significant bit) is 0 for positive numbers and 1 for negative numbers, while maintaining compatibility with binary arithmetic hardware. In an n-bit two's complement representation, the possible values range from 2n1-2^{n-1} to 2n112^{n-1} - 1. For instance, an 8-bit system accommodates integers from -128 to 127, providing 256 distinct values but with an asymmetric distribution that favors negative numbers by one value. The negation of a number x in this system is computed as x+1\sim x + 1, where \sim denotes the bitwise complement, and any carry out of the most significant bit is discarded. Zero is uniquely represented as all bits set to 0, avoiding dual representations that complicate comparisons in other signed formats. This representation offers key advantages in hardware implementation and arithmetic operations. Addition and subtraction are unified: to subtract y from x, add the two's complement of y to x and ignore any overflow in the sign bit, eliminating the need for dedicated subtraction circuitry. Negation requires only a bitwise invert and increment, without special hardware beyond standard adders, which simplifies and reduces costs. These properties make two's complement efficient for general-purpose computing, as the same binary adder handles both signed and unsigned operations with minimal adjustments. However, the system has notable drawbacks. The asymmetric range means the largest positive value is one less than the magnitude of the smallest negative value, potentially limiting applications requiring balanced positive extents, such as certain tasks. Additionally, while zero's unique representation aids equality checks, the overall scheme is susceptible to overflow (e.g., 127 + 1 yielding -128 in 8 bits) and underflow, which can produce incorrect results if not detected. Two's complement was first proposed by John von Neumann in 1945 for the EDVAC and has since become the dominant format in most modern processor architectures, including the PDP-11 minicomputer introduced in 1970, x86, , and DSPs from manufacturers like and NXP.

Data Types and Sizes

Byte and Octet

A byte is a fundamental unit of digital information in , consisting of eight bits, which allows it to represent 256 distinct values. In unsigned form, a byte can store values from 0 to 255, while in signed form using representation, it ranges from -128 to 127. This 8-bit structure makes the byte the smallest addressable unit in most modern memory systems, serving as the building block for larger data types and storage. In networking contexts, the term octet is used interchangeably with byte to denote exactly eight bits, but it specifically emphasizes an unsigned interpretation to avoid ambiguities arising from varying byte sizes in older systems. Protocols such as the (IP) treat octets as unsigned 8-bit fields for transmitting data packets, ensuring consistent handling across diverse hardware. For instance, IP addresses are structured as sequences of four octets, facilitating reliable data exchange in networks. Bytes and octets find widespread application in , where the American Standard Code for Information Interchange (ASCII) utilizes seven bits within an 8-bit byte to represent 128 printable and control characters, with the eighth bit often reserved for parity or extended sets. They are also employed for small counters in software, such as loop indices or flags limited to 256 states, and in image processing, where individual intensities in images are stored as unsigned byte values from 0 (black) to 255 (white). Historically, the 8-bit byte emerged in the with IBM's , an encoding scheme designed for mainframe systems that required eight bits to accommodate a full set of characters including lowercase letters and symbols. This size was later standardized in the operating system interface, where the char type is defined as a single byte capable of holding 256 values in the POSIX locale. In programming languages like C, bytes are implemented via the char type, but variations exist between signed char (ranging -128 to 127) and unsigned char (0 to 255), with plain char potentially interpreted as either depending on the compiler implementation, leading to signedness ambiguity in portable code. Programmers must explicitly choose signed or unsigned variants when using bytes for numeric purposes to ensure predictable behavior across platforms.

Word and Half-Word

In , a word represents the natural unit of processed by a processor, typically corresponding to the width of its registers and paths. This size determines the efficiency of operations such as arithmetic, access, and instruction execution. Historically, word sizes varied based on hardware design; for instance, the from 1960 used an 18-bit word, allowing for 4096 words of core as standard. Similarly, the 1103 scientific computer, introduced in 1953, employed a 36-bit word size to handle complex numerical computations, with capacities up to 12,288 words. Over time, word sizes evolved to accommodate growing demands for larger address spaces and computational power. Early systems like the and 1103 reflected the 1950s and 1960s era of diverse architectures, but by the 1970s and 1980s, 32-bit words became prevalent in microprocessors for balancing performance and cost. The shift to 64-bit words accelerated in the early , driven by the need for addressing vast memory spaces beyond the 4 GB limit of 32-bit systems; AMD's architecture, introduced in 2003, extended the x86 instruction set to 64-bit registers and operations, enabling modern systems to handle up to 2^64 bytes of . Today, 32-bit and 64-bit words dominate, with the latter standard in desktop and server processors like for efficient register operations and memory addressing. For a 32-bit signed word using representation—the predominant method for signed integers—the range spans from -2^{31} to 2^{31} - 1, or -2147483648 to 2147483647. A half-word, defined as half the size of a full word in many architectures, consists of 16 bits (2 bytes) and is commonly used in embedded systems and for smaller data units. In signed , a 16-bit half-word ranges from -32768 to 32767. Half-words facilitate operations on compact datasets, such as in architectures where instructions load or store 16-bit values into register portions. Words and half-words are integral to memory addressing and register manipulations, where the processor's native word size optimizes transfer; in systems, 64-bit words align with the general-purpose registers (e.g., RAX) for seamless 64-bit integer handling. To enhance performance, data structures involving words are often aligned to multiples of their size in . This alignment prevents data from straddling cache line boundaries—typically 64 bytes—reducing the number of fetches required and minimizing access latency, as misaligned accesses can demand two cache line loads instead of one.

Standard Integer Variants (Short, Int, Long)

In computer science, standard integer variants such as short, int, and long provide portable abstractions for signed and unsigned integers in programming languages, with sizes defined by minimum requirements rather than fixed widths in most cases. These types are typically implemented as multiples of the 8-bit byte, allowing for efficient storage and computation on diverse hardware. The short integer type is typically 16 bits (2 bytes) wide, offering a signed range from -32,768 to 32,767, and is often used to conserve memory in large arrays or when smaller values suffice. In contrast, the int type serves as the default for general-purpose integers, with a minimum size of 16 bits but commonly 32 bits (4 bytes) on modern 32- and 64-bit systems, providing a signed range up to approximately ±2 billion. The long type extends this further, requiring at least 32 bits and typically 32 or 64 bits depending on the platform, while C and C++ also support long long as a 64-bit extension introduced in C99 for wider ranges. Standardization efforts address variability through fixed-width types. The standard, via the <stdint.h> header, defines exact-width types like int16_t (precisely 16 bits, signed range -32,768 to 32,767), int32_t (32 bits, signed range -2,147,483,648 to 2,147,483,647), and int64_t (64 bits), which implementations provide when supported without padding bits. In , these variants are rigidly fixed: short is always 16 bits (-32,768 to 32,767), int is 32 bits (-2,147,483,648 to 2,147,483,647), and long is 64 bits (-9,223,372,036,854,775,808 to 9,223,372,036,854,775,807), ensuring platform independence as specified in the Java Language Specification. Portability challenges arise from platform-dependent sizes in languages like C, where int was historically 16 bits on systems such as 16-bit DOS compilers and the PDP-11, but 64 bits on Cray supercomputers, potentially leading to overflow or incorrect assumptions in cross-platform code.
TypeTypical Size (bits)Signed Range (example for typical size)Standardization Reference
short16-32,768 to 32,767C99 <stdint.h> int16_t
int32-2,147,483,648 to 2,147,483,647C99 minimum 16 bits; Java fixed 32 bits
long32 or 64Varies; up to ±9 quintillion for 64 bitsC99 minimum 32 bits; Java fixed 64 bits

Operations and Syntax

Arithmetic and Bitwise Operations

Integer arithmetic operations in computer science primarily involve addition, subtraction, multiplication, and division on binary representations of signed and unsigned integers. In systems using two's complement for signed integers, addition and subtraction are performed uniformly by treating both as binary addition, leveraging the representation's property that negation is achieved by inverting bits and adding one. For example, adding 5 (binary 00000101 in 8 bits) and -3 (two's complement of 3: 11111101) yields 00000101 + 11111101 = 00000010 (2), with the carry-out from the sign bit discarded. This approach simplifies hardware design, as the arithmetic logic unit (ALU) uses the same adder circuit for both operations, generating flags for carry (from the most significant bit) and overflow (when signs mismatch but result sign differs). Multiplication of integers can be implemented via repeated for general cases, but for multipliers that are powers of two, it reduces to a left shift operation, which is highly efficient in hardware. Division follows a similar pattern: for divisors that are powers of two, a right shift suffices, while general division employs algorithms like or restoring/non-restoring methods, often approximated through reciprocal in to avoid dedicated divider hardware. For instance, dividing by 10 can be approximated as multiplying by a fixed reciprocal (e.g., 0.8 approximated in binary) followed by a shift, with error correction for precision. Modern processors include dedicated multipliers and dividers, but shift-based techniques remain common in embedded systems without full hardware support. Bitwise operations manipulate individual bits of integer representations and include logical AND (&), OR (|), XOR (^), and shifts (left <<, right >>). The AND operation sets a bit to 1 only if both operands have 1 in that position, useful for masking (e.g., x & 0xFF extracts the lowest 8 bits). OR sets a bit to 1 if either operand has 1, enabling bit packing for flags or permissions (e.g., combining multiple states into one integer). XOR toggles bits where operands differ, often for bit flipping or simple encryption. Shift operations reposition bits: a left shift by kk bits multiplies the integer by 2k2^k for unsigned values (or signed if no overflow), filling low bits with zeros, as in xk=x×2kx \ll k = x \times 2^k. A right shift by kk bits divides by 2k2^k, but behavior differs by type: logical right shift (unsigned) fills high bits with zeros, while arithmetic right shift (signed, in ) preserves the by filling with copies of it, ensuring xk=x/2kx \gg k = \lfloor x / 2^k \rfloor for negative values. These operations are fundamental for tasks like aligning data or implementing arithmetic on low-level hardware. In low-level code, bitwise operations are typically faster than equivalent arithmetic ones on simple processors, as they require fewer ALU cycles—often a single instruction versus multiple for or division—due to direct bit-level hardware support. This efficiency makes them preferable for bit packing in flags or permissions, where arithmetic alternatives would introduce unnecessary overhead. Overflow may occur in arithmetic operations but is handled separately in dedicated sections.

Syntax in Programming Languages

In statically typed languages such as C and Java, integer variables must be explicitly declared with their type prior to use, specifying the intended size and signedness. For instance, in C, the declaration syntax is int x = 42;, where int typically denotes a 32-bit signed integer and 42 is the initial decimal value. In Java, the equivalent syntax is int x = 42;, defining a primitive 32-bit signed integer with the same initialization. By contrast, Python employs dynamic typing, inferring the integer type from the assignment without explicit declaration, as in x = 42, which creates an arbitrary-precision integer object. Integer literals represent constant values and support multiple bases for flexibility in expression. In C, decimal literals appear as plain digits (e.g., 42), octal as a leading zero followed by digits (e.g., 052 equaling 42 in ), and hexadecimal as 0x or 0X prefix with digits or letters A-F (e.g., 0x2A for 42). Optional suffixes specify variants, such as L or l for long (e.g., 42L) or U or u for unsigned (e.g., 42U). Java mirrors this closely, with decimal 42, octal 052, hexadecimal 0x2A, and long suffix L (e.g., 42L); binary literals use 0b or 0B prefix since Java SE 7 (e.g., 0b101010 for 42). Python uses decimal 42, hexadecimal 0x2A, octal 0o52 (note the o prefix), and binary 0b101010, with no suffixes required due to . Operations on integers employ consistent across languages for arithmetic and bitwise manipulation. Arithmetic operators include (+), (-), (*), division (/), and modulus (%), as in the C or expression int result = x + y; or Python's result = x + y. Bitwise operators encompass (|), XOR (^), NOT (~), left shift (<<), and right shift (>>), exemplified by x & y in all three languages to perform bit-level AND. These operators apply directly to integer operands, with precedence rules dictating evaluation order (e.g., * before +). Conversions between integer types or from other types to integers involve explicit syntax to ensure type safety. In C, explicit casting uses parenthesized type names, such as (int)3.14 to convert a floating-point value to the integer 3 by truncation; implicit promotion automatically widens narrower types like short to int in expressions. Java mandates explicit casting for narrowing (e.g., (int) someLong to convert a 64-bit long to 32-bit int, potentially losing data), while allowing implicit widening from int to long. Python relies on constructor functions for conversions, like int(3.14) yielding 3, without a casting operator; implicit handling occurs in mixed-type operations via promotion. Language differences in integer syntax stem primarily from typing paradigms and runtime behaviors. C and Java's static typing requires compile-time type declarations and checks, enforcing explicit syntax for declarations and casts to prevent mismatches, whereas Python's dynamic typing omits declarations and performs type resolution at runtime for more concise code. In terms of overflow during operations, C treats signed integer overflow as undefined behavior (often wrapping in implementations), Java's primitive integers wrap silently around the type's range, and Python avoids overflow entirely through automatic extension to arbitrary precision.

Limits and Implementation

Numerical Ranges and Overflow

Integer types in computer science have finite numerical ranges determined by their bit width, which directly influences the values they can represent without error. For an unsigned n-bit integer, the range spans from 0 to 2n12^n - 1, allowing representation of all non-negative values up to that maximum. This formulation arises because each bit position contributes a power-of-two value, exhausting all possible combinations for non-negative integers. In contrast, signed n-bit integers, typically using the representation, accommodate both positive and negative values with a range from 2n1-2^{n-1} to 2n112^{n-1} - 1. The negative bound includes one more value than the positive side due to the asymmetric allocation of the , ensuring a balanced set of representable integers around zero. When arithmetic operations exceed these bounds, overflow or underflow occurs, potentially leading to incorrect results or vulnerabilities. Overflow happens in or when the result surpasses the maximum representable value, while underflow arises in or when dipping below the minimum. For unsigned integers, overflow triggers wrap-around semantics, performing the operation 2n2^n. For instance, in an 8-bit unsigned integer, 255 + 1 equals 0, as the computation effectively reduces 256. This defined behavior in languages like and ensures predictable outcomes for unsigned types, aligning with principles. Signed overflow, however, often results in in standards such as , where implementations may wrap around but are not required to, potentially causing crashes or exploitable errors. Underflow exhibits analogous wrap-around effects in systems. For an 8-bit signed integer, the minimum value is -128; subtracting 1 from this yields 127, as the operation wraps around the range. This behavior stems from the fixed bit pattern inversion and addition inherent to arithmetic, treating the underflow as an overflow in the positive direction. Like overflow, signed underflow is undefined in many languages, though practical implementations on hardware mirror the wrap-around. Unsigned underflow is impossible since the range starts at zero, with subtraction below zero wrapping to the maximum value. Detection of overflow and underflow is crucial for robust software. At the hardware level, processors like Intel x86 set the (OF) in the EFLAGS register after signed arithmetic instructions if the result's sign bit differs from the expected outcome based on operand signs. The (CF) similarly detects unsigned overflow. Software mechanisms build on this; for example, Rust's provides methods like checked_add on primitives, which return Some(result) on success or None if overflow or underflow is detected, enabling explicit handling without . Historical examples illustrate the consequences of range limits. The Y2K problem, or Year 2000 issue, stemmed from widespread use of two-digit year fields in date representations to save storage space, causing systems to misinterpret 00 as 1900 rather than 2000 and leading to potential rollovers in calculations. This space-saving design choice mirrored overflow risks by confining values to narrow ranges. A comparable modern concern is the , where 32-bit systems using (seconds since -01-01) will overflow at 23112^{31} - 1 seconds, around 03:14:07 UTC on January 19, 2038, potentially causing dates to wrap around to 1970 in unmitigated software. In modern , 64-bit integers significantly mitigate such issues, offering ranges up to approximately 9×10189 \times 10^{18} for signed types, far exceeding typical application needs and reducing overflow likelihood in everyday operations.

Fixed vs. Arbitrary Precision

Fixed-precision integers in are implemented with a predetermined number of bits, typically aligned to the processor's native word size—such as 32 or 64 bits—to leverage hardware instructions for rapid arithmetic and bitwise operations, though this imposes strict limits on the representable range. These types, like the int64_t in C++ or Java's long, provide predictable performance in performance-critical scenarios but cannot handle values exceeding their bit width without overflow or modular wrapping. In contrast, arbitrary-precision integers, also known as bignums, support unbounded sizes through software emulation, allowing representation of integers of any magnitude limited only by available memory. Languages and libraries achieve this by storing numbers as arrays of fixed-size "limbs"—each a machine word (e.g., 32 or 64 bits)—representing the value in base 2322^{32} or 2642^{64}, with the full integer formed by concatenating these limbs in little-endian order. For instance, 's BigInteger class uses an array of bytes for two's-complement storage, while the GNU Multiple Precision Arithmetic Library (GMP) employs mp_limb_t arrays for signed integers. Python's built-in int type defaults to arbitrary precision, seamlessly transitioning from small fixed-width representations to multi-limb arrays as values grow. Core operations on arbitrary-precision integers, such as , rely on algorithms optimized for large inputs to mitigate the quadratic of naive methods. For numbers beyond a threshold (e.g., a few hundred bits), libraries like GMP invoke the , a divide-and-conquer technique that multiplies two nn-bit numbers using three multiplications of n/2n/2-bit numbers, achieving O(n1.585)O(n^{1.585}) complexity and reducing overhead for medium-sized operands. This approach incurs significant space overhead from the limb arrays and time penalties from software loops, often 10-100 times slower than hardware-accelerated fixed-precision operations, though assembly-optimized implementations in GMP minimize this gap. Arbitrary-precision integers are essential in domains requiring exact computation of enormous values, such as —where RSA keys often exceed 2048 bits, necessitating on numbers far beyond fixed limits—and symbolic , as in systems for manipulating polynomials or factorials like 1000!1000! without approximation errors. In these contexts, libraries like GMP underpin cryptographic toolkits and symbolic engines, ensuring precision at the cost of efficiency. The primary trade-off favors fixed-precision for high-throughput applications, such as numerical simulations or loop-intensive where bounded ranges suffice and hardware speed is paramount, while arbitrary-precision suits infrequent but exact large-integer tasks, accepting runtime and overhead for unbounded —e.g., GMP operations on 1024-bit numbers may take microseconds versus nanoseconds for 64-bit natives.

References

Add your contribution
Related Hubs
Contribute something
User Avatar
No comments yet.