Integer (computer science)
In computer science, an integer is a datum of integral data type, a data type that represents some range of mathematical integers.[1] Integral data types may be of different sizes and may or may not be allowed to contain negative values. Integers are commonly represented in a computer as a group of binary digits (bits). The size of the grouping varies, so the set of integer sizes available varies between different types of computers. Computer hardware nearly always provides a way to represent a processor register or memory address as an integer.
Value and representation
The value of an item with an integral type is the mathematical integer that it corresponds to. Integral types may be unsigned (capable of representing only non-negative integers) or signed (capable of representing negative integers as well).[2]
An integer value is typically specified in the source code of a program as a sequence of digits optionally prefixed with + or −. Some programming languages allow other notations, such as hexadecimal (base 16) or octal (base 8). Some programming languages also permit digit group separators.[3]
The internal representation of this datum is the way the value is stored in the computer's memory. Unlike mathematical integers, a typical datum in a computer has some minimum and maximum possible value.
The most common representation of a positive integer is a string of bits, using the binary numeral system. The order of the memory bytes storing the bits varies; see endianness. The width, precision, or bitness[4] of an integral type is the number of bits in its representation. An integral type with n bits can encode 2^n numbers; for example an unsigned type typically represents the non-negative values 0 through 2^n − 1. Other encodings of integer values to bit patterns are sometimes used, for example binary-coded decimal or Gray code, or as printed character codes such as ASCII.
There are four well-known ways to represent signed numbers in a binary computing system. The most common is two's complement, which allows a signed integral type with n bits to represent numbers from −2^(n−1) through 2^(n−1) − 1. Two's complement arithmetic is convenient because there is a perfect one-to-one correspondence between representations and values (in particular, no separate +0 and −0), and because addition, subtraction and multiplication do not need to distinguish between signed and unsigned types. Other possibilities include offset binary, sign-magnitude, and ones' complement.
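A minimal C sketch of this duality, assuming the exact-width types from C99's stdint.h (which are required to use two's complement): the same 8-bit pattern 0xFB reads as 251 when unsigned and as −5 when signed.

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint8_t raw = 0xFB;               /* bit pattern 1111 1011 */
    int8_t as_signed = (int8_t)raw;   /* reinterpret under two's complement;
                                         out-of-range conversion is
                                         implementation-defined in C but wraps
                                         on mainstream compilers */

    printf("unsigned reading: %u\n", (unsigned)raw);   /* 251 */
    printf("signed reading:   %d\n", (int)as_signed);  /* -5  */

    /* An 8-bit two's-complement type covers -128..127; there is a single
       zero, so all 256 bit patterns map to distinct values. */
    return 0;
}
```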
Some computer languages define integer sizes in a machine-independent way; others have varying definitions depending on the underlying processor word size. Not all language implementations define variables of all integer sizes, and defined sizes may not even be distinct in a particular implementation. An integer in one programming language may be a different size in a different language, on a different processor, or in an execution context of different bitness; see § Words.
Some older computer architectures used decimal representations of integers, stored in binary-coded decimal (BCD) or other format. These values generally require data sizes of 4 bits per decimal digit (sometimes called a nibble), usually with additional bits for a sign. Many modern CPUs provide limited support for decimal integers as an extended datatype, providing instructions for converting such values to and from binary values. Depending on the architecture, decimal integers may have fixed sizes (e.g., 7 decimal digits plus a sign fit into a 32-bit word), or may be variable-length (up to some maximum digit size), typically occupying two digits per byte (octet).
Common integral data types
| Bits | Name | Range (assuming two's complement for signed) | Decimal digits | Uses | C/C++ | C# | Pascal and Delphi | Java | SQL[a] | FORTRAN | D | Rust |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 4 | nibble, semioctet | Signed: From −8 to 7, from −(2^3) to 2^3 − 1 | 0.9 | Binary-coded decimal, single decimal digit representation | — | | | | | | | |
| | | Unsigned: From 0 to 15, which equals 2^4 − 1 | 1.2 | | | | | | | | | |
| 8 | byte, octet, i8, u8 | Signed: From −128 to 127, from −(2^7) to 2^7 − 1 | 2.11 | ASCII characters, code units in the UTF-8 character encoding | int8_t, signed char[b] | sbyte | Shortint | byte | tinyint | INTEGER[c] | byte | i8 |
| | | Unsigned: From 0 to 255, which equals 2^8 − 1 | 2.41 | | uint8_t, unsigned char[b] | byte | Byte | — | unsigned tinyint | — | ubyte | u8 |
| 16 | halfword, word, short, i16, u16 | Signed: From −32,768 to 32,767, from −(2^15) to 2^15 − 1 | 4.52 | UCS-2 characters, code units in the UTF-16 character encoding | int16_t, short,[b] int[b] | short | Smallint | short | smallint | INTEGER[c] | short | i16 |
| | | Unsigned: From 0 to 65,535, which equals 2^16 − 1 | 4.82 | | uint16_t, unsigned,[b] unsigned int[b] | ushort | Word | char[d] | unsigned smallint | — | ushort | u16 |
| 32 | word, long, doubleword, longword, int, i32, u32 | Signed: From −2,147,483,648 to 2,147,483,647, from −(2^31) to 2^31 − 1 | 9.33 | UTF-32 characters, true color with alpha, FourCC, pointers in 32-bit computing | int32_t, int,[b] long[b] | int | LongInt; Integer[e] | int | int | INTEGER[c] | int | i32 |
| | | Unsigned: From 0 to 4,294,967,295, which equals 2^32 − 1 | 9.63 | | uint32_t, unsigned,[b] unsigned int,[b] unsigned long[b] | uint | LongWord; DWord; Cardinal[e] | — | unsigned int | — | uint | u32 |
| 64 | word, doubleword, longword, long, long long, quad, quadword, qword, int64, i64, u64 | Signed: From −(2^63) to 2^63 − 1 | 18.96 | Time (e.g. milliseconds since the Unix epoch), pointers in 64-bit computing | int64_t, long,[b] long long[b] | long | Int64 | long | bigint | INTEGER[c] | long | i64 |
| | | Unsigned: From 0 to 2^64 − 1 | 19.27 | | uint64_t, unsigned long long[b] | ulong | UInt64; QWord | — | unsigned bigint | — | ulong | u64 |
| 128 | octaword, double quadword, i128, u128 | Signed: From −(2^127) to 2^127 − 1 | 38.23 | Complex scientific calculations | Only available as non-standard or compiler-specific extensions | | | | | | cent[f] | i128 |
| | | Unsigned: From 0 to 2^128 − 1 | 38.53 | | | | | | | | ucent[f] | u128 |
| n | n-bit integer (general case) | Signed: −(2^(n−1)) to 2^(n−1) − 1 | (n − 1) log10 2 | | C23: _BitInt(n), signed _BitInt(n); Ada: range -2**(n-1)..2**(n-1)-1 | | | | | | | |
| | | Unsigned: 0 to 2^n − 1 | n log10 2 | | C23: unsigned _BitInt(n); Ada: range 0..2**n-1, mod 2**n; standard libraries' or third-party arbitrary arithmetic libraries' BigDecimal or Decimal classes in many languages such as Python, C++, etc. | | | | | | | |
Different CPUs support different integral data types. Typically, hardware will support both signed and unsigned types, but only a small, fixed set of widths.
The table above lists integral type widths that are supported in hardware by common processors. High-level programming languages provide more possibilities. It is common to have a 'double width' integral type that has twice as many bits as the biggest hardware-supported type. Many languages also have bit-field types (a specified number of bits, usually constrained to be less than the maximum hardware-supported width) and range types (that can represent only the integers in a specified range).
Some languages, such as Lisp, Smalltalk, REXX, Haskell, Python, and Raku, support arbitrary precision integers (also known as infinite precision integers or bignums). Other languages that do not support this concept as a top-level construct may have libraries available to represent very large numbers using arrays of smaller variables, such as Java's java.math.BigInteger class or Perl's "bigint" package.[7] These use as much of the computer's memory as is necessary to store the numbers; however, a computer has only a finite amount of storage, so they, too, can only represent a finite subset of the mathematical integers. These schemes support very large numbers; for example one kilobyte of memory could be used to store numbers up to 2466 decimal digits long.
A Boolean type is a type that can represent only two values: 0 and 1, usually identified with false and true respectively. This type can be stored in memory using a single bit, but is often given a full byte for convenience of addressing and speed of access.
A four-bit quantity is known as a nibble (when eating, being smaller than a bite) or nybble (being a pun on the form of the word byte). One nibble corresponds to one digit in hexadecimal and holds one digit or a sign code in binary-coded decimal.
Bytes and octets
The term byte initially meant 'the smallest addressable unit of memory'. In the past, 5-, 6-, 7-, 8-, and 9-bit bytes have all been used. There have also been computers that could address individual bits ('bit-addressed machine'), or that could only address 16- or 32-bit quantities ('word-addressed machine'). The term byte was usually not used at all in connection with bit- and word-addressed machines.
The term octet always refers to an 8-bit quantity. It is mostly used in the field of computer networking, where computers with different byte widths might have to communicate.
In modern usage byte almost invariably means eight bits, since all other sizes have fallen into disuse; thus byte has come to be synonymous with octet.
Words
The term 'word' is used for a small group of bits that are handled simultaneously by processors of a particular architecture. The size of a word is thus CPU-specific. Many different word sizes have been used, including 6-, 8-, 12-, 16-, 18-, 24-, 32-, 36-, 39-, 40-, 48-, 60-, and 64-bit. Since it is architectural, the size of a word is usually set by the first CPU in a family, rather than the characteristics of a later compatible CPU. The meanings of terms derived from word, such as longword, doubleword, quadword, and halfword, also vary with the CPU and OS.[8]
Practically all new desktop processors are capable of using 64-bit words, though embedded processors with 8- and 16-bit word size are still common. The 36-bit word length was common in the early days of computers.
One important cause of non-portability of software is the incorrect assumption that all computers have the same word size as the computer used by the programmer. For example, if a programmer using the C language incorrectly declares as int a variable that will be used to store values greater than 2^15 − 1, the program will fail on computers with 16-bit integers. That variable should have been declared as long, which has at least 32 bits on any computer. Programmers may also incorrectly assume that a pointer can be converted to an integer without loss of information, which may work on (some) 32-bit computers, but fail on 64-bit computers with 64-bit pointers and 32-bit integers. This issue is resolved by C99 in stdint.h in the form of intptr_t.
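The following C sketch illustrates the portable idiom described here, using exact-width types and intptr_t from C99's stdint.h; the printed sizes are platform-dependent and the program is illustrative rather than exhaustive.

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    long counter = 40000L;      /* at least 32 bits on any conforming compiler */
    int32_t exact32 = 40000;    /* exactly 32 bits where the type is provided  */

    int value = 7;
    intptr_t addr = (intptr_t)&value;  /* integer wide enough to hold a pointer */
    int *back = (int *)addr;           /* round-trips without losing information */

    printf("sizeof(int)=%zu sizeof(long)=%zu sizeof(void*)=%zu\n",
           sizeof(int), sizeof(long), sizeof(void *));
    printf("round-tripped value: %d\n", *back);
    return (counter == exact32) ? 0 : 1;
}
```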
The bitness of a program may refer to the word size (or bitness) of the processor on which it runs, or it may refer to the width of a memory address or pointer, which can differ between execution modes or contexts. For example, 64-bit versions of Microsoft Windows support existing 32-bit binaries, and programs compiled for Linux's x32 ABI run in 64-bit mode yet use 32-bit memory addresses.[9]
Standard integer
The standard integer size is platform-dependent.
In C, it is denoted by int and required to be at least 16 bits. Windows and Unix systems have 32-bit ints on both 32-bit and 64-bit architectures.
Short integer
A short integer can represent a whole number that may take less storage, while having a smaller range, compared with a standard integer on the same machine.
In C, it is denoted by short. It is required to be at least 16 bits, and is often smaller than a standard integer, but this is not required.[10][11] A conforming program can assume that it can safely store values between −(2^15 − 1)[12] and 2^15 − 1,[13] but it may not assume that the range is not larger. In Java, a short is always a 16-bit integer. In the Windows API, the datatype SHORT is defined as a 16-bit signed integer on all machines.[8]
| Programming language | Data type name | Signedness | Size in bytes | Minimum value | Maximum value |
|---|---|---|---|---|---|
| C and C++ | short | signed | 2 | −32,767[g] | +32,767 |
| unsigned short | unsigned | 2 | 0 | 65,535 | |
| C# | short | signed | 2 | −32,768 | +32,767 |
| ushort | unsigned | 2 | 0 | 65,535 | |
| Java | short | signed | 2 | −32,768 | +32,767 |
| SQL | smallint | signed | 2 | −32,768 | +32,767 |
Long integer
A long integer can represent a whole number whose range is greater than or equal to that of a standard integer on the same machine.
In C, it is denoted by long. It is required to be at least 32 bits, and may or may not be larger than a standard integer. A conforming program can assume that it can safely store values between −(2^31 − 1)[12] and 2^31 − 1,[13] but it may not assume that the range is not larger.
| Programming language | Approval Type | Platforms | Data type name | Storage in bytes | Signed range | Unsigned range |
|---|---|---|---|---|---|---|
| C ISO/ANSI C99 | International Standard | Unix, 16/32-bit systems[8]; Windows, 16/32/64-bit systems[8] | long | 4 (minimum requirement 4) | −2,147,483,647 to +2,147,483,647 | 0 to 4,294,967,295 (minimum requirement) |
| C ISO/ANSI C99 | International Standard | Unix, 64-bit systems[8][11] | long | 8 (minimum requirement 4) | −9,223,372,036,854,775,807 to +9,223,372,036,854,775,807 | 0 to 18,446,744,073,709,551,615 |
| C++ ISO/ANSI | International Standard | Unix, Windows, 16/32-bit systems | long | 4[14] (minimum requirement 4) | −2,147,483,648 to +2,147,483,647 | 0 to 4,294,967,295 (minimum requirement) |
| C++/CLI | International Standard ECMA-372 | Unix, Windows, 16/32-bit systems | long | 4[15] (minimum requirement 4) | −2,147,483,648 to +2,147,483,647 | 0 to 4,294,967,295 (minimum requirement) |
| VB | Company Standard | Windows | Long | 4 [16] | −2,147,483,648 to +2,147,483,647 | — |
| VBA | Company Standard | Windows, Mac OS X | Long | 4[17] | −2,147,483,648 to +2,147,483,647 | — |
| SQL Server | Company Standard | Windows | BigInt | 8 | −9,223,372,036,854,775,808 to +9,223,372,036,854,775,807 | 0 to 18,446,744,073,709,551,615 |
| C#/ VB.NET | ECMA International Standard | Microsoft .NET | long or Int64 | 8 | −9,223,372,036,854,775,808 to +9,223,372,036,854,775,807 | 0 to 18,446,744,073,709,551,615 |
| Java | International/Company Standard | Java platform | long | 8 | −9,223,372,036,854,775,808 to +9,223,372,036,854,775,807 | — |
| Pascal | ? | Windows, UNIX | int64 | 8 | −9,223,372,036,854,775,808 to +9,223,372,036,854,775,807 | 0 to 18,446,744,073,709,551,615 (Qword type) |
Long long
In the C99 version of the C programming language and the C++11 version of C++, a long long type is supported that has double the minimum capacity of the standard long. This type is not supported by compilers that require C code to be compliant with the previous C++ standard, C++03, because the long long type did not exist in C++03. For an ANSI/ISO compliant compiler, the minimum requirements for the specified ranges, that is, −(2^63 − 1)[12] to 2^63 − 1 for signed and 0 to 2^64 − 1 for unsigned,[13] must be fulfilled; however, extending this range is permitted.[18][19] This can be an issue when exchanging code and data between platforms, or doing direct hardware access. Thus, there are several sets of headers providing platform independent exact width types. The C standard library provides stdint.h; this was introduced in C99 and C++11.
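As a brief illustration, C99's inttypes.h supplies format macros such as PRId64 so that a 64-bit value can be printed without knowing whether long or long long is 64 bits on the target platform; a minimal sketch:

```c
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>

int main(void) {
    long long big = 9223372036854775807LL;      /* minimum guaranteed long long maximum */
    int64_t exact = INT64_C(1234567890123456789); /* exact-width 64-bit constant */

    /* PRId64 expands to the correct conversion specifier for int64_t. */
    printf("largest guaranteed long long: %lld\n", big);
    printf("an exact-width 64-bit value:  %" PRId64 "\n", exact);
    return 0;
}
```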
Syntax
Integer literals can be written as regular Arabic numerals, consisting of a sequence of digits and with negation indicated by a minus sign before the value. However, most programming languages disallow use of commas or spaces for digit grouping. Examples of integer literals are:
42, 10000, -233000
There are several alternate methods for writing integer literals in many programming languages:
- Many programming languages, especially those influenced by C, prefix an integer literal with 0X or 0x to represent a hexadecimal value, e.g. 0xDEADBEEF. Other languages may use a different notation, e.g. some assembly languages append an H or h to the end of a hexadecimal value. (Several of these notations appear in the C sketch after this list.)
- Perl, Ruby, Java, Julia, D, Go, C#, Rust, Python (starting from version 3.6), and PHP (from version 7.4.0 onwards[20]) allow embedded underscores for clarity, e.g. 10_000_000, and fixed-form Fortran ignores embedded spaces in integer literals. C (starting from C23) and C++ use single quotes for this purpose.
- In C and C++, a leading zero indicates an octal value, e.g. 0755. This was primarily intended to be used with Unix modes; however, it has been criticized because normal integers may also lead with zero.[21] As such, Python, Ruby, Haskell, and OCaml prefix octal values with 0O or 0o, following the layout used by hexadecimal values.
- Several languages, including Java, C#, Scala, Python, Ruby, OCaml, C (starting from C23) and C++ can represent binary values by prefixing a number with 0B or 0b.
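A small C sketch of several of these notations; the binary prefix 0b and the digit separator ' are shown only in comments because they require C23 or a compiler extension.

```c
#include <stdio.h>

int main(void) {
    int dec = 42;     /* decimal literal */
    int hex = 0x2A;   /* hexadecimal literal, also 42 */
    int oct = 052;    /* octal literal (leading zero), also 42 */
    /* C23 only (or a compiler extension): binary literal and digit separator.
       int bin = 0b101010;
       long big = 10'000'000L; */
    printf("%d %d %d\n", dec, hex, oct);   /* prints: 42 42 42 */
    return 0;
}
```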
Extreme values
In many programming languages, there exist predefined constants representing the least and the greatest values representable with a given integer type.
Names for these include
- SmallBASIC: MAXINT[22]
- Java: java.lang.Integer.MAX_VALUE, java.lang.Integer.MIN_VALUE[23] (corresponding fields exist for the other integer classes in Java)
- C: INT_MAX, INT_MIN, etc.[24] (see the example after this list)
- C++: std::numeric_limits<int>::max(), std::numeric_limits<int>::min(), etc.[25]
- Haskell: minBound, maxBound[27]
- Pascal: MaxInt
- Python 2: sys.maxint[28]
- Python 3: sys.maxsize[29]
- Rust: i32::MAX, i32::MIN, etc.[30]
- Turing: maxint[31]
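For the C entry above, a minimal example reads these constants from limits.h; the printed values depend on the platform's integer widths.

```c
#include <stdio.h>
#include <limits.h>

int main(void) {
    /* Predefined extreme values for the standard integer types. */
    printf("INT_MIN  = %d\n", INT_MIN);
    printf("INT_MAX  = %d\n", INT_MAX);
    printf("UINT_MAX = %u\n", UINT_MAX);
    printf("LONG_MAX = %ld\n", LONG_MAX);
    return 0;
}
```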
Notes
- ^ Not all SQL dialects have unsigned datatypes.[5][6]
- ^ a b c d e f g h i j k l m n The sizes of char, short, int, long and long long in C/C++ are dependent upon the implementation of the language.
- ^ a b c d Fortran uses 'kinds' to control the size of integers. Parameterized constants defining the available kinds are available in the iso_fortran_env intrinsic module. Constants defining C compatible kinds are available in the iso_c_binding intrinsic module.
- ^ Java does not directly support arithmetic on char types. The results must be cast back into char from an int.
- ^ a b The sizes of Delphi's Integer and Cardinal are not guaranteed, varying from platform to platform; usually defined as LongInt and LongWord respectively.
- ^ a b Reserved for future use. Not implemented yet.
- ^ The ISO C standard allows implementations to reserve the value with sign bit 1 and all other bits 0 (for sign–magnitude and two's complement representation) or with all bits 1 (for ones' complement) for use as a "trap" value, used to indicate (for example) an overflow.[12]
References
[edit]- ^ "Integer". A Dictionary of Computer Science. Oxford University Press. 21 January 2016. ISBN 978-0-19-968897-5.
- ^ Cheever, Eric. "Representation of numbers". Swarthmore College. Retrieved 2011-09-11.
- ^ Madhusudhan Konda (2011-09-02). "A look at Java 7's new features - O'Reilly Radar". Radar.oreilly.com. Retrieved 2013-10-15.
- ^ Barr, Adam (2018-10-23). The Problem with Software: Why Smart Engineers Write Bad Code. MIT Press. ISBN 978-0-262-34821-8.
- ^ "Sybase Adaptive Server Enterprise 15.5: Exact Numeric Datatypes".
- ^ "MySQL 5.6 Numeric Datatypes".
- ^ "BigInteger (Java Platform SE 6)". Oracle. Retrieved 2011-09-11.
- ^ a b c d e Fog, Agner (2010-02-16). "Calling conventions for different C++ compilers and operating systems: Chapter 3, Data Representation" (PDF). Retrieved 2010-08-30.
- ^ Thorsten Leemhuis (2011-09-13). "Kernel Log: x32 ABI gets around 64-bit drawbacks". www.h-online.com. Archived from the original on 28 October 2011. Retrieved 2011-11-01.
- ^ Giguere, Eric (1987-12-18). "The ANSI Standard: A Summary for the C Programmer". Retrieved 2010-09-04.
- ^ a b Meyers, Randy (2000-12-01). "The New C: Integers in C99, Part 1". drdobbs.com. Retrieved 2010-09-04.
- ^ a b c d "ISO/IEC 9899:201x" (PDF). open-std.org. section 6.2.6.2, paragraph 2. Retrieved 2016-06-20.
- ^ a b c "ISO/IEC 9899:201x" (PDF). open-std.org. section 5.2.4.2.1. Retrieved 2016-06-20.
- ^ "Fundamental types in C++". cppreference.com. Archived from the original on 13 June 2011. Retrieved 5 December 2010.
- ^ "Chapter 8.6.2 on page 12" (PDF). ecma-international.org.
- ^ VB 6.0 help file
- ^ "The Integer, Long, and Byte Data Types (VBA)". microsoft.com. Retrieved 2006-12-19.
- ^ Giguere, Eric (December 18, 1987). "The ANSI Standard: A Summary for the C Programmer". Retrieved 2010-09-04.
- ^ "American National Standard Programming Language C specifies the syntax and semantics of programs written in the C programming language". Archived from the original on 2010-08-22. Retrieved 2010-09-04.
- ^ "PHP: Integers - Manual".
- ^ ECMAScript 6th Edition draft: https://people.mozilla.org/~jorendorff/es6-draft.html#sec-literals-numeric-literals Archived 2013-12-16 at the Wayback Machine
- ^ "SmallBASIC | MAXINT". Retrieved 2025-01-20.
- ^ "Integer (Java Platform SE 8 )". Retrieved 2025-01-20.
- ^ "Numeric limits". cppreference.com.
- ^ "std::numeric_limits". cppreference.com.
- ^ "Limits of Basic Types". Retrieved 2025-01-20.
- ^ "Prelude".
- ^ "28.1. sys — System-specific parameters and functions — Python 2.7.18 documentation". docs.python.org. Retrieved 17 September 2025.
- ^ "sys — System-specific parameters and functions". Python 3.13.7 documentation. Retrieved 17 September 2025.
- ^ "Primitive Type i32". doc.rust-lang.org/std.
- ^ Grogono, Peter (1995). Programming with Turing and Object Oriented Turing. New York: Springer. p. 363. doi:10.1007/978-1-4612-4238-3. ISBN 978-0-387-94517-0. LCCN 95010802.
Integer (computer science)
The int type is typically 32 bits, supporting signed ranges from -2,147,483,648 to +2,147,483,647.[3] For broader portability, extended types like long long use 64 bits, extending the signed range to approximately -9.22 × 10¹⁸ to +9.22 × 10¹⁸.[3] Overflow occurs when operations exceed these bounds, potentially wrapping around in two's complement systems—for instance, adding 1 to the maximum 8-bit signed value of 127 yields -128—necessitating careful handling in software to avoid undefined behavior.[2] Byte order, or endianness, also influences multi-byte integer storage, with big-endian placing the most significant byte first and little-endian the reverse, affecting data interchange across systems.[2]
Fundamentals
Definition and Purpose
In computer science, an integer is a fundamental data type that represents discrete whole numbers, encompassing positive, negative, and zero values without any fractional or decimal components. This type is designed to model exact counts and quantities in computational processes, forming a core building block for programming languages and hardware implementations. Typically, integers are encoded using binary (base-2) representation to align with the binary nature of digital systems, enabling efficient storage and manipulation on processors.[5]

The primary purpose of integers lies in their utility for precise, error-free numerical tasks that demand exactness, such as indexing elements in arrays, incrementing loop counters in iterative algorithms, addressing locations in computer memory, and performing arithmetic operations in data processing routines. For instance, in array-based structures, integers serve as indices to access specific elements reliably, while in control flow mechanisms like for-loops, they track iteration counts to ensure predictable execution. In memory management, unsigned integers often represent byte addresses, allowing programs to navigate vast address spaces without ambiguity. These applications leverage integers' inherent exactness for counting occurrences—such as tallying frequencies in datasets—or assigning unique identifiers in databases, where fractional precision is irrelevant and could introduce complications.[6][5][7][8]

Unlike floating-point types, which approximate real numbers through binary fractions and thus incur rounding errors in storage and operations, integers maintain absolute precision within their defined range, avoiding any representational inaccuracies during addition, subtraction, or multiplication of supported values. This distinction ensures that integer-based computations yield deterministic results, critical for applications requiring reliability, such as financial tallies or cryptographic hashing, where even minor deviations could propagate significant issues. The Goldberg analysis underscores that while floating-point arithmetic suits continuous approximations like scientific simulations, integers excel in discrete, verifiable calculations inherent to algorithmic logic.[9][10]

Historical Context
The handling of integers in early computers began with the ENIAC, completed in 1945, which represented numbers in a 10-digit decimal format using ring counters and vacuum tubes for arithmetic operations.[11] This decimal approach facilitated human-readable input and output but proved cumbersome for electronic implementation. In contrast, John von Neumann's 1945 report on the EDVAC architecture proposed a shift to binary representation for all data, including integers, enabling more efficient stored-program computing with 27-bit binary numbers as standard for precision in calculations.[12]

A key advancement in the 1950s was the development and adoption of two's complement representation for signed integers, which simplified hardware arithmetic by allowing subtraction through addition and eliminating the need for separate positive and negative zero values. This method became widespread due to its efficiency in binary systems. By the mid-1960s, the IBM System/360 family further standardized integer handling by adopting an 8-bit byte and 32-bit word architecture across compatible models, establishing de facto industry norms for data sizes and enabling modular, interchangeable computing components.[13]

The ANSI C standard, ratified in 1989, formalized integer types in programming by defining the basic int type as at least 16 bits wide, accommodating the growing prevalence of 32-bit systems while ensuring portability.[14] Meanwhile, the IEEE 754 standard for binary floating-point arithmetic, published in 1985, indirectly shaped integer handling by specifying exact conversions between integers and floating-point formats, promoting consistent behavior in mixed-precision computations.[15] This evolution from fixed-size representations in 1950s machines culminated in modern languages like Python 3.0 in 2008, which unified integers to support arbitrary precision seamlessly, allowing handling of numbers beyond hardware limits without explicit programmer intervention.[16]
Representation
Binary Encoding
In computer science, integers are encoded in binary form, representing each value as a sequence of bits—binary digits that are either 0 or 1—where the position of each bit corresponds to a power of 2, starting from the least significant bit (LSB) at position 0 with value 2^0 = 1. For instance, the decimal integer 5 is encoded as 101 in binary, equivalent to 1·2^2 + 0·2^1 + 1·2^0 = 5.[17][18] The number of bits allocated to an integer determines its representational capacity: with n bits, exactly 2^n unique non-negative values can be distinguished, ranging from 0 to 2^n − 1. This binary encoding leverages the efficiency of digital hardware, where bits map directly to electrical states in memory cells.[19][20]

To convert a non-negative decimal integer to binary, the standard algorithm involves repeated division by 2, recording the remainder (0 or 1) at each step; the binary representation is then read from the remainders in reverse order, from the last remainder (the most significant bit, MSB) to the first (the LSB). For example, converting 13: 13 ÷ 2 = 6 remainder 1, 6 ÷ 2 = 3 remainder 0, 3 ÷ 2 = 1 remainder 1, 1 ÷ 2 = 0 remainder 1, yielding 1101 (i.e., 1·2^3 + 1·2^2 + 0·2^1 + 1·2^0 = 13).[21][22]

When storing multi-byte integers in memory, the order of bytes affects interpretation: in big-endian format, the MSB byte is stored at the lowest memory address, while in little-endian, the LSB byte is at the lowest address. For the 16-bit value 0x1234 (decimal 4660), big-endian storage places 0x12 at address 0 and 0x34 at address 1; little-endian reverses this to 0x34 at address 0 and 0x12 at address 1. This convention ensures consistent data portability across systems, with little-endian common in x86 architectures and big-endian in network protocols like TCP/IP.[23][24]

For performance reasons, integers are typically aligned in memory at boundaries that are multiples of their size (e.g., a 4-byte integer at addresses divisible by 4), often through padding bytes, to enable efficient access by the processor's word-sized operations and avoid multiple memory fetches. Misaligned access can lead to exceptions or slowdowns on certain architectures.[25][26]
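A short C sketch of the repeated-division conversion and a byte-order probe; the helper to_binary is an illustrative name, not a standard library function.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Convert an unsigned value to a binary string by repeated division by 2.
   Remainders are produced least-significant first, so the buffer is
   reversed at the end. Illustrative helper, not part of any standard. */
static void to_binary(unsigned value, char *buf, size_t buflen) {
    size_t i = 0;
    if (value == 0 && i < buflen - 1) buf[i++] = '0';
    while (value > 0 && i < buflen - 1) {
        buf[i++] = (char)('0' + (value % 2));   /* remainder = next bit */
        value /= 2;
    }
    buf[i] = '\0';
    for (size_t j = 0; j < i / 2; j++) {        /* reverse so MSB comes first */
        char tmp = buf[j]; buf[j] = buf[i - 1 - j]; buf[i - 1 - j] = tmp;
    }
}

int main(void) {
    char bits[64];
    to_binary(13, bits, sizeof bits);
    printf("13 in binary: %s\n", bits);         /* 1101 */

    /* Byte-order probe: inspect the first stored byte of a known value. */
    uint16_t probe = 0x1234;
    uint8_t first;
    memcpy(&first, &probe, 1);
    printf("%s-endian\n", first == 0x34 ? "little" : "big");
    return 0;
}
```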
Signed vs. Unsigned Integers
In computer science, integers are classified as signed or unsigned based on their ability to represent negative values. Unsigned integers are restricted to non-negative values, utilizing all available bits to encode the magnitude, which maximizes the positive range for a given bit width.[27] For an n-bit unsigned integer, the representable values range from 0 to 2^n - 1; for example, an 8-bit unsigned integer (often denoted as uint8_t in languages like C) spans 0 to 255.[28]

Signed integers, by contrast, incorporate a sign bit, typically the most significant bit, to indicate positive or negative values, thereby reducing the range available for magnitude compared to unsigned integers of the same size.[29] Common representation schemes for signed integers include sign-magnitude, where the sign bit precedes the absolute value encoded in the remaining bits; one's complement, which inverts all bits of the positive counterpart for negatives; and two's complement, which inverts bits and adds one for negatives (detailed further in the dedicated section).[30] These methods yield ranges centered on zero, such as -2^{n-1} to 2^{n-1} - 1 for n bits in two's complement, though sign-magnitude and one's complement introduce dual representations for zero (+0 and -0), complicating arithmetic.[27]

Unsigned integers are commonly used in programming for quantities that cannot be negative, such as array indices, memory addresses, counters, or buffer sizes, where the full positive range is beneficial.[7] Signed integers suit scenarios involving potential negatives, like coordinate systems, financial calculations with debits, or temperature measurements.[28] In languages like C and C++, this distinction is explicit through type declarations: int represents a signed integer (typically 32 bits, ranging from -2,147,483,648 to 2,147,483,647 on most platforms), while unsigned int uses the same bit width for 0 to 4,294,967,295, with operations behaving differently for overflow—unsigned arithmetic wraps modularly without error, aiding loop counters but risking subtle bugs if mixed with signed types.[31]

The primary trade-offs involve range and interpretation risks: unsigned integers double the maximum positive value at the cost of excluding negatives, and their modular overflow can prevent unexpected results in bounded operations but lead to misinterpretation if code assumes signed semantics, such as treating a large unsigned value as negative when cast.[28] Signed integers enable negative representation essential for general-purpose arithmetic but halve the positive range and introduce sign-related complexities in bitwise operations or comparisons across types.[30]

Two's Complement System
The two's complement system is a method for representing signed integers in binary form, where positive numbers are encoded in their standard binary representation and negative numbers are derived by inverting all bits of the corresponding positive value and adding 1.[32] For example, to represent -5 in 4 bits, first encode +5 as 0101, invert the bits to obtain 1010, and add 1 to yield 1011.[32] This process ensures that the sign bit (the most significant bit) is 0 for positive numbers and 1 for negative numbers, while maintaining compatibility with binary arithmetic hardware.[32]

In an n-bit two's complement representation, the possible values range from -2^{n-1} to 2^{n-1} - 1.[33] For instance, an 8-bit system accommodates integers from -128 to 127, providing 256 distinct values but with an asymmetric distribution that favors negative numbers by one value.[33] The negation of a number x is computed as ~x + 1, where ~x denotes the bitwise complement of x, and any carry out of the most significant bit is discarded.[33] Zero is uniquely represented as all bits set to 0, avoiding dual representations that complicate comparisons in other signed formats.[33]

This representation offers key advantages in hardware implementation and arithmetic operations. Addition and subtraction are unified: to subtract y from x, add the two's complement of y to x and ignore any overflow in the sign bit, eliminating the need for dedicated subtraction circuitry.[32][33] Negation requires only a bitwise invert and increment, without special hardware beyond standard adders, which simplifies processor design and reduces costs.[32] These properties make two's complement efficient for general-purpose computing, as the same binary adder handles both signed and unsigned operations with minimal adjustments.[33]

However, the system has notable drawbacks. The asymmetric range means the largest positive value is one less than the magnitude of the smallest negative value, potentially limiting applications requiring balanced positive extents, such as certain signal processing tasks.[33] Additionally, while zero's unique representation aids equality checks, the overall scheme is susceptible to overflow (e.g., 127 + 1 yielding -128 in 8 bits) and underflow, which can produce incorrect results if not detected.[33]

Two's complement was first proposed by John von Neumann in 1945 for the EDVAC and has since become the dominant format in most modern processor architectures, including the PDP-11 minicomputer introduced in 1970, x86, ARM, and DSPs from manufacturers like Texas Instruments and NXP.[34][35]
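A brief C illustration of the invert-and-add-one rule and of subtraction reusing the adder, shown with 8-bit quantities so the discarded carry is explicit.

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint8_t five = 0x05;                      /* 0000 0101 */
    uint8_t neg_five = (uint8_t)(~five + 1);  /* invert bits, add one -> 1111 1011 */
    printf("-5 as a byte: 0x%02X\n", neg_five);           /* 0xFB */
    printf("as signed:    %d\n", (int)(int8_t)neg_five);  /* -5   */

    /* Subtraction x - y implemented as x + (~y + 1); the carry out of
       the top bit is simply discarded by the 8-bit type. */
    uint8_t x = 9, y = 5;
    uint8_t diff = (uint8_t)(x + (uint8_t)(~y + 1));
    printf("9 - 5 = %u\n", diff);                         /* 4 */
    return 0;
}
```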
Data Types and Sizes
Byte and Octet
A byte is a fundamental unit of digital information in computing, consisting of eight bits, which allows it to represent 256 distinct values.[36] In unsigned form, a byte can store integer values from 0 to 255, while in signed form using two's complement representation, it ranges from -128 to 127.[37] This 8-bit structure makes the byte the smallest addressable unit in most modern memory systems, serving as the building block for larger data types and storage.

In networking contexts, the term octet is used interchangeably with byte to denote exactly eight bits, but it specifically emphasizes an unsigned interpretation to avoid ambiguities arising from varying byte sizes in older systems.[38] Protocols such as the Internet Protocol (IP) treat octets as unsigned 8-bit fields for transmitting data packets, ensuring consistent handling across diverse hardware.[39] For instance, IP addresses are structured as sequences of four octets, facilitating reliable data exchange in networks.

Bytes and octets find widespread application in character encoding, where the American Standard Code for Information Interchange (ASCII) utilizes seven bits within an 8-bit byte to represent 128 printable and control characters, with the eighth bit often reserved for parity or extended sets.[40] They are also employed for small counters in software, such as loop indices or flags limited to 256 states, and in image processing, where individual pixel intensities in grayscale images are stored as unsigned byte values from 0 (black) to 255 (white).[41]

Historically, the 8-bit byte emerged in the 1960s with IBM's Extended Binary Coded Decimal Interchange Code (EBCDIC), an encoding scheme designed for mainframe systems that required eight bits to accommodate a full set of characters including lowercase letters and symbols.[42] This size was later standardized in the POSIX operating system interface, where the char type is defined as a single byte capable of holding 256 values in the POSIX locale.[43]
In programming languages like C, bytes are implemented via the char type, but variations exist between signed char (ranging -128 to 127) and unsigned char (0 to 255), with plain char potentially interpreted as either depending on the compiler implementation, leading to signedness ambiguity in portable code.[44] Programmers must explicitly choose signed or unsigned variants when using bytes for numeric purposes to ensure predictable behavior across platforms.
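A minimal C sketch of this ambiguity; whether plain char behaves as the signed or the unsigned variant is implementation-defined, so the first line's output differs between platforms.

```c
#include <stdio.h>

int main(void) {
    char plain = (char)0xFF;            /* -1 or 255, implementation-defined  */
    signed char s = (signed char)0xFF;  /* -1 on two's-complement platforms   */
    unsigned char u = 0xFF;             /* always 255                         */

    printf("plain char:    %d\n", (int)plain);
    printf("signed char:   %d\n", (int)s);
    printf("unsigned char: %d\n", (int)u);
    return 0;
}
```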
Word and Half-Word
In computer architecture, a word represents the natural unit of data processed by a processor, typically corresponding to the width of its registers and data paths. This size determines the efficiency of operations such as arithmetic, memory access, and instruction execution. Historically, word sizes varied based on hardware design; for instance, the PDP-1 minicomputer from 1960 used an 18-bit word, allowing for 4096 words of core memory as standard.[45] Similarly, the UNIVAC 1103 scientific computer, introduced in 1953, employed a 36-bit word size to handle complex numerical computations, with memory capacities up to 12,288 words.[46]

Over time, word sizes evolved to accommodate growing demands for larger address spaces and computational power. Early systems like the PDP-1 and UNIVAC 1103 reflected the 1950s and 1960s era of diverse architectures, but by the 1970s and 1980s, 32-bit words became prevalent in microprocessors for balancing performance and cost. The shift to 64-bit words accelerated in the early 2000s, driven by the need for addressing vast memory spaces beyond the 4 GB limit of 32-bit systems; AMD's x86-64 architecture, introduced in 2003, extended the x86 instruction set to 64-bit registers and operations, enabling modern systems to handle up to 2^64 bytes of virtual address space.[47] Today, 32-bit and 64-bit words dominate, with the latter standard in desktop and server processors like x86-64 for efficient register operations and memory addressing.[48] For a 32-bit signed word using two's complement representation—the predominant method for signed integers—the range spans from -2^{31} to 2^{31} - 1, or -2147483648 to 2147483647.[49]

A half-word, defined as half the size of a full word in many architectures, consists of 16 bits (2 bytes) and is commonly used in embedded systems and for smaller data units. In signed two's complement, a 16-bit half-word ranges from -32768 to 32767.[49] Half-words facilitate operations on compact datasets, such as in ARM architectures where instructions load or store 16-bit values into register portions.[50]

Words and half-words are integral to memory addressing and register manipulations, where the processor's native word size optimizes data transfer; in x86-64 systems, 64-bit words align with the general-purpose registers (e.g., RAX) for seamless 64-bit integer handling.[51] To enhance performance, data structures involving words are often aligned to multiples of their size in memory. This alignment prevents data from straddling cache line boundaries—typically 64 bytes—reducing the number of memory fetches required and minimizing access latency, as misaligned accesses can demand two cache line loads instead of one.

Standard Integer Variants (Short, Int, Long)
In computer science, standard integer variants such as short, int, and long provide portable abstractions for signed and unsigned integers in programming languages, with sizes defined by minimum requirements rather than fixed widths in most cases. These types are typically implemented as multiples of the 8-bit byte, allowing for efficient storage and computation on diverse hardware.[52]

The short integer type is typically 16 bits (2 bytes) wide, offering a signed range from -32,768 to 32,767, and is often used to conserve memory in large arrays or when smaller values suffice.[52] In contrast, the int type serves as the default for general-purpose integers, with a minimum size of 16 bits but commonly 32 bits (4 bytes) on modern 32- and 64-bit systems, providing a signed range up to approximately ±2 billion.[52] The long type extends this further, requiring at least 32 bits and typically 32 or 64 bits depending on the platform, while C and C++ also support long long as a 64-bit extension introduced in C99 for wider ranges.[52][53]

Standardization efforts address variability through fixed-width types. The C99 standard, via the <stdint.h> header, defines exact-width types like int16_t (precisely 16 bits, signed range -32,768 to 32,767), int32_t (32 bits, signed range -2,147,483,648 to 2,147,483,647), and int64_t (64 bits), which implementations provide when supported without padding bits.[52][53] In Java, these variants are rigidly fixed: short is always 16 bits (-32,768 to 32,767), int is 32 bits (-2,147,483,648 to 2,147,483,647), and long is 64 bits (-9,223,372,036,854,775,808 to 9,223,372,036,854,775,807), ensuring platform independence as specified in the Java Language Specification.[54] Portability challenges arise from platform-dependent sizes in languages like C, where int was historically 16 bits on systems such as 16-bit DOS compilers and the PDP-11, but 64 bits on Cray supercomputers, potentially leading to overflow or incorrect assumptions in cross-platform code; the short program after the table shows how these sizes can be probed on a given implementation.[55][52]

| Type | Typical Size (bits) | Signed Range (example for typical size) | Standardization Reference |
|---|---|---|---|
| short | 16 | -32,768 to 32,767 | C99 <stdint.h> int16_t |
| int | 32 | -2,147,483,648 to 2,147,483,647 | C99 minimum 16 bits; Java fixed 32 bits |
| long | 32 or 64 | Varies; up to ±9 quintillion for 64 bits | C99 minimum 32 bits; Java fixed 64 bits |
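A small C probe of these sizes on whatever implementation compiles it; for example, long commonly reports 8 bytes on 64-bit Linux but 4 bytes on 64-bit Windows.

```c
#include <stdio.h>

int main(void) {
    /* Sizes are implementation-defined within the standard's minimums:
       short and int >= 16 bits, long >= 32 bits, long long >= 64 bits. */
    printf("short:     %zu bytes\n", sizeof(short));
    printf("int:       %zu bytes\n", sizeof(int));
    printf("long:      %zu bytes\n", sizeof(long));
    printf("long long: %zu bytes\n", sizeof(long long));
    return 0;
}
```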
Operations and Syntax
Arithmetic and Bitwise Operations
Integer arithmetic operations in computer science primarily involve addition, subtraction, multiplication, and division on binary representations of signed and unsigned integers. In systems using two's complement for signed integers, addition and subtraction are performed uniformly by treating both as binary addition, leveraging the representation's property that negation is achieved by inverting bits and adding one. For example, adding 5 (binary 00000101 in 8 bits) and -3 (two's complement of 3: 11111101) yields 00000101 + 11111101 = 00000010 (2), with the carry-out from the sign bit discarded.[32][56] This approach simplifies hardware design, as the arithmetic logic unit (ALU) uses the same adder circuit for both operations, generating flags for carry (from the most significant bit) and overflow (when the operands share a sign but the result's sign differs).[56]

Multiplication of integers can be implemented via repeated addition for general cases, but for multipliers that are powers of two, it reduces to a left shift operation, which is highly efficient in hardware. Division follows a similar pattern: for divisors that are powers of two, a right shift suffices, while general division employs algorithms like long division or restoring/non-restoring methods, often approximated through reciprocal multiplication in fixed-point arithmetic to avoid dedicated divider hardware. For instance, dividing by 10 can be approximated as multiplying by a fixed reciprocal (e.g., 0.8 approximated in binary) followed by a shift, with error correction for precision.[57][58] Modern processors include dedicated multipliers and dividers, but shift-based techniques remain common in embedded systems without full hardware support.[58]

Bitwise operations manipulate individual bits of integer representations and include logical AND (&), OR (|), XOR (^), and shifts (left <<, right >>). The AND operation sets a bit to 1 only if both operands have 1 in that position, useful for masking (e.g., x & 0xFF extracts the lowest 8 bits). OR sets a bit to 1 if either operand has 1, enabling bit packing for flags or permissions (e.g., combining multiple states into one integer). XOR toggles bits where operands differ, often for bit flipping or simple encryption.[59][60]

Shift operations reposition bits: a left shift by k bits multiplies the integer by 2^k for unsigned values (or signed values if no overflow occurs), filling low bits with zeros, as in 3 << 2 = 12. A right shift by k bits divides by 2^k, but behavior differs by type: logical right shift (unsigned) fills high bits with zeros, while arithmetic right shift (signed, in two's complement) preserves the sign bit by filling with copies of it, ensuring that the result rounds toward negative infinity (floor division by 2^k) for negative values.[59][60] These operations are fundamental for tasks like aligning data or implementing arithmetic on low-level hardware.

In low-level code, bitwise operations are typically faster than equivalent arithmetic ones on simple processors, as they require fewer ALU cycles—often a single instruction versus multiple for multiplication or division—due to direct bit-level hardware support.[61] This efficiency makes them preferable for bit packing in flags or permissions, where arithmetic alternatives would introduce unnecessary overhead.[62] Overflow may occur in arithmetic operations but is handled separately in dedicated sections.
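A compact C sketch of the masking, packing, and shift identities described above; the note on shifting negative values reflects that this is implementation-defined in C.

```c
#include <stdio.h>

int main(void) {
    unsigned x = 0x12345678;

    unsigned low_byte = x & 0xFF;   /* masking: keep the lowest 8 bits   */
    unsigned flags = 0x1 | 0x4;     /* packing: combine two flag bits    */
    unsigned toggled = x ^ 0xFF;    /* XOR: flip the lowest 8 bits       */

    unsigned times8 = 3u << 3;      /* left shift: 3 * 2^3 = 24          */
    unsigned div4 = 100u >> 2;      /* logical right shift: 100 / 4 = 25 */
    int arith = -100 >> 2;          /* arithmetic shift on most compilers
                                       gives -25; right-shifting negative
                                       values is implementation-defined  */

    printf("%X %X %X %u %u %d\n", low_byte, flags, toggled, times8, div4, arith);
    return 0;
}
```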
Syntax in Programming Languages
In statically typed languages such as C and Java, integer variables must be explicitly declared with their type prior to use, specifying the intended size and signedness. For instance, in C, the declaration syntax is int x = 42;, where int typically denotes a 32-bit signed integer and 42 is the initial decimal value. In Java, the equivalent syntax is int x = 42;, defining a primitive 32-bit signed integer with the same initialization.[63] By contrast, Python employs dynamic typing, inferring the integer type from the assignment without explicit declaration, as in x = 42, which creates an arbitrary-precision integer object.[64]
Integer literals represent constant values and support multiple bases for flexibility in expression. In C, decimal literals appear as plain digits (e.g., 42), octal as a leading zero followed by digits (e.g., 052 equaling 42 in decimal), and hexadecimal as 0x or 0X prefix with digits or letters A-F (e.g., 0x2A for 42). Optional suffixes specify variants, such as L or l for long (e.g., 42L) or U or u for unsigned (e.g., 42U).[65] Java mirrors this closely, with decimal 42, octal 052, hexadecimal 0x2A, and long suffix L (e.g., 42L); binary literals use 0b or 0B prefix since Java SE 7 (e.g., 0b101010 for 42).[63] Python uses decimal 42, hexadecimal 0x2A, octal 0o52 (note the o prefix), and binary 0b101010, with no suffixes required due to type inference.[66]
Operations on integers employ consistent infix notation across languages for arithmetic and bitwise manipulation. Arithmetic operators include addition (+), subtraction (-), multiplication (*), division (/), and modulus (%), as in the C or Java expression int result = x + y; or Python's result = x + y. Bitwise operators encompass AND (&), OR (|), XOR (^), NOT (~), left shift (<<), and right shift (>>), exemplified by x & y in all three languages to perform bit-level AND. These operators apply directly to integer operands, with precedence rules dictating evaluation order (e.g., * before +).[67]
Conversions between integer types or from other types to integers involve explicit syntax to ensure type safety. In C, explicit casting uses parenthesized type names, such as (int)3.14 to convert a floating-point value to the integer 3 by truncation; implicit promotion automatically widens narrower types like short to int in expressions.[68] Java mandates explicit casting for narrowing (e.g., (int) someLong to convert a 64-bit long to 32-bit int, potentially losing data), while allowing implicit widening from int to long. Python relies on constructor functions for conversions, like int(3.14) yielding 3, without a casting operator; implicit handling occurs in mixed-type operations via promotion.[69]
Language differences in integer syntax stem primarily from typing paradigms and runtime behaviors. C and Java's static typing requires compile-time type declarations and checks, enforcing explicit syntax for declarations and casts to prevent mismatches, whereas Python's dynamic typing omits declarations and performs type resolution at runtime for more concise code. In terms of overflow during operations, C treats signed integer overflow as undefined behavior (often wrapping in implementations), Java's primitive integers wrap silently around the type's range, and Python avoids overflow entirely through automatic extension to arbitrary precision.[64]
Limits and Implementation
Numerical Ranges and Overflow
Integer types in computer science have finite numerical ranges determined by their bit width, which directly influences the values they can represent without error. For an unsigned n-bit integer, the range spans from 0 to 2^n - 1, allowing representation of all non-negative values up to that maximum.[70] This formulation arises because each bit position contributes a power-of-two value, exhausting all possible combinations for non-negative integers. In contrast, signed n-bit integers, typically using the two's complement representation, accommodate both positive and negative values with a range from -2^{n-1} to 2^{n-1} - 1.[71] The negative bound includes one more value than the positive side due to the asymmetric allocation of the sign bit, yielding a nearly balanced set of representable integers around zero.

When arithmetic operations exceed these bounds, overflow or underflow occurs, potentially leading to incorrect results or security vulnerabilities. Overflow happens in addition or multiplication when the result surpasses the maximum representable value, while underflow arises in subtraction or negation when dipping below the minimum. For unsigned integers, overflow triggers wrap-around semantics, performing the operation modulo 2^n. For instance, in an 8-bit unsigned integer, 255 + 1 equals 0, as the computation effectively reduces modulo 256.[72] This defined behavior in languages like C and C++ ensures predictable outcomes for unsigned types, aligning with modular arithmetic principles.[73] Signed integer overflow, however, often results in undefined behavior in standards such as C++, where implementations may wrap around but are not required to, potentially causing crashes or exploitable errors.[73]

Underflow exhibits analogous wrap-around effects in two's complement systems. For an 8-bit signed integer, the minimum value is -128; subtracting 1 from this yields 127, as the operation wraps around the range.[71] This behavior stems from the fixed bit pattern inversion and addition inherent to two's complement arithmetic, treating the underflow as an overflow in the positive direction. Like overflow, signed underflow is undefined in many languages, though practical implementations on two's complement hardware mirror the wrap-around. Unsigned underflow is impossible since the range starts at zero, with subtraction below zero wrapping to the maximum value.

Detection of overflow and underflow is crucial for robust software. At the hardware level, processors like Intel x86 set the overflow flag (OF) in the EFLAGS register after signed arithmetic instructions if the result's sign bit differs from the expected outcome based on operand signs.[74] The carry flag (CF) similarly detects unsigned overflow. Software mechanisms build on this; for example, Rust's standard library provides methods like checked_add on integer primitives, which return Some(result) on success or None if overflow or underflow is detected, enabling explicit handling without undefined behavior.[75]
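A sketch of software-level detection in C: a portable range pre-check guards a signed addition, and, where available, the GCC/Clang builtin __builtin_add_overflow (compiler-specific, not part of standard C) reports overflow directly.

```c
#include <stdio.h>
#include <limits.h>
#include <stdbool.h>

/* Portable check: would a + b overflow or underflow a signed int? */
static bool add_would_overflow(int a, int b) {
    if (b > 0) return a > INT_MAX - b;   /* too large */
    if (b < 0) return a < INT_MIN - b;   /* too small */
    return false;
}

int main(void) {
    int a = INT_MAX, b = 1, sum = 0;

    if (add_would_overflow(a, b)) {
        printf("addition would overflow\n");
    }

#if defined(__GNUC__) || defined(__clang__)
    /* Compiler builtin: returns true if the mathematically correct result
       does not fit in 'sum'. */
    if (__builtin_add_overflow(a, b, &sum)) {
        printf("builtin reports overflow\n");
    }
#endif
    (void)sum;
    return 0;
}
```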
Historical examples illustrate the consequences of range limits. The Y2K problem, or Year 2000 issue, stemmed from widespread use of two-digit year fields in date representations to save storage space, causing systems to misinterpret 00 as 1900 rather than 2000 and leading to potential rollovers in calculations.[76] This space-saving design choice mirrored overflow risks by confining values to narrow ranges. A comparable modern concern is the Year 2038 problem, where 32-bit systems using Unix time (seconds since 1970-01-01) will overflow after 2^{31} - 1 seconds, around 03:14:07 UTC on January 19, 2038, potentially causing dates to wrap around to 1970 in unmitigated software.[77] In modern computing, 64-bit integers significantly mitigate such issues, offering signed ranges up to approximately ±9.2 × 10^{18}, far exceeding typical application needs and reducing overflow likelihood in everyday operations.[78]
Fixed vs. Arbitrary Precision
Fixed-precision integers in computer science are implemented with a predetermined number of bits, typically aligned to the processor's native word size—such as 32 or 64 bits—to leverage hardware instructions for rapid arithmetic and bitwise operations, though this imposes strict limits on the representable range. These types, like theint64_t in C++ or Java's long, provide predictable performance in performance-critical scenarios but cannot handle values exceeding their bit width without overflow or modular wrapping.
In contrast, arbitrary-precision integers, also known as bignums, support unbounded sizes through software emulation, allowing representation of integers of any magnitude limited only by available memory.[79] Languages and libraries achieve this by storing numbers as arrays of fixed-size "limbs"—each a machine word (e.g., 32 or 64 bits)—representing the value in base 2^{32} or 2^{64}, with the full integer formed by concatenating these limbs in little-endian order.[80] For instance, Java's BigInteger class uses an array of bytes for two's-complement storage, while the GNU Multiple Precision Arithmetic Library (GMP) employs mp_limb_t arrays for signed integers.[81] Python's built-in int type defaults to arbitrary precision, seamlessly transitioning from small fixed-width representations to multi-limb arrays as values grow.[16]
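A minimal sketch of the limb idea in C, not the actual layout used by BigInteger or GMP: numbers are fixed-length arrays of 32-bit limbs stored least-significant first, and addition propagates a carry limb by limb through 64-bit intermediates.

```c
#include <stdio.h>
#include <stdint.h>

#define LIMBS 4   /* 4 x 32-bit limbs = a 128-bit unsigned integer */

/* result = a + b over fixed-size limb arrays, least-significant limb first.
   Returns the final carry (1 if the 128-bit result overflowed). */
static uint32_t add_limbs(uint32_t result[LIMBS],
                          const uint32_t a[LIMBS], const uint32_t b[LIMBS]) {
    uint64_t carry = 0;
    for (int i = 0; i < LIMBS; i++) {
        uint64_t sum = (uint64_t)a[i] + b[i] + carry; /* cannot overflow 64 bits */
        result[i] = (uint32_t)sum;                    /* low 32 bits become the limb */
        carry = sum >> 32;                            /* high bits carry onward */
    }
    return (uint32_t)carry;
}

int main(void) {
    /* a = 2^64 - 1 (low two limbs all ones), b = 1 */
    uint32_t a[LIMBS] = {0xFFFFFFFFu, 0xFFFFFFFFu, 0, 0};
    uint32_t b[LIMBS] = {1, 0, 0, 0};
    uint32_t r[LIMBS];

    add_limbs(r, a, b);
    /* Expect 2^64: limb 2 becomes 1 and the lower limbs are 0. */
    printf("%08X %08X %08X %08X\n", r[3], r[2], r[1], r[0]);
    return 0;
}
```

A real bignum library additionally manages dynamically sized limb arrays and a separate sign, rather than the fixed four limbs assumed here.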
Core operations on arbitrary-precision integers, such as multiplication, rely on algorithms optimized for large inputs to mitigate the quadratic time complexity of naive methods. For numbers beyond a threshold (e.g., a few hundred bits), libraries like GMP invoke the Karatsuba algorithm, a divide-and-conquer technique that multiplies two n-bit numbers using three multiplications of roughly n/2-bit numbers, achieving O(n^{log2 3}) ≈ O(n^{1.585}) complexity and reducing overhead for medium-sized operands.[82] This approach incurs significant space overhead from the limb arrays and time penalties from software loops, often 10-100 times slower than hardware-accelerated fixed-precision operations, though assembly-optimized implementations in GMP minimize this gap.[79]
Arbitrary-precision integers are essential in domains requiring exact computation of enormous values, such as cryptography—where RSA keys often exceed 2048 bits, necessitating modular exponentiation on numbers far beyond fixed limits—and symbolic mathematics, as in computer algebra systems for manipulating polynomials or very large factorials without approximation errors.[83] In these contexts, libraries like GMP underpin cryptographic toolkits and symbolic engines, ensuring precision at the cost of efficiency.[84]
The primary trade-off favors fixed-precision for high-throughput applications, such as numerical simulations or loop-intensive code where bounded ranges suffice and hardware speed is paramount, while arbitrary-precision suits infrequent but exact large-integer tasks, accepting runtime and memory overhead for unbounded scalability—e.g., GMP operations on 1024-bit numbers may take microseconds versus nanoseconds for 64-bit natives.[85]