Base62


from Wikipedia

Base62 is a binary-to-text encoding that represents arbitrary data (including binary data) as ASCII text. It encodes data as the 62 letters and digits of ASCII – capital letters A-Z, lower case letters a-z and digits 0–9.^[1]^[2]

123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz
= 58 characters = base58

0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
= 62 characters = base62

0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz+/
= 64 characters = base64

Alphabet

The Base62 alphabet:

Value	Char	Value	Char	Value	Char	Value	Char
0	`0`	16	`G`	32	`W`	48	`m`
1	`1`	17	`H`	33	`X`	49	`n`
2	`2`	18	`I`	34	`Y`	50	`o`
3	`3`	19	`J`	35	`Z`	51	`p`
4	`4`	20	`K`	36	`a`	52	`q`
5	`5`	21	`L`	37	`b`	53	`r`
6	`6`	22	`M`	38	`c`	54	`s`
7	`7`	23	`N`	39	`d`	55	`t`
8	`8`	24	`O`	40	`e`	56	`u`
9	`9`	25	`P`	41	`f`	57	`v`
10	`A`	26	`Q`	42	`g`	58	`w`
11	`B`	27	`R`	43	`h`	59	`x`
12	`C`	28	`S`	44	`i`	60	`y`
13	`D`	29	`T`	45	`j`	61	`z`
14	`E`	30	`U`	46	`k`
15	`F`	31	`V`	47	`l`

References

1. He, Kejing; Xu, Xiancheng; Yue, Qiang (November 19, 2008). "A secure, lossless, and compressed Base62 encoding". 2008 11th IEEE Singapore International Conference on Communication Systems. Institute of Electrical and Electronics Engineers. pp. 761–765. doi:10.1109/ICCS.2008.4737287. ISBN 978-1-4244-2423-8. S2CID 10831128. Quoted: "This base62 compressed encoding has been tested"; "The 62 alphanumeric characters (A-Z, a-z, 0–9)".
2. Wu, Pei-Chi (June 18, 2001). "A base62 transformation format of ISO 10646 for multilingual identifiers". Software: Practice and Experience. 31 (12): 1125–1130. doi:10.1002/spe.408. S2CID 32472727. Retrieved August 13, 2020. Quoted: "within a [0–9][A–Z][a–z] range, a total of 62 base characters".

from Grokipedia

Base62 is a positional numeral system and binary-to-text encoding scheme that uses a character set of 62 alphanumeric symbols to represent data in a compact, human-readable format.^[1] The standard alphabet consists of the ten digits (0-9), the 26 uppercase letters (A-Z), and the 26 lowercase letters (a-z), enabling the encoding of integers or binary data into shorter strings than base-10 or base-16 systems produce.^[2] Unlike Base64, which includes special characters like "+" and "/", Base62 avoids non-alphanumeric symbols, making it particularly suitable for URL-safe identifiers and web applications.^[1]

The encoding process involves repeated division of the input number by 62, with the remainders selecting characters from the alphabet, starting from the least significant digit.^[3] For example, the decimal number 11157 equals 2·62² + 55·62¹ + 59·62⁰, which is written "2tx" under the standard alphabet (0-9, A-Z, a-z) and "2TX" under the lowercase-first variant (0-9, a-z, A-Z).^[3] The number of representable values grows exponentially with string length; a 7-character Base62 string has approximately 3.5 trillion unique combinations (62⁷), far surpassing the 10 million possible with 7 digits in base-10.^[2]

Base62 is widely adopted in URL shortening services, such as bit.ly, to transform long numeric database IDs into brief, memorable codes that facilitate redirection to original links.^[1] Its efficiency supports high-scale systems: a service generating 1,000 short URLs per second would take over 110 years to exhaust a 7-character space.^[2] Additionally, it finds use in generating unique keys for databases, file naming, and other scenarios demanding collision-resistant, compact representations without relying on special characters.^[3]

Definition and Components

Definition

Base62 is a binary-to-text encoding scheme that converts binary data or integers into compact strings using a set of 62 alphanumeric characters, representing values in a base-62 positional numeral system.^[4] This approach allows for efficient storage and transmission of data in a human-readable format, particularly for large numerical identifiers.^[5] At its core, Base62 operates on the mathematical principle of positional notation, where each character in the string denotes a coefficient multiplied by a power of 62, starting from the rightmost position at 62^0. This structure permits the encoding of an extraordinarily large range of values in few characters; for example, a single-character Base62 string can represent 62 possible values, while a two-character string expands to 62^2 = 3,844 possibilities.^[4]^[5] While rooted in numeral system theory, Base62 extends beyond simple arithmetic conversions by focusing on the transformation of arbitrary binary sequences into safe, printable text, distinguishing it from purely numerical bases like decimal or hexadecimal.^[4] One of its defining properties is URL-safety, achieved by employing only alphanumeric characters that avoid problematic symbols such as +, /, or =, which can disrupt web contexts or require additional escaping.^[5]
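As a worked instance of the positional rule, a string of $k$ characters with digit values $d_{k-1}, \dots, d_0$ (symbols introduced here only for illustration) denotes the sum below. The sample string "3D7" assumes the common alphabet ordering in which digits come first, then uppercase letters, so that 'D' has value 13.

```latex
\[
N = \sum_{i=0}^{k-1} d_i \cdot 62^{i},
\qquad
\text{``3D7''} \;\mapsto\; 3 \cdot 62^{2} + 13 \cdot 62^{1} + 7 \cdot 62^{0}
= 11532 + 806 + 7 = 12345 .
\]
```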

Character Set

The Base62 character set comprises 62 alphanumeric characters from the ASCII standard: the digits 0 through 9, the uppercase letters A through Z, and the lowercase letters a through z.^[6] This set is defined in the UTF-62 transformation format for ISO 10646, which maps these characters to integer values 0 through 61 to enable compact representation of multilingual identifiers while preserving lexicographic sorting order.^[6] The common mapping convention assigns values sequentially based on the alphabet string "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz", where digits receive values 0-9, uppercase letters A-Z receive 10-35, and lowercase letters a-z receive 36-61.^[6]^[7] Base62 encoding is case-sensitive, distinguishing between uppercase and lowercase letters to utilize the full 62 symbols. While this order (0-9 followed by A-Z then a-z) is widely adopted, some implementations reverse the case order to "0-9a-zA-Z" for specific applications, though consistency within a system is essential for correct decoding.^[7] The full mapping of characters to their decimal equivalents is provided in the table below:

Value	Character	Value	Character	Value	Character
0	0	20	K	40	e
1	1	21	L	41	f
2	2	22	M	42	g
3	3	23	N	43	h
4	4	24	O	44	i
5	5	25	P	45	j
6	6	26	Q	46	k
7	7	27	R	47	l
8	8	28	S	48	m
9	9	29	T	49	n
10	A	30	U	50	o
11	B	31	V	51	p
12	C	32	W	52	q
13	D	33	X	53	r
14	E	34	Y	54	s
15	F	35	Z	55	t
16	G	36	a	56	u
17	H	37	b	57	v
18	I	38	c	58	w
19	J	39	d	59	x

Note: The table lists values 0 through 59; the two remaining entries are 60: y and 61: z. The complete alphabet string is "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz".^[6]

The selection of an alphanumeric-only set is driven by the need for space efficiency in representing large integers compactly, particularly in case-sensitive contexts like identifiers.^[6] This choice enhances readability in human-facing applications such as URLs, as the characters are familiar and do not include special symbols that could require percent-encoding or cause parsing issues.^[8] The standard set includes potentially confusable characters such as '0' and 'O', or '1', 'l', and 'I', which may lead to visual ambiguity in certain fonts; some variant implementations exclude these to prioritize clarity, at the cost of a smaller effective base.^[9]
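The mapping in the table can be sketched in a few lines of Python. This is only an illustration: the names `ALPHABET` and `VALUE_OF` are chosen here and do not come from any standard or library.

```python
import string

# Common ordering described above: digits, then uppercase, then lowercase.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase

# Reverse lookup: character -> numeric value in the range 0..61.
VALUE_OF = {ch: i for i, ch in enumerate(ALPHABET)}

print(len(ALPHABET))                                   # 62
print(VALUE_OF["0"], VALUE_OF["A"], VALUE_OF["a"], VALUE_OF["z"])  # 0 10 36 61
```

An implementation that uses the "0-9a-zA-Z" ordering would only need to swap the last two terms of `ALPHABET`.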

Encoding and Decoding

Encoding Procedure

The Base62 encoding procedure transforms a non-negative integer or arbitrary binary data into a compact string representation using a 62-character alphabet, typically consisting of digits 0-9, uppercase letters A-Z, and lowercase letters a-z.^[10] This process is analogous to standard positional numeral system conversion but employs base 62 instead of base 10, enabling denser encoding suitable for identifiers and short links.^[1]

Integer Encoding Algorithm

To encode a non-negative integer $n$ into Base62, the following iterative division algorithm is used, which generates digits from least to most significant and then reverses them to produce the standard most-significant-digit-first order:

1. If $n = 0$, return the single character "0" as the encoded string.
2. Initialize an empty list or string to collect the digits.
3. While $n > 0$:
   - Compute the remainder $r = n \mod 62$.
   - Append the character from the alphabet corresponding to index $r$ to the list (where index 0 maps to '0', 10 to 'A', 36 to 'a', etc.).
   - Update $n = \lfloor n / 62 \rfloor$.
4. Reverse the collected digits and concatenate them into the final string.

This method ensures no leading zeros in the output except for the special case of $n = 0$.^[10] The formula for each digit extraction is $r = n \mod 62$, followed by $n = \lfloor n / 62 \rfloor$, with $r$ mapped to the appropriate character; repetition continues until $n = 0$.^[11] For example, encoding the integer 12345 using the alphabet "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz" yields "3D7":

1. $12345 \mod 62 = 7$ (maps to '7'), $n = 199$.
2. $199 \mod 62 = 13$ (maps to 'D'), $n = 3$.
3. $3 \mod 62 = 3$ (maps to '3'), $n = 0$.
4. Collected digits: ['7', 'D', '3']; reversed: "3D7".^[12]

A pseudocode representation in a procedural style is:

    function encodeInteger(n):
        if n == 0:
            return "0"
        digits = []
        while n > 0:
            r = n % 62
            digits.append(alphabet[r])
            n = floor(n / 62)
        reverse(digits)
        return join(digits)

where alphabet is the 62-character string.^[10] Edge cases include encoding 0, which produces the single character "0" without additional digits.^[10] The algorithm inherently omits leading zeros by stopping when $n = 0$ and only including non-zero most-significant digits.

For a 64-bit integer (up to $2^{64} - 1$), the maximum encoded length is 11 characters, as $62^{10} \approx 8.39 \times 10^{17} < 2^{64} \approx 1.84 \times 10^{19} < 62^{11} \approx 5.20 \times 10^{19}$; this can be derived by solving $\lceil \log_{62}(2^{64}) \rceil = \lceil 64 \cdot \log_{62} 2 \rceil$, where $\log_{62} 2 \approx 0.3010 / 1.7924 \approx 0.1679$, yielding approximately 10.75, rounded up to 11.^[1]
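The integer algorithm and the 11-character bound can be checked with a short runnable sketch. It assumes the digits-uppercase-lowercase alphabet; the function name `encode_int` and the choice to reject negative input are illustrative, not taken from a particular library.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def encode_int(n: int) -> str:
    """Encode a non-negative integer as Base62, most significant digit first."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, r = divmod(n, 62)        # r is the next least-significant digit
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))

print(encode_int(12345))            # 3D7
print(len(encode_int(2**64 - 1)))   # 11
```

Python integers have arbitrary precision, so the same function also works for values far beyond 64 bits.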

Handling Binary Data

For binary data, such as a byte array, the encoding treats the input as a single large integer in big-endian byte order (most significant byte first). This integer is then encoded using the integer algorithm above. To convert the byte array to an integer, the bytes are interpreted as coefficients in base 256, forming $\sum_{i=0}^{k-1} b_i \cdot 256^{k-1-i}$, where $b_i$ are the byte values and $k$ is the length.^[13]

If the binary data requires a fixed output length (e.g., for consistent identifier sizes), leading zero characters may be prepended to the encoded string to pad it; otherwise, no padding is applied, and the length varies with the input size. An empty byte array is typically encoded as "0".^[14] For large binary inputs exceeding standard integer sizes, arbitrary-precision arithmetic (e.g., BigInt) is used to handle the divisions without overflow.^[13]
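A minimal sketch of the byte-array case follows, assuming the same alphabet as above and the big-endian interpretation described in the text. `encode_bytes` is a hypothetical name; Python's built-in `int.from_bytes` performs the base-256 sum.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def encode_bytes(data: bytes) -> str:
    """Treat the bytes as one big-endian integer and encode it in Base62."""
    n = int.from_bytes(data, "big")   # first byte is the most significant
    if n == 0:
        return ALPHABET[0]            # covers b"" and all-zero input
    digits = []
    while n > 0:
        n, r = divmod(n, 62)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))

print(encode_bytes(b"\x30\x39"))      # 0x3039 == 12345 -> 3D7
print(encode_bytes(b"\x00\x30\x39"))  # 3D7 (the leading zero byte is not represented)
print(encode_bytes(b""))              # 0
```

The second call shows why the original length has to be stored separately, or the output padded, whenever leading zero bytes matter.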

Decoding Procedure

The decoding procedure for Base62 reverses the encoding process by interpreting the alphanumeric string as a base-62 numeral and converting it to a decimal integer, which can then be transformed into binary data if needed. This algorithm relies on the standard 62-character set, mapping each symbol to its corresponding value from 0 to 61.^[15] The core steps involve processing the string from left to right (most significant digit first). Begin with an initial value of zero. For each character, multiply the current accumulated value by 62 and add the integer value of that character based on its position in the alphabet: digits 0-9 map to 0-9, uppercase A-Z to 10-35, and lowercase a-z to 36-61. This iterative accumulation efficiently computes the decimal equivalent without needing to calculate explicit powers of 62.^[16] The mathematical formulation is as follows:

    value ← 0
    for each character c in the string (left to right):
        value ← value × 62 + char_value(c)

where char_value(c) returns the numeric index of c in the Base62 alphabet.^[17] For instance, consider decoding the string "3D7", which represents the integer 12345:

Start with value = 0.
First character '3' maps to 3: value = 0 × 62 + 3 = 3.
Second character 'D' maps to 13 (uppercase letters follow the digits, so A = 10, ..., D = 13): value = 3 × 62 + 13 = 199.
Third character '7' maps to 7: value = 199 × 62 + 7 = 12345.

This yields the original integer 12345.^[12] When the original data is binary (e.g., a byte array), the decoded integer is converted to bytes using big-endian ordering, where the most significant byte is placed first. This reconstructs the input bytes by serializing the integer into an array of 8-bit values, typically trimming leading zero bytes unless the original length is preserved separately. Libraries handle this by using arbitrary-precision arithmetic to support large inputs.^[18] Error handling is essential for robustness. Each character must be validated against the alphabet; invalid symbols (e.g., '/' or other non-Base62 characters) trigger an exception or return an error. Base62 is case-sensitive, so substituting 'a' (36) for 'A' (10) produces incorrect results or validation failures. Leading '0' characters, which map to 0, do not change the final integer value but may indicate padded representations; in binary output, they could correspond to leading zero bytes if the input length is known, though standard decoding assumes minimal representation without explicit padding.^[15]
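The decoding steps and the validation rule can be sketched as follows. The names `decode_int` and `decode_bytes`, and the decision to reject an empty string, are assumptions made for this illustration; the byte conversion returns the minimal big-endian form, so leading zero bytes are not recovered.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
VALUE_OF = {ch: i for i, ch in enumerate(ALPHABET)}

def decode_int(s: str) -> int:
    """Decode a Base62 string to an integer, validating every character."""
    if not s:
        raise ValueError("empty string")      # assumption: empty input is an error
    value = 0
    for ch in s:                               # most significant digit first
        if ch not in VALUE_OF:
            raise ValueError(f"invalid Base62 character: {ch!r}")
        value = value * 62 + VALUE_OF[ch]
    return value

def decode_bytes(s: str) -> bytes:
    """Decode to the shortest big-endian byte string holding the value."""
    n = decode_int(s)
    length = (n.bit_length() + 7) // 8         # 0 bytes when n == 0
    return n.to_bytes(length, "big")

print(decode_int("3D7"))     # 12345
print(decode_int("003D7"))   # 12345 (leading '0' characters do not change the value)
print(decode_bytes("3D7"))   # b'09', i.e. bytes 0x30 0x39
```

Passing a string such as "3/7" raises `ValueError`, and "3d7" decodes to a different integer than "3D7" because the alphabet is case-sensitive.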

Applications

URL Shortening Services

URL shortening services apply Base62 encoding to turn lengthy original URLs into compact, alphanumeric identifiers that redirect to the target destination. This technique enables the creation of memorable and shareable links, often prefixed with a custom domain like bit.ly or tinyurl.com.^[2]

The core process begins by generating a unique numerical identifier for the input URL, typically by hashing the URL with a function such as MD5 to derive an integer value or by incrementing a global counter for sequential assignment. This identifier is then converted to a Base62 string using the standard encoding procedure, producing a short code that can be left-padded with "0" when a fixed length is required. For instance, the counter value 1234567 encodes to the 4-character string "5BAN" under the standard alphabet.^[3]^[2]

Prominent services like Bitly implement Base62 encoding schemes to generate these short codes, commonly limiting them to 6-7 characters to accommodate vast scales, such as supporting over 3.5 trillion unique URLs with a 7-character length.^[19]^[20] In practice, these short links fit within the constraints of platforms like Twitter's original 140-character tweet limit, preserving more space for textual content while sharing resources. Additionally, the alphanumeric character set of Base62 ensures compatibility without requiring percent-encoding for special symbols, simplifying integration across web and mobile environments.^[20]^[21]

To maintain uniqueness, especially in hashing-based methods, services store mappings in a database and perform lookups during code generation; if a collision occurs, a new identifier is derived and re-encoded. This lookup step prevents duplicates without relying solely on cryptographic guarantees.^[2]
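The counter-based variant can be sketched as below. This is a simplified illustration, not how any particular service works: `short_code` is a hypothetical helper, the 7-character default is an assumption, and database storage and collision handling are omitted.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def short_code(counter: int, length: int = 7) -> str:
    """Encode a sequential counter and left-pad it with '0' to a fixed length."""
    if counter < 0 or counter >= 62 ** length:
        raise ValueError("counter does not fit in the requested length")
    digits = []
    while counter > 0:
        counter, r = divmod(counter, 62)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits)).rjust(length, ALPHABET[0])

print(short_code(1234567))   # 0005BAN
print(62 ** 7)               # 3521614606208 possible 7-character codes
```

Because consecutive counters produce consecutive codes, a real deployment that needs unguessable links would have to derive the identifier differently, for example from random bytes.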

Unique Identifier Generation

Base62 encoding plays a crucial role in generating compact, unique identifiers for various data systems, including primary keys in databases and API tokens. In NoSQL databases like MongoDB, ObjectIDs (typically 12-byte binary values) can be encoded into Base62 strings to create shorter, alphanumeric representations suitable for storage and querying, as facilitated by libraries such as base-id.^[22] Similarly, in relational databases, Base62-encoded UUIDs serve as primary keys, providing consistent 22-character strings that enhance readability and URL compatibility without sacrificing uniqueness.^[23] For API tokens, systems like Qovery employ Base62 to produce secure, delimited strings composed solely of alphanumeric characters, ensuring they are URL-safe and easy to handle in authentication flows.^[24] In file naming conventions, Base62 identifiers derived from UUIDs or hashes prevent collisions in distributed storage environments by generating concise, unique labels.^[25]

Two primary methods are used to generate Base62 unique identifiers: encoding sequential counters and encoding random bytes. Sequential generation involves taking an auto-incrementing integer from a database primary key and converting it to Base62, which compresses large numeric values into short strings ideal for high-volume systems.^[26] Random byte encoding, often applied to UUIDs or cryptographic hashes, produces non-sequential IDs that maintain global uniqueness across distributed nodes. For instance, libraries like Base62UUID start with a random v4 UUID (128 bits) and encode it to Base62, yielding identifiers such as "5rljkyCY7vXDv2bPAnCQdL".^[23] These methods leverage the full 62-character alphabet (0-9, A-Z, a-z) to maximize density, with decoding procedures allowing retrieval of the original numeric or binary value for database lookups.

A representative example of Base62's capacity is a 10-character identifier, which supports $62^{10} \approx 8.39 \times 10^{17}$ unique combinations, slightly fewer than $2^{60} \approx 1.15 \times 10^{18}$ and sufficient for most large-scale applications.^[27] This brevity contrasts with standard UUIDs, which require 36 characters in their hyphenated hexadecimal format; Base62 encoding reduces a 128-bit UUID to 22 characters, enabling scalable identifier management in databases handling billions of records without excessive string lengths.^[28] Such efficiency supports high-throughput systems, where short IDs minimize storage overhead and improve performance in indexing and transmission.
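The UUID case can be sketched with Python's standard `uuid` module. The helpers `uuid_to_base62` and `base62_to_uuid` are hypothetical names written for this example and are not the API of Base62UUID or any other library mentioned above; padding to 22 characters is assumed so that every 128-bit value has the same length.

```python
import uuid

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def uuid_to_base62(u: uuid.UUID, width: int = 22) -> str:
    """Encode the 128-bit integer value of a UUID, left-padded to `width`."""
    n = u.int
    digits = []
    while n > 0:
        n, r = divmod(n, 62)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits)).rjust(width, ALPHABET[0])

def base62_to_uuid(s: str) -> uuid.UUID:
    """Inverse of uuid_to_base62; str.index raises ValueError on bad characters."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return uuid.UUID(int=n)

u = uuid.uuid4()
code = uuid_to_base62(u)
print(len(str(u)), len(code))        # 36 22
print(base62_to_uuid(code) == u)     # True
```

Twenty-two characters are enough because 62^22 exceeds 2^128, while 62^21 does not.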

Advantages and Disadvantages

Advantages

Base62 encoding provides a compact representation of binary data, carrying approximately 5.95 bits per character (log₂ 62) with its 62-character alphabet, only slightly below Base64's 6 bits per character.^[29] Encoded lengths are therefore close; for example, a 128-bit value requires 22 characters in Base62, compared to 24 characters in padded Base64 (22 once the two "=" padding characters are dropped).^[30]^[31]

The character set of Base62 consists solely of alphanumeric symbols (0-9, A-Z, a-z), making it inherently URL-safe without the need for additional percent-encoding of reserved characters like +, /, or = found in Base64.^[32] This property simplifies integration in web applications and reduces potential errors in URL handling.^[33]

Base62's use of familiar alphanumeric characters enhances human readability and ease of transcription, as the output resembles typical identifiers rather than arbitrary symbols.^[33] Its case sensitivity, which distinguishes uppercase from lowercase letters, further increases encoding density, from about 5.17 bits per character for case-insensitive schemes like Base36 to about 5.95.^[34]
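The URL-safety point can be illustrated with the standard library. The sample bytes are arbitrary and were picked only because their Base64 form consists entirely of "+" and "/"; `quote(..., safe="")` percent-encodes every character that is not unreserved.

```python
import base64
from urllib.parse import quote

raw = bytes([0xFB, 0xFF, 0xFE])               # arbitrary sample input
b64 = base64.b64encode(raw).decode("ascii")

print(b64)                         # +//+
print(quote(b64, safe=""))         # %2B%2F%2F%2B

base62_id = "3D7"                  # any purely alphanumeric string behaves the same
print(quote(base62_id, safe=""))   # 3D7
```

The Base64 string triples in length once escaped, whereas the alphanumeric string passes through unchanged.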

Disadvantages

Base62 encoding, while compact, introduces case sensitivity as a notable limitation, distinguishing between uppercase and lowercase letters in its 62-character alphabet. This can lead to errors in systems or user interfaces where inputs are expected to be case-insensitive, such as certain password fields or database collations that normalize case, potentially causing mismatches during validation or storage.^[35]^[36]

Another challenge arises from visual ambiguities among characters like '0' (zero) and 'O' (uppercase O), or '1' (one), 'l' (lowercase L), and 'I' (uppercase I), which can appear similar in certain fonts or under poor visibility conditions. These similarities increase the risk of transcription errors during manual entry, such as when users type identifiers from screenshots or handwritten notes, potentially leading to invalid decodings or security incidents in applications relying on human input.^[37]^[9]

Base62 lacks a formal standardization, resulting in variations across implementations, including differences in alphabet ordering (e.g., whether uppercase or lowercase letters come first) or even exclusions of certain lowercase letters in some custom setups. This absence of a universal specification can cause interoperability issues, where strings encoded in one library fail to decode correctly in another, complicating integration in multi-vendor environments.^[13]^[38]

In terms of computational performance, Base62 operations are less efficient on binary-oriented hardware than power-of-two bases like Base64. Because 62 is not a power of two, no whole number of Base62 characters corresponds exactly to a whole number of bytes, and encoding requires repeated divisions by 62 instead of bit shifts. This misalignment prevents simple block-based processing, often resulting in overhead during encoding and decoding, with Base62 generally underperforming Base64 in speed for large datasets.^[39]^[13]

From a security perspective, Base62 is not suitable for cryptographic applications, as it is a deterministic encoding scheme rather than a secure hashing or randomization method, offering no resistance to collision attacks or entropy guarantees. Additionally, when applied to sequential integer IDs (e.g., auto-incrementing database keys), the resulting strings remain predictable, enabling attackers to enumerate or guess subsequent identifiers through simple incrementation, which exposes sensitive resources in systems like URL shorteners.^[40]^[41]
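The interoperability problem caused by differing alphabet orders can be reproduced in a few lines. The two alphabets and the helper names `encode` and `decode` are illustrative; they stand in for two hypothetical libraries that disagree on whether uppercase or lowercase letters come first.

```python
UPPER_FIRST = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
LOWER_FIRST = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode(n: int, alphabet: str) -> str:
    """Base62-encode a non-negative integer with the given alphabet."""
    digits = []
    while n > 0:
        n, r = divmod(n, 62)
        digits.append(alphabet[r])
    return "".join(reversed(digits)) or alphabet[0]

def decode(s: str, alphabet: str) -> int:
    """Decode using the given alphabet; the caller must pick the right one."""
    value = 0
    for ch in s:
        value = value * 62 + alphabet.index(ch)
    return value

code = encode(11157, LOWER_FIRST)
print(code)                          # 2TX
print(encode(11157, UPPER_FIRST))    # 2tx
print(decode(code, UPPER_FIRST))     # 9519, silently wrong
```

No error is raised in the mismatched case because every character is valid in both alphabets, which is what makes this class of bug hard to notice.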

Historical Development

Origins

Base62 encoding emerged in the early 2000s as a variant of Base64 tailored for web applications, employing only the 62 alphanumeric characters (A–Z, a–z, 0–9) to eliminate URL-unsafe symbols such as +, /, and padding with =. The foundational Base64 encoding, originally developed for binary data transport in email and standardized in RFC 2045 for MIME types, had highlighted the need for safer alternatives in URL contexts where special characters could trigger encoding issues or parsing errors. One of the earliest documented proposals for Base62 appeared in 2001, when Pei-Chi Wu introduced UTF-62, a transformation format for the ISO 10646 Universal Character Set. This approach encoded multilingual programming language identifiers into a compact, case-sensitive alphanumeric string while preserving the lexicographic sorting order of UCS-4, offering space efficiency for non-English symbols without requiring full 31-bit representations.^[6] Practical implementation followed soon after, with Pip Stuart releasing the Perl module Math::BaseCnv on CPAN in July 2003. The module supported rapid conversion between arbitrary bases from 2 to 36 (and up to 64 or 96 with custom digits), explicitly including base62 via the digit set ['0'..'9', 'a'..'z', 'A'..'Z'], enabling developers to generate alphanumeric representations of integers for various applications.^[42] By the mid-2000s, Base62 gained traction through custom scripts in Perl and PHP for web services requiring short, human-readable identifiers, particularly amid the Web 2.0 boom that emphasized user-generated content and shareable links. Notable early adoption occurred in URL shortening tools; for instance, Qurl.net employed Base62 encoding by 2007 to produce concise, alphanumeric shortcodes from sequential IDs.^[43] There is no single inventor of Base62, as it organically evolved from existing base-conversion libraries and the practical demands for compact, URL-compatible encodings in emerging online platforms.

Evolution and Adoption

Base62 encoding saw significant growth in the late 2000s and early 2010s, largely propelled by the demand for compact, URL-safe identifiers in web services. Its popularity surged alongside the rise of URL shortening platforms, with services like Bitly—launched in 2008—exemplifying the need for efficient alphanumeric representations that could pack more information into fewer characters than hexadecimal or decimal systems. This era marked a key milestone, as Base62's 62-character alphabet (0-9, A-Z, a-z) became a go-to for generating short codes without special symbols that could disrupt URLs.^[44] Adoption accelerated through dedicated libraries across programming languages, enabling developers to integrate Base62 seamlessly into applications. In Python, the pybase62 module was released in 2013, offering straightforward encoding and decoding functions for integers and binary data. JavaScript followed suit around 2015, with packages like base62.js and base-x providing Node.js support for converting UUIDs and other data into compact strings, particularly useful for browser-based and server-side web tools. In the Ruby ecosystem, gems such as base62 emerged by 2014, facilitating its use in frameworks like Ruby on Rails for tasks like ID obfuscation and short-link generation. By 2025, Base62 has solidified as a standard in cloud computing and open-source development, with widespread proliferation across repositories on GitHub and npm, where dozens of implementations support diverse use cases from data serialization to identifier shortening. 
A notable example of enterprise adoption is Amazon Web Services' Quantum Ledger Database (QLDB), which represents 128-bit globally unique document IDs using base62 encoding to ensure compactness and readability.^[45] This integration highlights Base62's maturity, as it balances density with compatibility in distributed systems, though it continues to evolve alongside needs for even shorter or more secure encodings in emerging technologies.

Comparisons with Other Encodings

Base64

Base64 is a binary-to-text encoding scheme that represents binary data using a set of 64 printable ASCII characters, encoding 6 bits per character.^[46] Its character set consists of the uppercase letters A–Z, lowercase letters a–z, digits 0–9, plus the symbols "+" and "/", with "=" used for padding to ensure the output length is a multiple of 4 characters.^[46] Developed for safe transmission of binary data in text-based protocols, Base64 originated in the MIME standard for email attachments and remains widely used for embedding binary content in formats like JSON, XML, and HTTP.^[46]

In contrast to Base62, which employs only the 62 alphanumeric characters (0–9, A–Z, a–z) and encodes approximately 5.95 bits per character, Base64's inclusion of special characters like "+" and "/" often requires URL encoding in web contexts, potentially complicating use in identifiers or links. While Base64 achieves higher density per character due to its power-of-2 base (exactly 6 bits), its padding mechanism can increase overall length for inputs not divisible by 3 bytes, whereas Base62 avoids padding entirely, sometimes resulting in shorter outputs for certain data sizes. Additionally, Base64's standardization provides robust library support across languages, unlike the more ad-hoc implementations of Base62.

Base64 is preferable for general-purpose binary data transfer, such as in email, web APIs, or file attachments, where efficiency in representing arbitrary binary streams and compatibility with legacy systems are prioritized.^[46] Base62, however, suits web-friendly applications like URL shortening or unique IDs, where alphanumeric-only output enhances readability and avoids encoding pitfalls without sacrificing much compactness. For example, encoding 128 bits (16 bytes) of data yields 24 characters in Base64 including padding (22 without it), compared to 22 characters in Base62.
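The length comparison for 16 bytes can be checked directly. The Base64 side uses Python's standard `base64` module; the Base62 side uses a small hypothetical helper, `encode_bytes`, that treats the input as one big-endian integer. The all-`0xFF` input is chosen so the Base62 result has its maximum length for 128 bits.

```python
import base64

ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def encode_bytes(data: bytes) -> str:
    """Base62-encode bytes as a single big-endian integer (no padding)."""
    n = int.from_bytes(data, "big")
    digits = []
    while n > 0:
        n, r = divmod(n, 62)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits)) or ALPHABET[0]

data = b"\xff" * 16                                  # 128 bits, all ones
b64 = base64.b64encode(data).decode("ascii")
b62 = encode_bytes(data)

print(len(b64), len(b64.rstrip("=")), len(b62))      # 24 22 22
```

Inputs with many leading zero bits yield shorter Base62 strings under this scheme, whereas the Base64 length depends only on the byte count.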

Base32

Base32 is a binary-to-text encoding scheme that utilizes a 32-character alphabet comprising the uppercase letters A–Z and the digits 2–7, with each character representing 5 bits of data.^[46] Specified in RFC 4648, this encoding is designed for case-insensitive representation of arbitrary binary sequences and deliberately excludes lowercase letters to enhance simplicity and compatibility with systems handling only 7-bit ASCII.^[46] It is widely employed in one-time password systems, such as Time-based One-Time Password (TOTP), where shared secrets are encoded as Base32 strings for easy provisioning via QR codes in applications like Google Authenticator.^[47]

Compared to Base62, which employs a 62-character set of digits 0–9, uppercase A–Z, and lowercase a–z to encode roughly 5.95 bits per character ($\log_2 62 \approx 5.95$), Base32 generates longer outputs, approximately 20% more characters for equivalent data, due to its smaller alphabet. However, Base32 offers greater resistance to transcription errors in human-readable contexts by omitting potentially confusable elements like lowercase letters, zero (0), one (1), eight (8), and nine (9), thereby minimizing ambiguities such as O versus 0 or I versus 1.^[46] Base62 provides higher density, making it preferable for non-security-sensitive applications focused on compactness.^[48]

For a 64-bit value, Base32 encoding requires 13 characters ($\lceil 64 / 5 \rceil = 13$), while Base62 needs 11 ($\lceil 64 / \log_2 62 \rceil = 11$). This length disparity highlights Base32's trade-off: enhanced safety at the expense of efficiency. Base32 suits scenarios involving manual input, like TOTP secrets entered into authenticator apps, where error minimization is paramount.^[47] Conversely, Base62 excels in database identifiers or URL shorteners, prioritizing minimal length for storage and transmission without relying on human parsing.^[48]

Base58

Base58 is a binary-to-text encoding scheme utilizing 58 alphanumeric characters, specifically excluding 0, O, I, and l to minimize visual ambiguities that could lead to transcription errors in human-readable formats.^[49] This design choice makes it particularly suitable for cryptographic applications, such as encoding Bitcoin addresses, where accuracy in manual entry is critical for financial security.^[49] Each Base58 character encodes approximately 5.86 bits of data, calculated as $\log_2 58 \approx 5.858$.^[50]

In contrast to Base62, which employs the complete set of 62 alphanumeric characters (0-9, A-Z, a-z) to achieve higher encoding density, Base58 sacrifices some compactness for enhanced readability in contexts like cryptocurrency wallets and keys.^[49] While Base58 produces slightly shorter strings than Base32 encodings due to its larger alphabet, it results in longer outputs compared to Base62 for the same input data, as Base62's additional characters allow for more efficient packing.^[49] The trade-offs highlight distinct use cases: Base58 excels in secure, human-interpretable identifiers for finance and crypto, where avoiding character confusion outweighs minor length increases, whereas Base62 is preferred for general web applications, such as URL shortening or IDs, where such ambiguities are tolerable and brevity is prioritized.^[49] For example, encoding a 160-bit hash requires approximately 28 characters in Base58 but only about 27 in Base62, demonstrating the density advantage of the latter.^[49]
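The character counts quoted in these comparisons can be reproduced with exact integer arithmetic. The helper `chars_needed` is a hypothetical name; it returns the smallest string length whose capacity covers every value of the given bit width, ignoring any padding characters a format may add.

```python
import math

def chars_needed(bits: int, base: int) -> int:
    """Smallest k such that base**k >= 2**bits (no floating-point rounding)."""
    k, capacity = 0, 1
    while capacity < 2 ** bits:
        capacity *= base
        k += 1
    return k

for base in (32, 58, 62, 64):
    print(base,
          round(math.log2(base), 3),   # bits carried per character
          chars_needed(64, base),
          chars_needed(128, base),
          chars_needed(160, base))
```

Using integer multiplication instead of `math.log` avoids off-by-one results when a bit count sits close to a multiple of the per-character density.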
