Base62
Base62 is a binary-to-text encoding that represents arbitrary data (including binary data) as ASCII text. It encodes data using the 62 letters and digits of ASCII: capital letters A-Z, lowercase letters a-z, and digits 0–9.[1][2]
- 123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz = 58 characters = base58
- 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz = 62 characters = base62
- 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz+/ = 64 characters = base64
Alphabet
The Base62 alphabet:

| Value | Char | Value | Char | Value | Char | Value | Char |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 16 | G | 32 | W | 48 | m |
| 1 | 1 | 17 | H | 33 | X | 49 | n |
| 2 | 2 | 18 | I | 34 | Y | 50 | o |
| 3 | 3 | 19 | J | 35 | Z | 51 | p |
| 4 | 4 | 20 | K | 36 | a | 52 | q |
| 5 | 5 | 21 | L | 37 | b | 53 | r |
| 6 | 6 | 22 | M | 38 | c | 54 | s |
| 7 | 7 | 23 | N | 39 | d | 55 | t |
| 8 | 8 | 24 | O | 40 | e | 56 | u |
| 9 | 9 | 25 | P | 41 | f | 57 | v |
| 10 | A | 26 | Q | 42 | g | 58 | w |
| 11 | B | 27 | R | 43 | h | 59 | x |
| 12 | C | 28 | S | 44 | i | 60 | y |
| 13 | D | 29 | T | 45 | j | 61 | z |
| 14 | E | 30 | U | 46 | k | | |
| 15 | F | 31 | V | 47 | l | | |
References
1. Kejing He; Xiancheng Xu; Qiang Yue (November 19, 2008). "A secure, lossless, and compressed Base62 encoding". 2008 11th IEEE Singapore International Conference on Communication Systems. Institute of Electrical and Electronics Engineers. pp. 761–765. doi:10.1109/ICCS.2008.4737287. ISBN 978-1-4244-2423-8. S2CID 10831128. Quoted: "This base62 compressed encoding has been tested"; "The 62 alphanumeric characters (A-Z, a-z, 0–9)".
2. Wu, Pei-Chi (June 18, 2001). "A base62 transformation format of ISO 10646 for multilingual identifiers". Software: Practice and Experience. 31 (12): 1125–1130. doi:10.1002/spe.408. S2CID 32472727. Retrieved August 13, 2020. Quoted: "within a [0–9][A–Z][a–z] range, a total of 62 base characters".
Definition and Components
Definition
Base62 is a binary-to-text encoding scheme that converts binary data or integers into compact strings using a set of 62 alphanumeric characters, representing values in a base-62 positional numeral system.[4] This approach allows for efficient storage and transmission of data in a human-readable format, particularly for large numerical identifiers.[5]

At its core, Base62 relies on positional notation: each character in the string denotes a coefficient multiplied by a power of 62, starting from $62^0$ at the rightmost position. This structure covers a large range of values in few characters; for example, a single-character Base62 string can represent 62 possible values, while a two-character string expands to $62^2 = 3{,}844$ possibilities.[4][5]

While rooted in numeral system theory, Base62 extends beyond simple arithmetic conversions by focusing on the transformation of arbitrary binary sequences into safe, printable text, distinguishing it from purely numerical bases like decimal or hexadecimal.[4] One of its defining properties is URL-safety: it uses only alphanumeric characters and so avoids symbols such as +, /, and =, which can disrupt web contexts or require additional escaping.[5]

Character Set
The Base62 character set comprises 62 alphanumeric characters from the ASCII standard: the digits 0 through 9, the uppercase letters A through Z, and the lowercase letters a through z.[6] This set is defined in the UTF-62 transformation format for ISO 10646, which maps these characters to integer values 0 through 61 to enable compact representation of multilingual identifiers while preserving lexicographic sorting order.[6]

The common mapping convention assigns values sequentially based on the alphabet string "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz", where digits receive values 0-9, uppercase letters A-Z receive 10-35, and lowercase letters a-z receive 36-61.[6][7] Base62 encoding is case-sensitive: uppercase and lowercase letters are distinct symbols, which is what makes all 62 values available. While this order (0-9, then A-Z, then a-z) is widely adopted, some implementations reverse the case order to "0-9a-zA-Z" for specific applications; encoder and decoder must use the same order for decoding to be correct.[7]

The full mapping of characters to their decimal equivalents is provided in the table below:

| Value | Character | Value | Character | Value | Character | Value | Character |
|---|---|---|---|---|---|---|---|
| 0 | 0 | 20 | K | 40 | e | 60 | y |
| 1 | 1 | 21 | L | 41 | f | 61 | z |
| 2 | 2 | 22 | M | 42 | g | | |
| 3 | 3 | 23 | N | 43 | h | | |
| 4 | 4 | 24 | O | 44 | i | | |
| 5 | 5 | 25 | P | 45 | j | | |
| 6 | 6 | 26 | Q | 46 | k | | |
| 7 | 7 | 27 | R | 47 | l | | |
| 8 | 8 | 28 | S | 48 | m | | |
| 9 | 9 | 29 | T | 49 | n | | |
| 10 | A | 30 | U | 50 | o | | |
| 11 | B | 31 | V | 51 | p | | |
| 12 | C | 32 | W | 52 | q | | |
| 13 | D | 33 | X | 53 | r | | |
| 14 | E | 34 | Y | 54 | s | | |
| 15 | F | 35 | Z | 55 | t | | |
| 16 | G | 36 | a | 56 | u | | |
| 17 | H | 37 | b | 57 | v | | |
| 18 | I | 38 | c | 58 | w | | |
| 19 | J | 39 | d | 59 | x | | |
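
As an illustration of the two orderings described above, the following Python sketch builds both alphabet strings and a reverse lookup table. The names `ALPHABET`, `ALPHABET_LOWER_FIRST`, and `CHAR_TO_VALUE` are hypothetical and chosen only for this example; they do not come from any particular library.

```python
import string

# Common ordering: digits, then uppercase, then lowercase (values 0..61).
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase

# Alternative ordering used by some implementations: digits, lowercase, uppercase.
ALPHABET_LOWER_FIRST = string.digits + string.ascii_lowercase + string.ascii_uppercase

# Reverse lookup for the common ordering: character -> numeric value.
CHAR_TO_VALUE = {ch: value for value, ch in enumerate(ALPHABET)}

print(len(ALPHABET))                    # 62
print(ALPHABET[10], CHAR_TO_VALUE["a"])  # A 36
print(ALPHABET_LOWER_FIRST[10])          # a
```

The same character maps to different values under the two orderings ('a' is 36 in one and 10 in the other), which is why a string encoded with one ordering decodes incorrectly with the other.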
Encoding and Decoding
Encoding Procedure
The Base62 encoding procedure transforms a non-negative integer or arbitrary binary data into a compact string using a 62-character alphabet, typically the digits 0-9, uppercase letters A-Z, and lowercase letters a-z.[10] The process is the standard positional numeral system conversion with base 62 in place of base 10, which yields denser strings suited to identifiers and short links.[1]

Integer Encoding Algorithm
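To encode a non-negative integer $n$ into Base62, the following iterative division algorithm is used, which generates digits from least to most significant and then reverses them to produce the standard most-significant-digit-first order:
- If $n = 0$, return the single character "0" as the encoded string.
- Initialize an empty list or string to collect the digits.
- While $n > 0$:
  - Compute the remainder $r = n \bmod 62$ and append the alphabet character at index $r$.
  - Replace $n$ with $\lfloor n / 62 \rfloor$.
- Reverse the collected digits and concatenate them into the final string.

For example, encoding $n = 12345$ proceeds as follows:
- $12345 \bmod 62 = 7$ (maps to '7'), $\lfloor 12345 / 62 \rfloor = 199$.
- $199 \bmod 62 = 13$ (maps to 'D'), $\lfloor 199 / 62 \rfloor = 3$.
- $3 \bmod 62 = 3$ (maps to '3'), $\lfloor 3 / 62 \rfloor = 0$.
- Collected digits: ['7', 'D', '3']; reversed: "3D7".[12]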

    function encodeInteger(n):
        if n == 0:
            return "0"
        digits = []
        while n > 0:
            r = n % 62
            digits.append(alphabet[r])
            n = floor(n / 62)
        reverse(digits)
        return join(digits)

Here, alphabet is the 62-character Base62 alphabet string.[10]
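
The pseudocode translates fairly directly into Python. The following is a minimal sketch rather than a reference implementation: the function name `encode_integer` and the constant `ALPHABET` are hypothetical, the digits-uppercase-lowercase ordering is assumed, and rejecting negative input is a choice made for this example.

```python
import string

# Assumed ordering: digits, uppercase, lowercase.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def encode_integer(n: int) -> str:
    """Encode a non-negative integer as a Base62 string."""
    if n < 0:
        raise ValueError("only non-negative integers are supported in this sketch")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        # divmod returns the quotient and the remainder in a single step.
        n, r = divmod(n, 62)
        digits.append(ALPHABET[r])
    # Digits were produced least significant first, so reverse them.
    return "".join(reversed(digits))


print(encode_integer(12345))  # 3D7
```

Because Python integers are arbitrary-precision, the same function also works for values larger than 64 bits.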
Edge cases include encoding 0, which produces the single character "0" without additional digits.[10] The algorithm inherently omits leading zeros: it stops when $n = 0$, so the most significant digit it emits is always non-zero. For a 64-bit integer (up to $2^{64} - 1$), the maximum encoded length is 11 characters, as $62^{10} < 2^{64} \le 62^{11}$; this can be derived by solving $L = \lceil \log_{62} 2^{64} \rceil$, where $\log_{62} 2^{64} = 64 / \log_2 62$, yielding approximately 10.75, rounded up to 11.[1]
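
The length bound can be checked numerically. The sketch below is illustrative only: `max_encoded_length` is a hypothetical helper that finds the smallest number of Base62 digits able to represent every unsigned integer of a given bit width, using exact integer arithmetic to avoid floating-point rounding at the boundary.

```python
import math


def max_encoded_length(bits: int) -> int:
    """Smallest L with 62**L >= 2**bits, for bits >= 1."""
    if bits < 1:
        raise ValueError("bits must be at least 1")
    limit = 1 << bits  # number of distinct values: 2**bits
    length, capacity = 0, 1
    while capacity < limit:
        capacity *= 62
        length += 1
    return length


print(max_encoded_length(64))           # 11
print(round(64 / math.log2(62), 2))     # 10.75
```

Under the same reasoning, 32-bit values need at most 6 characters and 128-bit values at most 22.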
Handling Binary Data
For binary data, such as a byte array, the encoding treats the input as a single large integer in big-endian byte order (most significant byte first). This integer is then encoded using the integer algorithm above. To convert the byte array to an integer, the bytes are interpreted as coefficients in base 256, forming $N = \sum_{i=0}^{k-1} b_i \cdot 256^{k-1-i}$, where $b_i$ are the byte values and $k$ is the length.[13]

If the binary data requires a fixed output length (e.g., for consistent identifier sizes), leading zero characters may be prepended to the encoded string to pad it; otherwise, no padding is applied, and the length varies with the input size. An empty byte array is typically encoded as "0".[14] For large binary inputs exceeding standard integer sizes, arbitrary-precision arithmetic (e.g., BigInt) is used to handle the divisions without overflow.[13]

Decoding Procedure
The decoding procedure for Base62 reverses the encoding process by interpreting the alphanumeric string as a base-62 numeral and converting it to a decimal integer, which can then be transformed into binary data if needed. This algorithm relies on the standard 62-character set, mapping each symbol to its corresponding value from 0 to 61.[15]

The core steps involve processing the string from left to right (most significant digit first). Begin with an initial value of zero. For each character, multiply the current accumulated value by 62 and add the integer value of that character based on its position in the alphabet: digits 0-9 map to 0-9, uppercase A-Z to 10-35, and lowercase a-z to 36-61. This iterative accumulation efficiently computes the decimal equivalent without needing to calculate explicit powers of 62.[16] The mathematical formulation is as follows:

    value ← 0
    for each character c in the string (left to right):
        value ← value × 62 + char_value(c)

Here, char_value(c) returns the numeric index of c in the Base62 alphabet.[17]
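
A possible Python rendering of this accumulation loop is sketched below. The names `decode_integer` and `CHAR_TO_VALUE` are hypothetical, the digits-uppercase-lowercase ordering is assumed, and raising `ValueError` on empty or invalid input is a choice made for this example rather than behavior specified by the source.

```python
import string

# Assumed ordering: digits, uppercase, lowercase.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
# Plays the role of char_value(c): character -> index in the alphabet.
CHAR_TO_VALUE = {ch: value for value, ch in enumerate(ALPHABET)}


def decode_integer(text: str) -> int:
    """Decode a Base62 string into a non-negative integer."""
    if not text:
        raise ValueError("cannot decode an empty string")
    value = 0
    for ch in text:  # most significant digit first
        if ch not in CHAR_TO_VALUE:
            raise ValueError(f"invalid Base62 character: {ch!r}")
        value = value * 62 + CHAR_TO_VALUE[ch]
    return value


print(decode_integer("3D7"))  # 12345
```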
For instance, consider decoding the string "3D7", which represents the integer 12345:
- Start with value = 0.
- First character '3' maps to 3: value = 0 × 62 + 3 = 3.
- Second character 'D' maps to 13 (uppercase letters start at A = 10, so D = 13): value = 3 × 62 + 13 = 199.
- Third character '7' maps to 7: value = 199 × 62 + 7 = 12345.
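
Combining this decoding loop with the big-endian byte interpretation described under Handling Binary Data gives a byte-level round trip. The Python sketch below is illustrative: `encode_bytes` and `decode_bytes` are hypothetical names, and the optional `length` parameter is an assumption added for this example, since converting bytes to a single integer discards leading zero bytes unless the original length is known from elsewhere.

```python
import string
from typing import Optional

# Assumed ordering: digits, uppercase, lowercase.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
CHAR_TO_VALUE = {ch: value for value, ch in enumerate(ALPHABET)}


def encode_bytes(data: bytes) -> str:
    """Interpret data as a big-endian integer and encode it in Base62."""
    n = int.from_bytes(data, "big")  # an empty input yields 0
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, r = divmod(n, 62)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))


def decode_bytes(text: str, length: Optional[int] = None) -> bytes:
    """Decode Base62 text to bytes; pass length to restore leading zero bytes."""
    value = 0
    for ch in text:
        value = value * 62 + CHAR_TO_VALUE[ch]
    if length is None:
        # Minimal number of bytes that holds the value (0 bytes for value 0).
        length = (value.bit_length() + 7) // 8
    return value.to_bytes(length, "big")


print(encode_bytes(b"09"))                # 3D7 (0x3039 == 12345)
print(decode_bytes("3D7"))                # b'09'
print(decode_bytes("1", length=2))        # b'\x00\x01'
```

Without a `length` argument, `b"\x00\x01"` and `b"\x01"` both encode to "1" and cannot be told apart on decoding, so systems that need exact byte recovery usually fix the length or store it separately.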
