Salsa20
from Wikipedia
Salsa20
The Salsa quarter-round function. Four parallel copies make a round.
General
Designers: Daniel J. Bernstein
First published: 2007 (designed 2005)[1]
Successors: ChaCha
Related to: Rumba20
Certification: eSTREAM portfolio
Cipher detail
Key sizes: 128 or 256 bits
State size: 512 bits
Structure: ARX
Rounds: 20
Speed: 3.91 cpb on an Intel Core 2 Duo[2]
Best public cryptanalysis
2008 cryptanalysis breaks 8 out of 20 rounds to recover the 256-bit secret key in 2^251 operations, using 2^31 keystream pairs.[3]

Salsa20 and the closely related ChaCha are stream ciphers developed by Daniel J. Bernstein. Salsa20, the original cipher, was designed in 2005, then later submitted to the eSTREAM European Union cryptographic validation process by Bernstein. ChaCha is a modification of Salsa20 published in 2008. It uses a new round function that increases diffusion and increases performance on some architectures.[4]

Both ciphers are built on a pseudorandom function based on add–rotate–XOR (ARX) operations — 32-bit addition, bitwise addition (XOR) and rotation operations. The core function maps a 256-bit key, a 64-bit nonce, and a 64-bit counter to a 512-bit block of the key stream (a Salsa version with a 128-bit key also exists). This gives Salsa20 and ChaCha the unusual advantage that the user can efficiently seek to any position in the key stream in constant time. Salsa20 offers speeds of around 4–14 cycles per byte in software on modern x86 processors,[5] and reasonable hardware performance. It is not patented, and Bernstein has written several public domain implementations optimized for common architectures.[6]
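The constant-time seek follows from the counter being a direct input to the core function: a byte offset maps to a block counter and a position inside that 64-byte block, and no earlier blocks need to be generated. A minimal sketch, assuming a hypothetical helper `seek_pos` and struct `stream_pos` (neither is part of any reference implementation):

```c
#include <assert.h>
#include <stdint.h>

/* Position of a byte inside the key stream: which 64-byte block it
   lives in, and where inside that block. (Hypothetical type.) */
typedef struct {
    uint64_t block;   /* value to load into the 64-bit block counter */
    unsigned offset;  /* 0..63, byte index inside that block */
} stream_pos;

/* Hypothetical helper: map an absolute key-stream byte offset to a
   block counter and an intra-block offset. The cost does not depend
   on how far into the stream the offset lies. */
static stream_pos seek_pos(uint64_t byte_offset)
{
    stream_pos p;
    p.block  = byte_offset / 64;
    p.offset = (unsigned)(byte_offset % 64);
    assert(p.offset < 64);
    return p;
}
```

A caller would load `block` into the two counter words of the state, run the core function once, and discard the first `offset` bytes of the resulting block.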

Structure

Internally, the cipher uses bitwise addition ⊕ (exclusive OR), 32-bit addition mod 2^32 ⊞, and constant-distance rotation operations <<< on an internal state of sixteen 32-bit words arranged as a 4×4 matrix. Using only add-rotate-xor operations avoids the possibility of timing attacks in software implementations.

0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15

The initial state is made up of eight words of key, two words of stream position, two words of nonce (essentially additional stream position bits), and four fixed words:

Initial state of Salsa20
"expa" Key Key Key
Key "nd 3" Nonce Nonce
Pos. Pos. "2-by" Key
Key Key Key "te k"

The constant words spell "expand 32-byte k" in ASCII (i.e. the 4 words are "expa", "nd 3", "2-by", and "te k"). This is an example of a nothing-up-my-sleeve number. The core operation in Salsa20 is the quarter-round QR(a, b, c, d) that takes a four-word input and produces a four-word output:

b ^= (a + d) <<<  7;
c ^= (b + a) <<<  9;
d ^= (c + b) <<< 13;
a ^= (d + c) <<< 18;

Odd-numbered rounds apply QR(a, b, c, d) to each of the four columns in the 4×4 matrix, and even-numbered rounds apply it to each of the four rows. Two consecutive rounds (column-round and row-round) together are called a double-round:

// Odd round
QR( 0,  4,  8, 12) // column 1
QR( 5,  9, 13,  1) // column 2
QR(10, 14,  2,  6) // column 3
QR(15,  3,  7, 11) // column 4
// Even round
QR( 0,  1,  2,  3) // row 1
QR( 5,  6,  7,  4) // row 2
QR(10, 11,  8,  9) // row 3
QR(15, 12, 13, 14) // row 4

An implementation in C/C++ appears below.

#include <stdint.h>
#define ROTL(a,b) (((a) << (b)) | ((a) >> (32 - (b))))
#define QR(a, b, c, d)(  \
	b ^= ROTL(a + d, 7), \
	c ^= ROTL(b + a, 9), \
	d ^= ROTL(c + b,13), \
	a ^= ROTL(d + c,18))
#define ROUNDS 20

void salsa20_block(uint32_t out[16], uint32_t const in[16])
{
	int i;
	uint32_t x[16];

	for (i = 0; i < 16; ++i)
		x[i] = in[i];
	// 10 loops × 2 rounds/loop = 20 rounds
	for (i = 0; i < ROUNDS; i += 2) {
		// Odd round
		QR(x[ 0], x[ 4], x[ 8], x[12]); // column 1
		QR(x[ 5], x[ 9], x[13], x[ 1]); // column 2
		QR(x[10], x[14], x[ 2], x[ 6]); // column 3
		QR(x[15], x[ 3], x[ 7], x[11]); // column 4
		// Even round
		QR(x[ 0], x[ 1], x[ 2], x[ 3]); // row 1
		QR(x[ 5], x[ 6], x[ 7], x[ 4]); // row 2
		QR(x[10], x[11], x[ 8], x[ 9]); // row 3
		QR(x[15], x[12], x[13], x[14]); // row 4
	}
	for (i = 0; i < 16; ++i)
		out[i] = x[i] + in[i];
}

In the last line, the mixed array is added, word by word, to the original array to obtain its 64-byte key stream block. This is important because the mixing rounds on their own are invertible. In other words, applying the reverse operations would produce the original 4×4 matrix, including the key. Adding the mixed array to the original makes it impossible to recover the input. (This same technique is widely used in hash functions from MD4 through SHA-2.)
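The invertibility of the mixing rounds can be seen at the level of a single quarter-round: each step XORs one word with a function of the other three, so running the four steps in reverse order undoes them. The sketch below is illustrative only; the function names `qr` and `qr_inverse` are assumptions, not taken from any reference code.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static uint32_t rotl32(uint32_t v, int n)
{
    assert(n > 0 && n < 32);
    return (v << n) | (v >> (32 - n));
}

/* Forward Salsa20 quarter-round, same steps as the QR macro above. */
static void qr(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
    *b ^= rotl32(*a + *d, 7);
    *c ^= rotl32(*b + *a, 9);
    *d ^= rotl32(*c + *b, 13);
    *a ^= rotl32(*d + *c, 18);
}

/* Inverse quarter-round: the same four XORs, applied last-to-first.
   Each XOR cancels because the words it depends on have already been
   restored (or were not yet modified) at that point. */
static void qr_inverse(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
    *a ^= rotl32(*d + *c, 18);
    *d ^= rotl32(*c + *b, 13);
    *c ^= rotl32(*b + *a, 9);
    *b ^= rotl32(*a + *d, 7);
}
```

Applying `qr` and then `qr_inverse` to any four words returns the original values, which is why the final word-wise addition of the input is needed to make the block function one-way.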

Salsa20 performs 20 rounds of mixing on its input.[1] However, reduced-round variants Salsa20/8 and Salsa20/12 using 8 and 12 rounds respectively have also been introduced. These variants were introduced to complement the original Salsa20, not to replace it, and perform better[note 1] in the eSTREAM benchmarks than Salsa20, though with a correspondingly lower security margin.

XSalsa20 with 192-bit nonce

In 2008, Bernstein proposed a variant of Salsa20 with 192-bit nonces called XSalsa20.[7][8][9] XSalsa20 is provably secure if Salsa20 is secure, but is more suitable for applications where longer nonces are desired. XSalsa20 feeds the key and the first 128 bits of the nonce into one block of Salsa20 (without the final addition, which may either be omitted, or subtracted after a standard Salsa20 block), and uses 256 bits of the output as the key for standard Salsa20 using the last 64 bits of the nonce and the stream position. Specifically, the 256 bits of output used are those corresponding to the non-secret portions of the input: indexes 0, 5, 10, 15, 6, 7, 8 and 9.

eSTREAM selection of Salsa20

Salsa20/12 has been selected as a Phase 3 design for Profile 1 (software) by the eSTREAM project, receiving the highest weighted voting score of any Profile 1 algorithm at the end of Phase 2.[10] Salsa20 had previously been selected as a Phase 2 Focus design for Profile 1 (software) and as a Phase 2 design for Profile 2 (hardware) by the eSTREAM project,[11] but was not advanced to Phase 3 for Profile 2 because eSTREAM felt that it was probably not a good candidate for extremely resource-constrained hardware environments.[12]

The eSTREAM committee recommends the use of Salsa20/12, the 12-round variant, for "combining very good performance with a comfortable margin of security."[13]

Cryptanalysis of Salsa20

As of 2015, there are no published attacks on Salsa20/12 or the full Salsa20/20; the best attack known[3] breaks 8 of the 12 or 20 rounds.

In 2005, Paul Crowley reported an attack on Salsa20/5 with an estimated time complexity of 2^165 and won Bernstein's US$1000 prize for "most interesting Salsa20 cryptanalysis".[14] This attack and all subsequent attacks are based on truncated differential cryptanalysis. In 2006, Fischer, Meier, Berbain, Biasse, and Robshaw reported an attack on Salsa20/6 with estimated time complexity of 2^177, and a related-key attack on Salsa20/7 with estimated time complexity of 2^217.[15]

In 2007, Tsunoo et al. announced a cryptanalysis of Salsa20 which breaks 8 out of 20 rounds to recover the 256-bit secret key in 2^255 operations, using 2^11.37 keystream pairs.[16] However, this attack does not seem to be competitive with the brute force attack.

In 2008, Aumasson, Fischer, Khazaei, Meier, and Rechberger reported a cryptanalytic attack against Salsa20/7 with a time complexity of 2^151, and they reported an attack against Salsa20/8 with an estimated time complexity of 2^251. This attack makes use of the new concept of probabilistic neutral key bits for probabilistic detection of a truncated differential. The attack can be adapted to break Salsa20/7 with a 128-bit key.[3]

In 2012, the attack by Aumasson et al. was improved by Shi et al. against Salsa20/7 (128-bit key) to a time complexity of 2^109 and Salsa20/8 (256-bit key) to 2^250.[17]

In 2013, Mouha and Preneel published a proof[18] that 15 rounds of Salsa20 was 128-bit secure against differential cryptanalysis. (Specifically, it has no differential characteristic with higher probability than 2^−130, so differential cryptanalysis would be more difficult than 128-bit key exhaustion.)

In 2025, Dey et al. reported a cryptanalytic attack against Salsa20/8 with a time complexity of 2^245.84 and data amounting to 2^99.47.[19]

ChaCha variant

ChaCha
The ChaCha quarter-round function. Four parallel copies make a round.
General
Designers: Daniel J. Bernstein
First published: 2008
Derived from: Salsa20
Related to: Rumba20
Cipher detail
Key sizes: 128 or 256 bits
State size: 512 bits
Structure: ARX
Rounds: 20
Speed: 3.95 cpb on an Intel Core 2 Duo[4]: 2

In 2008, Bernstein published the closely related ChaCha family of ciphers, which aim to increase the diffusion per round while achieving the same or slightly better performance.[20] The Aumasson et al. paper also attacks ChaCha, achieving one round fewer (for 256-bit ChaCha6 with complexity 2^139, ChaCha7 with complexity 2^248, and 128-bit ChaCha6 within 2^107) but claims that the attack fails to break 128-bit ChaCha7.[3]

Like Salsa20, ChaCha's initial state includes a 128-bit constant, a 256-bit key, a 64-bit counter, and a 64-bit nonce (in the original version; as described later, a version of ChaCha from RFC 7539 is slightly different), arranged as a 4×4 matrix of 32-bit words.[20] But ChaCha re-arranges some of the words in the initial state:

Initial state of ChaCha
"expa" "nd 3" "2-by" "te k"
Key Key Key Key
Key Key Key Key
Counter Counter Nonce Nonce

The constant is the same as Salsa20 ("expand 32-byte k"). ChaCha replaces the Salsa20 quarter-round QR(a, b, c, d) with:

a += b; d ^= a; d <<<= 16;
c += d; b ^= c; b <<<= 12;
a += b; d ^= a; d <<<=  8;
c += d; b ^= c; b <<<=  7;

Notice that this version updates each word twice, while Salsa20's quarter round updates each word only once. In addition, the ChaCha quarter-round diffuses changes more quickly. On average, after changing 1 input bit the Salsa20 quarter-round will change 8 output bits while ChaCha will change 12.5 output bits.[4]
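The ChaCha quarter-round can be exercised on its own. The sketch below checks it against the quarter-round test vector given in RFC 7539, section 2.1.1; the function names `chacha_qr` and `chacha_qr_selftest` are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a 32-bit word left by n bits (0 < n < 32). */
static uint32_t rotl32(uint32_t v, int n)
{
    assert(n > 0 && n < 32);
    return (v << n) | (v >> (32 - n));
}

/* ChaCha quarter-round: every word is updated twice. */
static void chacha_qr(uint32_t *a, uint32_t *b, uint32_t *c, uint32_t *d)
{
    *a += *b; *d ^= *a; *d = rotl32(*d, 16);
    *c += *d; *b ^= *c; *b = rotl32(*b, 12);
    *a += *b; *d ^= *a; *d = rotl32(*d, 8);
    *c += *d; *b ^= *c; *b = rotl32(*b, 7);
}

/* Returns 1 if chacha_qr reproduces the RFC 7539 section 2.1.1
   quarter-round test vector, 0 otherwise. */
static int chacha_qr_selftest(void)
{
    uint32_t a = 0x11111111u, b = 0x01020304u;
    uint32_t c = 0x9b8d6f43u, d = 0x01234567u;
    chacha_qr(&a, &b, &c, &d);
    return a == 0xea2a92f4u && b == 0xcb1cf8ceu &&
           c == 0x4581472eu && d == 0x5881c4bbu;
}
```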

The ChaCha quarter round has the same number of adds, xors, and bit rotates as the Salsa20 quarter-round, but the fact that two of the rotates are multiples of 8 allows for a small optimization on some architectures including x86.[21] Additionally, the input formatting has been rearranged to support an efficient SSE implementation optimization discovered for Salsa20. Rather than alternating rounds down columns and across rows, they are performed down columns and along diagonals.[4]: 4  Like Salsa20, ChaCha arranges the sixteen 32-bit words in a 4×4 matrix. If we index the matrix elements from 0 to 15

0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15

then a double round in ChaCha is:

// Odd round
QR(0, 4,  8, 12) // column 1
QR(1, 5,  9, 13) // column 2
QR(2, 6, 10, 14) // column 3
QR(3, 7, 11, 15) // column 4
// Even round
QR(0, 5, 10, 15) // diagonal 1 (main diagonal)
QR(1, 6, 11, 12) // diagonal 2
QR(2, 7,  8, 13) // diagonal 3
QR(3, 4,  9, 14) // diagonal 4

ChaCha20 uses 10 iterations of the double round.[22] An implementation in C/C++ appears below.

#include <stdint.h>
#define ROTL(a,b) (((a) << (b)) | ((a) >> (32 - (b))))
#define QR(a, b, c, d) (             \
	a += b, d ^= a, d = ROTL(d, 16), \
	c += d, b ^= c, b = ROTL(b, 12), \
	a += b, d ^= a, d = ROTL(d,  8), \
	c += d, b ^= c, b = ROTL(b,  7))
#define ROUNDS 20

void chacha_block(uint32_t out[16], uint32_t const in[16])
{
	int i;
	uint32_t x[16];

	for (i = 0; i < 16; ++i)
		x[i] = in[i];
	// 10 loops × 2 rounds/loop = 20 rounds
	for (i = 0; i < ROUNDS; i += 2) {
		// Odd round
		QR(x[0], x[4], x[ 8], x[12]); // column 1
		QR(x[1], x[5], x[ 9], x[13]); // column 2
		QR(x[2], x[6], x[10], x[14]); // column 3
		QR(x[3], x[7], x[11], x[15]); // column 4
		// Even round
		QR(x[0], x[5], x[10], x[15]); // diagonal 1 (main diagonal)
		QR(x[1], x[6], x[11], x[12]); // diagonal 2
		QR(x[2], x[7], x[ 8], x[13]); // diagonal 3
		QR(x[3], x[4], x[ 9], x[14]); // diagonal 4
	}
	for (i = 0; i < 16; ++i)
		out[i] = x[i] + in[i];
}

ChaCha is the basis of the BLAKE hash function, a finalist in the NIST hash function competition, and its faster successors BLAKE2 and BLAKE3. It also defines a variant using sixteen 64-bit words (1024 bits of state), with correspondingly adjusted rotation constants.

XChaCha

Although not announced by Bernstein, the security proof of XSalsa20 extends straightforwardly to an analogous XChaCha cipher. Use the key and the first 128 bits of the nonce (in input words 12 through 15) to form a ChaCha input block, then perform the block operation (omitting the final addition). Output words 0–3 and 12–15 (those words corresponding to non-key words of the input) then form the key used for ordinary ChaCha (with the last 64 bits of nonce and 64 bits of block counter).[23]

Reduced-round ChaCha

Aumasson argued in 2020 that 8 rounds of ChaCha (ChaCha8) probably provide enough resistance to future cryptanalysis for the same security level, yielding a 2.5× speedup.[24] A compromise ChaCha12 (based on the eSTREAM recommendation of a 12-round Salsa)[25] also sees some use.[26] The eSTREAM benchmarking suite includes ChaCha8 and ChaCha12.[20]

ChaCha20 adoption

Google had selected ChaCha20 along with Bernstein's Poly1305 message authentication code in SPDY, which was intended as a replacement for TLS over TCP.[27] In the process, they proposed a new authenticated encryption construction combining both algorithms, which is called ChaCha20-Poly1305. ChaCha20 and Poly1305 are now used in the QUIC protocol, which replaces SPDY and is used by HTTP/3.[28][29]

Shortly after Google's adoption for TLS, both the ChaCha20 and Poly1305 algorithms were also used for a new chacha20-poly1305@openssh.com cipher in OpenSSH.[30][31] Subsequently, this made it possible for OpenSSH to avoid any dependency on OpenSSL, via a compile-time option.[32]

ChaCha20 is also used for the arc4random random number generator in FreeBSD,[33] OpenBSD,[34] and NetBSD[35] operating systems, instead of the broken RC4, and in DragonFly BSD[36] for the CSPRNG subroutine of the kernel.[37][38] Starting from version 4.8, the Linux kernel uses the ChaCha20 algorithm to generate data for the nonblocking /dev/urandom device.[39][40][41] ChaCha8 is used for the default PRNG in Golang.[42] Rust's CSPRNG uses ChaCha12.[25]

ChaCha20 usually offers better performance than the more prevalent Advanced Encryption Standard (AES) algorithm on systems where the CPU does not feature AES acceleration (such as the AES instruction set for x86 processors). As a result, ChaCha20 is sometimes preferred over AES in certain use cases involving mobile devices, which mostly use ARM-based CPUs.[43][44] Specialized hardware accelerators for ChaCha20 are also less complex compared to AES accelerators.[45]

ChaCha20-Poly1305 (IETF version; see below) is the exclusive algorithm used by the WireGuard VPN system, as of protocol version 1.[46]

Adiantum (cipher) uses XChaCha12.[47]

Internet standards

An implementation reference for ChaCha20 has been published in RFC 7539. The IETF's implementation modified Bernstein's published algorithm by changing the 64-bit nonce and 64-bit block counter to a 96-bit nonce and 32-bit block counter.[48] The name was not changed when the algorithm was modified, as it is cryptographically insignificant (both form what a cryptographer would recognize as a 128-bit nonce), but the interface change could be a source of confusion for developers. Because of the reduced block counter, the maximum message length that can be safely encrypted by the IETF's variant is 2^32 blocks of 64 bytes (256 GiB). For applications where this is not enough, such as file or disk encryption, RFC 7539 proposes using the original algorithm with 64-bit nonce.

Initial state of ChaCha20 (RFC 7539)[48]
"expa" "nd 3" "2-by" "te k"
Key Key Key Key
Key Key Key Key
Counter Nonce Nonce Nonce
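The RFC 7539 layout in the table above can be sketched as a state-setup routine, together with the message-length limit implied by the 32-bit counter. The names `chacha20_ietf_init` and `chacha20_ietf_max_bytes` are hypothetical; only the word positions and the little-endian loading come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Load four bytes as a little-endian 32-bit word. */
static uint32_t load32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: fill the 4x4 state in the RFC 7539 layout
   (constants, 8 key words, 1 counter word, 3 nonce words). */
static void chacha20_ietf_init(uint32_t st[16], const uint8_t key[32],
                               uint32_t counter, const uint8_t nonce[12])
{
    static const char sigma[] = "expand 32-byte k";
    int i;
    assert(st != NULL && key != NULL && nonce != NULL);
    for (i = 0; i < 4; ++i)
        st[i] = load32_le((const uint8_t *)sigma + 4 * i);
    for (i = 0; i < 8; ++i)
        st[4 + i] = load32_le(key + 4 * i);
    st[12] = counter;
    for (i = 0; i < 3; ++i)
        st[13 + i] = load32_le(nonce + 4 * i);
}

/* Largest key stream per (key, nonce) pair: 2^32 blocks of 64 bytes. */
static uint64_t chacha20_ietf_max_bytes(void)
{
    return ((uint64_t)1 << 32) * 64;   /* 2^38 bytes = 256 GiB */
}
```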

Use of ChaCha20 in IKE and IPsec has been standardized in RFC 7634. Standardization of its use in TLS is published in RFC 7905.

In 2018, RFC 7539 was obsoleted by RFC 8439. RFC 8439 merges in some errata and adds additional security considerations.[49]

See also

  • Speck – an add-rotate-xor cipher developed by the NSA
  • ChaCha20-Poly1305 – an AEAD scheme combining ChaCha20 with the Poly1305 MAC

from Grokipedia
Salsa20 is a family of stream ciphers designed by cryptographer Daniel J. Bernstein in 2005 and submitted to eSTREAM, the ECRYPT Stream Cipher Project, for evaluation as a high-speed software-oriented cipher. It expands a 256-bit key (with optional 128-bit key support), an 8-byte nonce, and a 64-bit block counter into a pseudorandom keystream using simple 32-bit operations (addition, XOR, and constant-distance rotations), without relying on S-boxes, data-dependent branches, or feedback from plaintext or ciphertext. The core Salsa20 function processes a 64-byte input state through multiple quarter-round transformations to produce 64-byte output blocks, enabling parallel computation and random access to a keystream of up to 2^70 bytes.

The family includes variants differentiated by the number of rounds applied to the core function: Salsa20/20 with 20 rounds for the largest security margin, Salsa20/12 with 12 rounds balancing speed and security, and Salsa20/8 with 8 rounds for maximum speed. Salsa20/12 was selected for inclusion in the final eSTREAM portfolio for software implementations (Profile 1), recognizing its efficiency and resistance to known attacks. On a Core 2 processor, Salsa20/20 encrypts at approximately 3.93 cycles per byte, outperforming the AES block cipher in CTR mode (9.2 cycles per byte for 10 rounds), while Salsa20/12 and Salsa20/8 are faster still at 2.80 and 1.88 cycles per byte, respectively.

Salsa20's design emphasizes simplicity and portability to minimize implementation vulnerabilities such as timing attacks, by avoiding cache-dependent operations and ensuring constant-time execution. Security analyses have identified attacks on reduced-round variants, including a distinguisher for 6 rounds and a key-recovery attack on 7 rounds requiring about 2^153 operations, but full-round Salsa20/20 remains unbroken, with an estimated security level of about 2^255 operations against differential attacks. The cipher has seen adoption in cryptographic libraries such as libsodium and NaCl, owing to its speed and robustness in software environments.

History and Development

Origins and Designer

Salsa20 was designed by Daniel J. Bernstein, a prominent cryptographer known for his work on secure and efficient cryptography. It is a refinement of his earlier Salsa10 design from November 2004, and development began in late 2004 or early 2005. Motivated by vulnerabilities in existing ciphers like AES, particularly susceptibility to cache-timing attacks that could leak keys through side-channel analysis, Bernstein sought to create a cipher that inherently resisted such threats by avoiding lookup tables and relying on simple arithmetic operations.

The Salsa20 core was introduced in March 2005, with the initial specification released in April 2005. Salsa20, initially referred to as Snuffle 2005, was first presented at the Symmetric Key Encryption Workshop (SKEW) on May 26–27, 2005, where early discussions occurred. To encourage scrutiny, Bernstein offered a $1000 prize for the best cryptanalytic results by the end of 2005, which was awarded to Paul Crowley for his truncated differential attack on a reduced-round version (five rounds). The full specification of the Salsa20 family was detailed in 2007, following the initial submission to the eSTREAM project in April 2005.

The primary goals of Salsa20's design were simplicity, to facilitate security auditing; high-speed performance across diverse hardware platforms, including embedded systems; and a substantial security margin exceeding that of block ciphers like AES, achieved through a 20-round core function providing 256-bit security. These objectives positioned Salsa20 as a practical alternative for software-based encryption, prioritizing verifiability over complexity while outperforming AES in cycles per byte on processors such as x86.

Initial Publication and Goals

Salsa20 was first introduced by Daniel J. Bernstein in 2005 as a candidate stream cipher for the eSTREAM project, with an initial presentation at the Symmetric Key Encryption Workshop (SKEW) that year, where early cryptanalytic results were also discussed. The design, originally termed Snuffle 2005, was detailed in an initial specification document released in April 2005, emphasizing its role as a high-performance alternative to existing stream ciphers. A comprehensive specification for the Salsa20 family appeared in Bernstein's paper presented at Fast Software Encryption (FSE) 2008, which formalized the cipher's structure and variants.

The primary goals of Salsa20 were to achieve strong security margins while prioritizing software efficiency on general-purpose processors, without relying on table lookups or specialized instructions. To this end, the cipher employs 20 rounds of mixing in its core variant (Salsa20/20), selected to provide robust resistance against known attacks, with the fastest known attack on the full cipher being a 256-bit exhaustive key search. It supports a 256-bit key for high security, generates 64-byte (512-bit) output blocks from a key, a 64-bit nonce, and a 64-bit block counter, and uses only addition, rotation, and XOR (ARX) operations on 32-bit words to ensure simplicity, auditability, and resistance to timing attacks.

This design philosophy explicitly addressed shortcomings in contemporaries like RC4, which suffered from key-dependent biases and vulnerability to state recovery attacks, by avoiding data-dependent branches and substitution tables that could introduce side-channel leaks or analytical weaknesses. Instead, Salsa20's ARX-based design facilitates analysis of its diffusion properties and gives uniform performance, achieving approximately 4 cycles per byte on contemporary CPUs for the 20-round version, making it suitable for resource-constrained software environments.

Core Algorithm

State Initialization

Salsa20 initializes its internal state as a 512-bit (64-byte) block, consisting of 16 32-bit words arranged in a 4×4 matrix. This state serves as the foundation for the core mixing operations and is derived from a 256-bit key (eight 32-bit words), a 64-bit nonce (two 32-bit words), a 64-bit block counter (two 32-bit words, initialized to zero for the first block), and four fixed 32-bit constant words derived from the ASCII string "expand 32-byte k" interpreted in little-endian byte order. The constant words, often denoted as σ (sigma), are σ₀ = 0x61707865 ("expa"), σ₁ = 0x3320646e ("nd 3"), σ₂ = 0x79622d32 ("2-by"), and σ₃ = 0x6b206574 ("te k"). These constants occupy positions 0, 5, 10, and 15 in the linear 16-word array, corresponding to the main diagonal of the 4×4 matrix. The key words are split across the state: the first four key words (k₀ to k₃) occupy positions 1 through 4, and the remaining four (k₄ to k₇) occupy positions 11 through 14. The nonce words (n₀ and n₁) are placed at positions 6 and 7, while the block counter words (c₀ = 0 and c₁ = 0 initially) are at positions 8 and 9. The resulting initial state array can be expressed as:

state[0]  = σ₀       state[1]  = k₀       state[2]  = k₁   state[3]  = k₂
state[4]  = k₃       state[5]  = σ₁       state[6]  = n₀   state[7]  = n₁
state[8]  = c₀ (0)   state[9]  = c₁ (0)   state[10] = σ₂   state[11] = k₄
state[12] = k₅       state[13] = k₆       state[14] = k₇   state[15] = σ₃

When viewed as a 4×4 matrix in row-major order, the state appears as follows:
       Column 0  Column 1  Column 2  Column 3
Row 0  σ₀        k₀        k₁        k₂
Row 1  k₃        σ₁        n₀        n₁
Row 2  c₀ (0)    c₁ (0)    σ₂        k₄
Row 3  k₅        k₆        k₇        σ₃
This arrangement positions the constants along the main diagonal, the key words primarily in the first and last rows (with some spillover into the second and third), the nonce in the latter part of the second row, and the counter in the initial part of the third row. All words are treated as 32-bit little-endian integers.
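The layout described above can be sketched as a small setup routine. This is an illustration under assumptions: the names `load32_le` and `salsa20_init` are hypothetical, and only the word positions and the little-endian convention are taken from the text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Load four bytes as a little-endian 32-bit word. */
static uint32_t load32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: build the Salsa20 initial state for a 256-bit
   key, an 8-byte nonce, and a 64-bit block counter. */
static void salsa20_init(uint32_t st[16], const uint8_t key[32],
                         const uint8_t nonce[8], uint64_t counter)
{
    static const char sigma[] = "expand 32-byte k";
    int i;
    assert(st != NULL && key != NULL && nonce != NULL);
    /* Constants on the main diagonal: positions 0, 5, 10, 15. */
    for (i = 0; i < 4; ++i)
        st[5 * i] = load32_le((const uint8_t *)sigma + 4 * i);
    /* First key half in positions 1..4, second half in 11..14. */
    for (i = 0; i < 4; ++i) {
        st[1 + i]  = load32_le(key + 4 * i);
        st[11 + i] = load32_le(key + 16 + 4 * i);
    }
    /* Nonce in positions 6..7, counter (low word first) in 8..9. */
    st[6] = load32_le(nonce);
    st[7] = load32_le(nonce + 4);
    st[8] = (uint32_t)counter;
    st[9] = (uint32_t)(counter >> 32);
}
```

Loading the ASCII string through `load32_le` yields the four σ values quoted earlier, so the constants do not have to be hard-coded as hexadecimal literals.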

Quarter-Round Function

The quarter-round (QR) function serves as the fundamental nonlinear primitive in Salsa20, operating on four 32-bit little-endian words denoted $a$, $b$, $c$, and $d$, to provide diffusion and mixing within the cipher's state. This function combines addition modulo $2^{32}$, bitwise XOR, and fixed left rotations, forming an ARX (Addition-Rotation-XOR) construction that ensures constant-time execution without reliance on table lookups or S-boxes, thereby mitigating timing attacks and enabling efficient implementation across hardware platforms. The quarter-round proceeds through a fixed sequence of four update steps, each of which adds two words modulo $2^{32}$, rotates the sum left, and XORs the result into a third word, updating the inputs in place:

\begin{align*}
b &\leftarrow b \oplus \big((a + d) \lll 7\big), \\
c &\leftarrow c \oplus \big((b + a) \lll 9\big), \\
d &\leftarrow d \oplus \big((c + b) \lll 13\big), \\
a &\leftarrow a \oplus \big((d + c) \lll 18\big).
\end{align*}

Here $+$ denotes addition modulo $2^{32}$ and $\lll$ denotes left rotation of a 32-bit word. Each step uses the words already updated by the previous steps, and the rotation distances (7, 9, 13, and 18 bits) are chosen to distribute bits effectively across the 32-bit words. In the Salsa20 state, represented as a 4×4 matrix of 32-bit words, the quarter-round is applied to the four words of each column during the column-round phase, followed by application to the four words of each row in the row-round phase, ensuring thorough intermixing between matrix dimensions without explicit transposition. This column-and-row pattern repeats across multiple rounds, with the quarter-round providing the core nonlinearity that resists linear attacks and promotes avalanche effects in the output keystream.

Core Mixing Rounds

The core mixing rounds in Salsa20 form the heart of its core function, which operates on a 16-word (512-bit) state represented as a 4x4 matrix of 32-bit words. These rounds apply the quarter-round function systematically to achieve rapid diffusion across the entire state, ensuring that changes in any input word propagate to all output words after a few iterations. The process consists of 20 rounds, grouped into 10 double-rounds, where each double-round alternates between column-wise and row-wise applications of the quarter-round to mix the state thoroughly.

A double-round begins with a column-round, applying the quarter-round function to each of the four columns in the 4x4 state matrix, followed by a row-round that applies it to each of the four rows. This alternation ensures orthogonal mixing: the column-round diffuses data vertically, while the row-round diffuses it horizontally, promoting avalanche effects in which a single-bit change spreads to a growing share of the state with each round. The odd-numbered rounds (1, 3, ..., 19) are column-rounds and the even-numbered rounds (2, 4, ..., 20) are row-rounds, maintaining the vertical-then-horizontal pattern throughout the 10 double-rounds. This structure, with 8 quarter-rounds per double-round (4 columns + 4 rows), totals 80 quarter-round invocations over 20 rounds, providing strong nonlinear mixing without relying on S-boxes.

After the final double-round, the output is computed by adding the initial state to the fully mixed state, word by word modulo $2^{32}$. This feed-forward addition matters because the rounds alone are invertible: without it, they could be run backwards to recover the input, including the key. The 16 resulting words are then serialized in little-endian order, producing a 64-byte output block from the 64-byte input state.
The round structure can be illustrated in pseudocode as follows, emphasizing the iterative application and diffusion:

function salsa20_core(input_state[16]):
    state = copy(input_state)    // 4x4 matrix of 32-bit words, row-major
    for round in 0 to 19:        // 20 rounds total
        for i in 0 to 3:
            if round % 2 == 0:   // column-round: column i, starting on the diagonal
                quarter_round(state[(5*i) % 16], state[(5*i + 4) % 16],
                              state[(5*i + 8) % 16], state[(5*i + 12) % 16])
            else:                // row-round: row i, starting on the diagonal
                quarter_round(state[4*i + i], state[4*i + (i+1) % 4],
                              state[4*i + (i+2) % 4], state[4*i + (i+3) % 4])
    output = [0]*16
    for i in 0 to 15:
        output[i] = (state[i] + input_state[i]) mod 2^32
    return output                // serialized as a 64-byte little-endian block

This pseudocode highlights how the quarter-round (briefly, a nonlinear combination of addition, XOR, and rotation on four words) is sequenced to maximize inter-word dependencies, achieving full diffusion by the second or third double-round.

Input and Output Mechanics

Key and Nonce Handling

Salsa20 employs a 256-bit key, consisting of 32 bytes, which is divided into two 128-bit halves for direct insertion into the cipher's 64-byte state without any preprocessing or key schedule, enhancing simplicity and computational efficiency. The first half occupies state positions 1 through 4, while the second half is placed in positions 11 through 14, using little-endian byte order for all 32-bit words. This direct loading approach avoids the overhead of key expansion functions found in other ciphers, allowing for faster initialization.

The cipher also supports 128-bit keys as an option, achieved by repeating the 16-byte key material in both the first (positions 1–4) and second (positions 11–14) slots of the state, though the 256-bit variant is recommended for primary use due to its stronger security margin. The two cases use different constants: the string "expand 32-byte k" (encoded as σ values) is used for 256-bit keys, while "expand 16-byte k" (τ values) applies to 128-bit keys, so the two key sizes are distinguished without altering the mixing rounds.

Complementing the key, Salsa20 uses a 64-bit nonce (8 bytes) that must be unique for each message encrypted under the same key to prevent keystream reuse. This nonce is loaded directly into state positions 6 and 7, again in little-endian format, with no expansion or derivation required. To mitigate potential side-channel vulnerabilities, such as timing leaks during state preparation, implementations perform key and nonce insertion in constant time, leveraging the cipher's design that relies solely on fixed-distance operations like additions, XORs, and rotations.
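For the 128-bit case, a minimal sketch of the key setup is shown below, assuming hypothetical helper names `load32_le` and `salsa20_init_key128`; the nonce and counter words are left for the caller to fill in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Load four bytes as a little-endian 32-bit word. */
static uint32_t load32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: place the tau constants and a 16-byte key into
   the state. The same key material fills both key slots. */
static void salsa20_init_key128(uint32_t st[16], const uint8_t key[16])
{
    static const char tau[] = "expand 16-byte k";
    int i;
    assert(st != NULL && key != NULL);
    for (i = 0; i < 4; ++i) {
        st[5 * i]  = load32_le((const uint8_t *)tau + 4 * i); /* diagonal */
        st[1 + i]  = load32_le(key + 4 * i);                  /* slot 1 */
        st[11 + i] = load32_le(key + 4 * i);                  /* slot 2, repeated */
    }
}
```

Only two of the four constant words differ from the 256-bit case ("nd 1" and "6-by" instead of "nd 3" and "2-by"), which is enough to keep a 128-bit key from colliding with a 256-bit key made of the same 16 bytes repeated.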

Stream Generation Process

The stream generation in Salsa20 uses a counter mode construction, where the core function is invoked repeatedly to produce successive 64-byte keystream blocks from the fixed key and nonce. The process begins by initializing a 64-byte (16-word) state array, with the 64-bit counter, represented as two 32-bit little-endian words, placed in state positions 8 and 9, starting at zero. The state is then mixed using the Salsa20 core function, which applies 20 rounds (80 quarter-round operations) organized into 10 double rounds and then adds the original state back in. The result is serialized into a 64-byte keystream block by converting each of the 16 32-bit words to 4 bytes in little-endian order.

To continue generating the stream, the counter is incremented by 1 as a 64-bit integer, updating positions 8 and 9, while the key and nonce remain fixed in their state positions. The updated state undergoes the same mixing process, producing the next 64-byte block, which is appended to the keystream. This loop repeats as needed, allowing for arbitrarily long streams up to the counter's maximum value. Encryption is performed by XORing the generated keystream with the message, truncating the keystream to match the message length; XORing the same keystream with the ciphertext performs decryption. Salsa20 limits output to 2^{64} blocks (2^{70} bytes) per key-nonce pair, after which a new nonce or a new key is required, since the counter would wrap around and the keystream would repeat.
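The loop described above can be sketched as follows. The block function repeats the Wikipedia-style implementation from earlier in the article; the wrapper `salsa20_xor` is a hypothetical name, and it assumes the caller has already filled the state with constants, key, nonce, and starting counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ROTL(a,b) (((a) << (b)) | ((a) >> (32 - (b))))
#define QR(a, b, c, d) ( \
    b ^= ROTL(a + d, 7), c ^= ROTL(b + a, 9), \
    d ^= ROTL(c + b,13), a ^= ROTL(d + c,18))

/* Salsa20/20 core: 10 double-rounds, then the feed-forward addition. */
static void salsa20_block(uint32_t out[16], const uint32_t in[16])
{
    uint32_t x[16];
    int i;
    for (i = 0; i < 16; ++i) x[i] = in[i];
    for (i = 0; i < 20; i += 2) {
        QR(x[ 0], x[ 4], x[ 8], x[12]); QR(x[ 5], x[ 9], x[13], x[ 1]);
        QR(x[10], x[14], x[ 2], x[ 6]); QR(x[15], x[ 3], x[ 7], x[11]);
        QR(x[ 0], x[ 1], x[ 2], x[ 3]); QR(x[ 5], x[ 6], x[ 7], x[ 4]);
        QR(x[10], x[11], x[ 8], x[ 9]); QR(x[15], x[12], x[13], x[14]);
    }
    for (i = 0; i < 16; ++i) out[i] = x[i] + in[i];
}

/* Hypothetical wrapper: XOR len bytes of data in place with the key
   stream, advancing the 64-bit counter in st[8..9] once per block.
   The same call encrypts and decrypts. */
static void salsa20_xor(uint32_t st[16], uint8_t *data, size_t len)
{
    uint32_t blk[16];
    size_t i, n;
    assert(st != NULL && (data != NULL || len == 0));
    while (len > 0) {
        salsa20_block(blk, st);
        n = len < 64 ? len : 64;          /* last block may be partial */
        for (i = 0; i < n; ++i)           /* little-endian serialization */
            data[i] ^= (uint8_t)(blk[i / 4] >> (8 * (i % 4)));
        if (++st[8] == 0)                 /* carry into the high word */
            ++st[9];
        data += n;
        len -= n;
    }
}
```

Because encryption and decryption are the same XOR, running `salsa20_xor` twice from the same starting state restores the original data.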

XSalsa20 Extension

XSalsa20 is a variant of the cipher that extends the nonce length to 192 bits (24 bytes) while keeping the same 256-bit key and 64-bit block counter. The longer nonce helps in cryptographic protocols where nonce reuse or predictability could pose risks, since it is large enough to be generated at random. This extension was developed by Daniel J. Bernstein and introduced for use in the Networking and Cryptography library (NaCl), where it serves as the basis for the crypto_stream function to provide high-speed stream encryption.

The construction of XSalsa20 begins by dividing the 192-bit nonce into a 128-bit (16-byte) prefix and a 64-bit (8-byte) suffix. The prefix, combined with the 256-bit (32-byte) key, is fed into HSalsa20 to derive a 256-bit subkey. HSalsa20 operates on a 512-bit input block consisting of the Salsa20 constants in positions 0, 5, 10, and 15; the key split across positions 1–4 and 11–14; and the 128-bit nonce prefix in positions 6–9. It applies the 20 rounds of the Salsa20 core, structured as 10 double rounds (each a columnround followed by a rowround), to mix the state. It then outputs a 256-bit subkey by concatenating specific words of the resulting state (z0, z5, z10, z15, z6, z7, z8, z9). Unlike the full Salsa20 core, it does not add the original input block to the mixed state.

With the subkey in place, XSalsa20 proceeds using the standard Salsa20 mechanism: the subkey acts as the effective key, the 64-bit nonce suffix serves as the nonce, and the 64-bit block counter increments as usual to generate successive 64-byte keystream blocks via 20 rounds of the Salsa20 core. This two-stage process ensures that the initial key derivation incorporates a large portion of the nonce, reducing the impact of nonce reuse or poor randomness in the shorter suffix and thereby improving security in scenarios such as network protocols where nonces may be generated from less reliable sources.
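The two-stage construction can be sketched as below. This is an illustrative, unoptimized sketch under the layout stated in the text; the names `hsalsa20`, `salsa20_block`, and `xsalsa20_keystream` are chosen for this example and are not the NaCl API.

```python
import struct

MASK = 0xFFFFFFFF
SIGMA = struct.unpack("<4I", b"expand 32-byte k")
# Word indices (a, b, c, d): four column quarter-rounds, then four row ones.
QR_INDICES = [(0, 4, 8, 12), (5, 9, 13, 1), (10, 14, 2, 6), (15, 3, 7, 11),
              (0, 1, 2, 3), (5, 6, 7, 4), (10, 11, 8, 9), (15, 12, 13, 14)]


def rotl(x, n):
    return ((x << n) & MASK) | (x >> (32 - n))


def double_rounds(state, count=10):
    """Apply `count` Salsa20 double rounds and return the mixed words."""
    x = list(state)
    for _ in range(count):
        for a, b, c, d in QR_INDICES:
            x[b] ^= rotl((x[a] + x[d]) & MASK, 7)
            x[c] ^= rotl((x[b] + x[a]) & MASK, 9)
            x[d] ^= rotl((x[c] + x[b]) & MASK, 13)
            x[a] ^= rotl((x[d] + x[c]) & MASK, 18)
    return x


def hsalsa20(key, nonce_prefix):
    """Derive a 32-byte subkey from a 32-byte key and 16-byte nonce prefix."""
    k = struct.unpack("<8I", key)
    n = struct.unpack("<4I", nonce_prefix)
    state = [SIGMA[0], k[0], k[1], k[2], k[3], SIGMA[1], n[0], n[1],
             n[2], n[3], SIGMA[2], k[4], k[5], k[6], k[7], SIGMA[3]]
    z = double_rounds(state)
    # No final addition of the input state; select words as in the text.
    return struct.pack("<8I", z[0], z[5], z[10], z[15], z[6], z[7], z[8], z[9])


def salsa20_block(key, nonce, counter):
    """One 64-byte Salsa20 keystream block (with the final addition)."""
    k = struct.unpack("<8I", key)
    n = struct.unpack("<2I", nonce)
    state = [SIGMA[0], k[0], k[1], k[2], k[3], SIGMA[1], n[0], n[1],
             counter & MASK, counter >> 32, SIGMA[2], k[4], k[5], k[6], k[7],
             SIGMA[3]]
    z = double_rounds(state)
    return struct.pack("<16I", *((a + b) & MASK for a, b in zip(z, state)))


def xsalsa20_keystream(key, nonce, length):
    """Keystream for a 32-byte key and a 24-byte nonce."""
    subkey = hsalsa20(key, nonce[:16])   # stage 1: subkey from nonce prefix
    out = bytearray()
    counter = 0
    while len(out) < length:             # stage 2: ordinary Salsa20
        out += salsa20_block(subkey, nonce[16:], counter)
        counter += 1
    return bytes(out[:length])
```

Both functions share `double_rounds`; the only differences in `hsalsa20` are the input layout (nonce prefix in place of nonce and counter) and the omitted feed-forward addition.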

ChaCha Variant

Design Improvements

ChaCha, introduced by Daniel J. Bernstein in 2008 as a variant of Salsa20, incorporates targeted modifications to the quarter-round function aimed at increasing diffusion and efficiency while preserving the core security properties. These tweaks enable each word in the state matrix to be updated twice per round, compared to once in Salsa20. The result is improved diffusion: a single quarter-round in ChaCha alters approximately 12.5 bits of output on average (in the absence of carries), versus 8 bits for Salsa20.

A primary structural change lies in how quarter-rounds are applied: each ChaCha double round performs a column round (similar to Salsa20's column rounds) followed by a diagonal round across the 4×4 state matrix, replacing Salsa20's alternation of column and row rounds. This diagonal mixing promotes faster propagation of changes throughout the state, contributing to better overall diffusion. The constant words remain the same as in Salsa20, encoding the string "expand 32-byte k" in little-endian format, but the cipher is distinctly branded as ChaCha to reflect these changes. The rotation amounts in the quarter-round are 16, 12, 8, and 7 bits, differing from Salsa20's 7, 9, 13, and 18 bits; this adjustment has negligible impact on security but provides a slight speed advantage on certain platforms without altering the ARX (add, rotate, XOR) paradigm central to both designs.

These refinements yield measurable performance gains on modern CPUs, with ChaCha demonstrating up to 28% faster execution for the 8-round variant in benchmarks on some processors (3.87 cycles per byte versus Salsa20's 5.39), and comparable or better speeds on other architectures (about 5% improvement in some cases). The standard configuration employs 20 rounds for robust security, though reduced variants with 8 or 12 rounds are viable for scenarios prioritizing speed, mirroring Salsa20's flexibility while benefiting from the enhanced mixing.

Overall, these changes maintain Salsa20's parallelism and vectorizability but reduce register usage in implementations, streamlining deployment in software environments.
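The modified quarter-round and the column/diagonal arrangement can be sketched as follows. This is an illustrative sketch; the index tables assume the usual row-major numbering of the 4×4 state (words 0 to 15), and the function names are chosen for this example.

```python
MASK = 0xFFFFFFFF


def rotl(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) & MASK) | (x >> (32 - n))


def chacha_quarterround(a, b, c, d):
    """ChaCha quarter-round: every word is updated twice."""
    a = (a + b) & MASK
    d = rotl(d ^ a, 16)
    c = (c + d) & MASK
    b = rotl(b ^ c, 12)
    a = (a + b) & MASK
    d = rotl(d ^ a, 8)
    c = (c + d) & MASK
    b = rotl(b ^ c, 7)
    return a, b, c, d


# Row-major word indices of the 4x4 state.
COLUMNS = [(0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15)]
DIAGONALS = [(0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14)]


def chacha_double_round(state):
    """One double round: a column round followed by a diagonal round."""
    x = list(state)
    for a, b, c, d in COLUMNS + DIAGONALS:
        x[a], x[b], x[c], x[d] = chacha_quarterround(x[a], x[b], x[c], x[d])
    return x
```

ChaCha20 applies `chacha_double_round` ten times, while the 8- and 12-round variants apply it four and six times respectively.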

XChaCha20

XChaCha20 is a variant of the ChaCha20 stream cipher that extends the nonce size to 192 bits, enabling the safe use of random nonces without collision concerns in protocols requiring large nonce spaces. This design addresses the limitations of the standard 96-bit nonce of ChaCha20 by first deriving a subkey from the key and part of the nonce, so that the nonce material is spread across two invocations of the ChaCha function. Developed within the NaCl (Networking and Cryptography library) and libsodium ecosystem, XChaCha20 facilitates secure stream generation for applications like authenticated encryption schemes.

The core mechanism of XChaCha20 relies on HChaCha20, a function built from the ChaCha20 rounds without the final addition of the input state. HChaCha20 takes the 256-bit key and the first 128 bits (16 bytes) of the 192-bit nonce as input, executes 20 rounds of the ChaCha core, and produces a 256-bit subkey from the first and last 128 bits of the resulting state. Standard ChaCha20 is then applied using this subkey, a 96-bit nonce formed by prefixing four zero bytes to the remaining 64 bits of the original nonce, and a block counter starting at 0 (or 1 in AEAD modes). This two-stage process mirrors the structure of XSalsa20 but substitutes ChaCha's quarter-round function for Salsa20's, enhancing compatibility with ChaCha-based systems while preserving computational efficiency.

In practice, XChaCha20 is integrated into protocols that benefit from its extended nonce, such as WireGuard, where it secures cookie reply packets using XChaCha20-Poly1305 with a random 192-bit nonce per packet in UDP-based communications. Its security rests on that of ChaCha20, with the HChaCha20 derivation analyzed in a manner analogous to XSalsa20. With random 192-bit nonces, the birthday bound implies that collisions only become likely after on the order of 2^{96} messages under the same key.

As an unauthenticated stream cipher, XChaCha20 is typically paired with a MAC like Poly1305 for integrity in AEAD constructions.
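The subkey and nonce derivation can be sketched as follows. The sketch assumes the standard ChaCha state layout (constants in words 0 to 3, key in words 4 to 11, and the 16-byte nonce prefix in words 12 to 15), which is not spelled out in the text above; the function names are chosen for this example and do not correspond to a specific library API.

```python
import struct

MASK = 0xFFFFFFFF
SIGMA = struct.unpack("<4I", b"expand 32-byte k")
COLUMNS = [(0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15)]
DIAGONALS = [(0, 5, 10, 15), (1, 6, 11, 12), (2, 7, 8, 13), (3, 4, 9, 14)]


def rotl(x, n):
    return ((x << n) & MASK) | (x >> (32 - n))


def quarterround(x, a, b, c, d):
    """ChaCha quarter-round applied in place to words a, b, c, d of x."""
    x[a] = (x[a] + x[b]) & MASK
    x[d] = rotl(x[d] ^ x[a], 16)
    x[c] = (x[c] + x[d]) & MASK
    x[b] = rotl(x[b] ^ x[c], 12)
    x[a] = (x[a] + x[b]) & MASK
    x[d] = rotl(x[d] ^ x[a], 8)
    x[c] = (x[c] + x[d]) & MASK
    x[b] = rotl(x[b] ^ x[c], 7)


def hchacha20(key, nonce_prefix):
    """Derive a 32-byte subkey from a 32-byte key and 16-byte nonce prefix."""
    x = (list(SIGMA)
         + list(struct.unpack("<8I", key))
         + list(struct.unpack("<4I", nonce_prefix)))
    for _ in range(10):  # 10 double rounds = 20 rounds
        for a, b, c, d in COLUMNS + DIAGONALS:
            quarterround(x, a, b, c, d)
    # No final addition of the input; keep the first and last four words.
    return struct.pack("<8I", *(x[0:4] + x[12:16]))


def xchacha20_params(key, nonce):
    """Map (key, 24-byte nonce) to the (subkey, 12-byte nonce) for ChaCha20."""
    if len(key) != 32 or len(nonce) != 24:
        raise ValueError("expected a 32-byte key and a 24-byte nonce")
    subkey = hchacha20(key, nonce[:16])
    nonce96 = b"\x00" * 4 + nonce[16:]  # four zero bytes + last 8 nonce bytes
    return subkey, nonce96
```

The returned pair can be passed to any ordinary ChaCha20 or ChaCha20-Poly1305 implementation that accepts a 96-bit nonce, which is why the extended-nonce variant needs no changes to the underlying cipher.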

Reduced-Round ChaCha

Reduced-round variants of ChaCha, such as ChaCha8 and ChaCha12, employ fewer iterations of the core mixing function than the standard ChaCha20, which uses 10 double rounds (each consisting of a column round followed by a diagonal round). Specifically, ChaCha8 uses 4 double rounds and ChaCha12 uses 6 double rounds, reducing computational overhead while aiming to preserve essential cryptographic strength. These variants were introduced alongside the full-round version to offer configurable security levels based on application needs.

The primary trade-off in these reduced-round designs is increased performance at the cost of a narrower security margin. For instance, ChaCha12 achieves approximately 1.67 times the speed of ChaCha20 due to the proportional reduction in rounds (20/12), making it suitable for scenarios demanding high throughput. Despite this, ChaCha12 maintains 256-bit security against known attacks, with the best published cryptanalysis limited to 7 rounds or fewer, providing a margin of at least 5 unbroken rounds. ChaCha8 offers even greater speed gains, up to 28% faster than equivalent Salsa20/8 implementations on certain architectures, but with a correspondingly tighter margin, rendering it appropriate only for less critical, speed-optimized contexts. Published differential and linear attacks reach fewer than 8 rounds, so both variants remain unbroken.

In practice, while the full 20-round ChaCha20 is standardized in protocols like TLS 1.3, paired with Poly1305, for broad use, reduced-round versions find use in resource-constrained environments. For example, XChaCha12, a nonce-extended form of ChaCha12, is integral to the Adiantum length-preserving encryption mode, designed for storage encryption on entry-level mobile processors, where it balances efficiency and security on hardware lacking AES acceleration. This deployment highlights the viability of reduced rounds for embedded systems prioritizing throughput without AES hardware support.

Security analyses indicate that ChaCha's enhanced quarter-round function provides better diffusion per round than Salsa20's, altering an average of 12.5 output bits per quarter-round versus 8, which allows reduced-round variants to achieve adequate state mixing with minimal degradation down to 12 rounds. This improved diffusion supports resistance to most known cryptanalytic techniques, including probabilistic neutral bits and rotational distinguishers, without compromising the properties essential for security. The full 20-round mixing provides the highest margin but is unnecessary for many high-throughput applications where 12 rounds suffice.
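The round-count arithmetic quoted above can be checked with a few lines of Python. This sketch assumes an idealized model in which running time scales linearly with the number of rounds (ignoring fixed per-block overhead) and takes 7 rounds as the best attacked round count, as stated in the text; the helper name `round_tradeoff` is hypothetical.

```python
FULL_ROUNDS = 20           # ChaCha20
BEST_ATTACKED_ROUNDS = 7   # best published cryptanalysis, per the text


def round_tradeoff(rounds):
    """Summarize an idealized speed/margin trade-off for ChaCha<rounds>."""
    return {
        "double_rounds": rounds // 2,
        # Idealized: time assumed proportional to the number of rounds.
        "speedup_vs_chacha20": FULL_ROUNDS / rounds,
        "margin_rounds": rounds - BEST_ATTACKED_ROUNDS,
    }


for r in (8, 12, 20):
    print(f"ChaCha{r}: {round_tradeoff(r)}")
```

Under this model ChaCha12 has 6 double rounds, a speedup of 20/12 ≈ 1.67, and a margin of 5 rounds, while ChaCha8 has 4 double rounds, a speedup of 2.5, and a margin of 1 round. Measured speedups are typically somewhat lower because setup and output costs do not shrink with the round count.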

Security Analysis

Cryptanalysis Results

Salsa20 was designed to provide 256-bit security against distinguishing and key-recovery attacks for its full 20-round version, as conjectured by its creator Daniel J. Bernstein in 2005. This claim posits that the output of Salsa20/20 is computationally indistinguishable from a random stream, with no feasible attacks below 2^{256} operations. Subsequent cryptanalytic efforts have not contradicted this for the full cipher but have identified weaknesses in reduced-round variants.

In 2008, Aumasson et al. introduced a differential key-recovery attack on the 8-round version of Salsa20, exploiting probabilistic neutral bits to achieve a time complexity of approximately 2^{251} operations using 2^{31} keystream pairs. This attack targets the 256-bit key variant and relies on truncated differentials over multiple rounds. It is only marginally faster than exhaustive search and remains far from threatening the full 20-round design. A 2012 attack by Shi et al. improved on this result for 8 rounds of Salsa20, with a still impractical time complexity of about 2^{250} operations, building on the earlier differential techniques. This work improved data and time efficiencies for reduced rounds but found no feasible breaks for more than 9 rounds under chosen-IV scenarios.

The XSalsa20 extension and the ChaCha variant exhibit comparable security margins, with no practical attacks on their full-round implementations. Among the strongest reported results is a 7-round differential-linear distinguisher for ChaCha, requiring around 2^{207} data and computations, which does not extend to key recovery in the full cipher. As of 2025, these security bounds hold, with recent refinements such as a 2025 attack on Salsa20/8.5 at 2^{245.84} still deemed impractical and inapplicable to the 20-round version or its variants.

Resistance to Known Attacks

Salsa20's core function consists of 20 rounds, comprising 10 double rounds, which establishes a large security margin against cryptanalytic attacks compared to reduced-round variants and to older, broken stream ciphers that lack a robust round structure. This design choice ensures that even if attacks on fewer rounds succeed, the full version remains secure, with analyses reporting no differentials exceeding a probability of 2^{-130} for up to 15 rounds, leaving a substantial buffer.

The cipher's reliance on ARX operations (modular additions, fixed rotations, and XORs) enables constant-time implementations that inherently resist timing and simple side-channel attacks, as these primitives avoid the data-dependent branches and table lookups common in other ciphers like AES. Natural software implementations on diverse CPUs execute in input-independent time, reducing leakage risks without additional countermeasures, though more advanced side-channel attacks may still require masking in hardware deployments.

As of 2025, no known practical attacks compromise the full 20-round Salsa20, with all published cryptanalyses limited to reduced rounds and exceeding 2^{128} complexity for the complete cipher. Its 256-bit key size provides strong quantum resistance, as Grover's algorithm would require approximately 2^{128} operations to brute-force the key, preserving 128-bit post-quantum security.

In scenarios of nonce misuse, such as reuse with the same key, Salsa20 degrades more gracefully than ciphers with biased keystreams: the full-round keystream exhibits no statistical weaknesses, so an attacker learns the XOR of the affected plaintexts but gains no path to key recovery or broader biases. This property stems from the core's hash-like block generation, where each 64-byte output depends on the key, nonce, and counter, limiting the damage to the messages that shared a nonce.
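The effect of nonce reuse can be illustrated with any XOR-based stream cipher. The sketch below uses SHAKE-256 from Python's standard library as a stand-in keystream generator purely to keep the example short; it is not Salsa20, and the key, nonce, and messages are made-up sample values.

```python
import hashlib


def toy_keystream(key, nonce, length):
    """Stand-in keystream (SHAKE-256), NOT Salsa20; for illustration only."""
    return hashlib.shake_256(key + nonce).digest(length)


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


key = b"k" * 32
nonce = b"n" * 8

p1 = b"attack at dawn!!"
p2 = b"retreat at dusk!"

# The same key and nonce are (incorrectly) used for both messages.
c1 = xor_bytes(p1, toy_keystream(key, nonce, len(p1)))
c2 = xor_bytes(p2, toy_keystream(key, nonce, len(p2)))

# The keystream cancels: an eavesdropper obtains p1 XOR p2 without the key.
leak = xor_bytes(c1, c2)
assert leak == xor_bytes(p1, p2)

# Knowing (or guessing) one plaintext reveals the other, but not the key.
assert xor_bytes(leak, p1) == p2
```

The leak is confined to the messages that shared the nonce; ciphertexts produced under other nonces use unrelated keystream and are unaffected.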

Adoption and Standards

eSTREAM Selection

The eSTREAM project, initiated in 2004 under the European Network of Excellence in Cryptography (ECRYPT) and funded by the European Union, sought to identify innovative stream ciphers suitable for widespread adoption through a structured evaluation process spanning multiple phases and culminating in 2008. Salsa20, submitted by Daniel J. Bernstein in 2005, progressed through the initial phases and reached Phase 3 without requiring any design alterations, demonstrating robust performance in preliminary assessments of speed and security.

In April 2008, during Phase 3, the eSTREAM committee selected the 12-round variant, Salsa20/12, as a finalist for Profile 1, which targeted high-throughput software implementations; this choice highlighted its efficiency on general-purpose processors compared to alternatives. The committee commended Salsa20/12 for its speed, often outperforming established ciphers like AES in software environments, and for the conservative security margin provided by the 20 rounds of the primary variant, while noting that the reduced-round version offers adequate protection against known attacks. Competing in the software profile alongside designs like HC-128 and SOSEMANUK, Salsa20/12 prevailed due to its straightforward ARX-based construction, which facilitated easy implementation, auditing, and optimization without compromising security. Following the Phase 3 evaluation, the September 2008 revision of the eSTREAM portfolio formally recommended Salsa20/12 for Profile 1, marking it as a preferred option for software-oriented applications and raising its profile in subsequent cryptographic developments.

Implementations and Protocols

Salsa20 and its variants have been integrated into several prominent cryptographic libraries for secure software applications. The Networking and Cryptography library (NaCl), developed by Daniel J. Bernstein and others, incorporates XSalsa20 combined with Poly1305 for authenticated encryption in its secretbox construction, providing a high-level interface for symmetric encryption that ensures both confidentiality and integrity. Libsodium, a portable fork of NaCl, maintains this XSalsa20-Poly1305 construction as its default for secret-key encryption, offering bindings for multiple programming languages and emphasizing ease of correct use. BoringSSL, Google's security-focused fork of OpenSSL, supports ChaCha20, including the ChaCha20-Poly1305 AEAD mode compliant with RFC 8439.

In cryptographic protocols, Salsa20 variants enable secure communication channels. WireGuard, a modern VPN protocol, employs ChaCha20 for symmetric encryption alongside Poly1305 for message authentication, leveraging the Noise protocol framework for key exchange and achieving high performance on resource-constrained devices. The Secure Shell (SSH) protocol optionally supports ChaCha20-Poly1305 as an authenticated encryption mode via the chacha20-poly1305@openssh.com cipher, introduced in OpenSSH to provide a fast alternative to AES-based ciphers, particularly beneficial on systems without hardware acceleration.

Performance benchmarks highlight Salsa20's efficiency in software implementations. On modern x86 processors, optimized Salsa20 variants achieve speeds of approximately 4 cycles per byte, enabling multi-gigabyte-per-second throughput without specialized hardware. These implementations are highly portable, with adaptations for ARM architectures demonstrating reasonable speeds, such as 69 cycles per byte on older ARM920T processors, and further optimizations using NEON instructions for contemporary devices.

Open-source reference implementations and optimized ports underpin widespread adoption. Bernstein provides a public-domain reference implementation of Salsa20 in C, which serves as the basis for many ports and has been verified through extensive testing. Optimized ports include vectorized versions using SSE instructions on x86 and NEON on ARM, as well as GPU-accelerated variants for high-throughput scenarios, ensuring compatibility across diverse platforms.

Internet and Industry Use

ChaCha20, as an evolution of Salsa20, has seen significant standardization within IETF protocols, particularly through RFC 8439, published in 2018, which specifies the ChaCha20 stream cipher combined with Poly1305 for authenticated encryption in IETF protocols such as Transport Layer Security (TLS) and Internet Protocol Security (IPsec). This AEAD construction provides robust protection for network communications, offering an alternative to AES-based ciphers with better performance in software implementations. Further extensions include the eXtended-nonce ChaCha (XChaCha20), detailed in an IETF draft from the Crypto Forum Research Group (CFRG), which supports 192-bit nonces to guard against accidental nonce collisions while maintaining compatibility with existing ChaCha-based suites.

In modern internet protocols, ChaCha20 is integrated into QUIC via RFC 9001, which secures QUIC transport by leveraging TLS 1.3 and explicitly references the ChaCha20 function for packet protection. Adoption in web technologies began with early support in Google Chrome for ChaCha20-Poly1305 cipher suites prior to the widespread rollout of TLS 1.3, enabling faster encryption on mobile and low-power devices. Cloudflare similarly enabled these suites across its network in 2015, contributing to their performance advantages in content delivery.

In industry applications, ChaCha20 underpins encryption in the Signal messaging protocol, where it secures message contents using the first 32 bytes of the message key as the cipher key. Certain cryptocurrency wallets employ ChaCha20 for encrypting wallet contents with keys derived from passphrase-based key derivation functions. As of November 2025, no major new IETF RFC standards specifically targeting ChaCha20 have been published, though an active draft (draft-ietf-sshm-chacha20-poly1305) is progressing toward formal standardization of ChaCha20-Poly1305 for the SSH protocol, and its role in TLS and QUIC continues to drive deployment in secure web and transport protocols.
