Hubbry Logo
Chosen-plaintext attackChosen-plaintext attackMain
Open search
Chosen-plaintext attack
Community hub
Chosen-plaintext attack
logo
7 pages, 0 posts
0 subscribers
Be the first to start a discussion here.
Be the first to start a discussion here.
Chosen-plaintext attack
Chosen-plaintext attack
from Wikipedia

A chosen-plaintext attack (CPA) is an attack model for cryptanalysis which presumes that the attacker can obtain the ciphertexts for arbitrary plaintexts.[1] The goal of the attack is to gain information that reduces the security of the encryption scheme.[2]

Modern ciphers aim to provide semantic security, also known as ciphertext indistinguishability under chosen-plaintext attack, and they are therefore, by design, generally immune to chosen-plaintext attacks if correctly implemented.

Introduction

[edit]

In a chosen-plaintext attack the adversary can (possibly adaptively) ask for the ciphertexts of arbitrary plaintext messages. This is formalized by allowing the adversary to interact with an encryption oracle, viewed as a black box. The attacker’s goal is to reveal all or a part of the secret encryption key.

It may seem infeasible in practice that an attacker could obtain ciphertexts for given plaintexts. However, modern cryptography is implemented in software or hardware and is used for a diverse range of applications; for many cases, a chosen-plaintext attack is often very feasible (see also In practice). Chosen-plaintext attacks become extremely important in the context of public key cryptography where the encryption key is public and so attackers can encrypt any plaintext they choose.

Different forms

[edit]

There are two forms of chosen-plaintext attacks:

  • Batch chosen-plaintext attack, where the adversary chooses all of the plaintexts before seeing any of the corresponding ciphertexts. This is often the meaning intended by "chosen-plaintext attack" when this is not qualified.
  • Adaptive chosen-plaintext attack (CPA2), where the adversary can request the ciphertexts of additional plaintexts after seeing the ciphertexts for some plaintexts.

General method of an attack

[edit]

A general batch chosen-plaintext attack is carried out as follows [failed verification]:

  1. The attacker may choose n plaintexts. (This parameter n is specified as part of the attack model, it may or may not be bounded.)
  2. The attacker then sends these n plaintexts to the encryption oracle.
  3. The encryption oracle will then encrypt the attacker's plaintexts and send them back to the attacker.
  4. The attacker receives n ciphertexts back from the oracle, in such a way that the attacker knows which ciphertext corresponds to each plaintext.
  5. Based on the plaintext–ciphertext pairs, the attacker can attempt to extract the key used by the oracle to encode the plaintexts. Since the attacker in this type of attack is free to craft the plaintext to match his needs, the attack complexity may be reduced.

Consider the following extension of the above situation. After the last step,

  1. The adversary outputs two plaintexts m0 and m1.
  2. A bit b is chosen uniformly at random .
  3. The adversary receives the encryption of mb, and attempts to "guess" which plaintext it received, and outputs a bit b'.

A cipher has indistinguishable encryptions under a chosen-plaintext attack if after running the above experiment the adversary can't guess correctly (b=b') with probability non-negligibly better than 1/2.[3]

Examples

[edit]

The following examples demonstrate how some ciphers that meet other security definitions may be broken with a chosen-plaintext attack.

Caesar cipher

[edit]

The following attack on the Caesar cipher allows full recovery of the secret key:

  1. Suppose the adversary sends the message: Attack at dawn,
  2. and the oracle returns Nggnpx ng qnja.
  3. The adversary can then work through to recover the key in the same way as a Caesar cipher. The adversary could deduce the substitutions AN, TG and so on. This would lead the adversary to determine that 13 was the key used in the Caesar cipher.

With more intricate or complex encryption methodologies the decryption method becomes more resource-intensive, however, the core concept is still relatively the same.

One-time pads

[edit]

The following attack on a one-time pad allows full recovery of the secret key. Suppose the message length and key length are equal to n.

  1. The adversary sends a string consisting of n zeroes to the oracle.
  2. The oracle returns the bitwise exclusive-or of the key with the string of zeroes.
  3. The string returned by the oracle is the secret key.

While the one-time pad is used as an example of an information-theoretically secure cryptosystem, this security only holds under security definitions weaker than CPA security. This is because under the formal definition of CPA security the encryption oracle has no state. This vulnerability may not be applicable to all practical implementations – the one-time pad can still be made secure if key reuse is avoided (hence the name "one-time" pad).

In practice

[edit]

In World War II US Navy cryptanalysts discovered that Japan was planning to attack a location referred to as "AF". They believed that "AF" might be Midway Island, because other locations in the Hawaiian Islands had codewords that began with "A". To prove their hypothesis that "AF" corresponded to "Midway Island" they asked the US forces at Midway to send a plaintext message about low supplies. The Japanese intercepted the message and immediately reported to their superiors that "AF" was low on water, confirming the Navy's hypothesis and allowing them to position their force to win the battle.[3][4]

Also during World War II, Allied codebreakers at Bletchley Park would sometimes ask the Royal Air Force to lay mines at a position that didn't have any abbreviations or alternatives in the German naval system's grid reference. The hope was that the Germans, seeing the mines, would use an Enigma machine to encrypt a warning message about the mines and an "all clear" message after they were removed, giving the allies enough information about the message to break the German naval Enigma. This process of planting a known-plaintext was called gardening.[5] Allied codebreakers also helped craft messages sent by double agent Juan Pujol García, whose encrypted radio reports were received in Madrid, manually decrypted, and then re-encrypted with an Enigma machine for transmission to Berlin.[6] This helped the codebreakers decrypt the code used on the second leg, having supplied the original text.[7]

In modern day, chosen-plaintext attacks (CPAs) are often used to break symmetric ciphers. To be considered CPA-secure, the symmetric cipher must not be vulnerable to chosen-plaintext attacks. Thus, it is important for symmetric cipher implementors to understand how an attacker would attempt to break their cipher and make relevant improvements.

For some chosen-plaintext attacks, only a small part of the plaintext may need to be chosen by the attacker; such attacks are known as plaintext injection attacks.

Relation to other attacks

[edit]

A chosen-plaintext attack is more powerful than known-plaintext attack, because the attacker can directly target specific terms or patterns without having to wait for these to appear naturally, allowing faster gathering of data relevant to cryptanalysis. Therefore, any cipher that prevents chosen-plaintext attacks is also secure against known-plaintext and ciphertext-only attacks.

However, a chosen-plaintext attack is less powerful than a chosen-ciphertext attack, where the attacker can obtain the plaintexts of arbitrary ciphertexts. A CCA-attacker can sometimes break a CPA-secure system.[3] For example, the El Gamal cipher is secure against chosen plaintext attacks, but vulnerable to chosen ciphertext attacks because it is unconditionally malleable.

See also

[edit]

References

[edit]
Revisions and contributorsEdit on WikipediaRead on Wikipedia
from Grokipedia
A chosen- attack (CPA) is a cryptanalytic attack model in which an adversary can select arbitrary plaintext messages of their own choosing and obtain the corresponding ciphertexts produced by the target encryption scheme under a fixed secret key, with the goal of deriving information about the key or decrypting other ciphertexts. This attack is more powerful than a , where the adversary is limited to analyzing pre-existing plaintext-ciphertext pairs, as it allows the attacker to tailor inputs to probe the system's weaknesses systematically. CPAs are classified into non-adaptive and adaptive variants: in the non-adaptive case, all chosen plaintexts are submitted in advance without feedback, while in the adaptive chosen-plaintext attack, the adversary can dynamically adjust subsequent plaintext choices based on the ciphertexts received from prior queries. In modern , resistance to chosen-plaintext attacks forms a foundational notion, exemplified by indistinguishability under chosen-plaintext attack (IND-CPA), which requires that even with access to chosen , a probabilistic polynomial-time adversary cannot distinguish the of one target plaintext from another with more than negligible advantage. This model underpins the design of secure symmetric and public-key schemes, ensuring they leak no useful information about plaintexts or keys despite active adversary interaction. Although mounting a full CPA in practice is challenging—often requiring access to an , such as through a compromised device or service—it serves as a rigorous benchmark for evaluating strength, originating from early discussions in where it was recognized as a critical threat surpassing ciphertext-only attacks. Notable vulnerabilities under CPA models have influenced standards like AES and RSA-OAEP, emphasizing the need for probabilistic to achieve security.

Fundamentals

Definition

A chosen-plaintext attack (CPA) is a model of in which an adversary can select arbitrary messages and obtain the corresponding ciphertexts produced by an scheme under an unknown secret key, with the objective of recovering the key or enabling decryption of additional ciphertexts. This attack assumes active interaction with the encryption process, distinguishing it from passive , where the adversary only observes existing ciphertexts without influencing the inputs. Key terms in this context include , which refers to the original, unencrypted message; ciphertext, the encrypted output; and encryption oracle, a conceptual black-box interface that encrypts chosen plaintexts without disclosing the underlying key. The adversary, denoted as A\mathcal{A}, interacts with this oracle by submitting plaintexts mm and receiving ciphertexts c=Ek(m)c = E_k(m), where EE is the encryption function and kk is the secret key hidden from A\mathcal{A}. The concept of chosen-plaintext attacks was first formalized in the 1970s amid the development of , notably in the seminal work by Diffie and Hellman, who highlighted such attacks as a critical threat to systems beyond mere ciphertext-only .

Adversarial Model

In the chosen-plaintext attack (CPA) model, the adversary is classified as active, interacting directly with the system by submitting plaintexts of its choice to obtain corresponding ciphertexts, in contrast to passive adversaries who only observe existing communications. This active role enables the attacker to gather targeted information about the process. Adversaries are further distinguished as adaptive or non-adaptive: adaptive adversaries may choose subsequent plaintexts based on responses to prior queries, allowing dynamic strategy refinement, while non-adaptive (or batch) adversaries submit all plaintexts in advance without feedback. The adaptive variant is the standard assumption in formal CPA definitions, as it captures more powerful attacks. Adversaries in the CPA model are constrained to be probabilistic polynomial-time (PPT) algorithms, meaning they run in time polynomial in the security parameter nn, and their behavior incorporates to model realistic computational limitations. They are permitted a polynomial number q(n)q(n) of queries to the encryption oracle, ensuring the attack remains feasible only under bounded resources. These bounds reflect the foundational assumption in modern that no efficient algorithm can break secure systems with non-negligible probability. The primary goal of a CPA adversary is to distinguish between encryptions of two distinct messages or, more broadly, to recover partial about plaintexts, such as satisfying a predicate on the . Success is quantified by the distinguishing advantage ϵ\epsilon, defined as the between the probability of correctly the encrypted and random (12\frac{1}{2}), where a secure scheme ensures ϵ\epsilon is negligible in nn for all PPT adversaries. Secondary goals like full key recovery imply CPA vulnerability but are not the minimal requirement for the model. The adversary gains black-box access to an Enck()\text{Enc}_k(\cdot), which computes ciphertexts for chosen under a secret key kk without revealing kk or internal details of the algorithm. This simulates real-world scenarios where an attacker might control submission, such as in a compromised device or service, but lacks decryption capabilities. In the indistinguishability , the may be augmented to a left-or-right variant that encrypts one of two submitted messages, formalizing the distinguishing challenge.

Attack Methods

General Procedure

A chosen-plaintext attack (CPA) follows a structured that leverages access to an encryption oracle, allowing the adversary to select plaintexts and observe corresponding ciphertexts under a fixed secret key. This general procedure applies across various cryptosystems, particularly block ciphers, and aims to recover the key or forge new ciphertexts by exploiting structural weaknesses. The process is divided into distinct phases to systematically probe and analyze the encryption function. The initialization phase involves setting up the adversarial environment, including establishing access to the and understanding the target cryptosystem's parameters, such as block size, key length, and round structure. During this setup, the adversary may perform preliminary tests with simple plaintexts to confirm oracle functionality and identify any observable behaviors, such as error messages or timing leaks, without yet committing to a full attack strategy. This phase ensures efficient resource allocation for subsequent steps, as seen in standard cryptanalytic frameworks for symmetric ciphers. In the query phase, the adversary adaptively or non-adaptively selects plaintexts to submit to the , receiving the resulting ciphertexts. Plaintext selection strategies are tailored to the anticipated ; for instance, choosing all-zero plaintexts can probe substitution patterns in the cipher's S-boxes, revealing key-dependent mappings, while structured inputs like those with specific bit differences target differential properties. The number of queries varies by attack type—representative examples include 2^{43} plaintexts for on DES or fewer for simpler systems—but the goal is to generate plaintext-ciphertext pairs that amplify exploitable biases or correlations. This phase may involve multiple iterations, with each new plaintext potentially informed by prior responses in adaptive scenarios. The analysis phase processes the collected ciphertexts using techniques such as statistical analysis to detect non-random distributions, linear approximations to approximate key bits via high-probability equations over plaintext-ciphertext pairs, or exhaustive search over reduced key spaces derived from partial information. For example, in , the adversary computes correlations between linear combinations of input and output bits across queries to hypothesize subkey values, verifying them against observed data. These methods reduce the effective key space, enabling feasible computation even for large keys. Finally, the exploitation phase utilizes insights from analysis to achieve the attack's objective, such as full key recovery, decryption of arbitrary ciphertexts, or forgery of valid encryptions. Success is measured by the attack's ability to outperform random guessing, typically achieving key recovery with probability significantly greater than 1/2^{key length}, often exceeding 90% in optimized cases like linear attacks on reduced-round ciphers. The attack breaks the system if this advantage allows practical decryption or impersonation. To illustrate the query phase's iterative nature, the following pseudocode depicts a basic loop for submitting chosen plaintexts and collecting responses:

initialize empty list PT = [], CT = [] for i from 1 to num_queries: select plaintext m_i based on strategy (e.g., all zeros or differential pair) query [oracle](/page/Oracle): c_i = Enc_k(m_i) append m_i to PT append c_i to CT return PT, CT for analysis

initialize empty list PT = [], CT = [] for i from 1 to num_queries: select plaintext m_i based on strategy (e.g., all zeros or differential pair) query [oracle](/page/Oracle): c_i = Enc_k(m_i) append m_i to PT append c_i to CT return PT, CT for analysis

This loop can be adapted for adaptive queries by incorporating prior CT values into m_i selection.

Oracle-Based Techniques

In chosen-plaintext attacks (CPAs), oracle-based techniques exploit the encryption oracle by submitting carefully crafted plaintexts to obtain corresponding ciphertexts, thereby inferring key material or structural weaknesses in the . These methods prioritize strategic query selection to maximize gain per interaction, distinguishing them from brute-force or passive approaches. Seminal works emphasize the encryption oracle's role in enabling controlled experimentation, as seen in differential cryptanalysis where specific plaintext differences are chosen to propagate through the cipher rounds. Query optimization forms a core element of these techniques, particularly through adaptive querying, which refines subsequent choices based on prior responses. This iterative refinement allows attackers to efficiently explore the key space or plaintext dependencies, reducing overall query needs compared to non-adaptive strategies. A representative example occurs in of order-preserving encryption (OPE) schemes, where an adversary uses binary search on the domain to recover the ordering of encrypted values. Starting with a p=m+12p = \frac{m+1}{2} (where mm is the domain ), the attacker queries its encryption and compares it against known points; depending on the result, the search interval halves, achieving key dependency isolation in O(logN)O(\log N) queries, with NN the domain . Efficiency metrics underscore the practicality of oracle-based techniques, with query complexity often expressed as O(q)O(q), where qq is the number of required plaintext-ciphertext pairs. In differential cryptanalysis, for instance, breaking the full 16-round DES demands approximately 2472^{47} chosen plaintexts, paired with a of 2372^{37} equivalent encryptions for analysis, illustrating how query volume trades against computational resources. Space requirements scale linearly with qq for storing pairs, while adaptive optimizations like binary search minimize qq (e.g., logarithmic reductions) at the expense of heightened per-query processing. These trade-offs are critical, as excessive queries risk detection, whereas intensive computation may exceed feasible limits. Common pitfalls arise from the oracle's inherent constraints in pure CPA scenarios, which limit outputs to ciphertexts alone, excluding decryption access or feedback on validity—unlike chosen-ciphertext settings that enable padding oracle exploitation through error signals. Consequently, techniques simulating padding-like leaks via edge-case (e.g., malformed inputs causing internal validation failures) remain constrained, often yielding incomplete information. To mitigate detection, attackers adopt avoidance strategies such as randomizing query patterns or mimicking legitimate distributions, ensuring queries blend with normal oracle usage without triggering rate limits or logging anomalies.

Historical and Illustrative Examples

Caesar Cipher

The , dating to the 1st century BCE, was employed by for securing military and official communications, as documented in ancient accounts by and later analyzed in comprehensive histories of . This simple uses a fixed amount kk (typically between 1 and 25) as the key, where shifts each letter by kk positions in the , mathematically expressed as c=(p+k)mod26c = (p + k) \mod 26, with letters represented numerically (A = 0, B = 1, ..., Z = 25). The Caesar cipher exemplifies a basic vulnerability to chosen-plaintext attacks (CPAs) due to its deterministic encryption and limited key space of 25 possibilities. In a CPA scenario, the attacker accesses an encryption oracle to submit chosen plaintexts and receive corresponding ciphertexts, enabling key recovery without needing intercepted messages. To execute the attack, the attacker begins by querying the oracle with the single-character plaintext "A" (p = 0). The resulting ciphertext reveals the key directly: if the output is "D" (c = 3), then k=3k = 3, as (0+3)mod26=3(0 + 3) \mod 26 = 3. With the key known, the attacker can compute the decryption function p=(ck)mod26p = (c - k) \mod 26 for any ciphertext, fully compromising the system. For a step-by-step illustration assuming the attacker verifies the mapping across the alphabet (though unnecessary for key recovery), subsequent queries could include "B" (expected ciphertext "E" for k=3), "C" ("F"), and so on up to "Z" ("C"), confirming the uniform shift and exposing the entire substitution table:
A→D, B→E, C→F, ..., W→Z, X→A, Y→B, Z→C. In the worst case, exhaustively querying all 26 letters maps the full permutation, but a single targeted query like "A" suffices in practice.
This vulnerability arises from the cipher's linear, key-dependent structure, which allows complete key exposure through minimal chosen plaintexts, rendering it insecure against even basic adversarial access to the process. Early 20th-century texts, such as those by William , first detailed such systematic attacks on classical ciphers like the Caesar in pedagogical contexts.

One-Time Pad

The is a symmetric-key system in which the mm is encrypted by bitwise XOR with a secret key kk of equal length, producing the c=mkc = m \oplus k. Decryption reverses the process using the same key, as m=ckm = c \oplus k, since XOR is its own inverse. This construction requires the key to be generated from a truly uniform random source and used only once for a single of matching length. Theoretically, the provides perfect secrecy against chosen-plaintext attacks, meaning that even an adversary with unlimited computational resources and access to an cannot gain any information about the from the . proved in that this security holds if the key is truly random, uniformly distributed over all possible bit strings of the message length, and never reused, ensuring that every possible is equally likely given any . In a CPA setting, unlimited queries to the yield only random-looking ciphertexts indistinguishable from encryptions of arbitrary chosen plaintexts, preserving the key's secrecy. In practice, however, implementation flaws such as key reuse introduce severe vulnerabilities exploitable via chosen-plaintext techniques. During , the revealed how Soviet intelligence reused one-time pads under wartime production pressures, allowing U.S. cryptanalysts to break numerous diplomatic and espionage messages. By identifying reused pads and employing cribs—guessed common phrases or structures in plaintexts akin to chosen-plaintext submissions—the analysts recovered keys and full messages. A classic two-time pad attack demonstrates this: if the same key kk encrypts two distinct messages m1m_1 and m2m_2 to yield c1=m1kc_1 = m_1 \oplus k and c2=m2kc_2 = m_2 \oplus k, then XORing the ciphertexts gives c1c2=(m1k)(m2k)=m1m2c_1 \oplus c_2 = (m_1 \oplus k) \oplus (m_2 \oplus k) = m_1 \oplus m_2. Guessing one plaintext segment (e.g., via a crib) allows recovery of the other, cascading to the full messages and key. Similar reuse issues compromised U.S. diplomatic communications in the , where the State Department repeatedly used daily keys an average of nine times across messages, enabling enemy cryptanalysts to exploit patterns and recover sensitive content through methods analogous to CPA recoveries. These breaches led to significant intelligence leaks, underscoring how deviations from one-time use undermine the system's theoretical invulnerability.

Security Notions and Implications

Indistinguishability under CPA

Indistinguishability under chosen-plaintext attack (IND-CPA) is a fundamental security notion in modern that formalizes the requirement for an scheme to resist adversaries who can obtain encryptions of chosen plaintexts. In this model, the adversary interacts with an oracle during a query phase, submitting any plaintexts of its choice to receive corresponding ciphertexts, but without access to the secret key. Following this phase, the adversary selects two distinct plaintext messages m0m_0 and m1m_1 of equal length and receives the encryption of one of them, chosen uniformly at random (say mbm_b where b{0,1}b \in \{0,1\}). The adversary then outputs a guess bb' for bb, aiming to determine which message was encrypted based solely on the challenge ciphertext and prior oracle queries. The IND-CPA game proceeds in two main phases: the query phase, where the adversary A\mathcal{A} makes polynomially many encryption queries to the O\mathcal{O}, and the challenge phase, where A\mathcal{A} receives Enc(pk,mb)\text{Enc}(pk, m_b) and attempts to guess bb. The advantage of A\mathcal{A} in this game is defined as AdvA,ΠIND-CPA(λ)=Pr[b=b]12\text{Adv}^{\text{IND-CPA}}_{\mathcal{A},\Pi}(\lambda) = \left| \Pr[b' = b] - \frac{1}{2} \right|
Add your contribution
Related Hubs
User Avatar
No comments yet.