Handshake (computing)

from Wikipedia
In computing, a handshake is a process in which two devices establish a communication link by authenticating and validating each other's signals. An example is the handshaking between a hypervisor and an application in a guest virtual machine.

In telecommunications, a handshake is an automated process of negotiation between two participants (for example, "Alice and Bob") through the exchange of information that establishes the protocols of a communication link at the start of the communication, before full communication begins.[1] The handshaking process usually takes place in order to establish rules for communication when a computer attempts to communicate with another device. For example, when a computer communicates with another device such as a modem, the two devices will signal each other that they are switched on and ready to work, as well as agree on which protocols are being used.[2]

Handshaking can negotiate parameters that are acceptable to equipment and systems at both ends of the communication channel, including information transfer rate, coding alphabet, parity, interrupt procedure, and other protocol or hardware features. However, within the TCP/IP RFCs, the term "handshake" is most commonly used to refer to the TCP three-way handshake; for example, the term does not appear in the RFCs covering FTP or SMTP. One exception is RFC 4217, which covers securing FTP with Transport Layer Security (TLS). In place of "handshake", FTP RFC 3659 uses the term "conversation" for the passing of commands.[3][4][5]

A simple handshaking protocol might only involve the receiver sending a message meaning "I received your last message and I am ready for you to send me another one." A more complex handshaking protocol might allow the sender to ask the receiver if it is ready to receive or for the receiver to reply with a negative acknowledgement meaning "I did not receive your last message correctly, please resend it" (e.g., if the data was corrupted en route).[6]
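The acknowledge/resend loop described above can be sketched as a stop-and-wait exchange. The following is a minimal, illustrative Python simulation (the `checksum`, `receiver`, and `send_with_retry` names are hypothetical, and a real protocol would use a CRC rather than a byte sum):

```python
ACK, NAK = "ACK", "NAK"

def checksum(data: bytes) -> int:
    """Toy checksum; a real protocol would use a CRC."""
    return sum(data) % 256

def receiver(frame):
    """Reply ACK if the frame arrived intact, NAK otherwise."""
    data, received_sum = frame
    return ACK if checksum(data) == received_sum else NAK

def send_with_retry(data: bytes, corrupt_first: bool = False) -> int:
    """Resend until the receiver acknowledges; return the number of attempts."""
    attempts = 0
    while True:
        attempts += 1
        frame = (data, checksum(data))
        if corrupt_first and attempts == 1:
            # Simulate corruption en route by damaging the checksum.
            frame = (data, (checksum(data) + 1) % 256)
        if receiver(frame) == ACK:
            return attempts

print(send_with_retry(b"hello"))                      # 1: delivered first try
print(send_with_retry(b"hello", corrupt_first=True))  # 2: NAK forced a resend
```

The negative acknowledgement is what turns a one-way transmission into a handshake: the sender only proceeds once the receiver has confirmed receipt.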

Handshaking facilitates connecting relatively heterogeneous systems or equipment over a communication channel without the need for human intervention to set parameters.

Example

TCP three-way handshake

Example of three-way handshaking

Establishing a normal TCP connection requires three separate steps:

  1. The first host (Alice) sends the second host (Bob) a "synchronize" (SYN) message with its own sequence number x, which Bob receives.
  2. Bob replies with a synchronize-acknowledgment (SYN-ACK) message with its own sequence number y and acknowledgement number x + 1, which Alice receives.
  3. Alice replies with an acknowledgment (ACK) message with acknowledgement number y + 1, which Bob receives and to which he doesn't need to reply.
In this setup, the synchronize messages act as service requests from one server to the other, while the acknowledgement messages return to the requesting server to let it know the message was received.

The reason the client and server do not use a default sequence number such as 0 for establishing the connection is to protect against two incarnations of the same connection reusing the same sequence number too soon: a segment from an earlier incarnation of a connection might otherwise interfere with a later incarnation of the connection.
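The three steps above can be simulated to show how the sequence and acknowledgement numbers line up. This sketch assumes nothing beyond the rules stated in the text (random 32-bit ISNs, and each acknowledgement set to the peer's ISN plus one):

```python
import random

def three_way_handshake():
    """Simulate the SYN / SYN-ACK / ACK exchange with random ISNs."""
    isn_client = random.randrange(2**32)   # Alice's initial sequence number
    isn_server = random.randrange(2**32)   # Bob's initial sequence number

    syn = {"flags": {"SYN"}, "seq": isn_client}
    syn_ack = {"flags": {"SYN", "ACK"}, "seq": isn_server,
               "ack": (syn["seq"] + 1) % 2**32}        # acknowledges Alice's SYN
    ack = {"flags": {"ACK"}, "seq": (isn_client + 1) % 2**32,
           "ack": (syn_ack["seq"] + 1) % 2**32}        # acknowledges Bob's SYN

    # Both sides now agree on where each byte stream starts.
    assert syn_ack["ack"] == (isn_client + 1) % 2**32
    assert ack["ack"] == (isn_server + 1) % 2**32
    return syn, syn_ack, ack

syn, syn_ack, ack = three_way_handshake()
print(sorted(syn["flags"]), sorted(syn_ack["flags"]), sorted(ack["flags"]))
```

Because each side picks its ISN at random, rerunning the simulation yields different numbers but the same acknowledgement relationships.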

SMTP

The Simple Mail Transfer Protocol (SMTP) is the key Internet standard for email transmission. It includes handshaking to negotiate authentication, encryption and maximum message size.

TLS handshake

When a Transport Layer Security (SSL or TLS) connection starts, the record encapsulates a "control" protocol—the handshake messaging protocol (content type 22). This protocol is used to exchange all the information required by both sides for the exchange of the actual application data by TLS. It defines the format of the messages containing this information and the order of their exchange. These may vary according to the demands of the client and server—i.e., there are several possible procedures to set up the connection. This initial exchange results either in a successful TLS connection (both parties ready to transfer application data with TLS) or in an alert message.

The protocol is used to negotiate the secure attributes of a session. (RFC 5246, p. 37)[7]

WPA2 wireless

The WPA2 standard for wireless networks uses a four-way handshake defined in IEEE 802.11i-2004.

Dial-up access modems

One classic example of handshaking is that of dial-up modems, which typically negotiate communication parameters for a brief period when a connection is first established, and thereafter use those parameters to provide optimal information transfer over the channel as a function of its quality and capacity. The "squealing" (which is actually a sound that changes in pitch 100 times every second) noises made by some modems with speaker output immediately after a connection is established are in fact the sounds of modems at both ends engaging in a handshaking procedure; once the procedure is completed, the speaker might be silenced, depending on the settings of the operating system or the application controlling the modem.

Serial "Hardware Handshaking"

This frequently used term describes the use of RTS and CTS signals over a serial interconnection. It is, however, not quite correct;[citation needed] it's not a true form of handshaking, and is better described as flow control.

Mobile device charging

In mobile device chargers offering special quick-charge abilities to supported devices, the charging process will switch up to a higher output voltage for increased power transfer. But this could cause serious damage to an unsupported device or even result in a fire. It is therefore very important for the device and charger to first perform a handshake to "agree" on mutually supported charge parameters. If such a charger can't identify the connected device or determine its compatibility, it will default to normal but much slower charge parameters within the USB standard.

from Grokipedia
In computing, a handshake is a protocol dialogue between two systems for identifying and authenticating themselves to each other, or for synchronizing their operations with each other.[1] This process ensures reliable communication by establishing parameters such as data rates, error handling, and security measures before full data exchange begins. Handshakes are essential across various computing domains, including hardware interfaces, network protocols, and cryptographic exchanges, preventing issues like data loss or unauthorized access.

In hardware contexts, handshaking typically involves the exchange of control signals over dedicated lines to coordinate data transfer between devices, such as in serial communication ports. For instance, Request to Send (RTS) and Clear to Send (CTS) signals enable hardware flow control, allowing a sender to pause transmission if the receiver is not ready, using five wires: transmit (TX), receive (RX), RTS, CTS, and ground.[2] This mechanism contrasts with software handshaking, which relies on embedded data characters like XON/XOFF for flow control without additional hardware.

In networking, handshakes are critical for establishing connections in protocols like TCP and TLS. The TCP three-way handshake, defined in the original protocol specification, involves a client sending a SYN packet, the server responding with SYN-ACK, and the client replying with ACK to synchronize sequence numbers and confirm bidirectional reachability.[3] Similarly, the TLS handshake negotiates encryption parameters and authenticates parties, culminating in shared session keys for secure communication, as outlined in the TLS 1.3 standard.[4] These processes underpin much of modern internet traffic, ensuring reliability and confidentiality.

Overview

Definition

In computing, a handshake refers to an initial exchange of signals or messages between two devices, programs, or systems to establish, negotiate, or verify the parameters necessary for subsequent communication or data transfer. This process ensures that both parties are ready and capable of interacting under agreed-upon conditions, such as transmission speed or protocol compatibility.[5][6]

Key characteristics of a computing handshake include its bidirectional nature, which facilitates synchronization between the communicating entities; negotiation of operational parameters, potentially encompassing data rates, authentication credentials, or session keys; and incorporation of mechanisms for detecting errors or incompatibilities during the setup phase. These elements collectively prepare a stable channel, preventing mismatches that could lead to failed interactions. Unlike unilateral signals, handshakes require mutual confirmation to proceed.[7][8]

The term "handshake" derives from the human physical gesture symbolizing agreement or trust, and the analogy was adopted in early computing for data communication systems and network protocols.[9][10]

A handshake differs from a simple acknowledgment (ACK), which is a basic, one-way confirmation of data receipt during an active session; in contrast, a handshake constitutes a structured, multi-step sequence dedicated to initial connection establishment rather than ongoing verification. In protocols like TCP, handshakes play a foundational role in synchronizing endpoints before data exchange begins.[11][12]

Purpose and Importance

Handshakes in computing primarily serve to establish reliable connections between communicating entities, such as devices or software processes, by synchronizing their operational states and confirming mutual readiness for data exchange.[7] This process allows the parties to negotiate essential parameters, including data transfer rates, encoding formats, and error-checking mechanisms, ensuring compatibility before any substantive communication occurs.[10] Additionally, handshakes facilitate authentication to verify the identities of participants and detect potential incompatibilities early, preventing wasted resources on mismatched interactions.[13]

The benefits of handshakes are substantial, as they reduce transmission errors by incorporating mechanisms for detection and correction, such as acknowledgments, thereby enhancing overall data integrity.[10] By verifying identities and enabling encryption negotiation, handshakes bolster security against unauthorized access, while also promoting efficient resource allocation through synchronized flow control that avoids overwhelming slower components.[13] Furthermore, they support backward compatibility, allowing newer systems to interoperate with legacy hardware or software by dynamically adjusting to supported capabilities.[7]

Despite these advantages, handshakes introduce drawbacks, including overhead in terms of time and bandwidth due to the multiple signal exchanges required, which can introduce latency in high-speed environments.[7] They are also susceptible to certain risks, such as resource exhaustion attacks that exploit the state-holding nature of the process, exemplified by SYN flooding, where incomplete handshakes consume server memory and processing capacity.[14] Failure modes, like timeouts or mismatched responses, can lead to abrupt connection drops, necessitating retry mechanisms that further amplify overhead.[15]

In modern computing, handshakes are indispensable for the scalability of distributed systems, where myriad devices must interoperate seamlessly across networks.[13] Their role is particularly vital in Internet of Things (IoT) ecosystems and cloud environments, enabling secure, synchronized data flows among heterogeneous devices while supporting real-time applications in industrial automation and beyond.[13] Without effective handshaking, the reliability and efficiency of these expansive, interconnected infrastructures would be severely compromised.[10]

Types of Handshakes

Software Handshakes

Software handshakes, also known as software flow control, involve the use of special control characters embedded within the data stream to coordinate data transmission between devices, without requiring dedicated hardware control lines. This method relies on in-band signaling over the existing transmit and receive data lines, typically in serial communications.[16][17]

The primary mechanism uses ASCII control characters: XON (Transmit On, DC1, hexadecimal 0x11) to resume transmission and XOFF (Transmit Off, DC3, hexadecimal 0x13) to pause it. When a receiving device's buffer approaches capacity, it sends an XOFF character to the sender, which halts data flow until an XON is received. This process ensures synchronization and prevents buffer overflows, though it can be less reliable if control characters are corrupted or misinterpreted as data.[16][18]

Software handshakes operate at the physical or data link layer of the OSI model and are commonly applied in asynchronous serial interfaces, such as RS-232 connections between computers and peripherals like printers or modems. They are particularly useful in scenarios with limited cabling, as no extra wires are needed beyond TX, RX, and ground. However, they introduce potential latency from processing control characters and risk data disruption if the protocol lacks error checking.[16]

Compared to hardware handshakes, software methods offer flexibility for software-configurable systems but may incur higher CPU overhead for parsing control characters. In some setups, both can be combined, with hardware taking precedence if enabled.[19]

The origins of software handshaking date to the early 1960s, coinciding with the development of ASCII (1963) and asynchronous teletypewriter systems, where control characters were used to manage transmission over telephone lines. It became standardized in serial communication protocols and remains in use for legacy and embedded systems.[16][20]
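As a rough illustration of the XON/XOFF mechanism, the toy receiver below (a hypothetical `XonXoffReceiver` class, not a real serial API) emits DC3 when its buffer crosses a high-water mark and DC1 once it drains below a low-water mark:

```python
XON, XOFF = b"\x11", b"\x13"   # DC1 and DC3 control characters

class XonXoffReceiver:
    """Toy receiver that pauses the sender when its buffer fills up."""
    def __init__(self, high_water=4, low_water=1):
        self.buffer = bytearray()
        self.high_water = high_water
        self.low_water = low_water
        self.paused = False

    def feed(self, byte: bytes) -> bytes:
        """Accept one data byte; return any control byte to send back."""
        self.buffer += byte
        if not self.paused and len(self.buffer) >= self.high_water:
            self.paused = True
            return XOFF            # ask the sender to stop
        return b""

    def drain(self, n: int) -> bytes:
        """Consume n bytes from the buffer; resume the sender if room frees up."""
        del self.buffer[:n]
        if self.paused and len(self.buffer) <= self.low_water:
            self.paused = False
            return XON             # ask the sender to resume
        return b""

rx = XonXoffReceiver()
controls = [rx.feed(bytes([b])) for b in b"abcd"]
print(controls)      # XOFF appears once the high-water mark is hit
print(rx.drain(3))   # draining below the low-water mark emits XON
```

The in-band nature of the scheme is visible here: the control bytes travel on the same channel as the data, which is exactly why a corrupted DC1/DC3 can stall or garble the stream.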

Hardware Handshakes

Hardware handshakes involve signal-based exchanges between devices using dedicated control pins or electrical lines at the physical or data link layer, enabling direct coordination without reliance on higher-layer software protocols.[21] These mechanisms ensure reliable data flow by signaling device readiness and managing transmission timing through voltage level changes, where +3 V to +15 V represents logic 0 (SPACE) and -3 V to -15 V represents logic 1 (MARK) in standards like RS-232.[22] This approach is particularly suited to environments where precise, low-level synchronization is required to prevent data loss or buffer overflows.[23]

Common mechanisms include the Request-to-Send (RTS) and Clear-to-Send (CTS) signals, which provide hardware flow control in serial interfaces. In this protocol, the data terminal equipment (DTE), such as a computer, asserts the RTS line (pin 4) to indicate readiness to transmit data, prompting the data circuit-terminating equipment (DCE), like a modem, to assert CTS (pin 5) when it can receive, thereby initiating data transfer.[21] Additional signals, such as Data Terminal Ready (DTR) and Data Set Ready (DSR), may complement RTS/CTS by confirming overall connection status before communication begins.[22] These voltage-driven interchanges allow for simple synchronization without embedding control information in the data stream itself.[23]

Hardware handshakes find primary application in point-to-point serial links, such as RS-232 connections between computers and peripherals like printers or sensors, where software cannot reliably predict timing due to variable processing delays.[21] For instance, in environmental control systems interfacing with devices like thermostats, RTS/CTS ensures half-duplex communication proceeds only when both ends are prepared, avoiding transmission errors in unreliable timing scenarios.[23] This is essential for legacy and embedded systems relying on direct electrical signaling for basic coordination.[22]

Compared to software handshakes, hardware methods offer lower latency through immediate electrical responses, eliminating the need for packet-based acknowledgments and reducing CPU overhead.[24] They impose no additional data overhead, making them efficient for real-time flow control, though their scope is limited to straightforward negotiations such as readiness signaling rather than complex parameter exchanges like baud rate or parity settings.[21] In hybrid systems, hardware handshakes can integrate with software approaches to provide layered control for more robust communication.[24]

The origins of hardware handshaking trace back to the 1960s, evolving from teletypewriter systems that required reliable signal coordination for early data transmission over telephone lines. These mechanisms were formalized in the EIA RS-232 standard, first published in 1962 by the Electronic Industries Association to standardize interfaces between data terminals and modems, with subsequent revisions ensuring compatibility across serial ports.[21][22]
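The RTS/CTS interplay can be sketched as a pair of toy state machines. The `Modem` and `Computer` classes here are illustrative stand-ins for the DCE and DTE, not a real serial driver:

```python
class Modem:
    """Toy DCE: asserts CTS only when it can accept data."""
    def __init__(self):
        self.cts = False
        self.busy = False

    def on_rts(self, asserted: bool):
        # Grant CTS only if RTS is asserted and the modem is free.
        self.cts = asserted and not self.busy

class Computer:
    """Toy DTE: raises RTS and transmits only after seeing CTS."""
    def __init__(self, modem: Modem):
        self.modem = modem

    def send(self, data: str) -> bool:
        self.modem.on_rts(True)    # assert RTS (pin 4)
        if self.modem.cts:         # check CTS (pin 5)
            return True            # data would be clocked out here
        return False               # hold the data until CTS is granted

modem = Modem()
dte = Computer(modem)
print(dte.send("AT"))   # True: modem granted CTS
modem.busy = True
print(dte.send("AT"))   # False: modem withheld CTS
```

Unlike XON/XOFF, no control information shares the data lines; readiness is conveyed entirely out of band, which is why the mechanism is better described as flow control than as a full handshake.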

Handshakes in Networking Protocols

TCP Three-Way Handshake

The TCP three-way handshake is a fundamental mechanism in the Transmission Control Protocol (TCP) for establishing a reliable, connection-oriented communication session between a client and a server. It consists of three sequential steps—Synchronize (SYN), Synchronize-Acknowledge (SYN-ACK), and Acknowledge (ACK)—designed to synchronize sequence numbers and verify bidirectional reachability, ensuring both endpoints can reliably exchange data without prior state assumptions. This process prevents issues like old or duplicate packets from previous connections interfering with new ones, as each side independently generates and confirms its initial sequence number (ISN). Defined in the original TCP specification, the handshake operates at the transport layer and is essential for TCP's reliability features, such as ordered delivery and error recovery.[25]

The process begins with the client (active opener) initiating a connection by sending a SYN segment to the server (passive opener). This SYN packet includes the client's 32-bit ISN, randomly selected to avoid predictability, and sets the SYN control flag while leaving the acknowledgment number undefined. Upon receipt, the server responds with a SYN-ACK segment, which includes its own 32-bit ISN, sets both SYN and ACK flags, and specifies an acknowledgment number equal to the client's ISN plus one (ACK = client's ISN + 1) to confirm receipt of the SYN; this step consumes one sequence number from the server's side. Finally, the client sends an ACK segment acknowledging the server's ISN plus one (ACK = server's ISN + 1), completing the handshake and transitioning both endpoints to the ESTABLISHED state, where data transmission can begin. Each segment may carry TCP options, and the SYN and SYN-ACK packets do not carry application data to maintain the handshake's focus on connection setup.[25][26][27]

During the handshake, several key parameters are exchanged to optimize the connection. The Maximum Segment Size (MSS) is advertised via a TCP option in the SYN and SYN-ACK segments, allowing each side to inform the other of its maximum receivable data payload size (typically derived from the interface MTU minus headers), though it is not formally negotiated but unilaterally stated for the peer to respect. The initial receive window size, a 16-bit field in the TCP header, is included in all segments to enable flow control by indicating the amount of buffer space available for incoming data, starting from the SYN-ACK onward. Additionally, support for Selective Acknowledgments (SACK) can be proposed through the SACK-Permitted option in the SYN segment, permitting the receiver to acknowledge non-contiguous blocks of data later in the session if both sides agree during the handshake. These parameters ensure efficient data transfer tailored to the network path.[28][29][30]

Sequence number synchronization relies on a simple yet robust formula: each endpoint generates a random 32-bit ISN at the start, with the SYN segment advancing the sequence by one (as SYN consumes a slot), so subsequent data begins at ISN + 1; acknowledgments confirm this by setting ACK = peer's ISN + 1 in the response. This mutual confirmation—client acknowledging server's ISN + 1 in the final ACK, and server having already acknowledged the client's—ensures both sides agree on the starting point for byte-stream numbering, preventing desynchronization. In mathematical terms, for client ISN_c and server ISN_s:
\text{SYN: } \text{SEQ} = \text{ISN}_c, \quad \text{CTL} = \text{SYN}
\text{SYN-ACK: } \text{SEQ} = \text{ISN}_s, \quad \text{ACK} = \text{ISN}_c + 1, \quad \text{CTL} = \text{SYN, ACK}
\text{ACK: } \text{SEQ} = \text{ISN}_c + 1, \quad \text{ACK} = \text{ISN}_s + 1, \quad \text{CTL} = \text{ACK}
The 32-bit sequence number space wraps around after 2^32 bytes, but the handshake's design minimizes collision risks through ISN randomization.[31][32][33]

A notable security concern with the three-way handshake is vulnerability to SYN flooding attacks, where an attacker sends numerous SYN packets without completing the handshake, exhausting the server's half-open connection queue and denying service to legitimate clients. This exploits the server's need to allocate resources (like state and buffers) upon receiving a SYN, potentially leading to resource depletion. A common mitigation is the use of SYN cookies, a cryptographic technique where the server encodes state information into the ISN of the SYN-ACK without storing per-connection data; upon receiving the final ACK, the client-provided cookie is verified to restore state only for valid connections, reducing memory usage during floods.[34][35]

The three-way handshake was originally standardized in RFC 793 in 1981 as part of the core TCP specification. Subsequent updates have maintained its core mechanics while enhancing compatibility, such as adaptations for IPv6 transport in RFC 2460, which ensure the handshake functions identically over IPv6 addresses without altering the sequence synchronization process, and further refinements in RFC 9293, the 2022 update to the TCP specification, which clarifies options handling.[36]
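A SYN-cookie scheme along the lines described above can be sketched as follows. The layout (5 bits of coarse time plus a 24-bit truncated HMAC) and the `SECRET` key are illustrative choices, not the exact Linux implementation, which additionally encodes the client's MSS:

```python
import hmac, hashlib, time

SECRET = b"per-server secret"   # hypothetical key known only to the server

def make_cookie(src, dst, sport, dport):
    """Fold connection identity into the SYN-ACK ISN instead of storing state."""
    t = (int(time.time()) >> 6) & 0x1F                 # coarse 5-bit time counter
    msg = f"{src}:{sport}->{dst}:{dport}|{t}".encode()
    mac = int.from_bytes(hmac.new(SECRET, msg, hashlib.sha256).digest()[:3], "big")
    return ((t << 27) | mac) & 0xFFFFFFFF              # 5 bits time + 24-bit MAC

def check_ack(src, dst, sport, dport, ack_number):
    """On the final ACK, recompute the MAC; accept only if it matches."""
    cookie = (ack_number - 1) & 0xFFFFFFFF             # ACK = server ISN + 1
    t = cookie >> 27
    msg = f"{src}:{sport}->{dst}:{dport}|{t}".encode()
    mac = int.from_bytes(hmac.new(SECRET, msg, hashlib.sha256).digest()[:3], "big")
    return (cookie & 0x07FFFFFF) == mac

isn = make_cookie("10.0.0.1", "10.0.0.2", 40000, 80)
print(check_ack("10.0.0.1", "10.0.0.2", 40000, 80, (isn + 1) & 0xFFFFFFFF))  # True
print(check_ack("10.0.0.1", "10.0.0.2", 40000, 80, 12345))  # almost certainly False
```

Because the cookie is recomputable from the final ACK alone, the server keeps no per-connection state during a flood; the cost is that options not encoded in the cookie are lost.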

TLS Handshake

The TLS handshake is a multi-phase cryptographic exchange protocol that establishes a secure communication session between a client and a server over a reliable transport layer, such as TCP, by authenticating the parties, negotiating cryptographic parameters, and deriving symmetric session keys for subsequent encrypted data transfer.[37] This process ensures confidentiality, integrity, and authenticity while preventing eavesdropping and tampering during session initiation.[38]

The handshake begins with the ClientHello message, in which the client specifies supported TLS versions (e.g., up to 1.3), a list of preferred cipher suites defining encryption algorithms and key exchange methods, a 32-byte random nonce for freshness, and optional extensions such as supported key exchange groups (e.g., secp256r1 or X25519).[39] The server responds with a ServerHello message selecting the highest mutually supported version and cipher suite, its own 32-byte random nonce, and relevant extensions, followed by its X.509 certificate chain for public-key authentication.[40]

The key exchange phase then occurs, typically using ephemeral Diffie-Hellman (DHE) or elliptic curve Diffie-Hellman (ECDHE) for forward secrecy, where the client and server exchange public values to compute a shared premaster secret without transmitting it directly; as of November 2025, hybrid post-quantum key exchanges—combining classical methods like X25519 with post-quantum algorithms such as ML-KEM (Kyber)—are widely adopted for quantum-resistant security, with over 50% of traffic on major platforms using such protections. Alternatively, RSA-based exchange encrypts the premaster secret with the server's public key from the certificate.[41][42] The handshake concludes with Finished messages from both parties, each containing a message authentication code (MAC) computed over the entire handshake transcript using the newly derived keys to verify integrity and prevent replay attacks.[43]

Central to the TLS handshake are concepts from public-key cryptography, which facilitate initial server (and optionally client) authentication via digital certificates, transitioning to efficient symmetric cryptography for the session's bulk encryption using algorithms like AES.[44] Extensions such as Server Name Indication (SNI) allow the client to specify the target hostname in the ClientHello, enabling virtual hosting on shared IP addresses without compromising security.[45]

Key derivation combines the premaster secret with the client and server random nonces through a pseudorandom function (PRF); in earlier versions like TLS 1.2, this uses a PRF based on HMAC-SHA256, while TLS 1.3 employs HKDF (HMAC-based key derivation function) for enhanced security.[46] For instance, in TLS 1.3, traffic secrets are derived as:
\text{Traffic Secret} = \text{HKDF-Expand-Label}(\text{Handshake Secret}, \text{"c ap traffic"}, \text{Transcript-Hash}, \text{Hash.length})
where HKDF-Expand-Label applies HKDF-Extract and HKDF-Expand with a label and a context taken from the handshake transcript hash.[47]

The protocol evolved from the insecure Secure Sockets Layer (SSL) 1.0, proposed by Netscape in 1994 and never released due to vulnerabilities, through SSL 2.0 (1995, flawed authentication) and SSL 3.0 (1996, basis for TLS), to TLS 1.0 (1999, RFC 2246), which standardized the protocol under the IETF.[48] Subsequent versions addressed weaknesses: TLS 1.1 (2006, RFC 4346) mitigated CBC padding oracle attacks, TLS 1.2 (2008, RFC 5246) introduced flexible signature algorithms and AEAD ciphers, and TLS 1.3 (2018, RFC 8446) streamlined the handshake to a single round-trip (1-RTT) by integrating key exchange earlier and mandating forward secrecy.[49][38][37]

TLS incorporates security features like Perfect Forward Secrecy (PFS), achieved through ephemeral key exchanges (e.g., ECDHE) that ensure compromise of long-term keys does not expose past sessions, and resistance to downgrade attacks via explicit version negotiation in extensions and authenticated checks in the ServerHello random field or Finished MACs.[40] These mechanisms build on an underlying reliable transport like TCP to focus solely on cryptographic security.[50]
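The labeled expansion used by TLS 1.3 can be reproduced with the standard library. This sketch implements HKDF-Expand (RFC 5869) and the HkdfLabel framing from RFC 8446 and applies them to dummy inputs; the zero-filled secret and sample transcript are placeholders only:

```python
import hmac, hashlib

def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """RFC 5869 HKDF-Expand with HMAC-SHA256."""
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

def hkdf_expand_label(secret: bytes, label: str, context: bytes, length: int) -> bytes:
    """TLS 1.3 HkdfLabel framing (RFC 8446, Section 7.1)."""
    full_label = b"tls13 " + label.encode()
    hkdf_label = (length.to_bytes(2, "big")
                  + bytes([len(full_label)]) + full_label
                  + bytes([len(context)]) + context)
    return hkdf_expand(secret, hkdf_label, length)

# Derive an illustrative 32-byte traffic secret from placeholder inputs.
handshake_secret = bytes(32)                       # dummy, not a real secret
transcript_hash = hashlib.sha256(b"transcript").digest()
secret = hkdf_expand_label(handshake_secret, "c ap traffic", transcript_hash, 32)
print(len(secret))   # 32
```

Binding the transcript hash into the derivation is what ties the resulting keys to this specific handshake: any tampering with earlier messages yields different keys on the two sides.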

SMTP Handshake

The SMTP handshake is the initial command-based exchange in the Simple Mail Transfer Protocol (SMTP) that establishes a session between a client Mail Transfer Agent (MTA) and a server MTA for email delivery, enabling capability negotiation and identity verification before message transfer begins.[51] This process occurs at the application layer over a TCP connection and differs from lower-layer handshakes by relying on textual commands and responses rather than binary flags or cryptographic keys. It ensures both parties agree on protocol features, such as support for extended capabilities, while identifying the domains involved in the transaction.[52]

The handshake begins when the client initiates a TCP connection to the server on port 25, prompting the server to issue a 220 "Service ready" greeting, which may include the server's domain and software details.[53] The client then sends an EHLO (Extended SMTP) command followed by its fully qualified domain name (FQDN) or address literal, signaling support for SMTP extensions; alternatively, it uses the simpler HELO command for basic SMTP without extensions.[54] In response, the server replies with a multiline 250 "OK" status, listing supported extensions—such as AUTH for authentication or STARTTLS for opportunistic TLS encryption—in a parameterized format that allows the client to select compatible features.[55] If EHLO fails or extensions are unavailable, the client falls back to HELO, and the server confirms with a single 250 response, clearing any prior state.[54]

Key parameters in the handshake include the client's domain identification via EHLO or HELO, which verifies the sender's origin, and the server's advertised features, such as 8BITMIME for transporting 8-bit text content without alteration or PIPELINING for sending multiple commands without waiting for individual responses to improve efficiency.[52] These extensions are registered with IANA and defined in separate RFCs, allowing modular enhancement of the base protocol.[56] Following the greeting exchange, the client may negotiate authentication using AUTH parameters or initiate encryption with STARTTLS if advertised, upgrading the session to TLS without altering the core handshake flow.[57]

Error handling during the handshake involves standardized reply codes; for instance, a 421 "Service not available" response indicates the server is busy or shutting down, prompting the client to close the connection and implement retry logic with exponential backoff.[58] Other errors, like 500 for syntax issues or 503 for invalid sequence, terminate the session immediately, distinguishing temporary (4xx) from permanent (5xx) failures to guide client behavior.[59]

The SMTP handshake is standardized in RFC 5321, published in 2008, which consolidates and updates the original RFC 821 from 1982 by incorporating Extended SMTP (ESMTP) mechanisms for modern extensions while maintaining backward compatibility.[60][61] This evolution ensures robust session setup across diverse email infrastructures. In the broader email flow, the handshake establishes sender and receiver identities through domain parameters, paving the way for subsequent MAIL FROM and RCPT TO commands that specify paths without re-verifying basics.[62] For illustration, a typical EHLO exchange might appear as follows:
C: EHLO client.example.com
S: 250-server.example.com Hello client.example.com
S: 250-8BITMIME
S: 250-PIPELINING
S: 250-STARTTLS
S: 250 OK
This format highlights the server's multiline response, enabling the client to proceed with feature-aware commands.[63]
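A client consuming such a reply has to distinguish "250-" continuation lines from the final "250 " (space) line. The small parser below is illustrative only; the `parse_ehlo_reply` helper is hypothetical and not part of any SMTP library:

```python
def parse_ehlo_reply(lines):
    """Parse a multiline 250 EHLO reply into the set of advertised extensions."""
    features = set()
    for i, line in enumerate(lines):
        code, sep, rest = line[:3], line[3:4], line[4:]
        if code != "250":
            raise ValueError(f"unexpected reply code: {code}")
        if i > 0:                      # first 250 line carries the greeting, not a feature
            features.add(rest.split()[0].upper())
        if sep == " ":                 # a space after the code marks the final line
            break
    return features

reply = [
    "250-server.example.com Hello client.example.com",
    "250-8BITMIME",
    "250-PIPELINING",
    "250 STARTTLS",
]
print(sorted(parse_ehlo_reply(reply)))   # ['8BITMIME', 'PIPELINING', 'STARTTLS']
```

The client would then consult this feature set before attempting, say, STARTTLS or command pipelining.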

Handshakes in Wireless and Physical Communications

WPA2 Four-Way Handshake

The WPA2 four-way handshake is a cryptographic protocol exchange used in Wi-Fi Protected Access 2 (WPA2) to authenticate a client device (supplicant) and an access point (authenticator), confirm possession of a pre-shared key (PSK), and derive session-specific encryption keys for secure wireless communication. This process occurs after initial association and open system authentication in the IEEE 802.11 protocol, enabling the establishment of a Robust Security Network Association (RSNA). It protects unicast traffic with a pairwise transient key (PTK) and multicast/broadcast traffic with a group temporal key (GTK), using the Advanced Encryption Standard (AES) in Counter Mode with Cipher Block Chaining Message Authentication Code Protocol (CCMP). The handshake is defined in the IEEE 802.11i-2004 standard, which forms the basis for WPA2 certification by the Wi-Fi Alliance.[64] The handshake consists of four Extensible Authentication Protocol over LAN (EAPOL)-Key messages transmitted between the supplicant and authenticator over the 802.11 medium. In the first message, the authenticator initiates the exchange by sending an EAPOL-Key frame containing its randomly generated authenticator nonce (ANonce), along with the key confirmation key (KCK) identifier and replay counter, to the supplicant; this ANonce contributes to key freshness and prevents replay attacks. The supplicant responds in the second message with its own supplicant nonce (SNonce), the message integrity code (MIC) computed using the KCK to verify mutual possession of the pairwise master key (PMK), and its own replay counter, allowing both parties to derive the PTK independently. The authenticator then sends the third message, which includes a MIC for confirmation, the GTK encrypted with the key encryption key (KEK) derived from the PTK, and instructions for the supplicant to install the keys; upon receipt, the authenticator installs the PTK for unicast traffic. 
Finally, the supplicant acknowledges in the fourth message with an EAPOL-Key frame containing a MIC but no key data, confirming key installation and completing the handshake; on receiving it, the authenticator installs the PTK for unicast traffic, after which encrypted data transmission begins.

In personal mode, the PMK is generated from the PSK using the Password-Based Key Derivation Function 2 (PBKDF2) with HMAC-SHA1, incorporating the passphrase and the network's service set identifier (SSID). The PTK is then computed via a pseudo-random function (PRF) that takes the PMK, ANonce, SNonce, and the MAC addresses of the supplicant and authenticator as inputs, yielding a 384-bit key partitioned into the KCK (128 bits) for MIC calculations, the KEK (128 bits) for GTK encryption, and the temporal key (128 bits) for data encryption. This derivation makes the PTK unique per session, as the nonces provide fresh entropy, and the MICs in messages 2 and 3 mutually authenticate the parties without revealing the PSK. In enterprise modes, the PMK may instead derive from an Extensible Authentication Protocol (EAP) exchange under IEEE 802.1X, but the four-way handshake remains the same for key confirmation.

A significant limitation of the personal-mode handshake is that it does not protect a weak PSK against offline dictionary attacks: the nonces, MAC addresses, and MICs all travel in the clear, so an attacker who captures a single handshake has everything needed to test passphrase guesses offline without further interaction with the network. This is why strong passphrases are essential under WPA2, and why WPA3 replaced this step with Simultaneous Authentication of Equals (SAE), which resists offline guessing. A separate vulnerability, the Key Reinstallation Attack (KRACK) disclosed in 2017, exploited flaws in the handshake's nonce management and key installation logic, allowing an attacker within radio range to force repeated key reinstallations, decrypt some traffic, and potentially inject or replay packets without compromising the PSK itself; it was addressed through firmware updates and further mitigated by WPA3's protections.
The protocol's design in IEEE 802.11i-2004 has been widely adopted since WPA2's introduction, with certification mandating CCMP encryption and the four-way handshake so that devices interoperate securely in personal and enterprise Wi-Fi networks.[65][64]
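The key derivation described above can be sketched with the Python standard library alone. The PRF follows the 802.11i construction of iterated HMAC-SHA1 over the label "Pairwise key expansion"; the passphrase, SSID, MAC addresses, and nonces below are placeholders for illustration, not real credentials.

```python
import hashlib
import hmac

def derive_pmk(passphrase: str, ssid: str) -> bytes:
    """PMK = PBKDF2-HMAC-SHA1(passphrase, SSID, 4096 iterations, 32 bytes)."""
    return hashlib.pbkdf2_hmac("sha1", passphrase.encode(), ssid.encode(), 4096, 32)

def derive_ptk(pmk: bytes, aa: bytes, spa: bytes, anonce: bytes, snonce: bytes) -> bytes:
    """802.11i PRF-384: expand the PMK into a 48-byte PTK. Addresses and
    nonces are concatenated in sorted order, so both sides derive the same key."""
    data = min(aa, spa) + max(aa, spa) + min(anonce, snonce) + max(anonce, snonce)
    blob = b""
    for i in range(3):  # three 20-byte SHA-1 outputs cover the 48-byte PTK
        blob += hmac.new(pmk, b"Pairwise key expansion" + b"\x00" + data + bytes([i]),
                         hashlib.sha1).digest()
    return blob[:48]

# Placeholder credentials, MAC addresses, and nonces (illustrative only).
pmk = derive_pmk("example-passphrase", "ExampleSSID")
aa, spa = bytes.fromhex("001122334455"), bytes.fromhex("aabbccddeeff")
anonce, snonce = b"\x01" * 32, b"\x02" * 32
ptk = derive_ptk(pmk, aa, spa, anonce, snonce)
kck, kek, tk = ptk[:16], ptk[16:32], ptk[32:48]  # MIC key, GTK-wrap key, CCMP key
```

The sorted concatenation is what lets supplicant and authenticator compute an identical PTK even though each sees the inputs in a different role order.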

Serial Hardware Handshaking

Serial hardware handshaking, also known as hardware flow control, employs dedicated control signals in serial communication standards such as RS-232 and RS-485 to manage data transmission rates and prevent buffer overflows between devices.[66] In RS-232, common signals include Request to Send (RTS) on pin 4, Clear to Send (CTS) on pin 5, Data Terminal Ready (DTR) on pin 20, and Data Set Ready (DSR) on pin 6, which allow devices such as computers (Data Terminal Equipment, DTE) and modems (Data Circuit-terminating Equipment, DCE) to indicate readiness for data exchange.[67] These signals operate at electrical levels defined by the RS-232 standard, typically ±3 to ±15 volts, ensuring reliable detection over distances of up to about 50 feet.

In RTS/CTS handshaking, the sender (DTE) asserts RTS to request permission to transmit; the receiver (DCE) responds by asserting CTS if its buffer has space, enabling data flow on the TX/RX lines.[68] If the receiver's buffer nears capacity, it deasserts CTS, pausing transmission until space is available again, thus providing real-time pacing without software intervention.[66] DTR/DSR serves a similar but more static role, with the DTE asserting DTR to signal operational readiness and the DCE responding with DSR to confirm connection establishment, often used alongside RTS/CTS for complete control.[67]

In RS-485, which supports multi-drop networks over longer distances (up to 4000 feet), handshaking is adapted for half-duplex operation: RTS may control a driver's enable/disable line for transmit/receive switching, though full RTS/CTS is less common due to the two-wire differential setup.[69] Configurations support both full-duplex and half-duplex modes; in full-duplex RS-232, separate TX and RX lines allow simultaneous bidirectional communication with continuous RTS/CTS monitoring, while half-duplex alternatives such as RS-485 rely on directional control via RTS to toggle between sending and receiving.[70] Software-assisted methods, such as XON/XOFF characters sent over the data line and triggered by buffer thresholds, can complement pure hardware signals in mixed setups, though they require compatible protocol support.[66]

Parameters such as baud rate, data bits, parity, and stop bits are established through prior configuration rather than explicit negotiation; the handshake itself is concerned only with flow control to maintain data integrity. For instance, both devices are set to a predefined baud rate (e.g., 9600 bps), and handshaking then provides ongoing pacing without altering these settings mid-session.[67]

Serial hardware handshaking became standard in the 1960s with the RS-232 specification, initially used to synchronize teletypewriters and early data terminals with modems and other telecommunication equipment.[71] It remained prevalent through the 1980s and 1990s in PC serial ports for connecting printers, mice, and industrial sensors, but has been largely supplanted by USB in consumer applications since the early 2000s due to higher speeds and plug-and-play simplicity.[72] Nonetheless, it persists in industrial IoT environments for its robustness in electrically noisy settings, such as factory automation and SCADA systems, where RS-485 multi-node networks benefit from simple handshaking to connect legacy PLCs and sensors.[73]

Common troubleshooting issues include crossed or improperly wired control lines, such as swapped RTS and CTS in null-modem cables, which can cause perpetual pauses or data loss by preventing proper assertion.[74] Other frequent problems involve incompatible voltage levels between devices (e.g., TTL vs. RS-232), leading to undetected signals, or disconnected handshake pins in minimal three-wire setups, resulting in unchecked buffer overflows.[75] Verifying cable pinouts against the DB-9 or DB-25 standard and running loopback tests to confirm signal integrity resolves most of these faults.[76]
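The pause-and-resume behavior of RTS/CTS pacing can be illustrated with a toy software simulation (no real serial hardware involved): the receiver deasserts CTS once its buffer reaches a high-water mark, and the sender transmits only while CTS is asserted. The class names, buffer sizes, and threshold here are illustrative choices, not values from any standard.

```python
class Receiver:
    """Toy DCE: deasserts CTS once its receive buffer hits a high-water mark."""
    def __init__(self, capacity: int = 8, high_water: int = 6):
        self.buffer = []
        self.capacity = capacity
        self.high_water = high_water

    @property
    def cts(self) -> bool:
        # CTS is asserted only while the buffer is below the high-water mark.
        return len(self.buffer) < self.high_water

    def receive(self, byte: int) -> None:
        assert len(self.buffer) < self.capacity, "buffer overrun"
        self.buffer.append(byte)

    def drain(self, n: int = 1) -> None:
        # Consume n bytes, freeing space (which re-asserts CTS if below the mark).
        del self.buffer[:n]

def send(receiver: Receiver, data) -> list:
    """Toy DTE: transmit only while CTS is asserted; return bytes left unsent."""
    pending = list(data)
    while pending and receiver.cts:
        receiver.receive(pending.pop(0))
    return pending
```

Sending more data than the high-water mark allows leaves the excess pending until the receiver drains its buffer, mirroring how CTS deassertion pauses a real transmitter mid-stream.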

Dial-Up Modem Negotiation

Dial-up modem negotiation refers to the initial connection establishment process between two modems over analog telephone lines, involving a series of acoustic and electrical signals to detect carriers, negotiate speeds, select protocols, and train the connection for reliable data transmission. The handshake is a hybrid analog-digital procedure in which modems exchange audio tones and modulated signals to adapt to line conditions, producing the characteristic screeching sounds audible during connection. The process ensures compatibility and optimal performance despite impairments such as noise and attenuation in the public switched telephone network (PSTN).[77]

The negotiation unfolds in multiple phases, beginning with carrier detection and echo suppression. In the first phase, aligned with ITU-T V.8, the answering modem emits an amplitude-modulated, periodically phase-reversed 2100 Hz tone (known as ANSam) for approximately 3 seconds to disable echo cancellers in the telephone network, enabling full-duplex operation; the originating modem responds with V.21-modulated signals to confirm capabilities. This is followed by a V.8 bis transaction in which the modems exchange binary information via differential phase-shift keying (DPSK) at 600 bps to identify supported modulation schemes and select protocols.[78][77]

Subsequent phases focus on line probing, equalization, and training, as defined in ITU-T V.34 (standardized in 1994 for speeds up to 28.8 kbps and extended to 33.6 kbps in 1996). Phase 2 involves probing with tone pairs (L1/L2 sequences at frequencies from 500 Hz to 3900 Hz) and ranging signals (A/A' at 2400 Hz and B/B' at 1200 Hz with phase reversals) to assess channel characteristics, detect the carrier, and estimate transmit power levels. In Phase 3, the modems perform equalizer and preamble training using scrambled binary ones (the TRN signal) and partial response signaling, allowing speed negotiation based on line quality; phase-locked loops synchronize the carriers during this stage.
Phase 4 completes final training with additional scrambled ones and modulation parameter (MP) sequences, establishing full-duplex mode and enabling fallback to lower speeds if higher rates fail due to poor conditions.[78][77]

Following modulation training, the modems configure error correction and framing via ITU-T V.42, which implements the Link Access Procedure for Modems (LAPM), an HDLC-based protocol for detecting and retransmitting erroneous frames to ensure data integrity over the noisy analog link. The V.92 standard (2000), an enhancement to V.90, retains V.90's 56 kbps downstream rate while improving upstream speeds, and keeps a similar handshake structure; its quick-connect feature caches line characteristics from prior calls to reduce negotiation time by up to 50% on repeated connections.[79][80] This process builds on serial hardware handshaking principles by adding telephony-specific audio phases for analog adaptation.

Dial-up negotiation peaked in popularity during the 1990s as the primary means of internet access, but became obsolete with the rise of broadband in the early 2000s; it remains historically significant for enabling rural and low-cost connectivity before widespread DSL and cable adoption.[81][82]
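As a rough illustration of the answer-tone phase, the sketch below synthesizes samples of an ANSam-like signal: a 2100 Hz carrier with 15 Hz amplitude modulation and a phase reversal every 450 ms, using the parameters described above. The sample rate and modulation depth are illustrative assumptions, and a real modem would of course generate this in analog hardware or a DSP.

```python
import math

def ansam_samples(duration_s: float = 3.0, rate: int = 8000, am_depth: float = 0.2):
    """Synthesize an ANSam-like answer tone: 2100 Hz carrier, 15 Hz amplitude
    modulation, phase reversal every 450 ms. Returns floats in about [-1.2, 1.2]."""
    out = []
    for n in range(int(duration_s * rate)):
        t = n / rate
        phase = math.pi if int(t / 0.450) % 2 else 0.0   # periodic phase reversal
        envelope = 1.0 + am_depth * math.sin(2 * math.pi * 15 * t)
        out.append(envelope * math.sin(2 * math.pi * 2100 * t + phase))
    return out
```

The amplitude modulation and phase reversals are what distinguish ANSam from the plain 2100 Hz answer tone of earlier modems, signalling to network echo cancellers and to the far modem that V.8 negotiation is supported.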

Modern and Peripheral Applications

Mobile Device Charging Protocols

Mobile device charging protocols involve an initial signal exchange between the device and charger to detect the charger type, negotiate appropriate voltage and current levels, and identify the device's power capabilities, ensuring safe and efficient power delivery without data transfer. This handshake parallels hardware handshakes in peripherals by establishing a basic connection for power negotiation.[83]

In USB wired charging, the Battery Charging Specification 1.2 (BC 1.2), released by the USB Implementers Forum (USB-IF), employs voltage detection on the D+ and D- data lines to classify the charger. For a Dedicated Charging Port (DCP), the device applies a source voltage of 0.5–0.7 V to D+ (V_DP_SRC) and checks whether the voltage on D- exceeds 0.25 V (V_DAT_REF); a reading above the threshold reveals a short between D+ and D- (resistance under 200 Ω), indicating no data signaling but dedicated power of up to 1.5 A at 5 V. A Charging Downstream Port (CDP) is identified similarly, but during secondary detection the port responds by applying 0.5–0.7 V to D-, allowing up to 1.5 A while still supporting data. Standard Downstream Ports (SDP) show no such voltages, limiting current to 500 mA.[84]

For wireless charging, the Qi standard developed by the Wireless Power Consortium (WPC) since 2010 uses in-band communication through modulated power packets over inductive coupling to negotiate power levels up to 15 W. Version 2.0 (Qi2, released April 2023) introduced magnetic alignment for precise positioning and improved efficiency, with subsequent versions 2.1 (September 2024) and 2.2 (2025) adding protocol refinements while maintaining the 15 W baseline. The process begins with a digital ping from the transmitter to detect a receiver, followed by signal strength assessment and configuration packets in which the receiver requests a specific power profile, such as 5 W or 15 W. Foreign object detection (FOD) is integrated via power loss monitoring during this exchange, comparing expected versus actual power transfer to identify metallic interference and halt charging if needed.[85][86]

The handshake steps typically start with the device asserting its configuration: in USB, by applying test voltages to the D+ or D- lines for 40 ms (T_VDPSRC_ON); the charger then responds by enabling the VBUS power supply within 1 second if compatible, such as activating 5 V at up to 1.5 A for BC 1.2 ports. In Qi, the receiver sends identification packets after the ping, and power transfer begins only after successful negotiation. These steps ensure mutual compatibility before full power delivery.[84][86]

Safety features are embedded throughout, including overcurrent protection limiting output to 1.5 A in BC 1.2 and Qi's FOD to prevent overheating from transfer inefficiencies. Thermal monitoring throttles power if temperatures exceed safe thresholds, and BC 1.2 bounds transient voltages during load changes, capping overshoot at 6.0 V and requiring undershoot to stay above 4.1 V. The evolution continued with USB Power Delivery (PD): Revision 3.0 extended capabilities to 100 W via configurable voltages up to 20 V at 5 A, Revision 3.1 (2021) introduced Extended Power Range (EPR) up to 240 W using fixed voltages of 28 V, 36 V, and 48 V, and Revision 3.2 continued refining the protocol, all enabling bidirectional power flow so that devices such as laptops can supply or receive power dynamically.[84][87][86]

Standardization is overseen by the USB-IF for wired protocols such as BC 1.2 and PD, with compliance testing ensuring interoperability, while the WPC has managed Qi since its 2010 launch, certifying products for safe wireless power transfer.[83][87][88]
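The BC 1.2 classification logic described above can be sketched as a simple decision procedure. This is a deliberate simplification (real detection involves timed voltage sourcing, comparator windows, and a distinct secondary-detection phase); the 0.25 V V_DAT_REF threshold and the current limits are the values given in the text, and the function name is illustrative.

```python
V_DAT_REF = 0.25  # detection threshold in volts, per BC 1.2

def classify_port(dm_primary_v: float, dp_secondary_v: float) -> str:
    """Toy BC 1.2 port classification.

    Primary detection: the portable device sources ~0.6 V onto D+ and reads
    back D-.  Secondary detection: it sources onto D- and reads D+ to
    distinguish a dedicated charger (DCP) from a charging+data port (CDP).
    """
    if dm_primary_v <= V_DAT_REF:
        return "SDP: standard downstream port, 500 mA max"
    if dp_secondary_v > V_DAT_REF:
        return "DCP: dedicated charger, up to 1.5 A at 5 V"
    return "CDP: charging downstream port with data, up to 1.5 A"
```

A DCP reflects the sourced voltage in both phases because its D+ and D- pins are shorted, whereas a CDP only responds during primary detection, which is exactly the distinction the two threshold checks encode.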

USB and Peripheral Handshakes

USB peripheral handshakes encompass the plug-and-play enumeration and configuration processes that enable seamless integration of devices such as keyboards, mice, and storage drives into a host system. Upon connection, the host detects the device through voltage changes on the data lines and initiates a series of control transfers to query descriptors, assign a unique address, and configure endpoints for data communication. This process ensures compatibility and resource allocation without manual intervention, supporting hierarchical topologies via hubs that manage multiple downstream devices.[89][90]

Enumeration begins with device attachment, where the host identifies the connection via pull-up resistors on the D+ or D- lines. Speed detection follows: for USB 2.0, a pull-up on D+ indicates a full-speed device (12 Mbps), while high-speed devices (480 Mbps) use J/K-state chirps to confirm capabilities after an initial full-speed fallback. The host then issues a bus reset to synchronize the device, assigns an address via the SET_ADDRESS request, and sends GET_DESCRIPTOR requests for key information, including the device descriptor (USB version, class, subclass, protocol, vendor ID, product ID, and maximum packet size), the configuration descriptor (power requirements and interfaces), and interface descriptors (endpoints and alternate settings). The device responds with these structured binary descriptors, enabling the host to load appropriate drivers. Finally, the SET_CONFIGURATION request activates a specific configuration, finalizing endpoint setup and allowing normal operation. Hubs play a crucial role by relaying these requests and managing port status changes for attached peripherals.[89][90]

USB 2.0 and later versions (collectively USB 3.x under the USB 3.2 specification) differ in link establishment, particularly for SuperSpeed modes.
In USB 3.x, initial attachment uses USB 2.0 signaling for fallback compatibility, but SuperSpeed detection employs Low-Frequency Periodic Signaling (LFPS) bursts, short pulses on the SSTX/SSRX pairs, to negotiate speeds up to 5 Gbps (Gen 1), 10 Gbps (Gen 2), or 20 Gbps (Gen 2x2) via link training sequences such as TS1/TS2 symbols for equalization and polarity inversion. This contrasts with USB 2.0's chirp-based method and adds robustness at higher frequencies. Hub support extends to SuperSpeed with dedicated downstream ports and lane bifurcation for multi-lane operation, ensuring transparent enumeration across tiers.[91][92][89]

During configuration, the host negotiates bandwidth allocation by evaluating endpoint requirements in the descriptors, reserving microframe slots (for high speed) or frames to prevent overload; isochronous transfers are typically limited to 80% of bus capacity. Alternate interface settings allow dynamic reconfiguration, such as switching a device's audio interface from low-bandwidth stereo to high-bandwidth surround sound without re-enumeration, via the SET_INTERFACE request. This flexibility lets peripherals adapt to host constraints.[93][94]

The handshake process has evolved significantly since USB 1.0 in 1996, which introduced basic low/full-speed enumeration at 1.5/12 Mbps with the initial reset and descriptor exchanges. USB 2.0 (2000) added high-speed chirps and hub chaining, while USB 3.x (2008 onward) incorporated LFPS and SuperSpeed while retaining backward compatibility.
USB4 (specification version 1.0 released 2019, version 2.0 in 2022) builds on this with tunneling protocols that encapsulate USB 2.0/3.2 traffic over a 40 or 80 Gbps link using a Thunderbolt 3-inspired architecture, maintaining similar enumeration but adding dynamic path management and error recovery through selective resets or link retraining.[89][95][91]

Security in these handshakes remains primarily trust-based, relying on physical access controls rather than cryptographic verification during enumeration. The optional USB Authentication Specification allows devices to prove their integrity using public-key challenges before full configuration, aiming to mitigate risks such as unauthorized or malicious peripherals, though adoption is limited. Error recovery universally involves bus resets to reinitialize failed handshakes.[96][91][89]

References
