Email address
An email address identifies an email box to which messages are delivered. While early messaging systems used a variety of formats for addressing, today, email addresses follow a set of specific rules originally standardized by the Internet Engineering Task Force (IETF) in the 1980s, and updated by RFC 5322 and 6854. The term email address in this article refers to just the addr-spec in Section 3.4 of RFC 5322. The RFC defines address more broadly as either a mailbox or group. A mailbox value can be either a name-addr, which contains a display-name and addr-spec, or the more common addr-spec alone.
An email address, such as john.smith@example.com, is made up from a local-part, the symbol @, and a domain, which may be a domain name or an IP address enclosed in brackets. Although the standard requires the local-part to be case-sensitive,[1] it also urges that receiving hosts deliver messages in a case-independent manner,[2] e.g., that the mail system in the domain example.com treat John.Smith as equivalent to john.smith; some mail systems even treat them as equivalent to johnsmith.[3] Mail systems often limit the users' choice of name to a subset of the technically permitted characters; with the introduction of internationalized domain names, efforts are progressing to permit non-ASCII characters in email addresses.
Due to the ubiquity of email in today's world, email addresses are often used as regular usernames by many websites and services that provide a user profile or account.[4] For example, if a user wants to log in to their Xbox Live video gaming profile, they would use their Microsoft account in the form of an email address as the username ID, even though the service in this case is not email.
Message transport
An email address consists of two parts, a local-part (sometimes a user name, but not always) and a domain; if the domain is a domain name rather than an IP address then the SMTP client uses the domain name to look up the mail exchange IP address. The general format of an email address is local-part@domain, e.g. jsmith@[192.168.1.2], jsmith@example.com. The SMTP client transmits the message to the mail exchange, which may forward it to another mail exchange until it eventually arrives at the host of the recipient's mail system.
The transmission of electronic mail from the author's computer and between mail hosts in the Internet uses the Simple Mail Transfer Protocol (SMTP), defined in RFC 5321 and 5322, and extensions such as RFC 6531. The mailboxes may be accessed and managed by applications on personal computers, mobile devices or webmail sites, using the SMTP protocol and either the Post Office Protocol (POP) or the Internet Message Access Protocol (IMAP).
When transmitting email messages, mail user agents (MUAs) and mail transfer agents (MTAs) use the domain name system (DNS) to look up a Resource Record (RR) for the recipient's domain. A mail exchanger resource record (MX record) contains the name of the recipient's mailserver. In absence of an MX record, an address record (A or AAAA) directly specifies the mail host.
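The lookup order described above can be sketched in a few lines of Python. This is a minimal illustration, not a real resolver: the `MxRecord` type and the `delivery_targets` function are hypothetical names, the records are assumed to have been fetched from DNS already, and the ordering relies on the RFC 5321 rule that a lower MX preference value is tried first.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MxRecord:
    """Hypothetical container for one MX resource record."""
    preference: int  # lower value is tried first
    exchange: str    # host name of the mail exchanger


def delivery_targets(domain: str, mx_records: List[MxRecord]) -> List[str]:
    """Return the hosts to try, in order, for mail addressed to `domain`.

    `mx_records` is assumed to be the result of an earlier DNS query;
    this function performs no network access itself.
    """
    if not mx_records:
        # No MX record: fall back to the domain's own address record (A or AAAA).
        return [domain]
    ordered = sorted(mx_records, key=lambda record: record.preference)
    return [record.exchange for record in ordered]


records = [MxRecord(20, "backup.example.net"), MxRecord(10, "mail.example.com")]
print(delivery_targets("example.com", records))
print(delivery_targets("example.org", []))
```

A production MTA would additionally resolve each exchange name to an IP address and retry on temporary failures; those steps are left out here.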
The local-part of an email address has no significance for intermediate mail relay systems other than the final mailbox host. Email senders and intermediate relay systems must not assume it to be case-insensitive, since the final mailbox host may or may not treat it as such. A single mailbox may receive mail for multiple email addresses, if configured by the administrator. Conversely, a single email address may be the alias to a distribution list to many mailboxes. Email aliases, electronic mailing lists, sub-addressing, and catch-all addresses, the latter being mailboxes that receive messages regardless of the local-part, are common patterns for achieving a variety of delivery goals.
The addresses found in the header fields of an email message are not directly used by mail exchanges to deliver the message. An email message also contains a message envelope that contains the information for mail routing. While envelope and header addresses may be equal, forged email addresses (also called spoofed email addresses) are often seen in spam, phishing, and many other Internet-based scams. This has led to several initiatives that aim to make such forgeries of fraudulent emails easier to spot.
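The difference between header and envelope addresses can be made concrete with Python's standard `email` package. The sketch below is illustrative only: the addresses are made up, and `smtp_envelope_commands` is a hypothetical helper that merely formats the SMTP `MAIL FROM` and `RCPT TO` commands defined in RFC 5321 rather than talking to a server.

```python
from email.message import EmailMessage
from typing import List

# Header addresses: what the recipient's mail client displays.
msg = EmailMessage()
msg["From"] = "Newsletter <news@example.com>"
msg["To"] = "subscribers@example.com"
msg["Subject"] = "Monthly update"
msg.set_content("Hello!")

# Envelope addresses: what the mail exchanges actually use for routing.
envelope_from = "bounces@example.com"
envelope_to = ["alice@example.org", "bob@example.net"]


def smtp_envelope_commands(sender: str, recipients: List[str]) -> List[str]:
    """Format the envelope as the SMTP commands a client would send."""
    commands = [f"MAIL FROM:<{sender}>"]
    commands += [f"RCPT TO:<{recipient}>" for recipient in recipients]
    return commands


for line in smtp_envelope_commands(envelope_from, envelope_to):
    print(line)
```

Neither envelope recipient appears in the `To:` header, which is why header fields alone cannot be trusted to show where a message came from or went. With a real server, `smtplib.SMTP.send_message(msg, from_addr=envelope_from, to_addrs=envelope_to)` accepts the same separation.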
Syntax
The format of an email address is local-part@domain, where the local-part may be up to 64 octets long and the domain may have a maximum of 255 octets.[5] The formal definitions are in RFC 5322 (sections 3.2.3 and 3.4.1) and RFC 5321, with a more readable form given in the informational RFC 3696 (written by J. Klensin, the author of RFC 5321[6]) and the associated errata.
An email address also may have an associated "display-name" (Display Name) for the recipient, which precedes the address specification, surrounded by angled brackets in that case, for example: John Smith <john.smith@example.org>.[7] Email spammers and phishers will often use "Display Name spoofing" to trick their victims, by using a false Display Name, or by using a different email address as the Display Name.[8]
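As a small illustration, Python's standard `email.utils` module can separate a display-name from the address specification. The helper `display_name_is_other_address` is a hypothetical heuristic for one of the spoofing patterns mentioned above (a display name that is itself a different email address); it is a sketch, not a complete anti-phishing check.

```python
from email.utils import formataddr, parseaddr

# Split a name-addr into (display-name, addr-spec).
name, addr = parseaddr("John Smith <john.smith@example.org>")
print(name)  # John Smith
print(addr)  # john.smith@example.org

# Build the name-addr form back from its two parts.
print(formataddr((name, addr)))  # John Smith <john.smith@example.org>


def display_name_is_other_address(header_value: str) -> bool:
    """Flag a display name that looks like an address different from the real one."""
    display_name, real_address = parseaddr(header_value)
    return "@" in display_name and display_name.strip().lower() != real_address.lower()


print(display_name_is_other_address('"billing@example.com" <someone@example.net>'))
```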
Earlier forms of email addresses for other networks than the Internet included other notations, such as that required by X.400, and the UUCP bang path notation, in which the address was given in the form of a sequence of computers through which the message should be relayed. This was widely used for several years, but was superseded by the Internet standards promulgated by the Internet Engineering Task Force (IETF).
Local-part
The local-part of the email address may be unquoted or may be enclosed in quotation marks.
If unquoted, it may use any of these ASCII characters:
- uppercase and lowercase Latin letters A to Z and a to z
- digits 0 to 9
- printable characters !#$%&'*+-/=?^_`{|}~
- dot ., provided that it is not the first or last character and provided also that it does not appear consecutively (e.g., John..Doe@example.com is not allowed).[9]
If quoted, it may contain Space, Horizontal Tab (HT), any ASCII graphic except Backslash and Quote and a quoted-pair consisting of a Backslash followed by HT, Space or any ASCII graphic; it may also be split between lines anywhere that HT or Space appears. In contrast to unquoted local-parts, the addresses ".John.Doe"@example.com, "John.Doe."@example.com and "John..Doe"@example.com are allowed.
The maximum total length of the local-part of an email address is 64 octets.[10]
- Space and special characters "(),:;<>@[\] are allowed with restrictions (they are only allowed inside a quoted string, as described in the paragraph above, and in that quoted string, any backslash or double-quote must be preceded once by a backslash);
- Comments are allowed with parentheses, either at the start or end of the local-part; e.g., john.smith(comment)@example.com and (comment)john.smith@example.com are both equivalent to john.smith@example.com.
In addition to the above ASCII characters, international characters above U+007F, encoded as UTF-8, are permitted by RFC 6531 when the EHLO specifies SMTPUTF8, though even mail systems that support SMTPUTF8 and 8BITMIME may restrict what characters to use when assigning local-parts.
A local-part is either a Dot-string or a Quoted-string; it cannot be a combination. Quoted strings and characters, however, are not commonly used.[citation needed] RFC 5321 also warns that "a host that expects to receive mail SHOULD avoid defining mailboxes where the Local-part requires (or uses) the Quoted-string form".
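The rules for the unquoted (Dot-string) form are regular enough to check with a short pattern. The following Python sketch covers only that form: quoted strings, comments, and SMTPUTF8 characters are deliberately out of scope, and the function name `is_valid_unquoted_local_part` is an illustrative choice rather than part of any standard library.

```python
import re

# Characters allowed in an unquoted local-part, besides the dot.
ATEXT = r"A-Za-z0-9!#$%&'*+/=?^_`{|}~-"

# One or more runs of allowed characters separated by single dots:
# this rules out a leading dot, a trailing dot, and consecutive dots.
DOT_STRING = re.compile(rf"[{ATEXT}]+(?:\.[{ATEXT}]+)*")


def is_valid_unquoted_local_part(local_part: str) -> bool:
    """Check the Dot-string form only (no quoted strings, comments, or UTF-8)."""
    if len(local_part.encode("utf-8")) > 64:  # 64-octet limit
        return False
    return DOT_STRING.fullmatch(local_part) is not None


for candidate in ["john.smith", "user+tag", "John..Doe", ".john", "john."]:
    print(candidate, is_valid_unquoted_local_part(candidate))
```

Passing this check says nothing about whether the receiving host actually assigns such a mailbox; as the text notes, many providers accept only a subset of these characters.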
The local-part postmaster is treated specially—it is case-insensitive, and should be forwarded to the domain email administrator. Technically all other local-parts are case-sensitive, therefore johns@example.com and JohnS@example.com specify different mailboxes; however, many organizations treat uppercase and lowercase letters as equivalent. Indeed, RFC 5321 warns that "a host that expects to receive mail SHOULD avoid defining mailboxes where ... the Local-part is case-sensitive".
Despite the wide range of special characters that are technically valid, organisations, mail services, mail servers, and mail clients in practice often do not accept all of them. For example, Windows Live Hotmail only allows creation of email addresses using alphanumerics, dot (.), underscore (_) and hyphen (-).[11] Common advice is to avoid using some special characters to avoid the risk of rejected emails.[12]
According to RFC 5321 2.3.11 Mailbox and Address, "the local-part MUST be interpreted and assigned semantics only by the host specified in the domain of the address". This means that no assumptions can be made about the meaning of the local-part of another mail server. It is entirely up to the configuration of the mail server.
Interpretation of the local-part is dependent on the conventions and policies implemented in the mail server. For example, case sensitivity may distinguish mailboxes differing only in capitalization of characters of the local-part, although this is not very common.[13] As another example, Gmail ignores all dots in the local-part of a user's email address for the purposes of determining account identity.[14]
Sub-addressing
Some mail services support a tag included in the local-part, such that the address is an alias to a prefix of the local-part. Typically the tag consists of the characters following a plus sign, and less often the characters following a minus sign, so fred+bar@domain and fred+foo@domain might end up in the same inbox as fred+@domain or even as fred@domain. For example, the address joeuser+tag@example.com denotes the same delivery address as joeuser@example.com. RFC 5233[15] refers to this convention as subaddressing, but it is also known as plus addressing, tagged addressing or mail extensions. This can be useful for tagging emails for sorting, and for spam control.[16]
Addresses of this form, using various separators between the base name and the tag, are supported by several email services, including Andrew Project (plus),[17] Runbox (plus),[18] Gmail (plus),[16] Rackspace (plus), Yahoo! Mail Plus (hyphen),[19] Apple's iCloud (plus), Outlook.com (plus),[20] Mailfence (plus),[21] Proton Mail (plus),[22] Fastmail (plus and Subdomain Addressing),[23] postale.io (plus),[24] Pobox (plus),[25] MeMail (plus),[26] and MTAs like MMDF (equals), Qmail and Courier Mail Server (hyphen).[27][28] Postfix and Exim allow configuring an arbitrary separator from the legal character set.[29][30]
The text of the tag may be used to apply filtering,[27] or to create single-use, or disposable email addresses.[31]
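A receiving host that supports sub-addressing might separate the base address from the tag roughly as follows. This is a sketch under stated assumptions: `split_subaddress` is a hypothetical helper, the local-part is assumed to be unquoted, and the separator is configurable because services differ (plus, hyphen, equals). Only the host responsible for the domain should apply such a rule, since other systems may not interpret the local-part.

```python
from typing import Optional, Tuple


def split_subaddress(address: str, separator: str = "+") -> Tuple[str, Optional[str]]:
    """Return (base address, tag); tag is None when no separator is present."""
    local_part, at_sign, domain = address.rpartition("@")
    if not at_sign or not local_part:
        raise ValueError(f"not a local-part@domain address: {address!r}")
    # Split at the first separator so that later separators stay in the tag.
    base, found, tag = local_part.partition(separator)
    if not found or not base:
        return address, None
    return f"{base}@{domain}", tag


print(split_subaddress("joeuser+tag@example.com"))
print(split_subaddress("user.name+tag+sorting@example.com"))
print(split_subaddress("fred@example.com"))
```

The returned tag can then drive filtering rules, for example filing mail for joeuser+lists@example.com into a "lists" folder.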
Domain
The domain name part of an email address has to conform to strict guidelines: it must match the requirements for a hostname, a list of dot-separated DNS labels, each label being limited to a length of 63 characters and consisting of:[9]: §2
- Uppercase and lowercase Latin letters A to Z and a to z;
- Digits 0 to 9, provided that top-level domain names are not all-numeric;
- Hyphen -, provided that it is not the first or last character.
This rule is known as the LDH rule (letters, digits, hyphen). In addition, the domain may be an IP address literal, surrounded by square brackets [], such as jsmith@[192.168.2.1] or jsmith@[IPv6:2001:db8::1], although this is rarely seen except in email spam. Internationalized domain names (which are encoded to comply with the requirements for a hostname) allow for presentation of non-ASCII domains. In mail systems compliant with RFC 6531 and RFC 6532, an email address may be encoded as UTF-8 in both the local-part and the domain name.
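The LDH rule translates directly into a per-label check. The Python sketch below is a minimal illustration: `is_valid_ldh_domain` is a hypothetical name, it handles only ASCII domain names (not bracketed IP address literals, comments, or unencoded internationalized names), and it uses the 63-character label limit and 255-octet overall limit stated above.

```python
import re

# A label starts and ends with a letter or digit; hyphens may appear in between.
# 1 + up to 61 + 1 characters gives the 63-character label limit.
LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_valid_ldh_domain(domain: str) -> bool:
    """Check a dot-separated ASCII domain name against the LDH rule."""
    if not domain or len(domain) > 255:
        return False
    labels = domain.split(".")
    if not all(LABEL.fullmatch(label) for label in labels):
        return False  # also rejects empty labels such as in "a..b"
    # The top-level label must not be all-numeric.
    return not labels[-1].isdigit()


for candidate in ["example.com", "but_they_are_not_allowed_in_this_part", "-bad.example"]:
    print(candidate, is_valid_ldh_domain(candidate))
```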
Comments are allowed in the domain as well as in the local-part; for example, john.smith@(comment)example.com and john.smith@example.com(comment) are equivalent to john.smith@example.com.
RFC 2606 specifies that certain domains, for example those intended for documentation and testing, should not be resolvable and that as a result mail addressed to mailboxes in them and their subdomains should be non-deliverable. Of note for email are example, invalid, example.com, example.net, and example.org.
Examples
Valid email addresses
- simple@example.com
- very.common@example.com
- FirstName.LastName@EasierReading.org (case is always ignored after the @ and usually before)
- x@example.com (one-letter local-part)
- long.email-address-with-hyphens@and.subdomains.example.com
- user.name+tag+sorting@example.com (may be routed to the user.name@example.com inbox depending on mail server)
- name/surname@example.com (slashes are a printable character, and allowed)
- admin@example (local domain name with no TLD, although ICANN highly discourages dotless email addresses[32])
- example@s.example (see the List of Internet top-level domains)
- " "@example.org (space between the quotes)
- "john..doe"@example.org (quoted double dot)
- mailhost!username@example.org (bangified host route used for UUCP mailers)
- "very.(),:;<>[]\".VERY.\"very@\\ \"very\".unusual"@strange.example.com (includes non-letter characters and multiple at signs, the first one being double quoted)
- user%example.com@example.org (% escaped mail route to user@example.com via example.org)
- user-@example.org (local-part ending with a non-alphanumeric character from the list of allowed printable characters)
- postmaster@[123.123.123.123] (IP addresses are allowed instead of domains when in square brackets, but strongly discouraged)
- postmaster@[IPv6:2001:0db8:85a3:0000:0000:8a2e:0370:7334] (IPv6 uses a different syntax)
- _test@[IPv6:2001:0db8:85a3:0000:0000:8a2e:0370:7334] (local-part beginning with an underscore; IPv6 uses a different syntax)
Valid email addresses with SMTPUTF8
- I❤️CHOCOLATE@example.com (emoji are only allowed with SMTPUTF8)
Invalid email addresses
- abc.example.com (no @ character)
- a@b@c@example.com (only one @ is allowed outside quotation marks)
- a"b(c)d,e:f;g<h>i[j\k]l@example.com (none of the special characters in this local-part are allowed outside quotation marks)
- just"not"right@example.com (quoted strings must be dot separated or be the only element making up the local-part)
- this is"not\allowed@example.com (spaces, quotes, and backslashes may only exist when within quoted strings and preceded by a backslash)
- this\ still\"not\\allowed@example.com (even if escaped (preceded by a backslash), spaces, quotes, and backslashes must still be contained by quotes)
- 1234567890123456789012345678901234567890123456789012345678901234+x@example.com (local-part is longer than 64 characters)
- i.like.underscores@but_they_are_not_allowed_in_this_part (underscore is not allowed in domain part)
Validation and verification
Email addresses are often requested as input to websites as a way of validating that a user exists. Other validation methods are available, such as cell phone number validation, postal mail validation, and fax validation.
An email address is generally recognized as having two parts joined with an at-sign (@), although the technical specifications detailed in RFC 822 and subsequent RFCs are more extensive.[33]
Syntactically correct, verified email addresses do not guarantee that an email box exists. Thus many email servers use other techniques and check the mailbox existence against relevant systems such as the Domain Name System for the domain or using callback verification to check if the mailbox exists. Callback verification is an imperfect solution, as it may be disabled to avoid a directory harvest attack, or callbacks may be reported as spam and lead to listing on a DNSBL.
Several validation techniques may be utilized to validate a user email address. For example,[34]
- Verification links: Email address validation is often accomplished for account creation on websites by sending an email to the user-provided email address with a special temporary hyperlink. On receipt, the user opens the link, immediately activating the account. Email addresses are also useful as means of delivering messages from a website, e.g., user messages, user actions, to the email inbox.
- Formal and informal standards: RFC 3696 provides specific advice for validating Internet identifiers, including email addresses. Some websites instead attempt to evaluate the validity of email addresses through arbitrary standards, such as by rejecting addresses containing valid characters, such as + and /, or enforcing arbitrary length limitations. Email address internationalization provides for a much larger range of characters than many current validation algorithms allow, such as all Unicode characters above U+0080, encoded as UTF-8.
- Algorithmic tools: Large websites, bulk mailers and spammers require efficient tools to validate email addresses. Such tools depend upon heuristic algorithms and statistical models.[35]
- Sender reputation: An email sender's reputation may be used to attempt to verify whether the sender is trustworthy or a potential spammer. Factors that may be incorporated into an assessment of sender reputation include the quality of past contact with or content provided by, and engagement levels of, the sender's IP address or email address.
- Browser-based verification: HTML5 forms implemented in many browsers allow email address validation to be handled by the browser.[36]
Some companies offer services to validate an email address, often using an application programming interface, but there is no guarantee that it will provide accurate results.
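The verification-link technique listed above usually relies on a token that only the website can produce and that expires after a while. One common way to build such a token is an HMAC over the address and an expiry time; the sketch below assumes that approach. The names `SECRET_KEY`, `make_token`, and `check_token` are hypothetical, the key shown is a placeholder, and a real deployment would also need to send the email and mark the account as active.

```python
import hashlib
import hmac
import time
from typing import Optional

SECRET_KEY = b"replace-with-a-long-random-secret"  # placeholder, not a real key


def _signature(email: str, expires_at: int, key: bytes) -> str:
    payload = f"{email}|{expires_at}".encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def make_token(email: str, expires_at: int, key: bytes = SECRET_KEY) -> str:
    """Create the token to embed in the temporary hyperlink."""
    return f"{expires_at}.{_signature(email, expires_at, key)}"


def check_token(email: str, token: str, now: Optional[float] = None,
                key: bytes = SECRET_KEY) -> bool:
    """Accept the token only if it is unexpired and was signed for this address."""
    expires_text, _, signature = token.partition(".")
    try:
        expires_at = int(expires_text)
    except ValueError:
        return False
    current = time.time() if now is None else now
    if current > expires_at:
        return False
    expected = _signature(email, expires_at, key)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8"))


token = make_token("user@example.com", expires_at=int(time.time()) + 3600)
print(check_token("user@example.com", token))
```

Opening the link proves only that someone can read mail sent to the address, which is exactly the property that syntax checks alone cannot establish.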
Internationalization
The IETF conducts a technical and standards working group devoted to internationalization issues of email addresses, entitled Email Address Internationalization (EAI, also known as IMA, Internationalized Mail Address).[37] This group produced RFC 6530, 6531, 6532 and 6533, and continues to work on additional EAI-related RFCs.
The IETF's EAI Working group published RFC 6530 "Overview and Framework for Internationalized Email" that enabled non-ASCII characters to be used in both the local-parts and domain of an email address. RFC 6530 provides for email based on the UTF-8 encoding, which permits the full repertoire of Unicode. RFC 6531 provides a mechanism for SMTP servers to negotiate transmission of the SMTPUTF8 content.
The basic EAI concepts involve exchanging mail in UTF-8. Though the original proposal included a downgrading mechanism for legacy systems, this has now been dropped.[38] The local servers are responsible for the local-part of the address, whereas the domain would be restricted by the rules of internationalized domain names, though still transmitted in UTF-8. The mail server is also responsible for any mapping mechanism between the IMA form and any ASCII alias.
EAI enables users to have a localized address in a native language script or character set, as well as an ASCII form for communicating with legacy systems or for script-independent use. Applications that recognize internationalized domain names and mail addresses must have facilities to convert these representations.
Significant demand for such addresses is expected in China, Japan, Russia, and other markets that have large user bases in a non-Latin-based writing system.
For example, in addition to the .in top-level domain, the government of India in 2011[39] got approval for ".bharat" (from Bhārat Gaṇarājya), written in seven different scripts[40][41] for use by Gujarati, Marathi, Bengali, Tamil, Telugu, Punjabi and Urdu speakers. Indian company XgenPlus.com claims to be the world's first EAI mailbox provider,[42] and the Government of Rajasthan now supplies a free email account on domain राजस्थान.भारत for every citizen of the state.[43] A leading media house, Rajasthan Patrika, launched its IDN domain पत्रिका.भारत with contactable email.
The example addresses below would not be handled by RFC 5321-based servers without an extension, but are permitted by the SMTPUTF8 extension described in RFC 6530 and 6531. Servers that support this extension will be able to handle these:
- Latin alphabet with diacritics: éléonore@example.com
- Greek alphabet: δοκιμή@παράδειγμα.δοκιμή
- Traditional Chinese characters: 我買@屋企.香港
- Japanese characters: 二ノ宮@黒川.日本
- Cyrillic characters: медведь@с-балалайкой.рф
- Devanagari characters: संपर्क@डाटामेल.भारत
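For the domain part of such addresses, the conversion between the native-script form and the ASCII form used by legacy systems can be tried with Python's built-in `idna` codec. This is a sketch with caveats: that codec implements the older IDNA 2003 rules (RFC 3490) rather than the current IDNA 2008, and the helper names `domain_to_ascii` and `domain_to_unicode` are illustrative. The local-part has no comparable ASCII encoding; under SMTPUTF8 it is simply transmitted as UTF-8.

```python
def domain_to_ascii(domain: str) -> str:
    """Encode each label of an internationalized domain as an ASCII 'xn--' label."""
    return domain.encode("idna").decode("ascii")


def domain_to_unicode(ascii_domain: str) -> str:
    """Decode 'xn--' labels back to their Unicode form."""
    return ascii_domain.encode("ascii").decode("idna")


local_part, _, domain = "δοκιμή@παράδειγμα.δοκιμή".rpartition("@")
ascii_domain = domain_to_ascii(domain)
print(ascii_domain)                      # every label starts with "xn--"
print(domain_to_unicode(ascii_domain))   # back to the Greek form
print(local_part.encode("utf-8"))        # the local-part is sent as UTF-8 octets
```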
References
1. J. Klensin (October 2008). "General Syntax Principles and Transaction Model". Simple Mail Transfer Protocol. p. 15. sec. 2.4. doi:10.17487/RFC5321. RFC 5321. Quote: "The local-part of a mailbox MUST BE treated as case sensitive."
2. J. Klensin (October 2008). "General Syntax Principles and Transaction Model". Simple Mail Transfer Protocol. p. 15. sec. 2.4. doi:10.17487/RFC5321. RFC 5321. Quote: "However, exploiting the case sensitivity of mailbox local-parts impedes interoperability and is discouraged."
3. "...you can add or remove the dots from a mail address without changing the actual destination address; and they'll all go to your inbox...", Google.com
4. Morrison, Sara (2021-09-06). "How a simple email address makes things complicated". Vox. Retrieved 2024-07-15.
5. Klensin, J. (October 2008). "Size Limits and Minimums". Simple Mail Transfer Protocol. IETF. sec. 4.5.3.1. doi:10.17487/RFC5321. RFC 5321.
6. J. Klensin, RFC 5321, IETF, October 2008
7. "Address Specification". Internet Message Format. sec. 3.4. doi:10.17487/RFC5322. RFC 5322. Retrieved March 14, 2023.
8. "Spotting a Spoofing". cyber.nj.gov. November 19, 2020. Retrieved 17 April 2023.
9. Klensin, J. (February 2004). RFC 3696. IETF. doi:10.17487/RFC3696. Retrieved 2017-08-01.: §3
10. Klensin, J. (October 2008). RFC 5321. IETF. sec. 4.5.3.1.1. doi:10.17487/RFC5321. Retrieved 2019-08-01.
11. "Sign up for Windows Live". Retrieved 2008-07-26. However, the phrase is hidden, thus one has to either check the availability of an invalid ID, e.g., me#1, or resort to alternative displaying, e.g., no-style or source view, in order to read it.
12. "Characters in the local part of an email address". Retrieved 2016-03-30.
13. Are Email Addresses Case Sensitive? Archived 2016-06-03 at the Wayback Machine by Heinz Tschabitscher
14. "Receiving someone else's mail". google.com.
15. Murchison, K. (2008). Sieve Email Filtering: Subaddress Extension. IETF. doi:10.17487/RFC5233. RFC 5233. Retrieved February 9, 2019.
16. "Send emails from a different address or alias". Gmail Help. Retrieved 13 December 2023.
17. "An Overview of the Andrew Message System" (PDF). Retrieved 17 April 2023.
18. "Subaddressing/Plus Addressing". Retrieved 1 January 2024.
19. "Disposable addresses in Yahoo Mail". Yahoo Help.
20. Rivera, Rafael (2013-09-17). "Outlook.com supports simpler "+" email aliases too". Within Windows. Archived from the original on 2014-02-20. Retrieved 2023-12-04.
21. "Plus Addressing: The Best Way to Track Spammers in 2024". mailfence.com.
22. "Addresses and Aliases". proton.me.
23. "Plus addressing and subdomain addressing". www.fastmail.com. Archived from the original on 2020-10-06. Retrieved 2020-10-06.
24. "postale.io's FAQ on sub-addressing". postale.io. Archived from the original on 2020-10-06. Retrieved 2020-10-06.
25. "Can I use myaddress+extension@pobox.com with my Pobox account?". helpspot.pobox.com. n.d. Archived from the original on 2020-10-03. Retrieved 2020-10-03. Quote: "Pobox supports the use of "+anystring" (plus extensions) with any address."
26. "MeMail". www.memail.com. Retrieved 2020-10-06.
27. "Dot-Qmail, Control the delivery of mail messages". Archived from the original on 26 January 2012. Retrieved 27 January 2012.
28. Sill, Dave. "4.1.5. extension addresses". Life with qmail. Retrieved 27 January 2012.
29. "Postfix Configuration Parameters". postfix.org.
30. "Exim Configuration Parameters, "local_part_suffix"". exim.org.
31. Gina Trapani (2005) "Instant disposable Gmail addresses"
32. "New gTLD Dotless Domain Names Prohibited". www.icann.org. ICANN. Retrieved 23 March 2020.
33. "How Domino formats the sender's Internet address in outbound messages". IBM Knowledge Center. Retrieved 23 July 2019.
34. "M3AAWG Sender Best Common Practices, Version 3" (PDF). Messaging, Malware and Mobile Anti-Abuse Working Group. February 2015. Retrieved 23 July 2019.
35. Verification & Validation Techniques for Email Address Quality Assurance by Jan Hornych 2011, University of Oxford
36. "4.10 Forms — HTML5". w3.org.
37. "Eai Status Pages". Email Address Internationalization (Active WG). IETF. March 17, 2006 – March 18, 2013. Retrieved July 26, 2008.
38. "Email Address Internationalization (eai)". IETF. Retrieved November 30, 2010.
39. "2011-01-25 - Approval of Delegation of the seven top-level domains representing India in various languages". features.icann.org.
40. "Internationalized Domain Names (IDNs) | Registry.In". registry.in. Retrieved 2016-10-17.
41. "Now, get your email address in Hindi". The Economic Times. Retrieved 2016-10-17.
42. "Universal Acceptance in India". 15 February 2017.
43. "देश में पहला, प्रदेश के हर नागरिक के लिए मुफ्त ई-वॉल्ट और ई-मेल की सुविधा शुरू - वसुन्धरा राजे". वसुन्धरा राजे (in Hindi). 2017-08-18. Retrieved 2017-08-20.
Further reading
- RFC 821 Simple Mail Transfer Protocol (Obsoleted by RFC 2821 and 5321)
- RFC 822 Standard for the Format of ARPA Internet Text Messages (Obsoleted by RFC 2822) (Errata)
- RFC 1035 Domain names, Implementation and specification (Errata)
- RFC 1123 Requirements for Internet Hosts, Application and Support (Updated by RFC 2821, 5321) (Errata)
- RFC 2142 Mailbox Names for Common Services, Roles and Functions (Errata)
- RFC 2821 Simple Mail Transfer Protocol (Obsoletes RFC 821, Updates RFC 1123, Obsoleted by RFC 5321) (Errata)
- RFC 2822 Internet Message Format (Obsoletes RFC 822, Obsoleted by RFC 5322) (Errata)
- RFC 3696 Application Techniques for Checking and Transformation of Names (Errata)
- RFC 4291 IP Version 6 Addressing Architecture (Updated by RFC 5952) (Errata)
- RFC 5321 Simple Mail Transfer Protocol (Obsoletes RFC 2821, Updates RFC 1123) (Errata)
- RFC 5322 Internet Message Format (Obsoletes RFC 2822, Updated by RFC 6854) (Errata)
- RFC 5598 Internet Mail Architecture
- RFC 5952 A Recommendation for IPv6 Address Text Representation (Updates RFC 4291) (Errata)
- RFC 6530 Overview and Framework for Internationalized Email (Obsoletes RFC 4952, 5504, 5825)
- RFC 6531 SMTP Extension for Internationalized Email (Obsoletes RFC 5336)
- RFC 6854 Update to Internet Message Format to Allow Group Syntax in the "From:" and "Sender:" Header Fields (Updates RFC 5322)
Role in Email Communication
Definition and Purpose
An email address is a unique string that identifies the recipient of an electronic mail message within the Internet's messaging framework, serving as a specific identifier for a mailbox on a host computer.[6] It typically follows the format of a local-part followed by an "@" symbol and a domain, enabling precise targeting of messages to individual users or shared mailboxes.[7]
The primary purpose of an email address is to facilitate the routing and delivery of messages across interconnected networks, supporting both one-to-one correspondence and one-to-many distributions such as mailing lists.[8] Beyond message transport, it functions as a foundational digital identity, commonly used for user authentication, account registration, subscription to services like newsletters, and integration with other online systems.[8]
Email addresses originated in the early 1970s as part of the ARPANET, the precursor to the modern Internet, where engineer Ray Tomlinson developed the first networked email system in 1971 by extending existing programs to allow inter-host messaging.[9] This innovation quickly evolved into a global standard for internet-based electronic communication, standardizing user addressing across diverse systems.[10]
Unlike telephone numbers, which primarily enable voice or short message services, or IP addresses, which identify network devices for data routing, email addresses specifically target human users or virtual mailboxes for asynchronous text-based exchange.[11]
Message Transport Usage
Email addresses play a central role in the Simple Mail Transfer Protocol (SMTP), the standard for transporting email messages across the internet. In an SMTP transaction, the sender's email address is specified using the MAIL FROM command, which defines the reverse-path for error notifications and delivery reports.[12] Similarly, each recipient's email address is indicated via the RCPT TO command, establishing the forward-path to guide message delivery.[13] These commands form the SMTP envelope, which encapsulates the routing information separate from the message content itself.[14]
The routing process relies on the domain portion of the email address to determine the appropriate mail server. When an SMTP server receives a message, it resolves the recipient's domain through DNS MX (Mail Exchanger) records to identify the target server for relay or final delivery.[15] The local-part of the address then specifies the individual mailbox on that server, enabling precise delivery.[13]
A key distinction exists between the transport envelope and the message headers. The envelope addresses (from MAIL FROM and RCPT TO) are used exclusively for routing and are not visible to end users, whereas header fields like From: and To: serve display and informational purposes within the email client.[16] This separation ensures that routing remains efficient and independent of the message's visible content, such as in cases of blind carbon copies where recipients are not listed in headers.[17]
If an email address proves undeliverable during transport, the SMTP server generates error responses and bounce messages. For instance, a 550 reply code indicates a permanent failure, such as an invalid or non-existent recipient, prompting the sending server to notify the original sender via the reverse-path.[18] These bounce messages, often containing diagnostic details, are sent back to the MAIL FROM address to inform the sender of the issue.[19]
Syntax and Components
Local-part
The local-part of an email address is the portion preceding the "@" symbol, which specifies the recipient's mailbox or alias on the mail server indicated by the domain.[7] It serves to uniquely identify the user within that specific domain, allowing for flexible naming conventions determined by the receiving server.[20] According to RFC 5322, the syntax for the local-part is defined as a dot-atom, a quoted-string, or an obsolete local-part form (obs-local-part).[7] The dot-atom consists of one or more dot-atom-text elements separated by dots, where dot-atom-text includes letters (a-z and A-Z), digits (0-9), and the special characters ! # $ % & ' * + - / = ? ^ _ ` { | } ~, but it cannot begin or end with a dot, nor contain consecutive dots.[21] The quoted-string format encloses content in double quotes, permitting a broader range of ASCII characters (excluding CR and LF) through escaped quoted-pairs, such as backslash-escaped specials or spaces.[22] Obsolete forms, retained for backward compatibility, allow additional structures like unquoted spaces or other legacy characters, though modern implementations favor the standard dot-atom and quoted-string.[23] The maximum length of the local-part is 64 octets, as specified in RFC 5321 for SMTP compliance, ensuring compatibility across mail transfer agents.[24] Regarding case sensitivity, RFC 5321 mandates that the local-part be treated as case-sensitive, requiring SMTP servers to preserve its casing during transmission.[25] However, many email providers, such as those implementing common extensions, treat it as case-insensitive for delivery purposes to improve user experience and reduce errors.[26] Common formats for the local-part include simple alphanumeric usernames (e.g., user), dotted variants for substructure (e.g., user.name), and plus-addressing extensions (e.g., user+tag), where the plus sign and following tag are valid per RFC 5322 and often used by providers like Gmail for filtering or disposable aliases.[21] 
Quoting enables the inclusion of spaces or other restricted characters, as in "user name" or "user with space", by wrapping the local-part in double quotes and escaping characters as needed.[22] These formats add flexibility while adhering to the core syntax rules.
Domain
The domain part of an email address is the segment following the "@" symbol, which specifies the destination mail server or organization for message delivery. It typically consists of a fully qualified domain name (FQDN), such as "example.com", or an IP address literal, so that the message can be routed within the Internet mail system.[27]
The syntax of the domain follows the rules in RFC 5321 and the DNS hostname conventions of RFC 1035. It comprises one or more labels separated by periods, where each label contains only letters (a-z, A-Z), digits (0-9), and hyphens (-); a label may not start or end with a hyphen, and underscores are not allowed in standard domain names. The entire domain must not exceed 255 octets in length to remain compatible with SMTP transport limits.[27][28]
To route mail for a domain, the sending SMTP server queries the Domain Name System (DNS) for the domain's MX (Mail Exchanger) records, as defined in RFC 5321 and originally described in RFC 974. These records list the mail servers for the domain, each with a numeric preference value (lower values indicating higher priority), and the sender tries them in order of preference. In the absence of MX records, the server falls back to the domain's A (IPv4) or AAAA (IPv6) records and delivers to that address directly.[27][29]
Domain literals provide an alternative to FQDNs by embedding an IP address directly in the email address, enclosed in square brackets to distinguish it from a domain name.
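The MX selection rule described in this section (lowest preference value first, with a fallback to the domain's own address records) can be sketched without any network access. This is an illustration only: the function name `delivery_targets` is hypothetical, and the MX records are assumed to have been fetched from DNS already and passed in as (preference, host) pairs.

```python
from typing import List, Tuple


def delivery_targets(domain: str, mx_records: List[Tuple[int, str]]) -> List[str]:
    """Order candidate hosts for delivering mail to `domain`.

    mx_records holds (preference, exchange) pairs already obtained from DNS.
    """
    if not mx_records:
        # No MX records: fall back to the domain's own A/AAAA records.
        return [domain]
    # Lower preference values are tried first. sorted() is stable, so hosts
    # with equal preference keep their input order in this sketch.
    return [host for _pref, host in sorted(mx_records, key=lambda rec: rec[0])]
```

A production sender would normally also randomize among hosts that share a preference value to spread load; the sketch keeps input order to stay deterministic.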
For IPv4, a domain literal appears as [192.0.2.1]; for IPv6, it uses the form [IPv6:2001:db8::1]. A literal is used directly, without a DNS lookup, though many modern systems discourage or reject such addresses for security reasons.[27] Domains incorporating non-ASCII characters, known as Internationalized Domain Names (IDNs), are represented in Punycode (with the xn-- prefix) to ensure ASCII compatibility during transmission; the encoding was introduced for email use with RFC 3490 and is now specified by the IDNA2008 documents, RFC 5890 through RFC 5894.
Sub-addressing
Sub-addressing, also known as plus-addressing or tagged addressing, is a convention for the local-part of an email address that lets users append an optional tag after a delimiter, so that mail for many tagged addresses reaches the same mailbox without a separate account. For instance, an email sent to [email protected] is delivered to the primary mailbox associated with [email protected], because the receiving server recognizes the delimiter and ignores the tag when selecting the mailbox.[30][31]
The most common delimiter is the plus sign (+), which is supported by major providers such as Gmail and Microsoft Exchange Online, where it separates the base local-part from the tag. Other delimiters include the hyphen (-), used by some systems such as certain spam filtering services, and the pipe (|), which is less commonly implemented. These characters are permitted within the local-part syntax defined by RFC 5322, but their interpretation for sub-addressing is implementation-specific and not mandated by the standard.[32][33]
Common use cases for sub-addressing include organizing incoming mail by category, such as directing messages to [email protected] for professional correspondence or [email protected] for e-commerce notifications, which makes automated filtering rules easy to write. It also enables tracking the origin of email sign-ups, for example by using [email protected] to identify which services might be sources of spam or data breaches. Additionally, users create temporary aliases for one-time purposes, like online registrations, to enhance privacy without exposing the primary address.[34][35]
Support for sub-addressing varies significantly among email providers and mail transfer agents, as it is not standardized in RFC 5322 and relies on server-side configuration to recognize the delimiter and strip it, along with the tag, before final delivery.
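The stripping step a receiving server performs can be illustrated with a small helper. This is a sketch under stated assumptions: `split_subaddress` is a hypothetical name, the delimiter is configurable because (as noted above) it is implementation-specific, and splitting at the first delimiter is one plausible policy rather than a standardized one.

```python
from typing import Optional, Tuple


def split_subaddress(local_part: str, delimiter: str = "+") -> Tuple[str, Optional[str]]:
    """Split a local-part into (base, tag) at the first delimiter.

    Returns (local_part, None) when no usable delimiter is present.
    """
    base, sep, tag = local_part.partition(delimiter)
    if not sep or not base:
        # No delimiter, or nothing before it: treat the whole string as the base.
        return local_part, None
    return base, tag
```

With the default delimiter, `split_subaddress("user+tag")` yields `("user", "tag")`, while a local-part without a plus sign is returned unchanged with `None` as the tag.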
Although sub-addressing is widely implemented in consumer services like Gmail, Outlook.com, and Proton Mail, enterprise systems or older infrastructures may not support it, which can cause delivery failures when the tag is not handled.[32][36]
Sub-addressing also has limitations. Case handling of the tag is inconsistent: because the case sensitivity of the local-part is domain-dependent, a tag may or may not be compared case-sensitively, although most modern providers treat the entire local-part as case-insensitive in practice. Furthermore, the feature can be abused, as attackers might exploit varying provider support to generate multiple aliases and bypass blacklists or rate limits, though it is more commonly employed by legitimate users to detect and mitigate unwanted mail.[30][32][37]
Examples
Valid Email Addresses
Valid email addresses must adhere to the syntax rules defined in RFC 5322, which specifies the permissible structures for the local-part and domain to ensure proper parsing and routing in Internet mail.[1] Basic valid examples demonstrate straightforward formats using alphanumeric characters in the local-part and a simple domain name. For instance, [email protected] is valid because the local-part "user" consists solely of allowed letters, and the domain "domain.com" follows the dot-atom structure with periods separating label sequences of permitted characters.[1] Similarly, [email protected] is acceptable, as the local-part incorporates dots to separate components without leading, trailing, or consecutive periods, while the domain uses hierarchical labels connected by dots, all within the atext character set (letters, digits, and specific symbols).[1]
The local-part supports quoting to include spaces or other non-standard characters. An example is "user name"@domain.com, where double quotes enclose the local-part to allow the embedded space, adhering to the quoted-string production in the standard.[1] To illustrate the full range of special characters permitted without quoting, !#$%&'*+-/=?^_`{|}~@domain.com is valid, as each symbol belongs to the atext set defined for unquoted local-parts; the local-part as a whole may be up to 64 octets long.[1][38]
Sub-addressing extends functionality within the local-part syntax. For example, [email protected] is syntactically correct, since the plus sign (+) is an allowed atext character, allowing the tag to augment the base local-part without violating length limits or character restrictions.[1]
Domain variations further highlight flexibility in addressing. The address user@[IPv6:2001:db8::1] uses a domain literal enclosed in brackets to specify an IPv6 address directly, bypassing DNS resolution as permitted for transport scenarios.[1] Additionally, [email protected] is valid, with the domain incorporating hyphens within labels, as hyphens are part of the permitted characters and conform to the overall domain length constraint of 255 octets.[1][28]
These examples reflect the core syntax rules for local-parts and domains, providing a foundation for compliant email construction.[1]
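The domain-literal form used in the examples above, such as [IPv6:2001:db8::1], can be unpacked with Python's standard `ipaddress` module. The sketch below assumes only the two literal forms mentioned in this article (a bare IPv4 address and an `IPv6:`-tagged address); the function name `parse_domain_literal` is hypothetical.

```python
import ipaddress
from typing import Union

IPAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


def parse_domain_literal(domain: str) -> IPAddress:
    """Parse a bracketed literal such as [192.0.2.1] or [IPv6:2001:db8::1]."""
    if not (domain.startswith("[") and domain.endswith("]")):
        raise ValueError("not a domain literal")
    inner = domain[1:-1]
    if inner[:5].lower() == "ipv6:":
        # Strip the "IPv6:" tag and let ipaddress validate the remainder.
        return ipaddress.IPv6Address(inner[5:])
    return ipaddress.IPv4Address(inner)
```

Malformed addresses raise `ValueError` (the `ipaddress` exceptions are subclasses of it), so a caller can treat any such error as "not a valid domain literal".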
Invalid Email Addresses
Invalid email addresses are those that violate the syntactic rules defined for the Internet Message Format, primarily outlined in RFC 5322, which specifies the structure of an addr-spec as a local-part followed by "@" and a domain.[7] These violations prevent proper parsing and transport in email systems, leading to rejection during validation or delivery attempts. Note that some examples below are syntactically valid but practically invalid due to real-world constraints like DNS resolution or system compatibility. Common issues arise from missing components, improper character usage, or exceeding length constraints, as detailed in standards like RFC 3696, which describes practical limits on address components to ensure compatibility with SMTP.[38]
One frequent syntax error is the absence of a domain after the "@" symbol, as in "user@", which fails because the addr-spec requires a non-empty domain following the separator.[7] Similarly, while "user@domain" is syntactically valid as a single-label domain per RFC 5322, it is practically invalid because it lacks a top-level domain (TLD) needed for DNS resolution in Internet mail systems.[7] Another basic violation occurs with unquoted spaces in the local-part, such as "user [email protected]", since spaces are not permitted in dot-atom form and may appear only inside a quoted-string.[21]
More explicit syntax violations include multiple "@" symbols, like "user@@domain.com", which contravenes the addr-spec definition, allowing only one "@" between local-part and domain.[7] Consecutive dots in the domain, as in "[email protected]", are prohibited because the dot-atom syntax mandates at least one atext character (letters, digits, or specified specials) between dots.[21] Addresses exceeding 254 characters in total length, such as a contrived local-part of 200 characters followed by a long domain, are invalid due to SMTP path length restrictions clarified in errata for RFC 3696 and aligned with RFC 5321's path limits.
Deprecated or non-standard forms further illustrate invalidity under modern rules. For instance, a domain starting with a dot, like "[email protected]", violates the dot-atom requirement that every dot be preceded and followed by atext.[39] Although "[email protected]" is syntactically valid since digits are allowed in atext, numeric-only local-parts are non-standard in many legacy systems and may fail delivery in contexts enforcing alphanumeric requirements for mailboxes.[21] Additionally, a comment next to the local-part, as in "user(comment)@domain.com", is best treated as invalid: the RFC 5322 header grammar tolerates comments only as optional surrounding whitespace and states that they should not be used around the "@", and the SMTP mailbox syntax of RFC 5321 does not allow comments at all, so the form is not usable as an envelope address.[39]
| Invalid Example | Reason for Invalidity | Relevant RFC Reference |
|---|---|---|
| user@ | Missing domain after "@" | RFC 5322, Section 3.4.1[7] |
| user@domain | Lacks TLD (syntactically valid but practically invalid for DNS resolution) | RFC 5322, Section 3.4.1[7] |
| user [email protected] | Unquoted space in local-part | RFC 5322, Section 3.2.3[21] |
| user@@domain.com | Multiple "@" symbols | RFC 5322, Section 3.4.1[7] |
| [email protected] | Consecutive dots in domain | RFC 5322, Section 3.2.3[21] |
| [Very long address exceeding 254 chars]@domain.com | Exceeds total length limit | RFC 3696 Errata |
| [email protected] | Leading dot in domain | RFC 5322, Section 4[39] |
| user(comment)@domain.com | Comment next to "@" (discouraged in RFC 5322 headers, not valid in RFC 5321 envelope addresses) | RFC 5322, Section 3.4.1[7] |
Internationalized Email Addresses
Internationalized email addresses incorporate non-ASCII characters from various scripts and languages, enabling users worldwide to employ native writing systems in both the local-part and domain components. These addresses conform to standards that extend traditional ASCII-based email syntax, allowing Unicode characters while maintaining compatibility with existing infrastructure. For instance, domains with accented or non-Latin characters are encoded using the Internationalizing Domain Names in Applications (IDNA) protocol, which converts them to Punycode for DNS resolution.[40]
A common example involves an IDNA domain, such as user@exämple.com, where the domain "exämple.com" is represented in Punycode as xn--exmple-cua.com to ensure ASCII compatibility in the Domain Name System (DNS). This format supports internationalized domain names (IDNs) by mapping Unicode labels to ASCII-compatible encoding (ACE) strings prefixed with "xn--". Similarly, an ASCII local-part paired with an internationalized domain, like user@dömäin.tld, uses the Punycode equivalent xn--dmin-moa0i.tld for the domain.[40]
The local-part can also include Unicode characters when the Simple Mail Transfer Protocol (SMTP) server supports the SMTPUTF8 extension, which permits UTF-8 encoding throughout the email transmission process. For example, café@domain.com or é[email protected] are valid under this extension, as it expands the allowable characters in the local-part beyond ASCII while preserving the quoted and bracketed structures of earlier standards. Without SMTPUTF8, such addresses may fail delivery, as legacy systems expect ASCII-only local-parts.[41]
Fully internationalized addresses combine Unicode in both parts, such as π@δóμäïň.com, where the local-part uses the Greek letter pi (π) and the domain mixes accented Latin characters with the Greek letters delta (δ) and mu (μ).
The domain is converted to its Punycode A-label (an ASCII string beginning with "xn--") for DNS resolution, and the entire address requires SMTPUTF8 for transport to handle the non-ASCII local-part. Another illustration is 您好@example.com, featuring Chinese characters in the local-part (U+60A8 U+597D), which is supported in contexts like X.509 certificates for email verification. These examples highlight how internationalized addresses facilitate global communication but depend on end-to-end UTF-8 support to avoid downgrading or rejection.[41]
Validation and Verification
Syntax Validation
Syntax validation of an email address involves verifying its format against established standards, such as those defined in RFC 5322, without performing any network queries or existence checks. This process ensures the address adheres to syntactic rules for the local-part (before the @ symbol) and the domain (after the @ symbol), focusing on character sets, lengths, and structural elements. The primary goal is to identify malformed addresses early, preventing errors in applications like user registration or data entry forms.
Regex-based validation is a common approach, using regular expressions to match the patterns outlined in RFC 5322. For the local-part, which can include up to 64 octets of letters, digits, and special symbols like dots (.) and hyphens (-), as well as quoted strings for unusual characters, a comprehensive regex might incorporate escaped characters and domain literals (e.g., [IPv4-address]). The domain portion requires patterns for dot-separated labels, each consisting of 1-63 characters from letters, digits, and hyphens, excluding leading or trailing hyphens. An example regex for basic validation could be ^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$, but more robust implementations account for RFC 5322's allowances like comments (in parentheses) and folding whitespace, though these are rarely used in practice. Such patterns are derived from the RFC's ABNF (Augmented Backus-Naur Form) grammar for the addr-spec production rule.
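To make the trade-off concrete, the basic pattern quoted above can be compiled and tried against a few addresses. This is a sketch only; the helper name `looks_like_email` is hypothetical, and the pattern is the simplified one from the text, not the full RFC 5322 grammar.

```python
import re

# The basic pattern from the text; deliberately simpler than RFC 5322.
BASIC_EMAIL_RE = re.compile(r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$")


def looks_like_email(address: str) -> bool:
    """Return True if the address matches the basic pattern."""
    return BASIC_EMAIL_RE.match(address) is not None
```

The pattern accepts john.smith@example.com and rejects "user@", but it also shows both kinds of error a simple regex makes: it rejects the valid quoted form "user name"@example.com and the domain literal user@[192.0.2.1], and it accepts user..name@example.com even though consecutive dots are not allowed in a dot-atom.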
Algorithmic checks provide an alternative or complementary method, parsing the address step-by-step rather than relying on a single regex. This begins by locating the @ symbol, ensuring exactly one occurrence and that it is neither at the start nor end of the string. The local-part is then validated for length (up to 64 octets) and permissible characters, including checking for properly quoted sections if present (e.g., "user name"@example.com). For the domain, the string is split by dots to verify each label's length (1-63 characters) and composition, confirming it ends with a top-level domain of at least two characters and disallowing consecutive dots or dots at the beginning or end. These checks align with RFC 5321 for SMTP envelope syntax but are applied locally without transmission. Tools implementing this often use state machines or recursive descent parsers for accuracy.
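The step-by-step procedure described above might look like the following sketch. It is intentionally limited: quoted local-parts, domain literals, and non-ASCII input are out of scope, the function name `check_address` and its messages are hypothetical, and length checks count characters on the assumption that the input is ASCII.

```python
import re
from typing import List

ATEXT_RUN = re.compile(r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+")
# A label is 1-63 characters and may not start or end with a hyphen.
LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def check_address(address: str) -> List[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    if address.count("@") != 1:
        return ["expected exactly one '@'"]
    local, domain = address.split("@")
    problems: List[str] = []
    # Local-part: non-empty, at most 64 octets, atext runs joined by dots.
    if not local:
        problems.append("empty local-part")
    elif len(local) > 64:
        problems.append("local-part longer than 64 octets")
    elif not all(ATEXT_RUN.fullmatch(part) for part in local.split(".")):
        problems.append("local-part is not a valid dot-atom")
    # Domain: non-empty, at most 255 octets, well-formed labels, and a TLD.
    labels = domain.split(".")
    if not domain:
        problems.append("empty domain")
    elif len(domain) > 255:
        problems.append("domain longer than 255 octets")
    elif not all(LABEL.fullmatch(label) for label in labels):
        problems.append("domain has an empty or malformed label")
    elif len(labels) < 2 or len(labels[-1]) < 2:
        problems.append("domain lacks a top-level domain of two or more characters")
    return problems
```

Returning a list of problems rather than a boolean is a design choice that lets a form report why an address was refused; splitting on dots catches leading, trailing, and consecutive dots in one pass because each of them produces an empty segment.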
Programming libraries and tools facilitate syntax validation in various languages, balancing strict adherence to standards with practical usability. In Python, the standard library's email.utils.parseaddr function splits a string into a display name and an address but performs little validation, returning a pair of empty strings when parsing fails; third-party packages from PyPI, such as validate_email, add checks modeled on the RFC grammar. Validators of this kind may offer a strict mode (rejecting non-ASCII without quoting) and a lenient mode (accepting common real-world variations). Similarly, Java's javax.mail.internet.InternetAddress class validates via its constructor, throwing AddressException for syntax errors and offering a strict flag that can be turned off to handle legacy or unusual addresses. Strict parsing ensures compliance but may reject valid yet uncommon formats like those with comments, while lenient approaches improve user experience by accepting most addresses seen in practice at the cost of potential false positives. The advantages of using a library include built-in handling of edge cases and updates for standard revisions; the drawback is a dependency on a specific implementation that might not cover all RFC nuances.
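As a small illustration of the standard-library route, `email.utils.parseaddr` can separate a display name from the addr-spec before any syntax check is applied. The wrapper name `extract_addr_spec` is hypothetical, and the sketch assumes the caller validates the returned string separately, since `parseaddr` itself does not enforce the RFC grammar.

```python
from email.utils import parseaddr


def extract_addr_spec(mailbox: str) -> str:
    """Return the addr-spec from a mailbox string, or '' if none is found."""
    _display_name, addr_spec = parseaddr(mailbox)
    # parseaddr only splits the string; it does not validate the address.
    return addr_spec
```

For example, `extract_addr_spec("John Smith <john.smith@example.com>")` returns `john.smith@example.com`, which can then be passed to whichever syntax check the application uses.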
Common pitfalls in syntax validation arise from oversimplification or misunderstanding of the standards. A frequent error is using basic regex patterns like ^[\w\.-]+@[\w\.-]+\.[\w]{2,}$, which fail to handle quoted local-parts (e.g., "O'Brien"@example.com) or international characters without proper encoding, leading to rejection of valid addresses. Another issue is ignoring domain length limits or allowing invalid top-level domains, as domains must conform to DNS rules where labels avoid certain reserved characters. Additionally, validators might overlook the distinction between display names and actual addresses in full RFC 822-style strings (e.g., User [email protected]), parsing only the addr-spec. These errors can result in high false negative rates for simplistic checkers compared to full RFC compliance, emphasizing the need for comprehensive testing against diverse examples.
Existence Verification
Existence verification refers to methods used to determine whether an email address corresponds to an active mailbox that can receive messages, focusing on deliverability and user activity rather than format alone. A common technique is SMTP probing, which involves initiating a connection to the recipient's mail server and sending the RCPT TO command as defined in the Simple Mail Transfer Protocol (SMTP). This command specifies the recipient address, and the server responds with codes indicating acceptance or rejection; for instance, a 250 OK response signifies that the server will accept mail for the address, while a 550 response (e.g., "User unknown") indicates the address does not exist.[42] The probe simulates the early stages of email transmission (connecting via the domain's MX record, greeting the server, and naming the recipient) without sending a message body, thereby testing server-side confirmation of the address.[43]
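How a prober might interpret the server's reply to RCPT TO can be sketched as a pure function, with no network traffic involved. The name `classify_rcpt_reply` and the category labels are hypothetical; the mapping simply follows the SMTP convention that 2xx replies are positive, 4xx transient failures, and 5xx permanent failures.

```python
def classify_rcpt_reply(reply_line: str) -> str:
    """Map an SMTP reply line such as '550 User unknown' to a coarse category."""
    code_text = reply_line[:3]
    if not code_text.isdigit():
        return "unknown"
    code = int(code_text)
    if 200 <= code < 300:
        return "accepted"           # e.g. 250: server will take mail for the address
    if 400 <= code < 500:
        return "temporary-failure"  # e.g. 450: try again later, nothing proven
    if 500 <= code < 600:
        return "rejected"           # e.g. 550: mailbox unavailable or unknown
    return "unknown"
```

An "accepted" result only means the server did not refuse the recipient at this stage; it does not by itself prove that a person reads the mailbox.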
Callback verification, also known as double opt-in, provides an interactive confirmation by sending a verification email to the address and requiring the recipient to respond, typically by clicking a link or replying with a code. This method verifies not only existence but also the user's intent and control over the mailbox, as the address is not activated until confirmation is received. In practice, after an initial signup, an automated confirmation email is dispatched with clear instructions for action, which supports compliance with regulations like CAN-SPAM and improves list quality by filtering out invalid or mistyped entries.[44]
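One way to implement the confirmation link is to derive the token from the address with a keyed hash, so that no per-address state has to be stored. This is a minimal sketch under assumptions not found in the text above: the names `SECRET_KEY`, `make_token`, and `confirm` are hypothetical, the key would come from configuration in a real service, and token expiry is left out.

```python
import hashlib
import hmac

# Hypothetical placeholder; a real service would load a random secret from
# its configuration rather than hard-coding it.
SECRET_KEY = b"replace-with-a-random-secret"


def make_token(address: str) -> str:
    """Derive a confirmation token bound to exactly one address."""
    return hmac.new(SECRET_KEY, address.encode("utf-8"), hashlib.sha256).hexdigest()


def confirm(address: str, token: str) -> bool:
    """Return True if the token from the confirmation link matches the address."""
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(make_token(address), token)
```

The signup handler would email a link containing `make_token(address)`, and the handler for that link would activate the address only when `confirm(address, token)` returns True.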
Third-party services offer automated existence verification through APIs, often combining SMTP probing with proprietary checks to validate addresses at scale. For example, Hunter.io's Email Verifier performs an SMTP test to assess if the address exists by simulating a server handshake, alongside domain and database lookups, achieving high accuracy for business emails.[45] Similarly, NeverBounce integrates SMTP validation within its 20+ step process, conducted from multiple global locations to confirm deliverability and reduce bounces, supporting integrations with over 85 platforms.[46][47] These tools are widely used in marketing to clean lists, but they raise privacy concerns, as probing can inadvertently expose valid addresses to unauthorized parties or facilitate spam if data is mishandled.[48]
Despite their utility, these methods have significant limitations. Catch-all domains, configured to accept emails for any local-part (e.g., *@example.com routes all to a single inbox), produce false positives by returning acceptance codes for non-existent addresses, complicating accurate verification.[49] Anti-spam protections further hinder probing; since the late 1990s, many servers have disabled or restricted informative RCPT TO responses to prevent address enumeration by spammers, often returning generic errors or temporary failures (e.g., 450 codes). High-volume probes can trigger rate limiting, firewalls, or blacklisting, rendering services unreliable over time and potentially damaging the verifier's IP reputation.[48]
Internationalization
IDNA and Domain Internationalization
The Internationalizing Domain Names in Applications (IDNA) protocol enables the use of non-ASCII characters in domain names by defining a mechanism to map Unicode strings to ASCII-compatible encodings, ensuring compatibility with the Domain Name System (DNS).[50] Specified in RFC 5890 through RFC 5894, IDNA2008 (the current version) replaces the earlier IDNA2003 framework and relies on Punycode for the actual encoding process.[50] Under IDNA, a domain label containing Unicode characters, known as a U-label, is converted to an A-label, which is an ASCII string prefixed with "xn--" and encoded in Punycode, allowing it to be stored and resolved in the DNS without modifications to the underlying infrastructure.[51]
Punycode, detailed in RFC 3492, is an instance of the Bootstring algorithm that transforms a Unicode string into a representation using only ASCII letters, digits, and hyphens.[52] The process copies the basic ASCII characters unchanged, appends a delimiter ("-") when any are present, and then encodes the non-ASCII characters and their positions as base-36 digits after the delimiter.[52] For example, the Unicode label "café" (where "é" is U+00E9) encodes to the A-label "xn--caf-dma", which can then be used in DNS queries.[52] This encoding is reversible: decoding an A-label yields the original U-label in Unicode Normalization Form C (NFC).[50]
In the context of email addresses, IDNA integration occurs at the DNS level, where MX records for internationalized domains are registered and resolved using A-label forms.[50] SMTP, as defined in RFC 5321, requires domain names in commands like MAIL FROM and RCPT TO to be in ASCII, so applications must convert U-labels to A-labels before performing DNS lookups for MX records.[42] This means email servers and clients need IDNA-aware implementations to handle the conversion; otherwise, resolution fails for non-ASCII domains.[51]
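The U-label to A-label conversion described above can be tried with Python's built-in `punycode` codec. This sketch covers only the encoding step: it performs none of the IDNA2008 validity, mapping, or Bidi checks, and the helper names `to_a_label` and `to_u_label` are hypothetical.

```python
def to_a_label(u_label: str) -> str:
    """Encode one Unicode label as an ASCII A-label (no IDNA validity checks)."""
    if u_label.isascii():
        return u_label  # already ASCII, nothing to encode
    return "xn--" + u_label.encode("punycode").decode("ascii")


def to_u_label(a_label: str) -> str:
    """Decode an A-label back to its Unicode form; other labels pass through."""
    if a_label.lower().startswith("xn--"):
        return a_label[4:].encode("ascii").decode("punycode")
    return a_label
```

With these helpers, `to_a_label("café")` gives `xn--caf-dma`, matching the example in the text, and `to_u_label` reverses it. Applications that need full IDNA2008 processing would normally rely on a dedicated IDNA library rather than the raw codec.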
Browser and server support for IDNA has become widespread, with modern systems automatically applying Punycode encoding during domain registration and resolution.[50]
IDNA imposes several limitations to ensure security and stability, including validity checks that prohibit certain Unicode code points classified as DISALLOWED in RFC 5892, such as many punctuation marks and symbols that could lead to confusion or attacks. For right-to-left (RTL) scripts like Arabic or Hebrew, RFC 5893 defines bidirectional rules to mitigate visual spoofing risks: RTL labels must begin and end with specific character types (e.g., starting with R, AL, or L, and ending with R, AL, EN, or AN, optionally followed by non-spacing marks), and they cannot mix certain numeric types or include left-to-right characters inappropriately.[53] These rules prevent unrestricted RTL usage in domains, requiring strict validation during encoding to avoid invalid labels that could be rejected by DNS resolvers.[53]
Local-part Internationalization and SMTPUTF8
The base SMTP specification, RFC 5321, restricts the local-part of email addresses to ASCII characters, explicitly prohibiting non-ASCII octets (those with the high-order bit set to 1) and ASCII control characters (decimal values 0-31 and 127).[26] This limitation confines usernames to the Latin alphabet, numerals, and a limited set of symbols, creating significant challenges for international users who wish to employ native scripts such as Cyrillic, Arabic, or Chinese characters in their email addresses.[26] As global internet usage expands beyond English-speaking regions, this ASCII-only constraint hinders email accessibility, cultural inclusivity, and the ability to create personalized, linguistically appropriate usernames.[41]
To overcome these restrictions, RFC 6531 defines the SMTPUTF8 extension, which extends SMTP to support the transport and delivery of email messages containing internationalized addresses and header information encoded in UTF-8.[41] This extension permits UTF-8 characters in the local-part of mailbox addresses (i.e., before the "@" symbol) and in header fields, while domain names remain encoded via Internationalizing Domain Names in Applications (IDNA) for DNS compatibility.[41] Servers implementing SMTPUTF8 must advertise their capability by including the "SMTPUTF8" keyword, without parameters, in the response to the client's EHLO command, informing the sender that non-ASCII content can be transmitted without modification.[41] Without this advertisement, clients are prohibited from sending internationalized messages to avoid delivery failures.[41]
Beyond advertising the keyword, a server that implements SMTPUTF8 takes on several further requirements for handling UTF-8 content reliably.
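The EHLO advertisement described above can be checked by scanning the server's reply for the keyword. The sketch below works on the reply text only, so it can be read without a live connection; the function name `advertises_smtputf8` is hypothetical. With Python's standard `smtplib`, the equivalent check after calling `ehlo()` on a connected `SMTP` object is `has_extn("smtputf8")`.

```python
def advertises_smtputf8(ehlo_response: str) -> bool:
    """Return True if a multi-line EHLO reply lists the SMTPUTF8 keyword."""
    for line in ehlo_response.splitlines():
        # Reply lines look like "250-KEYWORD [params]" or, for the last
        # line, "250 KEYWORD [params]".
        if len(line) < 4 or not line.startswith("250"):
            continue
        keyword = line[4:].split(" ")[0]
        if keyword.upper() == "SMTPUTF8":
            return True
    return False
```

A client would make this check before issuing MAIL FROM for a message with a non-ASCII local-part, and would refrain from sending the message over that connection when the keyword is absent.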
Servers must validate UTF-8 syntax in mailbox local-parts and headers, perform IDNA-compliant domain lookups, and store messages using UTF-8 encoding, typically in conjunction with the 8BITMIME extension (RFC 6152) to support 8-bit data in message bodies.[41] No inspection of the message body for non-ASCII content is mandated, but servers should reject invalid UTF-8 sequences with appropriate error codes, such as 553 for mailbox issues.[41] When the next server does not support SMTPUTF8, the sending system must not attempt to transmit the internationalized message to it and should either reject the transaction (e.g., with a 550 or 553 response) or, if configured, downgrade the message to an ASCII-compatible form, though the latter risks data loss and is discouraged.[41]
Adoption of the SMTPUTF8 extension remains partial and uneven across the email ecosystem. Major providers such as Google Workspace (including Gmail) have supported SMTPUTF8 since 2014, enabling users to send and receive emails with UTF-8 local-parts. Similarly, Microsoft has integrated support in Exchange Server 2019 and later, as well as in Microsoft 365 environments.[54] However, legacy systems, on-premises deployments of older Exchange versions, and many smaller or regional providers continue to lack compatibility, resulting in bounces and errors for internationalized messages, such as the common "SMTPUTF8 is required, but was not offered" rejection.[55] Recent advancements, including ICANN's achievement of full Email Address Internationalization (EAI) support in its systems in July 2025 and the Universal Acceptance Steering Group's (UASG) FY2025-2029 strategic plan focusing on governments and providers, signal increasing momentum, though global uptake was limited to approximately 10% of domains as of 2021, with ongoing efforts to accelerate deployment in multilingual regions.[56][57]
History and Evolution
Early Development
The development of email addresses began in the context of the ARPANET, the precursor to the modern Internet, where early messaging systems required a way to specify recipients across networked computers. In 1971, Ray Tomlinson, working at Bolt, Beranek and Newman (BBN), implemented the first program to send electronic mail between users on different ARPANET hosts using the TENEX operating system. He introduced the "@" symbol as a separator to denote "user at host," creating the foundational format of user@host to distinguish the recipient's identifier from the destination machine. This choice of the "@" was arbitrary among available non-alphanumeric symbols on the keyboard, but it quickly became the standard delimiter for network email addressing.[58]
Early standardization efforts followed to address inconsistencies in mail headers and formats across ARPANET systems. RFC 561, published in September 1973 by Abhay Bhushan and Ray Tomlinson, proposed uniform network mail headers, defining fields such as "FROM:" with a host-phrase that combines a user phrase with a host-indicator using "@" or "at" (e.g., "Neuman@BBN-TENEXA"). It introduced support for hierarchical routing paths with multiple "@" signs (e.g., "User@hosta@local-net1@major-net") and explicitly restricted characters to the printable ASCII set used by TELNET (codes 32-126 decimal), establishing an ASCII-only assumption that persisted in early implementations. By 1973, email had already become dominant on ARPANET, comprising 75% of network traffic, underscoring its rapid adoption among researchers.[59][60][2]
RFC 822, published in 1982, defined the address form local-part@domain where the domain is a dot-separated sequence of sub-domains (e.g., "[email protected]"). It eliminated multi-"@" paths in favor of source routing via separate mechanisms, making addresses more logical and extensible for internetwork use while preserving the local-part's case sensitivity and uninterpreted nature by intermediate systems. RFC 822 retained the ASCII character restriction, focusing on printable US-ASCII for compatibility. Concurrently, email addressing spread beyond ARPANET through systems like UUCP (Unix-to-Unix Copy Protocol), introduced in the late 1970s for dial-up Unix networks, which initially used "bang-path" notation (e.g., "host1!host2!user") but increasingly integrated with Internet-style @ addresses for interoperability in the early 1980s, enabling wider adoption in academic and research communities.[3]
