MIME
from Wikipedia

Multipurpose Internet Mail Extensions (MIME) is a standard that extends the format of email messages to support text in character sets other than ASCII, as well as attachments of audio, video, images, and application programs. Message bodies may consist of multiple parts, and header information may be specified in non-ASCII character sets. Email messages with MIME formatting are typically transmitted with standard protocols, such as the Simple Mail Transfer Protocol (SMTP), the Post Office Protocol (POP), and the Internet Message Access Protocol (IMAP).

MIME is an Internet standard. It is specified in a series of Requests for Comments (RFCs): RFC 2045, RFC 2046, RFC 2047, RFC 4288 (since obsoleted by RFC 6838), RFC 4289 and RFC 2049. Earlier versions of the specification were published in 1993 as RFC 1521 and RFC 1522.

Although the MIME formalism was designed mainly for SMTP, its content types are also important in other communication protocols. In the Hypertext Transfer Protocol (HTTP) for the World Wide Web, servers label the body of a response with a MIME media type in the Content-Type header field. Clients use this content type (media type) to select an appropriate viewer application for the type of data indicated.
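To make the client side concrete, the following Python sketch splits a Content-Type value into its lowercase type/subtype and its parameters, as a client might before choosing a viewer. `parse_content_type` is a hypothetical helper, not a library API; it assumes no quoted parameter value contains a semicolon and ignores RFC 2231 continuations. The standard library's `email.message.Message` is used only as a cross-check.

```python
from email.message import Message


def parse_content_type(value: str) -> tuple[str, dict[str, str]]:
    """Split a Content-Type value into a lowercase type/subtype and its parameters.

    Simplified sketch: assumes no quoted parameter value contains ';'.
    """
    media_type, *raw_params = value.split(";")
    params: dict[str, str] = {}
    for raw in raw_params:
        name, sep, param_value = raw.partition("=")
        if not sep:
            continue  # skip malformed parameters that have no '='
        # Parameter names are case-insensitive; values may be quoted strings.
        params[name.strip().lower()] = param_value.strip().strip('"')
    return media_type.strip().lower(), params


header = 'Text/HTML; charset="UTF-8"'
print(parse_content_type(header))  # ('text/html', {'charset': 'UTF-8'})

# Cross-check with the standard library's header handling.
msg = Message()
msg["Content-Type"] = header
print(msg.get_content_type(), msg.get_content_charset())  # text/html utf-8
```

Type and subtype names are case-insensitive, which is why both are lowercased before a client compares them against the types it can display.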

History

MIME originated from the Andrew Messaging System, part of the Andrew Project developed at Carnegie Mellon University (CMU), as a cross-platform alternative to the Andrew-specific data format.[1]

MIME header fields

MIME-Version

The presence of this header field indicates the message is MIME-formatted. The value is typically "1.0". The field appears as follows:

MIME-Version: 1.0

According to MIME co-creator Nathaniel Borenstein, the version number was introduced to permit changes to the MIME protocol in subsequent versions. However, Borenstein admitted shortcomings in the specification that hindered the implementation of this feature:

We did not adequately specify how to handle a future MIME version. ... So if you write something that knows 1.0, what should you do if you encounter 2.0 or 1.1? I sort of thought it was obvious but it turned out everyone implemented that in different ways. And the result is that it would be just about impossible for the Internet to ever define a 2.0 or a 1.1.[2]

Content-Disposition

The original MIME specifications only described the structure of mail messages; they did not address presentation styles. The Content-Disposition header field was added in RFC 2183 to specify the presentation style. A MIME part can have:

  • an inline content disposition, which means that it should be automatically displayed when the message is displayed, or
  • an attachment content disposition, in which case it is not displayed automatically and requires some form of action from the user to open it.

In addition to the presentation style, the Content-Disposition field also provides parameters for specifying the file name, the creation date and the modification date, which the reader's mail user agent can use when storing the attachment.

The following example is taken from RFC 2183, where the header field is defined:

Content-Disposition: attachment; filename=genome.jpeg;
  modification-date="Wed, 12 Feb 1997 16:29:51 -0500";

The filename may be encoded as defined in RFC 2231.
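A short Python sketch using the standard `email` package shows how a receiving agent might read the RFC 2183 example (with its trailing semicolon dropped) and how an RFC 2231 `filename*` parameter is produced for a non-ASCII name. The file name `résumé.pdf` is an illustrative assumption.

```python
import email
from email.message import Message

# The RFC 2183 example, folded onto two lines, with an empty body.
raw = (
    "Content-Type: image/jpeg\n"
    "Content-Disposition: attachment; filename=genome.jpeg;\n"
    ' modification-date="Wed, 12 Feb 1997 16:29:51 -0500"\n'
    "\n"
)
part = email.message_from_string(raw)
print(part.get_content_disposition())  # attachment
print(part.get_filename())             # genome.jpeg
print(part.get_param("modification-date", header="content-disposition"))

# A tuple value (charset, language, text) makes the library emit an
# RFC 2231 extended parameter: filename*=charset'language'percent-encoded.
out = Message()
out.add_header("Content-Disposition", "attachment",
               filename=("utf-8", "", "résumé.pdf"))
print(out["Content-Disposition"])
# attachment; filename*=utf-8''r%C3%A9sum%C3%A9.pdf
print(out.get_filename())              # résumé.pdf
```

`get_filename()` decodes the RFC 2231 form transparently, so a reader does not need to distinguish `filename` from `filename*` itself.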

As of 2010, a majority of mail user agents did not follow this prescription fully. The widely used Mozilla Thunderbird mail client ignored the Content-Disposition field in messages and used independent algorithms to select the MIME parts to display automatically. Thunderbird prior to version 3 also sent newly composed messages with inline content disposition for all MIME parts, and most users were unaware of how to set the content disposition to attachment.[3] Many mail user agents also send messages with the file name in the name parameter of the Content-Type header instead of the filename parameter of the Content-Disposition header field. This practice is discouraged: the file name should be specified either with the filename parameter alone or with both the filename and name parameters.[4]

In HTTP, the response header field Content-Disposition: attachment is usually used as a hint to the client to present the response body as a downloadable file. Typically, when receiving such a response, a Web browser prompts the user to save its content as a file instead of displaying it as a page in a browser window, using the filename parameter as the suggested default file name.

Content-Transfer-Encoding

In June 1992, MIME (RFC 1341, since made obsolete by RFC 2045) defined a set of methods for representing binary data so that it can pass through transports designed for ASCII text. The Content-Transfer-Encoding MIME header field serves two purposes:

  • It indicates whether or not a binary-to-text encoding scheme has been applied on top of the original encoding specified within the Content-Type header:
  1. If such a binary-to-text encoding has been used, it states which one.
  2. If not, it provides a descriptive label for the format of the content, with respect to the presence of 8-bit or binary content.

The RFC and the IANA's list of transfer encodings define the values shown below, which are not case sensitive. '7bit', '8bit', and 'binary' mean that no binary-to-text encoding on top of the original encoding was used. In these cases, the header field is actually redundant for the email client to decode the message body, but it may still be useful as an indicator of what type of object is being sent. Values 'quoted-printable' and 'base64' tell the email client that a binary-to-text encoding scheme was used and that appropriate initial decoding is necessary before the message can be read with its original encoding (e.g. UTF-8).

  • Suitable for use with normal SMTP:
    • 7bit – up to 998 octets per line of the code range 1..127 with CR and LF (codes 13 and 10 respectively) only allowed to appear as part of a CRLF line ending. This is the default value.
    • quoted-printable – used to encode arbitrary octet sequences into a form that satisfies the rules of 7bit. Designed to be efficient and mostly human-readable when used for text data consisting primarily of US-ASCII characters but also containing a small proportion of bytes with values outside that range.
    • base64 – used to encode arbitrary octet sequences into a form that satisfies the rules of 7bit. Designed to be efficient for non-text 8-bit and binary data. Sometimes used for text data that frequently uses non-US-ASCII characters.
  • Suitable for use with SMTP servers that support the 8BITMIME SMTP extension (RFC 6152):
    • 8bit – up to 998 octets per line with CR and LF (codes 13 and 10 respectively) only allowed to appear as part of a CRLF line ending.
  • Suitable for use with SMTP servers that support the BINARYMIME SMTP extension (RFC 3030):
    • binary – any sequence of octets.
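The following Python sketch checks the 7bit rules listed above and then picks a transfer encoding. `is_valid_7bit` and `choose_encoding` are hypothetical helpers, and the 20% threshold for preferring quoted-printable over base64 is an arbitrary assumption rather than anything the RFCs prescribe; the encoding itself is done by the standard `quopri` and `base64` modules.

```python
import base64
import quopri


def is_valid_7bit(body: bytes) -> bool:
    """Check the 7bit rules: octets 1..127, CR and LF only as CRLF, lines of at most 998 octets."""
    for line in body.split(b"\r\n"):
        if len(line) > 998:
            return False
        # After splitting on CRLF, any CR or LF left over was a bare one.
        if any(b == 0 or b > 127 or b in (10, 13) for b in line):
            return False
    return True


def choose_encoding(body: bytes) -> str:
    """Hypothetical heuristic for picking a Content-Transfer-Encoding value."""
    if is_valid_7bit(body):
        return "7bit"  # no transformation needed
    non_ascii = sum(1 for b in body if b > 127)
    # Assumed threshold: mostly-ASCII data stays readable as quoted-printable.
    if non_ascii / len(body) < 0.2:
        return "quoted-printable"
    return "base64"


text = "Hola, señor".encode("utf-8")
blob = bytes(range(256))

print(choose_encoding(b"plain ASCII\r\n"))  # 7bit
print(choose_encoding(text))                # quoted-printable
print(choose_encoding(blob))                # base64
print(quopri.encodestring(text))            # b'Hola, se=C3=B1or'
wrapped = base64.encodebytes(blob)          # base64 in lines of at most 76 characters
```

Quoted-printable keeps the ASCII letters visible and escapes only the two UTF-8 bytes of "ñ", which is why it suits mostly-ASCII text, while base64 has a fixed 4:3 overhead regardless of content.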

There is no encoding defined which is explicitly designed for sending arbitrary binary data through SMTP transports with the 8BITMIME extension. Thus, if BINARYMIME isn't supported, base64 or quoted-printable (with their associated inefficiency) are sometimes still useful. This restriction does not apply to other uses of MIME such as Web Services with MIME attachments or MTOM.

Encoded-Word

Since RFC 2822, conforming message header field names and values use ASCII characters; values that contain non-ASCII data should use the MIME encoded-word syntax (RFC 2047) instead of a literal string. This syntax uses a string of ASCII characters indicating both the original character encoding (the "charset") and the content-transfer-encoding used to map the bytes of the charset into ASCII characters.

The form is: "=?charset?encoding?encoded text?=".

  • charset may be any character set registered with IANA. Typically it would be the same charset as the message body.
  • encoding can be either "Q" denoting Q-encoding that is similar to the quoted-printable encoding, or "B" denoting base64 encoding.
  • encoded text is the Q-encoded or base64-encoded text.
  • An encoded-word may not be more than 75 characters long, including charset, encoding, encoded text, and delimiters. If it is desirable to encode more text than will fit in an encoded-word of 75 characters, multiple encoded-words (separated by CRLF SPACE) may be used.
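The 75-character limit means that long header text has to be split across several encoded-words. This Python sketch (the `encode_b_words` helper is hypothetical) uses the "B" encoding and keeps whole characters together, since RFC 2047 does not allow a multi-byte character to be split between two encoded-words. Folding the resulting words onto separate lines (CRLF SPACE), and accounting for the header field name on the first line, is left to the caller.

```python
import base64


def encode_b_words(text: str, charset: str = "utf-8", limit: int = 75) -> list[str]:
    """Split text into B-encoded words of at most `limit` characters each."""
    prefix, suffix = f"=?{charset}?B?", "?="
    budget = limit - len(prefix) - len(suffix)  # room left for the base64 text
    words: list[str] = []
    chunk = ""

    def flush(s: str) -> None:
        encoded = base64.b64encode(s.encode(charset)).decode("ascii")
        words.append(prefix + encoded + suffix)

    for ch in text:
        candidate = chunk + ch
        # Base64 turns every 3 input bytes (rounded up) into 4 characters.
        encoded_len = 4 * ((len(candidate.encode(charset)) + 2) // 3)
        if encoded_len > budget and chunk:
            flush(chunk)  # close the current word before it grows too long
            chunk = ch
        else:
            chunk = candidate
    if chunk:
        flush(chunk)
    return words


subject = "¡Hola, señor! " * 8
words = encode_b_words(subject)
for w in words:
    print(len(w), w)
```

Because each word is cut on a character boundary, every word decodes to valid text on its own, and a reader simply concatenates the decoded words, ignoring the folding whitespace between them.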

Difference between Q-encoding and quoted-printable

The ASCII codes for the question mark ("?") and equals sign ("=") may not be represented directly, as they are used to delimit the encoded-word. The ASCII code for space may not be represented directly because it could cause older parsers to split up the encoded-word undesirably. To make the encoding smaller and easier to read, the underscore is used to represent the ASCII code for space, with the side effect that the underscore itself cannot be represented directly. The use of encoded-words in certain parts of header fields imposes further restrictions on which characters may be represented directly.

For example,

Subject: =?iso-8859-1?Q?=A1Hola,_se=F1or!?=

is interpreted as "Subject: ¡Hola, señor!".
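Decoding a single encoded-word can be sketched in Python with a regular expression plus the standard `base64` and `quopri` modules; `quopri.decodestring(..., header=True)` applies the Q-encoding rule that an underscore stands for a space. `decode_encoded_word` is a hypothetical helper that handles one word only; `email.header.decode_header` is the standard-library route and is shown as a cross-check.

```python
import base64
import quopri
import re
from email.header import decode_header, make_header

# "=?charset?encoding?encoded text?=" with encoding Q or B (case-insensitive)
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([QqBb])\?([^?]*)\?=")


def decode_encoded_word(word: str) -> str:
    match = ENCODED_WORD.fullmatch(word)
    if match is None:
        raise ValueError(f"not an encoded-word: {word!r}")
    charset, encoding, encoded_text = match.groups()
    if encoding.upper() == "B":
        raw = base64.b64decode(encoded_text)
    else:
        # header=True makes '_' decode to a space, as Q-encoding requires.
        raw = quopri.decodestring(encoded_text.encode("ascii"), header=True)
    return raw.decode(charset)


word = "=?iso-8859-1?Q?=A1Hola,_se=F1or!?="
print(decode_encoded_word(word))              # ¡Hola, señor!
print(str(make_header(decode_header(word))))  # ¡Hola, señor!
```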

The encoded-word format is not used for the names of the header fields (for example Subject). These names are usually English terms and are always in ASCII in the raw message. When viewing a message with a non-English email client, the header field names might be translated by the client.

Multipart messages

A MIME multipart message declares a boundary in the Content-Type header field; this boundary, which must not occur in any of the parts, is placed between the parts and at the beginning and end of the body of the message, as follows:

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary=frontier

This is a message with multiple parts in MIME format.
--frontier
Content-Type: text/plain

This is the body of the message.
--frontier
Content-Type: application/octet-stream
Content-Transfer-Encoding: base64

PGh0bWw+CiAgPGhlYWQ+CiAgPC9oZWFkPgogIDxib2R5PgogICAgPHA+VGhpcyBpcyB0aGUg
Ym9keSBvZiB0aGUgbWVzc2FnZS48L3A+CiAgPC9ib2R5Pgo8L2h0bWw+Cg==
--frontier--

Each part consists of its own content header (zero or more Content- header fields) and a body. Multipart content can be nested. The Content-Transfer-Encoding of a multipart type must always be "7bit", "8bit" or "binary" to avoid the complications that would be posed by multiple levels of decoding. The multipart block as a whole does not have a charset; non-ASCII characters in the part headers are handled by the Encoded-Word system, and the part bodies can have charsets specified if appropriate for their content-type.

Notes:

  • Before the first boundary is an area that is ignored by MIME-compliant clients. This area is generally used to put a message to users of old non-MIME clients.
  • It is up to the sending mail client to choose a boundary string that doesn't clash with the body text. Typically this is done by inserting a long random string.
  • The last boundary must have two hyphens at the end.
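To see these boundary rules in action, the Python sketch below assembles a multipart/mixed message by hand (random boundary, preamble, base64-encoded parts, close delimiter) and then parses it back with the standard `email` package. `build_multipart_mixed` is a hypothetical helper, and base64-encoding every part, including text, is a simplification.

```python
import base64
import email
import secrets


def build_multipart_mixed(parts: list[tuple[str, bytes]]) -> str:
    """Hypothetical helper: assemble a multipart/mixed message with CRLF lines."""
    encoded_parts = []
    for content_type, body in parts:
        b64 = base64.b64encode(body).decode("ascii")
        # Wrap base64 at 76 characters per line (RFC 2045 limit).
        encoded_parts.append((content_type, [b64[i:i + 76] for i in range(0, len(b64), 76)]))

    # A long random boundary. Base64 text never contains '-', so a collision is
    # impossible here, but a builder mixing encodings must check, so we do.
    while True:
        boundary = "frontier-" + secrets.token_hex(16)
        if not any(boundary in line for _, lines in encoded_parts for line in lines):
            break

    lines = [
        "MIME-Version: 1.0",
        f"Content-Type: multipart/mixed; boundary={boundary}",
        "",
        "This is a message with multiple parts in MIME format.",  # preamble
    ]
    for content_type, body_lines in encoded_parts:
        lines += [f"--{boundary}", f"Content-Type: {content_type}",
                  "Content-Transfer-Encoding: base64", ""]
        lines += body_lines
    lines.append(f"--{boundary}--")  # close delimiter ends with two hyphens
    return "\r\n".join(lines) + "\r\n"


raw = build_multipart_mixed([
    ("text/plain", b"This is the body of the message."),
    ("application/octet-stream", bytes(range(256))),
])
msg = email.message_from_string(raw)
for part in msg.walk():
    print(part.get_content_type())
# prints multipart/mixed, then text/plain, then application/octet-stream
```

The preamble line lands in `msg.preamble`, which MIME-aware readers do not display; only non-MIME clients would show it.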

Multipart subtypes

The MIME standard defines various multipart-message subtypes, which specify the nature of the message parts and their relationship to one another. The subtype is specified in the Content-Type header field of the overall message. For example, a multipart MIME message using the digest subtype would have its Content-Type set as "multipart/digest".

The RFC initially defined four subtypes: mixed, digest, alternative and parallel. A minimally compliant application must support mixed and digest; other subtypes are optional. Applications must treat unrecognized subtypes as "multipart/mixed". Additional subtypes, such as signed and form-data, have since been separately defined in other RFCs.

mixed

multipart/mixed is used for sending files with different Content-Type header fields inline (or as attachments). If pictures or other easily readable files are sent, most mail clients display them inline, unless Content-Disposition: attachment is specified, in which case they are offered as attachments. The default content type for each part is "text/plain".

The type is defined in RFC 2046.[5]

digest

multipart/digest is a simple way to send multiple text messages. The default content-type for each part is "message/rfc822".

The MIME type is defined in RFC 2046.[6]

alternative

The multipart/alternative subtype indicates that each part is an "alternative" version of the same (or similar) content, each in a different format denoted by its "Content-Type" header. The order of the parts is significant. RFC 1341 states: In general, user agents that compose multipart/alternative entities should place the body parts in increasing order of preference, that is, with the preferred format last.[7]

Systems can then choose the "best" representation they are capable of processing; in general, this will be the last part that the system can understand, although other factors may affect this.

Since a client is unlikely to want to send a version that is less faithful than the plain text version, this structure places the plain text version (if present) first. This makes life easier for users of clients that do not understand multipart messages.

Most commonly, multipart/alternative is used for email with two parts, one plain text (text/plain) and one HTML (text/html). The plain text part provides backwards compatibility while the HTML part allows use of formatting and hyperlinks. Most email clients offer a user option to prefer plain text over HTML; this is an example of how local factors may affect how an application chooses which "best" part of the message to display.
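The "last part the reader understands" rule can be sketched in Python as follows. `pick_alternative` and the `supported` sets are hypothetical stand-ins for a mail reader's capabilities and user preferences; the standard library's `EmailMessage.get_body()` applies its own preference order and may be preferable in practice.

```python
from email import message_from_string
from email.policy import default

RAW = (
    "MIME-Version: 1.0\n"
    "Content-Type: multipart/alternative; boundary=alt\n"
    "\n"
    "--alt\n"
    "Content-Type: text/plain; charset=us-ascii\n"
    "\n"
    "Hello, plain world.\n"
    "--alt\n"
    "Content-Type: text/html; charset=us-ascii\n"
    "\n"
    "<p>Hello, <b>HTML</b> world.</p>\n"
    "--alt--\n"
)


def pick_alternative(msg, supported: set[str]):
    """Return the last part whose content type the reader can render, or None."""
    best = None
    for part in msg.iter_parts():
        if part.get_content_type() in supported:
            best = part  # later parts are preferred, so keep overwriting
    return best


msg = message_from_string(RAW, policy=default)
print(pick_alternative(msg, {"text/plain", "text/html"}).get_content_type())  # text/html
# A user who prefers plain text can shrink the supported set.
print(pick_alternative(msg, {"text/plain"}).get_content_type())  # text/plain
```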

While it is intended that each part of the message represent the same content, the standard does not require this to be enforced in any way. At one time, anti-spam filters would only examine the text/plain part of a message,[8] because it is easier to parse than the text/html part. But spammers eventually took advantage of this, creating messages with an innocuous-looking text/plain part and advertising in the text/html part. Anti-spam software eventually caught up on this trick, penalizing messages with very different text in a multipart/alternative message.[8]

The type is defined in RFC 2046.[9]

related

A multipart/related message is used to indicate that each message part is a component of an aggregate whole. It is intended for compound objects consisting of several inter-related components, where proper display cannot be achieved by displaying the constituent parts individually. The message consists of a root part (by default, the first) which references other parts inline, which may in turn reference other parts. Message parts are commonly referenced by Content-ID. The syntax of a reference is unspecified and is instead dictated by the encoding or protocol used in the part.

One common usage of this subtype is to send a web page complete with images in a single message. The root part would contain the HTML document, and use image tags to reference images stored in the latter parts.
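A minimal Python sketch with the standard `email.message.EmailMessage` API builds such a message: the HTML root part refers to an image through a `cid:` URL that matches the image part's Content-ID. The image bytes are a placeholder, not a real PNG, and `example.com` is an arbitrary domain for the generated identifier.

```python
from email.message import EmailMessage
from email.utils import make_msgid

# Placeholder bytes standing in for a real PNG file (assumption for the example).
image_bytes = b"\x89PNG\r\n\x1a\n...placeholder..."

cid = make_msgid(domain="example.com")  # "<...@example.com>", angle brackets included
# In a cid: URL the angle brackets are dropped.
html = f'<html><body><p>Logo:</p><img src="cid:{cid[1:-1]}"></body></html>'

msg = EmailMessage()
msg["Subject"] = "Web page with an inline image"
msg.set_content(html, subtype="html")                  # becomes the root part
msg.add_related(image_bytes, "image", "png", cid=cid)  # turns msg into multipart/related

for part in msg.walk():
    print(part.get_content_type(), part.get("Content-ID", ""))
```

`add_related` converts the single HTML part into a multipart/related container and appends the image with an inline disposition, so the root part stays first as the subtype expects.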

The type is defined in RFC 2387.

report

multipart/report is a message type that contains data formatted for a mail server to read. It is split between a text/plain part (or some other easily readable content type) and a message/delivery-status part, which contains the data formatted for the mail server to read.

The type is defined in RFC 6522.

signed

A multipart/signed message is used to attach a digital signature to a message. It has exactly two body parts: a body part and a signature part. The whole of the body part, including its MIME header fields, is used to create the signature part. Many signature types are possible, such as "application/pgp-signature" (RFC 3156) and "application/pkcs7-signature" (S/MIME).

The type is defined in RFC 1847.[10]

encrypted

A multipart/encrypted message has two parts. The first part has control information that is needed to decrypt the application/octet-stream second part. Similar to signed messages, there are different implementations which are identified by their separate content types for the control part. The most common types are "application/pgp-encrypted" (RFC 3156) and "application/pkcs7-mime" (S/MIME).

The type is defined in RFC 1847.[11]

form-data

The MIME type multipart/form-data is used to express values submitted through a form. Originally defined as part of HTML 4.0, it is most commonly used for submitting files with HTTP. It is specified in RFC 7578, superseding RFC 2388.
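The body of a multipart/form-data request can be sketched in Python as below. `encode_multipart_form_data` is a hypothetical helper: it does not escape quotes in field or file names and does not check the data for the boundary string, both of which a production encoder would handle. The standard `email` parser is used only to check that the result has valid multipart structure.

```python
import email
import secrets


def encode_multipart_form_data(fields: dict[str, str],
                               files: dict[str, tuple[str, str, bytes]]) -> tuple[str, bytes]:
    """Hypothetical helper returning (Content-Type header value, request body)."""
    boundary = "form-boundary-" + secrets.token_hex(12)
    body = bytearray()
    for name, value in fields.items():
        # Plain form fields: one part each, identified by the name parameter.
        body += f"--{boundary}\r\n".encode("ascii")
        body += f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode("utf-8")
        body += value.encode("utf-8") + b"\r\n"
    for name, (filename, content_type, data) in files.items():
        # File fields also carry a filename and the file's own media type.
        body += f"--{boundary}\r\n".encode("ascii")
        body += (f'Content-Disposition: form-data; name="{name}"; '
                 f'filename="{filename}"\r\n').encode("utf-8")
        body += f"Content-Type: {content_type}\r\n\r\n".encode("ascii")
        body += data + b"\r\n"
    body += f"--{boundary}--\r\n".encode("ascii")
    return f"multipart/form-data; boundary={boundary}", bytes(body)


content_type, body = encode_multipart_form_data(
    {"comment": "Quarterly numbers"},
    {"upload": ("report.csv", "text/csv", b"north,42")},
)
print(content_type)

# Re-parse with the mail parser to confirm the multipart structure.
parsed = email.message_from_bytes(b"Content-Type: " + content_type.encode("ascii") + b"\r\n\r\n" + body)
parts = parsed.get_payload()
```

In an HTTP request, `content_type` would be sent as the Content-Type header and `body` as the request body.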

x-mixed-replace

The content type multipart/x-mixed-replace was developed as part of a technology to emulate server push and streaming over HTTP.

All parts of a mixed-replace message have the same semantic meaning. However, each part invalidates – "replaces" – the previous parts as soon as it is received completely. Clients should process the individual parts as soon as they arrive and should not wait for the whole message to finish.
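On the server side, a mixed-replace stream is a sequence of boundary-delimited parts written out as they become available. The Python generator below is a sketch under stated assumptions: the boundary `frame` and the per-part Content-Length header (common in MJPEG streams, but not required by the format) are illustrative choices, and the HTTP server that writes the chunks is not shown. An endless stream simply never sends the close delimiter.

```python
from typing import Iterable, Iterator

BOUNDARY = "frame"  # hypothetical boundary; must not occur inside the frames
RESPONSE_CONTENT_TYPE = f"multipart/x-mixed-replace; boundary={BOUNDARY}"


def mixed_replace_parts(frames: Iterable[bytes],
                        content_type: str = "image/jpeg") -> Iterator[bytes]:
    """Yield one encapsulated part per frame; each part replaces the previous one."""
    for frame in frames:
        header = (
            f"--{BOUNDARY}\r\n"
            f"Content-Type: {content_type}\r\n"
            f"Content-Length: {len(frame)}\r\n"  # optional, but helps clients
            "\r\n"
        ).encode("ascii")
        yield header + frame + b"\r\n"


for chunk in mixed_replace_parts([b"abc", b"defg"]):
    print(chunk)
```

Because the generator is lazy, a server can send each part as soon as the corresponding frame is captured, matching the rule that clients process parts as they arrive.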

Originally developed by Netscape,[12] it is still supported by Mozilla, Firefox, Safari, and Opera. It is commonly used in IP cameras as the MIME type for MJPEG streams.[13] It was supported by Chrome for main resources until 2013 (images can still be displayed using this content type).[14]

byteranges

multipart/byteranges is used to represent noncontiguous byte ranges of a single resource. It is used by HTTP when a server returns multiple byte ranges; it was originally defined in RFC 2616 and is now specified in RFC 9110.
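As an illustration, the Python sketch below builds a multipart/byteranges body for a 206 (Partial Content) response. `byteranges_body` is a hypothetical helper; it takes inclusive (first, last) byte positions, labels each part with a Content-Range header of the form `bytes first-last/total`, and rejects positions outside the resource.

```python
def byteranges_body(resource: bytes, ranges: list[tuple[int, int]],
                    content_type: str, boundary: str) -> bytes:
    """Build a multipart/byteranges body for inclusive (first, last) positions."""
    total = len(resource)
    out = bytearray()
    for first, last in ranges:
        if not 0 <= first <= last < total:
            raise ValueError(f"unsatisfiable range {first}-{last}")
        out += (
            f"--{boundary}\r\n"
            f"Content-Type: {content_type}\r\n"
            f"Content-Range: bytes {first}-{last}/{total}\r\n"
            "\r\n"
        ).encode("ascii")
        out += resource[first:last + 1] + b"\r\n"  # positions are inclusive
    out += f"--{boundary}--\r\n".encode("ascii")
    return bytes(out)


body = byteranges_body(b"0123456789abcdef", [(0, 3), (10, 12)],
                       "text/plain", "THIS_STRING_SEPARATES")
print(body.decode("ascii"))
```

The response itself would carry `Content-Type: multipart/byteranges; boundary=THIS_STRING_SEPARATES`, while each part repeats the resource's own media type.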

RFC documentation

  • RFC 1426, SMTP Service Extension for 8bit-MIME transport. J. Klensin, N. Freed, M. Rose, E. Stefferud, D. Crocker. February 1993.
  • RFC 1847, Security Multiparts for MIME: Multipart/Signed and Multipart/Encrypted
  • RFC 3156, MIME Security with OpenPGP
  • RFC 2045, MIME Part One: Format of Internet Message Bodies
  • RFC 2046, MIME Part Two: Media Types. N. Freed, Nathaniel Borenstein. November 1996.
  • RFC 2047, MIME Part Three: Message Header Extensions for Non-ASCII Text. Keith Moore. November 1996.
  • (RFC 4288, MIME Part Four: Media Type Specifications and Registration Procedures. Obsoleted by RFC 6838.)
  • RFC 6838, Media Type Specifications and Registration Procedures. J. Klensin, N. Freed, T. Hansen. January 2013. (Obsoletes RFC 4288.)
  • RFC 4289, MIME Part Four: Registration Procedures. J. Klensin, N. Freed. December 2005.
  • RFC 2049, MIME Part Five: Conformance Criteria and Examples. N. Freed, N. Borenstein. November 1996.
  • RFC 2183, Communicating Presentation Information in Internet Messages: The Content-Disposition Header Field. Troost, R., Dorner, S. and K. Moore. August 1997.
  • RFC 2231, MIME Parameter Value and Encoded Word Extensions: Character Sets, Languages, and Continuations. N. Freed, K. Moore. November 1997.
  • RFC 2387, The MIME Multipart/Related Content-type
  • RFC 1521, Mechanisms for Specifying and Describing the Format of Internet Message Bodies
  • RFC 7578, Returning Values from Forms: multipart/form-data

from Grokipedia
Multipurpose Internet Mail Extensions (MIME) is an Internet standard that extends the format of email messages to support text in non-ASCII character sets, binary attachments such as images, audio, video, and application files, multipart message structures, and header fields containing non-ASCII data. Originally designed to overcome the limitations of the plain-text-only format defined in RFC 822, MIME enables the reliable transmission of diverse content types across text-based Internet protocols by specifying encoding methods and descriptive headers. MIME was developed by Nathaniel S. Borenstein and Ned Freed to address the growing need for multimedia email in the early 1990s. The initial specification appeared in June 1992 as RFC 1341, which outlined mechanisms for multipart and non-textual message bodies, along with the companion RFC 1342, which described the representation of non-ASCII text in message headers. This early version was updated in 1993 by RFCs 1521 and 1522, and the definitive standards were established in November 1996 through RFCs 2045 (format of message bodies), 2046 (media types), 2047 (non-ASCII headers), 2048 (registration procedures), and 2049 (conformance criteria and examples).

At its core, MIME introduces a hierarchical system of media types in the form of type/subtype pairs (e.g., text/html for formatted text or application/pdf for documents) to classify content, allowing recipients to interpret and process it appropriately. Binary data is encoded using methods like base64 (for arbitrary 8-bit data) or quoted-printable (for mostly ASCII text with occasional non-ASCII bytes) to ensure compatibility with 7-bit transport channels. Messages can consist of a single part or of nested multipart structures that use boundaries to separate components, supporting everything from simple attachments to richly composed emails with inline images. Although rooted in email, MIME's concepts have influenced broader Internet technologies, particularly through its content types, originally called "MIME types" and now officially termed "media types" (RFC 6838).
In HTTP, these types appear in the Content-Type and Accept headers to describe response formats and negotiate content delivery, enabling web browsers and servers to handle diverse resources such as JSON data (application/json) or streaming video. The Internet Assigned Numbers Authority (IANA) oversees the global registry of these types, ensuring standardization and extensibility across protocols.
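Content negotiation with the Accept header can be sketched as choosing, among the types a server can produce, the one the client rates highest. The Python functions below are hypothetical; they give the most specific matching media range precedence (exact type over type/* over */*), treat q=0 as "not acceptable", break ties in the server's order, and ignore media-type parameters other than q.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept value into (media range, q) pairs."""
    prefs = []
    for item in header.split(","):
        media_range, *params = [p.strip() for p in item.split(";")]
        if not media_range:
            continue  # tolerate empty list elements
        q = 1.0  # default quality when no q parameter is given
        for param in params:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                q = float(value)
        prefs.append((media_range.lower(), q))
    return prefs


def negotiate(accept: str, available: List[str]) -> Optional[str]:
    """Return the acceptable available type with the highest q, or None."""
    prefs = parse_accept(accept)
    best, best_q = None, 0.0
    for media_type in available:
        main, _, sub = media_type.lower().partition("/")
        q_for_type, specificity = 0.0, -1
        for rng, q in prefs:
            r_main, _, r_sub = rng.partition("/")
            if (r_main, r_sub) == (main, sub):
                spec = 2  # exact match
            elif r_main == main and r_sub == "*":
                spec = 1  # type/*
            elif (r_main, r_sub) == ("*", "*"):
                spec = 0  # */*
            else:
                continue
            if spec > specificity:  # the most specific range decides the quality
                q_for_type, specificity = q, spec
        # Strictly greater, so earlier (server-preferred) types win ties.
        if q_for_type > best_q:
            best, best_q = media_type, q_for_type
    return best


print(negotiate("text/html;q=0.8, application/json", ["text/html", "application/json"]))
# application/json
```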

Overview

Definition and Purpose

Multipurpose Internet Mail Extensions (MIME) is an Internet standard that extends the format of email messages to support the inclusion of binary data, non-ASCII text, and attachments within text-based protocols such as the Simple Mail Transfer Protocol (SMTP). Defined in a series of Request for Comments (RFC) documents, specifically RFC 2045 through RFC 2049, MIME provides a structured way to represent diverse content types, overcoming the limitations of early email systems.

The primary purpose of MIME is to facilitate the transport of non-textual data over channels originally designed for plain text, such as SMTP, which traditionally handled only 7-bit US-ASCII characters. By supporting an extensible range of content types, including images, audio, video, and application-specific files, MIME allows messages to incorporate multiple body parts, each described by appropriate headers, enabling richer and more complex email exchanges without breaking compatibility with legacy systems. This structured approach ensures that messages can be reliably parsed and rendered across diverse mail systems.

At its core, MIME builds upon the email format defined in RFC 822, which specified headers in US-ASCII and treated message bodies as unstructured text. It extends this format by introducing additional header fields that describe content types, encodings, and structures, while maintaining 7-bit ASCII compatibility to preserve interoperability with older infrastructure. Developed in the early 1990s to overcome the constraints of plain-text-only email, MIME thus provides a robust mechanism for evolving messaging toward multimedia support.

Applications and Usage

MIME's primary application is in electronic mail systems that use the Simple Mail Transfer Protocol (SMTP), where it enables the inclusion of non-textual content such as file attachments, HTML-formatted message bodies, and inline images within messages. This extension allows email clients to handle diverse media types beyond plain ASCII text, supporting richer communication formats.

MIME media types, defined in RFC 2046, have also been adopted by the Hypertext Transfer Protocol (HTTP) to specify the content of web responses, for example Content-Type: image/jpeg for images or application/pdf for documents. This integration facilitates the delivery of multimedia resources over the web, ensuring that browsers and servers correctly interpret and process various file formats. In other protocols, MIME supports multimedia messaging in the Network News Transfer Protocol (NNTP) for Usenet articles, allowing binary attachments and rich text in news posts, and in the Session Initiation Protocol (SIP) for encapsulating message bodies such as media descriptions in signaling messages. These adaptations extend MIME's utility to distributed discussion systems and real-time communication sessions.

In modern contexts, MIME integrates with Secure/Multipurpose Internet Mail Extensions (S/MIME) to provide digitally signed and encrypted email, enhancing security for MIME-formatted messages across transports like SMTP. Additionally, the multipart/form-data subtype is widely used in web APIs for file uploads from HTML forms, enabling the transmission of multiple files and form fields in a single HTTP request. Regarding limitations, MIME messages can become large because of encoded attachments, which has prompted adaptations in transport protocols; for instance, HTTP/1.1 can use chunked transfer coding to stream large MIME bodies without buffering the entire payload. In mobile messaging standards such as the Multimedia Messaging Service (MMS), MIME structures multimedia content such as images and videos within messages, with protocols handling size constraints through segmentation or gateway mappings.

History

Origins and Development

MIME was developed between 1991 and 1992 by Nathaniel Borenstein at Bellcore (now Telcordia Technologies) and Ned Freed (1959–2022) at Innosoft International to overcome the limitations of email systems that were restricted to plain ASCII text. Borenstein, drawing on his prior work on messaging, and Freed, an experienced software maintainer, collaborated to create a framework that would enable the inclusion of diverse content types in email. The primary motivations stemmed from the rapid growth of the Internet in the early 1990s, which highlighted the need for email to support rich media such as images, audio, and non-ASCII characters, beyond the text-only capabilities of existing protocols. This effort was inspired by earlier systems like the Andrew Message System (AMS), a platform Borenstein co-developed at Carnegie Mellon University in the late 1980s, which demonstrated the feasibility of integrated rich content but was not interoperable with standard Internet email.

MIME built upon and generalized earlier approaches to binary data transmission, including BinHex, which encoded Macintosh files as text for transfer, and uuencode, a Unix tool from 1979 that converted binary data to ASCII-safe text. These ad hoc methods addressed immediate needs but lacked standardization, prompting MIME's more robust approach to encoding and content description. The initial release of MIME appeared in June 1992 as an experimental standard in RFC 1341, designed to comply with the 7-bit data transport restrictions imposed by the Simple Mail Transfer Protocol (SMTP) as defined in RFC 821. This allowed MIME to extend functionality without requiring changes to the underlying SMTP infrastructure.

Standardization and Evolution

The initial formal standardization of MIME began with RFC 1341, published in June 1992 as an experimental specification by Nathaniel S. Borenstein and Ned Freed, which introduced mechanisms for specifying and describing the format of message bodies to support multipart and non-textual content. This document laid the groundwork for carrying more than plain ASCII text over the Simple Mail Transfer Protocol (SMTP) but was designated as experimental due to its novel approach. It was superseded in 1993 by RFC 1521 and RFC 1522, which were in turn obsoleted in November 1996 by a comprehensive set of five RFCs, RFC 2045 through RFC 2049, which refined and expanded the specification into a more robust framework for Internet mail. Most of these were co-authored by Freed and Borenstein; RFC 2047 was written by Keith Moore.

These key RFCs form the core of the MIME standard: RFC 2045 defines the overall format of Internet message bodies and the structure of MIME headers; RFC 2046 specifies media types and subtypes for content classification; RFC 2047 details extensions for non-ASCII text in message headers, including the encoded-word syntax; RFC 2048 establishes registration procedures for MIME-related facilities such as media types and external body access types; and RFC 2049 outlines conformance criteria along with illustrative examples of MIME messages. Together, they established MIME within the Internet Engineering Task Force (IETF) standards process, enabling broader interoperability in email systems.

MIME's evolution has involved targeted updates to address emerging needs. In November 1997, RFC 2231 introduced extensions for MIME parameter values and encoded-words to better support character sets and languages, allowing parameters with non-ASCII characters and parameter value continuations. For security, the Secure/Multipurpose Internet Mail Extensions (S/MIME) protocol was advanced through RFC 8551 in April 2019, defining version 4.0 with enhancements for digital signatures, encryption, and certificate handling to secure MIME data.

Ongoing maintenance includes IETF errata processing and minor clarifications, ensuring compatibility with modern protocols. Adoption accelerated following the 1996 RFCs, with widespread implementation in email clients such as Pine and Eudora by that year, which integrated MIME support for attachments and rich content, driving its use in both academic and commercial environments. MIME was further embedded in web technologies through its integration into HTTP/1.1, as specified in RFC 7231 (June 2014, since obsoleted by RFC 9110), where media types and content encoding mechanisms underpin resource representation and transfer.

Message Structure

Overall Format

A MIME message is structured as a series of headers followed by a body, mirroring the basic format of RFC 822 messages but extended to support multimedia content. The headers consist of name-value pairs, where each header field is encoded in US-ASCII characters and terminated by a carriage return and line feed (CRLF) sequence. The header section ends with an empty line (a CRLF immediately following the CRLF that ends the last header field), after which the body begins. This separation ensures that parsers can reliably distinguish the metadata from the content.

For simple, single-part messages, the body directly contains the message content, which may be plain text or encoded data as specified by the headers. In contrast, multipart messages organize the body into multiple discrete parts, enabling the inclusion of diverse content types within a single message. Each part in a multipart message includes its own set of headers, again in US-ASCII with CRLF terminations, followed by an empty line and then the part's body content. These parts are delimited by unique boundary strings, which are opaque sequences chosen to avoid conflicts with the content itself.

To maintain compatibility with non-MIME-aware mail systems, the overall format of MIME messages is designed so that unrecognized or unsupported elements are treated as undifferentiated text. For instance, a legacy reader that ignores the boundary mechanism will interpret the entire body of a multipart message as a continuous text stream, with subsequent parts appended inline after the initial content. This approach allows MIME messages to be safely transported through older infrastructure without requiring universal upgrades.
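The header/body split and header unfolding described above can be sketched in Python. `split_message` is a hypothetical helper that assumes strict CRLF line endings and well-formed header lines; real parsers, such as the standard `email` package, are far more tolerant of malformed input.

```python
def split_message(raw: str) -> tuple[list[tuple[str, str]], str]:
    """Split a CRLF-delimited message into unfolded (name, value) headers and a body."""
    head, _, body = raw.partition("\r\n\r\n")  # the first empty line ends the headers
    headers: list[tuple[str, str]] = []
    for line in head.split("\r\n"):
        if line[:1] in (" ", "\t") and headers:
            # Continuation line: unfold by dropping the CRLF but keeping the whitespace.
            name, value = headers[-1]
            headers[-1] = (name, value + line)
        else:
            name, _, value = line.partition(":")
            headers.append((name.strip(), value.strip()))
    return headers, body


raw = (
    "MIME-Version: 1.0\r\n"
    "Subject: A subject line that has been\r\n"
    " folded onto two lines\r\n"
    "Content-Type: text/plain; charset=us-ascii\r\n"
    "\r\n"
    "Hello, world.\r\n"
)
headers, body = split_message(raw)
for name, value in headers:
    print(f"{name}: {value}")
```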

Header Fields

MIME header fields extend the standard headers defined in earlier specifications to describe the content of MIME entities, enabling the handling of diverse media types and encodings in email. These fields are placed in the header section of a message or body part and follow the general syntax rules for headers, where long lines may be folded onto continuation lines that begin with whitespace, as outlined in the Internet Message Format standard. Parameters within these headers, particularly those involving non-ASCII characters, employ a specific encoding mechanism to ensure compatibility across systems.

The MIME-Version header field declares the version of the MIME specification to which the message conforms, typically set to "1.0". Its presence is required in the top-level header of any MIME-conformant message to signal that the message uses MIME extensions rather than plain RFC 822 formatting. The syntax is straightforward:

MIME-Version: 1.0

Like other header field names, MIME-Version is case-insensitive. The Content-Type header field specifies the media type and subtype of the content in a MIME entity, allowing receiving agents to determine how to interpret the body. It uses the format "type/subtype" with optional parameters, such as charset for text types or boundary for multipart structures. For example:

Content-Type: text/plain; charset=UTF-8

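The Content-Type field may be omitted, in which case the content defaults to "text/plain; charset=US-ASCII" (except inside a multipart/digest, where parts default to message/rfc822). The field supports a wide range of registered media types maintained by the Internet Assigned Numbers Authority (IANA). The Content-Transfer-Encoding header field indicates how the body is represented for transport, using one of the values "7bit", "8bit", "binary", "quoted-printable", or "base64". The first three are labels for untransformed data. Quoted-printable and base64, by contrast, transform the body so that it can pass safely through 7-bit networks. The field declares how the content has been transformed but does not alter the underlying media type. An example is: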

Content-Transfer-Encoding: base64

This field is optional. When it is absent, "7bit" is assumed, so the field must be present whenever any other encoding is used. The individual encoding methods are described under Encoding Methods below. The Content-Disposition header field, defined in RFC 2183 (1997), tells the user agent how the content should be presented. It takes values such as "inline" for display within the message or "attachment" for separate handling, often with a "filename" parameter suggesting a name for saving the content. Its syntax is:

Content-Disposition: attachment; filename="example.jpg"

This optional field applies to any MIME entity and helps user agents decide whether to render content directly or offer it for download, which is particularly useful for non-text attachments. Additional MIME header fields include Content-ID, which assigns a globally unique identifier to a body part so that it can be cross-referenced, typically within message/external-body or multipart/related structures. Its value uses the angle-bracketed syntax of a message identifier:

Content-ID: <part1@example.com>

This field makes features such as embedded images possible, because an HTML body part can refer to an image part through a "cid:" URL containing the image's Content-ID. Complementing it, the Content-Description field offers a human-readable description of the content's purpose, such as:

Content-Description: A photo from the conference

It is optional and unstructured, helping users understand otherwise opaque entities without affecting processing. Extensions such as the Content-Language header field specify the natural language(s) of the content using standard language tags (as defined in BCP 47). For instance:

Content-Language: en-US

This optional field supports localization by indicating the language(s) of the intended audience, with multiple tags separated by commas for multilingual content.
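
The header fields above can be inspected with Python's standard email package. This is a minimal sketch using made-up input: the attachment part below is invented for illustration. The code uses only documented accessors (get_content_type, get_content_disposition, get_filename) plus ordinary header lookup.

```python
from email import message_from_string
from email.policy import default

# A hypothetical attachment part. The lowercase "content-type" name shows that
# header field names are matched case-insensitively.
PART = (
    'content-type: image/jpeg; name="example.jpg"\r\n'
    "Content-Transfer-Encoding: base64\r\n"
    'Content-Disposition: attachment; filename="example.jpg"\r\n'
    "Content-ID: <photo1@example.com>\r\n"
    "Content-Description: A photo from the conference\r\n"
    "Content-Language: en-US\r\n"
    "\r\n"
    "/9j/4AAQSkZJRg==\r\n"  # truncated, illustrative base64 body
)

part = message_from_string(PART, policy=default)
info = {
    "type": part.get_content_type(),
    "disposition": part.get_content_disposition(),
    "filename": part.get_filename(),
    "transfer-encoding": str(part["Content-Transfer-Encoding"]),
    "content-id": str(part["Content-ID"]),
    "description": str(part["Content-Description"]),
    "language": str(part["Content-Language"]),
}
for key, value in info.items():
    print(f"{key}: {value}")

# With no Content-Type at all, the content type defaults to text/plain.
bare = message_from_string("Subject: no MIME headers\r\n\r\nhello\r\n", policy=default)
print(bare.get_content_type())
```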

Encoding Methods

Content-Transfer-Encoding

The Content-Transfer-Encoding header field specifies the encoding transformation applied to the body of a MIME entity so that the body can be transported safely over 7-bit text channels. SMTP is one such channel: it originally supported only US-ASCII data. The mechanism converts 8-bit or binary data into forms compatible with 7-bit networks, preventing corruption during transmission. The header value is case-insensitive and applies only to the entity's body, not its headers. The standard values are 7bit, 8bit, binary, quoted-printable, and base64.

The 7bit value indicates that no encoding has been applied. The body consists solely of US-ASCII characters (octet values 1 through 127, with CR and LF appearing only together as CRLF line breaks), and no line exceeds 998 octets before its CRLF (1000 octets including it). This is the default assumed when the header is absent. The 8bit value additionally permits octets from 128 to 255 under the same line-length limit, with NUL still excluded. However, it assumes the underlying transport supports 8-bit data, which is not guaranteed in all environments. The binary value signals arbitrary binary data with no restrictions on octet values or line lengths. It is suitable only for transports capable of carrying binary streams, since standard 7-bit SMTP cannot reliably convey it without further wrapping.

Quoted-printable encoding is designed for data that is mostly printable US-ASCII text. It escapes three kinds of octet:

- 8-bit octets;
- control octets (0-31 and 127, except TAB and the CRLF line breaks, which are handled specially);
- the equals sign (=) itself.

Each escaped octet is written as an equals sign followed by two uppercase hexadecimal digits (=HH, where H is 0-9 or A-F). Printable characters (33-126) other than = are left unchanged to maintain readability. Space (32) and TAB (9) may also be left unencoded, except at the end of a line, where they must be encoded. Encoded lines may be at most 76 characters long. Longer lines are "soft-broken" by ending them with = (an =CRLF sequence), which is removed during decoding, while hard line breaks in the original are preserved as-is. This keeps size overhead small for text-heavy content (an 8-bit octet such as 0xFF becomes =FF), but the scheme becomes inefficient for dense binary data.

Base64 encoding transforms arbitrary binary data into a 7-bit-safe form by treating the input as groups of 24 bits (three octets). Each group is divided into four 6-bit values, which are mapped to a 64-character alphabet: A-Z (0-25), a-z (26-51), 0-9 (52-61), + (62), and / (63). If the input length is not a multiple of three, the final group is padded with = characters: one = when the last group has two octets, and two = when it has one. Output lines are limited to 76 characters with CRLF breaks. The encoded data is about 33% larger than the input (a 4/3 ratio, plus line breaks) and cannot be read without decoding, but no information is lost.

Among these, base64 is preferred for binary data because of its robustness across 7-bit transports. Quoted-printable suits mostly textual content where the encoded form should remain human-readable. 7bit adds no overhead but limits content to ASCII, while 8bit and binary depend on enhanced transport support. Nonstandard values such as x-uuencode, which uses a different scheme, are vendor-specific extensions outside the MIME standard and are not recommended for new implementations.
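
The sketch below implements the base64 grouping and padding rules described above by hand and checks the result against Python's base64 module. The function base64_encode is hypothetical and written only for illustration. The sketch also shows quoted-printable output from the standard quopri module for the same sample text.

```python
import base64
import quopri

B64_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def base64_encode(data: bytes, line_length: int = 76) -> str:
    """Hypothetical base64 by hand: 3-octet groups -> four 6-bit indices -> characters."""
    groups = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pack up to 24 bits; a short final group is filled with zero bits.
        n = int.from_bytes(chunk.ljust(3, b"\0"), "big")
        chars = [B64_ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # Two octets -> three characters + "=", one octet -> two characters + "==".
        pad = 3 - len(chunk)
        if pad:
            chars[-pad:] = ["="] * pad
        groups.append("".join(chars))
    encoded = "".join(groups)
    # Break the output into lines of at most 76 characters separated by CRLF.
    return "\r\n".join(encoded[j:j + line_length]
                       for j in range(0, len(encoded), line_length))


sample = "café = 100%".encode("utf-8")
print(base64_encode(sample))          # unreadable, but compact for binary data
print(quopri.encodestring(sample))    # mostly readable: only é and = are escaped
```

The comparison makes the trade-off in the text concrete. Quoted-printable leaves the ASCII letters legible and escapes only the two UTF-8 octets of "é" and the equals sign. Base64 hides everything but is the safer choice for arbitrary binary data.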

Encoded-Word Syntax

The encoded-word syntax, defined in RFC 2047, provides a mechanism for embedding non-ASCII text within MIME header fields. It allows subjects and display names to be written in any language while remaining compatible with ASCII-based systems. An encoded-word is a single token of the form =?charset?encoding?encoded-text?=, where the opening =? and closing ?= delimiters mark the encoded content. Encoded-words are permitted in several places:

- unstructured header fields such as Subject;
- comments;
- phrases within structured fields, such as the display name in From or To.

An encoded-word has three components:

- the charset, which names the character set (e.g., UTF-8 or ISO-8859-1);
- the encoding, which is either "Q" for Q-encoding or "B" for base64;
- the encoded-text, which carries the transformed text.

In B-encoding, the text is first converted to bytes in the specified charset and then encoded with base64, as described in the Content-Transfer-Encoding section.

Q-encoding, in contrast, is a modified form of quoted-printable tailored for headers. Printable ASCII characters other than ?, _, and = may represent themselves, although within a phrase only letters, digits, and ! * + - / may appear unencoded. A space is written as an underscore (_). Every other octet, including non-printable ones and the characters ?, _, and =, is written as an equals sign followed by two hexadecimal digits giving the octet's value in the charset. Unlike the quoted-printable encoding used in message bodies, Q-encoding has no soft line breaks, and _ always stands for a space. For example, the string "a b" in ISO-8859-1 can be encoded as =?ISO-8859-1?Q?a_b?=, and "testé" (where é is octet 0xE9) as =?ISO-8859-1?Q?test=E9?=. Each encoded-word must not exceed 75 characters, including the charset name, encoding indicator, encoded-text, and delimiters, to ensure reliable transmission across diverse systems.

Longer texts are split into multiple consecutive encoded-words separated by linear whitespace (spaces, or a CRLF followed by spaces when the header is folded), and that whitespace is ignored during decoding. For instance, a long subject might use =?UTF-8?Q?Part_One?= =?UTF-8?Q?_Part_Two?=, which decodes to "Part One Part Two".

Decoding reverses the encoding process:

- For B-encoding, the text between ?B? and ?= is base64-decoded to bytes, and those bytes are then decoded using the specified charset. For example, =?utf-8?B?SGVsbG8g8J+Yig==?= decodes to "Hello 😊", because SGVsbG8g8J+Yig== base64-decodes to the UTF-8 bytes of that string.
- For Q-encoding, underscores become spaces, =XX sequences become the corresponding octets, and the result is interpreted in the specified charset.

Consecutive encoded-words separated only by linear whitespace are decoded individually, and their texts are concatenated directly. Some headers mix plain text and encoded-words or use several charsets. For these, MIME-aware libraries such as Python's email.header.decode_header() (often combined with email.header.make_header()) are recommended, because they parse, decode, and concatenate the components correctly.

Mail readers that do not implement RFC 2047 simply display encoded-words verbatim, which is possible because encoded-words consist only of ASCII characters. A conforming reader that does not support the named charset may also show the encoded-word as it appears, make a best-effort rendering, or substitute a notice that the text could not be displayed. This fallback keeps headers at least partly readable in legacy clients.
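
Below is a minimal Python sketch of the Q-encoding rules, assuming UTF-8 input. The function q_encode_word is a hypothetical helper that, for simplicity, leaves unencoded only the characters RFC 2047 allows inside phrases. Decoding is delegated to the standard library's email.header.decode_header and make_header.

```python
from email.header import decode_header, make_header

# Octets allowed unencoded in an encoded-word inside a phrase: letters, digits
# and ! * + - /. Applying this strictest set everywhere is a simplification.
PHRASE_SAFE = frozenset(b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                        b"abcdefghijklmnopqrstuvwxyz0123456789!*+-/")


def q_encode_word(text: str, charset: str = "utf-8") -> str:
    """Hypothetical helper: build one Q-encoded word, =?charset?Q?encoded-text?=."""
    out = []
    for octet in text.encode(charset):
        if octet == 0x20:
            out.append("_")               # space becomes an underscore
        elif octet in PHRASE_SAFE:
            out.append(chr(octet))        # safe ASCII stands for itself
        else:
            out.append(f"={octet:02X}")   # everything else becomes =HH
    word = f"=?{charset}?Q?{''.join(out)}?="
    if len(word) > 75:
        raise ValueError("encoded-word exceeds 75 characters; split the text")
    return word


def decode_words(value: str) -> str:
    """Decode a header value (plain text and/or encoded-words) with the standard library."""
    return str(make_header(decode_header(value)))


print(q_encode_word("café au lait"))
print(decode_words("=?UTF-8?Q?Part_One?= =?UTF-8?Q?_Part_Two?="))
print(decode_words("=?utf-8?B?SGVsbG8g8J+Yig==?="))
```

In the second call, the space between the two encoded-words disappears during decoding, while the underscore at the start of "_Part_Two" supplies the visible space.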

Multipart Messages

Boundary Mechanism

In multipart MIME messages, the boundary mechanism serves as a delimiter separating the individual body parts within the overall message body. The boundary is a string of 1 to 70 characters, given as the "boundary" parameter of the multipart entity's Content-Type header field. The limit ensures that a complete delimiter line, including the required hyphens and line terminator, does not exceed 76 characters. Boundary characters are restricted to US-ASCII letters, digits, and a few punctuation marks: ' ( ) + _ , - . / : = ? and space, which may not be the last character. Boundaries are therefore always 7-bit US-ASCII, regardless of the content-transfer-encoding used for the body parts. The string is typically generated randomly or algorithmically to minimize the risk of accidental matches with content.

Each body part is introduced by a delimiter line and continues until the next one. A delimiter line consists of two hyphens ("--") immediately followed by the boundary string and terminated by CRLF (carriage return, line feed). The part's own headers and content follow it. The multipart body concludes with a close-delimiter line, "--boundary--", also terminated by CRLF. Any text before the first delimiter (the preamble) or after the close delimiter (the epilogue) is ignored by MIME-aware readers. Delimiter lines must begin at the start of a line; trailing spaces or tabs after the boundary are permitted and ignored. The encapsulated body parts must not contain the delimiter, since that would create parsing ambiguities. Nested multipart entities are permitted for hierarchical structures, but each level must use a distinct boundary to avoid conflicts during parsing.

If a boundary inadvertently appears within a body part's content, the parser may end that part prematurely or misinterpret subsequent parts. Composing agents therefore choose boundaries that are highly unlikely to occur naturally, for example by combining timestamps, process IDs, or random characters. A common technique is to include the sequence "=_". That sequence can never appear in quoted-printable output, where an = must be followed by two hexadecimal digits or a line break, so such a boundary cannot collide with any quoted-printable-encoded part. A simple example of a two-part multipart/mixed body with a text part and an attachment might appear as follows:

--boundary42
Content-Type: text/plain

This is the plain text body of the message.
--boundary42
Content-Type: application/octet-stream
Content-Disposition: attachment; filename="example.txt"
Content-Transfer-Encoding: base64

SGVsbG8gd29ybGQh
--boundary42--

In this illustration, the boundary "boundary42" separates the text part from the binary attachment, whose base64 content SGVsbG8gd29ybGQh decodes to "Hello world!". The close delimiter --boundary42-- ends the multipart structure.
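
To make the boundary rules concrete, here is a simplified Python splitter applied to the example above, written with CRLF line endings and an added preamble. The functions split_multipart and make_boundary are hypothetical helpers, not library functions. The splitter compares whole delimiter lines after trimming trailing whitespace, which is stricter than the prefix comparison RFC 2046 describes. It then cross-checks its result against the standard-library parser.

```python
import re
import secrets
from email import message_from_string
from email.policy import default
from typing import Optional

# Characters RFC 2046 allows in a boundary, minus the space (allowed, but never
# as the last character); this sketch simply avoids spaces.
BOUNDARY_RE = re.compile(r"[0-9A-Za-z'()+_,\-./:=?]{1,70}")


def make_boundary() -> str:
    """Hypothetical helper: '=_' cannot occur in quoted-printable output."""
    return "=_" + secrets.token_hex(16)


def split_multipart(body: str, boundary: str) -> tuple[str, list[str], str]:
    """Split a CRLF-delimited multipart body into (preamble, parts, epilogue)."""
    if not BOUNDARY_RE.fullmatch(boundary):
        raise ValueError(f"invalid boundary: {boundary!r}")
    delimiter = "--" + boundary
    close = delimiter + "--"
    preamble: list[str] = []
    parts: list[str] = []
    epilogue: list[str] = []
    current: Optional[list[str]] = None
    closed = False
    for line in body.split("\r\n"):
        marker = line.rstrip(" \t")  # trailing whitespace after a delimiter is allowed
        if not closed and marker in (delimiter, close):
            if current is not None:
                # The CRLF before a delimiter belongs to the delimiter, not the part.
                parts.append("\r\n".join(current))
            closed = marker == close
            current = None if closed else []
        elif closed:
            epilogue.append(line)
        elif current is None:
            preamble.append(line)
        else:
            current.append(line)
    if current is not None:  # tolerate a missing close delimiter
        parts.append("\r\n".join(current))
    return "\r\n".join(preamble), parts, "\r\n".join(epilogue)


BODY = "\r\n".join([
    "Preamble text, ignored by MIME-aware readers.",
    "--boundary42",
    "Content-Type: text/plain",
    "",
    "This is the plain text body of the message.",
    "--boundary42",
    "Content-Type: application/octet-stream",
    'Content-Disposition: attachment; filename="example.txt"',
    "Content-Transfer-Encoding: base64",
    "",
    "SGVsbG8gd29ybGQh",
    "--boundary42--",
    "",
])

preamble, parts, epilogue = split_multipart(BODY, "boundary42")
print(len(parts), repr(parts[0]))

# Cross-check with the standard-library parser on the same body.
msg = message_from_string(
    "MIME-Version: 1.0\r\n"
    "Content-Type: multipart/mixed; boundary=boundary42\r\n\r\n" + BODY,
    policy=default,
)
attachment = msg.get_payload()[1]
print(attachment.get_filename(), attachment.get_content())
print(make_boundary())
```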

Subtype Variants

The multipart content type in MIME allows a message body to consist of multiple independent body parts, each with its own MIME headers and content, separated by boundaries. The subtype specifies the semantic relationship among these parts, enabling structured composition of content such as text, attachments, or alternative formats. RFC 2046 defines four subtypes (mixed, alternative, parallel, and digest), and later RFCs add others for specific uses such as compound documents and security. These subtypes support interoperability by telling receiving agents how to interpret and render the parts.

multipart/mixed is the foundational subtype, used when body parts are independent and intended to be presented sequentially in the order they appear. It supports bundling diverse content types, such as plain text with binary attachments like images or files, without implying any interdependency. It requires no parameters beyond the boundary, and receiving agents treat any unrecognized multipart subtype as multipart/mixed.

multipart/alternative provides multiple representations of the same content in different formats. The recipient's user agent selects the best one it can display, for example preferring HTML over plain text. The parts are ordered from least to most faithful to the original, so the last part is typically the richest (e.g., HTML). User agents should display only one alternative, normally the last one they are able to render, rather than presenting all of them. This subtype is widely used in email for messages sent in both plain-text and HTML form, improving compatibility across diverse clients.

multipart/parallel indicates that all body parts should be presented simultaneously, as they collectively form a single logical entity, such as synchronized audio and video. Unlike mixed, the order of parts carries no meaning for sequential display, and agents are expected to render them in parallel where possible. This subtype is rare in practice because synchronized presentation is difficult, but it remains part of the core MIME framework.

multipart/digest treats each body part as a complete, independent message, and its parts default to message/rfc822 rather than text/plain. It suits bundling multiple messages into a single transmission, such as a mailing-list digest. The parts are processed as separate entities, the subtype implies no rendering order beyond the sequence provided, and each message keeps its own headers.

Later extensions include multipart/related (RFC 2387), intended for compound documents whose parts reference each other, such as an HTML body part that links to embedded images through their Content-ID. A required "type" parameter gives the media type of the root part (e.g., text/html). An optional "start" parameter gives the root part's Content-ID; by default, the first part is the root. Together these enable cohesive rendering of interrelated content, which makes the subtype essential for HTML email with inline images.

For security, multipart/signed, defined in RFC 1847, carries exactly two parts: the signed MIME entity, unchanged, followed by the signature. This allows integrity and authenticity to be verified without altering the original content. It is used by S/MIME (currently RFC 8551), whose signature part contains a CMS object, and by OpenPGP/MIME. Similarly, multipart/encrypted, also defined in RFC 1847, pairs a control part describing the encryption protocol with the encrypted data, supporting confidentiality in MIME exchanges. These subtypes add cryptographic protection while remaining compatible with MIME.
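
Several of these subtypes can be assembled with Python's email.message.EmailMessage, which converts a message to multipart/alternative, multipart/related, or multipart/mixed as parts are added. This sketch assumes Python 3.6 or later. The HTML, the Content-ID, and the placeholder image bytes are invented for illustration. The "type" parameter of multipart/related is set explicitly rather than assuming the library adds it.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Multipart subtype demo"
msg.set_content("Hello in plain text.")                    # text/plain
msg.add_alternative(                                       # -> multipart/alternative
    '<p>Hello in <b>HTML</b>. <img src="cid:logo@example.com"></p>',
    subtype="html",
)

# Attach an inline image to the HTML part; add_related converts that part,
# in place, into a multipart/related container.
html_part = msg.get_body(preferencelist=("html",))
html_part.add_related(
    b"\x89PNG\r\n\x1a\n",                                  # placeholder bytes, not a real image
    maintype="image", subtype="png", cid="<logo@example.com>",
)
# RFC 2387 requires a "type" parameter naming the root part's media type.
html_part.set_param("type", "text/html")

msg.add_attachment(                                        # -> multipart/mixed
    b"Hello world!", maintype="application", subtype="octet-stream",
    filename="example.txt",
)

structure = [part.get_content_type() for part in msg.walk()]
print(structure)
best = msg.get_body(preferencelist=("html", "plain"))     # the part a client would display
print(best.get_content_type())
```

Calling get_body with a preference list mirrors how a client chooses among the parts of multipart/alternative. Parts marked as attachments are skipped, and the root of a multipart/related container is followed.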