Ch 16 -- MIME--Multipurpose Internet Mail Extensions
UNIX Unleashed, Internet Edition
by Robin Burk
HTML underlies the World Wide Web, but it is only one of a number of standard
data types whose definition makes the Web possible. In this chapter, we'll look at
the broader set of data formats used by Web and Internet programs to bridge the gaps
between diverse operating systems and hardware platforms.
The topics covered in this chapter include:
- How MIME became an Internet standard
- Common MIME data types
- Web pages, Web servers and MIME
TIP: Understanding what MIME formats are
and how they are approved can help you and your users solve day-to-day problems with
interpreting e-mail attachments or choosing browser plug-in software.
How MIME Became an Internet Standard
MIME (Multipurpose Internet Mail Extensions) is one of the Internet protocol standards
defined by the Internet Engineering Task Force (IETF). Once associated primarily
with electronic mail, MIME has evolved to become an important element supporting
multimedia applications on the Net. In order to understand MIME and how it operates,
it's helpful to step back and see how it got to where it is today.
How Internet Standards Are Adopted
The IETF is the official body that proposes and adopts communications protocols,
data formats, and similar conventions to be supported by the public Internet. For
instance, all of the familiar Internet communications protocols, such as TCP, IP,
PPP and SLIP, are formally defined by IETF documents called Requests for Comments
(RFCs). The IETF also defines the Simple Mail Transfer Protocol (SMTP), the Network
Time Protocol (NTP), and newer multimedia protocols such as the Resource reSerVation
Protocol (RSVP) and the Real-time Transport Protocol (RTP) that support interactive
conferencing over the Net.
Not all RFCs adopted by the IETF become Internet standards. Those that are proposed
for the standards track often begin as Internet Drafts submitted by one or more people
from industry or academia. Internet Drafts expire six months after publication
unless they are revised or advanced to RFC status.
Once advanced to RFC status, a proposed protocol is open for comment and can be
superseded by a revised version based on feedback from the technical community. Any
interested party can participate in the discussion, either online or at face-to-face
meetings. Each RFC is shepherded and debated within a specific Working Group of the
IETF. The Working Groups meet from time to time to hammer out the details of proposed
protocols.
Some RFCs are not intended for adoption as Internet standards. A few contain comments
or information about a given technical scenario or about the standards process itself.
Other informational RFCs do define protocols in detail, but are not proposed for
adoption as standards because they were developed by a single company that chooses
to retain control of their evolution. The RFCs that describe successive versions
of Sun's Network File System fall into this category. By publishing the definition
of the NFS protocol, Sun allows and encourages other vendors to support NFS in their
own operating systems. In this way NFS has become a de facto, but not official,
Internet standard.
Finally, some RFCs are designated as experimental (available for limited implementation
to evaluate their effectiveness) or historical (once in use, now effectively replaced
by an alternative protocol).
Official standards are not necessarily required to be adopted by all Internet
server or client systems. A standard may fall into any of several categories:
- Required--All systems must implement this protocol. The Internet Protocol (IP)
and associated Internet Control Message Protocol (ICMP) are among the required standard
protocols for all systems that are directly connected to the public Internet.
- Recommended--All systems should implement this protocol unless there is strong
reason to do otherwise. The Transmission Control Protocol (TCP), File Transfer Protocol
(FTP), and Telnet are some of the recommended standard protocols for the Internet.
- Elective--Any system that is going to implement something along these lines must
do so in accordance with the RFC. For example, MIME is an elective standard protocol
for the Internet.
- Limited Use--Use of these protocols is limited to special circumstances due
to the experimental, historic, or specialized nature of the protocol.
- Not Recommended--General use of these protocols is not recommended due to their
limited functionality, specialized nature, or experimental or historic state.
The InterNIC Web site contains links to online copies of the RFCs, Internet Drafts,
and other Internet-related information. Point your browser to http://www.internic.net/ds/
for the main site and to http://ds.internic.net/ds/dspg0intdoc.html
to search for specific topics in the RFC database.
NOTE: RFCs are never modified once they
have been submitted and adopted. Instead, new RFCs are created when it is proposed
that an existing protocol be modified. The IETF online index contains an entry for
each RFC, which, among other information, states which RFCs it supersedes and which
supersede it. You can follow this chain to view the evolution of a protocol over
time. In most cases, the authors of newer RFCs will explicitly state in their documents
why they're proposing changes to the older standards or drafts.
TIP: The RFC mechanism itself is used
to document all of the RFCs that have been officially adopted as Internet standards
at any given time. As of June 1997, the list of current standards was contained in
RFC 2200. You can look up the index entry for this RFC to determine if any new standards
have been adopted since that time.
This is an easy way to acquaint yourself with the current standards for the Internet
without retracing the historical development of the Internet protocol suite.
TIP: The definition of a protocol in a
Request for Comments can look pretty formidable at first reading. These documents
are intended to constrain and direct software implementers, and are often quite formal
and abstract in tone.
Most RFCs do have an introduction and rationale that are more accessible, because
their purpose is to gain the support of the Internet technical community as a whole.
Reading these initial pages of an RFC can help you understand the intent of a protocol
and how it fits into the overall Internet architecture.
It will also help you to read an RFC to know that the specific values of parameters,
codes, and identifiers for a protocol are maintained in a separate document. The
Internet Assigned Numbers Authority (IANA) coordinates the values assigned to parameters
throughout the Internet protocols. At the time of this writing, RFC 1700 defines
the assigned number codes. The latest Standards list will always identify the associated
Assigned Numbers RFC.
You can use the assigned numbers RFC, along with the message formats defined in
the protocol RFCs, to completely decode network messages captured by software and
hardware monitors. Usually the most common message formats and parameter values are
documented by your software vendor; when troubleshooting a network problem, however,
it may be necessary to identify and decode an uncommon message type. Knowing your
way around the RFCs gives you another tool to use in troubleshooting network operations.
History of MIME
As its name suggests, MIME originally was associated with electronic mail transmission
over the Internet.
The core standards for Internet e-mail are defined in RFC 821 "Simple Mail
Transfer Protocol" and RFC 822 "Standard for the Format of ARPA Internet
Text Messages". Together, these documents define a common format for e-mail
encoded as U.S. ASCII characters.
Within the original ARPANET, a single, text-oriented e-mail standard was practical
and appropriate. Over time, however, the ARPANET underwent several significant changes,
among them a transition from its original home in the Department of Defense to become
the public Internet, which in turn now supports the World Wide Web and attracts truly
global use.
As the scope of the public internetwork expanded, it became useful to define ways
for e-mail to be exchanged across the Net without requiring non-ASCII systems to
convert all message character sets. Non-U.S. ASCII e-mail traveling over the Internet
is analogous to letters written in French or Chinese being sent through the U.S.
Postal Service. All that is required is that the letter be enclosed within an envelope
that carries the standard addressing information in a form readable to the Postal
Service's employees and scanning machines.
In addition, users often wanted to attach files of various formats and origins
to their e-mail messages, much as the writer of a letter might include a newspaper
clipping, photograph, or check in the letter's envelope. Potential e-mail attachments
might be the output of standard applications such as word processors and spreadsheets,
or might consist of binary executable files, graphical images, or even data files
from custom applications.
MIME was intended to support both of these scenarios. At its most fundamental,
MIME encodes e-mail messages into standard formats beyond the ASCII text format defined
in the original ARPANET protocols.
By extending these formats to include multi-part messages, MIME allows e-mail
messages to have attached files in a variety of formats. Prior to the adoption of
the MIME protocols, users on diverse systems (and often on similar systems) could
not easily pass non-text information along with their e-mail.
The MIME protocol provides both a list of currently defined message types and
a mechanism for adding new formats over time. This means that MIME can evolve
to support new multimedia formats, application file types, languages, character sets,
and other data types as they become widespread or otherwise useful within the Internet's
technical environment. It is this breadth of scope, and its open-ended nature, that
places MIME in the category of "elective" rather than "recommended"
or "required" Internet protocols.
MIME data type definitions soon found uses beyond e-mail. When the founders of
the World Wide Web created a hypertext capability, they found it easy to use the
MIME framework to define a new hypertext data type to specify HTML scripts. And when
the language rules for HTML were written, the authors found it easy to allow graphics
to be embedded in Web pages because MIME had already centralized the definition of
graphical image formats.
Today there are MIME formats for audio, video, ZIPped, and vendor-specific data
types. MIME even provides a way to name a data type for which no official IANA recognition
has yet occurred. This allows software vendors to create optimized or specialized
formats that, if they achieve widespread adoption, are then likely to be added to
the official list. Developers of browser clients and browser plug-ins have made extensive
use of this capability. In this way, MIME plays a critical role in the rapid evolution
of both the World Wide Web and of the wider use of multimedia in computing. All this
from what started as humble extensions to ASCII e-mail messages!
The MIME Data Type Scheme
For many years, the core MIME documents were RFCs 1521 and 1522. In November 1996,
however, a new series of MIME standards was proposed in RFCs 2045 through 2049.
These documents reflect the great variety of data types that had evolved, especially
for multimedia applications, since the original MIME definitions were established.
RFC 2046 outlines the media types that are supported by MIME. More accurately,
this RFC outlines the categories into which such data types can be placed.
The first distinction to be made is between discrete media and composite media.
Discrete media contain a single entity or data object. An entity consists
of a MIME header and either the contents of a message or one of the parts of a multi-part
message. MIME treats discrete media as opaque objects that are passed on to the receiving
application without interpretation or other processing.
Composite media contain multiple entities, which can be of the same or different
types. Composite media require MIME processing to correctly handle the various entities
being transmitted together.
MIME defines top-level media types, which are used to specify the general type
of data, and subtypes, which typically specify a particular format for that type
of data. New top-level media types and lower-level subtypes may be added as needed.
The definition of a top-level media type includes the following:
- A name and description of the type, along with the criteria by which a particular
media format would be known to fall under this type
- Parameters associated with all formats (subtypes) of this type
- How a user agent or a gateway should handle otherwise unknown subtypes of this
type
- Other issues and considerations regarding the handling of entities of this type
- Restrictions on content-transfer-encodings for entities of this type to ensure
that the information being transmitted is not inadvertently distorted
There are five discrete top-level media types initially defined in the new MIME
scheme. These are:
- Text--Readable text, including those word processor formats whose content is
more or less readable when displayed on a screen or printer.
- Image--Static graphical images that require a graphical display (monitor), a
graphic printer, or a fax machine for the user to view the information. Common subtypes
are listed in Table 16.2.
- Audio--Information requiring a speaker, telephone, or similar device to allow
the user to hear the contents.
- Video--Information requiring the capability to display moving images, typically
with specialized hardware and software.
- Application--Other kinds of data, either binary files (typically stored into
a disk file for the user to manage) or information to be processed by an application
program. The association of an appropriate application with a specific application
data subtype is made at the client machine.
The two top-level composite media types are:
- Multipart--Data consisting of multiple entities of independent data types. Subtypes
include generic "mixed" entities; "alternative" formats of the
same data; "parallel" entities that are intended to be viewed simultaneously
(as with audio and video that go together); and "digest" for transmitting
multiple mail messages in a single message.
- Message--An encapsulated message. The "rfc822" subtype is used when
the encapsulated message is itself an ASCII mail message as defined by RFC 822. The
"partial" subtype allows large messages to be fragmented and later reassembled.
The "external-body" subtype passes a reference to a large, external data
source rather than the contents of the source.
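The way a composite message wraps several discrete entities can be sketched with Python's standard email package. This is only an illustration, not part of the MIME definition itself: the addresses, the file name, and the six attachment bytes are placeholders invented for the example.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "sender@example.com"       # placeholder address
msg["To"] = "recipient@example.com"      # placeholder address
msg["Subject"] = "Message with one attachment"

# The body starts out as a single discrete text/plain entity.
msg.set_content("The image is attached.")

# Adding an attachment turns the message into a multipart/mixed composite.
msg.add_attachment(
    b"GIF89a",                 # placeholder bytes, not a complete image
    maintype="image",
    subtype="gif",
    filename="example.gif",    # hypothetical file name
)

# walk() visits the composite container first, then each entity inside it.
part_types = [part.get_content_type() for part in msg.walk()]
print(part_types)
```

Each entity carries its own Content-Type header, which is what lets the receiving software hand the text part to a viewer and the image part to a graphics application.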
MIME types that are not recognized by IANA are given subtype names that start with "x-".
For instance, the MPEG layer-2 format for audio information, which is associated
with file extension .mp2, is mapped to the MIME type "audio/x-mpeg".
Officially recognized MIME types are generally supported by the relevant server and
client software, but private or experimental types may require explicit configuration
at both the Internet server and the client workstation in order to be processed correctly.
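The naming scheme described in this section can be sketched in a few lines of Python. The function name and the returned dictionary are hypothetical, chosen for this example; the two category sets simply restate the discrete and composite top-level types listed above. Parameters such as "charset" are not handled in this sketch.

```python
# Top-level types as listed in this section.
DISCRETE_TYPES = {"text", "image", "audio", "video", "application"}
COMPOSITE_TYPES = {"multipart", "message"}


def describe_media_type(media_type):
    """Split a 'type/subtype' name and classify it (hypothetical helper)."""
    top_level, _, subtype = media_type.strip().lower().partition("/")
    if not top_level or not subtype:
        raise ValueError(f"not a type/subtype pair: {media_type!r}")
    if top_level in DISCRETE_TYPES:
        category = "discrete"
    elif top_level in COMPOSITE_TYPES:
        category = "composite"
    else:
        category = "unknown"
    return {
        "type": top_level,
        "subtype": subtype,
        "category": category,
        # Names beginning with "x-" mark private or experimental types.
        "private": subtype.startswith("x-") or top_level.startswith("x-"),
    }


for name in ("text/html", "audio/x-wav", "multipart/mixed"):
    print(name, describe_media_type(name))
```

A client or server could use a check like the "private" flag to decide whether a type needs explicit configuration before it can be processed.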
Common MIME Data Types
Although the top-level MIME media types correspond to basic concepts that all
users would understand, not all subtypes fall under the obvious media category. Those
that are associated with specific application software, for instance, may be classified
as application types rather than text, image, or audio, despite being widely available
over the Internet. Often these data types require a browser plug-in before their
contents will be correctly processed when visiting a Web site, or the client browser
might ask you to specify which application is associated with that subtype or file
extension.
Because the official status of data types is changing rapidly, especially with
the rapid expansion of multimedia applications, I've grouped these descriptions by
the intuitive categories to which they belong rather than their official status.
Each data format description that follows includes the common format name and current
MIME name, the file extension(s) associated with the media, and a brief description.
Text Types
Table 16.1 lists the most common MIME text types.
Table 16.1. Text types commonly found on the Internet.
MIME Type | File Extensions | Common Format Name | Description
text/plain | .txt | Text | US ASCII text with no format tags
text/html | .html, .htm | HyperText Markup Language | Defines World Wide Web pages
application/rtf | .rtf | Rich Text Format | Vendor-independent word processing file type with some formatting capabilities
application/postscript | .ps, .ai, .eps | PostScript | Print and display format
application/pdf | .pdf | Portable Document Format | Adobe's PDF, used by Acrobat for platform-independent display and printing
Image Types
Table 16.2 lists the most common MIME image types.
Table 16.2. Image types commonly found on the Internet.
MIME Type | File Extensions | Common Format Name | Description
image/gif | .gif | Graphics Interchange Format | Common format for static images on the Web. 8-bit color and lossless compression; very good for drawings. The LZW compression it uses is patented by Unisys.
image/jpeg | .jpeg, .jpe, .jpg | Joint Photographic Experts Group (JPEG) | 24-bit color with lossy compression. Often used for photos and high-detail drawings on the Web.
image/png | .png | Portable Network Graphics | New format proposed by the IETF as a non-patented replacement for GIF and some uses of TIFF
image/tiff | .tiff | Tagged Image File Format | Developed by Aldus Corp. and adopted for experimental use in remote printing over the Internet
Audio Types
Table 16.3 lists the most common MIME audio types.
Table 16.3. Audio types commonly found on the Internet.
MIME Type | File Extensions | Common Format Name | Description
audio/basic | .au, .snd | µ-law | Low fidelity, very common on the Web. First introduced by Sun Microsystems and NeXT Computer.
audio/mpeg | .mp2 | Moving Picture Experts Group (MPEG) | MPEG-1 audio format with Layer II compression. Most systems have drivers for this format, which is also used by some recording and broadcasting companies.
audio/x-aiff | .aif, .aiff, .aifc | Audio Interchange File Format (AIFF) | Apple Macintosh and Silicon Graphics format for conversion between audio types.
audio/x-voc | .voc | Creative Voice | Used by Creative Labs' Sound Blaster and Sound Blaster Pro audio cards.
audio/x-wav | .wav | Resource Interchange File Format (RIFF) Waveform Audio Format | Adaptive Pulse Code Modulation (APCM) format native to Microsoft Windows environments.
audio/x-xdma | .xdm | RealAudio | Streaming audio format used for real-time audio transmission over the Internet.
audio/x-midi | .mid, .midi | Musical Instrument Digital Interface (MIDI) | Format used to describe how synthesizers and samplers should reproduce sounds; also used for electronic music composition.
Video Types
Table 16.4 lists the most common MIME video types.
Table 16.4. Video types commonly found on the Internet.
MIME Type | File Extensions | Common Format Name | Description
video/mpeg | .mpeg, .mpg, .mpe | Moving Picture Experts Group (MPEG) | Video portion of MPEG-2 standard; sometimes combined with MPEG-1 Layer II audio
video/quicktime | .mov, .moov, .qt | QuickTime | Proprietary to Apple Computer, combining data and resource forks that are processed in parallel
video/x-msvideo | .avi | Microsoft's Video for Windows | Native to the Windows environment; many translators to QuickTime exist.
application/x-vrml | .wrl | Virtual Reality Modeling Language | Non-proprietary format for 3-dimensional world models
Application Types
Table 16.5 lists the most common MIME application types.
Table 16.5. Application types commonly found on the Internet.
MIME Type | File Extensions | Common Format Name | Description
application/x-gzip | .gz | GNU zip | Freeware compression for the UNIX environment
application/x-compress | .Z | Compress | Another common UNIX compression utility
application/x-zip | .zip | ZIP | Multiplatform compression; widely used.
application/x-tar | .tar | Tape archive | Standard UNIX archive format
application/x-stuffit | .sit | Macintosh archive | Used for many image and video libraries
Note that there are many other application types that can be sent over the Internet
as file attachments to e-mail. Spreadsheet and word processor files are the most
common, along with the output of presentation software. E-mail clients, browsers,
and similar software that receives such formats will simply store the data in a disk
file unless configured to map the file extension or private MIME type to a specific
executable for processing.
One important subcategory of the application media type is the variety of compression
schemes applied to general files. (Note that many audio, image, and video formats
include standard compression/decompression that is automatically applied when the
data is processed.) Table 16.6 lists the most common compression types and their
public or private MIME names.
Table 16.6. Compression types commonly found on the
Internet.
Multipart and Message Types
These MIME formats are primarily used for e-mail messages with multiple parts
and are manipulated by e-mail server and client software.
Listing 16.1 shows a compound e-mail message, which includes the text of a message
received earlier, the sender's response, and an attached file. Each element of this
message has its own MIME format and is a separate entity within the compound message.
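Client software learns the format of each entity from its header fields. As a minimal sketch, Python's standard email package can read those fields back from a raw message. The message text below is a shortened stand-in that reuses the addresses and header values from Listing 16.1; it is not the full listing.

```python
import email

# A shortened stand-in for a single-entity message.
raw_message = (
    "From: rburk@digicon.com\n"
    "To: robink@wizard.net\n"
    "Subject: here's an original message with attachments\n"
    "Content-Type: text/plain; charset=US-ASCII\n"
    "Content-Transfer-Encoding: 7bit\n"
    "\n"
    "Attached are two files in different formats.\n"
)

msg = email.message_from_string(raw_message)

content_type = msg.get_content_type()             # type/subtype pair
charset = msg.get_content_charset()               # reported in lowercase
transfer_encoding = msg["Content-Transfer-Encoding"]
body = msg.get_payload()

print(content_type, charset, transfer_encoding)
print(body)
```

The same calls work on each part of a multipart message, so a mail client can apply them entity by entity.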
Listing 16.1. MIME supports compound e-mail messages.
X-POP3-Rcpt: robink@wizard.net
Return-Path: robink@wizard.net
From: robink@wizard.net
Date: Mon, 2 Jun 1997 10:29:20 -0500
Subject: example of forwarding a compound email message
To: rburk@digicon.com
Content-Description: cc:Mail note part
Here's my reply, which quotes the original message in full.
-----Original Message-----
From: rburk@digicon.com
Sent: Friday, 30 May 1997 11:40:00
To: robink@wizard.net
Subject: here's an original message with attachments
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Content-Description: cc:Mail note part
Attached are two files in different formats.
<<File: unx.ini>>S<<File: global95.dot>>u<<File: global97.dot>>b
Web Pages, Web Servers, and MIME
A Web server is software that runs on a system providing file, Internet,
and World Wide Web access to client workstations.
A variety of commercial and shareware Web servers are available. In most cases,
the operating system of choice for server systems is one or more flavors of UNIX.
The primary job of a Web server is to transmit the HTML scripts that make up a
World Wide Web page. The client's browser software then interprets the HTML script
and displays the Web page contents on the client system's monitor.
Along with the text whose presentation is specified by the HTML script, a Web
page may contain images or other multimedia content stored in separate files on the
server machine. The client browser will issue requests to the server each time it
finds a tag referring to such a file. The Web server software must find the file,
encode it appropriately (using standard MIME schemes) so that the integrity of the
transmitted information can be verified, and send the file off to the client machine.
At the client, the browser then decodes the information and displays it, plays it
over the speaker, or otherwise presents it as part of the Web page.
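The exchange described above can be sketched as a function that labels a file with its MIME type before sending it. This is a simplified illustration, not how any particular server is implemented: the function name and file names are hypothetical, the extension table copies a few rows from the tables in this chapter, and the fallback type application/octet-stream is an assumption added for the example.

```python
import os

# Extension-to-type pairs taken from the tables in this chapter.
CONTENT_TYPES = {
    ".html": "text/html",
    ".htm": "text/html",
    ".txt": "text/plain",
    ".gif": "image/gif",
    ".jpg": "image/jpeg",
}
# Assumed fallback for extensions the table does not cover.
DEFAULT_TYPE = "application/octet-stream"


def build_response(path, body):
    """Return the bytes of a minimal HTTP/1.0 response for one file."""
    extension = os.path.splitext(path)[1].lower()
    content_type = CONTENT_TYPES.get(extension, DEFAULT_TYPE)
    header = (
        "HTTP/1.0 200 OK\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return header.encode("ascii") + body


# The six bytes stand in for the contents of a real image file.
response = build_response("logo.gif", b"GIF89a")
print(response.split(b"\r\n\r\n")[0].decode("ascii"))
```

The browser reads the Content-Type line to decide whether to render the data itself, pass it to a plug-in, or save it to disk.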
Web pages may also make use of Common Gateway Interface calls. CGI provides a
way for HTML scripts to exchange information with other applications running on the
server system. These most commonly are database applications accessed by HTML forms;
however, the Web page may contain server-side HTML logic, which causes the
server itself to take different actions depending on what has come before. A Web
page form may ask the user to specify whether or not his browser can support frames,
for instance. If the user says it does not, the server will then present a non-framed
version of the Web page to the user at his workstation. The Web server software is
responsible for processing server-side HTML logic.
In each of these cases, MIME data types are at work. HTML itself is a MIME text
type, and the common image, audio, and video formats used for Web page multimedia content
are MIME types as well.
Even application and private data types must be encoded properly to protect against
transmission errors, and MIME defines appropriate encoding schemes for this purpose.
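Two of the encoding schemes MIME defines are base64, suited to binary data, and quoted-printable, suited to text that is mostly ASCII. The sketch below uses Python's standard base64 and quopri modules to show that both turn arbitrary bytes into plain ASCII and back again. The sample bytes are made up for the example.

```python
import base64
import quopri

# Binary data: a few arbitrary bytes, including values outside ASCII.
binary_data = bytes([0x47, 0x49, 0x46, 0x38, 0x39, 0x61, 0x00, 0xFF])

# base64 represents every three input bytes as four ASCII characters.
b64_encoded = base64.b64encode(binary_data)
b64_decoded = base64.b64decode(b64_encoded)

# Mostly-ASCII text with one 8-bit character (e-acute in Latin-1).
text_data = b"caf\xe9 menu"

# quoted-printable leaves ASCII readable and escapes other bytes.
qp_encoded = quopri.encodestring(text_data)
qp_decoded = quopri.decodestring(qp_encoded)

print(b64_encoded)
print(qp_encoded)
```

In a mail message, the Content-Transfer-Encoding header tells the receiver which of these schemes to reverse before interpreting the data.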
Many servers and browsers come pre-configured to recognize the standard MIME data
types. Some standard types, and all private types, must be defined to the server
and browser software before they can be correctly processed.
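Defining a type to the software amounts to adding a row to a lookup table that maps file extensions to MIME types. Python's standard mimetypes module keeps such a table, so it can serve as a small illustration. The file names are hypothetical, and the private type audio/x-voc comes from Table 16.3.

```python
import mimetypes

# A private registry, so the example does not alter global settings.
registry = mimetypes.MimeTypes()

# A standard type is already known.
standard_type, _ = registry.guess_type("index.html")

# Register a private type for the .voc extension, then look it up.
registry.add_type("audio/x-voc", ".voc")
private_type, _ = registry.guess_type("greeting.voc")

print(standard_type)
print(private_type)
```

Browsers and Web servers keep equivalent tables in their own configuration files or preference screens.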
To configure a MIME data type in the Netscape Navigator browser (version 3.01),
for instance, select Options, General Preferences, and Helpers
from the menu tree. Figure 16.1 shows how the Helpers screen allows you to create
associations between MIME types, file extensions, and the actions to be taken when
such a data object is received.
Figure 16.1
Adding MIME types to the client browser.
Each Web server has its own way of configuring MIME types. Typically, this is
done by means of a configuration file read when the server process is created. The
Apache Web server, included on the CD-ROM for this book, looks for its configuration
files in /usr/local/httpd/conf unless told that the configuration files
are located elsewhere. The basic server configuration file httpd.conf and
the server resource map srm.conf tell the server which MIME data types are
legal and how to process the various data contents. For more information, see the
Apache documentation on the CD-ROM or online at http://www.apache.org/docs/.
Conclusion
In this chapter we've taken a brief look at the data format standards that allow
diverse hardware and software platforms to exchange data across the Internet and
the Web. Understanding how the MIME standard was established, what data formats it
covers, and how it is used by Web pages and Web servers can help you correctly configure
e-mail and browser software for yourself and your system's users.
An extensible Internet standard, MIME is a fundamental enabling technology for
Internet e-mail, the World Wide Web, and most networked multimedia applications.
© Copyright, Macmillan Computer Publishing. All rights reserved.