Efficient Email Bounce Automation

  
 

RFC 1521 - Section 5.1 - Quoted-Printable Content-Transfer-Encoding

    



5.1. Quoted-Printable Content-Transfer-Encoding

The Quoted-Printable encoding is intended to represent data that 
largely consists of octets that correspond to printable characters
in the ASCII character set. It encodes the data in such a way that
the resulting octets are unlikely to be modified by mail transport. 
If the data being encoded are mostly ASCII text, the encoded form of
the data remains largely recognizable by humans. A body which is 
entirely ASCII may also be encoded in Quoted-Printable to ensure the
integrity of the data should the message pass through a 
character-translating and/or line-wrapping gateway. 

In this encoding, octets are to be represented as determined by the
following rules: 


 Rule #1: (General 8-bit representation) Any octet, except those
 indicating a line break according to the newline convention of the
 canonical (standard) form of the data being encoded, may be
 represented by an "=" followed by a two digit hexadecimal
 representation of the octet's value.  The digits of the
 hexadecimal alphabet, for this purpose, are "0123456789ABCDEF".
 Uppercase letters must be used when sending hexadecimal data,
 though a robust implementation may choose to recognize lowercase
 letters on receipt.  Thus, for example, the value 12 (ASCII form
 feed) can be represented by "=0C", and the value 61 (ASCII EQUAL
 SIGN) can be represented by "=3D".  Except when the following
 rules allow an alternative encoding, this rule is mandatory.
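
 Rule #1 can be sketched in a few lines of Python. The function names
 encode_octet and decode_octet are illustrative only and do not come
 from the RFC; the decoder accepts lowercase digits, as the rule
 permits on receipt.

```python
import string

HEX_DIGITS = "0123456789ABCDEF"  # the alphabet named in Rule #1


def encode_octet(value: int) -> str:
    """Return "=" followed by two uppercase hexadecimal digits."""
    if not 0 <= value <= 255:
        raise ValueError("an octet must be in the range 0..255")
    return "=" + HEX_DIGITS[value >> 4] + HEX_DIGITS[value & 0x0F]


def decode_octet(token: str) -> int:
    """Decode an "=XX" token, tolerating lowercase hex digits."""
    if (len(token) != 3 or token[0] != "="
            or any(c not in string.hexdigits for c in token[1:])):
        raise ValueError("expected '=' followed by two hex digits")
    return int(token[1:], 16)
```

 With these definitions, encode_octet(12) gives "=0C" and
 encode_octet(61) gives "=3D", matching the examples in the rule.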

 Rule #2: (Literal representation) Octets with decimal values of 33
 through 60 inclusive, and 62 through 126, inclusive, MAY be
 represented as the ASCII characters which correspond to those
 octets (EXCLAMATION POINT through LESS THAN, and GREATER THAN
 through TILDE, respectively).

 Rule #3: (White Space): Octets with values of 9 and 32 MAY be
 represented as ASCII TAB (HT) and SPACE characters, respectively,
 but MUST NOT be so represented at the end of an encoded line. Any
 TAB (HT) or SPACE characters on an encoded line MUST thus be
 followed on that line by a printable character.  In particular, an
 "=" at the end of an encoded line, indicating a soft line break
 (see rule #5) may follow one or more TAB (HT) or SPACE characters.
 It follows that an octet with value 9 or 32 appearing at the end
 of an encoded line must be represented according to Rule #1.  This
 rule is necessary because some MTAs (Message Transport Agents,
 programs which transport messages from one user to another, or
 perform a part of such transfers) are known to pad lines of text
 with SPACEs, and others are known to remove "white space"
 characters from the end of a line.  Therefore, when decoding a
 Quoted-Printable body, any trailing white space on a line must be
 deleted, as it will necessarily have been added by intermediate
 transport agents.
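
 As a rough illustration of Rules #1 through #3 working together, the
 following Python sketch encodes a single line that contains no line
 break octets and does not yet apply the 76 character limit. The names
 encode_line and strip_transport_padding are hypothetical.

```python
def encode_line(data: bytes) -> str:
    """Encode one line (no CR or LF octets) per Rules #1-#3, unwrapped."""
    out = []
    last = len(data) - 1
    for i, octet in enumerate(data):
        if 33 <= octet <= 60 or 62 <= octet <= 126:
            out.append(chr(octet))        # Rule #2: literal representation
        elif octet in (9, 32) and i != last:
            out.append(chr(octet))        # Rule #3: interior TAB or SPACE
        else:
            out.append("=%02X" % octet)   # Rule #1: "=" plus two hex digits
    return "".join(out)


def strip_transport_padding(encoded_line: str) -> str:
    """Decoder side: drop trailing SPACE/TAB added by transport agents."""
    return encoded_line.rstrip(" \t")
```

 Only the final octet of the line needs the Rule #1 form when it is a
 TAB or SPACE; any earlier white space is then followed on the line by
 a printable character.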

 Rule #4 (Line Breaks): A line break in a text body, independent of
 what its representation is following the canonical representation
 of the data being encoded, must be represented by a (RFC 822) line
 break, which is a CRLF sequence, in the Quoted-Printable encoding.
 Since the canonical representation of types other than text does not
 generally include the representation of line breaks, no hard line
 breaks (i.e.  line breaks that are intended to be meaningful and
 to be displayed to the user) should occur in the quoted-printable
 encoding of such types.  Of course, occurrences of "=0D", "=0A",
 "=0A=0D" and "=0D=0A" will eventually be encountered.  In general,
 however, base64 is preferred over quoted-printable for binary
 data.

 Note that many implementations may elect to encode the local
 representation of various content types directly, as described in
 Appendix G.  In particular, this may apply to plain text material
 on systems that use newline conventions other than CRLF
 delimiters. Such an implementation is permissible, but the
 generation of line breaks must be generalized to account for the
 case where alternate representations of newline sequences are
 used.
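
 For text material, one way to handle local newline conventions is to
 rewrite every line break as CRLF before or while encoding. The sketch
 below assumes the local convention is one of bare LF, bare CR, or
 CRLF; the function name canonicalize_newlines is illustrative, and
 the approach is not suitable for binary data.

```python
import re

# CRLF is listed first so that a CR followed by LF counts as one break.
_ANY_NEWLINE = re.compile(r"\r\n|\r|\n")


def canonicalize_newlines(text: str) -> str:
    """Rewrite bare LF, bare CR, and CRLF line breaks as CRLF."""
    return _ANY_NEWLINE.sub("\r\n", text)
```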

 Rule #5 (Soft Line Breaks): The Quoted-Printable encoding REQUIRES
 that encoded lines be no more than 76 characters long. If longer
 lines are to be encoded with the Quoted-Printable encoding, 'soft'
 line breaks must be used. An equal sign as the last character on an
 encoded line indicates such a non-significant ('soft') line break
 in the encoded text. Thus if the "raw" form of the line is a
 single unencoded line that says:

      Now's the time for all folk to come to the aid of their country.

 This can be represented, in the Quoted-Printable encoding, as
      Now's the time =
      for all folk to come=
       to the aid of their country.

 This provides a mechanism with which long lines are encoded in
 such a way as to be restored by the user agent.  The 76 character
 limit does not count the trailing CRLF, but counts all other
 characters, including any equal signs.
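
 A minimal Python sketch of soft line breaking follows. It assumes the
 input is one already-encoded line in which "=" appears only as the
 start of an "=XX" sequence; the names soft_wrap and join_soft_breaks
 are hypothetical. The splitter avoids cutting an "=XX" sequence in
 two, which the rule does not spell out but a decoder would need.

```python
MAX_LINE = 76  # limit from Rule #5, not counting the trailing CRLF


def soft_wrap(encoded: str, limit: int = MAX_LINE) -> list:
    """Split one encoded line into lines of at most `limit` characters."""
    lines = []
    while len(encoded) > limit:
        cut = limit - 1                   # leave room for the trailing "="
        if encoded[cut - 1] == "=":       # "=XX" would start on the last column
            cut -= 1
        elif encoded[cut - 2] == "=":     # "=X" would be left without its 2nd digit
            cut -= 2
        lines.append(encoded[:cut] + "=")
        encoded = encoded[cut:]
    lines.append(encoded)
    return lines


def join_soft_breaks(lines) -> str:
    """Undo soft line breaks: drop each trailing "=" and concatenate."""
    return "".join(l[:-1] if l.endswith("=") else l for l in lines)
```

 Applying join_soft_breaks to the three encoded lines of the example
 above restores the single raw line.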

Since the hyphen character ("-") is represented as itself in the 
Quoted-Printable encoding, care must be taken, when encapsulating a
quoted-printable encoded body in a multipart entity, to ensure that
the encapsulation boundary does not appear anywhere in the encoded
body. (A good strategy is to choose a boundary that includes a 
character sequence such as "=_" which can never appear in a quoted-printable 
body. See the definition of multipart messages later in this document.) 
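
The boundary strategy just described might be sketched as follows in
Python. The function names and the choice of 32 random hexadecimal
digits are assumptions made for illustration; because the boundary
contains "=", it would need to be quoted in the Content-Type header.

```python
import secrets


def make_boundary() -> str:
    """Return a boundary containing "=_", which cannot occur in a
    quoted-printable body ("=" is followed there only by two
    hexadecimal digits or by the end of the line)."""
    return "=_" + secrets.token_hex(16)   # 2 + 32 = 34 characters


def boundary_collides(boundary: str, encoded_body: str) -> bool:
    """Report whether the encapsulation boundary appears in the body."""
    return ("--" + boundary) in encoded_body
```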


 NOTE: The quoted-printable encoding represents something of a
 compromise between readability and reliability in transport.
 Bodies encoded with the quoted-printable encoding will work
 reliably over most mail gateways, but may not work perfectly over
 a few gateways, notably those involving translation into EBCDIC.
 (In theory, an EBCDIC gateway could decode a quoted-printable body
 and re-encode it using base64, but such gateways do not yet
 exist.)  A higher level of confidence is offered by the base64
 Content-Transfer-Encoding.  A way to get reasonably reliable
 transport through EBCDIC gateways is to also quote the ASCII
 characters

        !"#$@[\]^`{|}~

 according to rule #1.  See Appendix B for more information.
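
 A small Python sketch of this extra quoting step is shown below. It
 operates on an already-encoded line, which is safe because none of
 the listed characters can occur inside an "=XX" sequence. The name
 quote_for_ebcdic is hypothetical, and since the step lengthens the
 line it would have to run before any soft line breaks are inserted.

```python
# The characters listed in the NOTE above.
EBCDIC_UNSAFE = '!"#$@[\\]^`{|}~'


def quote_for_ebcdic(encoded_line: str) -> str:
    """Replace each listed character by its Rule #1 "=XX" form."""
    return "".join(
        "=%02X" % ord(c) if c in EBCDIC_UNSAFE else c
        for c in encoded_line
    )
```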

Because quoted-printable data is generally assumed to be line-oriented,
it is to be expected that the representation of the breaks between the
lines of quoted printable data may be altered in transport, in the same
manner that plain text mail has always been altered in Internet mail 
when passing between systems with differing newline conventions. If such
alterations are likely to constitute a corruption of the data, it is 
probably more sensible to use the base64 encoding rather than the 
quoted-printable encoding. 

WARNING TO IMPLEMENTORS: If binary data are encoded in quoted-printable,
care must be taken to encode CR and LF characters as "=0D" and "=0A", 
respectively. In particular, a CRLF sequence in binary data should be
encoded as "=0D=0A". Otherwise, if CRLF were represented as a hard line
break, it might be incorrectly decoded on platforms with different line
break conventions. 
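
The warning can be illustrated with a short Python sketch for binary
data. The function name encode_binary_octets is hypothetical. For
simplicity the sketch also writes every TAB and SPACE in the "=XX"
form, which Rule #1 always permits, and it leaves the insertion of
soft line breaks to a later step.

```python
def encode_binary_octets(data: bytes) -> str:
    """Encode binary data so that CR and LF never become hard line breaks."""
    return "".join(
        chr(octet) if (33 <= octet <= 60 or 62 <= octet <= 126)
        else "=%02X" % octet              # includes CR (=0D) and LF (=0A)
        for octet in data
    )
```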

For formalists, the syntax of quoted-printable data is described by the
following grammar: 


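 quoted-printable := ([*(ptext / SPACE / TAB) ptext] ["="] CRLF)
      ; Maximum line length of 76 characters excluding CRLF

 ptext := octet / <any ASCII character except "=", SPACE, or TAB>
      ; characters not listed as "mail-safe" in Appendix B
      ; are also not recommended.

 octet := "=" 2(DIGIT / "A" / "B" / "C" / "D" / "E" / "F")
      ; octet must be used for characters > 127, =, SPACE, or TAB,
      ; and is recommended for any characters not listed in
      ; Appendix B as "mail-safe".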
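
 A line-level check based on the grammar above could look like the
 following Python sketch. Two assumptions are made and are not part of
 the RFC text: literal characters are restricted to the printable
 range 33 through 126 (excluding "="), and a soft-break "=" is allowed
 to follow SPACE or TAB as Rule #3 describes, which a literal reading
 of the grammar appears not to allow. The name is_valid_qp_line is
 hypothetical, and the line is passed without its CRLF.

```python
import re

# An "=XX" sequence with uppercase hex digits, or a printable
# ASCII character other than "=" (33..60 and 62..126).
_PTEXT = r"(?:=[0-9A-F]{2}|[!-<>-~])"

_LINE_RE = re.compile(
    r"(?:(?:%s|[ \t])*%s)?"   # optional content that ends in ptext
    r"(?:[ \t]*=)?"           # optional soft line break
    % (_PTEXT, _PTEXT)
)


def is_valid_qp_line(line: str) -> bool:
    """Check one encoded line (CRLF removed) against the sketch grammar."""
    return len(line) <= 76 and _LINE_RE.fullmatch(line) is not None
```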








RFC 1521 - Appendix B - General Guidelines For Sending Email Data

Internet email is not a perfect, homogeneous system. Mail may become
corrupted at several stages in its travel to a final destination. 
Specifically, email sent throughout the Internet may travel across many
networking technologies. Many networking and mail technologies do not 
support the full functionality possible in the SMTP transport environment. 
Mail traversing these systems is likely to be modified in such a way 
that it can be transported. 

There exist many widely-deployed non-conformant MTAs in the Internet. These
MTAs, speaking the SMTP protocol, alter messages on the fly to take
advantage of the internal data structure of the hosts they are implemented
on, or are just plain broken. 

The following guidelines may be useful to anyone devising a data format 
(Content-Type) that will survive the widest range of networking technologies
and known broken MTAs unscathed. Note that anything encoded in the base64
encoding will satisfy these rules, but that some well-known mechanisms,
notably the UNIX uuencode facility, will not. Note also that anything encoded
in the Quoted-Printable encoding will survive most gateways intact, but
possibly not some gateways to systems that use the EBCDIC character set. 


Under some circumstances the encoding used for data may change as part of
normal gateway or user agent operation. In particular, conversion from 
base64 to quoted-printable and vice versa may be necessary. This may result
in the confusion of CRLF sequences with line breaks in text bodies. As such,
the persistence of CRLF as something other than a line break must not be
relied on. 

Many systems may elect to represent and store text data using local newline 
conventions. Local newline conventions may not match the RFC822 CRLF 
convention -- systems are known that use plain CR, plain LF, CRLF, or counted
records. The result is that isolated CR and LF characters are not well tolerated
in general; they may be lost or converted to delimiters on some systems, and
hence must not be relied on. 

TAB (HT) characters may be misinterpreted or may be automatically converted
to variable numbers of spaces. This is unavoidable in some environments, notably
those not based on the ASCII character set. Such conversion is STRONGLY
DISCOURAGED, but it may occur, and mail formats must not rely on the persistence
of TAB (HT) characters. 

Lines longer than 76 characters may be wrapped or truncated in some environments.
Line wrapping and line truncation are STRONGLY DISCOURAGED, but unavoidable in
some cases. Applications which require long lines must somehow differentiate
between soft and hard line breaks. (A simple way to do this is to use the 
quoted-printable encoding.)

Trailing "white space" characters (SPACE, TAB (HT)) on a line may be discarded 
by some transport agents, while other transport agents may pad lines with these
characters so that all lines in a mail file are of equal length. The persistence
of trailing white space, therefore, must not be relied on. 

Many mail domains use variations on the ASCII character set, or use character
sets such as EBCDIC which contain most but not all of the US-ASCII characters. 
The correct translation of characters not in the "invariant" set cannot be depended
on across character converting gateways. For example, this situation is a problem
when sending uuencoded information across BITNET, an EBCDIC system. Similar problems 
can occur without crossing a gateway, since many Internet hosts use character sets
other than ASCII internally. The definition of Printable Strings in X.400 adds
further restrictions in certain special cases. In particular, the only characters
that are known to be consistent across all gateways are the 73 characters that
correspond to the upper and lower case letters A-Z and a-z, the 10 digits 0-9, and
the following eleven special characters: 

                        "'"  (ASCII code 39)
                        "("  (ASCII code 40)
                        ")"  (ASCII code 41)
                        "+"  (ASCII code 43)
                        ","  (ASCII code 44)
                        "-"  (ASCII code 45)
                        "."  (ASCII code 46)
                        "/"  (ASCII code 47)
                        ":"  (ASCII code 58)
                        "="  (ASCII code 61)
                        "?"  (ASCII code 63)

A maximally portable mail representation, such as the base64 encoding, will confine 
itself to relatively short lines of text in which the only meaningful characters are 
taken from this set of 73 characters. 
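
The invariant set can be written down directly. The Python sketch
below builds it from the description above and reports which
characters of a string fall outside it; the names MAIL_SAFE and
non_invariant_chars are illustrative only.

```python
import string

# 52 letters + 10 digits + the eleven special characters listed above.
MAIL_SAFE = frozenset(string.ascii_letters + string.digits + "'()+,-./:=?")


def non_invariant_chars(text: str) -> list:
    """Return the distinct characters of `text` outside the invariant set."""
    return sorted(set(text) - MAIL_SAFE)
```

Every character of the base64 alphabet (A-Z, a-z, 0-9, "+", "/", and
the "=" pad) belongs to this set, which is consistent with the remark
about base64 above.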


Some mail transport agents will corrupt data that includes certain literal strings. In 
particular, a period (".") alone on a line is known to be corrupted by some (incorrect) 
SMTP implementations, and a line that starts with the five characters "From " (the fifth 
character is a SPACE) is commonly corrupted as well. A careful composition agent can 
prevent these corruptions by encoding the data (e.g., in the quoted-printable encoding, 
"=46rom " in place of "From " at the start of a line, and "=2E" in place of "." alone 
on a line). 
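
A composition agent's protective step might be sketched in Python as
follows. It assumes the input is one quoted-printable encoded line
without its CRLF; the name protect_line is hypothetical. Since the
replacement lengthens the line by two characters, it would need to be
applied before the 76 character limit is enforced.

```python
def protect_line(encoded_line: str) -> str:
    """Encode a leading "From " or a lone "." so that transport
    agents are less likely to corrupt the line."""
    if encoded_line.startswith("From "):
        return "=46" + encoded_line[1:]   # "F" is octet 70, hex 46
    if encoded_line == ".":
        return "=2E"                      # "." is octet 46, hex 2E
    return encoded_line
```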

Please note that the above list is NOT a list of recommended practices for MTAs. RFC 821 
MTAs are prohibited from altering the character of white space or wrapping long lines. 
These BAD and illegal practices are known to occur on established networks, and 
implementations should be robust in dealing with the bad effects they can cause. 




