Component bdlde_quotedprintabledecoder
[Package bdlde]

Provide automata converting to and from Quoted-Printable encodings.

Namespaces

namespace  bdlde

Detailed Description

Outline
Purpose:
Provide automata converting to and from Quoted-Printable encodings.
Classes:
bdlde::QuotedPrintableDecoder automata for Quoted-Printable decoding
See also:
Component bdlde_quotedprintableencoder
Description:
This component provides a template class (parameterized separately on both input and output iterators) that can be used to decode byte sequences of arbitrary length from the Quoted-Printable representation described in Section 6.7, "Quoted-Printable Content-Transfer-Encoding", of RFC 2045, "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies".
Each instance of the decoder retains the state of the conversion from one supplied input to the next, enabling the processing of segmented input -- i.e., processing resumes where it left off with the next invocation on new input. Instance methods are provided for the decoder to (1) assert the end of input, (2) determine whether the input so far is currently acceptable, and (3) indicate whether a non-recoverable error has occurred.
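As an illustration of how state can be carried from one segment of input to the next, the following is a minimal sketch of a segmented decoder. It is not the implementation of bdlde::QuotedPrintableDecoder: the class name SegmentedDecoder, its simplified convert signature, and its restriction to literal characters and =XX triplets are all assumptions made for this example.

```cpp
#include <string>

// Hypothetical illustration only; NOT the bdlde::QuotedPrintableDecoder
// implementation.  Handles just literal characters and '=XX' triplets.
class SegmentedDecoder {
    enum State { e_INPUT, e_SAW_EQUALS, e_SAW_HEX1, e_ERROR, e_DONE };

    State d_state;  // where the previous call to 'convert' left off
    int   d_high;   // value of the first hex digit of a pending triplet

    // Return the value of the uppercase hex digit 'c', or -1 if not one.
    static int hexValue(char c)
    {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

  public:
    SegmentedDecoder() : d_state(e_INPUT), d_high(0) {}

    // Decode one segment, appending to '*out'.  Return 0 on success and
    // -1 on error.  A triplet may be split across successive calls.
    int convert(std::string *out, const std::string& segment)
    {
        if (d_state == e_ERROR || d_state == e_DONE) {
            d_state = e_ERROR;
            return -1;
        }
        for (char c : segment) {
            switch (d_state) {
              case e_INPUT:
                if (c == '=') d_state = e_SAW_EQUALS;
                else          out->push_back(c);
                break;
              case e_SAW_EQUALS:
                d_high  = hexValue(c);
                d_state = d_high < 0 ? e_ERROR : e_SAW_HEX1;
                break;
              case e_SAW_HEX1: {
                const int low = hexValue(c);
                if (low < 0) {
                    d_state = e_ERROR;
                }
                else {
                    out->push_back(static_cast<char>(d_high * 16 + low));
                    d_state = e_INPUT;
                }
              } break;
              default:
                break;
            }
            if (d_state == e_ERROR) return -1;
        }
        return 0;
    }

    // Assert the end of input; fail if a triplet is still incomplete.
    int endConvert()
    {
        d_state = (d_state == e_INPUT) ? e_DONE : e_ERROR;
        return d_state == e_DONE ? 0 : -1;
    }

    // (2) Is the input so far acceptable as a complete encoding?
    bool isAccepting() const { return d_state == e_INPUT || d_state == e_DONE; }

    // (3) Has a non-recoverable error occurred?
    bool isError() const { return d_state == e_ERROR; }
};
```

Feeding "A=3" and then "DB" to one instance of this sketch yields "A=B": after the first call the object is not in an accepting state, and the second call completes the pending triplet.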
Quoted-Printable Decoding:
(In the following, all rules mentioned refer to those listed in the documentation of the encoder; see Component bdlde_quotedprintableencoder.)
The decoding process for this encoding scheme involves:
  1. transforming any encoded character triplets back into their original representation (rule #1 and rule #4).
  2. literally writing out characters that have not been changed (rule #2).
  3. deleting any trailing whitespace at the end of an encoded line (rule #3).
  4. removing the soft line breaks, including the = prefix (i.e., concatenating the broken lines) (rule #5).
The standard imposes a maximum line length of 76 characters, exclusive of CRLF; however, the decoder implemented in this component will handle lines of arbitrary length.
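The four decoding steps above can be sketched as a single pass over a complete buffer. This is a simplified illustration under stated assumptions, not the component's implementation: the function names hexValue and decodeQuotedPrintable are hypothetical, only uppercase hexadecimal digits are accepted, a malformed = is simply copied through, and whitespace at the very end of the input (with no line break after it) is kept.

```cpp
#include <string>

// Hypothetical helper (not part of bdlde): return the value of the
// uppercase hex digit 'c', or -1 if 'c' is not one.
static int hexValue(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: decode a complete Quoted-Printable buffer.
std::string decodeQuotedPrintable(const std::string& in)
{
    std::string out;
    std::string pendingWs;  // whitespace that may prove to be trailing
    std::string::size_type i = 0;

    while (i < in.size()) {
        const char c = in[i];
        if (c == ' ' || c == '\t') {
            pendingWs += c;  // defer: decide when the line ends
            ++i;
        }
        else if (c == '\r' && i + 1 < in.size() && in[i + 1] == '\n') {
            pendingWs.clear();  // step 3: drop trailing whitespace
            out += "\r\n";      // hard line break is kept
            i += 2;
        }
        else if (c == '=') {
            out += pendingWs;
            pendingWs.clear();
            // Skip transport padding between '=' and a possible CRLF.
            std::string::size_type j = i + 1;
            while (j < in.size() && (in[j] == ' ' || in[j] == '\t')) ++j;
            if (j + 1 < in.size() && in[j] == '\r' && in[j + 1] == '\n') {
                i = j + 2;  // step 4: remove the soft line break
            }
            else if (i + 2 < in.size() && hexValue(in[i + 1]) >= 0
                                       && hexValue(in[i + 2]) >= 0) {
                // step 1: '=XX' back to the original byte
                out += static_cast<char>(hexValue(in[i + 1]) * 16
                                       + hexValue(in[i + 2]));
                i += 3;
            }
            else {
                out += c;  // malformed escape: copied through (assumption)
                ++i;
            }
        }
        else {
            out += pendingWs;
            pendingWs.clear();
            out += c;  // step 2: literal character
            ++i;
        }
    }
    out += pendingWs;  // no line break followed, so it is kept (assumption)
    return out;
}
```

For example, "foo=\r\nbar" decodes to "foobar" and "abc  \r\ndef" decodes to "abc\r\ndef" with this sketch.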
The decoder also supports two error-reporting modes, strict and relaxed, selected at construction. A strict-mode decoder stops decoding at the first offending character encountered, while a relaxed-mode decoder continues decoding to the end of the input, passing character sequences that cannot be interpreted straight through to the output.
The following kinds of errors can be encountered during decoding, listed in decreasing order of precedence:
  E1. BAD_DATA
An = character is not followed by either two uppercase hexadecimal digits, or a soft line break -- e.g.,
   '=4=' (only one hexadecimal digit)
   '=K3' (K3 is not a hexadecimal number)
   '=1f' (lower case f is a literally encoded character)
Note that:
  1. In the relaxed error-reporting mode of this implementation, lowercase hexadecimal digits are treated as valid numerals.
  2. E1 can be caused by a missing or corrupted numeric, a corrupted character disguised as an =, or an accidental insertion of an = that does not belong.
  3. The case where a seemingly valid character is found in place of a missing numeric cannot be detected, e.g., =4F where F is actually a literally encoded character.
  4. An erroneous occurrence of a = character preceding 2 seemingly valid hexadecimal numerics is also undetectable, e.g., =4F where = was actually a t corrupted during transmission.
  E2. BAD_LINEBREAK
A \r is not followed by a \n. In the relaxed mode, each stand-alone \r or \n will be copied straight through to the output. For soft line breaks, whitespace between the = character and the CRLF is ignored, because it is treated as transport padding and removed.
  E3. BAD_LINELENGTH
An encoded line exceeds the specified maximum line length because soft line breaks are missing. (Because this implementation accepts lines of any length, this error is neither detected nor reported.)
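The E1 and E2 conditions described above can be sketched as standalone checks. The helper names isHexDigit, isValidTriplet, and findBareCR are hypothetical and are not part of the component; the sketch only illustrates which inputs each mode would consider well formed, and it assumes a complete buffer rather than segmented input.

```cpp
#include <string>

// Hypothetical helper: is 'c' accepted as a hexadecimal digit?  Lowercase
// digits are accepted only when 'relaxed' is true (see note 1 under E1).
bool isHexDigit(char c, bool relaxed)
{
    if ((c >= '0' && c <= '9') || (c >= 'A' && c <= 'F')) return true;
    return relaxed && c >= 'a' && c <= 'f';
}

// Hypothetical helper: does 's' start with an acceptable '=XX' triplet?
// A 'false' result corresponds to E1 (BAD_DATA) for a non-line-break '='.
bool isValidTriplet(const std::string& s, bool relaxed)
{
    return s.size() >= 3
        && s[0] == '='
        && isHexDigit(s[1], relaxed)
        && isHexDigit(s[2], relaxed);
}

// Hypothetical helper: return the index of the first '\r' that is not
// immediately followed by '\n' (the E2 condition), or 'npos' if none.
std::string::size_type findBareCR(const std::string& in)
{
    for (std::string::size_type i = 0; i < in.size(); ++i) {
        if (in[i] == '\r' && (i + 1 == in.size() || in[i + 1] != '\n')) {
            return i;
        }
    }
    return std::string::npos;
}
```

With these helpers, "=4=" and "=K3" are rejected in both modes, while "=1f" is rejected in strict mode and accepted in relaxed mode.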
In relaxed mode, errors of types E1 and E2 are copied straight to the output and errors of type E3 are ignored. Decoded lines will be broken even when a bare CRLF is encountered in this mode. Users can still be alerted to the unreported errors because offending characters are copied straight through to the output stream, where they can be observed.
The isError method detects the above anomalies. For the convert method, the numIn output parameter (indicating the number of input characters consumed), or possibly the iterator itself (for iterators with reference semantics), identifies the offending character.
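To illustrate how a count of consumed input can locate an offending character, here is a minimal strict-mode sketch. The function strictDecode and its signature are hypothetical and much simpler than the component's convert method. It handles only literals and =XX triplets, and it adopts the convention (an assumption of this sketch) that on failure *numIn is the index of the first character that could not be accepted.

```cpp
#include <string>

// Hypothetical helper, NOT the component's 'convert': decode 'in' into
// '*out', stopping at the first malformed '=XX' triplet.  Return 0 on
// success and -1 on error.  On return, '*numIn' is the number of input
// characters consumed; on error it indexes the offending character (or
// equals 'in.size()' if the input ended inside a triplet).
int strictDecode(std::string *out, int *numIn, const std::string& in)
{
    // Value of an uppercase hex digit, or -1.
    auto hex = [](char c) -> int {
        return (c >= '0' && c <= '9') ? c - '0'
             : (c >= 'A' && c <= 'F') ? c - 'A' + 10
             : -1;
    };

    std::string::size_type i = 0;
    while (i < in.size()) {
        if (in[i] != '=') {
            out->push_back(in[i]);  // literal character
            ++i;
            continue;
        }
        // Validate both digits before emitting anything for this triplet.
        for (std::string::size_type k = 1; k <= 2; ++k) {
            if (i + k >= in.size() || hex(in[i + k]) < 0) {
                *numIn = static_cast<int>(i + k);
                return -1;
            }
        }
        out->push_back(static_cast<char>(hex(in[i + 1]) * 16
                                       + hex(in[i + 2])));
        i += 3;
    }
    *numIn = static_cast<int>(in.size());
    return 0;
}
```

Given the input "ab=K3", this sketch returns -1 with *numIn equal to 3, the position of K, after writing "ab" to the output.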
Usage:
TBD