BDE 4.14.0 Production release
Provide automata converting to and from Quoted-Printable encodings.
This component provides a template class (parameterized separately on both input and output iterators) that can be used to decode byte sequences of arbitrary length from the Quoted-Printable representation described in Section 6.7, "Quoted-Printable Content Transfer Encoding," of RFC 2045, "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies."
Each instance of the decoder retains the state of the conversion from one supplied input to the next, enabling the processing of segmented input; i.e., processing resumes where it left off with the next invocation on new input. Instance methods are provided for the decoder to (1) assert the end of input, (2) determine whether the input so far is currently acceptable, and (3) indicate whether a non-recoverable error has occurred.
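To illustrate what "retaining state between inputs" means, here is a minimal sketch, assuming a hypothetical class `ChunkedQpDecoder` (not the BDE API). It handles only `=XX` escapes and `=` CRLF soft line breaks, and it lets either be split across calls to `feed`. The hypothetical `endOfInput` and `isError` methods loosely mirror the "assert end of input" and "non-recoverable error" queries described above.

```cpp
#include <cassert>
#include <string>
#include <string_view>

// Hypothetical sketch (not the BDE API): a decoder that remembers where it
// stopped, so an escape such as "=C3" or a soft line break "=\r\n" may be
// split across successive calls to 'feed'.
class ChunkedQpDecoder {
  public:
    // Decode 'chunk', appending output to 'out'.  Return 'false' once a
    // malformed sequence is seen; the decoder then stays in the error state.
    bool feed(std::string_view chunk, std::string& out) {
        for (char c : chunk) {
            if (d_state == State::Error) return false;
            step(c, out);
        }
        return d_state != State::Error;
    }

    // Acceptable end of input only if no escape is left half-finished.
    bool endOfInput() const { return d_state == State::Normal; }

    bool isError() const { return d_state == State::Error; }

  private:
    enum class State { Normal, SawEquals, SawHexHigh, SawCR, Error };

    // Value of an uppercase hexadecimal digit, or -1 (lowercase rejected).
    static int hexValue(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }

    void step(char c, std::string& out) {
        switch (d_state) {
          case State::Normal:
            if (c == '=') d_state = State::SawEquals;
            else out.push_back(c);                   // literal character
            break;
          case State::SawEquals:
            if (c == '\r') {
                d_state = State::SawCR;              // possible soft line break
            } else if (hexValue(c) >= 0) {
                d_high  = hexValue(c);               // first hex digit saved
                d_state = State::SawHexHigh;
            } else {
                d_state = State::Error;
            }
            break;
          case State::SawHexHigh:
            if (hexValue(c) >= 0) {
                out.push_back(static_cast<char>(d_high * 16 + hexValue(c)));
                d_state = State::Normal;
            } else {
                d_state = State::Error;
            }
            break;
          case State::SawCR:                         // "=\r" must end in '\n'
            d_state = (c == '\n') ? State::Normal : State::Error;
            break;
          case State::Error:
            break;
        }
    }

    State d_state = State::Normal;
    int   d_high  = 0;
};
```

Feeding `"Caf=C"` and then `"3=A9"` yields the bytes `C3 A9` after `"Caf"`, because the pending high nibble survives between calls; the real component's state machine is likely richer than this.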
(In the following, all rules mentioned refer to those listed in the encoder section above.)
The decoding process for this encoding scheme involves, among other steps, removing each soft line break together with its '=' prefix (i.e., concatenating broken sentences) (rule #5).
The standard imposes a maximum of 76 characters per encoded line, exclusive of CRLF; however, the decoder implemented in this component will handle lines of arbitrary length.
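The decoding rules can be sketched for well-formed input held in a single buffer. This is a minimal illustration, not the component's implementation: the function name `decodeQuotedPrintable` is hypothetical, rule numbers follow RFC 2045 Section 6.7, malformed `=` sequences are simply copied through, and no line-length limit is enforced.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Value of an uppercase hexadecimal digit, or -1 if 'c' is not one.
static int hexDigit(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical sketch: decode a whole Quoted-Printable buffer, assuming it is
// well formed.  Lines may be of any length.
std::string decodeQuotedPrintable(const std::string& in) {
    std::string out;
    std::string blanks;          // SPACE/TAB run not yet known to be trailing
    std::size_t i = 0;
    while (i < in.size()) {
        const char c = in[i];
        if (c == ' ' || c == '\t') {                 // rule #3: hold back
            blanks.push_back(c);
            ++i;
        } else if (c == '\r' && i + 1 < in.size() && in[i + 1] == '\n') {
            blanks.clear();                          // rule #3: drop trailing
            out += "\r\n";                           // rule #4: hard break kept
            i += 2;
        } else if (c == '=' && i + 2 < in.size()
                   && in[i + 1] == '\r' && in[i + 2] == '\n') {
            out += blanks;                           // blanks before '=' kept
            blanks.clear();
            i += 3;                                  // rule #5: soft break gone
        } else if (c == '=' && i + 2 < in.size()
                   && hexDigit(in[i + 1]) >= 0 && hexDigit(in[i + 2]) >= 0) {
            out += blanks;
            blanks.clear();
            out.push_back(static_cast<char>(hexDigit(in[i + 1]) * 16
                                            + hexDigit(in[i + 2])));  // rule #1
            i += 3;
        } else {
            out += blanks;
            blanks.clear();
            out.push_back(c);                        // rule #2: literal
            ++i;
        }
    }
    return out;  // whitespace at the very end of input is dropped (rule #3)
}
```

For example, `"long =\r\nline"` decodes to `"long line"`, because the soft line break is removed while the space before the `=` is kept.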
The decoder also provides support for two error-reporting modes, strict and relaxed, selected at construction. A strict-mode decoder stops decoding at the first offending character encountered, while a relaxed-mode decoder continues decoding to the end of the input, allowing straight pass-through of character sequences that cannot be interpreted.
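The difference between the two modes can be sketched with a single function taking a mode flag. This is a hypothetical illustration: the names `Mode` and `decodeEscapes` are not the component's API, and soft line breaks are omitted to keep the focus on the mode behavior.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical mode flag; the real component selects its mode at construction.
enum class Mode { Strict, Relaxed };

// Value of an uppercase hexadecimal digit, or -1 if 'c' is not one.
static int hexNibble(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decode "=XX" escapes from 'in', appending to 'out', and return the number
// of input characters consumed.  Soft line breaks are not handled here.
//  - Strict:  stop at the first offending '='; the return value is its index.
//  - Relaxed: copy the offending '=' through unchanged and keep going.
std::size_t decodeEscapes(const std::string& in, Mode mode, std::string& out) {
    std::size_t i = 0;
    while (i < in.size()) {
        if (in[i] != '=') {
            out.push_back(in[i++]);                  // literal character
            continue;
        }
        const bool valid = i + 2 < in.size()
                        && hexNibble(in[i + 1]) >= 0
                        && hexNibble(in[i + 2]) >= 0;
        if (valid) {
            out.push_back(static_cast<char>(hexNibble(in[i + 1]) * 16
                                            + hexNibble(in[i + 2])));
            i += 3;
        } else if (mode == Mode::Strict) {
            return i;                                // index of offending '='
        } else {
            out.push_back(in[i++]);                  // pass '=' through
        }
    }
    return i;
}
```

On `"ab=4Gcd"`, the strict call stops after `"ab"` and reports index 2, while the relaxed call reproduces the whole input unchanged.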
The following kinds of errors can be encountered during decoding, listed in decreasing order of precedence:
An '=' character that is not followed by either two uppercase hexadecimal digits or a soft line break.
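The check behind this error can be sketched as a classifier for a single `=`. This is a hypothetical helper (`classifyEquals` and `EqualsKind` are illustrative names); for brevity it does not accept transport padding before the CRLF. It also treats a `=` at the end of the buffer as invalid, whereas a segmented decoder would instead wait for more input.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical classification of a '=' found in encoded input.
enum class EqualsKind { HexEscape, SoftLineBreak, Invalid };

// Only '0'-'9' and 'A'-'F' qualify; lowercase hex digits are rejected.
static bool isUpperHex(char c) {
    return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F');
}

// Classify the '=' at 's[pos]'.
EqualsKind classifyEquals(std::string_view s, std::size_t pos) {
    assert(pos < s.size() && s[pos] == '=');
    const std::string_view rest = s.substr(pos + 1);
    if (rest.size() >= 2 && isUpperHex(rest[0]) && isUpperHex(rest[1])) {
        return EqualsKind::HexEscape;                // e.g. "=4F"
    }
    if (rest.size() >= 2 && rest[0] == '\r' && rest[1] == '\n') {
        return EqualsKind::SoftLineBreak;            // "=" CRLF
    }
    return EqualsKind::Invalid;                      // e.g. "=4f", "=K3"
}
```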
Note that:
An accidental insertion of a '=' that does not belong is undetectable, e.g., '=4F' where 'F' is actually a literally encoded character.
A '=' character preceding two seemingly valid hexadecimal digits is also undetectable, e.g., '=4F' where '=' was actually a 't' corrupted during transmission.
Whitespace characters appearing between a '=' character and the CRLF are removed, as they are to be treated as transport padding.
In relaxed mode, errors of types E1 and E2 are copied straight to the output and errors of type E3 are ignored. Decoded lines will be broken even when a bare CRLF is encountered in this mode. Users can still be alerted to the unreported errors, since offending characters are copied straight through to the output stream, where they can be observed.
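The transport-padding rule can be sketched as a helper that recognizes a soft line break whose `=` is followed by optional padding before the CRLF. The name `skipSoftLineBreak` is hypothetical. The sketch assumes padding consists of SPACE and TAB only, following the LWSP-char definition used by RFC 2045's `transport-padding`.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper: if 's[pos]' is a '=' followed only by SPACE/TAB
// transport padding and then CRLF, return the index just past the CRLF.
// The whole sequence is a soft line break and produces no output.
// Otherwise return std::string_view::npos.
std::size_t skipSoftLineBreak(std::string_view s, std::size_t pos) {
    if (pos >= s.size() || s[pos] != '=') {
        return std::string_view::npos;
    }
    std::size_t i = pos + 1;
    while (i < s.size() && (s[i] == ' ' || s[i] == '\t')) {
        ++i;                                         // skip transport padding
    }
    if (i + 1 < s.size() && s[i] == '\r' && s[i + 1] == '\n') {
        return i + 2;                                // just past the CRLF
    }
    return std::string_view::npos;
}
```

A decoder built around such a helper would resume copying at the returned index, so `"ab= \t\r\ncd"` contributes only `"ab"` and `"cd"` to the output.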
The 'isError' method is used to detect the above anomalies, while for the 'convert' method, a 'numIn' output parameter (indicating the number of input characters consumed) or possibly the iterator itself (for iterators with reference semantics) identifies the offending character.
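How a `numIn` count can pinpoint the offending character is shown below in a hypothetical, strict-only iterator template. `convertStrict` and its parameter order are illustrative assumptions, not the BDE signature. It looks ahead with `std::next`, so it needs forward iterators, whereas a real streaming decoder would keep state instead. On failure, `*numIn` is the offset of the `=` that starts the bad sequence.

```cpp
#include <cassert>
#include <iterator>
#include <string>

// Hypothetical sketch: decode "=XX" escapes from [begin, end) into 'out'.
// '*numOut' counts bytes written and '*numIn' counts input characters
// consumed.  Returns 0 on success, or -1 with '*numIn' equal to the offset
// of the offending '='.  Soft line breaks are omitted for brevity.
template <class OUTPUT_ITER, class INPUT_ITER>
int convertStrict(OUTPUT_ITER out, int *numOut, int *numIn,
                  INPUT_ITER begin, INPUT_ITER end)
{
    auto hex = [](char c) {          // uppercase hex digit value, or -1
        return (c >= '0' && c <= '9') ? c - '0'
             : (c >= 'A' && c <= 'F') ? c - 'A' + 10
             : -1;
    };
    *numOut = 0;
    *numIn  = 0;
    while (begin != end) {
        if (*begin != '=') {                         // literal character
            *out++ = *begin++;
            ++*numOut;
            ++*numIn;
            continue;
        }
        INPUT_ITER hi = std::next(begin);
        INPUT_ITER lo = (hi == end) ? end : std::next(hi);
        if (hi == end || lo == end || hex(*hi) < 0 || hex(*lo) < 0) {
            return -1;                               // '*numIn' marks the '='
        }
        *out++ = static_cast<char>(hex(*hi) * 16 + hex(*lo));
        ++*numOut;
        *numIn += 3;
        begin = std::next(lo);
    }
    return 0;
}
```

With input `"ab=41=4Gcd"` and `std::back_inserter` on a `std::string`, the call returns -1 after writing `"abA"`, and `numIn` is 5, the index of the second `=`.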
This section illustrates intended use of this component.
TBD