BDE 4.14.0 Production release
Provide automata for converting to and from Base64 encodings.
This component defines a class, bdlde::Base64Decoder, which provides a pair of template functions (each parameterized separately on both input and output iterators) that can be used respectively to encode and to decode byte sequences of arbitrary length into and from the printable Base64 representation described in Section 6.8 "Base64 Content Transfer Encoding" of RFC 2045, "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies."
The bdlde::Base64Encoder and bdlde::Base64Decoder classes support the standard "base64" encoding (described in https://tools.ietf.org/html/rfc4648) as well as the "Base 64 Encoding with URL and Filename Safe Alphabet", or "base64url", encoding. The "base64url" encoding is very similar to "base64" but substitutes two characters in the encoded alphabet to avoid characters that conflict with special characters in URL syntax or filename descriptions (replacing + with - and / with _). See {Base 64 Encoding with URL and Filename Safe Alphabet} for more information.
Each instance of either the encoder or decoder retains the state of the conversion from one supplied input to the next, enabling the processing of segmented input – i.e., processing resumes where it left off with the next invocation on new input. Instance methods are provided for both the encoder and decoder to (1) assert the end of input, (2) determine whether the input so far is currently acceptable, and (3) indicate whether a non-recoverable error has occurred.
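The state retention described above can be sketched with a small stand-alone class. This is not the bdlde::Base64Decoder implementation; ToyBase64Decoder and its members are hypothetical names chosen for illustration, and the sketch assumes the standard "base64" alphabet, skips whitespace, and appends to a bsl::string-like std::string rather than writing through an output iterator.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Toy decoder that keeps its state between calls (all names hypothetical).
class ToyBase64Decoder {
    unsigned d_group   = 0;      // bits of the partially read 4-char quantum
    int      d_count   = 0;      // data characters seen in current quantum
    int      d_padding = 0;      // '=' characters seen so far
    bool     d_error   = false;  // a non-recoverable error has occurred
    bool     d_done    = false;  // end of input has been asserted

  public:
    // Decode 'segment', appending bytes to '*out'.  Return 0 on success
    // and -1 on error.  Conversion resumes where the previous call stopped.
    int convert(std::string *out, const std::string& segment)
    {
        assert(out);
        static const std::string alphabet =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
        if (d_error || d_done) { d_error = true; return -1; }
        for (char c : segment) {
            if (std::isspace(static_cast<unsigned char>(c))) continue;
            if (c == '=') { ++d_padding; continue; }
            const std::string::size_type index = alphabet.find(c);
            if (index == std::string::npos || d_padding > 0) {
                d_error = true;  // bad character, or data after padding
                return -1;
            }
            d_group = (d_group << 6) | static_cast<unsigned>(index);
            if (++d_count == 4) {  // a full quantum yields three bytes
                out->push_back(static_cast<char>((d_group >> 16) & 0xFF));
                out->push_back(static_cast<char>((d_group >> 8) & 0xFF));
                out->push_back(static_cast<char>(d_group & 0xFF));
                d_group = 0;
                d_count = 0;
            }
        }
        return 0;
    }

    // Assert the end of input, emitting bytes held in a padded final
    // quantum.  Return 0 if the input formed a complete encoding, else -1.
    int endConvert(std::string *out)
    {
        assert(out);
        if (d_error || d_done) { d_error = true; return -1; }
        d_done = true;
        if (d_count == 0 && d_padding == 0) return 0;
        if (d_count == 2 && d_padding == 2) {
            out->push_back(static_cast<char>((d_group >> 4) & 0xFF));
            return 0;
        }
        if (d_count == 3 && d_padding == 1) {
            out->push_back(static_cast<char>((d_group >> 10) & 0xFF));
            out->push_back(static_cast<char>((d_group >> 2) & 0xFF));
            return 0;
        }
        d_error = true;
        return -1;
    }

    bool isError() const { return d_error; }
    bool isDone()  const { return d_done; }
};
```

Feeding the segments "Zm9", "vYg=", and "=" to one instance and then calling endConvert yields the same bytes as decoding "Zm9vYg==" in a single call, because the partially read quantum survives between invocations.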
The data stream is processed three bytes at a time from left to right (a final quantum consisting of one or two bytes, as discussed below, is handled specially). Each sequence of three 8-bit quantities
is segmented into four intermediate 6-bit quantities.
Each 6-bit quantity is in turn used as an index into the following character table to generate an 8-bit character. The four resulting characters hence form the encoding for the original 3-byte sequence.
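As a minimal sketch of the step just described, the following stand-alone function splits one 3-byte group into four 6-bit indices and looks each one up in the standard RFC 4648 "base64" alphabet. The name encodeQuantum is hypothetical and is not part of this component.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Standard "base64" alphabet from RFC 4648, indexed by 6-bit value.
static const char k_BASE64_ALPHABET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode exactly three input bytes as four Base64 characters.
// 'encodeQuantum' is a hypothetical helper, not a BDE function.
std::string encodeQuantum(std::uint8_t b0, std::uint8_t b1, std::uint8_t b2)
{
    // Concatenate the three 8-bit quantities into one 24-bit group.
    const std::uint32_t group = (std::uint32_t(b0) << 16)
                              | (std::uint32_t(b1) << 8)
                              |  std::uint32_t(b2);

    std::string result;
    // Peel off four 6-bit indices, most significant first.
    for (int shift = 18; shift >= 0; shift -= 6) {
        const unsigned index = (group >> shift) & 0x3F;
        result += k_BASE64_ALPHABET[index];
    }
    return result;
}
```

For example, the bytes 'M', 'a', 'n' form the bit pattern 010011 010110 000101 101110, giving the indices 19, 22, 5, and 46 and hence the characters "TWFu".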
This component also supports a slightly different alphabet, "base64url", that is more appropriate if the encoded representation would be used in a file name or URL (see {Base 64 Encoding with URL and Filename Safe Alphabet}).
The 3-byte grouping of the input is only a design of convenience and not a requirement. When the number of bytes in the input stream is not divisible by 3, sufficient 0 bits are padded on the right to achieve an integral number of 6-bit character indices. Then one of two special cases will apply for the final processing step:
I) There is a single byte of data, in which case there will be two Base64 encoding characters (the second of which will be one of [AQgw]) followed by two equal (=) signs.
II) There are exactly two bytes of data, in which case there will be three Base64 encoding characters (the third of which will be one of [AEIMQUYcgkosw048]) followed by a single equal (=) sign.
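The two special cases can be illustrated with a stand-alone encoder sketch. It is an assumption-laden simplification: encodeBase64 is a hypothetical function (not the bdlde::Base64Encoder interface), it uses the standard RFC 4648 alphabet, and it does not insert line breaks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Encode 'input' as Base64 with '=' padding and no line breaks.
// Hypothetical helper for illustration; not a BDE function.
std::string encodeBase64(const std::string& input)
{
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    std::string out;
    std::size_t i = 0;

    // Full 3-byte groups: four characters each, no padding.
    for (; i + 3 <= input.size(); i += 3) {
        const std::uint32_t group =
              (std::uint32_t(static_cast<unsigned char>(input[i]))     << 16)
            | (std::uint32_t(static_cast<unsigned char>(input[i + 1])) << 8)
            |  std::uint32_t(static_cast<unsigned char>(input[i + 2]));
        out += alphabet[(group >> 18) & 0x3F];
        out += alphabet[(group >> 12) & 0x3F];
        out += alphabet[(group >> 6) & 0x3F];
        out += alphabet[group & 0x3F];
    }

    const std::size_t remaining = input.size() - i;
    if (remaining == 1) {
        // Case I: 8 data bits padded with four 0 bits; the second index
        // is a multiple of 16, so the character is one of A, Q, g, w.
        const std::uint32_t group =
            std::uint32_t(static_cast<unsigned char>(input[i])) << 16;
        out += alphabet[(group >> 18) & 0x3F];
        out += alphabet[(group >> 12) & 0x3F];
        out += "==";
    }
    else if (remaining == 2) {
        // Case II: 16 data bits padded with two 0 bits; the third index
        // is a multiple of 4.
        const std::uint32_t group =
              (std::uint32_t(static_cast<unsigned char>(input[i]))     << 16)
            | (std::uint32_t(static_cast<unsigned char>(input[i + 1])) << 8);
        out += alphabet[(group >> 18) & 0x3F];
        out += alphabet[(group >> 12) & 0x3F];
        out += alphabet[(group >> 6) & 0x3F];
        out += '=';
    }
    return out;
}
```

With this sketch, "f" encodes to "Zg==" (case I), "fo" to "Zm8=" (case II), and "foo" to "Zm9v" (no padding), matching the test vectors in RFC 4648.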
The MIME standard requires that the maximum line length of emitted text not exceed 76 characters exclusive of CRLF. The caller may override this default if desired.
Input values of increasing length along with their corresponding Base64 encodings are illustrated below:
In order for a Base64 encoding to be valid, the input data must either have a length that is a multiple of three (constituting maximal input), or have been terminated explicitly by the endConvert method (initiating bit padding when necessary).
The encoder and decoder in this component also support the "base64url" encoding, which is the same as standard "base64" but substitutes two characters in the alphabet that are treated as special characters when used in a URL or in a file system. The following table is identical to the table presented in {Base 64 Encoding}, except for the characters at values 62 and 63, which are - and _ respectively.
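Because the two alphabets differ only at values 62 and 63, text in one alphabet can be mapped to the other by swapping two characters. The sketch below assumes exactly that; toBase64Url and fromBase64Url are hypothetical helpers, not functions of this component, and they leave any = padding untouched.

```cpp
#include <cassert>
#include <string>

// Map standard "base64" text to the "base64url" alphabet.
// Hypothetical helper for illustration; padding is left unchanged.
std::string toBase64Url(std::string encoded)
{
    for (char& c : encoded) {
        if (c == '+') {
            c = '-';   // value 62
        }
        else if (c == '/') {
            c = '_';   // value 63
        }
    }
    return encoded;
}

// Map "base64url" text back to the standard "base64" alphabet.
std::string fromBase64Url(std::string encoded)
{
    for (char& c : encoded) {
        if (c == '-') {
            c = '+';
        }
        else if (c == '_') {
            c = '/';
        }
    }
    return encoded;
}
```

For instance, the standard encoding "+/8=" becomes "-_8=" under this mapping, and converting back restores the original text.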
The degree to which decoding detects errors can significantly affect performance. The standard permits all non-Base64 characters to be treated as whitespace. One variant mode of this decoder does just that; the other reports an error if a bad (i.e., non-whitespace) character is detected. The mode of the instance is configurable. The standard imposes a maximum of 76 characters exclusive of CRLF; however, the decoder implemented in this component will handle lines of arbitrary length.
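The difference between the two modes can be sketched as follows. This is a simplified stand-alone illustration, not the decoder in this component: decodeBase64 and its strict flag are hypothetical, the standard alphabet is assumed, the first = is treated as the end of data, and padding is not validated.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Return the 6-bit value of 'c' in the standard alphabet, or -1 if 'c'
// is not a Base64 character.  Assumes an ASCII-compatible character set.
int base64Index(char c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

// Decode 'input' into '*output'.  If 'strict' is true, any character that
// is neither Base64 nor whitespace is an error; otherwise such characters
// are skipped as if they were whitespace.  Return 0 on success, -1 on error.
// Hypothetical helper for illustration; not a BDE function.
int decodeBase64(std::string *output, const std::string& input, bool strict)
{
    assert(output);
    output->clear();

    unsigned buffer = 0;  // most recently read bits (low-order end)
    int      bits   = 0;  // number of bits in 'buffer' not yet emitted

    for (char c : input) {
        if (c == '=') {
            break;  // padding marks the end of data in this sketch
        }
        const int index = base64Index(c);
        if (index < 0) {
            const bool isSpace =
                std::isspace(static_cast<unsigned char>(c)) != 0;
            if (strict && !isSpace) {
                return -1;  // strict mode reports the bad character
            }
            continue;       // relaxed mode (or whitespace): skip it
        }
        buffer = (buffer << 6) | static_cast<unsigned>(index);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            output->push_back(static_cast<char>((buffer >> bits) & 0xFF));
        }
    }
    return 0;
}
```

Given the input "Zm9v!YmFy", the strict mode of this sketch fails at the ! character, while the relaxed mode skips it and produces "foobar". Line breaks such as CRLF are skipped in both modes.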
The following kinds of errors can occur during decoding and are reported with the following priority:
The isError
method is used to detect such anomalies, and the numIn
output parameter (indicating the number of input characters consumed) or possibly the iterator itself (for iterators with reference-semantics) identifies the offending character.
Note that the existence of an =
can be used to reliably indicate the end of the valid data, but no such assurance is possible when the length (in bytes) of the initial input data sequence before encoding was evenly divisible by 3.
This section illustrates intended use of this component.
The following example shows how to use a bdlde::Base64Decoder object to implement a function, streamconverter, that reads text from a bsl::istream, decodes that text from base 64 representation, and writes the decoded text to a bsl::ostream. streamconverter returns 0 on success and a negative value if the input data could not be successfully decoded or if there is an I/O error.
We will use fixed-sized input and output buffers in the implementation, but, because of the flexibility of bsl::istream
and the output-buffer monitoring functionality of bdlde::Base64Decoder
, the fixed buffer sizes do not limit the quantity of data that can be read, decoded, or written to the output stream. The implementation file is as follows.
We declare a bdlde::Base64Decoder object converter, which will decode the input data. Note that various internal buffers and cursors are used as needed without further comment. We read as much data as is available from the user-supplied input stream is or as much as will fit in inputBuffer before beginning conversion. So that the output from decoding the entire input stream is obtained without interruption (even in the case of errors), the base64 decoder is configured not to detect errors.
With inputBuffer now populated, we'll use converter in an inner while loop to decode the input and write the decoded data to outputBuffer (via the output cursor). Note that if the call to converter.convert fails, our function terminates with a negative status.
If the call to converter.convert
returns successfully, we'll see if the output buffer is full, and if so, write its contents to the user-supplied output stream os
. Note how we use the values of numOut
and numIn
generated by convert
to update the relevant cursors.
We have now exited both the input and the "decode" loops. converter
may still hold decoded output characters, and so we call converter.endConvert
to emit any retained output. To guarantee correct behavior, we call this method in an infinite loop, because it is possible that the retained output can fill the output buffer. In that case, we solve the problem by writing the contents of the output buffer to os
within the loop. The most likely case, however, is that endConvert
will return 0, in which case we exit the loop and write any data remaining in outputBuffer
to os
. As above, if endConvert
fails, we exit the function with a negative return status.
For ease of reading, we repeat the full content of the streamconverter.cpp
file without interruption.