BDE 4.14.0 Production release
bdlde_base64encoder

Detailed Description

Outline

Purpose

Provide automata for converting to and from Base64 encodings.

Classes

See also
bdlde_base64decoder

Description

This component provides a class, bdlde::Base64Encoder, which provides a pair of template functions, convert and endConvert (parameterized on the iterator types they operate on), that can be used respectively to encode byte sequences of arbitrary length, and to conclude that encoding, in the printable Base64 representation described in Section 6.8 "Base64 Content Transfer Encoding" of RFC 2045, "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies." Decoding is provided by the companion component, bdlde_base64decoder.

The bdlde::Base64Encoder and bdlde::Base64Decoder support the standard "base64" encoding (described in https://tools.ietf.org/html/rfc4648) as well as the "Base 64 Encoding with URL and Filename Safe Alphabet", or "base64url", encoding. The "base64url" encoding is very similar to "base64" but substitutes a couple of characters in the encoded alphabet to avoid characters that conflict with special characters in URL syntax or filename descriptions (replacing + with - and / with _). See {Base 64 Encoding with URL and Filename Safe Alphabet} for more information.

Each instance of either the encoder or decoder retains the state of the conversion from one supplied input to the next, enabling the processing of segmented input – i.e., processing resumes where it left off with the next invocation on new input. Instance methods are provided for both the encoder and decoder to (1) assert the end of input, (2) determine whether the input so far is currently acceptable, and (3) indicate whether a non-recoverable error has occurred.

Base 64 Encoding

The data stream is processed three bytes at a time from left to right (a final quantum consisting of one or two bytes, as discussed below, is handled specially). Each sequence of three 8-bit quantities

     7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0 7 6 5 4 3 2 1 0
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |               |               |               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     `------v------' `------v------' `------v------'
          Byte2           Byte1           Byte0

is segmented into four intermediate 6-bit quantities.

     5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0 5 4 3 2 1 0
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |           |           |           |           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
     `----v----' `----v----' `----v----' `----v----'
        char3       char2       char1       char0

Each 6-bit quantity is in turn used as an index into the following character table to generate an 8-bit character. The four resulting characters hence form the encoding for the original 3-byte sequence.

======================================================================
* Table of Numeric BASE-64 Encoding Characters *
----------------------------------------------------------------------
Val Enc Val Enc Val Enc Val Enc Val Enc Val Enc Val Enc Val Enc
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
0 'A' 8 'I' 16 'Q' 24 'Y' 32 'g' 40 'o' 48 'w' 56 '4'
1 'B' 9 'J' 17 'R' 25 'Z' 33 'h' 41 'p' 49 'x' 57 '5'
2 'C' 10 'K' 18 'S' 26 'a' 34 'i' 42 'q' 50 'y' 58 '6'
3 'D' 11 'L' 19 'T' 27 'b' 35 'j' 43 'r' 51 'z' 59 '7'
4 'E' 12 'M' 20 'U' 28 'c' 36 'k' 44 's' 52 '0' 60 '8'
5 'F' 13 'N' 21 'V' 29 'd' 37 'l' 45 't' 53 '1' 61 '9'
6 'G' 14 'O' 22 'W' 30 'e' 38 'm' 46 'u' 54 '2' 62 '+'
7 'H' 15 'P' 23 'X' 31 'f' 39 'n' 47 'v' 55 '3' 63 '/'
======================================================================
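The segmentation and table lookup described above can be sketched in a few lines of standard C++. This is only an illustration: splitQuantum, encodeQuantum, and k_ALPHABET are hypothetical names introduced here and are not part of this component's interface.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <string>

// Standard "base64" alphabet: position i holds the character for value i.
static const char k_ALPHABET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Split the 24 bits of 'byte2 byte1 byte0' into four 6-bit values, most
// significant first (char3, char2, char1, char0 in the diagram above).
// Hypothetical helper, not part of bdlde.
std::array<std::uint32_t, 4> splitQuantum(std::uint8_t byte2,
                                          std::uint8_t byte1,
                                          std::uint8_t byte0)
{
    const std::uint32_t bits = (std::uint32_t(byte2) << 16)
                             | (std::uint32_t(byte1) <<  8)
                             |  std::uint32_t(byte0);
    return {{ (bits >> 18) & 0x3F,
              (bits >> 12) & 0x3F,
              (bits >>  6) & 0x3F,
               bits        & 0x3F }};
}

// Return the four encoded characters for one complete 3-byte quantum.
std::string encodeQuantum(std::uint8_t byte2,
                          std::uint8_t byte1,
                          std::uint8_t byte0)
{
    std::string result;
    for (std::uint32_t index : splitQuantum(byte2, byte1, byte0)) {
        assert(index < 64);  // guaranteed by the 0x3F mask
        result += k_ALPHABET[index];
    }
    return result;
}
```

For instance, the bytes 0x01 0x02 0x03 split into the values 0, 16, 8, and 3, which the table maps to the characters A, Q, I, and D.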

This component also supports a slightly different alphabet, "base64url", that is more appropriate if the encoded representation would be used in a file name or URL (see {Base 64 Encoding with URL and Filename Safe Alphabet}).

The 3-byte grouping of the input is only a design of convenience and not a requirement. When the number of bytes in the input stream is not divisible by 3, sufficient 0 bits are padded on the right to achieve an integral number of 6-bit character indices. Then one of two special cases will apply for the final processing step:

I) There is a single byte of data, in which case there will be two Base64 encoding characters (the second of which will be one of [AQgw]) followed by two equal (=) signs.

II) There are exactly two bytes of data, in which case there will be three Base64 encoding characters (the third of which will be one of [AEIMQUYcgkosw048]) followed by a single equal (=) sign.
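The restricted character sets in the two cases follow directly from the zero-bit padding: the last encoded character carries only 2 (case I) or 4 (case II) data bits, and its remaining low-order bits are 0. The following sketch, using a hypothetical helper lastCharsBeforePadding that is not part of this component, enumerates every possible final byte to confirm which characters can precede the = signs.

```cpp
#include <cassert>
#include <set>
#include <string>

static const char k_ALPHABET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Return, in table order, every character that can appear immediately
// before the '=' padding when the final quantum holds 'numBytes' bytes
// (1 or 2).  Hypothetical helper, not part of bdlde.
std::string lastCharsBeforePadding(int numBytes)
{
    assert(numBytes == 1 || numBytes == 2);

    std::set<unsigned> indices;
    for (unsigned lastByte = 0; lastByte < 256; ++lastByte) {
        if (numBytes == 1) {
            // 2 leftover data bits, padded on the right with 4 zero bits.
            indices.insert((lastByte & 0x03u) << 4);
        }
        else {
            // 4 leftover data bits, padded on the right with 2 zero bits.
            indices.insert((lastByte & 0x0Fu) << 2);
        }
    }

    std::string chars;
    for (unsigned index : indices) {
        chars += k_ALPHABET[index];
    }
    return chars;
}
```

Calling lastCharsBeforePadding(1) yields the four characters of case I, and lastCharsBeforePadding(2) yields the sixteen characters of case II.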

The MIME standard requires that the maximum line length of emitted text not exceed 76 characters exclusive of CRLF. The caller may override this default if desired.
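As a rough illustration of the line-length limit, the sketch below breaks an already-encoded string into lines of at most 76 characters separated by CRLF. The function wrapLines is hypothetical, and treating a limit of 0 as "no limit" is an assumption of this sketch; the encoder in this component applies its line limit while encoding, and this sketch makes no claim about how it does so.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Return 'encoded' broken into lines of at most 'maxLineLength'
// characters, with CRLF between lines (and none after the last line).
// Assumption of this sketch: a 'maxLineLength' of 0 means "no limit".
// Hypothetical helper, not part of bdlde.
std::string wrapLines(const std::string& encoded,
                      std::size_t        maxLineLength = 76)
{
    if (0 == maxLineLength) {
        return encoded;
    }

    std::string result;
    for (std::size_t pos = 0; pos < encoded.size(); pos += maxLineLength) {
        if (0 != pos) {
            result += "\r\n";
        }
        // 'append' clips the requested length at the end of 'encoded'.
        result.append(encoded, pos, maxLineLength);
    }
    assert(result.size() >= encoded.size());
    return result;
}
```

With the default limit, 80 encoded characters become one line of 76 characters, a CRLF, and a second line of 4 characters.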

Input values of increasing length along with their corresponding Base64 encodings are illustrated below:

Data: /* nothing */
Encoding: /* nothing */
Data: 0x01
Encoding: AQ==
Data: 0x01 0x02
Encoding: AQI=
Data: 0x01 0x02 0x03
Encoding: AQID
Data: 0x01 0x02 0x03 0x04
Encoding: AQIDBA==

In order for a Base64 encoding to be valid, the input data must be either of length a multiple of three (constituting maximal input), or have been terminated explicitly by the endConvert method (initiating bit padding when necessary).
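To make the role of retained state and explicit termination concrete, here is a minimal sketch of a segment-aware encoder written in standard C++. SketchEncoder is a hypothetical class, not the bdlde::Base64Encoder interface: it works on std::string rather than iterators, imposes no output limit, and emits no line breaks. It only shows how leftover bits can carry over between calls and how the terminating call adds the padding.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical illustration only; not the interface of this component.
class SketchEncoder {
    std::uint32_t d_bits    = 0;  // pending input bits, right-aligned
    int           d_numBits = 0;  // pending bit count: 0, 2, or 4

    static const char *alphabet()
    {
        return "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
               "abcdefghijklmnopqrstuvwxyz0123456789+/";
    }

  public:
    // Append to 'out' every character completed by 'input'; bits that
    // do not yet fill a 6-bit index are retained for the next call.
    void convert(std::string *out, const std::string& input)
    {
        assert(out);
        for (unsigned char byte : input) {
            d_bits     = (d_bits << 8) | byte;
            d_numBits += 8;
            while (d_numBits >= 6) {
                d_numBits -= 6;
                *out += alphabet()[(d_bits >> d_numBits) & 0x3F];
            }
            d_bits &= (1u << d_numBits) - 1;  // keep only pending bits
        }
    }

    // Assert the end of input: pad pending bits with zeros on the
    // right, emit the final character and the '=' signs, and reset.
    void endConvert(std::string *out)
    {
        assert(out);
        if (d_numBits > 0) {
            *out += alphabet()[(d_bits << (6 - d_numBits)) & 0x3F];
            out->append(2 == d_numBits ? "==" : "=");
        }
        d_bits    = 0;
        d_numBits = 0;
    }
};
```

Feeding the four bytes 0x01 0x02 0x03 0x04 in one call, or as 0x01 followed by 0x02 0x03 0x04 in a second call, produces the same result once endConvert has been called.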

Base 64 Encoding with URL and Filename Safe Alphabet

The encoder and decoder in this component also support the "base64url" encoding, which is the same as standard "base64" but substitutes a couple of characters in the alphabet that are treated as special characters when used in a URL or in a file system. The following table is identical to the table presented in {Base 64 Encoding}, except for the characters at values 62 and 63, which are - and _ respectively.

======================================================================
* The "URL and Filename Safe" BASE-64 Alphabet *
----------------------------------------------------------------------
Val Enc Val Enc Val Enc Val Enc Val Enc Val Enc Val Enc Val Enc
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
0 'A' 8 'I' 16 'Q' 24 'Y' 32 'g' 40 'o' 48 'w' 56 '4'
1 'B' 9 'J' 17 'R' 25 'Z' 33 'h' 41 'p' 49 'x' 57 '5'
2 'C' 10 'K' 18 'S' 26 'a' 34 'i' 42 'q' 50 'y' 58 '6'
3 'D' 11 'L' 19 'T' 27 'b' 35 'j' 43 'r' 51 'z' 59 '7'
4 'E' 12 'M' 20 'U' 28 'c' 36 'k' 44 's' 52 '0' 60 '8'
5 'F' 13 'N' 21 'V' 29 'd' 37 'l' 45 't' 53 '1' 61 '9'
6 'G' 14 'O' 22 'W' 30 'e' 38 'm' 46 'u' 54 '2' 62 '-'
7 'H' 15 'P' 23 'X' 31 'f' 39 'n' 47 'v' 55 '3' 63 '_'
======================================================================
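Because the two alphabets differ only at values 62 and 63, text encoded with one alphabet can be converted to the other by a character substitution. The helpers below, toBase64Url and fromBase64Url, are hypothetical and shown only to illustrate the relationship; this component selects the alphabet when the encoder or decoder is configured rather than by post-processing.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Return 'encoded' (standard "base64" text) rewritten in the "base64url"
// alphabet.  Hypothetical helper, not part of bdlde.
std::string toBase64Url(std::string encoded)
{
    std::replace(encoded.begin(), encoded.end(), '+', '-');
    std::replace(encoded.begin(), encoded.end(), '/', '_');
    assert(encoded.find('+') == std::string::npos);
    return encoded;
}

// Return 'encoded' ("base64url" text) rewritten in the standard
// "base64" alphabet.  Hypothetical helper, not part of bdlde.
std::string fromBase64Url(std::string encoded)
{
    std::replace(encoded.begin(), encoded.end(), '-', '+');
    std::replace(encoded.begin(), encoded.end(), '_', '/');
    assert(encoded.find('-') == std::string::npos);
    return encoded;
}
```

For example, the two bytes 0xFB 0xFF encode to +/8= in the standard alphabet and to -_8= in the URL and filename safe alphabet.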

Base 64 Decoding

The degree to which decoding detects errors can significantly affect performance. The standard permits all non-Base64 characters to be treated as whitespace. One variant mode of this decoder does just that; the other reports an error if a bad (i.e., non-whitespace) character is detected. The mode of the instance is configurable. The standard imposes a maximum of 76 characters exclusive of CRLF; however, the decoder implemented in this component will handle lines of arbitrary length.

The following kinds of errors can occur during decoding and are reported with the following priority:

BAD DATA: A character (other than whitespace) that is not a member of the
Base64 character set (including '='). Note that this error
is detected only if the 'decoder' is explicitly configured (at
construction) to do so.
BAD FORMAT: An '=' character precedes a valid numeric Base64 character,
more than two '=' characters appear (possibly separated by
non-Base64 characters), a numeric Base64 character other than
[AEIMQUYcgkosw048] precedes a single terminal '=' character,
or a character other than [AQgw] precedes a terminal pair of
consecutive '=' characters.
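The rules above can be sketched as a small one-shot decoder in standard C++. Everything here is hypothetical (the names Status, valueOf, and decode, and the status values) and does not reflect the decoder's actual interface. The sketch reports the first error it encounters while scanning, and it treats input whose final quantum is left incomplete as a format error, which is an assumption of this sketch.

```cpp
#include <cassert>
#include <cctype>
#include <string>

enum Status { e_OK = 0, e_BAD_DATA = -1, e_BAD_FORMAT = -2 };

// Return the 6-bit value of the numeric Base64 character 'c', or -1.
static int valueOf(char c)
{
    static const std::string alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    const std::string::size_type pos = alphabet.find(c);
    return std::string::npos == pos ? -1 : static_cast<int>(pos);
}

// Decode 'text' into 'out'.  If 'strict', report characters that are
// neither Base64 characters nor whitespace; otherwise skip them.  On
// error, 'out' may hold partially decoded data.
Status decode(std::string *out, const std::string& text, bool strict)
{
    assert(out);
    unsigned bits      = 0;   // decoded bits not yet written
    int      numBits   = 0;   // number of such bits
    int      numChars  = 0;   // numeric characters seen
    int      numEquals = 0;   // '=' characters seen
    int      lastValue = 0;   // value of the last numeric character

    for (char c : text) {
        const int value = valueOf(c);
        if (value >= 0) {
            if (numEquals > 0) {
                return e_BAD_FORMAT;   // '=' precedes a numeric char
            }
            bits      = (bits << 6) | static_cast<unsigned>(value);
            numBits  += 6;
            lastValue = value;
            ++numChars;
            if (numBits >= 8) {
                numBits -= 8;
                out->push_back(static_cast<char>((bits >> numBits) & 0xFF));
            }
        }
        else if ('=' == c) {
            if (++numEquals > 2) {
                return e_BAD_FORMAT;   // more than two '='
            }
        }
        else if (strict && !std::isspace(static_cast<unsigned char>(c))) {
            return e_BAD_DATA;
        }
    }

    switch (numChars % 4) {
      case 0:  return 0 == numEquals ? e_OK : e_BAD_FORMAT;
      case 2:  // "xx==": last char must be one of [AQgw]
        return 2 == numEquals && 0 == lastValue % 16 ? e_OK : e_BAD_FORMAT;
      case 3:  // "xxx=": last char must be one of [AEIMQUYcgkosw048]
        return 1 == numEquals && 0 == lastValue % 4 ? e_OK : e_BAD_FORMAT;
      default: return e_BAD_FORMAT;    // assumption: incomplete quantum
    }
}
```

In this sketch, the input AQ#== is rejected as bad data in strict mode and decoded to the single byte 0x01 otherwise, while AR== and AQ=D are rejected as badly formatted in either mode.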

The isError method is used to detect such anomalies, and the numIn output parameter (indicating the number of input characters consumed) or possibly the iterator itself (for iterators with reference-semantics) identifies the offending character.

Note that the existence of an = can be used to reliably indicate the end of the valid data, but no such assurance is possible when the length (in bytes) of the initial input data sequence before encoding was evenly divisible by 3.

Usage

This section illustrates intended use of this component.

Example 1: Basic Usage

The following example shows how to use a bdlde::Base64Encoder object to implement a function, streamEncoder, that reads text from a bsl::istream, encodes that text in base 64 representation, and writes the encoded text to a bsl::ostream. streamEncoder returns 0 on success and a negative value if the input data could not be successfully encoded or if there is an I/O error.

// streamencoder.h -*-C++-*-
/// Read the entire contents of the specified input stream 'is', convert
/// the input plain text to base 64 encoding, and write the encoded text
/// to the specified output stream 'os'. Return 0 on success, and a
/// negative value otherwise.
int streamEncoder(bsl::ostream& os, bsl::istream& is);

We will use fixed-sized input and output buffers in the implementation, but, because of the flexibility of bsl::istream and the output-buffer monitoring functionality of bdlde::Base64Encoder, the fixed buffer sizes do not limit the quantity of data that can be read, encoded, or written to the output stream. The implementation file is as follows.

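// streamencoder.cpp -*-C++-*-
#include <streamencoder.h>

#include <bdlde_base64encoder.h>

#include <bsl_cassert.h>
#include <bsl_istream.h>
#include <bsl_ostream.h>

namespace BloombergLP {

int streamEncoder(bsl::ostream& os, bsl::istream& is)
{
    enum {
        SUCCESS      =  0,
        ENCODE_ERROR = -1,
        IO_ERROR     = -2
    };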

We declare a bdlde::Base64Encoder object converter, which will encode the input data. Note that various internal buffers and cursors are used as needed without further comment. We read as much data as is available from the user-supplied input stream is or as much as will fit in inputBuffer before beginning conversion.

    // The declaration of 'converter' described above; default
    // construction (default encoding options) is assumed here.
    bdlde::Base64Encoder converter;

    const int INBUFFER_SIZE  = 1 << 10;
    const int OUTBUFFER_SIZE = 1 << 10;

    char  inputBuffer[INBUFFER_SIZE];
    char  outputBuffer[OUTBUFFER_SIZE];
    char *output    = outputBuffer;
    char *outputEnd = outputBuffer + sizeof outputBuffer;

    while (is.good()) {  // input stream not exhausted
        is.read(inputBuffer, sizeof inputBuffer);

With inputBuffer now populated, we'll use converter in an inner while loop to encode the input and write the encoded data to outputBuffer (via the output cursor). Note that if the call to converter.convert fails, our function terminates with a negative status.

        const char *input    = inputBuffer;
        const char *inputEnd = input + is.gcount();

        while (input < inputEnd) {  // input encoding not complete
            int numOut;
            int numIn;

            int status = converter.convert(output, &numOut, &numIn,
                                           input, inputEnd,
                                           outputEnd - output);
            if (status < 0) {
                return ENCODE_ERROR;                                  // RETURN
            }

If the call to converter.convert returns successfully, we'll see if the output buffer is full, and if so, write its contents to the user-supplied output stream os. Note how we use the values of numOut and numIn generated by convert to update the relevant cursors.

            output += numOut;
            input  += numIn;

            if (output == outputEnd) {  // output buffer full; write data
                os.write(outputBuffer, sizeof outputBuffer);
                if (os.fail()) {
                    return IO_ERROR;                                  // RETURN
                }
                output = outputBuffer;
            }
        }
    }

We have now exited both the input and the "encode" loops. converter may still hold encoded output characters, and so we call converter.endConvert to emit any retained output. To guarantee correct behavior, we call this method in an infinite loop, because it is possible that the retained output can fill the output buffer. In that case, we solve the problem by writing the contents of the output buffer to os within the loop. The most likely case, however, is that endConvert will return 0, in which case we exit the loop and write any data remaining in outputBuffer to os. As above, if endConvert fails, we exit the function with a negative return status.

    while (1) {
        int numOut;

        int more = converter.endConvert(output, &numOut, outputEnd - output);
        if (more < 0) {
            return ENCODE_ERROR;                                      // RETURN
        }

        output += numOut;

        if (!more) {  // no more output
            break;
        }

        assert(output == outputEnd);  // output buffer is full

        os.write(outputBuffer, sizeof outputBuffer);  // write buffer
        if (os.fail()) {
            return IO_ERROR;                                          // RETURN
        }
        output = outputBuffer;
    }

    if (output > outputBuffer) {  // still data in output buffer; write it all
        os.write(outputBuffer, output - outputBuffer);
    }

    return (is.eof() && os.good()) ? SUCCESS : IO_ERROR;
}

}  // close namespace BloombergLP

For ease of reading, we repeat the full content of the streamencoder.cpp file without interruption.

// streamencoder.cpp -*-C++-*-
#include <streamencoder.h>

#include <bdlde_base64encoder.h>

#include <bsl_cassert.h>
#include <bsl_istream.h>
#include <bsl_ostream.h>

namespace BloombergLP {

int streamEncoder(bsl::ostream& os, bsl::istream& is)
{
    enum {
        SUCCESS      =  0,
        ENCODE_ERROR = -1,
        IO_ERROR     = -2
    };

    // The declaration of 'converter' described in the walkthrough;
    // default construction (default encoding options) is assumed here.
    bdlde::Base64Encoder converter;

    const int INBUFFER_SIZE  = 1 << 10;
    const int OUTBUFFER_SIZE = 1 << 10;

    char  inputBuffer[INBUFFER_SIZE];
    char  outputBuffer[OUTBUFFER_SIZE];
    char *output    = outputBuffer;
    char *outputEnd = outputBuffer + sizeof outputBuffer;

    while (is.good()) {  // input stream not exhausted
        is.read(inputBuffer, sizeof inputBuffer);

        const char *input    = inputBuffer;
        const char *inputEnd = input + is.gcount();

        while (input < inputEnd) {  // input encoding not complete
            int numOut;
            int numIn;

            int status = converter.convert(output, &numOut, &numIn,
                                           input, inputEnd,
                                           outputEnd - output);
            if (status < 0) {
                return ENCODE_ERROR;                                  // RETURN
            }

            output += numOut;
            input  += numIn;

            if (output == outputEnd) {  // output buffer full; write data
                os.write(outputBuffer, sizeof outputBuffer);
                if (os.fail()) {
                    return IO_ERROR;                                  // RETURN
                }
                output = outputBuffer;
            }
        }
    }

    while (1) {
        int numOut;

        int more = converter.endConvert(output, &numOut, outputEnd - output);
        if (more < 0) {
            return ENCODE_ERROR;                                      // RETURN
        }

        output += numOut;

        if (!more) {  // no more output
            break;
        }

        assert(output == outputEnd);  // output buffer is full

        os.write(outputBuffer, sizeof outputBuffer);  // write buffer
        if (os.fail()) {
            return IO_ERROR;                                          // RETURN
        }
        output = outputBuffer;
    }

    if (output > outputBuffer) {  // still data in output buffer; write it all
        os.write(outputBuffer, output - outputBuffer);
    }

    return (is.eof() && os.good()) ? SUCCESS : IO_ERROR;
}

}  // close namespace BloombergLP