BDE 4.14.0 Production release
bslalg_numericformatterutil

Detailed Description


Purpose

Provide a utility for formatting numbers into strings.

Classes

Description

This component, bslalg_numericformatterutil, provides a namespace struct, bslalg::NumericFormatterUtil, containing the overloaded function toChars, which converts integral and floating point types into ASCII strings.

Shortest (Textual) Decimal Representation for Binary Floating Point Values

The floating point toChars implementations (for float and double) of this component provide the shortest (textual) decimal representation that can (later) be parsed back to the original binary value (i.e., a "round-trip" conversion). Such round-tripping enables precise and human-friendly (textual) communication protocols and storage formats that use the minimum necessary bandwidth or storage.

Scientific notation, when chosen, always uses the minimum number of fractional digits necessary to restore the exact binary floating point value. The shortest decimal notation of a binary floating point number is text that has enough decimal fractional digits so that there can be no ambiguity in which binary representation value is closest to it. Notice that the previous sentence only addresses the number of fractional digits in the decimal notation. Floating point values that are mathematically integer are always written as their exact integer value in decimal notation. For large integers it would not strictly be necessary to use the exact decimal value as many integers (differing in some lower-decimal digits) may resolve to the same binary value, but readers may not expect integers to be "rounded", so C and C++ chose to standardize on the exact value.

Note that strictly speaking the C++-defined shortest round trip representation is not the shortest possible one as the C++ scientific notation is defined to possibly contain up to two extra characters: the sign of the exponent is always written (even for positive exponents), and at least 2 decimal digits of the exponent are always written.
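The exponent convention described above (sign always written, at least two exponent digits) is the same one the C library's "%e" conversion follows, so it can be observed without this component. The following is a minimal sketch using only the standard library; toScientific is a hypothetical helper introduced for illustration and is not part of bslalg::NumericFormatterUtil:

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper (not part of BDE): format 'value' in scientific
// notation with 'fractionDigits' digits after the decimal point, using the
// C library's "%e" conversion.
std::string toScientific(double value, int fractionDigits)
{
    assert(0 <= fractionDigits && fractionDigits <= 17);

    char buffer[64];  // ample for sign, 18 digits, '.', 'e', and exponent
    const int len = std::snprintf(buffer, sizeof buffer, "%.*e",
                                  fractionDigits, value);
    if (len < 0 || len >= static_cast<int>(sizeof buffer)) {
        return std::string();  // formatting failed or would not fit
    }
    return std::string(buffer, len);
}
```

With this helper, 2e5 formatted with zero fractional digits occupies five characters, two of which (the '+' and the leading '0' of the exponent) carry no information.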

More information about the difficulty of rendering binary floating point numbers as decimals can be found at https://bloomberg.github.io/bde/articles/binary_decimal_conversion.html . In short, IEEE-754 double precision binary floating point numbers (double) are guaranteed to round-trip when represented by 17 significant decimal digits, while single precision numbers (float) need 9 digits. However, those numbers are the maximum number of decimal digits that may be necessary, and in fact many values can be represented precisely by fewer. toChars renders the minimum number of digits needed so that the value can later be restored.
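To make the "17 is a maximum, not a requirement" point concrete, the sketch below searches for the smallest precision at which a double survives a print-and-parse cycle. It uses only snprintf and strtod, is a brute-force approximation, and is not how toChars is implemented; roundTripDigits is a hypothetical name chosen for this illustration:

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper (not part of BDE): return the smallest number of
// significant decimal digits (1 to 17) for which printing 'value' with
// "%.*g" and parsing the text back with 'strtod' yields the same 'double'.
// The behavior is undefined unless 'value' is finite.
int roundTripDigits(double value)
{
    assert(std::isfinite(value));

    char buffer[64];
    for (int digits = 1; digits < 17; ++digits) {
        std::snprintf(buffer, sizeof buffer, "%.*g", digits, value);
        if (std::strtod(buffer, 0) == value) {
            return digits;
        }
    }
    return 17;  // 17 digits always suffice for an IEEE-754 'double'
}
```

Values such as 0.1 or 2e5 need a single digit, whereas 3.1415926535897932 needs 16. A dedicated shortest-digits algorithm reaches the answer directly instead of retrying at each precision.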

Default Floating Point Format

The default floating point format (used when no format argument is present in the signature) produces the shorter of the decimal notation and the scientific notation, favoring decimal notation in case of a tie.
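The selection rule can be sketched in a few lines, assuming the two candidate renderings are already available as strings. pickDefault is a hypothetical function used only to illustrate the tie-breaking; the component does not necessarily produce both strings before choosing:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of BDE): given the decimal and scientific
// renderings of the same value, return the shorter one, preferring the
// decimal rendering when both have the same length.
std::string pickDefault(const std::string& decimal,
                        const std::string& scientific)
{
    assert(!decimal.empty() && !scientific.empty());

    return decimal.size() <= scientific.size() ? decimal : scientific;
}
```

For example, "200000" (six characters) loses to "2e+05" (five characters), while "10000" and "1e+04" tie at five characters, so the decimal form is chosen.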

Special Floating Point Values

Floating point values may also be special-numerical or non-numerical(*) values in addition to what we consider normal numbers.

There is only one special numerical value: negative zero.

For non-numerical special values, both IEEE-754 and the W3C XML Schema Definition Language (XSD) 1.1(**) numericalSpecialRep require three distinct values to be supported: positive infinity, negative infinity, and NaN. We represent those values according to the XSD lexical mapping specification. That also means that these values will round-trip in text only if the reader algorithm recognizes those representations.

+-------------------+----------------+
| Special Value     | Textual Repr.  |
+-------------------+----------------+
| positive zero     | "0", "0e+00"   |
+-------------------+----------------+
| negative zero     | "-0", "-0e+00" |
+-------------------+----------------+
| positive infinity | "+INF"         |
+-------------------+----------------+
| negative infinity | "-INF"         |
+-------------------+----------------+
| Not-a-number      | "NaN"          |
+-------------------+----------------+

(*) Non-numerical values do not represent a specific mathematical value. Do not confuse non-numerical values with Not-a-Number: NaN is just one of the possible non-numerical values. Positive and negative infinity represent all values too large (in absolute value) to store. NaN represents all other values that cannot be represented by a real number. Non-numerical values normally arise as computation results, such as the square root of -1, which yields Not-a-Number.

(**) https://www.w3.org/TR/xmlschema11-2/
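The table above can be read as a classification followed by a lookup. The sketch below reproduces the decimal-notation column using standard <cmath> classification functions; specialValueText is a hypothetical helper written for this illustration, and it returns an empty string for ordinary numbers rather than formatting them:

```cpp
#include <cassert>  // for the usage checks mentioned after this block
#include <cmath>
#include <string>

// Hypothetical helper (not part of BDE): return the decimal-notation text
// from the table above if 'value' is a zero, an infinity, or NaN, and an
// empty string otherwise.
std::string specialValueText(double value)
{
    if (std::isnan(value)) {
        return "NaN";
    }
    if (std::isinf(value)) {
        return value < 0 ? "-INF" : "+INF";
    }
    if (value == 0.0) {
        // '-0.0 == 0.0' is true, so the sign bit must be inspected.
        return std::signbit(value) ? "-0" : "0";
    }
    return std::string();  // an ordinary number, not covered by the table
}
```

Note that negative zero cannot be detected with a comparison, which is why the sketch uses std::signbit; a check such as assert(specialValueText(-0.0) == "-0") passes, while value < 0 would be false for -0.0.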

Usage

In this section we show the intended use of this component.

Example 1: Writing an Integer to a streambuf

Suppose we want to define a function that writes an int to a streambuf. We can use bslalg::NumericFormatterUtil::toChars to write the int to a buffer, then write the buffer to the streambuf.

First, we declare our function:

void writeJsonScalar(std::streambuf *result, int value)
// Write the specified 'value', in decimal, to the specified 'result'.
{

Then, we declare a buffer long enough to store any int value in decimal.

char buffer[bslalg::NumericFormatterUtil::
                                       ToCharsMaxLength<int>::k_VALUE];
    // size large enough to write 'INT_MIN', the
    // worst-case value, in decimal.

Next, we call the function:

char *ret = bslalg::NumericFormatterUtil::toChars(
                                                buffer,
                                                buffer + sizeof buffer,
                                                value);

Then, we check that the buffer was long enough, which should always be the case:

assert(0 != ret);

Now, we write our buffer to the streambuf:

result->sputn(buffer, ret - buffer);
}

Finally, we use an output string stream buffer to exercise the writeJsonScalar function for int:

std::ostringstream oss;
std::streambuf* sb = oss.rdbuf();
writeJsonScalar(sb, 0);
assert("0" == oss.str());
oss.str("");
writeJsonScalar(sb, 99);
assert("99" == oss.str());
oss.str("");
writeJsonScalar(sb, -1234567890); // worst case: max string length
assert("-1234567890" == oss.str());

Example 2: Writing the Minimal Form of a double

Suppose we want to store a floating point number using decimal text (such as JSON) for later retrieval, using the minimum number of digits that ensures we can later restore the same binary floating point value.

First, we declare our writer function:

void writeJsonScalar(std::streambuf *result,
                     double          value,
                     bool            stringNonNumericValues = false)
    // Write the specified 'value' in the shortest round-trip decimal
    // format into the specified 'result'.  Write non-numeric values
    // according to the optionally specified 'stringNonNumericValues'
    // either as strings "NaN", "+Infinity", or "-Infinity" when
    // 'stringNonNumericValues' is 'true', or a null when it is 'false' or
    // not specified.
{

Then, we handle non-numeric values (toChars would write them the XSD way):

if (isnan(value) || isinf(value)) {
    if (false == stringNonNumericValues) {  // JSON standard output
        result->sputn("null", 4);
    }
    else {                                  // Frequent JSON extension
        if (isnan(value)) {
            result->sputn("\"NaN\"", 5);
        }
        else if (isinf(value)) {
            result->sputn(value < 0 ? "\"-" : "\"+", 2);
            result->sputn("Infinity\"", 9);
        }
    }
    return;                                                       // RETURN
}

Next, we declare a buffer long enough to store any double value written in this minimal-length form:

char buffer[bslalg::NumericFormatterUtil::
                                    ToCharsMaxLength<double>::k_VALUE];
    // large enough to write the longest 'double'
    // without a null terminator character.

Then, we call the function:

char *ret = bslalg::NumericFormatterUtil::toChars(
                                                buffer,
                                                buffer + sizeof buffer,
                                                value);

Now, we can write our buffer to the streambuf:

result->sputn(buffer, ret - buffer);
}

Finally, we use the output string stream buffer defined earlier to exercise the floating point writeJsonScalar function:

oss.str("");
writeJsonScalar(sb, 20211017.0);
assert("20211017" == oss.str());
oss.str("");
writeJsonScalar(sb, 3.1415926535897932);
assert("3.141592653589793" == oss.str());
oss.str("");
writeJsonScalar(sb, 2e5);
assert("2e+05" == oss.str());
oss.str("");  // Non-numeric are written as null by default
writeJsonScalar(sb, std::numeric_limits<double>::quiet_NaN());
assert("null" == oss.str());
oss.str("");  // Non-numeric can be printed as strings
writeJsonScalar(sb, std::numeric_limits<double>::quiet_NaN(), true);
assert("\"NaN\"" == oss.str());
oss.str("");
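The purpose of the minimal form is "later retrieval", so it is worth checking the reading side as well. The sketch below parses text with the standard strtod function and compares the result with the expected double; roundTrips is a hypothetical helper for this illustration, and it assumes the reader runs in the "C" locale (where '.' is the decimal point):

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helper (not part of BDE): return 'true' if the whole of the
// specified null-terminated 'text' parses as a 'double' that compares equal
// to the specified 'expected' value, and 'false' otherwise.
bool roundTrips(const char *text, double expected)
{
    assert(text);

    char *end = 0;
    const double parsed = std::strtod(text, &end);

    // Require that something was parsed and that nothing was left over.
    return end != text && *end == '\0' && parsed == expected;
}
```

Applied to the strings asserted above, "3.141592653589793" restores 3.1415926535897932 exactly even though one digit was dropped from the literal, whereas a truncated form such as "3.14" does not.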

Example 3: Determining The Necessary Minimum Buffer Size

Suppose you are writing code that uses bslalg::NumericFormatterUtil to convert values to text. Determining the necessary buffer sizes to ensure successful conversions, especially for floating point types, is non-trivial, and is usually a distraction from the work at hand. This component provides the ToCharsMaxLength "overloaded" struct template, which parallels the overloaded toChars function variants and provides well-vetted and tested minimum sufficient buffer sizes as compile-time constants.

Determining the sufficient buffer size for any conversion starts with determining "What type are we converting?" and "Do we use an argument to control the conversion, and is that argument a compile-time constant?"

First, because of the descriptive type names we may want to start by locally shortening them using a typedef:

typedef bslalg::NumericFormatterUtil NfUtil;

Next, we determine the sufficient buffer size for converting a long to decimal. long is a type that has different sizeof on different 64-bit platforms, so it is especially convenient to have that difference hidden:

const size_t k_LONG_DEC_SIZE = NfUtil::ToCharsMaxLength<long>::k_VALUE;
    // Sufficient buffer size to convert any 'long' value to decimal text.

Then, we can write the longest possible long successfully into a buffer:

char longDecimalBuffer[k_LONG_DEC_SIZE];
    // We can write any 'long' in decimal into this buffer using
    // 'NfUtil::toChars' safely.

char *p = NfUtil::toChars(longDecimalBuffer,
                          longDecimalBuffer + sizeof longDecimalBuffer,
                          LONG_MIN);
assert(p != 0);

Next, we can get the sufficient size for conversion of an unsigned int to octal:

const size_t k_UINT_OCT_SIZE = NfUtil::ToCharsMaxLength<unsigned, 8>::k_VALUE;

Then, if we do not know what base value toChars will use, we have to assume the longest, which is always base 2:

const size_t k_SHRT_MAX_SIZE = NfUtil::ToCharsMaxLength<short, 2>::k_VALUE;

Now, floating point types have an optional format argument instead of a base, with "default" format as the default; "fixed" and "scientific" formats are selectable when a format argument is specified:

const size_t k_DBL_DFL_SIZE = NfUtil::ToCharsMaxLength<double>::k_VALUE;
const size_t k_FLT_DEC_SIZE = NfUtil::ToCharsMaxLength<
                                                float,
                                                NfUtil::e_FIXED>::k_VALUE;
const size_t k_DBL_SCI_SIZE = NfUtil::ToCharsMaxLength<
                                           double,
                                           NfUtil::e_SCIENTIFIC>::k_VALUE;

Finally, the longest floating point format is e_FIXED, so if the format argument is not known at compile time, e_FIXED should be used:

const size_t k_DBL_MAX_SIZE = NfUtil::ToCharsMaxLength<
                                                double,
                                                NfUtil::e_FIXED>::k_VALUE;
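To give a feel for where such compile-time constants can come from, the sketch below derives a sufficient decimal buffer size for integral types from std::numeric_limits. MaxDecimalLength is a hypothetical template written for this illustration; it covers only base 10 and is not a description of how ToCharsMaxLength is implemented, so prefer the component's own constants in real code:

```cpp
#include <limits>

// Hypothetical template (not part of BDE): 'k_VALUE' is a buffer size
// sufficient to hold any value of the (template parameter) integral type
// 'INTEGER' written in decimal, without a null terminator.
template <class INTEGER>
struct MaxDecimalLength {
    enum {
        // 'digits10' counts the decimal digits that can always be
        // represented, so the largest magnitude needs one more digit;
        // signed types additionally need room for a leading '-'.
        k_VALUE = std::numeric_limits<INTEGER>::digits10 + 1
                + (std::numeric_limits<INTEGER>::is_signed ? 1 : 0)
    };
};
```

For a 32-bit int this yields 11, matching the length of "-2147483648". Floating point sizes are much harder to derive by hand, which is a large part of why the component provides them.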