BDE 4.14.0 Production release
Provide a utility for formatting numbers into strings.

bslalg::NumericFormatterUtil: toChars and support functions

This component, bslalg_numericformatterutil, provides a namespace struct, bslalg::NumericFormatterUtil, containing the overloaded function toChars, which converts integral and floating point types into ASCII strings.
The floating point toChars implementations (for float and double) of this component provide the shortest (textual) decimal representation that can (later) be parsed back to the original binary value (i.e., a "round-trip" conversion). Such round-tripping enables precise yet human-friendly (textual) communication protocols and storage formats that use the minimum necessary bandwidth or storage.
Scientific notation, when chosen, always uses the minimum number of fractional digits necessary to restore the exact binary floating point value. The shortest decimal notation of a binary floating point number is text with enough decimal fractional digits that there is no ambiguity about which binary value it is closest to. Notice that the previous sentence addresses only the number of fractional digits in the decimal notation. Floating point values that are mathematically integers are always written as their exact integer value in decimal notation. For large integers it would not strictly be necessary to use the exact decimal value, as many integers (differing in some lower decimal digits) may resolve to the same binary value, but readers may not expect integers to be "rounded", so C and C++ chose to standardize on the exact value.
Note that strictly speaking the C++-defined shortest round trip representation is not the shortest possible one as the C++ scientific notation is defined to possibly contain up to two extra characters: the sign of the exponent is always written (even for positive exponents), and at least 2 decimal digits of the exponent are always written.
More information about the difficulty of rendering binary floating point numbers as decimals can be found at https://bloomberg.github.io/bde/articles/binary_decimal_conversion.html . In short, IEEE-754 double precision binary floating point numbers (double) are guaranteed to round-trip when represented by 17 significant decimal digits, while single precision (float) needs 9 digits. However, those are the maximum numbers of decimal digits that may be necessary; in fact, many values can be represented precisely by fewer. toChars renders the minimum number of digits needed so that the value can later be restored.
The default floating point format (used when no format argument is present in the signature) uses the shorter of the decimal notation and the scientific notation, favoring decimal notation in case of a tie.
Floating point values may also be special numerical or non-numerical(*) values, in addition to what we consider normal numbers. There is really just one special numerical value: negative zero. For the non-numerical special values, both IEEE-754 and the W3C XML Schema Definition Language (XSD) 1.1(**) numericalSpecialRep require three distinct values to be supported: positive infinity, negative infinity, and NaN. We represent those values according to the XSD lexical mapping specification. That also means that these values will round-trip in text only if the reader algorithm recognizes those representations.
(*) Non-numerical values do not represent a specific mathematical value. Do not confuse non-numerical values with Not-a-Number: NaN is just one of the possible non-numerical values. Positive and negative infinity represent all values too large (in absolute value) to store. NaN represents all other values that cannot be represented by a real number. Non-numerical values normally arise from computations, such as the square root of -1 resulting in Not-a-Number.
(**) https://www.w3.org/TR/xmlschema11-2/
In this section we show the intended use of this component.
Suppose we want to define a function that writes an int to a streambuf. We can use bsl::to_chars to write the int to a buffer, then write the buffer to the streambuf.
First, we declare our function:
Then, we declare a buffer long enough to store any int value in decimal.
Next, we call the function:
Then, we check that the buffer was long enough, which should always be the case:
Now, we write our buffer to the streambuf:
Finally, we use an output string stream buffer to exercise the writeJsonScalar function for int:
Suppose we want to store a floating point number using decimal text (such as JSON) for later retrieval, using the minimum number of digits that ensures we can later restore the same binary floating point value.
First, we declare our writer function:
Then, we handle non-numeric values (toChars would write them the XSD way):
Next, we declare a buffer long enough to store any double value written in this minimal-length form:
Then, we call the function:
Now, we can write our buffer to the streambuf:
Finally, we use the output string stream buffer defined earlier to exercise the floating point writeJsonScalar function:
Suppose you are writing code that uses bslalg::NumericFormatterUtil to convert values to text. Determining the buffer sizes necessary to ensure successful conversions, especially for floating point types, is non-trivial, and is usually a distraction from the task at hand. This component provides the ToCharsMaxLength struct "overloaded" template, which parallels the overloaded toChars function variants and provides well-vetted and tested minimum sufficient buffer sizes as compile-time constants.
Determining the sufficient buffer size for any conversion starts with two questions: "What type are we converting?" and "Do we use an argument to control the conversion, and is that argument a compile-time constant?" First, because of the descriptive type names, we may want to start by locally shortening them using a <tt>typedef</tt>:
@code
typedef bslalg::NumericFormatterUtil NfUtil;
@endcode
Next, we determine the sufficient buffer size for converting a <tt>long</tt> to decimal. <tt>long</tt> is a type that has a different <tt>sizeof</tt> on different 64-bit platforms, so it is especially convenient to have that difference hidden:
@code
const size_t k_LONG_DEC_SIZE = NfUtil::ToCharsMaxLength<long>::k_VALUE;
// Sufficient buffer size to convert any 'long' value to decimal text.
@endcode
Then, we can write the longest possible <tt>long</tt> successfully into a buffer:
@code
char longDecimalBuffer[k_LONG_DEC_SIZE];
// We can write any 'long' in decimal into this buffer using
// 'NfUtil::toChars' safely.
char *p = NfUtil::toChars(longDecimalBuffer,
                          longDecimalBuffer + sizeof longDecimalBuffer,
                          LONG_MIN);
assert(p != 0);
@endcode
Next, we can get the sufficient size for conversion of an <tt>unsigned int</tt> to
octal:
@code
const size_t k_UINT_OCT_SIZE = NfUtil::ToCharsMaxLength<unsigned,
                                                        8>::k_VALUE;
@endcode
Then, if we do not know what <tt>base</tt> value <tt>toChars</tt> will use, we have to assume the longest, which is always base 2:
@code
const size_t k_SHRT_MAX_SIZE = NfUtil::ToCharsMaxLength<short, 2>::k_VALUE;
@endcode
Now, floating point types have an optional <tt>format</tt> argument instead of a <tt>base</tt>, with "default" format as the default; "fixed" and "scientific" formats are selectable when a <tt>format</tt> argument is specified:
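Following the pattern of the earlier snippets, the format-specific sizes might be queried like this (a sketch only, not compiled here; the <tt>e_FIXED</tt> and <tt>e_SCIENTIFIC</tt> enumerator names are assumed from the component's format vocabulary):

```cpp
const size_t k_DBL_DEFAULT_SIZE = NfUtil::ToCharsMaxLength<double>::k_VALUE;
    // Sufficient for the default (shortest) format.

const size_t k_DBL_SCI_SIZE =
    NfUtil::ToCharsMaxLength<double, NfUtil::e_SCIENTIFIC>::k_VALUE;
    // Sufficient when 'toChars' is called with the scientific format.

const size_t k_DBL_FIXED_SIZE =
    NfUtil::ToCharsMaxLength<double, NfUtil::e_FIXED>::k_VALUE;
    // Sufficient when 'toChars' is called with the fixed format.
```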
Finally, the longest floating point format is <tt>e_FIXED</tt>, so if the <tt>format</tt> argument is not known at compile time, <tt>e_FIXED</tt> should be used:
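So when the format is chosen at run time, one might size the buffer for the fixed-notation worst case, following the document's guidance (again a sketch against the component API, not compiled here):

```cpp
const size_t k_DBL_ANY_SIZE =
    NfUtil::ToCharsMaxLength<double, NfUtil::e_FIXED>::k_VALUE;
    // 'e_FIXED' is the longest format, so a buffer of this size is
    // sufficient no matter which 'format' argument is passed at run
    // time.

char doubleBuffer[k_DBL_ANY_SIZE];
```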