Provide a utility for formatting numbers into strings.
Detailed Description
- Purpose:
- Provide a utility for formatting numbers into strings.
-
- Classes:
-
-
- Description:
- This component, bslalg_numericformatterutil, provides a namespace struct, bslalg::NumericFormatterUtil, containing the overloaded function toChars, that converts integral and floating point types into ASCII strings.
-
- Shortest (Textual) Decimal Representation for Binary Floating Point Values:
- The floating point toChars implementations (for float and double) of this component provide the shortest (textual) decimal representation that can later be parsed back to the original binary value (i.e., a "round-trip" conversion). Such round-tripping enables precise, human-friendly textual communication protocols and storage formats that use the minimum necessary bandwidth or storage.
- Scientific notation, when chosen, always uses the minimum number of fractional digits necessary to restore the exact binary floating point value. The shortest decimal notation of a binary floating point number is text that has enough decimal fractional digits so that there is no ambiguity about which binary value is closest to it. Notice that the previous sentence addresses only the number of fractional digits in the decimal notation. Floating point values that are mathematically integers are always written as their exact integer value in decimal notation. For large integers it would not strictly be necessary to use the exact decimal value, as many integers (differing in some lower decimal digits) may resolve to the same binary value, but readers may not expect integers to be "rounded", so C and C++ chose to standardize on the exact value.
- Note that strictly speaking the C++-defined shortest round trip representation is not the shortest possible one as the C++ scientific notation is defined to possibly contain up to two extra characters: the sign of the exponent is always written (even for positive exponents), and at least 2 decimal digits of the exponent are always written.
- More information about the difficulty of rendering binary floating point numbers as decimals can be found at https://bloomberg.github.io/bde/articles/binary_decimal_conversion.html . In short, IEEE-754 double precision binary floating point numbers (double) are guaranteed to round-trip when represented by 17 significant decimal digits, while single precision numbers (float) need 9 digits. However, those numbers are the maximum decimal digits that may be necessary, and in fact many values can be represented precisely by fewer. toChars renders the minimum number of digits needed, so that the value can later be restored.
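- The digit counts above can be explored with a small sketch. It does not use this component: minRoundTripDigits is a hypothetical helper that searches for the smallest number of significant digits that round-trips, using only snprintf and strtod, and it assumes a finite input and a C library that rounds both conversions correctly.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical helper, not part of the component: return the smallest number
// of significant decimal digits (1 to 17) whose text parses back to exactly
// 'value'.  Assumes 'value' is finite and that 'snprintf' and 'strtod' are
// correctly rounded, as they are in mainstream C libraries.
int minRoundTripDigits(double value)
{
    char text[32];  // "%.16e" needs at most 24 characters plus the terminator
    for (int digits = 1; digits < 17; ++digits) {
        // "%.*e" prints one leading digit plus 'digits - 1' fractional ones.
        std::snprintf(text, sizeof text, "%.*e", digits - 1, value);
        if (std::strtod(text, nullptr) == value) {
            return digits;
        }
    }
    return 17;  // 17 digits always suffice for an IEEE-754 'double'
}
```

- With this helper, 0.1 needs a single digit, 3.1415926535897932 needs 16, and 0.30000000000000004 needs the full 17, which illustrates why always printing 17 digits is wasteful for most values.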
-
- Default Floating Point Format:
- The default floating point format (that is used when no format argument is present in the signature) uses the shorter of the decimal notation and the scientific notation, favoring decimal notation in case of a tie.
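- The selection rule can be sketched as a comparison of two candidate strings. pickDefaultNotation is a hypothetical helper written for illustration only; it assumes the caller already has the shortest round-trip text of the same value in both notations, and it is not how the component is implemented.

```cpp
#include <string>

// Hypothetical helper, not part of the component: given the shortest
// round-trip text of one value in decimal and in scientific notation, return
// the candidate the rule above would select.  Using '<=' rather than '<'
// makes decimal notation win a tie.
std::string pickDefaultNotation(const std::string& decimal,
                                const std::string& scientific)
{
    return decimal.size() <= scientific.size() ? decimal : scientific;
}
```

- For the value 2e5 the candidates are "200000" (6 characters) and "2e+05" (5 characters), so the scientific form is chosen. For 10000 the candidates "10000" and "1e+04" tie at 5 characters, so the decimal form is chosen.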
-
- Special Floating Point Values:
- Floating point values may also be special-numerical or non-numerical(*) values in addition to what we consider normal numbers.
- There is really just one special numerical value: negative zero.
- For non-numerical special values, both IEEE-754 and the W3C XML Schema Definition Language (XSD) 1.1(**) numericalSpecialRep require three distinct values to be supported: positive infinity, negative infinity, and NaN. We represent those values according to the XSD lexical mapping specification. That also means that these values will round trip in text only if the reader algorithm recognizes those representations.
+-------------------+----------------+
| Special Value | Textual Repr. |
+-------------------+----------------+
| positive zero | "0", "0e+00" |
+-------------------+----------------+
| negative zero | "-0", "-0e+00" |
+-------------------+----------------+
| positive infinity | "+INF" |
+-------------------+----------------+
| negative infinity | "-INF" |
+-------------------+----------------+
| Not-a-number | "NaN" |
+-------------------+----------------+
- (*) Non-numerical values do not represent a specific mathematical value. Do not confuse non-numerical values with Not-a-Number. NaN is just one of the possible non-numerical values. The positive and negative infinity represent all values too large (in their absolute value) to store. NaN represents all other values that cannot be represented by a real number. Non-numerical values normally come from computation results such as the square root of -1 resulting in Not-a-Number.
- (**) https://www.w3.org/TR/xmlschema11-2/
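- A reader or writer that has to handle these values itself can classify them with the standard <cmath> functions. The sketch below uses a hypothetical helper, specialValueText, that returns the decimal-notation text from the table above; it only mirrors the table and is not the component's implementation.

```cpp
#include <cmath>
#include <limits>
#include <string>

// Hypothetical helper, not part of the component: return the decimal-notation
// text listed in the table above if 'value' is a zero, an infinity, or NaN,
// and an empty string for every other value.
std::string specialValueText(double value)
{
    if (std::isnan(value)) {
        return "NaN";
    }
    if (std::isinf(value)) {
        return value < 0 ? "-INF" : "+INF";
    }
    if (value == 0.0) {
        // '-0.0 == 0.0' is true, so the sign bit must be inspected directly.
        return std::signbit(value) ? "-0" : "0";
    }
    return std::string();
}
```

- Note that negative zero compares equal to positive zero, so only std::signbit distinguishes the two.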
-
- Usage:
- In this section we show the intended use of this component.
-
- Example 1: Writing an Integer to a streambuf:
- Suppose we want to define a function that writes an int to a streambuf. We can use bslalg::NumericFormatterUtil::toChars to write the int to a buffer, then write the buffer to the streambuf.
- First, we declare our function:
void writeJsonScalar(std::streambuf *result, int value)
{
- Then, we declare a buffer long enough to store any int value in decimal:
    char buffer[bslalg::NumericFormatterUtil::ToCharsMaxLength<int>::k_VALUE];
- Next, we call the function:
    char *ret = bslalg::NumericFormatterUtil::toChars(
                                                    buffer,
                                                    buffer + sizeof buffer,
                                                    value);
- Then, we check that the buffer was long enough, which should always be the case:
    assert(ret != 0);
- Now, we write our buffer to the streambuf:
    result->sputn(buffer, ret - buffer);
}
- Finally, we use an output string stream buffer to exercise the writeJsonScalar function for int:
std::ostringstream oss;
std::streambuf* sb = oss.rdbuf();
writeJsonScalar(sb, 0);
assert("0" == oss.str());
oss.str("");
writeJsonScalar(sb, 99);
assert("99" == oss.str());
oss.str("");
writeJsonScalar(sb, -1234567890);
assert("-1234567890" == oss.str());
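- For comparison, the same buffer-based pattern can be sketched with the C++17 standard function std::to_chars, used here purely as a stand-in for toChars. writeDecimal is a hypothetical helper; unlike the examples in this section, which test the returned pointer against 0, the standard function reports a too-short buffer through an error code.

```cpp
#include <charconv>
#include <cstddef>
#include <string>
#include <system_error>

// Hypothetical helper built on the standard 'std::to_chars' (C++17), which
// stands in for the component's 'toChars' in this sketch.  Write 'value' in
// decimal into '[first, last)' and return the number of characters written,
// or 0 if the buffer is too short.  No terminating null is written.
std::size_t writeDecimal(char *first, char *last, int value)
{
    const std::to_chars_result result = std::to_chars(first, last, value);
    if (result.ec != std::errc()) {
        return 0;  // 'std::errc::value_too_large': the text did not fit
    }
    return static_cast<std::size_t>(result.ptr - first);
}
```

- Because neither function writes a terminating null character, the length has to be carried alongside the buffer, exactly as the sputn call in the example above does with ret - buffer.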
-
- Example 2: Writing the Minimal Form of a double:
- Suppose we want to store a floating point number using decimal text (such as JSON) for later retrieval, using the minimum number of digits that ensures we can later restore the same binary floating point value.
- First, we declare our writer function:
void writeJsonScalar(std::streambuf *result,
double value,
bool stringNonNumericValues = false)
{
- Then, we handle non-numeric values (toChars would write them the XSD way):
    if (isnan(value) || isinf(value)) {
        if (false == stringNonNumericValues) {
            result->sputn("null", 4);
        }
        else {
            if (isnan(value)) {
                result->sputn("\"NaN\"", 5);
            }
            else if (isinf(value)) {
                result->sputn(value < 0 ? "\"-" : "\"+", 2);
                result->sputn("Infinity\"", 9);
            }
        }
        return;
    }
- Next, we declare a buffer long enough to store any double value written in this minimal-length form:
    char buffer[bslalg::NumericFormatterUtil::ToCharsMaxLength<double>::k_VALUE];
- Then, we call the function:
    char *ret = bslalg::NumericFormatterUtil::toChars(
                                                    buffer,
                                                    buffer + sizeof buffer,
                                                    value);
- Now, we can write our buffer to the streambuf:
    result->sputn(buffer, ret - buffer);
}
- Finally, we use the output string stream buffer defined earlier to exercise the floating point writeJsonScalar function:
oss.str("");
writeJsonScalar(sb, 20211017.0);
assert("20211017" == oss.str());
oss.str("");
writeJsonScalar(sb, 3.1415926535897932);
assert("3.141592653589793" == oss.str());
oss.str("");
writeJsonScalar(sb, 2e5);
assert("2e+05" == oss.str());
oss.str("");
writeJsonScalar(sb, std::numeric_limits<double>::quiet_NaN());
assert("null" == oss.str());
oss.str("");
writeJsonScalar(sb, std::numeric_limits<double>::quiet_NaN(), true);
assert("\"NaN\"" == oss.str());
oss.str("");
-
- Example 3: Determining The Necessary Minimum Buffer Size:
- Suppose you are writing code that uses bslalg::NumericFormatterUtil to convert values to text. Determining the buffer sizes necessary to ensure successful conversions, especially for floating point types, is non-trivial and usually a distraction from the flow of the work. This component provides the ToCharsMaxLength struct "overloaded" template that parallels the overloaded toChars function variants and provides well-vetted and tested minimum sufficient buffer sizes as compile-time constants.
- Determining the sufficient buffer size for any conversion starts with two questions: "What type are we converting?" and "Do we use an argument to control the conversion, and is that argument a compile-time constant?"
- First, because of the descriptive type names, we may want to start by locally shortening them using a typedef:
typedef bslalg::NumericFormatterUtil NfUtil;
- Next, we determine the sufficient buffer size for converting a long to decimal. long is a type that has a different sizeof on different 64-bit platforms, so it is especially convenient to have that difference hidden:
const size_t k_LONG_DEC_SIZE = NfUtil::ToCharsMaxLength<long>::k_VALUE;
- Then, we can write the longest possible long successfully into a buffer:
char longDecimalBuffer[k_LONG_DEC_SIZE];
char *p = NfUtil::toChars(longDecimalBuffer,
                          longDecimalBuffer + sizeof longDecimalBuffer,
                          LONG_MIN);
assert(p != 0);
- Next, we can get the sufficient size for conversion of an unsigned int to octal:
const size_t k_UINT_OCT_SIZE = NfUtil::ToCharsMaxLength<unsigned,
                                                        8>::k_VALUE;
- Then, if we do not know what base value toChars will use, we have to assume the longest, which is always base 2:
const size_t k_SHRT_MAX_SIZE = NfUtil::ToCharsMaxLength<short, 2>::k_VALUE;
- Now, floating point types have an optional format argument instead of a base, with the "default" format used when none is given; the "fixed" and "scientific" formats are selectable when a format argument is specified:
const size_t k_DBL_DFL_SIZE = NfUtil::ToCharsMaxLength<double>::k_VALUE;
const size_t k_FLT_DEC_SIZE = NfUtil::ToCharsMaxLength<
                                                float,
                                                NfUtil::e_FIXED>::k_VALUE;
const size_t k_DBL_SCI_SIZE = NfUtil::ToCharsMaxLength<
                                                double,
                                                NfUtil::e_SCIENTIFIC>::k_VALUE;
- Finally, the longest floating point format is e_FIXED, so if the format argument is not known at compile time, e_FIXED should be used:
const size_t k_DBL_MAX_SIZE = NfUtil::ToCharsMaxLength<
                                                double,
                                                NfUtil::e_FIXED>::k_VALUE;
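- To see why base 2 is always the longest for integers, and why the size for long varies by platform, the digit count can be derived by hand. The sketch below is a hypothetical constexpr function, maxIntegerChars, that counts the digits of the largest magnitude a type can hold; it assumes two's complement integers and C++14, and it is an illustration only, not the definition of ToCharsMaxLength.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical helper, not part of the component: return the number of
// characters needed to write any value of the integral type 'T' in the
// specified 'base' (2 to 36), including a leading '-' for signed types and
// excluding any terminating null.  Assumes two's complement representation.
template <class T>
constexpr std::size_t maxIntegerChars(unsigned base)
{
    // Largest magnitude: '|min| == max + 1' for signed types, else 'max'.
    unsigned long long magnitude =
        static_cast<unsigned long long>(std::numeric_limits<T>::max());
    std::size_t length = 0;
    if (std::numeric_limits<T>::is_signed) {
        magnitude += 1;  // cannot overflow: signed 'max' is below 2^63
        length = 1;      // room for the minus sign
    }
    while (magnitude != 0) {  // one character per digit in 'base'
        ++length;
        magnitude /= base;
    }
    return length;
}

// The smaller the base, the more digits are needed.
static_assert(maxIntegerChars<short>(2) > maxIntegerChars<short>(10),
              "base 2 needs more characters than base 10");
```

- On a platform with a 16-bit short and a 32-bit int, this yields 17 for short in base 2, 11 for unsigned in base 8, and 11 for int in base 10 (the length of "-2147483648"). The result for long depends on whether the platform gives it 32 or 64 bits, which is the difference the component's constants hide.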