Mar 30, 2021
Binary to Decimal and Back Again
Abstract
Handling floating-point numbers in computer systems is hard. Types like double and float (IEEE 754-1985) both provide approximations to the mathematical concept of a real number, which, for the most part, allow us to ignore this complexity. However, software that exposes the limits of these approximations can be a source of errors, and one particularly common source of errors at Bloomberg is the conversion between binary representations of floating-point numbers (i.e., double and float) and decimal representations (like the string “3.14159”). Such conversions are especially relevant at Bloomberg because financial calculations are governed by laws and expectations that are based on decimal (base-10) thinking. In this article we seek to explain the complexity in converting between binary and decimal floating-point representations, and to guide developers in performing such conversions safely.
Introduction
Modern computer systems typically operate on floating-point numbers represented in IEEE-754 binary format. Binary floating-point numbers are well suited to scientific computation because of the high performance of hardware floating-point units (FPUs), their high precision, and carefully designed mathematical operations that minimize the accumulation of rounding errors during lengthy calculations. However, binary floating-point numbers are far less suitable for computations whose results must match human, decimal expectations, particularly in the financial sector.
Financial calculations are governed by laws and expectations that are based on decimal (base-10) thinking. Since binary floating-point cannot represent most decimal values exactly (0.1, for example, has no finite binary expansion), it is difficult to use binary types while maintaining decimal accuracy requirements; performing such decimal-based calculations and algorithms with binary floating-point numbers is so hard that it is generally considered infeasible. The IEEE-754 committee recognized the issue and added specifications for three decimal floating-point types to the 2008 revision of the standard: the 32-, 64-, and 128-bit decimal floating-point formats (BDE provides implementations of decimal floating-point types in the bdldfp_decimal component).
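A minimal sketch of the problem in standard C++, assuming IEEE-754 double arithmetic (the helper names are ours, for illustration only): each of the literals 0.1, 0.2 and 0.3 is rounded to the nearest double, so a sum that is exact in decimal is not exact in binary.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: format a double with 17 significant digits, which is
// enough to reveal which double was actually stored.
std::string toString17(double value)
{
    char buf[64];
    const int n = std::snprintf(buf, sizeof buf, "%.17g", value);
    assert(n > 0 && n < static_cast<int>(sizeof buf));
    return std::string(buf, n);
}

// Hypothetical helper: does binary arithmetic agree with decimal arithmetic
// for 0.1 + 0.2 == 0.3?  With IEEE-754 doubles it does not.
bool decimalSumIsExact()
{
    const double sum = 0.1 + 0.2;  // both operands are already approximations
    return sum == 0.3;
}
```

Here toString17(0.1) is "0.10000000000000001", toString17(0.1 + 0.2) is "0.30000000000000004", and decimalSumIsExact() returns false.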
Having two numeric formats, binary floating-point representations (like float and double) and decimal floating-point representations (like the bdldfp::Decimal types and any textual representation of a floating-point value), raises the need to convert between them.
Converting numbers from binary floating-point to decimal format is fraught with misunderstanding, and is often accompanied by ill-conceived attempts to “correct rounding errors” and otherwise coerce results into aesthetically pleasing forms. In the Bloomberg environment there is the additional complication of IBM/Perkin-Elmer/Interdata floating-point. This format uses a base-16 rather than a base-2 underlying representation, and the Bloomberg environment contains numbers that began as decimals, were converted to IBM format, and were then converted from IBM format to IEEE format.
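To make the base-16 format concrete, here is a minimal decoding sketch. It assumes the classic IBM System/360 single-precision layout (also used by Interdata and Perkin-Elmer machines): a sign bit, a 7-bit exponent in excess-64 that scales by powers of 16, and a 24-bit fraction with no hidden bit. The function names are hypothetical; this is not a BDE facility.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical decoder for a 32-bit IBM hexadecimal floating-point word:
//   bit 31      sign
//   bits 30-24  exponent, excess 64, scaling by powers of 16
//   bits 23-0   fraction 0.F (no hidden bit)
// value = (-1)^sign * (F / 2^24) * 16^(exponent - 64)
double ibmHexToDouble(std::uint32_t word)
{
    const bool          negative = (word >> 31) != 0;
    const int           exponent = static_cast<int>((word >> 24) & 0x7F);
    const std::uint32_t fraction = word & 0x00FFFFFFu;

    // 16^(e - 64) == 2^(4 * (e - 64)); the extra -24 turns the integer
    // fraction into 0.F.  Every such value is exactly representable in a
    // double, so this decoding step itself introduces no rounding.
    const double magnitude =
        std::ldexp(static_cast<double>(fraction), 4 * (exponent - 64) - 24);
    return negative ? -magnitude : magnitude;
}

void checkIbmExamples()
{
    assert(ibmHexToDouble(0x41100000u) == 1.0);    // (1/16) * 16^1
    assert(ibmHexToDouble(0x42640000u) == 100.0);  // (0x64/256) * 16^2
    assert(ibmHexToDouble(0xC1100000u) == -1.0);   // sign bit set
}
```

Because the exponent scales by 16, a normalized fraction may begin with up to three zero bits, so the effective precision varies between 21 and 24 bits; a decimal that passed through this format can therefore carry a different rounding error than one converted directly to an IEEE float.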
Preserving Equality Through Conversions (“Round-Trip Equality”)
When we consider conversions between binary and decimal representations, an important property to maintain is that a value converted to the other representation and back is equal to its original value (what we call “round-trip equality”). That is, we want to be able to take a number in a decimal representation, convert it to a binary representation, convert it back to a decimal representation, and have the resulting value be equal to the value we started with. The same property is desirable in the other direction, from binary to decimal and back to binary.
Decimal to Binary Conversion
Generally, when a decimal floating-point value is converted to binary floating-point, the result is the representable binary number nearest in value to that decimal. Unless the decimal value is exactly an integer multiple of a power of two (e.g., 3.4375 = 55 * 1/16) whose multiplier fits in the binary type's significand, the converted binary value cannot be equal to the decimal value; it can only be close. It is also desirable that two different decimal values convert to two different binary values.
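A quick check of this claim in standard C++ (helper names are ours), assuming IEEE-754 doubles and, as with glibc, a printf that rounds the exact binary value correctly: 3.4375 = 55 * 2^-4 converts exactly, while printing 0.1 with more digits than the literal has reveals that only a nearby value was stored.

```cpp
#include <cassert>
#include <cmath>
#include <cstdio>
#include <string>

// Hypothetical helper: fixed notation with `places` digits after the point.
// With enough places this prints the exact value of the stored double.
std::string fixedDigits(double value, int places)
{
    char buf[512];
    const int n = std::snprintf(buf, sizeof buf, "%.*f", places, value);
    assert(n > 0 && n < static_cast<int>(sizeof buf));
    return std::string(buf, n);
}

// 3.4375 == 55 * 2^-4, so the literal converts to a double exactly.
bool sixteenthsAreExact()
{
    return 3.4375 == std::ldexp(55.0, -4);
}
```

For example, fixedDigits(3.4375, 25) is "3.4375000000000000000000000", while fixedDigits(0.1, 25) is "0.1000000000000000055511151".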
D ---4-------------------5-------------> Decimals with K significant digits
d -9-0-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-> Decimals with K + 1 significant digits
B ---0----1----2----3----4----5----6---> Binary values
The diagram above shows three number scales of different precisions. The first and second scales show decimal numbers having K and K + 1 significant digits, respectively (for example, 0.12345 and 0.123456). The third is the binary scale, which shows the representable values of the target binary type float or double. The diagram illustrates that there is a maximum number of significant digits (in our case, K) for which different decimals are converted to different binary values:
On the first scale, D5 is the decimal number whose K-th significant digit is 5. If D5 is an exact multiple of a power of two, D5 = n * 2^m for integers n and m (e.g., 1.71875 = 55 * 1/32), then it converts to exactly the single binary value B4 on the binary scale.
On the second scale, d50 is the decimal number whose K-th and (K + 1)-th significant digits are 5 and 0, respectively. The values d49, d50, and d51 (in this illustration, 1.718749, 1.718750, and 1.718751) all lie nearer to B4 than to any other binary value, so all three convert to B4.
This conversion breaks the round-trip equality property: once d49, d50, and d51 have all been converted to B4, converting back to decimal can restore at most one of them.
It was observed that all decimals with K = 6 significant digits convert to distinct float values, but 8589973000 and 8589974000, with K = 7 significant digits, convert to the same float value. Similarly, all decimals with K = 15 significant digits convert to unique double values, but 900719925474.0992 and 900719925474.0993, with K = 16 significant digits, convert to the same double value. Over restricted ranges the maximum may be higher; for example, every decimal value with 7 or fewer significant digits in the range (1*10^-3 .. 8.5*10^9) converts to a unique float value.
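These collisions can be reproduced with the standard parsing functions. A minimal sketch (helper names are ours), assuming IEEE-754 float and double and a correctly rounding strtof/strtod, as provided by mainstream C libraries:

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical helpers: parse two decimal strings and report whether they
// round to the same binary value.
bool sameFloat(const char *a, const char *b)
{
    return std::strtof(a, nullptr) == std::strtof(b, nullptr);
}

bool sameDouble(const char *a, const char *b)
{
    return std::strtod(a, nullptr) == std::strtod(b, nullptr);
}

void checkCollisions()
{
    // 6 significant digits: distinct floats.  7 digits: collision.
    assert(!sameFloat("8589970000", "8589980000"));
    assert(sameFloat("8589973000", "8589974000"));

    // 15 significant digits: distinct doubles.  16 digits: collision.
    assert(!sameDouble("900719925474.099", "900719925474.100"));
    assert(sameDouble("900719925474.0992", "900719925474.0993"));
}
```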
Note
When converting a decimal representation (such as a character string containing a monetary value) to either float or double, the maximum number of significant decimal digits K for which every decimal value can later be restored is given by the constants std::numeric_limits<float>::digits10 (6) and std::numeric_limits<double>::digits10 (15), respectively.
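The guarantee behind digits10 can be exercised directly: parse a decimal with at most digits10 significant digits into the binary type, then format it again with the same number of digits. A minimal sketch using strtof and snprintf (the helper name is ours; IEEE-754 types assumed):

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <limits>
#include <string>

static_assert(std::numeric_limits<float>::digits10 == 6,
              "IEEE-754 single precision assumed");
static_assert(std::numeric_limits<double>::digits10 == 15,
              "IEEE-754 double precision assumed");

// Hypothetical helper: decimal -> float -> decimal, using `digits`
// significant digits on the way back.  If the input has at most
// numeric_limits<float>::digits10 significant digits and `digits` equals
// that count, the original decimal is restored.
std::string roundTripViaFloat(const char *decimal, int digits)
{
    assert(digits > 0);
    const float binary = std::strtof(decimal, nullptr);
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.*g", digits, static_cast<double>(binary));
    return buf;
}
```

For instance, "3.14159" comes back as "3.14159", whereas the 7-digit "8589973000" comes back as "8.589974e+09" because it collides with 8589974000.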
Binary to Decimal Conversion
Round-trip conversion from binary floating-point to decimal and back requires choosing a number of significant digits for the intermediate decimal representation. Choose too few, and multiple binary values will convert to the same decimal value. For example, in the diagram above, B3, B4, and B5 all convert to D5 when K digits are used, whereas with K + 1 digits B4 converts to d50. There is a minimum number of significant digits such that all representable binary floating-point values convert to distinct decimal values: 9 digits are required for float and 17 for double. For example, the two 32-bit float values 0x447FFF92 and 0x447FFF93 both convert to 1023.9933 when only 8 digits are used, but convert to 1023.99329 and 1023.99335, respectively, when 9 digits are used. Note that while using this minimum number of digits guarantees unique decimal values, a shorter decimal value may also convert back to the same binary value, and some conversion methods, such as std::to_chars (which first appeared in the C++17 Standard) when called without a precision, are required to produce the shortest representation that converts back exactly. With our sample numbers, 1023.9933 also converts to 0x447FFF92, so we might prefer that result to the longer 1023.99329 when converting 0x447FFF92 to decimal.
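The float example above can be checked with a short sketch: reinterpret the bit patterns as float via std::memcpy (assuming a 32-bit IEEE-754 float), then format with 8 and 9 significant digits. The helper names are ours, and shortestDigits is a simple search over precisions, not the algorithm std::to_chars uses.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical helper: reinterpret 32 bits as an IEEE-754 float.
float floatFromBits(std::uint32_t bits)
{
    static_assert(sizeof(float) == sizeof bits, "32-bit float assumed");
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Hypothetical helper: round to `digits` significant digits ("%g" style).
std::string formatFloat(float value, int digits)
{
    assert(digits > 0);
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.*g", digits, static_cast<double>(value));
    return buf;
}

// Smallest number of significant digits whose "%g" output strtof converts
// back to exactly `value`; 9 always suffices for a finite float.
int shortestDigits(float value)
{
    for (int digits = 1; digits < 9; ++digits) {
        if (std::strtof(formatFloat(value, digits).c_str(), nullptr) == value) {
            return digits;
        }
    }
    return 9;
}
```

With these helpers, 0x447FFF92 needs only 8 digits ("1023.9933"), while its neighbour 0x447FFF93 needs all 9 ("1023.99335").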
Note
When converting a binary floating-point representation to a decimal representation (such as a character string or bdldfp::Decimal), the minimum number of significant digits K needed to represent every distinct value uniquely, so that the binary floating-point value can later be restored, is given by the constants std::numeric_limits<float>::max_digits10 (9) and std::numeric_limits<double>::max_digits10 (17), respectively.
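Conversely, max_digits10 is what makes binary to decimal to binary lossless. In this sketch (the helper name is ours, IEEE-754 double assumed) a double survives formatting with 17 significant digits, while 15 digits (digits10) can lose it:

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <limits>

constexpr int kMaxDigits10 = std::numeric_limits<double>::max_digits10;  // 17

// Hypothetical helper: format `value` with `digits` significant digits,
// parse it back with strtod, and report whether the result compares equal
// to the original.
bool roundTripsWith(double value, int digits)
{
    assert(digits > 0);
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.*g", digits, value);
    return std::strtod(buf, nullptr) == value;
}
```

For example, 0.1 + 0.2 round-trips with kMaxDigits10 digits, but with 15 digits it is written as "0.3" and parses back as a different double.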
Note
To summarize:
std::numeric_limits<T>::digits10: the maximum number of significant decimal digits such that all decimal values with that many digits can be uniquely represented in the binary floating-point representation T (decimal → T → decimal restores the value).
std::numeric_limits<T>::max_digits10: the number of significant decimal digits required to uniquely represent all possible values of the binary floating-point representation T (T → decimal → T restores the value).
Conversion Methods
Floating-point numbers can be encoded in different formats for a variety of purposes:
Binary floating-point representation that is commonly used for scientific computations.
Decimal floating-point representation that provides accuracy for financial computations.
Decimal character representation used to encode floating-point values in human readable formats, and in text-based interchange (JSON or XML).
The following sections describe different alternatives for converting between these representations:
Conversion From Character Floating-Point Representation
From Character String to Binary Floating-Point (float, double)
If a floating-point value is represented as a (decimal) character string, then a binary floating-point value can be obtained in the following ways:
Initializing or assigning a binary floating-point variable with a literal in source code (e.g., double x = 1.5;). This is the most common type of conversion, one that almost no program can do without. It is so integral to the language that it is often overlooked, but the programmer should keep in mind that, for the initial value to round-trip, the number of significant digits should not exceed 6 for float (7 in the 1*10^-3 .. 8.5*10^9 range) and 15 for double.
The utility functions in the bdlb_numericparseutil component:
bdlb::NumericParseUtil::parseDouble. This function provides behavior that parallels the Standard Library strtod function, but corrects the implementation deficiencies in strtod noted below (consistently handling the special values NaN and Infinity, both positive and negative), treats leading white space as an error (which is more suitable for strictly parsing textual encodings), and provides consistent functionality on all of Bloomberg’s production platforms.
The Standard Library functions:
std::atof (double only)
std::strtof (since C++11), std::strtod (since C++98), std::strtold (since C++11). For developers who must support all Bloomberg production platforms, std::strtod is the primary Standard Library facility available (though we would encourage using bdlb::NumericParseUtil). There are a couple of points of non-conformance to be aware of:
Visual Studio, up to and including MSVC 2013, does not parse the special double values NaN and Infinity as the standard specifies for these functions.
libstdc++, the library implementation used with GCC, parses negative NaN but returns positive NaN as the result.
std::stof, std::stod, std::stold (since C++11)
std::from_chars (since C++17) is a locale-independent, non-allocating, and non-throwing alternative to the functions above. It is designed for speed, which makes it useful in common high-throughput contexts such as text-based interchange (JSON or XML).
Note
Round-trip equality when using std::from_chars to recover a floating-point value from a string representation formatted by std::to_chars is only guaranteed if both functions are from the same implementation.
Note
std::from_chars is not supported by all of the C++ Standard Library implementations used at Bloomberg. According to the “C++ compiler support” page, std::from_chars for float and double is:
Supported in GCC 11.1.0 and later
Supported in Visual Studio 2017 version 19.15 and later (note: from_chars was supported earlier than to_chars)
Not supported in Clang (as of version 11)
Not supported in Sun Studio CC (as of version 12.4)
Not supported in IBM xlC (as of version 12)
All of these functions produce a binary floating-point value which is one of at most two floating-point values closest to the value of the input character string.
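For code that must stay with std::strtod, the usual pattern is to check the end pointer and errno. Below is a minimal sketch of a strict wrapper in the spirit of the bdlb::NumericParseUtil behavior described above (leading white space and trailing characters rejected). The function name and its out-parameter interface are our own, not a BDE API.

```cpp
#include <cassert>
#include <cctype>
#include <cerrno>
#include <cstdlib>
#include <string>

// Hypothetical strict parser: succeeds only if the entire string is a valid
// floating-point number with no leading white space.  On overflow strtod
// returns +/-HUGE_VAL and sets errno to ERANGE, which we treat as failure.
// (Some C libraries also set ERANGE on underflow; this sketch rejects that.)
bool parseDoubleStrict(double *result, const std::string& text)
{
    assert(result);
    if (text.empty() || std::isspace(static_cast<unsigned char>(text[0]))) {
        return false;
    }
    const char *begin = text.c_str();
    char       *end   = nullptr;
    errno = 0;
    const double value = std::strtod(begin, &end);
    if (end == begin || *end != '\0' || errno == ERANGE) {
        return false;
    }
    *result = value;
    return true;
}
```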
From Character String to Decimal Floating-Point (bdldfp::Decimal)
A text string and a bdldfp::Decimal are both decimal representations of a floating-point number, so conversions between them are simpler and exact (assuming the required number of digits can be represented in the target type). The following mechanisms for converting a floating-point value represented as a (decimal) character string to an object of bdldfp::Decimal type produce the same result value:
The utility functions in bdldfp_decimalutil component:
bdldfp::DecimalUtil::parseDecimal32
bdldfp::DecimalUtil::parseDecimal64
bdldfp::DecimalUtil::parseDecimal128
These functions parse the input (decimal) character string into a bdldfp::Decimal value as specified for the strtod32 function in section 9.6 of the ISO/IEC TR 24732 C Decimal Floating-Point Technical Report, except that it is unspecified whether the NaN values returned are quiet or signaling.
User-defined literals in the bdldfp::literals::DecimalLiterals namespace (requires C++11): operator "" _d32, operator "" _d64, and operator "" _d128. These user-defined literal suffixes can be applied to both numeric and string literals (e.g., 1.2_d128, "1.2"_d128 or "inf"_d128) to produce a decimal floating-point value of the indicated type by parsing the argument string or numeric value. (If the numeric form is used, note that a leading sign is not considered part of the literal; rather, it is a unary operator applied to the literal that follows it.)
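using namespace bdldfp::DecimalLiterals;
bdldfp::Decimal32 d0 = "1.2"_d32;                 // string literal form
bdldfp::Decimal32 d1 = 1.2_d32;                   // numeric literal form
bdldfp::Decimal64 d2 = "-3.45678901234"_d64;      // sign is part of the parsed string
bdldfp::Decimal64 d3 = -3.45678901234_d64;        // unary minus applied to 3.45678901234_d64
bdldfp::Decimal128 inf = "inf"_d128;              // positive infinity
bdldfp::Decimal128 nan = "nan"_d128;              // NaN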
Conversion From Decimal Floating-Point Representation (bdldfp::Decimal)
From bdldfp::Decimal to Binary Floating-Point (float, double)
If a decimal value is represented as an object of bdldfp::Decimal type, then a binary floating-point value can be obtained using the following functions:
The utility functions in the bdldfp_decimalconvertutil component:
bdldfp::DecimalConvertUtil::decimalToFloat
bdldfp::DecimalConvertUtil::decimalToDouble
These functions return the binary floating-point value closest to the value of the input decimal object, following the conversion rules defined by IEEE-754.
From bdldfp::Decimal to Character Floating-Point (const char *)
A text string and a bdldfp::Decimal are both decimal representations of a floating-point number, so conversions between them are simpler and exact (assuming the required number of digits can be represented in the target string). The following mechanisms for converting a bdldfp::Decimal object to a (decimal) character string are typically interchangeable, resulting in the same numerical value (potentially formatted differently):
The stream output operator (operator <<) writes a bdldfp::Decimal value to the output stream. The std::fixed and std::scientific manipulators are supported for the bdldfp::Decimal types. If neither of these is supplied, the decimal value is formatted in the “natural” format described in the IEEE-754 standard. The “natural” format preserves the quantum stored in the bdldfp::Decimal value (for example, 1.0 and 1.00 compare equal but have different quanta), so that every (bitwise) different bdldfp::Decimal value formats differently, and the formatted text restores the same bdldfp::Decimal value if converted back.
bdldfp::DecimalUtil::format formats a bdldfp::Decimal value, placing the output into a buffer. This function implements the same conversion functionality as the stream output operator and provides additional formatting options via a bdldfp::DecimalFormatConfig parameter (for example, customizing the NaN and/or Infinity output). When the function is invoked without a configuration, the decimal value is formatted in the “scientific” format.
Conversion From Binary Floating-Point Representation (float, double)
Because binary floating-point values are generally not equal to their decimal progenitors, “converting from binary to decimal” does not have a single meaning, and programmers performing such a conversion must be more precise about the properties of the conversion they want to perform.
From Binary Floating-Point to Character Floating-Point (const char *)
A user looking to convert a binary floating-point representation to a character string may have one of a few possible use-cases for the conversion:
Express the value as the shortest decimal character string that will convert back to the same value. For this conversion, use:
The Standard Library function std::to_chars (since C++17) is a locale-independent, non-allocating, and non-throwing alternative to the C Standard Library formatting functions (sprintf, etc.). When called without a precision, it converts a binary floating-point value to the shortest string representation that converts back exactly, and it is designed for speed, which makes it useful in common high-throughput contexts such as text-based interchange (JSON or XML).
Note
Round-trip equality when using std::from_chars to recover a floating-point value from a string representation formatted by std::to_chars is only guaranteed if both functions are from the same implementation.
Note
std::to_chars is not supported by all of the C++ Standard Library implementations used at Bloomberg. According to the “C++ compiler support” page, std::to_chars for float and double is:
Supported in GCC 11.1.0 and later
Supported in Visual Studio 2019 (compiler version 19.24) and later
Not supported in Clang (as of version 11)
Not supported in Sun Studio CC (as of version 12.4)
Not supported in IBM xlC (as of version 12)
Express the value rounded to a given number of significant digits. For this conversion, use snprintf into a large-enough buffer:
char buf[100]; double value; snprintf(buf, 100, "%.*g", digits, value);
The result will be in either fixed or scientific format, depending on the range into which the value falls (scientific when the decimal exponent is less than -4 or at least the requested number of digits), and trailing zeros will be trimmed. If the specified number of digits is at least numeric_limits<T>::max_digits10 (for float or double, depending on the type of the value being converted), then the resulting decimal string will convert back to the same value of the same type.
Express the value rounded to a given number of decimal places in scientific notation. For this conversion, use snprintf into a large-enough buffer:
char buf[100]; double value; snprintf(buf, 100, "%.*e", places, value);
Note that unlike the %g version above, here we specify the number of decimal places rather than the number of significant digits, so the converted value will have one more significant digit than the number of places specified. Also unlike %g, trailing zeros will not be trimmed from the converted value.
Express the value in fixed-point form rounded to a given number of decimal places. (The decimal places of a decimal number are the number of digits after the decimal point, with trailing 0s removed; .01, 10.01, and 1000.01 each have two decimal places.) For this conversion, use snprintf into a large-enough buffer:
char buf[2000]; double value; snprintf(buf, 2000, "%.*f", places, value);
This form is suitable when the value being converted is known to fall into an expected range and the number of decimal places is of particular interest, such as when converting a value representing dollars and cents.
This form is not suitable for values that span large orders of magnitude: large values are printed with many digits before the decimal point (most of them beyond the precision of the type) followed by the specified number of decimal places, and the significant digits of small values are lost past the specified number of decimal places.
When determining the number of decimal places, if the floating-point value being converted is the result of scientific calculations, then to guarantee that two different floating-point values convert to two different decimals, the total number of significant digits (the digits before and after the decimal point, excluding leading zeros) should be at least 9 for float and 17 for double (see the discussion of numeric_limits<T>::max_digits10 above).
Problems can still arise if a binary floating-point value is itself the result of a conversion from a decimal representation and the integer portion of the value is large, because there may not be enough precision remaining to deliver a meaningful number of decimal places. For example, numbers near one trillion with three decimal places, like 999,999,999,999.999, already consume 15 significant digits, so a double does not have enough precision for 4 decimal places: with four decimal places instead of three you would be using 16 digits, and not all such decimals would have unique binary values.
Express the value exactly as a decimal. For example, the decimal value .1 converts to the 32-bit IEEE float value 0x3DCCCCCD, which has the exact value .100000001490116119384765625. This conversion is seldom useful, except perhaps for debugging, since the exact value may have over 1000 digits. For the same reason, binary floating-point values often cannot be represented exactly in a decimal floating-point type, whose precision is limited to 7, 16, or 34 significant digits. For this conversion, use snprintf into a large-enough buffer (the result will have trailing 0s, which may be trimmed):
char buf[2000]; double value; snprintf(buf, 2000, "%.1100f", value);
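The use cases above can be compared side by side with a small standard-C++ sketch, assuming IEEE-754 doubles and a correctly rounding C library. The helper names are ours; shortestRoundTrip is a simple precision search standing in for std::to_chars where that is unavailable, and it may occasionally return a longer string than a true shortest-representation algorithm.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper: printf-style formatting with a "*" precision into a
// std::string.  The buffer is sized for the "%.1100f" exact-value case.
std::string formatWith(const char *spec, int precision, double value)
{
    char buf[2000];
    const int n = std::snprintf(buf, sizeof buf, spec, precision, value);
    assert(n >= 0 && n < static_cast<int>(sizeof buf));
    return std::string(buf, n);
}

// Hypothetical helper: the shortest "%.*g" output that strtod converts back
// to exactly `value`.  17 significant digits always suffice for a double.
std::string shortestRoundTrip(double value)
{
    for (int digits = 1; digits < 17; ++digits) {
        const std::string text = formatWith("%.*g", digits, value);
        if (std::strtod(text.c_str(), nullptr) == value) {
            return text;
        }
    }
    return formatWith("%.*g", 17, value);
}
```

For example, shortestRoundTrip(0.1) is "0.1" and shortestRoundTrip(0.1 + 0.2) is "0.30000000000000004"; formatWith("%.*g", 6, 1234.5678) is "1234.57" and formatWith("%.*e", 3, 1234.5678) is "1.235e+03". The "%.*f" form shows both hazards described above: formatWith("%.*f", 2, 1e20) is "100000000000000000000.00", while formatWith("%.*f", 2, 1.234e-5) is "0.00". Likewise, strtod maps both "999999999999.9991" and "999999999999.9992" to the same double, since adjacent doubles near one trillion are 2^-13 (about 0.000122) apart.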
From Binary Floating-Point to Decimal Floating-Point (bdldfp::Decimal)
A user looking to convert a binary floating-point representation to a bdldfp::Decimal type may have one of a few possible use cases for the conversion. There are two general mechanisms to perform the conversion: the bdldfp::Decimal constructors, and bdldfp::DecimalConvertUtil functions like bdldfp::DecimalConvertUtil::decimal64FromDouble. Below, we enumerate the use cases and provide the appropriate function(s) and arguments for each.
When converting from a binary floating-point representation to a bdldfp::Decimal type, a user may wish to:
Express the value as its nearest representable decimal value. For this conversion, use the conversion constructors:
bdldfp::Decimal32(value)
bdldfp::Decimal64(value)
bdldfp::Decimal128(value)
Note
Although this conversion is the easiest to use and simplest to understand, it is frequently not the correct choice.
Frequently, the binary floating-point value being converted to a bdldfp::Decimal
is an approximation of what was originally a decimal value (e.g., a monetary value). For such values one of the other conversions is likely to be more appropriate.
Express the value rounded to a specific number of significant digits. (The significant digits of a decimal number are the digits with all leading and trailing 0s removed; e.g., 0.00103, 10.3 and 10300 each have 3 significant digits.) This conversion is the one that leads programmers to complain about “rounding error” (for example, .1f rounded to 9 digits is .100000001) but is the appropriate one to use when the programmer knows that the binary value was originally converted from a decimal value with that many significant digits. For this conversion, use:
Result Type   float                                double
Decimal32     decimal32FromFloat(value, digits)    decimal32FromDouble(value, digits)
Decimal64     decimal64FromFloat(value, digits)    decimal64FromDouble(value, digits)
Decimal128    decimal128FromFloat(value, digits)   decimal128FromDouble(value, digits)
Express the value using the minimum number of significant digits for the binary type (numeric_limits<T>::max_digits10) such that converting the decimal value back to binary will yield the same value. (Note that 17 digits are needed for double and 9 for float, so not all decimal types can hold such a result: Decimal32 holds 7 digits and Decimal64 holds 16.) For this conversion, use:
Result Type   float                           double
Decimal64     decimal64FromFloat(value, 9)    n/a
Decimal128    decimal128FromFloat(value, 9)   decimal128FromDouble(value, 17)
Note
This conversion is appropriate when the binary value did not necessarily originate as a conversion from a decimal number, but more likely from some process of calculation (e.g., the present value of a cash flow, or a yield curve point calculated from financial instrument prices).
Express the value using a number of digits that restores the original decimal value from which the binary value was converted, assuming that the original decimal value had few enough significant digits that no two values with that number of digits would convert to the same binary value. (That number is 15 for double and 6 for float in general, but 7 for float over the limited range [1*10^-3 .. 8.5*10^9].) For this conversion, use:
Result Type   float                        double
Decimal32     decimal32FromFloat(value)    decimal32FromDouble(value)
Decimal64     decimal64FromFloat(value)    decimal64FromDouble(value)
Decimal128    decimal128FromFloat(value)   decimal128FromDouble(value)
Note
This conversion, or the one that follows, is appropriate when the value originated as a decimal representation (like a monetary value that has been converted to float or double).
Express the value as the shortest decimal number that converts back exactly to the binary value. For example, given the binary value 0x3DCCCCCD above, the corresponding shortest decimal value is (unsurprisingly) .1, while the next lower value 0x3DCCCCCC has the shortest decimal .099999994 and the next higher value 0x3DCCCCCE has the shortest decimal .10000001. This is the most visually appealing result, but it can be expensive and slow to compute. For this conversion, use:
Result Type   float                            double
Decimal32     decimal32FromFloat(value, -1)    decimal32FromDouble(value, -1)
Decimal64     decimal64FromFloat(value, -1)    decimal64FromDouble(value, -1)
Decimal128    decimal128FromFloat(value, -1)   decimal128FromDouble(value, -1)
Express the value using a number of digits that restores the original decimal value, assuming that it is a float which originated as an IBM/Perkin-Elmer/Interdata float value that was itself converted from a decimal value. For this conversion, use:
Result Type   float
Decimal32     decimal32FromFloat(value, 6)
Decimal64     decimal64FromFloat(value, 6)
Decimal128    decimal128FromFloat(value, 6)