Mar 30, 2021

Binary to Decimal and Back Again

Abstract

Handling floating-point numbers in computer systems is hard. Types like double and float (IEEE 754-1985) both provide approximations to the mathematical concept of a real number, which, for the most part, allow us to ignore this complexity. However, software where we expose the limits of these approximations can be a source of errors, and one area that is a particularly common source of errors at Bloomberg is the conversion between binary representations of floating-point numbers (i.e., double and float) and decimal representations (like the string “3.14159”). Such conversions are especially relevant at Bloomberg because financial calculations are governed by laws and expectations that are based on decimal (base-10) thinking. In this article we seek to explain the complexity in converting between binary and decimal floating-point representations, and to guide developers in performing such conversions safely.

Introduction

Modern computer systems typically operate on floating-point numbers represented in IEEE-754 binary format. Binary floating-point numbers are optimal for scientific computations due to the high computational performance enabled by the hardware (FPUs), their high precision, and carefully designed mathematical operations that minimize accumulation of rounding errors during lengthy calculations. However, binary floating-point numbers are far less suitable for computations involving humans, particularly those in the financial sector.

Financial calculations are governed by laws and expectations that are based on decimal (base-10) thinking. Since binary floating-point cannot represent most decimal values exactly, it is difficult to use it while maintaining decimal accuracy requirements. Performing such decimal-based calculations and algorithms using binary floating-point numbers is so hard that it is generally considered infeasible. The IEEE-754 committee recognized the issue and added specifications for three decimal floating-point types to the 2008 standard: the 32-, 64-, and 128-bit decimal floating-point formats (BDE provides implementations of decimal floating-point types in the bdldfp_decimal component).

Having two numeric formats, binary floating-point representations (like float and double) and decimal floating-point representations (like the bdldfp::Decimal type and any textual representations of a floating-point value), raises the requirement of converting between them.

The desire to convert numbers from binary floating-point to decimal format is fraught with misunderstanding, and is often accompanied by ill-conceived attempts to “correct rounding errors” and otherwise coerce results into aesthetically pleasing forms. In the Bloomberg environment, there is the additional complication of IBM/Perkin-Elmer/Interdata floating-point. This is a floating-point format that uses a base-16 rather than a base-2 underlying representation, and the Bloomberg environment contains numbers that began as decimals, were converted to IBM format, and then were converted from IBM format to IEEE format.

Preserving Equality Through Conversions (“Round-Trip Equality”)

When we consider conversions between binary and decimal representations, an important property we want to maintain is that a value converted to another representation and back is equal to the original value (what we call “round-trip equality”). That is, we want to be able to take a number in a decimal representation, convert it to a binary representation, convert it back to a decimal representation, and have the resulting value be equal to the value we started with.

Decimal to Binary Conversion

Generally, when a decimal floating-point value is converted to binary floating-point, the result is the representable binary number nearest in value to that decimal. Unless the decimal value is exactly a multiple of a power of two (e.g., 3.4375 = 55 * 1/16), the converted binary value cannot be equal to the decimal value; it can only be close. It is also desirable that two different decimal values convert to two different binary values.
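As a small illustration of the paragraph above, the following sketch formats a double with 20 digits after the decimal point so that the stored binary value becomes visible. It assumes an IEEE-754 double and a C library whose printf family produces correctly rounded digits (glibc does); the helper name toFixed20 is ours and is not part of any library.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper (not a library function): format 'value' with 20
// digits after the decimal point to expose the stored binary value.
std::string toFixed20(double value)
{
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.20f", value);
    return buf;
}
```

Under those assumptions, toFixed20(3.4375) yields 3.43750000000000000000 (3.4375 is 55/16, so it is stored exactly), while toFixed20(0.1) yields 0.10000000000000000555 (0.1 is stored only as the nearest representable double).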

D ---4-------------------5-------------> Decimals with K significant digits
d -9-0-1-2-3-4-5-6-7-8-9-0-1-2-3-4-5-6-> Decimals with K + 1 significant digits
B ---0----1----2----3----4----5----6---> Binary values

The diagram above shows three number scales of different precisions. The first and second scales show the decimal numbers having K and K + 1 significant digits, respectively (for example, 0.12345 and 0.123456). The third is the binary scale, which shows the representable values of the target binary type (float or double). The diagram illustrates that there is a maximum number of significant digits (in our case, K) for which different decimals are converted to different binary values:

  • On the first scale, D5 is the decimal number whose K-th significant digit is 5. Suppose D5 is a multiple of a power of two, D5 = n * 2^m (e.g., 1.71875 = 55 * 1/32); then it converts exactly to the single binary number B4 on the binary scale.

  • On the second scale, d50 is the decimal number whose K-th and (K + 1)-th significant digits are 5 and 0, respectively. d49, d50, and d51 (i.e., 1.718749, 1.718750, 1.718751) all convert to B4.

The latter conversion breaks the round-trip equality property as the original decimals cannot be restored after being converted to binary and back.

It was observed that all decimals with K = 6 significant digits convert to distinct float values, but 8589973000 and 8589974000, with K = 7 significant digits, convert to the same float value. Similarly, all decimals with K = 15 significant digits convert to unique double values, but 900719925474.0992 and 900719925474.0993, with K = 16 significant digits, convert to the same double value. Over restricted ranges, the maximum may be higher: for example, every decimal value with 7 or fewer significant digits in the range (1*10^-3 .. 8.5*10^9) converts to a unique float value.
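The collisions quoted above can be checked with a short sketch. It assumes IEEE-754 float and double and a correctly rounding std::strtof / std::strtod (as in mainstream C libraries); the helper names sameFloat and sameDouble are ours, for illustration only.

```cpp
#include <cstdlib>

// Hypothetical helpers: report whether two decimal strings convert to
// the same binary floating-point value.
bool sameFloat(const char *a, const char *b)
{
    return std::strtof(a, nullptr) == std::strtof(b, nullptr);
}

bool sameDouble(const char *a, const char *b)
{
    return std::strtod(a, nullptr) == std::strtod(b, nullptr);
}
```

Under those assumptions, sameFloat("8589973000", "8589974000") and sameDouble("900719925474.0992", "900719925474.0993") both return true: near 2^33 adjacent float values are 1024 apart, and near 9*10^11 adjacent double values are 2^-13 (about 0.000122) apart, so in each pair both decimals round to the same binary value.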

Note

When converting a decimal representation (such as a character string containing a monetary value) to either float or double the maximum number of decimal digits K that can be represented such that the decimal value can later be restored is expressed by the constants std::numeric_limits<float>::digits10 (6) and std::numeric_limits<double>::digits10 (15) respectively.

Binary to Decimal Conversion

Round-trip conversion from binary floating-point to decimal and back requires choosing a number of significant digits for the intermediate decimal representation. Choose too few, and multiple binary values will convert to the same decimal value. For example, in the diagram above, B3, B4, and B5 all convert to D5 when K digits are used, but only B4 converts to d50 when K + 1 digits are used. There is a minimum number of significant digits such that all representable binary floating-point values will convert to distinct decimal values. For float, 9 digits are required and for double, 17 digits are required. For example, the two 32-bit float values 0x447FFF92 and 0x447FFF93 both convert to 1023.9933 when only 8 digits are used, but respectively convert to 1023.99329 and 1023.99335 when 9 digits are used. Note that while using this minimum number of digits guarantees unique decimal values, shorter decimal values can also convert to the same binary value. Some conversion methods, such as std::to_chars (which first appeared in the C++17 Standard), are required to produce the shortest decimal representation that converts back exactly. With our sample numbers, 1023.9933 also converts to 0x447FFF92, so we might prefer that result when converting 0x447FFF92 to decimal rather than the longer 1023.99329.
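The 0x447FFF92 / 0x447FFF93 example can be reproduced with the sketch below. It assumes that float is the 32-bit IEEE-754 binary format and that snprintf rounds correctly; floatFromBits and formatDigits are hypothetical helper names introduced only for this illustration.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Reinterpret a 32-bit pattern as a float (assumes 32-bit IEEE-754 float).
float floatFromBits(std::uint32_t bits)
{
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}

// Format 'value' using the specified number of significant digits.
std::string formatDigits(float value, int digits)
{
    char buf[64];
    std::snprintf(buf, sizeof buf, "%.*g", digits,
                  static_cast<double>(value));
    return buf;
}
```

With 8 digits, both bit patterns format as 1023.9933; with 9 digits, they format as 1023.99329 and 1023.99335 and can be told apart.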

Note

When converting a binary floating-point representation to a decimal representation (such as a character string or bdldfp::Decimal) the minimum number of digits K that are necessary to uniquely represent all distinct values such that the binary floating-point value can later be restored is expressed by the constants std::numeric_limits<float>::max_digits10 (9) and std::numeric_limits<double>::max_digits10 (17) respectively.

Note

To summarize:

  • std::numeric_limits<T>::digits10: maximum number of digits in a decimal value where all such values can be uniquely represented in the binary floating-point representation T.

  • std::numeric_limits<T>::max_digits10: the number of decimal digits required to exactly represent all possible values of a binary floating-point representation T (such that values can be restored).
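The two constants can be turned into compile-time checks. The sketch below assumes IEEE-754 float and double (the static_asserts fail on other formats); decimalRoundTrips and binaryRoundTrips are hypothetical names, not library functions.

```cpp
#include <limits>

// True if every decimal with this many significant digits is guaranteed
// to survive decimal -> T -> decimal over the whole range of T.
template <class T>
constexpr bool decimalRoundTrips(int significantDigits)
{
    return significantDigits <= std::numeric_limits<T>::digits10;
}

// True if formatting a T with this many significant digits is guaranteed
// to survive T -> decimal -> T.
template <class T>
constexpr bool binaryRoundTrips(int significantDigits)
{
    return significantDigits >= std::numeric_limits<T>::max_digits10;
}

static_assert(decimalRoundTrips<float>(6),   "float keeps 6 digits");
static_assert(decimalRoundTrips<double>(15), "double keeps 15 digits");
static_assert(binaryRoundTrips<float>(9),    "float needs 9 digits");
static_assert(binaryRoundTrips<double>(17),  "double needs 17 digits");
```

Note that decimalRoundTrips<float>(7) is false because the guarantee covers the whole range of float, even though 7 digits do survive over the restricted range discussed earlier.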

Conversion Methods

Floating-point numbers can be encoded in different formats for a variety of purposes:

  • Binary floating-point representation that is commonly used for scientific computations.

  • Decimal floating-point representation that provides accuracy for financial computations.

  • Decimal character representation used to encode floating-point values in human readable formats, and in text-based interchange (JSON or XML).

The following sections describe different alternatives for converting between these representations:

Conversion From Character Floating-Point Representation

From Character String to Binary Floating-Point (float, double)

If a floating-point value is represented as a (decimal) character string, then a binary floating-point value can be obtained in the following ways:

  1. Initializing or assigning a binary floating-point variable with a literal in source code (e.g., double x = 1.5;). This is the most common type of conversion, which almost no program can do without. This conversion is so integral to the language that it is often overlooked, but the programmer should keep in mind that in order to be able to round-trip the initial value, the number of significant digits should not exceed 6 (7 for the 1*10^-3 .. 8.5*10^9 range) for float and 15 for double.

  2. The utility functions in the bdlb_numericparseutil component:

  • bdlb::NumericParseUtil::parseDouble. This function provides behavior which parallels the Standard Library strtod function, but corrects the implementation deficiencies in strtod noted below (consistently handling special values NaN and Infinity, both positive and negative), treats leading white-space as an error (which is more suitable to strictly parsing textual encodings), and provides consistent functionality on all of Bloomberg’s production platforms.

  3. The Standard Library functions:

  • std::atof only applies for double

  • std::strtof (since C++11), std::strtod (since C++03), std::strtold (since C++11). For developers that must support all Bloomberg production platforms std::strtod is the primary standard library facility available (though we would encourage using bdlb::NumericParseUtil). There are a couple of points of non-conformance to be aware of:

  • Visual Studio, up to and including MSVC 2013, does not parse the special double values NaN and Infinity as specified for strtof;

  • libstdc++, the library implementation used with GCC, parses negative NaN but returns positive NaN as the result.

  • std::stof, std::stod, std::stold (since C++11)

  • std::from_chars (since C++17) is a locale-independent, non-allocating, and non-throwing alternative to the functions above. It is designed to allow the fastest possible implementation, which makes it useful in common high-throughput contexts such as text-based interchange (JSON or XML).

Note

Round-trip equality when using std::from_chars to recover a floating-point value from a string representation formatted by std::to_chars is only guaranteed if both functions are from the same implementation.

Note

std::from_chars is not supported by all libraries provided by the Standard Library C++ vendors that are used at Bloomberg. According to C++ compiler support, std::from_chars for float and double is:

  • Supported in GCC 11.1.0 and later

  • Supported in Visual Studio 2017 version 19.15 and later (note: from_chars was supported earlier than to_chars)

  • Not supported in Clang (as of version 11)

  • Not supported in Sun Studio CC (as of version 12.4)

  • Not supported in IBM xlC (as of version 12)

All of these functions produce a binary floating-point value which is one of at most two floating-point values closest to the value of the input character string.
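For readers limited to std::strtod, the sketch below shows one way to parse a whole string strictly. It is an illustration only, not the implementation of bdlb::NumericParseUtil::parseDouble; the name parseWholeDouble and its exact error policy (rejecting empty input, leading white-space, trailing characters, and out-of-range values) are our assumptions.

```cpp
#include <cctype>
#include <cerrno>
#include <cstdlib>

// Hypothetical helper: parse all of 'text' as a double.  Load the value
// into '*result' and return true on success; return false, leaving
// '*result' unchanged, if 'text' is empty, starts with white-space, has
// trailing characters, or is out of range (ERANGE).
bool parseWholeDouble(double *result, const char *text)
{
    if (*text == '\0' || std::isspace(static_cast<unsigned char>(*text))) {
        return false;  // strtod would otherwise skip leading white-space
    }
    char *end = nullptr;
    errno = 0;
    const double value = std::strtod(text, &end);
    if (end == text || *end != '\0' || errno == ERANGE) {
        return false;
    }
    *result = value;
    return true;
}
```

Note that std::strtod honors the current C locale, so this sketch assumes the default "C" locale, in which the decimal separator is '.'.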

From Character String to Decimal Floating-Point (bdldfp::Decimal)

A text string, and bdldfp::Decimal, are both decimal representations of a floating-point number, and so conversions between them are simpler and exact (assuming the required number of digits can be represented in the target type). The following mechanisms for converting a floating-point value represented as a (decimal) character string to an object of bdldfp::Decimal type provide the same result value:

  1. The utility functions in the bdldfp_decimalutil component:

  • bdldfp::DecimalUtil::parseDecimal32

  • bdldfp::DecimalUtil::parseDecimal64

  • bdldfp::DecimalUtil::parseDecimal128

These functions parse the input (decimal) character string to the bdldfp::Decimal value as specified for the std::strtod32 function in section 9.6 of the ISO/IEC TR 24732 C Decimal Floating-Point Technical Report, except that it is unspecified whether the NaN values returned are quiet or signaling.

  2. User-defined literals in the bdldfp::literals::DecimalLiterals namespace (requires C++11): operator "" _d32, operator "" _d64, and operator "" _d128. These user-defined literal suffixes can be applied to both numeric and string literals (i.e., 1.2_d128, "1.2"_d128 or "inf"_d128) to produce a decimal floating-point value of the indicated type by parsing the argument string or numeric value. (If the numeric form is used, note that a leading sign is not considered part of the literal; it is a unary operator applied to the value of the literal that follows it.)

using namespace bdldfp::DecimalLiterals;

bdldfp::Decimal32   d0  = "1.2"_d32;
bdldfp::Decimal32   d1  =  1.2_d32;

bdldfp::Decimal64   d2  = "-3.45678901234"_d64;
bdldfp::Decimal64   d3  =  -3.45678901234_d64;

bdldfp::Decimal128  inf = "inf"_d128;
bdldfp::Decimal128  nan = "nan"_d128;

Conversion From Decimal Floating-Point Representation (bdldfp::Decimal)

From bdldfp::Decimal to Binary Floating-Point (float, double)

If a decimal value is represented as an object of bdldfp::Decimal type, then a binary floating-point value can be obtained using the following functions:

  1. The utility functions in the bdldfp_decimalconvertutil component:

  • bdldfp::DecimalConvertUtil::decimalToFloat

  • bdldfp::DecimalConvertUtil::decimalToDouble

These functions result in a binary floating-point representation having the value closest to the value of the input decimal object following the conversion rules defined by IEEE-754.

From bdldfp::Decimal to Character Floating-Point (const char *)

A text string, and bdldfp::Decimal, are both decimal representations of a floating-point number, so conversions between them are simpler and exact (assuming the required number of digits can be represented in the target string). The following mechanisms for converting a bdldfp::Decimal object to a (decimal) character string are typically interchangeable, resulting in the same numerical value (potentially formatted differently):

  • The stream output operator (operator <<) will write a bdldfp::Decimal value to the output stream. The std::fixed and std::scientific manipulators are supported for the bdldfp::Decimal types. If neither of these is supplied, then the decimal value is formatted in the “natural” format as described in the IEEE-754 standard. The “natural” format preserves the quantum stored in the bdldfp::Decimal value, such that every (bitwise) different bdldfp::Decimal value formats differently and restores to the same bdldfp::Decimal value if converted back.

  • bdldfp::DecimalUtil::format formats a bdldfp::Decimal value placing the output into a buffer. This function implements the same conversion functionality as the stream output operator and provides additional formatting options via a bdldfp::DecimalFormatConfig parameter (for example, customizing the NaN and/or Infinity output). When the function is invoked without a configuration, then the decimal value is formatted in the “scientific” format.

Conversion From Binary Floating-Point Representation (float, double)

Because binary floating-point values are generally not equal to their decimal progenitors, “converting from binary to decimal” does not have a single meaning, and programmers performing such a conversion must be more precise about the properties of the conversion they want to perform.

From Binary Floating-Point to Character Floating-Point (const char *)

A user looking to convert a binary floating-point representation to a character string may have one of a few possible use-cases for the conversion:

  1. Express the value as the shortest decimal character string that will convert back to the same value. For this conversion, use:

  • The Standard Library std::to_chars (since C++17) function is a locale-independent, non-allocating, and non-throwing alternative to the C Standard Library formatting functions (sprintf, etc.). This function converts a binary floating-point value to the shortest string representation and is designed to allow the fastest possible implementation, which makes it useful in common high-throughput contexts such as text-based interchange (JSON or XML).

Note

Round-trip equality when using std::from_chars to recover a floating-point value from a string representation formatted by std::to_chars is only guaranteed if both functions are from the same implementation.

Note

std::to_chars is not supported by all libraries provided by the Standard Library C++ vendors that are used at Bloomberg. According to C++ compiler support, std::to_chars for float and double is:

  • Supported in GCC 11.1.0 and later

  • Supported in Visual Studio 2017 version 19.24 and later

  • Not supported in Clang (as of version 11)

  • Not supported in Sun Studio CC (as of version 12.4)

  • Not supported in IBM xlC (as of version 12)

  2. Express the value rounded to a given number of significant digits. For this conversion, use snprintf into a large-enough buffer:

  • char buf[100]; double value; snprintf(buf, 100, "%.*g", digits, value);

The resulting value will be in either fixed or scientific format depending on the range into which the value falls. Trailing zeros will be trimmed away. If the specified number of digits is at least numeric_limits<T>::max_digits10 (for float or double, depending on the type of the value being converted) then the resulting decimal string will convert back to the same value of the same type.

  3. Express the value rounded to a given number of decimal places in scientific notation. For this conversion, use snprintf into a large-enough buffer:

  • char buf[100]; double value; snprintf(buf, 100, "%.*e", places, value);

Note that unlike the %g version above, here we specify the number of decimal places rather than the number of significant digits, and the converted value will therefore have one more significant digit than the number of places specified. Also unlike %g, trailing zeros will not be trimmed from the converted value.

  4. Express the value in fixed-point form rounded to a given number of decimal places. (The decimal places of a decimal number are the number of digits after the decimal point, with trailing 0s removed; .01, 10.01, and 1000.01 each have two decimal places.) For this conversion, use snprintf into a large-enough buffer:

  • char buf[2000]; double value; snprintf(buf, 2000, "%.*f", places, value);

  • This form is suitable when the value being converted is known to fall into an expected range and the number of decimal places is of particular interest, such as when converting a value representing dollars and cents.

  • This form is not suitable for values that span large orders of magnitude, because the significant digits of large values will be followed by many zeros and then a decimal point and zeros for the specified number of decimal places, and the significant digits of small values will be lost past the number of specified decimal places.

  • When determining the number of decimal digits, if the floating-point value being converted is the result of scientific calculations, then to guarantee that two different floating-point values convert to two different decimals, the total number of significant digits (the sum of digits before and after the decimal point) should be at least 9 for float and 17 for double (see discussion of numeric_limits<T>::max_digits10 above).

  • Issues may still be introduced by a conversion. If a binary floating-point value is the result of conversion from the decimal representation, then this conversion can be problematic when the integer portion of the value is large, as there may not be enough precision remaining to deliver a meaningful number of decimal places. For example, for numbers near one trillion that consume 15 digits like 999,999,999,999.999, there is not enough precision in a double for 4 decimal places. If you were to try dealing with decimal values in that range but with four decimal places instead of three, you would be using 16 digits and you would not have unique binary values for all such decimals.

  5. Express the value exactly as a decimal. For example, the decimal value .1 converts to the 32-bit IEEE float value 0x3DCCCCCD, which has the exact value .100000001490116119384765625. This conversion is seldom useful, except perhaps for debugging, since the exact value may have over 1000 digits. For the same reason, binary floating-point values often cannot be represented exactly as a decimal floating-point. For this conversion, use snprintf into a large-enough buffer (the result will have trailing 0s, which may be trimmed):

  • char buf[2000]; double value; snprintf(buf, 2000, "%.1100f", value);
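The snprintf-based use-cases above can be compared side by side with the following sketch. It assumes an IEEE-754 double and a correctly rounding C library; formatDouble and roundTrips are hypothetical helper names introduced for this illustration.

```cpp
#include <cstdio>
#include <cstdlib>
#include <limits>
#include <string>

// Hypothetical helper: format 'value' with the printf conversion 'e'
// (scientific), 'f' (fixed), or, for any other character, 'g', using
// the specified 'precision'.
std::string formatDouble(double value, char conversion, int precision)
{
    char buf[2000];
    switch (conversion) {
      case 'e':
        std::snprintf(buf, sizeof buf, "%.*e", precision, value);
        break;
      case 'f':
        std::snprintf(buf, sizeof buf, "%.*f", precision, value);
        break;
      default:
        std::snprintf(buf, sizeof buf, "%.*g", precision, value);
        break;
    }
    return buf;
}

// Check that formatting with max_digits10 significant digits and
// parsing the text back recovers exactly the same double.
bool roundTrips(double value)
{
    const std::string text =
        formatDouble(value, 'g', std::numeric_limits<double>::max_digits10);
    return std::strtod(text.c_str(), nullptr) == value;
}
```

For example, formatDouble(1234.5678, 'g', 3) gives 1.23e+03 (significant digits, switching to scientific form), formatDouble(1234.5, 'g', 6) gives 1234.5 (trailing zeros trimmed), formatDouble(1234.5, 'e', 6) gives 1.234500e+03 (trailing zeros kept), and formatDouble(1234.5678, 'f', 2) gives 1234.57.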

From Binary Floating-Point to Decimal Floating-Point (bdldfp::Decimal)

A user looking to convert a binary floating-point representation to a bdldfp::Decimal type may have one of a few possible use-cases for the conversion. There are two general mechanisms to perform the conversion (the bdldfp::Decimal constructors, and bdldfp::DecimalConvertUtil functions like bdldfp::DecimalConvertUtil::decimal64FromDouble). Below, we enumerate the use-cases and provide the appropriate function(s) and arguments for each.

When converting from a binary floating-point representation to a bdldfp::Decimal type, a user may wish to:

  1. Express the value as its nearest representable decimal value. For this conversion, use the conversion constructors:

  • bdldfp::Decimal32(value)

  • bdldfp::Decimal64(value)

  • bdldfp::Decimal128(value)

Note

Although this conversion is the easiest to use and simplest to understand, it is frequently not the correct choice.

Frequently, the binary floating-point value being converted to a bdldfp::Decimal is an approximation of what was originally a decimal value (e.g., a monetary value). For such values one of the other conversions is likely to be more appropriate.

  2. Express the value rounded to a specific number of significant digits. (The significant digits of a decimal number are the digits with all leading and trailing 0s removed; e.g., 0.00103, 10.3 and 10300 each have 3 significant digits.) This conversion is the one that leads programmers to complain about “rounding error” (for example, .1f rounded to 9 digits is .100000001) but is the appropriate one to use when the programmer knows that the binary value was originally converted from a decimal value with that many significant digits. For this conversion, use:

    Result Type    float                                 double
    Decimal32      decimal32FromFloat(value, digits)     decimal32FromDouble(value, digits)
    Decimal64      decimal64FromFloat(value, digits)     decimal64FromDouble(value, digits)
    Decimal128     decimal128FromFloat(value, digits)    decimal128FromDouble(value, digits)

  3. Express the value using the minimum number of significant digits for the type of the binary such that converting the decimal value back to binary will yield the same value. (Note that 17 digits are needed for double and 9 for float, so not all decimal types can hold such a result.) For this conversion, use:

    Result Type    float                            double
    Decimal64      decimal64FromFloat(value, 9)     -
    Decimal128     decimal128FromFloat(value, 9)    decimal128FromDouble(value, 17)

Note

This conversion is appropriate when the binary representation did not necessarily originate as a conversion from a decimal number, but more likely came from some process of calculation (e.g., present value of a cash flow, or a yield curve point calculated from financial instrument prices).

  4. Express the value using a number of decimal places that restores the original decimal value from which the binary value was converted, assuming that the original decimal value had sufficiently few significant digits so that no two values with that number of digits would convert to the same binary value. (That number is 15 for double and 6 for float in general, but 7 for float over the limited range [1*10^-3 .. 8.5*10^9].) For this conversion, use:

    Result Type    float                         double
    Decimal32      decimal32FromFloat(value)     decimal32FromDouble(value)
    Decimal64      decimal64FromFloat(value)     decimal64FromDouble(value)
    Decimal128     decimal128FromFloat(value)    decimal128FromDouble(value)

Note

This conversion or the one that follows is appropriate when the provenance of a value was a decimal representation (like a monetary value that has been converted to float or double).

  5. Express the value as the shortest decimal number that converts back exactly to the binary value. For example, given the binary value 0x3DCCCCCD above, that corresponding shortest decimal value is (unsurprisingly) .1, while the next lower value 0x3DCCCCCC has the shortest decimal .099999994 and the next higher value 0x3DCCCCCE has the shortest decimal .10000001. This is the most visually appealing result, but can be expensive and slow to compute. For this conversion, use:

    Result Type    float                             double
    Decimal32      decimal32FromFloat(value, -1)     decimal32FromDouble(value, -1)
    Decimal64      decimal64FromFloat(value, -1)     decimal64FromDouble(value, -1)
    Decimal128     decimal128FromFloat(value, -1)    decimal128FromDouble(value, -1)

  6. Express the value using a number of decimal places that restores the original decimal value assuming that it is a float which originated as an IBM/Perkin-Elmer/Interdata float value itself originally converted from a decimal value. For this conversion, use:

    Result Type    float
    Decimal32      decimal32FromFloat(value, 6)
    Decimal64      decimal64FromFloat(value, 6)
    Decimal128     decimal128FromFloat(value, 6)