Macros
#define BDLDFP_DECIMAL_DF(lit)  BloombergLP::bdldfp::Decimal32(BDLDFP_DECIMALIMPUTIL_DF(lit))
#define BDLDFP_DECIMAL_DD(lit)  BloombergLP::bdldfp::Decimal64(BDLDFP_DECIMALIMPUTIL_DD(lit))
#define BDLDFP_DECIMAL_DL(lit)  BloombergLP::bdldfp::Decimal128(BDLDFP_DECIMALIMPUTIL_DL(lit))
Provide IEEE-754 decimal floating-point types.
This component provides classes that implement decimal floating-point types that conform in layout, encoding, and operations to the IEEE-754 2008 standard. This component also provides two facets to support standard C++ streaming operators as specified by ISO/IEC TR-24733:2009. These classes are bdldfp::Decimal32 for 32-bit decimal floating-point numbers, bdldfp::Decimal64 for 64-bit decimal floating-point numbers, and bdldfp::Decimal128 for 128-bit decimal floating-point numbers.
Decimal encoded floating-point numbers are important where exact representation of decimal fractions is required, such as in financial transactions. Binary encoded floating-point numbers are generally optimal for complex computation but cannot exactly represent commonly encountered numbers such as 0.1, 0.2, and 0.99.
NOTE: Interconversion between binary and decimal floating-point values is fraught with misunderstanding and must be done carefully and with intent, taking into account the provenance of the data. See the discussion on conversion below and in the bdldfp_decimalconvertutil component.
The BDE decimal floating-point system has been designed from the ground up to be portable and to support writing portable decimal floating-point user code, even for systems that have no compiler or native library support for it, while taking advantage of native support (such as ISO/IEC TR 24732, the C99 decimal TR) when available.
bdldfp::DecimalNumGet and bdldfp::DecimalNumPut are I/O stream facets.
There are several ways to represent numbers when using digital computers. The simplest would be an integer format; however, such a format severely limits the range of numbers that can be represented, and it cannot represent real (non-integer) numbers directly at all. Integers might be used to represent real numbers of limited precision by treating them as a multiple of the real value being represented; these are often known as fixed-point numbers. However, general computations require higher precision and a larger range than integer and fixed-point types are able to provide efficiently. Floating-point numbers provide what integers cannot. They are able to represent a large range of real values (although not precisely) while using a fixed (and reasonable) amount of storage.
Floating-point numbers are constructed from a set of significant digits of a radix on a sliding scale, where their position is determined by an exponent over the same radix. For example, consider some 32-bit decimal (radix 10) floating-point numbers that have at most 7 significant digits (the significand):
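For illustration (values chosen arbitrarily), a 7-digit decimal significand scaled by a power of ten can represent values such as:

    1234567       ==  1234567 * 10^0
    1234567000    ==  1234567 * 10^3
    0.1234567     ==  1234567 * 10^-7
    1234.567      ==  1234567 * 10^-3
    123456789     has no exact 7-digit representation; it must be rounded,
                  for example to 1234568 * 10^2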
Floating-point numbers are standardized by IEEE-754 2008 in two major flavors: binary and decimal. Binary floating-point numbers are supported by most computer systems in the form of the float, double, and long double fundamental data types. While those types are not required to be binary, that is almost always the choice on modern binary computer architectures.
Floating-point approximation of real numbers creates a deliberate illusion. While it looks like we are working with real numbers, floating-point encodings are not able to represent real numbers precisely, since they have a restricted number of digits in the significand. In fact, a 64-bit floating-point type can represent fewer distinct values than a 64-bit binary integer. Yet, because floating-point encodings can represent numbers over a much larger range, including extremely small (fractional) numbers, they are useful in practice.
Floating-point peculiarities may be split into three categories: those that are due to the (binary) radix/base, those that are inherent properties of any floating-point representation, and finally those that are introduced by the IEEE-754 2008 standard. Decimal floating-point addresses only the first set of surprises, so users still need to be aware of the rest.
For example, with binary floating-point numbers the expression 0.1 + 0.2 == 0.3 is typically false. The problem is not limited to binary floating-point: decimal floating-point cannot represent the value of one third exactly.
Signed zeros are another surprise (see the sketch after these points): if f == 0.0, then 0 - f and -f will not result in the same value, because 0 - f will be +0.0 while -f will be -0.0.
Floating-point operations also use implicit input and output parameters from the floating-point environment (provided by the <fenv.h> C and <cfenv> C++ headers). To learn more about the floating-point environment read the subsection of the same title, but first make sure you read the next point as well.
Notes: (*) An IEEE floating-point user is any person, hardware, or software that uses the IEEE floating-point implementation.
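A minimal sketch of the signed-zero point above, using the builtin double type (the decimal types behave analogously):

    #include <cassert>
    #include <cmath>     // std::signbit

    int main()
    {
        double f = 0.0;
        double a = 0 - f;                // yields +0.0
        double b = -f;                   // yields -0.0

        assert(a == b);                  // +0.0 and -0.0 compare equal...
        assert(!std::signbit(a));        // ...but '0 - f' is positive zero
        assert(std::signbit(b));         // ...while '-f' is negative zero
        return 0;
    }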
NOTE: We currently do not give the user access to the floating-point environment used by our decimal system, so the description here is preliminary and generic. Note that since compilers and the C library already provide a (possibly binary-only) floating-point environment that we cannot change, our decimal floating-point environment implementation cannot conform to the C and C++ TRs (because those require extending the existing standard C library functions).
The floating-point environment provides implicit input and output parameters to floating-point operations (that are defined to use them). IEEE defined those parameters in principle, but how they are provided is left to be designed/defined by the implementors of the programming languages.
C (and consequently C++) decided to provide a so-called floating-point environment that has "thread storage duration", meaning that each thread of a multi-threaded program will have its own distinct floating-point environment.
The C/C++ floating-point environment consists of 3 major parts: the rounding mode, the traps and the status flags.
A floating-point rounding direction determines how the significand of a higher (or infinite) precision number gets rounded to fit into the limited number of significant digits (the significand) of the floating-point representation that needs to store it as the result of an operation. Note that the rounding is done in the radix of the representation, so binary floating-point will do binary rounding while decimal floating-point will do decimal rounding, and not all rounding modes are useful with all radixes. An example of a generally applicable rounding mode would be FE_TOWARDZERO (round towards zero).
Most floating-point operations in C and C++ do not take a rounding direction parameter (and the ones that are implemented as operators simply could not). When such operations (that do not have an explicit rounding direction parameter) need to do rounding, they use the rounding direction set in the floating-point environment (of their thread of execution).
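The mechanism can be illustrated with the standard <cfenv> interface for the (binary) floating-point environment; per the NOTE above, the environment used by the decimal types in this component is not user-accessible, so this sketch shows the general model only:

    #include <cfenv>
    #include <cstdio>

    int main()
    {
        // Some compilers need '#pragma STDC FENV_ACCESS ON' for well-defined
        // access to the floating-point environment.
        const int previous = std::fegetround();  // save the current direction
        std::fesetround(FE_TOWARDZERO);          // subsequent rounding goes toward zero

        volatile double one = 1.0, three = 3.0;
        std::printf("%.20f\n", one / three);     // result rounded toward zero

        std::fesetround(previous);               // restore the saved direction
        return 0;
    }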
Floating-point operations in C and C++ do not take a status flag output parameter. They report important events (such as underflow, overflow, or an inexact (rounded) result) by setting the appropriate status flag in the floating-point environment (of their thread of execution). (Note that this is very similar to how flags work in CPUs, and that is not a coincidence.) The flags work much like individual, boolean errno values. Operations may set them to true. Users may examine them (when interested) and also reset them (set them to 0) before an operation.
IEEE says that certain floating-point events are floating-point exceptions and that they result in invoking a handler. It may be a default handler (set a status flag and continue) or a user-defined handler. Floating-point traps are a C invention to enable "sort-of handlers" for floating-point exceptions, but unfortunately they all go to the same handler: the SIGFPE handler. To add insult to injury, setting which traps are active (which ones will cause a SIGFPE) is not standardized. So floating-point exceptions and handlers are considered pretty much useless in C. (All is not lost, since we do have the status flags. An application that wants to know about floating-point events can clear the flags prior to an operation and check their values afterwards.)
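The clear-then-examine pattern described above looks like this with the standard <cfenv> interface (again for the binary environment; the decimal environment is not currently exposed by this component):

    #include <cfenv>
    #include <cstdio>

    int main()
    {
        std::feclearexcept(FE_ALL_EXCEPT);        // reset the status flags

        volatile double one = 1.0, three = 3.0;
        volatile double d = one / three;          // 1/3 cannot be represented exactly
        (void)d;

        if (std::fetestexcept(FE_INEXACT)) {      // examine the flag afterwards
            std::printf("the result was rounded\n");
        }
        return 0;
    }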
The bdldfp_decimalutil utility component provides a set of decimal math functions that parallel those provided for binary floating-point in the C++ standard math library. Errors during computation of these functions (e.g., domain errors) will be reported through the setting of errno as described in the "Status Flags" section above. (Note that this method of reporting errors is atypical for BDE-provided interfaces, but matches the style used by the standard functions.)
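A sketch of that error-reporting convention; DecimalUtil::log is assumed here as one of the standard-library-parallel functions mentioned above (see bdldfp_decimalutil for the actual set):

    #include <bdldfp_decimal.h>
    #include <bdldfp_decimalutil.h>
    #include <cerrno>
    #include <cstdio>

    int main()
    {
        using namespace BloombergLP;

        errno = 0;                                              // clear before the call
        bdldfp::Decimal64 r =
            bdldfp::DecimalUtil::log(BDLDFP_DECIMAL_DD(-1.0));  // domain error
        (void)r;

        if (errno == EDOM) {                                    // examine errno afterwards
            std::printf("domain error reported\n");
        }
        return 0;
    }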
A floating-point representation of a number is defined as follows: sign * significand * BASE^exponent, where sign is -1 or +1, significand is an integer, BASE is a positive integer (but usually 2 or 10), and exponent is a negative or positive integer. Concrete examples of (decimal) numbers in the so-called scientific notation: 123.4567 is 1.234567e2, while -0.000000000000000000000000000000000000001234567 would be -1.234567e-41.
"base": the number base of the scaling used by the exponent; and by the significand
"bias": the number added to the exponent before it is stored in memory; 101, 398 and 6176 for the 32, 64 and 128 bit types respectively.
"exponent": the scaling applied to the significand is calculated by raising the base to the exponent (which may be also negative)
"quantum": (IEEE-754) the value of one unit at the last significant digit position; in other words the smallest difference that can be represented by a floating-point number without changing its exponent.
"mantissa": the old name for the significand
"radix": another name for base
"sign": +1 or -1, determines if the number is positive or negative. It is normally represented by a single sign bit.
"significand": the significant digits of the floating-point number; the value of the number is: sign * significand * base^exponent
"precision": the significant digits of the floating-point type in its base
"decimal precision": the maximum significant decimal digits of the floating-point type
"range": the smallest and largest number the type can represent. Note that for floating-point types there are at least two interpretations of minimum. It may be the largest negative number or the smallest number in absolute value) that can be represented.
"normalized number": 1 <= significand <= base
"normalization": finding the exponent such as 1 <= significand <= base
"denormal number": significand < 1
"densely packed decimal": one of the two IEEE significand encoding schemes
"binary integer significand": one of the two IEEE significand encoding schemes
"cohorts": equal numbers encoded using different exponents (to signify accuracy)
Binary floating-point formats give the best accuracy, they are the fastest (on binary computers), and they were carefully designed by IEEE to minimize rounding errors (errors due to the inherent imprecision of floating-point types) during a lengthy calculation. This makes them the best solution for serious scientific computation. However, they have a fatal flaw when it comes to numbers and calculations that involve humans. Humans think in base 10 - decimal. And as the example earlier has shown, binary floating-point formats are unable to precisely represent very common decimal real numbers; with binary floating-point, 0.1 + 0.2 != 0.3. (Why? Because none of the three numbers in that expression have an exact binary floating-point representation.)
Financial calculations are governed by laws and expectations that are based on decimal (base 10) thinking. Due to the inherent limitations of the binary floating-point format, doing such decimal-based calculations and algorithms using binary floating-point numbers is so involved and hard that it is considered not feasible. The IEEE-754 committee has recognized the issue and added specifications for three decimal floating-point types to its 2008 standard: the 32-, 64-, and 128-bit decimal floating-point formats.
Floating-point types are carefully designed trade-offs between saving space (in memory) and CPU cycles (for calculations) while still providing useful accuracy for computations. Decimal floating-point types represent a further compromise (compared to binary floating-point) in being able to represent fewer numbers (than their binary counterparts) and being slower, but providing exact representations for the numbers humans care about.
In the decimal floating-point world, 0.1 + 0.2 == 0.3, as humans expect, because each of those three numbers can be represented exactly in a decimal floating-point format.
Clients should be careful when using the conversions from float and double provided by this component. In situations where a float or double was originally obtained from a decimal floating-point representation (e.g., a bdldfp::Decimal, or a string like "4.1"), the conversions in bdldfp_decimalconvertutil will provide the correct conversion back to a decimal floating-point value. The conversions in this component provide the closest decimal floating-point value to the supplied binary floating-point representation, which may replicate the imprecision required to initially approximate the value in a binary representation. The conversions in this component are typically useful when converting binary floating-point values that have undergone mathematical operations that require rounding (so they are already inexact approximations).
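A sketch contrasting the two conversion paths; DecimalConvertUtil::decimal64FromDouble is assumed here (see bdldfp_decimalconvertutil):

    #include <bdldfp_decimal.h>
    #include <bdldfp_decimalconvertutil.h>
    #include <cassert>

    int main()
    {
        using namespace BloombergLP;

        double viaBinary = 0.1;        // a double holds only the closest binary value

        // Direct construction converts the binary value as-is and may carry
        // over the binary approximation error.
        bdldfp::Decimal64 direct(viaBinary);
        (void)direct;

        // When the double is known to have originated from a decimal value,
        // the convert utility restores the intended decimal number.
        bdldfp::Decimal64 restored =
            bdldfp::DecimalConvertUtil::decimal64FromDouble(viaBinary);

        assert(restored == BDLDFP_DECIMAL_DD(0.1));
        return 0;
    }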
In the binary floating-point world the formats are optimized for the highest precision, range, and speed. They are stored normalized and therefore store no information about their accuracy. In finance, the area that decimal floating-point types target, the accuracy of a number is usually very important. We may have a number that is 1, but we know it may be 1.001 or 1.002, etc. And we may have another number 1 that we know to be accurate to 6 significant digits. We would display the former number as 1.00 and the latter number as 1.00000. The decimal floating-point types are able to store both numbers and their precision using so-called cohorts: the 1.00 will be stored as 100e-2 while 1.00000 will be stored as 100000e-5.
Cohorts compare equal, and mostly behave the same way in calculations, except when it comes to the accuracy of the result. If a number is accurate to only 5 digits, it would be a mistake to expect more than 5 digits of accuracy from a calculation involving it. The IEEE-754 rules for cohorts (in calculations) ensure that results will be a cohort that indicates the proper expected accuracy.
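A small sketch of the behavior described above; DecimalUtil::sameQuantum is assumed here (see bdldfp_decimalutil):

    #include <bdldfp_decimal.h>
    #include <bdldfp_decimalutil.h>
    #include <cassert>

    int main()
    {
        using namespace BloombergLP;

        bdldfp::Decimal64 a = BDLDFP_DECIMAL_DD(1.00);     // stored as 100e-2
        bdldfp::Decimal64 b = BDLDFP_DECIMAL_DD(1.00000);  // stored as 100000e-5

        assert(a == b);                                    // cohorts compare equal

        // ...yet they carry different quanta, i.e., different precision information
        assert(!bdldfp::DecimalUtil::sameQuantum(a, b));
        return 0;
    }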
The component has also been designed to resemble the C++ Decimal Floating-Point Technical Report ISO/IEC TR-24733 of 2009 and its C++11 update of ISO/IEC JTC1 SC22 WG21 N3407=12-0097 of 2012, as much as is possible with C++03 compilers and environments that do not provide decimal floating-point support in any form.
At the time of writing there is just one standard about decimal floating-point, the IEEE-754 2008 standard, and the content of this component conforms to it. (The component does not implement all required IEEE-754 functionality because, due to our architectural design guidelines, some of it must go into a separate, so-called utility, component.)
The component uses the ISO/IEC TR 24732 - the C Decimal Floating-Point TR - in its implementation where it is available.
The component closely resembles ISO/IEC TR 24733 - the C++ Decimal Floating-Point TR - but does not fully conform to it for several reasons. The major reasons are: it is well known that TR 24733 has to change before it is included into the C++ standard; the TR would require us to change system header files we do not have access to.
In the following subsections the differences to the C++ technical report are explained in detail, including a short rationale.
BDE design guidelines do not allow namespace-level functions other than operators and aspects. According to BDE design principles, all such functions are placed into a utility component.
This change is necessary to disable the use of comparison operators without explicit casting. See No Heterogeneous Comparisons Without Casting.
The C and C++ Decimal TRs refer to IEEE-754 for the specification of the heterogeneous comparison operators (those comparing decimal floating-point types to binary floating-point types and integer types); however, IEEE-754 does not specify such operations, leaving them unspecified. To make matters worse, there are two possible ways to implement those operators (convert the decimal to the other type, or convert the other type to decimal first), and depending on which one is chosen, the result of the operator will be different. Also, the C committee is considering the removal of those operators. We have removed them until we know how to implement them. Comparing decimal types to those other types is still possible; it just requires explicit casting/conversion in the user code.
IEEE-754 designates the 32-bit floating-point types "interchange formats" and does not require or recommend arithmetic or computing support of any kind for them. The C (and consequently the C++) TR goes against the IEEE design and requires _Decimal32 (and std::decimal32) to provide computing support; however, in a twist, it allows the computation to be performed using one of the larger types (64 or 128 bits). The rationale from the C committee is that small embedded systems may need to do their calculations using the small type (so they have made it mandatory for everyone). To conform to the requirement we provide arithmetic and computing support for the Decimal32 type, but users need to be aware of the drawbacks of calculations using the small type. Industry experience with the float C type (a 32-bit floating-point type, usually binary) has shown that enabling computation using small floating-point types is a mistake that causes novice programmers to write calculations that are very slow and inaccurate.
We recommend what IEEE recommends: convert your 32-bit types on receipt to a type with higher precision (usually 64 bits will suffice), do your calculations using that larger type, and convert back to the 32-bit type only if your output interchange format requires it.
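A minimal sketch of that recommendation (the addFee helper and the fee value are hypothetical):

    #include <bdldfp_decimal.h>

    using namespace BloombergLP;

    // Hypothetical helper: widen the 32-bit input, compute in 64 bits, and
    // narrow again only because the (assumed) output format requires 32 bits.
    bdldfp::Decimal32 addFee(bdldfp::Decimal32 price)
    {
        bdldfp::Decimal64 wide(price);            // widening conversion is exact
        wide += BDLDFP_DECIMAL_DD(0.25);          // do the arithmetic in Decimal64
        return bdldfp::Decimal32(wide);           // convert back at the boundary only
    }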
Due to BDE design rules and some implementation needs, we have extended the C++ TR-mandated interface of the decimal floating-point types to include support for accessing the underlying data (type) and for parsing literals in support of the portable literal macros.
Note that using any of these public member functions will render your code non-portable to non-BDE (but standards conforming) implementations.
A basic format type that supports input, output, relational operators, construction from the TR-mandated data types, and arithmetic operations. The type has a size of exactly 32 bits. It supports 7 significant decimal digits and an exponent range of -95 to 96. The smallest non-zero value that can be represented is 1e-101.
Portable Decimal32 literals are created using the BDLDFP_DECIMAL_DF macro.
A basic format type that supports input, output, relational operators, construction from the TR-mandated data types, and arithmetic operations. The type has a size of exactly 64 bits. It supports 16 significant decimal digits and an exponent range of -383 to 384. The smallest non-zero value that can be represented is 1e-398.
Portable Decimal64 literals are created using the BDLDFP_DECIMAL_DD macro.
A basic format type that supports input, output, relational operators, construction from the TR-mandated data types, and arithmetic operations. The type has a size of exactly 128 bits. It supports 34 significant decimal digits and an exponent range of -6143 to 6144. The smallest non-zero value that can be represented is 1e-6176.
Portable Decimal128 literals are created using the BDLDFP_DECIMAL_DL macro.
Streaming decimal floating-point numbers to an output stream supports the formatting flags for width, capitalization, and justification, as well as the flags used to output numbers in natural, scientific, and fixed notations. When the scientific or fixed flags are set, the precision manipulator specifies how many digits of the decimal number are to be printed; otherwise all significant digits of the decimal number are output using natural notation.
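A sketch of how those flags interact with a Decimal64 value; the quoted outputs are what the description above implies:

    #include <bdldfp_decimal.h>
    #include <iomanip>
    #include <iostream>

    int main()
    {
        using namespace BloombergLP;

        bdldfp::Decimal64 d = BDLDFP_DECIMAL_DD(123.456789);

        std::cout << d << '\n';                   // natural notation, all
                                                  // significant digits
        std::cout << std::fixed << std::setprecision(2)
                  << d << '\n';                   // expected: 123.46
        std::cout << std::scientific << std::setprecision(3)
                  << d << '\n';                   // expected: 1.235e+02
        return 0;
    }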
The user-defined literal operator "" _d32, operator "" _d64, and operator "" _d128 are declared for the bdldfp::Decimal32, bdldfp::Decimal64, and bdldfp::Decimal128 types, respectively. These user-defined literal suffixes can be applied to both numeric and string literals (e.g., 1.2_d128, "1.2"_d128, or "inf"_d128) to produce a decimal floating-point value of the indicated type by parsing the argument string or numeric value:
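For example (C++11 or later), a sketch of the suffixes in use:

    #include <bdldfp_decimal.h>
    #include <cassert>

    int main()
    {
        using namespace BloombergLP;
        using namespace bdldfp::DecimalLiterals;   // see the recommendation below

        bdldfp::Decimal32  a = 1.2_d32;            // from a numeric literal
        bdldfp::Decimal64  b = "3.4"_d64;          // from a string literal
        bdldfp::Decimal128 c = "inf"_d128;         // special values parse as well
        (void)a; (void)c;

        assert(b == BDLDFP_DECIMAL_DD(3.4));
        return 0;
    }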
The operators providing literals are available in the BloombergLP::bdldfp::literals::DecimalLiterals namespace (where literals and DecimalLiterals are both inline namespaces). Because of inline namespaces, there are several viable options for a using declaration, but we recommend using namespace bdldfp::DecimalLiterals, which minimizes the scope of the using declaration.
Note that the parsing follows the rules specified for the strtod32, strtod64, and strtod128 functions in section 9.6 of the ISO/IEC TR 24732 C Decimal Floating-Point Technical Report.
Also note that these operators can be used only if the compiler supports the C++11 standard.
In this section, we show the intended usage of this component.
If your compiler does not support the C Decimal TR, it does not support decimal floating-point literals, only binary floating-point literals. The problem with binary floating-point literals is the same as with binary floating-point numbers in general: they cannot represent the decimal numbers we care about. To solve this problem, this component provides three macros that can be used to initialize decimal floating-point types with non-integer values, precisely. These macros will evaluate to real, C language literals where those are supported and to a runtime-parsed solution otherwise. The following code demonstrates the use of these macros as well as mixed-type arithmetic and comparisons:
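A sketch of such initialization and mixed-type use (values chosen for illustration):

    #include <bdldfp_decimal.h>
    #include <cassert>

    int main()
    {
        using namespace BloombergLP;

        bdldfp::Decimal32  d32( BDLDFP_DECIMAL_DF(0.1));
        bdldfp::Decimal64  d64( BDLDFP_DECIMAL_DD(0.2));
        bdldfp::Decimal128 d128(BDLDFP_DECIMAL_DL(0.3));

        assert(d32 + d64 == d128);                     // mixed-type arithmetic
                                                       // and comparison
        assert(bdldfp::Decimal64(d32) * 10 == bdldfp::Decimal64(1));
        assert(d64  * 10 == bdldfp::Decimal64(2));
        assert(d128 * 10 == bdldfp::Decimal128(3));
        return 0;
    }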
Suppose we need to add two (decimal) numbers and then tell if the result is a particular decimal number or not. That can get difficult with binary floating-point, but easy with decimal:
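A sketch of such a check, contrasting binary and decimal behavior:

    #include <bdldfp_decimal.h>
    #include <cassert>
    #include <limits>

    int main()
    {
        using namespace BloombergLP;

        if (std::numeric_limits<double>::radix == 2) {
            assert(0.1 + 0.2 != 0.3);                  // surprising, but true in binary
        }

        assert(BDLDFP_DECIMAL_DD(0.1) + BDLDFP_DECIMAL_DD(0.2)
               == BDLDFP_DECIMAL_DD(0.3));             // exact in decimal
        return 0;
    }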