Macros
#define BDLDFP_DECIMAL_DF(lit)  BloombergLP::bdldfp::Decimal32(BDLDFP_DECIMALIMPUTIL_DF(lit))
#define BDLDFP_DECIMAL_DD(lit)  BloombergLP::bdldfp::Decimal64(BDLDFP_DECIMALIMPUTIL_DD(lit))
#define BDLDFP_DECIMAL_DL(lit)  BloombergLP::bdldfp::Decimal128(BDLDFP_DECIMALIMPUTIL_DL(lit))
Provide IEEE-754 decimal floating-point types.
This component provides classes that implement decimal floating-point types that conform in layout, encoding, and operations to the IEEE-754 2008 standard. This component also provides two facets to support standard C++ streaming operators as specified by ISO/IEC TR-24733:2009. These classes are bdldfp::Decimal32 for 32-bit decimal floating-point numbers, bdldfp::Decimal64 for 64-bit decimal floating-point numbers, and bdldfp::Decimal128 for 128-bit decimal floating-point numbers.
Decimal encoded floating-point numbers are important where exact representation of decimal fractions is required, such as in financial transactions. Binary encoded floating-point numbers are generally optimal for complex computation but cannot exactly represent commonly encountered numbers such as 0.1, 0.2, and 0.99.
NOTE: Interconversion between binary and decimal floating-point values is fraught with misunderstanding and must be done carefully and with intent, taking into account the provenance of the data. See the discussion on conversion below and in the bdldfp_decimalconvertutil component.
The BDE decimal floating-point system has been designed from the ground up to be portable and to support writing portable decimal floating-point user code, even for systems that have no compiler or native library support for it, while taking advantage of native support (such as ISO/IEC TR 24732, the C99 decimal TR) when available.
bdldfp::DecimalNumGet and bdldfp::DecimalNumPut are I/O stream facets.
There are several ways to represent numbers when using digital computers. The simplest would be an integer format; however, such a format severely limits the range of numbers that can be represented, and it cannot represent real (non-integer) numbers directly at all. Integers might be used to represent real numbers of limited precision by treating them as a multiple of the real value being represented; these are often known as fixed-point numbers. However, general computations require higher precision and a larger range than integer and fixed-point types are able to provide efficiently. Floating-point numbers provide what integers cannot. They are able to represent a large range of real values (although not precisely) while using a fixed (and reasonable) amount of storage.
Floating-point numbers are constructed from a set of significant digits of a radix on a sliding scale, where their position is determined by an exponent over the same radix. For example, consider some 32-bit decimal (radix 10) floating-point numbers that have at most 7 significant digits (the significand):
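For illustration (values chosen arbitrarily), a 7-digit decimal significand scaled by a power of ten can represent values such as:

    1234567       ==  1234567 * 10^0
    1234567000    ==  1234567 * 10^3
    0.1234567     ==  1234567 * 10^-7
    1234.567      ==  1234567 * 10^-3
    123456789     has no exact 7-digit representation; it must be rounded,
                  for example to 1234568 * 10^2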
Floating-point numbers are standardized by IEEE-754 2008 in two major flavors: binary and decimal. Binary floating-point numbers are supported by most computer systems in the form of the float, double, and long double fundamental data types. While those types are not required to be binary, that is almost always the choice on modern binary computer architectures.
Floating-point approximation of real numbers creates a deliberate illusion. While it looks like we are working with real numbers, floating-point encodings are not able to represent real numbers precisely, since they have a restricted number of digits in the significand. In fact, a 64-bit floating-point type can represent fewer distinct values than a 64-bit binary integer. Yet, because floating-point encodings can represent numbers over a much larger range, including extremely small (fractional) numbers, they are useful in practice.
Floating-point peculiarities may be split into three categories: those that are due to the (binary) radix/base, those that are inherent properties of any floating-point representation, and finally those that are introduced by the IEEE-754 2008 standard. Decimal floating-point addresses only the first set of surprises, so users still need to be aware of the rest.
For example, with binary floating-point numbers the expression 0.1 + 0.2 == 0.3 is typically false. The problem is not limited to binary floating-point: decimal floating-point cannot represent the value of one third exactly.
Signed zeros are another surprise (see the sketch after these points): if f == 0.0, then 0 - f and -f will not result in the same value, because 0 - f will be +0.0 while -f will be -0.0.
Floating-point operations also use implicit input and output parameters from the floating-point environment (provided by the <fenv.h> C and <cfenv> C++ headers). To learn more about the floating-point environment read the subsection of the same title, but first make sure you read the next point as well.
Notes: (*) An IEEE floating-point user is any person, hardware, or software that uses the IEEE floating-point implementation.
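A minimal sketch of the signed-zero point above, using the builtin double type (the decimal types behave analogously):

    #include <cassert>
    #include <cmath>     // std::signbit

    int main()
    {
        double f = 0.0;
        double a = 0 - f;                // yields +0.0
        double b = -f;                   // yields -0.0

        assert(a == b);                  // +0.0 and -0.0 compare equal...
        assert(!std::signbit(a));        // ...but '0 - f' is positive zero
        assert(std::signbit(b));         // ...while '-f' is negative zero
        return 0;
    }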
NOTE: We currently do not give the user access to the floating-point environment used by our decimal system, so the description here is preliminary and generic. Note that since compilers and the C library already provide a (possibly binary-only) floating-point environment that we cannot change, our decimal floating-point environment implementation cannot conform to the C and C++ TRs (because those require extending the existing standard C library functions).
The floating-point environment provides implicit input and output parameters to floating-point operations (that are defined to use them). IEEE defined those parameters in principle, but how they are provided is left to be designed/defined by the implementors of the programming languages.
C (and consequently C++) decided to provide a so-called floating-point environment that has "thread storage duration", meaning that each thread of a multi-threaded program will have its own distinct floating-point environment.
The C/C++ floating-point environment consists of 3 major parts: the rounding mode, the traps and the status flags.
A floating-point rounding direction determines how the significand of a higher (or infinite) precision number gets rounded to fit into the limited number of significant digits (the significand) of the floating-point representation that needs to store it as the result of an operation. Note that the rounding is done in the radix of the representation, so binary floating-point will do binary rounding while decimal floating-point will do decimal rounding, and not all rounding modes are useful with all radixes. An example of a generally applicable rounding mode would be FE_TOWARDZERO (round towards zero).
Most floating-point operations in C and C++ do not take a rounding direction parameter (and the ones that are implemented as operators simply could not). When such operations (that do not have an explicit rounding direction parameter) need to do rounding, they use the rounding direction set in the floating-point environment (of their thread of execution).
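The mechanism can be illustrated with the standard <cfenv> interface for the (binary) floating-point environment; per the NOTE above, the environment used by the decimal types in this component is not user-accessible, so this sketch shows the general model only:

    #include <cfenv>
    #include <cstdio>

    int main()
    {
        // Some compilers need '#pragma STDC FENV_ACCESS ON' for well-defined
        // access to the floating-point environment.
        const int previous = std::fegetround();  // save the current direction
        std::fesetround(FE_TOWARDZERO);          // subsequent rounding goes toward zero

        volatile double one = 1.0, three = 3.0;
        std::printf("%.20f\n", one / three);     // result rounded toward zero

        std::fesetround(previous);               // restore the saved direction
        return 0;
    }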
Floating-point operations in C and C++ do not take a status flag output parameter. They report important events (such as underflow, overflow, or an inexact (rounded) result) by setting the appropriate status flag in the floating-point environment (of their thread of execution). (Note that this is very similar to how flags work in CPUs, and that is not a coincidence.) The flags work much like individual, boolean errno values. Operations may set them to true. Users may examine them (when interested) and also reset them (set them to 0) before an operation.
IEEE says that certain floating-point events are floating-point exceptions and that they result in invoking a handler. It may be a default handler (set a status flag and continue) or a user-defined handler. Floating-point traps are a C invention to enable "sort-of handlers" for floating-point exceptions, but unfortunately they all go to the same handler: the SIGFPE handler. To add insult to injury, setting which traps are active (which ones will cause a SIGFPE) is not standardized. So floating-point exceptions and handlers are considered pretty much useless in C. (All is not lost, since we do have the status flags. An application that wants to know about floating-point events can clear the flags prior to an operation and check their values afterwards.)
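The clear-then-examine pattern described above looks like this with the standard <cfenv> interface (again for the binary environment; the decimal environment is not currently exposed by this component):

    #include <cfenv>
    #include <cstdio>

    int main()
    {
        std::feclearexcept(FE_ALL_EXCEPT);        // reset the status flags

        volatile double one = 1.0, three = 3.0;
        volatile double d = one / three;          // 1/3 cannot be represented exactly
        (void)d;

        if (std::fetestexcept(FE_INEXACT)) {      // examine the flag afterwards
            std::printf("the result was rounded\n");
        }
        return 0;
    }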
The bdldfp_decimalutil utility component provides a set of decimal math functions that parallel those provided for binary floating-point in the C++ standard math library. Errors during computation of these functions (e.g., domain errors) will be reported through the setting of errno as described in the "Status Flags" section above. (Note that this method of reporting errors is atypical for BDE-provided interfaces, but matches the style used by the standard functions.)
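A sketch of that error-reporting convention; DecimalUtil::log is assumed here as one of the standard-library-parallel functions mentioned above (see bdldfp_decimalutil for the actual set):

    #include <bdldfp_decimal.h>
    #include <bdldfp_decimalutil.h>
    #include <cerrno>
    #include <cstdio>

    int main()
    {
        using namespace BloombergLP;

        errno = 0;                                              // clear before the call
        bdldfp::Decimal64 r =
            bdldfp::DecimalUtil::log(BDLDFP_DECIMAL_DD(-1.0));  // domain error
        (void)r;

        if (errno == EDOM) {                                    // examine errno afterwards
            std::printf("domain error reported\n");
        }
        return 0;
    }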
A floating-point representation of a number is defined as follows: sign * significand * BASE^exponent, where sign is -1 or +1, significand is an integer, BASE is a positive integer (but usually 2 or 10), and exponent is a negative or positive integer. Concrete examples of (decimal) numbers in the so-called scientific notation: 123.4567 is 1.234567e2, while -0.000000000000000000000000000000000000001234567 would be -1.234567e-41.
"base": the number base of the scaling used by the exponent; and by the significand
"bias": the number added to the exponent before it is stored in memory; 101, 398 and 6176 for the 32, 64 and 128 bit types respectively.
"exponent": the scaling applied to the significand is calculated by raising the base to the exponent (which may be also negative)
"quantum": (IEEE-754) the value of one unit at the last significant digit position; in other words the smallest difference that can be represented by a floating-point number without changing its exponent.
"mantissa": the old name for the significand
"radix": another name for base
"sign": +1 or -1, determines if the number is positive or negative. It is normally represented by a single sign bit.
"significand": the significant digits of the floating-point number; the value of the number is: sign * significand * base^exponent
"precision": the significant digits of the floating-point type in its base
"decimal precision": the maximum significant decimal digits of the floating-point type
"range": the smallest and largest number the type can represent. Note that for floating-point types there are at least two interpretations of minimum. It may be the largest negative number or the smallest number in absolute value) that can be represented.
"normalized number": 1 <= significand <= base
"normalization": finding the exponent such as 1 <= significand <= base
"denormal number": significand < 1
"densely packed decimal": one of the two IEEE significand encoding schemes
"binary integer significand": one of the two IEEE significand encoding schemes
"cohorts": equal numbers encoded using different exponents (to signify accuracy)
Binary floating-point formats give the best accuracy, they are the fastest (on binary computers), and they were carefully designed by IEEE to minimize rounding errors (errors due to the inherent imprecision of floating-point types) during a lengthy calculation. This makes them the best solution for serious scientific computation. However, they have a fatal flaw when it comes to numbers and calculations that involve humans. Humans think in base 10 - decimal. And as the example earlier has shown, binary floating-point formats are unable to precisely represent very common decimal real numbers; with binary floating-point, 0.1 + 0.2 != 0.3. (Why? Because none of the three numbers in that expression have an exact binary floating-point representation.)
Financial calculations are governed by laws and expectations that are based on decimal (base 10) thinking. Due to the inherent limitations of the binary floating-point format, doing such decimal-based calculations and algorithms using binary floating-point numbers is so involved and hard that it is considered not feasible. The IEEE-754 committee has recognized the issue and added specifications for three decimal floating-point types to its 2008 standard: the 32-, 64-, and 128-bit decimal floating-point formats.
Floating-point types are carefully designed trade-offs between saving space (in memory) and CPU cycles (for calculations) while still providing useful accuracy for computations. Decimal floating-point types represent a further compromise (compared to binary floating-point) in being able to represent fewer numbers (than their binary counterparts) and being slower, but providing exact representations for the numbers humans care about.
In the decimal floating-point world, 0.1 + 0.2 == 0.3, as humans expect, because each of those three numbers can be represented exactly in a decimal floating-point format.
Clients should be careful when using the conversions from float and double provided by this component. In situations where a float or double was originally obtained from a decimal floating-point representation (e.g., a bdldfp::Decimal, or a string like "4.1"), the conversions in bdldfp_decimalconvertutil will provide the correct conversion back to a decimal floating-point value. The conversions in this component provide the closest decimal floating-point value to the supplied binary floating-point representation, which may replicate the imprecision required to initially approximate the value in a binary representation. The conversions in this component are typically useful when converting binary floating-point values that have undergone mathematical operations that require rounding (so they are already inexact approximations).
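A sketch contrasting the two conversion paths; DecimalConvertUtil::decimal64FromDouble is assumed here (see bdldfp_decimalconvertutil):

    #include <bdldfp_decimal.h>
    #include <bdldfp_decimalconvertutil.h>
    #include <cassert>

    int main()
    {
        using namespace BloombergLP;

        double viaBinary = 0.1;        // a double holds only the closest binary value

        // Direct construction converts the binary value as-is and may carry
        // over the binary approximation error.
        bdldfp::Decimal64 direct(viaBinary);
        (void)direct;

        // When the double is known to have originated from a decimal value,
        // the convert utility restores the intended decimal number.
        bdldfp::Decimal64 restored =
            bdldfp::DecimalConvertUtil::decimal64FromDouble(viaBinary);

        assert(restored == BDLDFP_DECIMAL_DD(0.1));
        return 0;
    }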
In the binary floating-point world the formats are optimized for the highest precision, range, and speed. They are stored normalized and therefore store no information about their accuracy. In finance, the area that decimal floating-point types target, the accuracy of a number is usually very important. We may have a number that is 1, but we know it may be 1.001 or 1.002, etc. And we may have another number 1 that we know to be accurate to 6 significant digits. We would display the former number as 1.00 and the latter number as 1.00000. The decimal floating-point types are able to store both numbers and their precision using so-called cohorts: the 1.00 will be stored as 100e-2 while 1.00000 will be stored as 100000e-5.
Cohorts compare equal, and mostly behave the same way in calculations, except when it comes to the accuracy of the result. If a number is accurate to only 5 digits, it would be a mistake to expect more than 5 digits of accuracy from a calculation involving it. The IEEE-754 rules for cohorts (in calculations) ensure that results will be a cohort that indicates the proper expected accuracy.
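A small sketch of the behavior described above; DecimalUtil::sameQuantum is assumed here (see bdldfp_decimalutil):

    #include <bdldfp_decimal.h>
    #include <bdldfp_decimalutil.h>
    #include <cassert>

    int main()
    {
        using namespace BloombergLP;

        bdldfp::Decimal64 a = BDLDFP_DECIMAL_DD(1.00);     // stored as 100e-2
        bdldfp::Decimal64 b = BDLDFP_DECIMAL_DD(1.00000);  // stored as 100000e-5

        assert(a == b);                                    // cohorts compare equal

        // ...yet they carry different quanta, i.e., different precision information
        assert(!bdldfp::DecimalUtil::sameQuantum(a, b));
        return 0;
    }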
The component has also been designed to resemble the C++ Decimal Floating-Point Technical Report ISO/IEC TR-24733 of 2009 and its C++11 update of ISO/IEC JTC1 SC22 WG21 N3407=12-0097 of 2012, as much as is possible with C++03 compilers and environments that do not provide decimal floating-point support in any form.
At the time of writing there is just one standard about decimal floating-point, the IEEE-754 2008 standard, and the content of this component conforms to it. (The component does not implement all required IEEE-754 functionality because, due to our architectural design guidelines, some of it must go into a separate, so-called utility, component.)
The component uses the ISO/IEC TR 24732 - the C Decimal Floating-Point TR - in its implementation where it is available.
The component closely resembles ISO/IEC TR 24733 - the C++ Decimal Floating-Point TR - but does not fully conform to it for several reasons. The major reasons are: it is well known that TR 24733 has to change before it is included into the C++ standard; the TR would require us to change system header files we do not have access to.
In the following subsections the differences to the C++ technical report are explained in detail, including a short rationale.
BDE design guidelines do not allow namespace-level functions other than operators and aspects. According to BDE design principles, all such functions are placed into a utility component.
This change is necessary to disable the use of comparison operators without explicit casting. See No Heterogeneous Comparisons Without Casting.
The C and C++ Decimal TRs refer to IEEE-754 for the specification of the heterogeneous comparison operators (those comparing decimal floating-point types to binary floating-point types and integer types); however, IEEE-754 does not specify such operations, leaving them unspecified. To make matters worse, there are two possible ways to implement those operators (convert the decimal to the other type, or convert the other type to decimal first), and depending on which one is chosen, the result of the operator will be different. Also, the C committee is considering the removal of those operators. We have removed them until we know how to implement them. Comparing decimal types to those other types is still possible; it just requires explicit casting/conversion in the user code.
IEEE-754 designates the 32-bit floating-point types "interchange formats" and does not require or recommend arithmetic or computing support of any kind for them. The C (and consequently the C++) TR goes against the IEEE design and requires _Decimal32 (and std::decimal32) to provide computing support; however, in a twist, it allows the computation to be performed using one of the larger types (64 or 128 bits). The rationale from the C committee is that small embedded systems may need to do their calculations using the small type (so they have made it mandatory for everyone). To conform to the requirement we provide arithmetic and computing support for the Decimal32 type, but users need to be aware of the drawbacks of calculations using the small type. Industry experience with the float C type (a 32-bit floating-point type, usually binary) has shown that enabling computation using small floating-point types is a mistake that causes novice programmers to write calculations that are very slow and inaccurate.
We recommend what IEEE recommends: convert your 32-bit types on receipt to a type with higher precision (usually 64 bits will suffice), do your calculations using that larger type, and convert back to the 32-bit type only if your output interchange format requires it.
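A minimal sketch of that recommendation (the addFee helper and the fee value are hypothetical):

    #include <bdldfp_decimal.h>

    using namespace BloombergLP;

    // Hypothetical helper: widen the 32-bit input, compute in 64 bits, and
    // narrow again only because the (assumed) output format requires 32 bits.
    bdldfp::Decimal32 addFee(bdldfp::Decimal32 price)
    {
        bdldfp::Decimal64 wide(price);            // widening conversion is exact
        wide += BDLDFP_DECIMAL_DD(0.25);          // do the arithmetic in Decimal64
        return bdldfp::Decimal32(wide);           // convert back at the boundary only
    }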
Due to BDE design rules and some implementation needs, we have extended the C++ TR-mandated interface of the decimal floating-point types to include support for accessing the underlying data (type) and for parsing literals in support of the portable literal macros.
Note that using any of these public member functions will render your code non-portable to non-BDE (but standards conforming) implementations.
A basic format type that supports input, output, relational operators, construction from the TR-mandated data types, and arithmetic operations. The type has a size of exactly 32 bits. It supports 7 significant decimal digits and an exponent range of -95 to 96. The smallest non-zero value that can be represented is 1e-101.
Portable Decimal32 literals are created using the BDLDFP_DECIMAL_DF macro.
A basic format type that supports input, output, relational operators, construction from the TR-mandated data types, and arithmetic operations. The type has a size of exactly 64 bits. It supports 16 significant decimal digits and an exponent range of -383 to 384. The smallest non-zero value that can be represented is 1e-398.
Portable Decimal64 literals are created using the BDLDFP_DECIMAL_DD macro.
A basic format type that supports input, output, relational operators, construction from the TR-mandated data types, and arithmetic operations. The type has a size of exactly 128 bits. It supports 34 significant decimal digits and an exponent range of -6143 to 6144. The smallest non-zero value that can be represented is 1e-6176.
Portable Decimal128 literals are created using the BDLDFP_DECIMAL_DL macro.
Streaming decimal floating-point numbers to an output stream supports the formatting flags for width, capitalization, and justification, as well as the flags used to output numbers in natural, scientific, and fixed notations. When the scientific or fixed flags are set, the precision manipulator specifies how many digits of the decimal number are to be printed; otherwise all significant digits of the decimal number are output using natural notation.
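A sketch of how those flags interact with a Decimal64 value; the quoted outputs are what the description above implies:

    #include <bdldfp_decimal.h>
    #include <iomanip>
    #include <iostream>

    int main()
    {
        using namespace BloombergLP;

        bdldfp::Decimal64 d = BDLDFP_DECIMAL_DD(123.456789);

        std::cout << d << '\n';                   // natural notation, all
                                                  // significant digits
        std::cout << std::fixed << std::setprecision(2)
                  << d << '\n';                   // expected: 123.46
        std::cout << std::scientific << std::setprecision(3)
                  << d << '\n';                   // expected: 1.235e+02
        return 0;
    }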
The user-defined literal operator "" _d32, operator "" _d64, and operator "" _d128 are declared for the bdldfp::Decimal32, bdldfp::Decimal64, and bdldfp::Decimal128 types, respectively. These user-defined literal suffixes can be applied to both numeric and string literals (e.g., 1.2_d128, "1.2"_d128, or "inf"_d128) to produce a decimal floating-point value of the indicated type by parsing the argument string or numeric value:
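For example (C++11 or later), a sketch of the suffixes in use:

    #include <bdldfp_decimal.h>
    #include <cassert>

    int main()
    {
        using namespace BloombergLP;
        using namespace bdldfp::DecimalLiterals;   // see the recommendation below

        bdldfp::Decimal32  a = 1.2_d32;            // from a numeric literal
        bdldfp::Decimal64  b = "3.4"_d64;          // from a string literal
        bdldfp::Decimal128 c = "inf"_d128;         // special values parse as well
        (void)a; (void)c;

        assert(b == BDLDFP_DECIMAL_DD(3.4));
        return 0;
    }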
The operators providing literals are available in the BloombergLP::bdldfp::literals::DecimalLiterals namespace (where literals and DecimalLiterals are both inline namespaces). Because of inline namespaces, there are several viable options for a using declaration, but we recommend using namespace bdldfp::DecimalLiterals, which minimizes the scope of the using declaration.
Note that the parsing follows the rules specified for the strtod32, strtod64, and strtod128 functions in section 9.6 of the ISO/IEC TR 24732 C Decimal Floating-Point Technical Report.
Also note that these operators can be used only if the compiler supports the C++11 standard.
In this section, we show the intended usage of this component.
If your compiler does not support the C Decimal TR, it does not support decimal floating-point literals, only binary floating-point literals. The problem with binary floating-point literals is the same as with binary floating-point numbers in general: they cannot represent the decimal numbers we care about. To solve this problem, this component provides three macros that can be used to initialize decimal floating-point types with non-integer values, precisely. These macros will evaluate to real, C language literals where those are supported and to a runtime-parsed solution otherwise. The following code demonstrates the use of these macros as well as mixed-type arithmetic and comparisons:
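A sketch of such initialization and mixed-type use (values chosen for illustration):

    #include <bdldfp_decimal.h>
    #include <cassert>

    int main()
    {
        using namespace BloombergLP;

        bdldfp::Decimal32  d32( BDLDFP_DECIMAL_DF(0.1));
        bdldfp::Decimal64  d64( BDLDFP_DECIMAL_DD(0.2));
        bdldfp::Decimal128 d128(BDLDFP_DECIMAL_DL(0.3));

        assert(d32 + d64 == d128);                     // mixed-type arithmetic
                                                       // and comparison
        assert(bdldfp::Decimal64(d32) * 10 == bdldfp::Decimal64(1));
        assert(d64  * 10 == bdldfp::Decimal64(2));
        assert(d128 * 10 == bdldfp::Decimal128(3));
        return 0;
    }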
Suppose we need to add two (decimal) numbers and then tell if the result is a particular decimal number or not. That can get difficult with binary floating-point, but easy with decimal:
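A sketch of such a check, contrasting binary and decimal behavior:

    #include <bdldfp_decimal.h>
    #include <cassert>
    #include <limits>

    int main()
    {
        using namespace BloombergLP;

        if (std::numeric_limits<double>::radix == 2) {
            assert(0.1 + 0.2 != 0.3);                  // surprising, but true in binary
        }

        assert(BDLDFP_DECIMAL_DD(0.1) + BDLDFP_DECIMAL_DD(0.2)
               == BDLDFP_DECIMAL_DD(0.3));             // exact in decimal
        return 0;
    }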