BDE 4.14.0 Production release
bdlb_numericparseutil

Detailed Description

Outline

Purpose

Provide conversions from text into fundamental numeric types.

Classes

Description

This component provides a namespace, bdlb::NumericParseUtil, containing utility functions for parsing ASCII text representations of numeric values into the corresponding value of a fundamental C++ type (like int or double).

None of the parsing functions in this component consume leading whitespace. For parsing to succeed, the sought item must be found at the beginning of the input string.

The following two subsections describe the grammar defining the parsing rules.

Definition of Symbols Used in Production Rules

The following grammar is used to specify regular expressions:

 -       Within brackets the minus means through. For example, [a-z] is
         equivalent to [abcd...xyz]. The - can appear as itself only if used
         as the first or last character. For example, the character class
         expression []-] matches the characters ] and -.

 |       Logical OR between two expressions means one must be present.

( ... )  Parentheses are used for grouping. An operator, for example, *,
         +, {}, can work on a single character or on a regular expression
         enclosed in parentheses. For example, (a*(cb+)*)$.

Grammar Production Rules

<NUMBER> ::= <OPTIONAL_SIGN><DIGIT>+
<DECIMAL_NUMBER> ::= <OPTIONAL_SIGN><DECIMAL_DIGIT>+
<POSITIVE_NUMBER> ::= <DIGIT>+
<OPTIONAL_SIGN> ::= (+|-)?
<DIGIT> ::= depending on base can include characters 0-9 and
    case-insensitive letters. For example, octal digit is in the range
    [0 .. 7].
<DECIMAL_DIGIT> ::= [0123456789]
<OCTAL_DIGIT> ::= [01234567]
<HEX_DIGIT> ::= [0123456789abcdefABCDEF]
<SHORT> ::= <NUMBER>
<SHORT> must be in range [SHRT_MIN .. SHRT_MAX].
<USHORT> ::= <NUMBER>
<USHORT> must be in range [0 .. USHRT_MAX].
<INT> ::= <NUMBER>
<INT> must be in range [INT_MIN .. INT_MAX].
<INT64> ::= <NUMBER>
<INT64> must be in range
[-0x8000000000000000uLL .. 0x7FFFFFFFFFFFFFFFuLL].
<UNSIGNED> ::= <NUMBER>
<UNSIGNED> must be in range [0 .. UINT_MAX].
<UNSIGNED64> ::= <NUMBER>
<UNSIGNED64> must be in range
[0 .. 0xFFFFFFFFFFFFFFFFuLL].
<DECIMAL_EXPONENT> ::= <DECIMAL_NUMBER>
<REAL> ::= (<DECIMAL_DIGIT>+ (. <DECIMAL_DIGIT>*)? | . <DECIMAL_DIGIT>+)
((e|E) <DECIMAL_EXPONENT>)?
<INF> ::= infinity | inf
all case insensitive
<NAN-SEQUENCE> ::= [abcdefghijklmnopqrstuvwxyz0123456789_]*
case insensitive
<NAN> ::= nan(<NAN-SEQUENCE>) | nan
all case insensitive
<DOUBLE> ::= <OPTIONAL_SIGN> (<REAL> | <INF> | <NAN>)
<DOUBLE> must be in range [DBL_MIN .. DBL_MAX], or NaN, or Infinity.
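
As an informal reading aid for the <NUMBER> production, the following sketch reports how many leading characters of a string match <OPTIONAL_SIGN><DIGIT>+ in a given base. The names digitValue and matchNumberPrefix are hypothetical and are not part of bdlb::NumericParseUtil. The sketch assumes an ASCII-compatible character set and performs no range checking, so it illustrates the grammar only, not the component's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not part of BDE): return the numeric value of 'c' if
// it is a valid digit in 'base', and -1 otherwise.  Letters are treated
// case-insensitively.  Assumes an ASCII-compatible character set.
inline int digitValue(char c, int base)
{
    int value = -1;
    if (c >= '0' && c <= '9') {
        value = c - '0';
    }
    else if (c >= 'a' && c <= 'z') {
        value = c - 'a' + 10;
    }
    else if (c >= 'A' && c <= 'Z') {
        value = c - 'A' + 10;
    }
    return value < base ? value : -1;
}

// Hypothetical helper (not part of BDE): return the length of the longest
// prefix of 'input' matching '<OPTIONAL_SIGN><DIGIT>+' in 'base', or 0 if no
// prefix matches.  Leading whitespace is never skipped.
inline std::size_t matchNumberPrefix(std::string_view input, int base = 10)
{
    assert(2 <= base && base <= 36);

    std::size_t pos = 0;
    if (pos < input.size() && (input[pos] == '+' || input[pos] == '-')) {
        ++pos;                              // <OPTIONAL_SIGN>
    }
    const std::size_t firstDigit = pos;
    while (pos < input.size() && digitValue(input[pos], base) >= 0) {
        ++pos;                              // <DIGIT>+
    }
    return pos > firstDigit ? pos : 0;      // at least one digit is required
}
```

Under these assumptions, a sign with no digits after it, or any leading whitespace, produces no match.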

Remainder Output Parameter

The parsing functions provided by bdlb::NumericParseUtil typically accept an optional second output parameter named remainder. On success, remainder is loaded with a string reference that starts at the character following the last character parsed as part of the numeric value and extends to the end of the input string. If the entire input string is parsed successfully, remainder is loaded with an empty string reference. However, if the parse function is not successful (i.e., it returns a non-zero error status), then it will not modify the value of remainder.
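
The remainder contract described above can be sketched with the standard library alone. The function parseIntSketch below is a hypothetical stand-in built on std::from_chars, not the BDE implementation. Unlike the grammar above, std::from_chars does not accept a leading +, so this is only an approximation of the behavior.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <string_view>
#include <system_error>

// Hypothetical stand-in for a 'parseInt'-style function (not the BDE
// implementation).  Return 0 on success and a non-zero value otherwise.  On
// success, load 'remainder' with the unparsed tail of 'input'; on failure,
// leave both 'result' and 'remainder' unmodified.
inline int parseIntSketch(int              *result,
                          std::string_view *remainder,
                          std::string_view  input)
{
    assert(result);
    assert(remainder);

    const char *begin = input.data();
    const char *end   = begin + input.size();

    int value = 0;
    const std::from_chars_result r = std::from_chars(begin, end, value);
    if (r.ec != std::errc()) {
        return -1;              // no digits found, or value out of range
    }

    *result    = value;
    *remainder = input.substr(static_cast<std::size_t>(r.ptr - begin));
    return 0;
}
```

With this sketch, parsing "2017-10-24" yields 2017 with a remainder of "-10-24", while parsing " 42" fails and leaves remainder as it was.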

Floating Point Values

The conversion from text to values of type double results in the closest representable value to the decimal text. Note that this is the same as for the standard library function strtod. For example, the ASCII string "3.14159" is converted, on some platforms, to 3.1415899999999999.
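
To see the effect of rounding to the closest representable value, the following sketch converts text with std::strtod and prints the result with 17 significant digits. The helper name roundTrip17 is hypothetical, and the sketch assumes LC_NUMERIC is the "C" locale, which is the default at program startup.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper: convert 'text' with 'std::strtod' and format the
// result with 17 significant digits, which is enough to distinguish any two
// distinct IEEE-754 double-precision values.
inline std::string roundTrip17(const char *text)
{
    assert(text);

    const double value = std::strtod(text, nullptr);

    char buffer[64];
    std::snprintf(buffer, sizeof buffer, "%.17g", value);
    return buffer;
}
```

On a platform with IEEE-754 doubles and a correctly rounding strtod, roundTrip17("3.14159") produces the string 3.1415899999999999 quoted above.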

The strtod function is locale-dependent. It uses the LC_CTYPE and LC_NUMERIC locale categories from the C standard global locale established by setlocale. LC_CTYPE is used by strtod to skip leading whitespace, whereas LC_NUMERIC is used in the actual parsing of the number. Our implementation forbids leading whitespace. To verify that there is none, we use both our own locale-independent character classification function (in case LC_CTYPE would not classify ASCII whitespace properly) and the C global locale-dependent bsl::isspace, which ensures that strtod will not skip some locale-specific whitespace character and mistakenly parse the string as if it were entirely a number. That allows us to ignore the LC_CTYPE locale category; however, we still require LC_NUMERIC to be set to the "C" locale for strtod itself.
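
The double whitespace check described above might look roughly like the following. This is a simplified sketch, not the BDE implementation: the names isAsciiSpace and parseDoubleSketch are hypothetical, the input is taken as a std::string because std::strtod needs a NUL-terminated buffer, and LC_NUMERIC is assumed to be the "C" locale.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <string>

// Hypothetical locale-independent classifier: return 'true' if 'c' is one of
// the six ASCII whitespace characters (space, \t, \n, \v, \f, \r).
inline bool isAsciiSpace(char c)
{
    return c == ' ' || (c >= '\t' && c <= '\r');
}

// Hypothetical sketch (not the BDE implementation): parse a 'double' from
// the start of 'input' using 'std::strtod', rejecting any leading
// whitespace.  Return 0 on success and a non-zero value otherwise; 'result'
// is unmodified on failure.
inline int parseDoubleSketch(double *result, const std::string& input)
{
    assert(result);

    if (input.empty()) {
        return -1;
    }

    // Check both classifications so that 'strtod' cannot silently skip a
    // character that either the ASCII rule or the current locale treats as
    // whitespace.
    const unsigned char first = static_cast<unsigned char>(input[0]);
    if (isAsciiSpace(input[0]) || std::isspace(first)) {
        return -1;
    }

    const char *begin = input.c_str();
    char       *end   = nullptr;
    const double value = std::strtod(begin, &end);
    if (end == begin) {
        return -1;              // no conversion was performed
    }

    *result = value;
    return 0;
}
```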

Special Floating Point Values

The IEEE-754 (double precision) floating point format supports the following special values: Not-a-Number (NaN) and Infinity, each of which may be positive or negative. parseDouble accepts text for both, as described by the <NAN> and <INF> productions in the grammar above.
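
Because the conversion is documented as matching strtod, the special-value spellings can be explored with the standard library. The helpers isNanText and infinitySign below are hypothetical names used only for illustration. They assume a C99-conforming std::strtod and say nothing about how parseDouble itself is implemented.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>

// Hypothetical helper: return 'true' if 'std::strtod' consumes all of 'text'
// and the result is a NaN.
inline bool isNanText(const char *text)
{
    assert(text);

    char *end = nullptr;
    const double value = std::strtod(text, &end);
    return end != text && *end == '\0' && std::isnan(value);
}

// Hypothetical helper: return +1 or -1 if 'std::strtod' consumes all of
// 'text' and the result is positive or negative infinity, respectively, and
// 0 otherwise.
inline int infinitySign(const char *text)
{
    assert(text);

    char *end = nullptr;
    const double value = std::strtod(text, &end);
    if (end == text || *end != '\0' || !std::isinf(value)) {
        return 0;
    }
    return value > 0 ? 1 : -1;
}
```

Spellings such as nan, NaN(abc_123), inf, and -Infinity are all accepted case-insensitively under these assumptions.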

Usage Example

In this section, we show the intended usage of this component.

Example 1: Parsing an Integer Value from a string_view

Suppose that we have a string_view that presumably contains a (not necessarily NUL terminated) string representing a 32-bit integer value and we want to convert that string into an int (32-bit integer).

First, we create the string:

const bsl::string_view input("20171024", 4);

Then we create the output variables for the parser:

int year;
bsl::string_view rest;

Next we call the parser function:

const int rv = bdlb::NumericParseUtil::parseInt(&year, &rest, input);

Then we verify the results:

assert(0 == rv);
assert(2017 == year);
assert(rest.empty());