Component bdlb_numericparseutil
[Package bdlb]

Provide conversions from text into fundamental numeric types.

Namespaces

namespace  bdlb

Detailed Description

Outline
Purpose:
Provide conversions from text into fundamental numeric types.
Classes:
bdlb::NumericParseUtil namespace for parsing functions
Description:
This component provides a namespace, bdlb::NumericParseUtil, containing utility functions for parsing ASCII text representations of numeric values into the corresponding value of a fundamental C++ type (like int or double).
None of the parsing functions in this component consume leading whitespace. For parsing to succeed, the sought item must be found at the beginning of the input string.
The following two subsections describe the grammar defining the parsing rules.
Definition of Symbols Used in Production Rules:
The following grammar is used to specify regular expressions:
   -     Within brackets the minus means through.  For example, [a-z] is
         equivalent to [abcd...xyz].  The - can appear as itself only if used
         as the first or last character.  For example, the character class
         expression []-] matches the characters ] and -.

   |     Logical OR between two expressions means one must be present.

   ( ... ) Parentheses are used for grouping.  An operator, for example, *,
         +, {}, can work on a single character or on a regular expression
         enclosed in parentheses.  For example, (a*(cb+)*)$.
Grammar Production Rules:
 <NUMBER> ::= <OPTIONAL_SIGN><DIGIT>+
 <DECIMAL_NUMBER> ::= <OPTIONAL_SIGN><DECIMAL_DIGIT>+
 <POSITIVE_NUMBER> ::= <DIGIT>+
 <OPTIONAL_SIGN> ::= (+|-)?
 <DIGIT> ::= depending on base can include characters 0-9 and case-
      insensitive letters.  For example, octal digit is in the range
      [0 .. 7].
 <DECIMAL_DIGIT> ::= [0123456789]
 <OCTAL_DIGIT> ::= [01234567]
 <HEX_DIGIT> ::= [0123456789abcdefABCDEF]
 <SHORT> ::= <NUMBER>
      <SHORT> must be in range [SHRT_MIN .. SHRT_MAX].
 <USHORT> ::= <NUMBER>
      <USHORT> must be in range [0 .. USHRT_MAX].
 <INT> ::= <NUMBER>
      <INT> must be in range [INT_MIN .. INT_MAX].
 <INT64> ::= <NUMBER>
      <INT64> must be in range
                           [-0x8000000000000000uLL .. 0x7FFFFFFFFFFFFFFFuLL].
 <UNSIGNED> ::= <NUMBER>
      <UNSIGNED> must be in range [0 .. UINT_MAX].
 <UNSIGNED64> ::= <NUMBER>
      <UNSIGNED64> must be in range
                           [0 .. 0xFFFFFFFFFFFFFFFFuLL].
 <REAL> ::= <OPTIONAL_SIGN>
            (<DECIMAL_DIGIT>+ (. <DECIMAL_DIGIT>*)? | . <DECIMAL_DIGIT>+)
            ((e|E) <DECIMAL_NUMBER>)?
 <INF>    ::= infinity | inf
              case insensitive
 <NAN-SEQUENCE> ::= [abcdefghijklmnopqrstuvwxyz0123456789_]*
 <NAN>    ::= nan(<NAN-SEQUENCE>) | nan
              case insensitive
 <DOUBLE> ::= <REAL> | <INF> | <NAN>
      <DOUBLE> must be in range [DBL_MIN .. DBL_MAX].
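The <DIGIT> and <NUMBER> rules above can be sketched in standard C++ as a small recognizer. This is an illustration only, not the component's implementation: the names digitValueSketch and matchNumberSketch are hypothetical, the supported base range of 2 through 36 is an assumption, and the letter arithmetic assumes an ASCII-compatible character set.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not part of 'bdlb::NumericParseUtil'): return the
// value of 'c' as a <DIGIT> in 'base' (assumed to be in the range 2..36), or
// -1 if 'c' is not a digit in that base.  Letters match case-insensitively.
// Assumes an ASCII-compatible character set.
int digitValueSketch(char c, int base)
{
    int value = -1;
    if (c >= '0' && c <= '9') {
        value = c - '0';
    }
    else if (c >= 'a' && c <= 'z') {
        value = c - 'a' + 10;
    }
    else if (c >= 'A' && c <= 'Z') {
        value = c - 'A' + 10;
    }
    return (value >= 0 && value < base) ? value : -1;
}

// Hypothetical helper: return the length of the longest prefix of 'input'
// matching <NUMBER> ::= <OPTIONAL_SIGN><DIGIT>+ in 'base', or 0 if 'input'
// does not begin with a <NUMBER>.  Leading whitespace is not skipped.
std::size_t matchNumberSketch(std::string_view input, int base)
{
    std::size_t pos = 0;
    if (pos < input.size() && (input[pos] == '+' || input[pos] == '-')) {
        ++pos;  // <OPTIONAL_SIGN>
    }
    const std::size_t firstDigit = pos;
    while (pos < input.size() && digitValueSketch(input[pos], base) >= 0) {
        ++pos;  // <DIGIT>+
    }
    return pos > firstDigit ? pos : 0;  // a lone sign is not a <NUMBER>
}
```

Note that the same text can match differently depending on the base: the letters in "-123abc" are not decimal digits but are hexadecimal digits, so the matched prefix is longer in base 16 than in base 10.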
Remainder Output Parameter:
The parsing functions provided by bdlb::NumericParseUtil typically accept an optional second output parameter named remainder. On success, remainder is loaded with a string reference starting at the character following the last character successfully parsed as part of the numeric value, and ending at the character one past the end of the input string. If the entire input string is parsed successfully, remainder is loaded with an empty string reference. However, if the parse function is not successful (i.e., it returns a non-zero error status), then it will not modify the value of remainder.
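The remainder contract can be illustrated with a minimal stand-in written in standard C++. The function parseIntSketch below is hypothetical and handles base 10 only; it is not the library's code. In particular, this sketch treats an out-of-range value as a failure, which is an assumption: this section does not describe how the component itself handles out-of-range input.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <string_view>

// Hypothetical stand-in for a base-10 integer parser, illustrating only the
// 'remainder' contract.  Return 0 on success and a non-zero value otherwise.
// On failure, neither '*result' nor '*remainder' is modified.
int parseIntSketch(int              *result,
                   std::string_view *remainder,
                   std::string_view  input)
{
    std::size_t pos      = 0;
    bool        negative = false;
    if (pos < input.size() && (input[pos] == '+' || input[pos] == '-')) {
        negative = (input[pos] == '-');
        ++pos;
    }

    // Largest magnitude that still fits in 'int' for the given sign.
    const long long limit = negative ? -static_cast<long long>(INT_MIN)
                                     : static_cast<long long>(INT_MAX);

    const std::size_t firstDigit = pos;
    long long         magnitude  = 0;
    while (pos < input.size() && input[pos] >= '0' && input[pos] <= '9') {
        magnitude = magnitude * 10 + (input[pos] - '0');
        if (magnitude > limit) {
            return -1;  // ASSUMPTION: out of range is reported as failure
        }
        ++pos;
    }
    if (pos == firstDigit) {
        return -1;      // no digits found at the start of 'input'
    }

    *result    = static_cast<int>(negative ? -magnitude : magnitude);
    *remainder = input.substr(pos);  // empty if everything was consumed
    return 0;
}
```

With this sketch, parsing "42 apples" succeeds and leaves " apples" in the remainder, whereas parsing " 42" fails (leading whitespace is not consumed) and leaves both output variables untouched.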
Floating Point Values:
The conversion from text to values of type double results in the closest representable value to the decimal text. Note that this is the same as for the standard library function strtod. For example, the ASCII string "3.14159" is converted, on some platforms, to 3.1415899999999999.
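One way to observe the stored value is to print it with 17 significant digits, which is enough to round-trip an IEEE-754 double. The following sketch uses strtod directly, since the text above states that the conversion matches it; the helper name toSeventeenDigits is hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper: format 'value' with 17 significant digits, enough to
// distinguish any two distinct IEEE-754 double-precision values.
std::string toSeventeenDigits(double value)
{
    char buffer[64];
    std::snprintf(buffer, sizeof buffer, "%.17g", value);
    return buffer;
}

// Print the 'double' that 'strtod' selects for the specified decimal 'text'.
void printNearestDouble(const char *text)
{
    const double value = std::strtod(text, nullptr);
    std::printf("%s -> %s\n", text, toSeventeenDigits(value).c_str());
}
```

Calling printNearestDouble("3.14159") shows the representable value chosen for that text. Values that are exactly representable in binary, such as 0.5, print without extra digits.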
The strtod function is locale-dependent. It uses the LC_CTYPE and LC_NUMERIC locale categories of the C global locale established by setlocale. strtod uses LC_CTYPE to skip leading whitespace and LC_NUMERIC in the actual parsing of the number. Our implementation forbids leading whitespace. To verify the absence of leading whitespace we use both our own locale-independent character classification function (in case LC_CTYPE would not classify ASCII whitespace properly) and the C global locale-dependent bsl::isspace, which ensures that strtod will not skip some special whitespace character and mistakenly treat the string as consisting entirely of a number. That allows us to ignore the LC_CTYPE locale category; however, we still require LC_NUMERIC to be set to the "C" locale for strtod itself.
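A caller that wants to confirm these preconditions could use checks along the following lines. Both functions are hypothetical sketches in standard C++ (using std::isspace rather than bsl::isspace) and are not part of the component. Comparing the locale name against "C" is a simplification, since some platforms may report an equivalent locale under another name such as "POSIX".

```cpp
#include <cassert>
#include <cctype>
#include <clocale>
#include <cstring>
#include <string_view>

// Hypothetical pre-flight check: return 'true' if the LC_NUMERIC category of
// the C global locale is currently named "C".  Passing a null locale name to
// 'setlocale' queries the current setting without changing it.
bool isNumericLocaleC()
{
    const char *name = std::setlocale(LC_NUMERIC, nullptr);
    return name && 0 == std::strcmp(name, "C");
}

// Hypothetical check mirroring the two-step test described above: return
// 'true' if 'input' begins with a character that either a locale-independent
// ASCII test or the locale-dependent 'isspace' classifies as whitespace.
bool startsWithWhitespaceSketch(std::string_view input)
{
    if (input.empty()) {
        return false;
    }
    const unsigned char c = static_cast<unsigned char>(input.front());

    // ASCII whitespace: space, and '\t' '\n' '\v' '\f' '\r' (codes 9..13).
    const bool asciiSpace = (c == ' ') || (c >= '\t' && c <= '\r');

    return asciiSpace || 0 != std::isspace(c);
}
```

A program that never calls setlocale starts in the "C" locale, so isNumericLocaleC is expected to return true there; the check matters mainly for applications that install the user's locale at startup.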
Special Floating Point Values:
The IEEE-754 (double precision) floating point format supports the following special values: Not-a-Number (NaN) and Infinity, each in positive and negative form. parseDouble accepts expressions for both:
infinity-expression: results in a negative or positive bsl::numeric_limits<double>::infinity() value. The expression consists of the following elements:
  • an optional plus (+) or minus (-) sign
  • the word "INF" or "INFINITY", ignoring case
not-a-number-expression: results in a negative or positive bsl::numeric_limits<double>::quiet_NaN() value. The expression consists of the following elements:
  • an optional plus (+) or minus (-) sign
  • "NAN" or "NAN(char-sequence)" ignoring the case of "NAN". The char-sequence may be empty or contain digits, letters from the Latin alphabet and underscores.
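The two expression forms can be sketched as a small recognizer in standard C++. This is an illustration of the rules listed above, not the component's implementation: the function names are hypothetical, std::numeric_limits stands in for bsl::numeric_limits, and, unlike parseDouble, the sketch requires the whole input to match and produces no remainder.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <string_view>

namespace {

// Lower-case an ASCII letter; leave every other character unchanged.
char asciiLower(char c)
{
    return (c >= 'A' && c <= 'Z') ? static_cast<char>(c - 'A' + 'a') : c;
}

// Return 'true' if 'text' equals the lower-case 'word', ignoring ASCII case.
bool equalsIgnoreCase(std::string_view text, std::string_view word)
{
    if (text.size() != word.size()) {
        return false;
    }
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (asciiLower(text[i]) != word[i]) {
            return false;
        }
    }
    return true;
}

// Return 'true' if 'text' is "NAN" or "NAN(char-sequence)", ignoring case,
// where the char-sequence holds only letters, digits, and underscores.
bool isNanExpression(std::string_view text)
{
    if (text.size() < 3 || !equalsIgnoreCase(text.substr(0, 3), "nan")) {
        return false;
    }
    text.remove_prefix(3);
    if (text.empty()) {
        return true;  // plain "nan"
    }
    if (text.size() < 2 || text.front() != '(' || text.back() != ')') {
        return false;
    }
    for (char c : text.substr(1, text.size() - 2)) {
        const char lc = asciiLower(c);
        if (!((lc >= 'a' && lc <= 'z') || (lc >= '0' && lc <= '9')
                                        || lc == '_')) {
            return false;
        }
    }
    return true;
}

}  // close unnamed namespace

// Hypothetical recognizer: if all of 'input' is an infinity-expression or a
// not-a-number-expression, load the value into '*result' and return 'true';
// otherwise return 'false' and leave '*result' unmodified.
bool parseSpecialSketch(double *result, std::string_view input)
{
    bool negative = false;
    if (!input.empty() && (input[0] == '+' || input[0] == '-')) {
        negative = (input[0] == '-');
        input.remove_prefix(1);
    }

    double value;
    if (equalsIgnoreCase(input, "inf") || equalsIgnoreCase(input, "infinity")) {
        value = std::numeric_limits<double>::infinity();
    }
    else if (isNanExpression(input)) {
        value = std::numeric_limits<double>::quiet_NaN();
    }
    else {
        return false;
    }
    *result = negative ? -value : value;
    return true;
}
```

For example, "-INF" yields negative infinity and "nan(abc_123)" yields a NaN, while "nan(" and "info" are rejected.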
Warning: Microsoft Visual Studio 2013 Output for Infinity and NaN:
Microsoft Visual Studio 2013 generates surprising output text when printing (using printf) or streaming (using C++ iostream) the double representations for infinity and NaN. For example, infinity might be rendered "1.#INF00" and NaN might be rendered "1.#IND00" or "1.#NAN0". parseDouble will successfully parse this text, but will not return the result one would naively expect (e.g., it returns the value 1.0 rather than infinity or NaN).
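The effect can be reproduced with strtod, used here as a stand-in for parseDouble on the assumption that both stop at the first character that cannot continue a number. The struct and function names below are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Result of parsing a numeric prefix: the value and the unparsed tail.
struct PrefixParse {
    double      value;
    const char *rest;
};

// Parse the longest numeric prefix of the specified null-terminated 'text'
// using 'strtod' and report where parsing stopped.
PrefixParse parsePrefixWithStrtod(const char *text)
{
    char        *end   = nullptr;
    const double value = std::strtod(text, &end);
    return PrefixParse{value, end};
}
```

Given "1.#INF00", the parsed value is 1.0 and the unparsed tail is "#INF00", so checking that the remainder is empty is one way to detect this kind of input.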
Usage Example:
In this section, we show the intended usage of this component.
Example 1: Parsing an Integer Value from a string_view:
Suppose that we have a string_view that presumably contains a (not necessarily NUL terminated) string representing a 32-bit integer value and we want to convert that string into an int (32-bit integer).
First, we create the string:
  bsl::string_view input("20171024", 4);
Then we create the output variables for the parser:
  int              year;
  bsl::string_view rest;
Next we call the parser function:
  int rv = bdlb::NumericParseUtil::parseInt(&year, &rest, input);
Then we verify the results:
  assert(0    == rv);
  assert(2017 == year);
  assert(rest.empty());