Provide conversions from text into fundamental numeric types.
Namespaces |
namespace | bdlb |
Detailed Description
- Purpose:
- Provide conversions from text into fundamental numeric types.
-
- Classes:
-
-
- Description:
- This component provides a namespace, bdlb::NumericParseUtil, containing utility functions for parsing ASCII text representations of numeric values into the corresponding value of a fundamental C++ type (such as int or double).
- None of the parsing functions in this component consume leading whitespace. For parsing to succeed, the sought item must be found at the beginning of the input string.
- The following two subsections describe the grammar defining the parsing rules.
-
- Definition of Symbols Used in Production Rules:
- The following grammar is used to specify regular expressions:
- Within brackets the minus means through. For example, [a-z] is
equivalent to [abcd...xyz]. The - can appear as itself only if used
as the first or last character. For example, the character class
expression []-] matches the characters ] and -.
| Logical OR between two expressions means one must be present.
( ... ) Parentheses are used for grouping. An operator, for example, *,
+, {}, can work on a single character or on a regular expression
enclosed in parentheses. For example, (a*(cb+)*)$.
-
- Grammar Production Rules:
<NUMBER> ::= <OPTIONAL_SIGN><DIGIT>+
<DECIMAL_NUMBER> ::= <OPTIONAL_SIGN><DECIMAL_DIGIT>+
<POSITIVE_NUMBER> ::= <DIGIT>+
<OPTIONAL_SIGN> ::= (+|-)?
<DIGIT> ::= depending on base can include characters 0-9 and case-
insensitive letters. For example, octal digit is in the range
[0 .. 7].
<DECIMAL_DIGIT> ::= [0123456789]
<OCTAL_DIGIT> ::= [01234567]
<HEX_DIGIT> ::= [0123456789abcdefABCDEF]
<SHORT> ::= <NUMBER>
<SHORT> must be in range [SHRT_MIN .. SHRT_MAX].
<USHORT> ::= <NUMBER>
<USHORT> must be in range [0 .. USHRT_MAX].
<INT> ::= <NUMBER>
<INT> must be in range [INT_MIN .. INT_MAX].
<INT64> ::= <NUMBER>
<INT64> must be in range
[-0x8000000000000000uLL .. 0x7FFFFFFFFFFFFFFFuLL].
<UNSIGNED> ::= <NUMBER>
<UNSIGNED> must be in range [0 .. UINT_MAX].
<UNSIGNED64> ::= <NUMBER>
<UNSIGNED64> must be in range
[0 .. 0xFFFFFFFFFFFFFFFFuLL].
<REAL> ::= <OPTIONAL_SIGN>
(<DECIMAL_DIGIT>+ (. <DECIMAL_DIGIT>*)? | . <DECIMAL_DIGIT>+)
((e|E) <DECIMAL_NUMBER>)?
<INF> ::= infinity | inf
case insensitive
<NAN-SEQUENCE> ::= [abcdefghijklmnopqrstuvwxyz0123456789_]*
<NAN> ::= nan(<NAN-SEQUENCE>) | nan
case insensitive
<DOUBLE> ::= <REAL> | <INF> | <NAN>
<DOUBLE> must be in range [DBL_MIN .. DBL_MAX].
-
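As an illustration of how the <OPTIONAL_SIGN>, <DIGIT>, and <SHORT> rules combine, the following is a minimal sketch in standard C++. The names digitValue and parseShortSketch are hypothetical and are not part of bdlb::NumericParseUtil. Treating an out-of-range value as a failure is an assumption of this sketch; the component's own overflow handling is not described in this section.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <string_view>

// Hypothetical helper: value of 'c' as a digit in 'base', or -1 if 'c' is not
// a valid digit in that base (letters are case-insensitive).
inline int digitValue(char c, int base)
{
    int v = -1;
    if (c >= '0' && c <= '9')      v = c - '0';
    else if (c >= 'a' && c <= 'z') v = c - 'a' + 10;
    else if (c >= 'A' && c <= 'Z') v = c - 'A' + 10;
    return (v >= 0 && v < base) ? v : -1;
}

// Hypothetical sketch of the <SHORT> rule: an optional sign followed by one
// or more digits, with the value in [SHRT_MIN .. SHRT_MAX].  Returns 0 and
// loads '*result' on success; returns non-zero and leaves '*result' alone
// otherwise.  Leading whitespace is not skipped.
inline int parseShortSketch(short *result, std::string_view input, int base = 10)
{
    std::size_t i = 0;
    bool negative = false;
    if (i < input.size() && (input[i] == '+' || input[i] == '-')) {
        negative = (input[i] == '-');
        ++i;
    }
    // Largest magnitude allowed for the chosen sign.
    const long limit = negative ? -static_cast<long>(SHRT_MIN)
                                : static_cast<long>(SHRT_MAX);
    long value = 0;
    std::size_t numDigits = 0;
    for (; i < input.size(); ++i, ++numDigits) {
        const int d = digitValue(input[i], base);
        if (d < 0) break;                  // first non-digit ends the number
        value = value * base + d;
        if (value > limit) return -1;      // assumption: out of range fails
    }
    if (numDigits == 0) return -1;         // <DIGIT>+ needs at least one digit
    *result = static_cast<short>(negative ? -value : value);
    return 0;
}
```

With this sketch, "-123" yields -123, "7fff" in base 16 yields SHRT_MAX, and " 12" fails because the sought item does not start at the beginning of the input.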
- Remainder Output Parameter:
- The parsing functions provided by bdlb::NumericParseUtil typically accept an optional second output parameter named remainder. On success, remainder is loaded with a string reference that starts at the character following the last character parsed as part of the numeric value and ends at the end of the input string. If the entire input string is parsed, remainder is loaded with an empty string reference. If the parse function fails (i.e., it returns a non-zero error status), it does not modify remainder.
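The remainder contract can be mimicked with the standard library. The sketch below is a stand-in built on std::from_chars, not the component's implementation, and the name parseIntWithRemainder is hypothetical. Note one difference from the grammar above: std::from_chars does not accept a leading plus sign.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <string_view>
#include <system_error>

// Hypothetical stand-in showing the 'remainder' contract: on success load
// '*result' and (if non-null) '*remainder' and return 0; on failure return
// non-zero and modify neither output.
inline int parseIntWithRemainder(int              *result,
                                 std::string_view *remainder,
                                 std::string_view  input)
{
    const char *first = input.data();
    const char *last  = first + input.size();

    int value = 0;
    const std::from_chars_result r = std::from_chars(first, last, value);
    if (r.ec != std::errc()) {
        return -1;                       // outputs left untouched
    }
    *result = value;
    if (remainder) {
        // From the first unparsed character to the end of the input.
        *remainder = std::string_view(r.ptr,
                                      static_cast<std::size_t>(last - r.ptr));
    }
    return 0;
}
```

For the input "2017-10-24" this sketch loads 2017 and a remainder of "-10-24"; for "42" the remainder is empty; for "abc" it fails and leaves both outputs as they were.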
-
- Floating Point Values:
- The conversion from text to values of type
double
results in the closest representable value to the decimal text. Note that this is the same as for the standard library function strtod
. For example, the ASCII string "3.14159" is converted, on some platforms, to 3.1415899999999999.
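The rounding described above can be observed directly with strtod. In this sketch the helper name formatWith17Digits is hypothetical; it prints a parsed value with 17 significant digits, which is enough to distinguish any two distinct double values. It assumes the default "C" locale.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper: parse 'text' with 'strtod' and format the resulting
// 'double' with 17 significant digits.
inline std::string formatWith17Digits(const char *text)
{
    const double value = std::strtod(text, nullptr);
    char buffer[64];
    std::snprintf(buffer, sizeof buffer, "%.17g", value);
    return std::string(buffer);
}
```

Values that are exactly representable, such as "0.5", print back unchanged, whereas "3.14159" is stored as the nearest double and may print with a long run of trailing nines, as noted above.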
- The
strtod
function is locale-dependent. It uses the LC_CTYPE
and LC_NUMERIC
locale categories from the C standard global locale established by setlocale
. LC_CTYPE
is used by strtod
to skip leading whitespace, whereas LC_NUMERIC
is used in the actual parsing of the number. Our implementation forbids leading whitespace. When verifying the lack of leading whitespace we use both our own locale-independent character classification function (in case LC_CTYPE
would not classify ASCII whitespace properly), as well as the C global locale-dependent bsl::isspace
to ensure that strtod
will not skip some special whitespace characters and mistakenly parse the whole string as a number. That allows us to ignore the LC_CTYPE locale category; however, LC_NUMERIC must still be set to the "C" locale for strtod itself.
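The double check for leading whitespace might look like the following sketch. It is only an illustration of the guard described above, not the component's code; the names isAsciiSpace and parseDoubleGuarded are hypothetical, and the sketch assumes LC_NUMERIC is the "C" locale. It does not restrict strtod to the <DOUBLE> grammar in other respects.

```cpp
#include <cassert>
#include <cctype>
#include <cstdlib>
#include <string>
#include <string_view>

// Hypothetical locale-independent classification of ASCII whitespace.
inline bool isAsciiSpace(char c)
{
    return c == ' '  || c == '\t' || c == '\n'
        || c == '\v' || c == '\f' || c == '\r';
}

// Hypothetical sketch: reject input whose first character is whitespace
// according to EITHER the locale-independent test or the global C locale,
// then delegate to 'strtod'.  Returns 0 on success, non-zero otherwise.
inline int parseDoubleGuarded(double *result, std::string_view input)
{
    if (input.empty()) {
        return -1;
    }
    const char first = input.front();
    if (isAsciiSpace(first)
     || std::isspace(static_cast<unsigned char>(first))) {
        return -1;                   // 'strtod' would have skipped this
    }
    const std::string copy(input);   // 'strtod' needs a NUL-terminated string
    char *end = nullptr;
    const double value = std::strtod(copy.c_str(), &end);
    if (end == copy.c_str()) {
        return -1;                   // nothing was parsed
    }
    *result = value;
    return 0;
}
```

Calling strtod directly on " 2.5" would succeed after skipping the space; the guard makes the same input fail.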
-
- Special Floating Point Values:
- The IEEE-754 (double precision) floating point format supports the following special values: Not-a-Number (NaN) and Infinity, each in positive and negative form. parseDouble allows expressions for both:
infinity-expression: results in a negative or positive bsl::numeric_limits<double>::infinity() value. The expression consists of the following elements:
- an optional plus (+) or minus (-) sign
- the word "INF" or "INFINITY", ignoring case
not-a-number-expression: results in a negative or positive bsl::numeric_limits<double>::quiet_NaN() value. The expression consists of the following elements:
- an optional plus (+) or minus (-) sign
- "NAN" or "NAN(char-sequence)", ignoring the case of "NAN". The char-sequence may be empty or contain digits, letters from the Latin alphabet, and underscores.
-
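A rough sketch of recognizing the two special expressions is shown below, using std::numeric_limits in place of bsl::numeric_limits. The names startsWithNoCase and parseSpecialSketch are hypothetical. For brevity the sketch only checks the sign and the leading keyword; it does not validate the parenthesized char-sequence or report a remainder.

```cpp
#include <cassert>
#include <cctype>
#include <cmath>
#include <cstddef>
#include <limits>
#include <string_view>

// Hypothetical helper: does 's' begin with the lowercase 'word', ignoring
// the case of the characters in 's'?
inline bool startsWithNoCase(std::string_view s, std::string_view word)
{
    if (s.size() < word.size()) return false;
    for (std::size_t i = 0; i < word.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(s[i])) != word[i]) {
            return false;
        }
    }
    return true;
}

// Hypothetical sketch: optional sign followed by "inf"/"infinity" or "nan".
// Returns 0 and loads '*result' on success, non-zero otherwise.
inline int parseSpecialSketch(double *result, std::string_view input)
{
    bool negative = false;
    if (!input.empty() && (input[0] == '+' || input[0] == '-')) {
        negative = (input[0] == '-');
        input.remove_prefix(1);
    }
    double magnitude;
    if (startsWithNoCase(input, "inf")) {          // also covers "infinity"
        magnitude = std::numeric_limits<double>::infinity();
    }
    else if (startsWithNoCase(input, "nan")) {     // "nan" or "nan(...)"
        magnitude = std::numeric_limits<double>::quiet_NaN();
    }
    else {
        return -1;
    }
    // 'copysign' sets the sign bit explicitly, which also works for NaN.
    *result = std::copysign(magnitude, negative ? -1.0 : 1.0);
    return 0;
}
```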
- Warning: Microsoft Visual Studio 2013 Output for Infinity and NaN:
- Microsoft Visual Studio 2013 generates surprising output text when printing (using printf) or streaming (using C++ iostream) the double representations for infinity and NaN. For example, infinity might be rendered "1.#INF00" and NaN might be rendered "1.#IND00" or "1.#NAN0". parseDouble will successfully parse such text, but the result is not what one would naively expect: it returns, e.g., the value 1.0 rather than infinity or NaN.
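The same effect can be reproduced with plain strtod, which stops at the first character that cannot continue a number. In the sketch below the helper name unparsedTail is hypothetical; it returns whatever text strtod left unconsumed.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper: parse 'text' with 'strtod', load the parsed value
// into '*value', and return the portion of 'text' that was not consumed.
inline std::string unparsedTail(const char *text, double *value)
{
    char *end = nullptr;
    *value = std::strtod(text, &end);
    return std::string(end);
}
```

Given "1.#INF00", the parsed value is 1.0 and the unconsumed tail is "#INF00", so a non-empty remainder is the signal that the text was not a plain number.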
-
- Usage Example:
- In this section, we show the intended usage of this component.
-
- Example 1: Parsing an Integer Value from a string_view:
- Suppose that we have a string_view that presumably contains a (not necessarily NUL terminated) string representing a 32-bit integer value and we want to convert that string into an int (32-bit integer).
- First, we create the string:
bsl::string_view input("20171024", 4);
Then we create the output variables for the parser:
int year;
bsl::string_view rest;
Next we call the parser function:
int rv = bdlb::NumericParseUtil::parseInt(&year, &rest, input);
Then we verify the results:
assert(0 == rv);
assert(2017 == year);
assert(rest.empty());