BDE 4.14.0 Production release
bdlb_chartype

Detailed Description

Purpose

Supply locale-independent version of <ctype.h> functionality.

Classes

bdlb::CharType: locale-independent version of <ctype.h> functionality

See also
bdlb_string

Description

This component defines a utility class, bdlb::CharType, that provides an efficient, locale-independent alternative to the standard functionality found in <ctype.h>. The following character categories are supported (ODIGIT, IDENT, ALUND, ALL, and NONE are extensions with no <ctype.h> counterpart):

============================================================
Category Description
-------- -------------------------------------------------
UPPER [A-Z]
LOWER [a-z]
ALPHA [A-Za-z]
ODIGIT [0-7]
DIGIT [0-9]
XDIGIT [0-9A-Fa-f]
ALNUM [0-9A-Za-z]
SPACE [space|tab|CR|NL|VT|FF]
PRINT any printable character including SPACE
GRAPH any printable character except SPACE
PUNCT any printable character except SPACE or ALNUM
CNTRL [\0-\37] and \177 (in standard ASCII, see below)
ASCII [\0-\177]
IDENT [ALNUM|_]
ALUND [ALPHA|_]
ALL any 8-bit value
NONE []
============================================================
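Because every category above is defined purely in terms of ASCII code values, membership can be decided with plain range comparisons that never consult the C locale. The following is a minimal sketch, assuming ASCII as this component does, of a few categories written directly from the table. The function names are hypothetical and are not the bdlb::CharType API.

```cpp
#include <cassert>

// Hypothetical, locale-independent predicates derived from the category
// table above.  They assume the ASCII encoding.
inline bool isUpperAscii(char c)  { return c >= 'A' && c <= 'Z'; }
inline bool isLowerAscii(char c)  { return c >= 'a' && c <= 'z'; }
inline bool isAlphaAscii(char c)  { return isUpperAscii(c) || isLowerAscii(c); }
inline bool isOdigitAscii(char c) { return c >= '0' && c <= '7'; }  // ODIGIT
inline bool isDigitAscii(char c)  { return c >= '0' && c <= '9'; }

inline bool isXdigitAscii(char c)
{
    return isDigitAscii(c) || (c >= 'A' && c <= 'F') || (c >= 'a' && c <= 'f');
}

inline bool isAlnumAscii(char c) { return isAlphaAscii(c) || isDigitAscii(c); }
inline bool isIdentAscii(char c) { return isAlnumAscii(c) || c == '_'; }  // IDENT
inline bool isAlundAscii(char c) { return isAlphaAscii(c) || c == '_'; }  // ALUND

// CNTRL: [\0-\37] (octal, i.e. 0-31) and \177 (127).
inline bool isCntrlAscii(char c)
{
    const unsigned char u = static_cast<unsigned char>(c);
    return u <= 037 || u == 0177;
}
```

Note that the CNTRL bounds in the table are octal escapes: \37 is 31 and \177 is 127.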

Supported functionality includes determining whether a character is a member of a given character category and, for each category, providing a null-terminated, contiguous sequence of its member characters together with the number of those characters. The standard conversion methods toUpper and toLower are also provided.
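One straightforward way to produce a per-category character sequence and count is to scan all 256 char values once and keep those that satisfy the category's predicate. The sketch below does this for XDIGIT; the names isXdigitAscii and membersOf are hypothetical, and this is not necessarily how bdlb::CharType builds its sequences.

```cpp
#include <cassert>
#include <string>

// Hypothetical predicate for XDIGIT ([0-9A-Fa-f]), assuming ASCII.
bool isXdigitAscii(char c)
{
    return (c >= '0' && c <= '9') || (c >= 'A' && c <= 'F')
        || (c >= 'a' && c <= 'f');
}

// Return, in increasing code order, every char value satisfying 'pred'.
// std::string stores its characters contiguously, c_str() gives a
// null-terminated view, and size() gives the character count.
std::string membersOf(bool (*pred)(char))
{
    std::string result;
    for (int i = 0; i < 256; ++i) {
        const char c = static_cast<char>(i);
        if (pred(c)) {
            result += c;
        }
    }
    return result;
}
```

For example, membersOf(isXdigitAscii) yields "0123456789ABCDEFabcdef" with a size of 22. For categories that contain '\0' (CNTRL, ASCII, ALL), the string in this sketch holds an embedded null, so size() rather than strlen(c_str()) gives the true count.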

Note that this component assumes the ASCII character set with standard encodings, which is sufficient for all currently supported platforms.
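Under that ASCII assumption, case conversion needs no locale tables: the letters A-Z and a-z each occupy a contiguous range, and corresponding letters differ by exactly 0x20. A minimal sketch of toUpper/toLower-style conversion on that basis follows; the function names are hypothetical.

```cpp
#include <cassert>

// Hypothetical ASCII-only case conversion.  'a'..'z' and 'A'..'Z' are
// contiguous and differ by 0x20 ('a' - 'A' == 32); any character outside
// the relevant range is returned unchanged.
inline char toUpperAscii(char c)
{
    return (c >= 'a' && c <= 'z') ? static_cast<char>(c - ('a' - 'A')) : c;
}

inline char toLowerAscii(char c)
{
    return (c >= 'A' && c <= 'Z') ? static_cast<char>(c + ('a' - 'A')) : c;
}
```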

ASCII Character Set

The following table provides a reference for the ASCII character set:

Decimal Hexadecimal Key Meaning
------- ----------- --- -------
0 0x00 ^@ NULL
1 0x01 ^A Start Heading
2 0x02 ^B Start Text
3 0x03 ^C End Text
4 0x04 ^D End of transmission
5 0x05 ^E Enquiry
6 0x06 ^F Acknowledge
7 0x07 ^G Bell
8 0x08 ^H Backspace
9 0x09 ^I Horizontal Tab
10 0x0A ^J Newline (Linefeed)
11 0x0B ^K Vertical Tab
12 0x0C ^L Form Feed
13 0x0D ^M Carriage Return
14 0x0E ^N Shift Out
15 0x0F ^O Shift In
16 0x10 ^P Data Link Escape
17 0x11 ^Q Device Control 1
18 0x12 ^R Device Control 2
19 0x13 ^S Device Control 3
20 0x14 ^T Device Control 4
21 0x15 ^U Negative Acknowledgement
22 0x16 ^V Synchronous Idle
23 0x17 ^W End of transmission Block
24 0x18 ^X Cancel
25 0x19 ^Y End of Medium
26 0x1A ^Z Substitute
27 0x1B ^[ Escape
28 0x1C ^\ File Separator
29 0x1D ^] Group Separator
30 0x1E ^^ Record Separator
31 0x1F ^_ Unit Separator
32 0x20 (space)
33 0x21 !
34 0x22 "
35 0x23 #
36 0x24 $
37 0x25 %
38 0x26 &
39 0x27 '
40 0x28 (
41 0x29 )
42 0x2A *
43 0x2B +
44 0x2C ,
45 0x2D -
46 0x2E .
47 0x2F /
48-57 0x30-0x39 0-9
58 0x3A :
59 0x3B ;
60 0x3C <
61 0x3D =
62 0x3E >
63 0x3F ?
64 0x40 @
65-90 0x41-0x5A A-Z
91 0x5B [
92 0x5C \ backslash
93 0x5D ]
94 0x5E ^
95 0x5F _
96 0x60 `
97-122 0x61-0x7A a-z
123 0x7B {
124 0x7C |
125 0x7D }
126 0x7E ~
127 0x7F ^? Delete (Rubout)
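The Key column uses caret notation: each control character is written as ^ followed by the character whose code differs from it only in bit 0x40. For instance, ^A is 0x01 because 'A' is 0x41, and ^? is 0x7F because '?' is 0x3F. The following sketch reproduces the column; caretKey is a hypothetical helper.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: return the caret-notation key for a control
// character (0x00-0x1F or 0x7F), or an empty string otherwise.  Flipping
// bit 0x40 maps 0x00-0x1F onto '@'..'_' and maps 0x7F onto '?'.
std::string caretKey(unsigned char c)
{
    if (c <= 0x1F || c == 0x7F) {
        return std::string("^") + static_cast<char>(c ^ 0x40);
    }
    return std::string();
}
```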

Category Definitions

The following table defines the members of each category:

UPPER
: LOWER
: : ALPHA
: : : ODIGIT
: : : : DIGIT
: : : : : XDIGIT
: : : : : : ALNUM
: : : : : : : SPACE
: : : : : : : : PRINT
: : : : : : : : : GRAPH
: : : : : : : : : : PUNCT
: : : : : : : : : : : CNTRL
: : : : : : : : : : : : ASCII
: : : : : : : : : : : : : IDENT
: : : : : : : : : : : : : : ALUND
: : : : : : : : : : : : : : : ALL
: : : : : : : : : : : : : : : : NONE
Dec Hex : : : : : : : : : : : : : : : : : Char
--- --- - - - - - - - - - - - - - - - - - ----
0 0 _ _ _ _ _ _ _ _ _ _ _ C A _ _ A _ ^@
1 1 _ _ _ _ _ _ _ _ _ _ _ C A _ _ A _ ^A
2 2 _ _ _ _ _ _ _ _ _ _ _ C A _ _ A _ ^B
3 3 _ _ _ _ _ _ _ _ _ _ _ C A _ _ A _ ^C
4 4 _ _ _ _ _ _ _ _ _ _ _ C A _ _ A _ ^D
5 5 _ _ _ _ _ _ _ _ _ _ _ C A _ _ A _ ^E
6 6 _ _ _ _ _ _ _ _ _ _ _ C A _ _ A _ ^F
7 7 _ _ _ _ _ _ _ _ _ _ _ C A _ _ A _ ^G
8 8 _ _ _ _ _ _ _ _ _ _ _ C A _ _ A _ ^H
9 9 _ _ _ _ _ _ _ S _ _ _ C A _ _ A _ ^I
10 A _ _ _ _ _ _ _ S _ _ _ C A _ _ A _ ^J
11 B _ _ _ _ _ _ _ S _ _ _ C A _ _ A _ ^K
12 C _ _ _ _ _ _ _ S _ _ _ C A _ _ A _ ^L
13 D _ _ _ _ _ _ _ S _ _ _ C A _ _ A _ ^M
14 E _ _ _ _ _ _ _ _ _ _ _ C A _ _ A _ ^N
15 F _ _ _ _ _ _ _ _ _ _ _ C A _ _ A _ ^O
16 10 _ _ _ _ _ _ _ _ _ _ _ C A _ _ A _ ^P
17 11 _ _ _ _ _ _ _ _ _ _ _ C A _ _ A _ ^Q
18 12 _ _ _ _ _ _ _ _ _ _ _ C A _ _ A _ ^R
19 13 _ _ _ _ _ _ _ _ _ _ _ C A _ _ A _ ^S
20 14 _ _ _ _ _ _ _ _ _ _ _ C A _ _ A _ ^T
21 15 _ _ _ _ _ _ _ _ _ _ _ C A _ _ A _ ^U
22 16 _ _ _ _ _ _ _ _ _ _ _ C A _ _ A _ ^V
23 17 _ _ _ _ _ _ _ _ _ _ _ C A _ _ A _ ^W
24 18 _ _ _ _ _ _ _ _ _ _ _ C A _ _ A _ ^X
25 19 _ _ _ _ _ _ _ _ _ _ _ C A _ _ A _ ^Y
26 1A _ _ _ _ _ _ _ _ _ _ _ C A _ _ A _ ^Z
27 1B _ _ _ _ _ _ _ _ _ _ _ C A _ _ A _ ^[
28 1C _ _ _ _ _ _ _ _ _ _ _ C A _ _ A _ ^\
29 1D _ _ _ _ _ _ _ _ _ _ _ C A _ _ A _ ^]
30 1E _ _ _ _ _ _ _ _ _ _ _ C A _ _ A _ ^^
31 1F _ _ _ _ _ _ _ _ _ _ _ C A _ _ A _ ^_
32 20 _ _ _ _ _ _ _ S P _ _ _ A _ _ A _
33 21 _ _ _ _ _ _ _ _ P G P _ A _ _ A _ !
34 22 _ _ _ _ _ _ _ _ P G P _ A _ _ A _ "
35 23 _ _ _ _ _ _ _ _ P G P _ A _ _ A _ #
36 24 _ _ _ _ _ _ _ _ P G P _ A _ _ A _ $
37 25 _ _ _ _ _ _ _ _ P G P _ A _ _ A _ %
38 26 _ _ _ _ _ _ _ _ P G P _ A _ _ A _ &
39 27 _ _ _ _ _ _ _ _ P G P _ A _ _ A _ '
40 28 _ _ _ _ _ _ _ _ P G P _ A _ _ A _ (
41 29 _ _ _ _ _ _ _ _ P G P _ A _ _ A _ )
42 2A _ _ _ _ _ _ _ _ P G P _ A _ _ A _ *
43 2B _ _ _ _ _ _ _ _ P G P _ A _ _ A _ +
44 2C _ _ _ _ _ _ _ _ P G P _ A _ _ A _ ,
45 2D _ _ _ _ _ _ _ _ P G P _ A _ _ A _ -
46 2E _ _ _ _ _ _ _ _ P G P _ A _ _ A _ .
47 2F _ _ _ _ _ _ _ _ P G P _ A _ _ A _ /
48 30 _ _ _ O D X A _ P G _ _ A I _ A _ 0
49 31 _ _ _ O D X A _ P G _ _ A I _ A _ 1
50 32 _ _ _ O D X A _ P G _ _ A I _ A _ 2
51 33 _ _ _ O D X A _ P G _ _ A I _ A _ 3
52 34 _ _ _ O D X A _ P G _ _ A I _ A _ 4
53 35 _ _ _ O D X A _ P G _ _ A I _ A _ 5
54 36 _ _ _ O D X A _ P G _ _ A I _ A _ 6
55 37 _ _ _ O D X A _ P G _ _ A I _ A _ 7
56 38 _ _ _ _ D X A _ P G _ _ A I _ A _ 8
57 39 _ _ _ _ D X A _ P G _ _ A I _ A _ 9
58 3A _ _ _ _ _ _ _ _ P G P _ A _ _ A _ :
59 3B _ _ _ _ _ _ _ _ P G P _ A _ _ A _ ;
60 3C _ _ _ _ _ _ _ _ P G P _ A _ _ A _ <
61 3D _ _ _ _ _ _ _ _ P G P _ A _ _ A _ =
62 3E _ _ _ _ _ _ _ _ P G P _ A _ _ A _ >
63 3F _ _ _ _ _ _ _ _ P G P _ A _ _ A _ ?
64 40 _ _ _ _ _ _ _ _ P G P _ A _ _ A _ @
65 41 U _ A _ _ X A _ P G _ _ A I A A _ A
66 42 U _ A _ _ X A _ P G _ _ A I A A _ B
67 43 U _ A _ _ X A _ P G _ _ A I A A _ C
68 44 U _ A _ _ X A _ P G _ _ A I A A _ D
69 45 U _ A _ _ X A _ P G _ _ A I A A _ E
70 46 U _ A _ _ X A _ P G _ _ A I A A _ F
71 47 U _ A _ _ _ A _ P G _ _ A I A A _ G
72 48 U _ A _ _ _ A _ P G _ _ A I A A _ H
73 49 U _ A _ _ _ A _ P G _ _ A I A A _ I
74 4A U _ A _ _ _ A _ P G _ _ A I A A _ J
75 4B U _ A _ _ _ A _ P G _ _ A I A A _ K
76 4C U _ A _ _ _ A _ P G _ _ A I A A _ L
77 4D U _ A _ _ _ A _ P G _ _ A I A A _ M
78 4E U _ A _ _ _ A _ P G _ _ A I A A _ N
79 4F U _ A _ _ _ A _ P G _ _ A I A A _ O
80 50 U _ A _ _ _ A _ P G _ _ A I A A _ P
81 51 U _ A _ _ _ A _ P G _ _ A I A A _ Q
82 52 U _ A _ _ _ A _ P G _ _ A I A A _ R
83 53 U _ A _ _ _ A _ P G _ _ A I A A _ S
84 54 U _ A _ _ _ A _ P G _ _ A I A A _ T
85 55 U _ A _ _ _ A _ P G _ _ A I A A _ U
86 56 U _ A _ _ _ A _ P G _ _ A I A A _ V
87 57 U _ A _ _ _ A _ P G _ _ A I A A _ W
88 58 U _ A _ _ _ A _ P G _ _ A I A A _ X
89 59 U _ A _ _ _ A _ P G _ _ A I A A _ Y
90 5A U _ A _ _ _ A _ P G _ _ A I A A _ Z
91 5B _ _ _ _ _ _ _ _ P G P _ A _ _ A _ [
92 5C _ _ _ _ _ _ _ _ P G P _ A _ _ A _ '\'
93 5D _ _ _ _ _ _ _ _ P G P _ A _ _ A _ ]
94 5E _ _ _ _ _ _ _ _ P G P _ A _ _ A _ ^
95 5F _ _ _ _ _ _ _ _ P G P _ A I A A _ _
96 60 _ _ _ _ _ _ _ _ P G P _ A _ _ A _ `
97 61 _ L A _ _ X A _ P G _ _ A I A A _ a
98 62 _ L A _ _ X A _ P G _ _ A I A A _ b
99 63 _ L A _ _ X A _ P G _ _ A I A A _ c
100 64 _ L A _ _ X A _ P G _ _ A I A A _ d
101 65 _ L A _ _ X A _ P G _ _ A I A A _ e
102 66 _ L A _ _ X A _ P G _ _ A I A A _ f
103 67 _ L A _ _ _ A _ P G _ _ A I A A _ g
104 68 _ L A _ _ _ A _ P G _ _ A I A A _ h
105 69 _ L A _ _ _ A _ P G _ _ A I A A _ i
106 6A _ L A _ _ _ A _ P G _ _ A I A A _ j
107 6B _ L A _ _ _ A _ P G _ _ A I A A _ k
108 6C _ L A _ _ _ A _ P G _ _ A I A A _ l
109 6D _ L A _ _ _ A _ P G _ _ A I A A _ m
110 6E _ L A _ _ _ A _ P G _ _ A I A A _ n
111 6F _ L A _ _ _ A _ P G _ _ A I A A _ o
112 70 _ L A _ _ _ A _ P G _ _ A I A A _ p
113 71 _ L A _ _ _ A _ P G _ _ A I A A _ q
114 72 _ L A _ _ _ A _ P G _ _ A I A A _ r
115 73 _ L A _ _ _ A _ P G _ _ A I A A _ s
116 74 _ L A _ _ _ A _ P G _ _ A I A A _ t
117 75 _ L A _ _ _ A _ P G _ _ A I A A _ u
118 76 _ L A _ _ _ A _ P G _ _ A I A A _ v
119 77 _ L A _ _ _ A _ P G _ _ A I A A _ w
120 78 _ L A _ _ _ A _ P G _ _ A I A A _ x
121 79 _ L A _ _ _ A _ P G _ _ A I A A _ y
122 7A _ L A _ _ _ A _ P G _ _ A I A A _ z
123 7B _ _ _ _ _ _ _ _ P G P _ A _ _ A _ {
124 7C _ _ _ _ _ _ _ _ P G P _ A _ _ A _ |
125 7D _ _ _ _ _ _ _ _ P G P _ A _ _ A _ }
126 7E _ _ _ _ _ _ _ _ P G P _ A _ _ A _ ~
127 7F _ _ _ _ _ _ _ _ _ _ _ C A _ _ A _ ^?
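The table above lends itself to a table-driven implementation: store one bitmask per char value, with one bit per category, so that each membership test is a single array lookup and a bitwise AND. The sketch below builds such a table for a subset of the categories. The constant, table, and function names are hypothetical, and this is not necessarily how bdlb::CharType is implemented.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical category bits (a subset of the categories in the table above).
const std::uint16_t k_UPPER = 1u << 0;
const std::uint16_t k_LOWER = 1u << 1;
const std::uint16_t k_DIGIT = 1u << 2;
const std::uint16_t k_SPACE = 1u << 3;
const std::uint16_t k_PUNCT = 1u << 4;
const std::uint16_t k_CNTRL = 1u << 5;
const std::uint16_t k_IDENT = 1u << 6;
const std::uint16_t k_ALUND = 1u << 7;

// Build a 256-entry table mapping each unsigned char value to its category
// bits, following the ASCII definitions in the table above.
std::array<std::uint16_t, 256> makeTable()
{
    std::array<std::uint16_t, 256> t{};  // value-initialized: all zero
    for (int i = 0; i < 256; ++i) {
        std::uint16_t bits = 0;
        const bool upper = i >= 'A' && i <= 'Z';
        const bool lower = i >= 'a' && i <= 'z';
        const bool digit = i >= '0' && i <= '9';
        if (upper) bits |= k_UPPER;
        if (lower) bits |= k_LOWER;
        if (digit) bits |= k_DIGIT;
        if (i == ' ' || (i >= '\t' && i <= '\r')) bits |= k_SPACE;  // 0x09-0x0D
        if (i <= 0x1F || i == 0x7F) bits |= k_CNTRL;
        // PUNCT: printable except SPACE (0x21-0x7E) and not alphanumeric.
        if (i >= 0x21 && i <= 0x7E && !upper && !lower && !digit) {
            bits |= k_PUNCT;
        }
        if (upper || lower || i == '_') bits |= k_ALUND;
        if (upper || lower || digit || i == '_') bits |= k_IDENT;
        t[i] = bits;
    }
    return t;
}

// Return 'true' if 'c' is in the category identified by the single bit 'bit'.
bool inCategory(char c, std::uint16_t bit)
{
    static const std::array<std::uint16_t, 256> table = makeTable();
    return (table[static_cast<unsigned char>(c)] & bit) != 0;
}
```

Indexing through unsigned char keeps values 128-255 in range even where char is signed; those values carry no bits in this sketch, matching their membership in ALL only.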

Usage

This section illustrates intended use of this component.

Example 1: Validating C-Style Identifiers

The character category extensions IDENT and ALUND are particularly useful for parsing C-style identifier names as described by the following regular expression:

[A-Za-z_]([A-Za-z0-9_])*

The first character is required and must be in category ALUND. All subsequent characters are optional and must be in category IDENT:

#include <bdlb_chartype.h>
#include <cassert>

using namespace BloombergLP;

bool isIdentifier(const char *token)
    // Return 'true' if the specified 'token' conforms to the requirements
    // of a C-style identifier, and 'false' otherwise.
{
    assert(token);

    if (!bdlb::CharType::isAlund(*token)) {
        return false;  // bad required first character               // RETURN
    }

    for (const char *p = token + 1; *p; ++p) {
        if (!bdlb::CharType::isIdent(*p)) {
            return false;  // bad optional subsequent character      // RETURN
        }
    }

    return true;
}
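For comparison, the regular expression quoted in this example can be checked directly with the standard library's std::regex. This sketch is an illustrative cross-check only; matchesIdentifierRegex is a hypothetical name, and a regex match is typically much slower than the character-by-character scan using bdlb::CharType.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Illustrative cross-check: return 'true' if the whole of 'token' matches
// the C-style identifier regular expression [A-Za-z_]([A-Za-z0-9_])*.
bool matchesIdentifierRegex(const std::string& token)
{
    static const std::regex identifier("[A-Za-z_]([A-Za-z0-9_])*");
    return std::regex_match(token, identifier);
}
```

Both approaches reject the empty string, because the first character is required.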