Component bdld_datum
[Package bdld]

Provide a discriminated variant type with a small footprint.

Namespaces

namespace  bdld

Detailed Description

Outline
Purpose:
Provide a discriminated variant type with a small footprint.
Classes:
bdld::Datum POD type representing general-purpose values
bdld::DatumArrayRef type for const ref to array of datums
bdld::DatumIntMapEntry type for entry inside int-map of datums
bdld::DatumIntMapRef type for const ref to int-map of datums
bdld::DatumMapEntry type for entry inside map of datums
bdld::DatumMapRef type for const ref to map of datums
bdld::DatumMutableArrayRef type for mutable ref to array of datums
bdld::DatumMutableMapRef type for mutable ref to a map of datums
bdld::DatumMutableMapOwningKeysRef mutable ref to a map owning keys
See also:
Component bdld_datumerror, Component bdld_datumudt, Component bdld_datumbinaryref, Component bdld_manageddatum
Description:
This component defines a mechanism, bdld::Datum, that provides a space-efficient discriminated union (i.e., a variant) that holds the value of either a scalar type (e.g., int, double, Date) or an aggregate (i.e., array or map) of Datum objects. The set of possible types that a datum may hold is described in the Supported Types section.
The Datum class is implemented as a POD-type, such that instances of the class are bitwise copyable and have trivial initialization, assignment and destruction. The Datum class is also (primarily) designed to be compact, especially on a 32-bit platform. Being a compact POD type, Datum is ideal for applications creating and copying very large numbers of variant values (the canonical use-case is for the values in a spreadsheet).
However, not all representable values can be stored in-line in the footprint of a Datum object itself. Those types may require that memory be allocated for storage. In order to keep the footprint of a Datum object as small as possible, a Datum object does not hold a reference to an allocator, and so memory must be explicitly managed by the user of Datum. See Memory Management for more details.
Notion of Value:
Datum has a notion of value, but is neither a value-semantic type, nor is it an in-core value-semantic type (see bsldoc_glossary). A consequence of the Datum class's space-efficient design is that it does not fall neatly into any of the standard BDE type-classifications. The Datum type's notion of value is expressed by its equality-operator -- notice, in particular, that two Datum objects compare equal if the values they refer to are the same. However, Datum, as a POD, has compiler supplied copy and assignment operators that do not copy any of the storage a Datum may be pointing to, and only copy the address to which the Datum is pointing.
Notice that the differing treatment of references to external data between the equality comparison and the copy and assignment operations violates a couple of properties required of a value-semantic type, most obviously: "The value of an object of the type is independent of any modifiable state that is not owned exclusively by that object." (see bsldoc_glossary).
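The mismatch between equality and copying described above can be sketched with a toy type. This is a minimal illustration using only the standard library; ToyStringDatum and copySharesStorage are hypothetical names and are not part of bdld_datum:

```cpp
#include <cassert>
#include <cstring>

// Hypothetical stand-in for a datum that refers to external string storage.
// This is NOT the 'bdld::Datum' implementation.
struct ToyStringDatum {
    const char *d_data_p;  // address of the characters (not owned)
};

// Equality compares the characters referred to, not the addresses.
inline bool operator==(const ToyStringDatum& lhs, const ToyStringDatum& rhs)
{
    return 0 == std::strcmp(lhs.d_data_p, rhs.d_data_p);
}

// Return 'true' if a compiler-generated copy shares storage with its
// source, so that modifying the shared buffer changes the "value" of both.
inline bool copySharesStorage()
{
    char buffer[] = "Boston";
    char other[]  = "Boston";

    ToyStringDatum a = { buffer };
    ToyStringDatum b = a;          // shallow: copies the address only
    ToyStringDatum c = { other };  // distinct storage, same characters

    const bool equalBefore = (a == b) && (a == c);

    buffer[0] = 'A';  // modify state that 'a' does not own exclusively

    // 'b' follows the change (same address); 'c' does not.
    return equalBefore && (a == b) && !(a == c) && a.d_data_p == b.d_data_p;
}
```

Here a and c start out equal, yet a write through storage that neither object owns makes them unequal, which is exactly the property a value-semantic type would rule out.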
Special Floating Point Values:
Floating point data can represent special values, and of particular interest for Datum are values of NaN and infinity. Datum may internally store NaN and infinity values in a different way than the IEEE-754 representation, and this section describes the resulting behavior for NaN and infinity values.
Treatment of NaN (Not-A-Number):
When storing a NaN value in a Datum, Datum guarantees only that a NaN value will be represented, but does not guarantee that the particular bit pattern supplied for a NaN value will be preserved. Note that an IEEE-754 representation for double allows for signaling and quiet NaN values, as well as a sign bit, and other bits of NaN payload data. These non-salient elements of the "value" of the double may not be preserved (and in the case of signaling NaNs, cannot be preserved on some platforms).
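As a rough illustration of why only "NaN-ness" can be relied upon, the sketch below builds two NaN values with different payload bits and passes them through a storage policy that collapses every NaN to one canonical quiet NaN. The policy (storeCanonical) is an assumption made for illustration; the text above does not say how Datum actually encodes NaN:

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(sizeof(double) == sizeof(std::uint64_t),
              "this sketch assumes a 64-bit 'double'");

// Return the raw 64-bit pattern of the specified 'value'.
inline std::uint64_t bitsOf(double value)
{
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// Return a 'double' having the specified raw 'bits'.
inline double fromBits(std::uint64_t bits)
{
    double value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// Hypothetical storage policy: keep ordinary values unchanged, but collapse
// every NaN (whatever its sign or payload) to one canonical quiet NaN.
inline double storeCanonical(double value)
{
    return std::isnan(value) ? std::numeric_limits<double>::quiet_NaN()
                             : value;
}
```

Code that needs to test a stored value should therefore use std::isnan (or an equivalent) rather than comparing bit patterns.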
Treatment of Infinity:
Datum provides unique representations for positive and negative infinity. The IEEE-754 double-precision format likewise defines only those two infinity values. (Unlike NaN values, these two infinity values have no non-normative bits in their representations, or signaling/quiet forms.)
Immutability:
Datum objects are generally immutable, meaning the value stored inside a Datum object cannot be changed except through the assignment operation. A Datum is copy-assignable, so a Datum object can be assigned another Datum object. On assignment, a Datum object is "shallow-copied", meaning that the footprint of the original Datum object is copied into the footprint of the destination Datum object, but if the Datum refers to dynamically allocated memory, only the value of the address is copied (not the contents of the dynamic allocation). Datum also exposes a clone method to "deep-copy" Datum objects, so that any externally allocated memory (except user defined types) is cloned and not shared as it is with copy-assignment. See also Deep Copying.
Memory Management:
A primary design goal for Datum is space-efficiency, particularly on 32-bit platforms. In order to minimize the footprint (i.e., the sizeof) of a Datum object, Datum does not hold a reference to the allocator that was used to allocate its contents. This component provides static functions that allocate dynamic data structures referred to by a Datum object (i.e., the Datum::create* static functions). This memory is said to be "externally managed" because it is not released when a Datum object is destroyed; instead, clients must explicitly call Datum::destroy on a Datum to release its memory (see Analogy to Raw Pointers). The bdld package provides tools and components that can simplify the process of managing the memory (see bdld_manageddatum, and the various builder components like bdld_datumarraybuilder).
Analogy to Raw Pointers:
A good way to understand the model for a Datum object's relationship to its data is by analogy: The relationship between a Datum object and the memory to which it refers is analogous to that of a raw pointer and the data to which it points. Where new and delete are used to allocate and free the memory that a pointer points to, the static class methods Datum::create* and Datum::destroy are used to allocate and release the memory a Datum refers to.
In order to create a Datum object a client calls one of the create* static methods on the Datum class. In order to release the data a Datum holds, a client calls destroy.
Copying or copy-assigning a Datum object behaves just like copying a raw pointer: the copy does not allocate or deallocate data. That also means assigning to a Datum object is not safe if the Datum being assigned to refers to dynamically allocated memory, and there isn't a (user controlled) strategy in place to release that memory.
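The hazard of assigning over a handle that still refers to allocated memory can be sketched with a toy create/destroy pair. Everything here (CountingAllocator, ToyHandle, outstandingAfterOverwrite) is hypothetical and only mimics the pattern; it is not the bdld_datum interface:

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical allocator that only counts outstanding blocks.
struct CountingAllocator {
    int d_live = 0;

    void *allocate(std::size_t n) { ++d_live; return ::operator new(n); }
    void  deallocate(void *p)     { --d_live; ::operator delete(p); }
};

// Hypothetical handle following the create/destroy pattern.
struct ToyHandle {
    long long *d_value_p;

    static ToyHandle create(long long value, CountingAllocator *alloc)
    {
        ToyHandle h = {
            static_cast<long long *>(alloc->allocate(sizeof value)) };
        *h.d_value_p = value;
        return h;
    }

    static void destroy(const ToyHandle& h, CountingAllocator *alloc)
    {
        alloc->deallocate(h.d_value_p);
    }
};

// Return the number of blocks still outstanding after a handle has been
// overwritten without first being destroyed.
inline int outstandingAfterOverwrite()
{
    CountingAllocator alloc;
    ToyHandle a    = ToyHandle::create(1, &alloc);
    ToyHandle b    = ToyHandle::create(2, &alloc);
    ToyHandle keep = a;  // saved only so this sketch can clean up afterwards

    a = b;                          // copies the address; nothing is released
    ToyHandle::destroy(b, &alloc);  // releases the block for '2'; 'a' dangles

    const int outstanding = alloc.d_live;  // the block for '1' is still live

    ToyHandle::destroy(keep, &alloc);
    return outstanding;
}
```

Without the saved copy keep, the block originally held by a would be unreachable, just as it would be after overwriting a raw pointer obtained from new.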
Deep Copying:
Datum exposes a clone method that "deep-copies" Datum objects, so that any dynamically or externally referenced memory is cloned and not shared like it would be when using a copy or copy-assignment operation. The exception is User Defined Types as they are opaque, so Datum has no way to deep-copy them.
The purpose of clone is to create an independent copy of the content of any Datum, which also includes Datum values where isExternalReference returns true (except of course UDTs, as mentioned above). Cloning a reference to a string results in an owned string, not a reference to a string, with the cloned Datum object's isExternalReference returning false. When cloning a map with keys that are references to external strings, the clone will have deep copies of those string keys; that is, it will become a map with owned keys. This behavior is intentional. The deep-copy operation (clone) is designed to ensure that the lifetime of the new clone does not, in any way, depend on the lifetime of the original Datum, or any data that Datum may have referenced. So (except for UDTs), if a Datum is cloned, the original Datum can be destroyed without any effect on the cloned Datum.
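The lifetime guarantee of a deep copy can be sketched with a toy string handle. ToyString, makeRef, clone, destroy, and cloneOutlivesOriginal are hypothetical names used for illustration, and the d_external flag merely plays the role that isExternalReference plays above:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical string handle; NOT the 'bdld::Datum' implementation.
struct ToyString {
    const char *d_data_p;
    bool        d_external;  // 'true' if the characters are not owned
};

// Return a handle that refers to 's' without copying it.
inline ToyString makeRef(const char *s)
{
    ToyString result = { s, true };
    return result;
}

// Deep copy: always produce owned storage, even when 'original' is an
// external reference.
inline ToyString clone(const ToyString& original)
{
    const std::size_t n = std::strlen(original.d_data_p) + 1;
    char *copy = new char[n];
    std::memcpy(copy, original.d_data_p, n);
    ToyString result = { copy, false };
    return result;
}

// Release the storage of 's' only if 's' owns it.
inline void destroy(const ToyString& s)
{
    if (!s.d_external) {
        delete[] s.d_data_p;
    }
}

// Return 'true' if a clone remains usable after the data referenced by the
// original has been released.
inline bool cloneOutlivesOriginal()
{
    char *source = new char[5];
    std::memcpy(source, "text", 5);

    ToyString ref  = makeRef(source);
    ToyString deep = clone(ref);

    delete[] source;  // 'ref' must no longer be used after this point

    const bool ok = !deep.d_external
                 && 0 == std::strcmp(deep.d_data_p, "text");
    destroy(deep);
    return ok;
}
```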
Creating a Datum that Requires No Allocation:
Datum objects containing certain types of scalar values do not require any memory allocation, so their factory functions do not take an allocator. These values are small enough that they can always fit inside the footprint of the Datum object itself.
  Datum boolean = Datum::createBoolean(true);   // Create a boolean datum
  Datum integer = Datum::createInteger(7);      // Create an integer
  Datum    real = Datum::createDouble(2.0);     // Create a double
Creating a Datum that May Require Allocation:
Datum objects containing certain types may (or may not!) require memory allocation, so their creation functions require an allocator:
  bslma::Allocator *allocator = bslma::Default::defaultAllocator();
  Datum datetime = Datum::createDatetime(bdlt::Datetime(), allocator);
  Datum int64    = Datum::createInteger64(1LL, allocator);
In the example above, createDatetime takes an allocator, but may not allocate memory. Depending on the value of the Datetime, a Datum might either store the value within the footprint of the Datum (requiring no allocation) or allocate external storage. The situations in which creation functions taking an allocator do, and do not, actually allocate memory are implementation-defined.
Clients of Datum should treat any creation function taking an allocator as-if it allocated memory, and eventually call Datum::destroy on the resulting Datum, even though in some instances memory allocation may not be required.
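The advice above can be illustrated with a toy 64-bit integer holder that keeps small values in its own footprint and allocates only for large ones. The 32-bit threshold, ToyInt64, CountingAllocator, and destroyIsAlwaysSafe are assumptions made for this sketch; the actual conditions under which Datum allocates are implementation-defined and are not described here:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical allocator that only counts outstanding blocks.
struct CountingAllocator {
    int d_live = 0;

    void *allocate(std::size_t n) { ++d_live; return ::operator new(n); }
    void  deallocate(void *p)     { --d_live; ::operator delete(p); }
};

// Hypothetical holder; NOT the 'bdld::Datum' implementation.
struct ToyInt64 {
    bool d_inline;
    union {
        std::int32_t  d_small;  // used when the value fits in 32 bits
        std::int64_t *d_big_p;  // used otherwise
    };

    static ToyInt64 create(std::int64_t value, CountingAllocator *alloc)
    {
        ToyInt64 d;
        if (value >= INT32_MIN && value <= INT32_MAX) {
            d.d_inline = true;
            d.d_small  = static_cast<std::int32_t>(value);
        }
        else {
            d.d_inline = false;
            d.d_big_p  = static_cast<std::int64_t *>(
                                            alloc->allocate(sizeof value));
            *d.d_big_p = value;
        }
        return d;
    }

    // Safe to call whether or not 'create' actually allocated.
    static void destroy(const ToyInt64& d, CountingAllocator *alloc)
    {
        if (!d.d_inline) {
            alloc->deallocate(d.d_big_p);
        }
    }

    std::int64_t value() const { return d_inline ? d_small : *d_big_p; }
};

// Return 'true' if destroying every created object, allocated or not,
// leaves no outstanding blocks.
inline bool destroyIsAlwaysSafe()
{
    CountingAllocator alloc;
    ToyInt64 small = ToyInt64::create(1, &alloc);
    ToyInt64 large = ToyInt64::create(INT64_C(1) << 40, &alloc);

    const bool ok = 1 == alloc.d_live
                 && 1 == small.value()
                 && (INT64_C(1) << 40) == large.value();

    ToyInt64::destroy(small, &alloc);
    ToyInt64::destroy(large, &alloc);
    return ok && 0 == alloc.d_live;
}
```

The caller never needs to know which branch was taken, which is why destroying unconditionally is the simplest policy.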
Destroying a Datum Object:
The contents of a Datum object are destroyed using the static method destroy. For example:
  bslma::Allocator *allocator = bslma::Default::defaultAllocator();
  Datum datetime = Datum::createDatetime(bdlt::Datetime(), allocator);

  Datum::destroy(datetime, allocator);
     // 'datetime' now refers to deallocated memory.  It cannot be used
     // unless it is assigned a new value.
Notice that the destroyed Datum again behaves similarly to a raw pointer that has been deallocated: the destroyed Datum refers to garbage and must be assigned a new value before it can be used.
For aggregate types -- i.e., maps and arrays -- destroy will recursively call destroy on the Datum objects that compose the aggregate. The exception to this is references to external arrays (discussed below).
The destroy method does nothing for User Defined Types, as they are opaque to Datum.
References to External Strings and Arrays:
Although a Datum does not own memory in the traditional sense, a call to Datum::destroy will release the memory to which that Datum refers. However, a Datum object also allows a user to create a Datum referring to an externally managed array or string. For a Datum having a reference to an external string or array, the isExternalReference method will return true and Datum::destroy will not deallocate memory for the data; otherwise, isExternalReference will return false and Datum::destroy will deallocate memory for the data.
For example, to create a Datum for an externally managed string:
  Datum externalStringRef = Datum::createStringRef("text", allocator);
Notice that the supplied allocator is not used to allocate memory in order to copy the contents of the string, but may (or may not) be used to allocate meta-data that the Datum stores about the string (e.g., the string's length).
To create a Datum that is responsible for the memory of a string:
  Datum managedString = Datum::copyString("text", allocator);
Here the contents of the string are copied and managed by the created datum, and later released by Datum::destroy.
External references to arrays and strings are important for efficiently handling memory allocations in situations where a string or array is externally supplied (e.g., as input to a function) and will clearly outlive the Datum object being created (e.g., a Datum variable within the scope of that function).
In general factory methods of the form create*Ref create a reference to external data that the Datum is not responsible for, while copy* methods copy the data and the resulting Datum is responsible for the allocated memory.
Supported Types:
The table below describes the set of types that a Datum may be.
                        external   requires
  dataType              reference  allocation  Description
  --------              ---------  ----------  -----------
  e_NIL                 no         no          null value
  e_INTEGER             no         no          integer value
  e_DOUBLE              no         no          double value
  e_STRING              maybe      maybe       string value
  e_BOOLEAN             no         no          boolean value
  e_ERROR               no         maybe       error value
  e_DATE                no         no          date value
  e_TIME                no         no          time value
  e_DATETIME            no         maybe       date+time value
  e_DATETIME_INTERVAL   no         maybe       date+time interval value
  e_INTEGER64           no         maybe       64-bit integer value
  e_USERDEFINED         always     maybe       pointer to a user-defined obj
  e_BINARY              no         maybe       binary data
  e_DECIMAL64           no         maybe       Decimal64

                        external   requires
  dataType              reference  allocation  Description
  --------              ---------  ----------  -----------
  e_ARRAY               maybe      maybe       array
  e_MAP                 no         maybe       map keyed by string values
  e_INT_MAP             no         maybe       map keyed by 32-bit int values
  • dataType - the value returned by type()
  • external-reference - whether isExternalReference will return true, in which case Datum::destroy will not release the externally referenced data (see References to External Strings and Arrays)
  • requires-allocation - whether a Datum referring to this type requires memory allocation. Note that for externally represented strings or arrays, meta-data may still need to be allocated.
User Defined Types:
Datum exposes a type DatumUdt with which a user can arbitrarily expand the set of types a Datum can support. A DatumUdt object holds a void pointer and an integer value identifying the type. A DatumUdt object is always treated as an external reference, and the memory it refers to is not released by Datum::destroy, or deep-copied by clone. The meaning of the integer type identifier is determined by the application, which is responsible for ensuring the set of "user-defined" type identifiers remains unique. From the viewpoint of Datum a UDT is an opaque pointer with an integer value that holds no defined meaning. In that sense it is more akin to a void pointer than to any of the other kinds of values a Datum may hold. All knowledge of what the pointer and integer value mean is elsewhere, in the application that created the UDT.
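One way an application might interpret such a pointer-plus-identifier pair is sketched below. ToyUdt stands in for the pair itself and is not bdld::DatumUdt; the layout of Choice and the function extractValue are likewise hypothetical:

```cpp
#include <cassert>

// Hypothetical pointer-plus-identifier pair; NOT 'bdld::DatumUdt'.
struct ToyUdt {
    void *d_data_p;  // opaque pointer (not owned)
    int   d_type;    // application-defined type identifier
};

// Application-side types.  The layout of 'Choice' is invented for this
// sketch.
struct Sequence {
    Sequence *d_next_p;
    int       d_value;
};

struct Choice {
    int d_selection;
};

enum ExtraExpressionTypes {
    e_SEQUENCE = 5,
    e_CHOICE   = 6
};

// Return the integer carried by the object 'udt' refers to, or -1 if the
// type identifier is not one this application knows about.
inline int extractValue(const ToyUdt& udt)
{
    switch (udt.d_type) {
      case e_SEQUENCE:
        return static_cast<Sequence *>(udt.d_data_p)->d_value;
      case e_CHOICE:
        return static_cast<Choice *>(udt.d_data_p)->d_selection;
      default:
        return -1;
    }
}
```

The cast is only as safe as the application's discipline in keeping type identifiers unique, which is why that responsibility is placed on the application.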
Map and IntMap Types:
Datum provides two map types, map (datatype e_MAP) and int-map (datatype e_INT_MAP). These types provide a mapping of key to value, as represented by a sequence of key-value pairs (and are not directly related to std::map). The key types for map and int-map are bslstl::StringRef and int respectively, and the value is always a Datum. Both map types keep track of whether they are sorted by key. Key-based lookup is done via the find function. If the map is in a sorted state, find has O(logN) complexity; otherwise, find has O(N) complexity (where N is the number of elements in the map). If entries with duplicate keys are present, which matching entry will be found is unspecified.
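The two lookup strategies can be sketched over a plain sequence of key-value pairs. This is a standard-library illustration of the complexity claim only; ToyMap, ToyEntry, keyLess, and findValue are hypothetical and do not reflect how DatumMapRef is implemented:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, int> ToyEntry;

// Hypothetical map: a sequence of entries plus a "sorted by key" flag.
struct ToyMap {
    std::vector<ToyEntry> d_entries;
    bool                  d_sorted;
};

// Order an entry before a key, as required by 'std::lower_bound'.
inline bool keyLess(const ToyEntry& entry, const std::string& key)
{
    return entry.first < key;
}

// Return the address of the value for 'key', or null if there is none.
// Uses binary search (O(log N)) when the map is sorted and a linear scan
// (O(N)) otherwise.
inline const int *findValue(const ToyMap& map, const std::string& key)
{
    if (map.d_sorted) {
        std::vector<ToyEntry>::const_iterator it = std::lower_bound(
               map.d_entries.begin(), map.d_entries.end(), key, keyLess);
        return (it != map.d_entries.end() && it->first == key)
               ? &it->second
               : nullptr;
    }
    for (const ToyEntry& entry : map.d_entries) {
        if (entry.first == key) {
            return &entry.second;
        }
    }
    return nullptr;
}
```

With duplicate keys, the two branches of this sketch may return different matching entries, which is consistent with the result being unspecified.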
Usage:
This section illustrates intended use of this component.
Example 1: Basic Use of bdld::Datum:
This example illustrates the construction, manipulation and lifecycle of datums. Datums are created via a set of static methods called createTYPE, copyTYPE or adoptTYPE where TYPE is one of the supported types. The creation methods take a value and sometimes an allocator.
First, we create an allocator that will supply dynamic memory needed for the Datum objects being created:
  bslma::TestAllocator oa("object");
Then, we create a Datum, number, having an integer value of 3:
  Datum number = Datum::createInteger(3);
Next, we verify that the created object actually represents an integer value and verify that the value was set correctly:
  assert(true == number.isInteger());
  assert(3    == number.theInteger());
Note that this object does not allocate any dynamic memory on any supported platform and thus we do not need to explicitly destroy this object to release any dynamic memory.
Then, we create a Datum, cityName, having the string value "Boston":
  Datum cityName = Datum::copyString("Boston", strlen("Boston"), &oa);
Note that copyString makes a copy of the specified string and may allocate memory to hold the copy. Whether the copy is stored in the object's internal storage buffer or in memory obtained from the allocator depends on the length of the string and the platform.
Next, we verify that the created object actually represents a string value and verify that the value was set correctly:
  assert(true     == cityName.isString());
  assert("Boston" == cityName.theString());
Finally, we destroy the cityName object to deallocate the memory used to hold the string value:
  Datum::destroy(cityName, &oa);
Example 2: Creating a Datum Referring to an Array of Datum Objects:
This example demonstrates the construction of a Datum object referring to an existing array of Datum objects.
First, we create an array of Datum objects:
  const char theDay[] = "Birthday";
  const Datum array[2] = { Datum::createDate(bdlt::Date(2015, 10, 15)),
                           Datum::createStringRef(StringRef(theDay), &oa) };
Note that in this case, the second element of the array does not make a copy of the string, but represents a string reference.
Then, we create a Datum that refers to the array of Datums:
  const Datum arrayRef = Datum::createArrayReference(array, 2, &oa);
Next, we verify that the created Datum represents the array value and that elements of this array can be accessed. We also verify that the object refers to external data:
  assert(true == arrayRef.isArray());
  assert(true == arrayRef.isExternalReference());
  assert(2    == arrayRef.theArray().length());
  assert(array[0] == arrayRef.theArray().data()[0]);
  assert(array[1] == arrayRef.theArray().data()[1]);
Then, we call destroy on arrayRef, releasing any memory it may have allocated, and verify that the external array is intact:
  Datum::destroy(arrayRef, &oa);

  assert(bdlt::Date(2015, 10, 15) == array[0].theDate());
  assert("Birthday"               == array[1].theString());
Finally, we need to deallocate memory that was potentially allocated for the (external) Datum string in the external array:
  Datum::destroy(array[1], &oa);
Example 3: Creating a Datum with an Array Value:
The following example illustrates the construction of an owned array of datums.
WARNING: Using the corresponding builder components is the preferred way of constructing Datum array objects. This example shows how a user-facing builder component might use the primitives provided in bdld_datum.
First we create an array of datums:
  DatumMutableArrayRef bartArray;
  Datum::createUninitializedArray(&bartArray, 3, &oa);
  bartArray.data()[0] = Datum::createStringRef("Bart", &oa);
  bartArray.data()[1] = Datum::createStringRef("Simpson", &oa);
  bartArray.data()[2] = Datum::createInteger(10);
  *bartArray.length() = 3;
Then, we construct the Datum that holds the array itself:
  Datum bart = Datum::adoptArray(bartArray);
Note that after the bartArray has been adopted, the bartArray object can be destroyed without invalidating the array contained in the datum.
A DatumMutableArrayRef may be adopted by only one datum. If the DatumMutableArrayRef is not adopted, it must be destroyed via disposeUninitializedArray.
Now, we can access the contents of the array through the datum:
  assert(3      == bart.theArray().length());
  assert("Bart" == bart.theArray()[0].theString());
Finally, we destroy the datum, which releases all memory associated with the array:
  Datum::destroy(bart, &oa);
Note that the same allocator must be used to create the array, the elements, and to destroy the datum.
Example 4: Creating a Datum with a Map Value:
The following example illustrates the construction of a map of datums indexed by string keys.
WARNING: Using the corresponding builder components is the preferred way of constructing Datum map objects. This example shows how a user-facing builder component might use the primitives provided in bdld_datum.
First we create a map of datums:
  DatumMutableMapRef lisaMap;
  Datum::createUninitializedMap(&lisaMap, 3, &oa);
  lisaMap.data()[0] = DatumMapEntry(StringRef("firstName"),
                                    Datum::createStringRef("Lisa", &oa));
  lisaMap.data()[1] = DatumMapEntry(StringRef("lastName"),
                                    Datum::createStringRef("Simpson", &oa));
  lisaMap.data()[2] = DatumMapEntry(StringRef("age"),
                                    Datum::createInteger(8));
  *lisaMap.size() = 3;
Then, we construct the Datum that holds the map itself:
  Datum lisa = Datum::adoptMap(lisaMap);
Note that after the lisaMap has been adopted, the lisaMap object can be destroyed without invalidating the map contained in the datum.
A DatumMutableMapRef may be adopted by only one datum. If the DatumMutableMapRef is not adopted, it must be destroyed via disposeUninitializedMap.
Now, we can access the contents of the map through the datum:
  assert(3      == lisa.theMap().size());
  assert("Lisa" == lisa.theMap().find("firstName")->theString());
Finally, we destroy the datum, which releases all memory associated with the map:
  Datum::destroy(lisa, &oa);
Note that the same allocator must be used to create the map, the elements, and to destroy the datum.
Example 5: Mass Destruction:
The following example illustrates an important idiom: the en masse destruction of a series of datums allocated in an arena.
  {
      // scope
      bsls::AlignedBuffer<200> bufferStorage;
      bdlma::BufferedSequentialAllocator arena(bufferStorage.buffer(), 200);

      Datum patty = Datum::copyString("Patty Bouvier",
                                      strlen("Patty Bouvier"),
                                      &arena);

      Datum selma = Datum::copyString("Selma Bouvier",
                                      strlen("Selma Bouvier"),
                                      &arena);
      DatumMutableArrayRef maggieArray;
      Datum::createUninitializedArray(&maggieArray, 2, &arena);
      maggieArray.data()[0] = Datum::createStringRef("Maggie", &arena);
      maggieArray.data()[1] = Datum::createStringRef("Simpson", &arena);
      *maggieArray.length() = 2;
      Datum maggie = Datum::adoptArray(maggieArray);
  } // end of scope
Here all the allocated memory is lodged in the arena allocator. At the end of the scope the memory is freed in a single step. Calling destroy for each datum individually is neither necessary nor permitted.
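The same arena idiom can be sketched with the C++17 standard library, using std::pmr::monotonic_buffer_resource as a rough analog of bdlma::BufferedSequentialAllocator. The analogy is an assumption made for illustration, and copyInto and arenaServesFromBuffer are hypothetical helpers:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory_resource>

// Copy 'text' into memory obtained from 'arena' and return the copy.
inline char *copyInto(std::pmr::memory_resource *arena, const char *text)
{
    const std::size_t n = std::strlen(text) + 1;
    char *copy = static_cast<char *>(arena->allocate(n, 1));
    std::memcpy(copy, text, n);
    return copy;
}

// Return 'true' if both copies were carved out of the local buffer and hold
// the expected characters.
inline bool arenaServesFromBuffer()
{
    alignas(std::max_align_t) char buffer[200];

    // The null upstream resource ensures nothing comes from the heap.
    std::pmr::monotonic_buffer_resource arena(
                  buffer, sizeof buffer, std::pmr::null_memory_resource());

    char *patty = copyInto(&arena, "Patty Bouvier");
    char *selma = copyInto(&arena, "Selma Bouvier");

    const bool inBuffer = patty >= buffer && patty < buffer + sizeof buffer
                       && selma >= buffer && selma < buffer + sizeof buffer;

    // No per-object deallocation: everything is reclaimed when 'arena' and
    // 'buffer' go out of scope.
    return inBuffer
        && 0 == std::strcmp(patty, "Patty Bouvier")
        && 0 == std::strcmp(selma, "Selma Bouvier");
}
```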
Example 6: User-defined, error and binary types:
Imagine we are using Datum within an expression evaluation subsystem. Within that subsystem, along with the set of types defined by Datum::DataType, we also need to hold Sequence and Choice types within Datum values (which are not natively represented by Datum). First, we define the set of types used by our subsystem that are an extension to the types in Datum::DataType:
  struct Sequence {
      struct Sequence *d_next_p;
      int              d_value;
  };

  enum ExtraExpressionTypes {
      e_SEQUENCE = 5,
      e_CHOICE = 6
  };
Notice that the numeric values will be provided as the type attribute when constructing a Datum object.
Then we create a Sequence object, and create a Datum to hold it (note that we've created the object on the stack for clarity):
  Sequence sequence;
  const Datum datumS0 = Datum::createUdt(&sequence, e_SEQUENCE);
  assert(true == datumS0.isUdt());
Next, we verify that the datumS0 refers to the external Sequence object:
  bdld::DatumUdt udt = datumS0.theUdt();
  assert(e_SEQUENCE == udt.type());
  assert(&sequence  == udt.data());
Then, we create a Datum to hold a DatumError, consisting of an error code and an error description message:
  enum { e_FATAL_ERROR = 100 };
  Datum datumError = Datum::createError(e_FATAL_ERROR, "Fatal error.", &oa);
  assert(true == datumError.isError());
  DatumError error = datumError.theError();
  assert(e_FATAL_ERROR == error.code());
  assert("Fatal error." == error.message());
  Datum::destroy(datumError, &oa);
Finally, we create a Datum that holds arbitrary binary data:
  int buffer[] = { 1, 2, 3 };
  Datum datumBlob = Datum::copyBinary(buffer, sizeof(buffer), &oa);
  buffer[2] = 666;
  assert(true == datumBlob.isBinary());
  DatumBinaryRef blob = datumBlob.theBinary();
  assert(blob.size() == 3 * sizeof(int));
  assert(reinterpret_cast<const int*>(blob.data())[2] == 3);
  Datum::destroy(datumBlob, &oa);
Note that the bytes have been copied.