Quick Links:

bal | bbl | bdl | bsl

Namespaces

Component bdlb_topologicalsortutil
[Package bdlb]

Provide a utility to topologically sort a collection of inputs. More...

Namespaces

namespace  bdlb

Detailed Description

Outline
Purpose:
Provide a utility to topologically sort a collection of inputs.
Classes:
bdlb::TopologicalSortUtil utility for topologically sorting inputs
bdlb::TopologicalSortUtilEdgeTraits customization point for edge types
See also:
Description:
This component provides a utility struct, bdlb::TopologicalSortUtil, to topologically sort a collection of inputs that describe a directed acyclic graph (also known as DAG). A topological sort is useful for defining the order in which an input should be processed based on certain conditions. As an example, consider two jobs B and C both of which have a dependency on job A, i.e., job A needs to be completed before B or C can be run. Given such a requirement, a topological sort can provide an ordering of the three jobs such that A precedes B and C. Note that there may be multiple orderings that might be correct (e.g.: A, B, then C or A, C then B; both are correct and satisfy the dependencies of A running before the B and C jobs; we don't care whether B precedes C or vice versa).
The dependencies are specified in pairs (U,V) (read as U precedes V), which define the relations between U and V that the sort needs to maintain. The elements in these pairs (e.g., U and V) make up a finite set S. The output of the topological sort is an ordering of all elements of set S, such that all relations specified in the input (as pairs) are satisfied. Note that such an ordering is only possible only if there are no cycles in the input dependencies. For example, if A depends on B and B depends on A, then it is not possible to order A and B satisfying both the relations. The routine sort defined in this component returns false if it detects a cycle while trying to sort the input.
Complexity of the topological sort in this component is O(n+m) where n is the number of input elements in the finite set S and m is the number of relations as specified by the input pairs.
The sort function is provided in two variants or flavors: simple and iterator-based. The simple flavor is templated on the NODE_TYPE, and it provides topological sorting for the case where the input is a bsl::vector of bsl::pairs, and the outputs are bsl::vectors. The templated variant is highly customizable (templated) on both the input and the outputs. The input may be any input iterator range that has a bsl::pair value_type or a value_type with a bdlb::TopologicalSortUtilEdgeTraits specialization. The value_type is determined using bsl::iterator::traits. The two outputs are both defined as templated output iterators and they may have different types. So (for example) the iterator based sort may be used with Null OUTPUT_ITER to answer the question "does this graph have cycles?" while not wasting memory in storing the sort results.
Self-referencing Nodes:
Some graph representations may use self-referencing nodes to indicate nodes without any connection -- where a self referencing node in the context of this component is a input edge pair like (u, u). The implementation of sort in this component treats such input entries as cycles and fails sorting.
You may still use this sort implementation (to process your nodes in proper order) if your data structure contains such entries. To process a graph having self-referencing nodes one may create a filtering iterator on top of the input that removes any self-referential edges from the input. Any filtered nodes could be treated as being sorted first.
Concepts:
This component provides generic (templated) utilities. This section describes the requirements of the type parameters (concepts) of these generic methods.
NODE_TYPE Concept:
The input for a topological sort is supplied as pairs (though not necessarily bsl::pairs) of NODE_TYPE values. NODE_TYPE is aliased to NodeType in TopologicalSortUtilEdgeTraits.
NODE_TYPE shall be a value type that supports hashing using the bsl::hash<NODE_TYPE> functor type and equality comparison using the bsl::equal_to<NODE_TYPE> functor type.
Notice that we have not provided a customization point for the hash functor type nor for the equality functor type because that would complicate the code greatly (esp. readability would suffer).
EdgeType Concept:
The EdgeType describes an edge as a (conceptual) pair of NODE_TYPE values and is supplied as the input to the topological sort methods. EdgeType is the value_type of the INPUT_ITER supplied to the sort functions. Conceptually it represents a pair (U,V) (where U and V are NodeTypes) that means "U precedes V". The EdgeType supplied to a sort is typically a bsl::pair, but users may customize this by specializing the TopologicalSortUtilEdgeTraits type. TopologicalSortUtil determines the NodeType and access operations on EdgeType via the TopologicalSortUtilEdgeTraits type.
TopologicalSortUtilEdgeTraits is specialized for bsl::pair<T,T>, the user needs to (partially or fully) specialize it for other types. See the CustomEdge specialization in the test driver for an example.
INPUT_ITER Concept:
INPUT_ITER is an input iterator type used by the sort functions, and its value_type is EdgeType .
OUTPUT_ITER Concept:
OUTPUT_ITER is an output iterator for sort functions, and its value_type is NodeType.
UNORDERED_ITER Concept:
UNORDERED_ITER is a (second) output iterator for sort functions, and its value_type is NodeType. Note that OUTPUT_ITERATOR and UNORDERED_ITER are distinct types allowing the sorted output nodes to be collected in a different way from the unsorted output nodes(e.g., into a different type of container).
USAGE:
This section illustrates intended use of this component.
Example 1: Using Topological Sort for Calculating Formulas:
Suppose we are evaluating formulas for a set of market data fields, where formulas can reference other fields as part of their calculation. As an example, let's say we have a field k_bbgDefinedVwap that is dependent on k_vwapTurnover and k_vwapVolume. These fields in turn are dependent on k_tradeSize and k_tradePrice respectively. So essentially the fields that are not dependent on any other field should be calculated first and then the dependent fields. We can use the topological sort utility to provide us the order in which these fields should be calculated.
First, we create the relations showcasing the above mentioned dependencies in the form of pairs (a,b) where b is dependent on a:
  enum FieldIds {
      k_bbgDefinedVwap  = 0,
      k_vwapTurnover    = 1,
      k_vwapVolume      = 2,
      k_tradeSize       = 3,
      k_tradePrice      = 4
  };

  bsl::vector<bsl::pair<int, int> > relations;

  relations.push_back(bsl::make_pair(static_cast<int>(k_vwapTurnover),
                                     static_cast<int>(k_bbgDefinedVwap)));
  relations.push_back(bsl::make_pair(static_cast<int>(k_vwapVolume),
                                     static_cast<int>(k_bbgDefinedVwap)));
  relations.push_back(bsl::make_pair(static_cast<int>(k_tradeSize),
                                     static_cast<int>(k_vwapVolume)));
  relations.push_back(bsl::make_pair(static_cast<int>(k_tradeSize),
                                     static_cast<int>(k_vwapTurnover)));
  relations.push_back(bsl::make_pair(static_cast<int>(k_tradePrice),
                                     static_cast<int>(k_vwapTurnover)));
Now, we call the topological sort to get a topological order for the fields referenced in the relations:
  bsl::vector<int> results;
  bsl::vector<int> unsorted;
  bool             sorted = TopologicalSortUtil::sort(&results,
                                                      &unsorted,
                                                      relations);
Finally, we verify that the call to sort populates the supplied results with a sequence in sorted order (e.g.. k_tradeSize, 'k_tradePrice, k_vwapTurnover, k_vwapVolume, k_bbgDefinedVwap) and unsorted will be empty because the input relationships do not contain a cycle.
  bool calculated[5] = { 0 };
  assert(sorted == true);
  assert(unsorted.empty());

  for (bsl::vector<int>::const_iterator iter = results.begin(),
                                        end  = results.end();
       iter != end; ++iter) {
      switch (*iter) {
        case k_bbgDefinedVwap: {
          assert(calculated[k_vwapTurnover] == true);
          assert(calculated[k_vwapVolume]   == true);

          assert(calculated[k_bbgDefinedVwap] == false);
          calculated[k_bbgDefinedVwap] = true;
        } break;
        case k_vwapTurnover: {
          assert(calculated[k_tradeSize]  == true);
          assert(calculated[k_tradePrice] == true);

          assert(calculated[k_vwapTurnover] == false);
          calculated[k_vwapTurnover] = true;
        } break;
        case k_vwapVolume: {
          assert(calculated[k_tradeSize]  == true);

          assert(calculated[k_vwapVolume] == false);
          calculated[k_vwapVolume] = true;
        } break;
        case k_tradeSize: {
          assert(calculated[k_vwapVolume]   == false);
          assert(calculated[k_vwapTurnover] == false);

          assert(calculated[k_tradeSize] == false);
          calculated[k_tradeSize] = true;
        } break;
        case k_tradePrice: {
          assert(calculated[k_vwapTurnover] == false);

          assert(calculated[k_tradePrice] == false);
          calculated[k_tradePrice] = true;
        } break;
        default:
          assert(false);
          break;
      };
  }

  for (int i = 0; i < 5; ++i) {
      assert(calculated[i] == true);
  }
Example 2: Using Topological Sort with Cycles in Input:
Suppose we have a set of inputs that contain a cycle.
First, we define a set of inputs that have a cycle:
  enum FieldIds2 {
      k_FIELD1 = 0,
      k_FIELD2 = 1,
      k_FIELD3 = 2
  };

  bsl::vector<bsl::pair<int, int> > relations2;

  relations2.push_back(bsl::make_pair(static_cast<int>(k_FIELD2),
                                      static_cast<int>(k_FIELD1)));
  relations2.push_back(bsl::make_pair(static_cast<int>(k_FIELD3),
                                      static_cast<int>(k_FIELD2)));
  relations2.push_back(bsl::make_pair(static_cast<int>(k_FIELD1),
                                      static_cast<int>(k_FIELD3)));
Now, we apply the topological sort routine on the input:
  bsl::vector<int> results2;
  bsl::vector<int> unordered2;
  bool             sorted2 = TopologicalSortUtil::sort(&results2,
                                                       &unordered2,
                                                       relations2);
Finally, we verify whether the routine recognizes that there is a cycle and returns false, and the 3 nodes comprising the cycle are reported in unordered2:
  assert(sorted2           == false);
  assert(unordered2.size() == 3);
Example 3: Using Topological Sort with Self Relations:
Suppose we have a set of inputs that have input relations where predecessor and successor point to the same value, i.e. we have pairs of input like (u,u). This example demonstrates that the sort function handles such input as a cycle. (See Example 6 for a way of processing such nodes and avoiding sort reporting a cycle.) First, we define such a set of inputs:
  enum FieldIds3 {
      k_FIELD4 = 3,
      k_FIELD5 = 4,
      k_FIELD6 = 5
  };

  bsl::vector<bsl::pair<int, int> > relations3;

  relations3.push_back(bsl::make_pair(static_cast<int>(k_FIELD4),
                                      static_cast<int>(k_FIELD6)));
  relations3.push_back(bsl::make_pair(static_cast<int>(k_FIELD5),
                                      static_cast<int>(k_FIELD4)));
  relations3.push_back(bsl::make_pair(static_cast<int>(k_FIELD4),
                                      static_cast<int>(k_FIELD4)));
Now, we apply the topological sort routine on the input:
  bsl::vector<int> results3;
  bsl::vector<int> unordered3;
  bool             sorted3 = TopologicalSortUtil::sort(&results3,
                                                       &unordered3,
                                                       relations3);
Finally, we verify that the self relations causes the cycle:
  assert(sorted3           == false);
  assert(results3.size()   == 1);
  assert(unordered3.size() == 2);

  if (veryVeryVerbose) {
      cout << "Size of results3 vector is "
           << results3.size() << endl;

      cout << "Size of unordered3 vector is "
           << unordered3.size() << endl;
  }

  if (veryVeryVerbose)
  {
      for (bsl::size_t i = 0; i < results3.size(); ++i) {
              cout << "results3[" << i << "] is " << results3[i] << "\n";
      }

      for (bsl::size_t i = 0; i < unordered3.size(); ++i) {
              cout << "unordered3[" << i << "] is " << unordered3[i] << "\n";
      }
  }
Example 4: Using Topological Sort with Iterators as Input:
Suppose we have a set of inputs that have input relations that conceptually follow the input requirements (of a listing of pairs of nodes) but are not physically stored in a bsl::vector of bsl::pair typed container. Let's suppose the input is in a bsl::list instead. First, we define such a set of inputs:
  bsl::list<bsl::pair<int, int> > relations4;

  relations4.push_back(bsl::make_pair(1, 2));
  relations4.push_back(bsl::make_pair(1, 3));
  relations4.push_back(bsl::make_pair(2, 3));
Now, we apply the topological sort routine on the input:
  bsl::vector<int> results4;
  bsl::vector<int> unordered4;
  typedef bsl::back_insert_iterator<bsl::vector<int> > OutIter;
  bool sorted4 = TopologicalSortUtil::sort(relations4.begin(),
                                           relations4.end(),
                                           OutIter(results4),
                                           OutIter(unordered3));
Finally, we verify that the sort is successful, there are no nodes in the unsorted output (there is no cycle) and the nodes are listed in the proper order in results4:
  assert(sorted4           == true);
  assert(unordered4.size() == 0);
  assert(results4.size()   == 3);

  assert(results4[0] == 1);
  assert(results4[1] == 2);
  assert(results4[2] == 3);
Example 5: Using Topological Sort with Iterators as Output:
Suppose we want our result in a bsl::list instead of a bsl::vector and we do not care about the unsorted elements so we do not want to pay for storing them if they exist. First, we would define a Null Output Iterator that writes to nowhere:
  class NullOutputIterator {
    public:
      typedef void container_type;
      typedef void value_type;
      typedef void difference_type;
      typedef void pointer;
      typedef void reference;
      typedef bsl::output_iterator_tag iterator_category;

      template <class T>
      NullOutputIterator& operator=(const T&)
      {
          return *this;
      }

      NullOutputIterator& operator*()
      {
          return *this;
      }

      NullOutputIterator& operator++()
      {
          return *this;
      }

      NullOutputIterator& operator++(int)
      {
          return *this;
      }
  };
Now, we apply the topological sort routine on the input:
  bsl::list<int> results5;
  typedef bsl::back_insert_iterator<bsl::list<int> > ListOutIter;
  bool sorted5 = TopologicalSortUtil::sort(relations4.begin(),
                                           relations4.end(),
                                           ListOutIter(results5),
                                           NullOutputIterator());
Finally, we verify that the sort is successful, and the 3 nodes are listed in the proper order in the results5 list:
  assert(sorted5           == true);
  assert(results5.size()   == 3);

  assert(results5[0] == 1);
  assert(results5[1] == 2);
  assert(results5[2] == 3);