BDE 4.14.0 Production release
|
Provide a utility to topologically sort a collection of inputs.
This component provides a utility struct
, bdlb::TopologicalSortUtil
, to topologically sort a collection of inputs that describe a directed acyclic graph (also known as DAG). A topological sort is useful for defining the order in which an input should be processed based on certain conditions. As an example, consider two jobs B and C both of which have a dependency on job A, i.e., job A needs to be completed before B or C can be run. Given such a requirement, a topological sort can provide an ordering of the three jobs such that A precedes B and C. Note that there may be multiple orderings that might be correct (e.g.: A, B, then C or A, C then B; both are correct and satisfy the dependencies of A running before the B and C jobs; we don't care whether B precedes C or vice versa).
The dependencies are specified in pairs (U,V) (read as U precedes V), which define the relations between U and V that the sort needs to maintain. The elements in these pairs (e.g., U and V) make up a finite set S. The output of the topological sort is an ordering of all elements of set S, such that all relations specified in the input (as pairs) are satisfied. Note that such an ordering is only possible only if there are no cycles in the input dependencies. For example, if A depends on B and B depends on A, then it is not possible to order A and B satisfying both the relations. The routine sort
defined in this component returns false
if it detects a cycle while trying to sort the input.
Complexity of the topological sort in this component is O(n+m)
where n is the number of input elements in the finite set S and m is the number of relations as specified by the input pairs.
The sort
function is provided in two variants or flavors: simple and iterator-based. The simple flavor is templated on the NODE_TYPE
, and it provides topological sorting for the case where the input is a bsl::vector
of bsl::pair
s, and the outputs are bsl::vector
s. The templated variant is highly customizable (templated) on both the input and the outputs. The input may be any input iterator range that has a bsl::pair
value_type
or a value_type
with a bdlb::TopologicalSortUtilEdgeTraits
specialization. The value_type
is determined using bsl::iterator::traits
. The two outputs are both defined as templated output iterators and they may have different types. So (for example) the iterator based sort
may be used with Null OUTPUT_ITER
to answer the question "does this graph have
cycles?" while not wasting memory in storing the sort results.
Some graph representations may use self-referencing nodes to indicate nodes without any connection – where a self referencing node in the context of this component is a input edge pair like (u, u). The implementation of sort
in this component treats such input entries as cycles and fails sorting.
You may still use this sort
implementation (to process your nodes in proper order) if your data structure contains such entries. To process a graph having self-referencing nodes one may create a filtering iterator on top of the input that removes any self-referential edges from the input. Any filtered nodes could be treated as being sorted first.
This component provides generic (templated) utilities. This section describes the requirements of the type parameters (concepts) of these generic methods.
The input for a topological sort is supplied as pairs (though not necessarily bsl::pair
s) of NODE_TYPE
values. NODE_TYPE
is aliased to NodeType
in TopologicalSortUtilEdgeTraits
.
NODE_TYPE
shall be a value type that supports hashing using the bsl::hash<NODE_TYPE>
functor type and equality comparison using the bsl::equal_to<NODE_TYPE>
functor type.
Notice that we have not provided a customization point for the hash functor type nor for the equality functor type because that would complicate the code greatly (esp. readability would suffer).
The EdgeType
describes an edge as a (conceptual) pair of NODE_TYPE
values and is supplied as the input to the topological sort methods. EdgeType
is the value_type
of the INPUT_ITER
supplied to the sort
functions. Conceptually it represents a pair (U,V) (where U and V are NodeType
s) that means "U precedes V". The EdgeType
supplied to a sort is typically a bsl::pair
, but users may customize this by specializing the TopologicalSortUtilEdgeTraits
type. TopologicalSortUtil
determines the NodeType
and access operations on EdgeType
via the TopologicalSortUtilEdgeTraits
type.
TopologicalSortUtilEdgeTraits
is specialized for bsl::pair<T,T>
, the user needs to (partially or fully) specialize it for other types. See the CustomEdge
specialization in the test driver for an example.
INPUT_ITER
is an input iterator type used by the sort
functions, and its value_type
is EdgeType
.
OUTPUT_ITER
is an output iterator for sort
functions, and its value_type
is NodeType
.
UNORDERED_ITER
is a (second) output iterator for sort
functions, and its value_type
is NodeType
. Note that OUTPUT_ITERATOR
and UNORDERED_ITER
are distinct types allowing the sorted output nodes to be collected in a different way from the unsorted output nodes(e.g., into a different type of container).
This section illustrates intended use of this component.
Suppose we are evaluating formulas for a set of market data fields, where formulas can reference other fields as part of their calculation. As an example, let's say we have a field k_bbgDefinedVwap
that is dependent on k_vwapTurnover
and k_vwapVolume
. These fields in turn are dependent on k_tradeSize
and k_tradePrice
respectively. So essentially the fields that are not dependent on any other field should be calculated first and then the dependent fields. We can use the topological sort utility to provide us the order in which these fields should be calculated.
First, we create the relations showcasing the above mentioned dependencies in the form of pairs (a,b) where b is dependent on a:
Now, we call the topological sort to get a topological order for the fields referenced in the relations:
Finally, we verify that the call to sort
populates the supplied results
with a sequence in sorted order (e.g.. k_tradeSize
, 'k_tradePrice, k_vwapTurnover
, k_vwapVolume
, k_bbgDefinedVwap
) and unsorted
will be empty because the input relationships do not contain a cycle.
Suppose we have a set of inputs that contain a cycle.
First, we define a set of inputs that have a cycle:
Now, we apply the topological sort routine on the input:
Finally, we verify whether the routine recognizes that there is a cycle and returns false, and the 3 nodes comprising the cycle are reported in unordered2
:
Suppose we have a set of inputs that have input relations where predecessor and successor point to the same value, i.e. we have pairs of input like (u,u). This example demonstrates that the sort
function handles such input as a cycle. (See Example 6 for a way of processing such nodes and avoiding sort
reporting a cycle.) First, we define such a set of inputs:
Now, we apply the topological sort routine on the input:
Finally, we verify that the self relations causes the cycle:
Suppose we have a set of inputs that have input relations that conceptually follow the input requirements (of a listing of pairs of nodes) but are not physically stored in a bsl::vector
of bsl::pair
typed container. Let's suppose the input is in a bsl::list
instead. First, we define such a set of inputs:
Now, we apply the topological sort routine on the input:
Finally, we verify that the sort is successful, there are no nodes in the unsorted
output (there is no cycle) and the nodes are listed in the proper order in results4
:
Suppose we want our result in a bsl::list
instead of a bsl::vector
and we do not care about the unsorted elements so we do not want to pay for storing them if they exist. First, we would define a Null Output Iterator that writes to nowhere:
Now, we apply the topological sort routine on the input:
Finally, we verify that the sort is successful, and the 3 nodes are listed in the proper order in the results5
list: