Provide a utility to topologically sort a collection of inputs.
More...
Namespaces |
namespace | bdlb |
Detailed Description
- Outline
-
-
- Purpose:
- Provide a utility to topologically sort a collection of inputs.
-
- Classes:
-
- See also:
-
- Description:
- This component provides a utility
struct
, bdlb::TopologicalSortUtil
, to topologically sort a collection of inputs that describe a directed acyclic graph (also known as DAG). A topological sort is useful for defining the order in which an input should be processed based on certain conditions. As an example, consider two jobs B and C both of which have a dependency on job A, i.e., job A needs to be completed before B or C can be run. Given such a requirement, a topological sort can provide an ordering of the three jobs such that A precedes B and C. Note that there may be multiple orderings that might be correct (e.g.: A, B, then C or A, C then B; both are correct and satisfy the dependencies of A running before the B and C jobs; we don't care whether B precedes C or vice versa).
- The dependencies are specified in pairs (U,V) (read as U precedes V), which define the relations between U and V that the sort needs to maintain. The elements in these pairs (e.g., U and V) make up a finite set S. The output of the topological sort is an ordering of all elements of set S, such that all relations specified in the input (as pairs) are satisfied. Note that such an ordering is only possible only if there are no cycles in the input dependencies. For example, if A depends on B and B depends on A, then it is not possible to order A and B satisfying both the relations. The routine
sort
defined in this component returns false
if it detects a cycle while trying to sort the input.
- Complexity of the topological sort in this component is
O(n+m)
where n is the number of input elements in the finite set S and m is the number of relations as specified by the input pairs.
- The
sort
function is provided in two variants or flavors: simple and iterator-based. The simple flavor is templated on the NODE_TYPE
, and it provides topological sorting for the case where the input is a bsl::vector
of bsl::pair
s, and the outputs are bsl::vector
s. The templated variant is highly customizable (templated) on both the input and the outputs. The input may be any input iterator range that has a bsl::pair
value_type
or a value_type
with a bdlb::TopologicalSortUtilEdgeTraits
specialization. The value_type
is determined using bsl::iterator::traits
. The two outputs are both defined as templated output iterators and they may have different types. So (for example) the iterator based sort
may be used with Null OUTPUT_ITER
to answer the question "does this graph have
cycles?" while not wasting memory in storing the sort results.
-
- Self-referencing Nodes:
- Some graph representations may use self-referencing nodes to indicate nodes without any connection -- where a self referencing node in the context of this component is a input edge pair like (u, u). The implementation of
sort
in this component treats such input entries as cycles and fails sorting.
- You may still use this
sort
implementation (to process your nodes in proper order) if your data structure contains such entries. To process a graph having self-referencing nodes one may create a filtering iterator on top of the input that removes any self-referential edges from the input. Any filtered nodes could be treated as being sorted first.
-
- Concepts:
- This component provides generic (templated) utilities. This section describes the requirements of the type parameters (concepts) of these generic methods.
-
- NODE_TYPE Concept:
- The input for a topological sort is supplied as pairs (though not necessarily
bsl::pair
s) of NODE_TYPE
values. NODE_TYPE
is aliased to NodeType
in TopologicalSortUtilEdgeTraits
.
NODE_TYPE
shall be a value type that supports hashing using the bsl::hash<NODE_TYPE>
functor type and equality comparison using the bsl::equal_to<NODE_TYPE>
functor type.
- Notice that we have not provided a customization point for the hash functor type nor for the equality functor type because that would complicate the code greatly (esp. readability would suffer).
-
- EdgeType Concept:
- The
EdgeType
describes an edge as a (conceptual) pair of NODE_TYPE
values and is supplied as the input to the topological sort methods. EdgeType
is the value_type
of the INPUT_ITER
supplied to the sort
functions. Conceptually it represents a pair (U,V) (where U and V are NodeType
s) that means "U precedes V". The EdgeType
supplied to a sort is typically a bsl::pair
, but users may customize this by specializing the TopologicalSortUtilEdgeTraits
type. TopologicalSortUtil
determines the NodeType
and access operations on EdgeType
via the TopologicalSortUtilEdgeTraits
type.
TopologicalSortUtilEdgeTraits
is specialized for bsl::pair<T,T>
, the user needs to (partially or fully) specialize it for other types. See the CustomEdge
specialization in the test driver for an example.
-
- INPUT_ITER Concept:
INPUT_ITER
is an input iterator type used by the sort
functions, and its value_type
is EdgeType
.
-
- OUTPUT_ITER Concept:
OUTPUT_ITER
is an output iterator for sort
functions, and its value_type
is NodeType
.
-
- UNORDERED_ITER Concept:
UNORDERED_ITER
is a (second) output iterator for sort
functions, and its value_type
is NodeType
. Note that OUTPUT_ITERATOR
and UNORDERED_ITER
are distinct types allowing the sorted output nodes to be collected in a different way from the unsorted output nodes(e.g., into a different type of container).
-
- USAGE:
- This section illustrates intended use of this component.
-
- Example 1: Using Topological Sort for Calculating Formulas:
- Suppose we are evaluating formulas for a set of market data fields, where formulas can reference other fields as part of their calculation. As an example, let's say we have a field
k_bbgDefinedVwap
that is dependent on k_vwapTurnover
and k_vwapVolume
. These fields in turn are dependent on k_tradeSize
and k_tradePrice
respectively. So essentially the fields that are not dependent on any other field should be calculated first and then the dependent fields. We can use the topological sort utility to provide us the order in which these fields should be calculated.
- First, we create the relations showcasing the above mentioned dependencies in the form of pairs (a,b) where b is dependent on a:
enum FieldIds {
k_bbgDefinedVwap = 0,
k_vwapTurnover = 1,
k_vwapVolume = 2,
k_tradeSize = 3,
k_tradePrice = 4
};
bsl::vector<bsl::pair<int, int> > relations;
relations.push_back(bsl::make_pair(static_cast<int>(k_vwapTurnover),
static_cast<int>(k_bbgDefinedVwap)));
relations.push_back(bsl::make_pair(static_cast<int>(k_vwapVolume),
static_cast<int>(k_bbgDefinedVwap)));
relations.push_back(bsl::make_pair(static_cast<int>(k_tradeSize),
static_cast<int>(k_vwapVolume)));
relations.push_back(bsl::make_pair(static_cast<int>(k_tradeSize),
static_cast<int>(k_vwapTurnover)));
relations.push_back(bsl::make_pair(static_cast<int>(k_tradePrice),
static_cast<int>(k_vwapTurnover)));
Now, we call the topological sort to get a topological order for the fields referenced in the relations: Finally, we verify that the call to sort
populates the supplied results
with a sequence in sorted order (e.g.. k_tradeSize
, 'k_tradePrice, k_vwapTurnover
, k_vwapVolume
, k_bbgDefinedVwap
) and unsorted
will be empty because the input relationships do not contain a cycle. bool calculated[5] = { 0 };
assert(sorted == true);
assert(unsorted.empty());
for (bsl::vector<int>::const_iterator iter = results.begin(),
end = results.end();
iter != end; ++iter) {
switch (*iter) {
case k_bbgDefinedVwap: {
assert(calculated[k_vwapTurnover] == true);
assert(calculated[k_vwapVolume] == true);
assert(calculated[k_bbgDefinedVwap] == false);
calculated[k_bbgDefinedVwap] = true;
} break;
case k_vwapTurnover: {
assert(calculated[k_tradeSize] == true);
assert(calculated[k_tradePrice] == true);
assert(calculated[k_vwapTurnover] == false);
calculated[k_vwapTurnover] = true;
} break;
case k_vwapVolume: {
assert(calculated[k_tradeSize] == true);
assert(calculated[k_vwapVolume] == false);
calculated[k_vwapVolume] = true;
} break;
case k_tradeSize: {
assert(calculated[k_vwapVolume] == false);
assert(calculated[k_vwapTurnover] == false);
assert(calculated[k_tradeSize] == false);
calculated[k_tradeSize] = true;
} break;
case k_tradePrice: {
assert(calculated[k_vwapTurnover] == false);
assert(calculated[k_tradePrice] == false);
calculated[k_tradePrice] = true;
} break;
default:
assert(false);
break;
};
}
for (int i = 0; i < 5; ++i) {
assert(calculated[i] == true);
}
-
- Example 2: Using Topological Sort with Cycles in Input:
- Suppose we have a set of inputs that contain a cycle.
- First, we define a set of inputs that have a cycle:
enum FieldIds2 {
k_FIELD1 = 0,
k_FIELD2 = 1,
k_FIELD3 = 2
};
bsl::vector<bsl::pair<int, int> > relations2;
relations2.push_back(bsl::make_pair(static_cast<int>(k_FIELD2),
static_cast<int>(k_FIELD1)));
relations2.push_back(bsl::make_pair(static_cast<int>(k_FIELD3),
static_cast<int>(k_FIELD2)));
relations2.push_back(bsl::make_pair(static_cast<int>(k_FIELD1),
static_cast<int>(k_FIELD3)));
Now, we apply the topological sort routine on the input: Finally, we verify whether the routine recognizes that there is a cycle and returns false, and the 3 nodes comprising the cycle are reported in unordered2
: assert(sorted2 == false);
assert(unordered2.size() == 3);
-
- Example 3: Using Topological Sort with Self Relations:
- Suppose we have a set of inputs that have input relations where predecessor and successor point to the same value, i.e. we have pairs of input like (u,u). This example demonstrates that the
sort
function handles such input as a cycle. (See Example 6 for a way of processing such nodes and avoiding sort
reporting a cycle.) First, we define such a set of inputs: enum FieldIds3 {
k_FIELD4 = 3,
k_FIELD5 = 4,
k_FIELD6 = 5
};
bsl::vector<bsl::pair<int, int> > relations3;
relations3.push_back(bsl::make_pair(static_cast<int>(k_FIELD4),
static_cast<int>(k_FIELD6)));
relations3.push_back(bsl::make_pair(static_cast<int>(k_FIELD5),
static_cast<int>(k_FIELD4)));
relations3.push_back(bsl::make_pair(static_cast<int>(k_FIELD4),
static_cast<int>(k_FIELD4)));
Now, we apply the topological sort routine on the input: Finally, we verify that the self relations causes the cycle: assert(sorted3 == false);
assert(results3.size() == 1);
assert(unordered3.size() == 2);
if (veryVeryVerbose) {
cout << "Size of results3 vector is "
<< results3.size() << endl;
cout << "Size of unordered3 vector is "
<< unordered3.size() << endl;
}
if (veryVeryVerbose)
{
for (bsl::size_t i = 0; i < results3.size(); ++i) {
cout << "results3[" << i << "] is " << results3[i] << "\n";
}
for (bsl::size_t i = 0; i < unordered3.size(); ++i) {
cout << "unordered3[" << i << "] is " << unordered3[i] << "\n";
}
}
-
- Example 4: Using Topological Sort with Iterators as Input:
- Suppose we have a set of inputs that have input relations that conceptually follow the input requirements (of a listing of pairs of nodes) but are not physically stored in a
bsl::vector
of bsl::pair
typed container. Let's suppose the input is in a bsl::list
instead. First, we define such a set of inputs: Now, we apply the topological sort routine on the input: bsl::vector<int> results4;
bsl::vector<int> unordered4;
typedef bsl::back_insert_iterator<bsl::vector<int> > OutIter;
bool sorted4 = TopologicalSortUtil::sort(relations4.begin(),
relations4.end(),
OutIter(results4),
OutIter(unordered3));
Finally, we verify that the sort is successful, there are no nodes in the unsorted
output (there is no cycle) and the nodes are listed in the proper order in results4
: assert(sorted4 == true);
assert(unordered4.size() == 0);
assert(results4.size() == 3);
assert(results4[0] == 1);
assert(results4[1] == 2);
assert(results4[2] == 3);
-
- Example 5: Using Topological Sort with Iterators as Output:
- Suppose we want our result in a
bsl::list
instead of a bsl::vector
and we do not care about the unsorted elements so we do not want to pay for storing them if they exist. First, we would define a Null Output Iterator that writes to nowhere: class NullOutputIterator {
public:
typedef void container_type;
typedef void value_type;
typedef void difference_type;
typedef void pointer;
typedef void reference;
typedef bsl::output_iterator_tag iterator_category;
template <class T>
NullOutputIterator& operator=(const T&)
{
return *this;
}
NullOutputIterator& operator*()
{
return *this;
}
NullOutputIterator& operator++()
{
return *this;
}
NullOutputIterator& operator++(int)
{
return *this;
}
};
Now, we apply the topological sort routine on the input: bsl::list<int> results5;
typedef bsl::back_insert_iterator<bsl::list<int> > ListOutIter;
bool sorted5 = TopologicalSortUtil::sort(relations4.begin(),
relations4.end(),
ListOutIter(results5),
NullOutputIterator());
Finally, we verify that the sort is successful, and the 3 nodes are listed in the proper order in the results5
list: assert(sorted5 == true);
assert(results5.size() == 3);
assert(results5[0] == 1);
assert(results5[1] == 2);
assert(results5[2] == 3);