Mar 31, 2021 (data collected Aug 9, 2019)
Concurrent Queue Evaluation¶
Executive Summary¶
Initially, BDE supplied two concurrent queue implementations: bdlcc::FixedQueue and bdlcc::Queue. As the name implies, bdlcc::FixedQueue has a fixed sized (specified at construction). bdlcc::Queue grows as needed. Both implementations support multiple producers and multiple consumers. By adding constraints upon the number of producers and consumers, and applying lessons learned since the creation of the initial two queues, queues with a cleaner interface and better performance - as measured by a simple micro-benchmark - were created (note that “bounded” is a synonym for “fixed” with regard to the queue’s capacity):
While a micro-benchmark is never as good an evaluation as a test using production code with real data, the micro-benchmark does demonstrate there exists a usage scenario where these new queue implementations outperform the old implementations. As such, these new queues are valuable additions to the BDE libraries. Furthermore, using the most constrained of the new queues applicable to your problem (e.g., using bdlcc::SingleProducerQueue for a problem with one producer that must never be blocked) is always recommended (if you find a usage scenario where the appropriate new queue significantly underperforms another queue, please contact BDE).
Benchmark¶
The results presented are based upon throughput measurements, as per bslmt::ThroughputBenchmark, for a simple micro-benchmark where there are a specified number of producer threads that push an element and then perform a specified amount of busy-waiting, and a specified number of consumer threads that pop an element and then perform a specified amount of busy-waiting. Each reported result is the median result of 101 trials of the micro-benchmark, with each trial having a ten millisecond execution time. The parameters of a scenario are:
Abbreviation |
Description |
Values |
---|---|---|
PUSH BUSY |
number of (arbitrary) work units a producer performs after a push |
20, 100, 500, 1000, 5000 |
POP BUSY |
number of (arbitrary) work units a consumer performs after a pop |
20, 100, 500, 1000, 5000 |
#PUSH |
number of producer threads |
1, 2, 4, 8, 16 |
#POP |
number of consumer threads |
1, 2, 4, 8, 16 |
INIT CAP |
queue initial capacity |
64, 512, 4096 |
OS |
platform |
LINUX, WIN |
The values for PUSH BUSY and POP BUSY are measured in an arbitrary unit of work, where each unit of work consists of one 64-bit integer multiplication and a modulus.
Selecting a Result Table¶
A table is presented to guide the selection of an implementation in a specific situation. The most common situations are formed by the cross product of three constraints: single or multiple producer threads, single or multiple consumer threads, and whether the queue’s size is unbounded, bounded, or either.
In general, with regard to single versus multiple threads, selecting the most constrained queue (e.g., single over multiple producer threads) will result in better performance. However, the constraint must then be met by the usage of the queue. For instance, to use a single producer queue, at most one thread may be using the queue at any given time (e.g., there is only one thread that will use the push methods).
Frequently there are constraints on the size of the queue. For instance, the queue must be unbounded to ensure push operations never block. Or, the queue has a maximum amount of memory that may be used and must therefore be a bounded implementation. Unfortunately, the relative performance of bounded versus unbounded queues can be complicated so choosing a queue in the absence of a constraint may be difficult; the result tables where the queue size is “either” will hopefully help.
Result Table Format¶
The result tables are information dense and hence provide succinct support for the recommendation of a particular implementation.
A table is comprised of two sections. The title section has three lines: what is being presented, the row and column parameters, and the data filters. The results section is a grid of scenario outcomes represented as a color-coded numeric value.
The first title line details what is being presented, and is always “% Throughput rel. Best”. The presented numeric results are the throughput percentage of the indicate algorithm relative to the best performing implementation. The implementation abbreviations used:
Abbreviation |
Implementation |
---|---|
F |
|
Q |
|
SP |
|
SC |
|
SPSCB |
|
B |
The second title line is formatted as “Row Parameter vs. Column Parameter”. For instance, “PUSH BUSY vs. POP BUSY” indicates the row header values are the PUSH BUSY parameter values, and the column header values are the POP BUSY parameter values. The third title line shows the filters for the presented data. For instance, “INIT CAP=64, OS=LINUX” indicates the results were for scenarios where the queue had an initial capacity of 64 elements, and the platform was Linux.
Note that, typically, a number of parameters are not defined within the second and third title lines. The value used for each implementation within a cell is the best result across the unrestricted parameters. For instance, if the title lines are “% Throughput rel. Best”, “PUSH BUSY vs. POP BUSY”, and “INIT CAP=64, OS=LINUX”, the result used for each algorithm in each cell is the one with highest throughput of all scenarios for #PUSH and #POP, divided by the result for the best value and multiplied by 100.
Finally, the result cells are color-coded to indicate the best performing implementation.
Sample Result Table Interpretation¶
The following table shows the percent throughput relative the best implementation of three queue implementations on the Linux platform for queues having an initial capacity of 64 elements. The three implementations are bdlcc::FixedQueue, bdlcc::BoundedQueue, and bdlcc::Queue. For the PUSH BUSY value 20 (20 units of work are done after each push) and the POP BUSY value 100 (100 units of work are done after each pop), bdlcc::BoundedQueue was the most performant (as indicated by the 100%), the throughput of bdlcc::FixedQueue was 80% that of bdlcc::BoundedQueue, and the throughput of bdlcc::Queue was 50% that of bdlcc::BoundedQueue. This cell is colored to indicate bdlcc::BoundedQueue was the most performant.
|
Multi-Producer, Multi-Consumer, Unbounded¶
Currently, the only available container for this set of constraints is bdlcc::Queue. As such, the result table is boring.
|
|
|
Multi-Producer, Multi-Consumer, Bounded¶
The containers available for this set of constraints are bdlcc::FixedQueue and bdlcc::BoundedQueue. bdlcc::BoundedQueue generally outperforms bdlcc::FixedQueue.
|
|
|
Multi-Producer, Multi-Consumer, Either¶
The containers available for this set of constraints are bdlcc::Queue, bdlcc::FixedQueue, and bdlcc::BoundedQueue. bdlcc::BoundedQueue generally outperforms the other implementations.
|
|
|
Single-Producer, Multi-Consumer, Unbounded¶
The containers available for this set of constraints are bdlcc::Queue and bdlcc::SingleProducerQueue. bdlcc::SingleProducerQueue either outperforms bdlcc::Queue or is a very close competitor (i.e., losing by only 1% of throughput).
|
|
|
Single-Producer, Multi-Consumer, Bounded¶
The containers available for this set of constraints are bdlcc::FixedQueue and bdlcc::BoundedQueue. The results indicate that bdlcc::FixedQueue outperforms bdlcc::BoundedQueue or is a close competitor. Arguably, what the results really show is the need for a special implementation for this constraint set.
|
|
|
Single-Producer, Multi-Consumer, Either¶
The containers available for this set of constraints are bdlcc::Queue, bdlcc::SingleProducerQueue, bdlcc::FixedQueue and bdlcc::BoundedQueue. bdlcc::SingleProducerQueue is generally the best choice at the moment. Hopefully, a single-producer, multi-consumer, bounded implementation will simplify the results in the future.
|
|
|
Multi-Producer, Single-Consumer, Unbounded¶
The containers available for this set of constraints are bdlcc::Queue and bdlcc::SingleConsumerQueue. bdlcc::SingleConsumerQueue generally dominates bdlcc::Queue. Expected improvements to bdlcc::SingleConsumerQueue will hopefully make bdlcc::SingleConsumerQueue the best candidate in all of these scenarios.
|
|
|
Multi-Producer, Single-Consumer, Bounded¶
The containers available for this set of constraints are bdlcc::FixedQueue and bdlcc::BoundedQueue. The mixed results imply the need for a special implementation for this constraint set.
|
|
|
Multi-Producer, Single-Consumer, Either¶
The containers available for this set of constraints are bdlcc::Queue, bdlcc::SingleConsumerQueue, bdlcc::FixedQueue and bdlcc::BoundedQueue. bdlcc::SingleConsumerQueue is generally the best choice at the moment. Hopefully, a multi-producer, single-consumer, bounded implementation will simplify the results in the future.
|
|
|
Single-Producer, Single-Consumer, Unbounded¶
The containers available for this set of constraints are bdlcc::Queue, bdlcc::SingleProducerQueue, and bdlcc::SingleConsumerQueue. bdlcc::SingleProducerQueue is generally the best choice at the moment. Hopefully, a single-producer, single-consumer, unbounded implementation will simplify the results in the future.
|
|
|
Single-Producer, Single-Consumer, Bounded¶
The containers available for this set of constraints are bdlcc::FixedQueue, bdlcc::BoundedQueue, and bdlcc::SingleProducerSingleConsumerBoundedQueue. The results indicate that bdlcc::SingleProducerSingleConsumerBoundedQueue outperforms the other implementations.
|
|
|
Single-Producer, Single-Consumer, Either¶
The containers available for this set of constraints are bdlcc::Queue, bdlcc::SingleProducerQueue, bdlcc::SingleConsumerQueue, bdlcc::FixedQueue, bdlcc::BoundedQueue, and bdlcc::SingleProducerSingleConsumerBoundedQueue. The results indicate that the selection of an implementation is very sensitive to the scenario.
|
|
|
Conclusion¶
The selection of a queue should always be based upon benchmarking the specific problem; the results presented here are for an unrelated micro-benchmark. As such, the results presented here should be viewed as a starting point to help with the selection process. bdlcc::BoundedQueue is a general replacement candidate for bdlcc::FixedQueue and is a reasonable selection for most problems when the problems allows for a bounded queue. When the problem is bounded and has a single producer and a single consumer, bdlcc::SingleProducerSingleConsumerBoundedQueue should always be a candidate implementation for the problem. For unbounded problems that can be solved using a single producer, bdlcc::SingleProducerQueue should always be a candidate implementation for the problem. Finally, for unbounded problems when a single producer is not a valid constraint, but a single consumer is a valid constraint, consider bdlcc::SingleConsumerQueue.
Extensive Results¶
An extensive set of result tables is provided in Extensive Results.