BDE 4.14.0 Production release
Macros
#define BSLS_PERFORMANCEHINT_PREDICT_LIKELY(expr) (expr)
#define BSLS_PERFORMANCEHINT_PREDICT_UNLIKELY(expr) (expr)
#define BSLS_PERFORMANCEHINT_PREDICT_EXPECT(expr, value) (expr)
#define BSLS_PERFORMANCEHINT_ATTRIBUTE_COLD
#define BSLS_PERFORMANCEHINT_UNLIKELY_HINT
Typedefs
typedef bsls::PerformanceHint bsls_PerformanceHint
    This alias is defined for backward compatibility.
Provide performance hints for code optimization.
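BSLS_PERFORMANCEHINT_PREDICT_LIKELY(X): X probably evaluates to non-zero
BSLS_PERFORMANCEHINT_PREDICT_UNLIKELY(X): X probably evaluates to zero
BSLS_PERFORMANCEHINT_PREDICT_EXPECT(X, Y): X probably evaluates to Y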
This component provides performance hints for the compiler or hardware. Two types of hints are currently supported: branch prediction and data cache prefetching.
The three macros provided, BSLS_PERFORMANCEHINT_PREDICT_LIKELY, BSLS_PERFORMANCEHINT_PREDICT_UNLIKELY, and BSLS_PERFORMANCEHINT_PREDICT_EXPECT, can be used to optimize compiler-generated code for branch prediction. When given the hint in optimized mode (i.e., with BDE_BUILD_TARGET_OPT defined), the compiler will rearrange the assembly instructions it generates to minimize the number of jumps needed.
The following describes the macros provided by this component:
Please use the macros provided in this component with caution. Always profile your code to get an idea of actual usage before attempting to optimize with these macros. Furthermore, these macros are merely hints to the compiler; whether they have a visible effect on performance is not guaranteed. Note that a similar optimization can be performed with profile-based compilation: when compiled with the proper options, the compiler can collect usage information about the code, and that information can then be passed back to recompile the code in a more optimized form. Please refer to the compiler manual for more information.
There is a bug in gcc 4.2, 4.3, and 4.4 such that when a branch prediction macro wraps multiple conditions (for example, a && b), the generated code might not be properly optimized.
The work-around is simply to split the conditions so that each one is wrapped in its own macro.
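The affected form and the work-around might look like the following sketch. The function names bothSetCombined and bothSetSplit are hypothetical, and the fallback definition is only there so the snippet compiles without BDE; it mirrors the (expr) definition shown in the macro list above.

```cpp
// Fallback so this sketch compiles without BDE.  With BDE available,
// include <bsls_performancehint.h> before this point instead.
#ifndef BSLS_PERFORMANCEHINT_PREDICT_LIKELY
#define BSLS_PERFORMANCEHINT_PREDICT_LIKELY(expr) (expr)
#endif

// Form affected by the gcc 4.2 - 4.4 issue: a single hint wraps a compound
// condition.  (Hypothetical function name.)
bool bothSetCombined(bool a, bool b)
{
    if (BSLS_PERFORMANCEHINT_PREDICT_LIKELY(a && b)) {
        return true;
    }
    return false;
}

// Work-around: each condition gets its own hint.  (Hypothetical function
// name.)  The result is the same; only the placement of the hints differs.
bool bothSetSplit(bool a, bool b)
{
    if (BSLS_PERFORMANCEHINT_PREDICT_LIKELY(a)
     && BSLS_PERFORMANCEHINT_PREDICT_LIKELY(b)) {
        return true;
    }
    return false;
}
```

Both functions return the same value for every input, so the split form is a drop-in replacement.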
This applies to all of the "likely", "unlikely", and "expect" macros defined in this component. Note that a bug report has been filed:
The two functions provided in the bsls::PerformanceHint struct are prefetchForReading and prefetchForWriting. Use of these functions will cause the compiler to generate prefetch instructions that load one cache line's worth of data at the specified address into the cache, to minimize processor stalls.
Warning
These functions must be used with caution. Inappropriate use of these functions degrades performance. Note that there should be sufficient time for the prefetch instruction to finish before the specified address is accessed; otherwise, prefetching will be pointless. A profiler should be used to understand the program's behavior before attempting to optimize with these functions.
The macro BSLS_PERFORMANCEHINT_OPTIMIZATION_FENCE prevents some compiler optimizations, particularly compiler instruction reordering. This fence does not map to a CPU instruction and has no impact on processor instruction reordering, and therefore should not be used to synchronize memory between threads. The fence may be useful in unusual contexts, like performing benchmarks, or working around bugs identified in the compiler's optimizer.
Warning
This macro should be used with caution. The macro will generally decrease the performance of code on which it is applied, and is not implemented on all platforms.
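As an illustration of the benchmarking use case, here is a minimal sketch of a compiler-only fence placed around a timed region. SKETCH_OPTIMIZATION_FENCE and timeDoubling are hypothetical names; the empty asm statement with a "memory" clobber is a common GCC/Clang idiom for this purpose, and it is an assumption, not a confirmed fact, that the BDE macro behaves the same way.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical stand-in for BSLS_PERFORMANCEHINT_OPTIMIZATION_FENCE.  On GCC
// and Clang the empty asm with a "memory" clobber emits no CPU instruction
// but stops the compiler from moving memory accesses across it.  Elsewhere
// this sketch degrades to a no-op.
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_OPTIMIZATION_FENCE() __asm__ __volatile__("" : : : "memory")
#else
#define SKETCH_OPTIMIZATION_FENCE() ((void) 0)
#endif

// Doubles every element of 'data' and returns the elapsed wall time in
// seconds.  The fences ask the compiler to keep the stores to 'data'
// between the two clock reads.
double timeDoubling(int *data, std::size_t size)
{
    typedef std::chrono::steady_clock Clock;

    const Clock::time_point start = Clock::now();
    SKETCH_OPTIMIZATION_FENCE();

    for (std::size_t i = 0; i < size; ++i) {
        data[i] *= 2;
    }

    SKETCH_OPTIMIZATION_FENCE();
    const Clock::time_point stop = Clock::now();

    return std::chrono::duration<double>(stop - start).count();
}
```

Because the fence constrains only the compiler, it gives no ordering guarantee between threads, consistent with the description above.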
The following series of examples illustrates use of the macros and functions provided by this component.
The following demonstrates the use of BSLS_PERFORMANCEHINT_PREDICT_LIKELY and BSLS_PERFORMANCEHINT_PREDICT_UNLIKELY to generate more efficient assembly instructions. Note the use of BSLS_PERFORMANCEHINT_UNLIKELY_HINT inside the branch of the if statement that is unlikely to be taken, for maximum portability.
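A small program of this kind might look like the sketch below. The names foo, bar, runLoop, and the two counters are hypothetical, and the fallback definitions exist only so that the snippet compiles without BDE; they mirror the definitions shown in the macro list above.

```cpp
#include <cstdlib>

// Fallbacks so this sketch compiles without BDE.  With BDE available,
// include <bsls_performancehint.h> before this point instead.
#ifndef BSLS_PERFORMANCEHINT_PREDICT_LIKELY
#define BSLS_PERFORMANCEHINT_PREDICT_LIKELY(expr) (expr)
#endif
#ifndef BSLS_PERFORMANCEHINT_UNLIKELY_HINT
#define BSLS_PERFORMANCEHINT_UNLIKELY_HINT
#endif

// Hypothetical counters that make the two paths observable.
int g_fooCalls = 0;
int g_barCalls = 0;

void foo() { ++g_fooCalls; }   // common path
void bar() { ++g_barCalls; }   // rare path

// Runs 'iterations' rounds in which roughly nine of every ten values take
// the first branch, so hinting that branch as likely is justified.
void runLoop(int iterations)
{
    for (int i = 0; i < iterations; ++i) {
        const int y = std::rand() % 10;

        if (BSLS_PERFORMANCEHINT_PREDICT_LIKELY(8 != y)) {
            foo();
        }
        else {
            // The hint sits in the branch that is rarely taken.
            BSLS_PERFORMANCEHINT_UNLIKELY_HINT;
            bar();
        }
    }
}
```

Whether the hint changes the generated code depends on the compiler and the optimization level, so the effect should be confirmed by inspecting the assembly or by timing.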
An excerpt of the assembly code generated using xlC Version 10 on AIX from this small program is:
Now, if BSLS_PERFORMANCEHINT_PREDICT_LIKELY is changed to BSLS_PERFORMANCEHINT_PREDICT_UNLIKELY, and the BSLS_PERFORMANCEHINT_UNLIKELY_HINT is moved to the first branch, the following assembly code will be generated:
A timing analysis shows that effective use of branch prediction can have a material effect on code efficiency:
BSLS_PERFORMANCEHINT_PREDICT_EXPECT is essentially the same as the __builtin_expect(expr, value) built-in that is provided by some compilers. This macro allows the user to define more complex hints to the compiler, such as the optimization of switch statements. For example, given:
the following is incorrect usage of BSLS_PERFORMANCEHINT_PREDICT_EXPECT, since the probability of getting a 3 is the same as that of each of the other possibilities (0, 1, 2):
However, this is sufficient to illustrate the intent of this macro.
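The switch usage described above can be sketched as follows. The function classify and its return values are hypothetical, and the argument is assumed to be uniformly distributed over 0 through 3, as the text implies. The fallback definition is only there so the snippet compiles without BDE.

```cpp
// Fallback so this sketch compiles without BDE.  With BDE available,
// include <bsls_performancehint.h> before this point instead.
#ifndef BSLS_PERFORMANCEHINT_PREDICT_EXPECT
#define BSLS_PERFORMANCEHINT_PREDICT_EXPECT(expr, value) (expr)
#endif

// Hypothetical function: tells the compiler that 'x' is expected to be 3.
// With a uniformly distributed 'x' this expectation is not justified; the
// sketch only shows where the macro goes in a 'switch' statement.
int classify(int x)
{
    switch (BSLS_PERFORMANCEHINT_PREDICT_EXPECT(x, 3)) {
      case 1:  return 10;
      case 2:  return 20;
      case 3:  return 30;   // the case hinted as expected
      default: return 0;
    }
}
```

The hint never changes which case is selected; it can only influence how the compiler lays out the generated code.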
The following demonstrates use of prefetchForReading and prefetchForWriting to prefetch data cache lines:
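A baseline without any prefetching might look like this minimal sketch. The signatures of add and timeAdd are assumptions, and std::chrono is used here as a stand-in for bsls::Stopwatch so that the snippet compiles without BDE.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Adds each element of 'arrayB' to the corresponding element of 'arrayA'.
// 'arrayA' is written and 'arrayB' is only read.
void add(int *arrayA, const int *arrayB, std::size_t size)
{
    for (std::size_t i = 0; i < size; ++i) {
        arrayA[i] += arrayB[i];
    }
}

// Hypothetical helper: runs 'add' 'rounds' times and returns the elapsed
// wall time in seconds.  Assumes 'a' and 'b' have the same length.
double timeAdd(std::vector<int>& a, const std::vector<int>& b, int rounds)
{
    typedef std::chrono::steady_clock Clock;

    const Clock::time_point start = Clock::now();
    for (int r = 0; r < rounds; ++r) {
        add(a.data(), b.data(), a.size());
    }
    const Clock::time_point stop = Clock::now();

    return std::chrono::duration<double>(stop - start).count();
}
```

For a meaningful measurement the arrays need to be much larger than the processor caches; small arrays stay cached and hide any effect of prefetching.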
The above code simply adds two arrays together multiple times. Using bsls::Stopwatch, we recorded the running time and printed it to stdout:
Now, we can observe that in the add function, arrayA and arrayB are accessed sequentially for the majority of the program. arrayA is used for writing and arrayB is used for reading. Making use of prefetch, we add calls to prefetchForReading and prefetchForWriting:
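A prefetching variant might look like the sketch below. The two wrapper functions are hypothetical stand-ins for bsls::PerformanceHint::prefetchForReading and prefetchForWriting, built on the GCC/Clang __builtin_prefetch built-in so that the snippet compiles without BDE. The chunk size of 8 elements and the bounds check are assumptions of this sketch.

```cpp
#include <cstddef>

// Hypothetical stand-ins for the bsls::PerformanceHint functions.  On other
// compilers they do nothing.
inline void prefetchForReadingSketch(const void *address)
{
#if defined(__GNUC__) || defined(__clang__)
    __builtin_prefetch(address, 0);   // second argument 0: expect a read
#else
    (void) address;
#endif
}

inline void prefetchForWritingSketch(void *address)
{
#if defined(__GNUC__) || defined(__clang__)
    __builtin_prefetch(address, 1);   // second argument 1: expect a write
#else
    (void) address;
#endif
}

const std::size_t k_PREFETCH_DISTANCE = 16;  // elements ahead, as in the text
const std::size_t k_CHUNK             = 8;   // assumed elements per prefetch

// Adds 'arrayB' into 'arrayA', requesting data 'k_PREFETCH_DISTANCE'
// elements ahead of the current position once per chunk.
void addWithPrefetch(int *arrayA, const int *arrayB, std::size_t size)
{
    for (std::size_t i = 0; i < size; i += k_CHUNK) {
        // Only form addresses that lie inside the arrays.
        if (i + k_PREFETCH_DISTANCE < size) {
            prefetchForWritingSketch(arrayA + i + k_PREFETCH_DISTANCE);
            prefetchForReadingSketch(arrayB + i + k_PREFETCH_DISTANCE);
        }

        const std::size_t end = (i + k_CHUNK < size) ? i + k_CHUNK : size;
        for (std::size_t j = i; j < end; ++j) {
            arrayA[j] += arrayB[j];
        }
    }
}
```

The prefetch calls do not change the result of the addition; whether they improve the running time has to be measured on the target machine.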
Adding the prefetch improves the program's efficiency:
Note that we prefetch the address 16 * sizeof(int) bytes away from arrayA. This distance gives the prefetch instruction sufficient time to finish before the data is actually accessed. To see the difference, suppose we change + 16 to + 4:
And we get less of an improvement in speed. Similarly, if we prefetch too far away from the data use, the data might be removed from the cache before it is looked at and the prefetch is wasted.
#define BSLS_PERFORMANCEHINT_ATTRIBUTE_COLD
#define BSLS_PERFORMANCEHINT_PREDICT_EXPECT(expr, value) (expr)
#define BSLS_PERFORMANCEHINT_PREDICT_LIKELY(expr) (expr)
#define BSLS_PERFORMANCEHINT_PREDICT_UNLIKELY(expr) (expr)
#define BSLS_PERFORMANCEHINT_UNLIKELY_HINT