BDE 4.14.0 Production release
Loading...
Searching...
No Matches
bsls_performancehint

Macros

#define BSLS_PERFORMANCEHINT_PREDICT_LIKELY(expr)   (expr)
 
#define BSLS_PERFORMANCEHINT_PREDICT_UNLIKELY(expr)   (expr)
 
#define BSLS_PERFORMANCEHINT_PREDICT_EXPECT(expr, value)   (expr)
 
#define BSLS_PERFORMANCEHINT_ATTRIBUTE_COLD
 
#define BSLS_PERFORMANCEHINT_UNLIKELY_HINT
 

Typedefs

typedef bsls::PerformanceHint bsls_PerformanceHint
 This alias is defined for backward compatibility.
 

Detailed Description

Outline

Purpose

Provide performance hints for code optimization.

Classes

Macros

Description

This component provides performance hints for the compiler or hardware. There are currently two types of hints that are supported:

Branch Prediction

The three macros provided, BSLS_PERFORMANCEHINT_PREDICT_LIKELY, BSLS_PERFORMANCEHINT_PREDICT_UNLIKELY, and BSLS_PERFORMANCEHINT_PREDICT_EXPECT, can be used to optimize compiler generated code for branch prediction. The compiler, when given the hint under optimized mode (i.e., with BDE_BUILD_TARGET_OPT defined) will rearrange the assembly instructions it generates to minimize the number of jumps needed.

The following describes the macros provided by this component:

Macro Name Description of Macro
---------------------------------------- -----------------------------
BSLS_PERFORMANCEHINT_PREDICT_LIKELY(expr) Hint to the compiler that the
specified *integral* 'expr'
expression is likely to
evaluate to non-zero.
Returns 'true' or 'false'
depending on the result of
the expression.
BSLS_PERFORMANCEHINT_PREDICT_UNLIKELY(expr) Hint to the compiler that the
specified *integral* 'expr'
expression is likely to
evaluate to zero. Returns
'true' or 'false' depending
on the result of the
expression.
BSLS_PERFORMANCEHINT_PREDICT_EXPECT(expr, value)
Hint to the compiler that the
specified *integral* 'expr'
expression is likely to
evaluate to the specified
'value'. Returns the result
of the expression.
BSLS_PERFORMANCEHINT_UNLIKELY_HINT Hint to the compiler that the
block which contains the hint
is unlikely chosen. Use this
in conjunction with the
'PREDICT_UNLIKELY' clause for
maximum portability.
#define BSLS_PERFORMANCEHINT_PREDICT_LIKELY(expr)
Definition bsls_performancehint.h:451

Warning

Please use the macros provided in this component with caution. Always profile your code to get an idea of actual usage before attempting to optimize with these macros. Furthermore, these macros are merely hints to the compiler. Whether or not they will have visible effect on performance is not guaranteed. Note that one can perform similar optimization with a profile-based compilation. When compiled with the proper options, the compiler can collect usage information of the code, and such information can then be passed back to recompile the code in a more optimized form. Please refer to the compiler manual for more information.

Limitations

There is a bug in gcc 4.2, 4.3, and 4.4 such that when using the branch prediction macros with multiple conditions, the generated code might not be properly optimized. For example:

// ...
}

The work-around is simply to split the conditions:

This applies to all of the "likely", "unlikely", and "expect" macros defined in this component. Note that a bug report has been filed:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42233

Data Cache Prefetching

The two functions provided in the bsls::PerformanceHint struct are prefetchForReading and prefetchForWriting. Use of these functions will cause the compiler to generate prefetch instructions to prefetch one cache line worth of data at the specified address into the cache line to minimize processor stalls.

Function Name Description of Function
------------------------ ------------------------------------------
prefetchForReading(address) Prefetches one cache line worth of data at
the specified 'address' for reading.
prefetchForWriting(address) Prefetches one cache line worth of data at
the specified 'address' for writing.

Warning

‍These functions must be used with caution. Inappropriate use of these functions degrades performance. Note that there should be sufficient time for the prefetch instruction to finish before the specified address is accessed, otherwise prefetching will be pointless. A profiler should be used to understand the program's behavior before attempting to optimize with these functions.

Optimization Fence

The macro BSLS_PERFORMANCEHINT_OPTIMIZATION_FENCE prevents some compiler optimizations, particularly compiler instruction reordering. This fence does not map to a CPU instruction and has no impact on processor instruction re-ordering, and therefore should not be used to synchronize memory between threads. The fence may be useful in unusual contexts, like performing benchmarks, or working around bugs identified in the compiler's optimizer.

Warning

‍This macro should be used with caution. The macro will generally decrease the performance of code on which it is applied, and is not implemented on all platforms.

Usage

The following series of examples illustrates use of the macros and functions provided by this component.

Example 1: Using the Branch Prediction Macros

The following demonstrates the use of BSLS_PERFORMANCEHINT_PREDICT_LIKELY and BSLS_PERFORMANCEHINT_PREDICT_UNLIKELY to generate more efficient assembly instructions. Note the use of BSLS_PERFORMANCEHINT_UNLIKELY_HINT inside the if branch for maximum portability.

volatile int global;
void foo()
{
global = 1;
}
void bar()
{
global = 2;
}
int main(int argc, char **argv)
{
argc = std::atoi(argv[1]);
for (int x = 0; x < argc; ++x) {
int y = std::rand() % 10;
// Correct usage of 'BSLS_PERFORMANCEHINT_PREDICT_LIKELY' since
// there are nine of ten chance that this branch is taken.
foo();
}
else {
bar();
}
}
return 0;
}
#define BSLS_PERFORMANCEHINT_UNLIKELY_HINT
Definition bsls_performancehint.h:484

An excerpt of the assembly code generated using xlC Version 10 on AIX from this small program is:

b8: 2c 00 00 08 cmpwi r0,8
bc: 41 82 00 38 beq- f4 <.main+0xb4>
^
Note that if register r0 (y) equals 8, branch to
instruction f4 (a jump). The '-' after 'beq'
indicates that the branch is unlikely to be taken.
The predicted code path continues the 'if'
statement, which calls 'foo' below.
c0: 4b ff ff 41 bl 0 <.foo__Fv>
...
f4: 4b ff ff 2d bl 20 <.bar__Fv>

Now, if BSLS_PERFORMANCEHINT_PREDICT_LIKELY is changed to BSLS_PERFORMANCEHINT_PREDICT_UNLIKELY, and the BSLS_PERFORMANCEHINT_UNLIKELY_HINT is moved to the first branch, the following assembly code will be generated:

b8: 2c 00 00 08 cmpwi r0,8
bc: 40 c2 00 38 bne- f4 <.main+0xb4>
^
Note that the test became a "branch not equal"
test. The predicted code path now continues to the
'else' statement, which calls 'bar' below.
c0: 4b ff ff 61 bl 20 <.bar__Fv>
...
f4: 4b ff ff 0d bl 0 <.foo__Fv>

A timing analysis shows that effective use of branch prediction can have a material effect on code efficiency:

$time ./unlikely.out 100000000
real 0m2.022s
user 0m2.010s
sys 0m0.013s
$time ./likely.out 100000000
real 0m2.159s
user 0m2.149s
sys 0m0.005s

Example 2: Using BSLS_PERFORMANCEHINT_PREDICT_EXPECT

This macro is essentially the same as the __builtin_expect(expr, value) macro that is provided by some compilers. This macro allows the user to define more complex hints to the compiler, such as the optimization of switch statements. For example, given:

int x = std::rand() % 4;

the following is incorrect usage of BSLS_PERFORMANCEHINT_PREDICT_EXPECT, since the probability of getting a 3 is equivalent to the other possibilities ( 0, 1, 2 ):

case 1: //..
break;
case 2: //..
break;
case 3: //..
break;
default: break;
}
#define BSLS_PERFORMANCEHINT_PREDICT_EXPECT(expr, value)
Definition bsls_performancehint.h:453

However, this is sufficient to illustrate the intent of this macro.

Example 3: Cache Line Prefetching

The following demonstrates use of prefetchForReading and prefetchForWriting to prefetch data cache lines:

const int SIZE = 10 * 1024 * 1024;
void add(int *arrayA, int *arrayB)
{
for (int i = 0; i < SIZE / 8; ++i){
*arrayA += *arrayB; ++arrayA; ++arrayB;
*arrayA += *arrayB; ++arrayA; ++arrayB;
*arrayA += *arrayB; ++arrayA; ++arrayB;
*arrayA += *arrayB; ++arrayA; ++arrayB;
*arrayA += *arrayB; ++arrayA; ++arrayB;
*arrayA += *arrayB; ++arrayA; ++arrayB;
*arrayA += *arrayB; ++arrayA; ++arrayB;
*arrayA += *arrayB; ++arrayA; ++arrayB;
}
}
int array1[SIZE];
int array2[SIZE];
int main()
{
BloombergLP::bsls::Stopwatch timer;
timer.start();
for (int i = 0; i < 10; ++i) {
add(array1, array2);
}
printf("time: %f\n", timer.elapsedTime());
return 0;
}

The above code simply adds two arrays together multiple times. Using bsls::Stopwatch, we recorded the running time and printed it to stdout:

$./prefetch.sundev1.tsk
time: 8.446806

Now, we can observe that in the add function, arrayA and arrayB are accessed sequentially for the majority of the program. arrayA is used for writing and arrayB is used for reading. Making use of prefetch, we add calls to prefetchForReading and prefetchForWriting:

void add2(int *arrayA, int *arrayB)
{
for (int i = 0; i < SIZE / 8; ++i){
using namespace BloombergLP; // Generally avoid 'using' in this TD.
*arrayA += *arrayB; ++arrayA; ++arrayB;
*arrayA += *arrayB; ++arrayA; ++arrayB;
*arrayA += *arrayB; ++arrayA; ++arrayB;
*arrayA += *arrayB; ++arrayA; ++arrayB;
*arrayA += *arrayB; ++arrayA; ++arrayB;
*arrayA += *arrayB; ++arrayA; ++arrayB;
*arrayA += *arrayB; ++arrayA; ++arrayB;
*arrayA += *arrayB; ++arrayA; ++arrayB;
}
}
static void prefetchForWriting(void *address)
Definition bsls_performancehint.h:617
static void prefetchForReading(const void *address)
Definition bsls_performancehint.h:582

Adding the prefetch improves the program's efficiency:

$./prefetch.sundev1.tsk
time: 6.442100

Note that we prefetch the address 16 * sizeof(int) bytes away from arrayA. This is such that the prefetch instruction has sufficient time to finish before the data is actually accessed. To see the difference, if we changed + 16 to + 4:

$./prefetch.sundev1.tsk
time: 6.835928

And we get less of an improvement in speed. Similarly, if we prefetch too far away from the data use, the data might be removed from the cache before it is looked at and the prefetch is wasted.

Macro Definition Documentation

◆ BSLS_PERFORMANCEHINT_ATTRIBUTE_COLD

#define BSLS_PERFORMANCEHINT_ATTRIBUTE_COLD

◆ BSLS_PERFORMANCEHINT_PREDICT_EXPECT

#define BSLS_PERFORMANCEHINT_PREDICT_EXPECT (   expr,
  value 
)    (expr)

◆ BSLS_PERFORMANCEHINT_PREDICT_LIKELY

#define BSLS_PERFORMANCEHINT_PREDICT_LIKELY (   expr)    (expr)

◆ BSLS_PERFORMANCEHINT_PREDICT_UNLIKELY

#define BSLS_PERFORMANCEHINT_PREDICT_UNLIKELY (   expr)    (expr)

◆ BSLS_PERFORMANCEHINT_UNLIKELY_HINT

#define BSLS_PERFORMANCEHINT_UNLIKELY_HINT

Typedef Documentation

◆ bsls_PerformanceHint