Optimizing Branches - Intel PXA255 User Manual

Xscale microarchitecture

Optimization Guide
The optimized code generated for the above code segment would look like:
.L6:
    .
    .
    subs  r3, r3, #1
    bne   .L6
It is also beneficial to rewrite loops whenever possible so as to make the loop exit conditions check
against the value 0. For example, the code generated for the code segment below will need a
compare instruction to check for the loop exit condition.
for (i = 0; i < 10; i++)
{
    /* do something */
}
If the loop were rewritten as follows, the code generated avoids using the compare instruction to
check for the loop exit condition.
for (i = 9; i >= 0; i--)
{
    /* do something */
}
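As a minimal sketch of this rewrite, the two functions below sum the same array, once counting up and once counting down to 0. The function names and the summing loop body are assumptions made for illustration; they do not come from the manual. The rewrite is only valid when the loop body does not depend on the order of iteration, and the index must be a signed type, otherwise the test i >= 0 is always true.

```c
/* Hypothetical example: counts up, so the exit test compares i against n
   on every iteration. */
int sum_count_up(const int *values, int n)
{
    int total = 0;
    int i;

    for (i = 0; i < n; i++)
        total += values[i];
    return total;
}

/* Hypothetical example: counts down, so the exit test checks against 0.
   i is a signed int; an unsigned index would never become negative and
   the loop would not terminate. */
int sum_count_down(const int *values, int n)
{
    int total = 0;
    int i;

    for (i = n - 1; i >= 0; i--)
        total += values[i];
    return total;
}
```

Both functions return the same value for the same input, as long as the intermediate sums do not overflow. Whether the compiler actually drops the compare instruction depends on the compiler and its optimization settings, so the generated assembly is worth inspecting.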
A.3.1.2. Optimizing Branches

Branches decrease application performance by indirectly causing pipeline stalls. Branch prediction
improves performance by lessening the delay inherent in fetching a new instruction stream. The
number of branches that can be predicted accurately is limited by the size of the branch target
buffer. Since the total number of branches executed in a program is relatively large compared to the
size of the branch target buffer, it is often beneficial to minimize the number of branches in a
program. Consider the following C code segment.
int foo(int a)
{
    if (a > 10)
        return 0;
    else
        return 1;
}
The code generated for the if-else portion of this code segment using branches is:
    cmp   r0, #10
    ble   L1
    mov   r0, #0
    b     L2
L1:
    mov   r0, #1
L2:
The code generated above takes three cycles to execute the else-part and four cycles for the if-part,
assuming best-case conditions and no branch misprediction penalties. In the case of the Intel®
XScale™ core, a branch misprediction incurs a penalty of four cycles. If the branch is mispredicted
50% of the time, and if we assume that both the if-part and the else-part are equally likely to be
taken, on average the code above takes 5.5 cycles to execute:
$$\frac{50}{100} \times 4 + \frac{3 + 4}{2} = 5.5 \text{ cycles}$$
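The average above can be reproduced with a small helper, shown here as a sketch only. The function name expected_cycles and its parameters are hypothetical, and the model is the simple one used in the text: both paths are equally likely, and the misprediction penalty is weighted by the misprediction rate.

```c
/* Hypothetical helper, not part of any Intel tool or library.
   cycles_if, cycles_else: best-case cycle counts of the two paths
   mispredict_rate:        fraction of executions mispredicted (0.0 to 1.0)
   mispredict_penalty:     cycles lost per mispredicted branch */
double expected_cycles(double cycles_if, double cycles_else,
                       double mispredict_rate, double mispredict_penalty)
{
    /* Both paths are assumed equally likely, as in the text. */
    double average_path = (cycles_if + cycles_else) / 2.0;

    return average_path + mispredict_rate * mispredict_penalty;
}
```

With the values from the text, expected_cycles(4.0, 3.0, 0.5, 4.0) gives (4 + 3) / 2 + 0.5 × 4 = 5.5 cycles. With a misprediction rate of 0, the same model gives 3.5 cycles, which shows how much of the average cost comes from mispredictions alone.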
Intel® XScale™ Microarchitecture User's Manual
