Explicitly Extract Common Subexpressions - AMD Athlon Processor x86 Optimization Manual

AMD Athlon™ Processor x86 Code Optimization

Explicitly Extract Common Subexpressions

lead to unexpected results. Fortunately, in the vast majority of
cases, the final result will differ only in the least significant
bits.
Example 1 (Avoid):
double a[100], sum;
int i;
sum = 0.0;
/* Each addition depends on the result of the previous one. */
for (i = 0; i < 100; i++) {
    sum += a[i];
}
Example 2 (Preferred):
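double a[100], sum1, sum2, sum3, sum4, sum;
int i;
sum1 = 0.0;
sum2 = 0.0;
sum3 = 0.0;
sum4 = 0.0;
/* Four independent dependency chains; i advances by 4 per iteration. */
for (i = 0; i < 100; i += 4) {
    sum1 += a[i];
    sum2 += a[i+1];
    sum3 += a[i+2];
    sum4 += a[i+3];
}
/* Combine the partial sums after the loop. */
sum = (sum4 + sum3) + (sum1 + sum2);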
Notice that the 4-way unrolling was chosen to exploit the 4-stage
fully pipelined floating-point adder. Each stage of the floating-
point adder is occupied on every clock cycle, ensuring maximal
sustained utilization.
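The two examples above can be wrapped as functions for side-by-side comparison. The following is a minimal sketch, not code from this manual: the function names sum_serial and sum_unrolled4 are hypothetical, and the cleanup loop for lengths that are not a multiple of 4 is an added assumption.

```c
#include <stddef.h>

/* Serial sum: one dependency chain, so each add waits for the previous one. */
double sum_serial(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += a[i];
    }
    return sum;
}

/* 4-way unrolled sum: four independent accumulators (hypothetical helper). */
double sum_unrolled4(const double *a, size_t n)
{
    double sum1 = 0.0, sum2 = 0.0, sum3 = 0.0, sum4 = 0.0;
    size_t i = 0;

    /* Main loop handles groups of four elements. */
    for (; i + 4 <= n; i += 4) {
        sum1 += a[i];
        sum2 += a[i + 1];
        sum3 += a[i + 2];
        sum4 += a[i + 3];
    }

    /* Combine the partial sums in the same order as Example 2. */
    double sum = (sum4 + sum3) + (sum1 + sum2);

    /* Assumed cleanup loop for the 0 to 3 leftover elements. */
    for (; i < n; i++) {
        sum += a[i];
    }
    return sum;
}
```

For inputs whose partial sums are exactly representable, such as small integers, both functions return the same value. For arbitrary data the two results may differ in the least significant bits, because the additions are performed in a different order.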
In certain situations, C compilers are unable to extract common
subexpressions from floating-point expressions due to the
guarantee against reordering of such expressions in the ANSI
standard. Specifically, the compiler cannot re-arrange the
computation according to algebraic equivalencies before
extracting common subexpressions. In such cases, the
programmer should manually extract the common
subexpression. It should be noted that re-arranging the
expression may result in different computational results due to
the lack of associativity of floating-point operations, but the
results usually differ in only the least significant bits.
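As an illustration of manual extraction, consider two expressions that both contain the quotient b/d in the algebraic sense. The sketch below is a hypothetical example: the Results type and the function names compute_naive and compute_extracted are assumptions made for illustration only.

```c
/* Hypothetical container for the two results. */
typedef struct {
    double e;
    double f;
} Results;

/* Avoid: b/d is common only after algebraic re-arrangement.
 * C evaluates b * c / d as (b * c) / d, so a compiler bound by the
 * ordering rules may not share the division with the second expression. */
Results compute_naive(double a, double b, double c, double d)
{
    Results r;
    r.e = b * c / d;
    r.f = b / d * a;
    return r;
}

/* Preferred: the programmer extracts the common subexpression by hand,
 * so only one division is performed. */
Results compute_extracted(double a, double b, double c, double d)
{
    Results r;
    double t = b / d;   /* common subexpression */
    r.e = c * t;
    r.f = a * t;
    return r;
}
```

The extracted version computes e as c * (b / d) rather than (b * c) / d, so its result may differ from the naive version in the least significant bits for operands that are not exactly representable.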
22007E/0—November 1999