Dynamic Memory Allocation Consideration; Introduce Explicit Parallelism Into Code - AMD Athlon Processor x86 Optimization Manual

22007E/0—November 1999

Dynamic Memory Allocation Consideration

Introduce Explicit Parallelism into Code

which might inhibit certain optimizations with some
compilers, for example, aggressive inlining.

Dynamic Memory Allocation Consideration
Dynamic memory allocation (malloc in the C language) should
always return a pointer that is suitably aligned for the largest
base type (quadword alignment). Where this alignment
cannot be guaranteed, use the technique shown in the following
code to make the pointer quadword aligned. This
code assumes the pointer can be cast to a long without losing bits.
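Example:
double *p;
double *np;
/* Over-allocate by 7 bytes so an 8-byte boundary always lies inside the block. */
p = (double *)malloc(sizeof(double) * number_of_doubles + 7L);
/* Round the address up to the next multiple of 8. */
np = (double *)((((long)(p)) + 7L) & (-8L));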
Then use 'np' instead of 'p' to access the data. 'p' is still needed
in order to deallocate the storage.
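
The same rounding technique can be sketched with uintptr_t from <stdint.h>, which, where the implementation provides it, is wide enough to hold a pointer; a long is not on every platform. This is a minimal sketch rather than the manual's own code: the helper name alloc_aligned_doubles and its raw out-parameter are hypothetical, introduced only for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not from the manual): allocates room for `count`
 * doubles and returns a pointer rounded up to an 8-byte boundary.
 * The unmodified malloc result is stored in *raw; that is the pointer
 * that must later be passed to free(). Assumes uintptr_t is available. */
static double *alloc_aligned_doubles(size_t count, void **raw)
{
    /* 7 spare bytes guarantee that an 8-byte boundary lies inside the block. */
    *raw = malloc(sizeof(double) * count + 7);
    if (*raw == NULL)
        return NULL;

    /* Add 7, then clear the low three bits: rounds up to a multiple of 8. */
    uintptr_t addr = ((uintptr_t)*raw + 7u) & ~(uintptr_t)7u;
    return (double *)addr;
}
```

A caller would access the data through the returned pointer and release the storage with free(raw), mirroring the roles of 'np' and 'p' above.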

Introduce Explicit Parallelism into Code
Where possible, long dependency chains should be broken into
several independent dependency chains, which can then be
executed in parallel, exploiting the pipelined execution units.
This is especially important for floating-point code, whether it
is mapped to x87 or 3DNow! instructions, because of the longer
latency of floating-point operations. Since most languages,
including ANSI C, guarantee that floating-point expressions are
not reordered, compilers usually cannot perform such
optimizations unless they offer a switch to allow
reordering of floating-point expressions according to
algebraic rules, which is not ANSI compliant.
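
As an illustration of breaking one dependency chain into several, the sketch below sums an array first with a single accumulator and then with four independent accumulators. The function names and the choice of four chains are assumptions made for this example; the best number of chains depends on the latency and throughput of the target's floating-point adder. Because floating-point addition is not associative, the second version may differ from the first in the last bits for general inputs.

```c
#include <stddef.h>

/* One long dependency chain: every add must wait for the previous one. */
static double sum_serial(const double *a, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Four independent chains: the adds into s0..s3 do not depend on each
 * other, so a pipelined adder can overlap them. */
static double sum_four_chains(const double *a, size_t n)
{
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    /* Handle the remaining 0 to 3 elements. */
    for (; i < n; i++)
        s0 += a[i];

    /* Combine the partial sums only once, at the end. */
    return (s0 + s1) + (s2 + s3);
}
```

The reordering is written out by hand here precisely because a conforming compiler is not free to do it on its own.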
Note that re-ordered code that is algebraically identical to the
original code does not necessarily deliver identical
computational results due to the lack of associativity of
floating-point operations. There are well-known numerical
considerations in applying these optimizations (consult a book
on numerical analysis). In some cases, these optimizations may