AMD Athlon™ Processor x86 Code Optimization
One Supported Store-to-Load Forwarding Case
Summary of Store-to-Load Forwarding Pitfalls to Avoid
Stack Alignment Considerations
Extend to 32 Bits Before Pushing onto Stack
There is one case of mismatched store-to-load forwarding that
the AMD Athlon processor supports: the lower 32 bits of an
aligned QWORD write may feed a DWORD read.
Example 8 (Allowed):
    MOVQ  [AlignedQword], mm0
    ...
    MOV   EAX, [AlignedQword]
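To make clear which bytes the DWORD read in Example 8 sees, here is a minimal C sketch of the same data movement. It assumes a little-endian x86 target, where the lower 32 bits of a quadword sit at the quadword's own start address. The function name is hypothetical, and a compiler is free to emit different instructions than the MOVQ/MOV pair, so this illustrates the data layout only, not the forwarding behavior.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: write 8 bytes to an aligned buffer, then read
   back 4 bytes from the same start address, as Example 8 does. */
uint32_t low_dword_after_qword_store(uint64_t value)
{
    _Alignas(8) unsigned char aligned_qword[8];
    uint32_t low;

    /* The supported case requires the quadword to be aligned. */
    assert(((uintptr_t)aligned_qword % 8u) == 0);

    memcpy(aligned_qword, &value, sizeof value); /* 8-byte store */
    memcpy(&low, aligned_qword, sizeof low);     /* 4-byte load, same address */
    return low;
}
```

On a little-endian machine the result equals the low half of the stored value. A read of the upper half would start four bytes past the store address and is not the case described above.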
To avoid store-to-load forwarding pitfalls, code should conform
to the following guidelines:
• Maintain consistent use of operand size across all loads and
  stores. Preferably, use doubleword or quadword operand
  sizes.
• Avoid misaligned data references.
• Avoid narrow-to-wide and wide-to-narrow forwarding cases.
• When using word or byte stores, avoid loading data from
  anywhere in the same doubleword of memory other than the
  identical start addresses of the stores.
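The guidelines above can be read as a simple check on a store and the load that reads its data. The following C sketch is a deliberately conservative model of that reading, not a description of the processor's actual forwarding logic. The `mem_access` type and both function names are hypothetical, and the model assumes the load is meant to read data written by the store.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one memory access. */
typedef struct {
    uintptr_t addr; /* start address */
    size_t size;    /* operand size in bytes: 1, 2, 4, or 8 */
} mem_access;

/* True if the access starts on a multiple of its own size. */
bool is_naturally_aligned(mem_access a)
{
    assert(a.size == 1 || a.size == 2 || a.size == 4 || a.size == 8);
    return (a.addr % a.size) == 0;
}

/* Conservative model of the guidelines: both accesses aligned, same
   start address, and either the same operand size or the one allowed
   mismatch (aligned QWORD store feeding a DWORD load of its low half). */
bool follows_forwarding_guidelines(mem_access store, mem_access load)
{
    if (!is_naturally_aligned(store) || !is_naturally_aligned(load))
        return false;
    if (load.addr != store.addr)
        return false;
    if (load.size == store.size)
        return true;
    return store.size == 8 && load.size == 4;
}
```

For instance, a QWORD store at address 0x1000 followed by a DWORD load at 0x1000 passes the check, while a DWORD load at 0x1004 (the upper half) or a QWORD load after a DWORD store does not.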
Make sure the stack is suitably aligned for the local variable
with the largest base type. Then, using the technique described
in "C Language Structure Component Considerations" on page
55, all variables can be properly aligned with no padding.
Function arguments smaller than 32 bits should be extended to
32 bits before being pushed onto the stack, which ensures that
the stack is always doubleword aligned on entry to a function.
If a function has no local variables with a base type larger than
doubleword, no further work is necessary. If the function does
have local variables whose base type is larger than a
doubleword, additional code should be inserted to ensure
proper alignment of the stack. For example, the following code
achieves quadword alignment:
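As an illustration of the arithmetic involved (a sketch in C on a simulated stack pointer, not the manual's own listing), quadword alignment amounts to rounding the stack pointer down to a multiple of 8 after space for locals has been reserved. All function names here are hypothetical, and a 32-bit stack pointer that grows toward lower addresses is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Round a simulated stack pointer down to a power-of-two boundary.
   Because the stack grows toward lower addresses, rounding down only
   reserves extra space and never overlaps the caller's data. */
uint32_t align_down(uint32_t sp, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1u)) == 0);
    return sp & ~(alignment - 1u);
}

/* Simulate pushing one argument that was widened to 32 bits:
   each push moves the stack pointer by exactly one doubleword. */
uint32_t push_dword(uint32_t sp)
{
    return sp - 4u;
}

/* Simulate reserving locals_size bytes for local variables and then
   forcing the result onto a quadword (8-byte) boundary. */
uint32_t reserve_locals_qword_aligned(uint32_t sp, uint32_t locals_size)
{
    return align_down(sp - locals_size, 8u);
}
```

Starting from a doubleword-aligned stack pointer, any number of `push_dword` steps keeps it doubleword aligned, which is why arguments are widened to 32 bits first. The rounding step is only needed when a local variable has a base type larger than a doubleword.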
22007E/0—November 1999