Example 4-27 Eliminating Delay For A Series Of Small Loads After A Large Store; Example 4-26 A Series Of Small Loads After A Large Store - Intel ARCHITECTURE IA-32 Reference Manual

IA-32 Intel® Architecture Optimization
Let us now consider a case with a series of small loads after a large store
to the same area of memory (beginning at memory address mem) as
shown in Example 4-26. Most of the small loads will stall because they
are not aligned with the store; see "Store Forwarding" in Chapter 2 for
more details.

Example 4-26 A Series of Small Loads after a Large Store

    movq  mem, mm0       ; store qword to address "mem"
    :
    :
    mov   bx, mem + 2    ; load word at "mem + 2" stalls
    mov   cx, mem + 4    ; load word at "mem + 4" stalls

The word loads must wait for the quadword store to write to memory
before they can access the data they require. This stall can also occur
with other data types (for example, when doublewords or words are
stored and then words or bytes are read from the same area of memory).
When you change the code sequence as shown in Example 4-27, the
processor can access the data without delay.

Example 4-27 Eliminating Delay for a Series of Small Loads after a Large Store

    movq  mem, mm0       ; store qword to address "mem"
    :
    :
    movq  mm1, mem       ; load qword at address "mem"
    movd  eax, mm1       ; transfer "mem + 2" to eax from
                         ; MMX register, not memory
    psrlq mm1, 32
    shr   eax, 16
    movd  ebx, mm1       ; transfer "mem + 4" to bx from
                         ; MMX register, not memory
    and   ebx, 0ffffh
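
The shift-and-mask arithmetic in Example 4-27 can be checked outside of
assembly. The following is a minimal sketch in C, not code from the manual:
it models only the register arithmetic (it does not reproduce or measure the
store-forwarding stall), it assumes the x86 little-endian layout in which
byte offset k of the stored quadword corresponds to bits 8k through 8k+7 of
the register value, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Word that sits at "mem + 2" of the stored quadword.
   Mirrors: movd eax, mm1 / shr eax, 16 */
uint16_t word_at_offset_2(uint64_t q)
{
    uint32_t eax = (uint32_t)q;   /* movd eax, mm1: keep the low doubleword */
    eax >>= 16;                   /* shr eax, 16: bytes 2..3 move to the bottom */
    return (uint16_t)eax;
}

/* Word that sits at "mem + 4" of the stored quadword.
   Mirrors: psrlq mm1, 32 / movd ebx, mm1 / and ebx, 0ffffh */
uint16_t word_at_offset_4(uint64_t q)
{
    q >>= 32;                     /* psrlq mm1, 32: high doubleword moves down */
    uint32_t ebx = (uint32_t)q;   /* movd ebx, mm1 */
    ebx &= 0xffffu;               /* and ebx, 0ffffh: keep bytes 4..5 only */
    return (uint16_t)ebx;
}

/* Illustrative check: a quadword whose bytes in memory, from offset 0
   upward, are 00 11 22 33 44 55 66 77. */
void check_example(void)
{
    uint64_t q = 0x7766554433221100ULL;
    assert(word_at_offset_2(q) == 0x3322);  /* bytes 22 33 */
    assert(word_at_offset_4(q) == 0x5544);  /* bytes 44 55 */
}
```

Calling check_example() passes silently when both extractions agree with
the bytes a word load at "mem + 2" and "mem + 4" would have read. The point
of the rewrite is that both values come from one quadword load that exactly
matches the preceding store, so no narrower load has to wait on that store.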
