2.2.1.8 Store Buffers May Satisfy Local Loads
In the Itanium memory ordering model, store buffers (or other logically-equivalent
structures) may satisfy local read requests from loads or acquire loads even if the
stored data is not yet visible to other agents in the coherence domain. Such bypassing
must honor any ordering semantics in the memory reference stream.
Table 2-10, and Table 2-11 presented in Section 2.2.1.9, illustrate this behavior.

Table 2-10. Store Buffers May Satisfy Loads if the Stored Data is Not Yet Globally Visible

Processor #0                   Processor #1
st.rel [x] = 1      // M1      st.rel [y] = 1      // M4
ld.acq r1 = [x]     // M2      ld.acq r3 = [y]     // M5
ld     r2 = [y]     // M3      ld     r4 = [x]     // M6
Outcome: r1 = 1, r3 = 1, r2 = 0, and r4 = 0 is allowed

In this sequence, each processor bypasses its locally-written value from a store buffer
before the value becomes visible to the other processor. This behavior may make
accesses of different sizes that have overlapping memory addresses appear to complete
non-atomically.
The following discussion focuses on the outcome r1 = 1, r3 = 1, r2 = 0, and r4 = 0
because this outcome is allowed if and only if store buffers can satisfy local loads (other
outcomes are allowed but do not depend on being able to satisfy local loads from a
store buffer).
The Itanium memory ordering semantics only require that M2 precede M3 and that M5
precede M6. There are no constraints on the relative ordering of M1 and M2 or M3 nor on
the relative ordering of M4 and M5 or M6.
Remember that both dependencies and the memory ordering model place requirements
on the manner in which a processor based on the Itanium architecture may re-order
accesses. Even though the Itanium memory ordering model allows loads to pass stores,
a processor based on the Itanium architecture cannot re-order the following sequence:

st.rel [x] = r0     // M1: store 0 to [x]
ld.acq r1 = [x]     // M2: cannot move above st.rel due to RAW

This is because there is a RAW dependency through memory between M1 and M2 and
the Itanium memory ordering model requires that the local processor resolve RAW,
WAR, and WAW dependencies between its memory accesses in program order. Thus,
M1 must precede M2 even though the ordering semantics place no constraints on the
relative ordering of M1 and M2.
Because there is a RAW dependency through memory between M1 and M2 and between
M4 and M5, the ordering constraints effectively become: M1 precedes M2, which precedes
M3, and M4 precedes M5, which precedes M6 (see footnote 1).

1. That is, the store operations must become visible to the local processors before their
loads that read the stored value.

Volume 2, Part 2: MP Coherence and Synchronization
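
The litmus test in Table 2-10 can be explored with a small model. The Python sketch below is an illustration only and is not the Itanium memory ordering model. It assumes that each processor executes its three accesses in program order and owns a FIFO store buffer that drains to shared memory at arbitrary times. The names PROGRAMS, replace, and outcomes are hypothetical. With forwarding enabled, a load may take its value from the local store buffer. With forwarding disabled, a load to a buffered address waits until the store has drained. Under these assumptions, the outcome r1 = 1, r3 = 1, r2 = 0, r4 = 0 should be reachable only when forwarding is enabled.

```python
# Toy model (an assumption for illustration, not the architectural definition):
# in-order processors, one FIFO store buffer per processor, shared memory.
X, Y = 0, 1  # indices of memory locations [x] and [y]

# ("st", address, value) or ("ld", register index, address); r1..r4 map to 0..3.
PROGRAMS = (
    (("st", X, 1), ("ld", 0, X), ("ld", 1, Y)),  # Processor #0: M1, M2, M3
    (("st", Y, 1), ("ld", 2, Y), ("ld", 3, X)),  # Processor #1: M4, M5, M6
)


def replace(tup, i, value):
    """Return a copy of tuple tup with element i set to value."""
    return tup[:i] + (value,) + tup[i + 1:]


def outcomes(forwarding):
    """Return every reachable (r1, r2, r3, r4) by exhaustive state search."""
    start = ((0, 0), ((), ()), (0, 0), (None,) * 4)  # pcs, buffers, memory, regs
    seen, stack, results = set(), [start], set()
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        pcs, bufs, mem, regs = state
        if None not in regs:  # all four loads have completed
            results.add(regs)
        for p in (0, 1):
            # Choice 1: the oldest buffered store becomes globally visible.
            if bufs[p]:
                (addr, val), rest = bufs[p][0], bufs[p][1:]
                stack.append((pcs, replace(bufs, p, rest),
                              replace(mem, addr, val), regs))
            # Choice 2: processor p executes its next access in program order.
            if pcs[p] == len(PROGRAMS[p]):
                continue
            kind, a, b = PROGRAMS[p][pcs[p]]
            next_pcs = replace(pcs, p, pcs[p] + 1)
            if kind == "st":  # a = address, b = value: the store enters the buffer
                stack.append((next_pcs, replace(bufs, p, bufs[p] + ((a, b),)),
                              mem, regs))
                continue
            # Load: a = register index, b = address.
            pending = [v for addr, v in bufs[p] if addr == b]
            if pending and not forwarding:
                continue  # without forwarding, wait for the store to drain
            val = pending[-1] if pending else mem[b]
            stack.append((next_pcs, bufs, mem, replace(regs, a, val)))
    return results
```

In this sketch, `(1, 0, 1, 0) in outcomes(True)` evaluates to True and `(1, 0, 1, 0) in outcomes(False)` evaluates to False, which mirrors the statement that the outcome depends on store buffers satisfying local loads. The stall used when forwarding is disabled is a modeling choice that preserves the RAW dependency between M1 and M2 and between M4 and M5.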