IA-32 Intel® Architecture Optimization
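        add esi,ecx     ; advance the source pointer by ecx bytes
        add edi,ecx     ; advance the destination pointer by ecx bytes
        sub edx,ecx     ; subtract ecx from the remaining byte count
        jnz main_loop   ; loop again while bytes remain
        sfence          ; order the streaming stores before later stores
    }
}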
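For readers working in C rather than inline assembly, the loop whose tail appears above can be approximated with SSE2 intrinsics. The sketch below is an illustration under stated assumptions, not the manual's routine. The function name `block_copy_stream`, the 4-KB `BLOCK_SIZE`, the 64-byte `LINE_SIZE`, and the 16-byte alignment requirement are all hypothetical choices. The sketch works in two passes per block. It first reads each block once per cache line so that the block is cached. It then writes the block with `_mm_stream_si128`, the intrinsic for `movntdq`. A final `_mm_sfence` plays the role of the `sfence` above. Builds without SSE2 fall back to `memcpy`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_load_si128, _mm_stream_si128, _mm_sfence */
#endif

#define BLOCK_SIZE 4096  /* assumption: 4-KB blocks, as in the "4KB-Block" technique */
#define LINE_SIZE  64    /* assumption: 64-byte cache lines */

/* Hypothetical sketch: copy n bytes in 4-KB blocks. Each block is first read
   once per cache line to pull it into the cache, then written with 16-byte
   non-temporal (streaming) stores. The SSE2 path requires 16-byte-aligned
   src and dst; any tail shorter than one block is copied with memcpy. */
void block_copy_stream(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
#if defined(__SSE2__)
    assert(((uintptr_t)d % 16) == 0 && ((uintptr_t)s % 16) == 0);
    while (n >= BLOCK_SIZE) {
        /* Pass 1: one load per cache line brings the block into the cache;
           the sequential access pattern also lets the hardware prefetcher run ahead. */
        volatile unsigned char sink;
        for (size_t i = 0; i < BLOCK_SIZE; i += LINE_SIZE)
            sink = s[i];
        (void)sink;

        /* Pass 2: copy the cached block with 16-byte streaming stores (movntdq). */
        for (size_t i = 0; i < BLOCK_SIZE; i += 16) {
            __m128i v = _mm_load_si128((const __m128i *)(s + i));
            _mm_stream_si128((__m128i *)(d + i), v);
        }

        s += BLOCK_SIZE;   /* like add esi,ecx: advance the source pointer */
        d += BLOCK_SIZE;   /* like add edi,ecx: advance the destination pointer */
        n -= BLOCK_SIZE;   /* like sub edx,ecx: bytes still to copy */
    }
    _mm_sfence();          /* order the streaming stores before later stores */
#endif
    memcpy(d, s, n);       /* remaining tail (or the whole copy without SSE2) */
}
```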
Performance Comparisons of Memory Copy Routines
The throughput of a large-region memory copy routine depends on several factors:
• the coding technique used to implement the memory copy task
• characteristics of the system bus (speed, peak bandwidth, and overhead in read/write transaction protocols)
• the microarchitecture of the processor
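Throughput here means bytes copied per second. One minimal way to measure it, assuming a hosted C11 environment, is to time repeated copies of a large buffer with `timespec_get`. The helper name `copy_throughput` is hypothetical. Using `memcpy` as the routine under test is only an illustrative choice; substitute any copy routine to compare techniques on a given processor and bus.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: copies n bytes from src to dst `reps` times with memcpy
   and returns the observed throughput in bytes per second (0.0 if no elapsed
   time could be measured). Wall-clock time comes from C11 timespec_get. */
double copy_throughput(void *dst, const void *src, size_t n, int reps)
{
    struct timespec t0, t1;
    timespec_get(&t0, TIME_UTC);
    for (int r = 0; r < reps; r++)
        memcpy(dst, src, n);
    timespec_get(&t1, TIME_UTC);

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;
    return secs > 0.0 ? (double)n * (double)reps / secs : 0.0;
}
```

Results from a single run are noisy. In practice, one would repeat the measurement and keep the best or median value.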
Table 6-2 compares the two coding techniques discussed above with two unoptimized techniques (byte-sequential and DWORD-sequential copying).
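The two unoptimized baselines in the table are byte-sequential and DWORD-sequential copying. The sketch below shows what such loops might look like in C. The function names are hypothetical. The DWORD version copies any trailing `n % 4` bytes one at a time.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte sequential: one 1-byte load and one 1-byte store per iteration. */
void copy_byte_sequential(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
}

/* DWORD sequential: one 4-byte load and one 4-byte store per iteration.
   Each 4-byte move goes through memcpy to avoid alignment and
   strict-aliasing problems; leftover bytes are copied one at a time. */
void copy_dword_sequential(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        uint32_t w;
        memcpy(&w, s + i, sizeof w);
        memcpy(d + i, &w, sizeof w);
    }
    for (; i < n; i++)
        d[i] = s[i];
}
```

An optimizing compiler may vectorize these loops or replace them with a call to `memcpy`. To reproduce the unoptimized behavior, compile without optimization or check the generated code.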
Table 6-2  Relative Performance of Memory Copy Routines
Processor                  CPUID      FSB    Byte        DWORD       SW prefetch +           4KB-Block HW prefetch +
                           Signature  (MHz)  Sequential  Sequential  8-byte streaming store  16-byte streaming stores
Pentium M processor        0x6Dn      400    1.3X        1.2X        1.6X                    2.5X
Intel Core Solo and        0x6En      667    3.3X        3.5X        2.1X                    4.7X
  Intel Core Duo processors
Pentium D processor        0xF4n      800    3.4X        3.3X        4.9X                    5.7X
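Each entry in Table 6-2 is expressed as a multiple of a baseline. Assuming all entries share the same baseline, dividing two entries in the same row gives the speedup of one technique over another on that processor. The C sketch below transcribes the table and computes the speedup over byte-sequential copying; the struct and function names are hypothetical. For example, on the Pentium D processor the 16-byte streaming-store routine is 5.7 / 3.4 ≈ 1.68 times as fast as the byte-sequential loop. On the Intel Core Solo and Intel Core Duo processors, the 8-byte streaming-store routine comes out slower than byte-sequential copying (2.1 / 3.3 ≈ 0.64).

```c
#include <stddef.h>

/* Relative throughput figures transcribed from Table 6-2. Column order:
   byte sequential, DWORD sequential, SW prefetch + 8-byte streaming store,
   4KB-block HW prefetch + 16-byte streaming stores. */
struct copy_perf {
    const char *processor;
    double rel[4];
};

static const struct copy_perf table_6_2[] = {
    { "Pentium M (0x6Dn, 400 MHz FSB)",          { 1.3, 1.2, 1.6, 2.5 } },
    { "Core Solo/Core Duo (0x6En, 667 MHz FSB)", { 3.3, 3.5, 2.1, 4.7 } },
    { "Pentium D (0xF4n, 800 MHz FSB)",          { 3.4, 3.3, 4.9, 5.7 } },
};

/* Speedup of technique `col` over byte-sequential copy on processor `row`.
   Assumes every entry is normalized to the same baseline, so the ratio of
   two entries in one row compares the two techniques directly. */
double speedup_over_byte(size_t row, size_t col)
{
    return table_6_2[row].rel[col] / table_6_2[row].rel[0];
}
```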