Intel ITANIUM ARCHITECTURE - SOFTWARE DEVELOPERS MANUAL VOLUME 1 REV 2.3 Manual page 1774

The PMULHUW (Unsigned high packed integer word multiply in MMX technology
register) instruction performs an unsigned multiply on each word field of the two source
MMX technology registers, returning the high word of each 32-bit product to an MMX
technology register.
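The per-word behavior described above can be sketched as a portable scalar model. This is an illustration only, not the hardware implementation; the function name `pmulhuw_model` and the use of four-element arrays to stand in for a 64-bit MMX technology register are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference model (hypothetical helper, not an intrinsic):
   for each of the four 16-bit lanes, multiply the unsigned words
   and keep only the upper 16 bits of the 32-bit product. */
void pmulhuw_model(const uint16_t a[4], const uint16_t b[4], uint16_t out[4])
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < 4; ++i) {
        uint32_t product = (uint32_t)a[i] * (uint32_t)b[i];
        out[i] = (uint16_t)(product >> 16);
    }
}
```

For example, the lane pair 0xFFFF and 0xFFFF has the product 0xFFFE0001, so the model returns 0xFFFE for that lane.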
The PSADBW (Sum of absolute differences) instruction computes the absolute
difference of each pair of corresponding source bytes and then accumulates the 8
differences into a single 16-bit result.
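A minimal scalar sketch of this computation follows, assuming the two 64-bit sources are represented as eight-byte arrays. The name `psadbw_model` is hypothetical. The largest possible sum is 8 x 255 = 2040, so a 16-bit result cannot overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference model (hypothetical helper): sum of the absolute
   differences of eight unsigned byte pairs, returned as 16 bits. */
uint16_t psadbw_model(const uint8_t a[8], const uint8_t b[8])
{
    assert(a != NULL && b != NULL);
    uint16_t sum = 0;
    for (size_t i = 0; i < 8; ++i) {
        int diff = (int)a[i] - (int)b[i];
        sum = (uint16_t)(sum + (diff < 0 ? -diff : diff));
    }
    return sum;
}
```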
The PSHUFW (Shuffle packed integer word in MMX technology register) instruction
performs a full shuffle of any source word field to any result word field, using an 8-bit
immediate operand.
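The shuffle can be modeled in scalar C as below. The sketch assumes the usual encoding in which each 2-bit field of the immediate selects the source word for one result word, with bits 1:0 controlling result word 0. The name `pshufw_model` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference model (hypothetical helper): bits [2i+1:2i] of imm8
   pick which source word is copied to result word i. A temporary
   buffer keeps the result correct when src and dst alias. */
void pshufw_model(const uint16_t src[4], uint8_t imm8, uint16_t dst[4])
{
    assert(src != NULL && dst != NULL);
    uint16_t tmp[4];
    for (size_t i = 0; i < 4; ++i) {
        tmp[i] = src[(imm8 >> (2 * i)) & 0x3];
    }
    for (size_t i = 0; i < 4; ++i) {
        dst[i] = tmp[i];
    }
}
```

With this encoding, an immediate of 0x1B reverses the four words, 0xE4 leaves them unchanged, and 0x00 broadcasts word 0 to every result word.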
4.6.1.9 Cacheability Control Instructions
Data referenced by a program can have temporal locality (the data will be used again)
or spatial locality (the data will be in adjacent locations, e.g. the same cache line).
Some multimedia data types, such as the display list in a 3D graphics application, are
referenced once and not reused in the immediate future. We will refer to this kind of
data as non-temporal data. The programmer does not want the application's cached
code and data to be overwritten by non-temporal data. The cacheability control
instructions enable the programmer to control caching so that non-temporal accesses
minimize cache pollution.
In addition, the execution engine needs to be fed so that it does not stall
waiting for data. SSE instructions allow the programmer to prefetch data long before
its final use. These instructions are not architectural since they do not update any
architectural state, and are specific to each implementation. The programmer may have
to tune the application for each implementation to take advantage of these instructions.
These instructions merely provide a hint to the hardware, and they will not generate
exceptions or faults. Excessive use of prefetch instructions may be throttled by the
processor.
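As a rough sketch of how a prefetch hint might be issued ahead of use, the loop below uses the GCC/Clang builtin `__builtin_prefetch` rather than a specific SSE instruction; on IA-32 targets compilers typically lower it to one of the PREFETCH variants, but that mapping is compiler-dependent. The names `PREFETCH_HINT`, `PREFETCH_DISTANCE`, and `sum_with_prefetch` are hypothetical, and the distance of 64 elements is an arbitrary placeholder that would need tuning per implementation, as the text notes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__GNUC__) || defined(__clang__)
/* Arguments: address, 0 = prefetch for read, 0 = low temporal locality. */
#define PREFETCH_HINT(p) __builtin_prefetch((p), 0, 0)
#else
/* Fallback: no hint is issued; program results are unchanged. */
#define PREFETCH_HINT(p) ((void)(p))
#endif

/* Hypothetical tuning value: how many elements ahead to prefetch. */
enum { PREFETCH_DISTANCE = 64 };

/* Sums an array while hinting that data further ahead will be needed.
   The hint only affects timing, never the computed value. */
uint64_t sum_with_prefetch(const uint32_t *data, size_t n)
{
    assert(data != NULL || n == 0);
    uint64_t total = 0;
    for (size_t i = 0; i < n; ++i) {
        if (i + PREFETCH_DISTANCE < n) {
            PREFETCH_HINT(&data[i + PREFETCH_DISTANCE]);
        }
        total += data[i];
    }
    return total;
}
```

Because the prefetch is only a hint, removing the `PREFETCH_HINT` line leaves the result identical; only performance may differ.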
The following four instructions provide hints to the cache hierarchy that enable
data to be prefetched to different levels of the cache hierarchy and avoid polluting
the cache with non-temporal data.
The MASKMOVQ (Non-temporal byte mask store of packed integer in an MMX technology
register) instruction stores data from an MMX technology register to the location
specified by the EDI register. The most significant bit in each byte of the second MMX
technology register (the mask) is used to selectively write the data of the first
register on a per-byte basis. The instruction is implicitly weakly-ordered, with all of
the characteristics of the WC memory type: successive non-temporal stores may not write
memory in program order, do not write-allocate (i.e. the processor will not fetch the
corresponding cache line into the cache hierarchy prior to performing the store), may
write combine/collapse, and minimize cache pollution.
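The byte-selection rule can be illustrated with a scalar model. This sketch covers only which bytes are written; it does not model the weak ordering, write combining, or cache behavior described above. The name `maskmovq_model` and the array representation of the registers are assumptions for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference model (hypothetical helper): byte i of data is
   written to dest[i] only when the most significant bit of mask
   byte i is set; other destination bytes are left untouched. */
void maskmovq_model(const uint8_t data[8], const uint8_t mask[8], uint8_t *dest)
{
    assert(data != NULL && mask != NULL && dest != NULL);
    for (size_t i = 0; i < 8; ++i) {
        if (mask[i] & 0x80) {
            dest[i] = data[i];
        }
    }
}
```

Note that only bit 7 of each mask byte matters: a mask byte of 0x7F leaves the destination byte unchanged, while 0x80 causes it to be written.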
The MOVNTQ (Non-temporal store of packed integer in an MMX technology register)
instruction stores data from an MMX technology register to memory. The instruction is
implicitly weakly-ordered, does not write-allocate and minimizes cache pollution.
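One way a non-temporal store might be used from C is sketched below, assuming a GCC or Clang toolchain targeting x86-64, where the `_mm_stream_pi` intrinsic from `<xmmintrin.h>` is expected to map to MOVNTQ. On other targets the sketch falls back to ordinary stores. The function name `stream_fill_u64` and the `HAVE_MOVNTQ` macro are hypothetical. Because the stores are weakly ordered, the sketch issues a store fence before returning so that later stores are ordered after them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <xmmintrin.h>
#define HAVE_MOVNTQ 1
#else
#define HAVE_MOVNTQ 0
#endif

/* Fills n 64-bit elements with value, using non-temporal stores
   where the assumed toolchain support is present. */
void stream_fill_u64(uint64_t *dst, size_t n, uint64_t value)
{
    assert(dst != NULL || n == 0);
#if HAVE_MOVNTQ
    __m64 v = _mm_cvtsi64_m64((long long)value);
    for (size_t i = 0; i < n; ++i) {
        _mm_stream_pi((__m64 *)&dst[i], v);  /* non-temporal 64-bit store */
    }
    _mm_sfence();  /* order the weakly-ordered stores before later stores */
    _mm_empty();   /* leave MMX state so x87 code can run afterwards */
#else
    for (size_t i = 0; i < n; ++i) {
        dst[i] = value;  /* ordinary store fallback */
    }
#endif
}
```

A fill of a large buffer that will not be read again soon is the kind of access pattern where bypassing write-allocation may help; for data that is reused shortly afterwards, ordinary stores are usually the better choice.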
Volume 4: IA-32 SSE Instruction Reference
