Parallel Arithmetic Instructions - Intel ITANIUM ARCHITECTURE - SOFTWARE DEVELOPERS MANUAL VOLUME 1 REV 2.3 Manual

saturation form treats both sources as signed and clamps the result to the limits of a
signed range. The unsigned saturation form treats one source as unsigned and clamps
the result to the limits of an unsigned range. Two variants are defined that treat the
second source as either signed (.uus) or unsigned (.uuu).
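
The saturation forms described above can be illustrated with a short C sketch for 1-byte
elements. It is not taken from this manual. The names padd1_model and padd_form, and the
representation of a register as a 64-bit integer holding eight 1-byte elements, are
assumptions made for this example only.

```c
#include <stdint.h>

/* Clamp one lane result to the signed or unsigned 1-byte range. */
static int clamp_s8(int v) { return v < -128 ? -128 : (v > 127 ? 127 : v); }
static int clamp_u8(int v) { return v < 0 ? 0 : (v > 255 ? 255 : v); }

/* Hypothetical selector for the four forms named in the text. */
enum padd_form { PADD_MOD, PADD_SSS, PADD_UUU, PADD_UUS };

/* Hypothetical model of 1-byte parallel addition on a 64-bit value. */
static uint64_t padd1_model(uint64_t a, uint64_t b, enum padd_form form)
{
    uint64_t result = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t ea = (uint8_t)(a >> (8 * i));
        uint8_t eb = (uint8_t)(b >> (8 * i));
        int sum;
        switch (form) {
        case PADD_SSS: /* both sources signed, signed clamp */
            sum = clamp_s8((int8_t)ea + (int8_t)eb);
            break;
        case PADD_UUU: /* both sources unsigned, unsigned clamp */
            sum = clamp_u8(ea + eb);
            break;
        case PADD_UUS: /* second source signed, unsigned clamp */
            sum = clamp_u8(ea + (int8_t)eb);
            break;
        default:       /* modulo form: the cast below wraps the sum */
            sum = ea + eb;
            break;
        }
        result |= (uint64_t)(uint8_t)sum << (8 * i);
    }
    return result;
}
```

In this sketch, padd1_model(0x7F, 0x01, PADD_SSS) stays at 0x7F, while the same inputs in
the modulo form give 0x80.
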
The parallel average instruction (pavg, pavg.raz) adds corresponding elements from
each source and shifts each sum right by one bit. In the simple form of the
instruction, the carry out of the most-significant bit of each sum is written into the
most-significant bit of the result element. In the round-away-from-zero form, a 1 is
added to each sum before shifting. The parallel average subtract instruction (pavgsub)
performs a similar operation on the difference of the sources.
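
A minimal C sketch of the averaging step for 1-byte elements follows. It models only what
the paragraph above describes (add, optionally add 1, shift right by one with the carry
moving into the top bit) and does not attempt to capture any further rounding rule that
the full instruction definition may specify. The name pavg1_model, the round_away flag,
and the 64-bit register representation are assumptions made for this example.

```c
#include <stdint.h>

/* Hypothetical model: average the eight unsigned 1-byte lanes of a and b.
   A nonzero round_away adds 1 to each sum before the shift (the .raz form). */
static uint64_t pavg1_model(uint64_t a, uint64_t b, int round_away)
{
    uint64_t result = 0;
    for (int i = 0; i < 8; i++) {
        unsigned ea = (unsigned)((a >> (8 * i)) & 0xFF);
        unsigned eb = (unsigned)((b >> (8 * i)) & 0xFF);
        /* The sum needs 9 bits; bit 8 is the carry out of the lane. */
        unsigned sum = ea + eb + (round_away ? 1u : 0u);
        /* Shifting the 9-bit sum right by one moves the carry into bit 7. */
        result |= (uint64_t)(sum >> 1) << (8 * i);
    }
    return result;
}
```
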
The parallel shift left and add instruction (pshladd) performs a left shift on the
elements of the first source and then adds them to the corresponding elements from
the second source. Signed saturation is performed on both the shift and the add
operations. The parallel shift right and add instruction (pshradd) is similar to pshladd.
Both of these instructions are defined for 2-byte elements only.
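
The shift-then-add sequence with saturation at both steps can be sketched in C as follows.
This is an illustration under stated assumptions, not the manual's definition: the name
pshladd2_model, the count parameter and its permitted range, and the 64-bit register
representation with four signed 2-byte elements are all hypothetical.

```c
#include <stdint.h>

/* Clamp a wider intermediate value to the signed 2-byte range. */
static int32_t sat_s16(int32_t v)
{
    return v < INT16_MIN ? INT16_MIN : (v > INT16_MAX ? INT16_MAX : v);
}

/* Hypothetical model of pshladd on four signed 2-byte lanes.
   Assumption: count is a small shift amount (0..15 in this sketch). */
static uint64_t pshladd2_model(uint64_t a, unsigned count, uint64_t b)
{
    uint64_t result = 0;
    for (int i = 0; i < 4; i++) {
        int32_t ea = (int16_t)(a >> (16 * i));
        int32_t eb = (int16_t)(b >> (16 * i));
        /* Multiply rather than << so negative lanes are well defined in C. */
        int32_t shifted = sat_s16(ea * (int32_t)(1 << count));
        /* Saturate again after the add. */
        int32_t sum = sat_s16(shifted + eb);
        result |= (uint64_t)(uint16_t)sum << (16 * i);
    }
    return result;
}
```
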
The parallel compare instruction (pcmp) compares the corresponding elements of both
sources using one of two relations (== or >) and writes all ones (if true) or all
zeroes (if false) into the corresponding elements of the target.
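
The all-ones or all-zeroes result pattern is sketched below for 2-byte elements. The names
pcmp2_model and pcmp_rel and the 64-bit register representation are hypothetical, and
treating the > relation as a signed comparison is an assumption that the paragraph above
does not state.

```c
#include <stdint.h>

/* Hypothetical selector for the two relations named in the text. */
enum pcmp_rel { PCMP_EQ, PCMP_GT };

/* Hypothetical model of pcmp on four 2-byte lanes.
   Assumption: the > relation compares lanes as signed values. */
static uint64_t pcmp2_model(uint64_t a, uint64_t b, enum pcmp_rel rel)
{
    uint64_t result = 0;
    for (int i = 0; i < 4; i++) {
        int16_t ea = (int16_t)(a >> (16 * i));
        int16_t eb = (int16_t)(b >> (16 * i));
        int is_true = (rel == PCMP_EQ) ? (ea == eb) : (ea > eb);
        if (is_true)
            result |= (uint64_t)0xFFFF << (16 * i);  /* all ones in lane i */
    }
    return result;
}
```

A result of this shape can be used as a per-element mask, for example by ANDing it with
another value to keep only the elements for which the relation held.
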
The parallel multiply right instruction (pmpy.r) multiplies the corresponding two
even-numbered signed 2-byte elements of both sources and writes the results into two
4-byte elements in the target. The pmpy.l instruction performs a similar operation on
odd-numbered 2-byte elements. The parallel multiply and shift right instruction
(pmpyshr, pmpyshr.u) multiplies the corresponding 2-byte elements of both sources
producing four 4-byte results. The 4-byte results are shifted right by 0, 7, 15, or 16 bits
as specified by the instruction. The least-significant 2 bytes of the 4-byte shifted results
are then stored in the target register.
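
The multiply forms can be sketched in C as shown below. The names lane_s16, pmpy2_model,
and pmpyshr2_model are hypothetical, as is the 64-bit register representation. The sketch
also assumes that element 0 is the least-significant 16 bits (so the even-numbered
elements are 0 and 2) and models only the signed pmpyshr form.

```c
#include <stdint.h>

/* Lane i of a 64-bit value as a signed 2-byte element.
   Assumption: lane 0 is the least-significant 16 bits. */
static int32_t lane_s16(uint64_t v, int i)
{
    return (int16_t)(v >> (16 * i));
}

/* Hypothetical model of pmpy.r (odd = 0) and pmpy.l (odd = 1):
   two 2-byte products widened into two 4-byte result elements. */
static uint64_t pmpy2_model(uint64_t a, uint64_t b, int odd)
{
    uint32_t lo = (uint32_t)(lane_s16(a, odd) * lane_s16(b, odd));
    uint32_t hi = (uint32_t)(lane_s16(a, 2 + odd) * lane_s16(b, 2 + odd));
    return ((uint64_t)hi << 32) | lo;
}

/* Hypothetical model of the signed pmpyshr; count is 0, 7, 15, or 16. */
static uint64_t pmpyshr2_model(uint64_t a, uint64_t b, unsigned count)
{
    uint64_t result = 0;
    for (int i = 0; i < 4; i++) {
        int32_t product = lane_s16(a, i) * lane_s16(b, i);
        /* Assumption: >> on a negative int32_t is an arithmetic shift. */
        uint16_t kept = (uint16_t)(product >> count);  /* low 2 bytes only */
        result |= (uint64_t)kept << (16 * i);
    }
    return result;
}
```
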
The parallel sum of absolute difference instruction (psad) sums the absolute
differences of corresponding 1-byte elements of the sources and writes the total to the
target.
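
A short C sketch of the accumulation follows. The name psad1_model and the 64-bit register
representation are hypothetical, and treating the 1-byte elements as unsigned is an
assumption made for this example.

```c
#include <stdint.h>

/* Hypothetical model of psad: sum of |a[i] - b[i]| over eight 1-byte lanes.
   Assumption: the lanes are unsigned values. */
static uint64_t psad1_model(uint64_t a, uint64_t b)
{
    uint64_t total = 0;
    for (int i = 0; i < 8; i++) {
        int ea = (int)((a >> (8 * i)) & 0xFF);
        int eb = (int)((b >> (8 * i)) & 0xFF);
        total += (uint64_t)(ea > eb ? ea - eb : eb - ea);
    }
    return total;
}
```

With eight lanes of at most 255 each, the total in this sketch never exceeds 2040, so it
fits comfortably in the low bits of the result.
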
The parallel minimum (pmin.u, pmin) and the parallel maximum (pmax.u, pmax)
instructions write the minimum or maximum, respectively, of the corresponding
1-byte or 2-byte source elements to the target. The 1-byte elements are treated as
unsigned values and the 2-byte elements are treated as signed values.
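
The difference between the unsigned 1-byte and signed 2-byte treatment can be seen in the
following C sketch. The names pmin1u_model and pmax2_model and the 64-bit register
representation are assumptions made for this example.

```c
#include <stdint.h>

/* Hypothetical model of pmin.u: minimum of eight unsigned 1-byte lanes. */
static uint64_t pmin1u_model(uint64_t a, uint64_t b)
{
    uint64_t result = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t ea = (uint8_t)(a >> (8 * i));
        uint8_t eb = (uint8_t)(b >> (8 * i));
        result |= (uint64_t)(ea < eb ? ea : eb) << (8 * i);
    }
    return result;
}

/* Hypothetical model of pmax: maximum of four signed 2-byte lanes. */
static uint64_t pmax2_model(uint64_t a, uint64_t b)
{
    uint64_t result = 0;
    for (int i = 0; i < 4; i++) {
        int16_t ea = (int16_t)(a >> (16 * i));
        int16_t eb = (int16_t)(b >> (16 * i));
        int16_t pick = ea > eb ? ea : eb;
        result |= (uint64_t)(uint16_t)pick << (16 * i);
    }
    return result;
}
```

In this sketch the 2-byte pattern 0xFFFF is -1, so pmax2_model(0xFFFF, 0x0001) selects
0x0001, whereas the 1-byte pattern 0xFF is 255 and loses to any smaller value in
pmin1u_model.
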
Table 4-29. Parallel Arithmetic Instructions

Mnemonic              Operation                                               1-byte  2-byte  4-byte
padd                  Parallel modulo addition                                  x       x       x
padd.sss              Parallel addition with signed saturation                  x       x
padd.uuu, padd.uus    Parallel addition with unsigned saturation                x       x
psub                  Parallel modulo subtraction                               x       x       x
psub.sss              Parallel subtraction with signed saturation               x       x
psub.uuu, psub.uus    Parallel subtraction with unsigned saturation             x       x
pavg                  Parallel arithmetic average                               x       x
pavg.raz              Parallel arithmetic average with round away from zero     x       x
pavgsub               Parallel average of a difference                          x       x

Volume 1, Part 1: Application Programming Model
