Contact your local Intel sales office or your distributor to obtain the latest specifications before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or by visiting Intel's website at http://www.intel.com.
Part 1: Application Architecture Guide
1.1.2 Part 2: Optimization Guide for the Intel® Itanium® Architecture
Overview of Volume 2: System Architecture
Intel® Itanium® Architecture Software Developer’s Manual, Rev. 2.3...
IA-32 application interface. This volume also describes optimization techniques used to generate high performance software.
1.1.1 Part 1: Application Architecture Guide
Chapter 1, “About this Manual” provides an overview of all volumes in the Intel® Itanium® Architecture Software Developer’s Manual...
1.2.1 Part 1: System Architecture Guide
Chapter 1, “About this Manual” provides an overview of all volumes in the Intel® Itanium® Architecture Software Developer’s Manual...
Chapter 9, “IA-32 Interruption Vector Descriptions” lists IA-32 exceptions, interrupts and intercepts that can occur during IA-32 instruction set execution in the Itanium System Environment.
Chapter 10, “Itanium® Architecture-based Operating System Interaction Model with IA-32 Applications” defines the operation of IA-32 instructions within the Itanium System Environment from the perspective of an Itanium architecture-based operating system.
Instruction Set Reference
This volume is a comprehensive reference to the Itanium instruction set, including instruction format/encoding.
Chapter 1, “About this Manual” provides an overview of all volumes in the Intel® Itanium® Architecture Software Developer’s Manual.
Chapter 2, “Instruction Reference”...
These resources include instructions and registers.
Itanium Architecture – The new ISA with 64-bit instruction capabilities, new performance-enhancing features, and support for the IA-32 instruction set.
IA-32 Architecture – The 32-bit and 16-bit Intel architecture as described in the Intel® 64 and IA-32 Architectures Software Developer’s Manual.
• Intel® 64 and IA-32 Architectures Software Developer’s Manual – This set of manuals describes the Intel 32-bit architecture. They are available from the Intel Literature Department by calling 1-800-548-4725 and requesting Document Numbers 243190, 243191 and 243192...
Date of Revision Description Revision Number August 2005 Allow register fields in CR.LID register to be read-only and CR.LID checking on interruption messages by processors optional. See Vol 2, Part I, Ch 5 “Interruptions” and Section 11.2.2 PALE_RESET Exit State for details. Relaxed the checking of reserved and ignored fields in IA-32 application registers in Vol 1 Ch 6 and Vol 2, Part I, Ch 10.
Date of Revision Description Revision Number August 2002 Added Predicate Behavior of alloc Instruction Clarification (Section 4.1.2, Part I, Volume 1; Section 2.2, Part I, Volume 3). Added New fc.i Instruction (Section 4.4.6.1, and 4.4.6.2, Part I, Volume 1; Section 4.3.3, 4.4.1, 4.4.5, 4.4.6, 4.4.7, 5.5.2, and 7.1.2, Part I, Volume 2; Section 2.5, 2.5.1, 2.5.2, 2.5.3, and 4.5.2.1, Part II, Volume 2;...
Date of Revision Description Revision Number Volume 2: Class pr-writers-int clarification (Table A-5). PAL_MC_DRAIN clarification (Section 4.4.6.1). VHPT walk and forward progress change (Section 4.1.1.2). IA-32 IBR/DBR match clarification (Section 7.1.1). ISR figure changes (pp. 8-5, 8-26, 8-33 and 8-36). PAL_CACHE_FLUSH return argument change –...
Date of Revision Description Revision Number Volume 2: Clarifications regarding “reserved” fields in ITIR (Chapter 3). Instruction and Data translation must be enabled for executing IA-32 instructions (Chapters 3,4 and 10). FCR/FDR mappings, and clarification to the value of PSR.ri after an RFI (Chapters 3 and 4).
Instruction Reference This chapter describes the function of each Itanium instruction. The pages of this chapter are sorted alphabetically by assembly language mnemonic. Instruction Page Conventions The instruction pages are divided into multiple sections as listed in Table 2-1. The first three sections are present on all instruction pages.
(64-bits not including the NaT bit) where the notation GR[addr] is used. The syntactical differences between the code found in the Operation section and ANSI C are listed in Table 2-4. Table 2-3. Register File Notation Assembly Indirect Register File C Notation Mnemonic Access...
Table 2-5. Pervasive Conditions Not Included in Instruction Description Code Condition Action Read of a register outside the current frame. An undefined value is returned (no fault). Access to a banked general register (GR 16 through GR 31). The GR bank specified by PSR.bn is accessed. PSR.ss is set.
add — Add
Format: (qp) add r1 = r2, r3 register_form
(qp) add r1 = r2, r3, 1 plus1_form, register_form
(qp) add r1 = imm, r3 pseudo-op
(qp) adds r1 = imm14, r3 imm14_form
(qp) addl r1 = imm22, r3 imm22_form
Description: The two source operands (and an optional constant 1) are added and the result placed in GR r1. In the register form the first operand is GR r2;...
addp4 — Add Pointer
Format: (qp) addp4 r1 = r2, r3 register_form
(qp) addp4 r1 = imm14, r3 imm14_form
Description: The two source operands are added. The upper 32 bits of the result are forced to zero, and then bits {31:30} of GR r3 are copied to bits {62:61} of the result. This result is placed in GR r1...
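The pointer arithmetic described for addp4 can be illustrated with a minimal Python sketch. It assumes that the register whose bits {31:30} are copied is the second source, GR r3 (the operand subscripts are lost in this excerpt), and it ignores predication and NaT propagation. The function name `addp4` and the constant `MASK64` are illustrative only.

```python
MASK64 = (1 << 64) - 1


def addp4(src1: int, gr_r3: int) -> int:
    """Model the addp4 result for a first operand and GR r3 (assumed)."""
    total = (src1 + gr_r3) & MASK64
    # Force the upper 32 bits of the sum to zero.
    low32 = total & 0xFFFF_FFFF
    # Copy bits {31:30} of GR r3 into bits {62:61} of the result.
    region_bits = (gr_r3 >> 30) & 0x3
    return low32 | (region_bits << 61)
```

For example, a 32-bit pointer whose top two bits are set lands in the region selected by bits {62:61} of the 64-bit result.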
alloc — Allocate Stack Frame
Format: (qp) alloc r1 = ar.pfs, i, l, o, r
Description: A new stack frame is allocated on the general register stack, and the Previous Function State register (PFS) is copied to GR r1. The change of frame size is immediate. The write of GR r1 and subsequent instructions in the same instruction group use the new frame...
alloc Operation:
// tmp_sof, tmp_sol, tmp_sor are the fields encoded in the instruction
tmp_sof = i + l + o;
tmp_sol = i + l;
tmp_sor = r u>> 3;
check_target_register_sof(r1, tmp_sof);
if (tmp_sof u> 96 || r u> tmp_sof || tmp_sol u> tmp_sof || qp != 0)
    illegal_operation_fault();...
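The frame-field computation at the start of the alloc operation can be sketched in Python as follows. This covers only the arithmetic and the size checks visible in the pseudocode above; the target-register check, the qp test and the actual frame update are omitted. The function name `alloc_frame_fields` and the use of `ValueError` to stand in for an Illegal Operation fault are assumptions made for illustration.

```python
def alloc_frame_fields(i: int, l: int, o: int, r: int) -> tuple[int, int, int]:
    """Return (sof, sol, sor) for the given input/local/output/rotating counts."""
    sof = i + l + o   # size of frame
    sol = i + l       # size of locals (inputs plus locals)
    sor = r >> 3      # rotating region is encoded in units of 8 registers
    if sof > 96 or r > sof or sol > sof:
        # Stand-in for illegal_operation_fault() in the pseudocode.
        raise ValueError("illegal operation: invalid frame sizes")
    return sof, sol, sor
```

A request such as `alloc_frame_fields(2, 3, 4, 8)` yields a 9-register frame with 5 locals and one group of 8 rotating registers, while a frame larger than 96 registers is rejected.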
and — Logical And
Format: (qp) and r1 = r2, r3 register_form
(qp) and r1 = imm8, r3 imm8_form
Description: The two source operands are logically ANDed and the result placed in GR r1. In the register_form the first operand is GR r2; in the imm8_form the first operand is taken from the imm8 encoding field...
andcm — And Complement
Format: (qp) andcm r1 = r2, r3 register_form
(qp) andcm r1 = imm8, r3 imm8_form
Description: The first source operand is logically ANDed with the 1’s complement of the second source operand and the result placed in GR r1. In the register_form the first operand is GR r2;...
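As a small illustration of the and-complement operation, the following Python sketch models 64-bit register values as non-negative integers. The name `andcm` and the constant `MASK64` are illustrative; predication and NaT handling are not modeled.

```python
MASK64 = (1 << 64) - 1


def andcm(first: int, second: int) -> int:
    """AND the first operand with the 1's complement of the second (64-bit)."""
    return first & (~second & MASK64)
```

This is the usual way to clear a set of bits: `andcm(value, mask)` keeps every bit of `value` that is not set in `mask`.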
br — Branch ) br. ip_relative_form Format: btype dh target ) br. call_form, ip_relative_form btype dh b target counted_form, ip_relative_form btype dh target pseudo-op dh target ) br. indirect_form btype dh b ) br. call_form, indirect_form btype dh b pseudo-op dh b A branch condition is evaluated, and either a branch is taken, or execution continues Description:...
the branch condition is simply the value of the specified predicate register. These basic branch types are: • cond: If the qualifying predicate is 1, the branch is taken. Otherwise it is not taken. • call: If the qualifying predicate is 1, the branch is taken and several other actions occur: •...
group as br.ia are not allowed, since br.ia may implicitly read all ARs. If an illegal RAW dependency is present between an AR write and br.ia, the first IA-32 instruction fetch and execution may or may not see the updated AR value. IA-32 instruction set execution leaves the contents of the ALAT undefined.
The modulo-scheduled loop types are: • ctop and cexit: These branch types behave identically, except in the determination of whether to branch or not. For br.ctop, the branch is taken if either LC is non-zero or EC is greater than one. For br.cexit, the opposite is true. It is not taken if either LC is non-zero or EC is greater than one and is taken otherwise.
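The taken conditions for br.ctop and br.cexit can be sketched in Python as below. The taken predicates and the LC decrement follow the text and the Operation pseudocode in this section; the EC decrement when LC is already zero is an assumption drawn from the full instruction description, which is truncated here. Register rotation and the PR[63] update are not modeled, and all function names are illustrative.

```python
def ctop_step(lc: int, ec: int) -> tuple[bool, int, int]:
    """One execution of br.ctop: return (taken, new_lc, new_ec)."""
    taken = lc != 0 or ec > 1
    if lc != 0:
        lc -= 1
    elif ec != 0:
        ec -= 1  # assumption: epilog count drains once LC reaches zero
    return taken, lc, ec


def cexit_taken(lc: int, ec: int) -> bool:
    """br.cexit uses the opposite condition from br.ctop."""
    return not (lc != 0 or ec > 1)


def count_ctop_executions(lc: int, ec: int) -> int:
    """Count how many times br.ctop executes before it falls through."""
    executions = 0
    while True:
        taken, lc, ec = ctop_step(lc, ec)
        executions += 1
        if not taken:
            return executions
```

With LC = 2 and EC = 2, the branch is taken three times and falls through on the fourth execution, so the loop body runs LC + EC times under these assumptions.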
Table 2-7. Branch Whether Hint bwh Completer Branch Whether Hint spnt Static Not-Taken sptk Static Taken dpnt Dynamic Not-Taken dptk Dynamic Taken Table 2-8. Sequential Prefetch Hint ph Completer Sequential Prefetch Hint few or none Few lines many Many lines Table 2-9.
tmp_taken = PR[qp]; if (tmp_taken) { // tmp_growth indicates the amount to move logical TOP *up*: // tmp_growth = sizeof(previous out) - sizeof(current frame) // a negative amount indicates a shrinking stack tmp_growth = (AR[PFS].pfm.sof - AR[PFS].pfm.sol) - CFM.sof; alat_frame_update(-AR[PFS].pfm.sol, 0); rse_fatal = rse_restore_frame(AR[PFS].pfm.sol, tmp_growth, CFM.sof);...
illegal_operation_fault(); tmp_taken = (AR[LC] != 0); if (AR[LC] != 0) AR[LC]--; break; case ‘ctop’: case ‘cexit’: // SW pipelined counted loop if (slot != 2) illegal_operation_fault(); if (btype == ‘ctop’) tmp_taken = ((AR[LC] != 0) || (AR[EC] u> 1)); if (btype == ‘cexit’)tmp_taken = !((AR[LC] != 0) || (AR[EC] u> 1)); if (AR[LC] != 0) { AR[LC]--;...
taken_branch = 1; IP = tmp_IP; // set the new value for IP if (!impl_uia_fault_supported() && ((PSR.it && unimplemented_virtual_address(tmp_IP, PSR.vm)) || (!PSR.it && unimplemented_physical_address(tmp_IP)))) unimplemented_instruction_address_trap(lower_priv_transition, tmp_IP); if (lower_priv_transition && PSR.lp) lower_privilege_transfer_trap(); if (PSR.tb) taken_branch_trap(); Illegal Operation fault Lower-Privilege Transfer trap Interruptions: Disabled Instruction Set Transition fault Taken Branch trap...
break break — Break ) break pseudo-op Format: ) break.i i_unit_form ) break.b b_unit_form ) break.m m_unit_form ) break.f f_unit_form ) break.x x_unit_form A Break Instruction fault is taken. For the i_unit_form, f_unit_form and m_unit_form, Description: the value specified by is zero-extended and placed in the Interruption Immediate control register (IIM).
brl — Branch Long ) brl. Format: btype dh target ) brl. call_form btype dh b target brl. pseudo-op dh target A branch condition is evaluated, and either a branch is taken, or execution continues Description: with the next sequential instruction. The execution of a branch logically follows the execution of all previous non-branch instructions in the same instruction group.
system is required to provide an Illegal Operation fault handler which emulates taken and not-taken long branches. Presence of this instruction is indicated by a 1 in the lb bit of CPUID register 4. See Section 3.1.11, “Processor Identification Registers” on page 1:34.
brp — Branch Predict brp. ip_relative_form Format: ipwh ih target brp. indirect_form indwh ih b brp.ret. return_form, indirect_form indwh ih b This instruction can be used to provide to hardware early information about a future Description: branch. It has no effect on architectural machine state, and operates as a nop instruction except for its performance effects.
bsw — Bank Switch bsw.0 zero_form Format: bsw.1 one_form This instruction switches to the specified register bank. The zero_form specifies Bank 0 Description: for GR16 to GR31. The one_form specifies Bank 1 for GR16 to GR31. After the bank switch the previous register bank is no longer accessible but does retain its current state.
clrrrb clrrrb — Clear RRB clrrrb all_form Format: clrrrb.pr pred_form In the all_form, the register rename base registers (CFM.rrb.gr, CFM.rrb.fr, and Description: CFM.rrb.pr) are cleared. In the pred_form, the single register rename base register for the predicates (CFM.rrb.pr) is cleared. This instruction must be the last instruction in an instruction group;...
clz — Count Leading Zeros
Format: (qp) clz r1 = r3
Description: The number of leading zeros in GR r3 is placed in GR r1. An Illegal Operation fault is raised on processor models that do not support the instruction. CPUID register 4 indicates the presence of the feature on the processor model...
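A leading-zero count over a 64-bit register value can be sketched in Python as follows. The result of 64 for an all-zero input is an assumption (it is the natural count, but the excerpt does not state it), and the function name `clz64` is illustrative.

```python
MASK64 = (1 << 64) - 1


def clz64(value: int) -> int:
    """Number of zero bits above the most significant set bit of a 64-bit value."""
    value &= MASK64
    # int.bit_length() is the position of the highest set bit plus one.
    return 64 - value.bit_length()
```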
cmp — Compare ) cmp. register_form Format: crel ctype p ) cmp. imm8_form crel ctype p ) cmp. = r0, parallel_inequality_form crel ctype p ) cmp. , r0 pseudo-op crel ctype p The two source operands are compared for one of ten relations specified by crel. This Description: produces a boolean result which is 1 if the comparison condition is true, and 0 otherwise.
simply uses the negative relation with an implemented type. The implemented relations and how the pseudo-ops map onto them are shown in Table 2-16 (for normal and unc type compares), and Table 2-17 (for parallel type compares). Table 2-16. 64-bit Comparison Relations for Normal and unc Compares Compare Relation Register Form is a Immediate Form is a...
Operation:
if (PR[qp]) {
    if (p1 == p2)
        illegal_operation_fault();
    tmp_nat = (register_form ? GR[r2].nat : 0) || GR[r3].nat;
    if (register_form) tmp_src = GR[r2];
    else if (imm8_form) tmp_src = sign_ext(imm8, 8);
    else // parallel_inequality_form
        tmp_src = 0;
    if (crel == ‘eq’) tmp_rel = tmp_src == GR[r3];
    else if (crel == ‘ne’) tmp_rel = tmp_src != GR[r3];...
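The first operand selection and the relation test in the cmp operation can be sketched in Python. Only `eq` and `ne` appear in the excerpt above; `lt` (signed) and `ltu` (unsigned) are included as two further examples of the ten relations and are assumptions about the truncated part. NaT handling and the predicate writes selected by ctype are not modeled, and the helper names are illustrative.

```python
MASK64 = (1 << 64) - 1


def sign_ext(value: int, bits: int) -> int:
    """Sign-extend the low `bits` of value to a 64-bit pattern."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return ((value ^ sign_bit) - sign_bit) & MASK64


def to_signed(value: int) -> int:
    """Interpret a 64-bit pattern as a two's complement integer."""
    return value - (1 << 64) if value & (1 << 63) else value


def cmp_rel(crel: str, tmp_src: int, gr_r3: int) -> bool:
    """Evaluate one comparison relation between tmp_src and GR r3."""
    if crel == "eq":
        return tmp_src == gr_r3
    if crel == "ne":
        return tmp_src != gr_r3
    if crel == "lt":   # assumed: signed less-than
        return to_signed(tmp_src) < to_signed(gr_r3)
    if crel == "ltu":  # assumed: unsigned less-than
        return tmp_src < gr_r3
    raise ValueError(f"relation not modeled: {crel}")
```

The difference between signed and unsigned relations shows up with a negative immediate: `sign_ext(0xFF, 8)` is the pattern for -1, which is less than zero when signed but greater than zero when unsigned.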
cmp4 cmp4 — Compare 4 Bytes ) cmp4. register_form Format: crel ctype p ) cmp4. imm8_form crel ctype p ) cmp4. = r0, parallel_inequality_form crel ctype p ) cmp4. , r0 pseudo-op crel ctype p The least significant 32 bits from each of two source operands are compared for one of Description: ten relations specified by crel.
cmpxchg — Compare and Exchange
Format: (qp) cmpxchgsz.sem.ldhint r1 = [r3], r2, ar.ccv
(qp) cmp8xchg16.sem.ldhint r1 = [r3], r2, ar.csd, ar.ccv sixteen_byte_form
Description: A value consisting of sz bytes (8 bytes for cmp8xchg16) is read from memory starting at the address specified by the value in GR r3...
cmpxchg affect program functionality and may be ignored by the implementation. See Section 4.4.6, “Memory Hierarchy Control and Consistency” on page 1:69 for details. For cmp8xchg16, Illegal Operation fault is raised on processor models that do not support the instruction. CPUID register 4 indicates the presence of the feature on the processor model.
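The compare-and-exchange behavior can be sketched in Python over a `bytearray` standing in for memory. The parts beyond the truncated description are assumptions based on the usual semantics of the instruction: the value read is zero-extended and returned, it is compared with AR[CCV], and on a match the low sz bytes of the new value are stored. Little-endian byte order is assumed, atomicity and the acquire/release completers are not modeled, and the function name is illustrative.

```python
def cmpxchg(memory: bytearray, addr: int, new_value: int, ccv: int, size: int) -> int:
    """Return the old memory value; store new_value only if old == ccv."""
    old = int.from_bytes(memory[addr:addr + size], "little")  # zero-extended read
    if old == ccv:
        low_bytes = new_value & ((1 << (8 * size)) - 1)
        memory[addr:addr + size] = low_bytes.to_bytes(size, "little")
    return old
```

A caller detects success by comparing the returned value with the compare value it supplied, which is the basis of the usual retry loop for lock-free updates.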
cover cover — Cover Stack Frame cover Format: A new stack frame of zero size is allocated which does not include any registers from Description: the previous frame (as though all output registers in the previous frame had been locals). The register rename base registers are reset. If interruption collection is disabled (PSR.ic is zero), then the old value of the Current Frame Marker (CFM) is copied to the Interruption Function State register (IFS), and IFS.v is set to one.
czx — Compute Zero Index
Format: (qp) czx1.l r1 = r3 one_byte_form, left_form
(qp) czx1.r r1 = r3 one_byte_form, right_form
(qp) czx2.l r1 = r3 two_byte_form, left_form
(qp) czx2.r r1 = r3 two_byte_form, right_form
Description: GR r3 is scanned for a zero element. The element is either an 8-bit aligned byte (one_byte_form) or a 16-bit aligned pair of bytes (two_byte_form). The index of the first zero element is placed in GR r1...
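The zero-element scan can be sketched in Python as follows. Two points are assumptions because the description is truncated here: the left form is taken to scan from the most significant element (index 0) downward and the right form from the least significant element upward, and the result when no zero element exists is taken to be the element count (8 for bytes, 4 for 16-bit elements). The function name `czx` is illustrative.

```python
def czx(value: int, element_bits: int, left_form: bool) -> int:
    """Index of the first zero element of a 64-bit value, or the element count."""
    count = 64 // element_bits
    mask = (1 << element_bits) - 1
    for index in range(count):
        # Left form starts at the most significant element (assumed).
        position = (count - 1 - index) if left_form else index
        if (value >> (position * element_bits)) & mask == 0:
            return index
    return count  # assumed default when no element is zero
```

This kind of scan is typically used to find the terminating zero of a string eight bytes at a time.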
dep — Deposit
Format: (qp) dep r1 = r2, r3, pos6, len4 merge_form, register_form
(qp) dep r1 = imm1, r3, pos6, len6 merge_form, imm_form
(qp) dep.z r1 = r2, pos6, len6 zero_form, register_form
(qp) dep.z r1 = imm8, pos6, len6 zero_form, imm_form
Description: In the merge_form, a right justified bit field taken from the first source operand is deposited into the value in GR r3 at an arbitrary bit position and the result is placed in GR r1...
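The deposit operation can be sketched in Python with explicit masks. The merge form follows the description above; the zero form is assumed to deposit the field into a value of all zeros, since its description is truncated in this excerpt. The sketch assumes pos + length does not exceed 64, and the function names are illustrative.

```python
MASK64 = (1 << 64) - 1


def dep_merge(source: int, target: int, pos: int, length: int) -> int:
    """Deposit the low `length` bits of source into target at bit `pos`."""
    field_mask = (((1 << length) - 1) << pos) & MASK64
    cleared = target & ~field_mask & MASK64
    return cleared | ((source << pos) & field_mask)


def dep_zero(source: int, pos: int, length: int) -> int:
    """Zero form (assumed): deposit the field into an all-zero value."""
    return dep_merge(source, 0, pos, length)
```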
epc — Enter Privileged Code Format: This instruction increases the privilege level. The new privilege level is given by the TLB Description: entry for the page containing this instruction. This instruction can be used to implement calls to higher-privileged routines without the overhead of an interruption. Before increasing the privilege level, a check is performed.
extr — Extract
Format: (qp) extr r1 = r3, pos6, len6 signed_form
(qp) extr.u r1 = r3, pos6, len6 unsigned_form
Description: A field is extracted from GR r3, either zero extended or sign extended, and placed right-justified in GR r1. The field begins at the bit position given by the second operand and extends len6 bits to the left...
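Field extraction with zero or sign extension can be sketched in Python as below. The sketch assumes the field lies entirely within the 64-bit register (pos + length at most 64); the behavior for fields that run past bit 63 is not modeled. The function name `extr` and its keyword argument are illustrative.

```python
MASK64 = (1 << 64) - 1


def extr(value: int, pos: int, length: int, signed: bool) -> int:
    """Extract `length` bits starting at bit `pos`, right-justified."""
    field_mask = (1 << length) - 1
    field = (value >> pos) & field_mask
    if signed and field & (1 << (length - 1)):
        # Replicate the field's top bit into all higher result bits.
        field |= MASK64 & ~field_mask
    return field
```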
fabs — Floating-point Absolute Value
Format: (qp) fabs f1 = f3 pseudo-op of: (qp) fmerge.s f1 = f0, f3
Description: The absolute value of the value in FR f3 is computed and placed in FR f1. If FR f3 is a NaTVal, FR f1 is set to NaTVal instead of the computed result.
Operation: See “fmerge —...
fadd fadd — Floating-point Add ) fadd. pseudo-op of: ( ) fma. , f1, Format: sf f sf f and FR are added (computed to infinite precision), rounded to the precision Description: indicated by pc (and possibly FPSR.sf.pc and FPSR.sf.wre) using the rounding mode specified by FPSR.sf.rc, and placed in FR .
famax famax — Floating-point Absolute Maximum ) famax. Format: sf f The operand with the larger absolute value is placed in FR . If the magnitude of FR Description: equals the magnitude of FR , FR gets FR If either FR or FR is a NaN, FR gets FR...
famin famin — Floating-point Absolute Minimum ) famin. Format: sf f The operand with the smaller absolute value is placed in FR . If the magnitude of FR Description: equals the magnitude of FR , FR gets FR If either FR or FR is a NaN, FR gets FR...
fand — Floating-point Logical And
Format: (qp) fand f1 = f2, f3
Description: The bit-wise logical AND of the significand fields of FR f2 and FR f3 is computed. The resulting value is stored in the significand field of FR f1. The exponent field of FR f1 is set to the biased exponent for 2.0^63 (0x1003E) and the sign field of FR...
fandcm fandcm — Floating-point And Complement ) fandcm Format: The bit-wise logical AND of the significand field of FR with the bit-wise complemented Description: significand field of FR is computed. The resulting value is stored in the significand field of FR .
fc — Flush Cache ) fc invalidate_line_form Format: ) fc.i instruction_cache_coherent_form In the invalidate_line form, the cache line associated with the address specified by the Description: value of GR r is invalidated from all levels of the processor cache hierarchy. The invalidation is broadcast throughout the coherence domain.
Register NaT Consumption fault Data TLB fault Interruptions: Unimplemented Data Address fault Data Page Not Present fault Data Nested TLB fault Data NaT Page Consumption fault Alternate Data TLB fault Data Access Rights fault VHPT Data fault 3:62 Volume 3: Instruction Reference...
fchkf fchkf — Floating-point Check Flags ) fchkf. Format: sf target The flags in FPSR.sf.flags are compared with FPSR.s0.flags and FPSR.traps. If any flags Description: set in FPSR.sf.flags correspond to FPSR.traps which are enabled, or if any flags set in FPSR.sf.flags are not set in FPSR.s0.flags, then a branch to is taken.
fclass fclass — Floating-point Class ) fclass. Format: fcrel fctype p fclass The contents of FR are classified according to the completer as shown in Description: fclass Table 2-25. This produces a boolean result based on whether the contents of FR agrees with the floating-point number format specified by , as specified by the fclass...
fclrf fclrf — Floating-point Clear Flags ) fclrf. Format: The status field’s 6-bit flags field is reset to zero. Description: The mnemonic values for sf are given in Table 2-23 on page 3:56. Operation: if (PR[qp]) { fp_set_sf_flags(sf, 0); None FP Exceptions: None Interruptions:...
fcmp fcmp — Floating-point Compare ) fcmp. Format: frel fctype sf p The two source operands are compared for one of twelve relations specified by frel. This Description: produces a boolean result which is 1 if the comparison condition is true, and 0 otherwise.
fcmp Operation: if (PR[qp]) { if (p == p illegal_operation_fault(); if (tmp_isrcode = fp_reg_disabled(f , 0, 0)) disabled_fp_register_fault(tmp_isrcode, 0); if (fp_is_natval(FR[f ]) || fp_is_natval(FR[f ])) { PR[p ] = 0; PR[p ] = 0; } else { fcmp_exception_fault_check(f , frel, sf, &tmp_fp_env); if (fp_raise_fault(tmp_fp_env)) fp_exception_fault(fp_decode_fault(tmp_fp_env));...
fcvt.fx fcvt.fx — Convert Floating-point to Integer ) fcvt.fx. signed_form Format: sf f ) fcvt.fx.trunc. signed_form, trunc_form sf f ) fcvt.fxu. unsigned_form sf f ) fcvt.fxu.trunc. unsigned_form, trunc_form sf f is treated as a register format floating-point value and converted to a signed Description: (signed_form) or unsigned integer (unsigned_form) using either the rounding mode specified in the FPSR.sf.rc, or using Round-to-Zero if the trunc_form of the instruction is...
fcvt.xf fcvt.xf — Convert Signed Integer to Floating-point ) fcvt.xf Format: The 64-bit significand of FR is treated as a signed integer and its register file precision Description: floating-point representation is placed in FR If FR is a NaTVal, FR is set to NaTVal instead of the computed result.
fcvt.xuf fcvt.xuf — Convert Unsigned Integer to Floating-point ) fcvt.xuf.pc.sf pseudo-op of: ( ) fma. , f1, f0 Format: sf f is multiplied with FR 1, rounded to the precision indicated by pc (and possibly Description: FPSR.sf.pc and FPSR.sf.wre) using the rounding mode specified by FPSR.sf.rc, and placed in FR Note: Multiplying FR with FR 1 (a 1.0) normalizes the canonical representation of an...
fetchadd fetchadd — Fetch and Add Immediate ) fetchadd4. four_byte_form Format: ldhint r ) fetchadd8. eight_byte_form ldhint r A value consisting of four or eight bytes is read from memory starting at the address Description: specified by the value in GR .
fetchadd Operation: if (PR[qp]) { check_target_register(r if (GR[r ].nat) register_nat_consumption_fault(SEMAPHORE); size = four_byte_form ? 4 : 8; paddr = tlb_translate(GR[r ], size, SEMAPHORE, PSR.cpl, &mattr, &tmp_unused); if (!ma_supports_fetchadd(mattr)) unsupported_data_reference_fault(SEMAPHORE, GR[r if (sem == ‘acq’) val = mem_xchg_add(inc , paddr, size, UM.be, mattr, ACQUIRE, ldhint); else // ‘rel’...
flushrs flushrs — Flush Register Stack flushrs Format: All stacked general registers in the dirty partition of the register stack are written to the Description: backing store before execution continues. The dirty partition contains registers from previous procedure frames that have not yet been saved to the backing store. For a description of the register stack partitions, refer to Chapter 6, “Register Stack Engine”...
fma — Floating-point Multiply Add
Format: (qp) fma.pc.sf f1 = f3, f4, f2
Description: The product of FR f3 and FR f4 is computed to infinite precision and then FR f2 is added to this product, again in infinite precision. The resulting value is then rounded to the precision indicated by pc (and possibly FPSR.sf.pc and FPSR.sf.wre) using the rounding mode specified by FPSR.sf.rc...
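The effect of computing the product and sum to infinite precision and rounding only once can be illustrated in Python with exact rational arithmetic. This is a sketch of the single-rounding idea only: it rounds to IEEE double with round-to-nearest, not to the precision selected by pc or the mode in FPSR.sf.rc, and the function name `fma_single_rounding` is illustrative.

```python
from fractions import Fraction


def fma_single_rounding(a: float, b: float, c: float) -> float:
    """Compute a*b + c exactly, then round once to the nearest double."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


# Operands chosen so that rounding the product first loses the answer.
A = 1.0 + 2.0 ** -30
B = 1.0 - 2.0 ** -30
C = -1.0

fused = fma_single_rounding(A, B, C)  # exact product is 1 - 2**-60
unfused = A * B + C                   # product rounds to 1.0 before the add
```

Here the separately rounded multiply and add return zero, while the single-rounding result keeps the small residual of magnitude 2^-60.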
fmax fmax — Floating-point Maximum ) fmax. Format: sf f The operand with the larger value is placed in FR . If FR equals FR , FR gets FR Description: If either FR or FR is a NaN, FR gets FR If either FR or FR is a NaTVal, FR...
fmerge — Floating-point Merge
Format: (qp) fmerge.ns f1 = f2, f3 neg_sign_form
(qp) fmerge.s f1 = f2, f3 sign_form
(qp) fmerge.se f1 = f2, f3 sign_exp_form
Description: Sign, exponent and significand fields are extracted from FR f2 and FR f3, combined, and the result is placed in FR f1. For the neg_sign_form, the sign of FR f2 is negated and concatenated with the exponent and the significand of FR f3...
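The three merge forms, and the way fabs, fneg and fnegabs reduce to them, can be sketched in Python with a register value modeled as a (sign, exponent, significand) tuple. The neg_sign_form follows the description above; the sign_form and sign_exp_form behavior is assumed from the form names, since the description is truncated. NaTVal handling is not modeled, FR 0 is modeled as +0.0, and all names are illustrative.

```python
from typing import NamedTuple


class FR(NamedTuple):
    sign: int         # 1 bit
    exponent: int     # 17-bit biased exponent
    significand: int  # 64 bits


F0 = FR(sign=0, exponent=0, significand=0)  # FR 0 reads as +0.0


def fmerge_ns(f2: FR, f3: FR) -> FR:
    """Negated sign of f2 with exponent and significand of f3."""
    return FR(f2.sign ^ 1, f3.exponent, f3.significand)


def fmerge_s(f2: FR, f3: FR) -> FR:
    """Sign of f2 with exponent and significand of f3 (assumed)."""
    return FR(f2.sign, f3.exponent, f3.significand)


def fmerge_se(f2: FR, f3: FR) -> FR:
    """Sign and exponent of f2 with significand of f3 (assumed)."""
    return FR(f2.sign, f2.exponent, f3.significand)


def fabs(f3: FR) -> FR:
    return fmerge_s(F0, f3)    # pseudo-op of fmerge.s f1 = f0, f3


def fneg(f3: FR) -> FR:
    return fmerge_ns(f3, f3)   # pseudo-op of fmerge.ns f1 = f3, f3


def fnegabs(f3: FR) -> FR:
    return fmerge_ns(F0, f3)   # pseudo-op of fmerge.ns f1 = f0, f3
```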
fmin fmin — Floating-point Minimum ) fmin. Format: sf f The operand with the smaller value is placed in FR . If FR equals FR , FR gets FR Description: If either FR or FR is a NaN, FR gets FR If either FR or FR is a NaTVal, FR...
fmix — Floating-point Mix
Format: (qp) fmix.l f1 = f2, f3 mix_l_form
(qp) fmix.r f1 = f2, f3 mix_r_form
(qp) fmix.lr f1 = f2, f3 mix_lr_form
Description: For the mix_l_form (mix_r_form), the left (right) single precision value in FR f2 is concatenated with the left (right) single precision value in FR f3. For the mix_lr_form, the left single precision value in FR f2 is concatenated with the right single precision value in FR f3...
fmix Operation:
if (PR[qp]) {
    fp_check_target_register(f1);
    if (tmp_isrcode = fp_reg_disabled(f1, f2, f3, 0))
        disabled_fp_register_fault(tmp_isrcode, 0);
    if (fp_is_natval(FR[f2]) || fp_is_natval(FR[f3])) {
        FR[f1] = NATVAL;
    } else {
        if (mix_l_form) {
            tmp_res_hi = FR[f2].significand{63:32};
            tmp_res_lo = FR[f3].significand{63:32};
        } else if (mix_r_form) {
            tmp_res_hi = FR[f2].significand{31:0};...
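The half selection performed by the three fmix forms can be sketched in Python on 64-bit significand values. The mix_l case follows the pseudocode above; the mix_r and mix_lr cases follow the prose description of the instruction, with the first source assumed to supply the high half of the result. NaTVal handling and the exponent and sign of the result are not modeled, and the function name `fmix` is illustrative.

```python
MASK32 = 0xFFFF_FFFF


def fmix(form: str, sig_f2: int, sig_f3: int) -> int:
    """Return the 64-bit result significand for form 'l', 'r' or 'lr'."""
    left2, right2 = (sig_f2 >> 32) & MASK32, sig_f2 & MASK32
    left3, right3 = (sig_f3 >> 32) & MASK32, sig_f3 & MASK32
    if form == "l":
        res_hi, res_lo = left2, left3
    elif form == "r":
        res_hi, res_lo = right2, right3
    elif form == "lr":
        res_hi, res_lo = left2, right3
    else:
        raise ValueError(f"unknown form: {form}")
    return (res_hi << 32) | res_lo
```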
fmpy fmpy — Floating-point Multiply ) fmpy. pseudo-op of: ( ) fma. , f0 Format: sf f sf f The product FR and FR is computed to infinite precision. The resulting value is then Description: rounded to the precision indicated by pc (and possibly FPSR.sf.pc and FPSR.sf.wre) using the rounding mode specified by FPSR.sf.rc.
fms — Floating-point Multiply Subtract ) fms. Format: sf f The product of FR and FR is computed to infinite precision and then FR Description: subtracted from this product, again in infinite precision. The resulting value is then rounded to the precision indicated by pc (and possibly FPSR.sf.pc and FPSR.sf.wre) using the rounding mode specified by FPSR.sf.rc.
fneg — Floating-point Negate
Format: (qp) fneg f1 = f3 pseudo-op of: (qp) fmerge.ns f1 = f3, f3
Description: The value in FR f3 is negated and placed in FR f1. If FR f3 is a NaTVal, FR f1 is set to NaTVal instead of the computed result.
Operation: See “fmerge —...
fnegabs — Floating-point Negate Absolute Value
Format: (qp) fnegabs f1 = f3 pseudo-op of: (qp) fmerge.ns f1 = f0, f3
Description: The absolute value of the value in FR f3 is computed, negated, and placed in FR f1. If FR f3 is a NaTVal, FR f1 is set to NaTVal instead of the computed result.
Operation: See “fmerge —...
fnma fnma — Floating-point Negative Multiply Add ) fnma. Format: sf f The product of FR and FR is computed to infinite precision, negated, and then FR Description: is added to this product, again in infinite precision. The resulting value is then rounded to the precision indicated by pc (and possibly FPSR.sf.pc and FPSR.sf.wre) using the rounding mode specified by FPSR.sf.rc.
fnmpy fnmpy — Floating-point Negative Multiply ) fnmpy. pseudo-op of: ( ) fnma. Format: sf f sf f The product FR and FR is computed to infinite precision and then negated. The Description: resulting value is then rounded to the precision indicated by pc (and possibly FPSR.sf.pc and FPSR.sf.wre) using the rounding mode specified by FPSR.sf.rc.
fnorm fnorm — Floating-point Normalize ) fnorm. pseudo-op of: ( ) fma. , f1, f0 Format: sf f sf f is normalized and rounded to the precision indicated by pc (and possibly Description: FPSR.sf.pc and FPSR.sf.wre) using the rounding mode specified by FPSR.sf.rc, and placed in FR If FR is a NaTVal, FR...
for — Floating-point Logical Or
Format: (qp) for f1 = f2, f3
Description: The bit-wise logical OR of the significand fields of FR f2 and FR f3 is computed. The resulting value is stored in the significand field of FR f1. The exponent field of FR f1 is set to the biased exponent for 2.0^63 (0x1003E) and the sign field of FR...
fpabs fpabs — Floating-point Parallel Absolute Value ) fpabs pseudo-op of: ( ) fpmerge.s = f0, Format: The absolute values of the pair of single precision values in the significand field of FR Description: are computed and stored in the significand field of FR .
fpack fpack — Floating-point Pack ) fpack pack_form Format: The register format numbers in FR and FR are converted to single precision memory Description: format. These two single precision numbers are concatenated and stored in the significand field of FR .
fpamax fpamax — Floating-point Parallel Absolute Maximum ) fpamax. Format: sf f The paired single precision values in the significands of FR and FR are compared. Description: The operands with the larger absolute value are returned in the significand field of FR If the magnitude of high (low) FR is less than the magnitude of high (low) FR , high...
fpamin fpamin — Floating-point Parallel Absolute Minimum ) fpamin. Format: sf f The paired single precision values in the significands of FR or FR are compared. The Description: operands with the smaller absolute value is returned in the significand of FR If the magnitude of high (low) FR is less than the magnitude of high (low) FR , high...
fpcmp — Floating-point Parallel Compare
Format: (qp) fpcmp.frel.sf f1 = f2, f3
Description: The two pairs of single precision source operands in the significand fields of FR f2 and FR f3 are compared for one of twelve relations specified by frel. This produces a boolean result which is a mask of 32 1’s if the comparison condition is true, and a mask of 32 0’s otherwise...
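The mask-producing parallel compare can be sketched in Python by decoding each 32-bit half of the significand as an IEEE single. Only three of the twelve relations (`eq`, `lt`, `le`) are modeled, the first source is assumed to be the left-hand side of each relation, and exception flags, status fields and NaTVal handling are omitted. The helper names are illustrative.

```python
import operator
import struct

MASK32 = 0xFFFF_FFFF

# Subset of relations modeled here; Python comparisons with NaN are False,
# which matches the ordered behavior of these three relations.
RELATIONS = {"eq": operator.eq, "lt": operator.lt, "le": operator.le}


def bits_to_single(bits: int) -> float:
    """Decode a 32-bit pattern as an IEEE single precision value."""
    return struct.unpack(">f", bits.to_bytes(4, "big"))[0]


def fpcmp(frel: str, sig_f2: int, sig_f3: int) -> int:
    """Return a 64-bit value holding a 32-bit all-ones or all-zeros mask per half."""
    relation = RELATIONS[frel]
    result = 0
    for shift in (32, 0):  # high half, then low half
        left = bits_to_single((sig_f2 >> shift) & MASK32)
        right = bits_to_single((sig_f3 >> shift) & MASK32)
        if relation(left, right):
            result |= MASK32 << shift
    return result
```

The resulting masks can be combined with the logical instructions on significands (for example fand and fandcm) to select between values without branching.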
fpcvt.fx fpcvt.fx — Convert Parallel Floating-point to Integer ) fpcvt.fx. signed_form Format: sf f ) fpcvt.fx.trunc. signed_form, trunc_form sf f ) fpcvt.fxu. unsigned_form sf f ) fpcvt.fxu.trunc. unsigned_form, trunc_form sf f The pair of single precision values in the significand field of FR is converted to a pair Description: of 32-bit signed integers (signed_form) or unsigned integers (unsigned_form) using...
fpcvt.fx Operation: if (PR[qp]) { fp_check_target_register(f if (tmp_isrcode = fp_reg_disabled(f , 0, 0)) disabled_fp_register_fault(tmp_isrcode, 0); if (fp_is_natval(FR[f ])) { FR[f ] = NATVAL; fp_update_psr(f } else { tmp_default_result_pair = fpcvt_exception_fault_check(f signed_form, trunc_form, sf, &tmp_fp_env); if (fp_raise_fault(tmp_fp_env)) fp_exception_fault(fp_decode_fault(tmp_fp_env)); if (fp_is_nan(tmp_default_result_pair.hi)) { tmp_res_hi = INTEGER_INDEFINITE_32_BIT;...
fpma fpma — Floating-point Parallel Multiply Add ) fpma. Format: sf f The pair of products of the pairs of single precision values in the significand fields of FR Description: and FR are computed to infinite precision and then the pair of single precision values in the significand field of FR is added to these products, again in infinite precision.
fpma Operation: if (PR[qp]) { fp_check_target_register(f if (tmp_isrcode = fp_reg_disabled(f disabled_fp_register_fault(tmp_isrcode, 0); if (fp_is_natval(FR[f ]) || fp_is_natval(FR[f ]) || fp_is_natval(FR[f ])) { FR[f ] = NATVAL; fp_update_psr(f } else { tmp_default_result_pair = fpma_exception_fault_check(f , sf, &tmp_fp_env); if (fp_raise_fault(tmp_fp_env)) fp_exception_fault(fp_decode_fault(tmp_fp_env)); if (fp_is_nan_or_inf(tmp_default_result_pair.hi)) { tmp_res_hi = fp_single(tmp_default_result_pair.hi);...
fpmax fpmax — Floating-point Parallel Maximum ) fpmax. Format: sf f The paired single precision values in the significands of FR or FR are compared. The Description: operands with the larger value is returned in the significand of FR If the value of high (low) FR is less than the value of high (low) FR , high (low) FR gets high (low) FR...
fpmerge fpmerge — Floating-point Parallel Merge ) fpmerge.ns neg_sign_form Format: ) fpmerge.s sign_form ) fpmerge.se sign_exp_form For the neg_sign_form, the signs of the pair of single precision values in the significand Description: field of FR are negated and concatenated with the exponents and the significands of the pair of single precision values in the significand field of FR and stored in the significand field of FR...
fpmin fpmin — Floating-point Parallel Minimum ) fpmin. Format: sf f The paired single precision values in the significands of FR or FR are compared. The Description: operands with the smaller value is returned in significand of FR If the value of high (low) FR is less than the value of high (low) FR , high (low) FR gets high (low) FR...
fpmpy fpmpy — Floating-point Parallel Multiply ) fpmpy. pseudo-op of: ( ) fpma. , f0 Format: sf f sf f The pair of products of the pairs of single precision values in the significand fields of FR Description: and FR are computed to infinite precision.
fpms fpms — Floating-point Parallel Multiply Subtract ) fpms. Format: sf f The pair of products of the pairs of single precision values in the significand fields of FR Description: and FR are computed to infinite precision and then the pair of single precision values in the significand field of FR is subtracted from these products, again in infinite precision.
fpneg fpneg — Floating-point Parallel Negate ) fpneg pseudo-op of: ( ) fpmerge.ns Format: The pair of single precision values in the significand field of FR are negated and stored Description: in the significand field of FR . The exponent field of FR is set to the biased exponent for 2.0 (0x1003E) and the sign field of FR...
Page 128
fpnegabs — Floating-point Parallel Negate Absolute Value
Format: (qp) fpnegabs f1 = f3    pseudo-op of: (qp) fpmerge.ns f1 = f0, f3
Description: The absolute values of the pair of single precision values in the significand field of FR f3 are computed, negated and stored in the significand field of FR f1.
Page 129
fpnma — Floating-point Parallel Negative Multiply Add
Format: (qp) fpnma.sf f1 = f3, f4, f2
Description: The pair of products of the pairs of single precision values in the significand fields of FR f3 and FR f4 are computed to infinite precision, negated, and then the pair of single precision values in the significand field of FR f2 are added to these (negated) products, again in infinite precision.
Page 130
fpnma Operation: if (PR[qp]) { fp_check_target_register(f if (tmp_isrcode = fp_reg_disabled(f disabled_fp_register_fault(tmp_isrcode, 0); if (fp_is_natval(FR[f ]) || fp_is_natval(FR[f ]) || fp_is_natval(FR[f ])) { FR[f ] = NATVAL; fp_update_psr(f } else { tmp_default_result_pair = fpms_fpnma_exception_fault_check(f , sf, &tmp_fp_env); if (fp_raise_fault(tmp_fp_env)) fp_exception_fault(fp_decode_fault(tmp_fp_env)); if (fp_is_nan_or_inf(tmp_default_result_pair.hi)) { tmp_res_hi = fp_single(tmp_default_result_pair.hi);...
Page 131
fpnmpy — Floating-point Parallel Negative Multiply
Format: (qp) fpnmpy.sf f1 = f3, f4    pseudo-op of: (qp) fpnma.sf f1 = f3, f4, f0
Description: The pair of products of the pairs of single precision values in the significand fields of FR f3 and FR f4 are computed to infinite precision and then negated. The resulting values are then rounded to single precision using the rounding mode specified by FPSR.sf.rc.
Page 132
fprcpa — Floating-point Parallel Reciprocal Approximation
Format: (qp) fprcpa.sf f1, p2 = f2, f3
Description: If PR qp is 0, PR p2 is cleared and FR f1 remains unchanged.
If PR qp is 1, the following will occur:
• Each half of the significand of FR f1 is either set to an approximation (with a relative error < 2^-8.886...
Page 135
fprsqrta — Floating-point Parallel Reciprocal Square Root Approximation
Format: (qp) fprsqrta.sf f1, p2 = f3
Description: If PR qp is 0, PR p2 is cleared and FR f1 remains unchanged.
If PR qp is 1, the following will occur:
• Each half of the significand of FR f1 is either set to an approximation (with a relative error < 2^-8.831...
Page 137
frcpa — Floating-point Reciprocal Approximation
Format: (qp) frcpa.sf f1, p2 = f2, f3
Description: If PR qp is 0, PR p2 is cleared and FR f1 remains unchanged.
If PR qp is 1, the following will occur:
• FR f1 is either set to an approximation (with a relative error < 2^-8.886) of the reciprocal of FR f3, or to the IEEE-754 mandated quotient of FR...
Page 140
frsqrta — Floating-point Reciprocal Square Root Approximation
Format: (qp) frsqrta.sf f1, p2 = f3
Description: If PR qp is 0, PR p2 is cleared and FR f1 remains unchanged.
If PR qp is 1, the following will occur:
• FR f1 is either set to an approximation (with a relative error < 2^-8.831) of the reciprocal square root of FR f3, or set to the IEEE-754 mandated square root of FR...
Page 143
fselect — Floating-point Select
Format: (qp) fselect f1 = f3, f4, f2
Description: The significand field of FR f3 is logically AND-ed with the significand field of FR f2 and the significand field of FR f4 is logically AND-ed with the one’s complement of the significand field of FR f2.
Page 144
fsetc — Floating-point Set Controls
Format: (qp) fsetc.sf amask7, omask7
Description: The status field’s control bits are initialized to the value obtained by logically AND-ing the sf0.controls and amask7 immediate field and logically OR-ing the omask7 immediate field.
Page 145
fsub — Floating-point Subtract
Format: (qp) fsub.pc.sf f1 = f3, f2    pseudo-op of: (qp) fms.pc.sf f1 = f3, f1, f2
Description: FR f2 is subtracted from FR f3 (computed to infinite precision), rounded to the precision indicated by pc (and possibly FPSR.sf.pc and FPSR.sf.wre) using the rounding mode specified by FPSR.sf.rc, and placed in FR f1. If either FR f3 or FR...
fswap — Floating-point Swap
Format: (qp) fswap f1 = f2, f3    swap_form
        (qp) fswap.nl f1 = f2, f3    swap_nl_form
        (qp) fswap.nr f1 = f2, f3    swap_nr_form
Description: For the swap_form, the left single precision value in FR f2 is concatenated with the right single precision value in FR f3. The concatenated pair is then swapped. For the swap_nl_form, the left single precision value in FR f2 is concatenated with the right single precision value in FR...
fsxt — Floating-point Sign Extend
Format: (qp) fsxt.l f1 = f2, f3    sxt_l_form
        (qp) fsxt.r f1 = f2, f3    sxt_r_form
Description: For the sxt_l_form (sxt_r_form), the sign of the left (right) single precision value in FR f2 is extended to 32 bits and is concatenated with the left (right) single precision value in FR f3. For all forms, the exponent field of FR f1 is set to the biased exponent for 2.0...
Page 150
fwb — Flush Write Buffers
Format: (qp) fwb
Description: The processor is instructed to expedite flushing of any pending stores held in write or coalescing buffers. Since this operation is a hint, the processor may or may not take any action and actually flush any outstanding stores. The processor gives no indication when flushing of any prior stores is completed.
Page 151
fxor — Floating-point Exclusive Or
Format: (qp) fxor f1 = f2, f3
Description: The bit-wise logical exclusive-OR of the significand fields of FR f2 and FR f3 is computed. The resulting value is stored in the significand field of FR f1. The exponent field of FR f1 is set to the biased exponent for 2.0 (0x1003E) and the sign field of FR f1 is set to...
getf — Get Floating-point Value or Exponent or Significand
Format: (qp) getf.s r1 = f2    single_form
        (qp) getf.d r1 = f2    double_form
        (qp) getf.exp r1 = f2    exponent_form
        (qp) getf.sig r1 = f2    significand_form
Description: In the single and double forms, the value in FR f2 is converted into a single precision (single_form) or double precision (double_form) memory representation and placed in GR r1, as shown in Figure 5-7...
hint hint — Performance Hint ) hint pseudo-op Format: (qp) hint.i i_unit_form (qp) hint.b b_unit_form (qp) hint.m m_unit_form (qp) hint.f f_unit_form (qp) hint.x x_unit_form Provides a performance hint to the processor about the program being executed. It has Description: no effect on architectural machine state, and operates as a nop instruction except for its performance effects.
Page 155
invala — Invalidate ALAT
Format: (qp) invala    complete_form
        (qp) invala.e r1    gr_form, entry_form
        (qp) invala.e f1    fr_form, entry_form
Description: The selected entry or entries in the ALAT are invalidated.
In the complete_form, all ALAT entries are invalidated. In the entry_form, the ALAT is queried using the general register specifier r1 (gr_form), or the floating-point register specifier f1...
Page 156
itc — Insert Translation Cache
Format: (qp) itc.i r2    instruction_form
        (qp) itc.d r2    data_form
Description: An entry is inserted into the instruction or data translation cache. GR r2 specifies the physical address portion of the translation. ITIR specifies the protection key, page size and additional information.
Page 157
Operation: if (PR[qp]) { if (!followed_by_stop()) undefined_behavior(); if (PSR.ic) illegal_operation_fault(); if (PSR.cpl != 0) privileged_operation_fault(0); if (GR[r ].nat) register_nat_consumption_fault(0); tmp_size = CR[ITIR].ps; tmp_va = CR[IFA]{60:0}; tmp_rid = RR[CR[IFA]{63:61}].rid; tmp_va = align_to_size_boundary(tmp_va, tmp_size); if (is_reserved_field(TLB_TYPE, GR[r ], CR[ITIR])) reserved_register_field_fault(); if (!impl_check_mov_ifa() && unimplemented_virtual_address(CR[IFA], PSR.vm)) unimplemented_data_address_fault(0);...
Page 158
itr — Insert Translation Register
Format: (qp) itr.i itr[r3] = r2    instruction_form
        (qp) itr.d dtr[r3] = r2    data_form
Description: A translation is inserted into the instruction or data translation register specified by the contents of GR r3. GR r2 specifies the physical address portion of the translation. ITIR specifies the protection key, page size and additional information.
Page 159
Interruptions: Machine Check abort, Illegal Operation fault, Privileged Operation fault, Register NaT Consumption fault, Reserved Register/Field fault, Unimplemented Data Address fault, Virtualization fault
Serialization: For the instruction_form, software must issue an instruction serialization operation before a dependent instruction fetch access. For the data_form, software must issue a data serialization operation before issuing a data access or non-access reference dependent on the new translation.
Table 2-33. Load Types (Continued) ldtype Interpretation Special Load Operation Completer Speculative An entry is added to the ALAT, and certain exceptions may be deferred. Advanced load Deferral causes the target register’s NaT bit to be set, and the processor ensures that no ALAT entry exists for the target register. The absence of an ALAT entry is later used to detect deferral or collision.
Page 162
Table 2-34. Load Hints (Continued) ldhint Completer Interpretation No temporal locality, level 1 No temporal locality, all levels In the no_base_update form, the value in GR r is not modified and no prefetch hint is implied. For the base update forms, specifying the same register address in r and r will cause an Illegal Operation fault.
Page 164
val = mem_read(paddr, size, UM.be, mattr, otype, bias | ldhint); if (check_clear || advanced) // remove any old ALAT entry alat_inval_single_entry(GENERAL, r if (defer) { if (speculative) { GR[r ] = natd_gr_read(paddr, size, UM.be, mattr, otype, bias | ldhint); GR[r ].nat = 1;...
Page 165
Illegal Operation fault Data NaT Page Consumption fault Interruptions: Register NaT Consumption fault Data Key Miss fault Unimplemented Data Address fault Data Key Permission fault Data Nested TLB fault Data Access Rights fault Alternate Data TLB fault Data Access Bit fault VHPT Data fault Data Debug fault Data TLB fault...
ldf — Floating-point Load ) ldf no_base_update_form Format: fldtype ldhint f ) ldf reg_base_update_form fldtype ldhint f ) ldf imm_base_update_form fldtype ldhint f ) ldf8. integer_form, no_base_update_form fldtype ldhint f ) ldf8. integer_form, reg_base_update_form fldtype ldhint f ) ldf8. integer_form, imm_base_update_form fldtype ldhint f (qp) ldf.fill.
Page 167
Table 2-36. FP Load Types (Continued) fldtype Interpretation Special Load Operation Completer Speculative An entry is added to the ALAT, and certain exceptions may be deferred. Advanced load Deferral causes NaTVal to be placed in the target register, and the processor ensures that no ALAT entry exists for the target register.
lfetch lfetch — Line Prefetch (qp) lfetch.lftype.lfhint [r no_base_update_form Format: ) lfetch. reg_base_update_form lftype lfhint ) lfetch. imm_base_update_form lftype lfhint ) lfetch. .excl. no_base_update_form, exclusive_form lftype lfhint ) lfetch. .excl. reg_base_update_form, exclusive_form lftype lfhint ) lfetch. .excl. imm_base_update_form, exclusive_form lftype lfhint The line containing the address specified by the value in GR is moved to the highest...
lfetch Table 2-38. lfhint Mnemonic Values lfhint Mnemonic Interpretation none Temporal locality, level 1 No temporal locality, level 1 No temporal locality, level 2 No temporal locality, all levels A faulting lfetch to an unimplemented address results in an Unimplemented Data Address fault.
Page 175
lfetch Operation: if (PR[qp]) { itype = READ|NON_ACCESS; itype |= (lftype == ‘fault’) ? LFETCH_FAULT : LFETCH; if (reg_base_update_form || imm_base_update_form) check_target_register(r if (lftype == ‘fault’) { // faulting form if (GR[r ].nat && !PSR.ed) // fault on NaT address register_nat_consumption_fault(itype);...
Page 176
loadrs — Load Register Stack
Format: loadrs
Description: This instruction ensures that a specified number of bytes (register values and/or NaT collections) below the current BSP have been loaded from the backing store into the stacked general registers. The loaded registers are placed into the dirty partition of the register stack.
Page 177
mf — Memory Fence
Format: (qp) mf    ordering_form
        (qp) mf.a    acceptance_form
Description: This instruction forces ordering between prior and subsequent memory accesses. The ordering_form ensures all prior data memory accesses are made visible prior to any subsequent data memory accesses being made visible. It does not ensure prior data memory references have been accepted by the external platform, nor that prior data memory references are visible.
Page 178
mix — Mix
Format: (qp) mix1.l r1 = r2, r3    one_byte_form, left_form
        (qp) mix2.l r1 = r2, r3    two_byte_form, left_form
        (qp) mix4.l r1 = r2, r3    four_byte_form, left_form
        (qp) mix1.r r1 = r2, r3    one_byte_form, right_form
        (qp) mix2.r r1 = r2, r3    two_byte_form, right_form
        (qp) mix4.r r1 = r2, r3    four_byte_form, right_form
Description: The data elements of GR r2 and GR r3 are mixed as shown in Figure 2-25, and the result placed in GR r1.
Figure 2-25. Mix Examples (panels: mix1.l, mix1.r, mix2.l, mix2.r, mix4.l, mix4.r; each shows GR r2 and GR r3 combined into GR r1)...
Page 182
mov ar Operation: if (PR[qp]) { tmp_type = (i_form ? AR_I_TYPE : AR_M_TYPE); if (is_reserved_reg(tmp_type, ar illegal_operation_fault(); if (from_form) { check_target_register(r if (((ar == BSPSTORE) || (ar == RNAT)) && (AR[RSC].mode != 0)) illegal_operation_fault(); if ((ar == ITC || ar == RUC) &&...
mov br mov — Move Branch Register ) mov from_form Format: ) mov pseudo-op ) mov. to_form ih b ) mov.ret. return_form, to_form ih b The source operand is copied to the destination register. Description: In the from_form, the branch register specified by is copied into GR .
Page 184
mov cr
mov — Move Control Register
Format: (qp) mov r1 = cr3    from_form
        (qp) mov cr3 = r2    to_form
Description: The source operand is copied to the destination register.
For the from_form, the control register specified by cr3 is read and the value copied into GR r1.
For the to_form, GR r2 is read and the value copied into CR cr3.
Control registers can only be accessed at the most privileged level, and when PSR.vm is 0.
Page 185
mov cr
last_IP = tmp_val;
Interruptions: Illegal Operation fault, Privileged Operation fault, Register NaT Consumption fault, Reserved Register/Field fault, Unimplemented Data Address fault, Virtualization fault
Serialization: Reads of control registers reflect the results of all prior instruction groups and interruptions. In general, writes to control registers do not immediately affect subsequent instructions.
Page 186
mov fr
mov — Move Floating-point Register
Format: (qp) mov f1 = f3    pseudo-op of: (qp) fmerge.s f1 = f3, f3
Description: The value of FR f3 is copied to FR f1.
Operation: See “fmerge — Floating-point Merge” on page 3:80.
Volume 3: Instruction Reference 3:177...
Page 187
mov gr
mov — Move General Register
Format: (qp) mov r1 = r3    pseudo-op of: (qp) adds r1 = 0, r3
Description: The value of GR r3 is copied to GR r1.
Operation: See “add — Add” on page 3:14.
Page 188
mov imm
mov — Move Immediate
Format: (qp) mov r1 = imm22    pseudo-op of: (qp) addl r1 = imm22, r0
Description: The immediate value, imm22, is sign extended to 64 bits and placed in GR r1.
Operation: See “add — Add” on page 3:14.
mov indirect
mov — Move Indirect Register
Format: (qp) mov r1 = ireg[r3]    from_form
        (qp) mov ireg[r3] = r2    to_form
Description: The source operand is copied to the destination register.
For move from indirect register, GR r3 is read and the value used as an index into the register file specified by ireg (see Table 2-40...
Page 190
mov indirect if (from_form) { check_target_register(r if (PSR.cpl != 0 && !(ireg == PMD_TYPE || ireg == CPUID_TYPE)) privileged_operation_fault(0); if (GR[r ].nat) register_nat_consumption_fault(0); if (is_reserved_reg(ireg, tmp_index)) reserved_register_field_fault(); if (PSR.vm == 1 && ireg != PMD_TYPE) virtualization_fault(); if (ireg == PMD_TYPE) { if ((PSR.cpl != 0) &&...
Page 191
mov indirect case PMD_TYPE: pmd_write(tmp_index, tmp_val); break; case RR_TYPE: RR[tmp_index]= tmp_val; break; Illegal Operation fault Reserved Register/Field fault Interruptions: Privileged Operation fault Virtualization fault Register NaT Consumption fault For move to data breakpoint registers, software must issue a data serialize operation Serialization: before issuing a memory reference dependent on the modified register.
Page 192
mov ip
mov — Move Instruction Pointer
Format: (qp) mov r1 = ip
Description: The Instruction Pointer (IP) for the bundle containing this instruction is copied into GR r1.
Operation: if (PR[qp]) {
    check_target_register(r1);
    GR[r1] = IP;
    GR[r1].nat = 0;
}
Interruptions: Illegal Operation fault
Page 193
mov pr
mov — Move Predicates
Format: (qp) mov r1 = pr    from_form
        (qp) mov pr = r2, mask17    to_form
        (qp) mov pr.rot = imm44    to_rotate_form
Description: The source operand is copied to the destination register.
For moving the predicates to a GR, PR i is copied to bit position i within GR r1.
For moving to the predicates, the source can either be a general register, or an immediate value.
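The from_form mapping (PR i copied to bit position i) can be sketched in Python. This is an illustrative model only: the function name `predicates_to_gr` and the list-of-booleans representation of the 64 predicate registers are assumptions made for this example, not part of the architecture definition.

```python
def predicates_to_gr(predicates):
    """Pack 64 predicate values into one integer, PR i -> bit i.

    `predicates` is a hypothetical stand-in for the predicate
    register file: a sequence of 64 truthy/falsy values.
    """
    if len(predicates) != 64:
        raise ValueError("expected exactly 64 predicate values")
    value = 0
    for i, pred in enumerate(predicates):
        if pred:
            value |= 1 << i  # PR i lands in bit position i
    return value
```

With only the first and last predicates set, the sketch yields a value with bits 0 and 63 set.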
Page 194
mov psr
mov — Move Processor Status Register
Format: (qp) mov r1 = psr    from_form
        (qp) mov psr.l = r2    to_form
Description: The source operand is copied to the destination register. See Section 3.3.2, “Processor Status Register (PSR)” on page 2:23.
For move from processor status register, PSR bits {36:35} and {31:0} are read, and copied into GR r1.
Page 195
mov um
mov — Move User Mask
Format: (qp) mov r1 = psr.um    from_form
        (qp) mov psr.um = r2    to_form
Description: The source operand is copied to the destination register.
For move from user mask, PSR{5:0} is read, zero-extended, and copied into GR r1.
For move to user mask, PSR{5:0} is written by bits {5:0} of GR r2.
Page 196
movl — Move Long Immediate
Format: (qp) movl r1 = imm64
Description: The immediate value imm64 is copied to GR r1. The L slot of the bundle contains 41 bits of imm64.
Operation: if (PR[qp]) {
    check_target_register(r1);
    GR[r1] = imm64;
    GR[r1].nat = 0;
}
Interruptions: Illegal Operation fault
Page 197
mpy4 — Unsigned Integer Multiply
Format: (qp) mpy4 r1 = r2, r3
Description: The lower 32 bits of each of the two source operands are treated as unsigned values and are multiplied, and the result is placed in GR r1. The upper 32 bits of each of the source operands are ignored...
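As a rough illustration of the description above, the following Python sketch models the multiply on plain integers standing in for 64-bit register values. The function name `mpy4` and the integer representation of registers are assumptions made for this example.

```python
MASK32 = 0xFFFFFFFF

def mpy4(src_a: int, src_b: int) -> int:
    """Multiply the low 32 bits of each source as unsigned values.

    The upper 32 bits of both sources are ignored. The product of two
    32-bit values always fits in 64 bits, so no truncation is needed.
    """
    return (src_a & MASK32) * (src_b & MASK32)
```

For example, `mpy4(0xFFFFFFFF00000003, 0x1234567800000005)` only sees the low halves 3 and 5.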
Page 198
mpyshl4 — Unsigned Integer Shift Left and Multiply
Format: (qp) mpyshl4 r1 = r2, r3
Description: The upper 32 bits of GR r2 and the lower 32 bits of GR r3 are treated as unsigned values and are multiplied. The result of the multiplication is shifted left 32 bits, with the vacated bit positions filled with zeroes, and the result is placed in GR r1.
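The behavior described above can be modeled in a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, registers are modeled as plain integers, and the first argument is taken to be the operand whose upper 32 bits are used. The helper `mul64_low` shows one plausible use, combining a low-half product with two shifted cross products to form the low 64 bits of a full 64-bit multiply; that usage is an illustration, not something stated in the text above.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def mpyshl4(src_hi: int, src_lo: int) -> int:
    """Upper 32 bits of src_hi times lower 32 bits of src_lo, shifted left 32."""
    product = ((src_hi >> 32) & MASK32) * (src_lo & MASK32)
    # Shift left by 32 with zero fill; bits pushed past bit 63 are dropped.
    return (product << 32) & MASK64

def mul64_low(a: int, b: int) -> int:
    """Low 64 bits of a * b, built from one low product and two cross products."""
    low = (a & MASK32) * (b & MASK32)
    return (low + mpyshl4(a, b) + mpyshl4(b, a)) & MASK64
```

The high-by-high partial product never contributes to the low 64 bits, which is why only three terms are needed in `mul64_low`.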
mux — Mux
Format: (qp) mux1 r1 = r2, mbtype4    one_byte_form
        (qp) mux2 r1 = r2, mhtype8    two_byte_form
Description: A permutation is performed on the packed elements in a single source register, GR r2, and the result is placed in GR r1. For 8-bit elements, only some of all possible permutations can be specified...
For 16-bit elements, all possible permutations, with and without repetitions, can be specified. They are expressed with an 8-bit mhtype field, which encodes the indices of the four 16-bit data elements. The indexed 16-bit elements of GR r2 are copied to corresponding 16-bit positions in the target register GR r1...
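The 16-bit permutation can be sketched in Python as follows. The function name `mux2` and the integer register model are assumptions for illustration. The sketch also assumes that result element i takes its 2-bit source index from bits {2i+1:2i} of the 8-bit field; the truncated text above does not spell out that bit order.

```python
def mux2(src: int, mhtype: int) -> int:
    """Permute the four 16-bit elements of src according to an 8-bit index field."""
    if not 0 <= mhtype <= 0xFF:
        raise ValueError("mhtype must fit in 8 bits")
    # Split the 64-bit source into four 16-bit elements, element 0 lowest.
    elems = [(src >> (16 * i)) & 0xFFFF for i in range(4)]
    result = 0
    for pos in range(4):
        index = (mhtype >> (2 * pos)) & 0x3  # assumed encoding: 2 bits per position
        result |= elems[index] << (16 * pos)
    return result
```

Under this encoding, `0xE4` is the identity permutation, `0x1B` reverses the four elements, and `0x00` broadcasts element 0 to every position (a permutation with repetition).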
Page 202
nop — No Operation
Format: (qp) nop imm21    pseudo-op
        (qp) nop.i imm21    i_unit_form
        (qp) nop.b imm21    b_unit_form
        (qp) nop.m imm21    m_unit_form
        (qp) nop.f imm21    f_unit_form
        (qp) nop.x imm62    x_unit_form
Description: No operation is done.
The immediate, imm21 or imm62, can be used by software as a marker in program code. It is ignored by hardware.
Page 203
or — Logical Or
Format: (qp) or r1 = r2, r3    register_form
        (qp) or r1 = imm8, r3    imm8_form
Description: The two source operands are logically ORed and the result placed in GR r1. In the register form the first operand is GR r2; in the immediate form the first operand is taken from the imm8 encoding field.
pack — Pack
Format: (qp) pack2.sss r1 = r2, r3    two_byte_form, signed_saturation_form
        (qp) pack2.uss r1 = r2, r3    two_byte_form, unsigned_saturation_form
        (qp) pack4.sss r1 = r2, r3    four_byte_form, signed_saturation_form
Description: 32-bit or 16-bit elements from GR r2 and GR r3 are converted into 16-bit or 8-bit elements respectively, and the results are placed in GR r1.
Page 205
pack Operation: if (PR[qp]) { check_target_register(r if (two_byte_form) { if (signed_saturation_form) { max = sign_ext(0x7f, 8); min = sign_ext(0x80, 8); } else { // unsigned_saturation_form max = 0xff; min = 0x00; temp[0] = sign_ext(GR[r ]{15:0}, 16); temp[1] = sign_ext(GR[r ]{31:16}, 16); temp[2] = sign_ext(GR[r ]{47:32}, 16);...
pavg — Parallel Average
Format: (qp) pavg1 r1 = r2, r3    normal_form, one_byte_form
        (qp) pavg1.raz r1 = r2, r3    raz_form, one_byte_form
        (qp) pavg2 r1 = r2, r3    normal_form, two_byte_form
        (qp) pavg2.raz r1 = r2, r3    raz_form, two_byte_form
Description: The unsigned data elements of GR r2 are added to the unsigned data elements of GR r3. The results of the add are then each independently shifted to the right by one bit position.
Figure 2-31. Parallel Average with Round Away from Zero Example (pavg2.raz: the 16-bit elements of GR r2 and GR r3 are summed with carry, then each sum is shifted right 1 bit into GR r1)
pavgsub — Parallel Average Subtract
Format: (qp) pavgsub1 r1 = r2, r3    one_byte_form
        (qp) pavgsub2 r1 = r2, r3    two_byte_form
Description: The unsigned data elements of GR r3 are subtracted from the unsigned data elements of GR r2. The results of the subtraction are then each independently shifted to the right by one bit position.
pcmp — Parallel Compare
Format: (qp) pcmp1.prel r1 = r2, r3    one_byte_form
        (qp) pcmp2.prel r1 = r2, r3    two_byte_form
        (qp) pcmp4.prel r1 = r2, r3    four_byte_form
Description: The two source operands are compared for one of the two relations shown in Table 2-45. If the comparison condition is true for corresponding data elements of GR r2 and GR r3, then the corresponding data element in GR r1 is set to all ones.
Page 218
pmax — Parallel Maximum
Format: (qp) pmax1.u r1 = r2, r3    one_byte_form
        (qp) pmax2 r1 = r2, r3    two_byte_form
Description: The maximum of the two source operands is placed in the result register. In the one_byte_form, each unsigned 8-bit element of GR r2 is compared with the corresponding unsigned 8-bit element of GR r3 and the greater of the two is placed in the corresponding 8-bit element of GR r1.
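The one_byte_form can be illustrated with a short Python model. The function name `pmax1_u` and the use of integers for register contents are assumptions made for the example; the two_byte_form, whose description is truncated above, is not modeled.

```python
def pmax1_u(src_a: int, src_b: int) -> int:
    """Element-wise maximum of eight unsigned 8-bit lanes."""
    result = 0
    for lane in range(8):
        shift = 8 * lane
        a = (src_a >> shift) & 0xFF
        b = (src_b >> shift) & 0xFF
        # The greater of the two lanes goes to the same lane of the result.
        result |= max(a, b) << shift
    return result
```

Each lane is decided independently, so the result can mix bytes from both sources.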
Page 220
pmin — Parallel Minimum
Format: (qp) pmin1.u r1 = r2, r3    one_byte_form
        (qp) pmin2 r1 = r2, r3    two_byte_form
Description: The minimum of the two source operands is placed in the result register. In the one_byte_form, each unsigned 8-bit element of GR r2 is compared with the corresponding unsigned 8-bit element of GR r3 and the smaller of the two is placed in the corresponding 8-bit element of GR r1.
pmpy — Parallel Multiply
Format: (qp) pmpy2.r r1 = r2, r3    right_form
        (qp) pmpy2.l r1 = r2, r3    left_form
Description: Two signed 16-bit data elements of GR r2 are multiplied by the corresponding two signed 16-bit data elements of GR r3 as shown in Figure 2-36. The two 32-bit results are placed in GR r1.
Figure 2-36...
pmpyshr — Parallel Multiply and Shift Right
Format: (qp) pmpyshr2 r1 = r2, r3, count2    signed_form
        (qp) pmpyshr2.u r1 = r2, r3, count2    unsigned_form
Description: The four 16-bit data elements of GR r2 are multiplied by the corresponding four 16-bit data elements of GR r3 as shown in Figure 2-37.
Page 225
popcnt — Population Count
Format: (qp) popcnt r1 = r3
Description: The number of bits in GR r3 having the value 1 is counted, and the resulting sum is placed in GR r1.
Operation: if (PR[qp]) {
    check_target_register(r1);
    res = 0;
    // Count up all the one bits
    for (i = 0;...
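Because the operation pseudocode is cut off above, the counting loop can be sketched in Python instead. The function name `popcnt` and the 64-iteration loop over an integer are assumptions that follow the description (count the bits with value 1 in a 64-bit register); predication and target-register checks are left out.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def popcnt(value: int) -> int:
    """Count the bits set to 1 in a 64-bit value."""
    value &= MASK64  # model a 64-bit general register
    count = 0
    for i in range(64):
        count += (value >> i) & 1  # add one for every set bit
    return count
```

For instance, `popcnt(0b1011)` counts three set bits.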
Page 226
probe probe — Probe Access ) probe.r regular_form, read_form, register_form Format: ) probe.w regular_form, write_form, register_form ) probe.r regular_form, read_form, immediate_form ) probe.w regular_form, write_form, immediate_form ) probe.r.fault fault_form, read_form, immediate_form ) probe.w.fault fault_form, write_form, immediate_form ) probe.rw.fault fault_form, read_write_form, immediate_form This instruction determines whether read or write access, with a specified privilege Description: level, to a given virtual address is permitted.
When PSR.vm is 1, this instruction may optionally raise Virtualization faults; see Section 11.7.4.2.8, “Probe Instruction Virtualization” on page 2:344 for details. Please refer to the Intel® Itanium® Software Conventions and Runtime Architecture Guide for usage information of the probe instruction.
psad — Parallel Sum of Absolute Difference
Format: (qp) psad1 r1 = r2, r3
Description: The unsigned 8-bit elements of GR r2 are subtracted from the unsigned 8-bit elements of GR r3. The absolute value of each difference is accumulated across the elements and placed in GR r1.
Figure 2-38...
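The sum of absolute differences described above can be modeled in Python as a loop over the eight byte lanes. The function name `psad1` and the integer register model are assumptions for illustration only.

```python
def psad1(src_a: int, src_b: int) -> int:
    """Sum of absolute differences over eight unsigned 8-bit lanes."""
    total = 0
    for lane in range(8):
        shift = 8 * lane
        a = (src_a >> shift) & 0xFF
        b = (src_b >> shift) & 0xFF
        # Each lane of src_a is subtracted from the matching lane of
        # src_b; the absolute value is accumulated.
        total += abs(b - a)
    return total
```

The largest possible result is 8 × 255 = 2040, so the sum always fits comfortably in the destination register.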
pshl — Parallel Shift Left
Format: (qp) pshl2 r1 = r2, r3    two_byte_form, variable_form
        (qp) pshl2 r1 = r2, count5    two_byte_form, fixed_form
        (qp) pshl4 r1 = r2, r3    four_byte_form, variable_form
        (qp) pshl4 r1 = r2, count5    four_byte_form, fixed_form
Description: The data elements of GR r2 are each independently shifted to the left by the scalar shift count in GR r3, or in the immediate field count5...
Page 232
pshladd — Parallel Shift Left and Add
Format: (qp) pshladd2 r1 = r2, count2, r3
Description: The four signed 16-bit data elements of GR r2 are each independently shifted to the left by count2 bits (shifting zeros into the low-order bits), and added to the four signed 16-bit data elements of GR r3.
Page 235
pshradd — Parallel Shift Right and Add
Format: (qp) pshradd2 r1 = r2, count2, r3
Description: The four signed 16-bit data elements of GR r2 are each independently shifted to the right by count2 bits, and added to the four signed 16-bit data elements of GR r3.
Page 238
psub max = sign_ext(0x7fff, 16); min = sign_ext(0x8000, 16); for (i = 0; i < 4; i++) { temp[i] = sign_ext(x[i], 16) - sign_ext(y[i], 16); } else if (uus_saturation_form) { // uus_saturation_form max = 0xffff; min = 0x0000; for (i = 0; i < 4; i++) { temp[i] = zero_ext(x[i], 16) - sign_ext(y[i], 16);...
Page 239
ptc.e — Purge Translation Cache Entry
Format: (qp) ptc.e r3
Description: One or more translation entries are purged from the local processor’s instruction and data translation cache. Translation Registers and the VHPT are not modified.
The number of translation cache entries purged is implementation specific. Some implementations may purge all levels of the translation cache hierarchy with one iteration of ptc.e, while other implementations may require several iterations to flush all levels, sets and associativities of both instruction and data translation caches.
Page 240
ptc.g, ptc.ga — Purge Global Translation Cache
Format: (qp) ptc.g r3, r2    global_form
        (qp) ptc.ga r3, r2    global_alat_form
Description: The instruction and data translation cache for each processor in the local TLB coherence domain are searched for all entries whose virtual address and page size partially or completely overlap the specified purge virtual address and purge address range.
Page 241
ptc.g, ptc.ga Operation: if (PR[qp]) { if (!followed_by_stop()) undefined_behavior(); if (PSR.cpl != 0) privileged_operation_fault(0); if (GR[r ].nat || GR[r ].nat) register_nat_consumption_fault(0); if (unimplemented_virtual_address(GR[r ], PSR.vm)) unimplemented_data_address_fault(0); if (PSR.vm == 1) virtualization_fault(); tmp_rid = RR[GR[r ]{63:61}].rid; tmp_va = GR[r ]{60:0}; tmp_size = GR[r ]{7:2};...
Page 242
ptc.l — Purge Local Translation Cache
Format: (qp) ptc.l r3, r2
Description: The instruction and data translation cache of the local processor is searched for all entries whose virtual address and page size partially or completely overlap the specified purge virtual address and purge address range. All these entries are removed.
The purge virtual address is specified by GR r3 bits{60:0} and the purge region identifier is selected by GR...
Page 243
ptr — Purge Translation Register
Format: (qp) ptr.d r3, r2    data_form
        (qp) ptr.i r3, r2    instruction_form
Description: In the data form of this instruction, the data translation registers and caches are searched for all entries whose virtual address and page size partially or completely overlap the specified purge virtual address and purge address range.
Page 244
Operation: if (PR[qp]) { if (PSR.cpl != 0) privileged_operation_fault(0); if (GR[r ].nat || GR[r ].nat) register_nat_consumption_fault(0); if (unimplemented_virtual_address(GR[r ], PSR.vm)) unimplemented_data_address_fault(0); if (PSR.vm == 1) virtualization_fault(); tmp_rid = RR[GR[r ]{63:61}].rid; tmp_va = GR[r ]{60:0}; tmp_size = GR[r ]{7:2}; tmp_va = align_to_size_boundary(tmp_va, tmp_size); if (data_form) { tlb_must_purge_dtr_entries(tmp_rid, tmp_va, tmp_size);...
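The bit-field extraction in the pseudocode above (region index from bits {63:61}, virtual address from bits {60:0}, size from bits {7:2}) can be sketched in Python. The function name `decode_purge_operands` is hypothetical, and the sketch assumes that `align_to_size_boundary` clears the low `size` bits of the address, meaning the size field is treated as a base-2 logarithm of the page size. The region register lookup, privilege checks, and faults are not modeled.

```python
def decode_purge_operands(addr_reg: int, size_reg: int):
    """Split purge operands into (region index, aligned address, size field)."""
    region_index = (addr_reg >> 61) & 0x7        # bits {63:61} select a region register
    virtual_addr = addr_reg & ((1 << 61) - 1)    # bits {60:0}
    size_log2 = (size_reg >> 2) & 0x3F           # bits {7:2}
    # Assumed meaning of align_to_size_boundary: round the address
    # down to a multiple of 2**size_log2.
    aligned = virtual_addr & ~((1 << size_log2) - 1)
    return region_index, aligned, size_log2
```

With a size field of 12 (a 4 KB range under the stated assumption), the low 12 address bits are dropped before the purge range is compared.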
Page 245
rfi — Return From Interruption
Format: rfi
Description: The machine context prior to an interruption is restored. PSR is restored from IPSR, IPSR is unmodified, and IP is restored from IIP. Execution continues at the bundle address loaded into the IP, and the instruction slot loaded into PSR.ri.
This instruction must be immediately followed by a stop;...
Page 246
If IPSR.is is 1, software must set other IPSR fields properly for IA-32 instruction set execution; otherwise processor operation is undefined. See Table 3-2, “Processor Status Register Fields” on page 2:24 for details. Software must issue an mf instruction before this instruction if memory ordering is required between IA-32 processor-consistent and Itanium unordered memory references.
Page 247
//instruction set execution. } else { //return to Itanium instruction set tmp_IP = CR[IIP] & ~0xf; slot = CR[IPSR].ri; if ((CR[IPSR].it && unimplemented_virtual_address(tmp_IP, IPSR.vm)) || (!CR[IPSR].it && unimplemented_physical_address(tmp_IP))) unimplemented_address = 1; if (CR[IFS].v) { tmp_growth = -CFM.sof; alat_frame_update(-CR[IFS].ifm.sof, 0); rse_restore_frame(CR[IFS].ifm.sof, tmp_growth, CFM.sof); CFM = CR[IFS].ifm;...
Page 248
rsm — Reset System Mask
Format: (qp) rsm imm24
Description: The complement of the imm24 operand is ANDed with the system mask (PSR{23:0}) and the result is placed in the system mask. See Section 3.3.2, “Processor Status Register (PSR)” on page 2:23.
The PSR system mask can only be written at the most privileged level, and when PSR.vm is 0.
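The mask arithmetic described above amounts to clearing every system-mask bit that is set in the operand. A minimal Python sketch, assuming a plain integer model of the PSR and a hypothetical function name `rsm`, looks like this; privilege checks, reserved-field faults, and serialization requirements are deliberately left out.

```python
SYSTEM_MASK = 0xFFFFFF  # PSR{23:0}

def rsm(psr: int, imm24: int) -> int:
    """Clear the system-mask bits of psr that are set in imm24."""
    if not 0 <= imm24 <= SYSTEM_MASK:
        raise ValueError("operand must fit in 24 bits")
    # AND the system mask with the complement of the operand;
    # PSR bits above bit 23 are left untouched.
    return psr & ~imm24
```

For example, passing an operand with only bit 14 set clears PSR bit 14 and nothing else.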
Page 249
    if (imm24{21}) PSR{21} = 0; // pp
    if (imm24{22}) PSR{22} = 0; // di
    if (imm24{23}) PSR{23} = 0; // si
Interruptions: Privileged Operation fault, Reserved Register/Field fault, Virtualization fault
Serialization: Software must use a data serialize or instruction serialize operation before issuing instructions dependent upon the altered PSR bits –...
Page 250
rum — Reset User Mask
Format: (qp) rum imm24
Description: The complement of the imm24 operand is ANDed with the user mask (PSR{5:0}) and the result is placed in the user mask. See Section 3.3.2, “Processor Status Register (PSR)” on page 2:23.
PSR.up is only cleared if the secure performance monitor bit (PSR.sp) is zero.
setf — Set Floating-point Value, Exponent, or Significand
Format: (qp) setf.s f1 = r2    single_form
        (qp) setf.d f1 = r2    double_form
        (qp) setf.exp f1 = r2    exponent_form
        (qp) setf.sig f1 = r2    significand_form
Description: In the single and double forms, GR r2 is treated as a single precision (in the single_form) or double precision (in the double_form) memory representation, converted into floating-point register format, and placed in FR f1, as shown in Figure 5-4...
Page 252
setf Operation: if (PR[qp]) { fp_check_target_register(f if (tmp_isrcode = fp_reg_disabled(f , 0, 0, 0)) disabled_fp_register_fault(tmp_isrcode, 0); if (!GR[r ].nat) { if (single_form) FR[f ] = fp_mem_to_fr_format(GR[r ], 4, 0); else if (double_form) FR[f ] = fp_mem_to_fr_format(GR[r ], 8, 0); else if (significand_form) { FR[f ].significand = GR[r FR[f...
Page 253
shl — Shift Left
Format: (qp) shl r1 = r2, r3
        (qp) shl r1 = r2, count6    pseudo-op of: (qp) dep.z r1 = r2, count6, 64-count6
Description: The value in GR r2 is shifted to the left, with the vacated bit positions filled with zeroes, and placed in GR r1.
Page 254
shladd — Shift Left and Add
Format: (qp) shladd r1 = r2, count2, r3
Description: The first source operand is shifted to the left by count2 bits and then added to the second source operand and the result placed in GR r1. The first operand can be shifted by 1, 2, 3, or 4 bits.
shladdp4 — Shift Left and Add Pointer
Format: (qp) shladdp4 r1 = r2, count2, r3
Description: The first source operand is shifted to the left by count2 bits and then is added to the second source operand. The upper 32 bits of the result are forced to zero, and then bits {31:30} of GR r3 are copied to bits {62:61} of the result.
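A small Python model makes the shift-and-add arithmetic concrete. The function name `shladd` and the integer register model are assumptions for this sketch; wrapping the sum to 64 bits reflects the register width rather than anything stated in the description above.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def shladd(first: int, count: int, second: int) -> int:
    """Shift `first` left by 1 to 4 bits, then add `second` (64-bit wrap)."""
    if count not in (1, 2, 3, 4):
        raise ValueError("shift count must be 1, 2, 3, or 4")
    return ((first << count) + second) & MASK64
```

A typical use of this pattern is scaled indexing: `shladd(index, 3, base)` gives the address of element `index` in an array of 8-byte items starting at `base`.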
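The pointer form can be sketched in Python as a 32-bit add followed by moving two bits into the upper part of the result. The function name `shladdp4` is hypothetical, registers are modeled as integers, and the sketch assumes that bits {31:30} are taken from the second source operand.

```python
MASK32 = 0xFFFFFFFF

def shladdp4(first: int, count: int, second: int) -> int:
    """Shift-left-and-add with 32-bit pointer extension."""
    if count not in (1, 2, 3, 4):
        raise ValueError("shift count must be 1, 2, 3, or 4")
    # Add, then force the upper 32 bits of the result to zero.
    result = ((first << count) + second) & MASK32
    # Copy bits {31:30} of the (assumed) second operand to bits {62:61}.
    region_bits = (second >> 30) & 0x3
    return result | (region_bits << 61)
```

The effect is that a 32-bit pointer keeps its top two bits as a region selector in the 64-bit address.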
Page 256
shr — Shift Right
Format: (qp) shr r1 = r3, r2    signed_form
        (qp) shr.u r1 = r3, r2    unsigned_form
        (qp) shr r1 = r3, count6    pseudo-op of: (qp) extr r1 = r3, count6, 64-count6
        (qp) shr.u r1 = r3, count6    pseudo-op of: (qp) extr.u r1 = r3, count6, 64-count6
Description: The value in GR r3 is shifted to the right and placed in GR r1.
Page 257
shrp — Shift Right Pair
Format: (qp) shrp r1 = r2, r3, count6
Description: The two source operands, GR r2 and GR r3, are concatenated to form a 128-bit value and shifted to the right count6 bits. The least-significant 64 bits of the result are placed in GR r1.
The immediate value count6 can be any number in the range 0 to 63.
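The 128-bit concatenate-and-shift can be modeled directly with Python's arbitrary-precision integers. The function name `shrp` is an assumption, as is the choice that the first source operand forms the upper 64 bits of the concatenated value.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def shrp(src_hi: int, src_lo: int, count: int) -> int:
    """Shift the 128-bit pair {src_hi, src_lo} right and keep the low 64 bits."""
    if not 0 <= count <= 63:
        raise ValueError("count must be in the range 0 to 63")
    combined = ((src_hi & MASK64) << 64) | (src_lo & MASK64)
    return (combined >> count) & MASK64
```

Passing the same value as both sources turns the operation into a 64-bit rotate right, as in `shrp(x, x, 8)`.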
srlz — Serialize
Format: (qp) srlz.i    instruction_form
        (qp) srlz.d    data_form
Description: Instruction serialization (srlz.i) ensures:
• prior modifications to processor register resources that affect fetching of subsequent instruction groups are observed,
• prior modifications to processor register resources that affect subsequent execution or data memory accesses are observed,
•...
ssm — Set System Mask
Format: (qp) ssm imm24
Description: The imm24 operand is ORed with the system mask (PSR{23:0}) and the result is placed in the system mask. See Section 3.3.2, “Processor Status Register (PSR)” on page 2:23. The PSR system mask can only be written at the most privileged level, and when PSR.vm is 0.
st — Store ) st normal_form, no_base_update_form Format: sttype sthint ) st normal_form, imm_base_update_form sttype sthint ) st16. , ar.csd sixteen_byte_form, no_base_update_form sttype sthint ) st8.spill. spill_form, no_base_update_form sthint ) st8.spill. spill_form, imm_base_update_form sthint A value consisting of the least significant sz bytes of the value in GR is written to Description: memory starting at the address specified by the value in GR...
For the sixteen_byte_form, Illegal Operation fault is raised on processor models that do not support the instruction. CPUID register 4 indicates the presence of the feature on the processor model. See Section 3.1.11, “Processor Identification Registers” on page 1:34 for details. Table 2-51.
Data TLB fault
Unaligned Data Reference fault
Data Page Not Present fault
Unsupported Data Reference fault
Data NaT Page Consumption fault
stf — Floating-point Store ) stf normal_form, no_base_update_form Format: sthint ) stf normal_form, imm_base_update_form sthint ) stf8. integer_form, no_base_update_form sthint ) stf8. integer_form, imm_base_update_form sthint ) stf.spill. spill_form, no_base_update_form sthint ) stf.spill. spill_form, imm_base_update_form sthint A value, consisting of fsz bytes, is generated from the value in FR and written to Description: memory starting at the address specified by the value in GR...
sub — Subtract
Format: (qp) sub r1 = r2, r3    register_form
        (qp) sub r1 = r2, r3, 1    minus1_form, register_form
        (qp) sub r1 = imm8, r3    imm8_form
Description: The second source operand (and an optional constant 1) are subtracted from the first operand and the result placed in GR r1. In the register form the first operand is GR r2;...
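The register and minus1 forms can be sketched as below. Only the behavior stated above is modeled; the immediate form is left out because its description is cut off in this excerpt, and the function name is illustrative.

```python
MASK64 = (1 << 64) - 1

def sub(first, second, minus1=False):
    """first - second, minus an optional constant 1, wrapped to 64 bits."""
    return (first - second - (1 if minus1 else 0)) & MASK64

print(sub(5, 3), sub(5, 3, minus1=True), hex(sub(0, 1)))
```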
sum — Set User Mask
Format: (qp) sum imm24
Description: The imm24 operand is ORed with the user mask (PSR{5:0}) and the result is placed in the user mask. See Section 3.3.2, “Processor Status Register (PSR)” on page 2:23. PSR.up can only be set if the secure performance monitor bit (PSR.sp) is zero. Otherwise PSR.up is not modified.
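The OR-into-mask behavior, including the PSR.up restriction, might be modeled as follows. This is a sketch under stated assumptions: the bit positions for PSR.up (bit 2) and PSR.sp (bit 20) are not given in this excerpt and are passed as overridable defaults, and operand bits outside PSR{5:0} are simply ignored here rather than faulting.

```python
USER_MASK = 0x3F  # PSR{5:0}

def set_user_mask(psr, imm, up_bit=2, sp_bit=20):
    """OR imm into the user mask; leave PSR.up alone when PSR.sp is 1."""
    bits = imm & USER_MASK  # simplification: bits outside the user mask are dropped
    if (psr >> sp_bit) & 1:
        bits &= ~(1 << up_bit)  # assumed positions, see the lead-in
    return psr | bits

print(hex(set_user_mask(0, 0b000110)))
print(hex(set_user_mask(1 << 20, 0b000110)))
```

Because the operation is an OR, it can only set mask bits; clearing them requires a different instruction.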
sxt — Sign Extend
Format: (qp) sxtxsz r1 = r3
Description: The value in GR r3 is sign extended from the bit position specified by xsz and the result is placed in GR r1. The mnemonic values for xsz are given in Table 2-52.
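Sign extension and its zero-extending counterpart (zxt, described later in this reference) can be sketched side by side. The sketch assumes xsz is a byte count of 1, 2, or 4, matching the sxt1, sxt2, and sxt4 mnemonics; Table 2-52 itself is not reproduced in this excerpt, and the function names are illustrative.

```python
MASK64 = (1 << 64) - 1

def sxt(value, xsz):
    """Sign extend the low xsz bytes of value to 64 bits."""
    if xsz not in (1, 2, 4):
        raise ValueError("xsz must be 1, 2, or 4")
    bits = 8 * xsz
    field = value & ((1 << bits) - 1)
    if field >> (bits - 1):
        field |= MASK64 & ~((1 << bits) - 1)
    return field

def zxt(value, xsz):
    """Zero extend the low xsz bytes of value to 64 bits."""
    if xsz not in (1, 2, 4):
        raise ValueError("xsz must be 1, 2, or 4")
    return value & ((1 << (8 * xsz)) - 1)

print(hex(sxt(0x80, 1)), hex(zxt(0xFFFF_FFFF_FFFF_FF80, 1)))
```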
sync — Memory Synchronization
Format: (qp) sync.i
Description: sync.i ensures that when previously initiated Flush Cache (fc, fc.i) operations issued by the local processor become visible to local data memory references, prior Flush Cache operations are also observed by the local processor instruction fetch stream. sync.i also ensures that at the time previously initiated Flush Cache (fc, fc.i) operations are observed on a remote processor by data memory references they are also observed by instruction memory references on the remote processor.
tak — Translation Access Key
Format: (qp) tak r1 = r3
Description: The protection key for a given virtual address is obtained and placed in GR r1. When PSR.dt is 1, the DTLB and the VHPT are searched for the virtual address specified by GR r3 and the region register indexed by GR r3 bits {63:61}.
tbit tbit — Test Bit ) tbit. Format: trel ctype p The bit specified by the immediate is selected from GR r . The selected bit forms a Description: single bit result either complemented or not depending on the trel completer. This result is written to the two predicate register destinations .
tbit Operation: if (PR[qp]) { if (p == p illegal_operation_fault(); if (trel == ‘nz’) // ‘nz’ - test for 1 tmp_rel = GR[r ]{pos else // ‘z’ - test for 0 tmp_rel = !GR[r ]{pos switch (ctype) { case ‘and’: // and-type compare if (GR[r ].nat || !tmp_rel) {...
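The bit-selection step at the start of this operation can be sketched on its own. Only the computation of tmp_rel is modeled; the ctype handling and the NaT checks are omitted because the pseudo-code is cut off in this excerpt. The function name is illustrative.

```python
def tbit_rel(value, pos, trel):
    """Return the single-bit test result: 'nz' tests for 1, 'z' tests for 0."""
    if not 0 <= pos <= 63:
        raise ValueError("pos must be in the range 0 to 63")
    bit = (value >> pos) & 1
    if trel == "nz":
        return bit == 1
    if trel == "z":
        return bit == 0
    raise ValueError("trel must be 'z' or 'nz'")

print(tbit_rel(0b1000, 3, "nz"), tbit_rel(0b1000, 3, "z"))
```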
tf — Test Feature ) tf. Format: trel ctype p value (in the range of 32-63) selects the feature bit defined in Table 2-57 to be Description: tested from the features vector in CPUID[4]. See Section 3.1.11, “Processor Identification Registers” on page 1:34 for details on CPUID registers.
Operation: if (PR[qp]) { if (p == p illegal_operation_fault(); tmp_rel = (psr.vm && pal_vp_env_enabled() && VAC.a_tf) ? vcpuid[4]{imm5} : cpuid[4]{imm5}; if (trel == ‘z’) // ‘z’ - test for 0, not 1 tmp_rel = !tmp_rel; switch (ctype) { case ‘and’: // and-type compare if (!tmp_rel) { PR[p...
thash — Translation Hashed Entry Address
Format: (qp) thash r1 = r3
Description: A Virtual Hashed Page Table (VHPT) entry address is generated based on the specified virtual address and the result is placed in GR r1. The virtual address is specified by GR r3 and the region register selected by GR r3 bits {63:61}.
tnat tnat — Test NaT ) tnat. Format: trel ctype p The NaT bit from GR forms a single bit result, either complemented or not depending Description: on the trel completer. This result is written to the two predicate register destinations, .
tnat Operation: if (PR[qp]) { if (p == p illegal_operation_fault(); if (trel == ‘nz’) // ‘nz’ - test for 1 tmp_rel = GR[r ].nat; else // ‘z’ - test for 0 tmp_rel = !GR[r ].nat; switch (ctype) { case ‘and’: // and-type compare if (!tmp_rel) { PR[p...
tpa — Translate to Physical Address
Format: (qp) tpa r1 = r3
Description: The physical address for the virtual address specified by GR r3 is obtained and placed in GR r1. When PSR.dt is 1, the DTLB and the VHPT are searched for the virtual address specified by GR r3 and the region register indexed by GR r3 bits {63:61}.
ttag — Translation Hashed Entry Tag
Format: (qp) ttag r1 = r3
Description: A tag used for matching during searches of the long format Virtual Hashed Page Table (VHPT) is generated and placed in GR r1. The virtual address is specified by GR r3 and the region register selected by GR r3 bits {63:61}.
unpack unpack — Unpack ) unpack1.h one_byte_form, high_form Format: ) unpack2.h two_byte_form, high_form ) unpack4.h four_byte_form, high_form ) unpack1.l one_byte_form, low_form ) unpack2.l two_byte_form, low_form ) unpack4.l four_byte_form, low_form The data elements of GR are unpacked, and the result placed in GR .
unpack Figure 2-45. Unpack Operation GR r GR r unpack1.h GR r GR r GR r unpack1.l GR r GR r GR r unpack2.h GR r GR r GR r unpack2.l GR r GR r GR r unpack4.h GR r GR r GR r unpack4.l...
vmsw — Virtual Machine Switch
Format: vmsw.0    zero_form
        vmsw.1    one_form
Description: This instruction sets the PSR.vm bit to the specified value. This instruction can be used to implement transitions to/from virtual machine mode without the overhead of an interruption. If instruction address translation is enabled and the page containing the vmsw instruction has access rights equal to 7, then the new value is written to the PSR.vm bit.
xchg — Exchange
Format: (qp) xchgsz.ldhint r1 = [r3], r2
Description: A value consisting of sz bytes is read from memory starting at the address specified by the value in GR r3. The least significant sz bytes of the value in GR r2 are written to memory starting at the address specified by the value in GR r3.
xma — Fixed-Point Multiply Add
Format: (qp) xma.l f1 = f3, f4, f2    low_form
        (qp) xma.lu f1 = f3, f4, f2    pseudo-op of: (qp) xma.l f1 = f3, f4, f2
        (qp) xma.h f1 = f3, f4, f2    high_form
        (qp) xma.hu f1 = f3, f4, f2    high_unsigned_form
Description: Two source operands (FR f3 and FR f4) are treated as either signed or unsigned integers and multiplied. The third source operand (FR f2) is zero extended and added to the product.
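The multiply-add can be modeled on plain 64-bit integers. This sketch assumes that the low form returns the low 64 bits and the high forms return the upper 64 bits of the 128-bit sum, which the form names suggest but this excerpt does not spell out; the floating-point register significand handling is not modeled, and the function names are illustrative.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1

def to_signed64(x):
    """Reinterpret a 64-bit pattern as a signed integer."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x

def xma(a, b, addend, form="l"):
    """Multiply a and b, add the zero-extended addend, return 64 bits of the sum."""
    if form == "hu":
        product = (a & MASK64) * (b & MASK64)
    elif form in ("l", "h"):
        product = to_signed64(a) * to_signed64(b)
    else:
        raise ValueError("form must be 'l', 'h', or 'hu'")
    total = (product + (addend & MASK64)) & MASK128
    return total & MASK64 if form == "l" else total >> 64

print(xma(3, 4, 5), hex(xma(MASK64, 2, 0, "h")), xma(MASK64, 2, 0, "hu"))
```

The low 64 bits come out the same whether the operands are read as signed or unsigned, which is consistent with xma.lu being listed as a pseudo-op of xma.l.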
xmpy xmpy — Fixed-Point Multiply ) xmpy.l pseudo-op of: ( ) xma.l , f0 Format: ) xmpy.lu pseudo-op of: ( ) xma.l , f0 ) xmpy.h pseudo-op of: ( ) xma.h ) xmpy.hu pseudo-op of: ( ) xma.hu , f0 Two source operands (FR and FR ) are treated as either signed or unsigned integers...
xor — Exclusive Or
Format: (qp) xor r1 = r2, r3    register_form
        (qp) xor r1 = imm8, r3    imm8_form
Description: The two source operands are logically XORed and the result placed in GR r1. In the register_form the first operand is GR r2; in the imm8_form the first operand is taken from the imm8 encoding field.
zxt — Zero Extend
Format: (qp) zxtxsz r1 = r3
Description: The value in GR r3 is zero extended above the bit position specified by xsz and the result is placed in GR r1. The mnemonic values for xsz are given in Table 2-52 on page 3:258.
Pseudo-Code Functions This chapter contains a table of all pseudo-code functions used on the Itanium instruction pages. Table 3-1. Pseudo-code Functions Function Operation xxx_fault(parameters ...) There are several fault functions. Each fault function accepts parameters specific to the fault, e.g., exception code values, virtual addresses, etc. If the fault is deferred for speculative load exceptions the fault function will return with a deferral indication.
Table 3-1. Pseudo-code Functions (Continued)
Function    Operation
check_branch_implemented(check_type)
    Implementation-dependent routine which returns TRUE or FALSE, depending on whether a failing check instruction causes a branch (TRUE), or a Speculative Operation fault (FALSE). The result may be different for different types of check instructions: CHKS_GENERAL, CHKS_FLOAT, CHKA_GENERAL, CHKA_FLOAT.
Table 3-1. Pseudo-code Functions (Continued)
Function    Operation
fp_is_nan_or_inf(freg)
    Returns true if the floating-point exception_fault_check functions returned an IEEE fault disabled default result or a propagated NaN.
fp_is_natval(freg)
    Returns true when floating register contains a NaTVal.
fp_is_normal(freg)
    Returns true when floating register contains a normal number.
fp_is_pos_inf(freg)
    Returns true when floating register contains a positive infinity.
Table 3-1. Pseudo-code Functions (Continued)
Function    Operation
impl_check_mov_itir()
    Implementation-specific function that returns TRUE if ITIR is checked for reserved fields and encodings on a mov to ITIR instruction.
impl_check_mov_psr_l(gr)
    Implementation-specific function to check bits {63:32} of gr corresponding to reserved fields of the PSR for Reserved Register/Field fault.
Table 3-1. Pseudo-code Functions (Continued) Function Operation is_read_only_reg(rtype, raddr) Returns a one if the register addressed by raddr in the register bank of type rtype is a read only register. is_reserved_field(regclass, arg2, arg3) Returns true if the specified data would write a one in a reserved field. is_reserved_reg(regclass, regnum) Returns true if register regnum is reserved in the regclass register file.
Table 3-1. Pseudo-code Functions (Continued)
Function    Operation
mem_xchg_add(add_val, paddr, size, byte_order, mattr, otype, hint)
    Returns size bytes from memory starting at the physical address specified by paddr. The read is conditioned by the locality hint specified by hint. The least...
Table 3-1. Pseudo-code Functions (Continued) Function Operation rse_load(type) Restores a register or NaT collection from the backing store (load_address = RSE.BspLoad - 8). If load_address{8:3} is equal to 0x3f then a NaT collection is loaded into a NaT dispersal register. (dispersal register may not be the same as AR[RNAT].) If load_address{8:3} is not equal to 0x3f then the register RSE.LoadReg - 1 is loaded and the NaT bit for that register is set to dispersal_register{load_address{8:3}}.
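The address test described for rse_load can be sketched as a small classifier. The function name and the returned labels are illustrative; only the load_address computation and the bits {8:3} comparison come from the description above.

```python
def rse_load_target(bsp_load):
    """Classify the next backing-store load from the value of RSE.BspLoad."""
    load_address = bsp_load - 8
    slot = (load_address >> 3) & 0x3F  # load_address{8:3}
    kind = "nat_collection" if slot == 0x3F else "register"
    return load_address, kind, slot

print(rse_load_target(0x200))  # slot 63 holds a NaT collection
print(rse_load_target(0x1F8))  # slot 62 holds an ordinary register
```

With 8-byte slots, bits {8:3} index one of 64 slots in a 512-byte block, and the last slot of each block is the one treated as a NaT collection.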
Table 3-1. Pseudo-code Functions (Continued)
Function    Operation
spontaneous_deferral(paddr, size, border, mattr, otype, hint, *defer)
    Implementation-dependent routine which optionally forces *defer to TRUE if all of the following are true: spontaneous deferral is enabled, spontaneous deferral is permitted by the programming model, and the processor determines it would be advantageous to defer the speculative load (e.g., based on a miss in some particular...
Table 3-1. Pseudo-code Functions (Continued)
Function    Operation
tlb_may_purge_itc_entries(rid, vaddr, size)
    May locally purge ITC entries that match the specified virtual address (vaddr), region identifier (rid) and page size (size). May also invalidate entries that partially overlap the parameters. The extent of purging is implementation dependent. If the purge size is not supported, an implementation may generate a machine check abort or over purge the translation cache up to and including removal of all entries from the translation cache.
Table 3-1. Pseudo-code Functions (Continued)
Function    Operation
tlb_translate(vaddr, size, type, cpl, *attr, *defer)
    Returns the translated data physical address for the specified virtual memory address (vaddr) when translation enabled; otherwise, returns vaddr. size specifies the size of the access, type specifies the type of access (e.g., read, write, advance, spec).
Table 3-1. Pseudo-code Functions (Continued) Function Operation unimplemented_physical_address(paddr) Return TRUE if the presented physical address is unimplemented on this processor model; FALSE otherwise. This function is model specific. unimplemented_virtual_address(vaddr, Return TRUE if the presented virtual address is unimplemented on this processor model;...
Instruction Formats Each Itanium instruction is categorized into one of six types; each instruction type may be executed on one or more execution unit types. Table 4-1 lists the instruction types and the execution unit type on which they are executed: Table 4-1.
• Reserved major ops (light gray in the gray scale version of Table 4-3, brown in the color version) cause an Illegal Operation fault. • Reserved if PR[qp] is 1 major ops (dark gray in the gray scale version of Table 4-3, purple in the color version) cause an Illegal Operation fault if the predicate register...
Table 4-6. Instruction Field Names (Continued) Field Name Description sof, sol, sor alloc size of frame, size of locals, size of rotating immediates compare type opcode extension , timm branch predict tag immediate reserved opcode extension field branch whether hint opcode extension x, x opcode extension of length 1 or n extract/deposit/test bit/test NaT/hint opcode extension...
Some processors may implement the Reserved if PR[qp] is 1 (purple) and Reserved if PR[qp] is 1 B-unit (cyan) encodings in the L+X opcode space as Reserved (brown). These encodings appear in the L+X column of Table 4-3 on page 3:295, and in Table 4-69 on page 3:366,...
I-Unit Instruction Encodings 4.3.1 Multimedia and Variable Shifts All multimedia multiply/shift/max/min/mix/mux/pack/unpack and variable shift instructions are encoded within major opcode 7 using two 1-bit opcode extension fields in bits 36 (z ) and 33 (z ) and a 1-bit reserved opcode extension in bit 32 (v ) as shown in Table...
4.3.1.9 Bit Strings 37 36 35 34 33 32 31 30 29 28 27 26 20 19 13 12 Extension Instruction Operands Opcode popcnt 4.3.2 Integer Shifts The integer shift, test bit, and test NaT instructions are encoded within major opcode 5 using a 2-bit opcode extension field in bits 35:34 (x ) and a 1-bit opcode extension field in bit 33 (x).
Table 4-25. Misc I-Unit 6-bit Opcode Extensions Opcode Bits Bits Bits 32:31 Bits 40:37 35:33 30:27 break.i zxt1 mov from ip 1-bit Ext (Table 4-26) zxt2 mov from b zxt4 mov.i from ar mov from pr sxt1 sxt2 sxt4 czx1.l czx2.l mov.i to ar –...
4.3.5.2 Move from BR 37 36 35 33 32 27 26 16 15 13 12 Extension Instruction Operands Opcode 4.3.6 GR/Predicate/IP Moves The GR/Predicate/IP move instructions are encoded in major opcode 0. See “Miscellaneous I-Unit Instructions” on page 3:318 for a summary of the opcode extensions.
4.3.9 Test Feature 37 36 35 34 33 32 27 26 20 19 18 14 13 12 11 Extension Instruction Operands Opcode tf.z tf.z.unc tf.z.and tf.nz.and = imm tf.z.or tf.nz.or tf.z.or.andcm tf.nz.or.andcm M-Unit Instruction Encodings 4.4.1 Loads and Stores All load and store instructions are encoded within major opcodes 4, 5, 6, and 7 using a 6-bit opcode extension field in bits 35:30 (x ).
opcode extensions are summarized in Table 4-34 on page 3:326, Table 4-35 on page 3:326, and Table 4-36 on page 3:327; the floating-point load pair and set FR opcode extensions in Table 4-37 on page 3:327 and Table 4-38 on page 3:328.
4.4.9 Miscellaneous M-Unit Instructions The miscellaneous M-unit instructions are encoded in major opcode 0 along with the system/memory management instructions. See “System/Memory Management” on page 3:345 for a summary of the opcode extensions. 4.4.9.1 Allocate Register Stack Frame 37 36 35 33 32 31 30 27 26 20 19...
4.4.10 System/Memory Management All system/memory management instructions are encoded within major opcodes 0 and 1 using a 3-bit opcode extension field (x ) in bits 35:33. Some instructions also have a 4-bit opcode extension field (x ) in bits 30:27, or a 6-bit opcode extension field (x ) in bits 32:27.
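The bit ranges named above translate directly into shift-and-mask field extraction. The sketch below treats an instruction slot as a 41-bit integer and takes the major opcode from bits 40:37, as the opcode tables in this chapter label it; the helper names and the sample slot value are illustrative.

```python
def bits(word, hi, lo):
    """Return bits hi:lo (inclusive) of word as an unsigned integer."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

def sysmem_fields(slot):
    """Extract the opcode fields used by system/memory management encodings."""
    return {
        "major": bits(slot, 40, 37),
        "x3": bits(slot, 35, 33),
        "x4": bits(slot, 30, 27),
        "x6": bits(slot, 32, 27),
    }

# Hypothetical slot value, chosen only to exercise the extraction.
sample = (1 << 37) | (5 << 33) | (0x2A << 27)
print(sysmem_fields(sample))
```

Note that x4 and x6 overlap: x4 is the low four bits of the x6 range, so which field is meaningful depends on the instruction being decoded.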
4.5.1 Branches Opcode 0 is used for indirect branch, opcode 1 for indirect call, opcode 4 for IP-relative branch, and opcode 5 for IP-relative call. The IP-relative branch instructions encoded within major opcode 4 use a 3-bit opcode extension field in bits 8:6 (btype) to distinguish the branch types as shown in Table 4-47.
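The opcode assignment above can be sketched as a small classifier for a B-unit slot. This is a simplified illustration: it reads the major opcode from bits 40:37 and btype from bits 8:6, and it ignores the other instructions that share major opcode 0. The names are illustrative.

```python
BRANCH_KINDS = {
    0: "indirect branch",
    1: "indirect call",
    4: "IP-relative branch",
    5: "IP-relative call",
}

def classify_branch(slot):
    """Return (kind, btype) for a 41-bit B-unit slot, or None if not a listed opcode."""
    major = (slot >> 37) & 0xF
    kind = BRANCH_KINDS.get(major)
    if kind is None:
        return None
    # btype is described only for the branch (non-call) opcodes 0 and 4.
    btype = (slot >> 6) & 0x7 if major in (0, 4) else None
    return kind, btype

print(classify_branch((4 << 37) | (2 << 6)))
print(classify_branch(5 << 37))
```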
The indirect branch instructions encoded within major opcode 0 use a 3-bit opcode extension field in bits 8:6 (btype) to distinguish the branch types as shown in Table 4-49. Table 4-49. Indirect Branch Types Opcode btype Bits 40:37 Bits 32:27 Bits 8:6 br.cond br.ia...
Table 4-52. Branch Whether Hint Completer Bits 34:33 .sptk .spnt .dptk .dpnt Table 4-53. Indirect Call Whether Hint Completer Bits 34:32 .sptk .spnt .dptk .dpnt The branch instructions also have a 1-bit branch cache deallocation opcode hint extension field in bit 35 (d) as shown in Table 4-54.
Table 4-55. Indirect Predict/Nop/Hint Opcode Extensions Opcode Bits Bits 32:31 Bits 40:37 30:27 nop.b hint.b brp.ret The branch predict instructions all have a 1-bit branch importance opcode hint extension field in bit 35 (ih). The mov to BR instruction (page 3:320) also has this hint in bit 23.
Table 4-62. Reciprocal Approximation 1-bit Opcode Extensions Opcode Bits 40:37 Bit 33 Bit 36 frcpa frsqrta fprcpa fprsqrta Most floating-point instructions have a 2-bit opcode extension field in bits 35:34 (sf) which encodes the FPSR status field to be used. Table 4-63 summarizes these assignments.
Table 4-66. Floating-point Compare Opcode Extensions Opcode Bit 12 Bits Bit 33 Bit 36 40:37 fcmp.eq fcmp.eq.unc fcmp.lt fcmp.lt.unc fcmp.le fcmp.le.unc fcmp.unord fcmp.unord.unc The floating-point class instructions are encoded within major opcode 5 using a 1-bit opcode extension field in bit 12 (t ) as shown in Table 4-67.
4.6.4 Approximation 4.6.4.1 Floating-point Reciprocal Approximation There are two Reciprocal Approximation instructions. The first, in major op 0, encodes the full register variant. The second, in major op 1, encodes the parallel variant. 37 36 35 34 33 32 27 26 20 19 13 12 0 - 1...
4.6.5 Minimum/Maximum and Parallel Compare There are two groups of Minimum/Maximum instructions. The first group, in major op 0, encodes the full register variants. The second group, in major op 1, encodes the parallel variants. The parallel compare instructions are all encoded in major op 1. 37 36 35 34 33 32 27 26 20 19...
4.6.9 Miscellaneous F-Unit Instructions 4.6.9.1 Break (F-Unit) 37 36 35 34 33 32 27 26 25 Extension Instruction Operands Opcode break.f 4.6.9.2 Nop/Hint (F-Unit) F-unit nop and hint instructions are encoded within major opcode 0 using a 3-bit opcode extension field in bits 35:33 (x ), a 6-bit opcode extension field in bits 32:27 ), and a 1-bit opcode extension field in bit 26 (y), as shown in Table...
Table 4-71. Move Long 1-bit Opcode Extensions Opcode Bits 40:37 Bit 20 movl 37 3635 2726 22 21 2019 1312 0 40 Extension Instruction Operands Opcode movl = imm 4.7.3 Long Branches Long branches are executed by a B-unit. Opcode C is used for long branch and opcode D for long call.
Resource and Dependency Semantics Reading and Writing Resources An Itanium instruction is said to be a reader of a resource if the instruction’s qualifying predicate is 1 or it has no qualifying predicate or is one of the instructions that reads a resource even when its qualifying predicate is 0, and the execution of the instruction depends on that resource.
RAW and WAW dependencies are generally not allowed without some type of serialization event (an implied, data, or instruction serialization) after the first writing instruction. (See Section 3.2, “Serialization” on page 2:17 for details on serialization.) The tables and associated rules in this appendix provide a comprehensive list of readers and writers of resources and describe the serialization required for the dependency to be observed and possible outcomes if the required serialization is not met.
may expand to contain other classes, and that when fully expanded, a set of classes (e.g., the readers of some resource) may contain the same instruction multiple times. • The syntax ‘x\y’ where x and y are both instruction classes, indicates an unnamed instruction class that includes all instructions in instruction class x but that are not in instruction class y.
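Class expansion and the 'x\y' syntax can be illustrated with sets. The class names and members below are hypothetical and chosen only for the example; they are not the classes defined in this appendix. The sketch assumes class definitions do not refer back to themselves.

```python
# Hypothetical class table: a class lists instructions and/or other classes.
CLASSES = {
    "shifts": ["shl", "shr", "shift-pairs"],
    "shift-pairs": ["shrp"],
    "right-shifts": ["shr", "shrp"],
}

def expand(name, classes):
    """Fully expand a class name into a set of instruction names."""
    if name not in classes:
        return {name}  # a plain instruction expands to itself
    result = set()
    for member in classes[name]:
        result |= expand(member, classes)  # duplicates collapse in the set
    return result

def difference(x, y, classes):
    """The unnamed class x\\y: everything in x that is not in y."""
    return expand(x, classes) - expand(y, classes)

print(sorted(difference("shifts", "right-shifts", CLASSES)))
```

Using sets means an instruction that appears several times in a fully expanded class is counted once, which matches how the resulting list of readers or writers is used.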
Table 5-1. Semantics of Dependency Codes (Continued) Semantics of Serialization Type Required Effects of Serialization Violation Dependency Code impliedF Instruction Group Break (same as above). An undefined value is returned, or an Illegal Operation fault may be taken. If no fault is taken, stop Stop.
• A list of all architecturally-defined, independently-writable resources in the Itanium architecture. Each row represents an ‘atomic’ resource. Thus, for each row in the table, hardware will probably require a separate write-enable control signal. • For each resource, a complete list of readers and writers. •...
Table 5-2. RAW Dependencies Organized by Resource (Continued) Semantics of Resource Name Writers Readers Dependency AR[ITC] mov-to-AR-ITC br.ia, mov-from-AR-ITC impliedF AR[K%], mov-to-AR-K br.ia, mov-from-AR-K impliedF % in 0 - 7 AR[LC] mod-sched-brs-counted, br.ia, mod-sched-brs-counted, impliedF mov-to-AR-LC mov-from-AR-LC AR[PFS] br.call, brl.call alloc, br.ia, br.ret, epc, impliedF mov-from-AR-PFS...
Table 5-2. RAW Dependencies Organized by Resource (Continued) Semantics of Resource Name Writers Readers Dependency CR[EOI] mov-to-CR-EOI none SC Section 5.8.3.4, “End of External Interrupt Register (EOI – CR67)” on page 2:124 CR[IFA] mov-to-CR-IFA itc.i, itc.d, itr.i, itr.d implied mov-from-CR-IFA data CR[IFS] mov-to-CR-IFS...
Table 5-2. RAW Dependencies Organized by Resource (Continued) Semantics of Resource Name Writers Readers Dependency CR[TPR] mov-to-CR-TPR mov-from-CR-TPR, data mov-from-CR-IVR mov-to-PSR-l , ssm SC Section 5.8.3.3, “Task Priority Register (TPR – CR66)” page 2:123 implied CR%, none mov-from-CR-rv none % in 3, 5-7, 10-15, 18, 28-63, 75-79, 82-127 DBR# mov-to-IND-DBR...
5.3.3 WAW Dependency Table General rules specific to the WAW table: • All resources require at most an instruction group break to provide sequential behavior. • Some resources require no instruction group break to provide sequential behavior. • There are a few special cases that are described in greater detail elsewhere in the manual and are indicated with an SC (special case) result.
Rule 2. These instructions only read CFM when they access a rotating GR, FR, or PR. mov-to-PR and mov-from-PR only access CFM when their qualifying predicate is in the rotating region. Rule 3. These instructions use a general register value to determine the specific indirect register accessed.