Intel ITANIUM ARCHITECTURE - SOFTWARE DEVELOPERS MANUAL VOLUME 3 REV 2.3 Manual

Architecture software developer's manual revision 2.3

Summary of Contents for Intel ITANIUM ARCHITECTURE - SOFTWARE DEVELOPERS MANUAL VOLUME 3 REV 2.3

  • Page 2 Intel® Itanium® Architecture Software Developer’s Manual, Volume 3: Intel® Itanium® Instruction Set Reference, Revision 2.3, May 2010. Document Number: 323207...
  • Page 3 Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or by visiting Intel's website at http://www.intel.com.
  • Page 4: Table Of Contents

    Part 1: Application Architecture Guide, 3:1. 1.1.2 Part 2: Optimization Guide for the Intel® Itanium® Architecture, 3:1. Overview of Volume 2: System Architecture.
  • Page 5 Function of getf.sig, 3:143. Intel® Itanium® Architecture Software Developer’s Manual, Rev. 2.3...
  • Page 6 Floating-point Class Relations, 3:64...
  • Page 7 Multimedia ALU Size 1 4-bit+2-bit Opcode Extensions, 3:307. 4-14 Multimedia ALU Size 2 4-bit+2-bit Opcode Extensions, 3:307...
  • Page 8 Floating-point Arithmetic 1-bit Opcode Extensions, 3:358. 4-65 Fixed-point Multiply Add and Select Opcode Extensions, 3:358...
  • Page 9 Instruction Classes, 3:389...
  • Page 10: About This Manual

    IA-32 application interface. This volume also describes optimization techniques used to generate high performance software. 1.1.1 Part 1: Application Architecture Guide. Chapter 1, “About this Manual” provides an overview of all volumes in the Intel® Itanium® Architecture Software Developer’s Manual...
  • Page 11: Overview Of Volume 2: System Architecture

    1.2.1 Part 1: System Architecture Guide. Chapter 1, “About this Manual” provides an overview of all volumes in the Intel® Itanium® Architecture Software Developer’s Manual...
  • Page 12: Part 2: System Programmer's Guide

    Chapter 9, “IA-32 Interruption Vector Descriptions” lists IA-32 exceptions, interrupts and intercepts that can occur during IA-32 instruction set execution in the Itanium System Environment. Chapter 10, “Itanium® Architecture-based Operating System Interaction Model with IA-32 Applications” defines the operation of IA-32 instructions within the Itanium System Environment from the perspective of an Itanium architecture-based operating system.
  • Page 13: Appendices

    Instruction Set Reference: This volume is a comprehensive reference to the Itanium instruction set, including instruction format/encoding. Chapter 1, “About this Manual” provides an overview of all volumes in the Intel® Itanium® Architecture Software Developer’s Manual. Chapter 2, “Instruction Reference”...
  • Page 14: Terminology

    These resources include instructions and registers. Itanium Architecture – The new ISA with 64-bit instruction capabilities, new performance-enhancing features, and support for the IA-32 instruction set. IA-32 Architecture – The 32-bit and 16-bit Intel architecture as described in the Intel® 64 and IA-32 Architectures Software Developer’s Manual.
  • Page 15: Revision History

    • Intel® 64 and IA-32 Architectures Software Developer’s Manual – This set of manuals describes the Intel 32-bit architecture. They are available from the Intel Literature Department by calling 1-800-548-4725 and requesting Document Numbers 243190, 243191 and 243192...
  • Page 16 Revision history (columns: Date of Revision, Revision Number, Description). August 2005: Allow register fields in the CR.LID register to be read-only, and make CR.LID checking on interruption messages by processors optional. See Vol 2, Part I, Ch 5 “Interruptions” and Section 11.2.2 PALE_RESET Exit State for details. Relaxed reserved and ignored field checking in IA-32 application registers in Vol 1 Ch 6 and Vol 2, Part I, Ch 10.
  • Page 17 Revision history, continued. August 2002: Added Predicate Behavior of alloc Instruction Clarification (Section 4.1.2, Part I, Volume 1; Section 2.2, Part I, Volume 3). Added New fc.i Instruction (Section 4.4.6.1, and 4.4.6.2, Part I, Volume 1; Section 4.3.3, 4.4.1, 4.4.5, 4.4.6, 4.4.7, 5.5.2, and 7.1.2, Part I, Volume 2; Section 2.5, 2.5.1, 2.5.2, 2.5.3, and 4.5.2.1, Part II, Volume 2;...
  • Page 18 Revision history, continued. Volume 2: Class pr-writers-int clarification (Table A-5). PAL_MC_DRAIN clarification (Section 4.4.6.1). VHPT walk and forward progress change (Section 4.1.1.2). IA-32 IBR/DBR match clarification (Section 7.1.1). ISR figure changes (pp. 8-5, 8-26, 8-33 and 8-36). PAL_CACHE_FLUSH return argument change –...
  • Page 19 Revision history, continued. Volume 2: Clarifications regarding “reserved” fields in ITIR (Chapter 3). Instruction and Data translation must be enabled for executing IA-32 instructions (Chapters 3, 4 and 10). FCR/FDR mappings, and clarification to the value of PSR.ri after an RFI (Chapters 3 and 4).
  • Page 20: Instruction Reference

    Instruction Reference This chapter describes the function of each Itanium instruction. The pages of this chapter are sorted alphabetically by assembly language mnemonic. Instruction Page Conventions The instruction pages are divided into multiple sections as listed in Table 2-1. The first three sections are present on all instruction pages.
  • Page 21: Register File Notation

    (64-bits not including the NaT bit) where the notation GR[addr] is used. The syntactical differences between the code found in the Operation section and ANSI C are listed in Table 2-4. Table 2-3. Register File Notation (columns: Register File, C Notation, Assembly Mnemonic, Indirect Access)...
  • Page 22: Instruction Descriptions

    Table 2-5. Pervasive Conditions Not Included in Instruction Description Code (Condition: Action). Read of a register outside the current frame: an undefined value is returned (no fault). Access to a banked general register (GR 16 through GR 31): the GR bank specified by PSR.bn is accessed. PSR.ss is set...
  • Page 23 add — Add. Format: (qp) add r₁ = r₂, r₃ (register_form); (qp) add r₁ = r₂, r₃, 1 (plus1_form, register_form); (qp) add r₁ = imm, r₃ (pseudo-op); (qp) adds r₁ = imm₁₄, r₃ (imm14_form); (qp) addl r₁ = imm₂₂, r₃ (imm22_form). Description: The two source operands (and an optional constant 1) are added and the result placed in GR r₁. In the register form the first operand is GR r₂;...
  • Page 24: Add Pointer

    addp4 — Add Pointer. Format: (qp) addp4 r₁ = r₂, r₃ (register_form); (qp) addp4 r₁ = imm₁₄, r₃ (imm14_form). Description: The two source operands are added. The upper 32 bits of the result are forced to zero, and then bits {31:30} of GR r₃ are copied to bits {62:61} of the result. This result is placed in GR r₁.
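The addp4 description above can be sketched as plain integer arithmetic. This is a minimal model, not the manual's Operation code: the function and parameter names are hypothetical, NaT handling is omitted, and it assumes the bits {31:30} that get copied come from the second source register (named `r3_val` here).

```python
MASK64 = (1 << 64) - 1

def addp4(src, r3_val):
    """Hypothetical model of the addp4 result (NaT bits not modeled)."""
    total = (src + r3_val) & MASK64
    result = total & 0xFFFFFFFF        # upper 32 bits forced to zero
    region = (r3_val >> 30) & 0x3      # bits {31:30} of the base operand
    return result | (region << 61)     # copied to bits {62:61} of the result

print(hex(addp4(0x1, 0xC0000000)))     # prints 0x60000000c0000001
```

The effect is that a 32-bit pointer keeps its top two bits as a region selector in the 64-bit address.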
  • Page 25: Stack Frame

    alloc — Allocate Stack Frame. Format: (qp) alloc r₁ = ar.pfs, i, l, o, r. Description: A new stack frame is allocated on the general register stack, and the Previous Function State register (PFS) is copied to GR r₁. The change of frame size is immediate. The write of GR r₁ and subsequent instructions in the same instruction group use the new frame.
  • Page 26 alloc Operation: /* tmp_sof, tmp_sol, tmp_sor are the fields encoded in the instruction */ tmp_sof = i + l + o; tmp_sol = i + l; tmp_sor = r u>> 3; check_target_register_sof(r₁, tmp_sof); if (tmp_sof u> 96 || r u> tmp_sof || tmp_sol u> tmp_sof || qp != 0) illegal_operation_fault();...
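The frame-field arithmetic in the alloc fragment above can be tried out directly. The sketch below mirrors only the visible part of the Operation code; the function name and the use of `ValueError` to stand in for an Illegal Operation fault are assumptions, and the `qp` check is copied from the fragment as shown.

```python
def alloc_frame_fields(i, l, o, r, qp=0):
    """Return (sof, sol, sor) for an alloc with i inputs, l locals,
    o outputs and r rotating registers (hypothetical helper)."""
    sof = i + l + o          # size of frame
    sol = i + l              # size of locals (inputs + locals)
    sor = r >> 3             # size of rotating region, in units of 8
    # Same checks as the fragment; all values are non-negative here,
    # so the unsigned comparisons reduce to ordinary ones.
    if sof > 96 or r > sof or sol > sof or qp != 0:
        raise ValueError("illegal_operation_fault")
    return sof, sol, sor

print(alloc_frame_fields(2, 3, 4, 8))   # prints (9, 5, 1)
```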
  • Page 27 and — Logical And. Format: (qp) and r₁ = r₂, r₃ (register_form); (qp) and r₁ = imm₈, r₃ (imm8_form). Description: The two source operands are logically ANDed and the result placed in GR r₁. In the register_form the first operand is GR r₂; in the imm8_form the first operand is taken from the imm₈ encoding field.
  • Page 28 andcm — And Complement. Format: (qp) andcm r₁ = r₂, r₃ (register_form); (qp) andcm r₁ = imm₈, r₃ (imm8_form). Description: The first source operand is logically ANDed with the 1’s complement of the second source operand and the result placed in GR r₁. In the register_form the first operand is GR r₂;...
  • Page 29: Branch Types

    br — Branch ) br. ip_relative_form Format: btype dh target ) br. call_form, ip_relative_form btype dh b target counted_form, ip_relative_form btype dh target pseudo-op dh target ) br. indirect_form btype dh b ) br. call_form, indirect_form btype dh b pseudo-op dh b A branch condition is evaluated, and either a branch is taken, or execution continues Description:...
  • Page 30 the branch condition is simply the value of the specified predicate register. These basic branch types are: • cond: If the qualifying predicate is 1, the branch is taken. Otherwise it is not taken. • call: If the qualifying predicate is 1, the branch is taken and several other actions occur: •...
  • Page 31 group as br.ia are not allowed, since br.ia may implicitly read all ARs. If an illegal RAW dependency is present between an AR write and br.ia, the first IA-32 instruction fetch and execution may or may not see the updated AR value. IA-32 instruction set execution leaves the contents of the ALAT undefined.
  • Page 32: Operation Of Br.ctop And Br.cexit

    The modulo-scheduled loop types are: • ctop and cexit: These branch types behave identically, except in the determination of whether to branch or not. For br.ctop, the branch is taken if either LC is non-zero or EC is greater than one. For br.cexit, the opposite is true. It is not taken if either LC is non-zero or EC is greater than one and is taken otherwise.
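The taken/not-taken rule for br.ctop and br.cexit stated above reduces to one boolean expression and its negation. The following sketch covers only that decision (register updates such as decrementing LC or EC and register rotation are not modeled); the function name is hypothetical.

```python
def counted_branch_taken(btype, lc, ec):
    """Decide whether br.ctop / br.cexit is taken for given LC and EC."""
    keep_looping = (lc != 0) or (ec > 1)
    if btype == "ctop":
        return keep_looping
    if btype == "cexit":
        return not keep_looping
    raise ValueError("unsupported branch type: " + btype)

# LC exhausted but an epilog stage remains (EC == 2): ctop still branches.
print(counted_branch_taken("ctop", 0, 2))    # prints True
print(counted_branch_taken("cexit", 0, 1))   # prints True
```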
  • Page 33: Operation Of Br.wtop And Br.wexit

    Figure 2-4. Operation of br.wtop and br.wexit (flowchart not reproduced; it branches on PR[qp] and EC, and shows the resulting updates to EC and PR[63])...
  • Page 34: Branch Whether Hint

    Table 2-7. Branch Whether Hint (bwh Completer: Branch Whether Hint). spnt: Static Not-Taken; sptk: Static Taken; dpnt: Dynamic Not-Taken; dptk: Dynamic Taken. Table 2-8. Sequential Prefetch Hint (ph Completer: Sequential Prefetch Hint). few or none: Few lines; many: Many lines. Table 2-9....
  • Page 35 tmp_taken = PR[qp]; if (tmp_taken) { // tmp_growth indicates the amount to move logical TOP *up*: // tmp_growth = sizeof(previous out) - sizeof(current frame) // a negative amount indicates a shrinking stack tmp_growth = (AR[PFS].pfm.sof - AR[PFS].pfm.sol) - CFM.sof; alat_frame_update(-AR[PFS].pfm.sol, 0); rse_fatal = rse_restore_frame(AR[PFS].pfm.sol, tmp_growth, CFM.sof);...
  • Page 36 illegal_operation_fault(); tmp_taken = (AR[LC] != 0); if (AR[LC] != 0) AR[LC]--; break; case ‘ctop’: case ‘cexit’: // SW pipelined counted loop if (slot != 2) illegal_operation_fault(); if (btype == ‘ctop’) tmp_taken = ((AR[LC] != 0) || (AR[EC] u> 1)); if (btype == ‘cexit’)tmp_taken = !((AR[LC] != 0) || (AR[EC] u> 1)); if (AR[LC] != 0) { AR[LC]--;...
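The counted-loop case at the start of the fragment above (`tmp_taken = (AR[LC] != 0); if (AR[LC] != 0) AR[LC]--;`) implies that a loop entered with LC = n runs its body n + 1 times. A small simulation, with hypothetical names and with the loop body represented by a Python callable, makes that visible:

```python
def run_counted_loop(lc, body):
    """Simulate a loop closed by a counted branch, starting with LC = lc.
    Returns the final LC value."""
    while True:
        body()
        taken = lc != 0      # tmp_taken = (AR[LC] != 0)
        if lc != 0:
            lc -= 1          # AR[LC]--
        if not taken:
            return lc

count = []
run_counted_loop(3, lambda: count.append(1))
print(len(count))            # prints 4
```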
  • Page 37 taken_branch = 1; IP = tmp_IP; /* set the new value for IP */ if (!impl_uia_fault_supported() && ((PSR.it && unimplemented_virtual_address(tmp_IP, PSR.vm)) || (!PSR.it && unimplemented_physical_address(tmp_IP)))) unimplemented_instruction_address_trap(lower_priv_transition, tmp_IP); if (lower_priv_transition && PSR.lp) lower_privilege_transfer_trap(); if (PSR.tb) taken_branch_trap(); Interruptions: Illegal Operation fault, Disabled Instruction Set Transition fault, Lower-Privilege Transfer trap, Taken Branch trap...
  • Page 38 break — Break. Format: (qp) break imm₂₁ (pseudo-op); (qp) break.i imm₂₁ (i_unit_form); (qp) break.b imm₂₁ (b_unit_form); (qp) break.m imm₂₁ (m_unit_form); (qp) break.f imm₂₁ (f_unit_form); (qp) break.x imm₆₂ (x_unit_form). Description: A Break Instruction fault is taken. For the i_unit_form, f_unit_form and m_unit_form, the value specified by imm₂₁ is zero-extended and placed in the Interruption Immediate control register (IIM).
  • Page 39: Long Branch Types

    brl — Branch Long ) brl. Format: btype dh target ) brl. call_form btype dh b target brl. pseudo-op dh target A branch condition is evaluated, and either a branch is taken, or execution continues Description: with the next sequential instruction. The execution of a branch logically follows the execution of all previous non-branch instructions in the same instruction group.
  • Page 40 system is required to provide an Illegal Operation fault handler which emulates taken and not-taken long branches. Presence of this instruction is indicated by a 1 in the lb bit of CPUID register 4. See Section 3.1.11, “Processor Identification Registers” on page 1:34.
  • Page 41: Ip-Relative Branch Predict Whether Hint

    brp — Branch Predict brp. ip_relative_form Format: ipwh ih target brp. indirect_form indwh ih b brp.ret. return_form, indirect_form indwh ih b This instruction can be used to provide to hardware early information about a future Description: branch. It has no effect on architectural machine state, and operates as a nop instruction except for its performance effects.
  • Page 42 Operation: tmp_tag = IP + sign_ext((timm₉ << 4), 13); if (ip_relative_form) { tmp_target = IP + sign_ext((imm₂₁ << 4), 25); tmp_wh = ipwh; } else { /* indirect_form */ tmp_target = BR[b₂]; tmp_wh = indwh; } branch_predict(tmp_wh, ih, return_form, tmp_target, tmp_tag); Interruptions: None...
  • Page 43 bsw — Bank Switch. Format: bsw.0 (zero_form); bsw.1 (one_form). Description: This instruction switches to the specified register bank. The zero_form specifies Bank 0 for GR16 to GR31. The one_form specifies Bank 1 for GR16 to GR31. After the bank switch the previous register bank is no longer accessible but does retain its current state.
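The address arithmetic in the brp Operation fragment above relies on `sign_ext`. The sketch below assumes `sign_ext(value, pos)` keeps bits pos-1..0 and replicates bit pos-1 upward, which is how the call sites read; that definition, the helper names, and the use of unbounded Python integers instead of 64-bit wraparound are all assumptions of this example.

```python
def sign_ext(value, pos):
    """Sign-extend the low `pos` bits of value (assumed semantics)."""
    value &= (1 << pos) - 1
    if value & (1 << (pos - 1)):
        value -= 1 << pos
    return value

def brp_addresses(ip, timm, imm):
    """Compute (tag, target) for the IP-relative form of brp."""
    tag = ip + sign_ext(timm << 4, 13)
    target = ip + sign_ext(imm << 4, 25)
    return tag, target

tag, target = brp_addresses(0x1000, 0x1FF, 0x2)
print(hex(tag), hex(target))     # prints 0xff0 0x1020
```

Shifting by 4 before sign extension reflects that both immediates count 16-byte units.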
  • Page 44: Alat Clear Completer

    chk — Speculation Check ) chk.s pseudo-op Format: target ) chk.s.i control_form, i_unit_form, gr_form target ) chk.s.m control_form, m_unit_form, gr_form target ) chk.s control_form, fr_form target ) chk.a. data_form, gr_form aclr r target ) chk.a. data_form, fr_form aclr f target The result of a control- or data-speculative calculation is checked for success or failure.
  • Page 45 Operation: if (PR[qp]) { if (control_form) { if (fr_form && (tmp_isrcode = fp_reg_disabled(f , 0, 0, 0))) disabled_fp_register_fault(tmp_isrcode, 0); check_type = gr_form ? CHKS_GENERAL : CHKS_FLOAT; fail = (gr_form && GR[r ].nat) || (fr_form && FR[f ] == NATVAL); } else { // data_form if (gr_form) { reg_type...
  • Page 46 clrrrb — Clear RRB. Format: clrrrb (all_form); clrrrb.pr (pred_form). Description: In the all_form, the register rename base registers (CFM.rrb.gr, CFM.rrb.fr, and CFM.rrb.pr) are cleared. In the pred_form, the single register rename base register for the predicates (CFM.rrb.pr) is cleared. This instruction must be the last instruction in an instruction group;...
  • Page 47: ].Nat

    clz — Count Leading Zeros. Format: (qp) clz r₁ = r₃. Description: The number of leading zeros in GR r₃ is placed in GR r₁. An Illegal Operation fault is raised on processor models that do not support the instruction. CPUID register 4 indicates the presence of the feature on the processor model.
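As a rough illustration of the clz result for a 64-bit general register, the count can be derived from the position of the highest set bit. The function name is hypothetical, and fault and NaT behavior are not modeled.

```python
def clz64(value):
    """Count leading zero bits of a 64-bit value (0 gives 64)."""
    value &= (1 << 64) - 1
    return 64 - value.bit_length()

print(clz64(1), clz64(1 << 63), clz64(0))   # prints 63 0 64
```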
  • Page 48: Comparison Types

    cmp — Compare. Format: (qp) cmp.crel.ctype p₁, p₂ = r₂, r₃ (register_form); (qp) cmp.crel.ctype p₁, p₂ = imm₈, r₃ (imm8_form); (qp) cmp.crel.ctype p₁, p₂ = r0, r₃ (parallel_inequality_form); (qp) cmp.crel.ctype p₁, p₂ = r₃, r0 (pseudo-op). Description: The two source operands are compared for one of ten relations specified by crel. This produces a boolean result which is 1 if the comparison condition is true, and 0 otherwise.
  • Page 49: 64-Bit Comparison Relations For Normal And Unc Compares

    simply uses the negative relation with an implemented type. The implemented relations and how the pseudo-ops map onto them are shown in Table 2-16 (for normal and unc type compares), and Table 2-17 (for parallel type compares). Table 2-16. 64-bit Comparison Relations for Normal and unc Compares Compare Relation Register Form is a Immediate Form is a...
  • Page 50 Operation: if (PR[qp]) { if (p₁ == p₂) illegal_operation_fault(); tmp_nat = (register_form ? GR[r₂].nat : 0) || GR[r₃].nat; if (register_form) tmp_src = GR[r₂]; else if (imm8_form) tmp_src = sign_ext(imm₈, 8); else /* parallel_inequality_form */ tmp_src = 0; if (crel == ‘eq’) tmp_rel = tmp_src == GR[r₃]; else if (crel == ‘ne’) tmp_rel = tmp_src != GR[r₃]...
  • Page 51 illegal_operation_fault(); PR[p₁] = 0; PR[p₂] = 0; Interruptions: Illegal Operation fault...
  • Page 52: Immediate Range For 32-Bit Compares

    cmp4 — Compare 4 Bytes. Format: (qp) cmp4.crel.ctype p₁, p₂ = r₂, r₃ (register_form); (qp) cmp4.crel.ctype p₁, p₂ = imm₈, r₃ (imm8_form); (qp) cmp4.crel.ctype p₁, p₂ = r0, r₃ (parallel_inequality_form); (qp) cmp4.crel.ctype p₁, p₂ = r₃, r0 (pseudo-op). Description: The least significant 32 bits from each of two source operands are compared for one of ten relations specified by crel.
  • Page 53 cmp4 Operation: if (PR[qp]) { if (p₁ == p₂) illegal_operation_fault(); tmp_nat = (register_form ? GR[r₂].nat : 0) || GR[r₃].nat; if (register_form) tmp_src = GR[r₂]; else if (imm8_form) tmp_src = sign_ext(imm₈, 8); else /* parallel_inequality_form */ tmp_src = 0; if (crel == ‘eq’) tmp_rel = tmp_src{31:0} == GR[r₃]{31:0};...
  • Page 54 cmp4 PR[p ] = 0; break; case ‘unc’: // unc-type compare default: // normal compare if (tmp_nat) { PR[p ] = 0; PR[p ] = 0; } else { PR[p ] = tmp_rel; PR[p ] = !tmp_rel; break; } else { if (ctype == ‘unc’) { if (p == p...
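The normal (and unc) compare case in the fragment above writes the relation to one predicate and its complement to the other, unless a source NaT forces both to 0. A minimal sketch for the `eq` relation of cmp4, with hypothetical names and only the qualifying-predicate-true path modeled:

```python
def cmp4_eq(src, r3_val, nat=False):
    """Return the pair of predicate values written by a normal cmp4.eq."""
    if nat:
        return 0, 0                      # NaT source: both targets cleared
    rel = (src & 0xFFFFFFFF) == (r3_val & 0xFFFFFFFF)   # bits {31:0} only
    return int(rel), int(not rel)

# The upper 32 bits differ, but cmp4 ignores them.
print(cmp4_eq(0x100000005, 0x200000005))   # prints (1, 0)
```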
  • Page 55: Memory Compare And Exchange Size

    cmpxchg — Compare and Exchange. Format: (qp) cmpxchgsz.sem.ldhint r₁ = [r₃], r₂, ar.ccv; (qp) cmp8xchg16.sem.ldhint r₁ = [r₃], r₂, ar.csd, ar.ccv (sixteen_byte_form). Description: A value consisting of sz bytes (8 bytes for cmp8xchg16) is read from memory starting at the address specified by the value in GR r₃.
  • Page 56 cmpxchg affect program functionality and may be ignored by the implementation. See Section 4.4.6, “Memory Hierarchy Control and Consistency” on page 1:69 for details. For cmp8xchg16, Illegal Operation fault is raised on processor models that do not support the instruction. CPUID register 4 indicates the presence of the feature on the processor model.
  • Page 57 cover — Cover Stack Frame. Format: cover. Description: A new stack frame of zero size is allocated which does not include any registers from the previous frame (as though all output registers in the previous frame had been locals). The register rename base registers are reset. If interruption collection is disabled (PSR.ic is zero), then the old value of the Current Frame Marker (CFM) is copied to the Interruption Function State register (IFS), and IFS.v is set to one.
  • Page 58 czx — Compute Zero Index. Format: (qp) czx1.l r₁ = r₃ (one_byte_form, left_form); (qp) czx1.r r₁ = r₃ (one_byte_form, right_form); (qp) czx2.l r₁ = r₃ (two_byte_form, left_form); (qp) czx2.r r₁ = r₃ (two_byte_form, right_form). Description: GR r₃ is scanned for a zero element. The element is either an 8-bit aligned byte (one_byte_form) or a 16-bit aligned pair of bytes (two_byte_form). The index of the first zero element is placed in GR r₁.
  • Page 59 else if ((GR[r₃] & 0x0000ffff00000000) == 0) GR[r₁] = 2; else if ((GR[r₃] & 0xffff000000000000) == 0) GR[r₁] = 3; else GR[r₁] = 4; GR[r₁].nat = GR[r₃].nat; Interruptions: Illegal Operation fault...
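The mask chain in the fragment above (index 2 for bits 47:32, index 3 for bits 63:48, otherwise 4) is consistent with a scan over 16-bit elements that starts at the least significant end. The sketch below assumes that reading, i.e. that the fragment belongs to the two-byte right form; the function name is hypothetical.

```python
def czx2_right(value):
    """Index of the first all-zero 16-bit element, scanning from bit 0.
    Returns 4 when no element is zero."""
    for index in range(4):
        if ((value >> (16 * index)) & 0xFFFF) == 0:
            return index
    return 4

print(czx2_right(0xFFFF0000FFFFFFFF))   # prints 2
print(czx2_right(0xFFFFFFFFFFFFFFFF))   # prints 4
```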
  • Page 60 dep — Deposit. Format: (qp) dep r₁ = r₂, r₃, pos₆, len₄ (merge_form, register_form); (qp) dep r₁ = imm₁, r₃, pos₆, len₆ (merge_form, imm_form); (qp) dep.z r₁ = r₂, pos₆, len₆ (zero_form, register_form); (qp) dep.z r₁ = imm₈, pos₆, len₆ (zero_form, imm_form). Description: In the merge_form, a right justified bit field taken from the first source operand is deposited into the value in GR r₃ at an arbitrary bit position and the result is placed in GR r₁.
  • Page 61 Operation: if (PR[qp]) { check_target_register(r₁); if (imm_form) { tmp_src = (merge_form ? sign_ext(imm₁, 1) : sign_ext(imm₈, 8)); tmp_nat = merge_form ? GR[r₃].nat : 0; tmp_len = len₆; } else { /* register_form */ tmp_src = GR[r₂]; tmp_nat = (merge_form ? GR[r₃].nat : 0) || GR[r₂].nat;...
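The merge form of dep described above amounts to a masked insert. The following is a sketch under stated assumptions: the names are hypothetical, NaT propagation is left out, and fields that would run past bit 63 are simply clipped to 64 bits. Passing 0 as the target gives the behavior of the zero form.

```python
MASK64 = (1 << 64) - 1

def deposit(field_src, target, pos, length):
    """Insert the low `length` bits of field_src into target at bit `pos`."""
    mask = (((1 << length) - 1) << pos) & MASK64
    return (target & ~mask & MASK64) | ((field_src << pos) & mask)

print(hex(deposit(0b101, 0xFFFF, 4, 3)))   # prints 0xffdf
print(hex(deposit(0xFF, 0, 8, 4)))         # prints 0xf00
```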
  • Page 62 epc — Enter Privileged Code. Format: epc. Description: This instruction increases the privilege level. The new privilege level is given by the TLB entry for the page containing this instruction. This instruction can be used to implement calls to higher-privileged routines without the overhead of an interruption. Before increasing the privilege level, a check is performed.
  • Page 63 extr — Extract. Format: (qp) extr r₁ = r₃, pos₆, len₆ (signed_form); (qp) extr.u r₁ = r₃, pos₆, len₆ (unsigned_form). Description: A field is extracted from GR r₃, either zero extended or sign extended, and placed right-justified in GR r₁. The field begins at the bit position given by the second operand and extends len₆ bits to the left.
  • Page 64 fabs — Floating-point Absolute Value. Format: (qp) fabs f₁ = f₃ pseudo-op of: (qp) fmerge.s f₁ = f0, f₃. Description: The absolute value of the value in FR f₃ is computed and placed in FR f₁. If FR f₃ is a NaTVal, FR f₁ is set to NaTVal instead of the computed result. Operation: See “fmerge —...
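The extract operation described above can be modeled with a shift, a mask and an optional sign extension. This sketch uses hypothetical names, returns the 64-bit register pattern as a non-negative integer, and assumes the field lies entirely within bits 63:0 (the case where it would extend past bit 63 is not handled).

```python
MASK64 = (1 << 64) - 1

def extract(value, pos, length, signed=True):
    """Extract `length` bits starting at bit `pos`, right-justified."""
    field = (value >> pos) & ((1 << length) - 1)
    if signed and field & (1 << (length - 1)):
        field -= 1 << length             # replicate the field's top bit
    return field & MASK64

print(hex(extract(0xABCD, 4, 8, signed=False)))   # prints 0xbc
print(hex(extract(0xF0, 4, 4)))                   # prints 0xffffffffffffffff
```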
  • Page 65: Specified Pc Mnemonic Values

    fadd — Floating-point Add. Format: (qp) fadd.pc.sf f₁ = f₃, f₂ pseudo-op of: (qp) fma.pc.sf f₁ = f₃, f1, f₂. Description: FR f₃ and FR f₂ are added (computed to infinite precision), rounded to the precision indicated by pc (and possibly FPSR.sf.pc and FPSR.sf.wre) using the rounding mode specified by FPSR.sf.rc, and placed in FR f₁.
  • Page 66: Natval

    famax — Floating-point Absolute Maximum. Format: (qp) famax.sf f₁ = f₂, f₃. Description: The operand with the larger absolute value is placed in FR f₁. If the magnitude of FR f₂ equals the magnitude of FR f₃, FR f₁ gets FR f₃. If either FR f₂ or FR f₃ is a NaN, FR f₁ gets FR f₃...
  • Page 67 famin — Floating-point Absolute Minimum. Format: (qp) famin.sf f₁ = f₂, f₃. Description: The operand with the smaller absolute value is placed in FR f₁. If the magnitude of FR f₂ equals the magnitude of FR f₃, FR f₁ gets FR f₃. If either FR f₂ or FR f₃ is a NaN, FR f₁ gets FR f₃...
  • Page 68 fand — Floating-point Logical And. Format: (qp) fand f₁ = f₂, f₃. Description: The bit-wise logical AND of the significand fields of FR f₂ and FR f₃ is computed. The resulting value is stored in the significand field of FR f₁. The exponent field of FR f₁ is set to the biased exponent for 2.0^63 (0x1003E) and the sign field of FR...
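The constant 0x1003E in the fand description can be checked with a line of arithmetic: with a 17-bit register exponent biased by 0xFFFF, an unbiased exponent of 63 gives 0xFFFF + 63 = 0x1003E. The sketch below builds the result fields on that basis. The bias value and the choice of a positive sign (the sign rule is cut off in the text above) are assumptions, as are all names.

```python
MASK64 = (1 << 64) - 1
EXP_BIAS = 0xFFFF                     # assumed bias of the 17-bit exponent

def fand_fields(sig2, sig3):
    """Return the (sign, exponent, significand) fields of a fand result."""
    return {
        "sign": 0,                    # assumed positive
        "exponent": EXP_BIAS + 63,    # biased exponent for 2.0^63
        "significand": sig2 & sig3 & MASK64,
    }

result = fand_fields(0xFF00FF00FF00FF00, 0x0FF00FF00FF00FF0)
print(hex(result["exponent"]), hex(result["significand"]))
# prints 0x1003e 0xf000f000f000f00
```

With that exponent, the 64-bit significand reads as a plain integer, which is why the logical instructions use it.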
  • Page 69 fandcm — Floating-point And Complement. Format: (qp) fandcm f₁ = f₂, f₃. Description: The bit-wise logical AND of the significand field of FR f₂ with the bit-wise complemented significand field of FR f₃ is computed. The resulting value is stored in the significand field of FR f₁...
  • Page 70: If (Fail)

    fc — Flush Cache. Format: (qp) fc r₃ (invalidate_line_form); (qp) fc.i r₃ (instruction_cache_coherent_form). Description: In the invalidate_line form, the cache line associated with the address specified by the value of GR r₃ is invalidated from all levels of the processor cache hierarchy. The invalidation is broadcast throughout the coherence domain.
  • Page 71 Interruptions: Register NaT Consumption fault, Unimplemented Data Address fault, Data Nested TLB fault, Alternate Data TLB fault, VHPT Data fault, Data TLB fault, Data Page Not Present fault, Data NaT Page Consumption fault, Data Access Rights fault...
  • Page 72: Taken_Branch = 1

    fchkf — Floating-point Check Flags. Format: (qp) fchkf.sf target₂₅. Description: The flags in FPSR.sf.flags are compared with FPSR.s0.flags and FPSR.traps. If any flags set in FPSR.sf.flags correspond to FPSR.traps which are enabled, or if any flags set in FPSR.sf.flags are not set in FPSR.s0.flags, then a branch to target₂₅ is taken.
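The branch condition in the fchkf description is a pair of bit-mask tests. In the sketch below, `enabled_traps` is a hypothetical mask with a 1 bit for each enabled trap, aligned with the flag bits; the actual layout and polarity of FPSR.traps are not modeled, so treat this only as an illustration of the logic.

```python
def fchkf_takes_branch(sf_flags, s0_flags, enabled_traps):
    """True when the speculative flags require the recovery branch."""
    hits_enabled_trap = (sf_flags & enabled_traps) != 0
    new_flag_for_s0 = (sf_flags & ~s0_flags) != 0
    return hits_enabled_trap or new_flag_for_s0

# A flag already recorded in s0 with no enabled trap needs no recovery.
print(fchkf_takes_branch(0b000100, 0b000100, 0))   # prints False
print(fchkf_takes_branch(0b000100, 0b000000, 0))   # prints True
```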
  • Page 73: Floating-Point Class Relations

    fclass fclass — Floating-point Class ) fclass. Format: fcrel fctype p fclass The contents of FR are classified according to the completer as shown in Description: fclass Table 2-25. This produces a boolean result based on whether the contents of FR agrees with the floating-point number format specified by , as specified by the fclass...
  • Page 74 fclass Operation: if (PR[qp]) { if (p == p illegal_operation_fault(); if (tmp_isrcode = fp_reg_disabled(f , 0, 0, 0)) disabled_fp_register_fault(tmp_isrcode, 0); tmp_rel = ((fclass {0} && !FR[f ].sign || fclass {1} && FR[f ].sign) && ((fclass {2} && fp_is_zero(FR[f ]))|| (fclass {3} &&...
  • Page 75 fclrf — Floating-point Clear Flags. Format: (qp) fclrf.sf. Description: The status field’s 6-bit flags field is reset to zero. The mnemonic values for sf are given in Table 2-23 on page 3:56. Operation: if (PR[qp]) { fp_set_sf_flags(sf, 0); } FP Exceptions: None. Interruptions: None...
  • Page 76: Floating-Point Comparison Types

    fcmp — Floating-point Compare. Format: (qp) fcmp.frel.fctype.sf p₁, p₂ = f₂, f₃. Description: The two source operands are compared for one of twelve relations specified by frel. This produces a boolean result which is 1 if the comparison condition is true, and 0 otherwise.
  • Page 77 fcmp Operation: if (PR[qp]) { if (p == p illegal_operation_fault(); if (tmp_isrcode = fp_reg_disabled(f , 0, 0)) disabled_fp_register_fault(tmp_isrcode, 0); if (fp_is_natval(FR[f ]) || fp_is_natval(FR[f ])) { PR[p ] = 0; PR[p ] = 0; } else { fcmp_exception_fault_check(f , frel, sf, &tmp_fp_env); if (fp_raise_fault(tmp_fp_env)) fp_exception_fault(fp_decode_fault(tmp_fp_env));...
  • Page 78 fcmp FP Exceptions: Invalid Operation (V), Denormal/Unnormal Operand (D), Software Assist (SWA) fault. Interruptions: Illegal Operation fault, Disabled Floating-point Register fault, Floating-point Exception fault...
  • Page 79 fcvt.fx — Convert Floating-point to Integer. Format: (qp) fcvt.fx.sf f₁ = f₂ (signed_form); (qp) fcvt.fx.trunc.sf f₁ = f₂ (signed_form, trunc_form); (qp) fcvt.fxu.sf f₁ = f₂ (unsigned_form); (qp) fcvt.fxu.trunc.sf f₁ = f₂ (unsigned_form, trunc_form). Description: FR f₂ is treated as a register format floating-point value and converted to a signed (signed_form) or unsigned integer (unsigned_form) using either the rounding mode specified in the FPSR.sf.rc, or using Round-to-Zero if the trunc_form of the instruction is...
  • Page 80 fcvt.fx FP Exceptions: Invalid Operation (V), Denormal/Unnormal Operand (D), Software Assist (SWA) fault, Inexact (I). Interruptions: Illegal Operation fault, Disabled Floating-point Register fault, Floating-point Exception fault, Floating-point Exception trap...
  • Page 81 fcvt.xf — Convert Signed Integer to Floating-point. Format: (qp) fcvt.xf f₁ = f₂. Description: The 64-bit significand of FR f₂ is treated as a signed integer and its register file precision floating-point representation is placed in FR f₁. If FR f₂ is a NaTVal, FR f₁ is set to NaTVal instead of the computed result.
  • Page 82 fcvt.xuf — Convert Unsigned Integer to Floating-point. Format: (qp) fcvt.xuf.pc.sf f₁ = f₃ pseudo-op of: (qp) fma.pc.sf f₁ = f₃, f1, f0. Description: FR f₃ is multiplied with FR 1, rounded to the precision indicated by pc (and possibly FPSR.sf.pc and FPSR.sf.wre) using the rounding mode specified by FPSR.sf.rc, and placed in FR f₁. Note: Multiplying FR f₃ with FR 1 (a 1.0) normalizes the canonical representation of an...
  • Page 83: Fetch And Add Semaphore Types

    fetchadd — Fetch and Add Immediate. Format: (qp) fetchadd4.sem.ldhint r₁ = [r₃], inc₃ (four_byte_form); (qp) fetchadd8.sem.ldhint r₁ = [r₃], inc₃ (eight_byte_form). Description: A value consisting of four or eight bytes is read from memory starting at the address specified by the value in GR r₃.
  • Page 84 fetchadd Operation: if (PR[qp]) { check_target_register(r₁); if (GR[r₃].nat) register_nat_consumption_fault(SEMAPHORE); size = four_byte_form ? 4 : 8; paddr = tlb_translate(GR[r₃], size, SEMAPHORE, PSR.cpl, &mattr, &tmp_unused); if (!ma_supports_fetchadd(mattr)) unsupported_data_reference_fault(SEMAPHORE, GR[r₃]); if (sem == ‘acq’) val = mem_xchg_add(inc₃, paddr, size, UM.be, mattr, ACQUIRE, ldhint); else /* ‘rel’ */...
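The core of the fetchadd Operation above is the `mem_xchg_add` step: read the old value, store old + increment, and return the old value. The sketch below models only that step on a `bytearray`. It is single-threaded (so atomicity and the acquire/release semantics are not represented), it assumes little-endian byte order where the fragment selects the order via UM.be, and all names are hypothetical.

```python
def fetch_and_add(memory, addr, inc, size):
    """Add inc to the size-byte value at addr; return the previous value."""
    old = int.from_bytes(memory[addr:addr + size], "little")
    new = (old + inc) & ((1 << (8 * size)) - 1)    # wrap to the operand size
    memory[addr:addr + size] = new.to_bytes(size, "little")
    return old                                     # zero-extended old value

mem = bytearray(8)
print(fetch_and_add(mem, 0, 1, 4))    # prints 0
print(fetch_and_add(mem, 0, -1, 4))   # prints 1
print(mem[:4].hex())                  # prints 00000000
```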
  • Page 85 flushrs — Flush Register Stack. Format: flushrs. Description: All stacked general registers in the dirty partition of the register stack are written to the backing store before execution continues. The dirty partition contains registers from previous procedure frames that have not yet been saved to the backing store. For a description of the register stack partitions, refer to Chapter 6, “Register Stack Engine”...
  • Page 86 fma — Floating-point Multiply Add. Format: (qp) fma.pc.sf f₁ = f₃, f₄, f₂. Description: The product of FR f₃ and FR f₄ is computed to infinite precision and then FR f₂ is added to this product, again in infinite precision. The resulting value is then rounded to the precision indicated by pc (and possibly FPSR.sf.pc and FPSR.sf.wre) using the rounding mode specified by FPSR.sf.rc.
  • Page 87 Interruptions: Illegal Operation fault, Disabled Floating-point Register fault, Floating-point Exception fault, Floating-point Exception trap...
  • Page 88 fmax — Floating-point Maximum. Format: (qp) fmax.sf f₁ = f₂, f₃. Description: The operand with the larger value is placed in FR f₁. If FR f₂ equals FR f₃, FR f₁ gets FR f₃. If either FR f₂ or FR f₃ is a NaN, FR f₁ gets FR f₃. If either FR f₂ or FR f₃ is a NaTVal, FR...
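The point of computing the product and the sum "to infinite precision" before a single rounding can be seen by comparing it with a multiply followed by a separate add. The sketch below uses exact rational arithmetic and then rounds once to a Python float. That rounding target (IEEE double, round to nearest) is an assumption of the example and stands in for the pc/FPSR-controlled rounding described above; the function name is hypothetical and inputs must be finite.

```python
from fractions import Fraction

def fused_multiply_add(a, b, c):
    """Compute a*b + c exactly, then round once to a double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

a = 1.0 + 2.0 ** -52
b = 1.0 - 2.0 ** -52
c = -1.0

# Exact product is 1 - 2**-104; a separate multiply rounds it to 1.0 first.
print(a * b + c)                                         # prints 0.0
print(fused_multiply_add(a, b, c) == -(2.0 ** -104))     # prints True
```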
  • Page 89: Floating-Point Merge Negative Sign Operation

    fmerge — Floating-point Merge. Format: (qp) fmerge.ns f₁ = f₂, f₃ (neg_sign_form); (qp) fmerge.s f₁ = f₂, f₃ (sign_form); (qp) fmerge.se f₁ = f₂, f₃ (sign_exp_form). Description: Sign, exponent and significand fields are extracted from FR f₂ and FR f₃, combined, and the result is placed in FR f₁. For the neg_sign_form, the sign of FR f₂ is negated and concatenated with the exponent and the significand of FR f₃.
  • Page 90 fmerge Operation: if (PR[qp]) { fp_check_target_register(f if (tmp_isrcode = fp_reg_disabled(f , 0)) disabled_fp_register_fault(tmp_isrcode, 0); if (fp_is_natval(FR[f ]) || fp_is_natval(FR[f ])) { FR[f ] = NATVAL; } else { FR[f ].significand = FR[f ].significand; if (neg_sign_form) { FR[f ].exponent = FR[f ].exponent;...
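The three fmerge forms differ only in which fields come from which source. The sketch below assumes the first source supplies the sign (and, for the sign_exp form, the exponent) and the second supplies the rest, which matches the pseudo-op listings elsewhere in this document where fabs passes f0 as the first source. Register values are modeled as a hypothetical named tuple and NaTVal handling is omitted.

```python
from collections import namedtuple

FReg = namedtuple("FReg", "sign exponent significand")

def fmerge(form, first, second):
    """Combine fields of two register values; form is 'ns', 's' or 'se'."""
    if form == "ns":
        return FReg(1 - first.sign, second.exponent, second.significand)
    if form == "s":
        return FReg(first.sign, second.exponent, second.significand)
    if form == "se":
        return FReg(first.sign, first.exponent, second.significand)
    raise ValueError("unknown form: " + form)

PLUS_ZERO = FReg(0, 0, 0)                  # stands in for register f0
x = FReg(1, 0x10000, 1 << 63)              # some negative value

print(fmerge("s", PLUS_ZERO, x).sign)      # absolute value: prints 0
print(fmerge("ns", x, x).sign)             # negate: prints 0
print(fmerge("ns", PLUS_ZERO, x).sign)     # negated absolute value: prints 1
```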
  • Page 91 fmin — Floating-point Minimum. Format: (qp) fmin.sf f₁ = f₂, f₃. Description: The operand with the smaller value is placed in FR f₁. If FR f₂ equals FR f₃, FR f₁ gets FR f₃. If either FR f₂ or FR f₃ is a NaN, FR f₁ gets FR f₃. If either FR f₂ or FR f₃ is a NaTVal, FR...
  • Page 92: Floating-Point Mix Left

    fmix — Floating-point Mix. Format: (qp) fmix.l f₁ = f₂, f₃ (mix_l_form); (qp) fmix.r f₁ = f₂, f₃ (mix_r_form); (qp) fmix.lr f₁ = f₂, f₃ (mix_lr_form). Description: For the mix_l_form (mix_r_form), the left (right) single precision value in FR f₂ is concatenated with the left (right) single precision value in FR f₃. For the mix_lr_form, the left single precision value in FR f₂ is concatenated with the right single precision value in FR...
  • Page 93 fmix Operation: if (PR[qp]) { fp_check_target_register(f if (tmp_isrcode = fp_reg_disabled(f , 0)) disabled_fp_register_fault(tmp_isrcode, 0); if (fp_is_natval(FR[f ]) || fp_is_natval(FR[f ])) { FR[f ] = NATVAL; } else { if (mix_l_form) { tmp_res_hi = FR[f ].significand{63:32}; tmp_res_lo = FR[f ].significand{63:32}; } else if (mix_r_form) { tmp_res_hi = FR[f ].significand{31:0};...
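The half-word selection in the fmix fragment above can be written out on 64-bit significands. In this sketch the "left" value is bits {63:32} and the "right" value is bits {31:0}, as in the fragment; the function name is hypothetical, the first argument is assumed to supply the high half of the result, and NaTVal and register-disable checks are omitted.

```python
M32 = 0xFFFFFFFF

def fmix(form, sig_a, sig_b):
    """Build a 64-bit significand from halves of two source significands."""
    left_a, right_a = (sig_a >> 32) & M32, sig_a & M32
    left_b, right_b = (sig_b >> 32) & M32, sig_b & M32
    if form == "l":
        hi, lo = left_a, left_b
    elif form == "r":
        hi, lo = right_a, right_b
    elif form == "lr":
        hi, lo = left_a, right_b
    else:
        raise ValueError("unknown form: " + form)
    return (hi << 32) | lo

a, b = 0xAAAAAAAABBBBBBBB, 0xCCCCCCCCDDDDDDDD
print(hex(fmix("l", a, b)))    # prints 0xaaaaaaaacccccccc
print(hex(fmix("lr", a, b)))   # prints 0xaaaaaaaadddddddd
```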
  • Page 94 fmpy fmpy — Floating-point Multiply ) fmpy. pseudo-op of: ( ) fma. , f0 Format: sf f sf f The product FR and FR is computed to infinite precision. The resulting value is then Description: rounded to the precision indicated by pc (and possibly FPSR.sf.pc and FPSR.sf.wre) using the rounding mode specified by FPSR.sf.rc.
  • Page 95 fms — Floating-point Multiply Subtract ) fms. Format: sf f The product of FR and FR is computed to infinite precision and then FR Description: subtracted from this product, again in infinite precision. The resulting value is then rounded to the precision indicated by pc (and possibly FPSR.sf.pc and FPSR.sf.wre) using the rounding mode specified by FPSR.sf.rc.
  • Page 96 Illegal Operation fault Floating-point Exception fault Interruptions: Disabled Floating-point Register fault Floating-point Exception trap Volume 3: Instruction Reference 3:87...
  • Page 97 fneg — Floating-point Negate. Format: (qp) fneg f1 = f3, pseudo-op of: (qp) fmerge.ns f1 = f3, f3. Description: The value in FR f3 is negated and placed in FR f1. If FR f3 is a NaTVal, FR f1 is set to NaTVal instead of the computed result. Operation: See “fmerge —...
  • Page 98 fnegabs — Floating-point Negate Absolute Value. Format: (qp) fnegabs f1 = f3, pseudo-op of: (qp) fmerge.ns f1 = f0, f3. Description: The absolute value of the value in FR f3 is computed, negated, and placed in FR f1. If FR f3 is a NaTVal, FR f1 is set to NaTVal instead of the computed result. Operation: See “fmerge —...
  • Page 99 fnma — Floating-point Negative Multiply Add. Format: (qp) fnma.pc.sf f1 = f3, f4, f2. Description: The product of FR f3 and FR f4 is computed to infinite precision, negated, and then FR f2 is added to this product, again in infinite precision. The resulting value is then rounded to the precision indicated by pc (and possibly FPSR.sf.pc and FPSR.sf.wre) using the rounding mode specified by FPSR.sf.rc....
  • Page 100 fnma Interruptions: Illegal Operation fault, Floating-point Exception fault, Disabled Floating-point Register fault, Floating-point Exception trap.
  • Page 101 fnmpy — Floating-point Negative Multiply. Format: (qp) fnmpy.pc.sf f1 = f3, f4, pseudo-op of: (qp) fnma.pc.sf f1 = f3, f4, f0. Description: The product of FR f3 and FR f4 is computed to infinite precision and then negated. The resulting value is then rounded to the precision indicated by pc (and possibly FPSR.sf.pc and FPSR.sf.wre) using the rounding mode specified by FPSR.sf.rc....
  • Page 102 fnorm — Floating-point Normalize. Format: (qp) fnorm.pc.sf f1 = f3, pseudo-op of: (qp) fma.pc.sf f1 = f3, f1, f0. Description: FR f3 is normalized and rounded to the precision indicated by pc (and possibly FPSR.sf.pc and FPSR.sf.wre) using the rounding mode specified by FPSR.sf.rc, and placed in FR f1. If FR f3 is a NaTVal, FR f1...
  • Page 103 for — Floating-point Logical Or ) for Format: The bit-wise logical OR of the significand fields of FR and FR is computed. The Description: resulting value is stored in the significand field of FR . The exponent field of FR is set to the biased exponent for 2.0 (0x1003E) and the sign field of FR...
  • Page 104 fpabs fpabs — Floating-point Parallel Absolute Value ) fpabs pseudo-op of: ( ) fpmerge.s = f0, Format: The absolute values of the pair of single precision values in the significand field of FR Description: are computed and stored in the significand field of FR .
  • Page 105: Floating-Point Pack

    fpack fpack — Floating-point Pack ) fpack pack_form Format: The register format numbers in FR and FR are converted to single precision memory Description: format. These two single precision numbers are concatenated and stored in the significand field of FR .
  • Page 106 fpamax — Floating-point Parallel Absolute Maximum. Format: (qp) fpamax.sf f1 = f2, f3. Description: The paired single precision values in the significands of FR f2 and FR f3 are compared. The operands with the larger absolute value are returned in the significand field of FR f1. If the magnitude of high (low) FR f3 is less than the magnitude of high (low) FR f2, high...
  • Page 107 fpamax FP Exceptions: Invalid Operation (V), Denormal/Unnormal Operand (D), Software Assist (SWA) fault. Interruptions: Illegal Operation fault, Floating-point Exception fault, Disabled Floating-point Register fault.
  • Page 108 fpamin — Floating-point Parallel Absolute Minimum. Format: (qp) fpamin.sf f1 = f2, f3. Description: The paired single precision values in the significands of FR f2 and FR f3 are compared. The operands with the smaller absolute value are returned in the significand of FR f1. If the magnitude of high (low) FR f2 is less than the magnitude of high (low) FR f3, high...
  • Page 109 fpamin FP Exceptions: Invalid Operation (V), Denormal/Unnormal Operand (D), Software Assist (SWA) fault. Interruptions: Illegal Operation fault, Floating-point Exception fault, Disabled Floating-point Register fault.
  • Page 110: Floating-Point Parallel Comparison Results

    fpcmp fpcmp — Floating-point Parallel Compare ) fpcmp. Format: frel sf f The two pairs of single precision source operands in the significand fields of FR and FR Description: are compared for one of twelve relations specified by frel. This produces a boolean result which is a mask of 32 1’s if the comparison condition is true, and a mask of 32 0’s otherwise.
  • Page 111 fpcmp Operation: if (PR[qp]) { fp_check_target_register(f if (tmp_isrcode = fp_reg_disabled(f , 0)) disabled_fp_register_fault(tmp_isrcode, 0); if (fp_is_natval(FR[f ]) || fp_is_natval(FR[f ])) { FR[f ] = NATVAL; } else { fpcmp_exception_fault_check(f , frel, sf, &tmp_fp_env); if (fp_raise_fault(tmp_fp_env)) fp_exception_fault(fp_decode_fault(tmp_fp_env)); tmp_fr2 = fp_reg_read_hi(f tmp_fr3 = fp_reg_read_hi(f (frel == ‘eq’) tmp_rel = fp_equal(tmp_fr2, tmp_fr3);...
  • Page 112 fpcmp tmp_res_lo = (tmp_rel ? 0xFFFFFFFF : 0x00000000); FR[f ].significand = fp_concatenate(tmp_res_hi, tmp_res_lo); FR[f ].exponent = FP_INTEGER_EXP; FR[f ].sign = FP_SIGN_POSITIVE; fp_update_fpsr(sf, tmp_fp_env); fp_update_psr(f Invalid Operation (V) FP Exceptions: Denormal/Unnormal Operand (D) Software Assist (SWA) fault Illegal Operation fault Floating-point Exception fault Interruptions: Disabled Floating-point Register fault Volume 3: Instruction Reference...
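The mask-producing behavior of fpcmp can be illustrated with a small Python sketch. It is a simplified model under stated assumptions: the helper names (`pack_pair`, `unpack_pair`, `fpcmp`) are hypothetical, only four relations are modeled (`eq` is the one visible in the excerpt above; `lt`, `le` and `unord` are included for illustration), and NaTVal handling, faults and FPSR updates are omitted.

```python
import math
import struct

MASK32 = 0xFFFFFFFF

def pack_pair(hi, lo):
    """Pack two Python floats as single precision values into 64 bits."""
    hi_bits = struct.unpack(">I", struct.pack(">f", hi))[0]
    lo_bits = struct.unpack(">I", struct.pack(">f", lo))[0]
    return (hi_bits << 32) | lo_bits

def unpack_pair(sig):
    """Split a 64-bit significand into its two single precision values."""
    hi = struct.unpack(">f", struct.pack(">I", (sig >> 32) & MASK32))[0]
    lo = struct.unpack(">f", struct.pack(">I", sig & MASK32))[0]
    return hi, lo

# Illustrative subset of the twelve relations.
RELATIONS = {
    "eq": lambda a, b: a == b,
    "lt": lambda a, b: a < b,
    "le": lambda a, b: a <= b,
    "unord": lambda a, b: math.isnan(a) or math.isnan(b),
}

def fpcmp(frel, sig_a, sig_b):
    """Return 32 ones or 32 zeros per half, depending on the relation."""
    rel = RELATIONS[frel]
    a_hi, a_lo = unpack_pair(sig_a)
    b_hi, b_lo = unpack_pair(sig_b)
    res_hi = MASK32 if rel(a_hi, b_hi) else 0
    res_lo = MASK32 if rel(a_lo, b_lo) else 0
    return (res_hi << 32) | res_lo
```

Because each half yields an all-ones or all-zeros mask, the result can be fed directly to a bitwise select to choose between two pairs of values without branching.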
  • Page 113 fpcvt.fx fpcvt.fx — Convert Parallel Floating-point to Integer ) fpcvt.fx. signed_form Format: sf f ) fpcvt.fx.trunc. signed_form, trunc_form sf f ) fpcvt.fxu. unsigned_form sf f ) fpcvt.fxu.trunc. unsigned_form, trunc_form sf f The pair of single precision values in the significand field of FR is converted to a pair Description: of 32-bit signed integers (signed_form) or unsigned integers (unsigned_form) using...
  • Page 114 fpcvt.fx Operation: if (PR[qp]) { fp_check_target_register(f if (tmp_isrcode = fp_reg_disabled(f , 0, 0)) disabled_fp_register_fault(tmp_isrcode, 0); if (fp_is_natval(FR[f ])) { FR[f ] = NATVAL; fp_update_psr(f } else { tmp_default_result_pair = fpcvt_exception_fault_check(f signed_form, trunc_form, sf, &tmp_fp_env); if (fp_raise_fault(tmp_fp_env)) fp_exception_fault(fp_decode_fault(tmp_fp_env)); if (fp_is_nan(tmp_default_result_pair.hi)) { tmp_res_hi = INTEGER_INDEFINITE_32_BIT;...
  • Page 115 fpcvt.fx Interruptions: Illegal Operation fault, Floating-point Exception fault, Disabled Floating-point Register fault, Floating-point Exception trap.
  • Page 116 fpma fpma — Floating-point Parallel Multiply Add ) fpma. Format: sf f The pair of products of the pairs of single precision values in the significand fields of FR Description: and FR are computed to infinite precision and then the pair of single precision values in the significand field of FR is added to these products, again in infinite precision.
  • Page 117 fpma Operation: if (PR[qp]) { fp_check_target_register(f if (tmp_isrcode = fp_reg_disabled(f disabled_fp_register_fault(tmp_isrcode, 0); if (fp_is_natval(FR[f ]) || fp_is_natval(FR[f ]) || fp_is_natval(FR[f ])) { FR[f ] = NATVAL; fp_update_psr(f } else { tmp_default_result_pair = fpma_exception_fault_check(f , sf, &tmp_fp_env); if (fp_raise_fault(tmp_fp_env)) fp_exception_fault(fp_decode_fault(tmp_fp_env)); if (fp_is_nan_or_inf(tmp_default_result_pair.hi)) { tmp_res_hi = fp_single(tmp_default_result_pair.hi);...
  • Page 118 fpmax — Floating-point Parallel Maximum. Format: (qp) fpmax.sf f1 = f2, f3. Description: The paired single precision values in the significands of FR f2 and FR f3 are compared. The operands with the larger value are returned in the significand of FR f1. If the value of high (low) FR f3 is less than the value of high (low) FR f2, high (low) FR f1 gets high (low) FR f2...
  • Page 119 fpmax Interruptions: Illegal Operation fault, Floating-point Exception fault, Disabled Floating-point Register fault.
  • Page 120: Floating-Point Parallel Merge Negative Sign Operation

    fpmerge fpmerge — Floating-point Parallel Merge ) fpmerge.ns neg_sign_form Format: ) fpmerge.s sign_form ) fpmerge.se sign_exp_form For the neg_sign_form, the signs of the pair of single precision values in the significand Description: field of FR are negated and concatenated with the exponents and the significands of the pair of single precision values in the significand field of FR and stored in the significand field of FR...
  • Page 121: Floating-Point Parallel Merge Sign And Exponent Operation

    fpmerge Figure 2-17. Floating-point Parallel Merge Sign and Exponent Operation 64 63 55 54 32 31 23 22 64 63 55 54 32 31 23 22 64 63 55 54 32 31 23 22 1003E Operation: if (PR[qp]) { fp_check_target_register(f if (tmp_isrcode = fp_reg_disabled(f , 0)) disabled_fp_register_fault(tmp_isrcode, 0);...
  • Page 122 fpmin — Floating-point Parallel Minimum. Format: (qp) fpmin.sf f1 = f2, f3. Description: The paired single precision values in the significands of FR f2 and FR f3 are compared. The operands with the smaller value are returned in the significand of FR f1. If the value of high (low) FR f2 is less than the value of high (low) FR f3, high (low) FR f1 gets high (low) FR f2...
  • Page 123 fpmin Interruptions: Illegal Operation fault, Floating-point Exception fault, Disabled Floating-point Register fault.
  • Page 124 fpmpy fpmpy — Floating-point Parallel Multiply ) fpmpy. pseudo-op of: ( ) fpma. , f0 Format: sf f sf f The pair of products of the pairs of single precision values in the significand fields of FR Description: and FR are computed to infinite precision.
  • Page 125 fpms fpms — Floating-point Parallel Multiply Subtract ) fpms. Format: sf f The pair of products of the pairs of single precision values in the significand fields of FR Description: and FR are computed to infinite precision and then the pair of single precision values in the significand field of FR is subtracted from these products, again in infinite precision.
  • Page 126 fpms tmp_res_lo = fp_ieee_round_sp(tmp_res, LOW, &tmp_fp_env); FR[f ].significand = fp_concatenate(tmp_res_hi, tmp_res_lo); FR[f ].exponent = FP_INTEGER_EXP; FR[f ].sign = FP_SIGN_POSITIVE; fp_update_fpsr(sf, tmp_fp_env); fp_update_psr(f if (fp_raise_traps(tmp_fp_env)) fp_exception_trap(fp_decode_trap(tmp_fp_env)); Invalid Operation (V) Underflow (U) FP Exceptions: Denormal/Unnormal Operand (D) Overflow (O) Software Assist (SWA) fault Inexact (I) Software Assist (SWA) trap Illegal Operation fault...
  • Page 127 fpneg fpneg — Floating-point Parallel Negate ) fpneg pseudo-op of: ( ) fpmerge.ns Format: The pair of single precision values in the significand field of FR are negated and stored Description: in the significand field of FR . The exponent field of FR is set to the biased exponent for 2.0 (0x1003E) and the sign field of FR...
  • Page 128 fpnegabs fpnegabs — Floating-point Parallel Negate Absolute Value ) fpnegabs pseudo-op of: ( ) fpmerge.ns = f0, Format: The absolute values of the pair of single precision values in the significand field of FR Description: are computed, negated and stored in the significand field of FR .
  • Page 129 fpnma fpnma — Floating-point Parallel Negative Multiply Add ) fpnma. Format: sf f The pair of products of the pairs of single precision values in the significand fields of FR Description: and FR are computed to infinite precision, negated, and then the pair of single precision values in the significand field of FR are added to these (negated) products, again in infinite precision.
  • Page 130 fpnma Operation: if (PR[qp]) { fp_check_target_register(f if (tmp_isrcode = fp_reg_disabled(f disabled_fp_register_fault(tmp_isrcode, 0); if (fp_is_natval(FR[f ]) || fp_is_natval(FR[f ]) || fp_is_natval(FR[f ])) { FR[f ] = NATVAL; fp_update_psr(f } else { tmp_default_result_pair = fpms_fpnma_exception_fault_check(f , sf, &tmp_fp_env); if (fp_raise_fault(tmp_fp_env)) fp_exception_fault(fp_decode_fault(tmp_fp_env)); if (fp_is_nan_or_inf(tmp_default_result_pair.hi)) { tmp_res_hi = fp_single(tmp_default_result_pair.hi);...
  • Page 131 fpnmpy fpnmpy — Floating-point Parallel Negative Multiply ) fpnmpy. pseudo-op of: ( ) fpnma. Format: sf f sf f The pair of products of the pairs of single precision values in the significand fields of FR Description: and FR are computed to infinite precision and then negated. The resulting values are then rounded to single precision using the rounding mode specified by FPSR.sf.rc.
  • Page 132 fprcpa fprcpa — Floating-point Parallel Reciprocal Approximation ) fprcpa. Format: sf f If PR is 0, PR is cleared and FR remains unchanged. Description: If PR is 1, the following will occur: • Each half of the significand of FR is either set to an approximation (with a relative -8.886 error <...
  • Page 133 fprcpa tmp_res = FP_ZERO; tmp_res.sign = num.sign ^ den.sign; tmp_pred_hi = 0; } else { tmp_res = fp_ieee_recip(den); if (limits_check.hi_fr2_or_quot) tmp_pred_hi = 0; else tmp_pred_hi = 1; tmp_res_hi = fp_single(tmp_res); if (fp_is_nan_or_inf(tmp_default_result_pair.lo) || limits_check.lo_fr3) { tmp_res_lo = fp_single(tmp_default_result_pair.lo); tmp_pred_lo = 0; } else { num = fp_normalize(fp_reg_read_lo(f den = fp_normalize(fp_reg_read_lo(f...
  • Page 134 fprcpa Denormal/Unnormal Operand (D) Software Assist (SWA) fault Illegal Operation fault Floating-point Exception fault Interruptions: Disabled Floating-point Register fault Volume 3: Instruction Reference 3:125...
  • Page 135 fprsqrta fprsqrta — Floating-point Parallel Reciprocal Square Root Approximation Format: ) fprsqrta. sf f If PR is 0, PR is cleared and FR remains unchanged. Description: If PR is 1, the following will occur: • Each half of the significand of FR is either set to an approximation (with a relative -8.831 error <...
  • Page 136 fprsqrta if (limits_check.hi) tmp_pred_hi = 0; else tmp_pred_hi = 1; tmp_res_hi = fp_single(tmp_res); if (fp_is_nan(tmp_default_result_pair.lo)) { tmp_res_lo = fp_single(tmp_default_result_pair.lo); tmp_pred_lo = 0; } else { tmp_fr3 = fp_normalize(fp_reg_read_lo(f if (fp_is_zero(tmp_fr3)) { tmp_res = FP_INFINITY; tmp_res.sign = tmp_fr3.sign; tmp_pred_lo = 0; } else if (fp_is_pos_inf(tmp_fr3)) { tmp_res = FP_ZERO;...
  • Page 137 frcpa frcpa — Floating-point Reciprocal Approximation ) frcpa. Format: sf f If PR is 0, PR is cleared and FR remains unchanged. Description: If PR is 1, the following will occur: -8.886 • FR is either set to an approximation (with a relative error < 2 ) of the reciprocal of FR , or to the IEEE-754 mandated quotient of FR...
  • Page 138 frcpa } else { FR[f ] = fp_ieee_recip(den); PR[p ] = 1; fp_update_fpsr(sf, tmp_fp_env); fp_update_psr(f } else { PR[p ] = 0; // fp_ieee_recip() fp_ieee_recip(den) RECIP_TABLE[256] = { 0x3fc, 0x3f4, 0x3ec, 0x3e4, 0x3dd, 0x3d5, 0x3cd, 0x3c6, 0x3be, 0x3b7, 0x3af, 0x3a8, 0x3a1, 0x399, 0x392, 0x38b, 0x384, 0x37d, 0x376, 0x36f, 0x368, 0x361, 0x35b, 0x354, 0x34d, 0x346, 0x340, 0x339, 0x333, 0x32c, 0x326, 0x320, 0x319, 0x313, 0x30d, 0x307, 0x300, 0x2fa, 0x2f4, 0x2ee,...
  • Page 139 frcpa return (tmp_res); Invalid Operation (V) FP Exceptions: Zero Divide (Z) Denormal/Unnormal Operand (D) Software Assist (SWA) fault Illegal Operation fault Floating-point Exception fault Interruptions: Disabled Floating-point Register fault 3:130 Volume 3: Instruction Reference...
  • Page 140 frsqrta frsqrta — Floating-point Reciprocal Square Root Approximation ) frsqrta. Format: sf f If PR is 0, PR is cleared and FR remains unchanged. Description: If PR is 1, the following will occur: -8.831 • FR is either set to an approximation (with a relative error < 2 ) of the reciprocal square root of FR , or set to the IEEE-754 mandated square root of FR...
  • Page 141 frsqrta fp_update_psr(f } else { PR[p ] = 0; // fp_ieee_recip_sqrt() fp_ieee_recip_sqrt(root) RECIP_SQRT_TABLE[256] = { 0x1a5, 0x1a0, 0x19a, 0x195, 0x18f, 0x18a, 0x185, 0x180, 0x17a, 0x175, 0x170, 0x16b, 0x166, 0x161, 0x15d, 0x158, 0x153, 0x14e, 0x14a, 0x145, 0x140, 0x13c, 0x138, 0x133, 0x12f, 0x12a, 0x126, 0x122, 0x11e, 0x11a, 0x115, 0x111, 0x10d, 0x109, 0x105, 0x101, 0x0fd, 0x0fa, 0x0f6, 0x0f2, 0x0ee, 0x0ea, 0x0e7, 0x0e3, 0x0df, 0x0dc, 0x0d8, 0x0d5, 0x0d1, 0x0ce, 0x0ca, 0x0c7, 0x0c3, 0x0c0, 0x0bd, 0x0b9,...
  • Page 142 frsqrta Interruptions: Illegal Operation fault, Floating-point Exception fault, Disabled Floating-point Register fault.
  • Page 143 fselect fselect — Floating-point Select ) fselect Format: The significand field of FR is logically AND-ed with the significand field of FR and the Description: significand field of FR is logically AND-ed with the one’s complement of the significand field of FR .
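The two AND operations described for fselect amount to a bitwise select under a mask. The sketch below shows that idea in Python. The excerpt is cut off before it says how the two partial results are combined, so the final OR is an assumption; the function and parameter names are hypothetical, and NaTVal handling and the exponent and sign fields are ignored.

```python
MASK64 = (1 << 64) - 1

def fselect(mask, sig_if_set, sig_if_clear):
    """Bitwise select between two 64-bit significands.

    Bits where mask is 1 come from sig_if_set; bits where mask is 0
    come from sig_if_clear. Combining the two halves with OR is an
    assumption, since the excerpt is truncated at that point.
    """
    kept = sig_if_set & mask
    other = sig_if_clear & (~mask & MASK64)  # one's complement of the mask
    return kept | other
```

With an all-ones upper half and all-zeros lower half as the mask, the result takes its upper 32 bits from `sig_if_set` and its lower 32 bits from `sig_if_clear`.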
  • Page 144 fsetc fsetc — Floating-point Set Controls ) fsetc. Format: sf amask , omask The status field’s control bits are initialized to the value obtained by logically AND-ing Description: the sf0.controls and immediate field and logically OR-ing the immediate amask omask field.
  • Page 145 fsub — Floating-point Subtract. Format: (qp) fsub.pc.sf f1 = f3, f2, pseudo-op of: (qp) fms.pc.sf f1 = f3, f1, f2. Description: FR f2 is subtracted from FR f3 (computed to infinite precision), rounded to the precision indicated by pc (and possibly FPSR.sf.pc and FPSR.sf.wre) using the rounding mode specified by FPSR.sf.rc, and placed in FR f1. If either FR f3 or FR f2...
  • Page 146: Floating-Point Swap

    fswap fswap — Floating-point Swap ) fswap swap_form Format: ) fswap.nl swap_nl_form ) fswap.nr swap_nr_form For the swap_form, the left single precision value in FR is concatenated with the right Description: single precision value in FR . The concatenated pair is then swapped. For the swap_nl_form, the left single precision value in FR is concatenated with the right single precision value in FR...
  • Page 147: Floating-Point Swap Negate Right

    fswap Figure 2-20. Floating-point Swap Negate Right 64 63 Negated Sign Bit 64 63 32 31 1003E Operation: if (PR[qp]) { fp_check_target_register(f if (tmp_isrcode = fp_reg_disabled(f , 0)) disabled_fp_register_fault(tmp_isrcode, 0); if (fp_is_natval(FR[f ]) || fp_is_natval(FR[f ])) { FR[f ] = NATVAL; } else { if (swap_form) { tmp_res_hi = FR[f...
  • Page 148: Floating-Point Sign Extend Left

    fsxt fsxt — Floating-point Sign Extend ) fsxt.l sxt_l_form Format: ) fsxt.r sxt_r_form For the sxt_l_form (sxt_r_form), the sign of the left (right) single precision value in FR Description: is extended to 32-bits and is concatenated with the left (right) single precision value in FR For all forms, the exponent field of FR is set to the biased exponent for 2.0...
  • Page 149 fsxt Operation: if (PR[qp]) { fp_check_target_register(f if (tmp_isrcode = fp_reg_disabled(f , 0)) disabled_fp_register_fault(tmp_isrcode, 0); if (fp_is_natval(FR[f ]) || fp_is_natval(FR[f ])) { FR[f ] = NATVAL; } else { if (sxt_l_form) { tmp_res_hi = (FR[f ].significand{63} ? 0xFFFFFFFF : 0x00000000); tmp_res_lo = FR[f ].significand{63:32};...
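A short Python sketch of the fsxt sign extension follows. The left form mirrors the pseudocode shown above (bit 63 selects 0xFFFFFFFF or 0x00000000 for the high half, and bits 63:32 become the low half). The right form is elided in the excerpt, so using bit 31 as the sign and bits 31:0 as the value is an assumption based on the description. The names `fsxt`, `sig_sign_src` and `sig_value_src` are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def fsxt(form, sig_sign_src, sig_value_src):
    """Return the 64-bit result significand for fsxt.l or fsxt.r.

    sig_sign_src supplies the sign bit that is replicated into the
    high 32 bits; sig_value_src supplies the low 32 bits.
    """
    if form == "l":
        sign_bit = (sig_sign_src >> 63) & 1      # sign of the left value
        value = (sig_value_src >> 32) & MASK32   # left single precision value
    elif form == "r":
        # Assumed by analogy with the left form; not shown in the excerpt.
        sign_bit = (sig_sign_src >> 31) & 1      # sign of the right value
        value = sig_value_src & MASK32           # right single precision value
    else:
        raise ValueError("unknown fsxt form: " + form)
    hi = MASK32 if sign_bit else 0
    return (hi << 32) | value
```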
  • Page 150 fwb — Flush Write Buffers. Format: (qp) fwb. Description: The processor is instructed to expedite flushing of any pending stores held in write or coalescing buffers. Since this operation is a hint, the processor may or may not take any action and actually flush any outstanding stores. The processor gives no indication when flushing of any prior stores is completed....
  • Page 151 fxor fxor — Floating-point Exclusive Or ) fxor Format: The bit-wise logical exclusive-OR of the significand fields of FR and FR is computed. Description: The resulting value is stored in the significand field of FR . The exponent field of FR is set to the biased exponent for 2.0 (0x1003E) and the sign field of FR is set to...
  • Page 152: Function Of Getf.exp

    getf getf — Get Floating-point Value or Exponent or Significand ) getf.s single_form Format: ) getf.d double_form ) getf.exp exponent_form ) getf.sig significand_form In the single and double forms, the value in FR is converted into a single precision Description: (single_form) or double precision (double_form) memory representation and placed in , as shown in Figure 5-7...
  • Page 153 getf Operation: if (PR[qp]) { check_target_register(r if (tmp_isrcode = fp_reg_disabled(f , 0, 0, 0)) disabled_fp_register_fault(tmp_isrcode, 0); if (single_form) { GR[r ]{31:0} = fp_fr_to_mem_format(FR[f ], 4, 0); GR[r ]{63:32} = 0; } else if (double_form) { GR[r ] = fp_fr_to_mem_format(FR[f ], 8, 0); } else if (exponent_form) { GR[r ]{63:18} = 0;...
  • Page 154: Hint Immediates

    hint hint — Performance Hint ) hint pseudo-op Format: (qp) hint.i i_unit_form (qp) hint.b b_unit_form (qp) hint.m m_unit_form (qp) hint.f f_unit_form (qp) hint.x x_unit_form Provides a performance hint to the processor about the program being executed. It has Description: no effect on architectural machine state, and operates as a nop instruction except for its performance effects.
  • Page 155 invala invala — Invalidate ALAT ) invala complete_form Format: (qp) invala.e r gr_form, entry_form (qp) invala.e f fr_form, entry_form The selected entry or entries in the ALAT are invalidated. Description: In the complete_form, all ALAT entries are invalidated. In the entry_form, the ALAT is queried using the general register specifier r (gr_form), or the floating-point register specifier f...
  • Page 156 itc — Insert Translation Cache ) itc.i instruction_form Format: ) itc.d data_form An entry is inserted into the instruction or data translation cache. GR specifies the Description: physical address portion of the translation. ITIR specifies the protection key, page size and additional information.
  • Page 157 Operation: if (PR[qp]) { if (!followed_by_stop()) undefined_behavior(); if (PSR.ic) illegal_operation_fault(); if (PSR.cpl != 0) privileged_operation_fault(0); if (GR[r ].nat) register_nat_consumption_fault(0); tmp_size = CR[ITIR].ps; tmp_va = CR[IFA]{60:0}; tmp_rid = RR[CR[IFA]{63:61}].rid; tmp_va = align_to_size_boundary(tmp_va, tmp_size); if (is_reserved_field(TLB_TYPE, GR[r ], CR[ITIR])) reserved_register_field_fault(); if (!impl_check_mov_ifa() && unimplemented_virtual_address(CR[IFA], PSR.vm)) unimplemented_data_address_fault(0);...
  • Page 158 itr — Insert Translation Register ) itr.i itr[ instruction_form Format: ) itr.d dtr[ data_form A translation is inserted into the instruction or data translation register specified by the Description: contents of GR . GR specifies the physical address portion of the translation. ITIR specifies the protection key, page size and additional information.
  • Page 159 Machine Check abort Reserved Register/Field fault Interruptions: Illegal Operation fault Unimplemented Data Address fault Privileged Operation fault Virtualization fault Register NaT Consumption fault For the instruction_form, software must issue an instruction serialization operation Serialization: before a dependent instruction fetch access. For the data_form, software must issue a data serialization operation before issuing a data access or non-access reference dependent on the new translation.
  • Page 160: Sz Completers

    ld — Load ) ld no_base_update_form Format: ldtype ldhint r ) ld reg_base_update_form ldtype ldhint r ) ld imm_base_update_form ldtype ldhint r ) ld16. , ar.csd = [ sixteen_byte_form, no_base_update_form ldhint r ) ld16.acq. , ar.csd = [ sixteen_byte_form, acquire_form, ldhint r no_base_update_form ) ld8.fill.
  • Page 161: Load Hints

    Table 2-33. Load Types (Continued) ldtype Interpretation Special Load Operation Completer Speculative An entry is added to the ALAT, and certain exceptions may be deferred. Advanced load Deferral causes the target register’s NaT bit to be set, and the processor ensures that no ALAT entry exists for the target register. The absence of an ALAT entry is later used to detect deferral or collision.
  • Page 162 Table 2-34. Load Hints (Continued) ldhint Completer Interpretation No temporal locality, level 1 No temporal locality, all levels In the no_base_update form, the value in GR r is not modified and no prefetch hint is implied. For the base update forms, specifying the same register address in r and r will cause an Illegal Operation fault.
  • Page 163 Operation: if (PR[qp]) { size = fill_form ? 8 : (sixteen_byte_form ? 16 : sz); speculative = (ldtype == ‘s’ || ldtype == ‘sa’); advanced = (ldtype == ‘a’ || ldtype == ‘sa’); check_clear = (ldtype == ‘c.clr’ || ldtype == ‘c.clr.acq’); check_no_clear = (ldtype == ‘c.nc’);...
  • Page 164 val = mem_read(paddr, size, UM.be, mattr, otype, bias | ldhint); if (check_clear || advanced) // remove any old ALAT entry alat_inval_single_entry(GENERAL, r if (defer) { if (speculative) { GR[r ] = natd_gr_read(paddr, size, UM.be, mattr, otype, bias | ldhint); GR[r ].nat = 1;...
  • Page 165 Illegal Operation fault Data NaT Page Consumption fault Interruptions: Register NaT Consumption fault Data Key Miss fault Unimplemented Data Address fault Data Key Permission fault Data Nested TLB fault Data Access Rights fault Alternate Data TLB fault Data Access Bit fault VHPT Data fault Data Debug fault Data TLB fault...
  • Page 166: Fsz Completers

    ldf — Floating-point Load ) ldf no_base_update_form Format: fldtype ldhint f ) ldf reg_base_update_form fldtype ldhint f ) ldf imm_base_update_form fldtype ldhint f ) ldf8. integer_form, no_base_update_form fldtype ldhint f ) ldf8. integer_form, reg_base_update_form fldtype ldhint f ) ldf8. integer_form, imm_base_update_form fldtype ldhint f (qp) ldf.fill.
  • Page 167 Table 2-36. FP Load Types (Continued) fldtype Interpretation Special Load Operation Completer Speculative An entry is added to the ALAT, and certain exceptions may be deferred. Advanced load Deferral causes NaTVal to be placed in the target register, and the processor ensures that no ALAT entry exists for the target register.
  • Page 168 Operation: if (PR[qp]) { size = (fill_form ? 16 : (integer_form ? 8 : fsz)); speculative = (fldtype == ‘s’ || fldtype == ‘sa’); advanced = (fldtype == ‘a’ || fldtype == ‘sa’); check_clear = (fldtype == ‘c.clr’ ); check_no_clear = (fldtype == ‘c.nc’); check = check_clear || check_no_clear;...
  • Page 169 if (imm_base_update_form) { // update base register GR[r ] = GR[r ] + sign_ext(imm , 9); GR[r ].nat = GR[r ].nat; } else if (reg_base_update_form) { GR[r ] = GR[r ] + GR[r GR[r ].nat = GR[r ].nat || GR[r ].nat;...
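The immediate base update above adds a sign-extended 9-bit immediate to the base register. A minimal Python sketch of that arithmetic is given below; `sign_ext` follows the name used in the pseudocode, while `imm_base_update` and the wrap to 64 bits are illustrative assumptions about how a 64-bit register would hold the sum. NaT propagation is not modeled.

```python
MASK64 = (1 << 64) - 1

def sign_ext(value, width):
    """Interpret the low `width` bits of value as a two's complement number."""
    value &= (1 << width) - 1
    sign = 1 << (width - 1)
    return (value ^ sign) - sign

def imm_base_update(base, imm9):
    """Return the updated base: base + sign_ext(imm9, 9), wrapped to 64 bits."""
    return (base + sign_ext(imm9, 9)) & MASK64
```

For example, an immediate of 0x1F8 is -8 after sign extension, so a base of 0x1000 becomes 0x0FF8.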
  • Page 170 ldfp ldfp — Floating-point Load Pair ) ldfps. single_form, no_base_update_form Format: fldtype ldhint f ) ldfps. ], 8 single_form, base_update_form fldtype ldhint f ) ldfpd. double_form, no_base_update_form fldtype ldhint f ) ldfpd. ], 16 double_form, base_update_form fldtype ldhint f ) ldfp8. integer_form, no_base_update_form fldtype ldhint f...
  • Page 171 ldfp Operation: if (PR[qp]) { size = single_form ? 8 : 16; speculative = (fldtype == ‘s’ || fldtype == ‘sa’); advanced = (fldtype == ‘a’ || fldtype == ‘sa’); check_clear = (fldtype == ‘c.clr’); check_no_clear = (fldtype == ‘c.nc’); check = check_clear || check_no_clear;...
  • Page 172 ldfp FR[f ] = (integer_form ? FP_INT_ZERO : FP_ZERO); } else { // execute load normally FR[f ] = fp_mem_to_fr_format(f1_val, size/2, integer_form); FR[f ] = fp_mem_to_fr_format(f2_val, size/2, integer_form); if ((check_no_clear || advanced) && ma_is_speculative(mattr)) // add entry to ALAT alat_write(fldtype, FLOAT, f , paddr, size);...
  • Page 173: Lftype Mnemonic Values

    lfetch lfetch — Line Prefetch (qp) lfetch.lftype.lfhint [r no_base_update_form Format: ) lfetch. reg_base_update_form lftype lfhint ) lfetch. imm_base_update_form lftype lfhint ) lfetch. .excl. no_base_update_form, exclusive_form lftype lfhint ) lfetch. .excl. reg_base_update_form, exclusive_form lftype lfhint ) lfetch. .excl. imm_base_update_form, exclusive_form lftype lfhint The line containing the address specified by the value in GR is moved to the highest...
  • Page 174: Lfhint Mnemonic Values

    lfetch Table 2-38. lfhint Mnemonic Values lfhint Mnemonic Interpretation none Temporal locality, level 1 No temporal locality, level 1 No temporal locality, level 2 No temporal locality, all levels A faulting lfetch to an unimplemented address results in an Unimplemented Data Address fault.
  • Page 175 lfetch Operation: if (PR[qp]) { itype = READ|NON_ACCESS; itype |= (lftype == ‘fault’) ? LFETCH_FAULT : LFETCH; if (reg_base_update_form || imm_base_update_form) check_target_register(r if (lftype == ‘fault’) { // faulting form if (GR[r ].nat && !PSR.ed) // fault on NaT address register_nat_consumption_fault(itype);...
  • Page 176 loadrs loadrs — Load Register Stack loadrs Format: This instruction ensures that a specified number of bytes (registers values and/or NaT Description: collections) below the current BSP have been loaded from the backing store into the stacked general registers. The loaded registers are placed into the dirty partition of the register stack.
  • Page 177 mf — Memory Fence. Format: (qp) mf (ordering_form); (qp) mf.a (acceptance_form). Description: This instruction forces ordering between prior and subsequent memory accesses. The ordering_form ensures all prior data memory accesses are made visible prior to any subsequent data memory accesses being made visible. It does not ensure prior data memory references have been accepted by the external platform, nor that prior data memory references are visible....
  • Page 178 mix — Mix ) mix1.l one_byte_form, left_form Format: ) mix2.l two_byte_form, left_form ) mix4.l four_byte_form, left_form ) mix1.r one_byte_form, right_form ) mix2.r two_byte_form, right_form ) mix4.r four_byte_form, right_form The data elements of GR are mixed as shown in Figure 2-25, and the result Description: placed in GR .
  • Page 179: Mix Examples

    Figure 2-25. Mix Examples GR r GR r mix1.l GR r GR r GR r mix1.r GR r GR r GR r mix2.l GR r GR r GR r mix2.r GR r GR r GR r mix4.l GR r GR r GR r mix4.r GR r...
  • Page 180 Operation: if (PR[qp]) { check_target_register(r if (one_byte_form) { // one-byte elements x[0] = GR[r ]{7:0}; y[0] = GR[r ]{7:0}; x[1] = GR[r ]{15:8}; y[1] = GR[r ]{15:8}; x[2] = GR[r ]{23:16}; y[2] = GR[r ]{23:16}; x[3] = GR[r ]{31:24}; y[3] = GR[r ]{31:24};...
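The element extraction shown above (x[i] from one source, y[i] from the other) feeds an interleaving step that is cut off in this excerpt, and Figure 2-25 did not survive extraction. The Python sketch below therefore rests on an assumption: the left form interleaves the odd-numbered elements of both sources and the right form the even-numbered ones, with each x element placed above the matching y element. All names are hypothetical.

```python
def mix(elem_bytes, form, x_reg, y_reg):
    """Interleave elements of two 64-bit values (assumed mix semantics).

    elem_bytes is 1, 2 or 4; form is "l" (odd-numbered elements) or
    "r" (even-numbered elements). Element 0 is the least significant.
    """
    if elem_bytes not in (1, 2, 4) or form not in ("l", "r"):
        raise ValueError("unsupported mix variant")
    bits = elem_bytes * 8
    count = 8 // elem_bytes
    mask = (1 << bits) - 1
    x = [(x_reg >> (i * bits)) & mask for i in range(count)]
    y = [(y_reg >> (i * bits)) & mask for i in range(count)]
    start = 1 if form == "l" else 0
    result = 0
    # Build the result from the most significant pair downwards.
    for i in range(count - 2 + start, -1, -2):
        result = (result << bits) | x[i]
        result = (result << bits) | y[i]
    return result
```

Under this assumption, `mix(4, "l", a, b)` returns the upper word of `a` followed by the upper word of `b`, and `mix(4, "r", a, b)` does the same with the lower words.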
  • Page 181 mov ar mov — Move Application Register ) mov pseudo-op Format: ) mov pseudo-op ) mov pseudo-op ) mov.i i_form, from_form ) mov.i i_form, register_form, to_form ) mov.i i_form, immediate_form, to_form ) mov.m m_form, from_form ) mov.m m_form, register_form, to_form ) mov.m m_form, immediate_form, to_form The source operand is copied to the destination register.
  • Page 182 mov ar Operation: if (PR[qp]) { tmp_type = (i_form ? AR_I_TYPE : AR_M_TYPE); if (is_reserved_reg(tmp_type, ar illegal_operation_fault(); if (from_form) { check_target_register(r if (((ar == BSPSTORE) || (ar == RNAT)) && (AR[RSC].mode != 0)) illegal_operation_fault(); if ((ar == ITC || ar == RUC) &&...
  • Page 183: Move To Br Whether Hints

    mov br mov — Move Branch Register ) mov from_form Format: ) mov pseudo-op ) mov. to_form ih b ) mov.ret. return_form, to_form ih b The source operand is copied to the destination register. Description: In the from_form, the branch register specified by is copied into GR .
  • Page 184 mov cr mov — Move Control Register ) mov from_form Format: ) mov to_form The source operand is copied to the destination register. Description: For the from_form, the control register specified by is read and the value copied into For the to_form, GR is read and the value copied into CR Control registers can only be accessed at the most privileged level, and when PSR.vm is 0.
  • Page 185 mov cr last_IP = tmp_val; Illegal Operation fault Reserved Register/Field fault Interruptions: Privileged Operation fault Unimplemented Data Address fault Register NaT Consumption fault Virtualization fault Reads of control registers reflect the results of all prior instruction groups and Serialization: interruptions. In general, writes to control registers do not immediately affect subsequent instructions.
  • Page 186 mov — Move Floating-point Register. Format: (qp) mov f1 = f3, pseudo-op of: (qp) fmerge.s f1 = f3, f3. Description: The value of FR f3 is copied to FR f1. Operation: See “fmerge — Floating-point Merge” on page 3:80.
  • Page 187 mov — Move General Register. Format: (qp) mov r1 = r3, pseudo-op of: (qp) adds r1 = 0, r3. Description: The value of GR r3 is copied to GR r1. Operation: See “add — Add” on page 3:14.
  • Page 188 mov — Move Immediate. Format: (qp) mov r1 = imm22, pseudo-op of: (qp) addl r1 = imm22, r0. Description: The immediate value, imm22, is sign extended to 64 bits and placed in GR r1. Operation: See “add — Add” on page 3:14.
  • Page 189: Indirect Register File Mnemonics

    mov indirect mov — Move Indirect Register ) mov from_form Format: ireg ) mov to_form ireg The source operand is copied to the destination register. Description: For move from indirect register, GR is read and the value used as an index into the register file specified by (see Table 2-40...
  • Page 190 mov indirect if (from_form) { check_target_register(r if (PSR.cpl != 0 && !(ireg == PMD_TYPE || ireg == CPUID_TYPE)) privileged_operation_fault(0); if (GR[r ].nat) register_nat_consumption_fault(0); if (is_reserved_reg(ireg, tmp_index)) reserved_register_field_fault(); if (PSR.vm == 1 && ireg != PMD_TYPE) virtualization_fault(); if (ireg == PMD_TYPE) { if ((PSR.cpl != 0) &&...
  • Page 191 mov indirect case PMD_TYPE: pmd_write(tmp_index, tmp_val); break; case RR_TYPE: RR[tmp_index]= tmp_val; break; Illegal Operation fault Reserved Register/Field fault Interruptions: Privileged Operation fault Virtualization fault Register NaT Consumption fault For move to data breakpoint registers, software must issue a data serialize operation Serialization: before issuing a memory reference dependent on the modified register.
  • Page 192 mov ip mov — Move Instruction Pointer Format: (qp) mov r1 = ip Description: The Instruction Pointer (IP) for the bundle containing this instruction is copied into GR r1. Operation: if (PR[qp]) { check_target_register(r1); GR[r1] = IP; GR[r1].nat = 0; } Interruptions: Illegal Operation fault
  • Page 193 mov pr mov — Move Predicates ) mov = pr from_form Format: ) mov pr = to_form mask ) mov pr.rot = to_rotate_form The source operand is copied to the destination register. Description: For moving the predicates to a GR, PR i is copied to bit position i within GR For moving to the predicates, the source can either be a general register, or an immediate value.
  • Page 194 mov psr mov — Move Processor Status Register ) mov = psr from_form Format: ) mov psr.l = to_form The source operand is copied to the destination register. See Section 3.3.2, “Processor Description: Status Register (PSR)” on page 2:23. For move from processor status register, PSR bits {36:35} and {31:0} are read, and copied into GR .
  • Page 195 mov um mov — Move User Mask Format: (qp) mov r1 = psr.um (from_form); (qp) mov psr.um = r2 (to_form) Description: The source operand is copied to the destination register. For move from user mask, PSR{5:0} is read, zero-extended, and copied into GR r1. For move to user mask, PSR{5:0} is written by bits {5:0} of GR r2....
  • Page 196 movl movl — Move Long Immediate Format: (qp) movl r1 = imm64 Description: The immediate value imm64 is copied to GR r1. The L slot of the bundle contains 41 bits of imm64. Operation: if (PR[qp]) { check_target_register(r1); GR[r1] = imm64; GR[r1].nat = 0; } Interruptions: Illegal Operation fault
  • Page 197 mpy4 mpy4 — Unsigned Integer Multiply Format: (qp) mpy4 r1 = r2, r3 Description: The lower 32 bits of each of the two source operands are treated as unsigned values and are multiplied, and the result is placed in GR r1. The upper 32 bits of each of the source operands are ignored....
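The mpy4 description above can be modeled in a few lines. This is a minimal sketch, not the manual's Operation listing: the function name `mpy4` and its parameter names are illustrative, and register state such as NaT bits is not modeled.

```python
MASK32 = 0xFFFFFFFF

def mpy4(a: int, b: int) -> int:
    """Model of the described behavior: multiply the low 32 bits of each
    64-bit operand as unsigned values; the upper 32 bits are ignored."""
    # The product of two 32-bit unsigned values always fits in 64 bits,
    # so no truncation is needed.
    return (a & MASK32) * (b & MASK32)
```

Because only the low halves participate, any garbage in the upper 32 bits of either operand has no effect on the result.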
  • Page 198 mpyshl4 mpyshl4 — Unsigned Integer Shift Left and Multiply Format: (qp) mpyshl4 r1 = r2, r3 Description: The upper 32 bits of GR r2 and the lower 32 bits of GR r3 are treated as unsigned values and are multiplied. The result of the multiplication is shifted left 32 bits, with the vacated bit positions filled with zeroes, and the result is placed in GR r1....
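A small model of the mpyshl4 description, followed by one way such partial products could be combined into a full 64-bit multiply. The names `mpyshl4`, `hi_src`, `lo_src`, and `mul64_low` are hypothetical, and the composition is an illustration of the arithmetic rather than a sequence taken from the manual.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def mpyshl4(hi_src: int, lo_src: int) -> int:
    """Multiply the upper 32 bits of hi_src by the lower 32 bits of lo_src
    (both unsigned), then shift the product left 32 bits within 64 bits."""
    product = ((hi_src >> 32) & MASK32) * (lo_src & MASK32)
    return (product << 32) & MASK64

def mul64_low(a: int, b: int) -> int:
    """Illustrative composition: low 64 bits of a * b from 32-bit pieces."""
    low_product = (a & MASK32) * (b & MASK32)       # lo(a) * lo(b)
    cross = mpyshl4(a, b) + mpyshl4(b, a)           # (hi(a)*lo(b) + hi(b)*lo(a)) << 32
    return (low_product + cross) & MASK64
```

The hi(a) * hi(b) term is omitted in `mul64_low` because it only contributes to bits 64 and above.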
  • Page 199: Mux Permutations For 8-Bit Elements

    mux — Mux ) mux1 one_byte_form Format: mbtype ) mux2 two_byte_form mhtype A permutation is performed on the packed elements in a single source register, GR Description: and the result is placed in GR . For 8-bit elements, only some of all possible permutations can be specified.
  • Page 200: Mux2 Examples (16-Bit Elements)

    For 16-bit elements, all possible permutations, with and without repetitions can be specified. They are expressed with an 8-bit mhtype field, which encodes the indices of the four 16-bit data elements. The indexed 16-bit elements of GR are copied to corresponding 16-bit positions in the target register GR .
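The 16-bit permutation described above can be sketched as follows. The layout of the 8-bit mhtype field is an assumption here: two bits per destination element, with bits {1:0} selecting the source for the lowest element. The function name `mux2` is used only for illustration.

```python
def mux2(src: int, mhtype: int) -> int:
    """Permute the four 16-bit elements of src according to an 8-bit selector.

    Assumed encoding: bits {2*pos+1 : 2*pos} of mhtype hold the index of the
    source element copied to destination position pos.
    """
    elems = [(src >> (16 * i)) & 0xFFFF for i in range(4)]
    result = 0
    for pos in range(4):
        index = (mhtype >> (2 * pos)) & 0x3
        result |= elems[index] << (16 * pos)
    return result
```

Under this encoding, 0xE4 is the identity permutation, 0x1B reverses the elements, and 0x00 broadcasts element 0 to all four positions.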
  • Page 201 Operation: if (PR[qp]) { check_target_register(r if (one_byte_form) { x[0] = GR[r ]{7:0}; x[1] = GR[r ]{15:8}; x[2] = GR[r ]{23:16}; x[3] = GR[r ]{31:24}; x[4] = GR[r ]{39:32}; x[5] = GR[r ]{47:40}; x[6] = GR[r ]{55:48}; x[7] = GR[r ]{63:56}; switch (mbtype) { case ‘@rev’: GR[r...
  • Page 202 nop — No Operation ) nop pseudo-op Format: ) nop.i i_unit_form ) nop.b b_unit_form ) nop.m m_unit_form ) nop.f f_unit_form ) nop.x x_unit_form No operation is done. Description: The immediate, , can be used by software as a marker in program code. It is ignored by hardware.
  • Page 203 or — Logical Or ) or register_form Format: ) or imm8_form The two source operands are logically ORed and the result placed in GR . In the Description: register form the first operand is GR ; in the immediate form the first operand is taken from the encoding field.
  • Page 204: Pack Operation

    pack pack — Pack Format: (qp) pack2.sss r1 = r2, r3 (two_byte_form, signed_saturation_form); (qp) pack2.uss r1 = r2, r3 (two_byte_form, unsigned_saturation_form); (qp) pack4.sss r1 = r2, r3 (four_byte_form, signed_saturation_form) Description: 32-bit or 16-bit elements from GR r2 and GR r3 are converted into 16-bit or 8-bit elements respectively, and the results are placed in GR r1....
  • Page 205 pack Operation: if (PR[qp]) { check_target_register(r if (two_byte_form) { if (signed_saturation_form) { max = sign_ext(0x7f, 8); min = sign_ext(0x80, 8); } else { // unsigned_saturation_form max = 0xff; min = 0x00; temp[0] = sign_ext(GR[r ]{15:0}, 16); temp[1] = sign_ext(GR[r ]{31:16}, 16); temp[2] = sign_ext(GR[r ]{47:32}, 16);...
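The saturation limits in the Operation listing above (0x7f/0x80 for the signed form, 0xff/0x00 for the unsigned form) can be illustrated for a single element. This sketch covers only the per-element conversion for the two-byte forms; the helper names are hypothetical, and the placement of elements in the result register is not modeled because the listing is truncated.

```python
def sign_ext(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def saturate_16_to_8(elem16: int, signed: bool) -> int:
    """Convert one signed 16-bit element to an 8-bit element with saturation."""
    low, high = (-0x80, 0x7F) if signed else (0x00, 0xFF)
    temp = sign_ext(elem16, 16)          # the source element is always signed
    clamped = max(low, min(high, temp))
    return clamped & 0xFF
```

Note that in the unsigned form a negative source element clamps to 0x00 rather than wrapping.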
  • Page 206: Parallel Add Examples

    padd padd — Parallel Add ) padd1 one_byte_form, modulo_form Format: ) padd1.sss one_byte_form, sss_saturation_form ) padd1.uus one_byte_form, uus_saturation_form ) padd1.uuu one_byte_form, uuu_saturation_form ) padd2 two_byte_form, modulo_form ) padd2.sss two_byte_form, sss_saturation_form ) padd2.uus two_byte_form, uus_saturation_form ) padd2.uuu two_byte_form, uuu_saturation_form ) padd4 four_byte_form, modulo_form The sets of elements from the two source operands are added, and the results placed in Description:...
  • Page 207 padd Operation: if (PR[qp]) { check_target_register(r if (one_byte_form) { // one-byte elements x[0] = GR[r ]{7:0}; y[0] = GR[r ]{7:0}; x[1] = GR[r ]{15:8}; y[1] = GR[r ]{15:8}; x[2] = GR[r ]{23:16}; y[2] = GR[r ]{23:16}; x[3] = GR[r ]{31:24}; y[3] = GR[r ]{31:24};...
  • Page 208 padd x[2] = GR[r ]{47:32}; y[2] = GR[r ]{47:32}; x[3] = GR[r ]{63:48}; y[3] = GR[r ]{63:48}; if (sss_saturation_form) { max = sign_ext(0x7fff, 16); min = sign_ext(0x8000, 16); for (i = 0; i < 4; i++) { temp[i] = sign_ext(x[i], 16) + sign_ext(y[i], 16); } else if (uus_saturation_form) { max = 0xffff;...
  • Page 209 padd Interruptions: Illegal Operation fault
  • Page 210: Parallel Average Example

    pavg pavg — Parallel Average ) pavg1 normal_form, one_byte_form Format: ) pavg1.raz raz_form, one_byte_form ) pavg2 normal_form, two_byte_form ) pavg2.raz raz_form, two_byte_form The unsigned data elements of GR are added to the unsigned data elements of GR Description: The results of the add are then each independently shifted to the right by one bit position.
  • Page 211: Parallel Average With Round Away From Zero Example

    pavg Figure 2-31. Parallel Average with Round Away from Zero Example (pavg2.raz: the 16-bit sum plus carry is shifted right 1 bit)
  • Page 212 pavg Operation: if (PR[qp]) { check_target_register(r if (one_byte_form) { x[0] = GR[r ]{7:0}; y[0] = GR[r ]{7:0}; x[1] = GR[r ]{15:8}; y[1] = GR[r ]{15:8}; x[2] = GR[r ]{23:16}; y[2] = GR[r ]{23:16}; x[3] = GR[r ]{31:24}; y[3] = GR[r ]{31:24};...
  • Page 213: Parallel Average Subtract Example

    pavgsub pavgsub — Parallel Average Subtract ) pavgsub1 one_byte_form Format: ) pavgsub2 two_byte_form The unsigned data elements of GR are subtracted from the unsigned data elements of Description: . The results of the subtraction are then each independently shifted to the right by one bit position.
  • Page 214 pavgsub Operation: if (PR[qp]) { check_target_register(r if (one_byte_form) { x[0] = GR[r ]{7:0}; y[0] = GR[r ]{7:0}; x[1] = GR[r ]{15:8}; y[1] = GR[r ]{15:8}; x[2] = GR[r ]{23:16}; y[2] = GR[r ]{23:16}; x[3] = GR[r ]{31:24}; y[3] = GR[r ]{31:24};...
  • Page 215: Parallel Compare Examples

    pcmp pcmp — Parallel Compare ) pcmp1. one_byte_form Format: prel r ) pcmp2. two_byte_form prel r ) pcmp4. four_byte_form prel r The two source operands are compared for one of the two relations shown in Description: Table 2-45. If the comparison condition is true for corresponding data elements of GR and GR , then the corresponding data element in GR is set to all ones.
  • Page 216 pcmp Operation: if (PR[qp]) { check_target_register(r if (one_byte_form) { // one-byte elements x[0] = GR[r ]{7:0}; y[0] = GR[r ]{7:0}; x[1] = GR[r ]{15:8}; y[1] = GR[r ]{15:8}; x[2] = GR[r ]{23:16}; y[2] = GR[r ]{23:16}; x[3] = GR[r ]{31:24}; y[3] = GR[r ]{31:24};...
  • Page 217 pcmp else res[i] = 0x00000000; GR[r1] = concatenate2(res[1], res[0]); GR[r1].nat = GR[r2].nat || GR[r3].nat; Interruptions: Illegal Operation fault
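A compact model of the parallel compare: each lane of the result becomes all ones when the relation holds and all zeros otherwise. The relation names `"eq"` and `"gt"` (with `"gt"` treated as a signed comparison) are assumptions, since Table 2-45 is not reproduced in this listing; the function signature is likewise illustrative.

```python
def sign_ext(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def pcmp(a: int, b: int, elem_bits: int, rel: str) -> int:
    """Compare corresponding lanes of a and b; a true lane is set to all ones."""
    if elem_bits not in (8, 16, 32):
        raise ValueError("element size must be 8, 16, or 32 bits")
    mask = (1 << elem_bits) - 1
    result = 0
    for i in range(64 // elem_bits):
        x = (a >> (elem_bits * i)) & mask
        y = (b >> (elem_bits * i)) & mask
        if rel == "eq":
            hit = x == y
        elif rel == "gt":                 # assumed to be a signed comparison
            hit = sign_ext(x, elem_bits) > sign_ext(y, elem_bits)
        else:
            raise ValueError("unknown relation")
        if hit:
            result |= mask << (elem_bits * i)
    return result
```

The all-ones/all-zeros lanes are convenient as masks for later AND/OR selection between two packed values.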
  • Page 218 pmax pmax — Parallel Maximum ) pmax1.u one_byte_form Format: ) pmax2 two_byte_form The maximum of the two source operands is placed in the result register. In the Description: one_byte_form, each unsigned 8-bit element of GR is compared with the corresponding unsigned 8-bit element of GR and the greater of the two is placed in the corresponding 8-bit element of GR .
  • Page 219 pmax Operation: if (PR[qp]) { check_target_register(r if (one_byte_form) { // one-byte elements x[0] = GR[r ]{7:0}; y[0] = GR[r ]{7:0}; x[1] = GR[r ]{15:8}; y[1] = GR[r ]{15:8}; x[2] = GR[r ]{23:16}; y[2] = GR[r ]{23:16}; x[3] = GR[r ]{31:24}; y[3] = GR[r ]{31:24};...
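The one_byte_form described for pmax can be sketched as a lane-wise unsigned maximum. The name `pmax1_u` mirrors the mnemonic but is only an illustrative Python function; the two_byte_form is not modeled.

```python
def pmax1_u(a: int, b: int) -> int:
    """Lane-wise maximum of eight unsigned 8-bit elements."""
    result = 0
    for i in range(8):
        x = (a >> (8 * i)) & 0xFF
        y = (b >> (8 * i)) & 0xFF
        result |= max(x, y) << (8 * i)
    return result
```

Replacing `max` with `min` gives the corresponding sketch for the unsigned byte form of pmin.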
  • Page 220 pmin pmin — Parallel Minimum ) pmin1.u one_byte_form Format: ) pmin2 two_byte_form The minimum of the two source operands is placed in the result register. In the Description: one_byte_form, each unsigned 8-bit element of GR is compared with the corresponding unsigned 8-bit element of GR and the smaller of the two is placed in the corresponding 8-bit element of GR .
  • Page 221 pmin Operation: if (PR[qp]) { check_target_register(r if (one_byte_form) { // one-byte elements x[0] = GR[r ]{7:0}; y[0] = GR[r ]{7:0}; x[1] = GR[r ]{15:8}; y[1] = GR[r ]{15:8}; x[2] = GR[r ]{23:16}; y[2] = GR[r ]{23:16}; x[3] = GR[r ]{31:24}; y[3] = GR[r ]{31:24};...
  • Page 222: Parallel Multiply Operation

    pmpy pmpy — Parallel Multiply ) pmpy2.r right_form Format: ) pmpy2.l left_form Two signed 16-bit data elements of GR are multiplied by the corresponding two Description: signed 16-bit data elements of GR as shown in Figure 2-36. The two 32-bit results are placed in GR Figure 2-36.
  • Page 223: Parallel Multiply And Shift Right Operation

    pmpyshr pmpyshr — Parallel Multiply and Shift Right ) pmpyshr2 signed_form Format: count ) pmpyshr2.u unsigned_form count The four 16-bit data elements of GR are multiplied by the corresponding four 16-bit Description: data elements of GR as shown in Figure 2-37.
  • Page 224 pmpyshr Operation: if (PR[qp]) { check_target_register(r x[0] = GR[r ]{15:0}; y[0] = GR[r ]{15:0}; x[1] = GR[r ]{31:16}; y[1] = GR[r ]{31:16}; x[2] = GR[r ]{47:32}; y[2] = GR[r ]{47:32}; x[3] = GR[r ]{63:48}; y[3] = GR[r ]{63:48}; for (i = 0; i < 4; i++) { if (unsigned_form) // unsigned multiplication temp[i] = zero_ext(x[i], 16) * zero_ext(y[i], 16);...
  • Page 225 popcnt popcnt — Population Count Format: (qp) popcnt r1 = r3 Description: The number of bits in GR r3 having the value 1 is counted, and the resulting sum is placed in GR r1. Operation: if (PR[qp]) { check_target_register(r1); res = 0; // Count up all the one bits for (i = 0;...
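The counting loop that the truncated Operation listing begins can be written out as a small sketch. The function name is illustrative, and only the 64-bit count is modeled, not the register or NaT handling.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def popcnt(value: int) -> int:
    """Count the bits set to 1 in a 64-bit value, one bit position at a time."""
    value &= MASK64
    res = 0
    for i in range(64):
        res += (value >> i) & 1
    return res
```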
  • Page 226 probe probe — Probe Access ) probe.r regular_form, read_form, register_form Format: ) probe.w regular_form, write_form, register_form ) probe.r regular_form, read_form, immediate_form ) probe.w regular_form, write_form, immediate_form ) probe.r.fault fault_form, read_form, immediate_form ) probe.w.fault fault_form, write_form, immediate_form ) probe.rw.fault fault_form, read_write_form, immediate_form This instruction determines whether read or write access, with a specified privilege Description: level, to a given virtual address is permitted.
  • Page 227: Faults For Regular_Form And Fault_Form Probe Instructions

    When PSR.vm is 1, this instruction may optionally raise Virtualization faults; see Section 11.7.4.2.8, “Probe Instruction Virtualization” on page 2:344 for details. Please refer to the Intel® Itanium® Software Conventions and Runtime Architecture Guide for usage information on the probe instruction.
  • Page 228 probe Operation: if (PR[qp]) { itype = NON_ACCESS; itype |= (read_write_form) ? READ|WRITE : ((write_form) ? WRITE : READ); itype |= (fault_form) ? PROBE_FAULT : PROBE; itype |= (register_form) ? REGISTER_FORM : IMM_FORM; if (!fault_form) check_target_register(r if (GR[r ].nat || (register_form ? GR[r ].nat : 0)) register_nat_consumption_fault(itype);...
  • Page 229: Parallel Sum Of Absolute Difference Example

    psad psad — Parallel Sum of Absolute Difference Format: (qp) psad1 r1 = r2, r3 Description: The unsigned 8-bit elements of GR r2 are subtracted from the unsigned 8-bit elements of GR r3. The absolute value of each difference is accumulated across the elements and placed in GR r1. Figure 2-38....
  • Page 230 psad Operation: if (PR[qp]) { check_target_register(r x[0] = GR[r ]{7:0}; y[0] = GR[r ]{7:0}; x[1] = GR[r ]{15:8}; y[1] = GR[r ]{15:8}; x[2] = GR[r ]{23:16}; y[2] = GR[r ]{23:16}; x[3] = GR[r ]{31:24}; y[3] = GR[r ]{31:24}; x[4] = GR[r ]{39:32};...
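The sum of absolute differences over eight unsigned bytes can be sketched directly from the description. The function name `psad1` is illustrative; because the absolute value is taken, the order of the two operands does not affect the result.

```python
def psad1(a: int, b: int) -> int:
    """Sum of absolute differences of the eight unsigned 8-bit lanes of a and b."""
    total = 0
    for i in range(8):
        x = (a >> (8 * i)) & 0xFF
        y = (b >> (8 * i)) & 0xFF
        total += abs(x - y)
    return total
```

The largest possible result is 8 * 255 = 2040, so the sum always fits comfortably in the low bits of a 64-bit register.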
  • Page 231: Parallel Shift Left Examples

    pshl pshl — Parallel Shift Left ) pshl2 two_byte_form, variable_form Format: ) pshl2 , count two_byte_form, fixed_form ) pshl4 four_byte_form, variable_form ) pshl4 , count four_byte_form, fixed_form The data elements of GR are each independently shifted to the left by the scalar shift Description: count in GR , or in the immediate field count...
  • Page 232 pshladd pshladd — Parallel Shift Left and Add ) pshladd2 , count Format: The four signed 16-bit data elements of GR are each independently shifted to the left Description: by count bits (shifting zeros into the low-order bits), and added to the four signed 16-bit data elements of GR .
  • Page 233 pshr pshr — Parallel Shift Right ) pshr2 signed_form, two_byte_form, variable_form Format: ) pshr2 , count signed_form, two_byte_form, fixed_form ) pshr2.u unsigned_form, two_byte_form, variable_form ) pshr2.u , count unsigned_form, two_byte_form, fixed_form ) pshr4 signed_form, four_byte_form, variable_form ) pshr4 , count signed_form, four_byte_form, fixed_form ) pshr4.u unsigned_form, four_byte_form, variable_form...
  • Page 234 pshr Operation: if (PR[qp]) { check_target_register(r shift_count = (variable_form ? GR[r ] : count tmp_nat = (variable_form ? GR[r ].nat : 0); if (two_byte_form) { // two_byte_form if (shift_count u> 16) shift_count = 16; if (unsigned_form) { // unsigned shift GR[r ]{15:0} = shift_right_unsigned(zero_ext(GR[r...
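The listing above clamps the shift count to 16 for the two_byte_form and distinguishes signed from unsigned shifts. A minimal sketch of that behavior follows; the function name `pshr2` and the `unsigned` flag are illustrative, and the four_byte_form is omitted.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def sign_ext(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def pshr2(a: int, count: int, unsigned: bool = False) -> int:
    """Shift each 16-bit lane right by count, clamping the count to 16."""
    count = min(count & MASK64, 16)      # unsigned comparison against 16
    result = 0
    for i in range(4):
        x = (a >> (16 * i)) & 0xFFFF
        if not unsigned:
            x = sign_ext(x, 16)          # Python's >> is arithmetic on negatives
        result |= ((x >> count) & 0xFFFF) << (16 * i)
    return result
```

With the clamp, an oversized count fills a signed lane with copies of its sign bit and zeroes an unsigned lane.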
  • Page 235 pshradd pshradd — Parallel Shift Right and Add ) pshradd2 , count Format: The four signed 16-bit data elements of GR are each independently shifted to the Description: right by count bits, and added to the four signed 16-bit data elements of GR .
  • Page 236: Parallel Subtract Examples

    psub psub — Parallel Subtract ) psub1 one_byte_form, modulo_form Format: ) psub1.sss one_byte_form, sss_saturation_form ) psub1.uus one_byte_form, uus_saturation_form ) psub1.uuu one_byte_form, uuu_saturation_form ) psub2 two_byte_form, modulo_form ) psub2.sss two_byte_form, sss_saturation_form ) psub2.uus two_byte_form, uus_saturation_form ) psub2.uuu two_byte_form, uuu_saturation_form ) psub4 four_byte_form, modulo_form The sets of elements from the two source operands are subtracted, and the results Description:...
  • Page 237 psub Operation: if (PR[qp]) { check_target_register(r if (one_byte_form) { // one-byte elements x[0] = GR[r ]{7:0}; y[0] = GR[r ]{7:0}; x[1] = GR[r ]{15:8}; y[1] = GR[r ]{15:8}; x[2] = GR[r ]{23:16}; y[2] = GR[r ]{23:16}; x[3] = GR[r ]{31:24}; y[3] = GR[r ]{31:24};...
  • Page 238 psub max = sign_ext(0x7fff, 16); min = sign_ext(0x8000, 16); for (i = 0; i < 4; i++) { temp[i] = sign_ext(x[i], 16) - sign_ext(y[i], 16); } else if (uus_saturation_form) { // uus_saturation_form max = 0xffff; min = 0x0000; for (i = 0; i < 4; i++) { temp[i] = zero_ext(x[i], 16) - sign_ext(y[i], 16);...
  • Page 239 ptc.e ptc.e — Purge Translation Cache Entry ) ptc.e r Format: One or more translation entries are purged from the local processor’s instruction and Description: data translation cache. Translation Registers and the VHPT are not modified. The number of translation cache entries purged is implementation specific. Some implementations may purge all levels of the translation cache hierarchy with one iteration of PTC.e, while other implementations may require several iterations to flush all levels, sets and associativities of both instruction and data translation caches.
  • Page 240 ptc.g, ptc.ga ptc.g, ptc.ga — Purge Global Translation Cache ) ptc.g global_form Format: ) ptc.ga global_alat_form The instruction and data translation cache for each processor in the local TLB coherence Description: domain are searched for all entries whose virtual address and page size partially or completely overlap the specified purge virtual address and purge address range.
  • Page 241 ptc.g, ptc.ga Operation: if (PR[qp]) { if (!followed_by_stop()) undefined_behavior(); if (PSR.cpl != 0) privileged_operation_fault(0); if (GR[r ].nat || GR[r ].nat) register_nat_consumption_fault(0); if (unimplemented_virtual_address(GR[r ], PSR.vm)) unimplemented_data_address_fault(0); if (PSR.vm == 1) virtualization_fault(); tmp_rid = RR[GR[r ]{63:61}].rid; tmp_va = GR[r ]{60:0}; tmp_size = GR[r ]{7:2};...
  • Page 242 ptc.l ptc.l — Purge Local Translation Cache ) ptc.l Format: The instruction and data translation cache of the local processor is searched for all Description: entries whose virtual address and page size partially or completely overlap the specified purge virtual address and purge address range. All these entries are removed. The purge virtual address is specified by GR bits{60:0} and the purge region identifier is selected by GR...
  • Page 243 ptr — Purge Translation Register ) ptr.d data_form Format: ) ptr.i instruction_form In the data form of this instruction, the data translation registers and caches are Description: searched for all entries whose virtual address and page size partially or completely overlap the specified purge virtual address and purge address range.
  • Page 244 Operation: if (PR[qp]) { if (PSR.cpl != 0) privileged_operation_fault(0); if (GR[r ].nat || GR[r ].nat) register_nat_consumption_fault(0); if (unimplemented_virtual_address(GR[r ], PSR.vm)) unimplemented_data_address_fault(0); if (PSR.vm == 1) virtualization_fault(); tmp_rid = RR[GR[r ]{63:61}].rid; tmp_va = GR[r ]{60:0}; tmp_size = GR[r ]{7:2}; tmp_va = align_to_size_boundary(tmp_va, tmp_size); if (data_form) { tlb_must_purge_dtr_entries(tmp_rid, tmp_va, tmp_size);...
  • Page 245 rfi — Return From Interruption Format: The machine context prior to an interruption is restored. PSR is restored from IPSR, Description: IPSR is unmodified, and IP is restored from IIP. Execution continues at the bundle address loaded into the IP, and the instruction slot loaded into PSR.ri. This instruction must be immediately followed by a stop;...
  • Page 246 If IPSR.is is 1, software must set other IPSR fields properly for IA-32 instruction set execution; otherwise processor operation is undefined. See Table 3-2, “Processor Status Register Fields” on page 2:24 for details. Software must issue a mf instruction before this instruction if memory ordering is required between IA-32 processor-consistent and Itanium unordered memory references.
  • Page 247 //instruction set execution. } else { //return to Itanium instruction set tmp_IP = CR[IIP] & ~0xf; slot = CR[IPSR].ri; if ((CR[IPSR].it && unimplemented_virtual_address(tmp_IP, IPSR.vm)) || (!CR[IPSR].it && unimplemented_physical_address(tmp_IP))) unimplemented_address = 1; if (CR[IFS].v) { tmp_growth = -CFM.sof; alat_frame_update(-CR[IFS].ifm.sof, 0); rse_restore_frame(CR[IFS].ifm.sof, tmp_growth, CFM.sof); CFM = CR[IFS].ifm;...
  • Page 248 rsm — Reset System Mask ) rsm Format: The complement of the operand is ANDed with the system mask (PSR{23:0}) and Description: the result is placed in the system mask. See Section 3.3.2, “Processor Status Register (PSR)” on page 2:23. The PSR system mask can only be written at the most privileged level, and when PSR.vm is 0.
  • Page 249 if (imm {21}) PSR{21} = 0;) // pp if (imm {22}) PSR{22} = 0;) // di if (imm {23}) PSR{23} = 0;) // si Privileged Operation fault Virtualization fault Interruptions: Reserved Register/Field fault Software must use a data serialize or instruction serialize operation before issuing Serialization: instructions dependent upon the altered PSR bits –...
  • Page 250 rum — Reset User Mask ) rum Format: The complement of the operand is ANDed with the user mask (PSR{5:0}) and the Description: result is placed in the user mask. See Section 3.3.2, “Processor Status Register (PSR)” on page 2:23. PSR.up is only cleared if the secure performance monitor bit (PSR.sp) is zero.
  • Page 251: Function Of Setf.exp

    setf setf — Set Floating-point Value, Exponent, or Significand ) setf.s single_form Format: ) setf.d double_form ) setf.exp exponent_form ) setf.sig significand_form In the single and double forms, GR r is treated as a single precision (in the Description: single_form) or double precision (in the double_form) memory representation, converted into floating-point register format, and placed in FR , as shown in Figure 5-4...
  • Page 252 setf Operation: if (PR[qp]) { fp_check_target_register(f if (tmp_isrcode = fp_reg_disabled(f , 0, 0, 0)) disabled_fp_register_fault(tmp_isrcode, 0); if (!GR[r ].nat) { if (single_form) FR[f ] = fp_mem_to_fr_format(GR[r ], 4, 0); else if (double_form) FR[f ] = fp_mem_to_fr_format(GR[r ], 8, 0); else if (significand_form) { FR[f ].significand = GR[r FR[f...
  • Page 253 shl — Shift Left ) shl Format: ) shl pseudo-op of: ( ) dep.z , 64- count count count The value in GR is shifted to the left, with the vacated bit positions filled with zeroes, Description: and placed in GR r .
  • Page 254 shladd shladd — Shift Left and Add Format: (qp) shladd r1 = r2, count2, r3 Description: The first source operand is shifted to the left by count2 bits and then added to the second source operand and the result placed in GR r1. The first operand can be shifted by 1, 2, 3, or 4 bits....
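Shift-left-and-add is a natural fit for scaled array indexing. The sketch below models the arithmetic within 64 bits; the function and parameter names are illustrative.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def shladd(a: int, count: int, b: int) -> int:
    """Compute (a << count) + b within 64 bits; count must be 1, 2, 3, or 4."""
    if count not in (1, 2, 3, 4):
        raise ValueError("count must be 1, 2, 3, or 4")
    return ((a << count) + b) & MASK64
```

For example, `shladd(index, 3, base)` yields the address of element `index` in an array of 8-byte items starting at `base`.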
  • Page 255: Shift Left And Add Pointer

    shladdp4 shladdp4 — Shift Left and Add Pointer Format: (qp) shladdp4 r1 = r2, count2, r3 Description: The first source operand is shifted to the left by count2 bits and then is added to the second source operand. The upper 32 bits of the result are forced to zero, and then bits {31:30} of GR r3 are copied to bits {62:61} of the result....
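The pointer form adds a 32-bit truncation and a bit copy on top of shift-left-and-add. In this sketch the copied bits {31:30} are assumed to come from the second source operand (`b`); the function and parameter names are illustrative.

```python
def shladdp4(a: int, count: int, b: int) -> int:
    """Shift a left by count, add b, keep the low 32 bits of the sum, then copy
    bits {31:30} of b (assumed source) into bits {62:61} of the result."""
    if count not in (1, 2, 3, 4):
        raise ValueError("count must be 1, 2, 3, or 4")
    total = ((a << count) + b) & 0xFFFFFFFF   # upper 32 bits forced to zero
    top_bits = (b >> 30) & 0x3
    return total | (top_bits << 61)
```

The effect is to turn a 32-bit pointer into a 64-bit address whose upper bits are derived from the top two bits of the 32-bit value.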
  • Page 256 shr — Shift Right ) shr signed_form Format: ) shr.u unsigned_form ) shr , count pseudo-op of: ( ) extr , count , 64-count ) shr.u , count pseudo-op of: ( ) extr.u , count , 64-count The value in GR is shifted to the right and placed in GR r .
  • Page 257 shrp shrp — Shift Right Pair Format: (qp) shrp r1 = r2, r3, count6 Description: The two source operands, GR r2 and GR r3, are concatenated to form a 128-bit value and shifted to the right count6 bits. The least-significant 64 bits of the result are placed in GR r1. The immediate value count6 can be any number in the range 0 to 63....
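The 128-bit concatenate-and-shift can be sketched with Python's arbitrary-precision integers. The parameter names `high` and `low` are illustrative labels for the two halves of the concatenated value.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def shrp(high: int, low: int, count: int) -> int:
    """Concatenate high:low into 128 bits, shift right by count (0 to 63),
    and return the least-significant 64 bits."""
    if not 0 <= count <= 63:
        raise ValueError("count must be in the range 0 to 63")
    combined = ((high & MASK64) << 64) | (low & MASK64)
    return (combined >> count) & MASK64
```

Passing the same value for both halves turns the operation into a 64-bit rotate right, which is a common use of a shift-pair primitive.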
  • Page 258 srlz srlz — Serialize ) srlz.i instruction_form Format: ) srlz.d data_form Instruction serialization (srlz.i) ensures: Description: • prior modifications to processor register resources that affect fetching of subsequent instruction groups are observed, • prior modifications to processor register resources that affect subsequent execution or data memory accesses are observed, •...
  • Page 259 ssm — Set System Mask ) ssm Format: operand is ORed with the system mask (PSR{23:0}) and the result is placed Description: in the system mask. See Section 3.3.2, “Processor Status Register (PSR)” on page 2:23. The PSR system mask can only be written at the most privileged level, and when PSR.vm is 0.
  • Page 260: Store Types

    st — Store ) st normal_form, no_base_update_form Format: sttype sthint ) st normal_form, imm_base_update_form sttype sthint ) st16. , ar.csd sixteen_byte_form, no_base_update_form sttype sthint ) st8.spill. spill_form, no_base_update_form sthint ) st8.spill. spill_form, imm_base_update_form sthint A value consisting of the least significant sz bytes of the value in GR is written to Description: memory starting at the address specified by the value in GR...
  • Page 261: Store Hints

    For the sixteen_byte_form, Illegal Operation fault is raised on processor models that do not support the instruction. CPUID register 4 indicates the presence of the feature on the processor model. See Section 3.1.11, “Processor Identification Registers” on page 1:34 for details. Table 2-51.
  • Page 262 Data TLB fault; Unaligned Data Reference fault; Data Page Not Present fault; Unsupported Data Reference fault; Data NaT Page Consumption fault
  • Page 263 stf — Floating-point Store ) stf normal_form, no_base_update_form Format: sthint ) stf normal_form, imm_base_update_form sthint ) stf8. integer_form, no_base_update_form sthint ) stf8. integer_form, imm_base_update_form sthint ) stf.spill. spill_form, no_base_update_form sthint ) stf.spill. spill_form, imm_base_update_form sthint A value, consisting of fsz bytes, is generated from the value in FR and written to Description: memory starting at the address specified by the value in GR...
  • Page 264 Operation: if (PR[qp]) { if (imm_base_update_form) check_target_register(r if (tmp_isrcode = fp_reg_disabled(f , 0, 0, 0)) disabled_fp_register_fault(tmp_isrcode, WRITE); if (GR[r ].nat || (!spill_form && (FR[f ] == NATVAL))) register_nat_consumption_fault(WRITE); size = spill_form ? 16 : (integer_form ? 8 : fsz); itype = WRITE; if (size == 10) itype |= UNCACHE_OPT;...
  • Page 265 sub — Subtract ) sub register_form Format: ) sub minus1_form, register_form ) sub imm8_form The second source operand (and an optional constant 1) are subtracted from the first Description: operand and the result placed in GR . In the register form the first operand is GR ;...
  • Page 266 sum — Set User Mask ) sum Format: operand is ORed with the user mask (PSR{5:0}) and the result is placed in Description: the user mask. See Section 3.3.2, “Processor Status Register (PSR)” on page 2:23. PSR.up can only be set if the secure performance monitor bit (PSR.sp) is zero. Otherwise PSR.up is not modified.
  • Page 267: Xsz Mnemonic Values

    sxt — Sign Extend Format: (qp) sxtxsz r1 = r3 Description: The value in GR r3 is sign extended from the bit position specified by xsz and the result is placed in GR r1. The mnemonic values for xsz are given in Table 2-52....
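Sign extension from a byte-sized boundary can be sketched as follows. The mapping of xsz values 1, 2, and 4 to bit positions 7, 15, and 31 is an assumption, since Table 2-52 is not reproduced in this listing; the function name is illustrative.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def sxt(value: int, xsz: int) -> int:
    """Sign extend the low xsz bytes of value to 64 bits (xsz in 1, 2, 4)."""
    if xsz not in (1, 2, 4):
        raise ValueError("xsz must be 1, 2, or 4")
    bits = 8 * xsz
    low_mask = (1 << bits) - 1
    value &= low_mask                      # bits above the field are discarded
    if value >> (bits - 1):                # sign bit of the field is set
        value |= MASK64 & ~low_mask        # fill the upper bits with ones
    return value
```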
  • Page 268 sync sync — Memory Synchronization ) sync.i Format: sync.i ensures that when previously initiated Flush Cache (fc, fc.i) operations issued Description: by the local processor become visible to local data memory references, prior Flush Cache operations are also observed by the local processor instruction fetch stream. sync.i also ensures that at the time previously initiated Flush Cache (fc, fc.i) operations are observed on a remote processor by data memory references they are also observed by instruction memory references on the remote processor.
  • Page 269 tak — Translation Access Key ) tak Format: The protection key for a given virtual address is obtained and placed in GR Description: When PSR.dt is 1, the DTLB and the VHPT are searched for the virtual address specified by GR and the region register indexed by GR bits {63:61}.
  • Page 270: Test Bit Relations For Normal And Unc Tbits

    tbit tbit — Test Bit ) tbit. Format: trel ctype p The bit specified by the immediate is selected from GR r . The selected bit forms a Description: single bit result either complemented or not depending on the trel completer. This result is written to the two predicate register destinations .
  • Page 271 tbit Operation: if (PR[qp]) { if (p == p illegal_operation_fault(); if (trel == ‘nz’) // ‘nz’ - test for 1 tmp_rel = GR[r ]{pos else // ‘z’ - test for 0 tmp_rel = !GR[r ]{pos switch (ctype) { case ‘and’: // and-type compare if (GR[r ].nat || !tmp_rel) {...
  • Page 272: Test Feature Relations For Normal And Unc

    tf — Test Feature ) tf. Format: trel ctype p value (in the range of 32-63) selects the feature bit defined in Table 2-57 to be Description: tested from the features vector in CPUID[4]. See Section 3.1.11, “Processor Identification Registers” on page 1:34 for details on CPUID registers.
  • Page 273 Operation: if (PR[qp]) { if (p == p illegal_operation_fault(); tmp_rel = (psr.vm && pal_vp_env_enabled() && VAC.a_tf) ? vcpuid[4]{imm5} : cpuid[4]{imm5}; if (trel == ‘z’) // ‘z’ - test for 0, not 1 tmp_rel = !tmp_rel; switch (ctype) { case ‘and’: // and-type compare if (!tmp_rel) { PR[p...
  • Page 274 thash thash — Translation Hashed Entry Address ) thash Format: A Virtual Hashed Page Table (VHPT) entry address is generated based on the specified Description: virtual address and the result is placed in GR . The virtual address is specified by GR and the region register selected by GR bits {63:61}.
  • Page 275: Test Nat Relations For Normal And Unc Tnats

    tnat tnat — Test NaT ) tnat. Format: trel ctype p The NaT bit from GR forms a single bit result, either complemented or not depending Description: on the trel completer. This result is written to the two predicate register destinations, .
  • Page 276 tnat Operation: if (PR[qp]) { if (p == p illegal_operation_fault(); if (trel == ‘nz’) // ‘nz’ - test for 1 tmp_rel = GR[r ].nat; else // ‘z’ - test for 0 tmp_rel = !GR[r ].nat; switch (ctype) { case ‘and’: // and-type compare if (!tmp_rel) { PR[p...
  • Page 277 tpa — Translate to Physical Address ) tpa Format: The physical address for the virtual address specified by GR is obtained and placed in Description: When PSR.dt is 1, the DTLB and the VHPT are searched for the virtual address specified by GR and the region register indexed by GR bits {63:61}.
  • Page 278 ttag ttag — Translation Hashed Entry Tag ) ttag Format: A tag used for matching during searches of the long format Virtual Hashed Page Table Description: (VHPT) is generated and placed in GR . The virtual address is specified by GR the region register selected by GR bits {63:61}.
  • Page 279 unpack unpack — Unpack ) unpack1.h one_byte_form, high_form Format: ) unpack2.h two_byte_form, high_form ) unpack4.h four_byte_form, high_form ) unpack1.l one_byte_form, low_form ) unpack2.l two_byte_form, low_form ) unpack4.l four_byte_form, low_form The data elements of GR are unpacked, and the result placed in GR .
  • Page 280 unpack Figure 2-45. Unpack Operation (panels: unpack1.h, unpack1.l, unpack2.h, unpack2.l, unpack4.h, unpack4.l)
  • Page 281 unpack Operation: if (PR[qp]) { check_target_register(r if (one_byte_form) { // one-byte elements x[0] = GR[r ]{7:0}; y[0] = GR[r ]{7:0}; x[1] = GR[r ]{15:8}; y[1] = GR[r ]{15:8}; x[2] = GR[r ]{23:16}; y[2] = GR[r ]{23:16}; x[3] = GR[r ]{31:24}; y[3] = GR[r ]{31:24};...
  • Page 282 vmsw vmsw — Virtual Machine Switch vmsw.0 zero_form Format: vmsw.1 one_form This instruction sets the PSR.vm bit to the specified value. This instruction can be used Description: to implement transitions to/from virtual machine mode without the overhead of an interruption. If instruction address translation is enabled and the page containing the vmsw instruction has access rights equal to 7, then the new value is written to the PSR.vm bit.
  • Page 283: Memory Exchange Size

    xchg — Exchange. Format: (qp) xchgsz.ldhint r1 = [r3], r2. Description: A value consisting of sz bytes is read from memory starting at the address specified by the value in GR r3. The least significant sz bytes of the value in GR r2 are written to memory starting at the address specified by the value in GR r3.
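The exchange described above can be modeled as a read followed by a write of the low sz bytes. The sketch below is only an illustration under stated assumptions: memory is a plain `bytearray`, sz is one of 1, 2, 4, or 8, and the atomicity, address translation, NaT checks, ALAT invalidation, and faults of the real instruction are all left out. The function name and its `big_endian` parameter are hypothetical.

```python
def xchg(memory: bytearray, addr: int, reg_value: int, sz: int,
         big_endian: bool = False) -> int:
    """Model the data movement of an sz-byte exchange.

    Returns the old memory value, zero extended, and stores the least
    significant sz bytes of reg_value at addr. Not atomic; no faults.
    """
    if sz not in (1, 2, 4, 8):
        raise ValueError("sz must be 1, 2, 4, or 8")
    order = "big" if big_endian else "little"
    # Read the old value; int.from_bytes yields a non-negative
    # (zero-extended) integer.
    old = int.from_bytes(memory[addr:addr + sz], order)
    # Keep only the least significant sz bytes of the register value.
    new = reg_value & ((1 << (8 * sz)) - 1)
    memory[addr:addr + sz] = new.to_bytes(sz, order)
    return old
```

A caller would typically compare the returned old value against an expected lock state, which is how an exchange is used to build a simple semaphore.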
  • Page 284 xchg Operation: if (PR[qp]) { check_target_register(r if (GR[r ].nat || GR[r ].nat) register_nat_consumption_fault(SEMAPHORE); paddr = tlb_translate(GR[r ], sz, SEMAPHORE, PSR.cpl, &mattr, &tmp_unused); if (!ma_supports_semaphores(mattr)) unsupported_data_reference_fault(SEMAPHORE, GR[r val = mem_xchg(GR[r ], paddr, sz, UM.be, mattr, ACQUIRE, ldhint); alat_inval_multiple_entries(paddr, sz); GR[r ] = zero_ext(val, sz * 8); GR[r ].nat = 0;...
  • Page 285 xma — Fixed-Point Multiply Add. Format: (qp) xma.l f1 = f3, f4, f2 (low_form); (qp) xma.lu f1 = f3, f4, f2 (pseudo-op of: (qp) xma.l f1 = f3, f4, f2); (qp) xma.h f1 = f3, f4, f2 (high_form); (qp) xma.hu f1 = f3, f4, f2 (high_unsigned_form). Description: Two source operands (FR f3 and FR f4) are treated as either signed or unsigned integers and multiplied. The third source operand (FR f2) is zero extended and added to the product.
  • Page 286 Operation: if (PR[qp]) { fp_check_target_register(f if (tmp_isrcode = fp_reg_disabled(f disabled_fp_register_fault(tmp_isrcode, 0); if (fp_is_natval(FR[f ]) || fp_is_natval(FR[f ]) || fp_is_natval(FR[f ])) { FR[f ] = NATVAL; } else { if (low_form || high_form) tmp_res_128 = fp_I64_x_I64_to_I128(FR[f ].significand, FR[f ].significand); else // high_unsigned_form tmp_res_128 = fp_U64_x_U64_to_U128(FR[f ].significand, FR[f...
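The multiply-add arithmetic in the xma entries above can be sketched with Python's unbounded integers. This is a hedged model, not the manual's pseudo-code: it assumes the low form keeps bits 63:0 of the 128-bit sum and the high forms keep bits 127:64, it works on bare 64-bit significands, and it omits NaTVal propagation and the disabled-register fault. The names `xma` and `to_signed64` are hypothetical.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def to_signed64(x: int) -> int:
    """Reinterpret a 64-bit pattern as a two's-complement signed value."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


def xma(f3: int, f4: int, f2: int, form: str = "l") -> int:
    """Return the 64-bit result of f3 * f4 + zero_ext(f2).

    form: "l" (low), "h" (high, signed), or "hu" (high, unsigned).
    """
    if form in ("l", "h"):
        product = to_signed64(f3) * to_signed64(f4)
    elif form == "hu":
        product = (f3 & MASK64) * (f4 & MASK64)
    else:
        raise ValueError("form must be 'l', 'h', or 'hu'")
    # The addend is zero extended, so it is always non-negative.
    total = (product + (f2 & MASK64)) & MASK128
    return total & MASK64 if form == "l" else (total >> 64) & MASK64
```

Passing 0 as the addend gives a plain multiply, which mirrors how the xmpy forms are defined as pseudo-ops of xma with f0.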
  • Page 287 xmpy — Fixed-Point Multiply. Format: (qp) xmpy.l f1 = f3, f4 (pseudo-op of: (qp) xma.l f1 = f3, f4, f0); (qp) xmpy.lu f1 = f3, f4 (pseudo-op of: (qp) xma.l f1 = f3, f4, f0); (qp) xmpy.h f1 = f3, f4 (pseudo-op of: (qp) xma.h f1 = f3, f4, f0); (qp) xmpy.hu f1 = f3, f4 (pseudo-op of: (qp) xma.hu f1 = f3, f4, f0). Description: Two source operands (FR f3 and FR f4) are treated as either signed or unsigned integers...
  • Page 288 xor — Exclusive Or. Format: (qp) xor r1 = r2, r3 (register_form); (qp) xor r1 = imm8, r3 (imm8_form). Description: The two source operands are logically XORed and the result placed in GR r1. In the register_form the first operand is GR r2; in the imm8_form the first operand is taken from the imm8 encoding field.
  • Page 289 zxt — Zero Extend. Format: (qp) zxtxsz r1 = r3. Description: The value in GR r3 is zero extended above the bit position specified by xsz and the result is placed in GR r1. The mnemonic values for xsz are given in Table 2-52 on page 3:258.
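Zero extension as described above amounts to masking off everything above the low xsz bytes. A minimal sketch, assuming xsz is given as a byte count of 1, 2, or 4 (matching the zxt1, zxt2, and zxt4 mnemonics); the function name `zxt` is used here only for illustration.

```python
def zxt(value: int, xsz: int) -> int:
    """Keep the low xsz bytes of value and clear all higher bits."""
    if xsz not in (1, 2, 4):
        raise ValueError("xsz must be 1, 2, or 4")
    return value & ((1 << (8 * xsz)) - 1)
```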
  • Page 290: Pseudo-Code Functions

    Pseudo-Code Functions This chapter contains a table of all pseudo-code functions used on the Itanium instruction pages. Table 3-1. Pseudo-code Functions Function Operation xxx_fault(parameters ...) There are several fault functions. Each fault function accepts parameters specific to the fault, e.g., exception code values, virtual addresses, etc. If the fault is deferred for speculative load exceptions the fault function will return with a deferral indication.
  • Page 291 Table 3-1. Pseudo-code Functions (Continued) Function Operation check_branch_implemented(check_type) Implementation-dependent routine which returns TRUE or FALSE, depending on whether a failing check instruction causes a branch (TRUE), or a Speculative Operation fault (FALSE). The result may be different for different types of check instructions: CHKS_GENERAL, CHKS_FLOAT, CHKA_GENERAL, CHKA_FLOAT.
  • Page 292 Table 3-1. Pseudo-code Functions (Continued) Function Operation fp_is_nan_or_inf(freg) Returns true if the floating-point exception_fault_check functions returned an IEEE fault disabled default result or a propagated NaN. fp_is_natval(freg) Returns true when floating register contains a NaTVal. fp_is_normal(freg) Returns true when floating register contains a normal number. fp_is_pos_inf(freg) Returns true when floating register contains a positive infinity.
  • Page 293 Table 3-1. Pseudo-code Functions (Continued) Function Operation impl_check_mov_itir() Implementation-specific function that returns TRUE if ITIR is checked for reserved fields and encodings on a mov to ITIR instruction. impl_check_mov_psr_l(gr) Implementation-specific function to check bits {63:32} of gr corresponding to reserved fields of the PSR for Reserved Register/Field fault.
  • Page 294 Table 3-1. Pseudo-code Functions (Continued) Function Operation is_read_only_reg(rtype, raddr) Returns a one if the register addressed by raddr in the register bank of type rtype is a read only register. is_reserved_field(regclass, arg2, arg3) Returns true if the specified data would write a one in a reserved field. is_reserved_reg(regclass, regnum) Returns true if register regnum is reserved in the regclass register file.
  • Page 295 Table 3-1. Pseudo-code Functions (Continued) Function Operation mem_xchg_add(add_val, paddr, size, byte_order, mattr, otype, hint) Returns size bytes from memory starting at the physical address specified by paddr. The read is conditioned by the locality hint specified by hint. The least...
  • Page 296 Table 3-1. Pseudo-code Functions (Continued) Function Operation rse_load(type) Restores a register or NaT collection from the backing store (load_address = RSE.BspLoad - 8). If load_address{8:3} is equal to 0x3f then a NaT collection is loaded into a NaT dispersal register. (The dispersal register may not be the same as AR[RNAT].) If load_address{8:3} is not equal to 0x3f then the register RSE.LoadReg - 1 is loaded and the NaT bit for that register is set to dispersal_register{load_address{8:3}}.
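The address test in the rse_load description above can be sketched as two small bit operations. The helper names below are hypothetical; they only illustrate the load_address{8:3} check and the NaT bit selection, not the register stack engine itself.

```python
def slot_index(load_address: int) -> int:
    """Return bits {8:3} of a backing-store address (0..63)."""
    return (load_address >> 3) & 0x3F


def is_nat_collection_slot(load_address: int) -> bool:
    """True when the address holds a NaT collection (bits {8:3} == 0x3f)."""
    return slot_index(load_address) == 0x3F


def nat_bit_for(dispersal_register: int, load_address: int) -> int:
    """Select dispersal_register{load_address{8:3}} for a register slot."""
    return (dispersal_register >> slot_index(load_address)) & 1
```

Because bits {8:3} cycle through 64 values in 8-byte steps, every 64th 8-byte slot satisfies the 0x3f test under this model.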
  • Page 297 Table 3-1. Pseudo-code Functions (Continued) Function Operation spontaneous_deferral(paddr, size, border, mattr, otype, hint, *defer) Implementation-dependent routine which optionally forces *defer to TRUE if all of the following are true: spontaneous deferral is enabled, spontaneous deferral is permitted by the programming model, and the processor determines it would be advantageous to defer the speculative load (e.g., based on a miss in some particular...
  • Page 298 Table 3-1. Pseudo-code Functions (Continued) Function Operation tlb_may_purge_itc_entries(rid, vaddr, size) May locally purge ITC entries that match the specified virtual address (vaddr), region identifier (rid) and page size (size). May also invalidate entries that partially overlap the parameters. The extent of purging is implementation dependent. If the purge size is not supported, an implementation may generate a machine check abort or over purge the translation cache up to and including removal of all entries from the translation cache.
  • Page 299 Table 3-1. Pseudo-code Functions (Continued) Function Operation tlb_translate(vaddr, size, type, cpl, *attr, *defer) Returns the translated data physical address for the specified virtual memory address (vaddr) when translation is enabled; otherwise, returns vaddr. size specifies the size of the access, type specifies the type of access (e.g., read, write, advance, spec).
  • Page 300 Table 3-1. Pseudo-code Functions (Continued) Function Operation unimplemented_physical_address(paddr) Return TRUE if the presented physical address is unimplemented on this processor model; FALSE otherwise. This function is model specific. unimplemented_virtual_address(vaddr, Return TRUE if the presented virtual address is unimplemented on this processor model;...
  • Page 302: Instruction Formats

    Instruction Formats Each Itanium instruction is categorized into one of six types; each instruction type may be executed on one or more execution unit types. Table 4-1 lists the instruction types and the execution unit type on which they are executed: Table 4-1.
  • Page 303: Format Summary

    Table 4-2. Template Field Encoding and Instruction Slot Mapping Template Slot 0 Slot 1 Slot 2 M-unit I-unit I-unit M-unit I-unit I-unit M-unit I-unit I-unit M-unit I-unit I-unit M-unit L-unit X-unit M-unit L-unit X-unit M-unit M-unit I-unit M-unit M-unit I-unit M-unit M-unit I-unit...
  • Page 304: Major Opcode Assignments

    • Reserved major ops (light gray in the gray scale version of Table 4-3, brown in the color version) cause an Illegal Operation fault. • Reserved if PR[qp] is 1 major ops (dark gray in the gray scale version of Table 4-3, purple in the color version) cause an Illegal Operation fault if the predicate register...
  • Page 305: Instruction Format Summary

    Table 4-4. Instruction Format Summary 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 Shift L and Add ALU Imm Add Imm...
  • Page 306 Table 4-4. Instruction Format Summary (Continued) 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 Int Spec Check FP Spec Check Int ALAT Check...
  • Page 307: Instruction Field Color Key

    Table 4-5. Instruction Field Color Key Field & Color ALU Instruction Opcode Extension Integer Instruction Opcode Hint Extension Memory Instruction Immediate Branch Instruction Indirect Source Floating-point Instruction Predicate Destination Integer Source Integer Destination Memory Source Memory Source & Destination Shift Source Shift Immediate Special Register Source Special Register Destination...
  • Page 308: Special Instruction Notations

    Table 4-6. Instruction Field Names (Continued) Field Name Description sof, sol, sor alloc size of frame, size of locals, size of rotating immediates compare type opcode extension , timm branch predict tag immediate reserved opcode extension field branch whether hint opcode extension x, x opcode extension of length 1 or n extract/deposit/test bit/test NaT/hint opcode extension...
  • Page 309: A-Unit Instruction Encodings

    Some processors may implement the Reserved if PR[qp] is 1 (purple) and Reserved if PR[qp] is 1 B-unit (cyan) encodings in the L+X opcode space as Reserved (brown). These encodings appear in the L+X column of Table 4-3 on page 3:295, and in Table 4-69 on page 3:366,...
  • Page 310: Integer Alu 4-Bit+2-Bit Opcode Extensions

    Table 4-9. Integer ALU 4-bit+2-bit Opcode Extensions Opcode Bits 28:27 Bits Bits Bits 40:37 35:34 32:29 add +1 sub -1 addp4 andcm shladd shladdp4 sub – imm and – imm andcm – imm or – imm xor – imm 4.2.1.1 Integer ALU –...
  • Page 311: Integer Compare

    4.2.1.3 Integer ALU – Immediate -Register 37 36 35 34 33 32 29 28 27 26 20 19 13 12 Extension Instruction Operands Opcode andcm = imm 4.2.1.4 Add Immediate 37 36 35 34 33 32 27 26 20 19 13 12 Extension Instruction...
  • Page 312: Integer Compare Opcode Extensions

    Table 4-10. Integer Compare Opcode Extensions Opcode Bits 40:37 Bits 35:34 cmp.lt cmp.ltu cmp.eq cmp.lt.unc cmp.ltu.unc cmp.eq.unc cmp.eq.and cmp.eq.or cmp.eq.or.andcm cmp.ne.and cmp.ne.or cmp.ne.or.andcm cmp.gt.and cmp.gt.or cmp.gt.or.andcm cmp.le.and cmp.le.or cmp.le.or.andcm cmp.ge.and cmp.ge.or cmp.ge.or.andcm cmp.lt.and cmp.lt.or cmp.lt.or.andcm cmp4.lt cmp4.ltu cmp4.eq cmp4.lt.unc cmp4.ltu.unc cmp4.eq.unc cmp4.eq.and cmp4.eq.or...
  • Page 313 4.2.2.1 Integer Compare – Register-Register 37 36 35 34 33 32 27 26 20 19 13 12 11 C - E Extension Instruction Operands Opcode cmp.lt cmp.ltu cmp.eq cmp.lt.unc cmp.ltu.unc cmp.eq.unc cmp.eq.and cmp.eq.or cmp.eq.or.andcm cmp.ne.and cmp.ne.or cmp.ne.or.andcm cmp4.lt cmp4.ltu cmp4.eq cmp4.lt.unc cmp4.ltu.unc cmp4.eq.unc...
  • Page 314 4.2.2.2 Integer Compare to Zero – Register 37 36 35 34 33 32 27 26 20 19 13 12 11 C - E Extension Instruction Operands Opcode cmp.gt.and cmp.gt.or cmp.gt.or.andcm cmp.le.and cmp.le.or cmp.le.or.andcm cmp.ge.and cmp.ge.or cmp.ge.or.andcm cmp.lt.and cmp.lt.or cmp.lt.or.andcm = r0, r cmp4.gt.and cmp4.gt.or cmp4.gt.or.andcm...
  • Page 315: Multimedia Alu 2-Bit+1-Bit Opcode Extensions

    4.2.2.3 Integer Compare – Immediate-Register 37 36 35 34 33 32 27 26 20 19 13 12 11 C - E Extension Instruction Operands Opcode cmp.lt cmp.ltu cmp.eq cmp.lt.unc cmp.ltu.unc cmp.eq.unc cmp.eq.and cmp.eq.or cmp.eq.or.andcm cmp.ne.and cmp.ne.or cmp.ne.or.andcm = imm cmp4.lt cmp4.ltu cmp4.eq cmp4.lt.unc...
  • Page 316: Multimedia Alu Size 1 4-Bit+2-Bit Opcode Extensions

    Table 4-13. Multimedia ALU Size 1 4-bit+2-bit Opcode Extensions Opcode Bits 28:27 Bits Bits Bits 40:37 35:34 32:29 padd1 padd1.sss padd1.uuu padd1.uus psub1 psub1.sss psub1.uuu psub1.uus pavg1 pavg1.raz pavgsub1 pcmp1.eq pcmp1.gt Table 4-14. Multimedia ALU Size 2 4-bit+2-bit Opcode Extensions Opcode Bits 28:27 Bits...
  • Page 317: Multimedia Alu Size 4 4-Bit+2-Bit Opcode Extensions

    Table 4-15. Multimedia ALU Size 4 4-bit+2-bit Opcode Extensions Opcode Bits 28:27 Bits Bits Bits 40:37 35:34 32:29 padd4 psub4 pcmp4.eq pcmp4.gt...
  • Page 318 4.2.3.1 Multimedia ALU 37 36 35 34 33 32 29 28 27 26 20 19 13 12 Extension Instruction Operands Opcode padd1 padd2 padd4 padd1.sss padd2.sss padd1.uuu padd2.uuu padd1.uus padd2.uus psub1 psub2 psub4 psub1.sss psub2.sss psub1.uuu psub2.uuu psub1.uus psub2.uus pavg1 pavg2 pavg1.raz pavg2.raz...
  • Page 319: I-Unit Instruction Encodings

    I-Unit Instruction Encodings 4.3.1 Multimedia and Variable Shifts All multimedia multiply/shift/max/min/mix/mux/pack/unpack and variable shift instructions are encoded within major opcode 7 using two 1-bit opcode extension fields in bits 36 (za) and 33 (zb) and a 1-bit reserved opcode extension in bit 32 (ve) as shown in Table...
  • Page 320: Multimedia Opcode 7 Size 2 2-Bit Opcode Extensions

    Table 4-18. Multimedia Opcode 7 Size 2 2-bit Opcode Extensions Opcode Bits 31:30 Bits Bits Bits 40:37 35:34 29:28 pshr2.u – var pshl2 – var pmpyshr2.u pshr2 – var pmpyshr2 pshr2.u – fixed popcnt pshr2 – fixed pack2.uss unpack2.h mix2.r pmpy2.r pack2.sss unpack2.l...
  • Page 321: Variable Shift Opcode 7 2-Bit Opcode Extensions

    Table 4-20. Variable Shift Opcode 7 2-bit Opcode Extensions Opcode Bits 31:30 Bits Bits Bits 40:37 35:34 29:28 shr.u – var shl – var shr – var 4.3.1.1 Multimedia Multiply and Shift 37 36 35 34 33 32 31 30 29 28 27 26 20 19 13 12 Extension...
  • Page 322 4.3.1.2 Multimedia Multiply/Mix/Pack/Unpack 37 36 35 34 33 32 31 30 29 28 27 26 20 19 13 12 Extension Instruction Operands Opcode mpy4 mpyshl4 pmpy2.r pmpy2.l mix1.r mix2.r mix4.r mix1.l mix2.l mix4.l pack2.uss pack2.sss pack4.sss unpack1.h unpack2.h unpack4.h unpack1.l unpack2.l unpack4.l pmin1.u...
  • Page 323 4.3.1.5 Shift Right – Variable 37 36 35 34 33 32 31 30 29 28 27 26 20 19 13 12 Extension Instruction Operands Opcode pshr2 pshr4 pshr2.u pshr4.u shr.u 4.3.1.6 Multimedia Shift Right – Fixed 37 36 35 34 33 32 31 30 29 28 27 26 20 19 18 14 13 12 count...
  • Page 324: Integer Shifts

    4.3.1.9 Bit Strings 37 36 35 34 33 32 31 30 29 28 27 26 20 19 13 12 Extension Instruction Operands Opcode popcnt 4.3.2 Integer Shifts The integer shift, test bit, and test NaT instructions are encoded within major opcode 5 using a 2-bit opcode extension field in bits 35:34 (x2) and a 1-bit opcode extension field in bit 33 (x).
  • Page 325: Test Bit

    4.3.2.2 Extract 37 36 35 34 33 32 27 26 20 19 14 13 12 Extension Instruction Operands Opcode extr.u , pos , len extr 4.3.2.3 Zero and Deposit 37 36 35 34 33 32 27 26 25 20 19 13 12 cpos Extension...
  • Page 326: Test Bit Opcode Extensions

    Table 4-23. Test Bit Opcode Extensions Opcode Bit 19 Bits Bits 40:37 Bit 33 Bit 36 Bit 12 Bit 13 35:34 tbit.z tnat.z tf.z tbit.z.unc tnat.z.unc tf.z.unc tbit.z.and tnat.z.and tf.z.and tbit.nz.and tnat.nz.and tf.nz.and tbit.z.or tnat.z.or tf.z.or tbit.nz.or tnat.nz.or tf.nz.or tbit.z.or.andcm tnat.z.or.andcm tf.z.or.andcm tbit.nz.or.andcm...
  • Page 327: Miscellaneous I-Unit Instructions

    4.3.3.2 Test NaT 37 36 35 34 33 32 27 26 20 19 18 14 13 12 11 Extension Instruction Operands Opcode tnat.z tnat.z.unc tnat.z.and tnat.nz.and tnat.z.or tnat.nz.or tnat.z.or.andcm tnat.nz.or.andcm 4.3.4 Miscellaneous I-Unit Instructions The miscellaneous I-unit instructions are encoded in major opcode 0 using a 3-bit opcode extension field (x3) in bits 35:33.
  • Page 328: Misc I-Unit 6-Bit Opcode Extensions

    Table 4-25. Misc I-Unit 6-bit Opcode Extensions Opcode Bits Bits Bits 32:31 Bits 40:37 35:33 30:27 break.i zxt1 mov from ip 1-bit Ext (Table 4-26) zxt2 mov from b zxt4 mov.i from ar mov from pr sxt1 sxt2 sxt4 czx1.l czx2.l mov.i to ar –...
  • Page 329: Gr/Br Moves

    4.3.4.2 Break (I-Unit) 37 36 35 33 32 27 26 25 Extension Instruction Operands Opcode break.i 4.3.4.3 Integer Speculation Check (I-Unit) 37 36 35 33 32 20 19 13 12 Extension Instruction Operands Opcode chk.s.i , target 4.3.5 GR/BR Moves The GR/BR move instructions are encoded in major opcode 0.
  • Page 330: Gr/Predicate/Ip Moves

    4.3.5.2 Move from BR 37 36 35 33 32 27 26 16 15 13 12 Extension Instruction Operands Opcode 4.3.6 GR/Predicate/IP Moves The GR/Predicate/IP move instructions are encoded in major opcode 0. See “Miscellaneous I-Unit Instructions” on page 3:318 for a summary of the opcode extensions.
  • Page 331: Sign/Zero Extend/Compute Zero Index

    4.3.7.1 Move to AR – Register (I-Unit) 37 36 35 33 32 27 26 20 19 13 12 Extension Instruction Operands Opcode mov.i 4.3.7.2 Move to AR – Immediate (I-Unit) 37 36 35 33 32 27 26 20 19 13 12 Extension Instruction Operands...
  • Page 332: Test Feature

    4.3.9 Test Feature 37 36 35 34 33 32 27 26 20 19 18 14 13 12 11 Extension Instruction Operands Opcode tf.z tf.z.unc tf.z.and tf.nz.and = imm tf.z.or tf.nz.or tf.z.or.andcm tf.nz.or.andcm M-Unit Instruction Encodings 4.4.1 Loads and Stores All load and store instructions are encoded within major opcodes 4, 5, 6, and 7 using a 6-bit opcode extension field in bits 35:30 (x6).
  • Page 333: Integer Load/Store Opcode Extensions

    opcode extensions are summarized in Table 4-34 on page 3:326, Table 4-35 on page 3:326, and Table 4-36 on page 3:327, the floating-point load pair and set FR opcode extensions in Table 4-37 on page 3:327 and Table 4-38 on page 3:328.
  • Page 334: Integer Load/Store +Imm Opcode Extensions

    Table 4-32. Integer Load/Store +Imm Opcode Extensions Opcode Bits Bits 31:30 Bits 40:37 35:32 ld1.s ld2.s ld4.s ld8.s ld1.a ld2.a ld4.a ld8.a ld1.sa ld2.sa ld4.sa ld8.sa ld1.bias ld2.bias ld4.bias ld8.bias ld1.acq ld2.acq ld4.acq ld8.acq ld8.fill ld1.c.clr ld2.c.clr ld4.c.clr ld8.c.clr ld1.c.nc ld2.c.nc ld4.c.nc ld8.c.nc...
  • Page 335: Floating-Point Load/Store/Lfetch Opcode Extensions

    Table 4-34. Floating-point Load/Store/Lfetch Opcode Extensions Opcode Bits Bits 31:30 Bits 40:37 35:32 ldfe ldf8 ldfs ldfd ldfe.s ldf8.s ldfs.s ldfd.s ldfe.a ldf8.a ldfs.a ldfd.a ldfe.sa ldf8.sa ldfs.sa ldfd.sa ldf.fill ldfe.c.clr ldf8.c.clr ldfs.c.clr ldfd.c.clr ldfe.c.nc ldf8.c.nc ldfs.c.nc ldfd.c.nc lfetch lfetch.excl lfetch.fault lfetch.fault.excl stfe...
  • Page 336: Floating-Point Load/Store/Lfetch +Imm Opcode Extensions

    Table 4-36. Floating-point Load/Store/Lfetch +Imm Opcode Extensions Opcode Bits Bits 31:30 Bits 40:37 35:32 ldfe ldf8 ldfs ldfd ldfe.s ldf8.s ldfs.s ldfd.s ldfe.a ldf8.a ldfs.a ldfd.a ldfe.sa ldf8.sa ldfs.sa ldfd.sa ldf.fill ldfe.c.clr ldf8.c.clr ldfs.c.clr ldfd.c.clr ldfe.c.nc ldf8.c.nc ldfs.c.nc ldfd.c.nc lfetch lfetch.excl lfetch.fault lfetch.fault.excl...
  • Page 337: Floating-Point Load Pair +Imm Opcode Extensions

    Table 4-38. Floating-point Load Pair +Imm Opcode Extensions Opcode Bits Bits 31:30 Bits 40:37 35:32 ldfp8 ldfps ldfpd ldfp8.s ldfps.s ldfpd.s ldfp8.a ldfps.a ldfpd.a ldfp8.sa ldfps.sa ldfpd.sa ldfp8.c.clr ldfps.c.clr ldfpd.c.clr ldfp8.c.nc ldfps.c.nc ldfpd.c.nc The load and store instructions all have a 2-bit cache locality opcode hint extension field in bits 29:28 (hint).
  • Page 338 4.4.1.1 Integer Load 37 36 35 30 29 28 27 26 20 19 13 12 hint x Extension Instruction Operands Opcode hint ld1.ldhint ld2.ldhint ld4.ldhint ld8.ldhint ld1.s.ldhint ld2.s.ldhint ld4.s.ldhint ld8.s.ldhint ld1.a.ldhint ld2.a.ldhint ld4.a.ldhint ld8.a.ldhint ld1.sa.ldhint ld2.sa.ldhint ld4.sa.ldhint ld8.sa.ldhint ld1.bias.ldhint ld2.bias.ldhint ld4.bias.ldhint = [r See Table 4-39...
  • Page 339 4.4.1.2 Integer Load – Increment by Register 37 36 35 30 29 28 27 26 20 19 13 12 hint x Extension Instruction Operands Opcode hint ld1.ldhint ld2.ldhint ld4.ldhint ld8.ldhint ld1.s.ldhint ld2.s.ldhint ld4.s.ldhint ld8.s.ldhint ld1.a.ldhint ld2.a.ldhint ld4.a.ldhint ld8.a.ldhint ld1.sa.ldhint ld2.sa.ldhint ld4.sa.ldhint ld8.sa.ldhint ld1.bias.ldhint...
  • Page 340 4.4.1.3 Integer Load – Increment by Immediate 37 36 35 30 29 28 27 26 20 19 13 12 hint i Extension Instruction Operands Opcode hint ld1.ldhint ld2.ldhint ld4.ldhint ld8.ldhint ld1.s.ldhint ld2.s.ldhint ld4.s.ldhint ld8.s.ldhint ld1.a.ldhint ld2.a.ldhint ld4.a.ldhint ld8.a.ldhint ld1.sa.ldhint ld2.sa.ldhint ld4.sa.ldhint ld8.sa.ldhint ld1.bias.ldhint...
  • Page 341 4.4.1.4 Integer Store 37 36 35 30 29 28 27 26 20 19 13 12 hint x Extension Instruction Operands Opcode hint st1.sthint st2.sthint st4.sthint st8.sthint st1.rel.sthint ] = r See Table 4-40 st2.rel.sthint on page 3:328 st4.rel.sthint st8.rel.sthint st8.spill.sthint st16.sthint ] = r , ar.csd...
  • Page 342 4.4.1.6 Floating-point Load 37 36 35 30 29 28 27 26 20 19 13 12 hint x Extension Instruction Operands Opcode hint ldfs.ldhint ldfd.ldhint ldf8.ldhint ldfe.ldhint ldfs.s.ldhint ldfd.s.ldhint ldf8.s.ldhint ldfe.s.ldhint ldfs.a.ldhint ldfd.a.ldhint ldf8.a.ldhint ldfe.a.ldhint See Table 4-39 ldfs.sa.ldhint = [r on page 3:328 ldfd.sa.ldhint ldf8.sa.ldhint...
  • Page 343 4.4.1.7 Floating-point Load – Increment by Register 37 36 35 30 29 28 27 26 20 19 13 12 hint x Extension Instruction Operands Opcode hint ldfs.ldhint ldfd.ldhint ldf8.ldhint ldfe.ldhint ldfs.s.ldhint ldfd.s.ldhint ldf8.s.ldhint ldfe.s.ldhint ldfs.a.ldhint ldfd.a.ldhint ldf8.a.ldhint ldfe.a.ldhint See Table 4-39 on ldfs.sa.ldhint = [r ], r...
  • Page 344 4.4.1.8 Floating-point Load – Increment by Immediate 37 36 35 30 29 28 27 26 20 19 13 12 hint i Extension Instruction Operands Opcode hint ldfs.ldhint ldfd.ldhint ldf8.ldhint ldfe.ldhint ldfs.s.ldhint ldfd.s.ldhint ldf8.s.ldhint ldfe.s.ldhint ldfs.a.ldhint ldfd.a.ldhint ldf8.a.ldhint ldfe.a.ldhint See Table 4-39 on ldfs.sa.ldhint = [r ], imm...
  • Page 345 4.4.1.10 Floating-point Store – Increment by Immediate 37 36 35 30 29 28 27 26 20 19 13 12 hint i Extension Instruction Operands Opcode hint stfs.sthint stfd.sthint See Table 4-40 on stf8.sthint ] = f , imm page 3:328 stfe.sthint stf.spill.sthint 4.4.1.11...
  • Page 346: Line Prefetch

    4.4.1.12 Floating-point Load Pair – Increment by Immediate 37 36 35 30 29 28 27 26 20 19 13 12 hint x Extension Instruction Operands Opcode hint ldfps.ldhint = [r ], 8 ldfpd.ldhint = [r ], 16 ldfp8.ldhint ldfps.s.ldhint = [r ], 8 ldfpd.s.ldhint = [r...
  • Page 347: Semaphores

    4.4.2.1 Line Prefetch 37 36 35 30 29 28 27 26 20 19 hint x Extension Instruction Operands Opcode hint lfetch.excl.lfhint See Table 4-41 on lfetch.fault.lfhint page 3:337 lfetch.fault.excl.lfhint 4.4.2.2 Line Prefetch – Increment by Register 37 36 35 30 29 28 27 26 20 19 13 12 hint x...
  • Page 348: Set/Get

    4.4.3.1 Exchange/Compare and Exchange 37 36 35 30 29 28 27 26 20 19 13 12 hint x Extension Instruction Operands Opcode hint cmpxchg1.acq.ldhint cmpxchg2.acq.ldhint cmpxchg4.acq.ldhint cmpxchg8.acq.ldhint = [r ], r , ar.ccv cmpxchg1.rel.ldhint cmpxchg2.rel.ldhint cmpxchg4.rel.ldhint Table 4-39 on cmpxchg8.rel.ldhint page 3:328 cmp8xchg16.acq.ldhint = [r...
  • Page 349: Speculation And Advanced Load Checks

    4.4.4.1 Set FR 37 36 35 30 29 28 27 26 20 19 13 12 Extension Instruction Operands Opcode setf.sig setf.exp setf.s setf.d 4.4.4.2 Get FR 37 36 35 30 29 28 27 26 20 19 13 12 Extension Instruction Operands Opcode getf.sig...
  • Page 350: Cache/Synchronization/Rse/Alat

    4.4.5.3 Integer Advanced Load Check 37 36 35 33 32 13 12 Extension Instruction Operands Opcode chk.a.nc , target chk.a.clr 4.4.5.4 Floating-point Advanced Load Check 37 36 35 33 32 13 12 Extension Instruction Operands Opcode chk.a.nc , target chk.a.clr 4.4.6 Cache/Synchronization/RSE/ALAT The cache/synchronization/RSE/ALAT instructions are encoded in major opcode 0 along...
  • Page 351: Gr/Ar Moves (M-Unit)

    4.4.6.2 RSE Control 37 36 35 33 32 31 30 27 26 Extension Instruction Opcode flushrs loadrs 4.4.6.3 Integer ALAT Entry Invalidate 37 36 35 33 32 31 30 27 26 13 12 Extension Instruction Operands Opcode invala.e 4.4.6.4 Floating-point ALAT Entry Invalidate 37 36 35 33 32 31 30 27 26...
  • Page 352: Gr/Cr Moves

    4.4.7.1 Move to AR – Register (M-Unit) 37 36 35 33 32 27 26 20 19 13 12 Extension Instruction Operands Opcode mov.m 4.4.7.2 Move to AR – Immediate (M-Unit) 37 36 35 33 32 31 30 27 26 20 19 13 12 Extension Instruction...
  • Page 353: Miscellaneous M-Unit Instructions

    4.4.9 Miscellaneous M-Unit Instructions The miscellaneous M-unit instructions are encoded in major opcode 0 along with the system/memory management instructions. See “System/Memory Management” on page 3:345 for a summary of the opcode extensions. 4.4.9.1 Allocate Register Stack Frame 37 36 35 33 32 31 30 27 26 20 19...
  • Page 354: System/Memory Management

    4.4.10 System/Memory Management All system/memory management instructions are encoded within major opcodes 0 and 1 using a 3-bit opcode extension field (x3) in bits 35:33. Some instructions also have a 4-bit opcode extension field (x4) in bits 30:27, or a 6-bit opcode extension field (x6) in bits 32:27.
  • Page 355: Opcode 1 System/Memory Management 3-Bit Opcode Extensions

    Table 4-44. Opcode 1 System/Memory Management 3-bit Opcode Extensions Opcode Bits Bits 40:37 35:33 System/Memory Management 6-bit Ext (Table 4-45) chk.s.m – int chk.s – fp alloc Table 4-45. Opcode 1 System/Memory Management 6-bit Opcode Extensions Opcode Bits Bits Bits 32:31 Bits 40:37 35:33...
  • Page 356 4.4.10.2 Probe – Immediate 37 36 35 33 32 27 26 20 19 15 14 13 12 Extension Instruction Operands Opcode probe.r , imm probe.w 4.4.10.3 Probe Fault – Immediate 37 36 35 33 32 27 26 20 19 15 14 13 12 Extension Instruction Operands...
  • Page 357 4.4.10.6 Move from Indirect Register 37 36 35 33 32 27 26 20 19 13 12 Extension Instruction Operands Opcode = rr[r = dbr[r = ibr[r = pkr[r = pmc[r = pmd[r = cpuid[r 4.4.10.7 Set/Reset User/System Mask 37 36 35 33 32 31 30 27 26 Extension...
  • Page 358: Nop/Hint (M-Unit)

    4.4.10.9 Translation Access 37 36 35 33 32 27 26 20 19 13 12 Extension Instruction Operands Opcode thash ttag 4.4.10.10 Purge Translation Cache Entry 37 36 35 33 32 27 26 20 19 Extension Instruction Operands Opcode ptc.e 4.4.11 Nop/Hint (M-Unit) M-unit nop and hint instructions are encoded within major opcode 0 using a 3-bit opcode extension field in bits 35:33 (x...
  • Page 359: Branches

    4.5.1 Branches Opcode 0 is used for indirect branch, opcode 1 for indirect call, opcode 4 for IP-relative branch, and opcode 5 for IP-relative call. The IP-relative branch instructions encoded within major opcode 4 use a 3-bit opcode extension field in bits 8:6 (btype) to distinguish the branch types as shown in Table 4-47.
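The opcode assignment above can be illustrated by pulling the major opcode (bits 40:37) and, where the text says it applies, the btype field (bits 8:6) out of a 41-bit instruction slot. The sketch assumes the slot is already known to belong to a B-unit; the names `field`, `BRANCH_CLASS`, and `classify_branch` are hypothetical.

```python
def field(slot: int, hi: int, lo: int) -> int:
    """Extract bits {hi:lo} (inclusive) from an instruction slot."""
    return (slot >> lo) & ((1 << (hi - lo + 1)) - 1)


# Major opcode to branch class, as listed in section 4.5.1.
BRANCH_CLASS = {
    0: "indirect branch",
    1: "indirect call",
    4: "IP-relative branch",
    5: "IP-relative call",
}


def classify_branch(slot: int):
    """Return (class name or None, major opcode, btype or None)."""
    major = field(slot, 40, 37)
    # btype is described only for major opcodes 0 and 4, so it is
    # reported as None for the call opcodes and anything else.
    btype = field(slot, 8, 6) if major in (0, 4) else None
    return BRANCH_CLASS.get(major), major, btype
```

The same `field` helper works for any other extension field named by a bit range in this chapter, for example bits 35:33.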
  • Page 360: Indirect Branch Types

    The indirect branch instructions encoded within major opcode 0 use a 3-bit opcode extension field in bits 8:6 (btype) to distinguish the branch types as shown in Table 4-49. Table 4-49. Indirect Branch Types Opcode btype Bits 40:37 Bits 32:27 Bits 8:6 br.cond br.ia...
  • Page 361: Branch Whether Hint Completer

    Table 4-52. Branch Whether Hint Completer Bits 34:33 .sptk .spnt .dptk .dpnt Table 4-53. Indirect Call Whether Hint Completer Bits 34:32 .sptk .spnt .dptk .dpnt The branch instructions also have a 1-bit branch cache deallocation opcode hint extension field in bit 35 (d) as shown in Table 4-54.
  • Page 362: Branch Predict/Nop/Hint

    4.5.1.2 IP-Relative Counted Branch 37 36 35 34 33 32 13 12 11 s d wh btype Extension Instruction Operands Opcode btype br.cloop.bwh.ph.dh br.cexit.bwh.ph.dh target Table 4-51 on Table 4-52 on Table 4-54 on page 3:351 page 3:352 page 3:352 br.ctop.bwh.ph.dh 4.5.1.3 IP-Relative Call...
  • Page 363: Indirect Predict/Nop/Hint Opcode Extensions

    Table 4-55. Indirect Predict/Nop/Hint Opcode Extensions Opcode Bits Bits 32:31 Bits 40:37 30:27 nop.b hint.b brp.ret The branch predict instructions all have a 1-bit branch importance opcode hint extension field in bit 35 (ih). The mov to BR instruction (page 3:320) also has this hint in bit 23.
  • Page 364: Miscellaneous B-Unit Instructions

    Table 4-58. Indirect Predict Whether Hint Completer indwh Bits 4:3 .sptk .dptk 4.5.2.1 IP-Relative Predict 37 36 35 34 33 32 13 12 6 5 4 3 2 s ih t timm Extension Instruction Operands Opcode See Table 4-56 on See Table 4-57 on brp.ipwh.ih target...
  • Page 365: F-Unit Instruction Encodings

    Extension Instruction Opcode vmsw.0 vmsw.1 4.5.3.2 Break/Nop/Hint (B-Unit) 37 36 35 33 32 27 26 25 Extension Instruction Operands Opcode break.b nop.b hint.b F-Unit Instruction Encodings The floating-point instructions are encoded in major opcodes 8 – E for floating-point and fixed-point arithmetic, opcode 4 for floating-point compare, opcode 5 for floating-point class, and opcodes 0 and 1 for miscellaneous floating-point instructions.
  • Page 366: Opcode 0 Miscellaneous Floating-Point 6-Bit Opcode Extensions

    Table 4-60. Opcode 0 Miscellaneous Floating-point 6-bit Opcode Extensions Opcode Bits Bits 32:31 Bits 40:37 30:27 break.f fmerge.s 1-bit Ext fmerge.ns (Table 4-68) fmerge.se fsetc fmin fswap fclrf fmax fswap.nl famin fswap.nr famax fchkf fcvt.fx fpack fcvt.fxu fmix.lr fcvt.fx.trunc fmix.r fcvt.fxu.trunc fmix.l fcvt.xf...
  • Page 367: Arithmetic

    Table 4-62. Reciprocal Approximation 1-bit Opcode Extensions Opcode Bits 40:37 Bit 33 Bit 36 frcpa frsqrta fprcpa fprsqrta Most floating-point instructions have a 2-bit opcode extension field in bits 35:34 (sf) which encodes the FPSR status field to be used. Table 4-63 summarizes these assignments.
  • Page 368: Parallel Floating-Point Select

    4.6.1.1 Floating-point Multiply Add 37 36 35 34 33 27 26 20 19 13 12 8 - D x sf Extension Instruction Operands Opcode fma.sf fma.s.sf fma.d.sf fpma.sf fms.sf fms.s.sf See Table 4-63 on page 3:358 fms.d.sf fpms.sf fnma.sf fnma.s.sf fnma.d.sf fpnma.sf 4.6.1.2...
  • Page 369: Floating-Point Compare Opcode Extensions

    Table 4-66. Floating-point Compare Opcode Extensions Opcode Bit 12 Bits Bit 33 Bit 36 40:37 fcmp.eq fcmp.eq.unc fcmp.lt fcmp.lt.unc fcmp.le fcmp.le.unc fcmp.unord fcmp.unord.unc The floating-point class instructions are encoded within major opcode 5 using a 1-bit opcode extension field in bit 12 (t ) as shown in Table 4-67.
  • Page 370 4.6.4 Approximation 4.6.4.1 Floating-point Reciprocal Approximation There are two Reciprocal Approximation instructions. The first, in major op 0, encodes the full register variant. The second, in major op 1, encodes the parallel variant. 37 36 35 34 33 32 27 26 20 19 13 12 0 - 1...
  • Page 371: Minimum/Maximum And Parallel Compare

    4.6.5 Minimum/Maximum and Parallel Compare There are two groups of Minimum/Maximum instructions. The first group, in major op 0, encodes the full register variants. The second group, in major op 1, encodes the parallel variants. The parallel compare instructions are all encoded in major op 1. 37 36 35 34 33 32 27 26 20 19...
  • Page 372: Merge And Logical

    4.6.6 Merge and Logical 37 36 34 33 32 27 26 20 19 13 12 0 - 1 Extension Instruction Operands Opcode fmerge.s fmerge.ns fmerge.se fmix.lr fmix.r fmix.l fsxt.r fsxt.l fpack fswap fswap.nl fswap.nr fand fandcm fxor fpmerge.s fpmerge.ns fpmerge.se 4.6.7 Conversion 4.6.7.1...
  • Page 373: Status Field Manipulation

    4.6.7.2 Convert Fixed-point to Floating-point 37 36 34 33 32 27 26 20 19 13 12 Extension Instruction Operands Opcode fcvt.xf 4.6.8 Status Field Manipulation 4.6.8.1 Floating-point Set Controls 37 36 35 34 33 32 27 26 20 19 13 12 sf x omask amask...
  • Page 374: Miscellaneous F-Unit Instructions

    4.6.9 Miscellaneous F-Unit Instructions 4.6.9.1 Break (F-Unit) 37 36 35 34 33 32 27 26 25 Extension Instruction Operands Opcode break.f 4.6.9.2 Nop/Hint (F-Unit) F-unit nop and hint instructions are encoded within major opcode 0 using a 3-bit opcode extension field in bits 35:33 (x ), a 6-bit opcode extension field in bits 32:27 ), and a 1-bit opcode extension field in bit 26 (y), as shown in Table...
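The bit ranges named for the F-unit nop and hint encoding can be pulled apart as follows. This is a sketch under stated assumptions: the slot is a 41-bit integer, the major opcode sits in bits 40:37 (as the opcode-extension tables in this chapter label it), and the dictionary keys `ext3`, `ext6`, and `y` are hypothetical names chosen for illustration. The field values that distinguish nop from hint are given in the table the text refers to and are not assumed here.

```python
def bits(slot: int, hi: int, lo: int) -> int:
    """Return slot[hi:lo] (inclusive bit range) as an unsigned integer."""
    return (slot >> lo) & ((1 << (hi - lo + 1)) - 1)


def f_misc_fields(slot: int) -> dict:
    """Split a slot into the fields used by the F-unit nop/hint encoding."""
    return {
        "opcode": bits(slot, 40, 37),  # major opcode
        "ext3": bits(slot, 35, 33),    # 3-bit opcode extension
        "ext6": bits(slot, 32, 27),    # 6-bit opcode extension
        "y": bits(slot, 26, 26),       # 1-bit opcode extension
    }
```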
  • Page 375: Move Long Immediate

    Table 4-69. Misc X-Unit 3-bit Opcode Extensions Opcode Bits 40:37 Bits 35:33 6-bit Ext (Table 4-70) Table 4-70. Misc X-Unit 6-bit Opcode Extensions Opcode Bits Bits Bits 32:31 Bits 40:37 35:33 30:27 break.x 1-bit Ext (Table 4-73) 4.7.1.1 Break (X-Unit) 37 36 35 33 32 27 26 25...
  • Page 376: Long Branch Types

    Table 4-71. Move Long 1-bit Opcode Extensions Opcode Bits 40:37 Bit 20 movl 37 3635 2726 22 21 2019 1312 0 40 Extension Instruction Operands Opcode movl = imm 4.7.3 Long Branches Long branches are executed by a B-unit. Opcode C is used for long branch and opcode D for long call.
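The opcode assignment for long branches lends itself to a small lookup. The sketch below assumes the slot is a 41-bit integer with the major opcode in bits 40:37; the function name `long_branch_kind` and the returned strings are hypothetical.

```python
# Major opcodes named in the text: C for long branch, D for long call.
LONG_BRANCH_OPCODES = {0xC: "long branch", 0xD: "long call"}


def long_branch_kind(slot: int):
    """Return the long-branch kind for a slot, or None for other opcodes."""
    major = (slot >> 37) & 0xF  # bits 40:37
    return LONG_BRANCH_OPCODES.get(major)
```

Returning `None` for every other major opcode keeps the helper usable as a cheap pre-filter before a fuller decode.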
  • Page 377: Immediate Formation

    4.7.3.2 Long Call 37 36 35 34 33 32 13 12 11 0 40 2 1 0 i d wh Extension Instruction Operands Opcode See Table 4-51 See Table 4-52 See Table 4-54 brl.call.bwh.ph.dh = target on page 3:351 on page 3:352 on page 3:352 4.7.4 Nop/Hint (X-Unit)
  • Page 378: Immediate Formation

    Table 4-74. Immediate Formation (Continued) Instruction Immediate Formation Format mbtype = (mbt == 0) ? @brcst : (mbt == 8) ? @mix : (mbt == 9) ? @shuf : (mbt == 0xA) ? @alt : (mbt == 0xB) ? @rev : reservedQP mhtype = mht count...
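The mbtype formation rule is a chain of equality tests, which can be written as a table lookup. This sketch reads the 0xA case as an equality test like the others; the function name `mbtype` and the use of plain strings for the `@`-prefixed completers and for `reservedQP` are illustrative choices, not part of the manual.

```python
# mbt values and completers as listed in the immediate formation rule.
MBTYPE = {
    0x0: "@brcst",
    0x8: "@mix",
    0x9: "@shuf",
    0xA: "@alt",
    0xB: "@rev",
}


def mbtype(mbt: int) -> str:
    """Map the mbt field to its completer; other values are reservedQP."""
    return MBTYPE.get(mbt, "reservedQP")
```

Per the table's footnote, a reservedQP encoding causes an Illegal Operation fault if the value of the qualifying predicate is 1, so a decoder built on this lookup would still need the predicate value before reporting a fault.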
  • Page 379 a. This encoding causes an Illegal Operation fault if the value of the qualifying predicate is 1.
  • Page 380: Resource And Dependency Semantics

    Resource and Dependency Semantics Reading and Writing Resources An Itanium instruction is said to be a reader of a resource if the instruction’s qualifying predicate is 1 or it has no qualifying predicate or is one of the instructions that reads a resource even when its qualifying predicate is 0, and the execution of the instruction depends on that resource.
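The definition of a reader combines a predicate condition with a dependence condition, and can be stated as a boolean expression. The following is a sketch only: the `Insn` record and its field names are hypothetical, and real instructions would derive these properties from the tables in this chapter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    """Hypothetical summary of one instruction for dependency analysis."""
    has_qp: bool                 # instruction has a qualifying predicate
    qp_value: int                # value of that predicate (ignored if has_qp is False)
    reads_when_qp_zero: bool     # reads resources even when its predicate is 0
    depends_on: frozenset        # resources the execution depends on


def is_reader(insn: Insn, resource: str) -> bool:
    """Apply the reader definition to one instruction and one resource."""
    predicate_ok = (
        (insn.has_qp and insn.qp_value == 1)
        or not insn.has_qp
        or insn.reads_when_qp_zero
    )
    return predicate_ok and resource in insn.depends_on
```

An instruction that is predicated off and is not one of the special cases is therefore never a reader, whatever resources it names.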
  • Page 381: Resource And Dependency Table Format Notes

    RAW and WAW dependencies are generally not allowed without some type of serialization event (an implied, data, or instruction serialization) after the first writing instruction. (See Section 3.2, “Serialization” on page 2:17 for details on serialization.) The tables and associated rules in this appendix provide a comprehensive list of readers and writers of resources and describe the serialization required for the dependency to be observed and possible outcomes if the required serialization is not met.
  • Page 382: Semantics Of Dependency Codes

    may expand to contain other classes, and that when fully expanded, a set of classes (e.g., the readers of some resource) may contain the same instruction multiple times. • The syntax ‘x\y’ where x and y are both instruction classes, indicates an unnamed instruction class that includes all instructions in instruction class x but that are not in instruction class y.
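Class expansion and the ‘x\y’ syntax can be modeled with sets. The sketch below uses a handful of class definitions taken from Table 5-5 (`ld-c`, `ld-c-nc`, `ld-c-clr`, `ld-c-clr-acq`); the dictionary layout and the function names `expand` and `difference` are hypothetical. Using a set means an instruction reached through several classes is counted once.

```python
# A small excerpt of Table 5-5: class name -> member classes or instructions.
CLASSES = {
    "ld-c": ["ld-c-nc", "ld-c-clr"],
    "ld-c-nc": ["ld1.c.nc", "ld2.c.nc", "ld4.c.nc", "ld8.c.nc"],
    "ld-c-clr": ["ld1.c.clr", "ld2.c.clr", "ld4.c.clr", "ld8.c.clr", "ld-c-clr-acq"],
    "ld-c-clr-acq": ["ld1.c.clr.acq", "ld2.c.clr.acq", "ld4.c.clr.acq", "ld8.c.clr.acq"],
}


def expand(name: str) -> set:
    """Fully expand a class into its instructions; a non-class name is itself."""
    members = CLASSES.get(name)
    if members is None:
        return {name}
    result = set()
    for member in members:
        result |= expand(member)
    return result


def difference(x: str, y: str) -> set:
    """Model the 'x\\y' syntax: instructions in class x that are not in class y."""
    return expand(x) - expand(y)
```

With this excerpt, `difference("ld-c", "ld-c-clr")` yields exactly the four `ld-c-nc` instructions.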
  • Page 383: Special Case Instruction Rules

    Table 5-1. Semantics of Dependency Codes (Continued) Semantics of Serialization Type Required Effects of Serialization Violation Dependency Code impliedF Instruction Group Break (same as above). An undefined value is returned, or an Illegal Operation fault may be taken. If no fault is taken, stop Stop.
  • Page 384: Raw Dependencies Organized By Resource

    • A list of all architecturally-defined, independently-writable resources in the Itanium architecture. Each row represents an ‘atomic’ resource. Thus, for each row in the table, hardware will probably require a separate write-enable control signal. • For each resource, a complete list of readers and writers. •...
  • Page 385 Table 5-2. RAW Dependencies Organized by Resource (Continued) Semantics of Resource Name Writers Readers Dependency AR[ITC] mov-to-AR-ITC br.ia, mov-from-AR-ITC impliedF AR[K%], mov-to-AR-K br.ia, mov-from-AR-K impliedF % in 0 - 7 AR[LC] mod-sched-brs-counted, br.ia, mod-sched-brs-counted, impliedF mov-to-AR-LC mov-from-AR-LC AR[PFS] br.call, brl.call alloc, br.ia, br.ret, epc, impliedF mov-from-AR-PFS...
  • Page 386 Table 5-2. RAW Dependencies Organized by Resource (Continued) Semantics of Resource Name Writers Readers Dependency CR[EOI] mov-to-CR-EOI none SC Section 5.8.3.4, “End of External Interrupt Register (EOI – CR67)” on page 2:124 CR[IFA] mov-to-CR-IFA itc.i, itc.d, itr.i, itr.d implied mov-from-CR-IFA data CR[IFS] mov-to-CR-IFS...
  • Page 387 Table 5-2. RAW Dependencies Organized by Resource (Continued) Semantics of Resource Name Writers Readers Dependency CR[TPR] mov-to-CR-TPR mov-from-CR-TPR, data mov-from-CR-IVR mov-to-PSR-l , ssm SC Section 5.8.3.3, “Task Priority Register (TPR – CR66)” page 2:123 implied CR%, none mov-from-CR-rv none % in 3, 5-7, 10-15, 18, 28-63, 75-79, 82-127 DBR# mov-to-IND-DBR...
  • Page 388 Table 5-2. RAW Dependencies Organized by Resource (Continued) Semantics of Resource Name Writers Readers Dependency itr.i itr.i, itc.i, ptc.g, ptc.ga, ptc.l, ptr.i impliedF epc, vmsw instr ptr.i itc.i, itr.i impliedF ptc.g, ptc.ga, ptc.l, ptr.i none epc, vmsw instr memory mem-writers mem-readers none mem-readers, mem-writers,...
  • Page 389 Table 5-2. RAW Dependencies Organized by Resource (Continued) Semantics of Resource Name Writers Readers Dependency PSR.ac user-mask-writers-partial mem-readers, mem-writers implied mov-to-PSR-um sys-mask-writers-partial mem-readers, mem-writers data mov-to-PSR-l user-mask-writers-partial mov-from-PSR, impliedF mov-to-PSR-um, mov-from-PSR-um sys-mask-writers-partial mov-to-PSR-l mem-readers, mem-writers, impliedF mov-from-PSR, mov-from-PSR-um PSR.be user-mask-writers-partial mem-readers, mem-writers implied mov-to-PSR-um...
  • Page 390 Table 5-2. RAW Dependencies Organized by Resource (Continued) Semantics of Resource Name Writers Readers Dependency PSR.dfh sys-mask-writers-partial fr-readers , fr-writers data mov-to-PSR-l mov-from-PSR impliedF fr-readers , fr-writers impliedF mov-from-PSR PSR.dfl sys-mask-writers-partial fr-writers , fr-readers data mov-to-PSR-l mov-from-PSR impliedF fr-writers , fr-readers impliedF mov-from-PSR PSR.di...
  • Page 391 Table 5-2. RAW Dependencies Organized by Resource (Continued) Semantics of Resource Name Writers Readers Dependency PSR.pk sys-mask-writers-partial lfetch-all, mem-readers, data mov-to-PSR-l mem-writers, probe-all mov-from-PSR impliedF lfetch-all, mem-readers, impliedF mem-writers, mov-from-PSR, probe-all PSR.pp sys-mask-writers-partial mov-from-PSR impliedF mov-to-PSR-l, rfi PSR.ri none PSR.rt mov-to-PSR-l mov-from-PSR impliedF...
  • Page 392: Waw Dependency Table

    5.3.3 WAW Dependency Table General rules specific to the WAW table: • All resources require at most an instruction group break to provide sequential behavior. • Some resources require no instruction group break to provide sequential behavior. • There are a few special cases that are described in greater detail elsewhere in the manual and are indicated with an SC (special case) result.
  • Page 393 Table 5-3. WAW Dependencies Organized by Resource (Continued) Semantics of Resource Name Writers Dependency AR[K%], mov-to-AR-K impliedF % in 0 - 7 AR[LC] mod-sched-brs-counted, mov-to-AR-LC impliedF AR[PFS] br.call, brl.call none br.call, brl.call mov-to-AR-PFS impliedF AR[RNAT] alloc, flushrs, loadrs, impliedF mov-to-AR-RNAT, mov-to-AR-BSPSTORE AR[RSC] mov-to-AR-RSC...
  • Page 394 Table 5-3. WAW Dependencies Organized by Resource (Continued) Semantics of Resource Name Writers Dependency CR[ITV] mov-to-CR-ITV impliedF CR[IVA] mov-to-CR-IVA impliedF CR[IVR] none CR[LID] mov-to-CR-LID CR[LRR%], mov-to-CR-LRR impliedF % in 0 - 1 CR[PMV] mov-to-CR-PMV impliedF CR[PTA] mov-to-CR-PTA impliedF CR[TPR] mov-to-CR-TPR impliedF CR%, none...
  • Page 395 Table 5-3. WAW Dependencies Organized by Resource (Continued) Semantics of Resource Name Writers Dependency PR%, pr-and-writers none % in 1 - 15 pr-or-writers none pr-unc-writers-fp pr-unc-writers-fp impliedF pr-unc-writers-int pr-unc-writers-int pr-norm-writers-fp pr-norm-writers-fp pr-norm-writers-int pr-norm-writers-int pr-and-writers pr-or-writers mov-to-PR-allreg mov-to-PR-allreg PR%, pr-and-writers none % in 16 - 62 pr-or-writers none...
  • Page 396: War Dependency Table

    Table 5-3. WAW Dependencies Organized by Resource (Continued) Semantics of Resource Name Writers Dependency PSR.mfh fr-writers none user-mask-writers-partial user-mask-writers-partial impliedF mov-to-PSR-um, fr-writers mov-to-PSR-um, sys-mask-writers-partial sys-mask-writers-partial mov-to-PSR-l, rfi mov-to-PSR-l, rfi PSR.mfl fr-writers none user-mask-writers-partial user-mask-writers-partial impliedF mov-to-PSR-um, fr-writers mov-to-PSR-um, sys-mask-writers-partial sys-mask-writers-partial mov-to-PSR-l, rfi mov-to-PSR-l, rfi PSR.pk...
  • Page 397 Rule 2. These instructions only read CFM when they access a rotating GR, FR, or PR. mov-to-PR and mov-from-PR only access CFM when their qualifying predicate is in the rotating region. Rule 3. These instructions use a general register value to determine the specific indirect register accessed.
  • Page 398: Instruction Classes

    Support Tables Table 5-5. Instruction Classes Class Events/Instructions predicatable-instructions, unpredicatable-instructions branches indirect-brs, ip-rel-brs cfm-readers fr-readers, fr-writers, gr-readers, gr-writers, mod-sched-brs, predicatable-instructions, pr-writers, alloc, br.call, brl.call, br.ret, cover, loadrs, rfi, chk-a, invala.e chk-a chk.a.clr, chk.a.nc cmpxchg cmpxchg1, cmpxchg2, cmpxchg4, cmpxchg8, cmp8xchg16 czx1, czx2 fcmp-s0 fcmp[Field(sf)==s0] fcmp-s1...
  • Page 399 Table 5-5. Instruction Classes (Continued) Class Events/Instructions ld-all-postinc ld[Format in {M2 M3}], ldfp[Format in {M12}], ldf[Format in {M7 M8}] ld-c ld-c-nc, ld-c-clr ld-c-clr ld1.c.clr, ld2.c.clr, ld4.c.clr, ld8.c.clr, ld-c-clr-acq ld-c-clr-acq ld1.c.clr.acq, ld2.c.clr.acq, ld4.c.clr.acq, ld8.c.clr.acq ld-c-nc ld1.c.nc, ld2.c.nc, ld4.c.nc, ld8.c.nc ld-s ld1.s, ld2.s, ld4.s, ld8.s ld-sa ld1.sa, ld2.sa, ld4.sa, ld8.sa ldfs, ldfd, ldfe, ldf8, ldf.fill...
  • Page 400 Table 5-5. Instruction Classes (Continued) Class Events/Instructions mov-from-AR-FDR mov-from-AR-M[Field(ar3) == FDR] mov-from-AR-FIR mov-from-AR-M[Field(ar3) == FIR] mov-from-AR-FPSR mov-from-AR-M[Field(ar3) == FPSR] mov-from-AR-FSR mov-from-AR-M[Field(ar3) == FSR] mov-from-AR-I mov_ar[Format in {I28}] mov-from-AR-ig mov-from-AR-IM[Field(ar3) in {48-63 112-127}] mov-from-AR-IM mov_ar[Format in {I28 M31}] mov-from-AR-ITC mov-from-AR-M[Field(ar3) == ITC] mov-from-AR-K mov-from-AR-M[Field(ar3) in {K0 K1 K2 K3 K4 K5 K6 K7}] mov-from-AR-LC...
  • Page 401 Table 5-5. Instruction Classes (Continued) Class Events/Instructions mov-from-IND mov_indirect[Format in {M43}] mov-from-IND-CPUID mov-from-IND[Field(ireg) == cpuid] mov-from-IND-DBR mov-from-IND[Field(ireg) == dbr] mov-from-IND-IBR mov-from-IND[Field(ireg) == ibr] mov-from-IND-PKR mov-from-IND[Field(ireg) == pkr] mov-from-IND-PMC mov-from-IND[Field(ireg) == pmc] mov-from-IND-PMD mov-from-IND[Field(ireg) == pmd] mov-from-IND-priv mov-from-IND[Field(ireg) in {dbr ibr pkr pmc rr}] mov-from-IND-RR mov-from-IND[Field(ireg) == rr] mov-from-interruption-CR...
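The `class[Field(...) == value]` and `class[Field(...) in {...}]` notation selects the members of a class whose field satisfies a condition. As a sketch, the mov-from-IND rows above can be turned into a classifier keyed on the `ireg` field; the function name `mov_from_ind_classes` and the list-of-strings return type are hypothetical.

```python
# Per-register classes listed in the table: mov-from-IND[Field(ireg) == X].
SPECIFIC = {
    "cpuid": "mov-from-IND-CPUID",
    "dbr": "mov-from-IND-DBR",
    "ibr": "mov-from-IND-IBR",
    "pkr": "mov-from-IND-PKR",
    "pmc": "mov-from-IND-PMC",
    "pmd": "mov-from-IND-PMD",
    "rr": "mov-from-IND-RR",
}

# mov-from-IND-priv: mov-from-IND[Field(ireg) in {dbr ibr pkr pmc rr}].
PRIV_IREGS = {"dbr", "ibr", "pkr", "pmc", "rr"}


def mov_from_ind_classes(ireg: str) -> list:
    """List the mov-from-IND classes an instruction with this ireg belongs to."""
    classes = ["mov-from-IND"]
    if ireg in SPECIFIC:
        classes.append(SPECIFIC[ireg])
    if ireg in PRIV_IREGS:
        classes.append("mov-from-IND-priv")
    return classes
```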
  • Page 402 Table 5-5. Instruction Classes (Continued) Class Events/Instructions mov-to-CR-DCR mov-to-CR[Field(cr3) == DCR] mov-to-CR-EOI mov-to-CR[Field(cr3) == EOI] mov-to-CR-IFA mov-to-CR[Field(cr3) == IFA] mov-to-CR-IFS mov-to-CR[Field(cr3) == IFS] mov-to-CR-IHA mov-to-CR[Field(cr3) == IHA] mov-to-CR-IIB mov-to-CR[Field(cr3) in {IIB0 IIB1}] mov-to-CR-IIM mov-to-CR[Field(cr3) == IIM] mov-to-CR-IIP mov-to-CR[Field(cr3) == IIP] mov-to-CR-IIPA mov-to-CR[Field(cr3) == IIPA] mov-to-CR-IPSR...
  • Page 403 Table 5-5. Instruction Classes (Continued) Class Events/Instructions pavgsub pavgsub1, pavgsub2 pcmp pcmp1, pcmp2, pcmp4 pmax pmax1, pmax2 pmin pmin1, pmin2 pmpy pmpy2 pmpyshr pmpyshr2 pr-and-writers pr-gen-writers-int[Field(ctype) in {and andcm}], pr-gen-writers-int[Field(ctype) in {or.andcm and.orcm}] pr-gen-writers-fp fclass, fcmp pr-gen-writers-int cmp, cmp4, tbit, tf, tnat pr-norm-writers-fp pr-gen-writers-fp[Field(ctype)==] pr-norm-writers-int...
  • Page 404 Table 5-5. Instruction Classes (Continued) Class Events/Instructions rse-readers alloc, br.call, br.ia, br.ret, brl.call, cover, flushrs, loadrs, mov-from-AR-BSP, mov-from-AR-BSPSTORE, mov-to-AR-BSPSTORE, mov-from-AR-RNAT, mov-to-AR-RNAT, rfi rse-writers alloc, br.call, br.ia, br.ret, brl.call, cover, flushrs, loadrs, mov-to-AR-BSPSTORE, rfi st1, st2, st4, st8, st8.spill, st16 st-postinc stf[Format in {M10}], st[Format in {M5}] stfs, stfd, stfe, stf8, stf.spill sxt1, sxt2, sxt4...
  • Page 406 Index 3:397 Intel® Itanium Architecture Software Developer’s Manual, Rev. 2.3...
  • Page 408 INDEX FOR VOLUMES 1, 2, 3 AND 4 Stores Register) 1:30 BSR Instruction 4:37 AAA Instruction 4:21 bsw Instruction 3:34 AAD Instruction 4:22 BSWAP Instruction 4:39 AAM Instruction 4:23 BT Instruction 4:40 AAS Instruction 4:24 BTC Instruction 4:42 Aborts 2:95, 2:538 BTR Instruction 4:44 ACPI 2:631 BTS Instruction 4:46...
  • Page 409 INDEX 1:155, 2:579 External Interrupt 2:96, 2:538 Control Speculative Load 1:156 External Interrupt Control Registers (CR64-81) Corrected Error 2:350 2:42 Corrected Machine Check Vector (CMCV) 2:126 External Interrupt Request Registers (IRR0-3) cover Instruction 3:48 2:125 CPUID (Processor Identification Register) 1:34 External Interrupt Vector Register (IVR) 2:123 CPUID Instruction 4:78 External Task Priority Cycle (XTP) 2:130...
  • Page 410 INDEX FICOM Instruction 4:128 fpabs Instruction 3:95 FICOMP Instruction 4:128 fpack Instruction 3:96 FIDIV Instruction 4:121 fpamax Instruction 3:97 FIDIVR Instruction 4:124 fpamin Instruction 3:99 FILD Instruction 4:130 FPATAN Instruction 4:149 FIMUL Instruction 4:145 fpcmp Instruction 3:101 FINCSTP Instruction 4:132 fpcvt Instruction 3:104 Firmware 1:7, 2:623 fpma Instruction 3:107...
  • Page 411 Instruction Set Transition 1:14 IA-32 Instruction Reference 4:11 Instruction Set Transitions 2:239, 2:596 IA-32 Instruction Set 2:253 Instruction Slot Mapping 1:38 IA-32 Intel® MMX™ Technology 1:129 Instruction Slots 1:38 IA-32 Intercept INSW Instruction 4:214 Gate Intercept Trap 2:235 INT (External Interrupt) 2:96...
  • Page 412 INDEX INTA (Interrupt Acknowledge) 2:130 INTn Instruction 4:217 Inter-processor Interrupt (IPI) 2:127 INTO Instruction 4:217 Interrupt Acknowledge Cycle 2:130 invala Instruction 3:146 Interruption Control Registers (CR16-27) 2:36 INVD instructions 4:228 Interruption Handler 2:537 INVLPG Instruction 4:230 Interruption Handling 2:543 IP (Instruction Pointer) 1:27, 1:140 Interruption Hash Address 2:41 IPI (Inter-processor Interrupt) 2:127 Interruption Instruction Bundle Registers (IIB0-1)
  • Page 413 INDEX LGS Instruction 4:255 MOVAPS Instruction 4:527 LIDT Instruction 4:264 MOVD Instruction 4:401 LLDT Instruction 4:267 MOVHLPS Instruction 4:529 LMSW Instruction 4:270 MOVHPS Instruction 4:530 Load Instructions 1:58 movl Instruction 3:187 loadrs Instruction 3:167 MOVLHPS Instruction 4:532 Loads from Memory 1:147 MOVLPS Instruction 4:533 Local Redirection Registers (LRR0-1) 2:126 MOVMSKPS Instruction 4:535...
  • Page 414 INDEX Illegal Dependency Fault 2:584 PAL_CACHE_READ 2:380 Long Branch Emulation 2:585 PAL_CACHE_SHARED_INFO 2:382 Multiple Address Spaces 1:20, 2:562 PAL_CACHE_SUMMARY 2:384 OS_BOOT Entrypoint 2:283 PAL_CACHE_WRITE 2:385 OS_INIT Entrypoint 2:283 PAL_COPY_INFO 2:388 OS_MCA Entrypoint 2:283 PAL_COPY_PAL 2:389 OS_RENDEZ Entrypoint 2:283 PAL_DEBUG_INFO 2:390 Performance Monitoring Support 2:620 PAL_FIXED_ADDR 2:391 Single Address Space 1:20, 2:565...
  • Page 415 INDEX PAL_VPS_RESUME_HANDLER 2:492 PMULHUW Instruction 4:572 PAL_VPS_RESUME_NORMAL 2:489 PMULHW Instruction 4:431 PAL_VPS_SAVE 2:500 PMULLW Instruction 4:433 PAL_VPS_SET_PENDING_INTERRUPT 2:495 PMV (Performance Monitoring Vector) 2:126 PAL_VPS_SYNC_READ 2:493 POP Instruction 4:311 PAL_VPS_SYNC_WRITE 2:494 POPA Instruction 4:315 PAL_VPS_THASH 2:497 POPAD Instruction 4:315 PAL_VPS_TTAG 2:498 popcnt Instruction 3:216 PAL-based Interruptions 2:95, 2:537 POPF Instruction 4:317...
  • Page 416 INDEX PSUBD Instruction 4:446 Resource Utilization Counter (RUC) 1:31, 2:33 PSUBSB Instruction 4:449 RET Instruction 4:340 PSUBSW Instruction 4:449 rfi Instruction 2:543, 3:236 PSUBUSB Instruction 4:452 RID (Region Identifier) 2:561 PSUBUSW Instruction 4:452 RNAT(RSE NaT Collection Register) 1:30 PSUBW Instruction 4:446 ROL Instruction 4:327 PTA (Page Table Address Register) 2:35 ROR Instruction 4:327...
  • Page 417 INDEX SIDT Instruction 4:359 Template Field Encoding 1:38 Single Step Trap 2:151 Templates 1:141 SLDT Instruction 4:367 TEST Instruction 4:381 SMSW Instruction 4:369 tf Instruction 3:263 Software Pipelining 1:19, 1:75, 1:145, 1:181 thash Instruction 3:265 Speculation 1:16, 1:142, 1:151 TLB (Translation Lookaside Buffer) 2:47, 2:565 Control Speculation 1:16 tnat Instruction 3:266 Data Speculation 1:17...
  • Page 418 INDEX WAIT Instruction 4:386 WAR Dependency 1:149 WAW Dependency 1:149 WBINVD Instruction 4:387 Write-after-read Dependency 1:149 Write-after-write Dependency 1:149 WRMSR Instruction 4:389 XADD Instruction 4:391 XCHG Instruction 4:393 xchg Instruction 2:508, 3:274 XLAT Instruction 4:395 XLATB Instruction 4:395 xma Instruction 3:276 xmpy Instruction 3:278 XOR Instruction 4:397 xor Instruction 3:279...

This manual is also suitable for:

Itanium 9150m
