Intel ITANIUM ARCHITECTURE - SOFTWARE DEVELOPERS MANUAL VOLUME 4 REV 2.3 Manual

Summary of Contents for Intel ITANIUM ARCHITECTURE - SOFTWARE DEVELOPERS MANUAL VOLUME 4 REV 2.3

  • Page 2 Intel® Itanium® Architecture Software Developer’s Manual, Volume 4: IA-32 Instruction Set Reference, Revision 2.3, May 2010, Document Number: 323208...
  • Page 3 Contact your local Intel sales office or your distributor to obtain the latest specifications and before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or by visiting Intel's website at http://www.intel.com.
  • Page 4: Table Of Contents

    Part 1: Application Architecture Guide, 4:1; 1.1.2 Part 2: Optimization Guide for the Intel® Itanium® Architecture, 4:1; Overview of Volume 2: System Architecture.
  • Page 5 Feature Flags Returned in EDX Register, 4:80. Intel® Itanium® Architecture Software Developer’s Manual, Rev. 2.3...
  • Page 6 Key to SSE Naming Convention, 4:485.
  • Page 8: About This Manual

    IA-32 application interface. This volume also describes optimization techniques used to generate high-performance software. 1.1.1 Part 1: Application Architecture Guide. Chapter 1, “About this Manual,” provides an overview of all volumes in the Intel® Itanium® Architecture Software Developer’s Manual...
  • Page 9: Overview Of Volume 2: System Architecture

    1.2.1 Part 1: System Architecture Guide. Chapter 1, “About this Manual,” provides an overview of all volumes in the Intel® Itanium® Architecture Software Developer’s Manual...
  • Page 10: Part 2: System Programmer's Guide

    Chapter 9, “IA-32 Interruption Vector Descriptions,” lists IA-32 exceptions, interrupts and intercepts that can occur during IA-32 instruction set execution in the Itanium System Environment. Chapter 10, “Itanium Architecture-based Operating System Interaction Model with IA-32 Applications,” defines the operation of IA-32 instructions within the Itanium System Environment from the perspective of an Itanium architecture-based operating system.
  • Page 11: Appendices

    Instruction Set Reference: This volume is a comprehensive reference to the Itanium instruction set, including instruction format/encoding. Chapter 1, “About this Manual,” provides an overview of all volumes in the Intel® Itanium® Architecture Software Developer’s Manual. Chapter 2, “Instruction Reference”...
  • Page 12: Related Documents

    These resources include instructions and registers. Itanium Architecture – The new ISA with 64-bit instruction capabilities, new performance-enhancing features, and support for the IA-32 instruction set. IA-32 Architecture – The 32-bit and 16-bit Intel architecture as described in the Intel® 64 and IA-32 Architectures Software Developer’s Manual.
  • Page 13: Revision History

    Itanium architecture. • Intel® 64 and IA-32 Architectures Software Developer’s Manual – This set of manuals describes the Intel 32-bit architecture. They are available from the Intel Literature Department by calling 1-800-548-4725 and requesting Document Numbers 243190, 243191, and 243192...
  • Page 14 Date of Revision Description Revision Number August 2005 Allowed register fields in the CR.LID register to be read-only and made CR.LID checking on interruption messages by processors optional. See Vol 2, Part I, Ch 5, “Interruptions,” and Section 11.2.2, PALE_RESET Exit State, for details. Relaxed checking of reserved and ignored fields in IA-32 application registers in Vol 1, Ch 6 and Vol 2, Part I, Ch 10.
  • Page 15 Date of Revision Description Revision Number August 2002 Added Predicate Behavior of alloc Instruction Clarification (Section 4.1.2, Part I, Volume 1; Section 2.2, Part I, Volume 3). Added New fc.i Instruction (Section 4.4.6.1, and 4.4.6.2, Part I, Volume 1; Section 4.3.3, 4.4.1, 4.4.5, 4.4.6, 4.4.7, 5.5.2, and 7.1.2, Part I, Volume 2; Section 2.5, 2.5.1, 2.5.2, 2.5.3, and 4.5.2.1, Part II, Volume 2;...
  • Page 16 Date of Revision Description Revision Number Volume 2: Class pr-writers-int clarification (Table A-5). PAL_MC_DRAIN clarification (Section 4.4.6.1). VHPT walk and forward progress change (Section 4.1.1.2). IA-32 IBR/DBR match clarification (Section 7.1.1). ISR figure changes (pp. 8-5, 8-26, 8-33 and 8-36). PAL_CACHE_FLUSH return argument change –...
  • Page 17 Date of Revision Description Revision Number Volume 2: Clarifications regarding “reserved” fields in ITIR (Chapter 3). Instruction and Data translation must be enabled for executing IA-32 instructions (Chapters 3, 4, and 10). FCR/FDR mappings, and clarification to the value of PSR.ri after an RFI (Chapters 3 and 4).
  • Page 18: Base Ia-32 Instruction Reference

    The following fault behavior is defined for all IA-32 instructions in the Itanium System Environment: • IA-32 Faults – All IA-32 faults are performed as defined in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, unless otherwise noted.
  • Page 19 It also explains the notational conventions and abbreviations used in these sections. 2.2.1 IA-32 Instruction Format The following is an example of the format used for each Intel architecture instruction description in this chapter. 2.2.1.0.0.1 CMC—Complement Carry Flag...
  • Page 20: Register Encodings Associated With The +Rb, +Rw, And +Rd Nomenclature

    2.2.1.1 Opcode Column The “Opcode” column gives the complete object code produced for each form of the instruction. When possible, the codes are given as hexadecimal bytes, in the same order in which they appear in memory. Definitions of entries other than hexadecimal bytes are as follows: •...
  • Page 21 • ptr16:16 and ptr16:32 – A far pointer, typically in a code segment different from that of the instruction. The notation 16:16 indicates that the value of the pointer has two parts. The value to the left of the colon is a 16-bit selector or value destined for the code segment register.
  • Page 22: Operation

    memory addressing modes are allowed. The m16&16 and m32&32 operands are used by the BOUND instruction to provide an operand containing the upper and lower bounds for array indices. The m16&32 operand is used by LIDT and LGDT to provide a word with which to load the limit field, and a doubleword with which to load the base field, of the corresponding GDTR and IDTR registers.
  • Page 23 • StackAddrSize – Represents the stack address-size attribute associated with the instruction, which has a value of 16 or 32 bits (see “Address-Size Attribute for Stack” in Chapter 4 of the Intel Architecture Software Developer’s Manual, Volume …). • SRC – Represents the source operand.
  • Page 24 • SignExtend(value) – Returns a value sign-extended to the operand-size attribute of the instruction. For example, if the operand-size attribute is 32, sign extending a byte containing the value -10 converts the byte from F6H to a doubleword value of FFFFFFF6H.
  • Page 25: Flags Affected

    When a flag is cleared, it is equal to 0; when it is set, it is equal to 1. The arithmetic and logical instructions usually assign values to the status flags in a uniform manner (see Appendix A, EFLAGS Cross-Reference, in the Intel Architecture Software Developer’s Manual, Volume 1). Non-conventional assignments are described in the “Operation”...
  • Page 26: Protected Mode Exceptions

    See Chapter 5, Interrupt and Exception Handling, in the Intel Architecture Software Developer’s Manual, Volume 3, for a detailed description of the exceptions. Application programmers should consult the documentation provided with their operating systems to determine the actions taken when exceptions occur.
  • Page 27: Floating-Point Exceptions

    ... FPU numeric overflow; Floating-point numeric underflow | FPU numeric underflow; Floating-point inexact result (precision) | Inexact result (precision). IA-32 Base Instruction Reference: The remainder of this chapter provides detailed descriptions of each of the Intel architecture instructions. 4:20 Volume 4: Base IA-32 Instruction Reference...
  • Page 28 AAA—ASCII Adjust After Addition Opcode Instruction Description ASCII adjust AL after addition Description Adjusts the sum of two unpacked BCD values to create an unpacked BCD result. The AL register is the implied source and destination operand for this instruction. The AAA instruction is only useful when it follows an ADD instruction that adds (binary addition) two unpacked BCD values and stores a byte result in the AL register.
  • Page 29 AAD—ASCII Adjust AX Before Division Opcode Instruction Description ASCII adjust AX before division Description Adjusts two unpacked BCD digits (the least-significant digit in the AL register and the most-significant digit in the AH register) so that a division operation performed on the result will yield a correct unpacked BCD value.
  • Page 30 AAM—ASCII Adjust AX After Multiply Opcode Instruction Description ASCII adjust AX after multiply Description Adjusts the result of the multiplication of two unpacked BCD values to create a pair of unpacked BCD values. The AX register is the implied source and destination operand for this instruction.
  • Page 31 AAS—ASCII Adjust AL After Subtraction Opcode Instruction Description ASCII adjust AL after subtraction Description Adjusts the result of the subtraction of two unpacked BCD values to create an unpacked BCD result. The AL register is the implied source and destination operand for this instruction.
  • Page 32 ADC—Add with Carry. Opcode | Instruction | Description: 14 ib | ADC AL,imm8 | Add with carry imm8 to AL; 15 iw | ADC AX,imm16 | Add with carry imm16 to AX; 15 id | ADC EAX,imm32 | Add with carry imm32 to EAX; 80 /2 ib | ADC r/m8,imm8 | Add with carry imm8 to r/m8; 81 /2 iw | ADC r/m16,imm16...
  • Page 33 ADC—Add with Carry (Continued) Protected Mode Exceptions #GP(0) If the destination is located in a nonwritable segment. If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.
  • Page 34 ADD—Add. Opcode | Instruction | Description: 04 ib | ADD AL,imm8 | Add imm8 to AL; 05 iw | ADD AX,imm16 | Add imm16 to AX; 05 id | ADD EAX,imm32 | Add imm32 to EAX; 80 /0 ib | ADD r/m8,imm8 | Add imm8 to r/m8; 81 /0 iw | ADD r/m16,imm16 | Add imm16 to r/m16; 81 /0 id...
  • Page 35 ADD—Add (Continued) Protected Mode Exceptions #GP(0) If the destination is located in a nonwritable segment. If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.
  • Page 36 AND—Logical AND. Opcode | Instruction | Description: 24 ib | AND AL,imm8 | AL AND imm8; 25 iw | AND AX,imm16 | AX AND imm16; 25 id | AND EAX,imm32 | EAX AND imm32; 80 /4 ib | AND r/m8,imm8 | r/m8 AND imm8; 81 /4 iw | AND r/m16,imm16 | r/m16 AND imm16; 81 /4 id | AND r/m32,imm32 | r/m32 AND imm32...
  • Page 37 AND—Logical AND (Continued) #PF(fault-code) If a page fault occurs. #AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3. Real Address Mode Exceptions If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
  • Page 38 (The segment selector for the application program’s code segment can be read from the procedure stack following a procedure call.) See the Intel Architecture Software Developer’s Manual, Volume 3 for more information about the use of this instruction. Operation IF DEST(RPL) <...
  • Page 39 ARPL—Adjust RPL Field of Segment Selector (Continued) Additional Itanium System Environment Exceptions Itanium Reg Faults: NaT Register Consumption Abort. Itanium Mem Faults: VHPT Data Fault, Nested TLB Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault. Protected Mode Exceptions...
  • Page 40 BOUND—Check Array Index Against Bounds. Opcode | Instruction | Description: 62 /r | BOUND r16,m16&16 | Check if r16 (array index) is within bounds specified by m16&16; 62 /r | BOUND r32,m32&32 | Check if r32 (array index) is within bounds specified by m32&32. Description: Determines if the first operand (array index) is within the bounds of an array specified by the second operand (bounds operand).
  • Page 41 BOUND—Check Array Index Against Bounds (Continued) Protected Mode Exceptions #BR If the bounds test fails. #UD If the second operand is not a memory location. #GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. If the DS, ES, FS, or GS register contains a null segment selector.
  • Page 42 BSF—Bit Scan Forward. Opcode | Instruction | Description: 0F BC | BSF r16,r/m16 | Bit scan forward on r/m16; 0F BC | BSF r32,r/m32 | Bit scan forward on r/m32. Description: Searches the source operand (second operand) for the least significant set bit (1 bit). If a least significant 1 bit is found, its bit index is stored in the destination operand (first operand).
  • Page 43 BSF—Bit Scan Forward (Continued) Protected Mode Exceptions #GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. If the DS, ES, FS, or GS register contains a null segment selector. #SS(0) If a memory operand effective address is outside the SS segment limit.
  • Page 44 BSR—Bit Scan Reverse. Opcode | Instruction | Description: 0F BD | BSR r16,r/m16 | Bit scan reverse on r/m16; 0F BD | BSR r32,r/m32 | Bit scan reverse on r/m32. Description: Searches the source operand (second operand) for the most significant set bit (1 bit). If a most significant 1 bit is found, its bit index is stored in the destination operand (first operand).
  • Page 45 BSR—Bit Scan Reverse (Continued) Protected Mode Exceptions #GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. If the DS, ES, FS, or GS register contains a null segment selector. #SS(0) If a memory operand effective address is outside the SS segment limit.
  • Page 46 Exceptions (All Operating Modes) None. Intel Architecture Compatibility Information The BSWAP instruction is not supported on Intel architecture processors earlier than the Intel486™ processor family. For compatibility with this instruction, include functionally-equivalent code for execution on Intel processors earlier than the Intel486 processor family.
  • Page 47 BT—Bit Test. Opcode | Instruction | Description: 0F A3 | BT r/m16,r16 | Store selected bit in CF flag; 0F A3 | BT r/m32,r32 | Store selected bit in CF flag; 0F BA /4 ib | BT r/m16,imm8 | Store selected bit in CF flag; 0F BA /4 ib | BT r/m32,imm8 | Store selected bit in CF flag. Description...
  • Page 48 BT—Bit Test (Continued) Additional Itanium System Environment Exceptions Itanium Reg Faults: NaT Register Consumption Abort. Itanium Mem Faults: VHPT Data Fault, Nested TLB Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault. Protected Mode Exceptions #GP(0)
  • Page 49 BTC—Bit Test and Complement. Opcode | Instruction | Description: 0F BB | BTC r/m16,r16 | Store selected bit in CF flag and complement; 0F BB | BTC r/m32,r32 | Store selected bit in CF flag and complement; 0F BA /7 ib | BTC r/m16,imm8 | Store selected bit in CF flag and complement; 0F BA /7 ib | BTC r/m32,imm8 | Store selected bit in CF flag and complement...
  • Page 50 BTC—Bit Test and Complement (Continued) Protected Mode Exceptions #GP(0) If the destination operand points to a non-writable segment. If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. If the DS, ES, FS, or GS register contains a null segment selector. #SS(0) If a memory operand effective address is outside the SS segment limit.
  • Page 51 BTR—Bit Test and Reset. Opcode | Instruction | Description: 0F B3 | BTR r/m16,r16 | Store selected bit in CF flag and clear; 0F B3 | BTR r/m32,r32 | Store selected bit in CF flag and clear; 0F BA /6 ib | BTR r/m16,imm8 | Store selected bit in CF flag and clear; 0F BA /6 ib | BTR r/m32,imm8 | Store selected bit in CF flag and clear...
  • Page 52 BTR—Bit Test and Reset (Continued) Protected Mode Exceptions #GP(0) If the destination operand points to a nonwritable segment. If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. If the DS, ES, FS, or GS register contains a null segment selector. #SS(0) If a memory operand effective address is outside the SS segment limit.
  • Page 53 BTS—Bit Test and Set. Opcode | Instruction | Description: 0F AB | BTS r/m16,r16 | Store selected bit in CF flag and set; 0F AB | BTS r/m32,r32 | Store selected bit in CF flag and set; 0F BA /5 ib | BTS r/m16,imm8 | Store selected bit in CF flag and set; 0F BA /5 ib | BTS r/m32,imm8 | Store selected bit in CF flag and set...
  • Page 54 BTS—Bit Test and Set (Continued) Protected Mode Exceptions #GP(0) If the destination operand points to a nonwritable segment. If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. If the DS, ES, FS, or GS register contains a null segment selector. #SS(0) If a memory operand effective address is outside the SS segment limit.
  • Page 55 IA-32_Intercept(Gate) in Itanium System Environment. The latter two call types (inter-privilege-level call and task switch) can only be executed in protected mode. See Chapter 6 in the Intel Architecture Software Developer’s Manual, Volume 3 for information on task switching with the CALL instruction.
  • Page 56 CALL—Call Procedure (Continued) When executing a near call, the operand-size attribute determines the size of the target operand (16 or 32 bits) for absolute addresses. Absolute addresses are loaded directly into the EIP register. When a relative offset is specified, it is added to the value of the EIP register.
  • Page 57 The CALL instruction can also specify the segment selector of the TSS directly. See the Intel Architecture Software Developer’s Manual, Volume 3 for detailed information on the mechanics of a task switch.
  • Page 58 CALL—Call Procedure (Continued) Push(IP); CS <- DEST[31:16]; (* DEST is ptr16:16 or [m16:16] *) EIP <- DEST[15:0]; (* DEST is ptr16:16 or [m16:16] *) EIP <- EIP AND 0000FFFFH; (* clear upper 16 bits *) IF Itanium System Environment AND PSR.tb THEN IA_32_Exception(Debug); IF far call AND (PE = 1 AND VM = 0) (* Protected mode, not virtual 8086 mode *) THEN IF segment selector in target operand null THEN #GP(0);...
  • Page 59 CALL—Call Procedure (Continued) IF stack not large enough for return address THEN #SS(0); FI; tempEIP <- DEST(offset); IF OperandSize = 16 THEN tempEIP <- tempEIP AND 0000FFFFH; (* clear upper 16 bits *) IF tempEIP outside code segment limit THEN #GP(0); FI; IF OperandSize = 32 THEN Push(CS);...
  • Page 60 CALL—Call Procedure (Continued) TSSstackAddress <- new code segment (DPL * 4) + 2; IF (TSSstackAddress + 4) > TSS limit THEN #TS(current TSS selector); FI; newESP <- TSSstackAddress; newSS <- TSSstackAddress + 2; IF stack segment selector is null THEN #TS(stack segment selector); FI; IF stack segment selector index is not within its descriptor table limits THEN #TS(SS selector);...
  • Page 61 CALL—Call Procedure (Continued) IF EIP not within code segment limit then #GP(0); FI; CS:EIP <- CallGate(CS:EIP) (* segment descriptor information also loaded *) Push(oldCS:oldEIP); (* return address to calling procedure *) ELSE (* CallGateSize = 16 *) IF stack does not have room for parameters plus 4 bytes THEN #SS(0);...
  • Page 62 CALL—Call Procedure (Continued) THEN #GP(0); END; Flags Affected All flags are affected if a task switch occurs; no flags are affected if a task switch does not occur. Additional Itanium System Environment Exceptions Itanium Mem Faults: VHPT Data Fault, Nested TLB Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault...
  • Page 63 CALL—Call Procedure (Continued) #SS(0) If pushing the return address, parameters, or stack segment pointer onto the stack exceeds the bounds of the stack segment, when no stack switch occurs. If a memory operand effective address is outside the SS segment limit.
  • Page 64 CBW/CWDE—Convert Byte to Word/Convert Word to Doubleword. Instruction | Description: CBW | AX <- sign-extend of AL; CWDE | EAX <- sign-extend of AX. Description: Doubles the size of the source operand by means of sign extension. The CBW (convert byte to word) instruction copies the sign (bit 7) in the source operand into every bit in the AH register.
  • Page 65 CDQ—Convert Double to Quad See entry for CWD/CDQ — Convert Word to Double/Convert Double to Quad.
  • Page 66 CLC—Clear Carry Flag Opcode Instruction Description Clear CF flag Description Clears the CF flag in the EFLAGS register. Operation CF <- 0; Flags Affected The CF flag is cleared to 0. The OF, ZF, SF, AF, and PF flags are unaffected. Exceptions (All Operating Modes) None.
  • Page 67 CLD—Clear Direction Flag Opcode Instruction Description Clear DF flag Description Clears the DF flag in the EFLAGS register. When the DF flag is set to 0, string operations increment the index registers (ESI and/or EDI). Operation DF <- 0; Flags Affected The DF flag is cleared to 0.
  • Page 68 CLI—Clear Interrupt Flag Opcode Instruction Description Clear interrupt flag; interrupts disabled when interrupt flag cleared Description Clears the IF flag in the EFLAGS register. No other flags are affected. Clearing the IF flag causes the processor to ignore maskable external interrupts. The IF flag and the CLI and STI instructions have no effect on the generation of exceptions and NMI interrupts.
  • Page 69 CLI—Clear Interrupt Flag (Continued) ELSE (*CR4.PVI==0 *) IF IOPL < CPL THEN #GP(0); ELSE IF <- 0; ELSE (* Executing in Virtual-8086 mode *) IF IOPL = 3 THEN IF <- 0; ELSE IF CR4.VME = 0 THEN #GP(0); ELSE VIF <- 0; IF Itanium System Environment AND CFLG.ii AND IF != OLD_IF THEN IA-32_Intercept(System_Flag,CLI);...
  • Page 70 The processor sets the TS flag every time a task switch occurs. The flag is used to synchronize the saving of FPU context in multitasking applications. See the description of the TS flag in the Intel Architecture Software Developer’s Manual, Volume 3 for more information about this flag.
  • Page 71 CMC—Complement Carry Flag Opcode Instruction Description Complement CF flag Description Complements the CF flag in the EFLAGS register. Operation CF <- NOT CF; Flags Affected The CF flag contains the complement of its original value. The OF, ZF, SF, AF, and PF flags are unaffected.
  • Page 72 CMOVcc—Conditional Move. Opcode | Instruction | Description: 0F 47 cw/cd | CMOVA r16, r/m16 | Move if above (CF=0 and ZF=0); 0F 47 cw/cd | CMOVA r32, r/m32 | Move if above (CF=0 and ZF=0); 0F 43 cw/cd | CMOVAE r16, r/m16 | Move if above or equal (CF=0); 0F 43 cw/cd | CMOVAE r32, r/m32 | Move if above or equal (CF=0)
  • Page 73 CMOVcc—Conditional Move (Continued). Opcode | Instruction | Description: 0F 41 cw/cd | CMOVNO r16, r/m16 | Move if not overflow (OF=0); 0F 41 cw/cd | CMOVNO r32, r/m32 | Move if not overflow (OF=0); 0F 4B cw/cd | CMOVNP r16, r/m16 | Move if not parity (PF=0); 0F 4B cw/cd | CMOVNP r32, r/m32 | Move if not parity (PF=0); 0F 49 cw/cd...
  • Page 74 CMOVcc—Conditional Move (Continued) The CMOVcc instructions are new for the Pentium Pro processor family; however, they may not be supported by all the processors in the family. Software can determine if the CMOVcc instructions are supported by checking the processor’s feature information with the CPUID instruction (see “CPUID—CPU Identification”...
  • Page 75 CMOVcc—Conditional Move (Continued) Virtual 8086 Mode Exceptions #GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. #SS(0) If a memory operand effective address is outside the SS segment limit. #PF(fault-code) If a page fault occurs. #AC(0) If alignment checking is enabled and an unaligned memory reference is made.
  • Page 76 CMP—Compare Two Operands. Opcode | Instruction | Description: 3C ib | CMP AL, imm8 | Compare imm8 with AL; 3D iw | CMP AX, imm16 | Compare imm16 with AX; 3D id | CMP EAX, imm32 | Compare imm32 with EAX; 80 /7 ib | CMP r/m8, imm8 | Compare imm8 with r/m8; 81 /7 iw | CMP r/m16, imm16 | Compare imm16 with r/m16...
  • Page 77 CMP—Compare Two Operands (Continued) Protected Mode Exceptions #GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. If the DS, ES, FS, or GS register contains a null segment selector. #SS(0) If a memory operand effective address is outside the SS segment limit.
  • Page 78 CMPS/CMPSB/CMPSW/CMPSD—Compare String Operands Opcode Instruction Description CMPS DS:(E)SI, ES:(E)DI Compares byte at address DS:(E)SI with byte at address ES:(E)DI and sets the status flags accordingly CMPS DS:SI, ES:DI Compares byte at address DS:SI with byte at address ES:DI and sets the status flags accordingly CMPS DS:ESI, ES:EDI Compares byte at address DS:ESI with byte at address ES:EDI and sets the status flags accordingly...
  • Page 79 CMPS/CMPSB/CMPSW/CMPSD—Compare String Operands (Continued) Operation temp <- SRC1 - SRC2; SetStatusFlags(temp); IF (byte comparison) THEN IF DF = 0 THEN (E)DI <- (E)DI + 1; (E)SI <- (E)SI + 1; ELSE (E)DI <- (E)DI - 1; (E)SI <- (E)SI - 1; ELSE IF (word comparison) THEN IF DF = 0 THEN DI ...
  • Page 80 CMPS/CMPSB/CMPSW/CMPSD—Compare String Operands (Continued) Virtual 8086 Mode Exceptions #GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. #SS(0) If a memory operand effective address is outside the SS segment limit. #PF(fault-code) If a page fault occurs.
  • Page 81 CMPXCHG—Compare and Exchange. Opcode | Instruction | Description: 0F B0 /r | CMPXCHG r/m8,r8 | Compare AL with r/m8. If equal, ZF is set and r8 is loaded into r/m8. Else, clear ZF and load r/m8 into AL; 0F B1 /r | CMPXCHG r/m16,r16 | Compare AX with r/m16. If equal, ZF is set and r16 is loaded into r/m16.
  • Page 82 If a page fault occurs. #AC(0) If alignment checking is enabled and an unaligned memory reference is made. Intel Architecture Compatibility This instruction is not supported on Intel processors earlier than the Intel486 processors.
  • Page 83 CMPXCHG8B—Compare and Exchange 8 Bytes. Opcode | Instruction | Description: 0F C7 /1 m64 | CMPXCHG8B m64 | Compare EDX:EAX with m64. If equal, set ZF and load ECX:EBX into m64. Else, clear ZF and load m64 into EDX:EAX. Description: Compares the 64-bit value in EDX:EAX with the operand (destination operand). If the values are equal, the 64-bit value in ECX:EBX is stored in the destination operand.
  • Page 84 If a page fault occurs. #AC(0) If alignment checking is enabled and an unaligned memory reference is made. Intel Architecture Compatibility This instruction is not supported on Intel processors earlier than the Pentium processors.
  • Page 85: Information Returned By Cpuid Instruction

    CPUID—CPU Identification Opcode Instruction Description 0F A2 CPUID Returns processor identification and feature information in the EAX, EBX, ECX, and EDX registers, according to the input value entered initially in the EAX register. Description Returns processor identification and feature information in the EAX, EBX, ECX, and EDX registers.
  • Page 86: Version Information In Registers Eax

    When the input value is 1, the processor returns version information in the EAX register (see Figure 2-4). The version information consists of an Intel architecture family identifier, a model identifier, a stepping ID, and a processor type. Figure 2-4.
  • Page 87: Feature Flags Returned In Edx Register

    A feature flag set to 1 indicates the corresponding feature is supported. Software should identify Intel as the vendor to properly interpret the feature flags. Table 2-5. Feature Flags Returned in EDX Register...
  • Page 88 Table 2-5. Feature Flags Returned in EDX Register (Continued) Mnemonic Description Physical Address Extension. Physical addresses greater than 32 bits are supported: extended page table entry formats, an extra level in the page translation tables is defined, 2 Mbyte pages are supported instead of 4 Mbyte pages if PAE bit is 1.
  • Page 89 Thermal Monitor. The processor implements the thermal monitor automatic thermal control circuitry (TCC). Processor based on the Intel Itanium architecture: The processor is based on the Intel Itanium architecture and is capable of executing the Intel Itanium instruction set. IA-32 application...
  • Page 90 CPUID—CPU Identification (Continued) Operation CASE (EAX) OF EAX = 0H: EAX <- Highest input value understood by CPUID; EBX <- Vendor identification string; EDX <- Vendor identification string; ECX <- Vendor identification string; BREAK; EAX = 1H: EAX[3:0] <- Stepping ID; EAX[7:4] ...
  • Page 91 The CPUID instruction is not supported in early models of the Intel486 processor or in any Intel architecture processor earlier than the Intel486 processor. The ID flag in the EFLAGS register can be used to determine if this instruction is supported. If a procedure is able to set or clear this flag, the CPUID is supported by the processor running the procedure.
  • Page 92 CWD/CDQ—Convert Word to Doubleword/Convert Doubleword to Quadword. Instruction | Description: CWD | DX:AX <- sign-extend of AX; CDQ | EDX:EAX <- sign-extend of EAX. Description: Doubles the size of the operand in register AX or EAX (depending on the operand size) by means of sign extension and stores the result in registers DX:AX or EDX:EAX, respectively.
  • Page 93 CWDE—Convert Word to Doubleword See entry for CBW/CWDE—Convert Byte to Word/Convert Word to Doubleword.
  • Page 94 DAA—Decimal Adjust AL after Addition Opcode Instruction Description Decimal adjust AL after addition Description Adjusts the sum of two packed BCD values to create a packed BCD result. The AL register is the implied source and destination operand. The DAA instruction is only useful when it follows an ADD instruction that adds (binary addition) two 2-digit, packed BCD values and stores a byte result in the AL register.
  • Page 95 DAS—Decimal Adjust AL after Subtraction Opcode Instruction Description Decimal adjust AL after subtraction Description Adjusts the result of the subtraction of two packed BCD values to create a packed BCD result. The AL register is the implied source and destination operand. The DAS instruction is only useful when it follows a SUB instruction that subtracts (binary subtraction) one 2-digit, packed BCD value from another and stores a byte result in the AL register.
  • Page 96 DEC—Decrement by 1. Opcode | Instruction | Description: FE /1 | DEC r/m8 | Decrement r/m8 by 1; FF /1 | DEC r/m16 | Decrement r/m16 by 1; FF /1 | DEC r/m32 | Decrement r/m32 by 1; 48+rw | DEC r16 | Decrement r16 by 1; 48+rd | DEC r32 | Decrement r32 by 1. Description: Subtracts 1 from the operand, while preserving the state of the CF flag.
  • Page 97 DEC—Decrement by 1 (Continued) Virtual 8086 Mode Exceptions #GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. #SS(0) If a memory operand effective address is outside the SS segment limit. #PF(fault-code) If a page fault occurs.
  • Page 98 DIV—Unsigned Divide Opcode Instruction Description F6 /6 DIV r/m8: Unsigned divide AX by r/m8; AL ← Quotient, AH ← Remainder. F7 /6 DIV r/m16: Unsigned divide DX:AX by r/m16; AX ← Quotient, DX ← Remainder. F7 /6 DIV r/m32: Unsigned divide EDX:EAX by r/m32 doubleword; ...
  • Page 99 DIV—Unsigned Divide (Continued) ELSE (* quadword/doubleword operation *) temp ← EDX:EAX / SRC; IF temp > FFFFFFFFH THEN #DE; (* divide error *) ; ELSE EAX ← temp; EDX ← EDX:EAX MOD SRC; Flags Affected The CF, OF, SF, ZF, AF, and PF flags are undefined. Additional Itanium System Environment Exceptions Itanium Reg Faults NaT Register Consumption Abort.
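The 32-bit branch of the DIV Operation pseudocode maps to the Python sketch below; `DivideError` is a hypothetical stand-in for the #DE fault. The overflow check is why code usually zeroes EDX (or sign-extends with CDQ before IDIV) before dividing: a stale EDX can push the quotient past FFFFFFFFH.

```python
class DivideError(Exception):
    """Hypothetical stand-in for the #DE (divide error) exception."""


def div32(edx: int, eax: int, src: int) -> tuple[int, int]:
    """Model DIV r/m32: unsigned EDX:EAX / SRC. Returns (EAX, EDX) = (quotient, remainder)."""
    src &= 0xFFFFFFFF
    if src == 0:
        raise DivideError("divisor is 0")
    dividend = ((edx & 0xFFFFFFFF) << 32) | (eax & 0xFFFFFFFF)
    quotient, remainder = divmod(dividend, src)
    if quotient > 0xFFFFFFFF:  # temp > FFFFFFFFH
        raise DivideError("quotient too large for EAX")
    return quotient, remainder
```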
  • Page 100 DIV—Unsigned Divide (Continued) Virtual 8086 Mode Exceptions #DE If the source operand (divisor) is 0. If the quotient is too large for the designated register. #GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. #SS(0) If a memory operand effective address is outside the SS segment limit.
  • Page 101 ENTER—Make Stack Frame for Procedure Parameters Opcode Instruction Description C8 iw 00 ENTER imm16,0: Create a stack frame for a procedure; C8 iw 01 ENTER imm16,1: Create a nested stack frame for a procedure; C8 iw ib ENTER imm16,imm8: Create a nested stack frame for a procedure. Description Creates a stack frame for a procedure.
  • Page 102 ENTER—Make Stack Frame for Procedure Parameters (Continued) IF StackSize = 32 THEN EBP ← EBP − 4; Push([EBP]); (* doubleword push *) ELSE (* StackSize = 16 *) BP ← BP − 4; Push([BP]); (* doubleword push *) ELSE (* OperandSize = 16 *) IF StackSize = 32 THEN EBP ...
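A Python sketch of ENTER for the 32-bit operand-size, 32-bit stack-size path only. Registers are a dict, memory is a dict keyed by address, and the names are hypothetical; the nesting level is reduced modulo 32 as in the full Intel pseudocode, which this summary truncates.

```python
MASK32 = 0xFFFFFFFF


def enter32(regs: dict, mem: dict, size: int, level: int) -> None:
    """Model ENTER size, level (32-bit operands, 32-bit stack)."""
    level %= 32  # only the low 5 bits of the nesting level are used

    def push(value: int) -> None:
        regs["esp"] = (regs["esp"] - 4) & MASK32
        mem[regs["esp"]] = value

    push(regs["ebp"])              # save the caller's frame pointer
    frame_temp = regs["esp"]       # will become the new EBP
    if level > 0:
        for _ in range(1, level):  # copy level - 1 enclosing frame pointers
            regs["ebp"] = (regs["ebp"] - 4) & MASK32
            push(mem[regs["ebp"]])
        push(frame_temp)           # the new frame's own pointer
    regs["ebp"] = frame_temp
    regs["esp"] = (regs["esp"] - size) & MASK32  # allocate locals


# ENTER 16, 0 is equivalent to PUSH EBP; MOV EBP, ESP; SUB ESP, 16.
regs, mem = {"ebp": 0x1000, "esp": 0x2000}, {}
enter32(regs, mem, 16, 0)
assert mem[0x1FFC] == 0x1000 and regs == {"ebp": 0x1FFC, "esp": 0x1FEC}
```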
  • Page 103 ENTER—Make Stack Frame for Procedure Parameters (Continued) Protected Mode Exceptions #SS(0) If the new value of the SP or ESP register is outside the stack segment limit. #PF(fault-code) If a page fault occurs. Real Address Mode Exceptions None. Virtual 8086 Mode Exceptions None.
  • Page 104 F2XM1—Compute 2^x − 1 Opcode Instruction Description D9 F0 F2XM1: Replace ST(0) with (2^ST(0) − 1). Description Calculates the exponential value of 2 to the power of the source operand minus 1. The source operand is located in register ST(0) and the result is also stored in ST(0). The value of the source operand must lie in the range −1.0 to +1.0.
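A Python sketch of the F2XM1 result; it assumes only double precision (the FPU works in 80-bit extended format) and raises an error where the hardware result is undefined. Computing 2^x − 1 directly, rather than 2^x, keeps precision for x near 0. The function name is hypothetical.

```python
import math


def f2xm1(x: float) -> float:
    """Model F2XM1: return 2**x - 1 for a source operand in [-1.0, +1.0]."""
    if not -1.0 <= x <= 1.0:
        raise ValueError("source operand outside [-1.0, +1.0]; the result is undefined")
    # expm1(x*ln2) avoids the cancellation that 2**x - 1 suffers when x is near 0.
    return math.expm1(x * math.log(2.0))


# Near 0 the result keeps full relative precision.
small = 1e-12
assert abs(f2xm1(small) / (small * math.log(2.0)) - 1.0) < 1e-9
```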
  • Page 105 F2XM1—Compute 2^x − 1 (Continued) Protected Mode Exceptions EM or TS in CR0 is set. Real Address Mode Exceptions EM or TS in CR0 is set. Virtual 8086 Mode Exceptions EM or TS in CR0 is set.
  • Page 106 FABS—Absolute Value Opcode Instruction Description D9 E1 FABS Replace ST with its absolute value. Description Clears the sign bit of ST(0) to create the absolute value of the operand. The following table shows the results obtained when creating the absolute value of various classes of numbers.
  • Page 107 FADD/FADDP/FIADD—Add Opcode Instruction Description D8 /0 FADD m32real: Add m32real to ST(0) and store result in ST(0); DC /0 FADD m64real: Add m64real to ST(0) and store result in ST(0); D8 C0+i FADD ST(0), ST(i): Add ST(0) to ST(i) and store result in ST(0); DC C0+i FADD ST(i), ST(0): Add ST(i) to ST(0) and store result in ST(i)
  • Page 108 FADD/FADDP/FIADD—Add (Continued) [Results table for FADD with zeros, infinities, finite values (F), integers (I), and NaNs; the table body is not recoverable from this extract.]
  • Page 109 FADD/FADDP/FIADD—Add (Continued) Floating-point Exceptions Stack underflow occurred. Operand is an SNaN value or unsupported format. Operands are infinities of unlike sign. Result is a denormal value. Result is too small for destination format. Result is too large for destination format. Value cannot be represented exactly in destination format.
  • Page 110 FBLD—Load Binary Coded Decimal Opcode Instruction Description DF /4 FBLD m80dec: Convert BCD value to real and push onto the FPU stack. Description Converts the BCD source operand into extended-real format and pushes the value onto the FPU stack. The source operand is loaded without rounding errors. The sign of the source operand is preserved, including that of −0.
  • Page 111 FBLD—Load Binary Coded Decimal (Continued) Real Address Mode Exceptions If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. If a memory operand effective address is outside the SS segment limit. EM or TS in CR0 is set. Virtual 8086 Mode Exceptions #GP(0) If a memory operand effective address is outside the CS, DS, ES, FS,...
  • Page 112 FBSTP—Store BCD Integer and Pop Opcode Instruction Description DF /6 FBSTP m80bcd: Store ST(0) in m80bcd and pop ST(0). Description Converts the value in the ST(0) register to an 18-digit packed BCD integer, stores the result in the destination operand, and pops the register stack. If the source value is a non-integral value, it is rounded to an integer value according to the rounding mode specified by the RC field of the FPU control word.
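A Python sketch of the 80-bit packed BCD layout that FBLD reads and FBSTP writes: nine bytes holding two digits each (low digit in the low nibble, least significant byte first) and a sign in bit 7 of the tenth byte. Assumptions: the remaining bits of byte 9 are written as zero, FBSTP uses the default round-to-nearest mode (Python's `round` also ties to even), an out-of-range value raises instead of storing the BCD indefinite, and −0 is not modeled because Python integers have no negative zero. All names are hypothetical.

```python
def to_packed_bcd(value: int) -> bytes:
    """Encode value in the 80-bit packed BCD format (little-endian, 10 bytes)."""
    if abs(value) > 10**18 - 1:
        raise OverflowError("more than 18 decimal digits")
    digits, out = abs(value), bytearray(10)
    for i in range(9):                    # bytes 0-8: two digits each, low digit first
        low, digits = digits % 10, digits // 10
        high, digits = digits % 10, digits // 10
        out[i] = (high << 4) | low
    if value < 0:
        out[9] = 0x80                     # byte 9, bit 7: sign
    return bytes(out)


def from_packed_bcd(data: bytes) -> int:
    """Decode what FBLD reads: 18 digits plus a sign bit."""
    value = 0
    for byte in reversed(data[:9]):       # most significant byte first
        value = value * 100 + (byte >> 4) * 10 + (byte & 0x0F)
    return -value if data[9] & 0x80 else value


def fbstp(st0: float) -> bytes:
    """FBSTP with the default RC (round to nearest, ties to even)."""
    return to_packed_bcd(round(st0))
```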
  • Page 113 FBSTP—Store BCD Integer and Pop (Continued) FPU Flags Affected Set to 0 if stack underflow occurred. Indicates rounding direction if the inexact exception (#P) is generated: 0 = not roundup; 1 = roundup. C0, C2, C3 Undefined. Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR.dfl is 1, NaT Register Consumption Abort.
  • Page 114 FBSTP—Store BCD Integer and Pop (Continued) Virtual 8086 Mode Exceptions #GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. #SS(0) If a memory operand effective address is outside the SS segment limit.
  • Page 115 FCHS—Change Sign Opcode Instruction Description D9 E0 FCHS: Complements sign of ST(0). Description Complements the sign bit of ST(0). This operation changes a positive value into a negative value of equal magnitude or vice versa. The following table shows the results obtained when changing the sign of various classes of numbers.
  • Page 116 FCLEX/FNCLEX—Clear Exceptions Opcode Instruction Description 9B DB E2 FCLEX Clear floating-point exception flags after checking for pending unmasked floating-point exceptions. DB E2 FNCLEX Clear floating-point exception flags without checking for pending unmasked floating-point exceptions. Description Clears the floating-point exception flags (PE, UE, OE, ZE, DE, and IE), the exception summary status flag (ES), the stack fault flag (SF), and the busy flag (B) in the FPU status word.
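A Python sketch of the bits FCLEX/FNCLEX clear, using the standard x87 status-word layout (exception flags in bits 0 to 5, SF in bit 6, ES in bit 7, B in bit 15). The C0 to C3 flags are undefined after this instruction; the sketch simply passes them, and TOP, through unchanged. Names are hypothetical.

```python
# Status-word bits cleared by FCLEX/FNCLEX.
IE, DE, ZE, OE, UE, PE = (1 << n for n in range(6))  # exception flags, bits 0-5
SF, ES, B = 1 << 6, 1 << 7, 1 << 15                    # stack fault, summary, busy
FCLEX_MASK = IE | DE | ZE | OE | UE | PE | SF | ES | B  # == 0x80FF


def fclex(status_word: int) -> int:
    """Return the status word after FCLEX; TOP and C0-C3 pass through unchanged."""
    return status_word & ~FCLEX_MASK & 0xFFFF
```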
  • Page 117 FCMOVcc—Floating-point Conditional Move Opcode Instruction Description DA C0+i FCMOVB ST(0), ST(i): Move if below (CF=1); DA C8+i FCMOVE ST(0), ST(i): Move if equal (ZF=1); DA D0+i FCMOVBE ST(0), ST(i): Move if below or equal (CF=1 or ZF=1); DA D8+i FCMOVU ST(0), ST(i): Move if unordered (PF=1); DB C0+i FCMOVNB ST(0), ST(i)
  • Page 118 FCMOVcc—Floating-point Conditional Move (Continued) Protected Mode Exceptions EM or TS in CR0 is set. Real Address Mode Exceptions EM or TS in CR0 is set. Virtual 8086 Mode Exceptions EM or TS in CR0 is set.
  • Page 119 FCOM/FCOMP/FCOMPP—Compare Real Opcode Instruction Description D8 /2 FCOM m32real Compare ST(0) with m32real. DC /2 FCOM m64real Compare ST(0) with m64real. D8 D0+i FCOM ST(i) Compare ST(0) with ST(i). D8 D1 FCOM Compare ST(0) with ST(1). D8 /3 FCOMP m32real Compare ST(0) with m32real and pop register stack.
  • Page 120 FCOM/FCOMP/FCOMPP—Compare Real (Continued) Operation CASE (relation of operands) OF ST > SRC: C3, C2, C0 ← 000; ST < SRC: C3, C2, C0 ← 001; ST = SRC: C3, C2, C0 ← 100; ESAC; IF ST(0) or SRC = NaN or unsupported format THEN IF FPUControlWord.IM = 1 THEN...
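The FCOM CASE table can be sketched in Python as follows, assuming the invalid-operation exception is masked (IM = 1) so unordered operands set C3, C2, C0 to 111. C0, C2, and C3 are bits 8, 10, and 14 of the status word; the function name is hypothetical.

```python
import math

C0, C2, C3 = 1 << 8, 1 << 10, 1 << 14  # condition-code bits in the FPU status word


def fcom_condition_codes(st0: float, src: float) -> int:
    """Return the C3/C2/C0 bits set by FCOM, assuming #IA is masked (IM = 1)."""
    if math.isnan(st0) or math.isnan(src):
        return C3 | C2 | C0   # unordered: 111
    if st0 > src:
        return 0              # 000
    if st0 < src:
        return C0             # 001
    return C3                 # 100 (+0 and -0 compare equal)
```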
  • Page 121 FCOM/FCOMP/FCOMPP—Compare Real (Continued) Protected Mode Exceptions #GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. If the DS, ES, FS, or GS register contains a null segment selector. #SS(0) If a memory operand effective address is outside the SS segment limit.
  • Page 122 FCOMI/FCOMIP/FUCOMI/FUCOMIP—Compare Real and Set EFLAGS Opcode Instruction Description DB F0+i FCOMI ST, ST(i): Compare ST(0) with ST(i) and set status flags accordingly; DF F0+i FCOMIP ST, ST(i): Compare ST(0) with ST(i), set status flags accordingly, and pop register stack; DB E8+i FUCOMI ST, ST(i): Compare ST(0) with ST(i), check for ordered values, and set...
  • Page 123 FCOMI/FCOMIP/FUCOMI/FUCOMIP—Compare Real and Set EFLAGS (Continued) Operation CASE (relation of operands) OF ST(0) > ST(i): ZF, PF, CF ← 000; ST(0) < ST(i): ZF, PF, CF ← 001; ST(0) = ST(i): ZF, PF, CF ← 100; ESAC; IF instruction is FCOMI or FCOMIP THEN IF ST(0) or ST(i) = NaN or unsupported format THEN...
  • Page 124 FCOMI/FCOMIP/FUCOMI/FUCOMIP—Compare Real and Set EFLAGS (Continued) Floating-point Exceptions Stack underflow occurred. (FCOMI or FCOMIP instruction) One or both operands are NaN values or have unsupported formats. (FUCOMI or FUCOMIP instruction) One or both operands are SNaN values (but not QNaNs) or have undefined formats. Detection of a QNaN value does not raise an invalid-operand exception.
  • Page 125 FCOS—Cosine Opcode Instruction Description D9 FF FCOS: Replace ST(0) with its cosine. Description Calculates the cosine of the source operand in register ST(0) and stores the result in ST(0). The source operand must be given in radians and must be within the range −2^63 to +2^63.
  • Page 126 FCOS—Cosine (Continued) FPU Flags Affected Set to 0 if stack underflow occurred. Indicates rounding direction if the inexact-result exception (#P) is generated: 0 = not roundup; 1 = roundup. Undefined if C2 is 1. Set to 1 if source operand is outside the range −2^63 to +2^63; otherwise, cleared to 0.
  • Page 127 FDECSTP—Decrement Stack-Top Pointer Opcode Instruction Description D9 F6 FDECSTP Decrement TOP field in FPU status word. Description Subtracts one from the TOP field of the FPU status word (decrements the top-of-stack pointer). The contents of the FPU data registers and tag register are not affected. Operation IF TOP = 0 THEN TOP ...
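A Python sketch of FDECSTP acting on the TOP field (bits 11 to 13 of the status word). Only TOP changes, and it wraps from 0 to 7; the helper names are hypothetical.

```python
TOP_SHIFT = 11
TOP_MASK = 0b111 << TOP_SHIFT  # TOP is bits 11-13 of the FPU status word


def get_top(status_word: int) -> int:
    return (status_word & TOP_MASK) >> TOP_SHIFT


def fdecstp(status_word: int) -> int:
    """Model FDECSTP: TOP <- TOP - 1, wrapping 0 to 7. Registers and tags untouched."""
    top = (get_top(status_word) - 1) % 8
    return (status_word & ~TOP_MASK) | (top << TOP_SHIFT)
```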
  • Page 128 FDIV/FDIVP/FIDIV—Divide Opcode Instruction Description D8 /6 FDIV m32real: Divide ST(0) by m32real and store result in ST(0); DC /6 FDIV m64real: Divide ST(0) by m64real and store result in ST(0); D8 F0+i FDIV ST(0), ST(i): Divide ST(0) by ST(i) and store result in ST(0); DC F8+i FDIV ST(i), ST(0): Divide ST(i) by ST(0) and store result in ST(i)
  • Page 129 FDIV/FDIVP/FIDIV—Divide (Continued) [Results table for FDIV with zeros, infinities, and NaNs; the table body is not recoverable from this extract.] Notes: F means finite-real number. I means integer.
  • Page 130 FDIV/FDIVP/FIDIV—Divide (Continued) Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR.dfl is 1, NaT Register Consumption Abort. Itanium Mem Faults VHPT Data Fault, Nested TLB Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault. Floating-point Exceptions...
  • Page 131 FDIVR/FDIVRP/FIDIVR—Reverse Divide Opcode Instruction Description D8 /7 FDIVR m32real: Divide m32real by ST(0) and store result in ST(0); DC /7 FDIVR m64real: Divide m64real by ST(0) and store result in ST(0); D8 F8+i FDIVR ST(0), ST(i): Divide ST(i) by ST(0) and store result in ST(0); DC F0+i FDIVR ST(i), ST(0): Divide ST(0) by ST(i) and store result in ST(i)
  • Page 132 FDIVR/FDIVRP/FIDIVR—Reverse Divide (Continued) [Results table for FDIVR with zeros, infinities, and NaNs; the table body is not recoverable from this extract.] Notes: F means finite-real number. I means integer. * indicates floating-point invalid-arithmetic-operand (#IA) exception. ** indicates floating-point zero-divide (#Z) exception.
  • Page 133 FDIVR/FDIVRP/FIDIVR—Reverse Divide (Continued) Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR.dfl is 1, NaT Register Consumption Abort. Itanium Mem Faults VHPT Data Fault, Nested TLB Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault. Floating-point Exceptions...
  • Page 134 FFREE—Free Floating-point Register Opcode Instruction Description DD C0+i FFREE ST(i) Sets tag for ST(i) to empty Description Sets the tag in the FPU tag register associated with register ST(i) to empty (11B). The contents of ST(i) and the FPU stack-top pointer (TOP) are not affected. Operation TAG(i) ...
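A Python sketch of FFREE on the 16-bit tag word. The tag word is indexed by physical register, so ST(i) first has to be mapped to physical register (TOP + i) mod 8; function and parameter names are hypothetical.

```python
def ffree(tag_word: int, top: int, i: int) -> int:
    """Model FFREE ST(i): set that register's 2-bit tag to 11B (empty)."""
    physical = (top + i) % 8              # ST(i) is physical register (TOP + i) mod 8
    return tag_word | (0b11 << (2 * physical))
```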
  • Page 135 FICOM/FICOMP—Compare Integer Opcode Instruction Description DE /2 FICOM m16int: Compare ST(0) with m16int; DA /2 FICOM m32int: Compare ST(0) with m32int; DE /3 FICOMP m16int: Compare ST(0) with m16int and pop stack register; DA /3 FICOMP m32int: Compare ST(0) with m32int and pop stack register. Description Compares the value in ST(0) with an integer source operand and sets the condition code flags C0, C2, and C3 in the FPU status word according to the results (see table...
  • Page 136 FICOM/FICOMP—Compare Integer (Continued) Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR.dfl is 1, NaT Register Consumption Abort. Itanium Mem Faults VHPT Data Fault, Nested TLB Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault. Floating-point Exceptions...
  • Page 137 FILD—Load Integer Opcode Instruction Description DF /0 FILD m16int Push m16int onto the FPU register stack. DB /0 FILD m32int Push m32int onto the FPU register stack. DF /5 FILD m64int Push m64int onto the FPU register stack. Description Converts the signed-integer source operand into extended-real format and pushes the value onto the FPU register stack.
  • Page 138 FILD—Load Integer (Continued) Real Address Mode Exceptions If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. If a memory operand effective address is outside the SS segment limit. EM or TS in CR0 is set. Virtual 8086 Mode Exceptions #GP(0) If a memory operand effective address is outside the CS, DS, ES, FS,...
  • Page 139 FINCSTP—Increment Stack-Top Pointer Opcode Instruction Description D9 F7 FINCSTP Increment the TOP field in the FPU status register Description Adds one to the TOP field of the FPU status word (increments the top-of-stack pointer). The contents of the FPU data registers and tag register are not affected. This operation is not equivalent to popping the stack, because the tag for the previous top-of-stack register is not marked empty.
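A Python sketch contrasting FINCSTP with a genuine pop, with the eight tags held in a list indexed by physical register (names hypothetical): both advance TOP, but only the pop tags the old top register empty (11B).

```python
def fincstp(top: int, tags: list[int]) -> tuple[int, list[int]]:
    """FINCSTP: TOP <- (TOP + 1) mod 8; tags untouched."""
    return (top + 1) % 8, list(tags)


def pop_register_stack(top: int, tags: list[int]) -> tuple[int, list[int]]:
    """A real pop: mark the old ST(0) empty (11B), then increment TOP."""
    new_tags = list(tags)
    new_tags[top] = 0b11          # tags are indexed by physical register
    return (top + 1) % 8, new_tags
```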
  • Page 140 FINIT/FNINIT—Initialize Floating-point Unit Opcode Instruction Description 9B DB E3 FINIT Initialize FPU after checking for pending unmasked floating-point exceptions. DB E3 FNINIT Initialize FPU without checking for pending unmasked floating-point exceptions. Description Sets the FPU control, status, tag, instruction pointer, and data pointer registers to their default states.
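A Python sketch of the state FINIT/FNINIT establishes, using the default values from the instruction's Operation section in the Intel SDMs (control word 037FH, status word 0, tag word FFFFH, pointers and opcode 0). The data registers are outside this sketch, and the class name is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FpuEnvironment:
    """FNINIT defaults of the x87 environment registers."""
    control_word: int = 0x037F   # all exceptions masked, 64-bit precision, round to nearest
    status_word: int = 0x0000    # no exception flags, TOP = 0
    tag_word: int = 0xFFFF       # every register tagged empty (11B)
    instruction_pointer: int = 0
    data_pointer: int = 0
    last_opcode: int = 0


def fninit(env: FpuEnvironment) -> None:
    """Reset env in place to the default environment."""
    defaults = FpuEnvironment()
    for name in vars(defaults):
        setattr(env, name, getattr(defaults, name))
```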
  • Page 141 FIST/FISTP—Store Integer Opcode Instruction Description DF /2 FIST m16int: Store ST(0) in m16int; DB /2 FIST m32int: Store ST(0) in m32int; DF /3 FISTP m16int: Store ST(0) in m16int and pop register stack; DB /3 FISTP m32int: Store ST(0) in m32int and pop register stack; DF /7 FISTP m64int: Store ST(0) in m64int and pop register stack...
  • Page 142 FIST/FISTP—Store Integer (Continued) Operation DEST ← Integer(ST(0)); IF instruction = FISTP THEN PopRegisterStack; FPU Flags Affected Set to 0 if stack underflow occurred. Indicates rounding direction if the inexact exception (#P) is generated: 0 = not roundup; 1 = roundup. Cleared to 0 otherwise.
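A Python sketch of `Integer(ST(0))` for FIST/FISTP, assuming the default round-to-nearest mode and a masked invalid-operation exception, in which case NaN, infinity, or an out-of-range value stores the integer indefinite (only the sign bit set, for example 8000H for a word). The function name is hypothetical.

```python
import math


def fist(st0: float, bits: int = 16) -> int:
    """Model FIST/FISTP: round to nearest-even (RC = 00) with #IA masked."""
    lowest, highest = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    integer_indefinite = lowest   # only the sign bit set: 8000H for a word
    if math.isnan(st0) or math.isinf(st0):
        return integer_indefinite
    value = round(st0)            # Python rounds ties to even, like RC = 00
    if not lowest <= value <= highest:
        return integer_indefinite
    return value
```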
  • Page 143 FIST/FISTP—Store Integer (Continued) Real Address Mode Exceptions If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. If a memory operand effective address is outside the SS segment limit. EM or TS in CR0 is set. Virtual 8086 Mode Exceptions #GP(0) If a memory operand effective address is outside the CS, DS, ES, FS,...
  • Page 144 FLD—Load Real Opcode Instruction Description D9 /0 FLD m32real Push m32real onto the FPU register stack. DD /0 FLD m64real Push m64real onto the FPU register stack. DB /5 FLD m80real Push m80real onto the FPU register stack. D9 C0+i FLD ST(i) Push ST(i) onto the FPU register stack.
  • Page 145 FLD—Load Real (Continued) Protected Mode Exceptions #GP(0) If destination is located in a nonwritable segment. If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.
  • Page 146 FLD1/FLDL2T/FLDL2E/FLDPI/FLDLG2/FLDLN2/FLDZ—Load Constant Opcode Instruction Description D9 E8 FLD1: Push +1.0 onto the FPU register stack. D9 E9 FLDL2T: Push log₂10 onto the FPU register stack. D9 EA FLDL2E: Push log₂e onto the FPU register stack. D9 EB FLDPI: Push π onto the FPU register stack. D9 EC...
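A Python sketch of the values these instructions push, rounded to double precision here (the FPU holds them in 80-bit extended format). The usage helper assumes the common FLDLN2 / FLD x / FYL2X idiom, where FYL2X computes ST(1) ∗ log₂(ST(0)); all names are illustrative.

```python
import math

# Values the FLDxx constant instructions push, rounded here to double precision.
X87_CONSTANTS = {
    "FLD1": 1.0,
    "FLDL2T": math.log2(10.0),     # log2(10)
    "FLDL2E": math.log2(math.e),   # log2(e)
    "FLDPI": math.pi,
    "FLDLG2": math.log10(2.0),     # log10(2)
    "FLDLN2": math.log(2.0),       # ln(2)
    "FLDZ": 0.0,
}


def natural_log(x: float) -> float:
    """ln(x) = ln(2) * log2(x), the FLDLN2 + FYL2X pattern."""
    return X87_CONSTANTS["FLDLN2"] * math.log2(x)
```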
  • Page 147 EM or TS in CR0 is set. Intel Architecture Compatibility Information When the RC field is set to round-to-nearest, the FPU produces the same constants that are produced by the Intel 8087 and Intel287 math coprocessors.
  • Page 148 FLDCW—Load Control Word Opcode Instruction Description D9 /5 FLDCW m2byte Load FPU control word from m2byte. Description Loads the 16-bit source operand into the FPU control word. The source operand is a memory location. This instruction is typically used to establish or change the FPU’s mode of operation.
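A Python sketch that decodes the control-word fields FLDCW loads and rewrites the RC field, as code typically does with FNSTCW / modify / FLDCW to switch to truncation temporarily. Field positions follow the standard x87 layout; the function and constant names are hypothetical.

```python
RC_NEAREST, RC_DOWN, RC_UP, RC_ZERO = 0b00, 0b01, 0b10, 0b11


def decode_control_word(cw: int) -> dict:
    return {
        "exception_masks": cw & 0x3F,            # IM, DM, ZM, OM, UM, PM (bits 0-5)
        "precision_control": (cw >> 8) & 0b11,   # PC: 00 single, 10 double, 11 extended
        "rounding_control": (cw >> 10) & 0b11,   # RC (bits 10-11)
        "infinity_control": (cw >> 12) & 0b1,    # X (bit 12), kept for 287 compatibility
    }


def with_rounding(cw: int, rc: int) -> int:
    """Return a copy of cw with its RC field replaced, ready for FLDCW."""
    return (cw & ~0x0C00 & 0xFFFF) | (rc << 10)
```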
  • Page 149 FLDCW—Load Control Word (Continued) Real Address Mode Exceptions If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. If a memory operand effective address is outside the SS segment limit. EM or TS in CR0 is set. Virtual 8086 Mode Exceptions #GP(0) If a memory operand effective address is outside the CS, DS, ES, FS,...
  • Page 150 The FPU operating environment consists of the FPU control word, status word, tag word, instruction pointer, data pointer, and last opcode. See the Intel® 64 and IA-32 Architectures Software Developer’s Manual for the layout in memory of the loaded environment, depending on the operating mode of the processor (protected or real) and the size of the current address attribute (16-bit or 32-bit).
  • Page 151 FLDENV—Load FPU Environment (Continued) Protected Mode Exceptions #GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.
  • Page 152 FMUL/FMULP/FIMUL—Multiply Opcode Instruction Description D8 /1 FMUL m32real: Multiply ST(0) by m32real and store result in ST(0); DC /1 FMUL m64real: Multiply ST(0) by m64real and store result in ST(0); D8 C8+i FMUL ST(0), ST(i): Multiply ST(0) by ST(i) and store result in ST(0); DC C8+i FMUL ST(i), ST(0): Multiply ST(i) by ST(0) and store result in ST(i)
  • Page 153 FMUL/FMULP/FIMUL—Multiply (Continued) [Results table for FMUL with zeros, infinities, and NaNs; the table body is not recoverable from this extract.] Notes: F means finite-real number.
  • Page 154 FMUL/FMULP/FIMUL—Multiply (Continued) Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR.dfl is 1, NaT Register Consumption Abort. Itanium Mem Faults VHPT Data Fault, Nested TLB Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault. Protected Mode Exceptions...
  • Page 155 FNOP—No Operation Opcode Instruction Description D9 D0 FNOP No operation is performed. Description Performs no FPU operation. This instruction takes up space in the instruction stream but does not affect the FPU or machine context, except the EIP register. FPU Flags Affected C0, C1, C2, C3 undefined.
  • Page 156: Fpatan Zeros And Nans

    FPATAN—Partial Arctangent Opcode Instruction Description D9 F3 FPATAN: Replace ST(1) with arctan(ST(1) ÷ ST(0)) and pop the register stack. Description Computes the arctangent of the source operand in register ST(1) divided by the source operand in register ST(0), stores the result in ST(1), and pops the FPU register stack. The result in register ST(0) has the same sign as the source operand ST(1) and a magnitude less than +π.
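A Python sketch of the FPATAN result using `math.atan2`, which likewise takes the quadrant from the signs of both operands. The argument order mirrors ST(1), ST(0); the function name is hypothetical and only double precision is modeled.

```python
import math


def fpatan(st1: float, st0: float) -> float:
    """Model FPATAN: arctan(ST(1) / ST(0)) with the quadrant chosen from both signs.

    The result replaces ST(1) and the stack is popped, so it ends up in ST(0).
    """
    return math.atan2(st1, st0)


# Dividing first would lose the quadrant: atan(1 / -1) is -pi/4, not 3*pi/4.
assert math.isclose(fpatan(1.0, -1.0), 3 * math.pi / 4)
```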
  • Page 157 EM or TS in CR0 is set. Virtual 8086 Mode Exceptions EM or TS in CR0 is set. Intel Architecture Compatibility Information The source operands for this instruction are restricted for the 80287 math coprocessor to the following range: 0 ≤ |ST(1)| < |ST(0)| < ...
  • Page 158: Fprem Zeros And Nans

    The FPREM instruction does not compute the remainder specified in IEEE Std. 754. The IEEE specified remainder can be computed with the FPREM1 instruction. The FPREM instruction is provided for compatibility with the Intel 8087 and Intel287 math coprocessors.
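A Python sketch of the difference between the final results of FPREM and FPREM1 (once C2 is clear), assuming Python 3.7 or later for `math.remainder`. `math.fmod` truncates the quotient toward zero as FPREM does, while `math.remainder` rounds it to nearest as IEEE 754 requires; the partial-reduction loop is not modeled and the names are hypothetical.

```python
import math


def fprem_result(dividend: float, modulus: float) -> float:
    """Final FPREM result: dividend - modulus * trunc(dividend / modulus)."""
    return math.fmod(dividend, modulus)


def fprem1_result(dividend: float, modulus: float) -> float:
    """Final FPREM1 result: the IEEE 754 remainder (quotient rounded to nearest-even)."""
    return math.remainder(dividend, modulus)


# 7 / 4 = 1.75: FPREM truncates the quotient to 1, FPREM1 rounds it to 2.
assert fprem_result(7.0, 4.0) == 3.0
assert fprem1_result(7.0, 4.0) == -1.0
```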
  • Page 159 FPREM—Partial Remainder (Continued) The FPREM instruction gets its name “partial remainder” because of the way it computes the remainder. This instruction arrives at a remainder through iterative subtraction. It can, however, reduce the exponent of ST(0) by no more than 63 in one execution of the instruction.
  • Page 160 FPREM—Partial Remainder (Continued) Floating-point Exceptions Stack underflow occurred. Source operand is an SNaN value, modulus is 0, dividend is ∞, or unsupported format. Source operand is a denormal value. Result is too small for destination format. Protected Mode Exceptions EM or TS in CR0 is set. Real Address Mode Exceptions EM or TS in CR0 is set.
  • Page 161: Fprem1 Zeros And Nans

    FPREM1—Partial Remainder Opcode Instruction Description D9 F5 FPREM1 Replace ST(0) with the IEEE remainder obtained on dividing ST(0) by ST(1) Description Computes the IEEE remainder obtained on dividing the value in the ST(0) register (the dividend) by the value in the ST(1) register (the divisor or modulus), and stores the result in ST(0).
  • Page 162 FPREM1—Partial Remainder (Continued) Like the FPREM instruction, the FPREM1 instruction computes the remainder through iterative subtraction, but can reduce the exponent of ST(0) by no more than 63 in one execution of the instruction. If the instruction succeeds in producing a remainder that is less than one half the modulus, the operation is complete and the C2 flag in the FPU status word is cleared.
  • Page 163 FPREM1—Partial Remainder (Continued) Floating-point Exceptions Stack underflow occurred. Source operand is an SNaN value, modulus (divisor) is 0, dividend is ∞, or unsupported format. Source operand is a denormal value. Result is too small for destination format. Protected Mode Exceptions EM or TS in CR0 is set.
  • Page 164 2 or by using the FPREM instruction with a divisor of 2. The value 1.0 is pushed onto the register stack after the tangent has been computed to maintain compatibility with the Intel 8087 and Intel287 math coprocessors. This operation also simplifies the calculation of other trigonometric functions. For instance, the cotangent (which is the reciprocal of the tangent) can be computed by executing a FDIVR instruction after the FPTAN instruction.
  • Page 165 FPTAN—Partial Tangent (Continued) FPU Flags Affected Set to 0 if stack underflow occurred; set to 1 if stack overflow occurred. Indicates rounding direction if the inexact-result exception (#P) is generated: 0 = not roundup; 1 = roundup. Set to 1 if source operand is outside the range −2^63 to +2^63; otherwise, cleared to 0.
  • Page 166 FRNDINT—Round to Integer Opcode Instruction Description D9 FC FRNDINT Round ST(0) to an integer. Description Rounds the source value in the ST(0) register to the nearest integral value, depending on the current rounding mode (setting of the RC field of the FPU control word), and stores the result in ST(0).
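A Python sketch of FRNDINT under each RC setting (00 nearest, 01 down, 10 up, 11 toward zero), using Python's `round`, `math.floor`, `math.ceil`, and `math.trunc`. It keeps the sign of zero results and passes NaNs and infinities through; the names are hypothetical.

```python
import math

# RC field (FPU control word bits 10-11) -> rounding function.
ROUNDERS = {
    0b00: round,       # to nearest, ties to even
    0b01: math.floor,  # down, toward -infinity
    0b10: math.ceil,   # up, toward +infinity
    0b11: math.trunc,  # toward zero
}


def frndint(st0: float, rc: int = 0b00) -> float:
    """Model FRNDINT: round ST(0) to an integral value using the RC mode."""
    if math.isnan(st0) or math.isinf(st0):
        return st0
    # copysign keeps the sign of zero results, e.g. -0.3 rounds to -0.0.
    return math.copysign(float(ROUNDERS[rc](st0)), st0)
```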
  • Page 167 The FPU operating environment consists of the FPU control word, status word, tag word, instruction pointer, data pointer, and last opcode. See the Intel® 64 and IA-32 Architectures Software Developer’s Manual for the layout in memory of the stored environment, depending on the operating mode of the processor (protected or real) and the size of the current address attribute (16-bit or 32-bit).
  • Page 168 FRSTOR—Restore FPU State (Continued) Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR.dfl is 1. Itanium Mem Faults VHPT Data Fault, Nested TLB Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault. Protected Mode Exceptions...
  • Page 169 The FPU operating environment consists of the FPU control word, status word, tag word, instruction pointer, data pointer, and last opcode. See the Intel® 64 and IA-32 Architectures Software Developer’s Manual for the layout in memory of the stored environment, depending on the operating mode of the processor (protected or real) and the size of the current address attribute (16-bit or 32-bit).
  • Page 170 FSAVE/FNSAVE—Store FPU State (Continued) FPUStatusWord ← 0; FPUTagWord ← FFFFH; FPUDataPointer ← 0; FPUInstructionPointer ← 0; FPULastInstructionOpcode ← 0; FPU Flags Affected The C0, C1, C2, and C3 flags are saved and then cleared. Floating-point Exceptions None. Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR.dfl is 1, NaT Register Consumption Abort.
  • Page 171 Intel Architecture Compatibility Information For Intel math coprocessors and FPUs prior to the Pentium processor, an FWAIT instruction should be executed before attempting to read from the memory image stored with a prior FSAVE/FNSAVE instruction. This FWAIT instruction helps ensure that the storage operation has been completed.
  • Page 172 FSCALE—Scale Opcode Instruction Description D9 FD FSCALE Scale ST(0) by ST(1). Description Multiplies the destination operand by 2 to the power of the source operand and stores the result in the destination operand. This instruction provides rapid multiplication or division by integral powers of 2. The destination operand is a real value that is located in register ST(0).
  • Page 173 FSCALE—Scale (Continued) Operation ST(0) ← ST(0) ∗ 2^ST(1); FPU Flags Affected Set to 0 if stack underflow occurred. Indicates rounding direction if the inexact-result exception (#P) is generated: 0 = not roundup; 1 = roundup. C0, C2, C3 Undefined. Floating-point Exceptions Stack underflow occurred.
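A Python sketch of FSCALE for finite operands, assuming ST(1) is truncated toward zero as the Intel® 64 and IA-32 SDM specifies; `math.ldexp` does the power-of-2 scaling. The hypothetical `pow2` helper shows a common pairing with F2XM1: compute 2^frac − 1 on the fractional part of y, add 1, then scale by the integer part.

```python
import math


def fscale(st0: float, st1: float) -> float:
    """Model FSCALE for finite operands: ST(0) * 2**trunc(ST(1))."""
    return math.ldexp(st0, int(st1))  # int() truncates toward zero


def pow2(y: float) -> float:
    """2**y via an F2XM1-style step on the fraction and FSCALE by the integer part."""
    whole = math.trunc(y)
    frac = y - whole                  # lies in (-1, 1), inside F2XM1's range
    return fscale(math.expm1(frac * math.log(2.0)) + 1.0, whole)
```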
  • Page 174 FSIN—Sine Opcode Instruction Description D9 FE FSIN: Replace ST(0) with its sine. Description Calculates the sine of the source operand in register ST(0) and stores the result in ST(0). The source operand must be given in radians and must be within the range −2^63 to +2^63.
  • Page 175 FSIN—Sine (Continued) FPU Flags Affected Set to 0 if stack underflow occurred. Indicates rounding direction if the inexact-result exception (#P) is generated: 0 = not roundup; 1 = roundup. Set to 1 if source operand is outside the range −2^63 to +2^63; otherwise, cleared to 0.
  • Page 176 FSINCOS—Sine and Cosine Opcode Instruction Description D9 FB FSINCOS Compute the sine and cosine of ST(0); replace ST(0) with the sine, and push the cosine onto the register stack. Description Computes both the sine and the cosine of the source operand in register ST(0), stores the sine in ST(0), and pushes the cosine onto the top of the FPU register stack.
  • Page 177 FSINCOS—Sine and Cosine (Continued) FPU Flags Affected Set to 0 if stack underflow occurred; set to 1 if stack overflow occurred. Indicates rounding direction if the inexact-result exception (#P) is generated: 0 = not roundup; 1 = roundup. Set to 1 if source operand is outside the range −2^63 to +2^63; otherwise, cleared to 0.
  • Page 178 FSQRT—Square Root Opcode Instruction Description D9 FA FSQRT Calculates square root of ST(0) and stores the result in ST(0) Description Calculates the square root of the source value in the ST(0) register and stores the result in ST(0). The following table shows the results obtained when taking the square root of various classes of numbers, assuming that neither overflow nor underflow occurs.
  • Page 179 FSQRT—Square Root (Continued) Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR.dfl is 1, NaT Register Consumption Abort. Protected Mode Exceptions EM or TS in CR0 is set. Real Address Mode Exceptions EM or TS in CR0 is set. Virtual 8086 Mode Exceptions EM or TS in CR0 is set.
  • Page 180 FST/FSTP—Store Real Opcode Instruction Description D9 /2 FST m32real: Copy ST(0) to m32real; DD /2 FST m64real: Copy ST(0) to m64real; DD D0+i FST ST(i): Copy ST(0) to ST(i); D9 /3 FSTP m32real: Copy ST(0) to m32real and pop register stack; DD /3 FSTP m64real: Copy ST(0) to m64real and pop register stack...
  • Page 181 FST/FSTP—Store Real (Continued) FPU Flags Affected Set to 0 if stack underflow occurred. Indicates rounding direction if the floating-point inexact exception (#P) is generated: 0 = not roundup; 1 = roundup. C0, C2, C3 Undefined. Floating-point Exceptions Stack underflow occurred. Source operand is an SNaN value or unsupported format.
  • Page 182 FST/FSTP—Store Real (Continued) Virtual 8086 Mode Exceptions #GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. #SS(0) If a memory operand effective address is outside the SS segment limit. EM or TS in CR0 is set. #PF(fault-code) If a page fault occurs.
  • Page 183 FSTCW/FNSTCW—Store Control Word Opcode Instruction Description 9B D9 /7 FSTCW m2byte Store FPU control word to m2byte after checking for pending unmasked floating-point exceptions. D9 /7 FNSTCW m2byte Store FPU control word to m2byte without checking for pending unmasked floating-point exceptions. Description Stores the current value of the FPU control word at the specified destination in memory.
  • Page 184 FSTCW/FNSTCW—Store Control Word (Continued) Real Address Mode Exceptions If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. If a memory operand effective address is outside the SS segment limit. EM or TS in CR0 is set. Virtual 8086 Mode Exceptions #GP(0) If a memory operand effective address is outside the CS, DS, ES, FS,...
  • Page 185 The FPU operating environment consists of the FPU control word, status word, tag word, instruction pointer, data pointer, and last opcode. See the Intel® 64 and IA-32 Architectures Software Developer’s Manual for the layout in memory of the stored environment, depending on the operating mode of the processor (protected or real) and the size of the current address attribute (16-bit or 32-bit).
  • Page 186 FSTENV/FNSTENV—Store FPU Environment (Continued) Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR.dfl is 1. Itanium Mem Faults VHPT Data Fault, Nested TLB Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault. Protected Mode Exceptions...
  • Page 187 FSTSW/FNSTSW—Store Status Word Opcode Instruction Description 9B DD /7 FSTSW m2byte Store FPU status word at m2byte after checking for pending unmasked floating-point exceptions. 9B DF E0 FSTSW AX Store FPU status word in AX register after checking for pending unmasked floating-point exceptions.
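A Python sketch of the classic FSTSW AX / SAHF sequence used to branch on FCOM results: SAHF copies AH into the low byte of EFLAGS, so C0, C2, and C3 land in CF, PF, and ZF (the same layout FCOMI produces directly). Names are hypothetical.

```python
C0, C2, C3 = 1 << 8, 1 << 10, 1 << 14


def fstsw_ax_then_sahf(status_word: int) -> dict:
    """Return the CF, PF, ZF values SAHF produces after FSTSW AX."""
    ah = (status_word >> 8) & 0xFF  # FSTSW AX puts status bits 8-15 in AH
    return {
        "CF": ah & 0x01,            # AH bit 0 = C0
        "PF": (ah >> 2) & 0x01,     # AH bit 2 = C2
        "ZF": (ah >> 6) & 0x01,     # AH bit 6 = C3
    }


# After FCOM with ST(0) < SRC (C0 = 1), JB (jump if CF = 1) is taken.
assert fstsw_ax_then_sahf(C0)["CF"] == 1
```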
  • Page 188 FSTSW/FNSTSW—Store Status Word (Continued) Protected Mode Exceptions #GP(0) If the destination is located in a nonwritable segment. If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.
  • Page 189 FSUB/FSUBP/FISUB—Subtract Opcode Instruction Description D8 /4 FSUB m32real Subtract m32real from ST(0) and store result in ST(0) DC /4 FSUB m64real Subtract m64real from ST(0) and store result in ST(0) D8 E0+i FSUB ST(0), ST(i) Subtract ST(i) from ST(0) and store result in ST(0) DC E8+i FSUB ST(i), ST(0) Subtract ST(0) from ST(i) and store result in ST(i)
  • Page 190: FSUB Zeros and NaNs

    FSUB/FSUBP/FISUB—Subtract (Continued) [Table 2-9. FSUB Zeros and NaNs: results indexed by SRC and DEST operand class]...
  • Page 191 FSUB/FSUBP/FISUB—Subtract (Continued) Additional Itanium System Environment Exceptions Itanium Reg Faults: Disabled FP Register Fault if PSR.dfl is 1, NaT Register Consumption Abort. Itanium Mem Faults: VHPT Data Fault, Nested TLB Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault. Protected Mode Exceptions...
  • Page 192 FSUBR/FSUBRP/FISUBR—Reverse Subtract Opcode Instruction Description D8 /5 FSUBR m32real Subtract ST(0) from m32real and store result in ST(0) DC /5 FSUBR m64real Subtract ST(0) from m64real and store result in ST(0) D8 E8+i FSUBR ST(0), ST(i) Subtract ST(0) from ST(i) and store result in ST(0) DC E0+i FSUBR ST(i), ST(0) Subtract ST(i) from ST(0) and store result in ST(i)
  • Page 193: FSUBR Zeros and NaNs

    FSUBR/FSUBRP/FISUBR—Reverse Subtract (Continued) When the difference between two operands of like sign is 0, the result is +0, except for the round toward −∞ mode, in which case the result is −0. This instruction also guarantees that +0 − (−0) = +0, and that −0 − (+0) = −0. When the source operand is an integer 0, it is treated as a +0.
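The operand order and the signed-zero rules above can be checked with a short C model. This is a sketch only: it uses host `double` arithmetic in the default round-to-nearest mode rather than the x87 80-bit format, and the function names are hypothetical. FSUB computes DEST − SRC and FSUBR computes SRC − DEST. Both store the result in DEST.

```c
#include <math.h>

/* Hypothetical models of the two subtraction directions. */
static double x87_fsub(double dest, double src)  { return dest - src; } /* FSUB */
static double x87_fsubr(double dest, double src) { return src - dest; } /* FSUBR */
```

With these definitions, `x87_fsubr(-0.0, 0.0)` evaluates +0 − (−0) and yields +0, and `x87_fsubr(0.0, -0.0)` evaluates −0 − (+0) and yields −0, which matches the guarantees stated above. The round toward −∞ case would additionally require changing the rounding mode, which this sketch does not do.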
  • Page 194 FSUBR/FSUBRP/FISUBR—Reverse Subtract (Continued) Floating-point Exceptions #IS Stack underflow occurred. #IA Operand is an SNaN value or unsupported format. Operands are infinities of like sign. #D Source operand is a denormal value. #U Result is too small for destination format. #O Result is too large for destination format. #P Value cannot be represented exactly in destination format.
  • Page 195 FTST—TEST Opcode Instruction Description D9 E4 FTST Compare ST(0) with 0.0. Description Compares the value in the ST(0) register with 0.0 and sets the condition code flags C0, C2, and C3 in the FPU status word according to the results (see table below). Condition ST(0) >...
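The condition-code table that follows can be modeled in C. This is a sketch assuming the standard FTST outcomes (C3 = C2 = C0 = 0 for greater, C0 = 1 for less, C3 = 1 for equal, all three set for unordered). The 3-bit packing and the function name are hypothetical.

```c
#include <math.h>

/* Hypothetical model of FTST: returns C3:C2:C0 packed as bits 2:1:0. */
static unsigned ftst_codes(double st0)
{
    if (isnan(st0)) return 0x7; /* unordered: C3=1, C2=1, C0=1 */
    if (st0 > 0.0)  return 0x0; /* ST(0) > 0.0 */
    if (st0 < 0.0)  return 0x1; /* ST(0) < 0.0: C0=1 */
    return 0x4;                 /* ST(0) = 0.0, either sign: C3=1 */
}
```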
  • Page 196 FTST—TEST (Continued) Real Address Mode Exceptions #NM EM or TS in CR0 is set. Virtual 8086 Mode Exceptions #NM EM or TS in CR0 is set. Volume 4: Base IA-32 Instruction Reference 4:189...
  • Page 197 FUCOM/FUCOMP/FUCOMPP—Unordered Compare Real Opcode Instruction Description DD E0+i FUCOM ST(i) Compare ST(0) with ST(i) DD E1 FUCOM Compare ST(0) with ST(1) DD E8+i FUCOMP ST(i) Compare ST(0) with ST(i) and pop register stack DD E9 FUCOMP Compare ST(0) with ST(1) and pop register stack DA E9 FUCOMPP Compare ST(0) with ST(1) and pop register stack twice...
  • Page 198 FUCOM/FUCOMP/FUCOMPP—Unordered Compare Real (Continued) THEN C3, C2, C0 ← 111; ELSE (* ST(0) or SRC is SNaN or unsupported format *) #IA; IF FPUControlWord.IM = 1 THEN C3, C2, C0 ← 111; IF instruction = FUCOMP THEN PopRegisterStack; IF instruction = FUCOMPP THEN PopRegisterStack;...
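The pseudocode above distinguishes QNaNs (unordered, no exception) from SNaNs (unordered plus #IA). The C sketch below models that decision. It assumes a masked invalid-operation exception (IM = 1) and uses IEEE double bit patterns as a stand-in for the x87 80-bit format. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static bool is_nan_bits(uint64_t b)
{
    return (b & 0x7FF0000000000000u) == 0x7FF0000000000000u /* exponent all ones */
        && (b & 0x000FFFFFFFFFFFFFu) != 0;                  /* nonzero fraction */
}

static bool is_snan_bits(uint64_t b)
{
    return is_nan_bits(b) && (b & 0x0008000000000000u) == 0; /* quiet bit clear */
}

/* Returns C3:C2:C0 packed as bits 2:1:0; *invalid reports whether #IA is signaled. */
static unsigned fucom_codes(uint64_t st0, uint64_t src, bool *invalid)
{
    *invalid = is_snan_bits(st0) || is_snan_bits(src); /* QNaNs do not raise #IA */
    if (is_nan_bits(st0) || is_nan_bits(src))
        return 0x7;                                    /* unordered (IM = 1 assumed) */
    double a, b;
    memcpy(&a, &st0, sizeof a);
    memcpy(&b, &src, sizeof b);
    if (a > b) return 0x0;
    if (a < b) return 0x1;
    return 0x4;
}
```

The functions take raw bit patterns rather than `double` values so that an SNaN is never loaded into a floating-point register, where it could be quieted before the test runs.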
  • Page 199 FWAIT—Wait: See entry for WAIT.
  • Page 200 FXAM—Examine Opcode Instruction Description D9 E5 FXAM Classify value or number in ST(0) Description Examines the contents of the ST(0) register and sets the condition code flags C0, C2, and C3 in the FPU status word to indicate the class of value or number in the register (see the table below).
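The class codes in the table can be modeled with C's `fpclassify`. This is a sketch assuming the standard FXAM encodings (C3:C2:C0 = 000 unsupported, 001 NaN, 010 normal, 011 infinity, 100 zero, 101 empty, 110 denormal) with C1 holding the sign. The empty state is passed in explicitly because a `double` cannot represent it. Function names are hypothetical.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical model of FXAM: C3:C2:C0 packed as bits 2:1:0. */
static unsigned fxam_class(double st0, bool empty)
{
    if (empty) return 0x5;         /* 101: empty register */
    switch (fpclassify(st0)) {
    case FP_NAN:       return 0x1; /* 001: NaN */
    case FP_NORMAL:    return 0x2; /* 010: normal finite number */
    case FP_INFINITE:  return 0x3; /* 011: infinity */
    case FP_ZERO:      return 0x4; /* 100: zero */
    case FP_SUBNORMAL: return 0x6; /* 110: denormal */
    default:           return 0x0; /* 000: unsupported (not reachable from a double) */
    }
}

/* C1 reports the sign of ST(0). */
static unsigned fxam_c1(double st0) { return signbit(st0) ? 1u : 0u; }
```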
  • Page 201 FXAM—Examine (Continued) Protected Mode Exceptions #NM EM or TS in CR0 is set. Real Address Mode Exceptions #NM EM or TS in CR0 is set. Virtual 8086 Mode Exceptions #NM EM or TS in CR0 is set.
  • Page 202 FXCH—Exchange Register Contents Opcode Instruction Description D9 C8+i FXCH ST(i) Exchange the contents of ST(0) and ST(i) D9 C9 FXCH Exchange the contents of ST(0) and ST(1) Description Exchanges the contents of registers ST(0) and ST(i). If no source operand is specified, the contents of ST(0) and ST(1) are exchanged.
  • Page 203 FXCH—Exchange Register Contents (Continued) Real Address Mode Exceptions #NM EM or TS in CR0 is set. Virtual 8086 Mode Exceptions #NM EM or TS in CR0 is set.
  • Page 204 FXTRACT—Extract Exponent and Significand Opcode Instruction Description D9 F4 FXTRACT Separate value in ST(0) into exponent and significand, store exponent in ST(0), and push the significand onto the register stack. Description Separates the source value in the ST(0) register into its exponent and significand, stores the exponent in ST(0), and pushes the significand onto the register stack.
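For a normal, finite, nonzero value, FXTRACT's split corresponds to reading the IEEE exponent field and resetting it to the bias. The following is a minimal C sketch on `double` under that assumption. Zeros, denormals, infinities, and NaNs, which FXTRACT handles specially, are not modeled, and the function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical FXTRACT model: x = significand * 2^exponent, |significand| in [1, 2).
   Valid only for normal, finite, nonzero x. */
static void fxtract(double x, double *exponent, double *significand)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    int biased = (int)((bits >> 52) & 0x7FF);
    *exponent = (double)(biased - 1023);                  /* unbiased true exponent */
    bits = (bits & ~(0x7FFull << 52)) | (1023ull << 52);  /* keep sign and fraction */
    memcpy(significand, &bits, sizeof bits);
}
```

For example, 12.0 splits into exponent 3 and significand 1.5. After the instruction, ST(0) holds the significand and ST(1) the exponent.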
  • Page 205 FXTRACT—Extract Exponent and Significand (Continued) Protected Mode Exceptions #NM EM or TS in CR0 is set. Real Address Mode Exceptions #NM EM or TS in CR0 is set. Virtual 8086 Mode Exceptions #NM EM or TS in CR0 is set.
  • Page 206: FYL2X Zeros and NaNs

    FYL2X—Compute y ∗ log₂x Opcode Instruction Description D9 F1 FYL2X Replace ST(1) with (ST(1) ∗ log₂ST(0)) and pop the register stack. Description Calculates (ST(1) ∗ log₂(ST(0))), stores the result in register ST(1), and pops the FPU register stack. The source operand in ST(0) must be a nonzero positive number. The following table shows the results obtained when taking the log of various classes of numbers, assuming that neither overflow nor underflow occurs.
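Because FYL2X scales the logarithm by an arbitrary y, standard change-of-base identities let it compute other logarithms. This is a worked illustration, not text from this manual. With y = ln 2 in ST(1) (the constant FLDLN2 pushes) or y = log₁₀ 2 (FLDLG2), the result is the natural or common logarithm of x:

$$\ln x = (\ln 2)\,\log_2 x, \qquad \log_{10} x = (\log_{10} 2)\,\log_2 x, \qquad x > 0$$

For x = 8, log₂ 8 = 3, so the first form yields 3 ln 2 = ln 8.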
  • Page 207 FYL2X—Compute y ∗ log₂x (Continued) Additional Itanium System Environment Exceptions Itanium Reg Faults: Disabled FP Register Fault if PSR.dfl is 1, NaT Register Consumption Abort. Floating-point Exceptions #IS Stack underflow occurred. #IA Either operand is an SNaN or unsupported format. Source operand in register ST(0) is a negative finite value (not ...
  • Page 208: FYL2XP1 Zeros and NaNs

    FYL2XP1—Compute y ∗ log₂(x + 1) Opcode Instruction Description D9 F9 FYL2XP1 Replace ST(1) with ST(1) ∗ log₂(ST(0) + 1.0) and pop the register stack. Description Calculates the log epsilon (ST(1) ∗ log₂(ST(0) + 1.0)), stores the result in register ST(1), and pops the FPU register stack.
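FYL2XP1 takes x itself rather than x + 1, which preserves accuracy when x is close to 0. Forming 1 + x first would round away x's low-order bits. In double precision, for example, 1 + 10⁻¹⁷ rounds to exactly 1, so log₂ of the rounded sum is 0, whereas the true value is about 1.44 × 10⁻¹⁷. With y = ln 2 in ST(1), the instruction yields the natural logarithm directly:

$$\ln(1 + x) = (\ln 2)\,\log_2(1 + x) \approx x \quad \text{for small } |x|$$

The IA-32 manuals restrict the ST(0) operand of FYL2XP1 to the range −(1 − √2/2) to +(1 − √2/2). Outside that range, FYL2X applied to x + 1 can be used instead.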
  • Page 209 FYL2XP1—Compute y ∗ log₂(x + 1) (Continued) FPU Flags Affected C1 Set to 0 if stack underflow occurred. Indicates rounding direction if the inexact-result exception (#P) is generated: 0 = not roundup; 1 = roundup. C0, C2, C3 Undefined. Additional Itanium System Environment Exceptions Itanium Reg Faults: Disabled FP Register Fault if PSR.dfl is 1, NaT Register Consumption Abort.
  • Page 210 HLT—Halt Opcode Instruction Description F4 HLT Halt Description Stops instruction execution and places the processor in a HALT state. An enabled interrupt, NMI, or a reset will resume execution. If an interrupt (including NMI) is used to resume execution after a HLT instruction, the saved instruction pointer (CS:EIP) points to the instruction following the HLT instruction.
  • Page 211: IDIV Operands

    IDIV—Signed Divide Opcode Instruction Description F6 /7 IDIV r/m8 Signed divide AX (where AH must contain sign-extension of AL) by r/m byte. (Results: AL=Quotient, AH=Remainder) F7 /7 IDIV r/m16 Signed divide DX:AX (where DX must contain sign-extension of AX) by r/m word. (Results: AX=Quotient, DX=Remainder) F7 /7 IDIV r/m32 Signed divide EDX:EAX (where EDX must contain...
  • Page 212 IDIV—Signed Divide (Continued) temp ← DX:AX / SRC; (* signed division *) IF (temp > 7FFFH) OR (temp < 8000H) (* if a positive result is greater than 7FFFH *) (* or a negative result is less than 8000H *) THEN #DE;...
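The 16-bit case of the pseudocode above can be modeled in C. This is a sketch with a hypothetical function name. C99's `/` and `%` truncate toward zero, which matches IDIV: the quotient rounds toward zero and the remainder takes the sign of the dividend.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of IDIV r/m16: signed DX:AX divided by src.
   Returns false where the processor would raise #DE. */
static bool idiv16(uint16_t dx, uint16_t ax, int16_t src, int16_t *quot, int16_t *rem)
{
    if (src == 0)
        return false;                                  /* #DE: divisor is 0 */
    int64_t high = dx >= 0x8000 ? (int64_t)dx - 0x10000 : (int64_t)dx;
    int64_t dividend = high * 65536 + ax;              /* signed 32-bit DX:AX */
    int64_t q = dividend / src;                        /* truncates toward zero */
    if (q > INT16_MAX || q < INT16_MIN)
        return false;                                  /* #DE: quotient too large */
    *quot = (int16_t)q;
    *rem = (int16_t)(dividend % src);                  /* sign follows the dividend */
    return true;
}
```

For example, DX:AX = FFFFFFF9H (−7) divided by 2 gives AX = −3 and DX = −1.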
  • Page 213 IDIV—Signed Divide (Continued) Real Address Mode Exceptions #DE If the source operand (divisor) is 0. The signed result (quotient) is too large for the destination. #GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. #SS If a memory operand effective address is outside the SS segment limit.
  • Page 214 IMUL—Signed Multiply Opcode Instruction Description F6 /5 IMUL r/m8 AX ← AL ∗ r/m byte F7 /5 IMUL r/m16 DX:AX ← AX ∗ r/m word F7 /5 IMUL r/m32 EDX:EAX ← EAX ∗ r/m doubleword 0F AF /r IMUL r16,r/m16 word register ← word register ∗ r/m word doubleword register ...
  • Page 215 IMUL—Signed Multiply (Continued) The three forms of the IMUL instruction are similar in that the product is calculated at twice the length of the operands. With the one-operand form, the product is stored exactly in the destination. With the two- and three-operand forms, however, the result is truncated to the length of the destination before it is stored in the destination register.
  • Page 216 IMUL—Signed Multiply (Continued) Flags Affected For the one operand form of the instruction, the CF and OF flags are set when significant bits are carried into the upper half of the result and cleared when the result fits exactly in the lower half of the result. For the two- and three-operand forms of the instruction, the CF and OF flags are set when the result must be truncated to fit in the destination operand size and cleared when the result fits exactly in the destination operand size.
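The truncation rule for the two- and three-operand forms can be shown in C. This is a sketch of IMUL r32, r/m32 with a hypothetical function name. The full 64-bit product is formed, the low 32 bits are kept, and CF and OF are set exactly when the kept value differs from the full product.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of two-operand IMUL r32, r/m32. */
static int32_t imul32(int32_t dest, int32_t src, bool *cf_of)
{
    int64_t full = (int64_t)dest * src;
    uint32_t low = (uint32_t)full;                    /* low 32 bits (well defined) */
    /* Reinterpret the low 32 bits as two's complement without
       implementation-defined narrowing. */
    int32_t truncated = low <= 0x7FFFFFFFu ? (int32_t)low
                                           : (int32_t)(low - 0x80000000u) + INT32_MIN;
    *cf_of = (int64_t)truncated != full;              /* CF = OF = 1 if bits were lost */
    return truncated;
}
```

For example, 65536 × 65536 stores 0 with CF = OF = 1, while −3 × 7 stores −21 with both flags clear.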
  • Page 217 IN—Input from Port Opcode Instruction Description E4 ib IN AL,imm8 Input byte from imm8 I/O port address into AL E5 ib IN AX,imm8 Input word from imm8 I/O port address into AX E5 ib IN EAX,imm8 Input doubleword from imm8 I/O port address into EAX EC IN AL,DX Input byte from I/O port in DX into AL IN AX,DX...
  • Page 218 IN—Input from Port (Continued) ELSE (* Real-address mode or protected mode with CPL ≤ IOPL *) (* or virtual-8086 mode with all I/O permission bits for I/O port cleared *) IF Itanium_System_Environment THEN SRC_VA = IOBase | (Port{15:2}<<12) | Port{11:0}; SRC_PA = translate(SRC_VA);...
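The port-to-virtual-address formula in the pseudocode above can be written directly in C. This is a sketch with a hypothetical function name that assumes IOBase is suitably aligned. Because Port{15:2} is shifted into bit 12 and above, four consecutive ports share one 4 KB virtual page.

```c
#include <stdint.h>

/* SRC_VA = IOBase | (Port{15:2} << 12) | Port{11:0} */
static uint64_t io_port_va(uint64_t io_base, uint16_t port)
{
    uint64_t port_15_2 = (port >> 2) & 0x3FFFu; /* Port{15:2} */
    uint64_t port_11_0 = port & 0xFFFu;         /* Port{11:0} */
    return io_base | (port_15_2 << 12) | port_11_0;
}
```

For example, with IOBase = 0, port 3F8H maps to FE3F8H. Ports 3F8H through 3FBH fall on page FEH and port 3FCH starts page FFH.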
  • Page 219 INC—Increment by 1 Opcode Instruction Description FE /0 INC r/m8 Increment r/m byte by 1 FF /0 INC r/m16 Increment r/m word by 1 FF /0 INC r/m32 Increment r/m doubleword by 1 40+ rw INC r16 Increment word register by 1 40+ rd INC r32 Increment doubleword register by 1...
  • Page 220 INC—Increment by 1 (Continued) Virtual 8086 Mode Exceptions #GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. #SS(0) If a memory operand effective address is outside the SS segment limit. #PF(fault-code) If a page fault occurs.
  • Page 221 INS/INSB/INSW/INSD—Input from Port to String Opcode Instruction Description 6C INS ES:(E)DI, DX Input byte from port DX into ES:(E)DI 6D INS ES:DI, DX Input word from port DX into ES:DI 6D INS ES:EDI, DX Input doubleword from port DX into ES:EDI 6C INSB Input byte from port DX into ES:(E)DI 6D INSW Input word from port DX into ES:DI 6D INSD...
  • Page 222 INS/INSB/INSW/INSD—Input from Port to String (Continued) If the referenced I/O port is mapped to an unimplemented virtual address (via the IOBase register) or if data translations are disabled (PSR.dt is 0), a GPFault is generated on the referencing INS instruction. Operation IF ((PE = 1) AND ((VM = 1) OR (CPL >...
  • Page 223 INS/INSB/INSW/INSD—Input from Port to String (Continued) Additional Itanium System Environment Exceptions Itanium Reg Faults: NaT Register Consumption Abort. Itanium Mem Faults: VHPT Data Fault, Nested TLB Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault. IA_32_Exception...
  • Page 224 INTn/INTO/INT3—Call to Interrupt Procedure ... The destination operand specifies an interrupt vector from 0 to 255, encoded as an 8-bit unsigned immediate value. The first 32 interrupt vectors are reserved by Intel for system use. Some of these interrupts are used for internally generated exceptions.
  • Page 225: INT Cases

    INTn/INTO/INT3—Call to Interrupt Procedure (Continued) [Table 2-14. INT Cases: outcomes by IOPL, DPL/CPL relationship, and interrupt type]...
  • Page 226 INTn/INTO/INT3—Call to Interrupt Procedure (Continued) /* In the Itanium System Environment, all of the following operations are intercepted */ IF PE=0 THEN GOTO REAL-ADDRESS-MODE; ELSE (* PE=1 *) GOTO PROTECTED-MODE; REAL-ADDRESS-MODE: IF ((DEST ∗ 4) + 3) is not within IDT limit THEN #GP; FI; IF stack not large enough for a 6-byte return information THEN #SS;...
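The real-address-mode checks above follow from the table format. Each vector's entry is 4 bytes, holding a 16-bit offset followed by a 16-bit segment, so vector n occupies bytes n∗4 through n∗4+3. The following is a minimal C sketch with hypothetical names that reads entries from an in-memory copy of the table.

```c
#include <stdbool.h>
#include <stdint.h>

/* Limit check from the pseudocode: ((DEST * 4) + 3) must be within the IDT limit. */
static bool ivt_entry_in_limit(uint8_t vector, uint16_t idt_limit)
{
    uint32_t last_byte = (uint32_t)vector * 4u + 3u;
    return last_byte <= idt_limit;          /* otherwise #GP */
}

/* Reads the far pointer for a vector; both words are little-endian. */
static void ivt_lookup(const uint8_t *table, uint8_t vector, uint16_t *cs, uint16_t *ip)
{
    const uint8_t *e = table + (uint32_t)vector * 4u;
    *ip = (uint16_t)(e[0] | (e[1] << 8));   /* offset word first */
    *cs = (uint16_t)(e[2] | (e[3] << 8));   /* then segment word */
}
```

With a limit of 3FFH (a full 1 KB table), vector 255's last byte is exactly at the limit, so every vector passes the check.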
  • Page 227 INTn/INTO/INT3—Call to Interrupt Procedure (Continued) IF TSS not present THEN #NP(TSS selector); SWITCH-TASKS (with nesting) to TSS; IF interrupt caused by fault with error code THEN IF stack limit does not allow push of two bytes THEN #SS(0); Push(error code); IF EIP not within code segment limit THEN #GP(0);...
  • Page 228 INTn/INTO/INT3—Call to Interrupt Procedure (Continued) END; INTER-PRIVILEGE-LEVEL-INTERRUPT (* PE=1, interrupt or trap gate, non-conforming code segment, DPL < CPL *) (* Check segment selector and descriptor for stack of new privilege level in current TSS *) IF current TSS is 32-bit TSS THEN TSSstackAddress ←...
  • Page 229 INTn/INTO/INT3—Call to Interrupt Procedure (Continued) Push(EFLAGS); Push(far pointer to return instruction); (* old CS and EIP, 3 words padded to 4*); Push(ErrorCode); (* if needed, 4 bytes *) ELSE(* 16-bit gate *) Push(far pointer to old stack); (* old SS and SP, 2 words *); Push(EFLAGS);...
  • Page 230 INTn/INTO/INT3—Call to Interrupt Procedure (Continued) OR 18 bytes (no error code pushed); THEN #SS(segment selector + EXT); IF instruction pointer is not within code segment limits THEN #GP(0); FI; IF CR4.VME = 0 THEN IF IOPL=3 THEN IF Gate DPL = 3 THEN (* CPL=3, VM=1, IOPL=3, VME=0, gate DPL=3 *) IF Target CPL != 0...
  • Page 231 INTn/INTO/INT3—Call to Interrupt Procedure (Continued) TF ← 0; RF ← 0; IF service through interrupt gate THEN IF ← 0; FI; TempSS ← SS; TempESP ← ESP; SS:ESP ← TSS(SS0:ESP0); (* Change to level 0 stack segment *) (* Following pushes are 16 bits for 16-bit gate and 32 bits for 32-bit gates *) (* Segment selector pushes in 32-bit mode are padded to two words *) Push(GS);...
  • Page 232 INTn/INTO/INT3—Call to Interrupt Procedure (Continued) (* PE=1, DPL = CPL or conforming segment *) IF 32-bit gate THEN IF current stack does not have room for 16 bytes (error code pushed) OR 12 bytes (no error code pushed); THEN #SS(0); ELSE (* 16-bit gate *) IF current stack does not have room for 8 bytes (error code pushed) OR 6 bytes (no error code pushed);...
  • Page 233 INTn/INTO/INT3—Call to Interrupt Procedure (Continued) Protected Mode Exceptions #GP(0) If the instruction pointer in the IDT or in the interrupt-, trap-, or task gate is beyond the code segment limits. #GP(selector) If the segment selector in the interrupt-, trap-, or task gate is null. If an interrupt-, trap-, or task gate, code segment, or TSS segment selector index is outside its descriptor table limits.
  • Page 234 INTn/INTO/INT3—Call to Interrupt Procedure (Continued) Virtual 8086 Mode Exceptions #GP(0) (For INTn instruction) If the IOPL is less than 3 and the DPL of the interrupt-, trap-, or task-gate descriptor is not equal to 3. If the instruction pointer in the IDT or in the interrupt-, trap-, or task gate is beyond the code segment limits.
  • Page 235 INVD—Invalidate Internal Caches ... The CPL of a program or procedure must be 0 to execute this instruction. This instruction is also implementation-dependent; its function may be implemented differently on future Intel architecture processors. Use this instruction with care: data cached internally and not written back to main memory will be lost.
  • Page 236 INVD—Invalidate Internal Caches (Continued) Intel Architecture Compatibility This instruction is not supported on Intel architecture processors earlier than the Intel486 processor.
  • Page 237 INVLPG—Invalidate TLB Entry ... Virtual 8086 Mode Exceptions #GP(0) The INVLPG instruction cannot be executed in virtual-8086 mode. Intel Architecture Compatibility This instruction is not supported on Intel architecture processors earlier than the Intel486 processor.
  • Page 238 IRET/IRETD—Interrupt Return Opcode Instruction Description CF IRET Interrupt return (16-bit operand size) CF IRETD Interrupt return (32-bit operand size) Description Returns program control from an exception or interrupt handler to a program or procedure that was interrupted by an exception, an external interrupt, or a software-generated interrupt, or returns from a nested task.
  • Page 239 IRET/IRETD—Interrupt Return (Continued) If the NT flag is set, the IRET instruction performs a return from a nested task (switches from the called task back to the calling task) or reverses the operation of an interrupt or exception that caused a task switch. The updated state of the task executing the IRET instruction is saved in its TSS.
  • Page 240 IRET/IRETD—Interrupt Return (Continued) THEN #SS(0); tempEIP ← Pop(); tempCS ← Pop(); tempEFLAGS ← Pop(); ELSE (* OperandSize = 16 *) IF top 6 bytes of stack are not within stack limits THEN #SS(0); tempEIP ← Pop(); tempCS ← Pop(); tempEFLAGS ← Pop(); tempEIP ←...
  • Page 241 IRET/IRETD—Interrupt Return (Continued) THEN EIP ← Pop(); CS ← Pop(); (* 32-bit pop, high-order 16 bits discarded *) TempEFlags ← Pop(); EFLAGS ← (EFLAGS AND 1B3000H) OR (TempEFlags AND 244FD7H) (* VM, IOPL, RF, VIP, and VIF EFLAGS bits are not modified by pop *) ELSE (* OperandSize = 16 *) EIP ←...
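The EFLAGS merge in the virtual-8086 return path above is a pair of masks. The sketch below models it in C with a hypothetical function name. Bits in 1B3000H (IOPL, RF, VM, VIF, VIP) keep their current values. Bits in 244FD7H (CF, bit 1, PF, AF, ZF, SF, TF, IF, DF, OF, NT, AC, ID) come from the popped image.

```c
#include <stdint.h>

/* EFLAGS <- (EFLAGS AND 1B3000H) OR (TempEFlags AND 244FD7H) */
static uint32_t iret_v86_eflags(uint32_t eflags, uint32_t popped)
{
    return (eflags & 0x001B3000u) | (popped & 0x00244FD7u);
}
```

A handler therefore cannot change IOPL or leave virtual-8086 mode through the popped image. In the following example, the popped VM and IOPL bits are discarded: `iret_v86_eflags(0x00000202, 0x00023002)` returns 00000002H.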
  • Page 242 IRET/IRETD—Interrupt Return (Continued) (* Resume execution in Virtual 8086 mode *) END; TASK-RETURN: (* PE=1, VM=0, NT=1 *) Read segment selector in link field of current TSS; IF local/global bit is set to local OR index not within GDT limits THEN #GP(TSS selector);...
  • Page 243 IRET/IRETD—Interrupt Return (Continued) IF CPL = 0 THEN EFLAGS(IOPL) ← tempEFLAGS; IF OperandSize=32 THEN EFLAGS(VM, VIF, VIP) ← tempEFLAGS; END; RETURN-TO-OUTER-PRIVILEGE-LEVEL: IF OperandSize=32 THEN IF top 8 bytes on stack are not within limits THEN #SS(0); FI; ELSE (* OperandSize=16 *) IF top 4 bytes on stack are not within limits THEN #SS(0);...
  • Page 244 IRET/IRETD—Interrupt Return (Continued) AND CPL > segment descriptor DPL (* stored in hidden part of segment register *) THEN (* segment register invalid *) SegmentSelector ← 0; (* null segment selector *) END; Flags Affected All the flags and fields in the EFLAGS register are potentially modified, depending on the mode of operation of the processor.
  • Page 245 IRET/IRETD—Interrupt Return (Continued) Real Address Mode Exceptions #GP If the return instruction pointer is not within the return code segment limit. #SS If the top bytes of stack are not within stack limits. Virtual 8086 Mode Exceptions #GP(0) If the return instruction pointer is not within the return code segment limit.
  • Page 246 Jcc—Jump if Condition Is Met Opcode Instruction Description 77 cb JA rel8 Jump short if above (CF=0 and ZF=0) 73 cb JAE rel8 Jump short if above or equal (CF=0) 72 cb JB rel8 Jump short if below (CF=1) 76 cb JBE rel8 Jump short if below or equal (CF=1 or ZF=1) 72 cb...
  • Page 247 Jcc—Jump if Condition Is Met (Continued) Opcode Instruction Description 0F 8D cw/cd JGE rel16/32 Jump near if greater or equal (SF=OF) 0F 8C cw/cd JL rel16/32 Jump near if less (SF<>OF) 0F 8E cw/cd JLE rel16/32 Jump near if less or equal (ZF=1 or SF<>OF) 0F 86 cw/cd JNA rel16/32 Jump near if not above (CF=1 or ZF=1)
  • Page 248 Jcc—Jump if Condition Is Met (Continued) Because a particular state of the status flags can sometimes be interpreted in two ways, two mnemonics are defined for some opcodes. For example, the JA (jump if above) instruction and the JNBE (jump if not below or equal) instruction are alternate mnemonics for the opcode 77H.
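The mnemonic pairs and the signed/unsigned split are easiest to see next to the flags a CMP produces. The following is a minimal C sketch with hypothetical names. It computes the flags of a 32-bit CMP a, b and evaluates a few Jcc conditions from the opcode tables above.

```c
#include <stdbool.h>
#include <stdint.h>

struct flags { bool cf, zf, sf, of; };

/* Flags left by CMP a, b (that is, a - b) on 32-bit operands. */
static struct flags cmp32(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;                      /* wraps modulo 2^32 */
    struct flags f;
    f.cf = a < b;                            /* unsigned borrow */
    f.zf = r == 0;
    f.sf = (r >> 31) & 1u;
    f.of = (((a ^ b) & (a ^ r)) >> 31) & 1u; /* signed overflow */
    return f;
}

static bool ja(struct flags f) { return !f.cf && !f.zf; }         /* JA/JNBE: CF=0 and ZF=0 */
static bool jb(struct flags f) { return f.cf; }                   /* JB/JC/JNAE: CF=1 */
static bool jg(struct flags f) { return !f.zf && f.sf == f.of; }  /* JG/JNLE: ZF=0 and SF=OF */
static bool jl(struct flags f) { return f.sf != f.of; }           /* JL/JNGE: SF<>OF */
```

Comparing FFFFFFFFH with 1 makes JA taken (unsigned: FFFFFFFFH is above 1) but JG not taken (signed: −1 is less than 1).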
  • Page 249 Jcc—Jump if Condition Is Met (Continued) Real Address Mode Exceptions #GP If the offset being jumped to is beyond the limits of the CS segment or is outside of the effective address space from 0 to FFFFH. This condition can occur if a 32-bit address-size override prefix is used. Virtual 8086 Mode Exceptions #GP(0) If the offset being jumped to is beyond the limits of the CS segment...
  • Page 250 JMP—Jump ... Results in an IA-32_Intercept(Gate) in the Itanium System Environment. A task switch can only be executed in protected mode (see Chapter 6 in the Intel Architecture Software Developer's Manual, Volume 3, for information on task switching with the JMP instruction).
  • Page 251 JMP—Jump (Continued) One form of the JMP instruction allows the jump to be made directly to a TSS, without going through a task gate. See Chapter 13 in the Intel Architecture Software Developer's Manual, Volume 3, for detailed information on the mechanics of a task switch.
  • Page 252 JMP—Jump (Continued) Operation IF near jump THEN IF near relative jump THEN tempEIP ← EIP + DEST; (* EIP is instruction following JMP instruction *) ELSE (* near absolute jump *) tempEIP ← DEST; IF tempEIP is beyond code segment limit THEN #GP(0); FI; IF OperandSize = 32 THEN EIP ←...
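The near-relative case is simple address arithmetic. DEST is a signed displacement added to the EIP of the instruction that follows the JMP, and a 16-bit operand size truncates the result. The following is a minimal C sketch with a hypothetical function name. It omits the code-segment limit check.

```c
#include <stdint.h>

/* Hypothetical model of a near relative JMP target. */
static uint32_t jmp_rel_target(uint32_t jmp_addr, uint32_t jmp_len,
                               int32_t disp, int operand_size)
{
    uint32_t next_eip = jmp_addr + jmp_len;    /* EIP of the following instruction */
    uint32_t temp = next_eip + (uint32_t)disp; /* modulo 2^32 */
    if (operand_size == 16)
        temp &= 0x0000FFFFu;
    return temp;
}
```

For example, the 2-byte `EB FE` (JMP rel8 with displacement −2) at 1000H targets 1000H itself, the familiar "jump to self" loop.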
  • Page 253 JMP—Jump (Continued) CONFORMING-CODE-SEGMENT: IF DPL > CPL THEN #GP(segment selector); FI; IF segment not present THEN #NP(segment selector); FI; tempEIP ← DEST(offset); IF OperandSize=16 THEN tempEIP ← tempEIP AND 0000FFFFH; IF tempEIP not in code segment limit THEN #GP(0); FI; CS ←...
  • Page 254 JMP—Jump (Continued) END; TASK-GATE: IF task gate DPL < CPL OR task gate DPL < task gate segment-selector RPL THEN #GP(task gate selector); FI; IF task gate not present THEN #NP(gate selector); FI; IF Itanium System Environment THEN IA-32_Intercept(Gate,JMP); Read the TSS segment selector in the task-gate descriptor; IF TSS segment selector local/global bit is set to local OR index not within GDT limits OR TSS descriptor specifies that the TSS is busy...
  • Page 255 JMP—Jump (Continued) If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector. #GP(selector) If segment selector index is outside descriptor table limits. If the segment descriptor pointed to by the segment selector in the destination operand is not for a conforming-code segment, nonconforming-code segment, call gate, task gate, or task state segment.
  • Page 256 JMPE—Jump to Intel® Itanium® Instruction Set Opcode Instruction Description 0F 00 /6 JMPE r/m16 Jump to Intel Itanium instruction set, indirect address specified by r/m16 0F 00 /6 JMPE r/m32 Jump to Intel Itanium instruction set, indirect address specified by r/m32 0F B8...
  • Page 257 JMPE—Jump to Intel® Itanium® Instruction Set (Continued) Operation IF(NOT Itanium System Environment) { IF (PSR.cpl==0) Terminate_IA-32_System_Env(); ELSE IA_32_Exception(IllegalOpcode); } ELSE IF(PSR.di==1) { Disabled_Instruction_Set_Transition_Fault(); } ELSE IF(pending_numeric_exceptions()) { IA_32_exception(FPError); } ELSE { IF(absolute_form) { //compute virtual target IP{31:0} = disp16/32 + AR[CSD].base; //disp is 16/32-bit unsigned value } ELSE IF(indirect_form) { IP{31:0} = [r/m16/32] + AR[CSD].base;...
  • Page 258 LAHF—Load Status Flags into AH Register Opcode Instruction Description 9F LAHF Load: AH = EFLAGS(SF:ZF:0:AF:0:PF:1:CF) Description Moves the low byte of the EFLAGS register (which includes status flags SF, ZF, AF, PF, and CF) to the AH register. Reserved bits 1, 3, and 5 of the EFLAGS register are set in the AH register as shown in the "Operation"...
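The AH = SF:ZF:0:AF:0:PF:1:CF layout reduces to one mask and one OR. This is a sketch with a hypothetical function name. EFLAGS bits 7, 6, 4, 2, and 0 are copied, bit 1 reads as 1, and bits 5 and 3 read as 0.

```c
#include <stdint.h>

/* Hypothetical model of the byte LAHF places in AH. */
static uint8_t lahf(uint32_t eflags)
{
    return (uint8_t)((eflags & 0xD5u) | 0x02u); /* keep SF, ZF, AF, PF, CF; force bit 1 */
}
```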
  • Page 259 LAR—Load Access Rights Byte Opcode Instruction Description 0F 02 /r LAR r16,r/m16 r16 ← r/m16 masked by FF00H 0F 02 /r LAR r32,r/m32 r32 ← r/m32 masked by 00FxFF00H Description Loads the access rights from the segment descriptor specified by the second operand (source operand) into the first operand (destination operand) and sets the ZF flag in the EFLAGS register.
  • Page 260: LAR Descriptor Validity

    LAR—Load Access Rights Byte (Continued) Table 2-15. LAR Descriptor Validity (columns: Type, Name, Valid). Type names: Reserved, Available 16-bit TSS, Busy 16-bit TSS, 16-bit call gate, 16-bit/32-bit task gate, 16-bit trap gate, 16-bit interrupt gate, Reserved, Available 32-bit TSS, Reserved, Busy 32-bit TSS, 32-bit call gate, Reserved, 32-bit trap gate...
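The validity test and the 00FxFF00H mask can be combined into one C sketch of the 32-bit LAR result. Its assumptions come from the IA-32 manuals rather than this flattened table. The valid system types are 1, 2 (LDT), 3, 4, 5, 9, B, and C. Code and data segments are accepted because the DPL/CPL/RPL checks LAR also performs are omitted. The undefined "x" nibble is cleared. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of LAR r32 on an 8-byte descriptor; returns the ZF outcome. */
static bool lar32(uint64_t descriptor, uint32_t *access_rights)
{
    uint32_t high = (uint32_t)(descriptor >> 32);      /* second doubleword */
    bool is_code_or_data = (high >> 12) & 1u;          /* S flag */
    unsigned type = (high >> 8) & 0xFu;
    const uint16_t valid_system_types = 0x1A3E;        /* types 1-5, 9, B, C */
    if (!is_code_or_data && !((valid_system_types >> type) & 1u))
        return false;                                  /* ZF cleared, destination unchanged */
    *access_rights = high & 0x00F0FF00u;               /* 00FxFF00H with x cleared */
    return true;                                       /* ZF set */
}
```

A flat 4 GB code descriptor (00CF9A000000FFFFH) yields 00C09A00H, and a 32-bit interrupt gate (type E) is rejected.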
  • Page 261 LAR—Load Access Rights Byte (Continued) Protected Mode Exceptions #GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.
  • Page 262 LDS/LES/LFS/LGS/LSS—Load Far Pointer Opcode Instruction Description C5 /r LDS r16,m16:16 Load DS:r16 with far pointer from memory C5 /r LDS r32,m16:32 Load DS:r32 with far pointer from memory 0F B2 /r LSS r16,m16:16 Load SS:r16 with far pointer from memory 0F B2 /r LSS r32,m16:32 Load SS:r32 with far pointer from memory...
  • Page 263 LDS/LES/LFS/LGS/LSS—Load Far Pointer (Continued) SS ← SegmentDescriptor([SRC]); ELSE IF DS, ES, FS, or GS is loaded with non-null segment selector THEN IF Segment selector index is not within descriptor table limits OR Access rights indicate segment neither data nor readable code segment OR (Segment is data or nonconforming-code segment AND both RPL and CPL >...
  • Page 264 LDS/LES/LFS/LGS/LSS—Load Far Pointer (Continued) If the DS, ES, FS, or GS register is being loaded with a non-null segment selector and any of the following is true: the segment selector index is not within descriptor table limits, the segment is neither a data nor a readable code segment, or the segment is a data or nonconforming-code segment and both RPL and CPL are greater than DPL.
  • Page 265: LEA Address and Operand Sizes

    LEA—Load Effective Address Opcode Instruction Description 8D /r LEA r16,m Store effective address for m in register r16 8D /r LEA r32,m Store effective address for m in register r32 Description Computes the effective address of the second operand (the source operand) and stores it in the first operand (destination operand).
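LEA only computes an address and never touches memory. The C sketch below models base + index∗scale + disp addressing together with the address-size/operand-size combinations covered by the "LEA Address and Operand Sizes" table. Its assumptions follow the IA-32 manuals: the address is computed at the address size, then truncated to 16 bits or zero-extended to match the operand size. Parameter names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of LEA: returns the value written to the destination register. */
static uint32_t lea(uint32_t base, uint32_t index, unsigned scale, int32_t disp,
                    int address_size, int operand_size)
{
    uint32_t ea = base + index * scale + (uint32_t)disp; /* wraps modulo 2^32 */
    if (address_size == 16)
        ea &= 0xFFFFu;  /* 16-bit effective address, zero-extended if stored in r32 */
    if (operand_size == 16)
        ea &= 0xFFFFu;  /* only the low 16 bits are stored */
    return ea;
}
```

For example, `LEA EAX, [EBX+ECX*4+8]` with EBX = 1000H and ECX = 3 stores 1014H, and no flags change.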
  • Page 266 LEA—Load Effective Address (Continued) Flags Affected None. Additional Itanium System Environment Exceptions Itanium Reg Faults: NaT Register Consumption Abort. Protected Mode Exceptions #UD If source operand is not a memory location. Real Address Mode Exceptions #UD If source operand is not a memory location. Virtual 8086 Mode Exceptions #UD If source operand is not a memory location.
  • Page 267 LEAVE—High Level Procedure Exit Opcode Instruction Description C9 LEAVE Set SP to BP, then pop BP C9 LEAVE Set ESP to EBP, then pop EBP Description Executes a return from a procedure or group of nested procedures established by an earlier ENTER instruction. The instruction copies the frame pointer (in the EBP register) into the stack pointer register (ESP), releasing the stack space used by a procedure for its local variables.
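LEAVE's two steps can be modeled on a toy stack. This is a sketch with hypothetical names, and it assumes that stack memory is an array of 32-bit slots indexed by address/4. ESP is set to EBP, which releases the locals, and then the saved EBP is popped.

```c
#include <stdint.h>

struct cpu { uint32_t esp, ebp; };

/* Hypothetical model of 32-bit LEAVE. */
static void leave32(struct cpu *c, const uint32_t *stack_mem)
{
    c->esp = c->ebp;                    /* release the local-variable area */
    c->ebp = stack_mem[c->esp / 4];     /* pop the caller's frame pointer */
    c->esp += 4;
}
```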
  • Page 268 LEAVE—High Level Procedure Exit (Continued) Real Address Mode Exceptions #GP If the EBP register points to a location outside of the effective address space from 0 to 0FFFFH. Virtual 8086 Mode Exceptions #GP(0) If the EBP register points to a location outside of the effective address space from 0 to 0FFFFH.
  • Page 269 LES—Load Full Pointer See entry for LDS/LES/LFS/LGS/LSS.
  • Page 270 LFS—Load Full Pointer See entry for LDS/LES/LFS/LGS/LSS.
  • Page 271 LGDT/LIDT—Load Global/Interrupt Descriptor Table Register Opcode Instruction Description 0F 01 /2 LGDT m16&32 Load m into GDTR 0F 01 /3 LIDT m16&32 Load m into IDTR Description Loads the values in the source operand into the global descriptor table register (GDTR) or the interrupt descriptor table register (IDTR).
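The memory operand LGDT/LIDT reads is a 6-byte pseudo-descriptor: a 16-bit limit followed by a 32-bit base, both little-endian. The following is a minimal C sketch with hypothetical names. One assumption comes from the IA-32 manuals rather than this page: with a 16-bit operand size, only the low 24 bits of the base are used.

```c
#include <stdint.h>

struct table_reg { uint16_t limit; uint32_t base; };

/* Hypothetical decoder for the 6-byte LGDT/LIDT memory operand. */
static struct table_reg load_pseudo_descriptor(const uint8_t m[6], int operand_size)
{
    struct table_reg r;
    r.limit = (uint16_t)(m[0] | (m[1] << 8));
    r.base = (uint32_t)m[2] | ((uint32_t)m[3] << 8) |
             ((uint32_t)m[4] << 16) | ((uint32_t)m[5] << 24);
    if (operand_size == 16)
        r.base &= 0x00FFFFFFu;  /* high base byte filled with zeros */
    return r;
}
```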
  • Page 272 LGDT/LIDT—Load Global/Interrupt Descriptor Table Register (Continued) Additional Itanium System Environment Exceptions IA-32_Intercept Mandatory Instruction Intercept for LIDT and LGDT Protected Mode Exceptions #UD If source operand is not a memory location. #GP(0) If the current privilege level is not 0. If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
  • Page 273 LGS—Load Full Pointer See entry for LDS/LES/LFS/LGS/LSS.
  • Page 274 LLDT—Load Local Descriptor Table Register Opcode Instruction Description 0F 00 /2 LLDT r/m16 Load segment selector r/m16 into LDTR Description Loads the source operand into the segment selector field of the local descriptor table register (LDTR). The source operand (a general-purpose register or a memory location) contains a segment selector that points to a local descriptor table (LDT).
  • Page 275 LLDT—Load Local Descriptor Table Register (Continued) #GP(selector) If the selector operand does not point into the Global Descriptor Table or if the entry in the GDT is not a Local Descriptor Table. Segment selector is beyond GDT limit. #SS(0) If a memory operand effective address is outside the SS segment limit.
  • Page 276 LIDT—Load Interrupt Descriptor Table Register See entry for LGDT/LIDT—Load Global Descriptor Table Register/Load Interrupt Descriptor Table Register.
  • Page 277 LMSW—Load Machine Status Word ...CPL 0. This instruction is provided for compatibility with the Intel 286 processor; programs and procedures intended to run on processors more recent than the Intel 286 should use the MOV (control registers) instruction to load the machine status word.
  • Page 278 LMSW—Load Machine Status Word (Continued) Real Address Mode Exceptions #GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. Virtual 8086 Mode Exceptions #GP(0) If the current privilege level is not 0. If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
  • Page 279 LOCK—Assert LOCK# Signal Prefix Opcode Instruction Description F0 LOCK Asserts LOCK# signal for duration of the accompanying instruction Description Causes the processor's LOCK# signal to be asserted during execution of the accompanying instruction (turns the instruction into an atomic instruction). In a multiprocessor environment, the LOCK# signal ensures that the processor has exclusive use of any shared memory while the signal is asserted.
  • Page 280 LOCK—Assert LOCK# Signal Prefix (Continued) Real Address Mode Exceptions #UD If the LOCK prefix is used with an instruction not listed in the "Description" section above. Other exceptions can be generated by the instruction that the LOCK prefix is being applied to. Virtual 8086 Mode Exceptions #UD If the LOCK prefix is used with an instruction not listed in the "Description"...
  • Page 281 LODS/LODSB/LODSW/LODSD—Load String Operand Opcode Instruction Description AC LODS DS:(E)SI Load byte at address DS:(E)SI into AL AD LODS DS:SI Load word at address DS:SI into AX AD LODS DS:ESI Load doubleword at address DS:ESI into EAX AC LODSB Load byte at address DS:(E)SI into AL AD LODSW Load word at address DS:SI into AX AD LODSD...
  • Page 282 LODS/LODSB/LODSW/LODSD—Load String Operand (Continued) THEN IF DF = 0 THEN ESI ← ESI + 4; ELSE ESI ← ESI – 4; Flags Affected None. Additional Itanium System Environment Exceptions Itanium Reg Faults: NaT Register Consumption Abort. Itanium Mem Faults: VHPT Data Fault, Nested TLB Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault...
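The doubleword case of the operation above can be modeled in C. This is a sketch with a hypothetical function name, and it assumes that DS is a flat byte array indexed by offset. It loads four little-endian bytes at DS:ESI into EAX, then steps ESI by 4 forward (DF = 0) or backward (DF = 1).

```c
#include <stdint.h>

/* Hypothetical model of LODSD. */
static uint32_t lodsd(const uint8_t *ds, uint32_t *esi, int df)
{
    const uint8_t *p = ds + *esi;
    uint32_t eax = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                   ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    *esi = df == 0 ? *esi + 4u : *esi - 4u;  /* direction flag selects the step */
    return eax;
}
```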
  • Page 283 LOOP/LOOPcc—Loop According to ECX Counter Opcode Instruction Description E2 cb LOOP rel8 Decrement count; jump short if count ≠ 0 E1 cb LOOPE rel8 Decrement count; jump short if count ≠ 0 and ZF=1 E1 cb LOOPZ rel8 Decrement count; jump short if count ≠ 0 and ZF=1 Decrement count;...
  • Page 284 LOOP/LOOPcc—Loop According to ECX Counter (Continued) IF (instruction = LOOPNE) OR (instruction = LOOPNZ) THEN IF (ZF = 0) AND (Count ≠ 0) THEN BranchCond ← 1; ELSE BranchCond ← 0; ELSE (* instruction = LOOP *) IF (Count ≠ 0) THEN BranchCond ←...
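The branch decision above can be expressed as a small C function. This is a sketch with hypothetical names. The count is decremented first without changing any flags, and only then is it tested together with ZF for the conditional forms. A count of 0 therefore wraps to FFFFFFFFH and the loop body runs 2³² times.

```c
#include <stdbool.h>
#include <stdint.h>

enum loop_kind { LOOP, LOOPE, LOOPNE };  /* LOOPZ = LOOPE, LOOPNZ = LOOPNE */

/* Hypothetical model: returns BranchCond after decrementing ECX. */
static bool loop_step(enum loop_kind kind, uint32_t *ecx, bool zf)
{
    *ecx -= 1u;                          /* Count <- Count - 1 (flags unchanged) */
    switch (kind) {
    case LOOPE:  return zf && *ecx != 0;
    case LOOPNE: return !zf && *ecx != 0;
    default:     return *ecx != 0;
    }
}
```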
  • Page 285 LSL—Load Segment Limit Opcode Instruction Description 0F 03 /r LSL r16,r/m16 Load: r16 ← segment limit, selector r/m16 0F 03 /r LSL r32,r/m32 Load: r32 ← segment limit, selector r/m32 Description Loads the unscrambled segment limit from the segment descriptor specified with the second operand (source operand) into the first operand (destination operand) and sets the ZF flag in the EFLAGS register.
  • Page 286 LSL—Load Segment Limit (Continued) Type Name Valid Reserved Available 16-bit TSS Busy 16-bit TSS 16-bit call gate 16-bit/32-bit task gate 16-bit trap gate 16-bit interrupt gate Reserved Available 32-bit TSS Reserved Busy 32-bit TSS 32-bit call gate Reserved 32-bit trap gate 32-bit interrupt gate Operation IF SRC(Offset) >...
  • Page 287 LSL—Load Segment Limit (Continued) Additional Itanium System Environment Exceptions Itanium Reg Faults: NaT Register Consumption Abort. Itanium Mem Faults: VHPT Data Fault, Nested TLB Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault. Protected Mode Exceptions...
  • Page 288 LSS—Load Full Pointer See entry for LDS/LES/LFS/LGS/LSS. Volume 4: Base IA-32 Instruction Reference 4:281...
  • Page 289 LTR—Load Task Register Opcode Instruction Description 0F 00 /3 LTR r/m16 Load r/m16 into TR Description Loads the source operand into the segment selector field of the task register. The source operand (a general-purpose register or a memory location) contains a segment selector that points to a task state segment (TSS).
  • Page 290 LTR—Load Task Register (Continued) #GP(selector) If the source selector points to a segment that is not a TSS or to one for a task that is already busy. If the selector points to LDT or is beyond the GDT limit. #NP(selector) If the TSS is marked not present.
  • Page 291 MOV—Move Opcode Instruction Description 88 /r MOV r/m8,r8 Move r8 to r/m8 89 /r MOV r/m16,r16 Move r16 to r/m16 89 /r MOV r/m32,r32 Move r32 to r/m32 8A /r MOV r8,r/m8 Move r/m8 to r8 8B /r MOV r16,r/m16 Move r/m16 to r16 8B /r MOV r32,r/m32...
  • Page 292 MOV—Move (Continued) If the destination operand is a segment register (DS, ES, FS, GS, or SS), the source operand must be a valid segment selector. In protected mode, moving a segment selector into a segment register automatically causes the segment descriptor information associated with that segment selector to be loaded into the hidden (shadow) part of the segment register.
  • Page 293 MOV—Move (Continued) SS segment selector; SS segment descriptor; IF DS, ES, FS or GS is loaded with non-null selector; THEN IF segment selector index is outside descriptor table limits OR segment is not a data or readable code segment OR ((segment is a data or nonconforming code segment) AND (both RPL and CPL > DPL)) THEN #GP(selector);...
  • Page 294 MOV—Move (Continued) If the DS, ES, FS, or GS register is being loaded and the segment pointed to is a data or nonconforming code segment, but both the RPL and the CPL are greater than the DPL. #SS(0) If a memory operand effective address is outside the SS segment limit.
  • Page 295 CR0 and PGE, PSE, and PAE in register CR4), all TLB entries are flushed, including global entries. This operation is implementation specific for the Pentium Pro processor. Software should not depend on this functionality in future Intel architecture processors.
  • Page 296 MOV—Move to/from Control Registers (Continued) Flags Affected The OF, SF, ZF, AF, PF, and CF flags are undefined. Additional Itanium System Environment Exceptions IA-32_Intercept Move To CR#, Mandatory Instruction Intercept. Move From CR#, read the virtualized control register values, CR0{15:6} return zeros. Protected Mode Exceptions #GP(0) If the current privilege level is not 0.
  • Page 297 DR6 and DR7) to a general-purpose register or vice versa. The operand size for these instructions is always 32 bits, regardless of the operand-size attribute. (See the Intel Architecture Software Developer’s Manual, Volume 3 for a detailed description of the flags and fields in the debug registers.)
  • Page 298 MOV—Move to/from Debug Registers (Continued) If any debug register is accessed while the GD flag in debug register DR7 is set. Real Address Mode Exceptions If the DE (debug extensions) bit of CR4 is set and a MOV instruction is executed involving DR4 or DR5. If any debug register is accessed while the GD flag in debug register DR7 is set.
  • Page 299 MOVS/MOVSB/MOVSW/MOVSD—Move Data from String to String Opcode Instruction Description MOVS ES:(E)DI, DS:(E)SI Move byte at address DS:(E)SI to address ES:(E)DI MOVS ES:DI,DS:SI Move word at address DS:SI to address ES:DI MOVS ES:EDI, DS:ESI Move doubleword at address DS:ESI to address ES:EDI MOVSB Move byte at address DS:(E)SI to address ES:(E)DI MOVSW...
  • Page 300 MOVS/MOVSB/MOVSW/MOVSD—Move Data from String to String (Continued) ELSE (* doubleword move *) THEN IF DF = 0 THEN EDI ← EDI + 4; ELSE EDI ← EDI - 4; Flags Affected None. Additional Itanium System Environment Exceptions Itanium Reg Faults: NaT Register Consumption Abort. Itanium Mem Faults: VHPT Data Fault, Nested TLB Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault...
  • Page 301 MOVSX—Move with Sign-Extension Opcode Instruction Description 0F BE /r MOVSX r16,r/m8 Move byte to word with sign-extension 0F BE /r MOVSX r32,r/m8 Move byte to doubleword, sign-extension 0F BF /r MOVSX r32,r/m16 Move word to doubleword, sign-extension Description Copies the contents of the source operand (register or memory location) to the destination operand (register) and sign extends the value to 16 or 32 bits.
  • Page 302 MOVZX—Move with Zero-Extend Opcode Instruction Description 0F B6 /r MOVZX r16,r/m8 Move byte to word with zero-extension 0F B6 /r MOVZX r32,r/m8 Move byte to doubleword, zero-extension 0F B7 /r MOVZX r32,r/m16 Move word to doubleword, zero-extension Description Copies the contents of the source operand (register or memory location) to the destination operand (register) and zero extends the value to 16 or 32 bits.
  • Page 303 MOVZX—Move with Zero-Extend (Continued) Virtual 8086 Mode Exceptions #GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. #SS(0) If a memory operand effective address is outside the SS segment limit. #PF(fault-code) If a page fault occurs.
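MOVSX and MOVZX differ only in what fills the upper bits of the destination. The C helpers below are an illustrative sketch of the byte and word forms with a 32-bit destination; the function names are hypothetical.

```c
#include <stdint.h>

/* MOVSX r32,r/m8: copy bit 7 of the source into bits 31..8. */
uint32_t movsx32_8(uint8_t src)
{
    return (src & 0x80u) ? (0xFFFFFF00u | src) : (uint32_t)src;
}

/* MOVZX r32,r/m8: bits 31..8 are always cleared. */
uint32_t movzx32_8(uint8_t src)
{
    return (uint32_t)src;
}

/* MOVSX r32,r/m16: copy bit 15 of the source into bits 31..16. */
uint32_t movsx32_16(uint16_t src)
{
    return (src & 0x8000u) ? (0xFFFF0000u | src) : (uint32_t)src;
}

/* MOVZX r32,r/m16: bits 31..16 are always cleared. */
uint32_t movzx32_16(uint16_t src)
{
    return (uint32_t)src;
}
```

For example, a source byte of 80H becomes FFFFFF80H under MOVSX and 00000080H under MOVZX.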
  • Page 304 MUL—Unsigned Multiplication of AL, AX, or EAX Opcode Instruction Description F6 /4 MUL r/m8 Unsigned multiply (AX ← AL * r/m8) F7 /4 MUL r/m16 Unsigned multiply (DX:AX ← AX * r/m16) F7 /4 MUL r/m32 Unsigned multiply (EDX:EAX ← EAX * r/m32) Description Performs an unsigned multiplication of the first operand (destination operand) and the...
  • Page 305 MUL—Unsigned Multiplication of AL, AX, or EAX (Continued) Protected Mode Exceptions #GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. If the DS, ES, FS, or GS register contains a null segment selector. #SS(0) If a memory operand effective address is outside the SS segment limit.
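A sketch of the 16-bit form, DX:AX ← AX * r/m16, in C. The struct and function names are hypothetical. The flag rule it encodes (OF and CF set when the upper half of the product is non-zero, cleared otherwise) comes from the full MUL entry, which is truncated in this summary.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint16_t ax, dx;   /* low and high words of the product */
    bool cf, of;
} Mul16Result;

Mul16Result mul16(uint16_t ax, uint16_t src)
{
    /* Widen before multiplying: uint16_t operands would otherwise be
       promoted to int, and FFFFH * FFFFH overflows a 32-bit int. */
    uint32_t product = (uint32_t)ax * (uint32_t)src;
    Mul16Result r;
    r.ax = (uint16_t)(product & 0xFFFFu);
    r.dx = (uint16_t)(product >> 16);
    r.cf = r.of = (r.dx != 0);   /* set only if the high half is used */
    return r;
}
```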
  • Page 306 NEG—Two's Complement Negation Opcode Instruction Description F6 /3 NEG r/m8 Two's complement negate r/m8 F7 /3 NEG r/m16 Two's complement negate r/m16 F7 /3 NEG r/m32 Two's complement negate r/m32 Description Replaces the value of operand (the destination operand) with its two's complement. The destination operand is located in a general-purpose register or a memory location.
  • Page 307 NEG—Two's Complement Negation (Continued) Virtual 8086 Mode Exceptions #GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. #SS(0) If a memory operand effective address is outside the SS segment limit. #PF(fault-code) If a page fault occurs.
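In C, NEG on a 32-bit operand reduces to unsigned subtraction from zero. This sketch also models the carry flag (cleared only when the operand is 0, per the full NEG entry); the other arithmetic flags are omitted, and `neg32` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* NEG r/m32: DEST <- 0 - DEST. */
uint32_t neg32(uint32_t dest, bool *cf)
{
    *cf = (dest != 0);     /* CF = 0 only for a zero operand */
    return 0u - dest;      /* unsigned wrap-around yields the two's complement */
}
```

Negating 80000000H returns 80000000H, because the most negative 32-bit value has no positive counterpart.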
  • Page 308 NOP—No Operation Opcode Instruction Description No operation Description Performs no operation. This instruction is a one-byte instruction that takes up space in the instruction stream but does not affect the machine context, except the EIP register. The NOP instruction performs no operation: no registers are accessed and no faults are generated.
  • Page 309 NOT—One's Complement Negation Opcode Instruction Description F6 /2 NOT r/m8 Reverse each bit of r/m8 F7 /2 NOT r/m16 Reverse each bit of r/m16 F7 /2 NOT r/m32 Reverse each bit of r/m32 Description Performs a bitwise NOT operation (1’s complement) on the destination operand and stores the result in the destination operand location.
  • Page 310 NOT—One's Complement Negation (Continued) Virtual 8086 Mode Exceptions #GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. #SS(0) If a memory operand effective address is outside the SS segment limit. #PF(fault-code) If a page fault occurs.
  • Page 311 OR—Logical Inclusive OR Opcode Instruction Description 0C ib OR AL,imm8 AL OR imm8 0D iw OR AX,imm16 AX OR imm16 0D id OR EAX,imm32 EAX OR imm32 80 /1 ib OR r/m8,imm8 r/m8 OR imm8 81 /1 iw OR r/m16,imm16 r/m16 OR imm16 81 /1 id OR r/m32,imm32 r/m32 OR imm32...
  • Page 312 OR—Logical Inclusive OR (Continued) Protected Mode Exceptions #GP(0) If the destination operand points to a nonwritable segment. If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. If the DS, ES, FS, or GS register contains a null segment selector. #SS(0) If a memory operand effective address is outside the SS segment limit.
  • Page 313 OUT—Output to Port Opcode Instruction Description E6 ib OUT imm8, AL Output byte AL to imm8 I/O port address E7 ib OUT imm8, AX Output word AX to imm8 I/O port address E7 ib OUT imm8, EAX Output doubleword EAX to imm8 I/O port address OUT DX, AL Output byte AL to I/O port address in DX OUT DX, AX...
  • Page 314 OUT—Output to Port (Continued) (* or virtual-8086 mode with all I/O permission bits for I/O port cleared *) IF (Itanium_System_Environment) THEN DEST_VA = IOBase | (Port{15:2}<<12) | Port{11:0}; DEST_PA = translate(DEST_VA); [DEST_PA] ← SRC; (* Writes to selected I/O port *) memory_fence();...
  • Page 315 OUTS/OUTSB/OUTSW/OUTSD—Output String to Port Opcode Instruction Description OUTS DX, DS:(E)SI Output byte at address DS:(E)SI to I/O port in DX OUTS DX, DS:SI Output word at address DS:SI to I/O port in DX OUTS DX, DS:ESI Output doubleword at address DS:ESI to I/O port in DX OUTSB Output byte at address DS:(E)SI to I/O port in DX OUTSW...
  • Page 316 OUTS/OUTSB/OUTSW/OUTSD—Output String to Port (Continued) In the Itanium System Environment, I/O port references are mapped into the 64-bit virtual address pointed to by the IOBase register, with four ports per 4K-byte virtual page. Operating systems can utilize TLBs in the Itanium architecture to grant or deny permission to any four I/O ports.
  • Page 317 OUTS/OUTSB/OUTSW/OUTSD—Output String to Port (Continued) Flags Affected None. Additional Itanium System Environment Exceptions Itanium Reg Faults: NaT Register Consumption Abort. Itanium Mem Faults: VHPT Data Fault, Nested TLB Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault. IA_32_Exception...
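The address computation from the OUT operation, DEST_VA = IOBase | (Port{15:2}<<12) | Port{11:0}, can be written directly in C. This is a sketch: `io_port_va` is a hypothetical helper, the IOBase value is supplied by the caller, and the translation to DEST_PA and the memory fence are not modelled.

```c
#include <stdint.h>

/* Virtual address used for an I/O port access in the Itanium System
   Environment. io_base is assumed to be supplied by the caller. */
uint64_t io_port_va(uint64_t io_base, uint16_t port)
{
    uint64_t page   = (uint64_t)(port >> 2) << 12;  /* Port{15:2} picks the 4K page */
    uint64_t offset = (uint64_t)(port & 0x0FFFu);   /* Port{11:0} is the in-page offset */
    return io_base | page | offset;
}
```

With an IOBase of 10000000H (chosen here purely for illustration), port 3F8H maps to 100FE3F8H, and ports 3F8H through 3FBH all fall on the same 4K-byte page, consistent with four ports per page.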
  • Page 318 POP—Pop a Value from the Stack Opcode Instruction Description 8F /0 POP m16 Pop top of stack into m16; increment stack pointer 8F /0 POP m32 Pop top of stack into m32; increment stack pointer 58+ rw POP r16 Pop top of stack into r16; increment stack pointer 58+ rd POP r32 Pop top of stack into r32;...
  • Page 319 POP—Pop a Value from the Stack (Continued) This action allows sequential execution of POP SS and MOV ESP, EBP instructions without the danger of having an invalid stack during an interrupt. However, use of the LSS instruction is the preferred method of loading the SS and ESP registers. If the ESP register is used as a base register for addressing a destination operand in memory, the POP instruction computes the effective address of the operand after it increments the ESP register.
  • Page 320 POP—Pop a Value from the Stack (Continued) IF DS, ES, FS or GS is loaded with non-null selector; THEN IF segment selector index is outside descriptor table limits OR segment is not a data or readable code segment OR ((segment is a data or nonconforming code segment) AND (both RPL and CPL > DPL)) THEN #GP(selector);...
  • Page 321 POP—Pop a Value from the Stack (Continued) If the DS, ES, FS, or GS register is being loaded and the segment pointed to is not a data or readable code segment. If the DS, ES, FS, or GS register is being loaded and the segment pointed to is a data or nonconforming code segment, but both the RPL and the CPL are greater than the DPL.
  • Page 322 POPA/POPAD—Pop All General-Purpose Registers Opcode Instruction Description POPA Pop DI, SI, BP, BX, DX, CX, and AX POPAD Pop EDI, ESI, EBP, EBX, EDX, ECX, and EAX Description Pops doublewords (POPAD) or words (POPA) from the procedure stack into the general-purpose registers.
  • Page 323 POPA/POPAD—Pop All General-Purpose Registers (Continued) Additional Itanium System Environment Exceptions Itanium Reg Faults: NaT Register Consumption Abort. Itanium Mem Faults: VHPT Data Fault, Nested TLB Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault. Protected Mode Exceptions...
  • Page 324 POPF/POPFD—Pop Stack into EFLAGS Register Opcode Instruction Description POPF Pop top of stack into EFLAGS POPFD Pop top of stack into EFLAGS Description Pops a doubleword (POPFD) from the top of the stack (if the current operand-size attribute is 32) and stores the value in the EFLAGS register or pops a word from the top of the stack (if the operand-size attribute is 16) and stores it in the lower 16 bits of the EFLAGS register.
  • Page 325 POPF/POPFD—Pop Stack into EFLAGS Register (Continued) IF VM=0 (* Not in Virtual-8086 Mode *) THEN IF CPL=0 THEN IF OperandSize = 32; THEN EFLAGS ← Pop(); (* All non-reserved flags except VM, RF, VIP and VIF can be *) (* modified; *) ELSE (* OperandSize = 16 *) EFLAGS[15:0] ...
  • Page 326 POPF/POPFD—Pop Stack into EFLAGS Register (Continued) IF (Itanium System Environment AND (AC, TF != OLD_AC, OLD_TF)) THEN IA-32_Intercept(System_Flag,POPF); IF Itanium System Environment AND CFLG.ii AND IF != OLD_IF THEN IA-32_Intercept(System_Flag,POPF); Flags Affected All flags except the reserved bits. Additional Itanium System Environment Exceptions Itanium Reg Faults: NaT Register Consumption Abort.
  • Page 327 PUSH—Push Word or Doubleword Onto the Stack Opcode Instruction Description FF /6 PUSH r/m16 Push r/m16 FF /6 PUSH r/m32 Push r/m32 50+rw PUSH r16 Push r16 50+rd PUSH r32 Push r32 PUSH imm8 Push imm8 PUSH imm16 Push imm16 PUSH imm32 Push imm32 PUSH CS...
  • Page 328 PUSH—Push Word or Doubleword Onto the Stack (Continued) IF OperandSize = 16 THEN SP ← SP - 2; SS:SP ← SRC; (* push word *) ELSE (* OperandSize = 32 *) SP ← SP - 4; SS:SP ← SRC; (* push doubleword *) Flags Affected None.
  • Page 329 ESP register as it existed before the instruction was executed. (This is also true in the real-address and virtual-8086 modes.) For the Intel 8086 processor, the PUSH SP instruction pushes the new value of the SP register (that is the value after it has been decremented by 2).
  • Page 330 PUSHA/PUSHAD—Push All General-Purpose Registers Opcode Instruction Description PUSHA Push AX, CX, DX, BX, original SP, BP, SI, and DI PUSHAD Push EAX, ECX, EDX, EBX, original ESP, EBP, ESI, and EDI Description Push the contents of the general-purpose registers onto the procedure stack. The registers are stored on the stack in the following order: EAX, ECX, EDX, EBX, ESP (original value), EBP, ESI, and EDI (if the current operand-size attribute is 32) and AX, CX, DX, BX, SP (original value), BP, SI, and DI (if the operand-size attribute is 16).
  • Page 331 PUSHA/PUSHAD—Push All General-Purpose Registers (Continued) Flags Affected None. Additional Itanium System Environment Exceptions Itanium Reg Faults: NaT Register Consumption Abort. Itanium Mem Faults: VHPT Data Fault, Nested TLB Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault. Protected Mode Exceptions...
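A sketch of PUSHAD against an array-backed stack, showing that the value pushed for ESP is the one captured before the first push. The `Regs32` layout, the `stack` array indexed by ESP / 4, and `pushad` itself are hypothetical, and segment and limit checks are omitted.

```c
#include <stdint.h>

typedef struct {
    uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
} Regs32;

/* stack[] is indexed by ESP / 4; ESP must be at least 32 on entry. */
void pushad(Regs32 *r, uint32_t *stack)
{
    const uint32_t temp = r->esp;   /* ESP before the first push */
    const uint32_t order[8] = { r->eax, r->ecx, r->edx, r->ebx,
                                temp,   r->ebp, r->esi, r->edi };
    for (int i = 0; i < 8; i++) {
        r->esp -= 4;                /* decrement first, then store */
        stack[r->esp / 4] = order[i];
    }
}
```

After the call EAX sits at the highest address and EDI at the new top of stack, so POPAD can restore the registers in reverse order.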
  • Page 332 PUSHF/PUSHFD—Push EFLAGS Register onto the Stack Opcode Instruction Description PUSHF Push EFLAGS PUSHFD Push EFLAGS Description Decrement the stack pointer by 4 (if the current operand-size attribute is 32) and push the entire contents of the EFLAGS register onto the procedure stack, or decrement the stack pointer by 2 (if the operand-size attribute is 16) and push the lower 16 bits of the EFLAGS register onto the stack.
  • Page 333 PUSHF/PUSHFD—Push EFLAGS Register onto the Stack (Continued) THEN push(EFLAGS AND 0FCFFFFH); (* VM and RF EFLAGS bits are cleared in image stored on the stack *) ELSE push(EFLAGS); (* Lower 16 bits only *) ELSE (*IOPL < 3*) IF OperandSize = 32 OR CR4.VME = 0 THEN #GP(0);...
  • Page 334 RCL/RCR/ROL/ROR—Rotate Opcode Instruction Description D0 /2 RCL r/m8,1 Rotate 9 bits (CF,r/m8) left once D2 /2 RCL r/m8,CL Rotate 9 bits (CF,r/m8) left CL times C0 /2 ib RCL r/m8,imm8 Rotate 9 bits (CF,r/m8) left imm8 times D1 /2 RCL r/m16,1 Rotate 17 bits (CF,r/m16) left once D3 /2 RCL r/m16,CL...
  • Page 335 RCL/RCR/ROL/ROR—Rotate (Continued) Description Shifts (rotates) the bits of the first operand (destination operand) the number of bit positions specified in the second operand (count operand) and stores the result in the destination operand. The destination operand can be a register or a memory location; the count operand is an unsigned integer that can be an immediate or a value in the CL register.
  • Page 336 RCL/RCR/ROL/ROR—Rotate (Continued) DEST ← (DEST / 2) + (tempCF * 2^SIZE); tempCOUNT ← tempCOUNT - 1; IF COUNT = 1 THEN OF ← MSB(DEST) XOR MSB-1(DEST); ELSE OF is undefined; (* RCL instruction operation *) WHILE (tempCOUNT ≠ 0) ... tempCF ...
  • Page 337 If alignment checking is enabled and an unaligned memory reference is made. Intel Architecture Compatibility The 8086 does not mask the rotation count. All Intel architecture processors from the Intel386™ processor on do mask the rotation count in all operating modes.
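The count masking described above can be sketched in C for 32-bit ROL and ROR. The mask of 1FH (5 bits) is what the Intel386 and later processors apply to 32-bit rotates; the early return for a zero count avoids an undefined shift by 32 in C. Flag updates are omitted and the function names are hypothetical.

```c
#include <stdint.h>

uint32_t rol32(uint32_t dest, uint8_t count)
{
    unsigned n = count & 0x1Fu;      /* count masked to 5 bits */
    if (n == 0)
        return dest;                 /* C forbids shifting a 32-bit value by 32 */
    return (dest << n) | (dest >> (32 - n));
}

uint32_t ror32(uint32_t dest, uint8_t count)
{
    unsigned n = count & 0x1Fu;
    if (n == 0)
        return dest;
    return (dest >> n) | (dest << (32 - n));
}
```

A count of 36 therefore behaves like a count of 4, and a count of 32 leaves the operand unchanged.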
  • Page 338 RDMSR—Read from Model Specific Register Opcode Instruction Description 0F 32 RDMSR Load MSR specified by ECX into EDX:EAX Description Loads the contents of a 64-bit model specific register (MSR) specified in the ECX register into registers EDX:EAX. The EDX register is loaded with the high-order 32 bits of the MSR and the EAX register is loaded with the low-order 32 bits.
  • Page 339 The MSRs and the ability to read them with the RDMSR instruction were introduced into the Intel architecture with the Pentium processor. Execution of this instruction by an Intel architecture processor earlier than the Pentium processor results in an invalid opcode exception #UD.
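The EDX:EAX convention used by RDMSR (and likewise by RDTSC and RDPMC) amounts to splitting a 64-bit value into two 32-bit halves. The helpers below are a hypothetical sketch of that split and its inverse; they do not execute RDMSR, which requires CPL 0.

```c
#include <stdint.h>

typedef struct { uint32_t edx, eax; } EdxEax;

EdxEax split_edx_eax(uint64_t value)
{
    EdxEax r;
    r.edx = (uint32_t)(value >> 32);   /* high-order 32 bits */
    r.eax = (uint32_t)value;           /* low-order 32 bits */
    return r;
}

uint64_t join_edx_eax(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}
```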
  • Page 340 RDPMC—Read Performance-Monitoring Counters Opcode Instruction Description 0F 33 RDPMC Read performance-monitoring counter specified by ECX into EDX:EAX Description Loads the contents of the N-bit performance-monitoring counter specified in the ECX register into registers EDX:EAX. The EDX register is loaded with the high-order N-32 bits of the counter and the EAX register is loaded with the low-order 32 bits.
  • Page 341 RDPMC—Read Performance-Monitoring Counters (Continued) Flags Affected None. Additional Itanium System Environment Exceptions Itanium Reg Faults NaT Register Consumption Abort. #GP(0) If the current privilege level is not 0 and the selected PMD register’s PM bit is 1, or if PSR.sp is 1. Protected Mode Exceptions #GP(0) If the current privilege level is not 0 and the PCE flag in the CR4...
  • Page 342 Similarly, subsequent instructions may begin execution before the read operation is performed. This instruction was introduced into the Intel architecture in the Pentium processor. Operation IF (IA-32 System Environment) IF (CR4.TSD = 0) OR ((CR4.TSD = 1) AND (CPL=0))
  • Page 343 RDTSC—Read Time-Stamp Counter (Continued) Additional Itanium System Environment Exceptions Itanium Reg Faults NaT Register Consumption Abort. #GP(0) If PSR.si is 1 or CR4.TSD is 1 and the CPL is greater than 0. Protected Mode Exceptions #GP(0) If the TSD flag in register CR4 is set and the CPL is greater than 0. /*For the IA-32 System Environment only*/ Real Address Mode Exceptions If the TSD flag in register CR4 is set.
  • Page 344 REP/REPE/REPZ/REPNE /REPNZ—Repeat String Operation Prefix F3 6C REP INS r/m8, DX Input ECX bytes from port DX into ES:[EDI] F3 6D REP INS r/m16,DX Input ECX words from port DX into ES:[EDI] F3 6D REP INS r/m32,DX Input ECX doublewords from port DX into ES:[EDI] F3 A4 REP MOVS m8,m8 Move ECX bytes from DS:[ESI] to ES:[EDI]...
  • Page 345: Repeat Conditions

    REP/REPE/REPZ/REPNE /REPNZ—Repeat String Operation Prefix (Continued) All of these repeat prefixes cause the associated instruction to be repeated until the count in register ECX is decremented to 0 (see the following table). The REPE, REPNE, REPZ, and REPNZ prefixes also check the state of the ZF flag after each iteration and terminate the repeat loop if the ZF flag is not in the specified state.
  • Page 346 REP/REPE/REPZ/REPNE /REPNZ—Repeat String Operation Prefix (Continued) IF CountReg = 0 THEN exit WHILE loop IF (repeat prefix is REPZ or REPE) AND (ZF=0) OR (repeat prefix is REPNZ or REPNE) AND (ZF=1) THEN exit WHILE loop Flags Affected None; however, the CMPS and SCAS instructions do set the status flags in the EFLAGS register.
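The loop described above can be sketched for REPNE SCASB, the classic string-length idiom. This model assumes DF = 0, 32-bit addressing and a hypothetical `mem` buffer standing in for the ES segment; it tracks only ZF, whereas the real SCASB sets all the status flags. `ScanState` and `repne_scasb` are illustrative names.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t ecx, edi;
    uint8_t al;
    bool zf;
} ScanState;

void repne_scasb(ScanState *s, const uint8_t *mem)
{
    while (s->ecx != 0) {                 /* count checked before each iteration */
        s->zf = (s->al == mem[s->edi]);  /* compare AL with byte at ES:EDI */
        s->edi += 1;                      /* DF = 0: advance */
        s->ecx -= 1;
        if (s->zf)                        /* REPNE stops once ZF = 1 */
            break;
    }
}
```

Starting with ECX = FFFFFFFFH and AL = 0, the byte length of a NUL-terminated string can be recovered afterwards as (NOT ECX) - 1.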
  • Page 347 RET—Return from Procedure Opcode Instruction Description Near return to calling procedure Far return to calling procedure C2 iw RET imm16 Near return to calling procedure and pop imm16 bytes from stack CA iw RET imm16 Far return to calling procedure and pop imm16 bytes from stack Description Transfers program control to a return address located on the top of the stack.
  • Page 348 RET—Return from Procedure (Continued) Operation (* Near return *) IF instruction = near return THEN; IF OperandSize = 32 THEN IF top 12 bytes of stack not within stack limits THEN #SS(0); FI; EIP ← Pop(); ELSE (* OperandSize = 16 *) IF top 6 bytes of stack not within stack limits THEN #SS(0) tempEIP ...
  • Page 349 RET—Return from Procedure (Continued) IF second doubleword on stack is not within stack limits THEN #SS(0); FI; ELSE (* OperandSize = 16 *) IF second word on stack is not within stack limits THEN #SS(0); FI; IF return code segment selector is null THEN #GP(0); FI; IF return code segment selector addresses descriptor beyond descriptor table limit THEN #GP(selector);...
  • Page 350 RET—Return from Procedure (Continued) IF stack segment not present THEN #SS(StackSegmentSelector); FI; IF the return instruction pointer is not within the return code segment limit THEN #GP(0); FI; CPL ← ReturnCodeSegmentSelector(RPL); IF OperandSize=32 THEN EIP ← Pop(); CS ← Pop(); (* 32-bit pop, high-order 16-bits discarded *) (* segment descriptor information also loaded *) CS(RPL) ...
  • Page 351 RET—Return from Procedure (Continued) Additional Itanium System Environment Exceptions Itanium Mem Faults: VHPT Data Fault, Nested TLB Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault. IA_32_Exception Taken Branch Debug Exception if PSR.tb is 1...
  • Page 352 ROL/ROR—Rotate See entry for RCL/RCR/ROL/ROR.
  • Page 353 • Any illegal combination of bits in CR0, such as (PG=1 and PE=0) or (NW=1 and CD=0). • (Intel Pentium and Intel486 only.) The value stored in the state dump base field is not a 32-KByte aligned address. The contents of the model-specific registers are not affected by a return from SMM.
  • Page 354 SAHF—Store AH into Flags Opcode Instruction Clocks Description SAHF Loads SF, ZF, AF, PF, and CF from AH into EFLAGS register Description Loads the SF, ZF, AF, PF, and CF flags of the EFLAGS register with values from the corresponding bits in the AH register (bits 7, 6, 4, 2, and 0, respectively). Bits 1, 3, and 5 of register AH are ignored;...
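Because SF, ZF, AF, PF and CF sit at bits 7, 6, 4, 2 and 0 of EFLAGS, the same positions they occupy in AH, SAHF reduces to a masked merge. This is a sketch with a hypothetical function name; the mask D5H selects exactly those five bits, so the reserved bits 1, 3 and 5 keep whatever value EFLAGS already held.

```c
#include <stdint.h>

uint32_t sahf(uint32_t eflags, uint8_t ah)
{
    const uint32_t mask = 0xD5u;             /* bits 7, 6, 4, 2, 0 */
    return (eflags & ~mask) | (ah & mask);   /* other EFLAGS bits unchanged */
}
```

For example, with EFLAGS = 00000002H (only the always-set reserved bit 1), an AH of 40H sets just ZF, giving 00000042H.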
  • Page 355 SAL/SAR/SHL/SHR—Shift Instructions Opcode Instruction Description D0 /4 SAL r/m8,1 Multiply r/m8 by 2, once D2 /4 SAL r/m8,CL Multiply r/m8 by 2, CL times C0 /4 ib SAL r/m8,imm8 Multiply r/m8 by 2, imm8 times D1 /4 SAL r/m16,1 Multiply r/m16 by 2, once D3 /4 SAL r/m16,CL Multiply r/m16 by 2, CL times...
  • Page 356 SAL/SAR/SHL/SHR—Shift Instructions (Continued) Description Shift the bits in the first operand (destination operand) to the left or right by the number of bits specified in the second operand (count operand). Bits shifted beyond the destination operand boundary are first shifted into the CF flag, then discarded. At the end of the shift operation, the CF flag contains the last bit shifted out of the destination operand.
  • Page 357 SAL/SAR/SHL/SHR—Shift Instructions (Continued) IF instruction is SAL or SHL THEN CF ← MSB(DEST); ELSE (* instruction is SAR or SHR *) CF ← LSB(DEST); IF instruction is SAL or SHL THEN DEST ← DEST * 2; ELSE IF instruction is SAR THEN DEST ...
  • Page 358 If alignment checking is enabled and an unaligned memory reference is made. Intel Architecture Compatibility The 8086 does not mask the shift count. All Intel architecture processors from the Intel386 processor on do mask the shift count in all operating modes.
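To make the SAR/SHR distinction concrete, this C sketch shifts a 32-bit operand right with the count masked to 5 bits. SAR is written with an explicit sign fill rather than `>>` on a signed type, because right-shifting a negative value is implementation-defined in C. Flags are omitted and the function names are hypothetical.

```c
#include <stdint.h>

/* SHR: logical shift, zeros enter from the left. */
uint32_t shr32(uint32_t dest, uint8_t count)
{
    unsigned n = count & 0x1Fu;      /* count masked to 5 bits */
    return dest >> n;
}

/* SAR: arithmetic shift, copies of the sign bit enter from the left. */
uint32_t sar32(uint32_t dest, uint8_t count)
{
    unsigned n = count & 0x1Fu;
    uint32_t result = dest >> n;
    if ((dest & 0x80000000u) && n != 0)
        result |= ~(0xFFFFFFFFu >> n);   /* fill the top n bits with 1s */
    return result;
}
```

For 80000000H shifted right by 4, SHR yields 08000000H and SAR yields F8000000H.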
  • Page 359 SBB—Integer Subtraction with Borrow Opcode Instruction Description 1C ib SBB AL,imm8 Subtract with borrow imm8 from AL 1D iw SBB AX,imm16 Subtract with borrow imm16 from AX 1D id SBB EAX,imm32 Subtract with borrow imm32 from EAX 80 /3 ib SBB r/m8,imm8 Subtract with borrow imm8 from r/m8 81 /3 iw...
  • Page 360 SBB—Integer Subtraction with Borrow (Continued) Itanium Mem Faults: VHPT Data Fault, Nested TLB Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault. Protected Mode Exceptions #GP(0)
  • Page 361 SCAS/SCASB/SCASW/SCASD—Scan String Data Opcode Instruction Description SCAS ES:(E)DI Compare AL with byte at ES:(E)DI and set status flags SCAS ES:DI Compare AX with word at ES:DI and set status flags SCAS ES:EDI Compare EAX with doubleword at ES:EDI and set status flags SCASB Compare AL with byte at ES:(E)DI and set status flags SCASW...
  • Page 362 SCAS/SCASB/SCASW/SCASD—Scan String Data (Continued) THEN DI ← DI + 2; ELSE DI ← DI - 2; ELSE (* doubleword comparison *) temp ← EAX - SRC; SetStatusFlags(temp); THEN IF DF = 0 THEN EDI ← EDI + 4; ELSE EDI ← EDI - 4; Flags Affected The OF, SF, ZF, AF, PF, and CF flags are set according to the temporary result of the comparison.
  • Page 363 SETcc—Set Byte on Condition Opcode Instruction Description 0F 97 SETA r/m8 Set byte if above (CF=0 and ZF=0) 0F 93 SETAE r/m8 Set byte if above or equal (CF=0) 0F 92 SETB r/m8 Set byte if below (CF=1) 0F 96 SETBE r/m8 Set byte if below or equal (CF=1 or ZF=1) 0F 92...
  • Page 364 SETcc—Set Byte on Condition (Continued) Many of the SETcc instruction opcodes have alternate mnemonics. For example, the SETG (set byte if greater) and SETNLE (set if not less or equal) both have the same opcode and test for the same condition: ZF equals 0 and SF equals OF. These alternate mnemonics are provided to make code more intelligible.
  • Page 365 SETcc—Set Byte on Condition (Continued) Virtual 8086 Mode Exceptions #GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. #SS(0) If a memory operand effective address is outside the SS segment limit. #PF(fault-code) If a page fault occurs.
  • Page 366 SGDT/SIDT—Store Global/Interrupt Descriptor Table Register Opcode Instruction Description 0F 01 /0 SGDT m Store GDTR to m 0F 01 /1 SIDT m Store IDTR to m Description Stores the contents of the global descriptor table register (GDTR) or the interrupt descriptor table register (IDTR) in the destination operand.
  • Page 367 The 16-bit forms of the SGDT and SIDT instructions are compatible with the Intel 286 processor, if the upper 8 bits are not referenced. The Intel 286 processor fills these bits with 1s; the Pentium Pro processor fills these bits with 0s.
  • Page 368 SHL/SHR—Shift Instructions See entry for SAL/SAR/SHL/SHR.
  • Page 369 SHLD—Double Precision Shift Left Opcode Instruction Description 0F A4 SHLD r/m16,r16,imm8 Shift r/m16 to left imm8 places while shifting bits from r16 in from the right 0F A5 SHLD r/m16,r16,CL Shift r/m16 to left CL places while shifting bits from r16 in from the right 0F A4 SHLD r/m32,r32,imm8...
  • Page 370 SHLD—Double Precision Shift Left (Continued) BIT[DEST, i] ← BIT[SRC, i - COUNT + SIZE]; Flags Affected If the count is 1 or greater, the CF flag is filled with the last bit shifted out of the destination operand and the SF, ZF, and PF flags are set according to the value of the result.
  • Page 371 SHRD—Double Precision Shift Right Opcode Instruction Description 0F AC SHRD r/m16,r16,imm8 Shift r/m16 to right imm8 places while shifting bits from r16 in from the left 0F AD SHRD r/m16,r16,CL Shift r/m16 to right CL places while shifting bits from r16 in from the left 0F AC SHRD r/m32,r32,imm8...
  • Page 372 SHRD—Double Precision Shift Right (Continued) Flags Affected If the count is 1 or greater, the CF flag is filled with the last bit shifted out of the destination operand and the SF, ZF, and PF flags are set according to the value of the result.
  • Page 373 SIDT—Store Interrupt Descriptor Table Register See entry for SGDT/SIDT.
  • Page 374 SLDT—Store Local Descriptor Table Register Opcode Instruction Description 0F 00 /0 SLDT r/m16 Stores segment selector from LDTR in r/m16 0F 00 /0 SLDT r/m32 Store segment selector from LDTR in low-order 16 bits of r/m32; high-order 16 bits are undefined Description Stores the segment selector from the local descriptor table register (LDTR) in the destination operand.
  • Page 375 SLDT—Store Local Descriptor Table Register (Continued) Real Address Mode Exceptions The SLDT instruction is not recognized in real address mode. Virtual 8086 Mode Exceptions The SLDT instruction is not recognized in virtual 8086 mode.
  • Page 376 This instruction is provided for compatibility with the Intel 286 processor; programs and procedures intended to run on processors more recent than the Intel 286 should use the MOV (control registers) instruction to load the machine status word.
  • Page 377 SMSW—Store Machine Status Word (Continued) Real Address Mode Exceptions If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. Virtual 8086 Mode Exceptions #GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
  • Page 378 STC—Set Carry Flag Opcode Instruction Description Set CF flag Description Sets the CF flag in the EFLAGS register. Operation CF ← 1; Flags Affected The CF flag is set. The OF, ZF, SF, AF, and PF flags are unaffected. Exceptions (All Operating Modes) None.
  • Page 379 STD—Set Direction Flag Opcode Instruction Description Set DF flag Description Sets the DF flag in the EFLAGS register. When the DF flag is set to 1, string operations decrement the index registers (ESI and/or EDI). Operation DF ← 1; Flags Affected The DF flag is set.
  • Page 380 STI—Set Interrupt Flag Opcode Instruction Description Set interrupt flag; interrupts enabled at the end of the next instruction Description Sets the interrupt flag (IF) in the EFLAGS register. In the IA-32 System Environment, after the IF flag is set, the processor begins responding to external maskable interrupts after the next instruction is executed.
  • Page 381 STI—Set Interrupt Flag (Continued) IF CPL = 3 THEN IF IOPL < 3 THEN IF VIP = 0 THEN VIF ← 1; ELSE #GP(0); ELSE (* IOPL = 3 *) IF ← 1; ELSE (* CPL < 3 *) IF IOPL < CPL THEN #GP(0); FI; IF IOPL >= CPL OR IOPL = 3 THEN IF ← 1;...
  • Page 382 STI—Set Interrupt Flag (Continued) Real Address Mode Exceptions None. Virtual 8086 Mode Exceptions #GP(0) If the CPL is greater (has less privilege) than the IOPL of the current program or procedure.
  • Page 383 STOS/STOSB/STOSW/STOSD—Store String Data Opcode Instruction Description AA STOS ES:(E)DI Store AL at address ES:(E)DI AB STOS ES:DI Store AX at address ES:DI AB STOS ES:EDI Store EAX at address ES:EDI AA STOSB Store AL at address ES:(E)DI AB STOSW Store AX at address ES:DI AB STOSD Store EAX at address ES:EDI Description Stores a byte, word, or doubleword from the AL, AX, or EAX register, respectively, into...
  • Page 384 STOS/STOSB/STOSW/STOSD—Store String Data (Continued) DEST ← EAX; THEN IF DF = 0 THEN EDI ← EDI + 4; ELSE EDI ← EDI - 4; Flags Affected None. Additional Itanium System Environment Exceptions Itanium Reg Faults NaT Register Consumption Abort. Itanium Mem Faults VHPT Data Fault, Nested TLB Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault...
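The doubleword form of the operation above, including the DF-controlled step, can be sketched as follows. This is an illustrative model only: `stosd` is a hypothetical helper, and `memory` is a plain `bytearray` standing in for a flat ES segment, so segment limits, paging, and the Itanium faults listed above are not modelled.

```python
MASK32 = 0xFFFFFFFF


def stosd(memory: bytearray, edi: int, eax: int, df: int) -> int:
    """Hypothetical model of STOSD: store EAX at ES:EDI, then step EDI.

    Returns the updated EDI value.
    """
    memory[edi:edi + 4] = (eax & MASK32).to_bytes(4, "little")  # IA-32 is little-endian
    step = 4 if df == 0 else -4  # DF = 0 increments EDI, DF = 1 decrements it
    return (edi + step) & MASK32


buf = bytearray(8)
edi = stosd(buf, 0, 0x11223344, df=0)
print(edi, buf.hex())  # 4 4433221100000000
```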
  • Page 385 STR—Store Task Register Opcode Instruction Description 0F 00 /1 STR r/m16 Stores segment selector from TR in r/m16 Description Stores the segment selector from the task register (TR) in the destination operand. The destination operand can be a general-purpose register or a memory location. The segment selector stored with this instruction points to the task state segment (TSS) for the currently running task.
  • Page 386 SUB—Integer Subtraction Opcode Instruction Description 2C ib SUB AL,imm8 Subtract imm8 from AL 2D iw SUB AX,imm16 Subtract imm16 from AX 2D id SUB EAX,imm32 Subtract imm32 from EAX 80 /5 ib SUB r/m8,imm8 Subtract imm8 from r/m8 81 /5 iw SUB r/m16,imm16 Subtract imm16 from r/m16 81 /5 id...
  • Page 387 SUB—Integer Subtraction (Continued) Protected Mode Exceptions #GP(0) If the destination is located in a nonwritable segment. If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. If the DS, ES, FS, or GS register contains a null segment selector. #SS(0) If a memory operand effective address is outside the SS segment limit.
  • Page 388 TEST—Logical Compare Opcode Instruction Description A8 ib TEST AL,imm8 AND imm8 with AL; set SF, ZF, PF according to result A9 iw TEST AX,imm16 AND imm16 with AX; set SF, ZF, PF according to result A9 id TEST EAX,imm32 AND imm32 with EAX; set SF, ZF, PF according to result F6 /0 ib TEST r/m8,imm8 AND imm8 with r/m8;...
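TEST performs the AND only to set flags and discards the result. The following minimal sketch models the SF, ZF, and PF rule described above. `flags_after_test` is a hypothetical helper. It also clears CF and OF, which is standard IA-32 TEST behavior even though this excerpt does not spell it out. PF reflects even parity of the low-order byte of the result.

```python
def flags_after_test(a: int, b: int, width: int = 32) -> dict:
    """Hypothetical model of the flags set by TEST a, b (the AND result is discarded)."""
    mask = (1 << width) - 1
    result = a & b & mask
    low_byte = result & 0xFF
    return {
        "ZF": int(result == 0),
        "SF": (result >> (width - 1)) & 1,             # most significant bit of the result
        "PF": int(bin(low_byte).count("1") % 2 == 0),  # even parity of the low byte
        "CF": 0,  # TEST always clears CF
        "OF": 0,  # and OF
    }


print(flags_after_test(0x0F, 0xF0, width=8)["ZF"])  # 1
```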
  • Page 389 TEST—Logical Compare (Continued) Protected Mode Exceptions #GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. If the DS, ES, FS, or GS register contains a null segment selector. #SS(0) If a memory operand effective address is outside the SS segment limit.
  • Page 390 UD2—Undefined Instruction Opcode Instruction Description 0F 0B UD2 Raise invalid opcode exception Description Generates an invalid opcode exception. This instruction is provided for software testing to explicitly generate an invalid opcode. The opcode for this instruction is reserved for this purpose. Other than raising the invalid opcode exception, this instruction is the same as the NOP instruction.
  • Page 391 VERR, VERW—Verify a Segment for Reading or Writing Opcode Instruction Description 0F 00 /4 VERR r/m16 Set ZF=1 if segment specified with r/m16 can be read 0F 00 /5 VERW r/m16 Set ZF=1 if segment specified with r/m16 can be written Description Verifies whether the code or data segment specified with the source operand is readable (VERR) or writable (VERW) from the current privilege level (CPL).
  • Page 392 VERR, VERW—Verify a Segment for Reading or Writing (Continued) Flags Affected The ZF flag is set to 1 if the segment is accessible and readable (VERR) or writable (VERW); otherwise, it is cleared to 0. Additional Itanium System Environment Exceptions Itanium Reg Faults NaT Register Consumption Abort.
  • Page 393 WAIT/FWAIT—Wait Opcode Instruction Description 9B WAIT Check pending unmasked floating-point exceptions. 9B FWAIT Check pending unmasked floating-point exceptions. Description Causes the processor to check for and handle pending unmasked floating-point exceptions before proceeding. (FWAIT is an alternate mnemonic for WAIT.) This instruction is useful for synchronizing exceptions in critical sections of code. Coding a WAIT instruction after a floating-point instruction ensures that any unmasked floating-point exceptions the instruction may raise are handled before the processor can modify the instruction’s results.
  • Page 394 WBINVD—Write-Back and Invalidate Cache Opcode Instruction Description 0F 09 WBINVD Write back and flush internal caches; initiate writing-back and flushing of external caches. Description Writes back all modified cache lines in the processor’s internal cache to main memory, invalidates (flushes) the internal caches, and issues a special-function bus cycle that directs external caches to also write back modified data.
  • Page 395 Intel Architecture Compatibility The WBINVD instruction is implementation-dependent; its function may be implemented differently on future Intel architecture processors. The instruction is not supported on Intel architecture processors earlier than the Intel486 processor.
  • Page 396 MSR address in ECX will also cause a general protection exception. When the WRMSR instruction is used to write to an MTRR, the TLBs are invalidated, including the global entries (see the Intel Architecture Software Developer’s Manual, Volume 3). The MSRs control functions for testability, execution tracing, performance monitoring, and machine-check errors.
  • Page 397 The MSRs and the ability to write them with the WRMSR instruction were introduced into the Intel architecture with the Pentium processor. Execution of this instruction by an Intel architecture processor earlier than the Pentium processor results in an invalid opcode exception (#UD).
  • Page 398 XADD—Exchange and Add Opcode Instruction Description 0F C0 /r XADD r/m8,r8 Exchange r8 and r/m8; load sum into r/m8. 0F C1 /r XADD r/m16,r16 Exchange r16 and r/m16; load sum into r/m16. 0F C1 /r XADD r/m32,r32 Exchange r32 and r/m32; load sum into r/m32. Description Exchanges the first operand (destination operand) with the second operand (source operand), then loads the sum of the two values into the destination operand.
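The exchange-then-add order described above means that the source ends up holding the old destination value. The following is a minimal sketch with a hypothetical `xadd` helper that returns both new operand values. The sum wraps at the operand width.

```python
def xadd(dest: int, src: int, width: int = 32) -> tuple:
    """Hypothetical model of XADD dest, src: returns (new_dest, new_src)."""
    mask = (1 << width) - 1
    total = (dest + src) & mask  # the sum wraps at the operand width
    return total, dest & mask    # src receives the original destination value


print(xadd(5, 3))  # (8, 5)
```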
  • Page 399 If alignment checking is enabled and an unaligned memory reference is made. Intel Architecture Compatibility Intel architecture processors earlier than the Intel486 processor do not recognize this instruction. If this instruction is used, you should provide an equivalent code sequence that runs on earlier processors.
  • Page 400 This instruction is useful for implementing semaphores or similar data structures for process synchronization. (See Chapter 5, Processor Management and Initialization, in the Intel Architecture Software Developer’s Manual, Volume 3 for more information on bus locking.) The XCHG instruction can also be used instead of the BSWAP instruction for 16-bit operands.
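For a 16-bit value, exchanging its two halves (for example, with XCHG AH, AL) reverses the byte order, which is the effect BSWAP has on 32-bit operands. The following sketch is illustrative only; `xchg_ah_al` is a hypothetical name.

```python
def xchg_ah_al(ax: int) -> int:
    """Swap the high and low bytes of a 16-bit AX value (models XCHG AH, AL)."""
    ah = (ax >> 8) & 0xFF
    al = ax & 0xFF
    return (al << 8) | ah


print(hex(xchg_ah_al(0x1234)))  # 0x3412
```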
  • Page 401 XCHG—Exchange Register/Memory with Register (Continued) IA-32_Intercept Lock Intercept If an external atomic bus lock is required to complete this operation and DCR.lc is 1, no atomic transaction occurs; instead, the instruction is faulted and an IA-32_Intercept(Lock) fault is generated. The software lock handler is responsible for emulating this instruction.
  • Page 402 XLAT/XLATB—Table Look-up Translation Opcode Instruction Description XLAT m8 Set AL to memory byte DS:[(E)BX + unsigned AL] XLATB Set AL to memory byte DS:[(E)BX + unsigned AL] Description Locates a byte entry in a table in memory, using the contents of the AL register as a table index, then copies the contents of the table entry back into the AL register.
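The table look-up above amounts to one indexed byte load. This minimal sketch treats DS as a flat `bytes` object, so segmentation is not modelled. `xlat` is a hypothetical helper. The usage shows the classic nibble-to-hex-digit table.

```python
def xlat(ds_memory: bytes, ebx: int, al: int) -> int:
    """Hypothetical model of XLAT: AL <- byte at DS:[EBX + unsigned AL]."""
    return ds_memory[ebx + (al & 0xFF)]


hex_digits = b"0123456789ABCDEF"  # 16-entry lookup table at offset 0
print(chr(xlat(hex_digits, 0, 11)))  # B
```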
  • Page 403 XLAT/XLATB—Table Look-up Translation (Continued) Real Address Mode Exceptions If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. If a memory operand effective address is outside the SS segment limit. Virtual 8086 Mode Exceptions #GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
  • Page 404 XOR—Logical Exclusive OR Opcode Instruction Description 34 ib XOR AL,imm8 AL XOR imm8 35 iw XOR AX,imm16 AX XOR imm16 35 id XOR EAX,imm32 EAX XOR imm32 80 /6 ib XOR r/m8,imm8 r/m8 XOR imm8 81 /6 iw XOR r/m16,imm16 r/m16 XOR imm16 81 /6 id XOR r/m32,imm32...
  • Page 405 XOR—Logical Exclusive OR (Continued) Protected Mode Exceptions #GP(0) If the destination operand points to a nonwritable segment. If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. If the DS, ES, FS, or GS register contains a null segment selector. #SS(0) If a memory operand effective address is outside the SS segment limit.
  • Page 406: Intel® MMX™ Technology Instruction Reference

    IA-32 Intel® MMX™ Technology Instruction Reference This section lists the IA-32 MMX technology instructions designed to increase the performance of multimedia-intensive applications. Volume 4: IA-32 Intel® MMX™ Technology Instruction Reference 4:399...
  • Page 407 Sets the values of all the tags in the FPU tag word to empty (all ones). This operation marks the MMX technology registers as available, so they can subsequently be used by floating-point instructions. (See Figure 7-11 in the Intel Architecture Software Developer’s Manual, Volume 1, for the format of the FPU tag word.) All other MMX technology instructions (other than the EMMS instruction) set all the tags in the FPU tag word to valid (all zeros).
  • Page 408: Operation of the MOVD Instruction

    Figure: Operation of the MOVD Instruction (MOVD mm, r32: the 32-bit value is written to bits 31..0 of the MMX register and bits 63..32 are cleared) Operation IF DEST is MMX register THEN DEST ← ZeroExtend(SRC); ELSE (* SRC is MMX register *) DEST ← LowOrderDoubleword(SRC);
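The two branches of the MOVD pseudocode can be modelled with 64-bit integers standing in for MMX registers. This is a minimal sketch, and both helper names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def movd_to_mmx(r32: int) -> int:
    """MOVD mm, r32: the 32-bit value is zero-extended to 64 bits."""
    return r32 & MASK32


def movd_from_mmx(mm: int) -> int:
    """MOVD r32, mm: only the low-order doubleword is copied."""
    return mm & MASK32


print(hex(movd_from_mmx(0x1122334455667788)))  # 0x55667788
```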
  • Page 409 If TS in CR0 is set. If there is a pending FPU exception. #PF(fault-code) If a page fault occurs. #AC(0) If alignment checking is enabled and an unaligned memory reference is made.
  • Page 410: Operation of the MOVQ Instruction

    TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault
  • Page 411 If TS in CR0 is set. If there is a pending FPU exception. #PF(fault-code) If a page fault occurs. #AC(0) If alignment checking is enabled and an unaligned memory reference is made.
  • Page 412: Operation of the PACKSSDW Instruction

    DEST(7..0) ← SaturateSignedWordToSignedByte DEST(15..0); DEST(15..8) ← SaturateSignedWordToSignedByte DEST(31..16); DEST(23..16) ← SaturateSignedWordToSignedByte DEST(47..32); DEST(31..24) ← SaturateSignedWordToSignedByte DEST(63..48); DEST(39..32) ← SaturateSignedWordToSignedByte SRC(15..0); DEST(47..40) ← SaturateSignedWordToSignedByte SRC(31..16); DEST(55..48) ← SaturateSignedWordToSignedByte SRC(47..32); DEST(63..56) ← SaturateSignedWordToSignedByte SRC(63..48);
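The word-to-byte pack above (the PACKSSWB form) can be reproduced with a small model. This sketch is not Intel's code. It uses 64-bit Python integers in place of MMX registers, and the helper names are hypothetical. Bytes 0 to 3 of the result come from the destination words, and bytes 4 to 7 come from the source words.

```python
def _signed_words(x: int) -> list:
    """Split a 64-bit value into four signed 16-bit words, lowest first."""
    words = []
    for i in range(4):
        w = (x >> (16 * i)) & 0xFFFF
        words.append(w - 0x10000 if w & 0x8000 else w)
    return words


def saturate_signed_word_to_signed_byte(v: int) -> int:
    return max(-128, min(127, v))  # clamp to the signed byte range


def packsswb(dest: int, src: int) -> int:
    """Pack DEST words 0-3 into bytes 0-3 and SRC words 0-3 into bytes 4-7."""
    packed = [saturate_signed_word_to_signed_byte(w)
              for w in _signed_words(dest) + _signed_words(src)]
    result = 0
    for i, b in enumerate(packed):
        result |= (b & 0xFF) << (8 * i)
    return result


# DEST words (low to high): 1, -1, 300, -300 -> bytes 01, FF, 7F, 80
print(hex(packsswb(0xFED4012CFFFF0001, 0)))  # 0x807fff01
```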
  • Page 413 If any part of the operand lies outside of the effective address space from 0 to FFFFH. If EM in CR0 is set. If TS in CR0 is set. If there is a pending FPU exception.
  • Page 414 If TS in CR0 is set. If there is a pending FPU exception. #PF(fault-code) If a page fault occurs. #AC(0) If alignment checking is enabled and an unaligned memory reference is made.
  • Page 415: Operation of the PACKUSWB Instruction

    TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault
  • Page 416 If TS in CR0 is set. If there is a pending FPU exception. #PF(fault-code) If a page fault occurs. #AC(0) If alignment checking is enabled and an unaligned memory reference is made.
  • Page 417: Operation of the PADDW Instruction

    32 bits, the lower 32 bits of the result are written to the destination operand and therefore the result wraps around.
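The wraparound rule for packed doubleword addition (PADDD) can be illustrated as follows. This is a minimal sketch with a hypothetical `paddd` helper operating on 64-bit integers. Each lane keeps only its low 32 bits, and no carry crosses from one lane into the next.

```python
MASK32 = 0xFFFFFFFF


def paddd(dest: int, src: int) -> int:
    """Add two packed doublewords lane by lane; each lane wraps modulo 2**32."""
    result = 0
    for i in range(2):
        a = (dest >> (32 * i)) & MASK32
        b = (src >> (32 * i)) & MASK32
        result |= ((a + b) & MASK32) << (32 * i)  # keep only the low 32 bits
    return result


# High lane: FFFFFFFFH + 1 wraps to 0; low lane: 1 + 2 = 3
print(hex(paddd(0xFFFFFFFF00000001, 0x0000000100000002)))  # 0x3
```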
  • Page 418 TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault
  • Page 419 If TS in CR0 is set. If there is a pending FPU exception. #PF(fault-code) If a page fault occurs. #AC(0) If alignment checking is enabled and an unaligned memory reference is made.
  • Page 420: Operation of the PADDSW Instruction

    DEST(23..16) ← SaturateToSignedByte(DEST(23..16) + SRC(23..16)); DEST(31..24) ← SaturateToSignedByte(DEST(31..24) + SRC(31..24)); DEST(39..32) ← SaturateToSignedByte(DEST(39..32) + SRC(39..32)); DEST(47..40) ← SaturateToSignedByte(DEST(47..40) + SRC(47..40)); DEST(55..48) ← SaturateToSignedByte(DEST(55..48) + SRC(55..48)); DEST(63..56) ← SaturateToSignedByte(DEST(63..56) + SRC(63..56));
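The per-byte `SaturateToSignedByte` steps above (the PADDSB form) can be modelled as follows. This is a sketch under the assumption that 64-bit Python integers stand in for MMX registers. `paddsb` is a hypothetical name.

```python
def _to_signed_byte(x: int) -> int:
    return x - 0x100 if x & 0x80 else x


def paddsb(dest: int, src: int) -> int:
    """Packed signed byte add with saturation (SaturateToSignedByte per lane)."""
    result = 0
    for i in range(8):
        a = _to_signed_byte((dest >> (8 * i)) & 0xFF)
        b = _to_signed_byte((src >> (8 * i)) & 0xFF)
        s = max(-128, min(127, a + b))  # clamp instead of wrapping
        result |= (s & 0xFF) << (8 * i)
    return result


print(hex(paddsb(0x7F, 0x01)))  # 0x7f (127 + 1 saturates to 127)
```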
  • Page 421 If any part of the operand lies outside of the effective address space from 0 to FFFFH. If EM in CR0 is set. If TS in CR0 is set. If there is a pending FPU exception.
  • Page 422 If TS in CR0 is set. If there is a pending FPU exception. #PF(fault-code) If a page fault occurs. #AC(0) If alignment checking is enabled and an unaligned memory reference is made.
  • Page 423: Operation of the PADDUSB Instruction

    When an individual result is beyond the range of an unsigned word (that is, greater than FFFFH), the saturated unsigned word value of FFFFH is written to the destination operand.
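Unsigned word saturation (PADDUSW) only needs an upper clamp, because the sum of two unsigned values cannot go below zero. The following is a minimal sketch with a hypothetical helper name.

```python
def paddusw(dest: int, src: int) -> int:
    """Packed unsigned word add; results above FFFFH saturate to FFFFH."""
    result = 0
    for i in range(4):
        a = (dest >> (16 * i)) & 0xFFFF
        b = (src >> (16 * i)) & 0xFFFF
        result |= min(a + b, 0xFFFF) << (16 * i)
    return result


print(hex(paddusw(0xFFF0, 0x0020)))  # 0xffff
```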
  • Page 424 If there is a pending FPU exception. #PF(fault-code) If a page fault occurs. #AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
  • Page 425 If TS in CR0 is set. If there is a pending FPU exception. #PF(fault-code) If a page fault occurs. #AC(0) If alignment checking is enabled and an unaligned memory reference is made.
  • Page 426: Operation of the PAND Instruction

    TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault
  • Page 427 If TS in CR0 is set. If there is a pending FPU exception. #PF(fault-code) If a page fault occurs. #AC(0) If alignment checking is enabled and an unaligned memory reference is made.
  • Page 428: Operation of the PANDN Instruction

    TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault
  • Page 429 If TS in CR0 is set. If there is a pending FPU exception. #PF(fault-code) If a page fault occurs. #AC(0) If alignment checking is enabled and an unaligned memory reference is made.
  • Page 430: Operation of the PCMPEQW Instruction

    The PCMPEQD instruction compares the doublewords in the destination operand to the corresponding doublewords in the source operand, with the doublewords in the destination operand being set according to the results.
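"Set according to the results" means that each destination doubleword becomes a mask: all ones where the pair is equal and all zeros otherwise. The following is a minimal sketch with a hypothetical `pcmpeqd` helper on 64-bit integers.

```python
MASK32 = 0xFFFFFFFF


def pcmpeqd(dest: int, src: int) -> int:
    """Packed doubleword compare for equality; each lane becomes a mask."""
    result = 0
    for i in range(2):
        a = (dest >> (32 * i)) & MASK32
        b = (src >> (32 * i)) & MASK32
        if a == b:
            result |= MASK32 << (32 * i)  # all ones where equal, zeros otherwise
    return result


print(hex(pcmpeqd(0x0000000500000007, 0x0000000500000008)))  # 0xffffffff00000000
```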
  • Page 431 If a memory operand effective address is outside the SS segment limit. If EM in CR0 is set. If TS in CR0 is set. If there is a pending FPU exception. #PF(fault-code) If a page fault occurs.
  • Page 432 If TS in CR0 is set. If there is a pending FPU exception. #PF(fault-code) If a page fault occurs. #AC(0) If alignment checking is enabled and an unaligned memory reference is made.
  • Page 433: Operation of the PCMPGTW Instruction

    The PCMPGTD instruction compares the signed doublewords in the destination operand to the corresponding signed doublewords in the source operand, with the doublewords in the destination operand being set according to the results.
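The important word above is "signed": FFFFFFFFH compares as -1, not as 4294967295. The following minimal sketch (hypothetical helper names) makes that explicit. In the test, the low lane is true only because of the signed interpretation.

```python
MASK32 = 0xFFFFFFFF


def _signed32(x: int) -> int:
    return x - (1 << 32) if x & 0x80000000 else x


def pcmpgtd(dest: int, src: int) -> int:
    """Signed packed doubleword compare: lane mask is all ones where dest > src."""
    result = 0
    for i in range(2):
        a = _signed32((dest >> (32 * i)) & MASK32)
        b = _signed32((src >> (32 * i)) & MASK32)
        if a > b:
            result |= MASK32 << (32 * i)
    return result


# Low lane: 1 > -1 is true; high lane: -2**31 > 0 is false
print(hex(pcmpgtd(0x8000000000000001, 0x00000000FFFFFFFF)))  # 0xffffffff
```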
  • Page 434 TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault
  • Page 435 If TS in CR0 is set. If there is a pending FPU exception. #PF(fault-code) If a page fault occurs. #AC(0) If alignment checking is enabled and an unaligned memory reference is made.
  • Page 436: Operation of the PMADDWD Instruction

    TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault
  • Page 437 If TS in CR0 is set. If there is a pending FPU exception. #PF(fault-code) If a page fault occurs. #AC(0) If alignment checking is enabled and an unaligned memory reference is made.
  • Page 438: Operation of the PMULHW Instruction

    TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault
  • Page 439 If TS in CR0 is set. If there is a pending FPU exception. #PF(fault-code) If a page fault occurs. #AC(0) If alignment checking is enabled and an unaligned memory reference is made.
  • Page 440: Operation of the PMULLW Instruction

    TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault
  • Page 441 If TS in CR0 is set. If there is a pending FPU exception. #PF(fault-code) If a page fault occurs. #AC(0) If alignment checking is enabled and an unaligned memory reference is made.
  • Page 442: Operation of the POR Instruction

    TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault
  • Page 443 If TS in CR0 is set. If there is a pending FPU exception. #PF(fault-code) If a page fault occurs. #AC(0) If alignment checking is enabled and an unaligned memory reference is made.
  • Page 444: Operation of the PSLLW Instruction

    Figure 3-16. Operation of the PSLLW Instruction (example PSLLW mm, 2: each word is shifted left by 2 bits, so 1111111111111100B becomes 1111111111110000B and 0001000111000111B becomes 0100011100011100B)
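The Figure 3-16 example can be reproduced by shifting each 16-bit lane independently and discarding the bits that leave the top of the lane. This is a minimal sketch with a hypothetical `psllw` helper. It assumes the usual MMX behavior that a count above 15 clears every word, which this excerpt does not state.

```python
def psllw(x: int, count: int) -> int:
    """Shift each of the four 16-bit words left by count, filling with zeros."""
    if count > 15:
        return 0  # assumption: counts above 15 clear all words
    result = 0
    for i in range(4):
        w = (x >> (16 * i)) & 0xFFFF
        result |= ((w << count) & 0xFFFF) << (16 * i)  # drop bits shifted out of the word
    return result


x = (0b1111111111111100 << 16) | 0b0001000111000111
print(bin(psllw(x, 2)))
```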
  • Page 445 If any part of the operand lies outside of the effective address space from 0 to FFFFH. If EM in CR0 is set. If TS in CR0 is set. If there is a pending FPU exception.
  • Page 446 If TS in CR0 is set. If there is a pending FPU exception. #PF(fault-code) If a page fault occurs. #AC(0) If alignment checking is enabled and an unaligned memory reference is made.
  • Page 447: Operation of the PSRAW Instruction

    Figure 3-17. Operation of the PSRAW Instruction (example PSRAW mm, 2: each word is shifted right by 2 bits with its sign bit copied into the vacated bits, so 1111111111111100B becomes 1111111111111111B and 1101000111000111B becomes 1111010001110001B)
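The arithmetic right shift in Figure 3-17 keeps each word's sign. Python's `>>` on a negative integer is already arithmetic, so the sketch converts each lane to a signed value first. `psraw` is a hypothetical helper. The clamp of counts above 15 (every bit becomes the sign bit) reflects standard MMX behavior that this excerpt does not state.

```python
def psraw(x: int, count: int) -> int:
    """Arithmetic right shift of each of the four 16-bit words."""
    count = min(count, 15)  # assumption: larger counts fill each word with its sign bit
    result = 0
    for i in range(4):
        w = (x >> (16 * i)) & 0xFFFF
        signed = w - 0x10000 if w & 0x8000 else w
        result |= ((signed >> count) & 0xFFFF) << (16 * i)  # >> on negatives sign-extends
    return result


x = (0b1111111111111100 << 16) | 0b1101000111000111
print(bin(psraw(x, 2)))
```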
  • Page 448 If any part of the operand lies outside of the effective address space from 0 to FFFFH. If EM in CR0 is set. If TS in CR0 is set. If there is a pending FPU exception.
  • Page 449 If TS in CR0 is set. If there is a pending FPU exception. #PF(fault-code) If a page fault occurs. #AC(0) If alignment checking is enabled and an unaligned memory reference is made.
  • Page 450: Operation of the PSRLW Instruction

    Figure 3-18. Operation of the PSRLW Instruction (example PSRLW mm, 2: each word is shifted right by 2 bits with zeros shifted in, so 1111111111111100B becomes 0011111111111111B and 0001000111000111B becomes 0000010001110001B)
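In contrast to PSRAW, the logical shift in Figure 3-18 always fills with zeros. This is a minimal sketch with a hypothetical `psrlw` helper. It assumes that counts above 15 clear every word, which is standard MMX behavior not stated in this excerpt.

```python
def psrlw(x: int, count: int) -> int:
    """Logical right shift of each of the four 16-bit words, filling with zeros."""
    if count > 15:
        return 0  # assumption: counts above 15 clear all words
    result = 0
    for i in range(4):
        w = (x >> (16 * i)) & 0xFFFF
        result |= (w >> count) << (16 * i)
    return result


x = (0b1111111111111100 << 16) | 0b0001000111000111
print(bin(psrlw(x, 2)))
```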
  • Page 451 If any part of the operand lies outside of the effective address space from 0 to FFFFH. If EM in CR0 is set. If TS in CR0 is set. If there is a pending FPU exception.
  • Page 452 If TS in CR0 is set. If there is a pending FPU exception. #PF(fault-code) If a page fault occurs. #AC(0) If alignment checking is enabled and an unaligned memory reference is made.
  • Page 453: Operation of the PSUBW Instruction

    When an individual result is too large to be represented in 32 bits, the lower 32 bits of the result are written to the destination operand and therefore the result wraps around.
  • Page 454 TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault
  • Page 455 If TS in CR0 is set. If there is a pending FPU exception. #PF(fault-code) If a page fault occurs. #AC(0) If alignment checking is enabled and an unaligned memory reference is made.
  • Page 456: Operation of the PSUBSW Instruction

    When an individual result is beyond the range of a signed word (that is, greater than 7FFFH or less than 8000H), the saturated word value of 7FFFH or 8000H, respectively, is written to the destination operand.
  • Page 457 If there is a pending FPU exception. #PF(fault-code) If a page fault occurs. #AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
  • Page 458 If TS in CR0 is set. If there is a pending FPU exception. #PF(fault-code) If a page fault occurs. #AC(0) If alignment checking is enabled and an unaligned memory reference is made.
  • Page 459: Operation of the PSUBUSB Instruction

    When an individual result is less than zero (a negative value), the saturated unsigned word value of 0000H is written to the destination operand.
  • Page 460 If there is a pending FPU exception. #PF(fault-code) If a page fault occurs. #AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
  • Page 461 If TS in CR0 is set. If there is a pending FPU exception. #PF(fault-code) If a page fault occurs. #AC(0) If alignment checking is enabled and an unaligned memory reference is made.
  • Page 462: High-Order Unpacking and Interleaving of Bytes with the PUNPCKHBW Instruction

    When the source operand is all zeros, the PUNPCKHBW instruction zero-extends the high-order bytes of the destination operand (that is, unpacks them into unsigned words), and the PUNPCKHWD instruction zero-extends the high-order words (unpacks them into unsigned doublewords).
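The interleave, and why an all-zero source produces zero extension, can be seen in a small model. This is a minimal sketch with a hypothetical `punpckhbw` helper on 64-bit integers. Result word i takes destination byte 4 + i as its low byte and source byte 4 + i as its high byte.

```python
def punpckhbw(dest: int, src: int) -> int:
    """Interleave the high-order four bytes of dest and src into four words."""
    result = 0
    for i in range(4):
        d = (dest >> (8 * (4 + i))) & 0xFF  # becomes the low byte of word i
        s = (src >> (8 * (4 + i))) & 0xFF   # becomes the high byte of word i
        result |= (d | (s << 8)) << (16 * i)
    return result


# With an all-zero source, the high-order bytes of dest are zero-extended to words.
print(hex(punpckhbw(0x8877665544332211, 0)))  # 0x88007700660055
```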
  • Page 463 If there is a pending FPU exception. #PF(fault-code) If a page fault occurs. #AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
  • Page 464 If TS in CR0 is set. If there is a pending FPU exception. #PF(fault-code) If a page fault occurs. #AC(0) If alignment checking is enabled and an unaligned memory reference is made.
  • Page 465: Low-Order Unpacking and Interleaving of Bytes with the PUNPCKLBW Instruction

    When the source operand is all zeros, the PUNPCKLBW instruction zero-extends the low-order bytes of the destination operand (that is, unpacks them into unsigned words), and the PUNPCKLWD instruction zero-extends the low-order words (unpacks them into unsigned doublewords).
  • Page 466 If there is a pending FPU exception. #PF(fault-code) If a page fault occurs. #AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
  • Page 467 If TS in CR0 is set. If there is a pending FPU exception. #PF(fault-code) If a page fault occurs. #AC(0) If alignment checking is enabled and an unaligned memory reference is made.
  • Page 468: Operation of the PXOR Instruction

    TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault
  • Page 469 If TS in CR0 is set. If there is a pending FPU exception. #PF(fault-code) If a page fault occurs. #AC(0) If alignment checking is enabled and an unaligned memory reference is made.
  • Page 470: About the Intel® SSE Architecture

    The SSE instruction set is fully compatible with all software written for Intel architecture microprocessors. All existing software continues to run correctly, without modification, on microprocessors that incorporate the Intel SSE architecture, as well as in the presence of existing and new applications that incorporate this technology.
  • Page 471: Single Instruction Multiple Data

    • Localized recurring operations performed on the data • Data-independent control flow The Intel SSE architecture is 100% compatible with the IEEE Standard 754 for Binary Floating-Point Arithmetic. The SSE instructions are accessible from all IA execution modes: Protected mode, Real address mode, and Virtual 8086 mode. New Features...
  • Page 472: Extended Instruction Set

    Extended Instruction Set The Intel SSE architecture supplies a rich set of instructions that operate, in parallel, on either all pairs or only the least significant pair of packed data operands. The packed instructions operate on a pair of operands as shown in...
  • Page 473: Instruction Group Review

    Figure 4-3. Packed Operation (single-precision (SP) elements X1..X4 and Y1..Y4 are combined lane by lane into X1 op Y1, X2 op Y2, X3 op Y3, and X4 op Y4) Figure 4-4. Scalar Operation
  • Page 474 The DIVPS (Divide packed single-precision floating-point) instruction divides four pairs of packed single-precision floating-point operands. The DIVSS (Divide scalar single-precision floating-point) instruction divides the least significant pair of packed single-precision floating-point operands; the upper three fields are passed through from the first source (destination) operand. Packed/Scalar Square Root The SQRTPS (Square root packed single-precision floating-point) instruction returns the square root of the four packed single-precision floating-point numbers from the source...
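The packed/scalar distinction can be modelled with four-element lists, where index 0 is the least significant lane. This sketch assumes the `DIVSS xmm1, xmm2` form, in which lanes 1 to 3 of xmm1 are left unchanged. The helper names are hypothetical. Results are rounded to single precision with the standard `struct` module. MXCSR exceptions are not modelled, and Python raises `ZeroDivisionError` where the hardware would signal divide-by-zero.

```python
import struct


def to_single(v: float) -> float:
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", v))[0]


def divps(x: list, y: list) -> list:
    """Packed: divide all four lane pairs."""
    return [to_single(a / b) for a, b in zip(x, y)]


def divss(x: list, y: list) -> list:
    """Scalar: divide lane 0 only; lanes 1-3 of x pass through unchanged."""
    return [to_single(x[0] / y[0])] + list(x[1:])


print(divss([8.0, 6.0, 4.0, 2.0], [2.0, 9.0, 9.0, 9.0]))  # [4.0, 6.0, 4.0, 2.0]
```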
  • Page 475: Packed Shuffle Operation

    4.6.1.3 Compare Instructions The CMPPS (Compare packed single-precision floating-point) instruction compares four pairs of packed single-precision floating-point numbers using the immediate operand as a predicate, returning per SP field an all “1” 32-bit mask or an all “0” 32-bit mask as a result.
  • Page 476: Unpack High Operation

    sources (Figure 4-6). When unpacking from a memory operand, the full 128-bit operand is accessed from memory but only the high-order 64 bits are used by the instruction. Figure 4-6. Unpack High Operation The UNPCKLPS (Unpack low packed single-precision floating-point) instruction performs an interleaved unpack of the low-order data elements of the first and second packed single-precision floating-point operands.
  • Page 477 The CVTSS2SI (Convert scalar single-precision floating-point to a 32-bit integer) instruction converts the least significant single-precision floating-point number to a 32-bit signed integer in an Intel architecture 32-bit integer register; when the conversion is inexact, the value rounded according to the rounding mode in MXCSR is returned. The CVTTSS2SI (Convert truncate scalar single-precision floating-point to...
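The difference between CVTSS2SI (round according to MXCSR) and CVTTSS2SI (truncate) is easiest to see on halfway and negative values. This sketch makes several assumptions. MXCSR is taken to hold its default round-to-nearest-even mode, which Python's built-in `round` matches. Inputs are Python floats rather than true single-precision values. NaN, infinities, and out-of-range results return the integer indefinite value 80000000H, which is standard SSE behavior not described in this excerpt. The helper names are hypothetical.

```python
import math

INT_INDEFINITE = -2**31  # 80000000H as a signed 32-bit value


def _to_int32(v: int) -> int:
    return v if -2**31 <= v <= 2**31 - 1 else INT_INDEFINITE


def cvtss2si(x: float) -> int:
    """Convert using round-to-nearest-even (assumed default MXCSR rounding)."""
    if not math.isfinite(x):
        return INT_INDEFINITE
    return _to_int32(round(x))  # Python rounds ties to even: 2.5 -> 2, 3.5 -> 4


def cvttss2si(x: float) -> int:
    """Convert with truncation toward zero, ignoring the MXCSR rounding mode."""
    if not math.isfinite(x):
        return INT_INDEFINITE
    return _to_int32(math.trunc(x))


print(cvtss2si(-2.7), cvttss2si(-2.7))  # -3 -2
```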
  • Page 478 The MOVSS (Move scalar single-precision floating-point) instruction transfers a single 32-bit floating-point number from memory to an SSE register or vice versa, and between registers. 4.6.1.7 State Management Instructions The LDMXCSR (Load SSE Control and Status Register) instruction loads the SSE control and status register from memory.
  • Page 479 The PMULHUW (Unsigned high packed integer word multiply in MMX technology register) instruction performs an unsigned multiply on each word field of the two source MMX technology registers, returning the high word of each result to an MMX technology register. The PSADBW (Sum of absolute differences) instruction computes the absolute difference for each pair of byte elements in the two sources and then accumulates the 8 differences into a single 16-bit result.
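The PSADBW accumulation can be modeled in a few lines of C. This is a sketch under the assumption that the operands are eight unsigned bytes each and that, as in the MMX register form, the 16-bit sum lands in the low word of the 64-bit result with the upper bits cleared; the function name is hypothetical.

```c
#include <stdint.h>

/* Model of PSADBW mm1, mm2/m64 on eight unsigned bytes per operand.
   The largest possible sum is 8 * 255 = 2040, so it always fits in 16 bits. */
static uint64_t psadbw_model(const uint8_t a[8], const uint8_t b[8]) {
    uint16_t sum = 0;
    for (int i = 0; i < 8; i++) {
        /* Absolute difference of one byte pair, computed without wraparound. */
        sum += (uint16_t)(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
    }
    /* Sum in bits 15-0 of the result; bits 63-16 are cleared. */
    return (uint64_t)sum;
}
```

Because the bound 2040 is far below 65535, the 16-bit accumulator can never overflow.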
  • Page 480 The MOVNTPS (Non-temporal store of packed single-precision floating-point) instruction stores data from an SSE register to memory. The memory address must be aligned to a 16-byte boundary; if it is not, a general protection exception occurs. The instruction is implicitly weakly ordered, does not write-allocate, and minimizes cache pollution.
  • Page 481: IEEE Compliance

    The SFENCE (Store Fence) instruction guarantees that every store instruction that precedes the store fence instruction in program order is globally visible before any store instruction which follows the fence. The SFENCE instruction provides an efficient way of ensuring ordering between routines that produce weakly-ordered results and routines that consume this data.
  • Page 482: Binary Real Number System

    [Figure 4-8. Binary Real Number System: the subset of binary real numbers that can be represented in the IEEE single-precision (32-bit) floating-point format, with a precision of 24 binary digits; numbers that fall between representable values cannot be represented.] Because the size and number of registers that any computer can have is limited, only a subset of the real-number continuum can be used in real-number calculations.
  • Page 483: Binary Floating-Point Format

    [Figure 4-9. Binary Floating-Point Format: sign, exponent, and significand fields; the significand consists of the integer bit (J-bit) and the fraction.] Table 4-1 shows how the real number 178.125 (in ordinary decimal format) is stored in floating-point format. The table lists a progression of real number notations that leads to the format that the processor uses.
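The progression in Table 4-1 can be checked mechanically. In binary, 178.125 is 10110010.001, which normalizes to 1.0110010001 × 2^7; the biased exponent is 7 + 127 = 134, and the fraction field holds the bits after the implicit integer (J) bit. The C sketch below (the struct and function names are illustrative) reads the raw bit pattern with memcpy and splits it into the three fields.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t sign;      /* bit 31 */
    uint32_t exponent;  /* bits 30-23, biased by 127 */
    uint32_t fraction;  /* bits 22-0; the J-bit is implicit for normalized numbers */
} sp_fields;

/* Reinterpret the 32 bits of a float without violating aliasing rules. */
static uint32_t sp_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

static sp_fields sp_decompose(float f) {
    uint32_t u = sp_bits(f);
    sp_fields r = { u >> 31, (u >> 23) & 0xFFu, u & 0x7FFFFFu };
    return r;
}
```

For 178.125 the full pattern is 43322000H: sign 0, exponent field 134 (10000110B), and fraction 0110010001 followed by thirteen zeros.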
  • Page 484 4.7.1.4 Real Number and Non-Number Encodings A variety of real numbers and special values can be encoded in the processor’s floating-point format. These numbers and values are generally divided into the following classes: • Signed zeros • Denormalized finite numbers •...
  • Page 485: Real Numbers and NaNs

    [Figure 4-10. Real Numbers and NaNs: the real-number line divided into -Normalized Finite, -Denormalized Finite, zero, +Denormalized Finite, and +Normalized Finite regions, with the real number and NaN encodings for the 32-bit floating-point format. Denormalized finite numbers have a biased exponent of 0 and a significand of the form 0.XXX; normalized finite numbers have a biased exponent in the range 1...254 and any fraction value.] ...
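The encoding classes can be told apart from the exponent and fraction fields alone. The following sketch (the enum and function names are hypothetical) classifies a raw single-precision bit pattern; the sign bit is ignored, so each class covers both signs.

```c
#include <stdint.h>

typedef enum { SP_ZERO, SP_DENORMAL, SP_NORMAL, SP_INFINITY, SP_NAN } sp_class;

/* Classify a 32-bit single-precision pattern by its biased exponent and fraction. */
static sp_class sp_classify(uint32_t bits) {
    uint32_t exp  = (bits >> 23) & 0xFFu;
    uint32_t frac = bits & 0x7FFFFFu;
    if (exp == 0)
        return frac == 0 ? SP_ZERO : SP_DENORMAL;  /* significand 0.XXX */
    if (exp == 0xFFu)
        return frac == 0 ? SP_INFINITY : SP_NAN;   /* reserved exponent 255 */
    return SP_NORMAL;                              /* exponent 1...254, significand 1.XXX */
}
```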
  • Page 486 Table 4-2. Denormalization Process (columns: Operation, Sign, Exponent(a), Significand). Denormalize: exponent -126, significand 0.00101011100...00. Denormal Result: exponent -126, significand 0.00101011100...00. (a) Expressed as an unbiased, decimal number. In the extreme case, all the significant bits are shifted out to the right by leading zeros, creating a zero result. The processor deals with denormal values in the following ways: •...
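The last rows of Table 4-2 can be reproduced with ordinary float arithmetic, under the assumption that the default floating-point environment is in effect (flush-to-zero disabled, and no compiler option such as -ffast-math that might enable it). The sketch below uses hypothetical names and a C99 hexadecimal literal: it divides a number from the smallest normalized binade by 8, an exact operation whose true result 1.0101110B × 2^-129 must be stored as the denormal 0.0010101110B × 2^-126.

```c
#include <stdint.h>
#include <string.h>

static uint32_t denorm_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* The true result 1.0101110b x 2^-129 lies below the smallest normalized
   exponent (-126). With gradual underflow it is stored shifted right by
   three places as 0.0010101110b x 2^-126, with a biased exponent field of 0.
   Dividing by 8 is exact, so no rounding is involved. */
static float denormal_example(void) {
    volatile float normal = 0x1.5cp-126f;  /* 1.0101110b x 2^-126 */
    return normal / 8.0f;                  /* volatile keeps the division at run time */
}
```

The resulting pattern is 0015C000H: exponent field 0 and fraction 00101011100 followed by zeros, matching the "Denormal Result" row.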
  • Page 487: Operating on NaNs

    As was described in Section 4.7.1.8, “NaNs” on page 4:479, the Intel SSE architecture supports two types of NaNs: SNaNs and QNaNs. An SNaN is any NaN value with its most-significant fraction bit set to 0 and at least one other fraction bit set to 1. (If all the fraction bits are set to 0, the value is an infinity.) A QNaN is any NaN value with the...
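The SNaN/QNaN distinction depends only on the most-significant fraction bit (bit 22 in single precision). The following C sketch uses hypothetical helper names; `sp_quiet` shows the usual way an SNaN is converted to a QNaN, by setting that bit while keeping the rest of the payload.

```c
#include <stdint.h>

/* Most-significant fraction bit of a single-precision value. */
#define SP_QUIET_BIT 0x00400000u

static int sp_is_nan(uint32_t b) {
    return ((b >> 23) & 0xFFu) == 0xFFu && (b & 0x7FFFFFu) != 0;  /* not infinity */
}
static int sp_is_snan(uint32_t b) { return sp_is_nan(b) && (b & SP_QUIET_BIT) == 0; }
static int sp_is_qnan(uint32_t b) { return sp_is_nan(b) && (b & SP_QUIET_BIT) != 0; }

/* Quiet an SNaN by setting the most-significant fraction bit; other values pass through. */
static uint32_t sp_quiet(uint32_t b) {
    return sp_is_snan(b) ? (b | SP_QUIET_BIT) : b;
}
```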
  • Page 488: Data Formats

    4.8.1 Memory Data Formats The Intel SSE architecture introduces a new packed 128-bit data type which consists of 4 single-precision floating-point numbers. The 128 bits are numbered 0 through 127. Bit 0 is the least significant bit (LSB), and bit 127 is the most significant bit (MSB).
  • Page 489: Precision and Range of SSE Datatype

    Table 4-4. Precision and Range of SSE Datatype (columns: Data Type, Length (Bits), Precision, Approximate Normalized Range in binary and decimal). Single-precision: length 32 bits, precision 24 bits, approximate normalized range 2^-126 to 2^127 (binary) or 1.18 × 10^-38 to 3.40 × 10^38 (decimal). Table 4-5 shows the encodings for all the classes of real numbers (that is, zero, denormalized-finite, normalized-finite, and infinity) and NaNs for the single-real data type.
  • Page 490: Instruction Formats

    Instruction Formats The nature of the Intel SSE architecture allows the use of existing instruction formats. Instructions use the ModR/M format and are preceded by the 0F prefix byte. In general, operations are not duplicated to provide two directions (i.e. separate load and store variants).
  • Page 491: Reserved Behavior and Software Compatibility

    In addition, the following abbreviations are used: • r32: Intel architecture 32-bit integer register. • xmm/m128: Indicates a 128-bit multimedia register or a 128-bit memory location. • xmm/m64: Indicates a 128-bit multimedia register or a 64-bit memory location.
  • Page 492: Key to SSE Naming Convention

    • imm8: Indicates an immediate 8-bit operand. • ib: Indicates that an immediate byte operand follows the opcode, ModR/M byte or scaled-indexing byte. When there is ambiguity, xmm1 indicates the first source operand and xmm2 the second source operand. Table 4-9 describes the naming conventions used in the SSE instruction mnemonics.
  • Page 493 ADDPS: Packed Single-FP Add Opcode Instruction Description 0F,58,/r ADDPS xmm1, xmm2/m128 Add packed SP FP numbers from XMM2/Mem to XMM1. Operation: xmm1[31-0] = xmm1[31-0] + xmm2/m128[31-0]; xmm1[63-32] = xmm1[63-32] + xmm2/m128[63-32]; xmm1[95-64] = xmm1[95-64] + xmm2/m128[95-64]; xmm1[127-96] = xmm1[127-96] + xmm2/m128[127-96]; Description: The ADDPS instruction adds the packed SP FP numbers of its two operands.
  • Page 494 ADDSS: Scalar Single-FP Add Opcode Instruction Description F3,0F,58,/r ADDSS xmm1, xmm2/m32 Add the lower SP FP number from XMM2/Mem to XMM1. Operation: xmm1[31-0] = xmm1[31-0] + xmm2/m32[31-0]; xmm1[63-32] = xmm1[63-32]; xmm1[95-64] = xmm1[95-64]; xmm1[127-96] = xmm1[127-96]; Description: The ADDSS instruction adds the lower SP FP numbers of its two operands; the upper 3 fields are passed through from xmm1.
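The packed/scalar distinction in the two Operation listings above can be restated as a small C model. The type `xmm_t` and both functions are hypothetical; element f[0] is assumed to stand for bits 31-0 and f[3] for bits 127-96.

```c
/* Hypothetical model of a 128-bit register holding four SP FP fields. */
typedef struct { float f[4]; } xmm_t;

/* ADDPS: all four fields are added. */
static xmm_t addps_model(xmm_t xmm1, xmm_t xmm2) {
    for (int i = 0; i < 4; i++)
        xmm1.f[i] += xmm2.f[i];
    return xmm1;
}

/* ADDSS: only the lowest field is added; the upper three pass through from xmm1. */
static xmm_t addss_model(xmm_t xmm1, xmm_t xmm2) {
    xmm1.f[0] += xmm2.f[0];
    return xmm1;
}
```

The same pattern applies to the other PS/SS pairs in this reference (SUBPS/SUBSS, MULPS/MULSS, DIVPS/DIVSS), with only the arithmetic operator changing.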
  • Page 495 ANDNPS: Bit-wise Logical And Not for Single-FP Opcode Instruction Description 0F,55,/r ANDNPS xmm1, xmm2/m128 Invert the 128 bits in XMM1 and then AND the result with 128 bits from XMM2/Mem. Operation: xmm1[127-0] = ~(xmm1[127-0]) & xmm2/m128[127-0]; Description: The ANDNPS instruction returns a bit-wise logical AND between the complement of XMM1 and XMM2/Mem.
  • Page 496 ANDPS: Bit-wise Logical And for Single-FP Opcode Instruction Description 0F,54,/r ANDPS xmm1, xmm2/m128 Logical AND of 128 bits from XMM2/Mem to XMM1 register. Operation: xmm1[127-0] &= xmm2/m128[127-0]; Description: The ANDPS instruction returns a bit-wise logical AND between XMM1 and XMM2/Mem. FP Exceptions: General protection exception if not aligned on a 16-byte boundary, regardless of segment.
  • Page 497 CMPPS: Packed Single-FP Compare Opcode Instruction Description 0F,C2,/r,ib CMPPS xmm1, xmm2/m128, imm8 Compare packed SP FP numbers from XMM2/Mem to packed SP FP numbers in XMM1 register using imm8 as predicate. Operation: switch (imm8) { case eq: op = eq; case lt: op = lt;...
  • Page 498 CMPPS: Packed Single-FP Compare (Continued) Predicate table (Description, Relation, Emulation, imm8 Encoding, Result if NaN Operand, QNaN Operand Signals Invalid): equal, xmm1 == xmm2, imm8 000B, False; less-than, xmm1 < xmm2, imm8 001B, False; less-than-or-equal, xmm1 <= xmm2, imm8 010B, False; greater than, xmm1 > xmm2, emulated by swap, protect, lt, False; greater-than-or-equal...
  • Page 499 CMPPS: Packed Single-FP Compare (Continued) Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR.dfl is 1, NaT Register Consumption Fault Itanium Mem Faults VHPT Data Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault Compilers and assemblers should implement the following 2-operand pseudo-ops in...
  • Page 500 CMPSS: Scalar Single-FP Compare Opcode Instruction Description F3,0F,C2,/r,ib CMPSS xmm1, xmm2/m32, imm8 Compare lowest SP FP number from XMM2/Mem to lowest SP FP number in XMM1 register using imm8 as predicate. Operation: switch (imm8) { case eq: op = eq; case lt: op = lt;...
  • Page 501 CMPSS: Scalar Single-FP Compare (Continued) Predicate table (Description, Relation, Emulation, imm8 Encoding, Result if NaN Operand, QNaN Operand Signals Invalid): equal, xmm1 == xmm2, imm8 000B, False; less-than, xmm1 < xmm2, imm8 001B, False; less-than-or-equal, xmm1 <= xmm2, imm8 010B, False; greater than, xmm1 > xmm2, emulated by swap, protect, lt, False; greater-than-or-equal...
  • Page 502 CMPSS: Scalar Single-FP Compare (Continued) Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR.dfl is 1, NaT Register Consumption Fault Itanium Mem Faults VHPT Data Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault Compilers and assemblers should implement the following 2-operand pseudo-ops in...
  • Page 503 COMISS: Scalar Ordered Single-FP Compare and set EFLAGS Opcode Instruction Description 0F,2F,/r COMISS xmm1, xmm2/m32 Compare lower SP FP number in XMM1 register with lower SP FP number in XMM2/Mem and set the status flags accordingly. Operation: switch (xmm1[31-0] <> xmm2/m32[31-0]) { OF,SF,AF = 000;...
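The flag outcomes of the comparison can be modeled in C. This sketch uses a hypothetical struct and function name and encodes the usual IA-32 results: unordered sets ZF, PF, CF = 111; greater than gives 000; less than gives 001; equal gives 100; OF, SF, and AF are always cleared. UCOMISS writes the same flags; the two instructions differ only in which NaN inputs signal an invalid-operation exception (COMISS for QNaN and SNaN, UCOMISS for SNaN only), which this model does not attempt to reproduce.

```c
/* Hypothetical model of the EFLAGS bits written by COMISS/UCOMISS. */
typedef struct { unsigned zf, pf, cf, of, sf, af; } eflags_model;

static eflags_model comiss_flags_model(float xmm1, float xmm2) {
    eflags_model f = {0, 0, 0, 0, 0, 0};   /* OF, SF, AF are always cleared */
    if (xmm1 != xmm1 || xmm2 != xmm2) {     /* a NaN compares unequal to itself */
        f.zf = f.pf = f.cf = 1;             /* unordered: ZF,PF,CF = 111 */
    } else if (xmm1 < xmm2) {
        f.cf = 1;                           /* less than: 001 */
    } else if (xmm1 == xmm2) {
        f.zf = 1;                           /* equal: 100 */
    }                                       /* greater than: 000 */
    return f;
}
```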
  • Page 504 COMISS: Scalar Ordered Single-FP Compare and set EFLAGS (Continued) Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR.dfl is 1, NaT Register Consumption Fault Itanium Mem Faults VHPT Data Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault...
  • Page 505 CVTPI2PS: Packed Signed INT32 to Packed Single-FP Conversion Opcode Instruction Description 0F,2A,/r CVTPI2PS xmm, mm/m64 Convert two 32-bit signed integers from MM/Mem to two SP FP numbers. Operation: xmm[31-0] = (float) (mm/m64[31-0]); xmm[63-32] = (float) (mm/m64[63-32]); xmm[95-64] = xmm[95-64]; xmm[127-96] = xmm[127-96]; Description: The CVTPI2PS instruction converts signed 32-bit integers to SP FP numbers;...
  • Page 506 CVTPI2PS: Packed Signed INT32 to Packed Single-FP Conversion (Continued) Comments: This instruction behaves identically to original MMX technology instructions in the presence of x87-FP instructions: • Transition from x87-FP to MMX technology (TOS=0, FP valid bits set to all valid). •...
  • Page 507 CVTPS2PI: Packed Single-FP to Packed INT32 Conversion Opcode Instruction Description 0F,2D,/r CVTPS2PI mm, xmm/m64 Convert lower 2 SP FP from XMM/Mem to 2 32-bit signed integers in MM using rounding specified by MXCSR. Operation: mm[31-0] = (int) (xmm/m64[31-0]); mm[63-32] = (int) (xmm/m64[63-32]); Description: The CVTPS2PI instruction converts the lower 2 SP FP numbers in xmm/m64 to signed 32-bit integers in mm;...
  • Page 508 CVTPS2PI: Packed Single-FP to Packed INT32 Conversion (Continued) • Transition from x87-FP to MMX technology (TOS=0, FP valid bits set to all valid). • MMX technology instructions write ones (1’s) to the exponent part of the corresponding x87-FP register. Prioritization for fault and assist behavior for CVTPS2PI is as follows: Memory source 1.
  • Page 509 CVTSI2SS: Scalar signed INT32 to Single-FP Conversion Opcode Instruction Description F3,0F,2A,/r CVTSI2SS xmm, r/m32 Convert one 32-bit signed integer from Integer Reg/Mem to one SP FP. Operation: xmm[31-0] = (float) (r/m32); xmm[63-32] = xmm[63-32]; xmm[95-64] = xmm[95-64]; xmm[127-96] = xmm[127-96]; Description: The CVTSI2SS instruction converts a signed 32-bit integer from memory or from a 32-bit integer register to an SP FP number;...
  • Page 510 CVTSS2SI: Scalar Single-FP to Signed INT32 Conversion Opcode Instruction Description F3,0F,2D,/r CVTSS2SI r32, xmm/m32 Convert one SP FP from XMM/Mem to one 32-bit signed integer using the rounding mode specified by MXCSR, and move the result to an integer register. Operation: r32 = (int) (xmm/m32[31-0]);...
  • Page 511 CVTTPS2PI: Packed Single-FP to Packed INT32 Conversion (truncate) Opcode Instruction Description 0F,2C,/r CVTTPS2PI mm, xmm/m64 Convert lower 2 SP FP from XMM/Mem to 2 32-bit signed integers in MM using truncation. Operation: mm[31-0] = (int) (xmm/m64[31-0]); mm[63-32] = (int) (xmm/m64[63-32]); Description: The CVTTPS2PI instruction converts the lower 2 SP FP numbers in xmm/m64 to 2 32-bit signed integers in mm;...
  • Page 512 CVTTPS2PI: Packed Single-FP to Packed INT32 Conversion (truncate) (Continued) Comments: This instruction behaves identically to original MMX technology instructions in the presence of x87-FP instructions, including: • Transition from x87-FP to MMX technology (TOS=0, FP valid bits set to all valid). •...
  • Page 513 CVTTSS2SI: Scalar Single-FP to signed INT32 Conversion (truncate) Opcode Instruction Description F3,0F,2C,/r CVTTSS2SI r32, xmm/m32 Convert lowest SP FP from XMM/Mem to one 32-bit signed integer using truncation, and move the result to an integer register. Operation: r32 = (int) (xmm/m32[31-0]); Description: The CVTTSS2SI instruction converts an SP FP number to a signed 32-bit integer and returns it in the 32-bit integer register;...
  • Page 514 DIVPS: Packed Single-FP Divide Opcode Instruction Description 0F,5E,/r DIVPS xmm1, xmm2/m128 Divide packed SP FP numbers in XMM1 by XMM2/Mem. Operation: xmm1[31-0] = xmm1[31-0] / (xmm2/m128[31-0]); xmm1[63-32] = xmm1[63-32] / (xmm2/m128[63-32]); xmm1[95-64] = xmm1[95-64] / (xmm2/m128[95-64]); xmm1[127-96] = xmm1[127-96] / (xmm2/m128[127-96]); Description: The DIVPS instruction divides the packed SP FP numbers of its first operand by the corresponding numbers of its second operand.
  • Page 515 DIVSS: Scalar Single-FP Divide Opcode Instruction Description F3,0F,5E,/r DIVSS xmm1, xmm2/m32 Divide lower SP FP numbers in XMM1 by XMM2/Mem. Operation: xmm1[31-0] = xmm1[31-0] / (xmm2/m32[31-0]); xmm1[63-32] = xmm1[63-32]; xmm1[95-64] = xmm1[95-64]; xmm1[127-96] = xmm1[127-96]; Description: The DIVSS instruction divides the lowest SP FP number of the first operand by the lowest SP FP number of the second operand; the upper 3 fields are passed through from xmm1.
  • Page 516 Opcode Instruction Description 0F,AE,/1 FXRSTOR m512byte Load FP/Intel MMX technology and SSE state from m512byte. Operation: FP and MMX technology state and SSE state = m512byte; Description: The FXRSTOR instruction reloads the FP and MMX technology state and SSE state (environment and registers) from the memory area defined by m512byte.
  • Page 517 Three fields in the floating-point save area contain reserved bits that are not indicated in the table: • FOP: The lower 11 bits contain the opcode; the upper 5 bits are reserved. • IP & DP: 32-bit mode: 32-bit IP-offset. •...
  • Page 518 FXRSTOR: Restore FP and Intel MMX™ Technology State and SSE State (Continued) Virtual 8086 Mode Exceptions: Same exceptions as in Real Address Mode; #AC for unaligned memory reference if the current privilege level is 3; #PF (fault-code) for a page fault.
  • Page 519 Instruction Description 0F,AE,/0 FXSAVE m512byte Store FP and Intel MMX technology state and SSE state to m512byte. Operation: m512byte = FP and MMX technology state and SSE state; Description: The FXSAVE instruction writes the current FP and MMX technology state and SSE state (environment and registers) to the specified destination defined by m512byte.
  • Page 520 Three fields in the floating-point save area contain reserved bits that are not indicated in the table: • FOP: The lower 11 bits contain the opcode; the upper 5 bits are reserved. • IP & DP: 32-bit mode: 32-bit IP-offset. •...
  • Page 521 [Table: how the x87 FTW tag (for example, Special or Empty) is derived from the exponent (all 1's or all 0's), the fraction (all 0's), the J and M bits, and the FTW valid bit; for all legal combinations, an FTW valid bit of 0 yields Empty.] The J-bit is defined to be the 1-bit binary integer to the left of the binary point in the significand.
  • Page 522 FXSAVE: Store FP and Intel MMX™ Technology State and SSE State (Continued) Real Address Mode Exceptions: Interrupt 13 if any part of the operand would lie outside of the effective address space from 0 to 0FFFFH; #NM if CR0.EM = 1; #NM if TS bit in CR0 is set.
  • Page 523 The exception will occur only upon the next SSE instruction to cause this type of exception. The Intel SSE architecture uses only one exception flag for each exception. There is no provision for individual exception reporting within a packed data type.
  • Page 524 LDMXCSR: Load SSE Control/Status (Continued) Bit 15 (FZ) is used to turn on the Flush To Zero mode (bit is set). Turning on the Flush To Zero mode has the following effects during underflow situations: • Zero results are returned with the sign of the true result. •...
  • Page 525 LDMXCSR: Load SSE Control/Status (Continued) Additional Itanium System Environment Exceptions Itanium Reg Faults NaT Register Consumption Fault Itanium Mem Faults VHPT Data Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault The usage of Repeat (F2H, F3H) and Operand Size (66H) prefixes with LDMXCSR is...
  • Page 526 MAXPS: Packed Single-FP Maximum Opcode Instruction Description 0F,5F,/r MAXPS xmm1, xmm2/m128 Return the maximum SP FP numbers between XMM2/Mem and XMM1. Operation: xmm1[31-0] = (xmm1[31-0] == NAN) ? xmm2[31-0] : (xmm2[31-0] == NAN) ? xmm2[31-0] : (xmm1[31-0] > xmm2/m128[31-0]) ? xmm1[31-0] : xmm2/m128[31-0];...
  • Page 527 MAXPS: Packed Single-FP Maximum (Continued) Real Address Mode Exceptions: Interrupt 13 if any part of the operand would lie outside of the effective address space from 0 to 0FFFFH; #UD if CR0.EM = 1; #NM if TS bit in CR0 is set; #XM for an unmasked SSE numeric exception (CR4.OSXMMEXCPT =1);...
  • Page 528 MAXSS: Scalar Single-FP Maximum Opcode Instruction Description F3,0F,5F,/r MAXSS xmm1, xmm2/m32 Return the maximum SP FP number between the lower SP FP numbers from XMM2/Mem and XMM1. Operation: xmm1[31-0] = (xmm1[31-0] == NAN) ? xmm2[31-0] : (xmm2[31-0] == NAN) ? xmm2[31-0] : (xmm1[31-0] >...
  • Page 529 MAXSS: Scalar Single-FP Maximum (Continued) Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR.dfl is 1, NaT Register Consumption Fault Itanium Mem Faults VHPT Data Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault Note that if only one source is a NaN for these instructions, the Src2 operand (either...
  • Page 530 MINPS: Packed Single-FP Minimum Opcode Instruction Description 0F,5D,/r MINPS xmm1, xmm2/m128 Return the minimum SP FP numbers between XMM2/Mem and XMM1. Operation: xmm1[31-0] = (xmm1[31-0] == NAN) ? xmm2[31-0] : (xmm2[31-0] == NAN) ? xmm2[31-0] : (xmm1[31-0] < xmm2/m128[31-0]) ? xmm1[31-0] : xmm2/m128[31-0];...
  • Page 531 MINPS: Packed Single-FP Minimum (Continued) Real Address Mode Exceptions: Interrupt 13 if any part of the operand would lie outside of the effective address space from 0 to 0FFFFH; #UD if CR0.EM = 1; #NM if TS bit in CR0 is set; #XM for an unmasked SSE numeric exception (CR4.OSXMMEXCPT =1);...
  • Page 532 MINSS: Scalar Single-FP Minimum Opcode Instruction Description F3,0F,5D,/r MINSS xmm1, xmm2/m32 Return the minimum SP FP number between the lowest SP FP numbers from XMM2/Mem and XMM1. Operation: xmm1[31-0] = (xmm1[31-0] == NAN) ? xmm2[31-0] : (xmm2[31-0] == NAN) ? xmm2[31-0] : (xmm1[31-0] <...
  • Page 533 MINSS: Scalar Single-FP Minimum (Continued) Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR.dfl is 1, NaT Register Consumption Fault Itanium Mem Faults VHPT Data Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault Note that if only one source is a NaN for these instructions, the Src2 operand (either...
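The NaN rule in the MAXPS/MINPS and MAXSS/MINSS Operation pseudocode falls out of a plain strict comparison, as this C sketch with hypothetical names suggests: any ordered comparison involving a NaN is false, so the second source is returned whenever either input is a NaN. Equal inputs, including +0.0 and -0.0, also return the second source. Note that this differs from C99 `fmaxf`/`fminf`, which return the non-NaN argument.

```c
/* Model of the MAXSS/MINSS low-lane rule (also the per-lane rule of MAXPS/MINPS).
   src1 corresponds to xmm1 and src2 to xmm2/mem. */
static float maxss_model(float src1, float src2) {
    return (src1 > src2) ? src1 : src2;   /* false for NaN or equality: return src2 */
}

static float minss_model(float src1, float src2) {
    return (src1 < src2) ? src1 : src2;   /* false for NaN or equality: return src2 */
}
```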
  • Page 534 MOVAPS: Move Aligned Four Packed Single-FP Opcode Instruction Description 0F,28,/r MOVAPS xmm1, xmm2/m128 Move 128 bits representing 4 packed SP data from XMM2/Mem to XMM1 register. 0F,29,/r MOVAPS xmm2/m128, xmm1 Move 128 bits representing 4 packed SP data from XMM1 register to XMM2/Mem.
  • Page 535 MOVUPS should be used instead of MOVAPS. The usage of this instruction should be limited to the cases where the alignment restriction is easy to meet. Processors that support the Intel SSE architecture will provide optimal aligned performance for the MOVAPS instruction.
  • Page 536 MOVHLPS: Move High to Low Packed Single-FP Opcode Instruction Description 0F,12,/r MOVHLPS xmm1, xmm2 Move 64 bits representing higher two SP operands from XMM2 to lower two fields of XMM1 register. Operation: // move instruction xmm1[127-64] = xmm1[127-64]; xmm1[63-0] = xmm2[127-64]; Description: The upper 64 bits of the source register xmm2 are loaded into the lower 64 bits of the 128-bit register xmm1, and the upper 64 bits of xmm1 are left unchanged.
  • Page 537 MOVHPS: Move High Packed Single-FP Opcode Instruction Description 0F,16,/r MOVHPS xmm, m64 Move 64 bits representing two SP operands from Mem to upper two fields of XMM register. 0F,17,/r MOVHPS m64, xmm Move 64 bits representing two SP operands from upper two fields of XMM register to Mem.
  • Page 538 MOVHPS: Move High Packed Single-FP (Continued) Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR.dfl is 1, NaT Register Consumption Fault Itanium Mem Faults VHPT Data Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault...
  • Page 539 MOVLHPS: Move Low to High Packed Single-FP Opcode Instruction Description 0F,16,/r MOVLHPS xmm1, xmm2 Move 64 bits representing lower two SP operands from XMM2 to upper two fields of XMM1 register. Operation: // move instruction xmm1[127-64] = xmm2[63-0]; xmm1[63-0] = xmm1[63-0]; Description: The lower 64 bits of the source register xmm2 are loaded into the upper 64 bits of the 128-bit register xmm1, and the lower 64 bits of xmm1 are left unchanged.
  • Page 540 MOVLPS: Move Low Packed Single-FP Opcode Instruction Description 0F,12,/r MOVLPS xmm, m64 Move 64 bits representing two SP operands from Mem to lower two fields of XMM register. 0F,13,/r MOVLPS m64, xmm Move 64 bits representing two SP operands from lower two fields of XMM register to Mem.
  • Page 541 MOVLPS: Move Low Packed Single-FP (Continued) Additional Itanium System Environment Exceptions Itanium Reg Faults Disabled FP Register Fault if PSR.dfl is 1, NaT Register Consumption Fault Itanium Mem Faults VHPT Data Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault, Data Dirty Bit Fault...
  • Page 542 MOVMSKPS: Move Mask to Integer Opcode Instruction Description 0F,50,/r MOVMSKPS r32, xmm Move the single mask to r32. Operation: r32[3] = xmm[127]; r32[2] = xmm[95]; r32[1] = xmm[63]; r32[0] = xmm[31]; r32[7-4] = 0x0; r32[15-8] = 0x00; r32[31-16] = 0x0000; Description: The MOVMSKPS instruction returns to the integer register r32 a 4-bit mask formed from the most significant bits of each SP FP number of its operand.
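The Operation listing above gathers the sign bit (bit 31) of each field into bits 3-0 of r32. A minimal C sketch of the same bit gathering, using a hypothetical function name and assuming lane i holds bits 32i+31 to 32i:

```c
#include <stdint.h>
#include <string.h>

/* Model of MOVMSKPS r32, xmm: bit i of the result is the sign bit of lane i;
   bits 31-4 of r32 are cleared. */
static uint32_t movmskps_model(const float lanes[4]) {
    uint32_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t u;
        memcpy(&u, &lanes[i], sizeof u);
        r |= (u >> 31) << i;          /* copy the sign bit of lane i to bit i */
    }
    return r;
}
```

Because only the sign bit is examined, -0.0 contributes a 1 and any NaN contributes its sign bit. Applied to a CMPPS result, the mask yields one bit per lane that records whether the comparison was true.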
  • Page 543 MOVSS: Move Scalar Single-FP Opcode Instruction Description F3,0F,10,/r MOVSS xmm1, xmm2/m32 Move 32 bits representing one scalar SP operand from XMM2/Mem to XMM1 register. F3,0F,11,/r MOVSS xmm2/m32, xmm1 Move 32 bits representing one scalar SP operand from XMM1 register to XMM2/Mem. Operation: if (destination == xmm1) { if (source == m32) {...
  • Page 544 MOVSS: Move Scalar Single-FP (Continued) xmm2[127-96] = xmm2[127-96]; Description: The linear address corresponds to the address of the least-significant byte of the referenced memory data. When a memory address is indicated, the 4 bytes of data at memory location m32 are loaded or stored. When the load form of this operation is used, the 32 bits from memory are copied into the lower 32 bits of the 128-bit register xmm, and the 96 most significant bits are cleared.
  • Page 545 MOVUPS: Move Unaligned Four Packed Single-FP Opcode Instruction Description 0F,10,/r MOVUPS xmm1, xmm2/m128 Move 128 bits representing four SP data from XMM2/Mem to XMM1 register. 0F,11,/r MOVUPS xmm2/m128, xmm1 Move 128 bits representing four SP data from XMM1 register to XMM2/Mem.
  • Page 546 MOVUPS: Move Unaligned Four Packed Single-FP (Continued) Protected Mode Exceptions: #GP(0) for an illegal memory operand effective address in the CS, DS, ES, FS or GS segments; #SS(0) for an illegal address in the SS segment; #PF (fault-code) for a page fault;...
  • Page 547 MULPS: Packed Single-FP Multiply Opcode Instruction Description 0F,59,/r MULPS xmm1, xmm2/m128 Multiply packed SP FP numbers in XMM1 by XMM2/Mem. Operation: xmm1[31-0] = xmm1[31-0] * xmm2/m128[31-0]; xmm1[63-32] = xmm1[63-32] * xmm2/m128[63-32]; xmm1[95-64] = xmm1[95-64] * xmm2/m128[95-64]; xmm1[127-96] = xmm1[127-96] * xmm2/m128[127-96]; Description: The MULPS instruction multiplies the packed SP FP numbers of its two operands.
  • Page 548 MULSS: Scalar Single-FP Multiply Opcode Instruction Description F3,0F,59,/r MULSS xmm1, xmm2/m32 Multiply the lowest SP FP number in XMM1 by the lowest SP FP number in XMM2/Mem. Operation: xmm1[31-0] = xmm1[31-0] * xmm2/m32[31-0]; xmm1[63-32] = xmm1[63-32]; xmm1[95-64] = xmm1[95-64]; xmm1[127-96] = xmm1[127-96]; Description: The MULSS instruction multiplies the lowest SP FP numbers of its two operands; the upper 3 fields are passed through from xmm1.
  • Page 549 ORPS: Bit-wise Logical OR for Single-FP Data Opcode Instruction Description 0F,56,/r ORPS xmm1, xmm2/m128 OR 128 bits from XMM2/Mem to XMM1 register. Operation: xmm1[127-0] |= xmm2/m128[127-0]; Description: The ORPS instruction returns a bit-wise logical OR between xmm1 and xmm2/mem. FP Exceptions: General protection exception if not aligned on a 16-byte boundary, regardless of segment.
  • Page 550 RCPPS: Packed Single-FP Reciprocal Opcode Instruction Description 0F,53,/r RCPPS xmm1, xmm2/m128 Return a packed approximation of the reciprocal of XMM2/Mem. Operation: xmm1[31-0] = approx (1.0/(xmm2/m128[31-0])); xmm1[63-32] = approx (1.0/(xmm2/m128[63-32])); xmm1[95-64] = approx (1.0/(xmm2/m128[95-64])); xmm1[127-96] = approx (1.0/(xmm2/m128[127-96])); Description: RCPPS returns an approximation of the reciprocal of the SP FP numbers from xmm2/m128.
  • Page 551 RCPPS: Packed Single-FP Reciprocal (Continued) For input values x which satisfy 1.11111111110100000000001 × 2^125 <= |x| <= 1.00000000000110000000000 × 2^126, flush-to-zero might or might not occur, depending on the implementation (this interval contains 6144 + 3072 = 9216 single-precision floating-point numbers). Results are guaranteed to be tiny, and therefore flushed to zero, for input values x which satisfy |x| >= 1.00000000000110000000001...
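The count of 9216 can be verified by using the fact that positive single-precision numbers are ordered like their bit patterns, so the number of values in a closed interval is the difference of the two patterns plus one. This sketch assumes the interval bounds use exponents 2^125 and 2^126 (inputs whose reciprocals lie next to the smallest normalized magnitude 2^-126); the helper names are hypothetical. The 23-bit fraction fields 11111111110100000000001B and 00000000000110000000000B are 7FE801H and C00H.

```c
#include <stdint.h>

/* Bit pattern of a positive normalized SP number from its unbiased exponent
   and 23-bit fraction field. */
static uint32_t sp_pattern(int unbiased_exp, uint32_t fraction) {
    return ((uint32_t)(unbiased_exp + 127) << 23) | (fraction & 0x7FFFFFu);
}

/* Number of positive SP values in [lo, hi], assuming lo <= hi. */
static uint32_t count_between(uint32_t lo, uint32_t hi) {
    return hi - lo + 1;
}
```

Splitting at 2^126 reproduces the manual's breakdown: 6144 values from the lower bound up to and including 2^126, and 3072 values above it.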
  • Page 552 RCPSS: Scalar Single-FP Reciprocal Opcode Instruction Description F3,0F,53,/r RCPSS xmm1, xmm2/m32 Return an approximation of the reciprocal of the lower SP FP number in XMM2/Mem. Operation: xmm1[31-0] = approx (1.0/(xmm2/m32[31-0])); xmm1[63-32] = xmm1[63-32]; xmm1[95-64] = xmm1[95-64]; xmm1[127-96] = xmm1[127-96]; Description: RCPSS returns an approximation of the reciprocal of the lower SP FP number from xmm2/m32;...
  • Page 553 RCPSS: Scalar Single-FP Reciprocal (Continued) For input values x which satisfy 1.11111111110100000000001 × 2^125 <= |x| <= 1.00000000000110000000000 × 2^126, flush-to-zero might or might not occur, depending on the implementation (this interval contains 6144 + 3072 = 9216 single-precision floating-point numbers). Results are guaranteed to be tiny, and therefore flushed to zero, for input values x which satisfy |x| >= 1.00000000000110000000001...
  • Page 554 RSQRTPS: Packed Single-FP Square Root Reciprocal Opcode Instruction Description 0F,52,/r RSQRTPS xmm1, xmm2/m128 Return a packed approximation of the square root of the reciprocal of XMM2/Mem. Operation: xmm1[31-0] = approx (1.0/sqrt(xmm2/m128[31-0])); xmm1[63-32] = approx (1.0/sqrt(xmm2/m128[63-32])); xmm1[95-64] = approx (1.0/sqrt(xmm2/m128[95-64])); xmm1[127-96] = approx (1.0/sqrt(xmm2/m128[127-96])); Description: RSQRTPS returns an approximation of the reciprocal of the square root of the SP FP numbers from xmm2/m128.
  • Page 555 RSQRTSS: Scalar Single-FP Square Root Reciprocal Opcode Instruction Description F3,0F,52,/r RSQRTSS xmm1, xmm2/m32 Return an approximation of the square root of the reciprocal of the lowest SP FP number in XMM2/Mem. Operation: xmm1[31-0] = approx (1.0/sqrt(xmm2/m32[31-0])); xmm1[63-32] = xmm1[63-32]; xmm1[95-64] = xmm1[95-64];...
  • Page 556 SHUFPS: Shuffle Single-FP Opcode Instruction Description 0F,C6,/r, ib SHUFPS xmm1, xmm2/m128, imm8 Shuffle Single. Operation: fp_select = (imm8 >> 0) & 0x3; xmm1[31-0] = (fp_select == 0) ? xmm1[31-0] : (fp_select == 1) ? xmm1[63-32] : (fp_select == 2) ? xmm1[95-64] : xmm1[127-96]; fp_select = (imm8 >>...
  • Page 557 SHUFPS: Shuffle Single-FP (Continued) Example: [figure: source xmm1 = {X4 ... X1} and source xmm2/m128 = {Y4 ... Y1}; the two low fields of the result in xmm1 are selected from the X elements and the two high fields from the Y elements.] FP Exceptions: General protection exception if not aligned on a 16-byte boundary, regardless of segment. Numeric Exceptions: None. Protected Mode Exceptions: #GP(0) for an illegal memory operand effective address in the CS, DS, ES, FS or GS segments;...
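The selection rule can be written out as a C model. This is a sketch with hypothetical names: following the Operation pseudocode, imm8 bits 1-0 pick result field 0 from xmm1, and (by the same pattern, which the truncated listing continues) bits 3-2 pick field 1 from xmm1, bits 5-4 pick field 2 from xmm2/m128, and bits 7-6 pick field 3 from xmm2/m128. The model reads both sources by value, so the result does not depend on the order in which fields are written.

```c
#include <stdint.h>

typedef struct { float f[4]; } xmm_t;   /* f[0] holds bits 31-0 */

/* Model of SHUFPS xmm1, xmm2/m128, imm8. */
static xmm_t shufps_model(xmm_t xmm1, xmm_t xmm2, uint8_t imm8) {
    xmm_t r;
    r.f[0] = xmm1.f[(imm8 >> 0) & 0x3];  /* low two fields come from xmm1 */
    r.f[1] = xmm1.f[(imm8 >> 2) & 0x3];
    r.f[2] = xmm2.f[(imm8 >> 4) & 0x3];  /* high two fields come from xmm2/m128 */
    r.f[3] = xmm2.f[(imm8 >> 6) & 0x3];
    return r;
}
```

For example, imm8 = 1BH reverses the selected fields within each half, and imm8 = 44H yields {X1, X2, Y1, Y2}, the same layout as MOVLHPS.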
  • Page 558 SQRTPS: Packed Single-FP Square Root Opcode Instruction Description 0F,51,/r SQRTPS xmm1, xmm2/m128 Square root of the packed SP FP numbers in XMM2/Mem. Operation: xmm1[31-0] = sqrt (xmm2/m128[31-0]); xmm1[63-32] = sqrt (xmm2/m128[63-32]); xmm1[95-64] = sqrt (xmm2/m128[95-64]); xmm1[127-96] = sqrt (xmm2/m128[127-96]); Description: The SQRTPS instruction returns the square roots of the packed SP FP numbers from xmm2/m128.
  • Page 559 SQRTSS: Scalar Single-FP Square Root Opcode Instruction Description F3,0F,51,/r SQRTSS xmm1, xmm2/m32 Square root of the lower SP FP number in XMM2/Mem. Operation: xmm1[31-0] = sqrt (xmm2/m32[31-0]); xmm1[63-32] = xmm1[63-32]; xmm1[95-64] = xmm1[95-64]; xmm1[127-96] = xmm1[127-96]; Description: The SQRTSS instruction returns the square root of the lowest SP FP number of its source operand; the upper 3 fields are passed through from xmm1.
  • Page 560 STMXCSR: Store SSE Control/Status Opcode Instruction Description 0F,AE,/3 STMXCSR m32 Store SSE control/status word to m32. Operation: m32 = MXCSR; Description: The MXCSR control/status register is used to enable masked/unmasked exception handling, to set rounding modes, to set flush-to-zero mode, and to view exception status flags.
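Software that reads the value stored by STMXCSR typically decodes it field by field. The sketch below assumes the standard IA-32 MXCSR layout: status flags IE, DE, ZE, OE, UE, PE in bits 0-5; exception masks IM, DM, ZM, OM, UM, PM in bits 7-12; rounding control in bits 14-13; and FZ in bit 15 (bit 6 and bits 31-16 are not decoded). The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Decoded fields of a 32-bit MXCSR value. */
typedef struct {
    unsigned flags;     /* bits 5-0: PE UE OE ZE DE IE exception status flags */
    unsigned masks;     /* bits 12-7: PM UM OM ZM DM IM exception masks */
    unsigned rounding;  /* bits 14-13: 00 nearest, 01 down, 10 up, 11 toward zero */
    unsigned ftz;       /* bit 15: flush-to-zero */
} mxcsr_fields;

static mxcsr_fields mxcsr_decode(uint32_t m) {
    mxcsr_fields f;
    f.flags    = m & 0x3Fu;
    f.masks    = (m >> 7) & 0x3Fu;
    f.rounding = (m >> 13) & 0x3u;
    f.ftz      = (m >> 15) & 0x1u;
    return f;
}
```

Decoding 1F80H, the commonly documented power-up value, gives all six exceptions masked, no flags set, round to nearest, and flush-to-zero off.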
  • Page 561 SUBPS: Packed Single-FP Subtract Opcode Instruction Description 0F,5C,/r SUBPS xmm1, xmm2/m128 Subtract packed SP FP numbers in XMM2/Mem from XMM1. Operation: xmm1[31-0] = xmm1[31-0] - xmm2/m128[31-0]; xmm1[63-32] = xmm1[63-32] - xmm2/m128[63-32]; xmm1[95-64] = xmm1[95-64] - xmm2/m128[95-64]; xmm1[127-96] = xmm1[127-96] - xmm2/m128[127-96]; Description: The SUBPS instruction subtracts the packed SP FP numbers of its second operand from those of its first operand.
  • Page 562 SUBSS: Scalar Single-FP Subtract Opcode Instruction Description F3,0F,5C,/r SUBSS xmm1, xmm2/m32 Subtract the lower SP FP numbers in XMM2/Mem from XMM1. Operation: xmm1[31-0] = xmm1[31-0] - xmm2/m32[31-0]; xmm1[63-32] = xmm1[63-32]; xmm1[95-64] = xmm1[95-64]; xmm1[127-96] = xmm1[127-96]; Description: The SUBSS instruction subtracts the lower SP FP number of its second operand from the lower SP FP number of its first operand; the upper 3 fields are passed through from xmm1. FP Exceptions: None.
  • Page 563 UCOMISS: Unordered Scalar Single-FP Compare and Set EFLAGS Opcode Instruction Description 0F,2E,/r UCOMISS xmm1, xmm2/m32 Compare lower SP FP number in XMM1 register with lower SP FP number in XMM2/Mem and set the status flags accordingly. Operation: switch (xmm1[31-0] <> xmm2/m32[31-0]) { OF,SF,AF = 000;...
  • Page 564 UCOMISS: Unordered Scalar Single-FP Compare and Set EFLAGS (Continued). Additional Itanium System Environment Exceptions: Itanium Reg Faults: Disabled FP Register Fault if PSR.dfl is 1, NaT Register Consumption Fault. Itanium Mem Faults: VHPT Data Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault...
  • Page 565 UNPCKHPS: Unpack High Packed Single-FP Data. Opcode 0F,15,/r, UNPCKHPS xmm1, xmm2/m128: interleave the SP FP numbers from the high halves of XMM1 and XMM2/Mem into the XMM1 register. Operation: xmm1[31-0] = xmm1[95-64]; xmm1[63-32] = xmm2/m128[95-64]; xmm1[95-64] = xmm1[127-96]; xmm1[127-96] = xmm2/m128[127-96]; Description: The UNPCKHPS instruction performs an interleaved unpack of the high-order data elements of XMM1 and XMM2/Mem.
  • Page 566 UNPCKHPS: Unpack High Packed Single-FP Data (Continued). Additional Itanium System Environment Exceptions: Itanium Reg Faults: Disabled FP Register Fault if PSR.dfl is 1, NaT Register Consumption Fault. Itanium Mem Faults: VHPT Data Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault...
  • Page 567 UNPCKLPS: Unpack Low Packed Single-FP Data. Opcode 0F,14,/r, UNPCKLPS xmm1, xmm2/m128: interleave the SP FP numbers from the low halves of XMM1 and XMM2/Mem into the XMM1 register. Operation (each right-hand side uses the value XMM1 held before the instruction executed): xmm1[31-0] = xmm1[31-0]; xmm1[63-32] = xmm2/m128[31-0]; xmm1[95-64] = xmm1[63-32]; xmm1[127-96] = xmm2/m128[63-32]; Description: The UNPCKLPS instruction performs an interleaved unpack of the low-order data elements of XMM1 and XMM2/Mem.
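Because UNPCKLPS reads an XMM1 lane (bits 63-32) that it also overwrites, a software model should build the result from the original operands. This minimal C sketch does that for both unpack instructions; `xmm4`, `unpcklps` and `unpckhps` are hypothetical names, with lane 0 holding bits 31-0.

```c
/* Hypothetical XMM model: f[0] = bits 31-0 ... f[3] = bits 127-96. */
typedef struct { float f[4]; } xmm4;

/* UNPCKLPS: interleave the low halves -> { a0, b0, a1, b1 }. */
static xmm4 unpcklps(xmm4 a, xmm4 b) {
    xmm4 r = {{ a.f[0], b.f[0], a.f[1], b.f[1] }};
    return r;
}

/* UNPCKHPS: interleave the high halves -> { a2, b2, a3, b3 }. */
static xmm4 unpckhps(xmm4 a, xmm4 b) {
    xmm4 r = {{ a.f[2], b.f[2], a.f[3], b.f[3] }};
    return r;
}
```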
  • Page 568 UNPCKLPS: Unpack Low Packed Single-FP Data (Continued). Additional Itanium System Environment Exceptions: Itanium Reg Faults: Disabled FP Register Fault if PSR.dfl is 1, NaT Register Consumption Fault. Itanium Mem Faults: VHPT Data Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data Key Miss Fault, Data Key Permission Fault, Data Access Rights Fault, Data Access Bit Fault...
  • Page 569: SIMD Integer Instruction Set Extensions

    XORPS: Bit-wise Logical XOR for Single-FP Data. Opcode 0F,57,/r, XORPS xmm1, xmm2/m128: XOR the 128 bits of XMM2/Mem into the XMM1 register. Operation: xmm1[127-0] ^= xmm2/m128[127-0]; Description: The XORPS instruction returns the bit-wise logical XOR of XMM1 and XMM2/Mem. FP Exceptions: General protection exception if the memory operand is not aligned on a 16-byte boundary, regardless of segment.
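Because XORPS operates on raw bits, a common use (a widely used idiom, not something this manual states) is flipping the sign bit of each SP FP value by XORing it with 0x80000000 per lane. The C sketch below models that; `xmm_bits`, `xorps` and `negate4` are hypothetical names, and a 32-bit IEEE 754 `float` is assumed.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical bit-level model of a 128-bit XMM register. */
typedef struct { uint32_t u[4]; } xmm_bits;

/* XORPS xmm1, xmm2/m128: xmm1[127-0] ^= xmm2/m128[127-0]. */
static xmm_bits xorps(xmm_bits xmm1, xmm_bits xmm2) {
    for (int i = 0; i < 4; i++)
        xmm1.u[i] ^= xmm2.u[i];
    return xmm1;
}

/* Negate four floats by XORing each lane with the sign-bit mask. */
static void negate4(float v[4]) {
    const xmm_bits sign = {{ 0x80000000u, 0x80000000u, 0x80000000u, 0x80000000u }};
    xmm_bits x;
    memcpy(x.u, v, sizeof x.u);   /* reinterpret the float bits without aliasing issues */
    x = xorps(x, sign);
    memcpy(v, x.u, sizeof x.u);
}
```

XORing a register with itself is likewise a common way to clear it to zero.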
  • Page 570 PAVGB/PAVGW: Packed Average. Opcode 0F,E0,/r, PAVGB mm1, mm2/m64: average with rounding the packed unsigned bytes from MM2/Mem and the packed bytes in the MM1 register. Opcode 0F,E3,/r, PAVGW mm1, mm2/m64: average with rounding the packed unsigned words from MM2/Mem and the packed words in the MM1 register. Operation: if (instruction == PAVGB) { x[0]...
  • Page 571 PAVGB/PAVGW: Packed Average (Continued). temp[i] = zero_ext(x[i], 16) + zero_ext(y[i], 16); res[i] = (temp[i] + 1) >> 1; mm1[15-0] = res[0]; mm1[31-16] = res[1]; mm1[47-32] = res[2]; mm1[63-48] = res[3]; Description: The PAVG instructions add the unsigned data elements of the source operand to the unsigned data elements of the destination register, along with a carry-in. The results of the add are then each independently right-shifted by one bit position.
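The add, carry-in and shift described above can be modeled lane by lane on a 64-bit integer standing in for an MMX register. In this C sketch the helpers `pavgb` and `pavgw` are hypothetical; the intermediate sum is held in a 32-bit type so it cannot overflow for either lane width.

```c
#include <stdint.h>

/* PAVGB: eight unsigned byte lanes, res = (x + y + 1) >> 1. */
static uint64_t pavgb(uint64_t mm1, uint64_t mm2) {
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint32_t x = (uint32_t)((mm1 >> (8 * i)) & 0xFFu);
        uint32_t y = (uint32_t)((mm2 >> (8 * i)) & 0xFFu);
        uint32_t temp = x + y;                         /* zero-extended add */
        r |= (uint64_t)((temp + 1) >> 1) << (8 * i);   /* carry-in, then shift right by one */
    }
    return r;
}

/* PAVGW: four unsigned word lanes, same rounding rule. */
static uint64_t pavgw(uint64_t mm1, uint64_t mm2) {
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t x = (uint32_t)((mm1 >> (16 * i)) & 0xFFFFu);
        uint32_t y = (uint32_t)((mm2 >> (16 * i)) & 0xFFFFu);
        r |= (uint64_t)((x + y + 1) >> 1) << (16 * i);
    }
    return r;
}
```

Adding 1 before the shift is what makes the average round halves upward: (1 + 2 + 1) >> 1 gives 2 rather than 1.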
  • Page 572 PEXTRW: Extract Word. Opcode 0F,C5,/r,ib, PEXTRW r32, mm, imm8: extract the word selected by imm8 from MM and move it to a 32-bit integer register. Operation: sel = imm8 & 0x3; mm_temp = (mm >> (sel * 16)) & 0xffff; r[15-0] = mm_temp[15-0];...
  • Page 573 PINSRW: Insert Word. Opcode 0F,C4,/r,ib, PINSRW mm, r32/m16, imm8: insert the word from the lower half of r32 or from Mem16 into the position in MM selected by imm8, leaving the other words unchanged. Operation: sel = imm8 & 0x3; mask = (sel == 0) ? 0x000000000000ffff : (sel == 1) ? 0x00000000ffff0000 : (sel == 2) ? 0x0000ffff00000000 :...
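Both instructions select a word with the low two bits of imm8. A minimal C sketch on a 64-bit stand-in for an MMX register follows; `pextrw` and `pinsrw` are hypothetical helper names, and zeroing bits 31-16 of r32 in PEXTRW is an assumption about the part of the Operation that is cut off above.

```c
#include <stdint.h>

/* PEXTRW r32, mm, imm8: return word imm8[1-0] of mm, zero-extended to 32 bits. */
static uint32_t pextrw(uint64_t mm, uint8_t imm8) {
    unsigned sel = imm8 & 0x3u;
    return (uint32_t)((mm >> (sel * 16)) & 0xFFFFu);
}

/* PINSRW mm, r32, imm8: replace word imm8[1-0] of mm with r32[15-0]. */
static uint64_t pinsrw(uint64_t mm, uint32_t r32, uint8_t imm8) {
    unsigned sel = imm8 & 0x3u;
    uint64_t mask = 0xFFFFull << (sel * 16);                /* same masks as the Operation */
    uint64_t word = (uint64_t)(r32 & 0xFFFFu) << (sel * 16);
    return (mm & ~mask) | word;                             /* other words untouched */
}
```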
  • Page 574 PMAXSW: Packed Signed Integer Word Maximum. Opcode 0F,EE,/r, PMAXSW mm1, mm2/m64: return the maximum of each pair of corresponding signed words in MM1 and MM2/Mem. Operation: mm1[15-0] = (mm1[15-0] > mm2/m64[15-0]) ? mm1[15-0] : mm2/m64[15-0]; mm1[31-16] = (mm1[31-16] > mm2/m64[31-16]) ? mm1[31-16] : mm2/m64[31-16]; mm1[47-32] = (mm1[47-32] >...
  • Page 575 PMAXUB: Packed Unsigned Integer Byte Maximum. Opcode 0F,DE,/r, PMAXUB mm1, mm2/m64: return the maximum of each pair of corresponding unsigned bytes in MM1 and MM2/Mem. Operation: mm1[7-0] = (mm1[7-0] > mm2/m64[7-0]) ? mm1[7-0] : mm2/m64[7-0]; mm1[15-8] = (mm1[15-8] > mm2/m64[15-8]) ? mm1[15-8] : mm2/m64[15-8]; mm1[23-16] = (mm1[23-16] >...
  • Page 576 PMINSW: Packed Signed Integer Word Minimum. Opcode 0F,EA,/r, PMINSW mm1, mm2/m64: return the minimum of each pair of corresponding signed words in MM1 and MM2/Mem. Operation: mm1[15-0] = (mm1[15-0] < mm2/m64[15-0]) ? mm1[15-0] : mm2/m64[15-0]; mm1[31-16] = (mm1[31-16] < mm2/m64[31-16]) ? mm1[31-16] : mm2/m64[31-16]; mm1[47-32] = (mm1[47-32] <...
  • Page 577 PMINUB: Packed Unsigned Integer Byte Minimum. Opcode 0F,DA,/r, PMINUB mm1, mm2/m64: return the minimum of each pair of corresponding unsigned bytes in MM1 and MM2/Mem. Operation: mm1[7-0] = (mm1[7-0] < mm2/m64[7-0]) ? mm1[7-0] : mm2/m64[7-0]; mm1[15-8] = (mm1[15-8] < mm2/m64[15-8]) ? mm1[15-8] : mm2/m64[15-8]; mm1[23-16] = (mm1[23-16] <...
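The four maximum/minimum instructions differ only in lane width and signedness. The hedged C sketch below models them on a 64-bit stand-in for an MMX register; all helper names are hypothetical, and converting a 16-bit pattern to `int16_t` assumes a two's-complement target, as on mainstream compilers.

```c
#include <stdint.h>

/* Signed 16-bit lanes: maximum (want_max = 1) or minimum (want_max = 0). */
static uint64_t words_signed(uint64_t a, uint64_t b, int want_max) {
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        int16_t x = (int16_t)(uint16_t)(a >> (16 * i));
        int16_t y = (int16_t)(uint16_t)(b >> (16 * i));
        int16_t m = want_max ? (x > y ? x : y) : (x < y ? x : y);
        r |= (uint64_t)(uint16_t)m << (16 * i);
    }
    return r;
}

/* Unsigned 8-bit lanes: maximum (want_max = 1) or minimum (want_max = 0). */
static uint64_t bytes_unsigned(uint64_t a, uint64_t b, int want_max) {
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t x = (uint8_t)(a >> (8 * i));
        uint8_t y = (uint8_t)(b >> (8 * i));
        uint8_t m = want_max ? (x > y ? x : y) : (x < y ? x : y);
        r |= (uint64_t)m << (8 * i);
    }
    return r;
}

static uint64_t pmaxsw(uint64_t mm1, uint64_t mm2) { return words_signed(mm1, mm2, 1); }
static uint64_t pminsw(uint64_t mm1, uint64_t mm2) { return words_signed(mm1, mm2, 0); }
static uint64_t pmaxub(uint64_t mm1, uint64_t mm2) { return bytes_unsigned(mm1, mm2, 1); }
static uint64_t pminub(uint64_t mm1, uint64_t mm2) { return bytes_unsigned(mm1, mm2, 0); }
```

Because PMAXUB compares unsigned values, a byte 0x80 beats 0x7F, whereas PMAXSW treats word 0x8000 as -32768 and loses to almost anything.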
  • Page 578 PMOVMSKB: Move Byte Mask To Integer. Opcode 0F,D7,/r, PMOVMSKB r32, mm: move the byte mask of MM to r32. Operation: r32[7] = mm[63]; r32[6] = mm[55]; r32[5] = mm[47]; r32[4] = mm[39]; r32[3] = mm[31]; r32[2] = mm[23]; r32[1] = mm[15];...
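The Operation gathers the most-significant bit of every byte of MM into the low byte of r32. A minimal C sketch follows; `pmovmskb` is a hypothetical helper, and clearing bits 31-8 of r32 is an assumption about the part of the Operation that is cut off above.

```c
#include <stdint.h>

/* PMOVMSKB r32, mm: r32[i] = bit 7 of byte i of mm (for example r32[7] = mm[63]). */
static uint32_t pmovmskb(uint64_t mm) {
    uint32_t r32 = 0;
    for (int i = 0; i < 8; i++)
        r32 |= (uint32_t)((mm >> (8 * i + 7)) & 1u) << i;
    return r32;   /* bits 31-8 stay zero */
}
```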
  • Page 579 PMULHUW: Packed Multiply High Unsigned. Opcode 0F,E4,/r, PMULHUW mm1, mm2/m64: multiply the packed unsigned words in the MM1 register by the packed unsigned words in MM2/Mem, then store the high-order 16 bits of each result in MM1. Operation: mm1[15-0] = (mm1[15-0] * mm2/m64[15-0])[31-16];...
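Each lane forms a full 32-bit unsigned product and keeps only bits 31-16. A minimal C sketch on a 64-bit stand-in for an MMX register; `pmulhuw` is a hypothetical helper name.

```c
#include <stdint.h>

/* PMULHUW mm1, mm2/m64: each word lane keeps bits 31-16 of the unsigned product. */
static uint64_t pmulhuw(uint64_t mm1, uint64_t mm2) {
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        uint32_t x = (uint32_t)((mm1 >> (16 * i)) & 0xFFFFu);
        uint32_t y = (uint32_t)((mm2 >> (16 * i)) & 0xFFFFu);
        uint32_t product = x * y;   /* at most 0xFFFE0001, so it fits in 32 bits */
        r |= (uint64_t)(product >> 16) << (16 * i);
    }
    return r;
}
```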
  • Page 580 PSADBW: Packed Sum of Absolute Differences. Opcode 0F,F6,/r, PSADBW mm1, mm2/m64: compute the absolute differences of the packed unsigned bytes in MM2/Mem and MM1, then sum these differences to produce a word result. Operation: temp1 = ABS(mm1[7-0] - mm2/m64[7-0]); temp2 = ABS(mm1[15-8] - mm2/m64[15-8]);...
  • Page 581 PSADBW: Packed Sum of Absolute Differences (Continued). Virtual 8086 Mode Exceptions: Same exceptions as in Real Address Mode; #PF(fault-code) for a page fault. Additional Itanium System Environment Exceptions: Itanium Reg Faults: Disabled FP Register Fault if PSR.dfl is 1, NaT Register Consumption Fault. Itanium Mem Faults: VHPT Data Fault, Data TLB Fault, Alternate Data TLB Fault, Data Page Not Present Fault, Data NaT Page Consumption Abort, Data...
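The per-byte absolute differences and their sum can be modeled as follows. This C sketch is hypothetical (`psadbw` is an illustrative name), and it assumes, as is commonly documented for PSADBW, that the 16-bit sum is written to the low word of MM1 with the upper 48 bits cleared; that part of the Operation is cut off above.

```c
#include <stdint.h>

/* PSADBW mm1, mm2/m64: sum of |mm1 byte i - mm2 byte i| over the eight byte lanes. */
static uint64_t psadbw(uint64_t mm1, uint64_t mm2) {
    uint32_t sum = 0;
    for (int i = 0; i < 8; i++) {
        int x = (int)((mm1 >> (8 * i)) & 0xFF);
        int y = (int)((mm2 >> (8 * i)) & 0xFF);
        sum += (uint32_t)(x > y ? x - y : y - x);   /* temp_i = ABS(x - y) */
    }
    return (uint64_t)sum;   /* at most 8 * 255 = 2040: fits in mm1[15-0], bits 63-16 zero */
}
```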
  • Page 582: Cacheability Control Instructions

    PSHUFW: Packed Shuffle Word. Opcode 0F,70,/r,ib, PSHUFW mm1, mm2/m64, imm8: shuffle the words in MM2/Mem according to the encoding in imm8 and store the result in MM1. Operation: mm1[15-0] = (mm2/m64 >> (imm8[1-0] * 16))[15-0]; mm1[31-16] = (mm2/m64 >> (imm8[3-2] * 16))[15-0]; mm1[47-32] = (mm2/m64 >>...
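Following the pattern of the Operation, destination word i takes the source word selected by bits 2i+1 to 2i of imm8. A minimal C sketch on a 64-bit stand-in for an MMX register; `pshufw` is a hypothetical helper name.

```c
#include <stdint.h>

/* PSHUFW mm1, mm2/m64, imm8: destination word i = source word imm8[2i+1 : 2i]. */
static uint64_t pshufw(uint64_t src, uint8_t imm8) {
    uint64_t r = 0;
    for (int i = 0; i < 4; i++) {
        unsigned sel = (imm8 >> (2 * i)) & 0x3u;
        uint64_t word = (src >> (sel * 16)) & 0xFFFFu;
        r |= word << (16 * i);
    }
    return r;
}
```

With this encoding, imm8 = 0xE4 leaves the words in place, 0x1B reverses them, and 0x00 broadcasts word 0 to all four positions.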
  • Page 583 MASKMOVQ: Byte Mask Write. Opcode 0F,F7,/r, MASKMOVQ mm1, mm2: move 64 bits of integer data from the MM1 register to the memory location specified by the EDI register, using the byte mask in the MM2 register. Operation: if (mm2[7]) m64[edi] = mm1[7-0]; if (mm2[15]) m64[edi+1] = mm1[15-8];...
  • Page 584 MASKMOVQ: Byte Mask Write (Continued) Real Address Mode Exceptions: Interrupt 13 if any part of the operand would lie outside of the effective address space from 0 to 0FFFFH; #UD if CR0.EM = 1; #NM if TS bit in CR0 is set; #MF if there is a pending FPU exception.
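The byte-masked store can be sketched in C as below. `maskmovq` is a hypothetical helper that takes the EDI target as a pointer; it models only which bytes are written (bit 7 of each MM2 byte selects the corresponding MM1 byte), not the non-temporal hint, faults, or memory ordering.

```c
#include <stdint.h>

/* MASKMOVQ mm1, mm2: write byte i of mm1 to edi[i] only if mm2[8i+7] is set. */
static void maskmovq(uint8_t *edi, uint64_t mm1, uint64_t mm2) {
    for (int i = 0; i < 8; i++) {
        if ((mm2 >> (8 * i + 7)) & 1u)
            edi[i] = (uint8_t)(mm1 >> (8 * i));   /* m64[edi+i] = mm1[8i+7 : 8i] */
    }
}
```

Bytes whose mask bit is clear are not written at all, so the destination keeps whatever it held before.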
  • Page 585 MOVNTPS: Move Aligned Four Packed Single-FP Non-temporal. Opcode 0F,2B,/r, MOVNTPS m128, xmm: move 128 bits representing four packed SP FP values from the XMM register to Mem, minimizing pollution in the cache hierarchy. Operation: m128 = xmm; Description: The linear address corresponds to the address of the least-significant byte of the referenced memory data.
  • Page 586 MOVNTQ: Move 64 Bits Non-temporal. Opcode 0F,E7,/r, MOVNTQ m64, mm: move 64 bits representing integer operands (8b, 16b, 32b) from the MM register to memory, minimizing pollution within the cache hierarchy. Operation: m64 = mm; Description: The linear address corresponds to the address of the least-significant byte of the referenced memory data.
  • Page 587 PREFETCH: Prefetch. Opcode 0F,18,/1, PREFETCHT0 m8: move the data specified by the address closer to the processor using the t0 hint. Opcode 0F,18,/2, PREFETCHT1 m8: move the data specified by the address closer to the processor using the t1 hint. Opcode 0F,18,/3, PREFETCHT2 m8: move the data specified by the address closer to the processor using the t2 hint.
  • Page 588 SFENCE: Store Fence. Opcode 0F,AE,/7, SFENCE: guarantees that every store instruction preceding the store fence in program order is globally visible before any store instruction following the fence becomes globally visible. Operation: while (!(preceding_stores_globally_visible)) wait(); Description: Weakly ordered memory types can enable higher performance through such techniques as out-of-order issue, write-combining, and write-collapsing.
  • Page 590: Index

  • Page 592 INDEX FOR VOLUMES 1, 2, 3 AND 4: AAA Instruction 4:21; AAD Instruction 4:22; AAM Instruction 4:23; AAS Instruction 4:24; Aborts 2:95, 2:538; ACPI 2:631; ...Stores Register) 1:30; BSR Instruction 4:37; bsw Instruction 3:34; BSWAP Instruction 4:39; BT Instruction 4:40; BTC Instruction 4:42; BTR Instruction 4:44; BTS Instruction 4:46...
  • Page 593 ...1:155, 2:579; Control Speculative Load 1:156; Corrected Error 2:350; Corrected Machine Check Vector (CMCV) 2:126; cover Instruction 3:48; CPUID (Processor Identification Register) 1:34; CPUID Instruction 4:78; External Interrupt 2:96, 2:538; External Interrupt Control Registers (CR64-81) 2:42; External Interrupt Request Registers (IRR0-3) 2:125; External Interrupt Vector Register (IVR) 2:123; External Task Priority Cycle (XTP) 2:130...
  • Page 594 FICOM Instruction 4:128; FICOMP Instruction 4:128; FIDIV Instruction 4:121; FIDIVR Instruction 4:124; FILD Instruction 4:130; FIMUL Instruction 4:145; FINCSTP Instruction 4:132; Firmware 1:7, 2:623; fpabs Instruction 3:95; fpack Instruction 3:96; fpamax Instruction 3:97; fpamin Instruction 3:99; FPATAN Instruction 4:149; fpcmp Instruction 3:101; fpcvt Instruction 3:104; fpma Instruction 3:107...
  • Page 595 IA-32 Instruction Reference 4:11; IA-32 Instruction Set 2:253; IA-32 Intel® MMX™ Technology 1:129; IA-32 Intercept: Gate Intercept Trap 2:235; Instruction Set Transition 1:14; Instruction Set Transitions 2:239, 2:596; Instruction Slot Mapping 1:38; Instruction Slots 1:38; INSW Instruction 4:214; INT (External Interrupt) 2:96...
  • Page 596 INTA (Interrupt Acknowledge) 2:130; Inter-processor Interrupt (IPI) 2:127; Interrupt Acknowledge Cycle 2:130; Interruption Control Registers (CR16-27) 2:36; Interruption Handler 2:537; Interruption Handling 2:543; Interruption Hash Address 2:41; Interruption Instruction Bundle Registers (IIB0-1); INTn Instruction 4:217; INTO Instruction 4:217; invala Instruction 3:146; INVD instructions 4:228; INVLPG Instruction 4:230; IP (Instruction Pointer) 1:27, 1:140; IPI (Inter-processor Interrupt) 2:127
  • Page 597 LGS Instruction 4:255; LIDT Instruction 4:264; LLDT Instruction 4:267; LMSW Instruction 4:270; Load Instructions 1:58; loadrs Instruction 3:167; Loads from Memory 1:147; Local Redirection Registers (LRR0-1) 2:126; MOVAPS Instruction 4:527; MOVD Instruction 4:401; MOVHLPS Instruction 4:529; MOVHPS Instruction 4:530; movl Instruction 3:187; MOVLHPS Instruction 4:532; MOVLPS Instruction 4:533; MOVMSKPS Instruction 4:535...
  • Page 598 Illegal Dependency Fault 2:584; Long Branch Emulation 2:585; Multiple Address Spaces 1:20, 2:562; OS_BOOT Entrypoint 2:283; OS_INIT Entrypoint 2:283; OS_MCA Entrypoint 2:283; OS_RENDEZ Entrypoint 2:283; Performance Monitoring Support 2:620; Single Address Space 1:20, 2:565; PAL_CACHE_READ 2:380; PAL_CACHE_SHARED_INFO 2:382; PAL_CACHE_SUMMARY 2:384; PAL_CACHE_WRITE 2:385; PAL_COPY_INFO 2:388; PAL_COPY_PAL 2:389; PAL_DEBUG_INFO 2:390; PAL_FIXED_ADDR 2:391...
  • Page 599 PAL_VPS_RESUME_HANDLER 2:492; PAL_VPS_RESUME_NORMAL 2:489; PAL_VPS_SAVE 2:500; PAL_VPS_SET_PENDING_INTERRUPT 2:495; PAL_VPS_SYNC_READ 2:493; PAL_VPS_SYNC_WRITE 2:494; PAL_VPS_THASH 2:497; PAL_VPS_TTAG 2:498; PAL-based Interruptions 2:95, 2:537; PMULHUW Instruction 4:572; PMULHW Instruction 4:431; PMULLW Instruction 4:433; PMV (Performance Monitoring Vector) 2:126; POP Instruction 4:311; POPA Instruction 4:315; POPAD Instruction 4:315; popcnt Instruction 3:216; POPF Instruction 4:317...
  • Page 600 PSUBD Instruction 4:446; PSUBSB Instruction 4:449; PSUBSW Instruction 4:449; PSUBUSB Instruction 4:452; PSUBUSW Instruction 4:452; PSUBW Instruction 4:446; PTA (Page Table Address Register) 2:35; Resource Utilization Counter (RUC) 1:31, 2:33; RET Instruction 4:340; rfi Instruction 2:543, 3:236; RID (Region Identifier) 2:561; RNAT (RSE NaT Collection Register) 1:30; ROL Instruction 4:327; ROR Instruction 4:327...
  • Page 601 SIDT Instruction 4:359; Single Step Trap 2:151; SLDT Instruction 4:367; SMSW Instruction 4:369; Software Pipelining 1:19, 1:75, 1:145, 1:181; Speculation 1:16, 1:142, 1:151; Control Speculation 1:16; Data Speculation 1:17; Template Field Encoding 1:38; Templates 1:141; TEST Instruction 4:381; tf Instruction 3:263; thash Instruction 3:265; TLB (Translation Lookaside Buffer) 2:47, 2:565; tnat Instruction 3:266...
  • Page 602 WAIT Instruction 4:386; WAR Dependency 1:149; WAW Dependency 1:149; WBINVD Instruction 4:387; Write-after-read Dependency 1:149; Write-after-write Dependency 1:149; WRMSR Instruction 4:389; XADD Instruction 4:391; XCHG Instruction 4:393; xchg Instruction 2:508, 3:274; XLAT Instruction 4:395; XLATB Instruction 4:395; xma Instruction 3:276; xmpy Instruction 3:278; XOR Instruction 4:397; xor Instruction 3:279...
