ADSP-BF53x/BF56x Blackfin Processor Programming Reference
Revision 1.2, February 2007
Part Number 82-000556-01
Analog Devices, Inc.
One Technology Way
Norwood, Mass. 02062-9106...
Analog Devices, Inc. Printed in the USA. Disclaimer Analog Devices, Inc. reserves the right to change this product without prior notice. Information furnished by Analog Devices is believed to be accurate and reliable. However, no responsibility is assumed by Analog Devices for its use;...
CONTENTS
PREFACE
Purpose of This Manual
Intended Audience
Manual Contents
What’s New in This Manual
Technical or Customer Support
Supported Processors
Product Information
MyAnalog.com
Processor Product Information
Glossary
Register Names
Functional Units
Arithmetic Status Flags
Fractional Convention
Saturation
Rounding and Truncating
Automatic Circular Addressing
COMPUTATIONAL UNITS
Using Data Formats
Binary String
Using Multiplier Integer and Fractional Formats
Rounding Multiplier Results
Unbiased Rounding
Biased Rounding
Truncation
Special Rounding Instructions
Using Computational Status
ASTAT Register
Arithmetic Logic Unit (ALU)
ALU Operations
Multiplier Instruction Summary
Multiplier Instruction Options
Multiplier Data Flow Details
Multiply Without Accumulate
Special 32-Bit Integer MAC Instruction
Dual MAC Operations
Barrel Shifter (Shifter)
Shifter Operations
Two-Operand Shifts
Supervisor Mode
Non-OS Environments
Example Code for Supervisor Mode Coming Out of Reset
Emulation Mode
Idle State
Example Code for Transition to Idle State
Reset State
System Reset and Powerup
Atomic Operations
Memory-mapped Registers
Core MMR Programming Code Example
Terminology
PROGRAM FLOW CONTROL
Jump
IF CC JUMP
Call
RTS, RTI, RTX, RTN, RTE (Return)
LSETUP, LOOP
Store Low Data Register Half
Store Byte
MOVE
Move Register
Move Conditional
Move Half to Full Word – Zero-Extended
Move Half to Full Word – Sign-Extended
Move Register Half
Move Byte –...
Intended Audience
The primary audience for this manual is programmers who are familiar with Analog Devices Blackfin processors. This manual assumes that the audience has a working knowledge of the appropriate Blackfin architecture and instruction set. Programmers who are unfamiliar with Analog...
Manual Contents
The manual consists of:
• Chapter 1, “Introduction”
This chapter provides a general description of the instruction syntax and notation conventions.
• Chapter 2, “Computational Units”
Describes the arithmetic/logic units (ALUs), multiplier/accumulator units (MACs), shifter, and the set of video ALUs. The chapter also discusses data formats, data types, and register files.
Technical or Customer Support
You can reach Analog Devices, Inc. Customer Support in the following ways:
• Visit the Embedded Processing and DSP products Web site at http://www.analog.com/processors/manuals
• E-mail tools questions to processor.tools.support@analog.com
• E-mail processor questions to processor.support@analog.com (World wide support)
Supported Processors
The following is the list of Analog Devices, Inc. processors supported in VisualDSP++®.
Blackfin (ADSP-BFxxx) Processors
The name Blackfin refers to a family of 16-bit, embedded processors. VisualDSP++ currently supports the following Blackfin families: ADSP-BF53x, ADSP-BF54x, and ADSP-BF56x
SHARC®...
MyAnalog.com
MyAnalog.com is a free feature of the Analog Devices Web site that allows customization of a Web page to display only the latest information on products you are interested in.
137.71.25.69 ftp://ftp.analog.com
Related Documents
The following publications that describe the ADSP-BF53x/BF56x processors (and related processors) can be ordered from any Analog Devices sales office:
• ADSP-BF533 Blackfin Processor Hardware Reference
• ADSP-BF535 Blackfin Processor Hardware Reference
• ADSP-BF561 Blackfin Processor Hardware Reference
•...
• ADSP-BF536/ADSP-BF537 Blackfin Embedded Processor Data Sheet
• ADSP-BF538 Blackfin Embedded Processor Data Sheet
• ADSP-BF539 Blackfin Embedded Processor Data Sheet
For information on product related development software and Analog Devices processors, see these publications:
• VisualDSP++ User's Guide
•...
Each documentation file type is described as follows.
File            Description
.CHM            Help system files and manuals in Help format
.HTM or .HTML   Dinkum Abridged C++ library and FlexLM network license manager software documentation. Viewing and printing the .HTML files requires a browser, such as Internet Explorer 4.0 (or higher).
• Double-click any file that is part of the VisualDSP++ documentation set.
Using the Windows Start Button
• Access VisualDSP++ online Help by clicking the Start button and choosing Programs, Analog Devices, VisualDSP++, and VisualDSP++ Documentation.
• Access the .PDF files by clicking the Start button and choosing...
To purchase VisualDSP++ manuals, call 1-603-883-2430. The manuals may be purchased only as a kit. If you do not have an account with Analog Devices, you are referred to Analog Devices distributors. For information on our distributors, log onto http://www.analog.com/salesdir Hardware Tools Manuals To purchase EZ-KIT Lite®...
Conventions
Text conventions used in this manual are identified and described as follows.
Example                     Description
Close command (File menu)   Titles in reference sections indicate the location of an item within the VisualDSP++ environment’s menu system. For example, the Close command appears on the File menu.
Example       Description
0xFBCD CBA9   Hexadecimal numbers use the 0x prefix and are typically shown with a space between the upper four and lower four digits.
b#1010 0101   Binary numbers use the b# prefix and are typically shown with a space between each four digit group.
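The number notation described above can be checked mechanically. The following is a minimal sketch in Python, not part of the original manual; the function name `parse_manual_number` and the decimal fallback are assumptions made for illustration.

```python
def parse_manual_number(text: str) -> int:
    """Convert a literal written in the manual's notation to an integer.

    Hypothetical helper: handles the 0x (hexadecimal) and b# (binary)
    prefixes and ignores the spaces used to group digits.
    """
    compact = text.replace(" ", "")  # drop the digit-group spacing
    if compact.startswith("0x"):
        return int(compact[2:], 16)
    if compact.startswith("b#"):
        return int(compact[2:], 2)
    # Assumption: anything without a prefix is treated as decimal.
    return int(compact, 10)


print(hex(parse_manual_number("0xFBCD CBA9")))
print(bin(parse_manual_number("b#1010 0101")))
```

The grouping spaces carry no value; they only make long literals easier to read.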
This ADSP-BF53x/BF56x Blackfin Processor Programming Reference provides details on the assembly language instructions used by the Micro Signal Architecture (MSA) core developed jointly by Analog Devices, Inc. and Intel Corporation. This manual is applicable to all ADSP-BF53x and ADSP-BF56x processor derivatives. With the exception of the first-generation ADSP-BF535 processor, all devices provide an identical core architecture and instruction set.
Core Architecture
[Figure 1-1. Processor Core Architecture: block diagram of the address arithmetic unit (DAG0, DAG1, pointer registers), the control unit (sequencer, align, decode, loop buffer), and the data arithmetic unit (data register halves R0.L/R0.H through R7.L/R7.H, barrel shifter, ASTAT)]
The compute register file contains eight 32-bit registers.
Each MAC can perform a 16- by 16-bit multiply per cycle, with accumulation to a 40-bit result. Signed and unsigned formats, rounding, and saturation are supported.
The ALUs perform a traditional set of arithmetic and logical operations on 16-bit or 32-bit data. Many special instructions are included to accelerate various signal processing tasks.
Memory Architecture
Blackfin processors support a modified Harvard architecture in combination with a hierarchical memory structure. Level 1 (L1) memories typically operate at the full processor speed with little or no latency. At the L1 level, the instruction memory holds instructions only. The two data memories hold data, and a dedicated scratchpad data memory stores stack and local variable information.
common address space. The memory portions of this address space are arranged in a hierarchical structure to provide a good cost/performance balance of some very fast, low latency on-chip memory as cache or SRAM, and larger, lower cost and lower performance off-chip memory systems. The L1 memory system is the primary highest performance memory available to the core.
External Memory
External (off-chip) memory is accessed via the External Bus Interface Unit (EBIU). This 16-bit interface provides a glueless connection to a bank of synchronous DRAM (SDRAM) and as many as four banks of asynchronous memory devices including flash memory, EPROM, ROM, SRAM, and memory-mapped I/O devices.
servicing a higher priority event takes precedence over servicing a lower priority event. The controller provides support for five different types of events:
• Emulation – Causes the processor to enter Emulation mode, allowing command and control of the processor via the JTAG interface.
•
Core Event Controller (CEC)
The Core Event Controller supports nine general-purpose interrupts (IVG15 – 7), in addition to the dedicated interrupt and exception events. Of these general-purpose interrupts, the two lowest priority interrupts (IVG15 – 14) are recommended to be reserved for software interrupt handlers, leaving seven prioritized interrupt inputs to support peripherals.
This manual shows register names and instruction keywords in examples using lower case. Otherwise, in explanations and descriptions, this manual uses upper case to help the register names and keywords stand out among text.
Free Format
Assembler input is free format, and may appear anywhere on the line. One instruction may extend across multiple lines, or more than one instruction may appear on the same line.
Comments
The assembler supports various kinds of comments, including the following.
• End of line: A double forward slash token (“//”) indicates the beginning of a comment that concludes at the next newline character.
• General comment: A general comment begins with the token “/*”...
• Some instructions (such as “--SP (Push Multiple)”) require a group of adjacent registers. Adjacent registers are denoted in syntax by the range enclosed in parentheses and separated by a colon, for example, (R7:3). Again, the larger number appears first.
•...
• PC-relative, signed values are designated as “pcrel” with the following modifiers:
• the decimal number indicates how many bits the value can include; for example, pcrel5 is a 5-bit value.
• any alignment requirements are designated by an optional “...
Glossary
The following terms appear throughout this document. Without trying to explain the Blackfin processor, here are the terms used with their definitions. See the Blackfin Processor Hardware Reference for your specific product for more details on the architecture.
Register Names
The architecture includes the registers shown in Table...
Table 1-1. Registers (Cont’d)
Register         Description
Loop Count       LC0 and LC1; contains 32-bit counter of the zero overhead loop executions.
Loop Bottom      LB0 and LB1; contains 32-bit address of the bottom of a zero overhead loop.
Index Register   The set of 32-bit registers I0, I1, I2, I3 that normally contain byte addresses of data structures.
Arithmetic Status Flags
The MSA includes 12 arithmetic status flags that indicate specific results of a prior operation. These flags reside in the Arithmetic Status (ASTAT) Register. A summary of the flags appears below. All flags are active high. Instructions regarding P-registers, I-registers, L-registers, M-registers, or B-registers do not affect flags.
The ADSP-BF535 processor has fewer ASTAT flags and some flags operate differently than subsequent Blackfin family products. For more information on the ADSP-BF535 status flags, see Table A-1 on page A-3.
Fractional Convention
Fractional numbers include subinteger components less than ±1. Whereas decimal fractions appear to the right of a decimal point, binary fractions appear to the right of a binal point.
[Figure 1-2. Conventional Placement of Binal Point: the binal point is aligned across the 40-bit accumulator (8-bit extension, 31-bit fraction), the 32-bit register (31-bit fraction), and the 16-bit register half (15-bit fraction)]
Saturation
When the result of an arithmetic operation exceeds the range of the destination register, important information can be lost.
The maximum positive value in a 16-bit register is 0x7FFF. The maximum negative value is 0x8000. For a signed two’s-complement 1.15 fractional notation, the allowable range is –1 through (1 – 2^–15).
The maximum positive value in a 32-bit register is 0x7FFF FFFF. The maximum negative value is 0x8000 0000.
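The saturation limits above can be illustrated with a short model. This is a minimal sketch in Python, not the processor's implementation; `saturate_signed` and `register_hex` are hypothetical names introduced here.

```python
def saturate_signed(value: int, bits: int) -> int:
    """Clamp a signed integer to the range of a two's-complement register."""
    max_positive = (1 << (bits - 1)) - 1   # 0x7FFF for 16 bits
    max_negative = -(1 << (bits - 1))      # bit pattern 0x8000 for 16 bits
    return max(max_negative, min(max_positive, value))


def register_hex(value: int, bits: int) -> str:
    """Show the two's-complement bit pattern a register would hold."""
    return format(value & ((1 << bits) - 1), "0{}X".format(bits // 4))


# A sum that exceeds the 16-bit range is pinned to the largest positive value.
print(register_hex(saturate_signed(0x7FFF + 1, 16), 16))
# A result below the 32-bit range is pinned to the largest negative value.
print(register_hex(saturate_signed(-(1 << 31) - 5, 32), 32))
```

Without saturation the same results would wrap around and change sign, which is the information loss the text describes.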
occur when a 40-bit value is written to a 32-bit destination. If there was any useful information in the upper 8 bits of the 40-bit value, then information is lost in the process. Some processor instructions report overflow conditions in the arithmetic flags, as noted in the instruction descriptions.
Some instructions for this processor support biased and unbiased rounding. The RND_MOD bit in the Arithmetic Status (ASTAT) Register determines which mode is used. See the Blackfin Processor Hardware Reference for your specific product for more details on the ASTAT Register.
Automatic Circular Addressing
The Blackfin processor provides an optional circular (or “modulo”) addressing feature that increments an Index Register (Ireg) through a predefined address range, then automatically resets the Ireg to repeat that range. This feature improves input/output loop performance by eliminating the need to manually reinitialize the address index pointer each time.
The circular buffer registers define the length (Lreg) of the data block in bytes and the base (Breg) address to reinitialize the Ireg.
Some instructions modify an Index Register without using it for addressing; for example, the Add Immediate Modify –...
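The wrap-around behavior of the Ireg, Lreg, and Breg registers can be sketched as follows. This Python model is an illustration only. It assumes that the modifier magnitude does not exceed the buffer length, so a single correction is enough, and that a length of zero means plain linear addressing; `next_index` and `walk` are hypothetical names.

```python
def next_index(ireg: int, mreg: int, lreg: int, breg: int) -> int:
    """Return the index after adding the modifier, wrapped into the buffer."""
    if lreg == 0:
        # Assumption: a zero length disables circular addressing.
        return (ireg + mreg) & 0xFFFFFFFF
    candidate = ireg + mreg
    if candidate >= breg + lreg:   # stepped past the end of the buffer
        candidate -= lreg
    elif candidate < breg:         # stepped before the base (negative modifier)
        candidate += lreg
    return candidate


def walk(start: int, step: int, length: int, base: int, count: int) -> list:
    """Collect the addresses visited by repeated post-modify accesses."""
    visited, index = [], start
    for _ in range(count):
        visited.append(index)
        index = next_index(index, step, length, base)
    return visited


# A 16-byte buffer at 0x1000 accessed in 4-byte steps repeats every 4 accesses.
print([hex(a) for a in walk(0x1000, 4, 16, 0x1000, 6)])
```

The loop body never has to reload the index, which is the saving the text attributes to this feature.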
2 COMPUTATIONAL UNITS
The processor’s computational units perform numeric processing for DSP and general control algorithms. The six computational units are two arithmetic/logic units (ALUs), two multiplier/accumulator (multiplier) units, a shifter, and a set of video ALUs. These units get data from registers in the Data Register File.
buses leads to a better understanding of proper data flow for computations. Next, details about the processor’s advanced parallelism reveal how to take advantage of multifunction instructions. Figure 2-1 shows the relationship between the Data Register File and the computational units: multipliers, ALUs, and shifter.
[Figure 2-1. Processor Core Architecture: block diagram of the address arithmetic unit (DAG0, DAG1, pointer registers), the control unit (sequencer, align, decode, loop buffer), and the data arithmetic unit (data register halves R0.L/R0.H through R7.L/R7.H, barrel shifter, ASTAT)]
Using Data Formats
ADSP-BF53x/56x processors are primarily 16-bit, fixed-point machines. Most operations assume a two’s-complement number representation, while others assume unsigned numbers or simple binary strings. Other instructions support 32-bit integer arithmetic, with further special features supporting 8-bit arithmetic and block floating point. For detailed information about each number format, see Appendix D, “Numeric Formats.”...
Signed Numbers: Two’s-Complement
In ADSP-BF53x/56x processor arithmetic, the word signed refers to two’s-complement numbers. Most ADSP-BF53x/56x processor family operations presume or support two’s-complement arithmetic.
Fractional Representation: 1.15
ADSP-BF53x processor arithmetic is optimized for numerical values in a fractional binary format denoted by 1.15 (“one dot fifteen”). In the 1.15 format, 1 sign bit (the Most Significant Bit (MSB)) and 15 fractional bits represent values from –1 to 0.999969.
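The 1.15 range quoted above follows directly from the bit layout. The sketch below, in Python, converts between real values and 16-bit 1.15 patterns; the helper names `to_q15` and `from_q15` are assumptions for illustration, and the use of Python's built-in `round` when encoding is a choice made here, not something the manual specifies.

```python
def from_q15(pattern: int) -> float:
    """Interpret a 16-bit pattern as a signed 1.15 fraction."""
    pattern &= 0xFFFF
    signed = pattern - 0x10000 if pattern & 0x8000 else pattern
    return signed / 32768.0        # 15 fractional bits: scale by 2**15


def to_q15(value: float) -> int:
    """Encode a real value as a 1.15 pattern, saturating at the range limits."""
    scaled = int(round(value * 32768.0))
    scaled = max(-32768, min(32767, scaled))
    return scaled & 0xFFFF


print(from_q15(0x8000))            # most negative value
print(round(from_q15(0x7FFF), 6))  # most positive value
print(hex(to_q15(0.5)))
```

Because +1.0 has no 1.15 encoding, `to_q15(1.0)` saturates to 0x7FFF, which is the 0.999969 upper limit given in the text.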
Register Files
The processor’s computational units have three definitive register groups: a Data Register File, a Pointer Register File, and a set of Data Address Generation (DAG) registers.
• The Data Register File receives operands from the data buses for the computational units and stores computational results.
[Figure 2-3. Register Files: Address Arithmetic Unit registers, comprising the Pointer Registers and the Data Address Registers, with User SP and Supervisor SP. Figure note: Supervisor only register. Attempted read or write in User mode causes an exception error.]
Data Register File
The Data Register File consists of eight registers, each 32 bits wide. Each register may be viewed as a pair of independent 16-bit registers.
Register Files Three separate buses (two load, one store) connect the Register File to the L1 data memory, each bus being 32 bits wide. Transfers between the Data Register File and the data memory can move up to two 32-bit words of valid data in each cycle.
Register File Instruction Summary
Table 2-1 lists the register file instructions. In Table 2-1, note the meaning of these symbols:
• Allreg denotes: R[7:0], P[5:0], SP, FP, I[3:0], M[3:0], B[3:0], L[3:0], A0.X, A0.W, A1.X, A1.W, ASTAT, RETS, RETI, RETX, RETN, RETE, LC[1:0], LT[1:0], LB[1:0], USP, SEQSTAT, SYSCFG, CYCLES, CYCLES2.
Register Files • Option (X) denotes sign extended. • Option (Z) denotes zero extended. • * Indicates the flag may be set or cleared, depending on the result of the instruction. • ** Indicates the flag is cleared. • – Indicates no effect. Table 2-1.
Data Types Some instructions manipulate data in the registers by sign-extending or zero-extending the data to 32 bits: • Instructions zero-extend unsigned data • Instructions sign-extend signed 16-bit half words and 8-bit bytes Other instructions manipulate data as 32-bit numbers. In addition, two 16-bit half words or four 8-bit bytes can be manipulated as 32-bit values.
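The difference between zero-extension and sign-extension to 32 bits can be shown with a small model. This is a minimal Python sketch; the function names are hypothetical and the code models only the resulting bit patterns, not any particular instruction.

```python
MASK32 = 0xFFFFFFFF


def zero_extend(value: int, bits: int) -> int:
    """Treat the low `bits` bits as unsigned and pad the upper bits with 0."""
    return value & ((1 << bits) - 1)


def sign_extend(value: int, bits: int) -> int:
    """Copy the sign bit of a `bits`-wide value into the upper register bits."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return ((value ^ sign_bit) - sign_bit) & MASK32


# The byte 0x80 is 128 when unsigned but -128 when signed.
print(format(zero_extend(0x80, 8), "08X"))
print(format(sign_extend(0x80, 8), "08X"))
print(format(sign_extend(0xB6A3, 16), "08X"))
```

Choosing the wrong extension leaves the low bits identical but changes the value of the 32-bit result whenever the top bit of the narrow operand is set.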
ALU Data Types
Operations on each ALU treat operands and results as either 16- or 32-bit binary strings, except the signed division primitive (DIVS). ALU result status bits treat the results as signed, indicating status with the overflow flags (AV0, AV1) and the negative flag (AN)....
unsigned, a mixture, or a rounding operation). The 32-bit result from the multipliers is assumed to be signed; it is sign-extended across the full 40-bit width of the registers. The processor supports two modes of format adjustment: the fractional mode for fractional operands (1.15 format with 1 sign bit and 15 fractional bits) and the integer mode for integer operands (16.0 format).
Data Types Shifter results generate status information. For more information about using shifter status, see “Shifter Instruction Summary” on page 2-53. Arithmetic Formats Summary Table 2-3, Table 2-4, Table 2-5, and Table 2-6 summarize some of the arithmetic characteristics of computational operations. Table 2-3.
Table 2-5. Multiplier Arithmetic Integer Modes Formats (Cont’d)
Operation                    Operand Formats                      Result Formats
Multiplication/Addition      16.0 explicitly signed or unsigned   32.0 not shifted
Multiplication/Subtraction   16.0 explicitly signed or unsigned   32.0 not shifted
Table 2-6. Shifter Arithmetic Formats
Operation       Operand Formats          Result Formats
Logical Shift   Unsigned binary string...
With either fractional or integer operations, the multiplier output product is fed into a 40-bit adder/subtracter which adds or subtracts the new product with the current contents of the register to produce the final 40-bit result.
[Figure: multiplier result format; legible labels include SHIFTED, ZERO FILLED, and P SIGN...]
Unbiased Rounding
The convergent rounding method returns the number closest to the original. In cases where the original number lies exactly halfway between two numbers, this method returns the nearest even number, the one containing an LSB of 0. For example, when rounding the 3-bit, two’s-complement fraction 0.25 (binary 0.01) to the nearest 2-bit, two’s-complement fraction, the result would be 0.0, because that is the even-numbered choice of 0.5 and 0.0.
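The convergent rule can be written out as a few lines of integer arithmetic. The following Python sketch is illustrative only; `round_convergent` is a hypothetical name, and values are represented as plain integers whose low `drop_bits` bits are to be discarded.

```python
def round_convergent(value: int, drop_bits: int) -> int:
    """Drop the low bits of `value`, rounding halfway cases to the even result."""
    half = 1 << (drop_bits - 1)
    kept = value >> drop_bits                    # floor, also for negatives
    remainder = value & ((1 << drop_bits) - 1)   # the bits being discarded
    if remainder > half:
        kept += 1                                # closer to the upper neighbor
    elif remainder == half and (kept & 1):
        kept += 1                                # exact tie: move to even LSB
    return kept


# The manual's example: binary 0.01 (integer 0b001 with two fraction bits)
# rounded to one fraction bit gives 0.0, the even choice of 0.0 and 0.5.
print(round_convergent(0b001, 1))
# Two 32-bit ties rounded to the upper 16 bits both land on the even value 2.
print(round_convergent(0x00018000, 16), round_convergent(0x00028000, 16))
```

Because ties alternate between rounding up and rounding down, the errors cancel on average instead of accumulating in one direction.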
[Figure: rounding example showing the unrounded value, the add-1-and-carry step, and the rounded value...]
[Figure: rounding example showing the unrounded value and the add-1-and-carry step for the case A0 bit 16 = 1...]
The RND_MOD bit in the ASTAT register enables biased rounding. When the RND_MOD bit is cleared, the RND option in multiplier instructions uses the normal, unbiased rounding operation, as discussed in “Unbiased Rounding” on page 2-20.
When the RND_MOD bit is set (=1), the processor uses biased rounding instead of unbiased rounding.
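The two rounding modes differ only in how an exact tie is handled. The Python sketch below models rounding a value to its upper bits (dropping the low 16) under both settings; `round_to_high_half` is a hypothetical name and the `rnd_mod` parameter merely stands in for the RND_MOD bit.

```python
def round_to_high_half(value: int, rnd_mod: int) -> int:
    """Round away the low 16 bits of `value`.

    rnd_mod = 1 models biased rounding: add 1 at bit 15 and truncate.
    rnd_mod = 0 models unbiased rounding: the same add, but an exact tie
    (low 16 bits equal to 0x8000) is forced to the even result.
    """
    rounded = (value + 0x8000) >> 16
    if rnd_mod == 0 and (value & 0xFFFF) == 0x8000:
        rounded &= ~1   # clear the LSB so the tie lands on an even number
    return rounded


tie = 0x00028000   # exactly halfway between 2 and 3
print(round_to_high_half(tie, rnd_mod=0), round_to_high_half(tie, rnd_mod=1))
```

Only inputs whose discarded bits are exactly 0x8000 give different answers; every other input rounds identically in both modes.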
Special Rounding Instructions
The ALU provides the ability to round the arithmetic results directly into a data register with biased or unbiased rounding as described above. It also provides the ability to round on different bit boundaries. The options RND12, RND, and RND20 round at bit 12, bit 16, and bit 20, respectively...
ASTAT Register
Figure 2-9 describes the Arithmetic Status (ASTAT) register. The processor updates the status bits in ASTAT, indicating the status of the most recent ALU, multiplier, or shifter operation.
[Figure 2-9. Arithmetic Status Register (ASTAT): Reset = 0x0000 0000; legible fields include VS (Sticky Dreg Overflow) and AV0 (A0 Overflow)...]
Arithmetic Logic Unit (ALU)
The two ALUs perform arithmetic and logical operations on fixed-point data. ALU fixed-point instructions operate on 16-, 32-, and 40-bit fixed-point operands and output 16-, 32-, or 40-bit fixed-point results. ALU instructions include:
•...
Single 16-Bit Operations
In single 16-bit operations, any two 16-bit register halves may be used as the input to the ALU. An addition, subtraction, or logical operation produces a 16-bit result that is deposited into an arbitrary destination register half.
Arithmetic Logic Unit (ALU) Quad 16-Bit Operations In quad 16-bit operations, any two 32-bit registers may be used as the inputs to ALU0 and ALU1, considered as pairs of 16-bit operands. A small number of addition or subtraction operations produces four 16-bit results that are deposited into two arbitrary, 32-bit destination registers.
Single 32-Bit Operations
In single 32-bit operations, any two 32-bit registers may be used as the input to the ALU, considered as 32-bit operands. An addition, subtraction, or logical operation produces a 32-bit result that is deposited into an arbitrary 32-bit destination register.
For example:
R3 = R1 + R2, R4 = R1 – R2 (NS) ;
adds the 32-bit contents of R2 to the 32-bit contents of R1 and deposits the result in R3 with no saturation. The instruction also subtracts the 32-bit contents of R2 from that of R1 and deposits the result in R4 with no saturation.
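The effect of the combined add and subtract with no saturation can be modeled as two wrapping 32-bit operations on the same pair of inputs. This Python sketch is an illustration; the function name is hypothetical, and treating "no saturation" as two's-complement wraparound is the assumption being demonstrated.

```python
MASK32 = 0xFFFFFFFF


def add_sub_no_saturation(r1: int, r2: int) -> tuple:
    """Model R3 = R1 + R2, R4 = R1 - R2 (NS) on 32-bit register patterns."""
    r3 = (r1 + r2) & MASK32   # sum, wrapping on overflow
    r4 = (r1 - r2) & MASK32   # difference, wrapping on underflow
    return r3, r4


r3, r4 = add_sub_no_saturation(0x7FFFFFFF, 0x00000001)
print(format(r3, "08X"), format(r4, "08X"))
```

With these inputs the sum wraps to 0x80000000, a negative pattern, whereas a saturating add would have held the result at 0x7FFFFFFF.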
• An denotes either ALU Result register A0 or A1.
• DIVS denotes a Divide Sign primitive.
• DIVQ denotes a Divide Quotient primitive.
• MAX denotes the maximum, or most positive, value of the source registers.
• MIN denotes the minimum value of the source registers.
•...
Special SIMD Video ALU Operations
Four 8-bit Video ALUs enable the processor to process video information with high efficiency. Each Video ALU instruction may take from one to four pairs of 8-bit inputs and return one to four 8-bit results. The inputs are presented to the Video ALUs in two 32-bit words from the Data Register File.
Multiply Accumulators (Multipliers) Inputs are treated as fractional or integer, unsigned or two’s-complement. Multiplier instructions include: • Multiplication • Multiply and accumulate with addition, rounding optional • Multiply and accumulate with subtraction, rounding optional • Dual versions of the above Multiplier Operation Each multiplier has two 32-bit inputs from which it derives the two 16-bit operands.
Placing Multiplier Results in Multiplier Accumulator Registers
As shown in Figure 2-10 on page 2-42, each multiplier has a dedicated accumulator, A0 or A1. Each Accumulator register is divided into three sections: A0.L/A1.L (bits 15:0), A0.H/A1.H (bits 31:16), and A0.X/A1.X (bits 39:32).
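The three accumulator sections are simple bit fields of the 40-bit value. The following Python sketch, with the hypothetical helper `split_accumulator`, extracts them using the bit ranges given above.

```python
def split_accumulator(acc40: int) -> dict:
    """Split a 40-bit accumulator value into its X, H, and L sections."""
    acc40 &= (1 << 40) - 1              # keep only the 40 accumulator bits
    return {
        "X": (acc40 >> 32) & 0xFF,      # bits 39:32, the 8-bit extension
        "H": (acc40 >> 16) & 0xFFFF,    # bits 31:16, the high half
        "L": acc40 & 0xFFFF,            # bits 15:0, the low half
    }


parts = split_accumulator(0x123456789A)
print({name: hex(field) for name, field in parts.items()})
```

For a negative accumulator value such as -1, all three sections read back as all ones, which reflects the sign extension into the 8-bit X section.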
• If an overflow or underflow has occurred, the saturate operation sets the specified Result register to the maximum positive or negative value. For more information, see the following section.
Saturating Multiplier Results on Overflow
The following bits in ASTAT indicate multiplier overflow status:
•...
• An denotes either MAC Accumulator register A0 or A1.
• * Indicates the flag may be set or cleared, depending on the results of the instruction.
• – Indicates no effect.
Multiplier instruction options are described on page 2-40.
Table 2-10. Multiplier Instruction Summary
Instruction ASTAT Status Flags AV0S...
Multiplier Instruction Options
The following descriptions of multiplier instruction options provide an overview. Not all options are available for all instructions. For information about how to use these options with their respective instructions, see Chapter 15, “Arithmetic Operations.”
default No option;...
Computational Units If multiplying and accumulating to a half register: When copying the lower 16 bits to the destination half register, the Accumulator contents are scaled. If scaling produces a signed value greater than 16 bits, the number is saturated to its maximum positive or negative value.
Multiplier Data Flow Details
Figure 2-10 shows the Register files and ALUs, along with the multiplier/accumulators.
[Figure 2-10: data register halves (R0.H/R0.L onward) feeding operand selection for MAC0 and MAC1, with the ALUs, the shifter, and the path to memory...]
contain the same register information, giving the options for squaring and multiplying the high half and low half of the same register. Figure 2-11 shows these possible combinations.
[Figure 2-11. Four Possible Combinations of MAC Operations (four MAC0 panels)]
The 32-bit product is passed to a 40-bit adder/subtracter, which may add or subtract the new product from the contents of the Accumulator Result register or pass the new product directly to the Data Register File Results...
For example:
A1 += R3.H * R4.H ;
In this instruction, the MAC1 multiplier/accumulator performs a multiply and accumulates the result with the previous results in the Accumulator.
Multiply Without Accumulate
The multiplier may operate without the accumulation function. If accumulation is not used, the result can be directly stored in a register from the Data Register File or the Accumulator register.
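The multiply-and-accumulate example above (A1 += R3.H * R4.H) can be approximated with a short numerical model. This Python sketch makes several assumptions: it uses signed fractional (1.15) operands, shifts the product left by one bit to align it with the accumulator's fractional format, and saturates at 40 bits. It ignores instruction options and the –1 × –1 corner case, and all names are hypothetical.

```python
def signed16(half: int) -> int:
    """Interpret a 16-bit register half as a two's-complement value."""
    half &= 0xFFFF
    return half - 0x10000 if half & 0x8000 else half


def saturate40(value: int) -> int:
    """Clamp to the signed range of a 40-bit accumulator."""
    return max(-(1 << 39), min((1 << 39) - 1, value))


def mac_fractional(acc: int, x_half: int, y_half: int) -> int:
    """Model acc += x * y for signed fractional operands (a sketch only)."""
    product = (signed16(x_half) * signed16(y_half)) << 1  # 1.15 x 1.15 product
    return saturate40(acc + product)


acc = 0
for _ in range(2):                      # accumulate 0.5 * 0.5 twice
    acc = mac_fractional(acc, 0x4000, 0x4000)
print(hex(acc))                         # 0.5 in the accumulator's 32-bit fraction
```

The 8 extension bits above bit 31 give the running sum headroom, so many products can be accumulated before the 40-bit saturation limit is reached.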
Computational Units Dual MAC Operations The processor has two 16-bit MACs. Both MACs can be used in the same operation to double the MAC throughput. The same two 32-bit input registers are offered to each MAC unit, providing each with four possible combinations of 16-bit input operands.
The operand type determines the correct bits to extract from the Accumulator and deposit in the 16-bit destination register. See “Multiply Without Accumulate” on page 2-44.
R3 = (A1 += R1.H * R2.L), R2 = (A0 += R1.L * R2.L) ;
In this instruction, the 40-bit Accumulators are packed into two 32-bit registers.
The arithmetic shift and logical shift operations can be further broken into subsections. Instructions that are intended to operate on 16-bit single or paired numeric values (as would occur in many DSP algorithms) can use the instructions ASHIFT and LSHIFT. These are typically three-operand instructions.
The following example shows the input value upshifted.
R0 contains 0000 B6A3 ;
R0 <<= 0x04 ;
results in
R0 contains 000B 6A30 ;
Register Shifts
Register-based shifts use a register to hold the shift value. The entire 32-bit register is used to derive the shift value, and when the magnitude of the shift is greater than or equal to 32, then the result is either 0 or –1.
The following example shows the input value downshifted.
R0 contains 0000 B6A3 ;
R1 = R0 >> 0x04 ;
results in
R1 contains 0000 0B6A ;
The following example shows the input value upshifted.
R0.L contains B6A3 ;
R1.H = R0.L <<...
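The 32-bit upshift and downshift examples, together with the rule for shift magnitudes of 32 or more, can be reproduced with a small model. This Python sketch uses hypothetical function names and models only 32-bit register patterns; it is not a description of the shifter hardware.

```python
MASK32 = 0xFFFFFFFF


def logical_shift_left(value: int, amount: int) -> int:
    """Shift left, filling with zeros; 32 or more positions leaves 0."""
    return (value << amount) & MASK32 if amount < 32 else 0


def logical_shift_right(value: int, amount: int) -> int:
    """Shift right, filling with zeros; 32 or more positions leaves 0."""
    return (value & MASK32) >> amount if amount < 32 else 0


def arithmetic_shift_right(value: int, amount: int) -> int:
    """Shift right, copying the sign bit; large shifts leave 0 or -1."""
    signed = value - (1 << 32) if value & 0x80000000 else value
    return (signed >> min(amount, 31)) & MASK32


print(format(logical_shift_left(0x0000B6A3, 4), "08X"))    # 000B6A30
print(format(logical_shift_right(0x0000B6A3, 4), "08X"))   # 00000B6A
print(format(arithmetic_shift_right(0x80000000, 40), "08X"))
```

The last call shows the "either 0 or –1" case: a negative input shifted right arithmetically by 32 or more collapses to all ones, the 32-bit pattern for –1.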
The following example shows the input value rotated. Assume the Condition Code (CC) bit is set to 0. For more information about CC, see “Condition Code Flag” on page 4-18.
R0 contains ABCD EF12 ;
R2.L contains 0004 ;...
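A rotate that includes the CC bit can be modeled as a rotation of a 33-bit chain. The Python sketch below assumes that CC sits above bit 31 in that chain and that a positive amount rotates left; both are modeling assumptions stated here rather than taken from the truncated example, and `rotate_through_cc` is a hypothetical name.

```python
MASK32 = 0xFFFFFFFF
MASK33 = (1 << 33) - 1


def rotate_through_cc(value: int, cc: int, amount: int) -> tuple:
    """Rotate the 33-bit chain {CC, value}; return (new value, new CC)."""
    chain = ((cc & 1) << 32) | (value & MASK32)
    n = amount % 33                      # a negative amount rotates right
    rotated = ((chain << n) | (chain >> (33 - n))) & MASK33
    return rotated & MASK32, rotated >> 32


result, new_cc = rotate_through_cc(0xABCDEF12, cc=0, amount=4)
print(format(result, "08X"), new_cc)
```

Under these assumptions the bits shifted out of the top re-enter at the bottom one position late, because each one passes through CC on the way around.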
Computational Units Two register arguments are used for these functions. One holds the 32-bit destination or 32-bit source. The other holds the extract/deposit value, its length, and its position within the source. Shifter Instruction Summary Table 2-11 lists the shifter instructions. For more information about assembly language syntax and the effect of shifter instructions on the sta- tus flags, see Chapter 14, “Shift/Rotate Operations.”...
Barrel Shifter (Shifter) Table 2-11. Shifter Instruction Summary Instruction ASTAT Status Flag AN AC0 CC V AC0_COPY AV0S AV1S V_COPY BITCLR ( Dreg, uimm5 ) ; – – – **/– BITSET ( Dreg, uimm5 ) ; – – – **/– BITTGL ( Dreg, uimm5 ) ;...
Computational Units Table 2-11. Shifter Instruction Summary (Cont’d) Instruction ASTAT Status Flag AN AC0 CC V AC0_COPY AV0S AV1S V_COPY An = An >>> uimm5 ; – ** 0/ ** 1/– – – – An = An >> uimm5 ; –...
Barrel Shifter (Shifter) Table 2-11. Shifter Instruction Summary (Cont’d) Instruction ASTAT Status Flag AN AC0 CC V AC0_COPY AV0S AV1S V_COPY Dreg_lo_hi = LSHIFT – – – – **/– Dreg_lo_hi BY Dreg_lo ; An = An ASHIFT BY Dreg _lo ; –...
3 OPERATING MODES AND STATES The processor supports the following three processor modes: • User mode • Supervisor mode • Emulation mode Emulation and Supervisor modes have unrestricted access to the core resources. User mode has restricted access to certain system resources, thus providing a protected software environment.
Table 3-1. Identifying the Current Processor Mode
Event       Mode         IPEND
Interrupt   Supervisor   ≥ 0x10, but IPEND[0], IPEND[1], IPEND[2], and IPEND[3] = 0.
Exception   Supervisor   ≥ 0x08. The core is processing an exception event if IPEND[0] = 0, IPEND[1] = 0, IPEND[2] = 0, IPEND[3] = 1, and IPEND[15:4] are 0’s or 1’s.
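The two rows of Table 3-1 shown above translate into simple bit tests on the IPEND value. The following Python sketch covers only those two rows; any other value is reported as not covered rather than guessed. The function name `classify_ipend` and the returned strings are hypothetical.

```python
def classify_ipend(ipend: int) -> str:
    """Classify an IPEND value using only the Interrupt and Exception rows."""
    low_four = ipend & 0xF
    if ipend >= 0x10 and low_four == 0:
        return "interrupt (Supervisor)"
    # Exception: bit 3 set, bits 2:0 clear, bits 15:4 may be 0's or 1's.
    if (ipend & 0x8) and (ipend & 0x7) == 0:
        return "exception (Supervisor)"
    return "not covered by the rows shown"


for sample in (0x8000, 0x0008, 0x8008, 0x0000):
    print(format(sample, "04X"), classify_ipend(sample))
```

Testing the low-order bits first matters because a pending higher-priority event can coexist with set bits in IPEND[15:4].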
[Figure: processor modes and states diagram showing User (application level code), Supervisor (system code, event handlers), Emulation, Idle, and Reset, with transitions labeled Interrupt, Exception, Emulation Event, Wakeup, IDLE instruction, RTI/RTX/RTN, RST Active, and RST Inactive]
(1) Normal exit from Reset is to Supervisor mode. However, emulation hardware may have initiated a reset.
User Mode
Table 3-2. Registers Accessible in User Mode
Processor Registers              Register Names
Data Registers                   R[7:0], A[1:0]
Pointer Registers                P[5:0], SP, FP, I[3:0], M[3:0], L[3:0], B[3:0]
Sequencer and Status Registers   RETS, LC[1:0], LT[1:0], LB[1:0], ASTAT, CYCLES, CYCLES2
Protected Resources and Instructions
System resources consist of a subset of processor registers, all MMRs, and a subset of protected instructions.
Protected Memory

Additional memory locations can be protected from User mode access. A Cacheability Protection Lookaside Buffer (CPLB) entry can be created and enabled. See “Memory Management Unit” on page 6-45 for further information.

Entering User Mode

When coming out of reset, the processor is in Supervisor mode because it is servicing a reset event.
case of an interrupt routine, if the service routine is interruptible, the return address is stored on the stack. For this case, the address can be found by popping the value from the stack into RETI. Once RETI has been loaded, the...
Supervisor Mode

The processor services all interrupt, NMI, and exception events in Supervisor mode. Supervisor mode has full, unrestricted access to all processor system resources, including all emulation resources, unless a CPLB has been configured and enabled.
The interrupt handler for IVG15 can be set to jump to the application code starting address. An additional RTI is not required. As a result, the processor remains in Supervisor mode because IPEND[15] remains set. At this point, the processor is servicing the lowest priority interrupt. This ensures that higher priority interrupts can be processed.
RTI ; /* Return from Reset Event */
WAIT_HERE : /* Wait here till IVG15 interrupt is serviced */
JUMP WAIT_HERE ;
START: /* IVG15 vectors here */
[--SP] = RETI ; /* Enables interrupts and saves return address to stack */

Emulation Mode

The processor enters Emulation mode if Emulation mode is enabled and...
Reset State

The processor remains in the Idle state until a peripheral or external device, such as a SPORT or the Real-Time Clock (RTC), generates an interrupt that requires servicing. In Listing 3-3, core interrupts are disabled and the IDLE instruction is executed.
Software in Supervisor or Emulation mode can invoke the Reset state without involving the external RESET signal. This can be done by issuing the Reset version of the RAISE instruction. Application programs in User mode cannot invoke the Reset state, except through a system call provided by an operating system kernel.
System Reset and Powerup

Table 3-6 describes the five types of resets. Note that all resets, except System Software, reset the core.

Table 3-6. Resets

Reset | Source | Result
Hardware Reset | The RESET pin causes a hardware reset. | Resets both the core and the peripherals,
Table 3-6. Resets (Cont’d)

Reset | Source | Result
Core Double-Fault Reset | If the core enters a double-fault state, a reset can be caused by unmasking the | Resets both the core and the peripherals, excluding the RTC block and most of the DPMC.
System Software reset. See “Reset Interrupt” on page 4-46, and Table 4-11, “Events That Cause Exceptions,” on page 4-63 for further information.

SYSCR Register

The values sensed from the BMODE pins are latched into the System Reset Configuration register (SYSCR) upon the deassertion of the...
After either the watchdog or System Software reset is initiated, the processor ensures that all asynchronous peripherals have recognized and completed a reset. For a reset generated by the watchdog timer, the processor transitions into the Boot mode sequence. The Boot mode is configured by the state of the BMODE and the No Boot on Software Reset control bits.
4 PROGRAM SEQUENCER

This chapter describes the Blackfin processor program sequencing and interrupt processing modules. For information about instructions that control program flow, see Chapter 7, “Program Flow Control.” For information about instructions that control interrupt processing, see Chapter 16, “External Event Management.”...
Introduction

• Interrupts and Exceptions. A runtime event or instruction triggers the execution of a subroutine.
• Idle. An instruction causes the processor to stop operating and hold its current state until an interrupt occurs. Then, the processor services the interrupt and continues normal execution.

LINEAR FLOW LOOP JUMP...
The fetched address enters the instruction pipeline, ending with the program counter (PC). The pipeline contains the 32-bit addresses of the instructions currently being fetched, decoded, and executed. The PC couples with the RETn registers, which store return addresses. All addresses generated by the sequencer are 32-bit memory instruction addresses.
[Figure residue: block diagram showing the System Interrupt Controller (SIC_IAR0, SIC_IAR1, SIC_IAR2, SIC_IAR3, SIC_ISR, SIC_IWR, SIC_IMASK), Dynamic Power Management, the peripherals (SCLK, PAB 16/32), the Core Event Controller (emulation, reset, exceptions, hardware errors, core timer; ILAT, IMASK, IPEND; CCLK, RAB 32), and the Program Sequencer with the Address Arithmetic Unit (PREG 32; LC0, LT0, LB0...)]
Sequencer Related Registers

Table 4-1 lists the non-memory-mapped registers within the processor that are related to the sequencer. Except for the PC and SEQSTAT registers, all sequencer-related registers are directly readable and writable by move instructions, for example:

SYSCFG = R0 ;
P0 = RETI ;...
Table 4-1. Non-memory-mapped Sequencer Registers

Register Name | Description
SEQSTAT | Sequencer Status register: See “Hardware Errors and Exception Handling” on page 4-58.
RETX, RETN, RETI, RETE, RETS | Return Address registers: See “Events and Interrupts” on page 4-29.
RETX | Exception Return
RETN | NMI Return
RETI | Interrupt Return
RETE | Emulation Return
RETS...
Instruction Pipeline

The program sequencer determines the next instruction address by examining both the current instruction being executed and the current state of the processor. If no conditions require otherwise, the processor executes instructions from memory in sequential order by incrementing the look-ahead address.
Figure 4-3 shows a diagram of the pipeline.

[Figure 4-3. Processor Pipeline]

The instruction fetch and branch logic generates 32-bit fetch addresses for the Instruction Memory Unit.
Program Sequencer Register file reads occur in the DF2 pipeline stage (for operands). Register file writes occur in the WB stage (for stores). The multipliers and the video units are active in the EX1 stage, and the ALUs and shifter are active in the EX2 stage.
Branches

One type of nonsequential program flow that the sequencer supports is branching. A branch occurs when a JUMP or CALL instruction begins execution at a new location other than the next sequential address. For descriptions of how to use the JUMP and CALL instructions, see Chapter 7,...
Branches can be direct or indirect. A direct branch address is determined solely by the instruction word (for example, JUMP 0x30), while an indirect branch gets its address from the contents of a DAG register (for example, JUMP (P3)). All types of JUMPs and...
JUMP mylabel ;
/* skip any code placed here */
mylabel:
/* continue to fetch and execute instruction here */

Direct Call

The CALL instruction is a branch instruction that copies the address of the instruction which would have executed next (had the CALL instruction not executed) into the...
mytarget:
/* continue here */

Legacy style:

P4.H = mytarget;
P4.L = mytarget;
JUMP (P4);
mytarget:
/* continue here */

PC-Relative Indirect Branch and Call

The PC-relative indirect JUMP and CALL instructions use the contents of a P-register as an offset to the branch target. For the CALL instruction, the RETS register is loaded with the address of the instruction which would...
The Blackfin instruction set features a pair of instructions that provides cleaner and more efficient functionality than the above example: the LINK and UNLINK instructions. These multi-cycle instructions perform multiple operations that can be best explained by the equivalent code sequences: Table 4-3....
If subroutines require local, private, and temporary variables beyond the capabilities of core registers, it is a good idea to place these variables on the stack as well. The LINK instruction takes a parameter that specifies the size of the stack memory required for this local purpose. The following example provides two local 32-bit variables and initializes them to zero when the routine is entered:

_sub3:...
• A status flag may be copied into CC, and the value in CC may be copied to a status flag.
• The CC flag bit may be set to the result of a Pointer register comparison.
• The CC flag bit may be set to the result of a Data register comparison.
PC-relative offset is an 11-bit immediate value that must be a multiple of two (bit 0 must be a 0). This gives an effective dynamic range of –1024 to +1022 bytes. For example, the following instruction tests the CC flag and, if it is positive, jumps to a location identified by the label dest_address:

IF CC JUMP dest_address ;...
The branch latency for conditional branches is as follows.
• If prediction was “not to take branch,” and branch was actually not taken: 0 CCLK cycles.
• If prediction was “not to take branch,” and branch was actually taken: 8 CCLK cycles.
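The constraints stated for conditional jumps (an even PC-relative offset within –1024 to +1022 bytes) and the two latency cases listed above can be captured in a short C sketch. The function names `cond_jump_offset_ok` and `not_taken_prediction_latency` are hypothetical and exist only for illustration; only the two latency cases given in the list are modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a byte offset can be encoded in a conditional jump:
   it must be a multiple of two and lie within -1024..+1022 bytes. */
bool cond_jump_offset_ok(int32_t offset_bytes)
{
    if (offset_bytes % 2 != 0) {
        return false;               /* bit 0 must be 0 */
    }
    return offset_bytes >= -1024 && offset_bytes <= 1022;
}

/* Latency in CCLK cycles when the prediction was "not to take branch".
   Only the two cases listed in the text are modeled. */
int not_taken_prediction_latency(bool branch_taken)
{
    return branch_taken ? 8 : 0;
}
```

An assembler or code generator could use a check of this kind to decide when a conditional jump must be replaced by a longer sequence because the target is out of range.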
Hardware Loops

Two sets of zero-overhead loop registers implement loops, using hardware counters instead of software instructions to evaluate loop conditions. After evaluation, processing branches to a new target address. Both sets of registers include the Loop Counter (LC), Loop Top (LT), and Loop Bottom (LB) registers.
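The interplay of the Loop Counter, Loop Top, and Loop Bottom registers can be illustrated with a small software model. This is a sketch under stated assumptions, not a description of the actual hardware: it assumes the counter is decremented when the instruction at the loop bottom completes, that control returns to the loop top while the counter is still nonzero, and that every instruction has the same nonzero size. The names `loop_unit_t`, `loop_next_pc`, and `count_body_passes` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of one loop register set. */
typedef struct {
    uint32_t lc;  /* Loop Counter */
    uint32_t lt;  /* Loop Top address */
    uint32_t lb;  /* Loop Bottom address */
} loop_unit_t;

/* Assumed behavior: after the instruction at the loop bottom executes,
   decrement the counter and branch to the top while it stays nonzero. */
uint32_t loop_next_pc(loop_unit_t *u, uint32_t pc, uint32_t insn_bytes)
{
    if (u->lc != 0u && pc == u->lb) {
        u->lc--;
        if (u->lc != 0u) {
            return u->lt;
        }
    }
    return pc + insn_bytes;
}

/* Counts how often the loop-bottom instruction executes.
   Assumes insn_bytes > 0 and top <= bottom. */
uint32_t count_body_passes(uint32_t count, uint32_t top,
                           uint32_t bottom, uint32_t insn_bytes)
{
    loop_unit_t u = { count, top, bottom };
    uint32_t pc = top;
    uint32_t passes = 0u;

    while (pc <= bottom) {
        if (pc == bottom) {
            passes++;
        }
        pc = loop_next_pc(&u, pc, insn_bytes);
    }
    return passes;
}
```

In this model a count of 0x20 yields 32 passes through the loop body, with no compare or branch instruction inside the body itself.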
Listing 4-1. Loop Example

P5 = 0x20 ;
LSETUP ( lp_start, lp_end ) LC0 = P5 ;
lp_start:
R5 = R0 + R1(ns) || R2 = [P2++] || R3 = [I1++] ;
lp_end: R5 = R5 + R2 ;

When executing an LSETUP instruction, the program sequencer loads the...
Table 4-6. Loop Registers

First/Last Address of the Loop | PC-Relative Offset Used to Compute the Loop Start Address | Effective Range of the Loop Start Instruction
Top / First | 5-bit signed immediate; must be a multiple of 2. | 0 to 30 bytes away from LSETUP instruction.
Therefore, two-dimensional loops are supported directly in hardware, consisting of an outer loop and a nested inner loop. The outer loop is always represented by loop unit 0 (LC0, LT0, LB0), while loop unit 1 (LC1, LT1, LB1) manages the inner loop. To enable the two nested loops to end at the same instruction (LB1 equals LB0), loop unit 1 is assigned higher priority than loop unit 0.
Hardware Loops Loop Unrolling Typical DSP algorithms are coded for speed rather than for small code size. Especially when fetching data from circular buffers, loops are often unrolled in order to pass only N-1 times. The initial data fetch is executed before the loop is entered.
Saving and Resuming Loops

Normally, loops can process and terminate without regard to system-level concepts. Even if interrupted by interrupts or exceptions, no special care is needed. There are, however, a few situations that require special attention: whenever a loop is interrupted by events that require the loop resources themselves, that is:
•...
To avoid unnecessary penalty cycles, the loop hardware follows these rules:
• Restoring LC0 and LC1 registers always re-initializes the loop hardware and causes a ten-cycle “replay” penalty.
• Restoring LT0, LT1, LB0, and LB1 performs in a single cycle if the respective loop counter register is zero.
•...
Program Sequencer /* If the handler uses loop 0, it is a good idea to have it leave LC0 equal to zero at the end. Normally, this will happen naturally as a loop is fully executed. If LC0 == 0, then LT0 and LB0 restores will not incur additional cycles.
Events and Interrupts

An interrupt is an event that changes normal processor instruction flow and is asynchronous to program flow. In contrast, an exception is a software-initiated event whose effects are synchronous to program flow. The event system is nested and prioritized. Consequently, several service routines may be active at any time, and a low priority event may be pre-empted by one of higher priority.
Note the System Interrupt to Core Event mappings shown are the default values at reset and can be changed by software.

System Interrupt Processing

Referring to Figure 4-4 on page 4-33, note when an interrupt (Interrupt A) is generated by an interrupt-enabled peripheral:
1. SIC_ISR logs the request and keeps track of system interrupts that are asserted but not yet serviced (that is, an interrupt service rou-...
8. When the event vector for Interrupt A has entered the core pipeline, the appropriate IPEND bit is set, which clears the respective ILAT bit. Thus, IPEND tracks all pending interrupts, as well as those being presently serviced.
9....
[Figure residue: interrupt processing block diagram. Peripheral interrupt requests pass through the System Interrupt Controller (system interrupt mask SIC_IMASK, system status SIC_ISR, system wakeup SIC_IWR to the dynamic power management controller, assign system priority SIC_IARx) to the core (core status ILAT, core interrupt mask IMASK, core pending IPEND, core event vector table EVT[15:0]); core events shown include RESET, IVHW, and IVTMR...]
If the default assignments shown in the System Interrupt Appendix of the Blackfin Processor Hardware Reference for your part are acceptable, then interrupt initialization involves only:
• Initialization of the core Event Vector Table (EVT) vector address entries
•...
The SIC_IWR register has no effect unless the core is idled. The bits in this register correspond to those of the System Interrupt Mask (SIC_IMASK) and Interrupt Status (SIC_ISR) registers. After reset, all valid bits of this register are set to 1, enabling the wakeup function for all interrupts that are not masked.
Depending on how interrupt sources map to the general-purpose interrupt inputs of the core, the interrupt service routine may have to interrogate multiple interrupt status bits to determine the source of the interrupt. One of the first instructions executed in an interrupt service routine should read SIC_ISR to determine whether more than one of the peripher-...
Program Sequencer Although this register can be read from or written to at any time (in Supervisor mode), it should be configured in the reset initialization sequence before enabling interrupts. System Interrupt Assignment Registers (SIC_IARx) The relative priority of peripheral interrupts can be set by mapping the peripheral interrupt to the appropriate general-purpose interrupt level in the core.
Events and Interrupts Core Event Controller Registers The Event Controller uses three MMRs to coordinate pending event requests. In each of these MMRs, the 16 lower bits correspond to the 16 event levels (for example, bit 0 corresponds to “Emulator mode”). The registers are: •...
Table 4-8 lists events by priority. Each event has a corresponding bit in the event state registers ILAT, IMASK, and IPEND.

Table 4-8. Core Event Vector Table

Name | Event Class | Event Vector Register | MMR Location | Notes
Emulation EVT0 0xFFE0 2000...
into the RETI register prior to jumping to the event vector. A typical interrupt service routine terminates with an RTI instruction that instructs the sequencer to reload the Program Counter, PC, from the RETI register. The following example shows a simple interrupt service routine.

isr:
[--SP] = (R7:0, P5:0);...
If there is not a need for non-interruptible code inside the service routine, it is good programming practice to enable nesting immediately. This avoids unnecessary delay to high priority interrupt routines. For example:

isr:
[--SP] = RETI; /* enable nesting */
[--SP] = (R7:0, P5:0);...
Program Sequencer Table 4-9. Return Registers and Instructions (Cont’d) Name Event Class Return Register Return Instruction IVG8 Interrupt 8 RETI IVG9 Interrupt 9 RETI IVG10 Interrupt 10 RETI IVG11 Interrupt 11 RETI IVG12 Interrupt 12 RETI IVG13 Interrupt 13 RETI IVG14 Interrupt 14 RETI...
Reset Interrupt

The reset interrupt (RST) can be initiated via the RESET pin or through expiration of the watchdog timer. This location differs from that of other interrupts in that its content is read-only. Writes to this address change the register but do not change where the processor vectors upon reset.
Exceptions

Exceptions are discussed in “Hardware Errors and Exception Handling” on page 4-58.

Hardware Error Interrupt

Hardware Errors are discussed in “Hardware Errors and Exception Handling” on page 4-58.

Core Timer Interrupt

The Core Timer Interrupt (IVTMR) is triggered when the core timer value reaches zero.
Interrupt Processing

The following sections describe interrupt processing.

Global Enabling/Disabling of Interrupts

General-purpose interrupts can be globally disabled with the CLI Dreg instruction and re-enabled with the STI Dreg instruction, both of which are only available in Supervisor mode. Reset, NMI, emulation, and exception events cannot be globally disabled.
To determine when to service an interrupt, the controller logically ANDs the three quantities in ILAT, IMASK, and the current processor priority level. Servicing the highest priority interrupt involves these actions:
1. The interrupt vector in the Event Vector Table (EVT) becomes the next fetch address.
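The selection rule described above (latched requests, gated by the mask, gated again by the current priority level) can be sketched as a small C function. This is a simplified model rather than the controller's actual logic: it treats all 16 bits as event levels with bit 0 as the highest priority, and it does not model self-nesting. The function name `select_event` is hypothetical.

```c
#include <stdint.h>

/* Returns the event level (0..15) that would be serviced next,
   or -1 if no latched and enabled event outranks the events that
   are already active in ipend. Bit 0 is the highest priority. */
int select_event(uint16_t ilat, uint16_t imask, uint16_t ipend)
{
    uint16_t candidates = (uint16_t)(ilat & imask);

    for (int level = 0; level < 16; level++) {
        uint16_t bit = (uint16_t)(1u << level);

        if (ipend & bit) {
            /* An event of equal or higher priority is already active. */
            return -1;
        }
        if (candidates & bit) {
            return level;
        }
    }
    return -1;
}
```

For example, with requests latched at levels 7 and 11 and only level 15 active, level 7 is selected; if level 6 were active instead, neither request would be serviced until it completes.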
Software Interrupts

Software cannot set bits of the ILAT register directly, as writes to ILAT cause a write-1-to-clear (W1C) operation. Instead, use the RAISE instruction to set individual ILAT bits by software. It safely sets any of the ILAT bits without affecting the rest of the register.
Program Sequencer “Example Code for an Exception Handler” on page 4-68 uses the same principle to handle an exception with normal interrupt priority level. Nesting of Interrupts Interrupts are handled either with or without nesting, individually. For more information, see “Return Registers and Instructions”...
[Figure residue: pipeline timing diagram by cycle (IF 1, IF 2, IF 3, ...) with the label INTERRUPTS DISABLED DURING THIS INTERVAL.]
Figure 4-9 illustrates that by pushing RETI onto the stack, interrupts can be re-enabled during an interrupt service routine, resulting in a short duration where interrupts are globally disabled.

[Figure 4-9 residue: pipeline timing diagram with two intervals labeled INTERRUPTS DISABLED DURING THIS INTERVAL; CYCLE: IF 1 PUSH...]
[--SP] = FP ;
[--SP] = (R7:0, P5:0) ;
/* Body of service routine. Note none of the processor resources (accumulators, DAGs, loop counters and bounds) have been saved. It is assumed this interrupt service routine does not use the processor resources.
Logging of Nested Interrupt Requests

The System Interrupt Controller (SIC) detects level-sensitive interrupt requests from the peripherals. The Core Event Controller (CEC) provides edge-sensitive detection for its general-purpose interrupts (IVG7-IVG15). Consequently, the SIC generates a synchronous interrupt pulse to the CEC and then waits for interrupt acknowledgement from the CEC.
Self-nesting is not supported for system level peripheral interrupts such as the SPORT or SPI. The SYSCFG register is discussed in “SYSCFG Register” on page 21-26.

Additional Usability Issues

The following sections describe additional usability issues.

Allocating the System Stack

The software stack model for processing exceptions implies that the Supervisor stack must never generate an exception while the exception handler is saving its state.
In order for high priority interrupts to be serviced with the least latency possible, the processor allows any high latency fill operation to be completed at the system level, while an interrupt service routine executes from L1 memory. See Figure 4-10.
Hardware Errors and Exception Handling

Note the interrupt service routine must reside in L1 cache or SRAM memory and must not generate a cache miss, an L2 memory access, or a peripheral access, as the processor is already busy completing the original cache line fill operation.
SEQSTAT Register

The Sequencer Status register (SEQSTAT) contains information about the current state of the sequencer as well as diagnostic information from the last event. SEQSTAT is a read-only register and is accessible only in Supervisor mode.
Interrupt service routine can then read the cause of the error from the 5-bit HWERRCAUSE field appearing in the Sequencer Status register (SEQSTAT) and respond accordingly.

The Hardware Error Interrupt is generated by:
• Bus parity errors
•...
Table 4-10. Hardware Conditions Causing Hardware Error Interrupts

Hardware Condition | HWERRCAUSE (Binary) | HWERRCAUSE (Hexadecimal) | Notes / Examples
System MMR Error | 0b00010 | 0x02 | An error can occur if an invalid System MMR location is accessed, if a 32-bit register is accessed with a 16-bit instruction, or if a 16-bit register is accessed with a 32-bit instruction.
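A hardware error handler typically starts by isolating the 5-bit HWERRCAUSE field from a copy of SEQSTAT and comparing it against the codes in Table 4-10. The C sketch below illustrates the idea. The field position (`HWERRCAUSE_SHIFT`, taken here as bit 14) is an assumption, since this section gives only the field width; the macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: HWERRCAUSE occupies 5 bits starting at bit 14 of SEQSTAT.
   This section states only that the field is 5 bits wide. */
#define HWERRCAUSE_SHIFT 14u
#define HWERRCAUSE_MASK  0x1Fu

/* Code listed in Table 4-10 for a System MMR Error. */
#define HWERRCAUSE_SYSTEM_MMR_ERROR 0x02u

/* Extract the 5-bit HWERRCAUSE field from a copy of SEQSTAT. */
uint32_t hwerrcause(uint32_t seqstat)
{
    return (seqstat >> HWERRCAUSE_SHIFT) & HWERRCAUSE_MASK;
}

/* True if the recorded hardware error is a System MMR Error. */
bool is_system_mmr_error(uint32_t seqstat)
{
    return hwerrcause(seqstat) == HWERRCAUSE_SYSTEM_MMR_ERROR;
}
```

The SEQSTAT value itself would be obtained in Supervisor mode with a register move; the sketch operates only on a copy held in an ordinary variable.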
An excepting instruction may or may not commit before the exception event is taken, depending on whether it is a service type or an error type exception. An instruction causing a service type event will commit, and the address written to the RETX register will be the next instruction after the excepting...
Table 4-11. Events That Cause Exceptions

Exception | EXCAUSE [5:0] | Type: (E) Error, (S) Service (see note 1) | Notes/Examples
Force Exception instruction EXCPT with 4-bit m field | m field | | Instruction provides 4 bits of EXCAUSE.
Single step | 0x10 | | When the processor is in single step mode, every instruction generates an exception.
Table 4-11. Events That Cause Exceptions (Cont’d)

Exception | EXCAUSE [5:0] | Type: (E) Error, (S) Service (see note 1) | Notes/Examples
Data access misaligned address violation | 0x24 | | Attempted misaligned data memory or data cache access.
Unrecoverable event | 0x25 | | For example, an exception generated while...
Table 4-11. Events That Cause Exceptions (Cont’d)

Exception | EXCAUSE [5:0] | Type: (E) Error, (S) Service (see note 1) | Notes/Examples
Instruction fetch multiple CPLB hits | 0x2D | | More than one CPLB entry matches instruction fetch address.
Illegal use of supervisor resource | 0x2E | | Attempted to use a Supervisor register or...
• The EXCAUSE field in SEQSTAT is updated with an unrecoverable event code.
• The address of the offending instruction is saved in RETX. Note if the processor were executing, for example, the NMI handler, the RETN register would not have been updated; the excepting instruction address is always stored in RETX...
Deferring Exception Processing

Exception handlers are usually long routines, because they must discriminate among several exception causes and take corrective action accordingly. The length of the routines may result in long periods during which the interrupt system is, in effect, suspended. To avoid lengthy suspension of interrupts, write the exception handler to identify the exception cause, but defer the processing to a low priority interrupt.
/* Using jump table EVTABLE, jump to the event pointed to by R0 */
P0 = R0 ;
P1 = _EVTABLE ;
P0 = P1 + ( P0 << 1 ) ;
R0 = W [ P0 ] (Z) ;
P1 = R0 ;...
/* The jump table EVTABLE holds 16-bit address offsets for each event. With offsets, this code is position independent and the table is small.
+--------------+
| addr_event1  | _EVTABLE
+--------------+
| addr_event2  | _EVTABLE + 2
+--------------+
. ...
5 ADDRESS ARITHMETIC UNIT

Like most DSP and RISC platforms, the Blackfin processors have a load/store architecture. Computation operands and results are always represented by core registers. Prior to computation, data is loaded from memory into core registers and results are stored back by explicit move operations.
[Figure 5-1. AAU Block Diagram: DAG0, DAG1, and PREG, with paths to L1 data memory and to the sequencer]

The AAU architecture supports several functions that minimize overhead in data access routines. These functions include:
• Supply address – Provides an address during a data access
•...
The AAU comprises two DAGs, nine Pointer registers, four Index registers, and four complete sets of related Modify, Base, and Length registers. These registers, shown in Figure 5-2 on page 5-4, hold the values that the DAGs use to generate addresses.
can be used for 8-, 16-, and 32-bit memory accesses. For added mode protection, the Supervisor Stack Pointer (SSP) is accessible only in Supervisor mode, while the User Stack Pointer (USP) is accessible in User mode. Do not assume the L-registers are automatically initialized to zero for linear addressing. The I-, M-, L-, and B-registers contain random values after reset.
Address Arithmetic Unit Addressing With the AAU The DAGs can generate an address that is incremented by a value or by a register. In post-modify addressing, the DAG outputs the I-register value unchanged; then the DAG adds an M-register or immediate value to the I-register.
Instructions using Index registers use an M-register or a small immediate value (+/– 2 or 4) as the modifier. Instructions using Pointer registers use a small immediate value or another P-register as the modifier. For details, see Table 5-3, “AAU Instruction Summary,”...
The User Stack Pointer register and the Supervisor Stack Pointer register are accessed using the register alias SP. Depending on the current processor operating mode, only one of these registers is active and accessible as SP.
• In User mode, any reference to SP (for example, a stack pop) implicitly uses the USP as the effective address.
DAG Register Set

DSP instructions primarily use the Data Address Generator (DAG) register set for addressing. The data address register set consists of these registers:
• I[3:0] contain index addresses
• M[3:0] contain modify values
•...
For example:
R0 = [ I2 ]
loads a 32-bit value from an address pointed to by I2 and stores it in the destination register R0.
R0.H = W [ I2 ]
loads a 16-bit value from an address pointed to by I2 and stores it in the 16-bit destination register R0.H.
Addressing With the AAU Indexed Addressing With Immediate Offset Indexed addressing allows programs to obtain values from data tables, with reference to the base of that table. The Pointer register is modified by the immediate field and then used as the effective address. The value of the Pointer register is not updated.
For example:
R0 = [ I2-- ] ;
loads a 32-bit value into the destination register R0 and decrements the Index register by 4.

Pre-modify Stack Pointer Addressing

The only pre-modify instruction in the processor uses the Stack Pointer register, SP.
For example:
R2 = W [ P4++P5 ] (Z) ;
loads a 16-bit word into the low half of the destination register R2 and zero-extends it to 32 bits. The value of the pointer P4 is incremented by the value of the pointer P5.
For example:
R2 = [ I2++M1 ]
loads a 32-bit word into the destination register...
• The Length (L) register sets the size of the circular buffer and the address range through which the DAG circulates the I-register. L is positive and cannot have a value greater than 2^32 – 1. If an L-register’s value is zero, its circular buffer operation is disabled.
[Figure 5-3. Circular Data Buffers: example with length = 11, base address = 0x0, modifier = 4. The columns show the sequence in order of locations accessed in one pass; the sequence repeats on subsequent passes.]

As seen in Figure 5-3, on the first post-modify access to the buffer, the DAG outputs the I-register value on the address bus, then modifies the...
In equation form, these post-modify and wraparound operations work as follows, shown for “I+M” operations.
• If M is positive:
\[
I_{new} =
\begin{cases}
I_{old} + M & \text{if } I_{old} + M < \text{buffer base} + \text{length (end of buffer)} \\
I_{old} + M - L & \text{if } I_{old} + M \geq \text{buffer base} + \text{length (end of buffer)}
\end{cases}
\]
•...
Memory Address Alignment

The address-modify operation modifies addresses in any Index and Pointer register (I[3:0], P[5:0]) without accessing memory. If the Index register’s corresponding B- and L-registers are set up for circular buffering, the address-modify operation performs the specified buffer wraparound (if needed).
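The buffer wraparound applied to an Index register can be modeled in a few lines of C. This is a sketch under stated assumptions: the function name `circ_post_modify` is hypothetical, the magnitude of the modifier is assumed not to exceed the buffer length (so a single correction suffices), and the handling of a negative modifier is assumed to mirror the positive case. A length of zero selects linear addressing, as described for the L-registers.

```c
#include <stdint.h>

/* Model of the Index register update with circular buffering.
   i      : current Index register value
   m      : signed modifier (M-register or immediate)
   base   : buffer base address (B-register)
   length : buffer length (L-register); 0 disables circular addressing
   Assumes |m| <= length when length is nonzero. */
uint32_t circ_post_modify(uint32_t i, int32_t m,
                          uint32_t base, uint32_t length)
{
    if (length == 0u) {
        return i + (uint32_t)m;           /* linear addressing */
    }

    int64_t next = (int64_t)i + m;
    int64_t end  = (int64_t)base + length;

    if (m >= 0) {
        if (next >= end) {
            next -= length;               /* wrap past the end of buffer */
        }
    } else {
        if (next < (int64_t)base) {
            next += length;               /* assumed mirror case for M < 0 */
        }
    }
    return (uint32_t)next;
}
```

With a base of 0, a length of 11, and a modifier of 4, repeated calls starting from 0 produce 4, 8, and then 1, which is the access order of the circular buffer example with those parameters.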
Address Arithmetic Unit Table 5-1 summarizes the types of transfers and transfer sizes supported by the addressing modes. Table 5-1. Types of Transfers Supported and Transfer Sizes Addressing Mode Types of Transfers Transfer Sizes Supported Auto-increment To and from Data LOADS: Auto-decrement Registers...
Memory Address Alignment Table 5-2 summarizes the addressing modes. In the table, an asterisk (*) indicates the processor supports the addressing mode. Table 5-2. Addressing Modes 32-bit 16-bit 8-bit Sign/zero Data Pointer Data word half- byte extend Register register Register word Half P Auto-inc...
AAU Instruction Summary

Table 5-3 lists the AAU instructions. In Table 5-3, note the meaning of these symbols:
• Dreg denotes any Data Register File register.
• Dreg_lo denotes the lower 16 bits of any Data Register File register.
6 MEMORY

Blackfin processors support a hierarchical memory model with different performance and size parameters, depending on the memory location within the hierarchy. Level 1 (L1) memories interconnect closely and efficiently with the Blackfin core for best performance. Separate blocks of L1 memory can be accessed simultaneously through multiple bus systems.
Memory Architecture

Blackfin processors have a unified 4G byte address range that spans a combination of on-chip and off-chip memory and memory-mapped I/O resources. Of this range, some of the address space is dedicated to internal, on-chip resources. The processor populates portions of this internal memory space with:
•...
[Figure residue: processor memory architecture diagram. Core clock (CCLK) domain: core processor and L1 memory (instruction, load data, load data, store data). System clock (SCLK) domain: DMA core bus (DCB), external access bus (EAB), DMA controller, DMA external bus (DEB), peripheral access bus (PAB), EBIU, non-DMA peripherals, DMA peripherals, external port bus (EPB), DMA access bus...]
• Instruction and data cache options for microcontroller code, excellent High Level Language (HLL) support, and ease of programming cache control instructions, such as PREFETCH and FLUSH
• Memory protection

The L1 memories operate at the core clock frequency (CCLK).

Overview of Scratchpad Data SRAM

The processor provides a dedicated 4K byte bank of scratchpad data...
L1 Instruction Memory

L1 Instruction Memory consists of a combination of dedicated SRAM and banks which can be configured as SRAM or cache. For the 16K byte bank that can be either cache or SRAM, control bits in the IMEM_CONTROL register can be used to organize all four subbanks of the L1 Instruction...
nonparticipating Ways. Code in nonparticipating Ways can still be removed from the cache using an IFLUSH instruction. If an ILOC[3:0] bit is 0, the corresponding Way is not locked and that Way participates in cache replacement policy. If an ILOC[3:0] bit is 1, the corresponding Way is locked and does not participate in cache replacement policy.
L1 Instruction Memory Write access to the L1 Instruction SRAM Memory must be made through the 64-bit wide system DMA port. Because the SRAM is implemented as a collection of single ported subbanks, the instruction memory is effectively dual ported. Figure 6-3 on page 6-9 describes the bank architecture of the L1 Instruc- tion Memory.
[Figure: L1 Instruction Memory bank architecture. Cache control and memory management, high priority and low priority cache line fill buffers (8 x 32 bit each), and 4 KB subbanks.] The shaded blocks are not present on all derivatives.
L1 Instruction Cache
For information about cache terminology, see “Terminology” on page 6-74. The L1 Instruction Memory may also be configured to contain a 16K byte, 4-Way set associative instruction cache. To improve the average access latency for critical code sections, each Way or line of the cache can be locked independently.
Memory The address tag consists of the upper 18 bits plus bits 11 and 10 of the physical address. Bits 12 and 13 of the physical address are not part of the address tag. Instead, these bits are used to identify the 4K byte memory subbank targeted for the access.
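As an illustration of the field split described above, the following Python sketch decodes a 32-bit physical address into instruction cache lookup fields. The function name, the dictionary keys, and the way the upper 18 bits and bits 11 and 10 are packed into a single 20-bit tag value are illustrative choices, not part of the processor definition. The line index (bits 9:5) and byte offset (bits 4:0) are inferred by assuming 32-byte cache lines in a 4K byte, 4-Way subbank, and should be read as assumptions.

```python
def decode_icache_address(addr):
    """Split a 32-bit physical address into instruction cache lookup fields.

    Illustrative model only; names and tag packing are assumptions.
    """
    addr &= 0xFFFFFFFF
    upper18 = addr >> 14              # bits 31:14, the upper 18 bits
    bits_11_10 = (addr >> 10) & 0x3   # bits 11 and 10
    return {
        "tag": (upper18 << 2) | bits_11_10,  # 20-bit tag (packing is arbitrary)
        "subbank": (addr >> 12) & 0x3,       # bits 13:12 select the 4K byte subbank
        "line": (addr >> 5) & 0x1F,          # assumed: 32 lines per Way
        "byte": addr & 0x1F,                 # assumed: 32-byte cache line
    }
```

Two addresses that differ only in bits 13 and 12 produce the same tag but a different subbank value, which is the property the text describes.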
[Figure: instruction cache organization. The 32-bit IAB address used for lookup supplies the address tag, the subbank select (bits 13 and 12), and the byte select. Each line in each Way holds a Valid bit, LRU state, an address tag, and 64-bit data words...]
[Figure 6-5. Cache Line – Tag and Data Portions. Tag portion: 20-bit address tag, LRUPRIO (LRU priority bit for line locking), LRU state, and Valid bit. Data portion: four 64-bit data words, WD3 through WD0.]
Cache Hits and Misses
A cache hit occurs when the address for an instruction fetch request from the core matches a valid entry in the cache....
L1 Instruction Memory Cache Line Fills A cache line fill consists of fetching 32 bytes of data from memory. The operation starts when the instruction memory unit requests a line-read data transfer on its external read-data port. This is a burst of four 64-bit words of data from the line fill buffer.
Line Fill Buffer
As the new cache line is retrieved from external memory, each 64-bit word is buffered in a four-entry line fill buffer before it is written to a 4K byte memory bank within L1 memory. The line fill buffer allows the core to access the data from the new cache line as the line is being retrieved from external memory, rather than having to wait until the line has been written into the cache.
L1 Instruction Memory For example: • If Way3 is invalid and Ways0, 1, 2 are valid, Way3 is selected for the new cache line. • If Ways0 and 1 are invalid and Ways2 and 3 are valid, Way0 is selected for the new cache line. •...
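The two examples above are consistent with choosing the lowest-numbered invalid Way. The Python sketch below models only that rule; the function name and the list-of-booleans representation are hypothetical, and the "lowest-numbered first" ordering is an inference from the examples rather than a statement of the complete replacement policy.

```python
def pick_invalid_way(valid):
    """Return the lowest-numbered invalid Way, or None if every Way is valid.

    valid: sequence of four booleans for Way0..Way3 (True = line is valid).
    Inferred rule, for illustration only.
    """
    for way, is_valid in enumerate(valid):
        if not is_valid:
            return way
    return None  # no invalid Way; the LRU policy (not modeled here) decides
```

For instance, `pick_invalid_way([True, True, True, False])` selects Way3, and `pick_invalid_way([False, False, True, True])` selects Way0, matching the cases listed.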
Memory cacheable line is fetched. This bit indicates that a line is of either “low” or “high” importance. In a modified LRU policy, a high can replace a low, but a low cannot replace a high. If all Ways are occupied by highs, an oth- erwise cacheable low will still be fetched for the core, but will not be cached.
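The priority rule can be sketched as a small Python model that lists which Ways an incoming line is allowed to replace. The function name and data representation are hypothetical. The sketch assumes that a line may replace another line of the same priority, which the text above does not state explicitly, and it ignores invalid and locked Ways.

```python
def replaceable_ways(resident_high, incoming_high):
    """List the Ways an incoming line may replace under the modified LRU rule.

    resident_high: one boolean per Way (True = resident line is "high").
    incoming_high: True if the incoming line is "high".
    Assumption: same-priority replacement is allowed.
    """
    if incoming_high:
        # A high can replace a low (and, by assumption, another high).
        return list(range(len(resident_high)))
    # A low cannot replace a high, so only Ways holding lows qualify.
    return [way for way, high in enumerate(resident_high) if not high]
```

An empty result for a low-priority line corresponds to the case where all Ways are occupied by highs: the line is still fetched for the core but is not cached.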
• Execute the code of interest. Any cacheable exceptions, such as exit code, traversed by this code execution are also locked into the instruction cache.
• Upon exit of the critical code, clear ILOC[3:1] and set ILOC[0].
The critical code (and the instructions which set ILOC[0]) is now...
the Invalid bit of each cache line to the invalid state. To implement this technique, additional MMRs (ITEST_COMMAND and ITEST_DATA[1:0]) are available to allow arbitrary read/write of all the cache entries directly. This method is explained in the next section. For invalidating the complete instruction cache, a third method is available....
Instruction Test Registers
The following figures describe the ITEST registers:
• Figure 6-6, “Instruction Test Command Register,” on page 6-21
• Figure 6-7, “Instruction Test Data 1 Register,” on page 6-22
• Figure 6-8, “Instruction Test Data 0 Register,” on page 6-23
Access to these registers is possible only in Supervisor or Emulation mode....
ITEST_COMMAND Register
When the Instruction Test Command register (ITEST_COMMAND) is written to, the L1 cache data or tag arrays are accessed, and the data is transferred through the Instruction Test Data registers (ITEST_DATA[1:0]).
[Figure: Instruction Test Command Register (ITEST_COMMAND), address 0xFFE0 1300, reset = 0x0000 0000...]
ITEST_DATA1 Register
Instruction Test Data registers (ITEST_DATA[1:0]) are used to access L1 cache data arrays. They contain either the 64-bit data that the access is to write to or the 64-bit data that the access is to read from. The Instruction Test Data 1 register (ITEST_DATA1) stores the upper 32 bits....
ITEST_DATA0 Register
The Instruction Test Data 0 register (ITEST_DATA0) stores the lower 32 bits of the 64-bit data to be written to or read from by the access. The ITEST_DATA0 register is also used to access tag arrays. This register also contains the Valid and Dirty bits, which indicate the state of the cache line....
L1 Data Memory
The L1 data SRAM/cache is constructed from single-ported subsections, but organized to reduce the likelihood of access collisions. This organization results in apparent multi-ported behavior. When there are no collisions, this L1 data traffic could occur in a single core clock cycle: •...
PORT_PREF0 bit selects the data port used to process DAG0 non-cacheable L2 fetches. Cacheable fetches are always processed by the data port physically associated with the targeted cache memory. Steering DAG0, DAG1, and cache traffic to different ports optimizes performance by keeping the queue to L2 memory full....
Memory By default after reset, all L1 Data Memory serves as SRAM. The DMC[1:0] bits can be used to reserve portions of this memory to serve as cache instead. Reserving memory to serve as cache does not enable L2 memory accesses to be cached.
[Figure: L1 Data Memory architecture. Cache control and memory management, connection to the RAB, I/O buffers, SRAM and SRAM-or-cache 4 KB subbanks, and high priority and low priority cache line fill buffers (8 x 32 bit each)...]
L1 Data Cache
For definitions of cache terminology, see “Terminology” on page 6-74. Unlike instruction cache, which is 4-Way set associative, data cache is 2-Way set associative. When two banks are available and enabled as cache, additional sets rather than Ways are created. When both Data Bank A and Data Bank B have memory serving as cache, the DCBS bit in the...
If cache is enabled (controlled by the DMC[1:0] bits in the DMEM_CONTROL register), data CPLBs should also be enabled (controlled by the ENDCPLB bit in the DMEM_CONTROL register). Only memory pages specified as cacheable by data CPLBs will be cached. The default behavior when data CPLBs are disabled is for nothing to be cached.
• If DCBS = 1, then A[23] is part of the address index, and all addresses where A[23] = 0 use Data Bank B. All addresses where A[23] = 1 use Data Bank A.
In this case, A[14] is treated as merely another bit in the address that is stored with the tag in the cache and compared for hit/miss processing by the cache.
• If DCBS = 1, A[23] selects Data Bank A instead of Data Bank B.
With DCBS = 1, the system functions more like two independent caches, each a 2-Way set associative 16K byte cache. Each Bank serves an alternating set of 8M byte blocks of memory....
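The bank selection controlled by DCBS can be modeled in a few lines of Python. This is a sketch under stated assumptions: the function name is hypothetical, and the polarity used when DCBS = 0 (A[14] = 1 selects Data Bank A) is assumed by symmetry with the DCBS = 1 case described above.

```python
def data_cache_bank(addr, dcbs):
    """Return "A" or "B" for the data bank that caches a given address.

    dcbs = 1: address bit 23 selects the bank (8M byte granularity).
    dcbs = 0: address bit 14 selects the bank (assumed polarity).
    """
    select_bit = 23 if dcbs else 14
    return "A" if (addr >> select_bit) & 1 else "B"
```

With `dcbs = 1`, addresses 0x0000 0000 through 0x007F FFFF map to one bank and the next 8M bytes map to the other, which is the alternating pattern the text describes.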
Figure 6-11 shows an example of how mapping is performed when DCBS = 1.
DCBS selection can be changed dynamically; however, to ensure that no data is lost, first flush and invalidate the entire cache.
[Diagram: Way0 and Way1 of the data banks.] Figure 6-11....
A data cache line is in one of three states: invalid, exclusive (valid and clean), and modified (valid and dirty). If valid data already occupies the allocated line and the cache is configured for write-back storage, the controller checks the state of the cache line and treats it accordingly: •...
Memory Cache Write Method Cache write memory operations can be implemented by using either a write-through method or a write-back method: • For each store operation, write-through caches initiate a write to external memory immediately upon the write to cache. If the cache line is replaced or explicitly flushed by software, the contents of the cache line are invalidated rather than written back to external memory.
when posted writes are to a slow external memory device. When returning from a high priority interrupt service routine to a low priority interrupt service routine or user mode, the core stalls until the write buffer has completed the necessary writes to return to a two-deep state....
Data Cache Control Instructions
The processor defines three data cache control instructions that are accessible in User and Supervisor modes. The instructions are PREFETCH, FLUSH, and FLUSHINV. Examples of each of these instructions can be found in Chapter 17, “Cache Control.” •...
Data Cache Invalidation
Besides the FLUSHINV instruction, explained in the previous section, two additional methods are available to invalidate the data cache when flushing is not required. The first technique directly invalidates Valid bits by setting the Invalid bit of each cache line to the invalid state. To implement this technique, additional MMRs (DTEST_COMMAND and DTEST_DATA[1:0]...
These figures describe the DTEST registers.
• Figure 6-13, “Data Test Command Register,” on page 6-40
• Figure 6-14, “Data Test Data 1 Register,” on page 6-41
• Figure 6-15, “Data Test Data 0 Register,” on page 6-42
Access to these registers is possible only in Supervisor or Emulation mode. When writing to registers, always write to the registers...
[Figure: Data Test Command Register (DTEST_COMMAND), address 0xFFE0 0300, reset = undefined. Recoverable fields: Access Way/Instruction Address Bit 11 (0 - access Way0/instruction bit 11 = 0; 1 - access Way1/instruction bit 11 = 1) and Subbank Access[1:0] (SRAM ADDR[13:12]; 00 - access subbank 0)...]
DTEST_DATA1 Register
Data Test Data registers (DTEST_DATA[1:0]) contain the 64-bit data to be written, or they contain the destination for the 64-bit data read. The Data Test Data 1 register (DTEST_DATA1) stores the upper 32 bits.
[Figure: Data Test Data 1 Register (DTEST_DATA1)...]
DTEST_DATA0 Register
The Data Test Data 0 register (DTEST_DATA0) stores the lower 32 bits of the 64-bit data to be written, or it contains the lower 32 bits of the destination for the 64-bit data read. The DTEST_DATA0 register is also used to access the tag arrays and contains the Valid and Dirty bits, which indicate...
On-chip Level 2 (L2) Memory
Some Blackfin processors provide additional low-latency and high-bandwidth SRAM on chip, called Level 2 (L2) memory. L2 memory runs at CCLK clock rate, but takes multiple CCLK cycles to access.
Simultaneous access to the multibanked, on-chip L2 memory architecture from the core(s) and system DMA can occur in parallel, provided that they access different banks....
On-chip Level 2 (L2) Memory Latency When cache is enabled, the bus between the core and L2 memory is fully pipelined for contiguous burst transfers. The cache line fill from on-chip memory behaves the same for instruction and data fetches. Operations that miss the cache trigger a cache line replacement.
part of the cache-line fill executes on the tenth cycle; the second instruction executes on the eleventh cycle, and the third instruction executes on the twelfth cycle, all of them in parallel with the cache line fill. Each cache line fill is aligned on a 32-byte boundary. When the requested instruction or data is not 32-byte aligned, the requested item is always loaded in the first read;...
[Timing diagram: 64-bit instruction fetches from L2 memory pass through the instruction alignment unit. Instructions A, B, C, and D are ready to execute at T+9 and execute at T+10, T+11, T+12, and T+13; instruction E executes at T+18.]
Figure 6-17....
IMEM_CONTROL) and L1 Data Memory Control (DMEM_CONTROL) registers, respectively. These registers are shown in Figure 6-2 on page 6-7 and Figure 6-9 on page 6-25, respectively.
Each CPLB entry consists of a pair of 32-bit values. For instruction fetches: •...
Memory Protection and Properties Memory Pages The 4G byte address space of the processor can be divided into smaller ranges of memory or I/O referred to as memory pages. Every address within a page shares the attributes defined for that page. The architecture supports four different page sizes: •...
Memory • If cacheable: write-through/write-back Data writes propagate directly to memory or are deferred until the cache line is reallocated. If write-through, allocate on read only, or read and write. • Dirty/modified The data in this page in memory has changed since the CPLB was last loaded.
Page Descriptor Table
For memory accesses to utilize the cache when CPLBs are enabled for instruction access, data access, or both, a valid CPLB entry must be available in an MMR pair. The MMR storage locations for CPLB entries are limited to 16 descriptors for instruction fetches and 16 descriptors for data load and store operations....
“Exceptions” on page 4-47 for more information). The handler is typically part of the operating system (OS) kernel that implements the CPLB replacement policy. Before CPLBs are enabled, valid CPLB descriptors must be in place for both the Page Descriptor Table and the MMU exception handler....
A single instruction may generate an instruction fetch as well as one or two data accesses. It is possible that more than one of these memory operations references data for which there is no valid CPLB descriptor in an MMR pair....
Memory from inadvertent modification by a running User mode application. This protection can be achieved by defining CPLB descriptors for protected memory ranges that allow write access only when in Supervisor mode. If a write to a protected memory region is attempted while in User mode, an exception is generated before the memory is modified.
Examples of Protected Memory Regions
In Figure 6-18, a starting point is provided for basic CPLB allocation for Instruction and Data CPLBs. Note that some ICPLBs and DCPLBs have common descriptors for the same address space.
[Figure 6-18 labels: Instruction CPLB setup; SDRAM: cacheable; L1 Instruction: SRAM; eight 4MB pages...]
ICPLB_DATAx Registers
Figure 6-19 describes the ICPLB Data registers (ICPLB_DATAx).
To ensure proper behavior and future compatibility, all reserved bits in this register must be set to 0 whenever this register is written.
[Figure: ICPLB Data Registers (ICPLB_DATAx). For memory-mapped addresses, see...]
DCPLB_DATAx Registers
Figure 6-20 shows the DCPLB Data registers (DCPLB_DATAx).
To ensure proper behavior and future compatibility, all reserved bits in this register must be set to 0 whenever this register is written.
[Figure: DCPLB Data Registers (DCPLB_DATAx). Reset = 0x0000 0000. For memory-mapped...]
Bits FAULT_DAG, FAULT_USERSUPV, and FAULT_RW in the DCPLB Status register (DCPLB_STATUS) are used to identify the CPLB entry that has triggered the CPLB-related exception (see Figure 6-23).
[Figure: DCPLB Status Register (DCPLB_STATUS), address 0xFFE0 0008, reset = undefined. Fields include FAULT_ILLADDR...]
Memory Memory Transaction Model Both internal and external memory locations are accessed in little endian byte order. Figure 6-27 shows a data word stored in register and in memory at address location addr. B0 refers to the least significant byte of the 32-bit word.
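To make the little endian layout concrete, the following Python sketch stores a 32-bit word into a byte array and reads it back. The function names and the use of a `bytearray` as a stand-in for memory are illustrative assumptions; the point is only that byte B0, the least significant byte, lands at address addr.

```python
import struct

def store_word_le(memory, addr, word):
    """Store a 32-bit word at addr in little endian byte order."""
    memory[addr:addr + 4] = struct.pack("<I", word & 0xFFFFFFFF)

def load_word_le(memory, addr):
    """Load a 32-bit little endian word from addr."""
    return struct.unpack_from("<I", memory, addr)[0]
```

After `store_word_le(mem, 4, 0x11223344)`, `mem[4]` holds 0x44 (B0) and `mem[7]` holds 0x11 (the most significant byte).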
Load/Store Operation
The Blackfin processor architecture supports the RISC concept of a Load/Store machine. This machine is the characteristic in RISC architectures whereby memory operations (loads and stores) are intentionally separated from the arithmetic functions that use the targets of the memory operations....
the memory-read operation to complete. If the instruction immediately following the load uses the same register, it simply stalls until the value is returned. Consequently, it operates as the programmer expects. However, if four other instructions are placed after the load but before the instruction that uses the same register, all of them execute, and the overall throughput of the processor is improved....
This ordering provides significant performance advantages in the operation of most memory instructions. However, it can cause side effects that the programmer must be aware of to avoid improper system operation. When writing to or reading from nonmemory locations such as off-chip I/O device registers, the order in which read and write operations complete is often significant....
In the preceding example code, the CSYNC instruction ensures:
• The conditional branch (IF CC JUMP away_from_here) is resolved, forcing stalls into the execution pipeline until the condition is resolved and any entries in the processor store buffer have been flushed....
If the branch is taken, then the load is flushed from the pipeline, and any results that are in the process of being returned can be ignored. Conversely, if the branch is not taken, the memory will have returned the correct value earlier than if the operation were stalled until the branch condition was resolved....
Memory devices, such as peripheral data FIFOs, reads are destructive. Each time the device is read, the FIFO advances, and the data cannot be recovered and re-read. When accessing off-chip memory-mapped devices that have state dependencies on the number of read operations on a given address location, disable interrupts before performing the load operation.
Atomic Operations
The processor provides a single atomic operation: TESTSET. Atomic operations are used to provide noninterruptible memory operations in support of semaphores between tasks. The TESTSET instruction loads an indirectly addressed memory half word, tests whether the low byte is zero, and then sets the most significant bit (MSB) of the low memory byte without affecting any other bits....
System MMRs connect to the Peripheral Access Bus (PAB), which is implemented as either a 16-bit or a 32-bit wide bus on specific derivatives. The PAB bus operates at SCLK rate. Writes to system MMRs do not go through write buffers nor through store buffers. Rather, there is a simple bridge between the RAB and the PAB bus that translates between clock domains (and bus width) only....
Listing 6-1. Core MMR Programming
CLI R0; /* stop interrupts and save IMASK */
P0 = MMR_BASE; /* 32-bit instruction to load base of MMRs */
R1 = [P0 + TIMER_CONTROL_REG]; /* get value of control reg */
BITSET R1, #N; /* set bit N */
[P0 + TIMER_CONTROL_REG] = R1;...
dirty or modified. A state bit, stored along with the tag, indicating whether the data in the data cache line has been changed since it was copied from the source memory and, therefore, needs to be updated in that source memory....
Terminology set associative. Cache architecture that limits line placement to a number of sets (or Ways). tag. Upper address bits, stored along with the cached data line, to identify the specific address source in memory that the cached line represents. valid.
7 PROGRAM FLOW CONTROL Instruction Summary • “Jump” on page 7-2 • “IF CC JUMP” on page 7-5 • “Call” on page 7-8 • “RTS, RTI, RTX, RTN, RTE (Return)” on page 7-10 • “LSETUP, LOOP” on page 7-13 Instruction Overview This chapter discusses the instructions that control program flow.
Instruction Overview Jump General Form JUMP (destination_indirect) JUMP (PC + offset) JUMP offset JUMP.S offset JUMP.L offset Syntax JUMP ( Preg ) ; /* indirect to an absolute (not PC-relative) address (a) */ JUMP ( PC + Preg ) ; /* PC-relative, indexed (a) */ JUMP pcrel25m2 ;...
pcrel25m2: 25-bit signed, even relative offset, with a range of –16,777,216 through 16,777,214 bytes (0xFF00 0000 to 0x00FF FFFE)
user_label: valid assembler address label, resolved by the assembler/linker to a valid PC-relative offset
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length....
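The pcrel25m2 range quoted above follows from a signed field of the stated width restricted to even values. The Python sketch below checks an offset against such a field and shows the 32-bit two's complement form of the limits. Both function names are hypothetical helpers, not assembler features.

```python
def fits_pcrel_m2(offset, bits):
    """True if offset is even and fits a signed field of the given width."""
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 2   # largest even value in range
    return offset % 2 == 0 and lowest <= offset <= highest

def to_u32(value):
    """32-bit two's complement representation of a signed value."""
    return value & 0xFFFFFFFF
```

For a 25-bit field, the limits are –16,777,216 and 16,777,214, and `to_u32` maps them to 0xFF00 0000 and 0x00FF FFFE, in agreement with the values listed in the syntax terminology.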
Example
jump get_new_sample ; /* assembler resolved target, abstract offsets */
jump (p5) ; /* P5 contains the absolute address of the target */
jump (pc + p2) ; /* P2 relative absolute address of the target and then a presentation of the absolute values for target */
jump 0x224 ;...
Program Flow Control IF CC JUMP General Form IF CC JUMP destination IF !CC JUMP destination Syntax IF CC JUMP pcrel11m2 ; /* branch if CC=1, branch predicted as not taken (a) */ IF CC JUMP pcrel11m2 (bp) ; /* branch if CC=1, branch predicted as taken (a) */ IF !CC JUMP pcrel11m2 ;...
Syntax Terminology
pcrel11m2: 11-bit signed even relative offset, with a range of –1024 through 1022 bytes (0xFC00 to 0x03FE). This value can optionally be replaced with an address label that is evaluated and replaced during linking.
user_label: valid assembler address label, resolved by the assembler/linker to a valid PC-relative offset
Instruction Length...
Program Flow Control Required Mode User & Supervisor Parallel Issue This instruction cannot be issued in parallel with other instructions. Example if cc jump 0xFFFFFE08 (bp) ; /* offset is negative in 11 bits, so target address is a backwards branch, branch predicted */ if cc jump 0x0B4 ;...
Functional Description
The CALL instruction calls a subroutine from an address that a P-register points to or by using a PC-relative offset. After the CALL instruction executes, the RETS register contains the address of the next instruction.
The value in the Preg must be an even value to maintain 16-bit alignment....
Instruction Overview RTS, RTI, RTX, RTN, RTE (Return) General Form RTS, RTI, RTX, RTN, RTE Syntax RTS ; // Return from Subroutine (a) RTI ; // Return from Interrupt (a) RTX ; // Return from Exception (a) RTN ; // Return from NMI (a) RTE ;...
Program Flow Control Table 7-1. Types of Return Instruction Mnemonic Description Forces a return from a subroutine by loading the value of the RETS Register into the Program Counter (PC), causing the processor to fetch the next instruction from the address contained in RETS. For nested subroutines, you must save the value of the RETS Register.
Example
rts ;
rti ;
rtx ;
rtn ;
rte ;
Also See
Call, --SP (Push), SP++ (Pop)
Special Applications
None
Program Flow Control LSETUP, LOOP General Form There are two forms of this instruction. The first is: LOOP loop_name loop_counter LOOP_BEGIN loop_name LOOP_END loop_name The second form is: LSETUP (Begin_Loop, End_Loop)Loop_Counter Syntax For Loop0 LOOP loop_name LC0 ; /* (b) */ LOOP loop_name LC0 = Preg ;...
Program Flow Control Functional Description The Zero-Overhead Loop Setup instruction provides a flexible, counter-based, hardware loop mechanism that provides efficient, zero-overhead software loops. In this context, zero-overhead means that the software in the loops does not incur a performance or code size penalty by decrementing a counter, evaluating a loop condition, then calculating and branching to a new target address.
The LOOP, LOOP_BEGIN, LOOP_END syntax is generally more readable and user friendly. The LSETUP syntax contains the same information, but in a more compact form.
If LCx is nonzero when the fetch address equals LBx, the processor decrements LCx and places the address in LTx into the PC....
Begin_Loop, the value loaded into LTx, is a 5-bit, PC-relative, even offset from the current instruction to the first instruction in the loop. The user is required to preserve half-word alignment by maintaining even values in this register....
As long as the hardware loop is active (Loop_Count is nonzero), any of these forbidden instructions at the End_Loop address produces undefined execution, and no exception is generated. Forbidden End_Loop instructions that appear anywhere else in the defined loop execute normally....
Program Flow Control loop DoItSome LC0 ; /* define loop ‘DoItSome’ with Loop Counter 0 */ loop_begin DoItSome ; /* place before the first instruction in the loop */ loop_end DoItSome ; /* place after the last instruction in the loop */ loop MyLoop LC1 ;...
8 LOAD / STORE Instruction Summary • “Load Immediate” on page 8-3 • “Load Pointer Register” on page 8-7 • “Load Data Register” on page 8-10 • “Load Half-Word – Zero-Extended” on page 8-15 • “Load Half-Word – Sign-Extended” on page 8-19 •...
Instruction Overview
This chapter discusses the load/store instructions. Users can take advantage of these instructions to load and store immediate values, pointer registers, data registers or data register halves, and half words (zero or sign extended).
Load / Store Load Immediate General Form register = constant A1 = A0 = 0 Syntax Half-Word Load reg_lo = uimm16 /* 16-bit value into low-half data or address register (b) */ reg_hi = uimm16 ; /* 16-bit value into high-half data or address register (b) */ Zero Extended reg = uimm16 (Z) ;...
reg_lo: R7–0.L, P5–0.L, SP.L, FP.L, I3–0.L, M3–0.L, B3–0.L, L3–0.L
reg_hi: R7–0.H, P5–0.H, SP.H, FP.H, I3–0.H, M3–0.H, B3–0.H, L3–0.H
reg: R7–0, P5–0, I3–0, M3–0, B3–0, L3–0
imm7: 7-bit signed field, with a range of –64 through 63
imm16: 16-bit signed field, with a range of –32,768 through 32,767 (0x8000 through 0x7FFF)
uimm16: 16-bit unsigned field, with a range of 0 through 65,535 (0x0000...
Loading a 32-bit value into a register using Load Immediate requires two separate instructions, one for the high and one for the low half. For example, to load the address “foo” into register P3, write:
p3.h = foo ;
p3.l = foo ;...
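The effect of the two half-register loads can be illustrated with a short Python model that splits a 32-bit value into the halves each instruction would supply and then recombines them. The function names and the sample address used in the usage note are hypothetical.

```python
def split_halves(value):
    """Return (high_half, low_half) of a 32-bit value."""
    value &= 0xFFFFFFFF
    return (value >> 16) & 0xFFFF, value & 0xFFFF

def combine_halves(high_half, low_half):
    """Rebuild the 32-bit register value from its two 16-bit halves."""
    return ((high_half & 0xFFFF) << 16) | (low_half & 0xFFFF)
```

For an assumed address of 0xFFA0 1234, the high-half load supplies 0xFFA0 and the low-half load supplies 0x1234; only after both have executed does the register hold the full address.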
a0 = 0 ;
a1 = 0 ;
a1 = a0 = 0 ;
Also See
Load Pointer Register
Special Applications
Use the Load Immediate instruction to initialize registers.
Functional Description
The Load Pointer Register instruction loads a 32-bit P-register with a 32-bit word from an address specified by a P-register.
The indirect address and offset must yield an even multiple of 4 to maintain 4-byte word address alignment. Failure to maintain proper alignment causes a misaligned memory access exception....
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The 16-bit versions of this instruction can be issued in parallel with specific other instructions. For more information, see “Issuing Parallel Instructions” on page 20-1.
The 32-bit versions of this instruction cannot be issued in parallel with other instructions....
uimm6m4: 6-bit unsigned field that must be a multiple of 4, with a range of 0 through 60 bytes
uimm7m4: 7-bit unsigned field that must be a multiple of 4, with a range of 4 through 128 bytes
uimm17m4: 17-bit unsigned field that must be a multiple of 4, with a range of 0 through 131,068 bytes (0x0000 0000 through 0x0001 FFFC)
Instruction Overview Options The Load Data Register instruction supports the following options. • Post-increment the source pointer by 4 bytes to maintain word alignment. • Post-decrement the source pointer by 4 bytes to maintain word alignment. • Offset the source pointer with a small (6-bit), word-aligned (multi- ple of 4), unsigned constant.
where:
• Dest is the destination register (Dreg in the syntax example).
• Src_1 is the first source register on the right-hand side of the equation.
• Src_2 is the second source register.
Indirect and post-increment index addressing supports customized indirect address cadence....
Load / Store Load Half-Word – Zero-Extended General Form D-register = W [ indirect_address ] (Z) Syntax Dreg = W [ Preg ] (Z) ; /* indirect (a)*/ Dreg = W [ Preg ++ ] (Z) ; /* indirect, post-increment (a)*/ Dreg = W [ Preg -- ] (Z) ;...
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length. Comment (b) identifies 32-bit instruction length.
Functional Description
The Load Half-Word – Zero-Extended instruction loads 16 bits from a memory location into the lower half of a 32-bit data register. The instruction zero-extends the upper half of the register....
Load / Store Indirect and Post-Increment Index Addressing The syntax of the form: Dest = W [ Src_1 ++ Src_2 ] is indirect, post-increment index addressing. The form is shorthand for the following sequence. Dest = [Src_1] ; /* load the 32-bit destination, indirect*/ Src_1 += Src_2 ;...
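The two-step equivalence described above (load through the pointer, then add the index register to the pointer) can be modeled in Python. The function name, the dictionary of registers, and the `bytearray` used as memory are hypothetical stand-ins; the load shown is the zero-extended half-word form, and alignment checking is omitted for brevity.

```python
def load_halfword_postinc_index(memory, regs, dest, src_1, src_2):
    """Model Dest = W [ Src_1 ++ Src_2 ] (Z) as a load followed by an add."""
    addr = regs[src_1]
    # Step 1: indirect 16-bit little endian load, zero-extended into dest.
    regs[dest] = memory[addr] | (memory[addr + 1] << 8)
    # Step 2: post-increment the pointer by the index register value.
    regs[src_1] = (addr + regs[src_2]) & 0xFFFFFFFF
```

Calling the function repeatedly with an index register of 4 walks through memory with a stride of 4 bytes, which is the kind of customized address cadence this addressing mode is intended for.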
Parallel Issue
The 16-bit versions of this instruction can be issued in parallel with specific other instructions. For more information, see “Issuing Parallel Instructions” on page 20-1.
The 32-bit versions of this instruction cannot be issued in parallel with other instructions....
Load / Store Load Half-Word – Sign-Extended General Form D-register = W [ indirect_address ] (X) Syntax Dreg = W [ Preg ] (X) ; // indirect (a) Dreg = W [ Preg ++ ] (X) ; // indirect, post-increment (a) Dreg = W [ Preg -- ] (X) ;...
Instruction Overview Instruction Length In the syntax, comment (a) identifies 16-bit instruction length. Comment (b) identifies 32-bit instruction length. Functional Description The Load Half-Word – Sign-Extended instruction loads 16 bits sign-extended from a memory location into a 32-bit data register. The Pointer register is a P-register.
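The difference between sign extension and zero extension of a 16-bit value can be shown with two small Python helpers. The function names are hypothetical; results are expressed as unsigned 32-bit register images.

```python
def sign_extend16(half):
    """Sign-extend a 16-bit value to a 32-bit register image."""
    half &= 0xFFFF
    if half & 0x8000:               # bit 15 set: value is negative
        return half | 0xFFFF0000    # replicate the sign bit into bits 31:16
    return half

def zero_extend16(half):
    """Zero-extend a 16-bit value to a 32-bit register image."""
    return half & 0xFFFF
```

The half word 0x8001 becomes 0xFFFF 8001 with the (X) form and 0x0000 8001 with the (Z) form, while a half word with bit 15 clear, such as 0x7FFF, gives the same result either way.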
Load / Store Indirect and Post-Increment Index Addressing The syntax of the form: Dest = W [ Src_1 ++ Src_2 ] (X) is indirect, post-increment index addressing. The form is shorthand for the following sequence. Dest = [Src_1] ; /* load the 32-bit destination, indirect*/ Src_1 += Src_2 ;...
Parallel Issue
The 16-bit versions of this instruction can be issued in parallel with specific other instructions. For more information, see “Issuing Parallel Instructions” on page 20-1.
The 32-bit versions of this instruction cannot be issued in parallel with other instructions....
Load / Store Load High Data Register Half General Form Dreg_hi = W [ indirect_address ] Syntax Dreg_hi = W [ Ireg ] ; /* indirect data addressing (a)*/ Dreg_hi = W [ Ireg ++ ] ; /* indirect, post-increment data addressing (a) */ Dreg_hi = W [ Ireg -- ] ;...
The indirect address must be even to maintain 2-byte half-word address alignment. Failure to maintain proper alignment causes a misaligned memory access exception.
The instruction versions that explicitly modify Ireg support optional circular buffering. See “Automatic Circular Addressing” on page 1-21 for more details....
Load / Store Indirect and Post-Increment Index Addressing Dst_hi = [ Src_1 ++ Src_2 ] is indirect, post-increment index addressing. The form is shorthand for the following sequence. Dst_hi = [Src_1] ; /* load the half-word into the upper half of the destination register, indirect*/ Src_1 += Src_2 ;...
Instruction Overview Parallel Issue This instruction can be issued in parallel with specific other instructions. For more information, see “Issuing Parallel Instructions” on page 20-1. Example r3.h = w [ i1 ] ; r7.h = w [ i3 ++ ] ; r1.h = w [ i0 -- ] ;...
Load / Store Load Low Data Register Half General Form Dreg_lo = W [ indirect_address ] Syntax Dreg_lo = W [ Ireg ] ; /* indirect data addressing (a)*/ Dreg_lo = W [ Ireg ++ ] ; /* indirect, post-increment data addressing (a) */ Dreg_lo = W [ Ireg -- ] ;...
The indirect address must be even to maintain 2-byte half-word address alignment. Failure to maintain proper alignment causes a misaligned memory access exception. The instruction versions that explicitly modify Ireg support optional circular buffering. See “Automatic Circular Addressing” on page 1-21 for more details.
Indirect and Post-Increment Index Addressing
The syntax of the form:
Dst_lo = [ Src_1 ++ Src_2 ]
is indirect, post-increment index addressing. The form is shorthand for the following sequence.
Dst_lo = [Src_1] ; /* load the half-word into the lower half of the destination register, indirect */
Src_1 += Src_2 ;...
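The two-step shorthand for the post-increment index form can be modeled in C. This is a minimal sketch, not Blackfin code: the names `regs_t` and `load_lo_postinc_index` and the byte-array memory are hypothetical, little-endian byte order is assumed, and the model assumes the load leaves the upper half of the destination untouched.

```c
#include <stdint.h>

/* Hypothetical model state: one data register and two pointer registers. */
typedef struct {
    uint32_t dst;    /* destination D-register */
    uint32_t src_1;  /* address pointer (byte offset into mem) */
    uint32_t src_2;  /* index added to src_1 after the load */
} regs_t;

/* Model of Dst_lo = W[Src_1 ++ Src_2]:
   step 1 loads 16 bits into the low half (high half assumed unchanged),
   step 2 post-increments the pointer by the index register. */
static void load_lo_postinc_index(regs_t *r, const uint8_t *mem)
{
    uint16_t half = (uint16_t)(mem[r->src_1] | (mem[r->src_1 + 1] << 8));
    r->dst = (r->dst & 0xFFFF0000u) | half;  /* Dst_lo = W[Src_1] */
    r->src_1 += r->src_2;                    /* Src_1 += Src_2    */
}
```

Because the increment comes from a second register rather than a fixed step of 2, repeated calls walk memory with whatever stride `src_2` holds.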
Instruction Overview Instruction Length In the syntax, comment (a) identifies 16-bit instruction length. Parallel Issue This instruction can be issued in parallel with specific other instructions. For more information, see “Issuing Parallel Instructions” on page 20-1. Example r3.l = w[ i1 ] ; r7.l = w[ i3 ++ ] ;...
Load / Store Load Byte – Zero-Extended General Form D-register = B [ indirect_address ] (Z) Syntax Dreg = B [ Preg ] (Z) ; /* indirect (a)*/ Dreg = B [ Preg ++ ] (Z) ; /* indirect, post-increment (a)*/ Dreg = B [ Preg -- ] (Z) ;...
Instruction Overview Options The Load Byte – Zero-Extended instruction supports the following options. • Post-increment the source pointer by 1 byte. • Post-decrement the source pointer by 1 byte. • Offset the source pointer with a 16-bit signed constant. Flags Affected None Required Mode User &...
Load / Store Example r3 = b [ p0 ] (z) ; r7 = b [ p1 ++ ] (z) ; r2 = b [ sp -- ] (z) ; r0 = b [ p4 + 0xFFFF800F ] (z) ; Also See Load Byte –...
Instruction Overview Load Byte – Sign-Extended General Form D-register = B [ indirect_address ] (X) Syntax Dreg = B [ Preg ] (X) ; /* indirect (a)*/ Dreg = B [ Preg ++ ] (X) ; /* indirect, post-increment (a)*/ Dreg = B [ Preg -- ] (X) ;...
Load / Store Options The Load Byte – Sign-Extended instruction supports the following options. • Post-increment the source pointer by 1 byte. • Post-decrement the source pointer by 1 byte. • Offset the source pointer with a 16-bit signed constant. Flags Affected None Required Mode...
Instruction Overview Example r3 = b [ p0 ] (x) ; r7 = b [ p1 ++ ](x) ; r2 = b [ sp -- ] (x) ; r0 = b [ p4 + 0xFFFF800F ](x) ; Also See Load Byte – Zero-Extended Special Applications None 8-36...
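The difference between the (Z) and (X) byte loads is only in how bits 31 through 8 of the destination are filled. A minimal C sketch, assuming a byte-array memory and the hypothetical helper names `load_byte_z` and `load_byte_x`:

```c
#include <stdint.h>

/* Model of Dreg = B[Preg] (Z): the upper 24 bits are cleared. */
static uint32_t load_byte_z(const uint8_t *mem, uint32_t addr)
{
    return (uint32_t)mem[addr];
}

/* Model of Dreg = B[Preg] (X): bit 7 of the byte is copied into bits 31..8. */
static uint32_t load_byte_x(const uint8_t *mem, uint32_t addr)
{
    uint32_t v = mem[addr];
    return (v & 0x80u) ? (v | 0xFFFFFF00u) : v;
}
```

For a byte below 0x80 both forms give the same register value; they differ only when bit 7 is set.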
Functional Description
The Store Pointer Register instruction stores the contents of a 32-bit P-register to a 32-bit memory location. The Pointer register is a P-register. The indirect address and offset must yield a multiple of 4 to maintain 4-byte word address alignment.
Required Mode
User & Supervisor
Parallel Issue
The 16-bit versions of this instruction can be issued in parallel with specific other instructions. For more information, see “Issuing Parallel Instructions” on page 20-1. The 32-bit versions of this instruction cannot be issued in parallel with other instructions.
Mreg: M3–0
uimm6m4: 6-bit unsigned field that must be a multiple of 4, with a range of 0 through 60 bytes
uimm7m4: 7-bit unsigned field that must be a multiple of 4, with a range of 4 through 128 bytes
: 17-bit unsigned field that must be a multiple of 4, with a range...
Example: If you use an Ireg to increment your address pointer, first clear the corresponding Lreg to disable circular buffering. Failure to explicitly clear Lreg beforehand can result in unexpected Ireg values. The circular address buffer registers (Index, Length, and Base) are not initialized automatically by Reset. Traditionally, user software clears all the circular address buffer registers during boot-up to disable circular buffering, then initializes them later, if needed.
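A small C model shows why a stale Length register produces unexpected Index values. This is a sketch under stated assumptions: the wrap rule used here (wrap by the length when the updated index leaves the range from base to base + length) is an assumption about the behavior documented under “Automatic Circular Addressing”, and the names `dag_t` and `post_modify` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of one Index/Length/Base/Modify register set. */
typedef struct {
    uint32_t i;  /* Index (current address) */
    uint32_t l;  /* Length in bytes; 0 disables circular buffering */
    uint32_t b;  /* Base address of the buffer */
    int32_t  m;  /* Modify value added on each access */
} dag_t;

/* Post-modify the index; wrap only when the length is nonzero (assumed rule). */
static void post_modify(dag_t *d)
{
    int64_t next = (int64_t)d->i + d->m;
    if (d->l != 0) {
        if (next >= (int64_t)d->b + d->l)
            next -= d->l;          /* ran past the end: wrap back */
        else if (next < (int64_t)d->b)
            next += d->l;          /* ran before the base: wrap forward */
    }
    d->i = (uint32_t)next;
}
```

With `l` cleared the index advances linearly; with a leftover nonzero `l` the same update silently wraps, which is the failure the text warns about.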
Indirect and Post-Increment Index Addressing
The syntax of the form:
[Dst_1 ++ Dst_2] = Src
is indirect, post-increment index addressing. The form is shorthand for the following sequence.
[Dst_1] = Src ; /* store the 32-bit source, indirect */
Dst_1 += Dst_2 ;...
Parallel Issue
The 16-bit versions of this instruction can be issued in parallel with specific other instructions. For more information, see “Issuing Parallel Instructions” on page 20-1. The 32-bit versions of this instruction cannot be issued in parallel with other instructions.
Load / Store Store High Data Register Half General Form W [ indirect_address ] = Dreg_hi Syntax W [ Ireg ] = Dreg_hi ; /* indirect data addressing (a)*/ W [ Ireg ++ ] = Dreg_hi ; /* indirect, post-increment data addressing (a) */ W [ Ireg -- ] = Dreg_hi ;...
The indirect address and offset must yield an even number to maintain 2-byte half-word address alignment. Failure to maintain proper alignment causes a misaligned memory access exception. The instruction versions that explicitly modify Ireg support optional circular buffering. See “Automatic Circular Addressing”...
Indirect and Post-Increment Index Addressing
The syntax of the form:
[Dst_1 ++ Dst_2] = Src_hi
is indirect, post-increment index addressing. The form is shorthand for the following sequence.
[Dst_1] = Src_hi ; /* store the upper half of the source register, indirect */
Dst_1 += Dst_2 ;...
Instruction Overview Parallel Issue This instruction can be issued in parallel with specific other instructions. For more information, see “Issuing Parallel Instructions” on page 20-1. Example w[ i1 ] = r3.h ; w[ i3 ++ ] = r7.h ; w[ i0 -- ] = r1.h ; w[ p4 ] = r2.h ;...
Load / Store Store Low Data Register Half General Form W [ indirect_address ] = Dreg_lo W [ indirect_address ] = D-register Syntax W [ Ireg ] = Dreg_lo ; /* indirect data addressing (a)*/ W [ Ireg ++ ] = Dreg_lo ; /* indirect, post-increment data addressing (a) */ W [ Ireg -- ] = Dreg_lo ;...
Dreg: R7–0
uimm5m2: 5-bit unsigned field that must be a multiple of 2, with a range of 0 through 30 bytes
uimm16m2: 16-bit unsigned field that must be a multiple of 2, with a range of 0 through 65,534 bytes (0x0000 through 0xFFFE)
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Options
The Store Low Data Register Half instruction supports the following options.
• Post-increment the destination pointer by 2 bytes.
• Post-decrement the destination pointer by 2 bytes.
• Offset the destination pointer with a small (5-bit), half-word-aligned (even), unsigned constant.
Indirect and post-increment index addressing supports customized indirect address cadence. The indirect, post-increment index version must have separate P-registers for the input operands. If a common Preg is used for the inputs, the auto-increment feature does not work.
Flags Affected
None
Required Mode...
Load / Store Also See Store High Data Register Half, Store Data Register Special Applications To write consecutive, aligned 16-bit values for high-performance DSP operations, use the Store Data Register instructions instead of these Half-Word instructions. The Half-Word Store instructions use only half the available 32-bit data bus bandwidth, possibly imposing a bottleneck constriction in the data flow rate.
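The bandwidth argument can be illustrated in C: two aligned 16-bit values are packed into one 32-bit word, so n half-words need only n/2 stores. This is a sketch, assuming little-endian layout (the first sample lands in the low half) and the hypothetical helper names `pack_halves` and `store_pairs`.

```c
#include <stdint.h>

/* Combine two 16-bit values into one 32-bit word (lo at the lower address). */
static uint32_t pack_halves(uint16_t lo, uint16_t hi)
{
    return (uint32_t)lo | ((uint32_t)hi << 16);
}

/* Store n 16-bit samples (n even) using n/2 full-width 32-bit stores. */
static void store_pairs(uint32_t *dst, const uint16_t *src, unsigned n)
{
    for (unsigned k = 0; k + 1 < n; k += 2)
        dst[k / 2] = pack_halves(src[k], src[k + 1]);
}
```

Each loop iteration issues one word store where a half-word version would issue two.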
Instruction Overview Store Byte General Form B [ indirect_address ] = D-register Syntax B [ Preg ] = Dreg ; /* indirect (a)*/ B [ Preg ++ ] = Dreg ; /* indirect, post-increment (a)*/ B [ Preg -- ] = Dreg ; /* indirect, post-decrement (a)*/ B [ Preg + uimm15 ] = Dreg ;...
Load / Store Options The Store Byte instruction supports the following options. • Post-increment the destination pointer by 1 byte to maintain byte alignment. • Post-decrement the destination pointer by 1 byte to maintain byte alignment. • Offset the destination pointer with a 16-bit signed constant. Flags Affected None Required Mode...
Instruction Overview Also See None Special Applications To write consecutive, 8-bit values for high-performance DSP operations, use the Store Data Register instructions instead of these byte instructions. The byte store instructions use only one fourth the available 32-bit data bus bandwidth, possibly imposing a bottleneck constriction in the data flow rate.
9 MOVE
Instruction Summary
• “Move Register” on page 9-2
• “Move Conditional” on page 9-8
• “Move Half to Full Word – Zero-Extended” on page 9-10
• “Move Half to Full Word – Sign-Extended” on page 9-13
• “Move Register Half” on page 9-15
•...
Move Syntax Terminology genreg R7–0 P5–0 A0.X A0.W A1.X A1.W dagreg I3–0 M3–0 B3–0 L3–0 sysreg ASTAT SEQSTAT SYSCFG RETI RETX RETN RETE RETS , and CYCLES CYCLES2 EMUDAT USP: The User Stack Pointer Register Dreg R7–0 Preg P5–0 Dreg_even Dreg_odd When combining two moves in the same instruction, the operands must be members of the same...
Instruction Overview All moves from 40-bit Accumulators to 32-bit D-registers support saturation. Options The Accumulator to Data Register Move instruction supports the options listed in the table below. Table 9-1. Accumulator to Data Register Move Option Accumulator Copy Formatting Default Signed fraction.
Table 9-1. Accumulator to Data Register Move (Cont’d)
Option: (S2RND)
Accumulator Copy Formatting: Signed fraction with scaling. Shift the Accumulator contents one place to the left (multiply x 2). Saturate result to 1.31 format. Copy to destination register. Results range between minimum -1 and maximum 1-2^-31.
Signed integer with scaling....
Instruction Overview • is set if result is zero; cleared if nonzero. In the case of two simultaneous operations, represents the logical “OR” of the two. • is set if result is negative; cleared if non-negative. In the case of two simultaneous operations, represents the logical “OR”...
Move a1 = a0 ; a0 = r7 ; /* move R7 to 32-bit A0.W */ a1 = r3 ; /* move R3 to 32-bit A1.W */ retn = p0 ; /* must be in Supervisor mode */ r2 = a0 ; /* 32-bit move with saturation */ r7 = a1 ;...
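Saturation on a 40-bit to 32-bit move can be sketched in C. This is an illustrative model only: the accumulator is represented as a sign-extended 64-bit integer, the function names are hypothetical, and the plain move is assumed to clamp to the signed 32-bit range. The (S2RND) variant follows the table description: shift left by one, then saturate.

```c
#include <stdint.h>

/* Clamp a sign-extended 40-bit accumulator value to the signed 32-bit range. */
static int32_t sat32(int64_t acc)
{
    if (acc > INT32_MAX) return INT32_MAX;
    if (acc < INT32_MIN) return INT32_MIN;
    return (int32_t)acc;
}

/* Model of Dreg = A0 with saturation (assumed default behavior). */
static int32_t acc_to_dreg(int64_t acc)
{
    return sat32(acc);
}

/* Model of Dreg = A0 (S2RND): multiply by 2, then saturate. */
static int32_t acc_to_dreg_s2rnd(int64_t acc)
{
    return sat32(acc * 2);
}
```

Values that fit in 32 bits pass through unchanged; anything using the accumulator's extension bits is clamped rather than truncated.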
Instruction Overview Move Conditional General Form IF CC dest_reg = src_reg IF ! CC dest_reg = src_reg Syntax IF CC DPreg = DPreg ; /* move if CC = 1 (a) */ IF ! CC DPreg = DPreg ; /* move if CC = 0 (a) */ Syntax Terminology DPreg R7–0...
Move Parallel Issue The Move Conditional instruction cannot be issued in parallel with other instructions. Example if cc r3 = r0 ; /* move if CC=1 */ if cc r2 = p4 ; if cc p0 = r7 ; if cc p2 = p5 ; if ! cc r3 = r0 ;...
Instruction Overview Move Half to Full Word – Zero-Extended General Form dest_reg = src_reg (Z) Syntax Dreg = Dreg_lo (Z) ; /* (a) */ Syntax Terminology Dreg R7–0 Dreg_lo R7–0.L Instruction Length In the syntax, comment (a) identifies 16-bit instruction length. Functional Description The Move Half to Full Word –...
Move Flags Affected The following flags are affected by the Move Half to Full Word – Zero-Extended instruction. • is set if result is zero; cleared if nonzero. • is cleared. • is cleared. • is cleared. • All other flags are unaffected. The ADSP-BF535 processor has fewer flags and some flags ASTAT...
Instruction Overview Also See Move Half to Full Word – Sign-Extended, Move Register Half Special Applications None 9-12 ADSP-BF53x/BF56x Blackfin Processor Programming Reference...
Move Move Half to Full Word – Sign-Extended General Form dest_reg = src_reg (X) Syntax Dreg = Dreg_lo (X) ; /* (a)*/ Syntax Terminology Dreg R7–0 Dreg_lo R7–0.L Instruction Length In the syntax, comment (a) identifies 16-bit instruction length. Functional Description The Move Half to Full Word –...
Instruction Overview • is cleared. • All other flags are unaffected. The ADSP-BF535 processor has fewer flags and some flags ASTAT operate differently than subsequent Blackfin family products. For more information on the ADSP-BF535 status flags, see Table A-1 on page A-3.
Move Move Register Half General Form dest_reg_half = src_reg_half dest_reg_half = accumulator (opt_mode) Syntax A0.X = Dreg_lo ; /* least significant 8 bits of Dreg into A0.X (b) */ A1.X = Dreg_lo ; /* least significant 8 bits of Dreg into A1.X (b) */ Dreg_lo = A0.X ;...
Accumulator to Half D-register Moves
Dreg_lo = A0 (opt_mode) ; /* move A0 to lower half of Dreg (b) */
Dreg_hi = A1 (opt_mode) ; /* move A1 to upper half of Dreg (b) */
Dreg_lo = A0, Dreg_hi = A1 (opt_mode) ; /* move both values at once;...
Functional Description
The Move Register Half instruction copies 16 bits from a source register into half of a 32-bit register. The instruction does not affect the unspecified half of the destination register. It supports only D-registers and the Accumulator....
The integer version of this instruction (the (IS) option) transfers the Accumulator result to the destination register according to the diagrams shown in Figure 9-2. Accumulator A0.L contents transfer to the lower half of the destination D-register. A1.L contents transfer to the upper half of the destination D-register....
Move Options The Accumulator to Half D-Register Move instructions support the copy options in Table 9-2. Table 9-2. Accumulator to Half D-Register Move Options Option Accumulator Copy Formatting Default Signed fraction format. Round Accumulator 9.31 format value at bit 16. (RND_MOD bit in the ASTAT register controls the rounding.) Saturate the result to 1.15 precision and copy it to the destination register half.
Instruction Overview Table 9-2. Accumulator to Half D-Register Move Options (Cont’d) Option Accumulator Copy Formatting (S2RND) Signed fraction with scaling and rounding. Shift the Accumulator contents one place to the left (multiply x 2). Round Accumulator 9.31 format value at bit 16. (RND_MOD bit in the ASTAT register controls the rounding.) Saturate the result to 1.15 precision and copy it to the destination register half.
Move • is set if result is zero; cleared if nonzero. • is set if result is negative; cleared if non-negative. • All other flags are unaffected. Flags are not affected by other versions of this instruction. The ADSP-BF535 processor has fewer flags and some flags ASTAT operate differently than subsequent Blackfin family products.
r3.l = a0, r3.h = a1 ; /* copy both half words; must go to the lower and upper halves of the same Dreg. */
r1.h = a1, r1.l = a0 ; /* copy both half words; must go to the upper and lower halves of the same Dreg....
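The default fractional copy (round at bit 16, saturate to 1.15) can be modeled in C. This is a sketch, not the hardware definition: the accumulator is a sign-extended 64-bit integer, the function names are hypothetical, and the `biased` parameter stands in for the RND_MOD bit on the assumption that one setting rounds ties upward and the other rounds ties to the nearest even value.

```c
#include <stdint.h>

/* Portable floor(v / 65536) for negative and non-negative v. */
static int64_t floor_div_65536(int64_t v)
{
    return (v >= 0) ? v / 65536 : -((-v + 65535) / 65536);
}

/* Round the accumulator at bit 16, then saturate to a signed 16-bit half. */
static int16_t acc_to_half(int64_t acc, int biased)
{
    int64_t r = floor_div_65536(acc + 0x8000);   /* add half an LSB, drop low 16 bits */
    uint64_t low = (uint64_t)acc & 0xFFFFu;      /* discarded fraction bits */
    if (!biased && low == 0x8000u && (r % 2 != 0))
        r -= 1;                                  /* exact tie: round to even */
    if (r > INT16_MAX) r = INT16_MAX;            /* saturate to 1.15 range */
    if (r < INT16_MIN) r = INT16_MIN;
    return (int16_t)r;
}
```

The two rounding modes differ only on exact ties, where the discarded 16 bits equal 0x8000.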
Move Move Byte – Zero-Extended General Form dest_reg = src_reg_byte (Z) Syntax Dreg = Dreg_byte (Z) ; /* (a)*/ Syntax Terminology , the low-order 8 bits of each Data Register Dreg_byte R7–0.B Instruction Length In the syntax, comment (a) identifies 16-bit instruction length. Functional Description The Move Byte –...
Instruction Overview • is cleared. • All other flags are unaffected. The ADSP-BF535 processor has fewer flags and some flags ASTAT operate differently than subsequent Blackfin family products. For more information on the ADSP-BF535 status flags, see Table A-1 on page A-3.
Move Move Byte – Sign-Extended General Form dest_reg = src_reg_byte (X) Syntax Dreg = Dreg_byte (X) ; /* (a) */ Syntax Terminology , the low-order 8 bits of each Data Register Dreg_byte R7–0.B Instruction Length In the syntax, comment (a) identifies 16-bit instruction length. Functional Description The Move Byte –...
Instruction Overview • is cleared. • All other flags are unaffected. The ADSP-BF535 processor has fewer flags and some flags ASTAT operate differently than subsequent Blackfin family products. For more information on the ADSP-BF535 status flags, see Table A-1 on page A-3.
10 STACK CONTROL
Instruction Summary
• “--SP (Push)” on page 10-2
• “--SP (Push Multiple)” on page 10-5
• “SP++ (Pop)” on page 10-8
• “SP++ (Pop Multiple)” on page 10-12
• “LINK, UNLINK” on page 10-17
Instruction Overview
This chapter discusses the instructions that control the stack. Users can take advantage of these instructions to save the contents of single or multiple registers to the stack or to control the stack frame space on the stack and the Frame Pointer (...
Stack Control higher memory [--sp]=p5 ; [--sp]=p1 ; <-------- SP [--sp]=r3 ; lower memory The Stack Pointer must already be 32-bit aligned to use this instruction. If an unaligned memory access occurs, an exception is generated and the instruction aborts. Push/pop on has no effect on the interrupt system.
The instruction pre-decrements the Stack Pointer to the next available location in the stack first. The stack grows down from high memory to low memory; therefore, the decrement operation is used for pushing, and the increment operation is used for popping values.
Stack Control the instruction can be restarted after the exception. Note that when a Push Multiple operation is aborted due to an exception, the memory state is changed by the stores that have already completed before the exception. The Stack Pointer must already be 32-bit aligned to use this instruction. If an unaligned memory access occurs, an exception is generated and the instruction aborts, as described above.
Instruction Overview SP++ (Pop) General Form dest_reg = [ SP ++ ] Syntax mostreg = [ SP ++ ] ; /* post-increment SP; does not apply to Data Registers and Pointer Registers (a) */ Dreg = [ SP ++ ] ; /* Load Data Register instruction (repeated here for user convenience) (a) */ Preg = [ SP ++ ] ;...
The stack grows down from high memory to low memory; therefore, the decrement operation is used for pushing, and the increment operation is used for popping values. The Stack Pointer always points to the last used location. When a pop operation is issued, the value pointed to by the Stack Pointer is transferred and the SP is replaced by SP+4...
Instruction Overview Of course, the usual intent for Pop and these specific Load Register instructions is to recover register values that were previously pushed onto the stack. The user must exercise programming discipline to restore the stack values back to their intended registers from the first-in, last-out structure of the stack.
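The push and pop discipline described above can be sketched as a small C model. The names (`stack_model_t`, `push`, `pop`) are hypothetical, the stack pointer is kept as a word index rather than a byte address, and no bounds checking is done.

```c
#include <stdint.h>

#define STACK_WORDS 16

/* Hypothetical model: a descending stack; sp indexes the last used word. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    unsigned sp;               /* STACK_WORDS means the stack is empty */
} stack_model_t;

/* Model of [--SP] = reg: pre-decrement, then store. */
static void push(stack_model_t *s, uint32_t v)
{
    s->mem[--s->sp] = v;
}

/* Model of reg = [SP++]: load, then post-increment. */
static uint32_t pop(stack_model_t *s)
{
    return s->mem[s->sp++];
}
```

Values come back in the reverse of the order they were pushed, so the caller has to pop into the registers in the opposite order; the popped word also stays in memory until a later push overwrites it.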
Stack Control The instruction post-increments the Stack Pointer to the next occupied location in the stack before concluding. The stack grows down from high memory to low memory, therefore the decrement operation is used for pushing, and the increment operation is used for popping values.
higher memory LOAD REGISTER R6 FROM STACK <------ ========> R6 = Word2 lower memory
higher memory LOAD REGISTER R5 FROM STACK <------ ========> R5 = Word1 lower memory
higher memory POST-INCREMENT STACK POINTER Word0 <------ Word1 Word2 lower memory
The value(s) just popped remain on the stack until another push instruction overwrites them.
registers from the first-in, last-out structure of the stack. Pop exactly the same registers that were pushed onto the stack, but pop them in the opposite order. Although this instruction takes a variable amount of time to complete depending on the number of registers to be saved, it reduces compiled code size.
Instruction Overview Parallel Issue This instruction cannot be issued in parallel with other instructions. Example (p5:4) = [ sp ++ ] ; /* P3 through P0 excluded */ (r7:2) = [ sp ++ ] ; /* R1 through R0 excluded */ (r7:5, p5:0) = [ sp ++ ] ;...
LINK, UNLINK
General Form
LINK, UNLINK
Syntax
LINK uimm18m4 ; /* allocate a stack frame of specified size (b) */
UNLINK ; /* de-allocate the stack frame (b) */
Syntax Terminology
uimm18m4: 18-bit unsigned field that must be a multiple of 4, with a range of 8 through 262,152 bytes (0x00000 through 0x3FFFC)
Instruction Length...
The user-supplied argument for LINK determines the size of the allocated stack frame. LINK always saves RETS and FP on the stack, so the minimum frame size is 2 words when the argument is zero. The maximum stack frame size is 2^18 + 8 = 262,152 bytes in 4-byte increments.
Stack Control higher memory ..AFTER LINK EXECUTES Saved RETS Prior FP <-FP Allocated words for local <-SP = FP +– frame_size subroutine variables . . . lower memory higher memory ..AFTER A PUSH Saved RETS MULTIPLE EXECUTES...
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
This instruction cannot be issued in parallel with other instructions.
Example
link 8 ; /* establish frame with 8 bytes (2 words) allocated for local variables */
[ -- sp ] = (r7:0, p5:0) ; /* save D- and P-registers */
(r7:0, p5:0) = [ sp ++ ] ;...
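The stack-frame bookkeeping done by LINK and UNLINK can be modeled in C. This is a sketch under assumptions: the sequence used here (push RETS, push FP, copy SP to FP, subtract the frame size; then the reverse) is inferred from the frame layout described in this section, and the names `cpu_t`, `link_frame`, and `unlink_frame` are hypothetical.

```c
#include <stdint.h>

#define MEM_WORDS 64

/* Hypothetical model: word-indexed stack memory plus three registers. */
typedef struct {
    uint32_t mem[MEM_WORDS];
    uint32_t sp, fp, rets;     /* sp and fp are byte addresses into mem */
} cpu_t;

static void push32(cpu_t *c, uint32_t v)
{
    c->sp -= 4;
    c->mem[c->sp / 4] = v;
}

static uint32_t pop32(cpu_t *c)
{
    uint32_t v = c->mem[c->sp / 4];
    c->sp += 4;
    return v;
}

/* Model of LINK frame_bytes: save RETS and FP, then reserve the frame. */
static void link_frame(cpu_t *c, uint32_t frame_bytes)
{
    push32(c, c->rets);
    push32(c, c->fp);
    c->fp = c->sp;
    c->sp -= frame_bytes;
}

/* Model of UNLINK: discard the frame, then restore FP and RETS. */
static void unlink_frame(cpu_t *c)
{
    c->sp = c->fp;
    c->fp = pop32(c);
    c->rets = pop32(c);
}
```

In this model `link_frame(&c, 8)` lowers SP by 16 bytes in total: 8 for the two saved registers and 8 for the locals.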
11 CONTROL CODE BIT MANAGEMENT
Instruction Summary
• “Compare Data Register” on page 11-2
• “Compare Pointer” on page 11-6
• “Compare Accumulator” on page 11-9
• “Move CC” on page 11-12
• “Negate CC” on page 11-15
Instruction Overview
This chapter discusses the instructions that affect the Control Code (CC) bit in the ASTAT register.
Instruction Overview Compare Data Register General Form CC = operand_1 == operand_2 CC = operand_1 < operand_2 CC = operand_1 <= operand_2 CC = operand_1 < operand_2 (IU) CC = operand_1 <= operand_2 (IU) Syntax CC = Dreg == Dreg ; /* equal, register, signed (a) */ CC = Dreg == imm3 ;...
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Compare Data Register instruction sets the Control Code (CC) bit based on a comparison of two values. The input operands are D-registers. The compare operations are nondestructive on the input operands and affect only the CC bit and the flags.
Instruction Overview The following flags are affected by the Compare Data Register instruction. • is set if the test condition is true; cleared if false. • is set if result is zero; cleared if nonzero. • is set if result is negative; cleared if non-negative. •...
/* If r0 = 0x8FFF FFFF and r3 = 0x0000 0001, then the unsigned operation . . . */
cc = r0 < r3 (iu) ; /* . . . produces CC = 0, because r0 is treated as a large unsigned value */
cc = r1 <...
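The effect of the (IU) option on the same operands can be checked with a short C model. The function names are hypothetical; the signed interpretation is computed explicitly rather than by casting, so the sketch does not depend on implementation-defined conversions.

```c
#include <stdint.h>

/* Interpret a 32-bit register value as a two's-complement signed number. */
static int64_t as_signed(uint32_t x)
{
    return (x & 0x80000000u) ? (int64_t)x - 4294967296LL : (int64_t)x;
}

/* Model of CC = Dreg < Dreg (signed compare). */
static int cc_lt_signed(uint32_t a, uint32_t b)
{
    return as_signed(a) < as_signed(b);
}

/* Model of CC = Dreg < Dreg (IU) (unsigned compare). */
static int cc_lt_unsigned(uint32_t a, uint32_t b)
{
    return a < b;
}
```

With r0 = 0x8FFFFFFF and r3 = 1, the unsigned compare gives 0 while the signed compare gives 1, because the same bit pattern is a large positive value in one reading and a negative value in the other.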
Instruction Overview Compare Pointer General Form CC = operand_1 == operand_2 CC = operand_1 < operand_2 CC = operand_1 <= operand_2 CC = operand_1 < operand_2 (IU) CC = operand_1 <= operand_2 (IU) Syntax CC = Preg == Preg ; /* equal, register, signed (a) */ CC = Preg == imm3 ;...
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Compare Pointer instruction sets the Control Code (CC) bit based on a comparison of two values. The input operands are P-registers. The compare operations are nondestructive on the input operands and affect only the CC bit and the flags.
Control Code Bit Management Compare Accumulator General Form CC = A0 == A1 CC = A0 < A1 CC = A0 <= A1 Syntax CC = A0 == A1 ; /* equal, signed (a) */ CC = A0 < A1 ; /* less than, Accumulator, signed (a) */ CC = A0 <= A1 ;...
Flags Affected
The Compare Accumulator instruction uses the values shown in Table 11-2 in compare operations.

Table 11-2. Compare Accumulator Instruction Values
Comparison            Signed
Equal                 AZ=1
Less than             AN=1
Less than or equal    AN or AZ=1

The following arithmetic status bits reside in the ASTAT register.
•...
Control Code Bit Management Example cc = a0 == a1 ; cc = a0 < a1 ; cc = a0 <= a1 ; Also See Compare Pointer, Compare Data Register, IF CC JUMP Special Applications None ADSP-BF53x/BF56x Blackfin Processor Programming Reference 11-11...
Move CC
General Form
dest = CC
dest |= CC
dest &= CC
dest ^= CC
CC = source
CC |= source
CC &= source
CC ^= source
Syntax
Dreg = CC ; /* CC into 32-bit data register, zero-extended (a) */
statbit = CC ;...
Functional Description
The Move CC instruction moves the status of the Control Code (CC) bit to and from a data register or arithmetic status bit. When copying the CC bit into a 32-bit register, the operation moves the CC bit into the least significant bit of the register, zero-extended to 32 bits.
Instruction Overview Required Mode User & Supervisor Instruction Length In the syntax, comment (a) identifies 16-bit instruction length. Parallel Issue This instruction cannot be issued in parallel with other instructions. Example r0 = cc ; az = cc ; an |= cc ; ac0 &= cc ;...
Negate CC
General Form
CC = ! CC
Syntax
CC = ! CC ; /* (a) */
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Negate CC instruction inverts the logical state of CC.
Flags Affected
•...
Instruction Overview Example cc =! cc ; Also See Move CC Special Applications None 11-16 ADSP-BF53x/BF56x Blackfin Processor Programming Reference...
12 LOGICAL OPERATIONS
Instruction Summary
• “& (AND)” on page 12-2
• “~ (NOT One’s-Complement)” on page 12-4
• “| (OR)” on page 12-6
• “^ (Exclusive-OR)” on page 12-8
• “BXORSHIFT, BXOR” on page 12-10
Instruction Overview
This chapter discusses the instructions that specify logical operations. Users can take advantage of these instructions to perform logical AND, NOT, OR, exclusive-OR, and bit-wise exclusive-OR (BXORSHIFT) operations.
& (AND)
General Form
dest_reg = src_reg_0 & src_reg_1
Syntax
Dreg = Dreg & Dreg ; /* (a) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The AND instruction performs a 32-bit, bit-wise logical AND operation on the two source registers and stores the results into the dest_reg. The instruction does not implicitly modify the source registers.
Logical Operations • are cleared. • All other flags are unaffected. The ADSP-BF535 processor has fewer flags and some flags ASTAT operate differently than subsequent Blackfin family products. For more information on the ADSP-BF535 status flags, see Table A-1 on page A-3.
Instruction Overview ~ (NOT One’s-Complement) General Form dest_reg = ~ src_reg Syntax Dreg = ~ Dreg ; /* (a)*/ Syntax Terminology Dreg R7–0 Instruction Length In the syntax, comment (a) identifies 16-bit instruction length. Functional Description The NOT One’s-Complement instruction toggles every bit in the 32-bit register.
Logical Operations • are cleared. • All other flags are unaffected. The ADSP-BF535 processor has fewer flags and some flags ASTAT operate differently than subsequent Blackfin family products. For more information on the ADSP-BF535 status flags, see Table A-1 on page A-3.
| (OR)
General Form
dest_reg = src_reg_0 | src_reg_1
Syntax
Dreg = Dreg | Dreg ; /* (a) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The OR instruction performs a 32-bit, bit-wise logical OR operation on the two source registers and stores the results into the dest_reg. The instruction does not implicitly modify the source registers.
Logical Operations • are cleared. • All other flags are unaffected. The ADSP-BF535 processor has fewer flags and some flags ASTAT operate differently than subsequent Blackfin family products. For more information on the ADSP-BF535 status flags, see Table A-1 on page A-3.
Instruction Overview ^ (Exclusive-OR) General Form dest_reg = src_reg_0 ^ src_reg_1 Syntax Dreg = Dreg ^ Dreg ; /* (a) */ Syntax Terminology Dreg R7–0 Instruction Length In the syntax, comment (a) identifies 16-bit instruction length. Functional Description The Exclusive-OR (XOR) instruction performs a 32-bit, bit-wise logical exclusive OR operation on the two source registers and loads the results into the dest_reg...
Logical Operations • are cleared. • All other flags are unaffected. The ADSP-BF535 processor has fewer flags and some flags ASTAT operate differently than subsequent Blackfin family products. For more information on the ADSP-BF535 status flags, see Table A-1 on page A-3.
Instruction Overview BXORSHIFT, BXOR General Form dest_reg = CC = BXORSHIFT ( A0, src_reg ) dest_reg = CC = BXOR ( A0, src_reg ) dest_reg = CC = BXOR ( A0, A1, CC ) A0 = BXORSHIFT ( A0, A1, CC ) Syntax Type I (Without Feedback) LFSR...
The Type I LFSR (no feedback) applies a 32-bit registered mask to a 40-bit state residing in Accumulator A0, followed by a bit-wise XOR reduction operation. The result is placed in CC and a destination register half. The Type I LFSR (with feedback) applies a 40-bit mask in Accumulator...
Modified Type I LFSR (without feedback)
Two instructions support the LFSR with no feedback.
Dreg_lo = CC = BXORSHIFT(A0, dreg)
Dreg_lo = CC = BXOR(A0, dreg)
In the first instruction the Accumulator A0 is left-shifted by 1 prior to the XOR reduction....
The second instruction in this class performs a bit-wise XOR of A0 logically AND'ed with the dreg. The output is placed into the least significant bit of the destination register and into the CC bit. The Accumulator A0 is not modified by this operation. This operation is illustrated in Figure 12-3....
The first instruction provides a bit-wise XOR of A0 logically AND'ed with A1. The resulting intermediate bit is XOR'ed with the CC flag. The result of the operation is left-shifted into the least significant bit of A0 following the operation. This operation is illustrated in Figure 12-4....
[Figure 12-5. XOR of A0 AND A1, to CC Flag and LSB of Dest Register]
Flags Affected
The following flags are affected by the Four Bit-Wise Exclusive-OR instructions....
Instruction Overview Parallel Issue This instruction can be issued in parallel with specific other 16-bit instructions. For details, see “Issuing Parallel Instructions” on page 20-1. Example r0.l = cc = bxorshift (a0, r1) ; r0.l = cc = bxor (a0, r1) ; r0.l = cc = bxor (a0, a1, cc) ;...
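The with-feedback forms can be modeled in C as a mask, an XOR reduction (parity), and an optional shift. This is a sketch under assumptions: accumulators are held in the low 40 bits of a 64-bit integer, the reduction is taken over all 40 bits of A0 AND A1 before the shift, and the function names are hypothetical.

```c
#include <stdint.h>

#define ACC_MASK 0xFFFFFFFFFFULL   /* 40-bit accumulator */

/* XOR of all bits in the low 40 bits of v. */
static int parity40(uint64_t v)
{
    int p = 0;
    v &= ACC_MASK;
    while (v) {
        p ^= (int)(v & 1u);
        v >>= 1;
    }
    return p;
}

/* Model of dest = CC = BXOR(A0, A1, CC): reduce A0 & A1, then XOR with CC. */
static int bxor_fb(uint64_t a0, uint64_t a1, int cc)
{
    return parity40(a0 & a1) ^ (cc & 1);
}

/* Model of A0 = BXORSHIFT(A0, A1, CC): shift the result into the LSB of A0. */
static uint64_t bxorshift_fb(uint64_t a0, uint64_t a1, int cc)
{
    uint64_t bit = (uint64_t)bxor_fb(a0, a1, cc);
    return ((a0 << 1) | bit) & ACC_MASK;
}
```

Calling `bxorshift_fb` repeatedly with a fixed mask in `a1` steps a linear feedback shift register one bit at a time, with `cc` supplying the input bit.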
BITCLR
General Form
BITCLR ( register, bit_position )
Syntax
BITCLR ( Dreg , uimm5 ) ; /* (a) */
Syntax Terminology
Dreg: R7–0
uimm5: 5-bit unsigned field, with a range of 0 through 31
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Bit Clear instruction clears the bit designated by bit_position in the...
Bit Operations • is cleared. • All other flags are unaffected. The ADSP-BF535 processor has fewer flags and some flags ASTAT operate differently than subsequent Blackfin family products. For more information on the ADSP-BF535 status flags, see Table A-1 on page A-3.
BITSET
General Form
BITSET ( register, bit_position )
Syntax
BITSET ( Dreg , uimm5 ) ; /* (a) */
Syntax Terminology
Dreg: R7–0
uimm5: 5-bit unsigned field, with a range of 0 through 31
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Bit Set instruction sets the bit designated by bit_position in the...
Bit Operations • is cleared. • All other flags are unaffected. The ADSP-BF535 processor has fewer flags and some flags ASTAT operate differently than subsequent Blackfin family products. For more information on the ADSP-BF535 status flags, see Table A-1 on page A-3.
Instruction Overview BITTGL General Form BITTGL ( register, bit_position ) Syntax BITTGL ( Dreg , uimm5 ) ; /* (a) */ Syntax Terminology Dreg R7–0 : 5-bit unsigned field, with a range of 0 through 31 uimm5 Instruction Length In the syntax, comment (a) identifies 16-bit instruction length. Functional Description The Bit Toggle instruction inverts the bit designated by bit_position...
• is cleared.
• All other flags are unaffected.
The ADSP-BF535 processor has fewer ASTAT flags, and some flags operate differently than subsequent Blackfin family products. For more information on the ADSP-BF535 status flags, see Table A-1 on page A-3.
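The three bit-modifying instructions described above (BITCLR, BITSET, BITTGL) can be sketched as a small behavioral model. This is a minimal illustration in Python, not the processor's implementation: the function names and the 32-bit mask are assumptions made for the sketch, and ASTAT flag updates are not modeled.

```python
MASK32 = 0xFFFFFFFF  # assumed width of a Dreg in this sketch


def _bit(bit_position):
    # uimm5 is documented as a 5-bit unsigned field, range 0 through 31
    if not 0 <= bit_position <= 31:
        raise ValueError("bit_position must be 0 through 31")
    return 1 << bit_position


def bitclr(reg, bit_position):
    """Model of BITCLR: clear the selected bit, leave all others unchanged."""
    return reg & ~_bit(bit_position) & MASK32


def bitset(reg, bit_position):
    """Model of BITSET: set the selected bit, leave all others unchanged."""
    return (reg | _bit(bit_position)) & MASK32


def bittgl(reg, bit_position):
    """Model of BITTGL: invert the selected bit, leave all others unchanged."""
    return (reg ^ _bit(bit_position)) & MASK32
```

Applying bittgl twice with the same bit_position returns the original register value, which is a convenient sanity check for the model.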
Instruction Overview BITTST General Form CC = BITTST ( register, bit_position ) CC = ! BITTST ( register, bit_position ) Syntax CC = BITTST ( Dreg , uimm5 ) ; /* set CC if bit = 1 (a)*/ CC = ! BITTST ( Dreg , uimm5 ) ; /* set CC if bit = 0 (a)*/ Syntax Terminology Dreg...
Flags Affected
The Bit Test instruction affects flags as follows.
• CC is set if the tested bit is 1; cleared otherwise.
• All other flags are unaffected.
The ADSP-BF535 processor has fewer ASTAT flags, and some flags operate differently than subsequent Blackfin family products. For more information on the ADSP-BF535 status flags, see Table A-1 on page...
Table 13-1. Input Register Bit Field Definitions
              31....24  23....16  15....8   7....0
backgnd_reg:  bbbb bbbb bbbb bbbb bbbb bbbb bbbb bbbb
foregnd_reg:  nnnn nnnn nnnn nnnn xxxp pppp xxxL LLLL
1 where b = background bit field (32 bits)
2 where: –n = foreground bit field (16 bits);...
• Sign-extended, L = 0 and p = 0: The architecture copies the lower order bits of backgnd_reg below position p into dest_reg, then sign-extends that number. The foreground value has no effect. For instance, if: backgnd_reg = 0x0000 8123, L = 0, and p = 16,...
• is cleared.
• All other flags are unaffected.
The ADSP-BF535 processor has fewer ASTAT flags, and some flags operate differently than subsequent Blackfin family products. For more information on the ADSP-BF535 status flags, see Table A-1 on page A-3.
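The unsigned Bit Field Deposit operation can be sketched as follows, using the field layout of Table 13-1 (foreground in bits 31–16, position in bits 12–8, length in bits 4–0). This is a minimal Python model under stated assumptions: the function name is hypothetical, only the unsigned form with a length of 1 through 16 is modeled, and the sign-extended form and the L = 0 special case are deliberately left out.

```python
MASK32 = 0xFFFFFFFF


def deposit_unsigned(backgnd_reg, foregnd_reg):
    """Overwrite L bits of the background, starting at bit p, with the low
    L bits of the 16-bit foreground field (hypothetical helper)."""
    length = foregnd_reg & 0x1F            # L LLLL, bits 4..0
    position = (foregnd_reg >> 8) & 0x1F   # p pppp, bits 12..8
    field = (foregnd_reg >> 16) & 0xFFFF   # n..n, bits 31..16
    if not 1 <= length <= 16:
        # Other lengths are outside what this sketch tries to model.
        raise ValueError("this sketch models lengths 1 through 16 only")
    mask = (((1 << length) - 1) << position) & MASK32
    return (backgnd_reg & ~mask & MASK32) | ((field << position) & mask)
```

With a background of all ones and a foreground register of 0x00FA 0D09 (position 13, length 9), the model replaces bits 21–13 of the background with the low nine bits of 0x00FA.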
Instruction Overview • If • R4=0b1111 1111 1111 1111 1111 1111 1111 1111 where this is the background bit field • R3=0b0000 0000 1111 1010 0000 1101 0000 1001 where bits 31–16 are the foreground bit field, bits 15–8 are the position, and bits 7–0 are the length then the Bit Field Deposit (unsigned) instruction produces: •...
Bit Operations • If • R4=0b1111 1111 1111 1111 1111 1111 1111 1111 where this is the background bit field • R3=0b0000 1001 1010 1100 0000 1101 0000 1001 where bits 31–16 are the foreground bit field, bits 15–8 are the position, and bits 7–0 are the length then the Bit Field Deposit (unsigned) instruction produces: •...
Table 13-2. Input Register Bit Field Definitions
              31....24  23....16  15....8   7....0
scene_reg:    ssss ssss ssss ssss ssss ssss ssss ssss
pattern_reg:                      xxxp pppp xxxL LLLL
1 where s = scene bit field (32 bits)
2 where:
–p = position of pattern bit field LSB in scene_reg (valid range 0 through 31)
–L = length of pattern bit field (valid range 0 through 31)
The operation reads the pattern bit field of length L from the scene bit field, with the pattern LSB located at bit p of the scene.
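The extraction described by Table 13-2 can be modeled in a few lines. The sketch below is an illustration in Python, not the hardware definition: the function name and the sign_extend parameter are assumptions, and behavior when p + L runs past bit 31 is not modeled beyond reading zeros.

```python
MASK32 = 0xFFFFFFFF


def extract(scene_reg, pattern_reg, sign_extend=False):
    """Read L bits of the scene starting at bit p (hypothetical helper).

    sign_extend=False models the unsigned form; sign_extend=True replicates
    the top bit of the extracted field into the upper bits of the result.
    """
    length = pattern_reg & 0x1F            # L LLLL, bits 4..0
    position = (pattern_reg >> 8) & 0x1F   # p pppp, bits 12..8
    field_mask = (1 << length) - 1
    field = ((scene_reg & MASK32) >> position) & field_mask
    if sign_extend and length and (field >> (length - 1)) & 1:
        field |= MASK32 & ~field_mask
    return field
```

For a scene of 0xA5A5 C3AA with position 13 and length 9, the extracted field is the nine scene bits 21 down to 13; the sign-extended form fills the upper bits with ones when bit 8 of that field is set.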
• is cleared.
• is cleared.
• All other flags are unaffected.
The ADSP-BF535 processor has fewer ASTAT flags, and some flags operate differently than subsequent Blackfin family products. For more information on the ADSP-BF535 status flags, see Table A-1 on page A-3.
• If
  • R4=0b1010 0101 1010 0101 1100 0011 1010 1010 where this is the scene bit field
  • R3=0bxxxx xxxx xxxx xxxx 0000 1101 0000 1001 where bits 15–8 are the position, and bits 7–0 are the length
then the Bit Field Extraction (unsigned) instruction produces:
•...
• If
  • R4=0b1010 0101 1010 0101 1100 0011 1010 1010 where this is the scene bit field
  • R3=0bxxxx xxxx xxxx xxxx 0000 1101 0000 1001 where bits 15–8 are the position, and bits 7–0 are the length
then the Bit Field Extraction (sign-extended) instruction produces:...
Bit Operations BITMUX General Form BITMUX ( source_1, source_0, A0 ) (ASR) BITMUX ( source_1, source_0, A0 ) (ASL) Syntax BITMUX ( Dreg , Dreg , A0 ) (ASR) ; /* shift right, LSB is shifted out (b) */ BITMUX ( Dreg , Dreg , A0 ) (ASL) ; /* shift left, MSB is shifted out (b) */ Syntax Terminology...
In the Shift Left version, the processor performs the following sequence.
1. Left shift Accumulator by one bit. Left shift the MSB of source_0 into the LSB of the Accumulator.
2. Left shift Accumulator by one bit. Left shift the MSB of source_1 into the LSB of the Accumulator....
Table 13-5. A Shift Left Instruction
                 39....32  31....24  23....16  15....8   7....0
source_1:                  xxxx xxxx xxxx xxxx xxxx xxxx xxxx xxx0
source_0:                  yyyy yyyy yyyy yyyy yyyy yyyy yyyy yyy0
Accumulator A0:  zzzz zzzz zzzz zzzz zzzz zzzz zzzz zzzz zzzz zzyx
1 source_1 is shifted left 1 place
2 source_0 is shifted left 1 place...
Instruction Overview Example bitmux (r2, r3, a0) (asr) ; /* right shift*/ • If • R2=0b1010 0101 1010 0101 1100 0011 1010 1010 • R3=0b1100 0011 1010 1010 1010 0101 1010 0101 • A0=0b0000 0000 0000 0000 0000 0000 0000 0000 0000 0111 then the Shift Right instruction produces: •...
Bit Operations Also See None Special Applications Convolutional encoder algorithms ADSP-BF53x/BF56x Blackfin Processor Programming Reference 13-25...
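The Shift Left (ASL) form of BITMUX, as laid out in the two-step sequence and Table 13-5, can be sketched as below. This is a hedged Python model: the function name and return convention are assumptions, only the ASL form is shown, and the order of the two steps follows the "zzyx" layout in Table 13-5 (the source_0 MSB enters first, then the source_1 MSB).

```python
MASK32 = 0xFFFFFFFF
MASK40 = 0xFFFFFFFFFF  # A0 is modeled as a 40-bit accumulator


def bitmux_asl(source_1, source_0, a0):
    """Return (source_1, source_0, a0) after one BITMUX ... (ASL) step."""
    # Step 1: shift A0 left, MSB of source_0 enters the accumulator LSB.
    a0 = ((a0 << 1) | ((source_0 >> 31) & 1)) & MASK40
    # Step 2: shift A0 left again, MSB of source_1 enters the LSB.
    a0 = ((a0 << 1) | ((source_1 >> 31) & 1)) & MASK40
    # Both source registers are shifted left one place.
    return (source_1 << 1) & MASK32, (source_0 << 1) & MASK32, a0
```

Calling the function repeatedly interleaves the two bit streams into A0, which is the pattern a convolutional encoder would rely on.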
ONES (One’s-Population Count)
General Form
dest_reg = ONES src_reg
Syntax
Dreg_lo = ONES Dreg ; /* (b) */
Syntax Terminology
Dreg: R7–0
Dreg_lo: R7–0.L
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The One’s-Population Count instruction loads the number of 1’s contained in the src_reg into the lower half of the...
Flags Affected
None
The ADSP-BF535 processor has fewer ASTAT flags, and some flags operate differently than subsequent Blackfin family products. For more information on the ADSP-BF535 status flags, see Table A-1 on page A-3.
Required Mode
User & Supervisor
Parallel Issue
This instruction can be issued in parallel with specific other 16-bit instructions....
14 SHIFT/ROTATE OPERATIONS Instruction Summary • “Add with Shift” on page 14-2 • “Shift with Add” on page 14-5 • “Arithmetic Shift” on page 14-7 • “Logical Shift” on page 14-14 • “ROT (Rotate)” on page 14-21 Instruction Overview This chapter discusses the instructions that manipulate bit operations. Users can take advantage of these instructions to perform logical and arithmetic shifts, combine addition operations with shifts, and rotate a registered number through the Control Code (...
Shift/Rotate Operations Functional Description The Add with Shift instruction combines an addition operation with a one- or two-place logical shift left. Of course, a left shift accomplishes a x2 multiplication on sign-extended numbers. Saturation is not supported. The Add with Shift instruction does not intrinsically modify values that are strictly input.
Arithmetic Shift
General Form
dest_reg >>>= shift_magnitude
dest_reg = src_reg >>> shift_magnitude (opt_sat)
dest_reg = src_reg << shift_magnitude (S)
accumulator = accumulator >>> shift_magnitude
dest_reg = ASHIFT src_reg BY shift_magnitude (opt_sat)
accumulator = ASHIFT accumulator BY shift_magnitude
Syntax
Constant Shift Magnitude
Dreg >>>= uimm5 ;...
A0 = ASHIFT A0 BY Dreg_lo ; /* arithmetic right shift (b)*/
A1 = ASHIFT A1 BY Dreg_lo ; /* arithmetic right shift (b)*/
Syntax Terminology
Dreg: R7–0
Dreg_lo_hi: R7–0.L, R7–0.H
Dreg_lo: R7–0.L
uimm4: 4-bit unsigned field, with a range of 0 through 15
uimm5: 5-bit unsigned field, with a range of 0 through 31...
The “ASHIFT” versions of this instruction support two modes.
1. Default–arithmetic right shifts and logical left shifts. Logical left shifts do not guarantee sign bit preservation. The “ASHIFT” versions automatically select arithmetic and logical shift modes based on the sign of the shift_magnitude.
2....
Instruction Overview Table 14-1. Arithmetic Shifts Syntax Description “>>>=” The value in dest_reg is right-shifted by the number of places specified by shift_magnitude. The data size is always 32 bits long. The entire 32 bits of the shift_magnitude determine the shift value. Shift magnitudes larger than 0x1F result in either 0x00000000 (when the input value is positive) or 0xFFFFFFFF (when the input value is negative).
Shift/Rotate Operations Options Option (S) invokes saturation of the result. In the default case–without the saturation option–numbers can be left-shifted so far that all the sign bits overflow and are lost. However, when the saturation option is enabled, a left shift that would otherwise shift nonsign bits off the left-hand side saturates to the maximum positive or negative value instead.
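Two behaviors described above lend themselves to a short model: the “>>>=” right shift, whose result collapses to all zeros or all ones once the magnitude exceeds 0x1F, and the saturating left shift selected by option (S). The sketch below is an illustration in Python with hypothetical function names; it covers 32-bit data registers only and does not model flags.

```python
MASK32 = 0xFFFFFFFF
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def to_signed32(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def ashift_right_assign(dest_reg, shift_magnitude):
    """Model of 'Dreg >>>= ...' on 32-bit data."""
    value = to_signed32(dest_reg)
    if (shift_magnitude & MASK32) > 0x1F:
        # Per Table 14-1: 0x00000000 for positive input, 0xFFFFFFFF for negative.
        return MASK32 if value < 0 else 0
    return (value >> shift_magnitude) & MASK32  # Python's >> is arithmetic


def shift_left_saturate(src_reg, shift_magnitude):
    """Model of 'Dreg = Dreg << uimm5 (S)': clamp instead of losing sign bits."""
    shifted = to_signed32(src_reg) << shift_magnitude
    return max(INT32_MIN, min(INT32_MAX, shifted)) & MASK32
```

Without saturation, 0x4000 0000 shifted left by one would become 0x8000 0000 and flip sign; the saturating model returns the maximum positive value instead.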
Instruction Overview The versions of this instruction that send results to an Accumulator flags as follows. • is set if result is zero; cleared if nonzero. • is set if result is negative; cleared if non-negative. • is set if result is zero; cleared if nonzero. •...
Shift/Rotate Operations Data Shift, Registered Shift Magnitude Dreg >>= Dreg ; /* right shift (a) */ Dreg <<= Dreg ; /* left shift (a) */ Dreg_lo_hi = LSHIFT Dreg_lo_hi BY Dreg_lo ; /* (b) */ Dreg = LSHIFT Dreg BY Dreg_lo ; /* (b) */ A0 = LSHIFT A0 BY Dreg_lo ;...
Four versions of the Logical Shift instruction support pointer shifting. The instruction does not implicitly modify the input src_pntr value. For the P-register versions of this instruction, dest_pntr can be the same P-register as src_pntr. Doing so explicitly modifies the source register. The rest of this description applies to the data shift versions of this instruction relating to D-registers and Accumulators....
dest_reg and src_reg can be a 16-, 32-, or 40-bit register. For the LSHIFT instruction, the shift magnitude is the lower 6 bits of the Dreg_lo, sign extended. The Dreg >>= Dreg and Dreg <<= Dreg instructions use the entire 32 bits of magnitude. The D-register versions of this instruction shift 16 or 32 bits for half-word and word registers, respectively....
Instruction Overview The versions of this instruction that send results to an Accumulator flags as follows. • is set if result is zero; cleared if nonzero. • is set if result is negative; cleared if non-negative. • is cleared. • All other flags are unaffected. The versions of this instruction that send results to an Accumulator flags as follows.
Shift/Rotate Operations Example p3 = p2 >> 1 ; /* pointer right shift by 1 */ p3 = p3 >> 2 ; /* pointer right shift by 2 */ p4 = p5 << 1 ; /* pointer left shift by 1 */ p0 = p1 <<...
Also See
Arithmetic Shift, ROT (Rotate), Shift with Add, Vector Arithmetic Shift, Vector Logical Shift
Special Applications
None
Shift/Rotate Operations ROT (Rotate) General Form dest_reg = ROT src_reg BY rotate_magnitude accumulator_new = ROT accumulator_old BY rotate_magnitude Syntax Constant Rotate Magnitude Dreg = ROT Dreg BY imm6 ; /* (b) */ A0 = ROT A0 BY imm6 ; /* (b) */ A1 = ROT A1 BY imm6 ;...
Rotation shifts all the bits either right or left. Each bit that rotates out of the register (the LSB for rotate right or the MSB for rotate left) is stored in the CC bit, and the CC bit is stored into the bit vacated by the rotate on the opposite end of the register....
Shift/Rotate Operations The sign of the rotate magnitude determines the direction of the rotation. • Positive rotate magnitudes produce Left rotations. • Negative rotate magnitudes produce Right rotations. Valid rotate magnitudes are –32 through +31, zero included. The Rotate instruction masks and ignores bits that are more significant than those allowed.
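The rotate-through-CC behavior and the sign convention above can be modeled one bit at a time. The following Python sketch is an illustration only: the function name is hypothetical, it covers the 32-bit data register form, and it rejects out-of-range magnitudes rather than masking them as the instruction is described to do.

```python
MASK32 = 0xFFFFFFFF


def rot32(src_reg, cc, rotate_magnitude):
    """Rotate a 32-bit value through the CC bit; returns (result, new_cc).

    Positive magnitudes rotate left, negative magnitudes rotate right.
    """
    if not -32 <= rotate_magnitude <= 31:
        raise ValueError("this sketch accepts magnitudes -32 through +31 only")
    value = src_reg & MASK32
    cc &= 1
    for _ in range(abs(rotate_magnitude)):
        if rotate_magnitude > 0:
            out = value >> 31                      # MSB leaves the register
            value = ((value << 1) | cc) & MASK32   # old CC enters at the LSB
        else:
            out = value & 1                        # LSB leaves the register
            value = (value >> 1) | (cc << 31)      # old CC enters at the MSB
        cc = out
    return value, cc
```

Because the register and CC together form a 33-bit ring, rotating left by n and then right by n restores both the register and CC.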
Flags Affected
The following flags are affected by the Rotate instruction.
• CC contains the latest value shifted into it.
• All other flags are unaffected.
The ADSP-BF535 processor has fewer ASTAT flags, and some flags operate differently than subsequent Blackfin family products. For more information on the ADSP-BF535 status flags, see Table A-1 on page...
Also See
Arithmetic Shift, Logical Shift
Special Applications
None
15 ARITHMETIC OPERATIONS Instruction Summary • “ABS” on page 15-3 • “Add” on page 15-6 • “Add/Subtract – Prescale Down” on page 15-10 • “Add/Subtract – Prescale Up” on page 15-13 • “Add Immediate” on page 15-16 • “DIVS, DIVQ (Divide Primitive)” on page 15-19 •...
Instruction Overview • “Multiply and Multiply-Accumulate to Data Register” on page 15-67 • “Negate (Two’s-Complement)” on page 15-73 • “RND (Round to Half-Word)” on page 15-77 • “Saturate” on page 15-80 • “SIGNBITS” on page 15-83 • “Subtract” on page 15-86 •...
Functional Description
The Dreg form of the Absolute Value instruction calculates the absolute value of a 32-bit register and stores it into a 32-bit dest_reg. The accumulator form of this instruction takes the absolute value of a 40-bit input value in a register and produces a 40-bit result....
• is set if result overflows and the dest_reg ; cleared if no overflow.
• AV1S is set if AV1 is set; unaffected otherwise.
• All other flags are unaffected.
The ADSP-BF535 processor has fewer ASTAT flags, and some flags operate differently than subsequent Blackfin family products....
Instruction Overview General Form dest_reg = src_reg_1 + src_reg_2 Syntax Pointer Registers — 32-Bit Operands, 32-Bit Result Preg = Preg + Preg ; /* (a) */ Data Registers — 32-Bit Operands, 32-bit Result Dreg = Dreg + Dreg ; /* no saturation support but shorter instruction length (a) */ Dreg = Dreg + Dreg (sat_flag) ;...
Functional Description
The Add instruction adds two source values and places the result in a destination register. There are two ways to specify addition on 32-bit data in D-registers:
• One does not support saturation (16-bit instruction length)
•...
Instruction Overview Flags Affected D-register versions of this instruction set flags as follows. • is set if result is zero; cleared if nonzero. • is set if result is negative; cleared if non-negative. • is set if the operation generates a carry; cleared if no carry. •...
Arithmetic Operations Example r5 = r2 + r1 ; /* 16-bit instruction length add, no saturation */ r5 = r2 + r1(ns) ; /* same result as above, but 32-bit instruction length */ r5 = r2 + r1(s) ; /* saturate the result */ p5 = p3 + p0 ;...
The instruction supports only biased rounding. The RND_MOD bit in the ASTAT register has no bearing on the rounding behavior of this instruction. See “Rounding and Truncating” on page 1-19 for a description of rounding behavior.
Flags Affected
The following flags are affected by this instruction:
•...
Also See
Add/Subtract – Prescale Up, RND (Round to Half-Word)
Special Applications
Typically, use the Add/Subtract – Prescale Down instruction to provide an IEEE 1180–compliant 2D 8x8 inverse discrete cosine transform.
The instruction supports only biased rounding. The RND_MOD bit in the ASTAT register has no bearing on the rounding behavior of this instruction. See “Saturation” on page 1-17 for a description of saturation behavior. See “Rounding and Truncating” on page 1-19 for a description of rounding behavior....
Also See
RND (Round to Half-Word), Add/Subtract – Prescale Down
Special Applications
Typically, use the Add/Subtract – Prescale Up instruction to provide an IEEE 1180–compliant 2D 8x8 inverse discrete cosine transform.
on page 1-21 for more details. Unless circular buffering is desired, disable it prior to issuing this instruction by clearing the Length Register (Lreg) corresponding to the Ireg used in this instruction. Example: If you use an Ireg to increment your address pointer, first clear the corresponding Lreg to disable circular buffering....
Required Mode
User & Supervisor
Parallel Issue
The Index Register versions of this instruction can be issued in parallel with specific other instructions. For details, see “Issuing Parallel Instructions” on page 20-1. The Data Register and Pointer Register versions of this instruction cannot be issued in parallel with other instructions....
Arithmetic Operations DIVS, DIVQ (Divide Primitive) General Form DIVS ( dividend_register, divisor_register ) DIVQ ( dividend_register, divisor_register ) Syntax DIVS ( Dreg, Dreg ) ; /* Initialize for DIVQ. Set the AQ flag based on the signs of the 32-bit dividend and the 16-bit divisor. Left shift the dividend one bit.
The division can either be signed or unsigned, but the dividend and divisor must both be of the same type. The divisor cannot be negative. A signed division operation, where the dividend may be negative, begins the sequence with the DIVS (“divide-sign”) instruction, followed by repeated...
Arithmetic Operations Both instruction versions align the dividend for the next iteration by left shifting the dividend one bit to the left (without carry). This left shift accomplishes the same function as aligning the divisor one bit to the right, such as one would do in manual binary division.
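The remark that shifting the dividend left is equivalent to aligning the divisor one bit to the right can be seen in a plain shift-and-subtract division. The Python sketch below is textbook restoring long division for unsigned operands, offered only to illustrate that alignment idea; it is not the DIVS/DIVQ primitive itself, the AQ flag is not modeled, and the function name is hypothetical.

```python
def long_divide_16(dividend, divisor):
    """Divide a 32-bit unsigned dividend by a 16-bit unsigned divisor using
    16 shift-and-subtract steps. Returns (quotient, remainder)."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be nonzero")
    aligned_divisor = divisor << 16        # divisor lined up with the upper half
    if not 0 <= dividend < aligned_divisor:
        # Otherwise the quotient would not fit in 16 bits.
        raise OverflowError("quotient would overflow 16 bits")
    remainder = dividend
    quotient = 0
    for _ in range(16):
        remainder <<= 1                    # shift the dividend left one bit
        quotient <<= 1
        if remainder >= aligned_divisor:   # trial subtraction succeeds
            remainder -= aligned_divisor
            quotient |= 1
    return quotient, remainder >> 16
```

The range check mirrors the requirement that the upper half of the dividend be smaller than the divisor so that the quotient fits in 16 bits.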
fractional (in 1.15 format) and therefore the upper 16 bits of the dividend must have a smaller magnitude than the divisor to avoid a quotient overflow beyond 16 bits. If an overflow occurs, is set. User software is able to detect the overflow, rescale the operand, and repeat the division....
• After the divide sequence concludes, multiply the resulting quotient by the original divisor sign.
• The quotient then has the correct magnitude and sign.
2. The Divide Primitive instructions do not support unsigned division by a divisor greater than 0x7FFF. If such divisions are necessary, prescale both operands by shifting the dividend and divisor one bit to the right prior to division....
Flags Affected
This instruction affects flags as follows.
• AQ equals dividend_MSB Exclusive-OR divisor_MSB where dividend is a 32-bit value and divisor is a 16-bit value.
• All other flags are unaffected.
The ADSP-BF535 processor has fewer ASTAT flags, and some flags operate differently than subsequent Blackfin family products....
r0 = r0.l (x) ; /* Sign extend the 16-bit quotient to 32 bits. */
/* r0 contains the quotient (70/5 = 14). */
Also See
LSETUP, LOOP, Multiply 32-Bit Operands
Special Applications
None
Arithmetic Operations Exponents are unsigned integers. The Exponent Detection instruction accommodates the two special cases (0 and –1) and always returns the smallest exponent for each case. The reference exponent and destination exponent are 16-bit half-word unsigned values. The sample number can be either a word or half-word. The Exponent Detection instruction does not implicitly modify input val- ues.
Example
r5.l = expadj (r4, r2.l) ;
• Assume R4 = 0x0000 0052 and R2.L = 12. Then R5.L becomes 12.
• Assume R4 = 0xFFFF 0052 and R2.L = 12. Then R5.L becomes 12.
• Assume R4 = 0x0000 0052 and R2.L = 27....
Special Applications
EXPADJ detects the exponent of the largest magnitude number in an array. The detected value may then be used to normalize the array on a subsequent pass with a shift operation. Typically, use this feature to implement block floating-point capabilities....
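The block floating-point use of EXPADJ can be sketched as a running minimum over an array. The Python model below rests on one stated assumption: that the "exponent" of a 32-bit sample is its count of redundant sign bits (sign bits beyond the first), and that EXPADJ returns the smaller of that count and the reference exponent. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def sign_bits32(sample):
    """Count redundant sign bits of a 32-bit word (assumed exponent measure)."""
    sample &= MASK32
    sign = sample >> 31
    count = 0
    for bit in range(30, -1, -1):
        if (sample >> bit) & 1 != sign:
            break
        count += 1
    return count


def expadj(sample, reference_exponent):
    """Model of 'dest = EXPADJ (sample, reference)': keep the smaller exponent."""
    return min(reference_exponent, sign_bits32(sample))


def block_exponent(samples, start=31):
    """Scan an array and return the exponent of its largest-magnitude element."""
    exponent = start
    for sample in samples:
        exponent = expadj(sample, exponent)
    return exponent
```

Under this assumption the model reproduces the two complete cases listed in the example above, where a reference exponent of 12 is returned unchanged for both 0x0000 0052 and 0xFFFF 0052.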
Instruction Overview General Form dest_reg = MAX ( src_reg_0, src_reg_1 ) Syntax Dreg = MAX ( Dreg , Dreg ) ; /* 32-bit operands (b) */ Syntax Terminology Dreg R7–0 Instruction Length In the syntax, comment (b) identifies 32-bit instruction length. Functional Description The Maximum instruction returns the maximum, or most positive, value of the source registers.
• is cleared.
• All other flags are unaffected.
The ADSP-BF535 processor has fewer ASTAT flags, and some flags operate differently than subsequent Blackfin family products. For more information on the ADSP-BF535 status flags, see Table A-1 on page A-3.
General Form
dest_reg = MIN ( src_reg_0, src_reg_1 )
Syntax
Dreg = MIN ( Dreg , Dreg ) ; /* 32-bit operands (b) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Minimum instruction returns the minimum value of the source registers to the...
• is cleared.
• All other flags are unaffected.
The ADSP-BF535 processor has fewer ASTAT flags, and some flags operate differently than subsequent Blackfin family products. For more information on the ADSP-BF535 status flags, see Table A-1 on page A-3.
Modify – Decrement
General Form
dest_reg -= src_reg
Syntax
40-Bit Accumulators
A0 -= A1 ; /* dest_reg_new = dest_reg_old - src_reg, saturate the result at 40 bits (b) */
A0 -= A1 (W32) ; /* dest_reg_new = dest_reg_old - src_reg, decrement and saturate the result at 32 bits, sign extended (b) */
32-Bit Registers
Preg -= Preg ;...
See “Saturation” on page 1-17 for a description of saturation behavior. The instruction versions that explicitly modify Ireg support optional circular buffering. See “Automatic Circular Addressing” on page 1-21 for more details. Unless circular buffering is desired, disable it prior to issuing this instruction by clearing the Length Register (Lreg) corresponding to the Ireg used in this instruction....
Required Mode
User & Supervisor
Parallel Issue
The 32-bit versions of this instruction and the 16-bit versions that use Ireg can be issued in parallel with specific other 16-bit instructions. For details, see “Issuing Parallel Instructions” on page 20-1....
Arithmetic Operations Modify – Increment General Form dest_reg += src_reg dest_reg = ( src_reg_0 += src_reg_1 ) Syntax 40-Bit Accumulators A0 += A1 ; /* dest_reg_new = dest_reg_old + src_reg, saturate the result at 40 bits (b) */ A0 += A1 (W32) ; /* dest_reg_new = dest_reg_old + src_reg, signed saturate the result at 32 bits, sign extended (b) */ 32-Bit Registers...
result at bit 16 (according to the RND_MOD bit in the ASTAT register), then saturating at 32 bits and moving bits 31:16 into the half register. (b) */
Syntax Terminology
Dreg: R7–0
Preg: P5–0
Ireg: I3–0
Mreg: M3–0
: optional bit reverse syntax;...
See “Rounding and Truncating” on page 1-19 for a description of rounding behavior. The instruction versions that explicitly modify Ireg support optional circular buffering. See “Automatic Circular Addressing” on page 1-21 for more details. Unless circular buffering is desired, disable it prior to issuing this instruction by clearing the Length Register (Lreg) corresponding to the...
Instruction Overview Flags Affected The versions of the Modify – Increment instruction that store the results in an Accumulator affect flags as follows. • is set if Accumulator result is zero; cleared if nonzero. • is set if Accumulator result is negative; cleared if non-negative. •...
• AV0S is set if AV0 is set; unaffected otherwise.
• All other flags are unaffected.
The ADSP-BF535 processor has fewer ASTAT flags, and some flags operate differently than subsequent Blackfin family products. For more information on the ADSP-BF535 status flags, see Table A-1 on page A-3.
Also See
Modify – Decrement, Add, Shift with Add
Special Applications
Typically, use the Index Register and Pointer Register versions of the Modify – Increment instruction to increment indirect address pointers for load or store operations.
Arithmetic Operations Multiply 16-Bit Operands General Form dest_reg = src_reg_0 * src_reg_1 (opt_mode) Syntax Multiply-And-Accumulate Unit 0 (MAC0) Dreg_lo = Dreg_lo_hi * Dreg_lo_hi (opt_mode_1) ; /* 16-bit result into the destination lower half-word register (b) */ Dreg_even = Dreg_lo_hi * Dreg_lo_hi (opt_mode_2) ; /* 32-bit result (b) */ Multiply-And-Accumulate Unit 1 (MAC1)
Instruction Overview : Optionally , or . Optionally, can be opt_mode_2 (FU) (IS) (ISS2) used with MAC1 versions either alone or with any of these other options. When used together, the option flags must be enclosed in one set of parenthesis and separated by a comma.
The versions of this instruction that produce 16-bit results are affected by the RND_MOD bit in the ASTAT register when they copy the results into the 16-bit destination register. RND_MOD determines whether biased or unbiased rounding is used. RND_MOD controls rounding for all versions of this instruction that produce 16-bit results except the...
Table 15-2. Multiply 16-Bit Operands Options
Option: Default
Description for Register Half Destination: Signed fraction. Multiply 1.15 * 1.15 to produce 1.31 results after left-shift correction....
Description for 32-Bit Register Destination: Signed fraction. Multiply 1.15 * 1.15 to produce 1.31 results after left-shift correction....
Table 15-2. Multiply 16-Bit Operands Options (Cont’d)
Description for Register Half Destination: Signed fraction with truncation. Truncate Accumulator 9.31 format value at bit 16....
Description for 32-Bit Register Destination: Not applicable. Truncation is meaningless for 32-bit register destinations.
Table 15-2. Multiply 16-Bit Operands Options (Cont’d)
Option: (ISS2)
Description for Register Half Destination: Signed integer with scaling. Multiply 16.0 * 16.0 to produce 32.0 results. No shift correction....
Description for 32-Bit Register Destination: Signed integer with scaling. Multiply 16.0 * 16.0 to produce 32.0 results. No...
Flags Affected
This instruction affects flags as follows.
• is set if result saturates; cleared if no saturation.
• is set if is set; unaffected otherwise.
• All other flags are unaffected.
The ADSP-BF535 processor has fewer ASTAT flags, and some flags operate differently than subsequent Blackfin family products....
Also See
Multiply 32-Bit Operands, Multiply and Multiply-Accumulate to Accumulator, Multiply and Multiply-Accumulate to Half-Register, Multiply and Multiply-Accumulate to Data Register, Vector Multiply, Vector Multiply and Multiply-Accumulate
Special Applications
None
Multiply 32-Bit Operands
General Form
dest_reg *= multiplier_register
Syntax
Dreg *= Dreg ; /* 32 x 32 integer multiply (a) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Multiply 32-Bit Operands instruction multiplies two 32-bit data registers ( and saves the product in...
This instruction might be used to implement the congruence method of random number generation according to:
$$X[n+1] = (a \times X[n]) \bmod 2^{32}$$
where:
• X[n] is the seed value,
• a is a large integer, and
•...
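A multiplicative congruential generator of the kind described above takes only a masked multiply per step. The Python sketch below is an illustration under stated assumptions: the modulus is taken to be 2^32 (matching a 32-bit destination register), the function names are hypothetical, and the multiplier 69069 is an arbitrary illustrative choice that does not come from this manual.

```python
MASK32 = 0xFFFFFFFF


def congruential_step(x, a):
    """One step of X[n+1] = (a * X[n]) mod 2**32, keeping the low 32 bits
    of the product as a 32 x 32 integer multiply into a 32-bit register would."""
    return (a * x) & MASK32


def congruential_sequence(seed, a, count):
    """Return the next `count` values produced from `seed`."""
    values = []
    x = seed & MASK32
    for _ in range(count):
        x = congruential_step(x, a)
        values.append(x)
    return values
```

A usage example would be congruential_sequence(1, 69069, 4); an odd seed is the usual choice for a power-of-two modulus so that the sequence does not collapse to zero.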
Arithmetic Operations Multiply and Multiply-Accumulate to Accumulator General Form accumulator = src_reg_0 * src_reg_1 (opt_mode) accumulator += src_reg_0 * src_reg_1 (opt_mode) accumulator –= src_reg_0 * src_reg_1 (opt_mode) Syntax Multiply-And-Accumulate Unit 0 (MAC0) Operations A0 =Dreg_lo_hi * Dreg_lo_hi (opt_mode) ; /* multiply and store (b) */ A0 += Dreg_lo_hi * Dreg_lo_hi (opt_mode) ;...
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Multiply and Multiply-Accumulate to Accumulator instruction multiplies two 16-bit half-word operands. It stores, adds or subtracts the product into a designated Accumulator with saturation. The Multiply-and-Accumulate Unit 0 (MAC0) portion of the architecture performs operations that involve Accumulator A0....
Table 15-3. Multiply and Multiply-Accumulate to Accumulator Options
Option: Default
Description: Signed fraction. Multiply 1.15 x 1.15 to produce 1.31 format data after shift correction. Sign extend the result to 9.31 format before passing it to the Accumulator. Saturate the Accumulator after copying or accumulating to maintain 9.31 precision....
Instruction Overview Flags Affected This instruction affects flags as follows. • is set if result in Accumulator (MAC0 operation) saturates; cleared if result does not saturate. • is set if is set; unaffected otherwise. AV0S • is set if result in Accumulator (MAC1 operation) saturates;...
Also See
Multiply 16-Bit Operands, Multiply 32-Bit Operands, Multiply and Multiply-Accumulate to Half-Register, Multiply and Multiply-Accumulate to Data Register, Vector Multiply, Vector Multiply and Multiply-Accumulate
Special Applications
DSP filter applications often use the Multiply and Multiply-Accumulate to Accumulator instruction to calculate the dot product between two signal vectors....
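The dot-product use case can be sketched with a model of the default signed-fraction mode: a 1.15 x 1.15 multiply with left-shift correction, accumulated into a saturating 40-bit (9.31) accumulator. This Python sketch is illustrative only. The function names are hypothetical, operands are passed as signed integers in the range -32768 through 32767, and clamping the 0x8000 * 0x8000 product to 0x7FFF FFFF is borrowed from the default-mode description in the options tables of this chapter and is assumed to apply here as well.

```python
ACC_MIN = -(1 << 39)        # most negative 40-bit accumulator value
ACC_MAX = (1 << 39) - 1     # most positive 40-bit accumulator value


def mul_q15(x, y):
    """1.15 * 1.15 -> 1.31 product with the one-bit left-shift correction."""
    product = (x * y) << 1
    # Only -32768 * -32768 can exceed the 1.31 range; clamp that case.
    return min(product, 0x7FFFFFFF)


def mac(acc, x, y):
    """acc += x * y with saturation at 40 bits."""
    return max(ACC_MIN, min(ACC_MAX, acc + mul_q15(x, y)))


def dot_q15(xs, ys):
    """Dot product of two equal-length vectors of 1.15 samples."""
    acc = 0
    for x, y in zip(xs, ys):
        acc = mac(acc, x, y)
    return acc
```

The eight guard bits of the 9.31 accumulator are what let a loop like dot_q15 add many 1.31 products before saturation becomes a concern.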
Multiply and Multiply-Accumulate to Half-Register
General Form
dest_reg_half = (accumulator = src_reg_0 * src_reg_1) (opt_mode)
dest_reg_half = (accumulator += src_reg_0 * src_reg_1) (opt_mode)
dest_reg_half = (accumulator –= src_reg_0 * src_reg_1) (opt_mode)
Syntax
Multiply-And-Accumulate Unit 0 (MAC0)
Dreg_lo = (A0 = Dreg_lo_hi * Dreg_lo_hi) (opt_mode) ; /* multiply and store (b) */
Dreg_lo = (A0 += Dreg_lo_hi * Dreg_lo_hi) (opt_mode) ;...
Arithmetic Operations : Optionally opt_mode (FU) (IS) (IU) (TFU) (S2RND) (ISS2) . Optionally, can be used with MAC1 versions either alone or (IH) with any of these other options. If multiple options are specified together for a MAC, the options must be separated by commas and enclosed within a single set of parentheses.
See “Rounding and Truncating” on page 1-19 for a description of rounding behavior.
Options
The Multiply and Multiply-Accumulate to Half-Register instruction supports operand and Accumulator copy options. The options are listed in Table 15-4.
Table 15-4. Multiply and Multiply-Accumulate to Half-Register Options
Option Description...
Table 15-4. Multiply and Multiply-Accumulate to Half-Register Options (Cont’d)
Option: (IS)
Description: Signed integer format. Multiply 16.0 * 16.0 formats to produce 32.0 results. No shift correction. Sign extend 32.0 result to 40.0 format before copying or accumulating to Accumulator....
Table 15-4. Multiply and Multiply-Accumulate to Half-Register Options (Cont’d)
Option: (TFU)
Description: Unsigned fraction with truncation. Multiply 0.16 * 0.16 formats to produce 0.32 results. No shift correction. The special case of 0x8000 * 0x8000 yields 0x4000 0000. No saturation is necessary since no shift correction occurs. (Same as the FU mode.) Zero extend 0.32 result to 8.32 format before copying or accumulating to Accumulator....
Table 15-4. Multiply and Multiply-Accumulate to Half-Register Options (Cont’d)
Option: (IH)
Description: Signed integer, high word extract. Multiply 16.0 * 16.0 formats to produce 32.0 results. No shift correction. (Same as the IS mode.) Sign extend 32.0 result to 40.0 format before copying or accumulating to Accumulator....
Flags Affected
This instruction affects flags as follows.
• V is set if the result extracted to the Dreg saturates; cleared if no saturation.
• VS is set if V is set; unaffected otherwise.
• AV0 is set if result in Accumulator A0 (MAC0 operation) saturates;...
Example
r3.l=(a0=r3.h*r2.h) ; /* MAC0, only. Both operands are signed fractions. Load the product into A0, then copy to r3.l. */
r3.h=(a1+=r6.h*r4.l) (fu) ; /* MAC1, only. Both operands are unsigned fractions. Add the product into A1, then copy to r3.h */
Also See
Multiply 32-Bit Operands,...
Multiply and Multiply-Accumulate to Data Register
General Form
dest_reg = (accumulator = src_reg_0 * src_reg_1) (opt_mode)
dest_reg = (accumulator += src_reg_0 * src_reg_1) (opt_mode)
dest_reg = (accumulator -= src_reg_0 * src_reg_1) (opt_mode)
Syntax
Multiply-And-Accumulate Unit 0 (MAC0)
Dreg_even = (A0 = Dreg_lo_hi * Dreg_lo_hi) (opt_mode) ; /* multiply and store (b) */
Dreg_even = (A0 += Dreg_lo_hi * Dreg_lo_hi) (opt_mode) ;...
opt_mode: Optionally (FU), (IS), (S2RND), or (ISS2). Optionally, (M) can be used with MAC1 versions either alone or with any of these other options. If multiple options are specified together for a MAC, the options must be separated by commas and enclosed within a single set of parentheses....
Table 15-5. Multiply and Multiply-Accumulate to Data Register Options
Option Description
Default Signed fraction format. Multiply 1.15 * 1.15 formats to produce 1.31 results after shift correction. The special case of 0x8000 * 0x8000 is saturated to 0x7FFF FFFF to fit the 1.31 result....
Table 15-5. Multiply and Multiply-Accumulate to Data Register Options (Cont’d)
Option Description
(ISS2) Signed integer with scaling. Multiply 16.0 * 16.0 formats to produce 32.0 results. No shift correction. (Same as the IS mode.) Sign extend 32.0 result to 40.0 format before copying or accumulating to Accumulator....
Flags Affected
This instruction affects flags as follows.
• V is set if the result extracted to the Dreg saturates; cleared if no saturation.
• VS is set if V is set; unaffected otherwise.
• AV0 is set if result in Accumulator A0 (MAC0 operation) saturates;...
Example
r4=(a0=r3.h*r2.h) ; /* MAC0, only. Both operands are signed fractions. Load the product into A0, then into r4. */
r3=(a1+=r6.h*r4.l) (fu) ; /* MAC1, only. Both operands are unsigned fractions. Add the product into A1, then into r3. */
Also See
Move Register,...
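The product formats described in the option tables can be illustrated with a small behavioral model. The following Python sketch is not part of the instruction set or of any Analog Devices tool; the function names are hypothetical, and it models only the 32-bit product step (no accumulation, extraction, or rounding), under the assumption that operands are passed as raw 16-bit patterns.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def product_signed_fraction(a, b):
    """Default mode: 1.15 * 1.15 -> 1.31, with the one-bit shift correction."""
    product = (to_signed16(a) * to_signed16(b)) << 1
    # Only 0x8000 * 0x8000 overflows 1.31; it is saturated to 0x7FFF FFFF.
    if product > 0x7FFFFFFF:
        product = 0x7FFFFFFF
    return product & 0xFFFFFFFF


def product_signed_integer(a, b):
    """(IS) mode: 16.0 * 16.0 -> 32.0, no shift correction."""
    return (to_signed16(a) * to_signed16(b)) & 0xFFFFFFFF


def product_unsigned_fraction(a, b):
    """(FU) mode: 0.16 * 0.16 -> 0.32, no shift correction."""
    return ((a & 0xFFFF) * (b & 0xFFFF)) & 0xFFFFFFFF
```

For example, product_signed_fraction(0x4000, 0x4000) models 0.5 * 0.5 and yields the 1.31 pattern for 0.25.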
The Dreg version of the Negate (Two’s-Complement) instruction is offered with or without saturation. The only case where the nonsaturating Negate would overflow is when the input value is 0x8000 0000. The saturating version returns 0x7FFF FFFF; the nonsaturating version returns 0x8000 0000....
• is set if src_reg is zero; otherwise it is cleared.
• All other flags are unaffected.
The ADSP-BF535 processor has fewer ASTAT flags and some flags operate differently than subsequent Blackfin family products. For more information on the ADSP-BF535 status flags, see Table A-1 on page A-3....
Also See
Vector Negate (Two’s-Complement)
Special Applications
None
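The overflow case described for the Dreg version of Negate can be sketched as a short behavioral model. This Python function is a hypothetical illustration (its name and the saturate parameter are not Blackfin syntax); it models only the 32-bit result and ignores flags.

```python
MASK32 = 0xFFFFFFFF


def negate32(value, saturate=False):
    """Two's-complement negate of a 32-bit pattern.

    The only overflowing input is 0x8000 0000: the saturating form
    returns 0x7FFF FFFF, the nonsaturating form returns 0x8000 0000.
    """
    value &= MASK32
    if value == 0x80000000:
        return 0x7FFFFFFF if saturate else 0x80000000
    return (-value) & MASK32
```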
RND (Round to Half-Word)
General Form
dest_reg = src_reg (RND)
Syntax
Dreg_lo_hi = Dreg (RND) ; /* round and saturate the source to 16 bits. (b) */
Syntax Terminology
Dreg: R7–0
Dreg_lo_hi: R7–0.L, R7–0.H
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Round to Half-Word instruction rounds a 32-bit, normalized-fraction number into a 16-bit, normalized-fraction number by extracting and...
Flags Affected
The following flags are affected by this instruction.
• AZ is set if result is zero; cleared if nonzero.
• AN is set if result is negative; cleared if non-negative.
• V is set if result saturates; cleared if no saturation.
•...
Also See
Add, Add/Subtract – Prescale Up, Add/Subtract – Prescale Down
Special Applications
None
Saturate
General Form
dest_reg = src_reg (S)
Syntax
A0 = A0 (S) ; /* (b) */
A1 = A1 (S) ; /* (b) */
A1 = A1 (S), A0 = A0 (S) ; /* signed saturate both Accumulators at the 32-bit boundary (b) */
Syntax Terminology
None...
Flags Affected
This instruction affects flags as follows.
• AZ is set if result is zero; cleared if nonzero. In the case of two simultaneous operations, AZ represents the logical “OR” of the two.
• AN is set if result is negative; cleared if non-negative. In the case of two simultaneous operations, AN represents the logical “OR”...
Example
a0 = a0 (s) ;
a1 = a1 (s) ;
a1 = a1 (s), a0 = a0 (s) ;
Also See
Subtract (saturate options), Add (saturate options)
Special Applications
None
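As a rough illustration of signed saturation of a 40-bit Accumulator at the 32-bit boundary, consider the following Python sketch. The function name is hypothetical, the Accumulator is passed as a raw 40-bit pattern, and flag updates are not modeled.

```python
MASK40 = 0xFFFFFFFFFF


def saturate_acc32(acc40):
    """Clamp a 40-bit two's-complement value to the signed 32-bit range.

    The result is returned as a 40-bit pattern, sign extended from bit 31.
    """
    acc40 &= MASK40
    signed = acc40 - (1 << 40) if acc40 & (1 << 39) else acc40
    clamped = max(-(1 << 31), min((1 << 31) - 1, signed))
    return clamped & MASK40
```

Values already inside the signed 32-bit range pass through unchanged; larger magnitudes clamp to 0x00 7FFF FFFF or 0xFF 8000 0000.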
• For a 32-bit input, Sign Bit returns the number of leading sign bits minus one, which is in the range 0 through 31. An input of all zeros or all ones returns +31 (all sign bits).
• For a 40-bit Accumulator input, Sign Bit returns the number of leading sign bits minus 9, which is in the range –8 through +31....
Required Mode
User & Supervisor
Parallel Issue
This instruction can be issued in parallel with specific other 16-bit instructions. For details, see “Issuing Parallel Instructions” on page 20-1.
Example
r2.l = signbits r7 ;
r1.l = signbits r5.l ;
r0.l = signbits r4.h ;...
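The Sign Bit return values described for 32-bit and 40-bit inputs can be checked with a small model. This Python sketch uses hypothetical helper names and takes raw bit patterns; it follows only the two rules stated in this section (leading sign bits minus one for 32-bit inputs, minus 9 for 40-bit Accumulator inputs).

```python
def leading_sign_bits(value, width):
    """Count how many leading bits equal the sign bit (including it)."""
    value &= (1 << width) - 1
    sign = (value >> (width - 1)) & 1
    count = 0
    for bit in range(width - 1, -1, -1):
        if (value >> bit) & 1 != sign:
            break
        count += 1
    return count


def signbits32(value):
    """32-bit input: leading sign bits minus one (0 through 31)."""
    return leading_sign_bits(value, 32) - 1


def signbits_acc40(value):
    """40-bit Accumulator input: leading sign bits minus 9 (-8 through +31)."""
    return leading_sign_bits(value, 40) - 9
```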
Subtract
General Form
dest_reg = src_reg_1 - src_reg_2
Syntax
32-Bit Operands, 32-Bit Result
Dreg = Dreg - Dreg ; /* no saturation support but shorter instruction length (a) */
Dreg = Dreg - Dreg (sat_flag) ; /* saturation optionally supported, but at the cost of longer instruction length (b) */
16-Bit Operands, 16-Bit Result
Dreg_lo_hi = Dreg_lo_hi -...
There are two ways to specify subtraction on 32-bit data. The instruction with 16-bit instruction length does not support saturation. The other instruction, which has 32-bit instruction length, optionally supports saturation. The larger DSP instruction can sometimes save execution time because it can be issued in parallel with certain other instructions....
• VS is set if V is set; unaffected otherwise.
• All other flags are unaffected.
The ADSP-BF535 processor has fewer ASTAT flags and some flags operate differently than subsequent Blackfin family products. For more information on the ADSP-BF535 status flags, see Table A-1 on page A-3....
Also See
Modify – Decrement, Vector Add / Subtract
Special Applications
None
Subtract Immediate
General Form
register -= constant
Syntax
Ireg -= 2 ; /* decrement Ireg by 2, half-word address pointer decrement (a) */
Ireg -= 4 ; /* word address pointer decrement (a) */
Syntax Terminology
Ireg: I3–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length....
Example: If you use I2 to increment your address pointer, first clear L2 to disable circular buffering. Failure to explicitly clear Lreg beforehand can result in unexpected Ireg values.
The circular address buffer registers (Index, Length, and Base) are not initialized automatically by Reset. Traditionally, user software clears all the circular address buffer registers during boot-up to disable circular buffering, then initializes them later, if needed....
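The reason a stale Length register can produce unexpected Index values can be sketched with a simplified model. The Python function below is an illustration only: its name and parameters are hypothetical, and the wraparound rule (add the length back when the decremented index falls below the base) is an assumption based on the general description of automatic circular addressing, not a statement from this section.

```python
def ireg_subtract(ireg, amount, lreg=0, breg=0):
    """Model 'Ireg -= amount' on 32-bit registers.

    Assumption: a nonzero Lreg enables circular buffering, and an index
    that drops below Breg wraps by adding Lreg. With Lreg == 0 the
    result is a plain decrement.
    """
    result = (ireg - amount) & 0xFFFFFFFF
    if lreg != 0 and result < breg:
        result = (result + lreg) & 0xFFFFFFFF
    return result
```

With a leftover nonzero length the same decrement lands at a different address than with the length cleared, which is the hazard the note above warns about.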
16 EXTERNAL EVENT MANAGEMENT
Instruction Summary
• “Idle” on page 16-3
• “Core Synchronize” on page 16-5
• “System Synchronize” on page 16-8
• “EMUEXCPT (Force Emulation)” on page 16-11
• “Disable Interrupts” on page 16-13
• “Enable Interrupts” on page 16-15
•...
Instruction Overview
This chapter discusses the instructions that manage external events. Users can take advantage of these instructions to enable interrupts, force a specific interrupt or reset to occur, or put the processor in idle state. The Core Synchronize instruction resolves all pending operations and flushes the core store buffer before proceeding to the next instruction....
Idle
General Form
IDLE
Syntax
IDLE ; /* (a) */
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
Typically, the Idle instruction is part of a sequence to place the Blackfin processor in a quiescent state so that the external system can switch between core clock frequencies....
Required Mode
The Idle instruction executes only in Supervisor mode. If execution is attempted in User mode, the instruction produces an Illegal Use of Protected Resource exception.
Parallel Issue
This instruction cannot be issued in parallel with other instructions.
Example
idle ;...
Core Synchronize
General Form
CSYNC
Syntax
CSYNC ; /* (a) */
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Core Synchronize (CSYNC) instruction ensures resolution of all pending core operations and the flushing of the core store buffer before proceeding to the next instruction....
Parallel Issue
The Core Synchronize instruction cannot be issued in parallel with other instructions.
Example
Consider the following example code sequence.
if cc jump away_from_here ; /* produces speculative branch prediction */
csync ;
r0 = [p0] ; /* load */
In this example, the CSYNC instruction ensures that the load instruction is...
Further, it usually allows loads to access memory speculatively. The core may later cancel or restart speculative loads. By using the Core Synchronize or System Synchronize instructions and managing interrupts appropriately, you can restrict out-of-order and speculative behavior. Stores never access memory speculatively....
System Synchronize
General Form
SSYNC
Syntax
SSYNC ; /* (a) */
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The System Synchronize (SSYNC) instruction forces all speculative, transient states in the core and system to complete before processing continues....
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The SSYNC instruction cannot be issued in parallel with other instructions.
Example
Consider the following example code sequence.
if cc jump away_from_here ; /* produces speculative branch prediction */
ssync ;...
Special Applications
Typically, SSYNC prepares the architecture for clock cessation or frequency change. In such cases, the following instruction sequence is typical.
instruction...
instruction...
CLI r0 ; /* disable interrupts */
idle ; /* enable Idle state */
ssync ;...
EMUEXCPT (Force Emulation)
General Form
EMUEXCPT
Syntax
EMUEXCPT ; /* (a) */
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Force Emulation instruction forces an emulation exception, thus allowing the processor to enter emulation mode. When emulation is enabled, the processor immediately takes an exception into emulation mode....
Example
emuexcpt ;
Also See
RAISE (Force Interrupt / Reset)
Special Applications
None
Disable Interrupts
General Form
CLI
Syntax
CLI Dreg ; /* previous state of IMASK moved to Dreg (a) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Disable Interrupts instruction globally disables general interrupts by setting IMASK to all zeros....
Parallel Issue
The Disable Interrupts instruction cannot be issued in parallel with other instructions.
Example
cli r3 ;
Also See
Enable Interrupts
Special Applications
This instruction is often issued immediately before an IDLE instruction.
Enable Interrupts
General Form
STI
Syntax
STI Dreg ; /* previous state of IMASK restored from Dreg (a) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Enable Interrupts instruction globally enables interrupts by restoring the previous state of the interrupt system back into IMASK.
Flags Affected...
Example
sti r3 ;
Also See
Disable Interrupts
Special Applications
This instruction is often located after an IDLE instruction so that it will execute after a wake-up event from the idle state.
RAISE (Force Interrupt / Reset)
General Form
RAISE
Syntax
RAISE uimm4 ; /* (a) */
Syntax Terminology
uimm4: 4-bit unsigned field, with the range of 0 through 15
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Force Interrupt / Reset instruction forces a specified interrupt or reset to occur....
Table 16-1. uimm4 Arguments and Events (Cont’d)
uimm4  Event
3      <reserved>
4      <reserved>
5      IVHW
6      IVTMR
7      IVG7
8      IVG8
9      IVG9
10     IVG10
11     IVG11
12     IVG12
13     IVG13
14     IVG14
15     IVG15
The Force Interrupt / Reset instruction cannot invoke Exception (EXC) or Emulation (EMU) events; use the EXCPT and EMUEXCPT instructions,...
Required Mode
The Force Interrupt / Reset instruction executes only in Supervisor mode. If execution is attempted in User mode, the Force Interrupt / Reset instruction produces an Illegal Use of Protected Resource exception.
Parallel Issue
The Force Interrupt / Reset instruction cannot be issued in parallel with other instructions....
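The continued part of Table 16-1 can be expressed as a lookup, which may help when decoding a RAISE operand. The Python sketch below is illustrative only: the names are hypothetical, and the uimm4 values are an assumption inferred from the event names (IVG7 through IVG15 occupying 7 through 15, preceded by IVTMR, IVHW, and two reserved entries). Values 0 through 2 belong to the earlier part of the table and are deliberately left out.

```python
# Rows of the continued part of the table; None marks a reserved value.
RAISE_EVENTS = {3: None, 4: None, 5: "IVHW", 6: "IVTMR"}
RAISE_EVENTS.update({n: "IVG%d" % n for n in range(7, 16)})


def raise_event_name(uimm4):
    """Return the event name for a RAISE operand covered by this sketch."""
    if uimm4 not in RAISE_EVENTS:
        raise ValueError("uimm4 value not covered by this part of the table")
    return RAISE_EVENTS[uimm4]
```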
EXCPT (Force Exception)
General Form
EXCPT
Syntax
EXCPT uimm4 ; /* (a) */
Syntax Terminology
uimm4: 4-bit unsigned field, with the range of 0 through 15
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Force Exception instruction forces an exception with code uimm4...
Parallel Issue
The Force Exception instruction cannot be issued in parallel with other instructions.
Example
excpt 4 ;
Also See
None
Special Applications
None
Test and Set Byte (Atomic)
General Form
TESTSET
Syntax
TESTSET ( Preg ) ; /* (a) */
Syntax Terminology
Preg: P5–0 (SP and FP are not allowed as the register for this instruction)
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Test and Set Byte (Atomic) instruction loads an indirectly addressed memory byte, tests whether it is zero, then sets the most significant bit of...
The software designer is responsible for executing atomic operations in the proper cacheable / non-cacheable memory space. Typically, these operations should execute in non-cacheable, off-core memory. In a chip implementation that requires tight temporal coupling between processors or processes, the design should implement a dedicated, non-cacheable block of memory that meets the data latency requirements of the system....
Parallel Issue
The TESTSET instruction cannot be issued in parallel with other instructions.
Example
testset (p1) ;
The TESTSET instruction may be preceded by a CSYNC or SSYNC instruction to ensure that all previous exceptions or interrupts have been processed before the atomic operation begins.
Also See
Core Synchronize,...
No Op
General Form
NOP
MNOP
Syntax
NOP ; /* (a) */
MNOP ; /* (b) */
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length. Comment (b) identifies 32-bit instruction length.
Functional Description
The No Op instruction increments the PC and does nothing else....
Parallel Issue
The 16-bit versions of this instruction can be issued in parallel with specific other instructions. For details, see “Issuing Parallel Instructions” on page 20-1.
Example
nop ;
mnop ;
mnop || /* a 16-bit instr. */ || /* a 16-bit instr. */ ;
Also See
None
Special Applications...
17 CACHE CONTROL
Instruction Summary
• “PREFETCH” on page 17-3
• “FLUSH” on page 17-5
• “FLUSHINV” on page 17-7
• “IFLUSH” on page 17-9
Instruction Overview
This chapter discusses the instructions that are used to flush, invalidate, and prefetch data cache lines as well as the instruction used to invalidate a line in the instruction cache....
to invalidate a buffer, but the instruction also performs a flush of data marked as “dirty.” The ITEST and DTEST registers, which are described in the “Memory” chapter, can also be used to directly invalidate a line in cache. Buffers in source memory need to be invalidated when a DMA channel is filling the buffer, data cache has been enabled, and the source memory has been defined as cacheable....
PREFETCH
General Form
PREFETCH
Syntax
PREFETCH [ Preg ] ; /* indexed (a) */
PREFETCH [ Preg ++ ] ; /* indexed, post increment (a) */
Syntax Terminology
Preg: P5–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Data Cache Prefetch instruction causes the data cache to prefetch the cache line that is associated with the effective address in the P-register....
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
This instruction cannot be issued in parallel with other instructions.
Example
prefetch [ p2 ] ;
prefetch [ p0 ++ ] ;
Also See
None
Special Applications
None
FLUSH
General Form
FLUSH
Syntax
FLUSH [ Preg ] ; /* indexed (a) */
FLUSH [ Preg ++ ] ; /* indexed, post increment (a) */
Syntax Terminology
Preg: P5–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Data Cache Flush instruction causes the data cache to synchronize the specified cache line with higher levels of memory....
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The instruction cannot be issued in parallel with other instructions.
Example
flush [ p2 ] ;
flush [ p0 ++ ] ;
Also See
None
Special Applications
None
FLUSHINV
General Form
FLUSHINV
Syntax
FLUSHINV [ Preg ] ; /* indexed (a) */
FLUSHINV [ Preg ++ ] ; /* indexed, post increment (a) */
Syntax Terminology
Preg: P5–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Data Cache Line Invalidate instruction causes the data cache to invalidate a specific line in the cache....
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
The Data Cache Line Invalidate instruction cannot be issued in parallel with other instructions.
Example
flushinv [ p2 ] ;
flushinv [ p0 ++ ] ;
Also See
None
Special Applications
None...
IFLUSH
General Form
IFLUSH
Syntax
IFLUSH [ Preg ] ; /* indexed (a) */
IFLUSH [ Preg ++ ] ; /* indexed, post increment (a) */
Syntax Terminology
Preg: P5–0
Instruction Length
In the syntax, comment (a) identifies 16-bit instruction length.
Functional Description
The Instruction Cache Flush instruction causes the instruction cache to invalidate a specific line in the cache....
Flags Affected
None
Required Mode
User & Supervisor
Parallel Issue
This instruction cannot be issued in parallel with other instructions.
Example
iflush [ p2 ] ;
iflush [ p0 ++ ] ;
Also See
None
Special Applications
None
18 VIDEO PIXEL OPERATIONS
Instruction Summary
• “ALIGN8, ALIGN16, ALIGN24” on page 18-3
• “DISALGNEXCPT” on page 18-6
• “BYTEOP3P (Dual 16-Bit Add / Clip)” on page 18-8
• “Dual 16-Bit Accumulator Extraction with Addition” on page 18-13
• “BYTEOP16P (Quad 8-Bit Add)” on page 18-15
•...
Instruction Overview
This chapter discusses the instructions that manipulate video pixels. Users can take advantage of these instructions to align bytes, disable exceptions that result from misaligned 32-bit memory accesses, and perform dual and quad 8- and 16-bit add, subtract, and averaging operations.
Table 18-1. Byte Alignment Options
                      src_reg_1                  src_reg_0
                      byte7 byte6 byte5 byte4    byte3 byte2 byte1 byte0
dest_reg for ALIGN8:  byte4 byte3 byte2 byte1
dest_reg for ALIGN16: byte5 byte4 byte3 byte2
dest_reg for ALIGN24: byte6 byte5 byte4 byte3
The input values are not implicitly modified by this instruction. The destination register can be the same D-register as one of the source registers....
Example
// If r3 = 0xABCD 1234 and r4 = 0xBEEF DEAD, then . . .
r0 = align8 (r3, r4) ; /* produces r0 = 0x34BE EFDE, */
r0 = align16 (r3, r4) ; /* produces r0 = 0x1234 BEEF, and */
r0 = align24 (r3, r4) ;...
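The byte selection in Table 18-1 amounts to shifting the 64-bit concatenation of the two source registers right by one, two, or three bytes and keeping the low 32 bits. The following Python sketch (hypothetical function name, raw 32-bit patterns as inputs) models that selection and reproduces the results in the example above.

```python
def align(src_reg_1, src_reg_0, shift_bytes):
    """Model ALIGN8/16/24 with shift_bytes = 1, 2, or 3.

    src_reg_1 supplies byte7..byte4 and src_reg_0 supplies byte3..byte0
    of the combined 64-bit value.
    """
    combined = ((src_reg_1 & 0xFFFFFFFF) << 32) | (src_reg_0 & 0xFFFFFFFF)
    return (combined >> (8 * shift_bytes)) & 0xFFFFFFFF
```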
DISALGNEXCPT
General Form
DISALGNEXCPT
Syntax
DISALGNEXCPT ; /* (b) */
Syntax Terminology
None
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Disable Alignment Exception for Load (DISALGNEXCPT) instruction prevents exceptions that would otherwise be caused by misaligned 32-bit memory loads issued in parallel....
Required Mode
User & Supervisor
Parallel Issue
This instruction can be issued in parallel with specific other 16-bit instructions. For details, see “Issuing Parallel Instructions” on page 20-1.
Example
disalgnexcpt || r1 = [i0++] || r3 = [i1++] ; /* three instructions in parallel */
disalgnexcpt || [p0 ++ p1] = r5 || r3 = [i1++] ;...
as bytes on half-word boundaries in one 32-bit destination register. Some syntax options load the upper byte in the half-word and others load the lower byte, as shown in Table 18-2, Table 18-3, and Table 18-4.
Table 18-2. Assuming the source registers contain:
31....24 23....16 15....8...
The Dual 16-Bit Add / Clip instruction provides byte alignment directly in the source register pairs src_reg_0 and src_reg_1 based on index registers I0 and I1.
• The two LSBs of the I0 register determine the byte alignment for source register pair src_reg_0 (typically R1:0...
order versions of this instruction. By default, the low order bytes come from the low register in the register pair. The ( – , R) option causes the low order bytes to come from the high register. In the optional reverse source order case (for example, using the ( –...
r3 = byteop3p (r1:0, r3:2) (lo, r) ;
r3 = byteop3p (r1:0, r3:2) (hi, r) ;
Also See
BYTEOP16P (Quad 8-Bit Add)
Special Applications
This instruction is primarily intended for video motion compensation algorithms. The instruction supports the addition of the residual to a video pixel value, followed by unsigned byte saturation....
Dual 16-Bit Accumulator Extraction with Addition
General Form
dest_reg_1 = A1.L + A1.H, dest_reg_0 = A0.L + A0.H
Syntax
Dreg = A1.L + A1.H, Dreg = A0.L + A0.H ; /* (b) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length....
Parallel Issue
This instruction can be issued in parallel with specific other 16-bit instructions. For details, see “Issuing Parallel Instructions” on page 20-1.
Example
r4=a1.l+a1.h, r7=a0.l+a0.h ;
Also See
SAA (Quad 8-Bit Subtract-Absolute-Accumulate)
Special Applications
Use the Dual 16-Bit Accumulator Extraction with Addition instruction for motion estimation algorithms in conjunction with the Quad 8-Bit Subtract-Absolute-Accumulate instruction....
Table 18-7. Source Registers Contain
                    31....24  23....16  15....8  7....0
aligned_src_reg_0:  y3        y2        y1       y0
aligned_src_reg_1:  z3        z2        z1       z0
Table 18-8. Destination Registers Receive
              31....16   15....0
dest_reg_0:   y1 + z1    y0 + z0
dest_reg_1:   y3 + z3    y2 + z2
The Quad 8-Bit Add instruction provides byte alignment directly in the source register pairs based on index registers src_reg_0...
Table 18-9. I-register Bits and the Byte Alignment
The bytes selected are
                        src_reg_pair_HI            src_reg_pair_LO
Two LSB’s of I0 or I1   byte7 byte6 byte5 byte4    byte3 byte2 byte1 byte0
00b:                    byte3 byte2 byte1 byte0
01b:                    byte4 byte3 byte2 byte1
10b:                    byte5...
Flags Affected
None
The ADSP-BF535 processor has fewer ASTAT flags and some flags operate differently than subsequent Blackfin family products. For more information on the ADSP-BF535 status flags, see Table A-1 on page A-3.
Required Mode
User & Supervisor
Parallel Issue
This instruction can be issued in parallel with specific other 16-bit instructions....
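The byte-wise sums listed in Table 18-8 can be sketched as follows. This Python model is an illustration under stated assumptions: the function name is hypothetical, the inputs are the already byte-aligned 32-bit source values (y3..y0 and z3..z0), and the two result rows are taken to be two destination registers, called dest_reg_0 and dest_reg_1 here, each holding two 16-bit sums.

```python
def byteop16p(aligned_src_0, aligned_src_1):
    """Quad 8-bit add on byte-aligned sources.

    Returns (dest_reg_1, dest_reg_0): dest_reg_0 holds y1+z1 and y0+z0,
    dest_reg_1 holds y3+z3 and y2+z2, one unsigned sum per half-word.
    """
    sums = []
    for i in range(4):
        y = (aligned_src_0 >> (8 * i)) & 0xFF
        z = (aligned_src_1 >> (8 * i)) & 0xFF
        sums.append(y + z)  # at most 510, so it fits in 16 bits
    dest_reg_0 = (sums[1] << 16) | sums[0]
    dest_reg_1 = (sums[3] << 16) | sums[2]
    return dest_reg_1, dest_reg_0
```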
Table 18-11. Source Registers Contain
                    31....24  23....16  15....8  7....0
aligned_src_reg_0:  y3        y2        y1       y0
aligned_src_reg_1:  z3        z2        z1       z0
Table 18-12. Destination Registers Receive
            31....24     23....16     15....8      7....0
dest_reg:   avg(y3, z3)  avg(y2, z2)  avg(y1, z1)  avg(y0, z0)
Arithmetic average (or mean) is calculated by summing the two operands, then shifting right one place to divide by two....
The relationship between the I-register bits and the byte alignment is illustrated below. In the default source order case (that is, without the (R) syntax), assume a source register pair contains the data shown in Table 18-13.
Table 18-13....
Table 18-14. Options for Quad 8-Bit Average – Byte (Cont’d)
Option Description
(R) Reverses the order of the source registers within each register pair. Typical high performance applications cannot afford the overhead of reloading both register pair operands to maintain byte order for every calculation. Instead, they alternate and load only one register pair operand each time and alternate between the forward and reverse byte order versions of this instruction....
Flags Affected
None
The ADSP-BF535 processor has fewer ASTAT flags and some flags operate differently than subsequent Blackfin family products. For more information on the ADSP-BF535 status flags, see Table A-1 on page A-3.
Required Mode
User & Supervisor
Parallel Issue
This instruction can be issued in parallel with specific other 16-bit instructions....
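The byte-wise average described for this instruction (sum the two operands, then shift right one place) can be modeled in a few lines. In the Python sketch below the function name and the truncate parameter are hypothetical; treating the non-truncating case as "add one before shifting" is an assumption about the rounding behavior, not something stated in this excerpt.

```python
def byteop1p(aligned_src_0, aligned_src_1, truncate=False):
    """Quad 8-bit average on byte-aligned sources.

    Each result byte is (y + z) >> 1 when truncating; otherwise a
    rounding bit is added before the shift (assumed behavior).
    """
    result = 0
    for i in range(4):
        y = (aligned_src_0 >> (8 * i)) & 0xFF
        z = (aligned_src_1 >> (8 * i)) & 0xFF
        total = y + z + (0 if truncate else 1)
        result |= ((total >> 1) & 0xFF) << (8 * i)
    return result
```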
Syntax Terminology
Dreg: R7–0
Dreg_pair: R1:0, R3:2, only
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Quad 8-Bit Average – Half-Word instruction finds the arithmetic average of two unsigned quad byte number sets byte wise, adjusting for byte alignment....
Instruction Overview Table 18-18. And the versions that load the result into the higher byte – RNDH and TH – produce: 31....24 23....16 15....8 7....0 dest_reg: avg(y3, y2, z3, 0 ..0 avg(y1, y0, z1, 0 .
Table 18-19. I-register Bits and the Byte Alignment
The bytes selected are
                        src_reg_pair_HI            src_reg_pair_LO
Two LSB’s of I0 or I1   byte7 byte6 byte5 byte4    byte3 byte2 byte1 byte0
00b:                    byte3 byte2 byte1 byte0
01b:                    byte4 byte3 byte2 byte1
10b:                    byte5...
In the optional reverse source order case (for example, using the (R) syntax), the only difference is the source registers swap places within the register pair in their byte ordering. Assume a source register pair contains the data shown in Table 18-21....
Parallel Issue
This instruction can be issued in parallel with specific other 16-bit instructions. For details, see “Issuing Parallel Instructions” on page 20-1.
Example
r3 = byteop2p (r1:0, r3:2) (rndl) ;
r3 = byteop2p (r1:0, r3:2) (rndh) ;
r3 = byteop2p (r1:0, r3:2) (tl) ;...
BYTEPACK (Quad 8-Bit Pack)
General Form
dest_reg = BYTEPACK ( src_reg_0, src_reg_1 )
Syntax
Dreg = BYTEPACK ( Dreg, Dreg ) ; /* (b) */
Syntax Terminology
Dreg: R7–0
Instruction Length
In the syntax, comment (b) identifies 32-bit instruction length.
Functional Description
The Quad 8-Bit Pack instruction packs four 8-bit values, half-word aligned, contained in two source registers into one register, byte aligned as...
Flags Affected
None
The ADSP-BF535 processor has fewer ASTAT flags and some flags operate differently than subsequent Blackfin family products. For more information on the ADSP-BF535 status flags, see Table A-1 on page A-3.
Required Mode
User & Supervisor
Parallel Issue
This instruction can be issued in parallel with specific other 16-bit instructions....
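The packing of four half-word-aligned bytes into one register can be sketched as below. This Python model is hedged: the function name is hypothetical, and the byte order (src_reg_0 supplying the two low result bytes, src_reg_1 the two high ones, each taken from the low byte of a half-word) is an assumption, since the table that defines the layout is cut off in this excerpt.

```python
def bytepack(src_reg_0, src_reg_1):
    """Pack the low byte of each half-word of two registers into one.

    Assumed layout: result = byte3 byte2 byte1 byte0, where byte1:byte0
    come from src_reg_0 and byte3:byte2 come from src_reg_1.
    """
    byte0 = src_reg_0 & 0xFF
    byte1 = (src_reg_0 >> 16) & 0xFF
    byte2 = src_reg_1 & 0xFF
    byte3 = (src_reg_1 >> 16) & 0xFF
    return (byte3 << 24) | (byte2 << 16) | (byte1 << 8) | byte0
```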
Table 18-25. Destination Registers Receive
              31....16   15....0
dest_reg_0:   y1 - z1    y0 - z0
dest_reg_1:   y3 - z3    y2 - z2
The only valid input source register pairs are R1:0 and R3:2.
The Quad 8-Bit Subtract instruction provides byte alignment directly in the source register pairs based on index registers src_reg_0...
Options
The (R) syntax reverses the order of the source registers within each register pair. Typical high performance applications cannot afford the overhead of reloading both register pair operands to maintain byte order for every calculation. Instead, they alternate and load only one register pair operand each time and alternate between the forward and reverse byte order versions of this instruction....
Flags Affected
None
The ADSP-BF535 processor has fewer ASTAT flags and some flags operate differently than subsequent Blackfin family products. For more information on the ADSP-BF535 status flags, see Table A-1 on page A-3.
Required Mode
User & Supervisor
Parallel Issue
This instruction can be issued in parallel with specific other 16-bit instructions....
SAA (Quad 8-Bit Subtract-Absolute-Accumulate)
General Form
SAA ( src_reg_0, src_reg_1 )
SAA ( src_reg_0, src_reg_1 ) (R)
Syntax
SAA (Dreg_pair, Dreg_pair) ; /* forward byte order operands (b) */
SAA (Dreg_pair, Dreg_pair) (R) ; /* reverse byte order operands (b) */
Syntax Terminology
(This instruction only supports register pairs...
$$\mathrm{SAD} = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} \lvert a(i,j) - b(i,j) \rvert$$
Figure 18-1. Absolute Difference (SAD) Calculations
Typical values for N are 8 and 16, corresponding to the video block size of 8x8 and 16x16 pixels, respectively....
In the default source order case (that is, without the (R) syntax), assume a source register pair contains the data shown in Table 18-29.
Table 18-29. I-register Bits and the Byte Alignment
The bytes selected are
                        src_reg_pair_HI            src_reg_pair_LO
Two LSB’s of I0 or I1   byte7 byte6...
Table 18-30. I-register Bits and the Byte Alignment
The bytes selected are
                        src_reg_pair_LO            src_reg_pair_HI
Two LSB’s of I0 or I1   byte7 byte6 byte5 byte4    byte3 byte2 byte1 byte0
00b:                    byte3 byte2 byte1 byte0
01b:                    byte4 byte3 byte2 byte1
10b:                    byte5...
Also See
DISALGNEXCPT, Load Data Register
Special Applications
Use the Quad 8-Bit Subtract-Absolute-Accumulate instruction for block-based video motion estimation algorithms using block Sum of Absolute Difference (SAD) calculations to measure distortion.
Instruction Overview Table 18-31. I-register Bits and the Byte Alignment The bytes selected are src_reg_pair_HI src_reg_pair_LO Two LSB’s of I0 or I1 byte7 byte6 byte5 byte4 byte3 byte2 byte1 byte0 00b: byte3 byte2 byte1 byte0 01b: byte4 byte3 byte2 byte1 10b: byte5 byte4...
Video Pixel Operations The four bytes, now byte aligned, are copied into the destination registers on half-word alignment, as shown in Table 18-33 Table 18-34. Table 18-33. Source Register Contains 31....24 23....16 15....8 7....0 Aligned bytes: byte_D byte_C byte_B byte_A Table 18-34.
Instruction Overview Example (r6,r5) = byteunpack r1:0 ; /* non-reversing sources */ • Assuming: • register ’s two LSBs = 00b, • = 0xFEED FACE • = 0xBEEF BADD then this instruction returns: • = 0x00BE 00EF • = 0x00BA 00DD •...
Video Pixel Operations • Assuming: • register ’s two LSBs = 10b, • = 0xFEED FACE • = 0xBEEF BADD then this instruction returns: • = 0x00FA 00CE • = 0x00BE 00EF • Assuming: • register ’s two LSBs = 11b, •...
Instruction Overview • Assuming: • register ’s two LSBs = 00b, • = 0xFEED FACE • = 0xBEEF BADD then this instruction returns: • = 0x00FE 00ED • = 0x00FA 00CE • Assuming: • register ’s two LSBs = 01b, •...
Video Pixel Operations • Assuming: • register ’s two LSBs = 11b, • = 0xFEED FACE • = 0xBEEF BADD then this instruction returns: • = 0x00EF 00BA • = 0x00DD 00FE Also See BYTEPACK (Quad 8-Bit Pack) Special Applications None ADSP-BF53x/BF56x Blackfin Processor Programming Reference 18-47...
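The BYTEUNPACK examples can be cross-checked with a small behavioral model. The Python sketch below is an illustration, not vendor code. It assumes that the source pair is R1:0 with R1 = 0xFEED FACE and R0 = 0xBEEF BADD, that the two results are listed in the order of the destination pair (r6, r5), and that the (R) option simply swaps the two source registers before byte selection. The function name `byteunpack` is hypothetical.

```python
def byteunpack(src_hi, src_lo, ireg, reverse=False):
    """Behavioral model of BYTEUNPACK; returns (dest_1, dest_0).

    src_hi, src_lo: 32-bit values of the source register pair.
    ireg: value of I0 (or I1); only its two LSBs select the alignment.
    reverse: models the (R) syntax by swapping the source registers.
    """
    if reverse:
        src_hi, src_lo = src_lo, src_hi
    combined = ((src_hi & 0xFFFFFFFF) << 32) | (src_lo & 0xFFFFFFFF)
    # 00b selects byte3..byte0, 01b byte4..byte1, 10b byte5..byte2,
    # 11b byte6..byte3.
    aligned = (combined >> (8 * (ireg & 0x3))) & 0xFFFFFFFF
    byte_a = aligned & 0xFF
    byte_b = (aligned >> 8) & 0xFF
    byte_c = (aligned >> 16) & 0xFF
    byte_d = (aligned >> 24) & 0xFF
    # Each byte is zero-extended into its own 16-bit half-word.
    dest_1 = (byte_d << 16) | byte_c
    dest_0 = (byte_b << 16) | byte_a
    return dest_1, dest_0


r1, r0 = 0xFEEDFACE, 0xBEEFBADD
for lsbs in range(4):
    d1, d0 = byteunpack(r1, r0, lsbs)
    print(f"I0 LSBs={lsbs:02b}: 0x{d1:08X} 0x{d0:08X}")
```

Under these assumptions the model reproduces the listed results, for example 0x00BE 00EF and 0x00BA 00DD for LSBs = 00b in the non-reversing case.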
19 VECTOR OPERATIONS Instruction Summary • “Add on Sign” on page 19-3 • “VIT_MAX (Compare-Select)” on page 19-8 • “Vector ABS” on page 19-15 • “Vector Add / Subtract” on page 19-18 • “Vector Arithmetic Shift” on page 19-23 • “Vector Logical Shift”...
Instruction Overview Instruction Overview This chapter discusses the instructions that control vector operations. Users can take advantage of these instructions to perform simultaneous operations on multiple 16-bit values, including add, subtract, multiply, shift, negate, pack, and search. Compare-Select and Add-On-Sign are also included in this chapter.
Vector Operations Add on Sign General Form dest_hi = dest_lo = SIGN (src0_hi) * src1_hi + SIGN (src0_lo) * src1_lo Syntax Dreg_hi = Dreg_lo = SIGN ( Dreg_hi ) * Dreg_hi + SIGN ( Dreg_lo ) * Dreg_lo ; /* (b) */ Register Consistency The destination registers must be halves of the same...
Instruction Overview Functional Description The Add on Sign instruction performs a two-step function, as follows. 1. Multiply the arithmetic sign of a 16-bit half-word number in src0 by the corresponding half-word number in src1. The arithmetic sign of src0 is either (+1) or (–1), depending on the sign bit of src0.
Vector Operations Table 19-2. Source Registers Contain 31....24 23....16 15....8 7....0 src0: src1: Table 19-3. Destination Register Receives 31....24 23....16 15....8 7....0 dest: (sign_adjusted_b1) + (sign_adjusted_b1) + (sign_adjusted_b0) (sign_adjusted_b0) Flags Affected None The ADSP-BF535 processor has fewer flags and some flags ASTAT operate differently than subsequent Blackfin family products.
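The Add on Sign general form can be expressed as a short behavioral sketch. The Python below is illustrative only. It assumes that a clear sign bit (including a zero operand) counts as (+1) and that the 16-bit sum wraps rather than saturates; neither detail is confirmed by the text shown here. The function names are hypothetical.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def add_on_sign(src0, src1):
    """Model of dest_hi = dest_lo = SIGN(src0_hi)*src1_hi + SIGN(src0_lo)*src1_lo."""
    def sign_adjust(s0_half, s1_half):
        # Sign bit of the src0 half selects +1 or -1 for the src1 half.
        magnitude = to_signed16(s1_half)
        return -magnitude if s0_half & 0x8000 else magnitude

    total = (sign_adjust(src0 >> 16, src1 >> 16)
             + sign_adjust(src0 & 0xFFFF, src1 & 0xFFFF))
    half = total & 0xFFFF  # assumption: result wraps to 16 bits
    return (half << 16) | half  # same value in both destination halves


# src0.h negative, src0.l positive: (-1 * 3) + (+1 * 5) = 2
print(hex(add_on_sign(0x80000001, 0x00030005)))
```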
Vector Operations Functional Description The Compare-Select (VIT_MAX) instruction selects the maximum values of pairs of 16-bit operands, returns the largest values to the destination register, and serially records in A0.W the source of the maximum. This operation performs signed operations. The operands are compared as two’s-complements....
Instruction Overview Table 19-6. ASL Version Shifts A0.X A0.W 00000000 XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXBB Table 19-7. Where Indicates z0 and y0 are maxima z0 and y1 are maxima z1 and y0 are maxima z1 and y1 are maxima Conversely, the ASR version shifts right two bit positions and appends two MSBs to indicate the source of each maximum as shown in Table 19-8...
Vector Operations Notice that the history bit code depends on the shift direction. The bit for src_reg_1 is always shifted onto A0 first, followed by the bit for src_reg_0. The single operand versions behave similarly. Single 16-Bit Operand Behavior If the dual source register contains the data shown in Table 19-10, the destination register receives the data shown in...
Instruction Overview Table 19-13. ASR Version Shifts A0.X A0.W 00000000 BXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX Table 19-14. Where Indicates y0 is the maximum y1 is the maximum The path metrics are allowed to overflow, and maximum comparison is done on the two’s-complement circle. Such comparison gives a better indication of the relative magnitude of two large numbers when a small number is added/subtracted to both.
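Comparison on the two’s-complement circle can be illustrated as follows. The Python sketch shows one common way to realize such a comparison for 16-bit values, by examining the sign of the wrapped difference; whether the hardware uses exactly this rule is an assumption, and the function names are hypothetical.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def circular_greater(a, b):
    """True if a is ahead of b on the 16-bit two's-complement circle.

    The difference is taken modulo 2**16, so two metrics that have both
    overflowed by a similar amount still compare in the expected order.
    """
    diff = (a - b) & 0xFFFF
    return 0 < diff < 0x8000


# A path metric of 0x7FFE plus 4 wraps to 0x8002. A plain signed
# comparison against 0x7FFF now gives the "wrong" order; the circular
# comparison does not.
wrapped = (0x7FFE + 4) & 0xFFFF
print(to_signed16(wrapped) > to_signed16(0x7FFF))  # plain signed compare
print(circular_greater(wrapped, 0x7FFF))           # circular compare
```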
Vector Operations Vector ABS General Form dest_reg = ABS source_reg (V) Syntax Dreg = ABS Dreg (V) ; /* (b) */ Syntax Terminology Dreg R7–0 Instruction Length In the syntax, comment (b) identifies 32-bit instruction length. Functional Description The Vector Absolute Value instruction calculates the individual absolute values of the upper and lower halves of a single 32-bit data register.
Instruction Overview Table 19-16. Destination Register Contains 31....24 23....16 15....8 7....0 dest_reg: | x.h| | x.l | This instruction saturates the result. Flags Affected This instruction affects flags as follows. • AZ is set if either or both results are zero; cleared if both are nonzero. •...
Vector Operations Example /* If r1 = 0xFFFF 7FFF, then . . . */ r3 = abs r1 (v) ; /* . . . produces 0x0001 7FFF */ Also See Special Applications None ADSP-BF53x/BF56x Blackfin Processor Programming Reference 19-17...
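The Vector ABS example can be reproduced with a behavioral sketch. The Python below is illustrative, not Blackfin code; it models saturation by clamping the result of each half to 0x7FFF, and the function names are hypothetical.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def vector_abs(src):
    """Model of dest_reg = ABS source_reg (V) on a 32-bit register value."""
    def abs16(half):
        # |0x8000| does not fit in 16 signed bits, so it saturates.
        return min(abs(to_signed16(half)), 0x7FFF)

    return (abs16(src >> 16) << 16) | abs16(src & 0xFFFF)


print(hex(vector_abs(0xFFFF7FFF)))  # the r1 value from the example
```

With r1 = 0xFFFF 7FFF, the model returns 0x0001 7FFF, matching the example comment.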
Instruction Overview Vector Add / Subtract General Form dest = src_reg_0 +|+ src_reg_1 dest = src_reg_0 –|+ src_reg_1 dest = src_reg_0 +|– src_reg_1 dest = src_reg_0 –|– src_reg_1 dest_0 = src_reg_0 +|+ src_reg_1, dest_1 = src_reg_0 –|– src_reg_1 dest_0 = src_reg_0 +|– src_reg_1, dest_1 = src_reg_0 –|+ src_reg_1 dest_0 = src_reg_0 + src_reg_1, dest_1 = src_reg_0 –...
Vector Operations Dual 32-Bit Operations Dreg = Dreg + Dreg, Dreg = Dreg – Dreg (opt_mode_1) ; /* add, subtract; the set of source registers must be the same for each operation (b) */ Dual 40-Bit Accumulator Operations Dreg = A1 + A0, Dreg = A1 –...
Instruction Overview Options The Vector Add / Subtract instruction provides three option modes. • supports the Dual and Quad 16-Bit Operations ver- opt_mode_0 sions of this instruction. • supports the Dual 32-bit and 40-bit operations. opt_mode_1 • supports the Quad 16-Bit Operations versions of this opt_mode_2 instruction.
Vector Operations Flags Affected This instruction affects the following flags. • is set if any results are zero; cleared if all are nonzero. • is set if any results are negative; cleared if all non-negative. • is set if the right-hand side of a dual operation generates a carry;...
Instruction Overview r0=r2 +|- r1(co) ; /* add|subtract with half-word results crossed over in the destination register */ r7=r3 -|- r6(sco) ; /* subtract|subtract with saturation and half-word results crossed over in the destination register */ r5=r3 +|+ r4, r7=r3-|-r4 ; /* quad 16-bit operations, add|add, subtract|subtract */ r5=r3 +|- r4, r7=r3 -|+ r4 ;...
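The dual 16-bit forms of Vector Add / Subtract can be sketched as a behavioral model. The Python below is an illustration under stated assumptions: the operator to the left of the vertical bar applies to the upper half-words and the operator to the right applies to the lower half-words, results wrap unless saturation is requested, and the crossover option swaps the two result halves. The function and parameter names are hypothetical, and flag updates are not modeled.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def vector_addsub(src0, src1, op_hi="+", op_lo="+", saturate=False, cross=False):
    """Model of dest = src0 op_hi|op_lo src1 with optional (S), (CO), (SCO)."""
    def combine(a, b, op):
        a, b = to_signed16(a), to_signed16(b)
        result = a + b if op == "+" else a - b
        if saturate:
            result = max(-0x8000, min(0x7FFF, result))
        return result & 0xFFFF

    hi = combine(src0 >> 16, src1 >> 16, op_hi)
    lo = combine(src0, src1, op_lo)
    if cross:
        hi, lo = lo, hi  # half-word results crossed over in the destination
    return (hi << 16) | lo


# r0 = r2 +|- r1 (co) with r2 = 0x0001 0002 and r1 = 0x0003 0004
print(hex(vector_addsub(0x00010002, 0x00030004, "+", "-", cross=True)))
```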
Vector Operations Vector Arithmetic Shift General Form dest_reg = src_reg >>> shift_magnitude (V) dest_reg = ASHIFT src_reg BY shift_magnitude (V) Syntax Constant Shift Magnitude Dreg = Dreg >>> uimm4 (V) ; /* arithmetic shift right, immedi- ate (b) */ Dreg = Dreg << uimm4 (V,S) ; /* arithmetic shift left, immedi- ate with saturation (b) */ Registered Shift Magnitude...
Instruction Overview Instruction Length In the syntax, comment (b) identifies 32-bit instruction length. Functional Description The Vector Arithmetic Shift instruction arithmetically shifts a pair of half-word registered numbers a specified distance and direction. Though the two half-word registers are shifted at the same time, the two numbers are kept separate.
Vector Operations “ASHIFT” Syntax Both half-word registers in are shifted by the number of places src_reg prescribed in , and the result stored into shift_magnitude dest_reg The sign of the shift magnitude determines the direction of the shift for versions. ASHIFT •...
Instruction Overview Flags Affected This instruction affects flags as follows. • AZ is set if either result is zero; cleared if both are nonzero. • AN is set if either result is negative; cleared if both are non-negative. • V is set if either result overflows; cleared if neither overflows. •...
Vector Operations Example r4=r5>>>3 (v) ; /* arithmetic right shift immediate R5.H and R5.L by 3 bits (divide each half-word by 8) If r5 = 0x8004 000F then the result is r4 = 0xF000 0001 */ r4=r5>>>3 (v, s) ; /* same as above, but saturate the result */ r2=ashift r7 by r5.l (v) ;...
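The arithmetic right shift in the first example can be checked with a behavioral sketch. The Python below is illustrative only; it models the immediate right-shift form without saturation, and the function names are hypothetical.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def vector_ashift_right(src, count):
    """Model of dest = src >>> count (V): each half is shifted separately."""
    def shift_half(half):
        # Python's >> on a negative int replicates the sign bit, which is
        # the behavior an arithmetic right shift requires.
        return (to_signed16(half) >> count) & 0xFFFF

    return (shift_half(src >> 16) << 16) | shift_half(src)


print(hex(vector_ashift_right(0x8004000F, 3)))  # the r5 value from the example
```

For r5 = 0x8004 000F and a shift of 3, the model returns 0xF000 0001, matching the example comment. Bits shifted out of the upper half-word do not appear in the lower half-word.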
Vector Operations Logical shifts discard any bits shifted out of the register and backfill vacated bits with zeros. “>>” AND “<<” Syntax The two half-word registers in are shifted by the number of dest_reg places specified by and the result stored into shift_magnitude dest_reg The data is always a pair of 16-bit half-registers.
Instruction Overview Flags Affected This instruction affects flags as follows. • AZ is set if either result is zero; cleared if both are nonzero. • AN is set if either result is negative; cleared if both are non-negative. • V is cleared. • All other flags are unaffected. The ADSP-BF535 processor has fewer ASTAT flags, and some ASTAT flags...
Vector Operations Example r4=r5>>3 (v) ; /* logical right shift immediate R5.H and R5.L by 3 bits */ r4=r5<<3 (v) ; /* logical left shift immediate R5.H and R5.L by 3 bits */ r2=lshift r7 by r5.l (v) ; /* logically shift (right or left, depending on sign of r5.l) R7.H and R7.L by magnitude of R5.L */ Also See Vector Arithmetic...
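The immediate forms of the Vector Logical Shift can be sketched in the same way. The Python below is a behavioral illustration, not Blackfin code; it shows that vacated bits are filled with zeros and that bits shifted out of one half-word are discarded rather than carried into the other. The function names are hypothetical.

```python
def vector_lshift_right(src, count):
    """Model of dest = src >> count (V): zero-fill from the left of each half."""
    hi = ((src >> 16) & 0xFFFF) >> count
    lo = (src & 0xFFFF) >> count
    return (hi << 16) | lo


def vector_lshift_left(src, count):
    """Model of dest = src << count (V): zero-fill from the right of each half."""
    hi = (((src >> 16) & 0xFFFF) << count) & 0xFFFF
    lo = ((src & 0xFFFF) << count) & 0xFFFF
    return (hi << 16) | lo


value = 0x8004000F
print(hex(vector_lshift_right(value, 3)))
print(hex(vector_lshift_left(value, 3)))
```

Compared with the arithmetic version, the right shift of 0x8004 yields 0x1000 rather than 0xF000, because the sign bit is not replicated.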
Instruction Overview Vector MAX General Form dest_reg = MAX ( src_reg_0, src_reg_1 ) (V) Syntax Dreg = MAX ( Dreg , Dreg ) (V) ; /* dual 16-bit operations (b) */ Syntax Terminology Dreg R7–0 Instruction Length In the syntax, comment (b) identifies 32-bit instruction length. Functional Description The Vector Maximum instruction returns the maximum value (meaning the largest positive value, nearest to 0x7FFF) of the 16-bit half-word...
Vector Operations Flags Affected This instruction affects flags as follows. • AZ is set if either or both results are zero; cleared if both are nonzero. • AN is set if either or both results are negative; cleared if both are non-negative. •...
Instruction Overview Also See Vector SEARCH, Vector MIN, MAX, Special Applications None 19-34 ADSP-BF53x/BF56x Blackfin Processor Programming Reference...
Vector Operations Vector MIN General Form dest_reg = MIN ( src_reg_0, src_reg_1 ) (V) Syntax Dreg = MIN ( Dreg , Dreg ) (V) ; /* dual 16-bit operation (b) */ Syntax Terminology Dreg R7–0 Instruction Length In the syntax, comment (b) identifies 32-bit instruction length. Functional Description The Vector Minimum instruction returns the minimum value (the most negative value or the value closest to 0x8000) of the 16-bit half-word...
Instruction Overview Flags Affected This instruction affects flags as follows. • AZ is set if either or both results are zero; cleared if both are nonzero. • AN is set if either or both results are negative; cleared if both are non-negative. •...
Vector Operations Also See Vector SEARCH, Vector MAX, MAX, Special Applications None ADSP-BF53x/BF56x Blackfin Processor Programming Reference 19-37...
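Vector MAX and Vector MIN both compare corresponding half-words as signed 16-bit values. The Python sketch below is a behavioral illustration with hypothetical function names; it does not model flag updates.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def _per_half(select, src0, src1):
    """Apply select (max or min) to the upper and lower halves separately."""
    hi = select(to_signed16(src0 >> 16), to_signed16(src1 >> 16)) & 0xFFFF
    lo = select(to_signed16(src0), to_signed16(src1)) & 0xFFFF
    return (hi << 16) | lo


def vector_max(src0, src1):
    """Model of dest_reg = MAX (src_reg_0, src_reg_1) (V)."""
    return _per_half(max, src0, src1)


def vector_min(src0, src1):
    """Model of dest_reg = MIN (src_reg_0, src_reg_1) (V)."""
    return _per_half(min, src0, src1)


a, b = 0x7FFF8000, 0x00010001
print(hex(vector_max(a, b)))
print(hex(vector_min(a, b)))
```

Because the comparison is signed, 0x8000 is treated as the most negative value and loses to 0x0001 in the MAX case.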
Instruction Overview Vector Multiply Simultaneous Issue and Execution A pair of compatible, scalar (individual) Multiply 16-Bit Operands instructions from “Multiply 16-Bit Operands” on page 15-43 can be com- bined into a single Vector Multiply instruction. The vector instruction executes the two scalar operations simultaneously and saves the results as a vector couplet.
Vector Operations Valid pairs are , and Dreg R7:6 R5:4 R3:2 R1:0 Syntax Separate the two compatible scalar instructions with a comma to produce a vector instruction. Add a semicolon to the end of the combined instruc- tion, as usual. The order of the MAC operations on the command line is arbitrary.
Instruction Overview /* MAC1 multiplies a signed fraction by an unsigned fraction. MAC0 multiplies two signed fractions. */ r5.h=r3.h*r2.h (m), r5.l=r3.l*r2.l (fu) ; /* MAC1 multiplies signed fraction by unsigned fraction. MAC0 multiplies two unsigned fractions. */ r0.h=r3.h*r2.h, r0.l=r3.l*r2.l (is) ; /* both MACs perform signed integer multiplication.
Vector Operations Vector Multiply and Multiply-Accumulate Simultaneous Issue and Execution A pair of compatible, scalar (individual) instructions from • “Multiply and Multiply-Accumulate to Accumulator” on page 15-53 • “Multiply and Multiply-Accumulate to Half-Register” on page 15-58 • “Multiply and Multiply-Accumulate to Data Register” on page 15-67 can be combined into a single vector instruction.
Instruction Overview • The destination D-registers (if applicable) for both scalar opera- tions must form a vector couplet, as described below. • 16-bit: store the results in the upper- and lower-halves of the same 32-bit . MAC0 writes to the lower half, and Dreg MAC1 writes to the upper half.
Vector Operations • AV1 is set if result in Accumulator A1 (MAC1 operation) saturates; cleared if result does not saturate. • AV1S is set if AV1 is set; unaffected otherwise. • All other flags are unaffected. The ADSP-BF535 processor has fewer ASTAT flags, and some ASTAT flags operate differently than subsequent Blackfin family products....
Instruction Overview Result is 16-bit half D-register r2.h=(a1=r7.l*r6.h), r2.l=(a0=r7.h*r6.h) ; /* simultaneous MAC0 and MAC1 execution, both are signed fractions, both products load into the Accumulators,MAC1 into half-word registers. */ r4.l=(a0=r1.l*r0.l), r4.h=(a1+=r1.h*r0.h) ; /* same as above, but sum result into A1. ; MAC order is arbitrary. */ r7.h=(a1+=r6.h*r5.l), r7.l=(a0=r6.h*r5.h) ;...
Vector Operations Result is 32-bit D-register r3=(a1=r6.h*r7.h), r2=(a0=r6.l*r7.l) ; /* simultaneous MAC0 and MAC1 execution, both are signed fractions, both products load into the Accumulators */ r4=(a0=r6.l*r7.l), r5=(a1+=r6.h*r7.h) ; /* same as above, but sum result into A1. MAC order is arbitrary. */ r7=(a1+=r3.h*r5.h), r6=(a0-=r3.l*r5.l) ;...
Instruction Overview Vector Negate (Two’s-Complement) General Form dest_reg = – source_reg (V) Syntax Dreg = – Dreg (V) ; /* dual 16-bit operation (b) */ Syntax Terminology Dreg R7–0 Instruction Length In the syntax, comment (b) identifies 32-bit instruction length. Functional Description The Vector Negate instruction returns the same magnitude with the opposite arithmetic sign, saturated for each 16-bit half-word in the source.
Vector Operations • is set if is set; unaffected otherwise. • is set if carry occurs from either or both results; cleared if nei- ther produces a carry. • All other flags are unaffected. The ADSP-BF535 processor has fewer flags and some flags ASTAT operate differently than subsequent Blackfin family products.
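The saturating behavior of Vector Negate can be illustrated with a short model. The Python below is a behavioral sketch with hypothetical function names; it clamps the one case that cannot be represented (the negation of 0x8000) to 0x7FFF and does not model flag updates.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def vector_negate(src):
    """Model of dest_reg = - source_reg (V) with per-half saturation."""
    def neg16(half):
        result = -to_signed16(half)
        return min(result, 0x7FFF) & 0xFFFF  # only +32768 needs clamping

    return (neg16(src >> 16) << 16) | neg16(src)


print(hex(vector_negate(0x80000001)))
```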
Instruction Overview Vector PACK General Form Dest_reg = PACK ( src_half_0, src_half_1 ) Syntax Dreg = PACK ( Dreg_lo_hi , Dreg_lo_hi ) ; /* (b) */ Syntax Terminology Dreg R7–0 Dreg_lo_hi R7–0.L R7–0.H Instruction Length In the syntax, comment (b) identifies 32-bit instruction length. Functional Description The Vector Pack instruction packs two 16-bit half-word numbers into the halves of a 32-bit data register as shown in...
Vector Operations Flags Affected None The ADSP-BF535 processor has fewer flags and some flags ASTAT operate differently than subsequent Blackfin family products. For more information on the ADSP-BF535 status flags, see Table A-1 on page A-3. Required Mode User & Supervisor Parallel Issue This instruction can be issued in parallel with specific other 16-bit instructions.
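The Vector PACK operation can be sketched as follows. The Python below is illustrative only. It assumes that the first operand (src_half_0) lands in the upper half of the destination and the second operand (src_half_1) in the lower half; the table that defines this ordering is not reproduced here, so treat the ordering as an assumption. The function name `pack` is hypothetical.

```python
def pack(src_half_0, src_half_1):
    """Model of Dest_reg = PACK (src_half_0, src_half_1).

    Both arguments are 16-bit half-register values. Assumption:
    src_half_0 goes to bits 31..16 and src_half_1 to bits 15..0.
    """
    return ((src_half_0 & 0xFFFF) << 16) | (src_half_1 & 0xFFFF)


r4 = 0x11112222
r5 = 0x33334444
# Equivalent in spirit to r3 = pack(r4.l, r5.h)
r3 = pack(r4 & 0xFFFF, r5 >> 16)
print(hex(r3))
```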
Instruction Overview Vector SEARCH General Form (dest_pointer_hi, dest_pointer_lo ) = SEARCH src_reg (searchmode) Syntax (Dreg, Dreg) = SEARCH Dreg (searchmode) ; /* (b) */ Syntax Terminology Dreg R7–0 , or searchmode (GT) (GE) (LE) (LT) Instruction Length In the syntax, comment (b) identifies 32-bit instruction length. Functional Description This instruction is used in a loop to locate a maximum or minimum ele- ment in an array of 16-bit packed data.
Vector Operations Based on the search mode specified in the syntax, the instruction tests for maximum or minimum signed values. Values are sign extended when copied into the Accumulator(s). See “Example” for one way to implement the search loop. After the vector search loop concludes, A1 and A0 hold the two surviving elements, and dest_pointer_hi and dest_pointer_lo contain their respective addresses....
Instruction Overview Flags Affected None The ADSP-BF535 processor has fewer flags and some flags ASTAT operate differently than subsequent Blackfin family products. For more information on the ADSP-BF535 status flags, see Table A-1 on page A-3. Required Mode User & Supervisor Parallel Issue This instruction can be issued in parallel with the combination of one 16-bit length load instruction to the...
Vector Operations LSETUP (loop_, loop_) LC0=P1>>1 ; /* set up the loop */ loop_: (r1,r0) = SEARCH R2 (LE) || R2=[P0++]; /* search for the last minimum in all but the last element of the array */ (r1,r0) = SEARCH R2 (LE); /* finally, search the last element */ /* The lower 16 bits of A1 and A0 contain the last minimums of the array.
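The effect of the four search modes can be illustrated with a simplified model of the search loop. The Python below is a sketch, not a cycle-accurate description: it tracks word indices instead of P0 addresses, seeds each lane from the first word instead of from a pre-initialized Accumulator, and uses hypothetical names throughout. It assumes the upper half-word lane corresponds to A1 and the lower half-word lane to A0.

```python
import operator

# GT keeps the first maximum, GE the last maximum,
# LT keeps the first minimum, LE the last minimum.
_MODES = {"GT": operator.gt, "GE": operator.ge,
          "LT": operator.lt, "LE": operator.le}


def to_signed16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def vector_search(words, mode):
    """Return ((best_hi, index_hi), (best_lo, index_lo)) over packed words."""
    better = _MODES[mode]
    best_hi = best_lo = None
    index_hi = index_lo = None
    for index, word in enumerate(words):
        hi, lo = to_signed16(word >> 16), to_signed16(word)
        if best_hi is None or better(hi, best_hi):
            best_hi, index_hi = hi, index
        if best_lo is None or better(lo, best_lo):
            best_lo, index_lo = lo, index
    return (best_hi, index_hi), (best_lo, index_lo)


data = [0x00050001, 0x00050003, 0x00020003]
for mode in ("GT", "GE", "LT", "LE"):
    print(mode, vector_search(data, mode))
```

A final comparison between the two lane results, which this sketch leaves to the caller, is still needed to obtain the single overall extreme of the array.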
20 ISSUING PARALLEL INSTRUCTIONS This chapter discusses the instructions that can be issued in parallel. It identifies supported combinations for parallel issue, parallel issue syntax, 32-bit ALU/MAC instructions, 16-bit instructions, and examples. The Blackfin processor is not superscalar; it does not execute multiple instructions at once.
Parallel Issue Syntax Table 20-1. Parallel Issue Combinations 32-bit ALU/MAC instruction 16-bit Instruction 16-bit Instruction Parallel Issue Syntax The syntax of a parallel issue instruction is as follows. • A 32-bit ALU/MAC instruction || A 16-bit instruction || A 16-bit instruction ; The vertical bar (||) indicates that the following instruction is to be issued in parallel with the previous instruction....
Issuing Parallel Instructions 32-Bit ALU/MAC Instructions The list of 32-bit instructions that can be in a parallel instruction are shown in Table 20-2. Table 20-2. 32-Bit DSP Instructions Instruction Name Notes Arithmetic Operations (Absolute Value) Only the versions that support optional saturation.
Examples Table 20-4. Group2 Compatible 16-Bit Instructions Instruction Name Notes Load / Store Load Data Register Ireg versions only. Load High Data Register Half Ireg versions only. Load Low Data Register Half Ireg versions only. Store Data Register Ireg versions only. Store High Data Register Half Ireg versions only.
Issuing Parallel Instructions and One Memory Access Instruction in Parallel Ireg /* Add on Sign while incrementing an Ireg and loading a data reg- ister based on the previous value of the Ireg. */ r7.h=r7.l=sign(r2.h)*r3.h + sign(r2.l)*r3.l || i0+=m3 || r0=[i0] ;...
21 DEBUG The Blackfin processor’s debug functionality is used for software debug- ging. It also complements some services often found in an operating system (OS) kernel. The functionality is implemented in the processor hardware and is grouped into multiple levels. A summary of available debug features is shown in Table 21-1.
Watchpoint Unit In addition, information that the Watchpoint Unit provides helps in the optimization of code. The unit also makes it easier to maintain executables through code patching. The Watchpoint Unit contains these memory-mapped registers (MMRs), which are accessible in Supervisor and Emulator modes: •...
Debug The address ranges stored in WPIA0, WPIA1, WPIA2, WPIA3, WPIA4, and WPIA5 must satisfy these conditions: WPIA0 <= WPIA1, WPIA2 <= WPIA3, WPIA4 <= WPIA5. Two operations implement data watchpoints: • The values in the two Data Watchpoint Address registers, WPDA0 and WPDA1, are compared to the address on the data buses....
Watchpoint Unit To enable the Watchpoint Unit, the WPPWR bit in the WPIACTL register must be set. If WPPWR = 1, then the individual watchpoints and watchpoint ranges may be enabled using the specific enable bits in the WPIACTL MMRs....
Debug Code patching allows software to replace sections of existing code with new code. The watchpoint registers are used to trigger an exception at the start addresses of the earlier code. The exception routine then vectors to the location in memory that contains the new code. On the processor, code patching can be achieved by writing the start address of the earlier code to one of the registers and setting the cor-...
Watchpoint Unit Data Address Watchpoints Each data watchpoint is controlled by four bits in the WPDACTL register, as shown in Table 21-6. Table 21-6. Data Address Watchpoints Bit Name Description WPDACCn Determines whether the match should be on a read or write access. WPDSRCn Determines which DAG the unit should monitor....
Watchpoint Unit WPSTAT Register The Watchpoint Status register (WPSTAT) monitors the status of the watchpoints. It may be read and written in Supervisor or Emulator modes only. When a watchpoint or watchpoint range matches, this register reflects the source of the watchpoint....
Debug Trace Unit The Trace Unit stores a history of the last 16 changes in program flow taken by the program sequencer. The history allows the user to recreate the program sequencer’s recent path. The trace buffer can be enabled to cause an exception when full. The exception service routine associated with the exception saves trace buffer entries to memory.
Trace Unit The number of valid entries in TBUF is held in the TBUFCNT field of the TBUFSTAT register. On every second read, TBUFCNT is decremented. Because each entry corresponds to two pieces of data, a total of 2 x TBUFCNT reads empties the register....
Debug Listing 21-1. Recreating the Execution Trace in Memory [--sp] = (r7:7, p5:2); /* save registers used in this routine */ p5 = 32; /* 32 reads are needed to empty TBUF */ p2.l = buf; /* pointer to the header (first location) of the software trace buffer */ p2.h = buf;...
Debug Performance Monitor Control Register (PFCTL) 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 0xFFE0 8000 Reset = Undefined PFCNT1 PFMON1[7:0] 0 - Count number of cycles asserted Refer to Event Monitor table on 1 - Count positive edges only page 21-22 PFCNT0...
Performance Monitoring Unit Table 21-8. Event Monitor Table PFMONx Fields Events That Cause the Count Value to Increment 0x00 Loop 0 iterations 0x01 Loop 1 iterations 0x02 Loop buffer 0 not optimized 0x03 Loop buffer 1 not optimized 0x04 PC invariant branches (requires trace buffer to be enabled, see “TBUFCTL Register”...
Debug Table 21-8. Event Monitor Table (Cont’d) PFMONx Fields Events That Cause the Count Value to Increment 0x90 Processor stalls to memory 0x91 Data memory stalls to processor not hidden by processor stall 0x92 Data memory store buffer full stalls 0x93 Data memory write buffer full stalls due to high-to-low priority code tran- sition...
Cycle Counter The cycle counter is 64 bits and increments every cycle. The count value is stored in two 32-bit registers, CYCLES and CYCLES2. The least significant 32 bits (LSBs) are stored in CYCLES. The most significant 32 bits (MSBs) are...
Debug Note that when single-stepping through instructions in a debug environment, the CYCLES register increases in non-unity increments due to the interaction of the debugger over JTAG. The CYCLES and CYCLES2 registers are not system MMRs, but are instead system registers. Execution Cycle Count Registers (CYCLES and CYCLES2): RO in User mode, RW in Supervisor and Emulator modes. Reset = Undefined...
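Software that post-processes cycle counts usually recombines the two 32-bit register values into one 64-bit count. The Python sketch below illustrates that arithmetic on values that have already been read from the processor; it does not model the register read sequence itself, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def combine_cycles(cycles, cycles2):
    """Build the 64-bit count from CYCLES (LSBs) and CYCLES2 (MSBs)."""
    return ((cycles2 & MASK32) << 32) | (cycles & MASK32)


def elapsed_cycles(start, end):
    """Difference between two 64-bit counts, tolerant of counter wrap."""
    return (end - start) & MASK64


before = combine_cycles(0xFFFFFFF0, 0x00000000)
after = combine_cycles(0x00000010, 0x00000001)
print(elapsed_cycles(before, after))  # 0x20 cycles across the 32-bit rollover
```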
Cycle Counter SYSCFG Register The System Configuration register (SYSCFG) controls the configuration of the processor. This register is accessible only from the Supervisor mode. System Configuration Register (SYSCFG): Reset = 0x0000 0030. SNEN (Self-Nesting Interrupt Enable)...
A ADSP-BF535 CONSIDERATIONS The ADSP-BF535 processor operates differently from other Blackfin pro- cessors in some areas. This chapter describes these differences. ADSP-BF535 Operating Modes and States In the “Operating Modes and States” chapter, several of the descriptions do not apply to the ADSP-BF535 processor. These are: •...
ADSP-BF535 Flags ADSP-BF535 Flags Table A-1 lists the Blackfin processor instruction set and the effect on flags when these instructions execute on an ADSP-BF535 processor. The symbol definitions for the flag bits in the table are as follows. • – indicates that the flag is NOT AFFECTED by execution of the instruction •...
ADSP-BF535 Considerations Table A-1. ASTAT Flag Behavior for the ADSP-BF535 AC0_ Instruction COPY COPY Jump – – – – – – IF CC JUMP – – – – – – Call – – – – – – RTS, RTI, RTX, RTN, RTE (Return) –...
ADSP-BF535 Flags Table A-1. ASTAT Flag Behavior for the ADSP-BF535 (Cont’d) AC0_ Instruction COPY COPY Move Register (acc to dreg) – – – Move Conditional – – – – – – Move Half to Full Word – Zero-Extended – – Move Half to Full Word –...
B CORE MMR ASSIGNMENTS The Blackfin processor’s memory-mapped registers (MMRs) are in the address range 0xFFE0 0000 – 0xFFFF FFFF. All core MMRs must be accessed with a 32-bit read or write access. This appendix lists core MMR addresses and register names. To find more information about an MMR, refer to the page shown in the “See Section”...
L1 Data Memory Controller Registers Table B-1. L1 Data Memory Controller Registers (Cont’d) Memory-mapped Register Name See Section Address 0xFFE0 0104 DCPLB_ADDR1 “DCPLB_ADDRx Registers” on page 6-59 0xFFE0 0108 DCPLB_ADDR2 “DCPLB_ADDRx Registers” on page 6-59 0xFFE0 010C DCPLB_ADDR3 “DCPLB_ADDRx Registers” on page 6-59 0xFFE0 0110 DCPLB_ADDR4 “DCPLB_ADDRx Registers”...
Core MMR Assignments Table B-1. L1 Data Memory Controller Registers (Cont’d) Memory-mapped Register Name See Section Address 0xFFE0 0224 DCPLB_DATA9 “DCPLB_DATAx Registers” on page 6-57 0xFFE0 0228 DCPLB_DATA10 “DCPLB_DATAx Registers” on page 6-57 0xFFE0 022C DCPLB_DATA11 “DCPLB_DATAx Registers” on page 6-57 0xFFE0 0230 DCPLB_DATA12 “DCPLB_DATAx Registers”...
C INSTRUCTION OPCODES This appendix describes the operation codes (or, “opcodes”) for each Blackfin instruction. The purpose is to specify the instruction codes for Blackfin software and tools developers. Introduction This format separates instructions as much as practical for maximum clar- ity.
Introduction Glossary The following terms appear throughout this document. Without trying to explain the Blackfin architecture, here are the terms used with their defini- tions. See chapters 1 through 6 for more details on the architecture. Register Names The architecture includes the following registers. Table C-1.
Instruction Opcodes Table C-1. Registers (Cont’d) Register Description Index Register The set of 32-bit registers I0, I1, I2, I3 that normally contain byte addresses of data structures. Abbreviated I-register or Ireg. Modify Registers The set of 32-bit registers M0, M1, M2, M3 that normally contain offset val- ues that are added or subtracted to one of the Index registers.
Introduction Notation Conventions This appendix uses the following conventions: • Register names are alphabetic, followed by a number in cases where there is more than one register in a logical group. Thus, examples include ASTAT, FP, R3, and M2. Register names are reserved and may not be used as program identifiers....
Instruction Opcodes This appendix uses the following conventions to describe options in the assembler syntax: • When there is a choice of any one register within a register group, this appendix shows the register set using a single dash to indicate the range of possible register numbers.
Introduction • PC-relative, signed values are designated as “ ” with the fol- pcrel lowing modifiers: • the decimal number indicates how many bits the value can include; for example, pcrel5 is a 5-bit value. • any alignment requirements are designated by an optional “m”...
Instruction Opcodes Chapter 2, Computational Units, for more details. Table C-3. Arithmetic Status Flag Summary Flag Description AC0 Carry (ALU0) AC1 Carry (ALU1) Negative Quotient Accumulator 0 Overflow AVS0 Accumulator 0 Sticky Overflow; set when AV0 is set, but remains set until explicitly cleared by user code Accumulator 1 Overflow AVS1 Accumulator 1 Sticky Overflow;...
Introduction Core Register Encoding Map Instruction opcodes can address any core register by Register Group and Register Number using the following encoding. Table C-4. Core Register Encoding Map REGISTER NUMBER REGISTER GROUP A0.x A0.w A1.x A1.w <res.> <res.> ASTAT RETS <res.>...
Instruction Opcodes A single 16-bit field represents 16-bit opcodes, and two stacked 16-bit fields represent 32-bit opcodes. When stacked, the upper 16 bits show the most significant bits; the lower 16 bits, the least significant bits. See the example table, below. The hex values of 32-bit instructions are shown stacked in the same order as the bit fields—most significant over least significant.
Introduction Opcode Bit Terminology The following conventions describe the instruction opcode bit states. Table C-6. SYMBOL MEANING Binary zero bit, logical “low” Binary one bit, logical “high” “don’t care” bit Undefined Opcodes Any and all undefined instruction opcode bit patterns are reserved, poten- tially for future use.
Introduction For example, a 32-bit opcode 0xFEED FACE is stored in memory loca- tions as shown in Table C-8, below. Table C-8. Example Memory Contents Relative Byte Address Data 0xED 0xFE 0xCE 0xFA This byte sequence is displayed in ascending address order as... 0xED 0xFE 0xCE...
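The byte sequence in this example follows from storing each 16-bit half of the opcode in little-endian order, most significant half first. The Python sketch below reproduces the layout for tool developers who need to emit or parse opcode bytes; the function name `opcode32_bytes` is hypothetical.

```python
import struct


def opcode32_bytes(opcode):
    """Return the four memory bytes of a 32-bit opcode, lowest address first.

    The upper 16-bit half is stored first, and each half is written in
    little-endian byte order.
    """
    upper = (opcode >> 16) & 0xFFFF
    lower = opcode & 0xFFFF
    return struct.pack("<HH", upper, lower)


image = opcode32_bytes(0xFEEDFACE)
print(" ".join(f"0x{byte:02X}" for byte in image))
```

For the opcode 0xFEED FACE, the sketch prints 0xED 0xFE 0xCE 0xFA, which matches the ascending-address sequence given in the text.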
Instruction Opcodes Program Flow Control Instructions Table C-9. Program Flow Control Instructions (Sheet 1 of 3) Instruction and Version Opcode Range 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 Jump 0x0050— 0 0 0 0 0 0 0 0 0 1 0 1 Preg # 0x0057 JUMP (Preg)
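The opcode range for JUMP (Preg) can be turned into a small encoder. The Python sketch below is a hedged illustration based only on the range 0x0050 through 0x0057 shown in the table; it assumes the Preg number occupies the three least significant bits, and the function name is hypothetical.

```python
JUMP_PREG_BASE = 0x0050  # lowest opcode in the JUMP (Preg) range


def encode_jump_preg(preg_number):
    """Return the 16-bit opcode for JUMP (Preg).

    Assumption: the 3-bit Preg number is placed in bits 2..0, which
    yields exactly the documented range 0x0050 to 0x0057.
    """
    if not 0 <= preg_number <= 7:
        raise ValueError("Preg number must be in the range 0..7")
    return JUMP_PREG_BASE | preg_number


for number in (0, 3, 7):
    print(f"Preg #{number}: 0x{encode_jump_preg(number):04X}")
```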
Program Flow Control Instructions Table C-9. Program Flow Control Instructions (Sheet 2 of 3) Instruction and Version Opcode Range 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 Call 0x0070— 0 0 0 0 0 0 0 0 0 1 1 1 Preg # 0x0077 CALL (PC+Preg)
Instruction Opcodes Table C-9. Program Flow Control Instructions (Sheet 3 of 3) Instruction and Version Opcode Range 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 Zero Overhead Loop Setup 0xE0E0 0000— 1 1 1 0 0 0 0 0 1 1 1 0 pcrel5m2 0xE0AF F3FF divided by 2...
Load / Store Instructions Load / Store Instructions Table C-10. Load / Store Instructions (Sheet 1 of 12) Instruction and Version Opcode Range 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 Load Immediate 0xE100 0000—...
Instruction Opcodes Stack Control Instructions Table C-12. Stack Control Instructions (Sheet 1 of 2) Instruction and Version Opcode Range 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 Push 0x0140— 0 0 0 0 0 0 0 1 0 1 Reg. Reg.
Stack Control Instructions Table C-12. Stack Control Instructions (Sheet 2 of 2) Instruction and Version Opcode Range 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 Pop Multiple 0x0500— 0 0 0 0 0 1 0 1 0 0 Dreg # 0 0 0 0x0538 NOTE: The embedded register number represents the lowest register in the range to be used.
Page 839
Instruction Opcodes Control Code Bit Management Instructions Table C-13. Control Code Bit Management Instructions (Sheet 1 of 4) Instruction and Version Opcode Range 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 Compare Data Register 0x0800—...
Page 840
Control Code Bit Management Instructions Table C-13. Control Code Bit Management Instructions (Sheet 2 of 4) Instruction and Version Opcode Range 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 Compare Data Register 0x0E00—...
Page 841
Table C-13. Control Code Bit Management Instructions (Sheet 3 of 4)
Instruction and Version | Opcode Range | Bits 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Compare Pointer Register: CC = Preg <= uimm3 (IU) | 0x0E40 to 0x0E7F | 0 0 0 0 1 1 1 0 0 1, uimm3, Dest reg
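The two variable fields of the CC = Preg <= uimm3 (IU) row can be combined as shown in this sketch. It assumes, from the column order of the bit pattern, that uimm3 occupies bits 5 to 3 and the register number ("Dest reg") occupies bits 2 to 0; the function name is hypothetical.

```python
COMPARE_PREG_LE_UIMM3_BASE = 0x0E40  # fixed bits 15..6 = 0 0 0 0 1 1 1 0 0 1


def encode_cc_preg_le_uimm3(preg_number: int, uimm3: int) -> int:
    """Build the opcode for CC = Preg <= uimm3 (IU)."""
    if not 0 <= preg_number <= 7:
        raise ValueError("register number must be in 0..7")
    if not 0 <= uimm3 <= 7:
        raise ValueError("uimm3 is an unsigned 3-bit value (0..7)")
    return COMPARE_PREG_LE_UIMM3_BASE | (uimm3 << 3) | preg_number


print(f"0x{encode_cc_preg_le_uimm3(preg_number=2, uimm3=5):04X}")  # 0x0E6A
```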
Page 842
Table C-13. Control Code Bit Management Instructions (Sheet 4 of 4)
Instruction and Version | Opcode Range | Bits 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Move CC: CC |= statbit | 0x0320 to 0x033F | 0 0 0 0 0 0 1 1 0 0 1, ASTAT bit #
...
Page 869
Instruction Opcodes Table C-17. Arithmetic Operations Instructions (Sheet 15 of 44) Instruction and Version Opcode Range 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 Multiply and Multiply-Accumulate to Accumulator Legend: Dreg half determines which halves of the input operand registers to use....
Page 937
Instruction Opcodes Table C-21. Vector Operations Instructions (Sheet 31 of 33) Instruction Opcode and Version Range 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 Vector Multiply and Multiply-Accumulate LEGEND: op0 and op1 specify the arithmetic operation for each MAC....
Page 938
Vector Operations Instructions Table C-21. Vector Operations Instructions (Sheet 32 of 33) Instruction Opcode and Version Range 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 Vector Negate (Two’s-Complement) 0xC40F C000— 1 1 0 0 0 1 0 x x x 0 0 1 1 1 1 0xC40F CE38 Dest.
Page 939
Instruction Opcodes Table C-21. Vector Operations Instructions (Sheet 33 of 33) Instruction Opcode and Version Range 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 Vector Search 1 1 0 0 0 1 0 x x x 0 0 1 1 0 1 0xC40D 8000—...
Page 940
Instructions Listed By Operation Code 16-Bit Opcode Instructions Table C-22 lists the instructions that are represented by 16-bit opcodes. Table C-22. 16-Bit Opcode Instructions (Sheet 1 of 14) Instruction Opcode and Version Range No Op 0x0000— Return 0x0010— Return 0x0011— Return 0x0012—...
Page 941
Table C-22. 16-Bit Opcode Instructions (Sheet 2 of 14)
Instruction and Version | Opcode Range
Call: CALL (Preg) | 0x0060 to 0x0067
Call: CALL (PC+Preg) | 0x0070 to 0x0077
Jump: JUMP (PC+Preg) | 0x0080 to 0x0087
Force Interrupts / Reset: RAISE uimm4 | 0x0090 to 0x009F
Force Exception | 0x00A0 to ...
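Because Table C-22 is ordered by opcode, it can be used as a lookup table when decoding. The sketch below classifies a 16-bit opcode against the four complete rows on this sheet only; the function name and data layout are illustrative, and opcodes outside these rows simply return None rather than being treated as undefined.

```python
# (first opcode, last opcode, instruction) for the complete rows on this sheet.
SHEET_2_RANGES = [
    (0x0060, 0x0067, "CALL (Preg)"),
    (0x0070, 0x0077, "CALL (PC+Preg)"),
    (0x0080, 0x0087, "JUMP (PC+Preg)"),
    (0x0090, 0x009F, "RAISE uimm4"),
]


def classify_opcode16(opcode: int):
    """Return the instruction whose range contains opcode, or None."""
    for first, last, name in SHEET_2_RANGES:
        if first <= opcode <= last:
            return name
    return None


for op in (0x0063, 0x0087, 0x009F, 0x0068):
    print(f"0x{op:04X}: {classify_opcode16(op)}")
```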
Page 942
Table C-22. 16-Bit Opcode Instructions (Sheet 3 of 14)
Instruction and Version | Opcode Range
Data Cache Line Invalidate: FLUSHINV [Preg++] | 0x0268 to 0x026F
Data Cache Flush: FLUSH [Preg++] | 0x0270 to 0x0277
Instruction Cache Flush: IFLUSH [Preg++] | 0x0278 to 0x027F
Move CC | 0x0300 to ...
Page 943
Table C-22. 16-Bit Opcode Instructions (Sheet 4 of 14)
Instruction and Version | Opcode Range
Push Multiple: [--SP]=(R7:Dreglim, P5:Preglim) | 0x05C0 to 0x05FD
Move Conditional: IF !CC Dreg=Dreg | 0x0600 to 0x063F
Move Conditional: IF !CC Dreg=Preg | 0x0640 to 0x067F
Move Conditional: IF !CC Preg=Dreg | 0x0680 to 0x06BF
Move Conditional...
Page 944
Table C-22. 16-Bit Opcode Instructions (Sheet 5 of 14)
Instruction and Version | Opcode Range
Compare Pointer Register: CC = Preg < Preg (IU) | 0x09C0 to 0x09FF
Compare Data Register: CC = Dreg <= Dreg (IU) | 0x0A00 to 0x0A3F
Compare Pointer Register | 0x0A40 to ...
Page 945
Table C-22. 16-Bit Opcode Instructions (Sheet 6 of 14)
Instruction and Version | Opcode Range
Conditional Jump: IF !CC JUMP pcrel11m2 | 0x1000 to 0x13FF
Conditional Jump: IF !CC JUMP pcrel11m2 (bp) | 0x1400 to 0x17FF
Conditional Jump: IF CC JUMP pcrel11m2 | 0x1800 to 0x1BFF
Conditional Jump | 0x1C00 to ...
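The four conditional jump rows start at 0x1000, 0x1400, 0x1800, and 0x1C00, which suggests two selector bits above a 10-bit offset field. The following sketch decodes an opcode on that basis. It is an assumption, not something the table rows state, that bit 11 selects IF CC over IF !CC, that bit 10 selects the (bp) form, and that the low 10 bits hold a signed offset in units of 2 bytes (pcrel11m2). The function name is hypothetical.

```python
def decode_conditional_jump(opcode: int) -> str:
    """Decode a 16-bit conditional jump opcode in 0x1000..0x1FFF.

    Assumed field layout (inferred, not given in the table):
      bit 11     1 = IF CC, 0 = IF !CC
      bit 10     1 = (bp) form
      bits 9..0  signed offset, multiplied by 2 to get bytes
    """
    if not 0x1000 <= opcode <= 0x1FFF:
        raise ValueError("not a conditional jump opcode")
    branch_if_set = bool(opcode & 0x0800)
    predicted = bool(opcode & 0x0400)
    field = opcode & 0x03FF
    if field & 0x0200:       # sign bit of the 10-bit field
        field -= 0x0400
    offset = field * 2       # pcrel11m2: 11-bit byte offset, multiple of 2
    text = f"IF {'CC' if branch_if_set else '!CC'} JUMP {offset:+d}"
    return text + " (bp)" if predicted else text


for op in (0x1000, 0x1402, 0x1BFF):
    print(f"0x{op:04X}: {decode_conditional_jump(op)}")
```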
Page 946
Table C-22. 16-Bit Opcode Instructions (Sheet 7 of 14)
Instruction and Version | Opcode Range
Divide Primitive: DIVS (Dreg, Dreg) | 0x4240 to 0x427F
Move Half to Full Word, Sign Extended: Dreg = Dreg_lo (X) | 0x4280 to 0x42BF
Move Half to Full Word, Zero Extended | 0x42C0 to ...
Page 947
Table C-22. 16-Bit Opcode Instructions (Sheet 8 of 14)
Instruction and Version | Opcode Range
Bit Set: BITSET (Dreg, uimm5) | 0x4A00 to 0x4AFF
Bit Toggle: BITTGL (Dreg, uimm5) | 0x4B00 to 0x4BFF
Bit Clear: BITCLR (Dreg, uimm5) | 0x4C00 to 0x4CFF
Arithmetic Shift: Dreg >>>= uimm5 | 0x4D00 to 0x4DFF
Logical Shift...
Page 948
Table C-22. 16-Bit Opcode Instructions (Sheet 9 of 14)
Instruction and Version | Opcode Range
Load Immediate: Dreg = imm7 (X) | 0x6000 to 0x63FF
Add Immediate: Dreg += imm7 | 0x6400 to 0x67FF
Load Immediate: Preg = imm7 (X) | 0x6800 to 0x6BFF
Add Immediate | 0x6C00 to ...
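The Dreg = imm7 (X) range of 0x6000 to 0x63FF spans 1024 opcodes, which is consistent with a 7-bit immediate and a 3-bit register number. A minimal sketch of such a packing follows. The field positions (imm7 in bits 9 to 3, Dreg number in bits 2 to 0) and the function name are assumptions for illustration; the table gives only the range.

```python
LOAD_IMM7_DREG_BASE = 0x6000  # first opcode of the Dreg = imm7 (X) range


def encode_load_imm7_dreg(imm7: int, dreg: int) -> int:
    """Build the opcode for Dreg = imm7 (X).

    Assumed layout: two's-complement imm7 in bits 9..3, Dreg # in bits 2..0.
    """
    if not -64 <= imm7 <= 63:
        raise ValueError("imm7 is a signed 7-bit value (-64..63)")
    if not 0 <= dreg <= 7:
        raise ValueError("Dreg number must be in 0..7")
    return LOAD_IMM7_DREG_BASE | ((imm7 & 0x7F) << 3) | dreg


print(f"R0 = 0 (X):  0x{encode_load_imm7_dreg(0, 0):04X}")
print(f"R7 = -1 (X): 0x{encode_load_imm7_dreg(-1, 7):04X}")
```

Under these assumptions the smallest and largest encodings fall exactly on the two ends of the listed range.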
Page 949
Table C-22. 16-Bit Opcode Instructions (Sheet 10 of 14)
Instruction and Version | Opcode Range
Load Data Register: Dreg = [ Preg ++ ] | 0x9000 to 0x903F
Load Pointer Register: Preg = [ Preg ++ ] | 0x9040 to 0x907F
Load Data Register | 0x9080 to ...
Page 950
Table C-22. 16-Bit Opcode Instructions (Sheet 11 of 14)
Instruction and Version | Opcode Range
Load Half Word, Zero Extended: Dreg = W [ Preg ] (Z) | 0x9500 to 0x953F
Load Half Word, Sign Extended: Dreg = W [ Preg ] (X) | 0x9540 to 0x957F
Store Low Data Register Half | 0x9600 to ...
Page 951
Table C-22. 16-Bit Opcode Instructions (Sheet 12 of 14)
Instruction and Version | Opcode Range
Load High Data Register Half: Dreg_hi = W [ Ireg ++ ] | 0x9C40 to 0x9C5F
Load Data Register: Dreg = [ Ireg -- ] | 0x9C80 to 0x9C9F
Load Low Data Register Half | 0x9CA0 to ...
Page 952
Table C-22. 16-Bit Opcode Instructions (Sheet 13 of 14)
Instruction and Version | Opcode Range
Modify-Increment: Ireg += Mreg (brev) | 0x9EE0 to 0x9EEF
Store Data Register: [ Ireg ] = Dreg | 0x9F00 to 0x9F1F
Store Low Data Register Half | 0x9F20 to ...
Page 953
Table C-22. 16-Bit Opcode Instructions (Sheet 14 of 14)
Instruction and Version | Opcode Range
Store Low Data Register Half: W [ Preg + uimm5m2 ] = Dreg | 0xB400 to 0xB7FF
Load Data Register: Dreg = [ FP - uimm7m4 ] | 0xB800 to 0xB9F7
Load Pointer Register | 0xB808 to ...
Page 954
32-Bit Opcode Instructions
Table C-23 lists the instructions that are represented by 32-bit opcodes.
Table C-23. 32-Bit Opcode Instructions (Sheet 1 of 40)
Instruction and Version | Opcode Range
Vector Multiply and Multiply-Accumulate: A0 {=, +=, or -=} Dreg_lo_hi * Dreg_lo_hi, A1 {=, +=, or -=} Dreg_lo_hi * Dreg_lo_hi | 0xC000 0000 to 0xC003 DE3F
Multiply and Multiply-Accumulate to Accumulator...
Page 955
Table C-23. 32-Bit Opcode Instructions (Sheet 2 of 40)
Instruction and Version | Opcode Range
Vector Multiply and Multiply-Accumulate: A0 {=, +=, or -=} Dreg_lo_hi * Dreg_lo_hi, Dreg_hi = (A1 {=, +=, or -=} Dreg_lo_hi * Dreg_lo_hi) | 0xC004 0000 to 0xC007 DFFF
Multiply and Multiply-Accumulate to Half Register | 0xC004 1800 to ...
Page 956
Table C-23. 32-Bit Opcode Instructions (Sheet 3 of 40)
Instruction and Version | Opcode Range
Multiply and Multiply-Accumulate to Data Register: Dreg_even = (A0 += Dreg_lo_hi * Dreg_lo_hi) | 0xC00D 0800 to 0xC00D 0FFF
Multiply and Multiply-Accumulate to Data Register | 0xC00D 1000 to ...
Page 957
Table C-23. 32-Bit Opcode Instructions (Sheet 4 of 40)
Instruction and Version | Opcode Range
Multiply and Multiply-Accumulate to Half Register: Dreg_lo = (A0 += Dreg_lo_hi * Dreg_lo_hi) (S2RND) | 0xC023 2800 to 0xC023 2FFF
Multiply and Multiply-Accumulate to Half Register | 0xC023 3000 to ...
Page 958
Table C-23. 32-Bit Opcode Instructions (Sheet 5 of 40)
Instruction and Version | Opcode Range
Multiply and Multiply-Accumulate to Data Register: Dreg_even = (A0 -= Dreg_lo_hi * Dreg_lo_hi) (S2RND) | 0xC02D 1000 to 0xC02D 17FF
Multiply and Multiply-Accumulate to Half Register | 0xC034 1800 to ...
Page 959
Table C-23. 32-Bit Opcode Instructions (Sheet 6 of 40)
Instruction and Version | Opcode Range
Multiply and Multiply-Accumulate to Half Register: Dreg_hi = (A1 += Dreg_lo_hi * Dreg_lo_hi) (T) | 0xC045 1800 to 0xC045 D9FF
Multiply and Multiply-Accumulate to Half Register | 0xC046 1800 to ...
Page 960
Table C-23. 32-Bit Opcode Instructions (Sheet 7 of 40)
Instruction and Version | Opcode Range
Vector Multiply and Multiply-Accumulate: A0 {=, +=, or -=} Dreg_lo_hi * Dreg_lo_hi, A1 {=, +=, or -=} Dreg_lo_hi * Dreg_lo_hi (W32, M) | 0xC070 0000 to 0xC073 DE3F
Multiply and Multiply-Accumulate to Accumulator | 0xC070 1800 to ...
Page 961
Table C-23. 32-Bit Opcode Instructions (Sheet 8 of 40)
Instruction and Version | Opcode Range
Move Register Half: Dreg_lo = A0 (FU) | 0xC083 3800 to 0xC083 39C0
Multiply and Multiply-Accumulate to Half Register: Dreg_hi = (A1 = Dreg_lo_hi * Dreg_lo_hi) (FU) | 0xC084 1800 to 0xC084 D9FF
Vector Multiply and Multiply-Accumulate | 0xC084 0000 to ...
Page 962
Table C-23. 32-Bit Opcode Instructions (Sheet 9 of 40)
Instruction and Version | Opcode Range
Multiply and Multiply-Accumulate to Data Register: Dreg_even = (A0 = Dreg_lo_hi * Dreg_lo_hi) (FU) | 0xC08D 0000 to 0xC08D 07FF
Multiply and Multiply-Accumulate to Data Register | 0xC08D 0800 to ...
Page 963
Table C-23. 32-Bit Opcode Instructions (Sheet 10 of 40)
Instruction and Version | Opcode Range
Vector Multiply and Multiply-Accumulate: Dreg_lo = (A0 {=, +=, or -=} Dreg_lo_hi * Dreg_lo_hi), Dreg_hi = (A1 {=, +=, or -=} Dreg_lo_hi * Dreg_lo_hi) (TFU) | 0xC0C4 2000 to 0xC0C7 FFFF
Multiply and Multiply-Accumulate to Half Register | 0xC0C5 1800 to ...
Page 964
Table C-23. 32-Bit Opcode Instructions (Sheet 11 of 40)
Instruction and Version | Opcode Range
Multiply and Multiply-Accumulate to Half Register: Dreg_lo = (A0 = Dreg_lo_hi * Dreg_lo_hi) (IS) | 0xC103 2000 to 0xC103 27FF
Multiply and Multiply-Accumulate to Half Register | 0xC103 2800 to ...
Page 965
Table C-23. 32-Bit Opcode Instructions (Sheet 12 of 40)
Instruction and Version | Opcode Range
Vector Multiply and Multiply-Accumulate: A0 {=, +=, or -=} Dreg_lo_hi * Dreg_lo_hi, Dreg_odd = (A1 {=, +=, or -=} Dreg_lo_hi * Dreg_lo_hi) (IS) | 0xC10C 0000 to 0xC10F DFFF
Vector Multiply and Multiply-Accumulate | 0xC10C 2000 to ...
Page 966
Table C-23. 32-Bit Opcode Instructions (Sheet 13 of 40)
Instruction and Version | Opcode Range
Multiply and Multiply-Accumulate to Half Register: Dreg_lo = (A0 -= Dreg_lo_hi * Dreg_lo_hi) (ISS2) | 0xC123 3000 to 0xC123 37FF
Move Register Half | 0xC123 3800 to ...
Page 967
Table C-23. 32-Bit Opcode Instructions (Sheet 14 of 40)
Instruction and Version | Opcode Range
Multiply and Multiply-Accumulate to Half Register: Dreg_hi = (A1 = Dreg_lo_hi * Dreg_lo_hi) (ISS2, M) | 0xC134 1800 to 0xC134 D9FF
Vector Multiply and Multiply-Accumulate: Dreg_lo = (A0 {=, +=, or -=} Dreg_lo_hi * Dreg_lo_hi), Dreg_hi = (A1 {=, +=, or -=} Dreg_lo_hi * Dreg_lo_hi) (ISS2, M) | 0xC134 2000 to 0xC137 FFFF
Page 968
Table C-23. 32-Bit Opcode Instructions (Sheet 15 of 40)
Instruction and Version | Opcode Range
Multiply and Multiply-Accumulate to Half Register: Dreg_hi = (A1 -= Dreg_lo_hi * Dreg_lo_hi) (IH) | 0xC166 1800 to 0xC166 D9FF
Move Register Half | 0xC167 1800 to ...
Page 969
Table C-23. 32-Bit Opcode Instructions (Sheet 16 of 40)
Instruction and Version | Opcode Range
Move Register Half: Dreg_hi = A1 (IU) | 0xC187 1800 to 0xC187 19C0
Move Register Half: Dreg_lo = A0, Dreg_hi = A1 (IU); Dreg_hi = A1, Dreg_lo = A0 (IU) | 0xC187 3800 to 0xC187 39C0
Multiply and Multiply-Accumulate to Half Register | 0xC194 1800 to ...
Page 976
Table C-23. 32-Bit Opcode Instructions (Sheet 23 of 40)
Instruction and Version | Opcode Range
Vector Add / Subtract: Dreg = Dreg +|+ Dreg, Dreg = Dreg -|- Dreg (CO, ASL) | 0xC401 D000 to 0xC401 DFFF
Vector Add / Subtract | 0xC401 E000 to ...
Page 977
Table C-23. 32-Bit Opcode Instructions (Sheet 24 of 40)
Instruction and Version | Opcode Range
Subtract: Dreg_lo = Dreg_hi - Dreg_lo (S) | 0xC403 A000 to 0xC403 AE3F
Subtract: Dreg_lo = Dreg_hi - Dreg_hi (NS) | 0xC403 C000 to 0xC403 CE3F
Subtract: Dreg_lo = Dreg_hi -... | 0xC403 E000 to ...
Page 978
Table C-23. 32-Bit Opcode Instructions (Sheet 25 of 40)
Instruction and Version | Opcode Range
Maximum: Dreg = MAX (Dreg, Dreg) | 0xC407 0000 to 0xC407 0E3F
Minimum: Dreg = MIN (Dreg, Dreg) | 0xC407 4000 to 0xC407 4E3F
Absolute Value | 0xC407 8000 to ...
Page 995
D NUMERIC FORMATS
ADSP-BF53x/BF56x Blackfin family processors support 8-, 16-, 32-, and 40-bit fixed-point data in hardware. Special features in the computation units allow support of other formats in software. This appendix describes various aspects of these data formats. It also describes how to implement a block floating-point format in software.
Page 996
Integer or Fractional Data Formats
[Figure D-1. Integer Format: Signed Integer and Unsigned Integer panels, each labeled with bit weights, sign bit, and radix point]
In a fractional format, the assumed radix point lies within the number, so that some or all of the magnitude bits have a weight of less than 1.
Page 997
[Figure D-2. Example of Fractional Format: Signed Fractional (13.3) and Unsigned Fractional (13.3) panels, each labeled with bit weights, sign bit, and radix point]
Page 998
Table D-1 shows the ranges of signed numbers representable in the fractional formats that are possible with 16 bits.
Table D-1. Fractional Formats and Their Ranges
Format # of # of Max Positive Value Max Negative Value of 1 LSB Integer...
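The kind of values Table D-1 tabulates can be computed directly from the format. The sketch below assumes the usual signed m.n convention, in which m counts the integer bits including the sign bit and n counts the fractional bits, with m + n = 16. The function name is illustrative.

```python
def signed_format_range(int_bits: int, frac_bits: int):
    """Return (max positive, max negative, value of 1 LSB) for a signed
    16-bit m.n format, where int_bits includes the sign bit."""
    if int_bits < 1 or frac_bits < 0 or int_bits + frac_bits != 16:
        raise ValueError("int_bits + frac_bits must equal 16")
    lsb = 2.0 ** -frac_bits
    max_negative = -(2.0 ** (int_bits - 1))
    max_positive = 2.0 ** (int_bits - 1) - lsb
    return max_positive, max_negative, lsb


for m, n in ((1, 15), (13, 3), (16, 0)):
    hi, lo, lsb = signed_format_range(m, n)
    print(f"{m}.{n}: max +{hi}, max {lo}, 1 LSB = {lsb}")
```

For the 1.15 format this gives a maximum positive value of 0.999969482421875, a maximum negative value of -1.0, and an LSB of 0.000030517578125.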
Page 999
Binary Multiplication
In addition and subtraction, both operands must be in the same format (signed or unsigned, radix point in the same location), and the result format is the same as the input format. Addition and subtraction are performed the same way whether the inputs are signed or unsigned....
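The statement that addition is performed the same way for signed and unsigned inputs can be checked with a small 16-bit model. This sketch is illustrative only: it wraps the sum to 16 bits and ignores carry and overflow flags, and the helper names are hypothetical.

```python
def add16(a_bits: int, b_bits: int) -> int:
    """Add two 16-bit patterns and keep the low 16 bits of the sum."""
    return (a_bits + b_bits) & 0xFFFF


def as_signed16(bits: int) -> int:
    """Interpret a 16-bit pattern as a two's-complement integer."""
    return bits - 0x10000 if bits & 0x8000 else bits


a, b = 0xFFFE, 0x0003
result = add16(a, b)

# Unsigned view: 65534 + 3 = 65537, which wraps to 1.
# Signed view:   -2 + 3 = 1.
print(f"bits: 0x{result:04X}")
print(f"unsigned: {a} + {b} -> {result}")
print(f"signed:   {as_signed16(a)} + {as_signed16(b)} -> {as_signed16(result)}")
```

The same adder output serves both interpretations; only the reading of the bit pattern differs.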
Page 1000
Block Floating-Point Format
Fractional Mode And Integer Mode
A product of two two’s-complement numbers has two sign bits. Since one of these bits is redundant, you can shift the entire result left one bit. Additionally, if one of the inputs was a 1.15 number, the left shift causes the result to have the same format as the other input (with 16 bits of additional precision).
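The redundant sign bit and the one-bit left shift can be illustrated with two 1.15 operands. In this sketch the operands are ordinary Python integers holding the 16-bit two's-complement values; the raw product is a 2.30 number and the shifted result is a 1.31 number. The function name is hypothetical, and saturation for the one overflowing case (-1.0 times -1.0) is deliberately left out.

```python
def multiply_1_15(a: int, b: int) -> int:
    """Multiply two signed 1.15 values and return a 1.31 result.

    a and b are integers in -32768..32767 (the raw 16-bit values).
    The raw product has two sign bits (2.30); shifting left by one
    removes the redundant one. No saturation is applied.
    """
    for operand in (a, b):
        if not -32768 <= operand <= 32767:
            raise ValueError("operand is not a 16-bit signed value")
    product_2_30 = a * b
    return product_2_30 << 1


half = 0x4000  # 0.5 in 1.15
result = multiply_1_15(half, half)
print(f"raw 1.31 result: 0x{result:08X}")
print(f"as a fraction:   {result / 2**31}")  # 0.25
```

Dividing the 1.31 result by 2**31 recovers the fractional value, so 0.5 times 0.5 yields 0.25 as expected.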