Intel XScale Core Manuals
Manuals and User Guides for Intel XScale Core. We have 1 Intel XScale Core manual available for free PDF download: Developer's Manual
Intel XScale Core Developer's Manual (220 pages)
Brand: Intel | Category: Processor | Size: 3.8 MB
Table of Contents
3
1 Introduction
13
About this Document
13
How to Read this Document
13
Other Relevant Documents
14
High-Level Overview of the Intel XScale® Core
15
ARM Compatibility
15
Features
16
Multiply/Accumulate (MAC)
16
Branch Target Buffer
17
Data Cache
17
Instruction Cache
17
Memory Management
17
Debug
18
JTAG
18
Performance Monitoring
18
Power Management
18
Terminology and Conventions
19
Number Representation
19
Terminology and Acronyms
19
2 Programming Model
21
ARM Architecture Compatibility
21
ARM Architecture Implementation Options
21
Big Endian Versus Little Endian
21
26-Bit Architecture
21
Thumb
21
ARM DSP-Enhanced Instruction Set
22
Base Register Update
22
Extensions to ARM Architecture
23
DSP Coprocessor 0 (CP0)
23
Multiply with Internal Accumulate Format
24
MIAPH{<cond>} acc0, Rm, Rs
25
MIA{<cond>} acc0, Rm, Rs
25
MIAxy{<cond>} acc0, Rm, Rs
26
Internal Accumulator Access Format
27
MAR{<cond>} acc0, RdLo, RdHi
28
New
29
Second-Level Descriptors for Coarse
30
Second-Level Descriptors for Fine
30
First-Level Descriptors
30
Additions to CP15 Functionality
31
Event Architecture
32
Event Priority
32
Exception Summary
32
Prefetch Aborts
33
Encoding of Fault Status for Prefetch Aborts
33
Data Aborts
34
Encoding of Fault Status for Data Aborts
34
Events from Preload Instructions
35
Debug Events
36
3 Memory Management
37
Overview
37
Architecture Model
38
Version 4 Vs. Version 5
38
Memory Attributes
38
Cacheable (C), Bufferable (B), and Extension (X) Bits
38
Instruction Cache
38
Page (P) Attribute Bit
38
Data Cache and Write Buffer
39
Data Cache and Buffer Behavior When X = 0
39
Data Cache and Buffer Behavior When X = 1
39
Details on Data Cache and Write Buffer Behavior
40
Memory Operation Ordering
40
Exceptions
40
Memory Operations that Impose a Fence
40
Interaction of the MMU, Instruction Cache, and Data Cache
41
Valid MMU & Data/Mini-Data Cache Combinations
41
Control
42
Enabling/Disabling
42
Invalidate (Flush) Operation
42
Locking Entries
43
Round-Robin Replacement Algorithm
45
Example of Locked Entries in TLB
45
4 Instruction Cache
47
Overview
47
Instruction Cache Organization
47
Operation
48
Operation When Instruction Cache Is Enabled
48
Operation When the Instruction Cache Is Disabled
48
Fetch Policy
49
Round-Robin Replacement Algorithm
49
Parity Protection
50
Instruction Cache Coherency
51
Instruction Fetch Latency
51
Instruction Cache Control
52
Enabling/Disabling
52
Instruction Cache State at RESET
52
Invalidating the Instruction Cache
53
Locking Instructions in the Instruction Cache
54
Locked Line Effect on Round Robin Replacement
54
Unlocking Instructions in the Instruction Cache
55
5 Branch Target Buffer
57
Branch Target Buffer (BTB) Operation
57
BTB Entry
57
Reset
58
Update Policy
58
Branch History
58
BTB Control
59
Disabling/Enabling
59
Invalidation
59
6 Data Cache
61
Overviews
61
Data Cache Overview
61
Data Cache Organization
62
Mini-Data Cache Overview
63
Mini-Data Cache Organization
63
Write Buffer and Fill Buffer Overview
64
Data Cache and Mini-Data Cache Operation
65
Cache Policies
65
Cacheability
65
Operation When Caching Is Enabled
65
Operation When Data Caching Is Disabled
65
Read Miss Policy
66
Write Miss Policy
67
Write-Back Versus Write-Through
67
Atomic Accesses
68
Parity Protection
68
Round-Robin Replacement Algorithm
68
Data Cache and Mini-Data Cache Control
69
Data Memory State after Reset
69
Enabling/Disabling
69
Invalidate and Clean Operations
69
Global Clean and Invalidate Operation
70
Re-Configuring the Data Cache as Data RAM
71
Locked Line Effect on Round Robin Replacement
74
Write Buffer/Fill Buffer Operation and Control
75
7 Configuration
77
Overview
77
MRC/MCR Format
78
LDC/STC Format When Accessing CP14
79
CP15 Registers
80
Register 0: ID & Cache Type Registers
81
ID Register
81
Cache Type Register
82
Register 1: Control & Auxiliary Control Registers
83
ARM* Control Register
83
Auxiliary Control Register
84
Register 2: Translation Table Base Register
85
Register 3: Domain Access Control Register
85
Register 4: Reserved
85
Translation Table Base Register
85
Domain Access Control Register
85
Register 5: Fault Status Register
86
Register 6: Fault Address Register
86
Register 7: Cache Functions
87
Register 8: TLB Operations
89
TLB Functions
89
Register 9: Cache Lock down
90
Cache Lockdown Functions
90
Data Cache Lock Register
90
Register 10: TLB Lock down
91
Register 11-12: Reserved
91
Register 13: Process ID
91
TLB Lockdown Functions
91
Accessing Process ID
91
Process ID Register
91
The PID Register Effect on Addresses
92
Register 14: Breakpoint Registers
93
Accessing the Debug Registers
93
Register 15: Coprocessor Access Register
94
Coprocessor Access Register
95
CP14 Registers
96
Performance Monitoring Registers
96
XSC1 Performance Monitoring Registers
96
Accessing the XSC1 Performance Monitoring Registers
96
XSC2 Performance Monitoring Registers
97
Accessing the XSC2 Performance Monitoring Registers
97
Clock and Power Management Registers
98
PWRMODE Register
98
Clock and Power Management
98
CCLKCFG Register
98
Software Debug Registers
99
Accessing the Debug Registers
99
8 Performance Monitoring
101
Overview
101
XSC1 Register Description (2 Counter Variant)
102
Clock Counter (CCNT; CP14 - Register 1)
102
XSC1 Performance Monitoring Registers
102
Clock Count Register (CCNT)
102
Performance Count Registers (PMN0 - PMN1; CP14 - Register 2 and 3, Respectively)
103
Extending Count Duration Beyond 32 Bits
103
Performance Monitor Control Register (PMNC)
103
Performance Monitor Count Register (PMN0 and PMN1)
103
Performance Monitor Control Register (CP14, Register 0)
104
Managing PMNC
105
XSC2 Register Description (4 Counter Variant)
106
Clock Counter (CCNT)
106
Performance Monitoring Registers
106
Clock Count Register (CCNT)
106
Performance Count Registers (PMN0 - PMN3)
107
Performance Monitor Count Register (PMN0 - PMN3)
107
Performance Monitor Control Register (PMNC)
108
Interrupt Enable Register (INTEN)
109
Overflow Flag Status Register (FLAG)
110
Event Select Register (EVTSEL)
111
Managing the Performance Monitor
112
Performance Monitoring Events
113
Some Common Uses of the PMU
114
Instruction Cache Efficiency Mode
115
Data Cache Efficiency Mode
115
Instruction Fetch Latency Mode
115
Data/Bus Request Buffer Full Mode
116
Stall/Writeback Statistics
116
Instruction TLB Efficiency Mode
117
Data TLB Efficiency Mode
117
Multiple Performance Monitoring Run Statistics
118
Examples
119
XSC1 Example (2 Counter Variant)
119
XSC2 Example (4 Counter Variant)
120
9 Software Debug
121
Definitions
121
Debug Registers
121
Introduction
122
Halt Mode
122
Monitor Mode
122
Debug Control and Status Register (DCSR)
123
Global Enable Bit (GE)
124
Halt Mode Bit (H)
124
SOC Break (B)
124
Vector Trap Bits (TF,TI,TD,TA,TS,TU,TR)
125
Sticky Abort Bit (SA)
125
Method of Entry Bits (MOE)
125
Trace Buffer Mode Bit (M)
125
Trace Buffer Enable Bit (E)
125
Debug Exceptions
126
Event Priority
126
Halt Mode
127
Halt Mode R14_DBG Updating
127
Monitor Mode
129
Monitor Mode R14_DBG Updating
129
HW Breakpoint Resources
130
Instruction Breakpoints
130
Instruction Breakpoint Address and Control Register (IBCRx)
130
Data Breakpoints
131
Data Breakpoint Register (DBRx)
131
Data Breakpoint Controls Register (DBCON)
131
Software Breakpoints
133
Transmit/Receive Control Register (TXRXCTRL)
134
TX RX Control Register (TXRXCTRL)
134
RX Register Ready Bit (RR)
135
Normal RX Handshaking
135
High-Speed Download Handshaking States
135
Overflow Flag (OV)
136
Download Flag (D)
136
TX Register Ready Bit (TR)
137
Conditional Execution Using TXRXCTRL
137
TX Handshaking
137
TXRXCTRL Mnemonic Extensions
137
Transmit Register (TX)
138
Receive Register (RX)
138
TX Register
138
RX Register
138
Debug JTAG Access
139
SELDCSR JTAG Register
139
DCSR (DBG_SR[34:3])
140
Ext_Dbg_Break
140
Hold_Reset
140
DBGTX JTAG Register
141
DBG_SR[0]
141
TX (DBG_SR[34:3])
141
DBGRX JTAG Register
142
DBG_SR[0]
143
Flush_Rr
143
Hs_Download
143
RX (DBG_SR[34:3])
143
RX Write Logic
143
Rx_Valid
144
Trace Buffer
145
Trace Buffer Registers
145
Checkpoint Registers
146
Trace Buffer Register (TBREG)
147
TBREG Format
147
Trace Buffer Entries
148
Message Byte
148
Message Byte Formats
148
Exception Message Byte
149
Non-Exception Message Byte
150
Address Bytes
151
Indirect Branch Entry Address Byte Organization
151
Trace Buffer Usage
152
High Level View of Trace Buffer
152
Downloading Code in the Instruction Cache
154
Mini Instruction Cache Overview
154
LDIC JTAG Command
155
LDIC JTAG Data Register
155
LDIC JTAG Data Register Hardware
155
LDIC Cache Functions
156
Format of LDIC Cache Functions
157
Loading Instruction Cache During Reset
158
Code Download During a Cold Reset for Debug
158
Steps for Loading Mini Instruction Cache During Reset
159
Dynamically Loading Instruction Cache after Reset
160
Downloading Code in IC During Program Execution
160
Steps for Dynamically Loading the Mini Instruction Cache
161
Dynamic Download Synchronization Code
162
10 Performance Considerations
163
Interrupt Latency
163
Branch Prediction
164
Addressing Modes
164
Branch Latency Penalty
164
Instruction Latencies
165
Performance Terms
165
Latency Example
166
Branch Instruction Timings
167
Data Processing Instruction Timings
167
Multiply Instruction Timings
168
Multiply Implicit Accumulate Instruction Timings
169
Implicit Accumulator Access Instruction Timings
169
Saturated Arithmetic Instructions
170
Status Register Access Instructions
170
Saturated Data Processing Instruction Timings
170
Status Register Access Instruction Timings
170
Load/Store Instructions
171
Semaphore Instructions
171
Load and Store Instruction Timings
171
Load and Store Multiple Instruction Timings
171
Semaphore Instruction Timings
171
Coprocessor Instructions
172
Miscellaneous Instruction Timing
172
CP15 Register Access Instruction Timings
172
CP14 Register Access Instruction Timings
172
Exception-Generating Instruction Timings
172
Count Leading Zeros Instruction Timings
172
Thumb Instructions
173
Optimization Guide
175
Introduction
175
About this Guide
175
The Intel XScale® Core Pipeline
176
General Pipeline Characteristics
176
Number of Pipeline Stages
176
A-1 Pipelines and Pipe Stages
177
The Intel XScale® Core Pipeline Organization
177
Out of Order Completion
178
Register Scoreboarding
178
Use of Bypassing
178
Instruction Flow through the Pipeline
179
ARM* V5TE Instruction Execution
179
Pipeline Stalls
179
Main Execution Pipeline
180
F1 / F2 (Instruction Fetch) Pipestages
180
ID (Instruction Decode) Pipestage
180
RF (Register File / Shifter) Pipestage
181
WB (Write-Back)
181
X1 (Execute) Pipestages
181
X2 (Execute 2) Pipestage
181
Memory Pipeline
182
D1 and D2 Pipestage
182
Multiply/Multiply Accumulate (MAC) Pipeline
182
Behavioral Description
182
Basic Optimizations
183
Conditional Instructions
183
Optimizing Condition Checks
183
Optimizing Branches
184
Optimizing Complex Expressions
186
Bit Field Manipulation
187
Optimizing the Use of Immediate Values
188
Optimizing Integer Multiply and Divide
189
Effective Use of Addressing Modes
190
Cache and Prefetch Optimizations
191
Instruction Cache
191
Cache Miss Cost
191
Code Placement to Reduce Cache Misses
191
Round-Robin Replacement Cache Policy
191
Locking Code into the Instruction Cache
192
Data and Mini Cache
193
Non Cacheable Regions
193
Write-Through and Write-Back Cached Memory Regions
193
Creating On-Chip RAM
194
Read Allocate and Read-Write Allocate Memory Regions
194
Mini-Data Cache
195
Data Alignment
196
Literal Pools
197
Cache Considerations
198
Cache Conflicts, Pollution and Pressure
198
Memory
198
Prefetch Considerations
199
Compute Vs. Data Bus Bound
199
Prefetch Distances
199
Prefetch Loop Limitations
199
Prefetch Loop Scheduling
199
Bandwidth Limitations
200
Low Number of Iterations
200
Cache Memory Considerations
201
Cache Blocking
203
Prefetch Unrolling
203
Pointer Prefetch
204
Loop Fusion
205
Loop Interchange
205
Prefetch to Reduce Register Pressure
206
Instruction Scheduling
207
Scheduling Loads
207
Scheduling Load and Store Double (LDRD/STRD)
210
Scheduling Load and Store Multiple (LDM/STM)
211
Scheduling Data Processing Instructions
212
Scheduling Multiply Instructions
213
Scheduling SWP and SWPB Instructions
214
Scheduling the MRA and MAR Instructions (MRRC/MCRR)
215
Scheduling the MIA and MIAPH Instructions
216
Scheduling CP15 Coprocessor Instructions
217
Scheduling MRS and MSR Instructions
217
Optimizing C Libraries
218
Optimizations for Size
218
Multiple Word Load and Store
218
Space/Performance Trade off
218
Use of Conditional Instructions
218
Use of PLD Instructions
218
Test Features
219
Overview
219