X2 (Execute 2) Pipestage; Xwb (Write-Back; Memory Pipeline; D1 And D2 Pipestage - Intel PXA255 User Manual

Xscale microarchitecture

Optimization Guide

cancelled, and will not cause any architectural state changes, including modifications of registers, memory, and PSR.

• Branch target determination - If a branch was mispredicted by the BTB, the X1 pipestage flushes all of the instructions in the previous pipestages and sends the branch target address to the BTB, which will restart the pipeline.

A.2.3.5. X2 (Execute 2) Pipestage

The X2 pipestage contains the program status registers (PSRs). This pipestage selects what will be written to the RFU in the XWB cycle: the PSRs (MRS instruction), the ALU output, or other items.

A.2.3.6. XWB (Write-Back)

An instruction is considered complete once it reaches the write-back stage, where its changes are written to the RFU.

A.2.4 Memory Pipeline

The memory pipeline consists of two stages, D1 and D2. The data cache unit (DCU) consists of the data-cache array, mini-data cache, fill buffers, and write buffers. The memory pipeline handles only load and store instructions.

A.2.4.1. D1 and D2 Pipestages

Operation begins in D1 after the X1 pipestage has calculated the effective address for a load or store. The data cache and mini-data cache return the destination data in the D2 pipestage. Before data is returned in the D2 pipestage, sign extension and byte alignment occur for byte and half-word loads.
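The byte alignment and sign extension described above can be sketched in software. The following is a minimal C model, not a description of the actual hardware: the function name `d2_align_and_extend`, its parameters, and the little-endian byte order are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical software model of the D2 fix-up for byte and half-word loads.
 * Assumes little-endian byte order within the 32-bit word read from the cache.
 *
 * cache_word: the aligned 32-bit word containing the requested data
 * addr:       the effective address (only the low two bits are used)
 * size_bytes: 1 for a byte load, 2 for a half-word load
 * is_signed:  nonzero for a sign-extending load, zero for zero-extending
 */
static int32_t d2_align_and_extend(uint32_t cache_word, uint32_t addr,
                                   unsigned size_bytes, int is_signed)
{
    assert(size_bytes == 1 || size_bytes == 2);

    unsigned bits  = size_bytes * 8u;
    /* Byte alignment: locate the addressed lane inside the word. */
    unsigned shift = (addr & 3u & ~(size_bytes - 1u)) * 8u;
    uint32_t mask  = (1u << bits) - 1u;
    int32_t value  = (int32_t)((cache_word >> shift) & mask);

    if (is_signed) {
        /* Sign extension: flip the sign bit, then subtract its weight. */
        int32_t sign = (int32_t)1 << (bits - 1u);
        value = (value ^ sign) - sign;
    }
    return value;
}
```

For example, with the word `0x8001FF7F`, a signed byte load from offset 1 picks out `0xFF` and yields -1, while an unsigned half-word load from offset 0 yields `0xFF7F`.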

A.2.5 Multiply/Multiply Accumulate (MAC) Pipeline

The Multiply-Accumulate (MAC) unit executes all multiply and multiply-accumulate instructions supported by the Intel® XScale™ core. The MAC implements the 40-bit Intel® XScale™ core accumulator register acc0 and handles the instructions that transfer its value to and from general-purpose ARM* registers.
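Because acc0 is 40 bits wide and the general-purpose registers are 32 bits wide, a transfer needs two registers. The C sketch below illustrates one way to picture this. It assumes the value is split into a low 32-bit word and a high word holding bits 39:32 sign-extended; that split, the `reg_pair` type, and both function names are hypothetical and are not taken from this section.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair of 32-bit general-purpose registers. */
typedef struct {
    uint32_t lo;  /* assumed to hold accumulator bits 31:0 */
    uint32_t hi;  /* assumed to hold bits 39:32, sign-extended to 32 bits */
} reg_pair;

/* Split a 40-bit signed accumulator value into two 32-bit registers. */
static reg_pair acc_to_regs(int64_t acc40)
{
    /* The value must fit in 40-bit two's complement. */
    assert(acc40 >= -((int64_t)1 << 39) && acc40 < ((int64_t)1 << 39));

    uint64_t u = (uint64_t)acc40;
    reg_pair r = { (uint32_t)u, (uint32_t)(u >> 32) };
    return r;
}

/* Rebuild the 40-bit signed value; only the low 8 bits of hi are used. */
static int64_t regs_to_acc(reg_pair r)
{
    const int64_t sign = (int64_t)1 << 39;
    uint64_t u = ((uint64_t)(r.hi & 0xFFu) << 32) | r.lo;
    return ((int64_t)u ^ sign) - sign;
}
```

Under these assumptions, the most negative 40-bit value maps to `lo = 0` and `hi = 0xFFFFFF80`, and converting back recovers the original value.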

The following are important characteristics of the MAC:

• The MAC is not truly pipelined, as the processing of a single instruction may require use of the same datapath resources for several cycles before a new instruction can be accepted. The type of instruction and source arguments determine the number of cycles required.

• No more than two instructions can occupy the MAC pipeline concurrently.

• When the MAC is processing an instruction, another instruction may not enter M1 unless the original instruction completes in the next cycle.

• The MAC unit can operate on 16-bit packed signed data. This reduces register pressure and memory traffic. Two 16-bit data items can be loaded into a register with one LDR.

• The MAC can achieve a throughput of one multiply per cycle when performing a 16-by-32-bit multiply.
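The packed 16-bit data layout mentioned in the list above can be illustrated with a small C model. This is a sketch under stated assumptions, not the behavior of any particular instruction: the helper names (`pack16`, `lane`, `wrap40`, `packed_dot`) are hypothetical, and wrapping the running sum to 40 bits is an assumption made only to mirror the width of acc0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack two signed 16-bit items into one 32-bit word (lo in bits 15:0). */
static uint32_t pack16(int16_t lo, int16_t hi)
{
    return (uint32_t)(uint16_t)lo | ((uint32_t)(uint16_t)hi << 16);
}

/* Extract one signed 16-bit lane: hi == 0 selects bits 15:0, else 31:16. */
static int32_t lane(uint32_t word, int hi)
{
    int32_t v = (int32_t)((word >> (hi ? 16 : 0)) & 0xFFFFu);
    return (v ^ 0x8000) - 0x8000;  /* sign-extend from 16 bits */
}

/* Wrap a value to 40-bit two's complement (assumed accumulator width). */
static int64_t wrap40(int64_t x)
{
    const int64_t sign = (int64_t)1 << 39;
    uint64_t u = (uint64_t)x & (((uint64_t)1 << 40) - 1u);
    return ((int64_t)u ^ sign) - sign;
}

/* Dot product over packed words: each 32-bit load supplies two operands. */
static int64_t packed_dot(const uint32_t *a, const uint32_t *b, size_t n_words)
{
    assert(a != NULL && b != NULL);

    int64_t acc = 0;
    for (size_t i = 0; i < n_words; i++) {
        acc += (int64_t)lane(a[i], 0) * lane(b[i], 0)
             + (int64_t)lane(a[i], 1) * lane(b[i], 1);
        acc = wrap40(acc);
    }
    return acc;
}
```

With this layout, an array of n 16-bit samples occupies n/2 words, so the loop issues half as many loads and keeps half as many registers live as it would with one sample per register.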

Intel® XScale™ Microarchitecture User's Manual