Performance Considerations; Branch Prediction - Intel PXA255 User Manual

XScale microarchitecture


Performance Considerations

This chapter describes performance considerations that compiler writers, application programmers, and system designers need to be aware of to use the Intel® XScale™ core efficiently. The performance topics discussed here include branch prediction and instruction latencies.

The timings in this section are specific to the PXA255 processor and to how it implements the ARM* v5TE architecture. This is not a summary of all possible optimizations, nor is it an explanation of the ARM* v5TE instruction set. For information on instruction definitions and behavior, consult the ARM* Architecture Reference Manual.

11.1 Branch Prediction

The Intel® XScale™ core implements dynamic branch prediction for the ARM* instructions B and BL and for the Thumb instruction B. Any instruction that specifies the PC as the destination is predicted as not taken and is not entered into the branch target buffer (BTB). For example, an LDR or a MOV that loads or moves directly to the PC is predicted not taken and incurs a branch latency penalty.

The instructions B and BL (including Thumb) are entered into the branch target buffer when they are taken for the first time. A branch is taken when its condition evaluates to true. Once a branch is in the branch target buffer, the Intel® XScale™ core dynamically predicts its outcome based on previous outcomes.
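The allocation rule above (a branch enters the branch target buffer the first time it is taken, and is then predicted from its history) can be illustrated with a toy model. This is only a sketch: the text does not describe the prediction algorithm, so the 2-bit saturating counter, its initial state, the dictionary keyed by branch address, and every name below are assumptions made for illustration.

```python
# Toy model of the branch target buffer (BTB) behaviour described above.
# ASSUMPTIONS (not taken from the manual text): the 2-bit saturating
# counter, its initial value, the unbounded dictionary keyed by branch
# address, and all class/function names are hypothetical.


class ToyBTB:
    def __init__(self):
        # Maps branch address -> counter in 0..3; >= 2 means "predict taken".
        self.entries = {}

    def predict(self, addr):
        """Return True if the branch at addr is predicted taken."""
        counter = self.entries.get(addr)
        if counter is None:
            # A branch that is not in the BTB is treated as not taken.
            return False
        return counter >= 2

    def update(self, addr, taken):
        """Record the actual outcome of the branch at addr."""
        if addr not in self.entries:
            if taken:
                # A B/BL enters the BTB the first time it is taken.
                self.entries[addr] = 2  # assumed initial state
            return
        counter = self.entries[addr]
        if taken:
            self.entries[addr] = min(counter + 1, 3)
        else:
            self.entries[addr] = max(counter - 1, 0)


def count_mispredictions(btb, addr, outcomes):
    """Feed a sequence of actual outcomes to the model and count misses."""
    misses = 0
    for taken in outcomes:
        if btb.predict(addr) != taken:
            misses += 1
        btb.update(addr, taken)
    return misses
```

Under these assumptions, a loop-closing branch that is taken nine times and then falls through is mispredicted twice: once on its first taken execution, before it is in the buffer, and once on loop exit.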

Table 11-1 shows the branch latency penalty when these instructions are correctly predicted and when they are not. A penalty of zero for correct prediction means that the Intel® XScale™ core can execute the next instruction in the program flow in the cycle following the branch.

Table 11-1. Branch Latency Penalty

Core Clock Cycles
ARM*    Thumb    Description
+0      +0       Predicted Correctly. The instruction matches in the branch target buffer and is correctly predicted.
+4      +5       Mispredicted. There are three occurrences of branch misprediction, all of which incur a 4-cycle branch delay penalty.
                 1. The instruction is in the branch target buffer and is predicted not-taken, but is actually taken.
                 2. The instruction is not in the branch target buffer and is a taken branch.
                 3. The instruction is in the branch target buffer and is predicted taken, but is actually not-taken.

Intel® XScale™ Microarchitecture User's Manual
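The cases in Table 11-1 can be written out as a small lookup, which may help when estimating branch cost by hand. This is a sketch, not vendor code: the function and parameter names are hypothetical, the misprediction penalties are assumed to be 4 core clock cycles in ARM state and 5 in Thumb state, and the zero penalty for a not-taken branch that is absent from the branch target buffer is an assumption, since the table does not list that case.

```python
# Branch latency penalty lookup following the cases of Table 11-1.
# ASSUMPTIONS: names are hypothetical; the penalties are taken to be
# 4 cycles (ARM) and 5 cycles (Thumb) for a misprediction.
MISPREDICT_PENALTY = {"arm": 4, "thumb": 5}


def branch_penalty(mode, in_btb, predicted_taken, actually_taken):
    """Return the extra core clock cycles incurred by one B/BL instruction.

    mode            -- "arm" or "thumb"
    in_btb          -- True if the branch has an entry in the BTB
    predicted_taken -- the BTB prediction (ignored when in_btb is False)
    actually_taken  -- the real outcome of the branch
    """
    if mode not in MISPREDICT_PENALTY:
        raise ValueError("mode must be 'arm' or 'thumb'")

    if not in_btb:
        # Case 2: a taken branch that is not in the BTB is a misprediction.
        # ASSUMPTION: a not-taken branch that is not in the BTB simply
        # falls through with no penalty (not listed in the table).
        return MISPREDICT_PENALTY[mode] if actually_taken else 0

    if predicted_taken == actually_taken:
        # Predicted correctly: the next instruction executes in the
        # cycle following the branch.
        return 0

    # Cases 1 and 3: the entry is in the BTB but the prediction was wrong.
    return MISPREDICT_PENALTY[mode]
```

For example, `branch_penalty("thumb", True, True, False)` corresponds to case 3 of the table (predicted taken, actually not-taken) in Thumb state.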
