HP 124708-001 - ProLiant Cluster - 1850 Introduction Manual page 8

The Intel processor roadmap for industry-standard servers technology brief, 10th edition

Because multiprocessing operating systems such as Microsoft Windows and Linux are designed to
divide their workload into threads that can be scheduled independently, these operating systems can
send two distinct threads through execution in the same physical processor at the same time. This
provides an opportunity for parallelism at a higher level of abstraction: at the thread level, rather
than only at the instruction level, as in the Pentium 4 design. Table 3 illustrates instruction-level
parallelism, which takes advantage of opportunities in a single instruction stream to execute
independent instructions at the same time. Thread-level parallelism, shown in Table 4, takes this a
step further: two independent instruction streams are available, which creates more opportunities
for simultaneous execution.
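The dependency reasoning that Tables 3 and 4 illustrate can be sketched in a few lines of Python. This is a minimal model, not a description of the actual Pentium 4 scheduler: it assumes unlimited execution units and one cycle per instruction, and the function name `earliest_cycles` is hypothetical. For the second thread, only "2b waits for 1b" is stated in the brief; the other dependencies are assumptions inferred from the operands.

```python
def earliest_cycles(deps):
    """Return the earliest cycle in which each instruction can execute.

    deps maps an instruction label to the labels it must wait for.
    Simplifying assumptions: unlimited execution units, one cycle per
    instruction, and no resource conflicts between threads.
    """
    cycles = {}

    def cycle_of(label):
        # An instruction runs one cycle after its slowest dependency.
        if label not in cycles:
            cycles[label] = 1 + max((cycle_of(d) for d in deps[label]), default=0)
        return cycles[label]

    for label in deps:
        cycle_of(label)
    return cycles


# Dependencies as described for Table 3 (thread 1 of Table 4).
thread_1 = {"1a": [], "2a": [], "3a": [], "4a": ["1a", "2a"], "5a": ["4a"]}

# Thread 2 of Table 4. Only "2b waits for 1b" comes from the text;
# the edges for 4b and 5b are assumptions based on the operands.
thread_2 = {"1b": [], "2b": ["1b"], "3b": [], "4b": ["2b", "3b"], "5b": ["4b"]}

ilp = earliest_cycles(thread_1)                    # one instruction stream
tlp = earliest_cycles({**thread_1, **thread_2})    # two independent streams

print("Thread 1 alone:", ilp)
print("Cycles for both threads together:", max(tlp.values()))
```

Because no instruction in one thread depends on an instruction in the other, merging the two dependency maps adds no new edges. Under these assumptions the ten instructions finish in four cycles, where strictly sequential execution would need ten.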
The performance gain from adding HT Technology does not equal the gain expected from adding a
second physical processor or processor core. The overhead of maintaining the threads and the need
to share processor resources limit HT Technology performance.
Nevertheless, HT Technology was a valuable and cost-effective addition to the Pentium 4 design.
Table 3. Example of instruction-level parallelism

Instruction  Instruction       Instruction execution
number       thread
1            Read register A   Operations 1, 2, and 3 are independent and can execute
2            Write register B  simultaneously if resources permit.
3            Read register C
4            Add A + B         This operation must wait for instructions 1 and 2 to
                               complete, but it can execute in parallel with operation 3.
5            Inc A             This operation needs to wait for the completion of
                               instruction 4 before executing.

Table 4. Example of thread-level parallelism

Thread 1                       Thread 2
Instruction  Instruction       Instruction  Instruction
number       thread            number       thread
1a           Read register A   1b           Add D + E
2a           Write register B  2b           Inc E
3a           Read register C   3b           Read F
4a           Add A + B         4b           Add E + F
5a           Inc A             5b           Write E

Instruction execution: None of the instructions in Thread 2 depend on those in Thread 1;
therefore, to the extent that execution units are available, any of them can execute in parallel
with those in Thread 1. As an example, instruction 2b must wait for instruction 1b, but does not
need to wait for 1a. Similarly, if two arithmetic units are available, 4a and 4b can execute at
the same time.

According to Intel's simulations, HT Technology achieves its objective of improving the
microarchitecture utilization rate significantly. Improved performance is the real goal though, and Intel
reports that the performance gain can be as high as 30 percent.
The performance gained by these design changes is limited by the fact that two threads now share
and compete for processor resources, such as the execution pipeline and Level 1 (L1) and L2 caches.
There is some risk that data needed by one thread can be replaced in a cache by data that the other
is using, resulting in a higher turnover of cache data (referred to as thrashing) and a reduced hit rate.
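The cache-sharing risk described above can be illustrated with a toy simulation. This is only a sketch under stated assumptions: it models a tiny direct-mapped cache with four lines, whereas real processor caches are typically set-associative and far larger, and the access patterns and the name `hit_rate` are hypothetical, chosen so that both threads map to the same cache lines.

```python
def hit_rate(accesses, num_lines=4):
    """Simulate a direct-mapped cache and return the fraction of hits.

    Each block address maps to line (address % num_lines); a miss
    replaces whatever block currently occupies that line.
    """
    lines = [None] * num_lines
    hits = 0
    for addr in accesses:
        index = addr % num_lines
        if lines[index] == addr:
            hits += 1
        else:
            lines[index] = addr  # evict the previous occupant
    return hits / len(accesses)


# Hypothetical workloads: each thread loops over four block addresses.
thread_a = [0, 1, 2, 3] * 8
thread_b = [4, 5, 6, 7] * 8  # maps to the same four lines as thread_a

# Thread A running alone: only the first pass over its data misses.
alone = hit_rate(thread_a)

# Both threads sharing the cache, with accesses strictly interleaved.
interleaved = [addr for pair in zip(thread_a, thread_b) for addr in pair]
shared = hit_rate(interleaved)

print(f"hit rate alone:  {alone:.3f}")
print(f"hit rate shared: {shared:.3f}")
```

In this deliberately worst-case pattern each thread evicts the other's data before it can be reused, so the shared hit rate collapses. Real workloads fall somewhere between the two extremes.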