Collective Acceleration Unit (CAU) - IBM Power Systems 775 Manual

For AIX and Linux HPC Solution

1.4.3 Collective acceleration unit
The hub chip provides specialized hardware that is called the Collective Acceleration Unit
(CAU) to accelerate frequently used collective operations.
Collective operations
Collective operations are distributed operations that operate across a tree. In many HPC
applications that perform collective operations, the application can make forward progress
only after every compute node has completed its contribution and the results of the collective
operation have been delivered back to every compute node (for example, barrier
synchronization and global sum).
A specialized arithmetic-logic unit (ALU) within the CAU implements barrier and reduction
operations. For reduce operations, the ALU supports the following operations and data
types:
Fixed point: NOP, SUM, MIN, MAX, OR, AND, and XOR (signed and unsigned)
Floating point: MIN, MAX, SUM, and PROD (single and double precision)
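The reduce operations listed above can be pictured with a small software model. The following is a minimal sketch in Python, not the CAU hardware interface: the table names and the alu_reduce helper are hypothetical, NOP is left out because its semantics are not described here, and Python integers and floats only stand in for the fixed-point and floating-point types.

```python
import operator
from functools import reduce

# Illustrative software model only; these names are assumptions,
# not part of any IBM interface.
FIXED_POINT_OPS = {
    "SUM": operator.add,
    "MIN": min,
    "MAX": max,
    "OR": operator.or_,
    "AND": operator.and_,
    "XOR": operator.xor,
}

FLOATING_POINT_OPS = {
    "MIN": min,
    "MAX": max,
    "SUM": operator.add,
    "PROD": operator.mul,
}


def alu_reduce(op_name, contributions, table):
    """Combine contributions pairwise, left to right, with the named operation."""
    return reduce(table[op_name], contributions)


# Example: a global sum and a bitwise XOR over three contributions.
global_sum = alu_reduce("SUM", [1, 2, 3], FIXED_POINT_OPS)
parity = alu_reduce("XOR", [0b1100, 0b1010, 0b0001], FIXED_POINT_OPS)
```

Each operation takes two inputs and yields one output of the same type, which is what allows a tree of units to combine partial results step by step.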
There is one CAU in each hub chip, which is one CAU per four POWER7 chips, or one CAU
per 32 C1 cores.
Software organizes the CAUs in the system into collective trees. For a broadcast operation,
the arrival of an input on one link causes it to be forwarded on all other links. For a reduce
operation, arrivals on all but one link cause the reduction result to be forwarded on the
remaining link.
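The forwarding rule for one tree node can be sketched as follows. This is a hypothetical Python model, assuming a node with at least two named links and a two-argument reduction function; the TreeNode class and its method names are illustrative and do not correspond to a real CAU programming interface.

```python
class TreeNode:
    """Hypothetical model of one CAU's role in a single collective tree."""

    def __init__(self, links, op):
        self.links = list(links)  # names of the tree links attached to this node
        self.op = op              # two-argument reduction function
        self.pending = {}         # link -> value received for the current reduce

    def on_broadcast(self, from_link, value):
        """Forward a broadcast input on every link except the one it arrived on."""
        return {link: value for link in self.links if link != from_link}

    def on_reduce(self, from_link, value):
        """Record an input; once all but one link have reported, emit the result."""
        self.pending[from_link] = value
        if len(self.pending) < len(self.links) - 1:
            return {}  # still waiting for inputs on other links
        (remaining,) = [link for link in self.links if link not in self.pending]
        # Combine in fixed link order so the evaluation order is repeatable.
        result = None
        for link in self.links:
            if link in self.pending:
                v = self.pending[link]
                result = v if result is None else self.op(result, v)
        self.pending = {}
        return {remaining: result}


node = TreeNode(["a", "b", "c"], lambda x, y: x + y)
broadcast_out = node.on_broadcast("a", 7)  # forwarded on "b" and "c"
first = node.on_reduce("a", 1)             # nothing forwarded yet
second = node.on_reduce("c", 2)            # sum forwarded on "b"
```

In this model, the link that has not yet delivered an input is treated as the direction of the tree root for that reduction, so the same node can serve trees rooted in different places.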
A link in the CAU tree maps to a path composed of more than one link in the network. The
system supports many trees simultaneously, and each CAU supports 64 independent trees.
The use of sequence numbers and a retransmission protocol enables reliability and
pipelining. Each tree has only one participating HFI window on any involved node. The order
in which the reduction operation is evaluated is preserved from one run to another, which
benefits programming models, such as MPI, that allow programmers to require that collective
operations are executed in a particular order.
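One reason a fixed evaluation order matters is a general property of floating-point arithmetic rather than anything specific to the CAU: floating-point addition is not associative, so the same contributions summed in a different order can give a different result. The sketch below illustrates this in Python with double-precision values; the ordered_sum helper and the sample values are chosen only for illustration.

```python
def ordered_sum(xs):
    """Add values strictly left to right, as a fixed reduction order would."""
    total = 0.0
    for x in xs:
        total += x
    return total


# The same four contributions in two different arrival orders.
order_a = [1e16, 1.0, 1.0, -1e16]
order_b = [1.0, 1.0, 1e16, -1e16]

sum_a = ordered_sum(order_a)  # each 1.0 is lost to rounding next to 1e16
sum_b = ordered_sum(order_b)  # the two 1.0 values are combined first and survive

print(sum_a, sum_b)  # prints "0.0 2.0"
```

Because the order of evaluation is preserved from run to run, a reduction over the same inputs yields the same result each time, which makes results easier to reproduce and compare.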
Packet propagation
As shown in Figure 1-7 on page 14, a CAU receives packets from the following sources:
The memory of a remote node, inserted into the cluster network by the HFI of the remote
node
The memory of a local node, inserted into the cluster network by the HFI of the local
node
A remote CAU
Chapter 1. Understanding the IBM Power Systems 775 Cluster