High performance switch tuning and debug guide

2.1.5 MP_TASK_AFFINITY

Setting MP_TASK_AFFINITY to SNI tells the Parallel Operating Environment (POE) to bind each
task to the MCM containing the HPS adapter it will use, so that the adapter, CPU, and memory
used by any task are all local to the same MCM. To prevent multiple tasks from sharing the same
CPU, do not set MP_TASK_AFFINITY to SNI if more than four tasks share any HPS adapter.
In that case, set MP_TASK_AFFINITY to MCM instead, which allows each MPI task to use CPUs
and memory from the same MCM, even if the adapter it uses is attached to a remote MCM. If
MP_TASK_AFFINITY is set to either MCM or SNI, MEMORY_AFFINITY should also be set
to MCM.
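
The sketch below is not from this guide; it is a minimal, illustrative check that each MPI task can
run to report the affinity-related variables it was started with, which can help confirm that the
settings above reached every task. The variable names are the ones discussed here; everything
else in the program is an assumption made for the example.

    /* Illustrative sketch: report the affinity settings each task sees. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Read the environment as inherited from POE. */
        const char *task_aff = getenv("MP_TASK_AFFINITY");
        const char *mem_aff  = getenv("MEMORY_AFFINITY");

        printf("task %d: MP_TASK_AFFINITY=%s MEMORY_AFFINITY=%s\n",
               rank,
               task_aff ? task_aff : "(unset)",
               mem_aff  ? mem_aff  : "(unset)");

        MPI_Finalize();
        return 0;
    }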

2.1.6 MP_CSS_INTERRUPT

The MP_CSS_INTERRUPT variable allows you to control the interrupts triggered by packet arrivals.
Setting this variable to no means that the application runs in polling mode. This setting is
appropriate for applications whose communication is mostly synchronous. Even applications that
make heavy use of MPI_ISEND/MPI_IRECV should be considered synchronous unless there is
significant computation between the ISEND/IRECV postings and the MPI_WAITALL. The
default value for MP_CSS_INTERRUPT is no.
For applications with an asynchronous communication pattern (one that uses non-blocking MPI
calls), it might be more appropriate to set this variable to yes. Setting MP_CSS_INTERRUPT to
yes can cause your application to be interrupted when new packets arrive, which can be
helpful if a receiving MPI task is likely to be in the middle of a long numerical computation at the
time when data from a remote blocking send arrives.
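
The following C sketch illustrates the asynchronous pattern described above; the ring of
neighbor ranks, the message size, and the placeholder computation are assumptions made for the
example. Because a long computation separates the non-blocking postings from MPI_WAITALL,
the receiving task is unlikely to be inside an MPI call when remote data arrives, which is the
situation in which MP_CSS_INTERRUPT=yes can help the transfer make progress.

    #include <mpi.h>

    #define N 100000   /* hypothetical message size in doubles */

    static double recvbuf[N], sendbuf[N];

    /* Placeholder for a long numerical computation. */
    static void long_computation(void) { /* ... */ }

    int main(int argc, char *argv[])
    {
        MPI_Init(&argc, &argv);

        int rank, ntasks;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &ntasks);

        /* Hypothetical ring of neighbors. */
        int right = (rank + 1) % ntasks;
        int left  = (rank + ntasks - 1) % ntasks;

        MPI_Request req[2];
        MPI_Irecv(recvbuf, N, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0]);
        MPI_Isend(sendbuf, N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[1]);

        /* Significant work between the postings and the wait: while this
           runs, an arriving packet can only be serviced promptly if
           interrupts are enabled. */
        long_computation();

        MPI_Waitall(2, req, MPI_STATUSES_IGNORE);

        MPI_Finalize();
        return 0;
    }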

2.2 MPI-IO

The most effective use of MPI-IO is when an application takes advantage of file views and
collective operations to read or write a file in which each task's data is dispersed across the file.
To simplify the discussion we focus on reads, but writes are similar.
An example is reading a matrix with application-wide scope from a single file, with each task
needing a different fragment of that matrix. To bring in the fragment needed by each task,
several disjoint chunks must be read. If every task were to do a POSIX read() of each chunk, the
GPFS file system would handle it correctly. However, because each read() is independent, there is
little chance to apply an effective strategy.
When the same set of reads is done with collective MPI-IO, every task specifies all the chunks it
needs in a single MPI-IO call. Because the call is collective, the requirements of all the tasks are
known at one time. As a result, MPI can use a broad strategy for doing the I/O.
When MPI-IO is used but each call to read or write a file is local or specifies only a single chunk
of data, there is much less chance for MPI-IO to do anything more than a simple POSIX read()
would do. Also, when the file is organized by task rather than globally, there is less that MPI-IO
can do to help. This is the case when each task's fragment of the matrix is stored contiguously in
the file rather than the matrix being stored as a whole.
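
As an illustration of the collective approach, the sketch below reads a block-distributed matrix
through a subarray file view and a single collective call. The file name, matrix size, and the
2 x 2 task grid are assumptions made for the example; they are not taken from this guide.

    /* Sketch: collective read of a block-distributed gsize x gsize matrix
       of doubles stored row-major in "matrix.dat".  Assumes the job is
       started with 4 tasks arranged as a 2 x 2 grid. */
    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char *argv[])
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        int pdim  = 2;              /* 2 x 2 task grid (assumed)        */
        int gsize = 1024;           /* global matrix dimension (assumed) */
        int lsize = gsize / pdim;   /* each task reads an lsize x lsize block */

        int gsizes[2] = { gsize, gsize };
        int lsizes[2] = { lsize, lsize };
        int starts[2] = { (rank / pdim) * lsize, (rank % pdim) * lsize };

        /* The file view describes every disjoint chunk this task needs:
           lsize rows of lsize doubles, each separated by a full file row. */
        MPI_Datatype filetype;
        MPI_Type_create_subarray(2, gsizes, lsizes, starts,
                                 MPI_ORDER_C, MPI_DOUBLE, &filetype);
        MPI_Type_commit(&filetype);

        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "matrix.dat",
                      MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
        MPI_File_set_view(fh, 0, MPI_DOUBLE, filetype, "native",
                          MPI_INFO_NULL);

        /* One collective call presents every task's chunks to MPI-IO at
           the same time. */
        double *local = malloc((size_t)lsize * lsize * sizeof(double));
        MPI_File_read_all(fh, local, lsize * lsize, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        MPI_Type_free(&filetype);
        free(local);
        MPI_Finalize();
        return 0;
    }

Because MPI_File_read_all is collective, the disjoint chunks required by all tasks are visible to
the library at once, which gives it room to combine many small requests into fewer, larger file
system operations.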