The salloc command allocates 16 nodes and runs one copy of shmemrun on the
first allocated node, which then creates the SHMEM processes. shmemrun invokes
mpirun, and mpirun determines the correct set of hosts and the required number of
processes from the slurm allocation in which it is running. Because shmemrun is
used in this approach, the user does not need to set up the environment.
No Integration
This approach launches a job inside a slurm allocation with no integration. It can
be used with any supported MPI implementation, but it requires a wrapper script
to generate the hosts file. slurm allocates the nodes for the job, and the job runs
within that allocation but not under the control of the slurm daemon. One way to
use this approach is:
salloc -N 16 shmemrun_wrapper shmem-test-world
Where shmemrun_wrapper is a user-provided wrapper script that creates a
hosts file based on the current slurm allocation and then simply invokes mpirun
with that hosts file and other appropriate options. Note that ssh/rsh, not slurm, is
used to start the processes.
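The following is a minimal sketch of what such a wrapper might look like; it is
shown for illustration only and is not part of the product. The scontrol
invocation, the slurm environment variables, and the mpirun options are
assumptions and may need to be adjusted for the MPI implementation in use.

#!/bin/sh
# shmemrun_wrapper (illustrative sketch): build a hosts file from the
# current slurm allocation, then invoke mpirun directly so the job runs
# outside the control of the slurm daemon (processes start via ssh/rsh).

HOSTS_FILE=/tmp/hosts.$SLURM_JOB_ID

# Expand the compressed node list (for example, "node[01-16]") into one
# host name per line.
scontrol show hostnames "$SLURM_JOB_NODELIST" > "$HOSTS_FILE"

# Start one process per allocated node using the generated hosts file.
mpirun -np "$SLURM_NNODES" -machinefile "$HOSTS_FILE" "$@"

rm -f "$HOSTS_FILE"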
Sizing Global Shared Memory
SHMEM provides the shmalloc, shrealloc, and shfree calls to allocate and
release memory on a symmetric heap. These functions are called collectively
across the processing elements (PEs) so that the memory is managed
symmetrically across them. The extent of the symmetric heap determines the
amount of global shared memory per PE that is available to the application.
This is an important resource, and this section discusses the mechanisms
available to size it. Applications can access this memory in various ways, and
these map onto quite different access mechanisms (see the example following
this list):
Accessing global shared memory on my PE: This is achieved by direct loads
and stores to the memory.
Accessing global shared memory on a PE on the same host: This is
achieved by mapping the global shared memory using the local shared
memory mechanisms of the operating system (for example, System V shared
memory) and then accessing the memory by direct loads and stores. This
means that each PE on a host needs to map the global shared memory of
each other PE on that host. These accesses do not use the adapter and
interconnect.
Accessing global shared memory on a PE on a different host: This is
achieved by sending put, get, and atomic requests across the interconnect.
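For illustration only, the following short example (not taken from this guide)
sketches how the symmetric heap is used: every PE collectively allocates the
same block with shmalloc, writes into the copy on a neighboring PE with a put,
and reads its own copy with a direct load. The initialization and transfer calls
(start_pes, shmem_long_put, shmem_barrier_all) are standard SHMEM routines
not described in this section, and the neighbor-exchange pattern is an arbitrary
choice.

/* Illustrative symmetric-heap sketch (not from this guide). */
#include <stdio.h>
#include <shmem.h>

int main(void)
{
    start_pes(0);                        /* initialize SHMEM */
    int me   = _my_pe();
    int npes = _num_pes();

    /* Collective allocation: every PE calls shmalloc with the same size,
       so the block exists on the symmetric heap of every PE. */
    long *counter = (long *) shmalloc(sizeof(long));
    *counter = 0;
    shmem_barrier_all();

    /* Remote access: write this PE's number into the counter on the next
       PE. On the same host this uses shared-memory stores; on a different
       host it becomes a put over the interconnect. */
    long value = me;
    shmem_long_put(counter, &value, 1, (me + 1) % npes);
    shmem_barrier_all();

    /* Local access: a direct load from this PE's own symmetric memory. */
    printf("PE %d of %d: counter = %ld\n", me, npes, *counter);

    shmem_barrier_all();
    shfree(counter);                     /* collective release */
    return 0;
}

When launched with shmemrun as described earlier in this chapter, each PE
prints the PE number written by its neighbor.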