Reduction; Creating The Offload Version; Intel® Xeon Phi™ Coprocessor Developer - Intel Xeon Phi Developer's Quick Start Manual


Note: Although the user may specify the region of code to run on the target, execution on the
Intel® Xeon Phi™ Coprocessor is not guaranteed. Whether the code runs on the coprocessor depends on the
presence of the target hardware and on the availability of resources on the Intel® Xeon Phi™ Coprocessor
when execution reaches the region of code marked for offload.
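Because the placement is decided at run time, it can help to check where a marked region actually ran. The following is a minimal sketch, not taken from the guide: the function name ran_on_coprocessor is hypothetical, and the sketch assumes the Intel compiler's behavior of defining the __MIC__ macro only when compiling the coprocessor side of the code. A compiler without offload support ignores the pragma, so the block then runs on the host.

```c
/* Hypothetical helper (not from the guide): reports where the marked region ran.
   Returns 1 if it ran on the coprocessor, 0 if it ran on the host. */
int ran_on_coprocessor(void)
{
    int on_target = 0;

    /* The scalar is copied to the target and back so the host can read the result.
       Compilers without offload support ignore this pragma and run the block on the host. */
    #pragma offload target(mic) inout(on_target)
    {
#ifdef __MIC__
        /* Assumption: __MIC__ is defined only in the coprocessor-side compilation. */
        on_target = 1;
#else
        on_target = 0;
#endif
    }
    return on_target;
}
```

A caller could log the returned value once at startup to confirm that offloaded regions are reaching the coprocessor.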
The following code samples show several versions of porting reduction code to the Intel® Xeon Phi™
Coprocessor using the offload pragma directive.

Reduction

The reduction operation computes the sum of the n elements of an array a:
ans = a[0] + a[1] + ... + a[n-1]
Host Version:
The following sample shows C code that implements the host version of the reduction.
    float reduction(float *data, int size)
    {
        float ret = 0.f;
        for (int i = 0; i < size; ++i)
        {
            ret += data[i];
        }
        return ret;
    }
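As a quick sanity check of the host version, the sketch below sums the integers 1 through 100 and compares the result with the closed form n(n+1)/2 = 5050. The helper name reduction_matches_closed_form is hypothetical and not part of the guide. The values are small enough that every partial sum is exactly representable as a float, so an exact comparison is safe here. That would not hold for arbitrary data.

```c
/* Host reduction, repeated from the sample above so the sketch is self-contained. */
float reduction(float *data, int size)
{
    float ret = 0.f;
    for (int i = 0; i < size; ++i)
    {
        ret += data[i];
    }
    return ret;
}

/* Hypothetical check (not from the guide): returns 1 if summing 1..100
   gives 5050, the closed-form value n(n+1)/2 for n = 100. */
int reduction_matches_closed_form(void)
{
    float data[100];
    for (int i = 0; i < 100; ++i)
    {
        data[i] = (float)(i + 1);
    }
    return reduction(data, 100) == 5050.0f;
}
```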

Creating the Offload Version

Serial Reduction with Offload
The programmer uses #pragma offload target(mic) (as shown in the example below) to mark statements
(offload constructs) that should execute on the Intel® Xeon Phi™ Coprocessor. The offloaded region consists
of the offload construct plus any additional code that runs on the target as a result of function calls.
Execution on the host resumes once the statements on the target have executed and the results are
available on the host; that is, the offload blocks, although a variant of this pragma allows
asynchronous execution. The in, out, and inout clauses specify the direction in which data is
transferred between the host and the target.
By default, variables that are used within an offload construct but declared outside its scope (including
at file scope) are copied to the target before execution on the target begins and copied back to
the host on completion.
For example, in the code below, the variable ret is automatically copied to the target before execution on the
target and copied back to the host on completion. The offloaded code below is executed by a single thread on
a single Intel® MIC Architecture core.
Code Example 1: Implementing Reduction Code in C/C++
Intel® Xeon Phi™ Coprocessor Developer's Quick Start Guide
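The serial offload described above can be sketched as follows. This is an illustration under stated assumptions, not necessarily the guide's own listing: the function name reduction_offload is hypothetical, and the in(data:length(size)) clause assumes the Intel compiler's offload syntax for copying size elements of a pointer-based array to the target. The scalar ret carries no explicit clause, so it relies on the default copy-in and copy-back behavior described above.

```c
/* Sketch of a serial reduction with offload (function name is hypothetical). */
float reduction_offload(float *data, int size)
{
    float ret = 0.f;

    /* in(data:length(size)): copy `size` floats to the target before the region runs;
       nothing is copied back for data. ret is declared outside the construct, so by
       default it is copied to the target and back to the host on completion.
       Compilers without offload support ignore the pragma and run the loop on the host. */
    #pragma offload target(mic) in(data:length(size))
    {
        /* Executed by a single thread: no parallelism is requested here. */
        for (int i = 0; i < size; ++i)
        {
            ret += data[i];
        }
    }
    return ret;
}
```

The host blocks at the pragma until the region finishes, so the return statement sees the final value of ret.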
