Steps to install the Software Development tools ........ 9
Updating an Existing System ........ 10
Updating a system that already has an Intel® Xeon Phi™ Coprocessor ........ 10
Regaining Access to the Intel® Xeon Phi™ Coprocessor after Reboot ........ 11
Restarting the Intel®...
Parallel Programming Options on the Intel® Xeon Phi™ Coprocessor ........ 22
Parallel Programming on the Intel® Xeon Phi™ Coprocessor: OpenMP* ........ 22
Parallel Programming on the Intel® Xeon Phi™ Coprocessor: OpenMP* + Intel® Cilk™ Plus Extended Array Notation ........ 23
Parallel Programming on the Intel®...
This guide will:
1. Walk you through the Intel® Manycore Platform Software Stack (Intel® MPSS) installation.
2. Introduce the build environment for software enabled to run on the Intel® Xeon Phi™ Coprocessor.
3. Give an example of how to write code for the Intel® Xeon Phi™ Coprocessor and build it using Intel® Composer XE 2013 SP1.
Intel® Xeon Phi™ Coprocessor. The offload compilers can generate binaries that will run only on the host, only on the Intel® Xeon Phi™ Coprocessor, or paired binaries that run on both the host and the Intel® Xeon Phi™ Coprocessor and communicate with each other.
Device Driver: At the bottom of the software stack in kernel space is the Intel® Xeon Phi™ Coprocessor device driver. The device driver is responsible for managing device initialization and communication between the host and target devices.
To support the new vector processing model, a new 512-bit SIMD ISA was introduced. The VPU is a key feature of the Intel® MIC Architecture-based cores. Fully utilizing the vector unit is critical for best Intel® Xeon Phi™ Coprocessor performance. It is important to note that Intel® MIC Architecture cores do not support other SIMD ISAs (such as MMX™, Intel® Streaming SIMD Extensions, or Intel® Advanced Vector Extensions).
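To illustrate why vectorizable code matters, here is a minimal sketch (the function name and data are illustrative, not from this guide) of a loop the compiler can auto-vectorize for the 512-bit VPU; #pragma simd, supported by the Intel compiler, asserts that the loop is safe to vectorize.

// Simple SAXPY-style loop; the Intel compiler can vectorize this for the VPU.
void saxpy(float *x, float *y, float a, int n)
{
    #pragma simd
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}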
7. Reboot the system.
8. Start the Intel® Xeon Phi™ Coprocessor (while you can set up the card to start with the host system, it will not do so by default), and then run "micinfo" to verify that it is set up properly:
sudo service mpss start
sudo micctrl -w ...
Documentation, you can get the Install Guide, Getting Started Guide, and Release Notes documents.
1. Follow the instructions in the Install Guide to install the Intel Cluster Studio XE for Linux*. If you bought only the Intel C++ Composer XE for Linux or the Intel Fortran Composer XE for Linux, read the corresponding Install Guide to install these packages, as well as separately installing Intel®...
“setenv H_TRACE 2” or “export H_TRACE=2” to display the dialog between the host and the Intel® Xeon Phi™ Coprocessor (messages from the coprocessor are prefixed with “MIC:”). If you see this dialog, then everything is running fine and the system is ready for general use.
4. Reboot the system.
5. Start the Intel® Xeon Phi™ Coprocessor (while you can set up the card to start with the host system, it will not do so by default), and then run “micinfo” to verify that it is set up properly:...
Running an Intel® Xeon Phi™ Coprocessor program from the host system
It is possible to copy an Intel® MIC Architecture native binary to a specified Intel® Xeon Phi™ Coprocessor and execute it using the “micnativeloadex” utility. This utility conveniently copies the binary's library dependencies to the coprocessor as well, as in the sketch below.
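For example (a hedged sketch: the source file, binary name, and the -d device-index flag are assumptions; see the utility's help output for its full option list):

icc -mmic -o hello_mic hello.c      # build a native binary for the Intel® MIC Architecture
micnativeloadex ./hello_mic -d 0    # copy it (and its library dependencies) to coprocessor 0 and run it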
The offload language extensions allow you to port sections of your code (written in C/C++ or FORTRAN) to run on the Intel® Xeon Phi™ Coprocessor, or you can port your entire application to the Intel® MIC Architecture. Best performance will only be attained with highly parallel applications that also use SIMD...
Intel® Xeon Phi™ Coprocessor; rather, the familiar host-based Intel tools have been extended to add support for the Intel® MIC Architecture via a few additions to standard languages and APIs. However, to make best use of the development tools and to get best performance from the Intel®...
Intel® C++ Compiler XE 2013 SP1 and the Intel® Fortran Compiler XE 2013 SP1. Most information on how to build for the Intel® MIC Architecture can be found in the “Key Features/Intel® MIC Architecture” section under “Programming for the Intel® MIC Architecture”...
Compiler Switches and Makefiles
When building applications that offload some of their code to the Intel® Xeon Phi™ Coprocessor, it is possible to have the offloaded code built with different compiler options from the host code. The method of passing these options to the compiler is documented in the compiler documentation; a sketch of one common form is shown below.
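A minimal illustration (the exact switch syntax is an assumption based on common usage and should be verified against your compiler version's documentation): the host code is compiled with one set of options while additional options are forwarded to the offload-side (MIC) compilation.

icc -O2 -openmp -offload-option,mic,compiler,"-O3" -o reduction reduction.c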
Details can be found in the compiler documentation in the “Compilation/Setting Environment Variables” section.
Where to Get More Help
You can visit the forum on the Intel® Xeon Phi™ Coprocessor to post questions. It can be found at http://software.intel.com/en-us/forums/intel-many-integrated-core
Using the Offload Compiler –...
Note: Although the user may specify the region of code to run on the target, there is no guarantee of execution on the Intel® Xeon Phi™ Coprocessor. Depending on the presence of the target hardware and the availability of resources on the Intel® Xeon Phi™ Coprocessor when execution reaches the region of code marked for offload, the code can run either on the Intel® Xeon Phi™ Coprocessor or on the host.
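The following is a minimal sketch (variable names are illustrative, not taken from the guide's numbered code examples) of a statement marked for offload; the runtime decides at execution time whether it runs on the coprocessor or falls back to the host.

#include <stdio.h>

int main(void)
{
    int a = 10, b = 32, c = 0;

    // The statement following the pragma is the offloaded region.
    #pragma offload target(mic) in(a, b) out(c)
    c = a + b;

    printf("c = %d\n", c);
    return 0;
}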
Vector Reduction with Offload
Each core on the Intel® Xeon Phi™ Coprocessor has a VPU. The auto-vectorization option is enabled by default on the offload compiler. Alternately, as sketched below, the programmer can use the Intel® Cilk™ Plus extended array notation to express the vector operation explicitly.
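A minimal sketch (assumed, not the guide's numbered code example): the array-section reduction is vectorized by the compiler for the coprocessor's VPU.

#include <stdio.h>

// Reduce an array on the coprocessor using Cilk Plus extended array notation.
float vec_reduce(float *a, int n)
{
    float sum = 0.0f;
    #pragma offload target(mic) in(a : length(n))
    sum = __sec_reduce_add(a[0:n]);   // array section a[start:length], reduced with +
    return sum;
}

int main(void)
{
    static float data[4096];
    for (int i = 0; i < 4096; i++) data[i] = 1.0f;
    printf("sum = %f\n", vec_reduce(data, 4096));
    return 0;
}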
_Offload_shared_aligned_free(void *p);
It should be noted that this is not actually “shared memory”: there is no hardware that maps some portion of the memory on the Intel® Xeon Phi™ Coprocessor to the host system. The memory subsystems on the coprocessor and host are completely independent, and this programming model is just a different way of copying data between these memory subsystems at well-defined synchronization points.
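A minimal sketch of this model (hedged: the header, the placement of the _Cilk_shared qualifier on the pointer, and the helper names are assumptions; consult the compiler reference for the exact declaration rules): memory allocated with _Offload_shared_malloc is synchronized between host and coprocessor at the _Cilk_offload call boundary.

#include <offload.h>   // assumed header declaring _Offload_shared_malloc/_Offload_shared_free

_Cilk_shared int *data;     // pointer visible on both host and coprocessor (assumed declaration style)
_Cilk_shared int result;    // "shared" scalar, synchronized at offload boundaries

_Cilk_shared void sum_shared(int n)   // hypothetical helper, executed on the coprocessor
{
    result = 0;
    for (int i = 0; i < n; i++)
        result += data[i];
}

int main(void)
{
    int n = 1000;
    data = (_Cilk_shared int *)_Offload_shared_malloc(n * sizeof(int));
    for (int i = 0; i < n; i++) data[i] = 1;

    _Cilk_offload sum_shared(n);   // data and result are synchronized here

    _Offload_shared_free(data);
    return result == n ? 0 : 1;
}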
Reference Guide” shows some restrictions of using this programming model.
Native Compilation
Applications can also be run natively on the Intel® Xeon Phi™ Coprocessor, in which case the coprocessor is treated as a standalone multicore computer. Once the binary is built on the host system, copy the binary and any other related binaries or data to the Intel® Xeon Phi™ Coprocessor and launch it there, for example as sketched below.
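A hedged example of this workflow (the source file, binary name, and the default "mic0" hostname are assumptions; runtime libraries such as the OpenMP library must also be present on the card or on a share visible to it):

icc -mmic -openmp -o myapp.mic myapp.c   # build natively for the Intel® MIC Architecture
scp myapp.mic mic0:/tmp/                 # copy the binary to the first coprocessor
ssh mic0 /tmp/myapp.mic                  # run it on the coprocessor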
OpenMP* semantics of shared and private data apply. Multiple host CPU threads can offload to the Intel® Xeon Phi™ coprocessor at any time. If a CPU thread attempts to offload to the Intel® Xeon Phi™ Coprocessor and resources are not available on the coprocessor, the code meant to be offloaded may be executed on the host.
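The following is a minimal sketch (not one of the guide's numbered code examples; names are illustrative) of a host thread offloading an OpenMP parallel reduction to the coprocessor.

#include <stdio.h>

int main(void)
{
    static float a[1000000];
    float sum = 0.0f;
    for (int i = 0; i < 1000000; i++) a[i] = 1.0f;

    // If the coprocessor or its resources are unavailable, this region may run on the host.
    #pragma offload target(mic) in(a)
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < 1000000; i++)
        sum += a[i];

    printf("sum = %f\n", sum);
    return 0;
}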
Parallel Programming on the Intel® Xeon Phi™ Coprocessor: OpenMP* + Intel® Cilk™ Plus Extended Array Notation
The following code sample further extends the OpenMP example to use Intel Cilk Plus Extended Array Notation. In the following code sample, each thread uses the Intel Cilk Plus Extended Array Notation __sec_reduce_add() built-in reduction function to use all 32 of the Intel®...
Parallel Programming on the Intel® Xeon Phi™ Coprocessor: Intel® Cilk™ Plus
Intel Cilk Plus header files are not available on the target environment by default. To make the header files available to an application built for the Intel® MIC Architecture using Intel Cilk Plus, wrap the header files with #pragma offload_attribute (push, target(mic)) and #pragma offload_attribute (pop), as sketched below.
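A minimal sketch (the function is illustrative, not from the guide): the Cilk Plus header is wrapped for the MIC target and a cilk_for loop is used in a function that can be called from offloaded code.

#pragma offload_attribute (push, target(mic))
#include <cilk/cilk.h>
#pragma offload_attribute (pop)

// Scale an array in parallel with a Cilk Plus loop; callable from an offload region.
__attribute__((target(mic)))
void scale(float *a, int n, float factor)
{
    cilk_for (int i = 0; i < n; i++)
        a[i] *= factor;
}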
Parallel Programming on Intel® Xeon Phi™ Coprocessor: Intel® Threading Building Blocks (Intel® TBB)
Like Intel Cilk Plus, the Intel TBB header files are not available on the target environment by default. They are made available to the Intel® MIC Architecture target environment using similar techniques:
#pragma offload_attribute (push,target(mic))
#include "tbb/task_scheduler_init.h"...
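Building on that, here is a hedged sketch (the headers shown and the function are illustrative, not the guide's code example; it uses a C++11 lambda, so the compiler's C++11 support must be enabled) of a TBB parallel_reduce that can be called from code offloaded to the coprocessor.

#pragma offload_attribute (push, target(mic))
#include "tbb/parallel_reduce.h"
#include "tbb/blocked_range.h"
#pragma offload_attribute (pop)

// Sum an array with tbb::parallel_reduce; callable from an offload region.
__attribute__((target(mic)))
float tbb_sum(const float *a, int n)
{
    return tbb::parallel_reduce(
        tbb::blocked_range<int>(0, n), 0.0f,
        [=](const tbb::blocked_range<int> &r, float partial) {
            for (int i = r.begin(); i != r.end(); ++i)
                partial += a[i];
            return partial;
        },
        [](float x, float y) { return x + y; });   // combine partial sums
}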
Using Intel® MKL
For offload users, Intel MKL is most commonly used in Native Acceleration (NAcc) mode on the Intel® Xeon Phi™ Coprocessor. In NAcc, all data and binaries reside on the Intel® Xeon Phi™ Coprocessor. Data is transferred by the programmer through offload compiler pragmas and semantics to be used by Intel MKL calls within an offloaded region or function.
Code Example 14: Sending the Data to the Intel® Xeon Phi™ Coprocessor
Step 3: Call sgemm inside the offload section to use the “Native Acceleration” version of Intel® MKL on the Intel® Xeon Phi™ Coprocessor. The nocopy() qualifier causes the data copied to the card in step 2 to be reused.
Code Example 16: Set the Copied Memory Free
As with Intel® MKL on any platform, it is possible to limit the number of threads it uses by setting the number of allowed OpenMP threads before executing the MKL function within the offloaded code.
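A minimal, self-contained sketch of that idea (assumed, not one of the guide's numbered code examples; the matrix size and thread count are arbitrary, and the program would be built with the Intel compiler's -mkl and -openmp switches):

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>
#include <mkl.h>

int main(void)
{
    int n = 1024;
    float alpha = 1.0f, beta = 0.0f;
    float *A = (float *)malloc(n * n * sizeof(float));
    float *B = (float *)malloc(n * n * sizeof(float));
    float *C = (float *)malloc(n * n * sizeof(float));
    for (int i = 0; i < n * n; i++) { A[i] = 1.0f; B[i] = 2.0f; C[i] = 0.0f; }

    #pragma offload target(mic) in(A, B : length(n*n)) inout(C : length(n*n))
    {
        omp_set_num_threads(120);   // assumed value; tune to the coprocessor's core count
        sgemm("N", "N", &n, &n, &n, &alpha, A, &n, B, &n, &beta, C, &n);
    }

    printf("C[0] = %f\n", C[0]);
    free(A); free(B); free(C);
    return 0;
}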
Performance Analysis on the Intel® Xeon Phi™ Coprocessor
Information on collecting performance data on the Intel® Xeon Phi™ Coprocessor using Intel® VTune™ Amplifier XE for Linux* can be found in the Getting Started -> Intel Xeon Phi Coprocessor Analysis Workflow section, located in /opt/intel/vtune_amplifier_xe_2013/documentation/help/index.htm...
Bachelor’s degree in Electronics Engineering from Mumbai University, India in 2009 and a Master’s degree in Computer Engineering from Clemson University in December 2012. He joined Intel in 2012 and has been working as a Software Engineer, focusing on developing collateral for the Intel® Xeon Phi™ coprocessor.
Intel literature may be obtained by calling 1-800-548-4725, or by going to http://www.intel.com/design/literature.htm
Intel, the Intel logo, Cilk, Xeon and Intel Xeon Phi are trademarks of Intel Corporation in the U.S. and other countries.
*Other names and brands may be claimed as the property of others.
Optimization Notice
Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel.