Intel Xeon Phi Developer's Quick Start Manual

White Paper
Intel® Xeon Phi™ Coprocessor
Developer's Quick Start Guide
Version 1.7


Summary of Contents for Intel Xeon Phi

  • Page 1 White Paper Intel® Xeon Phi™ Coprocessor Developer's Quick Start Guide Version 1.7...
  • Page 2: Table Of Contents

    Steps to install the Software Development tools (p. 9)
    Updating an Existing System (p. 10)
    Updating a system that already has an Intel® Xeon Phi™ Coprocessor (p. 10)
    Regaining Access to the Intel® Xeon Phi™ Coprocessor after Reboot (p. 11)
    Restarting the Intel®...
  • Page 3

    Parallel Programming Options on the Intel® Xeon Phi™ Coprocessor (p. 22)
    Parallel Programming on the Intel® Xeon Phi™ Coprocessor: OpenMP* (p. 22)
    Parallel Programming on the Intel® Xeon Phi™ Coprocessor: OpenMP* + Intel® Cilk™ Plus Extended Array Notation (p. 23)
    Parallel Programming on the Intel®...
  • Page 4: Introduction

    1. Walk you through the Intel® Manycore Platform Software Stack (Intel® MPSS) installation. 2. Introduce the build environment for software enabled to run on Intel® Xeon Phi™ Coprocessor. 3. Give an example of how to write code for Intel® Xeon Phi™ Coprocessor and build using Intel® Composer XE 2013 SP1.
  • Page 5: System Configuration

    Intel® Xeon Phi™ Coprocessor. The offload compilers can generate binaries that will run only on the host, only on the Intel® Xeon Phi™ Coprocessor, or paired binaries that run on both the host and the Intel® Xeon Phi™ Coprocessor and communicate with each other.
  • Page 6 Device Driver: At the bottom of the software stack in kernel space is the Intel® Xeon Phi™ Coprocessor device driver. The device driver is responsible for managing device initialization and communication between the host and target devices.
  • Page 7: Intel® Many Integrated Core Architecture Overview

    To support the new vector processing model, a new 512-bit SIMD ISA was introduced. The VPU is a key feature of the Intel® MIC Architecture-based cores. Fully utilizing the vector unit is critical for best Intel® Xeon Phi™ Coprocessor performance. It is important to note that Intel® MIC Architecture cores do not support other SIMD ISAs (such as MMX™, Intel®...
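
To make the 512-bit width concrete, the sketch below computes how many elements of a given size fit in one vector register: 16 single-precision or 8 double-precision values. The macro and the helper name `lanes_per_vector` are hypothetical and exist only for this illustration.

```c
#include <stddef.h>

/* Width of one vector register on the Intel MIC Architecture, in bits. */
#define MIC_VECTOR_BITS 512

/* Hypothetical helper: how many elements of elem_size bytes fit in one
   512-bit vector register. */
size_t lanes_per_vector(size_t elem_size)
{
    return (MIC_VECTOR_BITS / 8) / elem_size;
}
```

A loop over `float` data therefore needs a trip count that is large, and ideally a multiple of 16, before the vector unit can be kept full.
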
  • Page 8: Administrative Tasks

    7. Reboot the system. 8. Start the Intel® Xeon Phi™ Coprocessor (while you can set up the card to start with the host system, it will not do so by default), and then run “micinfo” to verify that it is set up properly: sudo service mpss start sudo micctrl -w...
  • Page 9: Steps To Install The Software Development Tools

    Documentation, you can get the Install Guide, Getting Started Guide and Release Notes documents. 1. Follow the instructions in the Install Guide to install the Intel Cluster Studio XE for Linux*. If you bought the Intel C++ Composer XE for Linux, or the Intel Fortran Composer XE for Linux only, read the corresponding Install Guide to install these packages, as well as separately installing Intel®...
  • Page 10: Updating An Existing System

    “setenv H_TRACE 2” or “export H_TRACE=2” to display the dialog between the Host and Intel® Xeon Phi™ Coprocessor (messages from the coprocessor will be prefixed with “MIC:”). If you see this dialog, then everything is running fine and the system is ready for general use.
  • Page 11: Regaining Access To The Intel® Xeon Phi™ Coprocessor After Reboot

    4. Reboot the system 5. Start the Intel® Xeon Phi™ Coprocessor (while you can set up the card to start with the host system, it will not do so by default), and then run “micinfo” to verify that it is set up properly:...
  • Page 12: Monitoring The Intel® Xeon Phi™ Coprocessor

    Running an Intel® Xeon Phi™ Coprocessor program from the host system It is possible to copy an Intel® MIC Architecture native binary to a specified Intel® Xeon Phi™ Coprocessor and execute it using the “micnativeloadex” utility. This utility conveniently copies library dependencies to the...
  • Page 13: Working Directly With The uOS Environment Intel® Xeon Phi™ Coprocessor

    The offload language extensions allow you to port sections of your code (written in C/C++ or FORTRAN) to run on the Intel® Xeon Phi™ Coprocessor, or you can port your entire application to the Intel® MIC Architecture. Best performance will only be attained with highly parallel applications that also use SIMD...
  • Page 14: Available Software Development Tools / Environments

    Intel® Xeon Phi™ Coprocessor; rather, the familiar host-based Intel tools have been extended to add support for the Intel® MIC Architecture via a few additions to standard languages and APIs. However, to make best use of the development tools and to get best performance from the Intel®...
  • Page 15: Documentation And Sample Code

    Intel® C++ Compiler XE 13.0 SP1 and the Intel® Fortran Compiler XE 2013 SP1.  Most information on how to build for the Intel® MIC Architecture can be found in the “Key Features/Intel® MIC Architecture” section under “Programming for the Intel® MIC Architecture”...
  • Page 16: Build-Related Information

    Compiler Switches and Makefiles When building applications that offload some of their code to the Intel® Xeon Phi™ Coprocessor, it is possible to cause the offloaded code to be built with different compiler options from the host code. The method of passing these options to the compiler is documented in the compiler documentation under the “Compiler...
  • Page 17: Debugging During Runtime

    Details can be found in the compiler documentation in the “Compilation/Setting Environment Variables” section. Where to Get More Help You can visit the Forum on the Intel® Xeon Phi™ Coprocessor to post questions. It can be found at http://software.intel.com/en-us/forums/intel-many-integrated-core Using the Offload Compiler –...
  • Page 18: Reduction

    Note: Although the user may specify the region of code to run on the target, there is no guarantee of execution on the Intel® Xeon Phi™ Coprocessor. Depending on the presence of the target hardware or the availability of resources on the Intel® Xeon Phi™ Coprocessor when execution reaches the region of code marked for offload, the code can run on the Intel®...
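
One way to observe where a marked region actually ran is to test the `__MIC__` macro, which the Intel compiler defines only when it builds the coprocessor version of the code. The following is a minimal sketch, not taken from the guide: the function name `ran_on_coprocessor` is hypothetical, and the `__INTEL_OFFLOAD` guard is an assumption added here so that a compiler without the offload extensions builds the region as ordinary host code. Whether an unavailable coprocessor leads to a host fallback or to an error can depend on the compiler version and offload settings.

```c
/* Hypothetical helper: returns 1 if the marked region executed on the
   coprocessor, 0 if it executed on the host. */
int ran_on_coprocessor(void)
{
    int on_mic = 0;

    /* Guarded so that compilers without the Intel offload extensions
       simply run the block below on the host. */
#ifdef __INTEL_OFFLOAD
#pragma offload target(mic) inout(on_mic)
#endif
    {
#ifdef __MIC__
        on_mic = 1;   /* compiled only into the coprocessor version */
#else
        on_mic = 0;   /* host version of the same region */
#endif
    }
    return on_mic;
}
```

Calling `ran_on_coprocessor()` once at start-up and logging the result gives a quick check that offload is really happening before any timing is trusted.
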
  • Page 19: Asynchronous Offload And Data Transfer

    Vector Reduction with Offload Each core on the Intel® Xeon Phi™ Coprocessor has a VPU. The auto vectorization option is enabled by default on the offload compiler. Alternatively, as seen in the example below, the programmer can use the Intel® Cilk™...
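
As a rough illustration of the auto-vectorization route (not the Intel Cilk Plus variant the guide goes on to show), the sketch below offloads a plain summation loop and leaves vectorization to the compiler. The function name `offload_sum` is hypothetical, and the `__INTEL_OFFLOAD` guard is an assumption added so the code also builds as host-only C with other compilers.

```c
/* Hypothetical helper: sums n floats inside an offloaded region. */
float offload_sum(float *data, int n)
{
    float sum = 0.0f;

    /* Copy the input array to the coprocessor and bring the scalar
       result back; without offload support the block runs on the host. */
#ifdef __INTEL_OFFLOAD
#pragma offload target(mic) in(data : length(n)) inout(sum)
#endif
    {
        /* A simple reduction loop that the auto-vectorizer may be able
           to map onto the VPU. */
        for (int i = 0; i < n; i++)
            sum += data[i];
    }
    return sum;
}
```
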
  • Page 20 _Offload_shared_aligned_free(void *p); It should be noted that this is not actually “shared memory”: there is no hardware that maps some portion of the memory on the Intel® Xeon Phi™ Coprocessor to the host system. The memory subsystems on the coprocessor and host are completely independent, and this programming model is just a different way of copying data between these memory subsystems at well-defined synchronization points.
  • Page 21: Native Compilation

    Reference Guide” shows some restrictions of using this programming model. Native Compilation Applications can also be run natively on the Intel® Xeon Phi™ Coprocessor, in which case the coprocessor will be treated as a standalone multicore computer. Once the binary is built on the host system, copy the binary and other related binaries or data to the Intel®...
  • Page 22: Parallel Programming Options On The Intel® Xeon Phi™ Coprocessor

    OpenMP* semantics of shared and private data apply. Multiple host CPU threads can offload to the Intel® Xeon Phi™ coprocessor at any time. If a CPU thread attempts to offload to the Intel® Xeon Phi™ Coprocessor and resources are not available on the coprocessor, the code meant to be offloaded may be executed on the host.
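
The combination of offload and OpenMP described above can be sketched as follows. This is an illustrative example under stated assumptions rather than the guide's own listing: the name `offload_dot` is hypothetical, and the `__INTEL_OFFLOAD` guard is added so the function degrades to a host-only OpenMP loop (or a serial loop, if OpenMP is disabled) on other compilers.

```c
/* Hypothetical helper: dot product computed by an OpenMP parallel loop
   placed inside an offloaded region. */
double offload_dot(double *x, double *y, int n)
{
    double dot = 0.0;
    int i;

#ifdef __INTEL_OFFLOAD
#pragma offload target(mic) in(x : length(n)) in(y : length(n)) inout(dot)
#endif
    /* Usual OpenMP data-sharing rules: x and y are shared, the loop
       index is private, and dot is combined by the reduction clause. */
#pragma omp parallel for reduction(+ : dot)
    for (i = 0; i < n; i++)
        dot += x[i] * y[i];

    return dot;
}
```
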
  • Page 23: Parallel Programming On The Intel® Xeon Phi™ Coprocessor: OpenMP* + Intel® Cilk™ Plus Extended Array Notation

    Parallel Programming on the Intel® Xeon Phi™ Coprocessor: OpenMP* + Intel® Cilk™ Plus Extended Array Notation The following code sample further extends the OpenMP example to use Intel Cilk Plus Extended Array Notation. In the following code sample, each thread uses the Intel Cilk Plus Extended Array Notation __sec_reduce_add() built-in reduction function to use all 32 of the Intel®...
  • Page 24: Parallel Programming On The Intel® Xeon Phi™ Coprocessor: Intel® Cilk™ Plus

    Parallel Programming on the Intel® Xeon Phi™ Coprocessor: Intel® Cilk™ Plus Intel Cilk Plus header files are not available on the target environment by default. To make the header files available to an application built for the Intel® MIC Architecture using Intel Cilk Plus, wrap the header files with...
  • Page 25: Parallel Programming On Intel® Xeon Phi™ Coprocessor: Intel® Threading Building Blocks (Intel® TBB)

    Parallel Programming on Intel® Xeon Phi™ Coprocessor: Intel® Threading Building Blocks (Intel® TBB) Like Intel Cilk Plus, the Intel TBB header files are not available on the target environment by default. They are made available to the Intel® MIC Architecture target environment using similar techniques: #pragma offload_attribute (push,target(mic)) #include "tbb/task_scheduler_init.h"...
  • Page 26: Using Intel MKL

    Using Intel® MKL For offload users, Intel MKL is most commonly used in Native Acceleration (NAcc) mode on the Intel® Xeon Phi™ Coprocessor. In NAcc, all data and binaries reside on the Intel® Xeon Phi™ Coprocessor. Data is transferred by the programmer through offload compiler pragmas and semantics to be used by Intel MKL calls within an offloaded region or function.
  • Page 27: Sgemm Sample

    Code Example 14: Sending the Data to the Intel® Xeon Phi™ Coprocessor Step 3: Call sgemm inside the offload section to use the “Native Acceleration” version of Intel® MKL on the Intel® Xeon Phi™ Coprocessor. The nocopy() qualifier causes the data copied to the card in step 2 to be reused.
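
The send, reuse, and free pattern behind these steps can be sketched without Intel MKL. In the hedged example below, a naive triple loop stands in for the `sgemm` call, so it shows only the data-persistence clauses (`alloc_if`, `free_if`, `nocopy`), not the library API. The function name `offload_matmul` is hypothetical, and the `__INTEL_OFFLOAD` guard is an assumption added so the code also builds as plain host C.

```c
/* Hypothetical helper: C = A * B for n x n row-major matrices, with A and
   B kept resident on the coprocessor between offload regions. */
void offload_matmul(float *A, float *B, float *C, int n)
{
    int len = n * n;

    /* Send: copy A and B to the card and keep the buffers allocated. */
#ifdef __INTEL_OFFLOAD
#pragma offload target(mic) \
    in(A : length(len) alloc_if(1) free_if(0)) \
    in(B : length(len) alloc_if(1) free_if(0))
#endif
    { }

    /* Compute: nocopy() reuses the buffers sent above; only C comes back.
       The loops are a stand-in for a library matrix multiply. */
#ifdef __INTEL_OFFLOAD
#pragma offload target(mic) \
    nocopy(A : length(len) alloc_if(0) free_if(0)) \
    nocopy(B : length(len) alloc_if(0) free_if(0)) \
    out(C : length(len))
#endif
    {
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                float acc = 0.0f;
                for (int k = 0; k < n; k++)
                    acc += A[i * n + k] * B[k * n + j];
                C[i * n + j] = acc;
            }
    }

    /* Free: release the coprocessor copies of A and B. */
#ifdef __INTEL_OFFLOAD
#pragma offload target(mic) \
    nocopy(A : length(len) alloc_if(0) free_if(1)) \
    nocopy(B : length(len) alloc_if(0) free_if(1))
#endif
    { }
}
```

Keeping the buffers resident pays off when the compute step runs many times on the same inputs, since the transfer cost is then paid once.
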
  • Page 28: Intel® MKL Automatic Offload Model

    Code Example 16: Set the Copied Memory Free As with Intel® MKL on any platform, it is possible to limit the number of threads it uses by setting the number of allowed OpenMP threads before executing the MKL function within the offloaded code.
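
A minimal sketch of the thread-limiting idea follows. It calls the standard OpenMP routine `omp_set_num_threads()` inside the offloaded region and then reports how many threads a parallel region actually received; in real code the Intel MKL call would take the place of that parallel region. The function name `threads_used_with_limit` is hypothetical, and the sketch assumes `omp.h` is usable in coprocessor code. The `_OPENMP` and `__INTEL_OFFLOAD` guards are additions so it builds with any C compiler.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: caps the OpenMP thread count inside the offloaded
   region and returns the team size of a subsequent parallel region. */
int threads_used_with_limit(int limit)
{
    int used = 1;   /* value reported when OpenMP is not enabled */

#ifdef __INTEL_OFFLOAD
#pragma offload target(mic) in(limit) inout(used)
#endif
    {
#ifdef _OPENMP
        /* Set the cap before the threaded work starts. */
        omp_set_num_threads(limit);
#pragma omp parallel
        {
#pragma omp single
            used = omp_get_num_threads();
        }
#endif
    }
    return used;
}
```
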
  • Page 29: Debugging On The Intel® Xeon Phi™ Coprocessor

    Performance Analysis on the Intel® Xeon Phi™ Coprocessor Information on collecting performance data on the Intel® Xeon Phi™ Coprocessor using Intel® VTune™ Amplifier XE for Linux* can be found in Section Getting Started -> Intel Xeon Phi Coprocessor Analysis Workflow, located in /opt/intel/vtune_amplifier_xe_2013/documentation/help/index.htm...
  • Page 30: About The Authors

    Bachelor’s degree in Electronics Engineering from Mumbai University, India in 2009 and a Master’s degree in Computer Engineering from Clemson University in December 2012. He joined Intel in 2012 and has been working as a Software Engineer, focusing on developing collateral for Intel® Xeon Phi™ coprocessor.
  • Page 31: Notices

    Intel literature, may be obtained by calling 1-800-548-4725, or go to http://www.intel.com/design/literature.htm Intel, the Intel logo, Cilk, Xeon and Intel Xeon Phi are trademarks of Intel Corporation in the U.S. and other countries. *Other names and brands may be claimed as the property of others...
  • Page 32: Performance Notice

    Optimization Notice Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel.
