12.07.2015 Views

PGI User's Guide

fastmath        Use routines from the fast math library.
keepbin         Keep the binary (.bin) files.
keepgpu         Keep the kernel source (.gpu) files.
keepptx         Keep the portable assembly (.ptx) file for the GPU code.
maxregcount:n   Specify the maximum number of registers to use on the GPU. Leaving this blank indicates no limit.
mul24           Use 24-bit multiplication for subscripting.
nofma           Do not generate fused multiply-add instructions.
time            Link in a limited-profiling library, as described in “Profiling Accelerator Kernels,” on page 102.
[no]wait        Wait for each kernel to finish before continuing in the host program.
host            Select NO accelerator target. Generate <strong>PGI</strong> Unified Binary Code, as described in “<strong>PGI</strong> Unified Binary for Accelerators,” on page 100.

The compiler automatically invokes the necessary CUDA software tools to create the kernel code and embeds the kernels in the object file.

Note
To access accelerator libraries, you must link an accelerator program with the -ta flag.

<strong>PGI</strong> Unified Binary for Accelerators

Note
The information and capabilities described in this section are only supported for 64-bit systems.

<strong>PGI</strong> compilers support the <strong>PGI</strong> Unified Binary feature to generate executables with functions optimized for different host processors, all packed into a single binary. This release extends the <strong>PGI</strong> Unified Binary technology for accelerators.
Specifically, you can generate a single binary that includes two versions of functions:

• one is optimized for the accelerator
• one runs on the host processor when the accelerator is not available or when you want to compare host to accelerator execution.

To enable this feature, use the extended -ta flag:

-ta=nvidia,host

This flag tells the compiler to generate two versions of functions that have valid accelerator regions.

• A compiled version that targets the accelerator.
• A compiled version that ignores the accelerator directives and targets the host processor.
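
To make the two-version idea concrete, here is a minimal C sketch of a function that contains one accelerator region. The function name saxpy and the `#pragma acc region` spelling are assumptions made for this illustration, not text taken from this page. The `__PGI` guard is likewise only an illustrative choice, so that a compiler that does not know the directive builds the plain host loop without warnings.

```c
#include <assert.h>

/* Hypothetical example function: computes y[i] = a * x[i] + y[i].
 * The restrict qualifiers state that x and y do not overlap, which
 * a compiler generally needs to know before parallelizing the loop. */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    assert(n >= 0); /* host-side precondition, outside the region */

/* Assumed directive spelling for an accelerator compute region.
 * A build that ignores the directive runs the loop below on the host. */
#ifdef __PGI
#pragma acc region
#endif
    {
        for (int i = 0; i < n; ++i) {
            y[i] = a * x[i] + y[i];
        }
    }
}
```

Assuming the code is saved as saxpy.c (a hypothetical file name), a command such as `pgcc -ta=nvidia,host -c saxpy.c` would request both an accelerator version and a host version of saxpy. The same -ta flag is then needed at link time, as the note on accelerator libraries states.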
