
OCL002-13.0.0<br />

101 Innovation Drive<br />

San Jose, CA 95134<br />

www.altera.com<br />

May 2013 Altera Corporation<br />

Altera SDK for OpenCL<br />

Programming Guide<br />

This document describes the content and functionality of the Altera® SDK for OpenCL version 13.0. The Altera SDK for OpenCL is a C-based heterogeneous parallel programming environment for Altera field programmable gate arrays (FPGAs). This document assumes that you have read the Altera SDK for OpenCL Getting Started Guide and have performed the following tasks:

■ Download and install the Quartus II software version 13.0.
■ Install your Stratix V FPGA board. You must download and install all necessary device support software.
■ Download the Altera SDK for OpenCL version 13.0.
■ Install the Altera SDK for OpenCL version 13.0.
■ Install the USB-Blaster and the PCI Express® (PCIe®) drivers.

If you have not performed the tasks described above, refer to the Altera Software Installation and Licensing Manual for instructions on installing the Quartus II software, and to the Altera SDK for OpenCL Getting Started Guide for instructions on installing the Altera SDK for OpenCL.

This document assumes you are knowledgeable in OpenCL concepts and application programming interfaces (APIs), as described in the Khronos OpenCL Specification version 1.0. In addition, this programming guide assumes that you are familiar with specific sections and clauses of the OpenCL specification. For more information about the OpenCL Specification version 1.0, refer to the OpenCL Reference Pages on the Khronos Group website.

About the Altera SDK for OpenCL<br />

The Altera SDK for OpenCL provides a platform that allows you to implement the Embedded Profile as described in Section 10 of the OpenCL Specification version 1.0, with key differences listed in “Appendix B”.

Contents of the Altera SDK for OpenCL version 13.0<br />

The Altera SDK for OpenCL version 13.0 provides the following logical components:

■ Compiler: The compiler translates your OpenCL C device code into a hardware configuration file that can be loaded onto an Altera FPGA.
■ Altera SDK for OpenCL (AOCL) Utility: The AOCL utility includes a set of commands that allows you to perform high-level tasks. Refer to “Altera SDK for OpenCL Utility” for more information.

© 2013 Altera Corporation. All rights reserved. ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS,<br />

QUARTUS and STRATIX words and logos are trademarks of Altera Corporation and registered in the U.S. Patent and Trademark<br />

Office and in other countries. OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission of Khronos. * The<br />

Altera SDK for OpenCL is based on a published Khronos Specification, and is expected to pass the Khronos Conformance Testing<br />

Process. Current conformance status can be found at www.khronos.org/conformance. All other words and logos identified as<br />

trademarks or service marks are the property of their respective holders as described at www.altera.com/common/legal.html.<br />

Altera warrants performance of its semiconductor products to current specifications in accordance with Altera's standard<br />

warranty, but reserves the right to make changes to any products and services at any time without notice. Altera assumes no<br />

responsibility or liability arising out of the application or use of any information, product, or service described herein except as<br />

expressly agreed to in writing by Altera. Altera customers are advised to obtain the latest version of device specifications before<br />


■ Host Runtime: The host runtime provides the OpenCL host platform API and runtime API for your OpenCL host application. The host runtime consists of the following libraries:
   ■ Statically linked libraries that provide OpenCL host APIs, hardware abstractions, and helper libraries
   ■ Dynamically linked libraries (DLLs) that provide hardware abstractions and helper libraries

On Windows and Linux machines, the installation process for the Altera SDK for OpenCL installs the SDK into a folder or directory referenced by the ALTERAOCLSDKROOT environment variable. The contents of the Altera SDK for OpenCL version 13.0 are summarized in Table 1.

Table 1. Contents of the Altera SDK for OpenCL for Windows and Linux

| Windows Folder | Linux Directory | Description |
|---|---|---|
| \windows64\bin | /linux64/bin | Shared libraries, executables and Windows DLLs for your development environment. Include this folder or directory in your PATH environment variable. |
| \windows64\driver | /linux64/driver | The Jungo WinDriver for PCIe for Windows or the Altera OpenCL PCIe Driver for Linux allows your host computer to communicate with the FPGA board. Refer to the Altera SDK for OpenCL Getting Started Guide for installation instructions. |
| \board | /board | The Altera Certified Board Program for OpenCL for each FPGA board supported by the SDK. |
| \ip | /ip | IP cores used to compile device kernels. |
| \host | /host | Files used to compile your program. |
| \host\include | /host/include | OpenCL version 1.0 header files and Altera SDK for OpenCL interface files that are used to compile and link your host program. Add this path to the include file search path in your development environment. |
| \host\windows64\lib | /host/linux64/lib | OpenCL host runtime libraries that comprise a generic layer and a hardware abstraction layer that interface the host CPU with the FPGA board via PCIe. Used to link your host program. |
| n/a | /linux64/lib | Shared libraries for your development environment. Include this directory in your LD_LIBRARY_PATH environment variable. |

Example OpenCL Applications

The Altera SDK for OpenCL includes the following example OpenCL applications. To access these example files, navigate to the ALTERAOCLSDKROOT/design/ folder or directory.

■ fft: an OpenCL application of a fast Fourier transform algorithm
■ matrixMult: an OpenCL application of a blocked single-precision floating-point matrix multiplication algorithm
■ moving_average: an OpenCL application of an averaging filter
■ vectorAdd: an OpenCL application that adds two vectors

Altera SDK for OpenCL Compilation Flow

Compiling an OpenCL application with the SDK is a two-step process:

1. Compile your OpenCL kernels. Refer to “Compiling the OpenCL Kernel Source File”.
2. Compile and link your host program. Refer to “Compiling and Linking Your Host Program”.

Figure 1 depicts the compilation flow of the Altera SDK for OpenCL.

Figure 1. The Altera SDK for OpenCL Compilation Flow
[Figure labels: Kernel Code, AOC OpenCL Compiler, Kernel Binary; Host Code, Standard C Compiler, Host Binary; PCIe]

Using the Altera Offline Kernel Compiler<br />

Figure 2 illustrates the compilation flow of an OpenCL application that targets an Altera FPGA.

Figure 2. Compilation Flow of the Altera SDK for OpenCL Offline Kernel Compiler
[Figure labels: Kernel file 1, Kernel file 2, Kernel file 3; Merged kernel source file; Altera Offline Kernel Compiler (AOC); FPGA image file]

You create an FPGA image of your kernel file using the Altera Offline Compiler (AOC). If you create multiple kernel files, you must consolidate them into a single kernel source file (.cl). The kernel compiler translates this kernel source file into an Altera Offline Compiler Object file (.aoco), which the compiler then uses to build a hardware configuration file, called the Altera Offline Compiler Executable file (.aocx), that targets the FPGA.

Compiling the OpenCL Kernel Source File

The AOC builds your OpenCL kernels and creates your hardware configuration file. Table 2 lists the available aoc compilation options. By default, you type the command aoc .cl to compile your kernels and create your hardware configuration file in a single step.

Caution: The process of creating an FPGA hardware configuration file takes hours to complete. Altera recommends that you perform this hardware configuration step on a computer that is equipped with at least 24 gigabytes (GB) of RAM.

Table 2. Available Compilation Options for the Altera Offline Compiler

| Command | Description |
|---|---|
| aoc .cl | Default compilation option. Generates a Quartus II hardware design project and then creates the hardware configuration file. The raw FPGA programming file is the .aocx executable file in the project directory. |
| aoc -c .cl | Two-step compilation option, step 1. Generates a Quartus II hardware design project in a subdirectory that has the same name as the kernel filename, with special characters converted to underscores. The subdirectory contains intermediate files that the SDK uses to build the bit stream used to program the FPGA. This command generates the .aoco object file. Note: Generating the .aoco file is a rapid process compared to creating the .aocx file. To increase efficiency, iterate on your kernel code using this compilation option before generating a hardware configuration. |
| aoc .aoco | Two-step compilation option, step 2. Creates the .aocx hardware configuration file from the .aoco object file. |

You may also perform the compilation in stages. For example, the kernel compiler builds the .aoco file for a vector addition OpenCL application shown in Example 1. You can access the kernel source file vectorAdd.cl by navigating to the ALTERAOCLSDKROOT/design/vectorAdd folder or directory.

Example 1. vectorAdd.cl Kernel Source File

    // AOCL kernel for adding two input vectors
    __kernel void vectorAdd(__global const float *x,
                            __global const float *y,
                            __global float *restrict z,
                            int numElements)
    {
        // get index of the work-item
        size_t index = get_global_id(0);

        // work-items beyond the end of the vectors do nothing
        if (index >= numElements)
        {
            return;
        }

        // add the vector elements
        z[index] = x[index] + y[index];
    }

To create the .aoco file, type aoc -c vectorAdd.cl at the command prompt. When the first compilation step completes and you return to the command prompt, type aoc vectorAdd.aoco to create the hardware configuration file from the .aoco file.
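Because building the hardware takes hours, it can help to have a host-side reference for the kernel's expected results. The following is a minimal sketch in plain C, not part of the SDK: it emulates the work-items of vectorAdd sequentially on the CPU. The names vector_add_work_item and vector_add_reference are hypothetical, and the index argument stands in for get_global_id(0).

```c
#include <assert.h>
#include <stddef.h>

/* One call corresponds to one work-item of the vectorAdd kernel.
 * `index` plays the role of get_global_id(0). */
static void vector_add_work_item(const float *x, const float *y, float *z,
                                 int num_elements, size_t index)
{
    /* Same bounds guard as the kernel: extra work-items do nothing. */
    if (num_elements < 0 || index >= (size_t)num_elements) {
        return;
    }
    z[index] = x[index] + y[index];
}

/* Hypothetical helper: run `global_size` emulated work-items one after
 * another. On the FPGA the work-items are pipelined instead. */
static void vector_add_reference(const float *x, const float *y, float *z,
                                 int num_elements, size_t global_size)
{
    assert(x != NULL && y != NULL && z != NULL);
    for (size_t index = 0; index < global_size; ++index) {
        vector_add_work_item(x, y, z, num_elements, index);
    }
}
```

Calling vector_add_reference with a global size larger than numElements exercises the bounds guard, which matters when the host rounds the global work size up.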


By default, the return of the command prompt signifies that the compilation is complete. If you want to review the kernel throughput analysis and the estimated resource usage summary reports for the vectorAdd kernel compilation, shown in Example 2, invoke the aoc -c vectorAdd.cl --report command.

Example 2. Output of the aoc -c vectorAdd.cl --report Command

    aoc: Selected target board pcie385n_a7
    Kernel throughput analysis for : vectorAdd
    .. simd work items : 1
    .. compute units : 1
    .. throughput due to control flow analysis : 250.00 M work items/second
    .. kernel global memory bandwidth analysis : 3157.89 MB/second
    .. kernel number of local memory banks : none
    +--------------------------------------------------------------------+
    ; Estimated Resource Usage Summary ;
    +----------------------------------------+---------------------------+
    ; Resource + Usage ;
    +----------------------------------------+---------------------------+
    ; Logic utilization ; 17% ;
    ; Dedicated logic registers ; 7% ;
    ; Memory blocks ; 16% ;
    ; DSP blocks ; 0% ;
    +----------------------------------------+---------------------------;
    aoc: First stage compile time = 10s

Note: You can also view detailed information on the compilation by accessing the log file. The .log file is located in the Quartus II project subdirectory that the compiler generates for your kernel.

Altera Offline Compiler Options<br />

You can type the command aoc --help to obtain more information about the command options for the kernel compiler, as summarized in Table 3.

Table 3. AOC Command Options (Part 1 of 3)

| Option | Description |
|---|---|
| Help options: | |
| --help or -h | Displays a list of aoc command options. |
| --version | Prints out version information and exits. |
| -v | Reports the progress of compilation. |
| --report | Generates reports for kernel throughput analysis and estimated resource usage summary. |
| Overall options: | |
| -c | Sets up a Quartus II hardware design project without performing a hardware build. |
| -o | Uses as the name for the output. For example, to specify the filename of the output .aoco file, type the command aoc -c -o .aoco myKernel.cl. To specify the filename of the output .aocx file, type the command aoc -o .aocx myKernel.cl, or aoc -o .aocx myKernel.aoco |


Table 3. AOC Command Options (Part 2 of 3)

| Option | Description |
|---|---|
| -I | Adds to header search path. |
| -D | Defines a macro called , where can be a value or an actual name. For example, aoc myKernel.cl -D RADIUS=4 is equivalent to adding a #define RADIUS 4 line at the top of your kernel source file. |
| -W | Suppresses warnings. |
| -Werror | Converts all warnings into errors. |
| Modifiers: | |
| --board | Runs compilation for the specified board. For example, the command aoc --board pcie385n_a7 myKernel.cl compiles myKernel for the Nallatech PCIe-385N A7 FPGA computing card. |
| --list-boards | Prints a list of available boards and exits. |
| Optimization controls: | |
| -fp-relaxed | Directs the compiler to optimize floating-point operations using balanced trees. If you want to use this optimization option, your program must be able to tolerate small differences in floating-point results. Example usage: aoc -fp-relaxed=true myKernel.cl |
| -fpc | Directs the compiler to remove floating-point rounding operations and conversions when possible and to carry additional bits to maintain precision. If possible, this argument causes the compiler to round a floating-point operation once at the end of the balanced tree. Example usage: aoc -fpc=true myKernel.cl |
| --sw-dimm-partition | Configures the global memory as separate address spaces for each external memory or bank. After you configure the global memory, allocate each buffer in an external memory with the Altera-specific cl_mem_flags (for example, CL_MEM_BANK_2_ALTERA). |
| --const-cache-bytes | Configures the constant cache size in bytes, rounded to the next nearest power of 2. This argument has no effect if none of the kernels uses the __constant address space. The default constant cache size is 16 kilobytes (kB). |
| -O3 | Applies resource-driven performance optimizations to improve performance without violating fitting requirements (estimated logic utilization is less than or equal to 85%). |
| --util | Overrides the logic utilization threshold used by the -O3 option with %. |


Table 3. AOC Command Options (Part 3 of 3)

| Option | Description |
|---|---|
| OpenCL Build Options: | |
| -cl-single-precision-constant, -cl-denorms-are-zero, -cl-opt-disable, -cl-strict-aliasing, -cl-mad-enable, -cl-no-signed-zeros, -cl-unsafe-math-optimizations, -cl-finite-math-only, -cl-fast-relaxed-math | Refer to the OpenCL Specification version 1.0, section 5.4.3. |

Altera SDK for OpenCL Utility

The Altera SDK for OpenCL (AOCL) utility provides additional functionality that allows you to perform high-level tasks. You can obtain information on your host program such as flags and makefile fragments. You can invoke the aocl help command to obtain more information about the command options, as summarized in Table 4.

Table 4. AOCL Command Options

| Option | Description |
|---|---|
| Subcommands for building your host program: | |
| aocl example-makefile | Displays makefile fragments for compiling and linking a host program. |
| aocl makefile | Same as the example-makefile subcommand. |
| aocl compile-config | Displays the flags for compiling your host program. |
| aocl link-config | Shows the flags for linking your host program with the runtime libraries provided by the Altera SDK for OpenCL. This command combines the function of the ldflags and ldlibs subcommands. |
| aocl linkflags | Same as the link-config subcommand. |
| aocl ldflags | Displays the linker flags used to link your host program to the host runtime libraries provided by the Altera SDK for OpenCL. This command does not list the libraries themselves. |
| aocl ldlibs | Displays the list of host runtime libraries provided by the Altera SDK for OpenCL. |
| aocl program | Configures a new FPGA image onto the board. Usage: aocl program .aocx. |
| aocl flash | [If supported] Initializes the FPGA with a specified startup configuration. Usage: aocl flash .aocx. |
| aocl install | Installs your FPGA board into the current host system. |
| aocl diagnostic | Runs the board vendor’s test program for your FPGA board. |
| General: | |
| aocl version | Displays the version information. |
| aocl help | Displays a list of aocl options. |
| aocl help | Displays the help content for a particular command. |

Compiling and Linking Your Host Program

In an OpenCL application, the host program uses standard OpenCL runtime APIs to manage device configuration, data buffers, kernel launches, and event synchronization. The host program also contains host-side functions such as file I/O or portions of the source code that do not run on an accelerator device. A host program includes C header files that describe the OpenCL APIs, and it must link against the runtime API libraries.

Host Program Compilation Settings<br />

To compile your host program, type aocl compile-config at the command line to obtain the include path that you must add to the file search path of your C preprocessor:

■ For Windows systems, add the path -I/%ALTERAOCLSDKROOT%/host/include
■ For Linux systems, add the path -I/$ALTERAOCLSDKROOT/host/include

The OpenCL API header files reside in this directory. You should include the OpenCL header file CL/opencl.h in your host source.

Note: On Windows, you must add the /MD flag to compile the Altera runtime libraries. The flag links the runtime libraries against the multi-threaded DLL version of the Microsoft C Runtime library. You must also compile your host program with the /MD compilation flag, or use the /NODEFAULTLIB linker option to override the selection of runtime library.

Library Paths and Links<br />

Your host program must link against the appropriate runtime libraries provided by the SDK. To obtain the link options, type the command aocl link-config.

Programming the FPGA Hardware<br />

By default, the AOC compiles the OpenCL kernels, and then the kernels are loaded onto the FPGA via PCIe CvP using clCreateProgramWithBinary, as outlined in “Loading Kernels onto an FPGA using clCreateProgramWithBinary”.

Caution: You must ensure that the PCIe device inside your FPGA board is configured properly before you use clCreateProgramWithBinary to program your FPGA hardware. To verify the successful instantiation of the PCIe device, type the command aocl diagnostic.

Note: If the PCIe device on your FPGA board is not configured properly, refer to “Programming the Flash Memory of an FPGA” in “Appendix A” for programming instructions.

If you want to program your FPGA manually without using the host runtime API, via a non-standard use of clCreateProgramWithBinary, refer to “Programming the FPGA with the aocl program Command” in “Appendix A”.

Running Your OpenCL Application<br />

After you load the FPGA hardware configuration file onto the FPGA, you can run the host application as outlined in the Altera SDK for OpenCL Getting Started Guide.

OpenCL Programming Considerations<br />

When you create your kernel and host applications, keep in mind the following programming considerations:

■ For OpenCL kernels, refer to “Kernel Programming Considerations”
■ For host code, refer to “Host Programming Considerations”

Kernel Programming Considerations<br />

Specifying a Target FPGA Board<br />

You can target any FPGA board that is provided in a board vendor-supplied “board package.” To select the board package that the compiler uses, set the AOCL_BOARD_PACKAGE_ROOT environment variable to the path of your board vendor's board package (that is, a directory under ALTERAOCLSDKROOT/board/, as in the examples below).

For example, if you run the SDK on Linux, set the board package environment variable as shown below:

■ To select the Nallatech PCIe-385N A7 FPGA computing card, set AOCL_BOARD_PACKAGE_ROOT to $ALTERAOCLSDKROOT/board/pcie385n
■ To select the BittWare S5-PCIe-HQ Altera Stratix V GSMD5 half-length PCIe board, set AOCL_BOARD_PACKAGE_ROOT to $ALTERAOCLSDKROOT/board/s5pciehq

Type aoc --list-boards to list the available boards within your current board package. You can then target any of the listed boards within that board package using the aoc --board command.

File Scope __constant Address Space Qualifier<br />

The Altera SDK for OpenCL supports scalar file scope constant data. For example, you may set the __constant address space qualifier as follows:

    __constant int my_array[8] = {0x0, 0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7};

    __kernel void my_kernel (__global int * my_buffer)
    {
        size_t index = get_global_id(0);
        my_buffer[index] += my_array[index % 8];
    }

In this case, values for my_array are set in a ROM because the file scope constant data does not change between kernel invocations. To avoid long compilation times, file scope constant data should not exceed 32 kB in size.

Caution: Vector type file scope constant data is not supported. For example, the following line of code is illegal because the vector type int2 is not supported:

    __constant int2 my_array[4] = {(0x0, 0x1), (0x2, 0x3), (0x4, 0x5), (0x6, 0x7)};
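One possible workaround for the vector restriction, offered here as an assumption and not as something the guide prescribes, is to keep the data in a scalar array and rebuild each pair where it is used. The sketch below is plain C that mimics only the indexing. The names int2_like, NUM_PAIRS and load_pair are hypothetical. In OpenCL C the array would carry the __constant qualifier and the pair would be built with the vector literal (int2)(my_array[2 * i], my_array[2 * i + 1]).

```c
#include <assert.h>
#include <stddef.h>

/* Plain-C stand-in for the OpenCL C int2 vector type (illustration only). */
typedef struct {
    int x;
    int y;
} int2_like;

/* Scalar constant data: pair i lives at elements 2*i and 2*i + 1. */
static const int my_array[8] = {0x0, 0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7};

#define NUM_PAIRS (sizeof(my_array) / sizeof(my_array[0]) / 2)

/* Hypothetical helper: rebuild pair i from two adjacent scalars. */
static int2_like load_pair(size_t i)
{
    assert(i < NUM_PAIRS);
    int2_like v = { my_array[2 * i], my_array[2 * i + 1] };
    return v;
}
```

The scalar array holds the same eight values as the illegal int2 declaration, so the ROM contents are unchanged and only the access pattern differs.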

Modifying Your OpenCL Kernel to Target an FPGA<br />

To modify a kernel that was originally written to target a different architecture, perform the following steps:

1. Collect the source code from all your kernel source files into a single file. The name of the single file must have a .cl extension, for example mykernels.cl.

2. If your kernel uses barriers and your kernel must handle work-groups larger than 256 work-items, specify the number of work-items you require with either reqd_work_group_size or max_work_group_size. (The default max_work_group_size is 256.)

3. Eliminate function scope __constant variables. Replace them with a pointer to a __constant parameter. In your host program, allocate a cl_mem object in global memory, copy the constant data to global memory before the kernel starts, and pass the cl_mem object as a parameter to the kernel.

   If a constant variable is a complex type, Altera recommends that, for simplicity, you use a typedef argument. For example:

   Example 3. Eliminating Function Scope __constant Variables

   If your source code includes this:

       __kernel void original(__global int *A)
       {
           __constant int Payoff[2][2] = {{1, 3}, {5, 3}};
           *A = Payoff[1][1];
           // and so on
       }

   Rewrite it to this:

       typedef int Payoff_type[2][2];

       __kernel void modified(__global int *A,
                              __constant Payoff_type *PayoffPtr)
       {
           // dereference the pointer to reach the 2x2 array
           *A = (*PayoffPtr)[1][1];
           // and so on
       }

   Note: Use the same type definition in both your host program and your kernel.

4. Convert each structure parameter to a pointer that points to a structure. For example:

   Example 4. Converting Structure Parameters to Pointers that Point to Structures

   If your source code includes this:

       struct Context
       {
           float param1;
           float param2;
           int param3;
           uint param4;
       };

       __kernel void algorithm(__global float* A,
                               struct Context c)
       {
           if ( c.param3 )
           {
               // and so on
           }
       }

   Rewrite it to this:

       struct Context
       {
           float param1;
           float param2;
           int param3;
           uint param4;
       };

       __kernel void algorithm(__global float* A,
                               __global struct Context* restrict c)
       {
           if ( c->param3 )
           { // Dereference through a pointer
               // and so on
           }
       }

   Note: The __global struct declaration creates a new buffer to store the structure. Use a restrict qualifier to declare the pointer to the structure to prevent pointer aliasing.

For more information about optimizing your OpenCL kernel code to target an FPGA, refer to the Altera SDK for OpenCL Optimization Guide.

Host Programming Considerations

Host Binary Requirement

You must build 64-bit host binaries from your host code.

Host Machine Memory Requirements

The machine that runs the host program must have enough RAM to support all of the following components simultaneously:

■ The host program and operating system.
■ The working set for the host program.
■ The maximum amount of OpenCL memory buffers that can be allocated at once. Every device-side cl_mem buffer is associated with a corresponding storage area in the host process. Therefore, the amount of RAM required can be as large as the amount of RAM supported by the FPGA.


Loading Kernels onto an FPGA using clCreateProgramWithBinary<br />

The Altera SDK for OpenCL provides a kernel compiler that builds the .cl file for an OpenCL application. Kernels are compiled offline on your workstation, separately from the host program, and then loaded onto the FPGA using clCreateProgramWithBinary, shown in Example 5.

Example 5. Host Code Showing Usage of clCreateProgramWithBinary

    // context and device_list are assumed to have been created earlier
    size_t lengths[1];
    unsigned char* binaries[1] = {NULL};
    cl_int status[1];
    cl_int error;
    cl_program program;
    const char options[] = "";

    // read the whole .aocx file into memory
    FILE *fp = fopen("program.aocx", "rb");
    if (fp == NULL) {
        perror("program.aocx");
        exit(EXIT_FAILURE);
    }
    fseek(fp, 0, SEEK_END);
    lengths[0] = ftell(fp);
    binaries[0] = (unsigned char*)malloc(sizeof(unsigned char) * lengths[0]);
    rewind(fp);
    fread(binaries[0], lengths[0], 1, fp);
    fclose(fp);

    // create the program object from the binary, then build it
    program = clCreateProgramWithBinary(context, 1, device_list, lengths,
        (const unsigned char **)binaries, status, &error);
    clBuildProgram(program, 1, device_list, options, NULL, NULL);

After the .aocx file is built, you may use it to create the OpenCL program. The clCreateProgramWithBinary function uses the .aocx file to create a program for the targeted FPGA, and then the host runtime programs the FPGA via clBuildProgram. You can load multiple FPGA programs into memory, and the host runtime will reprogram the FPGA to match the program in memory.
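The file-reading part of this flow is where host programs most often fail silently, for example when the .aocx file is missing or only partly read. The sketch below, which is not part of the SDK, isolates that step in plain C with error checks. The name load_binary_file is hypothetical. The returned buffer and length are what a host program would then pass to clCreateProgramWithBinary as the binaries and lengths arguments.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read a whole file (for example a .aocx image) into a
 * heap buffer. Returns NULL on failure, including for an empty file. On
 * success the byte count is stored in *length and the caller releases the
 * buffer with free(). */
static unsigned char *load_binary_file(const char *path, size_t *length)
{
    assert(path != NULL && length != NULL);

    FILE *fp = fopen(path, "rb");
    if (fp == NULL) {
        return NULL;
    }

    unsigned char *buffer = NULL;
    long size = -1;
    if (fseek(fp, 0, SEEK_END) == 0) {
        size = ftell(fp);
    }
    if (size > 0 && fseek(fp, 0, SEEK_SET) == 0) {
        buffer = (unsigned char *)malloc((size_t)size);
        if (buffer != NULL &&
            fread(buffer, 1, (size_t)size, fp) != (size_t)size) {
            free(buffer); /* short read: treat as failure */
            buffer = NULL;
        }
    }
    fclose(fp);

    if (buffer != NULL) {
        *length = (size_t)size;
    }
    return buffer;
}
```

Returning NULL for every failure keeps the caller's check to a single test before it hands the buffer to the OpenCL runtime.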

Aligned Memory Allocation<br />

You should use aligned memory allocations when assigning a host-side buffer to be written to or read from the FPGA. This allows direct memory access (DMA) transfers to occur to and from the FPGA, which is a more efficient form of memory movement. Altera recommends that you use 64-byte alignment. To set up aligned memory allocations, add the source code in Example 6.

Example 6. Source Code for Enabling Aligned Memory Allocations

For Windows:

    #define AOCL_ALIGNMENT 64
    #include <malloc.h>   // declares _aligned_malloc
    void *ptr = _aligned_malloc (size, AOCL_ALIGNMENT);

For Linux:

    #define AOCL_ALIGNMENT 64
    #include <stdlib.h>   // declares posix_memalign
    void *ptr = NULL;
    posix_memalign (&ptr, AOCL_ALIGNMENT, size);
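The two variants can be folded into one portable wrapper. The following is a minimal sketch under stated assumptions: the names aocl_aligned_alloc, aocl_aligned_free and aocl_is_aligned are hypothetical and are not part of the SDK. It also checks the return value of posix_memalign and pairs _aligned_malloc with _aligned_free, which is the matching release call on Windows.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* exposes posix_memalign on Linux */
#endif

#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#ifdef _WIN32
#include <malloc.h>
#endif

#define AOCL_ALIGNMENT 64

/* Hypothetical wrapper: returns a 64-byte-aligned block, or NULL on failure. */
static void *aocl_aligned_alloc(size_t size)
{
#ifdef _WIN32
    return _aligned_malloc(size, AOCL_ALIGNMENT);
#else
    void *ptr = NULL;
    if (posix_memalign(&ptr, AOCL_ALIGNMENT, size) != 0) {
        return NULL;
    }
    return ptr;
#endif
}

/* Matching release for blocks obtained from aocl_aligned_alloc. */
static void aocl_aligned_free(void *ptr)
{
#ifdef _WIN32
    _aligned_free(ptr);
#else
    free(ptr);
#endif
}

/* Returns nonzero when ptr sits on an AOCL_ALIGNMENT boundary. */
static int aocl_is_aligned(const void *ptr)
{
    return ((uintptr_t)ptr % AOCL_ALIGNMENT) == 0;
}
```

A host program could call aocl_is_aligned on a buffer before enqueuing a transfer to confirm that it qualifies for the alignment Altera recommends.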


Appendix A<br />


Modifying Your Host Program to Target an FPGA<br />

To modify a host program that was originally written to target a different architecture,<br />

perform the following steps:<br />

1. To create cl_program objects, use clCreateProgramWithBinary, shown in<br />

Example 5, instead of clCreateProgramWithSource. You can use either<br />

clCreateKernelsInProgram or clCreateKernel.<br />

2. If you eliminated any function scope __constant variables in your kernel code (by<br />

replacing them with __constant pointers as described in “Modifying Your<br />

OpenCL Kernel to Target an FPGA” on page 11), modify your host program to<br />

create cl_mem objects associated with the pointers, load constant data into them<br />

with clEnqueueWriteBuffer, and pass the objects to the kernel as arguments with<br />

clSetKernelArg.<br />

3. If you converted any structure parameters to pointers-to-constant-structures,<br />

make the following changes to your host program:<br />

a. Allocate a cl_mem buffer to store the structure contents. (You need a separate<br />

cl_mem buffer for every kernel that uses a different structure value.)<br />

b. Set the structure kernel argument with a pointer to the structure buffer, not<br />

with a pointer to the structure contents.<br />

c. Populate the structure buffer contents before enqueuing the kernel. Ensure that<br />

the structure buffer is populated before the kernel launches by either<br />

enqueuing it on the same command queue as the kernel enqueuing, or by<br />

synchronizing separate kernel enqueuing and structure buffer enqueuing with<br />

an event.<br />

d. When your program no longer needs to call a kernel that uses the structure<br />

buffer, release the cl_mem buffer.<br />

Programming the Flash Memory of an FPGA<br />

By default, you configure an FPGA using the image (that is, the hardware configuration file) stored in the flash memory of the device. When there is no power, the FPGA retains the hardware configuration file stored in flash memory. When you power up the system, the hardware configuration file stored in flash memory configures the FPGA circuitry. Therefore, it is imperative to load an OpenCL-compatible image into the flash memory to ensure the proper configuration of the FPGA.

While you load the hardware configuration file into the flash memory of the FPGA, do not power down the system prematurely. Also, you must not launch any host code that calls OpenCL kernels or might otherwise communicate with the FPGA board.

Perform the following steps to load your hardware configuration file into the flash memory of the FPGA:

1. Ensure that the USB-Blaster driver is installed successfully so that you can load your hardware configuration file into the flash memory. If your USB-Blaster driver is not installed properly, refer to the Altera SDK for OpenCL Getting Started Guide for installation instructions.

2. Generate a .aocx file. Refer to "Compiling the OpenCL Kernel Source File" for more information.

3. Type aocl flash followed by the name of your .aocx file to load the hardware configuration file into the flash memory of the FPGA.

4. Reboot your host system.

The reason for programming the flash memory of your FPGA is that OpenCL kernels require a PCIe device inside the FPGA so that they can communicate with the host program. The FPGA boards supported by the Altera SDK for OpenCL version 13.0 are preprogrammed with a default hardware configuration file that instantiates the PCIe device inside the FPGA. If the PCIe device in your FPGA board is not configured properly, you can configure the FPGA using the image stored in the flash memory so that you have a functional PCIe device upon rebooting your system.

Programming the FPGA with the aocl program Command

If you want to load your hardware configuration file onto the FPGA offline without using the host runtime API, you may do so by implementing a non-standard use of clCreateProgramWithBinary in your host code, as shown below:

cl_int status;
cl_program program;
const char *kernel_name = "vectorAdd";
size_t kernel_name_length = 9;  /* number of characters in "vectorAdd" */
cl_int kernel_status;

/* Non-standard use: the kernel name is passed in place of the binary. */
program = clCreateProgramWithBinary(context, 1, &device,
    &kernel_name_length, (const unsigned char**)&kernel_name,
    &kernel_status, &status);

To load the kernel onto the FPGA, perform the following steps:

1. At a command prompt, navigate to the directory that contains the .aocx file that specifies the hardware configuration for the target FPGA.

2. Type aocl program followed by the name of the .aocx file. This command loads the hardware configuration onto the FPGA.

Appendix B

Allocation Limits of the Altera SDK for OpenCL

The Altera SDK for OpenCL host runtime conforms with the OpenCL platform layer and runtime API, with the following clarifications and exceptions:

■ Allocation limits: Table 5 lists allocation limits of the SDK.

Table 5. Allocation Limits of the Altera SDK for OpenCL
Item | Limit
Maximum number of contexts | 4
Maximum number of queues per context | 28
Maximum number of program objects per context | 20
Maximum number of concurrently running kernels | The total number of queues
Maximum number of enqueued kernels | More than 1000
Maximum number of kernels per FPGA device | 32
Maximum number of arguments per kernel | 128
Maximum total size of kernel arguments | 256 bytes
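The two per-kernel limits in Table 5 (128 arguments and 256 bytes of argument data in total) can be checked on the host before a kernel is set up. The following is a sketch only: the function name kernel_args_within_limits and the enum names are hypothetical, and the sketch assumes that each argument counts toward the total with the size you would pass to clSetKernelArg, which this guide does not state explicitly.

```c
#include <assert.h>
#include <stddef.h>

enum {
    AOCL_MAX_KERNEL_ARGS      = 128, /* Table 5: arguments per kernel */
    AOCL_MAX_KERNEL_ARG_BYTES = 256  /* Table 5: total size of kernel arguments */
};

/* Hypothetical host-side check: returns 1 if the argument list fits within
 * both per-kernel limits from Table 5, and 0 otherwise. */
int kernel_args_within_limits(const size_t *arg_sizes, size_t arg_count)
{
    assert(arg_sizes != NULL || arg_count == 0);

    if (arg_count > AOCL_MAX_KERNEL_ARGS)
        return 0;

    size_t total = 0;
    for (size_t i = 0; i < arg_count; ++i) {
        /* Compare against the remaining budget so the sum cannot overflow. */
        if (arg_sizes[i] > AOCL_MAX_KERNEL_ARG_BYTES - total)
            return 0;
        total += arg_sizes[i];
    }
    return 1;
}
```

For example, a kernel with two 8-byte pointer arguments and one 4-byte scalar uses 20 bytes and passes the check, while a 200-byte structure followed by a 64-byte vector does not.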

OpenCL Features Supported by the Altera SDK for OpenCL

The Altera SDK for OpenCL supports the OpenCL Specification version 1.0. The SDK provides a platform that implements the Embedded Profile (Section 10 of the OpenCL Specification version 1.0 by the Khronos Group).

Platform Layer and Runtime Implementation

Section 4 of the OpenCL Specification version 1.0 describes the OpenCL platform layer that allows applications to query OpenCL devices and device configuration information, and to create OpenCL contexts. Section 5 of the specification describes the OpenCL runtime API that manages OpenCL objects such as command queues, memory objects, program objects, and kernel objects, and provides the calls that allow you to enqueue commands.

■ Section 5.2.1, Creating Buffer Objects: Memory objects allocated in host memory must be aligned to 128-byte boundaries to accommodate the worst-case alignment of the built-in OpenCL C data types (that is, type long16, which occupies 16 x 8 bytes = 128 bytes). When you provide a host memory block to clCreateBuffer, you must ensure that the buffer is aligned as required.

■ Section 5.2.8, Mapping and Unmapping Memory Objects: The SDK does not present a coherent shared memory abstraction; host memory and device global memory are not shared. All memory transfers between the host and the device must be explicitly managed with the OpenCL host API.

■ Section 5.4.1, Creating Program Objects: The SDK does not provide an online compiler. Therefore, do not use the clCreateProgramWithSource API. Instead, use clCreateProgramWithBinary to create a program.

■ Section 5.8, Out-of-order Execution of Kernels and Memory Object Commands: Command queues do not support out-of-order execution.
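For the 128-byte requirement in Section 5.2.1, a host program can verify a block's alignment before handing it to clCreateBuffer. The following is a minimal sketch in C11; the names HOST_BUFFER_ALIGNMENT, host_block, and is_aligned are hypothetical and chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HOST_BUFFER_ALIGNMENT 128  /* worst case: long16 = 16 x 8 bytes */

/* A statically allocated host block placed on a 128-byte boundary (C11). */
static _Alignas(HOST_BUFFER_ALIGNMENT) unsigned char host_block[1024];

/* Returns 1 if ptr lies on a multiple of alignment, and 0 otherwise.
 * alignment must be a nonzero power of two. */
int is_aligned(const void *ptr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}
```

Here is_aligned(host_block, HOST_BUFFER_ALIGNMENT) holds by construction, whereas a pointer offset by 8 bytes into the block fails the check and should not be passed to clCreateBuffer as a host memory block.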

OpenCL Programming Language Implementation

Section 6 of the OpenCL Specification version 1.0 describes the OpenCL C programming language. OpenCL C is based on C99 with some limitations. The Altera SDK for OpenCL conforms with the OpenCL C programming language with clarifications and exceptions summarized in Table 6. The following symbols indicate the support statuses of the features:

■ "v" means that the feature is supported; where additional information is available, it appears in the Notes column of Table 6.

■ "v*" means that the feature is supported with exceptions and clarifications.

■ "X" means that the feature is not supported.

■ "—" means that an explanation or clarification for the feature is available in the Notes column of Table 6.

OpenCL programming language features that are supported with no additional clarifications are not shown in Table 6.

Table 6. OpenCL Programming Language Implementation (Part 1 of 5)
Section | Feature | Supported | Notes
6.1.1 | Built-in Scalar Data Types | |
 | double precision float | v* | *Preliminary support for all double precision float built-in scalar data types. This feature might not conform with the OpenCL Specification version 1.0.
 | half | X |
6.1.2 | Built-in Vector Data Types | v* | *Preliminary support for vectors with three elements. Three-element vectors might not conform with the OpenCL Specification version 1.0.
6.1.3 | Built-in Data Types | X |
6.1.4 | Reserved Data Types | X |
6.1.5 | Alignment of Types | — | All scalar and vector types are aligned as required (vectors with three elements are aligned as if they had four elements).
6.2.1 | Implicit Conversions | v | Refer to Section 6.2.6: Usual Arithmetic Conversions in the OpenCL Specification version 1.2 for an important clarification of implicit conversions between scalar and vector types.
6.2.2 | Explicit Casts | v | The SDK allows scalar data casts to a vector with a different element type.
6.5 | Address Space Qualifiers | v* | *Function scope __constant variables are not supported.
6.6 | Image Access Qualifiers | X |
6.7 | Function Qualifiers | |
6.7.2 | Optional Attribute Qualifiers | v | Refer to the Altera SDK for OpenCL Optimization Guide for tips on using reqd_work_group_size to improve kernel performance. The SDK parses but ignores vec_type_hint and work_group_size_hint attribute qualifiers.

Table 6. OpenCL Programming Language Implementation (Part 2 of 5)
Section | Feature | Supported | Notes
6.8 | Restrictions. The SDK implements the restrictions listed in the OpenCL Specification version 1.0 as follows: | |
 | pointer assignments between address spaces | — | Arguments to __kernel functions declared in a program that are pointers must be declared with the __global, __constant, or __local qualifier. The compiler enforces the OpenCL restriction against pointer assignments between address spaces.
 | pointers to functions | X | The compiler does not enforce the OpenCL restriction against pointers to functions; you must ensure that your code contains no pointers to functions.
 | structure-type kernel arguments | X | Convert structure arguments to a pointer to a structure in global memory.
 | images | X |
 | bit fields | X | The compiler does not enforce this restriction; you must ensure that your code contains no bit fields.
 | variable length arrays and structures | X |
 | variadic macros and functions | X |
 | C99 headers | X | C99 headers are not available.
 | extern, static, auto, and register storage-class specifiers | X | The compiler does not enforce this restriction; you must ensure that your code contains no extern or static specifiers. The auto and register specifiers should have no effect.
 | predefined identifiers | v | Use the -D option of the aoc command to provide preprocessor symbol definitions in your kernel code.
 | recursion | X | The compiler does not enforce this restriction; you must ensure that your code contains no recursive functions.
 | irreducible control flow | X | Irreducible control flow is invalid. Code with irreducible control flow might result in a hardware implementation with unexpected behavior. Generally, irreducible control flow occurs in loops that have multiple entry points.
 | writes to memory of built-in types less than 32 bits in size | v* | *Performance of such writes is poor.
 | declaration of arguments to __kernel functions of type event_t | X | The compiler does not enforce this restriction; you must ensure that your code does not include __kernel function arguments of type event_t.
 | elements of a struct or union belonging to different address spaces | X | The compiler does not enforce this restriction; you must ensure that all elements of a struct or union belong to the same address space. Warning: Assigning elements of a struct or union to different address spaces might cause a fatal error.
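Because the compiler does not enforce the restriction on recursion, a recursive function has to be rewritten by hand before it is used in kernel code. The following sketch shows one such rewrite in plain C, using a greatest-common-divisor function chosen purely for illustration; neither function comes from the SDK. Only the loop form would be suitable for a kernel.

```c
#include <assert.h>

/* Recursive form: acceptable in host C, but not permitted in kernel code. */
unsigned gcd_recursive(unsigned a, unsigned b)
{
    return b == 0 ? a : gcd_recursive(b, a % b);
}

/* Loop form of the same computation: no function calls itself, and the
 * loop has a single entry point. */
unsigned gcd_iterative(unsigned a, unsigned b)
{
    while (b != 0) {
        unsigned r = a % b;
        a = b;
        b = r;
    }
    return a;
}

/* Host-side self-check that both forms agree on a few inputs. */
void check_gcd_forms_agree(void)
{
    assert(gcd_iterative(48, 18) == gcd_recursive(48, 18));
    assert(gcd_iterative(17, 5) == gcd_recursive(17, 5));
    assert(gcd_iterative(0, 9) == gcd_recursive(0, 9));
}
```

The same pattern applies to any tail-recursive helper: the recursive call becomes an update of the loop variables.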

Table 6. OpenCL Programming Language Implementation (Part 3 of 5)
Section | Feature | Supported | Notes
6.9 | Preprocessor Directives and Macros | |
 | #pragma directive: #pragma unroll | v | The compiler supports only #pragma unroll. The unroll directive may optionally take an integer as an argument to control the unroll factor. For example, #pragma unroll 4 unrolls the loop by a factor of four. By default (that is, with no argument), an unroll directive causes the compiler to attempt to fully unroll the loop. Refer to the Altera SDK for OpenCL Optimization Guide for tips on using #pragma unroll to improve kernel performance.
 | __ENDIAN_LITTLE__ defined to be value 1 | v | The target FPGA is little-endian.
 | __IMAGE_SUPPORT__ | X | __IMAGE_SUPPORT__ is undefined; the SDK does not support images.
6.10 | Attribute Qualifiers. The compiler parses attribute qualifiers as follows: | |
6.10.2 | Specifying Attributes of Functions: Structure-type kernel arguments | X | Convert structure arguments to a pointer to a structure in global memory.
6.10.3 | Specifying Attributes of Variables: endian | X |
6.10.4 | Specifying Attributes of Blocks and Control-Flow-Statements | X |
6.10.5 | Extending Attribute Qualifiers | — | The SDK for OpenCL compiler can parse attributes on various syntactic structures. It reserves some attribute names for its own internal use. Refer to the Altera SDK for OpenCL Optimization Guide for more information about these attribute names.
6.11.2 | Math Functions | |
 | built-in math functions | v* | *Built-in math functions for double precision float are supported preliminarily. These functions might not conform with the OpenCL Specification version 1.0.
 | built-in half_ and native_ math functions | v* | *Built-in half_ and native_ math functions for double precision float are supported preliminarily. These functions might not conform with the OpenCL Specification version 1.0.
6.11.5 | Geometric Functions | v | *Built-in geometric functions for double precision float are supported preliminarily. These functions might not conform with the OpenCL Specification version 1.0. Refer to Table 7 for a list of built-in geometric functions supported by the SDK.
6.11.8 | Image Read and Write Functions | X |
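To make the effect of an unroll factor concrete, the following sketch writes out by hand, in plain C, roughly what unrolling a simple summation loop by a factor of four means. It is an illustration of the transformation only, not output of the SDK compiler; the function names are hypothetical, and the cleanup loop for leftover iterations is an assumption about how a remainder would be handled.

```c
#include <assert.h>
#include <stddef.h>

/* Original loop. In kernel code, placing "#pragma unroll 4" immediately
 * before the for statement would request the unrolling shown below. */
int sum_rolled(const int *x, size_t n)
{
    assert(x != NULL || n == 0);
    int sum = 0;
    for (size_t i = 0; i < n; ++i)
        sum += x[i];
    return sum;
}

/* The same loop with the body replicated four times per iteration. */
int sum_unrolled_by_4(const int *x, size_t n)
{
    assert(x != NULL || n == 0);
    int sum = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        sum += x[i];
        sum += x[i + 1];
        sum += x[i + 2];
        sum += x[i + 3];
    }
    for (; i < n; ++i)           /* leftover iterations when n is not a multiple of 4 */
        sum += x[i];
    return sum;
}
```

Both functions return the same result for any input; the unrolled form simply performs four additions per loop iteration, which in hardware corresponds to four copies of the loop body.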

Table 6. OpenCL Programming Language Implementation (Part 4 of 5)
Section | Feature | Supported | Notes
6.11.9 | Synchronization Functions: the barrier synchronization function | v* | *Clarifications and exceptions: If a kernel specifies the reqd_work_group_size or max_work_group_size attribute, barrier supports the corresponding number of work-items. If neither attribute is specified, a barrier is instantiated with a default limit of 256 work-items. The work-item limit is the maximum supported work-group size for the kernel; this limit is enforced by the runtime.
6.11.10 | Explicit Memory Fence Functions | X |
6.11.11 | Async Copies from Global to Local Memory, Local to Global Memory, and Prefetch | v* | *The implementation is naive: Work-item (0,0,0) performs the copy and wait_group_events is implemented as a barrier. If a kernel specifies the reqd_work_group_size or max_work_group_size attribute, wait_group_events supports the corresponding number of work-items. If neither attribute is specified, wait_group_events is instantiated with a default limit of 256 work-items.
 | Additional built-in vector functions from the OpenCL Specification version 1.2 Section 6.12.12: Miscellaneous Vector Functions: | |
 | vec_step | v |
 | shuffle | v |
 | shuffle2 | v |

Table 6. OpenCL Programming Language Implementation (Part 5 of 5)
Section | Feature | Supported | Notes
 | OpenCL Specification version 1.2 Section 6.12.13: printf | v* | *Preliminary support. This feature might not conform with the OpenCL Specification version 1.0. See below for details.

The printf function in OpenCL has syntax and features similar to the printf function in C99, with a few exceptions. For details, refer to the OpenCL Specification version 1.2.

To use a printf function, there are no requirements for special compilation steps, buffers, or flags; you can compile kernels that include printf instructions with the usual aoc command.

During kernel execution, printf data is stored in a global printf buffer that is automatically allocated by the compiler. The size of this buffer is 64 kB; the total size of data arguments to a printf call should not exceed this size. When kernel execution completes, the contents of the printf buffer are printed to standard output.

Buffer overflows are handled seamlessly; printf instructions can be executed an unlimited number of times. However, if the printf buffer overflows, kernel pipeline execution stalls until the host reads the buffer and prints the buffer contents.

Because printf functions store their data in a global memory buffer, the performance of your kernel will drop if it includes such functions.

There are no usage limitations on printf functions. You can use printf instructions inside if-then-else statements, loops, and so on. A kernel can contain multiple printf instructions that are executed by multiple work-items.

Format string arguments and literal string arguments of printf calls are transferred to the host system from the FPGA using a special memory region. This memory region can overflow if the total size of the printf string arguments is large (3000 characters or fewer is usually safe in a typical OpenCL application). An overflow causes the following error message to be printed during the host program execution:

cannot parse auto-discovery string at byte offset 4096

Output from printf is never intermixed, even though work-items may execute printf functions concurrently; however, the order of concurrent printf execution is not guaranteed. In other words, printf outputs may not appear in program order if the printf instructions are in concurrent data paths.

Table 7. Supported Scalar and Vector Argument Built-in Geometric Functions
Function | Argument Type: float | Argument Type: double
cross | v | v
dot | v | v
distance | v | v
length | v | v
normalize | v | v
fast_distance | v | —
fast_length | v | —
fast_normalize | v | —

Numerical Compliance Implementation

Section 7 of the OpenCL Specification version 1.0 describes features of the C99 and IEEE 754 standards that must be supported by OpenCL compliant devices. The Altera SDK for OpenCL operates on 32-bit and 64-bit floating-point values in IEEE Standard 754-2008 format, but not all floating-point operators have been implemented, as summarized in Table 8.

Table 8. Numerical Compliance Implementation
Section | Feature | Supported | Notes
7.1 | Rounding Modes | v* | Conversions between integer and single and half precision floating-point types support all rounding modes. *Conversions between integer and double precision floating-point types support all rounding modes on a preliminary basis. This feature might not conform with the OpenCL Specification version 1.0.
7.2 | INF, NaN and Denormalized Numbers | v* | INF and NaN results for single precision operations are generated in a manner that conforms with the OpenCL Specification version 1.0. For most operations, denormalized numbers are flushed prior to and after a floating-point operation. *Double precision floating-point operation is supported on a preliminary basis. This feature might not conform with the OpenCL Specification version 1.0.
7.3 | Floating-Point Exceptions | X | The SDK discards floating-point exceptions.
7.4 | Relative Error as ULPs | v* | Single precision floating-point operations conform with the numerical accuracy requirements for an embedded profile of the OpenCL Specification version 1.0. *Double precision floating-point operation support is preliminary. This feature might not conform with the OpenCL Specification version 1.0.
7.5 | Edge Case Behavior | v |

Image Addressing and Filtering Implementation

Image addressing and filtering are not supported. The SDK does not support images.

Optional Extensions

Section 9 of the OpenCL Specification version 1.0 describes a list of optional features that may be supported by some OpenCL implementations.

■ Section 9.5, Atomic Functions for 32-bit integers: The SDK offers preliminary support for all 32-bit global and local memory atomic functions described in Section 6.12.11: Atomic Functions of the OpenCL Specification version 1.2. This feature might not conform with the OpenCL Specification version 1.0.

Note: The implementation of atomic functions may lower the performance of your design. The constraint on performance (Fmax) increases if more than one type of atomic function is used in the kernel.

Embedded Profile Implementation

Section 10 of the OpenCL Specification version 1.0 describes the OpenCL embedded profile. The Altera SDK for OpenCL conforms with the OpenCL embedded profile with clarifications and exceptions summarized in Table 9.

Table 9. Clarifications and Exceptions to OpenCL Embedded Profile
Clause | Feature | Supported | Notes
1 | 64-bit integers | v |
2 | 3D images | X | The SDK does not support images.
3 | Create 2D and 3D images with image_channel_data_type values | X | The SDK does not support images.
4 | Samplers | X |
5 | Rounding modes | — |
6 | Restrictions listed for single precision basic floating-point operations | X |
7 | half type | X |
8 | Error bounds listed for conversions from CL_UNORM_INT8, CL_SNORM_INT8, CL_UNORM_INT16 and CL_SNORM_INT16 to float | X |

This clause of the OpenCL Specification version 1.0 does not apply to the SDK.

Refer to Table 5 for a list of allocation limits.

Multiple Host Threads

Appendix A.2 Multiple Host Threads of the OpenCL Specification version 1.0: The Altera SDK for OpenCL host library is not thread safe.

Document Revision History

Table 10 lists the revision history for this document.

Table 10. Document Revision History

Date | Version | Changes
May 2013 | 13.0.0 | Updated compilation flow. Updated kernel compiler commands. Included Altera SDK for OpenCL Utility commands. Inserted an OpenCL Programming Considerations section. Updated flash programming procedure and moved it to Appendix A. Included a new clCreateProgramWithBinary FPGA hardware programming flow; moved the old clCreateProgramWithBinary hardware programming flow to Appendix A. Moved updated information on allocation limits and OpenCL language support to Appendix B.
November 2012 | 12.1.0 | Initial release.
