aocl_programming_guide
aocl_programming_guide
aocl_programming_guide
You also want an ePaper? Increase the reach of your titles
YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.
OCL002-13.0.0<br />
101 Innovation Drive<br />
San Jose, CA 95134<br />
www.altera.com<br />
May 2013 Altera Corporation<br />
Altera SDK for OpenCL<br />
Programming Guide<br />
This document describes the content and functionality of the Altera ® SDK for<br />
OpenCL version 13.0. The Altera SDK for OpenCL is a C-based heterogeneous<br />
parallel <strong>programming</strong> environment for Altera field programmable gate arrays<br />
(FPGAs). This document assumes that you have read the Altera SDK for OpenCL<br />
Getting Started Guide and have performed the following tasks:<br />
■ Download and install the Quartus II software version 13.0.<br />
■ Install your Stratix V FPGA board. You must download and install all necessary<br />
device support software.<br />
■ Download the Altera SDK for OpenCL version 13.0.<br />
■ Install the Altera SDK for OpenCL version 13.0.<br />
■ Install the USB-Blaster and the PCI Express ® (PCIe ® ) drivers.<br />
f If you have not performed the tasks described above, refer to the Altera Software<br />
Installation and Licensing Manual for installation instructions on the Quartus II<br />
software. Refer to the Altera SDK for OpenCL Getting Started Guide for installation<br />
instructions on the Altera SDK for OpenCL.<br />
This document assumes you are knowledgeable in OpenCL concepts and application<br />
<strong>programming</strong> interfaces (APIs), as described in the Khronos OpenCL Specification<br />
version 1.0. In addition, this <strong>programming</strong> <strong>guide</strong> assumes that you are familiar with<br />
specific sections and clauses of the OpenCL specification. For more information about<br />
the OpenCL Specification version 1.0, refer to OpenCL Reference Pages on the<br />
Khronos Group website.<br />
About the Altera SDK for OpenCL<br />
The Altera SDK for OpenCL provides a platform that allows you to implement the<br />
Embedded Profile as described in Section 10 of the OpenCL Specification version 1.0,<br />
with key differences listed in “Appendix B”.<br />
Contents of the Altera SDK for OpenCL version 13.0<br />
The Altera SDK for OpenCL version 13.0 provides the following logical components:<br />
■ Compiler—The compiler translates your OpenCL C device code into a hardware<br />
configuration file that can be loaded onto an Altera FPGA.<br />
■ Altera SDK for OpenCL (AOCL) Utility —The AOCL utility includes a set of<br />
commands that allows you to perform high-level tasks. Refer to “Altera SDK for<br />
OpenCL Utility” on page 8 for more information.<br />
© 2013 Altera Corporation. All rights reserved. ALTERA, ARRIA, CYCLONE, HARDCOPY, MAX, MEGACORE, NIOS,<br />
QUARTUS and STRATIX words and logos are trademarks of Altera Corporation and registered in the U.S. Patent and Trademark<br />
Office and in other countries. OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission of Khronos. * The<br />
Altera SDK for OpenCL is based on a published Khronos Specification, and is expected to pass the Khronos Conformance Testing<br />
Process. Current conformance status can be found at www.khronos.org/conformance. All other words and logos identified as<br />
trademarks or service marks are the property of their respective holders as described at www.altera.com/common/legal.html.<br />
Altera warrants performance of its semiconductor products to current specifications in accordance with Altera's standard<br />
warranty, but reserves the right to make changes to any products and services at any time without notice. Altera assumes no<br />
responsibility or liability arising out of the application or use of any information, product, or service described herein except as<br />
expressly agreed to in writing by Altera. Altera customers are advised to obtain the latest version of device specifications before<br />
ISO<br />
9001:2008<br />
Registered<br />
Feedback Subscribe
Page 2 Example OpenCL Applications<br />
Altera SDK for OpenCL<br />
Programming Guide<br />
■ Host Runtime—The host runtime provides the OpenCL host platform API and<br />
runtime API for your OpenCL host application. The host runtime consists of the<br />
following libraries:<br />
■ Statically-linked libraries provide OpenCL host APIs, hardware abstractions<br />
and helper libraries<br />
■ Dynamically-linked libraries (DLLs) provide hardware abstractions and helper<br />
libraries<br />
On Windows and Linux machines, the installation process for the Altera SDK for<br />
OpenCL installs the SDK into a folder or directory referenced by the<br />
ALTERAOCLSDKROOT environment variable. The contents of the Altera SDK for OpenCL<br />
version 13.0 are summarized in Table 1.<br />
Table 1. Contents of the Altera SDK for OpenCL for Windows and Linux<br />
Windows<br />
Folder<br />
Example OpenCL Applications<br />
Linux<br />
Directory<br />
Description<br />
\windows64\bin /linux64/bin<br />
Shared libraries, executables and Windows DLLs for your development<br />
environment. Include this folder or directory in your PATH environment<br />
variable.<br />
The Jungo WinDriver for PCIe for Windows or the Altera OpenCL PCIe<br />
\windows64\driver /linux64/driver<br />
Driver for Linux allows your host computer to communicate with the<br />
FPGA board. Refer to the Altera SDK for OpenCL Getting Started Guide<br />
for installation instructions.<br />
\board /board<br />
The Altera Certified Board Program for OpenCL for each FPGA board<br />
supported by the SDK.<br />
\ip /ip IP cores used to compile device kernels.<br />
\host /host Files used to compile your program.<br />
OpenCL version 1.0 header files and Altera SDK for OpenCL interface files<br />
\host\include /host/include that are used to compile and link your host program. Add this path to the<br />
include file search path in your development environment.<br />
OpenCL host runtime libraries that comprise a generic layer and a<br />
\host\windows64\lib /host/linux64/lib hardware abstraction layer that interface the host CPU with the FPGA<br />
board via PCIe. Used to link your host program.<br />
— /linux64/lib<br />
Shared libraries for your development environment. Include this<br />
directory in your LD_LIBRARY_PATH environment variable.<br />
The Altera SDK for OpenCL includes the following example OpenCL applications. To<br />
access these example files, navigate to the<br />
ALTERAOCLSDKROOT/design/ folder or directory.<br />
■ fft—an OpenCL application of a fast fourier transform algorithm<br />
■ matrixMult—an OpenCL application of a single precision floating-point-blocked<br />
matrix multiplication algorithm<br />
■ moving_average—an OpenCL application of an averaging filter<br />
■ vectorAdd—an OpenCL application that adds two vectors<br />
May 2013 Altera Corporation
Altera SDK for OpenCL Compilation Flow Page 3<br />
Altera SDK for OpenCL Compilation Flow<br />
May 2013 Altera Corporation<br />
Compiling an OpenCL application with the SDK is a two-step process:<br />
1. Compile your OpenCL kernels. Refer to “Compiling the OpenCL Kernel Source<br />
File” on page 4.<br />
2. Compile and link your host program. Refer to “Compiling and Linking Your Host<br />
Program” on page 9.<br />
Figure 1 depicts the compilation flow of the Altera SDK for OpenCL.<br />
Figure 1. The Altera SDK for OpenCL Compilation Flow<br />
Kernel Code<br />
AOC OpenCL<br />
Compiler<br />
Kernel Binary<br />
Host Code<br />
Standard<br />
C Compiler<br />
Host Binary<br />
PCIe<br />
Altera SDK for OpenCL<br />
Programming Guide
Page 4 Using the Altera Offline Kernel Compiler<br />
Using the Altera Offline Kernel Compiler<br />
Altera SDK for OpenCL<br />
Programming Guide<br />
Figure 2 illustrates the compilation flow of an OpenCL application that targets an<br />
Altera FPGA.<br />
Figure 2. Compilation Flow of the Altera SDK for OpenCL Offline Kernel Compiler<br />
Kernel<br />
file 2<br />
Kernel<br />
file 1<br />
Kernel<br />
file 3<br />
Merged<br />
kernel<br />
source file<br />
You create an FPGA image of your kernel file using the Altera Offline Compiler<br />
(AOC). If you create multiple kernel files, you must consolidate them into a single<br />
kernel source file (.cl). The kernel compiler translates this kernel source file into an<br />
Altera Offline Compiler Object file (.aoco), which the compiler then uses to build a<br />
hardware configuration file, called the Altera Offline Compiler Executable file (.aocx),<br />
that targets the FPGA.<br />
Compiling the OpenCL Kernel Source File<br />
Altera Offline<br />
Kernel Compiler<br />
(AOC)<br />
FPGA<br />
image file<br />
The AOC builds your OpenCL kernels and creates your hardware configuration file.<br />
Table 2 lists the available aoc compilation options. By default, you type the command<br />
aoc .cl to compile your kernels and create your hardware<br />
configuration file in a single step.<br />
c The process of creating an FPGA hardware configuration file takes hours to complete.<br />
Altera recommends that you perform this hardware configuring step on a computer<br />
that is equipped with at least 24 gigabytes (GB) of RAM.<br />
Table 2. Available Compilation Options for the Altera Offline Compiler<br />
Command Description<br />
Default compilation option:<br />
Generates a Quartus II hardware design project and then<br />
creates the hardware configuration file. The raw FPGA<br />
aoc .cl<br />
<strong>programming</strong> file is the .aocx executable file in the project<br />
directory.<br />
May 2013 Altera Corporation
Using the Altera Offline Kernel Compiler Page 5<br />
May 2013 Altera Corporation<br />
Table 2. Available Compilation Options for the Altera Offline Compiler (Continued)<br />
A two-step compilation option:<br />
Command Description<br />
aoc -c .cl<br />
aoc .aoco<br />
Step 1:<br />
Generates a Quartus II hardware design project in a<br />
subdirectory that has the same name as the kernel filename,<br />
with special characters converted to underscores.The<br />
subdirectory contains intermediate files that the SDK uses to<br />
build the bit stream used to program the FPGA.<br />
This command generates the .aoco object file.<br />
Note:<br />
Generating the .aoco file is a rapid process compared to<br />
creating the .aocx file. To increase efficiency, iterate on your<br />
kernel code using this compilation option before generating a<br />
hardware configuration.<br />
Step 2:<br />
Creates the .aocx hardware configuration file from the .aoco<br />
object file.<br />
You may also perform the compilation in stages. For example, the kernel compiler<br />
builds the .aoco file for a vector addition OpenCL application shown in Example 1.<br />
You can access the kernel source file vectorAdd.cl by navigating to the<br />
ALTERAOCLSDKROOT/design/vectorAdd folder or directory.<br />
Example 1. vectorAdd.cl Kernel Source File<br />
// AOCL kernel for adding two input vectors<br />
__kernel void vectorAdd(__global const float *x,<br />
__global const float *y,<br />
__global float *restrict z,<br />
int numElements)<br />
{<br />
// get index of the work-item<br />
size_t index = get_global_id(0);<br />
if (index >= numElements)<br />
{<br />
return;<br />
}<br />
//add the vector elements<br />
z[index] = x[index] + y[index];<br />
}<br />
To create the .aoco file, type aoc -c vectorAdd.cl at the command prompt. When<br />
the first compilation step completes and you return to the command prompt, type<br />
aoc vectorAdd.aoco to create the hardware configuration file from the .aoco file.<br />
Altera SDK for OpenCL<br />
Programming Guide
Page 6 Using the Altera Offline Kernel Compiler<br />
Altera SDK for OpenCL<br />
Programming Guide<br />
By default, the command prompt appears to signify the completion of the<br />
compilation. If you want to review the kernel throughput analysis and the estimated<br />
resource usage summary reports for the vectorAdd kernel compilation, shown in<br />
Example 2, invoke the aoc -c vectorAdd.cl --report command.<br />
Example 2. Output of the aoc -c vectorAdd.cl --report Command<br />
aoc: Selected target board pcie385n_a7<br />
Kernel throughput analysis for : vectorAdd<br />
.. simd work items : 1<br />
.. compute units : 1<br />
.. throughput due to control flow analysis : 250.00 M work items/second<br />
.. kernel global memory bandwidth analysis : 3157.89 MB/second<br />
.. kernel number of local memory banks : none<br />
+--------------------------------------------------------------------+<br />
; Estimated Resource Usage Summary ;<br />
+----------------------------------------+---------------------------+<br />
; Resource + Usage ;<br />
+----------------------------------------+---------------------------+<br />
; Logic utilization ; 17% ;<br />
; Dedicated logic registers ; 7% ;<br />
; Memory blocks ; 16% ;<br />
; DSP blocks ; 0% ;<br />
+----------------------------------------+---------------------------;<br />
aoc: First stage compile time = 10s<br />
1 You can also view detailed information on the compilation by accessing the log file.<br />
The .log file is located in the folder or<br />
subdirectory.<br />
Altera Offline Compiler Options<br />
You can type the command aoc --help to obtain more information about the<br />
command options for the kernel compiler, as summarized in Table 3.<br />
Table 3. AOC Command Options (Part 1 of 3)<br />
Option Description<br />
Help options:<br />
--help or -h Displays a list of aoc command options.<br />
--version Prints out version information and exits.<br />
-v Reports the progress of compilation.<br />
--report Generates reports for kernel throughput analysis and estimated resource<br />
usage summary.<br />
Overall options:<br />
-c Sets up a Quartus II hardware design project without performing a<br />
hardware build.<br />
-o Uses as the name for the output.<br />
For example:<br />
To specify the filename of the output .aoco file, type the command:<br />
aoc -c -o .aoco myKernel.cl<br />
To specify the filename of the output .aocx file, type the command:<br />
aoc -o .aocx myKernel.cl, or<br />
aoc -o .aocx myKernel.aoco<br />
May 2013 Altera Corporation
Using the Altera Offline Kernel Compiler Page 7<br />
May 2013 Altera Corporation<br />
Table 3. AOC Command Options (Part 2 of 3)<br />
Option Description<br />
-I Adds to header search path.<br />
-D Defines a macro called , where can be a value or an actual<br />
name.<br />
For example, aoc myKernel.cl -D RADIUS = 4 is equivalent to<br />
adding a #define RADIUS 4 line at the top of your kernel<br />
source file.<br />
-W Suppresses warnings.<br />
-Werror Forces all warnings into errors.<br />
Modifiers:<br />
--board Runs compilation for the specified board.<br />
For example, the command<br />
aoc -- board pcie385n_a7 myKernel.cl compiles<br />
myKernel for the Nallatech PCIe-385N A7 FPGA computing card.<br />
--list-boards Prints a list of available boards and exits.<br />
Optimization controls:<br />
-fp-relaxed Directs the compiler to optimize floating-point operations using balanced<br />
trees.<br />
If you want to use this optimization option, your program must be able to<br />
tolerate small differences in floating-point results.<br />
Example usage: aoc -fp-relaxed=true myKernel.cl<br />
-fpc Directs the compiler to remove floating-point rounding operations and<br />
conversions when possible and to carry additional bits to maintain<br />
precision.<br />
If possible, this argument causes the compiler to round a floating-point<br />
operation once at the end of the balanced tree.<br />
Example usage: aoc -fpc=true myKernel.cl<br />
--sw-dimm-partition Configures the global memory as separate address spaces for each external<br />
memory or bank.<br />
After you configure the global memory, allocate each buffer in an external<br />
memory with the Altera specific cl_mem_flags (For example,<br />
CL_MEM_BANK_2_ALTERA).<br />
--const-cache-bytes Configures the constant cache size in bytes, rounded to the next nearest<br />
power of 2.<br />
This argument has no effect if none of the kernels uses the __constant<br />
address space. The default constant cache size is 16 kilobytes (kB).<br />
-O3 Applies resource-driven performance optimizations to improve<br />
performance without violations fitting requirements (estimated logic<br />
utilization is less than or equal to 85%).<br />
--util Overrides the logic utilization threshold used by the -O3 option with %.<br />
Altera SDK for OpenCL<br />
Programming Guide
Page 8 Using the Altera Offline Kernel Compiler<br />
Altera SDK for OpenCL<br />
Programming Guide<br />
Table 3. AOC Command Options (Part 3 of 3)<br />
OpenCL Build Options:<br />
-cl-single-precisionconstant<br />
-cl-denorms-are-zero<br />
-cl-opt-disable<br />
-cl-strict-aliasing<br />
-cl-mad-enable<br />
-cl-no-signed-zeros<br />
-cl-unsafe-mathoptimizations<br />
-cl-finite-math-only<br />
-cl-fast-relaxed-math<br />
Altera SDK for OpenCL Utility<br />
Option Description<br />
Refer to the OpenCL Specification version 1.0, section 5.4.3.<br />
The Altera SDK for OpenCL (AOCL) utility provides additional functionality that<br />
allows you to perform high-level tasks. You can obtain information on your host<br />
program such as flags and makefile fragments. You can invoke the <strong>aocl</strong> help<br />
command to obtain more information about the command options, as summarized in<br />
Table 4.<br />
Table 4. AOCL Command Options<br />
Option<br />
Subcommands for building your host program:<br />
Description<br />
<strong>aocl</strong> example-makefile Displays makefile fragments for compiling and linking a host program.<br />
<strong>aocl</strong> makefile Same as the example-makefile subcommand.<br />
<strong>aocl</strong> compile-config Displays the flags for compiling your host program.<br />
Shows the flags for linking your host program with the runtime libraries provided<br />
<strong>aocl</strong> link-config<br />
by the Altera SDK for OpenCL.<br />
This command combines the function of the ldflags and ldlibs<br />
subcomands.<br />
<strong>aocl</strong> linkflags Same as the link-config subcommand.<br />
Displays the linker flags used to link your host program to the host runtime<br />
<strong>aocl</strong> ldflags<br />
libraries provided by the Altera SDK for OpenCL. This command does not list the<br />
libraries themselves.<br />
<strong>aocl</strong> ldlibs Displays the list of host runtime libraries provided by the Altera SDK for OpenCL.<br />
<strong>aocl</strong> program<br />
Configures a new FPGA image onto the board.<br />
Usage: <strong>aocl</strong> program .aocx.<br />
<strong>aocl</strong> flash<br />
[If supported] Initializes the FPGA with a specified startup configuration.<br />
Usage: <strong>aocl</strong> flash .aocx.<br />
<strong>aocl</strong> install Installs your FPGA board into the current host system.<br />
<strong>aocl</strong> diagnostic Runs the board vendor’s test program for your FPGA board.<br />
May 2013 Altera Corporation
Compiling and Linking Your Host Program Page 9<br />
Compiling and Linking Your Host Program<br />
May 2013 Altera Corporation<br />
Table 4. AOCL Command Options (Continued)<br />
General:<br />
<strong>aocl</strong> version Displays the version information.<br />
<strong>aocl</strong> help Displays a list of <strong>aocl</strong> options.<br />
<strong>aocl</strong> help Displays the help content for a particular command.<br />
In an OpenCL application, the host program uses standard OpenCL runtime APIs to<br />
manage device configuration, data buffers, kernel launches, and event<br />
synchronization. The host program also contains host-side functions such as file I/O<br />
or portions of the source code that do not run on an accelerator device. A host<br />
program includes C header files that describe the OpenCL APIs. These C header files<br />
must link against the runtime API libraries.<br />
Host Program Compilation Settings<br />
To compile your host program, type <strong>aocl</strong> compile-config at the command line to<br />
obtain the following path you must add to your C preprocessor to include a file search<br />
path:<br />
■ For Windows systems, add the path -I/%ALTERAOCLSDKROOT%/host/include<br />
■ For Linux systems, add the path -I/$ALTERAOCLSDKROOT/host/include<br />
The OpenCL API header files reside in this directory. You should include the OpenCL<br />
header file CL/opencl.h in your host source.<br />
1 On Windows, you must add the /MD flag to compile the Altera runtime libraries. The<br />
flag links the runtime libraries against the multi-threaded DLL version of the<br />
Microsoft C Runtime library. You must also compile your host program with the /MD<br />
compilation flag, or use the /NODEFAULTLIB linker option to override the selection of<br />
runtime library.<br />
Library Paths and Links<br />
Your host program must link against the appropriate runtime libraries provided by<br />
the SDK. To obtain the link options, type the command <strong>aocl</strong> link-config.<br />
Programming the FPGA Hardware<br />
Option Description<br />
By default, the AOC compiles the OpenCL kernels, and then the kernels are loaded<br />
onto the FPGA via PCIe CvP using clCreateProgramWithBinary, as outlined in<br />
“Loading Kernels onto an FPGA using clCreateProgramWithBinary” on page 13.<br />
c You must ensure that the PCIe device inside your FPGA board is configured properly<br />
prior to using clCreateProgramWithBinary to program your FPGA hardware. To<br />
verify the successful instantiation of the PCIe device, type the command<br />
<strong>aocl</strong> diagnostic.<br />
Altera SDK for OpenCL<br />
Programming Guide
Page 10 Running Your OpenCL Application<br />
Altera SDK for OpenCL<br />
Programming Guide<br />
1 If the PCIe device on your FPGA board is not configured properly, refer to<br />
“Programming the Flash Memory of an FPGA” in “Appendix A” for<br />
<strong>programming</strong> instructions.<br />
If you want to program your FPGA manually without using the host runtime API, via<br />
a non-standard use of clCreateProgramWithBinary, refer to “Programming the FPGA<br />
with the <strong>aocl</strong> program Command” in “Appendix A”.<br />
Running Your OpenCL Application<br />
After you load the FPGA hardware configuration file onto the FPGA, you can run the<br />
host application as outlined in the Altera SDK for OpenCL Getting Started Guide.<br />
OpenCL Programming Considerations<br />
When you create your kernel and host applications, keep in mind the following<br />
<strong>programming</strong> considerations:<br />
■ For OpenCL kernels, refer to “Kernel Programming Considerations”<br />
■ For host code, refer to “Host Programming Considerations”<br />
Kernel Programming Considerations<br />
Specifying a Target FPGA Board<br />
You can target any FPGA board that is provided in a board vendor-supplied “board<br />
package.” To select the board package that the compiler uses, set the<br />
AOCL_BOARD_PACKAGE_ROOT environment variable to the path of your board vendor's<br />
board package (that is, T/board/).<br />
For example, if you run the SDK on Linux, set the board package environment<br />
variable as shown below:<br />
■ To select the Nallatech PCIe-385N A7 FPGA computing card, set<br />
AOCL_BOARD_PACKAGE_ROOT to $ALTERAOCLSDKROOT/board/pcie385n<br />
■ To select the BittWare S5-PCIe-HQ Altera Stratix V GSMD5 half-length PCIe<br />
board, set AOCL_BOARD_PACKAGE_ROOT to $ALTERAOCLSDKROOT/board/s5pciehq<br />
Type aoc --list-boards to list the available boards within your current board<br />
package. You can then target any of the listed boards within that board package using<br />
the aoc --board command.<br />
File Scope __constant Address Space Qualifier<br />
The Altera SDK for OpenCL supports scalar file scope constant data. For example,<br />
you may set the __constant address space qualifier as follows:<br />
__constant int my_array[8] = {0x0, 0x1, 0x2, 0x3, 0x4, 0x5, 0x6, 0x7};<br />
__kernel void my_kernel (__global int * my_buffer)<br />
{<br />
size_t index = get_global_id(0);<br />
my_buffer[index] += my_array[index % 8];<br />
}<br />
May 2013 Altera Corporation
OpenCL Programming Considerations Page 11<br />
May 2013 Altera Corporation<br />
In this case, values for my_array are set in a ROM because the file scope constant data<br />
does not change between kernel invocations. To avoid long compilation times, file<br />
scope constant data should not exceed 32 kB in size.<br />
c Vector type file scope constant data is not supported. For example, the following line<br />
of code is illegal because the vector type int2 is not supported:<br />
__constant int2 my_array[4] = {(0x0, 0x1), (0x2, 0x3), (0x4, 0x5), (0x6,<br />
0x7)};<br />
Modifying Your OpenCL Kernel to Target an FPGA<br />
To modify a kernel that was originally written to target a different architecture,<br />
perform the following steps:<br />
1. Collect the source code from all your kernels into a single file. The name of the<br />
single file must have a .cl extension. For example, collect all the source code from<br />
all your kernel source code files into a single file and name that single file<br />
mykernels.cl.<br />
2. If your kernel uses barriers and your kernel must handle work-groups larger than<br />
256 work-items, specify the number of work-items you require with either<br />
reqd_work_group_size or max_work_group_size. (The default<br />
max_work_group_size is 256.)<br />
3. Eliminate function scope __constant variables. Replace them with a pointer to a<br />
__constant parameter. In your host program, allocate a cl_mem object in global<br />
memory, copy the constant data to global memory before the kernel starts, and<br />
pass the cl_mem object as a parameter to the kernel.<br />
If a constant variable is a complex type, Altera recommends that, for simplicity,<br />
you use a typedef argument. For example:<br />
Example 3. Eliminating Function Scope __constant Variables<br />
If your source code includes this: Rewrite it to this:<br />
__kernel void original( _global int *A)<br />
{<br />
__constant int Payoff[2][2] =<br />
{{ 1, 3}, {5, 3}};<br />
*A = Payoff[1][2];<br />
// and so on<br />
}<br />
typedef int Payoff_type[2][2];<br />
__kernel void modified( _global int *A,<br />
__constant Payoff_type *PayoffPtr )<br />
{<br />
*A = (PayoffPtr)[1][2];<br />
// and so on<br />
}<br />
1 Use the same type definition in both your host program and your kernel.<br />
Altera SDK for OpenCL<br />
Programming Guide
Page 12 OpenCL Programming Considerations<br />
Altera SDK for OpenCL<br />
Programming Guide<br />
4. Convert each structure parameter to a pointer that points to a structure. For<br />
example:<br />
Example 4. Converting Structure Parameters to Pointers that Point to Structures<br />
If your source code includes this: Rewrite it to this:<br />
struct Context<br />
struct Context<br />
{<br />
{<br />
float param1;<br />
float param1;<br />
float param2;<br />
float param2;<br />
int param3;<br />
int param3;<br />
uint param4;<br />
uint param4;<br />
};<br />
};<br />
__kernel void algorithm(<br />
__global float* A,<br />
struct Context c)<br />
{<br />
if ( c.param3 )<br />
{<br />
// and so on<br />
}<br />
}<br />
1 The __global struct declaration creates a new buffer to store the structure.<br />
Use a restrict qualifier to declare the pointer to the structure to prevent<br />
pointer aliasing.<br />
f For more information about optimizing your OpenCL kernel code to target an FPGA<br />
refer to the Altera SDK for OpenCL Optimization Guide.<br />
Host Programming Considerations<br />
Host Binary Requirement<br />
You must build 64-bit host binaries from your host code.<br />
Host Machine Memory Requirements<br />
The machine that runs the host program must have enough RAM to support all of the<br />
following components simultaneously:<br />
■ The host program and operating system.<br />
■ The working set for the host program.<br />
__kernel void algorithm(<br />
__global float* A,<br />
__global struct Context* restrict c)<br />
{<br />
if ( c->param3)<br />
{ // Dereference through a pointer<br />
// and so on<br />
}<br />
}<br />
■ The maximum amount of OpenCL memory buffers that can be allocated at once.<br />
Every device-side cl_mem buffer is associated with a corresponding storage area in<br />
the host process. Therefore, the amount of RAM required can be as large as the<br />
amount of RAM supported by the FPGA.<br />
May 2013 Altera Corporation
OpenCL Programming Considerations Page 13<br />
May 2013 Altera Corporation<br />
Loading Kernels onto an FPGA using clCreateProgramWithBinary<br />
The Altera SDK for OpenCL provides a kernel compiler that builds the .cl file for an<br />
OpenCL application. Kernels are compiled separately by the workstation offline and<br />
then loaded onto the FPGA using clCreateProgramWithBinary, shown in Example 5.<br />
Example 5. Host Code Showing Usage of clCreateProgramWithBinary<br />
size_t lengths[1];<br />
unsigned char* binaries[1] ={NULL};<br />
cl_int status[1];<br />
cl_int error;<br />
cl_program program;<br />
const char options[] = "";<br />
FILE *fp = fopen("program.aocx","rb");<br />
fseek(fp,0,SEEK_END);<br />
lengths[0] =ftell(fp);<br />
binaries[0]= (unsigned char*)malloc(sizeof(unsigned char)*lengths[0]);<br />
rewind(fp);<br />
fread(binaries[0],lengths[0],1,fp);<br />
fclose(fp);<br />
program = clCreateProgramWithBinary(context,1,device_list,lengths,<br />
(const unsigned char **)binaries,status,&error);<br />
clBuildProgram(program,1,device_list,options,NULL,NULL);<br />
After the .aocx file is built, you may use it to create the OpenCL program. The<br />
clCreateProgramWithBinary function uses the .aocx file to create a program for the<br />
targeted FPGA, and then host runtime programs the FPGA via clBuildProgram. You<br />
can load multiple FPGA programs into memory, and host runtime will reprogram the<br />
FPGA to match the program in memory.<br />
Aligned Memory Allocation<br />
You should use aligned memory allocations when assigning a host-side buffer to be<br />
written or read from the FPGA. This allows direct memory access (DMA) transfers to<br />
occur to and from the FPGA, which is a more efficient form of memory movement.<br />
Altera recommends that you use 64-byte alignment. To set up aligned memory<br />
allocations, add the source code in Example 6.<br />
Example 6. Source Code for Enabling Aligned Memory Allocations<br />
For Windows:<br />
#define AOCL_ALIGNMENT 64<br />
#include <br />
void *ptr = _aligned_malloc (size, AOCL_ALIGNMENT);<br />
For Linux:<br />
#define AOCL_ALIGNMENT 64<br />
#include <br />
void *ptr = NULL;<br />
posix_memalign (&ptr, AOCL_ALIGNMENT, size);<br />
Altera SDK for OpenCL<br />
Programming Guide
Page 14 Appendix A<br />
Appendix A<br />
Altera SDK for OpenCL<br />
Programming Guide<br />
Modifying Your Host Program to Target an FPGA<br />
To modify a host program that was originally written to target a different architecture,<br />
perform the following steps:<br />
1. To create cl_program objects, use clCreateProgramWithBinary, shown in<br />
Example 5, instead of clCreateProgramWithSource. You can use either<br />
clCreateKernelsInProgram or clCreateKernel.<br />
2. If you eliminated any function scope __constant variables in your kernel code (by<br />
replacing them with __constant pointers as described in “Modifying Your<br />
OpenCL Kernel to Target an FPGA” on page 11), modify your host program to<br />
create cl_mem objects associated with the pointers, load constant data into them<br />
with clEnqueueWriteBuffer, and pass the objects to the kernel as arguments with<br />
clSetKernelArg.<br />
3. If you converted any structure parameters to pointers-to-constant-structures,<br />
make the following changes to your host program:<br />
a. Allocate a cl_mem buffer to store the structure contents. (You need a separate<br />
cl_mem buffer for every kernel that uses a different structure value.)<br />
b. Set the structure kernel argument with a pointer to the structure buffer, not<br />
with a pointer to the structure contents.<br />
c. Populate the structure buffer contents before enqueuing the kernel. Ensure that<br />
the structure buffer is populated before the kernel launches by either<br />
enqueuing it on the same command queue as the kernel enqueuing, or by<br />
synchronizing separate kernel enqueuing and structure buffer enqueuing with<br />
an event.<br />
d. When your program no longer needs to call a kernel that uses the structure<br />
buffer, release the cl_mem buffer.<br />
Programming the Flash Memory of an FPGA<br />
By default, you configure an FPGA using the image (that is, the hardware<br />
configuration file) stored in the flash memory of the device. When there is no power,<br />
the FPGA retains the hardware configuration file stored in flash memory. When you<br />
power up the system, the hardware configuration file stored in flash memory<br />
configures the FPGA circuitry. Therefore, it is imperative to load an OpenCLcompatible<br />
image into the flash memory to ensure the proper configuration of the<br />
FPGA.<br />
When you load the hardware configuration file into the flash memory of the FPGA, it<br />
is imperative to not prematurely power down the system. Also, you must not launch<br />
any host code that calls OpenCL kernels or might otherwise communicate with the<br />
FPGA board.<br />
May 2013 Altera Corporation
Appendix B Page 15<br />
Appendix B<br />
May 2013 Altera Corporation<br />
Perform the following steps to load your hardware configuration file into the flash<br />
memory of the FPGA:<br />
1. Ensure that the USB-Blaster driver is installed successfully so that you can load<br />
your hardware configuration file into the Flash memory. If your USB-Blaster<br />
driver is not installed properly, refer to the Altera SDK for OpenCL Getting Started<br />
Guide for installation instructions.<br />
2. Generate a .aocx file. Refer to “Compiling the OpenCL Kernel Source File” on<br />
page 4 for more information.<br />
3. Type <strong>aocl</strong> flash .aocx to load the hardware configuration<br />
file into the flash memory of the FPGA.<br />
4. Reboot your host system.<br />
The reason for <strong>programming</strong> the flash memory of your FPGA is that OpenCL kernels<br />
require a PCIe device inside the FPGA so that they can communicate with the host<br />
program. The FPGA boards supported by the Altera SDK for OpenCL version 13.0 are<br />
preprogrammed with a default hardware configuration file that instantiates the PCIe<br />
device inside the FPGA. If the PCIe device in your FPGA board is not configured<br />
properly, you can configure the FPGA using the image stored in the flash memory<br />
such that you can have a functional PCIe device upon rebooting your system.<br />
Programming the FPGA with the <strong>aocl</strong> program Command<br />
If you want to load your hardware configuration file onto the FPGA offline without<br />
using the host runtime API, you may do so by implementing a non-standard use of<br />
clCreateProgramWithBinary in your host code, as shown below:<br />
cl_int status;<br />
cl_program program;<br />
const char *kernel_name = "vectorAdd";<br />
size_t kernel_name_length = 9;<br />
cl_int kernel_status;<br />
program = clCreateProgramWithBinary(context, 1, &device,<br />
&kernel_name_length, (const unsigned char**)&kernel_name,<br />
&kernel_status, &status);<br />
To load the kernel onto the FPGA, perform the following steps:<br />
1. At a command prompt, navigate to the directory that contains the .aocx that<br />
specifies the hardware configuration for the target FPGA.<br />
2. Type <strong>aocl</strong> program .aocx. This command loads the<br />
hardware configuration onto the FPGA.<br />
Allocation Limits of the Altera SDK for OpenCL<br />
The Altera SDK for OpenCL host runtime conforms with the OpenCL platform layer<br />
and runtime API, with the following clarifications and exceptions:<br />
Altera SDK for OpenCL<br />
Programming Guide
Page 16 Appendix B<br />
Altera SDK for OpenCL<br />
Programming Guide<br />
■ Allocation limits—Table 5 lists allocation limits of the SDK.<br />
Table 5. Allocation Limits of the Altera SDK for OpenCL<br />
Item Limit<br />
Maximum number of contexts 4<br />
Maximum number of queues per context 28<br />
Maximum number of program objects per context 20<br />
Maximum number of concurrently running kernels<br />
The total number of<br />
queues<br />
Maximum number of enqueued kernels More than 1000<br />
Maximum number of kernels per FPGA device 32<br />
Maximum number of arguments per kernel 128<br />
Maximum total size of kernel arguments 256 bytes<br />
OpenCL Features Supported by the Altera SDK for OpenCL<br />
The Altera SDK for OpenCL supports the OpenCL Specification version 1.0. The SDK<br />
provides a platform that implements the Embedded Profile (Section 10 of the OpenCL<br />
Specification version 1.0 by the Khronos Group).<br />
Platform Layer and Runtime Implementation<br />
Section 4 of the OpenCL Specification version 1.0 describes the OpenCL platform layer<br />
that allows applications to query OpenCL devices and device configuration<br />
information, and to create OpenCL contexts. Section 5 of the specification describes<br />
the OpenCL runtime API that manages OpenCL objects such as command queues,<br />
memory objects, program objects, kernel objects, and calls that allow you to enqueue<br />
commands.<br />
■ Section 5.2.1: Creating Buffer Objects—Memory objects allocated in host memory<br />
must be aligned to 128-byte boundaries to accommodate the worst-case alignment<br />
of the built-in OpenCL C data types (that is, type long16 because type long16 is<br />
128 bytes (16 x 8 bytes)). When you provide a host memory block to<br />
clCreateBuffer, you must ensure that the buffer is aligned as required.<br />
■ Section 5.2.8: Mapping and Unmapping Memory Objects—The SDK does not present<br />
a coherent shared memory abstraction; host memory and device global memory<br />
are not shared. All memory transfers between the host and the device must be<br />
explicitly managed with the OpenCL host API.<br />
■ Section 5.4.1: Creating Program Objects—The SDK does not provide an online<br />
compiler. Therefore, do not use the clCreateProgramWithSource API. Instead, use<br />
clCreateProgramWithBinary to create a program.<br />
■ Section 5.8: Out-of-order Execution of Kernels and Memory Object Commands—<br />
Command queues do not support out-of-order execution.<br />
May 2013 Altera Corporation
Appendix B Page 17<br />
OpenCL Programming Language Implementation<br />
May 2013 Altera Corporation<br />
Section 6 of the OpenCL Specification version 1.0 describes the OpenCL C <strong>programming</strong><br />
language. OpenCL C is based on C99 with some limitations. The Altera SDK for<br />
OpenCL conforms with the OpenCL C <strong>programming</strong> language with clarifications and<br />
exceptions summarized in Table 6. The following symbols indicate the support<br />
statuses of the features:<br />
■ “v” means that the feature is supported, and there is additional information<br />
available in the Notes column in Table 6.<br />
■ “v*” means that the feature is supported with exceptions and clarifications.<br />
■ “X” means that the feature is not supported.<br />
■ “—” means that an explanation or clarification for the feature is available in the<br />
Notes column Table 6.<br />
■ OpenCL <strong>programming</strong> language implementations that are supported with no<br />
additional clarifications are not shown in Table 6.<br />
Table 6. OpenCL Programming Language Implementation (Part 1 of 5)<br />
Section Feature Supported Notes<br />
6.1.1 Built-in Scalar Data Types<br />
double precision float v*<br />
half X<br />
6.1.2 Built-in Vector Data Types v*<br />
6.1.3 Built-in Data Types X<br />
6.1.4 Reserved Data Types X<br />
6.1.5 Alignment of Types —<br />
6.2.1 Implicit Conversions v<br />
6.2.2 Explicit Casts v<br />
6.5 Address Space Qualifiers v*<br />
6.6 Image Access Qualifiers X<br />
6.7 Function Qualifiers<br />
6.7.2<br />
Optional Attribute<br />
Qualifiers<br />
v<br />
*Preliminary support for all double precision float builtin<br />
scalar data types. This feature might not conform<br />
with the OpenCL Specification version 1.0.<br />
*Preliminary support for vectors with three elements.<br />
Three-element vectors might not conform with the<br />
OpenCL Specification version 1.0.<br />
All scalar and vector types are aligned as required<br />
(vectors with three elements are aligned as if they had<br />
four elements).<br />
Refer to Section 6.2.6: Usual Arithmetic Conversions in<br />
the OpenCL Specification version 1.2 for an important<br />
clarification of implicit conversions between scalar and<br />
vector types.<br />
The SDK allows scalar data casts to a vector with a<br />
different element type.<br />
*Function scope __constant variables are not<br />
supported.<br />
Refer to the Altera SDK for OpenCL Optimization Guide<br />
for tips on using regd_work_group_size to<br />
improve kernel performance.<br />
The SDK parses but ignores vec_type_hint and<br />
work_group_size_hint attribute qualifiers.<br />
Altera SDK for OpenCL<br />
Programming Guide
Page 18 Appendix B<br />
Altera SDK for OpenCL<br />
Programming Guide<br />
Table 6. OpenCL Programming Language Implementation (Part 2 of 5)<br />
Section Feature Supported Notes<br />
6.8<br />
Restrictions—The SDK implements the restrictions listed in the OpenCL Specification version 1.0 as<br />
follows:<br />
pointer assignments<br />
between address spaces<br />
—<br />
pointers to functions X<br />
structure-type kernel<br />
arguments<br />
X<br />
images X<br />
Arguments to __kernel functions declared in a<br />
program that are pointers must be declared with the<br />
__global, __constant, or __local qualifier.<br />
The compiler enforces the OpenCL restriction against<br />
point assignments between address spaces.<br />
The compiler does not enforce the OpenCL restriction<br />
against pointers to functions—you must ensure that<br />
your code contains no pointers to functions.<br />
Convert structure arguments to a pointer to a structure<br />
in global memory.<br />
bit fields X<br />
The compiler does not enforce this restriction—you<br />
must ensure that your code contains no bit fields.<br />
variable length arrays and<br />
structures<br />
X<br />
variadic macros and<br />
functions<br />
X<br />
C99 headers X C99 headers are not available.<br />
extern, static,<br />
auto, and register<br />
storage-class specifiers<br />
predefined identifiers v<br />
recursion X<br />
irreducible control flow X<br />
writes to memory of<br />
built-in types less than<br />
32 bits in size<br />
declaration of arguments<br />
to __kernel functions<br />
of type event_t<br />
elements of a struct or<br />
union belonging to<br />
different address spaces<br />
X<br />
The compiler does not enforce this restriction—you<br />
must ensure that your code contains no extern or<br />
static specifiers; the auto and register<br />
specifiers should have no effect.<br />
Warning: Assigning elements of a struct or union<br />
to different address spaces might cause a fatal error.<br />
Use the -D option of the aoc command to provide<br />
preprocessor symbol definitions in your kernel code.<br />
The compiler does not enforce this restriction—you<br />
must ensure that your code contains no recursive<br />
functions.<br />
Irreducible control flow is invalid. Code with irreducible<br />
control flows might result in a hardware implementation<br />
with unexpected behavior. Generally, irreducible control<br />
flow occurs in loops that have multiple entry points.<br />
v* *Performance of such writes is poor.<br />
X<br />
X<br />
The compiler does not enforce this restriction—you<br />
must ensure that your code does not include<br />
__kernel function arguments of type event_t.<br />
The compiler does not enforce this restriction—you<br />
must ensure that all elements of a struct or union<br />
belong to the same address space.<br />
May 2013 Altera Corporation
Appendix B Page 19<br />
May 2013 Altera Corporation<br />
Table 6. OpenCL Programming Language Implementation (Part 3 of 5)<br />
Section Feature Supported Notes<br />
6.9 Preprocessor Directives and Macros<br />
#pragma directive:<br />
#pragma unroll<br />
v<br />
The compiler supports only #pragma unroll. The<br />
unroll directive may optionally take an integer as an<br />
argument to control the unroll factor.<br />
For example, #pragma unroll 4 unrolls a loop<br />
with four iterations.<br />
By default, an unroll directive causes the compiler to<br />
attempt to fully unroll the loop.<br />
Refer to the Altera SDK for OpenCL Optimization Guide<br />
for tips on using #pragma unroll to improve<br />
kernel performance.<br />
__ENDIAN_LITTLE__<br />
defined to be value 1<br />
v The target FPGA is little-endian.<br />
__IMAGE_SUPPORT__ X<br />
__IMAGE_SUPPORT__ is undefined; the SDK does<br />
not support images.<br />
6.10 Attribute Qualifiers—The compiler parses attribute qualifiers as follows:<br />
Specifying Attributes of<br />
6.10.2 Functions—Structure-type<br />
kernel arguments<br />
6.10.3 Specifying Attributes of Variables<br />
6.10.4<br />
6.10.5<br />
endian X<br />
Specifying Attributes of<br />
Blocks and<br />
Control-Flow-Statements<br />
Extending Attribute<br />
Qualifiers<br />
6.11.2 Math Functions<br />
X<br />
X<br />
—<br />
built-in math functions v*<br />
built-in half_ and<br />
native_ math functions<br />
v*<br />
6.11.5 Geometric Functions v<br />
6.11.8<br />
Image Read and Write<br />
Functions<br />
X<br />
Convert structure arguments to a pointer to a structure<br />
in global memory.<br />
The SDK for OpenCL compiler can parse attributes on<br />
various syntactic structures. It reserves some attribute<br />
names for its own internal use. Refer to the Altera SDK<br />
for OpenCL Optimization Guide for more information<br />
about these attribute names.<br />
*Built-in math functions for double precision float are<br />
supported preliminarily. These functions might not<br />
conform with OpenCl Specification version 1.0.<br />
*Built-in half_ and native_ math functions for<br />
double precision float are supported preliminarily.<br />
These functions might not conform with OpenCl<br />
Specification version 1.0.<br />
*Built-in gemometric functions for double precision<br />
float are supported preliminarily. These functions might<br />
not conform with OpenCl Specification version 1.0.<br />
Refer to Table 7 on page 21 for a list of built-in<br />
geometric functions supported by the SDK.<br />
Altera SDK for OpenCL<br />
Programming Guide
Page 20 Appendix B<br />
Altera SDK for OpenCL<br />
Programming Guide<br />
Table 6. OpenCL Programming Language Implementation (Part 4 of 5)<br />
Section Feature Supported Notes<br />
6.11.9<br />
6.11.10<br />
6.11.11<br />
Synchronization Functions—<br />
the barrier synchronization<br />
function<br />
Explicit Memory Fence<br />
Functions<br />
Async Copies from Global to<br />
Local Memory, Local to<br />
Global Memory, and Prefetch<br />
v*<br />
X<br />
v*<br />
*Clarifications and exceptions:<br />
If a kernel specifies regd_work_group_size or<br />
max_work_group_size attributes, barrier<br />
supports the corresponding number of work-items.<br />
If neither attribute is specified, a barrier is instantiated<br />
with a default limit of 256 work-items.<br />
The work-item limit is the maximum supported workgroup<br />
size for the kernel; this limit is enforced by the<br />
runtime.<br />
*The implementation is naive:<br />
Work-item (0,0,0) performs the copy and the<br />
wait_group_events is implemented as a barrier.<br />
If a kernel specifies the reqd_work_group_size<br />
or max_work_group_size attribute,<br />
wait_group_events supports the corresponding<br />
number of work-items.<br />
If neither attribute is specified,<br />
wait_group_events is instantiated with a default<br />
limit of 256 work-items.<br />
Additional built-in vector functions from the OpenCL Specification version 1.2 Section 6.12.12: Miscellaneous Vector<br />
Functions:<br />
vec_step v<br />
shuffle v<br />
shuffle2 v<br />
May 2013 Altera Corporation
Appendix B Page 21<br />
May 2013 Altera Corporation<br />
Table 6. OpenCL Programming Language Implementation (Part 5 of 5)<br />
Section Feature Supported Notes<br />
OpenCL Specification version 1.2 Section<br />
6.12.13: printf<br />
v*<br />
*Preliminary support. This feature might not conform<br />
with the OpenCL Specification version 1.0 See below for<br />
details.<br />
The printf function in OpenCL has syntax and features similar to the printf function in C99, with a few<br />
exceptions. For details, refer to the OpenCL Specification version 1.2.<br />
To use a printf function, there are no requirements for special compilation steps, buffers, or flags; you can<br />
compile kernels that include printf instructions with the usual aoc command.<br />
During kernel execution, printf data is stored in a global printf buffer that is automatically allocated by the<br />
compiler. The size of this buffer is 64 kB; the total size of data arguments to a printf call should not exceed this<br />
size. When kernel execution completes, the contents of the printf buffer are printed to standard output.<br />
Buffer overflows are handled seamlessly; printf instructions can be executed an unlimited number of times.<br />
However, if the printf buffer overflows, kernel pipeline execution stalls until the host reads the buffer and prints<br />
the buffer contents.<br />
Because printf functions store their data into a global memory buffer, the performance of your kernel will drop if it<br />
includes such functions.<br />
There are no usage limitations on printf functions. You can use printf instructions inside if-then-else<br />
statements, loops, etc. A kernel can contain multiple printf instructions that are executed by multiple work-items.<br />
Format string arguments and literal string arguments of printf calls are transferred to the host system from the<br />
FPGA using a special memory region. This memory region can overflow if the total size of the printf string<br />
arguments is large (3000 characters or less is usually safe in a typical OpenCL application). An overflow causes the<br />
following error message to be printed during the host program execution:<br />
cannot parse auto-discovery string at byte offset 4096<br />
Output from printf is never intermixed, even though work-items may execute printf functions<br />
concurrently; however, the order of concurrent printf execution is not guaranteed. In other words, printf<br />
outputs may not appear in program order if the printf instructions are in concurrent data paths<br />
Table 7. Supported Scalar and Vector Argument Built-in Geometric Functions<br />
Argument Type<br />
Function<br />
float double<br />
cross v v<br />
dot v v<br />
distance v v<br />
length v v<br />
normalize v v<br />
fast_distance v —<br />
fast_length v —<br />
fast_normalize v —<br />
Altera SDK for OpenCL<br />
Programming Guide
Page 22 Appendix B<br />
Altera SDK for OpenCL<br />
Programming Guide<br />
Numerical Compliance Implementation<br />
Section 7 of the OpenCL Specification version 1.0 describes features of the C99 and IEEE<br />
754 standards that must be supported by OpenCL compliant devices. The Altera SDK<br />
for OpenCL operates on 32-bit and 64-bit floating-point values in IEEE Standard 754-<br />
2008 format, but not all floating-point operators have been implemented, as<br />
summarized in Table 8.<br />
Table 8. Numerical Compliance Implementation<br />
Section Feature Supported Notes<br />
7.1 Rounding Modes v*<br />
7.2<br />
Image Addressing and Filtering Implementation<br />
Image addressing and filtering are not supported. The SDK does not support images.<br />
Optional Extensions<br />
INF, NaN and Denormalized<br />
Numbers<br />
v*<br />
Conversions between integer and single and half<br />
precision floating-point types support all rounding<br />
modes.<br />
*Conversions between integer and double precision<br />
floating-point types support all rounding modes on a<br />
preliminary basis. This feature might not conform with<br />
the OpenCL Specification version 1.0.<br />
INF and NaN results for single precision operations are<br />
generated in a manner that conforms with the OpenCL<br />
Specifications version 1.0. Most operations that handle<br />
denormalized numbers are flushed prior to and after a<br />
floating-point operation.<br />
*Double precision floating-point operation is supported<br />
on a preliminary basis. This feature might not conform<br />
with the OpenCL Specifications version 1.0.<br />
7.3 Floating-Point Exceptions X The SDK discards floating-point exceptions.<br />
7.4 Relative Error as ULPs v*<br />
7.5 Edge Case Behavior v<br />
Single precision floating-point operations conform with<br />
the numerical accuracy requirements for an embedded<br />
profile of the OpenCL Specification version 1.0.<br />
*Double precision floating-point operation support is<br />
preliminary. This feature might not conform with the<br />
OpenCL Specification version 1.0.<br />
Section 9 of the OpenCL Specification version 1.0 describes a list of optional features that<br />
may be supported by some OpenCL implementations.<br />
■ Section 9.5: Atomic Functions for 32-bit integers—The SDK offers preliminary<br />
support for all 32-bit global and local memory atomic functions described in<br />
Section 6.12.11: Atomic Functions of the OpenCL Specification version 1.2. This feature<br />
might not conform with the OpenCL Specification version 1.0.<br />
1 The implementation of atomic functions may lower the performance of<br />
your design. The constraint on performance (F max) increases if more than<br />
one type of atomic functions are used in the kernel.<br />
May 2013 Altera Corporation
Document Revision History Page 23<br />
Embedded Profile Implementation<br />
May 2013 Altera Corporation<br />
Section 10 of the OpenCL Specification version 1.0 describes the OpenCL embedded<br />
profile. The Altera SDK for OpenCL conforms with the OpenCL embedded profile<br />
with clarifications and exceptions summarized in Table 9.<br />
Multiple Host Threads<br />
Document Revision History<br />
Table 10. Document Revision History<br />
Table 9. Clarifications and Exceptions to OpenCL Embedded Profile<br />
Clause Feature Supported Notes<br />
1 64-bit integers v<br />
2 3D images X The SDK does not support images.<br />
3<br />
Create 2D and 3D images with<br />
image_channel_data_type<br />
values<br />
X The SDK does not support images.<br />
4 Samplers X<br />
5 Rounding modes —<br />
6<br />
Restrictions listed for single<br />
precision basic floating-point<br />
operations<br />
7 half type X<br />
8<br />
Error bounds listed for conversions<br />
from CL_UNORM_INT8,<br />
CL_SNORM_INT8,<br />
CL_UNORM_INT16 and<br />
CL_SNORM_INT16 to float<br />
Appendix A.2 Multiple Host Threads of the OpenCL Specification version 1.0—The<br />
Altera SDK for OpenCL host library is not thread safe.<br />
Table 10 lists the revision history for this document.<br />
X<br />
X<br />
This clause of the OpenCL Specification version 1.0<br />
does not apply to the SDK.<br />
Refer to Table 5 on page 16 for a list of allocation<br />
limits.<br />
Date Version Changes<br />
May 2013 13.0.0<br />
Updated compilation flow.<br />
Updated kernel compiler commands.<br />
Included Altera SDK for OpenCL Utility commands.<br />
Inserted a OpenCL Programming Considerations section.<br />
Updated flash <strong>programming</strong> procedure and moved it to Appendix A.<br />
Included a new clCreateProgramWithBinary FPGA hardware <strong>programming</strong> flow; moved the old<br />
clCreateProgramWithBinary hardware <strong>programming</strong> flow to Appendix A.<br />
Moved updated information on allocation limits and OpenCL language support to Appendix B.<br />
November 2012 12.1.0 Initial release.<br />
Altera SDK for OpenCL<br />
Programming Guide
Page 24 Document Revision History<br />
Altera SDK for OpenCL<br />
Programming Guide<br />
May 2013 Altera Corporation