The GPU Computing Revolution - London Mathematical Society

A KNOWLEDGE TRANSFER REPORT FROM THE LMS AND THE KTN FOR INDUSTRIAL MATHEMATICS

Processing Elements (which it calls ‘threads’) and Compute Units (which it calls ‘thread blocks’), but generally it is considered to be fairly straightforward to port code between CUDA C and similar programming models, such as OpenCL.

CUDA threads execute independently and thus ideally on independent data; this is why data parallelism is such a natural fit for these kinds of architectures. Threads in a thread block are grouped to execute essentially the same program at the same time, but on different pieces of data. This data-parallel approach is known as Single Instruction Multiple Data (SIMD) [38]. On the other hand, different thread blocks may execute completely different programs from one another if the need arises, although most applications tend to run the same program on all the thread blocks at the same time, essentially turning the computation into one large SIMD or vector calculation.
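As a concrete illustration of this data-parallel style, here is a minimal CUDA C sketch (not taken from the report; the kernel name vector_add and its arguments are hypothetical) that adds two arrays element by element. Every thread in every thread block runs the same kernel, and each thread uses its block and thread indices to select the single element it is responsible for.

```
// Minimal sketch of a data-parallel CUDA kernel: each thread handles one
// array element, so the grid as a whole performs the addition in SIMD style.
__global__ void vector_add(const float *a, const float *b, float *c, int n)
{
    // Global index of this thread: its position within its thread block,
    // offset by the position of that block within the grid.
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    if (i < n)          // guard against the final, partially filled block
        c[i] = a[i] + b[i];
}
```

A launch such as vector_add<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n) would request enough 256-thread blocks to cover all n elements; the same kernel runs unchanged with other block sizes, which is one reason this style maps well onto GPUs with different numbers of Compute Units.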
CUDA’s maturity brings a number of benefits for software developers, including a growing range of software development tools such as debuggers and profilers [97]. In 2009 CUDA Fortran arrived, a joint development between NVIDIA and the Portland Group [95]. CUDA Fortran takes the principles of CUDA C and weaves them into a state-of-the-art commercial Fortran 2003 compiler.

OpenCL

OpenCL bears many similarities to CUDA; indeed, NVIDIA is one of the main contributors to the OpenCL standard, so this should be no surprise. The biggest differences lie in the way OpenCL is being developed. Whereas CUDA is a proprietary solution driven by a single vendor, OpenCL is an open standard, instigated by Apple but now driven by a consortium of over 35 companies, including all the major processor vendors such as Intel, IBM and AMD. The consortium is organised and run by the Khronos Group [67].

OpenCL is a much more recent development than CUDA and is correspondingly less mature. However, OpenCL also includes a number of more recent advances for supporting heterogeneous computing in systems combining multiple CPUs and GPUs. The first version of the OpenCL standard was released in December 2008 [66], and OpenCL has been developing rapidly since then. It has already been integrated into recent versions of Apple’s OS X operating system. AMD and NVIDIA have released implementations for their GPUs, the former also including a version that will run on a multi-core x86 host CPU. IBM has demonstrated a version of OpenCL running on its Cell processor and recently released a version for its high-end POWER architecture [56]. Intel released its first OpenCL implementation for its multi-core x86 CPUs in late 2010 [27, 60]. Embedded processor companies are also developing their own OpenCL solutions, including ARM [11, 99], Imagination Technologies [57] and ZiiLabs [122]. These last three companies provide the CPUs and GPUs in most of the popular consumer electronics gadgets, such as smartphones and portable MP3 players.

While OpenCL is less mature than NVIDIA’s CUDA and has some of the drawbacks of committee-designed standards, its benefits are the openness of the standard, the vast resources being ploughed into its development by many companies and, most importantly, its cross-platform capabilities.

OpenCL is quite a low-level solution. It exposes features that many software developers may not have had to deal with before. CUDA has similar features, but includes a higher-level application programmer interface (API) that conveniently handles much of the low-level detail; in OpenCL this is all left to the programmer. One example is the explicit use of queues for sending commands such as ‘run this kernel’ from the host processor to the many-core GPU (the sketch at the end of this section shows how CUDA hides this detail). It is expected that as OpenCL matures, various solutions will emerge to abstract away this lower-level detail, leaving most programmers to operate at a higher level. An interface providing this facility for C++ programs has already been developed.

One of OpenCL’s other important characteristics is that it has been designed to support heterogeneous computing from Day One; that is, it supports running code simultaneously on multiple, different kinds of processors, all within a single OpenCL program. When re-engineering software this is an important consideration: adopting a programming environment that supports a wide range of heterogeneous parallel hardware will give developers the greatest flexibility when deploying their re-engineered codes in the future. For example, an OpenCL program could decide to run one task on one of the host processor cores while running another task on a many-core GPU, and do all of this in parallel. These multiple OpenCL tasks can easily coordinate between themselves, passing data and signals from one to the other. Because almost all processors will
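To make the contrast with OpenCL’s explicit command queues concrete, here is a hedged host-side sketch in CUDA C (again not from the report; the array names, sizes and initial values are invented) that launches the vector_add kernel shown earlier. The memory copies and the kernel launch are simply issued to CUDA’s implicit default stream, whereas an OpenCL program would first create a platform, context and command queue and then enqueue each of these operations on that queue itself.

```
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

// The data-parallel kernel from the earlier sketch, repeated here so that
// this example compiles on its own.
__global__ void vector_add(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main(void)
{
    const int n = 1 << 20;                  // one million elements (arbitrary)
    const size_t bytes = n * sizeof(float);

    // Host-side input and output arrays.
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // Device-side buffers.
    float *d_a, *d_b, *d_c;
    cudaMalloc((void **)&d_a, bytes);
    cudaMalloc((void **)&d_b, bytes);
    cudaMalloc((void **)&d_c, bytes);

    // Each call below is queued on the implicit default stream: the runtime
    // sends the 'copy this buffer' and 'run this kernel' commands to the GPU
    // without the programmer creating or managing a queue object, which is
    // the part OpenCL leaves to the programmer.
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);
    vector_add<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);   // blocks until the queued work has finished

    printf("c[0] = %f\n", h_c[0]);          // expect 3.000000

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```

In OpenCL the equivalent sequence would involve clCreateCommandQueue, clEnqueueWriteBuffer, clEnqueueNDRangeKernel and clEnqueueReadBuffer, all operating on an explicitly created command queue.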
