THE GPU COMPUTING REVOLUTION
From Multi-Core CPUs to Many-Core Graphics Processors

Current Challenges

Like all breakthroughs in technology, the change from multi-core to many-core computer architectures will not be smooth for everyone. There are significant challenges during the transition, some of which are outlined below.

1. Porting code to massively parallel heterogeneous systems is often (but not always) harder than ports to new hardware have been in the past. Completely new algorithms are often required.

2. Many-core technologies are still relatively new, with implications for the maturity of software tools, the shortage of software developers with the right skills and experience, and the paucity of ported application software and libraries.

3. Even though cross-platform programming languages such as OpenCL are now emerging, these have so far focused on source-code portability and cannot guarantee performance portability. This is of course not a new issue; any highly optimised code written in a mainstream language such as C, C++ or Fortran has performance-portability issues between different architectures. However, the differences between GPU architectures are even greater than those between CPU architectures, so performance portability is set to become a greater challenge in the future.

4. There are multiple competing open and de facto standards, which inevitably confuses the situation for potential adopters.

5. Many current GPGPU products still carry some of their consumer-graphics heritage, including the lack of important hardware reliability features such as Error Correcting Codes (ECC) on their memories. Even where these features do exist, they currently incur prohibitive performance penalties that are not present in the corresponding CPU solutions.

6. There is a lot of hype around GPU computing, with many over-inflated claims of performance speedups of 100 times or more.
These claims increase the risk of setting expectations too high, with subsequent disappointment from trial projects.

7. The lack of industry-standard benchmarks makes it difficult for users to compare competing many-core products simply and accurately.

Of all these challenges, the most fundamental is the design and development of new algorithms that will naturally lend themselves to the massive parallelism of GPUs today, and to the ubiquitous heterogeneous multi-/many-core systems of tomorrow. If as a community we can design adaptive, highly scalable algorithms, ideally with heterogeneity and even fault tolerance in mind, we will be well placed to exploit the rapid development of parallel architectures over the next two decades.
A KNOWLEDGE TRANSFER REPORT FROM THE LMS AND THE KTN FOR INDUSTRIAL MATHEMATICS

Next Steps

Audit your software

One valuable practical step we can each take is to perform an audit of the software we use that is performance-critical to our work. For software developed by third parties, find out what their policy is towards supporting many-core processors such as GPUs. Is their software already parallel? If so, how scalable is it? Does it run effectively on a quad-core CPU today? What about the emerging 8-, 12- and 16-core CPUs? Do they have a demonstration of their software accelerated on a GPU? What is their roadmap for the software? You also need to be conscious of the software licensing model: is the software licensed per core, processor, node, user, . . . ? Will you have to pay more for an upgrade that supports many-core processors? Some vendors will supply these features as no-cost upgrades; others will charge extra for them.

It is also important to be specific about parallel acceleration of the particular features you use in the software in question. For example, at the time of writing there are GPU-accelerated versions of the dense linear algebra solvers in MATLAB, but not of the sparse linear algebra solvers [73]. Just because an application claims to be 'GPU-accelerated', it does not necessarily follow that your particular use of that application will gain the performance benefit of GPUs. Your mileage will vary, so check with the supplier of your software before committing.

Plan for parallelism

If you develop your own software, start thinking about what your own path to parallelism should be. Are the users of your software likely to stick primarily to multi-core processors in mainstream laptops, desktops and servers? If so, you should be thinking about adopting OpenMP, MPI or another widely supported approach to parallel programming.
You should probably avoid proprietary approaches that may not support all platforms or be commercially viable in the long term. Instead, use open standards available across multiple platforms and vendors to minimise your risk.

Also consider when many-core processors will feature in your roadmap. These are inevitable: even mainstream CPUs will rapidly become heterogeneous many-cores, so this really is a question of 'when', not 'if'. If you do not want to support many-core processors in the near or medium term, OpenMP and MPI will be good choices. If, however, you may want to support many-core processors within the next few years, you will need a plan to adopt either OpenCL or CUDA sooner rather than later. OpenCL might be a viable alternative to OpenMP on multi-core CPUs in the short term.

If you are going to develop your own many-core-aware software, there is a tremendous amount of support that you can tap into.

Attend a workshop

In the UK each year there are several training workshops in the use of GPUs. The national supercomputing service HECToR [53] is starting to provide GPU workshops on CUDA and OpenCL programming; the timetable for these is available online [52]. Prof Mike Giles at the University of Oxford regularly runs CUDA programming workshops; for the date of the next one see [43]. His webpage also includes excellent links to other GPU programming resources. A search for GPU training in the UK should turn up offerings from several universities. There are also commercial training providers in the UK that are worth considering, such as NAG [89]. Daresbury Labs has a team who track the latest processor technologies and who are already experienced in developing software for GPUs. This group holds occasional seminars and workshops, and is willing to offer advice and guidance to newcomers to many-core technologies [30].
GPU vendors will often provide assistance if asked, especially if their assistance could lead to new sales.

There are also many conferences and seminars emerging to address many-core computing. The UK now has a regular GPU developers' workshop. Previous years have seen the workshop held in Oxford [1] and Cambridge [2]. The 2011 workshop is due to be held at Imperial College.

A useful GPU computing resource is GPUcomputing.net [47]. In particular, it has a page dedicated to GPU computing in the UK [45]. This site is mostly dominated by the use of NVIDIA GPUs, reflecting NVIDIA's lead in the market, but over time the site should see a greater percentage of content coming from work on a wider range of platforms.

To get started with OpenCL, one good place to start is AMD's