11.07.2015 Views

The GPU Computing Revolution - London Mathematical Society

A KNOWLEDGE TRANSFER REPORT FROM THE LMS AND THE KTN FOR INDUSTRIAL MATHEMATICS

Figure 4: BUDE molecular docking performance in seconds (lower is better).

BUDE was ported to the OpenCL parallel programming language. Figure 4 compares the performance of this OpenCL code running on a GPU to the original, optimised Fortran code running on a fast, quad-core CPU. The results for both performance and energy efficiency were compelling for the GPU-based system: NVIDIA C2050 GPUs gave a real-world speedup of 4.0X versus a fast quad-core CPU while using about one half of the energy to complete the same work [76, 77].

An important benefit of using the new OpenCL parallel programming language was the ability to run exactly the same code on a range of different GPUs from different vendors, and even on multi-core x86 CPUs. Thus BUDE can now use whichever hardware is the fastest available at any given time. BUDE can also use GPUs and CPUs at the same time to deliver the maximum possible aggregate performance.

Options pricing using GPUs: NAG

One of the earliest application areas explored using GPUs was Monte-Carlo-based numerical integration for derivative pricing and risk management in financial markets. The primary need in this application is the ability to generate rapidly high-quality pseudo-random or quasi-random numbers. Fortunately parallel algorithms for generating these kinds of random numbers already exist, and several of these algorithms are well matched to the GPU's many-core architecture.

The Numerical Algorithms Group (NAG) is a provider of high-quality numerical software libraries.
NAG is one of the first vendors to support GPUs: their GPU-accelerated pseudo-random number generators show speedups of between 5X and 34X when running on NVIDIA's C2050 GPU, compared to Intel's highly optimised MKL library when running on all eight cores of a contemporary Intel Core i7 860 operating at 2.8GHz [17]. The exact speedup of these routines depends on the kind of random number generator being used (MRG32k3a, Sobol or MT19937) and also the distribution required (uniform, exponential or normal).

However, in modern financial models, generating the random numbers is typically only a fraction of the overall task, with the models themselves often requiring considerable computation to evaluate. Since the Monte Carlo method is inherently parallel, the entire simulation can be performed on the GPU, where the abundance of computing power means that these complex models can be evaluated very rapidly. This capability has attracted several users from the financial services industry, including several of the more sophisticated insurance companies. This class of potential GPU user is faced with modern risk management requirements such as 'Solvency II', which require large amounts of simulation. When combined with the complex cashflow calculations inherent in insurance portfolios, this application becomes massively parallel and very compute-intensive, making it well suited to modern GPUs.

NAG has published the results it achieved while working with two of its tier-one financial services customers. One of these customers has used NAG's GPU-accelerated random number generator library to speed up its 'local vol' code and saw speedups of approximately ten times compared to a fast quad-core x86 CPU.

In summary, the generation of large sets of pseudo-random numbers for Monte Carlo methods has been one application area where GPUs have had a big impact.
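
Because every simulated path is independent of the others, a Monte Carlo pricing run splits naturally into chunks that can be evaluated concurrently. The following is a minimal sketch in plain Python, not NAG's library or any customer's code. It prices a European call option under a constant-volatility Black-Scholes model, which is a deliberate simplification of the 'local vol' models mentioned above. The function names, the parameters and the one-seed-per-chunk scheme are all assumptions made for illustration. A production code would more likely draw from a generator designed for parallel streams (MRG32k3a is one such generator) and would run the chunks on GPU threads rather than in a sequential loop.

```python
import math
import random


def price_chunk(seed, n_paths, s0, strike, rate, vol, maturity):
    """Return the sum of discounted call payoffs over one chunk of paths.

    Each chunk owns its generator, so chunks share no state and could run
    in parallel. Seeding by chunk index is an illustrative assumption only;
    it does not guarantee statistically independent streams.
    """
    rng = random.Random(seed)
    drift = (rate - 0.5 * vol * vol) * maturity
    diffusion = vol * math.sqrt(maturity)
    discount = math.exp(-rate * maturity)
    total = 0.0
    for _ in range(n_paths):
        z = rng.gauss(0.0, 1.0)                      # standard normal draw
        s_t = s0 * math.exp(drift + diffusion * z)   # terminal asset price
        total += discount * max(s_t - strike, 0.0)   # discounted call payoff
    return total


def monte_carlo_call(n_chunks, paths_per_chunk, s0, strike, rate, vol, maturity):
    """Average the payoffs over all chunks.

    The list comprehension runs sequentially here; on a GPU each chunk
    (or each path) would map to its own thread.
    """
    sums = [
        price_chunk(seed, paths_per_chunk, s0, strike, rate, vol, maturity)
        for seed in range(n_chunks)
    ]
    return sum(sums) / (n_chunks * paths_per_chunk)


def black_scholes_call(s0, strike, rate, vol, maturity):
    """Closed-form Black-Scholes call price, used only as a sanity check."""
    sqrt_t = math.sqrt(maturity)
    d1 = (math.log(s0 / strike) + (rate + 0.5 * vol * vol) * maturity) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t

    def norm_cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    return s0 * norm_cdf(d1) - strike * math.exp(-rate * maturity) * norm_cdf(d2)


if __name__ == "__main__":
    estimate = monte_carlo_call(8, 20000, 100.0, 100.0, 0.05, 0.2, 1.0)
    reference = black_scholes_call(100.0, 100.0, 0.05, 0.2, 1.0)
    print("Monte Carlo estimate:", estimate)
    print("Closed-form reference:", reference)
```

With these hypothetical inputs (spot and strike of 100, a 5% rate, 20% volatility and one year to maturity) the closed-form reference is about 10.45, and the simulated estimate should agree with it to within Monte Carlo sampling error. The only communication between chunks is the final sum, which is why this kind of workload maps so readily onto many-core hardware.
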
Speedups for the random number generation routines of 100X or more versus a single CPU core are regularly achieved, often translating into
