13.07.2015 Views

ECE1724 Project Final Report: HashLife on GPU

ECE1724 Project Final Report: HashLife on GPU

ECE1724 Project Final Report: HashLife on GPU

SHOW MORE
SHOW LESS

You also want an ePaper? Increase the reach of your titles

YUMPU automatically turns print PDFs into web optimized ePapers that Google loves.

6 Evaluati<strong>on</strong>Experimental Setup All implementati<strong>on</strong>s of Life and <str<strong>on</strong>g>HashLife</str<strong>on</strong>g> were tested <strong>on</strong> a 2.83GHz Core 2 quad-core CPUwith 4GB of RAM (unknown type and speed), although there is <strong>on</strong>ly <strong>on</strong>e thread of executi<strong>on</strong>. The <strong>GPU</strong> used is anNVIDIA GeForce GTX 280 with 1GB of RAM. All the grids were square with 22,000 cells <strong>on</strong> a side. Given <strong>on</strong>e byteper cell, this translates to about 484MB, enabling two such grids to fit in the video card memory at <strong>on</strong>e time, allowingalternating updates.All cells are initially set to ’dead’ in both grids, and a small test pattern is then inserted at the centre of the first grid.This pattern is the R-Pentomino, which is a simple pattern of 6 cells which is known to evolve for 1103 iterati<strong>on</strong>s, atwhich point it settles into a number of stable, cyclical patterns.Figure 3: R-Pentomino (black cells are Live)The grids are then copied to the <strong>GPU</strong> (when used), and no further data transfers occur during measurement. Onlythe kernel is restarted at each iterati<strong>on</strong>. The number of generati<strong>on</strong>s is set to 100 for CPU implementati<strong>on</strong>s and 4000for the <strong>GPU</strong>, to achieve similar benchmark times of 300 to 500 sec<strong>on</strong>ds. Table 1 shows the number of millisec<strong>on</strong>dsfor each iterati<strong>on</strong> of each algorithm, and the resulting speedups are worked out in Table 2.Speedup and Memory Bandwidth The raw speedup granted by moving Life to the <strong>GPU</strong> is 38.4x. Relative to thatLife baseline, the speedup granted by the partial implementati<strong>on</strong> of <str<strong>on</strong>g>HashLife</str<strong>on</strong>g> is 1.53x <strong>on</strong> the CPU, and <strong>on</strong>ly 1.18x <strong>on</strong>the <strong>GPU</strong>. The <strong>GPU</strong> implementati<strong>on</strong> was already memory bound but benefited n<strong>on</strong>etheless from caching of the texturememory, since the total number of cells accessed remains otherwise the same. The benefit is relatively small as thetexture caches do not reduce latency, but merely free up some global memory bandwidth. In c<strong>on</strong>trast, the CPU cachesdo reduce latency and thus achieve a greater speedup.Thread Dispatching It was expected that reducing the number of threads by 75% (prior to using texture memory)would have a positive performance impact since otherwise most of the threads would do nothing at all, reducing theutilizati<strong>on</strong> of the processors. However, there was no significant difference in performance, further supporting the ideathat access to global memory is the limiting factor. This lack of change when going from 484 milli<strong>on</strong> threads to <strong>on</strong>ly121 milli<strong>on</strong> also suggests that the dispatching of threads is extremely efficient.Millisec<strong>on</strong>ds per Iterati<strong>on</strong>CPU <strong>GPU</strong>Life 4612 120<str<strong>on</strong>g>HashLife</str<strong>on</strong>g> 3006 102Table 1: Millisec<strong>on</strong>ds per iterati<strong>on</strong> of Life and <str<strong>on</strong>g>HashLife</str<strong>on</strong>g> implementati<strong>on</strong>s.SpeedupsLife (CPU) Life (<strong>GPU</strong>) <str<strong>on</strong>g>HashLife</str<strong>on</strong>g> (CPU) <str<strong>on</strong>g>HashLife</str<strong>on</strong>g> (<strong>GPU</strong>)Life (CPU) 1 0.0260 0.652 0.0221Life (<strong>GPU</strong>) 38.4 1 25.1 0.850<str<strong>on</strong>g>HashLife</str<strong>on</strong>g> (CPU) 1.53 0.0399 1 0.0339<str<strong>on</strong>g>HashLife</str<strong>on</strong>g> (<strong>GPU</strong>) 45.2 1.18 29.5 1Table 2: Speedup comparis<strong>on</strong>s between all implementati<strong>on</strong>s of Life and <str<strong>on</strong>g>HashLife</str<strong>on</strong>g>. The speedups are for the rows,relative to the columns.5

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!