Tutorial CUDA
Parallel Reduction Complexity
Takes $\log_2 N$ steps, and each step $S$ performs $N/2^S$ independent operations
© NVIDIA Corporation 2008
Step complexity is $O(\log N)$
For $N = 2^D$, performs $\sum_{S=1}^{D} 2^{D-S} = N - 1$ operations
Work complexity is $O(N)$
It is work-efficient (i.e., it does not perform more operations than a sequential reduction)
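The step and operation counts above can be checked with a small host-side simulation. This is a minimal sketch, not the tutorial's kernel: it runs sequentially on the CPU, assumes the input length is a power of two, and the names `ReductionStats` and `tree_reduce` are hypothetical, introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counters for a simulated tree reduction (illustrative names, not from the tutorial).
struct ReductionStats {
    long long sum;      // reduced value
    std::size_t steps;  // number of tree levels executed
    std::size_t ops;    // total additions performed
};

// Sequentially simulates a tree-based sum of N = 2^D values.
// Step S (1-based) performs N / 2^S additions that do not depend on each other.
ReductionStats tree_reduce(std::vector<long long> data) {
    const std::size_t n = data.size();
    assert(n > 0 && (n & (n - 1)) == 0);  // assumption: N is a power of two

    ReductionStats stats{0, 0, 0};
    for (std::size_t active = n / 2; active >= 1; active /= 2) {
        // Within one step, each i writes data[i] and reads data[i + active],
        // so the additions touch disjoint elements and could run in parallel.
        for (std::size_t i = 0; i < active; ++i) {
            data[i] += data[i + active];
            ++stats.ops;
        }
        ++stats.steps;
    }
    stats.sum = data[0];
    return stats;
}
```

For an input of 1024 ones, `tree_reduce` reports 10 steps and 1023 additions, matching $\log_2 N$ and $N - 1$.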
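With $P$ threads physically in parallel ($P$ processors), step $S$ needs $\lceil 2^{D-S}/P \rceil$ rounds, so the reduction takes $\sum_{S=1}^{D} \lceil 2^{D-S}/P \rceil$ time steps
$\sum_{S=1}^{D} \lceil 2^{D-S}/P \rceil \le \sum_{S=1}^{D} \left( \lfloor 2^{D-S}/P \rfloor + 1 \right) \le (N-1)/P + D < N/P + \log_2 N$
Time complexity is $O(N/P + \log N)$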
Compare to O(N) for sequential reduction
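To make the bound concrete, the sketch below evaluates the sum of per-step rounds for a given $N$ and $P$ and compares it with $N/P + \log_2 N$. It is an illustrative calculation under the same power-of-two assumption. The function names `parallel_time` and `time_bound` are hypothetical and do not come from the tutorial.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Time steps needed when P processors execute a tree reduction of N = 2^D
// elements: step S has 2^(D-S) independent operations, done in
// ceil(2^(D-S) / P) rounds.
std::size_t parallel_time(std::size_t n, std::size_t p) {
    assert(p > 0 && n > 0 && (n & (n - 1)) == 0);  // assumption: N is a power of two
    std::size_t rounds = 0;
    for (std::size_t active = n / 2; active >= 1; active /= 2) {
        rounds += (active + p - 1) / p;  // integer form of ceil(active / p)
    }
    return rounds;
}

// Upper bound from the text: N/P + log2(N).
double time_bound(std::size_t n, std::size_t p) {
    return static_cast<double>(n) / static_cast<double>(p)
         + std::log2(static_cast<double>(n));
}
```

With $N = 1024$ and $P = 32$, `parallel_time` gives 36 time steps against a bound of 42, while $P = 1$ gives 1023, the sequential operation count.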