11.07.2015

A Compiler for Parallel Execution of Numerical Python Programs on ...

3. float: 32-bit floating point
4. double: 64-bit floating point
5. int: 32-bit signed integer
6. float2: A 64-bit structure of 2 floats. If X is a float2, it has two float components, which can be referenced as X.x and X.y. Numerical indexing is not allowed; .x and .y are the only way to access the components.
7. float4: A 128-bit structure of 4 floats. The 4 components are, respectively, x, y, z and w. Numerical indexing is not allowed.
8. double2: A 128-bit structure of 2 doubles with components .x and .y.

2.1 Overview

RV770 is a massively parallel graphics and computing core. An overview of the chip is provided in Figure 2.1. In the figure, the following blocks of RV770 are visible:

1. A setup engine and an ultrathreaded dispatcher that dynamically schedules thread execution on the SIMD units.
2. Ten SIMD units. These form the execution cores of the chip.
3. Ten texture units, each aligned with one SIMD unit. Texture units serve the same purpose as load units on a CPU. Each texture unit also has a dedicated L1 texture cache.
4. Four 64-bit memory controllers (for a total 256-bit memory interface) that can be used with either GDDR3 or GDDR5 memory. Each memory controller has a dedicated L2 cache. The memory controllers are connected to the texture units through a crossbar switch.
5. Various connectors such as PCIe and display controllers.
6. A UVD (Universal Video Decode) block for decoding video codecs.

From the GPGPU perspective, understanding the SIMD units and the texture units is key to getting the best performance out of the chip. The SIMD units and texture units are described next.
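The vector types float2, float4 and double2 listed before section 2.1 are fixed-size structures with named components, so their layout can be mirrored on the host side. The following is a minimal sketch, assuming Python's standard `ctypes` module as the host-side representation; the names `Float2`, `Float4`, `Double2`, `TYPE_MAP` and `size_in_bits` are hypothetical and are not part of the compiler described here. It checks the 32-, 64- and 128-bit sizes given in the list and exposes components only by name.

```python
import ctypes


# Hypothetical host-side mirrors of the vector types in the list.
# Fields are named x, y (and z, w for Float4); no numeric indexing is defined.
class Float2(ctypes.Structure):
    _fields_ = [("x", ctypes.c_float), ("y", ctypes.c_float)]


class Float4(ctypes.Structure):
    _fields_ = [("x", ctypes.c_float), ("y", ctypes.c_float),
                ("z", ctypes.c_float), ("w", ctypes.c_float)]


class Double2(ctypes.Structure):
    _fields_ = [("x", ctypes.c_double), ("y", ctypes.c_double)]


# Hypothetical mapping from the type names in the list to ctypes types.
TYPE_MAP = {
    "float": ctypes.c_float,    # 32-bit floating point
    "double": ctypes.c_double,  # 64-bit floating point
    "int": ctypes.c_int32,      # 32-bit signed integer
    "float2": Float2,           # two floats, 64 bits in total
    "float4": Float4,           # four floats, 128 bits in total
    "double2": Double2,         # two doubles, 128 bits in total
}


def size_in_bits(ctype):
    """Return the storage size of a ctypes type in bits."""
    return ctypes.sizeof(ctype) * 8


if __name__ == "__main__":
    v = Float4(1.0, 2.0, 3.0, 4.0)
    print(v.x, v.w)  # components are reached by name only
    for name, ctype in TYPE_MAP.items():
        print(f"{name}: {size_in_bits(ctype)} bits")
```

Because `ctypes.Structure` defines no indexing, an expression such as `Float2(1.0, 2.0)[0]` raises `TypeError`. This mirrors the rule that components are reached only through .x, .y, .z and .w.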
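The block counts in the RV770 overview can be collected in one place, which makes the arithmetic behind the 256-bit memory interface explicit (four controllers of 64 bits each). This is a minimal Python sketch; `RV770Layout` and its fields are hypothetical names, and the class records only the counts stated in the text, not clock rates or bandwidth figures.

```python
from dataclasses import dataclass


# Hypothetical summary of the RV770 block counts given in the overview.
@dataclass(frozen=True)
class RV770Layout:
    simd_units: int = 10             # execution cores
    texture_units: int = 10          # one per SIMD unit, each with an L1 texture cache
    memory_controllers: int = 4      # each with a dedicated L2 cache
    controller_width_bits: int = 64  # width of one memory controller

    def memory_interface_bits(self) -> int:
        """Total memory interface width: controllers times per-controller width."""
        return self.memory_controllers * self.controller_width_bits

    def cache_counts(self) -> tuple:
        """Return (L1 texture caches, L2 caches) implied by the layout."""
        return (self.texture_units, self.memory_controllers)


if __name__ == "__main__":
    chip = RV770Layout()
    print(chip.memory_interface_bits())  # 4 x 64 = 256
    print(chip.cache_counts())           # (10, 4)
```

The dataclass is frozen so that the recorded counts cannot be modified by accident once a layout object is created.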