11.07.2015 Views

CUDA Compression - Progress - Stanford PPL


CUDA Compression - Progress
● investigated CUDA architecture by reading through Programming Guide
● set up CUDA development environment and compiled test programs
● researched details of each of the compression algorithms
  ● images - JPEG
  ● data - ZIP
  ● audio - MP3/OGG


CUDA Architecture
● ~4-16 streaming multiprocessors, each having memory
  ● contain ~8-16 scalar processors accommodating up to 512 threads, each has local memory
● program in blocks, where each block is independent of another
● SPMT architecture, similar to SPMD
● non-parallel sections will still run on the architecture, but key to performance will be vector-type processing, with little or no synchronization between blocks
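The block model above can be illustrated with a small emulation in plain Python. This is a sketch, not CUDA code: it only assumes the usual CUDA index formula `blockIdx.x * blockDim.x + threadIdx.x`, and the names `scale_kernel` and `launch` are hypothetical. Blocks are deliberately visited in reverse order to show that independent blocks give the same result in any order.

```python
def scale_kernel(block_idx, thread_idx, block_dim, data, out, factor):
    """Body of one emulated thread: it handles a single element."""
    i = block_idx * block_dim + thread_idx  # global index, as in CUDA
    if i < len(data):  # guard: the last block may be only partly used
        out[i] = data[i] * factor


def launch(kernel, n_items, block_dim, *args):
    """Emulate a 1-D grid launch (hypothetical helper, runs sequentially).

    Blocks run in reverse order on purpose: because no block depends on
    another, the order of execution does not change the result.
    """
    grid_dim = (n_items + block_dim - 1) // block_dim  # ceiling division
    for block_idx in reversed(range(grid_dim)):
        for thread_idx in range(block_dim):
            kernel(block_idx, thread_idx, block_dim, *args)
    return grid_dim


data = list(range(10))
out = [0] * len(data)
blocks = launch(scale_kernel, len(data), 4, data, out, 2)
```

With 10 items and 4 threads per block, three blocks are needed and the last one leaves two threads idle, which is why the bounds check matters.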


JPEG Compression
• Take advantage mostly of data parallelism
• Algorithm for each 8x8 pixel unit
  – RGB to YCbCr (brightness, chrominance)
  – Downsampling
  – DCT (2D frequency transformation)
  – Quantization (high frequency)
  – Encoding (zigzag + Huffman)
• Distribute units among CUDA blocks
• Further parallelism in DCT
• Effects of varying thread workload/distribution
• Libjpeg
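The per-unit steps listed above can be sketched for a single 8x8 unit as follows. This is a minimal illustration, not libjpeg's implementation: it uses the JFIF color conversion constants and the textbook 2D DCT-II, it handles only the luma channel, and it skips downsampling and Huffman coding. The quantization step `8 + 4 * (u + v)` is a made-up table that simply grows with frequency, and all function names are hypothetical.

```python
import math

N = 8  # JPEG processes the image in 8x8 units

# Zigzag order: walk the anti-diagonals, alternating direction each time.
ZIGZAG = sorted(
    ((r, c) for r in range(N) for c in range(N)),
    key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
)


def rgb_to_ycbcr(r, g, b):
    """JFIF conversion from RGB to brightness (Y) and chrominance (Cb, Cr)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr


def dct2(block):
    """Direct (slow) 2D DCT-II of one 8x8 block."""
    def scale(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = sum(
                block[x][y]
                * math.cos((2 * x + 1) * u * math.pi / 16)
                * math.cos((2 * y + 1) * v * math.pi / 16)
                for x in range(N)
                for y in range(N)
            )
            out[u][v] = 0.25 * scale(u) * scale(v) * total
    return out


def quantize(coeffs):
    """Divide by a step that grows with frequency (assumed table, not JPEG's)."""
    return [
        [round(coeffs[u][v] / (8 + 4 * (u + v))) for v in range(N)]
        for u in range(N)
    ]


def encode_unit(rgb_block):
    """Luma only: convert, level-shift by 128, DCT, quantize, zigzag."""
    luma = [[rgb_to_ycbcr(*px)[0] - 128 for px in row] for row in rgb_block]
    q = quantize(dct2(luma))
    return [q[r][c] for r, c in ZIGZAG]


gray = [[(200, 200, 200)] * N for _ in range(N)]
coeffs = encode_unit(gray)
```

A uniform gray unit ends up with a single nonzero DC coefficient followed by zeros, which is the kind of run that the later entropy coding stage compresses well. Since every unit is processed the same way with no shared state, each unit could be handed to a separate block.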


ZIP File Compression
● common operation on desktops, servers
● lossless compression algorithm
● combination of LZ77 and Huffman coding
● Data Parallelism: Input file can be broken into blocks, and each of these compressed independently
● Blocks must be written back in the correct order (may require synchronization)
● Can use open source PIGZ compression program
  ● multicore gzip compression using Pthreads
● port to CUDA manycore, port to Windows
● parallel bzip implementation for SMP and cluster machines also available for comparison
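The block-wise idea above can be sketched with Python's standard `zlib` module and a thread pool. This is an illustration of the data-parallel scheme only: it keeps each block as a separate zlib stream and does not reproduce PIGZ's output format. The function names and the block size are assumptions chosen for the example.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def split_blocks(data, block_size):
    """Cut the input into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def compress_blocks(data, block_size=64 * 1024, workers=4):
    """Compress every block independently on a pool of worker threads."""
    blocks = split_blocks(data, block_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, whichever worker finishes
        # first, so the blocks come back in the order they must be written.
        return list(pool.map(zlib.compress, blocks))


def decompress_blocks(compressed):
    """Decompress each block and join them in their original order."""
    return b"".join(zlib.decompress(chunk) for chunk in compressed)


payload = b"CUDA compression progress " * 10000
packed = compress_blocks(payload, block_size=32 * 1024)
restored = decompress_blocks(packed)
```

Compressing blocks independently costs some compression ratio, since a block cannot refer back to matches in earlier blocks, but it removes all dependencies between workers except the final ordered write.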


Audio Compression
● Algorithms are designed to take in PCM (sampled audio input) and compress
● This lossy compression technique is performed by eliminating frequency components outside the human audible range
● Other techniques such as Huffman encoding and suppressing low amplitude background sounds also used
● High levels of DLP available in all cases
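Eliminating frequency components outside the audible range can be sketched with a direct discrete Fourier transform on a short PCM frame. This is a toy illustration under stated assumptions: a 64-sample frame, a 64 kHz sample rate, and a 20 kHz cutoff are chosen so the numbers line up neatly, and the function names are hypothetical. Real MP3/OGG encoders use more elaborate transforms and psychoacoustic models.

```python
import cmath
import math


def dft(samples):
    """Direct O(n^2) discrete Fourier transform."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse transform; the input frames here are real, so keep the real part."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def drop_above(samples, sample_rate, cutoff_hz):
    """Zero every frequency bin above cutoff_hz and return the filtered frame."""
    n = len(samples)
    spectrum = dft(samples)
    for k in range(n):
        # Bins above n/2 mirror the lower ones, so measure from the nearer end.
        freq = min(k, n - k) * sample_rate / n
        if freq > cutoff_hz:
            spectrum[k] = 0
    return idft(spectrum)


RATE, FRAME = 64000, 64  # assumed values: bin spacing is exactly 1 kHz
audible = [math.sin(2 * math.pi * 5000 * t / RATE) for t in range(FRAME)]
ultrasonic = [0.5 * math.sin(2 * math.pi * 25000 * t / RATE) for t in range(FRAME)]
mixed = [a + u for a, u in zip(audible, ultrasonic)]
filtered = drop_above(mixed, RATE, 20000)
```

Each frame is transformed independently of the others, which is where the data-level parallelism comes from: frames can be distributed across blocks in the same way as the 8x8 units or file blocks in the other two cases.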


Future Direction
● become more familiar with CUDA programming by writing small test applications
● start initial development as a group (will encounter common problems)
● each develop parallel application separately
● benchmark the applications and analyze the speedup
● comment on architectural/algorithmic issues that affect the performance
● tweak applications as necessary based on performance data
