
Dell Power Solutions


HIGH-PERFORMANCE COMPUTING

Differences in memory subsystems

The Dell PowerEdge 1750 and Dell PowerEdge 3250 servers use DDR memory at 266 MHz (PC2100). The PowerEdge 3250 server, however, operates at the speed of 200 MHz. The Dell PowerEdge 1850 server uses the new DDR2 memory running at 400 MHz (PC3200), which has a theoretical bandwidth of 3.2 GB/sec. DDR2 architecture is also based on the industry-standard dynamic RAM (DRAM) technology. The DDR2 standard contains several major internal changes that allow improvements in areas such as reliability and power consumption. One of the most important DDR2 features is the ability to prefetch 4 bits of memory at a time compared to 2 bits in DDR.

DDR2 transfer speed starts where the current DDR technology ends at 400 MHz. In the future, DDR2 is expected to support 533 and 667 mega-transfers/sec (MT/sec) to enable memory bandwidths of 4.3 GB/sec and 5.3 GB/sec. Currently, only DDR2 at 400 MHz is available in Dell PowerEdge servers, which is the memory technology used in the PowerEdge 1850 system.

Components of the test environment

Pull quote: "BLAST scaled well with increasing processor frequency on the Intel Xeon processors, especially on larger query sizes."

The goal of the Dell study, which was conducted in July 2004, was to evaluate the impact of processor and memory architecture on the performance of BLAST. Three Dell PowerEdge servers configured similarly in terms of software and compilers were used. The main difference between the servers was in the processor architecture and, to some extent, in the memory technology.
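The theoretical bandwidth figures quoted in the memory subsystem discussion (3.2, 4.3, and 5.3 GB/sec) can be reproduced with simple arithmetic: transfers per second multiplied by bytes per transfer. The sketch below assumes a single 64-bit (8-byte) memory channel and 1 GB = 10^9 bytes; the function name is hypothetical and chosen only for illustration.

```python
def peak_bandwidth_gb_per_sec(mega_transfers_per_sec, bus_width_bits=64):
    """Theoretical peak bandwidth of one memory channel, in GB/sec (10^9 bytes).

    Assumption: a 64-bit data bus per channel, which is why each transfer
    is counted as 8 bytes by default.
    """
    bytes_per_transfer = bus_width_bits // 8
    return mega_transfers_per_sec * 1e6 * bytes_per_transfer / 1e9


if __name__ == "__main__":
    # Transfer rates mentioned in the article: DDR at 266, DDR2 at 400/533/667.
    for rate in (266, 400, 533, 667):
        print(f"{rate} MT/sec -> {peak_bandwidth_gb_per_sec(rate):.1f} GB/sec")
```

Under these assumptions the 266 MT/sec case works out to about 2.1 GB/sec, which is where the PC2100 module name comes from. These are peak values only; sustained bandwidth seen by applications is lower.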
The use of three servers allowed the test team to compare three processor architectures, including the impact of processor and FSB (system bus) speeds and the influence of memory technology on BLAST performance.

The PowerEdge 1750 and PowerEdge 1850 servers, which use Intel Xeon processors, had Intel Hyper-Threading Technology turned off. The 90 nm Intel Xeon processors used in the PowerEdge 1850 support 64-bit extensions (EM64T). The 64-bit mode of the EM64T-capable Intel Xeon processor was not used because that was not the focus of this study.

Application background and characteristics

The BLAST family of sequence database-search algorithms serves as the foundation for much biological research. The BLAST algorithms search for similarities between a short query sequence and a large, infrequently changing database of DNA or amino acid sequences.

Version 8.0 Intel compilers for 32-bit applications were used to compile BLAST on the PowerEdge 1750 and PowerEdge 1850 servers. For the Itanium-based PowerEdge 3250 server, version 8.0 Intel compilers for 64-bit applications were used.

BLAST was executed on each of the four configurations; the PowerEdge 1850 was configured with two different processor frequencies, one at 3.2 GHz and one at 3.6 GHz. The test used a database of about 2 million sequences, with about 10 billion total letters. For this study, BLAST was executed against single queries of three lengths: 94,000 words; 206,000 words; and 510,000 words. Runs were conducted using both single and dual threads.

Performance evaluation and analysis

Before testing BLAST performance, the test team used the STREAM benchmark to measure memory bandwidth. STREAM measures real-world bandwidth sustainable from ordinary user programs as opposed to the theoretical peak bandwidth provided by vendors.
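As a rough illustration of how a sustained-bandwidth measurement of this kind is put together, the sketch below times the four vector kernels STREAM is known for (Copy, Scale, Add, Triad) and converts elapsed time into MB/sec. It is not the benchmark used in the study: the real STREAM is compiled C or Fortran, whereas this pure-Python version is dominated by interpreter overhead, so its numbers are far below hardware bandwidth. The array size, repeat count, and all function names are assumptions made for illustration, and unlike STREAM each kernel is timed separately.

```python
import time
from array import array

BYTES_PER_WORD = 8  # array typecode "d" stores C doubles


def copy(a, b, c, q):
    for i in range(len(a)):
        c[i] = a[i]


def scale(a, b, c, q):
    for i in range(len(a)):
        b[i] = q * c[i]


def add(a, b, c, q):
    for i in range(len(a)):
        c[i] = a[i] + b[i]


def triad(a, b, c, q):
    for i in range(len(a)):
        a[i] = b[i] + q * c[i]


# (name, kernel, arrays read or written per iteration), following the
# STREAM convention of counting 2 words for Copy/Scale and 3 for Add/Triad.
KERNELS = [("Copy", copy, 2), ("Scale", scale, 2), ("Add", add, 3), ("Triad", triad, 3)]


def run_stream_sketch(n=200_000, q=3.0, repeats=3):
    """Return {kernel name: best observed MB/sec} for arrays of n doubles."""
    a = array("d", [1.0]) * n
    b = array("d", [2.0]) * n
    c = array("d", [0.0]) * n
    results = {}
    for name, kernel, arrays_touched in KERNELS:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            kernel(a, b, c, q)
            best = min(best, time.perf_counter() - start)
        bytes_moved = arrays_touched * BYTES_PER_WORD * n
        results[name] = bytes_moved / best / 1e6  # MB/sec, 1 MB = 10^6 bytes
    return results


if __name__ == "__main__":
    for name, mb_per_sec in run_stream_sketch().items():
        print(f"{name:5s} {mb_per_sec:10.1f} MB/sec")
```

A meaningful run needs arrays much larger than the processor caches, so that the timing reflects main-memory traffic rather than cache hits.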
By running four simple kernels, the benchmark measures traffic all the way from registers to main memory (and vice versa). Because the arrays are much too large to fit in caches, the benchmark measures a mixture of both read and write traffic. STREAM measures programmer-perceived bandwidth, that is, sustained bandwidth rather than raw or peak bandwidth.

Figure 2 shows the measured memory bandwidth using the STREAM benchmark. The PowerEdge 3250 server showed significant improvements over the PowerEdge 1750 server because of its wider system bus (128 bits). Similarly, the PowerEdge 1850 server showed improvements over the PowerEdge 3250 thanks to its faster memory clock speed (400 MHz) as well as a faster FSB (800 MHz).

Next, the performance of BLAST was evaluated using different query sizes and running single and dual threads. The importance of processor frequency, architecture, and memory subsystem design can be determined from the results obtained on the four tested system configurations.

[Figure 2. Sustainable memory bandwidth measured using STREAM benchmark. Y-axis: throughput (MB/sec), 0 to 4000. Kernels: Copy, Scale, Add, Triad. Series: PowerEdge 3250 (Intel Itanium processor at 1.5 GHz), PowerEdge 1850 (Intel Xeon processor at 3.2 GHz), PowerEdge 1750 (Intel Xeon processor at 3.2 GHz).]

www.dell.com/powersolutions Reprinted from Dell Power Solutions, February 2005. Copyright © 2005 Dell Inc. All rights reserved.
