Dell Power Solutions

HIGH-PERFORMANCE COMPUTING

provides sufficient transparency to the programmer and is compatible with the message-passing model.

The complex I/O patterns required for efficient memory management are handled by ChemIO, an abstract programming interface. ChemIO is a high-performance I/O application programming interface (API) designed to meet the requirements of large-scale computational chemistry problems. It allows the programmer to create I/O files that may be either local or distributed.

Like TCGMSG, the runtime database is a component of the fourth tier. The runtime database is a persistent data storage mechanism designed to hold calculation-specific information for all the upper-level programming modules. Because it is not destroyed at the end of a calculation unless the user explicitly requests its destruction, a given runtime database can be reused across several independent calculations.

Examining the performance characteristics of NWChem

Every application behaves differently and has distinctive characteristics. For the study described in this article, which was conducted in November 2003, baseline performance was measured by running NWChem on a single Intel Itanium processor–based Dell PowerEdge 3250 server at various processor speeds and cache sizes; speedup was evaluated by running the application on multiple nodes in a Dell HPC cluster. The study also examined the performance impact of running applications like NWChem on various configurations of Intel Itanium processor–based systems, to better understand how NWChem’s performance depends on processor features such as cache size and clock frequency. Understanding the behavior and patterns of NWChem on standard data sets can help organizations design and allocate appropriate resources, which in turn helps identify bottlenecks when running NWChem on a cluster. All tests for this study were conducted using NWChem 4.5.
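
The study evaluated speedup across multiple cluster nodes, but the article does not spell out the formula it used. The sketch below assumes the common convention: speedup S(p) = T(1) / T(p) computed from wall-clock times, and parallel efficiency E(p) = S(p) / p. The function names are hypothetical and are not part of NWChem or any Dell tool; p should count the same unit (processor or node) as the single-unit baseline run.

```python
from typing import Dict, List, Tuple


def speedup(t_baseline: float, t_parallel: float) -> float:
    """Speedup S(p) = T(1) / T(p); both arguments are wall-clock seconds."""
    if t_baseline <= 0 or t_parallel <= 0:
        raise ValueError("wall-clock times must be positive")
    return t_baseline / t_parallel


def parallel_efficiency(t_baseline: float, t_parallel: float, p: int) -> float:
    """Efficiency E(p) = S(p) / p; 1.0 means perfectly linear scaling."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return speedup(t_baseline, t_parallel) / p


def scaling_report(times: Dict[int, float]) -> List[Tuple[int, float, float]]:
    """Return (p, speedup, efficiency) rows, using the p == 1 run as baseline.

    `times` maps a processor (or node) count to a measured wall-clock time.
    Rows are sorted by p.
    """
    if 1 not in times:
        raise KeyError("a single-unit baseline run (p == 1) is required")
    base = times[1]
    return [
        (p, speedup(base, t), parallel_efficiency(base, t, p))
        for p, t in sorted(times.items())
    ]
```

Passing the measured time of each run, keyed by unit count, yields one row per run. Efficiency that falls as p grows is a common sign that communication or I/O, rather than computation, is starting to dominate the run time.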

The input file used for this study was siosi3.nw, a density functional theory (DFT) benchmark that performs a DFT calculation on the siosi3 molecule and outputs the various atomic energies related to the DFT function. This input file is publicly available for download from the NWChem home page (www.emsl.pnl.gov/docs/nwchem/nwchem.html). For comparison purposes, the input file was kept constant throughout this study. Performance was measured as wall-clock time, that is, the real running time of the program from start to finish, in seconds.

Measuring the effect of Itanium 2 processor features

To understand the sensitivity of the NWChem application to cache size and processor frequency, the Dell High-Performance Computing Cluster team conducted tests using the default data set (siosi3.nw). The team ran multiple tests on configurations that held the processor clock frequency constant while varying the cache size, and on configurations that held the cache size constant while varying the clock frequency.

Three configurations using 64-bit Intel Itanium processors were tested: the first two test configurations had the same clock speed but different level 3 (L3) cache sizes, while the last two had the same cache size but different clock speeds (see Figure 2).

Test configuration   L3 cache   Processor clock frequency
1                    4 MB       1.4 GHz
2                    1.5 MB     1.4 GHz
3                    1.5 MB     1.0 GHz

Figure 2. Test configurations comprising variable clock speeds and cache sizes

The results in Figure 3 were obtained on a single processor running the Red Hat® Enterprise Linux® AS 2.1 operating system with kernel 2.4.18-e31smp. For this test, Dell engineers used the precompiled binaries with default optimization that are available from the Red Hat Web site.

Figure 3. Impact of cache size and processor speed on NWChem performance [chart: performance improvement (percent) relative to baseline, for the cache comparison (PowerEdge 3250 at 1.4 GHz with 1.5 MB vs. 4 MB L3) and the CPU frequency comparison (PowerEdge 3250 with 1.5 MB L3 at 1.0 GHz vs. 1.4 GHz)]

Figure 3 shows that when the clock speed was held at 1.4 GHz and the L3 cache size was increased from 1.5 MB to 4 MB, performance increased approximately 11 percent. Thus, the larger cache improved performance when running NWChem’s siosi3.nw benchmark in this study. For other data sets, the benefit of a large cache will likely depend on the size of the data set; larger data sets typically benefit more from larger caches than smaller data sets do.

Figure 3 also shows the performance benefit when the L3 cache size was held constant at 1.5 MB and the CPU frequency was increased from 1.0 GHz to 1.4 GHz, a 40 percent increase in processor clock speed. Moving from the lower to the higher clock frequency improved performance by approximately 35 percent. Because NWChem performance scaled closely with CPU frequency, these results indicate that the application is highly compute intensive and can benefit from faster processor clock speeds.

Reprinted from Dell Power Solutions, February 2005 (www.dell.com/powersolutions). Copyright © 2005 Dell Inc. All rights reserved.
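
The clock-frequency comparison above can be checked with a small calculation. This sketch assumes performance is the inverse of wall-clock time, so a run that drops from T_baseline to T_new seconds is (T_baseline / T_new - 1) x 100 percent faster; the article does not state its exact formula, and the function names are hypothetical. It then compares the reported 35 percent gain with the 40 percent clock increase, using the article's own figures.

```python
def percent_improvement(t_baseline: float, t_new: float) -> float:
    """Percent performance gain, assuming performance = 1 / wall-clock time."""
    if t_baseline <= 0 or t_new <= 0:
        raise ValueError("wall-clock times must be positive")
    return (t_baseline / t_new - 1.0) * 100.0


def frequency_increase(f_old_ghz: float, f_new_ghz: float) -> float:
    """Percent increase in processor clock frequency."""
    if f_old_ghz <= 0 or f_new_ghz <= 0:
        raise ValueError("frequencies must be positive")
    return (f_new_ghz / f_old_ghz - 1.0) * 100.0


def share_of_ideal_gain(gain_pct: float, freq_pct: float) -> float:
    """Fraction of the gain a purely frequency-bound workload would see.

    A workload whose run time is inversely proportional to clock frequency
    would speed up by exactly freq_pct percent; a result of 1.0 means the
    measured gain matched that ideal.
    """
    if freq_pct <= 0:
        raise ValueError("frequency increase must be positive")
    return gain_pct / freq_pct


if __name__ == "__main__":
    # Figures from the article: 1.0 GHz -> 1.4 GHz, about 35 percent faster.
    freq_pct = frequency_increase(1.0, 1.4)
    share = share_of_ideal_gain(35.0, freq_pct)
    print(f"clock increase: {freq_pct:.0f}%, share of ideal gain: {share:.3f}")
```

With the article's numbers this prints a 40 percent clock increase and a share of 0.875, meaning roughly seven-eighths of the ideal frequency-bound gain was realized. Expressed as speedup ratios instead (1.35 / 1.40), the same result is about 0.96; which form to report is a presentation choice.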
