
Prototyping in PRACE
PRACE Energy to Solution prototype at LRZ
Torsten Wilde, 1IP-WP9 co-lead and 2IP-WP11 lead (GSC-LRZ)
PRACE Industry Seminar, Bologna, April 16, 2012


Leibniz Supercomputing Centre


Outline
• PRACE
  – PRACE and Prototyping
  – Challenges, Energy efficiency
• Energy to Solution (EtS) Prototype at LRZ
  – Motivation
  – Technology
  – First results


PRACE – A Partnership with a Vision
• Provide world-class HPC systems for world-class science
• Support Europe in attaining global leadership in public and private research and development
… and a Mission
• Create a world-leading persistent high-end HPC infrastructure managed as a single legal entity
  – Deploy systems of the highest performance level (Tier-0)
  – Ensure a diversity of architectures to meet the needs of European user communities
  – Collaborate with vendors and ISVs on strategic HPC technologies
  – Provide support and training


Prototyping is Mandatory for PRACE

[Diagram: Technology Watch and Identification of User Requirements feed an Assessment of emerging Technologies and Joint Developments with Vendors, leading to recommendations for procurements, for deployment of mature software technologies, and for further developments]

• Prototyping is a mandatory step in the selection and deployment of new technologies
• Prototyping is a vehicle for cooperation with technology providers


Future System Architecture Projections/Targets – Energy Efficiency

Systems                   | 2010     | 2012           | 2015         | 2018              | Difference Today & 2018
System Peak [PF/s]        | 2        | 25             | 200          | 1000              | O(1000)
Power [MW]                | 6        | 6-20           | 15-50        | 20-80             | O(10)
System Memory [PB]        | 0.3      | 0.3-0.5        | 5            | 32-64             | O(100)
GB RAM/Core               | 0.5-4    | 0.5-2          | 0.2-1        | 0.1-0.5           | -O(10)
Node Performance [GF/s]   | 125      | 160-1000       | 500-7000     | 1000-10000        | O(10)-O(100)
Cores/Node                | 12       | 16-32          | 100-1000     | 1000-10000        | O(100)-O(1000)
Node memory BW [GB/s]     | 40       | 70             | 100-1000     | 400-4000          | O(100)
Number of nodes           | ~20.000  | 10.000-100.000 | 5.000-50.000 | 100.000-1.000.000 | O(10)-O(50)
Total concurrency         | ~240.000 | O(10^6)        | O(10^7)      | O(10^9)           | O(50.000)
MTTI                      | days     | days           | O(1 day)     | O(1 day)          | -O(10)

Source: Rick Stevens and Andy White, IESP Meeting, Oxford 2010
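Taken together, the peak and power rows imply the scale of the efficiency gap (my arithmetic from the table, not stated on the slide):

\[
\frac{2\,\mathrm{PF/s}}{6\,\mathrm{MW}} \approx 0.33\,\mathrm{GFLOP/s\ per\ W}
\quad\longrightarrow\quad
\frac{1000\,\mathrm{PF/s}}{20\text{-}80\,\mathrm{MW}} \approx 12.5\text{-}50\,\mathrm{GFLOP/s\ per\ W},
\]

i.e. roughly a 40-150x improvement in energy efficiency is required between 2010 and 2018.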


Today's “Energy Efficiency”

No. 1 Top500: K computer, RIKEN, Japan
  10.51 PetaFlop/s @ 12.659 MW (830.18 MFlop/s per W)
  → ~1 Exaflop/s @ 1.2 GW

No. 1 Green500: Blue Gene/Q, IBM – Rochester, MN, US
  2026.48 MFlop/s per W
  → ~1 Exaflop/s @ 500 MW
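A quick check of these extrapolations (straightforward arithmetic from the figures above):

```python
# Extrapolate today's efficiency leaders to 1 EFLOP/s, as on the slide.
EXAFLOP = 1e18  # FLOP/s

k_eff   = 830.18e6   # K computer, FLOP/s per watt
bgq_eff = 2026.48e6  # Blue Gene/Q, FLOP/s per watt

print(f"K computer:  {EXAFLOP / k_eff   / 1e9:.2f} GW")  # ~1.20 GW
print(f"Blue Gene/Q: {EXAFLOP / bgq_eff / 1e6:.0f} MW")  # ~493 MW
```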


Energy efficient hardware:
• Use newest semiconductor technology
• Use energy saving processor and memory technologies
• Consider using special hardware or accelerators depending on the algorithm

Energy efficient infrastructure:
• Reduce power losses in the power supply chain
• Improve cooling technology
• Re-use waste heat from IT systems

Energy aware management software:
• Monitor the energy consumption of the compute system and the building infrastructure
• Use energy aware system software to exploit the energy saving features of the platform (see the sketch below)
• Monitor and optimize the performance of your scientific applications
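One concrete handle for energy-aware system software on Linux is the cpufreq sysfs interface; a minimal sketch (the paths are the standard Linux cpufreq layout; writing the governor requires root):

```python
# Inspect and switch the Linux cpufreq governor for CPU 0 -- one
# common mechanism behind energy-aware system software.
from pathlib import Path

CPUFREQ = Path("/sys/devices/system/cpu/cpu0/cpufreq")

def current_governor() -> str:
    return (CPUFREQ / "scaling_governor").read_text().strip()

def set_governor(name: str) -> None:
    # Requires root; valid names are listed in scaling_available_governors.
    (CPUFREQ / "scaling_governor").write_text(name)

print("available:", (CPUFREQ / "scaling_available_governors").read_text().strip())
print("current:  ", current_governor())
# set_governor("powersave")  # e.g. clock down for memory-bound phases
```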


PRACE prototypes addressing these challenges:
• PRACE ARM and ARM+GPU
• Mont-Blanc
• DEEP


How is it done?


Cooling matters (courtesy of APC)


Power Challenge at LRZ

System                  | Era       | Peak Performance | Power Consumption | Investment Costs | Total Operating Costs (incl. Power) | Power Bill
HLRB I: Hitachi SR8000  | 2000-2006 | 1.3 TFLOP/s      | 0.5 MW            | 29 M€            | 13 M€                               | 3 M€
HLRB II: SGI Altix 4700 | 2006-2011 | 62 TFLOP/s       | 1 MW              | 35 M€            | 16 M€                               | 7 M€
SuperMUC: IBM iDataPlex | 2012-2016 | 3000 TFLOP/s     | 3 MW              | 48 M€            | 35 M€                               | 22 M€
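The trend is clearest as the power bill's share of total operating costs (my arithmetic from the table):

```python
# Power bill as a share of total operating costs, from the table above.
systems = {
    "HLRB I":   (3, 13),   # (power bill, total operating costs) in M EUR
    "HLRB II":  (7, 16),
    "SuperMUC": (22, 35),
}
for name, (power, total) in systems.items():
    print(f"{name}: {power / total:.0%}")
# HLRB I: 23%, HLRB II: 44%, SuperMUC: 63%
```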


CooLMUC Hardware
• The world's first AMD-based direct warm/hot water-cooled cluster
• 178 nodes (2x 8-core AMD CPU and 16 GB RAM per node)
• InfiniBand QDR network
• Power monitoring for nodes, network equipment and cooling hardware (see the energy-accounting sketch below)
• Closed racks (no dependence on room air conditioning)
• Waste-heat reuse through a SorTech adsorption chiller
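Per-node power monitoring is what makes an Energy to Solution figure computable in the first place; a minimal sketch of the accounting (the sample log is hypothetical; integration by the trapezoidal rule):

```python
# Energy to Solution (EtS) from sampled power readings:
# integrate power over the application's runtime.
samples = [  # (time in s, power in W) -- hypothetical monitoring log
    (0, 41000), (60, 43500), (120, 43400), (180, 41200),
]

def energy_to_solution_kwh(samples):
    joules = 0.0
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        joules += 0.5 * (p0 + p1) * (t1 - t0)  # trapezoidal rule
    return joules / 3.6e6  # J -> kWh

print(f"EtS: {energy_to_solution_kwh(samples):.2f} kWh")
```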


Direct water cooled CPUs, InfiniBand HCAs, chipsets, ...


Warm water cooling – first results
• This is really early evaluation data, freshly collected!
• Measurement equipment accuracy is not checked!
• Results are specific to the current CooLMUC prototype setup!
Use all possible caution when drawing conclusions!


Higher inlet temperatures cause a slightly smaller ∆t
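The bookkeeping behind such measurements is the standard coolant heat balance (a general heat-transfer relation, not taken from the slides):

\[
Q = \dot{m}\, c_p\, \Delta T
\]

where \(Q\) is the heat carried off by the water, \(\dot{m}\) the mass flow rate, \(c_p\) the specific heat of water, and \(\Delta T\) the outlet-minus-inlet temperature difference. At constant flow, a smaller \(\Delta T\) means the water loop captures less of the dissipated heat.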


Power consumption of nodes increased by 2.4 kW (= 5.6%)
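Those two figures imply a baseline node power draw of roughly \(2.4\,\mathrm{kW} / 0.056 \approx 43\,\mathrm{kW}\) (my arithmetic, not stated on the slide); the increase itself is consistent with higher leakage currents at higher silicon temperatures.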


Example Setting
• Outside temperature: 26.1 °C
• Cond. outlet setpoint: 70.0 °C
• Water inlet: 59.8 °C

Power consumption (baseline: 27 °C water inlet temperature):
• Leakage overhead: 3.13 kW
• Cooling power: 9.00 kW
• COP (coefficient of performance): 3
• COP (compression-based): 4
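A plausible reading of these numbers (my interpretation, though it is consistent with the stated COP): the effective cost of the adsorption cooling is the leakage overhead incurred by running the hardware hot, so

\[
\mathrm{COP} \approx \frac{9.00\,\mathrm{kW}}{3.13\,\mathrm{kW}} \approx 2.9,
\]

while a compression chiller with COP 4 would deliver the same 9.00 kW of cooling for \(9.00/4 = 2.25\,\mathrm{kW}\) of electricity. At this setting, compression-based cooling still wins; the adsorption path pays off when the leakage overhead shrinks or the hot water would be produced anyway.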


Adsorption Summary
• Efficiency and usefulness depend on:
  – Outside temperature
  – Water temperature
  – Components
  – Adsorption machine COP
• Will work well if:
  – Leakage overhead from components is small
  – COP can compete with compressor-based cooling
  – Hot water will be created anyway for other reuse purposes


Other reuse of waste heat

Application              | Temperature | Loss due to Leakage* | Comment
Hot Water Heating        | 50-90 °C    | 5.9-15.4 %           | If nothing else works …
Swimming Pool            | 40-60 °C    | 3.6-8.3 %            | OK, if you would need the energy anyway
Underfloor Heating       | 30-35 °C    | 1.2-2.4 %            | Good
Concrete Core Activation | 25 °C       | 0.04 %               | Very Good

*Reference: 18 °C inlet temperature


Outlook – What REALLY happens to the waste heat? HPC Resort & Spa

Making use of waste heat:
• Warm water produced by SuperMUC will be used to heat our office buildings
• Good during winter – but what about during summer?
• What can you do with water at 50-60 °C during summer?


Bonus Slides


PRACE Funded Prototypes – 1IP Phase 1
• I/O
  – Exascale IO (CEA & CINES)
  – Novel Exascale I/O Concepts (JSC)
• Accelerators
  – Interconnect Virtualization @ CaSToRC (CaSToRC)
  – DSP HPC Node (KTH)
• Memory
  – NUMA-CIC (UiO)
• Energy-to-Solution
  – Energy-to-Solution @ LRZ (LRZ)
  – Energy-to-Solution @ JKU (JKU)
  – Energy-to-Solution @ BSC (BSC)
  – Energy-to-Solution @ PSNC (PSNC)


Warning
• This is really early evaluation data, freshly collated!
• The prototypes span from single chip to clusters!
• Measurement equipment accuracy is not checked!
• Energy measurements vary in scope!
• Most of the codes may be optimized in the future!
Use all the caution possible when drawing conclusions!


[Plot: STREAM Performance Space – Bandwidth [Bytes/s] vs. Vector Length [Bytes] for the BSC, CaSToRC, LRZ and KTH prototypes]
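For orientation, the measured quantity is triad-style streaming bandwidth; a minimal probe in the same spirit (a sketch only, not the benchmark code behind these plots):

```python
# STREAM-triad-style bandwidth probe: a = b + s*c over large vectors.
# NumPy allocates a temporary for s*c, so real memory traffic exceeds
# the nominal STREAM count and the result is a lower bound.
import time
import numpy as np

n = 20_000_000                 # vector length (elements)
b, c = np.random.rand(n), np.random.rand(n)
a = np.empty_like(b)
s = 3.0

t0 = time.perf_counter()
a[:] = b + s * c               # the triad kernel
dt = time.perf_counter() - t0

nominal = 3 * 8 * n            # STREAM convention: 2 reads + 1 write, 8 B each
print(f"nominal triad bandwidth ~ {nominal / dt / 1e9:.2f} GB/s")
```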


[Plot: STREAM Energy Efficiency – Energy Efficiency [Bytes/J] vs. Vector Length [Bytes] for the BSC, CaSToRC, LRZ and KTH prototypes]


[Plot: Linpack Performance Space – Performance [FLOP/s] vs. number of Equations for the BSC, CaSToRC, LRZ and KTH prototypes]


[Plot: Linpack Energy Efficiency – Efficiency [FLOP/J] vs. number of Equations for the BSC, CaSToRC, LRZ and KTH prototypes]
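The quantities on these two plots relate through the standard Linpack operation count for a dense \(n \times n\) system (the usual convention, not stated on the slides):

\[
\mathrm{FLOPs}(n) = \tfrac{2}{3}n^{3} + 2n^{2},
\qquad
\mathrm{Efficiency} = \frac{\mathrm{FLOPs}(n)}{\bar{P}\cdot t}\ \left[\mathrm{FLOP/J}\right],
\]

where \(\bar{P}\) is the average power draw and \(t\) the time to solution.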


Prototype evaluation goals:
• Evaluate the usefulness and usability of each current accelerator technology, e.g. Intel Many Integrated Cores (MIC), Nvidia GPU and AMD GPU, and possible alternatives to InfiniBand
• Evaluate possible low-power alternatives to current HPC systems based on System-on-Chip (SoC) solutions
• Evaluate new checkpointing technologies (FTI and MFT) in combination with different storage levels and technologies (see the sketch below)
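To illustrate the multi-level idea behind such checkpointing tools (a generic concept sketch, not the FTI or MFT API; the storage paths are stand-ins):

```python
# Generic multi-level checkpointing sketch: frequent cheap checkpoints
# to fast node-local storage (level 1), occasional ones to the durable
# parallel file system (level 2). Paths below are stand-ins.
import os
import pickle
import tempfile

TMP = tempfile.gettempdir()
LEVELS = {
    1: os.path.join(TMP, "ckpt.local"),   # stand-in for node-local SSD
    2: os.path.join(TMP, "ckpt.global"),  # stand-in for the parallel FS
}

def checkpoint(state, step, l2_every=10):
    level = 2 if step % l2_every == 0 else 1
    with open(LEVELS[level], "wb") as f:
        pickle.dump({"step": step, "state": state}, f)
    return level

state = {"field": [0.0] * 8}
for step in range(1, 4):
    state["field"][0] += 1.0              # stand-in for real computation
    print(f"step {step}: checkpoint level {checkpoint(state, step)}")
```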


Backup Material


What we do...

We Provide Generic IT Services to all Universities in Munich
• Internet access, Munich Scientific Network
• Web, E-Mail, Groupware, Databases, e-Learning, ...
• IT-Service & Support
• Users:
  – More than 90.000 students
  – More than 30.000 employees

We Provide Special IT Services to all Universities in Bavaria
• Software License Management
• Backup and Archive Services
• Competence Centre for:
  – Networks
  – High Performance and Grid Computing
  – IT-Management

We Provide Supercomputing Services to German and European Users
• Tier-0 Supercomputing Centre (SuperMUC)
• Member of the European HPC Community PRACE
• Research on Future HPC Systems:
  – Hardware Architectures
  – Programming Models & System Software
  – HPC Centre Infrastructures


Processor heat density


[Diagram: the Tier-0 Exascale Challenge at the centre, surrounded by its challenge areas – Software and Scalability, Compute Node Alternatives, Power Consumption, I/O and Memory, Resilience]


[Diagram: prototypes mapped onto the challenge areas. PRACE PP (2008-10): SGI UV and ICE; accelerators and programming models (Cell, GPU, Clearspeed, LRB); SSD. PRACE-1IP (2011-13): GPGPU virtualization; hybrid CPU/GPU; ARM; FPGA and DSP; direct liquid free cooling; NUMA-CIC; Exascale I/O; MPP I/O]


[Diagram: as before, extended with PRACE-1IP and 2IP (2012) prototypes – EURORA scalable hybrid CPU/GPU; SHAVE; ARM+GPU; AMFT – alongside the PRACE-1IP (2011-13) items]


Summary
• Investigate novel technologies that might provide unique solutions for the Tier-0 exascale challenge
• Get first-hand access to latest technology – before general availability
• Assess technologies, components, and systems for their suitability and maturity
• Guide development of future generation HPC hardware


Vision
• Strengthen Europe as leading innovator for high performance computing by working with European industry and technologies
• PRACE prototyping is a key element for the technological evolution of the PRACE Research Infrastructure


Thanks for your attendance! Questions?
