Lessons from HPCS/PERCS
Mootaz Elnozahy
Senior Manager
IBM Research, Austin
High Productivity Computing Systems
Goal:
§ Provide a new generation of economically viable high productivity computing systems for the national security and industrial user community (2010)
Impact:
§ Performance (time-to-solution): speed up critical national security applications by a factor of 10X to 40X
§ Programmability (idea-to-first-solution): reduce cost and time of developing application solutions
§ Portability (transparency): insulate research and operational application software from system
§ Robustness (reliability): apply all known techniques to protect against outside attacks, hardware faults, & programming errors
HPCS Program Focus Areas
Applications:
§ Intelligence/surveillance, reconnaissance, cryptanalysis, weapons analysis, airborne contaminant modeling and biotechnology
Fill the Critical Technology and Capability Gap
Today (late 80s HPC technology) ….. to ….. Future (Quantum/Bio Computing)
Source: DARPA
DARPA HPCS Phases
[Timeline figure, marks at 03, 06, 10. Phase I (1 yr, $3M/y): Cray, HP, IBM, SGI, Sun. Phase II (3 yr, $18M/y): IBM PERCS, Sun Hero, Cray Cascade. Phase III (4 yr): IBM, Cray.]
§ Phase I: Concept Study
– Brainstorming, new ideas & revolutionary technologies
§ Phase II: R&D
– Focused R&D and risk reduction engineering activities
– Down select based on proposal for phase 3
§ Phase III: Full-Scale Development & Manufacturing
– Prototype by end of '10
– Serial 001 in 2011
Team
§ IBM Research Watson, IBM Research Austin, IBM Software Group, IBM Systems Group
§ UT Austin, Pittsburgh, MIT, UIUC, RPI, Cornell, U of Del., LLNL, SDSC, UNM, UCB, Vanderbilt
Lesson 1: Tyranny of Linpack
§ Linpack continues to dictate final designs and procurements despite misrepresenting mainstream applications
– Forces unbalanced designs good only for Top 500
– Forces procurements [Flops/$]
§ We do not walk the talk on Linpack
§ Effect on Exascale:
– We can produce funny-looking machines that do Linpack at 1EX
– What about real applications?
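The "unbalanced designs" point can be illustrated with a roofline-style estimate. This is a minimal sketch and not from the original slides: the machine numbers and the dense-kernel intensity are hypothetical, chosen only to show how a design tuned for flops can look excellent on a Linpack-like kernel and poor on a bandwidth-bound kernel such as the STREAM triad.

```python
def attainable_gflops(peak_gflops, mem_bandwidth_gbs, flops_per_byte):
    """Roofline-style bound: the lower of the compute limit and the bandwidth limit."""
    return min(peak_gflops, mem_bandwidth_gbs * flops_per_byte)


# Hypothetical machine, for illustration only (not a real system).
PEAK_GFLOPS = 1000.0
MEM_BANDWIDTH_GBS = 100.0

# STREAM triad a[i] = b[i] + s * c[i] on doubles: 2 flops per 24 bytes moved
# (ignoring write-allocate traffic).
TRIAD_INTENSITY = 2.0 / 24.0

# Assumed intensity for a well-blocked dense kernel of the kind Linpack
# spends most of its time in. The exact value is an assumption.
DENSE_INTENSITY = 50.0


def fraction_of_peak(flops_per_byte):
    """Fraction of peak flops this hypothetical machine can sustain."""
    return attainable_gflops(PEAK_GFLOPS, MEM_BANDWIDTH_GBS, flops_per_byte) / PEAK_GFLOPS


if __name__ == "__main__":
    print(f"dense kernel: {fraction_of_peak(DENSE_INTENSITY):.1%} of peak")
    print(f"STREAM triad: {fraction_of_peak(TRIAD_INTENSITY):.1%} of peak")
```

Under these assumed numbers the dense kernel reaches the compute limit while the triad sustains under 1% of peak, which is the sense in which a Flops/$ procurement can misrepresent mainstream applications.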
Lesson 2: Reliability - Unexciting Stuff
§ Reliability was assumed
– Everybody assumed reliability is a given
– Seen as a burden
§ Insufficient metrics
– MTBF, but what does that mean?
– Impact on performance and power easy to measure, but nobody seems to bother
§ Effect on Exascale:
– Doing what we do right now means 200PF thrown on the floor
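One way to turn MTBF into a performance number is a first-order checkpoint/restart model. The sketch below is an assumption and not the speaker's method: it uses Young's approximation for the checkpoint interval, ignores restart time, and is only reasonable when the checkpoint time is much smaller than the MTBF. The 2-minute checkpoint and 100-minute MTBF are hypothetical values picked because they lose 20% of the machine, which on a 1000 PF system corresponds to the 200PF figure above.

```python
import math


def young_interval(checkpoint_time, mtbf):
    """Young's first-order estimate of the checkpoint interval that minimizes lost time."""
    return math.sqrt(2.0 * checkpoint_time * mtbf)


def wasted_fraction(checkpoint_time, mtbf, interval=None):
    """First-order fraction of machine time lost to checkpoint writes and rework after failures."""
    if interval is None:
        interval = young_interval(checkpoint_time, mtbf)
    checkpoint_overhead = checkpoint_time / interval  # time spent writing checkpoints
    expected_rework = interval / (2.0 * mtbf)         # on average half an interval is redone per failure
    return checkpoint_overhead + expected_rework


def lost_petaflops(peak_pf, checkpoint_time, mtbf):
    """Peak performance effectively thrown away under this model."""
    return peak_pf * wasted_fraction(checkpoint_time, mtbf)


if __name__ == "__main__":
    # Hypothetical inputs in minutes; not measurements from any real system.
    print(f"lost: {lost_petaflops(1000.0, 2.0, 100.0):.0f} PF of 1000 PF")
```

The same function makes the sensitivity visible: in this model, halving the MTBF at a fixed checkpoint time raises the wasted fraction by a factor of about 1.4.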
Lesson 3: Application Saga
§ Some applications were defined from day 1
– Linpack, STREAM
– GUPS, PTRANS
§ But real applications?
– 3.5 years into the program
– Caught the tail-end of the concept phase, had trouble making impact
§ For Exascale:
– Please define the application set before issuing the RFP!!!
Lesson 4: Software Inertia
§ HPCS got new programming languages and environments
– X10, Fortress and Chapel
– Programming environments and tools
§ So what are we using today?
– Emacs/vi/favorite editor + MPI + printf
§ For Exascale:
– Do not underestimate the software inertia
– If you plan new software, start yesterday and hire an army
Lesson 5: Knowledge Gaps
§ Application experts
§ Numerical analysts
§ System experts
§ Computer scientists
§ Engineers
§ For Exascale:
– Co-design? How do you co-design?
Lesson 6: Algorithmic Research
§ One algorithmic change was worth 4X improvement in performance
– 4X just by better h/w is expensive
– 4X just by software tuning assumes really bad starting point
§ The least investigated component
§ For Exascale:
– We need fundamental work at the algorithmic level that takes into account power constraints, scalability challenges and potentially reliability issues.
Lesson 7: Lack of Pipe Cleaners
§ Proof of concept
§ Testbed & evaluation
§ Prototype
§ Product
§ For Exascale:
– Too many new technologies
– You need pipe cleaners
Lesson 7: Universality Requirements
§ Everyone wants to help!
§ Everyone wants to have an impact!
§ Everyone has a priority/pet peeves/...
§ For Exascale:
– Program must never lose focus
– The challenges are tall enough, don't pile more
Lesson 8: Role of Academia
§ Ideas
§ Students