Slides for this Presentation - Cray User Group

cug.org


CUG 2007

Overview
■ Introduction
■ Seshat
■ Experiments
■ Experiments II
■ Experiments III
■ Related Work
■ Summary and Future Work




Introduction
■ Many applications today are so complex (and dynamic) that it is very difficult to predict their message-passing patterns and behavior
■ MPI traces can help analyze applications
■ Traces can also be used to feed simulators for next-generation systems
■ Problem: extracting traces changes application behavior
■ This talk presents preliminary results for an intrusion-free MPI trace collector


Seshat


Design
■ Execution-driven network simulator
◆ Current simulator is simple; uses Red Storm parameters
◆ Plans to make it parallel and topology-aware
■ Uses the MPI profiling interface to hook into existing applications
◆ No code instrumentation; only a re-link is needed
■ Runs each node in virtual time, set by the simulator
◆ MPI_Wtime() returns virtual time
■ The network simulator collects statistics about every message in the application
■ Can write this information to a trace file without disturbing virtual time
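
The virtual-time idea on the Design slide can be sketched in a few lines. This is a minimal Python model under stated assumptions, not Seshat's implementation: the class names `NetworkSim` and `VirtualNode`, the latency-plus-bandwidth cost model, and the parameter values are all hypothetical (they are not the Red Storm parameters the simulator uses). Compute time is passed in explicitly instead of being measured, so the example is deterministic. It illustrates the key property: a receive completes at the simulator's arrival time, `wtime()` (standing in for `MPI_Wtime()`) returns virtual time, and time spent writing trace data is tracked separately and never added to the virtual clock.

```python
from collections import defaultdict, deque
from dataclasses import dataclass, field

# Hypothetical cost-model parameters, chosen for illustration only;
# they are NOT the Red Storm values used by the actual simulator.
LATENCY_S = 5e-6        # fixed per-message latency, seconds
BANDWIDTH_BPS = 2e9     # link bandwidth, bytes per second


@dataclass
class NetworkSim:
    """Toy network model that delivers messages in virtual time."""
    latency: float = LATENCY_S
    bandwidth: float = BANDWIDTH_BPS
    # (src, dst, tag) -> FIFO of arrival times, so matching preserves send order
    in_flight: dict = field(default_factory=lambda: defaultdict(deque))
    # one statistics record per message: (src, dst, send_time, net_time, tag, nbytes)
    events: list = field(default_factory=list)

    def send(self, src, dst, tag, nbytes, send_time):
        net_time = self.latency + nbytes / self.bandwidth
        self.in_flight[(src, dst, tag)].append(send_time + net_time)
        self.events.append((src, dst, send_time, net_time, tag, nbytes))

    def arrival(self, src, dst, tag):
        # Raises IndexError if no matching message was sent (this toy model never blocks).
        return self.in_flight[(src, dst, tag)].popleft()


class VirtualNode:
    """One MPI rank whose clock advances only via compute time and the simulator."""

    def __init__(self, rank, sim):
        self.rank = rank
        self.sim = sim
        self.clock = 0.0           # virtual time, seconds
        self.trace_overhead = 0.0  # wall-clock time spent tracing, kept separate

    def compute(self, elapsed):
        # Local computation advances virtual time by its duration.
        self.clock += elapsed

    def send(self, dst, tag, nbytes):
        self.sim.send(self.rank, dst, tag, nbytes, self.clock)

    def recv(self, src, tag):
        # A blocking receive completes at the later of "now" and the arrival time.
        self.clock = max(self.clock, self.sim.arrival(src, self.rank, tag))

    def write_trace(self, wall_seconds):
        # Tracing cost is recorded but deliberately NOT added to self.clock.
        self.trace_overhead += wall_seconds

    def wtime(self):
        # Stand-in for MPI_Wtime(): returns virtual time, not wall-clock time.
        return self.clock
```

For example, if rank 0 computes for 1 ms and then sends 8,000 bytes to rank 1, rank 1's `wtime()` after the receive is 1 ms plus the modeled network time (5 µs + 4 µs), no matter how long rank 1 spent in `write_trace`.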


Experiments


All-to-All Benchmark on 128 Nodes
[Figure: 128-node All-to-All benchmark; time (ms) vs. number of ints exchanged, with and without tracing]
■ 10 runs on the same nodes, alternating tracing on and off


NAS LU Class A on 4 Nodes
[Figure: LU class A on 4 nodes; reported time (s) for five test runs (Run A to Run E), simulated with tracing vs. native]


NAS LU Class A on 64 Nodes
[Figure: LU class A on 64 nodes; reported time (s) for five test runs (Run A to Run E), simulated with tracing vs. native]


Trace Format
■ Time of event at the network simulator
■ Source of the message (or root, for a collective)
■ Destination
■ Virtual send time
■ Simulated time in the network
■ MPI tag
■ Type of collective
■ Length of message in bytes
■ ASCII format, ≈90 bytes per event
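
A per-event record with the eight fields listed above can be written and parsed as one ASCII line. This is a hedged Python sketch: the class name `TraceEvent`, the field order, the whitespace separator, the nine-digit time precision, and the `"p2p"` marker for point-to-point messages are all assumptions, since the slide gives only the fields and the ≈90-byte average.

```python
from dataclasses import dataclass


@dataclass
class TraceEvent:
    sim_time: float    # time of event at the network simulator
    source: int        # sender, or root of a collective
    dest: int          # destination rank
    send_time: float   # virtual time at which the message was sent
    net_time: float    # simulated time the message spent in the network
    tag: int           # MPI tag
    collective: str    # type of collective; "p2p" for point-to-point (assumed marker)
    nbytes: int        # message length in bytes

    def to_ascii(self) -> str:
        # One whitespace-separated line per event; order and precision are assumptions.
        return (f"{self.sim_time:.9f} {self.source} {self.dest} "
                f"{self.send_time:.9f} {self.net_time:.9f} "
                f"{self.tag} {self.collective} {self.nbytes}\n")

    @classmethod
    def from_ascii(cls, line: str) -> "TraceEvent":
        # Inverse of to_ascii: split on whitespace and convert each field back.
        t, s, d, st, nt, tag, coll, n = line.split()
        return cls(float(t), int(s), int(d), float(st), float(nt),
                   int(tag), coll, int(n))
```

A plain-text record like this is easy to inspect and post-process, at the cost of being several times larger than a packed binary record of the same eight fields.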


Trace Statistics

| Code       | Nodes | Events      | Wall-clock time w/o trace | Wall-clock time w/ trace | Trace size |
|------------|-------|-------------|---------------------------|--------------------------|------------|
| All-to-all | 128   | 4,826,000   | 1,300 s                   | 15,671 s                 | 397 MB     |
| LU, A      | 4     | 126,635     | 30 s                      | 391 s                    | 11 MB      |
| LU, A      | 16    | 759,699     | 10 s                      | 2,288 s                  | 63 MB      |
| LU, A      | 64    | 3,545,003   | 4 s                       | 10,581 s                 | 285 MB     |
| LU, A      | 256   | > 7,172,517 | 3 s                       | > 21,557 s               | > 589 MB   |

■ 256-node LU job killed after 6 hours
■ Trace file written to home directory (NFS, not a parallel file system)
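
Two ratios make the Trace Statistics slide easier to read: the wall-clock slowdown caused by tracing and the trace size per event. The short Python sketch below derives both from the slide's rows; the function name `derived` is hypothetical, the 256-node row is left out because its values are only lower bounds, and whether "MB" means 10^6 or 2^20 bytes is an assumption exposed as a parameter.

```python
# Rows from the Trace Statistics slide (256-node row omitted: its values are lower bounds).
# (code, nodes, events, wall_s_without_trace, wall_s_with_trace, trace_mb)
ROWS = [
    ("All-to-all", 128, 4_826_000, 1_300, 15_671, 397),
    ("LU, A", 4, 126_635, 30, 391, 11),
    ("LU, A", 16, 759_699, 10, 2_288, 63),
    ("LU, A", 64, 3_545_003, 4, 10_581, 285),
]


def derived(row, mb=10**6):
    """Return (wall-clock slowdown factor, trace bytes per event).

    Assumes 1 MB = 10**6 bytes by default; pass mb=2**20 for MiB."""
    code, nodes, events, t_off, t_on, size_mb = row
    return t_on / t_off, size_mb * mb / events


for row in ROWS:
    slowdown, bpe = derived(row)
    print(f"{row[0]:<10} {row[1]:>4} nodes: {slowdown:8.1f}x slower, {bpe:5.1f} bytes/event")
```

With 1 MB taken as 10^6 bytes this gives roughly 80 to 87 bytes per event (about 84 to 91 with MiB), consistent with the ≈90 bytes quoted for the trace format. The wall-clock slowdown is about 12× for All-to-all on 128 nodes, while for LU it grows from about 13× on 4 nodes to about 229× on 16 nodes and about 2,645× on 64 nodes.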


NAS SP Class A on 16 Nodes
[Figure: SP class A on 16 nodes; reported time (s) for five test runs (Run A to Run E), simulated with tracing vs. native]
■ Reported time with tracing is 6.5% higher


NAS SP Class A on 64 Nodes
[Figure: SP class A on 64 nodes; reported time (s) for five test runs (Run A to Run E), simulated with tracing vs. native]
■ Reported time with tracing is 40% higher!
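
The "X% higher" figures compare the time the benchmark reports for itself (virtual time when running under Seshat, since MPI_Wtime() returns virtual time) with the native run. Because the goal is that tracing not disturb virtual time, the ideal value is 0%. A small Python sketch, with hypothetical function names, converts between such a percentage and the equivalent slowdown factor:

```python
def overhead_pct(traced_s: float, native_s: float) -> float:
    """Percent by which the traced run's reported time exceeds the native run's."""
    return (traced_s - native_s) / native_s * 100.0


def slowdown_factor(pct: float) -> float:
    """Convert an 'X% higher' figure into the ratio traced / native."""
    return 1.0 + pct / 100.0
```

For instance, the 40% reported for SP class A on 64 nodes corresponds to a factor of 1.4, while the 385% and 3,557% reported for CG correspond to factors of about 4.85 and 36.6.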


Experiments II


NAS CG Class A on 16 Nodes
[Figure: CG class A on 16 nodes; reported time (s) for five test runs (Run A to Run E), simulated with tracing vs. native]
■ Reported time with tracing is 48% higher!


NAS CG Class A on 256 Nodes
[Figure: CG class A on 256 nodes; reported time (s) for five test runs (Run A to Run E), simulated with tracing vs. native]
■ Reported time with tracing is 385% higher!
■ Does benchmark class or trace size matter?


NAS CG Class A on 64 Nodes
[Figure: CG class A on 64 nodes; reported time (s) for five test runs (Run A to Run E), simulated with tracing vs. native]
■ Reported time with tracing is 110% higher!
■ Number of events: 269,501


NAS CG Class B on 64 Nodes
[Figure: CG class B on 64 nodes; reported time (s) for five test runs (Run A to Run E), simulated with tracing vs. native]
■ Reported time with tracing is 25% higher!
■ Number of events: 1,279,421 (about 4.7 times as many as class A)


NAS CG Class C on 64 Nodes
[Figure: CG class C on 64 nodes; reported time (s) for five test runs (Run A to Run E), simulated with tracing vs. native]
■ Reported time with tracing is 3,557% higher!
■ Number of events: 1,279,421 (same as class B)
■ The problem is not the benchmark class or the number of events!


Experiments III


NAS CG on 64 Nodes
[Figure: CG on 64 nodes; reported time (s) for classes A, B, and C over seven test runs (Run A to Run G), simulated without tracing vs. native]
■ The bug appears to be in the virtual-time adjustment
■ Delays caused by tracing exacerbate the problem




Related Work
■ Two ways to assess message-passing behavior:
◆ Collect complete trace data, but alter application behavior
◆ Collect only statistics
■ Need to reduce trace size and computation time
■ E.g., the IPDPS'07 paper by Michael Noeth et al. compresses traces, but leaves timing information out
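
Why leaving timing out helps compression can be seen with a deliberately simple scheme. The sketch below is plain run-length encoding over consecutive events, not the algorithm of the IPDPS'07 paper; the tuple layout follows the eight trace fields from the Trace Format slide in an assumed order, and the function names are hypothetical. Events from a loop that differ only in their timestamps collapse into a single entry once the timing fields are dropped, whereas the full records do not compress at all.

```python
from itertools import groupby

# Assumed field order, following the Trace Format slide:
# (sim_time, source, dest, send_time, net_time, tag, collective, nbytes)
TIMING_FIELDS = {0, 3, 4}


def strip_timing(event):
    """Drop the three timing fields, keeping source, dest, tag, collective, nbytes."""
    return tuple(v for i, v in enumerate(event) if i not in TIMING_FIELDS)


def rle(events):
    """Run-length encode consecutive identical events as (event, count) pairs."""
    return [(key, sum(1 for _ in group)) for key, group in groupby(events)]
```

For 1,000 iterations of the same send, `rle` over the full records returns 1,000 entries, while `rle(strip_timing(e) for e in events)` returns one entry with a count of 1,000. Keeping timing while still compressing requires something smarter, such as storing timing statistics per run instead of per event.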




Summary and Future Work
■ Fix timing bug
■ Proof of concept
■ Clearly need to compress data
◆ Buffer traces in the simulator node or on a dedicated buffer node to reduce wall-clock time
■ Customizable trace format and filter
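
The buffering and filtering items can be sketched together. This Python class is hypothetical, since the slide does not specify a design: it applies an optional user-supplied filter, accumulates trace lines in memory, and issues one large write per `flush_bytes` of data instead of one write per event. Batching of this kind is one plausible way to cut the wall-clock cost of writing to a file system such as the NFS home directory used in the experiments.

```python
import io


class BufferedTraceWriter:
    """Hypothetical buffered, filterable trace writer (illustrative sketch)."""

    def __init__(self, sink, flush_bytes=1 << 20, keep=None):
        self.sink = sink                  # any object with a write(str) method
        self.flush_bytes = flush_bytes    # flush once this many characters are buffered
        self.keep = keep if keep is not None else (lambda line: True)
        self.buf = []
        self.buffered = 0
        self.writes = 0                   # number of write calls issued to the sink

    def record(self, line):
        if not self.keep(line):           # user-supplied filter drops unwanted events
            return
        self.buf.append(line)
        self.buffered += len(line)        # characters == bytes for ASCII trace lines
        if self.buffered >= self.flush_bytes:
            self.flush()

    def flush(self):
        if self.buf:
            self.sink.write("".join(self.buf))
            self.writes += 1
            self.buf.clear()
            self.buffered = 0


if __name__ == "__main__":
    out = io.StringIO()
    writer = BufferedTraceWriter(out, flush_bytes=900)
    for _ in range(100):
        writer.record("x" * 89 + "\n")    # 90-character dummy trace lines
    writer.flush()
    print(writer.writes, "writes for", len(out.getvalue()), "characters")
```

The demo issues 10 writes for 100 lines instead of 100. A filter such as `keep=lambda line: " p2p " in line` would restrict the trace to point-to-point messages under the line format assumed here.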


Questions?
