3.3 Computer Science - Center for Simulation of Advanced Rockets

3.3 Computer Science

Group Leaders: Michael Heath and Laxmikant Kale
Faculty: Eric de Sturler, Michael Heath, Laxmikant Kale, and Marianne Winslett
Research Scientists/Programmers: Phillip Alexander, Michael Campbell, Robert Fiedler, Damrong Guoy, Xiangmin (Jim) Jiao, Orion Lawlor, and John Norris
Post-Doctoral Associates: Jonghyun Lee, Terry Wilmarth, and Eric Shaffer
Graduate Research Assistants: William Cochran, Rebecca Hartman-Baker, Rajeev Jaiman, Soumyadeb Mitra, Michael Parks, Manoj Parmer, Christopher Seifert, Rishi Sinha, Pornput Suriyamongkol, Evan Vanderzee, Hanna Vanderzee, Terry Wilmarth, and Gengbin Zheng

Overview

The computer science group continues to provide technologies and tools used by the rest of CSAR. Production runs of the integrated Rocstar code use our automatic tool Makeflo to partition the structured mesh for Rocflo, the tools provided by the Charm++ FEM Framework to partition the unstructured mesh for Rocflu, and our adaptive version of MPI, AMPI, to provide parallel communication along with enhanced latency tolerance, automated load balancing, and numerous other benefits. Various aspects of our research in computer science are summarized in the following sections.

Research in the Computer Science Group is organized into two teams: Computational Environment, and Computational Mathematics and Geometry. The goal of the Computational Environment team is to provide the broad computational infrastructure necessary to support complex, large-scale simulations in general, and rocket simulation in particular. Areas of research include parallel programming environments, compiler optimization and parallelization, parallel performance analysis and tools, parallel input/output, and visualization.
Work in Computational Mathematics and Geometry focuses on the areas of linear solvers, mesh generation and adaptation, and interpolation between diverse discretizations.

Computational Environments

The Computer Science Group has provided support for several key technologies used by the rest of the Center. Our unstructured mesh support library, the FEM Framework based on Charm++, is in full production use by the Fluids Group and in the integrated code. During the next year, we plan to improve the meshing, mesh partitioning, and remeshing support in the Charm++ FEM Framework.

Lagrangian Interface Propagation (Jiao)

In Rocstar, the interface must be tracked as it regresses due to burning. In recent years, Eulerian methods, especially level set methods, have advanced significantly and have become the dominant methods for moving interfaces. In our context, however, a Lagrangian representation of the interface is crucial for describing the boundary of the volume meshes of the physical regions. Previous numerical methods, whether Eulerian or Lagrangian, have difficulty capturing evolving singularities (such as ridges and corners) in solid rocket motors.

To meet this challenge, we have developed a novel approach, called face-offsetting methods, based on a new entropy-satisfying Lagrangian formulation. Our face-offsetting methods deliver an accurate and stable entropy-satisfying solution without requiring Eulerian volume meshes.
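The core geometric idea can be sketched in two dimensions: offset each face (here, each polygon edge) along its normal by the burn distance, then rebuild every vertex from its adjacent offset faces by least squares. The sketch below is illustrative only, a uniform-speed 2D analogue written with NumPy, not CSAR's actual 3D scheme.

```python
import numpy as np

def face_offset_step(verts, speed, dt):
    """One face-offsetting step for a closed, counter-clockwise 2D polygon.
    Each edge is offset along its unit outward normal by speed*dt; each
    vertex is then reconstructed as the least-squares intersection of its
    two adjacent offset edges (lstsq also handles collinear edges)."""
    verts = np.asarray(verts, dtype=float)
    n = len(verts)
    normals = np.empty((n, 2))
    rhs = np.empty(n)   # offset edge i is the line {x : normals[i] @ x = rhs[i]}
    for i in range(n):
        a, b = verts[i], verts[(i + 1) % n]
        t = b - a
        nrm = np.array([t[1], -t[0]])
        nrm /= np.linalg.norm(nrm)             # unit outward normal (CCW polygon)
        p = 0.5 * (a + b) + speed * dt * nrm   # edge midpoint moved to the offset line
        normals[i] = nrm
        rhs[i] = nrm @ p
    new_verts = np.empty_like(verts)
    for i in range(n):
        # Vertex i lies on offset edges i-1 and i; solve the 2x2 system.
        A = np.vstack([normals[i - 1], normals[i]])
        b = np.array([rhs[i - 1], rhs[i]])
        new_verts[i] = np.linalg.lstsq(A, b, rcond=None)[0]
    return new_verts

# A unit square regressing (burning inward) at speed 0.1:
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(face_offset_step(square, speed=-0.1, dt=1.0))  # ~ the square shrunk to [0.1, 0.9]^2
```

Because vertices are rebuilt from the offset faces rather than moved along averaged vertex normals, a corner where two offset faces meet at an angle is reproduced exactly, which is the property that matters at ridges and fins.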
A fundamental difference between face-offsetting and traditional Lagrangian methods is that our methods solve the Lagrangian formulation face by face, and then reconstruct vertices by constrained minimization and curvature-aware averaging, instead of directly moving vertices along some approximate normal directions. This method allows part of the surface to be fixed, or to be constrained to move along certain directions (such as constraining the propellant to burn along the case). It supports both structured and unstructured meshes, with an integrated node redistribution scheme that suffices to control mesh quality for moderately moving interfaces. Figure 3.3.1 shows the propagation of a block-structured surface mesh for the fluids domain of the Attitude Control Motor (ACM) rocket, where the front and aft ends burn along the cylindrical case.

Fig. 3.3.1: Simulation of burning of the Attitude Control Motor along the case with block-structured meshes using face offsetting. Left subfigure shows initial geometry; middle and right subfigures show meshes of the initial geometry and after 30% burn, respectively. Colors indicate the magnitude of total displacements of vertices.

When coupled with mesh adaptation, the face-offsetting method can capture significant burns. Figure 3.3.2 shows a sample result of the burning of a star grain section of a rocket motor using the face-offsetting method coupled with surface remeshing using MeshSim from Simmetrix (http://www.simmetrix.com). The interior (the fins) of the propellant burns at uniform speed and exhibits rapid expansion at slots and contraction at fins. The fin tips transform into sharp ridges during propagation, as captured by the face-offsetting method.

Fig. 3.3.2: Simulation of uniform burning of a section of the star grain of a solid rocket using face offsetting and mesh repair.
Green curves indicate ridges in the evolving geometry.

Our methods are extensible in many ways. One important direction is to integrate mesh repair and collision detection to allow propagation under more complex flows. In the coming years, we will work on surface mesh adaptation for handling further geometric complexities associated with large deformation and substantial burns (such as burn-out of fins), and for handling topological changes. In addition, volume and mass conservation are important goals for which face offsetting promises a simple and reliable solution because of its flexibility in reconstructing vertices. Our current numerical schemes use piecewise linear faces, and hence allow second-order accuracy in space. The concept of the geometric construction allows extension to higher-order schemes.

Interface propagation has many applications in science, engineering, and computer graphics to which face-offsetting methods are applicable. In addition, our methods can also be applied to surface mesh smoothing, which shares two difficulties with surface propagation: point projection and enforcing nodal constraints. By posing smoothing as a propagation problem with zero speed, the constrained minimization and eigenvalue analysis of face offsetting can help address both of these problems.

Inter-Mesh Solution Transfer (Jiao, Jaiman, Geubelle, and Loth)

Rocface is the interface code that transfers data between non-matching surface meshes in an accurate and conservative manner. To achieve these goals, Rocface constructs a common refinement, that is, a mesh whose polygons simultaneously subdivide the polygons of the given input meshes. In collaboration with Jaiman, Geubelle, and Loth, we have performed a systematic comparison of the common-refinement-based method used in Rocface with other methods in the literature for fluid-solid interactions, identified conclusively the source of large errors in those traditional methods, and demonstrated the superiority of Rocface's methods.

Mesh correlation and inter-mesh solution transfer are important in many scientific and engineering applications. Besides transfer of interface data in fluid-solid interactions, other examples include data transfer after remeshing, crack propagation, and contact problems.
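In one dimension, the common-refinement construction reduces to the set of overlaps between source and destination cells, and conservative transfer of a piecewise-constant field takes only a few lines. This toy version (not Rocface itself, which handles surface meshes in 3D) illustrates the conservation property:

```python
def transfer_pwc(src_nodes, src_vals, dst_nodes):
    """Conservative transfer of a piecewise-constant field between two
    non-matching 1D meshes.  The overlaps [max(dl, sl), min(dr, sr)] are
    exactly the cells of the common refinement of the two meshes; each
    destination value is the overlap-weighted average of the source field,
    so the integral of the field over the domain is conserved."""
    dst_vals = []
    for dl, dr in zip(dst_nodes, dst_nodes[1:]):
        acc = 0.0
        for sl, sr, v in zip(src_nodes, src_nodes[1:], src_vals):
            overlap = min(dr, sr) - max(dl, sl)   # a common-refinement cell
            if overlap > 0.0:
                acc += v * overlap
        dst_vals.append(acc / (dr - dl))
    return dst_vals

src_nodes, src_vals = [0.0, 0.5, 1.0], [2.0, 4.0]
dst_nodes = [0.0, 0.25, 1.0]
dst_vals = transfer_pwc(src_nodes, src_vals, dst_nodes)
src_int = sum(v * (r - l) for l, r, v in zip(src_nodes, src_nodes[1:], src_vals))
dst_int = sum(v * (r - l) for l, r, v in zip(dst_nodes, dst_nodes[1:], dst_vals))
print(dst_vals, src_int, dst_int)   # integrals agree (both 3.0, up to rounding)
```

Pointwise interpolation onto the destination mesh would generally not preserve the integral; working cell-by-cell on the common refinement is what makes the transfer conservative.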
This correlation problem is also critical to a number of emerging numerical methods, including generalized finite element methods (GFEM) and immersed and embedded boundary methods, which are likely to be of strategic importance to CSAR's longer-term goal of simulating rockets under abnormal conditions.

Future work will focus on developing efficient and robust parallel algorithms for correlating different meshes that overlap in space but may move across, or slide along, each other, and on implementing these algorithms in an easy-to-integrate software toolkit to facilitate various advanced simulations, such as fluid-solid-combustion interaction in rocket simulations and crack propagation using GFEM.

Fig. 3.3.3: Rocface used to transfer loads and motions in a conservative and accurate manner across non-matching interfaces in multiphysics simulations.

Data Management and I/O (Winslett, Mitra, and Sinha)

During the past year, we designed and implemented a new indexing facility for use with Rocketeer (and possibly with Rocstar in the longer term). This facility allows a user to specify a region of interest in the artifact being visualized by supplying limit values on major attributes (e.g., temperature, physical location, pressure). The indexing facility locates the qualifying regions of the data quickly. When used in concert with the GODIVA buffering facility, the indexing facility allows loading of only the data of interest into memory, rather than the entire data set.
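The flavor of such an index can be sketched with equality-encoded bitmaps over binned attribute values. The attribute name, bins, and records below are made up for illustration; the actual facility is considerably more sophisticated.

```python
def build_bitmap_index(records, attr, bins):
    """Equality-encoded bitmap index over binned values of one attribute:
    bitmaps[b] has bit i set iff record i falls in bin b."""
    bitmaps = [0] * (len(bins) - 1)
    for i, rec in enumerate(records):
        for b, (lo, hi) in enumerate(zip(bins, bins[1:])):
            if lo <= rec[attr] < hi:
                bitmaps[b] |= 1 << i
    return bitmaps

def query_range(bitmaps, bins, lo, hi):
    """OR together the bitmaps of all bins overlapping [lo, hi).  A query
    constraining several attributes would AND the per-attribute results."""
    result = 0
    for b, (blo, bhi) in enumerate(zip(bins, bins[1:])):
        if bhi > lo and blo < hi:
            result |= bitmaps[b]
    return result

# Hypothetical temperature data, binned into three ranges:
records = [{"temp": 310.0}, {"temp": 1450.0}, {"temp": 2900.0}]
bins = [0.0, 1000.0, 2000.0, 3000.0]
idx = build_bitmap_index(records, "temp", bins)
hot = query_range(idx, bins, 1000.0, 3000.0)
print(bin(hot))   # -> 0b110: records 1 and 2 qualify
```

Only the regions whose bits are set need to be loaded from disk, which is the mechanism behind reading a region of interest instead of the entire data set.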
The indexing facility is based on bitmaps, and breaks new ground in the data management world in a variety of respects. The new facility is currently undergoing performance testing; a preliminary description of it has been accepted for publication in the Storage Network Architecture and Parallel I/O workshop collocated with PACT05.

To support Rocpanda's role as the main I/O module for Rocstar, we also updated Rocpanda to work with Rocstar 3. This required major modifications to support Rocstar's new data formats and to change the way that data are sent to Rocpanda servers. Also, our GODIVA library for data-format-independent prefetching and caching of scientific data continued to be used in the production versions of both the batch and interactive versions of the Rocketeer visualization tools. The figure included below shows the performance benefits that accrue from the use of GODIVA in a typical batch run of Rocketeer.

Finally, we designed and implemented a log-based facility that uses idle local disks to minimize the apparent run-time cost of data writes performed by simulation applications, and evaluated it with rocket simulation data. This facility can be used by Rocstar in the future if the cost of immediate writes to remote storage becomes prohibitively high, which may happen as the level of computational parallelism continues to increase on Rocstar's platforms. The results of this work will appear in the Cluster Computing 2005 conference.

During the next two years, we plan to complete our new indexing facility and integrate it into Rocketeer (and, if appropriate, into Rocstar). This should provide large performance gains in situations where one wants to examine only part of a large data set. We will also continue to support the evolution of Rocstar and Rocketeer through updates to Rocpanda and GODIVA as needed. Further, we will evaluate the performance-improvement potential of passing I/O-related hints to the storage servers used in large parallel computations, to allow the storage servers to tune themselves better for upcoming I/O requests. Finally, we will continue our effort to transfer the technology of active buffering and data migration to the ROMIO parallel I/O library.
We intend the fruit of these technology transfer activities to be a lasting legacy of CSAR that will benefit the thousands of scientists who use ROMIO and netCDF.

Remeshing (Wilmarth, Lawlor)

Over the past six months, we have worked on the Rocrem remeshing module. Rocrem is used when a physics solver reaches a point of failure due to deteriorated mesh quality. Separate mesh smoothing and local mesh repair efforts are ongoing; remeshing handles the worst-case scenario, when smoothing and local repair fail to improve mesh quality sufficiently for the solver to continue.

Fig. 3.3.4: Minimum dihedral angle before and after remeshing.

Remeshing efforts were begun by Orion Lawlor via a proof-of-concept code using YAMS and TetMesh to perform serial remeshing and Charm++ for parallel solution data transfer. Simmetrix meshing software was designated to replace YAMS and TetMesh, since it has a usable API and runs in parallel. Terry Wilmarth has extended this work to convert Rocrem to use Simmetrix (Figure 3.3.4).

Rocrem was tested with various meshes, resulting in improvements in the initial surface mesh stitcher and parallel solution data transfer phases of the code. It now produces higher-quality meshes with correct solution data. We have added functionality to make the regional granularity of the new mesh similar to that of the original mesh.

Recent efforts have focused on restarting. Four types of Rocflu restart files are needed for each partition. Rocrem now extracts data from the new mesh and prepares these files. We are ready to test the solver restart capabilities from a remeshed rocket. Additional efforts involved improving the quality of the meshes produced by Simmetrix; initial experience has yielded lower-quality meshes than those obtained with YAMS and TetMesh.
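The quality measure tracked in Figure 3.3.4, the minimum dihedral angle, can be computed per tetrahedron from the two faces sharing each edge. The small self-contained routine below is our own illustration, not Rocrem's code:

```python
import itertools
import math

def min_dihedral_deg(p0, p1, p2, p3):
    """Minimum dihedral angle of a tetrahedron, in degrees.  For each of
    the six edges, the dihedral angle is the angle between the projections
    of the two opposite vertices onto the plane normal to that edge;
    equivalently, the angle between e x (pk - pi) and e x (pl - pi)."""
    pts = [p0, p1, p2, p3]
    def sub(a, b):   return tuple(x - y for x, y in zip(a, b))
    def dot(a, b):   return sum(x * y for x, y in zip(a, b))
    def cross(a, b): return (a[1]*b[2] - a[2]*b[1],
                             a[2]*b[0] - a[0]*b[2],
                             a[0]*b[1] - a[1]*b[0])
    def norm(a):     return math.sqrt(dot(a, a))
    angles = []
    for i, j in itertools.combinations(range(4), 2):
        k, l = [m for m in range(4) if m not in (i, j)]
        e = sub(pts[j], pts[i])
        n1 = cross(e, sub(pts[k], pts[i]))   # in-plane direction toward k, rotated 90 deg
        n2 = cross(e, sub(pts[l], pts[i]))   # in-plane direction toward l, rotated 90 deg
        c = dot(n1, n2) / (norm(n1) * norm(n2))
        angles.append(math.degrees(math.acos(max(-1.0, min(1.0, c)))))
    return min(angles)

regular = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
print(round(min_dihedral_deg(*regular), 2))   # -> 70.53, the regular-tet dihedral
```

Slivers of the kind local repair targets score a few degrees on this metric, while well-shaped tetrahedra sit near the regular-tetrahedron value of about 70.5 degrees.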


Wilmarth has continued work on ParFUM, the Charm++ Parallel Framework for Unstructured Meshes, in collaboration with Prof. Geubelle and his students. She has implemented parallel refinement and coarsening algorithms that can be used by a solver to refine mesh regions where solution accuracy is of the highest priority and to coarsen where there is less activity (Figure 3.3.5).

Fig. 3.3.5: Refining along the wavefront, coarsening behind.

Plans for the future of Rocrem include:
• Test the restart capabilities: proceed with this in parallel with working out the mesh quality issues, since in some cases Simmetrix produces adequately improved meshes.
• Resolve remaining Simmetrix mesh quality issues; these efforts are ongoing.
• Integrate Rocrem for automatic on-line remeshing: the goal of Rocrem is to enable full burns of rockets. We are already at the testing stage for this, so the next logical step is to enable full burn with little or no user interaction required. This will involve implementing an on-line interface using Roccom to extract the mesh directly from memory and perform Rocrem's task. It will also involve changes to Rocflu to enable restart with a new in-memory mesh, removing the dependency on the several restart files currently required.

Performance Tools and Tuning (Campbell, Alexander, and Haselbacher)

Rocprof, the profiling module in Rocstar, has been redesigned and adapted for use with Roccom 3 and Rocstar 3. The new Rocprof has been designed as a light, stand-alone CS service module rather than a tightly integrated submodule of Roccom. As a new CS service module, Rocprof complements Roccom's automatic high-level profiling by providing enhanced profiling services to any code in the Rocstar suite. Rocprof's services are provided through native and standard MPI_PCONTROL interfaces, allowing use by any serial or parallel application. The Rocprof suite now also includes post-processing tools with which to archive, report, and analyze multiple sets of parallel and serial performance data. Rocprof's new design consolidates the stand-alone and integrated code bases, streamlining further development. Support for Mac OS X 10.3 and 10.4 has been added.

Fig. 3.3.6: Rocprof used to profile individual Rocstar modules. Chart shows MFLOPS per processor on Frost for Rocstar, Rocfrac, Rocflo, Rocsolid, and Rocflu.

We have been putting Rocprof to good use in profiling and tuning Rocstar 3 components. Submodule profiling has enabled us to identify and eliminate many performance bottlenecks in Rocstar physics and service modules. Ongoing performance analysis and tuning efforts for Rocflu, Rocstar's unstructured fluid solver, have yielded substantial performance boosts of up to a factor of two for critical code sections. We have been collaborating with Livermore Computing in the Rocflu tuning effort. Rocprof has also identified several bottlenecks in Rocstar's mesh smoothing service module, Rocmop; subsequent elimination of these bottlenecks has reduced overall smoothing overhead by a factor of two.

Project Plans for 2006

• Recovery. We will recover functionality left behind in the redesign of Rocprof:
  – Intermediate data dumps, which allow very long simulations to continue without introducing profiling overhead. This feature is key to allowing detailed profiling during production runs, where performance is important. High priority.
  – Version-independent PAPI support, which allows us to obtain hardware counter access on Linux systems. As we still have counter access on IBM SP systems, this is a medium-priority item.
  – Thread-safety shoring to ensure proper operation under Charm++. High priority.
• Enhancement. We plan the following enhancements to Rocprof's capabilities:
  – Multiple profile levels to allow varying degrees of detail. Low priority.
  – Module-based control, allowing independent profiling control for each Rocstar module. Medium priority.
  – Control-file input to allow fine-grain control of profiling. High priority.
  – A CHUD tool interface allowing statistical kernel-based profiling in the Mac OS X environment. Medium priority.
• Utilization. We plan to conduct the following performance studies in addition to nominal Rocstar performance monitoring:
  – Rocflu profiling and tuning on multiple platforms.
This activity is already underway. There is limited literature available on performance enhancement of unstructured finite element codes, and we hope to produce a conference paper and a journal paper out of this activity.
  – Rocpart profiling and tuning as integrated with both structured and unstructured fluid codes.

Fig. 3.3.7: Rocprof used to identify bottlenecks in Rocflu. Chart shows seconds spent in Interface, Rocfrac, and Rocflu before and after tuning; overhead was reduced by a factor of two.

Mesh Smoothing (Alexander, Campbell, Jiao, Shaffer, E. Vanderzee)

In 2004, we introduced a new module, Rocmop, to alleviate the degradation of mesh quality as a simulation progresses. This deterioration necessitates ever-shorter global time steps, and may even result in simulation failure. The intent of Rocmop is to provide black-box parallel mesh-smoothing routines supporting both volume mesh smoothing and feature-preserving surface mesh smoothing.

Fig. 3.3.8: Mesh smoothing used to minimize the requirement for full motor remeshing. ACM motor shown at t = 0.0, 3.0, 6.0, and 8.6 ms.

In our simulations, interior mesh quality issues tend to appear earlier than problems on the surface. Therefore, our initial efforts have concentrated on volume mesh smoothing. Our implementation uses MESQUITE, a mesh quality toolkit developed at Sandia National Laboratories, as a serial core, in conjunction with simple partition-boundary smoothing techniques, to provide volume mesh optimization in parallel. Recent results have shown that Rocmop is a viable tool for prolonging simulation life, and Rocmop is now utilized in some of our largest runs, including those involving the NASA space shuttle's RSRM rocket booster and the Titan IV PQM-1 rocket motor.

Mesh Smoothing Related DOE Interactions

CSAR's continuing development of a parallel mesh-optimization module utilizes the MESQUITE mesh quality toolkit developed primarily at Sandia National Laboratories. This effort is aided by ongoing interaction with national laboratory personnel. To date, we have conferred with Lori Freitag of LLNL as well as Patrick Knupp and Mike Brewer, both employed at SNL. Two items in particular underline the increasing depth of CSAR's relationship with this project's researchers. The first is that, at our request, a version of MESQUITE was delivered that provides support for additional mesh element types critical to our simulations; this level of commitment to improving MESQUITE's viability as a general-purpose tool has exceeded our expectations. The other is the recent internship of a CSAR student, Evan VanderZee, with the MESQUITE group at Sandia. CSAR will soon adopt reference-mesh-based quality metrics, with which Evan now has first-hand knowledge.

Mesh Smoothing Plans

Substantial improvements are planned for Rocmop's volume smoothing capabilities. The current version of the module has inefficiencies that will be addressed during fall 2005.
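As a rough serial analogue of what Rocmop does with MESQUITE, the sketch below applies Laplacian smoothing to interior nodes while holding boundary (or partition-boundary) nodes fixed. This is an assumption-laden simplification: MESQUITE's actual optimizers minimize element-quality metrics rather than plain neighbor averaging.

```python
import numpy as np

def laplacian_smooth(verts, edges, fixed, iters=20, relax=0.5):
    """Move each free node a fraction `relax` of the way toward the
    average of its neighbors, repeatedly; nodes in `fixed` never move.
    In a parallel setting the fixed set would include partition
    boundaries, smoothed separately."""
    verts = np.asarray(verts, dtype=float).copy()
    nbrs = {i: [] for i in range(len(verts))}
    for a, b in edges:
        nbrs[a].append(b)
        nbrs[b].append(a)
    for _ in range(iters):
        for i, ns in nbrs.items():
            if i in fixed or not ns:
                continue
            target = verts[ns].mean(axis=0)   # centroid of the neighbors
            verts[i] += relax * (target - verts[i])
    return verts

# A badly placed interior node connected to four fixed corners:
corners = [(0, 0), (2, 0), (2, 2), (0, 2)]
verts = corners + [(1.8, 0.3)]
edges = [(4, 0), (4, 1), (4, 2), (4, 3)]
out = laplacian_smooth(verts, edges, fixed={0, 1, 2, 3})
print(out[4])   # -> approximately (1.0, 1.0), the centroid
```

Node repositioning of this kind preserves the mesh connectivity, which is exactly the property that distinguishes smoothing from the local-repair and remeshing stages discussed elsewhere in this section.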
Later in the year, and extending into the spring of 2006, development will focus on incorporating the newly available version of MESQUITE, with its support for pyramids and prisms. This work will enable all of the physics codes to utilize Rocmop, whereas the current version supports only Rocflu. At the same time, a new reference-mesh-based quality metric will be adopted. Useful characteristics of this class of metrics include robust detection of poor-quality elements (dihedral angle is not robust), increased respect for mesh sizing, and the absence of start-up issues currently caused by initial smoothing displacements.

A parallel effort will restart development of surface smoothing in Rocmop. With simulations progressing further toward burnout, this capability becomes increasingly important. In 2005, CSAR continued to develop surface smoothing capabilities indirectly under the auspices of Rocprop, a surface propagation module. Core surface routines for feature detection and classification that are needed by both Rocmop and Rocprop will be separated from Rocprop. We will then concentrate on building higher-order geometric representations of the surface mesh and corresponding schemes to project points onto these surfaces. During the first half of 2006, CSAR will use these surface routines to introduce parallel feature-preserving surface smoothing, either through custom code or via further integration with MESQUITE.

I/O Capability Enhancements (Norris)

During the last year, we have worked to improve the I/O capabilities of Rocstar.
Norris has expanded the two new Roccom I/O modules, Rocin and Rocout, to support CGNS 2.4, and also worked with the CGNS development team to include support for ghost data in unstructured grids, a necessary feature for our data. CGNS will be a welcome change from HDF4, which presents many problems, including slow file access and nondeterministic I/O errors. The CGNS reader plug-in for Rocketeer is also being updated to support the new version.

Rocman (Zheng, Jiao)

We redesigned Rocman, the control and orchestration module, to coordinate multiple physics modules in coupled simulations and to provide facilities to extend and implement new coupling schemes. With a completely new design, using the idea of action-centric specification and automatic scheduling of reusable actions to describe intermodule interactions, the new Rocman accommodates the diverse needs of different applications and coupling schemes in an easy-to-use fashion. The new Rocman also provides better readability and maintainability by promoting code reuse and modularity.

Rocman currently supports FluidAlone without burn, FluidAlone with burn, SolidAlone without burn, Solid/Fluid coupling schemes without burn, Fluid/Solid coupling schemes without burn, and the fully coupled scheme with Fluid/Solid/Burn. The new Rocman also provides a visualization tool that allows a user to examine a coupling scheme and verify its correctness visually. The visualization of a coupling scheme shows the internal control flow of actions and the data flow, as in Figure 3.3.9.

Fig. 3.3.9: Solid/fluid coupling scheme visualization in aiSee.

Parallel Computing

AMPI (Kale, Zheng, Lawlor, Wilmarth)

We continue to develop AMPI, our version of MPI that supports virtual processors. We have greatly improved AMPI performance on the new Apple Turing cluster with Myrinet interconnect. We dramatically improved communication performance for both small and large messages, as well as collective communication such as broadcasts, by optimizing Charm++'s Myrinet machine layer. Rocstar benefits directly from this improvement, since almost every production run of the integrated code on the Turing cluster uses AMPI.

Load balancing. Many scientific applications, including rocket simulation, involve simulations with dynamic geometry and use adaptive techniques to solve highly irregular problems. In these applications, load balancing is one of the key factors for achieving high performance on large parallel machines.
Load balancing for extremely large parallel machines, such as Blue Gene/L, is ever more challenging. We designed and implemented a new hierarchical load balancing algorithm for very large parallel machines and demonstrated its performance using the BigSim performance simulator we developed.

Rocman 3. We will continue to work on Rocman 3. Future work includes:
• Incorporate all existing features, such as ALE and predictor-corrector, from the older version of Rocman.
• Verify the correctness of the new Rocman by comparing its physics results with those of the old Rocman.
• Continue to work with physicists in developing new coupling schemes based on their needs.
• Work on a comprehensive Rocman manual, so that it becomes easier for a physicist to write his or her own coupling scheme.
• Continue to improve the performance of AMPI, especially the communication performance for small messages and collective operations.

Rocstar currently does not use the load balancing framework in Charm++ and AMPI. We plan to integrate Rocstar with our load balancing framework by extending the Rocstar code with process migration capability.

Fig. 3.3.10: New hybrid topological-geometric mesh partitioner used to generate a 200-partition, 36.8 million mixed-element mesh of the RSRM fluid domain.

Parallel Mesh Partitioning (Cochran and Heath)

The hybrid topological-geometric mesh partitioner we have proposed relies on the accurate computation of the medial surface of a mesh. Available algorithms are efficient but require the mesh to be free of sharp features in order to achieve sufficient accuracy for this application. We are developing an algorithm that is free of this requirement, and initial results look promising. A simple, smooth mesh has been subdivided topologically for optimal geometric partitioning. Further, the parallel geometric partitioner has been implemented and has successfully partitioned a 37 million element mesh for the RSRM, far exceeding previous partitioning capability in size.

Mesh Smoothing (Guoy and Suriyamongkol)

The meshing research in CSAR is driven by the need to cope with dynamically changing geometry. In integrated simulations, the fluid and solid domains change due to both structural deformation and consumption of solid propellant.
Our strategy for mesh improvement involves three levels of aggressiveness to deal with progressively more severe geometric change.

The first line of defense is a mesh smoothing module called Rocmop, in which mesh nodes are repositioned but topological connectivity is preserved. In our application, mesh smoothing is applied in a dynamic setting: the smoothing procedure must be efficient enough to be called at every time step by every processor. Rocmop has been deployed successfully and is currently being tuned for efficiency and robustness.

Fig. 3.3.11: Adaptive mesh refinement via MeshSimAdapt. Top pictures show input sizing field with hot spot in input mesh. Bottom pictures show results from local mesh refinement near the hot spot.

The second line of defense is local mesh repair. This procedure is needed for severe deformation in a local area; for example, near the tip of a propagating crack or around the joint slot of the pressurized Titan IV. The goal of local repair is to eliminate poor elements in the problematic area while preserving good elements elsewhere. Only newly created elements need local solution transfer. Parallelization of local mesh repair is challenging because the area in need of repair can be shared by an arbitrary number of processors.

In the past year, we collaborated with Prof. Mark Shephard of Rensselaer Polytechnic Institute and Simmetrix Inc. on local mesh repair and adaptation. Our preliminary results showed the effectiveness of Simmetrix's MeshSimAdapt on adaptive mesh refinement (Figure 3.3.11), mesh sizing control, sequential local mesh repair, and parallel local repair (Figure 3.3.12). The results look promising.

Fig. 3.3.12: Parallel local repair. Top picture shows sliver elements with minimum dihedral angle between 2 and 5 degrees. Bottom picture shows the mesh after parallel repair, with the minimum dihedral angle improved to 10 degrees.

The third line of defense is global remeshing, which is implemented in a module called Rocrem. Remeshing the entire domain has the advantage of being independent of the severity of the deformation of the old mesh, but the disadvantage of requiring global solution transfer, which entails an expensive geometric search. In addition, boundary flags must be preserved for the physics solvers that will use the new mesh. For a mesh of moderate size (a few million elements), sequential remeshing may suffice, but for a billion elements or more, parallel mesh generation is necessary.
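The three lines of defense can be summarized as an escalation policy, keyed here, purely for illustration, to the worst dihedral angle in the mesh. The thresholds below are hypothetical, and the real decision also depends on solver behavior rather than a single metric.

```python
def choose_repair_level(min_angle_deg, smooth_ok=10.0, repair_ok=1.0):
    """Pick one of the three lines of defense from the minimum dihedral
    angle (degrees) over all elements.  Thresholds are made up for this
    sketch; they are not CSAR's actual criteria."""
    if min_angle_deg >= smooth_ok:
        return "smooth"         # Rocmop: reposition nodes, keep connectivity
    if min_angle_deg >= repair_ok:
        return "local_repair"   # replace poor elements, keep good ones
    return "global_remesh"      # Rocrem: regenerate mesh, global solution transfer

for angle in (25.0, 3.0, 0.2):
    print(angle, "->", choose_repair_level(angle))
# 25.0 -> smooth, 3.0 -> local_repair, 0.2 -> global_remesh
```

The ordering mirrors the cost trade-off described above: smoothing needs no solution transfer, local repair transfers data only for newly created elements, and global remeshing requires a full (and expensive) global transfer.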