Part II Implementation - FEniCS Project

Exploration des Dimensions de la Qualité de Vie desMénages de la ville de N’Djaménaا ‏____ا ا ‏____ا Arrondissement_______________________ا ‏____ا ا ‏____ا _____________________________ : Quartierا ‏____ا ا ‏____ا ___________________________ : ménage Numéro duMme/M. , cette étude initiée dans le cadre de la collaboration entre OPHI et le réseau PEP a pour objectif d’obtenir del’information statistique sur la population de la ville de N’Djaména, qui sera utilisée pour la recherche académique sur lapauvreté, l’emploi, la sécurité et les perceptions du bien-être.Vous avez été sélectionné à travers un processus scientifique d’échantillonnage aléatoire pour participer à cette étude. Nousserions heureux que vous participiez à cette étude, mais si vous décidez de ne pas participer à l’un quelconque de ses aspects,nous respecterons votre décision.Avec votre consentement, je (l’enquêteur) vais vous poser plusieurs questions sur votre situation socio-économique et sur laqualité de votre vie et celle de votre famille. Je voudrais vous rassurer qu’il n’y a pas de réponses correctes ou incorrectes à cesquestions. Nous voudrions seulement enregistrer ce que vous pensez des différents thèmes de l’enquête. Vous pouvez meposez des questions de clarification si vous ne comprenez pas une question. S’il y a une question à laquelle vous ne voudriezpas répondrez, dites-le moi, et nous continuerons avec la question suivante. Tous ce que vous nous direz sera strictementconfidentiel.L’interview durera moins d’une heure. Votre participation à cette étude est complètement volontaire et vous pouvez quitter à toutmoment sans conséquence. Si vous avez une question concernant cette étude, vous pouvez contacter les personnes qui en ontla charge au numéro de téléphone ……………….Page 1 sur 22

Automated Scientific ComputingThe FEniCS BookPreliminary draft, June 9, 2009

And now that we may give final praise to the machine we may say thatit will be desirable to all who are engaged in computations which, itis well known, are the managers of financial affairs, the administratorsof others’ estates, merchants, surveyors, geographers, navigators,astronomers...For it is unworthy of excellent men to lose hours likeslaves in the labor of calculations which could safely be relegated toanyone else if the machine were used.Gottfried Wilhelm Leibniz (1646–1716)

Contents1 IntroductionBy Anders Logg, Garth N. Wells and Kent-Andre Mardal 172 A FEniCS TutorialBy Hans Petter Langtangen 19I Methodology 213 The Finite Element MethodBy Robert C. Kirby and Anders Logg 233.1 A Simple Model Problem . . . . . . . . . . . . . . . . . . . . . . . . . 233.2 Finite Element Discretization . . . . . . . . . . . . . . . . . . . . . . 253.2.1 Discretizing Poisson’s equation . . . . . . . . . . . . . . . . . 253.2.2 Discretizing the first order system . . . . . . . . . . . . . . . 273.3 Finite Element Abstract Formalism . . . . . . . . . . . . . . . . . . 283.3.1 Linear problems . . . . . . . . . . . . . . . . . . . . . . . . . . 283.3.2 Nonlinear problems . . . . . . . . . . . . . . . . . . . . . . . . 293.4 Finite Element Function Spaces . . . . . . . . . . . . . . . . . . . . . 303.4.1 The mesh . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303.4.2 The finite element definition . . . . . . . . . . . . . . . . . . . 323.4.3 The nodal basis . . . . . . . . . . . . . . . . . . . . . . . . . . 323.4.4 The local-to-global mapping . . . . . . . . . . . . . . . . . . . 333.4.5 The global function space . . . . . . . . . . . . . . . . . . . . 343.4.6 The mapping from the reference element . . . . . . . . . . . 363.5 Finite Element Solvers . . . . . . . . . . . . . . . . . . . . . . . . . . 383.6 Finite Element Error Estimation and Adaptivity . . . . . . . . . . . 393.6.1 A priori error analysis . . . . . . . . . . . . . . . . . . . . . . 407

3.6.2 A posteriori error analysis . . . . . . . . . . . . . . . . . . . . 413.6.3 Adaptivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 433.7 Automating the Finite Element Method . . . . . . . . . . . . . . . . 443.8 Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 453.9 Historical Notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 454 Common and Unusual Finite ElementsBy Robert C. Kirby, Anders Logg and Andy R. Terrel 474.1 Ciarlet’s Finite Element Definition . . . . . . . . . . . . . . . . . . . 474.2 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 484.3 The Argyris Element . . . . . . . . . . . . . . . . . . . . . . . . . . . 504.3.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 504.3.2 Historical notes . . . . . . . . . . . . . . . . . . . . . . . . . . 504.4 The Brezzi–Douglas–Marini element . . . . . . . . . . . . . . . . . . 514.4.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 514.4.2 Historical notes . . . . . . . . . . . . . . . . . . . . . . . . . . 514.5 The Crouzeix–Raviart element . . . . . . . . . . . . . . . . . . . . . 534.5.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 534.5.2 Historical notes . . . . . . . . . . . . . . . . . . . . . . . . . . 534.6 The Hermite Element . . . . . . . . . . . . . . . . . . . . . . . . . . . 544.6.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 544.6.2 Historical notes . . . . . . . . . . . . . . . . . . . . . . . . . . 544.7 The Lagrange Element . . . . . . . . . . . . . . . . . . . . . . . . . . 554.7.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 554.7.2 Historical notes . . . . . . . . . . . . . . . . . . . . . . . . . . 564.8 The Morley Element . . . . . . . . . . . . . . . . . . . . . . . . . . . 574.8.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 574.8.2 Historical notes . . . . . . . . . . . . . . . . . . . . . . . . . . 574.9 The Nédélec Element . . . . . . . . . . . . . . . . . . . . . . . . . . . 584.9.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 584.9.2 Historical notes . . . . . . . . . . . . . . . . . . . . . . . . . . 594.10 The PEERS Element . . . . . . . . . . . . . . . . . . . . . . . . . . . 614.10.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 614.10.2 Historical notes . . . . . . . . . . . . . . . . . . . . . . . . . . 614.11 The Raviart–Thomas Element . . . . . . . . . . . . . . . . . . . . . . 634.11.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 634.11.2 Historical notes . . . . . . . . . . . . . . . . . . . . . . . . . . 634.12 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 655 Constructing General Reference Finite ElementsBy Robert C. Kirby and Kent-Andre Mardal 695.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 695.2 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 718

5.3 Mathematical Framework . . . . . . . . . . . . . . . . . . . . . . . . 735.3.1 Change of basis . . . . . . . . . . . . . . . . . . . . . . . . . . 735.3.2 Polynomial spaces . . . . . . . . . . . . . . . . . . . . . . . . . 745.4 Examples of Elements . . . . . . . . . . . . . . . . . . . . . . . . . . 765.4.1 Bases for other polynomial spaces . . . . . . . . . . . . . . . 785.5 Operations on the Polynomial spaces . . . . . . . . . . . . . . . . . . 805.5.1 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 805.5.2 Differentiation . . . . . . . . . . . . . . . . . . . . . . . . . . . 805.5.3 Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 815.5.4 Linear functionals . . . . . . . . . . . . . . . . . . . . . . . . . 826 Finite Element Variational FormsBy Robert C. Kirby and Anders Logg 837 Finite Element AssemblyBy Anders Logg 858 Quadrature Representation of Finite Element Variational FormsBy Kristian B. Ølgaard and Garth N. Wells 879 Tensor Representation of Finite Element Variational FormsBy Anders Logg and possibly others 8910 Discrete Optimization of Finite Element Matrix EvaluationBy Robert C. Kirby, Matthew G. Knepley, Anders Logg, L. Ridgway Scott andAndy R. Terrel 9111 Parallel Adaptive Mesh RefinementBy Johan Hoffman, Johan Jansson and Niclas Jansson 9311.1 A brief overview of parallel computing . . . . . . . . . . . . . . . . . 9311.2 Local mesh refinement . . . . . . . . . . . . . . . . . . . . . . . . . . 9411.2.1 The challenge of parallel mesh refinement . . . . . . . . . . 9411.2.2 A modified longest edge bisection algorithm . . . . . . . . . . 9511.3 The need of dynamic load balancing . . . . . . . . . . . . . . . . . . 9711.3.1 Workload modelling . . . . . . . . . . . . . . . . . . . . . . . . 9811.3.2 Remapping strategies . . . . . . . . . . . . . . . . . . . . . . . 9911.4 The implementation on a massively parallel system . . . . . . . . . 9911.4.1 The refinement method . . . . . . . . . . . . . . . . . . . . . . 10011.4.2 The remapping scheme . . . . . . . . . . . . . . . . . . . . . . 10111.4.3 Theoretical and experimental analysis . . . . . . . . . . . . . 10211.5 Summary and outlook . . . . . . . . . . . . . . . . . . . . . . . . . . 1059

II Implementation 10712 DOLFIN: A C++/Python Finite Element LibraryBy Anders Logg and Garth N. Wells 10913 FFC: A Finite Element Form CompilerBy Anders Logg and possibly others 11114 FErari: An Optimizing Compiler for Variational FormsBy Robert C. Kirby and Anders Logg 11315 FIAT: Numerical Construction of Finite Element Basis FunctionsBy Robert C. Kirby 11515.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11515.2 Prime basis: Collapsed-coordinate polynomials . . . . . . . . . . . . 11615.3 Representing polynomials and functionals . . . . . . . . . . . . . . . 11715.4 Other polynomial spaces . . . . . . . . . . . . . . . . . . . . . . . . . 12015.4.1 Supplemented polynomial spaces . . . . . . . . . . . . . . . . 12115.4.2 Constrained polynomial spaces . . . . . . . . . . . . . . . . . 12115.5 Conveying topological information to clients . . . . . . . . . . . . . . 12215.6 Functional evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . 12315.7 Overview of fundamental class structure . . . . . . . . . . . . . . . 12416 Instant: Just-in-Time Compilation of C/C++ Code in PythonBy Ilmar M. Wilbers, Kent-Andre Mardal and Martin S. Alnæs 12716.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12716.2 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12816.2.1 Installing Instant . . . . . . . . . . . . . . . . . . . . . . . . . 12816.2.2 Hello World . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12916.2.3 NumPy Arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . 13016.2.4 Ordinary Differential Equations . . . . . . . . . . . . . . . . 13016.2.5 Numpy Arrays and OpenMP . . . . . . . . . . . . . . . . . . 13216.3 Instant Explained . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13416.3.1 Arrays and Typemaps . . . . . . . . . . . . . . . . . . . . . . 13616.3.2 Module name, signature, and cache . . . . . . . . . . . . . . 14016.3.3 Locking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14116.4 Instant API . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14116.4.1 build module . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14116.4.2 inline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14616.4.3 inline module . . . . . . . . . . . . . . . . . . . . . . . . . . . 14616.4.4 inline with numpy . . . . . . . . . . . . . . . . . . . . . . . . 14616.4.5 inline module with numpy . . . . . . . . . . . . . . . . . . . 14616.4.6 import module . . . . . . . . . . . . . . . . . . . . . . . . . . . 14616.4.7 header and libs from pkgconfig . . . . . . . . . . . . . . . . . 14710

16.4.8 get status output . . . . . . . . . . . . . . . . . . . . . . . . . 14716.4.9 get swig version . . . . . . . . . . . . . . . . . . . . . . . . . . 14816.4.10check swig version . . . . . . . . . . . . . . . . . . . . . . . . 14817 SyFi: Symbolic Construction of Finite Element Basis FunctionsBy Martin S. Alnæs and Kent-Andre Mardal 14918 UFC: A Finite Element Code Generation InterfaceBy Martin S. Alnæs, Anders Logg and Kent-Andre Mardal 15119 UFL: A Finite Element Form LanguageBy Martin Sandve Alnæs 15319.1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15419.1.1 Design goals . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15419.1.2 Motivational example . . . . . . . . . . . . . . . . . . . . . . . 15519.2 Defining finite element spaces . . . . . . . . . . . . . . . . . . . . . . 15619.3 Defining forms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15919.4 Defining expressions . . . . . . . . . . . . . . . . . . . . . . . . . . . 16019.4.1 Form arguments . . . . . . . . . . . . . . . . . . . . . . . . . . 16119.4.2 Index notation . . . . . . . . . . . . . . . . . . . . . . . . . . . 16219.4.3 Algebraic operators and functions . . . . . . . . . . . . . . . 16419.4.4 Differential operators . . . . . . . . . . . . . . . . . . . . . . . 16519.4.5 Other operators . . . . . . . . . . . . . . . . . . . . . . . . . . 16719.5 Form operators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16819.5.1 Differentiating forms . . . . . . . . . . . . . . . . . . . . . . . 16819.5.2 Adjoint . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17019.5.3 Replacing functions . . . . . . . . . . . . . . . . . . . . . . . . 17019.5.4 Action . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17119.5.5 Splitting a system . . . . . . . . . . . . . . . . . . . . . . . . . 17119.5.6 Computing the sensitivity of a function . . . . . . . . . . . . 17119.6 Expression representation . . . . . . . . . . . . . . . . . . . . . . . . 17219.6.1 The structure of an expression . . . . . . . . . . . . . . . . . 17219.6.2 Tree representation . . . . . . . . . . . . . . . . . . . . . . . . 17319.6.3 Expression node properties . . . . . . . . . . . . . . . . . . . 17419.6.4 Linearized graph representation . . . . . . . . . . . . . . . . 17519.6.5 Partitioning . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17619.7 Computing derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . 17619.7.1 Relations to form compiler approaches . . . . . . . . . . . . . 17719.7.2 Approaches to computing derivatives . . . . . . . . . . . . . . 17819.7.3 Forward mode Automatic Differentiation . . . . . . . . . . . 17819.7.4 Extensions to tensors and indexed expressions . . . . . . . . 17919.7.5 Higher order derivatives . . . . . . . . . . . . . . . . . . . . . 18019.7.6 Basic differentiation rules . . . . . . . . . . . . . . . . . . . . 18119.8 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18311

19.8.1 Effective tree traversal in Python . . . . . . . . . . . . . . . . 18319.8.2 Type based function dispatch in Python . . . . . . . . . . . . 18319.8.3 Implementing expression transformations . . . . . . . . . . 18519.8.4 Important transformations . . . . . . . . . . . . . . . . . . . 18619.8.5 Evaluating expressions . . . . . . . . . . . . . . . . . . . . . . 18719.8.6 Viewing expressions . . . . . . . . . . . . . . . . . . . . . . . 18819.9 Implementation issues . . . . . . . . . . . . . . . . . . . . . . . . . . 18819.9.1 Python as a basis for a domain specific language . . . . . . . 18819.9.2 Ensuring unique form signatures . . . . . . . . . . . . . . . . 18919.9.3 Efficiency considerations . . . . . . . . . . . . . . . . . . . . . 19019.10Future directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19019.11Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19120 Unicorn: A Unified Continuum Mechanics SolverBy Johan Hoffman, Johan Jansson, Niclas Jansson and Murtazo Nazarov 19320.1 Unified Continuum modeling . . . . . . . . . . . . . . . . . . . . . . 19420.1.1 Automated computational modeling and software design . . 19520.2 Space-time General Galerkin discretization . . . . . . . . . . . . . . 19520.2.1 Standard Galerkin . . . . . . . . . . . . . . . . . . . . . . . . 19620.2.2 Local ALE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19620.2.3 Streamline diffusion stabilization . . . . . . . . . . . . . . . . 19720.2.4 Duality-based adaptive error control . . . . . . . . . . . . . . 19720.2.5 Unicorn/FEniCS software implementation . . . . . . . . . . 19720.3 Unicorn classes: data types and algorithms . . . . . . . . . . . . . . 19820.3.1 Unicorn software design . . . . . . . . . . . . . . . . . . . . . 19820.3.2 TimeDependentPDE . . . . . . . . . . . . . . . . . . . . . . . 19920.3.3 ErrorEstimate . . . . . . . . . . . . . . . . . . . . . . . . . 20020.3.4 SlipBC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20220.4 Mesh adaptivity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20320.4.1 Local mesh operations: Madlib . . . . . . . . . . . . . . . . . 20320.4.2 Elastic mesh smoothing: cell quality optimization . . . . . . 20320.4.3 Recusive Rivara bisection . . . . . . . . . . . . . . . . . . . . 20320.5 Parallel computation . . . . . . . . . . . . . . . . . . . . . . . . . . . 20320.5.1 Tensor assembly . . . . . . . . . . . . . . . . . . . . . . . . . . 20320.5.2 Mesh refinement . . . . . . . . . . . . . . . . . . . . . . . . . 20320.6 Application examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 20320.6.1 Incompressible flow . . . . . . . . . . . . . . . . . . . . . . . . 20320.6.2 Compressible flow . . . . . . . . . . . . . . . . . . . . . . . . . 20320.6.3 Fluid-structure interaction . . . . . . . . . . . . . . . . . . . 20321 Viper: A Minimalistic Scientific PlotterBy Ola Skavhaug 20712

22 Lessons Learnt in Mixed Language ProgrammingBy Kent-Andre Mardal, Anders Logg, and Ola Skavhaug 209III Applications 21123 Finite Elements for Incompressible FluidsBy Andy R. Terrel, L. Ridgway Scott, Matthew G. Knepley, Robert C. Kirby andGarth N. Wells 21324 Benchmarking Finite Element Methods for Navier–StokesBy Kristian Valen-Sendstad, Anders Logg and Kent-Andre Mardal 21525 Image-Based Computational HemodynamicsBy Luca Antiga 21726 Simulating the Hemodynamics of the Circle of WillisBy Kristian Valen-Sendstad, Kent-Andre Mardal and Anders Logg 21927 Cerebrospinal Fluid FlowBy Susanne Hentschel, Svein Linge, Emil Alf Løvgren and Kent-Andre Mardal 22127.1 Medical Background . . . . . . . . . . . . . . . . . . . . . . . . . . . 22127.2 Mathematical Description . . . . . . . . . . . . . . . . . . . . . . . . 22327.3 Numerical Experiments . . . . . . . . . . . . . . . . . . . . . . . . . 22327.3.1 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . 22327.3.2 Example 1. Simulation of a Pulse in the SAS. . . . . . . . . . 23027.3.3 Example 2. Simplified Boundary Conditions. . . . . . . . . . 23427.3.4 Example 3. Cord Shape and Position. . . . . . . . . . . . . . 23527.3.5 Example 4. Cord with Syrinx. . . . . . . . . . . . . . . . . . . 23628 Turbulent Flow and Fluid–Structure Interaction with UnicornBy Johan Hoffman, Johan Jansson, Niclas Jansson, Claes Johnson and MurtazoNazarov 23928.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23928.2 Continuum models . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24028.3 Mathematical framework . . . . . . . . . . . . . . . . . . . . . . . . 24128.4 Computational method . . . . . . . . . . . . . . . . . . . . . . . . . . 24128.5 Boundary conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . 24128.6 Geometry modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24228.7 Fluid-structure interaction . . . . . . . . . . . . . . . . . . . . . . . . 24228.8 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24228.8.1 Turbulent flow separation . . . . . . . . . . . . . . . . . . . . 24228.8.2 Flight aerodynamics . . . . . . . . . . . . . . . . . . . . . . . 24228.8.3 Vehicle aerodynamics . . . . . . . . . . . . . . . . . . . . . . . 24228.8.4 Biomedical flow . . . . . . . . . . . . . . . . . . . . . . . . . . 24213

28.8.5 Aeroacoustics . . . . . . . . . . . . . . . . . . . . . . . . . . . 24228.8.6 Gas flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24328.9 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24329 Fluid–Structure Interaction using Nitsche’s MethodBy Kristoffer Selim and Anders Logg 24530 Improved Boussinesq Equations for Surface Water WavesBy N. Lopes, P. Pereira and L. Trabucho 24730.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24730.2 Model derivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24930.2.1 Standard models . . . . . . . . . . . . . . . . . . . . . . . . . 25130.2.2 Second-order model . . . . . . . . . . . . . . . . . . . . . . . . 25330.3 Linear dispersion relation . . . . . . . . . . . . . . . . . . . . . . . . 25330.4 Wave generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25530.4.1 Initial condition . . . . . . . . . . . . . . . . . . . . . . . . . . 25530.4.2 Incident wave . . . . . . . . . . . . . . . . . . . . . . . . . . . 25630.4.3 Source function . . . . . . . . . . . . . . . . . . . . . . . . . . 25630.5 Reflective walls and sponge layers . . . . . . . . . . . . . . . . . . . 25730.6 Numerical Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25730.7 Numerical Applications . . . . . . . . . . . . . . . . . . . . . . . . . . 25930.8 Conclusions and future work . . . . . . . . . . . . . . . . . . . . . . 26030.9 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26231 Multiphase Flow Through Porous MediaBy Xuming Shan and Garth N. Wells 26332 Computing the Mechanics of the HeartBy Martin S. Alnæs, Kent-Andre Mardal and Joakim Sundnes 26533 Simulation of Ca 2+ Dynamics in the Dyadic CleftBy Johan Hake 26733.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26733.2 Biological background . . . . . . . . . . . . . . . . . . . . . . . . . . 26833.3 Mathematical models . . . . . . . . . . . . . . . . . . . . . . . . . . . 26833.3.1 Geometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26833.3.2 Ca 2+ Diffusion . . . . . . . . . . . . . . . . . . . . . . . . . . . 26933.3.3 Stochastic models of single channels . . . . . . . . . . . . . . 27133.4 Numerical methods for the continuous system . . . . . . . . . . . . 27233.4.1 Discretization . . . . . . . . . . . . . . . . . . . . . . . . . . . 27433.4.2 Stabilization . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27633.5 diffsim an event driven simulator . . . . . . . . . . . . . . . . . . 28033.5.1 Stochastic system . . . . . . . . . . . . . . . . . . . . . . . . . 28033.5.2 Time stepping algorithm . . . . . . . . . . . . . . . . . . . . . 28114

33.5.3 diffsim an example . . . . . . . . . . . . . . . . . . . . . . . 28234 Electromagnetic Waveguide AnalysisBy Evan Lezar and David B. Davidson 28934.1 Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29034.1.1 Waveguide Cutoff Analysis . . . . . . . . . . . . . . . . . . . 29134.1.2 Waveguide Dispersion Analysis . . . . . . . . . . . . . . . . . 29334.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29434.2.1 Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29434.2.2 Post-Processing . . . . . . . . . . . . . . . . . . . . . . . . . . 29634.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29734.3.1 Hollow Rectangular Waveguide . . . . . . . . . . . . . . . . . 29834.3.2 Half-Loaded Rectangular Waveguide . . . . . . . . . . . . . . 30134.3.3 Shielded Microstrip . . . . . . . . . . . . . . . . . . . . . . . . 30334.4 Analysis of Waveguide Discontinuities . . . . . . . . . . . . . . . . . 30534.5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30735 Applications in Solid MechanicsBy Kristian B. Ølgaard and Garth N. Wells 30936 Modelling Evolving DiscontinuitiesBy Mehdi Nikbakht and Garth N. Wells 31137 Optimal Control ProblemsBy Kent-Andre Mardal, Oddrun Christine Myklebust and Bjørn Fredrik Nielsen31338 Automatic Calibration of Depositional ModelsBy Hans Joachim Schroll 31538.1 Issues in dual lithology sedimentation . . . . . . . . . . . . . . . . . 31538.2 A multidimensional sedimentation model . . . . . . . . . . . . . . . 31638.3 An inverse approach . . . . . . . . . . . . . . . . . . . . . . . . . . . 31638.4 The Landweber algorithm . . . . . . . . . . . . . . . . . . . . . . . . 31738.5 Evaluation of gradients by duality arguments . . . . . . . . . . . . 31838.6 Aspects of the implementation . . . . . . . . . . . . . . . . . . . . . . 32038.7 Numerical experiments . . . . . . . . . . . . . . . . . . . . . . . . . . 32138.8 Results and conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . 32339 Computational ThermodynamicsBy Johan Hoffman, Claes Johnson and Murtazo Nazarov 33139.1 FEniCS as Computational Science . . . . . . . . . . . . . . . . . . . 33139.2 The 1st and 2nd Laws of Thermodynamics . . . . . . . . . . . . . . 33239.3 The Enigma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33339.4 Computational Foundation . . . . . . . . . . . . . . . . . . . . . . . . 33539.5 Viscosity Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33739.6 Joule’s 1845 Experiment . . . . . . . . . . . . . . . . . . . . . . . . . 33815

39.7 The Euler Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . 33939.8 Energy Estimates for Viscosity Solutions . . . . . . . . . . . . . . . 34039.9 Compression and Expansion . . . . . . . . . . . . . . . . . . . . . . . 34239.10A 2nd Law witout Entropy . . . . . . . . . . . . . . . . . . . . . . . . 34239.11Comparison with Classical Thermodynamics . . . . . . . . . . . . . 34339.12EG2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34439.13The 2nd Law for EG2 . . . . . . . . . . . . . . . . . . . . . . . . . . . 34539.14The Stabilization in EG2 . . . . . . . . . . . . . . . . . . . . . . . . . 34539.15Output Uniqueness and Stability . . . . . . . . . . . . . . . . . . . . 34540 Saddle Point StabilityBy Marie E. Rognes 347A Notation 35716

CHAPTER1IntroductionBy Anders Logg, Garth N. Wells and Kent-Andre MardalChapter ref: [intro]17

CHAPTER2A FEniCS TutorialBy Hans Petter LangtangenChapter ref: [langtangen]A series of complete, worked out examples, starting with simple mathematicalPDE problems, progressing with physics/mechanics problems, and ending upwith advanced computational mechanics problems. The essence is to show thattheprogrammingtasksscalewiththemathematicalformulationoftheproblems.19

Part IMethodology21

CHAPTER3The Finite Element MethodBy Robert C. Kirby and Anders LoggChapter ref: [kirby-7]The finite element method has grown out of Galerkin’s method, emerging asauniversal methodforthesolutionofdifferential equations. Much ofthesuccessof the finite element method can be contributed to its generality and simplicity,allowing a wide range of differential equations from all areas of science to beanalyzedandsolvedwithinacommonframework. Anothercontributingfactortothesuccessofthefiniteelementmethodistheflexibility offormulation, allowingthepropertiesofthediscretization to be controlledbythe choice offiniteelementapproximating spaces.Inthischapter,wereviewthefiniteelementmethodandintroducesomebasicconceptsandnotation. Inthecomingchapters,wediscusstheseconceptsinmoredetail,withaparticularfocusontheimplementationandautomationofthefiniteelement method as part of the FEniCS project.3.1 A Simple Model ProblemIn 1813, Siméon Denis Poisson (Figure 3.1) published in Bulletin de la sociétéphilomatique his famous equation as a correction of an equation published earlierby Pierre-Simon Laplace. Poisson’s equation is a second-order partial differentialequation stating that the negative Laplacian −∆u of some unknown fieldu = u(x) is equal to a given function f = f(x) on a domain Ω ⊂ R d , possiblyamended by a set of boundary conditions for the solution u on the boundary ∂Ω23

The Finite Element MethodFigure 3.1: Siméon Denis Poisson (1781–1840), inventor of Poisson’s equation.of Ω:−∆u = f in Ω,u = u 0 on Γ D ⊂ ∂Ω,−∂ n u = g on Γ N ⊂ ∂Ω.(3.1)The Dirichlet boundary condition u = u 0 signifies a prescribed value for the unknownu on a subset Γ D of the boundary and the Neumann boundary condition−∂ n u = g signifies a prescribed value for the (negative) normal derivative of uon the remaining boundary Γ N = ∂Ω \ Γ D . Poisson’s equation is a simple modelfor gravity, electromagnetism, heat transfer, fluid flow, and many other physicalprocesses. It also appears as the basic building block in a large number of morecomplex physical models, including the Navier–Stokes equations that we returnto below in Chapters ??.To derive Poisson’s equation (3.1), we may consider a model for the temperatureuinabodyoccupyingadomain Ωsubjecttoaheatsource f. Letting σ = σ(x)denote heat flux, it follows by conservation of energy that the outflow of energyover the boundary ∂ω of any test volume ω ⊂ Ω must be balanced by the energytransferred from the heat source f,∫ ∫σ · n ds = f dx.Integrating by parts, it follows that∫ ∫∇ · σ dx =∂ωωforall testvolumes ω and thusthat ∇ ·σ = f (by suitable regularity assumptionson σ and f). If we now make the assumption that the heat flux σ is proportional24ωωf dx

Robert C. Kirby and Anders LoggFigure 3.2: Poisson’s equation is a simple consequence of balance of energy in anarbitrary test volume ω ⊂ Ω.to the negative gradient of the temperature u (Fourier’s law),σ = −κ∇u,we arrive at the following system of equations:∇ · σ = f in Ω,σ + ∇u = 0 in Ω,(3.2)where we have assumed that the the heat conductivity κ = 1. Replacing σ inthe first of these equations by −∇u, we arrive at Poisson’s equation (3.1). Wenote that one may as well arrive at the system of first order equations (3.2) byintroducing σ = −∇u as an auxiliary variable in the second order equation (3.1).We also note that the Dirichlet and Neumann boundary conditions in (3.1) correspondto prescribed values for the temperature and heat flux respectively.3.2 Finite Element Discretization3.2.1 Discretizing Poisson’s equationTo discretize Poisson’s equation (3.1) by the finite element method, we first multiplyby a test function v and integrate by parts to obtain∫∫ ∫∇v · ∇u dx − v ∂ n u ds = vf dx.ΩΩΩ25

The Finite Element MethodLetting the test function v vanish on the Dirichlet boundary Γ D where the solutionu is known, we arrive at the following classical variational problem: Findu ∈ V such that∫∫ ∫∇v · ∇u dx = vf dx − v g ds ∀v ∈ ˆV . (3.3)ΩΩ Γ NThe test space ˆV is defined byˆV = {v ∈ H 1 (Ω) : v = 0 on Γ D },and the trial space V contains members of ˆV shifted by the Dirichlet condition,V = {v ∈ H 1 (Ω) : v = u 0 on Γ D }.We may now discretize Poisson’s equation by restricting the variational problem(3.3) to a pair of discrete spaces: Find u h ∈ V h ⊂ V such that∫∫ ∫∇v · ∇u h dx = vf dx − v g ds ∀v ∈ ˆV h ⊂ ˆV . (3.4)ΩΩ Γ NWenoteherethattheDirichletcondition u = u 0 on Γ D entersintothedefinitionofthe trial space V h (it is an essential boundary condition), whereas the Neumanncondition −∂ n u = g on Γ N enters into the variational problem (it is a naturalboundary condition).To solve the discrete variational problem (3.4), we must construct a suitablepair of discrete test and trial spaces ˆV h and V h . We return to this issue below, butassume for now that we have a basis {ˆφ i } N i=1 for ˆV h and a basis {φ j } N j=1 for V h. Wemay then make an ansatz for u h in termsofthe basis functionsof the trial space,N∑u h = U j φ j ,j=1where U ∈ R N is the vector of degrees of freedom to be computed. Inserting thisinto (3.4) and varying the test function v over the basis functions of the discretetest space ˆV h , we obtainN∑∫∫ ∫U j ∇ˆφ i · ∇φ j dx = ˆφ i f dx − ˆφi g ds, i = 1, 2, . . .,N.ΩΩ Γ Nj=1We may thus compute the finite element solution u h = ∑ Nj=1 U jφ j by solving thelinear systemAU = b,where∫A ij =∫b i =ΩΩ∇ˆφ i · ∇φ j dx,∫ˆφ i f dx − ˆφi g ds.Γ N26

Robert C. Kirby and Anders Logg3.2.2 Discretizing the first order systemWe may similarly discretize the first order system (3.2) by multiplying the firstequation by a test function v and the second by a test function τ. Summing upand integrating by parts, we find∫∫ ∫v∇ · σ + τ · σ − ∇ · τ u dx + τ · n u ds = vf dx ∀v ∈ ˆV .Ω∂ΩThe normal flux σ ·n = g is knownon the Neumann boundary Γ N so we maytakeτ · n = 0 on Γ N . Inserting the value for u on the Dirichlet boundary Γ D , we thusarrive at the following variational problem: Find (u, σ) ∈ V such that∫∫ ∫v∇ · σ + τ · σ − ∇ · τ u dx = vf dx − τ · n u 0 ds ∀(v, τ) ∈ ˆV . (3.5)ΩΩ Γ DNow, ˆV and V are a pair of suitable test and trial spaces, hereˆV = {(v, τ) : v ∈ L 2 (Ω), τ ∈ H(div; Ω), τ · n = 0 on Γ N },V = {(v, τ) : v ∈ L 2 (Ω), τ ∈ H(div; Ω), τ · n = g on Γ N }.As above, we restrict this variational problem to a pair of discrete test andtrialspaces ˆV h ⊂ ˆV and V h ⊂ V andmakeanansatzforthefiniteelementsolutionof the formN∑(u h , σ h ) = U j (φ j , ψ j ),j=1where {(φ j , ψ j )} N j=1 is a basis for the trial space V h. Typically, either φ j or ψ j willvanish, so that the basis is really the tensor product of a basis for an L 2 spacewith an H(div) space. We thus obtain a linear system for the degrees of freedomU ∈ R N by solving a linear system AU = b, where now∫A ij = ˆφ i ∇ · ψ j + ˆψ i · ψ j − ∇ · ˆψ i φ j dx,∫Ω∫b i = ˆφ i f dx − ˆψi · n u 0 ds.Ω Γ DWe note that the variational problem (3.5) differs from the variational problem(3.3) in that the Dirichlet condition u = u 0 on Γ D enters into the variationalformulation(itisnowanaturalboundarycondition), whereastheNeumannconditionσ = g on Γ N enters into the definition of the trial space V (it is now anessential boundary condition).Such mixed methods require some care in selecting spaces that discretize L 2and H(div) in a compatible way. Stable discretizations must satisfy the so-calledinf–sup or Ladysenskaja–Babuška–Brezzi (LBB) condition. This theory explainswhy many of the elements for mixed methods seem complicated compared tothose for standard Galerkin methods.27Ω

The Finite Element Method3.3 Finite Element Abstract Formalism3.3.1 Linear problemsWe saw above that the finite element solution of Poisson’s equation (3.1) or (3.2)can be obtained by restricting an infinite dimensional variational problem to afinite dimensional variational problem and solving a linear system.To formalize this, we consider a general linear variational problem written inthe following canonical form: Find u ∈ V such thata(v, u) = L(v) ∀v ∈ ˆV , (3.6)where ˆV is the test space and V is the trial space. We may thus express thevariational problem in terms of a bilinear form a and linear form (functional) L,a : ˆV × V → R,L : ˆV → R.As above, we discretize the variational problem (3.6) by restricting to a pair ofdiscrete test and trial spaces: Find u h ∈ V h ⊂ V such thata(v, u h ) = L(v) ∀v ∈ ˆV h ⊂ ˆV . (3.7)To solve the discrete variational problem (3.7), we make an ansatz of the formu h =N∑U j φ j , (3.8)j=1and take v = ˆφ i , i = 1, 2, . . ., N, where {ˆφ i } N i=1 is a basis for the discrete testspace ˆV h and {φ j } N j=1 is a basis for the discrete trial space V h. It follows thatN∑U j a(ˆφ i , φ j ) = L(ˆφ i ), i = 1, 2, . . ., N.j=1We thus obtain the degrees of freedom U of the finite element solution u h bysolving a linear system AU = b, whereA ij = a(ˆφ i , φ j ), i, j = 1, 2, . . ., N,b i = L(ˆφ i ).(3.9)28

Robert C. Kirby and Anders Logg3.3.2 Nonlinear problemsWe also consider nonlinear variational problems written in the following canonicalform: Find u ∈ V such thatF(u; v) = 0 ∀v ∈ ˆV , (3.10)where now F : V × ˆV → R is a semilinear form, linear in the argument(s) subsequentto the semicolon. As above, we discretize the variational problem (3.10) byrestricting to a pair of discrete test and trial spaces: Find u h ∈ V h ⊂ V such thatF(u h ; v) = 0 ∀v ∈ ˆV h ⊂ ˆV .The finite element solution u h = ∑ Nj=1 U jφ j may then be computed by solving anonlinear system of equations,b(U) = 0, (3.11)where b : R N → R N andb i (U) = F(u h ; ˆφ i ), i = 1, 2, . . ., N. (3.12)To solve the nonlinear system (3.11) by Newton’s method or some variant ofNewton’smethod,wecomputetheJacobian A = b ′ . Wenotethatifthesemilinearform F is differentiable in u, then the entries of the Jacobian A are given byA ij (u h ) = ∂b i(U)∂U j= ∂∂U jF(u h ; ˆφ i ) = F ′ (u h ; ˆφ i ) ∂u h∂U j= F ′ (u h ; ˆφ i ) φ j ≡ F ′ (u h ; ˆφ i , φ j ).(3.13)In each Newton iteration, we must then evaluate (assemble) the matrix A andthe vector b, and update the solution vector U bywhere δU k solves the linear systemU k+1 = U k − δU k ,A(u k h ) δUk = b(u k h ). (3.14)Wenotethatforeachfixed u h , a = F ′ (u h ; ·, ·)isabilinear formand L = F(u h ; ·)is a linear form. In each Newton iteration, we thus solve a linear variationalproblem of the canonical form (3.6): Find δu ∈ V 0 such thatF ′ (u h ; v, δu) = F(u h ; v) ∀v ∈ ˆV , (3.15)where V 0 = {v − w : v, w ∈ V }. Discretizing (3.15) as in Section 3.3.1, we recoverthe linear system (3.14).29

The Finite Element MethodExample 3.1 (Nonlinear Poisson equation). As an example, consider the followingnonlinear Poisson equation:−∇ · ((1 + u)∇u) = f in Ω,u = 0on ∂Ω.(3.16)Multiplying (3.16) with a test function v and integrating by parts, we obtain∫∫∇v · ((1 + u)∇u) dx = vf dx,Ωwhich is a nonlinear variational problem of the form (3.10), with∫∫F(u; v) = ∇v · ((1 + u)∇u) dx − v f dx.ΩLinearizing the semilinear form F around u = u h , we obtain∫∫F ′ (u h ; v, δu) = ∇v · (δu∇u h ) dx + ∇v · ((1 + u h )∇δu) dx.ΩWe may thus compute the entries of the Jacobian matrix A(u h ) by∫∫A ij (u h ) = F ′ (u h ; ˆφ i , φ j ) = ∇ˆφ i · (φ j ∇u h ) dx + ∇ˆφ i · ((1 + u h )∇φ j ) dx. (3.17)3.4 Finite Element Function SpacesΩIn the above discussion, we assumed that we could construct discrete subspacesV h ⊂ V of infinite dimensional function spaces. A central aspect of the finiteelement method is the construction of such subspaces by patching together localfunction spaces defined on a set of finite elements. We here give a generaloverview of the construction of finite element function spaces and return belowinChapters4and5totheconstructionofspecificfunctionspacessuchassubsetsof H 1 (Ω), H(curl), H(div) and L 2 (Ω).3.4.1 The meshTo define V h , we first partition the domain Ω into a finite set of disjoint cellsT = {K} such that∪ K∈T K = Ω.Together, these cells form a mesh of the domain Ω. The cells are typically simplepolygonal shapes like intervals, triangles, quadrilaterals, tetrahedra or hexahedraas shown in Figure 3.3. But other shapes are possible, in particular curvedcells to correctly capture the boundary of a non-polygonal domain as shown inFigure 3.4.30ΩΩΩΩ

Robert C. Kirby and Anders LoggFigure 3.3: Finite element cells in one, two and three space dimensions.Figure 3.4: A straight triangular cell (left) and curved triangular cell (right).31

The Finite Element Method3.4.2 The finite element definitionOnce a domain Ω has been partitioned into cells, one may define a local functionspace P K on each cell K and use these local function spaces to build the globalfunction space V h . A cell K together with a local function space P K and a set ofrules for describing functions in P K is called a finite element. This definition offinite element was first formalized by Ciarlet in [Cia78, Cia02], and it remainsthe standard formulation today. [BS94, BS08]. The formal definition reads asfollows: A finite element is a triple (K, P K , L K ), where• K ⊂ R d is a bounded closed subset of R d with nonempty interior and piecewisesmooth boundary;• P K is a function space on K of dimension n K < ∞;• L K = {l K 1 , lK 2 , . . .,lK n K} is a basis for P K ′ (the bounded linear functionals onP K ).As an example, consider the standard linear Lagrange finite element on thetriangle in Figure 3.5. The cell K is given by the triangle and the space P K isgiven by the space of first degree polynomials on K. As a basis for P K ′ , we maytake point evaluation at the three vertices of K, that is,l K i : P K → R,l K i (v) = v(x i ),for i = 1, 2, 3 where x i is the coordinate of the ith vertex. To check that thisis indeed a finite element, we need to verify that L K is a basis for P K ′ . Thisis equivalent to the unisolvence of L K , that is, if v ∈ P K and l K i (v) = 0 for alll K i , then v = 0. [BS08] For the linear Lagrange triangle, we note that if v iszero at each vertex, then v must be zero everywhere, since a plane is uniquelydetermined by its value a three non-collinear points. Thus, the linear Lagrangetriangle is indeed a finite element. In general, determining the unisolvence ofL K may be non-trivial.3.4.3 The nodal basisExpressing finite element solutions in V h in terms of basis functions for the localfunction spaces P K may be greatly simplified by introducing a nodal basis forP K . A nodal basis {φ K i }n Ki=1 for P K is a basis for P K that satisfiesIt follows that any v ∈ P K may be expressed byl K i (φK j ) = δ ij, i, j = 1, 2, . . ., n K . (3.18)v =n K∑i=1l K i (v)φ K i . (3.19)32

Robert C. Kirby and Anders LoggFigure 3.5: The linear Lagrange (Courant) triangle).In particular, any function v in P K for the linear Lagrange triangle is given byv = ∑ 3i=1 v(xi )φ K i . Inotherwords,thedegreesoffreedomofanyfunction v maybeobtained by evaluating the linear functionals L K . We shall therefore sometimesrefer to L K as degrees of freedom.◮ Author note: Give explicit formulas for some nodal basis functions. Use example environment.For any finite element (K, P K , L K ), the nodal basis may be computed by solvinga linear system of size n K × n K . To see this, let {ψi K } n Ki=1 be any basis (theprime basis) for P K . Such a basis is easy to construct if P K is a full polynomialspace or may otherwise be computed by a singular-value decomposition ora Gram-Schmidt procedure, see [Kir04]. We may then make an ansatz for thenodal basis in terms of the prime basis:φ j =n K∑k=1Inserting this into (3.18), we find thatn K∑k=1α jk ψ K k , j = 1, 2, . . ., n K .α jk l K i (ψK k ) = δ ij, j = 1, 2, . . ., n K .In otherwords, the expansion coefficients α for the nodal basis may be computedby solving the linear systemBα ⊤ = I,where B ij = l K i (ψK j ).3.4.4 The local-to-global mappingNow, to define a global function space V h = span{φ i } N i=1 on Ω from a given set{(K, P K , L K )} K∈T offiniteelements,wealsoneedtospecifyhowthelocalfunction33

The Finite Element MethodFigure3.6: Local-to-globalmappingforasimplemeshconsistingoftwotriangles.spaces are patched together. We do this by specifying for each cell K ∈ T a localto-globalmapping,ι K : [1, n K ] → N. (3.20)Thismappingspecifieshowthelocaldegreesoffreedom L K = {l K i }n Ki=1 aremappedto global degrees of freedom L = {l i } N i=1. More precisely, the global degrees offreedom are given byl ιK (i)(v) = l K i (v| K), i = 1, 2, . . ., n K , (3.21)for any v ∈ V h . Thus, each local degree of freedom l K i ∈ L K corresponds to aglobal degree of freedom ν ιK (i) ∈ L determined by the local-to-global mapping ι K .As we shall see, the local-to-global mapping together with the choice of degreesof freedom determine the continuity of the global function space V h .For standard piecewise linears, one may define the local-to-global mappingby simply mapping each local vertex number i for i = 1, 2, 3 to the correspondingglobal vertex number ι K (i). This is illustrated in Figure 3.6 for a simple meshconsisting of two triangles.3.4.5 The global function spaceOne may now define the global function space V h as the set of functions on Ωsatisfying the following pair of conditions. We first require thatv| K ∈ P K ∀K ∈ T , (3.22)that is, the restriction of v to each cell K lies in the local function space P K .Second, we require that for any pair of cells (K, K ′ ) ∈ T × T and any pair (i, i ′ ) ∈34

Robert C. Kirby and Anders LoggFigure 3.7: Patching together local function spaces on a pair of cells (K, K ′ ) toform a global function space on Ω = K ∪ K ′ .[1, n K ] × [1, n K ′] satisfyingit holds thatι K (i) = ι K ′(i ′ ), (3.23)l K i (v| K ) = l K′i (v| ′ K ′). (3.24)Inotherwords,iftwolocaldegreesoffreedom l K i and l K′i ′are mappedtothesameglobaldegreeoffreedom,thentheymustagreeforeachfunction v ∈ V h . Here, v| Kdenotes the continuous extension to K of the restriction of v to the interior of K.This is illustrated in Figure 3.7 for the space of continuous piecewise quadraticsobtained by patching together two quadratic Lagrange triangles.Notethatbythisconstruction,thefunctionsof V h areundefinedoncellboundaries,unless the constraints (3.24) force the (restrictions of) functions of V h to becontinuous on cell boundaries. However, this is usually not a problem, since wecan perform all operations on the restrictions of functions to the local cells.Thelocal-to-global mappingtogetherwiththechoiceofdegreesoffreedomdeterminethecontinuityoftheglobalfunctionspaceV h . FortheLagrange triangle,the choice of degrees of freedom as point evaluation at vertices ensures that therestrictions v|K and v|K ′ of a function v ∈ V h to a pair of adjacent triangles Kagree at the two common vertices, since ι K and ι K ′ map corresponding degrees offreedom to the same global degree of freedom and this global degree of freedomis single-valued. It follows that the functions of V h are continuous not only atvertices but along each shared edge since a first-degree polynomial on a line isuniquely determined by its values at two distinct points. Thus, the global functionspace of piecewise linears generated by the Lagrange triangle is continuous35

The Finite Element MethodFigure3.8: Thedegreeofcontinuityisdeterminedbythechoiceofdegreesoffreedom,illustrated here for a pair of linear Lagrange triangles, Crouzeix–Raviarttriangles, Brezzi–Douglas–Marini triangles and Nédélec triangles.and thus H 1 -conforming, that is, V h ⊂ H 1 (Ω).One may also consider degrees of freedom defined by point evaluation at themidpoint of each edge. This is the so-called Crouzeix–Raviart triangle. The correspondingglobal Crouzeix–Raviart space V h is consequently continuous only atedge midpoints and so V h is not a subspace of H 1 . The Crouzeix–Raviart triangleis an example of an H 1 -nonconforming element. Other choices of degrees of freedommayensure continuity ofnormal componentslike forthe H(div)-conformingBrezzi–Douglas–Marini elements or tangential components as for the H(curl)-conforming Nédélec elements. This is illustrated in Figure 3.8. In Chapter 4,other examples of particular elements are given which ensure different kinds ofcontinuity by the choice of degrees of freedom and local-to-global mapping.3.4.6 The mapping from the reference element◮ Editor note: Need to change the notation here to (K 0 , P 0 , L 0 ) since ˆ· is already used forthe test space.As we have seen, the global function space V h may be described by a meshT,aset of finite elements {(K, P K , L K )} K∈T and a set of local-to-global mappings{ι K } K∈T . We may simplify this description further by introducing a reference finiteelement ( ˆK, ˆP, ˆL), where ˆL = {ˆl 1 , ˆl 2 , . . ., ˆlˆn }, and a set of invertible mappings{F K } K∈T that map the reference cell ˆK to the cells of the mesh,K = F K ( ˆK) ∀K ∈ T . (3.25)This is illustrated in Figure 3.9. Note that ˆK is generally not part of the mesh.36

Robert C. Kirby and Anders LoggFigure 3.9: The (affine) mapping F K from a reference cell ˆK to some cell K ∈ T.For function spaces discretizing H 1 as in (3.3), the mapping F K is typicallyaffine, that is, F K can be written in the form F K (ˆx) = A Kˆx + b K for some matrixA K ∈ R d×d and some vector b K ∈ R d , or isoparametric, in which case thecomponentsof F K are functionsin ˆP. Forfunction spacesdiscretizing H(div) likein(3.5)or H(curl),theappropriate mappingsarethecontravariant andcovariantPiola mappings which preserve normal and tangential components respectively,see[RKL08]. Forsimplicity, werestrict thefollowingdiscussion tothecase whenF K is affine or isoparametric.For each cell K ∈ T , the mapping F K generates a function space on K givenbyP K = {v : v = ˆv ◦ F −1K , ˆv ∈ ˆP}, (3.26)that is, each function v = v(x) may be expressed as v(x) = ˆv(F −1−1K(x)) = ˆv ◦ FK (x)for some ˆv ∈ ˆP.The mapping F K also generates a set of degrees of freedom L K on P K givenbyL K = {l K i : l K i (v) = ˆl i (v ◦ F K ), i = 1, 2, . . ., ˆn}. (3.27)Themappings {F K } K∈T thusgenerate fromthereference finiteelement ( ˆK, ˆP, ˆL)a set of finite elements {(K, P K , L K )} K∈T given byK = F K ( ˆK),P K = {v : v = ˆv ◦ F −1K: ˆv ∈ ˆP},L K = {l K i : l K i (v) = ˆl i (v ◦ F K ), i = 1, 2, . . ., ˆn = n K }.(3.28)Bythisconstruction,wealsoobtainthenodalbasisfunctions {φ K i } n Ki=1 on K fromasetofnodalbasisfunctions {ˆφ i }ˆn i=1 onthereferenceelementsatisfying ˆl i (ˆφ j ) = δ ij .37

The Finite Element MethodLetting φ K i= ˆφ i ◦ F −1Kfor i = 1, 2, . . ., n K, we find thatl K i (φK j ) = ˆl i (φ K j ◦ F K ) = ˆl i (ˆφ j ) = δ ij , (3.29)so {φ K i } n Ki=1 is a nodal basis for P K .We may thus define the function space V h by specifying a mesh T, a referencefinite element ( ˆK, ˆP, ˆL), a set of local-to-global mappings {ι K } K∈T and a set ofmappings {F K } K∈T fromthereferencecell ˆK. Notethatingeneral, themappingsneed not be of the same type for all cells K and not all finite elements need tobe generated from the same reference finite element. In particular, one couldemploy a different (higher-degree) isoparametric mapping for cells on a curvedboundary.Theaboveconstructionisvalidforso-calledaffine-equivalentelements[BS08]like the family H 1 -conforming Lagrange finite elements. A similar constructionis possible for H(div)- and H(curl) conforming elements, like the Raviart–Thomas, Brezzi–Douglas–Marini and Nédélec elements, where an appropriatePiola mapping must be used to map the basis functions. However, not all finiteelements may be generated from a reference finite element using this simpleconstruction. For example, this construction fails for the family of Hermite finiteelements. [Cia02, BS08].3.5 Finite Element SolversFiniteelementsprovideapowerfulmethodologyfordiscretizingdifferentialequations,but solving the resulting algebraic systems also presents quite a challenge,even for linear systems. Good solvers must handle the sparsity and illconditioningof the algebraic system, but also scale well on parallel computers.The linear solve is a fundamental operation not only in linear problems, but alsowithin each iteration of a nonlinear solve via Newton’s method, an eigenvaluesolve, or time-stepping.A classical approach that has been revived recently is direct solution, basedon Gaussian elimination. Thanks to techniques enabling parallel scalabilityand recognizing block structure, packages such as UMFPACK [Dav04] and SuperLU[Li05] have made direct methods competitive for quite large problems.The 1970s and 1980s saw the advent of modern iterative methods. Thesegrew out of classical iterative methods such as relaxation methods [?] and theconjugate gradient iteration of Hestenes and Stieffel [HS52]. These techniquescan use much less memory than direct methods and are easier to parallelize.◮ Author note: Missing reference for relaxation methodsMultigrid methods [Bra77, Wes92] use relaxation techniques on a hierarchyof meshes to solve elliptic equations, typically for symmetric problems, in nearlylineartime. However,theyrequire ahierarchy ofmeshesthat maynotalways be38

Robert C. Kirby and Anders Loggavailable. Thismotivatedtheintroductionofalgebraicmultigridmethods(AMG)that mimic mesh coarsening, working only on the matrix entries. SuccessfulAMGdistributionsincludetheHyprepackage [FY02]andtheMLpackage insideTrilinos [HBH + 05].Krylov methods such as conjugate gradients and GMRES [SS86] generatea sequence of approximations converging to the solution of the linear system.These methodsare based onlyon the matrix–vector product. The performance ofthese methods is significantly improved by use of preconditioners, which transformthe linear systemAU = bintoP −1 AU = P −1 b,which is known as left preconditioning. The preconditioner P −1 may also beapplied from the right by recognizing that AU = AP −1 (PU). To ensure good convergence,the preconditioner P −1 should be a good approximation of A −1 . Somepreconditioners are strictly algebraic, meaning they only use information availablefrom the entries of A. Classical relaxation methods such as Gauss–Seidelmay be used as preconditioners, as can so-called incomplete factorizations [?].If multigrid or AMG is available, it also can serve as a powerful preconditioner.Other kinds of preconditioners require special knowledge about the differentialequations being solved and may require new matrices modeling related physicalprocesses. Such methods are sometimes called physics-based preconditioners[?]. An automated system, such as FEniCS, provides an interesting opportunityto assist with the development and implementation of these powerful butless widely used methods.◮ Author note: Missing reference for incomplete LU factorization and physics-based preconditionersFortunately, many of the methods discussed here are included in modern librariessuch as PETSc [BBE + 04] and Trilinos [HBH + 05]. FEniCS typically interactswith the solvers discussed here through these packages and so mainlyneed to be aware of the various methods at a high level, such as when the variousmethods are appropriate and how to access them.3.6 Finite Element Error Estimation and AdaptivityThe error e = u h − u in a computed finite element solution u h approximating theexact solution u of (3.6) may be estimated either a priori or a posteriori. Bothtypes of estimates are based on relating the size of the error to the size of the(weak) residual r : V → R defined byr(v) = a(v, u h ) − L(v). (3.30)39

The Finite Element MethodWe note that the weak residual is formally related to the strong residual R ∈ V ′by r(v) = (v, R).A priori error estimates express the error in terms of the regularity of theexact (unknown) solution and may give useful information about the order ofconvergence of a finite element method. A posteriori error estimates expressthe error in terms of computable quantities like the residual and (possibly) thesolution of an auxiliary dual problem.3.6.1 A priori error analysisWe consider the linear variational problem (3.6). We first assume that the bilinearform a and the linear form L are continuous (bounded), that is, there existsa constant C > 0 such thata(v, w) ≤ C‖v‖ V ‖w‖ V , (3.31)L(v) ≤ C‖v‖ V , (3.32)for all v, w ∈ V. For simplicity, we assume in this section that ˆV = V is a Hilbertspace. For (3.1), this corresponds to the case of homogeneous Dirichlet boundaryconditions and V = H 1 0 (Ω). Extensions to the general case ˆV ≠ V are possible,see for example [OD96]. We further assume that the bilinear form a is coercive(V-elliptic), that is, there exists a constant α > 0 such thata(v, v) ≥ α‖v‖ V , (3.33)for all v ∈ V. It then follows by the Lax–Milgram theorem [LM54] that thereexists a unique solution u ∈ V to the variational problem (3.6).To derive an a priori error estimate for the approximate solution u h definedby the discrete variational problem (3.7), we first note thata(v, u h − u) = a(v, u h ) − a(v, u) = L(v) − L(v) = 0for all v ∈ V h ⊂ V. By the coercivity and continuity of the bilinear form a, we findthatα‖u h − u‖ 2 V ≤ a(u h − u, u h − u) = a(u h − v, u h − u) + a(v − u, u h − u)for all v ∈ V h . It follows that= a(v − u, u h − u) ≤ C‖v − u‖ V ‖u h − u‖ V .‖u h − u‖ V ≤ C α ‖v − u‖ V ∀v ∈ V h . (3.34)Theestimate(3.34)isreferredtoasCea’slemma. Wenotethatwhenthebilinearform a is symmetric, it is also an inner product. We may then take ‖v‖ V =40

Robert C. Kirby and Anders LoggFigure 3.10: The finite element solution u h ∈ V h ⊂ V is the a-projection of u ∈ Vonto the subspace V h and is consequently the best possible approximation of u inthe subspace V h .√a(v, v) and C = α = 1. In this case, uh is the a-projection onto V h and Cea’slemma states that‖u h − u‖ V ≤ ‖v − u‖ V ∀v ∈ V h , (3.35)that is, u h is the best possible solution of the variational problem (3.6) in thesubspace V h . This is illustrated in Figure 3.10.Cea’s lemma together with a suitable interpolation estimate now yields thea priori error estimate for u h . By choosing v = π h u, where π h : V → V h is aninterpolation operator into V h , we find that‖u h − u‖ V ≤ C α ‖π hu − u‖ V ≤ CC iα ‖hp D q u‖, (3.36)where C i is an interpolation constant and the values of p and q depend on theaccuracy of interpolation and the definition of ‖ · ‖ V . For the solution of Poisson’sequation in H0 1 , we have C = α = 1 and p = q = 1.3.6.2 A posteriori error analysisEnergy norm error estimatesThe continuity and coercivity of the bilinear form a also allows a simple derivationof an a posteriori error estimate. In fact, it follows that the V-norm of theerror e = u h − u is equivalent to the V ′ -norm of the residual r. To see this, wenote that by the continuity of the bilinear form a, we haver(v) = a(v, u h ) − L(v) = a(v, u h ) − a(v, u) = a(v, u h − u) ≤ C‖u h − u‖ V ‖v‖ V .41

The Finite Element MethodFurthermore, by coercivity, we find thatα‖u h − u‖ 2 ≤ a(u h − u, u h − u) = a(u h − u, u h ) − L(u h − u) = r(u h − u).It follows thatα‖u h − u‖ V ≤ ‖r‖ V ′ ≤ C‖u h − u‖ V , (3.37)where ‖r‖ V ′ = sup v∈V,v≠0 r(v)/‖v‖ V .The estimates (3.36) and (3.37) are sometimes referred to as energy normerror estimates. This is the case when the bilinear form a is symmetric and thusdefines an inner product. One may then take ‖v‖ V = √ a(v, v) and C = α = 1. Inthis case, it follows that‖e‖ V = ‖r‖ V ′. (3.38)The term energy norm refers to a(v, v) corresponding to physical energy in manyapplications.Duality-based error controlThe classical a priori and a posteriori error estimates (3.36) and (3.37) relatethe V-norm of the error e = u h − u to the regularity of the exact solution u andthe residual r = a(v, u u ) − L(v) of the finite element solution u h respectively.However, in applications it is often necessary to control the error in a certainoutput functional M : V → R of the computed solution to within some giventolerance TOL > 0. In these situations, one would thus ideally like to choose thefinite element space V h ⊂ V such that the finite element solution u h satisfies|M(u h ) − M(u)| ≤ TOL (3.39)with minimal computational work. We assume here that both the output functionaland the variational problem are linear, but the analysis may be easilyextended to the full nonlinear case, see [EEHJ95, BR01].To estimate the error in the output functional M, we introduce an auxiliarydual problem: Find z ∈ V ∗ such thata ∗ (v, z) = M(v) ∀v ∈ ˆV ∗ . (3.40)Wenoteherethatthefunctional Mentersasdatain thedual problem. The dual(adjoint) bilinear form a ∗ : ˆV ∗ × V ∗ is defined bya ∗ (v, w) = a(w, v).The dual trial and test spaces are given byV ∗ = ˆV ,ˆV ∗ = V 0 = {v − w : v, w ∈ V },42

C ∑ KRobert C. Kirby and Anders Loggthat is, the dual trial space is the primal test space and the dual test space isthe primal trial space modulo boundary conditions. In particular, if V = u 0 + ˆVand V h = u 0 + ˆV h then ˆV ∗ = ˆV, and thus both the dual test and trial functionsvanish at Dirichlet boundaries. The definition of the dual problem leads us tothe following representation of the error:M(u h ) − M(u) = M(u h − u)= a ∗ (u h − u, z)= a(z, u h − u)= a(z, u h ) − L(z)= r(z).We thus find that the error is exactly represented by the residual applied to thedual solution,M(u h ) − M(u) = r(z). (3.41)3.6.3 AdaptivityAs seen above, one may thus estimate the error in a computed finite elementsolution u h , either the error in the V-norm or the error in an output functional,by estimating the size of the residual r. This may be done in several differentways. The estimate typically involves integration by parts to recover the strongelement-wise residual of the original PDE, possibly in combination with the solutionof local problems over cells or patches of cells. In the case of the standardpiecewise linear finite element approximation of Poisson’s equation (3.1), onemay obtain the following estimate:( ∑K∈T‖u h − u‖ V = ‖∇e‖ ≤ Ch 2 K ‖R‖2 K + h K‖[∂ n u h ]‖ 2 ∂K) 1/2,where R| K = −∆u h | K − f| K is the strong residual, h K denotes the mesh size(diameter of smallest circumscribed sphere) and [∂ n u h ] denotes the jump of thenormal derivative across mesh facets. Letting ηK 2 = h2 K ‖R‖2 K + h K‖[∂ n u h ]‖ 2 ∂K , onethus obtains the estimate(‖u h − u‖ V ≤ E ≡η 2 K) 1/2.An adaptive algorithm seeksto determineameshsize h = h(x) such that E ≤TOL. Starting from an initial coarse mesh, the mesh is successively refined inthosecells wherethe errorindicator η K islarge. Several strategies are available,such as refining the top fraction of all cells where η K is large, say the first 20%43

The Finite Element MethodFigure 3.11: An adaptively refinedmesh obtained by successive refinement ofanoriginal coarse mesh.of all cells ordered by η K . Other strategies include refining all cells where η K isabove a certain fraction of max K∈T η K , or refining a top fraction of all cells suchthat the sum of their error indicators account for a significant fraction of E.◮ Author note: Find good reference for adaptive strategies.Once the mesh has been refined, a new solution and new error indicatorscan be computed. The process is then repeated until either E ≤ TOL (the stoppingcriterion) or the available resources (CPU time and memory) have been exhausted.The adaptive algorithm thus yields a sequence of successively refinedmeshes as illustrated in Figure 3.11. For time-dependent problems, an adaptivealgorithm needs to distribute both the mesh size and the size of the time stepin both space and time. Ideally, the error estimate E is close to the actual error,as measured by the efficiency index E/‖u h − u‖ V which should be close to one bybounded below by one.3.7 Automating the Finite Element MethodThe FEniCS project seeks to automate Scientific Computing as explained inChapter [intro]. This is a formidable task, but it may be solved in part by automatingthe finite element method. In particular, this automation relies on thefollowing key steps:(i) automation of discretization,(ii) automation of discrete solution,44

Robert C. Kirby and Anders Logg(iii) automation of error control.Since its inception in 2003, the FEniCS project has been concerned mainly withthe automation of discretization, resulting in the development of the form compilersFFC and SyFi/SFC, the code generation interface UFC and the form languageUFL. As a result, the first step towards a complete automation is nowclose to complete; variational problems for a large class of partial differentialequations may now be automatically discretized by the finite element methodusing FEniCS. For the automation of discrete solution, that is, the solution oflinear and nonlinear systems arising from the automated discretization of variationalproblems, interfaces to state-of-the-art libraries for linear algebra havebeenimplementedaspart ofDOLFIN.Ongoingworkisnowseekingtoautomateerror control by automated error estimation and adaptivity as part of FEniCS.3.8 OutlookIn the following chapters, we return to specific aspects of the automation of thefinite element method. In the next chapter, we review a number of commonand unusual finite elements, including the standard Lagrange elementsbut alsosomemoreexoticelements. InChapter5,wethendiscusstheautomatedgenerationof finite element nodal basis functions from a given finite element definition(K, P K , L K ). In Chapter 6, we consider general finite element variational formsarising from the discretization of PDEs and discuss the automated assembly ofthe corresponding discrete operators in Chapter 7. We then discuss specific optimizationstrategies for form evaluation in Chapters ??–??.◮ Author note: This section needs to be reworked when the following chapters have materialized.3.9 Historical NotesIn 1915, Boris Grigoryevich Galerkin formulated a general method for solvingdifferential equations. [Gal15] A similar approach was presented sometimeearlier by Bubnov. Galerkin’s method, or the Bubnov–Galerkin method, wasoriginally formulated with global polynomials and goes back to the variationalprinciples of Leibniz, Euler, Lagrange, Dirichlet, Hamilton, Castigliano [Cas79],Rayleigh [Ray70] and Ritz [Rit08]. Galerkin’s method with piecewise polynomialspaces (ˆV h , V h ) is known as the finite element method. The finite elementmethod was introduced by engineers for structural analysis in the 1950s andwas independently proposed by Courant in 1943 [Cou43]. The exploitation ofthe finite element methodamong engineers and mathematicians exploded in the1960s. Since then, the machinery of the finite element method has been expandedand refined into a comprehensive framework for design and analysis of45

The Finite Element MethodFigure 3.12: Boris Galerkin (1871–1945), inventor of Galerkin’s method.numerical methods for differential equations, see [ZTZ67, SF73, Cia76, Cia78,BCO81, Hug87, BS94] Recently, the quest for compatible (stable) discretizationsof mixed variational problems has led to the introduction of finite element exteriorcalculus. [AFW06a]Work on a posteriori error analysis of finite element methods dates back tothe pioneering work of Babuška and Rheinboldt. [BR78]. Important referencesinclude the works [BW85, ZZ87, EJ91, EJ95a, EJ, EJ95b, EJ95c, EJL98, AO93]and the reviews papers [EEHJ95, Ver94, Ver99, AO00, BR01].◮ Author note: Need to check for missing/inaccurate references here.◮ Editor note: Might use a special box/layout for historical notes if they appear in manyplaces.46

CHAPTER4Common and Unusual Finite ElementsBy Robert C. Kirby, Anders Logg and Andy R. TerrelChapter ref: [kirby-6]This chapter provides a glimpse of the considerable range of finite elementsin the literature and the challenges that may be involved with automating “all”the elements. Many of the elements presented here are included in the FEniCSproject already; some are future work.4.1 Ciarlet’s Finite Element DefinitionAs discussed in Chapter 3, a finite element is defined by a triple (K, P K , L K ),where• K ⊂ R d is a bounded closed subset of R d with nonempty interior and piecewisesmooth boundary;• P K is a function space on K of dimension n K < ∞;• L K = {l K 1 , l K 2 , . . .,l K n K} is a basis for P K ′P K ).(the bounded linear functionals onThis definition was first introduced by Ciarlet in a set oflecture notes[Cia75]and became popular after his 1978 book [Cia78, Cia02]. It remains the standarddefinition today, see for example [BS08]. Similar ideas were introduced earlierin [CR72] which discusses unisolvence of a set of interpolation points Σ = {a i } i .This is closely related to the unisolvence of L K . In fact, the set of functionals L Kis given by l K i (v) = v(a i). It is also interesting to note that the Ciarlet triple was47

Common and Unusual Finite Elementsoriginally written as (K, P, Σ) with Σ denoting L K . Conditions for uniquely determiningapolynomialbasedoninterpolation offunctionvaluesandderivativesat a set of points was also discussed in [?], although the term unisolvence wasnot used.4.2 NotationIt is common to refer to the space of linear functionals L K as the degrees of freedomof the element (K, P K , L K ). The degrees of freedom are typically given bypoint evaluation or moments of function values or derivatives. Other commonlyused degrees of freedom are point evaluation or moments of certain componentsoffunction values, such asnormal or tangential components,but also directionalderivatives. We summarize the notation used to indicate degrees of freedomgraphically in Figure 4.1. A filled circle at a point ¯x denotes point evaluation atthat point,l(v) = v(¯x).We note that for a vector valued function v with d components, a filled circle denotesevaluationofall componentsand thuscorrespondsto d degreesoffreedom,l 1 (v) = v 1 (¯x),l 2 (v) = v 2 (¯x),l 3 (v) = v 3 (¯x).An arrow denotes evaluation of a component of a function value in a given direction,such as a normal component l(v) = v(¯x) · n or tangential componentl(v) = v(¯x) · t. A plain circle denotes evaluation of all first derivatives, a linedenotes evaluation of a directional first derivative such as a normal derivativel(v) = ∇v(¯x) · n. A dotted circle denotes evaluation of all second derivatives. Finally,acirclewithanumberindicatesanumberofinteriormoments(integrationagainst functions over the domain K).48

Robert C. Kirby, Anders Logg and Andy R. Terrelpoint evaluationpoint evaluation of directional componentpoint evaluation of all first derivativespoint evaluation of directional derivativepoint evaluation of all second derivatives3interior momentsFigure 4.1: Notation49

4.3 The Argyris Element4.3.1 DefinitionCommon and Unusual Finite ElementsThe Argyris triangle [AFS68, Cia02] is based on the space P K = P 5 (K) of quinticpolynomials over some triangle K. It can be pieced together with full C 1 continuitybetween elements with C 2 continuity at the vertices of a triangulation.Quintic polynomials in R 2 are a 21-dimensional space, and the dual basis L Kconsists of six degrees of freedom per vertex and one per each edge. The vertexdegrees of freedom are the function value, two first derivatives to specify thegradient, and three second derivatives to specify the unique components of the(symmetric) Hessian matrix.Figure 4.2: The quintic Argyris triangle.4.3.2 Historical notesTheArgyriselement[AFS68]wasfirstcalledtheTUBAelementandwasappliedtofourth-orderplate-bending problems. Infact, asCiarlet pointsout[Cia02], theelement also appeared in an earlier work by Felippa [Fel66].The normal derivatives in the dual basis for the Argyris element prevent itfrom being affine-interpolation equivalent. This prevents the nodal basis frombeing constructed on a reference cell and affinely mapped. Recent work byDominguez and Sayas [?] has developed a transformation that corrects this issueand requires less computational effort than directly forming the basis oneach cell in a mesh.The Argyris element can be generalized to polynomial degrees higher thanquintic, still giving C 1 continuity with C 2 continuity at the vertices [?]. TheArgyris elementalso makes an appearance in exact sequences offinite elements,where differential complexes are used to explain the stability of many kinds offinite elements and derive new ones [AFW06a].50

Robert C. Kirby, Anders Logg and Andy R. Terrel4.4 The Brezzi–Douglas–Marini element4.4.1 DefinitionThe Brezzi–Douglas–Marini element [BDM85b] discretizes H(div). That is, itprovides a vector field that may be assembled with continuous normal componentsso that global divergences are well-defined. The BDM space on a simplexin d dimensions (d = 2, 3) consists of vectors of length d whose components arepolynomials of degree q for q ≥ 1.3 8Figure 4.3: The linear, quadratic and cubic Brezzi–Douglas–Marini triangles.The degrees of freedom for the BDM triangle include the normal componenton each edge, specified either by integral moments against P q or the value of thenormalcomponent at q +1 pointsper edge. For q > 1, thedegrees of freedom alsoinclude integration against gradients of P q (K) over K. For q > 2, the degrees offreedom also include integration against curls of b K P q−2 (K) over K, where b K isthe cubic bubble function associated with K.◮ Author note: What about tets? Will also make up for the empty space on the next page.The BDM element is also defined on rectangles and boxes, although it hasquite a different flavor. Unusually for rectangular domains, it is not definedusing tensor products of one-dimensional polynomials, but instead by supplementingpolynomials of complete degree [P q (K)] d with extra functions to makethe divergence onto P q (K). The boundary degrees of freedom are similar to thesimplicialcase,buttheinternaldegreesoffreedomareintegralmomentsagainst[P q (K)] d .4.4.2 Historical notesThe BDM element was originally derived in two dimensions [BDM85b] as analternative to the Raviart–Thomas element using a complete polynomial space.51

Common and Unusual Finite ElementsExtensions to tetrahedra came via the “second-kind” elements of Nédélec [?] aswell as in Brezzi and Fortin [BF91]. While Nédélec uses quite different internaldegrees of freedom (integral moments against the Raviart–Thomas spaces), thedegrees of freedom in Brezzi and Fortin are quite similar to [BDM85b].A slight modification of the BDM element constrains the normal componentson the boundary to be of degree q − 1 rather than q. This is called the Brezzi–Douglas–Fortin–Marini or BDFM element [BF91]. In similar spirit, elementswith differing orders on the boundary suitable for varying the polynomial degreebetween triangles were derived in [BDM85a]. Besides mixed formulationsof second-order scalar elliptic equations, the BDM element also appears in elasticity[AFW07],whereitisseenthateachrowofthestresstensormaybeapproximatedin a BDM space with the symmetry of the stress tensor imposed weakly.◮ Author note: Fill up the blank space here. Adding a discussion and possibly a figurefor tets should help.52

Robert C. Kirby, Anders Logg and Andy R. Terrel4.5 The Crouzeix–Raviart element4.5.1 DefinitionThe Crouzeix–Raviart element [CR73] most commonly refers to a linear nonconformingelement. It uses piecewise linear polynomials, but unlike the Lagrangeelement, the degrees of freedom are located at edge midpoints ratherthan at vertices. This gives rise to a weaker form of continuity, but it is still asuitable C 0 -nonconforming element. The extension to tetrahedra in R 3 replacesthe degrees of freedom on edge midpoints by degrees of freedom on face midpoints.◮ Author note: What other element does it refer to? Sounds like there may be several, butI just know about this one.Figure 4.4: The linear Crouzeix–Raviart triangle.4.5.2 Historical notesCrouzeix and Raviart developed two simple Stokes elements, both using pointwiseevaluation for degrees of freedom. The second element used extra bubblefunctions to enrich the typical Lagrange element, but the work of Crouzeix andFalk [CF89] later showed that the bubble functions were in fact not necessaryfor quadratic and higher orders.◮ Author note: The discussion in the previous paragraph should be expanded so it statesmore explicitly what this has to do with the CR element.The element is usually associated with solving the Stokes problem but hasbeen used for linear elasticity [HL03] and Reissner-Mindlin plates [AF89] as aremedy for locking. There is an odd order extension of the element from Arnoldand Falk.◮ Author note: Missing reference here to odd order extension.53

4.6 The Hermite Element4.6.1 DefinitionCommon and Unusual Finite ElementsTheHermite element[Cia02]generalizes theclassic cubic Hermiteinterpolatingpolynomials on the line segment. On the triangle, the space of cubic polynomialsis ten-dimensional, and the ten degrees of freedom are point evaluation at thetriangle vertices and barycenter, together with the components of the gradientevaluated at the vertices. The generalization to tetrahedra is analagous.Figure 4.5: The cubic Hermite triangle.Unlike the cubic Hermite functions on a line segment, the cubic Hermite triangleand tetrahedron cannot be patched together in a fully C 1 fashion.4.6.2 Historical notesHermite-type elements appear in the finite element literature almost from thebeginning,appearingatleastasearlyastheclassicpaperofCiarletandRaviart[CR72].Theyhavelongbeenknownasuseful C 1 -nonconformingelements[Bra07,Cia02].Underaffinemapping,theHermiteelementsformaffine-interpolationequivalentfamilies. [BS08].54

Robert C. Kirby, Anders Logg and Andy R. Terrel4.7 The Lagrange Element4.7.1 DefinitionThebest-knownandmostwidelyusedfiniteelementistheLagrange P 1 element.Ingeneral,theLagrange elementuses P K = P q (K),polynomialsofdegree q on K,andthedegreesoffreedomare simplypointwiseevaluation atanarray ofpoints.While numerical conditioning and interpolation properties can be dramaticallyimproved by choosing these points in a clever way [?], for the purposes of thischapter the points may be assumed to lie on an equispaced lattice.◮ Author note: Missing reference for statement about node placement.Figure 4.6: The linear Lagrange interval, triangle and tetrahedron.Figure 4.7: The quadratic Lagrange interval, triangle and tetrahedron.55

Common and Unusual Finite ElementsFigure 4.8: The Lagrange P q triangle for q = 1, 2, 3, 4, 5, 6.4.7.2 Historical notesReams could be filled with all the uses of the Lagrange elements. The Lagrangeelement predates the modern study of finite elements. The lowest-order triangleis sometimes called the Courant triangle, after the seminal paper [Cou43] inwhich variational techniques are considered and the P 1 triangle is usedto derivea finite difference method. The rest is history.◮ Author note: Expand the historical notes for the Lagrange element. As far as I can see,Bramble and Zlamal don’t seem to be aware of the higher order Lagrange elements (onlythe Courant triangle). Their paper from 1970 focuses only on Hermite interpolation.56

Robert C. Kirby, Anders Logg and Andy R. Terrel4.8 The Morley Element4.8.1 DefinitionThe Morley triangle [Mor68] is a simple H 2 -nonconforming quadratic elementthat is used in fourth-order problems. The function space is simply P K = P 2 (K),the six-dimensional space of quadratics. The degrees of freedom consist of pointwiseevaluation at each vertex and the normal derivative at each edge midpoint.It is interesting that the Morley triangle is neither C 1 nor even C 0 , yet it issuitable for fourth-order problems, and is the simplest known element for thispurpose.Figure 4.9: The quadratic Morley triangle.4.8.2 Historical notesThe Morley element was first introduced to the engineering literature by Morleyin 1968 [Mor68]. In the mathematical literature, Lascaux and Lesaint [LL75]considereditinthecontextofthepatchtestinastudyofplate-bendingelements.◮ Author note: Fill up page.57

4.9 The Nédélec Element4.9.1 DefinitionCommon and Unusual Finite ElementsThe widely celebrated H(curl)-conforming elements of Nédélec [?, ?] are muchused in electromagnetic calculations and stand as a premier example of thepower of “nonstandard” (meaning not lowest-order Lagrange) finite elements.2 6Figure 4.10: The linear, quadratic and cubic Nédélec triangles.On triangles, the function space P K may be obtained by a simple rotationof the Raviart–Thomas basis functions, but the construction of the tetrahedralelement is substantially different. In the lowest order case q = 1, the space P Kmay be written as functions of the formv(x) = α + β × x,where α and β are vectors in R 3 . Hence, P K contains all vector-valued constantfunctions and some but not all linears. In the higher order case, the functionspace may be written as the direct sumwhereP K = [P q−1 (K)] 3 ⊕ S q ,S q = {v ∈ [ ˜P q (K)] 3 : v · x = 0}.Here, ˜P q (K) is the space of homogeneous polynomials of degree q on K. An alternatecharacterization of P K is that it is the space of polynomials of degree q + 1onwhich the qth poweroftheelastic stresstensorvanishes. The dimension of P qis exactlyq(q + 2)(q + 3)n K = .2◮ Author note: What is the qth power of the elastic stress tensor?58

Robert C. Kirby, Anders Logg and Andy R. TerrelSimplex H(div) H(curl)K ⊂ R 2 RT q−1 Pq −Λ1 (K)BDM qP q Λ 1 (K)NED q−1 (curl) —K ⊂ R 3 RT q−1 = NED 1 q−1 (div) P− q Λ2 (K)BDM q = NED 2 q (div) P qΛ 2 (K)NED 1 q−1 (curl)NED 2 q (curl)P− q Λ1 (K)P qΛ 1 (K)Table 4.1: Nedelec elements of the first and second kind and their relation to theRaviart–ThomasandBrezzi–Douglas–Marini elementsaswellastothenotationof finite element exterior calculus.◮ Author note: What is the dimension on triangles?The degrees of freedom are chosen to ensure tangential continuity betweenelements and thus a well-defined global curl. In the lowest order case, the sixdegreesoffreedomare theaverage value ofthetangential componentalong eachedge of the tetrahedron, hence the term “edge elements”. In the more generalcase, the degrees of freedom are the q − 1 tangential moments along each edge,moments of the tangential components against (P q−2 ) 2 on each face, and momentsagainst (P q−3 ) 3 in the interior of the tetrahedron.Fortetrahedra,therealsoexistsanotherfamilyofelementsknownasNedelecelements of the second kind, appearing in [?]. These have a simpler functionspace at the expense of more complicated degrees of freedom. The second kindspace of order q is simply vectors of polynomials of degree q. The degrees offreedom are integral momentsof degree q along each edge togetherwith integralmoments against lower-order first-kind bases on the faces and interior.◮ Author note: Note different numbering compared to RT, starting at 1, not zero.4.9.2 Historical notesNédélec’s original paper [?] provided rectangular and simplicial elements forH(div)and H(curl)basedonincompletefunctionspaces. Thisbuiltonearliertwodimensionalwork for Maxwell’s equations [AGSNR80] and extended the work ofRaviart and Thomas for H(div) to three dimensions. The second kind elements,appearing in [?], extend the Brezzi–Douglas–Marini triangle [BDM85b] to threedimensionsand curl-conforming spaces. We summarize the relation betweentheNedelec elements of first and second kind with the Raviart–Thomas and Brezzi–Douglas–Marini elements in Table 4.1.In many ways, Nédélec’s work anticipates the recently introduced finite elementexterior calculus [AFW06a], where the first kind spaces appear as P − q Λ kspaces and the second kind as P q Λ k . Moreover, the use of a differential operator59

Common and Unusual Finite Elements(the elastic strain) in [?] to characterize the function space foreshadows the useof differential complexes [AFW06b].◮ Author note: Should we change the numbering of the Nedelec elements and Raviart–Thomas elements to start at q = 1?60

Robert C. Kirby, Anders Logg and Andy R. Terrel4.10 The PEERS Element4.10.1 DefinitionThe PEERS element [ABD84] provides a stable tensor space for discretizingstress in two-dimensional mixed elasticity problems. The stress tensor σ is representedas a 2 × 2 matrix, each row of which is discretized with a vector-valuedfiniteelement. Normally, oneexpectsthestresstensortobesymmetric,althoughthe PEERS element works with a variational formulation that enforces this conditionweakly.The PEERS element is based on the Raviart–Thomas element described inSection 4.11. If RT 0 (K) is the lowest-order Raviart-Thomas function space ona triangle K and b K is the cubic bubble function that vanishes on ∂K, then thefunction space for the PEERS element is given byP K = [RT 0 (K) ⊕ span{curl(b K )}] 2 .?Figure 4.11: The PEERS triangle. One vector-valued component is shown.◮ Author note: Which degrees of freedom in the interior? The curl?◮ Author note: Is this really an element? We could also introduce other mixed elementslike Taylor–Hood. But perhaps it’s suitable to include it since it is not a trivial combinationof existing elements (the extra curl part).4.10.2 Historical notesDiscretizing the mixed form of planar elasticity is quite a difficult task. Polynomialspaces of symmetric tensors providing inf-sup stability are quite rare, onlyappearing in the last decade [AW02]. A common technique is to relax the symmetryrequirementofthetensor,imposingitweaklyinavariationalformulation.61

Common and Unusual Finite ElementsThis extended variational form requires the introduction of a new field discretizingtheassymetricportionofthestresstensor.WhenthePEERSelementisusedforthestress,thedisplacement isdiscretized inthespace ofpiecewiseconstants,andtheasymmetricpartisdiscretizedin thestandardspaceofcontinuouspiecewiselinear elements.The PEERS element was introduced in [ABD84], and some practical details,including postprocessing and hybridization strategies, are discussed in [AB85].62

Robert C. Kirby, Anders Logg and Andy R. Terrel4.11 The Raviart–Thomas Element4.11.1 DefinitionTheRaviart–Thomaselement,liketheBrezzi–Douglas–MariniandBrezzi–Douglas–Fortin–Marini elements, is an H(div)-conforming element. The space of order qisconstructedtobethesmallestpolynomialspacesuchthatthedivergence mapsRT q (K) onto P q (K). The function space P K is given byP K = P q−1 (K) + xP q−1 (K).The lowest order Raviart–Thomas space thus consists of vector-valued functionsof the formv(x) = α + βx,where α is a vector-valued constant and β is a scalar constant.On triangles, the degrees of freedom are the moments of the normal componentup to degree q, or, alternatively, the normal component at q + 1 points peredge. For higher order spaces, these degrees of freedom are supplemented withintegrals against a basis for [P q−1 (K)] 2 .2 6Figure 4.12: The zeroth order, linear and quadratic Raviart–Thomas triangles.4.11.2 Historical notesThe Raviart–Thomas element was introduced in [RT77] in the late 1970’s, thefirst element to discretize the mixed form of second order elliptic equations.Shortlythereafter, it wasextendedto tetrahedraand boxesby Nédélec [?]andsois sometimes referred to as the Raviart–Thomas–Nédélec element or a first kindH(div) element.On rectangles and boxes, there is a natural relation between the lowest orderRaviart–Thomas element and cell-centered finite differences. This was explored63

Common and Unusual Finite Elementsin [RW83], where a special quadrature rule was used to diagonalize the massmatrix and eliminate the flux unknowns. Similar techniques are known for triangles[ADK + 98], although the stencils are more complicated.64

4.12 SummaryRobert C. Kirby, Anders Logg and Andy R. TerrelNotation Element family L K dim P K ReferencesARG 5 Quintic Argyris 21BDM 1 Brezzi–Douglas–Marini 6BDM 2 Brezzi–Douglas–Marini 3 12BDM 3Brezzi–Douglas–Marini820CR 1 Crouzeix–Raviart 3HERM q Hermite 1065

Common and Unusual Finite ElementsP 1 Lagrange 3P 2 Lagrange 6P 3 Lagrange 10MOR 1 Morley 6NED 1 Nédélec 3NED 2Nédélec28NED 3Nédélec61566

Robert C. Kirby, Anders Logg and Andy R. TerrelPEERSPEERS??RT 0 Raviart–Thomas 3RT 0 Raviart–Thomas 2 8RT 0 Raviart–Thomas 6 1567

Common and Unusual Finite Elements◮ Author note: Add references to table.◮ Author note: Indicate which elements are supported by FIAT and SyFi.◮ Author note: Include formula for space dimension as function of q for all elements.68

CHAPTER5Constructing General Reference Finite ElementsBy Robert C. Kirby and Kent-Andre MardalChapter ref: [kirby-1]◮ Editor note: Reuse color figures from chapter [kirby-7] for RT elements etc.5.1 IntroductionThe finite element literature contains a huge collection of approximating spacesand degrees of freedom, many of which are surveyed in Chapter ??, Some applications,such asCahn-Hilliard andotherfourth-orderproblems, can benefit fromvery smooth finite element bases. While porous media flow requires vector fieldsdiscretized by piecewise polynomial functions with normal components continuousacross cell boundaries. Many problems in electromagnetism call for thetangentially continuous vector fields obtained by using Nedelec elements [?, ?].Manyelementsarecarefully designedtosatisfy theinf-supcondition [?,?], originallydevelopedtoexplainstabilityofdiscretizationsofincompressibleflowproblems.Additionally, someproblemscall forlow-orderdiscretizations, while othersare effectively solved with high-order polynomials.Whiletheautomaticgenerationofcomputerprogramsforfiniteelementmethodsrequires one to confront the panoply of finite element families found in theliterature, it also provides a pathway for wider employment of Raviart-Thomas,Nedelec, and other difficult-to-program elements. Ideally, one would like to describethe diverse finite element spaces at an abstract level, whence a computercode discerns how to evaluate and differentiate their basis functions. Such goalsare in large part accomplished by the FIAT and SyFi projects, whose implemen-69

Constructing General Reference Finite Elementstations are described in later chapters.Projects like FIAT and SyFi may ultimately remain mysterious to the enduser of a finite element system, as interactions with finite element bases aretypically mediated through tools that construct the global finite element operators.The end user will typically be satisfied if two conditiones are met. First, afinite element system should support the common elements used in the applicationarea of interest. Second, it should provide flexibility with respect to order ofapproximation.It is entirely possible to satisfy many users by a priori enumerating a listof finite elements and implement only those. At certain times, this would evenseem ideal. For example, after the rash of research that led to elements such asthe Raviart-Thomas-Nedelec and Brezzi-Douglas-Marini families, developmentof new families slowed considerably. Then, more recent work of lead forth byArnold, Falk, and Winther in the context of exterior calculus has not only ledto improved understanding of existing element families, but has also brought anew wave of elements with improved proprerties. A generative system for finiteelement bases can far more readily assimilate these and future developments.Automation also provides generality with respect to the order of approximationthat standard libraries might not otherwise provide. Finally, the end-user mighteven easilly define his own new element and test its numerical properties beforeanalyzing it mathematically.In the present chapter, wedescribe the mathematical formulation underlyingsuch projects as FIAT [?, ?], SyFi [?, AM09] and Exterior [?]. This formulationstarts from definitions of finite elements as given classically by Ciarlet [?]. Itthen uses basic linear algebra to construct the appropriate nodal basis for anabstract finite element in terms of polynomials that are easy to implement andwell-behaved in floating point arithmetic. We focus on constructing nodal basesfor a single, fixed reference element. As we will see in Chapter ??, form compilerssuch as ffc [Log07] and sfc [?] will work in terms of this single referenceelement.Other approaches exist in the literature, such as the hierarchical bases studiedby Szabo and Babuska [?] and extended to H(curl) and H(div) spaces in worksuch as [?]. These can provide greater flexibility for refining the mesh and polynomialdegree in finite element methods, but they also require more care duringassembly and are typically constructed on a case-by-case basis for each elementfamily. When they are available, they may be cheaper to construct than usingthe technique studied here, but this present technique is easier to apply to an“arbitrary” finite element and so is considered in the context of automatic software.70

5.2 PreliminariesRobert C. Kirby and Kent-Andre MardalBoth FIAT and SyFi work a slightly modified version of the abstract definition ofa finite element introduced by Ciarlet [?].Definition 5.1 (A Finite Element). A finite element is a triple (K, P, N) with1. K ⊂ R d a bounded domain with piecewise smooth boundary.2. A finite-dimensional space P of BC m (K, V ), where V is some normed vectorspace and BC m is the space of m-times bounded and continuosly differentiablefunctions from K into V.3. Adualbasisfor P,written N = {L i } dim Pi=1. Theseareboundedlinearfunctionalsin (BC m (K, V )) ′ and frequently called the nodes or degrees of freedom.In this definition, the term “finite element” refers not only to a particularcell in a mesh, but also to the associated function space and degrees of freedom.Typically, the domain K is some simple polygonal or polyhedral shape and thefunction space P consists of polynomials.Given a finite element, a concrete basis, often called the nodal basis, for thiselement can be computed by using the following defintion.Definition 5.2. The nodal basis for a finite element (K, P, N) be a finite elementis the set of functions {ψ i } dim Pi=1 such that for all 1 ≤ i, j ≤ dim P,L i (ψ j ) = δ i,j . (5.1)The main issue at hand in this chapter is the construction of this nodal basis.For any given finite element, one may construct the nodal basis explicitlywith elementary algebra. However, this becomes tedious as we consider manydifferent familes of elements and want arbitrary order instances of each family.Hence, we present a linear algebraic paradigm here that undergirds computerprograms for automating the construction of nodal bases.In addition to the construction of the nodal base we need to keep in mindthat finite elements are patched together to form a piecewise polynomial fieldover a mesh. The fitness (or stability) of a particular finite element method fora particular problem relies on the continuity requirements of the problem. Thedegrees of freedom of a particular element are often choosen such that thesecontinuity requirements are fulfilled.Hence, in addition to computing the nodal basis, the mathematical structuredeveloped here simplifies software for the following tasks:1. Evaluate the basis function and their derivatives at points.2. Associtate the basis function (or degree of freedom) with topological facetsof K such as its vertices, edges and its placement on the edges.71

Constructing General Reference Finite Elements3. Associtate the basis function with additional metadata such as a sign ora mapping that should be used together with the evaluation of the basisfunctions or its derivatives.4. Providerulesforthedegreesoffreedomappliedtoarbitraryinputfunctionsdetermined at run-time.The first of these is relatively simple in the framework of symbolic computation(SyFi), but they require more care if an implementation uses numericalarithmetic (FIAT). The middle two encode the necessary information for a clientcode to transform the reference element and assemble global degrees of freedomwhen the finite element is either less or more than C 0 continuous. The final taskmay take the form of a point at which data is evaluated or differentiated or moregenerally as the form of a sum over points and weights, much like a quadraturerule.A common practice, employed throuought the FEniCS software and in manyother finite element codes, is to map the nodal basis functions from this referenceelement to each cell in a mesh. Sometimes, this is as simple as an affinechange of coordinates; in other cases it is more complicated. For completeness,we briefly describe the basics of creating the global finite elements in terms of amapped reference element. Let therefore T be a polygon and ˆT the correspondingreference polygon. Between the coordinates x ∈ T and xi ∈ ˆT we use themappingx = G(xi) + x 0 , (5.2)and define the Jacobian determinant of this mapping asJ(x) =∂G(xi)∣ ∂xi ∣ . (5.3)The basis functions are defined in terms of the basis function on the referenceelement asN j (x) = ˆN j (xi), (5.4)where ˆN j is basis function number j on the reference element. The integral canthen be performed on the reference polygon,∫ ∫f(x) dx = f(xi) Jdxi, (5.5)Tand the spatial derivatives are defined by the derivatives on the reference elementand the geometry mapping simply by using the chain rule,ˆT∂N∂x i= ∂N∂ξ j∂ξ j∂x i. (5.6)72

RESEARCH EXPERIENCE2006 Guest Reviewer, Quality of Life Research.2004, 2005 Guest Reviewer, Health Psychology.2004 Guest Reviewer, British Journal of Health Psychology.6/03 - 8/05 NIMH Predoctoral TraineeUK Dept. of Behavioral Science: Lexington, KY.4/02 -4/04 Research ConsultantBethesda Naval Hospital Orofacial Pain Center.6/00 - 6/03 DOD Research TraineeUK Dept. of Behavioral Science: Lexington, KY.6/00 - 8/05 Research Assistant, UK Orofacial Pain Center: Lexington, KY.8/99 - 8/05 Research Assistant, UK Dept. of Psychology: Lexington, KY.8/98 - 5/99 Research Assistant, West Chester University Dept. of Psychology.CLINICAL EXPERIENCE8/06 – present Postdoctoral Fellow in Medical Psychology and PainRehabilitation.Mayo Graduate School of Medicine, Mayo Clinic.Rochester, MN8/05 – 8/06 Psychology Intern, Federal Medical Center, Lexington, KY.Pre-doctoral APA Accredited Internship.Major rotation: Behavioral Medicine.9/00 - 8/05 Student Therapist, JGH Psychological Services Center6/00 - 8/05 Psychology Intern, UK Orofacial Pain CenterTEACHING EXPERIENCE1/00 – 5/00 Teaching Assistant, University of Kentucky.Laboratory Instructor for Applications of Statistics in Psychology.Supervisor: Sung Hee Kim, Ph.D.8/99 - 12/99 Teaching Assistant, University of Kentucky.Laboratory Instructor for Introduction to Psychology Course.Supervisor: Jonathan Golding, Ph.D.59John E. SchmidtAugust, 2006

Constructing General Reference Finite Elements5.3.2 Polynomial spacesIn Definition 5.1 we defined the finite element in terms of a finite dimensionalfunction space P. Although it is not strictly neccessary, the functions used infinite elements are typically polynomials. While our general strategy will inprinciple accomodate non-polynomial bases, we only deal with polynomials inthis chapter. A common space is P d n , the space of polynomials of degree n in Rd .There are many different ways to represent P d n . We will discuss the power basis,orthogonal bases, and the Bernstein basis. Each of these bases has explicitrepresentationsand well-known evaluation algorithms. In addition to P d n we willalso for some elements need H d n , the space of homogenous polynomials of degreen in d variables.Typically, the developed techniques here are used on simplices, where polynomialsdo not have a nice tensor-product structure. Some rectangular elementsliketheBrezzi-Douglas-Marinifamily[],however,arenotbasedontensorproductpolynomial spaces, and the techniques described in this chapther applyin that case aswell. SyFi hasexplicit support for rectangular domains, but FIATdoes not.Power basisOn a line segment, P 1 n = P n the monomial, or power basis is {x i } n i=0 , so that anyv ∈ P n is written asn∑v = a 0 + a 1 x + . . .a n x n = a i x i . (5.14)In 2D on triangles, P n is spanned by functions on the form {x i y j } i+j≤ni,j=0 , with asimilar definition in three dimensions.This basis is quite easy to evaluate, differentiate, and integrate but is veryill-conditioned in numerical calculations.Legendre basisA popular polynomial basis for polygons that are either intervals, rectangles orboxes are the Legendre polynomials. This polynomial basis is also usable torepresent polynomials of high degree. The basis is definedon the interval [−1, 1],asL k (x) = 1 d k2 k k! dx k (x2 − 1), k = 0, 1, . . .,A nice feature with these polynomials is that they are orthogonal with respect tothe L 2 inner product, i.e.,∫ 1{ 2L k (x)L l (x) dx =, k = l,2k+10, k ≠ l,−174i=0

Robert C. Kirby and Kent-Andre MardalThe Legendre polynomials are extended to rectangular domains in 2D and 3Dsimply by taking the tensor product,L k,l,m (x, y, z) = L k (x)L l (y)L m (z).and P n is defined by functions on the form (in 3D),Dubiner basisv =k,l,m 0. As a possible disadvantage, the basislacks rotational symmetry that can be found in other bases.Bernstein basisThe Bernstein basis is another well-conditioned basis that can be used in generatingfinite element bases. In 1D, the basis functions take the form,( iB i,n = xn)i (1 − x) n−i , i = 0, . . ., n,75

Constructing General Reference Finite Elements2 6Figure 5.3: Triangular Raviart–Thomas elements of order 0, 1, and 2.freedom as in P 2 3 . The advantage of the Hermite element is that it has continuousderivatives at the vertices (it will however not neccessarily result in a H 2conforming approximation).Example 5.3. The Raviart-Thomas ElementIn Figure 5.3 we illustrate the Raviart-Thomas element. In contrast to theprevious elements, this element has a vector-valued function space. The arrowsrepresent normal vectors while the double dots indicate pointwise evaluation inboth the x− and y− direction. Starting at n = 0, the dimension of RT n is (n +1)(n + 3). The Raviart-Thomas element is typically used for approximations inH(div).Remark5.4.1. Earlierwesawcommonbasesfor P n d ,butelementsliketheRaviart-Thomas element described above use function spaces other than P n d or vectors ortensors thereof. To fix this, we must either construct a basis for the appropriatepolynomial space or work with a different element that includes full vectors ofP n d . In the case of H(div) elements, this corresponds to using the Brezzi-Douglas-Fortin-Marini elements.5.4.1 Bases for other polynomial spacesThe basis presented above are suitable for constructing many finite elements,but as we have just seen, they do not work in all cases. The Raviart-Thomasfunction space,( )( )P2 2 xn ⊕ H 2 ny,requires a basis for vectors of polynomials to be supplemented with an extradim H 2 n = dim P2 n − dim P2 n−1 = n functions. In fact, any n polynomials in P2 n \P2 n−1will work, not just homogeneous polynomials, although the sum above will nolonger be direct.78

Robert C. Kirby and Kent-Andre MardalWhile the Raviart-Thomas element requires supplementing a standard basiswith extra functions, the Brezzi-Douglas-Fortin-Marini triangle uses the functionspace {u ∈ (P2n (K)) 2 : u · n ∈ P 1 n−1 (γ), γ ∈ E(K)} .Obtaining a basis for this space is somewhat more subtle. FIAT and SyFi havedeveloped different but mathematically equivalent solutions. Both rely on recognizingthat the required function space sits inside (P 2 n )2 andcan be characterizedby certain functionals that vanish on this larger space.Three such functionals describe the basis for the BDFM triangle. If µ γ n is theLegendre polynomial of order n on an edge γ, then the functional∫l γ (u) = (u · n)µ γ nγacting on (P 2 n) 2 must vanish on the BDFM space, for the n th degree Legendrepolynomial is orthogonal to all polynomials of lower degree.Now, we define the matrix( ) V1V =V 2 . (5.17)Here, V 1 ∈ R 2dim P2 n−3,2dim P 2 n and V 2 ∈ R 3,2dim P2 n are defined bywhere {φ j } 2dim P2 nj=1 is a basis for (P 2 n) 2 .Consider now the matrix equationV 1ij = L i(φ j ),V 2ij = l i (φ j ),V A t = I 2dim Pn,2dim P n−3, (5.18)where I m,n denotes the m × n identity matrix with I i,j = δ i,j . As before, thecolumns of A still contain the expansion coefficients of the nodal basis functionsψ i in terms of {φ j }. Moreover, V 2 A = 0, which imples that the nodal basis functionsare in fact in the BDFM space.More generally, we can think of our finite element space P being embeddedin some larger space ¯P for which there is a readily usable basis. If we maycharacterize P byP = ∩ dim ¯P −dim Pi=1 l i ,where l i : ¯P → R are linear functionals, then we may apply this technique.In this case, V 1 ∈ R dim P,dim ¯P and V 2 ∈ R dim ¯P −dim P,dim ¯P . This scenario, thoughsomewhat abstract, is applicable not only to BDFM, but also to the Nédélec [?],Arnold-Winther [AW02], Mardal-Tai-Winther [?], Tai-Winther [?], and Bell [] elementfamilies.Again, we summarize the discussion as a proposition.79

Constructing General Reference Finite ElementsProposition 5.4.1. Let (K, P, N) be a finite element with P ⊂ ¯P. dim ¯PLet {φ i }a basis for ¯P. Suppose that there exist functionals {l i } dim ¯P −dim Pi=1 on ¯P such thatP = ∩ dim ¯P −dim Pi=1 null(l i ).i=1 beDefine the matrix A as in (5.18). Then, the nodal basis for P may be expressed aswhere 1 ≤ i ≤ dim P and 1 ≤ j ≤ dim ¯P.ψ i = A ij φ j ,5.5 Operations on the Polynomial spacesHere, we show various important operations may be cast in terms of linear algebra,supposing that they may be done on original basis {φ i } dim Pi=1 .5.5.1 EvaluationIn order to evaluate the nodal basis {ψ i } dim Pi=1 at a given point x ∈ K, one simplycomputes the vectorΦ i = φ i (x)followed by the productψ i (x) ≡ Ψ i = A ij Φ j .Generally, the nodal basis functions are required at an array of points {x j } m j=1 ⊂K. For performance reasons, performing matrix-matrix products may be advantageous.So, define Φ ij = Φ i (x j ) and Ψ ij = Ψ i (x j ). Then all of the nodal basisfunctions may be evaluated by the product5.5.2 DifferentiationΨ ij = A ik Φ kj .Differentiation is more complicated, and also presents more options. We wantto handle the possibility of higher order derivatives. Let α = (α 1 , α 2 , . . .α d ) be amultiindex so thatD α ≡where |α| = ∑ di=1 α i.We want to compute the arrayfor some x ∈ K.∂ |α|∂x α 11 ∂x α 22 . . .∂x α dΨ α i = D α ψ i (x)80d,

Robert C. Kirby and Kent-Andre MardalOne may differentiate the original basis functions {φ i } to produce an arraywhenceΦ α i = D α ψ i (x),Ψ α i = A ij Φ α i .This presupposes that one may conveniently compute all derivatives of the {φ i }.Thisistypically trueinsymboliccomputationorwhenusingthepowerbasis. Alternatively,automatic differentiation [] could be used in the power or Bernsteinbasis. The Dubiner basis, as typically formulated, contains a coordinate singularitythat prevents automatic differentiation from working at the top vertex.Recentwork [] has reformulatedrecurrence relations to allow forthis possibility.If one prefers (or is required by the particular starting basis), one may alsocompute matrices that encode first derivatives acting on the {φ i } and constructhigher derivatives than these. For each coordinate direction x k , a matrix D k isconstructed so that∂φ i∂x i= D k ijφ j .How to do this depends on which bases are chosen. For particular details on theDubiner basis, see []. Then, Ψ α may be computed by5.5.3 IntegrationΨ α i = (D α A) ij Φ j ,Integrationofbasisfunctionsover K,includingproductsofbasisfunctionsand/ortheir derivatives, may be performed numerically, symbolically, or exactly withsome known formula. An important aspect of automation is that it allows generalorders of approximation. While this is not difficult in symbolic calculations,a numerical implementation like FIAT must work harder. If Bernstein polynomialsare used, we may use the formula∫b i 1 bj 2 bk 3 dx = i!j!k! |K|(i + j + k + 2)! 2Kon triangles and a similar formula for lines and tetrahedra to calculate integralsof things expressed in terms of these polynomials exactly. Alternately, if theDubiner basis is used, orthogonality may be used. In either case, when derivativesoccur, it may be as efficient to use numerical quadrature. On rectangulardomains, tensor products of Gauss or Gauss-Lobatto quadrature can be used togive efficient quadrature families to any order accuracy. On the simplex, however,optimal quadrature is more difficult to find. Rules based on barycentricsymmetry[]maybeuseduptoacertain degree (whichisfrequently high enoughin practice). If one needs to go beyond these rules, it is possible to use the mapping(5.15) to map tensor products of Gauss-Jacobi quadrature rules from thesquare to the triangle.81

Constructing General Reference Finite Elements5.5.4 Linear functionalsIntegration, pointwise evaluation and differentiation all are examples of linearfunctionals. In each case, the functionals are evaluated by their action on the{φ i }82

CHAPTER6Finite Element Variational FormsBy Robert C. Kirby and Anders LoggChapter ref: [kirby-5]Summarize notation for variational forms. Introduce global tensor A and elementtensor A K .83

CHAPTER7Finite Element AssemblyBy Anders LoggChapter ref: [logg-3]Overview of general assembly algorithm.85

CHAPTER8Quadrature Representation of Finite ElementVariational FormsBy Kristian B. Ølgaard and Garth N. WellsChapter ref: [oelgaard-2]Summarise work on optimizing quadrature representation using automatedcode generation and address automated selection of best representation.87

CHAPTER9Tensor Representation of Finite Element VariationalFormsBy Anders Logg and possibly othersChapter ref: [logg-4]Overview of tensor representation.89

CHAPTER10Discrete Optimization of Finite Element MatrixEvaluationBy Robert C. Kirby, Matthew G. Knepley, Anders Logg, L. Ridgway Scott and Andy R.TerrelChapter ref: [kirby-4]The tensor constraction structure obtained in the representation results forvariational forms enables not only the construction of a compiler for variationalforms, but an optimizing compiler. For typical variational forms, the referenceelement tensor has significant structure that allows it to be contracted for anarbitrary element tensor in a reduced amount of arithmetic. Reducing the numberof operations based on this structure leads naturally to several problems indiscrete mathematics. This chapter introduces the idea of complexity-reducingrelations and discusses compile-time graph and hypergraph optimization problemsthat form the core of the FErari project.91

CHAPTER11Parallel Adaptive Mesh RefinementBy Johan Hoffman, Johan Jansson and Niclas JanssonChapter ref: [hoffman-4]For many interesting problems one is often interested in rather localized featuresof a solution, for example separation or transition to turbulence in flowproblems. It is often the case that it is to expensive to uniformly refine a meshto such an extent that these features develops. The solution is to only refine themesh in the regions of most interest, or for example where the error is large.This chapter is based on the work in [?]. First a brief summary of the theorybehind mesh refinement is given, followed by a discussion of the issues withparallelrefinementtogetherwithourproposedsolution. Thechapterendswithapresentation of our implementation of a fully distributed parallel adaptive meshrefinement framework on a Blue Gene/L.11.1 A brief overview of parallel computingIn this chapter, parallel computing refers to distributed memory systems withmessage passing. It is assumed that the system is formed by linking computenodes together with some interconnect, in order to form a virtual computer (orcluster) with a large amount of memory and processors.It is also assumed that the data, the mesh in this case is distributed acrossthe available processors. Shared entities are duplicated or “ghosted” on adjacentprocessors. In order to fully represent the original mesh from the smaller distributedsub meshes, it is essential that the information for the shared entitiesare the same for all processors.93

Parallel Adaptive Mesh Refinement11.2 Local mesh refinementLocal mesh refinement has been studied by several authors over the past years.The general idea is to split an element into a set of newones, with the constraintthat the produced mesh must be valid. Usually a mesh is valid if there are no“hanging nodes”, that is no node should be on another element’s facet. How tosplit the element in order to ensure this differs between different methods. Onecommon theme is edge bisection.A common edge bisection algorithm bisects all edges in the element, introducinganew vertexon each edge, and connectingthemtogetherto form thenewelements(seeforexample [?]). Othermethodsbisect onlyoneoftheedges, whichedge dependson the method. Forexample one could select the edge which wheremost recently refined, this method is often referred to as the newest vertex approachand where described in [?]. Anotherpopular edge bisection methodis thelongest edge [?], where one always select the longest edge for refinement. In orderto ensure that the mesh is free of “hanging nodes”, the algorithm recursivelybisects elements until there are no “hanging nodes” left.11.2.1 The challenge of parallel mesh refinementPerforming the refinement in parallel adds additional constraints on the refinementmethod. Not only should the methodprevent “hanging nodes”, it must alsobe guaranteed to terminate in a finite number of steps.Intheparallel case, each processorownsasmallpart ofthedistributedmesh.So if a new vertex is introduced on the boundary between two processors, thealgorithm must ensure that the information propagates to all neighboring processors.Foranalgorithmthatbisectsalledgesinanelement,theproblemreducestoaglobaldecisionproblem,decidingwhichoftheprocessorsinformationthatshouldbe used on all the other processors. But for an algorithm that propagates therefinementlikethelongestedge,theproblembecomesasynchronizationproblemi) to detect and handle refinement propagation between different processors andii) to detect global termination.The first problem could easily be solved by dividing the refinement into twodifferent phases, one local refinement phase and one propagation phase. In thefirst phase elements on the processor are refined with an ordinary serial refinementmethod. This could create non conforming elements on the boundary betweenprocessors. These are fixed by propagating the refinement to the neighboringprocessor. This is done in the second propagation phase. But since thenext local refinement phase could create an invalid mesh, one could get anotherpropagation phase and the possibility of another and so forth. However, if thelongest edge algorithm is used, termination is guaranteed [?]. But the problemis to detect when all these local meshes are conforming, and also when they are94

Johan Hoffman, Johan Jansson and Niclas Janssonconforming at the same time, that means one has to detect global termination,which is a rather difficult problem to solve efficiently, especially for massivelyparallel systems for which we are aiming.There has been some other work related to parallel refinement with edge bisection.For example a parallel newest vertex algorithm has been done by Zhang[?]. Since the algorithm does not need to solve the termination detection problem,scaling is simply a question of interconnect latency. Another work is theparallel longest edge algorithm done by Castaños and Savage [?]. They solve thetermination detection problem with Dijkstras general distributed terminationalgorithm, which simply detects termination by counting messages sent and receivedfrom some controlling processor. However, in both of these methods theyonly used a fairly small amount of processors, less then one hundred, so it isdifficult to estimate how efficient and scalable these algorithms are. For moreprocessors the effects of communication cost and concurrency in the communicationpatterns starts to be important, therefore we tried to design an algorithmthat would scale well for thousands of processors.11.2.2 A modified longest edge bisection algorithmInstead of trying to solve the termination detection problem, one could try tomodify the refinement algorithm in such a way that it would only require onesynchronization step, thus less communication. With less communication overheadit should also scale better for a large number of processors.1. 2. 3.Figure 11.1: An example of the refinement algorithm used. First a set of elementsare marked for refinement (1). The longest edges are found (2), and allelements connected to these edges are finally bisected, the dashed lines in (3).When this work started, DOLFIN did not have a pure longest edge implementation.Instead of recursively fixing “hanging nodes” it bisected elements inpairs (or groups) (see figure 11.1). Since this algorithm always terminate therefinement by bisecting all elements connected to the refined edge, it is a perfectcandidate for an efficient parallel algorithm. If the longest edge is shared bydifferent processors, the algorithm must only propagate the refinement onto all95

Parallel Adaptive Mesh Refinementelements (processors) connected to that edge, but then no further propagation ispossible (see figure 11.2).Cpu 0Cpu 1Cpu 2Figure 11.2: An example of the two simple cases of propagation. The dashedlines refers to how the processors wants to bisect the elements.However, notifying an adjacent processor of propagation does not solve theproblem entirely. As mentioned in section 11.1, all mesh entities shared by severalprocessors must have the same information in order to correctly representthe distributed mesh. The refinement process must therefore guarantee thatall newly created vertices are assigned the same unique information on all theneighboring processors. Another problematic case arises when processors refinethe same edge and the propagation “collides” (see figure 11.2). In this case thepropagation is done implicitly but the processors must decide which of the newinformation to use.Cpu 0Cpu 1Cpu 2Figure11.3: Anexampleoftheproblematiccasewithmultiplepropagations. Thedashed lines refers to how the processors wants to bisect the elements.A more complicated case is when an element receives multiple propagation(possibly from different processors) on different edges (see figure 11.3). Sincethe modified longest edge algorithm only allows one edge to be bisected per element,one of the propagations must be selected and the other one rejected. Thishowever adds a difficulty to the simple algorithm. First of all, how should theprocessors decide upon which edge to be refined? Clearly this could not be donearbitrarily since when a propagation is forbidden, all refinement done aroundthat edge must be removed. Thus, in the worst case it could destroy the entirerefinement.To solve the edge selection problem perfectly one needs an algorithm with aglobal view of the mesh. In two dimensions with a triangular mesh, the problemcould be solved rather simple since each propagation could only come fromtwo different edges (one edge is always facing the interior). By exchanging thedesired propagation edges processors could match theirs selected edges with the96

Johan Hoffman, Johan Jansson and Niclas Janssonpropagated ones, in an attempt to minimize the number of forbidden propagations.However, in three dimensions the problem starts to be such complicatedthat multiple exchange steps are needed in order to solve the problem. Hence, itbecomes too expensive to solve.Insteadwe propose an algorithm which solves the problem using an edge votingscheme. Each processor refines the boundary elements, find their longestedge and cast a vote for it. These votes are then exchanged between processors,which add the received votes to its own set of votes. Now the refinement processrestarts, but instead of using the longest edge criteria, edges are selecteddepending on the maximum numbers of votes. In the case of a tie, the edge isselected depending on a random number assigned to all votes.Onceasetofedgeshasbeen selectedfrom thevotingphase theactually propagationstarts by exchanging these with the other processors. However, the votingcould fail to “pair” refinements together. For example, an element could liebetween two processors which otherwise does not share any common face. Eachof these processors wants to propagate into the neighboring element but on differentedges(seefigure11.4).Sincetheprocessorsontheleftandrightsideoftheelement do not receive the edge vote from each other, the exchange of votes willin this case not help with selecting an edge that would work for both processors.Cpu 0Cpu 1Cpu 2Figure 11.4: An example of the case when edge votes could be missed. Thedashed lines refers to how the processors wants to bisect the elements.To fix this, an additionally exchange step is needed and maybe another andso forth, rendering the perfect fix impossible. Instead, the propagation step endsby exchanging the refined edges which gave rise to a forbidden propagation. Allprocessors could then remove all refinements that these edges introduced, andin the process, remove any hanging nodes on the boundary between processors.11.3 The need of dynamic load balancingFor parallel refinement, there is an additional problem not present in the serialsetting. As one refines the mesh, new vertices are added arbitrarily at anyprocessor. Thus, rendering an initially good load balance useless. Therefore, inorder to sustain a good parallel efficiency the mesh must be repartitioned andredistributed after each refinement, in other words dynamic load balancing isneeded.97

Parallel Adaptive Mesh RefinementFigure 11.5: An example of load balancing were the locality of the data is considered.The mesh is refined three times around the cylinder, and for each step loadbalancing is performed, shading refers to processor id.In the worst case, the load balancing routine must be invoked every time amesh is adapted, it has to be rather efficient, and for our aim, scale well fora large number of processors. There are mainly two different load balancingmethods used today, diffusive and remapping methods. Diffusive methods, likethe physical meaning, by finding a diffusion solution a heavily loaded processor’svertices would move to another processor and in that way smear out theimbalance, described for example in [?, ?]. Remapping methods, relies on thepartitioner’s efficiency of producing good partitions from an already partitioneddataset. In order to avoid costly data movement, a remapping method tries toassign the new partitions to processors in an optimal way. For problems wherethe imbalance occurs rather localized, the remapping methods seems to performbetter [?]. Hence, it maps perfectly to the localized imbalance from local meshrefinement.In this work, we used the load balancing framework of PLUM [?] a remappingload balancer. The mesh is repartitioned according to an imbalance model.Repartitioning is done before refinement, this would in theory minimize datamovement and speedup refinement, since a more balanced number of elementwould be bisected on each processor.11.3.1 Workload modellingThe workload is modelled by a weighted dual graph of the mesh. Let G = (V, E)bethedualgraph ofthemesh, q beoneofthepartitionsandlet w i betthecomputationalwork (weights) assigned to each graph vertex. The processor workload98

Johan Hoffman, Johan Jansson and Niclas Janssonis then defined asW(q) = ∑∀w i ∈qw i (11.1)wherecommunication costsare neglected. Let W avg be the average workload andW max be the maximum, then the graph is considered imbalanced ifW max /W avg > κ (11.2)where the threshold value κ is based on the problem or machine characteristics.This model suits the modified longest edge algorithm (section 11.2.2) perfectly.Since the modifications reduces the algorithm to only have one propagationand/or synchronization step. The workload calculation becomes a localproblem, thus it is rather cheap to compute. So if we let each element representone unit of work, a mesh adapted by the modified algorithm would producea dual graph with vertex weights equal to one or two. Each element is onlybisected once, giving a computational weight of two elements for each elementmarked for refinement.11.3.2 Remapping strategiesRemapping of the new partitions could be done in numerous ways, depending onwhatmetric onetries tominimize. Usually oneoften talks about the twometricsTOTALV and MAXV. MAXV triestominimizetheredistribution timebyloweringthe flow of data, while TOTALV lower the redistribution time by trying to keepthe largest amount of data local, for a more detailed description see [?]. We havechosen to focus on the TOTALV metric, foremost since it much cheaper to solvethen MAXV, and it produces equally good (or even better) balanced partitions.Independent of which metric one tries to solve. The result from the repartitioningis placed in a similarity matrix S, where each entry S i,j is the number ofvertices on processor i which would be placed in the new partition j. In our case,we want to keep the largest amount of data local, hence to keep the maximumrow entry in S local. This could be solved by transforming the matrix S into a bipartitegraphwhereeach edge e i,j isweightedwith S i,j ,theproblemthenreducesto the maximally weighted bipartite graph problem [?].11.4 The implementation on a massively parallel systemThe adaptive refinement method described in this chapter was implemented usingan early parallel version of DOLFIN, for a more detailed description see [?].Parallelization was implemented for a message passing system, and all the algorithmswere designed to scale well for a large number of processors.99

Parallel Adaptive Mesh RefinementThe challenge of implementing a refinement algorithm on a massively parallelsystem is as always the communication latency. In order to avoid that themessage passing dominates the runtime, it is important that the communicationis kept at a minimum. Furthermore, in order to obtain a good parallel efficiency,communication patterns must be design in such way that they are highly concurrent,reducing processors idle time etc.11.4.1 The refinement methodSince element bisection is a local problem, without any communication, the onlypart of the refinement algorithm that has to be well designed for a parallel systemis the communication pattern used for refinement propagation.For small scale parallelism, one could often afford to do the naive approach,loop over all processors and exchange messages without any problem. When thenumber of processors are increased, synchronization, concurrency and deadlockprevention starts to become important factors to considered when designing thecommunication pattern. A simple and reliable pattern is easily implemented asfollows. If the processors send data to the left and receive data from the rightin a circular pattern, all processors would always be busy sending and receivingdata, thus no idle time.Algorithm 1 Communication pattern◮ Editor note: Package algorithmic.styconflicts with algorithmicx.sty. Sort outwhich packages to use for algorithms.The refinement algorithm outlined in 11.2.2 is easily implemented as a loopover all elements marked for refinement. For each element marked for refinementit finds the longest edge and bisect all elements connected to that edge.However, since an element is only allowed to be bisected once, the algorithm isonly allowed to select the longest edge which is part of an unrefined element. Inorder to make this work in parallel, one could structure the algorithm in sucha way that it first refines the shared elements, and propagate the refinement.After that it could refine all interior elements without any communication.Algorithm 2 Refinement algorithm◮ Editor note: Package algorithmic.styconflicts with algorithmicx.sty. Sort outwhich packages to use for algorithms.If we let B be the set of elements on the boundary between processors, R theset of elements marked for refinement. Then by using algorithm 1 we could efficientlyexpress the refinement of shared entities in algorithm 2 with algorithm3.100

Johan Hoffman, Johan Jansson and Niclas JanssonAlgorithm 3 Refinement of shared entities◮ Editor note: Package algorithmic.styconflicts with algorithmicx.sty. Sort outwhich packages to use for algorithms.11.4.2 The remapping schemeThemaximallyweightedbipartitegraphproblemfortheTOTALV metriccouldbesolvedin an optimalway in O(V 2 log V +V E) steps[?]. Recall that thevertices inthe graph are the processors. Calculating the remapping could therefore becomerather expensive if a large number of processors are used. Since the solutiondoes not need to be optimal, a heuristic algorithm with a runtime of O(E) wasused.The algorithm starts by generating a sorted list of the similarity matrix S.It then steps through the list and selects the largest value which belongs to anunassigned partition. It was proven in [?] that the algorithm’s solution is alwaysgreater than half ofthe optimal solution, thusit shouldnot be a problem to solvetheremappingprobleminasuboptimalway. Sortingwasimplemented(asintheoriginal PLUM paper) by a serial binary radix sort (the matrix S were gatheredonto one processor), performing β passes and using 2 r “buckets” for counting.In order to save some memory the sorting was performed per byte of the integerinsteadofthebinaryrepresentation. Sinceeachintegerisrepresentedby4bytes(true even for most 64-bits architectures) the number of passes required wasβ = 4, and since each byte have 8 bits the number of “buckets” needed were 2 8 .However, since the similarity matrix S is of the size P × P where P is thenumber of processors, the sorting will have a runtime of O(P 2 ). This should notcause any problem on a small or medium size parallel computer, as the one usedin the fairly old PLUM paper. But after 128-256 processors the complexity of thesorting starts to dominates in the load balancer. To solve this problem, insteadof sorting S on one processor we implemented a parallel binary radix sort. Theunsorted data of length N was divided into N/P parts which were assigned tothe available processors. The internal β sorting phases were only modified witha parallel prefix calculation and a parallel swap phase (when the elements aremoved to the new “sorted” locations).Algorithm 4 Parallel radix sort◮ Editor note: Package algorithmic.styconflicts with algorithmicx.sty. Sort outwhich packages to use for algorithms.101

Parallel Adaptive Mesh RefinementFigure 11.6: Two of the meshes used during evaluation.11.4.3 Theoretical and experimental analysisThe adaptive refinement method described in this chapter has been successfullytestedona1024nodeBlueGene/L,eachdual-processornodecouldeitherbeusedto run two separate programs (virtual mode) or run in coprocessor mode whereone of the processor works as an offload engine, for example handling parts ofthe message passing stack [?, ?].As a test problem we used an unstructured mesh and refined all elementsinside a small local region, timed mesh refinement and load balancing times.The problem were tested on P = 32, 128, 512 nodes both in virtual and coprocessormode. Since the smallest number of nodes that could be allocated was32, all speedup results are presented as a comparison against the time for 32processors. To capture possible communication bottlenecks, three different unstructuredmeshes were used. First a cylinder mesh with n v = 566888 vertices,secondly a hill with n V = 94720 vertices and finally, the cylinder again but withn v = 1237628 vertices instead.The regions selected for refinement were around the cylinder and behind thehill. Since these regions already had the most number of elements in the mesh,refinement would certainly result in an workload imbalance. Hence, trigger theload balancing algorithms. In order to analyze the experimental results we useda performance model which decompose the total runtime T into one serial computationalcost T comp , and a parallel communication cost T comm .T = T comp + T comm (11.3)The mesh refinement has a local computational costs consisting of iteratingover and bisecting all elements marked for refinement, for a mesh with N c elementsO(N c /P) steps. Communication only occurs when the boundary elements102

Johan Hoffman, Johan Jansson and Niclas Janssonneeds to be exchanged. Thus, each processor would in the worst case communicatewith (P − 1) other processors. If we assume that there are N s shared edges,the total runtime with communication becomes,( )NcT refine = O τ f + (P − 1)(τ s + N s τ b ) (11.4)Pwhere τ f isthetimetoperformone(floatingpoint)operation, τ s isthelatencyandτ b the bandwidth. So based on the performance model, more processors wouldlower the computational time, but in the same time increase the communicationtime.The most computationally expensive part of the load balancer is the remappingorassignmentofthenewpartitions.Asdiscussedearlier,weusedanheuristicwith a runtime of O(E), the number of edges in the bipartite graph. Hence,in the worst case E ≈ P 2 . The sorting phase is linear, and due to the parallelimplementation it runs in O(P). Communication time for the parallel prefix calculationis given by, for m data it sends and calculates in m/P steps. Since theprefixconsistsof 2 r elements,it wouldtake 2 r /P steps, andwehave todo thisforeach β sorting phases. In the worst case the reordering phase (during sorting)needs to send away all the elements, thus P communication steps, which givesthe total runtime.( ( ) ) 2T loadb = O(P 2 r)τ f + β τ s +P + P τ b (11.5)According to this, load balancing should not scale well for a large numberof processors (due to the O(P 2 ) term). However the more realistic average caseshould be O(P). So again, with more processors the runtime could go up due tothe communication cost.If we then compare with the experimental results presented in figure 11.7,we see that the performance degenerates when a large number of processors areused. The question is why? Is it solely due to the increased communication cost?Or is the load balancer’s computational cost to blame?First of all one could observe that when the number of elements per processoris small. The communication costs starts to dominate, see for example theresults for the smaller hill mesh (represented by a triangle in the figure). Theresult is better for the medium size cylinder (represented by a diamond), andeven better for the largest mesh (represented by a square). If the load balancingtime was the most dominating part, a performance drop should have beenobserved around 128 - 256 processors. Instead performance generally drops after256 processors. A possible explanation for this could be the small amount oflocal elements. Even for the larger 10 6 element mesh, with 1024 processors thenumber of local elements is roughly 10 3 , which is probably too few to balance thecommunication costs.103

Parallel Adaptive Mesh Refinement87cyl1cyl2hill6Speedup5432132 128 512 1024Number of processorsFigure 11.7: Refinement speedup (incl. load balancing)Processors Execution time Speedup256 33.50 1.0512 23.19 1.441024 19.24 1.74Table 11.1: Refinement time for a nine million vertices meshThis illustrate the curse of massively parallel, distributed memory systems.In order to gain anything from the large amount of processors, either one has tohave a large local computational cost, or one has to increase the dataset in ordertomaskthecommunicationlatency. Toillustratehowlargethemeshneedstobe,weuniformlyrefinedthelargercylinder, locallyrefinedthesameregionasbeforeand timed it for 256,512 and 1024 processors. Now the refinement performancebetter, as can be seen in table 11.1. But again performance drops between 512and 1024 processors.However, despite the decrease in speedup, one could see that the algorithmseems to scale well, given that the local amount of elements are fairly large. Itis close to linear scaling from 32 to 256 processors, and we see no reason for itto not scale equally good between 256 and 1024 processors, given that the localpart of the mesh is large enough.104

Johan Hoffman, Johan Jansson and Niclas Jansson11.5 Summary and outlookIn this chapter we presented some of the challenges with parallel mesh refinement.How these could be solved and finally how these solutions were implementedfor a distributed memory system. In the final section some theoreticaland experimental performance results were presented and explained. One couldobserve how well the implementation performs, until the curse of slow interconnectstarts to affect the results.One aspect of the refinement problem that has not been touched upon in thischapter is the mesh quality. The modification done to the longest edge algorithm(see section 11.2.2), unfortunately destroys the good properties of the originalrecursive algorithm. It was a trade off between efficiency and mesh quality. Asmentioned before, the problem with the original algorithm is the efficiency ofpropagation and global termination detection. Currently our work is focusedin overcoming these problems, implementing an efficient parallel refinementmethod with good quality aspects, which also performs well for thousands ofprocessors.105

Part IIImplementation107

CHAPTER12DOLFIN: A C++/Python Finite Element LibraryBy Anders Logg and Garth N. WellsChapter ref: [logg-2]Overview and tutorial of DOLFIN.109

CHAPTER13FFC: A Finite Element Form CompilerBy Anders Logg and possibly othersChapter ref: [logg-1]◮ Editor note: Oelgaard/Wells might possibly write something here on FFC quadratureevaluation, or it will be included in a separate chapter. Marie will possibly write somethinghere on H(div)/H(curl), or it will be included in a separate chapter.Overview and tutorial of FFC.111

CHAPTER14FErari: An Optimizing Compiler for Variational FormsBy Robert C. Kirby and Anders LoggChapter ref: [kirby-3]We describe the implementation of an optimizing compiler for variationalforms based on the discrete optimization methodology described in an earlierchapter. The major issues are communicating reference tensors from FFC to theoptimizer, FErari, performing the actual optimization, and communicating abstractsyntax for the optimized algorithm back to FFC for code generation. Weinclude some results indicating the kinds of speedups that may be achieved andwhat factors influence the effectiveness of the optimizations.113

CHAPTER15FIAT: Numerical Construction of Finite Element BasisFunctionsBy Robert C. KirbyChapter ref: [kirby-2]15.1 IntroductionThe FIAT project [?, ?] implements the mathematical framework described inChapter ?? as a Python package, working mainly in terms of numerical linearalgebra. Although an implementation floating-point arithmetic presents somechallenges relative to symbolic computation, it can allow greater efficiency andconsume fewerresources, especially for high order elements. To obtain efficiencyinPython, thecompute-intensiveoperationsare expressedin termsofnumericallinear algebra and performed using the widely distributed numpy package.The FIAT project is one of the first FEniCS projects, providing the basis functionback-end for ffc and enabling high-order H 1 , H(div) and H(curl) elements.Itiswidelydistributed, withdownloadsoneveryinhabitedcontinentandin oversixty countries, averaging about 100 downloads per month.This chapter works in the context of a Ciarlet triple (K, P, N) [?], where Kis a fixed reference domain, typically a triangle or tetrahedron. P is a finitedimensionalpolynomial space, though perhaps vector- or tensor-valued and notcoincident with polynomials of some fixed degree. N = {l i } |P |i=1 is a set of linearfunctionals spanning P ′ . Recalling Chapter [], the goal is first to enumeratea convenient basis {φ i } |P |i=1 for P and then to form a generalized Vandermonde115

FIAT: Numerical Construction of Finite Element Basis FunctionssystemV A = I,where V ij = l i (φ j ). Thecolumnsof A = V −1 storethe expansion coefficientsofthenodal basis for (K, P, N) in terms of the basis {φ i }.15.2 Prime basis: Collapsed-coordinate polynomialsHigh order polynomials in floating-point arithmetic require stable evaluation algorithms.FIAT uses the so-called collapsed-coordinate polynomials [?] on thetriangle and tetrahedra. Let P α,βi (x) denote the Jacobi polynomial of degree iwith weights α and β. On the triangle K with vertices (−1, 1), (1, −1), (−1, 1), thepolynomials are of the formwhereD p,q (x, y) = P 0,0p (η 1 )η 1 = 2η 2 = y( 1 − η22( ) 1 + x− 11 − y) iP 2i+1,0j (η 2 )is called the collapsed-coordinate mapping is a singular transformation betweenthe triangle and the biunit square. The set {D p,q (x, y)} p+q≤np,q≥0 forms a basis forpolynomoials of degree n. Moreover, they are orthogonal in the L 2 (K) innerproduct. Recently [?], it has been shown that these polynomials may be computeddirectly on the triangle without reference to the singular mapping. Thismeans that no special treatment of the singular point is required, allowing useof standard automatic differentiation techniques to compute derivatives.The recurrences are obtained by rewriting the polynomials aswhereandD p,q (x, y) = χ p (x, y)ψ p,q (y),χ p (x, y) = P 0,0p (η 1 )ψ p,q (y) = P 2p+1,0q( 1 − η22) p(η 2 ) = P 2p+1,0 (y).This representation is not separable in η 1 and η 2 , which may seem to be a drawbacktoreadersfamiliarwiththeusageofthesepolynomialsinspectralmethods.However, they do still admit sum-factorization techniques. More importantly forpresentpurposes,each χ p isin factapolynomial in xand y andmaybe computedbyrecurrence. ψ p,q isjustaJacobipolynomialin y andsohasawell-knownthreetermrecurrence. The recurrences derived in [?] are presented in Algorithm 5116q

Robert C. KirbyAlgorithm 5 Computes all triangular orthogonal polynomials up to degree d byrecurrence1: D 0,0 (x, y) := 12: D 1,0 (x, y) := 1+2x+y23: for p ← 1, d − 1 do ( ) (1+2x+y ) ( ) (1−y )4: D p+1,0 (x, y) :=D p,0 2(x, y) − D p−1,0 (x, y)5: end for6: for p ← 0, d − 1 do2p+1p+17: D p,1 (x, y) := D p,0 (x, y)8: end for9: for p ← 0, d − 1 do10: for q ← 1, d − p − 1 do11: D p,q+1 (x, y) := ( a 2p+1,0q12: end for13: end for2(1+2p+(3+2p)y2)pp+1)y + b 2p+1,0q D p,q (x, y) − c 2p+1,0q D p,q−1 (x, y)15.3 Representing polynomials and functionalsEven using recurrence relations and numpy vectorization for arithmetic, furthercare is required to optimize performance. In this section, standard operationson polynomials will be translated into vector operations, and then batches ofsuch operations cast as matrix multiplication. This helps eliminate the interpretiveoverhead of Python while moving numerical computation into optimizedlibrary routines, since numpy.dot wraps level 3 BLAS and other functions suchas numpy.svd wrap relevant LAPACK routines.Since polynomials and functionals over polynomials both form vector spaces,it is natural to represent each of them as vectors representing expansion coefficientsin some basis. So, let {φ i } be the set of polynomials described above.Now, any p ∈ P is written as a linear combination of the basis functions {φ i }.Introduce a mapping R from P into R |P | by taking the expansion coefficients of pin terms of {φ i }. That is,p = R(p) i φ i ,where summation is implied over i.A polynomial p may then be evaluated at a point x as follows. Let Φ i be thebasis functions tabulated at x. That is,Then, evaluating p follows by a simple dot product:2Φ i = φ i (x). (15.1)p(x) = R(p) i Φ i . (15.2)117

FIAT: Numerical Construction of Finite Element Basis FunctionsMore generally in FIAT, a set of polynomials {p i } will need to be evaulatedsimultaneously, such as evaluating all of the members of a finite element basis.The coefficients of the set of polynomials may be stored in the rows of a matrixC, so thatC ij = R(p i ) j .Tabulating this entire set of polynomials at a point x is simply obtained bymatrix-vector multiplication. Let Φ i be as in (15.1). Then,p i (x) = C ij Φ j .The basis functions are typically needed at a set of points, such as those of aquadrature rule. Let {x j } now be a collection of points in K and letΦ ij = φ i (x j ),where the rows of Φ run over the basis functions and the columns over the collectionof points. As before, the set of polynomials may be tabulated at all thepoints byp i (x j ) = C ik Φ kj ,which is just the matrix product CΦ and may may be efficiently carried out by alibrary operation, such as the numpy.dot wrapper to level 3 BLAS.Finite element computation also requires the evaluation of derivatives ofpolynomials. In a symbolic context, differentiation presents no particular difficulty,but working in a numerical context requires some special care.For some differential operator D, the derivatives Dφ i are computed at apointx, any polynomial p = R(p) i φ i may be differentiated at x byDp(x) = R(p) i (Dφ i ),which is exactly analagous to (15.2). By analogy, sets of polynomials may bedifferentiated at sets of points just like evaluation.TheformulaeinAlgorithm5andtheirtetrahedralcounterpartarefairly easyto differentiate, but derivatives may also be obtained through automatic differentiation.Some experimental support for this using the AD tools in ScientificPython has been developed in an unreleased version of FIAT.The released version of FIAT currently evaluates derivatives in terms of linearoperators,whichallowsthecoordinatesingularityinthestandardrecurrencerelations to be avoided. For each Cartesian partial derivative ∂∂x k, a matrix D k iscalculated such that ( ∂pR = D∂xijR(p)k)ik j .Then, derivatives of sets of polynomials may be tabulated by premultiplying thecoefficient matrix C with such a D k matrix.118

Robert C. KirbyThis paradigm may also be extended to vector- and tensor-valued polynomials,making use ofthe multidimensional arrays implementedin numpy. Let P bea space of scalar-valued polynomials and n > 0 an integer. Then, a member of(P) m ,avectorwith mcomponentsin P,mayberepresentedasatwo-dimensionalarray. Let p ∈ (P) m and p i be the j th component of p. Then p = R(p) jk φ k , so thatR(p) jk is the coefficient of φ k for p j .The previous discussion of tabulating collections of functions at collectionsof points is naturally extended to this context. If {p i } is a set of members ofP m , then their coefficients may be stored in an array C ijk , where C i is the twodimensionalarrayR(p) jk ofcoefficientsfor p i . Asbefore, Φ ij = φ i (x j ) containstheof basis functions at a set of points. Then, the j th component of v i at the point x kis naturally given by a three-dimensional arrayp i (x k ) j = C ijl φ lk .Psychologically, this is just a matrix product if C ijl is stored contiguously in generalizedrow-majorformat, and theoperation is readly cast as densematrix multiplication.Returning for the moment to scalar-valued polynomials, linear functionalsmay also be represented as Euclidean vectors. Let l : P → R be a linear functional.Then, for any p ∈ P,l(p) = l(R(p) i φ i ) = R(p) i l(φ i ),so that l acting on p is determined entirely by its action on the basis {φ i }. Aswith R, define R ′ : P ′ → R |P | byso thatR ′ (l) i = l(φ i ),l(p) = R ′ (l) i R(p) i .Note that the inverse of R ′ is the Banach-space adjoint of R.Just as with evaluation, sets of linear functionals can be applied to sets offunctions via matrix multiplication. Let {l i } N i=1 ⊂ P ′ and {u i } N i=1 ⊂ P. The functionalsare represented by a matrixand the functions byL ij = R ′ (l i ) jC ij = R(u i ) jThen, evaluating all of the functionals on all of the functions is computed by thematrix productA ij = L ik C jk , (15.3)119

FIAT: Numerical Construction of Finite Element Basis Functionsor A = LC t . This is especially useful in the setting of the next section, where thebasis for the finite element spaceneeds to be expressed as a linear combinationof orthogonal polynomials.Also,theformalismof R ′ maybegeneralizedtofunctionalsovervector-valuedspaces. As before, let P be a space of degree n with basis {φ i } |P |i=1and to eachv ∈ (P) m associate therepresentation v i = R(v) ij φ j . Inthisnotation, v = R(v) ij φ jis the vector indexed over i. For any functional l ∈ ((P) m ) ′ , a representationR ′ (l) ij must be defined such thatl(v) = R ′ (l) ij R(v) ij ,with summation implied over i and j. To determine the representation of R ′ (f),let e j be the canonical basis vector with (e j ) i = δ ij and writel(v) = l(R ij φ j )= l(R(v) ij δ ik e k φ j )= l(R(v) ij e i φ j )= R(v) ij l(e i φ j ).(15.4)From this, it is seen that R ′ (l) ij = l(e i φ j ).Now,let {v i } N i=1 beasetofvector-valued polynomials and {l i} M i=1 asetoflinearfunctionalsactingonthem. ThepolynomialsmaybestoredbyacoefficienttensorC ijk = R(v i ) jk . The functionals may be represented by a tensor L ijk = R ′ (l i ) jk .The matrix A ij = l i (v j ) is readily computed by the contractionA ij = L ikl C jkl .Despite having three indices, this calculation may still be performed by matrixmultiplication. Since numpy stores arrays in row-major format, a simple reshapingmaybeperformedwithoutdatamotionsothatA = ˜L ˜C t ,for ˜Land ˜C reshapedto two-dimensional arrays by combining the second and third axes.15.4 Other polynomial spacesMany of the complicated elements that motivate the development of a tool likeFIAT polynomial spaces that are not polynomials of some complete degree (orvectors or tensors of such). Once a process for providing bases for such spaces isdescribed, the techniques of the previous section may be applied directly. Mostfiniteelementpolynomial spaces are described eitherbyadding a fewbasis functionstosomepolynomialsofcompletedegreeorelsebyconstrainingsuchaspaceby some linear functionals.120

Robert C. Kirby15.4.1 Supplemented polynomial spacesA classic example of the first case is the Raviart-Thomas element, where thefunction space of order r is( )RT r = (P r (K)) d ⊕ ˜Pr (K) x,where x ∈ R d is the coordinate vector and ˜P r is the space of homogeneous polynomialsof degree r. Given any basis {φ i } for P r (K) such as the Dubiner basis, itis easy to obtain a basis for (P r (K)) d by taking vectors where one component issome φ i and the rest are zero. The issue is obtaining a basis for the entire space.Consider the case d = 2 (triangles). While monomials of the form x i y r−i spanthe space of homoegeneous polynomials, they are subject to ill-conditioning innumericalcomputations. Ontheotherhand, theDubinerbasis oforder r, {φ i } |Pr|i=1may be ordered so that the last r + 1 functions, {φ i } |Pr|i=|P r|−r, have degree exactlyr. While they do not span ˜P r , the span of {xφ i } |Pr|i=|P r|−rtogether with a basis for(P r (K)) 2 does span RT r .So, this gives a basis for the Raviart-Thomas space that can be evaluatedand differentiated using the recurrence relations described above. A similartechnique may be used to construct elements that consist of standard elementsaugmented with some kind of bubble function, such as the PEERS element ofelasticity or MINI element for Stokes flow.15.4.2 Constrained polynomial spacesAn example of the second case is the Brezzi-Douglas-Fortin-Marini element [?].Let E(K)bethesetofdimensiononecofacetsof K (edgesin2d,facesin3d). Thenthe function space isBDFM r (K) = {u ∈ (P r (K)) d : u · n| γ ∈ P r−1 (γ), γ ∈ E(K)}This space is naturally interpreted as taking a function space, (P r (K)) d , andimposing linear constraints. For the case d = 2, there are exactly three suchconstraints. For γ ∈ E(K), let µ γ be the Legendre polynomial of degree r mappedto γ. Then, if a function u ∈ (P r (K)) d , it is in BDFM r (K) iff∫(u · n)µ γ ds = 0for each γ ∈ E(K).Number the edges by {γ i } 3 i=1 and introduce linear functionalsl i (u) = ∫ γ i(u · n)µ γ ids. Then,γBDFM r (K) = ∩ 3 i=1 null(l i).121

FIAT: Numerical Construction of Finite Element Basis FunctionsThis may naturally be cast into linear algebra and so evaluated with LAPACK.Following the techniques for constructing Vandermonde matrices, a constraintmatrix may be constructed. Let {¯φ i } be a basis for (P r (K)) 2 . Define the 3 × |(P r )| 2matrixC ij = l i (φ j ).Then, a basis for the null space of this matrix is constructed using the singularvalue decomposition [?]. The vectors of this null-space basis are readily seen tocontain the expansion coefficients of a basis for BDFM r in terms of a basis forP r (K) 2 . With this basis in hand, the nodal basis for BDFM r (K) is obtained byconstructing the generalized Vandermonde matrix.This technique may be generalized to three dimensions, and it also appliesto Nédélec [?], Arnold-Winther [?], Mardal-Tai-Winther [?], and many other elements.15.5 Conveying topological information to clientsMost of this chapter has provided techniques for constructing finite elementbases and evaluating and differentiating them. FIAT must also indicate whichdegrees of freedom are associated with which entities of the reference element.Thisinformationisrequiredwhenlocal-global mappingsaregeneratedbyaformcompiler such as ffc.The topological information is provided by a graded incidence relation and issimilar to the presentation of finite element meshes in [?]. Each entity in thereference element is labeled by its topological dimension (e.g. 0 for vertices and1 for edges), and then the entities of the same dimension are ordered by someconvention. To each entity, a list of the local nodes is associated. For example,thereferencetriangle withentitieslabeledisshowninFigure 15.5,andthecubicLagrange triangle with nodes in the dual basis labeled is shown in Figure 15.5.Figure 15.1: The reference triangle, with vertices, edges, and the face numbered.122

Robert C. KirbyFigure 15.2: The cubic Lagrange triangle, with nodes in the dual basis labelled.For this example, the graded incidence relation is stored as{ 0: { 0: [ 0 ] ,1: [ 1 ] ,2: [ 2 ] } ,1: { 0: [ 3 , 4 ] ,1: [ 5 , 6 ] ,2: [ 7 , 8 ] } ,2: { 0: [ 9 ] } }15.6 Functional evaluationIn order to construct nodal interpolants or strongly enforce boundary conditions,FIATalsoprovidesinformationtonumericallyevaluatelinearfunctionals. Theserules are typically exact for a certain degree polynomial and only approximateon general functions. For scalar functions, these rules may be represented by acollection of points and corresponding weights {x i }, {w i } so thatl(f) ≈ w i f(x i ).For example, pointwise evaluation at a point x is simply represented by thecoordinates of x together with a weight of 1. If the functional is an integralmoment, such as∫l(f) = gf dx,then the points {x i } will be those of some quadrature rule and the weights willbe w i = ω i g(x i ), where the ω i are the quadrature weights.This framework is extended to support vector- and tensor-valued functionspaces, by including a component corresponding to each point and weight. If v isK123

FIAT: Numerical Construction of Finite Element Basis Functionsa vector-valued function and v α is its component, then functionals are written inthe forml(v) ≈ w i v αi (x i ),so that the sets of weights, components, and points must be conveyed to theclient.This framework may also support derivative-based degrees of freedom by includinga multiindex at each point corresponding to a particular partial derivative.15.7 Overview of fundamental class structureMany FEniCS users will never directly use FIAT; for them, interaction will bemoderated through a form compiler such as ffc. Others will want to use theFIAT basis functions in other contexts. At a basic level, a user will access FIATthrough top-level classes such as Lagrange and RaviartThomas that implementthe elements. Typically, the class constructors accept the reference elementand order of function space as arguments. This gives an interface that isparametrized by dimension and degree. The classes such as Lagrange derivefrom a base class FiniteElement that provides access to the three componentsof the Ciarlet triple.The currently released version of FIAT stores the reference element as a flagindicating the simplex dimension, although a development version provides anactual class describing reference element geometry and topology. This will allowfuture releasesof FIATto be parametrized overthe particular reference elementshape and topology.The function space P is modelled by the base class PolynomialSet, whilecontains a rule for constructing the base polynomials φ i (e.g. the Dubiner basis)andamultidimensionalarray ofexpansion coefficientsforthebasis of P. Specialsubclasses of this provide (possibly array-valued) orthogonal bases as well asthe rules for constructing supplemented and constrained bases. These classesprovide mechanisms for tabulating and differentiating the polynomials at inputpoints as well as basic queries such as the dimension of the space.The set of finite element nodes is similarly modeled by a class DualBasis.This provides the functionals of the dual basis as well as their connection tothe reference element facets. The functionals are modeled by a FunctionalSetobject, which is a collection of Functional objects. Each Functional objectcontains a reference to the PolynomialSet over which it is defined and thearrayofcoefficientsrepresentingitandownsaFunctionalTypeclassprovidingthe information described in the previous section. The FunctionalSet classbatches these coefficients together in a single large array.The constructor for the FiniteElement class takes a PolynomialSet modelingthe starting basis and a DualBasis defined over this basis and constructs124

Robert C. Kirbya new PolynomialSet by building and inverting the generalized Vandermondematrix.Beyond this basic finite element structure, FIAT provides quadrature suchas Gauss-Jacobi rules in one dimension and collapsed-coordinate rules in higherdimensions. It also provides routines for constructing lattices of points on eah ofthe reference element shapes and their facets.In the future, FIAT will include the developments discussed already (moregeneralreferenceelementgeometry/topologyandautomaticdifferentiation). Automaticdifferentiationwillmakeiteasiertoconstructfiniteelementswithderivative-typedegrees of freedom such as Hermite, Morley, and Argyris. Aditionally,we hope to expand the collection of quadrature rules and provide more advancedpoint distributions, such as Warburton’s warp-blend points [?].125

CHAPTER16Instant: Just-in-Time Compilation of C/C++ Code inPythonBy Ilmar M. Wilbers, Kent-Andre Mardal and Martin S. AlnæsChapter ref: [wilbers]16.1 IntroductionInstant is a small Python module for just-in-time compilation (or inlining) ofC/C++ code based on SWIG [?] and Distutils 1 . Just-in-time compilation can significantlyspeed up, e.g., yout NumPy [?] code in a clean and readable way. Thismakes Instant a very convenient tool in combination with code generation. Beforewe demonstrate the use of Instant in a series of examples, we briefly stepthrough the basic ideas behind the implementation. Instant relies on SWIG forthe generation of wrapper code needed for making the C/C++ code usable fromPython [?]. SWIG is a mature and well-documented tool for wrapping C/C++code in many languages. We refer to its website for a comprehensive user manualand we also discuss some common tricks and troubles in Chapter [?]. Thecodetobeinlined,inadditiontothewrappercode,isthencompiledintoaPythonextension module (a shared library with functionality as specified by the PythonC-API) by using Distutils. To check whether the C/C++ code has changed sincethe last execution, Instant computes the SHA1 sum 2 of the code and comparesit to the SHA1 checksum of the code used in the previous execution. Finally,1 http://www.python.org/doc/2.5.2/lib/module-distutils.html2 http://www.apps.ietf.org/rfc/rfc3174.html127

Instant: Just-in-Time Compilation of C/C++ Code in PythonInstant has implemented a set of SWIG typemaps, allowing the user to transferNumPy arrays between the Python code and the C/C++ code.There exist several packages that are similar to Instant. Worth mentioninghere are Weave [?], Cython [?], and F2PY [?]. Weave allows us to inline C codedirectly in our Python code. Unlike Instant, Weave does not require the specificationof the function signature and the return argument. For specific examplesof Weave and the other mentioned packages, we refer to [?, ?]. Weave is partof SciPy [?]. F2PY is currently part of NumPy, and is primarily intended forwrapping Fortran code. F2PY can also be used for wrapping C code. Cythonis a rather new project, branched from the more well-known Pyrex project [?].Cython is attractive because of its integration with NumPy arrays. Cython differsfromtheotherprojectsbybeing aprogramming language ofitsown. Cythonextends Python with concepts such as static typing, hence allowing the user toincrementally speed up the code.Instant accepts plain C/C++. This makes it particularly attractive to combineInstant with tools capable of generating C/C++ code such as FFC [?], SFC [?],Swiginac [?], and Sympy [?]. In fact, tools like these have been the main motivationbehind Instant, and both FFC and SFC employ Instant. Instant is releasedunder a BSD license, see the file LICENSE in the source directory.In this chapter we will begin with several examples in Section 16.2. Section(16.3) explains how Instant works, while Section (16.4) gives a detailed descriptionof the API.16.2 ExamplesAll code from the examples in this section can be found online.3 We will refer to this location as $examples.16.2.1 Installing InstantBeforetrying toruntheexamples, you needtoinstall Instant. Thelatest Instantrelease can be downloaded from the FEniCS website 4 . It is available both as asource code tarball and as a Debian package. In addition, the latest source codecan be checked out using Mercurial 5 :hg clone http://www.fenics.org/hg/instantInstalling Instant from the source code is done with a regular Distutils script,i.e,3 http://www.fenics.org/pub/documents/book/instant/examples4 http://www.fenics.org/wiki/Download5 http://www.selenic.com/mercurial/wiki/128

Ilmar M. Wilbers, Kent-Andre Mardal and Martin S. Alnæspython setup.py installAfter successfully installing Instant, one can verify the installation by runningthe scripts run tests.py followed by rerun tests.py in the tests-directoryof the source code. The first will run all the examples after having cleaned theInstantcache,thesecondwillrunallexamplesusingthecompiledmodulesfoundin the Instant cache from the previous execution.16.2.2 Hello WorldOur first example demonstrate the usage of Instant in a very simple case:from instant import inlinec_code = r’’’double add(double a, double b){printf("Hello world! C function add is being called...\n");return a+b;}’’’add_func = inline(c_code)sum = add_func(3, 4.5)print ’The sum of 3 and 4.5 is’, sumHere Instant will wrap the C-function add into a Python extension module byusing SWIG and Distutils. The inlined function is written in standard C. SWIGsupports almost all of C and C++, including object orientation and templates.When running this Python snippet the first time, compiling the C code takes afewseconds. Nexttimewerunit, however,thecompilation isomitted,given thatno changes are made to the C source code.Note that a raw string is used in this example, to avoid Python interpretingescape sequences such as ’\n’. Alternatively, special characters can be escapedusing a backslash.Although Instant notifies the user when it is compiling, it might sometimesbe necessary, e.g. when debugging, to see the details of the Instant internals. Wecandothisbysettingthelogging levelbeforecalling anyotherInstantfunctions:from instant import outputoutput.set_logging_level(’DEBUG’)The intrinsic Python module logging is used. First, the build function argumentsare displayed, whereafter the different steps performed by Instant areshown in detail, e.g whether the module is found in cache and the arguments tothe Distutils file when building the module. This example can be found in thefile $examples/ex1.py.129

Instant: Just-in-Time Compilation of C/C++ Code in Python16.2.3 NumPy ArraysOne basic problem with wrapping C and C++ code is how to handle dynamicallyallocated arrays. Arrays allocated dynamically are typically representedin C/C++ by a pointer to the first element of an array and a separate integervariable holding the array size. In Python the array variable is itself an objectcontains the data array, array size, type information etc. However, a pointer inC/C++ does not necessarily represent an array. Therefore, SWIG provides thetypemap functionality that allows the userto specify a mapping betweenPythonand C/C++ types. We will not go into details on typemaps in this chapter, butthe reader should be aware that it is a powerful tool that may greatly enhanceyour code, but also lead to mysterious bugs when used wrongly. Typemaps arediscussed in Chapter [?] and at length at the SWIG webpage. In this chapter,it is sufficient to illustrate how to deal with arrays in Instant using the NumPymodule. More details on how Instant NumPy arrays can be found in Section16.3.1.16.2.4 Ordinary Differential EquationsWeintroduceasolverforanordinarydifferentialequation(ODE)modelingbloodpressure by using a Windkessel model. The ODE is as follows:dp(t) = BQ(t) − Ap(t),dtt ∈ (0, 1), (16.1)p(0) = p0. (16.2)Here p(t) is the blood pressure, Q(t) is the volume flux of blood, A is . . . and B is. . .. An explicit scheme is:p i = p i−1 + ∆t(BQ i − Ap i−1 ), for i = 1, . . ., N − 1, (16.3)p 0 = p0. (16.4)The scheme can be implemented in Python as follows using NumPy arrays:def time_loop_py(p, Q, A, B, dt, N, p0):p[0] = p0for i in range(1, N):p[i] = p[i-1] + dt*(B*Q[i] - A*p[i-1])TheCcodegiven asargument totheInstantfunction inline with numpylookslike:void time_loop_c(int n, double* p,int m, double* Q,double A, double B,130

Ilmar M. Wilbers, Kent-Andre Mardal and Martin S. Alnæs{}double dt, int N, double p0)if ( n != m || N != m ){printf("n, m and N should be equal\n");return;}p[0] = p0;for (int i=1; i

Instant: Just-in-Time Compilation of C/C++ Code in PythonFigure 16.1: Plot of pressure and blood volume flux computed by solving theWindkessel model.N 100 1000 10000 100000 1000000CPU time with NumPy 3.9e-4 3.9e-3 3.8e-2 3.8e-1 3.8CPU time with Python 0.7e-4 0.7e-3 0.7e-2 0.7e-1 0.7CPU time with Instant 5.0e-6 1.4e-5 1.0e-4 1.0e-3 1.1e-2CPU time with C 4.0e-6 1.1e-5 1.0e-4 1.0e-3 1.1e-2Table 16.1: CPU times of Windkessel model for different implementations (inseconds).16.2.5 Numpy Arrays and OpenMPIt is easy to speed up code on parallel computers with OpenMP. We will notdescribe OpenMP in any detail here, the reader is referred to [?]. However, notethat preprocessor directives like ’#pragma omp ...’ are OpenMP directivesand that OpenMP functions start with omp. In this example, we want to solvea standard 2-dimensional wave equation in a heterogeneous medium with localwave velocity k:∂ 2 u= ∇ · [k∇u] . (16.5)∂t2 We set the boundary condition to u = 0 for the whole boundary of a rectangulardomain Ω = (0, 1) × (0, 1). Further, u has the initial value I(x, y) at t = 0while ∂u/∂t = 0. We solve the wave equation using the following finite difference132

Ilmar M. Wilbers, Kent-Andre Mardal and Martin S. Alnæsscheme:( ∆tu l i,j =∆x( ∆t+∆y) 2[k i+12 ,j(u i+1,j − u i,j ) − k i−12 ,j(u i,j − u i−1,j )] l−1) 2[k i,j+12(u i,j+1 − u i,j ) − k i,j−1(u i,j − u i,j−1 )] l−1 . (16.6)2Here, u l i,j represents u at the grid point x i and y j at time level t l , wherex i = i∆x, i = 0, . . .,ny i = j∆y, j = 0, . . .,m andt l = l∆t,Also, k i+12 ,j is short for k(x i+1, y j ).2The code for calculating the next time step using OpenMP looks like:void stencil(double dt, double dx, double dy,int ux, int uy, double* u,int umx, int umy, double* um,int kx, int ky, double* k,int upn, double* up){#define index(u, i, j) u[(i)*m + (j)]int i=0, j=0, m = ux, n = uy;double hx, hy, k_c, k_ip, k_im, k_jp, k_jm;hx = pow(dt/dx, 2);hy = pow(dt/dy, 2);j = 0; for (i=0; i

Instant: Just-in-Time Compilation of C/C++ Code in Python}}}WealsoneedtoaddtheOpenMPheader omp.handcompilewiththeflag -fopenmpand link with the OpenMP shared library, e.g. libgomp.so for Linux (specifiedwith -lgomp). This can be done as follows:instant_ext = \build_module(code=c_code,system_headers=[’numpy/arrayobject.h’,’omp.h’],include_dirs=[numpy.get_include()],init_code=’import_array();’,cppargs=[’-fopenmp’],lddargs=[’-lgomp’],arrays=[[’ux’, ’uy’, ’u’],[’umx’, ’umy’, ’um’],[’kx’, ’ky’, ’k’],[’upn’, ’up’, ’out’]])Note that the arguments include headers, init code, and the first elementof system headers could have been omitted had we chosen to use inline -module with numpyinsteadof build module. Wecouldalsohaveused inline -with numpy, which would have returned only the function, not the whole module.For more details, see the next section. The complete code can be found in$examples/ex3.py. It might very well be possible to write more efficient codefor many of these examples, but the primary objective is to examplify differentInstant features.16.3 Instant ExplainedThe previous section concentrated on the usage of Instant and it may appearmysterious how it actually works since it is unclear what files that are madeduring execution and where they are located. In this section we explain this.Wewillagain useourfirstexample, butthistimewiththekeywordargumentmodulename set explicitely. The file can be found under $examples/ex4.py:from instant import inlinecode = r’’’double add(double a, double b){134

Ilmar M. Wilbers, Kent-Andre Mardal and Martin S. Alnæsprintf("Hello world! C function add is being called...\n");return a+b;}’’’add_func = inline(code, modulename=’ex4_cache’)sum = add_func(3, 4.5)print ’The sum of 3 and 4.5 is’, sumUpon calling Instant the first time for some C/C++ code, Instant compiles thiscode and stores the resulting files in a directory ex4 cache. The output fromrunning the code the first time is:--- Instant: compiling ---Hello world! C function add is being called...The sum of 3 and 4.5 is 7.5Next time we ask Instant to call this code, it will check if the compiled filesare available either in cache or locally, and further whether we need to rebuildthese files based on the checksum of the source files and the arguments to theInstant function. This means that Instant will perform the compile step onlyif changes are made to the source code or arguments. More details about thedifferent caching options can be found in Section 16.3.2.The resulting module files can be found in a directory reflecting the name ofthe module, in this case ex4 cache:ilmarw@multiboot:˜/instant_doc/code$ cd ex4_cache/ilmarw@multiboot:˜/instant_doc/code/ex4_cache$ ls -gtotal 224drwxr-xr-x 4 ilmarw 4096 2009-05-18 16:52 build-rw-r--r-- 1 ilmarw 844 2009-05-18 16:52 compile.log-rw-r--r-- 1 ilmarw 183 2009-05-18 16:52 ex4_cache-0.0.0.egg-info-rw-r--r-- 1 ilmarw 40 2009-05-18 16:52 ex4_cache.checksum-rw-r--r-- 1 ilmarw 402 2009-05-18 16:53 ex4_cache.i-rw-r--r-- 1 ilmarw 1866 2009-05-18 16:52 ex4_cache.py-rw-r--r-- 1 ilmarw 2669 2009-05-18 16:52 ex4_cache.pyc-rwxr-xr-x 1 ilmarw 82066 2009-05-18 16:52 _ex4_cache.so-rw-r--r-- 1 ilmarw 94700 2009-05-18 16:52 ex4_cache_wrap.cxx-rw-r--r-- 1 ilmarw 23 2009-05-18 16:53 __init__.py-rw-r--r-- 1 ilmarw 448 2009-05-18 16:53 setup.pyWhen building a new module, Instant creates a new directory with a number offiles. The first file it generates is the SWIG interface file, named ex4 cache.iin this example. Then the Distutils file setup.py is generated based and executed.During execution, setup.py first runs SWIG in the interface file, producingex4 cache wrap.cxx and ex4 cache.py. The first file is then compiledinto a shared library ex4 cache.so (note the leading underscore). A fileex4 cache-0.0.0.egg-info and a directory build will also be present as a135

Instant: Just-in-Time Compilation of C/C++ Code in Pythonresult of these steps. The output from executing the Distutils file is stored inthe file compile.log. Finally, a checksum file named ex4 cache.checksum isgenerated,containingachecksumbasedonthefilespresentinthedirectory. Thefinal step consists of moving the whole directory from its temporary location toeither cache or a user-specified directory. The init .py imports the moduleex4 cache.Thescript instant-cleanremovescompiledmodulesfromtheInstantcache,located in the directory .instant in the home directory of the user running it.In addition, all Instant modules located in the temporary directory where theywere first generated and compiled. It does not clean modules located elsewhere.The script instant-showcache allow you to see the modules currently locatedin the Instant cache:Found 1 modules in Instant cache:test_cacheFound 1 lock files in Instant cache:test_cache.lockArguments to this script will output the files matching the specified pattern, forexample will instant-showcache ’test*.i’ show the content of the SWIGinterface file for any module beginning with the letters test.16.3.1 Arrays and TypemapsInstant has support for converting NumPy arrays to C arrays and vice versa.For arrays with up to three dimensions, the SWIG interface file from NumPy isused, with a few modifications. When installing Instant, this file is included aswell. arrays should be a list, each entry containing information about a specificarray. This entry should contain a list with strings, so the arrays argument isa nested list.Each array (i.e. each element in arrays) is a list containing the names ofthe variables describing that array in the C code. Fora1-dimensional array, thismeans the names of the variables containing the length of the array (an int),and the array pointer (can have several tpes, but the default is double). For 2-dimensional arrays we need three strings, two for the length in each dimension,and the third for the array pointer. For 3-dimensional arrays, there will be threevariables first. This example should make things clearerarrays = [[’len’, ’a’],[’len_bx’, ’len_by’, ’b’],[’len_cx’, ’len_cy’, ’len_cz’, ’c’]]These variables names specified reflect the variable names in the C functionsignature. It is important that the order of the variables in the signature isretained for each array, e.g. one cannot write:136

Ilmar M. Wilbers, Kent-Andre Mardal and Martin S. Alnæsc_code = """double sum (int len_a, int len_bx, int len_by,double* a, double* b){...}"""The correct code would be:c_code = """double sum (int len_a, double* a,int len_bx,int len_by, double* b){...}"""The order of the arrays can be changed, as long as the arguments in the Pythonfunction are changed as well accordingly.Data TypesDefault, all arrays are assumedto be oftype double,but several othertypes aresupported. These are float, short, int, long, long long, unsigned short,unsigned int, unsigned long, and unsigned long long. The type can bespecified by adding an additional element to the list describing the array, e.g.arrays = [[’len’, ’a’, ’long’]]It is important that there is correspondance between the type of the NumPyarrayandthetypeinthesignatureoftheCfunction. Forarraysthatarechangedin-place, the types have to match exactly. For arrays that are input or output(see next section), one has to make sure that the implicit casting is done to atype with higher accuracy. For input arrays, the C type must be of higher (or thesame)accuracy thantheNumPyarray, whileforoutputarrays theNumPyarraytype must be of higher (or the same) accuracy than the C array. The NumPytype float32 corresponds to the C type float, while float64 corresponds todouble. The NumPy type float is the same as float64. For integer arrays,the mapping between NumPy types and C types depends on your system. Usinglong as the C type will work in most cases.137

Instant: Just-in-Time Compilation of C/C++ Code in PythonInput/Output ArraysAll arrays are assumed to be both input and output arrays, i.e. any changesto arrays in the C code result in the NumPy array being changed in-place. Forperformace purposes, this is desirable, as we avoid unecessary copying of data.The NumPy SWIG interface file has support for both input and output arraysin addition to changing arrays in-place. Input arrays do not need to be NumPyarrays, but can be any type of sequence, e.g. lists and tuples. The default behaviourof the NumPy SWIG interface file is to create new objects for sequencesthat are not NumPy arrays, while using mere pointers to the data of NumPyarrays. Instant deviates from this behaviour by taking copies of all input data,allowing for the modification of the array in the C code, as might be necessaryfor certain applications, while retaining the array as seen from the Python code.An array is marked as input only by adding the additional element ’in’ to thelist describing the array:arrays = [[’len’, ’a’, ’in’]]It is also possible to create output arrays in the C code. Instead of creatingan array in the Python code and sending it as an in-place array to the C code,the array is created by the wrapper code and returned. If there are are multipleoutput arrays or the C function has a return argument, the wrapper functionreturnsatuple with the different arguments. This approach is more Python-likethan changing arrays in-place.We only need to specify the length of the array when calling the wrapperfunction. The limitation is that only 1-dimensional arrays are supported, whichmeans that we need to set the shape of the array manually after calling thewrapper function. In the C code all arrays are treated as 1-dimensional, so thisdoes not affect the C code. An array is marked as input only by adding theadditional element ’out’ to the list describing the array. The following codeshows an example where we calculate matrix-vector multiplication x = Ab. Thematrix A is marked as input, the vector b as in-place, and the vector x as output.The example is only meant for illustrating the use of the different array options,and can be found in the file $examples/ex5.py. We verify that the result iscorrect by using the dot product from NumPy:from instant import inline_with_numpyfrom numpy import arange, dotc_code = ’’’void dot_c(int Am, int An, long* A, int bn, int* b,int xn, double* x){for (int i=0; i

Ilmar M. Wilbers, Kent-Andre Mardal and Martin S. Alnæs{x[i] = 0;for (int j=0; j

Instant: Just-in-Time Compilation of C/C++ Code in Pythonu(i,j,k) we need to write u[i*ny*nz + j*ny + k], where nx, ny, and nzare the lengths of the array in each direction. One way of achieving a simplersyntax is to use the #define macro in C:#define index(u, i, j, k) u[(i)*nz*ny + (j)*ny + (k)]which allows us to write index(u, i, j, k) instead.16.3.2 Module name, signature, and cacheThe Instant cache resides in the directory .instant in the directory of the user.It is possible to specify a different directory, but the instant-clean script willnot remove these when executed. The three keyword arguments modulename,signature,and cache dirareconnected. Ifnoneofthemaregiven,thedefaultbehaviour is to create a signature from the contents of the files and argumentsto the build module function, resulting in a name starting with instant -module followed by a long checksum. The resulting code is copied to Instantcache unless cache dir is set to a specific directory. Note that changing thearguments or any of the files will result in a new directory in the Instant cache,as the checksum no longer is the same. Before compiling a module, Instant willalways check if it is cached in both the Instant cache and in the current workingdirectory.If modulename is used, the directory with the resulting code is named accordingly,but not copied to the Instant cache. Instead, it is stored in the currentworkingdirectory. Any changes to the argument or the source files will automaticallyresult in a recompilation. The argument cache dir is ignored.When signatureis given as argument, Instant uses this instead of calculatingchecksums.Theresultingdirectoryhasthesamenameasthesignature, providedthe signature does not contain more than 100 characters containing onlyletters, numbers, or a underscore. If the signature contains any of these characters,the module name is generated based on the checksum of this string, resultinginamodulenamestartingwithinstant module followedbythechecksum.Because the user specifies the signature herself, changes in the argumentsor source code will not cause a recompilation. The use of signatures is primarilyintended for external software making use of Instant, e.g. SFC. Sometimes, thecode output by this software might be different from the code used previouslyby Instant, without these changes affecting the result of running this code (e.g.comments are inserted to the code). By using signatures, the external programcan decide when recompilation is necessary instead of leaving this to Instant.Unless otherwise specified, the modules is stored in the Instant cache.It is not possible to specify both the module name and the signature. If bothare given, Instant will issue an error.140

Ilmar M. Wilbers, Kent-Andre Mardal and Martin S. AlnæsIn addition to the disk cache discussed so far, Instant also has a memorycache. All modules used during the life-time of a program are stored in memoryfor faster access. The memory cache is always checked before the disk cache.16.3.3 LockingInstant provides file locking functionality for cache modules. If multiple processesare working on the same module, race conditions could potentially occurwhretwoor moreprocesses believe themodule is missing from thecache and tryto write it simultaneously. To avoid race conditions, lock files were introduced.Thelock filesreside in theInstantcache, andlocking isonlyenabled formodulesthat should be cached, i.e. where the module name is not given explicitely asargument to build module or one of its wrapper functions. The first process toreach the stage where the module is copied from its temporary location to theInstant cache, will aquire a lock, and other processes cannot access this modulewhile it is being copied.16.4 Instant APIIn this section we will describe the various Instant functions and their argumentsvisible to the user. The first ten functions are the core Instant functions,with build module being the main one, while the next eight are wrapper functionsaround this function. Further, there are four more helper functions available,intended for using Instant with other applications.16.4.1 build moduleThis function is the most important one in Instant, and for most applicationsthe only one that developers need to use, combined with the existing wrapperfunctions around this function. The return argument is the compiled module,hence it can be used directly in the calling code (rather then importing it as aPython module). It is also possible to import the module manually if compiled inthe same directory as the calling code.Thereareanumberofkeywordarguments,andwewillexplain themindetailhere. Although one of the aims of Instant is to minimize the direct interactionwith SWIG, some of the keywords require a good knowledge of SWIG in order tomake sense. In this way, Instant can be used both by programmers new to theuseofextensionlanguages forPython, as wellasby experiencedSWIGprogrammers.The keywords arguments are as follows:• modulename– Default: None141

Instant: Just-in-Time Compilation of C/C++ Code in Python– Type: String– Comment: The name you want forthe module. If specified, the modulewill not be cached. If missing, a name will be constructed based on achecksum oftheotherarguments,and themodulewill be placed in theglobal cache. See Section 16.3.2 for more details.• source directory• code– Default: ’.’– Type: String– Comment: The directory where user supplied files reside. The filesgiven in sources, wrap headers, and local headers are expectedto exist in this directory.– Default: ’’– Type: String– Comment: The C or C++ code to be compiled and wrapped.• init code– Default: ’’– Type: String– Comment: Code that should be executed when the Instant module isimported. This code is inserted in the SWIG interface file, and is usedfor instance for calling import array() used for the initialization ofNumPy arrays.• additional definitions– Default: ’’– Type: String– Comment: Additional definitions (typically needed for inheritance) forinterfacefile. Thesedefinitionsshouldbegivenastriple-quotedstringsin the case they span multiple lines, and are placed both in the initialblock for C/C++ code (%{,%}-block), and the main section of the interfacefile.• additional declarations– Default: ’’– Type: String142

Ilmar M. Wilbers, Kent-Andre Mardal and Martin S. Alnæs– Comment: Additional declarations (typically needed for inheritance)for interface file. These declarations should be given as triple-quotedstringsin thecasetheyspanmultiple lines, andareplaves inthemainsection of the interface file.• sources– Default: []– Type: List of strings– Comment: Source files to compile and link with the module. Thesefiles are compiled togehter with the SWIG-generated wrapper file intothe final library file. Should reside in directory specified in source -directory.• wrap headers– Default: []– Type: List of strings– Comment: Local header files that should be wrapped by SWIG. Thefiles specified will be included both in the initial block for C/C++ code(with a C directive) and in the main section of the interface file (witha SWIG directive). Should reside in directory specified in source -directory.• local headers– Default: []– Type: List of strings– Comment: Local header files required to compile the wrapped code.The files specified will be included in the initial block for C/C++ code(with a C directive). Should reside in directory specified in source -directory.• system headers– Default: []– Type: List of strings– Comment: System header files required to compile the wrapped code.The files specified will be included in the initial block for C/C++ code(with a C directive).• include dirs– Default: []143

Instant: Just-in-Time Compilation of C/C++ Code in Python– Type: List of strings– Comment: Directories to search for header files for building the extensionmodule. Needs to be absolute path names.• library dirs– Default: []– Type: List of strings– Comment: Directories to search for libraries (-l) for building the extensionmodule. Needs to be absolute paths.• libraries– Default: []– Type: List of strings– Comment: Libraries needed by the Instant module. The libraries willbe linked in from the shared object file. The initial -l is added automatically.• swigargs– Default: [’-c++’, ’-fcompact’, ’-O’, ’-I.’, ’-small’]– Type: List of strings– Comment: Arguments to swig, e.g. [’-lpointers.i’]to include theSWIG pointers.i library.• swig include dirs– Default: []– Type: List of strings– Comment: Directories to include in the ’swig’ command.• cppargs– Default: [’-O2’]– Type: List of strings– Comment: Arguments to the C++ compiler, other than include directories,e.g. [’-Wall’, ’-fopenmp’].• lddargs– Default: []– Type: List of strings144

Ilmar M. Wilbers, Kent-Andre Mardal and Martin S. Alnæs– Comment: Arguments to the linker, other than libraries and librarydirectories, e.g. [’-E’, ’-U’].• arrays– Default: []– Type: List of strings– Comment: AnestedlistdescribingtheCarraystobemadefromNumPyarrays. The SWIG interface for fil NumPy is used. For 1D arrays,the inner list should contain strings with the variable names for thelength of the arrays and the array itself. 2D matrices should containthe names of the dimensions in the two directions as well as the nameof the array, and 3D tensors should contain the names of the dimensionsin the three directions in addition to the name of the array. Ifthe NumPy array harmore than fourdimensions, the innerlist shouldcontain strings with variable names for the numberof dimensions, thelengthineachdimensionasapointer,andthearrayitself,respectively.For more details, see section 16.3.1.• generate interface– Default: True– Type: Boolean– Comment: Indicate whether you want to generate the interface files.• generate setup– Default: True– Type: Boolean– Comment: Indicate if you want to generate the setup.py file.• signature– Default: None– Type: String– Comment: A signature string to identify theform instead ofthesourcecode. See Section 16.3.2.• cache dir– Default: None– Type: String145

Instant: Just-in-Time Compilation of C/C++ Code in Python– Comment: A directory to look for cached modules and place new ones.If missing, a default directory is used. Note that the module will notbe cached if modulename is specified. The cache directory should notbe used for anything else.16.4.2 inlineThefunction inlinecreatesamodulegiventhattheinputisavalidC/C++function.It is only possible to inline one C/C++ function each time. One mandatoryargument, which is the C/C++ code to be compiled.The default keyword arguments from build module are used, with c codeas the C/C++ code given as argument to inline. These keyword argument canbe overridden, however, by giving them as arguments to inline, with the obviousexception of code. The function tries to return the single C/C++ function tobe compiled rather than the whole module, if it fails, the module is returned.16.4.3 inline moduleThe same as inline, but returns the whole module rather than a single function.Except for the C/C++ code being a mandatory argument, the exact same asbuild module.16.4.4 inline with numpyThe difference between this function and the inline function is that C-arrayscan be used. This means that the necessary arguments (init code, system -headers, and include dirs) for converting NumPy arrays to C arrays are setby the function.16.4.5 inline module with numpyThe difference between this function and the inline module function is thatC-arrays can be used. This means that the necessary arguments (init code,system headers,and include dirs)forconverting NumPyarrays toCarraysare set by the function.16.4.6 import moduleThis function can be used to import cached modules from the current work directoryor the Instant cache. It has one mandatory argument, moduleid, andone keyword argument cache dir. If the latter is given, Instant searches thespecified directory instead of the Instant cache, if this directory exists. If the146

Ilmar M. Wilbers, Kent-Andre Mardal and Martin S. Alnæsmodule is not found, None is returned. The moduleid arguments can be eitherthe module name, a signature, or an object with a function signature.Using the module name or signature, assuming the module instant extexists in the current working directory or the Instant cache, we import a modulein the following way:instant_ext = import_module(’instant_ext’)Usinganobjectasargument,assumingthisobjectincludesafunction signature()and the module is located in the directory /tmp:instant_ext = import_module(signature_object, ’/tmp’)The imported module, if found, is also placed in the memory cache.16.4.7 header and libs from pkgconfigThis function returns a list of include files, flags, libraries and library directoriesobtain from a pkg-config 6 file. It takes any number of arguments, one string forevery package name. It returns four or five arguments. Unless the keywordargument returnLinkFlags is given with the value True, it returns lists withtheincludedirectories,thecompileflags,thelibraries, andthelibrarydirectoriesof the package names given as arguments. If returnLinkFlags is True, thelink flags are returned as a fifth list. Let’s look at an example:inc_dirs, comp_flags, libs, lib_dirs, link_flags = \header_and_libs_from_pkgconfig(’ufc-1’, ’libxml-2.0’, ’numpy-1’,returnLinkFlags=True)This makes it a easy to write C code that makes use of a package providing apkg-config file, as we can use the returned lists for compiling and linking ourmodule correctly.16.4.8 get status outputThis function provides a platform-independent way of running processes in theterminal and extracting the output using the Python module subprocess 7 . Theone mandatory argument is the command we want to run. Further, there arethree keyword arguments. The first is input, which should be a string containinginput to the process once it is running. The other two are cwd and env.We refer to the documentation of subprocess for a more detailes description ofthese, but in short the first is the directory in which the process should be executed,while the second is used for setting the necessary environment variables.6 http://pkg-config.freedesktop.org/wiki7 http://docs.python.org/library/subprocess.html147

Instant: Just-in-Time Compilation of C/C++ Code in Python16.4.9 get swig versionReturnstheSWIGversioninstalledonthesystemasastring,forinstance’1.3.36’.Accepts no arguments.16.4.10 check swig versionTakes a single argument, which should be a string on the same format as theoutputof get swig version. Returns Trueifthe version ofthe installed SWIGis equal or greater than the version passed to the function. It also has one keywordargument, same. If it is True, the function returns True if and only if thetwo versions are the same.148

CHAPTER17SyFi: Symbolic Construction of Finite Element BasisFunctionsBy Martin S. Alnæs and Kent-Andre MardalChapter ref: [alnes-3]SyFi is a C++ library for definition of finite elements based on symbolic computations.By solving linear systems of equations symbolically, symbolic expressionsfor the basis functions of a finite element can be obtained. SyFi contains acollection of such elements.The SyFi Form Compiler, SFC, is a Python module for generation of finiteelement code based on symbolic computations. Using equations in UFL formatas input and basis functions from SyFi, SFC can generate C++ code which implementsthe UFC interface for computation of the discretized element tensors.SFCsupportsgeneratingcodebasedonquadratureorusingsymbolicintegrationprior to code generation to produce highly optimized code.149

CHAPTER18UFC: A Finite Element Code Generation InterfaceBy Martin S. Alnæs, Anders Logg and Kent-Andre MardalChapter ref: [alnes-2]When combining handwritten libraries with automatically generated codelike we do in FEniCS, it is important to have clear boundaries between the two.This is best done by having the generated code implement a fixed interface, suchthat the library and generated code can be as independent as possible. Such aninterface is specified in the project Unified Form-assembly Code (UFC) for finiteelements and discrete variational forms. This interface consists of a small set ofabstract classes in a single header file, which is well documented. The detailsof the UFC interface should rarely be visible to the end-user, but can be importantfor developers and technical users to understand how FEniCS projectsfit together. In this chapter we discuss the main design ideas behind the UFCinterface, including current limitations and possible future improvements.151

CHAPTER19UFL: A Finite Element Form LanguageBy Martin Sandve AlnæsChapter ref: [alnes-1]◮ Editor note: Sort out what to do with all UFL specific macros and bold math fonts.The Unified Form Language – UFL [?, ?] – is a domain specific language forthe declaration of finite element discretizations of variational forms and functionals.More precisely, the language definesaflexible user interface for definingfiniteelementspacesandexpressions forweakformsin anotationclose tomathematicalnotation.The FEniCS project [?, ?, ?] provides a framework for building applicationsfor solving partial differential equations (PDEs). UFL is one of the core componentsof this framework. It defines the language you express your PDEs in. It isthe input language and front-end of the form compilers FFC [?, ?, ?, ?, ?, ?] andSFC [?, ?]. The UFL implementation provides algorithms that the form compilerscan use to simplify the compilation process. The output from these formcompilersisUFC[?,?,?]conformingC++[?]code. ThiscodecanbeusedwiththeC++library DOLFIN 1 [?,?,?]toefficientlyassemblelinearsystemsandcomputesolution to PDEs.The combination of domain specific languages and symbolic computing withfinite element methods has been pursued from other angles in several otherprojects. Sundance [?, ?, ?] implements a symbolic engine directly in C++ todefine variational forms, and has support for automatic differentiation. TheLife [?, ?]project usesadomain specific language embeddedin C++, based on expressiontemplate techniques to specify variational forms. SfePy [?] uses SymPy1 Note that in PyDOLFIN, some parts of UFL is wrapped to blend in with other softwarecomponentsand make the compilation process hidden from the user. This is not discussed here.153

UFL: A Finite Element Form Languageas a symbolic engine, extending it with finite element methods. GetDP [?, ?]is another project using a domain specific language for variational forms. TheMathematica package AceGen [?, ?] uses the symbolic capabilities of Mathematicato generate efficient code for finite element methods. All these packages havein common a focus on high level descriptions of partial differential equations toachive higher human efficiency in the development of simulation software.UFL almost resembles a library for symbolic computing, but its scope, goalsand priorities are different from generic symbolic computing projects such asGiNaC [?, ?], swiginac [?] and SymPy [?]. Intended as a domain specific languageand form compiler frontend, UFL is not suitable for large scale symboliccomputing.This chapter is intended both for the FEniCS user who wants to learn how toexpress her equations, and for otherFEniCS developers and technical users whowantstoknowhowUFLworksontheinside. Therefore,thesectionsofthischapterare organized with an increasing amount of technical details. Sections 19.1-19.5 give an overview of the language as seen by the end-user and is intendedfor all audiences. Sections 19.6-19.9 explain the design of the implementationand dive into some implementation details. Many details of the language has tobe omitted in a text such as this, and we refer to the UFL manual [?] for a morethorough description. Note that this chapter refers to UFL version 0.3, and boththe user interface and the implementation may change in future versions.Startingwithabriefoverview,wementionthemaindesigngoalsforUFLandshow an example implementation of a non-trivial PDE in Section 19.1. Next wewill look at how to define finite element spaces in Section 19.2, followed by theoverall structure of forms and their declaration in Section 19.3. The main partof the language is concerned with defining expressions from a set of data typesand operators, which are discussed in Section 19.4. Operators applying to entireforms is the topic of Section 19.5.The technical part of the chapter begins with Section 19.6 which discussesthe representation of expressions. Building on the notation and data structuresdefinedthere,howtocomputederivativesisdiscussedinSection19.7. Somecentralinternal algorithms and key issues in their implementation are discussed inSection 19.8. Implementation details, some of which are specific to the programminglanguage Python [?], is the topic of Section 19.9. Finally, Section 19.10discusses future prospects of the UFL project.19.1 Overview19.1.1 Design goalsUFL is a unification, refinement and reimplementation of the form languagesused in previous versions of FFC and SFC. The development of this language154

Martin Sandve Alnæshas been motivated by several factors, the most important being:• A richer form language, especially for expressing nonlinear PDEs.• Automatic differentiation of expressions and forms.• Improving the performance of the form compiler technology to handle morecomplicated equations efficiently.UFLfulfilsalltheserequirements,andbythisitrepresentsamajorstepforwardin the capabilities of the FEniCS project.Tensoralgebra andindexnotationsupportismodeledaftertheFFCformlanguageand generalized further. Several nonlinear operators and functions whichonly SFC supported before have been included in the language. Differentiationof expressions and forms has become an integrated part of the language, and ismuch easier to use than the way these features were implemented in SFC before.In summary, UFL combines the best of FFC and SFC in one unified formlanguage and adds additional capabilities.The efficiency of code generated by the new generation of form compilersbased on UFL has been verified to match previous form compiler benchmarks [?,?]. The form compilation process is now fast enough to blend into the regularapplication build process. Complicated forms that previously required too muchmemory to compile, or took tens of minutes or even hours to compile, now compilesin seconds with both SFC and FFC.19.1.2 Motivational exampleOne major motivating example during the initial development of UFL has beenthe equations for elasticity with large deformations. In particular, models of biologicaltissue use complicated hyperelastic constitutive laws with anisotropiesand strong nonlinearities. To implement these equations with FEniCS, all threedesign goals listed above had to be adressed. Below, one version of the hyperelasticityequationsandtheircorresponding UFL implementationis shown. Keepin mind that this is only intended as an illustration of the close correspondencebetween the form language and the natural formulation of the equations. Themeaning of equations is not necessary for the reader to understand. Note thatmany other examples are distributed together with UFL.In the formulation of the hyperelasticity equations presented here, the unknownfunction is the displacement vector field u. The material coefficients c 1and c 2 are scalar constants. The second Piola-Kirchoff stress tensor S is computedfrom the strain energy function W(C). W defines the constitutive law,here a simple Mooney-Rivlin law. The equations relating the displacement and155

UFL: A Finite Element Form Languagestresses read:F = I + (∇u) T ,C = F T F,I C = tr(C),II C = 1 2 (tr(C)2 − tr(CC)),W = c 1 (I C − 3) + c 2 (II C − 3),S = 2 ∂W∂C ,P = FS.(19.1)Approximating the displacement field as u = ∑ k u kφ 1 k , the weak forms of theequations are as follows (ignoring boundary conditions):∫L(φ 0 ;u, c 1 , c 2 ) =ΩP : (∇φ 0 ) T dx, (19.2)a(φ 0 , φ 1 k ;u, c 1, c 2 ) = ∂L∂u k. (19.3)Figure 19.1.2 shows an implementation of these equations in UFL. Notice theclose relation between the mathematical notation and the UFL source code. Inparticular, note the automated differentiation of both the constitutive law andthe residual equation. This means a new material law can be implemented bysimplychanging W, therest isautomatic. Inthefollowingsections,thenotation,definitions and operators used in this implementation are explained.19.2 Defining finite element spacesA polygonal cell is defined by a basic shape and a degree 2 , and is declaredcell = Cell(shape, degree)UFL defines a set of valid polygonal cell shapes: “interval”, “triangle”, “tetrahedron”,“quadrilateral”, and “hexahedron”. Linear cells of all basic shapes arepredefined and can be used instead by writingcell = tetrahedron2 Note that at the time of writing, the other componentsof FEniCS does not yethandle higherdegreecells.156

Martin Sandve Alnæs# Finite element spacescell = tetrahedronelement = VectorElement("CG", cell, 1)# Form argumentsphi0 = TestFunction(element)phi1 = TrialFunction(element)u = Function(element)c1 = Constant(cell)c2 = Constant(cell)# Deformation gradient Fij = dXi/dxjI = Identity(cell.d)F = I + grad(u).T# Right Cauchy-Green strain tensor C with invariantsC = variable(F.T*F)I_C = tr(C)II_C = (I_C**2 - tr(C*C))/2# Mooney-Rivlin constitutive lawW = c1*(I_C-3) + c2*(II_C-3)# Second Piola-Kirchoff stress tensorS = 2*diff(W, C)# Weak formsL = inner(F*S, grad(phi0).T)*dxa = derivative(L, u, phi1)Figure 19.1: UFL implementation of hyperelasticity equations with a Mooney-Rivlin material law.157

UFL: A Finite Element Form LanguageIn the rest of this chapter, a variable name cell will be used where any cellis a valid argument, to make the examples dimension independent whereverpossible.UFL defines syntax for declaring finite element spaces, but does not knowanything about the actual polynomial basis or degrees of freedom. The polynomialbasis is selected implicitly by choosing among predefined basic elementfamilies and providing a polynomial degree, but UFL only assumes that thereexists a basis with a fixed ordering for each finite element space V h , i.e.V h = span {φ j } n j=1 . (19.4)Basic scalar elements can be combined to form vector elements or tensor elements,and elements can easily be combined in arbitrary mixed element hierarchies.The set of predefined 3 element family names in UFL includes “Lagrange”(short name “CG”), representing scalar Lagrange finite elements (continuouspiecewise polynomial functions), “Discontinuous Lagrange” (short name “DG”),representingscalardiscontinuousLagrangefiniteelements(discontinuouspiecewisepolynomial functions), and a range of other families that can be found inthe manual. Each family name has an associated short name for convenience.To print all valid families to screen from Python, call show elements().The syntax for declaring elements is best explained with some examples.cell = tetrahedronP = FiniteElement("Lagrange", cell, 1)V = VectorElement("Lagrange", cell, 2)T = TensorElement("DG", cell, 0, symmetry=True)TH = V + PME = MixedElement(T, V, P)In the first line a polygonal cell is selected from the set of predefined linear cells.ThenascalarlinearLagrangeelement Pisdeclared,aswellasaquadraticvectorLagrange element V.Nextasymmetricrank 2tensorelement Tisdefined,whichis also piecewise constant on each cell. The code pproceeds to declare a mixedelement TH,whichcombinesthequadratic vectorelement Vandthelinear scalarelement P. This element is known as the Taylor-Hood element. Finally anothermixed element with three sub elements is declared. Note that writing T + V +P would not result in a mixed elementwith three direct sub elements,but ratherMixedElement(MixedElement(T + V), P).3 Form compilers can register additional element families.158

19.3 Defining formsMartin Sandve AlnæsConsider Poisson’s equation with two different boundary conditions on ∂Ω 0 and∂Ω 1 ,∫a(v, u; w) = w∇u · ∇v dx, (19.5)Ω∫ ∫ ∫L(v; f, g, h) = fv dx + g 2 v ds + hv ds. (19.6)Ω ∂Ω 0 ∂Ω 1These forms can be expressed in UFL asa = dot(grad(u), grad(v))*dxL = f*v*dx + g**2*v*ds(0) + h*v*ds(1)where multiplication by the measures dx, ds(0) and ds(1) represent the integrals∫ Ω 0(·) dx, ∫ ∂Ω 0(·) ds, and ∫ ∂Ω 1(·) ds respectively.Forms expressed in UFL are intended for finite element discretization followedby compilation to efficient code for computing the element tensor. Consideringthe above example, the bilinear form a with one coefficient function w isassumed to be evaluated at a later point with a range of basis functions and thecoefficient function fixed, that isV 1h = span { φ 1 k|V 2 h |}, V2h = span { φ 2 k}, V3h = span { φ 3 k}, (19.7)∑w = w k φ 3 k , {w k} given, (19.8)k=1A ij = a(φ 1 i , φ2 j; w), i = 1, . . .,|V1h2|, j = 1, . . .,|V |. (19.9)In general, UFL is designed to express forms of the following generalizedform:∑n ca(φ 1 , . . .,φ r ; w 1 , . . .,w n ) = Ik ∫Ω c dx + ∑n eIk k∫∂Ω e ds + ∑n i∫dS. (19.10)kk=1Most of this chapter deals with ways to define the integrand expressions Ik c, Ie kand Ik i . The rest of the notation will be explained below.The form arguments are divided in two groups, the basis functions φ 1 , . . .,φ rand the coefficient functions w 1 , . . ., w n . All {φ k } and {w k } are functions in somediscrete function space with a basis. Note that the actual basis functions {φ k j }and the coefficients {w k } are never known to UFL, but we assume that the orderingof the basis for each finite element space is fixed. A fixed ordering onlymatters when differentiating forms, explained in Section 19.7.Each term of a valid form expression must be a scalar-valued expression integratedexactly once, and they must be linear in {φ k }. Any term may have159k=1k=1hΓ kI i k

UFL: A Finite Element Form Languagenonlinear dependencies on coefficient functions. A form with one or two basisfunction arguments (r = 1, 2) is called a linear or bilinear form respectively, ignoringitsdependencyoncoefficientfunctions.Thesewillbeassembledtovectorsand matrices when used in an application. A form depending only on coefficientfunctions(r = 0)iscalledafunctional,sinceitwillbeassembledtoarealnumber.The entire domain is denoted Ω, the external boundary is denoted ∂Ω, whilethe set of interior facets of the triangulation is denoted Γ. Sub domains aremarked with a suffix, e.g., Ω k ⊂ Ω. As mentioned above, integration is expressedby multiplication with a measure, and UFL defines the measures dx, ds and dS.In summary, there are three kinds of integrals with corresponding UFL representations• ∫ Ω k(·) dx ↔ (·)*dx(k), called a cell integral,• ∫ ∂Ω k(·) ds ↔ (·)*ds(k), called an exterior facet integral,• ∫ Γ k(·) dS ↔ (·)*dS(k), called an interior facet integral,Defining a different quadrature order for each term in a form can be achieved byattaching meta data to measure objects, e.g.,dx02 = dx(0, { "integration_order": 2 })dx14 = dx(1, { "integration_order": 4 })dx12 = dx(1, { "integration_order": 2 })L = f*v*dx02 + g*v*dx14 + h*v*dx12Meta data can also be used to override other form compiler specific options separatelyfor each term. For more details on this feature see the manuals of UFLand the form compilers.19.4 Defining expressionsMost of UFL deals with how to declare expressions such as the integrand expressionsin Equation 19.10. The most basic expressions are terminal values,which do not depend on other expressions. Other expressions are called operators,which are discussed in sections 19.4.2-19.4.5.Terminal value types in UFL include form arguments (which is the topic ofSection 19.4.1), geometric quantities, and literal constants. Among the literalconstants are scalar integer and floating point values, as well as the d by d identitymatrix I = Identity(d). To get unit vectors, simply use rows or columnsof the identity matrix, e.g., e0 = I[0,:]. Similarly, I[i,j] represents theDirac delta function δ ij (see Section 19.4.2 for details on index notation). Availablegeometric values are the spatial coordinates x ↔ cell.x and the facet normaln ↔ cell.n. The geometric dimension is available as cell.d.160

Martin Sandve Alnæs19.4.1 Form argumentsBasis functions and coefficient functions are represented by BasisFunctionand Function respectively. The ordering of the arguments to a form is decidedby the order in which the form arguments were declared in the UFL code. Eachbasis functionargumentrepresentsanyfunction inthebasis ofitsfiniteelementspaceφ j ∈ {φ j k },V jh = span { φ j k}. (19.11)with the intention that the form is later evaluated for all φ k such as in equation(19.9). Each coefficient function w represents a discrete function in some finiteelement space V h ; it is usually a sum of basis functions φ k ∈ V h with coefficientsw k|V h |∑w = w k φ k . (19.12)k=1Theexceptioniscoefficientfunctionsthatcanonlybeevaluatedpointwise,whichare declared with a finite elementwith family “Quadrature”. Basis functions aredeclared for an arbitrary element as in the following manner:phi = BasisFunction(element)v = TestFunction(element)u = TrialFunction(element)Byusing TestFunctionand TrialFunctionindeclarationsinsteadof Basis-Functionyoucanignoretheirrelativeordering. Theonlytime BasisFunctionis needed is for forms of arity r > 2.Coefficient functions are declared similarly for an arbitrary element, andshorthand notation exists for declaring piecewise constant functions:w = Function(element)c = Constant(cell)v = VectorConstant(cell)M = TensorConstant(cell)Ifaform argument u in a mixed finite elementspace V h = Vh 0 ×V h 1 is desired, butthe form is more easily expressed using sub functions u 0 ∈ Vh 0 and u 1 ∈ Vh 1,youcan split the mixed function or basis function into its sub functions in a genericway using split:V = V0 + V1u = Function(V)u0, u1 = split(u)161

UFL: A Finite Element Form LanguageThe splitfunctioncanhandlearbitrary mixedelements. Alternatively,ahandyshorthand notation for argument declaration followed by split isv0, v1 = TestFunctions(V)u0, u1 = TrialFunctions(V)f0, f1 = Functions(V)19.4.2 Index notationUFLallowsworkingwithtensorexpressionsofarbitrary rank, usingbothtensoralgebra and index notation. A basic familiarity with tensor algebra and indexnotation is assumed. The focus here is on how index notation is expressed inUFL.Assumingastandard orthonormalEuclidean basis 〈e k 〉 d k=1 for Rd ,avectorcanbe expressedwithitsscalar componentsin thisbasis. Tensorsofrank twocan beexpressedusingtheirscalarcomponentsinadyadicbasis {e i ⊗e j } d i,j=1 . Arbitraryrank tensors can be expressed the same way, as illustrated here.v =A =C =d∑v k e k , (19.13)k=1d∑i=1d∑d∑A ij e i ⊗ e j , (19.14)j=1d∑ ∑C ijk e i ⊗ e j ⊗ e k . (19.15)i=1 j=1 kHere, v, A and C are rank 1, 2 and 3 tensors respectively. Indices are calledfree if they have no assigned value, such as i in v i , and fixed if they have a fixedvalue such as 1 in v 1 . An expression with free indices represents any expressionyou can get by assigning fixed values to the indices. The expression A ij is scalarvalued, and represents any component (i, j) of the tensor A in the Euclidean basis.When working on paper, it is easy to switch between tensor notation (A)and index notation (A ij ) with the knowledge that the tensor and its componentsare different representations of the same physical quantity. In a programminglanguage, we must express the operations mapping from tensor to scalar componentsand back explicitly. Mapping from a tensor to its components, for a rank 2tensor defined asA ij = A : (e i ⊗ e j ), (19.16)(19.17)162

Martin Sandve Alnæsis accomplished using indexing with the notation A[i,j]. Defining a tensor Afrom component values A ij is defined asA = A ij e i ⊗ e j , (19.18)andis accomplished using the function as vector(Aij, (i,j)). To illustrate,consider the outer product of two vectors A = u ⊗ v = u i v j e i ⊗ e j , and the correspondingscalar components A ij . One way to implement this isA = outer(u, v)Aij = A[i, j]Alternatively, the components of A can be expressed directly using index notation,such as A ij = u i v j . A ij can then be mapped to A in the following manner:Aij = v[j]*u[i]A = as_tensor(Aij, (i, j))These two pairs of lines are mathematically equivalent, and the result of eitherpair is that the variable A represents the tensor A and the variable Aij representsthe tensor A ij . Note that free indices have no ordering, so their order ofappearance in theexpression v[j]*u[i]isinsignificant. Insteadof as tensor,the specialized functions as vector and as matrix can be used. Although arank two tensor was used for the examples above, the mappings generalize toarbitrary rank tensors.When indexing expressions, fixed indices can also be used such as in A[0,1]which represents a single scalar component. Fixed indices can also be mixedwith free indices such as in A[0,i]. In addition, slices can be used in place ofanindex. An example of using slices is A[0,:] which is is a vector expression thatrepresents row 0 of A. To create new indices, you can either make a single one ormake several at once:i = Index()j, k, l = indices(3)A set of indices i, j, k, l and p, q, r, s are predefined, and these should sufficefor most applications.If your components are not represented as an expression with free indices,but as separate unrelated scalar expressions, you can build a tensor from themusing as tensor and its peers. As an example, lets define a 2D rotation matrixand rotate a vector expression by π 2 :th = pi/2A = as_matrix([[ cos(th), -sin(th)],[ sin(th), cos(th)]])u = A*v163

UFL: A Finite Element Form LanguageWhen indices are repeated in a term, summation over those indices is impliedin accordance with the Einstein convention. In particular, indices can berepeated when indexing a tensor of rank two or higher (A[i,i]), when differentiatingan expression with a free index (v[i].dx(i)), or when multiplying twoexpressions with shared free indices (u[i]*v[i]).A ii ≡ ∑ iA ii ,v i u i ≡ ∑ iv i u i ,v i,i ≡ ∑ iv i,i . (19.19)An expression Aij = A[i,j] is represented internally using the Indexedclass. Aij will reference A, keeping the representation of the original tensorexpression A unchanged. Implicit summation is represented explicitly in theexpression tree using the class IndexSum. Many algorithms become easier toimplement with this explicit representation, since e.g. a Product instance cannever implicitly represent a sum. More details on representation classes arefound in Section 19.6.19.4.3 Algebraic operators and functionsUFL defines a comprehensive set of operators that can be used for composingexpressions. The elementary algebraic operators +, -, *, / can be used betweenmost UFL expressions with a few limitations. Division requires a scalar expressionwith no free indices in the denominator. The operands to a sum must havethe same shape and set of free indices.The multiplication operator * is valid between two scalars, a scalar and anytensor, a matrix and a vector, and two matrices. Other products could have beendefined, but for clarity we use tensor algebra operators and index notation forthose rare cases. A product of two expressions with shared free indices impliessummation over those indices, see Section 19.4.2 for more about index notation.Three often used operators are dot(a, b), inner(a, b), and outer(a,b). The dot product of two tensors of arbitrary rank is the sum over the lastindex of the first tensor and the first index of the second tensor. Some examplesarev · u = v i u i , (19.20)A · u = A ij u j e i , (19.21)A · B = A ik B kj e i e j , (19.22)C · A = C ijk A kl e i e j e l . (19.23)The inner product is the sum over all indices, for examplev : u = v i u i , (19.24)A : B = A ij B ij , (19.25)C : D = C ijkl D ijkl . (19.26)164

Martin Sandve AlnæsSome examples of the outer product arev ⊗ u = v i u j e i e j , (19.27)A ⊗ u = A ij u k e i e j e k , (19.28)A ⊗ B = A ij B kl e i e j e k e l (19.29)Other common tensor algebra operators are cross(u,v), transpose(A) (orA.T), tr(A), det(A), inv(A), cofac(A), dev(A), skew(A),and sym(A).Mostof these tensor algebra operators expect tensors without free indices. The detaileddefinitions of these operators are found in the manual.A set of common elementary functions operating on scalar expressions withoutfreeindicesareincluded,inparticularabs(f), pow(f, g), sqrt(f), exp(f),ln(f), sin(f), cos(f), and sign(f).19.4.4 Differential operatorsUFL implements derivatives w.r.t. three different kinds of variables. The mostused kind is spatial derivatives. Expressions can also be differentiated w.r.t.arbitraryuserdefinedvariables. Andthefinalkindofderivatives arederivativesofaformorfunctional w.r.t. thecoefficientsofaFunction. Formderivatives areexplained in Section 19.5.1.Note that derivatives are not computed immediately when declared. A discussionof how derivatives are computed is found in Section 19.7.Spatial derivativesBasic spatial derivatives ∂f∂x ican be expressed in two equivalent ways:df = Dx(f, i)df = f.dx(i)Here, dfrepresentsthederivativeof finthespatialdirection x i . Theindex icaneitherbe an integer, representing differentiation in onefixed spatial direction x i ,or an Index, representing differentiation in the direction of a free index. Thenotation f.dx(i)isintendedtomirrortheindexnotation f ,i ,whichisshorthandfor ∂f∂x i. Repeated indices imply summation, such that the divergence of a vectorcan be written v i,i , or v[i].dx(i).Several common compound spatial derivative operators are defined, namelydiv, grad, curl and rot (rot is a synonym for curl). The definition of theseoperators in UFL follow from the vector of partial derivatives∇ ≡ e k∂∂x k, (19.30)165

UFL: A Finite Element Form Languageand the definition of the dot product, outer product, and cross product. Hence,div(C) ≡ ∇ · C, (19.31)grad(C) ≡ ∇ ⊗ C, (19.32)curl(v) ≡ ∇ × v. (19.33)Notethattherearetwocommonwaystodefine gradand div. Thiswayofdefiningthese operators correspond to writing the convection term from, e.g., the Navier-Stokes equations aswhich is expressed in UFL asdot(w, grad(u))w · ∇u = (w · ∇)u = w · (∇u) = w i u j,i , (19.34)Anotherillustrativeexampleistheanisotropicdiffusiontermfrom,e.g.,thebidomainequations, which readsand is expressed in UFL asdot(A*grad(u), v)(A∇u) · v = A ij u ,j v i , (19.35)In other words, the divergence sums over the first index of its operand, and thegradient prepends an axis to the tensor shape of its operand. The above definitionof curl is only valid for 3D vector expressions. For 2D vector and scalarexpressions the definitions are:curl(u) ≡ u 1,0 − u 0,1 , (19.36)curl(f) ≡ f ,1 e 0 − f ,0 e 1 . (19.37)User defined variablesThesecondkindofdifferentiationvariablesareuser-definedvariables, whichcanrepresent arbitrary expressions. Automating derivatives w.r.t. arbitrary quantitiesisusefulforseveraltasks,fromdifferentiationofmateriallawstocomputingsensitivities. An arbitrary expression g can be assigned to a variable v. An expressionf defined as a function of v can be differentiated f w.r.t. v:v = g, (19.38)f = f(v), (19.39)h(v) = ∂f(v)∂v . (19.40)Setting g = sin(x 0 ) and f = e v2 , gives h = 2ve v2 = 2 sin(x 0 )e sin2 (x 0 ) , which can beimplemented as follows:166

Martin Sandve Alnæsg = sin(cell.x[0])v = variable(g)f = exp(v**2)h = diff(f, v)Try running this code in a Python session and print the expressions. The resultis>>> print vvar0(sin((x)[0]))>>> print hd/d[var0(sin((x)[0]))] (exp((var0(sin((x)[0]))) ** 2))Note that the variable has a label 0 (“var0”), and that h still represents the abstractderivative. Section 19.7 explains how derivatives are computed.19.4.5 Other operatorsA few operators are provided for the implementation of discontinuous Galerkinmethods. The basic concept is restricting an expression to the positive or negativeside of an interior facet, which is expressed simply as v(’+’) or v(’-’)respectively. On top of this, the operators avg and jump are implemented, definedasavg(v) = 1 2 (v+ + v − ), (19.41)jump(v) = v + − v − . (19.42)Theseoperatorscan onlybe usedwhenintegratingovertheinteriorfacets(*dS).The only control flow construct included in UFL is conditional expressions. Aconditional expression takes on one of two values depending on the result of aboolean logic expression. The syntax for this isf = conditional(condition, true_value, false_value)which is interpreted asf ={t, if condition is true,f, otherwise.(19.43)The condition can be one of167

UFL: A Finite Element Form Language• lt(a, b) ↔ (a < b)• le(a, b) ↔ (a ≤ b)• eq(a, b) ↔ (a = b)• gt(a, b) ↔ (a > b)• ge(a, b) ↔ (a ≥ b)• ne(a, b) ↔ (a ≠ b)19.5 Form operatorsOnce you have defined some forms, there are several ways to compute relatedforms from them. While operators in the previous section are used to define expressions,the operators discussed in this section are applied to forms, producingnew forms. Form operators can both make form definitions more compact andreduce the chances of bugs since changes in the original form will propagate toforms computed from it automatically. These form operators can be combinedarbitrarily; given a semi-linear form only a few lines are needed to compute theaction of the adjoint of the Jacobi. Since these computations are done prior toprocessing by the form compilers, there is no overhead at run-time.19.5.1 Differentiating formsThe form operator derivative declares the derivative of a form w.r.t. coefficientsof a discrete function (Function). This functionality can be used forexample to linearize your nonlinear residual equation (linear form) automaticallyfor use with the Newton-Raphson method. It can also be applied multipletimes, which is useful to derive a linear system from a convex functional, in orderto find the function that minimizes the functional. For non-trivial equationssuch expressions can be tedious to calculate by hand. Other areas in which thisfeature can be useful include optimal control and inverse methods, as well assensitivity analysis.In its simplest form, the declaration of the derivative of a form L w.r.t. thecoefficients of a function w readsa = derivative(L, w, u)The form a depends on an additional basis function argument u, which mustbe in the same finite element space as the function w. If the last argument isomitted, a new basis function argument is created.Let us step through an example of how to apply derivative twice to a functionalto derive a linear system. In the following, V h is a finite element spacewith some basis, w is a function in V h , and f is a functional we want to minimize.168

Martin Sandve AlnæsDerived from f is a linear form F, and a bilinear form J.For a concrete functional f(w) = ∫ Ωv = TestFunction(element)u = TrialFunction(element)w = Function(element)f = 0.5 * w**2 * dxF = derivative(f, w, v)J = derivative(F, w, u)V h = span {φ k } , (19.44)|V h |∑w(x) = w k φ k (x), (19.45)k=1f : V h → R, (19.46)F(φ i ; w) = ∂ f(w),∂w i(19.47)J(φ i , φ j ; w) = ∂ F(φ i ; w).∂w j(19.48)12 w2 dx, we can implement this asThis code declares two forms F and J. The linear form F represents the standardload vector w*v*dx and the bilinear form J represents the mass matrix u*v*dx.Derivatives can also be defined w.r.t. coefficients of a function in a mixedfinite element space. Consider the Harmonic map equations derived from thefunctional∫f(x, λ) = ∇x : ∇x + λx · x dx, (19.49)Ωwhere x is a function in a vector finite element space Vh d and λ is a function ina scalar finite element space V h . The linear and bilinear forms derived from thefunctional in Equation 19.49 have basis function arguments in the mixed spaceVh d + V h. The implementation of these forms with automatic linearization readsVx = VectorElement("CG", triangle, 1)Vy = FiniteElement("CG", triangle, 1)u = Function(Vx + Vy)x, y = split(u)f = inner(grad(x), grad(x))*dx + y*dot(x,x)*dxF = derivative(f, u)J = derivative(F, u)169

UFL: A Finite Element Form LanguageNote that the functional is expressed in terms of the subfunctions x and y, whiletheargumentto derivativemustbethesingle mixedfunction u. Inthisexamplethe basis function arguments to derivative are omitted and thus providedautomatically in the right function spaces.Note that in computing derivatives of forms, we have assumed that∂∂w k∫Ω∫I dx =Ω∂∂w kI dx, (19.50)or in particular that the domain Ω is independent of w. Furthermore, note thatthere is no restriction on the choice of element in this framework, in particulararbitrary mixed elements are supported.19.5.2 AdjointAnother form operator is the adjoint a ∗ of a bilinear form a, defined as a ∗ (u, v) =a(v, u), which is similar to taking the transpose of the assembled sparse matrix.In UFL this is implementedsimply by swapping the test and trial functions, andcan be written:a = inner(M*grad(u), grad(v))*dxad = adjoint(a)which corresponds to∫∫a(M; v, u) = (M∇u) : ∇v dx = M ik u j,k v j,i dx, (19.51)Ω∫Ωa ∗ (M; v, u) = a(M; u, v) = (M∇v) : ∇u dx. (19.52)This automatic transformation is particularly useful if we need the adjoint ofnonsymmetricbilinear formscomputedusing derivative,since theexplicit expressionsfor a are not at hand. Several of the form operators below are mostuseful when used in conjunction with derivative.19.5.3 Replacing functionsEvaluating a form with new definitions of form arguments can be done by replacingterminal objects with other values. Lets say you have defined a form Lthat depends on some functions f and g. You can then specialize the form byreplacing these functions with other functions or fixed values, such as∫L(f, g; v) = (f 2 /(2g))v dx, (19.53)Ω∫L 2 (f, g; v) = L(g, 3; v) = (g 2 /6)v dx. (19.54)This feature is implemented with replace, as illustrated in this case:170ΩΩ

Martin Sandve AlnæsL = f**2 / (2*g) * v * dxL2 = replace(L, { f: g, g: 3})L3 = g**2 / 6 * v * dxHere L2 and L3 represents exactly the same form. Since they depend only on g,the code generated for these forms can be more efficient.19.5.4 ActionSparse matrix-vector multiplication is an important operation in PDE solver applications.In some cases the matrix is not needed explicitly, only the action ofthe matrix on a vector, the result of the matrix-vector multiplication. You canassemble the action of the matrix on a vector directly by defining a linear formfor the action of a bilinear form on a function, simply writing L = action(a,w) or L = a*w, with a any bilinear form and w being any Function defined onthe same finite element as the trial function in a.19.5.5 Splitting a systemIf you prefer to write your PDEs with all terms on one side such asa(v, u) − L(v) = 0, (19.55)youcandeclareformswithbothlinearandbilineartermsandsplittheequationsafterwards:pde = u*v*dx - f*v*dxa, L = system(pde)Here system is used to split the PDE into its bilinear and linear parts. Alternatively,lhs and rhs can be used to obtain the two parts separately.19.5.6 Computing the sensitivity of a functionIf you have found the solution u to Equation (19.55), and u depends on someconstant scalar value c, you can compute the sensitivity of u w.r.t. changes in c.If u is represented by a coefficient vector x that is the solution to the algebraiclinear system Ax = b, the coefficients of ∂u ∂xare . Applying ∂ to Ax = b and∂c ∂c ∂cusing the chain rule, we can writeA ∂x∂c = ∂b∂c − ∂A x, (19.56)∂c171

UFL: A Finite Element Form Languageand thus ∂x can be found by solving the same algebraic linear system used to∂ccompute x, only with a different right hand side. The linear form correspondingto the right hand side of Equation (19.56) can be writtenu = Function(element)sL = diff(L, c) - action(diff(a, c), u)or you can use the equivalent form transformationsL = sensitivity_rhs(a, u, L, c)Note that the solution u must be represented by a Function, while u in a(v, u)is represented by a BasisFunction.19.6 Expression representation19.6.1 The structure of an expressionMostoftheUFLimplementationisconcernedwithexpressing,representing,andmanipulating expressions. To explain and reason about expression representationsand algorithms operating on them, we need an abstract notation for thestructure of an expression. UFL expressions are representations of programs,and the notation should allow us to see this connection without the burden ofimplementation details.The most basic UFL expressions are expressions with no dependencies onother expressions, called terminals. Other expressions are the result of applyingsome operator to one or more existing expressions. All expressions are immutable;once constructed an expression will never change. Manipulating anexpression always results in a new expression being created.Consider an arbitrary (non-terminal) expression z. This expression dependson a set of terminal values {t i }, and is computed using a set of operators {f i }.If each subexpression of z is labeled with an integer, an abstract program canbe written to compute z by computing a sequence of subexpressions 〈y i 〉 n i=1 andsetting z = y n . Algorithm 6 shows such a program.Algorithm 6 Program to compute an expression zfor i = 1, . . .,m:y i = t i = terminal expressionfor i = m + 1, . . ., n:y i = f i (〈y j 〉 j∈Ii)z = y n172

TerminalExprOperatorBasisFunction ... Inner ...Martin Sandve AlnæsInnerGradGradBasisFunction(element, 0)BasisFunction(elemeFigure 19.2: Expression class hierarchy.Figure 19.3: Expression tree for ∇u :∇v.Eachterminalexpression y i = t i isaliteralconstantorinputargumentstotheprogram. A non-terminal subexpression y i is the result of applying an operatorf i to a sequence of previously computed expressions 〈y j 〉 j∈Ii, where I i is a set ofexpression labels. Note that the order in which subexpressions are computedcan be arbitrarily chosen, except that we require j < i ∀j ∈ I i , such that alldependencies of a subexpression y i has been computed before y i . In particular,all terminals are numbered first in this algorithm for notational convenienceonly.The program can be represented as a graph, where each expression y i correspondsto a graph vertex and each direct dependency between two expressions isa graph edge. More formally,G = (V, E), (19.57)V = 〈v i 〉 n i=1 = 〈y i〉 n i=1 , (19.58)n⋃E = {e i } = {(i, j)∀j ∈ I i }. (19.59)i=1This graph is clearly directed, since dependencies have a direction. It is acyclic,since an expression can only be constructed from existing expressions and neverbemodified. Thuswecan say thatan UFL expression representsaprogram, andcanberepresentedusingadirectedacyclic graph(DAG).TherearetwowaysthisDAG can be represented in UFL, a linked representation called the expressiontree, and a linearized representation called the computational graph.19.6.2 Tree representation◮ Editor note: Redraw these figures in Inkscape.An expression is usually represented as an expression tree. Each subexpressionis represented by a tree node, which is the root of a tree of its own. The173

UFL: A Finite Element Form Languageleaves of the tree are terminal expressions, and operators have their operandsas children. An expression tree for the stiffness term ∇u : ∇v is illustrated inFigure 19.3. The terminals u and v have no children, and the term ∇u is itselfrepresented by a tree with two nodes. The names in this figure, Grad, Innerand BasisFunction, reflect the names of the classes used in UFL to representthe expression nodes. Taking the gradient of an expression with grad(u) givesan expression representation Grad(u), and inner(a, b) gives an expressionrepresentation Inner(a, b). In general, each expression node is an instanceof some subclass of Expr. The class Expr is the superclass of a hierarchy containingall terminaltypesandoperatortypesUFL supports. Exprhastwodirectsubclasses, Terminal and Operator, as illustrated in Figure 19.2.Each expression node represents a single vertex v i in the DAG. Recall fromAlgorithm 6 that non-terminals are expressions y i = f i (〈y j 〉 j∈Ii). The operatorf i is represented by the class of the expression node, while the expression y i isrepresented by the instance of this class. The edges of the DAG is not storedexplicitly in the tree representation. However, from an expression node representingthe vertex v i , a tuple with the vertices 〈y j 〉 j∈Iican be obtained by callingyi.operands(). Theseexpressionnodesrepresentthegraphverticesthathaveedges pointing to them from y i . Note that this generalizes to terminals wherethere are no outgoing edges and t.operands() returns an empty tuple.19.6.3 Expression node propertiesAny expression node e (an Expr instance) has certain generic properties, andthe most important ones will be explained here. Above it was mentioned thate.operands() returns a tuple with the child nodes. Any expression node canbe reconstructed with modified operands using e.reconstruct(operands),where operandsisatupleofexpressionnodes. Theinvariant e.reconstruct(e.operands== eshouldalways hold. Thisfunctionisrequiredbecause expression nodesareimmutable,theyshouldneverbemodified. Theimmutablepropertyensuresthatexpression nodes can be reused and shared between expressions without side effectsin other parts of a program.◮ Editor note: Stick ugly text sticking out in margin.In Section 19.4.2 the tensor algebra and index notation capabilities of UFLwas discussed. Expressions can be scalar or tensor-valued, with arbitrary rankand shape. Therefore, each expression node has a value shape e.shape(),which is a tuple of integers with the dimensions in each tensor axis. Scalar expressionshave shape (). Anotherimportant property is the set of free indices inan expression, obtained as a tuple using e.free indices(). Although the freeindices have no ordering, they are represented with a tuple of Index instancesfor simplicity. Thus the ordering within the tuple carries no meaning.UFL expressions are referentially transparent with some exceptions. Ref-174

Martin Sandve Alnæserential transparency means that a subexpression can be replaced by anotherrepresentation of its value without changing the meaning of the expression. Akey point here is that the value of an expression in this context includes the tensorshape and set of free indices. Another important point is that the derivativeof a function f(v) in a point, f ′ (v)| v=g , depends on function values in the vicinityof v = g. The effect of this dependency is that operator types matter whendifferentiating, not only the current value of the differentiation variable. In particular,a Variable cannot be replaced by the expression it represents, becausediff depends on the Variable instance and not the expression it has the valueof. Similarly, replacing a Function with some value will change the meaning ofan expression that contains derivatives w.r.t. function coefficients.The following example illustrate this issue.e = 0v = variable(e)f = sin(v)g = diff(f, v)Here v is a variable that takes on the value 0, but sin(v) cannot be simplifiedto 0 since the derivative of f then would be 0. The correct result here is g =cos(v).19.6.4 Linearized graph representationA linearized representation of the DAG is useful for several internal algorithms,either to achieve a more convenient formulation of an algorithm or for improvedperformance. UFLincludestoolstobuildalinearizedrepresentationoftheDAG,the computational graph, from any expression tree. The computational graphG = V, E is a data structure based on flat arrays, directly mirroring the definitionof the graph in equations (19.57)-(19.59). This simple data structure makessome algorithms easier to implement or more efficient than the recursive treerepresentation. One array (Python list) V is used to store the vertices 〈v i 〉 n i=1 ofthe DAG. For each vertex v i an expression node y i is stored to represent it. Thusthe expression tree for each vertex is also directly available, since each expressionnode is the root of its own expression tree. The edges are stored in an arrayE with integer tuples (i,j) representing an edge from v i to v j , i.e. that v j is anoperand of v i . The graph is built using a post-order traversal, which guaranteesthat the vertices are ordered such that j < i∀j ∈ I i .From the edges E, related arrays can be computed efficiently; in particularthe vertex indices of dependencies of a vertex v i in both directions are useful:V out = 〈I i 〉 n i=1 ,V in = 〈{j|i ∈ I j }〉 n i=1(19.60)175

UFL: A Finite Element Form LanguageThese data structures can be easily constructed for any expression:G = Graph(expression)V, E = GVin = G.Vin()Vout = G.Vout()A nice property of the computational graph built by UFL is that no two verticeswill represent the same identical expression. During graph building, subexpressionsare inserted in a hash map (Python dict) to achieve this.Free indices in expression nodes can complicate the interpretation of the linearizedgraph when implementing some algorithms. One solution to that can betoapply expand indicesbeforeconstructingthegraph. Notehoweverthatfreeindices cannot be regained after expansion.19.6.5 PartitioningUFL is intended as a front-end for form compilers. Since the end goal is generationof code from expressions, some utilities are provided for the code generationprocess. In principle, correct code can be generated for an expression fromits computational graph simply by iterating over the vertices and generatingcode for each operation separately, basically mirroring Algorithm 6. However, agood form compiler should be able to produce better code. UFL provides utilitiesfor partitioning the computational graph into subgraphs (partitions) basedon dependencies of subexpressions, which enables quadrature based form compilersto easily place subexpressions inside the right sets of loops. The functionpartition implements this feature. Each partition is represented by a simplearray of vertex indices.19.7 Computing derivativesWhen a derivative expression is declared by the end-user of the form language,an expression node is constructed to represent it, but nothing is computed. Thetype of this expression node is a subclass of Derivative. Differential operatorscannot be expressed natively in a language such as C++. Before code canbe generated from the derivative expression, some kind of algorithm to evaluatederivatives must be applied. Computing exact derivatives is important, whichrules out approximations by divided differences. Several alternative algorithmsexist for computing exact derivatives. All relevant algorithms are based on thechainrulecombinedwithdifferentiationrulesforeachexpressionnodetype. Themain differences between the algorithms are in the extent of which subexpressionsare reused, and in the way subexpressions are accumulated.176

Martin Sandve AlnæsBelow, the differences and similarities between some of the simplest algorithmsare discussed. After the algorithm currently implemented in UFL hasbeenexplained, extensionstotensorandindexnotationandhigher orderderivativesare discussed. Finally, the section is closed with some remarks about thedifferentiation rules for terminal expressions.19.7.1 Relations to form compiler approachesBefore discussing the choice of algorithm for computing derivatives, let us conciderthe context in which the results will be used. Although UFL does not generatecode, some form compiler issues are relevant to this context.Mixing derivative computation into the code generation strategy ofeach formcompiler would lead to a significant duplication of implementation effort. Toseparateconcernsandkeepthecodemanageable, differentiation isimplementedas part of UFL in such a way that the form compilers are independent of thechosen differentiation strategy. Before expressions are interpreted by a formcompiler, differential operators should be evaluated such that the only operatorsleftarenon-differentialoperators 4 . Therefore,itisadvantageoustousethesamerepresentation for the evaluated derivative expressions and other expressions.The properties of each differentiation algorithm is strongly related to thestructure of the expression representation. However, UFL has no control overthe final expression representation used by the form compilers. The main differencebetween the current form compilers is the way in which expressionsare integrated. For large classes of equations, symbolic integration or a specializedtensor representation have proven highly efficient ways to evaluate elementtensors [?, ?, ?]. However, when applied to more complex equations, therun-time performance of both these approaches is beaten by code generated withquadrature loops [?, ?]. To apply symbolic differentiation, polynomials are expandedwhich destroys thestructure of theexpressions, gives potentialexponentialgrowth of expression sizes, and hides opportunities for subexpression reuse.Similarly, the tensor representation demands a canonical representation of theintegral expressions.In summary, both current non-quadrature form compiler approaches changethe structure of the expressions they get from UFL. This change makes the interactionbetween the differentiation algorithm and the form compiler approachhard to control. However, thiswill only become a problem for complex equations,in which case quadrature loop based code is more suitable. Code generationusing quadrature loops can more easily mirror the inherent structure of UFLexpressions.4 Anexceptionismadeforspatial derivativesofterminalswhichareunknowntoUFLbecausethey are provided by the form compilers.177

UFL: A Finite Element Form Language19.7.2 Approaches to computing derivativesAlgorithms for computing derivatives are designed with different end goals inmind. Symbolic Differentiation (SD) takes as input a single symbolic expressionand produces a new symbolic expression for the derivative of the input. AutomaticDifferentiation (AD) takes as input a program to compute a function andproduces a new program to compute the derivative of the function. Several variantsof AD algorithms exist, the two most common being Forward Mode AD andReverse Mode AD [?]. More advanced algorithms exist, and is an active researchtopic.is a symbolic expression, represented by an expression tree. But the expressiontree is a directed acyclic graph that represents a program to evaluatesaidexpression. ThusitseemsthelinebetweenSDandADbecomeslessdistinctin this context.Naively applied, SD can result in huge expressions, which can both require alot of memory during the computation and be highly inefficient if written to codedirectly. However, some illustrations of the inefficiency of symbolic differentiation,such as in [?], are based on computing closed form expressions of derivativesin some stand-alone computeralgebra system (CAS). Copying theresultinglarge expressions directly into a computer code can lead to very inefficient code.The compiler may not be able to detect common subexpressions, in particularif simplification and rewriting rules in the CAS has changed the structure ofsubexpressions with a potential for reuse.In general, AD is capable of handling algorithms that SD can not. A tool forapplying AD to a generic source code must handle many complications such assubroutines, global variables, arbitrary loops and branches [?, ?, ?]. Since thesupport for program flow constructs in UFL is very limited, the AD implementationin UFL will not run into such complications. In Section 19.7.3 the similaritybetween SD and forward mode AD in the context of UFL is explained in moredetail.19.7.3 Forward mode Automatic DifferentiationRecall Algorithm 6, which represents a program for computing an expression zfrom a set of terminal values {t i } and a set of elementary operations {f i }. Assumefor a moment that there are no differential operators among {f i }. Thealgorithm can then be extended tocompute thederivative dz , where v representsdva differentiation variable of any kind. This extension gives Algorithm 7.This way of extending a program to simultaneously compute the expressionz and its derivative dz is called forward mode automatic differentiation (AD).dvBy renaming y i and dy ito a new sequence of values 〈ŷj〉ˆn dv j=1, Algorithm 7 can berewritten as shown in Algorithm 8, which is isomorphic to Algorithm 6 (theyhave exactly the same structure).Since the program in Algorithm 6 can be represented as a DAG, and Algo-178

Martin Sandve AlnæsAlgorithm 7 Forward mode AD on Algorithm 6for i = 1, . . .,m:y i = t idy i= dt idv dvfor i = m + 1, . . ., n:y i = f i (〈y j 〉 j∈Ii)dy i= ∑ ∂f idv k∈I iz = y ndz= dyndv dvdy k∂y k dvAlgorithm 8 Program to compute dzdvfor i = 1, . . ., ˆm:ŷ i = ˆt ifor i = ˆm + 1, . . ., ˆn:ŷ i = ˆf i (〈ŷ j 〉 j∈ Î i)dz= dv ŷˆnproduced by forward mode ADrithm 8 is isomorphic to Algorithm 6, the program in Algorithm 8 can also berepresented as a DAG. Thus a program to compute dz can be represented by andvexpression tree built from terminal values and non-differential operators.The currently implemented algorithm for computing derivatives in UFL followsforward mode AD closely. Since the result is a new expression tree, thealgorithm can also be called symbolic differentiation. In this context, the differencesbetween the two are implementation details. To ensure that we can reuseexpressions properly, simplification rules in UFL avoids modifying the operandsofanoperator. Naturallyrepeatedpatternsintheexpressioncanthereforebedetectedeasilyby the form compilers. Efficient common subexpression eliminationcan then be implemented by placing subexpressions in a hash map. However,thereare simplifications such as 0 ∗f → 0 and 1 ∗f → f which simplify theresultof the differentiation algorithm automatically as it is being constructed. Thesesimplifications are crucial for the memory use during derivative computations,and the performance of the resulting program.19.7.4 Extensions to tensors and indexed expressionsSo far we have not considered derivatives of non-scalar expression and expressionswith free indices. This issue does not affect the overall algorithms, but itdoes affect the local derivative rules for each expression type.Consider the expression diff(A, B) with A and B matrix expressions. Themeaning of derivatives of tensors w.r.t. to tensors is easily defined via index179

UFL: A Finite Element Form Languagenotation, which is heavily used within the differentiation rules:dAdB = dA ijdB kle i ⊗ e j ⊗ e k ⊗ e l (19.61)Derivatives of subexpressions are frequently evaluated to literal constants.For indexed expressions, it is important that free indices are propagated correctlywiththederivatives.Therefore, differentiatedexpressionswillsometimesinclude literal constants annotated with free indices.There is one rare and tricky corner case when an index sum binds an index isuchasin (v i v i )andthederivative w.r.t. x i isattempted. Thesimplestexampleofthis is the expression (v i v i ) ,j , which has one free index j. If j is replaced by i, theexpression can still be well defined, but you would never write (v i v i ) ,i manually.If the expression in the parenthesis is defined in a variable e = v[i]*v[i],the expression e.dx(i) looks innocent. However, this will cause problems asderivatives (including the index i) are propagated up to terminals. If this case isencountered it will be detected and an error message will be triggered. To avoidit, simply use different index instances. In the future, this case may be handledby relabeling indices to change this expression into (v j v j ) ,i u i .19.7.5 Higher order derivativesA simple forward mode AD implementation such as Algorithm 7 only considersone differentiation variable. Higher order or nested differential operators mustalso be supported, with any combination of differentiation variables. A simpleexample illustrating such an expression can bea = ddx( ddx f(x) + 2 ddy g(x, y) ). (19.62)Considerations forimplementationsofnestedderivatives in a functional 5 frameworkhave been explored in several papers [?, ?, ?].In the current UFL implementation this is solved in a different fashion. ConsideringEquation (19.62), the approach is simply to compute the innermostderivatives d f(x) and d g(x, y) first, and then computing the outer derivatives.dx dyThisapproach ispossible because theresult ofaderivative computationisrepresentedas an expression tree just as any other expression. Mainly this approachwas chosen because it is simple to implement and easy to verify. Whether otherapproaches are faster has not been investigated. Furthermore, alternative ADalgorithmssuch asreverse modecan be experimentedwithin thefuturewithoutconcern for nested derivatives in the first implementations.An outer controller function apply ad handles the application of a singlevariable AD routine to an expression with possibly nested derivatives. The AD5 Functional as in functional languages.180

Martin Sandve Alnæsroutine is a function accepting a derivative expression node and returning anexpression wherethesingle variable derivative hasbeen computed. This routinecan be an implementation of Algorithm 8. The result of apply ad is mathematicallyequivalent to the input, but with no derivative expression nodes left 6 .Thefunction apply adworksbytraversingthetreerecursively inpost-order,discovering subtrees where the root represents a derivative, and applying theprovided AD routine to the derivative subtree. Since the children of the derivativenode has already been visited by apply ad, they are guaranteed to be freeof derivative expression nodes and the AD routine only needs to handle the casediscussed above with algorithms 7 and 8.The complexity of the ad routine should be O(n), with n being the size ofthe expression tree. The size of the derivative expression is proportional to theoriginal expression. If there are d derivative expression nodes in the expressiontree, the complexity of this algorithm is O(dn), since ad routine is applied tosubexpressions d times. As a result the worst case complexity of apply ad isO(n 2 ), but in practice d ≪ n. A recursive implementation of this algorithm isshown in Figure 19.4.def apply_ad(e, ad_routine):if isinstance(e, Terminal):return eops = [apply_ad(o, ad_routine) for o in e.operands()]e = e.reconstruct(*ops)if isinstance(e, Derivative):e = ad_routine(e)return eFigure 19.4: Simple implementation of recursive apply ad procedure.19.7.6 Basic differentiation rulesTo implement the algorithm descriptions above, we must implement differentiationrules for all expression node types. Derivatives of operators can be implementedas generic rules independent of the differentiation variable, and theseare well known and not mentioned here. Derivatives of terminals depend onthe differentiation variable type. Derivatives of literal constants are of coursealways zero, and only spatial derivatives of geometric quantities are non-zero.Since form arguments are unknown to UFL (they are provided externally by theform compilers), their spatial derivatives ( ∂φk∂x iand ∂wk∂x i) are considered input argumentsas well. In all derivative computations, the assumption is made that6 Except direct spatial derivatives of form arguments, but that is an implementation detail.181

UFL: A Finite Element Form Languageform coefficientshave no dependencies on thedifferentiation variable. Two morecases needs explaining, the user definedvariables and derivatives w.r.t. the coefficientsof a Function.If v is a Variable, then we define dt ≡ 0 for any terminal t. If v is scalardvvaluedthen dv ≡ 1. Furthermore,if Visatensorvalued Variable,itsderivativedvw.r.t. itself isdVdV = dV ijdV kle i ⊗ e j ⊗ e k ⊗ e l = δ ik δ jl e i ⊗ e j ⊗ e k ⊗ e l . (19.63)In addition, the derivative of a variable w.r.t. something else than itself equalsthe derivative of the expression it represents:v = g, (19.64)dvdz = dgdz . (19.65)Finally, we consider the operator derivative, which represents differentiationw.r.t. all coefficients {w k } of a function w. Consider an object elementwhich represents a finite element space V h with a basis {φ k }. Next consider formarguments defined in this space:v = BasisFunction(element)w = Function(element)The BasisFunction instance v represents any v ∈ {φ k }, while the Function instancew represents the sumw = ∑ kw k φ k (x). (19.66)The derivative of w w.r.t. any w k is the corresponding basis function in V h ,which can be represented by v, since∂w∂w k= φ k , k = 1, . . ., |V h |, (19.67)v ∈ 〈φ k 〉 |V h|k=1 = 〈 ∂w∂w k〉 |Vh |k=1(19.68). (19.69)Note that v should be a basis function instance that has not already been usedin the form.182

Martin Sandve Alnæs19.8 AlgorithmsIn this section, some central algorithms and key implementation issues are discussed,much of which relates to the Python programming language. Thus, thissection is mainly intended for developers and others who need to relate to UFLon a technical level.19.8.1 Effective tree traversal in PythonApplyingsomeactiontoallnodesinatreeisnaturallyexpressedusingrecursion:def walk(expression, pre_action, post_action):pre_action(expression)for o in expression.operands():walk(o)post_action(expression)Thisimplementationsimultaneouslycoverspre-ordertraversal,whereeachnodeis visited before its children, and post-order traversal, where each node is visitedafter its children.A more “pythonic” way to implement iteration over a collection of nodes isusing generators. A minimal implementation of this could bedef post_traversal(root):for o in root.operands():yield post_traversal(o)yield rootwhichthenenablesthenaturalPythonsyntaxforiterationoverexpressionnodes:for e in post_traversal(expression):post_action(e)Forefficiency,theactualimplementationof post traversalinUFLisnotusingrecursion. Function calls are very expensive in Python, which makes the nonrecursiveimplementation an order of magnitude faster than the above.19.8.2 Type based function dispatch in Python◮ Editor note: Make code fit in box.A common task in both symbolic computing and compiler implementation isthe selection of some operation based on the type of an expression node. For a183

UFL: A Finite Element Form Languageclass ExampleFunction(MultiFunction):def __init__(self):MultiFunction.__init__(self)def terminal(self, expression):return "Got a Terminal subtype %s." % type(expression)def operator(self, expression):return "Got an Operator subtype %s." % type(expression)def basis_function(self, expression):return "Got a BasisFunction."def sum(self, expression):return "Got a Sum."m = ExampleFunction()cell = triangleelement = FiniteElement("CG", cell, 1)x = cell.xprint m(BasisFunction(element))print m(x)print m(x[0] + x[1])print m(x[0] * x[1])Figure 19.5: Example declaration and use of a multifunctionselected few operations, this is done using overloading of functions in the subclassesof Expr, but this is not suitable for all operations.In many cases type-specific operations must be implemented together in thealgorithm instead of distributed across class definitions. One way to implementtype based operation selection is to use a type switch, or a sequence of if-testssuch as this:if isinstance(expression, IntValue):result = int_operation(expression)elif isinstance(expression, Sum):result = sum_operation(expression)# etc.There are several problems with this approach, one of which is efficiency when184

Martin Sandve Alnæsthere are many types to check. A type based function dispatch mechanism withefficiency independent of the number of types is implemented as an alternativethrough the class MultiFunction. The underlying mechanism is a dict lookup(which is O(1)) based on the type of the input argument, followed by a call tothe function found in the dict. The lookup table is built in the MultiFunctionconstructor. Functions to insert in the table are discovered automatically usingthe introspection capabilites of Python.A multifunction is declared as a subclass of MultiFunction. For each typethat should be handled particularly, a member function is declared in the subclass.The Expr classes use the CamelCaps naming convention, which is automaticallyconvertedtounderscore notationforcorrespondingfunctionnames,such as BasisFunction and basis function. If a handler function is not declaredfor a type, the closest superclass handler function is used instead. Notethat the MultiFunction implementation is specialized to types in the Exprclass hierarchy. The declaration and use of a multifunction is illustrated in Figure19.5. Note that basis function and sum will handle instances of the exacttypes BasisFunctionand Sum, while terminaland operatorwill handle thetypes SpatialCoordinateand Product since they have no specific handlers.19.8.3 Implementing expression transformationsMany transformations of expressions can be implemented recursively with sometype-specific operation applied to each expression node. Examples of operationsare converting an expression node to a string representation, an expression representationusing an symbolic external library, or an UFL representation withsome different properties. A simple variant of this pattern can be implementedusing a multifunction to represent the type-specific operation:def apply(e, multifunction):ops = [apply(o, multifunction) for o in e.operands()]return multifunction(e, *ops)The basic idea is as follows. Given an expression node e, begin with applyingthe transformation to each child node. Then return the result of some operationspecialized according to the type of e, using the already transformed children asinput.The Transformer class implements this pattern. Defining a new algorithmusing this pattern involves declaring a Transformer subclass, and implementingthe type specific operations as member functions of this class just as withMultiFunction. The difference is that member functions take one additionalargument for each operand of the expression node. The transformed child nodesare supplied as these additional arguments. The following code replaces terminalobjectswithobjectsfoundinadictmapping,andreconstructsoperatorswith185

UFL: A Finite Element Form Languagethe transformed expression trees. The algorithm is applied to an expression bycalling the function visit, named after the similar Visitor pattern.class Replacer(Transformer):def __init__(self, mapping):Transformer.__init__(self)self.mapping = mappingdef operator(self, e, *ops):return e.reconstruct(*ops)def terminal(self, e):return self.mapping.get(e, e)f = Constant(triangle)r = Replacer({f: f**2})g = r.visit(2*f)After running this code the result is g = 2f 2 . The actual implementation of thereplace function is similar to this code.In some cases, child nodes should not be visited before their parent node.This distinction is easily expressed using Transformer, simply by omitting themember function arguments for the transformed operands. See the source codefor many examples of algorithms using this pattern.19.8.4 Important transformationsThere are many ways in which expression representations can be manipulated.Here, we describe a few particularly important transformations. Note that eachof these algorithms removes some abstractions, and hence may remove someopportunities for analysis or optimization.Some operators in UFL are termed “compound” operators, meaning they canbe represented by other elementary operators. Try defining an expression e =inner(grad(u), grad(v)), and print repr(e). As you will see, the representationof e is Inner(Grad(u), Grad(v))(with some moredetails for u andv). This way the input expressions are easier to recognize in the representation,and rendering of expressions to for example L A TEX format can show the originalcompound operators as written by the end-user.However, since many algorithms must implement actions for each operatortype, the function expand compounds is used to replace all expression nodesof “compound” types with equivalent expressions using basic types. When thisoperation is applied to the input forms from the user, algorithms in both UFLand the form compilers can still be written purely in terms of basic operators.186

Martin Sandve AlnæsAnother important transformation is expand derivatives, which appliesautomatic differentiation to expressions, recursively and for all kinds of derivatives.The end result is that most derivatives are evaluated, and the only derivativeoperator types left in the expression tree applies to terminals. The preconditionfor this algorithm is that expand compounds has been applied.Index notation and the IndexSum expression node type complicate interpretationof an expression tree in some contexts, since free indices in its summandexpressionwill take on multiple values. Insome cases, the transformationexpand indices comes in handy, the end result of which is that there are nofree indices left in the expression. The precondition for this algorithm is thatexpand compounds and expand derivatives have been applied.19.8.5 Evaluating expressionsEven though UFL expressions are intended to be compiled by form compilers,it can be useful to evaluate them to floating point values directly. In particular,this makes testing and debugging of UFL much easier, and is used extensivelyin the unit tests. To evaluate an UFL expression, values of form arguments andgeometric quantities must be specified. Expressions depending only on spatialcoordinates can be evaluated by passing a tuple with the coordinates to the calloperator. The following code from an interactive Python session shows the syntax:>>> cell = triangle>>> x = cell.x>>> e = x[0]+x[1]>>> print e((0.5,0.7))1.2Other terminals can be specified using a dictionary that maps from terminalinstances to values. This code extends the above code with a mapping:c = Constant(cell)e = c*(x[0]+x[1])print e((0.5,0.7), { c: 10 })If functions and basis functions depend on the spatial coordinates, the mappingcan specify a Python callable instead of a literal constant. The callable musttake the spatial coordinates as input and return a floating point value. If thefunction being mapped is a vector function, the callable must return a tuple ofvalues instead. These extensions can be seen in the following code:element = VectorElement("CG", cell, 1)f = Function(element)187

UFL: A Finite Element Form Languagee = c*(f[0] + f[1])def fh(x):return (x[0], x[1])print e((0.5,0.7), { c: 10, f: fh })To use expression evaluation for validating that the derivative computations arecorrect, spatial derivatives of form arguments can also be specified. The callablemust then take a second argument which is called with a tuple of integers specifyingthe spatial directions in which to differentiate. A final example code computingg 2 + g 2 ,0 + g2 ,1 for g = x 0x 1 is shown below.element = FiniteElement("CG", cell, 1)g = Function(element)e = g**2 + g.dx(0)**2 + g.dx(1)**2def gh(x, der=()):if der == (): return x[0]*x[1]if der == (0,): return x[1]if der == (1,): return x[0]print e((2, 3), { g: gh })19.8.6 Viewing expressionsExpressions can be formatted in various ways for inspection, which is particularlyuseful while debugging. The Python built in string conversion operatorstr(e) provides a compact human readable string. If you type print ein an interactive Python session, str(e) is shown. Another Python built instring operator is repr(e). UFL implements repr correctly such that e ==eval(repr(e)) for any expression e. The string repr(e) reflects all the exactrepresentation types used in an expression, and can therefore be useful fordebugging. Anotherformatting function is tree format(e),which produces anindented multi-line string that shows the tree structure of an expression clearly,as opposed to repr which can return quite long and hard to read strings. Informationabout formatting of expressions as L A TEX and the dot graph visualizationformat can be found in the manual.19.9 Implementation issues19.9.1 Python as a basis for a domain specific languageMany of the implementation details detailed in this section are influenced bythe initial choice of implementing UFL as an embedded language in Python.188

Martin Sandve AlnæsTherefore some words about why Python is suitable for this, and why not, areappropriate here.Python provides a simple syntax that is often said to be close to pseudo-code.This is a good starting point for a domain specific language. Object orientationand operator overloading is well supported, and this is fundamental to the designof UFL. The functional programming features of Python (such as generatorexpressions) are useful in the implementation of algorithms and form compilers.The built-in data structures list, dict and set play a central role in fastimplementations of scalable algorithms.There is one problem with operator overloading in Python, and that is thecomparison operators. The problem stemsfrom the fact that eq or cmp areusedbythebuilt-in datastructuresdict andsettocompare keys, meaningthat a== b must return a boolean value for Expr to be used as keys. The result is thateq can not be overloaded to return some Expr type representation such asEquals(a, b)forlaterprocessing byformcompilers. Theotherproblem isthatandand orcannot beoverloaded, andthereforecannotbe usedin conditionalexpressions. There are good reasons for these design choices in Python. Thisconflict is the reason for the somewhat non-intuitive design of the comparisonoperators in UFL.19.9.2 Ensuring unique form signaturesThe form compilers need to compute a unique signature of each form for use in acache system to avoid recompilations. A convenient way to define a signature isusing repr(form),since the definition of this in Python is eval(repr(form))== form. Therefore repr is implemented for all Expr subclasses.Some forms are equivalent even though their representation is not exactlythe same. UFL does not use a truly canonical form for its expressions, but takessome measures to ensure that trivially equivalent forms are recognized as such.Some of the types in the Expr class hierarchy (subclasses of Counted), hasa global counter to identify the order in which they were created. This counteris used by form arguments (both BasisFunction and Function) to identifytheir relative ordering in the argument list of the form. Other counted types areIndex and Label, which only use the counter as a unique identifier. Algorithmsareimplementedforrenumberingofall Countedtypessuch thatallcountsstartfrom 0.Inaddition,someoperatortypessuchas Sumand Productmaintainsasortedlist of operands such that a+b and b+a are both represented as Sum(a, b).The numbering of indices does not affect this ordering because a renumbering ofthe indices would lead to a new ordering which would lead to a different indexrenumbering if applied again. The operand sorting and renumbering combinedensure that the signature ofequal formswill stay the same. To get the signaturewithrenumberingapplied, use repr(form.form data().form). Notethatthe189

UFL: A Finite Element Form Languagerepresentation, and thus the signature, of a form may change with versions ofUFL.19.9.3 Efficiency considerationsBy writing UFL in Python, we clearly do not put peak performance as a first priority.Iftheformcompilationprocesscanblendintotheapplicationbuildprocess,the performance is sufficient. We do, however, care about scaling performanceto handle complicated equations efficiently, and therefore about the asymptoticcomplexity of the algorithms we use.To write clear and efficient algorithms in Python, it is important to use thebuilt in data structures correctly. These data structures include in particularlist, dict and set. CPython [?], the reference implementation of Python, implementsthe data structure list as an array, which means append, and pop,andrandomreadorwriteaccessare all O(1)operations. Randominsertion, however,is O(n). Both dict and set are implemented as hash maps, the latter simplywith no value associated with the keys. In a hash map, random read, write,insertion and deletion of items are all O(1) operations, as long as the key typesimplement hash and eq efficiently. Thus to enjoy efficient use of thesecontainers, all Expr subclasses must implement these two special functions efficiently.The dict data structure is used extensively by the Python language, andtherefore particular attention has been given to make it efficient [?].19.10 Future directionsMany additional features can be introduced to UFL. Which features are addedwill depend on the needs of FEniCS users and developers. Some features canbe implemented in UFL alone, while other features will require updates to otherparts of the FEniCS project.Improvements to finite element declarations is likely easy to do in UFL. Theadded complexity will mostly be in the form compilers. Among the current suggestionsarespace-timeelementsandrelatedtimederivatives,andenrichmentoffinite element spaces. Additional geometry mappings and finite element spaceswith non-uniform cell types are also possible extensions.Additional operators can be added to make the language more expressive.Someoperatorsareeasytoaddbecausetheirimplementationonlyaffectsasmallpart of the code. More compound operators that can be expressed using elementaryoperations is easy to add. Additional special functions are easy to add aswell, as long as their derivatives are known. Other features may require morethoroughdesignconsiderations,suchassupportforcomplexnumberswhichmayaffect many parts of the code.190

Martin Sandve AlnæsUser friendly notation and support for rapid development are core values inthe design of UFL. Having a notation close to the mathematical abstractionsallows expression of particular ideas more easily, which can reduce the probabilityof bugs in user code. However, the notion of metaprogramming and codegeneration adds another layer of abstraction which can make understanding theframework moredifficult forend-users. Gooderrorchecks everywhere are thereforevery important, to detect user errors as close as possible to the user input.The error messages, documentation, and unit test suite should be improved tohelp avoid frequently repeated errors and misunderstandings among new users.SeveralalgorithmsinUFLcan probably beoptimizedifbottlenecksarefoundas more complicated applications are attempted. The focus in the developmenthas not been on achieving peak performance, which is not important in a toollike UFL.To support form compiler improvements, algorithms and utilities for generatingbetter code more efficiently can be implemented in UFL. In this area, moreworkonalternative automaticdifferentiation algorithms[?,?]canbeuseful. Anotherpossibility for code improvement is operation scheduling, or reordering ofthe vertices of a graph partition to improve the efficiency of the generated codeby better use of hardware cache and registers. Since modern C++ compilers arequite good at optimizing low level code, the focus should be on high level optimizationswhen considering potential code improvement in UFL and the formcompilers. At the time of writing, operation scheduling is not implemented inUFL, and the value of implementing such an operation is an open question.However, results from [?] indicates that a high level scheduling algorithm couldimprove the efficiency of the generated code.To summarize, UFL brings important improvements to the FEniCS framework:a richer form language, automatic differentiation and improved form compilerefficiency. These are useful features in rapid development of applicationsfor efficiently solving partial differential equations. UFL improves upon the Automationof Discretization that has been the core feature of this framework, andadds Automation of Linearization. In conclusion, UFL brings FEniCS one stepcloser to its overall goal Automation of Mathematical Modeling.19.11 AcknowledgementsThisworkhasbeensupportedbytheNorwegianResearchCouncil(grant162730)and Simula Research Laboratory. I wish to thank everyone who has helped improvingUFL with suggestions and testing, in particular Anders Logg, KristianØlgaard, Garth Wells, and Harish Narayanan. Both Kent-André Mardal andMarieRognesperformedcriticalreviewswhichgreatlyimprovedthismanuscript.191

CHAPTER20Unicorn: A Unified Continuum Mechanics SolverBy Johan Hoffman, Johan Jansson, Niclas Jansson and Murtazo NazarovChapter ref: [hoffman-2]Unicorn is solver technology (models, methods, algorithms and software implementations)targeting simulation of realistic continuum mechanics applications,suchasdrag/liftcomputationforfixedorflexibleobjects(fluid-structureinteraction)in turbulent incompressible or compressible flow (airplane/bird flight,car aerodynamics). The basis for Unicorn is Unified Continuum (UC) modeling,where we define conservation equations for mass, momentum and energy overthe whole domain as one continuum, together with a Cauchy stress and phasevariable as data for defining material properties and constitutive equation. Forthe discretization we use a stabilized adaptive finite element method which werefer to as General Galerkin (G2), which has been shown to accurately computequantities of interest in both laminar and turbulent flow [?, ?, ?, ?, ?, ?], wherethe methodology includes deforming geometries with an Arbitrary Lagrangian-Eulerian (ALE) discretization [?, ?].This chapter provides a description of the technology in Unicorn focusing onefficient and general algorithms and software implementation of the UC conceptand the adaptive G2 discretization. We describe how Unicorn fits into theFEniCS framework, how it interfaces to other FEniCS components (FIAT, FFC,DOLFIN) and what interfaces and functionality Unicorn provides itself and howthe implementation is designed. We also give application examples in incompressibleturbulent flow, fluid-structure interaction and compressible flow for illustration.Unicorn realizes automated computational modeling in the form of tensorassembly, time-stepping, adaptive fixed-point iteration for solving discrete sys-193

Unicorn: A Unified Continuum Mechanics Solvertems, duality-based adaptive error control, mesh adaptivity by local cell operations(split,collapse,swap)andcellqualityoptimization(elasticmeshmoothing).We also describe the implementation of key concepts for efficient computationof large-scale turbulent flow problems: friction boundary conditions and parallelizationof tensor assembly and mesh refinement.20.1 Unified Continuum modelingWedefineanincompressible unifiedcontinuummodelinafixedEulercoordinatesystem consisting of:• conservation of mass• conservation of momentum• phase convection equation• constitutive equations for stress as datawherethe stress is the Cauchy (laboratory) stress and the phase variable is usedto define material data such as constitutive equation for the stress and materialparameters. Note that in this continuum description the coordinate system isfixed (Euler), and a phase function (marker) is convected according to the phaseconvection equation.We start with conservation of mass, momentum and energy, together with aconvection equation for a phase function θ over a space-time domain Q = [Ω ×[0, T]] with Ω an open domain in R 3 with boundary Γ:D t ρ + D xj (u j ρ) = 0 (Mass conservation)D t m i + D xj (u j m i ) = D xj σ i (Momentum conservation)D t e + D xj (u j e) = D xj σ i u i (Energy conservation)D t θ + D xj u j θ = 0 (Phase convection equation)(20.1)together with initial and boundary conditions. We can then pose constitutiverelations between the constitutive (Cauchy) stress component σ and other variablessuch as the velocity u.We define incompressibility as:D t ρ + u j D xj ρ = 0which together with mass and momentum conservation gives:ρ(D t u i + u j D j u i ) = D xj σ ijD xj u j = 0194

Johan Hoffman, Johan Jansson, Niclas Jansson and Murtazo Nazarovwhere now the energy equation is decoupled and we can omit it.We decompose the total stress into constitutive and forcing stresses:D xj σ ij = D xj σ ij + D xj σ f ij = D x jσ ij + f iSummarizing, we end up with the incompressible UC formulation:ρ(D t u i + u j D xj u i ) = D xj σ ij + f iD xj u j = 0D t θ + D xj u j θ = 0(20.2)The UC modeling framework is simple and compact, close to the formulation ofthe original conservation equations, without mappings between coordinate systems.This allows simple manipulation and processing for error estimation andimplementation. It is also general, we can choose the constitutive equations tomodel simple or complex solids and fluids, possibly both in interaction, with individualparameters.20.1.1 Automated computational modeling and software designOne key design choice of UC modeling is to define the Cauchy stress σ as data,which means the conservation equations for momentum and mass are fixed andexplicitly defined regardless of the choice of constitutive equation. This givesa generality in software design, where a modification of constitutive equationimpacts the implementation of the constitutive equation, but not the implementationof the conservation equations.20.2 Space-time General Galerkin discretizationThe General Galerkin (G2) methodhas been developed as an adaptive stabilizedfinite element method for turbulent incompressible/compressible flow [?, ?, ?, ?,?, ?, ?]. G2 has been shown to be cheap, since the adaptive mesh refinement isminimizing the number of degrees of freedom, general, since there are no modelparameters to fit, and reliable, since the method is based on quantitative errorcontrol in a chosen output.We begin by describing the standard FEM applied to the model to establishbasic notation, and proceed to describe streamline diffusion stabilization andlocal ALE map over a mesh T h with mesh size h together with adaptive errorcontrol based on duality.195

Unicorn: A Unified Continuum Mechanics Solver20.2.1 Standard GalerkinWe begin by formulating the standard cG(1)cG(1) FEM [?] with piecewise continuouslinear solution in time and space for 20.1 by defining the exact solution:w = [u, p, θ], the discrete solution W = [U, P, Θ] and the residual R(W) =[R u (W), R p (W), R θ (W)]:R u (W) = ρ(D t U i + U j D xj U i ) − D xj Σ ij − f iR p (W) = D xj U jR θ (W) = D t Θ + u j D xj Θwhere R(w) = 0 and Σ denotes a discrete piecewise constant stress.To determine the degrees of freedom ξ we enforce the Galerkin orthogonality(R(W), v) = 0, ∀v ∈ V h where v are test functions in the space of piecewise linearcontinuous functions in space and piecewise constant discontinuous functions intime and (·, ·) denotes the space-time L 2 inner product over Q. We thus have theweak formulation:(R u (W), v u ) = (ρ(D t U i + U j D j U i ) − f i , v u i ) + (Σ ij , D xj v u i ) −(R p (W), v p ) = (D xj U j , v p ) = 0(R θ (W), v θ ) = (D t Θ + u j D xj Θ, v θ ) = 0∫ tnt n−1∫ΓΣ ij v u i n j dsdt = 0forall v ∈ V h , wherethe boundary termon Γ arising fromintegration by partsvanishesifweassumeahomogenousNeumannboundaryconditionforthestressΣ.Thisstandardfiniteelementformulationisunstableforconvection-dominatedproblems and due to choosing equal order for the pressure and velocity. Thuswe cannot use the standard finite element formulation by itself but proceed toa streamline diffusion stabilization formulation. We also describe a local ALEdiscretization for handling the phase interface.20.2.2 Local ALEIf the phase function Θ has different values on the same cell it would lead to anundesirable diffusion of the phase interface. By introducing a local ALE coordinatemap [?] on each discrete space-time slab based on a given mesh velocity(i.e. the material velocity of one of the phases) we can define the phase interfaceat cell facets, allowing the interface to stay discontinuous. We describe thedetails of the coordinate map and its influence on the FEM discretization in theappendix. The resulting discrete phase equation is:D t Θ(x) + (U(x) − β h (x)) · ∇Θ(x) = 0 (20.3)196

Johan Hoffman, Johan Jansson, Niclas Jansson and Murtazo Nazarovwith β h (x) the mesh velocity.We thus choose the mesh velocity β h to be the discrete material velocity Uin the structure part of the mesh (vertices touching structure cells) and in therest of the mesh we use mesh smoothing to determine β h to maximize the meshquality according to a chosen objective, alternatively use local mesh modificationoperations (refinement, coarsening, swapping) on the mesh to maintain thequality [?]. Note that we still compute in Euler coordinates, but with a movingmesh.20.2.3 Streamline diffusion stabilizationFor the standard FEM formulation of the model we only have stability of U butnot of spatial derivatives of U. This means the solution can be oscillatory, causinginefficiency by introducing unnecessary error. We instead choose a weightedstandard Galerkin/streamline diffusion method of the form (R(W), v + δR(v)) =0, ∀v ∈ V h (see [?])with δ > 0astabilization parameter. Wehere also make asimplificationwhereweonlyintroducenecessarystabilization termsanddroptermsnot contributing to stabilization. Although not fully consistent, the streamlinediffusion stabilization avoid unnecessary smearing of shear layers as the stabilizationis not based on large (≈ h −1 2) cross flow derivatives). For the UC modelthe stabilized method thus looks like:(R u (W), v u ) = (ρ(D t U i + U j D j U i ) − f i , v u i ) + (Σ ij , D xj v u i ) + SD u (W, v u ) = 0(R p (W), v p ) = (D xj U j , v p ) + SD p (W, v p ) = 0for all v ∈ V h , and:SD u (W, v u ) = δ 1 (U j D j U i , U u j D j v u i ) + δ 2 (D xj U j , D xj v u j )SD p (W, v p ) = δ 1 (D xi P, D xi v p )where we only include the dominating stabilization terms to reduce complexityin the formulation.20.2.4 Duality-based adaptive error control20.2.5 Unicorn/FEniCS software implementationWe implement the G2 discretization of the UC in a general interface for timedependentPDE where we give the forms a(U, v) = (D U F U , v) and L(v) = (F U , v)for assembling the linear system given by Newton’s method for a time step fortheincompressible UCwithNewtonianfluidconstitutiveequationin figure20.1.The language used is Python, where we use the FEniCS Form Compiler (FFC)[?] form notation.197

Unicorn: A Unified Continuum Mechanics Solver...def ugradu( u, v):return [ dot( u, grad(v[i])) for i in range(d)]def epsilon( u):return 0.5 ∗( grad( u) + transp( grad( u)))def S(u, P):return mult( P, Identity(d)) − mult( nu, grad( u))def f(u, v):return −dot( ugradu( Uc, Uc) , v) + \dot( S(Uc, P) , grad( v)) + \−mult( d1, dot( ugradu( Um, u) , ugradu( Um, v))) + \−mult( d2, dot( div( u) , div( v))) + \dot( ff, v)def dfdu( u, k, v):return −dot( ugradu( Um, u) , v) + \−dot( mult( nu, grad( u)) , grad(v)) + \−mult( d1, dot( ugradu( Um, u) , ugradu( Um, v))) + \−mult( d2, dot( div( u) , div( v)))# cG(1)def F(u, u0, k, v):uc = 0.5 ∗ ( u + u0)return (−dot( u, v) + dot( u0, v) + mult( k, f(u, v)))def dFdu( u, u0, k, v):uc = 0.5 ∗ ureturn (−dot( u, v) + mult(1.0 ∗ k, dfdu( uc, k, v)))a = ( dFdu(U1, U0, k, v)) ∗ dxL = −F( UP, U0, k, v) ∗ dxFigure 20.1: Source code for bilinear and linear forms for incompressible UC onetime step with a Newton-type method (approximation of Jacobian).20.3 Unicorn classes: data types and algorithms20.3.1 Unicorn software designUnicorn follows two basic design principles: 198

Johan Hoffman, Johan Jansson, Niclas Jansson and Murtazo Nazarov• Keep It Simple Stupid (KISS)• “Premature optimization is the root of all evil” (Donald Knuth)Together, these two principles enforce generality and understandability of interfacesand implementations. Unicorn re-uses other existing implementationsand chooses straightforward, sufficiently efficient (optimize bottlenecks) standardalgorithms for solving problems. This leads to small and maintainable implementations.High performance is achieved by reducing the computationalload on the method level (through adaptivity and fixed-point iteration).Unicornconsistsofkeyconceptsabstractedinthefollowingclasses/interfaces:TimeDependentPDE: time-stepping In each time-step a non-linear algebraicsystem is solved by fixed-point iteration.ErrorEstimate: adaptive error control The adaptive algorithm isbased oncomputing local error indicators of the form ǫ K = (R(U), D x Φ) L2 (K×T). Thisalgorithm is abstracted in the ErrorEstimate and class.SlipBC: friction boundary condition Efficientcomputationofturbulentflowin Unicorn is based on modeling of turbulent boundary layers by a frictionmodel: u · n = 0, implemented as a strong boundary condition in the algebraicsystem.20.3.2 TimeDependentPDEWeconsidertime-dependentequationsofthetype f(u) = −D t u+g(u) = 0where gcan include differential operators in space, where specifically the UC model is ofthis type. In weak form the equation type looks like(f(u), v) = (−D t u + g(u), v) =0, possibly with partial integration of termsWe want to define a class (datatype and algorithms) abstracting the timesteppingoftheG2method,wherewewanttogivetheequation(possibly in weakform)asinputandgenerate thetime-steppingautomatically. cG(1)cG(1)(Crank-Nicolsonintime)givestheequationforthe(possiblynon-linear)algebraicsystemF(U) (in Python notation):# cG(1)def F(u, u0, k, v):uc = 0.5 ∗ ( u + u0)return (−dot( u, v) + dot( u0, v) + mult( k, g(uc, v)))With v: ∀v ∈ V h generating the equation system.We solve this system by Newton-type fixed-point iteration:(F ′ (U P )U 1 , v) = (F ′ (U P ) − F(U P ), v) (20.4)199

Unicorn: A Unified Continuum Mechanics Solverwhere U P denotes the value in the previous iterate and F ′ = ∂F the Jacobianmatrix or an approximation. Note that F ′ can be chosen freely since it only∂Uaffects the convergence of the fixed-point iteration, and does not introduce approximationerror.We define the bilinear form a(U, v) and linear form L(v) corresponding to theleft and right hand sides respectively (in Python notation):def dFdu( u, u0, k, v):uc = 0.5 ∗ ureturn (−dot( u, v) + mult( k, dgdu( uc, k, v)))a = ( dFdu(U, U0, k, v)) ∗ dxL = ( dFdu(UP, U0, k, v) − F(UP, U0, k, v)) ∗ dxThus, in each time step we need to solve the system given in eq. 20.4 byfixed-point iteration by repeatedly computing a and L, solving a linear systemand updating U.We now encapsulate this in a C++ class interface in fig. 20.4 which we callTimeDependentPDEwhere we give a and L, an end time T, a mesh (defining V h )and boundary conditions.The skeleton of the time-stepping with fixed-point iteration is implementedin listing 20.3.2.See ?? and [?] for a dicussion about the efficiency of the fixed-point iterationand its implementation.[Discuss pressure/systems]20.3.3 ErrorEstimateThe duality-based adaptive error control algorithm requires the following primitives:Residual computation We compute the mean-value in each cell of the continuousresidual R(U) = f(U) = −D t U + g(U), this is computed as the L 2 -projection into the space of piecewise constants W h : (R(U), v) = (−D t U +g(U), v), ∀v ∈ W h .Dual solution We compute the solution of the dual problem using the sametechnology as the primal problem. The dual problem is solved backwardin time, but with the time coordinate transform s = T − t we can use thestandard TimeDependentPDEinterface and step the dual time s forward.Space-time function storage/evaluation Wecomputeerrorindicatorsasspacetimeintegrals over cells: ǫ K = (R(U), D x Φ) L2 (K×T), where we need to evaluateboth the primal solution U and the dual solution Φ. In addition, U is200

Johan Hoffman, Johan Jansson, Niclas Jansson and Murtazo Nazarov// / Represent and solve time dependent PDE.class TimeDependentPDE{// / Public interfacepublic:TimeDependentPDE(// Computational meshMesh& mesh,// Bilinear form for Jacobian approx.Form& a,// Linear form for time−step residualForm& L,// List of boundary conditionsArray & bcs,// End timereal T);virtual ˜ TimeDependentPDE();// / Solve PDEvirtual uint solve();// / Protected interface for subclassesprotected:// / Compute initial valuevirtual void u0(Vector& u);// / Called before each time stepvirtual void preparestep();// / Called before each fixed−point iterationvirtual void prepareiteration();// / Return the bilinear form aForm& a();// / Return the linear form LForm& L();// / Return the meshMesh& mesh();};Figure 20.2: C++ class interface for TimeDependentPDE.a coefficient in the dual equation. This requires storage and evaluation ofa space-time function, which is encapsulated in the SpaceTimeFunctionclass.201

Unicorn: A Unified Continuum Mechanics SolverMesh adaptation After the computation of the error indicators we select thelargest p% of the indicators for refinement. The refinement is then performedbyrecursiveRivaracellbisectionencapsulatedintheMeshAdaptivityclass. A future promising alternative is to use Madlib [?, ?] for mesh adaptation,whichis basedon edge split, collapse and swap, andwouldthusgivethe ability to coarsen a mesh, or more generally to control the mesh size.Usingtheseprimitives,wecanconstructanadaptivealgorithm. Theadaptivealgorithm is encapsulated in the C++ class interface in fig. ?? which we callErrorEstimate.20.3.4 SlipBCForhighReynoldsnumbersproblemssuchascaraerodynamicsorairplaneflight,it’s not possible to resolve the turbulent boundary layer. One possibility is tomodel turbulent boundary layers by a friction model:u · n = 0 (20.5)u · τ k + β −1 n ⊤ στ k = 0, k = 1, 2 (20.6)Weimplementthenormalcomponentcondition(slip)boundaryconditionstrongly.By “strongly” we here mean an implementation of the boundary condition afterassembling the left hand side matrix and the right hand side vector in the algebraicsystem, whereas the tangential components (friction) are implemented“weakly” by adding boundary integrals in the variational formulation. The rowof the matrix and load vector corresponding to a vertex is found and replaced bya new row according to the boundary condition.The idea is as follows: Initially, the test function v is expressed in the Cartesianstandardbasis(e 1 , e 2 , e 3 ). Now,thetestfunctionismappedlocallytonormaltangentcoordinates with the basis (n, τ 1 , τ 2 ), where n = (n 1 , n 2 , n 3 ) is the normal,and τ 1 = (τ 11 , τ 12 , τ 13 ), τ 2 = (τ 21 , τ 22 , τ 23 ) are tangents to each node on the boundary.This allows us to let the normal direction to be constrained and the tangentdirections be free:v = (v · n)n + (v · τ 1 )τ 1 + (v · τ 2 )τ 2 .For the matrix and vector this means that the rows corresponding to the boundaryneed to be multiplied with n, τ 1 , τ 2 , respectively, and then the normal componentof the velocity should be put 0.This concept is encapsulated in the class SlipBC which is a subclass ofdolfin::BoundaryConditionfor representing strong boundary conditions.202

Johan Hoffman, Johan Jansson, Niclas Jansson and Murtazo Nazarov20.4 Mesh adaptivity20.4.1 Local mesh operations: MadlibMadlib incorporates an algorithm and implementation of mesh adaptationwhere a small set of local mesh modification operators are defined such as edgesplit,edgecollapseandedgeswap. Ameshadaptationalgorithmisdefinedwhichuses this set of local operators in a control loop to satisfy a prescribed size fieldh(x) and quality tolerance. Edge swapping is the key operator for improvingquality of cells, for example around a vertex with a large number of connectededges.In the formulation of finite element methods it is typically assumed that thecell size of a computational mesh can be freely modified to satisfy a desired sizefield h(x) or to allow mesh motion. In state-of-the-art finite element softwareimplementations this is seldom the case, where typically only limited operationsare allowed [?, ?], (local mesh refinement), or a separate often complex, closedand ad-hoc mesh generation implementation is used to re-generate meshes.The mesh adaptation algorithm in Madlib gives the freedom to adapt to aspecifiedsize fieldusinglocal meshoperations. The implementationispublishedas free software/open source allowing other research to build on the results andscientific repeatability of numerical experiments.20.4.2 Elastic mesh smoothing: cell quality optimization20.4.3 Recusive Rivara bisection20.5 Parallel computation20.5.1 Tensor assembly20.5.2 Mesh refinement20.6 Application examples20.6.1 Incompressible flow20.6.2 Compressible flow20.6.3 Fluid-structure interaction203

Unicorn: A Unified Continuum Mechanics Solvervoid TimeDependentPDE::solve(){// Time−steppingwhile( t < T){U = U0;preparestep();step();}}void TimeDependentPDE::step(){// Fixed−point iterationfor(int iter = 0; iter < maxiter; iter++){prepareiteration();step_residual = iter();}}if( step_residual < tol){// Iteration convergedbreak;}void TimeDependentPDE::iter(){// Compute one fixed−point iterationassemble( J, a());assemble( b, L());for ( uint i = 0; i < bc().size(); i++)bc()[i]−>apply(J, b, a());solve( J, x, b);// Compute residual for the time−step / fixed−point equationJ.mult( x, residual);residual −= b;}return residual. norm( linf);Figure 20.3: Skeleton implementation204in Unicorn of time-stepping with fixedpointiteration.

Johan Hoffman, Johan Jansson, Niclas Jansson and Murtazo Nazarov// / Represent and solve time dependent PDE.class TimeDependentPDE{// / Public interfacepublic:TimeDependentPDE(// Computational meshMesh& mesh,// Bilinear form for Jacobian approx.Form& a,// Linear form for time−step residualForm& L,// List of boundary conditionsArray & bcs,// End timereal T);virtual ˜ TimeDependentPDE();// / Solve PDEvirtual uint solve();// / Protected interface for subclassesprotected:// / Compute initial valuevirtual void u0(Vector& u);// / Called before each time stepvirtual void preparestep();// / Called before each fixed−point iterationvirtual void prepareiteration();// / Return the bilinear form aForm& a();// / Return the linear form LForm& L();// / Return the meshMesh& mesh();};Figure 20.4: C++ class interface for TimeDependentPDE.205

CHAPTER21Viper: A Minimalistic Scientific PlotterBy Ola SkavhaugChapter ref: [skavhaug]207

CHAPTER22Lessons Learnt in Mixed Language ProgrammingBy Kent-Andre Mardal, Anders Logg, and Ola SkavhaugChapter ref: [mardal-2]◮ Editor note: This could be in the implementation section or in an appendix.This chapter discusses some lessons learnt while developing the Python interfaceto DOLFIN. It contains the basics of using SWIG to create Python extensions.Then an explanation of some design choices in DOLFIN with particularemphasis on operators (and how these are dealt with in SWIG). Typemaps areexplained in the context of NumPy arrays. Finally, we explain how to debugPython extensions efficiently, eg. by setting breakpoints.209

Part IIIApplications211

CHAPTER23Finite Elements for Incompressible FluidsBy Andy R. Terrel, L. Ridgway Scott, Matthew G. Knepley, Robert C. Kirby and GarthN. WellsChapter ref: [terrel]Incompressible fluid models have numerousdiscretizations each with its ownbenefits and problems. This chapter will focus on using FEniCS to implementdiscretizations of the Stokes and two non-Newtonian models, grade two andOldroyd–B. Special consideration is given to profiling the discretizaions on severalproblems.213

CHAPTER24Benchmarking Finite Element Methods forNavier–StokesBy Kristian Valen-Sendstad, Anders Logg and Kent-Andre MardalChapter ref: [kvs-1]In this chapter, we discuss the implementation of several well-known finiteelement based solution algorithms for the Navier-Stokes equations. We focuson laminar incompressible flows and Newtonian fluids. Implementations of simpleprojection methods are compared to fully implicit schemes such as inexactUzawa, pressure correction on the Schur complement, block preconditioning ofthe saddle point problem, and least-squares stabilized Galerkin. Numerical stabilityand boundary conditions are briefly discussed before we compare the implementationswithrespect toefficiency and accuracy foranumberofwellestablishedbenchmark tests.215

CHAPTER25Image-Based Computational HemodynamicsBy Luca AntigaChapter ref: [antiga]The physiopathology of the cardiovascular system has been observed to betightly linked to the local in-vivo hemodynamic environment. For this reason,numerical simulation of patient-specific hemodynamics is gaining ground in thevascular research community, and it is expected to start playing a role in futureclinical environments. For the non-invasive characterization of local hemodynamicsonthebasisofinformationdrawnfrommedicalimages,robustworkflowsfromimagestothedefinitionandthediscretizationofcomputationaldomainsfornumericalanalysis are required. Inthischapter, wepresent aframework forimageanalysis, surface modeling, geometric characterization and mesh generationprovided as part of the Vascular Modeling Toolkit (VMTK), an open-source effort.Starting from a brief introduction of the theoretical bases of which VMTKis based, we provide an operative description of the steps required to generate acomputational mesh from a medical imaging data set. Particular attention willbe devoted to the integration of the Vascular Modeling Toolkit with FEniCS. Allaspects covered in this chapter are documented with examples and accompaniedby code and data, which allow to concretely introduce the reader to the field ofpatient-specific computational hemodynamics.217

CHAPTER26Simulating the Hemodynamics of the Circle of WillisBy Kristian Valen-Sendstad, Kent-Andre Mardal and Anders LoggChapter ref: [kvs-2]Stroke is a leading cause of death in the western world. Stroke has differentcauses but around 5-10% is the result of a so-called subarachnoid hemorrhagecaused by the rupture of an aneurysm. These aneurysms are usually found inournearthecircle ofWillis, whichisan arterial networkatthebase ofthebrain.In this chapter we will employ FEniCS solvers to simulate the hemodynamics inseveral examples ranging from simple time-dependent flow in pipes to the bloodflow in patient-specific anatomies.219

CHAPTER27Cerebrospinal Fluid FlowBy Susanne Hentschel, Svein Linge, Emil Alf Løvgren and Kent-Andre MardalChapter ref: [hentschel]27.1 Medical BackgroundThe cerebrospinal fluid (CSF) is a clear water-like fluid which occupies the socalledsubarachnoid space (SAS) surrounding the brain and the spinal cord, andthe ventricular system within the brain. The SAS is composed of a cranial and aspinal part, bounded by tissue layers, the dura mater as outer boundary and thepia mater as internal boundary. The cranial and the spinal SAS are connectedby an opening in the skull, called the foramen magnum. One important functionoftheCSFistoact asashock absorber andtoallowthebrain toexpandandcontractas a reaction to the changing cranial blood volume throughout the cardiaccycle. Duringsystole the blood volume that enters the brain through the arterialsystem exceeds the volume that leaves the brain through the venous system andleads therefore to an expansion of the brain. The opposite effect occurs duringdiastole, whenthebloodvolumein thebrain returnstothestarting point. Hencethepulse that travels throughtheblood vesselnetworkis transformedto apulsein the CSF system, that is damped on its way along the spinal canal.The left picture in Figure 27.1 shows the CSF and the main structures in thebrain of a healthy individual. In about 0.6% of the population the lower partof the cerebellum occupies parts of the CSF space in the upper spinal SAS andobstructs the flow. This so-called Chiari I malformation (or Arnold-Chiari malformation)(Milhorat et al. [?]) is shown in the right picture in Figure 27.1. Avariety of symptoms is related to this malformation, including headache, abnor-221

Cerebrospinal Fluid FlowFigure 27.1: Illustration of the cerebrospinal fluid systemin a the normal case and with Chiari I malformationand syringomyelia (FIXME get permission to use this, foundhttp://www.chiariinstitute.com/chiari malformation.html)mal eye-movement, motor or sensor-dysfunctions, etc. If the malformation is nottreated surgically, the condition may become more severe and will eventuallycause more serious neurological deterioration, or even lead to death. Many peoplewith the Chiari I malformation develop fluid filled cavities within the spinalcord,adiseasecalledsyringomyelia(Oldfield[?]). TheexactrelationbetweentheChiari I malformation and syringomyelia is however not known. It is believedthatobstructions,thatisabnormalanatomiescauseabnormalflowleadingtothedevelopment of syringomyelia (Oldfield [?]). Several authors have analyzed therelations between abnormal flow and syringomyelia development based on measurementsin patients and healthy volunteers (Heiss [?], Pinna [?], Hofmann[?],Hentschel [?]). Thementionedstudiesalsocomparethedynamicsbeforeandafter decompressive surgery. The latter is an operation, where the SAS lumenaroundtheobstructedareaisincreasedbyremovingpartsofthesurroundingtissueandbones(MilhoratandBolognese[?]).Controlimagestakensomeweeksormonths after the intervention often show a reduction of the size of the cavity inthe spinal canal and patients usually report improvement of their condition. Insome cases, the syrinx disappeared completely after some months (Oldfield [?],Pinna [?], Heiss [?]).The studies mentioned above are all based on a small amount of individualscharacterized by remarkable variations. CFD simulations may help to test theinitial assumptions in generalized settings. Gupta [?] and Roldan [?] demonstratedthe usefulness of CFD to quantify and visualize CSF flow in patient specificcases in great detail. It is the purpose of this chapter to describe the implementationof such a CFD solver in FEniCS and to compare the simulationresults with results obtained from Star-CD. Note that the Navier-Stokes solversare discussed in detail in Chapter [?].222

Susanne Hentschel, Svein Linge, Emil Alf Løvgren and Kent-Andre Mardal27.2 Mathematical DescriptionWe model the CSF flow in the upper spinal canal as a Newtonian fluid with viscosityand density similar to water under body temperature. In the presentedexperiments, we focus on the dynamics around the spinal cord. The tissue surroundingthe fluid is modeled as impermeable and rigid throughout the cardiaccycle. To simulate CSF flow, we apply the Navier-Stokes equations for an incompressibleNewtonian fluid,( )∂vρ∂t + (v · ∇)v = −∇p + µ∆v + g, ∈ Ω,∇v = 0, ∈ Ω,with the variables as indicated in Table 27.2, and g, the body force, i.e., gravity.We kan eliminate gravity from the equation by assuming that the body force isbalanced by the hydrostatic pressure. As a result, pressure describes only thedynamic pressure. For calculating the physical correct pressure, static pressureresulting from body forces has to be added. This simplification is however nottrue during sudden movements such as raising up.The coordinate system is chosen so that the tubular axis points downwards,resulting in positive systolic flow and negative diastolic flow.27.3 Numerical Experiments27.3.1 ImplementationWe refer to Chapter [?] for a complete description of the solvers and schemesimplemented. In this chapter we concentrate on the use of these solvers in a fewexamples.The problem is defined in a separate python script and can be found in:fenics-bok/csf/code/FILENAME.The main parts are presented below.Mesh boundaries. The mesh boundaries at the inlet cross section, the outletcross section, and the SAS boundaries are defined by the respective classesTop, Bottom,and Contour. Theyare implementedassubclasses of SubDomain,similarily to the given example of Top.class Top(SubDomain):def __init__(self, index, z_max, z_min):SubDomain.__init__(self)self.z_index = indexself.z_max = z_maxself.z_min = z_mindef inside(self, x, on_boundary):return bool(on_boundary and x[self.z_index] == self.z_max)223

Cerebrospinal Fluid FlowSymbol Meaning Entity Chosen Value Reference Valuecmv velocity variable — −1.3 ± 0.6 . . .2.4 ± 1.4 asp pressure variable mmHg — ...gρ density— 0.993 bcmgs3µ dynamic viscosity — 0.0007cmcmν kinematic viscosity0.710 −2 0.710 −2sSV stroke volume c ml 0.27 0.27 dsbeatsHR heart rate 1.17 1.17sA0 tube boundary cm 2 32 —A1,A2 area of inlet/outlet cm 2 0.93 0.8 ...1.1 eRe Reynholds Number – – 70–200 fWe Womersley Number – – 14–17Table 27.1: Characteristic values and parameters for CSF flow modeling.a Hofmann et al. [?]; Maximum absolute anterior CSF flow in both directions from controlsand patients at foramen Magnumb at 37 ◦ Cc CSF volume that moves up and down through cross section in the SAS during one cardiaccycled Gupta et al. [?]e Loth et al. [?]; Cross sections at 20–40 cm from the foramen magnum.f See more details in 27.3.5.224

Susanne Hentschel, Svein Linge, Emil Alf Løvgren and Kent-Andre MardalTodefinethedomaincorrectly,weoverridethebaseclass’objectfunction inside.It returns a boolean evaluating if the inserted point x is part of the sub domain.The boolean on boundary is very useful to easily partition the wholemesh boundary to sub domains.Physically more correct would be to require, that the no slip condition is alsovalid on the outermost/innermost nodes of the inflow and outflow sections asimplemented below:def on_ellipse(x, a, b, x_index, y_index, x_move=0, y_move=0):x1 = x[x_index] - x_movex2 = x[y_index] - y_movereturn bool( abs((x1/a)**2 + (x2/b)**2 - 1.0 ) < 10**(-6) )Thevectorsdescribingtheellipsesofthecordandthedurainacrosssectionwiththe corresponding axes are required. The global function on ellipse checks ifx is on the ellipse defined by the x-vector a and the y-vector b. The variablesx move and y move allow to define an eccentric ellipse.Defining the inflow area at the top with excluded mantle nodes is done asfollows below, the outflow area at the bottom is defined analogously.class Top(SubDomain): #bc for topdef __init__(self, a2_o, a2_i, b2_o, b2_i, x_index, y_index, z_index, z_max, x2_o_move=0,\y2_o_move=0, x2_i_move=0, y2_i_move=0):SubDomain.__init__(self)self.x_index = x_indexself.y_index = y_indexself.a2_o = a2_oself.a2_i = a2_iself.b2_o = b2_oself.b2_i = b2_iself.z_index = z_indexself.z_max = z_maxself.x2_o_move = x2_o_moveself.x2_i_move = x2_i_moveself.y2_o_move = y2_o_moveself.y2_i_move = y2_i_movedef inside(self, x, on_boundary):return bool(on_boundary and abs(x[self.z_index] - self.z_max)

Cerebrospinal Fluid FlowFigure 27.2: Two different flow pulses..minute. Furthermore, we defined the cross sectional area to be 0.93 cm 2 , whichmatches the segment from 20 to 40 cm down from the foramen magnum (Loth etal [?]). In this region of the spinal canal, the cross sectional area varies little. Inaddition, the dura andthe cord shape resemble a simple tube more than in otherregions. According to Oldfield et al. [?], syrinxes start at around 5 cm below theforamen magnum and reach down up to 28 cm below the foramen magnum.Further, we define a velocity pulse on the inflow and outflow boundaries andsince we are modeling incompressible flow between rigid impermeable boundaries,we must have equal inflow and outflow volume at all times. The pulsevalues in these boundary cross sections were set equal in every grid point, andscaled to match the volume transport of 0.27 ml.Smithetal. [?]introducedafunctiondescribing thevarying bloodpressureina heart chamber(see Figure 27.2). With some adjustment and additional parameters,the function was adapted to approximate the CSF flow pulse. The systoleof the pulse function is characterized by a high amplitude with a short durationwhile the negative counter movement has a reduced amplitude and lasts considerablylonger. The global function for defining the pulse is:def get_pulse_input_function(V, z_index, factor, A, HR_inv, HR, b, f1):two_pi = 3.4 * pirad = two_pi /HR_invv_z = "factor*(-A*(exp(-fmod(t,T)*rad)*Ees*(sin(-f1*fmod(t,T)*rad)-vd)\-(1-exp(-factor*fmod(t,T)*rad))*p0*(exp(sin(-fmod(t,T)*rad)-vd)-1))-b)"vel = Noneif z_index == 0:vel = (v_z, "0.0", "0.0")elif z_index ==1:vel = ("0.0", v_z, "0.0")elif z_index ==2:vel = ("0.0", "0.0", v_z)226

Susanne Hentschel, Svein Linge, Emil Alf Løvgren and Kent-Andre Mardalclass Pulse(Function):cpparg = velprint veldefaults = {"factor":factor, "A":A, "p0":1, "vd":0.03, "Ees":50, "T":HR_inv, "HR":HR,\"rad":rad, "b":b, \emp{f1}:f1}return Pulse(V)To define the necessary parameters in the initialization, the following lines arerequired.self.A = 2.9/16 # scale to get max = 2.5 and min = -0.5 for f1 = 1self.factor = self.flow_per_unit_area/0.324self.v_max = 2.5 * self.factorself.b = 0.465 # translating the function "down"self.f1 = 0.8The boundary condition Pulse is defined as a subclass of Function, thatenables parameter dependencies evaluated at run time. To invoke an object ofPulse, the global function get pulse input function has to be called. Thefunctioninputcontainsallnecessaryconstantstodefinethepulsefunction,scaledto cardiac cycle and volume transport. The index z index definesthe coordinateof the tubular direction. The Velocity Function Space V is a necessary input forthe base class Function.Initialization of the problem. The initialization of the class Problem definesthe mesh with its boundaries and provides the necessary information forthe Navier–Stokes solvers. The mesh is ordered for all entities and initiated tocompute its faces.The values z min and z max mark the inflow and outflow coordinates alongthetube’slengthaxis. Asmentionedabove,theaxisalongthetubeisindicatedbyz index. Ifone ofthecoordinates orthez-index is notknown,it mayhelpto callthe mesh in viper unix>viper meshname.xml. Typing o prints the length inx, y and z direction in the terminal window. Defining z min, z max and z indexcorrectly is important for the classes that define the boundary domains of themesh Top, Bottom and Contour. As we have seen before, z index is necessaryto set the correct component to the non-zero boundary velocity.Exterior forces on the Navier–Stokes flow are defined in the object variablef. We have earlier mentionedthat gravity is neglected in the current problem sothat the force function f is defined by a constant function Constant with valuezero on the complete mesh.Afterinitializingthesubdomains, Top, Bottomand Contour,theyaremarkedwithreferencenumbersattributedtothecollectionofallsubdomains sub domains.Toseethemostimportanteffects,thesimulationwasrunslightlylongerthanone full period. A test verified that the initial condition of zero velocity in allpoints is sufficiently correct and leads to a good result in the first period already.227

Cerebrospinal Fluid FlowBesides maximum and minimum velocities, it includes the transition from diastoleto systole and vice versa. With the given time step length, the simulation isvery detailed in time.def __init__(self, options):ProblemBase.__init__(self, options)#filename = options["mesh"]filename = "../../data/meshes/chiari/csf_extrude_2d_bd1.xml.gz"self.mesh = Mesh(filename)self.mesh.order()self.mesh.init(2)self.z_max = 5.0self.z_min = 0.0self.z_index = 2self.D = 0.5# in cm# in cm# characteristic diameter in cmself.contour = Contour(self.z_index, self.z_max, self.z_min)self.bottom = Bottom(self.z_index, self.z_max, self.z_min)self.top = Top(self.z_index, self.z_max, self.z_min)# Load sub domain markersself.sub_domains = MeshFunction("uint", self.mesh, self.mesh.topology().dim() - 1)# Mark all facets as sub domain 3for i in range(self.sub_domains.size()):self.sub_domains.set(i, 3)self.contour.mark(self.sub_domains, 0)self.top.mark(self.sub_domains, 1)self.bottom.mark(self.sub_domains, 2)# Set viscosityself.nu = 0.7 *10**(-2) # cmˆ2/s# Create right-hand side functionself.f = Constant(self.mesh, (0.0, 0.0, 0.0))n = FacetNormal(self.mesh)# Set end-timeself.T = 1.2* 1.0/self.HRself.dt = 0.001Increasing the time step length usually speeds up the calculation of the solution.As long as the CFL number with the maximum velocity v max , time steplength dt and minimal edge length h min is smaller than one (CFL = vmaxdth min< 1),the solvers should (!!!) converge. For too small time steps it can however lead toan increasing number of iterations for the solver on each time step. As a characterizationof the fluid flow, the Reynholds number (Re = vcl ) was calculated withνthe maximum velocity v c at the inflow boundary and the characteristic length lof the largest gap between outer and inner boundary. Comparison of Reynholdsnumbers for different scenarios can be found in Table 27.3.5.The area of the mesh surfaces and the mesh size can be found as follows.self.h = MeshSize(self.mesh)self.A0 = self.area(0)self.A1 = self.area(1)self.A2 = self.area(2)228

Susanne Hentschel, Svein Linge, Emil Alf Løvgren and Kent-Andre Mardaldef area(self, i):f = Constant(self.mesh, 1)A = f*ds(i)a = assemble(A, exterior_facet_domains=self.sub_domains)return aObject Functions. Being a subclass of ProblemBase, Problem overrides theobject functions update and functional. The first ensures that all time–dependent variables are updated for the current time step. The latter printsthe maximum values for pressure and velocity. The normal flux through theboundaries is defined in the separate function flux.def update(self, t, u, p):self.g1.t = tself.g2.t = tpassdef functional(self, t, u, p):v_max = u.vector().norm(linf)f0 = self.flux(0,u)f1 = self.flux(1,u)f2 = self.flux(2,u)pressure_at_peak_v = p.vector()[0]print "time ", tprint "max value of u ", v_maxprint "max value of p ", p.vector().norm(linf)print "CFL = ", v_max * self.dt / self.h.min()print "flux through top ", f1print "flux through bottom ", f2# if current velocity is peakif v_max > self.peak_v:self.peak_v = v_maxprint pressure_at_peak_vself.pressure_at_peak_v = pressure_at_peak_vreturn pressure_at_peak_vdef flux(self, i, u):n = FacetNormal(self.mesh)A = dot(u,n)*ds(i)a = assemble(A, exterior_facet_domains=self.sub_domains)return aThe boundaryconditionsare all given asDirichlet conditions, associated withtheir velocity function space and the belonging sub domain. The additional functionsboundary conditions and initial conditions define the respectiveconditions for the problem that are called by the solver. Boundary conditions forvelocity, pressure and psi (???) are collected in the lists bcv, bcp and bcpsi.def boundary_conditions(self, V, Q):# Create no-slip boundary condition for velocityself.g0 = Constant(self.mesh, (0.0, 0.0, 0.0))bc0 = DirichletBC(V, self.g0, self.contour)# create function for inlet and outlet BC229

Cerebrospinal Fluid Flowself.g1 = get_sine_input_function(V, self.z_index, self.HR, self.HR_inv, self.v_max)self.g2 = self.g1# Create inflow boundary condition for velocity on side 1 and 2bc1 = DirichletBC(V, self.g1, self.top)bc2 = DirichletBC(V, self.g2, self.bottom)# Collect boundary conditionsbcv = [bc1, bc0, bc2]bcp = []bcpsi = []return bcv, bcp, bcpsidef initial_conditions(self, V, Q):u0 = Constant(self.mesh, (0.0, 0.0, 0.0))p0 = Constant(self.mesh, 0.0)return u0, p0Running. Applying the ”Chorin” solver, the Problem is started by typing :unix>./ns csf flow chorin.ItapproximatestheNavier–StokesequationwithChorin’smethod. Theprogressof different simulation steps and results, including maximum calculated pressureand velocity per time step, are printed out on the terminal. In addition, thesolution for pressure and velocity are dumped to afile for each (by default?) timestep. Before investigating the results, we introduce how the mesh is generated.27.3.2 Example 1. Simulation of a Pulse in the SAS.In the first example we represent the spinal cord and the surrounding duramater as two straight cylinders. These cylinders can easily be generated by usingNetGen [?] or Gmsh [?]. In NetGen meshes can be constructed by addingor subtracting geometrical primitives from each other. It also supports DOLFINmesh generation. Examples for mesh generation with NetGen can be found in....In Gmsh, constructing the basic shapes requires a more complex geometricaldescription, however it is easier to control how the geometry is meshed. The followingcode example shows the construction of a circular cylinder (representingthe pia on the spinal cord) within an elliptic cylinder (representing the dura).The dura is defined by the ellipse vectors a=0.65 mm and b=0.7 mm in x and ydirection respectively. The cord has a radius of 4 mm with its center moved 0.8mm in positive x-direction Since Gmsh only allows to draw circular or ellipticarcsforangles smallerthan pi, thebasic ellipses wereconstructedfrom fourarcseach. Every arc is defined by the starting point, the center, another point on thearc and the end point. The value lc defines the maximal edge length in vicinityto the point.230

Susanne Hentschel, Svein Linge, Emil Alf Løvgren and Kent-Andre Mardallc = 0.04;Point(1) = {0,0,0,lc}; // center point//outer ellipsesa = 0.65;b = 0.7;Point(2) = {a,0,0,lc};Point(3) = {0,b,0,lc};Point(4) = {-a,0,0,lc};Point(5) = {0,-b,0,lc};Ellipse(10) = {2,1,3,3};Ellipse(11) = {3,1,4,4};Ellipse(12) = {4,1,5,5};Ellipse(13) = {5,1,2,2};// inner ellipsesmove = 0.08; //"move" centerPoint(101) = {move,0,0,lc};c = 0.4;d = 0.4;Point(6) = {c+move,0,0,lc*0.2};Point(7) = {move,d,0,lc};Point(8) = {-c+move,0,0,lc};Point(9) = {move,-d,0,lc};Ellipse(14) = {6,101,7,7};Ellipse(15) = {7,101,8,8};Ellipse(16) = {8,101,9,9};Ellipse(17) = {9,101,6,6};The constructed ellipses are composed of separate arcs. To define them assingle lines, the ellipse arcs are connected in line loops.// connect lines of outer and inner ellipses to oneLine Loop(20) = {10,11,12,13}; // only outerLine Loop(21) = {-14,-15,-16,-17}; // only innerThe SAS surface between cord and dura is then defined by the following command.Plane Surface(32) = {20,21};Toeasily constructvolumes, Gmshallows toextrude ageneratedsurface overa given length.length = 5.0csf[] = Extrude(0,0,length){Surface{32};};Calling the.geofile in Gmsh >unix Gmsh filename.geoshowsthe definedgeometry. Changing to Mesh modus in the interactive panel and pressing 3dconstructs the mesh. Pressing Save will save the mesh with the .geo–filenameand the extension msh. For use in DOLFIN, the mesh generated in Gmsh can beconverted by applying the DOLFIN converter.unix>dolfin-convert mesh-name.msh mesh-name.xml231

Cerebrospinal Fluid FlowFigure 27.3: Gmsh mesh..Solver p in Pa v max in cm/s t in sChorin 4.03 1.35 0.233G2 6.70 0.924 0.217UzawaTable 27.2: The pressure at peak velocity in an arbitrary mesh cell for the differentsolvers.Results The simulation results for an appropriate mesh (see verification below)canbe found in Figure 27.4. The plots show the velocity componentin tubulardirection at at the mid cross section of the model. The flow profiles are takenat the time steps of maximum flow in both directions and during the transitionfrom systole to diastole. For maximal systole, the velocities have two peak rings,one close to the outer, the other to the inner boundary. We can see sharp profilesat the maxima and bidirectional flow at the local minimum during diastole.Comparing different solvers.For the first example, we applied the Chorin solver (WRITE ABOUT MODIFI-CATIONS WITH TOLERANCES!). For verifying the result, we also applied thesolvers G2 and Uzawa. We picked an arbitrary point in the mesh to compareits pressure value at the time step of peak velocity. The results shown in Table27.2 reveal remarkable differences for ...Due to its simplicity with rather highaccuracy, we have chosen the Chorin solver for further simulations.232

Susanne Hentschel, Svein Linge, Emil Alf Løvgren and Kent-Andre MardalFigure 27.4: Case: Circular cord. The velocity in z-direction for the nonsymmetricpulse at the time steps t = 0.07s, 0.18s, 0.25s..Verifying the mesh.In our case, the resolution in the cross section is more important than along thepipe. Thus, we first varied the number of nodes per cross section on a pipe oflength 1.75 cm with 30 layers in the flow direction. Reducing the maxium edgelength 1 from 0.04, to 0.02 and 0.01 mm gradually decreased the velocity withsome distance to the boundary. The reason for the different velocities lies in theno-slip boundary condition, that influences a greater area for meshes with fewernodes per cross section, leading to a smaller region of the non-zero flow area.Additionally, we observed increasingly wave-like structures of fluctuating velocitesin the inflow and outflow regions, when the maximum edge length wasdecreased. These effects result from the changed ratio of edge lengths in crosssectionaland tubular direction.To avoid increasing the node density utterly, we introduced three thin layersclose to the side boundaries, that capture the steep velocity gradient in theboundary layer. The distance of the layers was chosen, so that the edge lengthslightlyincreasesforeach layer. Startingfrom10% ofthemaximumedgelength,for the first layer, the width of the second and the third layer was set to 30% and80% of the maximum edge length. It turned out, that for meshes with layersalong the side boundaries, a maximum edge length of 0.04 mm was enough toreproduce the actual flow profile.ToaddmeshlayersinGmsh,copiesfortheellipticarcsarescaledtograduallyincrease the maximum edge length. The code example below shows the creationof the layers close to the outer ellipse. The inner layers are created similarly.1 Denoted as lc in the Gmsh code.233

Cerebrospinal Fluid Flowouter_b1[] = Dilate {{0, 0, 0}, 1.0 - 0.1*lc } {Duplicata{ Line{10}; Line{11}; Line{12}; Line{13}; } };outer_b2[] = Dilate {{0, 0, 0}, 1.0 - 0.3*lc } {Duplicata{ Line{10}; Line{11}; Line{12}; Line{13}; } };outer_b3[] = Dilate {{0, 0, 0}, 1.0 - 0.8*lc } {Duplicata{ Line{10}; Line{11}; Line{12}; Line{13}; } };The single arcs are dilated separately since the arc points are necessary for furthertreatment. Remember that no arcs with angles smaller than pi are allowed.Again weneedarepresentationfor thecompleteellipses definedby line loops, asLine Loop(22) = {outer_b1[]};thatare necessaryto definethesurfaces betweenall neighboring ellipses similarto:Plane Surface(32) = {20,22};Additionally, all Surfaces have to be listed in the Extrude command (see below).The tubular layers can be specified during extrusion. Note that the list ofextruded surfaces now contains the six layers close to the side boundaries andthe section between them.// Extrudelength = 5.0;layers = 30;csf[] = Extrude {0,0,length} {Surface{32}; Surface{33};Surface{34};Surface{35};Surface{36};Surface{37};Surface{38};Layers{ {layers}, {1} }; };Besides controling the numbers of nodes in tubular direction, extruded meshesresult in more homogenous images in cross-sectional cuts.The test meshes of 1.75 cm showed seemed to have a fully developed regionaround the mid-cross sections, where want to observe the flow profile. Testingdifferent numbers of tubular layers for the length of 1.75, 2.5 and 5 cm showedthat the above mentioned observations of wave-like structures occurred less forlonger pipes, even though the number of layers was low compared to the pipelength. The presented results were simulated on meshes of length 5 cm with 30layers in z-direction and three layers on the side boundaries.The complete codecan be found in mesh generation/FILENAME.27.3.3 Example 2. Simplified Boundary Conditions.Manyresearchersapply thesinefunctionasinlet andoutletboundarycondition,since itsintegral is zerooverone period. However, itss shape isnotin agreement234

Susanne Hentschel, Svein Linge, Emil Alf Løvgren and Kent-Andre Mardalwith measurements of the cardiac flow pulse (Loth et al. [?]). To see the influenceof the applied boundary condition for the defined mesh, we replace themore realistic pulse function with a sine, scaled to the same amount of volumetransport per cardiac cycle. The code example below implements the alternativepulse function in the object function boundary conditions. The variablesin integration factor describes the integral of the first half of a sine.self.HR = 1.16 # heart rate in beats per second; from 70 bpmself.HR_inv = 1.0/self.HRself.SV = 0.27self.A1 = self.area(1)self.flow_per_unit_area = self.volume_flow/self.A1sin_integration_factor = 0.315self.v_max = self.flow_per_unit_area/sin_integration_factorAs before, we have a global function returning the sine as a Function - object,def get_sine_input_function(V, z_index, HR, HR_inv, v_max):v_z = "sin(2*pi*HR*fmod(t,T))*(v_max)"vel = ["0.0", "0.0", "0.0"]vel[z_index] = v_zclass Sine(Function):cpparg = tuple(vel)defaults = {’HR}:HR, \emp{v_max}:v_max, \emp{T}:HR_inv}return Sine(V)that is called instead of get pulse input function in the function namedboundary conditions:self.g1 = get_sine_input_function(V, self.z_index, self.factor, self.A, self.HR_inv, self.HR,\self.b, self.f1).ThepulseandthesinearesketchedinFigure27.2. Bothfunctionsaremarkedat the points of interest: maximum systolic flow, around the transition from systoleto diastole and the (first, local) minimum. Results for sinusoidal boundaryconditions are shown in Figure 27.5 The shape of the flow profile is similar inevery time step, only the magnitudes change. No bidirectional flow was discoveredin the transition from positive to negative flow. Compared to the resultsreceived by the more realistic pulse function, the velocity profile generated fromsinusoidal boundaries is more homogeneous over the cross section.27.3.4 Example 3. Cord Shape and Position.According to [?], [?], the present flow is inertia dominated, meaning that theshape of the cross section should not influence the pressure gradient. Changingthe length of vectors describing the ellipse fromc = 0.4;d = 0.4;235

Cerebrospinal Fluid FlowFigure 27.5: Case: Circular Cord. The velocity in z-direction as response to asine boundary condition for the time steps t = 0.2, 0.4, 0.6..toc = 0.32;d = 0.5;transforms the cross section of the inner cylinder to an elliptic shape with preservedarea. The simulation results are collected in Figure 27.6. Comparisonsshowed that the pressure gradient was identical for the two cases, the differentshape is however reflected in the flow profiles.A further perturbation of the SAS cross sections was achieved by changingthe moving of the center of the elliptic cord frommove = 0.08;tomove = 0.16;Also for this case the pressure field was identical, with some variations in theflow profiles.27.3.5 Example 4. Cord with Syrinx.Syrinxes expand the cord so that it occupies more space of the spinal SAS. Increasingthe cord radius from 4 mm to 5 mm 2 decreases the cross sectional areabyalmostonethirdto0.64 cm 2 . TheresultingflowisshowninFigure27.8. Apartfrom the increased velocities, we see bidirectional flow already at t = 0.18 and at236

Susanne Hentschel, Svein Linge, Emil Alf Løvgren and Kent-Andre MardalFigure27.6: Case: Ellipticcord. Thevelocityinz-directionforthenon-symmetricpulse at the time steps t = 0.07s, 0.18s, 0.25s..Figure 27.7: Case: Translated elliptic cord. The velocity in z-direction for thenon-symmetric pulse at the time steps t = 0.07s, 0.18s, 0.25s..Problem D 3 in cm v max 4 in cm/s Re WeExample 1 0.54 2.3 177 17Example 2 0.54 0.92 70 17Example 4 0.45 3.2 205 14Table 27.3: Characteristic values for the examples 1, 2 and 3.237

Cerebrospinal Fluid FlowFigure 27.8: Case: Enlarged cord diameter. The velocity in z-direction for thenon-symmetric pulse at the time steps t = 0.07s, 0.18s, 0.25s.t=0.25 as before. The fact that diastolic back flow is visible at t = 0.18, showsthat the pulse with its increased amplitude travels faster.Comparing Reynholds and Womersly numbers shows a clear differene for theabove described examples 1, 2 and 3. Example 2 is marked by a clearly lowermaximum velocity at inflow and outflow boundary that leads to a rather lowReynholdsnumber. Duetothedifferentinflowandoutflowarea, Example 4hasalowerWomerley number, leading to an elevated maximum velocity at the boundaryand clearly increased Reynholds number. These numbers help to quantifythe changes introduced by variations in the model. For the chosen model, theshape of the pulse function at the boundary condition as well as the cross sectionalarea have great influence on the simulation results. As earlier shown byLoth et al. [?], altering the shape of the cross sections does not seem to influencethe flow greatly.2 which equals to set the variables c and d in the geo-file to 0.5238

CHAPTER28Turbulent Flow and Fluid–Structure Interaction withUnicornBy Johan Hoffman, Johan Jansson, Niclas Jansson, Claes Johnson and MurtazoNazarovChapter ref: [hoffman-1]28.1 IntroductionFormanyproblemsinvolvingafluidandastructure,decouplingthecomputationof the two is not possible for accurate modeling of the phenomenon at hand, insteadthe full fluid-structure interaction (FSI) problem has to be solved togetheras a coupled problem. This includes a multitude of important problems in biology,medicine and industry, such as the simulation of insect or bird flight, thehuman cardiovascular and respiratory systems, the human speech organ, thepaper making process, acoustic noise generation in exhaust systems, airplanewingflutter,windinducedvibrationsinbridgesandwaveloadsonoffshorestructures.Common for many of these problems is that for various reasons they arevery hard or impossible to investigate experimentally, and thus reliable computationalsimulation would open up for detailed study and new insights, as wellas for new important design tools for construction.Computational methods used today are characterized by a high computationalcost, and a lack of generality and reliability. In particular, major openchallenges of computational FSI include: (i) robustness of the fluid- structurecoupling, (ii) for high Reynolds numbers the computation of turbulent fluid flow,239

Turbulent Flow and Fluid–Structure Interaction with Unicornand (iii) efficiency and reliability of the computations in the form of adaptivemethods and quantitative error estimation.The FEniCS project aims towards the goals of generality, efficiency, and simplicity,concerning mathematical methodology, implementation, and application.The Unicorn project is a realization of this effort in the field of continuum mechanics,thatwehereexposetoarangeofchallengingproblemsthattraditionallydemand a number of specialized methods and codes, including some problemswhich have been considered unsolvable with state of the art methods. The basisof Unicorn is an adaptive finite element method and a unified continuum formulation,which offer new possibilities for mathematical modeling of high Reynoldsnumber turbulent flow, gas dynamics and fluid-structure interaction.Unicorn, which is based on the DOLFIN/FFC/FIAT suite, is today central ina number of applied research projects, characterized by large problems, complexgeometry and constitutive models, and a need for fast results with quantitativeerror control. We here present some key elementsof Unicorn and the underlyingtheory, and illustrate how this opens for a number of breakthroughs in appliedresearch.28.2 Continuum modelsContinuum mechanics is based on conservation laws for mass, momentum andenergy, together with constitutive laws for the stresses. A Newtonian fluid ischaracterized by a linear relation between the viscous stress and the strain, togetherwith a fluid pressure, resulting in the Navier-Stokes equations. Manycommon fluids, including water and air at subsonic velocities, can be modeledas incompressible fluids, where the pressure acts as a Langrangian multiplierenforcing a divergence free velocity. In models of gas dynamics the pressure isgiven from the internal energy, with an ideal gas corresponding to a linear relation.Solids and non-Newtonian fluids can be described by arbitrary complexlaws relating the stress to displacements, strains and internal energies.Newtonian fluids are characterized by two important non-dimensional numbers:the Reynolds number Re, measuring the importance of viscous effects, andthe Mach number M, measuring the compressibility effects by relating the fluidvelocity to the speed of sound. High Re flow is characterized by partly turbulentflow,andhigh M flowby shocks andcontact discontinuities, all phenomenaassociatedwith complex flow on a range of scales. The Euler equations correspondsto the limit of inviscid flow where Re → ∞, and incompressible flow correspondsto the limit of M → 0.240

Johan Hoffman, Johan Jansson, Niclas Jansson, Claes Johnson and MurtazoNazarov28.3 Mathematical frameworkThe mathematical theoryfor the Navier-Stokes equations is incomplete, withoutany proof of existence or uniqueness, formulated as one of the Clay Institute $1million problems. What is available if the proof of existence of weak solutions byLeray from 1934, but this proof does not extend to the inviscid case of the Eulerequations. No general uniqueness result is available for weal solutions, whichlimits the usefulness of the concept.In [] a new mathematical framework of weak (output) uniqueness is introduced,characterizing well-posedness of weak solutions with respect to functionalsM of the solution u, thereby circumventing the problem of non-existence ofclassical solutions. This framework extends to the Euler equations, and also tocompressible flow, where the basic result takes the form|M(u) − M(U)| ≤ S(‖R(u)‖ −1 + ‖R(U)‖ −1 ) (28.1)with ‖ · ‖ −1 a weak norm measuring residuals R of two weak solutions u and U,and with S a stability factor given by a duality argument connecting local errorsto output errors in M.28.4 Computational methodComputational methods in fluid mechanics are typically very specialized; for acertain range of Re or M, or for a particular class of geometries. In particular,there is a wide range of turbulence models and shock capturing techniques.The goal of Unicorn is to design one method with one implementation, capableof modeling general geometry and the whole range of parameters Re and M.The mathematical framework of well-posedness allows for a general foundationfor Newtonian fluid mechanics, and the General Galerkin (G2) finite elementmethod offers a robust algorithm to compute weak solutions [?].Adaptive G2 methods are based on a posteriori error estimates of the form:|M(u) − M(U)| ≤ ∑ KE K (28.2)with E K a local error indicator for cell K.Parallel mesh refinement...[Unicorn implementation]28.5 Boundary conditions1/2 pagefriction bc, turbulent boundary conditions[Unicorn implementation]241

Turbulent Flow and Fluid–Structure Interaction with Unicorn28.6 Geometry modeling1/2 pageprojection to exact geometry[Unicorn implementation]28.7 Fluid-structure interaction1 pageChallenges: stablity coupling, weak strong, monolithicdifferent software, methodsmesh algorithms, smoothing, MadlibUnified continuum fluid-structure interaction[Unicorn implementation]28.8 Applications28.8.1 Turbulent flow separation1 pagesdrag crisis, cylinders and spheres, etc.28.8.2 Flight aerodynamics2 pagenaca aoa28.8.3 Vehicle aerodynamics1 pageVolvo28.8.4 Biomedical flow1 pageALE heart model28.8.5 Aeroacoustics1 pageSwenox mixer - aerodynamic sources242

Johan Hoffman, Johan Jansson, Niclas Jansson, Claes Johnson and MurtazoNazarov28.8.6 Gas flow2 pagescompressible flow28.9 References1 page243

CHAPTER29Fluid–Structure Interaction using Nitsche’s MethodBy Kristoffer Selim and Anders LoggChapter ref: [selim]In this study, we present a 2D fluid–structure interaction (FSI) simulation ofa channel flow containing an elastic valve that may undergo large deformations.The FSI occurs when the fluid interacts with the solid structure of the valve,exerting pressure that causes deformation of the valve and, thus, alters the flowof the fluid itself. To describe the equations governing the fluid flow and theelastic valve, two separate meshes are used and Nitsche’s method is used tocouple the equations on the two meshes. The method is based on continuouspiecewise approximations on each mesh with weak enforcement of the propercontinuity at the interface defined by the boundary of one of the overlappingmeshes.245

CHAPTER30Improved Boussinesq Equations for Surface WaterWavesBy N. Lopes, P. Pereira and L. TrabuchoChapter ref: [lopes]◮ Editor note: Move macros to common macros after deciding what to do about boldfonts.◮ Editor note: List authors with full namesThe main motivation of this work is the implementation of a general solverfor some of the improved Boussinesq models. Here, we use the second ordermodel proposed by Zhao et al. [?] to investigate the behaviour of surface waterwaves. Some effects like surface tension, dissipation and wave generation bynatural phenomena or external physical mechanisms are also included. As aconsequence, some modified dispersion relations are derived for this extendedmodel.30.1 IntroductionThe FEniCS project, via DOLFIN and FFC, provides a good support for theimplementationoflarge scaleindustrial models. Weimplementasolverforsomeof the Boussinesq type systems to model the evolution of surface water waves ina variable depth seabed. This type of models is used, for instance, in harboursimulation 1 , tsunamigeneration andpropagation aswellasin coastal dynamics.1 See Fig. 30.1 for an example of a standard harbour.247

Improved Boussinesq Equations for Surface Water WavesFigure 30.1: Nazaré’s harbour, Portugal.◮ Editor note: Need to get permission for this figure!There are several Boussinesq models and some of the most widely used arethose based on the wave Elevation and horizontal Velocities formulation (BEV)(see, e.g., [?], [?]).In the next section the governing equations for surface water waves are presented.From these equations different types of models can be derived. We consideronly the wave Elevation and velocity Potential (BEP) formulation. Thus,the number of system equations is reduced when compared to the BEV models.Two different types of BEP models are taken into account:i) a standard sixth-order model;ii) the second-order model proposed by Zhao, Teng and Cheng (ZTC) (cf. [?]).We use the sixth-order model to illustrate a standard technique in order to derivea Boussinesq-type model. In the subsequent sections, only the ZTC modelis considered. Note that these two models are complemented with some extraterms, due to the inclusion of effects like dissipation, surface tension and wavegeneration by moving an impermeable bottom or using a source function.An important characteristic of the modified ZTC model, including dissipativeeffects, is presented in the third section, namely, the dispersion relation.In the fourth and fifth sections, we describe several types of wave generation,absorption and reflection mechanisms. Initial conditions for a solitary wave and248

N. Lopes, P. Pereira and L. Trabuchoa periodic wave induced by Dirichlet boundary conditions are also presented.Moreover, we complement the ZTC model using a source function to generatesurface water waves, as proposed in [?]. Total reflective walls are modelled bystandard zero Neumann conditions for the surface elevation and velocity potential.The wave energy absorption is simulated using sponge layers.The following section is dedicated to the numerical methods used in the discretizationof the variational formulation. The discretization of the spatial variablesis accomplished with low order Lagrange elements whereas the time integrationis implemented using Runge-Kutta and Predictor-Corrector algorithms.In the seventh section, the ZTC numerical model is used to simulate the evolutionof a periodic wave in an harbour geometry like that one represented inFig. 30.1.30.2 Model derivationAs usual we consider the following set of equations for the irrotational flow of anincompressible and inviscid fluid:⎧( )∂uP⎪⎨ ∂t + (u · ∇ xyz)u = −∇ xyzρ + g z ,(30.1)∇ xyz × u = 0,⎪⎩∇ xyz · u = 0,where uisthevelocityvectorfieldofthefluid, P thepressure, g thegravitationalacceleration, [ ρ the mass per unit volume, t the time and the differential operator∂∇ xyz =∂x , ∂∂y , ∂ ]. A Cartesian coordinate system is adopted with the horizontalx and y-axes on the still water plane and the z-axis pointing vertically∂zupwards (see Fig. 30.2). The fluid domain is bounded by the bottom seabed atz = −h(x, y, t) and the free water surface at z = η(x, y, t).◮ Editor note: AL: Need to decide what to do about bold fonts for vectors. I prefer not touse it.In Fig. 30.2, L, A and H are the characteristic wave length, wave amplitudeand depth, respectively. Note that the material time derivative is denoted by D . DtFromtheirrotationalassumption,onecanintroduceavelocitypotentialfunction,φ(x, y, z, t), to obtain Bernoulli’s equation:∂φ∂t + 1 2 ∇ xyzφ · ∇ xyz φ + P ρ+ g z = f(t), (30.2)where f(t) stands for an arbitrary function of integration. Note that one canremove f(t) from equation (30.2) if φ is redefined by φ + ∫ f(t) dt. From the incompressibilitycondition (see (30.1) 3 ) the velocity potential satisfies Laplace’s249

Improved Boussinesq Equations for Surface Water Waveszz = η(x, y, t)LAD(z − η(x, y, t)) = 0Dtz = −Hoxz = −h(x, y, t)D(z + h(x, y, t)) = 0DtFigure 30.2: Cross-section of the water wave domain.equation:∇ 2 φ + ∂2 φ= 0, (30.3)∂z2 [ ∂where ∇ is the horizontal gradient operator given by ∇ =∂x , ∂ ]. To close this∂yproblem, the following boundary conditions must be satisfied:i) the kinematic boundary condition for the free water surface:∂φ∂z = ∂η + ∇φ · ∇η, z = η; (30.4)∂tii) the kinematic boundary condition for the impermeable sea bottom:∂φ∂z+ (∇φ · ∇h) = −∂h, z = −h; (30.5)∂tiii) the dynamic boundary condition for the free water surface:(∂φ∂t + gη + 1 ( ) ) 2 ∂φ|∇φ| 2 + + D(φ) − W(η) = 0, z = η, (30.6)2 ∂zwhere D(φ) is a dissipative term (see, e.g., the work by Duthyk and Dias [?]). Weassume that this dissipative term is of the following form:D(φ) = ν ∂2 φ∂z 2 , (30.7)with ν = µ/ρ and µ an eddy-viscosity coefficient. Note that a non-dissipativemodel means that there is no energy loss. This is not acceptable from a physicalpoint of view, since any real flow is accompanied by energy dissipation.250

N. Lopes, P. Pereira and L. TrabuchoIn equation (30.6), W is the surface tension term given by:( ( ) ) (2 ( ) )∂η ∂ 2 2η ∂η1 +∂y ∂x + ∂ 2 η1 +2 ∂x ∂y − 2∂η ∂η ∂ 2 η2 ∂x ∂y ∂x∂yW(η) = T, (30.8)(1 + |∇η| 2 ) 3/2where T is the surface tension coefficient.Using Laplace’s equation it is possible to write the dissipative term, mentionedabove, as D(φ) = −ν∇ 2 φ. In addition, the linearization of the surfacetension term results in W(η) = T ∇ 2 η. Throughout the literature, analogousterms were added to the kinematic and dynamic conditions to absorb the waveenergyneartheboundaries. Thesetermsarerelatedwiththespongeordampinglayers. The physical meaning of these terms was explained, for instance, in therecent work of Dias et al. [?]. As we will see later, the dispersion relations canbe modified using these extra terms. The surface tension effects are importantif short waves are considered. In general this is not the case. In fact, the longwave assumption is made to derive these extended models. We refer the worksby Wangetal. [?] aswellas Dash and Daripa [?], which included surface tensioneffects in the KdV (Korteweg-de Vries) and Boussinesq equations. On the otherhand, it is worth to mention that one of the main goals of the scientific researchon Boussinesq wave models is the improvement of the range of applicability interms of the water-depth/wave-length relationship. A more detailed descriptionof the above equations is found in the G. B. Whitham’s reference book on waves[?], or in the more recent book by R. S. Johnson [?].30.2.1 Standard modelsIn this subsection, we present a generic Boussinesq system using the velocitypotential formulation. To transform equations (30.2)-(30.8) in a dimensionlessform, the following scales are introduced:(x ′ , y ′ ) = 1 L (x, y), z′ = z H , t′ = t√ gHL , η′ = η A , φ′ = HφAL √ gH , h′ = h H ,together with the small parameters(30.9)µ = H L , ε = A H . (30.10)In the last equation, µ is usually called the long wave parameter and ε the smallamplitude wave parameter. Note that ε is related with the nonlinear terms andµ with the dispersive terms. For simplicity, in what follows, we drop the primenotation.TheBoussinesqapproach consistsinreducinga3Dproblemtoa2Done. ThismaybeaccomplishedbyexpandingthevelocitypotentialinaTaylorpowerseries251

Improved Boussinesq Equations for Surface Water Wavesin terms of z. Using Laplace’s equation, in a dimensionless form, one can obtainthe following expression for the velocity potential:+∞∑)φ(x, y, z, t) =((−1) n z2n(2n)! µ2n ∇ 2n φ 0 (x, y, t) + (−1) n z 2n+1(2n + 1)! µ2n ∇ 2n φ 1 (x, y, t) ,withn=0(30.11)φ 0 = φ | z=0 ,( ) ∂φφ 1 = | z=0 .∂z(30.12)Fromasymptoticexpansions, successive approximation techniquesandthekinematicboundary condition for the sea bottom, it is possible to write φ 1 in termsof φ 0 (cf. [?], [?]). In this work, without loss of generality, we assume that thedispersive and nonlinear terms are related by the following equation:Note that the Ursell number is defined by U r = ε µ 2 .ε= O(1). (30.13)µ2A sixth-order model is obtained if φ 1 is expanded in terms of φ 0 and all termsupto O(µ 8 ) are retained. Thus, theasymptotic kinematic and dynamic boundaryconditions for the free water surface are rewritten as follows 2 :⎧∂η∂t + ε∇ · (η∇φ 0) − 1 µ 2φ 1 + ε22 ∇ · (η2 ∇φ 1 ) = O(µ 6 ),⎪⎨⎪⎩where φ 1 is given by:∂φ 0∂t + εη∂φ 1∂t + η + ε 2 |∇φ 0| 2 + ε 2 ∇φ 0 · η∇φ 1 −−ε 2 η∇ 2 φ 0 φ 1 + ε2µ 2φ2 1 + D(φ 0 , φ 1 ) − W(η) = O(µ 6 ),φ 1 = −µ 2 ∇ · (h∇φ 0 ) + µ46 ∇ · (h 3 ∇ 3 φ 0)−µ 42 ∇ · (h 2 ∇ 2 · (h∇φ 0 ) ) −− µ6120 ∇ · (h 5 ∇ 5 φ 0)+µ 624 ∇ · (h 4 ∇ 4 · (h∇φ 0 ) ) + µ6− µ64 ∇ · (h 2 ∇ 2 · (h 2 ∇ 2 · (h∇φ 0 ) )) − µ2ε)− µ2+ µ2ε(µ 424 ∇ · h 4 ∇ 3∂h∂tTo obtain equation (30.15), we assume that ∂h∂t2 Note that D and W are, now, dimensionless functions.ε12 ∇ · (h 2 ∇ 2 · (h )) 3 ∇ 3 φ 0 −(µ 22 ∇ · h 2 ∇ ∂h )+∂t∂h∂t − µ2ε( (µ 44 ∇ · h 2 ∇ 2 h 2 ∇ ∂h∂t= O(ε) (cf. [?]).(30.14)))+ O(µ 8 ). (30.15)252

N. Lopes, P. Pereira and L. Trabucho30.2.2 Second-order modelThe low order equations are obtained, essentially, via the slowly varying bottomassumption. Inparticular, only O(h, ∇h)termsare retained. Also, onlylow ordernonlinear terms O(ε) are admitted. In fact the modified ZTC model is writtenretaining only O(ε, µ 4 ) terms.Under these conditions, (30.14) and (30.15) lead to:⎧∂η⎪⎨ ∂t + ε∇ · (η∇φ 0) − 1 µ 2φ 1 = O(µ 6 ),and⎪⎩∂φ 0∂t + η + ε 2 |∇φ 0| 2 − νε∇ 2 φ 0 − µ 2 T ∇ 2 η = O(µ 6 )φ 1 = −µ 2 ∇ · (h∇φ 0 ) + µ46 ∇ · (h 3 ∇ 3 φ 0)−µ 42 ∇ · (h 2 ∇ 2 · (h∇φ 0 ) ) −− 2µ615 h5 ∇ 6 φ 0 − 2µ 6 h 4 ∇h · ∇ 5 φ 0 − µ2ε(30.16)∂h∂t + O(µ8 ). (30.17)Thus, these extended equations, in terms of the dimensional variables, are writtenas follows:⎧∂η⎪⎨ ∂t + ∇ · [(h + η)∇Φ] − 1 2 ∇ · [h2 ∇ ∂η∂t ] + 1 6 h2 ∇ 2∂η∂t − 115 ∇ · [h∇(h∂η )] = −∂h∂t ∂t ,⎪⎩∂Φ∂t + 1 2 |∇Φ|2 + gη − 1 15 gh∇ · (h∇η) − ν∇2 Φ − gT ∇ 2 η = 0,where Φ is the transformed velocity potential given by:(30.18)Φ = φ 0 + h 15 ∇ · (h∇φ 0). (30.19)The transformed velocity potential is used with two main consequences (cf. [?]):i) the spatial derivation order is reduced to the second order;ii) linear dispersion characteristics, analogous to the fourth-order BEP modelproposed by Liu and Woo [?] and the third-order BEV model developed byNwogu [?], are obtained.30.3 Linear dispersion relationOne of the most important properties of a water wave model is described by thelinear dispersion relation. From this relation we can deduce the phase velocity,253

Improved Boussinesq Equations for Surface Water Wavesgroup velocity and the linear shoaling. The dispersion relation provides a goodmethod to characterize the linear properties of a wave model. This is achievedusing the linear wave theory of Airy.In this section we follow the work by Duthyk and Dias [?]. Moreover, wepresent a generalized version of the dispersion relation for the ZTC model withthe dissipative term mentioned above. One can also include other dampingterms, which are usually used in the sponge layers.For simplicity, a 1D-Horizontal model is considered. To obtain the dispersionrelation, a standard test wave is assumed:⎧⎪⎨ η(x, t) = a e i(kx−ωt) ,(30.20)⎪⎩Φ(x, t) = −bie i(kx−ωt) ,where a is the wave amplitude, b the potential magnitude, k = 2π/L the wavenumber and ω the angular frequency. This wave, described by equations (30.20),is the solution of the linearized ZTC model, with a constant depth bottom and anextra dissipative term, if the following equation is satisfied:ω 2 − ghk21 + (1/15)(kh)2+ iνωk 2 = 0. (30.21)1 + 2/5(kh) 2Using Padé’s [2,2] approximant, the dispersion relation given by last equation isaccurate up to O((kh) 4 ) or O(µ 4 ) when compared with the following equation:ω 2 − ghk 2tanh(kh)kh+ iνωk 2 = 0. (30.22)In fact, equation (30.22) is the dispersion relation of the full linear problem.From (30.21), the phase velocity, C = w , for this dissipative ZTC model iskgiven by:√C = − iνk ( ) 2 νk2 ± − + gh (1 + 1/15(kh)2 )2 (1 + 2/5(kh) 2 ) . (30.23)(In Fig. 30.3, we can see the positive real part of C/ √ )gh as a function of kh forthe following models: full linear theory (FL), Zhao et al. (ZTC), full linear theorywith a dissipative model (FL D) and the improved ZTC model with the dissipativeterm (ZTC D). From Fig. 30.3, one can also see that these two dissipativemodels ( admit critical wave numbers k 1 and k 2 , such that the positive part ofRe C/ √ )gh is zero for k ≥ k 1 and k ≥ k 2 . To avoid some numerical instabilitiesone can optimize the ν values in order to reduce the short waves propagation.In general, to improve the dispersion relation one can also use other transformationslike (30.19), or evaluate the velocity potential at z = −σh (σ ∈ [0, 1])instead of z = 0 (cf. [?], [?] and [?]).254

N. Lopes, P. Pereira and L. Trabucho1.00.8FLZTCFL_DZTC_DRe( C/ gh )0.60.40.20.00 5 10 15 20kh(Figure 30.3: Positive part of Re C/ √ )gh for several models.30.4 Wave generationIn this section some of the physical mechanisms to induce surface water wavesare presented. We note that the moving bottom approach is useful for wave generationdue to seismic activities. However, some physical applications are associatedwith other wave generation mechanisms. For simplicity, we only considermechanisms to generate surface water waves along the x direction.30.4.1 Initial conditionThe simplest way of inducing a wave into a certain domain is to consider anappropriate initial condition. An usefulandtypical benchmark case is toassumea solitary wave given by:η(x, t) = a 1 sech 2 (kx − ωt) + a 2 sech 4 (kx − ωt), (30.24)u(x, t) = a 3 sech 2 (kx − ωt), (30.25)wheretheparameters a 1 and a 2 arethewaveamplitudesand a 3 isthemagnitudeof the velocity in the x direction. As we use a potential formulation, Φ is givenby:Φ(x, t) = −2a 3 e 2ωtk (e 2ωt + e 2kx ) + K 1(t), (30.26)255

Improved Boussinesq Equations for Surface Water Waveswhere K 1 (t) is a time-dependent function of integration.In [?] and [?] the above solitary wave was presented as a solution of the extendedNwogu’s Boussinesq model.30.4.2 Incident waveFor time-dependent wave generation, it is possible to consider waves inducedby a boundary condition. This requires that the wave surface elevation and thevelocity potential must satisfy appropriated boundary conditions, e.g., Dirichletor Neumann conditions.The simplest case is to consider a periodic wave given by:η(x, t) = a sin(kx − ωt) (30.27)Φ(x, t) = − c k cos(kx − ωt) + K 2(t), (30.28)where c isthewave velocity magnitude and K 2 (t) isatime-dependent functionofintegration. This function K 2 (t) must satisfy the initial condition of the problem.In equations (30.27) as well as (30.28), one can note that the parameters a, c, kand ω are not all arbitrary, since they are related by the dispersion relation. Onecan also consider the superposition of water waves as solutions of the full linearproblem with a constant depth.30.4.3 Source functionIntheworkbyWeietal. [?],asourcefunctionforthegenerationofsurfacewaterwaves was derived. This source function was obtained, using Fourier transformand Green’s functions, to solve the linearized and non homogeneousequations ofthe Peregrine [?] and Nwogu’s [?] models. This mathematical procedure can alsobe adapted here to deduce the source function.WeconsideramonochromaticGaussianwavegeneratedbythefollowingsourcefunction:f(x, t) = D ∗ exp(−β(x − x s ) 2 ) cos(ωt), (30.29)with D ∗ given by:D ∗ =√ βω √ k2a exp(π 4β ) 215 h3 k 3 g. (30.30)In the above expressions x s is the center line of the source function and β is aparameter associated with the width of the generation band (cf. [?]).256

N. Lopes, P. Pereira and L. Trabucho30.5 Reflective walls and sponge layersBesides the incident wave boundaries where the wave profiles are given, onemust close the system with appropriate boundary conditions. We consider twomore types of boundaries:i) full reflective boundaries;ii) partial reflective or absorbing boundaries.The first case is modelled by the following equations:∂Φ∂n = 0,∂η∂n= 0, (30.31)where n is the outward unit vector normal to the computational domain Ω. Wedenote Γ as the boundary of Ω.Note that in the finite element formulation, the full reflective boundaries(equations (30.31)) are integrated by considering zero Neumann-type boundaryconditions.Coupling the reflective case and an extra artificial layer, often called spongeordampinglayer, wecan simulatepartial reflective orfullabsorbing boundaries.Inthisway,thereflectedenergycanbecontrolled. Moreover,onecanpreventunwantedwave reflections and avoid complex wave interactions. It is also possibleto simulate effects like energy dissipation by wave breaking.In fact, a sponge layer can be defined as a subset L of Ω where some extraviscosity term is added. As mentioned above, the system of equations can incorporateseveral extra damping terms, like that one provided by the inclusion of adissipative model. Thus, the viscosity coefficient ν is described by a function ofthe form:⎧0, (x, y) ∉ L,⎪⎨(ν(x, y) = d(x, y)exp⎪⎩ n 1) n2− 1d L, (x, y) ∈ L,exp(1) − 1(30.32)where n 1 and n 2 are, in general, experimental parameters, d L is the sponge-layerdiameter and d stands for the distance function between a point (x, y) and theintersection of Γ and the boundary of L (see, e.g., [?] pag. 79).30.6 Numerical MethodsWe start this section by noting that a detailed description of the implementednumerical methods referred bellow can be found in the work of N. Lopes [?].257

Improved Boussinesq Equations for Surface Water WavesFor simplicity, we only consider the second-order system described by equations(30.18) restricted to a stationary bottom and without dissipative, surfacetension or extra source terms.The model variational formulation is written as follows:∫∂ηΩ ∂t ϑ 1 dxdy + 2 ∫ ( ) ∂ηh 2 ∇ ∇ϑ 1 dxdy − 1 ∫ ( ) ∂ηh(∇h)∇ ϑ 1 dxdy+5 Ω ∂t3 Ω ∂t+ 1 ∫h∇h ∂η ∫∫15 Ω ∂t ∇ϑ ∂Φ1 dxdy +Ω ∂t ϑ 2 dxdy = (h + η)∇Φ∇ϑ 1 dxdy+Ω+ 2 ∫h 2 ∂ ( ) ∂ηϑ 1 dΓ − 1 ∫h ∂h ∫∂η5 Γ ∂t ∂n 15 Γ ∂n ∂t ϑ 1 dΓ − (h + η) ∂ΦΓ ∂n ϑ 1 dΓ−− 1 ∫∫|∇Φ| 2 ϑ 2 dxdy − g ηϑ 2 dxdy − g ∫h 2 ∇η∇ϑ 2 dxdy−2 ΩΩ 15 Ω− g ∫h(∇h)(∇η)ϑ 2 dxdy + g ∫h 2 ∂η15 Ω 15 Γ ∂n ϑ 2 dΓ,(30.33)where the unknown functions η and Φ are the surface elevation and the transformedvelocity potential, whereas ϑ 1 and ϑ 2 are the test functions defined inappropriate spaces.The spatial discretization of this equation is implemented using low orderLagrange finite elements. In addition, the numerical implementation of (30.33)is accomplished using FFC.We use a predictor-corrector scheme with an initialization provided by theRunge-Kutta method for the time integration. Note that the discretization ofequation (30.33) can be written as follows:MẎ = f(t, Y ), (30.34)( ∂ηwhere Ẏ and Y refer to ∂t , ∂Φ )and to (η, Φ), respectively. The known vector f∂tis related with the right-hand side of (30.33) and M is the coefficient matrix. Inthis way, the fourth order Adams-Bashforth-Moulton method can be written asfollows:⎧MY ⎪⎨(0)n+1 = MY n + ∆t24 [55f(t n, Y n ) − 59f(t n−1 , Y n−1 ) + 37f(t n−2 , Y n−2 ) − 9f(t n−3 , Y n−3 )]⎪⎩ MY (1)n+1 = MY n + ∆t24 [9f(t n+1, Y (0)n+1 ) + 19f(t n, Y n ) − 5f(t n−1 , Y n−1 ) + f(t n−2 , Y n−2 )](30.35)where ∆t is the time step, t n = n∆t (n ∈ N) and Y n = (η, Φ) evaluated at t n . Thepredicted and corrected values of Y n are denoted by Y n (0) and Y n(1) , respectively.The corrector-step equation ((30.35) 2 ) can be iterated as function of a predefinederror between consecutive time steps. For more details see, e.g., [?] or [?].,258

N. Lopes, P. Pereira and L. Trabucho30.7 Numerical ApplicationsIn this section, we present some numerical results about the propagation of surfacewater waves in an harbour with a geometry similar to that one of Fig. 30.1.The colour scale used in all figures in this section is presented in Fig. 30.4.A schematic description of the fluid domain, namely the bottom profile and thesponge layer can be seen in Figs. 30.5 and 30.6, respectively. Note that a piecewiselinear bathymetry is considered. A sponge layer is used to absorb the waveenergy at the outflow region and to avoid strong interaction between incidentand reflected waves in the harbour entrance. A monochromatic periodic wave,with an amplitude of 0.3m, is introduced at the indicated boundary (DirichletBC) in Fig. 30.6. This is achieved by considering a periodic Dirichlet boundarycondition as described in the subsection 30.4.2. Full reflective walls are assumedas boundary conditions in all domain boundary except in the harbour entrance.In Fig. 30.7 a wave elevation snapshot is shown. A zoom of the image, whichdescribes the physical potential and velocity vector field in the still water plane,is given in the neighbourhood of the point (x, y) = (150, 0)mat the same time step(see Fig. 30.8). In Fig. 30.9 two time steps of the speed in the still water planeare shown.←− Max←− minFigure 30.4: Scale.Figure 30.5: Impermeable bottom[Max = −5.316m,min = −13.716m].From these numerical results, one can conclude that the interaction betweenincident and reflected waves, near the harbour entrance, can generate wave amplitudesof, approximately, 0.84m. These amplitudes almost take the triple valueof the incident wave amplitude. One can also observe an analogous behaviourfor velocities. The maximum water speed of the incident waves at z=0 is, approximately,1.2m/s whereas, after the interaction, the maximum speed value isalmost 4m/s.259

Improved Boussinesq Equations for Surface Water Wavesy/30m-15 -10 -5 0 5 10 152010(Dirichlet BC) −→500-5-5-10-10-15 -10 -5 0 5 1015x/30mFigure 30.6: Sponge layer (viscosity) [Max ≈ 27m 2 /s,min = 0m 2 /s].Figure 30.7: Surface elevation [Max ≈ 0.84m,min ≈ −0.81m].30.8 Conclusions and future workAsfarasweknow,thefiniteelementmethodisnotoftenappliedinsurfacewaterwavemodelsbasedontheBEPformulation. Ingeneral, finitedifferencemethodsare preferred but are not appropriated for the treatment of complex geometrieslike the ones of harbours, for instance.Infact, thesurface waterwave problems are associated withBoussinesq-typegoverning equations, which require very high order (≥ 6) spatial derivatives or avery high number of equations (≥ 6). A first approach, to the high-order modelsusing discontinuous Galerkin finite element methods, can be found in [?].FromthisworkonecanconcludethattheFEniCS packages,namelyDOLFINand FFC, are appropriated to model surface water waves.We have been developing DOLFWAVE, i.e., a FEniCS based application for260

N. Lopes, P. Pereira and L. TrabuchoFigure 30.8: Velocity vector field at z = 0 and potential magnitude[Max ≈ 36m 2 /s,min ≈ −9m 2 /s].Figure 30.9: Water speed [Max ≈ 4m/s,min = 0m/s].surfacewaterwaves. DOLFWAVEwillbecompatiblewithXd3dpost-processor 3 .The current state of the work, along with several numerical simulations, canbe found at http://ptmat.fc.ul.pt/∼ndl. This package will include somestandard potential models of low order (≤ 4) as well as other new models to besubmitted elsewhere by the authors [?].3 http://www.cmap.polytechnique.fr/∼jouve/xd3d/261

Improved Boussinesq Equations for Surface Water Waves30.9 AcknowledgmentsL. Trabucho work is partially supported by Fundação para a Ciência e Tecnologia,Financiamento Base 2008-ISFL-1-209. N. Lopes work is supported by InstitutoSuperior de Engenharia de Lisboa – ISEL.N. Lopes ∗,⋄,† , e-mail: ndl@ptmat.fc.ul.ptP. Pereira ∗,♯ ,e-mail: ppereira@deq.isel.ipl.ptL. Trabucho ⋄,† ,e-mail: trabucho@ptmat.fc.ul.pt∗ Área Científica de MatemáticaISEL-Instituto Superior de Engenharia de Lisboa,Rua Conselheiro Emídio Navarro, 11959-007 LisboaPortugal⋄ Departamento de MatemáticaFCT-UNL-Faculdade de Ciências e Tecnologia2829-516 CaparicaPortugal† CMAF-Centro de Matemática e Aplicações Fundamentais,Av. Prof. Gama Pinto, 21649-003 Lisboa,Portugal♯ CEFITEC-Centro de Física e Investigação TecnológicaFCT-UNL-Faculdade de Ciências e Tecnologia2829-516 CaparicaPortugal262

CHAPTER31Multiphase Flow Through Porous MediaBy Xuming Shan and Garth N. WellsChapter ref: [shan]Summariseworkonautomatedmodellingformultiphaseflowthroughporousmedia.263

CHAPTER32Computing the Mechanics of the HeartBy Martin S. Alnæs, Kent-Andre Mardal and Joakim SundnesChapter ref: [alnes-4]Heartfailure isoneofthemostcommoncausesofdeathin thewesternworld,and it is therefore of great importance to understand the behaviour of the heartbetter, with the ultimate goal to improve clinical procedures. The heart is acomplicated electro-mechanical pump, and modelling its mechanical behaviourrequires multiscale physics incorporating coupled electro-chemical and mechanicalmodels both at the cell and continuum level. In this chapter we presenta basic implementation of a cardiac electromechanics simulator in the FEniCSframework.265

CHAPTER33Simulation of Ca 2+ Dynamics in the Dyadic CleftBy Johan HakeChapter ref: [hake]33.1 IntroductionFrom when we are children we hear that we should drink milk because it is animportantsourceforcalcium(Ca 2+ ),andthatCa 2+ isvitalforastrongbonestructure.Whatwedonothearasfrequently, isthatCa 2+ isoneofthemostimportantcellular messengers we have in our body [?]. Among other things Ca 2+ controlscell death, neural signaling, secretion of different chemical substances to thebody, and what will be our focus in this chapter, the contraction of cells in theheart.In this chapter I will first discribe a mathematical model that can be used tomodel Ca 2+ dynamics in a small sub cellular domain called the dyadic cleft. Themodel includes Ca 2+ diffusion that is described by an advection-diffusion partialdifferentialequation,anddiscretechanneldynamicsthatisdiscribedbystochasticMarkov models. Numerical methods implemented in PyDOLFINsolving thepartial differental equation will also be presented. A time stepping scheme forsolving the stochastic and deterministic models in a coupled manner will be presentedin the last section. Here will also a solver framwork, diffsim, that implementsthetime stepping scheme together with the other presented numericalmethods be presented.267

Simulation of Ca 2+ Dynamics in the Dyadic Cleft33.2 Biological backgroundIn a healthy heart every beat origins in the sinusoidal node, where pacemakercells triggers an electric signal. This signal propagates through the whole heart,and results in a sudden change in electric potential between the interior andexterior of the heart cells. These two domains are separated by the cell membrane.The difference in electric potential between these domains is called themembrane potential. The sudden change in the membrane potential, an actionpotential, is facilitated by specialized ion channels that reside in the membrane.When an action potential arrives a heart cell it triggers Ca 2+ channel that bringsCa 2+ into the cell. The Ca 2+ then diffuse over a small cleft, called the dyadiccleft, and triggers further Ca 2+ release from an intracellular Ca 2+ storage, thesarcoplasmic reticulum (SR). The Ca 2+ ionsthendiffuse tothemain intracellulardomain of the cell, the cytosole, in which the contractile proteins are situated.The Ca 2+ ions attach to these proteins and triggers contraction. The strength ofthecontractioniscontrolledbythestrengthoftheCa 2+ concentrationincytosole.The contraction is succeeded by a period of relaxation, which is facilitated by theextraction of Ca 2+ from the intracellular space by various proteins.This chain of events is called the Excitation Contraction (EC) coupling [?].Several severe heart diseases can be related to impaired EC coupling and bybroaden the knowledge of the coupling it will also be possible to develop bettertreatments for the diseases. Although the big picture of EC coupling is straightforward to grasp it conceals the nonlinear action of hundreds of different proteinspecies. Computational methods have emerged as a natural complement toexperimental studies in the ongoing strive to better understand the intriguingcoupling, and it is in this light the present study is presented. Here I will focuson the initial phase of the EC coupling, i.e., when Ca 2+ flows into the cell andtriggers further Ca 2+ release.33.3 Mathematical models33.3.1 GeometryThe dyadic cleft is the space between a structure called the t-tubule (TT) andthe SR. TT is a network of pipe-like invaginations of the cell membrane thatperforate the heart cell[?]. Fig. 33.1 A presents a sketch of a small part of asingle TT together with a piece of SR. Here one can see that the junctional SR(jSR) wraps the TT, and the small volume between these structures is the dyadiccleft. The space is not well defined as it is crowded with channel proteins, andthe size of it also varies. In computational studies it is commonly approximatedas a disk or a rectangular slab [?, ?, ?, ?]. In this study I have used a disk, seeFig. 33.1 A. The height of the disk is: h = 12nm, and the radius is: r = 50nm.268

Johan HakeABFigure 33.1: A: A diagram showing the relationship between the TT, the SR andthe jSR. The volume between the flat jSR and the TT is the dyadic cleft. Theblack structures in the cleft are Ryanodine receptors, which are large channelproteins. The figure is from [?]. B: The geometry that is used for the dyadiccleft. The top of the disk are the cell membrane of the SR, or jSR, the bottomthe cell membrane of the TT, and the circumference of the disk is the interfaceto the cytosole. The top of the two small elevations models the mouthsof two ionchannels.Larger radius can be used, e.g., up to 200 nm, but due to numerical limitationsI have to limit the size of the cleft, see below. The diffusion constant of Ca 2+ wasset to σ = 10 5 nm 2 ms −1 [?].33.3.2 Ca 2+ DiffusionElectro-DiffusionThecellmembrane,also called thesarcolemma, consistsofalipidbi-layer, whichproduce an electric potential in the solution. This potential is due to negativelycharged phospholipid head-groups [?, ?]. Therefore in close proximity to the sarcolemma,an electric double layer is produced [?]. I am going to use the Gouy-Chapman method, which defines a diffuse layer to describe this double layer[?]. The theory introduces an advection term to the ordinary diffusion equation,which makes the resulting equation harder to solve.The ion flux in a solution that experience an electric field is governed by theNernst-Planck equation,J = −σ (∇c − 2 cE), (33.1)where σ is the diffusion constant of Ca 2+ , c = c(x, t) is the Ca 2+ concentration,E = E(x) is the non-dimensional electric field (the original electric field scaledwith e/kT) and 2 is the valence of Ca 2+ . Assuming conservation of mass, wearrive at the general advection-diffusion equation,ċ = σ [∆c − ∇ · (2 cE)] . (33.2)269

Simulation of Ca 2+ Dynamics in the Dyadic CleftE follows from the non-dimensional potential, ψ, (the ordinary electric potentialscaled with e/kT) in the solution as,E = −∇ψ. (33.3)The strength of ψ is defined by the amount of charged head-groups in the lipidbi-layers and by the combined screening effect of all ions in the dyadic cleft.Following previous work done by [?] and [?] all other ions in the solution will betreated as being in steady state. The sarcolemma is assumed to be planar andeffectively infinite. This let us use an approximation of the electric potential inthe solution,ψ(z) = ψ 0 exp(−κz). (33.4)Here ψ 0 is the non-dimensional potential at the membrane, κ the inverse Debyelength and z the distance from the sarcolemmar in a perpendicular direction. [?]showed that for a large range of [ Ca 2+] , ψ 0 stayed constant at -2.2 and κ is alsoassumed to be 1 nm.◮ Editor note: Check formula in (33.4). Got undefinedcontrol sequence when comping soneeded to edit.Boundary fluxesThe SR and TT membrane is impermeable for ions, effectively making ∂Ω 0 , inFig. 33.1, a no-flux boundary, giving us,J 0 = 0. (33.5)The main sources for Ca 2+ inflow to the dyadic cleft in our model, is the L-typeCa 2+ channel (LCC). This flux comes in at the ∂Ω [1,2] boundaries, see Fig. 33.1.After entering the cleft the Ca 2+ then diffuse to the RyR situated at the SR membrane,triggering more Ca 2+ influx. This flux will not be included in the simulations,however the stochastic dynamic of the opening of the channel will be included,seeSection33.3.3below. TheCa 2+ thatentersthedyadiccleft diffuse intothe main compartement of cytosole introducing a third flux at the ∂Ω 3 boundary.The LCC is a stochastic channel that are modelled as either open or close.When the channel is open Ca 2+ flows into the cleft. The dynamic that describesthe stochastic behaviour is presented in Section 33.3.3 below. The LCC flux ismodelled as a constant current with amplitude, -0.1 pA, which corresponds tothe amplitude during voltage clamp to 0 mV [?]. The LCC flux is then,{0 : close channelJ [1,2] =−i , : open channel (33.6)2 F Awhere i is the amplitude, 2 the valence of Ca 2+ , F Faraday’s constant and A thearea of the channel. Note that inward current is by convention negative.270

Johan HakeBAFigure 33.2: A: State diagram of the discrete LCC Markov model from [?]. Eachchannelcan be in one ofthe12 states. The transition betweenthestatesare controlledbypropensities.The α,and β arevoltagedependent, γ is [ Ca 2+] dependentand f, a, b, and ω are constant, see [?] for further details. The channels operatesin two modes: Mode normal, represented by the states in the upper row, andMode Ca, represented the states in the lower row. In state 6 and 12 the channelis open, but state 12 is rarely entered as f ′ ≪ f, effectively making Mode Ca aninactivated mode. B: State diagram of a RyR from [?]. The α and γ propensitiesare Ca 2+ dependent, representing the activation and inactivation dependency ofthe cytosolic [ Ca 2+] . The β and δ propensities are constant.◮ Editor note: Placement of figures seems strange.The flux to cytosole is modeled as a concentration dependent flux,J 3 = −σ c − c 0∆s , (33.7)where c is the concentration in the cleft at the boundary, c 0 the concentration incytosole, and ∆sthe distance to the center of the cytosole.33.3.3 Stochastic models of single channelsDiscrete and stochastic Markov chain models are used to describe single channelsdynamics.Suchmodelsconsistsofavariablethatcanbeinacertainnumberof discrete states. The transition between these states is a stochastic event. Thefrequency of these events are determined by the propensity functions associatedto each transition. These functions characterize the probability per unit timethat the corresponding transition event occurs and are dependent on the chosenMarkov chain model and they may vary with time.271

Simulation of Ca 2+ Dynamics in the Dyadic CleftL-type Ca 2+ channelThe LCC is the main source for extracellular Ca 2+ into the cleft. The channelopens when an action potential arrives to the cell and inactivates when singleCa 2+ ions binds to binding sites on the intracellular side of the channel. An LCCis composed by a complex of four transmembrane subunits, which each can bepermissive or non-permissive. For the whole channel to be open, all four subunitsneed to be permissive and the channel then has to undergo a last conformationalchange to an opened state [?]. In this chapter I am going to use aMarkov model of the LCC that incorporates a voltage dependent activation togetherwitha Ca 2+ dependent inactivation [?, ?]. The state diagram of this modelis presented in Fig. 33.2 A. It consists of 12 states, where state 6 and 12 are theonly conducting states, i.e., when the channel is in one of these states it is open.Thetransitionpropensitiesaredefinedbyasetoffunctionsandconstants,whichare all described in [?].Ryanodine ReceptorsRyRs are Ca 2+ specific channels that are situated on the SR in clusters of severalchannels [?, ?]. They open by single Ca 2+ ions attaching to the receptors at thecytosolic side. A modified version of a phenomenological model that mimics thephysiologicalfunctionsoftheRyRs,firstpresentedby[?],willbeused. Themodelconsistsoffourstateswhereoneisconducting, state2, seeFig.33.2B.The αandγ propensities are Ca 2+ dependent, representing the activation and inactivationdependency of cytosolic [ Ca 2+] . The β and δ propensities are constants. Forspecific values for the propensities consider [?].33.4 Numerical methods for the continuous systemThe continuous problem is defined by Eq. (33.2 -33.7) together with an initialcondition. Given a bounded domain Ω ⊂ R 3 with the boundary, ∂Ω, we want tofind c = c(x, t) ∈ R + , for x ∈ Ω and t ∈ R + , such that:{ċ = σ∆c − ∇ · (ca) in Ωσ ∂c∂n = J k on ∂Ω k ,(33.8)with c(·, 0) = c 0 (x). Here a = a(x) = 2σE(x), and, J k and ∂Ω k are the k th flux at thek th boundary, where ⋃ k ∂Ω k = ∂Ω. The J k are given by Eq. (33.5)- (33.7).272

Johan Hake1 from numpy import *2 from dolfin import *34 mesh = Mesh(’cleft_mesh.xml.gz’)56 Vs = FunctionSpace(mesh, "CG", 1)7 Vv = VectorFunctionSpace(mesh, "CG", 1)89 v = TestFunction(Vs)10 u = TrialFunction(Vs)1112 # Defining the electric field-function13 a = Function(Vv,["0.0","0.0","phi_0*valence*kappa*sigma*exp(-kappa*x[2])"],14 {"phi_0":-2.2,"valence":2,"kappa":1,"sigma":1.e5})1516 # Assembly of the K, M and A matrices17 K = assemble(dot(grad(u),grad(v))*dx)18 M = assemble(u*v*dx)19 E = assemble(-u*dot(a,grad(v))*dx)2021 # Collecting face markers from a file, and skip the 0 one22 sub_domains = MeshFunction("uint",mesh,"cleft_mesh_face_markers.xml.gz")23 unique_sub_domains = list(set(sub_domains.values()))24 unique_sub_domains.remove(0)2526 # Assemble matrices and source vectors from exterior facets domains27 domain = MeshFunction("uint",mesh,2)28 F = {};f = {};tmp = K.copy(); tmp.zero()29 for k in unique_sub_domains:30 domain.values()[:] = (sub_domains.values() != k)31 F[k] = assemble(u*v*ds, exterior_facet_domains = domain, \32 tensor = tmp.copy(), reset_tensor = False)33 f[k] = assemble(v*ds, exterior_facet_domains = domain)Figure 33.3: Python code for the assembly of the matrices and vectors fromEq. (33.14)-(33.15).273

Simulation of Ca 2+ Dynamics in the Dyadic Cleft33.4.1 DiscretizationThecontinuousequationsarediscretizedusingtheFiniteelementmethod. Eq.(33.8)is multiplied with a proper test function, v, and we get:∫ ∫ċv dx = [σ∆c − ∇(ca)] v dx, (33.9)Ω Ωand we integrate by part and get:∫ ∫ċv dx = − (σ∇c − ca) ∇v dx + ∑ΩΩk∫∂Ω kJ k v ds k . (33.10)Consider a tetrahedralization of Ω, a mesh, where the domain is divided intodisjoint subdomains, Ω e , elements. A discrete solution c h ∈ V h is defined. HereV h = {φ ∈ H 1 (Ω) : φ ∈ P k (Ω e )∀e}, and P k represents the space of Lagrangepolynomials of order k. The backward Euler method is used to approximate thetime derivative and Eq. (33.10) can now be stated as follows: given c n hV h such that:∫Ωc n+1h− c n h∆t∫(v dx = − σ∇cn+1Ωh− c n+1ha ) · ∇v dx + ∑ k∫∂Ωfind cn+1h∈J k v ds k , ∀v ∈ V h (33.11)where ∆tis thetimestep. The trialfunction c n h (x) isexpressedasaweightedsumof basis functions,N∑c n h(x) = Cj n φ j (x), (33.12)jwhere Cj n arethecoefficients. Lagrange polynomialsoffirstorderisusedforboththe test and the trial function, k = 1, and the number of unknowns, N, will thencoincide with the number of vertices of the mesh.The testfunction v ischosenfrom thesame discrete basis as c n h (x), i.e., v i(x) =φ i (x) ∈ V h , for i ∈ [1 . . .N]. These are used in Eq. (33.11) to produce an algebraicproblem on the following form:1∆t M ( (C n+1 − C n) = −K +E+ ∑ α k F)C k n+1 2j + ∑ c k 0 fk , (33.13)kkwhere C n ∈ R N is the vector of coefficients from the discrete solution c n h (x), αkand c k 0 are constant coefficients ∫related to the flux and kth∫M ij = φ i φ j dx, K ij = ∇φ i · ∇φ j dx,∫ΩΩ(33.14)E ij = aφ i · ∇φ j dx, Fij ∫∂Ω k = φ i φ j ds,kΩ274

Johan Hake1 # Defining the stabilization using local Peclet number2 cppcode = """class Stab: public Function {3 public:4 Function* field; uint _dim; double sigma;5 Stab(const FunctionSpace& V): Function(V)6 {field = 0; sigma=1.0e5;}7 void eval(double* v, const Data& data) const {8 if (!field)9 error("Attach a field function.");10 double field_norm = 0.0; double tau = 0.0;11 double h = data.cell().diameter();12 field->eval(v,data);13 for (uint i = 0;i < geometric_dimension(); ++i)14 field_norm += v[i]*v[i];15 field_norm = sqrt(field_norm);16 double PE = 0.5*field_norm * h/sigma;17 if (PE > DOLFIN_EPS)18 tau = 1/tanh(PE)-1/PE;19 for (uint i = 0;i < geometric_dimension(); ++i)20 v[i] *= 0.5*h*tau/field_norm;}};21 """22 stab = Function(Vv,cppcode); stab.field = a2324 # Assemble the stabilization matrices25 E_stab = assemble(div(a*u)*dot(stab,grad(v))*dx)26 M_stab = assemble(u*dot(stab,grad(v))*dx)2728 # Adding them to the A and M matrices, weighted by the global tau29 tau = 0.28; E.axpy(tau,E_stab); M.axpy(tau,M_stab)Figure 33.4: Python code for the assembly of the SUPG term for the mass andadvection matrices.275

Simulation of Ca 2+ Dynamics in the Dyadic Cleftare the entries in the M, K, E and F k matrices. f k are boundary source vectorscorresponding to the k th boundary. The vector elements are given by:∫fi k = φ i ds. (33.15)∂Ω kThe code for producing the matrices and vectors in Eq. (33.14)-(33.15) is presentedin Fig. 33.3. Note that in the last for loop we iterate over the uniquesubdomains, and set the entries of the MeshFunction domain corresponding tothe k th boundary to 0 and the otherentriesto 1. In this way the same form for theexterior facet domain integrals can be used.The system in Eq. (33.13) is linear and the matrices and vectors can be preassembled.Thisallowsforaflexiblesystemwhereboundarymatricesandboundarysource vectors can be added, when a channel opens. ∆tcan also straightforwardlybe decreased when such an event occurs. This adaptation in time is crucialbothforthenumericalstabilityofthelinearsystem.∆tcanthenbeincreasedafter each time step as the demand on the size of ∆tfalls. The sparse linear systemis solved using the PETSclinear algebra backend[?] in PyDOLFINtogetherwith the Bi-CGSTAB iterative solver [?], and the BoomerAMG preconditionersfrom hypre[?]. In Fig. 33.5 a script is presented that solves the algebraic systemfrom Eq. (33.13) together with a crude time stepping scheme forthe opening andclosing of the included LCC flux.33.4.2 StabilizationIt turns out that the algebraic system in Eq. (33.13) is numerically unstable forphysiological relevant values of a, see Section 33.3.2. This is due to the transportterm introduced by A ij from Eq. (33.14). I have chosen to stabilize thesystem using the Streamline upwind Petrov-Galerkin (SUPG) method [?]. Thismethodadds a discontinuous streamline upwindcontribution to the testfunctionin Eq. (33.9),v ′ = v + s, where s = τ hτ la · ∇v. (33.16)2‖a‖Here τ ∈ [0, 1] is problem dependent, h = h(x) is the size of the local element ofthe mesh, and τ l = τ l (x), is given by,where PE l is the local Péclet number:τ l = coth(PE l ) − 1/ PE l , (33.17)PE l = ‖a‖h/2σ. (33.18)This form of τ l follows the optimal stabilization from an 1D case[?], other choicesexist. The contribution from the diffusion term of the weak form can be eliminatedby choosing a test function from a first order Lagrange polynomial, as the276

Johan Hake∆operatorwillreducethetrialfunctiontozero. ThePyDOLFINcodethatassemblesthe SUPG part of the problem is presented in Fig. 33.4. In the script twomatrices, E stab and M stab are assembled, which are both added to the correspondingadvectionandmassmatrices Eand Mweightedbytheglobal parametertau.A mesh with finer resolution close to the TT surface, at z = 0 nm, is alsoused to further increase the stability. It is at this point the electric field is atits strongest and it attenuates fast. At z = 3 nm the field is down to 5% of themaximal amplitude, and at z = 5 nm, it is down to 0.7%, reducing the need forhighmeshresolutions. Themeshgenerator tetgenisusedtotoproducemesheswith the needed resolution [?].The global stabilization parameter τ, is problem dependent. To find an optimalτ, for a certain electrical field and mesh, the sytem in Eq. (33.13) is solvedto steady state using only homogeneous Neumann boundary conditions. An homogeneousconcentration of c 0 = 0.1 µM is used as the initial condition. Thenumerical solution is then compared with the analytic solution of the problem.This solution is acquired by setting J = 0 in Eq. (33.1) and solving for the c, withthe following result:c(z) = c b exp(−2ψ(z)). (33.19)Here ψ is given by Eq. (33.4), and c b is the concentration in the bulk, i.e., wherez is large. c b was chosen such that the integral of the analytic solution was equalto c 0 × V, where V is the volume of the domain.◮ Editor note: Check equation (33.19).The error of the numerical solution for different values of τ and for threedifferent mesh resolutions are plotted in Fig. 33.6. The meshes are enumeratedfrom1-3. TheerrorisgiveninanormalizedL2norm. Asexpectedweseethatthemeshwiththefinestresolution produce thesmallest error. Themeshresolutionsare quantified by the number of vertices close to z = 0. In the legend of Fig. 33.6the median of the z distance of all vertices and the total number of vertices ineach mesh is presented. The three meshes were created such that the verticesclosed to z = 0 were forced to be situated at some fixed distances from z = 0.Three numerical and one analytical solution for the three different meshes areplotted in Fig. 33.7- 33.9. The numerical solutions are from simulations usingthree different τ: 0.1, 0.6 and the L2-optimal τ, see Fig. 33.6. The traces in thefigures are all picked from a line going from (0,0,0) to (0,0,12).In Fig. 33.7 the traces from mesh 1 is plotted. Here we see that all numericalsolutions are quite poor for all τ. The solution with τ = 0.10 is unstable asit oscillates and produces negative concentration. The solution with τ = 0.60seems stable but it undershoots the analytic solution at z = 0 with ˜1.7 µM. Thesolution with τ = 0.22 is the L2-optimal solution for mesh 1, and approximatesthe analytic solution at z = 0 well.In Fig. 33.8 the traces from mesh 2 is presented in two plots. The left plot277

Simulation of Ca 2+ Dynamics in the Dyadic Cleft1 # Time parameters2 dt_min = 1.0e-8; dt = dt_min; t = 0; c0 = 0.1; tstop = 1.03 events = [0.2,tstop/2,tstop,tstop]; dt_expand = 2.0;45 # Boundary parameters6 t_channels = {1:[0.2,tstop/2], 2:[tstop/2,tstop]}7 sigma = 1e5; ds = 20; area = 3.1416; Faraday = 0.0965; amp = -0.189 # Initialize the solution Function and the left and right hand side10 u = Function(Vs); x = u.vector()11 x[:] = c0*exp(-a.valence*a.phi_0*exp(-a.kappa*mesh.coordinates()[:,-1]))12 b = Vector(len(x); A = K.copy();1314 solver = KrylovSolver("bicgstab","amg_hypre")15 dolfin_set("Krylov relative tolerance",1e-7)16 dolfin_set("Krylov absolute tolerance",1e-10);1718 plot(u, vmin=0, vmax=4000, interactive=True)19 while t < tstop:20 # Initalize the left and right hand side21 A[:] = K; A *= sigma; A += E; b[:] = 02223 # Adding channel fluxes24 for c in [1,2]:25 if t >= t_channels[c][0] and t < t_channels[c][1]:26 b.axpy(-amp*1e9/(2*Faraday*area),f[c])2728 # Adding cytosole flux at Omega 329 A.axpy(sigma/ds,F[3]); b.axpy(c0*sigma/ds,f[3])3031 # Applying the Backward Euler time discretization32 A *= dt; b *= dt; b += M*x; A += M3334 solver.solve(A,x,b)35 t += dt; print "Ca Concentration solved for t:",t3637 # Handle the next time step38 if t == events[0]:39 dt = dt_min; events.pop(0)40 elif t + dt*dt_expand > events[0]:41 dt = events[0] - t42 else:43 dt *= dt_expand4445 plot(u, vmin=0, vmax=4000)4647 plot(u, vmin=0, vmax=4000, interactive=True)Figure 33.5: Python code for solving the system in Eq. (33.13), using the assembledmatrices from the two former code examples from Fig. 33.3- 33.4.278

Johan HakeFigure 33.6: The figureshows a plot ofthe error (a normalizedL2 norm of the differencebetweenthenumericaland analytical solutions)against the stabilizationparameter τ for3 different mesh resolutions.The mesh resolutionsare given by themedian of the z distanceof all vertices and the totalnumber of vertices inthemesh,seelegend. Weseethattheminimalvaluesof the error for thethree meshes, occur atthree different τ: 0.22,0.28, and 0.38.shows the traces for z < 1.5 nm and the right shows the traces for z > 10.5 nm.In the left plot we see the same tendency as in Fig. 33.7, an overshoot of thesolution with τ = 0.10 and an undershoot for the solution with τ = 0.60. TheL2-optimal solution, the one with τ = 0.28, overshoot the analytic solution forthe shown interval in the left plot, but undershoot for the rest of the trace.In the last figure, Fig. 33.9, traces from mesh 3 is presented. The resultsis also here presented in two plots, corresponding to the same z interval as inFig. 33.8. We seethat thesolution with τ = 0.10 isnot goodin eitherplots. Intheleft plot it clearly overshoots the analytic solution for most of the interval, andthen stays at a lower level than the analytic solution for the rest of the interval.The solution with τ = 0.60 is much better here than in the two previous plots. Itundershoots the analytic solution at z = 0 but stays closer to it for the rest of theinterval than the L2-optimal solution. The L2 norm penalize larger distancesbetween two traces, i.e., weighting the error close to z = 0 more than the rest.The optimal solution meassured in the Max norm is given when τ = 50, resultnot shown.These results tell us that it is difficult to get accurate numerical solutionfor the advection-diffusion problem presented in Eq. (33.8), even with optimalSUPG stabilization for the given mesh resolutions. Using finer mesh close toz = 0 would help, but it will create a larger system. It is interesting to noticethat the L2 optimal solutions is better close to z = 0, than other solutions and279

Simulation of Ca 2+ Dynamics in the Dyadic Cleftthe solution for the largest τ is better than other for z ¿ 2 nm. For a modellerthese constraints are important to know about because the solution at z = 0 andz = 12 nm is the most important, as Ca 2+ interact with other proteins at thesepoints.Iamcombiningasolverofthecontinuousanddeterministicadvection-diffusionequation, Eq. (33.2), and a solver of the discrete and stochastic systems ofMarkov chain models from Section 33.3.3. These systems are two-way coupledas some of the propensities in the Markov chains are dependent on the local[Ca2+ ] and boundary fluxes are turned on or off dependent on what state theMarkov models are in. I have used a hybrid approach similar to the one presentedin [?] to solve this system. Basically this hybrid method consists of amodified Gillespie method [?] to solve the stochastic state transitions, and a finiteelementmethodin space togetherwithabackward Euler methodin time, tosolve the continuous system.33.5 diffsim an event driven simulatorInthe scripts in Fig. 33.3- 33.5 it is shownhowasimple continuoussolver can bebuilt with PyDOLFIN. By pre-assemble the matrices from Eq. (33.14) a flexiblesystem for adding and removing boundary fluxes corresponding to the state ofthe channels is constructed. The script in Fig.33.5 uses fixed time steps for thechannelstates. Thesetimestepstogetherwithan expanding ∆tformasimplistictime stepping scheme that is sufficient to solve the presented example. Howeverit would be difficult to expand it to also incorporate the time stepping involvedwith the solution of stochastic Markov models, and other discrete variables. Forthis I have developed an event driven simulator called diffsim. In the lastsubsections in this chapter I will present the algorithm underlaying the timestepping scheme in diffsim and an example of how one can use diffsim todescribe and solve a model of the dyadic cleft. The diffsim software can befreely downloaded from URL:http://www.fenics.org/wiki/FEniCS_Apps.33.5.1 Stochastic systemThe stochastic evolution of the Markov chain models from Section 33.3.3 is determinedby a modified Gillespie method [?], which resembles the one presentedin [?]. I will not go into detail of the actual method, but rather explain the partof the method that has importance for the overall time stepping algorithm, seebelow.The solution of the included stochastic Markov chain models is stored in astatevector, S.Eachelementin ScorrespondstooneMarkovmodelandthevaluereflects which state each model is in. The transitions between these states aremodelled stochastically and are computed using the modified Gillespie method.280

Johan HakeThis method basically give us which of the states in S changes to what state andwhen. Itisnotall such statetransitionsthat arerelevant forthecontinuoussystem,e.g, a transition between two closed states in the LCC model will not haveany impact on the boundary fluxes, and can be ignored. Only transitions thateither open or close a channel, which is called channel transitions, will be recognized.The modified Gillespie method assume that any continuous variables acertain propensity function is dependenton are constant during a timestep. Theerrordone by assuming this is reducedby taking smaller time stepsright after achannel transition, as the continuous field is changing dramatically during thistime period.33.5.2 Time stepping algorithmTo simplify the presentation of the time stepping algorithm we only consider onecontinuous variable, this could for example be the Ca 2+ field. The frameworkpresented here can be expanded to also handle several continuous variables. Wedefine a base class called DiscreteObject which defines the interface for alldiscrete objects. A key function of a discrete object is to know when its nextevent is due to. The DiscreteObject that has the smallest next event time,gets to define the size of the next ∆t, for which the Ca 2+ field is solved with. Inpythonthis is easily done by making the DiscreteObjects sortable with respectto their next event time. All DiscreteObjects is then collected in a list,discrete objects see Fig. 33.10, and the DiscreteObjectwith the smallestnext event time is then just min(discrete objects).An event from a DiscreteObject that does not have an impact on the continuoussolutionwillbeignored,e.g.,aMarkovchain modeltransition thatisnota channel transition. A transition needs to be realized before we can tell if it is achanneltransition ornot. Thisisdonebysteppingthe DiscreteObject,callingthe objects step() method. If the method returns False it will not affect theCa 2+ field, and we enter the while loop, and a new DiscreteObject is picked,seeFig.33.10. Howeveriftheobjectreturns Truewhenitisstepped,wewillexitthe while loop and continue. Next we have to update the other discrete objectswith the chosen ∆t, solve the Ca 2+ field, broadcast the solution and last but notleast execute the discrete event that is scheduled to happen at ∆t.◮ Editor note: Fix ugly text sticking out in margin.In Fig. 33.11 we show an example of a possible realization of this algorithm.Theexamplestartsatt=2msatthetop-mosttimelinerepresentedbyA,anditincludesthreedifferenttypesofDiscreteObjects: i) DtExpander, ii) StochasticHandlerand iii) TStop. See the legend of the figure for more details.281

Simulation of Ca 2+ Dynamics in the Dyadic Cleft33.5.3 diffsim an examplediffsim is a versatile event driven simulator that incorporates the time steppingalgorithmpresentedintheprevioussectiontogetherwiththeinfrastructuretosolvemodelswithoneormorediffusionaldomains,definedbyacomputationalmesh. Each such domain can have several diffusive ligands. Custom fluxes caneasily be included throughthe framework diffsimgive. The submoduledyadiccleftimplements some published Markov models that can be used to simulatethe stochastic behaviour of a dyad and some convinient boundary fluxes. It alsoimplements the field flux from the lipid bi-layer discussed in Section 33.3.2. InFig. 33.12 runnable script is presented, which simulate the time to release, alsocalled the latency for a dyad. The two Markov models that is presented in section33.3.3ishereusedtomodelthestochasticdynamicsoftheRyRandtheLCC.The simulation is driven by an external dynamic voltage clamp. The data thatdefinesthisisreadin from afile usingutilities fromthe NumPypythonpackages.282

Johan HakeFigure 33.7: The figureshows the concentrationtraces of the numericalsolutions from Mesh 1,see legend of Fig. 33.6,for three different τ togetherwith the analyticsolution. The solutionswere picked from a linegoing between the points(0,0,0) and (0,0,12). Wesee that the solutionwith τ = 0.10 oscillates.The solution withτ = 0.22 was the solutionwith smallest globalerror for this mesh, seeFig 33.6, and the solutionwith τ = 0.60 undershootsthe analytic solutionat z = 0nm with ˜1.7µM.283

Simulation of Ca 2+ Dynamics in the Dyadic CleftFigure 33.8: The figures shows the concentration traces of the numerical solutionsfrom Mesh 2, see legend of Fig. 33.6, for three different τ together with theanalytic solution. The traces in the two panels were picked from a line goingbetween the points (0,0,0) and (0,0,1.5) for the left panel and between (0,0,10.5)and (0,0,12) for the right panel. We see from both panels that the solution withτ = 0.10 give the poorest solution. The solution with τ = 0.28 was the solutionwith smallest global error for this mesh, see Fig 33.6, and this is reflected in thereasonable good fit seen in the left panel, especially at z = 0nm. The solutionwith τ = 0.60 undershoots the analytic solution at z = 0 with ˜1.2 µM. From theright panelwesee thatall numerical solutionsundershootat z = 15nm,and thatthe trace with τ = 0.60 comes closest the analytic solution.284

Johan HakeFigure 33.9: The figures shows the concentration traces of the numerical solutionsfrom Mesh 3, see legend of Fig. 33.6, for three different τ together with theanalytic solution. The traces in the two panels were picked from the same linesas the one in Fig. 33.8. Again we see from both panels that the solution withτ = 0.10 give the poorest solution. The solution with τ = 0.38 was the solutionwith smallest global error for this mesh, see Fig 33.6, and this is reflected in thegood fit seen in the left panel, especially at z = 0nm. The solution with τ = 0.60undershoots the analytic solution at z = 0 with ˜0.7 µM. From the right panelwe see that all numerical solutions undershoot at z = 15nm, and the trace withτ = 0.60 also here comes closest the analytic solution.1 while not stop_sim:2 # The next event3 event = min(discrete_objects)4 dt = event.next_time()56 # Step the event and check result7 while not event.step():8 event = min(discrete_objects)9 dt = event.next_time()1011 # Update the other discrete objects with dt12 for obj in discrete_objects:13 obj.update_time(dt)1415 # Solve the continuous equation16 ca_field.solve(dt)17 ca_field.send()1819 # Distribute the event20 event.send()Figure 33.10: Python-like pseudo code for the time stepping algorithm used inour simulator285

Simulation of Ca 2+ Dynamics in the Dyadic CleftFigure 33.11: Diagram for the time stepping algorithm using 3 discrete objects:DtExpander, StochasticHandler, TStop. The values below the small ticks,corresponds to the time to the next event for each of the discrete objects. Thistime is measured from the last realized event, which is denoted by the thickertick. InAwehaverealizedatimeeventatt=2.0ms. Thenexteventtoberealizedis a stochastic transition, the one with smallest value below the ticks. In B thiseventisrealized,andthe StochasticHandlernowshowanewnexteventtime.Theeventisachanneltransitionforcingthedt,controlledbythe DtExpander,tobeminimized. DtExpandernowhasthesmallestnexteventtime,andisrealizedin C. The channel transition that was realised in B raised the [ Ca 2+] in the cleftwhich in turn increase the Ca 2+ dependent propensity functions in the includedMarkov models. The time to next event time of the StochasticHandler hasthereforebeenupdated,andmovedforwardinC.Alsonotethatthe DtExpanderhas expanded its next event time. In D the stochastic transition is realized andupdated with a new next event time, but it is ignored as it is not a channeltransition. The smallest time step is now the DtExpander, and this is realizedin E. In this example we do not realize the TStop event as it is too far away.286

Johan Hake1 from diffsim import *2 from diffsim.dyadiccleft import *3 from numpy import exp, fromfile45 # Model parameters6 c0_bulk = 0.1; D_Ca = 1.e5; Ds_cyt = 50; phi0 = -2.2; tau = 0.287 AP_offset = 0.1; dV = 0.5, ryr_scale = 100; end_sim_when_opend = True89 # Setting boundary markers10 LCC_markers = range(10,14); RyR_markers = range(100,104); Cyt_marker = 31112 # Add a diffusion domain13 domain = DiffusionDomain("Dyadic_cleft","cleft_mesh_with_RyR.xml.gz")14 c0_vec = c0_bulk*exp(-VALENCE[Ca]*phi0*exp(-domain.mesh().coordinates()[:,-1]))1516 # Add the ligand with fluxes17 ligand = DiffusiveLigand(domain.name(),Ca,c0_vec,D_Ca)18 field = StaticField("Bi_lipid_field",domain.name())19 Ca_cyt = CytosolicStaticFieldFlux(field,Ca,Cyt_marker,c0_bulk,Ds_cyt)2021 # Adding channels with Markov models22 for m in LCC_markers:23 LCCVoltageDepFlux(domain.name(), m, activator=LCCMarkovModel_Greenstein)24 for m in RyR_markers:25 RyRMarkovModel_Stern("RyR_%d"%m, m, end_sim_when_opend)2627 # Adding a dynamic voltage clamp that drives the LCC Markov model28 AP_time = fromfile(’AP_time_steps.txt’,sep=’\n’)29 dvc = DynamicVoltageClamp(AP_time,fromfile(’AP.txt’,sep=’\n’),AP_offset,dV)3031 # Get and set parameters32 params = get_params()3334 params.io.save_data = True35 params.Bi_lipid_field.tau = tau36 params.time.tstop = AP_time[-1] + AP_offset37 params.RyRMarkovChain_Stern.scale = ryr_scale3839 info(str(params))4041 # Run 10 simulations42 data = run_sim(10,"Dyadic_cleft_with_4_RyR_scale")43 mean_release_latency = mean([ run["tstop"] for run in data["time"]])Figure 33.12: An example of how diffsim can be used to simulate the time toRyR release latency, from a small dyad whos domain is defined by the mesh inthe file cleft mesh with RyR.xml.gz.287

CHAPTER34Electromagnetic Waveguide AnalysisBy Evan Lezar and David B. DavidsonChapter ref: [lezar]◮ Editor note: Reduce the number of macros.At their core, Maxwell’s equations are a set of differential equations describingthe interactions between electric and magnetic fields, charges, and currents.These equations provide the tools with which to predict the behaviour of electromagneticphenomena, giving us the ability to use them in a wide variety ofapplications, including communication and power generation. Due to the complexnature of typical problems in these fields, numeric methods such as thefinite element method are often employed to solve them.One of the earliest applications of the finite element method in electromagneticswas in waveguide analysis [?]. Since waveguides are some of the mostcommon structures in microwave engineering, especially in areas where highpower and low loss are essential [?], their analysis is still a topic of much interest.This chapter considers the use of FEniCS in the cutoff and dispersionanalysis of these structures as well as the analysis of waveguide discontinuities.These types of analysis form an important part of the design and optimisation ofwaveguide structures for a particular purpose.The aim of this chapter is to guide the reader through the process followed inimplementing solvers for various electromagnetic problems with both cutoff anddispersion analysis considered in depth. To this end a brief introduction of electromagneticwaveguide theory, the mathematical formulation of these problems,and the specifics of their solution using the finite element method are presentedin 34.1. This lays the groundwork for a discussion of the details pertaining tothe FEniCS implementation of these solvers, covered in 34.2. The translation of289

Electromagnetic Waveguide Analysisthe finite element formulation to FEniCS, as well as some post-processing considerationsare covered. In 34.3 the solution results for three typical waveguideconfigurations are presented and compared to analytical or previously publisheddata. This serves to validate the implementation and illustrates the kinds ofproblems that can be solved. Results for the analysis of H-plane waveguide discontinuitiesare then presented in 34.4 with two test cases being considered.34.1 FormulationAs mentioned, in electromagnetics, the behaviour of the electric and magneticfieldsaredescribedbyMaxwell’sequations[?,?]. Usingthesepartialdifferentialequations, various boundary value problems can be obtained depending on theproblem being solved. In the case of time-harmonic fields, the equation used isthe vector Helmholtz wave equation. If the problem is further restricted to adomain surroundedby perfect electrical ormagneticconductors(as isthe case ingeneral waveguide problems) the wave equation in terms of the electric field, E,can be written as [?]∇ × 1 µ r∇ × E − k 2 o ǫ rE = 0, in Ω, (34.1)subject to the boundary conditionsˆn × E = 0 on Γ e (34.2)ˆn × ∇ × E = 0 on Γ m , (34.3)with Ω representing the interior of the waveguide and Γ e and Γ m electric andmagnetic walls respectively. µ r and ǫ r are the relative magnetic permeabilityand electric permittivity respectively. These are material parameters that maybe position dependent but only the isotropic case is considered here. k o is theoperating wavenumber which is related to the operating frequency (f o ) by theexpressionk o = 2πf oc 0, (34.4)with c 0 the speed of light in free space. This boundary value problem (BVP) canalso be writtenin termsofthemagnetic field[?], but as thediscussions followingare applicable to both formulations this will not be considered here.If the guide is sufficiently long, and the z-axis is chosen parallel to its centralaxis as shown in Figure 34.1, then z-dependence of the electric field can be assumedtobeoftheforme −γz with γ = α+jβ acomplexpropagationconstant[?,?].Making this assumption and splitting the electric field into transverse (E t ) andaxial (ẑE z ) components, results in the following expression for the fieldE(x, y, z) = [E t (x, y) + ẑE z (x, y)]e −γz , (34.5)290

Evan Lezar and David B. DavidsonΓ exzΩyFigure 34.1: A long waveguide with an arbitrary cross-section aligned with thez-axis.with x and y the Cartesian coordinates in the cross-sectional plane of the waveguideand z the coordinate along the length of the waveguide.From (34.5) as well as the BVP described by (34.1), (34.2), and (34.3) it is possibleto obtain the following variational functional found in many computationalelectromagnetic texts [?, ?]F(E) = 1 2∫Ω1µ r(∇ t × E t ) · (∇ t × E t ) − k 2 o ǫ rE t · E t+ 1 µ r(∇ t E z + γE t ) · (∇ t E z + γE t ) − k 2 o ǫ rE z E z dΩ, (34.6)with∇ t = ∂∂xˆx + ∂∂yŷ (34.7)the transverse del operator.A number of other approaches have also been taken to this problem. Some,for instance, involve only nodal based elements; some use the longitudinal fieldsas the working variable, and the problem has also been formulated in terms ofpotentials,ratherthanfields. Agoodsummaryofthesemaybefoundin[?,Chapter9]. The approach used here, involving transverse and longitudinal fields, isprobably the most widely used in practice.34.1.1 Waveguide Cutoff AnalysisOne of the simplest cases to consider, and often a starting point when testing anewfiniteelementimplementation,iswaveguide cutoffanalysis. Whenawaveguideis operating at cutoff, the electric field is uniform along the z-axis whichcorresponds with γ = 0 in (34.5) [?]. Substituting γ = 0 into (34.6) yields the291

Electromagnetic Waveguide Analysisfollowing functionalF(E) = 1 2∫Ω1µ r(∇ t × E t ) · (∇ t × E t ) − k 2 cǫ r E t · E t+ 1 µ r(∇ t E z ) · (∇ t E z ) − k 2 cǫ r E z E z dΩ. (34.8)The symbol for the operating wavenumber k o has been replaced with k c , indicatingthat the quantity of interest is now the cutoff wavenumber. This quantity inaddition to the field distribution at cutoff are of interest in these kinds of problems.Using two dimensional vector basis functions for the discretisation of thetransverse field,andscalar basisfunctionsfortheaxial components,theminimisationof (34.8) is equivalent to solving the following matrix equation[Stt 00 S zz]{ete z}= k 2 c[Ttt 00 T zz]{ete z}, (34.9)which is in the form of a general eigenvalue problem. Here S ss and T ss representsthe stiffness and mass common to finite element literature [?, ?] with thesubscripts tt and zz indicating transverse or axial components respectively. Theentries of the matrices of (34.9) are defined as∫(s tt ) ij =∫(t tt ) ij =∫(s zz ) ij =∫(t zz ) ij =ΩΩΩΩ1µ r(∇ t × N i ) · (∇ t × N j )dΩ, (34.10)ǫ r N i · N j dΩ, (34.11)1µ r(∇ t M i ) · (∇ t M j )dΩ, (34.12)ǫ r M i M j dΩ, (34.13)with ∫ dΩ representing integration over the cross-section of the waveguide andΩN i and M i representing the i th vector and scalar basis functions respectively.Duetotheblock natureofthematricestheeigensystemcan bewrittenastwosmaller systems[ ]{ } [ ]{ }Stt et = k2c,TE Ttt et , (34.14)[ ]{ } [ ]{ }Szz ez = k2c,TM Tzz ez , (34.15)with k c,TE and k c,TM corresponding to the cutoff wavenumbers of the transverseelectric (TE) and transverse magnetic (TM) modes respectively. The eigenvectors({e t } and {e z }) of the systems are the coefficients of the vector and scalarbasis functions, allowing for the calculation of the transverse and axial field distributionsassociated with a waveguide cutoff mode.292

Evan Lezar and David B. Davidson34.1.2 Waveguide Dispersion AnalysisInthecaseofcutoffanalysisdiscussedin34.1.1, oneattemptstoobtainthevalueof ko 2 = k2 c for a given propagation constant γ, namely γ = 0. For most waveguidedesign applications however, k o is specified and the propagation constant is calculatedfrom the resultant eigensystem [?, ?]. This calculation can be simplifiedsomewhat by making the following substitution into (34.6)which results in the modified functionalF(E) = 1 ∫1(∇ t × E t,γ ) · (∇ t × E t,γ ) − k2 µoǫ 2 r E t,γ · E t,γrΩE t,γ = γE t , (34.16)− γ 2 [ 1µ r(∇ t E z + E t,γ ) · (∇ t E z + E t,γ ) − k 2 oǫ r E z E z]dΩ. (34.17)Using the same discretisation as for cutoff analysis discussed in 34.1.1, the matrixequation associated with the solution of the variational problem is given by[ ]{ } [ ] { }Att 0 et= γ 2 Btt B tz et, (34.18)0 0 e z B zt B zz e zwithA tt = S tt − k 2 o T tt, (34.19)B zz = S zz − k 2 oT zz , (34.20)which is in the form of a generalised eigenvalue problem with the eigenvaluescorresponding to the square of the complex propagation constant (γ).The matrices S tt , T tt , S zz , and T zz are identical to those defined in 34.1.1 withentries given by (34.10), (34.11), (34.12), and (34.13) respectively. The entries ofthe other sub-matrices, B tt , B tz , and B zt , are defined by∫1(b tt ) ij = N i · N j dΩ, (34.21)Ω µ∫ r1(b tz ) ij = N i · ∇ t M j dΩ, (34.22)µ r∫(b zt ) ij =ΩΩ1µ r∇ t M i · N j dΩ. (34.23)A common challenge in electromagnetic eigenvalue problems such as these isthe occurrence of spurious modes [?]. These are non-physical modes that fall inthe null space of the ∇ × ∇ × operator of (34.1) [?] (The issue of spurious modesis not as closed as most computational electromagnetics texts indicate. For a293

Electromagnetic Waveguide Analysissummary of recent work in the applied mathematics literature, written for anengineering readership, see [?]).One of the strengths of the vector basis functions used in the discretisationof the transverse component of the field is that it allows for the identificationof these spurious modes [?, ?]. In [?] a scaling method is proposed to shift theeigenvaluespectrumsuchthatthedominantwaveguidemode(usuallythelowestnon-zero eigenvalue) corresponds with the largest eigenvalue of the new system.Other approaches have also been followed to address the spurious modes. In [?],Lagrange mutipliers are used to move these modes from zero to infinity.Inthecase oftheeigensystem associatedwithdispersion analysis, thematrixequation of (34.18) is scaled as follows[ ]{ } [ ] { }Btt B tz et= θ2 Btt + AttBB zt B zz e z θ 2 + γ 2 θ 2 tz et, (34.24)B zt B zz e zanupperboundonthesquareofthepropagationconstant(γ 2 ) and µ (max)r and ǫ (max)r the maximum relative permeability and permittivity inthe computational domain.If λ is an eigenvalue of the scaled system of (34.24), then the propagationconstant can be calculated asγ 2 = 1 − λλ θ2 , (34.25)with θ 2 = k 2 oµ (max)rǫ (max)rand thus γ 2 → ∞ as λ → 0, which moves the spurious modes out of the region ofinterest.34.2 ImplementationThissectionconsidersthedetailsoftheimplementationofaFEniCS-basedsolverfor waveguide cutoff mode and dispersion curve problems as described in 34.1.1and 34.1.2. A number of code snippets illustrate some of the finer points of theimplementation.34.2.1 FormulationListing 34.1 shows the definitions of the function spaces used in the solution ofthe problems considered here. Nédélec basis functions of the first kind (N v andN u) are used to approximate the transverse component of the electric field. Thisensures that the tangential continuity required at element and material boundariescan be enforced [?]. The axial component of the field is modelled usinga set of Lagrange basis functions (M v, and M u). Additionally, a discontinuousGalerkin functionspace isincludedtoallow forthemodellingofmaterialparameterssuch as dielectrics.294

Evan Lezar and David B. DavidsonListing 34.1: Function spaces and basis functions.V_DG = FunctionSpace(mesh, "DG", 0) V_N = FunctionSpace(mesh,"Nedelec", transverse_order) V_M = FunctionSpace(mesh, "Lagrange",axial_order)combined_space = V_N + V_L(N_v, M_v) = TestFunctions(combined_space)(N_u, M_u) = TrialFunctions(combined_space)In order to deal with material properties, the Function class is extendedand the eval() method overridden. This is illustrated in Listing 34.2 where adielectric with a relative permittivity of ǫ r = 4 that extends to y = 0.25 is shown.This class is then instantiated using the discontinuous Galerkin function spacealready discussed. For constant material properties (such as the inverse of themagnetic permittivity µ r , in this case) a JIT-compiled function is used.class HalfLoadedDielectric(Function):def eval(self, values, x):if x[1] < 0.25:values[0] = 4.0else:values[0] = 1.0;e_r = HalfLoadedDielectric(V_DG)one_over_u_r = Function(V_DG, "1.0")Listing 34.2: Material properties and functions.k_o_squared = Function(V_DG, "value", {"value" : 0.0})theta_squared = Function(V_DG, "value", {"value" : 0.0})ThebasisfunctionsdeclaredinListing34.1andthedesiredmaterialpropertyfunctions are now used to create the forms required for the matrix entries specifiedin 34.1.1 and 34.1.2. The forms are shown in Listing 34.3 and the matricesof (34.9), (34.18), and (34.24) can be assembled using the required combinationsof these forms with the right hand side of (34.24), rhs, provided as an example.It should be noted that the use of JIT-functions for operating wavenumber andscaling parameters means that the forms need not be recompiled each time theoperating frequency is changed. This is especially beneficial when the calculationof dispersion curves is considered since the same calculation is performedfor a range of operating frequencies.From (34.2) it follows that the tangential component of the electric field mustbe zero on perfectly electrical conducting (PEC) surfaces. What this means inpractice is that the degrees of freedom associated with both the Lagrange andNédélec basis functions on the boundary must be set to zero since there can beno electric field inside a perfect electrical conductor [?]. An example for a PEC295

Electromagnetic Waveguide AnalysissurfacesurroundingtheentirecomputationaldomainisshowninListing34.4asthe ElectricWalls class. This boundary condition can then be applied to theconstructed matrices before solving the eigenvalue systems.The boundary condition given in (34.3) results in a natural boundary conditionfor the problems considered and thus it is not necessary to explicitly enforceit [?]. Such magnetic walls and the symmetry of a problem are often used to decreasethe size ofthecomputationaldomain although thisdoeslimit thesolutionobtained to even modes [?].Once the required matrices have been assembled and the boundary conditionsapplied, the resultant eigenproblem can be solved. This can be done byoutputting the matrices and solving the problem externally, or by making useof the eigensolvers provided by SLEPc that can be integrated into the FEniCSpackage.34.2.2 Post-ProcessingAfter the eigenvalue system has been solved and the required eigenpair chosen,this can be post-processed to obtain various quantities of interest. For the cutoffwavenumber, this is a relatively straight-forward process and only involves simpleoperations on theeigenvalues ofthesystem. For thecalculation ofdispersioncurves and visualisation of the resultant field components the process is slightlymore complex.Dispersion CurvesFor dispersion curves the computed value of the propagation constant (γ) is plottedas a function of the operating frequency (f o ). Since γ is a complex variable,a mapping is required to represent the data on a single two-dimensional graph.This is achieved by choosing the f o -axis to represent the value γ = 0, effectivelyListing 34.3: Forms for matrix entries.s_tt = one_over_u_r*dot(curl_t(N_v), curl_t(N_u))t_tt = e_r*dot(N_v, N_u)s_zz = one_over_u_r*dot(grad(M_v), grad(M_u))t_zz = e_r*M_v*M_ub_tt = one_over_u_r*dot(N_v, N_u)b_tz = one_over_u_r*dot(N_v, grad(M_u))b_zt = one_over_u_r*dot(grad(M_v), N_u)a_tt = s_tt - k_o_squared*t_ttb_zz = s_zz - k_o_squared*t_zzrhs = b_tt + b_tz + b_zt + b_zz + 1.0/theta_squared*a_tt296

Evan Lezar and David B. Davidsonclass ElectricWalls(SubDomain):def inside(self, x, on_boundary):return on_boundaryListing 34.4: Boundary conditions.dirichlet_bc = DirichletBC(combined_space,Function(combined_space, "0.0"), ElectricWalls())dividing the γ − f o plane into two regions. The region above the f o -axis is used torepresentthemagnitudeoftheimaginary partof γ,whereastherealpartfallsinthe lower region. A mode that propagates along the guide for a given frequencywill thus lie in the upper half-plane of the plot and a complex mode will be representedby a data point above and below the f o -axis. This procedure is followedin [?] and other literature and allows for quick comparisons and validation ofresults.Field VisualisationIn order to visualise the fields associated with a given solution, the basis functionsneed to be weighted with coefficients corresponding to the entries in aneigenvectorobtainedfromoneoftheeigenvalue problems. Inaddition, thetransverseor axial components of the field may need to be extracted. An example forplotting the transverse and axial componentsof the field is given in Listing 34.5.Here the variable x assigned to the function vector is one of the eigenvectorsobtained by solving the eigenvalue problem.Listing 34.5: Extraction and visualisation of transverse and axial field components.f = Function(combined_space) f.vector().assign(x)(transverse, axial) = f.split()plot(transverse)plot(axial)The eval() method of the transverse and axial functions can also becalled in order to evaluate the functions at a given spatial coordinate, allowingfor further visualisation or post-processing options.34.3 ExamplesThe first of the examples considered is the canonical one of a hollow waveguide,which has been covered in a multitude of texts on the subject [?, ?, ?, ?]. Since297

Electromagnetic Waveguide Analysisthe analytical solutions for this structure are known, it provides an excellentbenchmark and is a typical starting point for the validation of a computationalelectromagnetic solver for solving waveguide problems.The second and third examples are a partially filled rectangular guide anda shielded microstrip line on a dielectric substrate, respectively. In each caseresults are compared to published results from the literature as a means of validation.34.3.1 Hollow Rectangular WaveguideFigure 34.2 shows the cross section of a hollow rectangular waveguide. For thepurpose of this chapter a guide with dimensions a = 1m and b = 0.5m is considered.The analytical expressions for the electric field components of a hollowǫ r = 1µ r = 1baFigure 34.2: A diagram showing the cross section and dimensions of a hollowrectangular waveguide.rectangular guide with width a and height b are given by [?]E x = n ( mπx) ( nπy)b A mn cos sin , (34.26)a bE y = −m ( mπx) ( nπy)a A mn sin cos , (34.27)a bfor the TE mn mode, whereas the z-directed electric field for the TM mn mode hasthe form [?]( mπx) ( nπy)E z = B mn sin sin , (34.28)a bwith A mn and B mn constants for a given mode. In addition, the propagation constant,γ, has the formγ = √ ko 2 − k2 c , (34.29)with k o the operating wavenumber dependent on the operating frequency, and( mπ) 2 ( nπ) 2kc 2 = + , (34.30)a btheanalytical solutionforthesquareofthecutoffwavenumberforboththeTE mnand TM mn modes.298

Evan Lezar and David B. DavidsonCutoff AnalysisFigure 34.3 shows the first twocalculated TE cutoff modesfor the hollow rectangularguide, with the first two TM cutoff modes being shown in Figure 34.4. Thesolutionisobtainedwith64triangularelementsandsecondorderbasisfunctionsin the transverse as well as the axial discretisations.0.50.5y-axis [m]y-axis [m]0.00.00 1x-axis [m](a) TE 10 mode.0 1x-axis [m](b) TE 01 mode.Figure 34.3: The first two calculated TE cutoff modes of a 1 m × 0.5 m hollowrectangular waveguide.0.50.5y-axis [m]y-axis [m]0.00.00 1x-axis [m](a) TM 11 mode.0 1x-axis [m](b) TM 21 mode.Figure 34.4: The first two calculated TM cutoff modes of a 1 m × 0.5 m hollowrectangular waveguide.Table 34.1 gives a comparison of the calculated and analytical values for thesquare of the cutoff wavenumber of a number of modes for a hollow rectangularguide. As can be seen from the table, there is excellent agreement between thevalues.299

Electromagnetic Waveguide AnalysisTable 34.1: Comparison of analytical and calculated cutoff wavenumber squared(kc)forvariousTEandTMmodesofa1m×0.5mhollowrectangularwaveguide.2Mode Analytical [m −2 ] Calculated [m −2 ] Relative ErrorTE 10 9.8696 9.8696 1.4452e-06TE 01 39.4784 39.4784 2.1855e-05TE 20 39.4784 39.4784 2.1894e-05TM 11 49.3480 49.4048 1.1514e-03TM 21 78.9568 79.2197 3.3295e-03TM 31 128.3049 129.3059 7.8018e-03300

Evan Lezar and David B. DavidsonDispersion AnalysisWhen considering the calculation of the dispersion curves for the hollow rectangularwaveguide, the mixed formulation as discussed in 34.1.2 is used. The calculateddispersion curves for the first 10 modes of the hollow rectangular guideare shown in Figure 34.5 along with the analytical results. For the rectangularguide a number of modes are degenerate with the same dispersion and cutoffproperties as predicted by (34.29) and (34.30). This explains the visibility of onlysix curves. There is excellent agreement between the analytical and computedresults.1.00.5(¢/k o ) 20.0¡1.0¡0.5150 200 250 300 350f o [MHz]Figure 34.5: Dispersion curves for the first 10 modes of a 1 m × 0.5 m hollowrectangular waveguide. Markers are used to indicate the analytical results with and indicating TE and TM modes respectively.34.3.2 Half-Loaded Rectangular WaveguideIn some cases, a hollow rectangular guide may not be the ideal structure to usedue to, for example, limitations on its dimensions. If the guide is filled witha dielectric material with a relative permittivty ǫ r > 1, the cutoff frequency ofthe dominant mode will be lowered. Consequently a loaded waveguide will bemode compact than a hollow guide for the same dominant mode frequency. Furthermore,in many practical applications, such as impedance matching or phaseshifting sections, a waveguide that is only partially loaded is used [?].Figure 34.6 shows the cross section of such a guide. The guide consideredhere has the same dimensions as the hollow rectangular waveguide used in theprevious section, but its lower half is filled with an ǫ r = 4 dielectric material.301

Electromagnetic Waveguide Analysisdǫ r = 1µ r = 1ǫ r = 4µ r = 1baFigure34.6: Adiagramshowingthecrosssectionanddimensionsofahalf-loadedrectangular waveguide. The lower half of the guide is filled with an ǫ r = 4 dielectricmaterial.Cutoff AnalysisFigure 34.7 shows the first TE and TM cutoff modes of the half-loaded guideshown in Figure 34.6. Note the concentration of the transverse electric field inthe hollow part of the guide. This is due to the fact that the displacement flux,D = ǫE, must be normally continuous at the dielectric interface [?, ?].0.500.50y-axis [m]0.25y-axis [m]0.250.000.000 1x-axis [m](a) First TE mode.0 1x-axis [m](b) First TM mode.Figure 34.7: The first calculated TE and TM cutoff modes of a 1 m × 0.5 mrectangular waveguide with the lower half of the guide filled with an ǫ r = 4dielectric.Dispersion AnalysisThe dispersion curves for the first 8 modes of the half-loaded waveguide areshown in Figure 34.8 with results for the first 4 modes from [?] provided as reference.Here it can be seen that the cutoff frequency of the dominant mode hasdecreased and there is no longer the same degeneracy in the modes when comparedtothehollowguide ofthesamedimensions. Inaddition, thereare complex302

Evan Lezar and David B. Davidson4321(¤/k o ) 20£4£3£2£1100 120 140 160 180 200 220 240 260 280 300 320 340f o [MHz]Figure 34.8: Dispersion curves for the first 8 modes of a 1 m × 0.5 m rectangularwaveguide with its lower half filled with an ǫ r = 4 dielectric material. Referencevalues for the first 4 modes from [?] are shown as . The presence of complexmode pairs are indicated by and •.modes present as a result of the fourth and fifth as well as the sixth and seventhmodes occurring as conjugate pairs at certain points in the spectrum. It shouldbe noted that the imaginary parts of these conjugate pairs are very small andthus the • markers in Figure 34.8 appear to fall on the f o -axis. These complexmodes are discussed further in 34.3.3.34.3.3 Shielded MicrostripMicrostrip line is a very popular type of planar transmission line, primarily dueto the fact that it can be constructed using photolithographic processes and integrateseasily with other microwave components [?]. Such a structure typicallyconsistsofathin conductingstriponadielectric substrate above agroundplane.Inaddition, thestrip maybeshielded byenclosing itin aPEC boxtoreduce electromagneticinterference. A cross section of a shielded microstrip line is shownin Figure 34.9 with the thickness of the strip, t, exaggerated for clarity. Thedimensions used to obtain the results discussed here are given in Table 34.2.Table 34.2: Dimensions for the shielded microstrip line considered here. Definitionsfor the symbols are given in Figure 34.9.Dimension [mm]a, b 12.7d, w 1.27t 0.127303

Electromagnetic Waveguide Analysisǫ r = 1µ r = 1btwdǫ r = 8.875µ r = 1aFigure 34.9: A diagram showing the cross section and dimensions of a shieldedmicrostrip line. The microstrip is etched on a dielectric material with a relativepermittivityof ǫ r = 8.75. Theplaneofsymmetryisindicatedbyadashedlineandis modelled as a magnetic wall in order to reduce the size of the computationaldomain.Cutoff AnalysisSince the shielded microstrip structure consists of two conductors, it supports adominant transverse electromagnetic (TEM) wave that has no axial componentoftheelectric ormagneticfield[?]. Such a modehasacutoffwavenumberofzeroandthuspropagatesforallfrequencies[?,?]. Thecutoffanalysisofthisstructureis not considered here explicitly. The cutoff wavenumbers for the higher ordermodes(which are hybrid TE-TM modes[?]) can however be determined from thedispersion curves by the intersection of a curve with the f o -axis.Dispersion AnalysisThe dispersion analysis presented in [?] is repeated here for validation with theresultantcurvesshowninFigure34.10. Asisthecasewiththehalf-loadedguide,theresults calculated withFEniCS agree wellwith previously published results.In the figure it is shown that for certain parts of the frequency range of interest,modes six and seven have complex propagation constants. Since the matricesin the eigenvalue problem are real valued, the complex eigenvalues – andthusthe propagation constants–must occurin complexconjugate pairs asis thecase here and reported earlier in [?]. These conjugate propagation constants are304

Evan Lezar and David B. Davidsonassociated with two equal modes propagating in opposite directions along thewaveguide and thus resulting in zero energy transfer. It should be noted thatfor lossy materials (not considered here), complex modes are expected but do notnecessarily occur in conjugate pairs [?].4¦/ko3210¥2¥110 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25f o [GHz]Figure 34.10: Dispersion curves for the first 7 even modes of shielded microstripline using a magnetic wall to enforce symmetry. Reference values from [?] areshown as . The presence of complex mode pairs are indicated by and •.34.4 Analysis of Waveguide DiscontinuitiesAlthough this chapter focuses on eigenvalue type problems related to waveguides,the use of FEniCS in waveguide analysis is not limited to such problems.Thissectionbrieflyintroducesthesolutionofproblemsrelatingtowaveguidediscontinuitiesas an additional problem class. Applications where the solutions ofthese problems are ofimportance to microwave engineers is the design ofwaveguidefilters as well as the analysis and optimisation of bends in a waveguidewhere properties such as the scattering parameters (S-parameters) of the deviceare calculated [?].The hybrid finite element-modal expansion technique discussed in [?] is implementedand used to solve problems related to H-plane waveguide discontinuities.For such cases – which are uniform in the vertical (y) direction transverseto the direction of propagation (z) – the problem reduces to a scalar one in twodimensions [?] with the operating variable the y-component of the electric fieldin the guide. In order to obtain the scattering parameters at each port of thedevice, the field on the boundary associated with the port is written as a summationof the tangential components of the incoming and outgoing waveguidemodes. These modes can either be computed analytically, when a junction is305

Electromagnetic Waveguide AnalysisMagnitude [dB]0§20§15§10§5§160§140§120§100§80§6010 11 12 13 14 15Frequency [GHz]Phase [degrees]Figure 34.11: Magnitude (solid line) and phase (dashed line) of the transmissioncoefficient (S 21 ) of a length of rectangular waveguide with dimensionsa = 18.35mm, b = 9.175mm, and l = 10mm. The analytical results for the samestructure are indicated by markers with and • indicating the magnitude andphase respectively.rectangular for example, or calculated with methods such as those discussed inthe preceding sections [?].Transmissionparameter(S 21 )resultsforalengthofhollowrectangularwaveguideare shown in Figure 34.11. As expected, the length of guide behaves as afixed value phase shifter [?] and the results obtained show excellent agreementwith the analytical ones.ddcport 1wport 2acsFigure 34.12: Schematic of an H-plane iris in a rectangular waveguide dimensions:a = 18.35mm, c = 4.587mm, d = 1mm, s = 0.5mm, and w = 9.175mm. Theguide has a height of b = 9.175mm. The ports are indicated by dashed lines onthe boundary of the device.306

Evan Lezar and David B. DavidsonA schematic for a more interesting example is the H-plane iris shown in Figure34.12. The figure shows the dimensions of the structure and indicates theport definitions. The boundaries indicated by a solid line is a PEC material.The magnitude and phase results for the S-parameters of the device are given inFigure 34.13 and compared to the results published in [?] with good agreementbetween the two sets of data.Magnitude [dB]¨6¨5¨4¨3¨2¨110 11 12 13 14 15Frequency [GHz](a) Magnitude.Phase [degrees]140120100806040200©2010 11 12 13 14 15Frequency [GHz](b) Phase.Figure 34.13: Results for the magnitude and phase of the reflection coefficient(S 11 – solid line) and transmission coefficient (S 21 – dashed line) of an H-planeirisinarectangularwaveguideshowninFigure34.12. Referenceresultsfrom[?]areindicatedbymarkerswith and •indicating thereflectionandtransmissioncoefficient respectively.34.5 ConclusionIn this chapter, the solutions of cutoff and dispersion problems associated withelectromagnetic waveguiding structures have been implemented and the resultsanalysed. In all cases, the results obtained agree well with previously publishedor analytical results. This is also the case where the analysis of waveguidediscontinuities are considered, and although the solutions shown here are restrictedto H-plane waveguide discontinuities, the methods applied are applicableto other classes of problems such as E-plane junctions and full 3D analysis.This chapter has also illustrated the ease with which complex formulationscan be implemented and how quickly solutions can be obtained. This is largelydue to the almost one-to-one correspondence between the expressions at a formulationlevel and the high-level code that is used to implement a particularsolution. Even in cases where the required functionality is limited or missing,the use of FEniCS in conjunction with external packages greatly reduces developmenttime.307

CHAPTER35Applications in Solid MechanicsBy Kristian B. Ølgaard and Garth N. WellsChapter ref: [oelgaard-1]Summarise work on automated modelling for solid mechanics, with applicationto hyperelasticity, plasticity and strain gradient dependent models. Specialattention will be paid the linearisation of function which do come from a finiteelement space.309

CHAPTER36Modelling Evolving DiscontinuitiesBy Mehdi Nikbakht and Garth N. WellsChapter ref: [nikbakht]Summarise work on automated modelling of PDEs with evolving discontinuities,e.g. cracks.311

CHAPTER37Optimal Control ProblemsBy Kent-Andre Mardal, Oddrun Christine Myklebust and Bjørn Fredrik NielsenChapter ref: [mardal-3]This chapter is devoted to the study of several optimal control problems andtheir implementation in FEniCS. We start with an abstract description of optimalcontrol problems before we apply the abstract theory to several concreteexamples. We also discuss the construction of block preconditioners for efficientsolution of these problems.313

CHAPTER38Automatic Calibration of Depositional ModelsBy Hans Joachim SchrollChapter ref: [schroll]A novel concept for calibrating depositional models is presented. In this approachtransport coefficients are determined from well output measurements.Finite element implementation of the multi–lithology models and their duals isautomated by the FEniCS project DOLFIN using a python interface.38.1 Issues in dual lithology sedimentationDifferent types of forward computer models are being used by sedimentologistsand geomorphologists to simulate the process of sedimentary deposition over geologicaltime periods. The models can be used to predict the presence of reservoirrocks and stratigraphic traps at a variety of scales. State–of–the–art advancednumerical software provides accurate approximations to the mathematicalmodel, which commonly is expressed in terms of a nonlinear diffusion dominatedPDE system. The potential of todays simulation software in industrialapplications is limited however, due to major uncertainties in crucial materialparameters that combine a number of physical phenomena and therefore aredifficult to quantify. Examples of such parameters are diffusive transport coefficients.The idea in thiscontribution is to calibrate uncertain transport coefficients todirect observable data, like well measurements from a specific basin. In this approachthe forward evolution process, mapping data to observations, is reversedto determine the data, i.e. transport coefficients. Mathematical tools and numericalalgorithms are applied to automatically calibrate geological models to315

Automatic Calibration of Depositional Modelsactual observations — a critical but so far missing link in forward depositionalmodeling.Automaticcalibration, incombinationwithstochasticmodeling,willboosttheapplicability and impact of modern numerical simulations in industrial applications.38.2 A multidimensional sedimentation modelSubmarine sedimentation is an evolution process. By flow into the basin, sedimentsbuild up and evolve in time. The evolution follows geophysical laws, expressedas diffusive PDE models. The following system is a multidimensionalversion of the dual lithology model by Rivenæs [?, ?]( A s−A 1 − s) ( sh)t= ∇ ·( αs∇hβ(1 − s)∇h)in [0, T] × B . (38.1)Here h denotes the thickness of a layer of deposit and s models the volume fractionforthesandlithology.Consequently, 1−sisthefractionformud. Thesystemisdriven by fluxesantiproportional tothe flowrates s∇hand (1 −s)∇hresultingin a diffusive, but incompletely parabolic, PDE system. The domain of the basinis denoted by B. Parameters in the model are: The transport layer thickness Aand the diffusive transport coefficients α, β.For a forward in time simulation, the system requires initial and boundarydata. At initial time, the volume fraction s and the layer thickness h need to bespecified. According to geologists, such data can be reconstructed by some kindof “back stripping”. Along the boundary of the basin, the flow rates s∇h and(1 − s)∇h are given.38.3 An inverse approachThe parameter–to–observation mapping R : (α, β) ↦→ (s, h) is commonly referredto as the forward problem. In a basin direct observations are only availableat wells. Moreover, from the age of the sediments, their history can be reconstructed.Finally, well–data is available in certain well areas W ⊂ B and backwardin time.The objective of the present investigation is to determine transport coefficientsfrom observed well–data and in that way, to calibrate the model to thedata. This essentially means to invert the parameter–to–observation mapping.Denoting observed well–data by (˜s,˜h), the goal is to minimize the output functionalJ(α, β) = 1 ∫ T ∫(˜s − s) 2 + (˜h − h) 2 dx dt (38.2)|W | 0 W316

Hans Joachim Schrollwith respect to the transport coefficients α and β.In contrast to the ”direct inversion” as described by Imhof and Sharma [?],which is considered impractical, we do not propose to invert the time evolutionof the diffusive depositional process. We actually use the forward–in–time evolutionof sediment layers in a number of wells to calibrate transport coefficients.Via the calibrated model we can simulate the basin and reconstruct its historicevolution. By computational mathematical modeling, the local data observed inwells determines the evolution throughout the entire basin.38.4 The Landweber algorithmInaslightlymoreabstractsetting,thetaskistominimizeanobjectivefunctionalJ which implicitly depends on the parameters p via u subject to the constraintthat usatisfiessomePDEmodel;aPDEconstrainedminimizationproblem: Findp such that J(p) = J(u(p)) = min and PDE(u, p) = 0.Landweber’ssteepestdecentalgorithm [?]iteratesthefollowingsequenceuntilconvergence:1. Solve PDE(u k , p k ) = 0 for u k .2. Evaluate d k = −∇ p J(p k )/‖∇ p J(p k )‖.3. Update p k+1 = p k + ∆p k d k .Note that the search direction d k , the negative gradient, is the direction ofsteepest decent. To avoid scale dependence, the search direction is normed.The increment ∆p k isdeterminedby aonedimensionalline search algorithm,minimizing a locally quadratic approximation to J along the line p k + γd k , γ ∈ R.We use the ansatzJ(p k + γd k ) = aγ 2 + bγ + c , γ ∈ R .The extreme value of this parabola is located atγ k = − b2a . (38.3)To determine ∆p k = γ k , the parabola is fitted to the local data. For example b isgiven by the gradientJ γ (p k ) = b = ∇J(p k ) · d k = −‖∇J(p k )‖ 2 .To find a, another gradient of J along the line p k + γd k is needed. To avoid anextra evaluation, we project p k−1 = p k − γ k−1 d k−1 onto the line p k + γd k , that isπ(p k−1 ) = p k − γ ∗ d k and approximate the directional derivativeJ γ (p k − γ ∗ d k ) ≈ ∇J(p k−1 ) · d k . (38.4)317

Automatic Calibration of Depositional ModelsNote that this approximation is exact if two successive directions d k−1 and d k arein line. Elementary geometry yields γ ∗ = γ k−1 cosϕ and cos ϕ = d k−1 · d k . Thusγ ∗ = γ k−1 · d k−1 · d k . From J γ (p k − γ ∗ d k ) = −2aγ ∗ + b and (38.4) we find−2a = (∇J(pk−1 ) − ∇J(p k )) · d kγ k−1 · d k−1 · d k .Finally, the increment (38.3) evaluates as∆p k = γ k =∇J(p k ) · ∇J(p k−1 )∇J(p k ) · ∇J(p k−1 ) − ‖∇J(p k )‖ · ∇J(p k )2 ∇J(p k−1 ) · dk−1 .38.5 Evaluation of gradients by duality argumentsEvery single step of Landweber’s algorithm requires the simulation of a time dependent,nonlinear PDE system and the evaluation of the gradient of the objectivefunctional. The most common approach to numerical derivatives, via finitedifferences,isimpractical forcomplexproblems: Finite difference approximationwould require to perform n + 1 forward simulations in n parameter dimensions.Using duality arguments however, n nonlinear PDE systems can be replaced byone linear, dual problem. After all, J is evaluated by one forward simulationof the nonlinear PDE model and the complete gradient ∇J is obtained by one(backward) simulation of the linear, dual problem. Apparently, one of the firstreferences to this kind of duality arguments is [?].The concept is conveniently explained for a scalar diffusion equationu t = ∇ · (α∇u) .Astransportcoefficientsmayvarythroughoutthebasin, weallowforapiecewiseconstant coefficient{α1 x ∈ B 1α =.α 2 x ∈ B 2Assuming no flow along the boundary and selecting a suitable test function φ,the equation in variational form readsA(u, φ) :=∫ T0∫Bu t φ + α∇u · ∇φdxdt = 0 .Taking an derivative /.∂α i , i = 1, 2 under the integral sign, we findA(u αi , φ) =∫ T0∫B∫ T ∫u αi ,tφ + α∇u αi · ∇φdxdt = − ∇u · ∇φdxdt . (38.5)0 B i318

Hans Joachim SchrollThecorrespondingderivativeoftheoutputfunctional J = ∫ T ∫0 W (u−d)2 dxdtreads∫ T∫J αi = 2 (u − d)u αi dxdt , i = 1, 2 .0 WThe trick is to define a dual problemA(φ, ω) = 2∫ T0∫W(u − d)φdxdtsuch that A(u αi , ω) = J αi and by using the dual solution ω in (38.5)∫ T ∫A(u αi , ω) = J αi = − ∇u · ∇ωdxdt , i = 1, 2 .0 B iIn effect, the desired gradient ∇J = (J α1 , J α2 ) is expressed in terms of primal–and dual solutions. In this case the dual problem reads∫ T ∫∫ T ∫φ t ω + α∇φ · ∇ωdxdt = 2 (u − d)φdxdt ,0Bwhich in strong form appears as a backward in time heat equation with zeroterminal condition−ω t = ∇ · (α∇ω) + 2(u − d)| W . (38.6)Note that this dual equation is linear and entirely driven by the data mismatchin the well. With perfectly matching data d = u| W , the dual solution is zero.Along the same lines of argumentation one derives the multilinear operatorto the depositional model (38.1)A(u, v)(φ, ψ) =∫ T ∫(Au t + uh t + sv t )φ + αu∇h · ∇φ + αs∇v · ∇φdxdt+0 B∫ T0∫B0(−Au t − uh t (1 − s)v t ) ψ − βu∇h · ∇ψ + β(1 − s)∇v · ∇ψdxdt .The dual system related to the well output functional (38.2) reads∫ T∫A(φ, ψ)(ω, ν) = 2 (s − ˜s)φ + (h − ˜h)ψdxdt .0WBy construction it follows A(s p , h p )(ω, ν) = J p (α, β). Given both primal and dualsolutions, the gradient of the well output functional evaluates as∫ T∫J αi (α, β) = − s∇h · ∇ωdxdt ,0 B i∫ T ∫J βi (α, β) = − (1 − s)∇h · ∇νdxdt .0 B i319W

Automatic Calibration of Depositional ModelsA detailed derivation including non zero flow conditions is given in [?]. Forcompleteness, not for computation(!), we state the dual system in strong form−A(ω − ν) t + h t (ω − ν) + α∇h · ∇ω = β∇h · ∇ν + 2|W | (s − ˜s) ∣ ∣∣∣W−(sω + (1 − s)ν) t = ∇ · (αs∇ω + β(1 − s)∇ν) + 2 (h − ˜h)|W |∣ .WObviously the system is linear and driven by the data mismatch at the well.It always comes with zero terminal condition and no flow conditions along theboundary of the basin. Thus, perfectly matching data results in a trivial dualsolution.38.6 Aspects of the implementationThe FEniCS project DOLFIN [?] automates the solution of PDEs in variationalformulation and is therefore especially attractive for implementing dual problems,whicharederivedinvariational form. Inthissectionthecoding ofthedualdiffusionequation(38.6)isillustrated. Testingtheequationinspace, supp ϕ ⊂ B,the weak form reads∫B∫−ω t ϕ + α∇ω · ∇ϕdx = 2 (u − d)ϕdx .WTrapezoidal rule time integration gives∫− (ω n+1 − ω n )ϕdx + ∆t ∫α∇(ω n+1 + ω n ) · ∇ϕdxB2 B∫= ∆t (ω n+1 − d n+1 + ω n − d n )ϕdx , n = N, N − 1, . . .,0 .W(38.7)To evaluate the right hand side, the area of the well is defined as an subdomain:class WellDomain(SubDomain):def inside(self, x, on_boundary):return bool((0.2

Hans Joachim SchrolldxWell = Integral("cell", 1)The driving source in (38.7) is written as:f = dt*(u1-d1+u0-d0)*phi*dxWellb = assemble(f, mesh, cell_domains=subdomains)The first line in (38.7) is stated in variational formulation:F = (u_trial-u)*phi*dx \+ 0.5*dt*( d*dot( grad(u_trial+u), grad(phi) ) )*dxLet DOLFIN sort out left– and right hand sides:a = lhs(F); l = rhs(F)Assemble the linear system in matrix–vector form:A = assemble(a, mesh); b += assemble(l, mesh)And solve it:solve(A, u.vector(), b)The direct solver may be replaced by GMRES with algebraic multigrid:solve(A, u.vector(), b, gmres, amg)38.7 Numerical experimentsWith these preparations, we are now ready to inspect the well output functional(38.2) for possible calibration of the dual lithology model (38.1) to “observed”,actually generated synthetic, data. We consider the PDE system (38.1) withdiscontinuous coefficientsα ={α1 x ≥ 1/2α 2 x < 1/2, β ={β1 x ≥ 1/2β 2 x < 1/2in the unit square B = [0, 1] 2 . Four wells W = W 1 ∪ W 2 ∪ W 3 ∪ W 4 are placed onein each quarterW 4 = [0.3, 0.3] × [0.7, 0.8] , W 3 = [0.7, 0.8] × [0.7, 0.8] ,W 1 = [0.2, 0.3] × [0.2, 0.3] , W 2 = [0.7, 0.8] × [0.2, 0.3] .Initially s is constant s(0, ·) = 0.5 and h is piecewise linearh(0, x, y) = 0.5 max(max(0.2, (x − 0.1)/2), y − 0.55) .321

Automatic Calibration of Depositional ModelsFigure 38.1: Evolution of h, initial left, t = 0.04 right.The diffusive character of the process is evident from theevolution of has shownin Figure 38.1. No flow boundary conditions are implemented in all simulationsthroughout this section.To inspect the output functional, we generate synthetic data by computing areferencesolution. Inthefirstexperiment,thereferenceparametersare (α 1 , α 2 ) =(β 1 , β 2 ) = (0.8, 0.8). We fix β to the reference values and scan the well output overthe α–range [0.0, 1.5] 2 . The upper left plot in Figure 38.2 depicts contours of theapparently convex functional, with the reference parameters as the global minimum.Independent Landweber iterations, started in each corner of the domainidentify the optimal parameters in typically five steps. The iteration is stoppedif ‖∇J(p k )‖ ≤ 10 −7 , an indication that the minimum is reached. The lower leftplot shows the corresponding scan over β where α = (0.8, 0.8) is fixed. Obviouslythe search directions follow the steepest decent, confirming that the gradientsare correctly evaluated via the dual solution. In the right column of Figure 38.2we see results for the same experiments, but with 5% random noise added to thesyntheticwelldata. Inthiscasetheoptimalparametersareofcoursenotthereferenceparameters, but still close. The global picture appears stable with respectto noise, suggesting that the concept allows to calibrate diffusive, depositionalmodels to data observed in wells.Ultimately, the goal is to calibrate all four parameters α = (α 1 , α 2 ) and β =(β 1 , β 2 ) to available data. Figure 38.3 depicts Landweber iterations in four dimensionalparameter space. Actually projections onto the α and β coordinateplane are shown. Each subplot depicts one iteration. The initial guess variesfrom line to line. Obviously, all iterations converge and, without noise added,the reference parameters, α = β = (0.8, 0.8), are detected as optimal parameters.Adding 5% random noise to the recorded data, we simulate data observedin wells. In this situation, see the right column, the algorithm identifies optimalparameters, which are clearly off the reference. Fig. 38.5 depicts fifty realiza-322

Hans Joachim Schrolltions of this experiments. The distribution of the optimal parameters is showntogether with their average in red. The left column in Fig. 38.5 corresponds tothe reference parameters (α 1 , α 2 ) = (β 1 , β 2 ) = (0.8, 0.8) as in Fig. 38.3. The initialguesses vary from row to row and are the same as in Fig. 38.3. On averagethe calibrated, optimal parameters are close to the reference. Typical standarddeviations vary from 0.07 to 0.17, depending on the coefficient.In the next experiments non uniform reference parameters are set for α =(0.6, 1.0) and β = (1.0, 0.6). Figure 38.4 shows iterations with the noise–free referencesolution used as data on the left hand side. Within the precision of thestopping criterion, the reference parameters are detected. Adding 5% noise tothe well data leads to different optimal parameters, just as expected. On averagehowever, the optimal parameters obtained in repeated calibrations matchthe reference parameters quite well, see Figure 38.5, right hand side.In the next experiments, β is discontinuous along y = 1/2 and piecewise constantin the lower and upper half of the basinα ={α1 x ≥ 1/2α 2 x < 1/2, β ={β1 y ≥ 1/2β 2 y < 1/2.In this way the evolution is governed by different diffusion parameters in eachquarter of the basin. Having placed one well i each quarter, one can effectivelycalibrate the model to synthetic data with and without random noise, as shownin Figures 38.6 and 38.7.38.8 Results and conclusionThe calibration of piecewise constant diffusion coefficients using local data ina small number of wells is a well behaved inverse problem. The convexity ofthe output functional, which is the basis for a successful minimization, remainsstable with random noise added to the well data.We have automated the calibration of diffusive transport coefficients in twoways: First, the Landweber algorithm, with duality based gradients, automaticallydetects optimal parameters. Second, the FEniCS project DOLFIN, automaticallyimplements the methods. As the dual problems are derived in variationalform, DOLFIN is the appropriate tool for efficient implementation.AcknowledgmentsThe presented work was funded by a research grant from StatoilHydro. Theauthor wants to thank Are Magnus Bruaset for initiating this work. Manythanks to Bjørn Fredrik Nielsen for thoughtful suggestions regarding inverseproblems as well as Anders Logg and Ola Skavhaug for their support regardingthe DOLFIN software. The work has been conducted at Simula Research323

Automatic Calibration of Depositional Models1.41.41.21.21.01.0α20.8α20.80.60.60.40.40.20.20.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4α 10.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4α 11.41.41.21.21.01.0β20.8β20.80.60.60.40.40.20.20.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4β 10.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4β 1Figure 38.2: Contours of J and Landweber iterations, optimal parameters(green), reference parameters (red). Clean data left column, noisy data right.324

Hans Joachim Schroll1.61.4αβ1.61.4αβ1.21.2α2, β21.00.8α2, β21.00.80.60.60.40.40.20.20.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6α 1 , β 10.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6α 1 , β 11.61.4αβ1.61.4αβ1.21.2α2, β21.00.8α2, β21.00.80.60.60.40.40.20.20.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6α 1 , β 10.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6α 1 , β 11.61.4αβ1.61.4αβ1.21.2α2, β21.00.8α2, β21.00.80.60.60.40.40.20.20.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6α 1 , β 10.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6α 1 , β 11.61.4αβ1.61.4αβ1.21.2α2, β21.00.8α2, β21.00.80.60.60.40.40.20.20.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6α 1 , β 10.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6α 1 , β 1Figure 38.3: Landweber iterations. Clean (left) and noisy data (right).325

Automatic Calibration of Depositional Models1.61.4αβ1.61.4αβ1.21.2α2, β21.00.8α2, β21.00.80.60.60.40.40.20.20.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6α 1 , β 10.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6α 1 , β 11.61.4αβ1.61.4αβ1.21.2α2, β21.00.8α2, β21.00.80.60.60.40.40.20.20.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6α 1 , β 10.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6α 1 , β 11.61.4αβ1.61.4αβ1.21.2α2, β21.00.8α2, β21.00.80.60.60.40.40.20.20.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6α 1 , β 10.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6α 1 , β 11.61.4αβ1.61.4αβ1.21.2α2, β21.00.8α2, β21.00.80.60.60.40.40.20.20.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6α 1 , β 10.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6α 1 , β 1Figure 38.4: Landweber iterations. Clean (left) and noisy data (right).326

Hans Joachim Schroll1.61.61.41.41.21.2α2, β21.00.80.6α2, β21.00.80.60.40.40.20.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6α 1 , β 10.20.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6α 1 , β 11.61.61.41.41.21.2α2, β21.00.80.6α2, β21.00.80.60.40.40.20.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6α 1 , β 10.20.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6α 1 , β 11.61.61.41.41.21.2α2, β21.00.80.6α2, β21.00.80.60.40.40.20.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6α 1 , β 10.20.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6α 1 , β 11.61.61.41.41.21.2α2, β21.00.80.6α2, β21.00.80.60.40.40.20.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6α 1 , β 10.20.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6α 1 , β 1Figure38.5: Setsofoptimalparameterscalibratedtonoisydata, αblue, β yellow,average red.327

Automatic Calibration of Depositional Models1.61.4αβ1.61.4αβ1.21.2α2, β21.00.8α2, β21.00.80.60.60.40.40.20.20.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6α 1 , β 10.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6α 1 , β 11.61.4αβ1.61.4αβ1.21.2α2, β21.00.8α2, β21.00.80.60.60.40.40.20.20.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6α 1 , β 10.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6α 1 , β 11.61.4αβ1.61.4αβ1.21.2α2, β21.00.8α2, β21.00.80.60.60.40.40.20.20.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6α 1 , β 10.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6α 1 , β 11.61.4αβ1.61.4αβ1.21.2α2, β21.00.8α2, β21.00.80.60.60.40.40.20.20.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6α 1 , β 10.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6α 1 , β 1Figure 38.6: Landweber iterations. Clean (left) and noisy data (right).328

Hans Joachim Schroll1.61.4αβ1.61.4αβ1.21.2α2, β21.00.8α2, β21.00.80.60.60.40.40.20.20.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6α 1 , β 10.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6α 1 , β 11.61.4αβ1.61.4αβ1.21.2α2, β21.00.8α2, β21.00.80.60.60.40.40.20.20.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6α 1 , β 10.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6α 1 , β 11.61.4αβ1.61.4αβ1.21.2α2, β21.00.8α2, β21.00.80.60.60.40.40.20.20.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6α 1 , β 10.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6α 1 , β 11.61.4αβ1.61.4αβ1.21.2α2, β21.00.8α2, β21.00.80.60.60.40.40.20.20.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6α 1 , β 10.00.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6α 1 , β 1Figure 38.7: Landweber iterations. Clean (left) and noisy data (right).329

Automatic Calibration of Depositional ModelsLaboratory as part of CBC, a Center of Excellence awarded by the NorwegianResearch Council.330

CHAPTER39Computational ThermodynamicsBy Johan Hoffman, Claes Johnson and Murtazo NazarovChapter ref: [hoffman-3]◮ Editor note: Missing figures, need to be added in EPS sub directory.◮ Editor note: Missing bibliography, needs to be added to common bibliography.bib incorrect format.We test the functionality of FEniCS on the challenge of computational thermodynamicsintheformofafiniteelementsolveroftheEulerequationsexpressingconservation of mass, momentum and energy. We show that computationalsolutionssatisfy a2ndLaw formulatedin termsofkineticenergy, internal(heat)energy, work and shock/turbulent dissipation, without reference to entropy. Weshow that the 2nd Law expresses an irreversible transfer of kinetic energy toheat energy in shock/turbulent dissipation arising because the Euler equationslack pointwise solutions, and thus explains the occurence of irreversibility informally reversible systems as an effect of instability with blow-up of Eulerresiduals combined with finite precision computation, without resort to statisticalmechanics or ad hoc viscous regularization. We simulate the classical Jouleexperiment of a gas expanding from rest under temperature drop followed bytemperature recovery by turbulent dissipation until rest in the double volume.39.1 FEniCS as Computational ScienceThe goal of the FEniCS project is to develop software for automated computationalsolution of differential equations based on a finite element methodologycombining generality with efficiency. The FEniCS Application Unicorn offers a331

Computational Thermodynamicssolverforafluid-solidscontinuumbasedonaunifiedEulerianformulationofmomentumbalance combined with constitutive laws for fluid/solid in Eulerian/updatedLagrangian form, which shows the capability of FEniCS for automation ofcomputational fluid-structure interaction.Thermodynamics is a basic area of continuum mechanics with many importantapplications, which however is feared by both teachers, students and engineersas being difficult to understand and to apply, principally because of theapperance of turbulence. In this article we show that turbulent thermodynamicscan be made understandable and useful by automated computational solution,as another example of the capability of FEniCS.The biggest mystery of classical thermodynamics is the 2nd Law about entropyand automation cannot harbor any mystery. Expert systems are requiredformysteriesandFEniCSisnotanexpertsystem. Automationrequiresacontinuummechanicsformulationofthermodynamicswithatransparent2ndLaw. Wepresent a formulation of thermodynamics based on finite precision computationwitha2nd Law withoutreference toentropy, which weshowcan serve as abasisfor automated computational simulation of complex turbulent thermodynamicsand thus can open to new insight and design, a main goal of FEniCS. In thissetting the digital finite element model becomes the real model of the physics ofthermodynamicsviewedasaformofanalogfiniteprecisioncomputation,amodelwhich is open to inspection and analysis because solutions can be computed andput on the table. This represents a new kind of science in the spirit of Dijkstra[?] and Wolfram [?], which can be explored using FEniCS and which we presentin non-technical form in My Book of Knols [?].39.2 The 1st and 2nd Laws of ThermodynamicsHeat, a quantity which functions to animate, derives from an internal firelocated in the left ventricle. (Hippocrates, 460 B.C.)Thermodynamics is fundamental in a wide range of phenomena from macroscopicto microscopic scales. Thermodynamics essentially concerns the interplaybetween heat energy and kinetic energy in a gas or fluid. Kinetic energy, or mechanicalenergy, may generate heat energy by compression or turbulent dissipation.Heat energy may generate kinetic energy by expansion, but not througha reverse process of turbulent dissipation. The industrial society of the 19thcentury was built on the use of steam engines, and the initial motivation to understandthermodynamics came from a need to increase the efficiency of steamengines for conversion of heat energy to useful mechanical energy. Thermodynamicsis closely connected to the dynamics of slightly viscous and compressiblegases, since substantial compression and expansion can occur in a gas, but lessin fluids (and solids).332

Johan Hoffman, Claes Johnson and Murtazo NazarovThe development of classical thermodynamics as a rational science based onlogical deduction from a set of axioms, was initiated in the 19th century byCarnot [?], Clausius [?] and Lord Kelvin [?], who formulated the basic axiomsin the form of the 1st Law and the 2nd Law of thermodynamics. The 1st Lawstates (for an isolated system) that the total energy, the sum of kinetic and heatenergy, isconserved. The 1st Law isnaturally generalized toinclude also conservationof mass and Newton’s law of conservation of momentum and then can beexpressed as the Euler equations for a gas/fluid with vanishing viscosity.The 2nd Law has the form of an inequality dS ≥ 0 for a quantity namedentropy denoted by S, with dS denoting change thereof, supposedly expressinga basic feature of real thermodynamic processes. The classical 2nd Law statesthat the entropy cannot decrease; it may stay constant or it may increase, but itcan never decrease (for an isolated system).The role of the 2nd Law is to give a scientific basis to the many observationsof irreversible processes, that is, processes which cannot be reversed in time, likerunning a movie backwards. Time reversal of a process with strictly increasingentropy, would correspond to a process with strictly decreasing entropy, whichwould violate the 2nd Law and therefore could not occur. A perpetum mobilewouldrepresent a reversible process and so the role ofthe 2nd Law is in particulartoexplainwhyitisimposssible toconstructaperpetummobile, andwhytimeis moving forward in the direction an arrow of time, as expressed by Max Planck[?, ?, ?]: Were it not for the existence of irreversible processes, the entire edifice ofthe 2nd Law would crumble.While the 1st Law in the form ofthe Eulerequations expressing conservationof mass, momentum and total energy can be understood and motivated on rationalgrounds, the nature of the 2nd Law is mysterious. It does not seem to be aconsequenceofthe1st Law,since theEulerequationsseemtobe timereversible,andtheroleofthe2ndLawistoexplain irreversibility. Thusquestionsareliningup: nIf the 2nd Law is a new independent law of Nature, how can it be justified?What is the physical significance of that quantity named entropy, which Naturecan only get more of and never can get rid of, like a steadily accumulating heapof waste? What mechanism prevents Nature from recycling entropy? How canirreversiblity arise in a reversible system? How can viscous dissipation arise inasystemwithvanishing viscosity? WhyistherenoMaxwelldemon[?]? Whycana gas by itself expand into a larger volume, but not by itself contract back again,if the motion of the gas molecules is governed by the reversible Newton’s laws ofmotion? Why is there an arrow of time? This article presents answers.39.3 The EnigmaThose who have talked of “chance” are the inheritors of antique superstitionand ignorance...whose minds have never been illuminated by a ray of scien-333

Computational Thermodynamicstific thought. (T. H. Huxley)Thesewerethequestionswhichconfrontedscientistsinthelate 19thcentury,after the introduction of the concept of entropy by Clausius in 1865, and theseshowed to be tough questions to answer. After much struggle, agony and debate,the agreement of the physics community has become to view statistical mechanicsbased on an assumption of molecular chaos as developed by Boltzmann [?],to offer a rationalization of the classical 2nd Law in the form of a tendency of(isolated) physical processes to move from improbable towards more probablestates, or from ordered to less ordered states. Boltzmann’s assumption of molecularchaos in a dilute gas of colliding molecules, is that two molecules about tocollide have independent velocities, which led to the H-theorem for Boltzmann’sequations stating that a certain quantity denoted by H could not decrease andthus could serve as an entropy defining an arrow of time. Increasing disorderwould thus represent increasing entropy, and the classical 2nd Law would reflectthe eternal pessimistists idea that things always get more messy, and thatthere is really no limit to this, except when everything is as messy as it can everget. Of course, experience could give (some) support this idea, but the troubleis that it prevents things from ever becoming less messy or more structured,and thus may seem a bit too pessimistic. No doubt, it would seem to contradictthe many observations of emergence of ordered non-organic structures (like crystalsor waves and cyclons) and organic structures (like DNA and human beings),seemingly out of disordered chaos, as evidenced by the physics Nobel LaureateRobert Laughlin [?].Most trained thermodynamicists would here say that emergence of order outof chaos, in fact does not contradict the classical 2nd Law, because it concerns“non-isolated systems”. But they would probably insist that the Universe as awhole (isolated system) would steadily evolve towards a “heat-death” with maximalentropy/disorder(andnolife),thusfulfilling thepessimistsexpectation. Thequestion from where the initial order came from, would however be left open.Thestandardpresentationofthermodynamicsbasedonthe1stand2ndLaws,thusinvolves amixture ofdeterministic models(Boltzmann’sequationswith theH-theorem) based on statistical assumptions (molecular chaos) making the subjectadmittedly difficult to both learn, teach and apply, despite its strong importance.This is primarily because the question why necessarily dS ≥ 0 and neverdS < 0, is not given a convincing understandable answer. In fact, statisticalmechanics allows dS < 0, although it is claimed to be very unlikely. The basicobjective of statistical mechanics as the basis of classical thermodynamics, thusis to (i) give the entropy a physical meaning, and (ii) to motivate its tendencyto (usually) increase. Before statistical mechanics, the 2nd Law was viewed asan experimental fact, which could not be rationalized theoretically. The classicalview on the 2nd Law is thus either as a statistical law of large numbers or asa an experimental fact, both without a rational deterministic mechanistic theo-334

Johan Hoffman, Claes Johnson and Murtazo Nazarovretical foundation. The problem with thermodynamics in this form is that it isunderstood by very few, if any:• Every mathematician knowsit is impossible to understand an elementary course inthermodynamics. (V. Arnold)• ...no one knows what entropy is, so if you in a debate use this concept, you willalways have an advantage. (von Neumann to Shannon)• As anyone who has taken a course in thermodynamics is well aware, the mathematicsused in proving Clausius’ theorem (the 2nd Law) is of a very special kind,having only the most tenous relation to that known to mathematicians. (S. Brush[?])• Where does irreversibility come from? It does not come form Newton’s laws. Obviouslythere must be some law, some obscure but fundamental equation. perhaps inelectricty, maybe in neutrino physics, in which it does matter which way time goes.(Feynman [?])• For three hundred years science has been dominated by a Newtonian paradigmpresentingtheWorldeitherasasterilemechanicalclockorinastateofdegenerationand increasing disorder...It has always seemed paradoxical that a theory based onNewtonian mechanics can lead to chaos just because the number of particles islarge, and it is subjectivly decided that their precise motion cannot be observed byhumans... IntheNewtonianworldofnecessity, thereisnoarrowoftime. Boltzmannfound an arrow hidden in Nature’s molecular game of roulette. (Paul Davies [?])• The goal of deriving the law of entropy increase from statistical mechanics has sofar eluded the deepest thinkers. (Lieb [?])• Therearegreat physicists whohavenotunderstoodit. (EinsteinaboutBoltzmann’sstatistical mechanics)39.4 Computational FoundationIn this note we present a foundation of thermodynmaics, further elaborated in[?, ?], where the basic assumption of statistical mechanics of molecular chaos, isreplaced by deterministic finite precision computation, more precisely by a leastsquares stabilized finite element method for the Euler equations, referred to asEulerGeneralGalerkin orEG2. Inthespirit ofDijkstra[?], wethusview EG2asthephysical modelof thermodynamics, that isthe Eulerequations togetherwitha computational solution procedure, and not just the Euler equations withoutconstructive solution procedure as in a classical non-computational approach.UsingEG2asamodelofthermodynamicschangesthequestionsandanswersand opens new possibilities of progress together with new challenges to mathematicalanalysis and computation. The basic new feature is that EG2 solutions335

Computational Thermodynamicsare computed and thus are available to inspection. This means that the analysisof solutions shifts from a priori to a posteriori; after the solution has beencomputed it can be inspected.Inspecting computed EG2 solutions we find that they are turbulent and haveshocks, which is identified by pointwise large Euler residuals, reflecting thatpointwise solutions to the Euler equations are lacking. The enigma of thermodynamicsis thus the enigma of turbulence (since the basic nature of shocks isunderstood). Computational thermodynamics thus essentially concerns computationalturbulence. In this note and [?] we present evidence that EG2 opens toa resolution of the enigma of turbulence and thus of thermodynamics.The fundamental question concerns wellposedness in the sense of Hadamard,that is what aspects or outputs of turbulent/shock solutions are stable underperturbations in the sense that small perturbations have small effects. We showthatwellposednessofEG2solutionscanbetestedaposterioribycomputationallysolving a dual linearized problem, through which the output sensitivity of nonzeroEuler residuals can be estimated. We find that mean-value outputs such asdrag and lift and total turbulent dissipation are wellposed, while point-values ofturbulent flow are not. We can thus a posteriori in a case by case manner, assessthe quality of EG2 solutions as solutions of the Euler equations.We formulate a 2nd Law for EG2 without the concept of entropy, in terms ofthe basic physical quantities of kinetic energy K, heat energy E, rate of work Wand shock/turbulent dissipation D > 0. The new 2nd Law reads˙K = W − D, Ė = −W + D, (39.1)where the dot indicates time differentiation. Slightly viscous flow always developsturbulence/shocks with D > 0, and the 2nd Law thus expresses an irreversibletransfer of kinetic energy into heat energy, while the total energy E + Kremains constant.With the 2nd Law in the form (39.1), we avoid the (difficult) main task ofstatistical mechanics of specifying the physical significance of entropy and motivatingits tendency to increase by probabilistic considerations based on (tricky)combinatorics. Thus using Ockham’s razor [?], we rationalize a scientific theoryof major importance making it both more understandable and more useful. Thenew 2nd Law is closer to classical Newtonian mechanics than the 2nd Law ofstatistical mechanics, and thus can be viewed to be more fundamental.The new 2nd Law is a consequence of the 1st Law in the form of the Eulerequations combined with EG2 finite precision computation effectively introducingviscosity and viscous dissipation. These effects appear as a consequence ofthe non-existence of pointwise solutions to the Euler equations reflecting instablitiesleading to the development shocks and turbulence in which large scalekinetic energy is transferred to small scale kinetic energy in the form of heat energy.The viscous dissipation can be interpreted as a penalty on pointwise large336

Johan Hoffman, Claes Johnson and Murtazo NazarovEulerresidualsarisinginshocks/turbulence, withthepenaltybeingdirectlycoupledtotheviolationfollowingaprincipleofcriminallawexposedin[?].EG2thusexplains the 2nd Law as a consequence of the non-existence of pointwise solutionswith small Euler residuals. This offers an understanding to the emergenceof irreversible solutions of the formally reversible Euler equations. If pointwisesolutions had existed, they would have been reversible without dissipation, butthey don’t exist, and the existing computational solutions have dissipation andthus are irreversible.39.5 Viscosity SolutionsAn EG2solution can be viewed as particular viscosity solution ofthe Euler equations,which is a solution of regularized Euler equations augmented by additiveterms modeling viscosity effects with small viscosity coefficients. The effectiveviscosity in an EG2 solution typically may be comparable to the mesh size.For incompressible flow the existence of viscosity solutions, with suitable solutiondependent viscosity coefficients, can be proved a priori using standardtechniques of analytical mathematics. Viscosity solutions are pointwise solutionsof the regularized equations. But already the most basic problem withconstant viscosity, the incompressible Navier-Stokes equations for a Newtonianfluid,presentstechnicaldifficulties,andisoneoftheopenClayMillenniumProblems.Forcompressibleflowthetechnicalcomplicationsareevenmoresevere,anditisnotclearwhichviscositieswouldberequiredforananalytical proofoftheexistenceof viscosity solutions [?] to the Euler equations. Furthermore, the questionof wellposedness is typically left out, as in the formulation of the Navier-StokesMillennium Problem, with the motivation that first the existence problem has tobe settled. Altogether, analytical mathematics seems to have little to offer a prioriconcerning the existence and wellposedness of solutions of the compressibleEuler equations. In contrast, EG2 computational solutions of the Euler equationsseem to offer a wealth of information a posteriori, in particular concerningwellposedness by duality.An EG2 solution thus can be viewed as a specific viscosity solution with aspecific regularization from the least squares stabilization, in particular of themomentum equation, which is necessary because pointwise momentum balanceis impossible to achieve in the presence of shocks/turbulence. The EG2 viscositycan be viewed to be the minimal viscosity required to handle the contradictionbehind the non-existence of pointwise solutions. For a shock EG2 could then bedirectly interpreted as a certain physical mechanism preventing a shock wavefrom turning over, and for turbulence as a form of automatic computational turbulencemodel.EG2 thermodynamics can be viewed as form of deterministic chaos, where337

Computational Thermodynamicsthe mechanism is open to inspection and can be used for prediction. On theotherhand, the mechanism of statistical mechanics is notopen to inspection andcan only be based on ad hoc assumption, as noted by e.g. Einstein [?]. If Boltzmann’sassumption of molecular chaos cannot be justified, and is not needed,why consider it at all, [?]?◮ Editor note: Missing figure.Figure 39.1: Joule’s 1845 experiment39.6 Joule’s 1845 ExperimentTo illustrate basic aspects of thermodynamics, we recall Joule’s experiment from1845withagasinitiallyatrestwithtemperature T = 1atacertainpressureinacertain volume immersed into a container of water, see Fig. 39.1. At initial timea valve was opened and the gas was allowed to expand into the double volumewhile the temperature change in the water was carefully measured by Joule. Tothe great surprise of both Joule and the scientific community, no change of thetemperature of the water could be detected, in contradiction with the expectationthat the gas would cool off under expansion. Moreover, the expansion wasimpossible to reverse; the gas had no inclination to contract back to the originalvolume. Simulating Joule’s experiment using EG2, we discover the following asdisplayed in Fig. 39.2-39.7Figure 39.2: Density at two time instantsFigure 39.3: Temperature at two time instantsIn a first phase the temperature drops below 1 as the gas expands with increasingvelocity, and in a second phase shocks/turbulence appear and heat thegas towards a final state with the gas at rest in the double volume and the temperatureback to T = 1. The total (heat) energy is, of course, conserved since thedensity is reduced by a factor 2 after expansion to double volume. We can alsounderstand that the rapidity ofthe expansion process makes it difficult to detectany temperature drop in the water in the inital phase. Altogether, using EG2we can first simulate and then understand Joule’s experiment, and we thus seeno reason to be surprised. We shall see below as a consequence of the 2nd Lawthat reversal of the process with the gas contracting back to the original small338

Johan Hoffman, Claes Johnson and Murtazo NazarovFigure 39.4: Average density in left and right chamberFigure 39.5: Average temperature in left and right chambervolume, is impossible because the only way the gas can be put into motion is byexpansion, and thus contraction is impossible.In statistical mechanics the dynamics of the process would be dismissed andonlytheinitial andfinal statewouldbe subject toanalysis. The finalstate wouldthen be viewed as being “less ordered” or “more probable” or having “higher entropy”,because the gas would occupy a larger volume, and the reverse processwith the gas contracting back to the initial small volume, if not completely impossible,would be “improbable”. But to say that a probable state is is moreprobable than an improbable state is more mystifying than informative. Takingthe true dynamics of the process into account including in particular the secondphase with heat generation from shocks or turbulence, we can understand theobservation of constant temperature and irreversibility in a deterministic fashionwithout using any mystics of entropy based on mystics of statistics. In [?] wedevelop a variety of aspects of the arrow of time enforced by the new 2nd Law.39.7 The Euler EquationsWe consider the Euler equations for an inviscid perfect gas enclosed in a volumeΩ in R 3 with boundary Γ over a time interval I = (0, 1] expressing conservationof mass density ρ, momentum m = (m 1 , m 2 , m 3 ) and internal energy e: Find û =(ρ, m, e) depending on (x, t) ∈ Q ≡ Ω × I such thatR ρ (û) ≡ ˙ρ + ∇ · (ρu) = 0 in Q,R m (û) ≡ ṁ + ∇ · (mu + p) = f in Q,R e (û) ≡ ė + ∇ · (eu) + p∇ · u = g in Q,u · n = 0 on Γ × Iû(·, 0) = û 0 in Ω,(39.2)where u = m is the velocity, p = (γ − 1)e with γ > 1 a gas constant, f is aρgiven volume force, g a heat source/sink and û 0 a given initial state. We hereexpress energy conservation in terms of the internal energy e = ρT, with T thetemperature, and not as conservation of the total energy ǫ = e + k with k = ρv22the kinetic energy, in the form ˙ǫ + ∇ · (ǫu) = 0. Because of the appearance ofFigure 39.6: Average kinetic energy and temperature: short time339

Computational ThermodynamicsFigure 39.7: Average kinetic energy in both, left and right chamber(s): long timeshocks/turbulence, the Euler equations lack pointwise solutions, except possiblefor short time, and regularization is therefore necessary. For a mono-atomic gasγ = 5/3and(39.2) thenisaparameter-free model,theidealform ofmathematicalmodel according to Einstein...39.8 Energy Estimates for Viscosity SolutionsFor the discussion we consider the following regularized version of (39.2) assumingfor simplicity that f = 0 and g = 0: Find û ν,µ ≡ û = (ρ, m, e) such thatR ρ (û) = 0 in Q,R m (û) = −∇ · (ν∇u) + ∇(µp∇ · u) in Q,R e (û) = ν|∇u| 2 in Q,u = 0 on Γ × I,û(·, 0) = û 0 in Ω,(39.3)where ν > 0 is a shear viscocity µ >> ν ≥ 0 if ∇ · u > 0 in expansion (with µ = 0if ∇ · u ≤ 0 in compression), is a small bulk viscosity, and we use the notation|∇u| 2 = ∑ i |∇u i| 2 . We shall see that the bulk viscosity is a safety feature puttinga limit to the work p∇ · u in expansion appearing in the energy balance.We notethat onlythe momentumequation is subject to viscous regulariztion.Further, we note that the shear viscosity term in the momentum equation multipliedby the velocity u (and formally integrated by parts) appears as a positiveright hand side in the equation for the internal energy, reflecting that the dissipationfrom shearviscosity is transformedinto internal heat energy. Incontrast,the dissipation from the bulk viscosity represents another form of internal energynotaccountedforasheatenergy, acting onlyasasafety feature inthe sensethat its contribution to the energy balance in general will be small, while thatfrom the shear viscosity in general will be substantial reflecting shock/turbulentdissipation.BelowwewillconsiderinsteadregularizationbyEG2withtheadvantagethattheEG2solutioniscomputedandthusisavailable toinspection,while û ν,µ isnot.We shall see that EG2 regularization can be interpreted as a (mesh-dependent)combination of bulk and shear viscosity and thus (39.3) can be viewed as ananalytical model of EG2 open to simple form of analysis in the form of energyestimates.As indicated, the existence of a pointwise solution û = û ν,µ to the regularizedequations (39.3) is an open problem of analytical mathematics, although withsuitable additional regularization it could be possible to settle [?]. Fortunately,we can leave this problem aside, since EG2 solutions will be shown to exist a340

Johan Hoffman, Claes Johnson and Murtazo Nazarovposteriori by computation. We thus formally assume that (39.3) admits a pointwisesolution, and derive basic energy estimates which will be paralleled belowfor EG2. We thus use the regularized problem (39.3) to illustrate basic featuresof EG2, including the 2nd Law.We shall prove now that a regularized solution û is an approximate solutionoftheEulerequationsin thesensethat R ρ (û) = 0and R e (û) ≥ 0pointwise, R m (û)is weakly small in the sense that√ ν‖R m (û)‖ −1 ≤ √ + √ µ 0 for t ∈ I),and to this end we rewrite the equation for the internal energy as follows:D u e + γe∇ · u = ν|∇u| 2 ,341

Computational Thermodynamicswhere D u e = ė + u · ∇e is the material derivative of e following the fluid particleswith velocity u. Assuming that e(x, 0) > 0 for x ∈ Ω, it follows that e(x, 1) > 0 forx ∈ Ω, and thus E(1) > 0. Assuming K(0) + E(0) = 1 the energy estimate (39.7)thus shows that ∫µp(∇ · u) 2 dxdt ≤ 1, (39.8)Qand also that E(t) ≤ 1 for t ∈ I. Next, integrating (39.6) in space and time gives,assuming for simplicity that K(0) = 0,∫∫ ∫K(1) + ν(∆u) 2 dxdt = p∇ · udxdt − µp(∇ · u) 2 dxdt ≤ 1 ∫pdxdt ≤ 1QQQ µ Q µ ,where we used that ∫ pdxdt = (γ − 1) ∫ edxdt ≤ ∫ E(t)dt ≤ 1. It follows thatQ Q I∫Qν|∇u| 2 dxdt ≤ 1 µ . (39.9)By standard estimation (assuming that p is bounded), it follows from (39.8) and(39.9) that‖R m (û)‖ −1 ≤ C( √ √ νµ + √ ), µwith C a constant of moderate size, which completes the proof. As indicated,‖R m (û)‖ −1 is estimated by computation, as shown below. The role of the analysisis thus to rationalize computational experience, not to replace it.39.9 Compression and ExpansionThe2ndLaw(39.5)statesthatthereisatransferofkineticenergytoheatenergyif W < 0, that is under compression with ∇ · u < 0, and a transfer from heat tokinetic energy if W > 0, that is under expansion with ∇ · u > 0. Returning toJoule’s experiment, we see by the 2nd Law that contraction back to the originalvolume from the final rest state in the double volume, is impossible, because theonly way the gas can be set into motion is by expansion. To see this no referenceto entropy is needed.39.10 A 2nd Law witout EntropyWe note that the 2nd Law (39.5) is expressed in terms of the physical quantitiesof kinetic energy K, heat energy E, work W, and dissipation D and doesnot involve any concept of entropy. This relieves us from the task of finding aphysical significance of entropy and justification of a classical 2nd Law stating342

Johan Hoffman, Claes Johnson and Murtazo Nazarovthat entropy cannot decrease. We thus circumvent the main difficulty of classicalthermodynamics based on statistical mechanics, while we reach the samegoal as statistical mechanics of explaining irreversibility in formally reversibleNewtonian mechanics.We thus resolve Loschmidt’s paradox [?] asking how irreversibility can occurin a formally reversible system, which Boltzmann attempted to solve. ButLoschmidt pointed out that Boltzmann’s equations are not formally reversible,because of the assumption of molecular chaos that velocities are independentbefore collision, and thus Boltzmann effectively assumes what is to be proved.Boltzmann and Loschmidt’s met in heated debates without conclusion, but afterBoltzmann’s tragic death followed by the experimental verification of the molecularnature of gases, Loschmidt’s paradox evaporated as if it had been resolved,while it had not. Postulating molecular chaos still amounts to assume what is tobe proved.39.11 Comparison with Classical ThermodynamicsClassical thermodynamics is based on the relationTds = dT + pdv, (39.10)where ds represents change of entropy s per unit mass, dv change of volume anddT denotesthechangeoftemperature T perunitmass,combinedwitha2ndLawin the form ds ≥ 0. On the other hand, the new 2nd Law takes the symbolic formdT + pdv ≥ 0, (39.11)effectively expressing that Tds ≥ 0, which is the same as ds ≥ 0 since T > 0.In symbolic form the new 2nd Law thus expresses the same as the classical 2ndLaw, without referring to entropy.Integrating the classical 2nd Law (39.10) for a perfect gas with p = (γ − 1)ρTand dv = d( 1 ρ ) = −dρ ρ 2 , we getds = dT T + p T d(1 ρ ) = dT T+ (1 − γ)dρρ ,and we conclude that with e = ρT,s = log(Tρ 1−γ ) = log( e ) = log(e) − γ log(ρ) (39.12)ργ up to a constant. Thus, the entropy s = s(ρ, e) for a perfect gas is a function ofthe physical quantities ρ and e = ρT, thus a state function, suggesting that smight have a physical significance, because ρ and e have. We thus may decide343

Computational Thermodynamicsto introduce a quantity s defined this way, but the basic questions remains: (i)What is the physical significance of s? (ii) Why is ds ≥ 0? What is the entropynon-perfect gas in which case s may not be a state function?To further exhibit the connection between the classical and new forms of the2nd Law, we observe that by the chain rule,ρD u s = ρ e D ue − γD u ρ = 1 T (D ue + γρT ∇ · u) = 1 T (D ue + e∇ · u + (γ − 1)ρT ∇ · u)since by mass conservation D u ρ = −ρ∇ · u. It follows that the entropy S = ρssatisfiesṠ + ∇ · (Su) = ρD u s = 1 T (ė + ∇ · (eu) + p∇ · u) = 1 T R e(û). (39.13)A solution û of the regularized Euler equations (39.3) thus satisfiesṠ + ∇ · (Su) = ν T |∇u|2 ≥ 0 in Q, (39.14)where S = ρ log(eρ −γ ). In particular, in the case of the Joule experiment with Tthe same in the initial and final states, we have s = γ log(V ) showingan increaseof entropy in the final state with larger volume.We sum up by noting that the classical and new form of the second law effectivelyexpress the same inequality ds ≥ 0 or Tds ≥ 0. The new 2nd law isexpressed in terms of the fundamental concepts of of kinetic energy, heat energyand work without resort to any form of entropy and statistical mechanics withall its complications. Of course, the new 2nd Law readily extends to the case of ageneral gas.39.12 EG2EG2in cG(1)cG(1)-form for the Euler equations (39.2), reads: Find û = (ρ, m, ǫ) ∈V h such that for all (¯ρ, ū, ¯ǫ) ∈ W h((R ρ (û), ¯ρ)) + ((hu · ∇ρ, u · ∇¯ρ)) = 0,((R m (û), ū)) + ((hu · ∇m, u · ∇ū)) + (ν sc ∇u, ∇ū)) = 0,((R ǫ (û), ē)) + ((hu · ∇ǫ, u · ∇¯ǫ)) = 0,(39.15)where V h is a trial space ofcontinuouspiecewise linear functionson a space-timemesh of size h satisfying the initial condition û(0) = û 0 with u ∈ V h defined bynodal interpolation of m ρ , and W h is a corresponding test space of function whicharecontinuouspiecewise linearin space andpiecewise constantintime, allfunctionssatisfyingtheboundary condition u ·n = 0 at thenodeson Γ. Further, ((·, ·))344

Johan Hoffman, Claes Johnson and Murtazo Nazarovdenotes relevant L 2 (Q) scalar products, and ν sc = h 2 |R m (û)| is a residual dependentshock-capturingviscosity, see[?]. Wehereusetheconservation equationforthe total energy ǫ rather than for the internal energy e.EG2 combines a weak satisfaction of the Euler equations with a weightedleast squares control of the residual R(û) ≡ (R ρ (û), R m (û), R e (û)) and thus representsa midway between the Scylla of weak solution and Carybdis of leastsquares strong solution.39.13 The 2nd Law for EG2Subtracting the mass equation with ¯ρ a nodal interpolant of |u|22from the momentumequation with ū = u and using the heat energy equation with ē = 1,we obtain the following 2nd Law for EG2 (up to a √ h-correction controled by theshockcapturing viscosity [?]):where˙K = W − D h , Ė = −W + D h , (39.16)D h = ((hρu · ∇u, u · ∇u)). (39.17)For solutions with turbulence/shocks, D h > 0 expressing an irreversible transferof kinetic energy into heat energy, just as above for regularized solutions. Wenote that in EG2 only the momentum equation is subject to viscous regularization,sinceD h expressesapenaltyon u·∇uappearing inthemomentumresidual.39.14 The Stabilization in EG2The stabilization in EG2 is expressed by the dissipative term D h which can beviewed as a weighted least squares control of the term ρu · ∇u in the momentumresidual. The rationale is that least squares control of a part of a residual whichis large, effectively may give control of the entire residual, and thus EG2 givesa least squares control of the momentum residual. But the EG2 stabilizationdoes not correspond to an ad hoc viscosity, as in classical regularization, but toa form of penalty arsing because Euler residuals of turbulent/shock solutionsare not pointwise small. In particular the dissipative mecahnism of EG2 doesnot correspond to a simple shear viscosity, but rather to a form of “streamlineviscosity” preventing fluid particles from colliding while allowing strong shear.39.15 Output Uniqueness and StabilityDefiningamean-valueoutputintheformofaspace-timeintegral ((û, ψ))definedby asmoothweight function ψ, weobtain by duality-based errorestimation as in345

Computational Thermodynamics[?] an a posteriori error estimate of the form|((û, ψ)) − ((ŵ, ψ))| ≤ S(‖R(û)‖ −1 + (‖R(ŵ)‖ −1 ) ≤ S(‖hR(û)‖ 0 + (‖hR(ŵ)‖ 0 ),where û and ŵ are two EG2 solutions on meshes of meshsize h, S = S(û, ŵ) isa stablity factor defined as the H 1 (Q)-norm of a solution to a linearized dualproblem with coedd with the data ψ and ‖ · ‖ 0 is the L 2 (Q)-norm. An output((û, ˆψ)) is wellposed if S‖hR(û)‖ 0 ≤ TOL with S = S(û, û)) and TOL a smalltolerance TOL of interest.In the case shocks/turbulence ‖R(û)‖ 0 will be large ∼ h −1/2 , while ‖hR(û)‖ 0may be small ∼ h 1/2 , and an output thus is wellposed if S

CHAPTER40Saddle Point StabilityBy Marie E. RognesChapter ref: [rognes]The stability of finite element approximations for abstract saddle point problemshas been an active research field for the last four decades. The well-knownBabuska- Brezzi conditions provide stability for saddle point problems of theforma(v, u) + b(v, p) + b(u, q) = 〈f, v〉 + 〈g, q〉 ∀(v, q) ∈ V × Q, (40.1)where a, b are bilinear forms and V, Q Hilbert spaces. For a choice of discretespaces V h ⊂ V and Q h ⊂ Q,thecorrespondingdiscreteBabuska-Brezziconditionsguarantee stability.However, there are finite element spaces used in practice, with success1, thatdo notsatisfy thestability conditions in general. The elementspaces may satisfythe conditions of certain classes of meshes. Or, there are only a few spuriousmodes to be filtered out before the method is stable.Thetask ofdeterminingthestability ofagiven setoffiniteelementspacesfora given set of equations has mainly been a manual task. However, the flexibilityof the FEniCS components has made automation feasible.For each set of discrete spaces, the discrete Brezzi conditions can be equivalentlyformulated in terms of an eigenvalue problem. For instance, [...] whereB is the element matrix associated with the form b and M, N are the matricesinduced by the inner-products on V and Q respectively. Hence, the stability ofa set of finite element spaces on a type of meshes can be tested numerically bysolving a series of eigenvalue problems.A small library FEAST (Finite Element Automatic Stability Tester) has beenbuilt on top of the core FEniCS components, providing automated functionality347

Saddle Point Stabilityfor the testing of finite element spaces for a given equation on given meshes.With some additional input, convergence rates and in particular optimal choicesof element (in some measure such as error per degrees of freedom) can be determined.In this note, the functionality provided by FEAST is explained and resultsfor equations such as the Stokes equations, Darcy flow and mixed elasticity aredemon- strated.348

References[AB85][ABD84]D. N. Arnold and F. Brezzi. Mixed and nonconforming finite elementmethods: implementation, postprocessing and error estimates.RAIRO Modél. Math. Anal. Numér., 19(1):7–32, 1985.Douglas N. Arnold, Franco Brezzi, and Jim Douglas, Jr. PEERS: anew mixed finite element for plane elasticity. Japan J. Appl. Math.,1(2):347–367, 1984.[ADK + 98] ToddArbogast,ClintN.Dawson,PhilipT.Keenan,MaryF.Wheeler,and Ivan Yotov. Enhancedcell-centered finite differences for ellipticequations on general geometry. SIAM J. Sci. Comput., 19(2):404–425 (electronic), 1998.[AF89][AFS68]Douglas N. Arnold and Richard S. Falk. A uniformly accurate finiteelementmethodforthe Reissner–Mindlin plate. SIAM J. Num.Anal., 26:1276–1290, 1989.J.H. Argyris, I. Fried, and D.W. Scharpf. The TUBA family of plateelements for the matrix displacement method. The AeronauticalJournal of the Royal Aeronautical Society, 72:701–709, 1968.[AFW06a] D.N. Arnold, R.S. Falk, and R. Winther. Finite element exteriorcalculus, homological techniques, and applications. Acta Numerica,15:1–155, 2006.[AFW06b] Douglas N. Arnold, Richard S. Falk, and Ragnar Winther. Differentialcomplexes and stability of finite element methods. II. The elasticitycomplex. In Compatible spatial discretizations, volume 142 ofIMA Vol. Math. Appl., pages 47–67. Springer, New York, 2006.349

[AFW07]Douglas N. Arnold, Richard S. Falk, and Ragnar Winther. Mixedfinite element methods for linear elasticity with weakly imposedsymmetry. Math. Comp., 76(260):1699–1723 (electronic), 2007.[AGSNR80] J. C. Adam, A. Gourdin Serveniere, J.-C. Nédélec, and P.-A. Raviart.Study of an implicit scheme for integrating Maxwell’s equations.Comput. Methods Appl. Mech. Engrg., 22(3):327–346, 1980.[AM09][AO93][AO00][AW02]M. S. Alnæs and K.-A. Mardal. SyFi, 2009. http://www.fenics.org/wiki/SyFi/.M. Ainsworth and J.T. Oden. A unified approach to a posteriori errorestimation using element residual methods. Numerische Mathematik,65(1):23–50, 1993.Mark AinsworthandJ.Tinsley Oden. APosterioriErrorEstimationin Finite Element Analysis. Wiley and Sons, New York, 2000.Douglas N. Arnold and Ragnar Winther. Mixed finite elements forelasticity. Numer. Math., 92(3):401–419, 2002.[BBE + 04] Satish Balay, Kris Buschelman, Victor Eijkhout, William D. Gropp,Dinesh Kaushik, Matthew G. Knepley, Lois Curfman McInnes,Barry F. Smith, and Hong Zhang. PETSc users manual. TechnicalReport ANL-95/11 - Revision 2.1.5, Argonne National Laboratory,2004.[BCO81]E. B. Becker, G. F. Carey, and J. T. Oden. Finite Elements: An Introduction.Prentice–Hall, Englewood–Cliffs, 1981.[BDM85a] F. Brezzi, J. Douglas, Jr., and L. D. Marini. Variable degree mixedmethods for second order elliptic problems. Mat. Apl. Comput.,4(1):19–34, 1985.[BDM85b] Franco Brezzi, Jim Douglas, Jr., and L. D. Marini. Two familiesof mixed finite elements for second order elliptic problems. Numer.Math., 47(2):217–235, 1985.[BF91][BR78]Franco Brezzi and Michel Fortin. Mixed and hybrid finite elementmethods,volume15ofSpringerSeriesinComputationalMathematics.Springer-Verlag, New York, 1991.I. Babuška and W. C. Rheinboldt. A posteriori error estimates forthefiniteelementmethod. Int.J.Numer.Meth.Engrg., pages1597–1615, 1978.

[BR01][Bra77][Bra07][BS94][BS08][BW85][Cas79][CF89][Cia75][Cia76][Cia78][Cia02]R. Becker and R. Rannacher. An optimal control approach to a posteriorierror estimation in finite element methods. Acta Numerica,10:1–102, 2001.A. Brandt. Multi-level adaptive solutions to boundary-value problems.Mathematics of Computation, pages 333–390, 1977.Dietrich Braess. Finite elements. Cambridge University Press,Cambridge, third edition, 2007. Theory, fast solvers, and applicationsin elasticity theory, Translated from the German by Larry L.Schumaker.S. C. Brenner and L. R. Scott. The Mathematical Theory of FiniteElement Methods. Springer-Verlag, 1994.SusanneC.BrennerandL.RidgwayScott. Themathematicaltheoryof finite element methods, volume 15 of Texts in Applied Mathematics.Springer, New York, third edition, 2008.RE Bank and A. Weiser. Some a posteriori error estimators for ellipticpartial differential equations. Mathematics of Computation,pages 283–301, 1985.C. A. P. Castigliano. Théorie de l’équilibre des systèmes élastiques etses applications. A.F. Negro ed., Torino, 1879.M. Crouzeix and Richard S. Falk. Nonconforming finite elmentsfor the stokes problem. Mathematics of Computation, 52:437–456,1989.P.G. Ciarlet. Lectures on the finite element method. Lectures onMathematics and Physics, Tata Institute of Fundamental Research,Bombay, 49, 1975.P. G. Ciarlet. Numerical Analysis ofthe Finite Element Method. LesPresses de l’Universite de Montreal, 1976.P. G. Ciarlet. The Finite Element Method for Elliptic Problems.North-Holland, Amsterdam, New York, Oxford, 1978.Philippe G. Ciarlet. The finite element method for elliptic problems,volume40ofClassicsinAppliedMathematics. SocietyforIndustrialand Applied Mathematics (SIAM), Philadelphia, PA, 2002. Reprintof the 1978 original [North-Holland, Amsterdam; MR0520174 (58#25001)].

[Cou43][CR72][CR73][Dav04]R. Courant. Variational methods for the solution of problems ofequilibrium and vibrations. Bull. Amer. Math. Soc., pages 1–23,1943.P. G. Ciarlet and P.-A. Raviart. General Lagrange and Hermite interpolationin R n withapplications to finite elementmethods. Arch.Rational Mech. Anal., 46:177–199, 1972.M. Crouzeix and P A Raviart. Conforming and non-conforming finiteelement methods for solving the stationary Stokes equations.R.A.I.R.O Anal. Numer., 7:33–76, 1973.T.A. Davis. Algorithm 832: UMFPACK V4. 3an unsymmetricpatternmultifrontal method. ACM Transactions on MathematicalSoftware, 30(2):196–199, 2004.[EEHJ95] K. Eriksson, D. Estep, P. Hansbo, and C. Johnson. Introduction toadaptive methodsfor differential equations. Acta Numerica, 4:105–158, 1995.[EJ][EJ91][EJ95a][EJ95b][EJ95c][EJL98][Fel66]K. Eriksson and C. Johnson. Adaptive finite element methods forparabolicproblemsIII:Timestepsvariable inspace. inpreparation.K. Eriksson and C. Johnson. Adaptive finite element methods forparabolic problems I: A linear model problem. SIAM J. Numer.Anal., 28, No. 1:43–77, 1991.K. Eriksson and C. Johnson. Adaptive Finite Element Methods forParabolic Problems II: Optimal Error Estimates in$ L \ infty L 2$and$ L \ infty L \ infty$. SIAM Journal on Numerical Analysis,32:706, 1995.K. Eriksson and C. Johnson. Adaptive finite element methods forparabolic problems IV: Nonlinear problems. SIAM Journal on NumericalAnalysis, pages 1729–1749, 1995.K. Eriksson and C. Johnson. Adaptive finite element methodsfor parabolic problems V: Long-time integration. SIAM J. Numer.Anal., 32:1750–1763, 1995.K. Eriksson, C. Johnson, and S. Larsson. Adaptive finite elementmethods for parabolic problems VI: Analytic semigroups. SIAM J.Numer. Anal., 35:1315–1325, 1998.C.A. Felippa. Refined Finite Element Analysis of Linear and NonlinearTwo-dimensional Structures. PhD thesis, The University ofCalifornia at Berkeley, 1966.

[FY02][Gal15]R.D. Falgout and U.M. Yang. Hypre: A library of high performancepreconditioners. In Proceedings of the International Conferenceon Computational Science-Part III, pages 632–641. Springer-VerlagLondon, UK, 2002.B. G. Galerkin. Series solution of some problems in elastic equilibriumofrodsandplates.Vestnikinzhenerovitekhnikov,19:897–908,1915.[HBH + 05] Michael A. Heroux, Roscoe A. Bartlett, Vicki E. Howle, Robert J.Hoekstra, Jonathan J. Hu, Tamara G. Kolda, Richard B. Lehoucq,Kevin R. Long, Roger P. Pawlowski, Eric T. Phipps, Andrew G.Salinger, Heidi K. Thornquist, Ray S. Tuminaro, James M. Willenbring,Alan Williams, and Kendall S. Stanley. An overview of theTrilinos project. ACM Trans. Math. Softw., 31(3):397–423, 2005.[HL03][HS52][Hug87][Kir04][Li05][LL75][LM54][Log07]Peter Hansbo and Mats G. Larson. Discontinuous Galerkin and theCrouzeix–Raviart element: application to elasticity. Math. Model.Numer. Anal., 37:63–72, 2003.M.R. Hestenes and E. Stiefel. Methods of conjugate gradients forsolvinglinearsystems. J.Res.Nat.Bur.Stand,49(6):409–436, 1952.T. J. R. Hughes. The Finite Element Method: Linear Static andDynamic Finite Element Analysis. Prentice-Hall, 1987.R. C. Kirby. FIAT: A new paradigm for computing finite elementbasis functions. ACM Trans. Math. Software, 30:502–516, 2004.X.S. Li. An overview of SuperLU: Algorithms, implementation,and user interface. ACM Transactions on Mathematical Software(TOMS), 31(3):302–325, 2005.P. Lascaux and P. Lesaint. Some nonconforming finite elementsfor the plate bending problem. Rev. Française Automat. Informat.Recherche Operationnelle RAIRO Analyse Numérique, 9(R-1):9–53,1975.PD Lax and AN Milgram. Parabolic equations. Annals of MathematicsStudies, 33:167–190, 1954.Anders Logg. Efficient representation of computational meshes.In B. Skallerud and H. I. Andersson, editors, MekIT’07.Tapir Akademisk Forlag, 2007. http://www.ntnu.no/mekit07,http://butikk.tapirforlag.no/no/node/1073.

[Mor68][OD96][Ray70][Rit08][RKL08][RT77][RW83][SF73][SS86][Ver94][Ver99]L.S.D. Morley. The triangular equilibrium element in the solutionof plate bending problems. Aero. Quart., 19:149–169, 1968.J.T. Oden and L. Demkowicz. Applied functional analysis. CRCpress, 1996.Rayleigh. On the theory of resonance. Trans. Roy. Soc., A161:77–118, 1870.W. Ritz. Über eine neue Methode zur Lösung gewisser Variationsproblemeder mathematischen Physik. J. reine angew. Math.,135:1–61, 1908.M. Rognes, Robert C. Kirby, and Anders Logg. Efficient assembly ofh(div)and h(curl)conformingfiniteelements. SIAMJ.Sci. Comput.,2008. submitted by Marie 2008-10-24, resubmitted by Marie 2009-05-22.P.-A. Raviart and J. M. Thomas. A mixed finite element method for2nd order elliptic problems. In Mathematical aspects of finite elementmethods (Proc. Conf., Consiglio Naz. delle Ricerche (C.N.R.),Rome, 1975), pages 292–315. Lecture Notes in Math., Vol. 606.Springer, Berlin, 1977.T. F. Russell and M. F. Wheeler. The Mathematics of Reservoir Simulation,volume 1 of Frontiers Applied Mathematics, chapter Finiteelement andfinite difference methods for continuous flow in porousmedia, pages 35–106. SIAM, 1983.G. Strang and G. J. Fix. An Analysis of the Finite Element Method.Prentice-Hall, Englewood Cliffs, 1973.Y.SaadandM.H.Schultz. GMRES:Ageneralizedminimalresidualalgorithm for solving nonsymmetric linear systems. SIAM J. Sci.Stat. Comput., 7(3):856–869, 1986.R. Verfürth. A posteriori error estimation and adaptive meshrefinementtechniques. In Proceedings of the fifth internationalconference on Computational and applied mathematics table of contents,pages67–83.ElsevierSciencePublishersBVAmsterdam,TheNetherlands, The Netherlands, 1994.R. Verfürth. A review ofaposteriori errorestimation techniques forelasticity problems. Computer Methods in Applied Mechanics andEngineering, 176(1-4):419–440, 1999.

[Wes92][ZTZ67][ZZ87]P. Wesseling. An introduction to multigrid methods. Wiley Chichester,1992.O. C. Zienkiewicz, R. L. Taylor, and J. Z. Zhu. The Finite ElementMethod — Its Basis and Fundamentals, 6th edition. Elsevier, 2005,first published in 1967.OC Zienkiewicz and JZ Zhu. A simple error estimator and adaptiveprocedure for practical engineerng analysis. International Journalfor Numerical Methods in Engineering, 24(2), 1987.

APPENDIXANotationThe following notation is used throughout this book.A – the global tensor with entries {A i } i∈IA K – the element tensor with entries {A K i } i∈I KA 0 – the reference tensor with entries {A 0 iα } i∈I K ,α∈Aa – a multilinearforma K – the local contribution to a multilinearform a from a cell KA – the set of secondary indicesB – the set of auxiliary indicese – the error, e = u h − uF K – the mapping from the reference cell K 0 to KG K – the geometry tensor with entries {G α K } α∈AI – the set ∏ ρj=1 [1,Nj ] of indices for the global tensor AI K – the set ∏ ρj=1 [1,nj K ] of indices for the element tensor AK (primary indices)ι K – the local-to-global mapping from [1,n K ] to [1,N]K – a cell in the mesh TK 0 – the reference cellL – a linear form (functional) on ˆV or ˆV hL – the degrees of freedom (linear functionals) on V hL K – the degrees of freedom (linear functionals) on P KL 0 – the degrees of freedom (linear functionals) on P 0N – the dimension of ˆV h and V hn K – the dimension of P Kl i – a degree of freedom (linear functional) on V hl K i – a degree of freedom (linear functional) on P Kl 0 i – a degree of freedom (linear functional) on P 0P K – the local function space on a cell KP 0 – the local function space on the reference cell K 0357

P q (K) – the space of polynomials of degree ≤ q on Kr – the (weak) residual, r(v) = a(v,u h ) − L(v) or r(v) = F(u h ;v)u h – the finite element solution, u h ∈ V hU – the vector of degrees of freedom for u h = ∑ Ni=1 U iφ iu – the exact solution of a variational problem, u ∈ VˆV – the test spaceV – the trial spaceˆV ∗ – the dual test space, ˆV ∗ = V 0V ∗ – the dual trial space, V ∗ = ˆVˆV h – the discrete test spaceV h – the discrete trial spaceφ i – a basis function in V hˆφ i – a basis function in ˆV hφ K i – a basis function in P KΦ i – a basis function in P 0z – the dual solution, z ∈ V ∗T – the mesh, T = {K}Ω – a bounded domain in R d

List of Authors◮ Editor note: Include author affiliations here.◮ Editor note: Sort alphabetically by last name, not first.• Anders Logg• Andy R. Terrel• Bjørn Fredrik Nielsen• Claes Johnson• David B. Davidson• Emil Alf Løvgren• Evan Lezar• Garth N. Wells• Hans Joachim Schroll• Hans Petter Langtangen• Ilmar M. Wilbers• Joakim Sundnes• Johan Hake• Johan Hoffman• Johan Jansson359

• Kent-Andre Mardal• Kristian B. Ølgaard• Kristian Valen-Sendstad• Kristoffer Selim• L. Ridgway Scott• L. Trabucho• Luca Antiga• Marie E. Rognes• Martin S. Alnæs• Matthew G. Knepley• Mehdi Nikbakht• Murtazo Nazarov• Niclas Jansson• N. Lopes• Oddrun Christine Myklebust• Ola Skavhaug• P. Pereira• Robert C. Kirby• Susanne Hentschel• Svein Linge• Xuming Shan

Index∇, 165instant-clean,136instant-showcache,136BasisFunctions,161BasisFunction,161ComponentTensor,162Constant, 161Dx, 165Expr, 172FiniteElement,156Form, 159Functions, 161Function, 161Identity, 160IndexSum, 162Indexed, 162Index, 162Integral, 159ListTensor, 162Measure, 159MixedElement,156Operator, 172TensorConstant,161TensorElement,156Terminal, 160, 172TestFunctions,161TestFunction,161TrialFunctions,161TrialFunction,161VectorConstant,161VectorElement,156action, 168adjoint, 168as matrix, 162as tensor, 162as vector, 162cos, 164cross, 164curl, 165derivative, 168det, 164diff, 165div, 165dot, 164dx, 165energy norm, 168exp, 164grad, 165indices, 162inner, 164inv, 164lhs, 168ln, 164outer, 164pow, 164replace, 168rhs, 168rot, 165sensitivity rhs, 168sin, 164361

split, 161sqrt, 164system, 168transpose, 164tr, 164UFL, 153AD, 176Adaptive mesh refinement, 94algebraic operators, 164algorithms, 183atomic value, 160Automatic Differentiation, 176avg, 167basis function, 160basis functions, 161boundary conditions, 256boundary measure, 159Boussinesq models, 247Cache, 140cell integral, 159cerebrospinal fluid, 221Chiari I malformation, 221coefficient function, 160coefficient functions, 161coefficients, 161computational graph, 175computing derivatives, 176cross product, 164CSF, 221Cython, 128derivatives, 176determinant, 164DG operators, 167differential operators, 165differentiation, 176discontinuous Galerkin, 167discontinuous Lagrange element, 156dispersion curves, 296, 301, 303, 305dispersion relation, 254Distutils, 127domain specific language, 153dot product, 164Dynamic load balancing, 97eigenvalue problem, 292, 293electromagnetics, 289expression, 172expression representations, 186expression transformations, 185, 186expression tree, 172expression trees, 183exterior facet integral, 159F2PY, 128facet normal, 160finite element, 156finite element space, 156foramen magnum, 221form argument, 160form arguments, 161form language, 153form operators, 168forms, 159forward mode AD, 176functional, 153functions, 161identity matrix, 160implicit summation, 162index notation, 162indices, 162inner product, 164integrals, 159interior facet integral, 159interior measure, 159inverse, 164jump, 167Lagrange element, 156language operators, 164Maxwell’s equations, 289microstrip, see shielded microstripmultifunctions, 183

Navier-Stokes, 223Newtonian fluid, 223OpenMP, 132operator, 172outer product, 164Parallel radix sort, 99potential, 249Predictor-Corrector, 258program, 172propagation constant, 290referential transparency, 172reflective boundaries, 257restriction, 167reverse mode AD, 176Runge-Kutta, 258variational form, 153vector Helmholtz equation, 290water waves, 247waveguide, 289cutoff analysis, 291, 299, 302discontinuities, 305–307dispersion analysis, 293, 301, 302,304half-loaded rectangular, 301–303hollow rectangular, 298–301, 306wavenumbercutoff, 292, 296, 298, 299, 304operating, 290, 292weak form, 153Weave, 128Windkessel model, 130S-parameters, 305–307SAS, 221scatteringparameters,seeS-parametersshielded microstrip, 303Signature, 140signatures, 189source function, 256spatial coordinates, 160spinal canal, 221spinal cord, 222sponge-layers, 257spurious modes, 293subarachnoid space, 221SWIG, 127symbolic differentiation, 176tensor algebra operators, 164terminal value, 160, 172trace, 164transpose, 164tree traversal, 183Typemap, 128, 136UFL, 190Unified Form Language, 153

Part II Implementation - FEniCS Project

Create successful ePaper yourself

Delete template?

Save as template?