1 2<br> 3 4<i> 5This program was contributed by Martin Kronbichler. Many ideas presented here 6are the result of common code development with Niklas Fehn, Katharina Kormann, 7Peter Munch, and Svenja Schoeder. 8 9This work was partly supported by the German Research Foundation (DFG) through 10the project "High-order discontinuous Galerkin for the exa-scale" (ExaDG) 11within the priority program "Software for Exascale Computing" (SPPEXA). 12</i> 13 14<a name="Intro"></a> 15<h1>Introduction</h1> 16 17This tutorial program solves the Euler equations of fluid dynamics using an 18explicit time integrator with the matrix-free framework applied to a 19high-order discontinuous Galerkin discretization in space. For details about 20the Euler system and an alternative implicit approach, we also refer to the 21step-33 tutorial program. You might also want to look at step-69 for 22an alternative approach to solving these equations. 23 24 25<h3>The Euler equations</h3> 26 27The Euler equations are a conservation law, describing the motion of a 28compressible inviscid gas, 29@f[ 30\frac{\partial \mathbf{w}}{\partial t} + \nabla \cdot \mathbf{F}(\mathbf{w}) = 31\mathbf{G}(\mathbf w), 32@f] 33where the $d+2$ components of the solution vector are $\mathbf{w}=(\rho, \rho 34u_1,\ldots,\rho u_d,E)^{\mathrm T}$. Here, $\rho$ denotes the fluid density, 35${\mathbf u}=(u_1,\ldots, u_d)^\mathrm T$ the fluid velocity, and $E$ the 36energy density of the gas. The velocity is not directly solved for, but rather 37the variable $\rho \mathbf{u}$, the linear momentum (since this is the 38conserved quantity). 39 40The Euler flux function, a $(d+2)\times d$ matrix, is defined as 41@f[ 42 \mathbf F(\mathbf w) 43 = 44 \begin{pmatrix} 45 \rho \mathbf{u}\\ 46 \rho \mathbf{u} \otimes \mathbf{u} + \mathbb{I}p\\ 47 (E+p)\mathbf{u} 48 \end{pmatrix} 49@f] 50with $\mathbb{I}$ the $d\times d$ identity matrix and $\otimes$ the outer 51product; its components denote the mass, momentum, and energy fluxes, respectively. 52The right hand side forcing is given by 53@f[ 54 \mathbf G(\mathbf w) 55 = 56 \begin{pmatrix} 57 0\\ 58 \rho\mathbf{g}\\ 59 \rho \mathbf{u} \cdot \mathbf{g} 60 \end{pmatrix}, 61@f] 62where the vector $\mathbf g$ denotes the direction and magnitude of 63gravity. It could, however, also denote any other external force per unit mass 64that is acting on the fluid. (Think, for example, of the electrostatic 65forces exerted by an external electric field on charged particles.) 66 67The three blocks of equations, the second involving $d$ components, describe 68the conservation of mass, momentum, and energy. The pressure is not a 69solution variable but needs to be expressed through a "closure relationship" 70by the other variables; we here choose the relationship appropriate 71for a gas with molecules composed of two atoms, which at moderate 72temperatures is given by $p=(\gamma - 1) \left(E-\frac 12 \rho 73\mathbf{u}\cdot \mathbf{u}\right)$ with the constant $\gamma = 1.4$. 74 75 76<h3>High-order discontinuous Galerkin discretization</h3> 77 78For spatial discretization, we use a high-order discontinuous Galerkin (DG) 79discretization, using a solution expansion of the form 80@f[ 81\mathbf{w}_h(\mathbf{x}, t) = 82\sum_{j=1}^{n_\mathbf{dofs}} \boldsymbol{\varphi}_j(\mathbf{x}) {w}_j(t). 83@f] 84Here, $\boldsymbol{\varphi}_j$ denotes the $j$th basis function, written 85in vector form with separate shape functions for the different components and 86letting $w_j(t)$ go through the density, momentum, and energy variables, 87respectively. In this form, the space dependence is contained in the shape 88functions and the time dependence in the unknown coefficients $w_j$. As 89opposed to the continuous finite element method where some shape functions 90span across element boundaries, the shape functions are local to a single 91element in DG methods, with a discontinuity from one element to the next. The 92connection of the solution from one cell to its neighbors is instead 93imposed by the numerical fluxes 94specified below. This allows for some additional flexibility, for example to 95introduce directionality in the numerical method by, e.g., upwinding. 96 97DG methods are popular methods for solving problems of transport character 98because they combine low dispersion errors with controllable dissipation on 99barely resolved scales. This makes them particularly attractive for simulation 100in the field of fluid dynamics where a wide range of active scales needs to be 101represented and inadequately resolved features are prone to disturb the 102important well-resolved features. Furthermore, high-order DG methods are 103well-suited for modern hardware with the right implementation. At the same 104time, DG methods are no silver bullet. In particular when the solution 105develops discontinuities (shocks), as is typical for the Euler equations in 106some flow regimes, high-order DG methods tend to oscillatory solutions, like 107all high-order methods when not using flux- or slope-limiters. This is a consequence of <a 108href="https://en.wikipedia.org/wiki/Godunov%27s_theorem">Godunov's theorem</a> 109that states that any total variation limited (TVD) scheme that is linear (like 110a basic DG discretization) can at most be first-order accurate. Put 111differently, since DG methods aim for higher order accuracy, they cannot be 112TVD on solutions that develop shocks. Even though some communities claim that 113the numerical flux in DG methods can control dissipation, this is of limited 114value unless <b>all</b> shocks in a problem align with cell boundaries. Any 115shock that passes through the interior of cells will again produce oscillatory 116components due to the high-order polynomials. In the finite element and DG 117communities, there exist a number of different approaches to deal with shocks, 118for example the introduction of artificial diffusion on troubled cells (using 119a troubled-cell indicator based e.g. on a modal decomposition of the 120solution), a switch to dissipative low-order finite volume methods on a 121subgrid, or the addition of some limiting procedures. Given the ample 122possibilities in this context, combined with the considerable implementation 123effort, we here refrain from the regime of the Euler equations with pronounced 124shocks, and rather concentrate on the regime of subsonic flows with wave-like 125phenomena. For a method that works well with shocks (but is more expensive per 126unknown), we refer to the step-69 tutorial program. 127 128For the derivation of the DG formulation, we multiply the Euler equations with 129test functions $\mathbf{v}$ and integrate over an individual cell $K$, which 130gives 131@f[ 132\left(\mathbf{v}, \frac{\partial \mathbf{w}}{\partial t}\right)_{K} 133+ \left(\mathbf{v}, \nabla \cdot \mathbf{F}(\mathbf{w})\right)_{K} = 134\left(\mathbf{v},\mathbf{G}(\mathbf w)\right)_{K}. 135@f] 136 137We then integrate the second term by parts, moving the divergence 138from the solution slot to the test function slot, and producing an integral 139over the element boundary: 140@f[ 141\left(\mathbf{v}, \frac{\partial \mathbf{w}}{\partial t}\right)_{K} 142- \left(\nabla \mathbf{v}, \mathbf{F}(\mathbf{w})\right)_{K} 143+ \left<\mathbf{v}, \mathbf{n} \cdot \widehat{\mathbf{F}}(\mathbf{w}) 144\right>_{\partial K} = 145\left(\mathbf{v},\mathbf{G}(\mathbf w)\right)_{K}. 146@f] 147In the surface integral, we have replaced the term $\mathbf{F}(\mathbf w)$ by 148the term $\widehat{\mathbf{F}}(\mathbf w)$, the numerical flux. The role of 149the numerical flux is to connect the solution on neighboring elements and 150weakly impose continuity of the solution. This ensures that the global 151coupling of the PDE is reflected in the discretization, despite independent 152basis functions on the cells. The connectivity to the neighbor is included by 153defining the numerical flux as a function $\widehat{\mathbf{F}}(\mathbf w^-, 154\mathbf w^+)$ of the solution from both sides of an interior face, $\mathbf 155w^-$ and $\mathbf w^+$. A basic property we require is that the numerical flux 156needs to be <b>conservative</b>. That is, we want all information (i.e., 157mass, momentum, and energy) that leaves a cell over 158a face to enter the neighboring cell in its entirety and vice versa. This can 159be expressed as $\widehat{\mathbf{F}}(\mathbf w^-, \mathbf w^+) = 160\widehat{\mathbf{F}}(\mathbf w^+, \mathbf w^-)$, meaning that the numerical 161flux evaluates to the same result from either side. Combined with the fact 162that the numerical flux is multiplied by the unit outer normal vector on the 163face under consideration, which points in opposite direction from the two 164sides, we see that the conservation is fulfilled. An alternative point of view 165of the numerical flux is as a single-valued intermediate state that links the 166solution weakly from both sides. 167 168There is a large number of numerical flux functions available, also called 169Riemann solvers. For the Euler equations, there exist so-called exact Riemann 170solvers -- meaning that the states from both sides are combined in a way that 171is consistent with the Euler equations along a discontinuity -- and 172approximate Riemann solvers, which violate some physical properties and rely 173on other mechanisms to render the scheme accurate overall. Approxiate Riemann 174solvers have the advantage of beging cheaper to compute. Most flux functions 175have their origin in the finite volume community, which are similar to DG 176methods with polynomial degree 0 within the cells (called volumes). As the 177volume integral of the Euler operator $\mathbf{F}$ would disappear for 178constant solution and test functions, the numerical flux must fully represent 179the physical operator, explaining why there has been a large body of research 180in that community. For DG methods, consistency is guaranteed by higher order 181polynomials within the cells, making the numerical flux less of an issue and 182usually affecting only the convergence rate, e.g., whether the solution 183converges as $\mathcal O(h^p)$, $\mathcal O(h^{p+1/2})$ or $\mathcal 184O(h^{p+1})$ in the $L_2$ norm for polynomials of degree $p$. The numerical 185flux can thus be seen as a mechanism to select more advantageous 186dissipation/dispersion properties or regarding the extremal eigenvalue of the 187discretized and linearized operator, which affect the maximal admissible time 188step size in explicit time integrators. 189 190In this tutorial program, we implement two variants of fluxes that can be 191controlled via a switch in the program (of course, it would be easy to make 192them a run time parameter controlled via an input file). The first flux is 193the local Lax--Friedrichs flux 194@f[ 195\hat{\mathbf{F}}(\mathbf{w}^-,\mathbf{w}^+) = 196\frac{\mathbf{F}(\mathbf{w}^-)+\mathbf{F}(\mathbf{w}^+)}{2} + 197 \frac{\lambda}{2}\left[\mathbf{w}^--\mathbf{w}^+\right]\otimes 198 \mathbf{n^-}. 199@f] 200 201In the original definition of the Lax--Friedrichs flux, a factor $\lambda = 202\max\left(\|\mathbf{u}^-\|+c^-, \|\mathbf{u}^+\|+c^+\right)$ is used 203(corresponding to the maximal speed at which information is moving on 204the two sides of the interface), stating 205that the difference between the two states, $[\![\mathbf{w}]\!]$ is penalized 206by the largest eigenvalue in the Euler flux, which is $\|\mathbf{u}\|+c$, 207where $c=\sqrt{\gamma p / \rho}$ is the speed of sound. In the implementation 208below, we modify the penalty term somewhat, given that the penalty is of 209approximate nature anyway. We use 210@f{align*}{ 211\lambda 212&= 213\frac{1}{2}\max\left(\sqrt{\|\mathbf{u^-}\|^2+(c^-)^2}, 214 \sqrt{\|\mathbf{u}^+\|^2+(c^+)^2}\right) 215\\ 216&= 217\frac{1}{2}\sqrt{\max\left(\|\mathbf{u^-}\|^2+(c^-)^2, 218 \|\mathbf{u}^+\|^2+(c^+)^2\right)}. 219@f} 220The additional factor $\frac 12$ reduces the penalty strength (which results 221in a reduced negative real part of the eigenvalues, and thus increases the 222admissible time step size). Using the squares within the sums allows us to 223reduce the number of expensive square root operations, which is 4 for the 224original Lax--Friedrichs definition, to a single one. 225This simplification leads to at most a factor of 2262 in the reduction of the parameter $\lambda$, since $\|\mathbf{u}\|^2+c^2 \leq 227\|\mathbf{u}\|^2+2 c |\mathbf{u}\| + c^2 = \left(\|\mathbf{u}\|+c\right)^2 228\leq 2 \left(\|\mathbf{u}\|^2+c^2\right)$, with the last inequality following 229from Young's inequality. 230 231The second numerical flux is one proposed by Harten, Lax and van Leer, called 232the HLL flux. It takes the different directions of propagation of the Euler 233equations into account, depending on the speed of sound. It utilizes some 234intermediate states $\bar{\mathbf{u}}$ and $\bar{c}$ to define the two 235branches $s^\mathrm{p} = \max\left(0, \bar{\mathbf{u}}\cdot \mathbf{n} + 236\bar{c}\right)$ and $s^\mathrm{n} = \min\left(0, \bar{\mathbf{u}}\cdot 237\mathbf{n} - \bar{c}\right)$. From these branches, one then defines the flux 238@f[ 239\hat{\mathbf{F}}(\mathbf{w}^-,\mathbf{w}^+) = 240\frac{s^\mathrm{p} \mathbf{F}(\mathbf{w}^-)-s^\mathrm{n} \mathbf{F}(\mathbf{w}^+)} 241 {s^\mathrm p - s^\mathrm{n} } + 242\frac{s^\mathrm{p} s^\mathrm{n}}{s^\mathrm{p}-s^\mathrm{n}} 243\left[\mathbf{w}^--\mathbf{w}^+\right]\otimes \mathbf{n^-}. 244@f] 245Regarding the definition of the intermediate state $\bar{\mathbf{u}}$ and 246$\bar{c}$, several variants have been proposed. The variant originally 247proposed uses a density-averaged definition of the velocity, $\bar{\mathbf{u}} 248= \frac{\sqrt{\rho^-} \mathbf{u}^- + \sqrt{\rho^+}\mathbf{u}^+}{\sqrt{\rho^-} 249+ \sqrt{\rho^+}}$. Since we consider the Euler equations without shocks, we 250simply use arithmetic means, $\bar{\mathbf{u}} = \frac{\mathbf{u}^- + 251\mathbf{u}^+}{2}$ and $\bar{c} = \frac{c^- + c^+}{2}$, with $c^{\pm} = 252\sqrt{\gamma p^{\pm} / \rho^{\pm}}$, in this tutorial program, and leave other 253variants to a possible extension. We also note that the HLL flux has been 254extended in the literature to the so-called HLLC flux, where C stands for the 255ability to represent contact discontinuities. 256 257At the boundaries with no neighboring state $\mathbf{w}^+$ available, it is 258common practice to deduce suitable exterior values from the boundary 259conditions (see the general literature on DG methods for details). In this 260tutorial program, we consider three types of boundary conditions, namely 261<b>inflow boundary conditions</b> where all components are prescribed, 262@f[ 263\mathbf{w}^+ = \begin{pmatrix} \rho_\mathrm{D}(t)\\ 264(\rho \mathbf u)_{\mathrm D}(t) \\ E_\mathrm{D}(t)\end{pmatrix} \quad 265 \text{(Dirichlet)}, 266@f] 267<b>subsonic outflow boundaries</b>, where we do not prescribe exterior 268solutions as the flow field is leaving the domain and use the interior values 269instead; we still need to prescribe the energy as there is one incoming 270characteristic left in the Euler flux, 271@f[ 272\mathbf{w}^+ = \begin{pmatrix} \rho^-\\ 273(\rho \mathbf u)^- \\ E_\mathrm{D}(t)\end{pmatrix} \quad 274 \text{(mixed Neumann/Dirichlet)}, 275@f] 276and <b>wall boundary condition</b> which describe a no-penetration 277configuration: 278@f[ 279\mathbf{w}^+ = \begin{pmatrix} \rho^-\\ 280(\rho \mathbf u)^- - 2 [(\rho \mathbf u)^-\cdot \mathbf n] \mathbf{n} 281 \\ E^-\end{pmatrix}. 282@f] 283 284The polynomial expansion of the solution is finally inserted to the weak form 285and test functions are replaced by the basis functions. This gives a discrete 286in space, continuous in time nonlinear system with a finite number of unknown 287coefficient values $w_j$, $j=1,\ldots,n_\text{dofs}$. Regarding the choice of 288the polynomial degree in the DG method, there is no consensus in literature as 289of 2019 as to what polynomial degrees are most efficient and the decision is 290problem-dependent. Higher order polynomials ensure better convergence rates 291and are thus superior for moderate to high accuracy requirements for 292<b>smooth</b> solutions. At the same time, the volume-to-surface ratio 293of where degrees of freedom are located, 294increases with higher degrees, and this makes the effect of the numerical flux 295weaker, typically reducing dissipation. However, in most of the cases the 296solution is not smooth, at least not compared to the resolution that can be 297afforded. This is true for example in incompressible fluid dynamics, 298compressible fluid dynamics, and the related topic of wave propagation. In this 299pre-asymptotic regime, the error is approximately proportional to the 300numerical resolution, and other factors such as dispersion errors or the 301dissipative behavior become more important. Very high order methods are often 302ruled out because they come with more restrictive CFL conditions measured 303against the number of unknowns, and they are also not as flexible when it 304comes to representing complex geometries. Therefore, polynomial degrees 305between two and six are most popular in practice, see e.g. the efficiency 306evaluation in @cite FehnWallKronbichler2019 and references cited therein. 307 308<h3>Explicit time integration</h3> 309 310To discretize in time, we slightly rearrange the weak form and sum over all 311cells: 312@f[ 313\sum_{K \in \mathcal T_h} \left(\boldsymbol{\varphi}_i, 314\frac{\partial \mathbf{w}}{\partial t}\right)_{K} 315= 316\sum_{K\in \mathcal T_h} 317\left[ 318\left(\nabla \boldsymbol{\varphi}_i, \mathbf{F}(\mathbf{w})\right)_{K} 319-\left<\boldsymbol{\varphi}_i, 320\mathbf{n} \cdot \widehat{\mathbf{F}}(\mathbf{w})\right>_{\partial K} + 321\left(\boldsymbol{\varphi}_i,\mathbf{G}(\mathbf w)\right)_{K} 322\right], 323@f] 324where $\boldsymbol{\varphi}_i$ runs through all basis functions with from 1 to 325$n_\text{dofs}$. 326 327We now denote by $\mathcal M$ the mass matrix with entries $\mathcal M_{ij} = 328\sum_{K} \left(\boldsymbol{\varphi}_i, 329\boldsymbol{\varphi}_j\right)_K$, and by 330@f[ 331\mathcal L_h(t,\mathbf{w}_h) = \left[\sum_{K\in \mathcal T_h} 332\left[ 333\left(\nabla \boldsymbol{\varphi}_i, \mathbf{F}(\mathbf{w}_h)\right)_{K} 334- \left<\boldsymbol{\varphi}_i, 335\mathbf{n} \cdot \widehat{\mathbf{F}}(\mathbf{w}_h)\right>_{\partial K} 336+ \left(\boldsymbol{\varphi}_i,\mathbf{G}(\mathbf w_h)\right)_{K} 337\right]\right]_{i=1,\ldots,n_\text{dofs}}. 338@f] 339the operator evaluating the right-hand side of the Euler operator, given a 340function $\mathbf{w}_h$ associated with a global vector of unknowns 341and the finite element in use. This function $\mathcal L_h$ is explicitly time-dependent as the 342numerical flux evaluated at the boundary will involve time-dependent data 343$\rho_\mathrm{D}$, $(\rho \mathbf{u})_\mathrm{D}$, and $E_\mathbf{D}$ on some 344parts of the boundary, depending on the assignment of boundary 345conditions. With this notation, we can write the discrete in space, continuous 346in time system compactly as 347@f[ 348\mathcal M \frac{\partial \mathbf{w}_h}{\partial t} = 349\mathcal L_h(t, \mathbf{w}_h), 350@f] 351where we have taken the liberty to also denote the global solution 352vector by $\mathbf{w}_h$ (in addition to the the corresponding finite 353element function). Equivalently, the system above has the form 354@f[ 355\frac{\partial \mathbf{w}_h}{\partial t} = 356\mathcal M^{-1} \mathcal L_h(t, \mathbf{w}_h). 357@f] 358 359For hyperbolic systems discretized by high-order discontinuous Galerkin 360methods, explicit time integration of this system is very popular. This is due 361to the fact that the mass matrix $\mathcal M$ is block-diagonal (with each 362block corresponding to only variables of the same kind defined on the same 363cell) and thus easily inverted. In each time step -- or stage of a 364Runge--Kutta scheme -- one only needs to evaluate the differential operator 365once using the given data and subsequently apply the inverse of the mass 366matrix. For implicit time stepping, on the other hand, one would first have to 367linearize the equations and then iteratively solve the linear system, which 368involves several residual evaluations and at least a dozen applications of 369the linearized operator, as has been demonstrated in the step-33 tutorial 370program. 371 372Of course, the simplicity of explicit time stepping comes with a price, namely 373conditional stability due to the so-called Courant--Friedrichs--Lewy (CFL) 374condition. It states that the time step cannot be larger than the fastest 375propagation of information by the discretized differential operator. In more 376modern terms, the speed of propagation corresponds to the largest eigenvalue 377in the discretized operator, and in turn depends on the mesh size, the 378polynomial degree $p$ and the physics of the Euler operator, i.e., the 379eigenvalues of the linearization of $\mathbf F(\mathbf w)$ with respect to 380$\mathbf{w}$. In this program, we set the time step as follows: 381@f[ 382\Delta t = \frac{\mathrm{Cr}}{p^{1.5}}\left(\frac{1} 383 {\max\left[\frac{\|\mathbf{u}\|}{h_u} + \frac{c}{h_c}\right]}\right), 384@f] 385 386with the maximum taken over all quadrature points and all cells. The 387dimensionless number $\mathrm{Cr}$ denotes the Courant number and can be 388chosen up to a maximally stable number $\mathrm{Cr}_\text{max}$, whose value 389depends on the selected time stepping method and its stability properties. The 390power $p^{1.5}$ used for the polynomial scaling is heuristic and represents 391the closest fit for polynomial degrees between 1 and 8, see e.g. 392@cite SchoederKormann2018. In the limit of higher degrees, $p>10$, a scaling of 393$p^2$ is more accurate, related to the inverse estimates typically used for 394interior penalty methods. Regarding the <i>effective</i> mesh sizes $h_u$ and 395$h_c$ used in the formula, we note that the convective transport is 396directional. Thus an appropriate scaling is to use the element length in the 397direction of the velocity $\mathbf u$. The code below derives this scaling 398from the inverse of the Jacobian from the reference to real cell, i.e., we 399approximate $\frac{\|\mathbf{u}\|}{h_u} \approx \|J^{-1} \mathbf 400u\|_{\infty}$. The acoustic waves, instead, are isotropic in character, which 401is why we use the smallest feature size, represented by the smallest singular 402value of $J$, for the acoustic scaling $h_c$. Finally, we need to add the 403convective and acoustic limits, as the Euler equations can transport 404information with speed $\|\mathbf{u}\|+c$. 405 406In this tutorial program, we use a specific variant of <a 407href="https://en.wikipedia.org/wiki/Runge%E2%80%93Kutta_methods">explicit 408Runge--Kutta methods</a>, which in general use the following update procedure 409from the state $\mathbf{w}_h^{n}$ at time $t^n$ to the new time $t^{n+1}$ with 410$\Delta t = t^{n+1}-t^n$: 411@f[ 412\begin{aligned} 413\mathbf{k}_1 &= \mathcal M^{-1} \mathcal L_h\left(t^n, \mathbf{w}_h^n\right), 414\\ 415\mathbf{k}_2 &= \mathcal M^{-1} \mathcal L_h\left(t^n+c_2\Delta t, 416 \mathbf{w}_h^n + a_{21} \Delta t \mathbf{k}_1\right), 417\\ 418&\vdots \\ 419\mathbf{k}_s &= \mathcal M^{-1} \mathcal L_h\left(t^n+c_s\Delta t, 420 \mathbf{w}_h^n + \sum_{j=1}^{s-1} a_{sj} \Delta t \mathbf{k}_j\right), 421\\ 422\mathbf{w}_h^{n+1} &= \mathbf{w}_h^n + \Delta t\left(b_1 \mathbf{k}_1 + 423b_2 \mathbf{k}_2 + \ldots + b_s \mathbf{k}_s\right). 424\end{aligned} 425@f] 426The vectors $\mathbf{k}_i$, $i=1,\ldots,s$, in an $s$-stage scheme are 427evaluations of the operator at some intermediate state and used to define the 428end-of-step value $\mathbf{w}_h^{n+1}$ via some linear combination. The scalar 429coefficients in this scheme, $c_i$, $a_{ij}$, and $b_j$, are defined such that 430certain conditions are satisfied for higher order schemes, the most basic one 431being $c_i = \sum_{j=1}^{i-1}a_{ij}$. The parameters are typically collected in 432the form of a so-called <a 433href="https://en.wikipedia.org/wiki/Runge%E2%80%93Kutta_methods#Explicit_Runge%E2%80%93Kutta_methods">Butcher 434tableau</a> that collects all of the coefficients that define the 435scheme. For a five-stage scheme, it would look like this: 436@f[ 437\begin{array}{c|ccccc} 4380 \\ 439c_2 & a_{21} \\ 440c_3 & a_{31} & a_{32} \\ 441c_4 & a_{41} & a_{42} & a_{43} \\ 442c_5 & a_{51} & a_{52} & a_{53} & a_{54} \\ 443\hline 444& b_1 & b_2 & b_3 & b_4 & b_5 445\end{array} 446@f] 447 448In this tutorial program, we use a subset of explicit Runge--Kutta methods, 449so-called low-storage Runge--Kutta methods (LSRK), which assume additional 450structure in the coefficients. In the variant used by reference 451@cite KennedyCarpenterLewis2000, the assumption is to use Butcher tableaus of 452the form 453@f[ 454\begin{array}{c|ccccc} 4550 \\ 456c_2 & a_1 \\ 457c_3 & b_1 & a_2 \\ 458c_4 & b_1 & b_2 & a_3 \\ 459c_5 & b_1 & b_2 & b_3 & a_4 \\ 460\hline 461& b_1 & b_2 & b_3 & b_4 & b_5 462\end{array} 463@f] 464With such a definition, the update to $\mathbf{w}_h^n$ shares the storage with 465the information for the intermediate values $\mathbf{k}_i$. Starting with 466$\mathbf{w}^{n+1}=\mathbf{w}^n$ and $\mathbf{r}_1 = \mathbf{w}^n$, the update 467in each of the $s$ stages simplifies to 468@f[ 469\begin{aligned} 470\mathbf{k}_i &= 471\mathcal M^{-1} \mathcal L_h\left(t^n+c_i\Delta t, \mathbf{r}_{i} \right),\\ 472\mathbf{r}_{i+1} &= \mathbf{w}_h^{n+1} + \Delta t \, a_i \mathbf{k}_i,\\ 473\mathbf{w}_h^{n+1} &= \mathbf{w}_h^{n+1} + \Delta t \, b_i \mathbf{k}_i. 474\end{aligned} 475@f] 476Besides the vector $\mathbf w_h^{n+1}$ that is successively updated, this scheme 477only needs two auxiliary vectors, namely the vector $\mathbf{k}_i$ to hold the 478evaluation of the differential operator, and the vector $\mathbf{r}_i$ that 479holds the right-hand side for the differential operator application. In 480subsequent stages $i$, the values $\mathbf{k}_i$ and $\mathbf{r}_i$ can use 481the same storage. 482 483The main advantages of low-storage variants are the reduced memory consumption 484on the one hand (if a very large number of unknowns must be fit in memory, 485holding all $\mathbf{k}_i$ to compute subsequent updates can be a limit 486already for $s$ in between five and eight -- recall that we are using 487an explicit scheme, so we do not need to store any matrices that are 488typically much larger than a few vectors), and the reduced memory access on 489the other. In this program, we are particularly interested in the latter 490aspect. Since cost of operator evaluation is only a small multiple of the cost 491of simply streaming the input and output vector from memory with the optimized 492matrix-free methods of deal.II, we must consider the cost of vector updates, 493and low-storage variants can deliver up to twice the throughput of 494conventional explicit Runge--Kutta methods for this reason, see e.g. the 495analysis in @cite SchoederKormann2018. 496 497Besides three variants for third, fourth and fifth order accuracy from the 498reference @cite KennedyCarpenterLewis2000, we also use a fourth-order accurate 499variant with seven stages that was optimized for acoustics setups from 500@cite TseliosSimos2007. Acoustic problems are one of the interesting aspects of 501the subsonic regime of the Euler equations where compressibility leads to the 502transmission of sound waves; often, one uses further simplifications of the 503linearized Euler equations around a background state or the acoustic wave 504equation around a fixed frame. 505 506 507<h3>Fast evaluation of integrals by matrix-free techniques</h3> 508 509The major ingredients used in this program are the fast matrix-free techniques 510we use to evaluate the operator $\mathcal L_h$ and the inverse mass matrix 511$\mathcal M$. Actually, the term <i>matrix-free</i> is a slight misnomer, 512because we are working with a nonlinear operator and do not linearize the 513operator that in turn could be represented by a matrix. However, fast 514evaluation of integrals has become popular as a replacement of sparse 515matrix-vector products, as shown in step-37 and step-59, and we have coined 516this infrastructure <i>matrix-free functionality</i> in deal.II for this 517reason. Furthermore, the inverse mass matrix is indeed applied in a 518matrix-free way, as detailed below. 519 520The matrix-free infrastructure allows us to quickly evaluate the integrals in 521the weak forms. The ingredients are the fast interpolation from solution 522coefficients into values and derivatives at quadrature points, point-wise 523operations at quadrature points (where we implement the differential operator 524as derived above), as well as multiplication by all test functions and 525summation over quadrature points. The first and third component make use of 526sum factorization and have been extensively discussed in the step-37 tutorial 527program for the cell integrals and step-59 for the face integrals. The only 528difference is that we now deal with a system of $d+2$ components, rather than 529the scalar systems in previous tutorial programs. In the code, all that 530changes is a template argument of the FEEvaluation and FEFaceEvaluation 531classes, the one to set the number of components. The access to the vector is 532the same as before, all handled transparently by the evaluator. We also note 533that the variant with a single evaluator chosen in the code below is not the 534only choice -- we could also have used separate evalators for the separate 535components $\rho$, $\rho \mathbf u$, and $E$; given that we treat all 536components similarly (also reflected in the way we state the equation as a 537vector system), this would be more complicated here. As before, the 538FEEvaluation class provides explicit vectorization by combining the operations 539on several cells (and faces), involving data types called 540VectorizedArray. Since the arithmetic operations are overloaded for this type, 541we do not have to bother with it all that much, except for the evaluation of 542functions through the Function interface, where we need to provide particular 543<i>vectorized</i> evaluations for several quadrature point locations at once. 544 545A more substantial change in this program is the operation at quadrature 546points: Here, the multi-component evaluators provide us with return types not 547discussed before. Whereas FEEvaluation::get_value() would return a scalar 548(more precisely, a VectorizedArray type due to vectorization across cells) for 549the Laplacian of step-37, it now returns a type that is 550`Tensor<1,dim+2,VectorizedArray<Number>>`. Likewise, the gradient type is now 551`Tensor<1,dim+2,Tensor<1,dim,VectorizedArray<Number>>>`, where the outer 552tensor collects the `dim+2` components of the Euler system, and the inner 553tensor the partial derivatives in the various directions. For example, the 554flux $\mathbf{F}(\mathbf{w})$ of the Euler system is of this type. In order to reduce the amount of 555code we have to write for spelling out these types, we use the C++ `auto` 556keyword where possible. 557 558From an implementation point of view, the nonlinearity is not a big 559difficulty: It is introduced naturally as we express the terms of the Euler 560weak form, for example in the form of the momentum term $\rho \mathbf{u} 561\otimes \mathbf{u}$. To obtain this expression, we first deduce the velocity 562$\mathbf{u}$ from the momentum variable $\rho \mathbf{u}$. Given that $\rho 563\mathbf{u}$ is represented as a $p$-degree polynomial, as is $\rho$, the 564velocity $\mathbf{u}$ is a rational expression in terms of the reference 565coordinates $\hat{\mathbf{x}}$. As we perform the multiplication $(\rho 566\mathbf{u})\otimes \mathbf{u}$, we obtain an expression that is the 567ratio of two polynomials, with polynomial degree $2p$ in the 568numerator and polynomial degree $p$ in the denominator. Combined with the 569gradient of the test function, the integrand is of degree $3p$ in the 570numerator and $p$ in the denominator already for affine cells, i.e., 571for parallelograms/ parallelepipeds. 572For curved cells, additional polynomial and rational expressions 573appear when multiplying the integrand by the determinant of the Jacobian of 574the mapping. At this point, one usually needs to give up on insisting on exact 575integration, and take whatever accuracy the Gaussian (more precisely, 576Gauss--Legrende) quadrature provides. The situation is then similar to the one 577for the Laplace equation, where the integrand contains rational expressions on 578non-affince cells and is also only integrated approximately. As these formulas 579only integrate polynomials exactly, we have to live with the <a 580href="https://mathoverflow.net/questions/26018/what-are-variational-crimes-and-who-coined-the-term">variational 581crime</a> in the form of an integration error. 582 583While inaccurate integration is usually tolerable for elliptic problems, for 584hyperbolic problems inexact integration causes some headache in the form of an 585effect called <b>aliasing</b>. The term comes from signal processing and 586expresses the situation of inappropriate, too coarse sampling. In terms of 587quadrature, the inappropriate sampling means that we use too few quadrature 588points compared to what would be required to accurately sample the 589variable-coefficient integrand. It has been shown in the DG literature that 590aliasing errors can introduce unphysical oscillations in the numerical 591solution for <i>barely</i> resolved simulations. The fact that aliasing mostly 592affects coarse resolutions -- whereas finer meshes with the same scheme 593work fine -- is not surprising because well-resolved simulations 594tend to be smooth on length-scales of a cell (i.e., they have 595small coefficients in the higher polynomial degrees that are missed by 596too few quadrature points, whereas the main solution contribution in the lower 597polynomial degrees is still well-captured -- this is simply a consequence of Taylor's 598theorem). To address this topic, various approaches have been proposed in the 599DG literature. One technique is filtering which damps the solution components 600pertaining to higher polynomial degrees. As the chosen nodal basis is not 601hierarchical, this would mean to transform from the nodal basis into a 602hierarchical one (e.g., a modal one based on Legendre polynomials) where the 603contributions within a cell are split by polynomial degrees. In that basis, 604one could then multiply the solution coefficients associated with higher 605degrees by a small number, keep the lower ones intact (to not destroy consistency), and 606then transform back to the nodal basis. However, filters reduce the accuracy of the 607method. Another, in some sense simpler, strategy is to use more quadrature 608points to capture non-linear terms more accurately. Using more than $p+1$ 609quadrature points per coordinate directions is sometimes called 610over-integration or consistent integration. The latter name is most common in 611the context of the incompressible Navier-Stokes equations, where the 612$\mathbf{u}\otimes \mathbf{u}$ nonlinearity results in polynomial integrands 613of degree $3p$ (when also considering the test function), which can be 614integrated exactly with $\textrm{floor}\left(\frac{3p}{2}\right)+1$ quadrature 615points per direction as long as the element geometry is affine. In the context 616of the Euler equations with non-polynomial integrands, the choice is less 617clear. Depending on the variation in the various variables both 618$\textrm{floor}\left(\frac{3p}{2}\right)+1$ or $2p+1$ points (integrating 619exactly polynomials of degree $3p$ or $4p$, respectively) are common. 620 621To reflect this variability in the choice of quadrature in the program, we 622keep the number of quadrature points a variable to be specified just as the 623polynomial degree, and note that one would make different choices depending 624also on the flow configuration. The default choice is $p+2$ points -- a bit 625more than the minimum possible of $p+1$ points. The FEEvaluation and 626FEFaceEvaluation classes allow to seamlessly change the number of points by a 627template parameter, such that the program does not get more complicated 628because of that. 629 630 631<h3>Evaluation of the inverse mass matrix with matrix-free techniques</h3> 632 633The last ingredient is the evaluation of the inverse mass matrix $\mathcal 634M^{-1}$. In DG methods with explicit time integration, mass matrices are 635block-diagonal and thus easily inverted -- one only needs to invert the 636diagonal blocks. However, given the fact that matrix-free evaluation of 637integrals is closer in cost to the access of the vectors only, even the 638application of a block-diagonal matrix (e.g. via an array of LU factors) would 639be several times more expensive than evaluation of $\mathcal L_h$ 640simply because just storing and loading matrices of size 641`dofs_per_cell` times `dofs_per_cell` for higher order finite elements 642repeatedly is expensive. As this is 643clearly undesirable, part of the community has moved to bases where the mass 644matrix is diagonal, for example the <i>L<sub>2</sub></i>-orthogonal Legendre basis using 645hierarchical polynomials or Lagrange polynomials on the points of the Gaussian 646quadrature (which is just another way of utilizing Legendre 647information). While the diagonal property breaks down for deformed elements, 648the error made by taking a diagonal mass matrix and ignoring the rest (a 649variant of mass lumping, though not the one with an additional integration 650error as utilized in step-48) has been shown to not alter discretization 651accuracy. The Lagrange basis in the points of Gaussian quadrature is sometimes 652also referred to as a collocation setup, as the nodal points of the 653polynomials coincide (= are "co-located") with the points of quadrature, obviating some 654interpolation operations. Given the fact that we want to use more quadrature 655points for nonlinear terms in $\mathcal L_h$, however, the collocation 656property is lost. (More precisely, it is still used in FEEvaluation and 657FEFaceEvaluation after a change of basis, see the matrix-free paper 658@cite KronbichlerKormann2019.) 659 660In this tutorial program, we use the collocation idea for the application of 661the inverse mass matrix, but with a slight twist. Rather than using the 662collocation via Lagrange polynomials in the points of Gaussian quadrature, we 663prefer a conventional Lagrange basis in Gauss-Lobatto points as those make the 664evaluation of face integrals cheap. This is because for Gauss-Lobatto 665points, some of the node points are located on the faces of the cell 666and it is not difficult to show that on any given face, the only shape 667functions with non-zero values are exactly the ones whose node points 668are in fact located on that face. One could of course also use the 669Gauss-Lobatto quadrature (with some additional integration error) as was done 670in step-48, but we do not want to sacrifice accuracy as these 671quadrature formulas are generally of lower order than the general 672Gauss quadrature formulas. Instead, we use an idea described in the reference 673@cite KronbichlerSchoeder2016 where it was proposed to change the basis for the 674sake of applying the inverse mass matrix. Let us denote by $S$ the matrix of 675shape functions evaluated at quadrature points, with shape functions in the row 676of the matrix and quadrature points in columns. Then, the mass matrix on a cell 677$K$ is given by 678@f[ 679\mathcal M^K = S J^K S^\mathrm T. 680@f] 681Here, $J^K$ is the diagonal matrix with the determinant of the Jacobian times 682the quadrature weight (JxW) as entries. The matrix $S$ is constructed as the 683Kronecker product (tensor product) of one-dimensional matrices, e.g. in 3D as 684@f[ 685S = S_{\text{1D}}\otimes S_{\text{1D}}\otimes S_{\text{1D}}, 686@f] 687which is the result of the basis functions being a tensor product of 688one-dimensional shape functions and the quadrature formula being the tensor 689product of 1D quadrature formulas. For the case that the number of polynomials 690equals the number of quadrature points, all matrices in $S J^K S^\mathrm T$ 691are square, and also the ingredients to $S$ in the Kronecker product are 692square. Thus, one can invert each matrix to form the overall inverse, 693@f[ 694\left(\mathcal M^K\right)^{-1} = S_{\text{1D}}^{-\mathrm T}\otimes 695S_{\text{1D}}^{-\mathrm T}\otimes S_{\text{1D}}^{-\mathrm T} 696\left(J^K\right)^{-1} 697S_{\text{1D}}^{-1}\otimes S_{\text{1D}}^{-1}\otimes S_{\text{1D}}^{-1}. 698@f] 699This formula is of exactly the same structure as the steps in the forward 700evaluation of integrals with sum factorization techniques (i.e., the 701FEEvaluation and MatrixFree framework of deal.II). Hence, we can utilize the 702same code paths with a different interpolation matrix, 703$S_{\mathrm{1D}}^{-\mathrm{T}}$ rather than $S_{\mathrm{1D}}$. 704 705The class MatrixFreeOperators::CellwiseInverseMassMatrix implements this 706operation: It changes from the basis contained in the finite element (in this 707case, FE_DGQ) to the Lagrange basis in Gaussian quadrature points. Here, the 708inverse of a diagonal mass matrix can be evaluated, which is simply the inverse 709of the `JxW` factors (i.e., the quadrature weight times the determinant of the 710Jacobian from reference to real coordinates). Once this is done, we can change 711back to the standard nodal Gauss-Lobatto basis. 712 713The advantage of this particular way of applying the inverse mass matrix is 714a cost similar to the forward application of a mass matrix, which is cheaper 715than the evaluation of the spatial operator $\mathcal L_h$ 716with over-integration and face integrals. (We 717will demonstrate this with detailed timing information in the 718<a href="#Results">results section</a>.) In fact, it 719is so cheap that it is limited by the bandwidth of reading the source vector, 720reading the diagonal, and writing into the destination vector on most modern 721architectures. The hardware used for the result section allows to do the 722computations at least twice as fast as the streaming of the vectors from 723memory. 724 725 726<h3>The test case</h3> 727 728In this tutorial program, we implement two test cases. The first case is a 729convergence test limited to two space dimensions. It runs a so-called 730isentropic vortex which is transported via a background flow field. The second 731case uses a more exciting setup: We start with a cylinder immersed in a 732channel, using the GridGenerator::channel_with_cylinder() function. Here, we 733impose a subsonic initial field at Mach number of $\mathrm{Ma}=0.307$ with a 734constant velocity in $x$ direction. At the top and bottom walls as well as at 735the cylinder, we impose a no-penetration (i.e., tangential flow) 736condition. This setup forces the flow to re-orient as compared to the initial 737condition, which results in a big sound wave propagating away from the 738cylinder. In upstream direction, the wave travels more slowly (as it 739has to move against the oncoming gas), including a 740discontinuity in density and pressure. In downstream direction, the transport 741is faster as sound propagation and fluid flow go in the same direction, which smears 742out the discontinuity somewhat. Once the sound wave hits the upper and lower 743walls, the sound is reflected back, creating some nice shapes as illustrated 744in the <a href="#Results">results section</a> below. 745