
<br>

<i>
This program was contributed by Martin Kronbichler. Many ideas presented here
are the result of common code development with Niklas Fehn, Katharina Kormann,
Peter Munch, and Svenja Schoeder.

This work was partly supported by the German Research Foundation (DFG) through
the project "High-order discontinuous Galerkin for the exa-scale" (ExaDG)
within the priority program "Software for Exascale Computing" (SPPEXA).
</i>

<a name="Intro"></a>
<h1>Introduction</h1>

This tutorial program solves the Euler equations of fluid dynamics using an
explicit time integrator with the matrix-free framework applied to a
high-order discontinuous Galerkin discretization in space. For details about
the Euler system and an alternative implicit approach, we also refer to the
step-33 tutorial program. You might also want to look at step-69 for
an alternative approach to solving these equations.


<h3>The Euler equations</h3>

The Euler equations are a conservation law, describing the motion of a
compressible inviscid gas,
@f[
\frac{\partial \mathbf{w}}{\partial t} + \nabla \cdot \mathbf{F}(\mathbf{w}) =
\mathbf{G}(\mathbf w),
@f]
where the $d+2$ components of the solution vector are $\mathbf{w}=(\rho, \rho
u_1,\ldots,\rho u_d,E)^{\mathrm T}$. Here, $\rho$ denotes the fluid density,
${\mathbf u}=(u_1,\ldots, u_d)^\mathrm T$ the fluid velocity, and $E$ the
energy density of the gas. The velocity is not directly solved for, but rather
the variable $\rho \mathbf{u}$, the linear momentum (since this is the
conserved quantity).

The Euler flux function, a $(d+2)\times d$ matrix, is defined as
@f[
  \mathbf F(\mathbf w)
  =
  \begin{pmatrix}
  \rho \mathbf{u}\\
  \rho \mathbf{u} \otimes \mathbf{u} + \mathbb{I}p\\
  (E+p)\mathbf{u}
  \end{pmatrix}
@f]
with $\mathbb{I}$ the $d\times d$ identity matrix and $\otimes$ the outer
product; its components denote the mass, momentum, and energy fluxes, respectively.
The right hand side forcing is given by
@f[
  \mathbf G(\mathbf w)
  =
  \begin{pmatrix}
  0\\
  \rho\mathbf{g}\\
  \rho \mathbf{u} \cdot \mathbf{g}
  \end{pmatrix},
@f]
where the vector $\mathbf g$ denotes the direction and magnitude of
gravity. It could, however, also denote any other external force per unit mass
that is acting on the fluid. (Think, for example, of the electrostatic
forces exerted by an external electric field on charged particles.)

The three blocks of equations, the second involving $d$ components, describe
the conservation of mass, momentum, and energy. The pressure is not a
solution variable but needs to be expressed through a "closure relationship"
by the other variables; we here choose the relationship appropriate
for a gas with molecules composed of two atoms, which at moderate
temperatures is given by $p=(\gamma - 1) \left(E-\frac 12 \rho
\mathbf{u}\cdot \mathbf{u}\right)$ with the constant $\gamma = 1.4$.
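
To make the discussion concrete, here is a minimal sketch of how the closure
relationship and the flux matrix $\mathbf F(\mathbf w)$ translate into code.
The helper names `euler_pressure()` and `euler_flux()` are illustrative, not
necessarily the interface of the program below; the functions are templated on
a generic `Number` type so that the same code can serve both `double` and
`VectorizedArray<double>`:
@code
#include <deal.II/base/tensor.h>

using namespace dealii;

// Pressure from the conserved variables w = (rho, rho u, E):
// p = (gamma - 1) (E - |rho u|^2 / (2 rho)) with gamma = 1.4.
template <int dim, typename Number>
Number euler_pressure(const Tensor<1, dim + 2, Number> &w)
{
  const double gamma = 1.4;

  Number momentum_square = Number();
  for (unsigned int d = 0; d < dim; ++d)
    momentum_square += w[1 + d] * w[1 + d];

  return (gamma - 1.) * (w[dim + 1] - 0.5 * momentum_square / w[0]);
}

// The (d+2) x d flux matrix F(w), stored as one Tensor<1,dim> row per
// conserved variable.
template <int dim, typename Number>
Tensor<1, dim + 2, Tensor<1, dim, Number>>
euler_flux(const Tensor<1, dim + 2, Number> &w)
{
  const Number p = euler_pressure<dim>(w);

  Tensor<1, dim + 2, Tensor<1, dim, Number>> flux;
  for (unsigned int d = 0; d < dim; ++d)
    {
      const Number u_d = w[1 + d] / w[0]; // velocity component u_d

      flux[0][d] = w[1 + d];              // mass flux: rho u
      for (unsigned int e = 0; e < dim; ++e)
        flux[e + 1][d] = w[e + 1] * u_d;  // momentum flux: rho u (x) u
      flux[d + 1][d] += p;                // ... plus the I p contribution
      flux[dim + 1][d] = (w[dim + 1] + p) * u_d; // energy flux: (E + p) u
    }
  return flux;
}
@endcode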


<h3>High-order discontinuous Galerkin discretization</h3>

For spatial discretization, we use a high-order discontinuous Galerkin (DG)
discretization, using a solution expansion of the form
@f[
\mathbf{w}_h(\mathbf{x}, t) =
\sum_{j=1}^{n_\text{dofs}} \boldsymbol{\varphi}_j(\mathbf{x}) {w}_j(t).
@f]
Here, $\boldsymbol{\varphi}_j$ denotes the $j$th basis function, written
in vector form with separate shape functions for the different components and
letting $w_j(t)$ go through the density, momentum, and energy variables,
respectively. In this form, the space dependence is contained in the shape
functions and the time dependence in the unknown coefficients $w_j$. As
opposed to the continuous finite element method where some shape functions
span across element boundaries, the shape functions are local to a single
element in DG methods, with a discontinuity from one element to the next. The
connection of the solution from one cell to its neighbors is instead
imposed by the numerical fluxes
specified below. This allows for additional flexibility, for example to
introduce directionality in the numerical method through upwinding.
DG methods are popular methods for solving problems of transport character
because they combine low dispersion errors with controllable dissipation on
barely resolved scales. This makes them particularly attractive for simulation
in the field of fluid dynamics where a wide range of active scales needs to be
represented and inadequately resolved features are prone to disturb the
important well-resolved features. Furthermore, high-order DG methods are
well-suited for modern hardware with the right implementation. At the same
time, DG methods are no silver bullet. In particular when the solution
develops discontinuities (shocks), as is typical for the Euler equations in
some flow regimes, high-order DG methods tend to produce oscillatory solutions,
like all high-order methods when not using flux- or slope-limiters. This is a
consequence of <a
href="https://en.wikipedia.org/wiki/Godunov%27s_theorem">Godunov's theorem</a>
that states that any total variation diminishing (TVD) scheme that is linear
(like a basic DG discretization) can at most be first-order accurate. Put
differently, since DG methods aim for higher order accuracy, they cannot be
TVD on solutions that develop shocks. Even though some communities claim that
the numerical flux in DG methods can control dissipation, this is of limited
value unless <b>all</b> shocks in a problem align with cell boundaries. Any
shock that passes through the interior of cells will again produce oscillatory
components due to the high-order polynomials. In the finite element and DG
communities, there exist a number of different approaches to deal with shocks,
for example the introduction of artificial diffusion on troubled cells (using
a troubled-cell indicator based e.g. on a modal decomposition of the
solution), a switch to dissipative low-order finite volume methods on a
subgrid, or the addition of some limiting procedures. Given the ample
possibilities in this context, combined with the considerable implementation
effort, we here refrain from treating the regime of the Euler equations with
pronounced shocks, and rather concentrate on the regime of subsonic flows with
wave-like phenomena. For a method that works well with shocks (but is more
expensive per unknown), we refer to the step-69 tutorial program.

For the derivation of the DG formulation, we multiply the Euler equations with
test functions $\mathbf{v}$ and integrate over an individual cell $K$, which
gives
@f[
\left(\mathbf{v}, \frac{\partial \mathbf{w}}{\partial t}\right)_{K}
+ \left(\mathbf{v}, \nabla \cdot \mathbf{F}(\mathbf{w})\right)_{K} =
\left(\mathbf{v},\mathbf{G}(\mathbf w)\right)_{K}.
@f]

We then integrate the second term by parts, moving the divergence
from the solution slot to the test function slot, and producing an integral
over the element boundary:
@f[
\left(\mathbf{v}, \frac{\partial \mathbf{w}}{\partial t}\right)_{K}
- \left(\nabla \mathbf{v}, \mathbf{F}(\mathbf{w})\right)_{K}
+ \left<\mathbf{v}, \mathbf{n} \cdot \widehat{\mathbf{F}}(\mathbf{w})
\right>_{\partial K} =
\left(\mathbf{v},\mathbf{G}(\mathbf w)\right)_{K}.
@f]
In the surface integral, we have replaced the term $\mathbf{F}(\mathbf w)$ by
the term $\widehat{\mathbf{F}}(\mathbf w)$, the numerical flux. The role of
the numerical flux is to connect the solution on neighboring elements and
weakly impose continuity of the solution. This ensures that the global
coupling of the PDE is reflected in the discretization, despite independent
basis functions on the cells. The connectivity to the neighbor is included by
defining the numerical flux as a function $\widehat{\mathbf{F}}(\mathbf w^-,
\mathbf w^+)$ of the solution from both sides of an interior face, $\mathbf
w^-$ and $\mathbf w^+$. A basic property we require is that the numerical flux
needs to be <b>conservative</b>. That is, we want all information (i.e.,
mass, momentum, and energy) that leaves a cell over
a face to enter the neighboring cell in its entirety and vice versa. This can
be expressed as $\widehat{\mathbf{F}}(\mathbf w^-, \mathbf w^+) =
\widehat{\mathbf{F}}(\mathbf w^+, \mathbf w^-)$, meaning that the numerical
flux evaluates to the same result from either side. Combined with the fact
that the numerical flux is multiplied by the unit outer normal vector on the
face under consideration, which points in opposite directions from the two
sides, we see that the conservation is fulfilled. An alternative point of view
of the numerical flux is as a single-valued intermediate state that links the
solution weakly from both sides.
167
168There is a large number of numerical flux functions available, also called
169Riemann solvers. For the Euler equations, there exist so-called exact Riemann
170solvers -- meaning that the states from both sides are combined in a way that
171is consistent with the Euler equations along a discontinuity -- and
172approximate Riemann solvers, which violate some physical properties and rely
173on other mechanisms to render the scheme accurate overall. Approxiate Riemann
174solvers have the advantage of beging cheaper to compute. Most flux functions
175have their origin in the finite volume community, which are similar to DG
176methods with polynomial degree 0 within the cells (called volumes). As the
177volume integral of the Euler operator $\mathbf{F}$ would disappear for
178constant solution and test functions, the numerical flux must fully represent
179the physical operator, explaining why there has been a large body of research
180in that community. For DG methods, consistency is guaranteed by higher order
181polynomials within the cells, making the numerical flux less of an issue and
182usually affecting only the convergence rate, e.g., whether the solution
183converges as $\mathcal O(h^p)$, $\mathcal O(h^{p+1/2})$ or $\mathcal
184O(h^{p+1})$ in the $L_2$ norm for polynomials of degree $p$. The numerical
185flux can thus be seen as a mechanism to select more advantageous
186dissipation/dispersion properties or regarding the extremal eigenvalue of the
187discretized and linearized operator, which affect the maximal admissible time
188step size in explicit time integrators.

In this tutorial program, we implement two variants of fluxes that can be
controlled via a switch in the program (of course, it would be easy to make
them a run time parameter controlled via an input file). The first flux is
the local Lax--Friedrichs flux
@f[
\widehat{\mathbf{F}}(\mathbf{w}^-,\mathbf{w}^+) =
\frac{\mathbf{F}(\mathbf{w}^-)+\mathbf{F}(\mathbf{w}^+)}{2} +
   \frac{\lambda}{2}\left[\mathbf{w}^--\mathbf{w}^+\right]\otimes
   \mathbf{n}^-.
@f]

In the original definition of the Lax--Friedrichs flux, a factor $\lambda =
\max\left(\|\mathbf{u}^-\|+c^-, \|\mathbf{u}^+\|+c^+\right)$ is used
(corresponding to the maximal speed at which information is moving on
the two sides of the interface), stating
that the difference between the two states, $[\![\mathbf{w}]\!]$, is penalized
by the largest eigenvalue in the Euler flux, which is $\|\mathbf{u}\|+c$,
where $c=\sqrt{\gamma p / \rho}$ is the speed of sound. In the implementation
below, we modify the penalty term somewhat, given that the penalty is of
approximate nature anyway. We use
@f{align*}{
\lambda
&=
\frac{1}{2}\max\left(\sqrt{\|\mathbf{u}^-\|^2+(c^-)^2},
                     \sqrt{\|\mathbf{u}^+\|^2+(c^+)^2}\right)
\\
&=
\frac{1}{2}\sqrt{\max\left(\|\mathbf{u}^-\|^2+(c^-)^2,
                           \|\mathbf{u}^+\|^2+(c^+)^2\right)}.
@f}
The additional factor $\frac 12$ reduces the penalty strength (which results
in a reduced negative real part of the eigenvalues, and thus increases the
admissible time step size). Using the squares within the sums allows us to
reduce the number of expensive square root operations from four for the
original Lax--Friedrichs definition to a single one.
This simplification leads to at most a factor of
2 in the reduction of the parameter $\lambda$, since $\|\mathbf{u}\|^2+c^2 \leq
\|\mathbf{u}\|^2+2 c \|\mathbf{u}\| + c^2 = \left(\|\mathbf{u}\|+c\right)^2
\leq 2 \left(\|\mathbf{u}\|^2+c^2\right)$, with the last inequality following
from Young's inequality.
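
In code, the modified flux is a direct transcription of the formulas above.
The following sketch reuses the `euler_pressure()` and `euler_flux()` helpers
from the earlier sketch and already contracts the result with the normal
$\mathbf{n}^-$ of the "minus" side (for `Number = VectorizedArray<double>`,
deal.II provides the `std::sqrt()` and `std::max()` overloads used here):
@code
template <int dim, typename Number>
Tensor<1, dim + 2, Number>
lax_friedrichs_flux(const Tensor<1, dim + 2, Number> &w_m,
                    const Tensor<1, dim + 2, Number> &w_p,
                    const Tensor<1, dim, Number>     &normal)
{
  // |u|^2 + c^2 on one side of the face, with c^2 = gamma p / rho
  const auto speed_square = [](const Tensor<1, dim + 2, Number> &w) {
    Number u_square = Number();
    for (unsigned int d = 0; d < dim; ++d)
      u_square += (w[1 + d] / w[0]) * (w[1 + d] / w[0]);
    return u_square + 1.4 * euler_pressure<dim>(w) / w[0];
  };

  const Number lambda =
    0.5 * std::sqrt(std::max(speed_square(w_m), speed_square(w_p)));

  return 0.5 * (euler_flux<dim>(w_m) * normal +
                euler_flux<dim>(w_p) * normal) +
         0.5 * lambda * (w_m - w_p);
}
@endcode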

The second numerical flux is one proposed by Harten, Lax and van Leer, called
the HLL flux. It takes the different directions of propagation of the Euler
equations into account, depending on the speed of sound. It utilizes some
intermediate states $\bar{\mathbf{u}}$ and $\bar{c}$ to define the two
branches $s^\mathrm{p} = \max\left(0, \bar{\mathbf{u}}\cdot \mathbf{n} +
\bar{c}\right)$ and $s^\mathrm{n} = \min\left(0, \bar{\mathbf{u}}\cdot
\mathbf{n} - \bar{c}\right)$. From these branches, one then defines the flux
@f[
\widehat{\mathbf{F}}(\mathbf{w}^-,\mathbf{w}^+) =
\frac{s^\mathrm{p} \mathbf{F}(\mathbf{w}^-)-s^\mathrm{n} \mathbf{F}(\mathbf{w}^+)}
                   {s^\mathrm{p} - s^\mathrm{n}} +
\frac{s^\mathrm{p} s^\mathrm{n}}{s^\mathrm{p}-s^\mathrm{n}}
\left[\mathbf{w}^--\mathbf{w}^+\right]\otimes \mathbf{n}^-.
@f]
Regarding the definition of the intermediate states $\bar{\mathbf{u}}$ and
$\bar{c}$, several variants have been proposed. The variant originally
proposed uses a density-averaged definition of the velocity, $\bar{\mathbf{u}}
= \frac{\sqrt{\rho^-} \mathbf{u}^- + \sqrt{\rho^+}\mathbf{u}^+}{\sqrt{\rho^-}
+ \sqrt{\rho^+}}$. Since we consider the Euler equations without shocks, we
simply use arithmetic means, $\bar{\mathbf{u}} = \frac{\mathbf{u}^- +
\mathbf{u}^+}{2}$ and $\bar{c} = \frac{c^- + c^+}{2}$, with $c^{\pm} =
\sqrt{\gamma p^{\pm} / \rho^{\pm}}$, in this tutorial program, and leave other
variants to a possible extension. We also note that the HLL flux has been
extended in the literature to the so-called HLLC flux, where C stands for the
ability to represent contact discontinuities.
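
A sketch of the HLL flux with these arithmetic-mean intermediate states looks
as follows, again contracted with the normal and with the same illustrative
helpers as before:
@code
template <int dim, typename Number>
Tensor<1, dim + 2, Number>
harten_lax_vanleer_flux(const Tensor<1, dim + 2, Number> &w_m,
                        const Tensor<1, dim + 2, Number> &w_p,
                        const Tensor<1, dim, Number>     &normal)
{
  // arithmetic means of the normal velocity and the speed of sound
  Number u_bar_dot_n = Number();
  for (unsigned int d = 0; d < dim; ++d)
    u_bar_dot_n +=
      0.5 * (w_m[1 + d] / w_m[0] + w_p[1 + d] / w_p[0]) * normal[d];

  const Number c_bar =
    0.5 * (std::sqrt(1.4 * euler_pressure<dim>(w_m) / w_m[0]) +
           std::sqrt(1.4 * euler_pressure<dim>(w_p) / w_p[0]));

  // the two branches s^p >= 0 and s^n <= 0
  const Number s_pos = std::max(Number(), u_bar_dot_n + c_bar);
  const Number s_neg = std::min(Number(), u_bar_dot_n - c_bar);

  return (s_pos * (euler_flux<dim>(w_m) * normal) -
          s_neg * (euler_flux<dim>(w_p) * normal) +
          s_pos * s_neg * (w_m - w_p)) /
         (s_pos - s_neg);
}
@endcode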

At the boundaries with no neighboring state $\mathbf{w}^+$ available, it is
common practice to deduce suitable exterior values from the boundary
conditions (see the general literature on DG methods for details). In this
tutorial program, we consider three types of boundary conditions, namely
<b>inflow boundary conditions</b> where all components are prescribed,
@f[
\mathbf{w}^+ = \begin{pmatrix} \rho_\mathrm{D}(t)\\
(\rho \mathbf u)_{\mathrm D}(t) \\ E_\mathrm{D}(t)\end{pmatrix} \quad
 \text{(Dirichlet)},
@f]
<b>subsonic outflow boundaries</b>, where we do not prescribe exterior
solutions as the flow field is leaving the domain and use the interior values
instead; we still need to prescribe the energy as there is one incoming
characteristic left in the Euler flux,
@f[
\mathbf{w}^+ = \begin{pmatrix} \rho^-\\
(\rho \mathbf u)^- \\ E_\mathrm{D}(t)\end{pmatrix} \quad
 \text{(mixed Neumann/Dirichlet)},
@f]
and <b>wall boundary conditions</b>, which describe a no-penetration
configuration:
@f[
\mathbf{w}^+ = \begin{pmatrix} \rho^-\\
(\rho \mathbf u)^- - 2 [(\rho \mathbf u)^-\cdot \mathbf n] \mathbf{n}
 \\ E^-\end{pmatrix}.
@f]
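
The wall condition, for example, amounts to mirroring the normal component of
the momentum while copying density and energy; a sketch (with an illustrative
name) reads:
@code
template <int dim, typename Number>
Tensor<1, dim + 2, Number>
wall_boundary_state(const Tensor<1, dim + 2, Number> &w_m,
                    const Tensor<1, dim, Number>     &normal)
{
  Number momentum_dot_n = Number();
  for (unsigned int d = 0; d < dim; ++d)
    momentum_dot_n += w_m[1 + d] * normal[d];

  Tensor<1, dim + 2, Number> w_p = w_m; // copy rho and E unchanged
  for (unsigned int d = 0; d < dim; ++d)
    w_p[1 + d] -= 2. * momentum_dot_n * normal[d]; // mirror (rho u) . n
  return w_p;
}
@endcode
The inflow and outflow cases would analogously overwrite all components, or
only the energy component, with the prescribed data.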

The polynomial expansion of the solution is finally inserted into the weak form
and test functions are replaced by the basis functions. This gives a discrete
in space, continuous in time nonlinear system with a finite number of unknown
coefficient values $w_j$, $j=1,\ldots,n_\text{dofs}$. Regarding the choice of
the polynomial degree in the DG method, there is no consensus in the literature
as of 2019 as to what polynomial degrees are most efficient, and the decision
is problem-dependent. Higher order polynomials ensure better convergence rates
and are thus superior for moderate to high accuracy requirements for
<b>smooth</b> solutions. At the same time, the volume-to-surface ratio
of where degrees of freedom are located increases with higher degrees, and
this makes the effect of the numerical flux weaker, typically reducing
dissipation. However, in most cases the solution is not smooth, at least not
compared to the resolution that can be afforded. This is true for example in
incompressible fluid dynamics, compressible fluid dynamics, and the related
topic of wave propagation. In this pre-asymptotic regime, the error is
approximately proportional to the numerical resolution, and other factors such
as dispersion errors or the dissipative behavior become more important. Very
high order methods are often ruled out because they come with more restrictive
CFL conditions measured against the number of unknowns, and they are also not
as flexible when it comes to representing complex geometries. Therefore,
polynomial degrees between two and six are most popular in practice, see e.g.
the efficiency evaluation in @cite FehnWallKronbichler2019 and references
cited therein.

<h3>Explicit time integration</h3>

To discretize in time, we slightly rearrange the weak form and sum over all
cells:
@f[
\sum_{K \in \mathcal T_h} \left(\boldsymbol{\varphi}_i,
\frac{\partial \mathbf{w}}{\partial t}\right)_{K}
=
\sum_{K\in \mathcal T_h}
\left[
\left(\nabla \boldsymbol{\varphi}_i, \mathbf{F}(\mathbf{w})\right)_{K}
-\left<\boldsymbol{\varphi}_i,
\mathbf{n} \cdot \widehat{\mathbf{F}}(\mathbf{w})\right>_{\partial K} +
\left(\boldsymbol{\varphi}_i,\mathbf{G}(\mathbf w)\right)_{K}
\right],
@f]
where $\boldsymbol{\varphi}_i$ runs through all basis functions from 1 to
$n_\text{dofs}$.

We now denote by $\mathcal M$ the mass matrix with entries $\mathcal M_{ij} =
\sum_{K} \left(\boldsymbol{\varphi}_i,
\boldsymbol{\varphi}_j\right)_K$, and by
@f[
\mathcal L_h(t,\mathbf{w}_h) = \left[\sum_{K\in \mathcal T_h}
\left[
\left(\nabla \boldsymbol{\varphi}_i, \mathbf{F}(\mathbf{w}_h)\right)_{K}
- \left<\boldsymbol{\varphi}_i,
\mathbf{n} \cdot \widehat{\mathbf{F}}(\mathbf{w}_h)\right>_{\partial K}
+ \left(\boldsymbol{\varphi}_i,\mathbf{G}(\mathbf w_h)\right)_{K}
\right]\right]_{i=1,\ldots,n_\text{dofs}}
@f]
the operator evaluating the right-hand side of the Euler operator, given a
function $\mathbf{w}_h$ associated with a global vector of unknowns
and the finite element in use. This function $\mathcal L_h$ is explicitly time-dependent as the
numerical flux evaluated at the boundary will involve time-dependent data
$\rho_\mathrm{D}$, $(\rho \mathbf{u})_\mathrm{D}$, and $E_\mathrm{D}$ on some
parts of the boundary, depending on the assignment of boundary
conditions. With this notation, we can write the discrete in space, continuous
in time system compactly as
@f[
\mathcal M \frac{\partial \mathbf{w}_h}{\partial t} =
\mathcal L_h(t, \mathbf{w}_h),
@f]
where we have taken the liberty to also denote the global solution
vector by $\mathbf{w}_h$ (in addition to the corresponding finite
element function). Equivalently, the system above has the form
@f[
\frac{\partial \mathbf{w}_h}{\partial t} =
\mathcal M^{-1} \mathcal L_h(t, \mathbf{w}_h).
@f]

For hyperbolic systems discretized by high-order discontinuous Galerkin
methods, explicit time integration of this system is very popular. This is due
to the fact that the mass matrix $\mathcal M$ is block-diagonal (with each
block corresponding to only variables of the same kind defined on the same
cell) and thus easily inverted. In each time step -- or stage of a
Runge--Kutta scheme -- one only needs to evaluate the differential operator
once using the given data and subsequently apply the inverse of the mass
matrix. For implicit time stepping, on the other hand, one would first have to
linearize the equations and then iteratively solve the linear system, which
involves several residual evaluations and at least a dozen applications of
the linearized operator, as has been demonstrated in the step-33 tutorial
program.

Of course, the simplicity of explicit time stepping comes with a price, namely
conditional stability due to the so-called Courant--Friedrichs--Lewy (CFL)
condition. It states that the time step cannot be larger than the time it
takes information to traverse a cell at the fastest speed of propagation
admitted by the discretized differential operator. In more
modern terms, the speed of propagation corresponds to the largest eigenvalue
in the discretized operator, and in turn depends on the mesh size, the
polynomial degree $p$ and the physics of the Euler operator, i.e., the
eigenvalues of the linearization of $\mathbf F(\mathbf w)$ with respect to
$\mathbf{w}$. In this program, we set the time step as follows:
@f[
\Delta t = \frac{\mathrm{Cr}}{p^{1.5}}\left(\frac{1}
           {\max\left[\frac{\|\mathbf{u}\|}{h_u} + \frac{c}{h_c}\right]}\right),
@f]

with the maximum taken over all quadrature points and all cells. The
dimensionless number $\mathrm{Cr}$ denotes the Courant number and can be
chosen up to a maximally stable number $\mathrm{Cr}_\text{max}$, whose value
depends on the selected time stepping method and its stability properties. The
power $p^{1.5}$ used for the polynomial scaling is heuristic and represents
the closest fit for polynomial degrees between 1 and 8, see e.g.
@cite SchoederKormann2018. In the limit of higher degrees, $p>10$, a scaling of
$p^2$ is more accurate, related to the inverse estimates typically used for
interior penalty methods. Regarding the <i>effective</i> mesh sizes $h_u$ and
$h_c$ used in the formula, we note that the convective transport is
directional. Thus an appropriate scaling is to use the element length in the
direction of the velocity $\mathbf u$. The code below derives this scaling
from the inverse of the Jacobian from the reference to real cell, i.e., we
approximate $\frac{\|\mathbf{u}\|}{h_u} \approx \|J^{-1} \mathbf
u\|_{\infty}$. The acoustic waves, instead, are isotropic in character, which
is why we use the smallest feature size, represented by the smallest singular
value of $J$, for the acoustic scaling $h_c$. Finally, we need to add the
convective and acoustic limits, as the Euler equations can transport
information with speed $\|\mathbf{u}\|+c$.
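
At a single quadrature point, the transport speed estimate entering the
denominator could be computed as in the following sketch. The inverse Jacobian
would come from the matrix-free mapping data, and the time step results from a
global max-reduction over all points and processes; note that the stand-in
used below for the smallest singular value of $J$ is an assumption of this
sketch, not necessarily what the program does:
@code
// Convective plus acoustic speed estimate at one quadrature point:
// ||J^{-1} u||_inf approximates |u| / h_u; for c / h_c we take the largest
// row norm of J^{-1} as a cheap stand-in for 1 / (smallest singular value
// of J).
template <int dim, typename Number>
Number transport_speed_estimate(const Tensor<2, dim, Number> &inverse_jacobian,
                                const Tensor<1, dim, Number> &velocity,
                                const Number                  speed_of_sound)
{
  const Tensor<1, dim, Number> J_inv_u = inverse_jacobian * velocity;

  Number convective = Number();
  Number acoustic   = Number();
  for (unsigned int d = 0; d < dim; ++d)
    {
      convective = std::max(convective, std::abs(J_inv_u[d]));

      Number row_norm_square = Number();
      for (unsigned int e = 0; e < dim; ++e)
        row_norm_square += inverse_jacobian[d][e] * inverse_jacobian[d][e];
      acoustic = std::max(acoustic, std::sqrt(row_norm_square));
    }
  return convective + speed_of_sound * acoustic;
}

// dt = courant_number / std::pow(degree, 1.5) / (global max of the above)
@endcode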

In this tutorial program, we use a specific variant of <a
href="https://en.wikipedia.org/wiki/Runge%E2%80%93Kutta_methods">explicit
Runge--Kutta methods</a>, which in general use the following update procedure
from the state $\mathbf{w}_h^{n}$ at time $t^n$ to the new time $t^{n+1}$ with
$\Delta t = t^{n+1}-t^n$:
@f[
\begin{aligned}
\mathbf{k}_1 &= \mathcal M^{-1} \mathcal L_h\left(t^n, \mathbf{w}_h^n\right),
\\
\mathbf{k}_2 &= \mathcal M^{-1} \mathcal L_h\left(t^n+c_2\Delta t,
                       \mathbf{w}_h^n + a_{21} \Delta t \mathbf{k}_1\right),
\\
&\vdots \\
\mathbf{k}_s &= \mathcal M^{-1} \mathcal L_h\left(t^n+c_s\Delta t,
  \mathbf{w}_h^n + \sum_{j=1}^{s-1} a_{sj} \Delta t \mathbf{k}_j\right),
\\
\mathbf{w}_h^{n+1} &= \mathbf{w}_h^n + \Delta t\left(b_1 \mathbf{k}_1 +
b_2 \mathbf{k}_2 + \ldots + b_s \mathbf{k}_s\right).
\end{aligned}
@f]
The vectors $\mathbf{k}_i$, $i=1,\ldots,s$, in an $s$-stage scheme are
evaluations of the operator at some intermediate state and used to define the
end-of-step value $\mathbf{w}_h^{n+1}$ via some linear combination. The scalar
coefficients in this scheme, $c_i$, $a_{ij}$, and $b_j$, are defined such that
certain conditions are satisfied for higher order schemes, the most basic one
being $c_i = \sum_{j=1}^{i-1}a_{ij}$. The parameters are typically collected in
the form of a so-called <a
href="https://en.wikipedia.org/wiki/Runge%E2%80%93Kutta_methods#Explicit_Runge%E2%80%93Kutta_methods">Butcher
tableau</a> that collects all of the coefficients that define the
scheme. For a five-stage scheme, it would look like this:
@f[
\begin{array}{c|ccccc}
0 \\
c_2 & a_{21} \\
c_3 & a_{31} & a_{32} \\
c_4 & a_{41} & a_{42} & a_{43} \\
c_5 & a_{51} & a_{52} & a_{53} & a_{54} \\
\hline
& b_1 & b_2 & b_3 & b_4 & b_5
\end{array}
@f]

In this tutorial program, we use a subset of explicit Runge--Kutta methods,
so-called low-storage Runge--Kutta methods (LSRK), which assume additional
structure in the coefficients. In the variant used by reference
@cite KennedyCarpenterLewis2000, the assumption is to use Butcher tableaus of
the form
@f[
\begin{array}{c|ccccc}
0 \\
c_2 & a_1 \\
c_3 & b_1 & a_2 \\
c_4 & b_1 & b_2 & a_3 \\
c_5 & b_1 & b_2 & b_3 & a_4 \\
\hline
& b_1 & b_2 & b_3 & b_4 & b_5
\end{array}
@f]
With such a definition, the update to $\mathbf{w}_h^n$ shares the storage with
the information for the intermediate values $\mathbf{k}_i$. Starting with
$\mathbf{w}_h^{n+1}=\mathbf{w}_h^n$ and $\mathbf{r}_1 = \mathbf{w}_h^n$, the
update in each of the $s$ stages simplifies to
@f[
\begin{aligned}
\mathbf{k}_i &=
\mathcal M^{-1} \mathcal L_h\left(t^n+c_i\Delta t, \mathbf{r}_{i} \right),\\
\mathbf{r}_{i+1} &= \mathbf{w}_h^{n+1} + \Delta t \, a_i \mathbf{k}_i,\\
\mathbf{w}_h^{n+1} &= \mathbf{w}_h^{n+1} + \Delta t \, b_i \mathbf{k}_i.
\end{aligned}
@f]
Besides the vector $\mathbf w_h^{n+1}$ that is successively updated, this scheme
only needs two auxiliary vectors, namely the vector $\mathbf{k}_i$ to hold the
evaluation of the differential operator, and the vector $\mathbf{r}_i$ that
holds the right-hand side for the differential operator application. In
subsequent stages $i$, the values $\mathbf{k}_i$ and $\mathbf{r}_i$ can use
the same storage.
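
Expressed with deal.II vectors, one time step of this scheme is only a few
lines. The sketch below assumes a function `evaluate_operator(t, r, k)`
computing $\mathbf k = \mathcal M^{-1} \mathcal L_h(t, \mathbf r)$ and arrays
`a`, `b`, `c` holding the tableau coefficients (with `c[0] = 0`); all names
are illustrative:
@code
// `solution` plays the role of w_h^{n+1} and is updated in place; r and k
// are the two auxiliary vectors of the text.
LinearAlgebra::distributed::Vector<Number> r(solution), k;
k.reinit(solution);
for (unsigned int stage = 0; stage < n_stages; ++stage)
  {
    evaluate_operator(time + c[stage] * dt, r, k); // k_i = M^{-1} L_h(t, r_i)
    if (stage + 1 < n_stages)
      {
        r = solution;
        r.add(a[stage] * dt, k); // r_{i+1} = w^{n+1} + dt a_i k_i
      }
    solution.add(b[stage] * dt, k); // w^{n+1} += dt b_i k_i
  }
@endcode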

The main advantages of low-storage variants are the reduced memory consumption
on the one hand (if a very large number of unknowns must be fit in memory,
holding all $\mathbf{k}_i$ to compute subsequent updates can be a limit
already for $s$ in between five and eight -- recall that we are using
an explicit scheme, so we do not need to store any matrices that are
typically much larger than a few vectors), and the reduced memory access on
the other. In this program, we are particularly interested in the latter
aspect. Since the cost of operator evaluation is only a small multiple of the
cost of simply streaming the input and output vector from memory with the
optimized matrix-free methods of deal.II, we must consider the cost of vector
updates, and low-storage variants can deliver up to twice the throughput of
conventional explicit Runge--Kutta methods for this reason, see e.g. the
analysis in @cite SchoederKormann2018.

Besides three variants for third, fourth and fifth order accuracy from the
reference @cite KennedyCarpenterLewis2000, we also use a fourth-order accurate
variant with seven stages that was optimized for acoustics setups from
@cite TseliosSimos2007. Acoustic problems are one of the interesting aspects of
the subsonic regime of the Euler equations where compressibility leads to the
transmission of sound waves; often, one uses further simplifications of the
linearized Euler equations around a background state or the acoustic wave
equation around a fixed frame.


<h3>Fast evaluation of integrals by matrix-free techniques</h3>

The major ingredients used in this program are the fast matrix-free techniques
we use to evaluate the operator $\mathcal L_h$ and the inverse mass matrix
$\mathcal M^{-1}$. Actually, the term <i>matrix-free</i> is a slight misnomer,
because we are working with a nonlinear operator and do not linearize the
operator that in turn could be represented by a matrix. However, fast
evaluation of integrals has become popular as a replacement of sparse
matrix-vector products, as shown in step-37 and step-59, and we have coined
this infrastructure <i>matrix-free functionality</i> in deal.II for this
reason. Furthermore, the inverse mass matrix is indeed applied in a
matrix-free way, as detailed below.

The matrix-free infrastructure allows us to quickly evaluate the integrals in
the weak forms. The ingredients are the fast interpolation from solution
coefficients into values and derivatives at quadrature points, point-wise
operations at quadrature points (where we implement the differential operator
as derived above), as well as multiplication by all test functions and
summation over quadrature points. The first and third component make use of
sum factorization and have been extensively discussed in the step-37 tutorial
program for the cell integrals and step-59 for the face integrals. The only
difference is that we now deal with a system of $d+2$ components, rather than
the scalar systems in previous tutorial programs. In the code, all that
changes is a template argument of the FEEvaluation and FEFaceEvaluation
classes, the one to set the number of components. The access to the vector is
the same as before, all handled transparently by the evaluator. We also note
that the variant with a single evaluator chosen in the code below is not the
only choice -- we could also have used separate evaluators for the separate
components $\rho$, $\rho \mathbf u$, and $E$; given that we treat all
components similarly (also reflected in the way we state the equation as a
vector system), using separate evaluators would only be more complicated
here. As before, the FEEvaluation class provides explicit vectorization by
combining the operations on several cells (and faces), involving data types
called VectorizedArray. Since the arithmetic operations are overloaded for
this type, we do not have to bother with it all that much, except for the
evaluation of functions through the Function interface, where we need to
provide particular <i>vectorized</i> evaluations for several quadrature point
locations at once.
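
Such a vectorized evaluation can be written as a small helper that loops over
the SIMD lanes of a vectorized point, fills a plain Point<dim>, and calls the
scalar Function interface; a sketch (the name is illustrative):
@code
#include <deal.II/base/function.h>
#include <deal.II/base/point.h>
#include <deal.II/base/vectorization.h>

using namespace dealii;

template <int dim, typename Number>
VectorizedArray<Number>
evaluate_function(const Function<dim>                       &function,
                  const Point<dim, VectorizedArray<Number>> &p_vectorized,
                  const unsigned int                         component)
{
  VectorizedArray<Number> result;
  for (unsigned int v = 0; v < VectorizedArray<Number>::size(); ++v)
    {
      Point<dim> p;
      for (unsigned int d = 0; d < dim; ++d)
        p[d] = p_vectorized[d][v];
      result[v] = function.value(p, component);
    }
  return result;
}
@endcode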

A more substantial change in this program is the operation at quadrature
points: Here, the multi-component evaluators provide us with return types not
discussed before. Whereas FEEvaluation::get_value() would return a scalar
(more precisely, a VectorizedArray type due to vectorization across cells) for
the Laplacian of step-37, it now returns a type that is
`Tensor<1,dim+2,VectorizedArray<Number>>`. Likewise, the gradient type is now
`Tensor<1,dim+2,Tensor<1,dim,VectorizedArray<Number>>>`, where the outer
tensor collects the `dim+2` components of the Euler system, and the inner
tensor the partial derivatives in the various directions. For example, the
flux $\mathbf{F}(\mathbf{w})$ of the Euler system is of this type. In order to
reduce the amount of code we have to write for spelling out these types, we
use the C++ `auto` keyword where possible.
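
Schematically, the cell contribution to $\mathcal L_h$ then becomes quite
compact. In the sketch below, `data`, `src`, `dst`, and `cell_range` are
assumed to come from the surrounding MatrixFree cell loop, `euler_flux()` is
the helper sketched earlier, and the body-force term $\mathbf G$ is omitted:
@code
FEEvaluation<dim, degree, n_q_points_1d, dim + 2, Number> phi(data);
for (unsigned int cell = cell_range.first; cell < cell_range.second; ++cell)
  {
    phi.reinit(cell);
    phi.gather_evaluate(src, EvaluationFlags::values);
    for (unsigned int q = 0; q < phi.n_q_points; ++q)
      {
        // w_q has type Tensor<1,dim+2,VectorizedArray<Number>>
        const auto w_q = phi.get_value(q);
        // testing F(w) against the test function gradients produces the
        // cell term (grad phi_i, F(w))_K
        phi.submit_gradient(euler_flux<dim>(w_q), q);
      }
    phi.integrate_scatter(EvaluationFlags::gradients, dst);
  }
@endcode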

From an implementation point of view, the nonlinearity is not a big
difficulty: It is introduced naturally as we express the terms of the Euler
weak form, for example in the form of the momentum term $\rho \mathbf{u}
\otimes \mathbf{u}$. To obtain this expression, we first deduce the velocity
$\mathbf{u}$ from the momentum variable $\rho \mathbf{u}$. Given that $\rho
\mathbf{u}$ is represented as a $p$-degree polynomial, as is $\rho$, the
velocity $\mathbf{u}$ is a rational expression in terms of the reference
coordinates $\hat{\mathbf{x}}$. As we perform the multiplication $(\rho
\mathbf{u})\otimes \mathbf{u}$, we obtain an expression that is the
ratio of two polynomials, with polynomial degree $2p$ in the
numerator and polynomial degree $p$ in the denominator. Combined with the
gradient of the test function, the integrand is of degree $3p$ in the
numerator and $p$ in the denominator already for affine cells, i.e.,
for parallelograms/parallelepipeds.
For curved cells, additional polynomial and rational expressions
appear when multiplying the integrand by the determinant of the Jacobian of
the mapping. At this point, one usually needs to give up on insisting on exact
integration, and take whatever accuracy the Gaussian (more precisely,
Gauss--Legendre) quadrature provides. The situation is then similar to the one
for the Laplace equation, where the integrand contains rational expressions on
non-affine cells and is also only integrated approximately. As these formulas
only integrate polynomials exactly, we have to live with the <a
href="https://mathoverflow.net/questions/26018/what-are-variational-crimes-and-who-coined-the-term">variational
crime</a> in the form of an integration error.

While inaccurate integration is usually tolerable for elliptic problems, for
hyperbolic problems inexact integration causes some headache in the form of an
effect called <b>aliasing</b>. The term comes from signal processing and
expresses the situation of inappropriate, too coarse sampling. In terms of
quadrature, the inappropriate sampling means that we use too few quadrature
points compared to what would be required to accurately sample the
variable-coefficient integrand. It has been shown in the DG literature that
aliasing errors can introduce unphysical oscillations in the numerical
solution for <i>barely</i> resolved simulations. The fact that aliasing mostly
affects coarse resolutions -- whereas finer meshes with the same scheme
work fine -- is not surprising because well-resolved simulations
tend to be smooth on length-scales of a cell (i.e., they have
small coefficients in the higher polynomial degrees that are missed by
too few quadrature points, whereas the main solution contribution in the lower
polynomial degrees is still well-captured -- this is simply a consequence of
Taylor's theorem). To address this topic, various approaches have been proposed
in the DG literature. One technique is filtering, which damps the solution
components pertaining to higher polynomial degrees. As the chosen nodal basis
is not hierarchical, this would mean to transform from the nodal basis into a
hierarchical one (e.g., a modal one based on Legendre polynomials) where the
contributions within a cell are split by polynomial degrees. In that basis,
one could then multiply the solution coefficients associated with higher
degrees by a small number, keep the lower ones intact (to not destroy
consistency), and then transform back to the nodal basis. However, filters
reduce the accuracy of the method. Another, in some sense simpler, strategy is
to use more quadrature points to capture non-linear terms more accurately.
Using more than $p+1$ quadrature points per coordinate direction is sometimes
called over-integration or consistent integration. The latter name is most
common in the context of the incompressible Navier--Stokes equations, where the
$\mathbf{u}\otimes \mathbf{u}$ nonlinearity results in polynomial integrands
of degree $3p$ (when also considering the test function), which can be
integrated exactly with $\textrm{floor}\left(\frac{3p}{2}\right)+1$ quadrature
points per direction as long as the element geometry is affine. In the context
of the Euler equations with non-polynomial integrands, the choice is less
clear. Depending on the variation in the various variables, both
$\textrm{floor}\left(\frac{3p}{2}\right)+1$ and $2p+1$ points (integrating
exactly polynomials of degree $3p$ or $4p$, respectively) are common.

To reflect this variability in the choice of quadrature in the program, we
keep the number of quadrature points a variable to be specified just as the
polynomial degree, and note that one would make different choices depending
also on the flow configuration. The default choice is $p+2$ points -- a bit
more than the minimum possible of $p+1$ points. The FEEvaluation and
FEFaceEvaluation classes allow us to seamlessly change the number of points by
a template parameter, such that the program does not get more complicated
because of that.
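
For instance, switching from the default choice to consistent integration of
the most nonlinear terms is a matter of changing one template argument (a
sketch for a compile-time `degree`; the quadrature formulas handed to
MatrixFree must of course be set up to match the respective index):
@code
// default: degree + 2 quadrature points per direction (quadrature index 0)
FEEvaluation<dim, degree, degree + 2, dim + 2, Number> phi(data, 0, 0);
// over-integration with 2 * degree + 1 points (quadrature index 1)
FEEvaluation<dim, degree, 2 * degree + 1, dim + 2, Number> phi_over(data, 0, 1);
@endcode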


<h3>Evaluation of the inverse mass matrix with matrix-free techniques</h3>

The last ingredient is the evaluation of the inverse mass matrix $\mathcal
M^{-1}$. In DG methods with explicit time integration, mass matrices are
block-diagonal and thus easily inverted -- one only needs to invert the
diagonal blocks. However, given the fact that matrix-free evaluation of
integrals is closer in cost to the access of the vectors only, even the
application of a block-diagonal matrix (e.g. via an array of LU factors) would
be several times more expensive than evaluation of $\mathcal L_h$
simply because just storing and loading matrices of size
`dofs_per_cell` times `dofs_per_cell` for higher order finite elements
repeatedly is expensive. As this is
clearly undesirable, part of the community has moved to bases where the mass
matrix is diagonal, for example the <i>L<sub>2</sub></i>-orthogonal Legendre basis using
hierarchical polynomials or Lagrange polynomials on the points of the Gaussian
quadrature (which is just another way of utilizing Legendre
information). While the diagonal property breaks down for deformed elements,
the error made by taking a diagonal mass matrix and ignoring the rest (a
variant of mass lumping, though not the one with an additional integration
error as utilized in step-48) has been shown to not alter discretization
accuracy. The Lagrange basis in the points of Gaussian quadrature is sometimes
also referred to as a collocation setup, as the nodal points of the
polynomials coincide (= are "co-located") with the points of quadrature, obviating some
interpolation operations. Given the fact that we want to use more quadrature
points for nonlinear terms in $\mathcal L_h$, however, the collocation
property is lost. (More precisely, it is still used in FEEvaluation and
FEFaceEvaluation after a change of basis, see the matrix-free paper
@cite KronbichlerKormann2019.)

In this tutorial program, we use the collocation idea for the application of
the inverse mass matrix, but with a slight twist. Rather than using the
collocation via Lagrange polynomials in the points of Gaussian quadrature, we
prefer a conventional Lagrange basis in Gauss-Lobatto points as those make the
evaluation of face integrals cheap. This is because for Gauss-Lobatto
points, some of the node points are located on the faces of the cell
and it is not difficult to show that on any given face, the only shape
functions with non-zero values are exactly the ones whose node points
are in fact located on that face. One could of course also use the
Gauss-Lobatto quadrature (with some additional integration error) as was done
in step-48, but we do not want to sacrifice accuracy as these
quadrature formulas are generally of lower order than the general
Gauss quadrature formulas. Instead, we use an idea described in the reference
@cite KronbichlerSchoeder2016 where it was proposed to change the basis for the
sake of applying the inverse mass matrix. Let us denote by $S$ the matrix of
shape functions evaluated at quadrature points, with shape functions in the
rows of the matrix and quadrature points in the columns. Then, the mass matrix
on a cell $K$ is given by
@f[
\mathcal M^K = S J^K S^\mathrm T.
@f]
Here, $J^K$ is the diagonal matrix with the determinant of the Jacobian times
the quadrature weight (JxW) as entries. The matrix $S$ is constructed as the
Kronecker product (tensor product) of one-dimensional matrices, e.g. in 3D as
@f[
S = S_{\text{1D}}\otimes S_{\text{1D}}\otimes S_{\text{1D}},
@f]
which is the result of the basis functions being a tensor product of
one-dimensional shape functions and the quadrature formula being the tensor
product of 1D quadrature formulas. For the case that the number of polynomials
equals the number of quadrature points, all matrices in $S J^K S^\mathrm T$
are square, and so are the factors of $S$ in the Kronecker product.
Thus, one can invert each matrix to form the overall inverse,
@f[
\left(\mathcal M^K\right)^{-1} = \left(S_{\text{1D}}^{-\mathrm T}\otimes
S_{\text{1D}}^{-\mathrm T}\otimes S_{\text{1D}}^{-\mathrm T}\right)
\left(J^K\right)^{-1}
\left(S_{\text{1D}}^{-1}\otimes S_{\text{1D}}^{-1}\otimes S_{\text{1D}}^{-1}\right).
@f]
This formula is of exactly the same structure as the steps in the forward
evaluation of integrals with sum factorization techniques (i.e., the
FEEvaluation and MatrixFree framework of deal.II). Hence, we can utilize the
same code paths with a different interpolation matrix,
$S_{\mathrm{1D}}^{-\mathrm{T}}$ rather than $S_{\mathrm{1D}}$.

The class MatrixFreeOperators::CellwiseInverseMassMatrix implements this
operation: It changes from the basis contained in the finite element (in this
case, FE_DGQ) to the Lagrange basis in Gaussian quadrature points. Here, the
inverse of a diagonal mass matrix can be evaluated, which is simply the inverse
of the `JxW` factors (i.e., the quadrature weight times the determinant of the
Jacobian from reference to real coordinates). Once this is done, we can change
back to the standard nodal Gauss-Lobatto basis.
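
In the program, the application of $\mathcal M^{-1}$ then reduces to a few
calls per cell. The following sketch uses the interface of recent deal.II
versions and assumes that the MatrixFree object was set up with a Gauss
quadrature of `degree + 1` points for the quadrature index used here:
@code
FEEvaluation<dim, degree, degree + 1, dim + 2, Number> phi(data);
MatrixFreeOperators::CellwiseInverseMassMatrix<dim, degree, dim + 2, Number>
  inverse(phi);

for (unsigned int cell = cell_range.first; cell < cell_range.second; ++cell)
  {
    phi.reinit(cell);
    phi.read_dof_values(src);
    // change of basis, multiplication by the inverse JxW factors, and
    // change back, all working in-place on the cell-local dof values
    inverse.apply(phi.begin_dof_values(), phi.begin_dof_values());
    phi.set_dof_values(dst);
  }
@endcode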

The advantage of this particular way of applying the inverse mass matrix is
a cost similar to the forward application of a mass matrix, which is cheaper
than the evaluation of the spatial operator $\mathcal L_h$
with over-integration and face integrals. (We
will demonstrate this with detailed timing information in the
<a href="#Results">results section</a>.) In fact, it
is so cheap that it is limited by the bandwidth of reading the source vector,
reading the diagonal, and writing into the destination vector on most modern
architectures. The hardware used for the results section allows us to perform
the computations at least twice as fast as the vectors can be streamed from
memory.


<h3>The test case</h3>

In this tutorial program, we implement two test cases. The first case is a
convergence test limited to two space dimensions. It runs a so-called
isentropic vortex which is transported via a background flow field. The second
case uses a more exciting setup: We start with a cylinder immersed in a
channel, using the GridGenerator::channel_with_cylinder() function. Here, we
impose a subsonic initial field at a Mach number of $\mathrm{Ma}=0.307$ with a
constant velocity in the $x$ direction. At the top and bottom walls as well as
at the cylinder, we impose a no-penetration (i.e., tangential flow)
condition. This setup forces the flow to re-orient as compared to the initial
condition, which results in a big sound wave propagating away from the
cylinder. In the upstream direction, the wave travels more slowly (as it
has to move against the oncoming gas), including a
discontinuity in density and pressure. In the downstream direction, the
transport is faster as sound propagation and fluid flow go in the same
direction, which smears out the discontinuity somewhat. Once the sound wave
hits the upper and lower walls, the sound is reflected back, creating some
nice shapes as illustrated in the <a href="#Results">results section</a> below.