High-order semi-Lagrangian kinetic scheme for compressible turbulence

Turbulent compressible flows are traditionally simulated using explicit time integrators applied to discretized versions of the Navier-Stokes equations. However, the associated Courant-Friedrichs-Lewy condition severely restricts the maximum time step size. Exploiting the Lagrangian nature of the Boltzmann equation's material derivative, we now introduce a feasible three-dimensional semi-Lagrangian lattice Boltzmann method (SLLBM), which circumvents this restriction. While many lattice Boltzmann methods for compressible flows were restricted to two dimensions due to the enormous number of discrete velocities in three dimensions, the SLLBM uses only 45 discrete velocities. Based on compressible Taylor-Green vortex simulations we show that the new method accurately captures shocks or shocklets as well as turbulence in 3D without utilizing additional filtering or stabilizing techniques, even when the time step sizes are up to two orders of magnitude larger compared to simulations in the literature. Our new method therefore enables researchers for the first time to study compressible turbulent flows by a fully explicit scheme, whose range of admissible time step sizes is only dictated by physics, while being decoupled from the spatial discretization.

Numerical simulations have become an indispensable tool to understand their physics, and many studies exploring compressible turbulent flows have been conducted using high-order compact finite difference, optimized dispersion-relation preserving schemes [19,[24][25][26][27][28][29][30] for the spatial derivatives, often combined with low-dispersiondissipation Runge-Kutta schemes for time-integration [19,31,32]. Although these methods provide accurate results, the time steps are generally small [33], because of the methods' Eulerian time derivatives, which describe how the variables of interest pass through fixed locations in the field. Thus, the admissible time step sizes are tightly linked to spatial resolution. This issue is for many discretizations linked to the Courant-Friedrichs-Lewy (CFL) condition, using linear stability theory, relating a characteristic velocity c to the spatial and temporal discretization intervals δ x and δ t , respectively (see [34], for example). Though implicit time integration schemes often provide larger stability domains, their application can be unfeasible for transient problems due to their computational cost. Explicit time integration schemes with schemespecific CFL max , by contrast, enforce small time steps δ t for high flow velocities, typically occurring in many high-speed compressible flows. Another obvious way to circumvent the CFL condition in Eq. (1) is to incorporate Lagrangian time derivatives, which track the motion of the variables of interest moving through the domain. In practice, Semi-Lagrangian (SL) schemes are used instead, which provide a viable alternative to the discretization of Eulerian time derivatives. SL schemes discretize the Lagrangian solution by tracking the trajectories back in time. The prefix "semi" indicates that the trajectories' end points usually do not coincide with the simulation grid points, which requires application of an appropriate interpolation scheme. SL methods were successfully incorporated in algorithms solving the Navier-Stokes equations [35], although tracking of the fluid trajectories was often found to be cumbersome, introducing additional errors [36]. The major advantage when using SL schemes in kinetic theory is that the trajectories are linear, resulting in cancellation of the tracking error. Consequently, SL schemes were both applied to the Vlasov equation [37][38][39] and to the Bhatnagar-Gross-Krook (BGK)-Boltzmann equation [40,41]. Recently, we introduced the semi-Lagrangian lattice Boltzmann method (SLLBM) [42,43] for compressible flows [44], which solves the lattice Boltzmann equation using a high-order SL streaming step.
In this article, we explore the capabilities of the SLLBM for three-dimensional compressible flows. Furthermore, we demonstrate that the SLLBM remains stable for time step sizes that exceed typical CFL constraints of Eulerian solvers by orders of magnitude. To yield a lean scheme, the SLLBM is combined with state-of-theart cubature rules for the velocity discretization [45][46][47]. This combination proves capable of modeling compressible turbulence with time steps that are at least one order of magnitude larger than in standard Eulerian methods and decouple the spatial from the temporal discretization.

A. Background
We start with a critical look at the more specialized lattice Boltzmann method (LBM) [48]. Despite the successes of the standard LBM in the computation of multiphase [49], particle-laden [50], thermal [51], or turbulent flows [52,53], compressible LBM [54] were overlooked for a relatively long time, but regained attraction during the last decade [55][56][57][58][59][60][61][62][63][64][65][66][67][68]. Let us recall the force-free BGK-Boltzmann equation with the continuous distribution function f , the equilibrium distribution function f eq , the particle velocity ξ, and the relaxation time λ. To discretize Eq. (2), the original LBM is based on three key principles. First, the equilibrium distribution function f eq is polynomially expanded into a series of Hermite polynomials H (n) , with expansion coefficients being the equilibrium moments a (n) eq [69], where N is the expansion order and ω(ξ) the weight function. Since a (n) eq and H (n) are symmetric tensors of rank n, the product involves contraction to all D n scalar components, depending on dimension D. Second, a Gauß-Hermite quadrature is applied to the unbounded velocity space of the Boltzmann equation, leading to discrete particle velocity sets [69]. The moments are then found by the quadrature with the weighted discrete distribution functions f i = w i f (ξ i )/ω(ξ i ). The combination of Q discrete particle velocities ξ i and weights w i , the velocity set, is usually derived by the Gauß-product rule applied to a onedimensional Gauß-Hermite quadrature. Third, the discrete Boltzmann equation is integrated along characteristics with an inherent Lagrangian discretization of the Boltzmann equation's material derivative to obtain a stable numerical scheme and second-order temporal convergence [70]. Unfortunately, the LBM in its original formulation is mainly restricted to Cartesian grids and velocity discretizations that match the regular lattices. The customary "D2Q9" based on second-order expansion in Eq.
(3) is plagued by a cubic error being proportional to the Mach number. Consequently, compressible simulations either demand correction terms that annihilate the errors and restore Galilean invariance [64,71] or higher-order discretizations of Eqs. (3) and (4) which render abscissae that reside off-lattice. Utilization of such velocity sets therefore requires an efficient off-lattice Boltzmann solver. Previous Eulerian off-lattice Boltzmann schemes [72][73][74][75][76], like finite difference or finite volume LBM, would be suited in principle. However, they sacrifice the Lagrangian time integration along characteristics. Moreover, their time step is severely restricted by a CFL condition, Eq. (1), with respect to the fast discrete particle velocities.
In contrast, the SLLBM preserves all of the aforementioned key principles of the LBM but it also decisively extends its capabilities. In previous works [42][43][44] we have shown that a high-order interpolation increases the spatial order of the method and nihilates mass losses. Also, we have demonstrated the unconditional stability of the advection step, when incorporating Gauß-Lobatto-Chebyshev nodes for the interpolation up to third order, and that the stability is practically not affected even with fourth order. The flexibility in terms of meshing and velocity sets encouraged us to search for efficient quadrature rules solving the weight function. This research led us to long-established cubature rules [45][46][47], i.e. multivariate quadratures, which are often used in Kalman filters, e.g., in [77].

A. Compressible semi-Lagrangian LBM
The compressible SLLBM uses the established lattice Boltzmann equation with the BGK collision operator [42] Here, h i denotes either f i or the second distribution function g i related to the variable heat capacity ratio γ. The shifted dimensionless relaxation parameter τ = λ/δ t +0.5 = µ/(P c s δ t )+0.5 depends on dynamic viscosity µ, lattice speed of sound c s , and pressure P = ρT with density ρ and temperature T . The discrete equilibrium distribution function f eq i is To adjust the heat conductivity, a quasiequilibrium approach [78] is applied to Eq. (5), for more details see Appendix D.
Density ρ, momentum ρu, and energy E are determined by Appendix C demonstrates the connection of the present SLLBM model with the macroscopic equations using a Chapman-Enskog analysis [79]. The integration along characteristics, hidden in Eq. (5), incorporates a secondorder temporal error, whose order can be increased by multistep schemes [80]. In standard LBMs, the particle velocities ξ i in Eq. (5) are designed to end up on one of the neighboring nodes, and the time step size is invariably set to unity for the same purpose. By contrast, the SLLBM's particle distribution functions are still integrated along characteristics, but the departure points may reside offside the grid, i.e. they are off-lattice. To recover the off-lattice values an interpolation is needed. While several interpolation strategies are possible, we chose a cell-oriented approach, which means that once a departure point is identified, the degrees of freedom pointsĥ iΞj in the enclosing cell are used for the interpolation in cell Ξ and with the basis functions ψ Ξj . A threedimensional reference cell with polynomial order p = 4 is shown in Fig. 1. For each of the N support points in the simulation, there are Q particle velocities, i.e. there are N · Q departure points to be identified. Therefore, at the beginning of the simulation the path from each support point to the corresponding Q departure points is tracked through all adjacent cells. Then a sparse matrix Ψ stores the interpolation coefficients ψ Ξj belonging to the departure point's position; the algorithm is presented in [43]. The actual streaming step is expressed as a matrix-vector whereas the collision step remains local.

B. Cubature-based velocity sets
The discretization of the velocity space is a key principle for any simulation with the lattice Boltzmann method. If a quadrature is to be applied, it must be suited to integrate the weight function exp(−x 2 ), and it has to be of ninth degree to enable compressible flow simulations [81]. A prominent method to derive two-and three-dimensional velocity sets is the Gauß-product rule applied to a one-dimensional quadrature. Application to the one-dimensional degree-nine Gauß-Hermite quadrature delivers a two-dimensional D2Q25 off-lattice velocity set with 25 abscissae, which we used for previous work [44]. Due to its structure, this velocity set is infeasible for standard on-lattice streaming but perfectly suited for the SLLBM.
For the simulations in this work we used velocity sets derived by cubature rules [45,46], exhibiting the same degree of precision but consisting of fewer support points to lower computational cost. In two dimensions we employed the degree-nine D2Q19 velocity set with 19 abscissae [82] that we have presented in recent work [47]. In three dimensions, the Gauß-product rule led to noncompetitive 125 abscissae for a three-dimensional degree-nine D3Q125 velocity set. Therefore, we successfully identified and implemented a degree-nine D3Q45 velocity set with only 45 components, which fulfills all requirements in terms of symmetry. The D3Q45 was derived by a cubature rule after Konyaev [83] and has recently been listed in the supplemental material of a cubature article by van Zandt [84]. The resulting discrete velocities are shown in Fig. 2, whereas weights and abscissae of the D2Q19 and D3Q45 are listed in [47].

III. RESULTS
We test the proposed method through three test cases. The first two test cases, a temporally underresolved Sod shock tube and a two-dimensional Riemann problem, demonstrate the effect of extraordinarily large time step sizes on the simulations. The third test case is the compressible three-dimensional Taylor-Green vortex at different Reynolds and Mach numbers to present the capability of SLLBM to simulate both turbulent and shocked three-dimensional flows.

A. Temporally underresolved Sod shock tube
The Sod shock tube illustrates the large range of time step sizes accessible to the SLLBM. The domain x ∈ [0, 1] Three-dimensional D3Q45 velocity set with 45 abscissae, derived by a degree-nine cubature rule originally by Konyaev [83] was divided into two regions at x = 0.5 with initially and a viscosity of µ = 7 · 10 −4 . The domain x ∈ [0, 1] was discretized using 2000 cells at third polynomial order, i.e. 6000 grid points in x direction. Despite this fine spatial resolution, which we only chose for demonstration purposes, the time step size was set to δ t = 0.001, such that the solution at t = 0.1 was reached by performing only 100 time steps. Fig. 3 shows that the SLLBM accurately traces the shock front despite the extremely large time step. For a smaller time step size of δ t = 0.0005 and viscosity of µ = 10 −5 the simulation results perfectly matched the inviscid reference solution. To get an impression of the time step size of other solvers, we repeated the simulations with the lower viscosity µ = 7 · 10 −4 using the same collision process and velocity discretization, but this time applying a spectralelement discontinuous Galerkin solver for the streaming step [43,74]. This solver also features high-order solutions, but requires a dedicated time integrator and the time step size is bounded by the CFL condition (1). The simulation produced nearly identical results as the SLLBM with δ t = 0.001 (therefore not shown), but required δ t = 0.000086 to be numerically stable, i.e., 1154 time steps with an explicit exponential time integrator [85] and δ t = 0.00005, i.e., 2200 time steps with the more common fourth-order Runge-Kutta method.
As the distance of the departure points from the active nodes is proportional to the time step size (Eq. 5), the departure points of the SLLBM were located up to seven cells away. The trajectory is shown in Fig. 4 for an exemplary departure point. It is obvious that the CFL restriction of explicit Eulerian solvers prohibits the exchange of information crossing multiple cells. This property is of special interest in the case of simulations with body-fitted meshes, where the spatial extent of the smallest cells usually dictates the time step size of the whole simulation. As opposed to Eulerian solvers, the maximum stable time step size in the SLLBM is proportional only to the viscosity and not dictated by the mesh size. On top of that, when doubling the number of cells the number of SLLBM time steps can still be kept constant, whereas it inevitably doubles for the explicit discontinuous Galerkin solver. Finally, Fig. 5 confirms that the SLLBM is also capable to stably simulate the shock tube with a lower resolution of 100 cells, polynomial order p = 4 and 400 grid points, without additional numerical dissipation measures. Here, the time step size was set to δ t = 0.005 and δ t = 0.0005 with viscosities µ = 0.002 and µ = 0.0002, respectively.

B. 2D Riemann study of time step size effects
Case 12 of the two-dimensional Riemann problems was intensively studied by Lax [86] as well as Kurganov and Tadmor [87]. In one of our last works [44] we already showed that the SLLBM is capable to resolve the density contours of this test case with good visual agreement to the references. This time, similar to the shock tube test case in Section 3, we examined the effect of the time step size onto this test case. The initial conditions of the four quadrants are [76] (ρ, u x , u y , p) = The heat capacity ratio was γ = 1.4, the number of cells was N Ξ = 128 2 with polynomial order p = 4, i.e., N = 512 2 grid points. The simulation end time was t end = 0.25. Fig. 6 shows the density contours in the interval ρ ∈ [0.412, 1.753], beginning with a small time step size and low viscosity in simulation a). Each of the figures also lists the time step size and viscosity. When increasing the time step size, the viscosity also needs to be increased to ensure stable simulations as depicted for simulation b). The effect of a maladjusted time step size is shown in simulation c), where oscillations occur in the top right corner near the shock fronts. Finally, simulation d) depicts the density contours for a approximately 50 times larger time step size than for case a). In this case, the shock fronts are widened, comparable to the observations in Section III A.

C. Compressible Three-Dimensional Taylor-Green vortex
To show that the SLLBM captures the intricate interactions between turbulent and compressible features including shocklets, the compressible three-dimensional Taylor-Green vortex flow was simulated. On the triplyperiodic domain S = [0, 2π] 3 , the initial conditions with constant temperature initial condition (CTIC) are with velocities u, density ρ, Mach number Ma, and temperature T . The numerator C differs between the cases in this work with C = 1 for Re = 100 and Re = 400 as well as C = γMa 2 for Re = 1600. The Reynolds number is defined as Re = 1/ν ∞ , where the subscript ∞ denotes the value at T = 1. The Prandtl number is Pr = 0.7; the heat capacity ratio is γ = 1.4. The dynamic viscosity µ = νρ obeys the Sutherland law In comparison to forced or decaying isotropic turbulence, this test case enables a deterministic initialization and thus an easier and more objective comparison. As a general rule, lower viscosities demand smaller time step sizes. When choosing the time step size too large for a given viscosity, the simulations tend to become unstable near the shock fronts, as depicted in c). By contrast, simulation b) remains stable due to a 57 percent reduction of the time step size with equal prescribed viscosity.

Reynolds number Re = 400
Peng and Yang thoroughly studied the compressible Taylor-Green vortex at Reynolds number Re = 400 [88]. The original work used a compact eighth-order finite difference scheme [24] to discretize the convective terms in the Navier-Stokes equation in combination with a seventh-order weighted essentially non-oscillatory (WENO) scheme. The present compressible SLLBM uses fourth-order polynomials for the equilibrium (Eq. (3)) and for the interpolation (Eq. (8)), without utilizing ad-ditional filtering or stabilizing techniques. Moreover, to satisfy the CFL condition, the original work applied a time step size of δ t = 0.0005, wheares the SLLBM was capable to utilize time step sizes two orders of magnitude larger: δ t = 0.017 for Mach number Ma = 0.5, δ t = 0.033 for Ma = 1.0, δ t = 0.018 for Ma = 1.5, and δ t = 0.012 for Ma = 2.0. The spatial resolution was N points = 256 3 , whereas the reference operated with 512 3 grid points.

depicts the solenoidal dissipation defined as
All Mach numbers quite accurately matched the reference. The effect of changing the time step to similar levels as in the above mentioned reference can be seen for the smaller Mach numbers Ma = 0.5 and Ma = 1.0. The small deviation from the reference seen here is reduced by decreasing the time step size to similar levels, here δ t = 0.003 in both cases.
The dilatational dissipation is a measure for pressure work in the simulation. Fig. 9 shows that the dilatational effects are strong in the beginning at small simulation times, surmounting the solenoidal dissipation. The comparison to the reference shows a slight deviation from the reference and mild oscillations near the peak values around t = 2.5.

Reynolds number Re = 1600
The Taylor-Green vortex at Reynolds number Re = 400 shows strong dilatational effects, but the transition to fully developed turbulence requires higher Reynolds numbers. Therefore, another recent work by Lusher and Sandham examined this test case at Reynolds number Re = 1600 up to Mach number Ma = 1.25 [89]. In their study, the authors compared high-order finite difference schemes equipped by WENO or targeted essentially non-oscillatory (TENO) schemes of different orders. The present work re-examines the Mach numbers Ma = 1.0 and Ma = 1.25 up to t = 20, with time step sizes δ t = 0.010, and δ t = 0.009, respectively. These time step sizes were still 20 times larger than those in the reference [89]. The resolution was N points = 256 3 , whereas the reference used 512 3 grid points. Fig. 10 demonstrates that, just as in the case Re = 400, the kinetic energy over time is excellently reproduced for both Mach numbers. Fig. 11 shows the solenoidal dissipation in comparison to the reference with even better results for the higher Mach number Ma = 1.25. The good agreement confirms the little numerical dissipation introduced by the SLLBM at polynomial order p = 4 in turbulent flows at transonic Mach numbers. By contrast, the interpolation order p = 2 worsens the solution despite the identical resolution of N points = 256 3 . Next, we investigated the dilatational dissipation. During the early phase of the simulation the large vortex structures begin to entangle. This entanglement leads to strong moving shock-like structures or turbulent shocklets with strong negative dilatation [89] and local numerical oscillations of the macroscopic variables due to an underresolution of these shocklets. A slice of the density field illustrating the shocklets, indicated by large jumps in the governing variables, is shown in Fig. 13. The size of the jumps agreed well with that obtained via classical Rankine-Hugoniot jump conditions. Additionally, during the early phase, the Mach numbers can be higher than the initially prescribed Mach numbers. Moreover, to compute Eq. (19), we made use of the gradients of the interpolation polynomials, which-in contrast to the distribution functionsare not continuous over the cells. It is likely that this approach to compute the dilatational dissipation faces issues at strong shocklets. This explains the oscillations for Ma = 1.25 in Fig. 12, which depicts the dilatational dissipation. Despite these deviations from the reference in the beginning, the SLLBM was able to reproduce the  dilatational dissipation well for the rest of the simulation, as shown in Fig. 12, without using additional stabilizing measures like filtering or artificial diffusivity.

Reynolds number Re = 100 for higher Mach numbers
As a last test case, Fig. 14 shows that for Re = 100 we were able to stably simulate the given test case for Mach numbers Ma = 2. the SLLBM was once more able to use large times steps δ t = 0.015 and δ t = 0.03, respectively. Although only shown at low Reynolds numbers, these stable simulations indicate the principal capability of the SLLBM to perform stable simulations at high Mach numbers.

IV. DISCUSSION
As demonstrated by the numerical experiments, the SLLBM is able to simulate complex two-and threedimensional compressible flows with shocks and shocklets. There are two main arguments in favor of the method. First, the method works with large and adjustable time step sizes. Second, cubature rules provide lean velocity sets resulting in an efficient scheme. The D3Q45 used in this work is, to the authors knowledge, the smallest known degree-nine velocity set and key to reduce errors caused by interpolation.
The SLLBM is not subject to a CFL condition, as already shown in past works [42] and as confirmed by the shock tube simulations in this work. In contrast to the usual spatial filtering, the independence of the CFL condition opens the field of temporal large eddy simulations (TLES) with fine spatial, but coarse temporal resolutions [90,91]. Since no dedicated filter operation has been used, we classify the scheme as some sort of temporal implicit large eddy simulation (TILES). Despite the temporally coarse resolutions, most relevant flow features during the simulations were preserved, or they were recovered by scaling the time step size down: a key feature of the SLLBM that on-lattice Boltzmann schemes often miss.
Although there is no CFL condition, the simulations are not automatically stable, because the collision step may still be prone to instabilities at small relaxation times as pointed out by various works, e.g. by Siebert et al. [92]. That said, the stability regions of the BGK collision operator combined with the D3Q45 and fourthorder equilibria (at high Mach numbers and adjustable time step sizes) are yet unknown. Nevertheless, we observed that the SLLBM produces good results without using additional filtering or stabilizing techniques. To accomplish simulations with even higher Reynolds and Mach numbers with large time step sizes, the application of such techniques to the SLLBM might be beneficial, though.

A. Comparison to other LBM solvers
Let us now compare the SLLBM for three-dimensional compressible flows to other LBM solvers, with distinction between on-lattice solvers and off-lattice solvers. Since we discussed many aspects in recent works [44,47], we mainly restrict the discussion to compressible LBM solvers with applications to three-dimensional turbulent flows.
To perform a time step, off-lattice Boltzmann methods typically require a special treatment of the distribution function values such as the discretization by finite difference schemes [93][94][95] or finite volume schemes [76] and the application of a dedicated time integrator, e.g., a Runge-Kutta scheme. As an example for these Eulerian time integration schemes, Chen et al. presented compressible decaying three-dimensional isotropic turbulence simulations obtained by a modified discrete unified gas kinetic scheme (DUGKS), which is essentially a finite volume LBM with second-order spatial and temporal accuracy [96]. Also, the authors made use of a high-order Gaussian quadrature to discretize the velocity space, but they relied on a decisively larger D3Q77 velocity set with identical quadrature degree as the D3Q45. Like all explicit Eulerian time integration schemes, DUGKS is subject to the CFL condition (1).
A second category of off-lattice schemes are interpolation-based or semi-Lagrangian implementations. Pavlo et al. pioneered in using second-order interpolations to incorporate high-order velocity sets for simulations of thermal flows [97,98]. Renowned semi-Lagrangian implementations are the Particles on Demand method [61], which uses dynamically shifted velocity sets to reach high Mach numbers or the SLLBM for compressible flows on unstructured meshes by Saadat et al. [65]. Unlike the present method, the authors of the latter approach used a D2Q9 velocity set and computed correction terms by exploiting the gradients of the distribution functions, which practically come at low costs when using interpolation polynomials. To the best of our knowledge, these methods have not been applied to complex three-dimensional flows, except for a spherical Riemann problem presented by Zakirov et al. using a D3Q125 velocity set [99]. Additionally, one decisive feature of the present method is the organization of support points by cells, enabling three-dimensional high-order solutions.
On-lattice Boltzmann solvers have the Lagrangian integration along characteristics in common with interpolation-based schemes, albeit constrained by a strong coupling of space and time discretization. This explains, the rather large time step sizes of on-lattice Boltzmann methods with increasing Mach number [47,100]. On-lattice Boltzmann methods generally exhibit secondorder accuracy in space and time, but they are not as fiercely concerned by numerical diffusion as low-order off-lattice Boltzmann methods. In general, the computational costs of on-lattice Boltzmann methods' streaming step are low. On the downside, they often suffer from large velocity sets. For example, Frapolli et al. [55,57,101] were able to simulate various threedimensional compressible flows, including isotropic decaying turbulence simulations or the flow around an Onera M6 wing, by the entropic lattice Boltzmann method. In many ways, their works were groundbreaking for the development of compressible LBMs. However, the authors used a velocity set with 343 discrete velocities and the weights were temperature-dependent, which limits the temperature range of the method. In addition, in many on-lattice Boltzmann frameworks the time step size is not adjustable due to the linking of spatial and temporal discretization [100]. An exception to this are hybrid lattice Boltzmann methods, which solve the temperature field separately [58,62,102,103]. In this case, by adapting the speed of sound, the time step size can be varied. By using numerical equilibria instead of the polynomial equilibrium in this work, Latt et al. were able to simulate a three-dimensional flow around a sphere [66] by using only 39 discrete velocities, as similarly proposed by Frapolli [100]. Simulations of complex turbulent flows by this method are not available, yet. Recently, Saadat et al. [104] proposed an on-lattice Boltzmann model with a regular D3Q27 velocity set to perform three-dimensional decaying compressible isotropic turbulence simulations.
Despite the low degree of the velocity set, this model proved capable to simulate shocked flows up to Mach numbers of Ma 1.5. This is achieved by correcting the error-prone high-order moments by expressions obtained from applying finite differences to the macroscopic variables.
In summary, the number of compressible LBM solvers for three-dimensional flows is still limited for both onlattice and off-lattice Boltzmann schemes, which indicates the value of the presented SLLBM framework to set a pattern for future research.

B. Numerical efficiency
The numerical efficiency of the SLLBM depends on the implementation of the collision and the streaming step.
The former requires at each support point the calculation of the discrete equilibrium function values given by Eq. (6). Once the density, velocities, and temperature are gained from the distribution function values, the determination of the equilibrium, however, is well parallelizable. Note, that the tensors in Eq. (6) are symmetric, so that many entries need to be computed only once per time step and support point, e.g., a xxxy,eq is equal to a yxxx,eq and all other index permutations. In addition, the entries of the Hermite tensors H i are constant for a given velocity set and only need to be determined once for the whole simulation.
Another decisive factor of the simulation performance is the streaming step, which requires a "triquartic" interpolation, i.e., interpolation polynomials of order p = 4 in three dimensions. The reference cell is shown in Fig. 1. To interpolate one distribution function value at a given departure point, all (p + 1) D = 125 support points of the cell are processed with the interpolation coefficients that are stored in the Q matrices Ψ i . These matrices are the cornerstone for the whole simulation. When accounting to equally shaped cells, like in all simulations in the present work, the size of the matrices can be significantly reduced, since the matrices are identical over nearly all cells (boundary cells excepted). However, for irregular or distorted three-dimensional meshes, the size of the matrices grows quickly, rendering the streaming step memory-bound. Therefore, matrix-free implementations [105] appear to be an attractive extension for SLLBM implementations. In matrix-free versions the interpolation coefficients will be computed afresh in each time step, which is potentially faster for today's highperformance computing clusters.
At first glance, the second distribution function appears to be a severe limitation of the method. On closer inspection, however, it turns out that the overhead is kept in reasonable limits since the computation-intensive equilibrium distribution g eq i linearly depends on f eq i and the interpolation coefficients can be used for both f i and g i . Still, all aforementioned issues pay off in light of the large time step sizes of semi-Lagrangian implementations. A major way to further reduce the computational cost is the reduction of discrete velocities in the Gaussian quadrature. The research for even compacter cubature rules is still ongoing [47], so that future degree-nine velocity sets will possibly be even more efficient.
One last remark regarding the equilibrium distribution function determined via Eq. (6). In the past the Hermite expansion-based equilibrium was deemed causing instabilities due to negative distribution function values at high Mach numbers [100]. Fig. 15 depicts the equilibrium function values f eq i over the Mach number Ma x in x-direction. The figure manifests that most of the discrete velocities significantly diverge for Mach numbers larger than Ma > 2.0 towards either ∞ or −∞. However, our simulations showed that simulations even up to Ma = 3.0 remained stable. This observation indicates that negative distribution function values are not per se a source of instabilities, since the values' sole role is to encode the moments of different orders. For the first moments this encoding can also be seen in Fig. 15: despite the increasing first-order moment of the momentum, the "zeroth-order" moment of the density remains ρ = 1.0. To ensure this, negative values are a consequence at high Mach numbers due to the discretization of the velocity space via Gaussian quadratures.

V. CONCLUSION
The SLLBM for three-dimensional compressible flows is a viable alternative to other solvers. The SLLBM allows very large time steps sizes not restricted by the customary CFL condition. Although the presented SLLBM is a fourth-order spatial method and accurately captures shocks as well as turbulence, no stabilization or filtering were required for the presented test cases. Due to these unique features, the cubature-based fully explicit SLLBM enables researchers to perform compressible turbulence simulations, in which the admissible time step sizes are decoupled from the spatial discretization, opening a new field of affordable simulations for compressible turbulent flows. The equilibrium moments up to fourth order read αβ,eq = Π eq αβ = ρ(u α u β + T 0 (θ − 1)δ αβ ) (A1c) where θ = T /T 0 is the relative temperature.

Appendix B: Hermite tensors
The scaled Hermite tensors withξ i = ξ i /c s up to fourth order read

Appendix C: Chapman-Enskog analysis
This section shows the approximation of the compressible Navier-Stokes equations, when applying a Chapman-Enskog analysis to the SLLBM model. To that end, a second-order multiscale expansion is applied to the temporal derivatives where is a smallness parameter usually identified as the Knudsen number [79] and the discrete distribution These expansion terms are applied to a second-order Taylor expansion of Eq. (5) with the material derivative [106] Next, Eq. (C3) is applied to (C4) and the terms of same order are collected. The zeroth-order terms of order 0 are with the trivial relation When applying the moment relations of Eq. (7) to the last Equation, we need to differ between the moments of f i and g i , i.e, showing mass, momentum and energy conservation during the collision step, which also implies that As a second step, the order 1 terms are collected This time, the relations of Eq. (7) yield for the zeroth moment for the first moments with the help of Eq. (A1c) and and for the total energy for the temperature. These are the Euler equations; to derive the Navier-Stokes equations, the terms of order 2 also need to be gathered. This leads to Eq. (C12) simplifies the last Equation's derivatives Due to vanishing terms, the zeroth moment of Eq. (C18) is Then, the slightly more complicated part begins with the first moments of (C18) To express the nonequilibrium moment Π (1) αβ by equilibrium moments, we derive the second order moment of (C12), which is and then complements Eq. (C20) The equilibrium moments can be explicitly specified by Equations (A1c) and (A1d). By applying the product rule and by using the Euler equations for mass (C13) and momentum (C14) and temperature (C16) to replace the time derivatives, one obtains αβγ =∂ γ (ρu α u β u γ ) + P ∂ β u α + P ∂ α u β + ∂ γ (P u γ )δ αβ + u α ∂ β P + u β ∂ α P. (C24) This turns Eq. (C22) into As a last step, the total energy is determined where q (1) is a contracted variant of Q Resembling q β , the tensor r βγ is the contracted variant of R αβγδ detailed in Eq. (A1e). Again, by a number of replacements, one obtains [107] q (1) This term now complements Eq. (C26). Finally, by summing up all contributions of orders 0 , 1 , and 2 , the compressible Navier-Stokes equations are derived ∂ t ρ + ∂ α (ρu α ) = 0, (C31) ∂ t (ρu α ) + ∂ β (ρu α u β + P δ αβ ) = ∂ β (σ αβ ), (C32) ∂ t (ρE) + ∂(ρEu β ) = ∂ β (λ∂ β T ) + ∂ β (u γ σ βγ ), (C33) with dynamic viscosity µ = τ P , thermal conductivity λ = τ P (C v + 1) = τ P C p , since the heat capacity at constant pressure is defined as C p = C v + 1. The stress tensor denotes Since the dynamic viscosity depends on the local pressure P , the relaxation parameter has to be pressuredependent, i.e., τ = µ/(P c s δ t ) + 0.5. From Eq. (C34) the bulk viscosity, with respect to the notation in Eq. (C32)), can be identified as µ b = µ/C v . Note that the presented derivation links the thermal conductivity to the dynamic viscosity. In our approach, we used the quasi-equilibrium approach to adjust the Prandtl number Pr = (µC p )/λ. For the corresponding extension of the Chapman-Enskog analysis, we refer to [56].
Appendix D: Quasi-equilibrium approach for variable Prandtl number To obtain a variable Prandtl number Pr, the following equation replaces Eq. (5) with τ Pr = (τ − 0.5)/Pr + 0.5. The quasinonequilibrium h * i is thereby obtained by first computing the centered heat flux tensor Then, the nonequilibrium partQ neq αβγ =Q αβγ −Q eq αβγ is applied to the Hermite tensor H iαβγ , via full contraction of indices.