Multi-scale semi-Lagrangian lattice Boltzmann method

We present a multi-scale lattice Boltzmann scheme, which adaptively refines particles' velocity space. Different velocity sets, i.e., higher- and lower-order lattices, are consistently and efficiently coupled, allowing us to use the higher-order lattice only when and where needed. This includes regions of either high Mach number or high Knudsen number. The coupling procedure of different lattices consists of either projection of the moments of the higher-order lattice onto the lower-order lattice or lifting of the lower-order lattice to the higher-order velocity space. Both lifting and projection are local operations, which enable a flexible adaptive velocity set. The proposed scheme can be formulated both in a static and an optimal, co-moving reference frame, in the spirit of the recently introduced Particles on Demand method. The multi-scale scheme is first validated through a convected athermal vortex and also studied in a jet flow setup. The performance of the proposed scheme is further investigated through the shock structure problem and a high Knudsen Couette flow, typical examples of highly non-equilibrium flows in which the order of the velocity set plays a decisive role. The results demonstrate that the proposed multi-scale scheme can operate accurately, with flexibility in terms of the underlying models and with reduced computational requirements.


I. INTRODUCTION
With its roots in kinetic theory, the lattice Boltzmann method (LBM) describes the evolution of fluid flow via the propagation and collision of discretized particle distribution functions (populations) f i (x, t), which are associated with a set of discrete velocities and constructed to recover the macroscopic Navier-Stokes equations (NSE) in the hydrodynamic limit. LBM has matured to a competitive alternative to conventional numerical solvers, with a vast range of applications including compressible flows [1], complex moving geometries [2], multiphase flows [3,4] and rarefied gas dynamics [5], to mention a few.
While LBM has indeed conquered a large range of fluid dynamics, most popular LBM models use so-called standard lattices such as the D2Q9 or the D3Q27 in two or three dimensions (D = 2, 3) with Q=9 and Q=27 discrete velocities, respectively. While this is mainly due to their simplicity and efficiency, the limited number of speeds puts severe restrictions on their range of validity. On the other hand, a systematic increase of the number of velocities to so-called high-order or multispeed lattices has been shown to extend the range of validity significantly. High-order lattices can be constructed systematically either by discretizing the Boltzmann equation on the roots of the Hermite polynomials [5,6] or by entropy considerations, yielding a set of so-called admissible lattices [7,8]. Note that the roots of the Hermite polynomials are irrational numbers and thus require off-lattice propagation schemes such as the semi-Lagrangian LBM [9,10], whereas admissible lattices as in [7,8] remain on-lattice with integer-valued velocities. The increase of accuracy of such lattices can be exploited in many appli- * Corresponding author; ikarlin@ethz.ch cations such high-speed flows, non-equilibrium gas flows or relativistic fluids [1,[11][12][13] to name a few.
In this paper, we will restrict our attention to compressible as well as non-equilibrium flows but the proposed concepts are generic and can be used for all multiscale applications using high-order lattices. In particular, it is well known that the lack of Galilean invariance and insufficient isotropy of standard lattices, limits classical LB models to isothermal, low-Mach number flows [14,15] and the extension of LBM to high-speed compressible flows is still an active area of investigation. For instance, so-called augmented LB models have been developed in [16,17] to mitigate these shortcomings by introducing non-local corrections into the kinetic equations to eliminate the error terms in the momentum and energy equation, which arise due to the constraints of the standard lattices. Promising results have been shown in recent contributions, featuring both variable Prandtl number and adiabatic exponent [17]. Moreover, the socalled Particles on Demand (PonD) method has recently been proposed in [18,19], which eliminates the Galilean invariance errors of the standard lattices from the outset by representing the populations in a co-moving reference frame. Note that while both of these approaches get by with minimal velocity sets, another alternative to lift the aforementioned constraints is the use of multispeed lattices. There, the increase of the number of speeds moderates the lattice constraints and pertinent moments to recover the full Navier-Stokes-Fourier (NSF) equations in the hydrodynamic limit can be represented by the lattice [20][21][22][23]. It must be noted however that while multispeed lattices can extend the range of velocity significantly, the associated temperature range typically decreases with an increase of the number of particle's velocities. Recently, an entropic LBM realization of multispeed lattices has demonstrated promising results for both trans-and supersonic flows [1].
High-order lattices can also be used to increase ac-arXiv:2102.12559v1 [physics.flu-dyn] 24 Feb 2021 curacy in non-equilibrium flows. The degree of nonequilibrium or rarefaction is usually quantified by the Knudsen number, which is defined as the ratio of the molecular mean free path and a characteristic length. It has been shown both analytically and numerically [24,25] that by increasing the number of discrete velocities (order of Gauss-Hermite quadrature), LB models can capture non-equilibrium effects of wall-bounded flows beyond the NSF level.
With the examples from above in mind, it needs to be mentioned that while high-order lattices can provide a more accurate description of the flow, they come at high computational costs, which can make these models prohibitive for flows with realistic complexity in three dimensions. Fortunately, for most practical applications, the regions requiring high-order velocity sets are typically confined to a small sub-region of the entire computational domain. Hence, significant computational resources can be saved by using a multi-scale description, which uses higher-order lattices only when and where needed. In that spirit, a variety of different multi-scale frameworks, coupling different methods have been proposed in the literature [26]. For example, a multi-scale model, coupling LBM for continuum regions to direct simulation Monte Carlo (DSMC) for the high Knudsen number regions, was proposed in [27,28] for steady-state simulations. Furthermore, so-called discrete velocity models (DVM) have been shown to be successful in simulating rarefied gases [29][30][31] and multi-scale schemes with adaptively refined phase space meshes have been proposed to reduce both computational time and memory [32,33]. DVMs have also been coupled with the more efficient LBM, which was used in continuum flow zones whereas the DVM was restricted to the rarefied regions only [34,35]. Finally, in the realm of LBM, a finite difference LB scheme for rarefied gas dynamics was proposed in [36], where different lattices are coupled in a static manner by a non-local extrapolation procedure. While interesting, this approach is limited to static phase space refinement and suffers from severe stability issues due to a non-local ad hoc coupling procedure of different lattices based on extrapolation procedures.
In this work, we propose a multi-scale LBM scheme, which alleviates these issues and allows for adaptive phase space refinement using a consistent and local coupling procedure. For maximum lattice flexibility, we use a semi-Lagrangian advection procedure to naturally decouple the velocity space from the physical space [9,18,[37][38][39][40]. Higher-order lattices are thus only used when and where necessary, while preserving second-order accuracy. We further shed light on the nature of multi-scale problems and the range of validity of coupling procedures. The proposed scheme is then validated on examples in both high-Mach and high-Knudsen number flows but the scheme can be beneficial whenever large lattices are needed in a confined region. For illustration of the coupling procedure, we use a dual population, multispeed LBM model with variable Prandtl number and adiabatic exponent as proposed in [1] for high-Mach regions. Further, while high-Mach number flows are intrinsically captured in PonD through an adaptive reference frame, we extend PonD's range of validity to non-equilibrium flows using multispeed lattices.
The paper is organised as follows. In Sec. II the twopopulation model is presented and it's semi-Lagrangian realisation is explained. Subsequently, in Sec. III the multi-scale coupling scheme is introduced. Numerical results are presented in Sec. IV, with simulations of an athermal convected vortex, a jet flow, the shock structure problem and a high Knudsen Couette flow. Finally, concluding remarks are given in Sec. V.

A. Discrete velocities
Without a loss of generality, we consider discrete speeds in two dimensions, D = 2, formed by tensor products of roots of Hermite polynomials c iα , The roots of the lowest three Hermite polynomials are collected in Tab. I for the sake of completeness. Following the standard nomenclature, we refer to the corresponding discrete speeds (1) as D2Q9, D2Q16 and D2Q25 models. Each model is characterized by the lattice temperature T L and the weights W i associated with the vectors (1), where w iα are weights of the Gauss-Hermite quadrature, see Tab. I. Note that the lattice temperature is matched at the outset, T L = 1, for all the models under consideration.
With the discrete speeds (1), the particles' velocities v i are defined relative to a reference frame, specified by the frame velocity u ref and the reference temperature T ref , where R is the gas constant. Two reference frames of interest shall be considered below. The local co-moving reference frame is specified by the local temperature T = T (x, t) and the local flow velocity u = u(x, t).
The lattice reference frame is specified by u ref = 0 and

B. Kinetic equations
For the sake of presentation, we consider a twopopulation kinetic model for ideal gas with a variable adiabatic exponent and Prandtl number [1], where f eq i and g eq i are local equilibria while f * i and g * i are quasi-equilibrium populations. Moreover, δt is the time step and ω 1 and ω 2 are relaxation parameters related to the dynamic viscosity and thermal conductivity [1], respectively. The local conservation laws for the density ρ, momentum ρu and the total energy ρE are, We consider ideal gas with the internal energy of the form U = C v T , where C v is the specific heat at constant volume. The total energy is, The quasi-equilibrium populations are, where sym(. . . ) denotes symmetrization and θ is the reduced temperature, while the non-equilibrium third-order tensor Q and the heat flux vector q are, When the local flow velocity u(x, t) and the local temperature T (x, t) are used to gauge particles' velocities (3), we say that kinetic equations (4) and (5) are formulated in the co-moving reference frame, where the equilibrium populations depend only on the density and the temperature, With the co-moving reference frame and for any of the models of sec. II A, the hydrodynamic limit of the kinetic equations (4) and (5) are the standard equations of compressible gas dynamics, with the dynamic viscosity µ, thermal conductivity κ and the bulk viscosity ξ related to the relaxation parameters ω 1 and ω 2 as follows, Here C p is the specific heat of ideal gas at constant pressure, The Prandtl number is defined as Pr = C p µ/κ, and the adiabatic exponent is γ = C p /C v . In the following, we set R = 1, without loss of generality.

Co-moving reference frame
The implementation of the propagation using the comoving reference frame requires a transformation between non-equal reference frames which we remind for the sake of completeness [18]. Specification of a reference frame shall be denoted λ, where θ is the reduced reference temperature (12). In a given reference frame λ, the Q linearly independent moments of the f -and of the g-populations are defined as, where m, n ∈ {0, . . . , √ Q − 1}. Equations (23) and (24) establish a linear map of the Q-dimensional population vectors f λ and g λ into the moment vectors M λ and N λ . Denoting M λ the Q×Q the matrix of this map, we write (23) and (24) as, Consider two reference frames, λ and λ . Following [18], populations are transformed based on the principle of independence of the moments on the reference frame, With (27) and (28), the populations are transformed from the reference frame λ to the reference frame λ , where the transfer matrix G λ λ reads, Finally, populations are reconstructed at a point x and time t using Lagrange interpolation, where the summation is carried out over the collocation points x s and a s are interpolation functions. Below, we use the third-order Lagrange polynomials. Reconstruction (32) and (33) takes into account the fact that the reference frames λ s at various collocation points x s may differ from one another. Thus, the corresponding populations f λs and g λs are transformed into a target reference frame λ using (29) and (30) prior to the interpolation. Evaluation of the populations at the monitoring point x at time t involves the propagation and the collision steps. In the propagation step, semi-Lagrangian advection is performed using the reconstruction (32) and (33) at the departure points of characteristic lines, Here, particle's velocities v i are defined relative to the co-moving local reference frame at x and t. In order to find the co-moving reference frame, a predictor-corrector process is executed as follows: In the prediction step, the local reference frame λ 0 = {u 0 , T 0 }, is initialized using the local flow velocity and the local temperature available from the previous time step, . The density, momentum and temperature are consequently computed, The computed velocity (36) and temperature (37) define the corrector reference frame λ 1 = {u 1 , T 1 } at the monitor point and the propagation step (34) is repeated with the updated reference frame. The predictor-corrector process is iterated until convergence with the limit values, defining the density, velocity, temperature and the precollision populations at the monitoring point x at time t. The predictor-corrector iteration loop ensures that the propagation and the collision steps are performed at the co-moving reference frame, in which the local equilibrium populations (16,17) are exact.

Lattice reference frame
If instead of an adaptive co-moving reference PonD frame, the fixed lattice reference frame λ L = {0, T L } is used, we arrive at the semi-Lagrangian LBM [9,10,37]. In this special case, the lattice equilibrium populations f L i , g L i are evaluated by transforming the exact equilibrium populations (16) and (17) to the lattice reference frame, Similarly, the quasi-equilibria are transformed to the lattice reference frame, A discussion is in order here. It is well known that the standard D2Q9 lattice with the lattice equilibrium does not allow for a compressible model. At the same time, the mere change to the co-moving reference frame readily provides a compressible model with the same number of discrete velocities. In order to understand why the choice of the reference frames matters, we remind that attention should be paid at the higher-order moments that are not included in the transformation. For the D2Q9 lattice these are the diagonal elements of the third-order moment and the diagonal elements of the fourth-order moment, While the off-diagonal third-order elements, Q xyy = M 12 and Q yxx = M 21 , as well as the off-diagonal fourth-order element R xy = M 22 are the same in both, the co-moving and the lattice reference frame since they are among the moments transformed, the equilibrium values of the nontransformed moments are not equal in both references. Indeed, using the co-moving equilibrium (16) we find, With the diagonal moments (45) and (46), the equilibrium third-order moment tensor and the once-contracted equilibrium fourth-order moment tensor become, The equilibrium Maxwell-Boltzmann relations (47) and (48) are precisely what is required to recover the compressible flow equations in the thermodynamic limit. On the contrary, the lattice equilibrium (41) returns, instead of (45) and (46), Finally, if the temperature is fixed to the lattice temperature T L , the athermal model is recovered with III. MULTI-SCALE COUPLING SCHEME

A. The overlap and switching
Each discrete velocity set can be characterized by its domain of validity. Going by an example, the standard D2Q9 lattice provides a reliable simulation of a nearlyincompressible flow as long as the magnitude of the flow velocity stays below u 0.1 in lattice units. This estimate is based on the deviation of the diagonal element of the equilibrium third-order moment (49) from the Maxwell-Boltzmann value (47), see Fig. 1. The nextorder velocity set in the Hermite hierarchy D2Q25 of Tab. I provides a larger domain of validity [5]. It is therefore expected that the use of the D2Q25 velocity set instead of the D2Q9 can improve the accuracy of the simulation in certain cases. At the same time, using the higher-order in the entire computational domain is computationally demanding and wasteful in regions where the accuracy of the lower-order model is sufficient. Because the semi-Lagrangian propagation is not bound to a specific lattice and both the higher-and the lower-order model can be realized on the same grid, we seek an adaptive realization where a higher-order model is used only if the flow situation demands more accuracy than expected from the lower-order model while the lower-order model is used then its accuracy suffices.
Let us consider two velocity sets of different order At a monitoring point x, at time t, we can choose between the two corresponding population vectors, f q and f Q . We further assume that at (x, t) the validity domains of the higher-order Q-model and the lower-order q-model overlap and both models are equally applicable. Switching between the models then requires one of the two operations: • Lifting: The lifting operation switches from the lower-order q-model to the higher-order Q-model and is required to increase the accuracy if the available state f q is close to the limit of its validity domain and we need to proceed with the higher-order model. The lifting operation thus requires us to specify a map, • Projection: The projection operation switches from the higher-order Q-model to the lower-order qmodel and is required to maintain the accuracy with lesser velocities if the available state f Q is within the validity domain and we can to proceed with the lower-order model. The projection operation thus requires us to specify a map, Note that the lifting and projection must occur when both low-and high-order are equally valid and a proper transfer of information between the different models is possible, i.e., the validity domains must overlap. This is a general feature that the models of any multi-scale method must respect, such that a coupling strategy can be physically devised. We shall now specify the two switching operations separately.

B. Lifting
We use notation m q for the q-dimensional vector of moments of the low-order model, available at (x, t). With the q × q matrix M q , we have On the other hand, a moment vector of the Q-populations M Q can be considered as an element of a direct sum, where M q ∈ Im(M q ) is a vector from the image of the matrix M q while M Q−q is the rest of the higher-order moments. The lifting operation consists in specifying the individual contributions as, This construction can be understood as a decomposition between conserved, "slow" and "fast" components. The information of the low-order population vector are fully retained, both equilibrium and non-equilibrium contributions, and the moments M q are identified as the "slow" components. On the other hand, the missing M Q−q are considered as "fast" moments, which are strongly enslaved by the dynamics of the slow moments. The final moment vector M q→Q can then be written as With the moments M q→Q completed, the populations f Q are found by moment inversion, where M Q is the Q×Q moments-to-populations transform for the higher-order model.

C. Projection
In the projection step, a high-order population vector f Q is mapped to a lower-order f q . Fortunately, the highorder lattice can represent Q linearly independent moments, which contains the subset of the first q moments, which are required to construct the populations f q of the low-order lattice. Hence, in contrast to the lifting procedure, there is no missing information and all linearly independent moments m Q→q are operationally available from f Q , The population vector f q is obtained by moment inversion, The concepts of the two mappings are generic and apply with either f (density, momentum conserving lattice) or g (energy conserving lattice) populations. Finally, we stress that both lifting and projection are fully local operations, which leads to high efficiency, numerical stability and flexibility for an adaptive velocity refinement.

D. Semi-Lagrangian propagation
Now we discuss the semi-Lagrangian propagation in the multi-scale setting, using the lifting and the projection operators. We consider the population f i (x, t), with a discrete velocity v i , belonging to either the lower-or higher-order velocity set. The semi-Lagrangian propagation starts with the calculation of the departure point x dep,i = x − v i δt, and the identification of the surrounding collocation points, (see, Eq. (32)). Here we must take into consideration that the collocation points use in general velocity sets of different order. Thus, the lifting and projection operators are applied, such that the populations involved in the reconstruction are all expressed in the same velocity set. Figure 2 shows an example of the semi-Lagrangian propagation in the multi-scale setting.
The implementation can be summarised in the following steps: and identification of the collocation points.

For all collocation points:
• If velocity set order is lower than velocity set order of (x, t): apply lifting.
• If velocity set order is higher than velocity set order of (x, t): apply projection.
The first and third step are the same as for the case of a single velocity set throughout the domain, whereas the second step is active only near the interface of different velocity sets. The collision step that follows the propagation proceeds as in the case of a uniform velocity set. The modification of the semi-Lagrangian algorithm that enables the multi-scale feature is independent of the underlying model and can be used with single or double populations and with static or co-moving reference frame (PonD). It must be noted that in a co-moving reference frame, the semi-Lagrangian propagation step is performed iteratively in a predictor-correct loop. During the iterations the lifting/projection operations need to be performed once to each point that is close to the interface region (step 2 above).

IV. NUMERICAL RESULTS AND DISCUSSION
In this section we investigate the performance and accuracy of the multi-scale scheme. The velocity sets that are used in the simulations, generated by Gauss-Hermite quadrature, are described in section II A. The time step δt of the simulations that follow is such that CFL = (max|c i |δt)/δx = 1.0, where c i correspond to the lattice velocities.

A. Multi-scale flows
Before going into to the details of our validation, it is instructive to remind the nature of multi-scale problems. Multi-scale flows are characterized by large variations of characteristic quantities which most commonly includes but is not limited to spatial scales, time scales, Mach number or Knudsen number. In what follows, we only consider the Mach or Knudsen number, since in those cases we will benefit from using high-order lattices. For large spatio-temporal scales on the other hand, we refer to existing grid-refinement techniques such as [41]. However, when devising a numerical scheme, which bridges such large variations by coupling different methods for different scales, it is important to realize that consistent coupling is only possible if there is a region, where the validity range of both schemes overlap. In our case, this corresponds to regions of the flow, where both lattices can provide an accurate description of the flow field. These regions will eventually become the interface region between both lattices and thus correspond to the phase space refinement criterion. Hence, if such overlap regions do not exist in the flow, the use of a multi-scale scheme is inappropriate.
We will first test these ideas for variation in Mach using an athermal convected vortex and an athermal jet flow using a fixed reference frame (λ = {0, T L }) as examples. The underlying model for these simulation consist of a single population, without quasi-equilibrium relaxation. The motivation of using different velocity sets in the case of a fixed reference frame stems from the errors of the D2Q9 lattice due to non-Galilean invariance as the Mach number grows and the refinement criterion between high and low order velocity sets is based on a Mach number threshold. We proceed with the shock structure problem using a co-moving reference frame (PonD) and the full compressible model (f,g populations with variable Prandtl number and adiabatic exponent), in which the velocity set is dictated by the variation of the Knudsen number.

Athermal vortex convection
The reference case of an athermal, convected vortex is studied with the proposed multi-scale scheme, using the standard D2Q9 lattice and the D2Q25. The initial conditions of the velocity and density fields are the following, where U 0 is the advection velocity, is the strength of the vortex and R c is the characteristic radius [42]. The advection and vortex Mach numbers are set to 0.25 and 0.6 respectively. For the simulations a [200 × 200] grid was used with periodic boundary conditions. An estimate for the Mach threshold value that should be used as an appropriate refinement criterion can be obtained from an error analysis of the pertinent equilibrium moments. For athermal, incompressible flows the accuracy of the D2Q9 lattice is limited by the equilibrium third-order moment tensor, Q eq αβγ . Figure 1 shows the deviation of the Q eq xxx component from it's Maxwell-Boltzmann counterpart Q eq,MB xxx = 3ρc 2 s + ρu 3 as a function of Mach number, where c s is the speed of sound. Here a threshold value M a thr = 0.3 was selected, which leads to 3% accuracy. Figure 3 shows the velocity set and the density field at two different times. The high-order velocity set is wrapped around the high speed region of the vortex and follows it during advection.
The behaviour of the coupling scheme is compared with similar simulations in which a single velocity set is used throughout the domain. In figure 4, density contours are shown for the cases of D2Q9, D2Q25 and the hybrid D2Q9/25 velocity set after 500 time steps. The insufficient isotropy of the D2Q9 velocity set manifests in the distortion of the vortex, in contrast with the cases of D2Q25 and the coupled case. In figure 5 the density profiles are plotted along the center of the vortex, indicating an almost identical behaviour for the D2Q25 and the D2Q9/25 hybrid framework, and deviations for the case of the standard D2Q9 velocity set.

Athermal jet flow
The next case to be analysed through the multi-scale scheme is an athermal jet flow. The inflow velocity profile consists of a base low-speed flow with Mach number equal to M a L = 0.15 and a high-speed jet region superimposed symmetrically in the centre of the domain with diameter D jet and Mach number M a H = 0.6. At the inlet a zero pressure gradient and fixed velocity are imposed, whereas a non-reflecting boundary condition is prescribed at the outlet. Free-stream boundary conditions are applied at the top and bottom planes. The simulation is carried out on a [750 × 500] grid with a jet diameter D jet = 25. The viscosity is adjusted such that the jet Reynolds number is Re = D jet U jet /ν = 1000.
The multi-scale model uses the D2Q9 and D2Q25 velocity sets, at a fixed reference frame at rest. The refinement criterion is based on the local Mach number and set to M a thr = 0.3. Figure 6 shows the instantaneous xvelocity field and the corresponding spatial distribution of the velocity set at the same time step. The time average results for the velocity set and the x-velocity are shown in figure 7, where it is evident that the D2Q25 is dominant across the center-line and near the jet inlet whereas the D2Q9 is used at low-speed regions of the flow. Note that the average ratio of the nodes using the high order velocity set is roughly 15% and the discussion regarding the computational speed-up is presented in section IV C . In figure 8, the multi-scale simulation using the hybrid D2Q9/25 lattice is further compared with simulations in which the D2Q9 and D2Q25 were applied uniformly in the entire domain. The top plot shows the average x-velocity (normalised with the maximum velocity of the jet) across the center-line, while the middle and the bottom plots show the average x-velocity across planes normal to the flow, with distances from the inlet x c /D jet = 12, 16. It is apparent that the multi-scale simulation is in good agreement with the simulations obtained with the D2Q25. In contrast the D2Q9 shows significant deviations with respect to D2Q25. This further validates the proposed multi-scale scheme.

Shock structure
The shock structure problem is a classical problem in kinetic theory of gases, in which non-equilibrium phenomena dominate the flow. It is well known that since the thermodynamic variables vary on a scale of few mean free paths, traditional continuum equation such as Navier-Stokes-Fourier equations fail to correctly describe the shock wave structure.
The numerical setup consists a quasi one-dimensional plane shock wave, with an initial step at the center of the computational domain. The underlying model consists of the two-population realisation of the PonD method, with Prandtl number fixed to P r = 2/3 and adiabatic exponent of monoatomic ideal gas γ = 5/3. The upsteam and downstream flow values are connected through the Rankine-Hugoniot conditions [43]. The upstream mean free path for hard sphere molecules is defined as, where p 1 , α 1 , µ 1 are the pressure, speed of sound and the viscosity of the gas upstream of the shock respectively. The viscosity varies with the temperature as, where for the case of hard spheres s = 0.5. The steady-state non-dimensional density, tempera- ture, normal stress and heat flux are defined as follows, where the subscripts 1 and 2 indicate the upstream and downstream values w.r.t. the shock wave.
The D2Q16 and D2Q25 velocity sets are coupled for the computation of the shock structure. The high order velocity is used inside the region of the shock wave and the low order in the rest of the computational domain. The refinement criterion is based on the local Knudsen number, which can be computed as follows, where φ is density (in general other macroscopic fields can be used such as Mach number). The threshold value was set to Kn thr = 0.02 , a typical value suggested by the literature [36,44]. The simulations were carried out with a quasi one-dimensional setup with the resolution of the lattice ∆x corresponding to 0.02λ 1 . The velocity set in the domain is initialized with the D2Q16 lattice and wherever the local Knudsen number exceeds the threshold value is refined to D2Q25. The time evolution of the velocity set and the computed Knudsen number is shown in figure 9. The D2Q25 is initially concentrated in the middle of the domain and gradually expands towards the entire shock transition layer.
The steady-state results for M a = 1.6 are shown in figure 10 and compared with the results of Ohwada [45]. The origin of the coordinate system is the point with ρ n = 0.5 and x is non-dimensionalized with 0.5 √ πλ 1 . It can be seen that the density, temperature and velocity profiles match very well with the reference data. The normal stress and heat flux profiles are also in good agreement with the reference data.

B. Micro-Couette flow
The simulation of microscale flows is of both practical and theoretical interest especially when the Knudsen numbers is not negligible. A classical case to test the behaviour of numerical solvers for this regime is the shear driven Couette flow. It has been established that the choice of the discrete velocities set as well as the boundary conditions are of paramount importance for the accurate description of the fluid near the walls. In particular the standard D2Q9 lattice does not suffice to capture the Knudsen layer and significant deviations occur for the velocity profile at finite Knudsen numbers. Furthermore it has been reported that even-orders velocity sets perform significantly better than odd-orders [25]. At this point, we must stress that the micro-Couette flow is not a true multi-scale problem. Indeed the Knudsen number is uniform in the entire domain and in contrast to shock-structure problem, where the Knudsen number varies within the shock transition layer. For a quantitative discussion, we use the exact solution of the micro-Couette flow, which was been found analytically for D2Q9 and D2Q16 lattices in [11]. The authors concluded that while the D2Q9 can predict a slip-flow solution, it fails in the transient regime (Kn 0.1). The D2Q16 lattice on the other hand improves the accuracy of the solution considerably and predicts the boundary Knudsen layer in qualitative agreement with kinetic theory [46]. Figure 11 shows the relative error of the analytical solutions obtained by D2Q9 and D2Q16 for Knudsen numbers 0.1 and 0.2. At a constant Knudsen number the relative error has a constant value in the main flow and grows in the vicinity of the wall boundaries. Increasing the Knudsen number leads to magnification of the error in the main flow as well as extending the influence of the wall due to the boundary Knudsen layer.
The setup of the Couette test case consists of two plates surrounding the fluid, which move with opposite velocities u w = ±0.1 and same temperature T w = T L . The simulations were performed with a quasi-one dimensional [300 × 4] grid. We test the multi-scale scheme with the D2Q9 velocity set in the main flow and the D2Q16 near the wall boundaries. The underlying model consists of the two-population realisation of the PonD method, with Prandtl number fixed to P r = 2/3 and adiabatic exponent of monoatomic ideal gas γ = 5/3. Diffusive boundary conditions are implemented to efficiently capture the gas-wall interactions [25,47].
The results for the non dimensional velocity (normalised with the difference of the wall velocities) for Knudsen numbers equal to 0.1, 0.2 and 0.4 are presented in figure 12, in which comparison is made with results from linearised BGK [25]. The dotted vertical lines on the plots indicate the D2Q9/D2Q16 interface for the multi-scale simulations. The position of the D2Q9/D2Q16 interface is set according to the theoretical error analysis (1% above the main flow error).
We observe that for all Knudsen numbers the multiscale scheme matches well with D2Q9 solution in the main flow and also with the reference results near the wall boundaries, due to the local use of the D2Q16 lattice. However, a small discontinuity develops near the interface region of the different velocity sets, which becomes more prominent with increasing Knudsen number. This is an expected behaviour, caused by the discrepancy between the solutions of D2Q9 and D2Q16, as shown in Fig. 11.

C. Computational efficiency
In this section, we investigate the increase of computational efficiency when using the multi-scale scheme. To that end, a computational domain of size (N × N ) is decomposed in two rectangular zones which use the highorder D2Q25 and low-order D2Q9 velocity sets respectively. The ratio of the CPU times t D2Q25 /t hyb , where t D2Q25 refers to the time of a globally used D2Q25 and t hyb to the hybrid scheme, is measured as a function of the domain size N , keeping the ratio of the high-order nodes to 10% of the computational domain fixed. The simulations were carried out on a eight-core PC (Intel Core i7-9700@3GHz) and the timings are shown in Figure 13. The speedup that can be observed for the simulation using the D2Q9 lattice in the entire domain is t D2Q25 /t D2Q9 = 2.9. Thus, the maximum speedup that can be achieved when the ratio of D2Q25 is fixed to 10% is (t D2Q25 /t hyb ) max = 2.43. For small domain sizes the computational cost due to the lifting and projection transformations at the interface nodes limits the speedup but for larger domains (N > 100) significant speedup is observed, saturating close to the optimal value at around 2.4. The proposed multi-scale scheme can therefore lead to significant savings in CPU time.

V. CONCLUSION
In this work, we presented a novel multi-scale scheme with an adaptive velocity set, according to a refinement criterion based on the local Mach or Knudsen number. Velocity sets of different order are coupled through the lifting and projection operators. Both operators involve only local computations, which results in a robust and flexible adaptive velocity refinement. The multi-scale scheme can be implemented with either a static or comoving reference frame and with different models (single/double population, quasi-equilibrium models). The numerical results with a variety of flows validated the accuracy and efficiency of the proposed scheme. This work focused on 2D applications, but the lifting and projection, underlying the coupling scheme, hold also in three dimensions. Thorough investigation of the proposed scheme in three dimensions is left for future work. The accuracy in space of the multi-scale scheme is studied with the incompressible Taylor-Green vortex. The high-order D2Q25 is used inside a circular disk centered in the domain and the D2Q9 is used outside. Figure 14 shows the scaling of the L 2 error, which indicates that the multi-scale scheme retains second-order accuracy.