Semi-automatic construction of Lattice Boltzmann models

A crucial step in constructing a Lattice Boltzmann model is the definition of a suitable set of lattice velocities, and the correct assignment of the associated weights. For high-order models, the solution of this problem requires a non-trivial effort. The paper outlines the functioning of a publicly available Python script which has been written to assist researchers in that task. The speed of sound $c_s$ is considered as a parameter, which can, within limits, be chosen at will. Under this premise, the Maxwell-Boltzmann constraint equations are a system of linear equations to determine the weights, and hence amenable to numerical solution by standard linear algebra library routines. By suitable contractions, the tensor equations are mapped to a set of equivalent scalar equations, which simplifies the treatment significantly. For a user-supplied set of velocity shells, the software first checks if a solution for the weights exists, and returns it if it also happens to be unique. In such a case, the software also calculates the range of $c_s$ values that yield positive weights. Standard models like D3Q19 with a well-defined special $c_s$ value then result as limiting cases where one of the weights vanishes. In case of an infinite set of solutions, the user may find one particular solution by supplying a $c_s$ value, and then minimizing one or several weights within the framework of standard linear programming. Some examples illustrate the feasibility and usefulness of the approach. A number of models that have been discussed in the literature are nicely reproduced, while the software has also been able to find some new models of even higher order.

The Lattice Boltzmann (LB) method [1][2][3][4][5] can nowadays be viewed as a mature and well-established method to solve the equations of motion of fluid dynamics. Briefly, the method is based upon a regular lattice, each of whose sites r at time t contains a finite set of populations n i ( r, t). The index i is associated with a corresponding finite set of velocities (or lattice speeds) c i . This set is chosen commensurate with the symmetry of the lattice. The velocities are used for the streaming step of the algorithm, where n i ( r, t) is, within one time step h, moved to a new site r ′ = r + h c i : In other words, the velocities must be chosen in such a way that they carry the populations from one site to another (and not to some "interstitial site"). Interactions are modeled by an additional collision step, where ∆ i ( r, t) is the so-called "collision operator", such that the full update rule (the so-called Lattice Boltzmann Equation (LBE)) is given by The populations are usually identified with the mass densities associated with their corresponding velocities, such that the total mass density ρ at the local site is given by Similarly, the momentum density j is given by j( r, t) = i n i ( r, t) c i = ρ( r, t) u( r, t), where u( r, t) is the local streaming velocity. The collision operator is then constructed in such a way that it locally conserves the mass density, as well as the momentum density, An additional conservation law for the kinetic energy may be added if the method is intended to not only simulate isothermal hydrodynamics, but also heat transport.
In what follows, we will assume that the lattice is a simple cubic lattice in d spacial dimensions. We will also use natural units, where both the lattice spacing as well as the time step h are set to unity.
The standard and most popular version of the LBE is based upon a linearized Boltzmann equation [6,7].
In terms of a "cookbook recipe" it may be described as follows: One first obtains the local conserved quantities ρ and j (and u = j/ρ), which are then used to calculate a set of local equilibrium populations: Here c s denotes the (isothermal) speed of sound, while the coefficients w i are a set of positive weights associated with the velocities c i . For symmetry reasons, these weights must take the same value within a velocity shell.
Here a shell is defined as the equivalence class of all lattice speeds that can be mapped onto each other by one of the symmetry operations of the lattice's point group, see also Sec. II C. Furthermore, we require the moment conditions where Greek letters denote Cartesian indexes, for which the Einstein summation convention is implied. It should be noted that Eqs. 9 and 11 are valid automatically for symmetry reasons. Similarly, the only aspect of Eq. 10 that does not follow automatically from symmetry is the value of the prefactor of the unit tensor on the right hand side (rhs). In contrast, Eq. 12 is less trivial: Not only is there a need to adjust the prefactor c 4 s on the rhs, but we also need to ensure that the fourth-rank tensor is isotropic: From cubic symmetry alone, the form of the rhs is not guaranteed at all -rather one expects an additional term κ 4 δ αβγδ , where δ αβγδ is one for all indexes being the same, and zero otherwise. Therefore one needs to adjust the coefficients in such a way that κ 4 vanishes. The well-known D3Q19 model [7] is one possible solution of this problem: Here the velocities on the three-dimensional cubic lattice comprise the three shells with c i 2 = 0, 1, 2 (one velocity + six velocities + twelve velocities = 19 velocities), and the weights are given by w i = 1/3, 1/18, 1/36, respectively, for the three shells. For this model, the speed of sound takes the value c 2 s = 1/3. Via straightforward calculation one then shows that the equilibrium populations according to Eq. 7 satisfy analogous moment conditions: i n eq i c i = j, (14) i n eq i c iα c iβ = ρc 2 s δ αβ + ρu α u β .
It should be noted that the model implies the thermodynamics of an ideal gas. If m denotes the mass of a gas particle, the equation of state is given by where p is the thermodynamic pressure, k B Boltzmann's constant, and T the absolute temperature. Since the speed of sound is given by c 2 s = ∂p/∂ρ, it is clear that ρc 2 s in Eq. 15 is indeed just the pressure, such that the whole rhs of Eq. 15 is just the Euler stress occurring in the Navier-Stokes equation. Furthermore, we note that, for an ideal gas which is globally at rest, the kinetic energy of a gas particle, in units of k B T , can be written as where v is the particle velocity.
After obtaining the equilibrium populations as discussed, one then constructs a linearized collision operator where the coefficients L ij encode details about the dissipative processes in the system (i. e. viscous damping in isothermal hydrodynamics). Via a Chapman-Enskog expansion (see, e. g., Ref. [5] for details) one then shows that for small Mach numbers (i. e. ignoring terms of order (u/c s ) 3 ) Navier-Stokes dynamics is recovered in the continuum limit. Considering the continuum statistical mechanics of the gas at rest ( u = 0), the velocity distribution of the particles is given by the Maxwell-Boltzmann distribution In analogy to Eqs. 8 to 12 we can therefore similarly consider the velocity moments which means that we can write the moment conditions Eqs. 8-12 for the coefficients w i in the compact form of so-called "Maxwell-Boltzmann constraints" (MBCs) for all tensor ranks up to rank four. If we ignore the details of the collision operator, and also problems of stability, accuracy, staggered invariants, etc., we may therefore say that the construction of a standard LB model is tantamount to the two steps: LB1: Find a suitable set of velocities c i ; and LB2: calculate the weights w i , based upon satisfying Eq. 25, which is therefore seen to lie at the heart of the process.
Of course, it is possible to solve problems LB1 and LB2 merely with paper-and-pencil work. However, already for the D3Q19 model this is a task that can no longer be viewed as completely trivial. Furthermore, we should take into account that there is a growing trend in the community [8][9][10][11][12][13][14] to consider higher-order LB models, which means, in the present context, the study of larger velocity sets with suitably adjusted weights, such that Eq. 25 is satisfied for even higher-rank tensors than just fourth order. Except for the goal to obtain a better degree of isotropy, which is of course desirable as such, there are also cases where the physics dictates such higherorder models. One example is thermal transport, where the hydrodynamic equation of motion for the energy density contains a term ∝ u 3 , such that the expansion of n eq i in powers of u needs to be carried to higher than second order, which in turn means that also higher-order velocity moments appear in the theory [15]. Even for isothermal flows, it has been demonstrated that the improved isotropy properties, which result from a larger velocity set, significantly help in the removal of artifacts, in particular in problems where rotational symmetry plays a crucial role [16,17]. Yet another example is the study of isothermal gas-liquid systems within the framework of a density-functional approach with a smeared-out interface. Here the interface is modeled by a gradient-square term in the free energy functional, such that a thirdorder gradient of density occurs in the Navier-Stokes equations as an interfacial driving force. Therefore such a system requires a Chapman-Enskog expansion up to third order [18] and, concomitantly, correct MBCs up to sixth-rank tensors. These issues shall not be further discussed here. We are rather concerned with the solution of LB1 and LB2 as such, just as a mathematical problem, which we wish to solve in a fairly general fashion with maximum use of a computer and minimum paperand-pencil work, since the latter is both cumbersome and error-prone, in particular for high-order models. It turns out that the problem is most suitable for solution on the computer if we consider c 2 s not as some "magic number" (like c 2 s = 1/3) resulting from the analysis, but rather as a parameter that can (within limits) be chosen freely at will. This additional degree of freedom requires at least one additional velocity shell, compared to models like D3Q19 with a fixed and prescribed value of c 2 s . At first glance, this might be viewed as an unnecessary complication; however, the advantage of this treatment is that in this way the problem becomes strictly linear, such that standard library routines of linear algebra become applicable. Furthermore, there are cases where the physics of the problem anyway makes it desirable to have c 2 s available as a free parameter: Since the equation of state is given by p = ρc 2 s , one can implement a nontrivial equation of state by making c 2 s a parameter that depends on the local density. Finally, it should be noted that models with "magic" c 2 s values like D3Q19 can be derived very easily from the more general treatment: The "magic" c 2 s is just the value that causes the weight of the additional shell to vanish, which means that this shell simply does not occur in the thus-reduced model.
The purpose of the present paper is to derive an algorithm to treat the solution of LB1 and LB2 numerically. We have developed a script which implements these considerations in Python [19] and which is publicly available [20]. It has been written in such a way that it runs both under Python 2.7 as well as 3.5. The present paper may therefore also be viewed as the documentation of the software. The non-trivial aspects of linear algebra are taken care of by utilizing well-established routines from the NumPy [21,22] package. As far as we understand, and which seems to be consensus in the community [9], there is no known method to find a suitable (smallest) set of lattice velocities with simple a priori criteria; rather one has to choose a set (essentially by trial and error) and then check if this allows for a solution of LB2. This is precisely what the script does: It asks the user for defining a set of shells, and then uses that set for analysis. We mainly focus on the case where LB2 has one and only one solution ("minimal" models). This is in spirit quite similar to the work of Philippi et al. [9], and also of Shan [13,14], however with significantly reduced mathematical complexity. Those cases where the problem has no solution whatsoever are obviously discarded. There are also cases where there are infinitely many solutions. These cases are not analyzed in a comprehensive fashion, but only by reduction to a special case, where c s is given. From there, a unique set of weights is determined by solving a linear programming problem which aims at the minimization of some particular weight, or even several of them. The script is able to treat arbitrary spacial dimensions, and an arbitrary maximum tensor rank.
At this point, we would like to emphasize that of course a large fraction of what has been presented so far, and will be presented in the following sections, is not new. The central importance of Eq. 25 has been appreciated by numerous authors, and a significant fraction of them refers to it not in terms of MBCs but rather in terms of Gaussian integration -while the mathematical problem as such is of course identical, regardless of nomenclature.
Secondly, the underlying linear structure of the problem, and last not least its relation to linear programming, is also well-known, and has, most notably, been exploited previously in the work by X. Shan [13,14]. As far as we are aware, Ref. 14 is so far the most extensive study on the problem, with models that are isotropic up to tensor rank eight. What is new about our work is (i) the implementation in terms of publicly available software, (ii) a novel approach to re-cast the tensor equations in terms of scalar equations by contraction with random tensors (see Sec. II), and (iii) the systematic application of numerical linear algebra, without any complicated group theory. Beyond a perfect reproduction of the results of Ref. 14, see Appendix B, we are also able to find models with a yet higher degree of isotropy up to tensor rank ten.
The remainder of this paper is organized as follows: The following section (Sec. II) is devoted to a detailed derivation and description of the algorithm that has been implemented. Section III then demonstrates, via a few examples, what kind of results can be obtained with the software very easily. After that, Sec. IV provides a brief summary.
Appendix A briefly discusses how the obtained models can be used to construct the equilibrium populations for nonvanishing flow velocities, using either the Hermitepolynomial expansion or the entropic approach, which are demonstrated to be asymptotically equivalent in the limit of full isotropy. This part does not present new results but is rather intended as background information to complete the picture; experienced readers can probably skip that part. Appendix B provides details on how we used our software to check the results of Ref. 14, and Appendix C some numerical details about the "test" mode of our script, where the set of weights is not calculated but rather checked whether it indeed satisfies the MBCs.

A. Linear algebra
Let us consider the central relation This is a tensor identity for tensors of rank m, where m is the number of c i factors on the left hand side (lhs), or the number of v factors on the rhs. For odd m, the relation is trivially satisfied for symmetry reasons. We wish to satisfy the relation for all m with m ≤ M , where M is a user-supplied even number. The rank m = 0 is just the normalization condition for the weights. The weight w 0 , corresponding to the velocity c i = 0, occurs only in that condition but not the other equations. It is therefore sufficient to first solve the problem for the weights with nonzero c i , restricting attention to even m ≥ 2, and then adjust w 0 at the end in order to satisfy normalization.
If we denote the number of shells (excluding the zero velocity shell) with N s , enumerate these shells with an index s = 1, . . . , N s , and take into account that the weights are identical within a shell, the MBCs can be written as to be satisfied for tensor ranks m = 2, 4, . . . , M .
We note that on both sides the tensors are obviously fully symmetric under arbitrary exchange of indexes. This property alone reduces the complexity (or dimensionality) of the problem enormously. However, a further reduction occurs because of geometric symmetry. The rhs is clearly invariant under reflection, and any rotation in continuous space, while the lhs is invariant under the cubic group. For the time being, we view c 2 s as a fixed ("user-supplied") number, and therefore we may consider the integrals on the rhs as evaluated, such that the rhs is simply a known numerical tensor.
We now consider a tensor as a vector in tensor product space. From symmetry (see also Ref. [10]), we know that both sides can be expanded in terms of elementary tensors as follows: • m = 2: rhs = . . . δ αβ (28) lhs = . . . δ αβ ; (29) • m = 4: • m = 6: and so on. Here the δ tensors are generalized Kronecker symbols, which are one if all indexes are the same and zero otherwise. The symbol "perm." indicates a suitable set of index permutations such that the expression under consideration is properly symmetrized (like explicitly indicated for m = 4). The prefactors ". . ." are the coefficients which may in principle be calculated by evaluating Gaussian integrals for the rhs, or lattice sums for the lhs. We may then consider the tensors δ αβ , δ αβ δ γδ + perm., δ αβγδ , etc. as basis vectors in tensor space and the coefficients ". . ." as vector components. From this, we see that the rhs is always an element of a one-dimensional space, while the dimensionality of the space corresponding to the lhs depends on the tensor rank m: For m = 2, we get a one-dimensional space, for m = 4 a two-dimensional space, for m = 6 a three-dimensional space, and so on. To discuss the "and so on" in more detail, let us first introduce a short-hand notation and simply write (2) for a second-rank Kronecker tensor, (2, 2) for the symmetrized product of two second-rank Kronecker tensors, (4) for a fourth-rank Kronecker tensor, etc.. We may then say that the space for m = 2 has the basis (2), while m = 4 has the basis (4), (2,2), and m = 6 has the basis (6), (4, 2), (2,2,2). For m = 8 we then get (8), (6,2), (4,4), (4, 2, 2), (2, 2, 2, 2) or a five-dimensional space. This process continues: For each higher m, we get a new tensor (m), plus all possible products of the lowerorder tensors. In general, we thus get a tensor space dimension D T (m) for mth rank tensors, and this may be calculated easily in Python by explicitly constructing the patterns (m), (m, m − 2), . . . from the lower-order patterns in a recursive fashion. As far as we understand, there is no closed formula for D T (m); in number theory, D T (m) is known as the "partition function" (or "number of partitions") of m/2 (see e. g. Ref. [23]). For given M , the script therefore calculates (and stores) the dimensions D T (m) for all m = 2, 4, . . . , M , as well as the dimension of the total space (comprising tensors of all the ranks under consideration), which is It is also clear that, for each m, not only the lhs but also the rhs of Eq. 27 must be an element of the D T (m)dimensional symmetry-restricted subspace, since the cubic group is a subgroup of the full rotation-and-reflection group of continuous space. The problem, however, is that this consideration yields only the maximum dimension of the subspace of all the tensors whose form is that of the lhs. The number D T (m) is just a consequence of symmetry, while the actual dimension is a result of the supplied velocity set: The true subspace is the span of the elementary tensors i∈s c iα c iβ . . . c iγ , and this may, for a poorly chosen (or simply too small) set, be smaller than the space of tensors that are symmetric with respect to the cubic group, and to the permutation group of the indexes. In that situation it may actually occur that the rhs is not an element of that smaller space, or, in other words, that there is no set of weights that solves Eq. 27. Conversely, it may also turn out that the velocity set is chosen rather large, such that the equations have infinitely many solutions. An important aspect of the software is therefore that it has to be able to reliably detect such cases.
In the present paper, we propose to start from Eq. 27 and to contract it with an elementary tensor of rank m where n is some unit vector (| n| = 1), chosen with random orientation, uniformly distributed on the ddimensional sphere. In this way, we project the tensor equation onto a scalar equation. In this context, it should be recalled that contraction over all indexes of two tensors of the same rank naturally defines a scalar product in tensor product space, which then immediately allows one to construct the geometric concept of an orthogonal projection. It should also be noted that the elementary tensors are invariant under index permutation but not under any geometric symmetry transformation. We do this contraction not only for one unit vector but for D T (m) unit vectors for the tensor equation of rank m, and do this for all ranks m = 2, 4, . . . , M . We thus obtain R scalar equations, and for each of these equations we generate a new unit vector n r , r = 1, . . . , R. Let us denote the rank corresponding to the rth equation with m r .
On the rhs we then obtain [24] d We therefore define which can be straightforwardly calculated as soon as the velocity shells are specified and the random vectors are generated. Then the resulting set can be written as which is obviously a set of linear equations to determine the weights w s . In matrix form this is written as where A is the R×N s matrix formed by the elements A rs , w the vector of weights, and b the rhs vector according to Eq. 38. Our strategy is thus to construct this set of equations and to solve it numerically. Let us now discuss why we believe that this procedure is correct and useful. Within a given tensor rank m, we have D T (m) elementary tensors n α n β . . . n γ . It is then highly probable that these tensors are all linearly independent. Actually, in our opinion this is much more probable than linear independence of a set of elementary tensors chosen by a guessing and erring human. More importantly, though, it is highly likely that the projections of the elementary tensors onto the D T (m)-dimensional subspace of invariant tensors are still linearly independent. If that is the case, then the contractions, i. e. the scalar products of the elementary tensors with the lhs tensor, provide enough information in order to characterize the latter uniquely. In other words: Our thusgenerated R scalar equations are equivalent to the original set of tensor equations.
The easiest case occurs obviously when A is quadratic (N s = R) and non-degenerate, because then Eq. 39 can be solved by simple inversion. Therefore the script first calculates R and then suggests to pick precisely R shells -but the user has the freedom to follow that suggestion or not; i. e. both N s > R as well as N s < R are permitted. Typically, one expects infinitely many solutions for N s > R and no solution whatsoever for N s < R; however, due to degeneracies this does not always have to be the case. Similarly, picking N s = R does not guarantee at all that A is non-degenerate. A significant part of the software therefore aims at treating these less straightforward cases.
At this point, it is useful to consider c 2 s no longer as a fixed number but rather as a parameter that can be varied. Since the rhs b consists of c 2 s , c 4 s , . . ., c M s , it is clear that the weights must be polynomials in c 2 s . Therefore we write Comparing coefficients, we find or, in matrix form Our aim is therefore to solve that system to calculate the matrix Q, such that we find a solution that is valid for any possible value of c s -note that Eq. 43 no longer contains c 2 s . The first step of the analysis is a standard singularvalue decomposition [25,26], which is possible for any matrix A independent of its shape or rank. The NumPy package provides a routine to do this [27]. The decomposition reads where S is a rectangular matrix of the same shape (R × N s ) as A, and U and V are quadratic orthogonal matrices of suitable size (R × R and N s × N s ), with U T U = 1, V T V = 1 (unit matrices). Here the superscript T denotes transposition. S is a matrix consisting of all zeros, except the entries S 11 = σ 1 > 0, S 22 = σ 2 > 0, . . ., S ZZ = σ Z > 0 (the singular values). Here of course it has to be checked if some "nonzero" singular values have only been produced as a result of numerical roundoff errors. Obviously, Z ≤ min(N s , R). Z is the rank of S (or of A), and maximum rank occurs for Z = min(N s , R), while for Z < min(N s , R) the problem is rank-deficient. Inserting Eq. 44 into Eq. 43, one sees that the problem is equivalent to with the abbreviations As Z ≤ R, we can only have the cases Z = R or Z < R. Let us first treat the latter case, for which there are R−Z equations of the form This can obviously only hold if the rhs vanishes, and this can be easily checked by calculating the Frobenius norm of the latter, using the standard NumPy routine "norm" [28]. This is nothing but the criterion for the existence of a solution, and if it fails, the script aborts, and informs the user. This situation means that the set of shells is either too small or chosen inappropriately, such that degeneracies occur. The user is then encouraged to try again with a different set of shells. Conversely, if the check succeeds, then the equations number Z + 1, Z + 2, . . . , R may simply be discarded. Doing this, we arrive at a simplified matrixS of size Z × N s , as well as a simplified rhsD ′ .
If Z = R, no such "pruning" needs to be done, and we simply haveS = S,D ′ = D ′ . We thus arrive at a simplified setS As a next step, we scale the equations by 1/σ 1 , 1/σ 2 , . . . , S ′ is a trivial matrix whose nonzero entries are all one. Now, since N s ≥ Z, the matrixS ′ can either be quadratic (N s = Z) or rectangular, with more columns than rows (N s > Z). In the former case,S ′ is simply the unit matrix, such that the solution is unique and directly found via Q ′ =D ′′ or Q = VD ′′ , from which the weights are found as polynomials in c 2 s , returned, and further processed according to subsections II B and II E.
For N s > Z we have infinitely many solutions. To treat this latter case, we also provide some numerical procedures, however in a less comprehensive and ambitious fashion. The matrices V andD ′′ (from which S ′ can be easily re-constructed), together with necessary information about the shells, are stored in a file, which is then processed further in a separate script "Continue.py". This will be the topic of subsection II F.

B. Range of validity
Assuming that the script has found a unique solution by making use of linear algebra, we still have not yet satisfied one important condition: For physical reasons, the populations n i should be positive, which in turn means that the weights w s must be positive as well. This is however typically only true within one (or more) narrow interval(s) of c 2 s values. It may even turn out that there is no c s value whatsoever that satisfies the condition. It is therefore desirable that the script automatically finds this range of validity. This is facilitated by the NumPy routine "roots" [29], which returns all complex roots of a polynomial given in terms of its coefficients. This procedure is applied to all the functions w s (c 2 s ) that the linear algebra routines have found. Technically, "roots" is a linear algebra routine as well, since it is based upon mapping the root-finding problem onto an eigenvalue problem.
The script then eliminates all roots with non-vanishing imaginary part, as well as all roots with real part ≤ 0. The remaining K roots z 1 , z 2 , . . . , z K are arranged in a sorted array, making use of the NumPy routine "sort" [30]. This defines a sequence of intervals (0, z 1 ), (z 1 , z 2 ), . . . , (z K−1 , z K ), (z K , ∞), in which no change of sign can occur. By evaluating all functions w s (c 2 s ) in the centers of these intervals (i. e. at the points (z n+1 + z n )/2), we can eliminate all the intervals that violate the condition of positivity of weights. For the last interval, the functions are evaluated at (3/2)z K . Typically -but not always -this procedure finds one single interval of validity.
The special c 2 s values that form the limits of validity ("magic" c 2 s values) are characterized by the vanishing of at least one weight w s . In this case, the corresponding shell(s) can be discarded completely, which gives rise to a "reduced" model, which is often useful in practice. For this reason, the script evaluates all weights at the "magic" c 2 s values, such that the user receives quick information about the properties of the resulting reduced models.

C. Velocity shells
We recall that a shell is defined as an equivalence class of lattice speeds that can be mapped onto each other by an element of the cubic point group of the lattice. A less strict concept is that of a "modulus shell" that comprises all lattice speeds whose modulus is the same. In general, a modulus shell can be decomposed into several subshells, each of which is an equivalence class of its own. Therefore the script proceeds in several steps in order to define the shells: (i) Construction of the cubic group in d dimensions, (ii) supply of modulus values by the user, (iii) finding the corresponding modulus shells, (iv) decomposing the modulus shells into subshells, and (v) possible a posteriori elimination of some of the thus-found shells by the user.
Let us first discuss the construction of the cubic group in d dimensions. Denoting the Cartesian unit (column) vectors with e 1 , e 2 , . . . , e d , we see that the d × d unit matrix is written as ( e 1 , e 2 , . . . , e d ). A transformation that is just tantamount to a permutation π of the Cartesian axes therefore corresponds to the matrix ( e π(1) , e π(2) , . . . , e π(d) ). Combining this with the possibility to flip the orientation of an axis, the most general transformation matrix of the cubic group has the form (± e π(1) , ± e π(2) , . . . , ± e π(d) ), where each combination of signs is possible. Based upon these observations, it is very easy to construct the set of all transformation matrices, whose number therefore turns out to be d!2 d (i. e. eight in two dimensions, 48 in three dimensions). We here make use of the "permutations" routine of the "itertools" section of the standard Python library, plus the observation that any sign combination can be written as a string of pluses and minuses. Such a string is straightforwardly mapped onto a corresponding string of zeros and ones. Such a string, in turn, is identified with the binary representation of an integer in the range 0, 1, . . . , 2 d − 1, which therefore just needs to be scanned in order to find all sign combinations.
In the next step, the user specifies the squared moduli of the desired velocities. For one modulus shell, we thus have an integer number L = c 2 i . The corresponding vectors are then being searched for by the script. Obviously, it is sufficient to search a d-dimensional cubic grid, where each coordinate varies from −L to +L. The total number of points to be scanned is thus (2L + 1) d . Introducing a trivial coordinate shift, one may as well search a grid whose coordinates vary from 0 to 2L. A corresponding one-dimensional index k that scans all grid points then varies from 0 to (2L + 1) d − 1. This index is related to the shifted coordinates x 1 , x 2 , . . . , x d via Therefore these coordinates can be retrieved from k recursively by successive modulo operations. After having collected and shifted the thus-found coordinates, the program calculates the squared modulus and checks if that value coincides with L. If yes, the vector is added to a list.
The two final steps (iv) and (v) are then straightforward and need not be explained in further detail.

D. Random vectors
Using a uniform random generator (the script uses the built-in generator that is provided by Python via the "random" package), it is very easy to generate d coordinates x i distributed uniformly in the interval (−1, 1). We then calculate i x 2 i and check if this is smaller than one. If not, the procedure is repeated until the criterion is satisfied. The thus-found vector x is then normalized to unity, yielding n = x/ | x|. It is clear that the thusgenerated vector n is a unit vector that is uniformly distributed on the unit sphere.

E. Rational numbers
Considering the expansion of w s in powers of c 2 s (Eq. 40), and the original equations in the form of Eq. 8 to 12, it is quite clear that the coefficients q sµ can be viewed as the solution to a system of linear equations whose coefficients are all integer. For this reason, they must be simple rational numbers. Since, e. g., a fraction like 1/24 is more intuitive and aesthetically more appealing than the corresponding floating-point number 0.04166666, the script makes use of a routine that converts the latter into the former. In principle, this is done via a standard continued-fraction expansion [31], which is however somewhat tricky to implement due to its high sensitivity to roundoff errors. Fortunately, Python provides the ready-made routine "Fraction" [32] which yields quite satisfactory results if the size of the denominator is suitably limited, and the model is of sufficiently low order, such that the denominators are not too large.
This conversion is also applied to the "magic" c 2 s values and to the coefficients of the resulting reduced models. However, these might be irrational, in which case the procedure provides fractions with large numerators and denominators. If the user is interested in exact algebraic numbers, we recommend to identify the algebraic equation whose solution provides the magic c 2 s value, and to attempt its exact solution with the help of a computer algebra system such as Wolfram Alpha [33]. It should be stressed, though, that for practical purposes a floatingpoint representation is absolutely sufficient.

F. The case of infinitely many solutions
For rank-deficient problems that have infinitely many solutions, we do not attempt to find the weights as a function of c 2 s , but rather only for one particular c 2 s value, for which the user is explicitly asked. We do this in a separate script "Continue.py", which obtains its further input from a file written by the main script.
Starting from Eq. 50, which we write in the form and recalling that the sought-for matrix Q contains the coefficients of the polynomial expansions of the weights w s , we see that we can, for a given (user-supplied) c 2 s value, immediately construct a set of linear equations that the weights have to satisfy. We know that this set has infinitely many solutions. Furthermore, we know that all weights have to satisfy the conditions w s ≥ 0. If we then combine this with some linear optimization problem, we see that this is identical to a problem of standard linear programming [34]. The most useful optimization problem that we can imagine in this context is to minimize one of the weights, or perhaps even several of them. The user is therefore asked which of the weights is to be minimized; in case that several weights are supplied, the script simply attempts to minimize the sum of these weights. For our purposes, we found the package "cvxpy" [35] particularly useful in terms of (i) Python integration, (ii) correctness of results, and (iii) numerical stability. The script checks if the problem has a solution, and if yes, it returns it, together with the c 2 s value. Quite often, the minimized weight turns out to be zero. To enhance the ease of use, the user may supply a whole interval of c 2 s values plus a step size, such that the whole interval is being scanned.
In practical applications, it often turns out that it is useful to first supply a fairly large set of shells, which then results in a rank-deficient problem, and to then use "Continue.py" to remove more and more shells until finally a minimal model is found.

G. The "test" mode
Except for solving the problem of finding weights from scratch, quite frequently the situation arises where one is confronted with a given (or claimed) solution (e. g. from the literature), and one would like to quickly check its correctness. The script therefore provides a "test" mode, where the formalism developed above is used for that purpose. Input data are therefore not only spacial dimension, maximum tensor rank, and the set of shells (as always), but additionally the value of c s , plus the set of weights that should be tested. Note that in "test" mode the script assumes that a given solution has been given for one special well-defined c s value, and also disregards the problem of positivity of weights. Therefore, the task is to simply check if the given vector of weights w satisfies Eq. 39, which is easy, because the provided information allows to calculate both the matrix A and the inhomogeneity b. In case that the given solution is provided simply as a set of numbers (a vector w 0 ), we therefore calculate the residual and analyze whether it is zero within numerical accuracy.
In the more general case of a degenerate solution, we assume that it is given in the form where the λ i form a set of parameters which may be varied independently. Obviously, we again have to evaluate ∆ 0 as before, and check for its vanishing, but additionally we also need to evaluate the additional residuals ∆ i = A w i , i ≥ 1, and check for their vanishing as well.
Given the fact that literature values for weights are typically given with not more than six-digit accuracy, we need to take care that the check for vanishing residuals is not too stringent. How this is done in detail is explained in Appendix C.

H. The algorithm as a whole
The considerations given above give rise to a procedure which is summarized in the flow diagram Fig. 1. In general, input data may be provided either by an interactive dialogue or via command-line arguments.
We continue with M = 6, i. e. R = 6. Attempting a six-shell model with c 2 i = 1, 2, 4, 5, 8, 9 gives rise to a rank-deficient problem with infinitely many solutions. Removing the shell c 2 i = 5 (it is the most natural candidate since it contains most velocities) yields indeed a unique solution given by We now turn to M = 8, i. e. R = 11. We thus first attempted the set c 2 i = 1, 2, 4, 5, 8,9,10,13,16,18,25. The shell c 2 i = 25 comprises two subshells (with vectors of types (5, 0) and (4, 3), respectively), such that the set actually gives rise to a twelve-speed model. Not surprisingly, this results in a rank-deficient problem with infinitely many solutions. However, the rank turns out to be merely eight, which indicates that it might be possible to reduce the model to just eight non-trivial shells. We therefore tried by removing the outer shells The range of validity is 0.6979533 ≤ c 2 s ≤ 0.8704738; these numbers are the irrational roots of w (40) and w (30) . Removing the corresponding shells then gives rise to two 37-speed models. At the lower bound we thus obtain the weights w (00) = 0.2331507, w (10) = 0.1073061, w (11) = 0.05766786, w (20) = 0.01420822, w (21) = 0.005353049, w (22) = 0.001011938, w (30) = 2.453010 × 10 −4 , w (31) = 2.834143 × 10 −4 . Comparison with Ref. [9] shows that this set of velocities and weights is identical to the model derived by Philippi et al. under the name "D2V37" model.
The two-dimensional models that were investigated in a recent paper by Shan [14] (going up to tensor order M = 8) could all be verified (except for one minor typo), see Appendix B . Furthermore, it is also possible to study the case of tenth-order isotropy, corresponding to R = 18.
Starting from the eighteen velocities c 2 i = 1, 2, 4, 5, 8,9,10,13,16,17,18,20,25,32,36,37,40,52, one finds that this yields a rank-deficient problem with infinitely many solutions, where the rank of the problem is eleven. Removing outer shells, we can reduce this to the set c 2 i = 1, 2, 4, 5, 8,9,10,13,16,25, which corresponds to eleven shells (the shell c 2 i = 25 is decomposed into two subshells, while all others are irreducible). This is a 61speed model with a unique solution and a range of validity of 0.7592510 ≤ c 2 s ≤ 0.9054850. We do not give the expansion of the weights as polynomials in c 2 s here; the expressions are lengthy and the rational representations of the floating-point numbers most probably affected by roundoff errors. The other properties of the model are summarized in Tab. I One thus sees, from the reduced model at the upper limit, that in two dimensions it is possible to construct a model that is isotropic up to tenth order with as few as 53 velocities.
Let us now finally comment on the case of rankdeficient problems with infinitely many solutions. The  main virtue of such models is that they are able to extend the admissible range of c 2 s values, however at the expense of more lattice speeds. To illustrate this, let us again go back to the simple case M = 4. As we have seen already, the set c 2 i = 1, 2, 4 yields a minimal model with range of validity 1/3 ≤ c 2 s ≤ 2/3. We now add one further shell c 2 i = 5 (i. e. we enhance the model from 13 speeds to 21 speeds), which results in a rank-deficient problem, which we analyze using "Continue.py" as described, where we demand that the weight of the additional shell should be as small as possible. Scanning for admissible c 2 s values, we find that the lower bound remains unchanged, but the upper bound is increased to roughly c 2 s = 1.185, which is a significant increase. As expected, the weight of the additional shell remains zero as long as c 2 s remains in the original interval 1/3 ≤ c 2 s ≤ 2/3. As soon as c 2 s exceeds 2/3, the weight of the additional shell starts to increase, while at the same time the weight of the shell c 2 i = 1 drops to zero and remains at that value. Therefore we have essentially joined two minimal models. Indeed, running the main script for the set c 2 i = 2, 4, 5 results in a unique solution and a range of validity 2/3 ≤ c 2 s ≤ 32/27.

B. Three-dimensional models
For M = 4, i. e. R = 3, we were unable to find a suitable velocity set that would comprise only two nontrivial shells. A straightforward and simple choice for three shells would be c 2 i = 1, 2, 3, corresponding to typical vectors (1, 0, 0), (1, 1, 0), and (1, 1, 1). In this case the matrix turns out to be rank-deficient, and there is no solution. Trying the three shells c 2 i = 1, 2, 4 (last value corresponding to a typical vector (2, 0, 0)) gives rise to a 25-speed model ( It is worth noting that the thus-derived 41-velocity model is different from the 41-speed model discussed by Chikatamarla and Karlin [11]. The latter comprises the five non-trivial shells c 2 i = 1, 2, 3, 9, 27, where in the case of c 2 i = 9 only the six vectors of type (3, 0, 0) are taken into account, while the c 2 i = 27 shell contains only the eight vectors of type (3,3,3). To analyze this case, we need to add one more shell in order to allow for a varying c 2 s value, for which we take c 2 i = 16. Indeed we then find that the model has a unique solution and a fairly narrow range of validity of 0.3500280 ≤ c 2 s ≤ 0.3675445. At the upper limit the weight of c 2 i = 16 vanishes, and thus we recover the model of Ref. [11]. Here we find w (000) = 0.2759976, w (100) = 0.06508547, w (110) = 0.02482560, w (111) = 4.256684 × 10 −3 , w (300) = 2.512627×10 −4 , w (333) = 2.674506×10 −6 . Comparison with Ref. [11] shows that these parameters are identical to the numbers given there.
With some trial and error (along similar lines as described in more detail for the two-dimensional case), we were also able to find minimal models for M = 8 (R = 11) and M = 10 (R = 18). For eighth-order isotropy, a model of ten non-trivial shells turns out to be sufficient: 2,3,4,6,8,9,11,16,27, where for c 2 i = 9 we take the subshell of type (3,   interval of validity. Similarly, we also found a possible minimal model with tenth-order isotropy. This is facilitated by the set c 2 i = 1, 2, 3, 4, 6, 8, 9, 11, 12, 17, 18, 25, where the shell c 2 i = 9 is restricted to vectors of type (3, 0, 0), while for all other modulus shells we take all subshells. This set comprises 221 velocities in total and the model is valid in the interval 1.033691 ≤ c 2 s ≤ 1.206545. The reduced models at the lower and upper limit of validity are obtained by elimination of shells which both contain 24 velocities. The reduced models are therefore both 197-speed models.
For these two final models we do not present the expansions of the weights in powers of c 2 s , for similar reasons as for the case d = 2, M = 10. Other model properties are summarized in Tabs. II and III.
The three-dimensional models that were investigated in a recent paper by Shan [14] (going up to tensor order M = 8) could all be verified, see Appendix B.

IV. SUMMARY
The present study has shown that it is possible to formulate the problem of constructing weight coefficients in an LB model as one of numerical linear algebra. Crucial for this to work was (i) the notion of c 2 s as a free parameter; (ii) a detailed understanding of the symmetry restrictions on the dimensionality of the underlying tensor spaces; (iii) a mapping of the tensor equations to scalar equations by contraction with tensors of the form n α n β . . . n γ constructed from random unit vectors; and (iv) analysis of the linear-algebra problem in terms of the singular-value decomposition. Putting these observations into software, it is possible to write a program that (i) checks for the validity of a given set of shells, and (ii) calculates the corresponding weights. We found it encouraging to see with what ease the automatic script does all the algebra to derive standard LB models and even new ones -to the best of our knowledge, so far no LB model has been discussed in the literature that is isotropic up to tensor rank ten. The successful examples of Sec. III show clearly that this is a fairly useful tool for the LB community. velocity u: (A1) and we require that this holds for all tensors up to a certain rank K. For example, the populations according to Eq. 7 satisfy Eq. A1 up to tensor rank K = 2 (cf. Eqs. [13][14][15]. However, it turns out that this problem can be solved fairly easily as soon as the set of velocities c i , along with the corresponding set of weights w i , has been found. The MBC problem according to Eq. 25 must be solved up to tensor rank M = 2K, and then a straightforward solution of Eq. A1 is found in terms of a tensorial polynomial of order K in u, where the expansion coefficients are essentially the tensor Hermite polynomials in c i , which were introduced into LB theory by He and Luo [36]. How this is done will be detailed below. It thus turns out that the most difficult aspect of the problem is the identification of a proper set of velocities and the determination of the weights (as should have become quite clear from the main text).
To simplify the problem of Eq. A1 we first introduce suitably scaled variables: ν i = n eq i /(w i ρ), d i = c i /c s , ξ = v/c s , η = u/c s , as well as a normalized Maxwell-Boltzmann distribution In terms of these variables, Eq. A1 is written as At this point, it is useful to introduce tensor Hermite polynomials [8,37] via their definition where ∂ α denotes a derivative in velocity space, ∂ α = ∂/∂ξ α . It should be noted that n denotes both the rank of the tensor as well as the degree of the polynomial in ξ.
It can be shown [37] that the polynomials are mutually orthogonal with respect to the weight function φ( ξ). The definition implies that the Taylor expansion of φ( ξ − η) with respect to η reads Now, instead of requiring the identity of tensor moments up to rank K (Eq. A3), we may equivalently require the identity of the corresponding expressions, where the products d iα d iβ . . . d iγ and ξ α ξ β . . . ξ γ are replaced by the corresponding Hermite polynomials up to order K: We now insert the Taylor expansion, Eq. A5. Making use of orthogonality, one sees that only the term m = n survives: For our purposes, it is not necessary to evaluate the rhs further. Rather we note that Eq. A7 needs to be satisfied for all n with 0 ≤ n ≤ K, and we now wish to show that the polynomial ansatz which is, in essence, a polynomial in the flow velocity u, does indeed solve the problem. Inserting the ansatz into the lhs of Eq. A7, one sees that there polynomials in d i occur, whose order does not exceed 2K. However, the coefficients w i have already been adjusted such that the u = 0 MBCs are satisfied up to order 2K. It is therefore justified to replace i w i . . . on the lhs with d d ξφ( ξ) . . ., where simultaneously d i is being replaced by ξ. Again, orthogonality tells us that only the term m = n survives. After these operations, it becomes obvious that rhs and lhs are identical, which completes the proof.

The entropic approach
An alternative approach is to find the equilibrium populations by maximizing a suitably constructed entropy. This has been popularized by the so-called "entropic" LB method [38,39]. The entropy can be derived by elementary statistical considerations, as outlined in Ref. [40]. Here one assumes a lattice gas with many particles on each lattice site, such that the notion of a single-site entropy makes sense. Each particle has a mass m, and we define µ as the associated mass density, µ = m/a d , where a is the lattice spacing. The model then yields for the entropy Defining a scaled entropy asS = µS/ρ, this can be written in terms of the reduced variables of the previous subsection:S The equilibrium populations are then found by maximizing S under the constraints of given mass and momentum, i n i c i = ρ u, (A12) We now wish to show that the solution derived in the previous subsection, i. e. a Kth-order polynomial in the flow velocity u (see Eq. A8), is an approximate solution of the maximum-entropy problem for small u, correct up to error terms of order u K+1 . Equivalently, we may also show that Eq. A8, evaluated for K = ∞, is the exact solution of the maximum-entropy problem, and we will take that latter approach. The proof is complete as soon as it is clear that the Lagrange multipliers λ ρ and λ u can be adjusted in such a way that Eqs. A17 and A18 hold. Since we assume that the MBCs are satisfied up to infinite order in u, we may however replace the terms i w i . . . on the lhs by the corresponding integrals d d ξφ( ξ) . . .. We therefore obtain The Gaussian integrals on the lhs are trivial to evaluate; this yields − λ u exp 1 2 λ 2 u = η exp (λ ρ ) .
Therefore a solution for the Lagrange multipliers can indeed be found; it is simply given by λ u = − η and λ ρ = η 2 /2.
Appendix B: Comparison with Ref. 14 We have used the present Python script in order to verify the results reported in Ref. 14, in which the author studies the MBCs within the framework of Gauss-Hermite quadratures. The quadratures are labeled as E M d,n where d is the spacial dimension, n is the number of velocities and M is the highest tensor order satisfied. For the comparison, note that the parameter c of Ref.
14 must be identified with 1/c s in the notation of the present paper. Furthermore, it should be noted that in the present paper "maximum tensor order" refers to the largest non-trivial (i. e. even) order, while the notation of Ref. 14 includes the next tensor order as well (which is trivially satisfied because it is odd). In other words the notion of, e. g., "maximum tensor order 7" in Ref. 14 corresponds to "maximum tensor order 6" in the context of the present work.
The "test" mode described in Sec. II G was written precisely for such purposes. We used it to check the quoted weights for one-dimensional (Table 2 of Ref. 14), two-dimensional (Tables 3 and 4), and three-dimensional (Tables 5 and 6) models. All numbers given in the paper turned out to be correct, except for two minor typos, which the script detected by being unable to verify the weights. The first typo occurs in Table 2, quadrature E 9 1,7 , where a direct calculation with one additional (auxiliary) shell c 2 i = 16 shows that the weight w 4 for the shell with c 2 i = 9 should read 812.129 instead of 8121.29. The other typo occurs in Table 4, third model, listed in column six. This model was checked further by a direct calculation using the velocity shells c 2 i = 1, 2, 4, 5, 9, 13, 18, 16. Here again c 2 i = 16 serves as an auxiliary shell. We then found that the solution from our script coincides with the solution from the table, except for the weight for the typical velocity (0, 2) which should read 862.347 instead of 8623.47.