Choice of boundary condition for lattice-Boltzmann simulation of moderate Reynolds number flow in complex domains

Modeling blood flow in larger vessels using lattice-Boltzmann methods comes with a challenging set of constraints: a complex geometry with walls and inlet/outlets at arbitrary orientations with respect to the lattice, intermediate Reynolds number, and unsteady flow. Simple bounce-back is one of the most commonly used, simplest, and most computationally efficient boundary conditions, but many others have been proposed. We implement three other methods applicable to complex geometries (Guo, Zheng and Shi, Phys Fluids (2002); Bouzdi, Firdaouss and Lallemand, Phys. Fluids (2001); Junk and Yang Phys. Rev. E (2005)) in our open-source application \HemeLB{}. We use these to simulate Poiseuille and Womersley flows in a cylindrical pipe with an arbitrary orientation at physiologically relevant Reynolds (1--300) and Womersley (4--12) numbers and steady flow in a curved pipe at relevant Dean number (100--200) and compare the accuracy to analytical solutions. We find that both the Bouzidi-Firdaouss-Lallemand and Guo-Zheng-Shi methods give second-order convergence in space while simple bounce-back degrades to first order. The BFL method appears to perform better than GZS in unsteady flows and is significantly less computationally expensive. The Junk-Yang method shows poor stability at larger Reynolds number and so cannot be recommended here. The choice of collision operator (lattice Bhatnagar-Gross-Krook vs.\ multiple relaxation time) and velocity set (D3Q15 vs.\ D3Q19 vs.\ D3Q27) does not significantly affect the accuracy in the problems studied.


I. INTRODUCTION
In the last two decades, lattice-Boltzmann methods (LBM) [1][2][3] have been widely studied and used for fluid flow problems. We are particularly interested in its application to the study of hemodynamics: the flow behavior of blood under physiological conditions. Accurate simulation of the flow of blood in an individual could have near-term clinical benefits for, inter alia, the treatment of aneurysms [4][5][6] or stenoses [7][8][9].
Applications such as these have a number of challenges [retaining computational performance in a relatively sparse, three-dimensional (3D) fluid domain; capturing the complex rheology of a dense suspension of deformable particles; accounting for the compliance of vessel walls] but here we address the choice of boundary condition method in complex domains. We will not examine the rheology and compliance problems in this article, restricting ourselves to a Newto-nian fluid within a rigid-walled system, but instead aim to provide recommendations for the choice of boundary condition method(s) for a complex geometry. This choice is examined in the context of two other variables: the discrete velocity set and lattice-Boltzmann collision operator. Bearing in mind recent controversy over the reproducibility of computational science [10,11], we have released the source code of our application HEMELB for the community's scrutiny and use.
Lätt et al. [12] considered five boundary conditions and assessed their accuracy. However, they studied only boundary conditions in which the wall passes through a lattice point, immediately restricting their results to boundaries that are normal to one of the Cartesian directions of the underlying grid. Boyd et al. [13] compared simple bounce-back to the Guo-Zheng-Shi method [14] in simulations of arterial bifurcations, finding that the two methods give different results when stenosis is present, but without any analysis suggesting which, if either, is more accurate. Stahl et al. [15] performed LBM simulations with simple bounce-back boundary conditions to examine the effect of staircased boundaries (where walls that are not aligned with the underlying grid are approximated by a set of grid edges) on the measurements of shear stress both at the wall and in the bulk flow, finding errors in the shear stress of up to 35% for two-dimensional (2D) channel flow and 28% for 2D bent-pipe flow. They also introduced a method for estimating the local wall normal from the shear stress tensor. The endothelial shear stresses in large arterial trees were studied by Melchionna et al. [16] with their code MUPHY [17]. Pan et al. [18] have studied the effect of different boundary conditions in porous media for flows at low Reynolds (Re) number, concluding that interpolation at the boundaries significantly improves accuracy.
In this paper we comprehensively compare the accuracy of simulations performed with some of the most widely used solid-fluid boundary conditions against analytical solutions relevant for simulations of blood flow in the larger vessels. To the best of our knowledge, this is the first like-for-like comparison of the popular boundary conditions for lattice Boltzmann in geometries and under conditions relevant for hemodynamics and other moderate-Re-number problems. Other comparison studies cover this case, instead focusing on straight boundaries [12] and porous media [18].
The outline of the paper is as follows. In Sec. II we briefly introduce the lattice-Boltzmann method and discuss the collision operators, velocity sets, and boundary conditions used. Our open-source software [19] and simulation protocols are described, and results given in Sec. III. We present our conclusions in Sec. IV.

II. THE LATTICE-BOLTZMANN METHOD
Here we provide a brief summary of the lattice-Boltzmann method (LBM); for a full derivation, we refer the reader to one of the many thorough descriptions available [1][2][3]. LBM operates at a mesoscopic level, storing a discrete-velocity approximation to the one-particle distribution functions of the Boltzmann equation of kinetic theory [20], {f i (r,t)}, on a regular lattice, with grid spacing x. The set of velocities {c i } is chosen such that the distances traveled in one time step ( t), e i = c i t, are lattice vectors and to ensure Galilean invariance [21]. When one only wishes to reproduce Navier-Stokes dynamics, the set is typically a subset of the Moore neighborhood, including the rest vector. For 3D simulations, the most commonly used sets have 15, 19, and 27 members (termed D3Q15, D3Q19, and D3Q27, respectively). The LBM can be shown, through a Chapman-Enskog expansion, to reproduce the Navier-Stokes equations in the quasi-incompressible limit with errors proportional to the Mach (Ma) number squared.
In the absence of forces, the density ρ(r,t) and the velocity u(r,t) at a fluid site can be calculated from the distributions by Advancing the system one time step is conceptually divided into two stages. The first is collision, which relaxes the distributions towards a local equilibrium (we denote the postcollisional distributions as f i ): whereˆ is the collision operator. The second is streaming, where the postcollisional distributions are propagated along the lattice vectors to new locations in the lattice, defining the distributions of the next time step:

A. Collision operators
The collision operator approximates the microscopic interparticle interactions. Here we summarize the two considered in this article: lattice Bhatnagar-Gross-Krook and multiple relaxation time.
Lattice Bhatnagar-Gross-Krook (LBGK) [21,22] approximates the collision process as relaxation towards a local equilibrium (see below), in a discrete velocity analog of the Boltzmann-BGK approximation from kinetic theory [23], where τ is the relaxation time. In this case all modes relax towards equilibrium at the same rate. This can be shown, through a Chapman-Enskog expansion (see, e.g., [2,24]), to reproduce the Navier-Stokes equations with a kinematic viscosity ν given by For the equilibrium distribution, we use a second-order (in velocity space) approximation to a Maxwellian distribution, where the weights w i and speed of sound c s depend on the choice of velocity set. LBGK is simple to implement and gives excellent performance; it is therefore widely used. Multiple relaxation time (MRT) collision operators, developed at the same time as LBGK [25], generalize the notion of relaxation towards local equilibrium (the same as above) by allowing different relaxation rates for different moments of the distributions, potentially improving stability properties and accuracy [26]. The eigenvalue of the collision matrix which corresponds to the relaxation of shear stress λ shear determines the viscosity as in (6) (τ → 1/λ shear ). We use the MRT operator on the D3Q15 and D3Q19 lattices, as described by d'Humières et al. [26]. Here, the distribution function f i are transformed into the moment basis m i by the matrix M ij and the different moments can be relaxed towards equilibrium at different rates, before being projected back into the distribution space for advection, where s j is the relaxation rate for mode j .

B. No-slip boundary conditions
Boundary conditions for LBM have some additional complications compared to boundary conditions for Navier-Stokes-based methods, due to the LBM's mesoscopic nature. Typically, one wishes to impose conditions on the macroscopic, hydrodynamic variables (p, u) but these must be implemented through a closure relation for the mesoscopic distributions. There is no single, obviously superior choice. In this section, we will briefly review some commonly used boundary condition methods for lattice-Boltzmann models which impose the no-slip condition that the velocity of fluid adjacent to a wall is equal to the velocity of the wall.
Many boundary condition methods do not vary their behavior with respect to the location of the walls in relation to the Eulerian grid. The wall is often assumed to pass infinitesimally close to a point, the grid point remaining inside the fluid domain (sometimes referred to as "wet node" boundary conditions [12]), or alternatively the boundary is considered to be halfway along the lattice vector to a point outside the fluid. In the case of complex or non-lattice-aligned domains, these methods (and simple bounce-back; see below) will always cause a first-order modeling error, irrespective of the order of numerical accuracy of the resulting lattice-Boltzmann method. (Typically second-order accuracy is sought, since this is the inherent accuracy of standard lattice-Boltzmann methods in bulk fluids.) This point has been studied numerically by Stahl et al. [15]. Other boundary conditions allow the wall to be at an arbitrary position along the link between a solid and a fluid site. These reduce the modeling error of fixed wall position methods, but often at the price of increased complexity and/or the requirement of data from neighboring fluid lattice points. Furthermore, a number of methods suffer reduced accuracy at sites in corners, which can reduce the accuracy of simulations throughout the domain.
There are a number of popular methods [27][28][29][30] that can only operate on straight, axis-aligned planes and force the boundary to pass directly through the lattice point. Malaspinas et al. [31] generalized the regularized method [29] to cope accurately with corner nodes. The authors acknowledge that it fails for the case of a D3Q15 lattice and a right-angled corner and this method also forces the boundary to pass through a lattice point. These methods are, therefore, unsuitable for problems involving complex boundaries.
Simple bounce-back (SBB) is perhaps the most widely used boundary condition for solid walls, positioning them halfway along the lattice vector from fluid to solid. It is straightforward to implement and gives second-order accurate simulation results for flat boundaries aligned with the Cartesian axes of the lattice [32], although in more complex cases this degrades to first-order accuracy [33]. It is also computationally cheap and local in its operation. SBB ensures conservation of mass up to machine precision. In this work we use the halfway bounce-back scheme [34]. Despite SBB exhibiting the modeling error mentioned above, we include it in this study due to its wide use.
The Bouzidi-Firdaouss-Lallemand (BFL) method [35] starts with simple bounce-back, but interpolates the value of the to-be-propagated distribution with the distribution at the fluid site which standard bounce-back would stream it to. They present two variants: one using linear interpolation and another using quadratic interpolation. In the present work, we restrict our attention to the linear case, due to its locality and smaller communication requirement (indeed, it can be implemented without any interprocess communication above the normal lattice-Boltzmann streaming step, albeit at a price of revisiting the sites adjacent to the wall). Bouzidi et al. claim that both variants show second-order convergence, but with the linear method having a poorer prefactor [35]. The method as presented by Bouzidi et al. may, depending upon the distance of the wall from the fluid site, fail when computing f i (x,t + t) where x − e i lies outside the fluid domain, if the site at x + e i is also outside. Since this can occur in a curved domain, in these cases we fall back to performing SBB. While this does degrade accuracy slightly, it does appear to offer good stability of the simulation.
Guo, Zheng, and Shi (GZS) [14] present a boundary condition which decomposes the unknown distributions at the wall into equilibrium and nonequilibrium parts. The equilibrium part uses the density of the fluid site and a linearly extrapolated velocity such that the velocity at the solid wall is as imposed. For the nonequilibrium part, the value from the fluid site (or the next site into the bulk) is used. The method as described in [14] has the same problem as BFL. We again fall back to SBB when this occurs.
Junk and Yang (JY) [36] propose a correction to the simple bounce-back method. They claim an advantage compared to interpolation-based methods such as GZS and BFL as their method is completely local and is able to handle nonstraight boundaries where sites have lattice vectors in opposite directions, both crossing the solid-fluid boundary. The method arises from their analysis [37] of boundary conditions for LBM in terms of general methods for studying finite difference schemes rather than the standard Chapman-Enskog expansion. By adding correction terms to the collision operator and then discretizing them in an optimal (under their analysis framework) manner, they ensure the Navier-Stokes equations are obeyed with the correct boundary conditions. The multireflection method by Ginzburg and d'Humières [38] uses linear combination of five neighboring distribution functions along a link direction plus a correction to determine the unknown direction. This combination is obtained from a Chapman-Enskog expansion at the boundary sites, which is third-order accurate for steady flows. Chun and Ladd [39] present a method based only on interpolation of the equilibrium part of the distribution functions, relying on the fact that the nonequilibrium part is always one order higher in the Chapman-Enskog expansion. Chun and Ladd demonstrate numerically that their method is advantageous for problems with many small gaps, such as porous media.
It is well known that LBGK combined with SBB does not locate the plane of zero velocity exactly halfway along the link for flows with varying velocity gradients [40] and that this can be a significant issue in, for example, porous media [18]. The effective width of the channel in lattice units H , for a Poiseuille flow in a 2D lattice-aligned channel, driven by a constant body force is given by [Eq. (19) from [41]] where N x is the number of lattice points across the channel and ν is the viscosity in lattice units. While this error can be reduced by optimizing the choice of MRT relaxation times [42], it can only reach zero for particularly simple cases. Typically at Re 1 the larger system sizes and smaller relaxation times combine to reduce the relative error. For example, with N x = 40 and τ = 0.55 the effective width is 39.987, a relative error of 0.03%. Due to the smallness of these errors in the problems studied here, we do not consider them further. We also note that some authors propose the use of grid refinement [43], finite difference [44], and finite volume [45,46] discretizations of the discrete Boltzmann equation as methods for improving accuracy around nonplanar boundaries, but we will restrict our attention here to implementations on a single, regular grid. In the case of moving boundaries, a refilling procedure [47] can be used. Additionally, the immersed boundary method (IBM) [48] has been used in conjunction with the LBM to simulate rigid [49] and deformable [50] boundaries. IBM requires a further layer of fluid sites outside the walls as well as another, simpler boundary condition at the edge of the halo region, but admits an extension to moving boundaries in a natural way.

C. Open boundary conditions
For inlets, we use Ladd's method [34] to impose the expected velocity profile (parabolic, Womersley). This is a modification of SBB, with a correction term −2w i ρu · c i /c 2 s , where u is the imposed velocity at the halfway point of the link. However, using this at outlets as well will cause the total mass in the system to increase unboundedly, due to the unbalanced mass fluxes into and out from the system since LBM has a finite compressibility [51].
Alternatively, one could impose mixed Dirichlet-Neumann boundary conditions at the outlet: wheren is the inward pointing normal of the open boundary.
A number of authors [28,[52][53][54] have proposed open boundary conditions that fulfill some or all of these requirements, however, the techniques cited are only suitable for inlets aligned with the lattice's axes. We have therefore developed a simple method that meets these requirements.
Assume that, at the start of a time step, all distributions for an inlet or outlet site are known. LBM then proceeds as normal: a collision step, followed by a streaming step; the distributions that would have come from exterior sites now have an undefined value. We close the system by constructing a "phantom site" (indicated below with a subscript P) beyond the boundary, whose hydrodynamic variables are estimated based on the imposed values and those at the inlet or outlet site (indicated with subscript I). Note that there is one phantom site per missing distribution, i.e., the phantom sites of adjacent inlet or outlet sites are unrelated. This is to eliminate the need for communication between neighboring sites. The equilibrium distribution for the missing direction is computed at the phantom site and then streamed into place. For the density, we assume the target pressure p 0 , i.e., we perform a zero-order extrapolation from the outlet plane. Condition (10b) is enforced by projecting away any velocity component not parallel to the inlet or outlet normal,n. For condition (10c), we take first-order finite-difference approximations for the derivatives, givinĝ Hence u(r P ,t) ≈ (u(r I ,t) ·n)n.
For lattice sites that are adjacent to both open and closed boundaries, we use the above method on those links that cross the inlet or outlet and the solid wall boundary condition on those links that cross the solid wall of the domain.

III. SIMULATIONS
Our goal is to determine which combination of boundary condition, collision operator, and velocity set gives the best allround accuracy in a nontrivial geometry (i.e., with curved surfaces), with a focus on computational hemodynamics. We also assess the computational requirements of the different models.
We compare against analytical solutions, which restricts us to relatively simple domains: We choose a cylinder and a torus. For the cylinder we use both steady, Hagen-Poiseuille flow and a time-dependent, sinusoidal Womersley flow. For the torus, we use only steady flow. By choosing a non-axis-aligned orientation for the cylinder, we better mimic a typical production simulation of the human vasculature. The orientationn was chosen pseudorandomly from the unit sphere, subject to the constraint thatn ·ê i 0.9, ∀i. The value iŝ Our approach is to select parameters in lattice units, but with physiologically relevant Re and Womersley (α) numbers.

A. Software: HEMELB
The simulations in this paper were performed with HEMELB [55], a lattice-Boltzmann-based fluids solver, which includes capability for in situ imaging of flow-fields and real-time steering [56]. It is a distributed memory application, parallelized with MPI. We have shown that HEMELB's computational performance scales linearly up to at least 32 768 cores [57]. We have released the software online [19], under the open-source GNU Lesser General Public License (LGPL), to enable interested researchers to reproduce our results as well as to use the software for novel problems. HEMELB has several linked components, described in Groen et al. [57]. We have recently redeveloped the lattice-Boltzmann core to allow for easy switching between use of different velocity sets, collision operators, and boundary conditions, through a statically polymorphic, object-oriented design. This avoids any run-time overhead due to dynamic polymorphism [58]. The individual software components are tested through a battery of over 100 unit and regression tests, which are run nightly by our continuous integration server.
HEMELB includes, among other features, the following: the D3Q15, D3Q19, and D3Q27 velocity sets; the lattice Bhatnagar-Gross-Krook (LBGK) and multiple relaxation time (MRT) collision operators; and, the simple bounce-back (SBB), Guo-Zheng-Shi (GZS), Bouzidi-Firdaouss-Lallemand (BFL), and Junk-Yang (JY) boundary condition methods. HEMELB does not currently support the combination of the GZS boundary condition with the MRT collision operator; this will be addressed in a future release.
The software includes a separate tool for defining the simulation domain. This requires either a geometric primitive or a general surface, meshed with triangles. The user can then place inlets and outlets, specify their pressure and/or velocity boundary conditions, and select the fineness of the lattice. This setup tool then generates the input for HEMELB itself, producing a description of each fluid site and, if needed, the location of the wall. For the cylinders used in this work, we directly use the cylinder, rather than approximating it with triangles.

B. Convergence analysis
In this section we report on a series of simulations of Hagen-Poiseuille flow over a range of resolutions and Re numbers, defined here as where U max is the maximum velocity, D the pipe diameter, and ν the kinematic viscosity. The velocity U is where r is the distance from the cylinder axis, defined byn, and ∂ n p is the pressure gradient along the axis.
In Table I we list all the parameters chosen for the simulations. The range of Re spans typical values for cerebral arteries in the human body [59]. For each case we vary the diameter D from 12 to 192 lattice units; the length of tube used is given by L = 4D. Due to the finite speed of sound in LBM (c s = 1/ √ 3 for the models used here), we list the Ma numbers (Ma ≡ U max /c s ); the lowest resolution simulations in each have extremely high values and will consequently have poor accuracy, but this allows us to assess the convergence behavior at modest computational expense. Next we show the value of the LBGK relaxation time τ which must be greater than t/2 [1] and not be much greater than t [60]. For the MRT simulations, we use the parameters from [26] except for the stress relaxation rate where we use s 9 = s 11 = 1/τ , i.e., s 1 = 1.6 t, s 2 = 1.2 t, s 4 = 1.6 t, and s 14 = 1.2 t. We hold τ constant in lattice units while refining the spatial resolution, which implies diffusive scaling of the time step (when converted to physical units), i.e., t ∝ x 2 . The density, and hence pressure, difference ρ driving the flow is also reported; this must remain much less than the reference density of the simulation ρ 0 in order to keep compressibility errors small. Finally, we list the time for momentum to diffuse across the cylinder's diameter, T mom ≡ D 2 /ν, and the time for a sound wave to propagate the length of the cylinder, T s ≡ L/c s , to give some idea of the time required for the simulation to converge to a steady state. To determine whether a simulation has indeed converged, we compute the maximum difference between flow fields at two times: and require that u(t,t + 1) < 10 −7 .
We use a simple initialization procedure, initializing to a uniform density fluid at rest. This approach is general and can be applied to any geometry without requiring any preprocessing step [61][62][63], but does require longer simulation times until a steady solution is reached. For each simulation we compare the Poiseuille solution, U(r), with the velocity field found by simulation, u(r,t). We define the velocity error as u (r,t) ≡ u(r,t) − U(r), (17) and use the 2 norm scaled by the predicted velocity range as our measure of error: These are evaluated over all fluid sites in the central 90% of the cylinder, thus excluding the inlet and outlet sites. For the pressure gradient, we use the measured difference at the edge of this volume divided by the distance between the two planes. This is to disentangle the errors due to the open boundary condition method from the no-slip condition. We will return in the future to a full validation of the open boundary condition method. The simulations were performed on HECToR, the United Kingdom national supercomputer, using up to 544 cores. The number of cores for each run was chosen to minimize the run time while remaining efficient which we have shown to occur at around 10 3 sites per core [57].
The Junk-Yang method showed poor stability (instability being defined as one of the distribution functions becoming negative) and only producing a converged solution for the lowest resolution cases at low Re number. We therefore disregard the JY method from further consideration. We do, however, have some confidence in our implementation of the method as our unit test suite demonstrates (data not shown) that the implementation reduces to SBB when the wall is a plane normal to one of the axes and halfway between the site and its nonfluid neighbors, as expected [36].
In Fig. 1 we show the velocity errors as a function of increasing resolution for the remaining three boundary condition methods and the three velocity sets. We show only the results for the LBGK collision operator (the results with MRT are visually indistinguishable).
The results clearly show that SBB gives the expected firstorder convergence as the resolution increases, while BFL and GZS show second-order convergence with similar prefactors. GZS offers slightly superior accuracy than BFL, reaching up to 20% lower errors (Re = 300) but the relative benefit is highly variable, vanishing in some cases.
As the Re number is increased, we note that the errors also increase, particularly when going from Re = 1 to 30. This is likely due to the increase in the Ma number. The small improvement from Re = 30 to 300 is probably due to the decreasing density difference and hence reduced compressibilty errors.
The velocity set has a small effect on the measured accuracy: D3Q15 and D3Q19 are almost coincident. D3Q27 does offer a small benefit of 1%-4% for GZS and BFL, but worsens accuracy for SBB; this may be due to the greater number of sites which have to implement the boundary condition and therefore suffer from the first-order modeling error.

C. Womersley flow
Here we report on simulations of oscillatory flow, in order to explore the different boundary condition methods' effect on accuracy for time-dependent cases. The Womersley number (α) is a dimensionless number governing dynamical similarity in cases of oscillatory flow. It relates to the ratio of transient forces to viscous forces (or alternatively the ratio of diameter to boundary layer growth during one period of oscillation). In the case of a cylinder (radius R, axisn) and laminar flow with zero average pressure gradient [∂ n p(t) = (A/L) sin ωt, ω ≡ 2π/T osc ], it is defined as where ν is the fluid viscosity. The time-dependent Navier-Stokes equations admit an analytic solution [64]: where J 0 is the order-0 Bessel function of the first kind and Re(z) gives the real part of z.
In the larger arteries of the human body, Womersley numbers range approximately between 4 and 20 [59], so we select from this range. For these simulations, we define the Re number as that for a Poiseuille flow with a pressure gradient given by the amplitude of the imposed gradient. Based on measured Reynolds and Womersley numbers in human arteries [59], we have fit a simple power law α = ARe γ , giving γ = 0.36 and A = 0.1. We select parameters only from this curve within the Re − α plane, as shown in Table II. We use the same cylinder orientation as in Sec. III B and select a diameter D = 48 x, as this gives a reasonable balance between computational cost and accuracy, such as would be chosen for production simulations; we keep L = 4D. We initialize the simulation to a constant pressure with zero velocity (with the phase of the pressure oscillation chosen such that the driving difference is zero at simulation start). We allow the simulation to run until u(t,t − T osc ) < 10 −7 is reached for all sample points during one oscillation; however, to reduce the amount of data collected, we record data only for those points within one lattice unit of an axis-normal plane halfway along the cylinder. This was reached in 6-25 oscillation periods. We run these simulations for all combinations of LBM components and compute residuals at four sample points during one pressure oscillation period using Eq. (18). The four residuals are then reduced by taking the root-mean-square average and the maximum, respectively, effectively extending the averaging or maximization over time as well as space. We estimate the maximum as the maximum velocity over a full period at the center of the pipe, i.e., where U max is the maximum velocity of a Poiseuille flow driven by a pressure gradient of the amplitude of the Womersley flow.
In Fig. 2 we show an example of the simulated velocities, for the D3Q15 velocity set, the LBGK collision operator, and the BFL boundary condition. We plot the axial velocity, normalized by U max , against the radial coordinate at four evenly spaced moments during the period of oscillation. The agreement with the analytical profiles is excellent.
The 2 error norms for the simulations are shown in Table III. We see that all simulations well approximate the analytical solutions, with errors in the range 0.1%-4%. In Fig. 3, we show the range of error residuals against the Womersley number. This clearly shows the BFL and GZS methods offers superior accuracy to SBB in all cases, irrespective of the choice of collision operator and velocity set. Choosing MRT over LBGK does not offer any benefits, however, we have not varied the relaxation rates for the nonstress tensor moments of the distributions (e.g., by projecting out all the kinetic modes every time step or optimizing "magic numbers" [42]). We note  again that we have adopted the parameters from [26] without optimization. The choice of velocity set does play a small role, especially in the higher Re and Womersley number cases, leading to slightly improved results with larger velocity sets.

D. Dean flow
The problems in the cylinders studied above are fundamentally 2D flows. To more throroughly test the boundary conditions, we choose a further benchmark problem that is fully 3D: flow in a torus. For moderate Dean numbers (the dynamical similarity number for flow in curved pipes; see below) the primary, axial flow is a perturbation of a parabolic profile while the secondary flow in a cross-sectional plane is a  The radius of the tube is a and the distance from the center of the tube to the center of the torus is c. We define a toroidal coordinate system (r,θ,φ) where (r,θ ) are polar coordinates in the cross-section plane and φ is the angle that the cross-section plane makes with the x-z plane.
pair of counter-rotation vortices-see the grayscale images in Fig. 5. We define the radius of the tube as a and the distance from the center of the tube to the center of the torus as c; our coordinate systems are illustrated in Fig. 4.
The characteristic number for flow in curved pipes is the Dean number κ given by [65,66] where Re is the Reynolds number based on the maxiumum axial velocity of the flow that would result from the same pressure gradient in a straight pipe and δ ≡ a/c is the curvature ratio. The Dean number is the square root of the product of the inertial and centrifugal forces, scaled by the viscous forces [67]. Dean [65,66] was the first to derive an approximate solution for laminar, fully developed flow in a torus and much of the later work is reviewed in [67]. We choose to use the approximations for a weakly curved pipe from Siggers and Waters [68], due to their clarity of presentation. They define the velocity components (U r ,U θ ,U φ ) ≡ ν a (u,v,w) and the stream function ψ by where h = 1 + δr cos θ . They then expand as a power series in κ, and each term w k and ψ k as a series in δ: We use only the terms, w 02 = 1 128 (1 − r 2 )(−3 + 11r 2 + 10r 2 cos 2θ ), and ψ 00 = 1 2 10 × 3 2 r(1 − r 2 ) 2 (4 − r 2 ) sin θ, ψ 01 = 1 2 12 × 3 2 × 5 r 2 (1 − r 2 ) 2 (56 − 17r 2 ) sin 2θ, (27) This expansion is accurate for δ 1 and κ 100 [68]. For our simulations, we select a curvature δ = 0.1 and two target Dean κ 0 numbers (100 and 200), as representative of physiological values while not making the series expansion too inaccurate. The tube radius a = 24 x (hence c = 240 x) and the shear relaxation time τ = 0.6 t for κ 0 = 100 and τ = 0.55 t for κ 0 = 200. We sample the flow at a plane halfway around the torus to keep data volumes small and impose a parabolic flow profile at the inlet and a constant pressure at the outlet. Since the imposed profile is incorrect, we simulate 90% of the full torus to allow distance for the flow to fully develop. One must also compute the actual Dean number, by equating the imposed flux, This gives our actual Dean numbers as κ ≈ 111.78 and κ ≈ 223.56. We use the 2 -norm error from Eq. (18) but also evaluate it for the primary flow and the secondary flow. The simulations using the GZS boundary condition were unstable for the larger κ/Re case, but the remaining results are collected in Table IV scaled by U φ evaluated at the tube center, i.e., The table of error norms shows that the error is dominated by the error in the secondary flow, which is approximately independent of the LBM model used. This is borne out by Figs. 5 and 6 which show the patterns of the secondary flow error (i.e., the coloring) remaining near constant. The κ ≈ 224 case has much larger errors that vary little which we ascribe to the series solution becoming inaccurate. We note that our simulations do locate the centers of the pair of vortices accurately. The streamlines for the SBB simulations show large errors in the velocity field's direction near the walls, due to the staircasing of the boundary. This is likely to cause the stress near the walls, which is a key hemodynamic factor, to have larger errors. The primary flow errors for the case κ ≈ 112 are approximately 25% larger for SBB than for either BFL or GZS, again, due to the staircasing of the boundary. The velocity set and collision operator do not strongly affect the measured error, but there is a small benefit to using MRT over LBGK and for using the larger velocity sets.

E. Relative performance
The relative performance of the options is germane to the choice of which combination of lattice-Boltzmann components to use. In Table V we give the measured per-core performance for the lowest Re/α pulsatile flow simulations. All these runs were performed on HECToR, a Cray XE6 supercomputer with two 16-core AMD Opteron 2.3 GHz Interlagos processors per node. The simulations used 64 cores. The performance is given in millions of site updates per second (MSUPS) and is based on timings that include only the lattice-Boltzmann updates. We also estimate the relative time to perform one site update for the different methods, assuming that an SBB site update takes the same time as a bulk fluid site update. The simulation domain includes 347 401 sites of which 42 340 (12%) are at the solid-fluid boundary and hence use the various boundary condition methods.
SBB gives the best computational performance in all cases while BFL requires between 20% and 60% more compute time; the extra time needed remains approximately constant across all collision operators and velocity sets, but reduces as a For the three velocity sets, the number of distribution updates per second (i.e., SUPS × Q) remains approximately constant-D3Q15, 22.5; D3Q19, 22.8; D3Q27, 21.6. The cost of changing the collision operator from LBGK to MRT is large, decreasing performance by a factor of two. We do not place much emphasis on this result as the MRT collision operator is implemented naïvely and there is scope to improve the performance.

IV. CONCLUSION
The majority of benchmark problems reported in the lattice-Boltzmann literature use lattice-aligned geometries, rather than the complex domains required by many applications. We have performed a comparison between LBM simulation solutions, from our open source application HEMELB [19], and analytical solutions in a non-lattice-aligned, curved domain up to Re numbers of 300, with steady and unsteady flow. We have varied the resolution of the grid used and the different components of the algorithm (collision operator, velocity set, no-slip boundary condition). We find that at these moderate values of Re, the choice of velocity set and collision operator do not greatly affect accuracy or stability, but that the choice of no-slip boundary condition method is critical.
Recent studies [69,70] have shown that the commonly used D3Q15 and D3Q19 velocity sets can give strongly orientation (with respect to the underlying lattice) dependent results when simulating flows for Re numbers 250, while the D3Q27 velocity set maintains good isotropy. The problems studied are in relatively complex cases (a circular pipe with a narrowing; circular and square pipes at higher Re number) but they are comparable to our simulations of flow in a toroidal pipe due to the strong three dimensionality of the Dean flow. We do not see any lattice artifacts in these simulations, even with Re ≈ 400 and the strong secondary flows. Even so, one must be aware of the possibility of anisotropy when simulating a complex flow and ensure that it does not occur in one's work.
The Junk-Yang method shows poor stability and is therefore unsuitable for our hemodynamic applications or other applications that require even moderate-Re numbers.
Simple bounce-back (SBB) shows first-order convergence over a wide range of resolutions and Re numbers, as expected. We confirm that the Bouzidi-Firdaouss-Lallemand (BFL) and (modified) Guo-Zheng-Shi (GZS) methods both show secondorder convergence over a wide range of resolutions and Re numbers. For the steady flow problem, GZS has lower errors than BFL, by a variable amount (up to 20%). For the timedependent problem, BFL has lower errors than GZS, by around 10%. GZS also became unstable for the largest Re number flows in curved pipes.
The typical goal of most computational modeling is to solve the desired system of equations to some problem-dependent accuracy in the minimum time. Considering the steady flow problem as a proxy, we can estimate which of the GZS and BFL boundary conditions will give the shortest simulation time. Taking an average increase in accuracy for GZS of ∼10% allows us to estimate that the GZS method can use a ∼ 5% lower resolution to obtain the same accuracy. Due to the diffusive scaling of the time step, the GZS method will require fewer site updates than BFL by a factor of ∼0.95 5 ≈ 0.79 or 21% less. However, the BFL method is more computationally efficient, by ∼30%, for the typical surface:bulk ratios studied here, so the equivalent-accuracy BFL simulation will complete sooner.
We therefore recommend BFL as the best all-round boundary condition of those tested, due to the good accuracy, performance, stability, and simplicity of implementation. The GZS method also offers good accuracy (in our modified form) but has worse stability at high Re number and much poorer performance. This last point may not matter in domains with a relatively low number of wall boundary sites. SBB is much less accurate, but may be acceptable for use in undemanding applications and when development time is at a premium. We believe that these results will prove helpful to the community when selecting methods for simulating hemodynamics and comparable applications with lattice-Boltzmann methods.