Neural Network Solutions to Differential Equations in Non-Convex Domains: Solving the Electric Field in the Slit-Well Microfluidic Device

The neural network method of solving differential equations is used to approximate the electric potential and corresponding electric field in the slit-well microfluidic device. The device's geometry is non-convex, making this a challenging problem to solve using the neural network method. To validate the method, the neural network solutions are compared to a reference solution obtained using the finite element method. Additional metrics are presented that measure how well the neural networks recover important physical invariants that are not explicitly enforced during training: spatial symmetries and conservation of electric flux. Finally, as an application-specific test of validity, neural network electric fields are incorporated into particle simulations. Conveniently, the same loss functional used to train the neural networks also seems to provide a reliable estimator of the networks' true errors, as measured by any of the metrics considered here. In all metrics, deep neural networks significantly outperform shallow neural networks, even when normalized by computational cost. Altogether, the results suggest that the neural network method can reliably produce solutions of acceptable accuracy for use in subsequent physical computations, such as particle simulations.


I. INTRODUCTION
Many important phenomena can be modelled effectively by partial differential equations (PDEs) with appropriate boundary conditions (BCs).When PDE problems are posed in domains with complicated geometries, they are often too difficult to be solved analytically, and must instead be approximated numerically.The standard tools for numerically solving PDE problems in complex geometries are mesh-based approaches, such as the finite element method (FEM) [1].In these methods, the problem domain is decomposed into a mesh of smaller subdomains, and the solution is approximated by a linear combination of simple, local functions.
In this work, we will explore a less common numerical solution method for PDE problems, which we will refer to as the neural network method (NNM) [2].In the NNM, the solution is directly approximated by a neural network (e.g., Fig 1), rather than by a linear combination of local basis functions.In a process called training, the network parameters are varied until it approximately satisfies the PDE and BCs.
The purpose of the present study is to investigate the effectiveness of the NNM on a problem exhibiting a complicated geometry.Specifically, the NNM is used to solve a model of the electric field in the slit-well microfluidic device, which is an application of active research interest [3][4][5][6].The problem domain is non-convex, and the electric field is discontinuous in the limit of sharp corners.Despite the growing popularity of the NNM, relatively few authors have validated it on problems with such illbehaved solutions.The rest of this introduction provides FIG. 1. Schematic of a fully-connected feedforward neural network of depth d and width w mapping coordinates (x, y) to an output ũ(x, y) .Each node computes a weighted sum of its incoming arrows, and the result (plus a bias) is passed to an activation function.In the NNM, the parameters are optimized to make ũ(x, y) approximately satisfy a target PDE and its BCs.
an overview of the NNM, including its previous use to study systems similar to the slit-well, as well as a review of the slit-well device itself.

A. The neural network method
The neural network method of solving differential equations was first published in 1994 by Dissanayake and Phan-Thien [2], and belongs to the broader family of techniques known as methods of weighted residuals [2,7].Around the same time, Meade Jr. and Fernandez [8] separately demonstrated a variant of the NNM that did not use iterative training, and instead solved a system of linear equations for the network weights; it was, however, designed for solving only ordinary differential equations.In 1995, van Milligen et al. [9] independently proposed a method quite similar to the original approach by Dissanayake and Phan-Thien [2], to solve second-order elliptic PDEs describing plasmas in tokamaks.In 1998, the NNM was proposed independently again by Lagaris et al. [10].Their modified methodology embedded the neural network within an ansatz that was manually constructed to exactly satisfy the boundary conditions; however, this form is challenging to construct when the boundary conditions or the domain geometry are complicated.Many authors have since contributed to the development of the NNM, and in 2015 Yadav et al. [11] published a book reviewing much of the work up to that time.
The NNM has various potential appeals over more common methods like FEM.For instance, the NNM is mesh-free, and generally produces uniformly accurate solutions throughout the PDE domain [11,12].Whereas earlier implementations used shallow neural networks (i.e.those having only one hidden layer), many authors have recently noted the significant benefits of using deep architectures [13][14][15][16][17][18][19][20][21][22][23][24][25].In particular, it appears that the NNM with deep neural networks performs remarkably well in high-dimensional problems [13,14,[16][17][18][20][21][22][23][24][25][26].Such high-dimensional PDEs are typically intractable using FEM and most traditional methods.These suffer from the so-called curse of dimensionality, in which computational cost grows exponentially with the number of dimensions.In addition to the above empirical demonstrations of the NNM, several theorems have been published stating that the computational cost of the NNM grows at most polynomially in the number of dimensions for various classes of PDEs [27][28][29].
Nonetheless, the theoretical grounding of the NNM is less thoroughly developed than those of other techniques.There are as of yet few guarantees regarding, e.g., under what conditions the NNM will converge to the true solution of a given PDE, at what rate, and to what precision.As such, confidence in the method still relies heavily on empirical demonstrations.However, available empirical demonstrations focus primarily on problems with relatively well-behaved solutions [14,15,17,18,[20][21][22][23][24][25]30].Indeed, Michoski et al. [31] noted this, and conducted an investigation of the NNM applied to irregular problems exhibiting shocks.The current work is analogous in this regard, but focuses instead on the non-convexity of the slit-well domain as the source of irregularity.

B. The slit-well microfluidic device
Micro-and nanofluidic devices (MNFDs) are small, synthetically fabricated systems with applications in molecular detection and manipulation [5,6,[32][33][34].One important use of MNFDs is to sort mixtures of molecules, including free-draining molecules such as DNA that cannot normally be separated electrophoretically in free solution [6].For instance, the slit-well device proposed by Han and Craighead [3] can be used for sorting polymers (such as DNA [3,4,35,36]) or nanoparticles [37,38].The device's periodic geometry, illustrated schematically in Fig. 2, consists of parallel channels (called wells) separated by shallower regions (called slits).An electric field is applied to drive molecules through the device.
MNFDs such as the slit-well exploit the complexity of physical phenomena at the single-molecular scale (often below the optical resolution limit) to produce useful and sometimes surprising behaviours.This, however, makes them challenging to design and optimize, and renders theoretical and computational investigations important to the development of MNFD technologies.For example, the sorting mechanism in the slit-well device depends nonlinearly on the magnitude of the applied electric field as well as the size and shape of the wells, the slits, and the molecules themselves [6,[35][36][37][38][39].For some choices of these parameters, the slit-well sorts molecular mixtures into increasing order of size; for others, however, it sorts them into decreasing order.A rich literature exists exploring these processes, reviewed in part by Dorfman [6] and Langecker et al. [39].

C. NNM with complicated geometries
There are relatively few demonstrations of the NNM on problems with complicated domain geometries.Specifically, the NNM has mostly been applied to problems posed in rectangular or circular domains [14, 17, 18, 20-22, 24, 25].Of note, Wei et al. [26] used the NNM to solve PDEs in nanobiophysics that also arise in MNFDs (i.e., Fokker-Planck for particles and polymers).However, their work did not consider these problems in MNFD geometries.Even among the demonstrations of the NNM in more complicated (e.g., non-convex) domain geometries, most problems feature boundary conditions that produce relatively smooth, well-behaved solutions [15,23,30].Sirignano and Spiliopoulos [16] solved a free-boundary problem based on a financial system, but it is not clear whether that PDE exhibits the specific kinds of challenging features considered in the current work.
An exception to the above is given by E et al. [13], who applied a variant of the NNM to a Poisson equation in a square domain with a re-entrant needle-like boundary.
This problem exhibits the same singular behaviour as the slit-well problem with sharp corners (see Sec. II A).Their Deep Ritz training protocol was based on a variational formulation of Poisson's equation.However, variational formulations cannot be obtained for all PDEs [40].For this reason, we have opted to study the more general NNM algorithm originally presented by Dissanayake and Phan-Thien [2].
When Anitescu et al. [41] revisited this needle problem using the original method of Dissanayake and Phan-Thien [2], they reported poorer convergence than obtained by E et al. [13] with the Deep Ritz method.A similar observation was made during the present work: re-entrant corners significantly impair the convergence of the standard NNM (Sec.II A).In contrast to the present work, the error analyses reported by E et al. [13] and Anitescu et al. [41] did not consider the physical realism of the NNM solutions (Sec.I D) nor the accuracy of the NNM solutions' gradients.These characteristics of the NNM are important for use in various applications, including studies of MNFDs, and are investigated directly in the present work.

D. Physical realism of NNM solutions
Various modifications of the NNM have been proposed to ensure solutions exactly satisfy problem-specific invariants that are known a priori, such as boundary conditions [12,15,30], non-negativity [42], Hamiltonian dynamics [43], or special invariants of the Schrödinger equation [44].However, manually creating formulations of the NNM that explicitly satisfy specific invariants can be difficult.Furthermore, this approach cannot account for invariants which may be unknown ahead of time.It is natural to question how well the NNM approximates invariant quantities when these are not explicitly enforced.
In fact, although certain numerical methods can be devised specifically to satisfy some conservation laws (e.g., finite volume methods conserve flux [45], symplectic ODE integrators conserve energy [46]), most numerical methods (including standard FEM formulations) do not satisfy physical invariances exactly.For instance, Zhang et al. [47] discussed what modifications of the FEM are necessary to render it flux-conserving.As part of the present work, we will investigate how well the NNM satisfies physical invariances of the slit-well problem in the absence of any problem-specific customization.

II. METHODOLOGY A. Problem statement
We use the simplest electrostatic model of the electric field E in the slit-well, namely the two-dimensional Laplace equation for the electric potential u. Figure 3 illustrates the geometry of our model over one periodic sub-unit of the slit-well device.Uniform Dirichlet boundary conditions were imposed on the coloured segments (specifically, u = ±1 on the right and left, respectively) to model an applied voltage across the system.The grey boundaries correspond to homogeneous Neumann (i.e., insulating) boundary conditions.Throughout the interior of the domain (i.e., the yellow area in Fig. 3), the potential was modelled by Laplace's equation.
In contrast with other authors, we have rounded the re-entrant corners at the interface of the slits and wells.It can be shown that near sharp (i.e.non-differentiable) re-entrant corners, solutions u to Laplace's equation are not continuously differentiable [48][49][50].That is, sharp re-entrant corners cause singularities in the electric field E. Because the magnitude of E near the corners diverges as the curvature goes to zero, the slit-well electric field is ill-conditioned, in the sense that small changes in the curvature of the corners produce large changes in E.
Although such ill-conditioning hinders the performance of most numerical methods, including FEM [48][49][50], they present a particular challenge for the NNM.The fully-connected feedforward neural networks typically used for the NNM are infinitely differentiable functions.However, the true solution to the slit-well problem with sharp corners exhibits a discontinuous electric field, so that significant errors seem likely near the corners.Furthermore, because the neural network is a global approximation method, local errors near the corners can affect performance throughout the domain.
In practice, the training methodology we present here (Sec.II B), when applied to the problem with sharp corners, failed to converge to even a reasonable approximation of the true solution.Even in preliminary tests with rounded corners, the convergence rate of the NNM was observed to deteriorate as the curvature of the corners was reduced.Therefore, for the current work, an in-termediate curvature (Fig. 3) was selected to produce a challenging but attainable benchmark for the NNM.

B. NNM implementation
In this section, we describe our implementation of the NNM.It is similar to those of Dissanayake and Phan-Thien [2], van Milligen et al. [9], Berg and Nyström [15], Sirignano and Spiliopoulos [16], Magill et al. [19], and Wei et al. [26], among others.The true solution u(x) to the PDE was directly approximated by a neural network ũ(x).This was accomplished by training the neural network to minimize the loss functional Here ∇ 2 u = 0 is the PDE required to hold in the interior of the domain Ω ⊂ R 2 , and B is a differential operator describing the boundary conditions Bu = 0 on the boundary ∂Ω of the domain (described in Section II A and illustrated in Fig. 3).Thus, L[ũ] quantifies the extent to which the neural network fails to satisfy the PDE and its boundary conditions.The parameters of the network were updated iteratively using the Adam optimizer, a modified gradient descent algorithm [51].The integrals in L[ũ] were approximated via the Monte Carlo method, as described in more detail below.The approximate electric field, Ẽ, and other required derivatives were obtained exactly via automatic differentiation.The weights of the network were initialized by the Glorot method [52].Computations were done using Tensorflow 1.13, and all hyperparameters not discussed here were set to their default values [53].
The Monte Carlo samples x i ∈ Ω used to estimate the first term of L[ũ] were selected from 100,000 points uniformly distributed in the bounding rectangle [−L x , L x ] × [−L y , L y ], by rejecting those lying outside the domain.Those used to estimate the second term were generated by directly sampling the boundary with a linear density of 40 points per unit length.Altogether, this yielded an expected batch size of roughly 62,000.To reduce the overhead of sampling training points, batches were reused for 1,000 parameter updates before resampling.
The testing loss was computed on a set of points sampled once at the beginning of training, generated using the same procedure as the training points.The testing loss was computed and recorded every 100 parameter updates.Early stopping was used to terminate training when the testing loss failed to improve after 100 consecutive tests.The final network was taken from the epoch at which the testing loss was smallest.This training procedure was conceived to ensure that networks converged to very stable local minima, in order to study the behaviour of the NNM in the limit of long training time.
The neural networks considered in this study were all fully-connected feedforward networks with tanh activation functions (Fig. 1), consisting of d hidden layers of equal width w.Specifically, the networks mapped an input vector x, corresponding to a point in the problem domain, to ũ given by ũ where Here, W 1 ∈ R w×2 , W i ∈ R w×w for i = 2 . . .d, and W d+1 ∈ R 1×w are the network's weight matrices, while b i ∈ R w for i = 1 . . .d, and b d+1 ∈ R are its biases.

C. FEM implementation
To provide a reliable ground truth against which to compare the performance of the NNM, the target PDE was also solved via the FEM using FEniCS [54].The domain and mesh were constructed using the mshr package.The resolution parameter for generate_mesh was set to 200 and the circular re-entrant corners were approximated linearly with 100 segments each.
In order to obtain an accurate approximation of the electric field, and not just of the electric potential, the FEM was applied to a standard dual-mixed formulation of Laplace's equation for the electric field and electric potential simultaneously [54].In this approach, ũ and Ẽ are approximated simultaneously using separate basis functions.Solving for ũ alone and reconstructing Ẽ by differentiation was found to yield poor results.
Convergence tests (not shown) confirmed that the FEM solution converged in proportion to the square of the mesh resolution.The tests suggest that the absolute error in the FEM solution relative to the true solution is on the order of machine precision (i.e, 10 −16 ).Note that the FEM solution was computed in double precision, whereas the NNM was computed in single precision.

III. RESULTS
At its core, the NNM is motivated by the rationale that training networks to minimize the loss functional (Eqn. 1) will cause those networks to approximate the correct solution.This section contains investigations into the following related questions: 1.If a network exhibits a small loss, how close is it to the true solution?Specifically, is the loss functional a reliable estimator of actual network performance?
2. If a network is close to the true solution, how well does it reproduce the physical characteristics of the true solution?Specifically: -To what extent does it exhibit the same spatial symmetries as the true solution?
-To what extent does it conserve electric flux?
3. If a network is close to the true solution, and the corresponding electric field is used to conduct particle simulations, how accurate are subsequent measurements made using those particle simulations?
4. How does architecture affect these conclusions?
All experiments were repeated across four random initializations and multiple network architectures: specifically, all combinations of depths d = 1, 2, 3, 4, 5, 6 and widths w = 10, 25, 50, 75, 100, 150, 200, 250 were examined, as well as networks of depth 1 and widths 500 and 1000.Figure 4(a) shows an example of an NNM solution obtained using a network of depth 5 and width 75.The approximate electric field Ẽ is superimposed in black lines over colored contours showing the approximate electric potential ũ.It is visually indistinguishable from the reference FEM solution (not shown).Figure 4(b) shows (ũ − u) 2 , the squared error of the NNM potential compared to the FEM potential.Figure 4(c) shows Ẽ − E 2 / E 2 , the pointwise relative error of the NNM electric field.Here, • 2 denotes the Euclidean norm.Note that the error in the potential cannot be normalized pointwise, as discussed in the next section.
Both of the error distributions in Fig. 4 are particularly pronounced near the re-entrant corners.The electric field intensity is also very large in these regions (see Fig. 12).In the limit of small curvature, in fact, it is at these corners that the electric field develops singularities (see Sec. II A).In fact, the peaks in error and electric field intensity both occur precisely where the boundary transitions from flat to curved, i.e., where the second derivative of the boundary curve is discontinuous.
Additionally, Fig. 4(c) shows pronounced relative error in the electric field near the corners at the bottom of the well.These peaks arise because the magnitude of the true electric field approaches zero in those corners (see Fig. 12).Since the denominator of Ẽ−E 2 / E 2 is very small, even small errors in the electric field near those corners manifest as large relative error.The maximum relative error in the domain Ω consistently occured in these bottom-most corners for all NNM solutions in the dataset.Nonetheless, for many applications, errors of this kind are likely to be less important than the errors occuring near the re-entrant corners, as they are much smaller in absolute magnitude.

A. Error relative to FEM
The purpose of this section is to investigate the errors of the NNM solutions relative to the reference FEM solution, and to what extent the loss functional correlates with these errors.The error in an approximate electric potential ũ will be characterized by Here, • Ω denotes the mean over the domain Ω.Whereas Fig. 4(b) shows the distribution of the squared error in ũ throughout the domain, δ u [ũ] corresponds to the root mean squared error of ũ over Ω, normalized by the root mean squared value of the true solution u.Note that one cannot define an unambiguous pointwise relative error for ũ, since the electric potential does not have a physically meaningful zero.The metric δ u [ũ] represents the magnitude of the error in ũ relative to the magnitude of the true solution u, when both of these are measured in the L 2 norm for functions.
For the electric field, conversely, a meaningful pointwise relative error can be defined as Ẽ − E 2 / E 2 , where both the numerator and the denominator vary throughout the domain.The average of this pointwise relative error is denoted and acts as a global error metric for Ẽ.This is precisely the mean of error distributions like the one shown in Fig. 4(c).
Figure 5 shows the global error metrics δ u [ũ] and δ E [ Ẽ] for all networks in the dataset, plotted against each network's testing loss.The integrals required to compute the error metrics were approximated via the Monte Carlo method, by sampling the domain interior using the same procedure described in Section.II B. Marker color corresponds to network depth, and marker size corresponds to network width.
It is clear in Fig. 5 that lower testing losses correlate strongly with lower values of both δ u [ũ] and δ E [ Ẽ].This result confirms the basic motivation underlying the NNM, namely that training neural networks to minimize the loss functional will cause them to approximate the correct solution.It also suggests that, in the absence of theoretical guarantees on the convergence of the NNM, the testing loss may provide a practical proxy for estimating a solution's true accuracy.
The data in both Figs.

B. Physically motivated error metrics
The results in the previous section suggest that the NNM can reliably produce accurate solutions to the slitwell problem.Furthermore, networks with smaller loss values are closer to the true solution, i.e., they have smaller error values.Finally, the NNM does not appear overly sensitive to the choice of architecture, given at least two hidden layers and sufficient network width.
The purpose of this section is to investigate whether networks with small loss and error values also approximately reproduce physical characteristics of the true solution.Specifically, we investigate the NNM solutions' satisfaction of spatial symmetries and the conservation of electric flux.

Deviation from symmetry
The true solution of the target PDE satisfies three spatial symmetries.First, the true electric potential u is anti-symmetric in the horizontal direction about the centre of the well, i.e., where (x, y) are the coordinates of a point about the center of the well.As a result, the vertical component of the true electric field E also exhibits this anti-symmetry in x, i.e., Finally, the horizontal component of the electric field is symmetric about the centre of the domain, i.e., The extent to which a network deviates from these symmetries will be quantified using relative error metrics analogous to those used in the previous section.Specifically, the deviation of an approximate electric potential ũ from symmetry will be quantified by where ũ (x, y) = −ũ(−x, y).This is the root mean squared difference between ũ and its negative reflection, normalized by the root mean squared value of the true potential u.In analogy with δ u [ũ], the metric R u [ũ] measures the magnitude of the deviation of ũ from symmetry relative to the magnitude of the true solution u (when both are measured in the L 2 norm).The deviation of an approximate electric field Ẽ from symmetry will be quantified by where Ẽ is the transformed electric field In analogy with δ E [ Ẽ], this is the mean pointwise relative deviation from symmetry of the electric field.These metrics of deviation from symmetry are closely connected to the relative error metrics of Sec.III A. Specifically, the triangle inequality implies that By definition, the true solution u is invariant under the transformation that maps ũ to ũ .Specifically, By the symmetry of the domain, it follows that Combining these results and dividing by u 2 Ω , it follows that that is, the distance from an approximate potential ũ to its reflection ũ is, at most, twice the distance from ũ to the true solution u.Very similar reasoning can be applied to an approximate electric field Ẽ to conclude that Thus, solutions with small error values will inevitably be nearly symmetric, simply by virtue of being nearly equal to a symmetric function.Furthermore, since it was established in Sec.III A that the loss functional provides a reliable estimator of the error, it follows that the loss also provides a reliable estimator of the deviation from symmetry.It remains to be seen, however, whether or not inequalities 18 and 19 are strict in practice.That is, do neural networks learn that symmetry is a desirable feature, or are they only symmetric insofar as they approximate the true solution?
Figure 6 shows for all networks in the dataset, plotted against each network's testing loss.As in Fig. 5, the marker sizes correspond to network widths, and the colours indicate network depth.
The dotted lines correspond to the maximum deviation from symmetry permitted for a given error value, according to inequalities 18 and 19.
Most of the data in Fig. 6(a) lie nearly on the dotted line: roughly 90% lie above 1.5, and 75% lie above 1.9.This indicates that most of the electric potentials approximated via the NNM satisfy the target symmetries only to the smallest degree required by virtue of their proximity to the true solution.The data in Fig. 6(b), however, lie somewhat farther from the dotted line.Quite a few of the most symmetric electric field approximations have R E [ Ẽ]/δ E [ Ẽ] ratios below 1, indicating that they are more similar to their own reflections than they are to the true solution.It is important to note, however, that the electric field metrics of error and symmetry are normalized pointwise by the electric field intensity, whereas the electric potential metrics are not normalized pointwise.This distinction may account for some of the apparent differences between Figs. 6

(a) and (b).
Altogether, the results in this section indicate that the NNM solutions deviate from the symmetries of the true solution by an amount comparable to their error values.Some networks may produce electric field solutions that are more symmetric than required given their error values alone, but most networks only exhibit the minimal degree of symmetry required by the triangle inequality.As discussed in the introduction, directly constraining the networks to satisfy the symmetries (e.g., by modifying the network architectures, or by adding additional terms to the loss functional) would almost certainly improve the symmetry of the resulting approximations.However, implementing such constraints can be expensive for more complicated invariants, and some problems may exhibit invariants that are unknown a priori.These results illustrate that the NNM can still learn to satisfy invariants approximately, even when they are not explicitly enforced.Furthermore, the loss functional may provide a means of empirically estimating the extent to which such invariants are satisfied in practice.

Conservation of flux
Another important physical property of the true solution to the target PDE is the conservation of electric flux.In its strong form, conservation states that the true electric field E must be divergence-free at all points in the domain.This is equivalent to the condition that the true electric potential u must satisfy Laplace's equation, ∇ 2 u = 0, since it can be rewritten as Thus, one could quantify the deviation from conservation of flux of an approximate field Ẽ by computing some error norm of ∇• Ẽ.However, since all the derivatives taken in the NNM are exact (obtained via automatic differentiation), ∇ • Ẽ is exactly equal to ∇ 2 ũ.As a result, the first term of the loss functional (Eqn. 1) is precisely a measure of how well the NNM satisfies the strong form of the conservation of flux.Nonetheless, the strong form of conservation is insufficient to fully describe the extent to which the electric field conserves flux over extended regions of space within the domain.This is better described using the weak form, which states that the surface integral of the flux into any closed subset of the domain must be zero.Motivated by this, we define the quantity Here B(x; ) is a ball of radius centered at a point x in the domain, ∂B(x; ) denotes its boundary, and Ẽn denotes the outward normal component of the electric field into its surface.The outer integral is taken over Ω , by which we denote the set of all points in the domain that are at least a distance from the boundary.The factors |Ω | and |B | are the areas of Ω and B(x; ), respectively.In other words, E(ũ; ) is the mean square norm of the flux into all balls of radius that are entirely contained within Ω, divided by the area of those balls.Because this definition of E(ũ; ) is mesh-agnostic, it can also be computed directly for a FEM solution.Numerical calculations of E(ũ; ) and related metrics in this section are somewhat technical, and details are relegated to App.B. Figure 7 shows E(ũ; ) computed for a sample of NNM solutions (coloured lines) as well as for the reference FEM solution (black line).The architectures, losses, and relative errors of the three networks shown in Fig. 7  sured on several other NNM solutions (not included).
In particular, E(ũ; ) was consistently observed to decrease monotonically with increasing .In Fig. 7, the network with architecture (d, w) = (2, 25) achieved relatively mediocre performance.The (1, 200) network performed fairly poorly overall, but was still among the best performing shallow networks in the dataset.As expected, the best of the three networks according to testing loss and the relative error metrics, (4, 150), also performed best in terms of conservation of flux.Similarly, (2,25) outperformed (1,200).We emphasize that the (2, 25) network outperforms the (1,200) network in all metrics, despite having slightly smaller capacity.This is reflective of the disproportionately poor performance of shallow architectures noted in Secs.III A and III D.
The behaviour of E(ũ; ) for the FEM solution differs from that of the NNM solutions in some important ways.Whereas, for all three NNM solutions, E(ũ; ) is roughly constant below ≈ 10 −1 , for the FEM solution E(ũ; ) continues to increase with decreasing until at least ≈ 10 −4 .As a result, although the FEM solution achieves better E(ũ; ) than all NNM solutions at long length scales, the converse is true at sufficiently small length scales.The best NNM solution in Fig. 7, (4, 150), exhibits comparable conservation of flux to the FEM solution at length scales near the mean FEM mesh size L mesh = |Ω|/N , where N is the number of mesh elements.At length scales below L mesh , the (4, 150) network conserves flux more accurately than the FEM solution.Even the worst of the three NNM solutions shown in Fig. 7 performs comparably to the FEM solution in conservation of flux at length scales below ≈ 10 −3 .The relative stability of the NNM at small length scales may be attributable to its mesh-free nature, and is an appealing feature for subsequent use in particle simulations.Finally, we recall (see Sec. II C) that the FEM solution was computed in double precision, and suggest that the single precision used for the NNM solutions may be a limiting factor to their performance at large length scales.
For small choices of , E(ũ; ) converges to a measure of the strong form of conservation of flux.By the divergence theorem, for a continuously differentiable field Ẽ, the flux error metric E(ũ; ) can be rewritten as In the remainder of this section, angle brackets • S will be used to denote means over any set S. From Eqn. 22, it is easy to deduce the limit of E(ũ; ) as → 0, which will be denoted E(ũ; 0).Since Ω → Ω and the mean over B(x; ) approaches the identity operator, it follows that The leftmost points in Fig. 7 illustrate E(ũ; 0) for each of the solutions.For the NNM solutions, E(ũ; ) converges to E(ũ; 0) as → 0, as expected.This is not the case for the FEM solution, for which E(ũ; ) exceeds E(ũ; 0) for small .However, this is not a contradiction, as Eqn.24 was derived by assuming continuous differentiability.Equation 24is precisely the mean of the square deviation of ũ from the strong form of conservation of flux.For NNM solutions, E(ũ; 0) is equal to the first term of the loss functional (Eqn. 1) divided by |Ω|, and is therefore bounded above by the loss.Given that E(ũ; ) was observed to decrease monotonically with , this suggests that, as for the relative errors and symmetry errors, the loss provides a useful estimator of the error in conservation of flux over any length scale.
However, as increases, the metric E(ũ; ) becomes increasingly biased, because the center of the balls B(x; ) cannot be placed within a distance of the boundaries of the domain.At moderate values of , this means that errors in flux conservation in the interior of the domain are weighted more heavily than those near the boundaries of the domain.Eventually, when > 0.6, the balls are too large to fit inside the slits of the device, so that only errors inside the well contribute to E(ũ; ).For this reason, the data in Fig. 7 are only computed for values sufficiently below 0.6 that this bias is deemed acceptably small.This biased behaviour of E(ũ; ) arises because the inner integral in Eqn.22 is based on circle-shaped test sets.A more meaningful metric of flux conservation over very long length scales can be obtained by replacing B(x; ) with ∂Ω in Eqn.22.This global flux error will be denoted E(ũ; ∂Ω), and satisfies Thus, E(ũ; ∂Ω) is directly connected to Ẽn ∂Ω , the net flux through ∂Ω, which is zero for the true solution.Note that the second equality in Eqn. 25 follows from the divergence theorem, so it applies to the NNM solutions but not the FEM solution.Together with the second equality of Eqn. 24, this means which is the variance of ∇ 2 u over Ω.This is always nonnegative, so it follows that for any ũ satisfying the second inequalities in both Eqns.24 and 25.
The rightmost points in Fig. 7 illustrate E(ũ; ∂Ω) for each of the four solutions.Figure 8 shows E(ũ; ∂Ω) for all NNM solutions versus each network's testing loss; the dotted line indicates the value for the FEM solution.It is immediately evident that E(ũ; ∂Ω) relates to testing loss in a similar way as do the relative error metrics (Fig. 5).As was the case for the other metrics, E(ũ; ∂Ω) decreases with decreasing testing loss, suggesting that testing loss is a useful estimator of global flux error.Indeed, this is inevitable in the limit of small loss, since E(ũ; ∂Ω) is bounded above by E(ũ; 0), which is in turn bounded above by the loss.It also appears that the data in Fig. 8 are divided into the same two clusters as the data in Fig. 5, with the shallow architectures performing worse than nearly all deep architectures.Somewhat surprisingly, the best of the NNM solutions appear to conserve flux globally to nearly the same degree as the reference FEM solution, despite being computed in single (rather than double) precision.Indeed, one network with architecture (4, 200) appears to slightly outperform the FEM solution in this respect.However, it is important to note that E(ũ; ) for this (4, 200) network (not shown) exhibits essentially the same behaviour as that of the (4, 150) network analysed in Fig. 7.In other words, although that particular network performs very well at global flux conservation, FEM does a significantly better job at conserving flux over intermediate length scales.This suggests that, for the NNM solutions, the error in conservation of flux is heterogeneously distributed throughout the domain, which is consistent with the previous observation that error in the NNM solutions is significantly larger near the re-entrant corners.
In summary, the metric E(ũ; ) provides a meshagnostic measure of how well an NNM solution conserves flux over a length scale .As → 0, the limit satisfies Eqn. 24, and is bounded above by the loss.Empirically, E(ũ; ) is observed to decrease monotonically with , so that the loss provides a useful estimator of the error in flux conservation over intermediate length scales, too.Alas, when is large relative to other length scales in the domain, E(ũ; ) is a biased metric, as it places less weight on flux lost near the boundaries of the domain.However, a related measure of global conservation of flux over the entire domain is given by Eqn. 25, which is not biased.This measure, too, is bounded above by the loss.Altogether, the NNM seems capable of reliably producing solutions that conserve flux to an acceptable level of accuracy without the need to explicitly enforce this physical invariant during training.In particular, some of the NNM solutions conserve flux globally roughly as well as the FEM solution.Furthermore, even relatively mediocre NNM solutions conserve flux better than the FEM solution over sufficiently small length scales.

C. Application to particle simulations
Section III A looked directly at error between NNM and FEM, and Sec.III B looked at error metrics motivated by physical invariants.Both suggested that the testing loss provides a reliable estimator of the true performance of the network solutions, and that (with appropriate network architectures) the NNM consistently finds solutions with seemingly small error values.However, the question of what error values are acceptable is subjective, and often depends on the intended application of the numerical solutions.For this reason, this section will consider the performance of the NNM solutions when used as the driving force fields in particle simulations of Brownian motion in the slit-well device (implemented in C).The simulation scenario is quite similar to those investigated by Cheng et al. [37] and Wang et al. [38].
Simulations of N = 100, 000 particles in the slit-well domain were initialized with all the particles located in the middle of the same well.The particle positions x i evolved according to the discretized Brownian equation, where the timestep was set to ∆t = 10 −4 , the diffusion coefficient to D = 1, and the friction coefficient to γ = 1.
The particle charge, q, was varied from 1 to 10.The term R(t) is a random driving force, representing thermal motion of an implicit solvent, and was sampled via the Box-Muller transform from an independent standard Gaussian distribution for each particle at each timestep.The driving electric field, Ẽ, was obtained from either the reference FEM solution or from one of the NNM solutions.The electric fields were discretized onto a uniform square mesh overlain on [−L x , L x ] × [−L y , L y ], the smallest bounding box containing Ω (see Sec. II B).The sidelengths of the mesh elements were set to 0.01.The field experienced by a particle at a given position was approximated by nearest-neighbour interpolation to the mesh.We leave more sophisticated coupling between the particle simulations and the electric fields to future work.
Particles experienced periodic boundary conditions across the left and right sides of the periodic sub-unit illustrated in Fig. 3, and the boundaries that were insulating in the electric field problem were treated as reflective in the particle simulations.The number of times each particle crossed the domain was tracked, so as to measure its absolute displacement from the original position.After t max = 10 6 timesteps, the mean horizontal displacement of the particles from the initial position, x , was divided by t max to obtain an estimate v x of the average particle velocity.This average velocity was then divided by particle charge to estimate the effective particle mobility, µ = v x /q.The statistical error on this mobility measurement was estimated as s = (σ vx /q)/ √ N , where σ vx is the standard deviation of the particle velocities.
These mobility measurements are shown in Fig. 9(a) for simulations conducted with the same four electric fields investigated in Sec.III B 2: that of the reference FEM solution, and that of the three NNM solutions summarized in Table I.The simulations using the FEM field were conducted twice with different random seeds, shown as the two black lines in Fig. 9(a).The difference between these two sets of measurements provides a means of distinguishing the errors introduced by the electric fields from simple statistical fluctuations on the mobility measurements.In Fig. 9(a), the measurements of µ made using the networks of architectures (2,25) and (4,150) appear fairly similar to those made using the FEM field.Conversely, the measurements using the (1, 200) architecture are quite easily distinguished from the FEM data.All simulations recovered effective mobilities that varied with q, induced in the otherwise free-draining particles by their interactions with the slit-well geometry.
The relative error between two mobility measurements µ 1 and µ 2 was quantified as The coloured lines in Fig. 9 estimated via standard rules for propagation of error.Unsurprisingly, the errors of the (1, 200) architecture are significantly larger than those of the other two architectures, and show a clear bias towards underestimating the mobility.Nonetheless, even this crude solution produces errors smaller in magnitude than 5% of the actual mobility.This suggests that the current particle simulations are relatively insensitive to moderate inaccuracies in the driving electric field.
The relative errors of both the (2, 25) and (4, 150) architectures are comparable to the relative errors between the two sets of FEM-based measurements, and lie below 1% for all values of q.However, the relative errors for the (2,25) architecture are negative for all q above 2, whereas the relative errors of the (4, 150) architecture are roughly evenly distributed about 0. This suggests that the (2,25) architecture introduces a small but detectable systematic bias into the mobility measurements.Conversely, the errors of the better (4, 150) architecture are comparable to statistical fluctuations, despite the relatively large number of simulated particles, N = 100, 000.These results confirm that the best of the NNM solutions presented in the current work are sufficiently accurate for use in particle simulation applications.Moreover, the relative performance of the three architectures is consistent with their values of L[ũ], δ u [ũ], and δ E [ Ẽ] (Tbl.I).
In Fig. 9, the network with architecture (2, 25) significantly outperforms that with architecture (1,200), despite having slightly smaller capacity, re-emphasizing the advantages of deep architectures over shallow ones.Conversely, the much larger (4, 150) architecture only achieves moderate improvements over the (2,25) architecture, reflecting the diminishing returns associated with increasing network capacity.These subtle impacts of architecture are investigated more closely in Sec.III D.

D. Effect of network architecture
The previous sections have demonstrated that the testing loss is a useful estimator of several independent error metrics.Specifically, the loss functional appears to reliably estimate the error relative to the reference FEM solution; the deviation from symmetry; the deviation from conservation of flux; and the error introduced into subsequent mobility measurements.Thus, the loss is a useful single metric of performance via which to compare different NNM architectures.
In Fig. 10, the testing loss is plotted against the total network capacity.Here, network capacity is measured as the total number of parameters in the network, given in terms of the width w and depth d by (2 + 1)w + (d − 1)(w + 1)w + (w + 1), (30) since the networks have two inputs and one output.The coloured lines in Fig. 10 correspond to different network depths, so that the various capacities within each line identify the network widths.The error bars show maxima and minima over all random seeds, whereas the lines indicate mean performance.The data in Fig. 10 show that, for network capacities below 5 • 10 3 , increasing capacity improves testing loss for any choice of depth.This suggests that, for those networks, insufficient capacity is a primary bottleneck towards representing more accurate approximations of the true solution.In particular, for the networks with two hidden layers, increasing the capacity improves the loss by nearly two orders of magnitude.Furthermore, in this low-capacity regime, increasing depth improves performance for a given capacity.In other words, when insufficient network capacity is the primary barrier to improved performance, deeper networks make more efficient use of that limited resource.Indeed, this is consistent with the effects of architecture observed in Figs. 5, 7, 8, and 9. Specifically, shallow networks perform particularly poorly in all metrics throughout the present work, even compared to networks with comparable capacity and as few as two hidden layers.
For deep networks with moderately large capacities (5•10 3 to 5•10 4 ), testing loss is essentially independent of network architecture (i.e., independent of both depth and capacity/width).This suggests that insufficient network capacity is no longer a primary bottleneck to improving solution accuracy.The investigation by Magill et al. [19] suggested that the internal representations learned by networks in the NNM become essentially independent of width above some critical size, so it is not surprising that loss similarly becomes independent of width.However, it is noteworthy that this limiting loss value is also independent of network depth (among those with two or more hidden layers).
For networks with capacities of 5 • 10 4 or above, testing loss begins to increase with further increases in capacity. Figure 5 illustrates that these same networks sometimes exhibit relative errors nearly as high as some shallow networks, despite having two orders of magnitude more capacity.Their poor performance can be understood in terms of the difficulties commonly encountered in training very deep, wide neural networks.For instance, Berg and Nyström [15] noted similar loss in performance when training networks with five or more hidden layers, and attributed this to vanishing gradients.Refinements in the network architectures and training algorithms can be expected to alleviate this phenomenon.
Note that the behaviour of these networks with very large capacities cannot be described in terms of overfitting, another problem commonly encountered by net- works with excessively large capacities.Overfitting is typically defined as a significant gap between the training and testing losses of networks.In the NNM, however, the testing and training sets are drawn from identical distributions.In the implementation used here, in particular, the training set is redrawn regularly throughout training, so that it is fundamentally impossible for the network to be overfitting to a specific set of training samples.
Finally, Fig. 11 shows the total training time of the NNM solutions against testing loss.The same two populations identified in Figs. 5 and 8 are evident again in Fig. 11.The cluster on the right contains all the shallow networks as well as the narrowest of the deep ones.The cluster on the left consists of those networks that attained better than 1% error relative to FEM (Fig. 5).Within each cluster, testing loss and training time are loosely correlated.For all networks, training time was on the order of hours.However, it is important to note that the implementation in the present work was not concerned with optimizing the computational efficiency of the NNM, but rather with ensuring that the training process was thoroughly converged (Sec.II B).
Once again, the networks in the right cluster perform disproportionately poorly, even though many of them have capacities comparable to some of those in the left cluter (Fig. 10).Thus, not only do the networks in the left cluster achieve better accuracies (as measured by testing loss or any of the various error metrics in this paper), but they also finish training far more rapidly.Further, this conclusion is true even between networks of equal capacity.These observations demonstrate many benefits of using deeper architectures in the NNM, and several disadvantages of using shallow architectures.

IV. CONCLUSIONS
This work investigated the performance of the neural network method (NNM) when used to solve the electric potential and field in the slit-well device.This problem features a non-convex geometry, which makes it particularly challenging to solve with the NNM.Performance was quantified in multiple metrics, and compared against a reference FEM solution.
The best network architectures studied here reliably achieved relative errors below 0.1% in both the potential and the field.NNM solutions also recovered spatial symmetries of the true solution to roughly the same extent that they approximated the true solution.Regarding conservation of flux, the NNM solutions performed comparably to the reference FEM solution.Finally, particle simulations conducted using the NNM electric fields yielded mobility measurements consistent with those based on the FEM electric field.In each of these metrics, the testing loss was found to provide a useful estimator of the networks' true performance.That is, networks with smaller losses were found to be closer to the true solution; to more closely approximate the target symmetries; to conserve flux more accurately; and to produce better particle simulations.
These empirical investigations uncovered several valuable insights for practical use of the NNM.Accurate solutions to physical problems can be obtained even without explicitly enforcing known physical invariants of the true problem.The importance of architecture was re-emphasized: deep architectures consistently outperformed shallow ones, converging to better solutions in less time and using fewer degrees of freedom.Finally, the testing loss may provide a practical means of gauging a solution's accuracy, even when the ground truth is unknown and convergence is not theoretically guaranteed.
In summary, this work demonstrates that the NNM can successfully solve a problem that is ill-conditioned due to the non-convexity of its domain.The NNM solutions were found to be particularly appropriate for use in subsequent particle simulations.This suggests that it could be a useful tool for the study of micro-and nanofluidic devices (MNFDs) and other biophysical systems.Moreover, differential equations in domains with complicated geometries arise throughout physics and other fields.These results support the feasibility of using the NNM to solve this fundamental and ubiquitous class of problems.
Appendix A: Additional plots of the electric field solution Figure 12 shows the FEM electric field intensity throughout the domain, in both linear and logarithmic colour scales.In particular, Fig. 12 illustrates that the peak field intensity occurs near the re-entrant corners, with a magnitude of about 0.36.In the bottom corners of the well, the field intensity is over four orders of magnitude weaker.These features contribute to the difficulty of applying the NNM to the slit-well electric field problem, since the standard loss functional used during training places equal weight on all regions of Ω and ∂Ω.The regions of very intense electric field near the re-entrant corners, specifically, seem to be most difficult to resolve for the NNM, as seen in the error maps shown in Fig. 4.

Appendix B: Details of flux loss calculations
This appendix contains descriptions of how the metrics shown in Figs.7 and 8 were computed.For Fig. 7, the integrals in Eqn.21 were computed by sampling 10000 uniformly spaced points on ∂B(x; ) for each choice of the center x.Candidate samples for the centers were generated according to the same procedure described in Section II B, but with ten times higher sample density, and all points within a distance of ∂Ω were rejected.
The leftmost points in Fig. 7 correspond to Eqn. 24.For the NNM solutions, these were computed by Monte Carlo integration over Ω using ten times higher sampling density than in Sec.II B. The rightmost points in Fig. 7 correspond to Eqn. 25.These were not computed using a Monte Carlo integration approach.Because Ẽn ∂Ω is a small number computed by summing many positive and negative terms, it is vulnerable to catastrophic cancellation.For this reason, it was computed using a uniform mesh of points along ∂Ω, sampled with 100 times higher density than in Sec.II B. For the FEM solution, the integrals required for Eqns.24 and 25 were both computed in FEniCS using Gaussian quadrature via the assemble command.

FIG. 2 .
FIG. 2. A schematic of particles being electrically driven through the slit-well device.

2 FIG. 3 .
FIG. 3. A cross-sectional view of the slit-well device illustrating our PDE model of the electric potential in one periodic sub-unit of the device.The re-entrant corners follow circular arcs, and the numbers indicate the lengths of each dotted line.The solution satisfies Laplace's equation in the yellow region, Dirichlet conditions on the red and blue boundaries, and homogeneous Neumann conditions on the gray boundaries.

FIG. 4 .
FIG. 4. Example NNM solution using 5 hidden layers of width 75.(a) Approximate electric potential (colored contours) and electric field (solid black lines).(b) Squared error of the electric potential.(c) Pointwise relative error in the electric field (note the logarithmic color scale).The error in plots (b) and (c) are interpolated from values evaluated on the FEM mesh points.

FIG. 5 .
FIG. 5. Global error metrics for the NNM solutions relative to the reference FEM solution, shown against testing loss for a variety of network architectures.(a) The relative error of the electric potentials, δu[ũ].(b) The relative error of the electric fields, δ E [ Ẽ]. Marker colour indicates the depth of the network, and marker area indicates its width.

FIG. 6 .
FIG. 6.Relative deviation of symmetry for the NNM solutions normalized by relative error, shown against testing loss.(a) Deviation of symmetry of the NNM electric potentials, Ru[ũ], divided by the relative error, δu[ũ].(b) Deviation of symmetry of the NNM electric fields, R E [ Ẽ], divided by the relative error, δ E [ Ẽ]. Marker colour indicates the depth of the network, and marker area indicates its width.The dotted lines show the upper bounds given by Equations 18 and 19.

FIG. 7 .
FIG. 7. The flux error metric E(ũ; ) plotted as a function of the ball radius for three NNM solutions as well as the reference FEM solution.The legend entries for the NNM solutions indicate the architecture (d, w) for each case.The leftmost points show E(ũ; 0), and the rightmost show E(ũ; ∂Ω).The dotted vertical line labeled L mesh indicates the mean length scale of the FEM mesh.

FIG. 8 .
FIG. 8. Error in global flux conservation for all NNM solutions as a function of each network's testing loss.Marker colour indicates the depth of the network, and marker area indicates its width.The dotted line indicates the corresponding error in the FEM solution.
FIG. 9. (a) Lines show the mobility measurements µ made using four different electric field solutions.The two black lines correspond to separate simulations made using the same reference FEM field.The error bars indicate the estimated statistical error of mobility, s.(b) The coloured lines show the relative errors between the NNM-based measurements and the first set of FEM-based results.The black line shows the relative errors between the two sets of FEM-based measurements.The error bars are obtained from the data in (a) via standard rules for propagation of uncertainty.

FIG. 10 .
FIG. 10.Testing loss versus network capacity, coloured by network depths.The error bars show maxima and minima over four random seeds, and the lines indicate mean performance.The dotted lines at capacities of 5 • 10 3 and 5 • 10 4 roughly delineate the three regimes discussed in the text.

FIG. 11 .
FIG. 11.Total training time and final testing loss of the NNM solutions.Marker colour indicates the depth of the network, and marker area indicates its width.
The upper-right clusters consist of those networks achieving relative errors worse than 1% in both δ u [ũ] and δ E [ Ẽ].This population contains all of the shallow network architectures, suggesting that at least two hidden layers are required to achieve good performance on this problem.Furthermore, as discussed in Sec.III D, shallow networks always underperform relative to deep networks, even when normalized by capacity.The narrowest of the deep network architectures also attain relative errors worse than 1%.This implies that even with two hidden layers, networks require some minimum capacity (i.e., memory consumption) in order to achieve good performance on this problem.The lower-left clusters in Figs.5(a) and (b) contain the majority of the dataset, and consist of those networks attaining relative errors below 1% in both δ u [ũ] and δ E [ Ẽ].The best networks achieved relative errors as low as δ u [ũ] ≈ 0.01% and δ E [ Ẽ] ≈ 0.1%.For reference, the example solution shown in Fig.4corresponds to a testing loss of L[ũ] ≈ 9 • 10 −6 , and error values of δ u [ũ] ≈ 0.2% and δ E [ Ẽ] ≈ 0.08%.A variety of architecture choices (i.e., depths and widths) produce comparably good performance, suggesting that the NNM can produce accurate solutions without the need for careful architecture tuning.This is explored further in Sec.III D.
are listed in Table.I.The shape of E(ũ; ) measured for the NNM solutions in Fig.7is representative of what was mea-