Learned discretizations for passive scalar advection in a 2-D turbulent flow

The computational cost of fluid simulations increases rapidly with grid resolution. This has given a hard limit on the ability of simulations to accurately resolve small scale features of complex flows. Here we use a machine learning approach to learn a numerical discretization that retains high accuracy even when the solution is under-resolved with classical methods. We apply this approach to passive scalar advection in a two-dimensional turbulent flow. The method maintains the same accuracy as traditional high-order flux-limited advection solvers, while using 4x lower grid resolution in each dimension. The machine learning component is tightly integrated with traditional finite-volume schemes and can be trained via an end-to-end differentiable programming framework. The solver can achieve near-peak hardware utilization on CPUs and accelerators via convolutional filters. Code is available at https://github.com/google-research/data-driven-pdes.

The computational cost of fluid simulations increases rapidly with grid resolution. This has given a hard limit on the ability of simulations to accurately resolve small scale features of complex flows. Here we use a machine learning approach to learn a numerical discretization that retains high accuracy even when the solution is under-resolved with classical methods. We apply this approach to passive scalar advection in a two-dimensional turbulent flow. The method maintains the same accuracy as traditional high-order flux-limited advection solvers, while using 4× lower grid resolution in each dimension. The machine learning component is tightly integrated with traditional finite-volume schemes and can be trained via an end-to-end differentiable programming framework. The solver can achieve near-peak hardware utilization on CPUs and accelerators via convolutional filters. Code is available at https://github.com/google-research/data-driven-pdes.

I. INTRODUCTION
A key problem in the numerical simulation of complex phenomena is the need to accurately resolve spatiotemporal features over a wide range of length scales. For example, the computational requirement for simulating a high Reynolds number fluid flow scales like Re 3 , implying that a tenfold increase in Reynolds number requires a thousand fold increase in computing power. Over the past decades, the extra computing power made available through Moore's law has been used to increase grid resolution dramatically, leading to breakthroughs in turbulence modeling [1], weather prediction [2], and climate projection [3]. Nonetheless, there is still a formidable gap towards resolving the finest spatial scales of interest [4], especially with the recent slow-down of Moore's Law [5,6]. Machine learning has given a potential way out of this conundrum, by training low-resolution models to learn the rules from their high-resolution counterparts [7][8][9][10]. The learned models aim to produce high-fidelity simulations using much less computational resources. Incorporating machine learning into numerical models also facilitates the adoption of emerging hardware, considering that the fastest growth in computing power now relies on domain-specific architectures like Graphical Processing Units (GPUs) [11] and Tensor Processing Units (TPUs) [12,13] that are optimized for machine learning tasks.
Recently we introduced data driven discretizations [14] to learn numerical methods that achieve the same accuracy as traditional finite difference methods but with much coarser grid resolution. These methods are equation specific, and require training a coarse resolution solver with high resolution ground truth simulations. Since the dynamics of a partial differential equation is entirely local, the high resolution simulations can be carried out on a small domain. We demonstrated the method with a set of canonical one-dimensional equations, demonstrating a 4 ∼ 8× upscaling of effective resolution [14]. Here we extend this methodology to twodimensional advection of passive scalars in a turbulent flow, a canonical problem in physics [15] and a classic challenge in atmospheric modeling [16]. We show that machine-learned advection solver can use a grid with 4× coarser resolution than classic high-order solvers while still maintaining the same accuracy.

A. Advection equation
We consider the advection of a scalar concentration field C( x, t) under a specified velocity field u( x, t): If the prescribed velocity field is divergence-free then, Eq. (1) reduces [17] ∂C ∂t + u · ∇C = 0.
A classical Eulerian scheme uses discretizations of the spatial derivative ∂C ∂x , often in a form of: where {x 1 , .., x N } is the spatial grid points, C j is the concentration at point x j , and {α −k , ..., α k } are predefined finite-difference coefficients. For example, a first-order forward difference Ci+1−Ci ∆x (where C i+1 is in the upwinding direction) leads to the upwind scheme. Sophisticated high-order methods with flux limiters will choose different coefficients depending on local fields [18]. Extension to two-dimensions can be done by either operator splitting (solve for each dimension separately) [19] or a true two-dimensional discretization [20].
Although high-order Eulerian schemes are highly accurate under idealized flows [21], their accuracy breaks down to first-order under turbulent or strongly sheared flows, resulting in significant numerical diffusion [16]. Adaptive mesh refinement can reduce such numerical diffusion [22], but increases software complexity. Lagrangian methods avoid numerical diffusion [23], but have inhomogeneous spatial coverage and also difficulties in dealing with nonlinear chemical reaction [24]. Semi-Lagrangian approaches involve remapping from a distorted Lagrangian mesh to a regular Eulerian mesh [25], and such remapping step exhibits similar numerical diffusion as Eulerian methods. Flow-map approaches [26] can achieve Lagrangian-like accuracy on a Eulerian mesh, but need to solve for the advection trajectory over multiple steps and requires a special treatment to incorporate additional terms (e.g. chemical reaction) between advection steps. Different from existing methods, here we aim to develop an ultra-accurate advection solver under the requirements of: (1) a strictly Eulerian framework on a fixed grid, (2) explicit time-stepping, and (3) only relying on the current state to predict the next time step.

B. Learning optimal coefficients
Instead of using predefined rules to compute finitedifference coefficients (Eq. 4), our data driven discretizations [14] predict the local-field-dependent coefficients α = {α −k , ..., α k } via a convolutional neural network: The coefficients α |x=xj depend on the local environment around x j , with the inputs to the neural network being the neighboring fields {C j , C j±1 , ...} and { u j , u j±1 , ...}. For simplicity of presentation, here we use 1-D indices {j, j ± 1, ...} to denote spatially adjacent points. For 2-D advection problems, this computation involves 2-D convolution across both x and y dimensions. We learn the neural network weights W by minimizing the difference between the machine learning prediction and the true solution. Fig. 1 shows the forward solver workflow and training framework. During the forward solve, we replace the computation of finite-difference coefficients with a convolution neural network, while still using classic approaches for the rest of the steps (computing the advection flux and doing the time-stepping). During training, we accumulate the forward solver prediction results over 10 time steps and then compare to the reference solution over this time period, by computing the mean absolute error (MAE) over the entire spatial domain between the two time series: The MAE is used as the loss function for neural network training [27]. We find that using this multi-step loss function (as opposed to a single time step) stabilizes the forward integration, similar to the findings by [28]. In our experiments, we found using MAE resulted in slightly more accurate predictions than using mean square error (MSE), but the difference was not large.
The training of a neural network inside a classic numerical solver is made possible by writing the entire program in a differentiable programming framework [29], which allows efficient gradient-based optimization of arbitrary parameters in the code using automatic differentiation (AD) [30]. AD tools have a long history, dating back to Fortran 77 [31]. Recent developments of AD frameworks, such as TensorFlow [32], PyTorch [33], JAX [34], Flux.jl [35], and Swift [36], are even easier to program and support hardware accelerators like GPUs and TPUs. Those developments make it easier to incorporate machine learning into scientific computing code (e.g. [37]). We implemented our advection solver in TensorFlow Eager [38].

C. Baseline solver and reference solution
As a baseline method, we use the second-order VanLeer advection scheme with a monotonic flux limiter [39]. To obtain the reference "true" solution, we run the baseline advection solver at sufficiently high resolution to ensure the solution has converged. We then down-sample the high-resolution results using conservative averaging, to produce the training and test datasets for our machinelearning-based model on a coarse grid.
We remark that although higher-order schemes with more advanced limiters would be more accurate, any fluxlimited high-order schemes break to first-order under turbulent flows in order to ensure monotonicity [16]. Starting from second-order, increasing the spatial resolution is generally more effective than further improving the solver order or the limiter [40,41].

D. Physical constraints
There is growing emphasis on embedding physical constraints into the design of machine learning methods. This is typically done either either by adding "soft" constraints as terms the loss function [42,43], or "hard" constraints in the model architecture [14,[44][45][46][47][48]. Since here we only replace a small component in the numerical solver with machine learning, we can impose arbitrary physical constraints before and after the neural network components. Using hard constraints allows the machine learning algorithm to focus on approximation problems, by imposing physical consistency requirements by construction. In particular, we require: (1) Finite-volume representation for mass conservation. We compute the flux across grid cell boundaries,  1. End-to-end learning framework with differential programming. During training, the model is optimized to predict future concentrations across multiple time steps, based on a precomputed dataset of snapshots from high resolution simulations. During inference, the optimized model is repeatedly applied to predict time evolution. The neural network component contains a stack of 2-D convolutional layers with ReLU activation functions (degraded to 1-D convolution for 1-D problems.). Physical constraints are imposed before and after the convolutional layers (Section II D). In the "Time-stepping" block, H is the advection operator that computes the concentration update based on the machine-learning estimate of spatial derivatives.
and then apply the flux to update the concentration fields C i . This ensures that mass is exactly conserved. The machine-learning estimate of spatial derivatives ∂C ∂x is used for obtaining the optimal interpolation values C i+ 1 2 at cell boundaries, which is then used for calculating the flux via u i+ 1 2 C i+ 1 2 . (2) Polynomial accuracy constraints. Following [14], we can force the machine-learning-predicted coefficients to satisfy an m-th order polynomial constraint, so that the approximation error decays as O(∆x m ). This ensures that if the learned discretization is fit to solutions that are smooth on the scale of the mesh, we will recover classical finite-difference methods. In our experiments, we find that a first-order constraint gives the best result on coarse grids. This preserves a balance between accuracy constraints and model flexibility that may be particular valuable in non-monotonic regions, where higher order advection schemes often revert to first-order upwinding [18]. First-order accuracy requires k j=−k α j = 0, and can be enforced by applying an affine transformation to the original neural network output (our implementation), or by having the neural network only output {α −k , ..., α k−1 } and solving for the last α k . We choose the constant vector in the affine transformation to match a centered, first order scheme (equal weight on the two nearest grid cells). Accordingly, our randomly initialized neural net at the start of training produces interpolation coefficients that are very close to a centered, first order scheme.
(3) Input feature normalization. Before feeding the current concentration field C to the neural network, we normalize it globally to [0,1]. This ensures that the over-all magnitude of the concentration does not affect the prediction of finite-difference coefficients, and thus our solver satisfies the "semi-linear" requirement for advection schemes that H(aC + b) = aH(C) + b where H is the advection operator and {a, b} are constants (Eq 2.12-2.13 of [19]). Without such normalization, we find that the trained model diverges quickly during the forward integration.

E. Other choices of learned terms
Our training framework can be easily adapted to learn other parameters besides the finite-difference coefficients. In this section, we describe other approaches that we experimented with but did not choose.
Numerical methods introduce artificial numerical dissipation, so it is natural to consider adding explicit corrections to diffusion. One of the earliest flux-correct transport (FCT) algorithms [49] includes an anti-diffusion coefficient of 1/8 as a correction term, though the choice of 1/8 was subjective and it was later acknowledged that such correction should better be velocity-and wavenumber-dependent [50]. We considered learning diffusive correction directly, in the form: where the (anti-)diffusion coefficients D = {D xx , D xy , D yy } are computed by a convolutional neural network D = f (C, u; W ), while the advectiondiffusion equation itself is still solved by a traditional high-order finite volume method. The idea resembles learning the Reynolds stress tensor [10] in a Reynolds averaged Navier Stokes (RANS) simulation. As in Section II B, here the neural network is trained by minimizing the difference between the model prediction and the reference solution. In practice, we found that this learned diffusion model achieves about 3× upscaling compared to the second-order baseline solver, but performs slightly worse than our original approach of learning finite-difference coefficients (Section II B) that can achieve 4× upscaling.
We also experimented with other learned terms, including (1) a pure machine learning approach, by having the neural network directly predict the concentration at the next step C(t + ∆t) based on the current state C(t) and u(t); and (2) having the neural network directly predict the spatial derivative ∂C ∂x instead of the finitedifference coefficients α that need to be further multiplied with the concentration field C to obtain the spatial derivative. We found those methods to be unstable due to the lack of physical constraints (Section II D).

III. NUMERICAL RESULTS
We apply the data driven discretization to one-or two-dimensional advection. Two-dimensional advection is highly relevant for atmospheric modeling, as the vertical dimension can be decoupled from the horizontal dimensions and solved independently [19].
The performance of our learned advection solver (the "neural network model" hereafter) depends on the hyperparameters of the convolutional neural network component. For simplicity, this section only presents the results with the default hyperparameter configuration. For 1-D problems, we use 4 convolutional layers and 32 filters in each layer; For 2-D problems, we use 10 convolutional layers and 128 filters in each layer. All cases use a 3point finite difference stencil (k = 1 in Eq. 4). The impact of hyperparameters on model accuracy and computational speed is further examined in Section IV. We use the Adam optimizer [51] with default parameters for neural network training. Our simple convolutional neural network achitecture already achieves a high accuracy, without additional operations like residual connections and batch normalization.

A. 1-D advection under constant velocity
We first show that our neural network model can achieve near-perfect result for a canonical test problem: 1-D advection constant velocity [39]. We consider a periodic 1-D grid of 32 grid points. The concentration field is shifted by a constant distance per time step, determined by the Courant-Friedrichs-Lewy (CFL) number To generate training data, we initialize 30 square waves with heights randomly-sampled from [0.1, 0.9] and widths from 2 ∼ 8 grid points. Test data are randomly sampled from the same range of width and height. The reference "true" solution is generated by the baseline solver at 8× resolution (256 grid points) and down-sampled to the original coarse grid. Fig. 2 shows one test sample during the forward integration. The first-order upwind scheme exhibits large numerical diffusion, due to its second-order spatial discretizaion error [52]. The second-order VanLeer scheme (our baseline) is more accurate but stills accumulates diffusion over time. In contrast, our neural network model closely tracks the reference "true solution" obtained by the 8× resolution baseline. When a slight numerical diffusion occurs at one step, the next step applies a slight anti-diffusion to correct it. Intuitively, the solver learns that the optimal solution in one-dimensional advection is to maintain the initial shape. Fig. 3 shows the mean absolute error over time, averaged over all test samples. The error indicates the deviation from the reference solution obtained by the baseline solver at 256 grid points. The neural network model achieves a factor of 8 less error than the baseline secondorder VanLeer scheme.
We further investigate this intriguing behavior of our neural network model using out-of-sample test data. As shown in Fig. 4 ually turns Gaussian waves into squares, which are the only shape in the training data. Then, the model can maintain the squares indefinitely. Such phenomenon of "turning other shapes to squares" also exists in manuallydesigned schemes that are overly-optimized for square waves [50]. The over-fitting problem here can be easily fixed by adding Gaussian shapes into training data; after that the neural network model can track both Gaussian and square shapes perfectly. Given that the input features for convolutional neural networks are localized in space, covering representative input patterns only requires limited amounts of training data.

B. 2-D deformational flow
We next demonstrate that our neural network model can also achieve near-perfect result for a 2-D deformational flow test, originally proposed by [17] and later extended to spherical coordinates as a standard test for atmospheric advection schemes [53,54]. The spatial domain is a square [0, 1] × [0, 1], and the velocity field is a periodic swirling flow: u(x, y, t) = sin 2 (πx) sin(2πy) cos(πt/T ) (8) v(x, y, t) = sin 2 (πy) sin(2πx) cos(πt/T ) where the period T = 5 in our setup. The direction of this flow reverses at t = (n − 1 2 )T for any positive integer n. The exact solution at t = nT is identical to the initial condition.
The initial concentration field is a blob centered at [1/4, 1/4]: r(x, y) = min(1, The model is not directly trained on this deformational flow, but instead on an ensemble of periodic, divergencefree, random velocity fields, implemented as superpositions of Sinusoidal waves as described by [55]. The trained model is able to generalize across different flows as long as the training data contain representative local patterns. Fig. 5 shows the advection under deformational flow for the baseline and the neural network models, both on 64 × 64 grid points. The time step is chosen so that the maximum CFL number is 0. 5 the baseline VanLeer scheme incurs large numerical diffusion when the initial blob is stretched to a thin filament [16].
To quantify the numerical diffusion, we use the entropy S as a metric [56]: where the concentration C is scaled such that the initial conditions falls in the range [0, 1], and β is a normalization factor so that the initial entropy is 1. Entropy is conserved under pure advection and increases under diffusion. To avoid an undefined answer if any C i < 0, we use first set negative values in the concentration to zero, and evaluate C = 0 via the limit x log x = 0 as x → 0. Fig. 6 shows the entropy over time. Any monotonic advection solver can only increase entropy; any entropy decrease indicates nonphysical anti-diffusion, which often occurs due to numerical instability. Strikingly, the neural network model can decrease entropy, while still remaining numerically stable. Although such behavior seems to be nonphysical, it is indeed the best possible solution on such a coarse grid. On a grid that perfectly resolves the concentration field, the entropy remains constant under the deformational flow. Yet on a coarse grid view, the computed entropy increases when the initial blob turns into filament due to conservative averaging, and then decreases when the filament reverts back into a blob. Our neural network model can disobey the commonlyused constraint of non-decreasing entropy, and thus more closely matches the exact solution, when compared to traditional monotonic solvers.

C. 2-D turbulent flow
As the final test, we use the velocity fields from freelyevolving, decaying 2-D turbulence simulations in pyqg (https://github.com/pyqg/pyqg). The spatial domain is [0, 2π] × [0, 2π] with periodic boundary condition. We use a 256 × 256 grid for generating the reference solution using the baseline solver, and a 32 × 32 coarse grid for model evaluation. As in previous cases, here the advection time step is chosen so that the maximum CFL number is 0.5.
The training and test velocities are generated from different random seeds. We start with the McWilliams-84 random initial condition [57] and let the turbulence decay with time. We discard the initial 4 seconds of the simulation so that the velocity field can be resolved on the coarse grid. For the initial concentration field, we use an ensemble of 10 blobs with width 0.5 at random locations. Note that the spatial scale of the concentration field under turbulent advection can become much smaller than the scale of the velocity field [15,58], making it challenging for traditional advection solvers to resolve the concentration gradient. We use 20 random initial conditions for training data and 20 for test data. The actual sample size for the training dataset is much larger, as each initial condition is integrated into a time series of 1024 steps on the fine grid or 128 steps on the coarse grid, which is further broken into many 10-step time series for calculating the multi-step loss function. Fig. 7 shows one test sample under the 2-D turbulent flow, for both the initial condition (the left column of Fig. 7) and the integration results after 256 time steps (the middle and right columns of Fig. 7). Note that this is twice the maximum number of time-steps used for model training. The initial blobs are stretched into thin filaments under the turbulent flow. The baseline solver (second-order VanLeer scheme) can resolve such filaments on the fine-resolution grid, but incurs large numerical diffusion on the coarse grid and loses the sharp concentration gradient. However, when the fine-grid solution is directly resampled to the coarse grid, most sharp features can actually be preserved. Thus, the inability to resolve sharp gradients is not due to the coarse grid itself, but instead due to the numerical error in the baseline advection solver. Our neural network model, trained to track the optimal reference solution on the coarse grid, is able to preserve sharp features during the forward integration. The model performs well on all test samples, with more shown in Appendix. Fig. 8 shows a variety of error metrics for advection under turbulent flow, averaged over all test samples. The error is computed as the deviation from the reference solution obtained by the baseline solver at 256 × 256 grid. We also compare the baseline solver at intermediate grid resolutions (64 × 64 and 128 × 128). All solutions are resampled to the 32 × 32 coarse grid for error calculation. We use two measurements of accuracy, mean absolute error (our training loss) and the coefficient of determination R 2 , which means the goodness of fit for linear regression models for to reference solution. Based these metrics, our neural network model achieves roughly the same accuracy as the baseline method at 4× resolution (128 × 128). We also evaluate the entropy for all so- Error for 2-D turbulent advection on test data. The neural network model achieves the same accuracy as the second-order VanLeer baseline scheme at 4× resolution, and entropy similar to the baseline at 8× resolution.
lutions based on Eq. (10), which suggests that from a statistical perspective our neural net model almost perfectly matches the reference simulation on which it was trained. Figure 9 illustrates the limitations of stability and generalization for our neural net model by integrating for far longer than the 128 time steps used for training data. After 1000 time integration steps, our neural net model shows obvious numerical artifacts (checkerboard patterns) and very poor accuracy for about 10% of random seeds. Unlike the baseline models, our neural net model does not guarantee properties such as monotonicity, and when presented with examples outside of its training data it occasionally extrapolates poorly. Figuring out how to guarantee stability for neural net models, either by training on more comprehensive datasets or by imposing architecture constraints, is an important topic for future work.
Finally, to get a glimpse into the inner workings of the trained model, Fig. 10 examines predicted interpolation coefficients for x component of the velocity field. We see that similar to hand-crafted methods, the learned interpolation depends on both velocity and concentration. While some of the symmetries have been clearly learned from the data, we believe that incorporating such priors could improve the results further.

IV. COMPUTATIONAL PERFORMANCE AND ACCURACY WITH DIFFERENT HYPERPARAMETERS
There is a tradeoff between accuracy and speed for our neural network model, as using a larger convolutional neural network increases both the accuracy and the run time. We performed a grid search on model hyperparameters, for the number of layers ranging from [4,6,8,10], the number of convolutional filers ranging from [16,32,64,128], and the finite-difference stencil size ranging from [3,5], with each case replicated 3 times with different random seeds. The model accuracy is evaluated on the 2-D turbulence case in Section III C, and the run time is measured on a single Nvidia V100 GPU. Illustration of how prediction of the interpolation coefficients changes for different combinations of concentration (top row) and velocity field (left column) inputs. Concentration values represented by color; velocity field has unit magnitude and changes direction as shown in the plot. The target location for the interpolation is marked by a red bar on the concentration plots. The model predominantly interpolates along the velocity field and concentration edges, rediscovering the upwinding-like methods at the corner cases of the facet. While the model learned some general symmetries, we expect even better results for models that incorporate symmetries by construction. Fig. 11 shows the model accuracy and speed using different hyperparameters. The performance of the baseline solver at intermediate grid resolutions (64 × 64 and 128 × 128) is overlaid on for comparison. A large neural network (8 ∼ 10 layers and 128 filters) achieves comparable accuracy and speed as the baseline solver at 4× resolution, while a small neural network (4 layers and 32 filters) performs similarly to the baseline solver at 2× resolution. Fig. 12 shows that using 64 filters and 10 layers achieves a good balance between accuracy and speed, in which case the model achieves similar accuracy as the 4× resolution baseline while being 80% faster.
The model speed largely depends on the code implementation and software configuration. Our current implementation of the neural network model has a lot of room for performance optimization. For example, our code still requires unnecessary, large memory allocation, and does not use the reduced-precision Tensor Cores in the V100 GPU. With more performance tuning as well as techniques like model compression and quantization [59],  Hyper-parameter effect on neural network model performance. Same as Fig. 11, but here the data points are grouped by different numbers of convolutional layers and the number of filters, with shape denoted the size of the stencil. Duplicate points corresponds to identical models trained with different random seeds. the neural network model may significantly outperform the baseline in terms of computational performance. Incorporating neural networks into numerical methods also allows better utilization of current and emerging hardware. It is reported that "current (Earth sys-tem) models running on next generation architectures use less than 2% of peak performance" [60]. This is because classic numerical methods (e.g. finite difference, finite volume) are limited by memory bandwidth rather than processor speed [61] [62]. In contrast, neural networks mostly consist of dense matrix multiplications with a high compute-to-memory ratio, and therefore can achieve near-peak performance on both CPUs and hardware accelerators (see the Roofline charts [63] in [64]). We measure the machine utilization using Perf on CPU and NVProf on GPU, and find that the neural network model achieves 80% of peak FLOPs (floating point operations per second), while the baseline solver only uses 1 ∼ 2% of peak FLOPs. Clever use of neural network emulations inside existing models may provide a way to squeeze out "free compute cycles" that are currently not utilized.

V. CONCLUSION
We developed a data-driven discretization for solving passive scalar advection in one-or two-dimensions. Instead of using pre-defined coefficients to compute the spatial derivatives in the partial differential equation (PDE), we used a convolutional neural network to learn the optimal finite difference coefficients, so that the solution best matches the "true" result obtained by high-resolution reference simulations.
Our neural-network-based model is able to obtain nearperfect results for idealized 1-D and 2-D test problems, while a traditional high-order solver incurs significant diffusion error. Under a 2-D turbulent flow, the neural network model running on a coarse grid can achieve the same accuracy as a traditional high-order solver running on a 4× higher resolution grid. The neural network model exhibits several interesting behaviors that may help explain its unusual accuracy. Our learned models have been specifically optimized for modeling specific class of flows used as training data, which limits their range of validity. For example, in 1D the model converts unseen shapes into known shapes, and on 2D turbulent flows the model occasionally fails entirely when asked to make predictions for much longer times than were used in training. An important challenge for future work is identify techniques that can ensure learned discretizations are robust even to such outof-distribution inputs. Alternatively, it may be able to identify training datasets that cover the full range of phenomena of interest, e.g., in the context of weather or pollution forecasts where the same equations are solved day after day.
At the same accuracy, the speed of our neural network model is comparable to the baseline high-order solver (that runs at 4× higher resolution). There is a lot of room for further optimizing the neural network performance in our code implementation. Notably, the neural network model can achieve a much higher machine uti-lization than traditional finite-difference methods, and will better utilize emerging hardware accelerators.
An open question is how to apply our method in existing computational fluid dynamics (CFD) or climate/weather models, which tend to be implemented in large codebases of C++ or Fortran. Although past work has successfully replaced one component in a model with neural networks [65], our approach works best in an endto-end differentiable program. Recent efforts in implementing models in Julia [66] and JAX [37] should ease the integration of scientific computing and machine learning.
Code and tutorials for this work are available at https://github.com/google-research/ data-driven-pdes. Figure S1 shows more test samples for the 2-D turbulent advection problem in Section III C. In all test samples, the neural network model is able to maintain a sharp gradient that closely matches the reference true resolution, while the baseline model incurs significant numerical diffusion error.