Dedalus: A Flexible Framework for Numerical Simulations with Spectral Methods

Numerical solutions of partial differential equations enable a broad range of scientific research. The Dedalus Project is a flexible, open-source, parallelized computational framework for solving general partial differential equations using spectral methods. Dedalus translates plain-text strings describing partial differential equations into efficient solvers. This paper details the numerical method that enables this translation, describes the design and implementation of the codebase, and illustrates its capabilities with a variety of example problems. The numerical method is a first-order generalized tau formulation that discretizes equations into banded matrices. This method is implemented with an object-oriented design. Classes for spectral bases and domains manage the discretization and automatic parallel distribution of variables. Discretized fields and mathematical operators are symbolically manipulated with a basic computer algebra system. Initial value, boundary value, and eigenvalue problems are efficiently solved using high-performance linear algebra, transform, and parallel communication libraries. Custom analysis outputs can also be specified in plain text and stored in self-describing portable formats. The performance of the code is evaluated with a parallel scaling benchmark and a comparison to a finite-volume code. The features and flexibility of the codebase are illustrated by solving several examples: the nonlinear Schrodinger equation on a graph, a supersonic magnetohydrodynamic vortex, quasigeostrophic flow, Stokes flow in a cylindrical annulus, normal modes of a radiative atmosphere, and diamagnetic levitation. The Dedalus code and the example problems are available online at http://dedalus-project.org/.

Partial differential equations (PDEs) describe continuum processes.The continuous independent variables typically represent space and time, but can also represent more abstract quantities such as momentum, energy, age of a population, or currency.The ability to equate the infinitesimal rates of change of different quantities produces endless possible applications.Important examples include wave propagation, heat transfer, fluid flow, quantum mechanical probability flux, chemical & nuclear reactions, biological phenomena, and even financial markets [1] or social/population dynamics [2,3].Even more intriguing are possible combinations of several of the above [4,5].
Apart from a small handful of closed-form solutions, the vast majority of PDEs require serious numerical and computational intervention.A wide variety of numerical algorithms solve PDEs through the general approach of discretizing its continuous variables and operators to produce a finite-sized algebraic system yielding an approximate solution.Finite element, finite volume, and finite difference methods are common schemes that discretize the domain of the PDE into cells or points and derive algebraic relations between the values at neighboring cells or points from the governing equations.These methods can accommodate complex geometries (such as the flow around an aircraft), but can be difficult to implement for complex equations and typically converge relatively slowly as additional cells or points are added.
In contrast, spectral methods discretize variables by expanding them in a finite set of basis functions and derive equations for the coefficients of these functions.These methods are wellsuited to many equation types and provide rapidly converging solutions (e.g.exponential for smooth functions) as additional modes are included.However, spectral methods are typically limited to simple geometries (such as boxes, cylinders, and spheres).Recent literature has developed sparse representations of equations that are substantially better conditioned and faster than traditional dense collocation techniques [6][7][8][9][10][11][12].These features make spectral methods an attractive choice for scientists seeking to study a wide variety of physical processes with high precision.
While computing capacities have grown exponentially over the past few decades, the progression of software development has been more gradual.Many software packages have chosen one or a few closely related PDEs and focused on creating highly optimized implementations of algorithms that are wellsuited to those choices.These solvers usually hardcode not only the PDE but also the dynamical variables, choice of input control parameters, integration scheme, and analysis output.While many world-class simulation codes have been developed this way, often scientific questions lead beyond what a dedicated code can do.This is not always because of a lack of computational power or efficiency, but often because continued progress requires an alternative model, dynamical variable reformulation, or more exotic forms of analysis.
Simulation packages with flexible model specification also address an underserved scientific niche.It is often straightforward to write serial codes to solve simple one-dimensional equations for particular scientific questions.It is also worthwhile to invest multiple person-years building codes that solve well-known equations.However, it can be difficult to justify spending significant time developing codes for novel models that are initially studied by only a few researchers.We believe this leaves many interesting questions unaddressed simply from a local cost-benefit analysis.Flexible toolkits can lower the barrier to entry for a large number of interesting scientific applications.
The FEniCS1 and Firedrake2 packages both allow users to symbolically enter their equations in variational form and produce finite-element discretizations suitable for forwardmodeling and optimization calculations.These are very powerful tools for solving wide ranges of PDEs in complicated geometries, however they remain less efficient than spectral methods for many PDEs in simple geometries.Channelflow3 uses sparse Chebyshev methods to simulate the Navier-Stokes equations and allows users to find and analyze invariant solutions using dynamical systems techniques.However, the code is restricted to solving incompressible flow in a periodic channel geometry.The Chebfun4 and ApproxFun5 packages are highly flexible toolkits for performing function approximation using spectral methods.They include a wide variety of features including sparse, well-conditioned, and adaptive methods for efficiently solving differential equations to machine precision.However, these packages are not optimized for the solution of multidimensional PDEs on parallel architectures.
We begin this paper with a review of the fundamental theory of spectral methods and a description of the specific numerical method employed by Dedalus ( §II).We then provide an overview of the project and codebase using a simple example problem ( §III).Sections §IV- §X detail the implementations of the fundamental modules of the codebase, with a particular emphasis on its systems for symbolic equation entry and automatic distributed-memory parallelization.Although these sections describe essential details of the code, a careful reading is not necessary to begin using Dedalus.Finally, §XI demonstrates the features and performance of the codebase with a parallel scaling analysis, a comparison to a finite volume code, and example simulations of nonlinear waves on graphs, compressible magnetohydrodynamic flows, quasi-geostrophic flow in the ocean, Stokes flow in cylindrical geometry, atmospheric normal modes, and diamagnetic levitation.

Spectral representations of functions
A spectral method discritizes functions by expanding them over a set of basis functions.These methods find broad application in numerical analysis and give highly accurate and efficient algorithms for manipulating functions and solving differential equations.The classic reference Boyd et al. [78] covers the material in this section in great detail.
Consider a complete orthogonal basis {φ n (x)} and the associated inner product (φ n , φ m ) φ ∝ δ n,m .The spectral representation of a function f (x) comprises the coefficients { f φ n } appearing in the expansion of f (x) as with We will use bra-ket notation to denote such normalized brafamily inner products.Formally, an exact representation requires an infinite number of nonzero coefficients.Numerical spectral methods approximate functions (e.g.PDE solutions) using expansions that are truncated after N modes.The truncated coefficients f φ n are computed using quadrature rules of the form where the weights, w i , and collocation points, x i , depend on the underlying inner-product space.The quadrature scheme constitutes a discrete spectral transform for translating between the spectral coefficients and N samples of the function.
The error in the truncated approximation is often of the same order as the last retained coefficient.The spectral coefficients of smooth functions typically decay exponentially with n, resulting in highly accurate representations.The spectral coefficients of non-smooth functions typically decay algebraically as n −α , where α depends on the order of differentiability of f (x).Exact differentiation and integration on the underlying basis functions provides accurate calculus for general functions.For PDEs with highly differentiable solutions, spectral methods therefore give significantly more accurate results than fixed-order schemes.

Common spectral series
Trigonometric polynomials are the archetypal spectral bases: sine series, cosine series, and complex exponential Fourier series.These bases provide exponentially converging approximations to smooth functions on periodic intervals.The Fast Fourier Transform (FFT) can compute the series coefficients in O(N log N) time, enabling computations requiring both the coefficients and grid values to be performed efficiently.
The classical orthogonal polynomials also frequently appear as spectral bases.Most common are the Chebyshev polynomials {T n (x)}, which provide exponentially converging approximations to smooth functions on the interval [−1 , 1].A simple change of variables relates the Chebyshev polynomials to cosine functions, T n (x) = cos(n cos −1 (x)). (4) Geometrically, T n (x) is the projection of cos(nx) from the cylinder to the plane; see Fig. 1.The relation to cosines enables transforming between Chebyshev coefficients and values on collocation points using the fast discrete cosine transform (DCT).The fast transform often makes Chebyshev series preferable to other polynomials on finite intervals.

Solving differential equations with spectral methods
Spectral methods solve PDEs by creating algebraic equations for the coefficients of the truncated solution.Different approaches for constructing and solving these systems each come with advantages and disadvantages.In examining a few approaches, we consider a simple linear PDE of the form Lu(x) = f (x), where L is a x-differential operator.
The collocation approach is perhaps the most common polynomial spectral method.In this case, the differential equation is enforced at the interior collocation points.The solution is written in terms of the values at these points: The boundary conditions typically replace the DE at the collocation endpoints.The collocation method works well in a broad range of applications.The primary advantages are that many boundary conditions are easily enforced and the solution occurs on the grid.The primary disadvantages are that the method produces dense matrices (L) and more complicated boundary conditions require more care to implement [79].
An alternative is the Galerkin method, where the solution is written in terms of "trial" functions {φ n } that satisfy the boundary conditions.The differential equation is then projected against a set of "test" functions {ψ n }: For periodic boundary conditions, the Galerkin method using Fourier series produces diagonal derivative operators, allowing constant-coefficient problems to be solved trivially.Galerkin bases can be constructed from Chebyshev polynomials for simple boundary conditions, with the caveat that the series coefficients must be converted back to Chebyshev coefficients to apply fast transforms.
The tau method generalizes the Galerkin method by solving the perturbed equation where P(x) is specified.The parameter τ adjusts to accommodate the boundary conditions, which are enforced simultaneously.The tau method provides a conceptually straightforward way of applying general boundary conditions without needing a specialized basis.The classical tau method [80,81] uses the same test and trial functions and assumes P(x) = φ n−1 (x), making it equivalent to dropping the last row of the discrete L matrix and replacing it with the boundary condition.This classical formulation with Chebyshev series results in dense matrices, but (as described below) the tau method can be modified to produce sparse and banded matrices for many equations.

B. A general sparse tau method
This section describes the spectral method employed in Dedalus.We use a tau method with different trial and test bases, which produces sparse matrices for general equations.Formulating problems as first-order systems and using Dirichlet preconditioning (basis recombination) renders matrices fully banded.
The method is fundamentally one-dimensional, but generalizes trivially to D-dimensions with all but one separable dimension (e.g.Fourier-Galerkin).Multidimensional problems then reduce to an uncoupled set of 1D problems that are solved individually.
Other approaches based on the ultraspherical method [9] or integral formulations [7] similarly result in sparse and banded matrices for many equations.Our particular approach is designed to accommodate general systems of equations and boundary conditions automatically.

Sparse differential operators
Traditional polynomial spectral formulations often result in dense derivative matrices.In particular, the derivatives of Chebyshev polynomials are dense when expanded back in Chebyshev polynomials: Fig. 2 shows the matrix version of Eq. ( 8), referred to as the "T-to-T" form.However, the derivatives of Chebyshev polynomials (of the first kind) are proportional to the Chebyshev polynomials of the second kind, U n (x): Defined trigonometrically, Using test functions ψ n = U n and trial functions φ n = T n produces a single-band derivative matrix, called the "T-to-U" form (right side in Fig. 2).We convert non-differential terms from T-to-U via the sparse conversion relation Together, these relations render first-order differential equations sparse.Higher-order equations can be handled by utilizing ultraspherical polynomials for higher derivatives [9], but our approach is to simply reduce all equations to first-order systems.The sparse-τ method extends to other orthogonal polynomial series.Appendix A 1 lists the full set of derivative and conversion relations implemented in Dedalus.

Banded boundary conditions
Choosing the tau polynomial P(x) = U N −1 (x) in a T-to-U method allows dropping the last matrix row and finding u(x) without finding τ.The system then consists of a banded interior matrix, bordered with a dense boundary-condition row.Applying a right-preconditioner renders the boundary row sparse and the system fully banded.For Dirichlet boundary conditions, using the adjoint relation of Eq. (11) gives the non-orthogonal polynomials, where In this basis, Dirichlet boundary conditions only involve the first two expansion coefficients.This technique is known as "Dirichlet preconditioning" or "basis recombination".In summary, for first-order systems with Dirichlet boundary conditions, choosing φ n = D n , ψ n = U n , and P = U N −1 produces fully banded matrices.The resulting matrices are efficiently sovled using sparse/banded algorithms.With a first-order system, it is possible to reformulate any boundary condition (e.g.Neumann or global integral conditions) in terms of a Dirichlet condition on the first-order variables.This formulation extends to other orthogonal polynomial bases.Appendix A 2 lists the full set of Dirichlet recombinations implemented in Dedalus.

Non-constant coefficients
Many physical problems require multiplication by spatially non-constant coefficients (NCCs) that vary slowly compared to the unknown solution.Olver et al. [9], in the context of Chebyshev polynomials, observed that multiplication by such NCCs corresponds to band-limited spectral operators.
Multiplication by a general NCC g(x) acts linearly on u(x) via Given an expansion of the NCC in some basis as For all orthogonal polynomials, ψ i |ξ n φ j = 0 if |i − j | > n.We truncate NCC expansions by dropping all terms where g ξ n is smaller than some threshold amplitude.The overall bandwidth of G ψφ is therefore N , the number of terms that are retained in the expansion of the NCC.For smooth NCCs, N N and the multiplication matrix has low bandwidth.Appendix A 3 lists the full set of multiplication matrices implemented in Dedalus.

Solving systems of equations
For coupled systems of equations with S variables, S j=1 L i, j u j (x) = f i (x) for i = 1, ..., S. In block-operator form where X is the state-vector of variables, u j , and L is a matrix of the operators.We discretize the system by replacing each L i, j with its sparse-tau matrix representation described above.In block-banded form, using Kronecker products and the placement matrix The system matrix L sys acts on the concatenation of the variable coefficients and has bandwidth O(SN).
If the operator matrices are interleaved rather than blockconcatenated, the resulting system matrix will act on the interleaved variable coefficients and takes the form This matrix will have bandwidth O(S), making it practical to simultaneously solve coupled systems of equations with large N.

Summary & example
Dedalus uses a modified tau method that produces banded and well-conditioned matrices.Carefully chosen test-trial basis pairs render derivatives sparse.The first-order formulation makes all boundary conditions equivalent to Dirichlet conditions, which become banded under basis recombination.Truncated NCC expansions retain bandedness for smooth NCCs.Finally, matrices and coefficients are interleaved to keep systems of equations banded.Fig. 3 shows the matrices at various conceptual stages for a Chebyshev discretization of Poisson's equation in 1D with Dirichlet and Neumann boundary conditions: The combination of writing equations as first-order systems, T-to-U derivative mapping, Dirichlet preconditioning, and grouping modes before variables produces a banded pencil matrix.
We also note that coupled systems easily allow constraint equations (those without temporal derivatives) to be imposed alongside evolution equations, avoiding variable reformulations and/or splitting methods.For instance, the divergence condition in incompressible hydrodynamics can be imposed directly (determining the pressure).The momentum equation can then be integrated without splitting or derived pressure boundary conditions.

A. Codebase structure
Dedalus makes extensive use of object-oriented programming to provide a simple interface for the parallel solution of general systems of PDEs.The basic class structure reflects the mathematical objects that are encountered when posing and solving a PDE.As an illustrative example, we consider solving the Fisher-KPP equation, a reaction-diffusion equation that first arose in ecology: with unknown variable u(x, t), diffusion coefficient D, and reaction rate R [82,83].
Properly posing the PDE first requires specifying its spatial domain.This is done by creating a Basis object discretizing each dimension over a specified interval and forming a Domain object as the direct-product of these bases.Here we construct a 2D channel domain, periodic in x and with Neumann boundary conditions on the boundaries in y, as the direct product of a Fourier basis and a Chebyshev basis:  4) After reversing the Kronecker product to group by modes rather than variables.This final matrix is highly sparse and completely banded.
To finish posing the IVP, we need to specify a temporal integration scheme, the temporal integration limits, and the initial values of the variables.This is done through the Solver object built by each Problem.The initial data of the state fields can be easily accessed from the solver and set in grid ( g ) or coefficient ( c ) space.Finally, the problem is solved by iteratively applying the temporal integration scheme to advance the solution in time.In Dedalus, this main loop is directly written by the user, allowing for arbitrary data interactions as the integration progresses.Although not included in this example, Dedalus also provides extensive analysis tools for evaluating and saving quantities during the integration.While this simple example covers the core user-facing classes, several other classes control the automatic MPI parallelization and efficient solution of the solvers.All together, the fundamental class hierarchy consists of the following: • Basis: A one-dimensional spectral basis.
• Domain: The direct product of multiple bases, forming the spatial domain of dependence of a PDE.
• Distributor: Directs the parallel decomposition of a domain and spectral transformations of distributed data between different states.
• Layout: A distributed transformation state, e.g.gridspace or coefficient-space.
• Field: A scalar-valued field over a given domain.The fundamental data unit in Dedalus.
• Operator: Mathematical operations on sets of fields, composed to form mathematical expressions.
• Handler: Captures the outputs of multiple operators to store in memory or write to disk.
• Evaluator: Efficiently coordinates the simultaneous evaluation of the tasks from multiple handlers.
• Timestepper: ODE integration schemes that are used to advance initial value problems.
• Solver: Coordinates the actual solution of a problem by evaluating the underlying operators and performing time integration, linear solves, or eigenvalue solves.
The following sections detail the functionality and implementation of these classes.

B. Dependencies
Dedalus is provided as an open-source Python3 package.We choose to develop the code in Python because it is an opensource, high-level language with a vast ecosystem of libraries for numerical analysis, system interaction, input/output, and data visualization.While numerical algorithms written directly in Python sometimes suffer from poor performance, it is quite easy to wrap optimized C libraries into high-level interfaces with Python.A typical high-resolution Dedalus simulation will spend a majority of its time in optimized C libraries.
• An implementation of the MPI communication interface and its Python wrapper mpi4py [87].
• The HDF5 C-library for reading and writing HDF5 files and its Python wrapper h5py [88,89].
A wide range of standard-library Python packages are used to build logging, configuration, and testing interfaces following standard practices.Additionally, and perhaps counter-intuitively, we have found that creating algorithms to accommodate a broad range of equations and domains has resulted in a compact and maintainable codebase.Currently, the Dedalus package consists of roughly 10,000 lines of Python.By producing sufficiently generalized algorithms, it is possible to compactly and robustly provide a great deal of functionality.

C. Documentation
Dedalus has been publicly available under the open-source GPL3 license since its creation, and is developed under distributed version control.The online documentation includes a series of tutorials and example problems demonstrating the code's capabilities and walking new users through the basics of constructing and running a simulation.Links to the source code repository and the documentation are available through the project website, http://dedalus-project.org/.

D. Community
The Dedalus collaboration uses open-source code development and strongly supports open scientific practices.The benefits of the philosophy include distributed contributions to the codebase, a low barrier-to-entry (especially for students), and detailed scientific reproducibility.We outline these ideas in detail in Oishi et al. [90].
Code development occurs through a public system of pull requests and reviews on the source code repository.Periodic releases are issued to the Python Package Index (PyPI), and a variety of full-stack installation channels are supported, including single-machine and cluster install scripts and a condabased build procedure.
The core developers maintain mailing lists for the growing Dedalus user and developer communities.The mailing list is publicly archived and searchable, allowing new users to find previous solutions to common problems with installation and model development.The Dedalus user list currently has over 150 members.A list of publications using the code is maintained online6.

IV. SPECTRAL BASES
Dedalus currently represents multidimensional fields using the direct product of one-dimensional spectral bases.This direct product structure generally precludes domains including coordinate singularities, such as full disks or spheres.However, curvilinear domains without coordinate singularities, such as cylindrical annuli can still be represented using this direct product structure (for an example, see XI F).Basis implementations form the lowest level of the program's class hierarchy.The primary responsibilities of the basis classes are to define their collocation points and to provide an interface for transforming between the spectral coefficients of a function and the values of the function on their collocation points.
An instance of a basis class represents a series of its respective type truncated to a given number of modes N c , and remapped from native to problem coordinates with an affine map.A basis object is instantiated with a name string defining the coordinate name, a base_grid_size integer setting N c , parameters fixing the affine coordinate map (a problem coordinate interval for bases on finite intervals, or stretching and offset parameters for bases on infinite intervals), and a dealias factor.Each basis class is defined with respect to a native coordinate interval, and contains a method for producing a collocation grid of N g points on this interval, called a native grid of scale s = N g /N c .Conversions between the native coordinates x n and problem coordinates x p are done via an affine map of the form x p = ax n + b, which is applied to the native grid to produce the basis grid.
Each basis class defines methods for forward transforming (moving from grid values to spectral coefficients) and backward transforming (vice versa) data arrays along a single axis.Using objects to represent bases allows the transform methods to easily cache plans or matrices that are costly to construct.The basis classes also present a unified interface for implementing identical transforms using multiple libraries with different performance and build requirements.We now define the basis functions, grids, and transform methods for the currently implemented spectral bases.

A. Fourier basis
For periodic dimensions, we implement a Fourier basis consisting of complex exponential modes on the native interval [0, 2π]:  and a native grid consisting of evenly-spaced points beginning at the left side of the interval: A function is represented as a symmetric sum over positive and negative wavenumbers where k m = floor((N c − 1)/2) is the maximum resolved wavenumber, excluding the Nyquist mode k N = N c /2 when N c is even.When f is a real function, we store only the complex coefficients corresponding to k ≥ 0; the k < 0 coefficients are determined by conjugate symmetry.We discard the Nyquist mode since it is only marginally resolved: for real functions, the Nyquist mode captures cos(k N x), but not sin(k N x), which vanishes on the grid when N g = N c .The expansion coefficients are given explicitly by and are computed with the fast Fourier transform (FFT).We implement FFTs from both the Scipy and FFTW libraries, and rescale the results to match the above normalizations, i.e. the coefficients directly represent mode amplitudes.The coefficients are stored in the traditional FFT output format, starting from k = 0 and increasing to k m , then following with −k m and increasing to −1.

B. Sine/Cosine basis
For periodic dimensions possessing definite symmetry with respect to the interval endpoints, we implement a SinCos basis consisting of either sine waves or cosine waves on the native interval [0, π]: and a native grid consisting of evenly-spaced interior points: Functions with even parity are represented with cosine series as while functions with odd parity are represented with sine series as The Nyquist mode k N = N c is dropped from the sine series, since the corresponding cosine mode vanishes on the grid when N g = N c .The expansion coefficients are given explicitly by and are computed using the fast discrete cosine transform (DCT) and discrete sine transform (DST).The same grid is used for both series, corresponding to type-II DCT/DSTs for the forward transforms, and type-III DCT/DSTs for the backward transforms.We implement transforms from both the Scipy and FFTW libraries, and rescale the results to match the above normalizations, i.e. the coefficients directly represent mode amplitudes.These transforms are defined to act on real arrays, but since they preserve the data-type of their inputs, they can be applied simultaneously to the real and imaginary parts of a complex array.The spectral coefficients for complex functions are therefore also complex, with their real and imaginary parts representing the coefficients of the real and imaginary parts of the function.

C. Chebyshev basis
For finite non-periodic dimensions, we implement a Chebyshev basis consisting of the Chebyshev-T polynomials on the native interval [−1, 1]: The native Chebyshev grid uses the Gauss-Chebyshev quadrature nodes (a.k.a. the roots or interior grid): Near the center of the interval, the grid approaches an even distribution where ∆x ≈ π/N g .Near the ends of the interval, the grid clusters quadratically and allows very small structures to be resolved (Fig. 4).
A function is represented as The expansion coefficients are given explicitly by and are computed using the fast discrete cosine transform (DCT) via the change of variables x = cos(θ).The Chebyshev basis uses the same Scipy and FFTW DCT functions as the cosine basis, wrapped to handle the sign difference in the changeof-variables and preserve the ordering of the Chebyshev grid points.It also behaves similarly for complex functions, preserving the data type and producing complex coefficients for complex functions.Chebyshev rational functions can also be used to discretize the half line and the entire real line [12].These functions are not implemented explicitly in Dedalus, but can be utilized with the Chebyshev basis by manually including changes of variables in the equations.

D. Legendre basis
For finite non-periodic dimensions, we also implement a Legendre basis consisting of the Legendre polynomials P n (x) on the native interval [−1, 1].The Legendre polynomials are orthogonal on this interval as where N 2 n = 2/(2n + 1).The native grid points x i are the Gauss-Legendre quadrature nodes, calculated using scipy.special.roots_legendre.
A function is represented as The expansion coefficients are given by where w i are the Gauss-Legendre quadrature weights, also computed using scipy.special.roots_legendre.We synthesize the basis functions with the standard recursion relations in extended precision, which prevents underflows or overflows in problems with large numbers of modes.Quadrature-based Matrix-Multiply Transforms (MMTs) convert between grid values and coefficients.

E. Hermite basis
For problems on the whole real line (−∞, ∞), we implement a Hermite basis consisting of the physicists' Hermite polynomials H n (x) and the normalized Hermite functions where N 2 n = π 1/2 2 n n!.The Hermite polynomials are orthogonal under the Gaussian weight as The "enveloped" Hermite functions incorporate the weight and normalizations so that Since the Hermite functions exist over the entire real line, the affine map from native to problem coordinates is fixed by specifying center and stretch parameters, rather than specifying a problem interval.The native grid points x i are the Gauss-Hermite quadrature nodes, calculated using scipy .special.roots_hermite.
Polynomial functions are represented in the standard basis as while functions that decay towards infinity are represented in the enveloped basis as The expansion coefficients are where w i are the Gauss-Hermite quadrature weights, also computed using scipy.special.roots_hermite.We synthesize the basis functions with the standard recursion relations in extended precision, which prevents underflows or overflows in problems with hundreds of modes.Quadrature-based Matrix-Multiply Transforms (MMTs) convert between grid values and coefficients.

F. Laguerre basis
For problems on the half real line (0, ∞), we implement a Laguerre basis consisting of the standard Laguerre polynomials L n (x) and the normalized Laguerre functions The Laguerre polynomials are orthonormal under the exponential weight: The enveloped functions incorporate the weight so that Since the Laguerre functions exist over the positive half line, the affine map from native to problem coordinates is fixed by specifying edge and stretch parameters, rather than specifying a problem interval.A negative value of the stretch parameters can be used to create a basis spanning the negative half line.The native grid points x i are the Gauss-Laguerre quadrature nodes, calculated using scipy.special.roots_laguerre.
Polynomial functions are represented in the standard basis as while functions that decay towards infinity are represented in the enveloped basis as The expansion coefficients are where w i are the Gauss-Laguerre quadrature weights, also computed using scipy.special.roots_laguerre.We synthesize the basis functions with the standard recursion relations in extended precision, which prevents underflows or overflows in problems with hundreds of modes.Quadraturebased Matrix-Multiply Transforms (MMTs) convert between grid values and coefficients.

G. Compound bases
An arbitrary number of adjacent polynomial segments can be connected to form a Compound basis.The spectral coefficients on each subinterval are concatenated to form the compound coefficient vector, and the standard transforms operate on each subinterval.The compound basis grid is similarly the concatenation of the subinterval grids.There are no overlapping gridpoints at the interfaces since the polynomial bases use interior grids.Continuity is not required a priori at the interfaces, but is imposed on the solutions when solving equations (see IX A).
The subintervals making up a compound basis may have different resolutions and different lengths, but must be adjacent.Compound bases are useful for placing higher resolution (from clustering near the endpoints of polynomial grids) at fixed interior locations.Compound expansions can also substantially reduce the number of modes needed to resolve a function that is not smooth if the positions where the function becomes non-differentiable are known.Fig. 4 shows the grid of a compound basis composed of three Chebyshev segments.

H. Scaled transforms & dealiasing
Each basis implements transforms between N c coefficients and scaled grids of size N g = sN c , where s is the transform scale.When s < 1, the coefficients are truncated after the first N g modes before transforming.Such transforms are useful for viewing compressed (i.e.filtered) versions of a field in grid space.When s > 1, the coefficients are padded with N g − N c zeros above the highest modes before transforming.Padding is useful for spectral interpolation, i.e. to view low resolution data on a fine grid.
Transforms with s > 1 are necessary to avoid aliasing errors when calculating nonlinear terms, such as products of fields, in grid space.For each basis, the dealias scale is set at instantiation and defines the transform scale that is used when evaluating mathematical operations on fields.The well-known "3/2 rule" states that properly dealiasing quadratic nonlinearities calculated on the grid requires a transform scale of s ≥ 3/2.In general, an orthogonal polynomial of degree N g +n will alias down to degree N g − n when evaluated on the collocation grid of size N g .A nonlinearity of order P involving expansions up to degree N c − 1 will have power up to degree P(N c − 1).For this maximum degree to not alias down into degree N c − 1 of the product, we must have 2N g > (P + 1)(N c − 1).Picking a dealias scale of s = (P + 1)/2 is therefore sufficient to evaluate the nonlinearity without aliasing errors in the first N c coefficients.Non-polynomial nonlinearities, such as negative powers of fields, cannot be fully dealiased using this method, but the aliasing error can be reduced by increasing s.

I. Transform plans
To minimize code duplication and maximize extensibility, our algorithms require that each transform routine can be applied along an arbitrary axis of a multidimensional array.Scipy transforms include this functionality, and we built Cython wrappers around the FFTW Guru interface to achieve the same.The wrappers produce plans for FFTs along one dimension of an arbitrary dimensional array by collapsing the axes before and after the transform axis, and creating an FFTW plan for a two-dimensional loop of rank-1 transforms.For example, to transform along the third axis of a five-dimensional array of shape (N 1 , N 2 , N 3 , N 4 , N 5 ), the array would be viewed as a three-dimensional array of shape (N 1 N 2 , N 3 , N 4 N 5 ) and a loop of N 1 N 2 × N 4 N 5 transforms of size N 3 would occur.This approach allows for the unified planning and evaluation of transforms along any dimension of an array of arbitrary dimension, reducing the risk of coding errors that might accompany treating different dimensions of data as separate cases.
The plans produced by FFTW are cached by the corresponding basis objects and executed using the FFTW new-array interface.This centralized caching of transform plans reduces both precomputation time and the memory footprint necessary to plan FFTW transforms for many data fields.The FFTW planning rigor, which determines how much precomputation should be performed to find the optimal transform algorithm, is also wrapped through the Dedalus configuration interface.

V. DOMAINS
Domain objects represent physical domains, discretized by the direct product of one-dimensional spectral bases.A Dedalus simulation will typically contain a single domain object, which functions as the overall context for fields and problems in that simulation.A domain is instantiated with a list of basis objects forming this direct product, the data type of the variables on the domain (double precision real (64-bit) or complex (128-bit) floating point numbers), and the process mesh for distributing the domain when running Dedalus in parallel.

A. Parallel data distribution
Computations in Dedalus are parallelized by subdividing and distributing the data of each field over the available processes in a distributed-memory MPI environment.The domain class internally constructs a Distributor object that directs the decomposition and communication necessary to transform the distributed fields between grid space and coefficient space.Specifically, a domain can be distributed over any lower-dimensional array of processes, referred to as the process mesh.The process mesh must be of lower dimension than the domain so that at least one dimension is local at all times.Spectral transforms are performed along local dimensions and parallel data transpositions change the data locality to enable transforms across all dimensions.
To coordinate this process, the distributor constructs a series of Layout objects describing the necessary transform and distribution states of the data between coefficient space and grid space.Consider a domain of dimension D and shape (N 1 , N 2 , ..., N D ) distributed over a process mesh of dimension P < D and shape (M 1 , M 2 , ..., M P ): • The first layout is full coefficient space, where the first P array dimensions are block-distributed over the corresponding mesh axes, and the last D − P dimensions are local.That is, the i-th dimension is split in adjacent blocks of size B i,i = ceil (N i /M i ), and the process with index (m 1 , m 2 , ..., m P ) in the mesh will contain the data block from m i B i,i : (m i + 1)B i,i in the i-th dimension.
• The subsequent D − P layouts sequentially transform each dimension to grid space starting from the last dimension and moving backwards.
• After D − P transforms, the first P dimensions are distributed and in coefficient space, and the last D − P dimensions are local and in grid space.A global data transposition makes the P-th dimension local in the next layout.This transposition occurs along the P-th mesh axis, gathering the distributed data along the P-th array dimension and redistributing it along the (P +1)-th array dimension.This is an all-to-all communication within each one-dimensional subset of processes in the mesh defined by fixed (m 1 , ..., m P−1 ).
• The next layout results from transforming the P-th dimension (now local) to grid space.
• The transposition step then repeats to reach the next layout: all-to-all communication transposes the (P − 1) and P-th array dimensions over the (P − 1)-th mesh axis.
• The next layout results from transforming the (P − 1)-th dimension (now local) to grid space.
• This process repeats, reaching new layouts by alternately gathering and transforming sequentially lower dimensions until the first dimension becomes local and is transformed to grid space.
The final layout is full grid space.The first dimension is local, the next P dimensions are distributed in blocks of size , and the final D − P − 1 dimensions are local.Moving from full coefficient space to full grid space thus requires D local spectral transforms and P distributed array transpositions.This sequence defines a total of D + P + 1 data layouts.Fig. 5 shows the data distribution in each layout for 3D data distributed over a process mesh of shape (4, 2).The layout system provides a simple, well-ordered sequence of transform/distribution states that can be systematically constructed for domains and process meshes of any dimension and shape.Conceptually, the system propagates the first local dimension down in order for each spectral transform to be performed locally.Care must be taken to consider edge cases resulting in empty processes for certain domain and process shapes.In particular, if for any mesh axis i, then the last k hyperplanes along the i-th axis of the mesh will be empty.For instance, if M 1 = 4 and N 1 = 9, then the initial block size along the lowest dimension will be B 1,1 = 3, and therefore processes with m 1 = 4 will be empty.These cases are typically avoidable by choosing a different process mesh shape for a fixed number of processes.
For simplicity, we discussed fixed-shape global data throughout the transform process.The implementation also handles arbitrary transform scales along each dimension, meaning N i = N c,i in coefficient space, and N i = N g,i = s i N c,i in grid space.The default process mesh is one-dimensional and contains all available MPI processes.

B. Transpose routines
Consider the first transposition when moving from coefficient space to grid space, i.e. transposing the P and (P + 1)-th array dimensions over the P-th mesh axis.This transposition does not change the data distribution over the lower mesh axes; it consists of separate all-to-all calls within each one-dimensional subset of processes defined by fixed (m 1 , ..., m P−1 ).
The transposition is planned by first creating separate subgroup MPI communicators consisting of each group of M p processes with the same (m 1 , ..., m P−1 ).Each communicator plans for the transposition of an array with the global subgroup shape (B 1,1 , ..., B P−1, P−1 , N P , N P+1 , ..., N D ), i.e. the subspace of the global data spanned by its subgroup processes.This array is viewed as a four-dimensional array with the reduced global subgroup shape (B 1,1 × ... × B P−1, P−1 , N P , N P+1 , N P+2 × ... × N D ), constructed by collapsing the pre-and post-transposition dimensions.In this way, the general case of transposing a D-dimensional array distributed over a P-dimensional process mesh along an arbitrary mesh axis is reduced to the problem of transposing a four-dimensional array across its middle two dimensions.
When transposing the distribution along the p-th mesh axis between the p-th and (p + 1)-th array dimensions, the global subgroup shape is given by (n 1 , ..., n d ) where where the first p array dimensions are distributed over the corresponding mesh axes, the p-th and (p + 1)-th array dimensions are alternating between being local and distributed over the p-th mesh axis, the following P − p array dimensions have already undergone a transposition and are distributed over the corresponding mesh axes less one, and the remaining array dimensions are local.This global shape is collapsed to the reduced global subgroup shape where (63) Routines using either MPI or FFTW are available for performing the reduced data transpositions.
The MPI version begins with the local subgroup data of shape (G 1 , B p, p , N p+1 , G 4 ) and splits this data into the blocks of shape (G 1 , B p, p , B p+1, p , G 4 ) to be distributed to the other processes.These blocks are then sequentially copied into a new memory buffer so that the data for each process is contiguous.A MPI all-to-all call is then used to redistribute the blocks from being row-local to column-local, in reference to the second and third axes of the reduced array.Finally, the blocks are extracted from the MPI buffer to form the local subgroup data of shape (G 1 , N p , B p+1, p , G 4 ) in the subsequent layout.The FFTW version performs a hard (memory-reordering) local transposition to rearrange the data into shape (G 2 , G 3 , G 1 , G 4 ), and uses FFTW's advanced distributed-transpose interface to build a plan for transposing a matrix of shape (G 2 , G 3 ) with an itemsize of G 1 × G 4 × Q, where Q is the actual data itemsize.
x y z Layout 5 (g) TF(z) TP(y,z) TF(y) TP(x,y) TF(x) Figure 5.The parallel data distribution for 3D data over a process mesh of shape (4, 2).The global data is depicted as being split into the portions that are local to each process.The dimensions are labeled e.g.k x when the corresponding dimension is in coefficient space, and e.g.
x in grid space.The transforms (TF) and transpositions (TP) stepping between layouts are indicated.
Figure 6.The effective data redistribution that occurs during the distributed transposition between layouts 1 and 2 of the example shown in Fig. 5.This transposition is switching the second mesh axis with M 2 = 2 from distributing k y to distributing z.
Fig. 6 shows the conceptual domain redistribution strategy for the transposition between layouts 1 and 2 of the example shown in Fig. 5.Both the MPI and FFTW implementations require reordering the local data in memory before communicating.However, they provide simple and robust implementations encompassing the general transpositions required by the layout structure.The MPI implementation serves as a low-dependency baseline, while the FFTW routines leverage FFTW's internal transpose optimization to improve performance when an MPI-linked FFTW build is available.The FFTW planning rigor and in-place directives for the transpositions are wrapped through the Dedalus configuration interface.
These routines can also be used to group transpositions of multiple arrays simultaneously.Transposing S-many arrays concatenates their local subgroup data and the reduced global subgroup shape is expanded to (S × G 1 , G 2 , G 3 , G 4 ).A plan is constructed and executed for the expanded shape.This concatenation allows for the simultaneous transposition of multiple arrays while reducing the latency associated with initiating the transpositions.The option to group multiple transpositions in this manner is controlled through the Dedalus configuration interface.

C. Distributed data interaction
For arbitrary transform scalings in each dimension, the layout objects contain methods providing: the global data shape, local data shape, block sizes, local data coordinates, and local data slices for Field objects.These methods provide the user with the tools necessary to understand the data distribution at any stage in the transformation process.This is useful for both analyzing distributed data and initializing distributed fields using stored global data.
The domain class contains methods for retrieving each process's local portion of the N-dimensional coordinate grid and spectral coefficients.These local arrays are useful for initializing field values in either grid space or coefficient space.Code that initializes field data using these local arrays is robust to changing parallelization scenarios, allowing scripts to be tested serially on local machines and then executed on large systems without modification.

VI. FIELDS
Field objects represent scalar-valued fields defined over a domain.Each field object contains a metadata dictionary specifying whether that field is constant along any axis, the scales along each axis, and any other metadata associated with specific bases (such as parity for the sine/cosine bases or envelope for Hermite and Laguerre bases).When the transform scales are specified or changed, the field object internally allocates a buffer large enough to hold the local data in any layout for the given scales.Each field also contains a reference to its current layout, and a data attribute viewing its memory buffer using the local data shape and type.

A. Data manipulation
The Field class defines a number of methods for transforming individual fields between layouts.The most basic methods move the field towards grid or coefficient space by calling the transforms or transpositions to increment or decrement the layout by a single step.Other methods direct the transformation to a specific layout by taking sequential steps.These methods allow users to interact with the distributed grid data and the distributed coefficient data without needing to know the details of the distributing transform mechanism and intermediate layouts.
The __getitem__ and __setitem__ methods of the field class allow retrieving or setting the local field data in any layout.Shortcuts c and g allow fast access to the full coefficient and grid data, respectively.To complete a fully parallelized distributed transform: The set_scales method modifies the transform scales:

B. Field Systems
The FieldSystem class groups together a set of fields.The class provides an interface for accessing the coefficients corresponding to the same transverse mode, or pencil, of a group of fields.A transverse mode is a specific product of basis functions for the first D − 1 dimensions of a domain, indexed by a multi-index of size D − 1.Each transverse mode has a corresponding 1D pencil of coefficients along the last axis of a field's coefficient data.The linear portion of a PDE that is uncoupled across tranverse dimensions splits into separate matrix systems for each transverse mode.
A FieldSystem of S fields will build an internal buffer of size That is, the local coefficient shape with the last axis size multiplied by the number of fields.The system methods gather and scatter copy the separate field coefficients into and out of this buffer.Each size-N D × S system pencil contains the corresponding field pencils, grouped along the last axis, in a contiguous block of memory for efficient access.
A CoeffSystem allocates and controls just the unified buffer rather than also instantiating field objects.Coefficient systems are used as temporary arrays for all pencils, avoiding the memory overhead associated with instantiating new field objects.

VII. OPERATORS
The Operator classes represent mathematical operations on fields, such as arithmetic, differentiation, integration, and interpolation.An operator instance represents a specific mathematical operation on a field or set of fields.Operators can be composed to build complex expressions.The operator system serves two simultaneous purposes: 1.It allows the deferred and repeated evaluation of arbitrary operations.

It can produce matrix forms of linear operations.
Together, these features allow the implicit and explicit evaluation of arbitrary expressions, which is the foundation of Dedalus' ability to solve general PDEs.

A. Operator classes and evaluation
Operators accept operands (fields or operators) from the same domain, as well as other arguments such as numerical constants or strings.Each operator class implements methods determining the metadata of the output based on the inputs; e.g. the output parity of a SinCos basis.Operator classes also have a check_conditions method that checks if the operation can be executed in a given layout.For example, spectral differentiation along some dimensions requires that dimension to be in coefficient space as well as local for derivatives that couple modes.Finally, operators have an operate method which performs the operation on the local data of the inputs once they have been placed in a suitable layout.
Operators can be combined to build complex expressions.An arbitrary expression belongs to the root operator class, with operands that belong to other operator classes, eventually with fields or input parameters forming the leaves at the end of the expression tree (see Fig. 7).The evaluate method computes compound operators by recursively evaluating all operands, setting the operands' transform scales to the dealias scales, transforming the operands to the proper layout, and calling the operate method.Arbitrary expression trees are evaluated in a depth-first traversal.The evaluate method can optionally cache its output if it may be called multiple times before the values of the leaves change.The attempt method tries to evaluate a field, but will not make any layout changes while evaluating subtrees.It therefore evaluates an expression as much as possible given the current layouts of the involved fields.Finally, operators also implement a number of methods allowing for algebraic manipulation of expressions, described in §VII F.

B. Arithmetic operators
The Add, Multiply, and Power classes implement addition, multiplication, and exponentiation respectively.Different subclasses of these operators are invoked depending on the types of input.A Python metaclass implements this multiple dispatch system, which examines the arguments before instantiating an operator of the proper subclass.For example, the AddFieldField subclass adds two fields by adding the local data of each field.The operation can be evaluated as long as both fields are in the same layout.Addition between fields requires compatible metadata, e.g. the same parity or envelope settings.The AddScalarField class likewise adds a constant to a field.Multiplication and exponentiation of fields must occur in grid space, but otherwise have similar implementations.
The overloaded __add__, __mul__, and __pow__ methods allow for easy arithmetic on fields and operators using Python infix operators.For example, with a Dedalus field, f, the expression f + 5 will produce an AddScalarField instance.We also override the __neg__, __sub__, and __truediv__ methods for negation, subtraction, and division:

C. Unary grid operators
The UnaryGridFunction class implements common nonlinear unary functions: np.absolute, np.sign, np.conj, np .exp,np.exp2, np.log, np.log2, np.log10, np.sqrt, np.square, np.sin, np.cos, np.tan, np.arcsin, np.arccos, np.arctan, np.sinh, np.cosh, np.tanh, np.arcsinh, np.arccosh, and np.arctanh.The operation proceeds by applying the function to the local grid space data of the operand.The overloaded __getattr__ method intercepts Numpy universal function calls on fields and operators and instantiates the corresponding UnaryGridFunction.This allows the direct use of Numpy ufuncs to create operators on fields.For example np .sin(f) on a Dedalus field f will return UnaryGridFunction (np.sin, f).

D. Linear spectral operators
Linear operators acting on spectral coefficients are derived from the LinearOperator base class and the Coupled or Separable base classes if they do or do not couple different spectral modes, respectively.These operators are instantiated by specifying the axis along which the operator is to be applied, which is used to dispatch the instantiation to a subclass implementing the operator for the corresponding basis.These operators implement a matrix_form method which produces the matrix defining the action of the operator on the spectral basis functions.For a basis φ n and an operator A, the matrix form of A is For separable operators, this matrix is diagonal by definition, and represented with a one-dimensional array.For coupled operators, this matrix is returned as a Scipy sparse matrix.
In general, linear operators require their corresponding axis to be in coefficient space to be evaluated.Coupled operators further require that the corresponding axis is local.The local data of the operand is contracted with the matrix form of the operator along this axis to produce the local output data.Operators may override this process by implementing an explicit_form method if a more efficient or stable algorithm exists for forward-applying the operator.For example, forward Chebyshev differentiation uses a recursion rather than a matrix multiplication.

Differentiation
Differentiate classes are implemented for each basis.Differentiation of the Fourier and SinCos bases is a separable operator and therefore a diagonal matrix.Differentiation of the polynomial bases is a coupled operator.Differentiation of the Hermite polynomials and enveloped functions are both naturally banded and therefore require no conversion into a different test basis to retain sparsity.Differentiation of the Chebyshev, Legendre, and Laguerre bases have dense upper triangular matrices, but these are never used when solving equations.Instead, we always convert the differential equations into a test basis with banded differentiation matrices (see §II B 1).These conversions are applied as left-preconditioners for the equation matrices.Appendix A 1 shows the full differentiation and conversion matrices.For the Chebyshev, Legendre, and Laguerre bases, forward differentiation uses O(N c ) recurrence relations rather than dense matrices.Template matrices are rescaled according to the affine map between the native and problem coordinates.The differentiation matrix for the compound basis is the block-diagonal combination of the subbasis differentiation matrices.
For each basis type, a differentiation subclass is referenced from the basis-class Differentiate method.These methods are aliased as e.g.dx for a basis with name x during equation parsing.Additionally, a factory function called differentiate (aliased as d) provides an easy interface for constructing higher-order and mixed derivatives using the basis names, by composing the appropriate differentiation methods: The differentiation subclasses also examine the constant metadata of their operand, and return 0 instead of instantiating an operator if the operand is constant along the direction of differentiation.

Integration
Integration is a functional that returns a constant for any input basis series.Integration operators therefore set the constant metadata of their corresponding axes to True.The operator matrices are nonzero only in the first row (called the operator vector).
The native integration vectors for each basis are rescaled by the stretching of the affine map between the native and problem coordinates.Integration for the compound basis concatenates each of the sub-basis integration vectors and places the result in the rows corresponding to the constant terms in each sub-basis.
For each basis type, an integration subclass is referenced from the basis-class Integrate method.Additionally, a factory function called integrate (aliased as integ) provides an easy interface for integrating along multiple axes, listed by name, by composing the appropriate integration methods: If no bases are listed, the field will be integrated over all of its bases.If the operand's metadata indicates that it is constant along the integration axis, the product of the constant and the interval length will be returned.
The antidifferentiate method of the Field class implements indefinite integration.This method internally constructs and solves a simple linear boundary value problem and returns a new Field satisfying a user-specified boundary condition, fixing the constant of integration.

Interpolation
Interpolation operators are instantiated with an operand and the interpolation position in problem coordinates.The operator matrices are again nonzero except in the first row (called the operator vector) and depend on the interpolation position.The specified interpolation positions are converted to the native basis coordinates via the basis affine map.The strings left , center , and right are also acceptable inputs indicating the left endpoint, center point, and right endpoint of the problem interval.Specifying positions in this manner avoids potential floating-point errors when evaluating the affine map at the endpoints.
The interpolation classes construct interpolation vectors consisting of the pointwise evaluation of the respective basis functions.Interpolation for the compound basis takes the interpolation vector of the sub-basis containing the interpolation position and places the result in the rows corresponding to the constant terms in each sub-basis.If the interpolation position is at the interface between two sub-basis, the first sub-basis is used to break the degeneracy.
For each basis type, an interpolation subclass is referenced from the Interpolate basis-class method.Additionally, a factory function called interpolate (aliased as interp) provides an easy interface for interpolating along multiple axes, specified using keyword arguments, by composing the appropriate integration methods: interp (f , x = 0 .5 , y = 1 ) # xbasis .Interpolate ( # ybasis .Interpolate (f , 1 ) , 0 .5 ) If the operand's metadata indicates that it is constant along the interpolation axis, instantiation will be skipped and the operand itself will be returned.

Hilbert transforms
The Hilbert transform of a function f (x) is the principalvalue convolution with (πx) −1 : The Hilbert transform has a particularly simple action on sinusoids, The actions on cosine/sine functions result from taking the real/imaginary parts.The Hilbert transform is implemented as a separable operator for the Fourier and SinCos bases and referenced from the HilbertTransform basis-class methods.These methods are aliased as e.g.Hx for a basis with name x .Additionally, a factory function called hilberttransform (aliased as H) provides an easy interface for constructing higher-order and mixed Hilbert transforms using the basis names, similar to the differentiate factory function.The Hilbert transform subclasses also examine the constant metadata of their operand, and return 0 instead of instantiating an operator if the operand is constant along axis to be transformed.

E. User-specified functions
The GeneralFunction class wraps and applies general Python functions to field data in any specified layout.For example, a user-defined Python function func(A, B) which accepts and operates on grid-space data arrays, can be wrapped into a Dedalus Operator for deferred evaluation using Dedalus fields fA and fB as GeneralFunction ( func , args = ( fA , fB ) , layout = g ) §XI F gives a detailed example.

F. Manipulating expressions
The operator classes also implement a number of methods allowing the algebraic manipulation of operator expressions, i.e. a simple computer-algebra system.§VIII describes how this enables the construction of solvers for general partial differential equations.
These methods include: • atoms: recursively constructs the set of leaves of an expression matching a specified type.
• has: recursively determines whether an expression contains any specified operand or operator type.
• expand: recursively distributes multiplication and linear operators over sums of operands containing any specified operand or operator type.It also distributes derivatives of products containing any specified operand or operator type using the product rule.
• canonical_linear_form: first determines if all the terms in an expression are linear functions of a specified set of operands, and raises an error otherwise.In the case of nested multiplications, it rearranges the terms so that the highest level multiplication directly contains the operand from the specified set.
• split: additively splits an expression into a set of terms containing specified operands and operator types, and a set of terms not containing any of them.
• replace: performs a depth-first search of an expression, replacing any instances of a specified operand or operator type with a specified replacement.
• order: recursively determines the compositional order of a specified operator type within an expression.
• sym_diff: produces a new expression containing the symbolic derivative with respect to a specified variable.
The derivative is computed recursively via the chain rule.
• as_ncc_operator: constructs the NCC multiplication matrix associated with multiplication by the expression.It requires that the corresponding domain only have a single polynomial basis, and that this basis forms the last axis of the domain.It further requires that the expression is constant along all other ("transverse") axes, so that multiplication by the operand does not couple the transverse modes.The method first evaluates the expression, then builds the NCC matrix with the resulting coefficients.The method allows the NCC expansion to be truncated at a maximum number of modes (N m < N c ) and for terms to be excluded when the coefficient amplitudes are below some threshold (| f n | < δ) so that the matrix is sparse for well-resolved functions (see §II B 3).
• operator_dict: constructs a dictionary representing an expression as a set of matrices acting on the specified pencils of a specified set of variable.This method requires that the expression be linear in the specified variables and contains no operators coupling any dimensions besides the last.
The dictionary is constructed recursively, with each coupled linear operator applying its matrix form to its operand matrices, and each transverse linear operator multiplying its operand matrices by the proper element of its vector form.Addition operators sum the matrices produced by their operands.Multiplication operators build the matrices for the operand containing the specified variables.They then multiply these matrices by the NCC matrix form of the other operand.

G. Evaluators
An Evaluator object attempts to simultaneously evaluate multiple operator expressions, or tasks, as efficiently as possible, i.e. with the least number of spectral transforms and distributed transpositions.Tasks are organized into Handler objects, each with a criterion for when to evaluate the handler.Handlers can be evaluated on a specified cadence in terms of simulation iterations, simulation time, or real-world time (wall time) since the start of the simulation.Handlers from the SystemHandler class organize their outputs into a FieldSystem, while handlers from the FileHandler class save their outputs to disk in HDF5 files via the h5py package ( §X).The add_task method adds tasks to a handler and accepts operator expressions or strings (which are parsed into operator expressions using a specified namespace).
When triggered, the evaluator examines the attached handlers and builds a list of the tasks from each handler scheduled for evaluation.The evaluator uses the attempt methods to evaluate the tasks as far as possible without triggering any transforms or transpositions.If the tasks have not all completed, the evaluator merges the remaining atoms from the remaining tasks, and moves them all to full coefficient space, and reattempts evaluation.If the tasks are still incomplete, the evaluator again merges the remaining atoms from the remaining tasks, moves them forward one layout, and reattempts evaluation.This process repeats, with the evaluator simultaneously stepping the remaining atoms back and forth through all the layouts until all of the tasks have been fully evaluated.Finally, the process method on each of the scheduled handlers is executed.
This process is more efficient than sequentially evaluating each expression.By attempting all tasks before changing layouts, it makes sure that no transforms or transpositions are triggered when any operators are able to be evaluated.Additionally, it groups together all the fields that need to be moved between layouts so that grouped transforms and transpositions can be performed to minimize overhead and latency.

VIII. PROBLEMS
Problem classes construct and represent systems of PDEs.Separate classes manage linear boundary value problems (LBVP), nonlinear boundary value problems (NLBVP), eigenvalue problems (EVP), and initial value problems (IVP).After creating a problem, the equations and boundary conditions are entered in plain text, with linear terms on the LHS and nonlinear terms on the RHS.The LHS is parsed into a sparse matrix formulation, while the RHS is parsed into an operator tree to be evaluated explicitly.

A. Problem creation
Each problem class is instantiated with a domain and a list of variable names.Domains may have a maximum of one polynomial basis, which must correspond to the last axis.The linear portion of the equations must be no higher than first-order in time and coupled derivatives.Auxiliary variables must be added to render the system first-order.Optionally, an amplitude threshold and a cutoff mode number can be specified for truncating the spectral expansion of non-constant coefficients on the LHS.For eigenvalue problems, the eigenvalue name must also be specified; it cannot be lambda since this is a Python reserved word.For initial value problems, the temporal variable name can optionally be specified, but defaults to t .
For example, to create an initial value problem for an equation involving the variables u and v, we would write problem = de .IVP ( domain , variables = [ u , v ] )

B. Variable metadata
Metadata for the problem variables is specified by indexing the problem.metaattribute by variable name, axis, and then property.
The most common metadata to set is the constant flag for any dimension, the parity of all variables for each SinCos basis, the envelope flag for the Hermite and Laguerre bases, and the dirichlet flag for recombining the Chebyshev, Legendre, and Laguerre bases (enabled by default).Default metadata values are specified in the basis definitions.
For example, we can set the parity of variables in our problem along the x axis with

C. Parameters and non-constant coefficients
Before adding the equations, any parameters (fields or scalars used in the equations besides the problem variables) must be added to the parsing namespace through the problem .parametersdictionary.Scalar parameters are entered by value.Non-constant coefficients (NCCs) are entered as fields with the desired data.NCCs on the LHS can only couple polynomial dimensions; an error will be raised if the constant metadata is not set to True for all separable axes.
For example, we would enter scalar and NCC parameters for a 3D problem on a double-Fourier (x, y) and Chebyshev (z) domain as:

D. Substitutions
One of the most powerful features of Dedalus is the ability to define substitutions which act as string-replacement rules to be applied during the equation parsing process.Substitutions can be used to provide short aliases to quantities computed from the problem variables and to define functions similar to Python lambda functions, but with normal mathematicalfunction syntax.
For example, several substitutions that might be useful in a hydrodynamical simulation are: Substitutions of the first type are created by parsing their definitions in the problem namespace, and aliasing the result to the substitution name.Substitutions of the second type are turned into Python lambda functions producing their specified form in the problem namespace.Substitutions are composable, and form a powerful tool for simplifying the entry of complex equation sets.

E. Equation parsing
Equations and boundary conditions are entered in plain text using the add_equation and add_bc methods.Optionally, these methods accept a condition keyword, which is a string specifying which transverse modes that equation applies to.This is necessary to close certain equation sets where, for example, the equations become degenerate for the transversemean mode and/or certain variables require gauge conditions.
First, the string-form equations are split into LHS and RHS strings which are evaluated over the problem namespace to build LHS and RHS operator expressions.The problem namespace consists of: • The variables, parameters and substitutions defined in the problem.
• The axis names representing the individual basis grids.
• The derivative, integration, and interpolation operators for each basis.
• Time and temporal derivatives for the IVP (defaulting to t and dt ).
• The eigenvalue name for the EVP.
• The universal functions wrapped through the UnaryGridFunction class.
A number of conditions confirming the validity of the LHS and RHS expressions are then checked.For all problem types, the LHS expression and RHS must have compatible metadata (e.g.parities).The LHS expression must be nonzero and linear in the problem variables.The LHS must also be first-order in coupled derivatives.The expressions entered as boundary conditions must be constant along the last axis.
For the individual problem classes, the following additional restrictions and manipulations are applied to the LHS and RHS expressions:

Linear boundary value problems
The linear boundary value problem additionally requires that the RHS is independent of the problem variables.This allows for linear problems with inhomogeneous terms on the RHS.Since the LHS terms are linear in the problem variables, this symbolically corresponds to systems of equations of the form where X is the state vector of variable fields, and L is a matrix of operators.The LHS expressions are expanded and transformed into canonical linear form before being stored by the problem instance.

Nonlinear boundary value problems
Nonlinear boundary value problems are systems of the form The RHS can be any nonlinear function of the problem variables.In addition to the L • X and F expressions, the problem constructs the Frechet differential of the RHS with respect to the problem variables: This linear operator indicates the sensitivity, or directional functional derivative, of F with respect to changes in X along ∆X.It is constructed symbolically using the operator methods described in §VII F roughly as In general, the Frechet derivative of an expression will contain non-constant coefficients involving the problem variables X, which would generally couple horizontal modes.Therefore, Dedalus only supports 1D NLBVPs.The LHS and Frechet differential expressions are expanded and transformed into canonical linear form before being stored by the problem instance.

Eigenvalue problems
The eigenvalue problem requires that the RHS is zero, and that the LHS terms must be linear in or independent of the eigenvalue, which we refer to as σ.This corresponds to systems of equations of the form which are generalized linear eigenvalue problems.The M • X and L • X expressions are extracted by splitting the LHS expression on the presence of the eigenvalue variable, before replacing it with 1.These expressions are expanded and transformed into canonical linear form before being stored by the problem instance.

Initial value problems
The initial value problem requires that the LHS coefficients are time independent, the LHS is first-order in time derivatives, and the RHS has no time derivatives.This corresponds to systems of the form The M • X and L • X expressions are extracted by splitting the LHS expression on the presence of the time derivative dummy operator, before replacing it with the identity operator.These expressions are expanded and transformed into canonical linear form before being stored by the problem instance.

IX. SOLVERS
Each problem type has a corresponding solver type which builds the spectral matrices for the problem equations, implements methods for computing the solution to the equations, and stores the solution state as a FieldSystem.

A. Matrix construction
The problem classes begin by building the operator matrices for the LHS expression groups (M • X, L • X, and/or F X • ∆X).The matrices are constructed by first taking the set of equations and boundary conditions that apply to each pencil and calling the operator_dict method on each expression to build the matrices acting on the corresponding coefficients of the problem variables.For each pencil's matrix to be solvable, the number of applicable equations must equal the number of variables in the problem, S. For a given pencil, we refer to the operator matrix from the i-th equation acting on the j-th variable as e.g.L i, j .Each of these matrices is processed as follows: • If the i-th equation is constant along the coupled direction, then all rows except the first of each L i, j are dropped.
• If the i-th equation contains a coupled derivative on the LHS, each L i, j is left-multiplied by the basis preconditioner matrix, which renders the derivative matrix banded (e.g. the T-to-U conversion for Chebyshev bases; see §II B 1).
• By default, if the i-th equation contains a coupled derivative on the LHS, the last row of each L i, j is dropped and replaced with one of the boundary conditions.The keyword option tau in the problem.add_equationmethod overrides this default behavior and explicitly forces or prevents the solver from replacing the last row in that equation.This implements boundary conditions using the tau method (see §II A 2).For the system to be solvable, it typically requires the same number of boundary conditions as coupled differential equations.The tau keyword allows for non-standard cases, such as problems with singular end points.
• If the last basis is compound, the rows corresponding to the final coefficient of each sub-basis, except for the last, are dropped and replaced with internal boundary conditions matching the sub-basis values at each internal interface for each variable.This enforces continuity of all variables across the subsegments.
• If the j-th variable has been marked as constant along the coupled direction, then all columns except the first of each L i, j are dropped.
• If the j-th variable has been marked for Dirichlet preconditioning (enabled by default), then each L i, j is rightmultiplied by the Dirichlet conversion matrix.This has the effect of rearranging the columns so that the matrix acts on the coefficients of the Dirichlet expansion of the corresponding variable, rendering all Dirichlet boundary conditions banded (see §II B 2).
Finally, the processed operator matrices are joined to produce the full preconditioned pencil matrix L. The matrices are interleaved so that the columns and rows are grouped by mode rather than by field (see §II B 4).The bandwidth of the pencil matrix then becomes S times the maximum bandwidth of any of the individual sub-blocks, which is roughly set by the bandwidth of the non-constant coefficient expansions.Interleaved-block matrices P L p and P R combining all the left and right preconditioners, projections, and reorderings are created and stored, since they will need to be applied to the RHS and solution vectors, respectively, when solving the matrix system.
The pencil matrices are stored as Scipy sparse matrices.The matrices produced for each of the LHS expression groups (M•X, L•X, and/or F X •∆X) are expanded to occupy the union of their sparsity patterns so that they can be added efficiently.

B. Linear boundary value solver
The linear boundary value solver is instantiated from a linear boundary value problem.It first constructs the matrices Lp = P L p L p P R for each local pencil p from the stored LHS expression group L • X.Here P L p and P R explicitly indicate that the expression matrices have been preconditioned from the left and the right, and the p subscripts indicate that the left preconditioner and expression matrices vary by pencil.The solver then constructs a system handler for evaluating the RHS equation and boundary condition expressions (F ).
The solver class contains a solve method, which first evaluates the RHS handler for F .At this point, the linear boundary value problem is fully discretized, and conceptually consists of solving an independent matrix problem for each pencil given by The equivalent preconditioned system is given by For each pencil, this system is solved in the following manner: • The RHS vector F p is constructed by taking the pencil data from the RHS handler.
• The pencil's left-preconditioner is applied to F P to produce Fp .
• Since Lp is sparse and banded, it can be efficiently solved against Fp to produce Xp .
• The state-vector pencil is recovered from Xp by reapplying the right-preconditioner as and the result is assigned to the state-vector FieldSystem.
After the RHS is evaluated, this process is trivially parallelized over pencils, with each process performing a series of local sparse matrix solves for its local pencils.The sparsity and bandedness of the matrix Lp makes the linear solve an efficient process, executing in O(N c D ) time.Finally, although (P R ) −1 is a dense matrix, it never needs to be constructed, as reapplying the sparse P R matrix to the output of the linear solve reverses the implicit preconditioning of the unknowns.

C. Nonlinear boundary value solver
The nonlinear boundary value solver is instantiated from the nonlinear boundary value problem.It constructs the matrices Lp for each pencil and handlers for evaluating the expressions F and L • X.
The solver class contains a newton_iteration method, which performs a single iteration of Newton's method to move the state vector towards the nonlinear solution.Conceptually, the Newton step iteratively approaches the solution of the nonlinear problem by solving for the update δX n to the state vector that will cause the future state vector X n+1 = X n + δX n to solve the NLBVP when linearized around the current iteration X n : The Newton iteration begins by evaluating the RHS handlers for F and L • X and building the matrices Fp , which discretize the Frechet-derivative of F using the current state vector.For each pencil, the update is then determined in the following manner: • The RHS vector B p is constructed by combining the pencil data from the RHS handlers.
• The left-preconditioner is applied to produce Bp .
• The LHS matrices are combined to produce Ãp , which is solved against Bp to produce δ Xp .
• The right-preconditioner is applied to recover δX p .
• The state vector is updated as X p → X p + δX p .
We note that the sparse matrix being solved changes at each iteration, since it depends on the evaluation of the Frechet derivative at the current state vector.The magnitude of the perturbations can be monitored to determine when the solver has converged.Convergence can depend sensitively on the initial values of the state vector, but the iterations converge rapidly (quadratically) for sufficiently good starting positions.The initial conditions are set by modifying the fields in the solver.stateFieldSystem.

D. Eigenvalue solver
The eigenvalue solver is instantiated from the eigenvalue problem.It constructs the Mp and Lp matrices and solves the eigenvalue problem for a single pencil at a time, storing the resulting eigenvalues and eigenvectors.The class contains a set_state method which will set the solver's state vector to the specified eigenmode for visualization or further computation.
The solver class contains two methods for solving the generalized eigenvalue problem for a specified pencil, which conceptually takes the form 1. Dense solver The first is the solve_dense method, which converts the LHS matrices to dense arrays and uses the scipy.linalg.eig routine to directly solve the full generalized eigenvalue problem.This has the advantage of solving for all of the SN c D eigenmodes of the discretized system.However, the computational cost scales as O((SN c D ) 3 ), which becomes prohibitive at large resolutions.

Sparse solver
The second is the solve_sparse method, which solves for a subset of the eigenmodes near a specified target eigenvalue σ T .The generalized problem for the preconditioned matrices is first rearranged as a regular eigenvalue problem using a shift and inversion: A Scipy sparse linear operator is constructed to represent the left-side operator Ãp .This is applied to a vector by first applying Mp , and then solving the result against ( Lp + σ T Mp ).This generalized linear operator is then passed to the scipy.sparse.linalg.eigroutine, which uses the implicitly-restarted Arnoldi method in ARPACK to iteratively compute a specified number of eigenmodes with the largest magnitude λ.The right-preconditioner is applied to the resulting eigenmodes to recover X p , and the computed values of λ are inverted and shifted to recover the corresponding σ.
This shift-and-invert formulation allows using sparse regular eigenvalue solvers for the generalized eigenvalue problem, with the requirement that ( Lp + σ T Mp ) is full rank.

E. Initial value solver
The initial value solver is instantiated from the initial value problem with one of the timestepping classes as an argument, defining the integrator to be used to step the problem forward in time.The solver constructs the Mp and Lp matrices for each pencil and a handler for evaluating the RHS expressions F .
Conceptually, the discretized problem takes the form where the systems for different pencils are only coupled through the RHS terms.In general, M p may not be a fullrank matrix, due to the presence of constraint equations and boundary conditions.This system is integrated using mixed implicit-explicit schemes, where the LHS terms are integrated implicitly, and the RHS terms are integrated explicitly.The timestepping loop is written by the user, allowing for detailed control of and interaction with the model as timestepping progresses.

Initial conditions
The solver's initial state must be set before beginning a simulation.The solver state is stored in the solver.statefield system, and initial conditions are set by directly modifying the variables in this system before beginning integration, e.g.: When possible, it is best to begin a simulation with consistent initial conditions that satisfy the constraint equations and boundary conditions; see §XI E. Initial conditions that are inconsistent may introduce persistent errors or stability problems with some timestepping schemes.
Initial conditions can also be loaded from the analysis files produced by Dedalus ( §X) via the solver.load_statemethod.This is particularly useful for restarting simulations from a checkpoint saved by a previous simulation.

Time evolution
The step method of the initial value solver advances the state by one timestep, producing X n+1 from X n , where the superscripts denote the temporal iteration of the state vector.The method accepts the timestep dt as an argument.The method then gathers the state system, calls the specified integration routine to update the system, and scatters the updated state system back to the field objects.In general, the integration step will evaluate the RHS handler to perform the temporal integration and will simultaneously evaluate any other scheduled handlers (e.g.analysis tasks) attached to the evaluator.

Timestep determination
The implemented timestepping schemes accommodate changing the timestep between iterations during a simulation.While the user can implement any desired algorithm for determining the timestep, the CFL class in the dedalus.extras.flow_tools module can help determine what timestep might adequately resolve physical timescales in the evolving solution.The add_frequencies and add_velocities methods allow users to enter expressions corresponding to state-dependent frequencies and velocities of processes in their simulation, using the same string-based parsing system that is used to enter equations.CFL frequencies are derived from the entered velocities by dividing their values by the grid spacing.Internally, the CFL class builds an auxiliary handler to evaluate these frequencies at a specified cadence, and as the simulation runs the suggested timestep is determined via the compute_dt method as follows: • At each point on the grid, all of the specified frequencies are added.
• The maximum total frequency from the entire grid is taken and inverted to determine the CFL timestep.
• This timestep is then multiplied by a safety factor (typically 0.1 − 0.5), specified at the CFL instantiation with the safety keyword.
• The resulting timestep is then bounded to lie with absolute levels set by the min_dt and max_dt keywords, and a within relative factors of the previous timestep set by the min_change and max_change keywords.
• If the fractional change from the previous timestep to the newly determined timestep is smaller than the threshold parameter, the previous timestep is returned.
Otherwise, the newly determined timestep is returned.
The absolute limits can be useful to prevent the timestep from vastly overstepping relevant dynamics or grinding to a halt due to a spurious feature of the solution.The relative limits help prevent ill-conditioning that may occur for some schemes when the timestep varies too suddenly.The thresholding option allows the timestep to be frequently reevaluated but avoids modifying it by inconsequential amounts.This can have significant performance advantages since factorizations of the IVP matrices are stored and reused when the timestep remains the same between iterations.

Termination
To help determine when a simulations should terminate, the initial value solver implements the proceed property, which determines whether any of the following three criteria apply: • The simulation time has exceeded the value assigned to the solver.stop_sim_timeattribute.
• The wall time (in seconds) since the solver was instantiated has exceeded the value assigned to the solver.stop_wall_time attribute.
• The iteration count has exceeded the value assigned to the solver.stop_iterationattribute.
The wall-time stop is particularly useful for stopping simulations before hard time-limits on cluster job submissions have been reached, allowing for clean termination and potential post-processing of the data before a job is terminated by the system.A simple timestepping loop in an IVP can take the form: while solver .proceed : dt = CFL .compute_dt () solver .step ( dt ) This will continue timestepping until any of the specified stopping criteria have been reached, adjusting the timestep along the way via the CFL handler.

F. Timesteppers
Rather than implementing a single specific timestepping scheme, Dedalus implements general algorithms for applying mixed implicit-explicit (IMEX) multistep and Runge-Kutta integrators along with a range of specific integrators of each type.These IMEX schemes implicitly integrate the LHS terms and explicitly integrate the RHS terms.This provides temporal stability for linearly stiff equations without requiring iterative algorithms for integrating the nonlinear terms.

Multistep IMEX integrators
A general multistep IMEX scheme with s steps temporally discretizes preconditioned systems of the form of Eq. ( 83) into the general form The MultistepIMEX class implements this structure using double-ended queues to store CoeffSystems containing M X, L X, F, and dt for the s most recent steps.The class implements a step method, called with the latest timestep dt n−1 , which produces X n as follows: • The timestep queue is rotated with the newest value replacing the oldest and the scheme coefficients a j , b j , c j are evaluated using the timestep history.
• M Xn−1 and L Xn−1 are evaluated for all local pencils without building the dense inverse of the right preconditioner as e.g.

Lp Xp
• The RHS handler is evaluated and the data for each pencil is left-preconditioned and stored in the Fn−1 coefficient system.
• For each pencil, Ãn p is solved against Bn p to produce Xn p .A matrix solver which stores and reuses factorizations of each Ãn p can reduce the solve time if the coefficients a 0 and b 0 remain unchanged from the previous iteration.
• Applying the right-preconditioner recovers the state vector X n p .Specific multistep schemes are implemented as subclasses of the MultistepIMEX and define the scheme coefficients a j , b j , and c j via the compute_coefficients method.Dedalus currently implements a number of Crank-Nicolson leap-frog, Crank-Nicolson Adams-Bashforth, and semi-implicit BDF methods from Wang and Ruuth [91], ranging from first to fourth order schemes.The multistep methods only require a single evaluation of the RHS per iteration.However, since they depend on previous iterations, they cannot run full-order when beginning a simulation.Instead, each scheme falls back on lower-order schemes for the first s iterations of a simulation.Multistep schemes may also become ill-conditioned if the timestep is varied abruptly.

Runge-Kutta IMEX integrators
A general Runge-Kutta IMEX scheme temporally discretizes preconditioned systems of the form of Eq. ( 83) by constructing stages indexed by i = 1, ..., s as

Mp
Xn,i where Fn, j is evaluated at time t n, j = t n,0 + dtc j , X n,0 = X n , and t n,0 = t n .The H, A, and c tableaus define a specific scheme.This expansion is rearranged to sequentially solve for the stages as We implement "globally stiffly accurate" methods where the final stage is the advanced solution, i.e.X n+1 = X n,s and t n+1 = t n,s = t n + dt.These schemes do not require Mp to be full rank, which it generally is not for Dedalus problems with algebraic constraints and/or boundary conditions.The RungeKuttaIMEX class implements this structure using CoeffSystems to store M Xn,0 as well as L Xn,i and Fn,i for all of the stages.The class implements a step method, called with the timestep dt, which produces X n+1 as follows: • Mp Xn,0 is evaluated for all local pencils.
• Then for each stage i = 1, ..., s: -The RHS handler is evaluated and the data for each pencil is left-preconditioned and stored in the Fn,i−1 coefficient system.For each pencil, Lp Xn,i−1 is also evaluated.
-For each pencil, Ãi p is solved against Bi p to produce Xn,i p .A matrix solver which stores and reuses the factorizations of each Ãi p can be used to reduce the solve time if the timestep dt has remained unchanged from the previous iteration.
-The right-preconditioner is applied to recover X n,i p , which is assigned to the state-vector.The solver simulation time is set to t n,0 + dtc i .Specific multistep schemes are implemented as subclasses of the RungeKuttaIMEX base class and define the H, A, and c tableaus.Dedalus currently implements a number of first, second, and third-order methods from Ascher et al. [92] and Sprague et al. [93].A particular advantage of the Runge-Kutta methods is that they do not depend on any previous iterations of the state variables, so they can take full-order timesteps at the beginning of a simulation and trivially accommodate adaptive timestepping.The cost is that the higher-order schemes perform multiple evaluations of the RHS per iteration, but they tend to run stably with larger CFL safety factors than the multistep schemes.
The ease of switching integrators allows users to easily test a variety of schemes to find the best option for their particular problem.In addition, the multistep and Runge-Kutta base classes make it straightforward to implement new timestepping schemes.

G. Matrix solvers
Dedalus implements a generic interface for matrix solvers in the dedalus.libraries.matsolversmodule which simplifies the implementation and comparison of different routines.The primary routines are direct sparse matrix solvers from the SuperLU and UMFPACK libraries, wrapped through scipy.sparsepackage.For each library, we implement fast single-solve routines as well as routines that store and apply the factorized form of a matrix.These routines enable fast solves against multiple right-hand sides at the expense of initially computing the factorization.This is particularly useful for initial value problems where the timestep is fixed or varying slowly.
Additional routines implement a variety of algorithms specialized to banded matrices and block-diagonal matrices, which result from full-Fourier problems and can be efficiently inverted.The solver interface is designed to be easily extensible, allowing users to simply wrap and test new routines for specific problems.
The matrix solver routine can be specified when instantiating a Solver object, and the default can be set using the Dedalus configuration interface.

X. ANALYSIS AND POST-PROCESSING
The Dedalus handler system enables saving arbitrary analysis tasks while an initial value problem is running.This system utilizes the same symbolic parsing system as is used to specify equations and efficiently evaluates the analysis tasks alongside the RHS terms on a specified cadence.Post-processing tools simplify merging and interacting with the resulting analysis files.

A. File handlers
After building a initial value solver, instances of the FileHandler class can be attached to the solver's evaluator object to coordinate the periodic output of some simulation data to HDF5 files using the h5py library.Each file handler is instantiated with an output directory path and the cadence at which handler's tasks will be evaluated.This cadence can be in terms of any combination of simulation time (specified with sim_dt), wall time (specified with wall_dt), and iteration (specified with iter).Simulation time cadences are often useful for data analysis; wall time cadences are often useful for checkpointing, e.g.saving the full state of a simulation every hour.To limit the file sizes produced by the handler, the outputs are split up into different sets over time, each containing some number of writes that can be limited with the max_writes keyword.For example, to setup a file handler to be evaluated every few iterations: output = solver .evaluator .add_file_handler ( output , iter =5 , max_writes = 100 ) Multiple file handlers can compute and save different sets of tasks at different cadences.For example, you may want to occasionally save full copies of the state variables for checkpointing, more frequently save snapshots of some variables for visualization, and very frequently save scalar quantities such as the total energy in the simulation.

B. Analysis tasks
Tasks, or expressions to be computed and saved by the file handler, are added to a given handler using the add_task method.Tasks are entered in plain text and parsed using the same namespace that is used for equation entry.For each task the output layout, scaling factors, and a name can also be specified.For example, creating a task to evaluate the kinetic energy density of a flow might look like: For checkpointing, you can also simply specify that all of the state variables should be saved: output .add_system ( solver .state , layout = g )

C. Post-processing
By default, the output files for each file handler are arranged hierarchically as follows: 1.At the top level is output directory that was specified when the handler was constructed, e.g../output/ .
2. Within this directory are subdirectories for each set of outputs, with the same name plus a set number, e.g.output_s1/ .
3. Within each subdirectory are HDF5 files containing the local data for each process, with the same name plus a process number, e.g.output_s1_p0.h5.
Often it is preferable to deal with the global dataset when performing analysis or visualization in post-processing.The distributed process files can be merged into global files for each set using the merge_process_files function from the dedalus.tools.postmodule.For some analyses, it is additionally convenient to merge the output sets together into a single file that is global in space and time, which can be done with the merge_sets function.However, this can generate very large files, and is not usually necessary for analyses that are local in time, e.g.individually plotting each output of a task.To assist with performing such tasks in parallel, the visit_writes function will coordinate all available processes to apply a given function to each output across all sets from a handler.
Together, the symbolic specification of analysis tasks and helper functions for merging and interacting with the output files can dramatically simplify user interactions with simulation products.High-level plotting functions for plotting slices of fields and tasks are implemented in the dedalus.extras.plot_tools module, and example scripts utilizing these tools to construct visualizations in parallel are available.The HDF5 output file format was chosen because it is widely used in the scientific community, and allows users to easily examine and visualize simulation outputs using a wide variety of tools and languages.

XI. BENCHMARKS AND EXAMPLES
We demonstrate the features and performance of the Dedalus codebase with a variety of examples involving different types of PDEs from a variety of fields.The examples and some of the unique features that they demonstrate are: • Parallel scaling: strong scaling test of an incompressible hydrodynamics IVP across many nodes.
• Kelvin-Helmholtz: accuracy benchmark of compressible hydrodynamics with a finite-volume code.
• Nonlinear Schrödinger Network: complex-valued PDE on a network of 1D segments, programatic extension of Dedalus interface to form spectral element method.
• Quasigeostrophic Flow: asymptotically reduced equations for rotating incompressible flow, LBVP to balance initial conditions.
• Cylindrical Stokes Flow: low Reynolds-number flow in an annulus using polar coordinates and non-constant coefficients.
• Atmospheric Waves: NLBVP to solve for the structure of an atmosphere with radiative diffusion, EVP to examine normal modes.
Example scripts for these problems are available online7.The project gallery8 is updated with user contributed examples on an ongoing basis.

A. Parallel scaling
A parallel scaling suite9 for Dedalus is publicly available.Fig. 8 shows performance and parallel scaling results from 32 to 2048 cores for 3D Boussinesq hydrodynamics and magnetohydrodynamics simulations in Fourier-Fourier-Chebyshev domains.The tests were run on the Flatiron Institute's Popeye cluster10 using 32 Intel Ivy Bridge cores per node.
The code's speed, as measured in mode-iterations per coresecond, is plotted against the number of cores, as measured in pencils per core.The plateaus for each resolution indicate the regions of ideal strong scaling; large 3D problems are able to scale efficiently to thousands of cores.The roll off at high core counts indicates the end of efficient strong scaling: the parallel efficiency of Dedalus typically remains above 50% down to 8 pencils per core.The ratios of the plateau values for different resolutions indicate that the weak scaling efficiency is proportional to 1/log N, as expected for FFT-based computations.Boussinesq hydro with 8 variables is twice as fast as Boussinesq MHD with 16 variables, indicating that execution time scales linearly with the number of problem variables.codes simulated the Kelvin-Helmholtz instability in a moderate Mach-number compressible flow.At low-to-moderate resolution, numerical errors from the finite-volume method can cause unphysical secondary instabilities to develop within the rolls created by the flow.By directly comparing the nonlinear evolution of the flows at late times, the authors found that the finite-volume method requires a resolution of 16384 2 cells to avoid these spurious instabilities and achieve the same accuracy as Dedalus at a resolution of 2048 2 modes (Fig. 9).This test demonstrates the power of high-order methods for solving PDEs with smooth solutions.At low-to-moderate Mach numbers with finite dissipation, the flow solution lacks strong shocks and its spectral expansion converges rapidly.Generally, for incompressible and low-Mach-number flows in simple geometries, the rapid convergence of spectral methods outweighs their larger per-iteration computation cost, making them the ideal method for simulating a broad range of astrophysical and geophysical flows.

C. Nonlinear Schrödinger Network
The nonlinear Schrödinger equation (NLS) is classical field equation describing the dispersive behavior of wavepackets in a weakly nonlinear medium [94].The focusing NLS is a PDE for the complex-valued field ψ given by In this example, we simulate the 1D NSE on a quantum graph, i.e. a network of connected segments with differential equations (see Berkolaiko and Kuchment [95] and Noja [96] for applications).This is achieved using a Dedalus domain with a single Chebyshev segment but separate variables for the solution on each segment, each governed by the NSE and coupled through their boundary conditions to mimic the network.This demonstrates how the parsing system in Dedalus can be combined with a simple Python workflow to mimic a spectral element method by "connecting" distinct domains via boundary conditions.We begin by loading any user-defined network defined by the planar positions of N v -many vertices and a list of N emany directed edges connecting two vertices.We then build a domain with a single 1D, 64-mode Chebyshev basis, which serves as the underlying elemental basis for the spectral element method.We define an IVP by constructing variables and adding equations for each edge.This is done by looping over edges to create variable names encoding the edge index and using Python string substitution to insert these names into the We encode edge-neighbor information using the graph incidence matrix.Continuity of the solution at each vertex is imposed by iteratively matching the solution at each incident edge: Finally, Kirchoff's law for conservation of flux is imposed at each vertex by requiring the sum of the gradients of the solution on each incident edge, weighted by the edge lengths and signed by whether the edge is incoming or outgoing to that vertex (σ e = ±1), to be equal to zero: A bright soliton is placed on a single edge as the initial condition, and the problem is integrated forward in time using the SBDF2 timestepper.The soliton translates to the end of the segment then scatters and disperses into the other segments and eventually fills the graph.Snapshots of the evolution are shown in Fig. 10.This example demonstrates the composability of the Dedalus API and the advantage of working within a high-level Python environment.

D. Orszag-Tang Vortex
Spectral methods can also correctly solve for high-Machnumber flows which develop shocks, provided that diffusion is introduced in the simulation to regularize the shocks.The Orszag-Tang vortex problem [97] is a standard compressible magnetohydrodynamics (MHD) test problem in astrophysics (e.g. for the fixed-grid Godunov code Athena [98,99], the finite difference code Flash [100], smoothed-particle hydrodynamic codes [101], moving mesh codes [102], etc.).The problem can be simulated with shock-capturing algorithms (e.g.Riemann solvers) or those with numerical diffusivity which can regularize the shocks (e.g.SPH).Here, we explicitly add diffusion of momentum, heat, and magnetic fields to the model to regularize the shocks.
We note that a diffusive shock creates entropy at a rate that is independent of (but mediated by) the microphysical diffusion.For example, in Burger's equation, ∂ t u + u∂ x u = ∂ x (ν∂ x u), the dissipation rate across a shock is The leading order dissipation is independent of the viscosity, and the correction is on the order of the inverse Reynolds number with logarithmic corrections.Numerical simulations of shocks should therefore give results that are mostly independent of the viscosity, provided the Reynolds number is large enough.This principle underlies shock-capturing algorithms, but can also be leveraged in spectral simulations.Because of the weak dependence on the Reynolds number, diffusivities much larger than the natural values can be used to regularize shocks while still respecting important global balances.As long as the resulting diffusive-shock length scale ∆ ∼ ν/∆u is resolved, a spectral computation will be free of Gibbs ringing and produce accurate results.We simulate the Orszag-Tang vortex in a 2D domain which is periodic in both x and y directions, and comprises [0, 1] 2 .A vortex with velocity u = (− sin(2πy), sin(2πx)) is initialized in an ideal gas with constant density and pressure, ρ = 25/(36π) and p = 5/(12π), and ratio of specific heats γ = 5/3.The initial magnetic field is specified by a vector potential A z = B 0 (cos(4πx)/(4π) + cos(2πy)/(2π)), with B = ∇×A, and B 0 = 1/ √ 4π.The adiabatic sound speed γp/ρ = 1, so the flow is supersonic in parts of the domain, which leads to the formation of MHD shocks.We solve the equations  Here, ν, χ, and η are the viscosity, thermal diffusivity, and magnetic diffusivity, T 0 = 1/γ is the background temperature, T is the temperature perturbation, and Υ = log ρ.The heat capacity at constant volume c v = (γ − 1) −1 = 3/2 normalizes the conduction and heating in the energy equation for T .In the nonlinear viscous terms, the symmetric stress tensor S = ∇u + (∇u) T − (2/3)∇•u I.This is an MHD-analog of the equations introduced in Lecoanet et al. [18].
We run the simulation at rather high resolution, using 4096 modes in the x and y directions.We use 3/2 dealiasing in each direction, although this does not eliminate aliasing errors from converting from Υ to ρ.We set all the diffusivities equal to 10 −4 .For timestepping, we use a third-order, four-stage DIRK/ERK method (RK443 of Ascher et al. 92), with an initial timestep size of 2.5 × 10 −5 .We use adaptive timestepping based on the CFL condition associated with both flow and Alfvén speeds (but not the sound speed), with a safety factor of 0.6.The simulation is run to t = 1.
Fig. 11 shows the density at t = 0.5.There are very sharp density gradients due to the formation of shocks.Even more impressive, Dedalus is able to correctly follow the interaction between multiple shocks, which produces the features in the upper left and lower right corners of the figure.These are similar to what is found in high resolution simulations with shock-capturing codes [e.g.99].
To better visualize the shocks, we plot a horizontal profile of the density at y = 0.3125 in Fig. 12.At this height, there are shocks at x ≈ 0. very close to x = 0.7.The shock is well-resolved by 4-5 grid points.Note that the density stays smooth despite no diffusion in the density equation.Rather, it appears that the combination of viscosity and thermal diffusivity cause the temperature and pressure to regularize, which in turn regularizes the density via the ideal gas equation of state, p = ρT.

E. Quasigeostrophic Flow
One of the original motivations behind Dedalus was to allow the straightforward computation of non-standard equation sets.It is a common practice for modelers to start with something universal, such as Navier-Stokes, or Maxwell Equations, make a series of asymptotic reductions, and arrive at a new set of equations that can capture an interesting physical regime.It is often not clear how useful these equations are until they are simulated.But it is risky to write "off-the-shelf" solvers for new equations before there is good evidence that they are useful to at least a few people.
The Quasigeostrophic (QG) model is a classic approximation used in geophysical and astrophysical fluid dynamics.Initially, the QG model was intended to reduce the overall cost of simulations.The idea is to filter sub-dominant fast-timescale waves, while retaining essential nonlinear dynamics.Current state-of-the art simulations no longer require the QG assumption on the basis of cost.However the equations are still used widely because of their relative simplicity and explanatory power.QG is a non-standard model that became widely adopted.Since the advent of QG, other strongly nonlinear reduced models appear in different fields occasionally, but computational results can lag by several years.For example, in stratified fluids [18,103,104], in magnetized fluids [105,106], in Langmuir turbulence [107,108], in near-inertial wave dynamics [109][110][111], and in rapidly rotating fluids [112][113][114] (theory) [93,115,116] (simulation).
The solution of the QG equation has a long history starting with numerical weather prediction in the 1950s [117,118].
Under certain restrictive assumptions, the QG problem can be reduced to 2D equations on the surface [e.g.119,120], or three-dimensional equations in a triply periodic domain [e.g.121,122].However, the most general case requires a three-dimensional layer with boundaries [e.g., 123], and nonconstant coefficients.In our current work, we pick QG as an interesting example problem because it highlights many features unique to Dedalus.We choose the most difficult and general case of a finite 3D layer because it illustrates the most design features within a single model.Like previous work, it would be straightforward to simplify our example script to model a triply periodic domain, or a two-dimensional layer, if one wanted.
The traditional QG equations collapse to a single, secondorder equation for the potential vorticity (PV).This is equivalent to a first-order formulation with two dynamical variables and two first-order equations.A first-order formulation does not increase the computational cost because the second-derivative matrix has double the bandwidth of the firstderivative matrix.Dedalus can use the traditional PV formulation in terms of the streamfunction and its vertical derivative.Here, we describe an alternative formulation choosing more physically meaningful variables, which makes the problem more straightforward to pose and generalize; at no additional cost.
We solve for two variables: w (upwelling) and p (pressure).All other physical quantities (e.g.buoyancy and velocity) are diagnostic in terms of w and p.The upwelling field is not part of the traditional formulation.Although we solve for two variables instead of a single variable (PV) in the traditional formulation, using the vertical velocity makes the boundary conditions much simpler; we do not need to solve a nonlinear buoyancy advection equation on the boundary.
We use Dedalus substitutions to define several of the important physical variables in terms of the pressure, p, Dedalus substitutions can be recursive, e.g., to define ζ.The substitutions reflect the well-known geostrophic and hydrostatic diagnostic balances inherent in the theory.Pressure assumes the role of the stream function, ψ.We subsequently define the advection substitution with an argument, We also use substitutions to prescribe horizontal 4th-order hyperdiffusion, The problem parameters are the Coriolis variation β, the Ekman friction γ, the stratification profile Γ(z), the thermal-wind profile U(z), and the hyper-diffusivities ν 4 and κ 4 .We solve the coupled prognostic equations, The traditional formulation solves the entire buoyancy evolution equation on the boundary.This is a complicated way to impose no vertical flow.Because the upwelling is part of our formulation, we impose the boundary condition directly.More precisely, we impose Ekman flux boundary conditions.
Ekman pumping conditions result from a closure model of an asymptotically thin viscous boundary layer [124].We could easily include boundary stress or topography on the right-hand side.
The −U shear term in the bulk equation comes from a large-scale background thermal-wind profile where Θ(y, z) = y S(z) is a linear north-south-varying buoyancy profile.Unstable modes extract energy from this thermal variation.The evolution equations are a coupled pair of 1st-order equations in ∂ z for p and w.There is only one independent time derivative as ζ and θ are both related to p, as is PV Looking at the diagnostic balances, we can see that This is the "thermal-wind" balance that follows from geostrophy and hydrostatic balance.The vertical velocity field, w, satisfies its own diagnostic balance at each time step, which removes non-balanced forcing terms.The vertical velocity here is exactly analogous to the pressure field for incompressible hydrodynamics; it is a Lagrange multiplier enforcing the consistent evolution of ζ and θ, which are both related to the pressure.
Balanced initial conditions -We initialize the pressure with filtered random noise on the grid; vorticity and buoyancy follow directly.In practice, this is likely to be sufficient to initialize the simulation.Some timesteppers may correct for unbalanced initial conditions.For completeness, we illustrate how to solve a Linear Boundary-Value Problem (LBVP) for the upwelling given an initial pressure field.
Knowing p(t = 0, x, y, z), we set up the coupled problem for w(t = 0, x, y, z): We could collapse these to a single system for w.However, we can mostly reuse the original system if we introduce a slack variable, t .The slack variable takes the place of the time evolution terms in the full system.Solving this LBVP gives the vertical velocity at t = 0 and t = ∂ t p| t=0 .This general approach can be used to derive balanced initial conditions for other equations with time-independent constraints, e.g., incompressible hydrodynamics.The simulation -We non-dimensionalize z with the domain depth, H.We non-dimensionalize x, y with the Rossby radius of deformation at the top of the box, L = N H/ f , where N is the characteristic background buoyancy frequency at z = H, and f is twice the background planetary rotation rate.We solve for the relative nonlinear fluctuations around a linearly unstable thermal wind profile; U(z) = z, U (z) = 1.We non-dimensionalize time using the background shear rate.Here we instead use a depth-dependent non-dimensional stratification parameter, which necessitates a fully three-dimensional approach.
Dedalus expands this profile in a Chebyshev series up to a cutoff tolerance of 10 −8 .The presence of the shear implies a large-scale background buoyancy gradient.We include a background planetary vorticity gradient (β-effect) with β = 0.1.To saturate the simulation we use hyper-diffusivities ν 4 = κ 4 = 10 −6 , bottom friction γ − = 0.16, but no top friction.The horizontal domain size is 40 × 20 Rossby radii in the x, y directions respectively.We use 256 × 128 Fourier modes in the x, y directions, and 32 Chebyshev modes in the z direction, all using 3/2 dealiasing.The Chebyshev tau method balances energy to exponentially high accuracy; exact energy conservation could be still be imposed, however, by formulating a self-adjoint system with an alternative orthogonal polynomial basis [60,125,126].
The whole configuration is baroclinically unstable.The motion is a kind of side-ways convective heat transport.For background, we highly recommend the book "Atmospheric and Oceanic Fluid Dynamics" by Geoffrey K. Vallis [127].To summarize the phenomenology: "It is the instability that gives rise to the large-and mesoscale motion in the atmosphere and ocean -it produces atmospheric weather systems, for example -and so is, perhaps, the form of hydrodynamic instability that most affects the human condition." We compute the nonlinear evolution until the system reaches a statistically stationary state, which takes roughly 75 dynamical time units.Fig. 13 shows the PV and vertical vorticity ζ at t = 200.The solution contains a sea of compact eddies that form, merge, and breakup over several dynamical times.For these parameters, the system is mostly two-dimensional.There are, however, nontrivial variations in the depth-dependence of PV.

F. Stokes Flow
Many important biophysical and industrial fluid problems occur in a high-drag (or low Reynolds number) limit (u 0 0 /ν = Re 1) where the momentum equation reduces to Stokes flow.Here, we present and solve a simple test problem demonstrating boundary-driven Stokes flow in curvilinear geometry with time-dependent tracer fields.
We simulate the classic "unmixing" demonstration [e.g.128], where a Taylor-Couette device with an inner-cylinder radius, R in , turns n times and then reverses back to the start.For low Re, the flow is reversible, and m-many dye tracers will appear to mix and then unmix.We non-dimensionalize lengths with the gap width, 0 = R out − R in , and velocities with the maximum inner-cylinder speed, u 0 .The Peclet number u 0 0 /κ = Pe controls the tracer mass diffusion.Taking Re → 0, The domain is a no-slip two-dimensional cylindrical annulus with radius 1 ≤ r ≤ 2 and angle 0 ≤ θ < 2π.The diffusive dye flux F m = −∇c m /Pe = 0 at the boundaries.The flow vanishes at the outer wall; u(t, R out , θ) = 0.The tangential velocity at the inner wall is a time-dependent regularized square wave u θ (t, R in , θ) = arctan(50 sin(t/n rot )) arctan( 50) .
The time-dependent boundary condition uses the GeneralFunction mechanism described in §VII E. The initial dye fields are circular Gaussians with widths δ = 1/4 centered at r dye = 3/2, θ m = 2πm/3.Since the domain is a cylindrical annulus, which excludes the origin, it can be accurately discretized using a direct product of Fourier and Chebyshev bases in θ and r, respectively.For simulating the full disk, non-direct product bases imposing certain regularity conditions at the origin are necessary [59] (these will be implemented in future versions of Dedalus).
In polar coordinates, the differential operators ∇, ∇ 2 contain non-constant coefficients proportional to 1/r and 1/r 2 multiplying the radial and azimuthal derivatives, which can simply be added when entering the equations.Naively (for example) one might implement the angular part of the Laplacian as d( u,th=2)/r**2 where th is the θ spatial variable.However, the Chebyshev expansion r −2 converges very slowly, leading to dense non-constant coefficient matrices.To avoid this, we multiply all equations by r or r 2 (depending on the order of derivatives).The resulting non-constant coefficient matrices only contain bandwidth one or two.
We solve this system with 512×512 modes using the Runga-Kutta 443 time stepper and a fixed dt = 0.005.The simulation  runs approximately four rotations forwards and backwards; n rot = 8.Fig. 14 shows snapshots of the flow at five times, symmetric about the middle of the simulation.The dye spreads out into thin sheets as the inner cylinder rotates in the counterclockwise direction; the sheets congregate back, differing from the original Gaussian blobs due to the action of the dye.The final shapes are not Gaussian.Shear dispersion leads to thin structure (sharp gradients) in the radial direction, and hence the diffusion is not isotropic.This simulation uses high Pe = 10 7 , causing the sheets to become very thin.The dye-patches area stays constant in the absence of diffusion.Equating the initial and final area A 0 ∼ πδ 2 /4 ≈ A f ∼ n rot 2πr dye δr, implies the sheet width δr ∼ 1/200.For the diffusion time across the sheets to be greater than the total simulation time, τ D 2n rot 10, we require Pe 10/δr 2 ∼ 4 × 10 5 .The example code also works with Pe = ∞, disabling the ∇ 2 c m /Pe terms and flux boundary conditions.In this case, the well-resolved sheets return to the original Gaussian with only slight time-stepping errors.

G. Atmospheric waves
Stratified (non-rotating, neutral) atmospheres support two classes of linear waves: acoustic waves and gravity waves.The properties of these waves are easy to compute in constant coefficient atmospheres (like isothermal atmospheres), but require numerical solutions when the atmosphere structure introduces non-constant coefficients.Solving for the nonlinear atmospheric background structure is challenging.Here we use a nonlinear boundary value problem (NLBVP) to solve for an atmosphere profile, and then solve an eigenvalue problem (EVP) for the wave modes.
We study an optically thin plane-parallel atmosphere coupled to an underlying, optically thick adiabatic layer.This models the Sun's photosphere and lower atmosphere, or Jupiter's atmosphere spanning the radiative-convective boundary and the lower stratosphere.For the background structure, we solve hydrostatic and thermal equilibrium, including radiation transport under the Eddington tensor approximation [e.g., [129][130][131][132]: with P g the gas pressure, P r the radiation pressure, ρ the density, g the constant gravity, κ the opacity, F r the constant radiative flux, and c the speed of light.We use an ideal gas equation of state P g = ρRT with R the gas constant and T the temperature, the Eddington tensor closure P r = f aT 4 with a the radiation constant and f = 1/3, and a Kramer-like opacity law for κ: With these substitutions, and non-dimensionalizing by ρ 0 , T 0 , and a lengthscale L, the equations become where g * = gL/RT 0 , F Edd = gc/κ 0 , and Q = ρ 0 κ 0 F r L/4 f acT 4 0 .For F F Edd , the optically thick deep solution will be nearly polytropic with ρ ∝ T m where m = (3 − b)/(1 + a).We fix a = 1, b = 0 and take g * = m + 1 to define the lengthscale L. Q is taken to be 1 − 1.68 × 10 −4 , corresponding to ln(T eff /T 0 ) = −2 in a gray atmosphere.We solve this nonlinear boundary value problem in Dedalus using ln ρ and ln T as dynamical variables with 128 Legendre modes.A dealiasing factor of 2 is used, which reduces but cannot eliminate aliasing errors since exponential nonlinearities (formally including infinite polynomial orders) are present in the formulation.Solutions converge to a relative tolerance of 10 −8 in O(10) Newton iterations.When F = 0 there is an analytic solution due to Brandenburg [133] which our numerical solution matches to 10 digits in the L 2 norm, verifying the correctness of the implementation.We proceed here with F/F Edd = 10 −5 , for which there is no analytic solution.
We plot the temperature, density, and pressure in Fig. 15.The bottom, optically thick, part of the atmosphere is nearly polytropic with ρ ∝ T (3/2) , as expected for this choice of a and b [134].The top, optically thin, part of the atmosphere is nearly isothermal while the density and pressure drop exponentially.This region has a constant Brunt-Väisälä (buoyancy) frequency Next we consider the oscillation modes of this atmosphere.The character of waves modes depends on their frequency relative to ω ± , which are defined as where the Lamb frequency ω L and acoustic frequency ω ac are related to the properties of the background atmosphere [e.g., 135].Acoustic waves have frequencies greater than ω + , while gravity waves have frequencies less than ω − .We plot ω ± for our atmosphere in Fig. 15.High-frequency sound waves can propagate all the way to the bottom of the atmosphere, whereas gravity waves mostly stay in the optically thin isothermal layer.We write the ideal, linearized fully compressible equations as ∂ t w + ∂ z T 1 + T 0 ∂ z ln ρ 1 + T 1 ∂ z ln ρ 0 = 0, (126) ∂ t u + ∂ x T 1 + T 0 ∂ x ln ρ 1 = 0, (127) ∂ t ln ρ 1 + w∂ z ln ρ 0 + (∂ x u + ∂ z w) = 0, (128) ∂ t T 1 + w∂ z T 0 + (γ − 1)T 0 (∂ x u + ∂ z w) = 0, (129) for the perturbation velocities w and u, and thermal and log density perturbations T 1 and ln ρ 1 .The non-constant coefficients of the atmosphere are ∂ z ln ρ 0 , T 0 and ∂ z T 0 and we take γ = 5/3.For simplicity, we impose impenetrable top and bottom boundaries (w = 0).We formulate this as an eigenvalue problem in Dedalus by replacing ∂ t = iω and ∂ x = −ik x and solving for complex eigenvalue ω given specified horizontal wavenumbers k x .We use the eigentools12 package to automatically test whether eigenvalues are numerically converged using the techniques described in Boyd et al. [78, ch 7.5].As expected, we find about 50% of the eigenvalues are converged for sufficiently high resolution.Here we solve with 256 Legendre modes for twenty distinct k x , themselves logrithmically spaced.Legendre expansions produce optimal polynomial approximations in the L 2 norm and may be preferable to Chebyshev expansions in problems where the lack of a fast transform is unimportant, such as dense eigenvalue problems.
Fig. 16 shows the frequencies (or equivalently periods) of the wave modes calculated with the Dedalus eigenvalue solver.Because sound waves have frequencies greater than ω + and gravity waves have frequencies less than ω − , we also plot the minimum of ω − and maximum of ω + over the vertical extent of the atmosphere (both of which occur in the isothermal layer).This allows us to easily distinguish sound waves from gravity waves.We also find that, at fixed horizontal wavenumber, the frequency spacing between sound waves is about constant, whereas the period spacing between gravity waves is about constant.This well-known property follows from the dispersion relation in the large-vertical-wavenumber limit.
Fig. 17 shows vertical velocity eigenfunctions w, scaled by √ ρ.These modes correspond to the marked modes in Fig. 16.
All eigenfunctions have been normalized by the mode kinetic energy: We find that the eigenfunctions follow the intuition of the propagation diagram in Fig. 15.Gravity modes are trapped in the upper atmosphere, while acoustic modes span a larger region of the full atmosphere with the highest frequency modes reaching the bottom of the domain.The effects of the non-constant coefficients are visible in the acoustic modes which have a varying vertical wavelength in the deep adiabatic interior and a nearly constant vertical wavelength in the upper isothermal region.This example of wave eigenfunctions and eigenvalues demonstrates the ability to link a complex atmosphere with detailed solves.This example considered ideal waves in a bounded atmosphere, but can be easily extended to nonadiabatic waves with thermal damping, to magnetohydrodynamic waves propagating through a background magnetic field, and to systems with open boundary conditions.

H. Diamagnetic Levitation
This example computes the motion of a levitating rigid body under the competing influence of gravity and an imposed diffuse background magnetic field.There are three general classes of magnetic levitation: (I) AC Electromagnetic suspension of a conductor relying on an alternating background field; (II) Motion-induced suspension such as Maglev trains and spinning tops; and (III) diamagnetic levitation, which results from magnetically induced eddy currents from a DC field within special types of macroscopic media.Diamagnetic levitation is usually the weakest form, but superconductivity provides the important exception.Past experimental work on diamagnetic levitation used an approximately 16 Tesla solenoid to suspend a frog quasi-stably (and mostly safe for the frog) [136].
For context, magnetic levitation of any kind is actually a rigid-body version of magnetohydrodynamic buoyancy (or Parker instability), which are well-know in astrophysics.[137].Maglev train physics is very similar to differential-rotation induced magnetic buoyancy; which is important in stellar and  Gravity modes are almost entirely confined to the isothermal portion of the atmosphere and are evanescent in the adiabatically stratified deep interior.Acoustic modes propagate in both regions, with high frequency waves propagating deeper.
planetary interiors [138].All types of levitation would be amenable simulation in Dedalus; including magnetohydrodynamic buoyancy.Here we focus exclusively on solid-body diamagnetic levitation.Electrodynamics -We use the (non-relativistic) Maxwell's equations for the magnetic flux density, B; electric field, E; magnetic field, H; and current density, J.
Closing the system requires a constitutive relation between H and B, and Ohm's Law between E and J (in a frame moving with velocity v), The constitutive parameters are the resistivity ρ (inverse electrical conductivity), the magnetic permeability µ, and the diffusivity η = ρ/µ.We use a magnetic potential B = ∇ × A to enforce the magnetic Gauss's law ∇ • B = 0.In two dimensions we can use a scalar magnetic potential (three dimensions would require a gauge choice).Then Faraday's law of induction is where B x = ∂ y A and B y = −∂ x A.
Eulerian velocity -We embed a freely movable (yet coupled) rigid body within the neutral background.First, the rigid-body translation and rotation produce the local Eulerian "fluid" (continuum) velocity if (x, y) ∈ M(x 0 , y 0 , θ), then v x (t, x, y) = x 0 (t) − (y − y 0 (t)) θ(t) (136) v y (t, x, y) = y 0 (t) + (x − x 0 (t)) θ(t) The set M(x 0 , y 0 , θ) represents the points inside the body with center of mass x 0 (t), y 0 (t) and orientation angle θ(t).The Eulerian velocity vanishes outside the body.Mask function -We use a smooth mask function (rather than strictly binary) to indicate the solid-object region.We model our solid as an ellipse with 2:1 semi-major/semi-minor axis ratio.We use an adjustable error function profile for the mask function The smoothed-mask function approach is called the "smoothed-volume-penalty method" (SPV).Recent detailed analysis shows this method is quite competitive with all other techniques for treating moving solid objects.Also, the smooth transition region aids in controlling error compared to an abrupt transition [62].
Coupled ODEs -Translation and rotation about the center of mass follow a set of ordinary differential equations describing Newton's laws of conservation of momentum and angular momentum.
The parameters m and I are the object's mass and moment of inertial; g represents gravitational acceleration.For the ellipse I/m = π a b (a 2 + b 2 )/4.Solving this system of 6 ODEs requires computing the total force, Fx , Fy , and torque, τz .Each depends on the background magnetic field and material properties of the rigid body.
Lorentz force -We compute the total force on the object by integrating the Lorentz force density The force on the rigid body is The gradient of the permeability makes sense because the material is highly localised, yet infinitely differential via Eq.(138).Magnetic susceptibility -For most everyday substances (e.g., frogs) the magnetic permeability is extremely close to that of free space, µ 0 .Defining Figure 18.Evolution of a diamagnetic object levitating in a magnetic field.The purple ellipse shows the solid-body region.The black lines show magnetic field lines the same way iron filings trace the magnetic field from a strong bar magnet.Green dots show the timedependent trajectory of the ellipse's center of mass at equally-spaced time intervals.
we assume | χ| 1 (e.g., graphite has χ ≈ −2 × 10 −5 ), This simplifies our equations considerably.This assumption would not be justified for a partial superconductor where χ ≈ −1.In the case of the force and torque τ ≈ − χ ∫ |B| 2 2µ 0 (r − r 0 ) × ∇M ε dx (147) Considering a negative upward magnetic pressure gradient, it follows that levitation requires χ < 1.The proportionality of the force to χ shows the need for very strong magnetic fields.Simulation -We solve the magnetic induction equation in non-moving electrically neutral background.Most of the domain is filled with a harmonic magnetic field to a good approximation; like the air around us.We non-dimensionalize the kinematics of the simulation based gravity and the semiminor axis of our elliptical rigid body.Therefore g = 1, a = 2, and b = 1.We choose a mask smoothing parameter ε = 0.025.We solve the magnetic induction equation under the above approximations in a rectangular 16 × 8 domain.We use a Fourier series with 1024 modes for the x direction and a Chebyshev series with 512 modes for the y direction.In both directions we use the 3/2 dealiasing rule.We fix the vertical magnetic field at the bottom of the domain, and allow a free vacuum field at the top.We enforce the top boundary condition via the Hilbert transform in the x direction A = − B 0 k sin(k x) at y = y 0 (148) B x − H x (B y ) = 0 at y = y 1 (149) The simulation starts with a harmonic field satisfying the boundary conditions.We release the solid ellipse from rest with an initial 30degree tilt.Fig. 18 shows the ellipse and magnetic field at three representative times in its evolution.The object free falls until it encounters a strong enough magnetic pressure gradient to rebound.Along the way, some of the background field diffuses into the object.As the object bounces upward, the captured field resists and eventually halts the rebound.The process continues until the object is mostly sliding and rotating along the top of the magnetic arches.

XII. CONCLUSION & OUTLOOK
The integrity and reproducibility of computational science relies critically on robust, open-source, and well-supported code.We have introduced Dedalus, a public Python framework for solving PDEs using spectral methods with an interdisciplinary community of users and developers.
Dedalus enables users to construct custom domains using the direct product of spectral series, to symbolically specify systems of equations and boundary conditions, and to perform custom data analysis.Dedalus supports initial value, eigenvalue, and linear and nonlinear boundary value problems.The solution of these problems is automatically parallelized using MPI, allowing for seamless scaling from individual laptops to supercomputers with tens-of-thousands of cores.The Dedalus distribution includes simple example scripts to help users become familiar with the code's features.In this paper, we have included a diverse range of example problems demonstrating more advanced features and the code's adaptability to many different physical models.
We are committed to continually enhancing and optimizing Dedalus and supporting its users.Substantial extensions to the codebase are currently underway.These include support for multiple coupled dimensions, coordinate-free equation entry, enhanced stand-alone data analysis, and non-direct-product bases for tensorial quantities in curvilinear coordinates (particularly full disks, spheres, and balls [59][60][61]).We actively help users to troubleshoot problems and formulate Dedaluscompatible models on our public, searchable mailing list.Our goal for the future is to continue growing through community development and to provide robust tools for a wide range of scientific applications.

Figure 1 .
Figure 1.Chebyshev polynomials (a) can be viewed as cosine functions (b) drawn on a cylinder and projected onto the bisecting plane (c).

Figure 2 .
Figure 2. Derivative matrices using (a) Chebyshev-T and (b) Chebyshev-U polynomials as test functions.Using different families of test and trial functions allows general differential operators to be represented with sparse matrices.

importFigure 3 .
Figure 3. Conceptual stages in matrix construction for the Chebyshev discretization of Poisson's equation with Dirichlet and Neumann boundary conditions (matrix entries colored by sign).(1) Original (T-to-T) matrices with block columns corresponding to u and u x , and block rows corresponding to the LHS of ∂ x u x = 0 and ∂ x u − u x = f (x).The T-to-T differentiation matrices are dense upper triangular.(2) After T-to-U conversion and the addition of boundary conditions via the tau method.The derivatives are sparse and the identity-block bandwidths increase slightly.The boundary conditions involve all coefficients of u and u x .(3) After Dirichlet recombination.Boundary rows are sparse and the equation-block bandwidths increase slightly.(4) After reversing the Kronecker product to group by modes rather than variables.This final matrix is highly sparse and completely banded.

Figure 4 .
Figure 4. Example collocation grids for the implemented bases.The Fourier basis grid is evenly spaced on a periodic interval.The Chebyshev and Legendre basis grids cluster near the ends of a finite interval.The Hermite basis grid spreads out over the real line.The Laguerre basis grid spreads one way over the half line.The Compound basis concatenates other bases on adjascent segments; e.g. three Chebyshev segments.

#
Substitution defining the kinetic energy # density for a 3D fluid simulation with # density rho and velocity (u ,v , w ) .problem .substitutions [ KE_density ] = \ " rho * ( u * u + v * v + w * w ) / 2 " # Substitution defining the Cartesian Laplacian # of a field .Here A and Az are dummy variables # that would be replaced by simulation variables # in the equations .Note the system is written # in first -order form along the z dimension .problem .substitutions [ L (A , Az ) ] = \ " dx ( dx ( A ) ) + dy ( dy ( A ) ) + dz ( Az ) "

Figure 8 .
Figure 8. Dedalus performance (mode-iterations per core-second) vs parallel scaling (pencils per core) from 32 to 2048 cores for 3D Boussinesq (a) hydrodynamics and (b) magnetohydrodynamics. Colors correspond to different 3D (Fourier-Fourier-Chebyshev) resolutions.For each resolution, the open circle indicates the run on 512 cores, with the core count increasing by factors of two to the right.Efficient strong scaling is seen down to roughly 8 pencils per core.Weak scaling shows the expected ∝ 1/log N efficiency for FFT-based computations.Execution time scales linearly with the number of problem variables.

8 Figure 9 .
Figure 9. Snapshots of a moderate Mach-number Kelvin-Helmholtz instability test problem simulated at various resolutions with a finitevolume code (Athena) and Dedalus.The finite-volume method introduces small errors which trigger unphysical secondary instabilities in the vortex rolls.These spurious instabilities disappear as the simulation resolution is increased.Quantitative comparisons show comparable accuracy between the finite-volume method with 16384 2 degrees of freedom and Dedalus with 2048 2 degrees of freedom.Figure adapted from Lecoanet et al.[19]

Figure 10 .∂ 2 ψ e ∂x 2 =
Figure 10.Evolution of the nonlinear Schrödinger equation on a network, simulated by coupling the boundaries of different fields on a 1D Chebyshev segment.A soliton initially isolated to one segment scatters at the vertices and fills the network over time.

Figure 11 .
Figure 11.Density at t = 0.5 in the Orszag-Tang vortex test.The sharp density jumps visualize the MHD shocks.The structures in the upper left and lower right corners are due to shock-shock interactions, which are correctly treated in Dedalus.

Figure 12 .
Figure 12.A profile of the density at t = 0.5 and y = 0.3125 in the Orszag-Tang vortex test.The inset shows the density profile around x = 0.7, with crosses denoting grid points (using the non-dealiased grid).The shock is resolved by several grid points.

Figure 13 .
Figure 13.Surface and lateral slices of the PV (top) and buoyancy perturbation θ (bottom) in a 3D quasigeostrophic flow.Both images are at t = 200 in the statistically saturated state.

Figure 14 .
Figure 14.Evolution of the tracer fields in a reversible Taylor-Couette Stokes flow.The three separate tracer fields are visualized as red, green, and blue in a single RGB image.As time progresses, we rotate the inner cylinder first counter clockwise and then clockwise.By the end of the simulation, the flow has returned to its initial conditions, modified only by mass diffusion of the dye.

Figure 15 .
Figure 15.(a) Structure of a balanced atmosphere with an Eddington closure for radiation transport and a Kramer-like opacity law with a = 1, b = 0.At depth, the atmosphere is nearly an adiabatic polytrope with constant s/C P and N 2 ≈ 0. The upper atmosphere is nearly isothermal with exponentially decaying ρ and P. (b) The upper atmosphere is a resonant cavity for internal gravity waves with N 2 > 0. Frequencies are normalized by the Brunt-Väisälä frequency N in the isothermal layer.

Figure 16 .
Figure16.Eigenmode (a) frequencies and (b) periods for a radiative atmosphere.The mode frequencies ω are normalized by the Brunt-Väisälä frequency N in the isothermal layer.Acoustic modes ("ac", blue), gravity modes ("gw", red), and f-modes ("f", black) are identified by their frequencies.Acoustic waves are expected to have nearly equal frequency spacing, while gravity waves have nearly equal period spacing, as is observed.Colored circles highlight modes shown in Fig.17.
Figure16.Eigenmode (a) frequencies and (b) periods for a radiative atmosphere.The mode frequencies ω are normalized by the Brunt-Väisälä frequency N in the isothermal layer.Acoustic modes ("ac", blue), gravity modes ("gw", red), and f-modes ("f", black) are identified by their frequencies.Acoustic waves are expected to have nearly equal frequency spacing, while gravity waves have nearly equal period spacing, as is observed.Colored circles highlight modes shown in Fig.17.

Figure 17 .
Figure 17.Vertical velocity eigenfunctions for a radiative atmosphere, scaled by √ ρ, including (a) acoustic modes and (b) gravity modes.