Data assimilation for massive autonomous systems based on second-order adjoint method

Data assimilation (DA) is a fundamental computational technique that integrates numerical simulation models and observation data on the basis of Bayesian statistics. Originally developed for meteorology, especially weather forecasting, DA is now an accepted technique in various scientific fields. One key issue that remains controversial is the implementation of DA in massive simulation models under limited computation time and resources. In this paper, we propose an adjoint-based DA method for massive autonomous models that produces optimum estimates and their uncertainties within practical computation time and resource constraints. The uncertainties are given as several diagonal components of an inverse Hessian matrix, which is the covariance matrix of a normal distribution that approximates the target posterior probability density function in the neighborhood of the optimum. Conventional algorithms for deriving the inverse Hessian matrix require $O(CN^2+N^3)$ computations and $O(N^2)$ memory, where $N$ is the number of degrees of freedom of a given autonomous system and $C$ is the number of computations needed to simulate time series of suitable length. The proposed method using a second-order adjoint method allows us to directly evaluate the diagonal components of the inverse Hessian matrix without computing all of its components. This drastically reduces the number of computations to $O(C)$ and the amount of memory to $O(N)$ for each diagonal component. The proposed method is validated through numerical tests using a massive two-dimensional Kobayashi's phase-field model. We confirm that the proposed method correctly reproduces the parameter and initial state assumed in advance, and successfully evaluates the uncertainty of the parameter. Such information regarding uncertainty is valuable, as it can be used to optimize the design of experiments.


Abstract
Data assimilation (DA) is a fundamental computational technique that integrates numerical simulation models and observation data on the basis of Bayesian statistics. Originally developed for meteorology, especially weather forecasting, DA is now an accepted technique in various scientific fields. One key issue that remains controversial is the implementation of DA in massive simulation models under limited computation time and resources. In this paper, we propose an adjointbased DA method for massive autonomous models that produces optimum estimates and their uncertainties within practical computation time and resource constraints. The uncertainties are given as several diagonal components of an inverse Hessian matrix, which is the covariance matrix of a normal distribution that approximates the target posterior probability density function in the neighborhood of the optimum. Conventional algorithms for deriving the inverse Hessian matrix require O(CN 2 + N 3 ) computations and O(N 2 ) memory, where N is the number of degrees of freedom of a given autonomous system and C is the number of computations needed to simulate time series of suitable length. The proposed method using a second-order adjoint method allows us to directly evaluate the diagonal components of the inverse Hessian matrix without computing all of its components. This drastically reduces the number of computations to O(C) and the amount of memory to O(N ) for each diagonal component. The proposed method is validated through numerical tests using a massive two-dimensional Kobayashi's phase-field model. We confirm that the proposed method correctly reproduces the parameter and initial state assumed in advance, and successfully evaluates the uncertainty of the parameter. Such information regarding uncertainty is valuable, as it can be used to optimize the design of experiments.

I. INTRODUCTION
Determining the model parameters and initial states of simulation models is an important task in various scientific fields, as it enables the temporal evolution of the target system to be observed. However, in many practical cases, this procedure is somewhat complex, because it is often impossible to observe the parameters and initial states experimentally. In materials engineering, for example, phase-field (PF) models are often used to simulate the evolution of microstructures during the processes of solidification and phase transformation [1][2][3][4][5][6][7].
PF models phenomenologically describe the dynamics of phases using field variables that evolve in time depending on the gradient of the total free energy. Since a PF model usually requires a huge number of grid points to discretize the field variables, the computational cost tends to be prohibitive. Nonetheless, PF models are accepted beyond the field of materials engineering, such as in hydrodynamics [8][9][10], as they can be employed to model phases and their dynamics using mathematical expressions that are easy to manipulate. In PF models, the parameters and initial states perfectly determine the dynamical properties of the phases.
However, various limitations in practical experiments sometimes prevent such parameters and initial states from being estimated.
Data assimilation (DA) is a computational technique that integrates numerical simulation models and observational data on the basis of Bayesian statistics. Thus, DA enables the parameters and initial states of PF models to be estimated by systematically extracting as much information as possible from the given observational/experimental data. The process of DA evaluates a probability density function (PDF) (or the "posterior PDF," to be precise) of the unknown parameters and unobservable states that is conditional on the given observation data [11]. DA was originally developed in the fields of meteorology and oceanography [12][13][14], but is now applied in areas such as seismology, marketing science, and industrial science [15][16][17][18]. Several sequential Bayesian filters and other non-sequential estimation methods have been used in DA. Common sequential Bayesian filters such as the ensemble Kalman filter [19][20][21] and particle filter [22][23][24] estimate the target posterior PDF using Bayes' theorem. This approximation is formed using an ensemble of realizations, meaning that the computational cost is proportional to the number of realizations. The implementation of one sequential Bayesian filter on a given simulation model is not especially complex, and a sufficiently accurate estimate of the posterior PDF can be achieved when the number of degrees of freedom N of the simulation model is sufficiently small. However, Bayesian filters become inefficient when applied to massive simulation models, as the number of realizations required to obtain a converged posterior PDF is proportional to e O(N ) . Unlike sequential Bayesian filters, adjoint methods [25][26][27] directly determine the optimum solution using a gradient method to maximize the target posterior PDF. Although this achieves a drastic reduction in the computational cost, the ordinary adjoint method cannot evaluate the uncertainty in its estimations, something which sequential Bayesian filters obtain in a straightforward manner. Such uncertainties provide valuable information related to both the estimations and the optimum solution. For example, the uncertainties provide feedback for the experimental design that helps to identify the parameters of interest with the required accuracy. The quantification of uncertainty is currently a very important issue in the application of DA to massive simulation models.
This paper describes an adjoint-based DA methodology that simultaneously estimates the optimum solution and its uncertainty, and is capable of being applied to simulation models with a huge number of degrees of freedom. We first construct a method to estimate the parameters and initial states involved in an autonomous system, and then validate our approach using a PF model as a testbed. Section II introduces the formulation of an adjoint-based DA method to simultaneously obtain an estimation and its uncertainty using second-order information of the posterior PDF. Section III describes the formulation of an estimation test using synthetic data generated from the time series given by Kobayashi's PF model [1]. Section IV presents and discusses the results of estimation tests, and Section V concludes this paper by summarizing the results of this study. Suppose an autonomous simulation model is given by ∂z/∂t = A(z; a), where z(t) ∈ R Nz denotes a time-dependent variable and A : R Nz → R Nz is a function of z and a time-invariant parameter vector a ∈ R Na . The system model describes the time evolution of a state vector consisting of z and a. Let X(t) = z ⊤ , a ⊤ ⊤ ∈ R N be the state vector, where • ⊤ denotes the transpose of • and N = N z + N a . Since a is time-invariant, i.e., ∂a/∂t = 0, the system model can be represented by The observation model describes how X(t) relates to a time series of observation data where K denotes the dimension of the observations. Considering that the data include noise, the observation model can be described as whereȟ : R N → R K is an observation operator that outputs quantities from X comparable with the data and Ω(t) denotes observation noise. In this paper, we assume that f andȟ are nonlinear functions, and Ω is white noise that follows a normal distribution with a diagonal covariance matrix. Our purpose is to obtain the optimum initial state X(0) together with the uncertainties of the variables of interest.
In consideration of PF models, we also assume that X(0) is constrained by where X Lower ∈ R N and X Upper ∈ R N denote the lower and upper bounds of X(0), respectively.
To simplify our formulation, we normalize X as This leads to the following θ-dependent forms of Eqs. (1)-(3): where is an observation operator after transforming X to θ, and Θ = θ(0).
Bayes' theorem states that a conditional PDF p (Θ|D), which is called the posterior PDF, can be described as where p(Θ) and p (D|Θ) are called the prior PDF and likelihood, respectively. Note that proportional to a product of the prior PDF and the likelihood.
The prior PDF contains prior information provided by experience and intuition. If we suppose that this prior information is the constraint condition given by Eq. (7) and that Θ i is independent for each i, p(Θ) is given by a product of prior PDFs of Θ i , where When observation data are obtained at t = t 1 , t 2 , · · · , t n , p (D|Θ) can be written as where p (Ω k (t s )) = 1 and σ k is the standard deviation of Ω k (k = 1, · · · , K). We consider σ k to be a hyperparameter.
This paper defines the optimum solutionΘ to be the Θ that maximizes the posterior PDF p (Θ|D). For the convenience of numerical computation, we aim to minimize a cost which comes from a negative logarithmic posterior PDF, i.e., p(Θ|D) ∝ e −J , to findΘ, rather than maximizing p (Θ|D). The constraint in Eq. (13) arises from the term − log p (Θ), which appears when calculating − log p(Θ|D).

B. Optimization via an adjoint method
Typically, J is optimized using a gradient method such as steepest gradient descent, the nonlinear conjugate gradient method, or the Limited-memory Broyden-Fletcher-Goldfarb-Shanno (LBFGS) method [28]. Gradient methods require ∂J/∂Θ to update J, but it is difficult to calculate this quantity because J does not explicitly include Θ, as seen in Eq. (13). Generally, J is fully determined by setting Θ = θ(0) through Eq. (5), so that J must be written as a function of Θ, i.e., J(Θ). According to Eq. (13), J is also a function of θ (t s ) (s = 1, · · · , n) that satisfies Eq. (5), not including Θ. Summarizing these two expressions for J: where t f is an arbitrary time later than t n , and λ(t) ∈ R N denotes a vector of the Lagrange multipliers that impose Eq.(5) as the constraint condition of θ. J is the time-dependent function denotes the Dirac delta function. Taking a variation of Eq. (15), we have a time evolution equation for λ: where Details of the derivation can be found in [26,29]. Solving Eq. (17) backwardly in time with the condition in Eq. (19), we obtain the objective ∂J/∂Θ as λ(0). Such a procedure to obtain the gradient of the cost function using the adjoint equation (Eq. (17)) is called the adjoint method.
When we apply the adjoint method to our problem, a variable transformation is needed in the process of updating Θ based on a gradient method, since Θ has the constraint shown in Eq. (7). The variable transformation converts the constrained optimization problem of Θ into an unconstrained one with respect to Ψ. The update procedure is as follows. After obtaining ∂J/∂Θ by the adjoint method based on Eqs. (17)- (19), we convert Θ to Ψ using Eq. (20) and ∂J/∂Θ to ∂J/∂Ψ as Using this formulation to update Ψ, we can obtain an updated Θ from the inverse transformation of Eq. (20): Although this update procedure does not allow Θ i to be exactly 0 or 1, owing to the definition of Eq. (22), Θ i can be sufficiently close to 0 or 1 to pose no problem in practical cases.
The adjoint method calculates ∂J/∂Θ for a fixed σ k , so that an optimization of σ k is to be done at the same time as Θ by substituting Eq. (14) into Eq. (16) every time Θ is updated.
The advantages of the adjoint method over sequential Bayesian filters are that only O(C) computations and O(N) memory are required to findΘ, where C is the number of computations needed to run the given simulation model from t = 0 to t = t f .

C. Evaluation of uncertainty via a second-order adjoint method
The adjoint method described in Section II B gives the optimum solutionΘ that maximizes p (Θ|D). However, it does not provide information about the behavior of p (Θ|D) in the neighborhood of Θ =Θ, which reflects the uncertainty in the estimation ofΘ. To extract such information, another procedure must be implemented on the adjoint method.
Considering that ∂J/∂Θ| Θ=Θ = 0, the Taylor expansion of J with respect to Θ −Θ is where terms of order higher than three have been neglected, and H is a Hessian matrix given by We normalize p(Θ|D) ∝ e −J into which Eq. (23) is substituted as where H −1 is the inverse of H and |•| denotes the determinant of •. Equation (25) where q ∈ R N is a vector with elements q l = 1 and q i =l = 0. The solutionr obviously includes (H −1 ) l,l asr l = N j=1 (H −1 ) l,j q j = (H −1 ) l,l . Note that H is a constant matrix that requires complex computations because of its large dimension. We must obtainr from an initial guess via an iterative technique such as the conjugate gradient method or conjugate residual method. The iterative method needs, in the way of the iteration, to compute each of the Hessian-vector products Hγ for a vector γ. The second-order adjoint method enables us to compute such Hessian-vector products.
Let ξ(t) ∈ R N and ζ(t) ∈ R N be perturbations ofθ andλ, which respectively correspond to θ and λ when Θ =Θ. Their time evolutions are given by The combination of Eqs. (27) and (28) where the PF variable φ(x, t) denotes the existence probability of the relevant phase, e.g., solid or liquid. The parameters τ and ǫ non-dimensionalize time and space, respectively, and m characterizes the velocity of the interface between the two phases. We assume that these parameters are time-invariant constants. We know that φ(x, t) should be constrained in 0 ≤ φ(x, t) ≤ 1, as it describes a probability. This condition is automatically satisfied by setting the initial phase to 0 ≤ φ(x, 0) ≤ 1, since φ = 0 and φ = 1 are the fixed points of Eq. (29).
Kobayashi's PF model underlies various PF models that describe physical phenomena such as dendrite growth [1, 5], crack propagation [32,33], and interface-driven pattern formation [34]. Therefore, Kobayashi's PF model is a good choice for verifying whether the proposed DA method works well, and is a first step towards future applications in more complex PF models.

B. Synthetic data
Twin experiments are often conducted in the field of DA to verify a newly developed method on the basis of synthetic data. The synthetic data are usually generated using the given simulation model, in which the true parameters and initial state are pre-determined.
Verification then proceeds by checking whether the DA method applied to the synthetic data reproduces the true parameters and initial state. In our case, the synthetic dataset is a time series of φ that is numerically calculated by Kobayashi's PF model with a true initial state and parameter m. The synthetic data are then contaminated by observation noise that follows a normal distribution with mean zero and variance σ 2 . The twin experiments are intended to confirm that the proposed method estimates the initial state and parameter with the associated uncertainty.
Let n x and n y be the numbers of grid points in the x-and y-directions, respectively, M be the total number of grid points, i.e., M = n x n y , and h be the grid spacing. A periodic boundary condition is imposed on the boundary of the computational domain. Letting φ i (t) be the phase at the i-th grid point, Eq. (29) can be rewritten as where △ i denotes a second-order difference operator acting on the four nearest neighbors of Figure 1 shows the time evolution of φ in two-dimensional space, where n x = 300, n y = 200, h = ǫ, and the time increment in the Euler method is 0.1τ . The assumed initial state is shown in Fig. 1(a), and the true value for the parameter m is assumed to be 0.1. The synthetic data are given by where ω i (t) is normally-distributed observation noise with mean zero and variance σ 2 .
When DA is applied to Kobayashi's PF model, the time evolution equation with respect to m is needed to construct a system model within the state-space model (Section II A).
The time evolution equation can be written as where b = m + 1/2, which denotes the normalization of m, i.e., 0 < b < 1.

C. Cost function
We consider the synthetic observation data to be the snapshots of φ obtained from t = T min to t = T max with time interval ∆T . Let T be the set of observation times and n be the number of observations. Combining Eqs. (13) and (31), J can be rewritten as The values of φ i (0) and b(0) that minimize Eq. (33) also minimize where since σ is independent of φ i (0) and b(0). When the optimumφ i (0) for φ i (0) andb(0) for b(0) are obtained by minimizing J ′ , the optimum σ can be obtained aŝ whereφ i (t) denotes φ i (t) simulated usingφ i (0) andb(0).

D. Procedures
Prior to applying the proposed method to the PF model, the constraint for the initial state 0 ≤ φ i (0) ≤ 1 is to be changed to 0 < φ i (0) < 1 to satisfy the domain of the variable transformation Eq. (20). The state variables θ(t) ∈ R M +1 and Θ ∈ R M +1 can be defined as The constraint for Θ becomes 0 < Θ i < 1 (i = 1, · · · , M + 1). The system models of Eqs. (30) and (32) are rewritten in terms of θ as We adopt the LBFGS technique [28] as the gradient method for optimizing Θ. Starting from an initial guess, the LBFGS method updates Θ by satisfying 0 < Θ i < 1 for all i owing to the variable transformation mentioned in Section II B. To tune the LBFGS method, we set the tolerance to 10 −8 and determine the step length by Armijo's rule [35]. Once the optimumΘ has been obtained, the optimum standard deviationσ can be estimated by Eq. (36) and the optimumm for m is given byΘ M +1 − 1/2 orb(0) − 1/2.
One of the most remarkable features of the proposed method is its evaluation of the uncertainties. These uncertainties can provide important information that is beneficial to updating the experimental design. In accordance with the procedure mentioned in Section II C, we consider a linear equation is a vector to be determined, and q ∈ R M +1 is a vector containing the elements q M +1 = 1 and q i =M +1 = 0. The uncertainty δm can be computed from the solution r as The conjugate residual method, in which the tolerance is set to 10 −8 , is adopted to solve the linear equation. The second-order adjoint method computes each of the Hessian-vector products H ′ γ that appears in the optimization process of the conjugate residual method.
Substituting the right-hand side of Eq. (38) for F in Eq. (27), the tangent linear model can be rewritten as with the initial condition ξ(0) = γ, whereθ denotes the state vector corresponding toΘ.

IV. RESULTS AND DISCUSSION
The proposed method is verified through three twin experiments: (I) estimation of the parameter m conditional on the true initial state φ true  Table I summarizes the parameter values used in the experiments. The first observation is assumed to occur at T min = 0.1τ , and the true value of m is assumed to be m true = 0.1. The initial guess used in the LBFGS method is m = −0.1. In this experiment, the true phase field at t = 0, i.e., φ true i (0) (see Fig. 1(a)) is given as the initial state. The results reported here are the average values ofm and δm over twenty trials with different random seeds. Figure 2 shows the results of Test I-(i). Figure 2 (a) indicates that the parameter estimation is successful, because the true parameter is included in the rangem − δm < m true < m + δm. The estimation of the uncertainty δm fails when T max is at its minimum, i.e., T max = 0.2τ , because insufficient data causer M +1 to become negative. Figure 2(b) indicates that δm is proportional to T −1.5 max in this range. The reason that the decrease is more rapid than the law of large numbers would suggest is related to the nonlinearity of Kobayashi's PF model. A theoretical evaluation actually indicates that δm is proportional to T −2.5 max when T max ≪ τ , and will converge with a constant value as T max increases. This is because no additional information is included in the observation data after φ(x, t) becomes almost uniform across the entire computational domain. Figure 3 shows the results of Test I-(ii). Figure 3(a) indicates that the proposed method successfully reproduces the true parameter, and Fig. 3(b) shows that δm is proportional to ∆T 0.5 when ∆T < τ , which seems to follow the law of large numbers. Figure 4 shows the results of Test I-(iii). Figure 4(a) indicates that the parameter estimation is again successful, and Fig. 4(b) shows that δm is proportional to σ. In summary, the results of Test I demonstrate that the proposed method is capable of estimating the true parameter and the associated uncertainty.

B. Twin experiment II: Simultaneous estimation
Twin experiment II investigates the influence of the observation noise in two cases: (i) when the noise has a small standard deviation (σ = 10 −4 ) and (ii) when the noise has a large standard deviation (σ = 0.3). The true parameter is assumed to be m true = 0.1, and the true initial state φ true i (0) is assumed to be the phase field shown in Fig. 1(a)  Twin experiment III confirms whether the estimation of m affects the generation of the spot-like pattern found in twin experiment II-(ii). Therefore, twin experiment III is set up to estimate only the initial state φ i (0) with a fixed parameter m = 0.1. The true initial state φ true i (0) is assumed to be the phase field shown in Fig. 1(a). The other observational conditions are T min = 8.0τ , T max = 30.0τ , ∆T = 0.1τ and σ = 0.3, and the initial guess is A spot-like pattern again appears in the estimated initial state (Fig. 6(b)) as the number of iterations increases. Note that the cost function J ′ is almost the same after the 31st step and after the final step, although Figs. 6(a) and (b) are much different. This is caused by a feature inherent in the two-dimensional Kobayashi's PF model. When and (c) the true initial state ( Fig. 1(a)).
a spot of radius R 0 evolves with time based on the PF model, whether it grows or decays depends on the relation between m and R 0 . Figure 7 shows the phase diagram obtained by the two-dimensional Kobayashi's PF model under the assumption of the axial symmetry.
The destiny of a given spot depends on whether the radius is above or below the critical line, which means the critical radius is approximately inversely proportional to m [36]. The radius of each spot in Fig. 6(b) is actually smaller than the critical radius, which is approximately 7.3ǫ in the case of m = 0.1.
Time evolutions starting from three different initial states shown in Fig. 6. These results indicate that the phase fields at the time of the first observation, i.e., t = 8.0τ , are completely coincident (Fig. 8).

V. CONCLUSIONS
This paper has described an adjoint-based DA method for massive autonomous models that not only determines the optimum estimates but also gives their uncertainties within a practical computation time and reasonable resource requirements. The uncertainties can be obtained as several diagonal components of the inverse Hessian matrix, which is the covariance matrix of the normal distribution that approximates the posterior PDF in the neighborhood of the optimum estimates. This proposed approach provides a new methodology for evaluating uncertainties using a second-order adjoint method that obtains Hessian-vector products in the process of the computation. Twin experiments using a two-dimensional Kobayashi PF model demonstrated the validity of the proposed method.
The uncertainties associated with physical quantities of interest depend on the quality and amount of data. Thus, conducting twin experiments prior to practical experiments allows us to determine how many observations are required to obtain the physical quantities of interest to the desired accuracy. Such feedback to practical experiments is already possible in systems with only a few degrees of freedom, but the proposed method makes this possible for massive simulation models.
The proposed method is not only applicable to PF models, but to various models described by autonomous systems, e.g., shallow water equations, Navier equations for elastic materials, and Boltzmann equations. The proposed method is of great utility for evaluating the uncertainties of model parameters through DA, which is important in various fields of science, even when using massive models.