Data-driven determination of the spin Hamiltonian parameters and their uncertainties: The case of the zigzag-chain compound KCu$_4$P$_3$O$_{12}$

We propose a data-driven technique to estimate the spin Hamiltonian, including uncertainty, from multiple physical quantities. Using our technique, an effective model of KCu$_4$P$_3$O$_{12}$ is determined from the experimentally observed magnetic susceptibility and magnetization curves with various temperatures under high magnetic fields. An effective model, which is the quantum Heisenberg model on a zigzag chain with eight spins having $J_1= -8.54 \pm 0.51 \{\rm meV}$, $J_2 = -2.67 \pm 1.13 \{\rm meV}$, $J_3 = -3.90 \pm 0.15 \{\rm meV}$, and $J_4 = 6.24 \pm 0.95 \{\rm meV}$, describes these measured results well. These uncertainties are successfully determined by the noise estimation. The relations among the estimated magnetic interactions or physical quantities are also discussed. The obtained effective model is useful to predict hard-to-measure properties such as spin gap, spin configuration at the ground state, magnetic specific heat, and magnetic entropy.


I. INTRODUCTION
An effective model in materials science often explains the origin of physical properties in materials. Many methods have been developed to construct an effective model for a target material, and they can be divided into two groups. One is ab initio calculations, which determine the model parameters in an assumed effective model by providing only basic information of the target material [1][2][3][4][5][6][7][8][9] . The other is a data-driven approach in which model parameters are determined so as to fit the experimentally measured data in the target material [10][11][12][13][14][15] .
The paper estimates a spin Hamiltonian as an effective model of KCu 4 P 3 O 12 by a data-driven approach. Figure 1 shows the crystal structure of KCu 4 P 3 O 12 (a =7.433Å, b = 7.839Å, c = 9.464Å, α = 108.28 • , β = 112.68 • , γ = 92.73 • , and space group: P1) 42 . Cu(II) ions have S = 1/2 isotropic Heisenberg spin, but their magnetic properties have not yet been reported. As will be explained later, the lattice structure of the Cu ions can be regarded as a zigzag chain consisting of eight Cu ions. Thus, the quantum Heisenberg model on a zigzag chain is the target of the spin Hamiltonian of KCu 4 P 3 O 12 to be estimated. We determine the superexchange interactions between Cu ions with uncertainty in this target model by a data-driven approach in which the experimentally measured susceptibility and magnetization curves are in- putted. Once an estimated spin Hamiltonian is established, theoretical analysis of the Hamiltonian predicts various magnetic properties, which cannot or have not been measured. These properties include the magnetic specific heat, magnetic entropy, spin configuration, and spin gap. These predictions are helpful to propose a further experimental plan and design.
The rest of the paper is organized as follows. Section II shows the experimental results of the magnetic susceptibility and magnetization curves as functions of temperatures in KCu 4 P 3 O 12 along with the experimental methods. Section III explains our data-driven approach to estimate a spin Hamiltonian. The posterior distribution of model parameters given in the data is constructed by a statistical noise model with respect to the experimental observations and the prior distribution of the model parameters. Plausible model parameters are determined by the maximizer of the posterior distribution. Furthermore, a systematic method, which allows the statistical uncertainty of the model parameters to be evaluated, is presented to estimate the amplitude of noise in the noise model. Under our formulation, multiple types of physical quantities can be used to estimate the effective model. Section IV explains the results of the spin Hamiltonian estimation with uncertainty by our data-driven approach. First, we assume the shape of the target spin Hamiltonian for KCu 4 P 3 O 12 from the viewpoint of the crystal structure. Next, since the L 2 regularization is adopted as a prior distribution to suppress an increase in the absolute values of the magnetic interactions, determination of the hyperparameter in the L 2 regularization is discussed. Subsequently, by considering the observation noise, four types of magnetic interactions are estimated with the error bars in a target spin Hamiltonian, and their relationships are discussed from the distributions of sampling data by the Monte Carlo method. Finally, various magnetic properties of KCu 4 P 3 O 12 are predicted. Section V presents the discussion and summary.

II. EXPERIMENTALLY MEASURED MAGNETIC PROPERTIES
Figure 2 (a) shows the temperature dependence of the magnetic susceptibility with a magnetic field of 0.01 T, which was measured by a superconducting quantum interference device magnetometer, magnetic property measurement system (Quantum Design). Crystalline KCu 4 P 3 O 12 powder was synthesized by a solidstate reaction. Even at sufficiently low temperatures, a finite constant value of susceptibility remains. The inset of Fig. 2 (a) shows the susceptibility upon removing this constant term. This data is used in the data-driven approach. Figure 2 (b) shows the magnetization curves at temperatures of 1.3 K, 4.2 K, 20 K, 30 K, and 50 K. These magnetization curves were measured using an induction method with a multilayer pulsed field magnet installed at the Institute for Solid State Physics, the University of Tokyo. Although a high magnetic field is imposed (≤ 40 T), the magnetization is not saturated. From the ESR measurements, the g-factor of Cu ions is determined to be 2.08.

A. Posterior distribution for effective model estimation
Our developed effective model estimation method in Ref. 39 is based on the Bayesian statistics. We consider that the target Hamiltonian to be estimated has K-types of model parameters: x = (x 1 , ..., x K ). Let y ex = {y ex (g l )} l=1,...,L be the set of experimentally measured physical quantities. This set depends on the external parameter g l , where the number of data is L. Using Bayes' theorem, the posterior distribution P (x|y ex ), which is the conditional probability of the model parameters given experimental data, is expressed as where P (x) is the prior distribution of the model parameters and P (y ex |x) is the likelihood function of y ex given x. Assuming that the observation noise for the measurements follows a Gaussian distribution with a mean of zero and a standard deviation of σ, the likelihood function is given by Here, {y cal (g l , x)} l=1,···,L , which is expressed as y cal (x), are the physical quantities calculated from the Hamiltonian at the external parameter g l theoretically.
To express the posterior distribution briefly, we introduce the "energy function" as a function of x and the noise σ is given by where an error function ∆(x, σ) is Then, the posterior distribution is expressed as From the viewpoint of the maximum a posterior (MAP) estimation, plausible model parameters to explain y ex are regarded as the maximizer of Eq. (5). Correspondingly, the MAP estimation is reduced to a minimization problem of the energy function. To search for the maximizer of Eq. (5) or the minimizer of Eq. (3), various optimization techniques such as the steepest-descent method, Markov chain Monte Carlo (MCMC) method 39 , and Bayesian optimization 37 can be used. Selecting the prior distribution of the model parameters P (x) is important to obtain a physically appropriate Hamiltonian 39 . The prior distribution for the effective model estimation can be regarded as the regularization terms in the minimization problem 43 . The most common are L 1 (LASSO) and L 2 (ridge) regularization with corresponding prior distributions P (x) = exp(−λ|x|) and P (x) = exp(−λ x 2 ), respectively, where hyperparameter λ determines the strength of regularization. If the L 1 regularization is applied, model parameters with large contributions can be selected based on the feature selection. On the other hand, the L 2 regularization suppresses the increase in the absolute values of the model parameters. Depending on the situation and purpose, it is necessary to select the proper prior distribution.

B. Observation noise estimation
To obtain the uncertainty of the model parameters by a MAP estimation, the width of the posterior distribution around the maximizer must be estimated. The noise amplitude σ is crucial to estimate the uncertainty [44][45][46][47] . In our framework, the noise amplitude σ is considered to be a hyperparameter and a plausible value is determined by minimizing the Bayes free-energy F (σ) defined as where Z(σ) is the normalization of the posterior distribution given by where Ω x is the support of the posterior distribution determined by the prior distribution. To evaluate F (σ), it is convenient to extend the posterior distribution P (x|y ex ) to P β (x|y ex ) with a "finite temperature", which is defined as where β is the inverse temperature and Z β (σ) is the normalization. By using Z β (σ), the Bayes free-energy is calculated as where the ensemble average E(x, σ) β with respect to Eq. (8) can be obtained by the MCMC method. Here, the third term in RHS of Eq. (9) does not depend on σ and is omitted below. The noise amplitude of the experimental data σ * is evaluated as the value where F (σ) is minimized. By MCMC sampling from Eq. (5) with the fixed σ * , the uncertainty of the model parameters can be evaluated.

C. Multiple sets of physical quantities
In our experiments of KCu 4 P 3 O 12 , six different sets of physical quantities, that is, the susceptibility and the magnetization curves under different temperatures, are obtained. Then different types of physical measurements are likely to be combined in our estimation problem.
For simplicity, the different physical quantities are assumed to be independently obtained. Then the likelihood function of a series of experimental results is defined as where the index n denotes the type of physical quantities, and N is the total number of types. Thus, the posterior distribution is written as (11) where L n is the number of data points for the n-th type measurement and the energy function is a weighted sum given as This means that the posterior distribution significantly depends on both L n and σ n . In this paper, for simplicity, the number of data points for all inputted physical quantities is arranged as L n = L for ∀n. Furthermore, the case where σ n does not depend on the type of physical quantities is considered. That is, the standard deviation of the observation noise is the same for all types of physical quantities: σ n = σ for ∀n . To make this assumption more realistic, the contributions of each type of physical quantity are arranged and the following normalization is imposed: .
By using the normalization, the upper and lower bounds for each type of physical quantity are 1 and 0, respectively.

A. Target Hamiltonian
Our model estimation method assumes the shape of the target Hamiltonian to be estimated. KCu 4 P 3 O 12 has a complicated three-dimensional structures (Fig. 1). On the other hand, by only focussing on the superexchange interactions, that is, the Cu-O-Cu paths in the crystal structure, the lattice structure of the Cu ions is approximated by a set of independent zigzag chains. Each chain has eight Cu ions, and the bond lengths between the Cu ions are smaller than 3.2Å. Note that this approximation should be valid except for extremely low temperatures. For low temperatures, weak interactions besides these superexchange interactions may affect magnetism 14,15,48 , and this approximation is poor. In this paper, by focussing on the magnetic properties except for those at low temperatures, the target Hamiltonian is set to a Heisenberg model on the chain with eight spins. Figure 3 shows the lattice structure. Although symmetry considerations imply that there are seven types of nearest neighbor interactions in each chain, only four types of independent interactions appear (i.e., J 1 , J 2 , J 3 , and J 4 in Fig. 3). Furthermore, since the magnetic ion is Cu, anisotropy should not be considered. Consequently, the target Hamiltonian with isotropic Heisenberg spins is written as (15) where J 1 = J 1,2 = J 7,8 , J 2 = J 2,3 = J 6,7 , J 3 = J 3,4 = J 5,6 , and J 4 = J 4,5 . Here, (ŝ x i ,ŝ y i ,ŝ z i ) are the S = 1/2 Pauli matrices on ith site. The magnetic properties of this target quantum Hamiltonian can be easily calculated by the exact diagonalization method. If an effective model with a large number of spins need to be estimated, another simulation method such as the Monte Carlo method or mean-field calculation should be performed.
Since many magnetic interactions are already trimmed, there is no need to use the L 1 regularization for the prior distribution. On the other hand, large magnetic interactions are not preferable for the physical sense. To avoid an increase in the absolute values of the estimated magnetic interactions, we adopt the L 2 regularization as the prior distribution. Furthermore, to estimate an effective model to roughly capture the magnetic properties under wide temperature and magnetic field ranges, six types of physical quantities (N = 6) measured in KCu 4 P 3 O 12 (see Fig. 2) are used as inputted data. In addition, the number of data points in each inputted physical quantity is fixed as L = 100.

B. Determination of hyperparameter in the L2 regularization
To determine the hyperparameter λ in the L 2 regularization, the maximizer of the posterior distribution is analyzed. Since σ does not generally depend on the value of the model parameters from the viewpoint of a MAP estimation, we search for the minimizer of the following equations: where α := 2σ 2 λ, and E (x, α) and ∆ (x) are the energy function and error function normalized by 1/2σ 2 . Note that determining the plausible value of α is equivalent to deciding the hyperparameter λ in the L 2 regularization depending on σ.
The minimizer of E (x, α) is searched by the MCMC method where the probability distribution is proportional to exp[−E (x, α)]. The MCMC samplings are performed by emcee package 49,50 for various α. In each MCMC sampling, the stretch move 51 is used for the update scheme of states, and 8,000 states are sampled. Among the sampled states with fixed α, the model parameters x * such that E (x, α) is minimized are selected. As mentioned in Sec. III A, this optimization of E (x, α) can be also performed by various fast optimization techniques instead of MCMC. Figure 4 shows the α dependence of E (x * , α), ∆ (x * ), and x * 2 . Here, all experimental results, that is, the magnetic susceptibility with 0.01 T and magnetization curves at five temperatures (1.3 K, 4.2 K, 20 K, 30 K, and 50 K), for KCu 4 P 3 O 12 are inputted. The result of E (x * , α) shows an elbow curve, indicating a boundary between regions, where in each part, the regularization is effective or ineffective. Thus, the appropriate α is selected as the elbow position. That is, α * = 10 −2 , because we want to find the spin Hamiltonian not only to explain the experimental results but also to obtain small magnetic interactions as possible. This is similar to the determination technique of the cluster number in clustering analysis 52,53 .
In the error function ∆ (x * ), the elbow curve is also observed, but the elbow position deviates in E (x * , α). Preferably, ∆ (x * ) is a small value at α * = 10 −2 , and if the value of α decreases, the error is almost unchanged. In addition, x * 2 monotonically decreases against α, and it becomes sufficiently small at α * = 10 −2 . In this stage, the estimated magnetic interactions (i.e., x * ) at α * = 10 −2 are J 1 = −6.65 meV, J 2 = −5.49 meV, J 3 = −4.96 meV, and J 4 = 6.92 meV. The absolute values of these interactions are proper in the Cu systems 14,15 . These facts indicate that the L 2 regularization is useful to estimate a spin Hamiltonian with small magnetic interactions.

C. Evaluation of the uncertainty and distribution of sampling data
To evaluate the uncertainty of the estimated magnetic interactions, the noise amplitude is assessed according to Sec. III B. Here, the Bayes free-energy in the estimation problem for KCu 4 P 3 O 12 is given by where N = 6 and L = 100. Figure 5 (a) shows the inverse temperature β dependence of E (x, α * ) β by MCMC calculations using emcee package where the Monte Carlo step is 8,000. The average values of eight independent runs are plotted, and the error bars denote the 95% confident intervals. It monotonically decreases as a function against β. Using the results of E (x, α * ) β , Fig. 5 (b) shows the σ dependence of F (σ) which is calculated by numerical integration with the trapezoidal rule. The minimum value of F (σ) is obtained at 2σ * 2 = 10 −2 . This value is a plausible standard deviation for the observation noise. This means that the noise amplitude is ∼ 7% when the six types of physical quantities are inputted. Using the appropriate observation noise, MCMC sampling is performed around the estimated magnetic interactions in Sec. IV B when the probability distribution is proportional to exp(−E (x, α * )/2σ * 2 ) with 2σ * 2 = 10 −2 and α * = 10 −2 . Figure 6 shows the scatterplots of the sampling results between all combinations of J 1 , J 2 , J 3 , J 4 , and E(x). Here, the last 3,000 sampling results in the MCMC with 8,000 Mote Carlo steps are plotted. First, we determine the estimated magnetic interactions and the error bars as 95 % confidence interval are evaluated by these distributions as J 4 = 6.24 ± 0.95 meV. (22) Note that the inputted experimental results (Fig. 2) are not noisy, and the evaluated error bars for the estimated interactions are sufficiently small. Thus, to consider more noisy cases, the artificial noises are added to the experimental results of KCu 4 P 3 O 12 . Figures S1 and S2 in supplemental material show the estimation results depending on the artificial noise. We confirm that if the artificial noise is increased, the value of σ * by our noise estimation is also increased and consequently the error bars for the estimated interactions become large. Thus, we conclude that our estimation method can be applied to the noisy experimental results and correctly evaluate the uncertainty in the estimated effective model. Next, from the distributions of magnetic interactions (Fig. 6), the region of J 3 realizing good fitting is narrower compared to the other interactions. That is, J 3 is the most sensitive. In contrast, changing J 2 has a smaller impact on the fitting error, indicating J 2 has the highest uncertainty. Furthermore, we can see that J 2 and J 3 are positively correlated, while J 1 negatively is correlated with J 2 and J 3 . If J 1 increases, J 2 and J 3 should decrease to maintain the good fitting. On the other hand, J 4 is almost independent of the magnetic interactions. Consequently, J 4 can be freely tuned within the error bar. We discuss these extracted correlations from physical insights. Considering the lattice symme- try, J 2 and J 3 are placed in a similar environment, for example, there are two places in the lattice and the interacted spins are not edge one. Thus, J 2 and J 3 would have a positive correlation. Furthermore, J 1 is an antiferromagnetic interaction as well as J 2 and J 3 , and J 1 should become smaller with increasing J 2 and J 3 to keep the energy scale of antiferromagnetic interactions in the Hamiltonian, which means that J 1 is negatively correlated with J 2 and J 3 . We note that these correlations obtained in the estimation strongly depend on the target Hamiltonian and estimated interaction parameters. Thus, these discussions are correlations of the estimated interaction parameters in the effective model, not correlations of the magnetic interactions in the real materials. However, these extracted correlations can deepen an understanding of properties of the estimated effective model. Figure 7 (a) compares the physical quantities between the experimental and calculated results by the estimated spin Hamiltonian with J 1 = −8.54 meV, J 2 = −2.67 meV, J 3 = −3.90 meV, and J 4 = 6.24 meV. A good fit can be obtained except for the low temperature magnetization curves. In particular, for 1.3 K, the fitting result is worse and the 1/4 magnetic plateau-like behavior is observed in the calculation results. Since our target Hamiltonian only has eight spins, the appearance of such plateau cannot be avoided at low temperatures. To describe the low temperature magnetizations, the target Hamiltonian must include further magnetic interactions besides the four interactions. Moreover, Fig. 7 (b) shows the scatterplots of the sampling results by the same MCMC in Fig. 6 for all combinations of errors between the experimental and calculated results, i.e.,

D. Prediction of the magnetic properties
The most important benefit from estimating the effective model is predicting magnetic properties that cannot be easily or have not been measured. Thus, the value of the spin gap, which is the energy gap between the ground and excited states, the spin configuration at the ground state without a magnetic field, and the temperature dependences of magnetic specific heat and magnetic entropy, are calculated (Fig. 8). The predictions indicate that the magnetic specific heat in KCu 4 P 3 O 12 will have a peak around 30 K, but increasing the magnetic field suppresses this peak ( Fig. 8 (b)). In this magnet, the magnetic entropy will be almost unchanged by a magnetic field over 20 K (Fig. 8 (c)). Furthermore, to predict the magnetic refrigeration property [54][55][56][57][58][59] , the change in magnetic entropy is also evaluated (Fig. 8  (d)). The inverse magnetocaloric effect will be observed in KCu 4 P 3 O 12 , which is a characteristic behavior of antiferromagnets [60][61][62][63] , and the magnetic entropy change should be quite small. In this way, using an effective model determined by our data-driven approach, difficultto-measure properties can be predicted, improving the understanding of the magnetic properties of KCu 4 P 3 O 12 .

V. DISCUSSION AND SUMMARY
We have determined the spin Hamiltonian of KCu 4 P 3 O 12 using a data-driven approach. The flow of our prescription of data-driven approach is as follows. (i) Assume a target effective model and the posterior distribution. This constitutes the difference between the experimental and calculated results obtained by an effective model and the appropriate prior distribution of model parameters. (ii) Determine an appropriate hyperparameter in the prior distribution by the elbow method for the MAP estimation results and estimated model parameters. (iii) Obtain a plausible observation noise to minimize the Bayes free-energy. (iv) Perform MCMC samplings using an estimated noise amplitude around the estimated model parameters in (ii). From the obtained sampling data distributions, determine the model parameters with uncertainty. (v) Predict various properties, which cannot be easily measured in experiments using the estimated effective model. Our data-driven approach found that the spin Hamiltonian of KCu 4 P 3 O 12 is the quantum Heisenberg model on the zigzag chain with eight spins of J 1 = −8.54 ± 0.51 meV, J 2 = −2.67 ± 1.13 meV, J 3 = −3.90 ± 0.15 meV, and J 4 = 6.24 ± 0.95 meV. In this estimation, the magnetic susceptibility and magnetization curves at various temperatures are fitted. This model can describe the experimental results with very small error, except for the extremely low temperature magnetization curves, demonstrating that such a data-driven approach is useful to estimate the effective model. Our approach should be useful in parallel with ab initio calculations.
On the other hand, the results by the data-driven approach should strongly depend on the inputted data. Actually, we estimated different magnetic interactions when the inputted data is only one type of physical quantity (See Figs. S3 -S8 in supplemental material.). In this case, the fitting of the inputted quantities is well performed, but the experimental results, which are not used in the estimation are not well fitted. This fact means that each physical quantity can be well described by multiple sets of magnetic interactions. Consequently, preparing multiple kinds of physical quantities as the input data will not only estimate model parameters more uniquely but will also produce a more reliable effective model.
Recently, the parameter estimation for real materials by data-driven techniques has been frequently performed. These data driven techniques are roughly divided into two strategies. (i) A regression model that explains material parameters from feature variables such as material composition and structure is constructed from a large amount of collected data for known materials 18,64,65 . The aim of this strategy is to predict parameters in unknown materials using the trained regression model via datadriven approach. (ii) The material parameters are determined by minimizing the discrepancy between physical quantities of the target material and those obtained by a computational model with the material parameters [66][67][68] . The aim of this strategy is to understand properties of the target material through the estimated parameters in the computational model. The former, however, requires the preparation of a large amount of data sets of material compositions and structures paired with target parameters, i.e., magnetic interactions in our problem. Therefore, for the purpose of an estimation of magnetic interactions, the former is not necessarily suitable, but the latter is more suitable, and indeed our method belongs to the latter strategy. Furthermore, our proposed noise estimation method using MCMC to evaluate the uncertainty of estimated parameters can be applied to various methods belonging to the latter strategy, and we believe that it is also useful in various data-driven techniques to estimate materials parameters in the computational model. Our data-driven approach will come into its own when easy-to-measure properties obtained in the laboratory such as SQUID are inputted. That is, our data-driven approach can predict difficult-to-measure properties, which are obtained by large-scale experimental equipment such as neutron scattering. Thereby, our data-driven approach will reduce the cost of materials development and accelerate discoveries of novel materials.