A fully non-perturbative charm-quark tuning using machine learning

We present a relativistic heavy-quark action tuning for the charm sector on ensembles generated by the CLS consortium. We tune a particular 5-parameter action in an entirely non-perturbative and -- up to the chosen experimental input -- model-independent way using machine learning and the continuum experimental charmonium ground-state masses with various quantum numbers. In the end we are reasonably successful; obtaining a set of simulation parameters that we then verify produces the expected spectrum. In the future, we will use this action for finite-volume calculations of hadron-hadron scattering.

and [18], as do the pion masses. Further details on these ensembles can be found in [19]. Here, N conf indicates the number of well-separated gauge configurations used and N src the number of time-translated wall-sources averaged over per gauge configuration.
We perform calculations on gauge-field ensembles generated by the Coordinated Lattice Simulations (CLS) consortium [19,20] with 2+1 flavours of dynamical, non-perturbatively improved Wilson fermions [4]. We use 5 different lattice spacings ranging from a = 0.09929 to 0.04981 fm, and with pion masses between 421 and 282 MeV, on trajectories where the trace of the quark mass matrix Tr M is kept constant and approximately physical. In our study we use ensembles with either open [21] or periodic boundary conditions in time. The ensembles we considered for the tuning of the relativistic heavy-quark action are listed in Tab. I.

B. Charm-quark action
We will follow [22], also known in the literature as the "Tsukuba" action, and write down a general asymmetric Wilson action, Commonly, r t = 1 is chosen as this parameter is argued to be redundant. The remaining five free parameters are κ c , r S , ν, c E , and c B . Prescriptions in the literature exist for choosing these parameters based on mean-field-improved perturbation theory and for nonperturbative tuning of individual parameters. An in-depth discussion about our implementation of this action within the library openQCD can be found in App. A.
In [7,8], c sw = c E = c B ≡ u −3 0 is set to the value from tree-level tadpole-improved perturbation theory, with the tadpole factor u 0 determined from the fourth root of the plaquette, or the mean Landau link. In this approach only κ c is determined non-perturbatively (from tuning the kinetic mass M 2 of the D s -meson to its physical value), making predictions of the resulting charmonium spectrum and splittings.
In their approach the fine temporal lattice spacing combined with a tuning of the mass and anisotropy parameters that reproduce the physical η c mass and a relativistic dispersion relation enables the use of moving frames. Discretisation effects in the spin-dependent splittings are still, however, expected to be sizable in such an approach [6].
In [22], the full 5-parameter action of Eq. 1 was introduced. In [28,29][30] this action was semi non-perturbatively tuned on a set of ensembles with the same lattice spacing a −1 = 2.194(10) GeV where κ c was tuned to obtain the physical spin-average of the η c and J/ψ masses. The parameter r s was determined completely perturbatively using one-loop mean-field improved expressions, whereas c E and c B were determined from perturbative mass-corrections to the non-perturbatively measured c SW of [31]. The value of ν was tuned independently of all the other parameters to obtain the relativistic dispersion relation c 2 = 1.
We will discuss the validity of such an approach later in Sec. III B, as we allow all parameters to vary and can infer their inter-dependence. Having c 2 = 1 is quite useful as we will not have to determine masses of states through their dispersion relation; in the Fermilab language we are enforcing that M 1 = M 2 .
Finally, in [32] a more conventional non-perturbative tuning of the 4-parameter action with r t = 1 and r s = ν, and of the 3-parameter action where additionally c E = c B has been performed. As we see no reason why these parameters would be independent, in fact we will see they are related in a somewhat complicated manner, we prefer to follow a much more global and general approach.
We will follow App. A of [33] (in that context used for tuning an NRQCD action) and use partially-twisted Coulomb-gauge-fixed wall sources [34] to determine the spin-averaged dispersion relation of the η c and J/ψ. We will partially-twist [35,36] along the diagonal (θ, θ, θ) to maximally-reduce effects from rotational symmetry breaking and use the following 5 twist angles (in lattice units): such that our values of (ap) 2 are evenly-distributed between 0 and 3.
To determine c 2 and our masses am we will fit the η c and J/ψ correlators to the following form,

C. Chosen measurements and correlation functions
To make sure that we are using observables that are sensitive to all of the parameters in the RHQ action, we will be using both S-and P-wave charmonium ground-state masses (listed in Tab. II) in addition to the dispersion relation for the spin-averaged S-wave ground state. A possibility that would make this dependence a bit more obvious would have been to focus on the 1S hyperfine splitting, along with the S-wave -P-wave splitting, the spin-orbit splitting and the tensor splitting within the P-wave, as for example used in [9,37]. For computational reasons, we instead restrict the interpolator basis to the states that can be reached using the simple mesonic operators at rest, and compare these directly with their continuum analogs. Note that the J P C = 1 +− h c is however a good proxy for the spin-average of the 1P states, as the 1P hyperfine splitting is expected to be very small [38] (which can also be seen in a previous lattice determination [9]).
Throughout, we will use gauge-fixed wall sources with Gaussian sink-smearing implemented by the method discussed in App. B to optimise for the ground states and those with twisting.
An example of our zero-momentum states can be found in Fig. 2 for the ensemble N202.
For the open-boundary ensembles we perform a single wall-source propagator inversion at T /2, in the middle of the bulk, and symmetrise the resulting correlator. We must make sure to not perform our fits too close to the open boundary as these boundary-effects can be relatively strong, as seen in Fig. 1.

D. Neural-network charm-action tuning
Our idea is to randomly choose values of κ c , r s , ν, c E , and c B and measure the basic spectrum of Tab. II on a sufficient number of gauge configurations. We will then train a neural network in predicting these parameters for given input masses and c 2 , and finally use this model to predict the best guesses for the parameters of the action that lie closest to the experimental masses with spin-averaged c 2 = 1, hopefully within a precision of 1%. We acknowledge that all of the states will likely have residual cut-off effects with different slopes but the parameters of the action should be able to absorb the leading cut-off effects [5,6].
As we approach the (chiral-)continuum limit the states from the predicted parameters tend to the continuum masses, by design.
We strived to have at least about 30 different sets of run parameters randomly drawn from initially broad Gaussian distributions per ensemble [40]. For some ensembles we performed more runs to investigate the possible inter-ensemble dependencies, such as those emanating from the pion mass or the volume. Typically, after about 20 runs we start refining our parameter space by narrowing the new trial parameters and by feeding guesses back into the system to accelerate convergence. For each state entering the network we use 1000 bootstraps, which will both help give the network lots of data to train against and inform it of correlations between states.
We will first consider each individual ensemble separately to train the network, eventually investigating combinations of ensembles and performing more-global fits. We initially have 6 input parameters: the lattice determinations of the η c , J/ψ, χ c0 , χ c1 , and h c mesons, and the lattice η c − J/ψ spin-averaged dispersion relation c 2 . We will use a feed-forward network with one hidden layer of only 12 neurons [41] and an output layer of 5 neurons (κ c , r s , ν, c E , and c B ). On each of the layers we use the sigmoid activation and the Adam minimiser, with an adjusted learning rate, early-stopping, and a batch-size of 80. With the loss-function set as the mean-squared deviation. Often the model will converge rapidly to stable training and validation losses in fewer than 200 epochs.
Once the network understands the relation between the lattice-determined states and the charm-action parameters on an ensemble, we determine the best finite-lattice-spacing approximation to the action parameters needed to obtain something close to the experimental states with c 2 = 1 from the spin-averaged dispersion relation. Initial studies indicated that this approach gave predictions with heavier-than-physical η c and χ c0 states, and we decided to introduce so-called "class weights" to tell the network to prefer tuning κ c by a factor of 2 compared to the other outputs. Empirically, this makes the η c and χ c0 closer to physical while making the χ c1 and h c and c 2 a little less accurate, as will be seen in Fig. 4.
For the selection of our training and validation sets we randomly shuffle the runs and remove approximately 20% for validation. We run the training over 50 of these reshufflings and perform a weighted average (with weight proportional to the inverse of the validation loss) of the networks' best guesses to determine our predictions. In the following ( Tab. III) we quote our predicted parameters with 1σ errors, although this should not be construed as a true error but rather an indication of the variation of the parameters within the training runs we performed.

III. RESULTS
A. In Tab. III we present our results for the tuning on individual ensembles as well as a combined data-set built from the individual ones at fixed lattice spacing, which is practically an assumption of a flat approach to physical light and strange quark masses, combined with the (likely justified) assumption that finite volume effects in our charmonium observables are not relevant for our volumes. Typically the loss from the neural network reduces when the larger data-set "Comb" is used, and this larger data-set should protect us a little more from over-fitting. Both of the parameters c E and c B are greater than the value of (but tending toward) c SW from [42], with the hierarchy of c B > c E as expected perturbatively from [22].

B. On the validity of fixing parameters and tuning others
As c 2 = 1 from the spin-averaged dispersion relation is one of our inputs we can use this as a handle and investigate the dependence on the predicted physical parameters as this value is varied (and all the other states are still set to their physical values). If the deviation on κ c , r s , c E , c B , and ν is large as small changes are applied to this dispersion relation, then the model is suggesting that there is significant inter-dependence between these parameters and that this tuning shouldn't be done independently, as was performed in [28].  for the combined A653 and A654 ensembles as we varied the spin-averaged dispersion relation's c 2 . We can see that κ c and r s do not depend strongly on c 2 whereas ν, c E , and c B do.
With c E and c B being somewhat anti-correlated with ν. Although ν seems to be the main perpetrator in determining the effective speed of light, c E and c B both change accordingly and would likely need to be re-tuned to accommodate an independent tuning in ν.

C. Accuracy of our predictions and systematics
We now proceed to test our predicted parameters from Tab. III by performing the same measurement runs on the same configurations as the training was performed on, and comparing the determined states and the effective speed of light to the physical continuum states.
We will investigate both the combined and the individual results to illustrate the lack of measurable dependence on the pion mass within the precision of our determinations. A plot of our ability to reproduce the spectrum is shown in Fig. 4 and one can see that within our large errors there is good agreement, mostly this is due to the lack of precision of our lattice spacing.  left point of the pair) and the result from using all ensembles at the same value of β (the right point of a given pair). Our overall goal here was to to obtain c 2 = 1 to about 1%, which we achieve on most ensembles. For the dispersion relation, the single-ensemble tuning works a bit better than combining ensembles at each lattice spacing. However, the opposite will be true for the splitting discussed in the next few paragraphs. Note, that while there are some outliers, the situation is vastly improved compared to a standard Wilson-action (c B = c E = c sw , and r s = r t = ν = 1) as used for the charm quark in [17], and discussed in the next sub-section (Sec. III D).
To investigate residual discretisation effects within our approach, we take a look at the mass-splittings among our tuning states. In the left-hand-side panel of Fig. 6 we show the 1S hyperfine splitting between the J/ψ and η c mesons resulting from the neural net training with all ensembles at a given value of β. While the run parameters suggested by the neural net based on the training data do not lead to a particularly accurate 1S hyperfine splitting at coarse lattice spacing, this discrepancy does get smaller towards the continuum limit.
This suggests that our strategy of using the physical inputs for the tuning of the action parameters at each lattice spacing is working. In addition to the data from our final runs we also show the PDG value [39] as a magenta star, and an independent lattice determination that provides this splitting as a prediction [9]. For a plot of further results for this quantity see Fig. 8 of the same reference.
The right hand side of Fig. 6 shows a mass-splitting which is equal to the S-wave -P-wave splitting in the absence of a P-wave hyperfine splitting. In this quantity we expect a very mild heavy-quark dependence, as the physical S-wave -P-wave splittings in charmonium and bottomonium are almost the same. Just like for the hyperfine splitting, there is no obvious sign of issues with our tuning approach. Overall we tend to get somewhat smaller values for this splitting, although with large errors coming from our determination of h c . As our application for this action will mostly be qualitative studies of hadron-hadron scattering, we currently see no need to attempt to further improve the observed behaviour, which would require an improvement in statistical precision on our side combined with an improvement in the determination of the lattice spacing.

D. A tuning comparison on the same ensembles
In [17] a charm-quark tuning was performed that uses c E = c B = c SW and r s = ν = 1, varying κ c to achieve the physical continuum value of the D s -meson mass on the same ensembles as used in this work. This tuning treats the charm-quark on the same footing as the light and strange quarks by using the standard Wilson-clover action, one may well expect this to have significant discretisation effects as aM ηc becomes of order 1 for the coarse ensembles.  The hyperfine splitting in Fig. 7 shows a similar slope in a 2 between the two approaches although the D s -tuning lies systematically below that of our tuning and the physical hyperfine splitting. This is due to the ensembles used in this comparison lying at the SU(3) fsymmetric point and the D s -tuning compensating for the valence strange being lighter than its physical value. This has the side-effect that a precise charm-tuning via the D s is only valid at the physical point, or through extrapolations toward it when the strange-quark mass is unphysical.

IV. CONCLUSIONS AND OUTLOOK
We have shown that it is possible to non-perturbatively determine charm-quark action parameters reasonably precisely using machine learning techniques. This method is appealing as it doesn't rely on lattice perturbation theory or an underlying model to tune parameters.
Its main drawback is that a sufficiently-large number of measurement runs need to be made for training purposes, which could make it costly for large ensembles. Although, at the precision we are able to achieve, we see little pion-mass or finite-volume dependence of our tuning parameters. We also note that it appears impossible to exactly reproduce the chosen continuum states of charmonium at finite lattice spacing with this 5-parameter action, even though a good approximation can be achieved.
We see no reason why this approach couldn't be extended to, for example, b-quark physics using the same relativistic heavy-quark action. While residual discretisation effects are expected to be sizable for bottomonium [6], this approach should be very useful for refining some older predictions for exotic hadrons in the B-meson spectrum [12]. The same approach could equally-well be used to tune an effective action, such as NRQCD, provided one has enough continuum states to compare against that allow for a handle on the underlying simulation parameters. The method presented here, quite naturally, extends to even higherorder heavy-quark actions due to the inherent flexibility of neural-network fits.
In future works we will use this action for scattering studies using Lüscher's finite-volume method for heavy-light meson spectroscopy and charmonium spectroscopy, with the stochastic distillation technique [43,44]. In this regard a first step consists of testing the dispersion relations of heavy-light D and D s mesons in our approach. could not be utilised. On paper the arithmetic intensity of this charm action is about a factor of 2 more expensive than the usual Wilson action. Although an optimisation was used here to turn the underlying spinors into color-major, spin-minor order (OpenQCD-1.6 is spin-major, color-minor) to improve cache-coherence and perform fewer shuffling and register loading operations. It is likely that this charm-quark action is worse-conditioned than the usual Wilson action as it is found to have a larger clover term than c SW . Nevertheless, our implementation is roughly comparable to a valence strange quark inversion at physical m s [33] in OpenQCD using the DFL_SAP_GCR algorithm [45].
In initial investigations we found some deviations between charmonium correlators at large times due to the stopping condition of the L 2 -norm of the residual depending on the precision we used. To help resolve this we changed the stopping criteria to be the L ∞ -norm instead, which empirically behaved much better. S(x + r, t), when one goes to compute the contracted meson there are clearly double-sums and in the limit of R 2 → 3 L 2 2 we are performing two sums over the spatial volume, which becomes very expensive. Empirically, we find the cost in contractions scales quadratically with R 2 , and this sum quickly becomes the dominant cost of contractions as R 2 increases.
A more-efficient implementation would be to notice that the box-sink approach could be considered as a convolution with a particular step-function, f (x) = θ(r < R 2 ), and herein lies the connection to sink-smearing and a clue to efficient calculation of these propagators.
An approximation to this step-function could be a Gaussian with some width σ ∝ R 2 , f (x) = e − x 2 σ , (although in principle any arbitrary function could be used, for S-wave states something with rotational symmetry makes sense) and we apply the convolution using the usual (fast) Fourier transform prescription, where F is used to denote the Fourier transform, and likewise F −1 its inverse.
We have turned a somewhat expensive approximate convolution into a full convolution with an arbitrary function f (x), the cost of which are 12 × 12 complex spatial-volume FFTs per time-slice.
It is important to note that this approach only holds because of the Coulomb gaugefixed wall sources we use, as only in this case can one treat the links as identity matrices and hence completely ignore them and apply a continuous function. Such approaches have been known in the literature in the case of smearing either the source and/or the sink e.g.