Extending the average spectrum method: Grid points sampling and density averaging

Analytic continuation of imaginary time or frequency data to the real axis is a crucial step in extracting dynamical properties from quantum Monte Carlo simulations. The average spectrum method provides an elegant solution by integrating over all non-negative spectra weighted by how well they fit the data. In a recent paper, we found that discretizing the functional integral as in Feynman's path-integrals, does not have a well-defined continuum limit. Instead, the limit depends on the discretization grid whose choice may strongly bias the results. In this paper, we demonstrate that sampling the grid points, instead of keeping them fixed, also changes the functional integral limit and rather helps to overcome the bias considerably. We provide an efficient algorithm for doing the sampling and show how the density of the grid points acts now as a default model with a significantly reduced biasing effect. The remaining bias depends mainly on the width of the grid density, so we go one step further and average also over densities of different widths. For a certain class of densities, including Gaussian and exponential ones, this width averaging can be done analytically, eliminating the need to specify this parameter without introducing any computational overhead.

Analytic continuation of imaginary time or frequency data to the real axis is a crucial step in extracting dynamical properties from quantum Monte Carlo simulations. The average spectrum method provides an elegant solution by integrating over all non-negative spectra weighted by how well they fit the data. In a recent paper, we found that discretizing the functional integral as in Feynman's path-integrals, does not have a well-defined continuum limit. Instead, the limit depends on the discretization grid whose choice may strongly bias the results. In this paper, we demonstrate that sampling the grid points, instead of keeping them fixed, also changes the functional integral limit and rather helps to overcome the bias considerably. We provide an efficient algorithm for doing the sampling and show how the density of the grid points acts now as a default model with a significantly reduced biasing effect. The remaining bias depends mainly on the width of the grid density, so we go one step further and average also over densities of different widths. For a certain class of densities, including Gaussian and exponential ones, this width averaging can be done analytically, eliminating the need to specify this parameter without introducing any computational overhead.
Quantum Monte Carlo (QMC) simulations have become an indispensable tool for studying quantum manybody systems. They often compute Green or correlation functions on the imaginary-time axis or Matsubara frequencies, which then need to be analytically continued to the real axis to extract dynamical information about the system of interest. One important example of analytic continuation is obtaining the spectral function A(ω) at real frequencies from finite-temperature Green function values G(τ ) at imaginary times τ ∈ [0, β], where the two functions are related by and β = 1/T is the inverse temperature. Another example is determining the optical conductivity spectrum σ(ω) from the current-current correlation function Π(iω m ) evaluated at bosonic Matsubara frequencies ω n = 2mπ/β. The relation between the two reads Π(iω m ) = 2 π ∞ 0 dω ω 2 ω 2 m + ω 2 σ(ω) . (2) In general, the analytic continuation problem reduces to solving an integral equation. The difficulty is that, in the presence of noise, this is an inherently ill-posed problem. When evaluating the data on the imaginary axis, oscillations and sharp features in the spectrum get smoothed and noise gets damped due to the integration. This makes the inverse problem of reconstructing the details of the spectrum extremely challenging. Without regularization, small noise on the data leads to catastrophically large errors on the best data-fitting spectrum.
There are different approaches to tackle this problem including the maximum entropy method [1][2][3][4][5], the average spectrum method [6][7][8][9][10][11][12][13], Pade approximation techniques [14][15][16][17], and some recent machine-learning based approaches [18][19][20][21]. The most commonly used approach is the maximum entropy method (MaxEnt), which is rooted in Bayesian inference. It tries to find a spectrum by balancing the fit to the data and the entropy relative to some default model. This entropy term acts as a regularization that penalizes deviations from the featureless default model and thus suppresses rapid oscillations that otherwise would dominate the solution.
The average spectrum method (ASM) is an alternative Bayesian approach with the following premise: Since the data is not exact, all spectra that fit the data equallywell, up to the noise level on that data, should be considered equally. As a result, ASM integrates over all admissible spectra weighted by how well they fit the data. The average spectrum method makes no assumptions about the smoothness of the spectrum and any regularization comes from averaging only, which is expected to smooth out details not supported by the data: larger noise leads to more smoothing.
We continue here our work published in Ref. [13], which in the following will be referred to as ASM1. In ASM1, we showed that a naive discretization of the functional integral involved in the average spectrum method does not produce unique results. The results are biased by the discretization grid on which the spectrum is represented. We constructed the grids explicitly by mapping the real-frequency axis to the unit interval using a density function and then discretizing this interval using a regular grid. We found that the density function of the grid points plays the role of a default model while the number of grid points acts as a regularization parameter. We proposed a practical recipe for choosing a reliable grid by comparing the results of different grid densities and choosing the one with the least dependence on the number of grid points.
In the present paper, we generalize the average spec-trum method by releasing the grid points and sampling their position from a prior grid density. Although, we still need to specify a grid density and a number of points, we show that this the bias is significantly reduced and we observe that it depends mainly on the width rather than the shape of its density. The proper width can be chosen according to the same type of recipe used earlier for the fixed grid. We can, however, go further and extend the method to sample over a whole class of grid densities of variable widths. The method is then able to automatically relocate the grid points and concentrate them into the important region of the real-frequency axis. Test cases show that this width-sampling method gives good results resolving the features of the spectrum without the need for fine-tuning the grid.

A. Background
Mathematically, the analytic continuation problem can be formulated as a Fredholm integral equation of the first kind where the left-hand side g(y) represents QMC data, while the integral kernel K(y, x) is a continuous function known analytically. The goal is estimating the spectrum f (x), an integrable non-negative function. QMC provides noisy and incomplete samples of the data evaluated at a finite number of y-coordinates. Using the central limit theorem, one can assume that the exact data vector g has a Gaussian distribution with mean equal to the sample meanḡ and covariance matrix equal to the sample covariance matrix C. Then the likelihood of a spectrum f being the exact one is proportional to where g[f ] is the data corresponding to the spectrum f and χ 2 [f ] is its fit to the measured data. Maximizing the likelihood (or equivalently minimizing the fit χ 2 ) gives a spectrum dominated by diverging amplitudes that are extremely sensitive to the noise of the data. This illposedness constitutes the primary difficulty of the analytic continuation problem. The average spectrum method uses the likelihood alongside our prior knowledge about the non-negativity of the spectrum and computes a weighted average over all non-negative spectra as an estimate of the true spectrum Other exact prior knowledge, like sum rules, can be easily incorporated by restricting the averaging to the spectra satisfying them. Despite the apparent elegance of this functional integral formulation, we found in ASM1 that it is not a welldefined expression because the result depends on how the spectrum f (x) is discretized. In ASM1 the discretization was specified by a grid density function ρ(x) and the total number of grid points N . Using ρ(x), we mapped the domain of the variable x into a unit interval of an auxiliary variable z(x) := x xmin dx ρ(x ), discretized it uniformly with N points, and mapped the points back to x. We showed for such grids that the density ρ(x) plays the role of a default mode, while N plays the role of a regularization parameter.
To demonstrate this role of the grid density ρ(x) and simultaneously introduce some notation, we repeat the argument of appealing to symmetry when the data contains no information other than normalization to unity. In that case, the spectral integrals over different grid in- The widths of these intervals are inversely proportional to the grid density with x i ∈ I i being the grid point representing the i-th interval. Therefore, in the absence of data, the estimated mean values of the spectrum are equal to the grid density justifying calling it a default model.

B. Formalism
To spell out the dependence of the average spectrum method on the discretization grid explicitly, let us parameterize the spectrum by its grid points x and its integrals over the grid intervalsf , defined in (6). Together they are enough to determine the data (up to a discretization error) without knowing the details of the spectrum inside the grid intervals. This follows from the first meanvalue theorem for integrals, which states that for any non-negative integrable function f (x), there is a specific point Using this, we can approximate the data as following The approximation comes from using the grid points x i instead of the unknown optimal points x i , which depend on both the spectrum f (x) and the kernel K(x, y). The approximation error is proportional to the difference between the maximum and minimum values of the kernel K(x, y) inside each interval. Since the kernel is a continuous function of x, this error gets smaller, the smaller the intervals are and the smoother the kernel is. Using a fine enough grid, the error becomes so small that it is negligible in comparison to the noise on the data. Contrary to the data, the spectral integralsf do not provide sufficient information to determine the spectrum completely and some assumptions about its behavior inside the grid intervals are needed. We may also need to specify how to go from grid points x to the intervals edgesx. To stay as general as possible, we encode whatever assumptions we have in the object f (f , x; x), which maps a set of gird points x and integralsf into a nonnegative integrable function of the continuous variable x. A simple choice is using delta functions located at the grid points another is a piece-wise constant function The advantage of the above general formulation is that we do not need to know the exact form of f (f , x; x) when sampling the spectral functions, it becomes relevant only when averaging them, which will be discussed in Sec. II B. The functional integral of the average spectrum can now be written as a multidimensional integral over parameterized functions f (f , x; x) given the mean data vectorḡ, the covariance matrix C and the gird points x. The averaging is carried over all spectral integralsf on that grid, weighted by their fit where the dependence of the fit χ 2 on the grid points x is through the discretized kernel matrix K evaluated at these points II. RELEASING GRID POINTS The form of Eq. (13) spells out the grid dependence of ASM explicitly and is suggestive of a straightforward extension of the method. Instead of having the grid points fixed at regular intervals based on some density function ρ(x), let us sample them freely from this distribution and average the results (15) Although we still need to specify the density function ρ(x) and the number of grid points N as before, we expect the effect on the results to be weaker than in the fixedgrid scenario because the data is now allowed to influence the positions of the grid points during the sampling.
It is worth noting that sampling grid points has been done before in the context of the average spectrum method in Refs. [8,11]. In these papers, the spectrum was represented as a superposition of delta functions whose both weights and positions are sampled. However, it was implicitly assumed that sampling the positions is a technical detail and that the result would be the same as the typical average spectrum method with fixed positions [6,7,10]. This is probably due to the mistaken belief in the existence and uniqueness of the functional integral of Eq. (5). As shown in ASM1, this functional integral does not exist and the result depends on the discretization grid. In Sec. III we will show that the results, in fact, depend on whether the grid points are sampled and rather improves significantly. But before that, we will describe an efficient algorithm for performing the sampling and how the spectra of different grids are averaged.

A. Sampling algorithm
The multidimensional integral in Eq. (15) is evaluated using a Monte Carlo sampling algorithm. We start from some initial spectrum on an initial grid. In practice, we use the same grid as in the fixed-grid case, i.e., we choose the grid points at regular intervals based on the density ρ(x) as in ASM1. We also use the non-negative least squares (NNLS) solution on that grid as the initial spectrum [22].
The spectral integrals are then updated on the current grid using the blocked-mode sampling introduced in ASM1. Given the spectral integrals, the grid points are updated one at a time using the Metropolis-Hastings algorithm explained below. All samples off and x are stored during the sampling, and the average spectrum f ASM (ρ, N ; x) can be evaluated later at any point x by evaluating each sampled model f (f , x; x) at that point and averaging the results.
The Metropolis-Hastings algorithm for sampling grid points has the following acceptance ratio where q(x i → x i ) is the proposal distribution of moving grid point i from an old position x i to a new position x i .
As a proposal distribution, we use a Gaussian approximation of the data factor e −χ 2 /2 . It is derived by writing the data fit as a function of x i and expanding the kernel around x i . Keeping only terms to second-order in x i , we obtain where r := g − j K(x j )f j is the old residual vector and are kernel derivatives. By completing the squares, the data fit can be written in the following suggestive form where µ χ := x i + σ 2 χfi r T ∂K i is the mean of the Gaussian approximation and σ −2 Using this as a proposal probability gives in general a high acceptance rate because the data is typically the dominating factor in the acceptance ratio (16). However, when a grid point is far away from zero or its weight is very small, the Gaussian (18) is quite wide and the prior density ρ(x) becomes important. To account for such cases, we combine the data Gaussian with another one, centered at the old position x i , whose width we choose equal to the width of the prior density. The latter is used by itself in the proposal probability should the data fit have negative curvature so that the approximation (18) fails.
Samples are produced by iteratively sampling all the grid points x followed by sampling all the spectral integralsf . The grid points are sampled one after the other in a random order using the above algorithm. The spectral integrals are sampled using blocked-mode sampling with a random block size. The movement of grid points implies that the kernel matrix is changing, so the singular value decomposition (SVD) of its blocks should be recalculated in each iteration after all the grid points have been updated. This decomposition costs O(N 2 b M ) where N b is the block size and M is the data vector size. Since this is computationally expensive, we restrict the block size to a maximum value. For a fixed grid, this leads to more correlation between samples because the global updates using larger blocks are skipped. For a released grid, however, the effect is not as severe because the movement of the grid points compensates for the lack of these global updates. We found that a maximum block size of 32 provides a good balance between the cost of each sample and the correlation between samples. Due to grid points sampling and SVD, the computational cost of a single sample of the average spectrum method using released grid points is larger than that of the fixed grid. However, we found that the samples of the released-grid calculations are much less correlated than those on a fixed-grid so that the total cost is effectively similar. The execution time for any of the results presented later in this paper does not exceed 2 minutes on a typical modern laptop.

B. Binning and averaging
Averaging samples requires evaluating the sampled spectra on some fixed grid. We call this grid the binning grid to distinguish it from the sampled one x. Let us denote its intervals, the bins, as B i . The binning would be different depending on the mapping f (f , x; x). Assuming the delta functions representation of Eq. (11), the i-th bin average is computed as where the superscript k is indexing the samples and L is the total number of samples. Alternatively, we can assume a constant value inside each interval I k j of the k-th grid sample, as done in Eq. (12). This implies that the corresponding spectral integral f k j should be split proportionally among the bins that intersects this interval This type of binning can be thought of as a linear interpolation of (19) and thus leads to an average with less noise from binning. Nevertheless, whatever binning we use, the averages are similar when using reasonably large grid sizes N . For simplicity, we typically use the delta binning.
Note that the bin size also affects the statistical error of its average. Larger bins contain more sampled grid points and thus have lower fluctuations and less noisy averages. Therefore, in practice we use the following binning, which gives roughly equal error bars across the bins: aggregate all the grid samples and choose the bins such that each bin contains roughly the same number of grid points. When we want to compare with fixed-grid ASM, we use, however, the fixed grid for binning, making comparisons easier. Also note that the error-bars of a binning grid will not change when doubling the number of grid points and halving the number of samples.

III. DENSITY DEPENDENCE
To study the effect of releasing the grid points, we choose a test case first introduced in Ref. [23], which we studied further using fixed-grids in ASM1. We want to reconstruct an optical conductivity given by p=0,±1 W |p| 1 + ((ω+sgn(p)ε |p| )/Γ |p| ) 2 (21) where the sum has three terms: a peak of weight W 0 = 0.3 and width Γ 0 = 0.6 centered at zero, and two peaks of weight W 1 = 0.2 and width Γ 1 = 1.2 centered at ω = ±ε 1 = ±3. All terms are multiplied by a damping factor with Γ e = 4 for a faster decay at large frequencies. To get the necessary data for analytic continuation, we compute the imaginary-frequency correlation function Π(iω m ) analytically using Eq. (2) on the first 60 Matsubara frequencies ω m = 2mπ/β with inverse temperature β = 15. We then add relative Gaussian noise with a standard deviation of 10 −3 to simulate the noise in real QMC data.
In Fig. 1, we compare ASM results using fixed and released grid points. The same grid densities and numbers of grid points were used in both cases: uniform grid densities with increasing cutoffs: 8, 16, 32, and 64 and correspondingly increasing number of points N = 32, 64, 128, and 256. It is clear that using a fixed uniform grid biases the results significantly, leading to more pronounced spurious features as the cutoff increases. This completely disappears when the grid points are released so that the cutoff has negligible effect on the result. In Fig. 2, we also show results for different grid densities with comparable widths. Also here, by releasing the grid points, the influence of the shape of the grid density is reduced significantly. This is understandable as now the grid points can move to the region where the spectrum is concentrated, allowing for the data to override the prior information From this we might conclude that sampling the grid points solves the bias problem. We know, however, that in the absence of data except for a sum rule on the spectrum, averaging spectra on any grid just gives a result proportional to the grid density function ρ(x). Thus, also released-grid ASM must have a default model bias.
To see this bias clearly, we consider a somewhat contrived test case: we seek to recover a Gaussian spectral function of width 0.5 from Green function data computed using Eq. (1). The data are evaluated on 60 τ -points equally spaced in the interval [0, β] with β = 50. As before, we add relative Gaussian noise with a standard deviation of 10 −3 . One might think that reconstructing such a featureless Gaussian should be a trivial and boring task. However, it turns out that avoiding spurious features in this setting is more challenging than anticipated. In Fig. 3, we show ASM results using fixed and released grids. We use a Gaussian grid density with different widths: 0.5, 1.0, 2.0. When the default model (grid density) width equals the width of the exact spectrum 0.5, both methods give perfect results as expected. As the width increases, spurious features start to develop quickly. Although these features are milder for releasedgrid than for fixed-grid ASM, they are still clearly visi-   Fig. 3. The results are obtained using Bryan's method implemented in Ref. [24]. Other methods for choosing the regularization parameter, i.e., classic and historic MaxEnt give indistinguishable results.
ble, in contrast with the earlier optical conductivity case. This indicates that the data here is weaker in forcing the grid points to stay in the frequency region of interest. A practical solution for choosing the best width is to use the same criterion as the one employed in choosing the fixed-grid in ASM1: we choose the grid density with the best fit to the data, which in this case would single out the width 0.5. Still, this approach requires considering a number of grid densities ρ(x) of different widths and choosing the best one. In the next section, we describe a method that avoids choosing a specific width altogether. It is worth mentioning that the bias caused by the default model is not unique to ASM. Also MaxEnt, which is known for its smooth results, produces in this case spurious features when the width of the default model is larger than it is supposed to be. They are, however, somewhat milder than those of the two flavors of ASM (see Fig. 4).

IV. AVERAGING WIDTH OF GRID DENSITY
Instead of finding the width giving the best fit to the data, we propose here an alternative that is more in the spirit of the average spectrum method: Instead of choosing some width a priori, we average over the width parameter of the grid density. Integrating over this parameter in Eq. (15) requires evaluating the integral One way could be sampling the width directly using Metropolis-Hasting, but that would be inefficient: updating the width would change the prior probabilities for all the grid points. So for a large number of grid points, one would be forced to take very small updates of the width to achieve a reasonable acceptance rate. The more grid points, the less efficient the sampling is. There is, however, a much more elegant and efficient solution.
We notice that unlike the grid points x and spectral integralsf , the width parameter w is not directly related to the data. Therefore, the above expression can be rearranged such that the integral over the width is a function of the grid points only We can then perform the width integral P (x) analytically, e.g. for the family of density functions which are known as the exponential power distributions and include the Gaussian distribution (q = 2), the exponential distribution (q = 1) and the uniform distribution (q → ∞). The integral over the width then becomes where the L q -norm [25] is defined by We use this norm to make the following change of variable so that the integral over the new variable z becomes independent of the grid points, leaving us with From this, we see that weighting the grid points vector by a power of its L q -norm is equivalent to using a q-th power exponential function as a grid density and integrating flat over the width parameter. More specifically, using the L 2 -norm is equivalent to using a Gaussian grid density and integrating over its standard deviation. Similarly, using the L 1 -norm is equivalent to using an exponential density and integrating over the scale parameter while using the L ∞ -norm corresponds to using a uniform density and integrating over the cutoff. Note that using a reciprocal prior in the width integral, as appropriate for scale parameters [26, p. 109], instead of a flat one leads to a very similar result where the power of the norm is N −2 instead of N −1. The difference between the two choices is insignificant in practice. Substituting Eq. (28) back into the original expression Eq. (23) and absorbing the integral over z in the overall normalization constant we get the formal expression for this extended version of the average spectrum method (29) We can easily adapt the sampling algorithm for releasedgrid ASM discussed in Sec. II A to evaluate this expression: We simply replace the prior density function in the acceptance ratio by the power ratio of the norms of grid samples. We only need to choose a reasonable value for the width parameter w of the proposal distribution which can be easily estimated, e.g., from the non-negative least squares (NNLS) solution [22]. This value need not be very close to the width of the exact model. We have run calculations where it was an order of magnitude off. Obviously, its choice only affects the acceptance ratio of grid point sampling.
In Fig. 5, we show the results of width-sampling ASM for the same test case as in Fig. 3. We did the calculations using both exponential (L 1 -norm) and Gaussian grid densities (L 2 -norm). The results of both calculations are in excellent agreement with the exact spectrum without the need for fine-tuning the exact value of the width parameter. Width-sampling ASM produces even better agreement than MaxEnt (Fig. 4).  Optical conductivity σ(ω) obtained using widthsampling ASM for the same problem as in Figs. 1 and 2. An exponential (q = 1) and a Gaussian grid density (q = 2) with 512 grid points is used. The samples are binned on a uniform grid.
In Fig. 6, we show the histogram of the scaled L 2norm of grid samples from Gaussian densities (q = 2). We calculated the scaled norm of a grid sample asŵ := i x 2 i /N . Notice how the method automatically finds the optimal value of 0.5 and averages around it to the extent allowed by the noise in the data.
For completeness, we also report the effect of width averaging on the optical conductivity test case. The earlier results with released grid points were already very good and showed no noticable dependence on the width, so it does not come as a surprise that the results with width averaging are as good, as can be seen from Fig. 7.
Note that, unlike the fixed-and released-grid average spectrum method, width-sampling ASM does not have a default model: in the absence of data, except for a sum rule, the method does not give a result. This is by design: For convergence, width-averaging requires the data to provide information about the width. Having the least informative prior, width-sampling ASM thus is the least biased of the methods discussed here as may be seen by comparing Figs. 3 and 4 with Fig. 5.

V. GRID SIZE DEPENDENCE
In ASM1, we discussed the dependence of the fixedgrid average spectrum method on the number of grid points and found that it plays the role of a regularization parameter: as the number of grid points increases, the results change and get more biased towards the grid density. The same behavior still applies to the releasedgrid ASM but is much weaker. This is in line with the overall reduction in the dependence on the grid density we saw earlier. For example, we compare in Fig. 8 the N -dependence of the optical conductivity test case using fixed and released grid points. While the fixed-grid calculations show a significant variation, released-grid ASM results hardly change with the number of grid points except for small differences near ω = 0. We know that using a much larger number of grid points, even the released-grid method will eventually show a more pronounced dependence on the grid size. We did not, however, observe this dependence in this test case for any reasonable value of N . To see it in released-gird ASM or its width-sampling extension, we need to look at yet another case where the exact spectrum is significantly different from the singlypeaked default models we typically use.
To this end, let us take a spectral function composed of four Gaussian peaks symmetric around zero. Two of the peaks are narrow with width 0.1 and weight 0.15 and located at frequencies ±0.5. The other two are wider with width 0.5 and weight 0.35 and located further out at ±2. As with the previous spectral function, we generate the green function data using Eq. (1) on 60 τ -points for β = 50. We add relative Gaussian noise with a standard deviation of 10 −2 .
In the top panel of Fig. 9, we plot the results of widthsampling ASM using a Gaussian grid density (q = 2) and Dependence of the optical conductivity σ(ω) obtained using fixed-grid ASM (top) and released-grid ASM (bottom) on the grid size N (label). A Lorentzian grid density with parameter γ = 2.5 is used. For ease of comparison, the samples of the released calculations are binned and averaged on the grids of the corresponding fixed calculations. different grid sizes. As the grid size increases, the results get smoother and the peaks get wider demonstrating the regularizing effect of the grid size even in the released-grid case. The behavior is similar using a fixed width. A better understanding is gained by checking the fit histograms of the sampled spectra in the bottom panel. As the grid size increases beyond N = 128, the fit histograms, introduced in ASM1, shift to the right and get wider showing a systematic bias towards spectra of worse fits. On the other end, we observed that when the grid size is lower than N = 16, the data fit gets extremely bad due to the large discretization error. We did not include results using such low grid sizes to avoid cluttering the plots.
The qualitative behavior of the fit in the average spectrum methods is depicted in Fig. 10. For very low grid sizes, the discretization error dominates, leading to a very bad fit. Once the grid size is large enough such that the discretization error becomes negligible relative to the noise on the data, increasing the grid size leads to more bias and worse fit. This dependence on the grid size starts out slowly and then accelerates till the average spectrum approaches the default model (grid density) for very large N . Therefore, a weak dependence on the grid size indicates a better grid density, and when the density matches the exact model, the sweet spot extends to infinity. This behavior applies equally to both the fixed and the released-grid ASM. However, there are two main differences. The discretization error for released-gird ASM is normally less than that of the fixed grid. Also, the region of low N -dependence extends further because the bias towards the grid density is much reduced compared to fixed-grid ASM. In general, the results of released-grid ASM show weak dependence for the typical grid sizes we use. A good recipe for choosing N is to use the largest value for which the fit does not get substantially worse. For the test case of Fig. 9 this would be N = 128.
It may be worth mentioning that we tried sampling the grid size using a flat prior. We found that the method chooses a high number of points when the default model is highly compatible with the data as in Fig. 8. But when the default model does not fit the data very well, the method chooses a very low number of points. This happened in cases like the one shown in Fig. 9. In the end, we decided to keep the grid size as an independent parameter of the method that we use for checking the reliability of the results: A strong dependence on the grid size indicates a strong bias towards the default model and away from data.

VI. SUMMARY AND DISCUSSION
The results of the average spectrum method can strongly depend on the discretization grid. One approach for handling this dependence is choosing a grid density that fits the data decently. We showed that sampling the grid points helps to reduce the bias dramatically and thus obviates, in many cases, the need to search for the best grid. But in some cases, the width of the grid density still influences the results noticeably. We, therefore, went one step further and averaged over the results for grids of different widths. Remarkably, for a large family of grid densities, we could perform this additional sampling analytically, incurring only a negligible computational overhead.
The approach used here for handling the grid dependence led us to the following hierarchy of average spectrum methods where each method extends the previous one by averaging over the relevant parameters dw dx .
It is obvious that the approach can be taken further by varying over other parameters of the grid density. Had we, for example, observed a strong dependence on the functional form of the grid density, e.g. the parameter q in Eq. (29), we would have average the results of different such values using suitable weights. From this perspective, the functional-integral based average spectrum method can be seen as a general framework for obtaining data-compatible spectra in the context of analytic continuation and similar spectral reconstruction problems. Given a certain parametrization of the spectrum, the most straightforward solution is estimating these parameters by fitting the data. In many situations, this may be enough to single out a small region of the parameter space with tolerable variations in the spectrum. When observing a noticeable sensitivity of the results to a parameter, one should consider aver-aging over the results of different values of this parameter to smooth out details not supported by the data. In light of this, the fixed-gird method of ASM1 can be seen as an extension of the non-negative least squares method (NNLS), where instead of finding the spectral integrals that fits the data best, it averages over, giving equal weights to all spectral integrals fitting the data equally.
In this paper, we could discuss only a few test cases, each of which was introduced for illustrating specific aspects of the average spectrum methods. Further test cases and applications of theses extensions to real-world problems are discussed in Ref. [27]. To encourage further testing of the ASM methods and their use for practical problems, efficient web-based implementations of the different flavors of ASM are accessible at [28].