Calculating the many-body density of states on a digital quantum computer

Quantum statistical mechanics allows us to extract thermodynamic information from a microscopic description of a many-body system. A key step is the calculation of the density of states, from which the partition function and all finite-temperature equilibrium thermodynamic quantities can be calculated. In this work, we devise and implement a quantum algorithm to perform an estimation of the density of states on a digital quantum computer which is inspired by the kernel polynomial method. Classically, the kernel polynomial method allows to sample spectral functions via a Chebyshev polynomial expansion. Our algorithm computes moments of the expansion on quantum hardware using a combination of random state preparation for stochastic trace evaluation and a controlled unitary operator. We use our algorithm to estimate the density of states of a non-integrable Hamiltonian on the Quantinuum H1-1 trapped ion chip for a controlled register of 18 qubits. This not only represents a state-of-the-art calculation of thermal properties of a many-body system on quantum hardware, but also exploits the controlled unitary evolution of a many-qubit register on an unprecedented scale.


I. INTRODUCTION
The idea of using one quantum system to efficiently simulate another one was the vision of Feynman over 40 years ago [1].This paradigm is known as quantum simulation [2][3][4][5] and is expected to be one of the first real applications of the current generation of quantum computers [6].In particular, recent progress has been made in simulating the dynamics of strongly correlated many-body systems on current devices [7][8][9][10][11], albeit with systems which are still too small to compete with calculations on classical super-computing architectures.The hope is that the achievable system sizes will eventually become large enough to surpass what is classically possible.
In terms of using quantum simulators to extract eigen energies of many-body systems, early ideas include algorithms based on quantum Fourier transform [12] such as quantum phase estimation [13][14][15] and adiabatic state preparation [16].The development of algorithms for the extraction of ground-state energies is central for the promise of being able to perform quantum chemistry and materials simulations on quantum computers [17][18][19][20] and ground state energy calculation is a target of many variational quantum algorithms [21,22].Results for finite temperature and excited states are more scarce.However recent proposals to measure finite temperature expectation values on hardware include sampling [23][24][25] and imaginary time evolution [26] and more recently algorithms which may have potential for computing microcanonical expectation values were proposed in [27].
The more general idea of using quantum computers to do statistical mechanics is a topic which is gaining traction [27,28].In this work we focus on developing an algorithm that gives a coarse-grained estimate of the density of states (DOS) based on the classical kernel polynomial method (KPM) [29].The KPM provides a reconstruction of a spectral function by means of a Chebyshev polynomial expansion, weighted by suitable kernels to damp the Gibbs oscillations that occur due to finite series truncation.Chebyshev moments are computed iteratively by applying functions of the Hamiltonian on some initial state.This step is a challenge to implement on quantum hardware.
Block encoding of a Hamiltonian is deeply connected with the Chebyshev polynomials [30], implementing the Hamiltonian as a quantum walk as exploited in the context of the KPM in [31] and more generally to estimate physical properties in [32,33].An alternative is to compute the Chebyshev moments iteratively in a variational quantum algorithm [34] or otherwise overcoming the problem of implementing the Chebyshev polynomials using suitably defined Fourier ones [35,36].
In this work we devise a hybrid algorithm which uses a combination of pseudo-random state preparation, Hadamard test and Suzuki-Trotter (ST) decomposition [37] to evaluate Chebyshev moments.These moments are then used in the standard KPM expansion.We use an arc-cosine approximation of the Hamiltonian to implement Chebyshev polynomials from standard ST decomposition and implement our algorithm on the Quantinuum H1-1 trapped ion quantum simulator [38].We were able to approximate the DOS of a non-integrable spin chain for up to 18 qubits using a single ancillary qubit.Our simulations represent one of the first explorations of the use of near term quantum computers for calculations in statistical mechanics.
In Section II we introduce the KPM method and discuss its use in the context of statistical mechanics.We discuss how to compute the DOS and how a pseudo random state can be used for stochastic trace estimation.In Section III we explain the quantum algorithm for extracting Chebyshev polynomials and discuss the subroutines for random state preparation and implementing the arccosine approximation of the Hamiltonian.In Section IV we then introduce the model we simulate on hardware and the corresponding gate decomposition used to implement the controlled unitary.We display our results for estimations of DOS computed using our hybrid algorithm for systems sizes of 12 and 18 qubits.

A. Density of states
In this section we give an overview of the classical KPM and discuss how it is used to calculate the DOS [29].For a system of L qubits with Hamiltonian Ĥ, the DOS is defined as where we denote the energy eigenvalues by E k and the corresponding eigenvectors by |k , i.e.Ĥ |k = E k |k .The DOS gives access to all thermodynamic properties: in particular, the canonical partition function can be evaluated as Differentiation of Z yields any desired thermodynamic quantity: for example, the energy and the entropy is the free energy.
In order to extract the DOS from a system of size L one would typically need exact diagonalization, which requires memory resources scaling as O(2 3L ).In contrast, the KPM described in the following section is able to approximate with memory scaling only as O(2 L ) (combined with stochastic evaluation of the trace, analysed in Section II D).

B. Kernel polynomial method
Consider some function f (x) defined on the interval x ∈ [−1 , 1].The KPM provides an optimal approximation of this function by a finite series of M Chebyshev polynomials.Mathematically, it is defined as where γ M m are the kernel coefficients used to damp Gibbs oscillations, T m are the Chebyshev polynomials, and µ m are the Chebyshev moments.While a more detailed discussion can be found in Appendix A, the polynomials are generally defined as For example, T 0 (x) = 1, T 1 (x) = x, and all higher Chebyshev polynomials obey the recursion relation The KPM expansion Eq. ( 4) is thus reconstructed by computing the corresponding Chebyshev moments The KPM can be adapted to estimate general spectral functions involving a given quantum mechanical Hamiltonian.In this work, we will be interested specifically in using it to get an estimate for the DOS.We note that recently the KPM has been implemented using tensor network techniques [39], and was used to study thermalisation [40] as well as to extract the DOS of simple lattice gauge theories [41].One clear advantage of the KPM is that it does not suffer from the sign problem that is synonymous with Monte-Carlo simulations.

C. Chebyshev moments of the DOS
Let us discuss the key steps in computing the KPM approximation of the DOS, g(E).Since the domain of Chebyshev polynomials is [−1, 1], Ĥ must have a spectral norm || Ĥ|| ≤ 1.If not, Ĥ can be normalised as with where E max and E min are the largest and smallest eigenvalues of Ĥ.Additionally, a small cutoff ε is introduced to avoid stability issues that can arise close to the boundaries of the spectrum.After this rescaling is performed, the expression of the moments µ m becomes

D. Stochastic trace evaluation
The first task in the extraction of Chebyshev moments of the KPM is the efficient estimation of the trace of an operator X acting on L qubits, as in Eq. (10).For example computational complexity of determining an element on the main diagonal, X ii := i| X |i , is O(2 L ), and therefore determining all elements on the main diagonal requires O(2 2L ) operations [42].A common alternative method to this computational procedure is stochastic trace estimation.The main idea is to estimate X on a set of randomly chosen states.Let {|r } be a set of R random states on L qubits: so that the stochastic estimate will be If ∀r the statistical average is 0, i.e.
(where the • average is over i), and if then the variance in the estimate of Tr[ X] would be (for more details see the overview in [29]).Therefore, the relative error will scale as O 1/ √ R2 L , which means that fewer random states will be needed to achieve a fixed level of precision as the size of the system increases.Note that if the coefficients of the random states are distributed as a Gaussian, then the variance of Θ only depends on Tr[ X2 ].
The stochastic trace is used in conjunction with the recursion relation Eq. (6) to estimate the Chebyshev moments.Starting with a random state |r = |r 0 , one can define and recursively generate a series of M vectors The moments can then be obtained from the overlap r m |r = r|T m ( Ĥ)|r .One potential issue with computing the moments in this way is that errors make the precision of m-th moment dependent on errors from the evaluation of the previous ones.As mentioned earlier, the computation of the µ m is the major bottleneck of the classical KPM method, limiting the size of the systems it can be applied to.In the next section we describe our algorithm to compute the m-th Chebyshev moment on a quantum computer.with ĤK connected to the arc-cosine expansion defined by Eq. ( 38) which can be then connected to the Chebyshev polynomials of the Hamiltonian of order m.
Fig. 1 shows the circuit we use to compute Chebyshev moments on quantum hardware.Our proposal is reminiscent of the DQC1 protocol [43], which is a sub-universal computational paradigm that requires L + 1 qubits, an ancillary one (that will act as a control) and L qubits initialised in a maximally mixed state.The circuit consists of a Hadamard test where a controlled unitary is applied to L qubits and then the control qubit is measured.We simulate the maximally mixed state by a randomisation procedure.We now discuss the two main parts of the circuit in detail.Section III A analyses the random circuit we use to mimic the identity register and Section III B the application of the controlled unitary for the extraction of Chebyshev moments.

A. Stochastic trace evaluation via state randomisation
The generation of random states on quantum computers has garnered considerable recent attention recently due to their important role in benchmarking [44].However, producing uniformly random states that sample the Haar distribution is inefficient in the sense that the number of gates required scales exponentially with the register size [45,46].T-designs are a type of circuit that replicate moments of the Haar distribution up to order T or lower [47].They offer some improvement in terms of 1 st 2 nd 3 rd 4 th 5 th 6 th FIG. 2. Shown in this figure is the 6-layer randomised circuit used for trace evaluation in the L = 12 (the blue box on L-1 qubit register in Fig. 1).(a) A representation that illustrates the connectivity of the circuit, the grey dots represent the qubits, and the colored lines are the 2-qubit gates where they are numbered according to their application.Note that this particular structure is chosen to exploit the all to all connectivity of the Quantinuum H1-1 device [38].(b) Gate decomposition of the random circuit.The colored lines represent ẐZ(π/2) gates, and the dark gray squares are the singlequbit rotations, randomly selected from X(π/2), Ŷ (π/4) and Ẑ(π/2).The latter two will be removed by the compilation since they commute with ẐZ(π/2) and anti-commute with the other operators.Therefore, their effect is to add two more types of single-qubit rotations.
gate efficiency, with the number of gates required scaling polynomially with the number of qubits.Nonetheless, T-designs are still costly and may be overly random for specific purposes.Here we take an alternative approach based on pseudorandom state generation [46].This involves the generation of states that do not uniformly sample the Haar distribution [48], but still possess the desired properties such as Eqs.( 13) and ( 14).This method has been shown to generate states that are sufficiently random in an efficient way.
Using pseudorandom states to stochastically evaluate the trace has been proposed in various papers [35,[49][50][51][52], and recently used on quantum hardware to extract hightemperature transport exponents [11].In order to generate pseudo-random states, one can use a circuit composed of alternating layers of 2-qubit gates and layers with random single qubit rotations, as suggested by Ref. [46] and adopted in Refs.[11,44,49] with small variations.This random composition has become a widely accepted method for generating this type of random state.In Ref. [49] the random state is aimed at generation on a quantum computer where the qubits are connected in a ring geometry.Layers of 2-qubit gates connecting even-odd and odd-even qubits are alternated and in between them, there are layers of single-qubit gates randomly chosen among { X(π/2), Ŷ (π/2), Ẑ(π/4)} so that the same rotation is not applied to the same qubit sequentially.
In our case, we focus on a variation of this procedure that takes advantage of the all-to-all connectivity and of the specific set of elementary gates that can be implemented directly on the Quantinuum H1-1 trapped-ionbased quantum computer (see Fig. 2).The gate set of the device Quantinuum H1-1 device includes The device that we will use also supports parallelisation (using Quantum charge-coupled device (QCCD) architecture with five parallel gate zones [53]).Inspired by existing techniques to create shallow randomisers, we change the single qubit rotations to be chosen from { X(π/2), Ŷ (π/4), Ẑ(π/2)}.Note that the former two rotations can be applied as a single Û1q (θ, φ) gate, while the latter can be implemented virtually [54].The 2-qubit gate will now be a ẐZ(π/2) with a different connectivity.Each ẐZ(π/2) will connect the qubits 2i and (2i + p) mod L with i = 0, . . ., L/2 and p an odd number called a jump.The jumps are chosen so that each qubit is narrowly connected to the other.For instance, as shown in Fig. 2, the jumps of the first half of the layers can be chosen with p = −(−1) (2s + 1) with s ∈ N 0 and the index of the layer, and with p = −p − for the second half of the layers.In particular, Fig. 2 shows the random compiler with L = 12 and s = 1.Notice that s = 0 reproduces the same pattern of 2-qubit gates as in [49].
Following Ref. [49], we quantify the randomising effect of our circuit by checking how well the half-system entanglement entropy converges to the Page value [55], which is the entanglement entropy for a typical Haarrandom state.The von Neumann entanglement entropy of a state ρ r = |r r| on a space divided in subspaces H a and H b is: As shown in [55], this value converges to the Page value log(dim In Fig. 3, we present a classical simulation comparing the convergence rates to the Page value of three different approaches: Ric., which is the approach suggested by Richter et al. [49]; Par., the approach used in this work; and Seq., a variation of Par.where the 2-qubit gates are applied as a chain.In this latter method, the layers of 2-qubit gates are composed of a sequence of gates, each of these has support overlapping over half of the previous and half of the succeeding gate.Therefore, we ), the one used in this work (Par.)(see Fig. 2) and one where the 2-qubit gates are applied sequentially (Seq.).All three circuits lead to fast convergence to a maximally bi-partite entangled state.We note that the relative error in Seq.seems to be most favorable but we have used Par. on the hardware here because less layers are required to get a good estimate of the trace.
will compare these three methods in terms of the number of 2-qubit gates used instead of the number of layers.Although this last constraint limits the implementation on Quantinuum's hardware it is helpful in analysing the three different random compilers.In fact, while Par. is not the fastest at converging to the Page value, it performs best at estimating the trace, as shown in Fig. 4. In contrast, the states produced by Seq.converge to the Page value fastest, but they are not as effective for trace estimation as those built with Par.This suggests that the entanglement entropy alone is not sufficient to determine a suitable state for stochastic trace estimation.In fact, we have observed that, as the entropy converges to the Page value, the fourth moment of the coefficients c ri converges to 2 (as required by Eq. ( 15)).This suggests that the states become Gaussian-distributed at the same rate as they approach the Page value.

B. Implementing the Chebyshev polynomials
Simulating the unitary evolution operator or other functions of the Hamiltonian are central tasks in the field of quantum simulation.In recent years there has been significant progress, with various approximations, like qDrift [56][57][58][59], LCU [60], Taylor expansion [61,62] and qubitization [63,64].The latter relies on the sequential application of two oracles: select ( Ŝ) and prepare ( P ), defined as follows.Consider a normalised Hamiltonian that can be decomposed into a sum of unitary The select operator is defined as where (a) denotes an ancillary Hilbert space comprising a qubits, where a ≥ log 2 P [64].The prepare operator acts as These operators can be combined to obtain a block encoding of the Hamiltonian as The idea behind qubitization is to exploit the block encoding of Ĥ in a quantum walk to generate any function of Ĥ.The walk operator is defined as where the operator R(a) acts as a reflection about the |P state.The walk operator can be decomposed as where E k are the eigenvalues (with respective eigenstates |k ) of Ĥ and each term of the direct sum acts on the subspace H k generated by |φ k := |P |k and its orthogonal state Likewise the Ŷ(k) operator will act as a Pauli Ŷ operator on this subspace.Eq. ( 25) is also useful to understand how Ŵ is isomorphic to that is the minimal block encoding of Ĥ.A fundamental feature of Ŵ , on which the efficiency of qubitization is based, is that repeating it m times and projecting it on |P generates the Chebyshev polynomials: Let us now consider the smallest decomposition of Ĥ in unitaries, where Eq. ( 25) becomes easier to interpet.Given any Hamiltonian Ĥ, there is a unitary operator ÛH such that where It can be observed that Û 2 H = Î as would seem to be required by Eq. ( 20); however, we notice that this condition can be relaxed by introducing a new walk operator V := R(a) Ŝ † and alternating it with Ŵ .As shown in detail in Appendix C, an alternative decomposition to Eq. ( 28) is For this decomposition of Ĥ, the prepare operator simplifies to while the select operator becomes The reflection operator becomes R(a) = 2|+ +|− Î (a) = X(a) (33) and the walk operators can be constructed as The isomorphism between Ŵ and Eq. ( 26) is now clear since it results in mapping the ancilla from Ẑ to Ŷ .The quantum walk now generates Finally, we notice that Therefore, the iteration of the select operator already generates the desirable walk: We now want to utilise the decomposition in order to exploit it for the evaluation of the DOS via our KPM inspired algorithm.The KPM generally works better at the centre of the Hamiltonian spectrum, where exponentially many states reside.Close to the spectral edges it can become unstable, particularly at smaller system size, and hence less reliable [29].Exploiting the fact that exponentially many eigenstates are at the centre of the spectrum, one can expand the arccos( Ĥ) around E k = 0: It follows that the select operator (now depending K) becomes and The select operator now can be implemented with just two controlled operators as In what follows we choose to implement the ÛH K using the standard ST decomposition since this decomposition is already known to require adequate resources on the hardware platform [65,66].A key limitation of the ST formulae in general is that the only operator it can implement is the evolution operator.However, the combination of the arc-cosine approximation, ST decomposition and qubitization allow us to approximate polynomials of Ĥ which is sufficient for our goal.We leave the explicit qubitization of ÛH for future work.From Eq. ( 40) we see that the Chebyshev moments can be written as Hence, a Hadamard test on top of the random state will correctly implement this operation, as depicted in Fig. 1.Note that, as shown in Fig. 1, a final Ẑ(−β) rotation on the control qubit is needed to implement the component of ĤK that is proportional to the identity.For K = 0 this rotation is simply i.e. β = −a/b, where a and b are defined in Eq. ( 8).Note that since the random state is used to simulate the maximally mixed state, the circuit of Fig. 1 is a variant of a DQC1.

A. Example Model to be simulated
As a test model to implement our algorithm on quantum hardware we choose the non-integrable spin- 1  2 XYZ Heisenberg chain with a staggered interaction along the Z direction: By choosing J z = Λ, we can add a small advantage from the perspective of gate count, without adding any symmetry to the Hamiltonian.Indeed, the exponentiation of Ĥ (Eq.( 44)) can be split into two non-commuting terms: interactions between even-odd spins and between odd-even spins.The latter, when Λ = J z , have non-zero couplings only along the X and Y direction.Then, evenodd terms require a gate composition of Fig. 5.(b), while odd-even terms require the even shallower Fig. 5.(a).

B. Hardware results
We are now in a position to test our quantum algorithm on physical hardware using the model described FIG. 5. Gate decompositions used to implement the controlled unitary (i.e. the orange gate of Fig. 1) for the Hamiltonian of Eq. ( 44).The controlled ST decomposition of this Hamiltonian requires a controlled exp iα X ⊗ X + iβ Ŷ ⊗ Ŷ for the odd-even spin pairs (a) and a controlled exp iα X ⊗ X + iβ Ŷ ⊗ Ŷ + iγ Ẑ ⊗ Ẑ for the even-odd spin pairs (b), with α, β, γ ∈ C. The hatched area in (a) represents the part that for the first ST step will naturally mix with the last layer of the random circuit, without requiring additional gates; the hatched area of (b) in the last ST step can be neglected.The (c) and (d ) respectively represent the conversion of the C-Ẑ(2α) and C-X(2α) in the native gate set currently supported by the System Model H1-1.Note that this particular gate decomposition was chosen in order to require as few gates as possible on the control qubit, to enhance parallelisation and reduce the probability of errors occurring.above.To evaluate the effectiveness of the method at each stage of approximation, we conducted quantum and classical simulations of the chain defined by Eq. ( 44) at L = 12 and L = 18.In Fig. 6, we compare the Chebyshev moments obtained by various methods: by analytical calculation (using eigenvalues obtained from ED), by approximating the arccos function, by including the ST approximation, and by simulating the circuit of Fig. 1 with and without noise.To generate the random state, we implemented a series of gates on the L-qubit register via the Par.method, as described in Section III A. The Chebyshev moments were computed using the methods described in Section III B, taking the arccosine expansion to order K = 0, i.e. arccos( Ĥ) ≈ π/2 − Ĥ, and using a single ST step.Remarkably, we find that even with these parameters it is possible to achieve an appreciable precision.We found that hardware errors overcome any attainable improvements from using arccosine expansions with K > 0 or more than a single ST step.More detail on the error analysis is given in Appendix D.
We found that for L = 12, the KPM expansion reaches a sufficient degree of convergence to ED values at M ∼ 25.Due to resource limitations on the Quantinuum device we used only four different random states, running 1000 shots for each one of these.We simulated up to m = 7 on the quantum hardware, and approximate µ m ≈ 0 for m > 7, since these moments are too small to be distinguished from zero with our resources (see Appendix D).  44) at L = 12 with Jx = 1, Jy = 1/3, Jz = Λ = 1/2.In (a) the estimates for the first 25 Chebyshev moments.The comparison is between Chebyshev moments obtained in different ways: analytically (Cheb), applying the arc-cosine approximation at the first order (Arccos), adding the ST approximation with one single step (ST), and the results from the circuit of Fig. 1 from Quantinuum, the emulator (H1-1e) and the quantum computer (H1-1).We used 4 different random circuits with j = 5 and 1000 shots.Each cross represents the result from a circuit with a different random circuit.Figure (b) is composed by two parts.On top: the DOS obtained with the five different estimates for the Chebyshev moments (using a kernel with M = 25) and the DOS from the exact diagonalisation (ED).On bottom we reported the comparison of the DOS obtained with different approximation of the Chebyshev moments w.r.t. the analytical ones.
from the exact values within the bulk of the spectrum, and that the KPM with M = 25 is able to accurately reconstruct the DOS using these moments.We next attempted the same calculation on a register of L = 18, which uses 19 out of 20 qubits currently available on the H1-1 System [38].Here the KPM requires M ∼ 50 to achieve an accurate approximation, while the moments that can be simulated accurately range up to m = 11, beyond which their values become too small to be distinguished from zero given the number of shots we have used.Simulating a larger system requires even more resources, and due to the higher costs associated with these larger circuits, we ultimately only simulated µ 2 on the real hardware for the L = 18 case.Our results demonstrate that the emulator (H1-1e) consistently produces accurate outcomes and we compute the rest of the moments with it.Additionally, we note that the fidelity of the ẐZ gate appears to improve with smaller angles.Our circuit heavily employs these gates (see Fig. 5), and any discrepancies in the emulator results can be attributed to an underestimation of these gates' fidelities.We report the results from this last system in Fig. 7, where in the plot of the DOS we add a projection obtained by combining the results from hardware and emulator together.

V. CONCLUSIONS
In summary, we have performed the first estimation of the density of states of an non-integrable manybody quantum system on a digital quantum simulator.We have designed and implemented a quantum algorithm that exploits a combination of the Hadamard test, Suzuki-Trotter decomposition and random state preparation to extract Chebyshev moments.Proof-of-principle hardware simulations were performed on registers of L = 12 and L = 18 qubits on the Quantinuum H1-1 ion trap quantum computer, obtaining a good approximation to the DOS for a non-integrable Hamiltonian in the bulk of the spectrum (corresponding to high microcanonical temperatures).We explored in detail the crucial subroutines of stochastic trace evaluation and controlled evolution with arccosine approximation.We believe that our quantum hardware results represent the current state-of-theart, in terms of both the generation pseudo-random states and the implementation of controlled unitary operations on a many-qubit register.We emphasise that the accuracy of our hardware results has been limited primarily by financial constraints, and not by fundamental resource scalings nor even by noise on the H1-1 device.
For the DOS we found it was sufficient to take the arccosine expansion to very low order (K = 0), which is ultimately due to the concentration of energy levels at the centre of the spectrum in large systems.We note that our KPM-inspired approach can easily be tailored to compute finite-temperature expectation values in the diagonal and micro-canonical ensembles, in addition to other spectral functions such as the Lehmann representation of multitime correlation functions.In these cases, it may be necessary to consider higher-order expansions (i.e.K > 0) to account for features away from the centre of the spectrum.Our methods could also be combined with other quantum algorithms tailored to compute ground-state and low-lying excited state properties [17][18][19][20][21][22], in order to estimate thermodynamic properties across the full range of temperature scales.
Our estimation of the DOS on current quantum hard-ware represents an important step forward towards quantum statistical mechanics calculations on quantum computers.As the devices improve, we expect that this algorithm and subroutines can be used to extract useful approximations to thermodynamic properties in regimes not accessible to state-of-the-art classical numerical techniques for strongly correlated systems.
extended in time (its convergence domain is an infinite strip, symmetric around the real axes [67]).Similarly, we can evaluate f (and the coefficients a m and b m ) just in the [−π, π] interval.The Fourier series is then exponentially convergent for periodic functions with derivatives bounded in [−π, π], namely its coefficients a m and b m decrease exponentially in m: with n 1 and q a constant for some r > 0. If f is even(odd) all the sine(cosine) coefficients will cancel out and the expansion becomes the Fourier sine(cosine) series.
Many physical phenomena are not periodic but bounded (occur in a limited space).So, to make a function behave periodically one can apply a change of variable ) the expansion reduces to the Fourier cosine series: where cos(m cos(θ)) are the Chebychev polynomials T m (cos(θ)).Reformulating this last equation in x we have where now Due to this strong connection with the Fourier series, the Chebyshev series inherits all its properties (among which the exponential convergence) with now the advantage to be working also with non-periodic functions.
We can notice that cos(m arccos(x)) is actually a polynomial in x by looking at the identity cos(2φ) = cos 2 (φ) − 1 (A8) which leads to the following iterative property of the Chebychev polynomials: FIG. 8. Comparison of the first 50 analytical Chebyshev moments with different approximations: the first order arc-cosine approximation (Arccos1), the second, which involves the 3 rd power of the Hamiltonian (Arccos3), the normal ST approximation at the first order (ST) and, its version with the updated parameters (updated ST).After m = 15 the difference between these last two approximations becomes negligible.

Appendix B: Higher orders of the arc-cosine expansion
Due to the concentration of eigenstates around zero, in a lot of physical Hamiltonians, the first-order expansion is already enough to shape the DOS with a good approximation.However, in case a better precision is required one can truncate the arccos approximation to the second order (i.e.K = 1).At this order of approximation, a k-local Hamiltonian would become 3k-local.However, the terms that are actually 3k-local contribute to a lesser degree.Moreover, a large part of Ĥ3 will lie on the same operators of Ĥ.For instance, in the Hamiltonian of Eq. ( 44) the only coefficients in Ĥ3 that increase with the system size are the ones of Pauli operators already present in Ĥ.Indeed, one can simply implement the same combination of Pauli operators of Ĥ by updating the parameters to include the contributions from Ĥ3 /6.These updated parameters will be, considering the case J z + Λ = J x , for the J x of the even sites x + (3L − 4)J 2 y + 3β 2 + 6J y β , (B1) for the J x of the odd sites Similarly for J y of even sites: J y + 1 6 (3L−2)J 3 y +(9L/2−4)J y J 2 x +3J y β 2 +6J 2 x β (B3) and for the odd ones: Then, for z-coupling our Hamiltonian only has even terms with a coefficient updated as: ((9L/2 − 6)J 2 x + (3L − 4)J 2 y + 3β 2 + 6J y β) (B5) and finally, the coefficient of the identity becomes: The comparison of how the ST decomposition is affected by using these parameters instead of the initial one is reported in Fig. 8.
The first contributes to the error are for even and odd moments respectively.To roughly characterize it, we need to introduce speculation about the spectrum of the system.In many physical systems, the DOS can be modelled as a Gaussian distribution [70], idea behind which seems also to be inspired the GIT method [32].If the variance of the DOS is ∆ E and Ē the average, the Gaussian curve that describes the spectrum of Ĥ is For K = 0, the contributes of Eq. (D3) become Note that following the normalisation of Eq. ( 8), both Ē and ∆ E will be less than 1.Choosing ∆ E ≥ Ē (as it is in our case) justifies why in our results we did not see better performances at even rather than odd moments.
The second to the ST decomposition, to decompose the operator in simple gates, if the ST order is the first and the amount of steps is 1 the error will go as O(5 t m 1+ 1 t /ε 1 t ) where t is the ST order [37].The third follows from the stochastic trace estimation of Section II D, where the error is O 1/ √ R2 L .The fourth is due to the circuit and the finiteness of the number of shots: )

18 FIG. 3 .
FIG. 3. The convergence of different half-chain entanglement entropies to the expected value, for L = 10, 12, 14, 16 and 18 averaged over 20 different random states.The shadows are drawn by the values of the individual random states.(a) The half-chain von Neumann entropy S is shown as function of random circuit depth and shows the saturation to the Page value [55] (dark grey lines) for different system size for the circuit used in this work (Par.).(b) Comparison of the relative error of the von Neumann half-chain entropy εs at L = 18 of the three different random circuits: the one proposed in Richter et al. [49] (Ric.), the one used in this work (Par.)(see Fig.2) and one where the 2-qubit gates are applied sequentially (Seq.).All three circuits lead to fast convergence to a maximally bi-partite entangled state.We note that the relative error in Seq.seems to be most favorable but we have used Par. on the hardware here because less layers are required to get a good estimate of the trace.

FIG. 4 .
FIG.4.Convergence of relative error εt of the stochastic trace to the trace of the Hamiltonian defined by Eq. (44) with L = 18 for the three random circuits used to generate Fig.3.The results are averaged over 20 different random states from which the shadows are obtained.

Fig. 6 . 1 FIG. 6 .
FIG.6.Results for the Hamiltonian of Eq. (44) at L = 12 with Jx = 1, Jy = 1/3, Jz = Λ = 1/2.In (a) the estimates for the first 25 Chebyshev moments.The comparison is between Chebyshev moments obtained in different ways: analytically (Cheb), applying the arc-cosine approximation at the first order (Arccos), adding the ST approximation with one single step (ST), and the results from the circuit of Fig.1from Quantinuum, the emulator (H1-1e) and the quantum computer (H1-1).We used 4 different random circuits with j = 5 and 1000 shots.Each cross represents the result from a circuit with a different random circuit.Figure(b) is composed by two parts.On top: the DOS obtained with the five different estimates for the Chebyshev moments (using a kernel with M = 25) and the DOS from the exact diagonalisation (ED).On bottom we reported the comparison of the DOS obtained with different approximation of the Chebyshev moments w.r.t. the analytical ones.

FIG. 7 .
FIG.7.Similarly to Fig.6, the results for the Hamiltonian of Eq. (44) at L = 18 with Jx = 1, Jy = 1/3, Jz = Λ = 1/2.In (a) the estimates for the first 50 Chebyshev moments.Here, for the Quantinuum executions, we used 10 different random circuits with j = 4 and 1000 shots.Each cross represents the result from a circuit with a different random circuit.Figure(b) is composed by two parts.On top: the DOS obtained with the five different estimates for the Chebyshev moments and the DOS from the exact diagonalisation (ED).Since we computed µ2 only from the H1-1 System, we also included what the DOS estimate from the µ2 from the real hardware combined with the estimates from the emulator would look like (H1-1*).On bottom we reported the comparison of the DOS obtained with different approximation of the Chebyshev moments w.r.t. the analytical ones. ).