Density functionals and Kohn-Sham potentials with minimal wavefunction preparations on a quantum computer

One of the potential applications of a quantum computer is solving quantum chemical systems. It is known that one of the fastest ways to obtain somewhat accurate solutions classically is to use approximations of density functional theory. We demonstrate a general method for obtaining the exact functional as a machine learned model from a sufficiently powerful quantum computer. Only existing assumptions for the current feasibility of solutions on the quantum computer are used. Several known algorithms, including quantum phase estimation, quantum amplitude estimation, and quantum gradient methods, are used to train a machine learned model. One advantage of this combination of algorithms is that the quantum wavefunction does not need to be completely re-prepared at each step, lowering a sizable pre-factor. Using the assumptions for solutions of the ground-state algorithms on a quantum computer, we demonstrate that finding the Kohn-Sham potential is not necessarily more difficult than finding the ground-state density. Once constructed, a classical user can use the resulting machine learned functional to solve for the ground state of a system self-consistently, provided the machine learned approximation is accurate enough for the input system. It is also demonstrated how the classical user can access commonly used time- and temperature-dependent approximations from the ground-state model. Minor modifications to the algorithm can learn other types of functional theories, including exact time- and temperature-dependence. Several other algorithms, including quantum machine learning, are demonstrated to be impractical in the general case for this problem.

Quantum computing has been proposed as an alternative to classical computing, 1 and there are some problems which can be solved faster than with known classical algorithms. 2-7 One of the most sought-after and potentially far-reaching applications of a quantum computer is the solution of quantum chemistry problems. 8-13 Obtaining exact solutions from a quantum computer efficiently could revolutionize modern applications including the creation of new medicines, fertilizers, batteries, superconductors, and more. [14][15][16][17][18][19][20] To do this, one important quantity to determine is the ground state energy. The energy is a highly useful quantity for determining properties such as the equilibrium geometry of a molecule. Yet, the energy is not descriptive enough to fully characterize all desired properties of a system. For example, the band structure can be a useful tool to characterize a material, but this requires measurements at several k-points. 21 So, many measurements of the wavefunction would be required for some simple quantities.
Measurement on the quantum computer is expensive because the wavefunction must often be re-prepared before a second measurement is performed. It has already been shown on a quantum computer that obtaining the wavefunction can be extremely costly, 22-25 taking months or years even for moderately sized systems, 24,25 and that the wavefunction cannot be copied. 26 The wavefunction is therefore a valuable commodity and measurements should be minimized. 27 One option to encode many solutions into one measurement is to use a machine learned (ML) model. 28 In general, ML models can interpolate remarkably well between input data to give access to many systems, including those not already solved. In principle, the ML model can be constructed directly on the quantum computer or from classical data generated by the quantum computer.
Using ML models would also allow users of the quantum computer to export solutions to classical users. The results could then be quickly retrieved classically from the model, and they are generally accurate over the training manifold on which the model was constructed.
Finding the full wavefunction would require exponentially many measurements, so this can be difficult to implement on a quantum computer. But the same information can be expressed in a more compact form. So, we can look at alternative formulations of quantum physics for the most descriptive model.
The route pursued here is with density functional theory (DFT). 29 Hohenberg and Kohn established that the one-body density, n(r), is one-to-one with the external potential v(r) up to a constant shift of the potential. In essence, the density can replace the wavefunction, but it has fewer variables.
In order to use DFT, we must find some other means of obtaining the energy, since the Hamiltonian is not used in DFT. Instead, the universal functional, F[n], must be found. It was proven that the universal functional exists and is common to all problems with the same electron-electron interaction. 29,30 The quantities required for the classical user to find self-consistent solutions are the exact functional (determining the energy) and the functional derivative. 31 So, in addition to finding F[n], we also must find some other quantity such as the density, n(r), or the Kohn-Sham (KS) potential, v_s(r). 30 With these components, we can fully characterize a quantum ground state and solve for other measurable quantities.
It has already been established that the density functional can be successfully modeled with ML methods on the classical computer. [31][32][33][34][35][36][37][38][39][40][41][42] Exact quantities at several different external potentials must be found for the ML models to be trained. The number of training points needed to construct accurate models is not prohibitively large. From the ML functionals, self-consistent solutions can be obtained. 37,[43][44][45] Numerically accurate ML functionals satisfy all exact conditions of F[n] 32,46 and escape the common errors of approximated density functionals. 47,48 To apply the classical ML-DFT methods on a quantum computer, some additional constraints must be kept in mind. Previous attempts to obtain functionals from the quantum computer have relied on many measurements of the wavefunction for each system of interest. 13,[49][50][51][52][53] In our view, a worthy goal is to avoid both excessive measurements and re-preparations of the wavefunction, especially in the case of time-dependent quantities. 50,54 This work proposes a feasible algorithm that finds the ML model for F[n] on the quantum computer if a ground-state wavefunction is available. The algorithm leaves the wavefunction largely undisturbed so it can be used as the starting state for another system, greatly reducing the pre-factor required to solve other systems. This is accomplished by using a quantum counting algorithm to extract descriptive quantities such as the density. Much of the algorithm is kept entirely on the quantum computer to motivate future improvements for speed, but the counting algorithm does allow for information to be output classically.
We also demonstrate that the Kohn-Sham potential can be solved with a strategy similar to that used for the wavefunction. A gradient evaluated on a cost function for the KS system allows for the determination of the exact KS potential. This strategy can be faster than obtaining the density. Further, we demonstrate how access to the functional can be used to find approximate time- and temperature-dependent behavior in a system from the ground-state functional, and modifications can be added to obtain exact results.
One temptation would be to use quantum machine learning, but long-known bounds on the efficiency of these methods preclude their use here. 55 This agrees with recent demonstrations that some known quantum machine learning algorithms are not universally advantageous. [56][57][58][59] We also discuss general limitations on known algorithms such as quantum machine learning and why these algorithms are expected to be inefficient here.
Section II presents the algorithm. Section III will discuss several uses of the resulting functional and considerations in choosing quantities to solve for. Section IV will discuss known limitations and justify why the algorithm is constructed as presented. Necessary background information on quantum chemistry, DFT, ML, and quantum computing algorithms is given in the Appendices.

II. ALGORITHM FOR THE FUNCTIONAL
In order to establish an algorithm for the quantum computer that gives results in quantum chemistry, both fields must be understood. To avoid a lengthy summary in the main text, we have included relevant background knowledge in the Appendices in case it is needed. This section contains all the elements of the algorithm and assumes only a background in quantum computing algorithms.
We provide Fig. 1 to illustrate the steps necessary for one iteration of the algorithm, which we will refer to as a recycled wavefunction for minimal pre-factor (RWMP) method. Although many quantities could be produced from this algorithm, we will focus on the components useful for the density functional. The inputs are the external potential for some system (|v(r)⟩) and initial guess weights for the ML model (|w^(i)⟩) represented as classical variables throughout. The following steps are required to obtain the solution for a given system and then update the parameters of the ML model.
1. We prepare a ground state wavefunction |Ψ⟩ for a given external potential, |v(r)⟩, and number of electrons, N_e. This can be done by real-time evolution (RTE, see Apx. D 4). On Fig. 1, this is denoted by a box for RTE. The subroutine that obtains the wavefunction here does not have to be RTE. If another, more advanced solver is developed and used, then this can be substituted with no change to the rest of the RWMP algorithm.
Note that the methods used to obtain quantities from the result of classical computations are not available, since the quantum wavefunction has coefficients that are stored in superposition (a linear combination of |0⟩ and |1⟩). This means that the coefficients of the wavefunction cannot be found except with many measurements. We must first find the energy before obtaining other relevant quantities.
2. We obtain the ground state energy |E⟩ from the ground state wavefunction |Ψ⟩ given in the previous step.
Access to the ground state energy is provided by quantum phase estimation (QPE, Apx. D 2) or with some other method like qubitization. 60 On Fig. 1, this step is denoted by QPE.
The next task is to determine some quantities of interest without requiring a full measurement of the wavefunction. The counting algorithm used allows for the wavefunction to be used again on the next iteration.
FIG. 1. One step of the RWMP algorithm for the density. The next iteration uses the output wavefunction as the starting state for the next system to reduce the pre-factor. A similar procedure can be used to find the KS potential, replacing n(r) with v_s(r) (with a QGA as an oracle query for the QAE). The ML model may need inputs from other registers.

3. Given the energy |E⟩ and wavefunction |Ψ⟩, we generate some quantity that is denoted here as |n(r)⟩ from a quantum amplitude estimation (QAE, Apx. D 5), a name which we use interchangeably with quantum counting throughout.
The symbol used, n(r), is for the density (Apx. B), but we can substitute this quantity for others. For example, one could also determine the KS potential |v_s(r)⟩. This step is denoted as QAE on Fig. 1. Note that this step may involve an oracle query such as a quantum gradient algorithm (QGA, Apx. D 3) or have other subroutines. This is a point we will expand on in the next section, Sec. II A, when discussing how to obtain either n(r) or v_s(r). For now, we focus on what to do once the quantity is obtained.
The output wavefunction is slightly modified by the QAE but remains nearly the same state with a small amount of error. The procedure to return the wavefunction to its original state does not completely re-prepare the wavefunction and instead has an iterative set of steps to repair the wavefunction as explained in Ref. 61 (see also Apx. D 5).
4. The output of the QAE can be used to update the ML model's parameters |w^(i)⟩ to the next iteration, |w^(i+1)⟩. More than one ML model can be trained here (e.g., an ML model for |E⟩ and one for |n(r)⟩).
The typical steps of forward and backward propagation in stochastic gradient descent (SGD, Apx. C) can be used. For backward propagation, the output of a QGA can be used to update the ML parameters. This step is marked as ML on Fig. 1. This ML operation may need to be controlled on |v(r)⟩ depending on which quantity is being trained. Note that the QAE output could also be stored classically and the model trained off the quantum computer, skipping this step but eliminating the opportunity for an improvement in the ML box from quantum advantages.

5. Another external potential, |v + ∆v⟩ (not shown on Fig. 1), is provided and the wavefunction is reused as the starting state for the next RTE.
Here, ∆v is chosen by the user and simply added to the coefficients of v(r) to give the next potential. Many passes through a set of potentials must occur to obtain an accurate ML model.
Recall from the resource estimate given in Ref. 24 that several months may be required for a small molecule with RTE. The RWMP algorithm allows for a sequence of intermediate systems to be visited, effectively making use of that time to obtain data. So, if the system starts in a configuration where the initial wavefunction is accurate, then this can become the first data point for the ML model. Each subsequent time step could be another potential data point for the model.
The second advantage of this strategy is that the sequence of potentials in the RWMP algorithm can be ordered so that each next potential is close to the previous one. This allows the best starting state for the next system to be used, reducing the total number of RTE steps that must be run over all systems. In summary, the RWMP algorithm could make use of the preparation time for a hard-to-solve system by finding data from the intermediate steps, reducing the pre-factor in the solution.
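The ordering of the sequence of potentials can be sketched classically with a simple greedy heuristic. This is only an illustration under stated assumptions: potentials are represented as vectors of coefficients, and the function name and the Gaussian-well family are hypothetical choices, not part of the RWMP algorithm itself.

```python
import numpy as np

def order_potentials(potentials, start=0):
    """Greedily visit each potential, always moving to the nearest
    (Euclidean) unvisited one, so consecutive systems differ by a
    small Delta-v and the recycled wavefunction stays a good start."""
    remaining = list(range(len(potentials)))
    order = [remaining.pop(start)]
    while remaining:
        last = potentials[order[-1]]
        nxt = int(np.argmin([np.linalg.norm(potentials[i] - last) for i in remaining]))
        order.append(remaining.pop(nxt))
    return order

# Toy family: Gaussian wells of increasing depth, presented shuffled.
x = np.linspace(-3, 3, 32)
depths = np.linspace(1.0, 2.0, 6)
perm = [3, 0, 5, 1, 4, 2]
pots = [-depths[p] * np.exp(-x**2) for p in perm]

start = int(np.argmin([np.linalg.norm(p) for p in pots]))  # shallowest well
order = order_potentials(pots, start=start)
visited_depths = [depths[perm[i]] for i in order]
assert visited_depths == sorted(visited_depths)  # sweeps the wells in depth order
```

For a one-parameter family like this, the greedy path starting from an endpoint recovers the natural ordering, so each ∆v stays small.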
The RWMP algorithm repeats until all systems are visited. The final step is to measure the parameters of the ML model, |w⟩. The model can then be used classically.
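The data flow of the loop can be mocked classically. In the sketch below, exact diagonalization stands in for RTE, the exact ground-state eigenvalue for QPE, and directly computed site densities for the QAE output; the linear model, learning rate, and potential family are illustrative assumptions, not part of the algorithm.

```python
import numpy as np

N, Ne = 8, 3                             # lattice sites, electrons
T = -np.eye(N, k=1) - np.eye(N, k=-1)    # hopping term of a toy Hamiltonian

def ground_state(v):
    """Exact diagonalization stands in for RTE; the eigenvalue for QPE
    and the site densities for the QAE output."""
    eps, phi = np.linalg.eigh(T + np.diag(v))
    E = eps[:Ne].sum()                   # "QPE" output
    n = (phi[:, :Ne] ** 2).sum(axis=1)   # "QAE" output: site densities
    return E, n

# Sequence of nearby potentials (small Delta-v between visits).
base = np.sin(np.linspace(0, np.pi, N))
potentials = [base * s for s in np.linspace(0.5, 1.5, 40)]

# Tiny linear model E ~ w.n + b, trained by SGD as data arrives.
w, b, eta = np.zeros(N), 0.0, 0.05
losses = []
for v in potentials:
    E, n = ground_state(v)
    err = (w @ n + b) - E
    losses.append(err ** 2)
    w -= eta * err * n                   # "ML" box: gradient update
    b -= eta * err

assert np.mean(losses[-10:]) < np.mean(losses[:10])  # the model is learning
```

The loop visits each system once and updates the model from the extracted data, mirroring steps 1-5; on the quantum device the expensive part replaced by `ground_state` is precisely what wavefunction recycling amortizes.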
In what follows, we discuss which quantities should be obtained by the QAE for the best description of the ground state. Then in Sec. III, we discuss aspects of this strategy that were not absolutely necessary in defining the algorithm and how a classical user can use the result. Finally, we expand on many points that were not crucial for the basic understanding of the RWMP algorithm and explain limits that ultimately lead to this algorithm (and why other subroutines were not used) in Sec. IV.

A. Quantities of interest
We move on to discussing which quantities are best to determine from the wavefunction via the QAE. There are two main options: the coefficients of the density matrix and the KS potential. When choosing the quantities of interest, both the functional and the functional derivative must be determined with the ML model in order to find solutions on the classical computer. There are several types of functionals that can be trained for this, some of which are presented here.

Density functionals
In the RWMP algorithm, one option is to find the density, n(r), since the Hohenberg-Kohn theorem of Ref. 29 proves this is a suitable replacement for the wavefunction. The N² elements of the density matrix are expectation values of the operator ĉ†_i ĉ_j (not just the diagonal elements; see Apx. A 1), where ĉ is a fermionic operator defined in Apx. A 1. Spin indices are ignored for simplicity for now. Since the expectation value lies on the interval [-1,1], we can use the operator (ĉ†_i ĉ_j + 1)/2 (defined on [0,1]) and afterward shift the result back to the original interval. Using this shifted operator is necessary for the QAE because the expectation value can now be related to a probability. The number of rounds required for the QAE scales inversely with the precision requested. 61 There are several options for the ML model. The ML step can train directly from n(r) to E, or we can take v(r) as an input to the model and train both n[v](r) and E[v]. The first option gives a pure density functional (Apx. B). The second option gives a potential functional, which is a dual functional to DFT (Apx. B 2). 62 Both of these theories can be solved self-consistently. One can also train the bifunctional E[n, v]. 33
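As a purely classical illustration of the quantities involved (not of the quantum routine itself), the one-body density matrix of a free-fermion toy model can be built from its occupied orbitals, and the affine shift between [-1,1] and [0,1] undone after estimation. The chain length, filling, and on-site tilt are arbitrary choices for the example.

```python
import numpy as np

# Free-fermion ground state on an 8-site chain: the one-body density
# matrix D_ij = <c_i^dag c_j> follows from the occupied orbitals.
N, Ne = 8, 3
H = -np.eye(N, k=1) - np.eye(N, k=-1) + np.diag(np.linspace(0, 0.5, N))
eps, phi = np.linalg.eigh(H)
D = phi[:, :Ne] @ phi[:, :Ne].T          # the N^2 expectation values

# Every element lies in [-1, 1]; the shift (O + 1)/2 maps it onto
# [0, 1] so that an amplitude-estimation routine can read it off as
# a probability, and the user shifts back afterward.
assert np.all(np.abs(D) <= 1.0 + 1e-12)
p = (D + 1.0) / 2.0                      # what the counting routine estimates
assert np.all((p >= 0) & (p <= 1))
D_recovered = 2.0 * p - 1.0              # classical post-shift back to [-1, 1]
assert np.allclose(D_recovered, D)
assert np.isclose(np.trace(D), Ne)       # diagonal sums to the electron number
```

The diagonal of `D` is the site density n(r); the full matrix is what the density-matrix functional of Sec. III F would consume.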

Kohn-Sham potentials
The other main quantity, which is arguably more valuable, is the KS potential, v_s(r). The defining feature of the KS system is a non-interacting system that has the same density as a given interacting system (see more discussion and ways to realize this potential in Apx. B 1). The potential of this non-interacting system, v_s(r), is described by N parameters.
Two potentials can share the same density only if the systems have different electron-electron interactions, so the Hohenberg-Kohn theorem is not violated. 29 The KS potential is highly valuable since it can be applied in several other instances, including time- and temperature-dependent calculations from v_s(r) or the KS band structure. 54,63,64 A central question is whether v_s(r) exists for a given interacting system. This is known as the problem of v-representability. Since it is proven that the KS potential always exists on a lattice, v_s(r) always exists here. [65][66][67][68] The KS potential must be converged to, just as we had to evolve an initial state in RTE to a final state. The difference in finding the KS potential is that a gradient is applied instead of a time-evolution operator for the wavefunction.
The KS potential, v_s, will satisfy the minimization [69][70][71]

v_s(r) = argmin over v_s of [ ⟨Ψ|T̂_s + V̂_s|Ψ⟩ − ⟨Φ|T̂_s + V̂_s|Φ⟩ ],   (1)

where Ψ is the interacting wavefunction and Φ is the KS wavefunction (see Apx. B 1 d for more information). Since Φ is the ground state of T̂_s + V̂_s, the bracketed difference is non-negative, and its gradient with respect to v_s(r) vanishes when the KS density matches the interacting density. There are other methods to obtain the KS potential, [72][73][74][75] but the method used here is straightforward on a quantum computer given a close enough starting guess or small enough molecule (i.e., those solvable by RTE). Equation (1) is used as the output of the oracle query in QAE for the QGA. 76,77 Note that the QGA is particularly useful for finding the functional derivatives here, notably taking the variation over all possible v_s(r) in one oracle query. 76 The resulting gradient is applied to the coefficients of the KS potential, and the process is repeated a sufficient number of times until the true KS potential is obtained. An initial guess for the parameters could be taken from existing semi-local approximations, such as the local density approximation, 78,79 or from a classical method. [68][69][70][71][72][73][74][75]80 The other classical methods where a gradient is used to evolve the potential may be useful, but the density does not need to be constructed to use Eq. (1).
In order to construct Eq. (1) on a quantum computer, the eigenvalues of free Hamiltonians, such as the KS Hamiltonian (T̂_s + V̂_s), can be mapped to the interval [0,1] for the QAE. The operator must also be scaled by a constant, but we note that shifting the potential by a constant, v_s(r) → v_s(r) + C, is allowed to ensure all eigenvalues are positive without changing the eigenvectors. 29 Further, identifying some upper bound on the expectation value, ε_max, the scaled operator could appear as ⟨Ψ|T̂_s + V̂_s + C|Ψ⟩/ε_max.
The expectation value ⟨Φ|T̂_s + V̂_s|Φ⟩ can be evaluated in one of two ways. First, it can be computed by diagonalizing T̂_s + V̂_s (determining the eigenvalues h_k) and taking the sum ⟨Φ| Σ_k h_k ĉ†_k ĉ_k |Φ⟩ in the diagonalized basis, where the KS wavefunction Φ is now used. This procedure uses QAE to find the expectation value analogously to finding the coefficients of the density. Alternatively, one could apply QPE (or qubitization) directly for the KS system, noting that the gate count is drastically lower for the non-interacting system.
The KS potential as encoded in the ML model can be expressed as either v_s[v](r) or as v_s[n](r), where both are well-defined. 81

B. Example for the Kohn-Sham potential
To further illustrate some of the more abstract quantities in Sec. II, we provide an expanded example of the RWMP algorithm showing how to obtain the ML-KS potential.
1. An initial potential v(r) is chosen. This can be done by assuming the external potential is a set of nuclei with a Coulombic interaction, v(r) = Σ_a (−Z_a/|r − r_a|), where a indexes the positions of the nuclei, r_a, each with atomic number Z_a.

2. The model is then discretized in a basis, and the resulting model has fermionic operators (see Apx. A 1).

3. RTE is run and the ground state is obtained.

4. QPE is used to find the ground state energy, E.
5. Ψ and E are used in the QAE to find the first term in Eq. (1). The energy of the non-interacting system is also obtained for the second term in Eq. (1) using some initial guess for v s (r). This constructs the oracle in the QGA.
6. The QGA result is added to the coefficients of the KS potential, and the last step is repeated a certain number of times until v_s(r) reaches the minimum.
7. The values E and v_s(r) are input into an ML model. The parameters of the model are updated with a QGA by repeatedly computing the gradient and adding it to the current coefficients.
We can also use the QGA to output E and v_s(r) to the classical user and learn the final set of potentials classically.

8. A new set of atomic coordinates, r_a, is provided; the difference ∆v between the previous and the new potential is computed, and the result is added to the old potential.
We now have the next potential and can start again at step 3 in this list until all systems are visited.
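Step 1 of the list above amounts to tabulating the nuclear attraction on whatever grid or basis is chosen. A minimal sketch, where the function name and the H2-like geometry are illustrative assumptions:

```python
import numpy as np

def external_potential(grid, nuclei):
    """v(r) = sum_a -Z_a / |r - r_a| evaluated on a set of grid points.
    `nuclei` is a list of (Z_a, r_a) pairs; positions are 3-vectors."""
    v = np.zeros(len(grid))
    for Z, ra in nuclei:
        v -= Z / np.linalg.norm(grid - ra, axis=1)
    return v

# H2-like toy: two Z=1 nuclei separated along z (geometry is arbitrary;
# grid points coinciding with a nucleus would have to be excluded).
pts = np.array([[0.0, 0.0, z] for z in np.linspace(-2, 2, 9)])
nuclei = [(1.0, np.array([0.0, 0.0, -0.7])), (1.0, np.array([0.0, 0.0, 0.7]))]
v = external_potential(pts, nuclei)

assert np.all(v < 0)                      # attraction everywhere
assert np.isclose(v[0], -1/1.3 - 1/2.7)   # point at z = -2.0, by hand
assert np.isclose(v[0], v[-1])            # symmetric geometry, symmetric v
```

Moving the nuclei in step 8 then corresponds to re-evaluating this tabulation and taking the difference ∆v.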
Note that this computation can be restarted at any time (at the significant cost of re-preparing the wavefunction). It has also been assumed that all systems are run in the same basis set, although this condition could be relaxed in principle.
One advantage of using this method is that the Kohn-Sham potential is characterized by N coefficients κ_k, v_s(r) = Σ_{k=1}^N κ_k φ_k(r). This is a factor of N fewer coefficients than required for the density matrix. The evolution of the Kohn-Sham potential from an initial guess by gradients is very similar to how the wavefunction is evolved with RTE.
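The gradient evolution of v_s can be illustrated classically on a Hubbard dimer, where the exact interacting state is available by diagonalization. The update direction n_Ψ − n_Φ is the gradient of the cost in Eq. (1) with respect to the on-site potentials; the interaction strength, step size, and iteration count below are illustrative assumptions.

```python
import numpy as np

# Hubbard dimer (2 sites, 2 electrons, singlet sector) as the
# "interacting" system. Basis: {|updown,0>, |0,updown>, singlet}.
t, U = 1.0, 4.0
v = np.array([-0.5, 0.5])                      # external potential (site basis)
H = np.array([[U + 2*v[0], 0.0,           -np.sqrt(2)*t],
              [0.0,        U + 2*v[1],    -np.sqrt(2)*t],
              [-np.sqrt(2)*t, -np.sqrt(2)*t,  v[0] + v[1]]])
c = np.linalg.eigh(H)[1][:, 0]                 # interacting ground state
n_psi = np.array([2*c[0]**2 + c[2]**2, 2*c[1]**2 + c[2]**2])

# Gradient descent on the KS (non-interacting) potential: the gradient
# of <Psi|T+Vs|Psi> - <Phi|T+Vs|Phi> w.r.t. vs is n_psi - n_phi.
vs = v.copy()                                  # initial guess: bare potential
for _ in range(5000):
    Hs = np.array([[vs[0], -t], [-t, vs[1]]])
    phi = np.linalg.eigh(Hs)[1][:, 0]          # doubly occupied KS orbital
    n_phi = 2 * phi**2
    vs -= 0.5 * (n_psi - n_phi)                # QGA stand-in
    vs -= vs.mean()                            # fix the constant shift

assert np.allclose(n_phi, n_psi, atol=1e-8)    # KS density matches Psi's
```

The constant-shift projection reflects the freedom v_s(r) → v_s(r) + C noted earlier; without it the iteration would drift along the gauge direction.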

III. ADDITIONAL CONSIDERATIONS AND USE OF THE FUNCTIONAL
In the previous section, an RWMP algorithm for finding the density functional and the Kohn-Sham potential was detailed. It was also noted that other variations of the density functional could be found using the same technology.
Several points that could extend the RWMP algorithm, and other details about what was introduced, are discussed here. These include resource costs, a comparison between finding the density and the KS potential, the universality of the functional, how the functional can be used, applications to other types of functional theories, and opportunities for near-term studies. Some points relating to a quantum advantage that are discussed here are continued in Sec. IV A.

A. Scaling and resources required
The algorithm will require a fully scalable quantum computer, probably with self-correcting memory. Near-term examples are available for density functionals, and the known features of finding the density functional with this method are discussed here.

1. Algorithmic scaling
The RWMP algorithm for the density functional scales asymptotically as O(N²) in the number of basis functions, N, for one system due to the time evolution. The actual complexity in practice lies between O(N²) and O(N⁴) for intermediate system sizes (see Apx. A 2 and D 4). The other subroutines scale at most as O(N²) (see Appendices).
2. Pre-factor for convergence of the wavefunction

Even though the scaling in terms of the number of basis functions is polynomial, the actual cost is significant due to a pre-factor. 24 The pre-factor of the algorithmic scaling is problem dependent and will be the dominant cost of obtaining a ground state wavefunction. The pre-factor will depend on the time used to prepare Ψ, the number of steps that the QAE algorithm must be run, and the required number of systems to be visited. Note that the number of steps needed to obtain the correct time evolution depends on how close the starting state is to the system's solution, the number of electrons, how strongly correlated the electrons are, and many other variables.
Note that if the first system in the RWMP algorithm is exactly equal to the initial wavefunction (e.g., the well-separated limit of a neutral molecule where each separated piece is a hydrogen atom with one electron, for which Hartree-Fock is exact), then the time to find the first solution is zero, and each subsequent motion of the atoms closer together provides another data point that may not require too many time steps to obtain.

3. Convergence of the machine learned model
In order to aid the convergence of the ML model, it would be useful to have several quantum computers running at once. This would mean that more than one system can be used to construct a mini-batch update for the cost function of the gradient descent, making the resulting update to the ML model more accurate. One also can save the output quantities as future training points at the cost of extra registers. Once an accurate ML model is generated, the problem does not need to be solved again.

4. Resources required
The RWMP algorithm presented in this work requires 4+ζ qubit registers for ζ quantities of interest that are not the energy (e.g., learning E and v s (r) implies ζ = 1). There will also be overhead for the QAE and other parts of the RWMP algorithm. One can also divide the parameters in the ML model into separate registers for each additional quantity of interest since the model spaces between them should be sparse.
The number of qubits required is clearly large and depends on the specific steps of the RWMP algorithm used. This means the RWMP algorithm is best suited for the fully working quantum computer that is expected in the future. There are many ways to reduce the steps necessary, for example by outputting the QAE results to a classical user. However, we want to maintain the flexibility that a quantum advantage could be realized here, as discussed next in Sec. III B.

B. Structure of the neural network
In a neural network, the form of the non-linear function, S (see Apx. C), is often chosen so that it is easily differentiable. On the quantum computer, determining the gradients is accomplished in one oracle query (see Apx. D 3), so the traditionally used functions on a classical computer (e.g., sigmoids) can be swapped out for another function that may lead to better performance or faster convergence. It is not clear if this will produce any detectable advantage.
The number of coefficients required to obtain an accurate ML model can be very high, but some guidelines can reduce this number for a given quantity. 82 For the problem at hand, note that the training is done in the same basis in which the problem is expressed. In this case, the connectivity of the neural network can be constrained on physical arguments. Known bounds on the structure of local correlations, as proved by Hastings in Ref. 83, apply to densities, as originally shown by Kohn et al. in Refs. 84 and 85. This means that perturbations of the density decay exponentially with distance for gapped systems and as a power law for gapless systems. 86 Then, the neural network models for the density do not need to account for arbitrarily long-ranged connections and can remain local. In essence, the connectivity of edges on the graph for the neural network would be similar to the structure of a multi-scale entanglement renormalization ansatz 87 (scaling as N log N) but in three dimensions. Note that this estimate is a minimum, and it may still be advantageous to add more hidden layers or connectivity. We do not expect gradient-free methods (Apx. C 1) to be more useful in training the ML model.
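The locality constraint can be made concrete with a banded connectivity mask for one layer of a density model. This is a one-dimensional sketch; the lattice size, bandwidth, and helper name are illustrative assumptions.

```python
import numpy as np

def banded_mask(n_sites, width):
    """Connectivity mask for one layer of a density model: neuron i only
    sees sites j with |i - j| <= width, reflecting the short-ranged
    response of the density in gapped systems."""
    idx = np.arange(n_sites)
    return (np.abs(idx[:, None] - idx[None, :]) <= width).astype(float)

N, w = 64, 3
mask = banded_mask(N, w)
dense_params = N * N
local_params = int(mask.sum())

assert local_params == N * (2*w + 1) - w * (w + 1)  # band, minus edge truncation
assert local_params < dense_params                  # far fewer weights to learn
# Applying the mask when initializing (and after each gradient update)
# keeps the pruned weights at exactly zero:
W = np.random.default_rng(2).normal(size=(N, N)) * mask
assert np.all(W[mask == 0] == 0)
```

The parameter count grows linearly in N for fixed bandwidth rather than as N², consistent with the N log N estimate when the bandwidth grows slowly across scales.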
Not all quantities will have a local structure. The v s (r) is fully non-local due to the Hartree potential (see Apx. B 1 a) and can require all-to-all connectivity between subsequent hidden layers in the neural network.
Note that it is not required to train the ML model on the quantum computer. One can determine the quantities of interest for each system and output the results with the QAE from the quantum computer to the classical computer. 61 But we give the option here to train the neural network on the quantum computer in case a quantum advantage can be realized. Using the RWMP algorithm as presented in Sec. II would still avoid excessive measurement; meanwhile, outputting to classical variables will reduce the amount of overhead needed to store ML parameters.
C. Comment on the universal vs. exact functional

ML functionals are very accurate over the training manifold on which they were constructed. 44 We have chosen the adjective (exact or universal) describing the functional carefully in each use here. The procedure we describe is not truly universal, since the accuracy of the ML functional is limited to the training manifold of potentials explored for the ML model. For example, applying a model trained on solutions with N_e electrons to a potential in the training manifold but with N_e + 1 electrons will project the solution back onto the N_e-electron solutions.
The resulting ML functional can be numerically exact, however, for a given problem. This functional will be called the exact functional, implying that it is accurate for some systems but not all possible systems.
To complete the discussion, one can have a universal but not exact functional. For example, if one estimated the energy to be a fixed value for all systems, this would be universal but not very useful.

D. Comparing the search for the density and Kohn-Sham potential
To actually compare the true cost of the O(N²) operations to find the density with the O(TN) coefficients for the KS potential, one would need to know the number of times, T, a gradient must be applied. This depends on the system studied and the starting KS potential. It is therefore not clear, as a general guideline, whether directly finding the coefficients of the density matrix is always better than starting from a nearby KS potential and applying gradients. The key difference between the two cases is that the KS potential relies on a suitable starting state and the density does not. The assumption for the KS potential, that a good starting state is required, is very similar to the restriction on RTE itself to find the ground state, so all assumptions are consistent.
The fact that the KS potential is competitive with finding the density here is more a comment on the overhead required for the density (which is far greater than on the classical computer) than on the KS potential being any easier to find. There is one advantage to using the minimization for v_s(r) here in that the QGA is much more efficient than the corresponding classical gradient evaluation.
In addition to comparing the scaling, the nature of the KS problem requires a large basis set to obtain the proper KS potential and avoid any Gibbs oscillations that would appear in a truncated basis set. 74 So, the N required for an accurate density or energy might be smaller than for an accurate KS potential in practice.
Regardless, the effort to find v_s(r) is worthwhile, since v_s(r) is far more descriptive and gives access to useful quantities.

E. Use of the functional
The algorithmic cost in Sec. III A is only paid once. When the model is given to a classical user, a separate cost must be paid to solve it for the ground state. Given the ML model, the classical user can solve the ML functional self-consistently by determining the Euler-Lagrange equation that minimizes the functional (see Apx. B 1 f). This requires a projection back into the training manifold to ensure the functional derivative is accurate. 33,43 This projection is estimated to scale as O(N²) but can be machine learned separately to speed up computation. 32,33,43 Finding the pure density functional has better scaling for the classical user, scaling like the number of basis functions, N. Note that even though we can learn F[n], the external potentials (as well as particle numbers and polarizations) over which the ML model is accurate must also be given to the classical user so that the training manifold is understood.
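The projection back into the training manifold can be illustrated with a plain PCA subspace standing in for the learned projections of Refs. 32, 33, and 43. The density family, component count, and noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0, 1, 50)

# Training densities: a one-parameter family of normalized bumps.
train = np.array([np.exp(-((x - c) ** 2) / 0.02) for c in np.linspace(0.3, 0.7, 20)])
train /= train.sum(axis=1, keepdims=True)

mean = train.mean(axis=0)
_, _, Vt = np.linalg.svd(train - mean, full_matrices=False)
basis = Vt[:5]                                  # top principal components

def project(n):
    """Project a density onto the (affine) PCA subspace of the manifold,
    a crude stand-in for the learned projection."""
    return mean + basis.T @ (basis @ (n - mean))

# A step that drifts off the manifold is pulled back toward it, keeping
# the functional derivative evaluated where the model is trustworthy.
n_on = train[10]
noisy = n_on + 0.01 * rng.normal(size=x.size)
assert np.linalg.norm(project(noisy) - n_on) < np.linalg.norm(noisy - n_on)
```

In a self-consistent loop, each Euler-Lagrange update of the density would be passed through such a projection before the ML functional derivative is evaluated again.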
The KS system scales as the cube of the number of basis functions formally, N 3 , since it is a non-interacting problem. Note that when solving the KS system with the exact functional, convergence is proven. 68,88 If a bifunctional, E[n, v], is used then a self-consistent solution is not necessary. 33 Once the model is trained sufficiently accurately, one can always re-train for another type of functional (e.g., from the KS potential, the density can be obtained and a pure density functional can be trained).
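As a concrete illustration of the self-consistent loop a classical user would run, the sketch below solves a one-dimensional KS system on a grid, with a toy local potential standing in for the machine learned model. The function name, grid parameters, and the LDA-like `v_xc_model` are all illustrative assumptions, not the actual ML functional described in the text:

```python
import numpy as np

def solve_ks_scf(v_ext, n_elec, v_xc_model, L=10.0, n_grid=200,
                 max_iter=200, mix=0.3, tol=1e-8):
    """Self-consistent Kohn-Sham loop on a 1D grid (atomic units).

    v_xc_model plays the role of the machine learned potential: it
    maps a density n(x) to an effective xc potential v_xc(x)."""
    x = np.linspace(0.0, L, n_grid)
    dx = x[1] - x[0]
    # Finite-difference kinetic operator, -1/2 d^2/dx^2.
    lap = (np.diag(np.ones(n_grid - 1), -1) - 2.0 * np.eye(n_grid)
           + np.diag(np.ones(n_grid - 1), 1)) / dx**2
    t_op = -0.5 * lap

    n = np.full(n_grid, n_elec / L)        # uniform starting density
    for _ in range(max_iter):
        v_s = v_ext(x) + v_xc_model(n)     # effective KS potential
        eps, phi = np.linalg.eigh(t_op + np.diag(v_s))
        phi /= np.sqrt(dx)                 # normalize: sum |phi|^2 dx = 1
        # Density as the sum over occupied orbitals (spinless fermions).
        n_new = np.sum(phi[:, :n_elec]**2, axis=1)
        if np.max(np.abs(n_new - n)) < tol:
            n = n_new
            break
        n = (1.0 - mix) * n + mix * n_new  # linear density mixing
    return x, n, eps[:n_elec]

# Harmonic external potential; a toy LDA-like form stands in for the ML model.
x, n, eps = solve_ks_scf(v_ext=lambda x: 0.5 * (x - 5.0)**2, n_elec=2,
                         v_xc_model=lambda n: -0.5 * n**(1.0 / 3.0))
```

The linear mixing step is the same basic stabilization used in production DFT codes, although more sophisticated mixers (e.g., Pulay/DIIS) are standard there.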
Finding the density functional, and using it to compute quantities, is preferable to generating a library of system properties and machine learning those properties. From the DFT model, the other quantities can be constructed.
As an example application, if trained over enough external potentials, this method could efficiently evaluate molecular dynamics problems giving accurate results on laptops instead of supercomputers. 89

F. Other types of functional theories
There are many other types of functional theories related to DFT that can be obtained with these methods. The most easily extended is density matrix functional theory (DMFT). 90 Since our method of obtaining the density was actually to obtain the coefficients of the density matrix (see Apx. A), the density matrix functional is learned.
Extensions of DFT can also be solved with this method including the motion of nuclei, 91 time dependence (TD-DFT), 54,92-94 thermal properties, 95 superconducting functionals, 96,97 quantum electrodynamics-DFT, 98-100 ensembles, 101,102 and more. 62,[103][104][105][106][107] In each case, the solution on the quantum computer is modified in some way. For example, a superconducting functional theory can be found if the pairing potential 108 is also computed and learned.
Two of the methods in the above list (time- and temperature-dependent methods) deserve extended discussion in the subsequent sub-sections, since both exact and approximate functionals can be found for both. Note that the approximate methods only require the ground state functional in both cases.

Time evolution
Given an accurate enough ML approximation for the KS potential, one can time evolve the system. 41 If the system is adiabatically time evolved with a weak enough perturbation, this would be available immediately from just the KS system at zero time (see Eq. (B25) in Apx. B 3). The adiabatic approximation is often sufficient for many physical processes.
For time evolution beyond the adiabatic limit, an exchange-correlation kernel, denoted f_xc (see Apx. B 3), would need to be found. Full evaluation of f_xc could be accomplished with the QGA (Apx. D 3 a); however, one can also estimate the kernel via back propagation in the neural network (see Apx. C). An accurate functional derivative will be required in this case. 109 One can also time evolve on the quantum computer and use expectation values at various times for other quantities. 53,60,110,111

Finite temperature calculations
For temperature dependent processes, one approximation that is available from the ground-state KS potential is the Fermi-weighted KS technique. 112 In this method, occupied and unoccupied orbitals found at zero temperature can be weighted with a Fermi-Dirac distribution to obtain a finite temperature density.
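A minimal sketch of this weighting, using particle-in-a-box orbitals as a stand-in for zero-temperature KS orbitals (the function name and all parameters here are illustrative assumptions):

```python
import numpy as np

def fermi_weighted_density(orbitals, energies, mu, kT):
    """Finite-temperature density from zero-temperature KS orbitals:
    each |phi_j(x)|^2 is weighted by the Fermi-Dirac occupation
    f(eps_j) = 1 / (exp((eps_j - mu)/kT) + 1)."""
    occ = 1.0 / (np.exp((energies - mu) / kT) + 1.0)
    return np.sum(occ[np.newaxis, :] * np.abs(orbitals)**2, axis=1), occ

# Particle-in-a-box orbitals stand in for the zero-temperature KS orbitals.
L, n_grid, n_orb = 1.0, 100, 8
x = np.linspace(0.0, L, n_grid)
phi = np.stack([np.sqrt(2.0 / L) * np.sin(j * np.pi * x / L)
                for j in range(1, n_orb + 1)], axis=1)
eps = np.array([0.5 * (j * np.pi / L)**2 for j in range(1, n_orb + 1)])

# Chemical potential pinned at the second level; kT in the same energy units.
n_T, occ = fermi_weighted_density(phi, eps, mu=eps[1], kT=5.0)
```

Pinning mu at an eigenvalue gives that level an occupation of exactly 1/2, which is a quick sanity check on any implementation.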
For exact computations at finite temperature, Apx. B 4 discusses the temperature dependent KS potential (which is very similar to the ground state case) and can be found if a finite temperature wavefunction is provided. 110,113 Other theories may require computation of other quantities, including those not relevant to quantum chemistry. Yet, they can follow the same strategy as the RWMP algorithm (i.e., re-use of the wavefunction and QAE) to find the relevant quantities.

G. Non-density functional quantities
Note that the RWMP algorithm is not specific to the density, the KS potential, or even a density functional. Many quantities of interest can be obtained. To see how the continued fraction representation of the Green's function can be obtained, see Ref. 114.

H. Reduced examples for testing
Throughout, it is implied that many qubits will be required, but we view this as similar to requiring sufficient memory on the classical computer. However, there are test systems that are used to provide insight into DFT and may be useful for proofs of principle. [115][116][117][118][119][120][121][122][123] A simple model that may be within reach of existing quantum computers is the two-site Hubbard model, which has been used to study simplified DFT. [124][125][126][127][128][129][130][131][132]
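For reference, the half-filled two-site Hubbard model in the S_z = 0 sector can be exactly diagonalized in a four-dimensional basis. A minimal sketch follows; note that sign conventions for the hopping term vary between references:

```python
import numpy as np

def two_site_hubbard_ground_state(t, U):
    """Exact diagonalization of the half-filled two-site Hubbard model
    in the S_z = 0 sector.

    Basis ordering: |ud,0>, |u,d>, |d,u>, |0,ud>, with hopping t and
    on-site repulsion U."""
    H = np.array([[U,   -t,  -t,  0.0],
                  [-t,  0.0, 0.0, -t],
                  [-t,  0.0, 0.0, -t],
                  [0.0, -t,  -t,  U]])
    evals, evecs = np.linalg.eigh(H)
    return evals[0], evecs[:, 0]

# The ground-state energy has the closed form (U - sqrt(U^2 + 16 t^2)) / 2.
e0, psi0 = two_site_hubbard_ground_state(t=1.0, U=4.0)
```

The closed-form energy makes this model a convenient check for any proof-of-principle quantum computation of simplified DFT.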

IV. LIMITATIONS OF KNOWN ALGORITHMS
This section discusses limitations on the types of algorithms that could have been used to machine learn the functional and how future improvements in algorithms must continue to improve the performance on quantum computers. We hope that a clear and complete discussion of the current hurdles for quantum computation motivates future algorithms and provides a consistent account.
A. Feasibility of obtaining the starting state
In the previous section, we omitted a discussion of how to prepare the initial state in superposition and how it is converged. Preparing a converged wavefunction is known to be inefficient with current techniques. For some algorithms, the ideal input would have been a superposition of solutions including all combinations of particle numbers and spin polarizations for all external potentials.
In this section, we consider the complications for even preparing a suitable state in superposition and known limits. We will discuss the RTE algorithm in the context of its scaling, why the large pre-factor can prohibit its use on some systems, comparing the implementation of RTE on the classical and quantum computer, and limitations to constructing a superposition of all solutions with any method. This last point will address the feasibility of finding the original starting state for the QML.
Some other algorithms that were not used as subroutines in the RWMP method are also discussed.

General considerations for real time evolution
One of the primary motivations for making the RWMP algorithm for the density functional is to recycle the ground state wavefunction, reducing the cost of RTE. It is true that RTE scales only polynomially (Apx. A 2) with the number of orbitals. In comparison, the full configuration interaction (FCI) gives the exact results but scales as O(N e !). 133 So, RTE has a scaling advantage over FCI; however, the pre-factor matters.
The exact same RTE algorithm can be run on a classical computer. The reason that quantum chemistry computations are not run with RTE is that the pre-factor equal to the number of time steps is large. 24,134 This is especially problematic for large systems and those where a near-degeneracy must be avoided, necessitating a small step size. But this is exactly where quantum computers are hoped to be applied.
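The role of the pre-factor can be made concrete with a small classical simulation: a first-order Trotter split of the two-site Hubbard Hamiltonian (a toy stand-in for a molecular Hamiltonian) requires many steps before the evolved state matches the exact propagator. All matrices and parameters below are illustrative:

```python
import numpy as np

def expm_herm(H, dt):
    """exp(-i H dt) for a Hermitian matrix via eigendecomposition."""
    w, v = np.linalg.eigh(H)
    return (v * np.exp(-1j * w * dt)) @ v.conj().T

# Two-site Hubbard pieces in the half-filled S_z = 0 basis (t = U = 1).
H_hop = np.array([[0.0, -1.0, -1.0, 0.0], [-1.0, 0.0, 0.0, -1.0],
                  [-1.0, 0.0, 0.0, -1.0], [0.0, -1.0, -1.0, 0.0]])
H_int = np.diag([1.0, 0.0, 0.0, 1.0])

psi0 = np.array([0.0, 1.0, 0.0, 0.0], dtype=complex)  # |up, down>
T_total = 2.0
exact = expm_herm(H_hop + H_int, T_total) @ psi0

def trotter_evolve(psi, n_steps):
    """First-order Trotter product (e^{-i H_int dt} e^{-i H_hop dt})^n."""
    dt = T_total / n_steps
    step = expm_herm(H_int, dt) @ expm_herm(H_hop, dt)
    for _ in range(n_steps):
        psi = step @ psi
    return psi

# The error falls off only linearly in the step size, so the step count
# (the pre-factor) must be large for an accurate evolution.
errors = [np.linalg.norm(trotter_evolve(psi0, n) - exact)
          for n in (10, 100, 1000)]
```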
Note also that the Trotterization of the time evolution operator is best suited to planar molecules, since interactions are comparatively localized; this is the same reason that matrix product states prefer these geometries, 86,135,136 further limiting the usefulness of RTE.
Further comparison becomes more complicated because alternative, smaller wavefunction ansätze on the classical computer may display all the same features of the true ground state and an accurate energy. These solvers typically have systematic errors that are well studied, can solve systems faster, and have led to solutions of large systems 137 and new discoveries. 138 So, it is not clear if a quantum computation can beat all classical representations in terms of efficiency. Note that on the classical computer, another solver can be used (e.g., tensor networks, 86,135 quantum Monte Carlo, 139 the random phase approximation, 140,141 or many other methods 142 ). The number of choices here is vast and storied, so we leave more discussion to others. 142,143
In summary, the time complexity of the quantum computation should be expected to be larger than that of classical solvers. Quantum computers can represent a reduction in the amount of space required to store a wavefunction, but it is not clear if this will always beat every classical representation. Improved methods of finding the ground state would be highly valuable.

Choice of basis sets for algorithms
It is true that the quantum computer provides a different representation of the wavefunction. In some representations for classical algorithms, the memory will grow considerably with system size. For example, a coupled cluster calculation can have so many coefficients that the number of operations becomes exponentially large. 144 This is due to the need to store more coefficients for a given basis. The quantum wavefunction on the quantum computer may have some advantage here, since coefficients are stored in superposition on each qubit. However, there are many wavefunction ansätze to consider in comparison, because some methods can find accurate answers with considerably fewer coefficients, and the time needed to prepare the quantum computer's wavefunction can be lengthy.
Some recent efforts on the quantum computer have sought to impose specific conditions to reduce the number of operations required. The strategies pursued are to use different basis sets and to generate criteria for removing terms from the Hamiltonian. For example, plane waves lead to a sparse Hamiltonian (see also Apx. A 2). 145 Systematic methods for this should be expected to be as difficult as (or more difficult than) solving for the wavefunction outright.
The proper comparison between RTE on a quantum computer with a plane-wave basis set and RTE on a classical computer is that the classical computer can handle any basis. So, a plane-wave basis set on a quantum computer should be compared with RTE on the classical computer with some other basis set (e.g., Gaussians). The number of orbitals required to accurately simulate matter can be much higher for plane waves than for other basis sets, due to cusps in the wavefunction that require large numbers of plane waves to resolve. 142,[146][147][148] So, restricting the basis to plane waves at best matches the classical equivalent in terms of operations required (see also Apx. A 2).
Wavelets have appeared in some recent references suggesting that these functions can provide a systematic way to compress a Hamiltonian, but over the 30-year existence of these functions (for more information, see the references in Ref. 149), they have been demonstrated to scale poorly to large system sizes due to the curse of dimensionality. 150 These methods typically give access to only 1-3 electrons when not approximating the Fock operator. To be useful here, a compression ratio of nearly 100% would be required. Using these functions beyond one dimension faces significant hurdles in the general, interacting case.
In conclusion, restricting to a particular basis set or using procedures to remove terms in a Hamiltonian is not a cure-all for computation in general. The task of identifying which terms of the Hamiltonian can be removed without first solving the problem is typically complicated, and the resulting answer is no longer necessarily exact. Leaving out some effects or terms is essentially the strategy of approximate classical methods, and it is not clear how to do this systematically in general, nor whether any simple strategy can be expected. A single method or change of basis is not a panacea for making the solution on a quantum computer less complex.

Limits on solutions in superposition
With regards to applying any generic algorithm to find a superposition of solutions, we can place a limit on finding the initial state required for algorithms that need this superposition of solutions.
Theorem IV.1. A quantum computer cannot efficiently generate a superposition of solutions for all potentials necessary for F [n] unless BQP=QMA-complete.
Proof. It is known that at least one of the systems contained in the universal functional is QMA-complete to solve. 151 Computing objects in this complexity class is not efficient on the quantum computer. So, no algorithm should be able to obtain all solutions efficiently.
Thus, some elements should be expected to remain unconverged in a superposition unless the algorithm is run for an impractically long time. Note that algorithms requiring a superposition generally require more than one solution, so the superposition must be prepared more than once.

Alternative algorithms to real time evolution
We do not rule out that some improvement may allow RTE (or some other method) to receive a quantum advantage. If a superior algorithm is developed (e.g., the tools exist to make imaginary time evolution, 152 perturbation theory, 153 preparation of projected entangled pair states, 154 or a very expensive version of the density matrix renormalization group 135,155,156 at the present time), it is likely that it will rely on converging from some initial state. Also, any algorithm like exact diagonalization for general systems is not expected to be efficient since this problem is not contained in the BQP complexity class. 151,157,158 This creates more motivation to focus on algorithms that converge.
We do note that, through decades of progress on quantum chemistry, it has been difficult to find a unifying principle that would help in algorithm design, 143 but perhaps quantum computing may motivate a new way to look at the problem for cases of interest, since the prospects of finding a general algorithm are prohibitive.

B. Other methods and quantum machine learning
In regards to other methods to train the ML model, we investigated alternative subroutines that use a superposition of solutions. Algorithms that we considered included Grover's algorithm, 4,7 quantum walks, 159,160 and others. Each of these requires a solution of a superposition of systems, some means of identifying the correct solutions through a phase kickback, undoing the superposition of systems, and then repeating the process until the error is low enough to ensure the correct solution is determined with high probability.
The limitations described in Thm. IV.1 are one hurdle, but in our view, these strategies of uninformed search were too lengthy even on the quantum computer. Since the improvement in the number of steps required is only by a square root factor, searching the exponentially sized database causes the algorithm to run for far longer than other classical algorithms for electronic structure.
The variational quantum eigensolver (VQE) 161 might be adapted to find, for example, the Kohn-Sham potential with a method from Refs. 68-75, but it is not clear if errors can be kept small in a reasonable amount of time, and we wish to keep the focus on finding the exact KS potential. The VQE would also require the exact density from another method, but it could in principle be adapted. The main question is how well this suggestion would perform on current quantum hardware.

Quantum machine learning
Quantum machine learning (QML) algorithms were also not suitable, since known bounds on the number of oracle queries imply that there is only a polynomial speedup for QML algorithms. See Ref. 55 and an experimental proof for a simple case in Ref. 162. A confirming statement is also found in Ref. 163.
Lacking an exponential speedup, it is likely too expensive for the quantum computer to run in a reasonable amount of time here.
Existing statements in the literature on the hardness of determining the universal functional can lead to limits on the types of QML algorithms we expect can exist. We formalize the relevant limits in some statements here. 164
Theorem IV.2. No QML algorithm can discover the universal functional in polynomial time unless QMA-complete reduces to the complexity class BQP.
Proof. The functional is proven to be QMA-complete to learn. 158 If an algorithm determined the functional in polynomial time, then BQP=QMA-complete.
Note that this says nothing about whether QML can do slightly better than the classical algorithm, but typically a step in a QML algorithm is to re-prepare the wavefunction and this is one of the main issues we want to avoid here. Because the functional is known to be QMA-complete to learn, finding the exact and universal functional with the RWMP algorithm would require an exponential amount of time to visit all systems, as expected. 151,157,158,165 We continue on to make a connection with some statements about learnability.
Theorem IV.2 also implies limits on how learnable the universal functional is by any method under the probably approximately correct (PAC) framework considered in Ref. 55. This is a consequence of the hardness of finding the functional: if it were not true, we could find a QML algorithm that discovers F [n] in exponentially fewer steps, in violation of both Thm. IV.2 and Ref. 55. So, there can be no exponential speedup for QML under the PAC model here.
While there may be cases that lie outside of the PAC model, [167][168][169][170][171] we have no evidence that the functional is not subject to the PAC assumptions. There is also good evidence to suggest that simple systems, at least, obey the limits of the PAC model. 162 In summary, the difficulty of finding the universal functional on a quantum computer places limits on the ability of many-body solvers on the quantum computer and on QML as well. Note the generality of the statements here for all QML algorithms. Recent works have shown that some existing QML algorithms are not as efficient as once thought, 56,57 but the arguments here apply to QML in general for this problem.
These general arguments do not prohibit a quantum advantage if another algorithm can be found to solve systems in a more specific case or restricted class of systems. Recent progress on finding classical algorithms that are superior to quantum algorithms illustrate the need for caution when proposing an efficient QML algorithm, however. 59 Still, QML does not seem to be a feasible way forward for the problem of interest here.

C. Summary
The main take-away from these statements is that an exponentially more efficient algorithm for the most general case is ruled out for the solution on a quantum computer. Reducing the pre-factor therefore becomes highly beneficial. This does not preclude algorithms for more restricted systems, but there is no hint of how exactly to construct such an algorithm, or that this is any easier than a straightforward solution.
The RWMP method avoids repeated measurement, reduces the pre-factor to solve each system iteratively, and allows for more systems to be solved with RTE or some other method.

V. CONCLUSION
It has been demonstrated that a combination of algorithms applied to a wavefunction on the quantum computer can yield the Kohn-Sham potential, energy, and density matrix coefficients without completely re-preparing the ground-state wavefunction each time. This limits the pre-factor of the algorithm. The determined quantities can be used to train a machine learned model using gradient-based methods, either on the quantum computer or classically. The ground state wavefunction was used as the starting point for the next system, avoiding an expensive computation of the ground state at each step. This efficiency was also used for the Kohn-Sham potential with a minimization condition.
Once a model is created, a classical user can extract the relevant quantities from the machine learned model and use it for ground state, time-, and temperature-dependent calculations. Finding the Kohn-Sham potential is especially useful here, since it gives access to many properties of the ground state; in addition, there was some indication that the Kohn-Sham potential might scale better in some cases than finding the density. Known limitations on the complexity of finding the universal functional and on quantum machine learning have constrained the choice of subroutines in the algorithm here. A better method to solve for the ground state on the quantum computer must be a focus of future research to make quantum chemistry studies feasible, but this algorithm will allow solutions to be exported to many users.
which is the kinetic plus external potential terms. The two-electron integral is

V_ijkl = ∫∫ φ_i*(r) φ_j*(r′) v_ee(r − r′) φ_k(r′) φ_l(r) dr dr′,   (A3)

where v_ee(r − r′) = 1/|r − r′| for the case of a Coulomb interaction; this expression assumes the orbitals for spin-up and spin-down electrons are the same. Note that a Hubbard model is an approximation keeping only the most diagonally dominant terms of the Coulomb operator, V_ijkl = U δ_ij δ_jk δ_kl for Hubbard interaction U. 174 We have restricted our consideration to the Born-Oppenheimer approximation, 175 even though the discussion can be generalized to the motion of nuclei. 91 Solving the entire many-body problem is known to be difficult if not impossible. However, approximate methods can yield results that are accurate to what is known as chemical accuracy (1 mHa), or a stricter limit in some cases. 142

Basis sets
We can note that Eq. (A1) has been written in second quantized form since we expect to need a basis to truncate the problem to a more manageable size. One may, for example, choose Gaussian orbitals 176 so that Eqs. (A2) and (A3) can be evaluated analytically and chemical accuracy can be obtained with only a few functions. Other basis functions can also be chosen. 142 It is known that Eq. (A3), when represented in a local basis, reduces to 142

lim_loc. Σ_ijkl V_ijkl ĉ†_i ĉ†_j ĉ_k ĉ_l → Σ_ij V_ijji γ_ii γ_jj   (A4)

for a density matrix γ, where the limit is taken for well separated, local orbitals at large distances. This reduces the computational complexity from O(N^4) to O(N^2) in the asymptotic limit, although the true scaling lies somewhere in between depending on the details of the system. 24 This argument only applies to orbitals that drop off sufficiently quickly with distance from the origin.
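A toy count makes the reduction concrete. If only index pairs within a decay width w survive (a crude stand-in for exponentially decaying orbitals on a 1D chain; the function name and width are illustrative), the number of non-negligible integrals grows as roughly N^2 rather than N^4:

```python
# Count two-electron terms V_ijkl that survive a locality cutoff for
# orbitals centered on a 1D chain: only index combinations with
# |i - l| and |j - k| within a decay width w remain at large separation.
def surviving_integrals(n_orb, w=2):
    return sum(1
               for i in range(n_orb) for j in range(n_orb)
               for k in range(n_orb) for l in range(n_orb)
               if abs(i - l) <= w and abs(j - k) <= w)

# The full sum has N^4 terms; the surviving count grows only as ~N^2,
# so doubling N roughly quadruples (not sixteen-folds) the count.
counts = {n: surviving_integrals(n) for n in (10, 20, 40)}
```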

a. The curse of dimensionality and other limitations in point-like basis sets
We note that the reduction in Eq. (A4) happens immediately when purely local basis sets, with no spatial extent, are used. In that case, the sum reduces to only the diagonal elements of the two-electron integral, V_ijkl, if the orbitals are point-like. However, using only these localized orbitals (grid points, plane waves, wavelets, etc.) comes at a steep price.
In particular, note that wavelets are very expensive for large scale quantum chemistry problems. It has been known for some time that the curse of dimensionality applies: the number of functions scales as N_1D^d for d dimensions, where N_1D is the number of functions in one dimension. 150 Due to the large number of basis functions, wavelet-based methods have only been able to solve systems of at most 2 or 3 electrons in the general case, 177 although these functions can be efficient for larger non-interacting or single-Slater-determinant theories or other cases of very particular interest. 178 Wavelets are simply not expected to be efficient for real three-dimensional systems of any meaningful size, based on pre-existing works, unless the problem is converted to a non-interacting theory or a special geometry is chosen. For more information, see the references in Ref. 149.
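The arithmetic behind this scaling is worth making explicit; even a modest one-dimensional resolution becomes intractable in three dimensions (the resolution chosen below is illustrative):

```python
# The number of point-like basis functions in d dimensions grows as
# N_1D**d for a one-dimensional count N_1D: a modest 1D resolution
# becomes a million functions in 3D, before any many-body structure.
n_1d = 100
counts = {d: n_1d**d for d in (1, 2, 3)}
```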
For plane wave functions, many thousands of functions are required to resolve cusps in the wavefunction (e.g., the behavior at the nucleus of a hydrogen 1s orbital). 142,146-148 We will not consider point-like basis functions further here, to concentrate on the general case, although plane waves can be useful for periodic systems. With respect to the density matrix (a highly important quantity in Sec. B), the full double sum must be taken (see Eq. (B3)), not just the diagonal elements.
In summary, even though the scaling of the two-electron operator is better for point-like basis sets, many more basis functions will be required to obtain accurate results except in special cases. So, choosing a point-like basis function will not represent a general strategy for all types of quantum chemical problems that we may wish to solve.

Appendix B: Density functional theory
The foundations of density functional theory (DFT), including the Kohn-Sham system and other variants, are introduced here.
A compact representation of the quantum ground state is the one-body density. In DFT, the ground state wavefunction is replaced with the density. It was proven in Ref. 29 that the one-body density, defined as

n(r) = N_e ∫ ... ∫ |Ψ(r, r_2, ..., r_N_e)|² dr_2 ... dr_N_e,   (B1)

is sufficient to characterize the ground state. Note that in order to obtain this quantity on the lattice, the one-body reduced density matrix must be obtained for N_e electrons,

ρ(r, r′) = N_e ∫ ... ∫ Ψ*(r, r_2, ..., r_N_e) Ψ(r′, r_2, ..., r_N_e) dr_2 ... dr_N_e,   (B2)

which is related to the density in the limit r′ → r. In a basis, the matrix elements are ρ_ij = ⟨Ψ|ĉ†_i ĉ_j|Ψ⟩, and the density follows from the double sum

n(r) = Σ_ij ρ_ij φ_i*(r) φ_j(r).   (B3)

A spin index has been suppressed, signifying a spin degenerate ground state. However, extensions to ground states without spin degeneracy are also available. 48

Having replaced the wavefunction with the more compact density, the Hamiltonian must be substituted with another mathematical object that acts on the density. In general, an object that maps a function to a scalar value is known as a functional. 179 In DFT, a functional maps the one-body density to a scalar energy value.
In order to find the ground state energy, we can use a minimization over all densities, 180

E_v[n] = F[n] + ∫ v(r) n(r) dr,   (B4)

E = min_n E_v[n],   (B5)

although it is impractical to search for the ground state density with this formulation. The universal functional, F[n] = min_{Ψ→n} ⟨Ψ| T̂ + V̂_ee |Ψ⟩, is defined by a constrained search over wavefunctions yielding the density n and is common to all systems since it does not depend on the external potential. Clearly, the minimization is not an efficient way to find the functional, but it is useful as a mathematical tool. Because F[n] is unknown explicitly (its existence is proven by contradiction), it requires approximation to use. 30 Some limiting cases are known, such as one- or two-electron cases, the uniform gas via a fitting procedure, and one dimension. [181][182][183][184] Many exact properties of the functional are known from rigorous mathematical statements 185,186 or limited test cases. 88,122 One common way to design new functionals is to build in exact conditions. [187][188][189]

Note that to solve a problem with F[n], the functional derivative can be used, defined as 47,48

∫ (δF[g]/δg(r)) Υ(r) dr = lim_{η→0} ( F[g + ηΥ] − F[g] ) / η,

where Υ is an arbitrary test function, g is the function we wish to evaluate F around, and η is a small parameter. The first functional derivative is most well known from classical physics, where it is used to minimize the Lagrangian via Euler-Lagrange minimization. 190 In order to find the minimal density, a functional derivative can be taken. This is synonymous with the Euler-Lagrange equation in this case, 191

δF[n]/δn(r) + v(r) = µ,   (B8)

where a constant chemical potential µ was added as a Lagrange multiplier for the total particle number. This equation is then used to solve orbital-free DFT.
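To make the Euler-Lagrange route concrete, the sketch below solves an equation of the form of Eq. (B8) for a toy orbital-free theory, using the one-dimensional Thomas-Fermi kinetic functional for spinless fermions as a stand-in for F[n] (the functional choice, grid, and potential are illustrative assumptions):

```python
import numpy as np

# Orbital-free DFT from the Euler-Lagrange condition: with the 1D
# Thomas-Fermi kinetic functional T[n] = (pi^2/6) \int n(x)^3 dx for
# spinless fermions, delta T / delta n = (pi^2/2) n^2, so the
# Euler-Lagrange equation (pi^2/2) n^2 + v(x) = mu inverts to
# n(x) = sqrt(2 * max(mu - v(x), 0)) / pi.
x = np.linspace(-4.0, 4.0, 801)
dx = x[1] - x[0]
v = 0.5 * x**2              # harmonic external potential
N_target = 3.0              # desired particle number

def density(mu):
    return np.sqrt(2.0 * np.clip(mu - v, 0.0, None)) / np.pi

# Bisect on the chemical potential mu (the Lagrange multiplier) until
# the density integrates to N_target.
lo, hi = float(v.min()), float(v.min()) + 50.0
for _ in range(60):
    mu = 0.5 * (lo + hi)
    if np.sum(density(mu)) * dx < N_target:
        lo = mu
    else:
        hi = mu
n = density(mu)
```

The chemical potential here plays exactly the role described in the text: it is fixed by the total particle number rather than chosen freely.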

Kohn-Sham density functional theory
One useful alternative formulation of F [n] is KS-DFT. This reformulation of DFT proposes an auxiliary external potential whose resulting one-body density is equivalent to the one-body density of the fully interacting system. The original goal of DFT was to propose a purely wavefunction-free method to characterize the quantum ground state, but it is difficult to find suitable approximations that are accurate enough.
It can be noted that approximating F [n] directly imposes a large approximation on the total energy. In this alternative formulation, one introduces an easy-to-solve, non-interacting auxiliary system to make the required approximation a smaller fraction of the overall energy. Obtaining the KS potential gives insight into many more physical quantities than just the density, and the orbitals of the non-interacting system can be used in a variety of other contexts.

a. Finding the Kohn-Sham potential
To formalize the KS system, what is known as the adiabatic connection can be used to transform from the original problem to the final non-interacting problem. 192 Equation (A1) can be rewritten as

Ĥ^(λ) = T̂ + λ V̂_ee + V̂^(λ),

where an explicit dependence on the coupling constant λ has been introduced. The tuning parameter λ can vary between the KS system (λ = 0, where V̂^(λ=0) = V̂_s) and the original system (λ = 1). Note that the external potential operator (V̂ = v(r)) has received an implicit coupling constant dependence, but there is no simple analytic form for V̂^(λ). The constraint given in this problem is that the density must be the same for any λ,

n^(λ)(r) ≡ n^(λ=1)(r),   (B10)

which is difficult to construct in practice. Note that the other limit, λ → ∞, can also be used as the basis of a functional theory. 193

b. Components of the functional in the Kohn-Sham system

The form of the universal functional for the KS case is

F[n] = T_s[n] + U[n] + E_xc[n].   (B11)

This form is known from perturbative expansions of the many-body system. 30,172 Note that the subscript "s" on the kinetic energy signifies that T_s is evaluated over non-interacting wavefunctions, φ, but has the same form as the kinetic energy operator in Eq. (A2). This term shows that the KS scheme is not a pure density functional but one that relies on the auxiliary non-interacting orbitals. The cost to solve the non-interacting system is larger than for pure DFT, but still smaller than for many other approximations.
In addition to the kinetic energy, another known energy in Eq. (B11) is the Hartree energy, which is fully nonlocal. The unknown term in Eq. (B11) is the exchange-correlation energy, E_xc, which is not known as a density functional and requires approximation in practice. If the exact E_xc is used, then the theory is exact. The usefulness of defining the KS system is that the approximation to the total energy is small for many systems of practical interest. The KS potential is

v_s[n](r) = v(r) + v_H[n](r) + v_xc[n](r),

where a functional derivative is taken over the relevant energy terms; for example,

v_H[n](r) = δU[n]/δn(r) = ∫ n(r′)/|r − r′| dr′

for the Hartree potential, and the form of v_xc[n](r) is not known explicitly. In summary, by re-grouping the non-kinetic energy terms in the Hamiltonian, U [n] + V [n] + E_xc[n], the resulting system will appear as non-interacting. The electron-electron term is contained in the resulting potential of the non-interacting system.

d. Variational principle for the Kohn-Sham potential
The Kohn-Sham potential also satisfies the minimization of the quantity 69

G[v_s] = ⟨Ψ| T̂ + V̂_s |Ψ⟩ − ⟨Φ| T̂ + V̂_s |Φ⟩,   (B15)

where we follow Refs. 69-71 closely. Note that Ψ(r, r_2, ..., r_N_e) is not an eigenstate of T̂ + V̂_s but that Φ is, so G[v_s] ≥ 0 by the variational principle. 194 The functional derivative of Eq. (B15) with respect to v_s(r) is 69

δG[v_s]/δv_s(r) = n_Ψ(r) − n_Φ(r),

which equals the difference in the densities of the two systems, one computed from Ψ (n_Ψ) and the other from Φ (n_Φ). 69 When this difference is zero, the condition for the Kohn-Sham potential given in Eq. (B10) is satisfied.
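This functional derivative suggests a simple descent scheme, sketched below as a classical toy analogue: starting from a crude guess, the potential is updated along the density difference until the non-interacting density matches a target density (here generated from a known potential; the grid, step size, and iteration count are all illustrative assumptions):

```python
import numpy as np

# Gradient descent on v_s using the density difference n_Psi - n_Phi
# as the descent direction (a classical toy of the minimization above).
n_grid, L, n_elec = 101, 12.0, 2
x = np.linspace(0.0, L, n_grid)
dx = x[1] - x[0]
lap = (np.diag(np.ones(n_grid - 1), -1) - 2.0 * np.eye(n_grid)
       + np.diag(np.ones(n_grid - 1), 1)) / dx**2
t_op = -0.5 * lap

def density(v_s):
    """Ground-state density of n_elec spinless fermions in v_s."""
    _, phi = np.linalg.eigh(t_op + np.diag(v_s))
    phi /= np.sqrt(dx)
    return np.sum(phi[:, :n_elec]**2, axis=1)

# Target density n_Psi, produced here by a known harmonic potential.
n_target = density(0.5 * (x - L / 2.0)**2)

v_s = np.zeros(n_grid)                        # crude starting guess
err0 = np.max(np.abs(density(v_s) - n_target))
for _ in range(3000):
    v_s += 0.5 * (density(v_s) - n_target)    # descent step on v_s
err_final = np.max(np.abs(density(v_s) - n_target))
```

Since a constant shift of the potential leaves the density unchanged, the recovered v_s agrees with the generating potential only up to an additive constant, which is all the density can determine.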

e. v-representability
The KS scheme is exactly defined provided that v-representability is satisfied. In common practice, this is not a concern, since it was proven on a grid that the system must be v-representable because the kinetic energy is regularized. 66,67 So, we always expect v-representability here.

f. Minimization of the Kohn-Sham functional
Note that the Euler-Lagrange minimization of the functional yields the KS equations,

( −∇²/2 + v_s(r) ) φ_j(r) = ε_j φ_j(r),

for KS energy eigenvalues ε_j and KS orbitals φ_j(r). The density is then the sum over occupied orbitals,

n(r) = Σ_{j∈occ.} |φ_j(r)|²,   (B19)

which can be found from Eq. (B3) by noting that the excitations are orthogonal. One recovers Eq. (B3) with an additional index for the excitations when φ is decomposed into a chosen basis.
g. Relationship between the energies of the Kohn-Sham and the fully interacting system

The adiabatic connection from Sec. B 1 a does not conserve energy. The relation between the ground state energy of the interacting system, E, and the energy of the KS system (the sum of eigenvalues of the non-interacting system, Σ_{j∈occ.} ε_j) is 47

E = Σ_{j∈occ.} ε_j − U[n] + E_xc[n] − ∫ v_xc[n](r) n(r) dr,   (B20)

for Hartree energy U, exchange-correlation energy E_xc, and exchange-correlation potential v_xc. Note that Eq. (B20) shows it is not sufficient to have only the KS potential to find E, although perturbation theory on the density can be used. 195

Potential functional theory
When examining Eq. (B4), it is natural to ask if a dual theory can be formulated based on v(r) instead of n(r) since both are one-body quantities. This question stems from noticing that functional derivatives with respect to n(r) yield equations that can be solved for the density, resulting in the Euler-Lagrange minimization for the density functional from Eq. (B8).
Can we instead take a functional derivative with respect to v(r)? The answer is yes. It was proven in Ref. 62 that the dependence of the functional on the external potential was sufficient to describe the ground state. In this theory, the density must be determined from v(r) directly as n(r) → n[v](r). The resulting energy expression is similar to Eq. (B5).
a. Why the functional derivative is also necessary
Very importantly, one cannot determine the entire character of the ground state (i.e., find n(r) or equivalent) with only E [v]. To see this, note that the Euler-Lagrange equation for potential functional theory 107 contains a derivative of the density with respect to the external potential in the second term of its left-hand side, and this derivative must be determined to solve the equation and find the density. In summary, one can formulate potential functionals, E[v], provided that n[v](r) is known. 81,106 In other words, the functional derivative is also necessary to perform self-consistent calculations if only the energy is known for a given potential.

Time dependent density functional theory
In a time-dependent DFT (TD-DFT), one may simply propagate the KS potential according to Schrödinger's equation for time evolution 54,191 with an initial starting state Φ 0 . A formal justification for the existence of TD-DFT is available. 92,93 Computing response functions is also necessary if a perturbation to v s (r) is applied. Knowing just the KS orbitals is sufficient to determine the response function for the KS system, 54 with occupation numbers ξ j , eigenvalues ε j , frequency ω, and small parameter η. Hence, knowing all eigenvalues and eigenstates for v s (r) at t = 0 gives the KS response function. One can also relate χ s to the interacting response function χ via a kernel (n g.s. is the ground state density).
For a system solved at a given temperature, one can solve for the KS potential analogously to the ground state with an extra term −τ Ŝ (and a term for the particle number) representing the entropy in the functional. Note that the ground state density is replaced by the Γ̂ object, which is akin to a density matrix, and that extra weights must be solved for. In order to find this, several states ψ Ne,i must be used.
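The KS response function described above (occupation numbers ξ j, eigenvalues ε j) has a standard Lehmann-type form; a hedged reconstruction (sign and ordering conventions vary between references) is:

```latex
\chi_s(\mathbf{r},\mathbf{r}',\omega)
 = \sum_{j,k} \left(\xi_k - \xi_j\right)
   \frac{\phi_j(\mathbf{r})\,\phi_j^*(\mathbf{r}')\,\phi_k^*(\mathbf{r})\,\phi_k(\mathbf{r}')}
        {\omega - (\epsilon_j - \epsilon_k) + i\eta}.
```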

Comment on density functional approximations
There are many functionals that can be used to approximate E xc . Each comes with its own set of systematic deficiencies.
The most pertinent approximations for this paper are the ML functionals. 31-42 The general strategy of fitting a functional may be unpalatable, 78 but the generic strategy of fitting exact data is not unique to ML functionals. The simplest approximation to the functional-known as the local density approximation-is a fit of highly accurate quantum Monte Carlo data. 197,198 Further, coefficients present in hybrid functionals are also fit to existing data, 199 among other examples. The ML functionals simply represent a more robust approximation that can interpolate well provided the system solved is close to the training manifold. This strategy can also capture the exact conditions of the exact functional. 32,46-48,200
To train such a model, stochastic gradient descent (SGD) can be performed. The basic idea is to ensure that any evolution of the coefficients ŵ occurs along the steepest negative gradient of the cost function. To see that this lowers the cost, note that the first-order change of a scalar (here, g) along the direction of changing ŵ (denoted δŵ) is ∆g = ∇ŵg · δŵ, where ∇ŵg = (∂ w1 g, ∂ w2 g, . . .).
If the form of the update over an iteration time δt (which can also be expressed in the discrete case as earlier) is chosen as δŵ = −η ∇ŵg for some small parameter η (called a learning rate), then applying ∇ŵg to each side with a dot product gives ∆g = −η |∇ŵg| 2 < 0. In other words, the change in g is guaranteed to be negative for a small enough η such that g is linear on a small neighborhood. The argument was shown for the weights of the neural network, but the same argument applies to the biases.
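The descent argument can be sketched on a toy cost function (the quadratic g and all parameter values here are hypothetical stand-ins for the network cost):

```python
import numpy as np

# Toy quadratic cost g(w) = |w|^2 standing in for the network cost function.
def g(w):
    return float(np.dot(w, w))

def grad_g(w):
    return 2.0 * w

eta = 0.01                       # learning rate
w = np.array([1.0, -2.0, 0.5])   # arbitrary starting "weights"

for _ in range(300):
    delta_w = -eta * grad_g(w)   # update along the steepest negative gradient
    # first-order change: Delta g = grad(g) . delta_w = -eta * |grad(g)|^2 <= 0
    assert np.dot(grad_g(w), delta_w) <= 0.0
    w = w + delta_w

print(g(w))  # close to the minimum g = 0
```

Each step's first-order change in g is non-positive by construction, which is the argument made above for the weights (and, identically, the biases).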
Evaluating Eq. (C1) over all provided training points ι at every step can be very costly. To speed up the gradient descent, a randomly sampled subset of the provided training data, called a mini-batch, is used instead; this can be one or more selected points. Since the points are randomly selected at each step, the gradient descent is taken stochastically, at the cost of a noisier gradient.
Even though the cost function evaluated over the mini-batch must decrease, the entire cost function evaluated over all training points does not need to decrease at each step. However, on average, the cost function's value is lowered over several SGD steps. The condition ∂ t g < 0 holds only on average for a mini-batch in the general case but would hold at every step if all points are used.
There are two steps required when training a ML model with SGD: forward and backward propagation. In forward propagation, Eq. (C2) is applied straightforwardly from the input to the output layers of the neural network. The backward propagation requires that a derivative of Eq. (C2) be taken from the output to the input layers. Repeating this constructs the cost function and then applies the gradient to all weights and biases in the network until the model converges.
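A minimal sketch of forward and backward propagation with mini-batch SGD, on a hypothetical one-hidden-layer network and toy data (all sizes, the learning rate, and the target function y = 2x are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 1)), np.zeros((4, 1))   # hidden layer weights/biases
W2, b2 = rng.normal(size=(1, 4)), np.zeros((1, 1))   # output layer weights/biases
eta = 0.02                                           # learning rate

x = rng.uniform(-1, 1, size=(1, 32))                 # toy training inputs
y = 2.0 * x                                          # toy targets

def mse():
    return float(np.mean((W2 @ np.tanh(W1 @ x + b1) + b2 - y) ** 2))

initial_error = mse()
for _ in range(2000):
    idx = rng.choice(32, size=8, replace=False)      # random mini-batch
    xb, yb = x[:, idx], y[:, idx]
    h = np.tanh(W1 @ xb + b1)                        # forward: input -> hidden
    out = W2 @ h + b2                                # forward: hidden -> output
    err = out - yb
    dW2 = err @ h.T / 8                              # backward: output layer
    db2 = err.mean(axis=1, keepdims=True)
    dh = (W2.T @ err) * (1.0 - h ** 2)               # backward: through tanh
    dW1 = dh @ xb.T / 8
    db1 = dh.mean(axis=1, keepdims=True)
    for p, dp in ((W2, dW2), (b2, db2), (W1, dW1), (b1, db1)):
        p -= eta * dp                                # SGD update of all parameters

print(mse() < initial_error)  # True: the fit improved over training
```

The forward pass evaluates the network, and the backward pass propagates derivatives from the output back to the input layer before every parameter is updated along its negative gradient.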
More advanced algorithms can also be used to converge gradient descents as well. 201

Gradient-free methods
This section has been focused on training ML models with gradients. Another class of methods, based on random walks, 202-211 is also available. However, methods of training that involve a random walk typically only perform well for small numbers of parameters. In fact, these methods perform better than gradients in some cases for these small systems. If the problem is too large, then a gradient-based method is generally better. 207 There may also be opportunities to combine the two, 212,213 although this may require that the wavefunction be re-solved to implement on the quantum computer. We were not able to find any evidence that random walks can reliably compete with gradient-based training methods; however, if one could use such an algorithm, a quadratic speedup is available on the quantum computer for training with the random walk. 159 We also note that kernel based methods can be used to train the neural network, but they are not as "choice-free" since a functional form of the kernel must be selected. The minimization of the coefficients with a kernel does not require gradients. 45
Appendix D: Quantum algorithms
On a quantum computer, both the method of manipulating and the method of storing information differ from a classical computer. On a classical computer, electrons are moved around and represent different information based on where they are placed.
In a quantum computer, the quantity that we manipulate is the spin of an electron or some other quantity that is allowed to exist in a superposition of states. 1 We will refer to qubits in this work but note that one can extend the ideas to qudits where more than two states are possible. It is not possible to determine all coefficients of a wavefunction in a superposition without an exponential number of measurements (i.e., measuring the spin for both the |0⟩ and |1⟩ states for each qubit individually). To manipulate the state of a quantum wavefunction, we can apply operators, specifically operators that are unitary. Often, the operators are applied to a specified qubit and also to an auxiliary qubit to keep the required operations unitary.
Sequences of these unitary operators can be cast as a tensor network diagram. Each line of the diagram represents a single qubit or a group of qubits called a register. Each block is a unitary operation that manipulates the state of one or more qubits. Note that the allowed operations on the quantum computer maintain the number of lines and do not involve truncations of the space at any point. This is qualitatively different from other uses of tensor network diagrams in "tensor network methods," which can refer to a class of algorithms that solve for the ground state of a quantum system on a classical computer. 86 So, there is a subtle distinction between the tensor network methods and the tensor network diagrams we draw here, even though the general properties of the diagrams are the same in both. The main difference here is that no form of truncation or connection with a renormalization group is taken explicitly.
One example of a useful gate that will appear in several places is the Hadamard gate, written in the {|0⟩, |1⟩} qubit basis. Applying this gate on a |0⟩ state will give an equal superposition over the |0⟩ and |1⟩ states, and it can be applied identically to more qubits as denoted by the ⊗ operator. We present some common algorithms in the context of solving quantum chemistry problems. 214 The quantum gradient algorithm (QGA) is used to find derivatives of a function, generically. Meanwhile, quantum phase estimation (QPE) determines the phase of a given state. An important sub-algorithm in both the QGA and QPE is the quantum Fourier transform (QFT), which is detailed first. 1 Real time evolution (RTE) and quantum amplitude estimation (QAE) are also discussed.
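A minimal numerical illustration of the Hadamard gate and its tensor-product extension (plain numpy, not a quantum SDK):

```python
import numpy as np

# The Hadamard gate in the {|0>, |1>} basis, and its action on |0>.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
zero = np.array([1.0, 0.0])
print(H @ zero)        # equal superposition (|0> + |1>)/sqrt(2)

# Applying H to each of three qubits (the tensor product H ⊗ H ⊗ H).
H3 = np.kron(np.kron(H, H), H)
state = H3 @ np.eye(8)[0]                              # start from |000>
print(np.allclose(state, np.full(8, 1 / np.sqrt(8))))  # True: uniform over 8 states
```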
Note that everywhere we use the symbol N for the number of qubits in this section. Many times in the literature, the number of qubits may also implicitly mean N registers with r qubits each for r digits of precision. If this is the case, the same concepts would apply, regardless.

Quantum Fourier transform
The QFT begins with a set of input data recorded on an initial set of registers, which we denote as |y⟩. The end result of the QFT is to change the data |y⟩ into |x⟩ according to the discrete Fourier transform 215 for N qubits with a normalization factor 2 −N/2 coming from the pre-factor in Eq. (D1). The key first step is to understand how Eq. (D3) can be re-expressed in binary instead of the integer y. A number y can be re-expressed in base 2 with the digits (each taking the value 0 or 1, corresponding to the states of a qubit) y 0 y 1 . . . y N −1 , where N is the maximum number of (qu)bits we allow for the binary number. In full, y relates to the binary digits as y = Σ k y k 2 k . A similar form can be written using a qudit where a different basis is used (ternary, etc.). To express the binary representation of |y⟩ as qubits, let us express the state as a product of single-qubit states, with each of the sub-indices representing another digit of y's binary representation (y k ∈ {0, 1}).
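The integer-to-bits bookkeeping above can be sketched as follows (the bit-ordering convention, least significant digit first, is an assumption; references vary):

```python
# Decompose an integer y into qubit values, assuming y = sum_k y_k * 2^k.
def to_bits(y, n):
    return [(y >> k) & 1 for k in range(n)]

def from_bits(bits):
    return sum(b << k for k, b in enumerate(bits))

y = 11                       # 1011 in binary
bits = to_bits(y, 4)
print(bits)                  # [1, 1, 0, 1]: 11 = 1 + 2 + 0 + 8
print(from_bits(bits) == y)  # True: the decomposition round-trips
```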
We can substitute y for its binary representation using the previously defined expressions in Eq. (D4). Note that the rotation operator is about the z axis and the Hadamard is about the x axis, so they do not commute and cannot be placed in a different order.
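The non-commutation noted above is easy to check numerically (the rotation angle is an arbitrary example):

```python
import numpy as np

# Hadamard (x-axis) and a z-axis phase rotation do not commute.
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
R = np.diag([1, np.exp(1j * np.pi / 4)])   # example z-axis phase rotation
print(np.allclose(H @ R, R @ H))           # False: order matters
```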
where the sums in Eq. (D3) are rewritten in the binary representation for simplicity. Noticing that each term is factorizable, the expression becomes a product over qubits, Eq. (D8), where for a given qubit the division by a power of 2 in the argument of the exponential will reveal one digit of the binary representation of the integer x. 1 The operators necessary to perform a QFT on a quantum computer can be identified in Eq. (D8). Note that applying the Hadamard gate on a qubit in a binary state x k gives (|0⟩ + (−1) x k |1⟩)/√2, as can be verified by noting that x k is either 0 or 1. So, a Hadamard will rotate each bit of x into a basis that is only different from an individual term in Eq. (D8) by a phase on the |1⟩ state. If we apply the phase rotation gate R of Eq. (D10) after the Hadamard, then this will constitute the most basic operation in the QFT. That is, applying H and then R a certain number of times will generate the terms required in Eq. (D8). More than one R gate may need to be applied depending on which bit of x we are acting on. The structure of gates is shown in Fig. 2. The next qubit has all but the first register's rotation matrix applied. We then continue to the next qubit, applying one fewer set of gates, and continue until we apply only the Hadamard on the last qubit. If a swap operation is applied, the qubits will appear in order, and this completes the QFT. Note that the overall cost of this algorithm scales as O(N 2 ), but a cheaper O(N log N ) version is also available. 216 Although this is far cheaper than the classical discrete Fourier transform over the exponentially many coefficients, we cannot efficiently obtain all of those coefficients from the superposition: the time to measure all elements is exponentially long, so this quantum advantage is not directly useful on its own. Still, the QFT can be useful as a tool in other subroutines.
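The full gate construction can be sketched and checked against the discrete Fourier transform matrix. Qubit ordering (qubit 0 most significant) and the sign of the exponent are conventions assumed here and vary between references:

```python
import numpy as np

def single(gate, j, n):
    """Embed a one-qubit gate on qubit j (0 = most significant) into n qubits."""
    return np.kron(np.kron(np.eye(2 ** j), gate), np.eye(2 ** (n - 1 - j)))

def ctrl_phase(j, k, m, n):
    """Controlled rotation R_m = diag(1, exp(2*pi*i/2^m)) between qubits j and k."""
    d = np.ones(2 ** n, dtype=complex)
    for s in range(2 ** n):
        if (s >> (n - 1 - j)) & 1 and (s >> (n - 1 - k)) & 1:
            d[s] = np.exp(2j * np.pi / 2 ** m)
    return np.diag(d)

def bit_reverse(n):
    """Permutation reversing qubit order (the final swap network)."""
    P = np.zeros((2 ** n, 2 ** n))
    for s in range(2 ** n):
        P[int(format(s, f"0{n}b")[::-1], 2), s] = 1
    return P

n = 3
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
U = np.eye(2 ** n, dtype=complex)
for j in range(n):                    # Hadamard, then progressively finer rotations
    U = single(H, j, n) @ U
    for k in range(j + 1, n):
        U = ctrl_phase(j, k, k - j + 1, n) @ U
U = bit_reverse(n) @ U                # swaps put the output qubits in order

dft = np.array([[np.exp(2j * np.pi * x * y / 2 ** n)
                 for y in range(2 ** n)] for x in range(2 ** n)]) / 2 ** (n / 2)
print(np.allclose(U, dft))
```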

Quantum phase estimation
Given an input state, |ψ , we want to determine the associated eigenvalue of the form exp(i2πϕ) for some real ϕ denoting a phase. For this algorithm, we must first have the initial state, |ψ , and a number of auxiliary qubits at least equal to the number of digits that the phase must be accurate to.
The strategy will be to generate, from an initial wavefunction ψ, the binary digits of the phase ϕ-as defined in Apx. D 1-and then perform an inverse QFT to obtain the phase on an auxiliary register.
Let a gate U(= R 1 from Eq. (D10)) be one such that, when applied to ψ and controlled on one of the auxiliary qubits, it produces a phase that is the j-th value of the binary representation, where |0⟩ j is the j-th qubit and t is the total number of qubits that this operator will be applied to.
The gate U is applied as in Eq. (D11) to a register of auxiliary qubits and to ψ as in Fig. 3, producing a phase-encoded auxiliary register. When the inverse Fourier transform is applied on the last step, we obtain |ϕ⟩ and have then represented the phase on a set of qubits.
In order to construct the unitary operator U, we simply must obtain the representation of the exponentiated Hamiltonian as exp(−iHt). 11 The energy of the wavefunction (ϕ is related to E) is then related to the time applied and may involve other pre-determined constants. 11 In order to apply the exponentiated Hamiltonian, one option is to use the Trotter-Suzuki decomposition 217 to decompose the exponential into a product of exponentials with fewer terms. 24 By expanding the number of qubits used in the auxiliary register, we can increase the accuracy of the final result. We will defer a detailed analysis of this point to Ref. 1, but it is worth noting that the error in this algorithm can be reduced with more resources: the success probability can be brought arbitrarily close to 1 − η for some small number η.
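A numerical sketch of the QPE readout follows. The controlled-U applications are shortcut by writing down the amplitudes they produce on the auxiliary register, and the phase value and register size are illustrative (chosen so ϕ is exactly representable):

```python
import numpy as np

phi = 0.3125          # = 0.0101 in binary, exactly representable with t = 4
t = 4
# After the controlled powers of U = exp(i*2*pi*phi), the auxiliary register
# holds (1/2^{t/2}) * sum_k e^{2*pi*i*phi*k} |k>.
k = np.arange(2 ** t)
register = np.exp(2j * np.pi * phi * k) / 2 ** (t / 2)

# The inverse QFT concentrates the amplitude on the state |2^t * phi>.
inv_qft = np.array([[np.exp(-2j * np.pi * x * y / 2 ** t)
                     for y in range(2 ** t)] for x in range(2 ** t)]) / 2 ** (t / 2)
out = inv_qft @ register
measured = int(np.argmax(np.abs(out)))
print(measured / 2 ** t)   # recovers the phase 0.3125 to t binary digits
```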
Note that improvements can be applied to the algorithm to generate more methods of QPE. 218 A recent improvement known as qubitization can bring down the gate count for determining the phase. We will defer to the discussion in Ref. 60.

Quantum gradient algorithm
Given an oracle for a function f , its gradients can be computed in one query of the oracle instead of the m + 1 queries needed classically for m grid points. 76 We start with three multi-qubit registers. One is used for the computation of f . Another contains an equal superposition over all states (representing the infinitesimal directions along which f can be shifted for an eventual gradient). The last receives a QFT (on an initial register value set to 1 while the others are set to zero). This final register will be used for a phase kickback, whose purpose is to modify the phases of the QFT. When the inverse QFT is applied, we obtain the gradient, similar to how the phase was obtained in QPE.
The steps of this algorithm are shown in Fig. 4. The Hadamard gates produce the equal superposition of Eq. (D13) on the second register, where N is the number of qubits contained in the second register.
When calling the oracle query controlled on the equal superposition in Eq. (D13), f is evaluated on all arguments, giving f (c + δ̃) after perturbing an initial coordinate c (or some similar function). 77 On the third register in Fig. 4, a QFT has been applied on an initial value of 1, giving an output of 1/2 Np/2 Σ w e i2πw/2 Np |w⟩ (D14), which is similar to Eq. (D3). The number of qubits in this register, N p , is enough to allow a pending bitwise addition to be carried out properly. The state of the full quantum wavefunction is then |ψ⟩ = 1/√(2 N 2 Np ) Σ δ Σ w e i2πw/2 Np |w⟩ |f (c + δ̃)⟩ |δ⟩ (D15), where δ̃ = L(δ − N/2)/2 N (D16), and N is a vector of (2 N , 2 N , . . .) that provides the offset factor to convert integers to real numbers. M and L are assigned meaning in the following. The factor M scales the maximum value of ∇f to keep this quantity expressed as an integer. The parameter L is the size of a neighborhood over which the derivative is accurate to first order. The next step is to add (bitwise addition denoted by ⊕) the first register to the third under the addition w → w ⊕ (2 N 2 Np f )/(M L) mod 2 Np , so that the register and phase are shifted accordingly. Recall that, by inspection of Eq. (D3), applying the inverse Fourier transform will then reveal the gradient. So, given a continuous function f , we obtain ∇f . The representation on the qubits for both f and ∇f is the binary representation of their continuous values. Improvements to this algorithm have been noted. 77
a. Functional derivatives with quantum gradient algorithms
The QGA can be used to evaluate the functional derivative, Eq. (B6). The test field Υ(x) can be provided by hand or by Hadamard transformation, with each resulting state giving the same result. Evaluating the derivative as before on η produces the functional derivative.
This could be applied to a variety of functionals, such as the density functional or the partition function (although this last object would be very difficult to compute before taking the functional derivative due again to the curse of dimensionality). 219 There are other ways to get the functional derivative, such as with a chain rule if an alternative form is required.
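The mechanics of the gradient readout can be sketched numerically in one dimension. The function f, the scalings M and L, and the register size are illustrative choices (and a positive derivative is assumed), not the general multi-register construction:

```python
import numpy as np

N = 6                 # qubits in the superposition register
L = 0.1               # neighborhood over which f is close to linear
M = 8.0               # bound chosen so 0 < f'(c)/M < 1 stays on the grid
f = np.sin            # hypothetical oracle function
c = 0.2               # point at which to evaluate the derivative

# Grid of shifts delta-tilde = L*(delta - 2^{N-1})/2^N around c.
delta = L * (np.arange(2 ** N) - 2 ** (N - 1)) / 2 ** N
# Phase kickback imprints e^{2*pi*i*(2^N/(M*L))*f(c + delta)} on each branch.
state = np.exp(2j * np.pi * (2 ** N / (M * L)) * f(c + delta)) / 2 ** (N / 2)

# The inverse QFT concentrates amplitude near the integer 2^N * f'(c) / M.
inv_qft = np.array([[np.exp(-2j * np.pi * x * y / 2 ** N)
                     for y in range(2 ** N)] for x in range(2 ** N)]) / 2 ** (N / 2)
out = np.abs(inv_qft @ state) ** 2
grad = M * int(np.argmax(out)) / 2 ** N   # rescale back to an estimate of f'(c)
print(grad)                               # near the true derivative cos(0.2)
```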

Real time evolution
While it is possible in theory to implement a more advanced classical algorithm on the quantum computer, there may be sizable overhead. It is generally accepted in the literature to use the real time evolution (RTE) method, which we introduce here. 220 The qubits can be initialized into an eigenstate of a single-particle Hamiltonian, H 0 . This could be the Hartree-Fock solution for some number of electrons, N e . 11 The Hamiltonian is then given a time dependence such that at t = 0 it is H 0 and at t = t max it is the full Hamiltonian, for some time-dependent function λ(t) and interaction term H 1 . By tuning the time parameter slowly enough, the new ground state can be found. A constant C is added in anticipation of the QPE and is simply taken into account when converting the output phase to the energy. 220 The final state of the RTE must be a close approximation to the true ground state for QPE to work properly. 1,221 In order to evolve the initial state to the ground state, the error in the Trotter step must be no more than the allowed accuracy for a computation. 23 For the case of molecular systems, this is 1 mHa (although it can be even lower for some applications). A variable number of steps is required to fully evolve the initial wavefunction to the ground state, but this may be on the order of thousands, and the entire process can take months or much longer. 24 There are also other algorithms that could be used, 135,152-156 but these may have a large overhead. Just as with phase estimation, we present the most widely known algorithm here for ease of presentation. The RWMP method of the main text can interchange subroutines for the best algorithm.
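A toy two-level illustration of the adiabatic ramp described above (the Hamiltonians, ramp time, and step count are illustrative, and the exact per-step matrix exponential stands in for a Trotterized evolution):

```python
import numpy as np

H0 = np.diag([0.0, 1.0])                    # "single-particle" starting Hamiltonian
H1 = np.array([[0.5, 0.3], [0.3, -0.5]])    # hypothetical interaction term
t_max, steps = 50.0, 5000
dt = t_max / steps

psi = np.array([1.0, 0.0], dtype=complex)   # ground state of H0
for s in range(steps):
    lam = (s + 0.5) / steps                 # linear ramp lambda(t) = t / t_max
    H = H0 + lam * H1
    w, v = np.linalg.eigh(H)
    # exact step e^{-i H dt}; on hardware this would be Trotterized
    psi = v @ (np.exp(-1j * w * dt) * (v.conj().T @ psi))

w, v = np.linalg.eigh(H0 + H1)              # full Hamiltonian at t = t_max
overlap = abs(v[:, 0].conj() @ psi) ** 2
print(overlap)   # close to 1 when the ramp is slow enough
```

Shortening t_max sharply would violate the adiabatic condition and reduce the final overlap, which is the trade-off discussed above.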

Quantum amplitude estimation (quantum counting)
A way to obtain useful quantities from a wavefunction without measuring it is to use QAE 222 as described by Ref. 5 (although we follow the quantum counting algorithm used in Ref. 61 and also note Ref. 223). One can envision the application of an operator Ô onto the wavefunction (e.g., ĉ † i ĉ j onto |Ψ⟩) as being represented in a superposition of the original function Ψ and all other states that are perpendicular, Ψ ⊥ , as Ô|Ψ⟩ = α 0 |Ψ⟩ + α ⊥ |Ψ ⊥ ⟩. We will simply assume that some process to apply the operator of interest is available. The goal is to estimate α 0 , which is the expectation value. A series of steps is required to find this coefficient; the fraction of times that the following procedure is successful will give α 0 . Before performing any of the subsequent steps, we assume that the energy of Ψ has been obtained via phase estimation beforehand on a separate register. In total, four registers are required: one for Ψ, one for the saved ground state energy, one for the check energy, and one for the pointer qubit. One iteration of the full algorithm is:
1. An operator is applied to Ψ.
2. The energy of the resulting state is determined. The energy is in a superposition over all states.
3. The newly found energy is compared to the saved energy of the original wavefunction.
4. The difference is represented as a single bit (called a pointer qubit).
5. (Accept) The pointer qubit is measured. The algorithm then branches: if the measurement of the single pointer qubit gives a success, implying the energies match and the original Ψ was recovered, we repeat the steps starting from step 1 after returning the state to the original configuration. A separate counter is incremented by one each time this step is reached.
6. (Reject) If the pointer measurement results in a failure, then the wavefunction found is not the original. We must recover the original wavefunction by undoing the QPE, undoing the operator applied (Ô † ), applying the operator again, and again finding the energy difference. We then return to step 1.
More details and diagrams can be found in Refs. 5, 61, and 224. Intuitively, we are counting the number of times the starting wavefunction is recovered when applying an operator. The ratio of accepted counts to the total number of times the operator is applied is related to the coefficient on the ground state, α 0 . While the description here involves a measurement and therefore gives classical data, the algorithm can be used as an oracle query for the QGA (as stated in Ref. 61, at the cost of additional auxiliary qubits).
In order to see how the algorithm converges when the reject step is activated, we can analogize with the half-life of radioactive isotopes to envision when the rejection procedure must eventually recover the correct ground state. A detailed analysis of the convergence of the rejection step shows that the number of steps required to find the original Ψ is related to 1/ε for some precision ε, as found in Refs. 5 and 61.
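The counting statistics above can be illustrated with a classical Monte Carlo stand-in. The overlap value and the assumed Born-rule relation between the acceptance probability and α 0 are illustrative; this sketches only the statistics, not the quantum circuit:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha0 = 0.6                         # hypothetical overlap coefficient
p_accept = alpha0 ** 2               # assumed probability the pointer reads "success"
shots = 100_000

# Each shot plays one accept/reject round; we count the acceptance fraction.
accepts = rng.random(shots) < p_accept
estimate = np.sqrt(accepts.mean())   # recover |alpha0| from the acceptance ratio
print(estimate)                      # close to 0.6
```

As in the text, the precision of the estimate improves with the number of rounds, consistent with a cost scaling like 1/ε.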