Hybrid Ground-State Quantum Algorithms based on Neural Schrödinger Forging



I. INTRODUCTION
In recent years, significant advances have been made in simulating the static and dynamical properties of many-body quantum systems using variational algorithms. For instance, density matrix renormalisation group methods based on matrix-product states [1][2][3], neural-network quantum states [4], equivariant neural networks [5] or kernel methods [6] have accurately computed the ground-state energy of spin systems [7], as well as of fermionic systems such as molecules [8] or nuclei [9]. While neural-network quantum states represent wave functions using classical representations, we can also consider their quantum counterpart, where the wave-function ansatz takes the form of a parametrized quantum circuit [10], such as in the popular variational quantum eigensolver (VQE) [11]. Even if VQE has been successfully applied in various areas, such as chemistry [11][12][13], spin chains [14][15][16][17] or nuclei [18][19][20][21], it is still unclear whether VQE is a scalable algorithm. Indeed, the optimization procedure becomes increasingly difficult [22] with the system size because of the presence of barren plateaus in the loss landscape [23]. It is consequently desirable to conceive variational quantum algorithms acting on a minimal number of qubits.
Although the VQE is already a hybrid algorithm, in the sense that it relies on classical resources to perform the optimization, we take a step further and design an algorithm relying on both neural networks and quantum circuits. More specifically, we consider a VQE based on entanglement forging (EF) [24], a circuit-knitting strategy that effectively performs a Schmidt decomposition of the variational quantum state, optimizes the two subsystems separately, and reconstructs the entanglement classically. This procedure has the desirable property of reducing the number of qubits while still reproducing the ground-state energy with high accuracy. It is similar in spirit to quantum-embedded density functional theory [25], where quantum resources are only used for the most challenging parts.
Besides computing ground-state energies, EF also allows practical heuristic simulations, notably in analyzing bipartite entanglement. This concept is fundamental in quantum mechanics, as its measurement provides an understanding of the behavior of strongly correlated systems [26]. For instance, bipartite entanglement has been used in condensed matter physics to study phenomena such as quantum phase transitions, topological order, and many-body localization [27, 28]. Advances in experimental techniques have made it possible to measure entanglement entropy in a variety of condensed matter systems over the past few years, revealing insights into their underlying quantum properties [29].
The main contribution of this paper is a Schrödinger forging procedure using an autoregressive neural network (ARNN) [30, 31]. This method combines the versatility of Schrödinger forging with control over the required computational resources via the introduction of a cutoff. Generative neural networks have already been proposed for EF [32], but only in the context of Heisenberg forging, which requires permutation symmetry between the two subsystems. Our method does not require this symmetry, making it a more versatile approach to solving ground-state problems on quantum computers. Moreover, our algorithm naturally includes a cutoff in the number of basis states, limiting the required number of quantum circuits.
This paper is structured as follows. We first introduce EF in Sec. II A, as well as two ways to tackle its scalability issue, based on Monte Carlo sampling and on neural networks. The main contribution of this paper is then proposed in Sec. II B as a third option. We conclude in Sec. III with numerical simulations testing our hybrid architecture on various physical models: one-dimensional spin chains, spins on a triangular lattice with a random external field, and the nuclear shell model.

II. METHODS
The general strategy of variational algorithms for ground-state problems is to prepare a wave-function ansatz |ψ⟩ and, using the variational principle, to approximate the ground-state energy E_0 of the Hamiltonian of interest H. The ansatz can take the form of, e.g., a neural network [4] or a quantum circuit [11], while the variational parameters are usually optimized with, e.g., gradient-based methods. In the following, we explore hybrid classical-quantum models aiming at describing a bipartite system with quantum circuits, while the entanglement between the partitions is forged classically.
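We recall the variational principle underlying this strategy: for any normalized ansatz |ψ(θ)⟩,

\[ E_0 \le E(\theta) = \langle \psi(\theta) | H | \psi(\theta) \rangle , \]

so that minimizing E(θ) over the variational parameters θ approaches the ground-state energy from above.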

A. Entanglement Forging
The starting point of the EF procedure is to employ a Schmidt decomposition, a direct application of the singular value decomposition (SVD), to write a quantum state |ψ⟩ of a bipartite quantum system H = H_A ⊗ H_B, with subsystem sizes N_A and N_B, as

\[ |\psi\rangle = \sum_{\sigma} \lambda_\sigma \, \big( U |\sigma\rangle_A \big) \otimes \big( V |\sigma\rangle_B \big) . \]

In the above, U and V are unitaries, |σ⟩_X with σ ∈ {0, 1}^{N_X} are computational-basis states, and the λ_σ are the corresponding Schmidt coefficients. The latter are positive and normalized, Σ_σ |λ_σ|² = 1, and the number of non-zero Schmidt coefficients is called the Schmidt rank. We recall that the distribution of the Schmidt coefficients is related to the level of entanglement between the two subsystems, with the von Neumann entropy given by

\[ S = - \sum_{\sigma} \lambda_\sigma^2 \log \lambda_\sigma^2 . \]

Therefore, maximal entanglement is characterized by a uniform distribution, while minimal entanglement corresponds to a Dirac delta. The variational state is obtained by parametrizing U and V with two quantum circuits and considering the Schmidt coefficients as additional variational parameters. Following Eddins et al. [24], the most direct way to compute expectation values, called Schrödinger forging, is to insert the Schmidt decomposition above directly into ⟨O⟩ = ⟨ψ|O|ψ⟩. Assuming that the observable O admits a bipartition O = O_A ⊗ O_B, the expectation value can then be expressed as

\[ \langle O \rangle = \sum_{n} \lambda_n^2 \, \langle \sigma_n | U^\dagger O_A U | \sigma_n \rangle \langle \sigma_n | V^\dagger O_B V | \sigma_n \rangle + \sum_{n < m} \lambda_n \lambda_m \sum_{p \in \mathbb{Z}_4} (-1)^p \, \langle \phi^p_{\sigma_n, \sigma_m} | U^\dagger O_A U | \phi^p_{\sigma_n, \sigma_m} \rangle \langle \phi^p_{\sigma_n, \sigma_m} | V^\dagger O_B V | \phi^p_{\sigma_n, \sigma_m} \rangle , \]

where |ϕ^p_{σ_n,σ_m}⟩ = (|σ_n⟩ + i^p |σ_m⟩)/√2, Z_4 = {0, 1, 2, 3}, and all the bitstrings σ have been labeled with an index n (or m). Decompositions with equally sized subsystems are considered, N_A = N_B =: N/2. We note that this is not a strict requirement, but we use it to simplify the notation.
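As a classical illustration of the decomposition above (an example of ours, not a step of the forging algorithm itself), the Schmidt coefficients and the von Neumann entropy of a generic state can be extracted with an SVD:

```python
import numpy as np

# Random normalized state of N = N_A + N_B qubits (illustrative only).
N_A, N_B = 3, 3
psi = np.random.randn(2**(N_A + N_B)) + 1j * np.random.randn(2**(N_A + N_B))
psi /= np.linalg.norm(psi)

# Reshape into a 2^N_A x 2^N_B matrix; the singular values of
# psi = U diag(lambda) V^dagger are the Schmidt coefficients.
M = psi.reshape(2**N_A, 2**N_B)
U, lam, Vh = np.linalg.svd(M, full_matrices=False)
assert np.isclose(np.sum(lam**2), 1.0)   # normalization of the coefficients

# Von Neumann entropy S = -sum_sigma lambda_sigma^2 log lambda_sigma^2.
p = lam**2
S = -np.sum(p[p > 1e-12] * np.log(p[p > 1e-12]))
print("Schmidt rank:", int(np.sum(lam > 1e-12)), " entropy:", S)
```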
We remark that the expectation value above involves an exponential sum in the system size. To address this scalability issue, two methods have been suggested [24]. The first uses an unbiased estimator of ⟨ψ|O_A ⊗ O_B|ψ⟩ that can be evaluated by importance sampling with probabilities ∼ λ_n λ_m. The second is to leverage permutation symmetry between the two subsystems, producing another EF scheme. Since it is defined at the operator level, we refer to it as Heisenberg forging. This approach has been further developed by Huembeli et al. [32] using an ARNN. More details about Heisenberg forging can be found in Appendix A.

B. Schrödinger forging with generative neural networks
In this section, we present our approach to Schrödinger forging. The starting point is the observation that the Schmidt coefficients decay exponentially if the two subsystems are weakly entangled, as is the case in low-energy eigenstates of chemical and spin-lattice Hamiltonians. By introducing a cutoff in the sum, it is therefore possible to improve the efficiency of the estimation while keeping a sufficiently low additive error. However, this requires selecting a set of bitstrings among the 2^{N/2} total possibilities, which represents an open problem for EF. To this end, we propose to use generative models, more specifically ARNNs [33], to select the best candidates. An ARNN is a type of neural-network architecture commonly used in time-series forecasting and sequence-modeling tasks. The autoregressive property means that the output at a given step is regressed on its own past values: autoregressive models predict the next value in a sequence based on the previous values of that sequence. The use of an ARNN is motivated by the fact that the squared Schmidt coefficients are normalized and can thus be interpreted as a probability density. Following [34], we propose an algorithm, summarized in Algorithm 1. We note that this approach shares some similarities with quantum-inspired genetic algorithms, see e.g. [35]. The parametrized unitaries and the Schmidt coefficients are finally optimized with a gradient-descent-based algorithm. A summary of the entire algorithm is shown in Fig. 1.

Algorithm 1: bitstring selection.
Inputs: cutoff k.
Outputs: set of k bitstrings.
Initialize: start with a random set A of k bitstrings.
while the algorithm has not converged do
1. Generate a set of bitstrings G with the ARNN.
2. Using the bitstrings σ in the set A ∪ G, find their Schmidt coefficients λ_σ by solving the constrained system of equations given below.
3. Create the set A′ composed of the bitstrings from A ∪ G with the k largest λ_σ.
4. Train the ARNN such that it models p(σ) ∼ |λ_σ|².
5. Update A ← A′.
end while
return the set A of k bitstrings.

First, we explain how to use an autoregressive neural network to efficiently identify the relevant bitstrings. Since the Schmidt coefficients are normalized as Σ_σ |λ_σ|² = 1, they can be interpreted as a probability density. The chain rule from probability theory can be used to write

\[ p(\sigma) = \prod_{i=1}^{N} p\big( (\sigma)_i \,\big|\, (\sigma)_1, \dots, (\sigma)_{i-1} \big) , \]

and the bitstring pairs associated with λ_σ can be encoded by stacking the bitstrings of subsystem B at the end of the bitstrings of subsystem A,

\[ \sigma = \big( (\sigma_A)_1, \dots, (\sigma_A)_{N/2}, (\sigma_B)_1, \dots, (\sigma_B)_{N/2} \big) . \]

Note that here (σ)_i denotes the ith bit of the bitstring σ.
Neural networks, and more particularly autoregressive models, are powerful tools to model such conditional densities [36] by generating elements sequentially, conditioned on the previous ones. To build the autoregressive model, we consider a dense ARNN, whose architecture is very similar to a dense feedforward neural network. The notable difference is that the weight matrices are masked to be triangular, ensuring the autoregressive nature of the model. From the ARNN, the bitstrings can then be sampled directly and efficiently, as detailed in Appendix B.
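To make the construction concrete, here is a minimal single-layer sketch (our illustration, not the authors' NetKet implementation), where a strictly lower-triangular mask ensures that the ith conditional depends only on the bits j < i:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8                                     # sigma_A stacked with sigma_B
W = 0.1 * rng.normal(size=(N, N))
mask = np.tril(np.ones((N, N)), k=-1)     # strictly lower-triangular mask

def conditionals(sigma):
    """p((sigma)_i = 1 | (sigma)_j, j < i) for every position i."""
    logits = (W * mask) @ sigma           # the ith logit sees only bits j < i
    return 1.0 / (1.0 + np.exp(-logits))

def sample():
    """Ancestral sampling: draw the bits one at a time, left to right."""
    sigma = np.zeros(N)
    for i in range(N):
        sigma[i] = rng.random() < conditionals(sigma)[i]
    return sigma.astype(int)

print(sample())
```

A deeper network only requires applying the same mask pattern to every layer, which is exactly what masked multilayer perceptrons do.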
Exploring the full space of bitstrings is exponentially difficult, motivating the use of machine-learning techniques to select the basis states that contribute the most to the wave function. Inspired by the work of Herzog et al. [34], we introduce an algorithm whose primary objective is to bypass exploring the extremely large space of basis states. Starting from a random set of bitstrings A_0, the strategy consists of adding bitstrings generated according to the approximation of |λ_σ|² modeled by the ARNN. Since the variational energy is quadratic in the Schmidt coefficients λ_σ, at each iteration they can be determined by solving the constrained system of equations

\[ \frac{\partial}{\partial \lambda_\sigma} \langle \psi | H | \psi \rangle = 0 , \qquad \sum_{\sigma} \lambda_\sigma^2 = 1 , \]

where the sums run over the set A ∪ G, with A being the current set of bitstrings and G the set of bitstrings sampled by the ARNN. The first equation ensures that the forged wave function has minimal energy, while the second guarantees its normalization. In a second step, the current set A is updated by taking the k bitstrings with the highest Schmidt coefficients. The ARNN is finally trained to model p(σ_A, σ_B) ∼ |λ_σ|² in a supervised way. These steps are iterated until convergence, which is reached when the current set A is stable and the loss of the ARNN is close to zero. The choice of the cutoff k is usually optimized by trial and error. Here, we start with a small cutoff and slowly increase it until no further improvement is observed. We note that small cutoffs are often preferable, since they require less expensive calculations and also make the ARNN easier to train. This is why the ARNN plays an important role in choosing the optimal set of bitstrings. In summary, the training is composed of two stages: the training of the ARNN using some random initialization for U and V, followed by the optimization of the two unitaries in order to tailor them to the selected set of bitstrings. The ARNN is trained to model the distribution p(σ) on the target q(σ) ∝ |λ_σ|² obtained by solving the system of linear equations in Algorithm 1.
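Since the forged energy is the quadratic form E(λ) = λᵀMλ under the constraint ‖λ‖₂ = 1, where M collects the circuit expectation values pairing the bitstrings of A ∪ G, the optimal coefficients are given by the lowest eigenvector of M. The sketch below (our shortcut for small sets; the paper instead solves the constrained system with projected gradient descent via JAXopt, and M is assumed precomputed) makes this explicit:

```python
import numpy as np

def optimal_schmidt_coefficients(M):
    """Minimize E(lam) = lam^T M lam subject to ||lam||_2 = 1.

    M[n, m] is the (assumed precomputed) pairwise energy term coupling
    bitstrings sigma_n and sigma_m. The constrained minimizer is the
    eigenvector associated with the smallest eigenvalue of M.
    """
    M = 0.5 * (M + M.T)                    # enforce symmetry
    eigvals, eigvecs = np.linalg.eigh(M)
    lam = eigvecs[:, 0]
    return lam / np.linalg.norm(lam), eigvals[0]

# Toy usage with a random symmetric 8 x 8 "energy matrix".
rng = np.random.default_rng(1)
A = rng.normal(size=(8, 8))
lam, E = optimal_schmidt_coefficients(A)
print("energy:", E, " norm:", np.linalg.norm(lam))
```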
The choice of the loss function L and training set T plays an important role in the training of the ARNN. For the training set, two possibilities are investigated: the model is either trained on the current set A and the generated bitstrings G or, following Ref. [34], only on the non-pruned bitstrings, i.e., the new set A′. Concerning the loss functions, we consider the explicit logcosh loss [37] and the implicit maximum mean discrepancy (MMD) loss [38],

\[ \mathcal{L}_{\rm MMD}^2 = \mathbb{E}_{x, x' \sim p} \big[ K(x, x') \big] - 2\, \mathbb{E}_{x \sim p,\, y \sim q} \big[ K(x, y) \big] + \mathbb{E}_{y, y' \sim q} \big[ K(y, y') \big] , \]

where

\[ K(x, y) = \exp\!\left( - \frac{\lVert x - y \rVert_2^2}{2 \Delta^2} \right) \]

is chosen to be a Gaussian kernel, with ‖·‖₂ the 2-norm and Δ the bandwidth parameter. The latter determines the width of the kernel and controls the sensitivity of the MMD measurement. A larger bandwidth allows more global comparisons, while a smaller bandwidth focuses on local details. The MMD loss effectively minimizes the difference between the mean embeddings of the two distributions. It involves a pairwise comparison of every bitstring in the training set, with their contribution being controlled by the kernel.

Figure 1: Schema of the end-to-end algorithm. The set of bitstrings is first generated by the ARNN and is then used to perform the Schrödinger-forging VQE. This involves iteratively computing the variational energy on the quantum processing unit and classically optimizing the variational parameters until convergence.
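A compact estimator of this loss (our numpy sketch, using the biased V-statistic with the Gaussian kernel defined above):

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth):
    """K(x, y) = exp(-||x - y||_2^2 / (2 bandwidth^2)) for all pairs."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * bandwidth**2))

def mmd2(X, Y, bandwidth=1.0):
    """Biased estimate of MMD^2 between samples X ~ p and Y ~ q.

    X, Y: arrays of bitstrings, shape (n_samples, n_bits).
    """
    return (gaussian_kernel(X, X, bandwidth).mean()
            - 2.0 * gaussian_kernel(X, Y, bandwidth).mean()
            + gaussian_kernel(Y, Y, bandwidth).mean())

rng = np.random.default_rng(2)
X = rng.integers(0, 2, size=(16, 8)).astype(float)   # model samples
Y = rng.integers(0, 2, size=(16, 8)).astype(float)   # target samples
print(mmd2(X, Y, bandwidth=1.0))
```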
As a benchmark, we also consider a more standard approach for modeling probability distributions, using the reversed Kullback-Leibler (KL) divergence as the loss of the ARNN. A detailed description of this method is presented in Appendix D.

III. NUMERICAL SIMULATIONS
In this section, we present numerical experiments. We begin with the performance of the bitstring-selection algorithm on small models. We then examine the bitstring selection and the subsequent energy-minimization process on various models of increasing complexity.

A. Identify the relevant bitstrings
We investigate the performance of the generative algorithm on small symmetric models: the transverse-field Ising model (TFIM), the Heisenberg and J_1-J_2 models on a one-dimensional (1D) chain of 14 spins with periodic boundary conditions, the two-dimensional (2D) TFIM on a 4 × 3 triangular lattice with a diagonal cut and open boundary conditions, and the t-V model on a 4 × 3 grid. These models, further detailed in Appendix C, allow for an exact Schmidt decomposition, enabling us to assess the algorithm's performance by examining how many bitstrings associated with high Schmidt coefficients can be identified.
The results, for a cutoff dimension of k = 8 bitstrings, are presented in Table I. More specifically, the table reports the performance of Algorithm 1 in terms of the number of correctly identified bitstrings using the logcosh and MMD losses. Two training sets are considered for the former: the union T = A ∪ G and the pruned set T = A′. Furthermore, the impact of the parameters' initialization is attenuated by model averaging (MA). This ensemble technique consists of training four ARNNs with different initial weights and taking their average as the starting point of a final ARNN. Results obtained with the more standard reversed-KL approach (see Appendix D) are also presented. Finally, the last column of the table contains the sum of the eight highest squared Schmidt coefficients from the exact decomposition.

The algorithm proposed in this paper is able to find the majority of the most important bitstrings. Moreover, the MMD and logcosh loss functions are superior to the standard approach based on the reversed KL divergence. Indeed, with the latter, relevant bitstrings can only be found if the system is small or when the level of entanglement is high (leading to a wide probability distribution). This is not suitable for most applications, since low entanglement is important to guarantee a low additive error with a cutoff dimension. The best results are highlighted in bold and are in general obtained with the MMD loss. More precisely, with the MMD loss, the algorithm is always able to find the four bitstrings with the highest Schmidt coefficients. This loss enables the ARNN to generalize well and makes the algorithm converge quickly, as can be further appreciated for the 2D TFIM with 12 spins in Appendix G, Fig. 17. In that case, the ARNN has only seen 24 bitstrings in total during the training, and it is able to find the seven bitstrings with the highest Schmidt coefficients, containing the five most important ones, in only two iterations.
To gain a better understanding of the dynamics of the generative algorithm, the loss of the ARNN and the number of bitstring updates between two iterations are presented. Figure 2 shows the results with the MMD loss on the five different Hamiltonians. In all cases, we observe that the ARNN loss converges to zero and that the generated set of bitstrings is stable.
In general, the dynamics can be divided into two phases: an initial phase where the loss is high and the model explores a diverse range of bitstrings, followed by a second phase where the model attempts to exploit its approximation of the probability distribution to converge and generate bitstrings with high Schmidt coefficients.This exploration-exploitation trade-off can be modified by adjusting the number of bitstrings sampled at each iteration and the learning rate of the ARNN.

B. Complete entanglement forging scheme
In the previous section, we showed that the ARNN is able to identify the bitstrings with the highest Schmidt coefficients. We now test the complete EF scheme. First, spin systems on a ring are considered, before going to a two-dimensional lattice and finally to the nuclear shell model.

Spins in one dimension
We begin by considering the one-dimensional TFIM. More precisely, we consider a spin chain with periodic boundary conditions and an even number N = 20 of spins, and set the coupling and the external-field coefficients to one. The Hamiltonian of the model can be written as

\[ H = - \sum_{i=1}^{N} Z_i Z_{i+1} - \sum_{i=1}^{N} X_i . \]

Since the system is invariant under permutation of the two subsystems, we can compare our approach to Heisenberg forging with an ARNN. Because of the symmetry, we can choose σ_A = σ_B, which reduces the number of possible bitstrings. As above, a cutoff dimension of k = 8 bitstrings is used, chosen by trial and error. Figure 3 shows the energy error ratio for the three forging schemes, i.e., Schrödinger forging with a random uniform set of bitstrings, Schrödinger forging with the generated set, and Heisenberg forging with the generated set. Following Ref. [32], a pre-training of the quantum circuit over 1000 iterations is performed. In both cases, the unitaries take the form of a hardware-efficient ansatz,

\[ U(\Theta) = \prod_{d=1}^{D} \left[ U(\theta^{d}) \prod_{i=1}^{N-1} CX_{i,i+1} \right] U(\theta^{0}) , \]

where D = 15 is the number of layers, CX_{i,j} is a CNOT gate with control qubit i and target j, and U(θ^d) is an N-fold tensor product of arbitrary single-qubit rotations parametrized by 3N parameters. We denote by Θ the set containing all indexed θ^d_i. Details on the training procedure, such as the values of the hyperparameters and the optimization algorithm, can be found in Appendix E.
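A circuit of this type takes only a few lines in Pennylane (a generic sketch consistent with the description above; the sizes and the linear CNOT ladder are illustrative choices of ours, not necessarily the authors' exact layout):

```python
import pennylane as qml
from pennylane import numpy as np

N, D = 6, 3   # illustrative sizes; the text uses D = 15 layers

def hardware_efficient_ansatz(theta, wires):
    """theta has shape (D + 1, len(wires), 3): one Rot per qubit per layer."""
    for w in wires:
        qml.Rot(*theta[0, w], wires=w)               # initial layer U(theta^0)
    for d in range(1, D + 1):
        for i in range(len(wires) - 1):              # entangling CNOT ladder
            qml.CNOT(wires=[wires[i], wires[i + 1]])
        for w in wires:
            qml.Rot(*theta[d, w], wires=w)           # rotation layer U(theta^d)

dev = qml.device("default.qubit", wires=N)

@qml.qnode(dev)
def energy(theta):
    hardware_efficient_ansatz(theta, wires=list(range(N)))
    # Illustrative observable: one ZZ bond plus one transverse-field term.
    H = qml.Hamiltonian([-1.0, -1.0],
                        [qml.PauliZ(0) @ qml.PauliZ(1), qml.PauliX(0)])
    return qml.expval(H)

theta = np.random.uniform(0, 2 * np.pi, size=(D + 1, N, 3))
print(energy(theta))
```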
We observe that the choice of the random set has little impact on the performance of the Schrödinger forging procedure. Moreover, the models enhanced with the ARNN display better results, the two being quite similar.
To ensure that specific physical properties of the ground state, beyond its energy, are correctly reproduced, the spin-spin correlators ⟨Z_i Z_j⟩ of the forged states have been calculated. They are shown in Fig. 4. We observe that the accuracy does not degrade across the different correlators, suggesting that the error is mainly explained by the training of the circuits rather than by the EF procedure. The error on the correlators ⟨Z_i Z_j⟩ is minimal when j = i + 1 and maximal if the two spins are far apart in the chain. This can be explained by the locality of the ansatz, built from gates acting on neighboring qubits.

Spins in two dimensions
We now move towards two-dimensional spin lattices, which are more challenging because local operators are mapped to non-local ones when projected onto a line. We consider the TFIM on a 2D topology described by a triangular lattice, as shown in Fig. 13 in Appendix C. We break the permutation symmetry of the two subsystems by applying a random external field h_i. Setting the coupling constant to one, the Hamiltonian is given by

\[ H = - \sum_{\langle i, j \rangle} Z_i Z_j - \sum_{i} h_i X_i , \]

where ⟨i, j⟩ are neighbors according to the triangular topology. The triangular lattice has a high coordination number, leading to a strong magnetic susceptibility [39]: the system is more sensitive to external magnetic fields and can therefore exhibit stronger magnetic order and complex physical phenomena, such as, e.g., disorder, localization, and heterogeneity. The two-dimensional lattice is divided by a cut along the diagonal axis. We consider open boundary conditions (OBC), cylindrical boundary conditions (CBC), and toroidal boundary conditions (TBC). Since the boundary conditions can lead to different levels of entanglement [40, 41], they play an essential role in the EF procedure, which is why different configurations are considered.
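Such a Hamiltonian is straightforward to assemble (a sketch of ours; the edge list below is a placeholder, not the 4 × 3 lattice of Fig. 13, and the distribution of the fields h_i is an assumption):

```python
import pennylane as qml
import numpy as np

rng = np.random.default_rng(3)
n_spins = 6

# Placeholder neighbor list for a small triangular patch.
edges = [(0, 1), (1, 2), (0, 3), (1, 3), (1, 4), (2, 4)]
h = rng.uniform(0.0, 1.0, size=n_spins)   # random external field (assumed U(0, 1))

coeffs = [-1.0] * len(edges) + [-float(hi) for hi in h]
ops = ([qml.PauliZ(i) @ qml.PauliZ(j) for i, j in edges]
       + [qml.PauliX(i) for i in range(n_spins)])
H = qml.Hamiltonian(coeffs, ops)
print(len(coeffs), "terms")
```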
The convergence of the variational energies for the three boundary conditions is shown in Fig. 5. As in the one-dimensional case, a cutoff of k = 8 is chosen in the Schmidt decomposition. We observe that the bitstrings generated by the ARNN lead to an improvement of approximately 10⁻² in the energy error ratio with respect to taking a random set. The most striking result, though, is that the gap between the random and generated methods increases compared to the one-dimensional case, suggesting that sampling with the ARNN becomes more effective for systems of increased complexity. On the other hand, no advantage can be noted in the context of TBC; there, the parametrization of the unitaries seems to be the limiting factor in improving the energy error.

Nuclear shell model
Finally, we consider light nuclei in the shell model with Cohen-Kurath interactions [42], where the Hamiltonian can be written in second quantization as

\[ H = \sum_i \epsilon_i \, \hat a_i^\dagger \hat a_i + \frac{1}{4} \sum_{ijkl} V_{ijkl} \, \hat a_i^\dagger \hat a_j^\dagger \hat a_l \hat a_k . \]

Here, â†_i and â_i are the creation and annihilation operators, respectively, for a nucleon in the state |i⟩. Single-particle energies are denoted ϵ_i and two-body matrix elements V_ijkl. The orbitals |i⟩ = |n = 0, l = 1, j, j_z, t_z⟩ are labeled by the radial quantum number n, the orbital angular momentum l, the total spin j, its projection j_z on the z-axis, and the z-projection t_z of the isospin.
We consider nucleons in the p shell-model space, which includes six orbitals for the protons and six orbitals for the neutrons, while each energy is computed with respect to an inert 4 He core. The shell-model Hamiltonian above is converted into a qubit Hamiltonian via the Jordan-Wigner transformation [43]. Each single-particle state is represented by a qubit, where |0⟩ and |1⟩ refer to an empty and an occupied state, respectively. Therefore, each nucleus can be distinguished by the number of excited orbitals, representing the protons and neutrons on top of the 4 He core.
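This mapping step can be illustrated with OpenFermion (our example; the coefficients are placeholders, not Cohen-Kurath matrix elements):

```python
from openfermion import FermionOperator, jordan_wigner

# Toy two-orbital piece of a shell-model-like Hamiltonian:
# eps_0 n_0 + eps_1 n_1 + V a_0^dag a_1^dag a_1 a_0 (placeholder values).
H_f = (FermionOperator("0^ 0", 1.0)
       + FermionOperator("1^ 1", 1.5)
       + FermionOperator("0^ 1^ 1 0", -0.5))

# Jordan-Wigner: one qubit per orbital, |1> = occupied.
H_q = jordan_wigner(H_f)
print(H_q)
```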
The partition is made at the isospin level, meaning that subsystem A consists entirely of protons while subsystem B consists of neutrons. Schrödinger forging is therefore the only possible choice, since the system is not symmetric under proton-neutron exchange. To build a chosen nucleus, we start from an appropriate initial state with the desired number of nucleons and act with an excitation-preserving (EP) ansatz. EP ansätze can be built as a product of two-qubit excitation-preserving blocks U(θ, ϕ), also known as hop gates [44, 45], which act non-trivially only on the single-excitation subspace span{|01⟩, |10⟩}; a standard parametrization reads

\[ U(\theta, \phi) = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos\theta & -e^{i\phi} \sin\theta & 0 \\ 0 & e^{-i\phi} \sin\theta & \cos\theta & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} . \]

This set can be extended with four-qubit excitation-preserving gates G [46]. The parameterized circuit then takes the form of a layered ansatz composed of a product of excitation-preserving gates. We denote by RZ_i(ϕ) a rotation of the ith qubit around the z-axis, and the subscripts of the U and G gates indicate the qubits the gate acts upon (i : j is a slice from i to j). The Θ_d parameters regroup all parameters of the dth layer.

Since the Schmidt rank of the p-shell nuclear model is at most 20, the generative algorithm is unnecessary, as all bitstrings in the Schmidt decomposition can be used. The energy minimization for the various nuclei is presented in Fig. 6. We observe that every ground-state energy in the p shell can be reproduced with an error ratio of order 10⁻³.

In the final experiment, nucleons in the sd shell-model space, which includes 12 orbitals for the protons and 12 orbitals for the neutrons, are considered. Here, each energy is computed with respect to an inert 16 O core. Using the Jordan-Wigner mapping, this model leads to a 24-qubit Hamiltonian composed of a total of 11,210 overlapping terms. This high number can make EF particularly expensive, as its cost scales linearly with the number of terms. However, since most of the coefficients are close to zero, an approximate Hamiltonian, consisting of the 38 overlapping terms with the most significant coefficients, is considered instead. Despite this approximation, the Hamiltonian still reproduces 97% of the ground-state energy of the 23 Na nucleus, which is the focus of this experiment.
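An excitation-preserving layered circuit in this spirit can be sketched with Pennylane's built-in Givens-rotation gates (our stand-ins for the U and G blocks above; they conserve particle number but are not the paper's exact gates):

```python
import pennylane as qml
from pennylane import numpy as np

n_qubits = 6   # e.g. the six proton orbitals of the p shell
dev = qml.device("default.qubit", wires=n_qubits)

def ep_layer(angles_2q, angles_4q, angles_z):
    """One excitation-preserving layer built from Givens-type blocks."""
    for i, t in enumerate(angles_2q):
        qml.SingleExcitation(t, wires=[i, i + 1])                # hop-gate stand-in
    for i, t in enumerate(angles_4q):
        qml.DoubleExcitation(t, wires=[i, i + 1, i + 2, i + 3])  # 4-qubit EP block
    for i, t in enumerate(angles_z):
        qml.RZ(t, wires=i)

@qml.qnode(dev)
def probs(angles_2q, angles_4q, angles_z, n_particles=3):
    # Initial state with the desired nucleon number, e.g. |111000>.
    qml.BasisState(np.array([1] * n_particles + [0] * (n_qubits - n_particles)),
                   wires=range(n_qubits))
    ep_layer(angles_2q, angles_4q, angles_z)
    return qml.probs(wires=range(n_qubits))

p = probs(np.full(n_qubits - 1, 0.3), np.full(n_qubits - 3, 0.1), np.zeros(n_qubits))
# All probability mass stays in the 3-particle sector: excitations are preserved.
print(np.nonzero(p > 1e-12))
```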
The 23 Na nucleus is composed of three protons and four neutrons on top of the 16 O inert core. The ARNN sampler has therefore been modified to generate bitstrings with three ones in subsystem A (protons) and four ones in subsystem B (neutrons). The energy minimization and the final Schmidt decomposition of 23 Na are presented in Figs. 8 and 9, respectively. Once again, a higher accuracy is obtained with the generated set. Multiple states from the generated set contribute to the VQE, meaning that the ARNN is useful in selecting appropriate bitstrings. On the contrary, when the random set is used, the variational circuit prefers to adapt to one state and sets the contributions of the others to zero.

IV. DISCUSSION AND CONCLUSION
This paper proposes an alternative way to perform Schrödinger forging using autoregressive neural networks. We build on the work of Eddins et al. [24], which introduced the EF-based VQE, and of Huembeli et al. [32], which efficiently computes quantum expectation values as statistical expectation values over bitstrings sampled by a generative neural network. While their work leverages an additional permutation symmetry, ours is fully general and computationally efficient due to the introduction of a cutoff dimension. Moreover, the latter gives us additional control over the amount of quantum resources required. This is not the case in the Heisenberg forging scenario, as shown in Appendix B, where the ARNN begins by sampling many bitstrings and finishes by using only one. Therefore, in this specific case, Heisenberg forging with neural networks is expensive at the beginning of the training and loses its expressive power at the end. On the other hand, Schrödinger forging enables better control of the trade-off between expressiveness and computational cost of the variational model, without assuming permutation symmetry of the two subsystems. When the additional permutation symmetry is present, we still recommend using Heisenberg forging, since it requires fewer epochs to be trained. However, we stress that many systems, such as molecules or nuclei, do not exhibit this symmetry, providing important use cases for Schrödinger forging.
Numerical simulations have been performed on ring and triangular-lattice spin systems. Schrödinger forging with the ARNN consistently achieves better performance in the computation of the ground-state energy and correlators, compared with random sampling and with Heisenberg forging with neural networks. In the case of the triangular lattice, different boundary conditions are considered, which directly affect the performance. The parametrization of the unitaries is a limiting factor when complex boundary conditions are considered.
The most striking result is that the performance gap between random sampling and the ARNN increases with the system's complexity, suggesting that our approach will be more profitable for larger systems. Finally, the nuclear p-shell model is also solved with Schrödinger forging, up to a 10⁻³ error ratio for the most complex nucleus. There, the ARNN is unnecessary, since the maximum number of possible bitstrings is 20 and all of them can be used. The approach is then tested on a larger nucleus of the sd shell model, 23 Na. Once again, the generated set results in better accuracy than a random one.
Autoregressive models are easily interpreted and can naturally generate bitstrings with a fixed number of excitations. They are also well suited to the task at hand, owing to their robustness as density estimators. They do exhibit certain limitations, specifically in terms of sampling speed and the requirement of a fixed-order input decomposition [47]. Nevertheless, the limited number of samples in this algorithm renders the issue of sampling speed inconsequential. Furthermore, experiments were conducted by varying the decomposition order, and such alterations did not yield any substantial changes in the results. Masked multilayer perceptrons are a straightforward choice for building the autoregressive model. However, other architectures could be more suitable in some cases. In particular, transformers [48] provide a strong alternative, since they are highly parallelizable and efficient at capturing global context and long-range dependencies thanks to their attention mechanism.
At the beginning of this work, simulations were carried out on small models, for which all bitstrings could be taken into account during the VQE. It was observed that the ordering of the bitstrings with respect to their coefficients (in absolute value) did not change significantly during the VQE. Therefore, choosing the bitstrings once, at the beginning of the circuit training, already performs well. However, it could be suitable in some cases to train the quantum circuits and the ARNN simultaneously, taking advantage of parameter sharing.
Finally, we note that there is not necessarily a correlation between having sets of bitstrings associated with high Schmidt coefficients and the trainability of the corresponding variational state. Indeed, in some cases, taking a set of bitstrings with lower Schmidt coefficients might be favorable to make the variational circuits easier to train. It may therefore be possible to include this feature in the algorithm by choosing bitstrings that maximize the gradients. Alternatively, an algorithm adapting the form of each variational circuit, an approach close to ADAPT-VQE [49], could be investigated.

CODE AVAILABILITY
The numerical simulations of the quantum circuits have been performed with Pennylane [50], powered by a JAX backend [51], while the NetKet library [52] has been used for the ARNN. Solving the constrained systems of equations in the generative algorithm involves a projected gradient descent algorithm available in the JAXopt library [53]. Moreover, the Heisenberg forging code is available on GitHub [54]. Visualization of the evolution of the von Neumann entropy is performed using orqviz [55]. The Python code of this project is accessible on GitHub [56].
to generate bitstrings with a fixed number of ones. Indeed, we just need to change the conditional probability in the sampling procedure, which can be done by setting p((σ)_i = 1 | (σ)_j, j < i) = 0 if Σ_{j<i} (σ)_j = k. This ensures a maximum of k excitations. If, on the other hand, there are only l < k excitations at the end of the string, the last k − l bits are turned into ones to correct for it. While this leads to non-uniform sampling at the beginning of the training, we expect the ARNN to overcome this issue by incorporating it through the learning stage.
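A sketch of this constrained sampler (reusing the minimal masked layer from Sec. II B; the network is again our illustrative stand-in):

```python
import numpy as np

rng = np.random.default_rng(4)
N, k = 8, 3   # length of the string and required number of ones

W = 0.1 * rng.normal(size=(N, N))
mask = np.tril(np.ones((N, N)), k=-1)

def cond_prob(sigma, i):
    """p((sigma)_i = 1 | previous bits) from a masked linear layer."""
    logit = ((W * mask) @ sigma)[i]
    return 1.0 / (1.0 + np.exp(-logit))

def sample_fixed_ones():
    sigma = np.zeros(N)
    for i in range(N):
        ones_so_far = int(sigma[:i].sum())
        if ones_so_far == k:
            p = 0.0          # cap reached: forbid further ones
        elif k - ones_so_far == N - i:
            p = 1.0          # remaining bits must all be ones
        else:
            p = cond_prob(sigma, i)
        sigma[i] = rng.random() < p
    return sigma.astype(int)

s = sample_fixed_ones()
assert s.sum() == k
print(s)
```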
A notable difference between the Schrödinger and Heisenberg forging schemes is that, for the latter, it is impossible to control how many states one has to prepare on the quantum hardware. Indeed, in this case, all bitstrings sampled by the ARNN must be taken into account. In practice, as shown in Fig. 11, many states must be prepared at the beginning of the training and only one at the end. In the case of Schrödinger forging, since the cutoff is fixed at the beginning, the number of states to be prepared on the hardware is constant. Following Ref. [32], a 1000-epoch pre-training of the unitaries has been performed. However, other optimization strategies could be considered.
\[ H = - \sum_{\langle i, j \rangle} Z_i Z_j - \sum_{i} h_i X_i , \]

where ⟨i, j⟩ are neighbors according to the triangular topology, see Fig. 13, which also shows the different cuts and boundary conditions. This model is more challenging because local operators are mapped to non-local ones when projected onto a line. Moreover, it has a high coordination number, which leads to a strong magnetic susceptibility [39], meaning that the system is more sensitive to external magnetic fields and can exhibit stronger magnetic order. The Hamiltonian of the t-V model is

\[ H = -t \sum_{\langle i, j \rangle} \left( a_i^\dagger a_j + a_j^\dagger a_i \right) + V \sum_{\langle i, j \rangle} n_i n_j , \]

with a_i and a†_i being respectively the annihilation and creation operators on site i, and n_i = a†_i a_i the number operator. A 4 × 3 system of spinless fermions with periodic boundaries and t = V = 1 is considered. It is mapped to a qubit Hamiltonian with the Jordan-Wigner transformation. In this model, fermions are allowed to move on the grid, modifying the energy of the system. In this spinless version, there is only one spin-orbital per site, giving a final Hamiltonian on 12 qubits.
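The t-V Hamiltonian and its qubit mapping can be sketched with OpenFermion (our illustration on a short chain, not the 4 × 3 grid):

```python
from openfermion import FermionOperator, jordan_wigner

t, V = 1.0, 1.0
edges = [(0, 1), (1, 2), (2, 3)]   # placeholder chain; the paper uses a 4 x 3 grid

H = FermionOperator()
for i, j in edges:
    # Hopping term: -t (a_i^dag a_j + a_j^dag a_i)
    H += FermionOperator(f"{i}^ {j}", -t) + FermionOperator(f"{j}^ {i}", -t)
    # Interaction term: V n_i n_j
    H += FermionOperator(f"{i}^ {i} {j}^ {j}", V)

H_qubit = jordan_wigner(H)   # one qubit per spinless site
print(H_qubit)
```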

Table I:
Small models: number of generated bitstrings which are among the ones with the eight biggest Schmidt coefficients in the exact decomposition. The proposed algorithm is evaluated with A ∪ G or A′ as the training set T, with or without MA. Furthermore, the ARNN is trained with the reversed-KL, logcosh and MMD losses. The final column shows the sum of the eight highest squared Schmidt coefficients from the exact decomposition; it indicates the amount of entanglement, the accuracy of the cutoff, and the sharpness of the probability distribution.

Figure 2 :
Figure 2: Small models. Training of the generative algorithm for different physical systems. [Top] Number of bitstring updates between two consecutive iterations. [Bottom] Value of the MMD loss at each iteration.

Figure 3 :
Figure 3: 1D TFIM, 20 spins. Convergence of the variational energy of the forged quantum states. The blue curve represents the mean energy over ten sets of k = 8 random bitstrings, with the shaded area displaying the standard deviation. The purple curve shows the training using the set generated by the ARNN. In addition, the simulation with the Heisenberg forging algorithm is shown in pink.

Figure 4 :
Figure 4: Correlators in 1D. Correlators ⟨Z_i Z_j⟩ of the Schrödinger and Heisenberg forged states on the 20-spin 1D TFIM. The pairs ⟨i, j⟩ are ordered as follows: [[⟨i, j⟩ for i < j] for 0 ≤ i < N]. The neighboring cases, with j = i + 1, are highlighted with a black vertical line.

Figure 5 :
Figure 5: 2D TFIM, 12 spins. Convergence of the variational energy of the forged quantum states. The colors indicate different boundary conditions, while the shaded curves show the mean energy over ten sets of k = 8 random uniform bitstrings.

Figure 6 :
Figure 6: Convergence of the variational energy of the Schrödinger forged states corresponding to the various nuclei of the nuclear p shell model. The Schmidt rank being at most 20, all bitstrings have been used in the Schmidt decomposition.

Figure 7 :
Figure 7: (a) Von Neumann entropy of the various nuclei during the training. (b) Visualization of the two main components of the von Neumann entropy of 11 B in the variational space. In addition, the value of the entropy at each training epoch is shown (in gray), as well as the final value (in red).

Figure 8 :
Figure 8: Convergence of the variational energy of the Schrödinger forged states corresponding to the 23 Na nucleus of the nuclear sd shell model.

Figure 9 :
Figure 9: Final Schmidt decomposition of the variational energy of the Schrödinger forged states corresponding to the 23 Na nucleus of the nuclear sd shell model.

Figure 11 :
Figure 11: Comparison between the number of different bitstrings sampled for Heisenberg and Schrödinger forging. In the Heisenberg case, at the beginning, a large number of states must be prepared on the quantum computer, while at the end, only one state remains. For Schrödinger forging instead, we have full control over the number of states we want to prepare.

Figure 12 :
Figure 12: One-dimensional spin chain with PBC, N = 14 spins (a) and N = 20 spins (b). The blue cut represents the separation between the two subsystems.

Figure 13 :
Figure 13: Triangular lattices used for the simulations. Lattices of 12 spins with OBC, CBC and TBC are shown in (a), (b) and (c), respectively. The two subsystems are defined with a diagonal cut (blue).