Learning Quantum Hamiltonians from Single-qubit Measurements

Measuring observables generated by Hamiltonian-based quantum dynamics is natural; the inverse problem, estimating the Hamiltonian from the measured data, is equally vital. In this work, we propose a recurrent neural network that learns the parameters of target Hamiltonians from temporal records of single-qubit measurements. The method does not require preparing ground states and measures only single-qubit observables. It applies to both time-independent and time-dependent Hamiltonians and can simultaneously capture the magnitude and sign of the Hamiltonian parameters. Taking quantum Ising Hamiltonians with nearest-neighbor interactions as examples, we train recurrent neural networks to learn the Hamiltonian parameters, including the magnetic fields and coupling strengths, with high accuracy. Numerical studies also show that our method is robust against measurement noise and decoherence. It therefore has wide applications in estimating the parameters of quantum devices and characterizing Hamiltonian-based quantum dynamics.


I. INTRODUCTION
Developing methods for estimating Hamiltonians has two important motivations in quantum information processing. First, Hamiltonians fully govern the dynamics of quantum systems; hence, how precisely the Hamiltonians can be estimated determines how accurately control operations can be performed on quantum devices. For instance, quantum circuits are generally realized through control-pulse techniques [1], which are designed and optimized beforehand according to the parameters of the system Hamiltonians. Second, as a branch of quantum process tomography [2], Hamiltonian estimation provides an alternative approach to estimating the fidelities of performed quantum simulations. Estimating Hamiltonians is therefore a central problem in the related quantum fields, such as quantum platforms [3], quantum control [4,5], and quantum simulation [6,7].
So far, various methodologies have been studied for this purpose. In principle, Hamiltonians can be estimated via quantum state and process tomography, since Hamiltonians are the generators of dynamical processes [2,8,9]. However, this approach requires exponential physical resources, even though many-body Hamiltonians have only a polynomial number of unknown parameters because of physical constraints. Methods based on Fourier transforms of, or fits to, the temporal records of selected observables have also been proposed to estimate few-qubit Hamiltonians [10-12]. Zhang and Sarovar [13,14] proposed an approach for estimating Hamiltonians from limited measurements via the eigensystem realization algorithm (ERA); this method was experimentally demonstrated on a nuclear-magnetic-resonance quantum processor [15]. Sone et al. further studied the identifiability of Hamiltonians and the experimental resources required by the ERA method, showing that more observables are necessary and that the required measurements scale exponentially with system size for complicated Hamiltonians [16,17]. Many-body local Hamiltonians can also be uniquely determined from a single eigenstate, which has inspired subsequent research [18-22]. Recently, a quantum-quench method was proposed to reconstruct a generic many-body local Hamiltonian [23], using pairs of generic initial and final states connected by the time evolution of the Hamiltonian.
In this work, we propose a machine-learning method based on a Recurrent Neural Network (RNN) to estimate the parameters of Hamiltonians from single-qubit Pauli measurements on each qubit. In our method, the initial state need not be a ground state of the target Hamiltonian, and only single-qubit Pauli observables are measured at discrete times, forming temporal records of single-qubit measurements that are fed into the RNN. The intuition behind this method is that if the Hamiltonians are identifiable from the temporal records of single-qubit measurements, then there exists an underlying rule mapping single-qubit measurements to the target Hamiltonians; this rule can be learned directly from the measurements via data-driven machine learning, even though it may have a complicated or unknown functional form. Our paper is organized as follows. In Sec. II, we first describe our framework for estimating Hamiltonians via an RNN and then test our method on different types of time-independent and time-dependent Hamiltonians with up to 7 qubits; the robustness against measurement noise and decoherence is studied in turn. In Sec. III, we discuss in detail the measurement resources required in practical applications, followed by our conclusions and outlook. The detailed techniques of our method are given in Sec. IV.

A. Learning Hamiltonians via the RNN
We first describe the dynamics of single-qubit observables under the target Hamiltonians. Consider a quantum system of N qubits that starts from an initial state |Ψ_0⟩ and undergoes a dynamical process governed by an unknown Hamiltonian H, parameterized as H = Σ_{m=1}^{M} a_m B_m, (1) where each B_m is a tensor product of the Pauli matrices I, σ_x, σ_y, and σ_z, and a_m is a Hamiltonian parameter. For a single-qubit Pauli operator P ∈ S_P = {σ_x^{(i)}, σ_y^{(i)}, σ_z^{(i)} | 1 ≤ i ≤ N}, the Heisenberg-picture evolution is P(t) = Σ_{n=0}^{∞} (i^n t^n / n!) P_n, with P_0 = P and P_n = Σ_{m=1}^{M} a_m [B_m, P_{n−1}]. Hence, if the parameter a_m participates in the dynamics of the single-qubit observables, it is possible to learn it from the temporal records of their expectation values. In this work, we consider Hamiltonians that are identifiable under single-qubit measurements and the initial state |Ψ_0⟩; most common Hamiltonians belong to this category. Next, we describe our machine-learning method for estimating Hamiltonians from single-qubit Pauli measurements.
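As an illustration, the temporal record of a single-qubit expectation value can be generated numerically by exact evolution. Below is a minimal numpy sketch for a hypothetical 2-qubit Hamiltonian H = a σ_z^{(1)} + J(σ_x σ_x + σ_y σ_y); the parameter values, the chosen observable σ_z^{(1)}, and the sampling settings are illustrative, not the paper's exact configuration.

```python
import numpy as np

# Pauli matrices and single-qubit rotations
I2 = np.eye(2, dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def Ry(t):
    return np.array([[np.cos(t / 2), -np.sin(t / 2)],
                     [np.sin(t / 2),  np.cos(t / 2)]], dtype=complex)

def Rz(t):
    return np.diag([np.exp(-1j * t / 2), np.exp(1j * t / 2)])

def evolve(H, t):
    """exp(-i H t) via spectral decomposition (H Hermitian)."""
    w, V = np.linalg.eigh(H)
    return V @ np.diag(np.exp(-1j * w * t)) @ V.conj().T

# Hypothetical 2-qubit example: H = a*sz^(1) + J*(sx sx + sy sy)
a, J = 0.7, 1.0
H = a * np.kron(sz, I2) + J * (np.kron(sx, sx) + np.kron(sy, sy))

# product initial state: each qubit in Rz(pi/4) Ry(pi/4) |0>
q = Rz(np.pi / 4) @ Ry(np.pi / 4) @ np.array([1, 0], dtype=complex)
psi0 = np.kron(q, q)

# temporal record of <sz^(1)> at times s*tau, s = 1..S
tau, S = 0.02 * np.pi, 25
record = []
for s in range(1, S + 1):
    psi = evolve(H, s * tau) @ psi0
    record.append(float(np.real(psi.conj() @ (np.kron(sz, I2) @ psi))))
```

A training set would repeat this for many randomly drawn (a, J) pairs and all single-qubit observables.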
As illustrated in Fig. 1, an N-qubit system starts from the initial state |Ψ_0⟩ = ⊗_{i=1}^{N} |ψ_0^i⟩. Here, |ψ_0^i⟩ = R_z(π/4)R_y(π/4)|0⟩ can be prepared from the state |0⟩ using the rotation operations R_z(π/4) and R_y(π/4). This choice of initial state ensures that the dynamics of the single-qubit observables have nontrivial initial values. During the dynamical evolution e^{−iHt}, the expectation values of the single-qubit operators σ_k^{(i)} (k = x, y, z) are measured at discrete times sτ (1 ≤ s ≤ S), forming the temporal records I = {⟨σ_k^{(i)}(sτ)⟩ = Tr[σ_k^{(i)} ρ(sτ)]}, (2) where ρ(sτ) is the density matrix of the system at time sτ. The parameters of the Hamiltonian are collected as a vector H = {a_m | 1 ≤ m ≤ M}. We then train a neural-network framework consisting of recurrent and fully connected (FC) neural networks on generated training data {I, H}. After training, we can predict the unknown Hamiltonian parameters H from single-qubit measurements I. As shown in Fig. 1, our NN framework uses a Long Short-Term Memory (LSTM) network, a type of RNN [37]. Compared with traditional feed-forward neural networks, an LSTM can learn correlations in time sequences; it has been widely applied to handwriting and speech recognition in the classical domain [38], and to quantum control and quantum process tomography in the quantum domain [39,40]. The LSTM is therefore well suited to estimating Hamiltonians from temporal records. For the training, we define the input and output layers, the objective function, and the similarity function as follows.
(i) The input and output layers. I and H are used as the input and output layers of our NN framework, respectively. At time sτ, the expectation values of the single-qubit measurements are collected as a vector O(sτ), which is fed into the s-th LSTM cell. Lastly, an FC neural network is applied before exporting the prediction H. Hence, the number of required LSTM cells equals the number of sampling points S.
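The arrangement of the measured records into the network's input sequence can be sketched in a few lines. In this illustrative numpy snippet (array names, the placeholder data, and the ordering within each row are our assumptions), each of the S steps receives one vector O(sτ) of length 3N:

```python
import numpy as np

# Hypothetical raw data: expectations[k][i][s] = <sigma_k^(i)(s*tau)>,
# filled here with placeholder values for N = 3 qubits, S = 25 steps.
N, S = 3, 25
rng = np.random.default_rng(1)
expectations = rng.uniform(-1, 1, size=(3, N, S))   # axes: k, qubit, time

# Arrange into the RNN input: one row O(s*tau) of length 3N per LSTM step.
I_input = np.transpose(expectations, (2, 0, 1)).reshape(S, 3 * N)
```

Each row of `I_input` is fed to one LSTM cell, so the sequence length equals S as stated above.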
(ii) The objective function. Our neural network is trained by minimizing the distance between the predicted outcome H^pred and the true outcome H^true. Here, we use the mean square error (MSE) between H^pred and H^true as the objective function: L = (1/M) Σ_{m=1}^{M} (H_m^true − H_m^pred)^2. This definition can capture both the magnitude and the sign of the parameters, because L decreases to 0 only when H_m^true and H_m^pred coincide exactly. To minimize the objective function, we use the Adam optimization algorithm, one of the state-of-the-art gradient-descent algorithms, to train the hidden parameters of the network. (iii) The similarity function. To assess the performance of the trained NN, we compute the similarity between the predicted and actual outcomes on the test data. Here, we use the cosine proximity between two vectors: F = (H^true · H^pred) / (‖H^true‖ ‖H^pred‖). Here, F ∈ [−1, 1]; the complete framework is illustrated in Fig. 1.
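The MSE objective and the cosine-proximity similarity are straightforward to express in code. A minimal numpy sketch (the function names are ours):

```python
import numpy as np

def mse(h_pred, h_true):
    """Objective: mean squared error between parameter vectors."""
    return float(np.mean((np.asarray(h_pred, float) - np.asarray(h_true, float)) ** 2))

def cosine_similarity(h_pred, h_true):
    """Similarity F in [-1, 1]; F = 1 only when direction and sign agree."""
    a = np.asarray(h_pred, float)
    b = np.asarray(h_true, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Note that the cosine similarity is invariant under an overall rescaling of the parameter vector, which is why training minimizes the MSE while the similarity is used only as an evaluation metric.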

B. Applications
Ising Hamiltonian 1-. As a first application, we train an RNN framework to estimate the parameters of 7-qubit Hamiltonians with nearest-neighbor XY interactions in a static magnetic field along the z axis: H_XYZ^7 = Σ_{i=1}^{7} a_z^{(i)} σ_z^{(i)} + Σ_{j=1}^{6} J^{(j)} (σ_x^{(j)} σ_x^{(j+1)} + σ_y^{(j)} σ_y^{(j+1)}), where a_z^{(i)} is the magnetic-field parameter on the i-th qubit and J^{(j)} is the coupling between nearest-neighbor qubits. The single-qubit measurement records are collected as I = {⟨σ_k^{(i)}(sτ)⟩, 1 ≤ s ≤ 25, k = x, y, z, 1 ≤ i ≤ 7}, and we randomly generate 100,000 training data {I, H} to feed into the neural networks. The test data consist of 5,000 pairs of Hamiltonians H and the corresponding single-qubit measurements I. Our RNN is trained by minimizing the distance between the actual and predicted outcomes in Eq. (4). After training, our RNN can estimate the unknown parameters of 7-qubit Hamiltonians H_XYZ^7 from single-qubit measurements with high accuracy. We compute the similarity F_test between the actual parameters H_test and the predicted outcome H_pred for the 5,000 test data. The averaged similarity over the whole test set exceeds 0.99, and F_test as a function of training epochs is presented in Fig. 2(a). Figure 2(a) also compares the actual value J_test^{(1)} with the prediction J_pred^{(1)} for 100 randomly chosen test instances at the beginning and end of training. Ising Hamiltonian 2-. Our RNN framework can also be applied to more general Hamiltonian models. Here, we use it to learn the parameters of 6-qubit Ising Hamiltonians with nearest-neighbor interactions in all three directions. The Hamiltonian of this 6-qubit system can be written as H = Σ_{i=1}^{6} a_z^{(i)} σ_z^{(i)} + Σ_{j=1}^{5} Σ_{k=x,y,z} J_k^{(j)} σ_k^{(j)} σ_k^{(j+1)}. Similarly, the single-qubit observables σ_x^{(i)}, σ_y^{(i)}, and σ_z^{(i)} are measured at discrete times separated by τ = 0.02π/J_0, with S = 75 sampling points. We randomly generate 200,000 pairs of such Hamiltonians and the corresponding single-qubit measurements as training data.
After learning on these training data, RNN can predict the outcome of the test data. For 5,000 randomly generated test data, the average accuracy of the predictions is around 0.98. More details about the results can be found in Fig. 2(b).
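For readers reproducing such training data, the Hamiltonian matrix of a nearest-neighbor model can be assembled from single-site Pauli operators. The sketch below builds the XY-type model with a z field for an arbitrary qubit number; the helper names are ours, and the specific parameter values in the test are illustrative.

```python
import numpy as np

paulis = {'x': np.array([[0, 1], [1, 0]], dtype=complex),
          'y': np.array([[0, -1j], [1j, 0]], dtype=complex),
          'z': np.array([[1, 0], [0, -1]], dtype=complex)}

def op_on(n, i, k):
    """Pauli sigma_k acting on qubit i of an n-qubit register."""
    out = np.array([[1]], dtype=complex)
    for q in range(n):
        out = np.kron(out, paulis[k] if q == i else np.eye(2))
    return out

def ising_xy(a_z, J):
    """H = sum_i a_z[i] sz^(i) + sum_j J[j] (sx^(j) sx^(j+1) + sy^(j) sy^(j+1))."""
    n = len(a_z)
    H = sum(a_z[i] * op_on(n, i, 'z') for i in range(n))
    for j in range(n - 1):
        H = H + J[j] * (op_on(n, j, 'x') @ op_on(n, j + 1, 'x')
                        + op_on(n, j, 'y') @ op_on(n, j + 1, 'y'))
    return H
```

Sampling random parameter vectors, building H, and evolving the initial state then yields one training pair {I, H} at a time.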
Time-dependent Hamiltonians-. Most existing methods are designed for time-independent Hamiltonians and are not directly applicable to time-dependent ones. Our RNN method can also be used to estimate the parameters of time-dependent Hamiltonians. As a numerical demonstration, we consider a 3-qubit system with nearest-neighbor XY interactions in a time-dependent magnetic field along the z axis; the neural network used is presented in Fig. 1. The corresponding Hamiltonian is H(t) = Σ_{i=1}^{3} a_z^{(i)}(t) σ_z^{(i)} + Σ_{j=1}^{2} J^{(j)} (σ_x^{(j)} σ_x^{(j+1)} + σ_y^{(j)} σ_y^{(j+1)}). The measurement records are I = {⟨σ_k^{(i)}(sτ)⟩, 1 ≤ s ≤ 300, k = x, y, z, 1 ≤ i ≤ 3}. Our training data again consist of 100,000 randomly generated pairs of Hamiltonians H and the corresponding single-qubit measurements I. After the RNN is trained to convergence on these data, it can learn the temporal behavior of a_z^{(i)}(t) from the measurements I alone. Figure 3 presents the predicted values (solid lines) and compares them with the actual values (dotted lines) for the time-dependent parameters a_z^{(i)}(t); good agreement between the predicted and actual results is achieved.
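For a time-dependent Hamiltonian, the evolution that generates the records is time-ordered; numerically it can be approximated by a product of short-time propagators. A single-qubit numpy sketch with an assumed cosine drive a_z(t) (the drive shape, the extra σ_x term, and the step size are all illustrative assumptions, not the paper's model):

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def U_step(H, dt):
    """Short-time propagator exp(-i H dt) via eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return V @ np.diag(np.exp(-1j * w * dt)) @ V.conj().T

def a_z(t):
    """Assumed smooth time-dependent field (illustrative)."""
    return np.cos(0.3 * t)

# time-ordered product: later slices act on the left
dt, steps = 0.01, 300
U = np.eye(2, dtype=complex)
for s in range(steps):
    H = 0.5 * a_z(s * dt) * sz + 0.5 * 0.8 * sx
    U = U_step(H, dt) @ U
```

Expectation values sampled from states U|ψ_0⟩ at intermediate steps would form the records I for the time-dependent case.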

C. Robustness against the noise
The temporal records of single-qubit measurements are inevitably influenced by statistical and environmental noise, which may cause the RNN predictions to deviate from the ideal values. Here, we further study the robustness of our RNN framework in learning unknown Hamiltonians under Gaussian noise and decoherence. The following simulations are performed for a 3-qubit system with the Ising Hamiltonian H_XYZ^3 = Σ_{i=1}^{3} a_z^{(i)} σ_z^{(i)} + Σ_{j=1}^{2} J^{(j)} (σ_x^{(j)} σ_x^{(j+1)} + σ_y^{(j)} σ_y^{(j+1)}). The unknown parameters in H_XYZ^3 form a vector H = [a_z^{(1)}, a_z^{(2)}, a_z^{(3)}, J^{(1)}, J^{(2)}]. Robustness against the Gaussian noise-. First, we train RNN frameworks on 100,000 noiseless training data {I, H} with S = 25 and S = 50 sampling points, obtaining two trained models, RNN_0noise_25 and RNN_0noise_50. We then predict the Hamiltonian parameters by feeding noisy test data into these two models. The noisy data are generated by adding Gaussian noise to the data I, i.e., I → I + N(0, ε), where N(0, ε) is a Gaussian distribution with mean 0 and standard deviation ε. We vary ε from 2% to 10% in steps of 2% and create 5,000 noisy test data for each ε. Figure 4(a) presents the average similarity between the predicted parameters H^pred and the true parameters H^true as a function of ε. RNN_0noise_50 performs better than RNN_0noise_25, but the prediction accuracy of both decreases as ε increases; when ε = 0.1, the accuracy of RNN_0noise_25 drops to 0.98. To further improve the robustness of our RNN frameworks under noise, we adopt the following approach.
Second, we instead train RNN frameworks on 100,000 noisy training data, perturbed by Gaussian noise with standard deviation ε = 0.1. Two RNN models, RNN_10noise_25 and RNN_10noise_50, are trained to convergence. Similarly, we use these models on noisy test data: ε is again varied from 2% to 10% in steps of 2%, with 5,000 noisy test data created for each ε. The average prediction accuracy as a function of ε is also presented in Fig. 4(a). Both models perform well, with similarities over 0.99; at ε = 0.1 the prediction accuracy improves to 0.995 from the previous 0.98. These simulations show that training RNN frameworks with noisy data greatly enhances the prediction accuracy, and that more sampling points bring better robustness against noise. We can thus roughly conclude that learning Hamiltonians via an RNN is robust under Gaussian noise. Robustness against the decoherence-. The total time for measuring the temporal records may reach or even exceed the coherence time of the experimental devices. The collected temporal records then contain decoherence effects, leading to a decrease in prediction accuracy. We therefore also numerically study the performance of our RNN frameworks under decoherence. The temporal records with decoherence are created according to the Kraus representation of the decoherence dynamics: the evolution under the Hamiltonian is divided into slices of duration δτ, and if the density matrix is ρ(t) at time t, then the density matrix at t + δτ is ρ(t + δτ) = Σ_{i,j} E_j^i U_δτ ρ(t) U_δτ^† (E_j^i)^†, where U_δτ = e^{−iHδτ} and E_j^i is the j-th Kraus operator of the i-th qubit, with E_1^i = √λ_i I and E_2^i = √(1 − λ_i) σ_z^{(i)}, where λ_i = (1 + e^{−δτ/T_2^i})/2 and T_2^i is the decoherence time of the i-th qubit. We vary T_2^i from 1π/J_0 to 6π/J_0 in steps of 2π/J_0.
For each T_2^i, we create 5,000 test data with decoherence, using S = 150 sampling points (the sampling interval is 0.02π/J_0, giving a total sampling time of 3π/J_0).
As shown in Fig. 4(b), when these test data are fed into the model RNN_0noise_150 to predict the Hamiltonian parameters H, the prediction accuracy falls rapidly as the coherence time decreases. To improve the robustness against decoherence, we trained an RNN framework on 100,000 training data with decoherence, named RNN_T2noise_150. Figure 4(b) shows that the prediction accuracy improves significantly, with an average value over 0.99, when RNN_T2noise_150 processes the decoherence test data.
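The per-slice dephasing channel described above can be applied directly to a density matrix. A minimal single-qubit numpy sketch of the map ρ → λρ + (1 − λ) σ_z ρ σ_z implied by Kraus operators of the form √λ I and √(1 − λ) σ_z with λ = (1 + e^{−δτ/T_2})/2 (function name is ours; the full simulation would interleave this with the unitary slices and act on each qubit):

```python
import numpy as np

sz = np.array([[1, 0], [0, -1]], dtype=complex)

def dephase(rho, dtau, T2):
    """One dephasing slice of duration dtau with dephasing time T2:
    rho -> lam*rho + (1-lam)*sz@rho@sz, lam = (1 + exp(-dtau/T2)) / 2.
    Off-diagonal elements decay by the factor exp(-dtau/T2)."""
    lam = 0.5 * (1.0 + np.exp(-dtau / T2))
    return lam * rho + (1.0 - lam) * (sz @ rho @ sz)
```

Applying this map after each short-time propagator reproduces the exponential decay of coherences used to generate the decoherence test data.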

III. DISCUSSION AND CONCLUSION
We briefly discuss the required measurement resources and the feasibility in practical experiments, including the sampling interval and the number of sampling points. First, single-qubit measurements are easy to implement on current quantum platforms [8,41-44], for example via dispersive readout on superconducting qubits or ensemble measurements in nuclear magnetic resonance. Single-qubit measurements also have lower readout errors than multi-qubit measurements [45,46]. Second, the sampling interval τ should be chosen as a trade-off that accounts for the coherence time. On the one hand, the total sampling time may exceed the coherence time of the qubits if τ is too large, degrading the prediction accuracy. On the other hand, the temporal records of single-qubit measurements may be hard to distinguish if τ is too small, which also degrades the prediction accuracy. As shown in Fig. 5(a), we vary the sampling interval from 0.01τ to 0.09τ in steps of 0.02τ (with τ = 0.02π/J_0) and fix the number of sampling points at S = 25. We then train our RNN models with 100,000 training data for each interval and test their performance with 5,000 test data; the considered Hamiltonian is described in Eq. (9). The results show that the RNN model cannot be trained to high accuracy if the interval is too small. Third, the total number of sampling points is 3NS, where the factor 3 is the number of observables {σ_x, σ_y, σ_z}, N is the number of qubits, and S is the number of sampling points per observable. Here, we numerically study how S grows with the system size in our method. In our simulation, we consider the Ising Hamiltonians in Eq. (6), with the number of qubits varied from 2 to 6, and train the neural networks with 100,000 randomly generated training data for each given N and S. We then test the average accuracy of the trained neural networks on 5,000 test data. Figure 5(b) presents the achieved accuracy as a function of N and S.
The simulated results show that S increases gently with the system size for this type of Hamiltonian. This may be understood as follows: as long as the Hamiltonian is identifiable under the chosen initial states and single-qubit observables, it is possible to learn its parameters from the temporal records with a finite number of sampling points. For instance, many-body Hamiltonians have a polynomial number of parameters, so a polynomial number of sampling points may suffice to estimate them with our machine-learning method.
In summary, we have shown that a composite neural network can be trained to learn Hamiltonians from single-qubit measurements, and numerical simulations with up to 7 qubits have demonstrated its feasibility for time-independent and time-dependent Hamiltonians. Compared with existing methods, this neural-network method does not need to prepare the eigenstates of the target Hamiltonians, and it can learn all the information of the Hamiltonians, including both the magnitude and the sign of the parameters. Once the neural network is successfully trained, it can directly predict the parameters of unknown Hamiltonians from the measured data without any post-processing: a 'once for all' advantage. Besides, the initial states and single-qubit measurements in this method are easy to implement on current quantum platforms, and high accuracy can be achieved even under realistic experimental noise, including Gaussian noise and decoherence. This suggests applications in experimental Hamiltonian-identification tasks. Our method also admits possible extensions, such as learning the environmental information around a system and simulating the dynamics of closed and open systems.

IV. METHODS
Structure of LSTM-. The LSTM is a form of recurrent neural network designed to solve the long-term dependency problem. An LSTM consists of a chain of repeating neural-network modules called LSTM cells. As shown in Fig. 6(a), the s-th LSTM cell imports O(sτ), f_{s−1}, and c_{s−1}, and exports f_s and c_s to the next LSTM cell. Here, O(sτ) and f_{s−1} are first combined by an FC neural network whose structure is shown in Fig. 6(b); in our training, this layer includes 256 neurons. Different activation functions σ and tanh are then applied, and finally the operations ⊕ and ⊗ are performed before exporting f_s and c_s. Next, we introduce the detailed operations in the LSTM cell.
As shown in Fig. 6, the long-term memory of the LSTM is the cell state c_s, which stores information learned while flowing through the entire chain. To update the cell state, the cell has two layers, the "forget gate" and the "input gate", which remove information from or add information to the cell state; the cell can also output information from the cell state through the "output gate". These three gates control the cell state and constitute an LSTM cell. First, the cell uses the forget gate G to decide what past information to remove from the cell state. The current input o(s) and the previous output f_{s−1} pass through the forget gate as G = σ(W_f · [f_{s−1}, o(s)]^T + b_f), where σ(x) = 1/(1 + e^{−x}) is the sigmoid function. Then, the input gate I decides what new information to add to the cell state: I = σ(W_i · [f_{s−1}, o(s)]^T + b_i). Meanwhile, o(s) and f_{s−1} pass through a tanh layer to create a candidate cell state E = tanh(W_c · [f_{s−1}, o(s)]^T + b_c). The next step is to update the cell state using the forget gate G and the input gate I: c_s = G × c_{s−1} + I × E. In the end, the output gate decides what information to select from the cell state and generates the output: O = σ(W_o · [f_{s−1}, o(s)]^T + b_o) and f_s = O × tanh(c_s).
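The gate operations above can be condensed into a few lines of code. A minimal numpy sketch of one LSTM step (the weight layout, toy dimensions, and zero-weight example values are illustrative, not the trained 256-neuron network):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell(o_s, f_prev, c_prev, W, b):
    """One LSTM step following the gate equations in the text.
    W[k] has shape (h, h + d) and b[k] shape (h,), for k in 'f','i','c','o'."""
    z = np.concatenate([f_prev, o_s])      # combined [f_{s-1}, o(s)]
    G = sigmoid(W['f'] @ z + b['f'])       # forget gate
    I = sigmoid(W['i'] @ z + b['i'])       # input gate
    E = np.tanh(W['c'] @ z + b['c'])       # candidate cell state
    c_s = G * c_prev + I * E               # cell-state update
    O = sigmoid(W['o'] @ z + b['o'])       # output gate
    f_s = O * np.tanh(c_s)                 # exported hidden output
    return f_s, c_s

# toy example: hidden size 1, input size 2, all-zero weights
h, d = 1, 2
W = {k: np.zeros((h, h + d)) for k in 'fico'}
b = {k: np.zeros(h) for k in 'fico'}
f_s, c_s = lstm_cell(np.array([0.3, -0.1]), np.zeros(h), np.array([1.0]), W, b)
```

With zero weights every gate evaluates to 0.5 and the candidate state to 0, so the cell state simply halves, which makes the step easy to check by hand.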