Exact NMR simulation of protein-size spin systems using tensor train formalism

We introduce a new method, based on alternating optimization, for compact representation of spin Hamiltonians and solution of linear systems of algebraic equations in the tensor train format. We demonstrate the method's utility by simulating, without approximations, a 15N NMR spectrum of ubiquitin --- a protein containing several hundred interacting nuclear spins. Existing simulation algorithms for the spin system and the NMR experiment in question either require significant approximations or scale exponentially with the spin system size. We compare the proposed method to the Spinach package that uses heuristic restricted state space techniques to achieve polynomial complexity scaling. When the spin system topology is close to a linear chain (e.g. for the backbone of a protein), the tensor train representation is more compact and can be computed faster than the sparse representation using restricted state spaces.


Introduction
The amount of patience required to simulate exactly a nuclear magnetic resonance (NMR) spectrum of an N-spin system scales approximately as O(2 N ).That much is rarely available, and considerable thought has consequently been given over the last decade to more efficient methods of computing NMR spectra [1,7,14,22,51], in particular to algorithms that promise to achieve that objective in polynomial time.Such algorithms do exist [7,24], but they make significant a priori assumptions about the spin system evolution -it is usually assumed that the system stays weakly correlated for the duration of the experiment [24,35].
Outside the NMR community, significant progress was recently made in the development of tensor structured methods [46,32,47,29,20], all of which descend broadly from the density matrix renormalization group (DMRG) formalism [55,56] and matrix product states (MPS) [18,31].Typical applications of DMRG algorithms are 1D chains at absolute zero [46] and finite temperature [6,48,54], with extension to 2D lattices [36,38,58,49].Invariably, however, the interesting spin systems encountered in the daily practice of biological NMR spectroscopy (proteins, polynucleotides, polysaccharides) are all irregular three-dimensional roomtemperature agglomerations with multiple interlocking loops in the spin coupling graph and no identical couplings [9].When the strict NMR spectroscopy requirement for correct wavefunction phase during the very long (milliseconds to seconds) dissipative NMR spin system trajectories is added to the list, time-domain DMRG methods are currently struggling.
There are some biologically relevant cases, however, that may still be treated as linear chains -for the purposes of simulating simple backbone NMR experiments, protein side chains may often be ignored.This makes the corresponding spin system a weakly branched linear chain that is amenable to DMRG type treatment.Simple NMR experiments can also be reformulated as a matrix-inversetimes-vector problem in the frequency domain, for which efficient algorithms in tensor product formats have recently emerged [23,11,12,13].We report in this communication the behavior of the AMEn algorithm [12,13], applied to the solution of the NMR simulation problem in the frequency domain, as well as to the technical task of adding together, without loss of accuracy, tensor train representations of thousands of spin Hamiltonian terms for a protein.
Having integrated the algorithms described below into Spinach (a large-scale magnetic resonance simulation library [22]), we are reporting here the first exact quantum mechanical simulation of a liquid-state 1D NMR spectrum for a protein backbone spin system with several hundred coupled spins.Beyond the physical assumptions made by chemists at the problem formulation stage and the controllable numerical rounding error of the tensor train format itself [40], there are no approximations.

Tensor product formats
Tensor product expressions appear naturally in spin dynamics because the state space of a multi-spin system is a direct product of the state spaces of individual spins [17].A simple example is the nuclear Zeeman interaction Hamiltonian where N is the number of spins, B 0 is the applied magnetic field, A (n) are nuclear chemical shielding tensors, and the sum runs over all nuclei.Components of the nuclear spin operators ˆ S where 1 denotes an identity matrix of appropriate dimension and the Pauli matrices σX , σY , σZ occur in the n-th position of the tensor product sequence.This representation is known in numerical linear algebra as the canonical polyadic (CP) format [32].Although CP representations have been known in magnetic resonance spectroscopy for a long time [5], they suffer in practice from rapid inflation -spin Hamiltonians encountered in NMR and ESR (electron spin resonance) systems can be complicated [17] and, even for simple initial conditions, the number of terms in the canonical decomposition increases rapidly during system evolution.More ominously, the number of CP terms can change dramatically after small perturbations in the Hamiltonian or the system state.A simple example is where â⊗N = â ⊗ • • • ⊗ â.The left hand side of this equation contains N terms, but the expression approximating it on the right hand side has only two terms, and one could be tempted to use it to reduce storage and CPU time.However, both terms of the approximation grow to infinity when ε → 0, and the accuracy is lost due to rounding errors.Such instabilities in the CP format make it difficult to use -in finite precision arithmetic the number of terms in the decomposition quickly becomes equal to the dimension of the full state space and any efficiency savings disappear.Unlike the CP format, which is an open tensor network, closed tensor network formats are stable to small perturbations.The most popular closed tensor network format was repeatedly rediscovered and is currently known under three different names: DMRG in condensed-matter physics [55,56], MPS in computational physics [18,31,42], and TT (tensor train) in numerical linear algebra [40].It is defined as follows x = τ(x (1) , . . ., x(N) ) = α 1 ,...,α N−1 x(1) The TT representation of the total ŜZ operator in Eq. ( 3) is similar to the highdimensional Laplacian [28]: The number of terms in each summation (known as bond dimension, or TT rank) is two, and all entries of the decomposition are now bounded.The TT representation in the right hand side of Eq. ( 5) has 4N − 4 operators, each of which is either zero, or the identity 1, or the Pauli matrix σZ .The CP representation of ŜZ in Eq. (3) has N 2 such operators -the tensor train representation is clearly more memoryefficient.
Another notable example is the ZZ coupling Hamiltonian that often makes an appearance in simple linear spin chain models: This is a CP expression with N(N − 1) terms and N 2 (N − 1) operators.The corresponding TT representation is Here each summation runs over three terms only, and the total number of matrices Ĵ(n) α n−1 ,αn is 9N − 12: much fewer than the storage requirements of the CP format.
The storage requirements of tensor structured representations (both CP and TT) stand in sharp contrast with the classical approach to magnetic resonance simulations [1,51], where the Hamiltonian is represented as a 2 N × 2 N sparse matrix with all non-zero entries stored in memory.As soon as the sparse matrix is assembled, the CPU and memory resources grow exponentially with the number of spins N, making the simulation prohibitively difficult for large systems.To avoid this problem (known colloquially as the curse of dimensionality), all data ideally should be kept in compressed tensor formats of the form given in Eqs. ( 1) and ( 4), without ever expanding the Kronecker products.TT format achieves this objective [32,29,20].
A very considerable body of literature exists on manipulating expressions directly in tensor product formats.In particular, a given tensor may be converted into the TT format using sequential singular value decompositions [46,42,40].Given tensors in the TT format, one can perform many linear or bilinear operations (addition, element-wise multiplication, matrix-vector multiplication) [47,40], Fourier transform [10], and convolution [27], avoiding exponentially large arrays and computational costs.
These developments would have permitted large-scale magnetic resonance simulations entirely in the TT format, were it not for a significant obstacle -the summation operation in tensor train representation is an expensive procedure that carries a significant accuracy penalty due to a large number of tensor truncation (rounding) steps.This was also previously noted for three-dimensional potentials occurring in electron density computations [30,45,19].Spin Hamiltonians of practically interesting biological systems contain many thousands one-and twospin terms of the kind shown in Eq. ( 2).Intermediate expressions in spin dynamics simulations also frequently involve large sums.We demonstrate below that in those circumstances the standard bundle-and-recompress tensor network summation procedure leads either to the bond dimension expansion beyond the limits of modern computing hardware, or to a catastrophic accuracy loss.Here we propose an alternative algorithm for computing large sums, based on alternating tensor train optimization, and use it to carry out NMR simulations on protein-size spin systems.

Simulation setting and experimental context
Fully 13 C and 15 N-labelled protein human ubiquitin (PDB code 1D3Z, Figure 1) containing over a thousand magnetic nuclei in 76 amino acid residues was cho-  [26], HNCOCA [3], HNCA [26], etc.The isotropic NMR Hamiltonian was assembled using chemical shift values from the BMRB database [50] and Jcouplings from the literature data [8,37,41,52,53].In the cases where an experimental value of a particular J-coupling was not available in the literature, it was estimated based on the known values for structurally similar substances [2,21,33] -for most NMR simulation purposes and certainly for the purpose of the demonstration of the performance of the tensor train algorithm the accuracy of such coupling estimates (about 20%) is sufficient.The raw data for the magnetic couplings used in this work is available in the example set supplied with the current public version of the Spinach library [22].
NMR experiments were performed at 25 • C on a Varian Inova 600MHz (14.1 Tesla) spectrometer equipped with a z-gradient triple-resonance cryogenic probe using a 0.5mM sample of uniformly 13 C and 15 N-labelled human ubiquitin in 10% D 2 O.The 15 N spectra were collected as 2D 1 H-15 N HSQC [4] spectra incorporating gradient enhanced coherence selection [25] and water flip-back.The spectra were recorded with acquisition times of 150ms (t 1 , 15 N) and 500ms (t 2 , 1 H).During the 15 N evolution period 1 J HN and 1,2 J NC couplings were either allowed to evolve, or decoupled by insertion of a rectangular 15 N or a shaped 200µs 13 C inversion pulse using the central lobe of a sinc function.During 1 H acquisition 15 N nuclei were either evolved or decoupled using 40ppm broadband WURST sequence [34].
The liquid-state NMR Hamiltonian of 13 C, 15 N-labelled ubiquitin is: where canonical NMR spectroscopy notation is used [17], k index runs over all nuclei, l and m indices run over pairs of nuclei that belong to the same isotope, p and q run over pairs of nuclei that belong to different isotopes, r and s runs over the nuclei influenced by radiofrequency control pulses, ω X (t) and ω Y (t) are time profiles of those pulses, ω k are offset frequencies arising from the chemical shielding of the corresponding nuclei [43], weak are "weak" NMR J-couplings [17], and the spin operators Z are defined by Eq. ( 2).In the case of extended ubiquitin backbone, Hamiltonian in Eq. ( 8) contains 563 shielding terms, 1840 coupling terms, and 1126 radiofrequency terms.All calculations reported below were performed by extending the functionality of Spinach library [22] to the tensor train formalism and interfacing it to TT-Toolbox [39] where appropriate.
Due to the abundance of complicated multi-pulse NMR experiments with timedependent Hamiltonians [17], magnetic resonance simulations are generally carried out in the time domain.They always require long-term evolution trajectories with accurate phases (at least 100ms, much longer than the reciprocal Hamiltonian norm) for the density operator ρ(t) under the Liouville-von Neumann equation: where R is the relaxation superoperator (T 1,2 model with literature values for relaxation times [9] was used in the present work), ρeq is the thermal equilibrium state, and Ô is the observable operator, usually a sum of ŜX or ŜY operators on the spins of interest.In very simple cases where the Hamiltonian is not time dependent, the general solution to Eq. ( 9) can be written as: where Ĥ is the Hamiltonian commutation superoperator.Direct time domain evaluation of this equation in tensor train format, either using explicit operator exponentiation or Krylov type propagation techniques that avoid it, does not appear to be possible -in all cases described by Eq. ( 8) the ranks in the tensor train expansion quickly grow beyond the capacity of modern computers.Increasing the singular value cut-off threshold at the representation compression stage leads to catastrophic loss of accuracy.Fortunately, there are simple cases (most notably pulse-acquire 1D NMR spectroscopy) where amplitudes at only a few specific frequencies are actually required for the Fourier transform of Eq. (10), meaning that the problem can be reformulated as: That is, to compute an observable in point ω in the frequency domain, we need to solve a linear system Ĥ + i R + ω1 |x = | ρ0 .The formulation in Eq. ( 11) sacrifices a great deal of generality of Eq. ( 9) (simulation of arbitrary NMR pulse sequences is no longer possible), but it does serve as a stepping stone and enables the demonstration calculation presented below.

Tensor train algorithm for the summation and solution of linear systems
The DMRG algorithm has been originally proposed [55,56] to find the "ground state" of a Hermitian matrix A by the minimization of the Rayleigh quotient The dynamical DMRG algorithm [23] was then developed to find solution of a linear system A|x = |b with a Hermitian positive definite matrix A by the minimization of the "energy function" Besides the change of a target function, the algorithms are similar.In DMRG algorithms the vector of interest x is sought in a form of a tensor train / MPS in Eq. ( 4).The minimization over all tensor train cores x (n) simultaneously is a highly nonlinear and complicated task.To make the procedure feasible, it is replaced by a sequence of optimizations carried over one core at a time: J(τ(x (1) , . . ., x (n) , . . ., x (N) )). ( The MPS/TT format is linear in all sites x (n) , which can be written as x = X =n x (n) , where the frame matrix X =n maps the parameters of the TT-core x (n) to the vector x.
The linearity allows to rewrite Eq. ( 12) as x (n) = arg min J n (x (n) ) = A −1 n b n , where J n is the energy function for the local problem A n x (n) = b n with A n = X * =n AX =n and b n = X * =n b.Using the non-uniqueness of the tensor train format, one can always construct the representation with the unitary frame matrix X =n , that guarantees the stability of the local problem.After the solution x in the tensor train, and continue for n = 1, . . ., N, and then back and forth the chain.
The convergence of the described above one-site DMRG depends on the initial guess, in particular the TT ranks, because they remain the same during the sequence of updates defined by Eq. ( 12).This is a severe restriction, and additional methods are used to adapt the TT ranks during the computations.One way to do it is to replace the optimization over the single cores by the optimization over the pairs of neighboring cores, and then adapt the TT rank between them.Another possibility is to extend the search space by auxiliary directions.The first method of the latter type is the corrected one-site DMRG algorithm [57] which targets in addition to x a surrogate of the next Krylov vector Ax.
For the solution of linear systems, the alternating minimum energy (AMEn) algorithm was recently proposed [12,13], which also uses an additional direction for the purpose of adaptivity.The local optimization step in AMEn is carried over one site only.To increase the TT ranks and improve the convergence, the TT blocks are expanded by auxiliary information, x (n) := x (n) z (n) .The enrichment z (n) introduces new directions in the subspace spanned by X =n+1 .A good choice of the enrichment is the component z (n) of the TT representation (exact or approximate) z = τ(z (1) , . . ., z (N) ) for the residual z = b − Ax.The proposed AMEn algorithm is as fast as one-site methods but is rank adaptive as the two-site DMRG algorithm, and demonstrates a comparable convergence rates.For the solution of a linear system Ax = b with a Hermitian positive definite matrix, it has a proven global bound on the geometrical convergence rate.
In this work we use the AMEn algorithm for two purposes.First, we apply it to a system with a trivial matrix A = 1, but a complicated right-hand side b being a sum of many elementary tensors like the one in Eq. ( 2).This allows us to compress a Hamiltonian returned by the Spinach package from the CP format given by Eq. ( 8) into the TT format Eq. ( 4).The Hamiltonian is stretched into a vector, and the target functional J(x) = x − b 2 is a Frobenius-norm distance between a given Hamiltonian b and Hamiltonian x sought in the tensor train format.The one-site optimization in Eq. ( 12) is effectively the solution of the over-determined linear system X =n x (n) = b by least squares.For the unitary frame matrix we have x (n) = X * =n b, and therefore the local optimization step is solved by the contraction of the frame matrix with the given Hamiltonian b.The enrichment step uses the low-rank approximation of the residual z = b − x, which is acquired by one-site DMRG optimization.
Second, we compute the NMR spectrum of a spin system by solving linear systems in Eq. (11).Since the matrix Ĥ + i R + ω1 is not Hermitian, we symmetrize the problem, and for R = µ1 obtain ρ This equation is solved by the AMEn algorithm at all points ω in a selected spectral domain.

Results and discussion
As discussed above, a major problem in the application of tensor train methods to magnetic resonance simulation of large systems is the calculation of lengthy sums involved in the construction of spin Hamiltonians and density matrices, and their compression into the TT format.Fig. 2 illustrates the performance of our proposed solution to this problem in the case of minimal (H, N, C, CA, HA) and extended (H, N, C, CA, HA, CB, HB) ubiquitin backbone spin systems.The storage requirements for the TT format in Eq. ( 4) depend on all TT ranks r 1 , . . ., r N−1 , and are characterized by the effective TT rank r, defined by Nr 2 = N n=1 r n−1 r n .It is immediately apparent from the left panels of Fig. 2 that the primary showstopper -rapid growth in the tensor train bond dimension -has been removed by the spin number AMEn summation RSS IK-1(4,2) basis system of terms nnz, M time, sec nnz, M time, sec.bb 5490 3.0 423 6.8 9080 ext bb 10408 6.3 1765 23 67500 Table 1: The summation time and storage size for the isotropic part of Hamiltonian given by Eq. ( 8) for minimal (H, N, C, CA, HA) and extended (H, N, C, CA, HA, CB, HB) ubiquitin backbone spin systems (denoted as 'bb' and 'ext bb').The large number of terms are summed up by the AMEn algorithm to the TT format Eq. ( 4), and the reduced state space method [22] to the sparse matrix.
AMEn method: the effective TT ranks stay below 50 for the extended backbone and below 40 for the minimal backbone, well within the capability of modern desktop workstations.Since r 2 is less than the number of terms in the CP representation, the TT format with Nr 2 operators provides more compact storage than the CP format.An alternative approach to AMEn is the binary summation, which sequentially adds and compresses the components pairwise.The TT ranks after the binary summation, however, rise up to several hundred and thereby make the solution of linear systems in Eq. ( 13) exceedingly difficult.It is clear from the right panels of Fig. 2 that the CPU time requirements of the AMEn summation compared to the binary summation are essentially the same, making the AMEn procedure clearly superior for all practical purposes.We also note that the summation of the Hamiltonian given by Eq. ( 8) to the TT format is more efficient than the sparse representation using the reduced state space (RSS) technique [22], see Table 1.
The resulting representation of the ubiquitin backbone spin Hamiltonian matrix is, up to the rounding error of the complex double precision arithmetic, exact.In magnetic resonance spectroscopy this is an unprecedented development -ubiquitin NMR simulation is currently just about feasible [16], with significant approximations and colossal computational resources.Tensor train representation is therefore a large step forward, even though Eq. ( 11) is not applicable to arbitrary NMR experiments.
The accuracy of the 15 N pulse-acquire NMR spectra, computed using Eq. ( 13) with the AMEn algorithm, is below 10 −6 across the frequency interval, as demonstrated in Fig. 3.The deviations of the TT simulation from the RSS one result from the basis set effects in the restricted state space method, and the rank truncation threshold in the AMEn method.
Due to the intrinsically low sensitivity of liquid state 15 N protein NMR spectroscopy, it is not possible to record the experimental equivalent of Fig. 3 directly with a sufficient signal-to-noise ratio; we have therefore taken a somewhat longer route to the experimental validation of the tensor train simulation -Fig.4 shows experimental proton-detected 15 N-1 H HSQC spectra of ubiquitin, compared to the simulations obtained at the basis set limit of the restricted state space formalism [24].Perfect agreement is apparent in both cases.This provides an experimental evidence to the accuracy of the restricted state space approach.The tensor train results in Fig. 3 can now be justified by comparison to the RSS approximation.It is clear from Fig. 3 that the TT formalism performs as intended.The successful 1D NMR simulation notwithstanding, very significant obstacles remain to the practical application of tensor train formalism to NMR spectroscopy.The following requirements should be addressed in future work to fully uncover the potential of the DMRG/MPS/TT formalism for spin dynamics simulation in NMR: 1.The DMRG requirement for the spin system to be a chain or a tree should be lifted.Biological magnetic resonance spin systems are irregular polycyclic interaction networks with multiple interlocking loops in the coupling graph, particularly in solid state NMR, where inter-nuclear dipolar couplings form a very dense network.A generalization of tensor product algorithms to complex tensor networks that fully mimics the molecule structure is required.
2. Rank explosion problem for time-domain simulations should be solved.It is clear from the success of the restricted state space approximation [35,22] that the order of spin correlation in many evolving magnetic resonance spin systems either is or may safely be assumed to be quite low.This evidences data sparsity and indicates that some kind of low-rank decomposition is possible.100 110 120 130 15 N chemical shift, ppm Basis set limit of the restricted state space approximation Tensor train formalism (no approximations) Figure 3: Amino group region of pulse-acquire 15 N NMR spectrum of human ubiquitin computed at the basis set limit of the restricted state space approximation [22] (upper trace) and using Eq. ( 11) as described in the main text.The relative deviation between the two spectra is below 10 −6 across the frequency interval.
3. Our experience indicates that DMRG type objects are very far from being drop-in replacements for their matrix counterparts in standard simulation algorithms and software -it does actually appear that nearly everything in the very considerable body of magnetic resonance simulation methods needs to be adapted to the realities of DMRG.Current implementation of tensor product methods still requires a number of tuning parameters (e.g.approximation accuracies, TT ranks of the enrichment).The broad adoption of tensor product algorithms require the operations to be handled transparently and seamlessly by the existing simulation software packages, in the same way as sparse matrices currently are.
4. Transparent and clear tensor train approximation accuracy criteria, rank control and a priori error bounds should be developed -one of the most irksome factors we had encountered in our practical work with DMRG representations was the unpredictable accumulation of numerical rounding errors.This problem is particularly acute for the state vector phase in time domain simulations: magnetic resonance experiments rely critically on the phase being correctly predicted.
All of that having been said, we are still optimistic about the future of DMRG/MPS/TT methods, having also found them useful in Fokker-Planck type formalisms related to NMR and EPR spectroscopy [15].Their primary strength is the lack of heuristic assumptions and the controllable nature of the representation accuracy.An experimental implementation of tensor train magnetic resonance simulation paths, via an interface to the TT-Toolbox [39], is available in version 1.3.1980 of our Spinach library [22].

Conclusions
Even with their well-documented limitations (the requirement for the spin system to be close to a chain, difficulty with long-range time-domain simulations, code implementation challenges, etc.), the ability of DMRG type formalisms to simulate simple liquid state NMR spectra of large spin systems essentially without approximations is impressive.They cannot yet match the highly optimized dedicated methods developed by the magnetic resonance community [22], but if some of the limitations are lifted by the subsequent research, tensor network methods would have the potential to become a very useful formalism in NMR research.Benefits to EPR spectroscopy, with its star-shaped spin interactions graphs, are likely to be harder to achieve, but may still be obtained by exploiting the direct product structure of combined spin and spatial dynamics appearing in Fokker-Planck type problems [15].
Having solved in this paper the last purely technical problem on the way to the broad adoption of tensor train formalism in magnetic resonance spectroscopy, we are quite optimistic about its potential.We look forward to generalizing the AMEn method to arbitrary tensor networks, that closely match the geometry of a molecule.We also plan to develop reliable methods for indefinite matrices, as well as time evolution schemes maintaining conservation laws in tensor formats.

Figure 2 :
Figure 2: Performance comparison for binary and AMEn tensor train summation during the construction of the NMR spin Hamiltonian.Top -human ubiquitin backbone (H, N, C, CA, HA), bottom -human ubiquitin extended backbone (H, N, C, CA, HA, CB, HB).Here H 0 refers to the isotropic part of the Hamiltonian and Q k,n to the irreducible spherical components of the anisotropic part.

Figure 4 :
Figure 4: Theoretical (left) and experimental (right) 1 H-15 N HSQC spectra of 15 N, 13 C-labelled human ubiquitin.Top -proton decoupling switched off in the indirect dimension and nitrogen decoupling switched off in the direct dimension to demonstrate the accurate quantum mechanical treatment of spin-spin coupling by the simulation.Bottom -additionally, carbon decoupling is also switched off in the indirect dimension.Signal groups marked A-F in the theoretical spectrum are not visible in the experimental data due to partial deuteration and slow conformational exchange of the corresponding amino acid residues.