Automatic Differentiable Numerical Renormalization Group

Machine learning techniques have recently gained prominence in physics, yielding a host of new results and insights. One key concept is that of backpropagation, which computes the exact gradient of any output of a program with respect to any input. This is achieved efficiently within the differentiable programming paradigm, which utilizes automatic differentiation (AD) of each step of a computer program and the chain rule. A classic application is in training neural networks. Here, we apply this methodology instead to the numerical renormalization group (NRG), a powerful technique in computational quantum many-body physics. We demonstrate how derivatives of NRG outputs with respect to Hamiltonian parameters can be accurately and efficiently obtained. Physical properties can be calculated using this differentiable NRG scheme--for example, thermodynamic observables from derivatives of the free energy. Susceptibilities can be computed by adding source terms to the Hamiltonian, but still evaluated with AD at precisely zero field. As an outlook, we briefly discuss the derivatives of dynamical quantities and a possible route to the vertex.


I. INTRODUCTION
The Numerical Renormalization Group (NRG) [1,2] is a standard tool in computational physics for solving a certain class of quantum many-body problems, known as quantum impurity models. These comprise a few interacting quantum degrees of freedom, coupled to one or more non-interacting baths. The most famous example is the Kondo model, which involves a single spin-1/2 'impurity' coupled to a single fermionic bath [3]. The low-energy physics is non-perturbative and controlled by a single emergent energy scale (the Kondo temperature, T_K), below which the impurity is dynamically screened by a spatially-extended entanglement 'Kondo cloud' of surrounding conduction electrons [4]. Generalized quantum impurity models describe the scattering from magnetic impurities in metals [3,5], semiconductor quantum dot devices [6][7][8][9], single molecule transistors [10,11], and are the local effective models within dynamical mean field theory (DMFT) for correlated materials [12][13][14].
NRG is a tensor network method [15] that exploits the renormalization group structure of such quantum impurity models [1]. At its core, the method involves the iterative numerical diagonalization of a set of renormalized effective Hamiltonians. There is no statistical element, unlike quantum Monte Carlo impurity solvers [16].
In this paper, we apply the 'differentiable programming' methodology to NRG, and demonstrate some advantages of its use for practical physics applications. We show that the core eigensolver routine in NRG is well-suited to Automatic Differentiation (AD), an emerging programming paradigm that has its origin in Deep Learning (DL) [17]. In DL, hugely complicated neural networks (NN) feed their output into a loss function, which must be optimized with respect to the NN parameters (weights). This can be accomplished efficiently via gradient descent using AD programming: the gradient of the loss function with respect to the NN weights can be computed at about the cost of the evaluation of the NN [18]. The remarkable aspect of this technique is that the computed gradient is not approximated, but is numerically exact. Without AD, derivatives can of course be approximated using finite difference (FD) derivatives. However, without care, FD derivatives can be inaccurate; when treated rigorously to guarantee convergence, FD can become computationally expensive. The efficiency and accuracy of AD owes to the fact that if each step of a computer program can be differentiated, then outputs can be differentiated with respect to inputs via the chain rule. Evaluating derivatives is trivial and cheap since the program itself is differentiated.
AD has been made more accessible through the introduction of AD libraries such as jax [19], tensorflow [20] or PyTorch [21]. The advantage for computational physics is that when formulated using an AD library, exact derivatives can be obtained 'for free' [22][23][24][25][26][27][28]. Recent applications in physics include optimal control in quantum systems [29], mitigating the sign problem in Monte Carlo [30], and optimizing tensor networks [31]. In addition to its use for optimization problems, AD has particular appeal in physics because physical quantities are related to derivatives of generating functionals [32].
Here we formulate a differentiable NRG (∂NRG) scheme. The key step is an explicit expression for the derivative of NRG eigenvalues and eigenvectors: the exact gradient of the entire eigensolver routine at any given NRG step is determined symbolically. This eliminates the need for full backpropagation, and when combined with AD for the other program elements, provides highly efficient access to derivatives of program outputs with respect to model parameters. For example, thermodynamic observables may be obtained from derivatives of the free energy; and the physical response of a system to perturbations can be simply and cheaply determined.

II. FORWARD- AND BACKWARD-MODE AD
The AD approach is similar to symbolic differentiation, with standard derivative rules applied algorithmically [17,33]. Regarding a program as a function f that maps a given input x to an output y, we may write f(x) = y. In practice, f is typically a concatenation of several primitive functions f^(i), viz:

f = f^(n) ∘ f^(n−1) ∘ ··· ∘ f^(1) ,   (1)

where "∘" denotes the composition of two functions. We denote the number of primitive functions comprising f as ops(f), and write x_i = f^(i)(x_{i−1}) for the intermediate values, with x_0 = x and x_n = y. The flow of data between primitive functions can be expressed as a compute graph [34]; for simplicity, we consider here only programs whose graph has a chain topology as in Eq. (1). To compute the derivative of f with respect to x one can apply the chain rule,

∂f/∂x = (∂f^(n)/∂x_{n−1}) (∂f^(n−1)/∂x_{n−2}) ··· (∂f^(1)/∂x_0) .   (2)

Eq. (2) can be evaluated algorithmically in either the forward mode or the backward mode. In the forward mode, we define δ_i ≡ ∂x_i/∂x and apply the recursion

δ_i = (∂f^(i)/∂x_{i−1}) δ_{i−1} ,   δ_0 = 1 ,   (3)

starting with i = 1 and increasing up to i = n, such that δ_n = ∂f/∂x is the desired derivative. In the backward mode (also known as backpropagation) we define δ̄_i ≡ ∂y/∂x_i and use the recursion δ̄_i = δ̄_{i+1} (∂f^(i+1)/∂x_i), with δ̄_n = 1, starting from i = n − 1 and decreasing down to i = 0, such that δ̄_0 = ∂f/∂x is the desired derivative. The forward and backward mode AD recursions are symbolically equivalent methods for calculating derivatives. In a practical implementation of AD, the intermediate derivatives δ_i or δ̄_i are evaluated numerically and stored in memory. This provides a way to compute numerically exact derivatives, with the true analytical result recovered when the value of δ_i is not truncated [35].
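The forward-mode recursion can be made concrete with a minimal dual-number sketch in Python (all names here are illustrative, not part of any AD library): each value carries its tangent δ, and every primitive applies its local derivative rule.

```python
import math

# Minimal forward-mode AD via dual numbers: a value paired with its
# derivative (tangent), with each primitive propagating both.
class Dual:
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der
    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__
    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)
    __rmul__ = __mul__

def dsin(x):
    # a primitive f^(i) with its known symbolic derivative rule
    return Dual(math.sin(x.val), math.cos(x.val) * x.der)

# f(x) = sin(x^2) as the chain sin ∘ square; seed the input tangent with 1
x = Dual(1.5, 1.0)
y = dsin(x * x)
assert abs(y.der - 2 * 1.5 * math.cos(1.5**2)) < 1e-12  # exact 2x cos(x^2)
```

The tangent y.der is the exactly propagated δ_n; no step size appears anywhere in the calculation.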
By contrast, the FD approximation reads,

D_h[f](x) = [ f(x + h) − f(x) ] / h ,   (4)

with h the FD step size. The exact limit h → 0± cannot be taken numerically, so ∂_x f must be approximated at finite h and the convergence D_h[f](x) → ∂_x f checked explicitly. This may require many function evaluations (runs of the whole program), since for very small h, f(x) and f(x + h) may be numerically indistinguishable (truncation error). Furthermore, convergence should be checked for both the forward (h > 0) and backward (h < 0) difference quotients. The situation gets worse for higher-order derivatives, and numerical convergence of FD approximations can be costly and/or fraught [36].

Above we considered scalar functions of the form f : R → R. However, both AD and FD can be straightforwardly generalized to compute the Jacobian of vector-valued functions defined over a vector space, f : R^n → R^m. FD and forward/backward mode AD scale differently with the input dimension n and output dimension m. FD exhibits the most expensive scaling, with ∼2n evaluations of the full m-component function required to compute the Jacobian of f in the single-shot case. By contrast, AD methods do not require additional function evaluations to compute the derivative, as with the symbolic approach (although since many derivatives are related to their antiderivative, function evaluations can be reused to reduce computational cost at the expense of increased memory cost [37]). Instead, the computational cost of AD methods is controlled by ops(f). Specifically, the forward mode scales as ∼ c_F n ops(f), while the backward mode scales as ∼ c_B m ops(f), with c_F, c_B < 6 (although the forward mode typically outperforms the backward mode unless m ≪ n [37]). The computational cost of the FD approach is therefore typically much higher than either of the AD modes; in this work we use forward mode AD unless stated otherwise.
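The truncation-error pitfall is easy to demonstrate numerically. In the sketch below, f = sin is just a cheap stand-in for an expensive program; it shows that shrinking h eventually makes the forward difference quotient worse, not better:

```python
import math

def fd(f, x, h):
    # forward difference quotient D_h[f](x)
    return (f(x + h) - f(x)) / h

exact = math.cos(1.0)                                 # d/dx sin(x) at x = 1
err_moderate = abs(fd(math.sin, 1.0, 1e-8) - exact)   # near-optimal step
err_tiny = abs(fd(math.sin, 1.0, 1e-15) - exact)      # round-off dominated
assert err_moderate < 1e-6
assert err_tiny > err_moderate    # a *smaller* h gives a *worse* derivative
```

At h = 1e-15 the difference f(x+h) − f(x) has lost almost all significant digits, so the quotient is dominated by floating-point round-off; this is exactly why FD convergence must be checked explicitly.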
A comparison of the performance and precision of the different methods, as applied to NRG, is presented in Appendix A.
Through AD one can obtain the numerically exact derivative of any program with a single run, as long as all operations comprising the program are themselves differentiable. Thus, one can differentiate a given physics solver program simply by utilizing known derivatives of its constituent operations.
On the other hand, in certain cases it may be possible to find the symbolic derivative of the entire solver with respect to certain input parameters. In the following we show how the latter can be achieved for NRG through analysis of the backpropagation chain.

III. NRG
Wilson's NRG is a numerical solver for quantum impurity models of the type Ĥ = Ĥ_imp + Ĥ_bath + Ĥ_hyb, where Ĥ_imp is the 'impurity' Hamiltonian describing a few local, interacting quantum degrees of freedom, while Ĥ_bath = Σ_{σ,k} ε_k ĉ†_{σk} ĉ_{σk} describes a non-interacting fermionic bath. The coupling between them is given by,

Ĥ_hyb = V Σ_{σ,k} ( d̂†_σ ĉ_{σk} + H.c. ) ,   (5)

where we have assumed for simplicity here that a single impurity orbital d̂_σ couples to the bath.
Due to the impurity interactions, such a model is a genuine quantum many-body problem, and in general has no exact solution [3]. NRG instead treats a discretized version of the model, which becomes computationally tractable through a process of iterative diagonalization and truncation. This constitutes an RG procedure, in which useful physical information can be extracted at each step, as progressively lower energy scales are probed. Full details can be found in Refs. [1,2]; here we introduce only the elements necessary to formulate ∂NRG.
The first step is the logarithmic discretization of the bath and mapping to a Wilson chain. The continuous local density of states of the uncoupled bath at the impurity position is divided into intervals of decreasing width, defined by the points x±_n = ±D Λ^{−n}, where D is the half-bandwidth, Λ > 1 the discretization parameter, and n = 0, 1, 2, .... The spectrum is discretized by replacing the continuous density in each interval by a single pole of the same weight at the average position. A semi-infinite tight-binding chain (the Wilson chain) is then defined such that the spectral function at its end site is the same as the discretized bath spectrum (this is achieved in practice by the Lanczos algorithm). The discretized bath Hamiltonian reads,

Ĥ_bath → Σ_σ Σ_{n=0}^{∞} t_n ( ĉ†_{σn} ĉ_{σ,n+1} + H.c. ) ,   (6)

where we have assumed particle-hole symmetry here for simplicity. The original model is recovered as Λ → 1. The specific form of the Wilson chain hopping parameters t_n depends on details of the dispersion relation ε_k. However, for a metallic system, t_n ∼ D Λ^{−n/2} for large n.
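For a flat particle-hole-symmetric band, the hoppings t_n are known in closed form [1]; the sketch below (plain Python, for illustration only) generates them from Wilson's formula and checks the asymptotic decay t_{n+1}/t_n → Λ^{−1/2}:

```python
import math

def wilson_hoppings(Lambda, D, n_max):
    """Wilson chain hoppings t_n for a flat band of half-width D [1]."""
    t = []
    for n in range(n_max):
        xi = (1 - Lambda**-(n + 1)) / (
            math.sqrt(1 - Lambda**-(2*n + 1)) * math.sqrt(1 - Lambda**-(2*n + 3)))
        t.append(0.5 * D * (1 + 1/Lambda) * Lambda**(-n/2) * xi)
    return t

t = wilson_hoppings(Lambda=3.0, D=1.0, n_max=20)
assert t[0] > t[1]                                # hoppings decay down the chain
assert abs(t[15] / t[14] - 3.0**-0.5) < 1e-3      # asymptotically t_n ∝ Λ^{-n/2}
```

For a general dispersion ε_k, as stated above, the t_n instead come out of the Lanczos tridiagonalization and have no simple closed form.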
For a Wilson chain of total length N, adding an extra site is therefore always a small perturbation. NRG exploits this energy-scale separation down the chain through an iterative diagonalization scheme. Starting from the impurity, the chain is built up by successively adding new Wilson chain sites. At each step, the system is diagonalized, and high-energy states are discarded. One defines a sequence of rescaled Hamiltonians comprising the impurity and the Wilson chain sites 0, 1, ..., N,

Ĥ_N = Λ^{(N−1)/2} [ Ĥ_imp + V Σ_σ ( d̂†_σ ĉ_{σ0} + H.c. ) + Σ_σ Σ_{n=0}^{N−1} t_n ( ĉ†_{σn} ĉ_{σ,n+1} + H.c. ) ] ,   (7)

with V the effective impurity-bath coupling. The full (discretized) Hamiltonian is recovered in the limit Ĥ = lim_{N→∞} Λ^{−(N−1)/2} Ĥ_N. The sequence of Hamiltonians Ĥ_N satisfies the recursion relation,

Ĥ_{N+1} = Λ^{1/2} Ĥ_N + Λ^{N/2} Σ_σ t_N ( ĉ†_{σN} ĉ_{σ,N+1} + H.c. ) .   (8)

We denote excitation energies relative to the ground state as ΔE_{N;i} = E_{N;i} − E_{N;0}, and define Ĥ_imp ≡ Ĥ_{−1}. To construct Ĥ_{N+1} we add another Wilson chain site. The Fock space of Ĥ_{N+1} is spanned by the basis states

|i; s⟩_{N+1} = |i⟩_N ⊗ |s⟩ ,   (9)

comprising the tensor product of eigenstates |i⟩_N of Ĥ_N and the states |s⟩ of the added chain site N+1, with s = {0, −1, +1, 2} labelling |s⟩ = {|0⟩, |↓⟩, |↑⟩, |↑↓⟩}. Matrix elements of Ĥ_{N+1} in this basis can be written in terms of the energies E_{N;i} of Ĥ_N and the tensor η_{ss'σ} = ⟨s| ĉ_{σ,N+1} |s'⟩ (which is independent of N), viz:

⟨i; s| Ĥ_{N+1} |i'; s'⟩ = Λ^{1/2} E_{N;i} δ_{ii'} δ_{ss'} + Λ^{N/2} t_N Σ_σ ( ⟨i| ĉ†_{σN} |i'⟩_N η_{ss'σ} + H.c. ) .   (10)

Finally, Ĥ_{N+1} is diagonalized to find the new eigenbasis at step N+1, viz:

Ĥ_{N+1} |j⟩_{N+1} = E_{N+1;j} |j⟩_{N+1} ,   (11)

with eigenvectors expanded as

|j⟩_{N+1} = Σ_{i,s} U_{N+1}(j, is) |i; s⟩_{N+1} .   (12)

The matrix elements of the last-site operators needed to build Eq. (10) at the next iteration are obtained recursively from the eigenvectors,

η_{N+1;jj'σ} = Σ_{ī s̄, ī' s̄'} U†_{N+1}(j, ī s̄) U_{N+1}(j', ī' s̄') δ_{ī ī'} η_{s̄ s̄' σ} .   (13)
When following these steps, the dimension of the Fock space grows exponentially with the length of the chain. This is avoided in NRG by truncating the Fock space at each step, discarding high-energy states. In practice, one retains the M_K lowest-energy eigenstates of Ĥ_N at each step. The NRG approximation [1,38] is that the retained states and energies of Ĥ_N approximate those of the full Ĥ. This approximation is justified by the exponential decay of the Wilson chain coefficients t_n, which means that high-energy states discarded at a given iteration do not become low-lying states at a later iteration. Convergence of the NRG calculation can be checked post hoc by increasing M_K. Hereafter it is to be understood that the condition i ≤ M_K applies to Eq. (10) (with the eigenvalues sorted lowest to highest). The dimensionality of the Fock space is therefore constant at each step, and the NRG calculation scales linearly with chain length N. With increasing N the physics on successively lower energy scales is uncovered.
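The structure of the iterative diagonalization and truncation can be illustrated with a deliberately oversimplified single-particle toy (a spinless tight-binding Wilson chain with no impurity and no interactions, assuming t_n = Λ^{−n/2}): add a site, diagonalize, keep the M_K states nearest the Fermi level. This is not the many-body algorithm, but it shows why energy-scale separation makes the truncation benign.

```python
import numpy as np

Lambda, N_sites, M_K = 3.0, 30, 20
t = [Lambda**(-n / 2) for n in range(N_sites - 1)]   # assumed t_n = Λ^{-n/2}

H = np.zeros((1, 1))     # site 0 alone (on-site energies vanish at p-h symmetry)
amp = np.array([1.0])    # amplitude of current basis states on the last site

for n in range(N_sites - 1):
    d = H.shape[0]
    H_new = np.zeros((d + 1, d + 1))
    H_new[:d, :d] = H
    H_new[:d, d] = t[n] * amp        # hopping to the newly added site
    H_new[d, :d] = t[n] * amp
    E, U = np.linalg.eigh(H_new)
    keep = np.argsort(np.abs(E))[:M_K]   # truncate: keep states nearest the Fermi level
    H = np.diag(E[keep])
    amp = U[d, keep]                     # overlap of kept states with the newest site

T = np.diag(t, 1) + np.diag(t, -1)       # exact single-particle chain, for comparison
e_tr = np.min(np.abs(np.diag(H)))
e_ex = np.min(np.abs(np.linalg.eigvalsh(T)))
assert abs(e_tr - e_ex) < 0.05 * e_ex    # lowest scale reproduced despite truncation
```

Despite discarding most of the basis at every step, the smallest energy scale of the full chain is reproduced, because the discarded high-energy states do not return at later iterations.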
The entire process of going from one iteration to the next can be summarized as an RG transformation,

Ĥ_{N+1} = R[ Ĥ_N ] .   (14)

The full (discretized, renormalized) Hamiltonian Ĥ_N can therefore be constructed iteratively starting from the impurity Hamiltonian Ĥ_imp. Regarding the latter as a function of n physical model parameters {θ}, the entire NRG algorithm can be viewed as a function,

NRG: {θ} → (R ∘ R ∘ ··· ∘ R)[ Ĥ_imp({θ}) ] ,   (15)

where R ∘ R ∘ ··· ∘ R denotes an (N+1)-fold concatenation of functions. Note that an operator Ô acting only on impurity degrees of freedom can be expressed in the eigenbasis of Ĥ_N. We denote this matrix as Ô_N(i; i') = _N⟨i| Ô |i'⟩_N. From Eq. (12) it follows that,

Ô_{N+1}(j; j') = Σ'_{i,i',s} U_{N+1}(j, is) Ô_N(i; i') U†_{N+1}(i's, j') ,   (16)

where the primed sum implies that i, i' ≤ M_K, and the single sum over s follows from the fact that Ô does not act on the states |s⟩ of site N+1. Eq. (16) can be used iteratively to transform Ô_{−1}(i; i'), evaluated explicitly in the impurity basis, into the eigenbasis of any later iteration. Symbolically, we denote this process

Ô_{−1} → Ô_N ,   (17)

which we refer to as propagating forward the operator.
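In matrix form, one application of the forward propagation Eq. (16) is a tensor product with the identity on the new site, followed by a rotation into the new eigenbasis and truncation. A small numpy sketch, with random matrices standing in for Ô_N and the enlarged Hamiltonian (dimensions are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
d, d_site, M_K = 6, 2, 8      # kept states, new-site states, states kept after truncation

# A Hermitian operator in the step-N eigenbasis, and a stand-in enlarged Hamiltonian
O_N = rng.standard_normal((d, d)); O_N = O_N + O_N.T
H_big = rng.standard_normal((d * d_site,) * 2); H_big = H_big + H_big.T
E, U = np.linalg.eigh(H_big)  # columns of U are the new eigenstates, energies ascending

# O acts trivially on the new site: O ⊗ 1, then rotate and truncate to M_K states
O_big = np.kron(O_N, np.eye(d_site))
O_next = U[:, :M_K].T @ O_big @ U[:, :M_K]

assert O_next.shape == (M_K, M_K)
assert np.allclose(O_next, O_next.T)   # hermiticity is preserved by the rotation
```

Iterating this step carries an operator known only in the impurity basis into the eigenbasis of any later iteration, exactly as described above.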

IV. DERIVATIVE OF EIGENVALUES AND EIGENVECTORS IN NRG
As outlined in the previous section, NRG outputs the eigenvalues and eigenvectors of the set of Ĥ_N. Since all physical quantities for the quantum impurity model are obtained from these, calculating their derivatives using Eq. (2) requires the derivatives of the eigenvalues and eigenvectors of Ĥ_N. From Eq. (15), we regard NRG as a function defined over an n-dimensional domain, with an M_K × M_K dimensional image. Since n denotes the number of model parameters in Ĥ_imp, and M_K × M_K is the Fock space dimension of Ĥ_N, we have M_K × M_K ≫ n, and therefore forward mode AD is significantly faster [35]. In the following we consider only forward mode AD.
For a Hermitian matrix Â (such as the Hamiltonian Ĥ_N) with non-degenerate eigenvectors |i⟩ and distinct eigenvalues λ_i satisfying Â|i⟩ = λ_i|i⟩, we may express the differentials through [39,40],

∂λ_i = ⟨i| ∂Â |i⟩ ,   (18a)

∂|i⟩ = Σ_{j≠i} |j⟩ ⟨j| ∂Â |i⟩ / (λ_i − λ_j) .   (18b)

These expressions are of course well known in the context of non-degenerate perturbation theory [41]. However, the assumption implicit in Eq. (18) that Â must be free of eigenvalue degeneracies is rather restrictive. Methods treating the general degenerate case are much more computationally involved and may suffer from numerical instabilities [40,42]. Of course, in physics, where Â is the Hamiltonian of some system, energy eigenvalue degeneracies are common. This may be because of underlying non-Abelian symmetries (for example SU(2) spin symmetry), which endow the spectrum of the Hamiltonian with a multiplet structure. In this case, the Hamiltonian becomes block-diagonal in the associated conserved quantum numbers, and the diagonalization Eq. (12) can be done separately in each block. Alternatively, NRG can be formulated directly in multiplet space by using the Wigner-Eckart theorem [43]. Either approach removes the problem of such degeneracies in Eq. (18). However, this may not fully solve the problem, since there could be accidental degeneracies or emergent symmetries [1,3,7] that lead to additional energy eigenvalue degeneracies in physical systems.
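Eq. (18) is straightforward to verify numerically for a small non-degenerate Hermitian matrix. The sketch below checks the eigenvalue part (the Hellmann-Feynman theorem) against a finite-difference estimate for a random 4×4 example:

```python
import numpy as np

rng = np.random.default_rng(1)
A0 = rng.standard_normal((4, 4)); A0 = A0 + A0.T   # Â at θ = 0 (generically non-degenerate)
dA = rng.standard_normal((4, 4)); dA = dA + dA.T   # ∂Â/∂θ

def A(theta):
    return A0 + theta * dA

lam, V = np.linalg.eigh(A(0.0))
# Eigenvalue derivatives via Eq. (18a): ∂λ_i = <i|∂Â|i>
dlam = np.array([V[:, i] @ dA @ V[:, i] for i in range(4)])

h = 1e-6
lam_p = np.linalg.eigvalsh(A(h))
assert np.allclose((lam_p - lam) / h, dlam, atol=1e-3)
```

The eigenvector derivatives of Eq. (18b) can be checked the same way, but note that both formulas fail as soon as two λ_i coincide, which is precisely the degeneracy problem discussed above.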
To overcome this, we make a simple approximation. To the diagonal entries of Ĥ_N we add noise, with random variables drawn from a normal distribution of width σ. The noise lifts the eigenvalue degeneracies, meaning that derivatives can be obtained using Eq. (18). Note that the smallest terms in the rescaled Hamiltonian Ĥ_N are O(1) [1,2], so σ ≪ 1 constitutes a small perturbation. Care should be taken to add the noise in such a way as to respect bare symmetries.
With a straight application of AD via Eq. (3), the derivatives of eigenvalues and eigenvectors for Ĥ_N require the evaluation of Eq. (18) at every NRG step (and hence noise must be added at every NRG step). For small enough noise width σ, physical properties at a given iteration should be unaffected. Indeed, in Appendix A we show for the AIM that NRG remains highly accurate when σ ≲ 10^{−6}. At larger noise levels, errors may propagate through the iterative process (and will snowball if they introduce RG-relevant perturbations), so care must be taken to avoid this with such an AD approach. On the other hand, forward mode AD for the AIM requires σ ≳ 10^{−6} to stabilize the calculation of derivatives using Eq. (18). Smaller noise levels introduce more severe numerical instabilities because of the recursive nature of the derivative calculation in AD via Eq. (3): derivatives at one iteration depend on those of previous iterations. This stability-accuracy tradeoff in AD is analyzed in Appendix A.
Below, we derive an alternative formulation used in ∂NRG, which utilizes the symbolic derivative of the entire eigensolver forĤ N . This has the advantage that Eq. (18) need only be applied once, at the NRG step for which derivatives are required. Not only is this far less computationally costly, but it also avoids accumulating errors introduced by noise, which can then be added at a much lower level.

A. Formulation in ∂NRG
In the context of NRG, one can formulate a differential recursion relation based on the RG transformation Eq. (14),

∂_θ Ĥ_{N+1} = ∂_θ R[ Ĥ_N ] ,   (19)

which allows derivatives of Ĥ_N to be computed in AD forward or backward mode. However, one can also reformulate the problem in a much simpler and more efficient way. Taking derivatives with respect to model parameters θ of Ĥ_imp, we may write from Eq. (7) the operator identity ∂_θ Ĥ_N ≃ Λ^{(N−1)/2} ∂_θ Ĥ_imp (where '≃' follows from the NRG approximation). Since ∂_θ Ĥ_imp involves only impurity operators, it can be trivially evaluated in the impurity basis,

[∂_θ Ĥ_imp]_{−1}(i; i') = ⟨i| ∂_θ Ĥ_imp |i'⟩ .   (20)

Eq. (17) can then be used to propagate this operator forward into the eigenbasis of Ĥ_N. From Eq. (18) we then obtain,

∂_θ E_{N;i} = Λ^{(N−1)/2} [∂_θ Ĥ_imp]_N(i; i) ,   (21a)

∂_θ |i⟩_N = Λ^{(N−1)/2} Σ_{j≠i} |j⟩_N [∂_θ Ĥ_imp]_N(j; i) / (E_{N;i} − E_{N;j}) .   (21b)

Eq. (21) is the main result of this work. It shows how the derivatives of energies and states of the NRG Hamiltonians Ĥ_N can be obtained by simply forward-propagating the impurity operator ∂_θ Ĥ_imp. Derivatives of physical quantities can then be related to those obtained via Eq. (21). The above can also be generalized to deal with derivatives ∂_φ with respect to other Hamiltonian parameters φ (for example V or t_n in Eq. (7)), and to higher-order derivatives (see Appendix B).
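The logic behind Eq. (21) can be checked in a truncation-free toy: if H(θ) = H_imp(θ) ⊗ 1 plus θ-independent terms, the eigenvalue derivatives are exactly the diagonal elements of the forward-propagated impurity operator ∂_θ H_imp. The sketch below works under these simplifying assumptions and is not the full ∂NRG implementation:

```python
import numpy as np

rng = np.random.default_rng(2)
d_imp, d_env = 2, 4
dH_imp = np.diag([1.0, -1.0])                  # ∂_θ H_imp, e.g. a level splitting
H_imp0 = rng.standard_normal((d_imp,) * 2); H_imp0 = H_imp0 + H_imp0.T
H_rest = rng.standard_normal((d_imp * d_env,) * 2)
H_rest = H_rest + H_rest.T                     # θ-independent "chain" terms

def H(theta):
    return np.kron(H_imp0 + theta * dH_imp, np.eye(d_env)) + H_rest

E, U = np.linalg.eigh(H(0.0))
# "Propagate" the impurity operator into the eigenbasis of the full H,
# then read off eigenvalue derivatives from its diagonal (cf. Eq. (21a))
dH_prop = U.T @ np.kron(dH_imp, np.eye(d_env)) @ U
dE = np.diag(dH_prop)

h = 1e-6
assert np.allclose((np.linalg.eigvalsh(H(h)) - E) / h, dE, atol=1e-3)
```

In ∂NRG the rotation U is replaced by the iterated NRG transformation Eq. (16), applied with truncation at each step, and the rescaling factor Λ^{(N−1)/2} restores physical units.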
To calculate first-order derivatives of eigenvalues and eigenvectors in AD forward mode (assuming the simplest NRG implementation) one must perform seven additional operations for every NRG step. The ∂NRG approach involves only one additional operation per NRG step and does not require any propagation of derivatives related to the chain rule (see Appendix C). Hence ∂NRG, utilizing Eq. (21), is far more efficient than standard forward mode AD. Performance benchmarking is presented in Appendix A.
Derivatives with respect to multiple parameters can be obtained straightforwardly by propagating forward any [∂ θĤimp ] −1 of interest using Eq. (16) in ∂NRG. Therefore, ∂NRG scales linearly with the number of derivatives n, but is independent of ops(f ), unlike forward mode AD.
In practice, the use of Eq. (21) again requires lifting eigenvalue degeneracies by adding a small diagonal noise term. We found in numerical tests for the AIM that σ ∼ 10^{−11} is sufficient to stabilize the numerics in ∂NRG, without noticeably impacting any measurable physical properties; see Appendix A.
If derivatives at only step N are required, such a noise term need only be introduced at that step (rather than adding noise at each step), and the RG flow is unaffected. This is typically the situation for ground-state properties or low-temperature thermodynamics. On the other hand, if information is required from every NRG step (for example in the calculation of dynamical quantities [38]), noise must be introduced at each iteration. Unlike with the straight AD implementation, ∂NRG allows us to mitigate the possibility of snowballing errors introduced by propagating the noise terms, should this level of accuracy be required (e.g. in the vicinity of a quantum critical point). This is because the eigensystem derivatives at different N via Eq. (21) are independent in ∂NRG. Noise may be added to Ĥ_N for the purpose of evaluating Eq. (21), but the pristine Ĥ_N without noise can be used for the main NRG recursion. This gives the most accurate and reliable results, at the cost of an additional matrix diagonalization. See Appendices A and C for performance comparisons (and note that this more rigorous approach is still faster than FD and straight AD in many circumstances, and certainly more accurate).

V. APPLICATION TO ANDERSON IMPURITY MODEL
We illustrate the use of Eq. (21) by applying the ∂NRG scheme to the paradigmatic Anderson impurity model (AIM), for which

Ĥ_imp = ε n̂ + U n̂_↑ n̂_↓ ,   (22)

where n̂ = Σ_σ n̂_σ and n̂_σ = d̂†_σ d̂_σ. With Ĥ_imp so defined, the exact (discretized) Ĥ_N is given by Eq. (7); in NRG, Ĥ_N is approximated through the RG procedure Eq. (14). For a given Ĥ_N, we obtain the partition function Z_N(β) = Σ_i e^{−βE_{N;i}} and free energy F_N(β) = −(1/β) log[Z_N(β)]. In the original Wilsonian formulation [1,2], the effective inverse temperature β ≡ 1/k_B T is related to the NRG iteration number (Wilson chain length) N via β = Λ^{(N−1)/2} β̄, with β̄ = O(1) in practice. With this definition, the NRG free energy at inverse temperature β̄ is a good approximation to the true free energy of the original (undiscretized) model at inverse temperature β [1,2], F(β) ≃ F_N(β̄). The corresponding differential follows as,

∂_θ F_N(β̄) = Σ_i [ e^{−β̄ E_{N;i}} / Z_N(β̄) ] ∂_θ E_{N;i} .   (23)

For the AIM, we may use Eq. (21) to obtain derivatives of the free energy with respect to the impurity parameters ε and U. For example, since ∂_ε Ĥ_imp = n̂, Eqs. (21a) and (23) give

Λ^{−(N−1)/2} ∂F_N(β̄)/∂ε = Σ_i [ e^{−β̄ E_{N;i}} / Z_N(β̄) ] n̂_N(i; i) = ⟨n̂⟩_{Ĥ_N,β̄} .   (24)

Since ∂_ε F(β) = ⟨n̂⟩_{Ĥ,β} ≃ ⟨n̂⟩_{Ĥ_N,β̄}, Eq. (24) is precisely equivalent to the standard Wilsonian approach to calculating local thermodynamic expectation values in NRG [2]. The above illustration demonstrates that ∂NRG is analytically equivalent to the well-known result for such local thermodynamic quantities.
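The relation ∂F/∂ε = ⟨n̂⟩ can be checked analytically in the atomic limit of the AIM (impurity decoupled from the bath, V = 0), where the four levels {0, ε, ε, 2ε + U} give everything in closed form; a small sketch:

```python
import math

def free_energy(eps, U, beta):
    # F = -(1/β) log Z for the isolated impurity, levels {0, ε, ε, 2ε+U}
    Z = 1 + 2 * math.exp(-beta * eps) + math.exp(-beta * (2 * eps + U))
    return -math.log(Z) / beta

def occupation(eps, U, beta):
    # <n> weighted over the same four levels (occupations 0, 1, 1, 2)
    Z = 1 + 2 * math.exp(-beta * eps) + math.exp(-beta * (2 * eps + U))
    return (2 * math.exp(-beta * eps) + 2 * math.exp(-beta * (2 * eps + U))) / Z

eps, U, beta = -0.5, 1.0, 7.0    # particle-hole symmetric point ε = -U/2
h = 1e-6
dF = (free_energy(eps + h, U, beta) - free_energy(eps - h, U, beta)) / (2 * h)
assert abs(dF - occupation(eps, U, beta)) < 1e-8   # ∂F/∂ε = <n>
assert abs(occupation(eps, U, beta) - 1.0) < 1e-12  # half filling at ε = -U/2
```

Here the ε-derivative is taken by a symmetric finite difference only because the toy is analytic; in ∂NRG the same derivative comes out exactly via Eq. (21).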
Another commonly computed quantity for such models is the local impurity magnetic susceptibility at zero field,

χ(T) = ∫_0^β dτ ⟨ Ŝ_z(τ) Ŝ_z(0) ⟩ ,   (25)

where Ŝ_z = (n̂_↑ − n̂_↓)/2 is the impurity spin operator and τ is imaginary time. This can alternatively be obtained by adding a source term B Ŝ_z to the Hamiltonian Eq. (22) and taking the second-order derivative of the free energy with respect to B, evaluated at B = 0. That is, we can write χ(T) = −∂²F/∂B²|_{B=0}. Since ∂NRG is able to deal with second (and higher) order derivatives (see Appendix B), we calculate χ_{Ĥ_N,β̄} = −∂²F_N(β̄)/∂B²|_{B=0}. This is obtained automatically in ∂NRG, but from Eq. (B2) it can also be expressed as,

χ_{Ĥ_N,β̄} = β̄ [ Σ'_i p_{N;i} ⟨i|Ŝ_z|i⟩² − ( Σ'_i p_{N;i} ⟨i|Ŝ_z|i⟩ )² ] + 2 Σ'_i Σ_{j≠i} p_{N;i} |⟨j|Ŝ_z|i⟩|² / (E_{N;j} − E_{N;i}) ,

with p_{N;i} = e^{−β̄E_{N;i}}/Z_N(β̄). This form of χ_{Ĥ_N,β̄} is equivalent to the Lehmann representation of χ(T) evaluated for the NRG Hamiltonian Ĥ_N at effective inverse temperature β̄ = βΛ^{−(N−1)/2} [2]. Since all degeneracies are lifted in ∂NRG, the convergence factors typically used in the Lehmann representation of dynamical quantities are not needed here. Note that χ_{Ĥ_N,β̄} can be obtained at exactly zero magnetic field, so no symmetries are broken and no limit is taken numerically. Since the derivatives of primitive program functions are symbolic, source terms added to Ĥ_imp may be evaluated at zero coupling constant and still yield finite derivatives. However, we note that such symmetry-breaking source terms cannot be added to the Hamiltonian if non-Abelian quantum numbers are implemented from the outset. In the case of the magnetic susceptibility, we can therefore utilize conserved total S_z, but not conserved total S, when adding the source term B Ŝ_z (even though in the end we set B = 0). Numerical results are presented in Appendix A, and reproduce precisely the results of standard NRG using Eq. (25). Similarly, ∂NRG may be used to obtain the charge susceptibility, which in the wide-band limit is given simply by χ_C(T) ≃ −∂²F_N(β̄)/∂ε² = −∂⟨n̂⟩_{Ĥ_N,β̄}/∂ε (and can be obtained by replacing Ŝ_z by n̂ in Eq. (25)).
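The source-term trick is easily illustrated for a free spin-1/2, where adding B Ŝ_z gives levels ±B/2 and the second B-derivative of F at B = 0 reproduces the Curie law χ = β/4. For brevity the sketch below takes the derivative by a symmetric finite difference (in ∂NRG it is exact), with the sign convention χ = −∂²F/∂B² so that χ > 0:

```python
import math

def F(B, beta):
    # Free energy of a free spin-1/2 in field B: levels ±B/2
    return -math.log(math.exp(-beta * B / 2) + math.exp(beta * B / 2)) / beta

beta, h = 2.0, 1e-4
# χ = -∂²F/∂B² at B = 0, by a central second difference
chi = -(F(h, beta) - 2 * F(0.0, beta) + F(-h, beta)) / h**2
assert abs(chi - beta / 4) < 1e-6   # Curie law: χ = 1/(4T)
```

Note that the field B only ever appears symbolically: the derivative is evaluated at B = 0, so no symmetry is actually broken in the state being probed, mirroring the zero-field evaluation described above.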
Indeed, Maxwell relations provide non-trivial connections between physical quantities obtained as derivatives. This was exploited recently in Ref. [44] to extract the fractional entropy of multi-channel Kondo states in quantum dot experiments, comparing with NRG calculations of charge derivatives.
The above examples for the AIM demonstrate that the ∂NRG algorithm is equivalent to known expressions for certain derived quantities. However, the power of ∂NRG is that it does this automatically within a generalized framework, and works equally well for any derivative in any quantum impurity model.

VI. NUMERICAL RESULTS
For the following numerical demonstrations, we implemented a basic NRG code in jax [19], using the included AD routines to obtain derivatives via Eq. (21). The code is available open-source at Ref. [45]. For simplicity we did not exploit quantum numbers here, and so eigenstate degeneracies were removed by adding Gaussian noise of small width σ ∼ 10^{−11} (FD derivatives were calculated for comparison without noise, σ = 0). In the following we set the conduction electron half-bandwidth to D = 1 and use an NRG discretization parameter Λ = 3. Further details on the numerical calculations and the finite difference derivatives can be found in Appendix D and Appendix E respectively.
First we use ∂NRG to compute the derivative of the ground-state energy E_{N;0} of the AIM with respect to the interaction U. AD results in Fig. 1 (blue lines) are compared with FD approximations (points), as a function of U at iteration N = 5 (upper panel) and N = 40 (lower panel), normalized by their respective U → 0 derivatives. The green lines show the variation of E_{N;0} itself. The results show the non-trivial effect of renormalization going from N = 5 to N = 40 at intermediate U, as well as the saturation of the ground-state derivatives at both large and small U. In this case we see excellent agreement between AD and FD results (although the former are far less computationally expensive to obtain). Fig. 2 demonstrates the use of ∂NRG to obtain thermodynamic quantities from derivatives of the NRG free energy. The inset shows the impurity occupation ⟨n̂⟩, obtained from ∂F_N/∂ε via Eq. (24), at a temperature T/D ∼ 10^{−5} (corresponding to N = 20 and β̄ = 0.9) for the same systems as in Fig. 1. Since ε = −U/2, the AIM possesses an exact particle-hole symmetry and hence is at half-filling, ⟨n̂⟩ = 1. The AD results (blue line) satisfy this exact result precisely, while the FD results (points) show some numerical error. More interestingly, the ∂NRG framework allows one to obtain higher derivatives with equal ease (Eq. (B2) is used instead of Eq. (18a)), as shown in the main panel of Fig. 2.
Here we calculate the corresponding second derivatives ∂²F_N/∂U∂ε = ∂⟨n̂⟩/∂U, which again show non-trivial behaviour as a function of interaction strength U. The FD approximations agree well, but are much more costly to obtain [36], requiring for every point five executions of the entire NRG code per second derivative, and an expensive convergence test. ∂NRG requires only a single function evaluation (see Appendix C).
As a final application of ∂NRG for the AIM, we turn to the RG energy level flow diagram shown in the top panel of Fig. 3. The excitation spectrum of the effective rescaled Ĥ_N plotted against iteration number N shows the well-known flow between fixed points [1][2][3]. In the example shown, the crossover scales are well-separated, such that we see distinct level structures associated with the free orbital (FO), local moment (LM), and strong coupling (SC) fixed points, as marked on the diagram. At a fixed point, the level structure does not change with N. Indeed, the RG structure of the problem and resulting universality implies that the fixed point level structure is the same, independent of the microscopic model parameters U (≡ −2ε) and V; only the crossover scales between fixed points are affected. This is demonstrated in the bottom panel of Fig. 3, where we plot the derivatives with respect to U of the NRG energy levels E_{N;i}, again as a function of iteration number N. As expected, the derivatives vanish at the fixed points (the level structure does not depend on U at the fixed points); but there is a strong dependence along the crossovers, since the crossover scales depend on U.

VII. OUTLOOK: DYNAMICS AND THE VERTEX
The above numerical results for the AIM are provided as a proof-of-principle demonstration. Future useful applications exploiting the full power of ∂NRG may be found for more complex models, situations involving higher-order derivatives, optimization techniques requiring exact gradients, or cases where partial derivatives of NRG outputs with respect to inputs are difficult to obtain by standard FD means. An example of the latter is the derivative of frequency-dependent dynamical quantities, such as impurity Green's functions or the conduction electron scattering T-matrix, with respect to bare model parameters.
In the context of dynamical mean field theory (DMFT) for correlated materials [12,13,46], machine learning techniques [47] could be applied to optimize simpler effective models with respect to more complicated microscopic ones, by comparing their Green's functions. Given the non-trivial dependence of such dynamical quantities at different energies on model parameters, the exact gradient within the AD framework becomes an essential ingredient for gradient descent optimization.
Refs. [48,49] recently uncovered the analytic structure of the full local vertex and proposed a scheme to compute it within NRG. The vertex is an important object, entering for example in extensions of DMFT beyond the local limit [50]. It is possible that some partial information on the vertex could also be obtained with ∂NRG. This is inspired by a result of Ref. [51] for the AIM which, within the Matsubara formalism, connects the vertex at zero bosonic frequency, F^loc_{νν'}(ω = 0), to the functional derivative of the interaction self-energy Σ_ν with respect to the hybridization Δ_{ν'}, expressed in terms of the single-particle impurity Matsubara Green's function G_ν. For a precise definition and discussion of F^loc_{νν'}(ω = 0), see Refs. [48,51]. Since G_ν and Σ_ν can be calculated within standard NRG [52], we argue that such an object is obtainable within ∂NRG. We speculate that a Keldysh version of this relation may similarly provide access to certain information about the real-frequency vertex at finite temperatures.
In order to calculate such derivatives of dynamical quantities, further code development is required, since the full-density-matrix NRG method would need to be implemented using AD libraries such as jax [19] and integrated within the ∂NRG scheme described in this paper. We leave this for future work.

VIII. CONCLUSION
NRG is the gold-standard method of choice for solving generalized quantum impurity models [1,2]. In this work, we introduce a new variant of the standard algorithm -- ∂NRG -- which makes use of the differentiable programming paradigm to automatically and efficiently obtain derivatives of NRG outputs with respect to input model parameters.
We make use of the jax AD library [19], together with a bespoke routine based on Eq. (21) which allows the derivatives of Hamiltonian eigenvalues and eigenvectors to be obtained at any iteration in an accurate and inexpensive way. Our fully commented code is available open-source [45].
We demonstrated the use of ∂NRG by application to the Anderson impurity model, for which we obtained the derivative of NRG energy levels with respect to model parameters, calculated thermodynamic quantities from derivatives of the NRG free energy, and susceptibilities from derivatives with respect to Hamiltonian source terms. ∂NRG may be useful for machine learning applications involving NRG [47] for which gradient descent optimization requires derivatives of a loss function; for adaptive broadening schemes [53]; or for optimal control protocols [29]. Richer physical information may be obtained from derivatives of dynamical quantities. This also opens the door to automatic differentiable DMFT, with ∂NRG as the impurity solver. Finally, we note that the ∂NRG methodology is compatible with interleaved-NRG (iNRG) [54,55] and generalizations utilising full symmetries [43]. This is left for future work.

Appendix A: Noise stability and performance benchmarks

In this Appendix we demonstrate that: (i) NRG and ∂NRG are not affected by the addition of Gaussian noise with a small variance; and that (ii) the real-world performance of ∂NRG exceeds that of basic AD and FD in terms of both accuracy and speed.
A degeneracy-free NRG Hamiltonian is required for application of Eq. (18). However, physical systems often do have eigenvalue degeneracies, arising for example from underlying symmetries. Although one can partially mitigate this problem by exploiting Abelian and non-Abelian quantum numbers, some degeneracies may remain. Here we consider the 'worst case' scenario in which no symmetries are exploited.
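To make the role of the degeneracy-free assumption concrete, the first-order eigen-derivative formulae of the type used in Eq. (18) can be sketched for a dense symmetric matrix in a few lines of numpy. This is a minimal illustration only, assuming a non-degenerate spectrum; the function name and test values are illustrative and not taken from the released code.

```python
import numpy as np

def eig_derivatives(H, dH):
    """Derivatives of eigenvalues/eigenvectors of a symmetric H with
    respect to a parameter theta, given dH = dH/dtheta.
    Assumes the spectrum of H is non-degenerate."""
    E, U = np.linalg.eigh(H)
    dH_eig = U.T @ dH @ U              # dH rotated to the eigenbasis of H
    dE = np.diag(dH_eig).copy()        # Hellmann-Feynman: dE_n = <n|dH|n>
    denom = E[None, :] - E[:, None]    # denom[m, n] = E_n - E_m
    np.fill_diagonal(denom, np.inf)    # exclude the m = n terms
    F = dH_eig / denom                 # F[m, n] = <m|dH|n> / (E_n - E_m)
    dU = U @ F                         # column n: d|n>/dtheta in the {|m>} basis
    return E, U, dE, dU
```

The diagonal of `denom` is set to infinity so that the divergent m = n terms drop out; this is exactly where exact eigenvalue degeneracies (denom → 0 off the diagonal) would make the formula ill-defined.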
First we examine the effect of adding Gaussian noise with mean µ = 0 and variance σ to the diagonal elements of Ĥ_N. We use the Kondo temperature T_K as one figure of merit for assessing the effect of the noise term. For simplicity we define the Kondo temperature through the impurity entropy via S_imp(T = T_K) = 0.5 (suitably between the local-moment and strong-coupling fixed-point values). We compute T_K for 10^{-8} ≤ σ ≤ 10^{-2} and different numbers of kept states 100 ≤ M_K ≤ 500 (using fixed Λ = 3). Fig. 4(a) shows clearly that for σ ≤ 10^{-6} the Kondo temperature does not noticeably depend on the noise level (we have confirmed explicitly down to σ = 10^{-15} that the results are fully converged for each M_K). Other physical quantities computed in standard NRG show similar behaviour. However, the effect of adding noise is more pronounced in derivatives. In Fig. 4(b) we compare ∂_B F_N(β)|_{B=0} = ⟨Ŝ_z⟩_{Ĥ_N,β} as a function of Wilson chain length N, as computed with ∂NRG and straight AD for different σ. Since the impurity magnetization is evaluated at zero field, SU(2) spin symmetry implies the exact result ⟨Ŝ_z⟩_{Ĥ_N,β} = 0. However, this spin symmetry is not enforced, and so we see deviations due to the noise. As Fig. 4(b) shows, ∂NRG accurately approximates the exact result even at relatively high σ. By contrast, AD yields derivatives that strongly depend on σ and have much higher error than ∂NRG at a given σ. Indeed, derivatives are not numerically computable for all N in standard forward-mode AD for σ ≲ 10^{-6}; at larger noise levels, AD derivatives are computable but the accuracy can become poor, especially at later iterations due to the propagation and accumulation of errors through the derivative chain rule.
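The noise regularization itself amounts to a one-line perturbation of the Hamiltonian before diagonalization. A minimal toy sketch (names and values are illustrative, not from the released code):

```python
import numpy as np

def lift_degeneracies(H, sigma, rng):
    """Add small Gaussian noise of width sigma to the diagonal of H,
    splitting exact degeneracies before eigen-derivatives are taken."""
    return H + np.diag(sigma * rng.standard_normal(H.shape[0]))

rng = np.random.default_rng(42)
H = np.diag([0.0, 0.0, 1.0])            # toy Hamiltonian with a degenerate pair
Hn = lift_degeneracies(H, 1e-8, rng)
E = np.linalg.eigvalsh(Hn)              # degeneracy split by O(sigma) shifts
```

The spectrum is perturbed only at O(σ), so physical NRG outputs are unaffected for small σ, while the degenerate pair is split and the eigen-derivative formula becomes well-defined.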
Therefore, although NRG for the AIM is insensitive to noise for σ ≤ 10^{-6}, the AD approach is not stable at these noise levels. At higher noise levels, AD is stabilized, but the accuracy suffers. On the other hand, ∂NRG avoids such problems: a lower noise level can be used since each derivative calculation is independent, and highly accurate results can be obtained.
Within ∂NRG, the magnetization calculation via the analog of Eq. (24) shown in Fig. 4(b) does not involve eigenvector derivatives, and is therefore particularly stable. By contrast, the AD calculation of any derivative is built up recursively via Eq. (3) and therefore does involve the computation of eigenvector derivatives at each and every step. This contributes to the relative performance gain in ∂NRG.
In Fig. 5 we examine the magnetic susceptibility χ(T). The calculation of this quantity, which is obtained automatically in ∂NRG via the second derivative of the NRG free energy, is formally equivalent to the analytic expression, Eq. (25) (and does involve eigenvector derivatives). Gaussian noise of width σ is added to the diagonal of Ĥ_N at iteration N. The calculated χ_{Ĥ_N,β̄} at β̄ = 0.9 (used hereafter) is then interpreted as the true χ(T) at inverse temperature β = Λ^{(N−1)/2} β̄. The numerical results from ∂NRG show that χ_{Ĥ_N,β̄} is obtained reliably at all N (and hence all T) for σ as low as 10^{-11}. Only at smaller σ does the method break down: derivatives are then not computable for earlier iterations/higher temperatures (the correct low-T behavior is however still captured). As demonstrated in Fig. 4 and confirmed in Fig. 5(a), highly accurate results are obtained for σ < 10^{-6}. Therefore in ∂NRG we have a wide window 10^{-11} < σ < 10^{-6} over which numerical results are fully converged and stable. By contrast, in Fig. 5(b) we show results for the same quantity obtained by straight AD. The calculation is not numerically stable for small σ, and very inaccurate for large σ (typical of the breakdown for higher-order derivatives). As such, there is no reliable regime in which robust results can be obtained by straight AD.
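The principle of obtaining χ from a second derivative of the free energy, evaluated at exactly zero field, can be illustrated on a toy system with jax. This is a minimal sketch for a free spin-1/2 in a field B, not the NRG free energy; function names are ours.

```python
import jax
import jax.numpy as jnp

def free_energy(B, beta=1.0):
    # F = -(1/beta) ln Z for a free spin-1/2 with source term H = -B * S_z
    E = jnp.array([-0.5 * B, 0.5 * B])          # eigenvalues of -B * S_z
    return -jnp.log(jnp.sum(jnp.exp(-beta * E))) / beta

# first derivative: magnetization; second derivative: susceptibility,
# both evaluated by AD at precisely B = 0
magnetization = -jax.grad(free_energy)(0.0)
chi = -jax.grad(jax.grad(free_energy))(0.0)
```

At β = 1 this reproduces the free-spin Curie value χ = β/4 = 0.25, with the magnetization vanishing identically at zero field.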
In Fig. 5(c) we examine the feasibility of an alternative approach to the eigenvalue-degeneracy problem that avoids adding Gaussian noise. The method, proposed by Liao et al. in Ref. [31], consists of reformulating Eq. (18b) as Eq. (A1) with f(x) = 1/x. Divergences induced by eigenvalue degeneracies can then be avoided by replacing the function f(x) with the approximate broadened form f(x) ≈ x/(x² + δ²) with δ ≪ 1. This broadening approximation is known to distort somewhat the overall results, but has the advantage that divergences are removed without the need to lift degeneracies. However, the correct result is only obtained in the δ → 0 limit at the lowest temperature scales. Finite δ at intermediate T appears to yield rather poor results. We conclude that ∂NRG is indeed the method of choice in this context, in terms of accuracy.
We now turn to an analysis of the real-world performance of ∂NRG in terms of calculation time, comparing with straight AD implemented in jax [19]. The figure of merit is the runtime, measured in seconds [s], of both algorithms run on the same machine (in this case an AMD Threadripper 2950X platform running Python 3.8.10 with OMP_NUM_THREADS=16; further package specifications can be found in [56]). We do not utilize any quantum numbers here.
In Fig. 6(a) we compare the runtime for the calculation of (i) ⟨Ŝ_z⟩_{Ĥ_N,β} (solid lines), and (ii) ⟨Ŝ_z⟩_{Ĥ_N,β} and ⟨n̂⟩_{Ĥ_N,β} together (dashed lines), using ∂NRG (blue), forward-mode AD (red), and backward-mode AD (green), at the last NRG iteration, N = 40. We see that for all M_K, ∂NRG performs best (with the relative speedup becoming more pronounced at larger M_K). For straight AD, we see that forward mode beats backward mode for the calculation of a single derivative; but since the cost of backward-mode AD scales with the number of outputs rather than the number of inputs, it will overtake forward mode when derivatives with respect to many parameters are calculated. In Fig. 6(b) we compare runtimes using ∂NRG with a single diagonalization of Ĥ_N (blue), ∂NRG with double diagonalization of Ĥ_N (orange), and forward-mode AD (red). We calculate ⟨Ŝ_z⟩_{Ĥ_N,β} (solid lines) and χ_{Ĥ_N,β̄} (dashed lines) at all iterations N ≤ 40. For the orange lines, two diagonalizations of Ĥ_N are performed at each step (with and without noise), which thereby eliminates error propagation from the added noise and provides the most accurate calculation (this may or may not be needed in practice, depending on the situation). We see that ∂NRG with a single diagonalization per step is the fastest in all cases. For the simpler quantity ⟨Ŝ_z⟩_{Ĥ_N,β} (which involves only a first derivative), ∂NRG using two diagonalizations and forward-mode AD have similar runtimes, although AD is slightly faster. However, for χ_{Ĥ_N,β̄}, both versions of ∂NRG are considerably faster. This is because the calculation of χ_{Ĥ_N,β̄} requires a second derivative, which is much more computationally expensive in AD, but almost as cheap in ∂NRG. The relative performance gain for ∂NRG also increases if several derivatives are computed.
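The forward/backward scaling argument can be seen directly in jax, where jacfwd builds a Jacobian from one JVP pass per input component and jacrev from one VJP pass per output component. A minimal sketch (toy function, illustrative only):

```python
import jax
import jax.numpy as jnp

# f : R^3 -> R^2, so jacfwd needs 3 passes and jacrev needs 2
f = lambda x: jnp.array([jnp.sum(x ** 2), jnp.prod(x)])
x = jnp.array([1.0, 2.0, 3.0])

J_fwd = jax.jacfwd(f)(x)   # forward mode: one JVP per input component
J_rev = jax.jacrev(f)(x)   # reverse mode: one VJP per output row
```

Both return the same 2x3 Jacobian; which is cheaper depends only on the input/output counts, which is the trade-off probed in Fig. 6(a).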
In conclusion, in a typical setting, ∂NRG is considerably more efficient than AD (often by orders of magnitude) -- while at the same time being more accurate.
Appendix B: Second-order derivatives

We consider Hamiltonians that depend linearly on a set of parameters {θ}, Ĥ(θ) = Ĥ_0 + Σ_i θ_i ĥ_i, where ĥ_i is an operator defined over the same Hilbert space as Ĥ, and the corresponding coupling constant θ_i is a real scalar. In this case, all second-order derivatives of the Hamiltonian vanish, ∂²Ĥ/(∂θ_i ∂θ_j) = 0. It is then possible to derive a closed-form expression for the second-order derivatives of the eigenvalues and eigenvectors with respect to the couplings θ_1, θ_2 ∈ {θ}. With these formulae one can compute physical observables depending on second derivatives, such as susceptibilities, using ∂NRG.
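For orientation, the eigenvalue part of such a closed form reduces to the standard second-order perturbation-theory cross term (a sketch, valid for a non-degenerate spectrum; the corresponding eigenvector expression is lengthier and omitted here):

```latex
\frac{\partial^2 E_n}{\partial \theta_1 \, \partial \theta_2}
  = \sum_{m \neq n}
    \frac{\langle n|\hat h_1|m\rangle \langle m|\hat h_2|n\rangle
        + \langle n|\hat h_2|m\rangle \langle m|\hat h_1|n\rangle}
         {E_n - E_m}
```

This follows from expanding E_n(θ) to second order in the linear perturbation θ_1 ĥ_1 + θ_2 ĥ_2 and reading off the coefficient of θ_1 θ_2.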

Appendix C: Comparing differentiation methods
The basic NRG algorithm is described in the main text, with the key steps contained in Eqs. (6)-(13). For more details, see Ref. [2]. Assuming that degeneracies in the NRG Hamiltonian Ĥ_N are lifted, we can use Eq. (18) to compute the derivatives of eigenvectors and eigenvalues of Ĥ_N. The Wengert list [37,57] with the forward primal trace (function evaluation) and the forward tangent trace (function differentiation) can then be compiled, as shown in Table I. Similarly, we compile the forward primal trace and forward tangent trace for the ∂NRG algorithm, see Table II. For ∂NRG, we have one additional step in the primal trace for the main algorithm (corresponding to Eq. (16)), but a trivial tangent trace.
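The lockstep evaluation of primal and tangent traces can be sketched with a minimal dual-number class (illustrative only; the paper's implementation uses jax, not this class):

```python
from dataclasses import dataclass

@dataclass
class Dual:
    """Forward-mode AD value: the primal trace (val) and the tangent
    trace (dot) are evaluated together, one Wengert-list entry at a time."""
    val: float
    dot: float = 0.0

    def __add__(self, other):
        return Dual(self.val + other.val, self.dot + other.dot)

    def __mul__(self, other):
        # product rule applied entry-by-entry along the tangent trace
        return Dual(self.val * other.val,
                    self.val * other.dot + self.dot * other.val)

def f(x):
    # Wengert list: v1 = x * x ; v2 = v1 + x
    v1 = x * x
    v2 = v1 + x
    return v2

y = f(Dual(3.0, 1.0))   # seed the tangent with dx = 1
```

Each line of `f` contributes one primal entry and one tangent entry, so the derivative emerges alongside the function value in a single pass.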
We can now compare the efficiency of the two approaches.
For forward-mode AD (fAD) we have ops(fAD) = 7N, and to compute the forward primal and tangent traces, 2 × ops(fAD) operations are required. For ∂NRG, by contrast, ops(∂NRG) = 8N, but no other step is required to compute the derivative of Ĥ_N. However, this does not mean that ∂NRG is twice as fast as fAD, because the bottleneck eigensolver routines [58] appear only in the forward primal trace. Nonetheless, ∂NRG still provides a considerable performance advantage, as established by explicit benchmarks in Appendix A.

Appendix E: Finite differences

The finite-difference (FD) approximation to the derivative of a function f is D_h[f](x) = [f(x + h) − f(x)]/h, where h is the FD step size. The FD derivative is connected to the definition of the derivative by taking the limit ∂f/∂x = lim_{h→0} D_h[f](x). In practice we compute FD derivatives for a set of finite {h_i}, and check for convergence. However, care must be taken due to the trade-off between round-off and truncation errors [37]. The truncation error arises due to the finite h > 0, which is required for the numerical evaluation of D_h[f](x), and which would diminish in the formal limit h → 0. However, for very small h, the difference between the function evaluations f(x) and f(x + h) cannot be distinguished numerically due to inevitable round-off errors in any finite-precision arithmetic. This leads to increasing errors as h → 0.
We quantify the FD error as ε(h) = |D_h[f](x) − ∂f/∂x| (E2). This quantity is plotted in Fig. 7 for the derivative of the AIM impurity occupation with respect to the impurity interaction, ∂_U⟨n̂⟩_{Ĥ_N,β}. We have used the ∂NRG derivative as the true value of ∂f/∂x in Eq. (E2). Fig. 7 shows rather typical behavior, with truncation errors dominating at large h and round-off errors dominating at small h, with a stability plateau in between where the derivative should be evaluated. However, comparison of the upper and lower panels (which correspond to model parameters U = 10^{-2} and U = 10^{2} respectively) demonstrates an important limitation of FD differentiation: the optimal h is not fixed but depends on input model parameters. Reliable results therefore require such a convergence analysis for each new set of model parameters and for each new derivative. This becomes computationally costly. The situation becomes significantly worse when considering higher-order derivatives.
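The truncation/round-off trade-off is easy to reproduce in a few lines: for f(x) = e^x the forward-difference error is dominated by truncation at large h and by round-off at small h, with a plateau of accurate step sizes in between. A minimal sketch (test function and step sizes are our own, chosen for illustration):

```python
import math

def forward_diff(f, x, h):
    # D_h[f](x) = (f(x + h) - f(x)) / h
    return (f(x + h) - f(x)) / h

def fd_error(h, x=1.0):
    # absolute error against the exact derivative, d/dx e^x = e^x
    return abs(forward_diff(math.exp, x, h) - math.exp(x))

# large h: truncation error; tiny h: catastrophic cancellation (round-off)
errs = {h: fd_error(h) for h in (1e-1, 1e-8, 1e-15)}
```

The intermediate step size sits on the stability plateau, while both extremes give markedly worse errors, mirroring the behavior described for Fig. 7.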