Aspects of capacity of entanglement

Many quantum information theoretic quantities are similar to and/or inspired by thermodynamic quantities, with entanglement entropy being a well-known example. In this paper, we study a less well-known example, capacity of entanglement, which is the quantum information theoretic counterpart of heat capacity. It can be defined as the second cumulant of the entanglement spectrum and can be loosely thought of as the variance in the entanglement entropy. We review the definition of capacity of entanglement and its relation to various other quantities such as fidelity susceptibility and Fisher information. We then calculate the capacity of entanglement for various quantum systems, conformal and nonconformal quantum field theories in various dimensions, and examine their holographic gravity duals. Resembling the relation between response coefficients and order parameter fluctuations in Landau-Ginzburg theories, the capacity of entanglement in field theory is related to integrated gravity fluctuations in the bulk. We address the question of measurability, in the context of proposals to measure entanglement and Rényi entropies by relating them to U(1) charges fluctuating in and out of a subregion, for systems equivalent to non-interacting fermions. From our analysis, we find universal features in conformal field theories; in particular, the area dependence of the capacity of entanglement appears to track that of the entanglement entropy. This relation is seen to be modified under perturbations from conformal invariance. In quenched 1+1 dimensional CFTs, we compute the rate of growth of the capacity of entanglement. The result may be used to refine the interpretation of entanglement spreading being carried by ballistic propagation of entangled quasiparticle pairs created at a quench.

Entanglement entropy has proven quite useful as a diagnostic of topological properties of ground states of quantum many-body systems; for a recent review see e.g. [1]. At the same time, starting with [2], it has also played an instrumental role in understanding the connection between the geometry of space-time and strongly coupled field theories in the AdS/CFT correspondence; a recent extensive review is [3].
In this paper, we study another quantity associated to a reduced density matrix, namely the capacity of entanglement. It is defined in the same way as one defines heat capacity for thermal systems, which was the original motivation to introduce this quantity in [4] (see also [5]). Explicitly, if $\lambda_i$ are the eigenvalues of a reduced density matrix, entanglement entropy is defined as $S_{EE} = -\sum_i \lambda_i \log \lambda_i$, and capacity of entanglement can be written as $C_E = \sum_i \lambda_i \log^2 \lambda_i - S_{EE}^2$. The latter can also be thought of as the variance of the distribution of $-\log \lambda_i$ with probability $\lambda_i$, and it is clear that it contains information about the width of the eigenvalue distribution of the reduced density matrix.
Our main motivation was to understand the connection between capacity of entanglement and quantum fluctuations, and to see, for example, whether capacity of entanglement sheds any interesting light on the accuracy of the semi-classical approximation in gravity. This connection turns out to be the following. As is well-known, the Ryu-Takayanagi formula [2] relates the entanglement entropy in a quantum field theory to the volume of a minimal surface ("RT surface") in a dual anti-de Sitter spacetime. If we include quantum gravity fluctuations of the RT surface, and integrate over them, the integral does indeed compute the capacity of entanglement in the quantum field theory [6]. Thus, capacity of entanglement associates a "width" to the entanglement entropy: a measure of quantum gravitational effects. This relationship is a leading contribution to a more complete sum of gravitational fluctuations. In [7], the RT surface was replaced by a domain wall with tension (a "cosmic brane"), including its complete gravitational backreaction. The volume of the brane in the backreacted geometry was shown to be equal to a variant of the Rényi entropies in the quantum field theory, coined "modular entropy" in [8]. This result was used in [6]; independently, the modular entropy had been introduced and used to define the capacity of entanglement in the earlier work [4,5].
Entanglement entropy and capacity of entanglement are two features of the full entanglement spectrum [9] (the set of eigenvalues of the reduced density matrix). There has been growing interest in studying the entanglement spectrum as a diagnostic of the phase structure of various systems. For our purposes, of special relevance are the studies of the entanglement spectrum for 1+1 dimensional gapless [10][11][12] and gapped [13] systems. Originally it was also hoped that there would be a precise relation between the entanglement spectrum and the energy eigenvalue spectrum, but there are caveats [14]. Connections between the capacity of entanglement and the entanglement spectrum have been explored in [15].
Complete knowledge of all the Rényi entropies is in principle sufficient to recover the complete set of eigenvalues of the reduced density matrix, the entanglement spectrum distribution, by a suitable integral transform. In this way, [10] derived the spectrum for 1+1 dimensional conformal field theories in the ground state, with a finite length interval as the subsystem, and found that the spectrum is (under some assumptions) characterized by a universal function, depending only on two parameters: the central charge $c$ of the theory and the largest eigenvalue $\lambda_{\max}$. Universality was also found in gapped systems [13], while the assumptions of [10] were further studied in [16] and the spectrum was interpreted as a reparametrization of the Cardy formula counting energy levels in a CFT.
While we will make a few remarks about the full entanglement spectrum, we will mostly restrict to the two lowest moments or cumulants of the spectrum, which are precisely the entanglement entropy and capacity of entanglement. In terms of the modular Hamiltonian $K_A = -\log \rho_A$, they are the expectation value and variance of $K_A$, respectively. As stated above, for systems with a gravity dual, higher cumulants capture the whole series of the quantum gravitational fluctuations about the RT surface, giving the complete entanglement spectrum or the Rényi / modular entropies.
Besides looking at gravitational fluctuations, we will in this work study the capacity of entanglement from many different points of view. We develop a more comprehensive list of its properties and relations to other concepts of quantum information, not just in the context of holographic gauge/gravity duality. We study a range of quantum systems from simple qubit systems and random pure states to spin chains, conformal and non-conformal quantum field theories, with and without a gravity dual, and also non-equilibrium quenched systems. We are especially interested in identifying universal features. For example, we find evidence for an "area law" for the capacity of entanglement in conformal theories. Curiously, in four-dimensional conformal field theories, in a fairly natural regularization scheme, the ratio between the coefficients in front of the area term in entanglement entropy and capacity of entanglement turns out to be precisely a/c, the ratio of the a and c anomaly coefficients, for spherical entangling surfaces. This area law suggests that most of the quantum fluctuations of the RT surface are located near the boundary of AdS and this therefore does not seem to shed much light on the size of local bulk quantum fluctuations.
Another important observation is that systems where all entanglement is carried by EPR pairs (pairs of qubits that contribute log 2 to the entanglement entropy) have zero capacity of entanglement. Therefore, whenever we find that entanglement entropy and capacity of entanglement are approximately equal to each other, EPR pairs are not a very good approximation of the quantum state. As we will show, randomly entangled pairs of qubits give a much better description, and this can be used, for example, to sharpen the picture of ballistic propagation of entanglement carried by quasiparticles created at a quench.
The outline of this paper is as follows. Section 2 begins with some basic definitions and relations. We begin with definitions of entropy, capacity of entanglement, and modular entropy based on quantum information theory. We discuss the moments and cumulants of the entanglement spectrum / modular Hamiltonian, and show how to obtain them from the Rényi entropies. We then move to analogues of thermodynamic definitions of entropy and heat capacity applied to quantum entanglement, and show agreement with the previous quantum information theoretic definitions. We move to review the definitions of fidelity susceptibility and quantum Fisher information, and show how they are related to the capacity of entanglement (related discussion is also in [17,18]). Finally, we map capacity of entanglement to the heat capacity of a thermal CFT on a hyperbolic sphere, recovering the thermodynamic equality of heat capacity and variance of entropy.
Section 3 discusses some properties of the capacity of entanglement. We first derive an upper bound for it in quantum systems with a finite dimensional Hilbert space. We also recast this result as an estimate of the variance in classical information theory. Addressing the measurability of the capacity of entanglement, we follow the proposal to relate bipartite entanglement for non-interacting fermions with fluctuations of conserved U(1) charges [20][21][22][23][24]. We show how the moments and cumulants of entanglement spectrum and U(1) charges are related. We point out that an equality between the entanglement entropy and capacity of entanglement arises naturally in this context in a large N limit, at the leading level. We speculate how such a limit might arise in theories with a holographic bulk gravity dual, and consider an example of an eigenvalue distribution giving rise to the equality $C_E = S_{EE}$. We then interpret the capacity of entanglement holographically. We give a shorter derivation of the bulk integral formula of [6], and discuss its interpretation.
In Section 4, we study how the capacity of entanglement depends on the initial quantum state of the complete system. In simple n-qubit examples, the entanglement entropy and the capacity of entanglement are in general not proportional. For subsystems of random pure states, the approximate equality of the two can arise as ensemble averages, and we compute the precise ensemble averages for several choices of the dimension of the parent Hilbert space where the pure states are defined and choices of dimension of the subsystem.
In Section 5, we study the capacity of entanglement in field theories. First we calculate it in 1+1 dimensional CFTs at equilibrium and in local and global quenches. In all cases, the capacity of entanglement is equal to the entanglement entropy. For the non-equilibrium cases this implies that the rate of growth of the capacity of entanglement follows that of $S_{EE}$. We comment on the interpretation of ballistic spreading of entanglement. We then extend the computation to higher dimensions, following an alternative method based on stress tensor correlators [25,26]. After that, we study universality. We discuss some aspects of the entanglement spectrum and its (non)-universality. Then we consider the possible dependence on regularization schemes. In generic quantum field theories and CFTs, the ratio $C_E/S_{EE}$ depends on the regularization scheme, but for CFTs with a holographic gravity dual, the gravity interpretation provides a natural preferred regularization scheme. With this scheme, for spherical entangling surfaces in D = 4, the ratio is bounded as a consequence of the conformal collider bounds of [27,28]. Restricting to theories with holographic duals without higher derivative terms, the ratio turns out to be exactly equal to one.
Finally, in section 6, we study how the capacity of entanglement behaves when conformal invariance is broken slightly. As a warm-up example, in the anisotropic Heisenberg XY spin chain, the equality of capacity and entropy emerges at critical domains. Moving away from criticality, the capacity of entanglement and the entanglement entropy are no longer equal and develop different subleading divergence structures. We then perform a more general analysis perturbing CFTs by relevant operators, following the strategy in [29][30][31].
We end with some concluding remarks.

Definitions and relations
In this section, we introduce the concepts that will be the focus of later sections. As in thermodynamics, there will be two different routes to defining these concepts. A "microscopic" route is to follow the standard definitions of quantum information theory. The second, "thermodynamic" route, follows from associating a Boltzmann distribution to the (reduced) density matrix and then formally applying thermodynamic definitions to quantum entanglement. We will begin with the first route, then follow the second route, and finally establish that the two different routes lead to the same concepts. Apart from some refinements, this section is mostly a review.

Concepts of quantum information theory
We start with a quantum system prepared in a pure or mixed state described by the density matrix ρ. Since we will mostly be interested in bipartite entanglement between a subsystem A and its complement, we assume that the density matrix is a reduced density matrix in the subsystem A. However, this is just for terminological convenience; most of the concepts obviously apply more generally. The most familiar concept quantifying the purity of the state ρ, or the entanglement between the subsystem and its complement, is the entanglement entropy

$$S_{EE} = -\mathrm{Tr}(\rho \log \rho) . \qquad (2.1)$$

Introducing the modular Hamiltonian $K = -\log \rho$, the entanglement entropy becomes the expectation value of K,

$$S_{EE} = \mathrm{Tr}(\rho K) = \langle K \rangle . \qquad (2.2)$$

Another much studied measure of entanglement is given by the Rényi entropies $S_\alpha$,

$$S_\alpha = \frac{1}{1-\alpha} \log \mathrm{Tr}(\rho^\alpha) . \qquad (2.3)$$

Recent work [7] suggested that in addition to the Rényi entropies $S_\alpha$, it is interesting to study a modification, called 'modular entropy' in [8],

$$\tilde{S}_\alpha = \alpha^2 \frac{\partial}{\partial \alpha} \left( \frac{\alpha - 1}{\alpha} S_\alpha \right) . \qquad (2.4)$$

The motivation in [7] comes from entanglement entropy in CFTs with a holographic gravitational dual, where the Ryu-Takayanagi formula relates the entanglement entropy to a volume of a dual minimal surface. In [7] it was found that the modular entropy also satisfies a holographic area law, with the interpretation$^1$

$$\tilde{S}_\alpha = \frac{\mathrm{Area}(\text{Cosmic Brane}_\alpha)}{4 G_N} , \qquad (2.5)$$

where the "Cosmic Brane" is a domain wall with tension (α − 1) backreacting to the bulk geometry. In the limit α → 1, (2.5) reduces to the Ryu-Takayanagi formula.
In an earlier work [4] (see also [5]), (2.4) was used as a starting point to define a new quantity, capacity of entanglement $C_E$, to extend the thermodynamic relations found in quantum entanglement. In this paper we prefer to define it first in a microscopic, quantum information theoretic way, as a property of the state ρ,

$$C_E(\rho) = \mathrm{Tr}\left(\rho (\log \rho)^2\right) - \left(\mathrm{Tr}(\rho \log \rho)\right)^2 . \qquad (2.6)$$

From this definition, written in terms of the modular Hamiltonian K, the capacity of entanglement is equal to the variance (the second cumulant) of K:

$$C_E = \langle K^2 \rangle - \langle K \rangle^2 = \langle K^2 \rangle_c . \qquad (2.7)$$

The thermodynamic definition and its equality with (2.6) will be discussed in sections 2.2 and 2.3.

A complete set of data associated with ρ is the entanglement spectrum [9], the complete set of eigenvalues $\{\lambda_m = e^{-\varepsilon_m}\}$ of the (reduced) density matrix ρ or the eigenvalues $\varepsilon_m$ of the modular Hamiltonian K, along with their multiplicities $g_m$. There are other ways to encode the data of the spectrum $\{g_m, \varepsilon_m\}$. One alternative is to consider its moments or cumulants. For that, one can define a generating function k(α) as the analytic continuation$^2$ of the moments of the density matrix ρ,

$$k(\alpha) = \mathrm{Tr}(\rho^\alpha) = \mathrm{Tr}(e^{-\alpha K}) . \qquad (2.8)$$

By the usual rules, k(α) is a generating function for all the moments of the modular Hamiltonian:

$$\langle K^n \rangle = (-1)^n \frac{d^n k(\alpha)}{d\alpha^n} \bigg|_{\alpha = 1} . \qquad (2.9)$$

Formally, one may also refer to these as the moments of entanglement entropy$^3$, $\langle S_{EE}^n \rangle = \langle K^n \rangle$. The generating function k(α), which plays the role of a partition function, is related to the (analytically continued) Rényi entropies.

$^1$ With α being a natural number.

$^2$ In general, the analytic continuation may not be well defined for all α, but we will need the continuation only into a small neighbourhood of α = 1.
$^3$ In classical information theory, Shannon information $H = -\sum_i p(x_i) \log_2 p(x_i)$ quantifies the average information in messages composed of letters $x_i$ with probabilities $p(x_i)$, generated at a source. In this context, the higher moments of H have a natural interpretation, characterizing the fluctuations in information. (A related concept is that of information loss in the evolution of chaotic dynamic systems, where the cumulants of H [32], and their generating function, the Rényi information [33], can be used to characterize dynamical chaos.)

Conversely, the analytic form of the Rényi entropies $S_\alpha$ determines $k(\alpha) = \exp[(1 - \alpha) S_\alpha]$, from which the eigenvalue spectrum can be extracted by a suitable expansion or integral transformation (see e.g. [34]).
The logarithm of the generating function,

$$\log k(\alpha) = (1 - \alpha) S_\alpha , \qquad (2.10)$$

is the generating function for the cumulants (connected correlators),

$$\langle K^n \rangle_c = (-1)^n \frac{d^n}{d\alpha^n} \left[ (1 - \alpha) S_\alpha \right] \bigg|_{\alpha = 1} . \qquad (2.11)$$

If the explicit form of the Rényi entropies $S_\alpha$ is known, then by applying derivatives in (2.11) we can easily calculate the capacity of entanglement $C_E$, which we defined in (2.6),

$$C_E = \langle K^2 \rangle_c = \frac{d^2}{d\alpha^2} \left[ (1 - \alpha) S_\alpha \right] \bigg|_{\alpha = 1} . \qquad (2.12)$$

The first derivative gives the first cumulant, which is $S_{EE}$. To summarize, there are four alternative ways to represent the entanglement data: the entanglement spectrum, the Rényi entropies, the moments $\langle K^n \rangle$ or the cumulants $\langle K^n \rangle_c$. It is possible to move from one description to the other. For example, knowing all the moments allows one to construct the generating function k(α), which determines the entanglement spectrum. The different alternatives may be practical for highlighting different features of the entanglement data. The rest of the paper will focus on studying what features are captured by the second cumulant, the capacity of entanglement.
For pure states obviously all moments and cumulants vanish. For maximally mixed states,

$$\langle K^n \rangle = (\ln d)^n ; \qquad \langle K^{n>1} \rangle_c = 0 , \qquad (2.13)$$

in a Hilbert space with dimension d < ∞. The spectrum has only one d-fold degenerate eigenvalue ε = ln d.
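As a numerical illustration of the definitions in this subsection, the following minimal sketch computes $S_{EE}$ and $C_E$ directly from a list of eigenvalues, and compares them with the first two derivatives of $\log k(\alpha)$ at α = 1, estimated by central finite differences. The function names, the sample spectrum, and the step size `h` are illustrative assumptions, not part of the original analysis.

```python
import numpy as np

def entropy_and_capacity(eigs):
    """S_EE = <K> and C_E = <K^2> - <K>^2 from the eigenvalues of rho."""
    lam = np.asarray(eigs, dtype=float)
    lam = lam[lam > 0]               # zero eigenvalues do not contribute
    k = -np.log(lam)                 # spectrum of the modular Hamiltonian
    s_ee = np.sum(lam * k)
    c_e = np.sum(lam * k**2) - s_ee**2
    return s_ee, c_e

def log_k(alpha, eigs):
    """log k(alpha) = log Tr(rho^alpha) = (1 - alpha) * S_alpha."""
    lam = np.asarray(eigs, dtype=float)
    lam = lam[lam > 0]
    return np.log(np.sum(lam**alpha))

def cumulants_from_log_k(eigs, h=1e-3):
    """First two cumulants of K from finite differences of log k at alpha = 1."""
    f0, fp, fm = log_k(1.0, eigs), log_k(1.0 + h, eigs), log_k(1.0 - h, eigs)
    first = -(fp - fm) / (2 * h)         # (-1)^1 d/dalpha log k
    second = (fp - 2 * f0 + fm) / h**2   # (-1)^2 d^2/dalpha^2 log k
    return first, second

# Hypothetical sample spectrum (sums to one).
eigs = np.array([0.5, 0.3, 0.15, 0.05])
s_direct, c_direct = entropy_and_capacity(eigs)
s_fd, c_fd = cumulants_from_log_k(eigs)

# Limiting cases: maximally mixed state in dimension d, and a pure state.
d = 8
s_mix, c_mix = entropy_and_capacity(np.full(d, 1.0 / d))
s_pure, c_pure = entropy_and_capacity([1.0, 0.0, 0.0])
```

In this sketch the maximally mixed state returns an entropy of ln d with vanishing capacity, and the pure state returns zero for both, in line with the limiting cases quoted above.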

Thermodynamical definitions
In the previous section, we gave a quantum information theoretic definition and interpretation of the capacity of entanglement. In this section, we follow the original definition, by following the analogues of thermodynamical relations applied to quantum entanglement. We begin with the state $\rho = e^{-K}$, introduce an inverse temperature β, and compute a partition function

$$Z_\rho(\beta) = \mathrm{Tr}(\rho^\beta) = \mathrm{Tr}(e^{-\beta K}) . \qquad (2.14)$$

This is of course the same as the generating function k(α), with β = α. The only reason for the two different notations is to make a distinction between the thermodynamic and quantum information theoretic interpretations. Z(β) is a more natural notation in the thermodynamical context, while k(α) refers to the quantum information theoretic context; hopefully this will not create confusion. One can then proceed to introduce a free energy

$$F(\beta) = -\frac{1}{\beta} \log Z_\rho(\beta) \qquad (2.15)$$

and an "internal" energy

$$E(\beta) = -\frac{\partial}{\partial \beta} \log Z_\rho(\beta) = \frac{\mathrm{Tr}(K e^{-\beta K})}{\mathrm{Tr}(e^{-\beta K})} = \langle K \rangle_\beta , \qquad (2.16)$$

which, as seen in the last equality, is just the "thermal" expectation value$^4$ of the modular Hamiltonian.
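The thermodynamic definition of entropy satisfies the usual relation with the internal and free energy,

$$S(\beta) = \beta^2 \frac{\partial F}{\partial \beta} = \beta (E - F) . \qquad (2.17)$$

Finally, one defines a heat capacity,

$$C(\beta) = -\beta^2 \frac{\partial E}{\partial \beta} = \beta^2 \frac{\partial^2}{\partial \beta^2} \log Z_\rho(\beta) . \qquad (2.18)$$

Except for the partition function, it is not manifestly clear how these thermodynamic definitions are related to the quantum information theoretic definitions of section 2.1. That is the topic of the next section.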

Relations between the thermodynamic and quantum information theoretic definitions
To begin with, from the density matrix $\rho = e^{-K}$ we construct a one-parameter family $\rho_\alpha$ of density matrices with proper normalization$^5$,

$$\rho_\alpha = \frac{\rho^\alpha}{\mathrm{Tr}(\rho^\alpha)} = \frac{e^{-\alpha K}}{\mathrm{Tr}(e^{-\alpha K})} . \qquad (2.19)$$

The key idea here will be that when we apply the quantum information theoretic definitions of section 2.1 (which were given for arbitrary states ρ) to the one-parameter family of states (2.19), we make contact with the thermodynamic definitions with β = α.
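We consider first the entanglement entropy. We observe that

$$S_{EE}(\rho_\alpha) = -\mathrm{Tr}(\rho_\alpha \log \rho_\alpha) = \alpha \langle K \rangle_\alpha + \log \mathrm{Tr}(e^{-\alpha K}) = \beta (E - F) \big|_{\beta = \alpha} = S(\beta = \alpha) , \qquad (2.20)$$

verifying the relation between the two definitions of entropy.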
On the other hand, we may consider the modular entropies $\tilde{S}_\alpha$ of the original state ρ. This gives another relation,

$$\tilde{S}_\alpha(\rho) = S_{EE}(\rho_\alpha) = S(\beta = \alpha) .$$

Finally, we show that the quantum information theoretic definition of the capacity of entanglement $C_E$ (which was a property of an arbitrary state ρ, characterizing the variance of its spectrum), applied to the one-parameter family $\rho_\alpha$, matches with the thermodynamic definition. Writing $K_\alpha = -\log \rho_\alpha$,

$$C_E(\rho_\alpha) = \langle K_\alpha^2 \rangle_\alpha - \langle K_\alpha \rangle_\alpha^2 = \alpha^2 \frac{\partial^2}{\partial \alpha^2} \log \mathrm{Tr}(e^{-\alpha K}) = C(\beta = \alpha) .$$

This also establishes (setting α = 1) the quantum information theoretic counterpart of the thermodynamic response-fluctuation relation between heat capacity and variance of entropy: for a state ρ the entanglement heat capacity and the variance of entanglement entropy satisfy

$$C_E = \langle K^2 \rangle - \langle K \rangle^2 ,$$

which we used as the "microscopic" definition of $C_E$ in (2.6). Finally, later in the paper we will consider grand canonical ensembles of either identical bosonic or fermionic particles, and there it will be convenient to use the expression of the capacity or variance in terms of the expectation values of the occupation numbers $n_l$ as a single sum,

$$C_E = \sum_l \langle n_l \rangle (1 \pm \langle n_l \rangle) \log^2 \frac{1 \pm \langle n_l \rangle}{\langle n_l \rangle} ,$$

where we choose the + signs for bosonic and − signs for fermionic systems.
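The agreement between the two routes can be checked numerically. The sketch below builds the normalized one-parameter family $\rho^\alpha / \mathrm{Tr}(\rho^\alpha)$ from a sample spectrum, evaluates the entropy and the variance of $-\log$ of its eigenvalues, and compares them with the thermodynamic entropy $\beta(E - F)$ and heat capacity $\beta^2 \partial_\beta^2 \log Z$ at β = α. The sample eigenvalues, the function names, and the finite-difference step are illustrative assumptions; all eigenvalues are assumed strictly positive.

```python
import numpy as np

def escort(eigs, alpha):
    """Eigenvalues of rho^alpha / Tr(rho^alpha)."""
    w = np.asarray(eigs, dtype=float) ** alpha
    return w / w.sum()

def entropy(eigs):
    lam = np.asarray(eigs, dtype=float)
    return -np.sum(lam * np.log(lam))

def capacity(eigs):
    lam = np.asarray(eigs, dtype=float)
    k = -np.log(lam)
    return np.sum(lam * k**2) - np.sum(lam * k) ** 2

def log_z(beta, eigs):
    """log Z(beta) = log Tr(rho^beta)."""
    return np.log(np.sum(np.asarray(eigs, dtype=float) ** beta))

def thermo_entropy_and_capacity(beta, eigs, h=1e-3):
    """S = beta*E + log Z and C = beta^2 d^2(log Z)/d(beta)^2 via finite differences."""
    f0, fp, fm = log_z(beta, eigs), log_z(beta + h, eigs), log_z(beta - h, eigs)
    energy = -(fp - fm) / (2 * h)               # E = -d log Z / d beta
    s = beta * energy + f0                      # beta * (E - F)
    c = beta**2 * (fp - 2 * f0 + fm) / h**2
    return s, c

# Hypothetical sample spectrum with strictly positive eigenvalues.
eigs = np.array([0.45, 0.25, 0.2, 0.1])
results = []
for alpha in (0.5, 1.0, 2.5):
    rho_alpha = escort(eigs, alpha)
    s_info, c_info = entropy(rho_alpha), capacity(rho_alpha)
    s_th, c_th = thermo_entropy_and_capacity(alpha, eigs)
    results.append((alpha, s_info, s_th, c_info, c_th))
```

Each tuple in `results` pairs the quantum information theoretic value with its thermodynamic counterpart; within finite-difference accuracy the two columns should coincide for every α.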

Variance, fidelity susceptibility and quantum Fisher information
The capacity or variance (2.6), (2.12), can also be related to other concepts of quantum information: fidelity susceptibility (FS) and quantum Fisher information (QFI). These concepts are important in quantum metrology, and can be used in various contexts, such as Bose-Einstein condensation, temperature estimates, and quantum phase transitions, see e.g. [35], [36], and the recent extensive review [37] (and references therein). Here we give only a brief review for the purpose of making a comparison to the capacity of entanglement $C_E$.
Consider deformations of a density matrix, parametrized by a continuous parameter θ. This essentially gives a one-parameter flow ρ(θ) in the state space. One measure of a "distance" between two density matrices ρ, σ is the (Uhlmann) fidelity $F(\rho, \sigma)$ [38]. Then $F(\rho(\theta), \rho(\theta + \epsilon))$ can be used as a measure of how much the density matrix changes along an infinitesimal flow θ → θ + ε. Since the fidelity is at its maximum value 1 at ε = 0, its Taylor series expansion in ε has no linear term, and the coefficient of the quadratic term defines the fidelity susceptibility χ. For example, consider θ as a control parameter driving the ground state of a system across a quantum phase transition. Then the fidelity susceptibility is an interesting observable, manifesting singular behavior that can be used to characterize the quantum phase transition. A slightly more general approach is to compare ρ and ρ + δρ, and work in a basis where ρ is diagonal with eigenvalues $\lambda_i$. Note however that in this basis the perturbation δρ need not be diagonal. A brute force computation then gives the expansion of the fidelity to second order in δρ, from which we read off the fidelity susceptibility (2.29). We will shortly connect this to capacity of entanglement, but before doing that we first discuss another related concept, quantum Fisher information.

Let us specify a point θ = 0 and consider small deformations about it. In quantum measurements, one is often interested in detecting the parameter θ by finding an observable $\hat{\theta}$, a locally unbiased estimator with the normalization

$$\frac{\partial}{\partial \theta} \mathrm{Tr}(\rho(\theta) \hat{\theta}) \Big|_{\theta = 0} = 1 . \qquad (2.30)$$

The accuracy of the estimates of θ is characterized by the variance $\Delta\theta^2$, which is bounded from below by the quantum version of the Cramér-Rao bound [39],

$$\Delta\theta^2 \geq \frac{1}{N g(\theta)} , \qquad (2.31)$$

where N is the number of samples and the lower bound is characterized by the quantum Fisher information g(θ). To define it, consider first the so-called symmetric logarithmic derivative L(θ) of ρ(θ), defined by

$$\frac{\partial \rho(\theta)}{\partial \theta} = \frac{1}{2} (L \rho + \rho L) . \qquad (2.32)$$

The quantum Fisher information g(θ) is then defined as

$$g(\theta) = \mathrm{Tr}(\rho(\theta) L^2(\theta)) = \mathrm{Tr}\left( \frac{\partial \rho(\theta)}{\partial \theta} L(\theta) \right) . \qquad (2.33)$$

For an explicit formula, consider again a small deformation ρ + δρ, in a basis where ρ is diagonal (but δρ need not be). Solving the equation $\delta\rho = \frac{1}{2}(L\rho + \rho L)$ for L, in terms of the eigenvalues $\lambda_i$ of ρ we get $L_{ij} = 2 \delta\rho_{ij} / (\lambda_i + \lambda_j)$. It is now straightforward to establish an equality relating the Fisher information to the fidelity susceptibility (2.29),

$$g = \mathrm{Tr}(\rho L^2) = 4\chi . \qquad (2.36)$$

Coming back to the variance $\langle K^2 \rangle_c$, let us consider a particular deformation of the density matrix $\rho = e^{-K}$ by deforming in the direction of (imaginary) modular flow,

$$\rho(\theta) = \frac{e^{-(1+\theta)K}}{\mathrm{Tr}\, e^{-(1+\theta)K}} , \qquad (2.37)$$

in other words the family of escort matrices (2.19). Computing the fidelity $F(\rho(0), \rho(\theta))$ to second order in θ (the normalization factor must also be expanded), and setting θ = 0, yields the relation (2.38), i.e., the capacity $C_E$ is equal to the fidelity susceptibility for this particular flow and the quantum Fisher information for estimating the parameter θ. Alternatively, expanding  [17]. In a finite temperature system, when the density matrix is that of a canonical ensemble, the above quantities are called the thermal fidelity susceptibility etc., and are also equal to the heat capacity [40]. However, we emphasize that the first equality in (2.38) assumes the particular flow (2.37). For example, suppose that we deform the modular Hamiltonian by a perturbation V, to consider a flow

$$\rho(\theta) = \frac{e^{-K - \theta V}}{\mathrm{Tr}\, e^{-K - \theta V}} . \qquad (2.39)$$

This leads to a more complicated expansion of the fidelity, since in general V, K do not commute, and the second equality in (2.38) does not hold.
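For the modular flow (2.37) all density matrices commute, so the comparison with the capacity of entanglement can be checked with classical formulas. The sketch below is an illustration under stated assumptions: the fidelity is taken in the unsquared convention, which for commuting states reduces to $\sum_i \sqrt{p_i q_i}$, and the quantum Fisher information reduces to the classical expression $\sum_i (\partial_\theta p_i)^2 / p_i$. With these conventions one expects $g = C_E$ at θ = 0 and $1 - F \approx \theta^2 C_E / 8$ for small θ. The normalization convention used for χ in the text may differ; the sample spectrum and function names are hypothetical.

```python
import numpy as np

def modular_flow(eigs, theta):
    """Eigenvalues of exp(-(1+theta) K) / Tr exp(-(1+theta) K)."""
    w = np.asarray(eigs, dtype=float) ** (1.0 + theta)
    return w / w.sum()

def capacity(eigs):
    lam = np.asarray(eigs, dtype=float)
    k = -np.log(lam)
    return np.sum(lam * k**2) - np.sum(lam * k) ** 2

def commuting_fidelity(p, q):
    """Unsquared fidelity for commuting states (assumed convention)."""
    return np.sum(np.sqrt(p * q))

def fisher_information(eigs, h=1e-4):
    """Classical Fisher information at theta = 0 via a central difference."""
    dp = (modular_flow(eigs, h) - modular_flow(eigs, -h)) / (2 * h)
    return np.sum(dp**2 / np.asarray(eigs, dtype=float))

# Hypothetical sample spectrum with strictly positive eigenvalues.
eigs = np.array([0.4, 0.3, 0.2, 0.1])
theta = 1e-3

c_e = capacity(eigs)
g = fisher_information(eigs)
fid = commuting_fidelity(eigs, modular_flow(eigs, theta))
c_from_fidelity = 8.0 * (1.0 - fid) / theta**2   # leading-order estimate
```

For a generic flow such as (2.39), where the perturbation does not commute with K, this classical shortcut does not apply and the full matrix expressions are needed.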

Relation to thermal heat capacity
So far we have mostly been keeping the discussion at a general level, without specifying the origin of the quantum state ρ. In the remainder of the paper, we will narrow our focus to bipartite entanglement in quantum systems, where the system is first prepared in a generic quantum state $\rho_0$, then partitioned into a subsystem A entangled with its complement B, and we then study the reduced density matrix $\rho_A = \mathrm{Tr}_B\, \rho_0$. The entanglement entropy is the von Neumann entropy of $\rho_A$. In this section, we consider specifically the entanglement entropy for CFT initial vacuum states and spherical entangling surfaces, in the context of [41,42]. In these papers, the authors discuss the interpretation of EE and Rényi entropies, by mapping the causal development of the enclosed ball A to a hyperbolic cylinder $\mathbb{R} \times H^{d-1}$, where the CFT in the vacuum in the original spacetime becomes a CFT in $H^{d-1}$ in a thermal bath, with temperature

$$T_0 = \frac{1}{2\pi R} , \qquad (2.40)$$

where R is the radius of the original entangling sphere and the curvature scale of $H^{d-1}$.
The entanglement entropy is then mapped to the standard thermal entropy $S_{\rm therm}(T_0)$ of the CFT in the thermal bath; similarly, the Rényi entropies are mapped to those of the thermal ensemble.
We consider the capacity of entanglement $C_E$ of the ball-like subsystem. We show that it maps to the standard thermal heat capacity $C_{\rm therm}$ of the thermal CFT. We start from$^6$ (2.12), but do not yet take the limit α → 1 but consider where $S_\alpha$ is the Rényi entropy. On the other hand, it is related to the free energy F on the hyperbolic sphere by [42] Writing $T_\alpha \equiv T_0/\alpha$ and $\beta_\alpha = 1/T_\alpha$, we get Thus, i.e., the capacity of entanglement $C_E$ of $\rho_A$ becomes the usual heat capacity $C_{\rm therm}$ of the thermal CFT on the hyperbolic sphere, as was expected$^7$. The temperature is parameterized by the radius of the entangling sphere, (2.40). This also guarantees that the holographic dual interpretation with a topological black hole in the bulk will be the heat capacity (associated with the horizon) of the black hole. Then, dialing the temperature of the black hole translates back in the original spacetime to the imaginary modular flow among the escort matrices accompanying the reduced density matrix $\rho_A$ of the subsystem A.
It should be noted that in Appendix A, [42] considered various inequalities satisfied by the Rényi entropy, and interpreted one of them (their (A.4)) as a result of proportionality to a heat capacity which is positive. These inequalities were studied and proven in [6].
Interim summary. We have established a string of equalities between the variance of the entanglement spectrum $\langle K^2 \rangle_c$, the variance of entanglement entropy, the capacity of entanglement, a reduced fidelity susceptibility and quantum Fisher information. However, the third equality (marked with an asterisk) holds for a particular (modular) flow among density matrices. Hereafter, we will mostly adopt the notation $C_E$, and we will be interested in comparisons to the entanglement entropy $S_{EE}$, in a given state ρ.

Properties of capacity of entanglement
We have introduced the cumulants of the modular Hamiltonian, and interpreted the second cumulant as the capacity of entanglement or the variance of entanglement entropy. In this section we study some properties of these concepts.

An upper bound on the capacity of entanglement
In a system with a finite Hilbert space of dimension N, the von Neumann entropy takes values between 0 and $S_{\max} = \ln N$. It is interesting to derive similar bounds for the higher cumulants of the spectrum of ρ. In this section we derive an upper bound for the capacity of entanglement $C_E = \langle K^2 \rangle_c$. We may assume that we have diagonalized the density matrix. We denote the eigenvalues $\lambda_i$, with the degenerate eigenvalues also carrying separate labels. There are two types of questions to consider: (i) for a given entropy S, what is the maximum variance $\langle K^2 \rangle_c$, or (ii) find the maximum of $\langle K^2 \rangle_c$ with no restriction on the entropy. We start with the problem (i). It leads to the variational problem to extremize the functional where the Lagrange multipliers A, $B_i$ and C enforce that the entropy takes a given value, that the $\lambda_i$ are non-negative, and that the sum of the eigenvalues is equal to one, respectively. Note that F is not equal to $\langle K^2 \rangle_c$, instead $F = \langle K^2 \rangle_c + S^2$, but since S is fixed extremizing F is the same as extremizing $\langle K^2 \rangle_c$. Differentiating F with respect to $\lambda_i$ yields We see that if $\lambda_i \neq 0$, this is a quadratic equation for $\log \lambda_i$ with at most two solutions for $\lambda_i$. In addition, some of the $\lambda_i$ could be zero, for which this equation is somewhat ill-defined, but can formally be solved with $B_i = 0$ and C infinite. We now first move on to the alternative problem (ii), which turns out to lead to a simple result, before turning back to (i). If we do not fix S, we need to maximize $\tilde{F}$, Differentiating with respect to $\lambda_i$ now leads to Also in this case, having three different non-zero values among the $\lambda_i$ would lead to a contradiction, as all three would be solutions of a quadratic equation for $\log \lambda_i$ with the same coefficients. We therefore conclude that for either optimization problem, the non-zero eigenvalues can take at most two different values.
Since a zero eigenvalue does not contribute to either entanglement entropy or the variance, we will ignore the zero eigenvalues for now.
We will denote the two non-zero eigenvalues by $\lambda_1$ and $\lambda_2$. If we call the corresponding occupation numbers $n_1$ and $n_2$, then $n_1 \lambda_1 + n_2 \lambda_2 = 1$. We can express $n_1$ and $n_2$ in terms of $\lambda_1$ and $\lambda_2$ and when we do this we find which shows that in order to maximize $C_E$, we should make $\lambda_1$ and $\lambda_2$ as different as possible.
It will be convenient to introduce the variables so that we can express all quantities in terms of x, y and $N = n_1 + n_2$. We find that and (3.54) Let us try to maximize $\langle K^2 \rangle_c$. We notice that first of all we want to minimize y. The minimum value would be y = 0, but this is just the maximally entangled state, for which the capacity of entanglement vanishes. Instead, we set y = 1/(N − 1). Then, the maximum of $\langle K^2 \rangle_c$ is found at $x_N$, which is the solution of We then find an upper bound for the variance, Solving the transcendental equation (3.55) iteratively, we obtain for large N and Notice that we did not keep S fixed when we varied x and y. However, for the above values of x and y the entropy for large N remains of the order S ∼ log N, so we expect that it remains true that $C_{E,\max}$ is proportional to $S_{\max}^2$ (with a different numerical constant) if we were to vary $\langle K^2 \rangle_c$ keeping S fixed and of the order log N. It would be interesting to know if states with maximum capacity of entanglement have special properties, but we have no specific insights at this point.
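The conclusion that the extremal spectrum takes at most two distinct non-zero values can be explored numerically. The sketch below makes an illustrative assumption that is not taken from the text: one non-degenerate eigenvalue p and N − 1 equal eigenvalues (1 − p)/(N − 1). For such a two-valued spectrum the variance has the closed form $p(1-p) \log^2[p(N-1)/(1-p)]$, which the code checks against the direct definition and then maximizes over p on a grid. The function names and grid size are hypothetical, and the variables x and y of the text are not reproduced.

```python
import numpy as np

def capacity(eigs):
    """Variance of -log(lambda) under the distribution lambda."""
    lam = np.asarray(eigs, dtype=float)
    lam = lam[lam > 0]
    k = -np.log(lam)
    return np.sum(lam * k**2) - np.sum(lam * k) ** 2

def capacity_two_valued(p, n):
    """Closed form for one eigenvalue p and n-1 eigenvalues (1-p)/(n-1)."""
    return p * (1.0 - p) * np.log(p * (n - 1) / (1.0 - p)) ** 2

def maximize_two_valued(n, grid=200001):
    """Grid search for the p that maximizes the two-valued capacity."""
    p = np.linspace(1e-6, 1.0 - 1e-6, grid)
    vals = capacity_two_valued(p, n)
    i = np.argmax(vals)
    return p[i], vals[i]

n = 64

# Consistency check of the closed form against the direct definition.
p_test = 0.3
spectrum = np.concatenate(([p_test], np.full(n - 1, (1.0 - p_test) / (n - 1))))
c_direct = capacity(spectrum)
c_closed = capacity_two_valued(p_test, n)

# Maximum over p within this family, compared with the value at p = 1/2.
p_star, c_star = maximize_two_valued(n)
c_half = 0.25 * np.log(n - 1) ** 2
```

Within this family the value at p = 1/2 is $\frac{1}{4}\log^2(N-1)$, and the grid maximum lies at p somewhat above 1/2, so the maximal variance grows like the square of the maximal entropy, as discussed above.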
Formally, the inequality resembles Popoviciu's inequality on variances [19], which states that the variance of a random variable X with a range from inf(X) = m to sup(X) = M satisfies If we were to interpret the entropy S EE as a random variable with the upper bound S max and the lower bound 0, the inequality (3.59) would indeed superficially look like a special case of Popoviciu's inequality. However, the spectrum of the modular Hamiltonian depends on the probability distribution, and has no finite upper bound. To clarify, we recast our result in the language of classical information theory. Let X be a random variable with N outcomes {x 1 , . . . , x N }, with a discrete probability distribution p(x i ). Then move to a new random variable, the information content I(X) = − log 2 p(X). The expectation value is the Shannon information, H = E(I(X)), with 0 ≤ H ≤ H max = log 2 N . The same variational calculation as we performed above, changing the notation, gives an estimate for the variance of I(X) This is not a special case of Popoviciu's inequality: unlike for X, for the random variable I(X) the supremum depends on the probability distribution p(x), with sup Instead, what appears in (3.61) is the supremum of the expectation value. We are unaware of whether (3.61) is a new result in classical information theory, or previously known. Note also that the upper bound applies for a finite system. In the thermodynamic limit, with the system size becoming infinite, it would be interesting to study if the capacity of entanglement displays interesting non-analyticity e.g. when the system is undergoing a quantum phase transition.

Bipartite particle fluctuations and entanglement fluctuations
In this section, we discuss scenarios where it is natural to consider fluctuations in entanglement entropy and its cumulants. We consider entanglement in a bipartitioned system. There has been growing interest in developing strategies to measure entanglement. An interesting proposal relates entanglement in a bipartitioned system (between the subsystem A and its complement) to fluctuations of a conserved U (1) charge, such as the particle number, across the partition boundary [20][21][22][23][24], for systems that can be mapped to noninteracting fermions. The particle number in the subsystem A, N A , thus becomes a random variable, and so does the entropy as entangled charges wander in and out. There are detailed results relating the fluctuations in the particle number N A in the subregion A to the entanglement entropy S EE and even to the Rényi entropies S α . The probability distribution of the random number N A is characterized by its cumulants n m , defined by with the generating function where the expectation value is computed with respect to the initial state of the full system. Thus, for example the second cumulant n 2 (also called the fluctuations) is Given the relation between entanglement and particle number fluctuations, we expect the probability distributions of these two random variables to be related, and in particular to be able to express all the cumulants of the entanglement entropy (the modular Hamiltonian) in terms of the cumulants of the particle number. Let us show this in more detail. We emphasize that this only applies to systems that can be mapped to non-interacting fermions. In this context, the entanglement Rényi entropies have the following series expansion in terms of the n m : with the coefficients where ζ(s, a) is the Hurwitz zeta function. Note that only the even cumulants n 2k appear in the series. In principle, the series is valid for real values of α > 1, but one has to be careful about its convergence. 
In [44], it was shown that, for systems equivalent to noninteracting fermion gases, the series (3.65) is well behaved, because the lowest cumulant in the series, $n_2$, contributes the leading $N^{(d-1)/d} \log N$ asymptotic behavior, while the higher cumulants $n_{k>2}$ contribute only subleading behavior, $N$ being the total particle number. Thus the series essentially truncates.
In this case, we can view $\tilde k(\alpha) = (1 - \alpha) S_\alpha(A)$ as a well behaved generating function for the cumulants of the entanglement entropy / modular Hamiltonian. In particular, the entanglement entropy and the capacity of entanglement have expansions in the even cumulants $n_{2k}$; in these, $B_{2k}$ are Bernoulli numbers. From the results, we immediately see that if the particle fluctuations are Gaussian distributed ($n_{m>2} = 0$), the entanglement entropy is equal to its variance. Furthermore, in [44] it was noted that in the $N \to \infty$ limit the Rényi entropies remarkably satisfy $S_\alpha = \frac{1}{2}\left(1 + \frac{1}{\alpha}\right) S_{EE} + \cdots$, where $+ \cdots$ indicates corrections that vanish. A consequence of this is that $C_E = S_{EE}$ (3.69) in the $N \to \infty$ limit. Note that the dimension $d$ of the system was generic in the derivation. However, here the size of the subregion $A$ is independent of $N$ and is kept fixed. As a diversion, let us make some very speculative but perhaps inspirational remarks regarding systems that have holographic gravity duals. In this setting, we might make the following argument for invoking a large $N$ limit and Gaussian statistics. Holographic gauge/gravity duality is simplified in a limit where the bulk gravity system is classical, corresponding to a large $N$ limit in the dual theory at the AdS boundary, where $N$ is the number of constituent degrees of freedom. In the dual theory, one could then focus on the average number of degrees of freedom fluctuating in the subregion $A$, $N_A = \frac{1}{N} \sum_i N_A^{(i)}$, where $N_A^{(i)}$ is the particle number of the species $i$ in the subregion $A$. Viewing each $N_A^{(i)}$ as an independent random variable, not necessarily with identical distributions, under suitable conditions (e.g. Lyapunov conditions) the central limit theorem applies, and in the large $N$ limit the average particle number $N_A$ becomes Gaussian distributed. If the quantum entanglement is attributed to the average particle number fluctuations about the subregion, from the above one could deduce that $C_E = S_{EE}$.
An additional complication is that in gauge/gravity duality, one typically works in the large 't Hooft coupling limit so that the bulk spacetime is weakly curved. The dual theory on the boundary is then strongly coupled, so the concept of "particles" is lost and the definition of a "particle number" becomes less clear. Another caveat is that the definition of entanglement for gauge invariant observables is more complicated [45,46]. In Section 6.1, we study a system which can be mapped to noninteracting fermions. We consider ground state entanglement in a Heisenberg XY spin chain at different phases, with the ground state adjusting to the changes in the parameters. Its Rényi entropies were computed in [34,47], although they worked in a double scaling limit N → ∞, L → ∞ with L/N fixed, where L is the size of the subsystem. Therefore, the above large N result from particle number fluctuations is not directly applicable. In particular, we will find that the entanglement entropy is not always equal to the capacity of entanglement. This system will also work as an introduction to later sections where we study CFTs and their perturbations with relevant operators.

A simple example where capacity equals entropy
The large $N$ limit in the previous section led to the equality $C_E = S_{EE}$. Later, we will arrive several times at the same result when we discuss entanglement in CFTs. It is interesting to ask what probability distributions could lead to such an equality. Here is one concrete example. It will be convenient to write $\lambda = e^{-E}$ and to use a density of states $\rho(E)$. For a finite dimensional density matrix with eigenvalues $\lambda_i$, the density of states would be $\rho(E) = \sum_i \delta(E + \log \lambda_i)$. Then,

Gravity dual of the capacity of entanglement
Results in [7] can be used to find the gravity dual of the capacity of entanglement. This was done in [6], but here we present a simplified derivation of the result, and point out some additional properties.
Starting from the modular entropy (2.4), we can rewrite the capacity of entanglement as a first derivative of the modular entropy. The modular entropy $\tilde S_\alpha$ is given by the area of a suitably backreacted cosmic brane as in (2.5), and to determine it we need to consider the action for a brane coupled to gravity, with $h$ the induced metric on the brane. To leading order, the cosmic brane is just a minimal surface. To first subleading order, we need to take backreaction from the tension of the brane into account. The brane yields a source for the metric, and to first order in the source this affects the metric everywhere in spacetime through the bulk graviton propagator. It could also affect the matter fields to first order, in case the kinetic terms for the graviton and matter fields mix in this background. At higher order, we would need to include more complicated Witten diagrams to understand the backreaction. If we only consider linear metric fluctuations, the location of the minimal surface does not change, only its area changes. This is because the minimum area surface is an extremum of the area functional and therefore its location does not change under first order perturbations.
The first-order variation of the area is given by The field equation of (3.73) is where a simplified notation has been used. A more precise way to write the last term, using the explicit expression for the induced metric h ij in terms of the embedding x µ (σ i ) of the surface, In order to not clutter the notation, we will keep using the imprecise notation in (3.75) but write the final answer in a more accurate way. We need to solve (3.75) to first order in (α − 1). To leading order, the metric obeys the field equations in the bulk, and we need to linearize around the background in order to find the first order variation, g µν = g background with D some second order operator representing the kinetic term of the graviton. Therefore, if G is the graviton propagator in the background, we get Inserting this in the variation (3.74), we are left with a double integral over the minimal surface Notice that the indices of the graviton propagator are contracted with the metric on the brane, in other words we are effectively taking a trace, but only in the directions along the brane.
A more precise version of (3.80) in terms of the embedding $x^\mu(\sigma^i)$ is (3.81). This result was derived in [6], via a different route. We note in passing that the result (3.81) looks very familiar when phrased in terms of the (reduced) fidelity susceptibility associated with the flow (2.37) as in Section 2. Then (see footnote 8) one finds a relationship between response coefficient and fluctuations resembling that between the magnetic susceptibility and order parameter fluctuations in Landau-Ginzburg theory. The analogy is satisfying, viewing bulk gravity as an effective theory for the quantum theory on the boundary. One might have thought that one can simply decouple this piece of the graviton propagator by restricting to the conformal mode, but that would involve a trace over all indices, not just those along the brane. We have not attempted to compute (3.80) in an actual example but can relate this expression to CFT computations that have appeared previously in the literature [25,26] for the expansion of the Rényi entropy $S_\alpha$ around $\alpha = 1$ for a spherical entangling surface. We first notice the form of the first-order variation of the entanglement entropy itself, and therefore (3.81) can be interpreted as a two-point function where we view $\delta h_{ij}$ as a fluctuating bulk field. This clarifies more directly the relation between the bulk computation and the boundary computation in terms of correlation functions of the modular Hamiltonian $K$. Indeed, if we interpret the first law of entanglement entropy as an operator statement (see e.g. [48,49]), then the equivalence between $\langle \delta S_{EE}\, \delta S_{EE} \rangle$ and $\langle K K \rangle$ becomes manifest. In this way, we can also much more directly derive bulk expressions for the higher moments of $K$: they simply become higher order bulk correlators of $\delta S_{EE}$, and in this way one can perturbatively prove the result of [7].
The above computations also nicely illustrate the relation of quantum entanglement to gravity: we can see that entanglement fluctuations are interpreted as self-gravitation, with the two-point function (3.86) as the leading order integrated self-gravitation of the bulk surface 9 .

Direct CFT computation
As we just described, a more direct CFT computation of the capacity of entanglement and higher moments of the entanglement spectrum involves the computation of n-point functions of the modular Hamiltonian. For spherical regions, the modular Hamiltonian has a local expression (3.88) and the capacity of entanglement then involves a double integral of the 2-point function $\langle T_{tt}(x) T_{tt}(x') \rangle$. In the case $d = 2$, using a slightly ad hoc way to treat the divergent integral (see footnote 10), one obtains (3.93). In the above, we used the well-known formula $\langle T(z) T(0) \rangle \sim \frac{c}{2 z^4}$ and also set a UV regulator $\varepsilon = 2a/L$ by hand. In the first step, we did the $x$ integral assuming the contour had been shifted off the real axis. The second integral is then performed with an explicit cutoff.
9 Given the equality (2.46) of the capacity of entanglement, fidelity susceptibility and quantum Fisher information under modular flow, the corresponding bulk interpretations of the three should also be related. However, on the face of it, proposals for the latter two look quite different [17,18,50,51]. It would be interesting (but beyond the scope of this work) to understand in more detail the relation of the bulk interpretations when (2.46) holds.
One can systematically study higher cumulants and see that they are given by the terms that appear in the higher order correlation functions of the energy-momentum tensor. This is explored in quite some detail in [26], see also section 5.2; in particular, the relation between the second derivative of the Rényi entropy and the coefficients that appear in the three-point function of the energy-momentum tensor is worked out in that paper. That paper also discusses a particular regularization of the integrated correlation functions of the type above, but only in a simple example, and proposes to use dimensional regularization, whose connection to standard regulators is obscure.

State dependence of the capacity of entanglement
The capacity of entanglement depends on the choice of the original state of the system. In this section we study this dependence by considering various simple examples.

Simple n-qubit examples
We begin by studying simple n-qubit systems. Two-qubit systems were discussed in [52]. As a simplest example, consider the states $|\theta, \phi\rangle = \cos(\theta/2)|10\rangle + e^{i\phi} \sin(\theta/2)|01\rangle$ (4.94) and form a reduced density matrix by tracing over the other spin, $\rho_{\rm red} = \mathrm{diag}(\sin^2(\theta/2), \cos^2(\theta/2))$. The entanglement entropy grows monotonically towards the maximum, but the capacity of entanglement goes to zero both in the maximally entangled and separable state limits while reaching a maximum at a partially entangled state with $\cos(\theta/2) \approx 0.2885$, i.e. $\cos^2(\theta/2) \approx 0.083$ (Figure 1; see footnote 11). Moving to a three-qubit system, we consider the subspace $|\theta, \phi\rangle = \cos\theta\,|001\rangle + \sin\theta\,(\cos\phi\,|010\rangle + \sin\phi\,|100\rangle)$ (4.99) and form a reduced density matrix by tracing over one spin. In this case, we again see that the capacity of entanglement goes to zero in the limit of maximal entanglement and separable states, while reaching a maximum for partially entangled states (Figure 2). As these simple examples illustrate, the entanglement entropy and its fluctuations, characterized by the cumulants, have no reason to be equal, except for some special states. The capacity of entanglement characterizes the width of the eigenvalue spectrum of the density matrix. We study that by a simple example. Let us assume that the (reduced) system has $N$ levels, and we have moved to a diagonal basis. We then assume that the $N$ eigenvalues $\lambda_i$ of the (reduced) density matrix have a Gaussian distribution with variance $\sigma$, i.e. they form a Gaussian vector (an example is shown in Figure 3).
We can now interpolate from $\sigma = 0$, where the density matrix describes a pure state, to the $\sigma \to \infty$ limit giving a maximally mixed state. We then plot the (entanglement) von
11 Ref. [52] considered a slightly more general example, starting with the states with the normalization $\sum_{i=1}^{4} |a_i|^2 = 1$. The diagonalized reduced density matrix for one spin takes the same form as above, but [52] wrote it in terms of the concurrence $c$ of the 2-bit system, where $c = 2|a_1 a_4 - a_2 a_3|$.
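The behavior of the two-qubit family above can be reproduced with a short scan over $\theta$. This is a minimal sketch, assuming that the capacity is the variance of $-\log\lambda$ over the spectrum $\{\sin^2(\theta/2), \cos^2(\theta/2)\}$; the grid resolution and the function names are arbitrary choices made for illustration.

```python
import math

def spectrum(theta):
    """Eigenvalues of diag(sin^2(theta/2), cos^2(theta/2))."""
    s = math.sin(theta / 2.0) ** 2
    return [s, 1.0 - s]

def entropy_and_capacity(lams):
    """Return (S, C): mean and variance of -log(lambda); zeros are skipped."""
    nz = [x for x in lams if x > 0.0]
    S = -sum(x * math.log(x) for x in nz)
    C = sum(x * math.log(x) ** 2 for x in nz) - S * S
    return S, C

# Scan theta from 0 (separable) to pi/2 (maximally entangled).
thetas = [math.pi * k / 20000 for k in range(10001)]
theta_star = max(thetas, key=lambda t: entropy_and_capacity(spectrum(t))[1])
lam_star = min(spectrum(theta_star))       # smaller eigenvalue at the maximum
S_star, C_star = entropy_and_capacity(spectrum(theta_star))
print(theta_star, lam_star, math.sqrt(lam_star), S_star, C_star)
```

In this sketch the capacity vanishes at both ends of the scan and peaks where the smaller eigenvalue is close to 0.083, so that the corresponding amplitude is close to 0.2885.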

Random bipartite entanglement
One could imagine that the origin of entanglement and capacity of entanglement has to do with the fact that we are projecting a random pure state to a subsystem. It is therefore interesting to compute the expectation value of the entanglement entropy and capacity of entanglement in such a setting. More precisely, we are going to consider random pure states in a Hilbert space of dimension $pq$ of the form $H_q \otimes H_p$, projected to the subsystem with Hilbert space $H_p$ of dimension $p$ ($q \geq p$); for a review see e.g. [53]. In this way we will get a probability distribution of reduced density matrices, and we can compute the expectation values. The probability distribution for the eigenvalues $\lambda_i$ of the reduced density matrix is [54] $P(\lambda_1, \ldots, \lambda_p) = N_{p,q}\, \delta\big(1 - \sum_i \lambda_i\big) \prod_i \lambda_i^{q-p} \prod_{i<j} (\lambda_i - \lambda_j)^2$, where $N_{p,q}$ is a known normalization factor [55]. To compute the expectation value of the Rényi entropies, we need to compute the expectation value of $\lambda_i^\alpha$ under this probability distribution. The relevant integrals can be extracted from the results in [56]. For example, we find that when $p = q = 2$, $\langle \mathrm{tr} \rho^\alpha \rangle_{2,2} = \mathcal{N}\, \frac{2(\alpha^2 + \alpha + 2)\Gamma(\alpha + 1)}{\Gamma(\alpha + 4)}$ (4.101), from which we obtain $\langle S_{EE} \rangle_{2,2} = \frac{1}{3}$ and $\langle C_E \rangle_{2,2} = \frac{13}{36}$. We therefore see that for $p = q = 2$ the entanglement entropy and capacity are very close to each other. For arbitrary $p, q$ an exact answer for $\langle S_{EE} \rangle_{p,q}$ was conjectured in [57] and proven in [58], for $p \leq q$.
Similarly we can compute $\langle \mathrm{tr} \rho^\alpha \rangle_{2,q} = \mathcal{N}\, \frac{2(q - 2)!\,(\alpha^2 + \alpha + 2q - 2)\,\Gamma(\alpha + q - 1)}{\Gamma(\alpha + 2q)}$ (4.104), from which one can obtain for example $\langle S_{EE} \rangle_{2,3} = \frac{9}{20}$ and $\langle C_E \rangle_{2,3} = \frac{1169}{3600}$. Numbers quickly become unwieldy, but as we increase $p$ and $q$, in general the entanglement entropy increases and the capacity divided by the entanglement entropy decreases. This is consistent with the idea that as $p$ and $q$ increase, the reduced density matrix becomes more and more maximally mixed. To illustrate, for $p = 2$ and $q = 100$ we get $\langle S_{EE} \rangle_{2,100} \approx 0.686$, close to the maximal value $\log 2 \approx 0.693$, while $\langle C_E \rangle_{2,100} \approx 0.015$.
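The quoted $p = 2$ values can be checked with exact rational arithmetic. The sketch below assumes that the averaged entropy and capacity are obtained as $-\partial_\alpha$ and $\partial_\alpha^2$ of $\log \langle \mathrm{tr} \rho^\alpha \rangle_{2,q}$ at $\alpha = 1$, and uses $\Gamma(\alpha + q - 1)/\Gamma(\alpha + 2q) = \prod_{k=q-1}^{2q-1} (\alpha + k)^{-1}$, so that the $\alpha$-independent normalization drops out. The function name is hypothetical.

```python
from fractions import Fraction

def averages_p2(q):
    """Exact (<S_EE>, <C_E>) for p = 2 from log <tr rho^alpha> at alpha = 1.

    log f(alpha) = log(alpha^2 + alpha + 2q - 2)
                   - sum_{k=q-1}^{2q-1} log(alpha + k) + const
    """
    poly = Fraction(2 * q)                     # alpha^2 + alpha + 2q - 2 at alpha = 1
    ks = range(q - 1, 2 * q)                   # k = q-1, ..., 2q-1
    d1 = Fraction(3) / poly - sum(Fraction(1, 1 + k) for k in ks)
    d2 = (2 * poly - 9) / poly ** 2 + sum(Fraction(1, (1 + k) ** 2) for k in ks)
    return -d1, d2                             # entropy, capacity

for q in (2, 3, 100):
    S, C = averages_p2(q)
    print(q, S if q < 100 else float(S), C if q < 100 else float(C))
```

For $q = 2$ and $q = 3$ this reproduces the rational values quoted above, and for $q = 100$ the floating-point values agree with the quoted approximations.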
The case $p = q = 2$ is the case where entanglement entropy and capacity are nearly equal to each other. The ratio $C_E / S_{EE}$ decreases even when we keep $p = q$ and increase $p$ and $q$. Already for $p = q = 3$, where $\langle S_{EE} \rangle = 1669/2520$ and $\langle C_E \rangle = 2898541/6350400$, the ratio has decreased to $C_E / S_{EE} \approx 0.689$. As we briefly review below, this can be proved rigorously by studying the limit of large $p, q$ using random matrix theory [59,60].
The main observation of these studies is that a possible explanation for the approximate equality of S EE and C E in a system is that most of the entanglement is being carried by randomly entangled pairs of qubits (the case p = q = 2).

Random pure states and Wishart-Laguerre random matrices
We may also study the reduced density matrices of random pure states by using their connection to the Wishart-Laguerre ensemble of random matrices. We are not going to perform a mathematically rigorous analysis here (along the lines of [59]), but just quickly study whether the ratio $C_E / S_{EE}$ decreases to zero in the limits $p = q \to \infty$, and $p, q \to \infty$ with $p/q \to 0$, following the approach in [60]. In [60], the reduced density matrix was written in the form $\rho = W / \mathrm{Tr}\, W$, where $W \equiv Y Y^\dagger$ is a $p \times p$ random matrix belonging to the $\beta = 2$ Wishart-Laguerre ensemble. In the limit of large $p, q$ with $\alpha = p/q$ fixed, one may replace $\mathrm{Tr}(Y Y^\dagger) \to p$, so that the eigenvalues $\lambda_i$ of $\rho$ are related to the eigenvalues $\mu_i$ of the Wishart matrix $W$ by a simple rescaling, $\lambda_i = \mu_i / p$. This gives a quick way of computing the ensemble average of the Rényi entropy, by using the average spectral density of the Wishart ensemble. We start with $k(s) = \mathrm{Tr} \rho^s$ written as an integral over $\mu$ weighted by $\nu(\mu)$, where $\nu(\mu)$ is the spectral density of the Wishart matrix. Its ensemble average $\langle \nu(\mu) \rangle$ is given by the Marchenko-Pastur distribution [61] $\bar n(\mu) = \frac{\sqrt{(\mu - \alpha_-)(\alpha_+ - \mu)}}{2\pi \alpha \mu}$, where $\alpha_- \leq \mu \leq \alpha_+$, with $\alpha_\pm = (1 \pm \sqrt{\alpha})^2$, $\alpha = p/q$. The ensemble average $\langle k(s) \rangle$ thus becomes an integral over the Marchenko-Pastur distribution, (4.110). The general result from (4.110) for the entanglement entropy and the capacity of entanglement then follows. In the limit $\alpha = p/q \to 0$, the resulting expressions confirm the numerical guess. In the limit $p = q$, (4.110) gives $\langle k(s) \rangle = p^{1-s}\, \frac{1}{2\pi} \int_0^4 d\mu\, \mu^{s - 1/2} \sqrt{4 - \mu}$, in agreement with Eqn. (12) in [59], from which we obtain expressions also confirming the numerical guess $\langle C_E \rangle / \langle S_{EE} \rangle \to 0$ in the limit $p = q \to \infty$. We therefore see that a possible explanation for approximate equality of $S_{EE}$ and $C_E$ in a system is that most of the entanglement is being carried by randomly entangled pairs of qubits.

The capacity of entanglement in quantum field theories and universality

1+1 dimensional CFTs
We now move to consider conformal field theories in various dimensions. In particular, in 1+1 dimensions we can make use of very general analytical results for the Rényi entropies. Rényi entropies for generic CFTs have been computed in a variety of cases: for an initial vacuum state for infinite and finite systems, for thermal backgrounds for finite and infinite systems, and for non-equilibrium dynamics involving different quench protocols, notably global and local quenches (see footnote 13). A recent review is e.g. [64]. The computations lead to $\mathrm{Tr} \rho_A^\alpha = c_\alpha\, e^{-\frac{c}{12}\left(\alpha - \frac{1}{\alpha}\right) W_A}$ (5.116), where $W_A$ is a function of the size of the subsystem (and time, for quenched systems) but independent of $\alpha$, and $c_\alpha$ with $c_1 = 1$ are model specific constants that do not depend on the size $l$. The Rényi entropies $S_\alpha(A)$ are then $S_\alpha(A) = \frac{c}{12}\left(1 + \frac{1}{\alpha}\right) W_A + \frac{\log c_\alpha}{1 - \alpha}$. From the cumulant generating function $\tilde k(\alpha) = (1 - \alpha) S_\alpha$, we immediately obtain the general result for the capacity of entanglement, $C_E = S_{EE} + c_1' + c_1'' - (c_1')^2$ with $S_{EE} = \frac{c}{6} W_A - c_1'$ (5.118), and in particular, for large $W_A$, the last three terms on the right hand side are negligible. Explicit results for $W_A$ for various different situations are listed in [65]. For time independent cases with $A$ a finite interval of length $l$,
$W_A = \begin{cases} 2 \log l & \text{infinite system, } T = 0 \\ 2 \log\left(\frac{L}{\pi} \sin \frac{l\pi}{L}\right) & \text{finite system, size } L,\ T = 0 \\ 2 \log\left(\frac{\beta}{\pi} \sinh \frac{l\pi}{\beta}\right) & \text{infinite system, } T > 0 \end{cases}$ (5.119)
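The leading-order equality of entropy and capacity in these cases can be illustrated numerically. The sketch below assumes the standard single-interval form $\log \mathrm{Tr} \rho_A^\alpha = -\frac{c}{12}(\alpha - 1/\alpha) W_A$ with the non-universal constants $c_\alpha$ dropped, and assumes $S_{EE} = -\tilde k'(1)$ and $C_E = \tilde k''(1)$. Derivatives are taken by central finite differences; the function names and the sample values of $c$ and $l$ are illustrative.

```python
import math

def k_tilde(alpha, c, W):
    """Leading part of log Tr rho^alpha = (1 - alpha) S_alpha (assumed form)."""
    return -(c / 12.0) * (alpha - 1.0 / alpha) * W

def entropy_and_capacity(c, W, h=1e-4):
    """S_EE = -k'(1) and C_E = k''(1), via central finite differences."""
    kp, k0, km = k_tilde(1.0 + h, c, W), k_tilde(1.0, c, W), k_tilde(1.0 - h, c, W)
    S = -(kp - km) / (2.0 * h)
    C = (kp - 2.0 * k0 + km) / (h * h)
    return S, C

c = 1.0                      # central charge (example value)
W = 2.0 * math.log(100.0)    # infinite system at T = 0 with l = 100 in cutoff units
S, C = entropy_and_capacity(c, W)
print(S, C, c * W / 6.0)
```

Because $W_A$ enters only as an overall factor, the same equality holds for any of the listed $W_A$, including the time-dependent ones.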
For time-dependent cases with two semi-infinite intervals, some known results are $W_A = \begin{cases} \log\left(\frac{\beta}{\pi} \cosh(2\pi t/\beta)\right) & \text{global quench at } t = 0 \\ \log \frac{t^2 + \lambda^2}{\lambda/2} & \text{local quench at } t = 0 \end{cases}$ (5.120), with quench parameters $\beta$ (the resulting inverse temperature) and $\lambda$.
13 In Section 2 we discussed the relation of the Rényi entropies and the entanglement spectrum. A recent paper [63] studies the time evolution of the entanglement spectrum and the entanglement Hamiltonian after a quench.
It may be somewhat surprising that the leading order result C E = S EE holds in all of the above cases, even for the quenched nonequilibrium systems throughout their time evolution. The entangled states are rather special, prepared by using conformal mappings. Also, the result tells us that after the quench, the growth rate of the capacity of entanglement is the same as that of the entanglement entropy. The growth of the entanglement entropy is consistent with the interpretation where the spreading of entanglement is carried by quasiparticle pairs created at the quench and propagating ballistically through the system at the speed of light. It is not obvious at all that this picture should imply the same growth rate for the capacity of entanglement. Consider for example the following toy model. Suppose we quench a system and afterwards entanglement builds up because pairs of perfectly entangled quasiparticles are created. The reduced density matrix of a subsystem will then roughly consist of the original reduced density matrix combined with say n(t) qubits which are pairwise perfectly entangled with n(t) qubits outside the subsystem. The number of such qubits will be time dependent. We will model the reduced density matrix as where ρ U V represents the very UV degrees of freedom which were entangled, and remain entangled across the boundary of the region. The pure state ψ(t) represents the reservoir of states from which qubits are extracted as more and more entangled quasiparticles become relevant. This pure state lives in some Hilbert space of dimension D − 2 n(t) with a very large D. Details of the reservoir pure state are irrelevant. 
It is straightforward to compute the Rényi entropies for this density matrix, $S_\alpha(t) = S_\alpha(\rho_{UV}) + n(t) \log 2$. We then deduce that the entanglement entropy grows as a function of $t$, where the function $n(t)$ can be adjusted to the quasiballistic interpretation (with the characteristic long linear growth regime), but the capacity of entanglement stays constant, receiving contributions only from the UV, not from the created quasiparticles. Therefore, if the capacity of entanglement and the entanglement entropy grow identically after the quench, this is in tension with the simple ballistic propagation of created pairwise perfectly entangled quasiparticles which leads to the above model. Rather than expecting the quasiparticles to be pairwise perfectly entangled, we could expect them to be created in random pure states with some statistical distribution. In Section 4.2 we studied random pure states with bipartite entanglement between degrees of freedom inside and outside the subsystem. There we saw that it was indeed possible for the capacity of entanglement to be approximately equal to the entanglement entropy, for ensemble averages of randomly entangled pairs of qubits. On the other hand, in Section 3.2 we saw that such an equality also arises (in the large $N$ limit) from a more detailed connection between particle fluctuations into a subregion and its entropy. We conclude that the intuitive interpretation of entanglement being carried by ballistic propagation of quasiparticles needs to be refined with correct statistics for the distribution of entanglement.
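The toy model is simple enough to check directly. The sketch below assumes that the reduced density matrix is a tensor product of a fixed spectrum standing in for $\rho_{UV}$ (the three sample eigenvalues are arbitrary), $n$ maximally mixed qubits, and a pure reservoir state that contributes a single eigenvalue 1. It shows the entropy growing by $n \log 2$ while the capacity stays at its UV value. Function names are hypothetical.

```python
import math

def entropy_and_capacity(lams):
    """Return (S, C): mean and variance of -log(lambda); zeros are skipped."""
    nz = [x for x in lams if x > 0.0]
    S = -sum(x * math.log(x) for x in nz)
    C = sum(x * math.log(x) ** 2 for x in nz) - S * S
    return S, C

def toy_spectrum(uv, n):
    """Spectrum of rho_UV tensored with n maximally mixed qubits.

    Each UV eigenvalue is split into 2**n equal parts; the pure reservoir
    state multiplies every eigenvalue by 1 and is therefore omitted.
    """
    return [lam / 2 ** n for lam in uv for _ in range(2 ** n)]

uv = [0.7, 0.2, 0.1]                      # stand-in spectrum for rho_UV
S0, C0 = entropy_and_capacity(uv)
for n in range(5):
    S, C = entropy_and_capacity(toy_spectrum(uv, n))
    print(n, S - S0, C - C0)              # entropy shift n*log(2), capacity shift 0
```

A maximally mixed factor shifts every $-\log\lambda$ by the same constant, which changes the mean but not the variance; this is the reason the capacity does not grow in this model.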
Finally, we make some brief comments on the two interval case. Let the two intervals be A = [u 1 , v 1 ] and B = [u 2 , v 2 ]. The Rényi entropy for the two interval case has the general form [66] Trρ α A∪B = c α where c α is a non-universal constant and F α is a non-universal function of the cross ratio η. This non-universal function makes the analysis more complicated, so here we consider only the case F α = 1, such as in the massless fermion theory [67][68][69]. Then the Rényi entropy takes the same form as (5.116), and we readily obtain a similar relationship for the leading terms as in (5.118),

An alternative calculation in higher dimensions
In subsection 3.5 we discussed an alternative calculation to compute the capacity of entanglement for a 1+1 dimensional conformal field theory. In this section we discuss its generalization to higher dimensional CFTs. Suppose we consider the entanglement of a ball. Inserting the explicit form of the modular Hamiltonian, we obtain a suitably integrated two-point function of the energy-momentum tensor, in agreement with (3.86) and (3.87). Such types of integrated correlation functions were studied in [25,26]. The idea of the computation is as follows. A ball in a constant time slice is mapped to itself under a subgroup $SO(1, d - 1)$ of the conformal group, and the double integral is invariant under the action of this subgroup. We can use this symmetry to fix one of the points at the origin, leaving us with a single integral over the ball. This integral is still divergent, but because the modular Hamiltonian involves the integral of a conserved current over a fixed time slice, we can evaluate it on any suitable time slice, and in particular we can move the slice a little bit so as to avoid the origin. What is then left is a finite integral, which we still need to multiply by the volume of the gauge group that we fixed. This volume is the volume of $SO(1, d - 1)/SO(d - 1)$, which is the Euclidean hyperbolic space, because the origin is left fixed by an $SO(d - 1)$ subgroup of $SO(1, d - 1)$. This hyperbolic space is the same space that appears at the boundary of AdS if we map the Rindler wedge associated to the ball to a hyperbolic black hole as in [41].
It is clear from the above that the final result will only depend on the coefficient $C_T$ which appears in the two-point function of two energy-momentum tensors. The relevant computation of the integrated two-point function was studied in detail in [25] for arbitrary CFTs, and equation (1.3) therein gives the result (5.128), where the volume of hyperbolic space appears as we just explained, and we can interpret the result as the capacity of entanglement by $C_E = \Delta S_{EE}^2 = -2 S'_{q=1}$. The result is indeed proportional to the coefficient $C_T$ that appears in the two-point function of two energy-momentum tensors. For later use, we write the result in a different form. As also explained in [25], the $C_T$ above should not be confused with the coefficient $c$ appearing in the $d = 4$ trace anomaly of a CFT or the $\tilde C_T$ used in [42] in their expression for the Rényi entropies. They are proportional to each other, such that $\tilde C_T |_{d=4} = c$. Thus, from (5.128), we arrive at an expression for the capacity of entanglement in terms of $\tilde C_T$. This expression is similar to that of the entanglement entropy. When computing the entanglement entropy via the conformal mapping to hyperbolic space, the original computation in [41] obtained an expression involving $a^*_d$, where $a^*_d$ is a dimensionless constant, a central charge. In even dimensional spacetimes, this equals the coefficient of the A-type trace anomaly of the CFT. In odd dimensional spacetimes, $a^*_d \propto \log Z_{S^d}$, i.e. the logarithm of the partition function of the CFT on a unit sphere.
The regulated hyperbolic volume is given explicitly in (3.1) in [25] and is also easy to compute independently. The expression is sensitive to the choice of the regularization scheme, but even and odd d have a universal coefficient for the logarithmic and constant term, respectively. They are To conclude, the above computations suggest a simple universal ratio for the entanglement entropy and the capacity of entanglement, We discuss universality in more detail in Section 5.6.

The entanglement spectrum
The capacity of entanglement captures a particular feature of the full entanglement spectrum. Here we briefly review some aspects of the full entanglement spectrum and comment on the implications for the capacity of entanglement. We start with the two-dimensional case where the entanglement spectrum can be explicitly obtained in several cases. In particular, from (5.116) one can in principle extract the entanglement spectrum for 1+1 dimensional CFT's using an inverse Laplace transform. Of course, the precise answer will depend on the detailed structure of c α , but it is instructive to show the spectrum in an explicit example. For c α = 1/α, the inverse Laplace transform leads to the following result for the density of eigenvalues of the reduced density matrix [10,16] and one can verify explicitly that indeed It is perhaps more instructive to rewrite this in terms of an energy spectrum with λ = e −E .
It is interesting to see that the spectrum cuts off at a value E c which is determined by the UV cutoff that one needs to introduce in order to regulate the divergences in the computation of the Rényi entropies. One is almost tempted to view E c as some sort of analog of the Casimir energy. As we remove the cutoff, all eigenvalues of the density matrix are located in an ever smaller interval starting at λ = 0. In the above, we only considered the leading term of the Rényi entropy in terms of a cutoff, one can in principle do more precise computations taking the exact cutoff dependence into account (which involves a choice of boundary state) [13,16], but that is beyond the scope of this paper. Ultimately, in 1+1 dimensions, (5.137) follows from conformal invariance, but in a rather indirect way. It would be desirable to have a more direct understanding of the connection between conformal invariance and the various features of (5.137), as that might shed further light on the relation between C E and S EE in higher dimensions.
Expressions where $c_\alpha$ is some other integer power of $\alpha$ can in principle be obtained from (5.137) by differentiating or integrating with respect to $E$. The feature of (5.137) which does not depend much on these details is its large $E$ behavior. For large $z$, the asymptotic behavior of $I_\nu(z)$ is $I_\nu(z) \sim e^z / \sqrt{2\pi z}$. Keeping only the exponential, we therefore see the large $E$ behavior of $\rho(E)$, (5.139). Another interpretation of this expression is that it is the density of states for a 1+1 dimensional CFT on the plane. In higher dimensions, it maps to the density of states on the hyperbolic plane, but the one-dimensional hyperbolic plane is just the ordinary line. Actually, according to [41], the causal diamond attached to the spatial interval $r \in [-l/2, l/2]$ maps to the plane under the change of coordinates (5.140), where, for an interval of length $l$, $R = l/2$. Putting a UV cutoff at $r = \pm(R - \epsilon)$ corresponds under this change of coordinates to $u \approx \pm \log(l/\epsilon)$ for small $\epsilon$. The length of the $u$-interval is therefore precisely what we called $W_A$. We therefore need to compute the density of states on the plane for a box of size $W_A$ and relate the energy on the plane to the energy on the causal diamond. The change of coordinates (5.140) has a Schwarzian derivative of $-1/2$. To get the full shift in the energy we need to integrate this over the spatial box. Therefore the full shift in the left- or right-moving energy is $c W_A / 24$. This leads to a shift in the total energy. Moreover, the energy spectrum for a box of size $W_A$ is quantized in units of $\sim 2\pi / W_A$, and there is an extra factor of $2\pi$ relating the energy to $L_0$; taking both into account in the Cardy formula explains the form of (5.139) from the usual Cardy formula.
In higher dimensions, we can study features of the entanglement spectrum whenever the CFT has a holographic dual, since the Rényi entropies can then be obtained from the thermodynamics of hyperbolic black holes [42]. The density of states will now obey which implies in particular that $\rho(E) = 0$ for $E < E_c$. It is tempting to interpret $E_c$ once more as some sort of Casimir energy. The high-energy behavior of $\rho$ is determined by the behavior of $f(\alpha)$ as where $c_d$ is some constant which only depends on the dimension $d$. The scaling with $E$ is exactly what one expects generically for a CFT at high energy/temperature; this simply follows from extensivity of the free energy. Corrections to (5.148) can be obtained from subleading terms in the expansion near $\alpha = 0$. The first subleading term scales as $\alpha^{3-d}$ which gives rise to corrections to (5.148) of the form It is not clear whether these results also hold for more general CFTs without a holographic dual. We also observe that the capacity of entanglement is related to the behavior of $f(\alpha)$ near $\alpha = 1$, whereas the cutoff energy $E_c$ and the high-energy behavior are related to the large and small $\alpha$ behavior of $f(\alpha)$. A priori, these are unrelated to each other, and therefore the capacity of entanglement does not in general make any prediction for the behavior of the density of states at either high or low energy. If the theory has a holographic dual we know the explicit form of $f(\alpha)$, which only depends on the dimension $d$, and there are some simple factors of order unity which relate the behavior for small $\alpha$, $\alpha$ of order 1, and large $\alpha$. We also confirm once more that We notice that in the above cases not just the capacity of entanglement, but all Rényi entropies obey an area law. This implies that the function $f(\alpha)$ in (5.144) has an area law and scales as $a^{2-d}$ where $a$ is a short-distance UV cutoff. This translates into the following leading cutoff dependence of the density of states where $\tilde{\rho}$ does not depend on the cutoff.
This is precisely the scaling one would get from a local Rindler point of view. Therefore, the area law for capacity of entanglement (and more generally for the Rényi entropies) seems to arise from the fact that these quantities are dominated by UV degrees of freedom localized near the entangling surface, which in turn can be well approximated by local Rindler modes.

The capacity of entanglement in holographic systems
Entanglement Rényi entropy was considered for systems with a holographic dual with a black hole in [42]. Their method was to relate Rényi entropies of spheres to thermal entropies in hyperbolic spacetime at a range of temperatures. This method is applicable only for systems with bulk duals that admit black hole solutions at various temperatures. After obtaining the Rényi entropies, the computation of entanglement entropy and the capacity of entanglement are straightforward. All the holographic Rényi entropies are proportional to the volume of hyperbolic space, which contains all the divergent terms. As discussed in [42], the ratio S α /S 1 of the Rényi entropies then yields universal information characterizing the dual CFTs. In similar spirit, we choose to study the ratio of the entanglement entropy and the capacity of entanglement, another universal constant (at least for spherical entangling surfaces).
In holographic duals of Einstein gravity, the entanglement entropy and the capacity of entanglement are equal, where $\mathrm{Vol}\,H^{d-1}$ denotes the volume of the hyperbolic space. Here $\tilde{L}$ is the curvature scale of the dual AdS spacetime. In Einstein gravity, it equals the curvature scale in the cosmological constant term, but this is not the case for higher curvature theories. In Gauss-Bonnet gravity, the ratio becomes more interesting. First, for $d = 4$, the bulk and boundary theories have two central charges, $a$ and $c$, that appear in the boundary Weyl anomaly. The entanglement entropy and the capacity of entanglement are so that in $d = 4$ the gravity calculation indeed produces the same ratio (5.133) as the field theory calculation. A priori, one might not have expected any constraints for the ratio of the two. However, the ratio of the CFT central charges is restricted by the "conformal collider bounds" proposed in [27] and proven in [28], implying a bound for unitary 4d CFTs with gravity duals. The central charges $a$ and $c$ are known explicitly for some field theories. It is interesting to compute the ratio even when the theories are not holographic: one may ask if the holographic "prediction" (5.153) still holds. For example, for conformal free scalar field theory and massless free Dirac fermion theory the ratios of $c$ and $a$ are [70] $c_{\rm scalar}/a_{\rm scalar} = 3$, $c_{\rm fermi}/a_{\rm fermi} = 18/11$. (5.154) For general $d \geq 4$ the theory is parametrized with $\lambda$ that is the coefficient for the 4-dimensional Euler density term [42], In this case one finds so that the ratio $C_E/S_{EE}$ is again a universal $\lambda$- and $d$-dependent constant. Here, $\tilde{L}$ is proportional to the $L$ in the action,
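The free-field ratios in (5.154) can be reproduced from the standard trace-anomaly coefficients. The sketch below assumes the common normalization in which a real conformal scalar has $a = 1/360$ and $c = 1/120$; that normalization, the dictionary of fields, and the quoted window $1/3 \le a/c \le 31/18$ for the conformal collider bounds are supplied here for illustration and are not taken from the text.

```python
from fractions import Fraction

# Trace-anomaly coefficients (a, c), in the assumed normalization where a
# real conformal scalar has a = 1/360 and c = 1/120.
FREE_FIELDS = {
    "conformal scalar": (Fraction(1, 360), Fraction(1, 120)),
    "massless Dirac fermion": (Fraction(11, 360), Fraction(1, 20)),
}

# Commonly quoted conformal collider window for a/c in unitary 4d CFTs
# (an assumption of this sketch, not a formula reproduced from the text).
A_OVER_C_MIN = Fraction(1, 3)
A_OVER_C_MAX = Fraction(31, 18)


def c_over_a(name: str) -> Fraction:
    """Exact ratio c/a for one of the free fields listed above."""
    a, c = FREE_FIELDS[name]
    return c / a


def inside_collider_window(name: str) -> bool:
    """Check whether a/c lies in the assumed window [1/3, 31/18]."""
    a, c = FREE_FIELDS[name]
    return A_OVER_C_MIN <= a / c <= A_OVER_C_MAX


if __name__ == "__main__":
    for field in FREE_FIELDS:
        print(field, c_over_a(field), inside_collider_window(field))
```

Exact rational arithmetic is used so that the ratios can be compared with (5.154) without rounding.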

Violation of the area law
It has been known for some time that the area law of entanglement entropy is violated in some systems; we point out that the same is true for the capacity of entanglement. Apart from the logarithmic scaling in 1+1 dimensional conformal field theories, the most common example is provided by critical systems with a finite Fermi surface such as free fermions or Fermi liquids [71][72][73]. The Rényi entanglement entropy of these systems has been computed to the leading order in [74] $S_\alpha = \left(1 + \frac{1}{\alpha}\right) \cdots$ where the integrals are taken over the entangling surface and Fermi surface, the $n$'s are their unit normal vectors and $L$ corresponds to the effective system length. The capacity of entanglement again tracks the entanglement entropy, with $C_E = S_{EE}$, both scaling as $L^{d-1}\log(L)$.
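The equality $C_E = S_{EE}$ follows from the $\alpha$-dependence $(1 + 1/\alpha)$ alone. The sketch below assumes the conventions $S_{EE} = -k'(1)$ and $C_E = k''(1)$ for $k(\alpha) = (1-\alpha)S_\alpha = \log \mathrm{Tr}\,\rho^\alpha$, and lumps all $\alpha$-independent factors into a hypothetical constant `s`; the derivatives are taken by finite differences.

```python
def log_trace_rho_alpha(alpha: float, s: float = 1.0) -> float:
    """k(alpha) = (1 - alpha) * S_alpha for the assumed form S_alpha = (1 + 1/alpha) * s."""
    return (1.0 - alpha) * (1.0 + 1.0 / alpha) * s


def entropy_and_capacity(k, h: float = 1e-3):
    """Return (S_EE, C_E) = (-k'(1), k''(1)) using central finite differences."""
    first = (k(1.0 + h) - k(1.0 - h)) / (2.0 * h)
    second = (k(1.0 + h) - 2.0 * k(1.0) + k(1.0 - h)) / (h * h)
    return -first, second


if __name__ == "__main__":
    s_ee, c_e = entropy_and_capacity(log_trace_rho_alpha)
    # Both values should agree up to finite-difference error.
    print(s_ee, c_e)
```

Analytically, $k(\alpha) = (1/\alpha - \alpha)\,s$ gives $-k'(1) = 2s$ and $k''(1) = 2s$, so any Rényi entropy of this form has equal entanglement entropy and capacity at leading order.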

On the universality of the ratio of entanglement entropy and the capacity of entanglement
We have already seen in many field theories that the leading order of divergence of both the entanglement entropy and the capacity of entanglement are the same. It's interesting to ask if the ratio of the leading terms, computed using the same regularization scheme, is universal or if it is scheme dependent. At first thought, there is no immediate reason why there would not be some universality. After all, they have the same power dependence on the regularization parameter. However, the choice of the regularization scheme could contain some hidden dependence on the quantity we are computing, spoiling the universality. We demonstrate this below by two cases, free massless scalar fields and fermions.
First, we consider the free massless scalar field theory with both $d = 3$ and $d = 4$. The Hamiltonian of the theory is $H = \frac{1}{2}\int d^{d-1}x\,\left[\pi^2 + (\nabla\phi)^2\right]$ with the standard commutation relations. The entangling surface is a sphere. As the first way of regularizing the theory, we expand the scalar field $\phi$ and its conjugate momentum field $\pi$ in Fourier modes for $d = 3$ (and in spherical harmonics for $d = 4$), and discretize the remaining radial integral to a sum. The entanglement entropy and variance can be expressed in terms of correlation matrices inside the sphere. This method of computation was originally used in [75] and a more detailed explanation can be found in [76].
Using these methods, we performed a quick numerical computation of the entanglement entropy and the capacity of entanglement at various radii and found that These ratios are only approximate. The Rényi entropy of this model in the four dimensional case was considered to greater accuracy in [77] using the same regularization scheme and can be used to confirm our approximate ratio. As the second, alternative regularization scheme we consider the heat kernel method. In this way, the Rényi entropy of the free massless scalar field theory for a spherical region was also computed analytically in [78]. The effective action, $W_n = -\log Z_n$, of the $n$-sheeted spacetime is where the regularization is done by cutting off the lower limit of the integral. The heat kernel is defined as $K(s, n, X, X') = {}_n\langle X|e^{-sD}|X'\rangle_n$ (5.162), where $D$ is the field operator of the theory. The trace itself is With these, it can be seen that the ratio for the leading terms, exactly, for all $d$. This is in clear disagreement with the previous result (5.160). We conclude that at least for the free massless scalar field theory, the ratio of the leading terms is scheme dependent. While the universality fails for generic field theories, we may restrict our consideration to conformal field theories. An alternative regularization scheme valid for all Rényi entropies of conformal field theories is provided by the holographic computation technique used in [41,42] and discussed above. We can test universality by the following criterion. We separately expand the entanglement entropy $S_{EE}$ and the capacity of entanglement $C_E$, and then compare one by one the ratios of the leading terms and the ratios of subleading terms at the same order in the expansion. If universality holds, the ratios of respective terms in the expansions should all be the same in all regularization schemes.
Conversely, if we compute in one regularization scheme the ratio of say, the leading terms, and in a different regularization scheme the ratio of subleading terms (at same order), and find a different ratio in the two schemes, universality does not hold. We show an example below.
Let us consider free massless fermions in $d = 3$, a conformal field theory. For the leading terms, we first perform a numerical computation using similar methods as above, specified in [76]. The numerical computation leads to which implies that the ratio of the leading terms is approximately 2.9.
Next we consider subleading terms, and a different regularization scheme. For $d = 3$, the expansion of Rényi entropies contains a universal constant term, related to topological properties of the system. For entanglement entropy, this is known as the $F$-term (or the negation of it). The $F$-term has been shown to be equal to the constant term in the free energy of the system restricted to a unit sphere, $S^3$. This has been computed for free massless fermions in e.g. [79]. The corresponding term for the capacity of entanglement has been computed e.g. in [25]. The ratio of these two terms is approximately 1.4, in disagreement with the ratio of leading terms.
Therefore, we can conclude that at least for generic conformal field theories, the ratios of coefficients in the expansions of entanglement entropy and capacity of entanglement are not in general universal. Narrowing the set of theories further, we could consider CFTs which have a gravity dual. In these cases, geometry may provide a natural regularization scheme. Indeed, in sections 5.2 and 5.4 we saw that both the entanglement entropy and the capacity of entanglement were proportional to the hyperbolic volume factor containing the divergences, leading to a clean ratio of the two. Regarding the interesting question of which CFTs then have gravitational duals, we are led to speculate that perhaps such natural ratios in CFTs are a hint of a dual gravitational interpretation.

On the shape dependence of the capacity of entanglement
Much of the discussion has focused on spherical or planar entangling surfaces. As a natural generalization, we briefly comment on the shape dependence of the capacity of entanglement.
For general shapes in d > 2 quantum field theories, the coefficient of the leading divergent term of the Rényi entropies is proportional to the area of the entangling surface and non-universal. The subleading terms have more variety and, in general, the expansion is different from the spherical case. Much work has been carried out to understand the shape dependence of entanglement Rényi entropies, and some universal results are known.
A well-studied case is that of 4-dimensional CFTs, for which the universal log term can be expressed in terms of integrals over the entangling surface $\Sigma$ [80], $S^{\rm univ}$ Here $y$ and $h$ are the coordinates of the entangling surface and the induced metric, respectively, and $R_\Sigma$, $K$, and $C^{ab}{}_{ab}$ are the intrinsic Ricci scalar of $\Sigma$, the extrinsic curvature of $\Sigma$, and the contraction of the Weyl tensor along coordinates orthogonal to the entangling surface. The functions $f$ depend only on the physical data of the CFT and $n$. The function $f_a$ can be computed by mapping the sphere to hyperbolic spacetime. On the other hand, $f_c$ and $f_b$ can be studied using deformations of spherical entangling surfaces to first and second order, respectively [81]. In general, it is known that $f_a(1) = a$, $f_b(1) = f_c(1) = c$. The universal log term of the entanglement entropy is thus In comparison, the universal log term of the capacity of entanglement for holographic theories is [81] $C^{\rm univ}$ The ratio $C^{\rm univ}_E/S^{\rm univ}_{EE}$ thus depends on $a$, $c$ and the geometric quantities. Similar expressions with integrals over local geometric quantities for the universal terms should exist for other even-dimensional CFTs [82]. For odd dimensions, this is not possible for the universal constant term, making the computation of the universal term more challenging. For spherical entangling surfaces of CFTs with gravity duals, we saw that there was a natural choice of UV regulation such that the entanglement entropy and the capacity of entanglement were proportional to each other. It would be interesting to see whether deformed shapes would also have similarly natural choices for UV regulation.

The capacity of entanglement under perturbation with relevant operators
In previous sections we studied conformal field theories, and in particular in 1+1 dimensions we found that the leading terms of the first two cumulants, the capacity of entanglement $K^c_2 = C_E$ and the entanglement entropy $K^c = S_{EE}$, were equal. In this section we will study what happens when the theory develops a mass gap or, more generally, is deformed by relevant operators that break the conformal invariance.
As a warm-up example, we study the anisotropic Heisenberg XY model. We find that the relationship $C_E = S_{EE}$ is broken when the parameters are moved away from the critical domains and the equivalent free fermion system develops a mass gap. We will also study how the divergence structure of the capacity of entanglement changes under perturbations away from criticality. The first order perturbation will contain a new universal $\log^2$-divergent term.
After this concrete example, we will perform a more general analysis and study CFTs perturbed with relevant operators. We will consider two different entangling surfaces, planar and spherical. The strategy that we use follows the work of [29,31].

The capacity of entanglement in the anisotropic Heisenberg XY spin chain
The anisotropic Heisenberg XY spin chain is a nice example of a system with a nontrivial phase structure. The system can be mapped to noninteracting fermions, with critical domains having a gapless spectrum, but elsewhere the spectrum is gapped. Using analytical results by Korepin et al. [34,47] on Rényi entropy in the anisotropic Heisenberg spin chain, we compare $S_{EE}$ with the capacity of entanglement. We find that at criticality $C_E = S_{EE}$, but moving away from the critical domains causes deviations from this relation.
The Hamiltonian of the model is where $\gamma$ parameterizes anisotropy, and $h$ is the external magnetic field. The phase diagram of the model has three regions, with the critical lines $\gamma = 0$, $h \leq 2$ where it becomes isotropic (the XX-model), and $h = h_c = 2$ corresponding to the critical value of the magnetic field. The line $h = h_f(\gamma) = 2\sqrt{1-\gamma^2}$ separating the regions 1a, b is not a phase transition, but the entanglement entropy has a weak singularity on it, and the ground state is doubly degenerate with a basis given by two product states.
Korepin et al. computed the entanglement and Rényi entropies for a spin chain of $N$ spins, considering the system in its ground state, separating a subsystem $A$ of $L$ spins, and considering the double scaling limit $N, L \to \infty$ with $L/N$ fixed. For the Rényi entropy (with the reduced density matrix $\rho_A$ and exponent $\alpha$) they obtained with the elliptic modulus parameter and its complement $k' = \sqrt{1-k^2}$, and the nome $q = e^{-\pi I(k')/I(k)} \equiv e^{-\pi\tau_0}$ (6.175) where $I(k)$ is the complete elliptic integral of the first kind. Either by taking the limit $\alpha \to 1$, or computing the first derivative of the generating function $\tilde{k}(\alpha) \equiv (1-\alpha)S_R(\rho_A, \alpha)$, the entanglement entropy becomes The result for $h > 2$ comes from an expression where the latter series can be found from Abramowitz and Stegun in closed form with the elliptic integrals [87] to recover the result in the above. In region $h > 2$, the capacity of entanglement, from the double derivative of the generating function, becomes We have not identified the closed form for the infinite series, but it appears clear that in general the capacity of entanglement differs from $S_{EE}$. We can study this in more detail near the critical phases, to find that at criticality $C_E = S_{EE}$ while they begin to deviate moving away from the critical phase. For $\gamma = 0$, near the phase boundary at $h_c = 2$, for the leading and next-to-leading order contributions we obtain where the expansion parameter is the inverse of the relevant length scale $\xi$, The leading term in the entanglement entropy matches with the result for a conformal field theory with $c = 1/2$, We can now investigate how the deviation from $C_E = S_{EE}$ happens in the vicinity of the critical line. The difference of the two, $S_{EE} - C_E$, is depicted in Figure 3A. In this region $S_{EE}$ grows faster than the capacity of entanglement $C_E$.
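The nome $q = e^{-\pi I(k')/I(k)}$ entering these results is straightforward to evaluate numerically. The following is a minimal sketch that computes the complete elliptic integral of the first kind through the arithmetic-geometric mean; the modulus $k$ is treated as a free input with $0 < k < 1$, since its expression in terms of $\gamma$ and $h$ is not reproduced here, and the function names are illustrative.

```python
import math


def agm(a: float, b: float) -> float:
    """Arithmetic-geometric mean of two positive numbers."""
    for _ in range(60):  # quadratic convergence; 60 steps is far more than needed
        if abs(a - b) <= 1e-15 * abs(a):
            break
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return 0.5 * (a + b)


def complete_elliptic_first_kind(k: float) -> float:
    """I(k) = int_0^{pi/2} dt / sqrt(1 - k^2 sin^2 t) for 0 <= k < 1, via the AGM."""
    return math.pi / (2.0 * agm(1.0, math.sqrt(1.0 - k * k)))


def nome(k: float) -> float:
    """Nome q = exp(-pi I(k') / I(k)) with k' = sqrt(1 - k^2), for 0 < k < 1."""
    k_prime = math.sqrt(1.0 - k * k)
    ratio = complete_elliptic_first_kind(k_prime) / complete_elliptic_first_kind(k)
    return math.exp(-math.pi * ratio)


if __name__ == "__main__":
    for k in (0.1, 0.5, 1.0 / math.sqrt(2.0), 0.9):
        print(k, nome(k))
```

At the self-dual point $k = k' = 1/\sqrt{2}$ the two integrals coincide, so $\tau_0 = 1$ and $q = e^{-\pi}$, which gives a convenient check of the implementation.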
On the other hand, near the isotropic critical line $\gamma = 0$, for $h < 2$, we find with the relevant inverse length scale so the leading term in $S_{EE}$ matches with the CFT result (6.182) with $c = 1$. Figure 3B depicts $S_{EE} - C_E$ in this case. In this region $C_E$ grows faster than $S_{EE}$. The saddle seen in figure 5 likely reflects the boundary $h_f(\gamma) = 2\sqrt{1-\gamma^2}$ separating the phase regions 1a and 1b, but a full matching would require a higher order calculation than (6.179). From the approximate results it appears that $S_{EE}$ is larger in the vicinity of the $h_c = 2$ critical line, while $C_E$ is larger in the vicinity of the $\gamma = 0$ critical line.

Field theories perturbed with relevant operators
Much of the initial discussion here has already appeared in [30]. Let $M$ be a $d$-dimensional Euclidean manifold. Consider the ground state $|0\rangle$ and the corresponding density matrix $|0\rangle\langle 0|$. We partition the manifold into $V$ and $\bar{V}$ and take a trace of the density matrix over the degrees of freedom in $\bar{V}$, obtaining the reduced density matrix in $V$, $\rho_V$. Inspired by the replica trick, we consider a cut $C$ and its two sides, $C_+$ and $C_-$. The density matrix $\rho$ can be expressed as a path integral Now, we let $I$ be an action with a small perturbation by operator $O$, i.e.
Note that the action governs the whole system, so the integral is over the whole manifold $M$, not just $V$. In the limit of small $g$, we expand with where $\langle\cdots\rangle_0$ is the vacuum expectation value in the unperturbed theory. To first order, where $\rho_0$ and $\rho_{V,0}$ are the unperturbed vacuum density matrices of the whole system and the subsystem, respectively. The vacuum expectation value of $O$ vanishes whenever $O$ is a pseudoprimary operator. We assume this to be the case from now on. In addition, we focus on operators with scaling dimension $\Delta > d/2$, as the perturbative expansion of CFTs fails whenever $\Delta \leq d/2$ [88]. To compute the Rényi entropies, we need The first order correction to the entanglement entropy is then (6.192) and to the capacity of entanglement, In the case of a QFT with a planar entangling surface or a CFT with a spherical entangling surface, $K$ is known and expressible as an integral of the energy-momentum tensor. As a final remark, the last term in the perturbation of variance can be ignored as its contributions will cancel with the normalization constants of the former terms.

Planar entangling surface
We consider first a planar entangling surface $\Sigma = \mathbb{R}^{d-2}$, at $x_1 = 0$, $x_0 = 0$. For any Cauchy surface $A$ that ends at the entangling surface, the modular Hamiltonian (for whose sign we follow the conventions of [30]) is (6.194) where $x_0$ and $x_1$ are the coordinates transverse to the entangling surface and $y$ are the coordinates parallel to $\Sigma$. In addition, $\xi = x_1\partial_{x_0} - x_0\partial_{x_1}$ is a Killing vector that keeps the entangling surface invariant and $n$ is the normal vector to the Cauchy surface. The usual choice for the Cauchy surface is to set $x_0 = 0$ which yields (6.195) Due to the freedom in choosing the Cauchy surface, we can also write $K$ as
In addition, we can also rewrite the integral of the $O$ contribution in the $KO$ term as [31]

Massive free scalar field theory in four Euclidean dimensions
As in [29], we start with the free scalar field theory, with a mass term as a relevant deformation: $g = -\frac{m^2}{2}$ and $O(x) = \phi^2(x)$. The stress tensor of the massless theory is Using Wick contractions, we can compute the terms appearing in the variation of variance. The two-point function of the scalar fields is (6.199) We point the reader to the appendices for computational details. When computing the integrals, we will run into UV or IR divergences. These emerge when integrating perpendicular to the entangling surface. We will regulate these with $\epsilon$ and $1/m$ respectively. Summing all the contributions, we get $\log(\delta m)$ (6.201) so in $d = 4$ the first order correction to the capacity of entanglement $\Delta S^2_{EE}$ is In the above, $A_\Sigma$ is the area of the entangling surface. Interestingly, the first order perturbation to entanglement entropy has the form Thus, when $d = 4$, the correction terms are equal for entanglement entropy and the capacity of entanglement.

A general CFT
The computation for a general CFT is more involved. We need the general forms of $\langle T_{\mu\nu} O\rangle$ and $\langle T_{\mu\nu} T_{\alpha\beta} O\rangle$ for any CFT. Due to the tracelessness and sourcelessness of the energy-momentum tensor, the two-point function automatically vanishes for any CFT. Thus, the entanglement entropy receives no correction at first order in the coupling $g$. This is not the case for the capacity of entanglement. The computation requires some effort but is, nevertheless, straightforward. We leave the computational details to appendices and move on to the results. We consider two special cases. In the following, $a_3$ is a multiplicative numerical factor appearing in the three-point function. In most cases, we do not know any physical interpretation for it.
First, for $\Delta = 2$ and all $d$ we get a logarithmic contribution. In this case, the general first order correction to the capacity of entanglement is where $\Lambda$ is an IR regulator and $a_3$ is a numerical coefficient. Interestingly, the result has an IR divergence only for even $d \geq 4$, in agreement with the breakdown of CFT perturbations for $\Delta \leq d/2$. For $d = 2$, the correction vanishes. In fact, $\langle T_{\mu\nu} T_{\alpha\beta} O\rangle = 0$ at $d = 2$, so just like $S_{EE}$, $C_E$ receives no corrections at leading order, for all $\Delta$ for which the perturbative approach is valid.
Second, we consider the weight ∆ = d − 2 which corresponds e.g. to a mass term in a scalar field theory. The first order correction is Once again, we see the emergence of the divergences near d = 4. However, the divergence for d = 2 is not a real divergence as the three-point function is zero. The expression vanishes when d = 3.
The overall numerical factor $a_3$, for a conformally coupled scalar field theory with a mass perturbation $g = -\frac{m^2}{2}$, has the value To summarize, in these special cases we find divergent corrections $\delta C_E$ at leading order in $g$. The exceptions are $d = 3$, $\Delta = 1$ and $d = 2$, $\Delta > 1$, where $\delta C_E = 0$. In particular the equality $C_E = S_{EE}$ receives no corrections at leading order in $g$.

Spherical entangling surfaces of a CFT
There is a conformal mapping from the Rindler wedge to the causal diamond of a ball. Hence, for a spherical entangling surface of radius $R$ in Euclidean spacetime, the modular Hamiltonian for a CFT is where $B'$ is any Cauchy surface that ends at the spherical entangling surface. Once again, $n$ is the normal vector to the Cauchy surface and $\xi = \frac{1}{2R}\left(R^2 + x_0^2 - r^2\right)\partial_0 + \frac{x_0 r}{R}\partial_r$ is a conformal Killing vector that keeps the entangling surface at $x_0 = 0$, $r = R$ invariant. With the simple choice $B' = B$, the modular Hamiltonian is When computing the effects of the perturbation, only the $KKO$ term is found to be non-zero at first order. We can recycle some of the computations done for the planar entangling surface. We leave many of the details to the appendices. The correction to the capacity of entanglement is The integral over the $x_3$ coordinates is essentially the same as for the planar entangling surface done in A.2.
After the integral over $x_3$, we are left with where $a_3$ is the numerical factor that already appeared in the planar computations and $f$ is a rational function whose specific value for arbitrary $d$ and $\Delta$ is given in (A.44). The gamma functions capture the possible IR divergences for $\Delta \leq d/2$ (where the perturbative approach breaks down [88]). In addition, relevant perturbations have $\Delta \leq d$. What remains is the computation of the integral $I_d$, Further details on evaluating this integral can be found in appendix B. The perturbation has a universal logarithmic term whenever $\Delta$ is an even number. It is For other values of $\Delta$, there is a universal constant term, which is (6.215) Once again, we consider an example, $d = 4$, for which To summarize, we have isolated explicit expressions for universal perturbative corrections to $C_E$ in $d = 4$. Similar results can be obtained in other dimensions from (6.211).

Discussion and outlook
In this paper we discussed several aspects of capacity of entanglement C E , which encodes a particular feature of the set of eigenvalues of a reduced density matrix. It can roughly be thought of as the variance of the eigenvalue distribution, and can be obtained in a straightforward way from the (analytically continued) Rényi entropies. It may have interesting divergences across quantum phase transitions (in which case particular critical exponents may appear) but we have not studied this particular aspect in this paper.
If the reduced density matrix is viewed as a thermal density matrix, and we are allowed to change its 'temperature' by considering a suitable one-parameter flow (the "modular flow" ρ(θ) = ρ 1+θ /Tr(ρ 1+θ )), then we can relate capacity of entanglement to the ordinary heat capacity defined with respect to this fiducial temperature. At the same time this allows us to connect to various information theoretic quantities like fidelity susceptibility and Fisher information. Unfortunately, in an actual generic physical subsystem there is no natural operation corresponding to this change in 'temperature' and therefore these relations remain somewhat academic.
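The relation between the capacity of entanglement and a heat capacity along the flow $\rho(\theta) = \rho^{1+\theta}/\mathrm{Tr}(\rho^{1+\theta})$ can be illustrated on a finite spectrum. The sketch below assumes that the fiducial inverse temperature is $\beta = 1 + \theta$, so that the heat capacity at $\theta = 0$ is $-dS/d\theta$; the function names and the sample spectrum are illustrative.

```python
import math


def flowed_spectrum(p, theta: float):
    """Eigenvalues of rho^(1+theta) / Tr rho^(1+theta) for a spectrum p."""
    weights = [x ** (1.0 + theta) for x in p]
    norm = sum(weights)
    return [w / norm for w in weights]


def entropy(p) -> float:
    """Von Neumann entropy -sum p log p of a spectrum."""
    return -sum(x * math.log(x) for x in p if x > 0.0)


def capacity(p) -> float:
    """Variance of -log p in the distribution p (second cumulant of the spectrum)."""
    mean = sum(x * (-math.log(x)) for x in p if x > 0.0)
    second = sum(x * math.log(x) ** 2 for x in p if x > 0.0)
    return second - mean * mean


def heat_capacity_from_flow(p, h: float = 1e-4) -> float:
    """-dS/dtheta at theta = 0, by a central finite difference along the flow."""
    s_plus = entropy(flowed_spectrum(p, h))
    s_minus = entropy(flowed_spectrum(p, -h))
    return -(s_plus - s_minus) / (2.0 * h)


if __name__ == "__main__":
    spectrum = [0.5, 0.3, 0.2]  # illustrative eigenvalues of a reduced density matrix
    print(capacity(spectrum), heat_capacity_from_flow(spectrum))
```

The two numbers agree because, with $E_i = -\log p_i$, the flow is a Gibbs ensemble at inverse temperature $\beta$ and $-\beta\, dS/d\beta = \beta^2\,\mathrm{Var}(E)$, which reduces to the variance of $-\log p_i$ at $\beta = 1$.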
While in general $C_E$ can be much larger than $S_{EE}$, we found several situations where $C_E$ is comparable to $S_{EE}$. First and foremost this happens in CFTs with a holographic dual where both $S_{EE}$ and $C_E$ have an area law divergence. In principle the coefficients that appear in this divergence are scheme dependent in $d > 2$, but if the reduced density matrix is that associated to a ball in the ground state there is a fairly natural scheme in which $S_{EE}$ is precisely equal to $C_E$.
We have studied various setups in order to gain some intuition for this approximate equality. First, following earlier work, we related the capacity of entanglement to fluctuations of U (1) charges fluctuating in and out of the subregion, communicating entanglement. In this context, if the fluctuations are Gaussian distributed, or if the total particle number N → ∞, we found that the capacity of entanglement becomes equal to the entanglement entropy, C E = S EE .
In a second setup, which we expect to be closely related to the previous one with fluctuating $U(1)$ charges, we considered random bipartite entanglement and observed that for randomly entangled pairs of qubits $C_E$ and $S_{EE}$ are also approximately equal.
All this then suggests that the approximate equality of C E and S EE implies that entanglement is effectively carried by randomly entangled UV pairs of qubits and not by large numbers of maximally entangled EPR pairs, as in the latter case C E would be much smaller than S EE . This in particular applies to the quasiparticle picture sometimes used to describe entanglement growth after quenches: if C E is proportional to S EE , the quasiparticles should be approximately randomly entangled and not in maximally entangled EPR pairs.
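The contrast between maximally entangled pairs and generic pairs can be made concrete with a few lines of code. The sketch below computes $S_{EE}$ and $C_E$ from the Schmidt spectrum of $n$ independent qubit pairs; the specific spectrum $(0.9, 0.1)$ is an illustrative assumption and is not meant to represent the random-pair ensemble discussed in the text.

```python
import math
from itertools import product


def entropy_and_capacity(p):
    """Return (S_EE, C_E): mean and variance of -log p in the distribution p."""
    mean = sum(x * (-math.log(x)) for x in p if x > 0.0)
    second = sum(x * math.log(x) ** 2 for x in p if x > 0.0)
    return mean, second - mean * mean


def tensor_power(p, n: int):
    """Spectrum of the n-fold tensor product of a state with spectrum p."""
    return [math.prod(combo) for combo in product(p, repeat=n)]


if __name__ == "__main__":
    epr_pair = [0.5, 0.5]      # maximally entangled pair: flat spectrum
    generic_pair = [0.9, 0.1]  # illustrative non-maximally entangled pair
    for n in (1, 2, 3):
        print(n, entropy_and_capacity(tensor_power(epr_pair, n)),
              entropy_and_capacity(tensor_power(generic_pair, n)))
```

Both quantities are additive over independent pairs, so a collection of EPR pairs has $S_{EE} = n\log 2$ with vanishing $C_E$, while any non-flat pair spectrum gives a capacity that grows linearly with $n$, as the entropy does.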
The area law for capacity of entanglement seems to extend quite generally to an area law for Rényi entropies. This is perhaps not that surprising, because a volume law would disagree with the basic observation that the Rényi entropy for a subregion equals that of the complement of the subregion. In addition, such an area law is also in agreement with the above picture of randomly entangled UV pairs of qubits, and as we explained in section 5.3, in agreement with a local Rindler picture of the entangled degrees of freedom. This is all self-consistent, as in Rindler space the entanglement is thermally distributed and therefore also not mostly carried by maximally entangled EPR pairs.
In AdS/CFT, $C_E$ has a relatively simple bulk interpretation, given by metric fluctuations integrated over the Ryu-Takayanagi surface. This relationship is strongly reminiscent of the relation between response functions and fluctuations in Landau-Ginzburg theories. It is not entirely obvious where the equality $C_E \sim S_{EE}$ comes from in this computation. One would expect a divergence to arise if one or both points in the graviton propagator approach the boundary of the Ryu-Takayanagi surface. If one point approaches the boundary there is an area divergence from the integral over that boundary point, but that gets reduced by the graviton propagator which decays at long distances. Therefore the divergence must come from the region of integration where both points are close to the boundary. Some naive power counting suggests that this indeed gives the right behavior, but it would be nice to examine this in more detail and try to connect it to the previous discussion.
One of the motivations for this work was to study quantum fluctuations in the metric at the RT surface and to study the validity of the semiclassical approximation. Since most of the contributions seem to come from the region near the boundary, capacity as we have defined it is perhaps not a very good probe of the size of bulk metric fluctuations. A better probe would be to consider fluctuations in manifestly finite quantities such as mutual information and relative entropy. It is straightforward to find a generalization of $C_E$ for the case of relative entropy. It is given by $C_E(\rho|\sigma) = \mathrm{Tr}\left(\rho(\log\rho - \log\sigma)^2\right) - \left(\mathrm{Tr}\left(\rho(\log\rho - \log\sigma)\right)\right)^2$ (7.217) and a candidate quantity that generalizes $C_E$ to mutual information is $C_E(\rho_{AB}|\rho_A \otimes \rho_B)$. It would be interesting to study these quantities in more detail but we leave that to future work.
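For commuting density matrices the generalization (7.217) reduces to a classical variance and is easy to evaluate. The following is a minimal sketch under that assumption: $\rho$ and $\sigma$ are taken to be diagonal in the same basis with eigenvalues `p` and `q`, and `q` must be nonzero wherever `p` is nonzero. The non-commuting case would require matrix logarithms and is not covered here.

```python
import math


def relative_entropy_and_variance(p, q):
    """Return (Tr rho(log rho - log sigma), C_E(rho|sigma)) for commuting rho, sigma.

    p and q are the eigenvalues of rho and sigma in a common eigenbasis.
    """
    diffs = [(pi, math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0.0]
    mean = sum(pi * d for pi, d in diffs)
    second = sum(pi * d * d for pi, d in diffs)
    return mean, second - mean * mean


if __name__ == "__main__":
    p = [0.5, 0.25, 0.25]
    q = [0.4, 0.4, 0.2]
    print(relative_entropy_and_variance(p, q))
    # With sigma replaced by the (unnormalized) identity, the first entry is
    # minus the entropy and the second entry is the ordinary capacity.
    print(relative_entropy_and_variance(p, [1.0, 1.0, 1.0]))
```

Both entries vanish when the two spectra coincide, which is the expected behavior for a fluctuation measure built on the relative entropy.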
The equality of $C_E$ and $S_{EE}$ for (suitably regularized) CFTs with a holographic dual suggests that it may be possible to extract a necessary criterion for the existence of a holographic dual from consideration of capacity of entanglement. This led us to study what happens when one breaks conformal invariance and we found that the capacity of entanglement starts to deviate from the entanglement entropy. As a demonstration, we considered the anisotropic Heisenberg XY spin chain, where the equality of $C_E$ and $S_{EE}$ holds at criticality but is broken as the parameters move away from the critical lines. We also studied deformations of CFTs by relevant operators, and found similar results. In that case, the analysis is complicated by various singularities whose precise understanding we leave to future work.
Finally, we note that estimating the fluctuations in the reduced density matrix as we dial external parameters is also of great relevance for putting bounds on the accuracy of local measurements. We intend to explore this connection in more detail in the near future.
Acknowledgments

JJ and EKV are in part supported by the Academy of Finland grant no 1297472. JJ is also in part supported by the U. Helsinki Graduate School PAPU, and EKV is also in part supported by a grant from the Wihuri Foundation. We thank U. Danielsson, N. Jokela, A. Kupiainen, A. Lawrence, D. Sarkar, W. Taylor, E. Tonni, T. Wiseman for useful discussions and comments while this work was in progress. JdB and EKV also thank the workshops "Quantum Gravity, String Theory, and Holography", "Black Holes and Emergent Spacetime", "Field theory anthropics and naturalness in cosmology", and "Holography: What is going on these days?" for hospitality and partial support during this work.
A Computational details of the perturbative computations with planar entangling surface

A.1 Mass deformation of a massless free scalar field theory

First, we need

where $\Omega_{d-1} = 2\pi^{d/2}/\Gamma(d/2)$ is the surface area of the unit $(d-1)$-sphere and

When we put in $\alpha, \beta, \mu, \nu = 0$ and set the stress-energy tensors at the same point in time, we get

We now compute the required integrals, following [29]. We begin with the two-point function, as it is simpler. Using time translation symmetry, we set the integral to the form (A.5)

We first shift $y \to y + \bar{y}$ and compute the integral over $y$ and $\bar{y}$ (A.7)

We compute the integral of $x_1$ only from $\delta$ to $m^{-1}$ to regulate both the UV and IR divergences. We take the limit $d \to 4$, yielding

The appendix in [29] provides us with a good guide through the calculation. One identity that will prove useful is (A.12)

We can write the integrand in the form

which we can easily integrate over $z$ to get (A.14)

The remaining integrals can be computed as before. Thus, the contribution from the first term in (A.3) sums up to

The contribution from the second term in (A.3) is more involved. We rewrite the integrand (A.18)

which we integrate over $z$ to get

which we know how to integrate with respect to the two other coordinates. The contribution from the second term sums up to

This gives the contribution of the three-point function as

Summing all the contributions we get

Thus, under the mass deformation the correction to the capacity of entanglement grows as fast as the correction to the entanglement entropy.
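The sphere-area factor $\Omega_{d-1} = 2\pi^{d/2}/\Gamma(d/2)$ that normalizes the correlators at the start of this subsection can be checked against familiar low-dimensional values. The following is a small sketch only; the function name `sphere_area` is an illustrative choice and not notation from the text.

```python
import math

def sphere_area(d):
    """Surface area of the unit (d-1)-sphere, Omega_{d-1} = 2 pi^(d/2) / Gamma(d/2).
    The argument d is the dimension of the embedding space and may be non-integer."""
    return 2.0 * math.pi ** (d / 2) / math.gamma(d / 2)

if __name__ == "__main__":
    for d in (2, 3, 4):
        print(d, sphere_area(d))
```

For $d = 2$ and $d = 3$ this reproduces the circumference $2\pi$ of the unit circle and the area $4\pi$ of the unit two-sphere, and for $d = 4$, the limit taken in this subsection, it gives $2\pi^2$.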

A.2 A general CFT
The three-point function does not vanish in general and is quite complex [94]:

where

The integrals can be computed in a similar manner as before and are straightforward. In this paper, we are only interested in the case where all spacetime indices are set to 0 and $x_{1,0} = x_{2,0}$.
After integrating over $x_3$, we have the intermediate result

In this appendix, we try to give more details on the computation of (6.212) for the values appearing in the text. We make use of the Fourier transformation identity

After also integrating over the angular coordinates of $k$, we have

where we have removed the $R$ dependence from the integral. We can compute the remaining integral explicitly,

It is still divergent for even values of $\Delta$ but is now finite for odd values of $\Delta$. The divergences with even $\Delta$ emerge from the UV limit $k \to \infty$, which we will regulate with k → 1 . The logarithmically divergent term is