Energy is Entanglement

We compute the local second variation of the von Neumann entropy of a region in theories with a gravity dual. For null variations our formula says that the diagonal part of the Quantum Null Energy Condition is saturated in every state, thus providing an equivalence between energy and entropy. We prove that the formula holds at leading order in 1/N, and further argue that it will not be affected at higher orders. We conjecture that the QNEC is saturated in all interacting theories. We also discuss the special case of free theories, and the implications of our formula for the Averaged Null Energy Condition, Quantum Focusing Conjecture, and gravitational equations of motion. We show that the leading-order gravitational equations of motion, Einstein's equations, are equivalent to leading-order saturation of the QFC for Planck-width deformations.


Introduction
The connection between quantum information and energy has been an emerging theme of recent progress in quantum field theory. Causality combined with universal inequalities like positivity and monotonicity of relative entropy can be used to derive many interesting energy-entropy bounds. Examples include the Bekenstein bound [1], the quantum Bousso bound [2,3], the Averaged Null Energy Condition (ANEC) [4,5], and the Quantum Null Energy Condition (QNEC) [6][7][8][9]. Here we strengthen the energy-entropy connection, moving from bounds to equalities.
The key insight of the QNEC, which we will exploit, is that one should look at variations of the entropy $S$ of a region as the region is deformed. Consider the entropy as a functional of the entangling surface embedding functions $X^\mu$. Then one can compute the functional derivative $\delta^2 S/\delta X^\mu(y)\,\delta X^\nu(y')$, which encodes how the entropy depends on the shape of the region. In general, this second variation will contain contact, or "diagonal," terms, proportional to δ-functions and derivatives of δ-functions, as well as "off-diagonal" terms. Our interest here is in the δ-function contact term, and we introduce $S_{\mu\nu}$ as the coefficient of the δ-function:
$$\frac{\delta^2 S}{\delta X^\mu(y)\,\delta X^\nu(y')} = S_{\mu\nu}(y)\,\delta^{(d-2)}(y-y') + \cdots \tag{1.1}$$
Null Variations. First consider the null-null component of the second variation, $S_{vv}(y)$, where $v$ is a null coordinate in a direction orthogonal to the entangling surface at the point $y$ [footnote 1]. Suppose the entangling surface is locally restricted to lie in the null plane orthogonal to $v$ near the point $y$. With this setup we can apply the QNEC, which says $S_{vv} \le 2\pi\langle T_{vv}\rangle$. Our main conjecture is that this inequality is always saturated [footnote 2]:
$$S_{vv} = 2\pi\langle T_{vv}\rangle. \tag{1.2}$$
We believe this holds for all relativistic quantum field theories with an interacting UV fixed point in $d > 2$ dimensions. For the special case of an interacting CFT this fully specifies the stress tensor in terms of entropy variations: by considering (1.2) for all entangling surfaces passing through a point, $\langle T_{\mu\nu}\rangle$ is completely determined up to a trace term. In a CFT the trace of the stress tensor vanishes, and so the entropy variations determine the full stress tensor in that case. This is the sense in which energy comes from entanglement. Our primary evidence for (1.2) is holographic, as explained below. But if we restrict attention to quantities that can be built out of local expectation values of operators and the local surface geometry, there is no other possibility for $S_{vv}$.
Footnote 1: We are restricting attention to field theories in Minkowski space throughout the main text.
Footnote 2: In [10] the issue of QNEC saturation was also investigated, but this is a different notion of saturation. Their analysis did not isolate the δ-function component, and instead considered the total variation in the entropy including the contribution of off-diagonal terms. So the examples in [10] where the QNEC is not "saturated" are not in contradiction with our results.
A significant constraint comes from considering the vacuum modular Hamiltonian, $K$, which is defined by
$$S(\sigma+\delta\sigma) - S(\sigma) = {\rm Tr}(K\,\delta\sigma) + O(\delta\sigma^2), \tag{1.3}$$
where $\sigma$ is the vacuum state reduced to the region under consideration and $\delta\sigma$ is an arbitrary perturbation of the state. If we had a general formula for $S$ in terms of expectation values of operators, we would be able to read off the modular Hamiltonian from the terms in that formula linear in expectation values. For a region bounded by an entangling surface restricted to a null plane the modular Hamiltonian has a known formula in terms of the stress tensor [11], and in particular its second variation is given by (1.4), which involves only $T_{vv}$. That is why $\langle T_{vv}\rangle$ is the only possible linear term we could have had in (1.2). A nonlinear contribution to $S_{vv}$, such as a product of expectation values, is restricted by dimensional analysis and unitarity bounds: the only possibility is if the theory contains a free field. Then we can take the classical expression for $T_{vv}$, which is quadratic in the field, and replace each of those fields with expectation values to get an expression quadratic in expectation values with the right dimensionality to contribute to $S_{vv}$. For interacting fields, nonzero anomalous dimensions prevent this from working. We will say more about free theories in Appendix B, where we will see that this possibility is realized by a term $\sim (\partial_v\langle\phi\rangle)^2$ for a free scalar field, which is why we limit ourselves to interacting theories in the main text. The substance of (1.2), then, is the statement that there are no non-local contributions to $S_{vv}$.
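The dimensional-analysis step can be illustrated with a minimal sketch. It assumes a single scalar operator $O$ of dimension $\Delta$ and one candidate quadratic term; the coefficient $c$ is a hypothetical dimensionless constant introduced only for this illustration, and other index structures are not considered.

```latex
% S_{vv} multiplies \delta^{(d-2)}(y-y') in (1.1), so it carries mass dimension d
% and two lower v indices, the same as T_{vv}.
% Candidate nonlinear term built from a scalar operator O of dimension \Delta
% (c is a hypothetical dimensionless coefficient):
\begin{align}
S_{vv} \supset c\,\partial_v\langle O\rangle\,\partial_v\langle O\rangle ,
\qquad
\bigl[\partial_v\langle O\rangle\,\partial_v\langle O\rangle\bigr] = 2\Delta + 2 .
\end{align}
% Matching dimensions with a dimensionless c requires
\begin{align}
2\Delta + 2 = d \;\Longleftrightarrow\; \Delta = \frac{d-2}{2} ,
\end{align}
% which is the free-scalar value saturating the unitarity bound \Delta \ge (d-2)/2.
% Any anomalous dimension pushes 2\Delta + 2 above d and removes the term.
```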
Relative Entropy. There is a natural interpretation of (1.2) in terms of relative entropy. The relative entropy of a state $\rho$ and a reference state $\sigma$ (for us, the vacuum) is a measure of the distinguishability of the two states. We will denote the relative entropy of $\rho$ and the vacuum by $S_{\rm rel}(\rho)$. By definition, the relative entropy is
$$S_{\rm rel}(\rho) = \Delta\langle K\rangle - \Delta S, \tag{1.5}$$
where $\Delta\langle K\rangle$ and $\Delta S$ denote the vacuum-subtracted modular energy and vacuum-subtracted entropy, respectively. A consequence of (1.2) is that $\Delta S_{vv} = \Delta\langle K\rangle_{vv}$, so we can say that
$$S_{{\rm rel},vv} = 0. \tag{1.6}$$
This equation is implied by (1.2) but is weaker, since it does not require us to know what the modular Hamiltonian actually is. The extra information of (1.2) is the expression (1.4) for the second variation of the modular Hamiltonian. It can be useful to formulate our results in terms of relative entropy instead of entropy itself because relative entropy is generally free from UV divergences, at least for nice states.
Non-Null Deformations. Now let us move beyond the null case. Our goal in doing this is to understand the simplest setup where non-null deformations can be analyzed, and so we will make several additional restrictions that we do not make in the null case. As explained in [12,13] and below in Section 2.2, (1.2) for the null case is a well-defined, finite equation in field theory. Local stationarity conditions on the entangling surface are enough to eliminate state-independent geometric divergences in the entropy, and the remaining state-dependent divergences cancel between the entropy and stress tensor. In the non-null case, eliminating divergences is more difficult. State-independent divergences can be dealt with by considering the vacuum-subtracted entropy $\Delta S$ rather than just $S$. State-dependent divergences associated with low-lying operators in the theory are more problematic.
To eliminate these divergences, it is enough to restrict our attention to theories where all relevant couplings have mass dimension greater than $d/2$, and to states where operators of dimension $\Delta \le d/2$ have vanishing expectation values near the entangling surface. The idea of these restrictions is to make sure there are no parameters with scaling dimension small enough to contribute to divergences. We will make the further restriction in the non-null case to planar entangling surfaces; this last restriction is made purely to simplify the analysis and presentation. With these assumptions in place we find the relation (1.7), in which $n_{\mu\nu}$ is the normal projector to the entangling surface and $h_{ab}$ is the intrinsic metric on the entangling surface [footnote 5]. Note that (1.7) implies that $S_{{\rm rel},\mu\nu} = 0$.

Consequences for Field Theory and Gravity
We view (1.2) and (1.7) as deep truths about interacting quantum field theories, worthy of further study. At present, our evidence for these conjectures comes from holography. We will calculate $S_{\mu\nu}$ directly and prove that (1.2) and (1.7) hold precisely at leading order in large-$N$ for all bulk states. We will also argue that subleading corrections in $1/N$ do not alter these conclusions. While this does not amount to a full proof, it is enough evidence for us to posit that (1.2) is true universally, and that (1.7) holds with relatively few additional assumptions.
An immediate application, which we discuss in Section 6, is to gravity. If we couple our field theory to gravity, then we can effectively isolate the δ-function part of the null second variation by deforming the entangling surface over a Planck-sized, or slightly larger, domain. According to the Raychaudhuri equation, if the surface is locally stationary then the leading change in its area due to this deformation is determined by $R_{vv}$, the null-null component of the Ricci tensor. Using (1.2) together with Einstein's equations, $R_{vv} = 8\pi G_N\langle T_{vv}\rangle$, we learn that this change in area is precisely canceled by $4G_N S_{vv}$. This means that the leading-order change in generalized entropy (area in Planck units plus entropy) is actually zero under such a deformation. In Section 6 we will show how this argument can also be reversed, demonstrating that this leading-order cancellation in the variation of the generalized entropy can be taken as a fundamental principle and used to derive Einstein's equations. This is essentially an update of the thermodynamic derivation of Einstein's equations by Jacobson [14].
Footnote 5: In [9], a quantum version of the dominant energy condition which involved spacelike deformations of entropy was proposed for $d = 2$ dimensions. In that inequality, timelike components of the stress tensor were bounded by spacelike components of the entropy variation, whereas in (1.7) timelike components of the stress tensor are related to timelike components of $\Delta S_{\mu\nu}$ (ignoring the second term of (1.7), which is absent in two dimensions). Our techniques are not directly applicable to two dimensions, and a naïve extrapolation of (1.7) is probably incorrect, but it would be interesting to investigate this issue further in the future.
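The cancellation described above can be written out schematically. This is a sketch under stated assumptions, not the paper's own derivation: it uses the standard Raychaudhuri equation for a twist-free null congruence with affine parameter $\lambda$, takes the expansion $\theta$ and shear $\sigma_{ab}$ to vanish at the surface (local stationarity), and keeps only the δ-function (diagonal) part of the variation per unit transverse area.

```latex
% Raychaudhuri equation in d spacetime dimensions, evaluated where \theta = \sigma_{ab} = 0,
% followed by Einstein's equations R_{vv} = 8\pi G_N <T_{vv}>:
\begin{align}
\frac{d\theta}{d\lambda}
= -\frac{\theta^2}{d-2} - \sigma_{ab}\sigma^{ab} - R_{vv}
\;\;\longrightarrow\;\; -R_{vv} = -8\pi G_N\,\langle T_{vv}\rangle .
\end{align}
% Diagonal second variation of the area term A/4G_N, per unit transverse area:
\begin{align}
\frac{1}{4G_N}\,\frac{d\theta}{d\lambda} = -2\pi\,\langle T_{vv}\rangle .
\end{align}
% Adding the diagonal entropy variation from (1.2), S_{vv} = 2\pi <T_{vv}>:
\begin{align}
\frac{1}{4G_N}\,\frac{d\theta}{d\lambda} + S_{vv}
= -2\pi\,\langle T_{vv}\rangle + 2\pi\,\langle T_{vv}\rangle = 0 .
\end{align}
```

Read in reverse, demanding that the last line vanish for every null direction, with (1.2) as input, returns the null-null components of Einstein's equations.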
Outline. In Section 2 we review some of the basic concepts of entropy, relative entropy, and the holographic setup that will be relevant for our calculation. In Section 3 we prove (1.2) for situations where it is sufficient to consider linear perturbations of the bulk geometry. This includes any state where gravitational backreaction in the bulk is small. In Section 4 we extend this proof to any bulk state. The idea is that $S_{vv}$ is related to near-boundary physics in the bulk, and for any state the near-boundary geometry is approximately vacuum, so the proof reduces to the linear case. In Section 5 we move away from null deformations to prove (1.7) using the same techniques. We conclude in Section 6 with a discussion of extensions and implications of our work. Several appendices are included discussing closely related topics.

Setup and Conventions
In this section we will make some general remarks about the known relations between entropy and energy, and the implications of our conjecture.

The Field Theory Setup
Let $u = (t - x)/\sqrt{2}$ and $v = (t + x)/\sqrt{2}$ be null coordinates, and let $y$ denote the other $d - 2$ spatial coordinates. For now, and for most of the rest of the paper, we will take the boundary of our region $\partial R$ to be a section of the null plane $u = 0$. This boundary is specified by the equation $v = V(y)$. We take the region $R$ to be a surface lying within the "right quadrant," having $u < 0$ and $v > V(y)$ (marked in yellow in Fig. 1). A one-parameter family of functions $V_\lambda(y)$ specifies a one-parameter family of regions $R(\lambda)$. We always take the one-parameter family to be of the form $V_\lambda(y) = V_0(y) + \lambda\dot V(y)$ with $\dot V \ge 0$, so that $\lambda$ plays the role of an affine parameter along a future-directed null geodesic located at position $y$.
Given any global state of the theory, we can compute the von Neumann entropy $S$ of the region $R$. Keeping the state fixed, the entropy becomes a functional of the boundary of the region, $S = S[V(y)]$. When we have a one-parameter family of regions, we can write $S(\lambda) = S[V_\lambda(y)]$. Throughout the rest of this work we will be interested in the derivatives of $S$ with respect to $\lambda$, as well as the functional derivatives of $S$ with respect to $V(y)$. These are related by the chain rule:
$$\frac{dS}{d\lambda} = \int d^{d-2}y\,\frac{\delta S}{\delta V(y)}\,\dot V(y), \qquad \frac{d^2S}{d\lambda^2} = \int d^{d-2}y\,d^{d-2}y'\,\frac{\delta^2 S}{\delta V(y)\,\delta V(y')}\,\dot V(y)\,\dot V(y').$$
We can parametrize the second functional derivative as follows:
$$\frac{\delta^2 S}{\delta V(y)\,\delta V(y')} = S_{vv}(y)\,\delta^{(d-2)}(y-y') + \left(\frac{\delta^2 S}{\delta V(y)\,\delta V(y')}\right)_{\rm od}.$$
Figure 1. Most of our work concerns the variations of entanglement entropy for the yellow region $R$ whose boundary $\partial R$ lies on the null plane $u = 0$. The entangling surface is specified by the function $V(y)$.
We have extracted the δ-function term explicitly, which we sometimes refer to as the "diagonal" part, and the remainder carries the label "od" for "off-diagonal." Note that the off-diagonal part of the variation does not have to vanish at $y = y'$. The quantity $S_{vv}$ is the same as $S''$ in [6,15,16].
In addition to the entropy of the region $R$, we can define the vacuum-subtracted modular energy, $\Delta\langle K\rangle$, and relative entropy with respect to the vacuum, $S_{\rm rel}$, associated to the region $R$. The modular energy is given by the boost energy along each generator of the null plane [11]; this is equation (2.4). The relative entropy is defined as the difference between the vacuum-subtracted modular energy and the vacuum-subtracted entropy, $S_{\rm rel} = \Delta\langle K\rangle - \Delta S$. For the regions we are talking about, the entropy of the vacuum is stationary and so drops out when we take derivatives of $S_{\rm rel}$. Then for a one-parameter family of regions the derivatives of $S_{\rm rel}$ with respect to $\lambda$ follow directly; see (2.7). Note here that our conjectured equation (1.2) can be restated as saying that the diagonal second variation of the relative entropy is zero. These equations will be mirrored holographically in Section 3 below.
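To make the restatement concrete, the following sketch computes the diagonal second variation of the modular energy. It assumes the standard null-plane form of the boost energy for (2.4); that explicit expression is an assumption here, written in the conventions of this section, and should be checked against [11].

```latex
% Assumed form of the null-plane modular energy (boost energy along each generator):
\begin{align}
\Delta\langle K\rangle
= 2\pi\int d^{d-2}y \int_{V(y)}^{\infty} dv\,\bigl(v - V(y)\bigr)\,
\langle T_{vv}(u=0,v,y)\rangle .
\end{align}
% First functional derivative: the endpoint term vanishes because (v - V) = 0 at v = V(y).
\begin{align}
\frac{\delta\,\Delta\langle K\rangle}{\delta V(y)}
= -2\pi\int_{V(y)}^{\infty} dv\,\langle T_{vv}(u=0,v,y)\rangle .
\end{align}
% Second functional derivative: purely diagonal, with no off-diagonal remainder.
\begin{align}
\frac{\delta^2\Delta\langle K\rangle}{\delta V(y)\,\delta V(y')}
= 2\pi\,\langle T_{vv}(u=0,V(y),y)\rangle\,\delta^{(d-2)}(y-y') .
\end{align}
% With S_rel = \Delta<K> - \Delta S, the diagonal part of the relative entropy is
\begin{align}
S_{{\rm rel},vv} = 2\pi\,\langle T_{vv}\rangle - S_{vv} ,
\end{align}
% so S_{vv} = 2\pi <T_{vv}> is the same statement as S_{rel,vv} = 0.
```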

The Bulk Setup
While we have a few remarks on the free-field and weakly-interacting cases in Appendix B, most of our nontrivial evidence for (1.2) and (1.7) comes from holography. In this section we will describe the holographic setup for the calculations outlined above. We are actually able to do without much of this machinery in Section 3, though it will become important afterward.
The boundary theory is a quantum field theory in $d$-dimensional Minkowski space obtained by deforming a CFT with relevant couplings. We take the bulk metric to be in Fefferman-Graham gauge (at least near the boundary) and choose to set the AdS length to one; this is the metric (2.8), in which $x^\mu$ stands for $u$, $v$, or $y$. In the small-$z$ expansion, the metric $\gamma_{\mu\nu}$ is given by the sum (2.9) [17]. In a fully-quantum treatment, $\gamma_{\mu\nu}$ is an operator in the bulk theory and we would need to take the expectation value of any geometric expression to extract a numerical result. Then there would be a difference between, say, $\langle\gamma_{\mu\nu}\rangle^2$ and $\langle\gamma_{\mu\nu}^2\rangle$ that we would have to resolve in order to move beyond leading order in a semiclassical expansion. A consequence of our analysis below is that only expressions which are linear in $\gamma_{\mu\nu}$ end up being important for proving (1.2) and (1.7), and thus this potential difficulty is avoided. With that in mind, we will treat the bulk geometry as classical for ease of presentation.
The term at order $z^d$ in (2.9), $\gamma^{(d)}_{\mu\nu}$, contains information about $\langle T_{\mu\nu}\rangle$ [18]. We will review the dictionary below. The terms at lower orders than $z^d$ are associated with low-dimension operators in the theory [17]. If $O$ is a relevant operator of dimension $\Delta$ and coupling $g$, then possible such terms that we need to be aware of include those in (2.10), with $m \ge 2$. The coupling $g$, when present, is a constant. With only a single operator, terms involving derivatives of $\langle O\rangle$ will always be of higher order than $z^d$ as long as the unitarity bound $\Delta > (d-2)/2$ is obeyed. When there is more than one low-dimension operator, we can also have terms with different combinatorial mixes of couplings and expectation values [19]. In this case, there could also be terms of the form (2.11), where $O_1$ and $O_2$ are two operators and $g_1$ is a relevant coupling associated to $O_1$. There are other possibilities as well, but we will not need to enumerate them.
In order to demonstrate the cancellation of divergences explicitly in (1.2), we would need to make use of certain relationships among the various parts of the small-$z$ expansion of the metric. Since there are general arguments for the finiteness of (1.2), we will be content to show that the leading state-dependent divergences cancel [footnote 7]. To that end, we will need the following fact. Suppose that in the sum (2.9) there is a term of the form $\gamma^{(\alpha)}_{\mu\nu} = \gamma^{(\alpha)}\eta_{\mu\nu}$. Then, assuming that $\alpha$ cannot be written as $\alpha_1 + \alpha_2$ for some other $\alpha_1$, $\alpha_2$ occurring in the sum, there will be another term $\gamma^{(\alpha+2)}_{\mu\nu}$ with a null-null component given by (2.12). This equation is obtained by solving Einstein's equations at small $z$ [17,18]. Four-derivative terms are also possible, at order $\alpha + 4$, but if $d \le 6$ then the unitarity bound ensures that $\alpha + 4 > d$. For simplicity we will ignore those terms in this section, but with a little more effort they can also be accounted for.
Holographic Entropy and its Variations. Our tool for computing the entropy is the Ryu-Takayanagi holographic entropy formula [20,21] including the first quantum corrections [22] [footnote 8]:
$$S = \frac{A_{\rm ext}}{4G_N} + S_{\rm bulk}. \tag{2.13}$$
$A_{\rm ext}$ refers to the area of the extremal area surface anchored to $\partial R$ at $z = 0$. The dictionary for computing variations in the entropy as a function of $V(y)$ was laid out in [15] as follows.
Let the bulk location of the extremal surface be given by the small-$z$ expansion (2.14), where the log term is important for even dimensions and in the case of relevant deformations with particular operator dimensions. $X^\mu(y)$ are the embedding functions of $\partial R$, and $\bar X^\mu(y,z)$ satisfies the extremal surface equation (2.15), where $H$ is the induced metric on the extremal surface and $\Gamma$ are bulk Christoffel symbols. Note that we have introduced the notation $\bar X^\mu$ for the bulk extremal surface coordinates, which approach $X^\mu$ on the boundary. We will be interested in computing $\delta A_{\rm ext}/\delta X^\mu(y)$, which by extremality is a pure boundary term evaluated at a $z = \epsilon$ cutoff surface; this is equation (2.16). All of the factors appearing in the integrand need to be expanded in $\epsilon$. The result, (2.17), will be a power series in $\epsilon$ containing divergent terms as well as finite terms.
Footnote 7: In other words, we will only explicitly demonstrate the finiteness of (1.2) given some conditions on the operator dimensions which make the terms we display the only ones that are around.
Footnote 8: In this section and in our main analysis we are only working to next-to-leading order so that the prescriptions of [22] and [23,24] agree. If we wanted to work to higher orders in $1/N$, we would need to use the quantum extremal surface prescription instead [23,24]. We discuss this further in Section 6.1.
Here $K^\mu$ is the extrinsic curvature of the entangling surface. We need to ensure that all divergences cancel or otherwise vanish in (1.2) and (1.7) in order that these be well-defined statements. So here we will explain the structure of the divergences in the entropy variations, as well as how to extract the finite part.
Null Variations. First, we will consider the special case $X^\mu(y) = V(y)$, which is the relevant case for (1.2). If there are no terms of the form (2.11) in the metric, then the situation reduces to that of [15], in which it was shown that the divergent terms in (2.17) are absent as long as the entangling surface $\partial R$ is locally constrained to lie in a null plane.
If there are state-dependent terms of the form (2.11) in the metric, then there will be non-vanishing divergent contributions to $\delta A_{\rm ext}/\delta V(y)$. In general, an extra term at order $z^\alpha$ in the metric leads to a contribution at order $\alpha + 2$ in $\bar X^\mu$ that we can obtain by solving (2.15) at small $z$. We only need to concern ourselves with terms that have $\alpha + 2 < d$, as those are the ones which lead to divergences. As mentioned above, for $d \le 6$ the only terms in the metric at order $\alpha$ such that $\alpha + 2 < d$ are those of the form $\gamma^{(\alpha)}_{\mu\nu} = \gamma^{(\alpha)}\eta_{\mu\nu}$. We solve the extremal surface equation in the presence of such a term and plug the solution into (2.16), eliminating a potential log term by restricting ourselves to the case of generic operator dimensions. The non-generic case can be recovered later as a limit. Using this, we can find the leading-order contribution to the second variation of the entropy, which is equation (2.20). Even though this is a very complicated expression in general, we will be able to extract the δ-function contribution and see that it is given by $\langle T_{vv}\rangle$ as in (1.2).

Non-Null Variations
For a general non-null variation we lose some of the simplifications present in the null case. One additional assumption we will make in Section 5 is to consider entangling surfaces which are planar prior to being deformed, which simplifies some of the geometric expressions. More importantly, however, notice that (1.7) only makes reference to the vacuum-subtracted entropy variation, $\Delta S_{\mu\nu}$, and not $S_{\mu\nu}$ itself. So any state-independent terms in (2.17) can be ignored. Furthermore, for the discussion of the non-null variations we are only going to consider theories where relevant couplings (if present) have mass dimension greater than $d/2$, and states where operators of dimension $\Delta \le d/2$ have vanishing expectation values in the vicinity of the entangling surface. The result of these restrictions is that terms like (2.11) will not be present in the metric up to order $z^d$, and so there will be no state-dependent entropy divergences. Thus for our analysis of non-null deformations, the entropy variation follows from (2.17) and is given by (2.21). In Section 5 we will also not deal explicitly with the bulk entropy term, but we expect its contributions to be qualitatively similar to the bulk entropy term in the null case.
Identification of the Stress Tensor. We will also need a holographic formula for the stress tensor, $\langle T_{\mu\nu}\rangle$. Normally a renormalization procedure is required to define a finite stress tensor. Since our conjectures (1.2) and (1.7) are meant to be finite equations, it will be enough to regulate the stress tensor with a cutoff as we did with the entropy above [footnote 9]. By definition, the (regulated) stress tensor is computed as the derivative of the regulated action. In holography, the regulated action is defined as the action of the bulk spacetime within the $z = \epsilon$ cutoff surface, plus additional boundary terms (like the Gibbons-Hawking-York term) which are necessary to make the variational principle well-defined [18,25]. For Einstein gravity in the bulk with minimally-coupled matter fields, the regulated stress tensor is then given by the Brown-York stress tensor evaluated on the $z = \epsilon$ cutoff surface [26] [footnote 10]. Any state-dependent terms in the metric that occur at order $z^\alpha$ with $\alpha < d$ will contribute to divergences in the stress tensor. In particular, when we discuss null variations we will find contributions from terms of the form (2.12). Collecting these contributions, using (2.12), and comparing the result to (2.20), we see that the divergences indeed cancel out in (1.2).
Footnote 9: We still want to define the stress tensor so that $\langle T_{\mu\nu}\rangle = 0$ in vacuum, so the constant vacuum energy term will be subtracted.
Footnote 10: Care must be taken to impose the correct boundary conditions at $z = \epsilon$. Since we are interested in a flat-space result, we must place a flat metric boundary condition at $z = \epsilon$ before taking $\epsilon \to 0$. This is the only way to get the divergences to cancel out properly between the entropy and the energy in (1.2), and this treatment of the boundary condition is especially important if one wants to extend the analysis to curved space [12].
For the non-null case we have additional difficulties. One can easily see that, in general, there are state-dependent divergences in $\langle T_{\mu\nu}\rangle$ that do not appear in $S_{\mu\nu}$. For example, if there are operators of dimension $\Delta < d/2$ in the theory then there will be a term in $\gamma_{\mu\nu}$ at order $z^{2\Delta}$ proportional to $\langle O\rangle^2\eta_{\mu\nu}$. By the unitarity bound, $2\Delta > d - 2$, such a term will not contribute divergences to $S_{\mu\nu}$, but it will contribute divergences to the stress tensor. Thus, when we derive the relationship (1.7) in Section 5, we will put sufficient restrictions on the theory and the states in consideration so that both sides of the equality are finite and well-defined. As in the case of the entropy variation, all divergences in $\langle T_{\mu\nu}\rangle$ can be eliminated by restricting the theory so that any nonzero relevant couplings have mass dimension greater than $d/2$, and by restricting the state so that operators of dimension $\Delta \le d/2$ have vanishing expectation values (at least locally near the entangling surface). When this is true, the metric perturbation $\gamma_{\mu\nu}$ starts at order $z^d$, and so $\langle T_{\mu\nu}\rangle$ will be finite. Furthermore, we can treat the stress tensor as being effectively traceless even though we are not in a CFT. That is because in general the trace is proportional to products of couplings and scalar expectation values, $g\langle O\rangle$, but with our restrictions on the theory and state there is no pair of nonzero coupling and operator expectation value with total dimension adding up to $d$. The end result is the standard formula for the stress tensor familiar from holographic renormalization [18], in which $\langle T_{\mu\nu}\rangle$ is proportional to $\gamma^{(d)}_{\mu\nu}$. We will make use of this formula in Section 5.

Null Deformations and Perturbative Geometry
In this section we will prove the relation $S_{vv} = 2\pi\langle T_{vv}\rangle$ for states with geometries corresponding to perturbations of vacuum AdS where it suffices to work to linear order in the metric perturbation. This includes classical as well as quantum states. Below in Section 4 we will extend our results to non-perturbative geometries. The arguments presented here can be repeated for linearized perturbations to a non-AdS vacuum, i.e., the vacuum of a non-CFT. We restrict ourselves to the AdS case because explicit solutions to the equations are available, and the AdS case also suffices for nearly all applications in the following sections. We will see in Section 4 that in certain situations an appeal to the non-AdS vacuum case is necessary, but because of general arguments (like the known form of the modular Hamiltonian as discussed in the Introduction) we know that the non-AdS case should not behave differently than the AdS case.

Bulk and Boundary Relative Entropies
In [27] it was argued that bulk and boundary relative entropies are identical:
$$S_{\rm rel} = S_{\rm rel,bulk}, \tag{3.1}$$
where $S_{\rm rel,bulk}$ is calculated using the bulk quantum state restricted to the entanglement wedge of the boundary region $R$, that is, the region of the bulk bounded by the extremal surface and $R$. We already discussed in Section 2.1 the form of $S_{\rm rel}$ for the regions we are considering, but to leading order in bulk perturbation theory there is an analogous simple formula for $S_{\rm rel,bulk}$. We only need to know two simple facts. First, if $\partial R$ is restricted to lie in the $u = 0$ plane on the boundary then, to leading order, the extremal surface in the bulk also lies in the $u = 0$ plane. Second, to leading order the bulk modular energy corresponding to such a region is given by (3.2), the AdS analogue of (2.4). In keeping with our earlier notation, $\bar V(y,z)$ gives the location of the bulk extremal surface with $\bar V(y, z=0) = V(y)$. Now we simply solve (3.1) for the vacuum-subtracted boundary entropy $\Delta S$, and take two derivatives with respect to a deformation parameter $\lambda$ to find (3.4), whose three terms are the second derivatives of the boundary modular energy, the bulk modular energy, and the bulk entropy. The first term represents a contribution of $2\pi\langle T_{vv}\rangle$ to $S_{vv}$. So (1.2), $S_{vv} = 2\pi\langle T_{vv}\rangle$, amounts to showing that the remaining two terms do not contribute to $S_{vv}$. We examine them both in the next section.

Proof of the Conjecture
From the discussion around (3.4), the conjecture $S_{vv} = 2\pi\langle T_{vv}\rangle$ amounts to the statement that the terms in (3.5), namely the second derivatives of the bulk modular energy and of the bulk entropy, do not contribute a δ-function to the second variation of $S$. Together these terms comprise the second derivative of the bulk relative entropy. We treat the two terms individually.

Bulk Modular Energy
The modular energy term is simple to evaluate. Note that (3.2) depends on the entangling surface $V(y)$ through the extremal surface $\bar V(y,z)$. So functional derivatives of that expression with respect to $V(y)$ involve factors of $\delta\bar V(y,z)/\delta V(y')$. This is the boundary-to-bulk propagator of the extremal surface equation in pure AdS. The result, which can be extracted from our discussion in later sections, is given in (3.6) [30]. Inserting it into the second functional derivative of (3.2) gives (3.7). We can diagnose the presence of a δ-function by integrating with respect to $y_1$ over a small neighborhood of $y_2$. If the result remains finite as the size of the neighborhood goes to zero, then we have a δ-function. Whether or not this happens depends on the falloff conditions on $T^{\rm bulk}_{vv}$ near $z = 0$, which in turn depend on the matter content of the bulk theory. If we suppose $T^{\rm bulk}_{vv} \sim z^\beta$ as $z \to 0$, then it is easy to see that there is no δ-function so long as the condition (3.8) on $\beta$ is satisfied. For bulk scalar fields the falloff exponent is fixed by $\Delta$, the dimension of the dual operator. This remains true when the non-normalizable mode $\phi \sim g z^{d-\Delta}$ is turned on, as long as the coupling $g$ is constant. For bulk Dirac fields the exponent is fixed analogously. In either case, equation (3.8) reduces to the unitarity bound on the dual operator dimension, $\Delta > (d-2)/2 + s$, where $s = 0, 1/2$ is the spin. In the limiting case where the unitarity bound is saturated and the dual operator is a free scalar or free fermion, one may find a δ-function in (3.7). Indeed, in Appendix B we find extra contributions to $S_{vv}$ besides $2\pi\langle T_{vv}\rangle$ for a free scalar field, so the appearance of an additional δ-function in this case is an expected feature. The case of a free fermion has not yet been worked out in the field theory, but methods similar to those in Appendix B should be applicable. For operators which do not saturate the unitarity bound, we have shown that $\Delta K_{\rm bulk}$ does not contribute to $S_{vv}$.
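The δ-function diagnostic can be sketched by power counting. Everything specific below is an assumption made for illustration rather than a transcription of (3.6)-(3.8): the measure factor $z^{1-d}$ on the bulk $u = 0$ surface, a propagator that integrates to an order-one number over $y_1$ once $z$ is smaller than the neighborhood size $\ell$ (a symbol introduced here), the reading of (3.8) as $\beta > d - 2$, and the falloffs $\beta = 2\Delta$ for a scalar and $\beta = 2\Delta - 1$ for a Dirac field.

```latex
% Integrate the second variation of the bulk modular energy over a neighborhood
% of size \ell around y_2. For z \lesssim \ell the propagators integrate to O(1), leaving
\begin{align}
\int_0^{\ell} \frac{dz}{z^{d-1}}\; z^{\beta} \;\sim\; \ell^{\,\beta - d + 2} ,
\end{align}
% which vanishes as \ell \to 0 (no \delta-function) when
\begin{align}
\beta > d - 2 .
\end{align}
% Assumed falloffs: \beta = 2\Delta - 2s with s = 0 (scalar) or s = 1/2 (Dirac). Then
\begin{align}
2\Delta - 2s > d - 2 \;\Longleftrightarrow\; \Delta > \frac{d-2}{2} + s ,
\qquad s = 0,\ \tfrac{1}{2} ,
\end{align}
% which is the unitarity bound quoted in the text.
```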
Bulk Entropy. It is much more difficult to make statements about $d^2 S_{\rm bulk}/d\lambda^2$. In a coherent bulk state we know that $d^2 S_{\rm bulk}/d\lambda^2 = 0$, so for that class of states we are done. More generally, we can write the second derivative as in (3.9) and ask what sort of behavior would be required of $\delta^2 S_{\rm bulk}/\delta\bar V(y,z)\,\delta\bar V(y',z')$ in order to lead to a δ-function in $y_1 - y_2$. As a toy model, we can imagine a collection of particles on the $u = 0$ surface which are entangled in a way that depends on their distance from each other. This is a fairly general ansatz for the state of a free theory in the formalism of null quantization [31]. At small $z$ (which is the dominant part for our calculation) this would correspond to a second variation of the form (3.10). The factor $(zz')^{\Delta}/(zz')^{d-1}$ reflects that entropy variations should be proportional to the amount of matter present at locations $z$ and $z'$. The numerator encodes the falloff conditions on the density of particles in a way that is consistent with the falloff conditions for a bosonic matter field, and the denominator is a measure factor that converts coordinate areas to physical areas. The function $F$ is arbitrary.
With the assumption of (3.10), a constant rescaling of all coordinates by $\alpha$ leads to an overall factor of $\alpha^{4-2d+2\Delta}$ in (3.9). A δ-function in $y_1 - y_2$ would scale like $\alpha^{2-d}$, and anything that scales with a power of $\alpha$ less than $2 - d$ would correspond to a more-divergent distribution, like the derivative of a δ-function. As long as $\Delta > (d-2)/2$ this is avoided, and a δ-function is only present when the unitarity bound is saturated, $\Delta = (d-2)/2$. This is consistent with what we found previously for the modular energy, and with our general expectations for free theories.
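The comparison of exponents can be spelled out as a short check. It takes the overall factor $\alpha^{4-2d+2\Delta}$ quoted above as given and uses only the standard scaling of a δ-function in $d - 2$ dimensions; the numerical values of $d$ and $\Delta$ at the end are arbitrary illustrative choices.

```latex
% Scaling of a \delta-function in the d-2 transverse directions (\alpha > 0):
\begin{align}
\delta^{(d-2)}\bigl(\alpha(y_1 - y_2)\bigr)
= \alpha^{2-d}\,\delta^{(d-2)}(y_1 - y_2) .
\end{align}
% The ansatz (3.10) gives an overall factor \alpha^{4-2d+2\Delta}.
% It is less singular than a \delta-function when
\begin{align}
4 - 2d + 2\Delta > 2 - d \;\Longleftrightarrow\; \Delta > \frac{d-2}{2} ,
\end{align}
% and matches the \delta-function scaling exactly at \Delta = (d-2)/2.
% Illustration: d = 4, \Delta = 3 gives \alpha^{2}, versus \alpha^{-2} for the \delta-function.
```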

Non-Perturbative Bulk Geometry
Now we turn to a proof that applies for a general bulk geometry, still restricting the deformations to be null on the boundary. We will use the techniques outlined in Section 2.2, which relate the entropy variations to changes in the bulk extremal surface location. At first we will stick to boundary regions where ∂R is restricted to a null plane, leaving a generalization to regions where ∂R only satisfies certain local conditions for Section 6.2.

Extremal Surface Equations
Small z, Large k
The extremal surface equation (2.15) for $\bar U$ and $\bar V$ is a very complicated equation. If we perturb the boundary conditions by taking $V \to V + \delta V$, then the responses $\delta\bar U$ and $\delta\bar V$ will satisfy the linearized extremal surface equation, which is a bit simpler. It may be that the coordinates we have chosen are not well-suited to describing the surface perturbations deep into the bulk. That problem is solved by only aiming to analyze the equations in the range $z < z_*$ for some small but finite $z_*$. In fact, by choosing $z_*$ small enough we can say that the spacetime is perturbatively close to vacuum AdS, with the perturbation given by the Fefferman-Graham expansion (2.9). Since the corrections to the vacuum geometry are small when $z_*$ is small, the extremal surface equation reduces to the vacuum extremal surface equation plus perturbative corrections. All of the deep-in-the-bulk physics is encoded in boundary conditions at $z = z_*$. The situation is illustrated in Fig. 2.
Figure 2. By restricting attention to $z < z_*$ the geometry is close to pure AdS, and we can solve for $\delta X$ perturbatively. All of the $z > z_*$ data imprints itself as boundary conditions at $z = z_*$. We show that these boundary conditions are unimportant for our analysis, which means that a perturbative calculation is enough.
The boundary conditions at $z = z_*$ are essentially impossible to find in the general case, so the restriction to $z < z_*$ does not make the problem of finding the extremal surface any easier. However, according to (2.20) all we are interested in is the δ-function part of $\delta U_{(d)}$. It will turn out that this quantity is actually independent of those boundary conditions.
The idea is very simple. In Fourier space a δ-function has constant magnitude. That means it does not go to zero at large values of $k$, unlike the Fourier transform of a smooth function. So the strategy will be to analyze the extremal surface equation in Fourier space at large $k$. We will see that the large-$k$ response of $\bar U$ (and hence $U_{(d)}$) is completely determined by near-boundary physics, and in particular will match the results we found in previous sections. This will establish that $S_{vv} = 2\pi \langle T_{vv}\rangle$ for very general bulk states.
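The contrast between a δ-function and a smooth profile in Fourier space can be illustrated with a Gaussian. The following is a sketch with an assumed Fourier convention $f(k) = \int d^{d-2}y\, e^{ik\cdot y} f(y)$ and an arbitrary width $\sigma$ introduced only for illustration.

```latex
\begin{align}
  \int d^{d-2}y\; e^{ik\cdot y}\, \delta^{(d-2)}(y - y_0) &= e^{ik\cdot y_0},
    & \big| e^{ik\cdot y_0} \big| &= 1 \ \text{for all } k, \\
  \int d^{d-2}y\; e^{ik\cdot y}\, e^{-(y-y_0)^2/2\sigma^2}
    &= (2\pi\sigma^2)^{(d-2)/2}\, e^{ik\cdot y_0}\, e^{-\sigma^2 k^2/2}
    & &\to 0 \ \text{as } |k| \to \infty .
\end{align}
```

Only the first kind of term survives at large $k$, which is why the large-$k$ limit isolates the δ-function part of the response.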

Integral Equation for $\bar U$
We will begin by finding an integral equation for $\bar U$ in the range $z < z_*$. Since $\bar U$ vanishes at $z = 0$ it must remain small throughout $z < z_*$, as long as $z_*$ is small enough, and so we can use perturbation theory to find $\bar U$ in that range. Then we will compute the response of $\bar U$ to variations of the boundary conditions $V$ at $z = 0$. Expanding (2.15) in small $z$, we can write the equation for $\bar U$ as (4.1), where $\gamma_{\mu\nu}/z^2$ is the deviation of the metric from vacuum AdS, as in (2.9). To solve this equation perturbatively we require a Green's function $G(z, y|z', y')$ of the linearized extremal surface equation that vanishes when $z = 0$ or $z = z_*$. Then the solution to (4.1) can be written in terms of $G$ and the source $J(y, z)$. It is important to remember that $J(y, z)$ is itself a functional of $\bar U$, and the usual methods of perturbation theory would involve solving for $\bar U$ iteratively. It will be more useful for us to look at the Fourier transform of this equation, (4.3). The Green's function with the correct boundary conditions is easily obtained from the standard Green's function $G_{\rm AdS}$ by adding a particular solution of the vacuum extremal surface equation; its Fourier-space form is given in (4.5). In the limit of large $k$, the first term of (4.3) becomes exponentially suppressed. So we see that the boundary conditions at $z = z_*$ do not matter. Furthermore, the integration range $z \gg 1/k$ in the second term also becomes exponentially suppressed. So only the small-$z$ part of the source $J$ contributes at leading order in the large-$k$ limit.

Terms in the Source
Let us consider the form of the source in position space in more detail. We know that $J = J[\bar U, \bar V, \gamma]$ is a functional of the extremal surface coordinates and the metric perturbation. We can treat $J$ as a double power series in $\gamma$ and $\bar U$ since we are doing perturbation theory in those two parameters. We will repeatedly take advantage of the "boost" symmetry of the equation: under the coordinate transformation $u \to \alpha u$, $v \to \alpha^{-1} v$, the source must transform as $J \to \alpha J$ in order for the whole equation to be covariant. Since every occurrence of $\bar V$ must be accompanied by either a $\gamma$ or $\bar U$ to preserve the boost symmetry, $J[\bar U, \bar V, \gamma]$ is actually a triple power series in all three of its parameters. Another important fact is dimensional analysis, which comes from scaling all coordinates together: $J$ has length dimension $-1$, while $\bar U$ and $\bar V$ have dimension 1 and $\gamma$ has dimension zero. This will also be used to restrict the types of terms we can find.
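The boost bookkeeping used throughout this subsection can be summarized as a weight assignment. This is a sketch: the weights of $\bar U$, $\bar V$, and $J$ follow from the transformation quoted above, while the weights assigned to the components of $\gamma_{\mu\nu}$ are an assumption based on the usual tensor transformation law (each lower $v$ index contributes $+1$, each lower $u$ index $-1$), which is not spelled out in this section.

```latex
% Boost: u -> \alpha u, v -> \alpha^{-1} v.  Weight w means X -> \alpha^{w} X.
\begin{align}
  w(\bar U) &= +1, & w(\bar V) &= -1, & w(J) &= +1, \\
  w(\gamma_{vv}) &= +2, & w(\gamma_{uv}) &= 0, & w(\gamma_{uu}) &= -2 .
\end{align}
% A monomial \bar U^{a} \bar V^{b} with no factor of \gamma has weight a - b,
% so covariance of the source requires
\begin{equation}
  a - b = 1 .
\end{equation}
% Examples (y- and z-derivatives are boost-neutral):
%   \bar U \bar V        : weight  0  -> excluded
%   \bar U^{2} \bar V    : weight +1  -> allowed
%   \gamma_{vv} \bar V   : weight +1  -> allowed (assumed \gamma weights)
```

The same counting applies to variations: a lone $\delta\bar V$ has weight $-1$ and so needs two factors of $\bar U$ or a suitable component of $\gamma$ alongside it.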
The variation $\delta\bar U$ satisfies an integral equation similar to that of $\bar U$ except with the source, $J$, replaced by the variation of the source, $\delta J$. Like $J$, we can treat $\delta J$ as a power series. Each term in the $\delta J$ power series contains a single $\delta\bar U$, $\delta\gamma$, or $\delta\bar V$, multiplied by some number of $\bar U$, $\bar V$, and $\gamma$ factors (and their derivatives). It is important to note that these unvaried $\bar U$, $\bar V$, and $\gamma$ factors are smooth, and therefore their Fourier transforms decay at large $k$. So the Fourier transform of a term in $\delta J$ looks schematically like (4.6), where $\Psi$ is either $\gamma$, $\bar V$, $\bar U$, or their derivatives and $h$ is the Fourier transform of a smooth function. The $k$-dependence at large $k$ of a given term in $\delta J$ is completely determined by the factor $\delta\Psi$ being varied. The case where $\Psi = \gamma$ can be reduced immediately to the other two, because $\delta\gamma = \delta\bar V\,\partial_v\gamma + \delta\bar U\,\partial_u\gamma$.
In Fourier space, we can write $\delta J(k, z)$ as a sum of terms of the form $\delta J_{mn} z^m k^n$ at small $z$ and large $k$.$^{13}$ Since the effect of $z_*$ is exponentially suppressed at large $k$, we can drop the first term in (4.3) and push the upper limit of integration in the second term off to infinity. Additionally, the difference between $G_k(z|z')$ and $G^{\rm AdS}_k(z|z')$ is exponentially suppressed. Thus for our purposes we have (4.7). If $m < d - 2$ then the first term in (4.7) represents a contribution to $\bar U$ that could have been obtained by doing the small-$z$ expansion of the extremal surface equation. In a CFT these would consist only of geometric terms that depend on extrinsic curvatures of the entangling surface, but our boundary condition $U = 0$ guarantees that those vanish. Still, when a relevant deformation is turned on there may be terms proportional to $g_1^l\,\partial_v\langle O_2\rangle$ which enter $\bar U$ at low orders in $z$. An important fact, enforced by the unitarity bound, is that these low-order terms are all linear in expectation values. When $m = d - 2$ each of the terms in (4.7) becomes singular, but the combination remains finite and generates a $z^d \log z$ term. Since (4.7) is well-behaved in this limit, we can treat the non-generic case $m = d - 2$ as a limiting case of generic $m$. Thus throughout our discussion below $m$ is assumed to be generic. Finally, for $d > 6$ another term proportional to $z^{4+m}$ (and $z^{6+m}$ in $d > 8$, etc.) should be included, but for simplicity we have not written it down. Qualitatively it has the same properties as the $z^{2+m}$ term.
Our focus is on the $z^d$ term, as this is where the finite contributions to the entropy variation come from, as in (2.20). From (4.7), we see that the δ-function is determined by source terms with $n - m = 2 - d$, which corresponds to $k^0$ behavior at large $k$. So our task is simply to enumerate the possible terms in $\delta J$ which have this behavior. We will see that such terms are completely accounted for by the linearized analysis of the previous section,$^{14}$ which completes the proof.
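The relation between a source term $z^m k^n$ and the large-$k$ behavior of the $z^d$ coefficient can be checked by a scaling argument. The sketch below assumes, for illustration only, that the vacuum linearized operator is $\partial_z^2 - \frac{d-1}{z}\partial_z - k^2$ and that its Green's function on $0 < z < \infty$ is built from modified Bessel functions; neither form is quoted from (4.1)-(4.7) here, so the normalization should not be relied upon. Only the power of $k$ matters.

```latex
% Assumed vacuum Green's function for z < z', vanishing at z = 0 and decaying at large z':
\begin{equation}
  G^{\rm AdS}_k(z|z') = -\,z'^{\,1-d}\,(z z')^{d/2}\, I_{d/2}(kz)\, K_{d/2}(kz') .
\end{equation}
% Small-z limit, using I_\nu(x) \approx (x/2)^\nu / \Gamma(\nu+1):
\begin{equation}
  G^{\rm AdS}_k(z|z') \;\to\; -\,\frac{z^{d}\,k^{d/2}}{2^{d/2}\,\Gamma(d/2+1)}\; z'^{\,1-d/2} K_{d/2}(kz') .
\end{equation}
% Response at order z^d to a source \delta J = \delta J_{mn}\, z^m k^n, with t = k z':
\begin{align}
  \delta \bar U_{(d)}(k)
  &\propto k^{d/2+n} \int_0^\infty dz'\, z'^{\,1-d/2+m} K_{d/2}(kz') \\
  &= k^{\,n-m+d-2} \int_0^\infty dt\, t^{\,1-d/2+m} K_{d/2}(t) .
\end{align}
% The t-integral converges for m > d-2 and is defined by continuation otherwise:
\begin{equation}
  \int_0^\infty dt\, t^{\mu-1} K_\nu(t)
  = 2^{\mu-2}\,\Gamma\!\Big(\tfrac{\mu-\nu}{2}\Big)\,\Gamma\!\Big(\tfrac{\mu+\nu}{2}\Big),
  \qquad \mu = 2 - \tfrac{d}{2} + m,\quad \nu = \tfrac{d}{2} .
\end{equation}
% k^0 behavior (a \delta-function in position space) therefore requires n - m = 2 - d.
```

Under these assumptions the first Gamma function has a pole at $m = d - 2$, which is consistent with the remark that this case is singular term by term and must be treated as a limit.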
Ingredients
Before diving into the terms of the source, we will collect all of the facts we need about the functions $\bar U$, $\bar V$, $\gamma$, and their variations. In particular, we will need to know what powers of $k$ and $z$ we can expect them to contribute to the source.
$^{13}$ There may also be terms in the source of the form $z^m \log(z)$. Qualitatively these terms behave similarly to the $z^m$ terms as far as the δ-function part of the entropy variation is concerned, so we will not explicitly keep track of them.
$^{14}$ As mentioned in the previous section, for simplicity of presentation we are performing our perturbation theory around empty AdS, whereas in complete generality one would want to perform the analysis based around the vacuum of the theory in question. The difference is that some terms which are linear in expectation values $\langle O\rangle$ might appear at higher orders in perturbation theory around empty AdS even though they are fully accounted for in the linearized analysis about the correct vacuum.
We begin with $\bar V$. Unlike $\bar U$, $\bar V$ does not have any particular boundary condition at $z = 0$. Thus the Fefferman-Graham expansion for $\bar V$ contains low powers of $z$ that depend on geometric data of the entangling surface. In particular, the boundary condition itself enters $\bar V$ at order $z^0$, which is neutral in terms of the $n - m$ counting. That same behavior extends to the variation $\delta\bar V$: in Fourier space, the state-independent parts of $\delta\bar V$ are functions of the combination $kz$. In other words, we find schematically
$\delta\bar V \sim (1 + k^2 z^2 + k^4 z^4 + \cdots)\,\delta V$. (4.8)
The boundary condition $\delta V$ itself is taken to go like $k^0$ at large $k$ (i.e., a δ-function variation). So in terms of our power counting, which only depends on $n - m$, these terms are all completely neutral, and a factor of $\delta\bar V$ in the source is "free" as far as the power counting is concerned. There will be other terms in $\delta\bar V$, even at low powers of $z$, but the terms in (4.8) are the ones which dominate the $n - m$ counting.
$\bar U$ is also an extremal surface coordinate, but it has the restricted boundary condition $U = 0$. That means it does not possess terms like those in (4.8). The lowest-order-in-$z$ terms that can be present are built from powers of the couplings $g$ multiplying $v$-derivatives of operator expectation values. It is only terms like this, which contain a single factor of $\langle O\rangle$, that can show up at lower orders than $z^d$, because of the unitarity bound $\Delta > (d-2)/2$. Taking a variation, we find a corresponding term in $\delta\bar U$.
The final ingredient is the metric perturbation $\gamma$. We don't have to consider variations of $\gamma$ directly, since they can be re-expressed in terms of variations of $\bar U$ and $\bar V$. $\gamma$ itself has a Fefferman-Graham expansion which includes information about the stress tensor at order $z^d$, but can have lower-order terms as well that depend on couplings and expectation values of operators. We will see that the important terms in the source that affect the δ-function response are those which are linear in $\gamma$.
Terms with $\delta\bar U$
Now we will analyze the possible terms in the source which can be obtained by piecing together the above ingredients. We begin with terms proportional to $\delta\bar U$. As stated above, there are dominant contributions to $\bar U$ in terms of the $n - m$ counting which are proportional to derivatives of expectation values of operators. But $\bar U$ does not occur alone in the source $J$, since all terms with $\bar U$ alone in the equation of motion are part of the linearized equation of motion on the left-hand side of (4.1). An additional factor of $\bar V$ does not affect the dominant $n - m$ value of the term, but the combination $\bar U\bar V$ is also prevented from appearing in $J$ by boost symmetry. We need to have at least another factor of $\bar U$, or else a factor of $\gamma$. The dominant possibility without using $\gamma$ is something of the form $\partial\bar U\,\partial\bar V\,\partial^2\delta\bar U$, where derivatives have been inserted to enforce the correct total dimensionality. Taking into account the derivatives, a term like this can have at most n − m = 3 using the unitarity bound. So this sort of term will not matter for the δ-function response.
Making use of $\gamma$ allows for more possibilities. Terms of the schematic form $\gamma\,\delta\bar U$ in the source can have $n - m > 2 - d$, and if we allow fine-tuning of operator dimensions we can even reach $n - m = 2 - d$. These sources are obtained by taking a state-independent term in $\gamma$ which is proportional to some power of $g_1$ and a term in $\delta\bar U$ which is proportional to $\partial_v^2\langle O_2\rangle$. We can even multiply by more factors of $\gamma$, giving $\gamma^l\,\delta\bar U$ schematically, as well as factors of $\bar V$, as long as we don't involve more factors of $\bar U$. A second factor of $\bar U$ brings with it a large $z$-scaling, so we run into the same problem we had above in the $\bar U\bar V\,\delta\bar U$ case. The end result is that all of the potentially-important terms in this analysis are linear in the expectation value $\langle O\rangle$. That means they are subject to restrictions on the modular Hamiltonian as mentioned in the Introduction, which means that they will actually not show up in (1.2) despite being allowed by dimensional analysis.
Terms with $\delta\bar V$
Now we consider terms in $\delta J$ that are proportional to a variation $\delta\bar V$. As discussed above, $\delta\bar V$ has several state-independent terms which are neutral in the $n - m$ counting. Due to the boost symmetry, $\delta\bar V$ cannot occur alone in $\delta J$. It must be accompanied by at least two factors of $\bar U$ or one factor of $\gamma$. We have already discussed how two factors of $\bar U$ have a large-enough $z$-scaling to make the term uninteresting, so it remains to consider factors of $\gamma$.
Terms in the source proportional to δV with only a single factor of γ are those present in the theory of linearized gravity about vacuum AdS. Furthermore, since we argued that boundary conditions at z = z * do not affect the answer, the Green's function we use to compute the effects of the source is also the same as we would use in linearized gravity about vacuum AdS. We already considered the linearized gravity setup in Section 3, even though we didn't solve it using the methods of this section. In Section 3 we saw that S vv = 2π T vv , and so it is enough for us now to prove that the general computation of the δ-function terms reduces to the linearized gravity case. There is only one more loose end to consider: terms in δJ proportional to δV that have more than one factor of γ.
With more than a single factor of $\gamma$, it is clear that the only contributions that could possibly be important at large $k$ are those coming from the powers of $z$ less than $z^d$ in (2.9). These terms are made up of couplings $g$, operator expectation values $\langle O\rangle$, and their derivatives. In order to have the correct boost scaling, we need to include $v$-derivatives acting on operator expectation values. As we have discussed many times, the unitarity bound prevents any term with more than one factor of $\langle O\rangle$ from being important. So just as with the $\delta\bar U$ terms discussed previously, all of these terms are subject to constraints from the modular Hamiltonian and hence do not appear in (1.2).
Our analysis so far has been very simple, but we have reached an important conclusion that bears repeating: the source terms which give the $k^0$ behavior for $\delta U_{(d)}$ were already present in the linearized gravity calculation of the previous section, and we are allowed to use the ordinary Green's function $G_{\rm AdS}$ to compute their effects. In other words, for the purpose of calculating the δ-function response we have reduced the problem to linearized gravity. We have shown previously that the linearized gravity setup leads to $S_{vv} = 2\pi\langle T_{vv}\rangle$, and so our proof is complete.

Non-Null Deformations
Having established S vv = 2π T vv for deformations of entangling surfaces restricted to lie in the plane u = 0, we will now analyze arbitrary deformations of the entangling surface to prove (1.7). The technique is very similar to that of the previous section. As discussed in Sec 2.2, there are additional assumptions and restrictions we make in this case to help us deal with divergences and to simplify the analysis. First, we restrict attention to theories where all relevant couplings, if present, have mass dimension greater than d/2. Second, we restrict the state so that operators with scaling dimension ∆ ≤ d/2 have vanishing expectation value near the entangling surface. Finally, we restrict the entangling surface itself to be planar prior to taking any variations.

New Boundary Conditions
Above we analyzed deformations within the null plane $u = 0$ at small $z$ and large $k$. These limits allowed us to show that the perturbation theory for $\delta U_{(d)}$ reduced to linearized gravity, which we had already studied in Section 3. The strategy here is the same, except we want to be able to perform perturbation theory on both $\bar U$ and $\bar V$ in order to get more than just the null-null variations. The simplest case, which is all that we will analyze in this work, is to start with the boundary condition $V = 0$ at $z = 0$ in addition to $U = 0$. In other words, we take our undeformed entangling surface to be the $v = u = 0$ plane. That is a severe restriction on the type of surface we are considering, but we gain the flexibility of being able to do perturbation theory in both $\bar U$ and $\bar V$. From (2.21) we obtain (5.1), where $\Delta S$ refers to the vacuum-subtracted entropy. Vacuum subtraction removes all state-independent terms from the entropy, including divergences. For the remainder of the section, we will drop the bulk entropy contribution.
With the $U = V = 0$ boundary conditions, we can again write down our perturbative extremal surface equation for the $z < z_*$ part of the bulk. Since the null direction is no longer preferred, we will use a covariant form of the linearized equation, (5.2). Following the same steps as in the previous section, we can use Green's functions to solve this equation in Fourier space. There is one new ingredient that we did not have before. When we computed the variation of $U_{(d)}$ with respect to $V$, we were changing the boundary conditions of $\bar V$ and computing the response in $\bar U$. In particular, the boundary condition of $\bar U$ itself remained zero. In the more general setup of this section, we need to compute the response of a particular component of $\bar X^\mu$ when its own boundary conditions at $z = 0$ are varied.
Since we only care about the δ-function contribution to the entropy variation, we will immediately use $\delta X^\mu(k) = e^{iky_0}\,\xi^\mu$ as the boundary condition for $\delta X^\mu$. Here $\xi^\mu$ is just a constant vector which tells us the direction of the perturbation. The presence of this boundary condition at $z = 0$ is simple to account for with one additional term in the integral equation for $\bar X^\mu$ compared to (4.3) in the previous section. In total, we now have (5.3). As above, in the large-$k$ limit the term coming from boundary conditions at $z = z_*$ (the first term in the second line of (5.3)) will drop out and so can be ignored completely. The term from boundary conditions at $z = 0$ (the first line of (5.3)) will not drop out automatically, and so will contribute to the second entropy variation. This contribution to the entropy variation is known as the entanglement density in the literature and was previously computed in [32,33]. From (5.3) it is clear that the entanglement density is completely determined by the AdS Green's function and is therefore state-independent. By restricting attention to the vacuum-subtracted entropy the entanglement density will drop out, and in any case is not proportional to a δ-function.

Terms in the Source
As in the null deformation discussion of Section 4, we need to compute the effects of the source $\delta J^\mu$. As we did there, we will accomplish this by cataloging the various terms which can appear in the power series expansion of $J^\mu$ as a function of $\bar X$ and $\gamma$. Again, terms which scale like $k^n z^m$ ultimately lead to $k^{n-m+d-2}$ dependence at large $k$ for $\delta X^\mu_{(d)}$. Any term in $\delta J^\mu$ will look like $\delta X^\nu$ multiplied by some function of $\gamma$ and $\bar X$. For the purposes of computing $\delta J^\mu$ only the state-independent parts of $\delta X^\nu$, represented by the first line of (5.3), will matter. That is because these terms are a function of the combination $kz$, which means they have $n - m = 0$. Now we just have to consider all of the possible combinations of $\gamma$ and $\bar X$ which multiply $\delta X$.
There cannot be any terms in $\delta J^\mu$ that are schematically of the form $\bar X\,\delta X$ with some derivatives but no factors of $\gamma$. Such a term would have to come from nonlinearities in the vacuum AdS extremal surface equation. That equation is invariant under $\bar X \to -\bar X$, so all terms have to have odd parity like the linear terms. Anything of the form $\bar X\bar X\,\delta X$, or higher powers of $\bar X$, will not contribute at large $k$ because of power counting: the vanishing boundary condition means that $\bar X$ starts at order $z^d$, which means that the most favorable possible term of this type, $(\partial_z\bar X)^2\,\partial_z^2\delta X$, still only amounts to a contribution to the entropy variation which scales like $k^{2-d}$.
Now we consider terms which have at least one factor of $\gamma$. Because we have assumed that all couplings have dimension greater than $d/2$ and that expectation values of operators with dimension $\Delta \leq d/2$ vanish, the leading order piece of $\gamma$ scales like $z^d$. Thus we can get contributions to $\delta X_{(d)}$ which go like $k^0$ from source terms which are schematically of the form $\gamma\,\partial^2\delta X$, as well as other combinations. Given their importance, we will analyze terms of the form $\gamma\,\delta X$ below in more detail.
Terms with additional factors ofX or γ beyond the first power of γ will not lead to non-decaying behavior at large k because of power counting. So we see that only the linear gravitational backreaction is necessary to completely characterize ∆S µν . We will now calculate those terms explicitly.
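The power counting in this subsection can be made concrete with the rule quoted above, that a source term scaling as $k^n z^m$ contributes $k^{n-m+d-2}$ to $\delta X^\mu_{(d)}$. The sketch below applies that rule to the two representative terms, assuming that each derivative acting on the state-independent part of $\delta X$ (a function of $kz$) costs one power of $k$, and that $\bar X$ and $\gamma$ both start at order $z^d$ under the restrictions of this section.

```latex
% Rule: source ~ k^n z^m  ==>  \delta X_{(d)} ~ k^{\,n-m+d-2}.
\begin{align}
  (\partial_z \bar X)^2\, \partial_z^2 \delta X
    &\sim (z^{d-1})^2\, k^2 = k^{2} z^{2d-2}
    &&\Rightarrow\ n - m + d - 2 = 2 - (2d-2) + d - 2 = 2 - d, \\
  \gamma\, \partial^2 \delta X
    &\sim z^{d}\, k^2
    &&\Rightarrow\ n - m + d - 2 = 2 - d + d - 2 = 0 .
\end{align}
% The first decays like k^{2-d} at large k (no \delta-function for d > 2);
% the second goes like k^0, i.e. a \delta-function in position space.
```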

Linearized Geometry
We have reduced our task to computing $J^\mu$ to linear order in $\gamma$ and $\bar X^\mu$ (the latter condition comes from our choice of a planar undeformed entangling surface). This is a simple exercise in expanding (2.15). The result in position space is (5.4), in which $a, b, c$ indices represent the $y$-directions and repeated indices are summed over. Taking the variation and evaluating at $\bar X^\mu = 0$ gives (5.5). The only terms in (5.5) that will contribute at $k^0$ are those with two $y$ derivatives acting on $\delta X^\mu$ or with $z$ derivatives, i.e., the first line of (5.5). Then the result for $\delta X^\mu_{(d)}$ at large $k$ is obtained from (5.3) as (5.6). Here we have explicitly included factors of the entangling surface metric $h_{ab}$ (which is equal to $\delta_{ab}$) rather than using repeated $a, b$ indices for added clarity. In the last line, we have used the dictionary (2.26) to replace $\gamma_{\mu\nu}$ with $\langle T_{\mu\nu}\rangle$. The first two terms of (5.6) correspond to δ-functions in position space. The final term clearly contains a δ-function piece which will end up being proportional to the trace of $\langle T_{ab}\rangle$, but it also contains off-diagonal contributions. We can use the identity (5.7) to see the full effect in position space. However, for our purposes we are only interested in the δ-function contribution. Isolating this part and combining it with the first two terms of (5.6), we ultimately find (5.8), whose last term is proportional to $n_{\mu\nu} h^{ab}\langle T_{ab}\rangle$, where $n_{\mu\nu}$ is the normal projector of the entangling surface. This completes our derivation of (1.7).

Discussion
We have found formulas for the δ-function piece of the second variation of entanglement entropy in terms of the expectation values of the stress tensor. In this section we conclude by discussing a number of possible extensions and future applications of this result.

Higher Orders in 1/N
Since we believe (1.2) and (1.7) to be valid at finite $N$, it must be that our calculations are not affected by higher-order corrections within holography. One potential source of higher-order corrections comes from incorporating quantum fluctuations in the geometry, rather than treating the geometry as a classical background. We have already addressed this issue in Section 2, but we will repeat it here. The problem of a fluctuating geometry arises because the metric fluctuation $\gamma_{\mu\nu}$ is actually a quantum operator, and as such a classical expression which is nonlinear in $\gamma_{\mu\nu}$ has an ambiguous quantum interpretation because, in general, $\langle\gamma_{\mu\nu}^2\rangle \neq \langle\gamma_{\mu\nu}\rangle^2$. However, our analysis has shown that the δ-function part of the second entropy variation is determined entirely by terms which are linear in $\gamma_{\mu\nu}$, and so this problem is avoided.
There are two other classes of higher-order corrections we can consider: those coming from higher-curvature corrections to the bulk gravity, and those coming from the bulk entropy. These corrections can be encapsulated in the all-orders formula [24,29]
$S = S_{\rm gen}[e(R)] = S_{\rm Dong}[e(R)] + S_{\rm bulk}[e(R)]$. (6.1)
The first term here is the Dong entropy functional [28], which is an integral of geometric data over the surface $e(R)$,$^{15}$ and the second term is the bulk entropy lying within the region bounded by $e(R)$. Finally, the surface $e(R)$ is the one that extremizes the $S_{\rm gen}$ functional.
If we ignore the S bulk term for a moment, then S Dong behaves qualitatively the same way as the area in the Ryu-Takayanagi formula. The coordinatesX µ of e(R) obey a certain differential equation, and the variations in the entropy are still related to δX µ (d) as before. One change is that the overall coefficient of δX µ (d) relative to the entropy will change in a way that depends on the bulk higher curvature couplings. However, the dictionary relating γ µν to T µν also changes in a way that precisely preserves (1.2) and (1.7) [12].
Incorporating the $S_{\rm bulk}$ term is simple in principle but difficult in practice. Since it is $S_{\rm gen}$ that must be extremized, we have to include an extra term in the extremal surface equation of motion proportional to $\delta S_{\rm bulk}/\delta\bar X^\mu(y)$. That means the bulk entropy itself plays a role in determining the position of the surface. It was argued in [34] (assuming some mild falloff conditions on variations of the bulk entropy) that the presence of this source could be incorporated to all orders simply by removing the explicit bulk entropy term from (2.20). In other words, calculating $\delta X^\mu_{(d)}$ using the correct quantum extremal surface equation is enough to properly account for all bulk entropy contributions to the total entropy variation. At order one in the large-$N$ expansion this prescription agrees with our analysis above, as it must. Beyond this, the most we can offer regarding the contributions of the bulk entropy are arguments of the type given above in Section 3. While this is a potential loophole in our arguments, we still believe that our evidence suggests that new contributions to (1.2) and (1.7) do not appear.

Local Conditions On ∂R Are Enough
We now briefly discuss why we expect that we can relax the stationarity conditions on the entangling surface to hold just in the vicinity of the deformation point. We will focus on the null-null case, but a similar result should hold in the non-null case (where it should also be true that our restriction on expectation values for operators with ∆ < d/2 is allowed to be local).
We can analyze the source (4.6) in a little more detail in the case where we only impose local stationarity near $y = y_0$. Even though in position space $\bar U(y_0, z)$ does not contain any state-independent terms at low orders in the $z$-expansion, the inherent non-locality of the Fourier transform means that $\bar U(k, z)$ will contain those terms. There are two ways this could affect (4.6): through $\delta\Psi = \delta\bar U$ or through the $h$-factor. In either case, the large-$k$ limit reduces the problem back to the globally-stationary setup.
For example, by setting $\delta V(k) = e^{iky_0}$ we can isolate the part of $\delta U_{(d)}$ that gives a δ-function localized at $y = y_0$. Then the important part of $\delta\bar V$ (i.e., the state-independent part) is proportional to $e^{iky_0}(kz)^{d/2}K_{d/2}(kz)$, as given in (6.2). Then we can organize (4.6) as a derivative expansion of $h$, with the leading term given by
$\delta J(k, z) \sim e^{iky_0}\, h(z, y_0)\,(kz)^{d/2} K_{d/2}(kz)$, (6.3)
and the remaining terms suppressed by powers of $k$. In other words, the integral over $k'$ in (4.6) combined with the $(k - k')$-dependence of $\delta\bar V$ essentially returns $h$ to position space localized near $y = y_0$. Only the first $d$ derivatives of $h$ at $y = y_0$ will be relevant at large $k$, so only the first $d$ derivatives of $U$ need to be set equal to zero at $y = y_0$ in order for the large-$k$ behavior to match the case where $U$ vanishes identically. Thus it is enough to have entangling surfaces which are in the $u = 0$ plane up to order $d$ in $y - y_0$. Note that this crude analysis does not strictly apply if the entangling surface cannot be globally written in terms of functions $U(y)$, $V(y)$. For example, an entangling surface which is topologically a sphere does not fall within the regime of our arguments. We leave an analysis of those types of regions for future work.
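The two limits of the Bessel-function profile that this argument relies on can be written out. This is a sketch using standard asymptotics of the modified Bessel function $K_\nu$; the normalization $2^{1-d/2}/\Gamma(d/2)$ is an assumption chosen so that the profile tends to 1 at $z = 0$, and the name $f$ is introduced here only for illustration.

```latex
\begin{equation}
  f(kz) \equiv \frac{2^{1-d/2}}{\Gamma(d/2)}\,(kz)^{d/2} K_{d/2}(kz) .
\end{equation}
% Small argument, K_\nu(x) \approx \tfrac12 \Gamma(\nu)\,(x/2)^{-\nu}:
\begin{equation}
  f(kz) \to 1 \qquad (kz \to 0) .
\end{equation}
% Large argument, K_\nu(x) \approx \sqrt{\pi/(2x)}\; e^{-x}:
\begin{equation}
  f(kz) \approx \frac{2^{1-d/2}}{\Gamma(d/2)} \sqrt{\frac{\pi}{2}}\,(kz)^{(d-1)/2}\, e^{-kz} \qquad (kz \gg 1) .
\end{equation}
```

The first limit matches the order-$z^0$ term of (4.8), and the second is the exponential suppression of the region $z \gg 1/k$ used in Section 4.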

Curved Backgrounds
It is interesting to ask what happens to this proof when the boundary spacetime is curved. Our arguments make it clear that S µν is completely determined by local properties of the state in the bulk and on the boundary. So naturally one would expect that there is a curved-space analogue of the same formula. In [12,35], several local conditions on the entangling surface and spacetime curvature were found such that the QNEC would hold in curved space and be manifestly scheme-independent. We would expect that under those same conditions one could show that S vv = 2π T vv . Non-null variations in a curved background have yet to be explored, and it would be interesting to investigate aspects of the curved background setup in more detail.

Connections to the QFC and Gravity
An interesting application of our result is to the interpretation of Einstein's equations. Combining (1.7) with Einstein's equations leads to an explicit formula relating geometry to entropy. This result is the latest in a growing trend of connections between geometry and entanglement [36][37][38][39][40][41][42].
We can make a direct connection with the deep result by Jacobson of the Einstein equation of state [14]. There it was argued that Einstein's equations were equivalent to a statement of thermal equilibrium across an arbitrary local Rindler horizon, namely the equation δQ = T δS, together with an assumption that S is proportional to area. This argument used a thermodynamic definition of the entropy without mentioning quantum entanglement. We can give this result a modern interpretation with the equation S vv = 2π T vv .
The connection to our result is most easily phrased in terms of the generalized entropy for a field theory coupled to gravity, which is defined as
$S_{\rm gen} = S_{\rm Dong} + S_{\rm ren}$. (6.4)
Here $S_{\rm ren}$ is the renormalized entropy of the field theory system restricted to a region, and $S_{\rm Dong}$ is the same geometric functional of the boundary of the region introduced in Section 6.1, which at leading order is ${\rm Area}/4G_N$, with $G_N$ the renormalized Newton's constant. Variations of this quantity were considered in [16], where the conjecture $S_{{\rm gen},vv} \leq 0$ was dubbed the Quantum Focusing Conjecture (QFC). Inspired by the arguments of [14], we will consider evaluating $S_{{\rm gen},vv}$ on a surface passing through a given point in an arbitrary spacetime, where $v$ now denotes a null direction of our choosing. We will want to make sure that the surface is as close to stationary as possible in the $v$ direction. It is always possible to make the expansion and shear of our surface vanish at the chosen point, but generically these quantities will have nonzero derivatives along the surface. In order to keep our calculations well-defined, and avoid potential violations of the QFC [13], we should consider deformations which are integrated over at least a Planck-sized region of the surface [43]. While not strictly a δ-function, if the mass scales governing the matter sector are much less than the Planck scale then for all practical purposes this is the same as a δ-function deformation from the point of view of the matter entropy. The result of doing this type of deformation is [44]
$4G_N S_{{\rm gen},vv} = -R_{vv} + 4G_N S_{{\rm ren},vv} + O(\ell^2/L^4)$, (6.5)
where $L$ is the characteristic scale of the background geometry and $\ell$ is the Planck scale (or whatever other cutoff scale is appropriate for the effective gravitational theory).
The corrections at order $\ell^2/L^4$ come both from higher curvature corrections present in $S_{\rm Dong}$ beyond the ${\rm Area}/4G_N$ term, as well as from the generic non-zero derivatives of the expansion and shear at the central point of the deformation. Now suppose we imposed the principle that $4G_N S_{{\rm gen},vv}$ is always of order $\ell^2/L^4$, which is much smaller than the size $1/L^2$ of the first term $-R_{vv}$. Then it must be that this large contribution is canceled by $4G_N S_{{\rm ren},vv}$, which by our result above (or, more precisely, by the appropriate curved-space generalization) is equal to $8\pi G_N\langle T_{vv}\rangle$. In other words, we would be imposing
$R_{vv} = 8\pi G_N \langle T_{vv}\rangle$. (6.6)
This is the leading-order part of the full gravitational equations of motion, up to an unknown cosmological constant term coming from our restriction to null variations. The argument can also be run the other way, so that Einstein's equations, interpreted as the leading order part of the gravitational equations of motion, become equivalent to the statement
$4G_N S_{{\rm gen},vv} = O(\ell^2/L^4)$. (6.7)
We have essentially retraced the steps of [14], replacing Jacobson's original assumption of $\delta Q = T\delta S$ with this statement about the generalized entropy, together with (1.2).
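The step from the saturation principle to Einstein's equations, including the origin of the undetermined cosmological constant, can be spelled out as follows. This is a sketch: it takes (6.5) and the curved-space version of $S_{{\rm ren},vv} = 2\pi\langle T_{vv}\rangle$ as inputs, writes $k^\mu$ for the null direction $v$, and in the last step assumes conservation of the stress tensor and the contracted Bianchi identity, neither of which is discussed explicitly in this section.

```latex
\begin{align}
  4 G_N S_{{\rm gen},vv} &= -R_{vv} + 8\pi G_N \langle T_{vv}\rangle + O(\ell^2/L^4)
  && \text{(6.5) with } S_{{\rm ren},vv} = 2\pi \langle T_{vv}\rangle, \\
  4 G_N S_{{\rm gen},vv} = O(\ell^2/L^4)
  \;&\Longrightarrow\; R_{\mu\nu} k^\mu k^\nu = 8\pi G_N \langle T_{\mu\nu}\rangle k^\mu k^\nu
  && \text{for every null } k^\mu \text{ (leading order)}, \\
  &\Longrightarrow\; R_{\mu\nu} - 8\pi G_N \langle T_{\mu\nu}\rangle = f\, g_{\mu\nu}
  && \text{for some scalar function } f, \\
  &\Longrightarrow\; R_{\mu\nu} - \tfrac12 R\, g_{\mu\nu} + \Lambda\, g_{\mu\nu} = 8\pi G_N \langle T_{\mu\nu}\rangle
  && \text{with } \Lambda \text{ an undetermined constant.}
\end{align}
% Last step: taking the divergence and using \nabla^\mu T_{\mu\nu} = 0 together with
% \nabla^\mu R_{\mu\nu} = \tfrac12 \nabla_\nu R gives \nabla_\nu f = \tfrac12 \nabla_\nu R,
% so f = \tfrac12 R - \Lambda.
```

Null contractions cannot see a term proportional to $g_{\mu\nu}$, which is why $\Lambda$ is left unfixed by this argument.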

Proof for General CFTs
We view our results as sufficient motivation to look for a proof of (1.7) and (1.2) in general field theories. In conformal field theories, entanglement entropy can be calculated using the replica trick. A replicated CFT is equivalent to a CFT with a twist defect. Within the technology of defect CFTs, shape deformations of the entropy are generated by displacement operators (see [8] for a review of these concepts). The variation $\delta^2 S/\delta V(y)\delta V(y')$ is then related to the OPE structure of displacement operators in this setup. Since the coefficient of the $\delta$-function piece in (1.1) is fixed to have dimension $d$ and spin 2, one might be able to see that only the stress tensor could appear as a local operator in $S_{vv}$. It further needs to be shown that no other contributions that are non-linear in the state could appear in $S_{vv}$. Results in that direction will be reported in future work [45].

A Connections to the ANEC
In A.1 we briefly review the connection between the relative entropy and the ANEC. Equation (1.2) then implies an interesting connection between the off-diagonal second variation of the entropy and the ANEC. In A.2 we analyze this result in more detail for holographic field theory states dual to perturbative bulk geometries.

A.1 ANEC and Relative Entropy
As in Section 2.1, the region $R$ is a region whose boundary $\partial R$ lies in the $u = 0$ plane. We also consider a one-parameter family of such regions, indexed by $\lambda$, with the convention that increasing $\lambda$ makes $R$ smaller. In this section we will focus on a globally pure state reduced to these regions. The relative entropy (with respect to the vacuum) and its first two derivatives obey the following set of alternating inequalities:

$$S_{\rm rel} \ge 0, \qquad \frac{dS_{\rm rel}}{d\lambda} \le 0, \qquad \frac{d^2 S_{\rm rel}}{d\lambda^2} \ge 0. \tag{A.1}$$

The first two of these are general properties of relative entropy in quantum mechanics, known as the positivity and monotonicity of relative entropy, respectively. The third inequality is the QNEC together with strong subadditivity. We can also consider the entropy $\bar S$ and relative entropy $\bar S_{\rm rel}$ of the complement of $R$, which we will denote by $\bar R$. Since we specified that the global state is pure, we have $\bar S = S$. The set of inequalities obeyed by $\bar S_{\rm rel}$ is

$$\bar S_{\rm rel} \ge 0, \qquad \frac{d\bar S_{\rm rel}}{d\lambda} \ge 0, \qquad \frac{d^2 \bar S_{\rm rel}}{d\lambda^2} \ge 0. \tag{A.2}$$

From (2.6) and the analogous equation for $\bar S_{\rm rel}$, together with the monotonicity of relative entropy inequalities, we can conclude (A.3). This is the ANEC, and its connection to relative entropy was first pointed out in [4,46]. The relation (A.3) has interesting implications. Note that the integral of $\langle T_{vv}\rangle$ is completely independent of $\lambda$. If we let $\lambda \to \infty$, it must be the case that $dS_{\rm rel}/d\lambda \to 0$ or else positivity of relative entropy will be violated. Similarly, as $\lambda \to -\infty$ we must have $d\bar S_{\rm rel}/d\lambda \to 0$. Then we can say (A.4). From the definition of relative entropy, this means that (A.5) holds. So the diagonal and off-diagonal parts of the second variation of the entropy contribute equally when integrated over the entire one-parameter family of surface deformations. Since there are two $y$ integrals on the RHS of (A.5), naïvely one might have thought that a limiting case for $\dot V(y)$ existed which caused the RHS of this equation to vanish while leaving the LHS finite, but this is not true. We will say more about the order of limits involved in the holographic context below.
Applying the relation $S_{vv} = 2\pi\langle T_{vv}\rangle$ we see that, after integration, the off-diagonal variations can be related back to the ANEC, as in (A.6). This is a nontrivial consequence of (1.2). Note that $\delta^2 S_{\rm od}/\delta V(y)\delta V(y') \le 0$ by strong subadditivity [16].
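The chain of steps in this subsection can be made explicit in a short sketch. It assumes that the vacuum modular Hamiltonian of a region bounded by the cut $v = V(y)$ of the $u = 0$ plane takes the standard null-plane form written in the first display below, that the vacuum entropy does not depend on $\lambda$, and that $V(y;\lambda) = V_0(y) + \lambda\dot V(y)$ with $\dot V \ge 0$. These normalizations are assumptions made for illustration and may differ from those of (2.6).

```latex
% Assumed setup: R is the region v > V(y;\lambda) on u = 0, \bar R is its complement.
\begin{align}
S_{\rm rel} &= \langle K\rangle - \Delta S, &
K &= 2\pi\int d^{d-2}y \int_{V(y)}^{\infty} dv\,\bigl(v - V(y)\bigr)\,T_{vv}, \\
\bar S_{\rm rel} &= \langle \bar K\rangle - \Delta \bar S, &
\bar K &= 2\pi\int d^{d-2}y \int_{-\infty}^{V(y)} dv\,\bigl(V(y) - v\bigr)\,T_{vv}.
\end{align}
% First derivatives (boundary terms vanish because the weight vanishes at v = V).
% For a globally pure state \bar S = S.
\begin{align}
\frac{dS_{\rm rel}}{d\lambda}
  &= -2\pi\int d^{d-2}y\,\dot V(y)\int_{V(y)}^{\infty} dv\,\langle T_{vv}\rangle - \frac{dS}{d\lambda}, \\
\frac{d\bar S_{\rm rel}}{d\lambda}
  &= +2\pi\int d^{d-2}y\,\dot V(y)\int_{-\infty}^{V(y)} dv\,\langle T_{vv}\rangle - \frac{dS}{d\lambda}.
\end{align}
% Monotonicity (dS_rel/d\lambda <= 0 <= d\bar S_rel/d\lambda) then gives the ANEC:
\begin{equation}
\frac{d\bar S_{\rm rel}}{d\lambda} - \frac{dS_{\rm rel}}{d\lambda}
  = 2\pi\int d^{d-2}y\,\dot V(y)\int_{-\infty}^{\infty} dv\,\langle T_{vv}\rangle \;\ge\; 0 .
\end{equation}
% Using dS_rel/d\lambda -> 0 as \lambda -> +\infty and d\bar S_rel/d\lambda -> 0 as \lambda -> -\infty:
\begin{equation}
\int_{-\infty}^{\infty} d\lambda\,\frac{d^2 S_{\rm rel}}{d\lambda^2}
  = 2\pi\int d^{d-2}y\,\dot V(y)\int_{-\infty}^{\infty} dv\,\langle T_{vv}\rangle .
\end{equation}
% Second derivative from the definition of S_rel:
\begin{equation}
\frac{d^2 S_{\rm rel}}{d\lambda^2}
  = 2\pi\int d^{d-2}y\,\dot V(y)^2\,\langle T_{vv}\rangle\big|_{v=V(y)} - \frac{d^2 S}{d\lambda^2}.
\end{equation}
% Integrating over \lambda with dv = \dot V d\lambda at fixed y, the stress tensor term
% reproduces the right-hand side above, so \int d\lambda\, d^2S/d\lambda^2 = 0, i.e.
\begin{equation}
\int d\lambda \int d^{d-2}y\; S_{vv}\,\dot V(y)^2
  = -\int d\lambda \int d^{d-2}y\, d^{d-2}y'\;
    \frac{\delta^2 S_{\rm od}}{\delta V(y)\,\delta V(y')}\,\dot V(y)\,\dot V(y') .
\end{equation}
% Finally, inserting S_vv = 2\pi <T_vv> on the left-hand side:
\begin{equation}
2\pi\int d^{d-2}y\,\dot V(y)\int_{-\infty}^{\infty} dv\,\langle T_{vv}\rangle
  = -\int d\lambda \int d^{d-2}y\, d^{d-2}y'\;
    \frac{\delta^2 S_{\rm od}}{\delta V(y)\,\delta V(y')}\,\dot V(y)\,\dot V(y') \;\ge\; 0 .
\end{equation}
```

In this form the sign of the last expression follows from $\delta^2 S_{\rm od}/\delta V(y)\delta V(y') \le 0$ together with $\dot V \ge 0$, so the ANEC is recovered from the off-diagonal entropy variations alone.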

A.2 ANEC in a Perturbative Bulk
In this section we will investigate (A.6) in AdS/CFT for perturbative bulk states. Once again, we will drop the contributions of S bulk for simplicity. This amounts to considering coherent states in the bulk. From (3.4), we can see that for perturbative classical bulk states the bulk boost energy completely accounts for the off-diagonal entropy variation. Then from (3.7) we get As a consequence of (A.6) we then have the equation This is a nontrivial matching between the ANEC on the boundary and an associated ANEC in the bulk, made possible by the relationship betweenV andV that comes from solving the extremal surface equation: We can get some intuition for these equations by considering shockwave solutions in the bulk.
Superpositions of Shockwaves At linear order in the bulk perturbations we can take superpositions of shockwaves. This allows us to create any bulk and boundary stress tensor profile along the $u = 0$ plane, and in that sense represents the most general state for the purpose of this calculation. The bulk and boundary stress tensors would be and The single shockwave is the special case $\rho = E\,\delta(v)\,\delta^{d-2}(y)\,\delta(z - z_0)$. We can repeat some of the calculations we did before, but qualitatively the results will be the same. The deformed bulk extremal surface always "lags behind" the deformed entangling surface in a way that depends on $z$ and the width of the deformation, and as a result the bulk energy flux at finite deformation parameters will always be less than the boundary energy flux. Taking the deformation width to zero at finite deformation parameters will cause the bulk energy flux to drop to zero. It would be interesting to characterize this behavior directly in the field theory without the bulk picture.

B Free and Weakly-Interacting Theories
Our conjectures (1.7) and (1.2) are only meant to apply to interacting theories. In this appendix we will explain how the null-null relation (1.2) is violated in free theories, and indicate how it might be fixed when interactions are included.

B.1 The Case of Free Scalars
The case of free scalar fields for entangling surfaces restricted to $u = 0$ was analyzed extensively in [6], and we will make use of that analysis here. As in Section 2.1 we have a one-parameter family of regions indexed by $\lambda$. The deformation velocity $\dot V(y)$ is taken to be a unit step-function with support on a small region of area $A$ in the $y$-directions. The crucial point is to focus attention on the pencil of the $u = 0$ plane that is the support of $\dot V(y)$. As $\lambda$ varies, the entangling surface moves within this pencil but stays fixed outside of it.
The State and the Entropy For the purpose of constructing the state, we can model the full theory as a $1+1$-dimensional massless chiral boson living on the pencil, together with an auxiliary system consisting of the rest of the $u = 0$ plane. This is the formalism of null quantization, which is reviewed in [6]. There are two facts we're going to use to write down the state $\rho(\lambda)$ on the pencil+auxiliary system. First, in the limit of small $A$, the state on the pencil becomes approximately disentangled from the auxiliary system. The fully-disentangled, order-$A^0$ part of the state looks like the vacuum, while the leading correction goes like $A^{1/2}$ and consists of single-particle states on the pencil entangled with states of the auxiliary system. The second fact is that we can always translate our state in the pencil by an amount $\lambda$ so that the entangling surface is at the origin and the operators which create the state are displaced by an amount $\lambda$ from their original positions. A coordinate system where the entangling surface is fixed is preferable. Putting these facts together lets us write (B.1). The states $|i\rangle$ of the auxiliary system are merely those which diagonalize the $A^0$ part of $\rho$, and the $K_i$ are numbers specifying the eigenvalues.
As indicated above, the state $\rho^{(1/2)}_{ij}(\lambda)$ should be interpreted as a state on the half-line $x > 0$. We can write this state in terms of a Euclidean path integral in the complex plane, (B.2), where $\phi(x^\pm)$ refers to boundary conditions just above/below the positive real axis. The insertion $O_{ij}(\lambda)$ is a single-field insertion which specifies the state, (B.3). As in [6] we will normalize our field so that $\langle\partial\phi(z)\,\partial\phi(0)\rangle_{\rm vac} = -1/z^2$ and $T_{vv} = (\partial\phi)^2/4\pi A$. Then one can show that $Q \equiv S_{vv} - 2\pi\langle T_{vv}\rangle$ is given by (B.4), where $\alpha_{ij} = K_i - K_j$ and the function appearing there is defined in (B.5) for $z = re^{i\theta}$ with $0 \le \theta < 2\pi$. The quantity $Q$ is manifestly negative, as required by the QNEC, but it is not zero.
Recovering the ANEC In Appendix A.1 we showed how one can recover the ANEC by integrating the QNEC on a globally pure state. In the present context, we don't have any off-diagonal contributions to the entropy. Instead we have the function $Q$, and repeating the argument above would lead us to conclude (B.6). We can check this equation by integrating (B.4). Note that the assumption of global purity that was used in Appendix A.1 is crucial: the expectation value of $T_{vv}(\lambda)$ depends only on the part of the state proportional to $A$, which we have not specified and in principle has many independent parameters. For a globally pure state there is a relationship between that part of the state and the $A^{1/2}$ part of the state which we must exploit.
In the pencil+auxiliary model, the global Hilbert space consists of the full pencil plus a doubled auxiliary system. The doubling allows the auxiliary state to be purified. Let the global pure state be $|\Psi\rangle$. Then we have

$$|\Psi\rangle = |{\rm vac}\rangle \otimes \sum_i e^{-\pi K_i}\,|i\rangle\otimes|i\rangle + A^{1/2}\sum_{i,j} e^{-\pi\alpha_{ij}/2}\,|\Psi_{ij}\rangle\otimes|i\rangle\otimes|j\rangle + \cdots \tag{B.7}$$

Any subsequent terms will not affect the ANEC. The factor of $\exp(-\pi\alpha_{ij}/2)$ is purely for future convenience, and the $|\Psi_{ij}\rangle$ are not necessarily normalized. The expectation value of the ANEC operator in this state is given by (B.8). We can make contact with our earlier formulas by computing the density matrix $|\Psi\rangle\langle\Psi|$ and tracing over the second copy of the auxiliary system. We find that

$$\rho^{(1/2)}_{ij} = {\rm Tr}_{x<0}\bigl(|\Psi_{ij}\rangle\langle{\rm vac}| + |{\rm vac}\rangle\langle\Psi_{ji}|\bigr). \tag{B.9}$$
This lets us identify the part of $O_{ij}$ in the lower half-plane as the operator which creates $|\Psi_{ij}\rangle$. Then, in our previous notation, we find (B.10). Our job now is to reproduce this by integrating (B.4) with respect to $\lambda$. The main identity we will need is

$$\int_{-\infty}^{\infty} \frac{d\lambda}{(z-\lambda)^{2-i\alpha_{ij}}\,(w^*-\lambda)^{2+i\alpha_{ij}}} = \frac{4i\,e^{-2\pi\alpha_{ij}}\sinh\pi\alpha_{ij}}{\alpha_{ij}\,(1+\alpha_{ij}^2)\,(w^*-z)^3}\Bigl[e^{\pi\alpha_{ij}}\,\Theta(\tau)\Theta(\tau') - e^{-\pi\alpha_{ij}}\,\Theta(-\tau)\Theta(-\tau')\Bigr]. \tag{B.11}$$

Using this formula, the integral of (B.4) splits into two terms. We may combine them by exchanging $i$ and $j$ in the first term, leaving us with

$$\int d\lambda\, Q(\lambda) = -2\pi i \sum_{ij}\int dx\,d\tau\,dx'\,d\tau'\; \frac{\psi_{ij}(x,\tau)\,\psi_{ij}(x',\tau')^*}{(w^*-z)^3}\Bigl[e^{\pi\alpha_{ij}}\,\Theta(\tau)\Theta(\tau') - e^{-\pi\alpha_{ij}}\,\Theta(-\tau)\Theta(-\tau')\Bigr].$$

Coherent States For coherent states we obtain a correspondence between $Q$ and $\langle T_{vv}\rangle$ without integrating over $\lambda$. This must be true because coherent states satisfy $S_{vv} = 0$, but it is reassuring to see it happen explicitly. In a coherent state of the original $d$-dimensional theory, the pencil and auxiliary system factorize and the pencil is in a $1+1$-dimensional coherent state. In other words, we have (B.13). We can obtain $Q$ for this state by taking the general equation (B.4) and specializing to the case where $\psi_{ij} = \psi\,\delta_{ij}\exp(-\pi K_i)$. Making use of the normalization condition $\sum_i \exp(-2\pi K_i) = 1$ we find the simple expression (B.14). We recognize this as simply $-2\pi\langle T_{vv}\rangle_{\rm coherent}$, as expected.
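As a partial consistency check on the $\alpha_{ij}$-dependence of the prefactor in (B.11), one can compare with Cauchy's beta integral, a standard result. The sketch below fixes only the modulus of the coefficient and the cubic power of the denominator. The overall phase and the factors $e^{\pm\pi\alpha_{ij}}$ depend on the branch convention for the non-integer powers and are not checked here. The variables $p$, $q$, $a$, $b$, $t$ are placeholders introduced for this illustration, and $\alpha$ stands for a real $\alpha_{ij}$.

```latex
% Cauchy's beta integral, valid for Re p > 0, Re q > 0, Re(a+b) > 1:
\begin{equation}
\int_{-\infty}^{\infty}\frac{dt}{(p+it)^{a}\,(q-it)^{b}}
  = \frac{2\pi\,\Gamma(a+b-1)}{\Gamma(a)\,\Gamma(b)}\,(p+q)^{1-a-b}.
\end{equation}
% Specialize to a = 2 - i\alpha, b = 2 + i\alpha (so a + b = 4), with \alpha real:
\begin{align}
|\Gamma(2+i\alpha)|^2 &= (1+\alpha^2)\,|\Gamma(1+i\alpha)|^2
   = (1+\alpha^2)\,\frac{\pi\alpha}{\sinh\pi\alpha}, \\
\frac{2\pi\,\Gamma(3)}{\Gamma(2-i\alpha)\,\Gamma(2+i\alpha)}
  &= \frac{4\pi}{|\Gamma(2+i\alpha)|^2}
   = \frac{4\sinh\pi\alpha}{\alpha\,(1+\alpha^2)} .
\end{align}
% The power (p+q)^{1-a-b} = (p+q)^{-3} matches the (w^* - z)^{-3} dependence.
```

The beta integral requires the two branch points to lie on opposite sides of the integration contour. When they lie on the same side the contour can be closed in the other half-plane and the integral vanishes, which is consistent with the step-function structure on the right-hand side of (B.11).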

B.2 Weakly Interacting Effective Field Theories
In the main text we provided evidence that $S_{vv} = 2\pi\langle T_{vv}\rangle$ for interacting theories, but in the previous section we explained that for free theories $Q = S_{vv} - 2\pi\langle T_{vv}\rangle$ was nonzero, and in fact could be quite large. In this section we will show how we can transition from $S_{vv} \neq 2\pi\langle T_{vv}\rangle$ to $S_{vv} = 2\pi\langle T_{vv}\rangle$ when a weak coupling is turned on.$^{16}$ The essential point is that one should always consider the total variation $d^2S/d\lambda^2$ as the primary physical quantity. $S_{vv}$ is a derived quantity obtained by considering a limiting case of arbitrarily thin deformations. However, a weakly-coupled effective field theory in the IR comes with a cutoff scale, and we cannot reliably compute $d^2S/d\lambda^2$ for deformations whose width is below that scale. Now we will see how this can resolve the issue. In the free theory, as we have explained above, the second functional derivative of the entropy has the form

$$\frac{\delta^2 S_{\rm free}}{\delta V(y)\,\delta V(y')} = 2\pi\langle T_{vv}\rangle\,\delta^{(d-2)}(y-y') + Q\,\delta^{(d-2)}(y-y') + \left(\frac{\delta^2 S}{\delta V(y)\,\delta V(y')}\right)_{\rm od}. \tag{B.15}$$

The function $Q$ is related to the square of the expectation value of the field $\partial\phi$. This is especially obvious in the formula for the coherent state, (B.14), but the more general formula is essentially of the same form. In a free theory $(\partial\phi)^2$ has dimension $d$ and is exactly of the right form to contribute to a $\delta$-function. This fact was touched upon in the Introduction. When we turn on a weak coupling $g$, the dimension of $\phi$ will shift to $\Delta_\phi = (d-2)/2 + \gamma(g)$.$^{17}$ There will still be a term in the second variation of the entropy associated to $(\partial\phi)^2$, which we will call $Q_g$, but now it no longer comes with a $\delta$-function:

$$\frac{\delta^2 S_g}{\delta V(y)\,\delta V(y')} = 2\pi\langle T_{vv}\rangle\,\delta^{(d-2)}(y-y') + Q_g\,f_g(y-y') + (\text{other off-diagonal terms}). \tag{B.16}$$

Here $f_g$ is some function of mass dimension $d-2-2\gamma$ which limits to a $\delta$-function as $g \to 0$, such as $f_g(y) \sim \gamma/y^{d-2-2\gamma}$. So the $Q_g$ term has migrated from the $\delta$-function to the off-diagonal part of the entropy variation.
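To see why a kernel of the form $\gamma/|y|^{d-2-2\gamma}$ behaves like a $\delta$-function at small $\gamma$, one can integrate it over a ball. This is a minimal sketch that takes the model kernel literally, assumes $\gamma > 0$, and introduces $\ell$ for the radius of the ball and $\Omega_{d-3}$ for the area of the unit sphere in the $d-2$ transverse dimensions. Both symbols are chosen here for illustration.

```latex
% Assumed model kernel: f_g(y) = \gamma / |y|^{d-2-2\gamma}, with \gamma > 0.
\begin{align}
\int_{|y|<\ell} d^{d-2}y\;\frac{\gamma}{|y|^{d-2-2\gamma}}
  &= \Omega_{d-3}\,\gamma\int_0^{\ell} dr\; r^{d-3}\, r^{-(d-2-2\gamma)} \\
  &= \Omega_{d-3}\,\gamma\int_0^{\ell} dr\; r^{2\gamma-1}
   = \frac{\Omega_{d-3}}{2}\,\ell^{2\gamma}.
\end{align}
% As \gamma -> 0 at fixed \ell the total weight tends to \Omega_{d-3}/2, independent of \ell,
% while the kernel itself vanishes pointwise for y != 0: all of the weight sits at y = 0.
```

For small but nonzero $\gamma$ the integrated weight depends on the radius as $\ell^{2\gamma}$, which is the extra power of the deformation width that accompanies $Q_g$ once the variation is smeared over a profile of finite width.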
Now consider integrating (B.16) twice against a deformation profile of width $\ell$ and unit height to get a total second derivative of the entropy. Suppose that $\ell$ is very small compared to the length scales of the state, but still large compared to the cutoff. Then we have

$$\frac{d^2 S_g}{d\lambda^2} = 2\pi\langle T_{vv}\rangle\,\ell^{d-2} + Q_g\,\ell^{d-2+2\gamma} + (\text{other smeared off-diagonal terms}). \tag{B.17}$$

We can write $Q_g \sim Q M^{2\gamma}$, where $M$ is a mass scale characterizing the state and $Q$ is what we get in the $g \to 0$ limit. So at weak coupling, we can say that

$$Q_g\,\ell^{d-2+2\gamma} \sim Q\,\ell^{d-2}\,\bigl(1 + 2\gamma\log M\ell + \cdots\bigr). \tag{B.18}$$

Thus we find that the answer for the weakly-coupled theory is approximately the same as for the free theory, as long as $|\gamma\log M\ell| \ll 1$. The smallest we can make $\ell$ is of order the cutoff, and the condition that $\gamma\log M\ell$ remain small is analogous to the problem of large logarithms in perturbation theory. The renormalization group is typically used to get around the problem of large logarithms, and it would be interesting to apply those same ideas to the present situation.
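The expansion in (B.18) follows from rewriting the power of the deformation width as an exponential. The sketch below writes $\ell$ for the deformation width, a notational assumption, and the numerical values at the end are hypothetical and chosen only to illustrate the size of the correction.

```latex
% Using Q_g ~ Q M^{2\gamma}:
\begin{align}
Q_g\,\ell^{d-2+2\gamma}
  &\sim Q\,\ell^{d-2}\,(M\ell)^{2\gamma}
   = Q\,\ell^{d-2}\,e^{2\gamma\log M\ell} \\
  &= Q\,\ell^{d-2}\Bigl(1 + 2\gamma\log M\ell + 2\gamma^2\log^2 M\ell + \cdots\Bigr).
\end{align}
% Hypothetical numbers: \gamma = 0.01 and M\ell = 10^{-3} give
%   2\gamma\log M\ell \approx -0.138, \qquad (M\ell)^{2\gamma} \approx 0.871,
% so the first-order estimate 1 - 0.138 = 0.862 is already close to the exact factor.
```

The free-theory answer is therefore a good approximation until $\ell$ becomes so small that $\gamma\log M\ell$ is of order one, at which point the full power law $(M\ell)^{2\gamma}$ must be kept.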
This argument hints that for general effective field theories $S_{vv}$ may not have a good operational meaning in terms of physical observables. The relevant condition for isolating the $\delta$-function is that $(M\ell)^{2\gamma} \ll 1$ should be possible within the effective description. Clearly this can be done in an exact CFT with finite anomalous dimensions, but it should also be possible if the theory is approximately given by an interacting CFT over some large range of length scales. For instance, if an interacting CFT is weakly coupled to gravity and we consider states with energy $M$ much less than the Planck scale then it should be possible to have $(M\ell)^{2\gamma} \ll 1$ while maintaining $\ell \gg \ell_{\rm Planck}$. Finally, a more precise version of the arguments given above can be given by interpreting the second functional derivative of the entropy as an OPE. We hope to use these techniques to find the exact form of $f_g$ in future work [45].