The Second Law of Quantum Complexity

We give arguments for the existence of a thermodynamics of quantum complexity that includes a "Second Law of Complexity". To guide us, we derive a correspondence between the computational (circuit) complexity of a quantum system of $K$ qubits, and the positional entropy of a related classical system with $2^K$ degrees of freedom. We also argue that the kinetic entropy of the classical system is equivalent to the Kolmogorov complexity of the quantum Hamiltonian. We observe that the expected pattern of growth of the complexity of the quantum system parallels the growth of entropy of the classical system. We argue that the property of having less-than-maximal complexity (uncomplexity) is a resource that can be expended to perform directed quantum computation. Although this paper is not primarily about black holes, we find a surprising interpretation of the uncomplexity-resource as the accessible volume of spacetime behind a black hole horizon.


Quantum Complexity and Classical Entropy
Complexity theory, particularly its quantum version, is a new and relatively unknown mathematical subject to most physicists. It's a difficult subject with few quantitative results and, at least for the moment, no experimental guidance. Our original interest in complexity began with questions about black holes [1,2,3], but broadened into the issue of what happens to quantum systems between the time they reach maximum entropy and the much later time they reach maximum complexity.
The mainstream goals of complexity theory are to organize tasks into broad qualitative complexity classes. Our main focus will be somewhat different. Our concern is with the quantitative behavior of complexity as a system evolves. The two types of questions are by no means unconnected but they are different and probably require different tools. In this paper we will consider whether physics, especially statistical mechanics and thermodynamics, may be useful for analyzing the growth and evolution of complexity in generic quantum systems.
In particular we are interested in whether there is an analog, involving quantum complexity, for the second law of thermodynamics. In [4,5] such a Second Law of Complexity was conjectured, and invoked for the purposes of diagnosing the transparency of horizons, i.e., the absence of firewalls [6]. It was argued [5] that opaque horizons with firewalls are associated with states of decreasing complexity, and that as long as the complexity of the quantum state increases, the horizon will be transparent. A Second Law of Complexity would ensure that a black hole formed from natural processes will have increasing complexity and therefore a transparent horizon, at least for an exponentially long time.
In this paper we argue that the Second Law of Complexity for a quantum system Q is a consequence of the second law of thermodynamics for an auxiliary classical system A.
Two distinct notions of quantum complexity will be discussed. The first, denoted $C$, is computational complexity, also called circuit complexity or gate complexity. It measures the minimum number of gates required to prepare a given unitary operator, or a given state from an unentangled product state. The second is Kolmogorov complexity, denoted $C_\kappa$, whose relevance will become clear in Sec. 6.4.

The Evolution of Complexity
The object of interest is the time-development operator $U(t) = e^{-iHt}$, for a generic $k$-local system of the type that models black holes. The question of interest is how the computational complexity$^3$ of $U(t)$ evolves with time. Both black hole and quantum circuit considerations suggest the following conjecture, summarized in Fig. 1.
Figure 1: The conjectured evolution of quantum complexity of the operator $e^{-iHt}$, where $H$ is a generic time-independent $k$-local Hamiltonian. The complexity increases with rate $K$, and then saturates at a value exponential in $K$. It fluctuates around this value. Quantum recurrences occur on a timescale that is double-exponential in $K$; a very rare, very large fluctuation brings the complexity down to near zero. This figure would also describe the entropy of a classical chaotic system with $\exp[K]$ degrees of freedom.
The complexity $C(t)$ grows linearly,
$$C(t) = K t, \qquad (1.1)$$
for a time exponential in $K$. At $t \sim e^K$ the complexity reaches its maximum possible value $C_{\max}$ and flattens out for a very long time. This is the period of complexity equilibrium [5] during which the complexity fluctuates about the maximum (Eq. 1.2). On a much longer timescale of order $\exp[e^K]$ quantum recurrences quasiperiodically return the complexity to sub-exponential values. All of this is a conjecture which at the moment cannot be proved, but which can be related to other complexity conjectures [7].
The pattern described above is reminiscent of the evolution of classical entropy. If a classical system starts in a configuration of low entropy (all the molecules in the corner of the room), its subsequent evolution, as the gas comes to equilibrium, follows a curve similar to Fig. 1, but for entropy rather than complexity. However, for the classical case the linear growth of entropy will persist for only a time polynomial in the number of degrees of freedom, the maximum entropy will also be of order the number of degrees of freedom, and the recurrence time will be simply exponential and not doubly exponential.
$^3$ The concept of computational complexity that we are using is essentially the same as quantum circuit complexity, i.e., the minimal number of quantum gates needed to prepare a given unitary operator.
A simple and concise way to express the parallel is: The quantum complexity for a system of $K$ qubits behaves in a manner similar to the entropy of a classical system with $2^K$ degrees of freedom.
The primary goal of this paper is to understand this similarity.
In [4] a two-dimensional toy analog model for complexity was conjectured. (We recommend that the reader first read [4] before this paper.) The motivation for the toy model was Nielsen's geometric approach to complexity. Here we are going to consider the far more complex case based directly on a version of Nielsen's high-dimensional geometry [8] [9].
Another goal that we discuss is the construction of a resource theory of complexity in which the relevant thermodynamic resource would be the gap between the complexity of a system and the maximum possible complexity, which we call the 'uncomplexity'. This is expended by performing directed quantum computation, which means reducing the relative complexity of the initial state and the target state. We suggest that this resource can, under appropriate conditions, be used to do directed quantum computation in much the same way that in conventional thermodynamics free energy is used to do directed work. We will have more to say about this in Sec. 8.
A guide to the notations and conventions and units used in this paper can be found in Appendix A.
The Quantum System Q

Randomness
There are many problems in both classical and quantum physics that are extremely difficult when particular instances of the problem are considered. The strategy of averaging over ensembles of instances sometimes allows conclusions to be drawn about generic behavior that would not be possible for specific cases. A particular example, which has generated recent interest, is the SYK approach to scrambling. By averaging over an appropriate ensemble of time-independent Hamiltonians it is possible to show that almost all such Hamiltonians saturate the fast-scrambling bound [10] [11]. Potentially this kind of averaging can also be applied to questions about the evolution of complexity.
Another type of randomness is stochastic randomness in which a time-dependent statistically fluctuating (noisy) Hamiltonian is averaged over. Generally the more one averages over the easier it is to draw conclusions, and indeed stochastic averaging is easier than averaging over time-independent Hamiltonians. Of course, if our interest is in the behavior of time-independent Hamiltonians (as it is in this paper) it is not entirely clear that the lessons we learn from stochastic behavior are applicable.

k-locality
The systems we will consider are constructed from qubits and have a type of dynamics called $k$-local. Other than that they are very generic. The building blocks of a $k$-local Hamiltonian are Hermitian operators that involve at most $k$ qubits. The term "weight" applied to an operator means the number of single qubit factors that appear in the operator. A $k$-local Hamiltonian is one that contains terms of no higher weight than $k$. Ordinary lattice Hamiltonians with nearest neighbor couplings are $k$-local; in fact they are 2-local. But $k$-locality does not assume any kind of spatial locality. For example we may have Hamiltonians in which any pair of qubits directly interact. The general form of an exactly $k$-local Hamiltonian built out of standard qubits is
$$H = \sum_{i_1 < i_2 < \dots < i_k} \; \sum_{a_1 = \{x,y,z\}} \cdots \sum_{a_k = \{x,y,z\}} J^{a_1, a_2, \dots, a_k}_{i_1, i_2, \dots, i_k} \, \sigma^{a_1}_{i_1} \sigma^{a_2}_{i_2} \cdots \sigma^{a_k}_{i_k}. \qquad (2.1)$$
The SYK model [12,13,14,15] is another type of $k$-local system, built out of real anti-commuting degrees of freedom $\chi_i$ (Eq. 2.2). The SYK Hamiltonian (Eq. 2.3) is similar to Eq. 2.1. The SYK model is $k$-local when written in terms of fermions, but if we try to rewrite it in terms of standard qubit operators it will be highly non-local. Despite this, most of what we describe applies to it. For definiteness we will illustrate the principles for systems of standard qubits with Hamiltonians of the form Eq. 2.1.
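The notions of weight and $k$-locality used above can be made concrete with a short enumeration. The following is a minimal sketch in Python, not part of the original argument: the string encoding of Pauli operators (`"I"`, `"x"`, `"y"`, `"z"` per qubit) and the function names are illustrative assumptions.

```python
from itertools import product


def pauli_strings(K):
    """Yield every non-identity Pauli string on K qubits.

    Each string is a tuple with one letter per qubit:
    'I' for the identity, or 'x', 'y', 'z' for a Pauli matrix.
    (This encoding is an illustrative assumption.)
    """
    for s in product("Ixyz", repeat=K):
        if any(c != "I" for c in s):
            yield s


def weight(s):
    """Weight = number of single-qubit Pauli factors in the operator."""
    return sum(1 for c in s if c != "I")


def is_k_local(s, k):
    """An operator may appear in a k-local Hamiltonian if its weight is at most k."""
    return weight(s) <= k


def count_by_weight(K):
    """Map each weight w to the number of Pauli strings of that weight."""
    counts = {}
    for s in pauli_strings(K):
        w = weight(s)
        counts[w] = counts.get(w, 0) + 1
    return counts


counts = count_by_weight(4)
print(counts)                # number of operators at each weight for K = 4
print(sum(counts.values()))  # total number of non-identity Pauli strings
```

For small $K$ this makes visible how few of the possible operators are $k$-local: the count at fixed weight grows only polynomially in $K$, while the total grows like $4^K$.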
There is however a caveat. The SYK model is usually studied at low temperature where it has an approximate conformal invariance and behaves roughly like a near-extremal charged black hole. At low temperature, standard qubit models are different; for example they may have spin-glass behavior. Our interest will instead be in the high-temperature behavior where we expect both kinds of models to behave somewhat similarly to uncharged Schwarzschild black holes. At high temperature the conformal invariance does not play a role.
It should be noted that for systems of fermions or qubits the high-temperature limit is not a high-energy limit. The energy and entropy per qubit do not go to infinity at infinite temperature; in fact the entropy per degree of freedom does not change much between the usual SYK low-temperature regime and infinite temperature. It is also true that the ratio of the Lyapunov exponent to the energy-per-qubit tends to a finite constant as temperature increases. This is in contrast to the ratio of the Lyapunov exponent to temperature, which goes to zero in the high-temperature limit. This means that at higher temperatures the Maldacena-Shenker-Stanford bound [11] is not tight and a stronger bound might be expressed in terms of the energy-per-qubit rather than the temperature.
Hamiltonians of the type Eq. 2.1 are very common. They include lattice systems, for which the couplings are non-zero only for nearest neighbors on some ordinary lattice. But these "condensed matter" Hamiltonians are very rare in the space of the couplings. The generic k-local Hamiltonian is a fast-scrambler [16] [10], meaning that every qubit is coupled to every other qubit, but only through terms involving at most k qubits. We will be interested in this generic case. Averages over the J's will be dominated by fast scramblers.
For now assume that the $J$'s are known definite numbers, but keep in mind that the trick of averaging over Hamiltonians may make otherwise impossible problems tractable. To simplify the notation we will write Eq. 2.1 in the schematic form
$$H = \sum_I J_I \sigma_I, \qquad (2.4)$$
where $I$ runs over all $(4^K - 1)$ generalized Pauli operators$^5$, with the proviso that only the $k$-local couplings are non-zero.

The Quantum System
The quantum system Q consists of $K$ qubits interacting through a $k$-local Hamiltonian of the form Eq. 2.1. Adapting the discussion to fermionic degrees of freedom, as in Eq. 2.3, should be straightforward. We will not be interested in any particular Hamiltonian; following Sachdev-Ye and Kitaev we will consider the properties of the system when averaged over a Gaussian statistical ensemble of the $J$-coefficients. The probability for the $k$-local couplings $J_I$ to take a specified set of values is Gaussian (Eq. 2.5), with a constant $B_a$ that determines the variance of the distribution. The non-$k$-local couplings are assumed to be zero.
$^5$ By the generalized Pauli operators we mean the set of $3K$ Pauli operators $\sigma^a_i$ together with all possible products, with no locality restrictions.
The space of states is $2^K$-dimensional. Unimodular unitary operators are represented by $2^K \times 2^K$ matrices in $SU(2^K)$. These matrices can be thought of in two ways. The first is as operators acting on the state space of the $K$ qubits. The second is as wavefunctions of maximally entangled states of $2K$ qubits. In this latter sense the identity matrix represents a thermofield-double (TFD) state at infinite temperature. As such it is dominated by the highest energy states of the system.
Let us consider the variance of the Hamiltonian in the infinite temperature TFD state (Eq. 2.6). (Here and throughout this paper we normalize Tr so that $\mathrm{Tr}\, 1 = 1$.) The generalized Pauli matrices $\sigma_I$ satisfy
$$\mathrm{Tr}\, \sigma_I \sigma_J = \delta_{IJ},$$
so that for a Hamiltonian of the form Eq. 2.4 the variance is $\mathrm{Tr}\, H^2 = \sum_I J_I^2$. Note that the average of the Hamiltonian itself is zero since all the terms in $H$ have zero trace. We will use the notation $E$ to represent the energy relative to the ground state. This is not zero. The variance of $E$ is the same as the variance of $H$. The normalization of $J$ is a convention related to the normalization of time. We choose it by observing that fast scramblers are models for neutral static black holes. It is a general fact about such black holes that their dimensionless Rindler energy $E$ (defined relative to the ground state), and the variance of the dimensionless energy $(\Delta E)^2$, are both equal to the entropy. For the infinite temperature TFD state the entropy of each copy is $S = K$. It follows that the distribution of $J$'s should satisfy
$$\Big\langle \sum_I J_I^2 \Big\rangle_{av} = K, \qquad (2.9)$$
where $\langle \cdot \rangle_{av}$ denotes an ensemble average.
If the Hamiltonian is exactly $k$-local then the number of $J$-coefficients is
$$3^k \binom{K}{k}. \qquad (2.10)$$
Letting $J^2$ be the variance of any of the $J_I$, Eq. 2.9 gives
$$J^2 = \frac{K}{3^k \binom{K}{k}}. \qquad (2.11)$$
The same argument, when applied to the SYK model, correctly gives the variance, the only difference being the absence of the factor $3^{-k}$ in the SYK case$^6$. The relevance of these facts will become clear when we study the thermodynamics of the auxiliary system A.
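The normalization argument can be checked with a few lines of arithmetic. The sketch below assumes two things that follow from the surrounding text but are spelled out here for illustration: the number of weight-$k$ Pauli strings on $K$ qubits is $3^k \binom{K}{k}$ (choose $k$ sites, three Pauli matrices per site), and the total variance $K$ is shared equally among all couplings. The function names are hypothetical.

```python
from math import comb, factorial


def num_couplings(K, k):
    """Number of exactly weight-k Pauli strings: choose k sites, 3 Paulis per site."""
    return 3**k * comb(K, k)


def coupling_variance(K, k):
    """Variance J^2 of a single coupling.

    Assumes the ensemble average of sum_I J_I^2 equals K and that
    every coupling has the same variance.
    """
    return K / num_couplings(K, k)


def coupling_variance_large_K(K, k):
    """Leading large-K form, using comb(K, k) ~ K**k / k!."""
    return factorial(k) * K / (3 * K) ** k


for K in (10, 100, 1000):
    exact = coupling_variance(K, 2)
    approx = coupling_variance_large_K(K, 2)
    print(K, num_couplings(K, 2), exact, approx)
```

The large-$K$ form makes the factor $3^{-k}$ explicit, which is the factor that the text says is absent in the SYK case.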
Hamiltonians of the form Eq. 2.4 can easily be generalized to stochastic evolution by allowing the J's to have a time-dependence governed by a stochastic probability distribution. The resulting "Brownian circuits" were discussed in [17].

The Classical System A
For the moment we will ignore issues of complexity and define a classical system that represents the evolution of a quantum system as the motion of a non-relativistic particle moving on $SU(2^K)$. Later we will modify the geometry to a "complexity geometry" along the lines of [9].
The space $SU(2^K)$ is a homogeneous group space generated by $(4^K - 1)$ generators which in the Pauli basis are the generalized Pauli operators $\sigma_I$. Each point on $SU(2^K)$ corresponds to an element of $SU(2^K)$: it is a particular $2^K$ by $2^K$ unimodular matrix $U$. Up to an overall constant factor, the unique bi-invariant metric is given by$^7$
$$dl^2 = \mathrm{Tr}\, dU^\dagger dU. \qquad (3.1)$$
This metric is called 'bi-invariant' since it is invariant under both left- and right-multiplication: for any $W \in SU(2^K)$ and $V \in SU(2^K)$, transforming $U$ by
$$U \to W U V$$
does not change the metric distance in Eq. 3.1.
$^6$ The factors of 3 are due to there being three Pauli matrices.
$^7$ Throughout this paper the notation Tr refers to normalized trace such that $\mathrm{Tr}\, 1 = 1$.

Equations of Motion
The time evolution of the system with Hamiltonian Eq. 2.4 defines a moving point $U(t)$ which we may think of as the motion of a fictitious particle moving on $SU(2^K)$. The particle starts at the point $U = 1$, i.e., the identity matrix. The motion can be represented by ordinary classical mechanics. Begin with the Schrödinger equation for $U(t)$,
$$i \dot U = H U. \qquad (3.3)$$
This is a first order (in time) equation and, given the Hamiltonian, through every point $U$ there is a unique trajectory. We would like to write this in a way that does not make reference to a specific Hamiltonian. To that end we first solve for $H$,
$$H = i \dot U U^\dagger, \qquad (3.4)$$
and then use the time-independence of $H$ to obtain
$$\frac{d}{dt} \left( \dot U U^\dagger \right) = 0. \qquad (3.5)$$
This is the second order equation of motion of a non-relativistic particle moving on $SU(2^K)$. It is well known that such motion is along geodesics with constant velocity. In terms of general coordinates $X^M$ the equation of motion has the familiar form
$$\ddot X^M + \Gamma^M_{AB} \dot X^A \dot X^B = 0, \qquad (3.6)$$
where $\Gamma^M_{AB}$ are the Christoffel symbols derived from the standard metric on $SU(2^K)$.
In summary, there are two ways to specify the evolution of a unitary under a time-independent Hamiltonian. One is to specify $U(t = 0)$ and $H$ and use the first-order equation of motion Eq. 3.3. The second is to specify $U(t = 0)$ and $\dot U(t = 0)$ and use the second-order equation of motion Eq. 3.5. These two formulations are equivalent, since we can translate between $H$ and $\dot U$ using Eq. 3.4. In what follows we will find it more convenient to work with the second-order equation of motion, which makes no explicit reference to $H$ and instead stores that information in the initial condition $\dot U$.

Velocity-Coupling Correspondence
Note that the equation of motion Eq. 3.5 no longer makes reference to the Hamiltonian. That information is now encoded in the initial conditions. To see how this works we write the Hamiltonian Eq. 2.1 as
$$H = \sum_I J_I \sigma_I.$$
The Schrödinger equation takes the form
$$\dot U = -i \sum_I J_I \sigma_I U.$$
We can easily solve for $J_I$,
$$J_I = i\, \mathrm{Tr}\, \sigma_I \dot U U^\dagger.$$
At the origin $U = 1$ we may write
$$i\, \mathrm{Tr}\, \sigma_I \dot U = J_I.$$
The left side of this equation is the projection of the initial velocity onto the tangent space axes oriented along the Pauli basis. In other words the $J_I$ are the initial values of the velocity components $V_I$,
$$J_I = V_I \big|_{\rm initial}. \qquad (3.11)$$
We'll call Eq. 3.11 Velocity-Coupling correspondence, or just V/J-correspondence. A point to emphasize is that the classical mechanics described by the equation of motion Eq. 3.8 is not the theory of any particular Hamiltonian. It is the theory of all Hamiltonians of the form Eq. 2.1 with the $J$'s playing the role of initial velocities.
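The V/J-correspondence can be checked numerically in the simplest case of a single qubit ($K = 1$), where the generalized Pauli operators are just $\sigma_x, \sigma_y, \sigma_z$. The sketch below is illustrative only: it uses the closed form $e^{-iHt} = \cos(|J|t) - i \sin(|J|t)\, H/|J|$, valid for one qubit because $H^2 = |J|^2$, estimates $\dot U$ by a central finite difference, and recovers the couplings from $J_I = i\, \mathrm{Tr}\, \sigma_I \dot U U^\dagger$ with the normalized trace. The function names and the step size are assumptions made for the example.

```python
import math

# Pauli matrices as nested lists of complex numbers.
SX = [[0j, 1 + 0j], [1 + 0j, 0j]]
SY = [[0j, -1j], [1j, 0j]]
SZ = [[1 + 0j, 0j], [0j, -1 + 0j]]
PAULIS = [SX, SY, SZ]


def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][m] * b[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]


def dagger(a):
    """Hermitian conjugate of a 2x2 matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]


def ntrace(a):
    """Normalized trace, so that Tr 1 = 1."""
    return (a[0][0] + a[1][1]) / 2


def evolve(J, t):
    """U(t) = exp(-iHt) for H = sum_I J_I sigma_I on one qubit (J must be nonzero)."""
    norm = math.sqrt(sum(j * j for j in J))
    c, s = math.cos(norm * t), math.sin(norm * t)
    U = [[0j, 0j], [0j, 0j]]
    for i in range(2):
        for j in range(2):
            h_ij = sum(Jc * S[i][j] for Jc, S in zip(J, PAULIS))
            U[i][j] = (c if i == j else 0.0) - 1j * s * h_ij / norm
    return U


def velocity(J, t, h=1e-5):
    """Central finite-difference estimate of dU/dt."""
    up, um = evolve(J, t + h), evolve(J, t - h)
    return [[(up[i][j] - um[i][j]) / (2 * h) for j in range(2)]
            for i in range(2)]


def couplings_from_velocity(U, Udot):
    """Recover J_I = i Tr(sigma_I Udot U^dagger) from a point and its velocity."""
    UdUd = matmul(Udot, dagger(U))
    return [(1j * ntrace(matmul(S, UdUd))).real for S in PAULIS]


J = (0.3, -0.7, 0.5)
for t in (0.0, 1.3):
    print(t, couplings_from_velocity(evolve(J, t), velocity(J, t)))
```

At $t = 0$ this is the statement that the couplings are the initial velocity components; at later times the same combination $i \dot U U^\dagger$ still returns the same $J_I$, which is the conservation expressed by the second-order equation of motion.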

Action
The equations of motion Eq. 3.6, or equivalently Eq. 3.5, may be derived from an action$^8$, which takes a simple form when written in terms of $U$.

Conservation Laws
The A-system has a conserved Hamiltonian which is not to be identified with the Hamiltonian of the Q-system (namely Eq. 2.1). From the form of the Lagrangian one finds that the auxiliary energy is the same as the Lagrangian. The energy is of course just the familiar non-relativistic expression for a particle of unit mass. The other conservation laws follow from the bi-invariance of the metric. Invariance under right multiplication gives rise to conservation of the matrix elements of the right charges $Q^R_I$. The left charges $Q^L_I$ are also conserved, but they are not functionally independent of the right charges: the $Q^L_I$s can be written in terms of the $Q^R_I$s and the matrices $U$.
$^8$ The subscript $a$ refers to the auxiliary system; for a guide to conventions see Appendix A.

Ergodicity
Naively we might expect the motion generated by a generic time-independent $k$-local Hamiltonian to be ergodic on $SU(2^K)$. But in fact the motion is very far from ergodic. To see this, consider writing $e^{-iHt}$ in the energy basis,
$$e^{-iHt} = \sum_n e^{-iE_n t} |E_n\rangle \langle E_n|. \qquad (3.18)$$
For a given Hamiltonian there are $2^K$ energy eigenvalues and it follows that $U$ moves on a torus of dimension $2^K$. This is much smaller than the dimension of $SU(2^K)$, which is $4^K - 1 \approx 4^K$. The particular torus defined by Eq. 3.18 depends on the Hamiltonian. We may ask how big a space is swept out by varying over all Hamiltonians of the form Eq. 2.1. Specifically, does varying over Hamiltonians lead to an almost space-filling set on $SU(2^K)$? The answer is no; the number of parameters specifying $H$ (the $J$'s) is polynomial in $K$ and given by Eq. 2.10. Thus for a given $k$ the dimension of the set covered by $k$-local evolution is only slightly bigger than a $2^K$-dimensional subset.
On the other hand we may ask: For each Hamiltonian is the motion on the $2^K$-torus ergodic? Generically the answer is yes. Ergodicity is equivalent to the incommensurability of the energy eigenvalues, a condition which will be satisfied for almost all members of the ensemble of $J$'s.
To summarize, while the A-system is formally defined on a $4^K$-dimensional configuration space, the effective dimension of the system is actually much smaller, $\sim 2^K$.
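The dimension counting in this argument is easy to tabulate. The following sketch compares the three numbers that appear above for a few values of $K$: the torus dimension $2^K$, the group dimension $4^K - 1$, and the number of couplings, taken here to be $3^k \binom{K}{k}$ on the assumption of an exactly $k$-local Hamiltonian. The last entry, the torus dimension plus the number of couplings, is only a crude upper bound on the dimension swept out by all $k$-local evolutions, included for illustration.

```python
from math import comb


def dimensions(K, k):
    """Dimension counts for K qubits with exactly k-local couplings."""
    torus = 2**K                      # one phase per energy eigenvalue
    group = 4**K - 1                  # dimension of SU(2^K)
    couplings = 3**k * comb(K, k)     # assumed count of k-local couplings
    return {
        "torus": torus,
        "group": group,
        "couplings": couplings,
        "swept_upper_bound": torus + couplings,  # crude illustrative bound
    }


for K in (4, 10, 20):
    d = dimensions(K, 2)
    fraction = d["swept_upper_bound"] / d["group"]
    print(K, d, f"fraction of group dimension <= {fraction:.3e}")
```

Already for modest $K$ the bound is a tiny fraction of the group dimension, since the coupling count is polynomial in $K$ while the gap between $2^K$ and $4^K$ is exponential.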
In Sec. 2.1 we explained that by starting with a random time-dependent quantum Hamiltonian, a stochastic system can be defined. That stochastic system can be thought of as a classical stochastic version of the auxiliary system A. Reference [17] refers to such systems as Brownian circuits. In that case, since the Hamiltonian is now time-dependent, the motion on $SU(2^K)$ is a random walk not restricted to a torus: it fills up all $4^K$ dimensions and is ergodic on $SU(2^K)$.

The Distance Between Quantum States
Consider the question: how far apart are two quantum states $|A\rangle$ and $|B\rangle$? The usual measure of the distance between them is defined by
$$d_{AB} = \arccos |\langle A | B \rangle|. \qquad (4.1)$$
The distance $d_{AB}$ is bounded between 0 (when the two states are the same) and $\pi/2$ (when the two states are orthogonal). Two states $|A\rangle$ and $|B\rangle$ that are in some sense similar may nevertheless be orthogonal, and therefore as far apart as this measure allows. On the other hand we can consider two states $|A'\rangle$ and $|B'\rangle$ in which all of the qubits are mixed up (scrambled) by two very different scrambling operations. These two states would also be orthogonal, and therefore no further apart than $|A\rangle$ and $|B\rangle$. But clearly there is some sense in which $|A'\rangle$ and $|B'\rangle$ are much further apart than $|A\rangle$ and $|B\rangle$. The inner product distance of Eq. 4.1 fails to capture this difference.
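The insensitivity of the inner-product distance can be seen in a small example. The sketch below assumes the distance is $\arccos |\langle A|B\rangle|$ for normalized states, consistent with the bounds $0$ and $\pi/2$ quoted above, and represents states as lists of amplitudes; the function names and the example states are illustrative.

```python
import math


def inner(a, b):
    """Inner product <a|b> of two amplitude lists."""
    return sum(x.conjugate() * y for x, y in zip(a, b))


def fubini_study_distance(a, b):
    """Distance arccos|<a|b>|, with both states normalized first."""
    overlap = abs(inner(a, b)) / math.sqrt(inner(a, a).real * inner(b, b).real)
    return math.acos(min(1.0, overlap))  # clamp guards against rounding above 1


# Two-qubit examples in the basis |00>, |01>, |10>, |11>.
ket_00 = [1, 0, 0, 0]
ket_01 = [0, 1, 0, 0]                      # differs from |00> by one qubit flip
bell_plus = [1 / math.sqrt(2), 0, 0, 1 / math.sqrt(2)]
bell_minus = [1 / math.sqrt(2), 0, 0, -1 / math.sqrt(2)]

print(fubini_study_distance(ket_00, ket_00))          # identical states
print(fubini_study_distance(ket_00, ket_01))          # orthogonal product states
print(fubini_study_distance(bell_plus, bell_minus))   # orthogonal entangled states
```

Every orthogonal pair receives the same maximal distance, however the two states were prepared, which is the limitation the text goes on to address with relative complexity.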
The difference between the two senses of distance has operational consequences. Consider the first case with $|A\rangle$ and $|B\rangle$: it is not hard to create a coherent superposition of states, $\alpha|A\rangle + \beta|B\rangle$; nor is it hard to do an interference experiment that is sensitive to the relative phase of $\alpha$ and $\beta$; nor is it hard to cause a transition between $|A\rangle$ and $|B\rangle$. But doing any of these three things with $|A'\rangle$ and $|B'\rangle$ would be extremely difficult.
Distances according to the Fubini-Study metric of Eq. 4.1 are conserved under time evolution: the inner product between $U|A\rangle$ and $U|B\rangle$ is the same as between $|A\rangle$ and $|B\rangle$. But that does not mean that if they start easy to interfere, they will remain so: large differences between initially similar states can be created merely by the passage of time. Let's take the states $|A\rangle$ and $|B\rangle$ which are in some sense similar, although orthogonal. Now let's evolve them both by some generic Hamiltonian that allows all the qubits to interact. After a long time the evolved states are
$$|A'\rangle = e^{-iHt}|A\rangle, \qquad |B'\rangle = e^{-iHt}|B\rangle.$$
If the system is chaotic then the states $|A'\rangle$ and $|B'\rangle$ will be very different from one another, and also very difficult to interfere. Some kind of distance between the states will have grown very large. Moreover that distance will continue to grow long after the extra qubit has thermalized with the others. In fact it will grow until it becomes exponentially difficult to interfere the states.
Of course you could argue that the states $|A'\rangle$ and $|B'\rangle$ are easy to interfere. Just initially interfere $|A\rangle$ and $|B\rangle$ to make $\alpha|A\rangle + \beta|B\rangle$ and then evolve the superposition forward for time $t$. That is true, but the point is that this way of preparing $\alpha|A'\rangle + \beta|B'\rangle$ takes a very long time. With some locality assumptions we can show that there is no faster way to do it [7].
The question then is: Is there a different measure of the distance between states that captures the similarity of $|A\rangle$ and $|B\rangle$, and at the same time the large difference between $|A'\rangle$ and $|B'\rangle$? To our knowledge this fundamental issue has not been discussed before.
Here we propose that the answer is a metric based on a concept of relative complexity.
Consider all the unitary operators that can connect the two states,
$$|B\rangle = U |A\rangle. \qquad (4.3)$$
The relative complexity of $|A\rangle$ and $|B\rangle$ may be defined as the complexity of the least complex unitary operator satisfying Eq. 4.3. This of course tells us nothing unless we have a criterion for the complexity of a unitary operator. We shall be very brief here and just remind the reader of the concept of circuit complexity. We consider all $K$-qubit circuits composed of $k$-local gates that allow us to prepare $U$. For simplicity we take the gates to act in series,
$$U = g_N g_{N-1} \cdots g_1. \qquad (4.4)$$
The circuit complexity of $U$ is denoted $C(U)$. It is the minimum number of $k$-local gates that it takes to construct $U$ in this way. It depends on the choice of allowable gates; for example it depends on $k$, but the dependence is rather weak and we assume that it can be accounted for.
We will demand that whenever g is an allowed gate so too is g † ; it follows that the complexities of U and U † are the same. As a consequence the relative complexity is a symmetric function of |A and |B .
Relative complexity defines a notion of distance between states, a complexity metric, which is exactly what we want in order to know how hard it is to make transitions between states, to interfere them, and to measure the relative phases between them in a superposition.
Relative complexity can also be defined for a pair of unitary operators. Let $U$ and $V$ be such a pair. The relative complexity of $U$ and $V$ is just the complexity of $U^\dagger V$, or equivalently $V^\dagger U$.
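The definition of circuit complexity as a minimal gate count can be illustrated on a toy model small enough to search exhaustively. The sketch below is a classical stand-in and not the quantum construction of the text: it replaces unitaries on $K$ qubits by reversible maps on 3 classical bits, and takes as the (assumed) 2-local gate set all NOT and CNOT gates, each of which is its own inverse. A breadth-first search from the identity then gives the minimal number of gates for every reachable map.

```python
from collections import deque
from itertools import permutations

N_BITS = 3
SIZE = 2**N_BITS
IDENTITY = tuple(range(SIZE))  # a map is stored as a tuple: p[x] is the image of x


def not_gate(i):
    """Flip bit i."""
    return tuple(x ^ (1 << i) for x in range(SIZE))


def cnot_gate(c, t):
    """Flip target bit t when control bit c is set."""
    return tuple(x ^ (((x >> c) & 1) << t) for x in range(SIZE))


# Toy gate set (an assumption of this sketch); every gate is self-inverse,
# so the set is closed under taking inverses.
GATES = [not_gate(i) for i in range(N_BITS)] + [
    cnot_gate(c, t) for c, t in permutations(range(N_BITS), 2)
]


def compose(g, p):
    """Return the map 'p first, then g'."""
    return tuple(g[p[x]] for x in range(SIZE))


def inverse(p):
    inv = [0] * SIZE
    for x, y in enumerate(p):
        inv[y] = x
    return tuple(inv)


def all_complexities():
    """Breadth-first search: minimal gate count of every reachable map."""
    dist = {IDENTITY: 0}
    queue = deque([IDENTITY])
    while queue:
        p = queue.popleft()
        for g in GATES:
            q = compose(g, p)  # append one more gate in series
            if q not in dist:
                dist[q] = dist[p] + 1
                queue.append(q)
    return dist


COMPLEXITY = all_complexities()


def relative_complexity(u, v):
    """Complexity of u^{-1} v, the toy analog of the complexity of U^dagger V."""
    return COMPLEXITY[compose(inverse(u), v)]


print(len(COMPLEXITY), max(COMPLEXITY.values()))
```

Because the gate set contains the inverse of each gate, every map has the same complexity as its inverse in this toy model, and the relative complexity comes out symmetric in its two arguments, mirroring the statements made for unitaries above.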
Inspired by ideas of Nielsen [8,9] we will build a new auxiliary theory, A, based on a complexity metric.

Complexity Geometry
In this subsection we examine and adapt the ideas of Nielsen et al. [8,9] about 'complexity geometry'. The idea of the complexity geometry is to make a new metric on $SU(2^K)$, different from the standard metric, in which the distance between two elements of $SU(2^K)$ reflects their relative complexity. We will have a great deal more to say about complexity geometry in a forthcoming paper [18], in which we derive some of the results quoted below, and illustrate them with simple low-dimensional examples.
There is no single unique definition of complexity, even in the context of quantum circuits. The definition depends on the allowed set of gates. For example one possibility is to allow all one- and two-qubit gates. Another is to allow up to three-qubit gates, or to choose a discrete collection of gates as long as it is universal. Each gate set gives a different quantitative measure of the complexity of a unitary operator. Since any universal gate set can be simulated by any other universal gate set, the ambiguity is multiplicative and of order unity.
We get a different perspective by focusing on what is not allowed. In this way of thinking we assign a very large, or even infinite, complexity to all unallowed gates. For example we may allow arbitrary gates but penalize all those with weight greater than 2, i.e., those involving more than two qubits, by assigning them a large complexity. This strategy of allowing all gates but introducing a penalty for large gates underlies Nielsen's geometric approach.
The bottom line is that there is no unique definition of circuit complexity but rather there is a family of complexity measures, which under certain conditions may be multiplicatively related.
We need a concept of complexity that is appropriate for continuous Hamiltonian systems, and which matches expectations summarized by the toy model of [4]. Nielsen's idea of a geometry of computation, from now on called complexity geometry, is a good starting point. By a complexity geometry we will mean a non-standard metric on $SU(2^K)$ such that the minimum geodesic distance between points $U$ and $V$ is proportional to the relative complexity (or complexity distance) between them. Here are some of the features that such a geometry should have:
• It should be a geometry on $SU(2^K)$. The evolution of $U(t)$ defines a path on $SU(2^K)$. For a discrete quantum circuit the path consists of a sequence of discrete segments. The segments represent individual gates in the case of a series circuit, or $K/k$ gates for a parallel Hayden-Preskill circuit. For continuous Hamiltonian systems discrete paths are replaced by continuous paths generated by possibly time-dependent Hamiltonians. Figure 2 shows schematic representations of discrete and continuous paths through $SU(2^K)$.
Figure 2: The shaded regions in these figures depict the space $SU(2^K)$. The broken trajectory represents the evolution of a discrete quantum circuit and the smooth curve represents Hamiltonian evolution. In both cases computational complexity is identified with the shortest path between the identity and the unitary operator $U$.
• Hayden-Preskill circuits exhibit an effect called the switchback effect [2,3]. The switchback effect is closely related to scrambling [10]. This same effect appears in the complexity-action duality of [19,20] and we regard it as an important requirement that the geometry of computation should reproduce it. As we will see Nielsen's original complexity geometry fails in this respect and requires significant modification.
• The distance function should satisfy the triangle inequality. Given two unitary operators $U$ and $V$ the complexity of the product $UV$ should be less than or equal to the sum of the complexities of the two operators. This follows from the definition of complexity. The triangle inequality is not enough to prove that complexity geometry is Riemannian, but we will assume that it is.
• The geometry should be right-invariant. Consider the construction of $U$ by a sequence of gates in time order starting with the unitary operator $W$:
$$U = g_N g_{N-1} \cdots g_1 W. \qquad (4.5)$$
The relative complexity of $U$ and $W$ is the minimum number of gates that satisfies this equation. Now multiply on the right by an arbitrary unitary $V$,
$$UV = g_N g_{N-1} \cdots g_1 W V.$$
It follows that the relative complexity of $UV$ and $WV$ is the same as the relative complexity of $U$ and $W$. This is not true of $VU$ and $VW$. To see this we write
$$VU = \left( V g_N g_{N-1} \cdots g_1 V^\dagger \right) V W.$$
The operator $V g_N g_{N-1} \cdots g_1 V^\dagger$ will generally not be a product of $N$ gates. Thus the complexity distance is right-invariant but not left-invariant. Right-invariance is enough to ensure that the geometry is homogeneous.
• All right-invariant metrics are parameterized by a symmetric "moment of inertia" tensor$^{12}$ $\mathcal{I}_{IJ}$, in terms of which the metric is written [21]. The metric should penalize motions along directions $\sigma_I$ that are themselves highly complex. This is the analog of what in circuit theory corresponds to the requirement that gates be simple. Thus we require the metric distance along non-$k$-local directions to be increased relative to $k$-local directions. This is accomplished by an appropriate choice of $\mathcal{I}_{IJ}$. The ambiguity in choosing $\mathcal{I}$ is analogous to the ambiguity in the choice of a gate set for circuit complexity.
The matrix $\mathcal{I}_{IJ}$ should be chosen block diagonal with one block corresponding to the unpenalized $k$-local directions, and the other block corresponding to directions $\sigma_I$ containing more than $k$ single qubit operators. Being unpenalized, the $k$-local block is naturally taken to be the unit matrix with eigenvalues all equal to 1. The non-$k$-local block (the penalized block) should be positive definite, with eigenvalues greater than 1. The eigenvalues should increase as the weights of the $\sigma_I$ increase.
• It was shown in [4] that consistent descriptions of scrambling and the switchback effect require that generic sectional curvatures be negative, and of order$^{13}$ $1/K$. If no penalty is imposed, in other words if $\mathcal{I}_{IJ} = \delta_{IJ}$, the metric is bi-invariant and all sections are positively curved. The introduction of penalty factors tends to make the sectional curvatures negative, but it is not obvious that the natural order of magnitude is $1/K$. In Sec. 4.4 we will show that is indeed the case.
$^{12}$ The parallel with the equations for an asymmetric rigid body is intentional. The case $K = 1$ is mathematically the same as an ordinary rigid body in three dimensions, since $SO(3) = SU(2)/\mathbb{Z}_2$. The matrix $\mathcal{I}$ would be the moment of inertia tensor and $d\Omega/dt$ would be the angular velocity vector. The symmetric rigid body corresponds to Eq. A.1. We will have more to say about this in [18].
The original version of complexity geometry in [8,9] fails badly in this last respect. In our present notation the proposal is equivalent to choosing the non-$k$-local block to be diagonal with all eigenvalues being equal to the enormously large value $4^K$. As we will see this has the effect of making typical sectional curvatures negative (that's good) but huge, $\sim 4^K$ (not good). This is a far cry from the sectional curvatures $\sim 1/K$ required by [4] in order to reproduce the switchback effect. The penalty assumed in [9] is much too draconian and must be made more moderate.

A More Moderate Penalty
The reason why [9] chooses the penalty factor for non-k-local operators to be of order $4^K$ is that the most complex unitary operators have complexity of order $4^K$. In order to ensure that this is reflected in the properties of the complexity metric, the authors simply penalize all non-k-local operators by the common factor $4^K$. It is certainly true that the highest weight operators should be penalized by such a factor, but the switchback effect requires a much more gradual growth of the penalty as the weight of $\sigma_I$ increases. This will be seen in Sec. 4.4.
Let $w_I$ be the weight of the generalized Pauli operator $\sigma_I$. We'll assume that the moment of inertia tensor is diagonal,
$$I_{IJ} = \delta_{IJ}\, I(w_I). \qquad (4.10)$$
For $w_I \le k$ the coefficients $I(w_I) = 1$. Our basic assumption is that the penalty factors $I(w)$ for $w < K$ are independent of $K$. In other words the price that we pay for moving along the direction $I$ is independent of the total number of qubits and depends only on the weight of $\sigma_I$.

$^{13}$ At first sight this seems inconsistent with [4], where we claimed the curvature should be $1/K^2$. The reason for the discrepancy is that in [4] we assumed complexity is geodesic length rather than action as in Sec. 5.2. The factor of $\sqrt{K}$ relating length and action in Eq. 5.7 accounts for the difference. This is explained in Appendix C.
For $w_I > k$ we assume the eigenvalues smoothly increase from order 1 to order $4^K$. A simple behavior would be
$$I(w) = c\, 4^{(w-k)}, \qquad (4.11)$$
with $c$ some constant of order one.
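As an illustration of a weight-dependent penalty, the following minimal Python sketch assigns a penalty factor to a generalized Pauli operator written as a string over the letters I, X, Y, Z. The function names, the string encoding, and the specific exponential form $c\,4^{w-k}$ for $w > k$ are assumptions made for this sketch, chosen to match the behavior described above (unit penalty for $w \le k$, penalties that depend only on the weight, growth toward order $4^K$ at the highest weights). It is not taken from [18].

```python
def pauli_weight(pauli: str) -> int:
    """Number of non-identity single-qubit factors in a Pauli string such as 'XIZY'."""
    if any(ch not in "IXYZ" for ch in pauli):
        raise ValueError("Pauli string must contain only I, X, Y, Z")
    return sum(ch != "I" for ch in pauli)


def penalty(pauli: str, k: int = 2, c: float = 1.0) -> float:
    """Assumed penalty factor: 1 for weight <= k, c * 4**(w - k) for weight w > k."""
    w = pauli_weight(pauli)
    return 1.0 if w <= k else c * 4.0 ** (w - k)


if __name__ == "__main__":
    # Penalties depend only on the weight, not on the total number of qubits.
    for p in ["XIII", "XZII", "XZYI", "XZYX"]:
        print(p, pauli_weight(p), penalty(p))
```

With $k = 2$ and $c = 1$ the first penalized weight, $w = 3$, receives a factor of 4 in this sketch.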
We will now show that for$^{14}$ $k = 2$ the typical sectional curvatures are indeed negative and of order $1/K$, as required by the switchback effect.

$^{14}$ In a companion paper devoted to quantitative aspects of complexity geometry [18], we will calculate geometric properties of the complexity metric for various k-local systems. In the next section we will show the answer for one particular such system, and show how the sectional curvature is typically negative and of order $1/K$.

Sectional Curvature
Our intuition for how complexity geometry should work is based on the two-dimensional toy model of [3] and [4]. We will now argue that the toy geometry can be thought of as being embedded as a two-dimensional section of the full complexity geometry.
Certain basic facts about the evolution of complexity, including the switchback effect, can be summarized by two properties of such sections (that are briefly reviewed in Appendix C): the typical sectional curvatures are negative; and the magnitude of the sectional curvature should be of order 1/K.
We will now compute the two-dimensional sectional curvatures of the complexity geometry defined by Eqs. 4.8 & 4.9, and see that they indeed are negative and of order 1/K.
Our 'section' is the two-dimensional surface consisting of all geodesics through a given point (we will label this point the 'origin') that are generated by linear combinations of two k-local Hamiltonians. For definiteness we will choose k = 2 for the remainder of this section. By definition the sectional curvature is the curvature at the origin.
Without loss of generality we may choose one Hamiltonian to be $H$ and the other to be $H + \Delta d\theta$, where $\Delta$ is a 2-local operator orthogonal to $H$ and $d\theta$ is an infinitesimal angle. The surface defined in this way will generally not have zero extrinsic curvature, and so geodesics connecting two points on the section will in general take shortcuts off the surface. For the sectional curvature defined as the curvature at the origin, $t = 0$, this issue does not arise.

Figure: Two neighboring geodesics (in blue) leave the origin at $t = 0$; this corresponds to evolution under two nearby k-local Hamiltonians. The two geodesics are connected by a Jacobi vector (in red); this corresponds to the (non-k-local) connecting operator $e^{\Lambda} = e^{-iHt} e^{i(H+\Delta d\theta)t}$. As $t$ increases the connecting Jacobi vector grows, and the acceleration of this growth rate gives the geodesic deviation. In the bi-invariant metric the geodesics always converge, but on the complexity metric the geodesics can diverge for $I_3 > 4/3$.
The Loschmidt echo operator $e^{\Lambda}$ is defined by
$$e^{\Lambda} = e^{-iHt}\, e^{i(H+\Delta d\theta)t}. \qquad (4.12)$$
By the Baker-Campbell-Hausdorff formula,
$$\log\left(e^{A} e^{B}\right) = A + B + \frac{1}{2}[A,B] + \frac{1}{12}[A,[A,B]] - \frac{1}{12}[B,[A,B]] + \ldots$$
We find that, to order $d\theta$ and $t^3$, this comes to
$$\Lambda = i\, d\theta \left( t\,\Delta - \frac{i}{2}\, t^2\, [H,\Delta] - \frac{1}{6}\, t^3\, [H,[H,\Delta]] \right). \qquad (4.14)$$
The metric distance along the infinitesimal interval defined by $\Delta$ is

where the dot-product is taken with the moment of inertia tensor $I_{IJ}$ as the metric$^{15}$. The metric distance along the radial direction (the $t$ direction) is

In total, to order $t^4$ and order $d\theta^2$ the metric is (4.17)

In order to evaluate the weighted traces, we need to know the weights of the operators involved. The first two terms are easy: since both $H$ and $\Delta$ are by assumption 2-local, and since 2-local terms by assumption are unpunished, we have $I = 1$ and so

$[H,[H,\Delta]]$ has both 2-local and 4-local pieces, but only the 2-local pieces survive when the trace is taken against $\Delta$, and so the third term also has $I = 1$. The fourth term is harder: $[H,\Delta]$ has both 1-local and 3-local pieces. However, as we will argue, in the limit of large $K$ the expression is dominated by the 3-local terms, so it is a good approximation to treat it as weighted by a factor we will denote $I_3$. In total we have

We have already assumed that $H$ and $\Delta$ have the exactly 2-local form

Let us now assume that all the coefficients are independent random variables with zero mean (for example they could be Gaussian, though that will not be essential). After averaging over the random variables we have (for large $K$)

Were it not for k-locality this quantity would be of order one, but because of k-locality all except a fraction $1/K$ of terms in $H$ commute with $\Delta$, so the answer is $\sim 1/K$. Putting it together we will find [18] an expression, Eq. 4.23, for the sectional curvature $R$ at $t = 0$ for $k = 2$ and large $K$.

Let us now examine the implications of this remarkable formula. First (as noted in [9]) the sectional curvatures will generically be negative if $I_3$ is large enough, $I_3 > 4/3$. Second, Eq. 4.23 shows that if $I_3$ is of order $4^K$ (as assumed in [9]) the curvature will also be of order $4^K$, which as we explained is too large.
In particular it is incompatible with the switchback effect and with the scrambling time being log K. (Instead, geodesics would deviate so violently that scrambling would be almost immediate.) However suppose that Eq. 4.11 governs the I w . In that case I 3 is about 4 and the curvature is of order 1/K.
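The threshold $I_3 > 4/3$ can be traced through a short worked calculation. The block below is a sketch under stated assumptions, not the computation of [18]: $\|\cdot\|$ denotes the unweighted (bi-invariant) norm, $[H,\Delta]$ is treated as purely 3-local with penalty $I_3$, the 2-local projections carry unit penalty, the section is written in polar form $dl^2 = dr^2 + f^2 d\theta^2$ with $r = \|H\|\,t$, and overall normalizations are illustrative.

```latex
% Sketch under stated assumptions; normalizations are illustrative.
\begin{align}
\Lambda &= i\,d\theta\left( t\,\Delta - \frac{i}{2}\,t^{2}\,[H,\Delta]
           - \frac{1}{6}\,t^{3}\,[H,[H,\Delta]] \right) + O(t^{4}, d\theta^{2}), \\
dl^{2} &\approx \|H\|^{2}\,dt^{2}
         + \|\Delta\|^{2}\,t^{2}\left[ 1 + \left(\frac{I_{3}}{4} - \frac{1}{3}\right) X\,t^{2} \right] d\theta^{2},
\qquad X \equiv \frac{\|[H,\Delta]\|^{2}}{\|\Delta\|^{2}}, \\
R\,\big|_{t=0} &= -\frac{1}{f}\,\frac{\partial^{2} f}{\partial r^{2}}\bigg|_{r \to 0}
 = -\,\frac{3 I_{3} - 4}{4}\;\frac{\|[H,\Delta]\|^{2}}{\|H\|^{2}\,\|\Delta\|^{2}} .
\end{align}
```

The $-1/3$ comes from the cross term between $t\,\Delta$ and the $t^3$ double commutator, using $\mathrm{Tr}\,\Delta[H,[H,\Delta]] = \|[H,\Delta]\|^2$, and the $I_3/4$ from the square of the $t^2$ term. Setting $I_3 = 1$ recovers the positive bi-invariant value $\frac{1}{4}\|[H,\Delta]\|^2/(\|H\|^2\|\Delta\|^2)$, while $I_3 > 4/3$ makes the curvature negative. The $1/K$ scaling then enters through the commutator ratio, which the text estimates to be of order $1/K$ for 2-local operators.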
(We have confirmed [18] that for all k-local generalizations of Eq. 4.21 the sectional curvature is generically $\sim k^2/K$. Again, this is because the probability that two randomly chosen k-local terms share a qubit and therefore fail to commute is roughly $k^2/K$.)

The significance of the negative curvature is that geodesics exponentially diverge with typical (local) Lyapunov behavior. This suggests that the motion in complexity space is chaotic. The Lyapunov exponent corresponds to the exponent that controls the 'growth of operators' obtained from out-of-time-order correlations [11]. However, to follow the exponential growth of complexity all the way out to the scrambling time we would need to compute higher orders in $t$, which we have not yet done.
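The $k^2/K$ estimate can be checked with a simple count. The sketch below computes the exact probability that two independent, uniformly random sets of $k$ qubits (out of $K$) overlap. The function name and the uniform-sampling model are assumptions of this illustration. Sharing a qubit is necessary but not sufficient for two Pauli terms to fail to commute, so this is an upper estimate of the non-commuting fraction.

```python
from math import comb


def overlap_probability(K: int, k: int) -> float:
    """Probability that two independent, uniformly random k-subsets of K qubits intersect."""
    if not 0 < k <= K:
        raise ValueError("need 0 < k <= K")
    # comb(K - k, k) counts the k-subsets that avoid a fixed k-subset (zero if 2k > K).
    return 1.0 - comb(K - k, k) / comb(K, k)


if __name__ == "__main__":
    for K in (10, 100, 1000):
        for k in (2, 3):
            print(K, k, overlap_probability(K, k), k * k / K)
```

For $K \gg k^2$ the exact value approaches $k^2/K$ from below.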
It should be noted that the sectional curvature at the origin only depends on the first penalty factor I 3 . In calculating the geodesic deviation to higher orders in t, the higher-weight penalty factors will appear.

Particle on the Complexity Geometry
Earlier we considered the motion of a particle on the bi-invariant geometry of $SU(2^K)$. What we really want to study is the motion on the right-invariant complexity metric Eq. 4.8.
What is the relation between the geodesics of the bi-invariant metric and those of the complexity metric Eq. 4.8? The answer is simple and easy to prove. Suppose the initial velocity components lie in the k-local subspace. In that case the geodesic will always lie in the k-local subspace. That follows from the right-invariance of the metric. Furthermore such a geodesic will be exactly the same for either metric, bi-invariant or right-invariant. This includes the length along the geodesic.
In fact we only care about these k-local geodesics, but we can generalize the above statement to a wider class. Suppose the initial velocity is any eigenvector of $I_{IJ}$, with eigenvalue $\lambda$. Again this will continue to be the case along the entire geodesic. Furthermore such geodesics will define the same curves for both metrics, but the length along them will differ by a factor equal to the square root of the corresponding eigenvalue, $\sqrt{\lambda}$. Since we are only interested in the k-local geodesics we can calculate them and their lengths from an action principle using either metric. In fact most of the discussion in Sec. 3 remains the same if we replace the bi-invariant metric with the complexity metric. The only exception is that the complexity metric is not left-invariant and as a result the left charges are not conserved.

Geodesic Deviation
Let us consider a pair of neighboring geodesics, both generated by k-local Hamiltonians H and H + ∆dθ. The geodesics intersect at the origin t = 0 as in Fig. 3.
Geodesic deviation is defined in terms of the rate of change of the length of the Jacobi vectors along the geodesics. Because length in the complexity metric and in the standard metric are not the same, geodesic deviation will be different in the two metrics. In particular the sign of the geodesic deviation is controlled by the sectional curvature of the section containing the two geodesics. The sectional curvatures in the standard metric are all positive, corresponding to geodesic convergence (negative geodesic deviation). By contrast in the complexity geometry the sectional curvatures are typically negative for large enough penalty factors, and geodesics diverge (positive geodesic deviation).

This property of negative sectional curvature is central to the duality between classical and quantum chaos that we are proposing. Ergodic behavior (see Sec. 3.5) is necessary for classical chaos but not sufficient. The additional ingredient is the sort of instability characteristic of negative curvature and geodesic deviation. Without being precise about the definition of chaos, positive deviation leads to the kind of sensitivity to initial conditions that characterizes chaos. However, because of the conservation laws the chaos of the A-system can only take place within a $2^K$-dimensional sub-manifold.

The fact that merely ergodic motion can be made to appear chaotic by changing the metric from the standard metric to the complexity metric is a mathematical representation of the differences discussed in Sec. 4.1. It is something that needs more study.
The negative curvature controls the Lyapunov exponents of the classical auxiliary model and the largest Lyapunov exponent may be identified with the quantum Lyapunov exponent discussed in [4].

Complexity Equals Action
The obvious guess would be that the complexity of the unitary operator $U(t)$ is the minimal geodesic distance separating it from the identity operator $\mathbb{1}$,

where the integral is taken along the shortest geodesic connecting $\mathbb{1}$ and $U$. However, with the normalization for the metric we have chosen in Eq. 3.1, rather than using geodesic length we instead use the action of the auxiliary system$^{16}$,

as discussed further in Appendix C, with the constraint that the conserved energy $E_a$ of the auxiliary system is equal to the actual dimensionless Rindler energy $E$ of the quantum system Q,
$$E_a = E. \qquad (5.4)$$
Since it is well known (see e.g. [22]) that the total energy of a black hole in Rindler units is proportional to its entropy $K$, we may identify
$$E_a = K. \qquad (5.5)$$
Thus we postulate that:

The complexity of a unitary operator $U$ is the minimum action of any trajectory connecting $U$ and the identity, subject to the condition that the energy $E_a$ of the particle is fixed and equal to $K$.
(The minimum action here refers to the action evaluated in the complexity metric, not the bi-invariant metric.) The relation between geodesic length Eq. 5.2 and the quadratic action of Eq. 5.3 is easy to derive,
$$\text{Action} = \sqrt{E_a}\;\text{Length}, \qquad (5.6)$$
or, using $E_a = K$,
$$\text{Action} = \sqrt{K}\;\text{Length}. \qquad (5.7)$$
Now let us argue that length and complexity $C$ are related in exactly the same way; in other words that
$$C = \sqrt{K}\;\text{Length}. \qquad (5.8)$$
In order to normalize length, we note that according to the standard (bi-invariant) metric the geodesic length between any two orthogonal unitary operators is $\pi/2 \sim 1$. Along k-local directions this is also the distance that it takes for $U$ to become orthogonal to its initial value$^{17}$. In other words two points $U$ and $U'$ are orthogonal if they are separated along a k-local direction by a distance $\pi/2 \sim 1$.

Second, the Aharonov-Anandan bound [23] tells us that the time for $U$ to become orthogonal to its previous value (the orthogonality time) is $\sim 1/\Delta E$, where $E$ here refers to the Q system. From Eqs. 2.8 & 2.9 we see that $\Delta E = \sqrt{K}$. It follows that the orthogonality time is $\sim 1/\sqrt{K}$. This is discussed in detail in [24] and also in [20]. On the other hand the rate at which effective gates act (the rate of complexity growth) is $K$. Therefore the number of gates that act in an orthogonality time is $\sim \sqrt{K}$.

Putting it all together we see that the number of gates corresponding to a geodesic distance $\sim 1$ is $\sqrt{K}$. Thus the complexity accumulated over a distance $L$ is
$$C = \sqrt{K}\, L,$$
where $L$ is length. It follows that complexity and length differ by precisely the same factor, namely $\sqrt{K}$, as action and length. The factor of $\sqrt{K}$ in Eq. 5.8 is the same factor that appears in Appendix C.
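The normalization argument above amounts to a few lines of order-of-magnitude bookkeeping, collected here as a sketch. The symbol $t_\perp$ for the orthogonality time is introduced only for this illustration, and every relation holds up to factors of order one.

```latex
% Order-of-magnitude bookkeeping only; all relations hold up to O(1) factors.
\begin{align}
t_{\perp} &\sim \frac{1}{\Delta E} \sim \frac{1}{\sqrt{K}}
  && \text{(orthogonality time)} \\
\frac{dC}{dt} &\sim K
  && \text{(rate at which gates act)} \\
\frac{dC}{dt}\, t_{\perp} &\sim K \cdot \frac{1}{\sqrt{K}} = \sqrt{K}
  && \text{(gates per unit geodesic distance)} \\
C &\sim \sqrt{K}\, L
  && \text{(complexity accumulated over length } L\text{)}
\end{align}
```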

The Growth of Complexity
From the ordinary nonrelativistic connection between kinetic energy and velocity, and from Eq. 5.5, we find that the velocity of the auxiliary particle satisfies $v \sim \sqrt{K}$. It also follows that the value of the Lagrangian is $L_a = K$. Our basic hypothesis, that complexity equals action, implies that the rate of growth of complexity is $L_a$. Thus we find that, as expected, complexity grows according to
$$C(t) = K t.$$

$^{17}$ The inner product of two unitary operators $U_1$ and $U_2$ is defined as $\mathrm{Tr}\, U_1^{\dagger} U_2$.
As we have seen, the classical motion generated by k-local Hamiltonians takes place on a sub-manifold of dimension slightly larger than $2^K$. Recall the conjecture of Sec. 1 that the quantum computational complexity of a $K$-qubit system Q evolves in a similar manner to the entropy of a classical system with $\sim 2^K$ degrees of freedom. The new conjecture should now be obvious: up to a factor to be determined, the ensemble-averaged complexity of Q is the entropy of A, denoted $S_a$. We will have to refine this idea, but in essence that's the proposal. Given an energy and entropy, the classical system A has its own thermodynamics which is quite distinct from the thermodynamics of Q. We may call it the thermodynamics of complexity.
As we mentioned at the end of Sec. 2 we can generalize the quantum system by allowing stochastic time dependence in the J's. The effect on the classical auxiliary system is to turn it into a problem of diffusion on the complexity geometry.

Statistical Mechanics of Complexity
As we mentioned early in this paper, the growth of complexity for a quantum system of K qubits resembles the growth of entropy for a classical system with an exponential number of degrees of freedom. We will now consider the statistical mechanics of A and how it is related to the complexity of Q.

Average Complexity Equals Entropy
The phase space probability distribution for a classical non-relativistic gas often separates into two factors, one depending on the positions of the particles and the other on the momenta,
$$P(x,p) = F(x)\,G(p). \qquad (6.1)$$
As a consequence the total entropy is a sum of two terms, the positional entropy associated with the distribution $F(x)$ and the kinetic entropy associated with $G(p)$,
$$S = -\int dx\, F(x)\log F(x) - \int dp\, G(p)\log G(p). \qquad (6.2)$$
It is not necessary that the system be in equilibrium for the entropy to separate in this way. It is only necessary that the probability factorizes.
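The additivity of entropy for a factorized distribution is easy to verify numerically. The sketch below uses small discrete distributions as a stand-in for the continuous phase-space densities. The discretization and the function names are assumptions of this illustration.

```python
from math import log


def entropy(probs):
    """Shannon entropy -sum p log p (natural log), skipping zero-probability entries."""
    return -sum(p * log(p) for p in probs if p > 0)


def joint_from_factors(F, G):
    """Joint distribution P(x, p) = F(x) * G(p), flattened to a list."""
    return [f * g for f in F for g in G]


if __name__ == "__main__":
    F = [0.5, 0.25, 0.25]   # positional distribution (need not be an equilibrium one)
    G = [0.9, 0.1]          # momentum distribution
    P = joint_from_factors(F, G)
    print(entropy(P), entropy(F) + entropy(G))
```

The two printed numbers agree for any choice of normalized `F` and `G`, equilibrium or not, which is the point made in the text: only factorization is needed.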
Let us now state the basic two-part conjecture. The first part is about the computational complexity of U and the positional entropy of the A-system. The second part is about kinetic entropy and Kolmogorov complexity. In both cases the term ensemble average implies an average over initial velocities, or by V/J-correspondence, an average over the couplings J.

Computational Complexity and Positional Entropy
Our conjecture states that: At any instant, the ensemble average of the computational complexity of the quantum system Q, is proportional to the classical positional entropy of the auxiliary system A.
There are two qualifications to note. The first is that we identify computational complexity with positional entropy instead of total entropy. The reason for this qualification is that computational complexity has only to do with the distance of a point U from the origin; in other words its position in complexity space, not its velocity. In subsection 6.4 we will consider kinetic entropy and its connection with complexity.
The other qualification is the use of proportional to rather than equal to. Computational (or circuit) complexity depends on a number of factors such as the gate set. We assume that different choices lead to a multiplicative ambiguity in the definition of complexity. On the other hand, if our conjecture is correct, a particular normalization of complexity will allow us to equate average complexity with positional entropy.
The conjecture can be stated in another way. We consider the number of unitary operators that can be reached by time-independent k-local Hamiltonians, with complexity less than or equal to $C$. Call it $N(C)$. Our conjecture amounts to the claim that for $C \gg 1$,
$$N(C) \sim e^{aC}, \qquad (6.3)$$
with $a$ being a constant independent of $K$, but dependent on the specific scheme (gate set, etc.) for defining complexity. If true it would allow us to define a normalization for complexity for which $a = 1$. An intuitive counting argument for the conjecture will be given shortly.

Kinetics
We have considered the positional aspects of entropy. Now let us consider the kinetic aspects. The auxiliary energy in Eq. 3.15 is simply expressed in terms of $J$, i.e. the energy is proportional to the sum of the squares of the couplings. Recall that the probability distribution $P(J)$ in Eq. 2.5 has the form of a Gaussian. Using the velocity-coupling (V/J) correspondence of Sec. 3.2 this distribution is seen to be a Maxwell-Boltzmann velocity distribution (Eq. 6.5). Alternatively it defines a Gibbs ensemble, with the constant $B_a$ being the inverse temperature of the auxiliary system.
The temperature may be determined in a number of ways, the easiest being to use the fact that every degree of freedom in a Maxwell-Boltzmann distribution has energy $T_a/2$. The result depends on the locality parameter $k$. For illustration we consider $k = 2$. The total energy is given by Eq. 5.5 as $E_a = K$ and there are of order $\frac{K^2}{2}\,3^2$ excited degrees of freedom$^{18}$. Thus the energy per degree of freedom is $2/(9K)$ and the temperature is
$$T_a = \frac{4}{9K}.$$
More generally, if the Hamiltonian is k-local instead of 2-local the temperature will satisfy
$$T_a \sim 1/K^{k-1}. \qquad (6.9)$$
(To be clear, $T_a$ is the temperature of the classical auxiliary system A; it is not the temperature of the quantum system Q.) There is of course an entropy associated with the probability distribution of the J's. By V/J-correspondence it may be thought of as the kinetic part of the total entropy.
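The equipartition estimate can be reproduced with a short count. The sketch below assumes that the excited degrees of freedom are the Pauli strings of weight exactly $k$ (choose $k$ of the $K$ qubits and one of three Pauli matrices on each), and uses $E_a = K$ with energy $T_a/2$ per degree of freedom. The function names and the exact-count convention are assumptions of this illustration.

```python
from math import comb


def num_k_local_terms(K: int, k: int) -> int:
    """Count Pauli strings of weight exactly k on K qubits: choose k sites, 3 Paulis each."""
    return comb(K, k) * 3 ** k


def auxiliary_temperature(K: int, k: int = 2) -> float:
    """Equipartition estimate T_a = 2 * E_a / N, with E_a = K and N excited degrees of freedom."""
    return 2.0 * K / num_k_local_terms(K, k)


if __name__ == "__main__":
    for K in (10, 100, 1000):
        # Compare the k = 2 estimate with 4 / (9 K), and show the k = 3 scaling ~ 1 / K^2.
        print(K, auxiliary_temperature(K, 2), 4 / (9 * K), auxiliary_temperature(K, 3))
```

For $k = 2$ the exact count gives $T_a = 4/(9(K-1))$, which matches $4/(9K)$ at large $K$. For general $k$ the count grows like $K^k$, so the estimate falls off as $1/K^{k-1}$.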

Kolmogorov Complexity and Kinetic Entropy
This raises an interesting question: What, if anything, does the kinetic term in the entropy have to do with complexity? It would be odd and maybe disappointing if one term in the auxiliary entropy (the positional entropy) was an average complexity and the other (the kinetic entropy) was not. We don't believe this to be the case.
Given that the velocities are related to the J-coefficients we can identify the kinetic entropy of the classical auxiliary system A with the entropy of the probability distribution P (J). (Since this is a function of the coupling constants J, this is a property not of the quantum state but of the quantum Hamiltonian; this is a consequence of Sec. 3.1, where we saw that the velocities in A are given by the quantum Hamiltonian in Q.) For a moment suppose the J's are each either 0 or 1. The Hamiltonian Eq. 2.1 would then be specified by a bit-string (0110100.....). It would be natural to ascribe a Kolmogorov complexity C κ (s) to the string s. Kolmogorov complexity measures the length of the shortest algorithm that can prepare a string. Applied to the string of J's it would define a Kolmogorov complexity for each specific instance of a Hamiltonian.
The Kolmogorov complexity is a measure of randomness which, unlike classical entropy, does not depend on probabilistic assumptions, or the existence of a statistical ensemble. In some respects it is a more physical quantity than entropy in that it is defined for each instance and does not make reference to the state of knowledge of the observer [25]. Its disadvantage is that it is uncomputable and difficult to work with. Fortunately under suitable assumptions the average Kolmogorov complexity is connected to entropy.
If we are given a statistical ensemble of bit-strings we may define two measures of randomness or genericity for the ensemble. The first is the good old entropy defined by the usual formula $-\sum_J P(J)\log P(J)$. The other is the ensemble average of the Kolmogorov complexity, $\sum_J P(J)\,C_\kappa(J)$. What, if anything, is the relation between these quantities? In fact under mild assumptions$^{19}$ the two are the same [25] [26],
$$\sum_J P(J)\,C_\kappa(J) \approx -\sum_J P(J)\log P(J).$$

The J's are real numbers, not binary digits. This means that to specify them with infinite accuracy will in general take an infinite amount of information, which means infinite Kolmogorov complexity (the same infinity that shows up in the classical entropy of continuous variables, such as velocity). We will fall back on a discrete approximation to the continuum. For example, suppose $J$ takes on real values on some interval. We can replace the real numbers by a fine lattice with spacing $\delta$. All together there are $\sim 1/\delta$ points on the lattice. A value of $J$ can be specified by an integer from 1 to $1/\delta$. It is well known that the typical Kolmogorov complexity of such an integer is of order $\log(1/\delta)$ and therefore diverges logarithmically as $\delta \to 0$.
Despite this divergence we still expect the ensemble averaged complexity to be the same as entropy. This is because the same $\log\delta$ divergence appears in the entropy: the probability for any value of $J$ is of order $\delta$, so the entropy $-\sum P\log P$ grows like $-\log\delta$. The average Kolmogorov complexity of the J's depends logarithmically on the tolerance in specifying the Hamiltonian$^{20}$.
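The logarithmic dependence of the entropy on the lattice spacing can be seen in a small numerical experiment. The sketch below bins a Gaussian distribution for a single coupling on a lattice of spacing $\delta$ and compares its Shannon entropy with the continuum expectation $\frac{1}{2}\log(2\pi e\sigma^2) + \log(1/\delta)$. The Gaussian width, the $10\sigma$ cutoff, and the function names are assumptions of this illustration.

```python
from math import e, erf, log, pi, sqrt


def binned_gaussian_entropy(delta: float, sigma: float = 1.0, cutoff: float = 10.0) -> float:
    """Shannon entropy (nats) of a zero-mean Gaussian binned on a lattice of spacing delta."""
    def cdf(x):
        return 0.5 * (1.0 + erf(x / (sigma * sqrt(2.0))))

    n_bins = int(round(2 * cutoff * sigma / delta))
    left = -cutoff * sigma
    total = 0.0
    for i in range(n_bins):
        # Probability mass of the i-th lattice cell.
        p = cdf(left + (i + 1) * delta) - cdf(left + i * delta)
        if p > 0.0:
            total -= p * log(p)
    return total


def continuum_prediction(delta: float, sigma: float = 1.0) -> float:
    """Differential entropy of the Gaussian plus the log(1/delta) cutoff term."""
    return 0.5 * log(2 * pi * e * sigma ** 2) - log(delta)


if __name__ == "__main__":
    for delta in (0.1, 0.01, 0.001):
        print(delta, binned_gaussian_entropy(delta), continuum_prediction(delta))
```

Halving $\delta$ raises the entropy by very nearly $\log 2$, the same one extra bit per coupling that is needed to specify $J$ to twice the precision.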
Before concluding this section we will give a circuit analogy. The analog of Hamiltonian evolution would be to start with a unitary circuit of small depth, call it $\Gamma$, and to repeat it over and over. Most of what we described here can be adapted to that case. Averaging over Hamiltonians would be replaced by averaging over an ensemble of $\Gamma$'s. In computer science terms $\Gamma$ is the program that determines what computation the circuit carries out. Part of the complexity of the entire computation is the Kolmogorov complexity of $\Gamma$. The ensemble average defines an entropy which, as we've seen, is related to kinetic entropy.
The kinetic entropy of the A system is time-independent and of order the number of J's. This is polynomial in $K$. On the other hand the positional entropy is time dependent and can grow to exponential size $\sim 2^K$ at equilibrium. During the early period of complexity growth the two can compete but in equilibrium the entropy is dominated by the positional term. To put it another way, the Kolmogorov complexity is essentially a fixed overhead having to do with the complexity of the algorithm, but after the algorithm has run for a long time the computational complexity vastly exceeds the fixed overhead.

$^{20}$ To be clear, we are calculating the Kolmogorov complexity of a time-independent Hamiltonian, with a tolerance $\delta$. The time $t$ does not appear. If instead we were calculating the Kolmogorov complexity of a quantum state evolving under a time-independent Hamiltonian (which, to be even clearer, is not the quantity of interest for the purposes of relating Q and A) we would find that this generically scales like $\log t$ at intermediate time. Consider the algorithm for specifying the state that first gives the (simple) initial state, then says 'evolve for time $t$', and then specifies the time-independent Hamiltonian to be used in the evolution. The first part is a fixed overhead that doesn't scale with $t$. The second part, specifying the time, requires $\log t$ bits. The third part, specifying the Hamiltonian, also requires $\log t$ bits, because to approximate $e^{-iHt}$ for a time $t$ requires an accuracy in $H$ that is an inverse polynomial in $t$ [27].
The computational complexity measures the total number of gates required to build the minimal circuit that generates the state. Even for a time-independent Hamiltonian, this scales like t since you need to keep paying over and over again to apply the same gates over and over again. The Kolmogorov complexity is (no more than) the number of bits in the most compressed possible description of this circuit. For time-independent Hamiltonians you do not need to keep paying over and over again as you concatenate identical sub-circuits, since you can just specify the total number of such sub-circuits with log t bits.
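The contrast between the two costs can be made concrete with a toy accounting. The sketch below compares the gate count of a sub-circuit repeated many times with the length of a description of the form "this block, repeated $n$ times". The description length is only an upper bound on the Kolmogorov complexity, up to a constant overhead. The function names and the bit-counting convention are assumptions of this illustration.

```python
def gate_count(gates_per_block: int, repetitions: int) -> int:
    """Circuit (computational) cost: every repetition of the block is paid for again."""
    return gates_per_block * repetitions


def description_bits(block_bits: int, repetitions: int) -> int:
    """Upper bound on description length: the block once, plus the repetition count in binary."""
    return block_bits + max(1, repetitions.bit_length())


if __name__ == "__main__":
    for n in (1, 2 ** 10, 2 ** 20, 2 ** 30):
        # Gate count grows linearly in n; the description grows only like log2(n).
        print(n, gate_count(10, n), description_bits(200, n))
```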
Whether or not we add the Kolmogorov complexity to the circuit complexity to define a total complexity is a matter of definition. In ordinary thermodynamics the two kinds of entropy are transmutable into each other, for example by adiabatic compression or expansion, so adding them is essential. In the present context one thing is clear: the Kolmogorov complexity of Γ and the circuit complexity both contribute to the overall complexity of carrying out a computation.

A Counting Argument
The set of operators reached by evolving with k-local Hamiltonians forms a space of dimension not much bigger than $2^K$. Ideally we would like to know how much of the volume of that space is occupied by operators of complexity $C$. The conjecture of Eq. 6.3 is that it is exponential but we haven't proved it. Brownian or random circuits which fill all $(4^K - 1)$ dimensions of $SU(2^K)$ are easier to analyze. The counting problem in this case is the unrestricted counting of all unitary operators with complexity less than or equal to $C$. We'll do that counting now.
There is an important difference between the time-independent Hamiltonian model and stochastic random circuit models. The difference has to do with the Kolmogorov complexity of the circuit. In both the time-independent Hamiltonian model, and the repeated-Γ model the Kolmogorov complexity is essentially a fixed overhead which does not grow linearly as the circuit evolves. Thus whether we include it or not, we may ignore it over long timescales. This is not the case for random circuits, where at each time step a new random choice of gates has to be made. It is evident that the Kolmogorov complexity increases linearly with time and therefore, if included, it will contribute to the growing total complexity of an evolving circuit. In what follows we include the Kolmogorov complexity in the counting for a stochastic or Brownian circuit.
In the simplest model at each instant a single gate acts. If we only have to choose from a small gate set, the Kolmogorov complexity per gate would be of order 1 and would not be very important. But at each step the choice also involves which set of $k$ qubits the gate acts between. For example in the case $k = 2$ there are $K(K-1)/2$ possibilities to choose from. That means that each gate adds a Kolmogorov complexity $\sim \log K^2$. We can easily account for this by assigning a complexity $\log K^2$ to each gate. (We would not do this in the $\Gamma$ model in which the Kolmogorov complexity is essentially a fixed overhead. In that case each gate is assigned complexity $O(1)$.) The full complexity of a unitary operator in the stochastic model is $\log K^2$ times the minimum number of gates that are required to prepare $U$.
We can give a rough counting argument for how complexity grows. The argument is closely related to one given by Roberts and Yoshida [28]. Let's consider a path through $SU(2^K)$ defined by a series of $n$ 2-qubit gates,
$$U(n) = g_n\, g_{n-1} \ldots g_1. \qquad (6.12)$$
The gate-set is assumed universal and includes $m$ gate-types which can act on any pair of qubits. Thus each gate involves a choice of $\frac{mK(K-1)}{2}$ possibilities. The system of paths defined this way forms a tree [4]. The tree is a discrete analog of complexity geometry.
The number of such paths of length $n$ is
$$N(n) \sim \left(\frac{mK(K-1)}{2}\right)^{n} \sim e^{\,n\log(mK^2)}. \qquad (6.13)$$
Does each path produce a different unitary operator or are there collisions where two paths produce the same operator? Because of the very high dimensionality of $SU(2^K)$ collisions of this type are rare until $n$ is very large. In fact the fundamental assumption that this work is based on is that collisions do not generically occur until $n$ is exponential in $K$.
Under these conditions the set of unitary operators that can be reached in this way includes all $U$ with complexity less than $C = n\log K^2$, and no $U$ with complexity greater than $n\log K^2$. The conclusion is that the number of unitary operators with complexity less than or equal to $n\log K^2$ is $N(n)$. Thus the number of $U$s with complexity of order $C$ is
$$N(C) \sim e^{C}. \qquad (6.14)$$
We may think of all the operators with complexity between $C$ and $C + \delta C$ as living in a shell of volume $e^{C}$ surrounding the root of the tree. The positional entropy of an ensemble supported in this shell is the logarithm of this volume and is therefore $S_a \approx C$.

This counting argument relies on the assumption that at subexponential times collisions are rare. This assumption seems particularly warranted in the context of the stochastic random circuit model we have considered so far in this subsection. To establish Eq. 6.3 we'd like to make the same assumption in the context of unitary operators generated by the exponentiation of k-local time-independent Hamiltonians. In this case the assumption seems less secure, since by restricting ourselves to this special subset of unitaries we may have made collisions more likely. Nevertheless, the subset of unitaries that may be generated by k-local time-independent Hamiltonians is still exponentially big, and our conjecture is that this should be good enough to underwrite Eq. 6.3.
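The counting can be tabulated directly. The sketch below compares the logarithm of the number of gate sequences of length $n$ with the complexity $n\log K^2$ assigned to them in the stochastic model, assuming (as in the text) that collisions are negligible. The function names and the sample values of $K$, $m$, and $n$ are assumptions of this illustration.

```python
from math import log


def log_num_paths(n: int, K: int, m: int = 1) -> float:
    """log N(n) for N(n) = (m K (K - 1) / 2)**n, the number of length-n gate sequences."""
    return n * log(m * K * (K - 1) / 2)


def stochastic_complexity(n: int, K: int) -> float:
    """Complexity assigned to n gates when each gate is charged log K^2."""
    return n * log(K ** 2)


if __name__ == "__main__":
    n, m = 50, 4
    for K in (10, 10 ** 3, 10 ** 6):
        S = log_num_paths(n, K, m)        # positional entropy if collisions are ignored
        C = stochastic_complexity(n, K)   # complexity of the same set of paths
        print(K, S, C, S / C)
```

The ratio $S/C$ is independent of $n$ and tends to 1 as $K$ grows, with corrections of relative size $\log(m/2)/\log K^2$. This is the sense in which $S_a \approx C$.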

A State-Complexity Argument
We'll give one more argument for Eq. 6.3, not based on operator complexity but on state complexity. Earlier, in Sec. 4.1, we discussed relative state complexity. In order to define absolute state complexity one needs a concept of a simple state. By a simple state we will mean one with no entanglement, for example the product state $|000..00\rangle$. Once one specifies what states are simple, the absolute state complexity of $|\psi\rangle$ just means the smallest relative complexity between $|\psi\rangle$ and a simple state. To say it another way, the state complexity of $|\psi\rangle$ is the minimum number of gates required to convert it to an unentangled state.
The geometry of state complexity is similar to that of unitary operator complexity [18]. The most important difference with unitary operator complexity is that the space $SU(2^K)$ is replaced by the projective space of normalized states $CP(2^K - 1)$.
In order to count states we have to regularize $CP(2^K - 1)$ by dividing it into cells of linear size $\epsilon$. The number of such cells in $CP(2^K - 1)$ is obtained by dividing the volume of $CP(2^K - 1)$ by the volume of a ball of radius $\epsilon$. The answer is that the number of states is given by

This is often simplified to $e^{2^K}$. We will return to the $\epsilon$ dependence in a moment. Now consider the number of states that have complexity $C$. Let us assume that it is exponential,
$$\#\,\text{states}(C) \sim e^{\alpha C}, \qquad (6.17)$$
where $\alpha$ is a constant to be determined. The maximum state complexity is $\sim 2^K$ and almost all states have that complexity. On the other hand the total number of $\epsilon$-balls in $CP(2^K - 1)$ is given by
$$\#\,\text{states} \sim e^{2^K}. \qquad (6.18)$$
Consistency of Eq. 6.17 and Eq. 6.18 requires $\alpha = 1$. Thus the number of quantum states with a given complexity grows as the exponential of the complexity. Taking the logarithm implies that average complexity is auxiliary entropy.

Coming back to the $\epsilon$ dependence, the logarithmic divergence in the counting of states is familiar in classical statistical mechanics. Strictly speaking the continuous nature of phase space implies that entropy is infinite. The divergence may be regulated by discretizing space and momentum space, and one finds the divergence being logarithmic as in the exponent of Eq. 6.16.
On the complexity side we have been a bit sloppy in claiming that the maximum complexity is $2^K$. Complexity, like entropy, also requires a cutoff, and a more correct statement is that the maximum complexity is $|\log\epsilon|\,2^K$. Thus the divergences in complexity and entropy match.
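As a rough check of the counting above, the ratio of volumes can be worked out explicitly. The sketch below assumes the Fubini-Study normalization in which $\mathrm{Vol}(\mathbb{CP}^n) = \pi^n/n!$ and approximates each cell by a flat ball of real dimension $2n$. The shorthand $n = 2^K - 1$ and the name $\mathcal{N}_\epsilon$ are introduced here only for illustration and are not the paper's notation.

```latex
% n = 2^K - 1 is the complex dimension of CP(2^K - 1)
\begin{align}
\mathrm{Vol}\big(\mathbb{CP}^{n}\big) &= \frac{\pi^{n}}{n!}, \qquad
\mathrm{Vol}\big(B^{2n}_{\epsilon}\big) = \frac{\pi^{n}}{n!}\,\epsilon^{2n}, \\
\mathcal{N}_{\epsilon} &\approx
  \frac{\mathrm{Vol}(\mathbb{CP}^{n})}{\mathrm{Vol}(B^{2n}_{\epsilon})}
  = \epsilon^{-2n}, \\
\log \mathcal{N}_{\epsilon} &\approx 2\,(2^{K}-1)\,|\log\epsilon|
  \;\sim\; 2^{K}\,|\log\epsilon| .
\end{align}
```

Under these assumptions the logarithm of the cell count scales as $2^K|\log\epsilon|$ up to an order-one factor, which is the same form as the regulated maximum complexity quoted in the text.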
The two arguments we've given, the counting argument and the state-complexity argument, are arguments for the plausibility of our conclusion, but are far from rigorous, and it would be interesting to explore this question further.

The Second Law
In this section we come back to the original question that we asked in Sec. 1: is there a Second Law of Complexity? Let us first discuss an obstruction to complete thermalization of A.

Obstruction to Thermalization
The Maxwell-Boltzmann velocity distribution in Eq. 6.5 is an initial condition connected with a choice of a Gaussian distribution for the coupling constants J. It is not a consequence of dynamical thermalization of the A system. In fact the large number of conservation laws associated with right-multiplication invariance creates an obstruction to thermalization. By contrast the tendency toward maximal positional entropy is not obstructed and takes place for each value of the conserved quantities.
There are $4^K$ conserved generators of right multiplication. They are given by Eq. 3.9. Within each leaf of the foliation (by the values of the generators) the auxiliary system with complexity metric is chaotic. This means that the positional entropy will grow with time and eventually reach its maximum, but if the initial velocities are not Maxwell-Boltzmann distributed, the system will never reach thermal equilibrium. This is roughly like a gas of completely free particles on a very large negatively curved Riemann surface. The kinetic energy of every particle is conserved, but the positions will spread out and eventually fill the space.
Granting the correspondence between average complexity and auxiliary entropy, we can give a rough analogy for the growth and evolution of computational complexity. Initially a large number $\sim 2^K$ of particles are located near the origin of a large box of volume $e^{2^K}$. The velocities are Maxwell-Boltzmann distributed. The gas begins to expand and the positional entropy grows. Eventually the gas fills the box and comes to equilibrium. It stays in equilibrium for a very long time but on timescales $e^{2^K}$ recurrences happen. Figure 1 is the result of translating this picture into the computational complexity of the system Q.
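The gas picture above can be illustrated with a deliberately simple toy model. The sketch below is not the auxiliary system A: it swaps the negatively curved space for a flat one-dimensional periodic box, uses free particles with conserved Gaussian velocities, and measures a coarse-grained positional entropy from a histogram. All names and parameter values (`free_gas_entropy`, `box`, `bins`, the particle number) are assumptions chosen for illustration.

```python
import numpy as np

def positional_entropy(x, box, bins):
    """Coarse-grained (histogram) entropy of the particle positions."""
    counts, _ = np.histogram(x, bins=bins, range=(0.0, box))
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log(p)).sum())

def free_gas_entropy(times, n=20000, box=100.0, bins=50, seed=0):
    """Positional entropy of free particles in a periodic box at each time.

    Velocities are drawn once from a Gaussian and never change, so the
    kinetic part of the distribution does not evolve; only the positions
    spread out.
    """
    rng = np.random.default_rng(seed)
    x0 = box / 2 + rng.normal(0.0, 0.5, n)  # all particles start close together
    v = rng.normal(0.0, 1.0, n)             # conserved velocities
    return [positional_entropy((x0 + v * t) % box, box, bins) for t in times]

times = [0.0, 1.0, 5.0, 20.0, 1000.0]
entropies = free_gas_entropy(times)
for t, s in zip(times, entropies):
    print(f"t = {t:7.1f}   positional entropy = {s:.3f}")
print("maximum possible:", np.log(50))
```

In this toy the entropy starts small, grows as the cloud spreads, and saturates near the logarithm of the number of cells once the gas fills the box, while every particle keeps its initial velocity.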

Second Law of Complexity
The thermodynamic laws of complexity are just the usual laws of thermodynamics applied to A. The second law, applied to positional entropy implies a second law of computational complexity [4]: If the computational complexity is less than maximum, then with overwhelming likelihood it will increase, both into the future and into the past.
The classical system A tends to positional equilibrium after a time polynomial in the number of classical degrees of freedom, and then remains in equilibrium for a classical recurrence time. This implies that the quantum system Q comes to complexity equilibrium after a time exponential in the number of qubits, and remains there for an even greater quantum recurrence time, the quantum recurrence time being doubly exponential in $K$. Thus we achieve our goal of understanding the growth of complexity for a $K$ qubit Q system (Fig. 1) in terms of the behavior of classical entropy for an A system of $2^K$ degrees of freedom.
In principle one can reverse the evolution of a large but finite system by intervening with a process which changes the sign of its Hamiltonian. In classical physics this reverses the trajectory in phase space and if it can be done with sufficient precision it will reverse the increase of entropy, causing an apparent violation of the second law of thermodynamics. The only problem is that decreasing entropy is unstable when the system is chaotic. The effect of a tiny change in a single degree of freedom will exponentially grow, and quickly reverse the decrease of entropy, turning it back to an increase.
We can apply this property of classical physics to the A-system and derive an important property of quantum complexity. In principle quantum states of a many body system can be prepared which will evolve toward decreasing complexity. But the quantum-classical duality between system Q and system A implies that the decrease is unstable. The application of a small perturbation to a single degree of freedom will exponentially spread through the system, and reverse the decrease of complexity. This phenomenon and its relation to negative curvature was studied in the toy model [4]. It can also be explicitly seen in black hole dynamics using the classical shock wave calculus of [29].
Finally, as pointed out in [4], the largest classical Lyapunov exponent of A is the quantum Lyapunov exponent [11] of Q.
In a similar way, when a black hole evolves, it uselessly generates complexity. Black holes are not only the fastest computers in nature, they are also the most useless. They implement the highest number of gates per unit mass per unit time, but which gates they implement are chosen by quantum gravity, not by the user. The result is computation that, while extremely fast, is undirected-useful only for those whose purpose is to simulate black holes.
But the second law of thermodynamics has another side to it beyond the inevitability of the increase in entropy, the side that led to its creation by steam-engineers. An entropy gap, namely the difference between the entropy of a system and the maximum entropy in thermal equilibrium, is a resource [30]. This resource can be harnessed to perform directed work.
In this paper we are interested in the question of whether complexity defines a resource that can be harnessed in a useful, directed, manner, in analogy with thermodynamic work. We expect that the analog of directed work is directed quantum computation, which we will call 'computational work'.
In exploring this conjecture, we will be guided by the analogy between the complexity of the quantum system Q and the entropy of the classical system A. This is an incompletely-defined idea, but nevertheless we will give some reasons to believe that a resource interpretation of complexity exists.
Without giving a formal definition of thermodynamic 'work', for a process to do work it must have the following features:
1. Doing work enacts a directed transition from one macroscopic state to another, with a deliberate goal. (For example, raising a weight.)
2. Doing work expends a resource. Once the available resource is fully expended, no further work is possible until the resource is replenished.
3. Doing work involves a procedure that depends only on the macrostate of the system involved, and not on the specific microstate. (This definition of work excludes the kind of work that involves Maxwell's Demons.)
By a quantum computation we will mean a quantum circuit that begins with a pure input quantum state and ends with a pure output quantum state. The circuit may be composed of gates or a possibly time-dependent Hamiltonian. In other words it is a quantum-in-quantum-out process and its purpose is to reach a target state. The computation can be thought of as a trajectory on the space of states or in the configuration space of the auxiliary system A. No measurement is allowed during the course of the computation, as measurements are not part of the Q-A correspondence. Of course to be useful the computation must be followed by a measurement but only at the very end. The computational work and the necessary resources refer to the quantum-in-quantum-out computation and not to the measurement.
In thermodynamics, the free energy is a resource that represents the amount of energy that can be directed toward useful work. Applied to the auxiliary system the definition of free energy would be $E_a - T_a S_a$, or equating auxiliary entropy with complexity,
$$E_a - T_a\,C.$$
For the auxiliary system, as formulated thus far, both the energy $E_a$ and the temperature $T_a$ are fixed parameters that only depend on the number of qubits through Eqs. 5.5 & 6.8. The only variable in the free energy is the complexity. Therefore we propose that the quantity $-C$ be treated as a resource. More exactly we propose that the gap between the complexity and the maximum possible complexity, the 'uncomplexity', is a resource that can be utilized for directed computation,
$$\Delta C = C_{\max} - C.$$
To understand why uncomplexity might be viewed as a resource, let's consider how useful a computer would be if the resource is all used up. Consider a tired old quantum computer that has been allowed to run for such a long time that the state-complexity has reached its maximum value, exponential in $K$, and therefore $\Delta C = 0$.
For most purposes a state of maximal complexity is indistinguishable from a maximally mixed density matrix. In both cases the expectation values of all but the most complex operators are given by their Haar-random values. Suppose our computer is initialized in a mixed state with density matrix proportional to the unit operator, $\rho \sim \mathbb{1}$.
Consider any unitary operation $G$ that we may apply. (We use the notation $G$ to suggest that the operation may be composed of gates.) The action of $G$ on any density matrix changes it to $G\rho G^\dagger$. This may or may not be useful in general, but when applied to the maximally mixed density matrix it does nothing. Whatever operation is applied, the result is the same: the maximally mixed state. Therefore unless the computer is re-initialized no useful computation is possible.
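The invariance of the maximally mixed state is easy to check numerically. The following is a minimal sketch, assuming a small system of $K = 4$ qubits and a random unitary built from a QR decomposition as a stand-in for $G$ (this is not exactly Haar-distributed, which does not matter for the check). The helper name `random_unitary` is hypothetical.

```python
import numpy as np

def random_unitary(dim, seed=0):
    """Unitary from the QR decomposition of a complex Gaussian matrix."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, _ = np.linalg.qr(z)
    return q

K = 4
dim = 2 ** K
G = random_unitary(dim)

# Maximally mixed state: rho proportional to the unit operator.
rho_mixed = np.eye(dim, dtype=complex) / dim
# For contrast, the simple pure state |00...0><00...0|.
rho_pure = np.zeros((dim, dim), dtype=complex)
rho_pure[0, 0] = 1.0

mixed_after = G @ rho_mixed @ G.conj().T
pure_after = G @ rho_pure @ G.conj().T

print("mixed state unchanged:", np.allclose(mixed_after, rho_mixed))
print("pure state unchanged: ", np.allclose(pure_after, rho_pure))
```

The mixed state is returned unchanged for any unitary, whereas the simple pure state is moved to a different state, which is the sense in which only the latter can be steered by the circuit.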
The same is true for a maximally complex state as long as G is not so complex that it can undo the exponential complexity of the initial state.
The state with the maximal resource has $C = 0$ which means a simple unentangled product state. It seems reasonable that the most powerful initial state for general all-purpose computing would be the simplest state.
In attempting to think of the uncomplexity ∆C as a resource we will use the correspondence between the quantum complexity of Q and the classical entropy of A as a guide. We will now give some examples based on thermodynamic analogies.

Combining Systems: A Paradox
Many thermodynamic questions concern what happens when two isolated systems, each in equilibrium, are brought into contact. The first question is: What does it mean to combine two auxiliary systems, and how is it related to combining the corresponding quantum systems? Here we will consider a simple case: two thermodynamically identical A subsystems at the same temperature $T_a$ and entropy $S_a$ are combined. This should give rise to a single system in equilibrium at the same temperature, with an entropy $2S_a$.
We would like to understand what it means to combine two classical auxiliary systems, each in complexity equilibrium, into a composite auxiliary system. In other words given an auxiliary system A, what is the meaning of A × A?
Here is the paradox: Naively we might think that combining two auxiliary systems involves combining the two corresponding quantum systems in the form $Q \otimes Q$, where each factor contains $K$ qubits, and is in complexity equilibrium. Let's see what happens if we do so. Each subsystem has complexity of order $C = 2^K$. Immediately after combining the systems the total entropy is $2 \times 2^K$, and the maximum complexity of the combined system is
$$C_{\max} = 2^{2K}.$$
This is the square of the individual complexities, not the sum. Therefore the resulting systems, when combined, will be very far out of complexity equilibrium. That is not what should happen if we combine two identical thermodynamic systems; the entropy should be additive. Evidently combining two quantum systems does not correspond to combining the auxiliary systems in an additive way. Instead it multiplies the number of degrees of freedom of the auxiliary systems. This seems to be evidence that complexity does not behave like entropy.
The resolution of this paradox is that the operation of combining auxiliary systems is entirely different from combining the corresponding quantum systems. The right idea is to take the system of K qubits and add just a single additional qubit. Adding one qubit doubles the dimension of the Hilbert space, and therefore doubles the number of classical degrees of freedom of the auxiliary system.
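The arithmetic behind the paradox and its resolution can be made concrete. This is a small sketch that takes the order-of-magnitude estimate of maximal complexity, $\sim 2^K$ for $K$ qubits, at face value and drops the cutoff-dependent factors; the function name `max_complexity` and the choice $K = 10$ are illustrative assumptions.

```python
def max_complexity(num_qubits):
    """Order-of-magnitude maximal complexity, ~2^K (cutoff factors dropped)."""
    return 2 ** num_qubits

K = 10
sum_of_parts = 2 * max_complexity(K)        # two subsystems, each at its maximum
tensor_product_max = max_complexity(2 * K)  # maximum for Q (x) Q, i.e. 2K qubits
one_more_qubit_max = max_complexity(K + 1)  # maximum after adding a single qubit

print("sum of the two complexities:", sum_of_parts)
print("maximum for Q (x) Q:        ", tensor_product_max)
print("maximum for K + 1 qubits:   ", one_more_qubit_max)
```

With these numbers the tensor product is far from equilibrium (its maximum is the square of the individual maximum), while adding a single qubit gives a maximum equal to the sum, which is the additive behavior expected of entropy.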
Let's show this in equations. If $|\psi_0\rangle$ and $|\psi_1\rangle$ are both $K$ qubit states with $\langle\psi_0|\psi_1\rangle = 0$, we combine these two systems by constructing the maximally-entangled $K+1$ qubit state
$$|\Psi\rangle = \frac{|0\rangle|\psi_0\rangle + |1\rangle|\psi_1\rangle}{\sqrt{2}}.$$
The new auxiliary system has twice as many degrees of freedom as the auxiliary system for the original $K$ qubit quantum system. This is because the wavefunction has twice as many components. Thus we see that the addition of one qubit is the operation that doubles the auxiliary system. If the states $|\psi_0\rangle$ and $|\psi_1\rangle$ are independently picked at random their relative complexity will almost always be maximal. In that case it can be shown that the complexity of $|\Psi\rangle$ will be twice the complexity of either $|\psi_0\rangle$ or $|\psi_1\rangle$.
Let's suppose that the new qubit, which we'll call $\tau$, is uncoupled from the other qubits and that $|\psi_0\rangle$ and $|\psi_1\rangle$ are separately maximally complex. Thus the overall auxiliary system is two copies, each in complexity equilibrium.
Next we turn on generic $k$-local interactions between $\tau$ and all the other qubits. The overall system will come to complexity equilibrium with complexity
$$C = 2^{K+1} = 2 \times 2^K.$$
In other words the final complexity will be the same as the sum of the complexities of $|\psi_1\rangle$ and $|\psi_0\rangle$. This is exactly like mixing two uncorrelated gases of classically identical particles, each initially in equilibrium at the same temperature. The final entropy is the sum of the initial entropies and the process is reversible.
In the thermodynamic case it is obvious that no useful work can be extracted from such a process. In the complexity case, at all stages of the process the system is in a state of maximal complexity; thus according to our earlier discussion, no useful directed computational work can be done.
Now let's consider the case $|\psi_0\rangle = |\psi_1\rangle$. In this case, the extra qubit is not entangled with the rest of the system, which we continue to assume is in complexity equilibrium,
$$|\Psi\rangle = \frac{|0\rangle + |1\rangle}{\sqrt{2}}\,|\psi_0\rangle.$$
This time the two auxiliary systems are in exactly the same state. The initial complexity is $2^K$, but after turning on an interaction that depends on the extra qubit and waiting for a long time, the final complexity is $2 \times 2^K$, i.e. double the initial complexity. Is there a thermodynamic analog to this situation? Indeed there is. Imagine creating the two gases in exactly the same micro-state. Such a distribution is far from equilibrium: every particle of one gas is constrained to have exactly the same position and momentum as the corresponding particle of the other gas. The total initial entropy is the same as the entropy of one copy. However, perturbing one of the copies of the system, and then letting the whole system interact and come to equilibrium will result in a final entropy that is twice the initial. This is schematically illustrated in Fig. 4.
Figure 4: The left panel shows a gas of $2N$ classical particles created with the particles paired. The entropy is the same as a gas of $N$ particles. In the right panel the gas has come to equilibrium and the particles become randomly distributed. The entropy in the right panel is twice the entropy in the left panel. No work can be extracted from a gas in equilibrium, but the gas of paired particles is far from equilibrium and so can be used to do work.
For a genuine classical system it follows from the laws of thermodynamics that work can be extracted from the initial out-of-equilibrium state. In the quantum-complexity case this would correspond to a resource being available in a state of sub-maximal complexity. This resource, uncomplexity, can be used to do computational work.
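The two ways of attaching the extra qubit discussed in this subsection can be told apart by the entanglement of that qubit with the rest. The sketch below is an illustration under stated assumptions: random vectors stand in for the $K$ qubit states, $K = 6$ is arbitrary, and the helper names (`random_state`, `combine`, `extra_qubit_entropy`) are hypothetical. It checks only the entanglement structure and the doubling of the number of wavefunction components; it does not compute complexity.

```python
import numpy as np

def random_state(dim, rng):
    """Normalized random complex vector, a stand-in for a K qubit state."""
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)

def combine(psi0, psi1):
    """Build (|0>|psi0> + |1>|psi1>)/sqrt(2), extra qubit as the leading index."""
    return np.concatenate([psi0, psi1]) / np.sqrt(2)

def extra_qubit_entropy(Psi):
    """Von Neumann entropy of the reduced density matrix of the extra qubit."""
    m = Psi.reshape(2, -1)
    rho = m @ m.conj().T
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]
    return float(-(evals * np.log(evals)).sum())

rng = np.random.default_rng(1)
K = 6
dim = 2 ** K
psi0 = random_state(dim, rng)
psi1 = random_state(dim, rng)
psi1 = psi1 - np.vdot(psi0, psi1) * psi0   # enforce <psi0|psi1> = 0
psi1 = psi1 / np.linalg.norm(psi1)

entangled = combine(psi0, psi1)   # orthogonal case
product = combine(psi0, psi0)     # case psi0 = psi1

print("components:", len(entangled), "=", 2 * dim)
print("entropy, orthogonal case:", extra_qubit_entropy(entangled), "(log 2 =", np.log(2), ")")
print("entropy, identical case: ", extra_qubit_entropy(product))
```

In the orthogonal case the extra qubit carries one full bit of entanglement, and in the identical case it carries none, matching the distinction drawn in the text between the two initial conditions.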

One Clean Qubit
In this subsection, we will give an example of how uncomplexity can be used to do 'computational work'. We will see that in the process, the uncomplexity is expended. First consider a system that has no uncomplexity, that is, a state of maximal complexity. A maximally complex state is very much like a maximally mixed density matrix as long as we restrict ourselves to reasonably simple experiments. If we act on such a state with a polynomial size circuit the complexity can only be reduced by a negligible fraction. For any measurement of a non-exponentially complex observable, the result will be Haar random, so again no useful computation can result from an initial maximally complex state. A quantum computer that runs for an exponential time and reaches maximal complexity becomes useless for computation.
Now consider adding to this maximally complex state a single additional qubit in a pure state. This doesn't change the complexity, but the maximal complexity doubles, so the complexity is now only half the maximal value. From the analogy with the two component out-of-equilibrium gas in Fig. 4, we should expect that the additional qubit, which has replenished the uncomplexity resource, will allow us to once again perform useful computational work.
Computation that makes use of either a maximally mixed state (or a maximally complex state) plus just one additional unentangled qubit is called "One Clean Qubit" computation. Just how much power one clean qubit computation provides and how to quantify it is not certain but it is known to be able to efficiently solve problems including some classically hard problems [31]. Known examples include calculating the trace of a unitary operator and estimating certain properties of Jones polynomials. We'll review the illuminating example of calculating the trace of a unitary operator, which was first worked out in [31].
We suppose we have a unitary operator $G$ in the space $SU(2^K)$. The operator $G$ is constructed as a known product of a polynomial number of gates $G = g_N g_{N-1} \cdots g_1$. The goal is to approximate its trace. For simplicity let's only worry about the real part of the trace.
Begin with the space of states $\mathbb{CP}(2^K-1)$. We will try to construct a $K$ qubit circuit such that a measurement of $\sigma^z_1$ will give some non-trivial information about the value of $\mathrm{Tr}\,G^\dagger + \mathrm{Tr}\,G$. Assume the circuit is initialized to the simple state $|00000\ldots0\rangle$.
Consider the neighborhood of all the states $|\psi\rangle$ for which the expectation value $\langle\psi|\sigma^z_1|\psi\rangle$ is determined by $\mathrm{Tr}\,G^\dagger + \mathrm{Tr}\,G$. Call that the target set. If by running the circuit we can navigate to one of these points, then by a subsequent measurement of $\sigma^z_1$ we learn something about $\mathrm{Tr}\,G^\dagger + \mathrm{Tr}\,G$. By repeating the experiment we can improve our knowledge. Thus the goal of directed computation is to decrease the relative complexity to zero between the initial state and some state that's in the target set. Figure 5 schematically illustrates the idea. The circles represent $\mathbb{CP}(2^K-1)$ in a way such that distance from the center represents state complexity. In order to have a high probability of success it is important that each step increases the complexity.
Now let's suppose that instead of starting with the minimally complex state $|000\ldots00\rangle$ we start with a state in the darker pink outer region where the complexity is maximal $\sim 2^K$. There are no blue points in this region since the expectation value of any observable is Haar random. With overwhelming probability any gate that acts on a state in the dark pink region will leave the point in that region. This shows that directed computation is not possible starting with a state of maximal complexity, i.e., $\Delta C = 0$.
But now let us add one clean qubit $\tau$, thereby doubling the maximal complexity. The larger circles in Fig. 6 represent $\mathbb{CP}(2^{K+1}-1)$, the space of $K+1$ qubit states. The darker pink still shows states of complexity $2^K$, but the region beyond it goes out to twice that complexity.
Figure 6: Left: adding an extra qubit doubles the maximum complexity (adding an annulus to the space of possible states) and replenishes the uncomplexity resource. Right: given the additional resource, what was previously a state of maximal complexity now has some uncomplexity and can be used to further computation; this is illustrated by showing how a target state can be reached from the original maximally complex configuration.
Note that the initial state for the one-clean-qubit calculation is in the dark pink region, but now we can reach blue dots by moving outward towards increased complexity; we don't have to fight against the second law.
The actual algorithm is simple [31] and we will describe it now.
Figure 7: The input state has a total of $K+1$ qubits: the one clean qubit (top) and the $K$ maximally complex qubits (bottom). The symbol $H$ represents a Hadamard gate acting on the clean qubit. The clean qubit acts as a control for the circuit $G$: the circuit applies $G$ to the other $K$ qubits if the clean qubit is $|1\rangle$, and does nothing if the clean qubit is $|0\rangle$.
Consider the quantum circuit shown in Fig. 7. The initial state is
$$|0\rangle|\max\rangle,$$
where $|\max\rangle$ is any state of the $K$ qubit system with maximal complexity. We act with the first Hadamard gate$^{25}$ to give
$$\frac{|0\rangle + |1\rangle}{\sqrt{2}}\,|\max\rangle.$$
Next apply the controlled $G$ operation $G_c = |1\rangle\langle 1|\,G + |0\rangle\langle 0|\,\mathbb{1}$, where $G = g_N g_{N-1} \cdots g_1$; this circuit applies $G$ to $|\max\rangle$ if the control qubit is $|1\rangle$, and otherwise leaves it unchanged. This gives
$$\frac{|0\rangle|\max\rangle + |1\rangle\,G|\max\rangle}{\sqrt{2}}.$$
Now acting with the second Hadamard yields
$$\frac{1}{2}\Big[\,|0\rangle\,(1+G)\,|\max\rangle + |1\rangle\,(1-G)\,|\max\rangle\Big].$$
This completes the computation. To make use of it we note that the expectation value of $\tau_z$ is given by
$$\langle\tau_z\rangle = \frac{1}{2}\,\langle\max|\,(G + G^\dagger)\,|\max\rangle.$$
This in itself is not useful for our purpose, which is determining $\mathrm{Tr}\,G$, but because $|\max\rangle$ is a maximally complex $K$ qubit state, with overwhelming likelihood
$$\langle\max|G|\max\rangle = \mathrm{Tr}\,G. \qquad (8.13)$$
Thus by applying the circuit $HG_cH$ we have set up a state in which we can learn something about $\mathrm{Tr}\,G$ by making a measurement of $\tau_z$.
The measurement itself cannot be represented as an operation in the classical auxiliary system. As we said earlier it should not be considered as part of the computational work. The computational work is associated with the process that went into setting up the state, i.e., acting with the circuit $HG_cH$, and only then at the very end do we allow a measurement. By repeating this experiment, including the measurement, over again with fresh clean qubits we can get an arbitrarily accurate estimate for $\mathrm{Tr}\,G$.
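The steps of the circuit $HG_cH$ can be followed in a small state-vector simulation. This is a minimal sketch under several assumptions that are not in the text: $K = 8$, a random normalized vector stands in for $|\max\rangle$, $G$ is an arbitrary unitary with eigenphases in $[0, 1]$ so that its trace is not negligible, and the trace is taken to be normalized by the dimension $2^K$. The function names are hypothetical.

```python
import numpy as np

def random_unitary(dim, rng):
    """Unitary from the QR decomposition of a complex Gaussian matrix."""
    z = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    q, _ = np.linalg.qr(z)
    return q

def random_state(dim, rng):
    """Normalized random vector, used here as a stand-in for |max>."""
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    return v / np.linalg.norm(v)

def tau_z_after_circuit(G, psi):
    """Apply H, controlled-G, H to |0>|psi> and return <tau_z> of the clean qubit."""
    dim = G.shape[0]
    eye = np.eye(dim)
    hadamard = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)  # (tau_z + tau_x)/sqrt(2)
    proj0 = np.diag([1.0, 0.0])
    proj1 = np.diag([0.0, 1.0])
    H = np.kron(hadamard, eye)                      # Hadamard on the clean qubit only
    G_c = np.kron(proj1, G) + np.kron(proj0, eye)   # controlled-G
    state = np.kron(np.array([1.0, 0.0]), psi)      # |0>|psi>
    state = H @ (G_c @ (H @ state))
    tau_z = np.kron(np.diag([1.0, -1.0]), eye)
    return float(np.real(np.vdot(state, tau_z @ state)))

rng = np.random.default_rng(0)
K = 8
dim = 2 ** K
V = random_unitary(dim, rng)
phases = np.exp(1j * rng.uniform(0.0, 1.0, dim))
G = (V * phases) @ V.conj().T    # V diag(phases) V^dagger, a unitary with sizable trace
psi = random_state(dim, rng)

tz = tau_z_after_circuit(G, psi)
exact = float(np.real(np.vdot(psi, G @ psi)))          # Re <psi|G|psi>
normalized_trace = float(np.real(np.trace(G)) / dim)   # Re Tr G / 2^K

print("<tau_z> from the circuit:", tz)
print("Re <psi|G|psi>:          ", exact)
print("Re Tr G / 2^K:           ", normalized_trace)
```

The first two numbers agree to machine precision, which is the identity for $\langle\tau_z\rangle$ derived above. The third agrees only approximately, with a deviation that shrinks as the dimension grows, which is the sense of "with overwhelming likelihood" for a random-looking state.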
$^{25}$ The Hadamard gate is defined by the matrix $(\tau_z + \tau_x)/\sqrt{2}$.
In classical thermodynamics we can repeat an operation designed to raise a weight one meter and thereby raise it two meters, three meters, four meters, and so on until we run out of resource. The same is true of computational work. For example by repeating the circuit of Eq. 8.14 in the form
$$H\,G_c^{\,n}\,H$$
one can determine information about the trace of $G^n$. (As before we only make a measurement at the end.) For obvious reasons the problem of determining the trace of a higher power of $G$ becomes more difficult as the power increases. It is also clear that the repeated action of the circuit depletes the resource, roughly by the complexity of $G$ each time it is repeated.
'One clean qubit' computation is an example of using uncomplexity to do computational work. It exhibits all three of the criteria that we listed at the start of Sec. 8.
1. First, it implements a transition that is directed towards a goal: the goal of calculating the trace of $G$.
2. Second, it uses up a resource: at the end of the computation, the additional qubit is no longer clean, and the complexity of the $K+1$ qubit state has increased by approximately the complexity of $G$. Or to put it another way the uncomplexity resource has diminished by that amount.
3. Third, the process involves a transition from one macroscopic state to another by a procedure that does not depend on the microscopic state: we extracted information about $\mathrm{Tr}\,G$ without knowing precisely which state we started or ended in. (Thus no Maxwell's Demons were involved. Instead we did something analogous to doing work by expanding the volume of a gas using a procedure that does not require knowledge of the starting microstate.)
It would be very interesting to know how the power of one clean qubit is connected to the doubling of the maximum complexity, and whether it is similar to the ability to do work with a system of identical gases in which the molecules are paired in a non-thermal distribution.

Kolmogorov Uncomplexity as a Resource
We have argued that computational 'uncomplexity' is a resource that can be used to do directed quantum computation. But computational complexity is not the only kind of complexity that has arisen in this paper. In Sec. 6.4 we argued that while the positional entropy of A corresponds to the computational complexity of Q, the kinetic entropy of A corresponds to the Kolmogorov complexity. This therefore raises the question of whether Kolmogorov uncomplexity is also a resource.
The answer is yes, but we will see that the resource is useful for a rather different purpose than computational uncomplexity. This means that from the point of view of the Q-A correspondence this subsection is something of an aside, but it is well worth explaining. In this subsection we will explain that Kolmogorov uncomplexity is a resource that is useful for doing erasure. This is beautifully illustrated by an example in an old paper of Bennett, Gacs, Li, Vitanyi, and Zurek [32], which examines apparent violations of Landauer's Principle [33]. Landauer's Principle says that, while reversible transformations can be performed without free-energy cost, to erase a bit (to reset it to zero no matter whether it starts at one or at zero) requires a free energy of $k_B T \log 2$. However, there are some examples where bits can seemingly be reset much more cheaply than this.
What Bennett et al. show is that these apparent violations occur precisely in cases that have Kolmogorov uncomplexity, since in those cases the states can be compressed before being erased. (For example, it requires less free energy to erase the first million digits of $\pi$ than to erase a million random digits. This is because it is possible to reversibly transform the first million digits of $\pi$ to the much shorter computer program that outputs them. Since this compressed description has much less Kolmogorov uncomplexity than the original description, performing the compression expends uncomplexity.) Specifically, they show that the free energy cost of deleting a bit string is not given by the total number of bits, but by the Kolmogorov complexity of the bit string. For generic bit strings these two coincide, but for special low complexity strings the Kolmogorov complexity is less. The total saving compared to a naive application of Landauer's principle is given exactly by the uncomplexity, that is, by the number of bits minus the Kolmogorov complexity, in units of $k_B T \log 2$. The Kolmogorov uncomplexity of one bit string can be used to erase another bit string; in the process, the resource is expended.
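The bookkeeping in this argument can be sketched numerically. Kolmogorov complexity is not computable, so the example below substitutes the length of a `zlib`-compressed encoding, which is only a computable upper bound and is used here purely as an illustrative proxy. The temperature, the string lengths, and the function names are assumptions, and a repetitive string stands in for the digits of $\pi$.

```python
import math
import random
import zlib

K_B = 1.380649e-23  # Boltzmann constant in J/K

def landauer_cost(n_bits, temperature):
    """Free energy k_B T log 2 per erased bit, times the number of bits."""
    return n_bits * K_B * temperature * math.log(2)

def compressed_bits(data):
    """Proxy for Kolmogorov complexity: zlib output length in bits (an upper bound only)."""
    return min(8 * len(data), 8 * len(zlib.compress(data, 9)))

def erasure_saving(data, temperature):
    """Saving relative to naive erasure: (bits - proxy complexity) * k_B T log 2."""
    uncomplexity = 8 * len(data) - compressed_bits(data)
    return landauer_cost(uncomplexity, temperature)

T = 300.0  # kelvin, illustrative
structured = b"01" * 50_000                  # highly regular, very compressible
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(100_000))  # essentially incompressible

for name, data in [("structured", structured), ("random", noisy)]:
    naive = landauer_cost(8 * len(data), T)
    saving = erasure_saving(data, T)
    print(f"{name:10s}  naive cost = {naive:.3e} J   saving = {saving:.3e} J")
```

For the regular string almost the entire naive cost is saved, while for the random string the saving is essentially zero, which mirrors the statement that for generic strings the Kolmogorov complexity coincides with the number of bits.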
We thus see that both computational uncomplexity and Kolmogorov uncomplexity can be used to carry out information theoretic tasks.

Uncomplexity as Spacetime
Our original interest in complexity theory began with the question: How does one describe the interior of a black hole in holographic terms? In this section we would like to come back to that question in light of the conjecture that uncomplexity is a resource. We will see that the black-hole/complexity-connection provides a new way to think about uncomplexity as a spacetime resource based on classical general relativity (GR). In particular classical GR provides another way to think about the rejuvenating power of one clean qubit.
To understand the uncomplexity resource in GR terms let's suppose Alice is a black hole explorer stationed just outside a one-sided AdS black hole at boundary time $t$. She intends to jump from the AdS boundary into the black hole. The resource that she cares about is spacetime volume, without which she will perish at the horizon.
Recall that the quantum state of the black hole interior has a growing complexity (for $t > 0$) that is dual to the growing spacetime volume behind the horizon. At any instant the complexity is given by the Einstein-Hilbert action of the Wheeler-DeWitt (WDW) patch anchored at time $t$ on the boundary [19] [20]. The part of the WDW patch outside the horizon has a time-independent divergence, which after initial transients can be regulated by considering only the portion of the space behind the horizon, as shown in Fig. 8. The action of the dark yellow region behind the horizon is of order its spacetime volume. Slightly simplifying the discussion, we can say that the complexity is given by the spacetime volume $V_4$ of the dark yellow region, multiplied by some numerical factors that depend on the gravitational constant $G$ and the AdS radius of curvature $\ell_{\mathrm{AdS}}$ (Eq. 9.1).
A straightforward GR calculation shows that the action increases linearly with time, with a coefficient equal to the mass of the black hole. This is consistent with the early growth of complexity in Fig. 1.
It is believed that the classical description of the black hole breaks down when the complexity stops increasing, once $C = C_{\max}$. This occurs at $t_{\max} = e^S$ at which time the horizon becomes opaque by developing a firewall [5,6]. In Fig. 8 the cutoff at $t_{\max}$ is shown as a blue diagonal slash in the upper right corner of the diagram. Time does not literally run out at the cutoff, but because complexity is bounded by $C_{\max}$ the classical growth of the black hole interior must break down.
Let us consider in more detail the maximum complexity. Figure 9 shows the WDW patch pushed up to the cutoff time. The maximum complexity $C_{\max}$ is the action of this new WDW patch. Classically the action (4-volume) in the upper corners would grow indefinitely, but the cutoff at $t_{\max} \sim e^S$ keeps it finite.
Figure 9: The WDW patch for maximum complexity.
The uncomplexity $\Delta C(t) \equiv C_{\max} - C(t)$ is given by the action of the dark yellow region of Fig. 9 minus the action of the dark yellow region of Fig. 8. This difference is shown in blue in Fig. 10. The uncomplexity is proportional to the 4-volume of the blue triangular region, which is cut off at $t_{\max} \sim e^S$. This 4-volume is finite, and goes to zero as $t \to e^S$.
We see something interesting from the figure. The blue region may be identified with the union of all interior locations behind the horizon that Alice can visit if she enters the black hole at any time after t. The uncomplexity therefore represents the spacetime resource available to an observer who intends to enter the horizon.
Suppose Alice wishes to jump in after the black hole has become maximally complex. According to [5] she will run into an obstruction at the horizon. The situation is analogous to attempting to compute with a computer that has reached maximal complexity; the resource will have been exhausted. Can Alice do anything to renew the resource?
As explained in [5], all Alice has to do is to throw in one thermal photon and wait a scrambling time. This will restore the transparency of the horizon for an additional exponential time, in the same way that the computing power of a maximally complex computer can be restored by adding a single clean qubit. This phenomenon is essentially a classical GR effect, which we illustrate in Figure 11.
Figure 11: The first panel shows the upper-right corner of the black hole Penrose diagram with the red line representing an opaque horizon that would be expected for a black hole of maximal complexity. The opaque horizon can be modeled by an infinitely thin Shenker-Stanford gravitational shockwave. The blue line in the second panel is a thermal quantum injected from the boundary. Such a quantum increases the entropy of the black hole by one bit. The effect of the low energy quantum is to shift the shockwave up and to the left thus separating it from the horizon. In a scrambling time it will be lost into the singularity. The right panel which was taken from [5] shows the process in more detail. The upshot is that within a scrambling time the horizon has become transparent; this newfound transparency lasts for an exponential time.
The conclusion drawn in [5] is that the obstruction at the horizon due to maximal complexity will be removed by adding to the black hole one clean qubit in the form of a thermal quantum. The rejuvenating effect of the added qubit parallels the effect in a quantum computer that has reached maximal complexity.

Summary
Let us summarize the material in this paper:
• Section 2 introduced the class of quantum systems Q that we study, namely k-local systems composed of $K$ qubits interacting through a Hamiltonian which is a sum of terms, each containing no more than k qubits. Alternatively the qubits may interact in a k-local quantum circuit built of gates with no more than k qubits. Such systems are typically fast scramblers.
We explained that the evolution of complexity for a k-local system of K qubits closely resembles the classical evolution of entropy for a system of exp[K] classical degrees of freedom and raised the question of the source of this similarity.
We also explained the SYK strategy of averaging over randomly chosen time-independent Hamiltonians. This sometimes allows us to determine the average behaviors for problems which are too difficult to solve in individual instances.
• In Sec. 3 we formulated the evolution of the time-development operator $e^{-iHt}$ as a classical mechanics problem of an "auxiliary" system A. The system A consists of a non-relativistic particle moving on the space $SU(2^K)$. The auxiliary system for a system of $K$ qubits has a number of classical degrees of freedom exponential in $K$.
The first-order Schrödinger equation of Q is replaced by a second-order equation of motion for A, in which the Hamiltonian is eliminated altogether in favor of initial conditions on the velocity of the particle. Averaging over a Maxwell-Boltzmann ensemble of initial velocities is equivalent to averaging over quantum Hamiltonians as in SYK.
• The usual inner-product metric on either the space of states or the space of unitary operators is poorly suited to studies of quantum chaos. Section 4 is devoted to the concept of relative complexity: a metric which represents the degree of difficulty in making a transition between two states, and also of doing an interference experiment that measures the relative phase between states. Relative complexity can also be defined for unitary operators and has a similar meaning.
The "complexity metric" defined by relative complexity is much better suited to measuring the difference between states of a chaotic system than the standard inner product metric. Inspired by the work of Nielsen and collaborators [8,9], Sec. 4 works out the basic mathematical properties of complexity metrics and shows that they are closely related to the negatively curved geometry of the toy model. In particular we calculated sectional curvatures and showed behavior consistent with the toy model.
• Section 5 introduces the A system as a classical nonrelativistic particle moving on this complexity geometry. The relative complexity of two unitary operators is the minimal action required to go from one to the other subject to a constraint on the auxiliary energy of the particle.
• Section 6 introduces our basic conjecture relating classical entropy to quantum complexity. We argued that after averaging over Hamiltonians (as in SYK) the ensemble average of quantum complexity is equal to the classical entropy of the auxiliary system A. In order to make this identification complete we must include not just the circuit complexity (the number of gates in the circuit) but also the Kolmogorov complexity of the algorithm that the circuit implements. In the case of a Hamiltonian quantum system the Kolmogorov complexity is the length of the shortest program needed to specify the Hamiltonian. Unlike the gate complexity, it does not grow linearly with time and so soon becomes negligible compared to the gate complexity.
The connection between quantum complexity and classical entropy is the link that suggests a thermodynamic description of complexity. In Sec. 7 we used this connection in order to formulate a Second Law of Complexity which is really just the second law of thermodynamics for the auxiliary classical system A. This line of reasoning explains the observation in Sec. 1 that the evolution of complexity for a K qubit system behaves like the evolution of entropy for a system with a number of classical degrees of freedom exponential in K.
• In Sec. 8 we discuss the concept of uncomplexity (the gap between the complexity of a state and the maximum possible complexity) and give evidence that it is a resource useful for doing computational work. An important component of resource theory [30] is combining systems into bigger systems. In the present framework this means combining auxiliary systems. Surprisingly, combining two auxiliary systems has nothing to do with combining the corresponding quantum systems. To double the size of an auxiliary system one only needs to add a single qubit to the quantum system. We illustrate the idea of uncomplexity as a resource with the example of "one clean qubit" computation.
• Finally, in Sec. 9, we look at the holographic dual to the uncomplexity of a boundary state. We show that when a black hole is present the resource, uncomplexity, is the total spacetime volume accessible to an observer who plans to cross the horizon.

Questions
The strategy of averaging over an ensemble of Hamiltonians (in computer science this would amount to averaging over algorithms) may allow one to solve problems about average behaviors that would be much too hard for individual instances. We are raising the possibility that very difficult problems of complexity theory may be solved on average by classical statistical mechanics and thermodynamics. As an example we point to the correspondence between the evolution of quantum complexity (an extremely difficult problem for specific Hamiltonians) and the classical evolution of entropy (a merely hard problem). This paper raises many questions, a few of which we will mention here.

• Definition of Complexity
We have assumed that there is a robust concept of complexity, but in fact there is a large family of complexity measures. It is important to understand how they are related and whether a preferred measure of complexity can be identified. In the context of the complexity geometry the different measures are encoded in the moment of inertia tensor $I_{IJ}$. We showed that the sectional curvatures will generically be negative and of order $1/K$ (in agreement with the toy model) as long as the penalty factors are not too small. What are the rules governing the choice of $I$, how should its elements grow with increasing weight, and is the curvature approximately constant as predicted by the toy model of [4]?

• Counting
The conjecture that average complexity and classical entropy are the same rests on the assumption that the number of unitary operators with complexity less than or equal to $C$ grows like $e^{C}$ for sub-maximal $C$. We were able to give arguments in the stochastic context and for state-complexity, but the arguments are far from a proof. Proving the conjecture requires counting the unitaries on $2^K$-dimensional tori in $SU(2^K)$.

• Local vs Global Chaos
The motion of the A-system with a time-independent Hamiltonian is generically ergodic. Whether or not it is chaotic seems to depend on what metric we attach to $SU(2^K)$. According to the bi-invariant metric all sectional curvatures are positive, which implies that geodesics converge.
On the other hand, as we emphasized in Sec. 4.1, conventional inner-product metrics do not capture an important concept of distance between states, or for that matter, between unitary operators. Distances in the bi-invariant metric are bounded by π/2 but complexity distances can grow to enormously large values. Evidently complexity distances between neighboring trajectories can grow exponentially with time whereas the inner product distances do not, although in both metrics the system is ergodic.
The question is whether the motion in complexity geometry is genuinely chaotic, and does it matter? True classical chaos is often diagnosed by the spectrum of Lyapunov exponents, with a single positive Lyapunov exponent indicating chaos. The concept of a Lyapunov exponent is a global one, defined by the infinite time average of trajectory deviation. By contrast there is also a concept of local Lyapunov exponents, which diagnose local deviation and local unpredictability. Local Lyapunov exponents are positive in regions of negative curvature. When local Lyapunov exponents are positive the system will behave chaotically for a length of time, but over sufficiently long times it may only be ergodic. Of course if this time is long enough, say exponential in $K$, the distinction between global and local chaos may be unimportant.
Our guess is that the A-system (with the complexity metric) is locally chaotic over an exponentially long time, but that it is not truly chaotic. But by then it hardly matters.
• Classical Complexity
In this paper we have been concerned with the thermodynamics of quantum computational complexity. However, many of the arguments would apply to classical computational complexity. Can we also define a thermodynamics of classical computational complexity?

• Least Action and Least Computation
We can ask about the action-complexity connection. By now we have several versions of Action Equals Complexity. In [20] it was conjectured that the principle of least action for a gravitational system might ultimately become a principle of least computation. In this paper we have proposed another least action principle for the auxiliary system A, which would also describe the evolution of the state of a black hole. The question is: what is the relation between these apparently different but similar principles of least action/computation? More specifically, are they somehow the same? A similar suggestion in a slightly different context was recently proposed in [34].

• Uncomplexity as a Resource
One of the most interesting questions raised by this paper is whether there is a sense in which the gap between quantum complexity and maximal quantum complexity (the uncomplexity) is a quantitative measure of a resource useful for quantum computation. Can we precisely characterize the resource and does it fit into standard resource theory [30]?
Can we understand the interplay between computational uncomplexity and Kolmogorov uncomplexity?

• First Law of Complexity
In this paper we have argued for the existence of a second law of complexity. Identifying a first law of complexity is left for future investigation.
The conventional theory of thermodynamics was developed through a sequence of thought experiments involving adiabatic compression, heat engines, refrigerators, and the vanquishing of Maxwell's demon. Can we come up with a set of parallel thought experiments involving quantum complexity? What will be the steam engine of quantum computation?
2) The space of special unitary operators acting on $K$ qubits (or $2K$ real fermion operators) is $SU(2^K)$. Elements of $SU(2^K)$ are denoted $U, V, W, \ldots$. The Pauli basis for the generators of $SU(2^K)$ consists of the Pauli operators $\sigma^a_i$, where $a$ labels the three axes $x, y, z$ and $i$ labels the $K$ qubits, together with all products of Pauli operators for multiple qubits. In all there are $(4^K - 1)$ such generators. They will be labeled $\sigma_I$ where $I$ runs over $(4^K - 1)$ values.
The weight of a $\sigma_I$ is the number of single-qubit Pauli operators that it contains. Thus for example the weight of $\sigma^x_1$ is 1, and the weight of $\sigma^x_1 \sigma^y_3 \sigma^y_4$ is 3.
3) $J_I$ is a coefficient or coupling constant in the quantum Hamiltonian of the Q-system.
6) The complexity metric is written in terms of the $X$s, which are coordinates on $SU(2^K)$.
7) The classical auxiliary system defined in Sec. 3 is denoted A. The original quantum system of K qubits with Hamiltonian given by Eq. 2.1 is denoted Q.

8) A subscript $a$ indicates that a quantity refers to the auxiliary system, not the quantum system. Thus $V_a$ represents the magnitude of the velocity of the particle in the auxiliary system, $E_a$ indicates the energy of the A system, etc.
9) The time $t$ is measured in dimensionless units. For an uncharged neutral black hole the time $t$ differs from the asymptotic Schwarzschild time $t_{\rm schw}$ by a factor $\beta/2\pi$, where $\beta$ is the inverse temperature of the black hole. The time $t$ is the Rindler boost-angle time. The corresponding energy is also dimensionless and is equal to the entropy, which in the qubit model is equal to the number of qubits $K$. For quantum circuits with parallel Hayden-Preskill architecture, $t$ also has special significance. In that case $t$ has the significance of the clock time which ticks off one unit for every step in which there are of order $K$ gates. For sub-exponential times the rate of complexity growth in these units is $\sim K$.
10) The circuit complexity is denoted by $C$. The Kolmogorov complexity of a string $s$ is denoted $C_\kappa(s)$.
11) $B_a$ is the coefficient in the Gaussian probability distribution of the coupling constants $J$. It is also the inverse temperature of the A-model. $T_a = 1/B_a$ is the temperature of the A-model.
12) $I_{IJ}$ is a symmetric matrix in the adjoint representation of $SU(2^K)$. It is called the moment of inertia tensor.
13) Gates are denoted $g$. A sequence of $n$ gates forming a circuit is denoted $g_n g_{n-1} \ldots g_1$.
14) $e^{\Lambda}$ denotes a Loschmidt-echo operator defined by $e^{\Lambda} = e^{-iHt} e^{i(H+\Delta)t}$.
15) By a Hayden-Preskill circuit we mean a circuit of $K$ qubits such that in each time-step the qubits are paired and interact through $K/2$ gates [16]. (This is the version with 2-local gates; it can be generalized to a version with k-local gates in which at each time-step the qubits are sorted into groups of $k$ and interact through $K/k$ gates.)
16) The 'uncomplexity' is the amount $\Delta C$ by which the computational complexity $C$ of a state is less than the maximum possible complexity for that state, as in Eq. 8.2:
$$\Delta C = C_{\max} - C. \qquad \text{(A.4)}$$
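
As a small illustration of the Pauli-basis notation in item 2, the following sketch enumerates the generators for a small number of qubits and tallies them by weight. The function names (`pauli_strings`, `weight`, `count_by_weight`) and the string encoding of a generalized Pauli operator are assumptions made for this example; they do not appear in the paper.

```python
from itertools import product

def pauli_strings(K):
    """Yield every non-identity Pauli string on K qubits.

    Each string is a tuple over 'I', 'x', 'y', 'z'; position i holds the
    single-qubit operator acting on qubit i+1 (hypothetical encoding).
    """
    for s in product("Ixyz", repeat=K):
        if any(c != "I" for c in s):
            yield s

def weight(s):
    """Number of single-qubit Pauli operators (non-identity factors) in s."""
    return sum(c != "I" for c in s)

def count_by_weight(K):
    """counts[w] = number of generators of weight w, for w = 0..K."""
    counts = [0] * (K + 1)
    for s in pauli_strings(K):
        counts[weight(s)] += 1
    return counts

K = 4
counts = count_by_weight(K)
total = sum(counts)

# sigma^x_1 sigma^y_3 sigma^y_4 on four qubits, in the encoding above
example = ("x", "I", "y", "y")

print("generators:", total, "expected:", 4**K - 1)
print("by weight:", counts)
print("weight of example:", weight(example))
```

The tally by weight follows the binomial pattern (choose which qubits carry a Pauli operator, then one of three axes for each), so most generators have weight close to the top of the range once $K$ is large.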

B Some Clarifications
After we initially circulated this paper some questions came up from colleagues that we find worth discussing.
1. The first concerns Eq. 2.1 and the definition of k-local. The expression in Eq. 2.1 contains only terms of weight k whereas the standard definition of k-local allows all terms of weight up to and including k. In several places throughout the paper the equations refer to the more restricted version of 'exact' k-locality (only operators of weight k in the Hamiltonian) but they can be easily generalized to accommodate the more general case.
3. We have been asked why the draconian choice of penalty factors in [9] is inconsistent with the switchback effect. To understand this we remind the reader how the circuit complexity of precursors evolves for times earlier than the scrambling time (for a review see [4]). The complexity of precursors grows very slowly until the scrambling time, and then suddenly begins to increase linearly. (The same is true for Loschmidt echo operators.) Before the scrambling time the growth rate is not zero but is negligible. However, draconian penalty factors of order $4^K$ would punish shortcuts exponentially harshly. With shortcuts effectively forbidden, the complexity would grow linearly almost immediately. In order to agree with the complexity growth for discrete quantum circuits we need the penalty factors in the continuous Hamiltonian theory to turn on much more smoothly. We will come back to this point in [18].
4. Another question that came up is why, in Sec. 8.2, we do not consider the measurement at the end of a computation as part of computational work. For example, why do we not allow a complete measurement that re-initializes the computer to a random simple state? The answer is that measurement is not something that is part of the auxiliary description of the quantum evolution. But more important, from the global point of view the measurement is equivalent to the development of entanglement of Q with the rest of the world.
To follow the resource we would have to consider the changes in the complexity of everything, including the observer. We believe that if we did so the overall complexity would increase when a measurement is done, and that would cause a decrease in the global version of the resource.
5. The circuit depicted in Fig. 7 does not obviously look like a k-local circuit. However, let us define the $(k+1)$-local gate $\tilde{g}_n = H g_n^c H$, where $H$ is the Hadamard gate and $g_n^c$ is the k-local gate $g_n$ controlled by $\tau$. Then if the operator $G$ is a product of k-local gates, $G = \ldots g_5 g_4 g_3 g_2 g_1$, the circuit in Fig. 7 is equivalent to the $(k+1)$-local circuit
$$\ldots \tilde{g}_5 \tilde{g}_4 \tilde{g}_3 \tilde{g}_2 \tilde{g}_1. \qquad \text{(B.3)}$$
We may therefore think of the computational work as being done in small $(k+1)$-local steps, each using a small amount of the resource.
It is interesting to view the computation from the point of view of the 2-gas model in Fig. 4. The effect of the operations in Eq. B.3 is to evolve one of the component gases according to the circuit $\ldots g_5 g_4 g_3 g_2 g_1$ while leaving the other component fixed.
Since the initial state $|\mathrm{max}\rangle$ is maximally complex the fixed gas is already in equilibrium. The effect of the circuit Eq. B.3 is to break the correlation between the two components, and if it goes on long enough, to bring the whole system to equilibrium.
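
The decomposition in item 5 can be checked numerically in the smallest case. The sketch below assumes that the circuit of Fig. 7 has the usual one-clean-qubit form (a Hadamard on the control qubit, a controlled-$G$, then a second Hadamard), and it takes $k = 1$, so each $g_n$ acts on a single target qubit. The helper names (`matmul`, `kron`, `controlled`, `rot`) and the sample rotation angles are illustrative choices, not part of the paper. It compares the product of the small steps $H g_n^c H$ with the single operator $H G^c H$.

```python
import cmath
import math

SQRT_HALF = 1 / math.sqrt(2)
I2 = [[1, 0], [0, 1]]
HAD = [[SQRT_HALF, SQRT_HALF], [SQRT_HALF, -SQRT_HALF]]  # Hadamard gate

def matmul(A, B):
    """Product of two square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def kron(A, B):
    """Kronecker product A (x) B."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def controlled(g):
    """Controlled-g on (control (x) target): acts with g only if control is |1>."""
    return [[1, 0, 0, 0],
            [0, 1, 0, 0],
            [0, 0, g[0][0], g[0][1]],
            [0, 0, g[1][0], g[1][1]]]

def rot(theta, phi):
    """A sample single-qubit unitary (arbitrary choice for illustration)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s * cmath.exp(-1j * phi)],
            [s * cmath.exp(1j * phi), c]]

gates = [rot(0.3, 0.1), rot(1.1, 2.0), rot(0.7, -0.4)]  # g_1, g_2, g_3
H_CTRL = kron(HAD, I2)  # Hadamard acting on the control qubit only

# G = g_3 g_2 g_1: later gates multiply on the left
G = I2
for g in gates:
    G = matmul(g, G)

# Product of the small steps (H g_n^c H), applied in the same order
lhs = kron(I2, I2)
for g in gates:
    step = matmul(H_CTRL, matmul(controlled(g), H_CTRL))
    lhs = matmul(step, lhs)

# The whole circuit done in one go: H G^c H
rhs = matmul(H_CTRL, matmul(controlled(G), H_CTRL))

max_diff = max(abs(lhs[i][j] - rhs[i][j]) for i in range(4) for j in range(4))
print("largest entry-wise difference:", max_diff)
```

The agreement relies only on two facts: the Hadamard squares to the identity, so adjacent Hadamards cancel, and a product of gates controlled by the same qubit equals the controlled product. The difference printed should be at the level of floating-point rounding.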

C Action vs. Distance in the Toy Model
(Note about conventions: In [4] the time variable was called τ while in this paper the same variable is called t.) In the original version of complexity geometry, complexity was identified with geodesic distance from the identity. In [4] we remarked that there is an alternative formulation in which complexity is identified with the action along a geodesic. However the analysis was carried out with the earlier formulation. Since this paper uses the action formulation, there is a minor difference of conventions between [4] and the present paper. The difference between the two formulations can be absorbed into a re-definition of the scale of the metric.
In the toy model the complexity geometry is simplified to a two-dimensional geometry with uniform negative curvature. The metric has the form
$$dl^2 = F^2 \left( dr^2 + \sinh^2 r \, d\theta^2 \right). \qquad \text{(C.1)}$$
Consider two neighboring geodesics passing through $r = 0$. The distance between them grows like $e^r$ which can be written as

The two formulations can be expressed as follows:

Distance formulation: $C = F \int \sqrt{\dot{r}^2 + \dot{\theta}^2 \sinh^2 r} \; dt$.
In the action formulation the growth of complexity requires There is no inconsistency between the two formulations. The difference can be absorbed into the normalization of the metric Eq. 3.1. If we wish to use distance rather than action we would need to modify Eq. 3.1 to Such a change would have no effect on the agreement between the curvature and the calculation in Sec. 4.4.
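
The exponential divergence of neighboring geodesics in the toy-model metric Eq. C.1 can be illustrated with a short numerical sketch. It assumes two radial geodesics through $r = 0$ separated by a small angle `dtheta`, and it measures their separation as the arc length along the circle of constant $r$, which is a good approximation only for small `dtheta`. The function names and the sample values of $r$ are choices made for this example.

```python
import math

def separation(r, dtheta, F=1.0):
    """Arc length between two radial geodesics at coordinate radius r.

    From dl^2 = F^2 (dr^2 + sinh^2 r dtheta^2), a step dtheta at fixed r
    has length F * sinh(r) * dtheta.
    """
    return F * math.sinh(r) * dtheta

def ratio_to_exponential(r, dtheta=1e-3, F=1.0):
    """Separation divided by its large-r form (F/2) * dtheta * e^r."""
    return separation(r, dtheta, F) / (0.5 * F * dtheta * math.exp(r))

radii = (1, 2, 5, 10)
ratios = [ratio_to_exponential(r) for r in radii]

for r, q in zip(radii, ratios):
    print(f"r = {r:2d}   separation / ((F/2) dtheta e^r) = {q:.10f}")
```

The ratio equals $1 - e^{-2r}$, so it approaches 1 quickly: beyond a few units of $r$ the separation is indistinguishable from pure exponential growth, independent of the overall scale $F$.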