Certificates of quantum many-body properties assisted by machine learning

Computationally intractable tasks are often encountered in physics and optimization. Such tasks often comprise a cost function to be optimized over a so-called feasible set, which is specified by a set of constraints. This may lead, in general, to difficult, non-convex optimization tasks. A number of standard methods are used to tackle such problems: variational approaches focus on parameterizing a subclass of solutions within the feasible set; in contrast, relaxation techniques have been proposed to approximate the feasible set from the outside, thus complementing the variational approach by providing ultimate bounds on the global optimal solution. In this work, we propose a novel approach that combines the power of relaxation techniques with deep reinforcement learning in order to find the best possible bounds within a limited computational budget. We illustrate the viability of the method in the context of finding the ground state energy of many-body quantum systems, a paradigmatic problem in quantum physics. We benchmark our approach against other classical optimization algorithms, such as breadth-first search or Monte Carlo, and we characterize the effect of transfer learning. We find that the latter may be indicative of phase transitions, all within a completely autonomous approach. Finally, we provide tools to generalize the approach to other common applications in the field of quantum information processing.


FIG. 1: (a) Schematic representation of the optimization task. The task is to optimize a function over a hard-to-characterize target set (yellow). The variational approach allows one to parameterize subsets within the set of interest (different red sets). Different parameterizations yield different subsets that are more or less convenient depending on the task. Relaxation techniques allow one to efficiently represent sets larger than the one of interest (different blue sets) by exploiting, for instance, convexity or linearity. Neither different variational approaches nor different relaxations need to be contained in one another, so the sets they represent are, in general, incomparable.

(b) Values of the objective function. In black, the unknown optimal value. In red, the different minima obtained by variational methods: the smaller the value, the better the bound. In blue, different minima obtained by relaxation techniques: the greater the value, the more accurate the associated certificate. In grey, the uncertainty region where the optimal solution lies, delimited by the best variational value and the best certificate obtained so far.

A standard way to strengthen the PPT criterion is via symmetric extensions [30,31], which are families of increasingly better, albeit increasingly costly, SdP-based certificates. In the device-independent version of QIP [32], following a similar philosophy, relaxation techniques have also played a major role. For instance, in cryptographic security proofs, one needs to be safe against all possible quantum attacks, which are very difficult to characterize, therefore motivating research on supraquantum theories that are more tractable analytically [33,34]. Indeed, in the quest for the characterization of the set of quantum correlations [35], several operationally simple outer approximations have been proposed [36-43], as well as systematic SdP-based hierarchies of relaxations [44-46]. Many variations of this method have been developed in different scenarios [47-54]; e.g., their commutative counterpart [55,56] has been studied in various contexts related to local hidden variable theories [57,58] and classical spin models [59].
Not surprisingly, simpler proofs may be easier to obtain, although they may also yield looser bounds. At the same time, some proofs may be more elegant than others of similar complexity, yielding a better bound while using similar computational resources. The latter generally exploit useful properties of the system, such as the existence of symmetries. This has been paramount in self-testing protocols based on operator-sums-of-squares (OSOS) decompositions, which are, again, obtained via an SdP. One of the main difficulties in finding OSOS decompositions is obtaining analytical proofs that are simple enough to be manageable [60-64]. In other words, it is paramount to find, among all possible relaxations of the original problem, the best trade-off between accuracy and simplicity. However, a successful search often relies on specific insight about the problem at hand. Conversely, the analysis of an efficient proof is more likely to reveal useful insight about the system's properties.
In recent years, machine learning approaches have shown great success at solving combinatorial optimization problems like the one described above [65]. Different architectures have been proposed, from supervised learning with neural networks [66] to unsupervised approaches over graphs [67]. Closer to the problem considered in this paper, supervised neural networks have been used to ease the solution of SdP relaxations [68]. Another common approach to such problems has been the use of reinforcement learning [69,70]. While traditional algorithms rely on heuristics and specific insight about the nature of the problem, machine learning approaches are able to solve many of these problems without any prior knowledge, and often faster. Along the same lines, machine learning approaches of all kinds have lately been applied to several problems in physics [71].
In this work, we propose a method to systematically search for an optimal relaxation within a given computational budget, using reinforcement learning (RL) techniques. We propose a scheme in which an agent has access to a black box that computes the relaxation of the problem by solving an SdP (see Fig. 3a). The agent can increase or decrease the relaxation level, observing an output that depends on both the computational cost and the quality of the obtained certificate. We illustrate the procedure in the context of finding the ground state energy of local Hamiltonians. Our results show that even for very simple scenarios we find counter-intuitive optimal relaxations. Then, we compare our RL approach to other optimization algorithms and, finally, we show how to use transfer learning in the proposed framework.
Applying RL to obtain useful certificates can be seen as a meta-algorithm with a wide range of applicability. In this work we present it through a running case study, without obscuring its more general flavor. In Section VI we discuss how the same principles, as presented here, apply to diverse areas of quantum information processing.
The paper is structured as follows: In Section II we describe the optimization task that we consider as our running example. We discuss the natural ways to build a relaxation out of this task, by imposing a set of constraints. In Section III we introduce the constraint space for the agent and in Section IV we introduce the optimization framework. In the latter, we define the state space, actions and rewards for the agent. We present our main results in Section IV A. In particular, we devote Section IV A 2 to benchmarking the proposed approach and Section IV A 3 to characterizing the possibilities of transfer learning. Then, we discuss some particular cases of interest in Section V and we discuss how our framework naturally applies to various relevant problems in quantum information in Section VI. Finally, we conclude in Section VII.

II. PRELIMINARIES
This section explains the methods to systematically build certificates, which we are going to consider throughout the paper. These certificates are based on the optimal solution of a semidefinite program. The optimality of the SdP solution or, at least, a valid bound for a certificate follows from strong or weak duality properties, respectively (cf. Appendix A). This methodology will be incorporated into the reinforcement learning procedure in Section IV as a black box module. In the interest of simplicity, throughout the paper we shall consider a running example. While this does not restrict the applicability of our work to other areas in quantum information (see Section VI), it shall certainly ease the exposition.
Let us therefore fix an optimization task, which is to find the ground state energy E_0 of a quantum local Hamiltonian

H = Σ_i H_i,   (1)

where each term H_i acts nontrivially only on a small subset of the particles. To find E_0, a possibility is to directly construct a quantum state that has energy E_0 with respect to H. Therefore, a first possible approach is to parameterize a family of quantum states |ψ(θ)⟩ exploiting some known properties of H. We can safely assume the parameterization yields a valid (i.e., normalized) quantum state for any value of the parameters θ. Additionally, by construction, ⟨ψ(θ)|H|ψ(θ)⟩ ≥ E_0 for all θ. Let us denote

γ := min_θ ⟨ψ(θ)|H|ψ(θ)⟩,   (2)

which satisfies γ ≥ E_0 by construction. An example of such a parameterization would be to describe |ψ(θ)⟩ as a tensor network contraction, which exploits the locality properties of H, limiting the entanglement present in its ground state [12,72-75]. Complexity theory results (in particular, QMA-hardness) strongly suggest that finding, or even approximating, the ground state energy of a local Hamiltonian is a hard task, even for a quantum computer [76-78]. Furthermore, this hardness persists in physically relevant instances [79]. Notice that, even if we found the actual solution |ψ(θ)⟩, we could not prove, solely from that, that it is the global minimum [80].
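To make the variational upper bound γ ≥ E_0 concrete, here is a minimal numerical sketch (our own toy example, not the paper's code): a product-state ansatz for an assumed 2-qubit Hamiltonian, whose grid-optimized energy γ can never fall below the exact E_0.

```python
import numpy as np

# Toy 2-qubit Hamiltonian (an illustrative assumption): XX + YY + Z⊗I + I⊗Z.
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.diag([1.0, -1.0]).astype(complex)

H = (np.kron(sx, sx) + np.kron(sy, sy)
     + np.kron(sz, np.eye(2)) + np.kron(np.eye(2), sz))
E0 = np.linalg.eigvalsh(H)[0]  # exact ground state energy

def phi(t):
    # single-qubit state cos(t)|0> + sin(t)|1>, normalized by construction
    return np.array([np.cos(t), np.sin(t)], dtype=complex)

def energy(t0, t1):
    # <psi(theta)|H|psi(theta)> for the product ansatz |phi(t0)> ⊗ |phi(t1)>
    psi = np.kron(phi(t0), phi(t1))
    return np.real(psi.conj() @ H @ psi)

# Coarse grid search over theta: the best variational value gamma upper-bounds E0.
gamma = min(energy(t0, t1)
            for t0 in np.linspace(0, np.pi, 50)
            for t1 in np.linspace(0, np.pi, 50))
assert gamma >= E0 - 1e-9
```

Even when the ansatz happens to contain the true ground state, nothing in this computation certifies that γ equals E_0, which is precisely the gap the certificates below address.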
It is therefore highly desirable to obtain a bound from the other side; i.e., a value β for which one can prove E_0 ≥ β. This would guarantee E_0 ∈ [β, γ] and, thus, help determine whether it is worth refining the search, depending on whether |γ − β| < ε. However, for a proof of the type E_0 ≥ β, constructing an example |ψ(θ)⟩ is not good enough. We need a proof that is satisfied by all valid quantum states and, possibly, by a larger set, as long as this makes the proof simpler. Such a proof is referred to as a certificate, and it is typically obtained by numerical means. SdP is a natural tool to obtain such certificates, upon which we capitalize in our work.
A common technique to construct a relaxation for the local Hamiltonian problem is via the triangle inequality [81-84]:

E_0 = min_ρ Tr[ρH] ≥ Σ_i min_{ρ_i} Tr[ρ_i Ĥ_i],   (3)

where ρ and ρ_i are density matrices acting on the support of H and Ĥ_i, respectively. Note that i refers to a Hamiltonian term and has nothing to do with the i-th party. Furthermore, in Eq. (3), the Ĥ_i are sums of some local terms H_j of Eq. (1), grouped so that supp(Ĥ_i) is as large as possible while still allowing for the computation of their minimal eigenvalue. This size obviously depends on the available computational resources. Let us observe that the RHS of Eq. (3) is a sum of minima, where each minimization is carried out independently. Due to this independence, in general it is not the case that different ρ_i are mutually compatible; i.e., that there exists a global state ρ such that each ρ_i is the corresponding partial trace of ρ. The converse is true, however: every valid quantum state ρ has an associated set of partial traces ρ_i, but given a set of ρ_i, a global ρ may not exist. This is what proves the inequality in Eq. (3).
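As a toy numerical check of this reasoning (our own illustrative sketch, not the paper's code), one can compare the trivial bound of Eq. (3), the sum of minimal eigenvalues of the grouped terms, with the exact ground state energy of a tiny assumed chain:

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
I2 = np.eye(2)

def kron(*ops):
    out = np.array([[1.0 + 0j]])
    for op in ops:
        out = np.kron(out, op)
    return out

# Assumed 3-qubit open chain: H = h_{01} + h_{12}, with h = XX + YY on each edge.
h01 = kron(sx, sx, I2) + kron(sy, sy, I2)
h12 = kron(I2, sx, sx) + kron(I2, sy, sy)
H = h01 + h12

E0 = np.linalg.eigvalsh(H)[0]  # exact ground state energy

# Triangle-inequality bound: each edge term minimized independently
# over its own 2-qubit support, ignoring mutual compatibility.
h_edge = np.kron(sx, sx) + np.kron(sy, sy)
beta_trivial = 2 * np.linalg.eigvalsh(h_edge)[0]

assert beta_trivial <= E0 + 1e-9  # a valid, but generally loose, lower bound
```

Here the two independent minimizations pretend the middle qubit can be in two different states at once, which is exactly the incompatibility that makes the bound loose (β = −4 versus E_0 = −2√2 ≈ −2.83 in this example).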
The minimization of the RHS of Eq. (3) is equivalent to solving the following SdP (cf. Appendix A):

β_∅ := min_{{ρ_i}} Σ_i Tr[ρ_i Ĥ_i]  s.t.  ρ_i ⪰ 0, Tr[ρ_i] = 1 ∀ i.   (4)

Since there is no mutual compatibility enforced among the ρ_i, and each is treated independently, the triangle inequality Eq. (3) constitutes a trivial relaxation. A natural way to strengthen the relaxation is to impose further restrictions on the collection of possible ρ_i, in such a way that any quantum state would also satisfy them. The strongest restriction possible is to directly ask that {ρ_i} come from a global quantum state. Unfortunately, this would be equivalent to finding the value of E_0, which is QMA-complete. Furthermore, it is strongly connected to solving the so-called quantum marginal problem (QMP), which is also QMA-complete [76-78]. The QMP has been solved completely only in very rare instances, such as the global state being symmetric [85] or the case of one-body marginals [86]. Nevertheless, the SdP-based formulation Eq. (4) motivates a hierarchy of relaxations based on solving the QMP up to some degree of compatibility.

A. Constructing tighter certificates
In order to build certificates that yield a tighter bound than that of the triangle inequality, our first observation is that the set {ρ_i} does not need to fulfill any mutual compatibility constraint. It would be natural to demand that, at least, the partial traces on the intersections of different supports match. Requiring that {ρ_i} fulfill such additional conditions reduces the space of solutions and, since the minimization is then carried out over a smaller set, its result can only be a tighter bound.
Hence, the first level of compatibility we might want to ask for is that ρ_i and ρ_j yield the same reduced density matrix (RDM) on their common support, which we shall denote ρ_{i∧j}:

Tr_{(S_i ∩ S_j)^c}[ρ_i] = Tr_{(S_i ∩ S_j)^c}[ρ_j] =: ρ_{i∧j},  where S_i := supp(Ĥ_i).   (5)

Here, the partial trace Tr_S(·) denotes that we eliminate subsystem S, and the superindex c indicates the complementary set. Thus, Tr_{S^c} produces the RDM acting on subsystem S. Note that the partial trace condition is linear in ρ_i. Therefore, it can be naturally imported into Eq. (4) and still be formulated in terms of an SdP:

β_1 := min_{{ρ_i}} Σ_i Tr[ρ_i Ĥ_i]  s.t.  ρ_i ⪰ 0, Tr[ρ_i] = 1 ∀ i,  Tr_{(S_i ∩ S_j)^c}[ρ_i] = Tr_{(S_i ∩ S_j)^c}[ρ_j] ∀ i, j.   (6)

Given that the sets {ρ_i} that satisfy the constraints of Eq. (6) also satisfy the constraints of Eq. (4), we have β_∅ ≤ β_1 ≤ E_0 by construction.
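This first compatibility level can be illustrated numerically. The following sketch (our own toy example; the particular states are assumptions) checks whether a 2-qubit RDM on qubits (0,1) and another on qubits (1,2) agree on their shared qubit:

```python
import numpy as np

def ptrace_first(rho):
    # trace out the first qubit of a 2-qubit density matrix
    return rho.reshape(2, 2, 2, 2).trace(axis1=0, axis2=2)

def ptrace_second(rho):
    # trace out the second qubit of a 2-qubit density matrix
    return rho.reshape(2, 2, 2, 2).trace(axis1=1, axis2=3)

up = np.array([1.0, 0.0])
dn = np.array([0.0, 1.0])
singlet = (np.kron(up, dn) - np.kron(dn, up)) / np.sqrt(2)

rho_i = np.outer(singlet, singlet)                 # qubits (0,1): singlet
rho_j = np.kron(np.eye(2) / 2, np.outer(up, up))   # qubits (1,2): (I/2) ⊗ |0><0|

# rho_{i∧j}: both variables must yield the same RDM on the shared qubit 1.
rdm_from_i = ptrace_first(rho_i)    # qubit 1, as seen from rho_i
rdm_from_j = ptrace_second(rho_j)   # qubit 1, as seen from rho_j
compatible = np.allclose(rdm_from_i, rdm_from_j)
assert compatible  # both give the maximally mixed state I/2
```

In the SdP of Eq. (6) this check becomes a linear equality constraint on the variables, rather than a test performed after the fact.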
The certificates obtained from Eq. (6) can be further strengthened by adding virtual RDMs. For instance, even if H is 2-local, we might ask, e.g., that the two-body RDMs acting on Alice-Bob and Bob-Charlie both come from a virtual three-body density matrix acting on Alice-Bob-Charlie. The latter is not strictly necessary in order to compute the energy, for 2-body density matrices suffice, but this compatibility condition further restricts the set {ρ_i}, hence improving the bound. In mathematical jargon, this method is known as representing the feasible set as a projected spectrahedron [56]. Hence, instead of solely asking that ρ_i and ρ_j yield the same RDM on their intersection, we might now impose a stronger constraint: that ρ_i and ρ_j come from a valid density matrix ρ_{i∨j} defined on the union of their supports:

β_2 := min Σ_i Tr[ρ_i Ĥ_i]  s.t.  ρ_{i∨j} ⪰ 0, Tr[ρ_{i∨j}] = 1,  ρ_i = Tr_{S_j \ S_i}[ρ_{i∨j}],  ρ_j = Tr_{S_i \ S_j}[ρ_{i∨j}] ∀ i, j.   (7)

We observe that the constraints imposed in Eq. (7) automatically imply those of Eq. (6), so we have omitted the latter, as they become redundant.
We also observe that, although now we have β_∅ ≤ β_1 ≤ β_2 ≤ E_0, the cost of solving Eq. (7) is substantially higher than that of Eq. (6), because the SdP variables ρ_{i∨j} act on more qubits than ρ_i and ρ_j, and the cost of representing them grows exponentially in the number of qubits. Similarly, the relaxations in Eq. (7) can be strengthened further by considering compatibility with more regions, yielding a chain of inequalities β_∅ ≤ β_1 ≤ β_2 ≤ … ≤ E_0.
In Eq. (7) the compatibility constraints are enforced on all possible pairs (i, j). However, not all the constraints are equally useful. In an extreme case, when supp(ρ_i) ∩ supp(ρ_j) = ∅, adding the variable ρ_{i∨j} with its respective constraints makes no difference. Indeed, since Tr[ρ_i Ĥ_i] + Tr[ρ_j Ĥ_j] = Tr[(ρ_i ⊗ ρ_j)(Ĥ_i ⊗ 1_j + 1_i ⊗ Ĥ_j)], the choice ρ_{i∨j} = ρ_i ⊗ ρ_j is always possible, as it satisfies the rest of the constraints, therefore not changing β_2. We remark that this tensor product choice is possible because the supports do not intersect. Thus, if we define such a ρ_{i∨j} as a variable in Eq. (7), we increase its computational complexity without improving the bound, yielding a worse certificate.
In Appendix A we give details on the basics of SdP and how to obtain mathematical proofs from their solutions.

III. THE CONSTRAINT SPACE
In this section we introduce the space of constraints for the relaxations and study its structure. This constraint space shall induce an underlying structure for the action space of the reinforcement learning agent in Section IV. Following our running example, let us consider a set of n qubits, labelled from 0 to n − 1, and denote [n] := {0, …, n − 1} and P([n]) its power set. Our first observation is that, to every subset C ⊆ P([n]), we can associate a certificate in the following way: for each element S ∈ C, which corresponds to a subset of [n], we consider the RDM acting on the qubits labelled by the elements in S, which we denote ρ_S. Let us denote Ξ_C := {ρ_S}_{S∈C} the collection of RDMs associated to C. By enforcing compatibility on their overlapping supports, we can define the SdP

β_C := min_{Ξ_C} Σ_i ⟨H_i⟩  s.t.  ρ_S ⪰ 0, Tr[ρ_S] = 1 ∀ S ∈ C,  Tr_{S\R}[ρ_S] = Tr_{S'\R}[ρ_{S'}] ∀ R ⊆ S ∩ S', ∀ S, S' ∈ C,   (8)

where the partial trace over the whole system is set to one by convention, Tr_{[n]}[ρ] = 1. We have written the objective function as Σ_i ⟨H_i⟩ for the following reasons: first, C could be small enough that there is no S ∈ C such that supp(H_i) ⊆ S. If this is the case, then we substitute ⟨H_i⟩ by the minimal eigenvalue of H_i, in the same spirit as the trivial relaxation Eq. (4). Hence, if C = ∅, the cost function of Eq. (8) amounts to the sum of the minimal eigenvalues of the H_i. Otherwise, if Ξ_C contains a density matrix ρ_S whose support contains the support of H_i, we simply compute ⟨H_i⟩ = Tr[ρ_S H_i]. Note that, in case multiple density matrices from Ξ_C could be used to compute ⟨H_i⟩, the last constraint of Eq. (8) guarantees that the result is well-defined; i.e., independent of the choice of ρ_S ∈ Ξ_C. In practice, the last constraint of Eq. (8) rarely needs to be imposed over all the subsets of the intersection, and it is enough to take R = S ∩ S' for all pairs S, S' ∈ C. In Appendix B we discuss these implications in detail. Regardless of the constraint implementation of Eq. (8), the SdP yields a valid lower bound.
Furthermore, given a set of constraints C ⊆ P([n]), it is not necessary to define Eq. (8) over all the variables contained in Ξ_C. If some S ∈ C is contained in another S' ∈ C, i.e., S ⊆ S', we can simply use ρ_{S'}, as it contains all the information on ρ_S. This choice is well-defined due to the constraints in Eq. (8), and it naturally defines a simplification function s : Ξ_C → s(Ξ_C), which allows one to simplify the SdP by removing redundant variables.
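The simplification function s admits a compact illustration in pure Python (a sketch under the convention, assumed here, that constraints are represented as sets of qubit labels):

```python
# Drop every S in C that is strictly contained in some other S' in C,
# since rho_{S'} already carries all the information of rho_S.
def simplify(C):
    C = [frozenset(S) for S in C]
    return {S for S in C
            if not any(S < T for T in C)}  # strict containment S ⊂ T

C = [{0, 1}, {1, 2}, {0, 1, 2}, {3}]
# {0,1} and {1,2} are redundant: rho_{012} determines them by partial trace.
assert simplify(C) == {frozenset({0, 1, 2}), frozenset({3})}
```

Only the maximal elements of C survive, which is exactly the set of SdP variables that need to be instantiated.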
One of the main motivations of this work is to optimize the quality of the lower bound within a limited computational budget. The asymptotic complexity of an SdP with m variables of matrix size n depends on the method used to solve it. A rough estimate is O(m^2 n^2), although the iteration costs of the algorithm are not factored in [87]. There exist interior-point methods which are faster than the ellipsoid method [88]; e.g., Alizadeh's algorithm runs in Õ(√m (m + n^3) L) time, where L is an input parameter and the Õ notation suppresses polylog(mn/ε) terms, ε being the required precision [89,90]. In our case, we use the self-dual minimization method SeDuMi [91], which has a complexity of Õ(m^2 n^{5/2} + n^{7/2}) for large-scale instances, although there are algorithms of Õ(n m^3), suitable for small matrix sizes [92]. Interestingly, quantum algorithms have been proposed to solve SdPs [93], and machine learning methods have been studied to aid the SdP solver [94].
In light of the whole zoo of algorithms for SdP and their various complexities, it is clear that the time complexity of an SdP instance is highly dependent on the solver used. Nevertheless, for our case study the important point is that a given computational budget determines a maximal allowed (m, n), which we estimate by effectively limiting the size and contents of Ξ_C (cf. Section IV A 2).
The space of constraints forms a partially ordered set (poset) with respect to the following partial order relation. Given C, C' ⊆ P([n]), we say C ⪯ C' if, and only if, for each S ∈ C there exists an S' ∈ C' such that S ⊆ S'. The motivation for this partial order relation is that C ⪯ C' implies β_C ≤ β_{C'} by construction: every density matrix in Ξ_C can be obtained by tracing out some elements of another density matrix in Ξ_{C'}, and the constraints in Eq. (8) enforce mutual compatibility among all the elements in Ξ_C and Ξ_{C'}. In Fig. 2 we illustrate this structure, which motivates the agent definition in Section IV.
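The partial order ⪯ can be sketched directly in code (a pure-Python illustration; the example constraint sets are our own):

```python
# C ⪯ C' iff every S in C fits inside some S' in C'.
def preceq(C, Cp):
    return all(any(set(S) <= set(Sp) for Sp in Cp) for S in C)

C1 = [{0, 1}, {1, 2}]
C2 = [{0, 1, 2}]
C3 = [{2, 3}]

assert preceq(C1, C2)      # C1 ⪯ C2, hence beta_C1 <= beta_C2
assert not preceq(C2, C1)  # {0,1,2} fits inside no element of C1
# ⪯ is only a partial order: C1 and C3 are incomparable
assert not preceq(C1, C3) and not preceq(C3, C1)
```

The incomparable pairs are precisely the nodes of Fig. 2 that no chain of arrows connects, which is why a search over this space cannot simply climb a single ladder of relaxations.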
FIG. 2: Poset structure of the constraint space. The different circles represent Ξ_C for different C ⊆ P([n]). The arrows represent the partial order relation, so that Ξ_C ⪯ Ξ_{C'} is represented by an arrow from Ξ_C to Ξ_{C'}. Only the arrows relative to the central node are drawn. Dashed arrows indicate that there exist many more Ξ_C arriving at/departing from the central node that are simply not drawn. The orange dashed line separates those Ξ_C that fall within the allowed computational budget (green, blue and pink nodes) from those that are too expensive (red). Moving vertically up the diagram provides better certificates, but at a higher cost. Since ⪯ is a partial order relation, some nodes (e.g., the three at the bottom) are incomparable.

IV. CONSTRAINT OPTIMIZATION
In this section we discuss a method to achieve the best trade-off between the computational cost and the quality of a certificate by exploring the constraint space described in Section III. Hence, we face a constrained optimization problem over the constraint space, subject to the computational budget. Due to the high amount of structure in this extensive combinatorial space, we propose to use Reinforcement Learning (RL) [69] with function approximation, which, within our proposed framework, naturally prefers lower-cost solutions and is able to optimize its exploration strategy based on previous experiences. In such spaces, experience in one region may be useful in others; e.g., in periodic systems, actions in one domain should be identical to actions in another, which further allows for easy transfer of learning without explicit analysis of the model parameters (see Section IV A 3).
To this end, we frame the optimization problem as a Markov decision process (MDP). The MDP is defined through a state space, an action space, a transition function between states given an action, and a reward function, which associates a value to each state-action-state tuple. All the parts are detailed below. A learning agent, as the learning program is called in RL terminology, explores the constraint space with the goal of finding the set of constraints C* ⊆ P([n]) that provides the best possible certificate within a limited computational budget, while using the least amount of resources. In algorithmic terms, we distinguish two main independent parts: i. A black box, acting as the reward function. It takes a set of constraints C as input, computes β_C by solving the associated SdP (Eq. (8)) and outputs a reward, which depends on the quality of the resulting bound and its computational cost.
ii. A learning agent capable of generating sets of constraints and inputting them into the black box (i). The agent can choose to strengthen or loosen the constraints, effectively exploring the constraint space with its actions. In doing so, the agent obtains different rewards that guide it towards finding the optimal relaxation. Note that the agent is completely agnostic about the actual physical problem at hand.
We aim to understand to what extent such a fully automated approach may help in studying physical systems. In the following, we connect the MDP components to our running example. See Fig. 3a for a schematic depiction.

State space - The state space corresponds to the constraint space introduced in Section III, in which each state is a specification of constraints C ⊆ P([n]), bounded by the computational budget, as illustrated in Fig. 2. We represent the states by one-hot encoding of the active constraints S ∈ C: considering a set of 2^n-dimensional canonical vectors with a single non-zero unit element, each representing an element S ∈ P([n]), a state vector is the sum of the vectors that encode the components S ∈ C. Equivalently, it identifies the set Ξ_C = {ρ_S}_{S∈C} of RDMs that enter as variables in Eq. (8). As shown in the leftmost part of Fig. 3a, the RDMs ρ_S are ordered according to their dimension in the state vector. Out of the 2^n possible variables, we need only consider poly(n) of them, effectively reducing the state vector size: we can ignore the 1-body constraints, as well as those ρ_S whose sole contribution to the cost of solving the associated SdP would exceed the computational budget. With a computational budget B, this leaves n^{O(log B)} available RDMs to construct the certificate. If no S ∈ C is such that i ∈ S, the 1-body constraint corresponding to ρ_{i} is added by default. Therefore, the smallest set of constraints that we allow for is represented by a state vector of zeros, and we take it as the initial state of the MDP.
Actions - An action a consists of either adding or removing a constraint, driving the agent from one state to another. In practice, actions flip bits in the state vector corresponding to the encoded constraints. The agent is free to add a constraint of any size, as long as the cost associated to the resulting set is within the computational budget. For instance, the agent can start by adding a 4-body constraint, e.g., ρ_{0123}, to the initial state. In contrast, removing a constraint has a different effect: in order to keep the state space exploration consistent, removing a constraint splits it into its most immediate components of a lower degree. For instance, in 1D, removing ρ_{0123} would result in ρ_{012} and ρ_{123}. Note that a valid action always corresponds to an arrow (in either direction) in the poset depicted in Fig. 2.
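A minimal sketch of these action semantics, assuming 1D supports encoded as tuples of contiguous sites (our own convention for illustration, not the paper's implementation):

```python
# Adding a constraint inserts a new support; removing a k-body constraint
# splits it into its two immediate (k-1)-body components, as in
# rho_0123 -> rho_012, rho_123 for the 1D case described above.
def add_constraint(C, S):
    return set(C) | {S}

def remove_constraint(C, S):
    # split S into its immediate lower-degree components (1D convention)
    parts = {S[:-1], S[1:]} if len(S) > 1 else set()
    return (set(C) - {S}) | parts

C = {(0, 1, 2, 3)}
C = remove_constraint(C, (0, 1, 2, 3))
assert C == {(0, 1, 2), (1, 2, 3)}

C = add_constraint(C, (2, 3, 4))
assert (2, 3, 4) in C
```

Splitting rather than outright deletion keeps each removal a single step down the poset of Fig. 2, so every action remains reversible by its counterpart.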
Transition function - The transition function is a simple deterministic function, implicitly defined above: T(C|a, C') is a Kronecker delta, attaining unit value if the constraint configuration C is reached by adding or removing the constraint specified by the action a from the set of constraints C'.
Reward - The reward function is defined to match the overall optimization goal, given that the learning agent aims to maximize the obtained reward. The reward associated to a state C depends on: 1) the energy bound β_C, obtained by solving its associated SdP, and 2) its computational cost. In practice, we take the number of free parameters in the SdP Eq. (8), which we denote by p, as a proxy for the computational cost. Note that, given an initial, unconstrained optimization problem, we have no prior knowledge about the optimal β and p. Therefore, in order to compute the reward associated to a given state, we rely on a set of references that are updated as the constraint space is explored. More precisely, we keep track of the best and worst bounds obtained, β_max and β_min respectively, and the best and worst numbers of parameters with which the best bound so far β_max has been observed, denoted p_best and p_worst respectively. The reward associated to a state is computed by comparing the actual β and p to the reference values as

R(β, p) = (p_best / p) ((β − β_min) / (β_max − β_min))^d,   (9)

where d is a fixed exponent that controls the shape of the curve (β − β_min)/(β_max − β_min). Such an exponent is introduced in order to provide better discrimination when the bounds obtained for different C lie close to each other. Notice that p_worst ≥ p_best and, therefore, p_worst/p_best ≥ 1. Thus, the prefactor p_best/p ≤ 1 ensures that R(β, p) ∈ [0, 1] ∀ β, p. Fig. 3b shows a schematic of the reward function. In summary, the reward function mainly focuses on the resulting bound β, unless various states provide the maximum possible bound β_max, in which case those with higher computational costs are penalized.
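A hedged sketch of such a reward (the functional form below, R = (p_best/p)·((β − β_min)/(β_max − β_min))^d, is our reading of the description and should be taken as an assumption, not the paper's exact expression):

```python
# ASSUMED reward form: quality term in [0,1] sharpened by exponent d,
# rescaled by a cost prefactor that penalizes extra SdP parameters.
def reward(beta, p, beta_min, beta_max, p_best, d=2):
    quality = (beta - beta_min) / (beta_max - beta_min)
    return (p_best / p) * quality ** d

# The best bound reached with the fewest parameters gets reward 1.
assert reward(-4.0, 10, beta_min=-6.0, beta_max=-4.0, p_best=10) == 1.0
# Same bound at double the cost: reward halved by the prefactor.
assert reward(-4.0, 20, beta_min=-6.0, beta_max=-4.0, p_best=10) == 0.5
# A looser bound is discounted superlinearly (d=2): 0.5**2 = 0.25.
assert reward(-5.0, 10, beta_min=-6.0, beta_max=-4.0, p_best=10) == 0.25
```

The reference values β_min, β_max and p_best would be updated online as the agent explores, so early rewards are only meaningful relative to the experience gathered so far.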
The agent - Within the proposed framework, the constrained optimization can be solved through various methods. As mentioned before, we propose to use RL with function approximation. The learning program, or agent, specifies the policy by which actions are taken, with the ultimate goal of maximizing the obtained reward. More precisely, we use double deep Q-learning [95-97] with an ε-greedy policy π. At each state C, the agent estimates the Q-values Q_π(a, C) of each possible action a, a measure of the expected rewards associated to taking each action and then following the policy π. The ε-greedy policy prescribes taking the action maximizing Q_π(a, C) with probability 1 − ε, and a uniformly random action with probability ε. Fig. 3a shows a schematic representation of the whole process. In Section IV A we show that such an approach leads to solutions faster than other classical optimization methods and, sometimes, is even able to find the optimal solution where the other methods fail.
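A minimal sketch of ε-greedy action selection, with a toy dictionary of Q-values standing in for the network's output (illustrative only; action names are hypothetical):

```python
import random

# Exploit the current Q-value estimates with probability 1 - eps,
# explore uniformly at random with probability eps.
def epsilon_greedy(q_values, eps, rng=random):
    # q_values: dict mapping action -> estimated Q(a, C)
    if rng.random() < eps:
        return rng.choice(sorted(q_values))  # uniform over actions
    return max(q_values, key=q_values.get)   # greedy action

q = {"add_012": 0.7, "remove_01": 0.2}
assert epsilon_greedy(q, eps=0.0) == "add_012"  # greedy limit
assert epsilon_greedy(q, eps=1.0) in q          # fully exploratory limit
```

In practice ε is typically annealed over training, so the agent explores broadly at first and exploits its learned Q-values later on.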
A. Application to the Heisenberg XX model

Following our running example of finding a lower bound to the ground state energy of quantum local Hamiltonians, we focus on a paradigmatic condensed matter model: the antiferromagnetic 1D quantum Heisenberg XX model [98], described by the Hamiltonian

H = Σ_{i=0}^{n−1} [J_i (σ^x_i σ^x_{i+1} + σ^y_i σ^y_{i+1}) + B_i σ^z_i],   (11)

where σ^α, α = x, y, z, are the Pauli matrices, J_i is the antiferromagnetic exchange interaction between spins and B_i is the strength of the external magnetic field. We consider periodic boundary conditions, such that σ^α_n = σ^α_0. In the homogeneous case, i.e., J_i = J, B_i = B ∀ i, the model presents a quantum phase transition at B = 2J [99] between an antiferromagnetic and a paramagnetic phase, in which the entanglement vanishes [100-102]. We will hence refer to these phases as entangled and unentangled, respectively. Although the 1D XX model Eq. (11) is efficiently solvable via the Jordan-Wigner transformation [103], corresponding to a quadratic fermionic Hamiltonian [104-106], the agent is oblivious to such information. We emphasize that the points in the search space have no semantics for the agent, which, moreover, is not provided with any explicit information about the Hamiltonian. This guarantees that our approach is as generally applicable as possible.
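For concreteness, the homogeneous XX Hamiltonian can be built and diagonalized exactly for a small chain (an illustrative numpy sketch, not the paper's code). In the unentangled phase B/J ≥ 2 the fully polarized state is the ground state, with energy −nB:

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.diag([1.0, -1.0]).astype(complex)
I2 = np.eye(2)

def op_on(op, site, n):
    # embed a single-qubit operator at `site` in an n-qubit chain
    mats = [I2] * n
    mats[site] = op
    out = np.array([[1.0 + 0j]])
    for m in mats:
        out = np.kron(out, m)
    return out

def xx_hamiltonian(n, J, B):
    H = np.zeros((2 ** n, 2 ** n), dtype=complex)
    for i in range(n):
        j = (i + 1) % n  # periodic boundary: sigma_n = sigma_0
        H += J * (op_on(sx, i, n) @ op_on(sx, j, n)
                  + op_on(sy, i, n) @ op_on(sy, j, n))
        H += B * op_on(sz, i, n)
    return H

n, J, B = 4, 1.0, 3.0  # B/J >= 2: unentangled (paramagnetic) phase
E0 = np.linalg.eigvalsh(xx_hamiltonian(n, J, B))[0]
assert abs(E0 - (-n * B)) < 1e-8  # fully polarized ground state
```

This brute-force diagonalization scales as 2^n and serves only as a reference for small n; the point of the certificates above is precisely to bound E_0 without ever constructing the full 2^n-dimensional matrix.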

Results
We present the results of the application of the RL method to the homogeneous version of the aforementioned Hamiltonian. The maximum budget considered for this example allows for the allocation of half of the possible 3-body constraints and all the 2-body ones. Given the budget, we proceed to find the best approximation to the ground state energy across the whole phase diagram of the Hamiltonian.
Unentangled phase B/J ≥ 2 - In the unentangled phase, the ground state can be perfectly described by the set of independent 1-body RDMs. Therefore, we would expect the optimal set of constraints to be the minimal one the agent can consider, C = {{0}, …, {n − 1}}. Nevertheless, this is only true in the extreme case of J = 0. In a general scenario, with 0 < 2J ≤ B, the optimal solution is made out of 2-body constraints, as shown in Fig. 4, diagram (d). This is to provide support for the 2-body terms of the local Hamiltonian. Recall that, in our implementation, whenever a term H_i of the Hamiltonian is not supported by the set of RDMs Ξ_C = {ρ_S}_{S∈C}, we take ⟨H_i⟩ to be its minimal eigenvalue, min(σ(H_i)) = −J. With 2-body constraints, the resulting RDMs are rank-1 projectors, which correspond to pure states such that ⟨H_i⟩ = 0 for the 2-body terms, thus yielding a better energy bound. Increasing the size of the constraints any further does not improve the energy bound at all.
Entangled phase B/J < 2 - In the case of the entangled ground state, its exact energy can only be obtained by considering the system as a whole, corresponding to C = {[n]}. Therefore, the agent can only provide the best possible approximation to the exact energy within the allowed computational budget. Just like in the previous case, it may seem reasonable to expect the optimal set of constraints to be unique for the whole phase. Nevertheless, the agent finds three separate regimes, as depicted in Fig. 4:

• Close to the phase transition, the best certificate is obtained by alternating 2-body and 3-body constraints, as shown in Fig. 4, diagram (c). This solution has a lower complexity than (a) and (b), but it provides a higher energy bound.
• In an intermediate regime, as shown in Fig. 4 diagram (b), the best possible certificate is obtained by combining the overlap of some of the largest possible constraints with the inclusion of a smaller constraint.
• Deep into the phase, as shown in Fig. 4 diagram (a), the best possible certificate is obtained by evenly distributing all the largest possible constraints throughout the system. A priori, we would expect this to be the optimal solution throughout the whole phase.
Note that, in the entangled phase, the two intermediate optimal configurations (b) and (c) provide better bounds than the set of constraints (a) in Fig. 4, even with (c) yielding simpler certificates. This simple scenario shows that evaluating the quality of a relaxation beforehand is not a trivial task, and it becomes even less straightforward when considering different kinds of Hamiltonians and budgets. Additionally, a budget that allows the allocation of several 3-body RDMs may also allow for the allocation of some 4-body constraints, which are also taken into account in the optimization. For instance, with n = 10, the agent could introduce a single 4-body constraint. However, the solution found by the agent shows that it is better to combine 3-body and 2-body RDMs rather than use such a limited amount of 4-body ones.
In Fig. 4 we show the solution for a small system of n = 6 sites for illustrative purposes. In larger systems, we observe that the same optimal patterns remain consistent, suggesting that the qualitative solutions obtained in small systems can be reused in larger ones with similar properties. In Appendix C we provide further details about the quality of the obtained certificates throughout the phase space and show that the optimal sets of constraints do remain optimal across different system sizes. While a thorough characterization of the Hamiltonian in terms of its optimal SdP constraints is of great interest, it falls outside the scope of this work. Already in such a simple scenario, the agent is able to find a rich set of intermediate solutions which may, at first glance, seem counter-intuitive. The solutions are, nevertheless, closely related to the actual entanglement structure of the ground state of the system [101]. This shows that the agent is able to capture physical properties of the system, even when various possible solutions are very close in terms of cost and quality.
As a final remark, note that the ground state in the unentangled phase is a product state, meaning that the exact solution lies within the budget with which the agent is provided. In contrast, in the entangled phase, the ground state can only be exactly described by its full density matrix, meaning that the exact solution falls outside of the budget. Within the framework presented here, an agent that remains far from using the whole budget gives a strong indication that the provided result is exact (cf. Section V).

Benchmarking
As briefly introduced at the beginning of Section IV, the proposed framework allows for the straightforward application of several optimization algorithms besides RL. In this section, in order to evaluate the quality of the RL results, we use two informative points of reference: breadth-first search (BFS) [107] and Monte Carlo (MC) optimization [108]. To the best of our knowledge, this is the first time this kind of optimization has been performed. Thus, lacking a pre-defined benchmark, we establish the first points of reference.
For the comparison, we consider an inhomogeneous version of the XX Heisenberg model Eq. (11) in which we keep a constant magnetic field B_i = B = 1 and tune the interaction strength J_i = i mod 3. This provides us with isolated groups of three interacting sites. Note that, depending on the system size, there may be exclusively triplets, triplets and an isolated site, or triplets and a pair. Such a model allows us to determine the optimal set of constraints beforehand, which lets us compare the performance of the optimization algorithms against the actual optimal solution.
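The decoupling pattern induced by J_i = i mod 3 can be made concrete with a short sketch (the 1-based bond indexing and the helper `interacting_groups` are our assumptions for illustration, not part of the released library):

```python
def interacting_groups(n):
    """Split an n-site chain with couplings J_i = i mod 3 into the groups
    of sites joined by nonzero couplings (bond i joins sites i-1 and i)."""
    groups, current = [], [0]
    for i in range(1, n):
        if i % 3 == 0:          # J_i = 0: the bond vanishes and the chain splits
            groups.append(current)
            current = [i]
        else:
            current.append(i)
    groups.append(current)
    return groups

# Exclusively triplets, triplets plus an isolated site, or triplets plus a pair:
print(interacting_groups(9))    # [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
print(interacting_groups(10))   # last group is the isolated site [9]
```

For n divisible by 3 the chain splits into triplets only; otherwise a lone site or pair remains at the end, matching the three cases mentioned above.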
As a figure of merit to evaluate the algorithms' performance, at each time-step we compute the reward of the given state, as in Eq. (9), with full knowledge of β_max, β_min, p_best, p_worst. This provides a measure of closeness to the optimal configuration, yielding reward 1 for the optimal state. Note that the algorithms have different ways to explore the state space. Hence, in order to perform a fair comparison of the progress towards the optimal set of constraints, we do not take into account repeated visits to states. Contrary to the BFS, both the RL and the MC agents can go back and forth, revisiting the same states several times. Given that the main computational cost comes from solving the SdP associated to each state, we keep a memory of the solutions already obtained throughout the path. Hence, we consider that revisiting a state implies a negligible computational cost.
Consequently, we evaluate the overall performance by keeping track of the best obtained reward for every new visited state. In Fig. 5, we depict the number of new states visited by fifty agents before, on average, they reach a reward of 0.95. The process is repeated for several system sizes, for which the constraint space grows exponentially. The hyper-parameter tuning for the RL and MC optimizations is performed at a system size of n = 10 and kept throughout the whole process (see Appendix D).
First, we benchmark the agents' performance by providing them with a small budget, which allows them to allocate only half of the available 3-body constraints. The results are depicted in Fig. 5a. For small systems, there are no substantial differences in performance, given that the state space is reduced. Already at N = 11, the BFS is not able to find the optimal bound within a reasonable time. While the MC optimization provides better results for small systems, it is outperformed by the RL agent at N = 16. We hypothesize that, at this size, the overhead of learning is overcome by the increasing complexity of the state space.
In order to test this hypothesis, we conduct the same experiment with a larger computational budget that allows the agents to allocate all the 3-body constraints. With this, for the same system sizes, the agents encounter significantly larger constraint spaces. The results are shown in Fig. 5b. In this case, the differences between the MC and RL optimizations are smaller for small systems, and the RL agents outperform the MC optimization earlier on. This means that, for large state spaces, the learning cost involved in the RL optimization pays off, making it better than following a simple MC heuristic. In addition, unlike the RL, the MC shows a strong dependency on a proper hyper-parameterization, e.g. choosing an appropriate inverse temperature: as soon as the parameters are not optimized for the specific problem, the performance is dramatically affected. Proper parameter tuning is, in itself, a computationally costly task, given the constraint-space size. The RL scheme, being quite resilient to its hyper-parametrization, provides a significant advantage in this sense, allowing us to tune it in reduced systems.

Transfer learning
An interesting feature of the proposed framework is that none of its parts requires prior information about the actual problem. This suggests the possibility of exploring a given constraint optimization and its underlying system in a completely autonomous way. One way to take advantage of this feature is by performing transfer learning (TL) [109]. In order to do so, we start by training an agent to solve a system under the action of a Hamiltonian. Then, we leverage the experience obtained by the agent in the initial task, using it as the initial condition to solve a new problem with a similar Hamiltonian.
We consider a homogeneous version of the Heisenberg XX model Eq. (11). As commented before, this Hamiltonian shows a quantum phase transition at B/J = 2, but also shows three different solutions in the entangled phase (B/J < 2). An agent is trained to solve the constraint optimization deep in one phase, with B/J = 5. Then, we use this agent to find the optimal solution for the rest of the phase space. In Fig. 6 we show the ratio between the time it takes the algorithm to converge with TL, t_TL, and the time it takes with a cold start, t_0, i.e. a training starting from scratch. Hence, t_TL/t_0 < 1 indicates favorable TL and t_TL/t_0 > 1 indicates negative transfer. The convergence time is obtained by averaging the results of training fifty independent agents, shown on the right panel of Fig. 6 (see also [110]). We observe that TL within the same phase is quite favorable. Indeed, for this particular problem, the optimal set of constraints is the same across the whole phase, including the critical point (cases (d) and (c), respectively). When applied across phases, the advantage of TL diminishes sharply. Close to the phase transition (case (b)), there appears a local minimum in which some agents get stuck and, under the given conditions, it takes them hundreds of training episodes to correct it. In this regime, the TL still provides a convergence advantage, although it does not help avoid the sub-optimal configuration. Deep into the opposite phase (case (a)), even though TL barely affects the performance, as t_TL/t_0 ≈ 1, it has a slightly negative impact.
The vertical lines of Fig. 6 show the phase transition (solid) and the intermediate points at which the optimal set of constraints changes (dashed). As shown, the loss of a convergence advantage from TL can be indicative of changes in the ground state of the system. Hence, this approach can be used to infer the properties of the physical system in a completely unsupervised way, by exploiting the failure of the method, as in [111,112].

V. PARTICULAR CASES
Here we analyze some cases where the local Hamiltonian Eq. (1) enjoys desirable properties that make a certificate easier to obtain.
• If H is a frustration-free Hamiltonian, its lowest energy eigenstate coincides with a lowest energy state of each of the individual terms H_i. In other words, global ground states correspond to local ground states. In this case, let |ψ⟩ be the ground state of H. It is also a ground state of every H_i and it defines a set of RDMs ρ_i = Tr_{supp(H_i)^c}[|ψ⟩⟨ψ|]. Note that frustration-freeness guarantees that the contribution of each term equals its algebraic minimum, Tr[ρ_i H_i] = min σ(H_i). Hence, the minimal set of constraints C_∅ (cf. Eq. 4) already reproduces the ground state energy: on the one hand, given a term H_i, there is no ρ_i that yields a smaller value than min σ(H_i). On the other hand, the set of RDMs that correspond to the actual ground state satisfies this condition. This implies that strengthening the constraints in Eq. (8) to any C ⊋ C_∅ will have no effect in increasing β_C.
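The condition Tr[ρ_i H_i] = min σ(H_i) can be verified numerically on a toy model; below we use a two-term Ising chain on three sites (our own choice of example, not a model from the main text), which is frustration-free because the chain is bipartite:

```python
import numpy as np

Z = np.diag([1.0, -1.0])
I2 = np.eye(2)

def kron(*ops):
    out = np.array([[1.0]])
    for op in ops:
        out = np.kron(out, op)
    return out

# Two-term Ising chain on 3 sites: frustration-free (the chain is bipartite).
terms = [kron(Z, Z, I2), kron(I2, Z, Z)]
H = sum(terms)

evals, evecs = np.linalg.eigh(H)
E0, psi = evals[0], evecs[:, 0]                # global ground state, E0 = -2

# Each term's contribution equals its algebraic minimum: Tr[rho_i H_i] = min sigma(H_i)
local_minima = [np.linalg.eigvalsh(h)[0] for h in terms]   # both -1
achieved = [psi @ h @ psi for h in terms]                  # both -1, so E0 = -2
```

The global ground state simultaneously minimizes every local term, so the sum of local minima already equals the ground state energy, exactly the mechanism by which C_∅ suffices.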
A couple of comments are in order:
- Obtaining a set of constraints Ξ_C which recovers an exact lower bound β_C = E_0 does not automatically imply that we can recover the ground state configuration, even if the problem is fully classical. For instance, suppose H corresponds to a classical 3-SAT problem: H can be written in the computational basis as a sum of projectors Π_i that act non-trivially on 3 variables x_{i1}, x_{i2} and x_{i3}. Since Π_i ⪰ 0 and the instance is satisfiable, we obtain β_C = 0 for any relaxation. However, by inspecting the values of the ρ_i that the SdP Eq. (8) outputs, it need not be the case that ρ_i is a rank-1 projector onto the solution state |x_{i1} x_{i2} x_{i3}⟩, and thus directly interpretable as part of the solution to 3-SAT.
- Frustration-free Hamiltonians constitute an important class of models. All short-range, gapped Hamiltonians can be well-approximated by frustration-free ones by increasing their locality to O(log(n)) [113]. Frustration-free Hamiltonians comprise notable models, both commuting and non-commuting. On the one hand, frustration-free, commuting models include the toric code [114,115], Levin-Wen models [116] and quantum error correcting codes [117]. Importantly, graph states [118] or, more generally, stabilizer states such as the cluster state [119] are included in this class. Graph states can be approximated as ground states of two-body Hamiltonians [120], although it has been shown for spin-1/2 that this approximation cannot be made exact (ground states of frustration-free 2-local qubit Hamiltonians are unentangled [121,122]), even if we drop the frustration-freeness condition [123,124]. On the other hand, frustration-free, non-commuting models include the Affleck-Kennedy-Lieb-Tasaki (AKLT) model [125], Rokhsar-Kivelson models [126,127] and parent Hamiltonians that are defined from injective projected entangled-pair states (PEPS) [73,128-131]. Sufficient conditions on when a Hamiltonian must be frustration-free have been studied in [132].
• If H is a sum of mutually commuting terms, its eigenstates correspond to eigenstates of each of the H_i. Note, however, that the order of the eigenenergies in H need not correspond to the order of the eigenenergies in H_i. For instance, changing H_i to −H_i reverses the order of the eigenstates, but leaves commutativity untouched. The simplest example of a commuting, non-frustration-free Hamiltonian is H = Σ_{⟨i,j⟩∈E} σ_z^{(i)} σ_z^{(j)}, where E is the set of edges of a triangle. In this case, tightening the constraints in Eq. (8) helps in better capturing the frustration in the model, thus improving β_C as a larger number of sites is considered.
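A minimal numerical check of the triangle example (our own sketch) confirms the frustration: the true ground energy, −1, sits strictly above the sum of the per-term minima, −3, which is exactly the gap that tighter constraints can close:

```python
import numpy as np

Z = np.diag([1.0, -1.0])
I2 = np.eye(2)

def embed(n, ops):
    """Place single-site operators {site: op} into an n-qubit tensor product."""
    out = np.array([[1.0]])
    for site in range(n):
        out = np.kron(out, ops.get(site, I2))
    return out

# sigma_z sigma_z on each edge of a triangle: the terms commute but frustrate.
edges = [(0, 1), (0, 2), (1, 2)]
terms = [embed(3, {i: Z, j: Z}) for i, j in edges]
H = sum(terms)

sum_of_local_minima = sum(np.linalg.eigvalsh(h)[0] for h in terms)   # -3
E0 = np.linalg.eigvalsh(H)[0]                                        # -1 > -3
```

No spin assignment on a triangle can anti-align all three bonds at once, so the naive bound from the local minima is loose and only constraints covering more sites recover the true value.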

VI. GENERALIZATIONS
The framework presented here applies to the meta-problem of obtaining the best certificates given a computational budget by finding the most suitable convex relaxation. Although our case study centered on lower-bounding the ground state energy of local Hamiltonians, our methodology can be directly applied to many other tasks. The only requirement is to adapt the black-box routine from Eq. (8) to the new task, and to appropriately map the constraint space to the new problem. Once this is done, given that the presented optimization framework is entirely agnostic to the actual problem, its application to other tasks is straightforward.
Convex sets arise naturally in quantum information in many flavors. An efficient way to characterize them is through linear witnesses. Among those, witnesses that can be easily measured are clearly preferred. This property means, in practice, that they consist of an O(poly(n)) number of terms. An important subclass is that in which these terms are local, i.e. acting on O(1) parties at most. In Appendix E we thoroughly discuss how to perform the SdP formalization of some relevant problems in quantum information. In Appendix E 1 we discuss an important class of entanglement witnesses, which are derived from local Hamiltonians; in Appendix E 2 we discuss how our approach can be used to optimize outer approximations to the set of quantum correlations; in Appendix E 3 we consider the more general problem of finding better sum-of-squares representations of multivariate polynomials; and in Appendix E 4 we discuss how our method can be applied to problems that are amenable to linear programming, such as finding outer approximations to projections of the set of correlations that satisfy the no-signalling principle.

VII. CONCLUSION AND OUTLOOK
In this work, we have introduced a novel approach to construct optimal relaxations to obtain certificates of quantum manybody properties, given a finite computational budget. Then, we have proposed a machine learning approach, based on deep reinforcement learning, to find such certificates. We have showcased its properties in the context of approximating the ground state energy of quantum local Hamiltonians.
With the proposed framework, the RL agent is able to find the certificate that maximizes the objective function with the lowest complexity and whose cost lies within the computational budget. We have studied the validity of the method in the well-known Heisenberg XX model, showing that the agent is able to correctly characterize the ground state across the phase diagram. Indeed, we have shown how the certificates found by the agent change accordingly to the changes in the ground state.
Already for small systems, the agent is able to capture the complexity of the system of study and go beyond more trivial and simpler solutions, even when these are close in terms of the objective function. We have also shown that the agent is able to solve the opposite case, in which simpler proofs provide better bounds than more complex ones. Besides, we have shown that the qualitative solutions obtained in reduced systems can be used in larger ones, as these remain consistent for any size. Hence, the constraint optimization can be performed in a reduced version of the original problem in order to minimize the computational workload.
Additionally, we have shown that the reinforcement learning approach handles large optimization spaces rather successfully, strongly outperforming other classical optimization algorithms. As a final result, we have shown how to leverage transfer learning, positively impacting scalability. Moreover, we have characterized its behaviour, finding that it may be indicative of changes in the nature of the ground state of the system of study, some of which are due to phase transitions. The structure of the constraints which suffice for a good approximation correlates with the system's phase and the entanglement properties of the ground state. Unravelling their precise relation is a matter deserving future investigation.
Finally, we have provided an analysis of some particular cases within the context of ground energy estimation, as well as the tools to generalize the framework to other common tasks such as entanglement witnessing or outer approximations to the quantum set of correlations, to name a few. The presented framework can be readily extended to other tasks in quantum information that are based on finding good outer approximations of convex sets that are hard to describe.
As future work, the question remains open of which properties of the Hamiltonian lead to better bounds with cheaper solutions. Furthermore, transfer learning can be used to analyze common patterns between different Hamiltonians. Besides, the architecture of the reinforcement learning agent can be adapted to allow for transfer learning between problems of different sizes. As an additional step, it would be interesting to study how introducing explicit information about the Hamiltonian may affect the optimization process, for instance, whether an RL agent can help in designing better adiabatic schedules [133] or whether better certificates can be built by combining RL with an adiabatic path.

VIII. CODE AVAILABILITY
The code for the method proposed in this work is accessible in Ref. [134] in the form of a Python library, with tools to reproduce the presented results and to use the method in various scenarios.

Appendix A: SdP-based certificates
In this section we discuss the details on how a proof is obtained via a semidefinite program (SdP). To this end, let us recall the primal form of a SdP in canonical form:

min_X Tr[C X]  subject to  Tr[A_i X] = b_i, i = 1, . . . , m,  X ⪰ 0.  (A1)

To each primal SdP one can associate a dual SdP, which is the following optimization problem:

max_y b^T y  subject to  Σ_i y_i A_i ⪯ C.  (A2)

Although the primal SdP Eq. (A1) and the dual SdP Eq. (A2) are different optimization problems, any two X and y that satisfy their respective problem constraints obey the weak duality relation

b^T y ≤ Tr[C X].  (A3)

In practice, this transformation is taken care of automatically by SdP parsers such as cvx [135,136] or yalmip [137], and an SdP solver (e.g. [91,138]) numerically finds the values of the optimal X and y of both problems. Furthermore, Eq. (A3) typically becomes tight at optimality of both Eq. (A1) and Eq. (A2) under reasonable regularity conditions, such as strict feasibility [56].

Let us now address what we mean by qualitative solutions and how these are generalized to different sizes. The optimal sets of constraints shown in Fig. 4 can be seen as patterns of reduced density matrices (RDMs) that span the system and can therefore be reproduced at any size. This way, sets of constraints made out of the same RDM pattern may be seen as the same qualitative solution. Let us describe these patterns and provide some examples:
(b): Span the system with 3-body RDMs, including an additional 2-body RDM that shifts one of the 3-body ones. This is the least straightforward pattern to generalize, given that it can either be interpreted as having only one extra 2-body RDM, or as including additional 2-body RDMs every few sites. We have found that, for the considered system sizes, the optimal set of constraints is found by including these 2-body RDMs every 6 sites, i.e. spanning the system by repetition of the 6-body pattern. In Fig. 8 we show the energy bounds obtained by all the sets of constraints that, at some point along the phase diagram, are optimal. Indeed, we find that the optimal sets of constraints at different system sizes represent the same qualitative solutions, and the regimes under which these are optimal are all the same.
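Returning to the SdP primal-dual pair above, weak duality can be verified on a toy problem (our own minimal example, with the identity as the single constraint matrix):

```python
import numpy as np

# Toy primal: min Tr[C X]  s.t.  Tr[X] = 1, X >= 0, with C = diag(1, 2).
# Toy dual:   max y        s.t.  y * I <= C, i.e. y <= 1.
C = np.diag([1.0, 2.0])

X = np.diag([0.5, 0.5])          # primal feasible: Tr[X] = 1, X >= 0
y = 0.5                          # dual feasible: 0.5 * I <= C

primal_value = np.trace(C @ X)   # 1.5
dual_value = y                   # 0.5: weak duality, dual <= primal
# At optimality (X = diag(1, 0), y = 1) the gap closes and both values equal 1.
```

Any dual-feasible value lower-bounds any primal-feasible one; this is precisely what makes the dual a certificate.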
This suggests that a reduced version of the original problem can be used in order to find the optimal set of constraints, significantly reducing the computational cost of the optimization.
Additionally, we see that the qualitative solution (c), in some regions of the phase space, provides the same energy bound as (b) and, even more, yields a better bound than (a), while being a much simpler certificate that involves 20% fewer SdP variables than (a) and (b). Overall, the relative behaviour of each set of constraints is the same across system sizes, and they all converge to the same value at B/J = 2, where the phase transition happens.

Benchmarking optimization methods
Here we describe the methods used to benchmark our results: breadth-first search (BFS) and Monte Carlo (MC) optimization. BFS does not have any hyper-parameter. Starting from the initial state, it builds a queue of states to visit by recursively expanding each state. Expanding a state consists of appending, at the end of the queue, all the possible states that can be reached from it through valid actions. We have taken randomized orders in the state expansion and we do not consider states that have already been visited or that are already in the queue.
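A minimal sketch of this BFS over an abstract state space (the `neighbors` callback, standing in for "all states reachable through valid actions", is hypothetical):

```python
from collections import deque

def bfs(start, neighbors, is_goal):
    """Plain BFS; `neighbors(s)` lists the states reachable from s via valid actions."""
    queue = deque([start])
    seen = {start}                       # skip states visited or already queued
    while queue:
        state = queue.popleft()
        if is_goal(state):
            return state
        for nxt in neighbors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return None
```

Shuffling each `neighbors(state)` list before appending would reproduce the randomized expansion order mentioned above.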
The MC optimization has only one hyper-parameter, namely the effective temperature T. The algorithm consists of proposing random valid actions to go from one state to another. Then, the movement is accepted or rejected depending on the rewards associated with the old and new states, R_old and R_new, with acceptance probability p(R_new, R_old) = min{1, e^{(R_new − R_old)/T}}. We tune the effective temperature to obtain a 50% acceptance ratio in a long, well-converged optimization. The results from Fig. 5a are obtained with T = 0.084 and the results from Fig. 5b with T = 0.097.
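A single Metropolis step with the acceptance rule above can be sketched as follows (a generic implementation, not the code of Ref. [134]):

```python
import math
import random

def mc_step(state, reward, propose, T, rng=random.Random(0)):
    """One Metropolis move: accept with probability min{1, exp((R_new - R_old)/T)}."""
    new = propose(state, rng)
    dR = reward(new) - reward(state)
    if dR >= 0 or rng.random() < math.exp(dR / T):
        return new
    return state
```

Moves that increase the reward are always accepted; downhill moves are accepted with a probability that shrinks as T decreases, which is why the choice of T matters so much for this heuristic.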

Appendix E: Generalizations
In Section VI, we briefly mention the generalization of the presented method to other common tasks in the field of quantum information processing, beyond the running example of lower bounding the ground state energy of local Hamiltonians. We present here a non-exhaustive set of examples explicitly showing how to implement the black-box routine from Eq. (8) for such tasks. This allows the straightforward application of the RL framework, given that it is entirely agnostic to the actual problem.

Entanglement witnesses from local Hamiltonians
The most straightforward adaptation of our case study relates to entanglement detection [139], in particular, to finding relaxations to the separability bound of entanglement witnesses constructed from local Hamiltonians [140,141]. In this case, we ask that the global quantum state ρ be fully separable. Note that the global optimization task is to minimize ⟨H⟩ over the set of separable states, which form a convex set. The RDMs of a fully separable ρ are also separable, but the separability condition is hard to impose in a SdP. Actually, deciding membership in the set of separable quantum states has been shown to be NP-hard [28], even in notably simpler instances [31,142,143]. However, the set of separable states is contained in the set of states that fulfill the PPT condition, which is easy to characterize via a SdP. Hence, Eq. (8) can be straightforwardly modified to yield a lower bound on the separable bound of a local Hamiltonian H, when H is seen as an entanglement witness, by supplementing its constraints with the PPT conditions:

β_C^{full-sep} = min Σ_i Tr[ρ_{S_i} H_i]  subject to the constraints of Eq. (8) and ρ_S^{Γ_A} ⪰ 0, ∀ A ⊆ S, ∀ S ∈ C,  (E1)

where the superscript Γ_A indicates that the partial transposition (1_{A^c} ⊗ T_A) has been applied to the elements of S = A ∪ A^c. Note that this is a linear operation, since it simply permutes elements of ρ_S. Hence, any quantum state satisfying Tr[ρH] < β_C^{full-sep} contains some entanglement.
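The partial transposition is indeed a simple permutation of the entries of ρ_S; a minimal two-qubit sketch (our own example) shows it detecting the entanglement of a maximally entangled state via the Peres-Horodecki criterion:

```python
import numpy as np

def partial_transpose(rho, d=2):
    """Transpose the second subsystem of a d x d bipartite state: a permutation
    of the entries rho[(i,k),(j,l)] -> rho[(i,l),(j,k)]."""
    r = rho.reshape(d, d, d, d)
    return r.transpose(0, 3, 2, 1).reshape(d * d, d * d)

# Maximally entangled two-qubit state (|00> + |11>)/sqrt(2)
phi = np.zeros(4)
phi[0] = phi[3] = 1 / np.sqrt(2)
rho = np.outer(phi, phi)

# Peres-Horodecki: a negative eigenvalue of the partial transpose certifies entanglement.
min_eig = np.linalg.eigvalsh(partial_transpose(rho))[0]   # -0.5 < 0
```

Since the operation is linear in ρ, imposing ρ^{Γ_A} ⪰ 0 keeps the relaxation a valid SdP.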
The optimization in Eq. (E1) can be tightened in several directions. First, one can consider symmetric extensions [30] in order to improve the approximation of the PPT set to the separable set, at the cost of increasing the computational demands of Eq. (E1). In some cases, one can furthermore demand that the bound detects a higher degree of entanglement, yielding a k-producibility bound. In [50,51], device-independent witnesses of entanglement depth have been proposed, and their lower bounds are found via a SdP that encodes a relaxation of the quantum marginal problem. In this case, the relaxation can be tightened by imposing compatibility with larger quantum states, as long as these remain within the computational budget.

Outer approximations to the set of quantum correlations
The set of correlations that can be produced by quantum mechanics is also a convex set [144]. A whole research program aiming at its characterization has obtained several operationally-motivated descriptions of it [36-43]. Systematic methods also yield relaxations, which can be made arbitrarily accurate at a higher computational cost [44-46]. In this case, we note that we also have a poset structure that can be exploited to build a similar constraint space.
Let us recall that the so-called Navascués-Pironio-Acín (NPA) hierarchy [44] chooses a set of operators S = {1, A_0, B_0, . . .} from which it builds a moment matrix Γ with entries Γ_{uv} = ⟨u^† v⟩ for u, v ∈ S. Non-trivial relationships among the entries of Γ are imposed by the algebra generated by the elements of S: commutation relations or identities such as A_i^† A_i = 1 impose linear constraints among the entries of Γ. Then, given a Bell inequality that can be formally represented as I = Tr[CΓ], one can find a lower bound to its value over the quantum set by solving

β_Q^S = min_Γ Tr[CΓ]  subject to  Γ ⪰ 0,  Γ_{1,1} = 1,  Tr[C_i Γ] = 0 ∀ i,  (E2)

where C and C_i are real matrices, thus obtaining a quantum Bell inequality of the form I ≥ β_Q^S. Note that C picks the coefficients of I and the C_i enforce the conditions arising from the operator algebra. For instance, if the Bell scenario is such that the outcomes of the measurements are ±1, then A_k^2 − 1 = 0. Then C_i picks the entries A_k^2 and 1 in Γ with the appropriate coefficients, imposing the equality constraint. Similarly, commutation relations such as [A_k, B_l] = 0 are enforced in the same way.
In this case, the poset structure lies in the definition of the set of operators S. The partial order corresponds to the inclusion relation ⊆ between two different sets of operators, and the agent can perform actions in a similar way, by adding and removing operators.
In analogy to Section V, some witnesses for the quantum set admit a proof for a very low operator degree in S. The paradigmatic example is the CHSH Bell inequality [145], which can be shown to be bounded by 2√2 through the sum-of-squares (sos) identity

2√2 · 1 − B_CHSH = (1/√2) [ (A_0 − (B_0 + B_1)/√2)^2 + (A_1 − (B_0 − B_1)/√2)^2 ],  (E3)

where B_CHSH = A_0 B_0 + A_0 B_1 + A_1 B_0 − A_1 B_1. On the other hand, inequalities such as the so-called I_3322 inequality [146] do not seem to admit a tight proof for their quantum bound, even in the case that S contains operators up to degree 5 [147,148]. Having simple certificates such as those of the form of (E3) turns out to be extremely convenient for proofs in device-independent quantum information processing protocols, such as self-testing: when β_Q^S is tight, it means that the quantum state and measurements yield exactly zero expectation value on all the sos terms (cf. (E3)). These equations then impose conditions that allow one to characterize, to some extent, the states and/or measurements performed, solely from their statistics [47,60,62,149-153].
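Both the 2√2 bound and its sum-of-squares certificate can be checked numerically with the standard optimal qubit measurements (our own sketch, assuming A_0 = σ_z, A_1 = σ_x, B_{0,1} = (σ_z ± σ_x)/√2):

```python
import numpy as np

X = np.array([[0.0, 1.0], [1.0, 0.0]])   # sigma_x
Z = np.diag([1.0, -1.0])                 # sigma_z
I2 = np.eye(2)

A0, A1 = Z, X
B0 = (Z + X) / np.sqrt(2)
B1 = (Z - X) / np.sqrt(2)

bell = (np.kron(A0, B0) + np.kron(A0, B1)
        + np.kron(A1, B0) - np.kron(A1, B1))

max_val = np.linalg.eigvalsh(bell)[-1]     # 2*sqrt(2), Tsirelson's bound

# Sum-of-squares certificate: 2*sqrt(2)*I - bell is a positive combination of squares.
P0 = np.kron(A0, I2) - np.kron(I2, (B0 + B1) / np.sqrt(2))
P1 = np.kron(A1, I2) - np.kron(I2, (B0 - B1) / np.sqrt(2))
sos = (P0 @ P0 + P1 @ P1) / np.sqrt(2)     # equals 2*sqrt(2)*I - bell
```

A state saturating the bound must annihilate both P_0 and P_1, which is the mechanism behind the self-testing statements mentioned above.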

Improving sum-of-squares representations of non-negative polynomials
Semidefinite programming is essentially equivalent to finding sum-of-squares (sos) decompositions [56]. The latter arise naturally when trying to answer the following question: given a real polynomial in d variables, does it take non-negative values at all points of R^d? This is precisely the subject of Hilbert's 17th problem [154]. On the one hand, it is a trivial observation that every polynomial that admits a sum-of-squares representation is non-negative by construction, and such a representation can be efficiently found via a SdP. Unfortunately, not every non-negative polynomial admits a sum-of-squares decomposition in terms of polynomials [155]. In fact, although Hilbert's problem was solved by Artin in 1927 [156], who showed that every non-negative polynomial admits a sum-of-squares representation in terms of rational functions, deciding non-negativity remains NP-hard. By controlling the degree of the denominator in the rational-function sos, one also obtains a hierarchy.
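The classic counterexample is Motzkin's polynomial, which is non-negative everywhere (by the AM-GM inequality) yet admits no sos decomposition in terms of polynomials; a coarse numerical check of its non-negativity (our own sketch):

```python
import numpy as np

def motzkin(x, y):
    """Motzkin's polynomial: non-negative on R^2, yet not a sum of squares of polynomials."""
    return x**4 * y**2 + x**2 * y**4 - 3 * x**2 * y**2 + 1

# Non-negativity follows from AM-GM on (x^4 y^2, x^2 y^4, 1); a coarse grid check:
grid = np.linspace(-3.0, 3.0, 121)
vals = np.array([[motzkin(x, y) for y in grid] for x in grid])
minimum = vals.min()          # ~0 up to float error, attained at |x| = |y| = 1
```

Multiplying by (x^2 + y^2) does yield a polynomial sos, illustrating Artin's rational-function resolution.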
Interestingly, non-negative polynomials also appear naturally in physics and optimization. For instance, imagine one wants to find the minimal energy of a classical local Hamiltonian. This task appears naturally in the verification of quantum optimizers [59] or in the context of finding the classical bound of a Bell inequality with few-body correlators [106]. In these cases, there are some geometric properties imprinted in the cost function which one would like to persist in the sos decomposition. However, sos decompositions are not unique in general. When the underlying graph connecting variables that interact directly is chordal, the sparsity in the objective function percolates to a sparse sos decomposition [157-160]. However, when the underlying graph has a complicated chordal extension, it is significantly harder to obtain good sos decompositions, since there is no systematic method in this case, making the situation amenable to an RL agent. It would be interesting to see to what extent an RL agent recovers a perfect elimination ordering stemming from a chordal graph, and whether it can find effective strategies when the graph is approximately chordal.

Optimization of nonlocality depth witnesses from few-body Bell inequalities
Here we consider the following multipartite Bell scenario: n parties, labelled by [n], are space-like separated and each of them can perform m measurements, each yielding d possible outcomes. At the end of the experiment, the parties have collected enough statistics to estimate the conditional probability distribution p(a|x), where x is an n-dimensional vector denoting a collective choice of measurements and a is an n-dimensional vector labelling the corresponding outcomes. Studying Bell nonlocality in such a multipartite scenario easily turns into a highly complex task, even from the point of view of designing or finding relevant Bell inequalities [161-165]. Furthermore, in the multipartite scenario, analogously to entanglement [50,51,139,141,166], Bell nonlocality can come in many flavors, from fully local models to bilocal models that are only falsified by genuinely nonlocal correlations. In addition, the multipartite Bell scenario poses the extra challenge of a consistent time ordering in defining a partially local model, which could otherwise be self-contradicting [167,168]. This caveat can be avoided by defining a so-called k-local model, which is a mixture of models of the form

p(a|x, λ) = Π_{i=1}^{L} p(a_{S_i} | x_{S_i}, λ),  (E4)

where {S_i}_{i=1}^{L} form a partition of [n] with |S_i| ≤ k, the so-called response functions p(a_{S_i}|x_{S_i}, λ) satisfy the no-signalling principle, and a_S, x_S indicate that we select from a or x, respectively, only those components whose index belongs to S ⊆ [n]. By mixing models of the form Eq. (E4), one constructs k-local models, in this case, under no-signalling constraints.
In [169,170] a way to optimize Bell inequalities for k-nonlocality depth was proposed for large system sizes, leveraging two features that simplify the problem: the Bell inequalities are designed to be (i) permutationally invariant and (ii) composed of two-body correlators only. As can be inferred from Eq. (E4), in order to construct these inequalities, or to find the k-local bound of a given one, one needs a characterization, in terms of extremal points, of the projected no-signalling polytope for $|S_i|$ parties in the relevant Bell scenario. This way, one can construct the k-local polytope [163,169,171]. Unfortunately, the polytope of no-signalling correlations admits an easy description only in terms of inequalities; in terms of its vertices, the so-called PR-boxes [36], it has been shown that finding all PR-boxes is equivalent to finding all Bell inequalities [172]. Therefore, such a daunting task [173,174] could benefit from a relaxation approach, which we now describe: by allowing for a simpler characterization of the projected no-signalling polytope, one can hope to construct k-local models with fewer extremal points to consider.
Let us recall that any mathematically sound conditional probability distribution p(a|x) satisfies $\sum_a p(a|x) = 1$ for all x and $p(a|x) \geq 0$ for every a, x. The no-signalling principle requires, in addition, that
$p(a_S|x_S \cup x_{S^c}) = p(a_S|x_S \cup x'_{S^c}), \quad \forall a_S, x_S, x_{S^c}, x'_{S^c}, \; S \subseteq [n]$, (E5)
where we have split x into the components labelled by S, denoted $x_S$, and those in its complementary set, $x_{S^c}$, and the marginal probability distribution is defined as $p(a_S|x) = \sum_{a_i : i \in S^c} p(a|x)$.
Note that, operationally, the NS principle imposes that the marginal probability distribution observed by a subset S of parties does not depend on the inputs $x_{S^c}$ received by the rest of the parties during the experiment. Hence, the remaining parties cannot signal information to the parties in S by choosing a particular set of inputs. Furthermore, the no-signalling principle tells us that the quantities $p(a_S|x_S)$ are well defined.
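As a concrete sanity check of the constraint in Eq. (E5), one can verify numerically that the bipartite PR-box satisfies no-signalling. The following is a minimal sketch of our own (binary inputs and outputs), not code from the works cited above:

```python
import itertools

def pr_box(a, b, x, y):
    """Bipartite PR-box: p(a,b|x,y) = 1/2 if a XOR b = x AND y, else 0."""
    return 0.5 if (a ^ b) == (x & y) else 0.0

# Normalization: sum_{a,b} p(a,b|x,y) = 1 for every input pair (x,y)
for x, y in itertools.product((0, 1), repeat=2):
    total = sum(pr_box(a, b, x, y) for a in (0, 1) for b in (0, 1))
    assert abs(total - 1.0) < 1e-12

def marginal_alice(a, x, y):
    """Alice's marginal p(a|x), computed by summing out Bob's outcome
    for a fixed choice of Bob's input y."""
    return sum(pr_box(a, b, x, y) for b in (0, 1))

# No-signalling (Eq. (E5) for S = {Alice}): the marginal is independent
# of Bob's input y, so p(a|x) is well defined.
for a, x in itertools.product((0, 1), repeat=2):
    assert marginal_alice(a, x, 0) == marginal_alice(a, x, 1)
```

Here Alice's marginal equals 1/2 for every (a, x), independently of y, which is exactly the statement that $p(a_S|x_S)$ is well defined for this vertex of the no-signalling polytope.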
We can now build the relaxation as follows: the projected no-signalling polytope, in terms of two-body correlations, is given in terms of the marginals $p(a_S|x_S)$ with |S| = 2 (particular linear combinations of them can be taken to build symmetric correlators). Each of the $p(a_S|x_S)$ stems from a common p(a|x), but at the first relaxation level this assumption can be dropped. The relaxation hierarchy is then built by imposing compatibility at larger and larger levels: for instance, given $S, W \subseteq [n]$ with |S| = |W| = 2 and $|S \cap W| = 1$, we can impose that there exists a no-signalling three-partite distribution $p(a_{S \cup W}|x_{S \cup W})$ with the appropriate marginals. From the set-inclusion relation one recovers the same poset structure in the constraint space. Since the problem is now linear, one can build outer approximations to the projected no-signalling polytope by means of a linear programming black box or, equivalently, a diagonal SdP.
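A toy illustration of the first-level compatibility constraints (all distributions and names below are our own, chosen for illustration): two two-body marginals on the overlapping pairs (1,2) and (2,3) must induce the same one-body marginal on party 2, which is a necessary linear condition for them to stem from a common global p(a|x).

```python
# Toy single-party response function (binary input/output), used to build
# two-body marginals that are compatible by construction.
def local(a, x):
    return 0.75 if a == x else 0.25

def p12(a1, a2, x1, x2):   # two-body marginal on parties (1, 2)
    return local(a1, x1) * local(a2, x2)

def p23(a2, a3, x2, x3):   # two-body marginal on parties (2, 3)
    return local(a2, x2) * local(a3, x3)

def marg2_from_p12(a2, x2, x1=0):
    # by no-signalling, the choice of x1 is irrelevant
    return sum(p12(a1, a2, x1, x2) for a1 in (0, 1))

def marg2_from_p23(a2, x2, x3=0):
    return sum(p23(a2, a3, x2, x3) for a3 in (0, 1))

# Overlap compatibility on party 2: one of the linear constraints defining
# the first level of the relaxation hierarchy.
for a2 in (0, 1):
    for x2 in (0, 1):
        assert abs(marg2_from_p12(a2, x2) - marg2_from_p23(a2, x2)) < 1e-12
```

In the actual relaxation these equalities, together with normalization and positivity, become the constraint rows of the linear program; higher levels add analogous constraints tying overlapping marginals to larger no-signalling distributions.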