Quantum-Inspired Hierarchy for Rank-Constrained Optimization

Many problems in information theory can be reduced to optimizations over matrices, where the rank of the matrices is constrained. We establish a link between rank-constrained optimization and the theory of quantum entanglement. More precisely, we prove that a large class of rank-constrained semidefinite programs can be written as a convex optimization over separable quantum states and, consequently, we construct a complete hierarchy of semidefinite programs for solving the original problem. This hierarchy not only provides a sequence of certified bounds for the rank-constrained optimization problem, but also gives pretty good and often exact values in practice when the lowest level of the hierarchy is considered. We demonstrate that our approach can be used for relevant problems in quantum information processing, such as the optimization over pure states, the characterization of mixed unitary channels and faithful entanglement, and quantum contextuality, as well as in classical information theory including the maximum cut problem, pseudo-Boolean optimization, and the orthonormal representation of graphs. Finally, we show that our ideas can be extended to rank-constrained quadratic and higher-order programming.


I. INTRODUCTION
The mathematical theory of optimization has become a vital tool in various branches of science. This is not only due to the fact that some central problems (e.g., finding the ground state energy of a given Hamiltonian in condensed matter physics) are by definition optimization problems, where mathematical methods can be directly applied. It also turned out that other physical problems, which are not directly optimizations, can be reformulated as optimization tasks.
In physics, many efforts have been devoted to so-called semidefinite programs (SDPs), which are a class of highly tractable convex optimization problems. In quantum information theory, they have been used to characterize quantum entanglement [1] and quantum correlations [2]. In condensed matter physics, SDPs are relevant for solving ground-state problems [3]. In conformal field theory, they have been employed for bootstrap problems [4]. In fact, SDPs have also found widespread applications in more general topics beyond physics; examples include the Shannon capacity of graphs [5] and global polynomial optimization [6,7].
In many cases, however, one cannot directly formulate an SDP, as some non-convex constraints remain. Well-known examples are the characterization of quantum correlations for a fixed dimension [8,9], the determination of the faithfulness of quantum entanglement [10], the ground state energy in spin glasses [11], and compressed sensing tomography [12]. Interestingly, these non-convex optimization problems share a common structure: They can be formulated as SDPs with an extra rank constraint. Apart from these physics examples, rank-constrained optimizations are also widely used in signal processing, model reduction, and system identification [13]. All these applications demonstrate that to achieve significant progress, it would be highly desirable to develop techniques to deal with rank constraints in SDPs.
In this paper, we provide a method to deal with rank constraints based on the theory of quantum entanglement. More precisely, we prove that a large class of rank-constrained SDPs can be written as a convex optimization over separable two-party quantum states. Based on this, a complete hierarchy of SDPs can be constructed. In this way, we demonstrate that quantum information theory does not only benefit from ideas of optimization theory, but the results obtained in this field can also be used to study mathematical problems (like the Max-Cut problem) from a fresh perspective. Notably, unlike the widely-used local optimization methods [14,15], our method gives global bounds for the rank-constrained optimization. This makes our method particularly useful for certification problems in quantum information, where global bounds are usually necessary to establish conclusions with certainty.
In order to demonstrate the usefulness of our method, we first show that the optimization over pure quantum states or unitary matrices in quantum information can be naturally written as a rank-constrained optimization. This provides a complete characterization of faithful entanglement [10,16] and of mixed unitary channels [17,18]. The second example concerns the dimension-bounded orthonormal representations of graphs [19], which is closely related to the existence of quantum contextuality in a given measurement configuration [20,21]. Finally, we consider the maximum cut (Max-Cut) problem [22] and quadratic optimization over Boolean vectors [23]. These problems are not only very important in classical information theory, but also frequently encountered in statistical physics [24] and complex networks [25]. Remarkably, solving these optimization problems with noisy intermediate-scale quantum computers has drawn a lot of research interest in recent years [26][27][28][29]. Consequently, our methods may be used to compare the operational performance of intermediate-scale quantum devices with different classical algorithms.
arXiv:2012.00554v2 [quant-ph] 14 Mar 2022
Our paper is organized as follows. In Sec. II we explain the core idea of our method, first for matrices with complex entries, then for real matrices. We also discuss how symmetries can be used to simplify the resulting sequence of SDPs. In Sec. III we present several examples where our methods can be applied. In Sec. IV we discuss the more general form of rank-constrained SDPs, as well as rank-constrained quadratic and higher-order optimization problems. Finally, we conclude and discuss open problems.

II. RANK-CONSTRAINED SDP AND QUANTUM ENTANGLEMENT
SDPs are widely used in various branches of science, especially in the quantum regime. One of the reasons is that density matrices are automatically positive semidefinite, so that related optimization problems naturally contain some semidefinite constraints. Another important reason that SDPs have drawn a lot of interest is that there are efficient algorithms for solving them [30]; moreover, symmetries can be used to drastically simplify the SDPs [31][32][33]. In many cases, however, one cannot directly formulate an SDP, as some non-convex constraints remain. This happens, for example, when the underlying quantum states are required to be pure or the quantum system is of bounded dimension. These restrictions introduce extra rank constraints, which are the main focus of this paper.
The prototype optimization problem we are considering is given by
$$\max_{\rho}\ \mathrm{Tr}(X\rho) \quad \text{s.t.} \quad \Lambda(\rho) = Y,\ \ \mathrm{Tr}(\rho) = 1,\ \ \rho \ge 0,\ \ \mathrm{rank}(\rho) \le k. \tag{1}$$
Here, $\rho$ and $X$ are $n \times n$ matrices with real ($\mathbb{F} = \mathbb{R}$) or complex ($\mathbb{F} = \mathbb{C}$) entries, which are symmetric (respectively Hermitian). $\Lambda$ is a map from matrices in $\mathbb{F}^{n\times n}$ to matrices in $\mathbb{F}^{m\times m}$ and consequently $Y \in \mathbb{F}^{m\times m}$. In this way, the constraint $\Lambda(\rho) = Y$ denotes all affine equality constraints. While our main results are formulated for the rank-constrained SDP in Eq. (1), we stress that our method can also be extended to more general cases with (semidefinite) inequality constraints $\Lambda(\rho) \le Y$, without the normalization condition $\mathrm{Tr}(\rho) = 1$, or even without the positivity constraint $\rho \ge 0$.

A. Optimization over complex matrices
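We start with $\mathbb{F} = \mathbb{C}$ for the optimization in Eq. (1), where we can easily apply the results from quantum information. Let $\mathcal{F}$ be the feasible region of optimization (1), i.e.,
$$\mathcal{F} = \{\rho \mid \Lambda(\rho) = Y,\ \mathrm{Tr}(\rho) = 1,\ \rho \ge 0,\ \mathrm{rank}(\rho) \le k\}. \tag{2}$$
With the terminology in quantum information, $\mathcal{F}$ is a subset of quantum states in the quantum system (or Hilbert space) $\mathbb{C}^n$ [34]. Now, we recall the notion of state purification in quantum information [34]. Let $\mathcal{H}_1 = \mathbb{C}^n$ and $\mathcal{H}_2 = \mathbb{C}^k$ be two quantum systems (Hilbert spaces). Then, a quantum state $\rho$ in $\mathcal{H}_1$ satisfies $\mathrm{rank}(\rho) \le k$ if and only if there exists a pure state $|\varphi\rangle \in \mathcal{H}_1 \otimes \mathcal{H}_2$ such that $\mathrm{Tr}_2(|\varphi\rangle\langle\varphi|) = \rho$, where $\mathrm{Tr}_2(\cdot)$ is the partial trace operation on quantum system $\mathcal{H}_2$. Thus, $\mathcal{F}$ can be written as
$$\mathcal{F} = \mathrm{Tr}_2(\mathcal{P}), \qquad \mathcal{P} = \{|\varphi\rangle\langle\varphi| \mid \tilde{\Lambda}(|\varphi\rangle\langle\varphi|) = Y,\ \langle\varphi|\varphi\rangle = 1\}, \tag{3}$$
with $\tilde{\Lambda}(\cdot) = \Lambda[\mathrm{Tr}_2(\cdot)]$. Let $\mathrm{conv}(\mathcal{P})$ be the convex hull of $\mathcal{P}$, i.e., all states of the form $\sum_i p_i |\varphi_i\rangle\langle\varphi_i|$, where the $p_i$ form a probability distribution and $|\varphi_i\rangle\langle\varphi_i| \in \mathcal{P}$. By noting that the maximum value of a linear function can always be achieved at extreme points, the optimization in Eq. (1) is equivalent to
$$\max_{\Phi \in \mathrm{conv}(\mathcal{P})} \mathrm{Tr}(\tilde{X}\Phi), \tag{4}$$
where $\tilde{X} = X \otimes \mathbb{1}_k$ with $\mathbb{1}_k$ being the identity operator on $\mathcal{H}_2$. Equation (4) implies that if we can fully characterize $\mathrm{conv}(\mathcal{P})$, optimization (1) is solved. To this end, we need to introduce the notion of separable states that has been widely studied in quantum information [35,36]. More specifically, we let $\mathcal{H}_A = \mathcal{H}_B = \mathcal{H}_1 \otimes \mathcal{H}_2 = \mathbb{C}^n \otimes \mathbb{C}^k$ and define the separability cone SEP on $\mathcal{H}_A \otimes \mathcal{H}_B$ as
$$\mathrm{SEP} = \mathrm{conv}\{M_A \otimes N_B \mid M_A \ge 0,\ N_B \ge 0\}. \tag{5}$$
Physically, SEP is the set of all unnormalized separable quantum states (besides the zero matrix). SEP is a proper convex cone, and its dual cone is given by
$$\mathrm{SEP}^* = \{W_{AB} \mid \mathrm{Tr}(W_{AB}\Phi_{AB}) \ge 0\ \ \forall\, \Phi_{AB} \in \mathrm{SEP}\}, \tag{6}$$
which, in the language of quantum information, corresponds to the set of entanglement witnesses and all positive semidefinite matrices. Then, we consider the two-party extension of the purified feasible states,
$$\mathcal{S}_2 = \mathrm{conv}\{|\varphi\rangle\langle\varphi|_A \otimes |\varphi\rangle\langle\varphi|_B \mid |\varphi\rangle\langle\varphi| \in \mathcal{P}\}, \tag{7}$$
where $|\varphi\rangle_A$ and $|\varphi\rangle_B$ are the same state but belong to $\mathcal{H}_A$ and $\mathcal{H}_B$, respectively; see Fig. 1. One can easily check that
$$\mathrm{Tr}_B(\mathcal{S}_2) = \mathrm{conv}(\mathcal{P}). \tag{8}$$
The benefit of introducing the two-party extension is that we can fully characterize $\mathcal{S}_2$ with the separability cone SEP, and hence $\mathrm{conv}(\mathcal{P})$ is also fully characterized.
FIG. 1. An illustration of the relations between the feasible region $\mathcal{F}$, the purification $\mathcal{P}$, and the two-party extension $\mathcal{S}_2$. $|\varphi\rangle$ is a purification of $\rho$, $\mathcal{H}_A = \mathcal{H}_B = \mathcal{H}_1 \otimes \mathcal{H}_2$, and $|\varphi_i\rangle$ are states in $\mathcal{P}$.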
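The purification step can be checked numerically. The following is a minimal NumPy sketch, not taken from the paper: the helper names `purify` and `partial_trace_2` are hypothetical, and the ordering of the tensor factors ($\mathcal{H}_1$ first, $\mathcal{H}_2$ second) is an assumption. It builds a purification in $\mathbb{C}^n \otimes \mathbb{C}^k$ from the spectral decomposition of $\rho$ and fails when $\mathrm{rank}(\rho) > k$.

```python
import numpy as np

def purify(rho, k, tol=1e-10):
    """Return |phi> in C^n (x) C^k with Tr_2 |phi><phi| = rho (needs rank(rho) <= k <= n)."""
    n = rho.shape[0]
    assert k <= n
    vals, vecs = np.linalg.eigh(rho)
    order = np.argsort(vals)[::-1]            # eigenvalues in decreasing order
    vals, vecs = vals[order], vecs[:, order]
    if np.any(vals[k:] > tol):
        raise ValueError("rank(rho) exceeds k, no purification in C^n (x) C^k")
    # Amplitudes C[a, alpha] of |phi> = sum_alpha sqrt(lambda_alpha) |v_alpha>|alpha>
    C = vecs[:, :k] * np.sqrt(np.clip(vals[:k], 0.0, None))
    return C.reshape(n * k)

def partial_trace_2(phi, n, k):
    """Reduced state on the first factor of a pure state in C^n (x) C^k."""
    C = phi.reshape(n, k)
    return C @ C.conj().T

# Hypothetical test data: a random rank-2 state on C^4
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 2)) + 1j * rng.normal(size=(4, 2))
rho = A @ A.conj().T
rho /= np.trace(rho).real

phi = purify(rho, k=2)
print(np.allclose(partial_trace_2(phi, 4, 2), rho))
```

Asking for `purify(rho, k=1)` on the same state raises an error, which mirrors the "only if" direction of the statement.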
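The first necessary condition for $\Phi_{AB} \in \mathcal{S}_2$ is that it is separable with respect to the bipartition $(A|B)$, i.e.,
$$\Phi_{AB} \in \mathrm{SEP}. \tag{9}$$
Second, $\Phi_{AB} \in \mathcal{S}_2$ implies that it is within the symmetric subspace of $\mathcal{H}_A \otimes \mathcal{H}_B$. Mathematically, this can be written as
$$V_{AB}\Phi_{AB} = \Phi_{AB}, \tag{10}$$
where $V_{AB}$ is the swap operator between $\mathcal{H}_A$ and $\mathcal{H}_B$, i.e., $V_{AB}|\psi_1\rangle_A|\psi_2\rangle_B = |\psi_2\rangle_A|\psi_1\rangle_B$ for any $|\psi_1\rangle, |\psi_2\rangle \in \mathbb{C}^n \otimes \mathbb{C}^k$. The final necessary condition arises from Eq. (3), i.e., $\tilde{\Lambda}(|\varphi\rangle\langle\varphi|) = Y$ for $|\varphi\rangle\langle\varphi| \in \mathcal{P}$. Then, Eq. (7) implies that
$$\tilde{\Lambda}_A \otimes \mathrm{id}_B(\Phi_{AB}) = Y \otimes \mathrm{Tr}_A(\Phi_{AB}) \tag{11}$$
for all $\Phi_{AB} \in \mathcal{S}_2$, where $\tilde{\Lambda}_A(\cdot)$ is the map $\tilde{\Lambda}(\cdot) = \Lambda[\mathrm{Tr}_2(\cdot)]$ acting on system $\mathcal{H}_A$ only, and $\mathrm{id}_B$ is the identity map on $\mathcal{H}_B$. Hereafter, we also use a similar convention for matrices, e.g., $\tilde{X}_A$ denotes the matrix $\tilde{X}$ on system $\mathcal{H}_A$. Surprisingly, the conditions in Eqs. (9,10,11) are also sufficient for $\Phi_{AB} \in \mathcal{S}_2$. The basic idea for the proof is that Eqs. (9,10) imply that $\Phi_{AB}$ is a separable state in the symmetric subspace, such that it always admits the form [37]
$$\Phi_{AB} = \sum_i p_i\, |\varphi_i\rangle\langle\varphi_i|_A \otimes |\varphi_i\rangle\langle\varphi_i|_B. \tag{12}$$
Then, Eq. (11) implies that $\tilde{\Lambda}(|\varphi_i\rangle\langle\varphi_i|) = Y$ for all $i$; see Appendix A for the proof. With the full characterization of $\mathcal{S}_2$ from Eqs. (9,10,11), we can directly rewrite the rank-constrained SDP in Eq. (1). The result is a so-called conic program, as one constraint is defined by the cone of separable states.
Theorem 1. For $\mathbb{F} = \mathbb{C}$, the rank-constrained SDP in Eq. (1) is equivalent to the conic program
$$\begin{aligned} \max_{\Phi_{AB}}\ \ & \mathrm{Tr}(\tilde{X}_A \otimes \mathbb{1}_B\, \Phi_{AB}) \\ \text{s.t.}\ \ & \Phi_{AB} \in \mathrm{SEP},\ \ \mathrm{Tr}(\Phi_{AB}) = 1,\ \ V_{AB}\Phi_{AB} = \Phi_{AB}, \\ & \tilde{\Lambda}_A \otimes \mathrm{id}_B(\Phi_{AB}) = Y \otimes \mathrm{Tr}_A(\Phi_{AB}). \end{aligned} \tag{13}$$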
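As an illustration of the two linear conditions in this paragraph (swap invariance and the lifted affine constraint), here is a small NumPy sketch. It is not the authors' code: the choice of $\Lambda$ as the completely dephasing map on $\mathbb{C}^n$ is an assumption made only so that a concrete $Y$ exists, and `lambda_tilde_on_A` is a hypothetical helper.

```python
import numpy as np

n, k = 3, 2
d = n * k                                   # dimension of H_A = H_B = C^n (x) C^k
rng = np.random.default_rng(0)

def random_pure(rng, dim):
    v = rng.normal(size=dim) + 1j * rng.normal(size=dim)
    v /= np.linalg.norm(v)
    return np.outer(v, v.conj())

proj = random_pure(rng, d)                  # |phi><phi|
other = random_pure(rng, d)                 # a different pure state |psi><psi|
Phi_AB = np.kron(proj, proj)                # two-party extension |phi><phi| (x) |phi><phi|
Phi_bad = np.kron(proj, other)              # product state that is not swap invariant

# Swap operator V_AB on C^d (x) C^d
V = np.zeros((d * d, d * d))
for i in range(d):
    for j in range(d):
        V[i * d + j, j * d + i] = 1.0

def lambda_tilde_on_A(Phi):
    """Apply (Lambda~_A (x) id_B) with the ASSUMED choice Lambda = dephasing on C^n."""
    T = Phi.reshape(n, k, d, n, k, d)       # indices (a1, a2, b, a1', a2', b')
    T = np.einsum('xsbysc->xbyc', T)        # partial trace over the auxiliary system A_2
    T = T * np.eye(n)[:, None, :, None]     # dephasing on A_1: keep a1 == a1'
    return T.reshape(n * d, n * d)

# Y = Lambda~(|phi><phi|): the diagonal of the reduced state on C^n
rho = np.einsum('xsys->xy', proj.reshape(n, k, n, k))
Y = np.diag(np.diag(rho))

lhs = lambda_tilde_on_A(Phi_AB)
rhs = np.kron(Y, proj)                      # Y (x) Tr_A(Phi_AB), using Tr(proj) = 1

print(np.allclose(V @ Phi_AB, Phi_AB), np.allclose(lhs, rhs))
print(np.allclose(V @ Phi_bad, Phi_bad))
```

The separability condition itself is not checked here; it is the hard part of the conic program and is what the hierarchy relaxes.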
This conic program cannot be directly solved because the characterization of the separability cone SEP is still an NP-hard problem [38]. Actually, this is expected, because the rank-constrained SDP is, in general, also NP-hard. However, in quantum information theory many outer relaxations of the separability cone SEP are known. For example, the positive partial transpose (PPT) criterion provides a good approximation for low-dimensional quantum systems [39,40]. More generally, inspired by the symmetric extension criterion [1,41,42], which says that the two-party reduced states of all N-party symmetric states are asymptotically separable when taking N → +∞, we obtain a complete hierarchy for rank-constrained optimization in Eq. (1).
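To express the hierarchy, we need to introduce the notion of symmetric subspaces for multiple parties. We label the $N$ parties as $A, B, \dots, Z$ and define the symmetric subspace of $\mathcal{H}^{\otimes N} = \mathcal{H}_A \otimes \mathcal{H}_B \otimes \cdots \otimes \mathcal{H}_Z$ as
$$\{|\Psi\rangle \in \mathcal{H}^{\otimes N} \mid V_\sigma|\Psi\rangle = |\Psi\rangle\ \ \forall\, \sigma \in S_N\}, \tag{14}$$
where $S_N$ is the permutation group over $N$ symbols and $V_\sigma$ are the corresponding operators on the $N$ parties $A, B, \dots, Z$. Let $P_N^+$ denote the orthogonal projector onto the symmetric subspace of $\mathcal{H}^{\otimes N}$, then $P_N^+$ can be explicitly written as
$$P_N^+ = \frac{1}{N!}\sum_{\sigma \in S_N} V_\sigma. \tag{15}$$
Hereafter, without ambiguity, we will also use $P_N^+$ to denote the corresponding symmetric subspace. For example, a state $\Phi_{AB\cdots Z}$ is within the symmetric subspace if and only if $P_N^+ \Phi_{AB\cdots Z} P_N^+ = \Phi_{AB\cdots Z}$. Now we are ready to state the complete hierarchy for rank-constrained optimization; see Appendix B for the proof.
Theorem 2. For $\mathbb{F} = \mathbb{C}$, let $\xi$ be the solution of the rank-constrained SDP in Eq. (1). Then, for any $N$, $\xi$ is upper bounded by the solution $\xi_N$ of the SDP hierarchy
$$\begin{aligned} \xi_N := \max_{\Phi_{AB\cdots Z}}\ \ & \mathrm{Tr}(\tilde{X}_A \otimes \mathbb{1}_{B\cdots Z}\, \Phi_{AB\cdots Z}) \\ \text{s.t.}\ \ & \Phi_{AB\cdots Z} \ge 0,\ \ \mathrm{Tr}(\Phi_{AB\cdots Z}) = 1,\ \ P_N^+ \Phi_{AB\cdots Z} P_N^+ = \Phi_{AB\cdots Z}, \\ & \tilde{\Lambda}_A \otimes \mathrm{id}_{B\cdots Z}(\Phi_{AB\cdots Z}) = Y \otimes \mathrm{Tr}_A(\Phi_{AB\cdots Z}). \end{aligned} \tag{16}$$
Furthermore, the SDP hierarchy is complete in the sense that $\xi_{N+1} \le \xi_N$ and $\lim_{N\to+\infty} \xi_N = \xi$.
FIG. 2. An illustration of the N-party extension $\Phi_{AB\cdots Z}$. Here $\mathcal{H}_1 = \mathbb{C}^n$ is the n-dimensional Hilbert space on which the rank-constrained optimization is defined; $\mathcal{H}_2 = \mathbb{C}^k$ is the k-dimensional auxiliary Hilbert space that is used for purifying the rank-k (more precisely, rank no larger than k) states in $\mathcal{H}_1 = \mathbb{C}^n$. Sometimes, we also denote $\mathcal{H}_1^{\otimes N}$ as $\mathcal{H}_{A_1} \otimes \mathcal{H}_{B_1} \otimes \cdots \otimes \mathcal{H}_{Z_1}$ in order to distinguish the Hilbert spaces $\mathcal{H}_1$ for different parties (similarly for $\mathcal{H}_2^{\otimes N}$).
In addition, any criterion for the full separability of $\Phi_{AB\cdots Z}$ can be added to the optimization in Eq. (16), which can give a better upper bound for the optimization in Eq. (1). For example, the PPT criterion, more precisely, PPT with respect to all bipartitions, can also be added as additional constraints, which can give better upper bounds $\xi_N^T$, i.e., $\xi \le \xi_{N+1}^T \le \xi_N^T \le \xi_N$ and $\lim_{N\to+\infty} \xi_N^T = \xi$. Furthermore, it is sometimes convenient to denote the solution of the SDP obtained by relaxing the rank constraint in Eq. (1) as $\xi_1$; then we have $\xi_2 \le \xi_1$.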
Let us estimate the complexity of the SDP hierarchy in Theorem 2. For the N-th level of the hierarchy, the dimension of the matrix reads $\dim(\mathcal{H}^{\otimes N}) = (nk)^N$, but it can be further reduced by taking advantage of the fact that $\Phi_{AB\cdots Z}$ is within the symmetric subspace, which has the dimension
$$\dim(P_N^+) = \binom{nk + N - 1}{N} = \frac{(nk + N - 1)!}{N!\,(nk - 1)!}. \tag{17}$$
By noting that $k \le n$, Eq. (17) implies that, for fixed dimension $n$, the complexity of the SDP grows polynomially with the level of the hierarchy $N$, and for a fixed level of hierarchy $N$, the complexity of the SDP also grows polynomially with the dimension $n$. Similar results also hold when considering the PPT criterion, because the partial transpose of $\Phi_{AB\cdots Z}$ with respect to any bipartition is within the tensor product of two symmetric subspaces $P_k^+ \otimes P_{N-k}^+$ for some $k$ [43].
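The dimension count can be verified for small cases by building the symmetric projector explicitly as the average of all permutation operators and comparing its trace with the binomial coefficient. The sketch below is only an illustration; `symmetric_projector` and `sym_dim` are hypothetical helper names, and `d` plays the role of the local dimension $nk$.

```python
import itertools
import math
import numpy as np

def symmetric_projector(d, N):
    """Projector onto the symmetric subspace of (C^d)^{(x)N}: average of all V_sigma."""
    basis = list(itertools.product(range(d), repeat=N))
    index = {b: i for i, b in enumerate(basis)}
    P = np.zeros((d ** N, d ** N))
    for sigma in itertools.permutations(range(N)):
        for b in basis:
            permuted = tuple(b[sigma[p]] for p in range(N))
            P[index[permuted], index[b]] += 1.0   # V_sigma maps |b> to |permuted>
    return P / math.factorial(N)

def sym_dim(d, N):
    """Dimension of the symmetric subspace of (C^d)^{(x)N}."""
    return math.comb(d + N - 1, N)

P = symmetric_projector(3, 3)
print(np.allclose(P @ P, P), round(np.trace(P)), sym_dim(3, 3))

# Full versus symmetric dimension for a hypothetical instance with n = 4, k = 2
for N in (2, 3, 4):
    print(N, (4 * 2) ** N, sym_dim(4 * 2, N))
```

The explicit projector is only practical for tiny `d` and `N`; in an actual SDP one would parametrize the variable directly in a basis of the symmetric subspace.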

B. Optimization over real matrices
We move on to consider the $\mathbb{F} = \mathbb{R}$ case, which is more important in classical information theory. One can easily verify that Theorem 1 can be directly generalized to the $\mathbb{F} = \mathbb{R}$ case, if the decomposition in Eq. (12) satisfies $|\varphi_i\rangle\langle\varphi_i| \in \mathbb{R}^{nk \times nk}$. The obvious way to guarantee this is to define the set of separable states over $\mathbb{R}$. This will, however, make the known separability criteria developed in entanglement theory not directly applicable.
Thus, we employ a different method. We still use the separability cone SEP with respect to the complex numbers, more precisely,
$$\Phi_{AB} \in \mathrm{SEP}, \tag{18}$$
where SEP is still defined as in Eq. (5). Equations (12,18) are not sufficient for guaranteeing that $|\varphi_i\rangle\langle\varphi_i| \in \mathbb{R}^{nk \times nk}$ [44]; however, only a small modification is needed. For pure states one has $(|\varphi_i\rangle\langle\varphi_i|)^T = |\varphi_i^*\rangle\langle\varphi_i^*|$, where $(\cdot)^T$ denotes the transpose and $|(\cdot)^*\rangle$ denotes complex conjugation with respect to a fixed basis. Hence, a necessary condition for $|\varphi_i\rangle\langle\varphi_i| \in \mathbb{R}^{nk \times nk}$ is
$$\Phi_{AB}^{T_A} = \Phi_{AB}, \tag{19}$$
where $(\cdot)^{T_A}$ denotes the partial transpose on party $A$. That is, the state $\Phi_{AB}$ is invariant under partial transposition.
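Interestingly, due to the symmetry and separability of $\Phi_{AB}$, Eq. (19) is also sufficient for guaranteeing that $|\varphi_i\rangle\langle\varphi_i| \in \mathbb{R}^{nk \times nk}$; see Appendix A for the proof. Hence, we arrive at the following theorem for rank-constrained optimization over real matrices.
Theorem 3. For $\mathbb{F} = \mathbb{R}$, the rank-constrained SDP in Eq. (1) is equivalent to the conic program
$$\begin{aligned} \max_{\Phi_{AB}}\ \ & \mathrm{Tr}(\tilde{X}_A \otimes \mathbb{1}_B\, \Phi_{AB}) \\ \text{s.t.}\ \ & \Phi_{AB} \in \mathrm{SEP},\ \ \mathrm{Tr}(\Phi_{AB}) = 1,\ \ V_{AB}\Phi_{AB} = \Phi_{AB},\ \ \Phi_{AB}^{T_A} = \Phi_{AB}, \\ & \tilde{\Lambda}_A \otimes \mathrm{id}_B(\Phi_{AB}) = Y \otimes \mathrm{Tr}_A(\Phi_{AB}). \end{aligned} \tag{20}$$
Similarly to Theorem 2, we can also construct a complete hierarchy with the multi-party extension method for the real case.
Theorem 4. For $\mathbb{F} = \mathbb{R}$, let $\xi$ be the solution of the rank-constrained SDP in Eq. (1). Then, for any $N$, $\xi$ is upper bounded by the solution $\xi_N$ of the SDP hierarchy
$$\begin{aligned} \xi_N := \max_{\Phi_{AB\cdots Z}}\ \ & \mathrm{Tr}(\tilde{X}_A \otimes \mathbb{1}_{B\cdots Z}\, \Phi_{AB\cdots Z}) \\ \text{s.t.}\ \ & \Phi_{AB\cdots Z} \ge 0,\ \ \mathrm{Tr}(\Phi_{AB\cdots Z}) = 1,\ \ P_N^+ \Phi_{AB\cdots Z} P_N^+ = \Phi_{AB\cdots Z},\ \ \Phi_{AB\cdots Z}^{T_A} = \Phi_{AB\cdots Z}, \\ & \tilde{\Lambda}_A \otimes \mathrm{id}_{B\cdots Z}(\Phi_{AB\cdots Z}) = Y \otimes \mathrm{Tr}_A(\Phi_{AB\cdots Z}). \end{aligned} \tag{21}$$
Furthermore, the SDP hierarchy is complete, i.e., $\xi_{N+1} \le \xi_N$ and $\lim_{N\to+\infty} \xi_N = \xi$.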
We emphasize that all variables involved in Eqs. (20) and (21) are taken as real matrices. In addition, due to the permutation symmetry induced by $P_N^+$, the invariance under the partial transpose on party $A$ already ensures the partial-transpose invariance with respect to all bipartitions. This also makes the PPT criterion redundant as an additional separability condition for the hierarchy in Eq. (21).
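The role of the partial-transpose condition can be seen on the smallest possible example. The sketch below (an illustration under the assumption that the index ordering of `np.kron` is used for the two parties; `partial_transpose_A` and `two_copies` are hypothetical helpers) compares the two-copy state of a real vector with that of a genuinely complex vector.

```python
import numpy as np

def partial_transpose_A(Phi, d):
    """Partial transpose on the first party of an operator on C^d (x) C^d."""
    T = Phi.reshape(d, d, d, d)                      # indices (a, b, a', b')
    return T.transpose(2, 1, 0, 3).reshape(d * d, d * d)

def two_copies(phi):
    """Two-party extension |phi><phi| (x) |phi><phi| of a pure state."""
    P = np.outer(phi, phi.conj())
    return np.kron(P, P)

d = 2
real_phi = np.array([1.0, 1.0]) / np.sqrt(2)         # real amplitudes
cplx_phi = np.array([1.0, 1.0j]) / np.sqrt(2)        # complex amplitudes

Phi_real = two_copies(real_phi)
Phi_cplx = two_copies(cplx_phi)

# Invariant for the real state, not invariant for the complex one
print(np.allclose(partial_transpose_A(Phi_real, d), Phi_real))
print(np.allclose(partial_transpose_A(Phi_cplx, d), Phi_cplx))
```

This only illustrates necessity for a single product term; the sufficiency statement for general symmetric separable states is the content of the proof referenced in Appendix A.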

C. Inherent symmetry for the hierarchy
Before proceeding further, we briefly describe an inherent symmetry in Eqs. (13,16,20,21), which is particularly useful for the practical implementation. In a convex optimization problem, if a group action $G$ does not change the objective function and feasible region, then the variables can be assumed to be $G$-invariant. Specifically, if the SDP, $\max_{\Phi \in \mathcal{S}} \mathrm{Tr}(\Phi X)$, satisfies $g\mathcal{S}g^\dagger \subset \mathcal{S}$ and $gXg^\dagger = X$ for all $g \in G$, then we can add an extra $G$-invariance constraint $g\Phi g^\dagger = \Phi$ for all $g \in G$.
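The reason the invariance constraint costs nothing is that averaging a feasible point over the group keeps the objective value and yields an invariant point. A minimal sketch of this averaging argument follows; the group (permutation matrices on $\mathbb{R}^3$) and the invariant objective `X` are toy choices made for illustration and are not taken from the paper.

```python
import itertools
import numpy as np

n = 3
# Finite group G: all permutation matrices acting on R^3
group = [np.eye(n)[list(p)] for p in itertools.permutations(range(n))]

# An objective that is invariant under G: g X g^T = X for every g
X = np.ones((n, n)) + 2.0 * np.eye(n)

# A generic (non-invariant) positive semidefinite matrix with unit trace
rng = np.random.default_rng(1)
A = rng.normal(size=(n, n))
Phi = A @ A.T
Phi /= np.trace(Phi)

# Group average ("twirl") of Phi
Phi_bar = sum(g @ Phi @ g.T for g in group) / len(group)

print(np.isclose(np.trace(X @ Phi_bar), np.trace(X @ Phi)))      # same objective value
print(all(np.allclose(g @ Phi_bar @ g.T, Phi_bar) for g in group))  # G-invariant
```

For a convex feasible set that is mapped into itself by the group, the average stays feasible, so restricting to invariant variables does not change the optimum.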
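For the hierarchy in the complex case in Theorems 1 and 2, regardless of the actual forms of $X$, $\Lambda$, and $Y$, there is an inherent symmetry on the auxiliary systems: for all $U \in SU(k)$,
$$(\mathbb{1}_n \otimes U)^{\otimes N}\, \Phi_{AB\cdots Z}\, (\mathbb{1}_n \otimes U^\dagger)^{\otimes N} = \Phi_{AB\cdots Z}. \tag{22}$$
This implies that $\Phi_{AB\cdots Z}$ is generated by the symmetric group $S_N$ in $\mathcal{H}_{A_2} \otimes \mathcal{H}_{B_2} \otimes \cdots \otimes \mathcal{H}_{Z_2} = (\mathbb{C}^k)^{\otimes N}$ by the Schur-Weyl duality [45]. We take the case $N = 2$ as an example to illustrate this point. Under the restriction in Eq. (22), $\Phi_{AB}$ admits the form
$$\Phi_{AB} = \Phi_I \otimes \mathbb{1}_{A_2B_2} + \Phi_V \otimes V_{A_2B_2}, \tag{23}$$
where $\Phi_I$ and $\Phi_V$ are operators on $\mathcal{H}_{A_1B_1}$, and $\mathbb{1}_{A_2B_2}$ and $V_{A_2B_2}$ are the identity and swap operators on $\mathcal{H}_{A_2B_2}$, respectively. By taking advantage of the relations
$$V_{A_2B_2}^{T_{A_2}} = k\,|\phi_k^+\rangle\langle\phi_k^+|, \qquad \mathbb{1}_{A_2B_2} = P_{A_2B_2}^+ + P_{A_2B_2}^-, \qquad V_{A_2B_2} = P_{A_2B_2}^+ - P_{A_2B_2}^-, \tag{24}$$
where $|\phi_k^+\rangle = \frac{1}{\sqrt{k}}\sum_{\alpha=1}^{k} |\alpha\rangle_{A_2}|\alpha\rangle_{B_2}$ is a maximally entangled state and $P_{A_2B_2}^+$ and $P_{A_2B_2}^-$ are projectors onto the symmetric and antisymmetric subspaces, respectively, $\xi_2^T$ can be simplified to the SDP in Eq. (25). A significant improvement in Eq. (25) is that the variables are matrices in $\mathbb{C}^{n^2 \times n^2}$, whose dimension no longer depends on the rank $k$.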
For the hierarchy in the real case in Theorems 3 and 4, we consider the symmetry $Q^{\otimes N}$ for $Q \in O(k)$, which also simplifies the structure of $\Phi_{AB\cdots Z}$. The relevant structure is described by the Brauer algebra [45], which is more complicated than in the case of the $SU(k)$ symmetry.
For the $N = 2$ case, the Brauer algebra leads to the simplified SDP in Eq. (27). Curiously, in the SDPs in Eqs. (25) and (27), the rank constraint $k$ appears as a parameter that, in principle, can take on non-integer values. In Appendix C, we show that $k$ can indeed, in some sense, be considered a continuous rank, and that this is useful for handling numerical errors.

III. EXAMPLES
In this section, we show that our method can be widely used in quantum and classical information theory. As illustrations, we investigate the examples of the optimization over pure states and unitary channels, the characterization of faithful entanglement, and quantum contextuality as problems in quantum information theory. Concerning classical information theory, we study as examples the Max-Cut problem, pseudo-Boolean optimization, and the minimum dimension of the orthonormal representation of graphs.

A. Optimization over pure quantum states and unitary channels
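A direct application of our method in quantum information theory is the optimization over pure states. For example, we consider the optimization problem from incomplete information
$$\max_{|\varphi\rangle}\ \langle\varphi|X|\varphi\rangle \quad \text{s.t.} \quad \langle\varphi|M_i|\varphi\rangle = m_i, \tag{28}$$
where the $M_i$ are the performed measurements and the $m_i$ are the corresponding measurement results. This can be viewed as a refined problem of compressed sensing tomography [12], in which the feasibility problem is considered. The optimization in Eq. (28) is obviously a rank-constrained SDP,
$$\max_{\rho}\ \mathrm{Tr}(X\rho) \quad \text{s.t.} \quad \mathrm{Tr}(M_i\rho) = m_i,\ \ \mathrm{Tr}(\rho) = 1,\ \ \rho \ge 0,\ \ \mathrm{rank}(\rho) = 1. \tag{29}$$
Thus, Theorem 1 gives the equivalent conic program
$$\begin{aligned} \max_{\Phi_{AB}}\ \ & \mathrm{Tr}(X_A \otimes \mathbb{1}_B\, \Phi_{AB}) \\ \text{s.t.}\ \ & \Phi_{AB} \in \mathrm{SEP},\ \ \mathrm{Tr}(\Phi_{AB}) = 1,\ \ V_{AB}\Phi_{AB} = \Phi_{AB}, \\ & \mathrm{Tr}_A(M_i \otimes \mathbb{1}_B\, \Phi_{AB}) = m_i\, \mathrm{Tr}_A(\Phi_{AB}), \end{aligned} \tag{30}$$
from which a complete SDP hierarchy can be constructed using Theorem 2. Similarly, we can also consider the optimization over low-rank quantum states. Because of the Choi-Jamiołkowski duality [46], the results in Eqs. (29,30) can also be used for the optimization over unitary (and low-Kraus-rank) channels. As an example, we show that our method provides a complete characterization of the mixed-unitary channels, whose detection was recently proved to be an NP-hard problem [18].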
A channel $\Lambda$ is called mixed-unitary if there exists a positive integer $m$, a probability distribution $(p_1, p_2, \dots, p_m)$, and unitary operators $U_1, U_2, \dots, U_m$ such that
$$\Lambda(\rho) = \sum_{i=1}^{m} p_i\, U_i \rho\, U_i^\dagger. \tag{31}$$
According to the Choi-Jamiołkowski duality, a channel is mixed-unitary if and only if the corresponding Choi state defined as
$$J(\Lambda) = \mathrm{id} \otimes \Lambda(|\phi^+\rangle\langle\phi^+|), \tag{32}$$
with $|\phi^+\rangle = \frac{1}{\sqrt{n}}\sum_{\alpha=1}^{n} |\alpha\alpha\rangle$, is a mixture of maximally entangled states, i.e., $J(\Lambda) = \sum_{i=1}^{m} p_i |\phi_i\rangle\langle\phi_i|$, where the $|\phi_i\rangle$ are maximally entangled states, i.e., $\mathrm{Tr}_1(|\phi_i\rangle\langle\phi_i|) = \mathbb{1}_n/n$. Thus, characterizing the mixed-unitary channels is equivalent to characterizing the mixtures of maximally entangled states,
$$\mathcal{M} = \mathrm{conv}\{|\phi\rangle\langle\phi| \mid \mathrm{Tr}_1(|\phi\rangle\langle\phi|) = \mathbb{1}_n/n\}. \tag{33}$$
According to Eq. (8), the statement that $\Lambda$ is mixed-unitary, i.e., $J(\Lambda) \in \mathcal{M}$, is equivalent to the feasibility problem in Eq. (34), whose last constraint follows from $\mathrm{Tr}_2(|\phi\rangle\langle\phi|) = \mathbb{1}_n/n$ according to Eq. (33). This constraint is redundant for Eq. (34), but it may help when the SDP relaxations are considered.
A further application comes from entanglement theory. Following Ref. [10], the optimization over $\mathcal{M}$ also provides a complete characterization of faithful entanglement [16], i.e., the entangled states that are detectable by fidelity-based witnesses. In Ref. [10], the authors prove that a state $\rho \in \mathbb{C}^n \otimes \mathbb{C}^n$ is faithful if and only if $\xi := \max_{\sigma \in \mathcal{M}} \mathrm{Tr}(\sigma\rho) > 1/n$. According to Theorem 1, the solution $\xi$ also equals the solution of the conic program in Eq. (35), where $\mathcal{H}_A = \mathcal{H}_B = \mathbb{C}^n \otimes \mathbb{C}^n$. By taking advantage of the complete hierarchy, if for some $N$ there is $\xi_N \le 1/n$ or $\xi_N^T \le 1/n$, then $\rho$ is unfaithful. In practice, it is already enough to take $\xi_2^T$ for verifying the unfaithfulness of some states that are not detectable by any of the known methods [10,16]. An explicit example for $n = 4$ is the state in Eq. (36), with $p = 23/40$ and $\beta_4 = (1 + i)/\sqrt{2}$. For this state, the SDP relaxation of Eq. (35) gives the upper bound $\xi_2^T = 0.24888 < 1/4$, which matches the lower bound from gradient search and is strictly better than the best known upper bound $\xi_1 = 0.25063 > 1/4$ from Ref. [10].
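The correspondence between mixed-unitary channels and mixtures of maximally entangled states can be illustrated numerically. The sketch below is not from the paper: it assumes the convention that the channel acts on the second tensor factor of $|\phi^+\rangle$, draws the unitaries from a QR decomposition purely for illustration, and uses the hypothetical helpers `random_unitary` and `marginals`.

```python
import numpy as np

n, m = 3, 4
rng = np.random.default_rng(2)

def random_unitary(n, rng):
    """A unitary matrix from the QR decomposition of a random complex matrix."""
    Z = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    Q, _ = np.linalg.qr(Z)
    return Q

p = rng.random(m)
p /= p.sum()                                        # probability distribution
unitaries = [random_unitary(n, rng) for _ in range(m)]

phi_plus = np.eye(n).reshape(n * n) / np.sqrt(n)    # (1/sqrt(n)) sum_a |a a>

# Choi state of the mixed-unitary channel, channel applied to the second factor (assumed)
J = np.zeros((n * n, n * n), dtype=complex)
for p_i, U in zip(p, unitaries):
    phi_i = np.kron(np.eye(n), U) @ phi_plus        # a maximally entangled state
    J += p_i * np.outer(phi_i, phi_i.conj())

def marginals(M, n):
    """Both single-party reduced states of an operator on C^n (x) C^n."""
    T = M.reshape(n, n, n, n)
    return np.einsum('abad->bd', T), np.einsum('abcb->ac', T)

tr1, tr2 = marginals(J, n)
print(np.allclose(tr1, np.eye(n) / n), np.allclose(tr2, np.eye(n) / n))
```

Maximally mixed marginals are necessary but not sufficient for membership in the set of mixtures of maximally entangled states; deciding membership is exactly what the hierarchy addresses.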

B. Gram matrix and orthonormal representation
Let $|a_i\rangle \in \mathbb{F}^k$ ($\mathbb{F} = \mathbb{C}$ or $\mathbb{F} = \mathbb{R}$) for $i = 1, 2, \dots, n$ be a sequence of vectors, then the Gram matrix defined as $\Gamma = [\langle a_i|a_j\rangle]_{i,j=1}^{n}$ satisfies $\Gamma \ge 0$ and $\mathrm{rank}(\Gamma) \le k$. The converse is also true in the sense that if a matrix $\Gamma \in \mathbb{F}^{n\times n}$ satisfies $\Gamma \ge 0$ and $\mathrm{rank}(\Gamma) \le k$, then there exist $|a_i\rangle \in \mathbb{F}^k$ for $i = 1, 2, \dots, n$, such that $\Gamma_{ij} = \langle a_i|a_j\rangle$ [19]. This correspondence enables many applications of the rank-constrained optimization. For example, it can be used to bound the minimum dimension of the orthonormal representation of graphs.
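Both directions of the Gram-matrix correspondence can be sketched in a few lines of NumPy. The functions `gram` and `vectors_from_gram` are hypothetical names, and recovering the vectors from an eigendecomposition is one possible construction, not necessarily the one used in Ref. [19].

```python
import numpy as np

def gram(vectors):
    """Gram matrix Gamma_ij = <a_i|a_j> for vectors stored as rows of an (n, k) array."""
    A = np.asarray(vectors)
    return A.conj() @ A.T

def vectors_from_gram(Gamma, k, tol=1e-9):
    """Recover n vectors in F^k from a PSD matrix with rank at most k (k <= n)."""
    vals, vecs = np.linalg.eigh(Gamma)
    order = np.argsort(vals)[::-1]                  # eigenvalues in decreasing order
    vals, vecs = vals[order], vecs[:, order]
    if vals[-1] < -tol or np.any(vals[k:] > tol):
        raise ValueError("Gamma must be positive semidefinite with rank <= k")
    # Row i is |a_i> with components sqrt(lambda_alpha) * conj(v_alpha[i])
    return np.sqrt(np.clip(vals[:k], 0.0, None)) * vecs[:, :k].conj()

# Hypothetical test data: five random vectors in C^2
rng = np.random.default_rng(3)
A = rng.normal(size=(5, 2)) + 1j * rng.normal(size=(5, 2))
Gamma = gram(A)
B = vectors_from_gram(Gamma, k=2)

print(np.linalg.matrix_rank(Gamma) <= 2, np.allclose(gram(B), Gamma))
```

The recovered vectors generally differ from the original ones by a unitary on $\mathbb{F}^k$, which leaves all inner products unchanged.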
In graph theory, a graph is a pictorial representation of a set of objects (vertices) where some pairs of objects are connected by links (edges). Formally, a graph $G$ is denoted by a pair $(V, E)$, where $V$ is the set of vertices, and $E$ is the set of edges that are paired vertices. For a graph $G = (V, E)$, an orthonormal representation is a set of normalized vectors
$$\{|a_i\rangle \in \mathbb{F}^k\}_{i \in V} \quad \text{such that} \quad \langle a_i|a_j\rangle = 0 \ \text{ if } \ \{i, j\} \notin E. \tag{37}$$
The minimum dimension problem is to find the smallest number $k$ such that an orthonormal representation exists. This is not only an important quantity in classical information theory [19], but also widely used in quantum information theory. For example, it is a crucial quantity in quantum contextuality theory [20,21], and can be directly used for contextuality-based dimension witnesses [47]. Note that in quantum contextuality, the definition of orthonormal representations is slightly different, where the adjacent instead of the nonadjacent vertices are required to be orthogonal to each other, i.e., $\langle a_i|a_j\rangle = 0$ if $\{i, j\} \in E$. In the following, we use the standard definition in graph theory. All results can be trivially adapted to the alternative definition by considering the complement graph.
By taking advantage of the Gram matrix, the problem of the minimum dimension of the orthonormal representation [19] can be expressed as
$$\min_{\Gamma}\ k \quad \text{s.t.} \quad \Gamma_{ij} = 0 \ \text{ if } \ \{i, j\} \notin E \ (i \ne j),\ \ \Delta(\Gamma) = \mathbb{1}_n,\ \ \Gamma \ge 0,\ \ \mathrm{rank}(\Gamma) \le k, \tag{38}$$
where $G = (V, E)$ is a graph with $|V| = n$ vertices, $E$ is the set of edges, $\Delta(\cdot)$ denotes the map of eliminating all off-diagonal elements of a matrix (completely dephasing map), and $\Gamma \in \mathbb{R}^{n\times n}$ or $\Gamma \in \mathbb{C}^{n\times n}$ corresponds to the real or complex representation. Let $W$ be the adjacency matrix of $G$, i.e., $W_{ij} = 1$ if $\{i, j\} \in E$ and $W_{ij} = 0$ otherwise, then the first two constraints in Eq. (38) can also be written as
$$(J_n - W) \odot \Gamma = \mathbb{1}_n, \tag{39}$$
where $J_n$ is the $n \times n$ matrix with all elements being one and $[X \odot Y]_{ij} = X_{ij}Y_{ij}$ is the Hadamard product of matrices. Then, the existence of a k-dimensional orthonormal representation is equivalent to the feasibility problem in Eq. (40), in which the extra constraint $\Phi_{AB}^{T_A} = \Phi_{AB}$ is for the case $\mathbb{F} = \mathbb{R}$ only. Note that the inherent symmetry presented in Sec. II C can be used for simplifying the SDP relaxations.
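The Lovász ϑ-function defined by
$$\vartheta(G) = \min_{\{|a_i\rangle\},\,|c\rangle}\ \max_{i \in V}\ \frac{1}{|\langle c|a_i\rangle|^2}, \tag{41}$$
where the $|a_i\rangle$ form an orthonormal representation and $|c\rangle$ is a unit vector, is probably the best-known way to obtain a lower bound on the minimal dimension of orthonormal representations. We note that the value of the Lovász ϑ-function is independent of whether the orthonormal representation is real or complex [19]. For any k-dimensional orthonormal representation $|a_i\rangle$, the vectors $|a_i\rangle \otimes |a_i^*\rangle$ also form an orthonormal representation and, with $|c\rangle = \frac{1}{\sqrt{k}}\sum_{\alpha=1}^{k} |\alpha\rangle \otimes |\alpha\rangle$, one has $|\langle c|(|a_i\rangle \otimes |a_i^*\rangle)|^2 = 1/k$ for all $i$, so that the bound $k \ge \vartheta(G)$ is readily obtained from Eq. (41). Our method can provide a better bound even for small graphs.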

C. Max-Cut problem
The Max-Cut problem is among the best-known rank-constrained optimization problems [22] and also draws a lot of interest in quantum computing [49,50]. Given a graph $G = (V, E)$, the Max-Cut problem is to find a cut, i.e., a bipartition of the vertices $(S, S^c)$, where $S^c = V \setminus S$, that maximizes the number of edges between $S$ and $S^c$; see Fig. 4. A significant breakthrough for the Max-Cut problem was the work by Goemans and Williamson [22], in which they showed that the Max-Cut problem can be written as the rank-constrained optimization
$$\max_{\rho}\ \frac{1}{4}\mathrm{Tr}[W(J_n - \rho)] \quad \text{s.t.} \quad \Delta(\rho) = \mathbb{1}_n,\ \ \rho \ge 0,\ \ \mathrm{rank}(\rho) = 1, \tag{42}$$
where $n = |V|$, $\rho \in \mathbb{R}^{n\times n}$, $J_n$ is the $n \times n$ matrix with all elements being one, and $W$ is the adjacency matrix of $G$. To see why the Max-Cut problem is equivalent to Eq. (42), we denote a cut with the binary vector $x \in \{-1, 1\}^n$ such that $x_i = 1$ if $i \in S$ and $x_i = -1$ if $i \in S^c$ and let $\rho = xx^T$, then the number of edges between $S$ and $S^c$ is $\frac{1}{4}\sum_{i,j} W_{ij}(1 - x_i x_j)$, which is equal to the objective function in Eq. (42). Furthermore, the set of all cuts $\rho = xx^T$ can be fully characterized by the constraints in Eq. (42). The idea of the Goemans-Williamson approximation is to remove the rank constraint in Eq. (42) and solve the resulting SDP relaxation, which gives an upper bound $\xi_1$ for the Max-Cut problem.
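The identity between the cut size and the trace objective can be checked by brute force on a small graph. The sketch below uses the 5-cycle as a toy instance (an illustrative choice, not one of the graphs tested in the paper); `cut_value`, `count_cut_edges`, and `max_cut_brute_force` are hypothetical helper names.

```python
import itertools
import numpy as np

def cut_value(W, x):
    """Objective (1/4) Tr[W (J - x x^T)] for a cut vector x with entries +1 or -1."""
    n = len(x)
    rho = np.outer(x, x)
    return 0.25 * np.trace(W @ (np.ones((n, n)) - rho))

def count_cut_edges(W, x):
    """Number of edges whose endpoints lie on different sides of the cut."""
    n = len(x)
    return sum(W[i, j] for i in range(n) for j in range(i + 1, n) if x[i] != x[j])

def max_cut_brute_force(W):
    """Exact Max-Cut by enumeration; x_0 is fixed to +1 since x and -x give the same cut."""
    n = W.shape[0]
    best = 0.0
    for rest in itertools.product((1, -1), repeat=n - 1):
        best = max(best, cut_value(W, np.array((1,) + rest)))
    return best

# Toy instance: the cycle graph on 5 vertices
n = 5
W = np.zeros((n, n))
for i in range(n):
    W[i, (i + 1) % n] = W[(i + 1) % n, i] = 1.0

x = np.array([1, -1, 1, -1, 1])
print(cut_value(W, x), count_cut_edges(W, x))
print(max_cut_brute_force(W))
```

Enumeration scales as $2^{n-1}$ and is only meant as a ground truth for small graphs against which SDP bounds such as $\xi_1$ and $\xi_2$ could be compared.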
In the following, we show how our method can give a better estimate than the Goemans-Williamson approximation. By noting that we can add a redundant constraint $\mathrm{Tr}(\rho) = n$, Theorem 3 implies that the Max-Cut problem is equivalent to the conic program in Eq. (43), where $\mathcal{H}_A = \mathcal{H}_B = \mathbb{R}^n$. Correspondingly, a complete hierarchy of SDPs can be constructed from Theorem 4.
We have tested the SDP relaxation $\xi_2$ (replacing $\Phi_{AB} \in \mathrm{SEP}$ with $\Phi_{AB} \ge 0$) with some random graphs (randomly generated adjacency matrices). Let us discuss the largest two graphs that we have tested; the details for these two graphs are given in the Supplemental Material. For a 64-vertex graph with 419 edges, the Goemans-Williamson method gives the upper bound $\xi_1 = 299$, whereas $\xi_2 = 287$. For the 72-vertex graph with 475 edges, the Goemans-Williamson method gives the upper bound $\xi_1 = 335$, whereas $\xi_2 = 321$. Furthermore, the optimal $\Phi_{AB}$ also shows that the upper bound $\xi_2$ is achievable in these two cases. Hence, $\xi_2$ gives exactly the solution to the Max-Cut problem in these examples [51]. Actually, for all the graphs that we have tested, $\xi_2$ already gives the exact solution of the Max-Cut problem. Finally, we would like to mention that, although our method gives a much better bound, it is more costly than the Goemans-Williamson method. For example, the size of the matrix grows quadratically with the number of vertices for $\xi_2$, compared to only linearly for the Goemans-Williamson method.

D. Pseudo-Boolean optimization
Similar to the Max-Cut problem, we can apply the method to general optimization of a real-valued function over Boolean variables. These so-called pseudo-Boolean optimization problems find wide applications in, for example, statistical mechanics, computer science, discrete mathematics, and economics (see Ref. [23] and references therein). As a demonstration, we consider the quadratic pseudo-Boolean optimization
$$\max_{x}\ x^T Q x + c^T x \quad \text{s.t.} \quad x_i = \pm 1, \tag{44}$$
where $Q \in \mathbb{R}^{(n-1)\times(n-1)}$, $c \in \mathbb{R}^{n-1}$, and $x^T = [x_1, x_2, \dots, x_{n-1}]$; higher-order cases can be obtained by reducing to quadratic forms [23] or applying the results in Sec. IV D. Notably, solving quadratic pseudo-Boolean optimization problems with noisy intermediate-scale quantum computers has drawn a lot of research interest [26][27][28][29]. So, the following method may be used for characterizing benchmarks of such devices.
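The quadratic pseudo-Boolean optimization problem can also be written as a rank-constrained optimization. The basic idea is to write $\rho$ as an $n \times n$ matrix
$$\rho = \begin{bmatrix} 1 & x^T \\ x & xx^T \end{bmatrix}. \tag{45}$$
Furthermore, we define $L$ as
$$L = \begin{bmatrix} 0 & \tfrac{1}{2}c^T \\ \tfrac{1}{2}c & Q \end{bmatrix}. \tag{46}$$
Then the optimization problem in Eq. (44) can be written as the rank-constrained SDP
$$\max_{\rho}\ \mathrm{Tr}(L\rho) \quad \text{s.t.} \quad \Delta(\rho) = \mathbb{1}_n,\ \ \rho \ge 0,\ \ \mathrm{rank}(\rho) = 1, \tag{47}$$
which is of a similar form as in Eq. (42). By Theorem 3, the quadratic pseudo-Boolean optimization problem is equivalent to the conic program in Eq. (43) with the objective function replaced by $\mathrm{Tr}(L_A \otimes \mathbb{1}_B\, \Phi_{AB})$.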
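The lifting of a quadratic pseudo-Boolean objective to a linear function of a rank-one matrix can be verified by enumeration. The sketch below assumes the block form of $L$ with $Q$ in the lower-right block and $c/2$ in the first row and column, chosen so that $\mathrm{Tr}(L\rho)$ reproduces $x^TQx + c^Tx$; `lift` and `rho_of` are hypothetical helper names and the data are random.

```python
import itertools
import numpy as np

def lift(Q, c):
    """Matrix L with Tr(L rho) = x^T Q x + c^T x for rho built from y = (1, x)."""
    m = len(c)
    L = np.zeros((m + 1, m + 1))
    L[0, 1:] = c / 2
    L[1:, 0] = c / 2
    L[1:, 1:] = Q
    return L

def rho_of(x):
    """Rank-one matrix [[1, x^T], [x, x x^T]] for a vector x with entries +1 or -1."""
    y = np.concatenate(([1.0], x))
    return np.outer(y, y)

# Hypothetical random instance with four Boolean variables
rng = np.random.default_rng(4)
m = 4
Q = rng.normal(size=(m, m))
Q = (Q + Q.T) / 2
c = rng.normal(size=m)
L = lift(Q, c)

# Largest deviation between the lifted and the original objective over all x
max_dev = max(
    abs(np.trace(L @ rho_of(np.array(x, dtype=float)))
        - (np.array(x) @ Q @ np.array(x) + c @ np.array(x)))
    for x in itertools.product((1, -1), repeat=m)
)
print(max_dev < 1e-12)
```

Every lifted matrix has unit diagonal, is positive semidefinite, and has rank one, which is why the problem takes the same form as the Max-Cut program.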
To illustrate the performance of our method, we consider the Boolean least squares optimization, i.e.,
$$\min_{x}\ \|Ax - b\|^2 \quad \text{s.t.} \quad x_i = \pm 1. \tag{48}$$
We have tested our SDP relaxation $\xi_2$, compared to the widely used SDP relaxation $\xi_1$ [52], for 1000 random matrices $A \in \mathbb{R}^{40\times 30}$ and vectors $b \in \mathbb{R}^{40}$ with elements independently normally distributed. For this size, the optimal value $\xi$ can still be obtained by brute force. In most cases, the optimum is reached by $\xi_2$ while there is a significant gap between the optimal value $\xi$ and $\xi_1$. More precisely, for the 1000 random samples, the average ratio $\xi_2/\xi = 99.93\%$, in contrast to $\xi_1/\xi = 49.32\%$. Note that as the minimization is considered in Eq. (48), the $\xi_N$ provide lower bounds for $\xi$ instead of upper bounds.
In passing, we note that the sum-of-squares (SOS) hierarchy [6,7] can also be used for some examples in this paper, such as the pseudo-Boolean optimization [53][54][55][56]. However, our method in its generality is not subsumed by the SOS hierarchy.
On the one hand, there are two relatively separate steps in our approach to the rank-constrained optimization. The first step consists of mapping a rank-constrained problem to an entanglement problem. It is this mapping that allows the symmetric extension to be applied in the second step. The implication of this mapping is actually broader: if new optimization algorithms over separable states are proposed, either classical algorithms or quantum algorithms, they can be directly used for rank-constrained optimization problems. In fact, the full application of entanglement theory, with many methods other than the symmetric extension (such as other widely used methods for constructing entanglement witnesses), is yet to be explored. On the other hand, our approach also points towards a formulation of an optimization problem in a more physical language. In the SOS hierarchy, the variables are treated as individual scalars. Instead, in our method they are treated as a single vector in a Hilbert space. This is important from a physicist's point of view, because this treatment makes it easier to study the global properties of the physical system. One example is the utilization of not only the discrete symmetries [57] but also the continuous symmetries [58].

IV. MORE GENERAL RESULTS ON RANK-CONSTRAINED OPTIMIZATION
In this section, we consider extensions of the problem in Eq. (1) and general cases of rank-constrained optimization. For simplicity, we only consider the optimization over complex matrices. All results can be similarly applied to the optimization over real matrices by adding the partial-transpose-invariance constraint $\Phi_{AB\cdots Z}^{T_A} = \Phi_{AB\cdots Z}$.

A. Inequality constraints
We start from the rank-constrained SDP with inequality constraints

$$\max_{\rho} \ \mathrm{Tr}(X\rho) \quad \text{s.t.} \quad \Lambda(\rho) \le Y,\ \ \mathrm{Tr}(\rho) = 1,\ \ \rho \ge 0,\ \ \mathrm{rank}(\rho) \le k, \tag{49}$$

where $\Lambda$ is a Hermiticity-preserving map [46]. Similar to Eqs. (2, 3), we can still define the feasible region $F$ in $\mathbb{C}^{n\times n}$ and its purification $P$ in $\mathbb{C}^{nk\times nk}$ as

$$F := \{\rho \mid \Lambda(\rho) \le Y,\ \mathrm{Tr}(\rho) = 1,\ \rho \ge 0,\ \mathrm{rank}(\rho) \le k\},$$

$$P := \{|\varphi\rangle\langle\varphi| \mid \tilde\Lambda(|\varphi\rangle\langle\varphi|) \le Y,\ \langle\varphi|\varphi\rangle = 1\}, \tag{51}$$

where $\tilde\Lambda(\cdot) = \Lambda[\mathrm{Tr}_2(\cdot)]$. Again, we denote the solution of Eq. (49) as $\xi$. In this case, the proof of Theorem 1 in Appendix A does not carry over: although the constraints still provide a necessary condition for $\mathrm{Tr}_B(\Phi_{AB}) \in S := \mathrm{conv}(P)$, they are no longer sufficient. The reason is that, contrary to the equality case, the pure states in the decomposition of $\Phi_{AB}$ can no longer be guaranteed to be in $P$. However, the complete hierarchy analogous to Eq. (16) still provides the exact solution $\xi$.

Similarly, any criterion for the full separability of $\Phi_{AB\cdots Z}$ or of the unnormalized state $Y \otimes \mathrm{Tr}_A(\Phi_{AB\cdots Z}) - \tilde\Lambda \otimes \mathrm{id}_{B\cdots Z}(\Phi_{AB\cdots Z})$, such as the PPT criterion, can be added to the optimization in Eq. (53), which can give a better upper bound for the optimization in Eq. (49).
For simplicity, we only present the intuition of the proof of Theorem 5 here; see Appendix D for a rigorous proof. The property $\xi_{N+1} \le \xi_N$ follows from the hierarchical property that if $\Phi_{AB\cdots ZZ'}$ is within the feasible region of level $N+1$, then $\Phi_{AB\cdots Z} = \mathrm{Tr}_{Z'}(\Phi_{AB\cdots ZZ'})$ is within the feasible region of level $N$.
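For the convergence property, we consider a separable variant of the optimization in Eq. (53) by replacing $\Phi_{AB\cdots Z} \ge 0$ with $\Phi_{AB\cdots Z} \in \mathrm{SEP}$, and denote the corresponding solutions as $\tilde\xi_N$, i.e., we add a tilde to distinguish the solution with the separability constraint from the original $\xi_N$. Then, the quantum de Finetti theorem [59,60] implies that

$$\lim_{N\to+\infty} \tilde\xi_N = \lim_{N\to+\infty} \xi_N. \tag{54}$$

Now, we assume that the $\tilde\xi_N$ are achieved by the separable states

$$\Phi^{N}_{AB\cdots Z} = \int f_N(\psi)\, |\psi\rangle\langle\psi|^{\otimes N}\, d\psi,$$

where the $f_N(\psi)d\psi$ are $N$-dependent probability distributions, and $d\psi$ denotes the normalized uniform distribution. As the set of probability distributions on a compact set is also compact in the weak topology [61], we can take $f_\infty(\psi)d\psi$ as a limit point of $f_N(\psi)d\psi$. Thus, we get an $N$-independent probability distribution such that the state

$$\Phi^{\infty}_{AB\cdots Z} = \int f_\infty(\psi)\, |\psi\rangle\langle\psi|^{\otimes N}\, d\psi \tag{56}$$

satisfies all the constraints in Eq. (53) for arbitrary $N$ by the hierarchical property, and moreover

$$\lim_{N\to+\infty} \tilde\xi_N = \mathrm{Tr}(\tilde X_A \Phi^{\infty}_A), \tag{57}$$

where $\Phi^{\infty}_A = \int f_\infty(\psi)\, |\psi\rangle\langle\psi|\, d\psi$.

By Eq. (54), to prove that $\lim_{N\to+\infty} \xi_N = \xi$, we only need to show that $\Phi^{\infty}_A \in \mathrm{conv}(P)$. To this end, it is sufficient to show that $Y_\varphi := \tilde\Lambda(|\varphi\rangle\langle\varphi|) \le Y$ whenever $f_\infty(\varphi) \neq 0$. By substituting Eq. (56) into the last constraint in Eq. (53), we get that for arbitrary $N$

$$\int f_\infty(\psi)\, (Y - Y_\psi) \otimes |\psi\rangle\langle\psi|^{\otimes N}\, d\psi \ \ge\ 0,$$

which implies that

$$\int f_\infty(\psi)\, (Y - Y_\psi)\, \frac{|\langle\varphi|\psi\rangle|^{2N}}{\int |\langle\varphi|\psi'\rangle|^{2N}\, d\psi'}\, d\psi \ \ge\ 0 \tag{61}$$

for any $|\varphi\rangle$ and $N$. Note that for the complement $\bar B_\epsilon(\varphi)$ of any $\epsilon$-ball around $|\varphi\rangle$,

$$\lim_{N\to+\infty} \frac{\int_{\bar B_\epsilon(\varphi)} |\langle\varphi|\psi\rangle|^{2N}\, d\psi}{\int |\langle\varphi|\psi\rangle|^{2N}\, d\psi} = 0,$$

because the numerator decreases exponentially to zero, but the denominator $\int_\psi |\langle\varphi|\psi\rangle|^{2N}\, d\psi = 1/\dim(P_N^+)$ decreases polynomially according to Eq. (17) and the relation $\int_\psi |\psi\rangle\langle\psi|^{\otimes N}\, d\psi = P_N^+/\dim(P_N^+)$ [46]. Hence,

$$\lim_{N\to+\infty} \frac{|\langle\varphi|\psi\rangle|^{2N}}{\int |\langle\varphi|\psi'\rangle|^{2N}\, d\psi'} = \delta(\psi - \varphi),$$

where $\delta(\cdot)$ is the Dirac-delta function. Then, in the limit $N \to +\infty$, Eq. (61) gives that

$$f_\infty(\varphi)\, (Y - Y_\varphi) \ \ge\ 0,$$

and hence, $Y_\varphi \le Y$ when $f_\infty(\varphi) \neq 0$.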
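The normalization used in this argument, namely that the integral of $|\langle\varphi|\psi\rangle|^{2N}$ over uniformly distributed pure states equals $1/\dim(P_N^+)$ with $\dim(P_N^+) = \binom{N+d-1}{N}$, can be checked numerically. The following is a minimal Monte Carlo sketch; the function names, sample size, and seed are our own illustrative choices, and the estimate is only accurate up to statistical error.

```python
import math
import random


def haar_random_state(d, rng):
    """Sample a uniformly (Haar) distributed unit vector in C^d
    by normalizing a vector of independent complex Gaussians."""
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(d)]
    norm = math.sqrt(sum(abs(c) ** 2 for c in v))
    return [c / norm for c in v]


def overlap_moment(d, N, samples=100_000, seed=1):
    """Monte Carlo estimate of the integral of |<phi|psi>|^(2N) over psi.

    By unitary invariance, |phi> is fixed to the first basis vector.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        psi = haar_random_state(d, rng)
        total += abs(psi[0]) ** (2 * N)
    return total / samples


def symmetric_dim(d, N):
    """Dimension of the symmetric subspace of N copies of C^d."""
    return math.comb(N + d - 1, N)


estimate = overlap_moment(d=2, N=3)
exact = 1.0 / symmetric_dim(d=2, N=3)
```

Since `symmetric_dim` grows only polynomially in `N` for fixed `d`, the denominator in the argument above decays polynomially, as stated in the text.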

B. Non-positive-semidefinite variables
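Second, we study the rank-constrained optimization for non-positive-semidefinite and even non-square matrices. Consider the rank-constrained optimization

$$\max_{\omega} \ \mathrm{Tr}(X\omega) + \mathrm{Tr}(X^\dagger\omega^\dagger) \quad \text{s.t.} \quad \Lambda(\omega) = Y,\ \ \mathrm{rank}(\omega) \le k, \tag{65}$$

where $\omega \in \mathbb{C}^{m\times n}$, and the form of the objective function is chosen such that it is real-valued. Here, we impose the extra assumption that the optimal value can be attained on bounded $\omega$, i.e., we consider the optimization

$$\max_{\omega} \ \mathrm{Tr}(X\omega) + \mathrm{Tr}(X^\dagger\omega^\dagger) \quad \text{s.t.} \quad \Lambda(\omega) = Y,\ \ \mathrm{rank}(\omega) \le k,\ \ \|\omega\| \le R, \tag{66}$$

where $\|\omega\| = \mathrm{Tr}(\sqrt{\omega\omega^\dagger})$ is the trace norm of $\omega$, and $R$ is a suitably chosen bound depending on the actual problem. In particular, taking $R \to +\infty$ turns Eq. (66) into Eq. (65). The key observation for solving Eq. (66) is the following lemma; see Appendix E for the proof.

Lemma 6. A matrix $\omega \in \mathbb{F}^{m\times n}$ ($\mathbb{F} = \mathbb{C}$ or $\mathbb{F} = \mathbb{R}$) satisfies $\mathrm{rank}(\omega) \le k$ and $\|\omega\| \le R$ if and only if there exist $A \in \mathbb{F}^{m\times m}$ and $B \in \mathbb{F}^{n\times n}$ such that

$$\Omega = \begin{pmatrix} A & \omega \\ \omega^\dagger & B \end{pmatrix}$$

satisfies $\Omega \ge 0$, $\mathrm{Tr}(\Omega) = 2R$, and $\mathrm{rank}(\Omega) \le k$.

By taking advantage of Lemma 6, the optimization in Eq. (66) can be written as a rank-constrained SDP over $\Omega$, Eq. (68), in which $\omega$ enters only as the off-diagonal block of $\Omega$. After normalization, Eq. (68) has the simple form of the rank-constrained SDP in Eq. (1). Thus, all the methods developed in Sec. II are directly applicable. Furthermore, by applying the technique from Sec. IV A, it is also possible to consider element-wise inequality constraints of the form $\Lambda(\omega) \le Y$ for the optimization in Eq. (65), where $\le$ here denotes the element-wise comparison.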
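The "only if" direction of Lemma 6 can be illustrated numerically. The sketch below builds one possible $\Omega$ from the singular value decomposition of $\omega$; this particular construction (rescaling the singular vectors by a factor $c$ with $c^2 + c^{-2} = 2R/\|\omega\|$) is our own assumption for illustration and is not necessarily the construction used in Appendix E. The function name `embed_low_rank` is hypothetical, and $\omega \neq 0$ is assumed.

```python
import numpy as np


def embed_low_rank(omega, R, tol=1e-12):
    """Return Omega = [[A, omega], [omega^dagger, B]] with Omega >= 0,
    Tr(Omega) = 2R and rank(Omega) = rank(omega).

    Assumes omega != 0 and trace norm ||omega|| <= R.
    """
    U, s, Vh = np.linalg.svd(omega, full_matrices=False)
    keep = s > tol                      # drop numerically zero singular values
    U, s, Vh = U[:, keep], s[keep], Vh[keep, :]
    t = R / s.sum()                     # ratio R / ||omega||
    if t < 1.0 - 1e-9:
        raise ValueError("trace norm of omega exceeds R")
    t = max(t, 1.0)
    c2 = t + np.sqrt(t * t - 1.0)       # c^2, chosen so that c^2 + c^-2 = 2t
    top = U * np.sqrt(s * c2)           # columns: c * sqrt(s_i) * u_i
    bottom = Vh.conj().T * np.sqrt(s / c2)  # columns: sqrt(s_i) / c * v_i
    G = np.vstack([top, bottom])
    return G @ G.conj().T               # PSD by construction, rank = len(s)


# Illustrative rank-2 example with a bound R larger than the trace norm.
rng = np.random.default_rng(0)
m, n, k = 3, 4, 2
omega = rng.normal(size=(m, k)) @ rng.normal(size=(k, n))
R = 1.5 * np.linalg.norm(omega, "nuc")
Omega = embed_low_rank(omega, R)
```

The off-diagonal block `Omega[:m, m:]` reproduces `omega`, so linear functions of $\omega$ become linear functions of $\Omega$.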

C. Unnormalized variables
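Third, we consider the rank-constrained optimization without the normalization constraint, namely the general rank-constrained SDP

$$\max_{\rho} \ \mathrm{Tr}(X\rho) \quad \text{s.t.} \quad \Lambda(\rho) = Y,\ \ M(\rho) \le Z,\ \ \rho \ge 0,\ \ \mathrm{rank}(\rho) \le k,$$

in which both the equality constraint ($\Lambda(\rho) = Y$) and the inequality constraint ($M(\rho) \le Z$) are involved.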
The first method we can try is to find a matrix $C$ such that $W := \Lambda^*(C) > 0$, where $\Lambda^*$ is the dual or adjoint map of $\Lambda$ [46]. If this is possible, we can add a redundant normalization-like constraint

$$\mathrm{Tr}(W\rho) = w,$$

which follows from $\Lambda(\rho) = Y$ because $\mathrm{Tr}(W\rho) = \mathrm{Tr}[C\Lambda(\rho)] = \mathrm{Tr}(CY) =: w$. The strict positive definiteness of $W$ implies that $w > 0$; otherwise the problem is trivial ($\rho = 0$). Then, by applying the transformation $\tilde\rho = w^{-1}\sqrt{W}\rho\sqrt{W}$, the general rank-constrained SDP is transformed to a form with the normalization condition $\mathrm{Tr}(\tilde\rho) = 1$ on $\tilde\rho$. Thus, the methods in Secs. II and IV A are directly applicable.
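The effect of the transformation $\tilde\rho = w^{-1}\sqrt{W}\rho\sqrt{W}$ can be checked on a small example. In the sketch below, `W` is a random positive definite stand-in for $\Lambda^*(C)$ and `w` is computed directly as $\mathrm{Tr}(W\rho)$; both are assumptions made for illustration, as are the helper names `psd_sqrt` and `normalize_variable`.

```python
import numpy as np


def psd_sqrt(W):
    """Matrix square root of a Hermitian positive semidefinite matrix."""
    vals, vecs = np.linalg.eigh(W)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.conj().T


def normalize_variable(rho, W, w):
    """Apply rho -> sqrt(W) rho sqrt(W) / w."""
    S = psd_sqrt(W)
    return S @ rho @ S / w


rng = np.random.default_rng(1)
n, k = 4, 2
M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
W = M @ M.conj().T + np.eye(n)          # strictly positive definite stand-in
G = rng.normal(size=(n, k)) + 1j * rng.normal(size=(n, k))
rho = G @ G.conj().T                    # unnormalized PSD matrix of rank k
w = np.trace(W @ rho).real              # plays the role of Tr(CY)
rho_tilde = normalize_variable(rho, W, w)
```

Because $\sqrt{W}$ is invertible, the transformation preserves both positivity and rank, so the rank constraint carries over unchanged to $\tilde\rho$.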
In general, we can combine the techniques for the inequality constraint and for the non-positive-semidefinite variable to tackle the problem. Again, we impose the extra assumption that the optimum can be attained on bounded $\rho$, i.e., we consider the optimization

$$\max_{\rho} \ \mathrm{Tr}(X\rho) \quad \text{s.t.} \quad \Lambda(\rho) = Y,\ \ M(\rho) \le Z,\ \ \rho \ge 0,\ \ \mathrm{rank}(\rho) \le k,\ \ \mathrm{Tr}(\rho) \le R, \tag{72}$$

where $R$ is a suitably chosen bound depending on the actual problem. By taking advantage of Lemma 6, the optimization in Eq. (72) can be rewritten in the form of Eq. (73), which is a rank-constrained SDP with the normalization constraint. By applying the methods from Sec. II and Sec. IV A, a complete SDP hierarchy can be constructed.

D. Quadratic optimization and beyond
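FIG. 5. An illustration of the relations between the two-party feasible region $F_2$, the two-party purification $P_2$, and the two-party extension $S_2$.

Last, we show that our method can also be used for (rank-constrained) quadratic and higher-order optimization. The key observation is that quadratic functions over $\rho$ can be written as linear functions over $\rho \otimes \rho$. For example, we can rewrite

$$\mathrm{Tr}(X\rho Y\rho) = \mathrm{Tr}[V (X \otimes Y)(\rho \otimes \rho)] = \tfrac{1}{2}\, \mathrm{Tr}[\{V, X \otimes Y\}\, \rho \otimes \rho],$$

where $V$ is the swap operator, and the anti-commutator $\{\cdot, \cdot\}$ is taken to ensure the Hermiticity. Thus, without loss of generality, we consider the rank-constrained quadratic optimization

$$\max_{\rho} \ \mathrm{Tr}(X_{A_1B_1}\, \rho_{A_1} \otimes \rho_{B_1}) \quad \text{s.t.} \quad \Lambda(\rho) = Y,\ \ \mathrm{Tr}(\rho) = 1,\ \ \rho \ge 0,\ \ \mathrm{rank}(\rho) \le k,$$

where $H_{A_1} = H_{B_1} = \mathbb{C}^n$, $\rho_{A_1}$ and $\rho_{B_1}$ denote the same state $\rho$ on $H_{A_1}$ and $H_{B_1}$, respectively, and $X_{A_1B_1}$ is some Hermitian matrix on $H_{A_1} \otimes H_{B_1}$. The generalization to the general cases as in the previous subsections is obvious.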
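The linearization of a quadratic function over $\rho \otimes \rho$ can be verified numerically for the identity $\mathrm{Tr}(X\rho Y\rho) = \tfrac{1}{2}\mathrm{Tr}[\{V, X\otimes Y\}\,\rho\otimes\rho]$, with $V$ the swap operator. The following sketch uses random Hermitian $X$, $Y$ and a random density matrix; the helper names are illustrative and the identity is offered as one concrete instance of the idea, not necessarily the exact example intended in the text.

```python
import numpy as np


def swap_operator(n):
    """Swap operator V on C^n (x) C^n, defined by V |i>|j> = |j>|i>."""
    V = np.zeros((n * n, n * n))
    for i in range(n):
        for j in range(n):
            V[j * n + i, i * n + j] = 1.0
    return V


def random_hermitian(n, rng):
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (M + M.conj().T) / 2


def linearized_quadratic(X, Y, rho):
    """Return the Hermitian operator H = {V, X (x) Y} / 2 and Tr[H rho (x) rho]."""
    n = rho.shape[0]
    V = swap_operator(n)
    K = np.kron(X, Y)
    H = 0.5 * (V @ K + K @ V)
    return H, np.trace(H @ np.kron(rho, rho)).real


rng = np.random.default_rng(2)
n = 3
X, Y = random_hermitian(n, rng), random_hermitian(n, rng)
G = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
rho = G @ G.conj().T
rho /= np.trace(rho).real               # normalize to a density matrix
H, linear_value = linearized_quadratic(X, Y, rho)
quadratic_value = np.trace(X @ rho @ Y @ rho).real
```

The operator `H` is Hermitian by construction, so it can serve as the objective matrix of an SDP over the two-party variable.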
We conclude this section with a few remarks. First, taking $k = n$ (i.e., taking the rank bound to be the dimension of $\rho$) corresponds to quadratic programming without a rank constraint. Second, this method can be used for various uncertainty relations in quantum information, in which the minimization of the variance is automatically a quadratic program. Finally, the above procedure can be easily generalized to higher-order programming. The main idea is that all the results in Sec. II can be directly generalized to fully characterize

$$S_N := \mathrm{conv}\{\, |\varphi\rangle\langle\varphi|^{\otimes N} \mid |\varphi\rangle\langle\varphi| \in P \,\},$$

and $S_N$ satisfies $\mathrm{Tr}_{A_2B_2\cdots Z_2}(S_N) = \mathrm{conv}(F_N)$, where $F_N := \{\rho^{\otimes N} \mid \rho \in F\}$; see Appendix A for more details. Thus, the (rank-constrained) higher-order optimization over $\rho^{\otimes N}$ is fully characterizable with $S_N$.

V. CONCLUSION
We have introduced a method to map SDPs with rank constraints to optimizations over separable quantum states. This allowed us to construct a complete hierarchy of SDPs for the original rank-constrained SDP. We studied various examples and demonstrated the practical viability of our approach. Finally, we discussed several extensions to more general problems.
For further research, there are several interesting directions. First, concerning the presented method, a careful study of possible large-scale implementations, including the exploitation of possible symmetries, is desirable. This may finally shed new light on some of the examples presented here. Second, another promising method for solving the convex optimization problems in Theorems 1 and 3 is to consider the dual conic programs, which correspond to the optimization over entanglement witnesses. The benefit of this method will be that any feasible witness operator can provide a certified upper bound for the optimization problem. Third, on a broader perspective, it would be interesting to study other SDPs with additional constraints. An example is conditions in a product form, which frequently occur in quantum information due to the tensor product structure of the underlying Hilbert spaces. Finding SDP hierarchies for such problems will be very useful for the progress of this field.

ACKNOWLEDGMENTS
We would like to thank Matthias Kleinmann and Nikolai Wyderka for discussions. To formulate and solve the SDPs, we used SDPA [62] as well as CVXPY [63,64] with SCS [65] and MOSEK [66] solvers. This work was supported by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation, project numbers 447948357 and 440958198), the Sino-German Center for Research Promotion (Project M-0294), the ERC (Consolidator Grant 683107/TempoQ), and the House of Young Talents Siegen.
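Appendix A: Characterization of $S_2$ and $S_N$

First, we prove that the conditions in Eqs. (9,10,11) are also sufficient for $\Phi_{AB} \in S_2$. The constraints in Eqs. (9,10) imply that $\Phi_{AB}$ is a separable state in the symmetric subspace, which always admits the form [37]

$$\Phi_{AB} = \sum_i p_i\, |\varphi_i\rangle\langle\varphi_i| \otimes |\varphi_i\rangle\langle\varphi_i|, \tag{A1}$$

where the $p_i$ form a probability distribution and the $|\varphi_i\rangle$ are normalized. Hereafter, without loss of generality, we assume that all $p_i$ are strictly positive. From Eqs. (3,7), to show that $\Phi_{AB} \in S_2$ we only need to show that

$$\tilde\Lambda(|\varphi_i\rangle\langle\varphi_i|) = Y \tag{A2}$$

for all $|\varphi_i\rangle$. To this end, we introduce an auxiliary map

$$\mathcal{E}(\cdot) = \tilde\Lambda(\cdot) - \mathrm{Tr}(\cdot)\, Y. \tag{A3}$$

Thus, the last constraint in Eq. (13) is equivalent to $\mathcal{E}_A \otimes \mathrm{id}_B(\Phi_{AB}) = 0$, which implies that

$$\mathcal{E}_A \otimes \mathcal{E}^\dagger_B(\Phi_{AB}) = 0, \tag{A4}$$

where $\mathcal{E}^\dagger$ is the linear map satisfying $\mathcal{E}^\dagger(X) = [\mathcal{E}(X)]^\dagger$ for any Hermitian matrix $X$, and the subscripts $A$, $B$ in $\mathcal{E}_A$, $\mathcal{E}^\dagger_B$ indicate that the maps operate on systems $H_A$ and $H_B$, respectively. We note that $\mathcal{E}^\dagger$ is not the dual map of $\mathcal{E}$. Then, Eqs. (A1, A4) imply that

$$\sum_i p_i\, E_i \otimes E_i^\dagger = 0,$$

where $E_i = \mathcal{E}(|\varphi_i\rangle\langle\varphi_i|)$. Let $V$ be the swap operator acting on the same space as $E_i \otimes E_i^\dagger$. Multiplying by $V$ and taking the trace gives $\sum_i p_i\, \mathrm{Tr}(E_i E_i^\dagger) = 0$, and since all $p_i$ are strictly positive, we obtain $E_i = 0$ for all $|\varphi_i\rangle$. Then, Eq. (A2) follows directly from the definition of $\mathcal{E}$ in Eq. (A3), and hence $\Phi_{AB} \in S_2$. This proves Theorem 1.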
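Second, we show that Eq. (12) and $\Phi_{AB}$ of the form in Eq. (A1) imply that all $|\varphi_i\rangle\langle\varphi_i|$ are real. Taking the partial transpose of Eq. (A1) gives

$$\Phi_{AB}^{T_A} = \sum_i p_i\, |\varphi_i^*\rangle\langle\varphi_i^*| \otimes |\varphi_i\rangle\langle\varphi_i|,$$

where $|\varphi_i^*\rangle$ denotes the complex conjugate of $|\varphi_i\rangle$. Then, the fact that $\Phi_{AB}^{T_A} = \Phi_{AB}$ is a separable state within the symmetric subspace implies that $|\varphi_i^*\rangle\langle\varphi_i^*| = |\varphi_i\rangle\langle\varphi_i|$, i.e., $|\varphi_i\rangle\langle\varphi_i| \in \mathbb{R}^{nk\times nk}$ for all $i$. This proves Theorem 3.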
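Notably, this argument can be directly generalized to multi-party states, which provides a simple proof for the result in Ref. [67].

Last, we show that the method for characterizing $S_2$ can be directly generalized to characterizing $S_N$. Recall that $S_N$ is defined as

$$S_N := \mathrm{conv}\{\, |\varphi\rangle\langle\varphi|^{\otimes N} \mid |\varphi\rangle\langle\varphi| \in P \,\},$$

where $P$ is defined as

$$P := \{\, |\varphi\rangle\langle\varphi| \mid \tilde\Lambda(|\varphi\rangle\langle\varphi|) = Y,\ \langle\varphi|\varphi\rangle = 1 \,\}.$$

Then, we show that $\Phi_{ABC\cdots Z} \in S_N$ if and only if

$$\Phi_{ABC\cdots Z} \in \mathrm{SEP}, \tag{A12}$$

$$P_N^+\, \Phi_{ABC\cdots Z}\, P_N^+ = \Phi_{ABC\cdots Z}, \tag{A13}$$

$$\tilde\Lambda \otimes \mathrm{id}_{BC\cdots Z}(\Phi_{ABC\cdots Z}) = Y \otimes \mathrm{Tr}_A(\Phi_{ABC\cdots Z}). \tag{A14}$$

Similarly to the case of $S_2$, the constraints in Eqs. (A12) and (A13) imply that $\Phi_{ABC\cdots Z}$ is a separable state in the symmetric subspace, which always admits the form [37]

$$\Phi_{ABC\cdots Z} = \sum_i p_i\, |\varphi_i\rangle\langle\varphi_i|^{\otimes N},$$

where the $p_i$ form a probability distribution and the $|\varphi_i\rangle$ are normalized. Thus, Eq. (A14) implies, by the same argument as for $S_2$, that $\tilde\Lambda(|\varphi_i\rangle\langle\varphi_i|) = Y$ for all $|\varphi_i\rangle$. Similarly, in the case of $\mathbb{F} = \mathbb{R}$, we only need to add the partial-transpose-invariant constraint $\Phi_{ABC\cdots Z}^{T_A} = \Phi_{ABC\cdots Z}$.

Furthermore, the SDP hierarchy is complete in the sense that $\xi_{N+1} \le \xi_N$ and $\lim_{N\to+\infty} \xi_N = \xi$.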
The proof is similar to the proof of Theorem 2 in Ref. [58]. For completeness, we also present it here. To prove Theorem 2, we take advantage of the following lemma, which can be viewed as a special case of the quantum de Finetti theorem [59]; see also related results in Refs. [68,69].

Lemma 7. Let $\rho_N$ be an $N$-party quantum state in the symmetric subspace $P_N^+$. Then for all $\ell < N$ there exists an $\ell$-party quantum state

$$\sigma_\ell = \sum_\mu p_\mu\, |\varphi_\mu\rangle\langle\varphi_\mu|^{\otimes \ell},$$

i.e., a fully separable state in $P_\ell^+$, such that $\|\mathrm{Tr}_{N-\ell}(\rho_N) - \sigma_\ell\| = O(D\ell/N)$, where $\|\cdot\|$ is the trace norm and $D$ is the local dimension.
The part that ξ is upper bounded by ξ N for any N is obvious. Hence, we only need to prove that ξ N+1 ≤ ξ N and lim N→+∞ ξ N = ξ.
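We first show that $\xi_{N+1} \le \xi_N$. This follows from the fact that if a multi-party quantum state is within the symmetric subspace, so are its reduced states. Mathematically, we have the relation

$$(P_N^+ \otimes \mathbb{1})\, P_{N+1}^+ = P_{N+1}^+.$$

Suppose that there exists an $(N+1)$-party extension $\Phi_{AB\cdots ZZ'}$ satisfying all constraints that achieves the maximum $\xi_{N+1}$ in Theorem 2. Then, the constraint $P_{N+1}^+ \Phi_{AB\cdots ZZ'} P_{N+1}^+ = \Phi_{AB\cdots ZZ'}$ implies that $(P_N^+ \otimes \mathbb{1}_{Z'})\, \Phi_{AB\cdots ZZ'}\, (P_N^+ \otimes \mathbb{1}_{Z'}) = \Phi_{AB\cdots ZZ'}$. Thus, one can easily verify that the reduced state $\mathrm{Tr}_{Z'}(\Phi_{AB\cdots ZZ'})$ is an $N$-party extension satisfying all the constraints in Theorem 2 with objective value $\xi_{N+1}$. From this, the result $\xi_{N+1} \le \xi_N$ follows.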
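The hierarchical property used here, namely that reduced states of a state in the symmetric subspace again lie in the symmetric subspace, can be illustrated for small systems. The sketch below builds the projector onto the symmetric subspace as the average over all party permutations and checks the relation $(P_N^+ \otimes \mathbb{1})P_{N+1}^+ = P_{N+1}^+$; the function names and the choice $d = 2$, $N = 2$ are illustrative assumptions.

```python
import itertools
import math
import numpy as np


def symmetric_projector(d, N):
    """Projector onto the symmetric subspace of N copies of C^d,
    built as the uniform average of all N! party permutations."""
    dim = d ** N
    P = np.zeros((dim, dim))
    perms = list(itertools.permutations(range(N)))
    shape = (d,) * N
    for basis in itertools.product(range(d), repeat=N):
        col = np.ravel_multi_index(basis, shape)
        for perm in perms:
            permuted = tuple(basis[p] for p in perm)
            row = np.ravel_multi_index(permuted, shape)
            P[row, col] += 1.0 / len(perms)
    return P


def trace_out_last(Phi, d, N):
    """Partial trace over the last of N + 1 parties of local dimension d."""
    T = Phi.reshape(d ** N, d, d ** N, d)
    return np.einsum("iaja->ij", T)


d, N = 2, 2
P_N = symmetric_projector(d, N)
P_N1 = symmetric_projector(d, N + 1)

# Random pure state in the (N+1)-party symmetric subspace and its reduction.
rng = np.random.default_rng(0)
v = P_N1 @ rng.normal(size=d ** (N + 1))
v /= np.linalg.norm(v)
Phi = np.outer(v, v)
reduced = trace_out_last(Phi, d, N)
```

The trace of `P_N1` equals the dimension $\binom{N+d}{N+1}$ of the symmetric subspace, which is what keeps the size of the symmetry-reduced SDP polynomial in $N$.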
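Next, we prove the convergence part, i.e., $\lim_{N\to+\infty} \xi_N = \xi$. Suppose that the solution $\xi_N$ of the $N$-party extension in Theorem 2 is achieved by a state $\Phi^N_{AB\cdots Z}$, whose two-party reduced state we denote by $\Phi^N_{AB}$. Further, Lemma 7 implies that there exist separable states $\tilde\Phi^N_{AB}$ such that

$$\|\Phi^N_{AB} - \tilde\Phi^N_{AB}\| = O(nk/N). \tag{B8}$$

As the set of quantum states for any fixed dimension is compact, we can choose a convergent subsequence $\tilde\Phi^{N_i}_{AB}$ of the sequence $\tilde\Phi^N_{AB}$. Thus, Eq. (B8) implies that

$$\lim_{i\to+\infty} \Phi^{N_i}_{AB} = \lim_{i\to+\infty} \tilde\Phi^{N_i}_{AB} =: \Phi_{AB}. \tag{B9}$$

As all $\tilde\Phi^{N_i}_{AB}$ are separable and the set of separable states is closed, $\Phi_{AB} = \lim_{i\to+\infty} \tilde\Phi^{N_i}_{AB}$ is separable. Further, as all the functions on $\Phi^N_{AB}$ or $\tilde\Phi^N_{AB}$ in Eqs. (B6, B7) are continuous, Eq. (B9) implies that $\Phi_{AB}$ satisfies all the constraints in Eq. (13). In other words, $\Phi_{AB}$ is a feasible point of program (13), and thus $\mathrm{Tr}(\tilde X_A \otimes \mathbb{1}_B\, \Phi_{AB}) = \lim_{N\to+\infty} \xi_N \le \xi$. Together with the fact that $\xi_N \ge \xi$, we then have $\lim_{N\to+\infty} \xi_N = \xi$.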
Finally, we would like to note that the above proof also gives an estimate of the convergence rate of the SDP hierarchy. Without loss of generality, we assume suitable bounds on $\Lambda$, $X$, $Y$ such that their operations do not increase the trace norm, e.g., the diamond norm of $\tilde\Lambda(\cdot) - \mathrm{Tr}(\cdot)Y$ and the spectral norm of $X$ are no greater than one. Then, each level of the SDP hierarchy provides an $O(nk/N)$ approximate solution to the convex optimization in Theorem 1. More precisely, for each $N$ there exists $\varepsilon = O(nk/N)$ such that where $\Phi_{AB}$ is $\varepsilon$-close to the feasible region in the sense that This can be proved by taking $\Phi_{AB}$ as $\tilde\Phi^N_{AB}$ in the above proof, and the approximation rate $\varepsilon = O(nk/N)$ results from Eq. (B8).

Appendix C: Continuous rank from inherent symmetry
In the following, we consider the second level of the hierarchy, i.e., $N = 2$. As described in Sec. II C, the corresponding SDPs can be simplified to Eqs. (25) and (27) for the complex and real cases, respectively. Here, the parameter $k$, which constrains the rank in the rank-constrained SDP in Eq. (1), does not in principle need to be an integer. Indeed, the following observations show that it is reasonable to consider such a continuous rank.
Proof. The observation is trivial when k = k. In the following, we assume that k > k ≥ 1. From the re- which further imply that Thus, we can express Φ I and Φ V in terms of Φ I and Φ V . The feasibility follows from the feasibility of Φ A 1 B 1 = k 2 Φ I + kΦ V and since all coefficients are nonnegative. The linear constraints are obviously satisfied as we consider A similar statement also holds in the real case.
Analogous to the proof of Observation 8, it is straightforward to verify that the coefficients in the following equalities are all nonnegative, It is also obvious that (Φ I ) T A 1 = Φ I and VΦ φ = Φ φ . Hence, the feasibility follows.
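Thus, the set of feasible points grows monotonically with continuous $k$, and hence, the same is true for the objective value. Apart from the interpretation as a continuous rank, this also helps in preventing invalid conclusions caused by numerical errors, since parameters $k$ can be sampled in a region around the considered rank.

Theorem 5. For $\mathbb{F} = \mathbb{C}$, let $\xi$ be the solution of the rank-constrained SDP in Eq. (49). Then, for any $N$, $\xi$ is upper bounded by the solution $\xi_N$ of the following SDP hierarchy

$$\xi_N := \max_{\Phi_{AB\cdots Z}} \ \mathrm{Tr}(\tilde X_A \otimes \mathbb{1}_{B\cdots Z}\, \Phi_{AB\cdots Z})$$

$$\text{s.t.} \quad \Phi_{AB\cdots Z} \ge 0,\ \ \mathrm{Tr}(\Phi_{AB\cdots Z}) = 1,\ \ P_N^+\, \Phi_{AB\cdots Z}\, P_N^+ = \Phi_{AB\cdots Z},\ \ \tilde\Lambda \otimes \mathrm{id}_{B\cdots Z}(\Phi_{AB\cdots Z}) \le Y \otimes \mathrm{Tr}_A(\Phi_{AB\cdots Z}).$$

Furthermore, the SDP hierarchy is complete, i.e., $\xi_{N+1} \le \xi_N$ and $\lim_{N\to+\infty} \xi_N = \xi$.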
We will denote the set of pure states embedded in the space of Hermitian operators by $\Omega(d) = \{\, |\varphi\rangle\langle\varphi| \mid \langle\varphi|\varphi\rangle = 1 \,\}$. In this appendix, by means of this embedding, we also identify the symbol $\varphi$ with the operator $|\varphi\rangle\langle\varphi|$, and analogously for $\psi$. Note that $\Omega(d)$ is a compact metric space. Topologically it is also a separable space, i.e., it has a countable dense subset.
Let us start by restating the quantum de Finetti theorem for infinite sequences explicitly. Although the results are known in the literature, we repeat here a simple proof for completeness. This proof is based on the quantum de Finetti theorem for finite sequences in Lemma 7 [59, Theorem II.8].

Lemma 10 (quantum de Finetti theorem for bosonic sequences). Let $\{\rho_n\}_{n=0}^{\infty}$ be a sequence of bosonic extensions of density operators over $[\mathbb{C}^d]^{\otimes n}$, i.e., $\rho_0 = 1$, $\rho_k = \mathrm{Tr}_{n-k}[\rho_n]$ for all $n \ge k \ge 0$, and the $\rho_n$ are in the symmetric subspace of $[\mathbb{C}^d]^{\otimes n}$. Then, there exists a Borel probability measure $\mu$ over $\Omega(d)$ such that

$$\rho_n = \int d\mu(\varphi)\, |\varphi\rangle\langle\varphi|^{\otimes n}$$

for all $n = 0, 1, 2, \ldots$

Proof. According to Lemma 7, for given $N$ and $n$, there is a Borel probability measure (in fact, in this case the measure can be discrete) $\mu^{(N,n)}$ over $\Omega(d)$ such that

$$\Big\| \rho_n - \int d\mu^{(N,n)}(\varphi)\, |\varphi\rangle\langle\varphi|^{\otimes n} \Big\| = O(dn/N).$$

Consider the sequence of Borel probability measures $\{\mu^{(N,n)}\}_{N=n+1}^{\infty}$. Since the space of all Borel probability measures over $\Omega(d)$ is sequentially compact in the weak topology [61, Theorem 8.9.3], this sequence has at least one limit point $\mu_n$. Thus, we have that

$$\rho_n = \int d\mu_n(\varphi)\, |\varphi\rangle\langle\varphi|^{\otimes n}. \tag{D4}$$

Let $\mu$ in turn be a limit point of the sequence $\{\mu_n\}_{n=1}^{\infty}$; then one finds

$$\rho_n = \int d\mu(\varphi)\, |\varphi\rangle\langle\varphi|^{\otimes n} \tag{D5}$$

for all $n = 0, 1, 2, \ldots$
The measure that appears in Lemma 10 can also be shown to be unique, a fact that is needed for our subsequent argument. This is a consequence of the following lemma.

Lemma 11. The self-adjoint commutative algebra of continuous functions over $\Omega(d)$ generated by functions of the form $\mathrm{Tr}[|\psi\rangle\langle\psi|\, \cdot\,] : \Omega(d) \to \mathbb{R}$ for all $\psi \in \Omega(d)$ and the constant functions is dense in the space $C[\Omega(d)]$ of continuous functions on $\Omega(d)$.

Proof. The lemma is a direct consequence of Stone's theorem [70, Theorem 4.3.4], which states that a self-adjoint subalgebra of $C[\Omega(d)]$ containing the constants and separating points in $\Omega(d)$ is uniformly dense in $C[\Omega(d)]$. Indeed, the algebra contains the constants by construction. It also separates points in $\Omega(d)$, since for any two distinct points $|\varphi_1\rangle\langle\varphi_1|, |\varphi_2\rangle\langle\varphi_2| \in \Omega(d)$, the function $\mathrm{Tr}[|\varphi_1\rangle\langle\varphi_1|\, \cdot\,]$ separates them.

Corollary 12. For a Borel probability measure $\mu$ over $\Omega(d)$, the tensor moments

$$\int d\mu(\varphi)\, |\varphi\rangle\langle\varphi|^{\otimes n} \tag{D6}$$

for $n \in \mathbb{N}$ specify $\mu$ uniquely. As a consequence, the measure $\mu$ that appears in Lemma 10 is unique.

Proof. The first step in the proof is to use the Riesz representation, or Riesz-Markov theorem [61, Theorem 7.10.4], which establishes a one-to-one correspondence between a Borel measure $\mu$ over the compact metric space $\Omega(d)$ and a linear functional over the continuous functions $L : C[\Omega(d)] \to \mathbb{R}$ by $L(g) = \int d\mu(\varphi)\, g(\varphi)$ for $g \in C[\Omega(d)]$. Now the set of tensor moments in Eq. (D6) in fact specifies the values of the linear functional $L$ on the whole subalgebra generated by the constants and functions of the form $\mathrm{Tr}[|\psi\rangle\langle\psi|\, \cdot\,] : \Omega(d) \to \mathbb{R}$ for all $|\psi\rangle\langle\psi| \in \Omega(d)$. Since this subalgebra is dense in $C[\Omega(d)]$ by Lemma 11, these values uniquely specify the linear functional $L$, and thus the Borel measure $\mu$.

Corollary 13. Let $\mu$ be a Borel probability measure over $\Omega(d)$ and $f$ be a continuous function over $\Omega(d)$ such that

$$\int d\mu(\varphi)\, f(\varphi)\, |\varphi\rangle\langle\varphi|^{\otimes n} \ge 0 \tag{D7}$$

for all $n \in \mathbb{N}$. Then, $f$ is nonnegative almost everywhere with respect to $\mu$.
To extend this result to operator-valued functions, we need the following lemma.
Lemma 14. Let X be a separable topological space, (Y, µ) be a measure space. Let φ : X × Y → R be a function such that φ( · , y) is continuous for any y ∈ Y. If for any fixed x ∈ X, φ(x, y) ≥ 0 for almost all y ∈ Y, then for almost all y ∈ Y, it holds that φ(x, y) ≥ 0 for all x.
With all these preparations, let us prove the completeness of the hierarchy for the SDP with inequality constraint in Theorem 5.
The argument given in the main text can be broken into two steps. In the first step, one shows that there exists a Borel probability measure $\mu$ on pure states $|\varphi\rangle\langle\varphi| \in \Omega(d)$ such that the state

$$\Phi_{ABC\cdots Z} = \int d\mu(\varphi)\, |\varphi\rangle\langle\varphi|^{\otimes N} \tag{D11}$$

satisfies all constraints in Eq. (D1) and $\lim_{N\to+\infty} \xi_N = \mathrm{Tr}(\tilde X_A \Phi_A)$. The proof of this step given in the main text is essentially complete. One only has to keep in mind that, in principle, the measure $\mu$ that arises is a general Borel probability measure and may not correspond to a probability density function.
In the second step, one shows that almost all ϕ (with respect to the measure µ) belong to P as defined in equation (51). In the main text, an intuitive argument is given under the assumption that µ can be written as a well-behaved distribution. With all the above mathematical preparations, we can now remove this assumption.